Lecture Notes in Mathematics
Editors: A. Dold, Heidelberg; F. Takens, Groningen; B. Teissier, Paris
1730
Springer Berlin Heidelberg New York Barcelona Hong Kong London Milan Paris Singapore Tokyo
Siegfried Graf Harald Luschgy
Foundations of Quantization for Probability Distributions
Springer
Authors
Siegfried Graf, Faculty for Mathematics and Computer Science, University of Passau, 94030 Passau, Germany. E-mail: graf@fmi.uni-passau.de
Harald Luschgy, FB IV, Mathematics, University of Trier, 54286 Trier, Germany. E-mail:
[email protected]
Cataloging-in-Publication Data applied for

Die Deutsche Bibliothek - CIP-Einheitsaufnahme
Graf, Siegfried: Foundations of quantization for probability distributions / Siegfried Graf ; Harald Luschgy. - Berlin ; Heidelberg ; New York ; Barcelona ; Hong Kong ; London ; Milan ; Paris ; Singapore ; Tokyo : Springer, 2000 (Lecture notes in mathematics ; 1730) ISBN 3-540-67394-6

Mathematics Subject Classification (2000): 60Exx, 62H30, 28A80, 90B05, 94A29
ISSN 0075-8434
ISBN 3-540-67394-6 Springer-Verlag Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

Springer-Verlag is a company in the BertelsmannSpringer publishing group.
© Springer-Verlag Berlin Heidelberg 2000
Printed in Germany

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typesetting: Camera-ready TeX output by the author
Printed on acid-free paper
SPIN: 10724973 41/3143/du
Contents

List of Figures
List of Tables
Introduction

I General properties of the quantization for probability distributions
1 Voronoi partitions
1.1 General norms
1.2 Euclidean norms
2 Centers and moments of probability distributions
2.1 Uniqueness and characterization of centers
2.2 Moments of balls
3 The quantization problem
4 Basic properties of optimal quantizers
4.1 Stationarity and existence
4.2 The functional $V_{n,r}$
4.3 Quantization error for ball packings
4.4 Examples
4.5 Stability properties and empirical versions
5 Uniqueness and optimality in one dimension
5.1 Uniqueness
5.2 Optimal quantizers

II Asymptotic quantization for nonsingular probability distributions
6 Asymptotics for the quantization error
7 Asymptotically optimal quantizers
7.1 Mixtures and partitions
7.2 Empirical measures
7.3 Asymptotic optimality in one dimension
7.4 Product quantizers
8 Regular quantizers and quantization coefficients
8.1 Ball lower bound
8.2 Space-filling figures, regular quantizers and upper bounds
8.3 Lattice quantizers
8.4 Quantization coefficients of one-dimensional distributions
9 Random quantizers and quantization coefficients
9.1 Asymptotics for random quantizers
9.2 Random quantizer upper bound
9.3 d-asymptotics and entropy
10 Asymptotics for the covering radius
10.1 Basic properties
10.2 Asymptotic covering radius
10.3 Covering radius of lattices and bounds
10.4 Stability properties and empirical versions

III Asymptotic quantization for singular probability distributions
11 The quantization dimension
11.1 Definition and elementary properties
11.2 Comparison to the Hausdorff dimension
11.3 Comparison to the box dimension
11.4 Comparison to the rate distortion dimension
12 Regular sets and measures of dimension D
12.1 Definition and examples
12.2 Asymptotics for the quantization error
13 Rectifiable curves
14 Self-similar sets and measures
14.1 Basic notion and facts
14.2 An upper bound for the quantization dimension
14.3 A lower bound for the quantization dimension
14.4 The quantization dimension
14.5 The quantization coefficient

Appendix: Univariate distributions
Bibliography
Symbols
Index
List of Figures

1.1 Voronoi diagram of a finite set in $\mathbb{R}^2$ with respect to the $l_p$-norm for (a) $p = 2$ and (b) $p = 1$
1.2 Voronoi region and separator with respect to the $l_1$-norm
1.3 Voronoi region with respect to the $l_2$-norm which is not a polyhedral set
1.4 Unbounded Voronoi region with respect to the $l_1$-norm generated by an interior point of $\operatorname{conv} \alpha$
2.1 $C_1(P)$ and $C_r(P)$, $r > 1$, with respect to the $l_\infty$-norm for a discrete probability $P$ with two supporting points
3.1 Quantization scheme
4.1 2-optimal centers of order 1 with respect to the $l_\infty$-norm
4.2 Square quantizer for $U([0,1]^2)$
4.3 3- and 4-stationary sets of centers for $P = N_2(0, I_2)$ of order $r = 2$ and Voronoi diagrams with respect to the $l_2$-norm
5.1 2-optimal centers of order 2
8.1 Tesselation of $[0,1]^2$ into $m = 6$ regular hexagons and a boundary region, $n = 10$
8.2 Voronoi region $W(0|\Lambda)$ with respect to the $l_1$-norm for a nonadmissible lattice
8.3 Voronoi region $W(0|\Lambda)$ with respect to the $l_2$-norm for the hexagonal lattice
8.4 Truncated octahedron
8.5 Densities of hyper-exponential distributions $P = H(a, b)$ with variance equal to one and $Q_2(P) = 1.8470$ (top), $Q_2(P) = 3.3106$ (center), $Q_2(P) = 8.1000$ (bottom)
List of Tables

5.1 $n$-optimal centers and $n$-th quantization error for the normal distribution $N(0, 1)$ of order $r = 2$
5.2 Logistic distribution $L(\cdot)$, $r = 2$
5.3 Double exponential distribution $DE(\cdot)$, $r = 2$
5.4 Exponential distribution $E(1)$, $r = 2$
5.5 Gamma distribution $\Gamma(\cdot, 2)$, $r = 2$
5.6 Rayleigh distribution $W(\cdot, 2)$, $r = 2$
5.7 Normal distribution $N(0, \cdot)$, $r = 1$
5.8 Logistic distribution $L(\cdot)$, $r = 1$
5.9 Double exponential distribution $DE(1)$, $r = 1$
5.10 Exponential distribution $E(\cdot)$, $r = 1$
5.11 Gamma distribution $\Gamma(a, 2)$, $a = 0.9508\ldots$, $r = 1$
5.12 Rayleigh distribution $W(a, 2)$, $a = 2.7027\ldots$, $r = 1$
7.1 Probability distributions $P_r$
7.2 Probability distributions $(\ldots P)_r$
7.3 $r = 2$, $V_2(P) = 1$. Quantization error for quantiles of $P_2$ at two families of levels (first line and second line), $1 \le i \le n$
7.4 $r = 1$, $V_1(P) = 1$. Quantization error for quantiles of $P_1$ at two families of levels (first line and second line), $1 \le i \le n$
8.1 Quantization coefficients
8.2 $r = 2$. Quantization coefficients of distributions $P$ with $V_2(P) = 1$
8.3 $r = 1$. Quantization coefficients of distributions $P$ with $V_1(P) = 1$
9.1 $l_2$-norm, $r = 2$. Ball lower bound and random quantizer upper bound for $Q_2([0,1]^d)$
9.2 $l_1$-norm, $r = 1$
9.3 Differential entropies. $\psi = \Gamma'/\Gamma$, $\gamma$ = Euler's constant $= 0.5772\ldots$
9.4 Quantization coefficients for product probability measures up to $Q_r([0,1]^d)$
Introduction

The term "quantization" in the title originates in the theory of signal processing. It was used by electrical engineers starting in the late 1940s. In this context quantization means a process of discretising signals and should not be mistaken for the same term in quantum physics. As a mathematical topic, quantization for probability distributions concerns the best approximation of a $d$-dimensional probability distribution $P$ by a discrete probability with a given number $n$ of supporting points or, in other words, the best approximation of a $d$-dimensional random vector $X$ with distribution $P$ by a random vector $Y$ with at most $n$ values in its image. It turns out that for the error measures used in this book there is always a best approximation of the form $f(X)$, a "quantized version of $X$". The quantization problem can be rephrased as a partition problem of the underlying space, which explains the term quantization.

Much of the early attention in the engineering and statistical literature was concentrated on the one-dimensional quantization problem. See Bennett (1948), Panter and Dite (1951), Lloyd's 1957 paper (published 1982), Dalenius (1950), and Cox (1957). Steinhaus (1956) was apparently the first who explicitly dealt with the problem and formulated it for general (3-dimensional) spaces.
Since then quantization has occurred in various scientific fields, for instance

• Information theory (signal compression): Shannon (1959), Gersho and Gray (1992)
• Cluster analysis (quantization of empirical measures), pattern recognition, speech recognition: Anderberg (1973), Bock (1974), Diday and Simon (1976), Tou and Gonzales (1974)
• Numerical integration: Pagès (1997)
• Stochastic processes (sampling design): Bucklew and Cambanis (1988), Benhenni and Cambanis (1996)
• Mathematical models in economics (optimal location of service centers): Bollobás (1972, 1973)

The aim of the present book is to describe the mathematical theory underlying the different applications of quantization. The emphasis is on absolutely continuous as
well as on singular (continuous) distributions on $\mathbb{R}^d$. In the nonsingular case we present a rigorous treatment of known results in quantization including various new aspects, while the results for singular distributions seem to be completely new.

In more detail we consider the following problem. We concentrate on norm-based error measures. Let $\|\cdot\|$ be a norm on $\mathbb{R}^d$ and $1 \le r < \infty$. Define the Wasserstein-Kantorovitch $L_r$-metric $\rho_r$ for probabilities $P_1, P_2$ by

$$\rho_r(P_1, P_2) = \inf\left\{ \left( \int \|x - y\|^r \, d\mu(x, y) \right)^{1/r} : \mu \text{ probability on } \mathbb{R}^d \times \mathbb{R}^d \text{ with marginals } P_1 \text{ and } P_2 \right\}$$

and the (minimal) quantization error of a given probability $P$ and $n \in \mathbb{N}$ by

$$V_{n,r}(P) = \inf\{\rho_r(P, Q)^r : |\operatorname{supp}(Q)| \le n\} = \inf\{E\|X - f(X)\|^r : f \colon \mathbb{R}^d \to \mathbb{R}^d \text{ measurable}, \ |f(\mathbb{R}^d)| \le n\}.$$
For an optimal quantizing rule $f$, i.e. $f$ attaining the infimum, the domains of constancy provide $P$-almost surely a Voronoi partition of $\mathbb{R}^d$ with respect to its respective values. As a consequence the quantization problem is also equivalent to the $n$-centers problem, which requires finding a set $\alpha$ of $n$ elements which minimizes the expression

$$E \min_{a \in \alpha} \|X - a\|^r,$$

whose minimum value equals $V_{n,r}(P)$, that is

$$V_{n,r}(P) = \inf\left\{ E \min_{a \in \alpha} \|X - a\|^r : \alpha \subset \mathbb{R}^d, \ |\alpha| \le n \right\}.$$
In Chapter I we present general properties of the quantization problem for a fixed number $n$ of quantizing levels. We discuss existence of optimal quantizers, necessary conditions for optimality, and a sufficient condition for uniqueness in the one-dimensional case. Under this uniqueness condition it is easy to find numerically optimal quantizers for $P$ in the one-dimensional case. However, for dimensions $d \ge 2$ it is difficult even for small $n$ to determine optimal quantizers. This is caused (among other things) by the fact that the minimization function $(a_1, \ldots, a_n) \mapsto E \min_{1 \le i \le n} \|X - a_i\|^r$ is (typically) nonconvex. As examples we consider uniform distributions on ball packings and spherical distributions. Furthermore, we prove stability properties which can be applied to the empirical analysis of the quantization problem.

Chapters II and III focus on the asymptotic behaviour of the quantization problem when the number $n$ of quantizing levels tends to infinity. Chapter II starts with the investigation of the order of convergence to zero for the sequence of quantization errors $(V_{n,r}(P))_{n \ge 1}$ for probabilities $P$ on $\mathbb{R}^d$ whose absolutely continuous part does not vanish. It is shown that the limit of the sequence $(n^{r/d} V_{n,r}(P))_{n \ge 1}$ exists in $(0, \infty)$, provided a certain moment condition is satisfied. The limit

$$Q_r(P) = \lim_{n \to \infty} n^{r/d} V_{n,r}(P)$$

is called the $r$-th quantization coefficient and can be expressed in terms of the $r$-th quantization coefficient

$$Q_r([0,1]^d) = \lim_{n \to \infty} n^{r/d} V_{n,r}(U([0,1]^d)) = \inf_{n \ge 1} n^{r/d} V_{n,r}(U([0,1]^d))$$

of the uniform distribution on the unit cube of $\mathbb{R}^d$ and the density of the absolutely continuous part of $P$ with respect to Lebesgue measure. Quantization coefficients provide interesting parameters for probability distributions. They can be evaluated for univariate distributions and some of them also for multivariate distributions. Fundamental work is due to Fejes Tóth (1959) and Zador (1963).

Next we define asymptotically optimal sequences of quantizers and sets of centers for nonsingular probability distributions $P$ and investigate their properties. It is proved that the empirical measures corresponding to asymptotically optimal sets of centers converge weakly to a probability on $\mathbb{R}^d$ which is explicitly given using $P$. Furthermore, the asymptotic performance of certain classes of quantizers is compared to that of (asymptotically) optimal quantizers. In particular, we consider regular quantizers which are based on space-filling figures in $\mathbb{R}^d$, lattice quantizers, product quantizers, and random quantizers. The results provide bounds for the quantization coefficients $Q_r([0,1]^d)$.

All these considerations concern the case $1 \le r < +\infty$ and arbitrary norms on $\mathbb{R}^d$. The rest of the chapter is devoted to the study of similar results for a geometric covering problem which corresponds to the case $r = \infty$. Here the quantization error of a probability $P$ (with compact support) and $n \in \mathbb{N}$ is defined to be

$$e_{n,\infty}(P) = \inf\{\rho_\infty(P, Q) : |\operatorname{supp}(Q)| \le n\} = \inf\{\operatorname{ess\,sup} \|X - f(X)\| : |f(\mathbb{R}^d)| \le n\} = \inf_{\alpha \subset \mathbb{R}^d, \ |\alpha| \le n} \ \sup_{x \in \operatorname{supp}(P)} \ \min_{a \in \alpha} \|x - a\|,$$

where $\rho_\infty$ denotes the $L_\infty$-minimal metric, and coincides with the covering radius of the most economical covering of $\operatorname{supp}(P)$ by at most $n$ balls of equal radius, that is

$$e_{n,\infty}(P) = \inf_{\alpha \subset \mathbb{R}^d, \ |\alpha| \le n} \min\left\{ s \ge 0 : \bigcup_{a \in \alpha} B(a, s) \supset \operatorname{supp}(P) \right\}.$$
The limit of the sequence $(n^{1/d} e_{n,\infty}(P))_{n \ge 1}$ exists in $(0, \infty)$ provided $\operatorname{supp}(P)$ is compact Jordan measurable with positive ($d$-dimensional) volume. The limit

$$Q_\infty(\operatorname{supp}(P)) = \lim_{n \to \infty} n^{1/d} e_{n,\infty}(P)$$

is called covering coefficient or quantization coefficient of order $\infty$ and can be expressed in terms of the covering coefficient $Q_\infty([0,1]^d)$ of the unit cube and the volume of $\operatorname{supp}(P)$. The results for $r = \infty$ cast new light on the quantization problem for $r < \infty$.
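As an illustration of the covering formulation, the sketch below evaluates the quantity $\sup_x \min_a \|x - a\|$ for a given set of centers, with the support replaced by a finite grid. The helper name `covering_radius` and the grid discretization are assumptions made here. For the unit interval, $n$ balls of radius $s$ cover length at most $2ns$, so $s \ge 1/(2n)$, and the $n$ midpoints $(2i-1)/(2n)$ attain this value.

```python
import numpy as np

def covering_radius(support_points, centers, p=2.0):
    """Largest distance from a support point to its nearest center.

    support_points: array of shape (m, d), a finite stand-in for supp(P)
    centers:        array of shape (n, d)
    p:              the l_p-norm used for ||.||
    """
    diffs = support_points[:, None, :] - centers[None, :, :]
    dists = np.linalg.norm(diffs, ord=p, axis=2)
    return float(dists.min(axis=1).max())

n = 5
grid = np.linspace(0.0, 1.0, 1001)[:, None]                  # discretized [0, 1]
midpoints = ((2 * np.arange(1, n + 1) - 1) / (2 * n))[:, None]
radius = covering_radius(grid, midpoints)
scaled = n * radius   # n^{1/d} times the covering radius, with d = 1
```

Because the grid contains the endpoint 0, which lies at distance $1/(2n)$ from the first midpoint, the grid value agrees with the exact covering radius in this example.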
Chapter III deals with the asymptotic behaviour of the quantization error for probabilities $P$ on $\mathbb{R}^d$ which are singular with respect to Lebesgue measure. Following Zador (1982) we introduce the concept of quantization dimension of order $r$. For $r \in [1, +\infty]$ define

$$e_{n,r}(P) = \begin{cases} V_{n,r}(P)^{1/r} & \text{if } 1 \le r < \infty, \\ e_{n,\infty}(P) & \text{if } r = \infty, \end{cases}$$

and the quantization dimension of order $r$

$$D_r(P) = \lim_{n \to \infty} \frac{\log n}{|\log e_{n,r}(P)|}$$

if this limit exists. We compare this concept of dimension to several concepts of dimension which are used in fractal geometry or information theory, like Hausdorff dimension, box dimension, and rate distortion dimension.

Then we consider the class of regular probabilities of dimension $D$ on $\mathbb{R}^d$, where $D$ is a non-negative real number. A probability $P$ on $\mathbb{R}^d$ is regular of dimension $D$ if $P$ has compact support and there is a constant $c > 0$ so that

$$\frac{1}{c} s^D \le P(B(x, s)) \le c s^D$$

for all balls $B(x, s)$ whose center lies in the support of $P$ and whose radius $s$ is smaller than a certain value $s_0$. Examples of this type of measures are the normalized Lebesgue measure on a convex compact set, the normalized surface measure on a convex compact set or a smooth compact manifold, and the normalized Hausdorff measure on certain self-similar sets. For each regular probability $P$ of dimension $D$ the quantization dimension of order $r$, $r \in [1, +\infty]$, is proved to be $D$, and moreover

$$0 < \liminf_{n \to \infty} n e_{n,r}(P)^D \le \limsup_{n \to \infty} n e_{n,r}(P)^D < +\infty.$$
Here $e_{n,r}(P)$ is defined using an arbitrary norm on $\mathbb{R}^d$. For the $l_2$-norm on $\mathbb{R}^d$ and $P$ the normalized one-dimensional Hausdorff measure on a rectifiable curve in $\mathbb{R}^d$ of length $L$ we show that the quantization dimension of order $r$, $r \in [1, +\infty]$, is one and that

$$\lim_{n \to \infty} n e_{n,r}(P) = \begin{cases} Q_r([0,1])^{1/r} L & \text{if } 1 \le r < +\infty, \\ Q_\infty([0,1]) L & \text{if } r = +\infty, \end{cases}$$

where $Q_r([0,1]) = \frac{1}{(1+r) 2^r}$ and $Q_\infty([0,1]) = \frac{1}{2}$.

Finally we deal with self-similar probabilities on $\mathbb{R}^d$. A probability $P$ on $\mathbb{R}^d$ is self-similar if there are an $N \in \mathbb{N}$, $N \ge 2$, contracting similarity transformations $S_1, \ldots, S_N$ of $\mathbb{R}^d$, and a probability vector $(p_1, \ldots, p_N)$ with

$$P = \sum_{i=1}^{N} p_i \, P \circ S_i^{-1}.$$
$P$ satisfies the strong separation property if the $S_1, \ldots, S_N$ above can be chosen to satisfy $S_i(\operatorname{supp}(P)) \cap S_j(\operatorname{supp}(P)) = \emptyset$ for $i \ne j$. If $(s_1, \ldots, s_N)$ are the contraction numbers corresponding to $(S_1, \ldots, S_N)$ then the similarity dimension is the unique $D \in [0, +\infty)$ with

$$s_1^D + \cdots + s_N^D = 1.$$

If the probability vector $(p_1, \ldots, p_N)$ equals $(s_1^D, \ldots, s_N^D)$ then the corresponding self-similar probability $P$ equals the normalized $D$-dimensional Hausdorff measure on the support of $P$. If, in addition, $P$ satisfies the strong separation condition, then the quantization dimension $D_r(P)$ of order $r$ equals $D$ for all $r \in [1, +\infty]$ and, moreover,

$$0 < \liminf_{n \to \infty} n e_{n,r}(P)^D \le \limsup_{n \to \infty} n e_{n,r}(P)^D < +\infty.$$
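The similarity dimension defined by $s_1^D + \cdots + s_N^D = 1$ rarely has a closed form when the contraction numbers differ, but it is easy to compute numerically because the left-hand side is strictly decreasing in $D$. The following is a minimal sketch using bisection; the function name `similarity_dimension` and the tolerance are assumptions for illustration.

```python
import math

def similarity_dimension(ratios, tol=1e-12):
    """Solve sum(s_i ** D) = 1 for D >= 0 by bisection (hypothetical helper).

    ratios: contraction numbers s_1, ..., s_N with N >= 2 and 0 < s_i < 1
    """
    if len(ratios) < 2 or any(not (0.0 < s < 1.0) for s in ratios):
        raise ValueError("need at least two contraction numbers in (0, 1)")

    def excess(D):
        # strictly decreasing in D, equal to N - 1 > 0 at D = 0
        return sum(s ** D for s in ratios) - 1.0

    lo, hi = 0.0, 1.0
    while excess(hi) > 0.0:      # enlarge the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Two maps with ratio 1/3 each, as for the classical Cantor set.
D_cantor = similarity_dimension([1 / 3, 1 / 3])
# The probability vector (s_1^D, ..., s_N^D) singled out in the text.
weights = [s ** D_cantor for s in (1 / 3, 1 / 3)]
```

For equal contraction numbers $s$ the equation reduces to $N s^D = 1$, that is $D = \log N / \log(1/s)$, which gives a direct check on the numerical value.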
If $(p_1, \ldots, p_N) \ne (s_1^D, \ldots, s_N^D)$ and the strong separation condition holds then the quantization dimension $D_r(P)$ of the corresponding $P$ satisfies

$$\sum_{i=1}^{N} (p_i s_i^r)^{D_r(P)/(r + D_r(P))} = 1$$

and $D_r(P) < D_t(P)$ if $r < t$. Still, for every $r \in [1, +\infty]$,

$$0 < \liminf_{n \to \infty} n e_{n,r}(P)^{D_r(P)} \le \limsup_{n \to \infty} n e_{n,r}(P)^{D_r(P)} < +\infty.$$

Thus, self-similar probabilities constitute a class of probabilities for which the quantization dimensions of different orders do not all agree. It remains an open problem for which probabilities $\lim_{n \to \infty} n e_{n,r}(P)^{D_r(P)}$ exists, but it can be shown that for the classical Cantor distribution this limit does not exist if $r = 2$.

In the present book we do not intend to give a complete overview of the large subject of quantization. We will focus on the quantization problem as stated earlier, the so-called fixed rate quantization problem, and develop the underlying theory in a mathematically rigorous way. For a comprehensive recent survey of the theory of quantization including its historical development we refer the reader to the article of Gray and Neuhoff (1998). This article also contains an extensive list of papers published in electrical engineering journals on the subject.

Acknowledgement. Helpful comments by W. Quebbemann and F. Fehringer at an early stage of the project are gratefully acknowledged. Thanks are due to H. Strasser and M. Scheutzow for having invited the authors to give some lectures on quantization at the Wirtschaftsuniversität Wien in May 1997 and the Technische Universität Berlin in August 1997. We further wish to thank S. Stark for his help in preparing the figures and doing the numerical computations.
Chapter I
General properties of the quantization for probability distributions

In this chapter we introduce the quantization problem for probability distributions on $\mathbb{R}^d$ with norm-based distortion measure and derive the basic features of optimal quantizers. The investigation of optimal quantizers requires the concepts of Voronoi diagrams and Voronoi partitions and the concepts of centers and moments of probability distributions on $\mathbb{R}^d$. This chapter also serves to develop the properties of these notions as needed for the quantization problem.
1 Voronoi partitions

Voronoi partitions of $\mathbb{R}^d$ will play a central role as optimal quantizing partitions for probability distributions on $\mathbb{R}^d$. In this section we introduce Voronoi regions, Voronoi diagrams and Voronoi partitions with respect to discrete point sets and describe some of their basic properties.

1.1 General norms
Consider a nonempty subset $\alpha$ of $\mathbb{R}^d$. Throughout $\alpha$ is assumed to be locally finite in the sense that the number of points of $\alpha$ within any bounded subset of $\mathbb{R}^d$ is finite. This implies that $\alpha$ is countable and closed. The quantization problem is associated with finite point sets. However, in Chapter II we deal with lattices which are infinite sets of regularly placed points. Let $\|\cdot\|$ denote any norm on $\mathbb{R}^d$. The Voronoi region generated by $a \in \alpha$ is defined by

$$W(a|\alpha) = \left\{ x \in \mathbb{R}^d : \|x - a\| = \min_{b \in \alpha} \|x - b\| \right\} \tag{1.1}$$

and $\{W(a|\alpha) : a \in \alpha\}$ is called the Voronoi diagram of $\alpha$; see Figure 1.1. Thus $W(a|\alpha)$ consists of all points $x$ such that $a$ is a nearest point to $x$ in $\alpha$. The dependence of the Voronoi regions (and of several other objects occurring later) on the norm is not explicitly indicated. Most common norms are $l_p$-norms given by $\|x\| = \left( \sum_{i=1}^{d} |x_i|^p \right)^{1/p}$ for $1 \le p < \infty$ and $\|x\| = \max_{1 \le i \le d} |x_i|$ for $p = \infty$.
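Definition (1.1) translates directly into a membership test for a finite set of centers. The sketch below is an illustration only: the function name `voronoi_generators`, the NumPy representation of $\alpha$ as an array, and the floating-point tolerance are assumptions. It returns the indices of all centers $a$ with $x \in W(a|\alpha)$, so a point on a common boundary is reported for several generators.

```python
import numpy as np

def voronoi_generators(x, centers, p=2.0, tol=1e-12):
    """Indices of all centers a with x in W(a | centers) for the l_p-norm.

    x:       point of shape (d,)
    centers: array of shape (n, d), a finite set alpha
    p:       exponent of the l_p-norm, 1 <= p <= np.inf
    tol:     tolerance used to decide ties in floating point
    """
    dists = np.linalg.norm(centers - x, ord=p, axis=1)
    return np.flatnonzero(dists <= dists.min() + tol)

alpha = np.array([[1.0, 0.0], [0.0, 1.0]])
x = np.array([-1.0, -2.0])
regions_l1 = voronoi_generators(x, alpha, p=1)   # both centers are nearest
regions_l2 = voronoi_generators(x, alpha, p=2)   # only the first center
```

The same point can thus belong to different Voronoi regions depending on the norm, which is the dependence the text leaves implicit in its notation.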
Figure 1.1: Voronoi diagram of a finite set in $\mathbb{R}^2$ with respect to the $l_p$-norm for (a) $p = 2$ and (b) $p = 1$

A family $\mathcal{A}$ of subsets of $\mathbb{R}^d$ is called locally finite if the number of sets in $\mathcal{A}$ intersecting any bounded subset of $\mathbb{R}^d$ is finite. If $x \in \mathbb{R}^d$ and $A$ is a nonempty subset of $\mathbb{R}^d$, the distance from $x$ to $A$ is

$$d(x, A) = \inf_{a \in A} \|x - a\|.$$
The closed ball with center $a \in \mathbb{R}^d$ and radius $r \ge 0$ is denoted by

$$B(a, r) = \{x \in \mathbb{R}^d : \|x - a\| \le r\}.$$

1.1 Proposition
The Voronoi diagram $\{W(a|\alpha) : a \in \alpha\}$ is a locally finite covering of $\mathbb{R}^d$.

Proof
Let $x \in \mathbb{R}^d$. Since locally finite subsets of $\mathbb{R}^d$ are closed, there exists $a \in \alpha$ such that $\|x - a\| = d(x, \alpha)$ and thus $x \in W(a|\alpha)$. This proves that the Voronoi diagram is a covering of $\mathbb{R}^d$, that is

$$\bigcup_{a \in \alpha} W(a|\alpha) = \mathbb{R}^d.$$

Moreover, let $\gamma = \{a \in \alpha : W(a|\alpha) \cap B(0, s) \ne \emptyset\}$ with $s > \min_{a \in \alpha} \|a\|$. Choose $b \in \alpha \cap B(0, s)$. If $a \in \gamma$, then there exists $x \in B(0, s)$ such that $\|x - a\| \le \|x - b\|$, implying that

$$\|a\| \le \|x - b\| + \|x\| \le 2\|x\| + \|b\| \le 3s.$$

This gives $\gamma \subset B(0, 3s)$ and hence, $\gamma$ is finite. Thus the Voronoi diagram is locally finite. □

The open Voronoi region generated by $a \in \alpha$ is defined by

$$W_0(a|\alpha) = \left\{ x \in \mathbb{R}^d : \|x - a\| < \min_{b \in \alpha \setminus \{a\}} \|x - b\| \right\}. \tag{1.2}$$
These regions are pairwise disjoint but do not provide a covering of $\mathbb{R}^d$. A Borel measurable partition $\{A_a : a \in \alpha\}$ of $\mathbb{R}^d$ is called Voronoi partition of $\mathbb{R}^d$ with respect to $\alpha$ (and $P$) if

$$A_a \subset W(a|\alpha) \quad (P\text{-a.s.}) \text{ for every } a \in \alpha, \tag{1.3}$$

where $P$ denotes a Borel probability measure on $\mathbb{R}^d$. Proposition 1.1 shows that Voronoi partitions of $\mathbb{R}^d$ with respect to $\alpha$ do exist and are locally finite. Furthermore, the elements $A_a$ of a Voronoi partition satisfy

$$W_0(a|\alpha) \subset A_a$$

for every $a \in \alpha$. The Voronoi regions are closed and star-shaped relative to their generator point, that is, the line segment joining any $x \in W(a|\alpha)$ and the point $a$ is contained in $W(a|\alpha)$. In case $d = 1$, where the underlying norm is throughout the absolute value, Voronoi regions are closed intervals. For $a, b \in \mathbb{R}^d$, let

$$H(a, b) = \{x \in \mathbb{R}^d : \|x - a\| \le \|x - b\|\} \tag{1.4}$$

be the Leibnitz "halfspace". Then $H(a, b) = W(a|\{a, b\})$ and

$$W(a|\alpha) = \bigcap_{b \in \alpha} H(a, b). \tag{1.5}$$
1.2 Proposition
(a) $W(a|\alpha)$ is closed and star-shaped relative to $a$.

(b) $\operatorname{int} W(a|\alpha) = \bigcap_{b \in \alpha} \operatorname{int} H(a, b)$ and $W_0(a|\alpha)$ is an open subset of $\operatorname{int} W(a|\alpha)$ which is star-shaped relative to $a$. In particular, $a \in \operatorname{int} W(a|\alpha)$.

(c) $\partial W(a|\alpha) = \bigcup_{b \in \alpha} \partial H(a, b) \cap W(a|\alpha)$.
Proof
(a) Let $x \in W(a|\alpha)$ and $0 < s < 1$. The point $y = sx + (1 - s)a$ on the line segment joining $a$ and $x$ satisfies $\|x - y\| + \|y - a\| = \|x - a\|$. Since $\|x - a\| \le \|x - b\| \le \|x - y\| + \|y - b\|$ for every $b \in \alpha$, we obtain $y \in W(a|\alpha)$. This shows that $W(a|\alpha)$ is star-shaped relative to $a$. The Voronoi region is obviously closed.

(b) The region $W_0(a|\alpha) = \{x \in \mathbb{R}^d : \|x - a\| < d(x, \alpha \setminus \{a\})\}$ is open because the distance function $d(\cdot, A)$ is continuous. As in the proof of (a) one shows that $W_0(a|\alpha)$ is star-shaped relative to the point $a$. Furthermore, we have

$$\operatorname{int} W(a|\alpha) \subset \bigcap_{b \in \alpha} \operatorname{int} H(a, b) \subset W(a|\alpha).$$

It remains to show that $\bigcap_{b \in \alpha} \operatorname{int} H(a, b)$ is open. This is clearly true if $\alpha$ is finite. So assume $\alpha$ is not finite. Let $x \in \bigcap_{b \in \alpha} \operatorname{int} H(a, b)$ and set $\gamma = \{b \in \alpha : \|x - a\| = \|x - b\|\}$. Since $\gamma \subset \alpha \cap B(x, \|x - a\|)$, $\gamma$ is finite by the local finiteness of $\alpha$. Therefore,

$$\bigcap_{b \in \gamma} \operatorname{int} H(a, b) \cap W_0(a|(\alpha \setminus \gamma) \cup \{a\})$$

is an open subset of $\bigcap_{b \in \alpha} \operatorname{int} H(a, b)$ containing $x$.

(c) follows immediately from (b). In fact, we have

$$\partial W(a|\alpha) = W(a|\alpha) \cap (\operatorname{int} W(a|\alpha))^c = \bigcup_{b \in \alpha} (\operatorname{int} H(a, b))^c \cap W(a|\alpha) = \bigcup_{b \in \alpha} \partial H(a, b) \cap W(a|\alpha).$$
□

By the preceding proposition, one can find Voronoi partitions of $\mathbb{R}^d$ with respect to $\alpha$ consisting of Borel sets $A_a$ which are star-shaped relative to $a \in \alpha$. In fact, let $\alpha = \{a_1, a_2, \ldots\}$ be an enumeration of $\alpha$ and set, for instance,

$$A_{a_1} = W(a_1|\alpha), \qquad A_{a_k} = W(a_k|\alpha) \setminus \bigcup_{j < k} W(a_j|\alpha) = W(a_k|\alpha) \cap W_0(a_k|\{a_1, \ldots, a_k\}), \quad k \ge 2.$$
Here ties are broken in favour of smaller indices. Some difficulties arise from the fact that the intersection of different Voronoi regions may have interior points. This corresponds to the fact that the separator of two points may have interior points; see the subsequent Example 1.4. However, if the underlying norm is strictly convex this cannot happen. For $a, b \in \mathbb{R}^d$, $a \ne b$, the separator is defined by

$$S(a, b) = \{x \in \mathbb{R}^d : \|x - a\| = \|x - b\|\}. \tag{1.6}$$

The separator contains the midpoint $(a + b)/2$ but no other point from the line through $a$ and $b$. The norm $\|\cdot\|$ is said to be strictly convex if $\|x\| = \|y\| = 1$, $x \ne y$ implies $\|sx + (1 - s)y\| < 1$ for every $s \in (0, 1)$. The $l_p$-norms are strictly convex for $1 < p < \infty$, while the $l_1$-norm and the $l_\infty$-norm are not strictly convex.

1.3 Proposition (Strictly convex norms)
Suppose the underlying norm is strictly convex.

(a) $\operatorname{int} W(a|\alpha) = W_0(a|\alpha)$.

(b) $\partial W(a|\alpha) = \bigcup_{b \in \alpha, \ b \ne a} S(a, b) \cap W(a|\alpha)$.

(c) $W(a|\alpha) = \operatorname{cl} W_0(a|\alpha)$.

Proof
The proof is based on the following observation. For $b \in \alpha \setminus \{a\}$, we have

$$sx + (1 - s)a \in \{y \in \mathbb{R}^d : \|y - a\| < \|y - b\|\} \tag{1.7}$$

for every $x \in H(a, b)$, $0 < s < 1$. To verify this, let $y = sx + (1 - s)a$ with $0 < s < 1$. Notice that $y \in H(a, b)$ by Proposition 1.2 (a) and $\frac{1}{s}(y - a) = x - a$. Assume $\|y - a\| = \|y - b\|$. From the strict convexity of the norm it follows that

$$\|y - a - s(b - a)\| = \|(1 - s)(y - a) + s(y - b)\| < \|y - a\|.$$

This gives

$$\|x - b\| = \left\| \tfrac{1}{s}(y - a) - (b - a) \right\| = \tfrac{1}{s}\|y - a - s(b - a)\| < \tfrac{1}{s}\|y - a\| = \|x - a\| \le \|x - b\|,$$

a contradiction.
(a) In view of Proposition 1.2 (b) we have to show for $b \in \alpha \setminus \{a\}$

$$\operatorname{int} H(a, b) \subset \{x \in \mathbb{R}^d : \|x - a\| < \|x - b\|\}.$$

Let $x \in \operatorname{int} H(a, b)$ with $x \ne a$ and choose $\varepsilon > 0$ such that $B(x, \varepsilon) \subset H(a, b)$. Let $t = 1 + \varepsilon/\|x - a\|$ and $z = a + t(x - a)$. Then

$$\|z - x\| = \|(t - 1)x - (t - 1)a\| = \varepsilon,$$

implying $z \in H(a, b)$. Since $x = \frac{1}{t} z + \left(1 - \frac{1}{t}\right) a$ and $\frac{1}{t} < 1$, it follows from (1.7) that $\|x - a\| < \|x - b\|$.

(b) follows immediately from (a) and Proposition 1.2 (c).

(c) follows from (1.7). □

The preceding proposition implies

$$\begin{aligned} \operatorname{int} W(a|\alpha) \cap \operatorname{int} W(b|\alpha) &= \emptyset, && a, b \in \alpha, \ a \ne b, \\ W(a|\alpha) \cap W(b|\alpha) &= \partial W(a|\alpha) \cap \partial W(b|\alpha), && a, b \in \alpha, \ a \ne b, \\ \partial W(a|\alpha) &= \bigcup_{b \in \alpha, \ b \ne a} W(b|\alpha) \cap W(a|\alpha), && a \in \alpha, \end{aligned} \tag{1.8}$$

provided the underlying norm is strictly convex. The following example shows that all assertions of Proposition 1.3 (and of (1.8)) can fail if $\|\cdot\|$ is an arbitrary norm.
Figure 1.2: Voronoi region and separator with respect to the $l_1$-norm
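1.4 Example
Let the underlying norm on $\mathbb{R}^2$ be the $l_1$-norm. For $a = (1, 0)$ and $b = (0, 1)$, we obtain

$$H(a, b) = \{x \in \mathbb{R}^2 : x_2 \le 0\} \cup \{x \in \mathbb{R}^2 : x_1 \ge 1\} \cup \{x \in \mathbb{R}^2 : x_2 \le x_1\}$$

and

$$S(a, b) = (-\infty, 0]^2 \cup \{s(1, 1) : 0 < s < 1\} \cup [1, \infty)^2.$$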
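The description of the separator in this example can be checked numerically. The sketch below is an illustration, not part of the book: the helper names `on_separator` and `in_formula`, the random test points, and the floating-point tolerance are assumptions. It compares the defining condition $\|x - a\|_1 = \|x - b\|_1$ with membership in the three pieces of the formula for $S(a, b)$.

```python
import numpy as np

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

def l1(v):
    """l_1-norm of a vector."""
    return float(np.abs(v).sum())

def on_separator(x, tol=1e-12):
    """Defining condition of S(a, b): equal l_1-distance to a and b."""
    return abs(l1(x - a) - l1(x - b)) <= tol

def in_formula(x, tol=1e-12):
    """Membership in (-inf, 0]^2, the open diagonal segment, or [1, inf)^2."""
    x1, x2 = float(x[0]), float(x[1])
    lower_quadrant = x1 <= 0.0 and x2 <= 0.0
    upper_quadrant = x1 >= 1.0 and x2 >= 1.0
    diagonal = abs(x1 - x2) <= tol and 0.0 < x1 < 1.0
    return lower_quadrant or upper_quadrant or diagonal

rng = np.random.default_rng(0)
points = rng.uniform(-3.0, 3.0, size=(2000, 2))
agree = all(on_separator(x) == in_formula(x) for x in points)
share_on_separator = float(np.mean([on_separator(x) for x in points]))
```

A positive share of uniformly drawn points lands on the separator, which reflects that $S(a, b)$ has interior points for the $l_1$-norm.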
Thus $H(a, b)$ is the union of three halfspaces and the separator is the disjoint union of two quarterspaces and a line segment; see Figure 1.2. Clearly, all assertions of Proposition 1.3 fail for $\alpha = \{a, b\}$.

Under various conditions, Voronoi regions are geometrically regular. Let $|A|$ denote the cardinality of a set $A$ and let $\lambda^d$ denote the $d$-dimensional Lebesgue measure.

1.5 Theorem (Boundary theorem)
Each of the following conditions implies $\lambda^d(\partial W(a|\alpha)) = 0$, $a \in \alpha$.

(i) The underlying norm is strictly convex.
(ii) The underlying norm is the $l_p$-norm with $1 \le p \le \infty$.
(iii) $d = 2$.

Proof
According to Proposition 1.2 (c) it is sufficient to show that $\lambda^d(\partial H(a, b)) = 0$ for $a \ne b$. Since $H(a, b) = H(a - b, 0) + b$, we may assume without loss of generality that $b = 0$. Set

$$A = H(a, 0)^c = \{x \in \mathbb{R}^d : \|x\| < \|x - a\|\}.$$

Then $\partial A = \partial H(a, 0)$ and by Proposition 1.2 (b), $A$ is open and star-shaped relative to $0 \in A$. For $x \in \mathbb{R}^d$, let

$$I(x) = \{t > 0 : tx \in \partial A\}.$$

Since $A$ and $\operatorname{cl}(A)$ are star-shaped, $I(x)$ is a closed interval (possibly empty, possibly degenerate). Moreover, since

$$\lambda^d(\partial A) = \int_S \int_{\mathbb{R}_+} r^{d-1} 1_{\partial A}(rx) \, dr \, d\sigma(x) = \int_S \int_{I(x)} r^{d-1} \, dr \, d\sigma(x),$$

where $S = \{x \in \mathbb{R}^d : \|x\|_{l_2} = 1\}$ denotes the unit $l_2$-sphere in $\mathbb{R}^d$ and $\sigma$ its surface measure, we find that $\lambda^d(\partial A) = 0$ if and only if

$$|I(x)| \le 1 \tag{1.9}$$

for $\sigma$-almost all $x \in S$.

Now assume (i). By (1.7), $sy \in A$ for every $y \in \partial A$, $0 < s < 1$. Therefore condition (1.9) is satisfied for every $x \in \mathbb{R}^d$.
Assume (ii). Let $p = 1$ or $p = \infty$. Then $H(a,0)$ is a finite union of polyhedral sets implying $\lambda^d(\partial H(a,0)) = 0$. If $1 < p < \infty$, then the $l_p$-norm is strictly convex.
Assume (iii). We show that (1.9) holds for every $x$. Assume on the contrary that there exists a point $x$ and $t > 1$ such that $\{x, tx\} \subset \partial A$. Since $\partial A \subset S(a,0)$, the convex function $g : \mathbb{R} \to \mathbb{R}$ given by $g(s) = \|x - sa\|$ satisfies $g(0) = g(1/t) = g(1) = \|x\|$. Hence we get $g(s) = \|x\|$ for every $s \in [0,1]$. This yields $\{sx : s \ge 1\} \subset S(a,0)$ and thus
(1.10) $\{sx : s \ge r\} \subset S(ra, 0)$ for every $r \ge 0$.
By assumption there exists a sequence $(y_n)_{n \ge 1}$ in $A$ such that $\lim_{n \to \infty} y_n = tx$. Note that $x$ and $a$ are linearly independent because the separator $S(a,0)$ contains $a/2$ but no other point from the line $\{sa : s \in \mathbb{R}\}$. So, using $d = 2$, we have
$y_n = s_n a + t_n x, \quad s_n, t_n \in \mathbb{R}.$
Then $s_n \to 0$ and $t_n \to t > 1$. Choose $n_0 \in \mathbb{N}$ such that
$|s_n| < 1, \quad t_n \ge -s_n, \quad \frac{t_n}{1 - s_n} \ge 1, \quad 0 < \frac{1}{s_n + t_n} < 1$
for every $n \ge n_0$.
We claim that $s_n < 0$ for $n \ge n_0$. We have
$a + \frac{1}{1 - s_n}(y_n - a) = \left(1 - \frac{1}{1 - s_n}\right) a + \frac{s_n}{1 - s_n}\, a + \frac{t_n}{1 - s_n}\, x = \frac{t_n}{1 - s_n}\, x.$
Therefore, it follows from (1.10) that
$a + \frac{1}{1 - s_n}(y_n - a) \in S(a, 0)$
and hence
$\frac{1}{1 - s_n}(y_n - a) \in S(-a, 0).$
Thus the convex function $h_n : \mathbb{R} \to \mathbb{R}$ given by $h_n(s) = \|y_n - a + sa\|$ satisfies
$h_n(0) = h_n(1 - s_n) = \|y_n - a\|.$
This implies
$h_n(s) \le \|y_n - a\|, \quad 0 \le s \le 1 - s_n,$
$h_n(s) \ge \|y_n - a\|, \quad s \ge 1 - s_n.$
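Since $h_n(1) = \|y_n\| < \|y_n - a\|$, one gets $1 < 1 - s_n$, $n \ge n_0$, and our claim is proved. Since $t_n > -s_n > 0$, it follows from (1.10) that
$\frac{t_n}{s_n + t_n}\, x \in S\!\left(-\frac{s_n}{s_n + t_n}\, a,\, 0\right), \quad n \ge n_0.$
Hence
$\left\|\frac{1}{s_n + t_n}\, y_n\right\| = \left\|\frac{t_n}{s_n + t_n}\, x + \frac{s_n}{s_n + t_n}\, a\right\| = \frac{t_n}{s_n + t_n}\, \|x\|, \quad n \ge n_0.$
Moreover, again by (1.10) we have
$\frac{t_n}{s_n + t_n}\, x \in S\!\left(\frac{t_n}{s_n + t_n}\, a,\, 0\right)$
and therefore
$\left\|\frac{1}{s_n + t_n}\, y_n - a\right\| = \left\|\frac{t_n}{s_n + t_n}\, x - \frac{t_n}{s_n + t_n}\, a\right\| = \frac{t_n}{s_n + t_n}\, \|x\|, \quad n \ge n_0.$
This yields
$\frac{1}{s_n + t_n}\, y_n \in S(a, 0), \quad n \ge n_0,$
a contradiction in view of $y_n \in A$ and $0 < 1/(s_n + t_n) < 1$. □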
For a Borel subset $C$ of $\mathbb{R}^d$ and a Borel measure $\mu$ on $\mathbb{R}^d$, a $\mu$-tesselation of $C$ is a countable covering $\{C_n : n \in \mathbb{N}\}$ of $C$ by Borel subsets $C_n \subset C$ such that $\mu(C_n \cap C_m) = 0$ for $n \neq m$. A $\lambda^d$-tesselation is simply called tesselation. In view of Propositions 1.1 and 1.2 the Voronoi diagram of $\alpha$ is a $\mu$-tesselation of $\mathbb{R}^d$ if and only if
$\mu\big(\mathbb{R}^d \setminus \bigcup_{a \in \alpha} W_0(a|\alpha)\big) = 0;$
it is a tesselation of $\mathbb{R}^d$ if and only if
$\operatorname{int} W(a|\alpha) \cap \operatorname{int} W(b|\alpha) = \emptyset$ for every $a, b \in \alpha$, $a \neq b$,
$\lambda^d(\partial W(a|\alpha)) = 0$ for every $a \in \alpha$.
We know from Example 1.4 that the Voronoi diagram of $\alpha$ does not, in general, provide a tesselation of the space $\mathbb{R}^d$. (Notice that the Voronoi diagrams in Figure 1.1 provide tesselations.) According to (1.8) and Theorem 1.5, the Voronoi diagram of $\alpha$ with respect to a strictly convex norm is a tesselation of $\mathbb{R}^d$.
Two further properties of Voronoi regions concerning neighbouring regions and equivariance under similarity transformations are of interest. A bijective mapping $T : \mathbb{R}^d \to \mathbb{R}^d$ is called a similarity transformation if there exists $c \in (0, \infty)$, the scaling number, such that $\|Tx - Ty\| = c\|x - y\|$ for every $x, y \in \mathbb{R}^d$. Let $T(\alpha) = \{Ta : a \in \alpha\}$; $T(\alpha)$ is locally finite.
1.6 Lemma
Let $T : \mathbb{R}^d \to \mathbb{R}^d$ be a similarity transformation. Then $W(Ta|T(\alpha)) = T\,W(a|\alpha)$.
Proof Obvious. □
Voronoi regions are determined by their neighbouring regions in the following sense.
1.7 Lemma
For $a \in \alpha$, let
$\beta = \{b \in \alpha : W(b|\alpha) \cap W(a|\alpha) \neq \emptyset\}.$
Then $W(a|\alpha) = W(a|\beta)$.
Proof Clearly we have $W(a|\alpha) \subset W(a|\beta)$. To prove the converse inclusion, let $x \in \mathbb{R}^d \setminus W(a|\alpha)$ and consider the line segment $\{y_s : s \in [0,1]\}$ joining $a$ and $x$, where $y_s = sx + (1-s)a$. Since $W(a|\alpha)$ is closed, $a \in \operatorname{int} W(a|\alpha)$, and $W(a|\alpha)$ is star-shaped relative to $a$, we obtain
$W(a|\alpha) \cap \{y_s : s \in [0,1]\} = \{y_s : s \in [0, s_0]\}$
for some $0 < s_0 < 1$. By the local finiteness of the Voronoi diagram,
$\gamma = \{c \in \alpha : W(c|\alpha) \cap \{y_s : s \in (s_0, 1]\} \neq \emptyset\}$
is finite. Since $\bigcup_{c \in \gamma} W(c|\alpha)$ is closed, one gets
$\{y_s : s \in [s_0, 1]\} \subset \bigcup_{c \in \gamma} W(c|\alpha).$
Choose $b \in \gamma$ with $y_{s_0} \in W(b|\alpha)$. Then $b \in \beta$ and $y_t \in W(b|\alpha)$ for some $t \in (s_0, 1]$. Since $y_t \notin W(a|\alpha)$, the point $y_t$ satisfies $y_t \notin W(a|\beta)$. This implies $x \notin W(a|\beta)$. □
Notice that bounded Voronoi regions have a finite number of neighbouring regions by the local finiteness of Voronoi diagrams.
1.2 Euclidean norms
Voronoi regions with respect to euclidean norms exhibit some special features. Let $\langle \cdot, \cdot \rangle$ be any scalar product on $\mathbb{R}^d$ and $\|x\| = \langle x, x \rangle^{1/2}$. Then for $a \neq b$, $H(a,b)$ is the closed halfspace
(1.11) $H(a,b) = \{x \in \mathbb{R}^d : \langle a - b,\, x - \tfrac{1}{2}(a + b) \rangle \ge 0\}$
bounded by the separating hyperplane
(1.12) $S(a,b) = \{x \in \mathbb{R}^d : \langle a - b,\, x - \tfrac{1}{2}(a + b) \rangle = 0\}.$
The hyperplane $S(a,b)$ contains the midpoint $(a + b)/2$ and is perpendicular (with respect to $\langle \cdot, \cdot \rangle$) to the line through $a$ and $b$. Thus the Voronoi regions $W(a|\alpha)$ are convex. If $\alpha$ is finite, then $W(a|\alpha)$ is a polyhedral set, that is, a finite intersection of closed halfspaces in $\mathbb{R}^d$. In the sequel a (convex) polytope means a compact polyhedral set. By Lemma 1.7, bounded Voronoi regions are polytopes. The following example shows that, in general, unbounded Voronoi regions are not polyhedral sets.
1.8 Example
Let the norm on $\mathbb{R}^2$ be the $l_2$-norm. Consider the set $\alpha = \{a_n : n \ge 0\}$ with $a_0 = (2, 0)$ and $a_n = (0, n)$ for $n \ge 1$. Then $\alpha$ is locally finite and the points
$\left(\tfrac{1}{4}(n^2 + n) + 1,\; n + \tfrac{1}{2}\right), \quad n \ge 1,$
are extreme points of $W(a_0|\alpha)$; see Figure 1.3. Since polyhedral sets in $\mathbb{R}^d$ have a finite number of extreme points, $W(a_0|\alpha)$ is not polyhedral.
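The vertex formula of Example 1.8 can be checked with exact rational arithmetic. The following is only a minimal sketch: the helper names (`generator`, `vertex`, `is_voronoi_vertex`) and the truncation parameter `n_max` are assumptions made for illustration, and the check covers only finitely many generators.

```python
from fractions import Fraction

def sq_dist(p, q):
    """Squared l2-distance between two points of R^2."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def generator(n):
    """a_0 = (2, 0) and a_n = (0, n) for n >= 1, as in Example 1.8."""
    return (Fraction(2), Fraction(0)) if n == 0 else (Fraction(0), Fraction(n))

def vertex(n):
    """Candidate extreme point ((n^2 + n)/4 + 1, n + 1/2)."""
    return (Fraction(n * n + n, 4) + 1, Fraction(2 * n + 1, 2))

def is_voronoi_vertex(n, n_max=200):
    """True if vertex(n) is equidistant from a_0, a_n, a_{n+1} and no
    generator a_1, ..., a_{n_max} is strictly closer (finite check only)."""
    v = vertex(n)
    d0 = sq_dist(v, generator(0))
    equidistant = d0 == sq_dist(v, generator(n)) == sq_dist(v, generator(n + 1))
    nearest = all(sq_dist(v, generator(m)) >= d0 for m in range(1, n_max + 1))
    return equidistant and nearest
```

Each such point lies on the two separating hyperplanes $S(a_0, a_n)$ and $S(a_0, a_{n+1})$ and belongs to $W(a_0|\alpha)$, which is consistent with it being a corner of the region.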
Figure 1.3: Voronoi region with respect to the $l_2$-norm which is not a polyhedral set
1.9 Remark
As indicated above, the Voronoi regions are convex in the euclidean case. It is an interesting fact that the convexity of Voronoi regions even characterizes euclidean norms. More precisely, if $W(a|\alpha)$ is convex for every finite subset $\alpha$ of $\mathbb{R}^d$ and every $a \in \alpha$, then the underlying norm is euclidean. This is a classical result of Mann (1935). See also Gruber (1974).
For euclidean norms, there is a simple characterization of boundedness of Voronoi regions. Denote by $\operatorname{conv} \alpha$ the convex hull of $\alpha$.
1.10 Proposition
Let $\|x\| = \langle x, x \rangle^{1/2}$ for some scalar product $\langle \cdot, \cdot \rangle$ on $\mathbb{R}^d$ and let $a \in \alpha$. Then $W(a|\alpha)$ is bounded if and only if $a \in \operatorname{int} \operatorname{conv} \alpha$.
Proof For $u \in \mathbb{R}^d$, $u \neq 0$, consider the halfline $L_u = \{a + su : s \ge 0\}$ with initial point $a$. We have $L_u \subset W(a|\alpha)$, that is,
$\|a + su - b\|^2 = \|a - b\|^2 + 2s\langle u, a - b \rangle + s^2\|u\|^2 \ge \|a + su - a\|^2 = s^2\|u\|^2$
for every $b \in \alpha$, $s \ge 0$, if and only if $\langle u, a \rangle \ge \langle u, b \rangle$ for every $b \in \alpha$.
Assume $a \in \partial \operatorname{conv} \alpha$. Then there passes a support hyperplane to $\operatorname{conv} \alpha$ through $a$, that is, $\langle u, a \rangle \ge \langle u, b \rangle$ for every $b \in \alpha$ and some $u \in \mathbb{R}^d$, $u \neq 0$ (cf. Webster, 1994, Theorem 2.4.12). Hence, $L_u \subset W(a|\alpha)$ so that $W(a|\alpha)$ is not bounded.
Now assume $a \in \operatorname{int} \operatorname{conv} \alpha$. Then for every $u \in \mathbb{R}^d$, $u \neq 0$, there exists $b \in \alpha$ such that $\langle u, a \rangle < \langle u, b \rangle$. Therefore, $W(a|\alpha)$ does not contain any halfline with initial point $a$. This implies the boundedness of $W(a|\alpha)$ (cf. Webster, 1994, Theorem 2.5.1). □
Notice that for arbitrary norms the interior point condition for the generator point does not imply the boundedness of the corresponding Voronoi region. This is illustrated by the following example. See also Figure 1.1(b).
1.11 Example
Let the underlying norm on $\mathbb{R}^2$ be the $l_1$-norm. Consider $\alpha = \{(0,0), (0,-1), (2,1), (-2,1)\}$ and let $a = (0,0)$. Then $a \in \operatorname{int} \operatorname{conv} \alpha$, but $W(a|\alpha)$ is unbounded since, for instance, the halfline $\{s(0,1) : s \ge 0\}$ is contained in $W(a|\alpha)$; see Figure 1.4.
Figure 1.4: Unbounded Voronoi region with respect to the $l_1$-norm generated by an interior point of $\operatorname{conv} \alpha$
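The claims of Example 1.11 can be tested numerically on finitely many points of the halfline. This is a sketch under stated assumptions: the function names are hypothetical, the interior-point test uses the triangle spanned by the three generators other than $a$, and a finite sample of the halfline is evidence rather than a proof.

```python
def l1_dist(p, q):
    """l1-distance in R^2."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def l2_sq_dist(p, q):
    """Squared l2-distance in R^2 (used only for the contrast below)."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

ALPHA = [(0, 0), (0, -1), (2, 1), (-2, 1)]
A = (0, 0)

def in_voronoi_region(x, dist, a=A, alpha=ALPHA):
    """x lies in W(a|alpha) iff a is a nearest generator of x."""
    da = dist(x, a)
    return all(da <= dist(x, b) for b in alpha)

def strictly_inside_triangle(p, t0, t1, t2):
    """True if p lies in the open triangle t0 t1 t2 (same-sign cross products)."""
    def cross(o, u, v):
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])
    signs = [cross(t0, t1, p), cross(t1, t2, p), cross(t2, t0, p)]
    return all(c > 0 for c in signs) or all(c < 0 for c in signs)
```

For comparison, with the $l_2$-distance the point $(0, 3)$ is already closer to $(2, 1)$ than to $a$, in line with Proposition 1.10.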
Notes
For detailed treatments of Voronoi diagrams of finite point sets we refer to the notable book by Okabe et al. (1992), the review article by Aurenhammer (1991), and the book by Klein (1989). A discussion of random Voronoi tesselations with respect to the $l_2$-norm may be found in Møller (1994). Theorem 1.5 (iii) on the geometric regularity of Voronoi regions is certainly known but we are not aware of a reference.
1.12 Conjecture
The assertion of Theorem 1.5, that is, $\lambda^d(\partial W(a|\alpha)) = 0$ for every $a \in \alpha$, holds for arbitrary norms and arbitrary dimensions.
2 Centers and moments of probability distributions
We present some facts about centers and moments of probability distributions on $\mathbb{R}^d$ needed in the sequel.
2.1 Uniqueness and characterization of centers
Let $X = (X_1, \ldots, X_d)$ be an $\mathbb{R}^d$-valued random variable with distribution $P$. As before, $\|\cdot\|$ denotes any norm on $\mathbb{R}^d$. Let $1 \le r < \infty$ and assume throughout
$E\|X\|^r < \infty.$
By a center of $P$ of order $r$ we mean a point $a \in \mathbb{R}^d$ such that
(2.1) $E\|X - a\|^r = \inf_{b \in \mathbb{R}^d} E\|X - b\|^r.$
Let $C_r(P)$ denote the set of all centers of $P$ of order $r$. Centers of order 1 are usually called (spatial) medians. The $r$-th (absolute) moment of $P$ about the center is defined by
(2.2) $V_r(P) = \inf_{a \in \mathbb{R}^d} E\|X - a\|^r.$
We also write $V_r(X)$ and $C_r(X)$ instead of $V_r(P)$ and $C_r(P)$. For $A \in \mathcal{B}(\mathbb{R}^d)$ bounded with $\lambda^d(A) > 0$, define the normalized $r$-th moment of $A$ about the center by
(2.3) $M_r(A) = \dfrac{V_r(U(A))}{\lambda^d(A)^{r/d}},$
where $U(A)$ denotes the uniform distribution on $A$. Normalization yields an important scaling invariance property. Recall that bijective isometries $T : \mathbb{R}^d \to \mathbb{R}^d$ with $T(0) = 0$ are linear (cf. Semadeni, 1971, Lemma 7.8.3) and $\lambda^d$ is invariant under bijective isometries on $\mathbb{R}^d$.
2.1 Lemma
Let $T : \mathbb{R}^d \to \mathbb{R}^d$ be a similarity transformation with scaling number $c > 0$.
(a) $C_r(T(X)) = T\,C_r(X)$, $V_r(T(X)) = c^r V_r(X)$.
(b) $M_r(T(A)) = M_r(A)$, if $A \in \mathcal{B}(\mathbb{R}^d)$ is bounded with $\lambda^d(A) > 0$.
Proof (a) is obvious. (b) If $X$ is $U(A)$-distributed, then $T(X)$ is $U(T(A))$-distributed. From (a) it follows that
$M_r(T(A)) = \dfrac{V_r(T(X))}{\lambda^d(T(A))^{r/d}} = \dfrac{c^r V_r(X)}{(c^d \lambda^d(A))^{r/d}} = M_r(A).$ □
Centers of order $r$ are global minima of the function
(2.4) $\psi_r : \mathbb{R}^d \to \mathbb{R}_+, \quad \psi_r(a) = E\|X - a\|^r.$
The function $\psi_r$ is obviously convex and hence continuous. Therefore, local minima of $\psi_r$ are global minima of $\psi_r$.
2.2 Lemma
The level set $\{\psi_r \le c\}$ is convex and compact for every $c \in \mathbb{R}_+$. In particular, $C_r(P)$ is convex, compact and nonempty. Moreover,
$C_r(P) \subset B(0, 2(2E\|X\|^r)^{1/r})$
and
$C_1(P) \subset B(0, 2E\|X\|).$
Proof By convexity and continuity of $\psi_r$, the level sets are convex and closed. Since
$\|a\| \le \|x - a\| + \|x\| \le 2\max\{\|x - a\|, \|x\|\}$
and hence
$\|a\|^r \le 2^r \max\{\|x - a\|^r, \|x\|^r\} \le 2^r(\|x - a\|^r + \|x\|^r), \quad a, x \in \mathbb{R}^d,$
one obtains for $a \in \{\psi_r \le c\}$
$\|a\|^r \le 2^r(c + E\|X\|^r).$
Therefore,
$\{\psi_r \le c\} \subset B(0, 2(c + E\|X\|^r)^{1/r}).$
In case $r = 1$, we have $\{\psi_1 \le c\} \subset B(0, c + E\|X\|)$. Choosing $c = E\|X\|^r$ and $c = V_r(P)$, respectively, gives the assertions. □
2.3 Example
(a) If $P$ is symmetric (about the origin), then $0 \in C_r(P)$ and thus, $V_r(P) = E\|X\|^r$. In fact, $C_r(P)$ is symmetric by Lemma 2.1 (a) and from convexity of $C_r(P)$ follows $0 \in C_r(P)$.
(b) For the $l_2$-norm, we obtain $C_2(X) = \{EX\}$ and $V_2(X) = \sum_{i=1}^d \operatorname{Var} X_i$, where $\operatorname{Var} X_i$ denotes the variance of $X_i$. If the underlying norm is the $l_1$-norm, then $C_1(X) = \times_{i=1}^d \operatorname{Med}(X_i)$, where $\operatorname{Med}(X_i)$ is the set of medians of the real random variable $X_i$.
The center of a probability distribution need not be unique; think of the median of one-dimensional distributions. Conditions for the uniqueness of the center are derived in the following theorem. Condition (iii) is due to Milasevic and Ducharme (1987) (for euclidean norms) and Kemperman (1987).
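For an empirical distribution, the two statements of Example 2.3 (b) can be checked directly. The sketch below assumes a small hypothetical sample `SAMPLE` and helper names chosen for illustration; `psi` evaluates the empirical version of $E\|X - a\|^r$.

```python
from math import isclose
from statistics import mean, median, pvariance

def l1_norm(v):
    return sum(abs(t) for t in v)

def l2_norm(v):
    return sum(t * t for t in v) ** 0.5

def psi(sample, a, r, norm):
    """Empirical psi_r(a): the mean of ||x - a||^r over the sample."""
    return sum(norm([xi - ai for xi, ai in zip(x, a)]) ** r
               for x in sample) / len(sample)

def l2_center(sample):
    """Center of order 2 for the l2-norm: the coordinatewise mean."""
    return [mean(column) for column in zip(*sample)]

def l1_center(sample):
    """A center of order 1 for the l1-norm: a coordinatewise median."""
    return [median(column) for column in zip(*sample)]

# Hypothetical sample in R^2, used only for illustration.
SAMPLE = [(0.0, 0.0), (2.0, 0.0), (0.0, 4.0), (2.0, 4.0), (6.0, 1.0)]
```

With this sample, `psi(SAMPLE, l2_center(SAMPLE), 2, l2_norm)` agrees with the sum of the coordinate variances, and moving away from `l1_center(SAMPLE)` does not decrease `psi(..., 1, l1_norm)`.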
2.4 Theorem (Uniqueness)
Each of the following conditions implies $|C_r(P)| = 1$.
(i) The underlying norm is strictly convex and $r > 1$.
(ii) $P(S(a,b)) < 1$ for every $a, b \in \mathbb{R}^d$, $a \neq b$, and $r > 1$.
(iii) The underlying norm is strictly convex, $r = 1$, and $P(L) < 1$ for every line $L \subset \mathbb{R}^d$.
Proof We show that $\psi_r$ is strictly convex. This yields the assertion. Let $a, b \in \mathbb{R}^d$, $a \neq b$, and $0 < s < 1$. We have
$\|x - (sa + (1-s)b)\|^r = \|s(x - a) + (1-s)(x - b)\|^r$
$\le (s\|x - a\| + (1-s)\|x - b\|)^r$
$\le s\|x - a\|^r + (1-s)\|x - b\|^r, \quad x \in \mathbb{R}^d.$
Let
$A = \{x \in \mathbb{R}^d : \|s(x - a) + (1-s)(x - b)\|^r = s\|x - a\|^r + (1-s)\|x - b\|^r\}.$
In case $r > 1$, $A$ is contained in the separator $S(a,b)$ since $t \mapsto t^r$ is strictly convex. If, additionally, the norm is strictly convex, then $A = \emptyset$. If $r = 1$ and the norm is strictly convex, then $A$ is a subset of the line $L$ through $a$ and $b$ given by $L = \{ta + (1-t)b : t \in \mathbb{R}\}$. To see this, let $x \in A$ and assume $x \neq b$. By strict convexity of the norm, there exists $t \in \mathbb{R}_+$ such that $s(x - a) = t(1-s)(x - b)$. This gives
$x = \dfrac{s}{s - t + st}\, a + \dfrac{ts - t}{s - t + st}\, b$
and hence $x \in L$. Notice that $s - t + st \neq 0$ since otherwise, $a = b$. Therefore, under any of the above conditions we have $P(A) < 1$. This implies
$\psi_r(sa + (1-s)b) < s\,\psi_r(a) + (1-s)\,\psi_r(b).$ □
Next, we characterize the centers of $P$ by means of the derivative of the underlying norm. Let $\nabla^+\|\cdot\|(x,y)$ denote the one-sided directional derivative of the norm at $x \in \mathbb{R}^d$ in direction $y \in \mathbb{R}^d$ given by
$\nabla^+\|\cdot\|(x,y) = \lim_{t \to 0+} \dfrac{\|x + ty\| - \|x\|}{t}.$
The norm $\|\cdot\|$ is said to be smooth if it is differentiable at every point $x \neq 0$. In the smooth case, the (two-sided) directional derivative exists at every $x \neq 0$ and coincides with
$\langle \nabla\|\cdot\|(x), y \rangle = \lim_{t \to 0} \dfrac{\|x + ty\| - \|x\|}{t}, \quad y \in \mathbb{R}^d,$
where $\nabla\|\cdot\|(x)$ denotes the derivative of the norm at $x$ and $\langle x, y \rangle = \sum_{i=1}^d x_i y_i$.
The $l_p$-norms are smooth for $1 < p < \infty$ while the $l_1$-norm and $l_\infty$-norm are not smooth.
2.5 Lemma
For $a \in \mathbb{R}^d$, we have $a \in C_r(P)$ if and only if
$\int \|x - a\|^{r-1}\, \nabla^+\|\cdot\|(a - x, y)\, dP(x) \ge 0$
for every $y \in \mathbb{R}^d$. If the underlying norm is smooth, this condition takes the form
$\int_{\{x \neq a\}} \|x - a\|^{r-1}\, \nabla\|\cdot\|(a - x)\, dP(x) = 0, \quad r > 1,$
$\Big\langle \int_{\{x \neq a\}} \nabla\|\cdot\|(a - x)\, dP(x),\, y \Big\rangle \le P(\{a\})\|y\|$ for every $y \in \mathbb{R}^d$, $r = 1$.
(Recall $\langle z, y \rangle = \sum_{i=1}^d z_i y_i$.)
Proof Notice first that $a \in C_r(P)$ if and only if $\nabla^+\psi_r(a, y) \ge 0$ for every $y \in \mathbb{R}^d$. This is a consequence of the convexity of $\psi_r$. Furthermore,
$\dfrac{\psi_r(a + ty) - \psi_r(a)}{t} = \int \dfrac{\|a - x + ty\|^r - \|a - x\|^r}{t}\, dP(x).$
The function $g : \mathbb{R} \to \mathbb{R}$ given by $g(t) = \|a - x + ty\|^r$ is convex and thus satisfies
$g(0) - g(-1) \le \dfrac{g(t) - g(0)}{t} \le g(1) - g(0), \quad 0 < t \le 1$
(cf. Webster, 1994, Theorem 5.1.1). Therefore, by Lebesgue's dominated convergence theorem
$\nabla^+\psi_r(a, y) = \int \nabla^+\|\cdot\|^r(a - x, y)\, dP(x).$
Since
$\nabla^+\|\cdot\|^r(x, y) = r\|x\|^{r-1}\, \nabla^+\|\cdot\|(x, y)$
this yields the first assertion. Now assume that the norm is smooth. Since $\nabla^+\|\cdot\|(0, y) = \|y\|$ one gets
$\int \|x - a\|^{r-1}\, \nabla^+\|\cdot\|(a - x, y)\, dP(x) = \Big\langle \int_{\{x \neq a\}} \|x - a\|^{r-1}\, \nabla\|\cdot\|(a - x)\, dP(x),\, y \Big\rangle + \int_{\{x = a\}} \|x - a\|^{r-1}\|y\|\, dP(x)$
for every $y \in \mathbb{R}^d$. This yields the second assertion. □
Remark
(a) Using the dual norm of $\|x\|$ given by
$\|x\|_D = \sup\{\langle x, y \rangle : \|y\| \le 1\},$
the above equivalent condition for $a \in \mathbb{R}^d$ to belong to $C_1(P)$ in the smooth case means
$\Big\| \int_{\{x \neq a\}} \nabla\|\cdot\|(a - x)\, dP(x) \Big\|_D \le P(\{a\}).$
(b) Suppose the underlying norm is smooth. Then $\psi_r$ is differentiable on $\mathbb{R}^d$ for $r > 1$ while $\psi_1$ is differentiable at every point $a \in \mathbb{R}^d$ with $P(\{a\}) = 0$. The derivative is given by
$\nabla\psi_r(a) = r \int_{\{x \neq a\}} \|x - a\|^{r-1}\, \nabla\|\cdot\|(a - x)\, dP(x).$
The diameter of a nonempty bounded subset $A$ of $\mathbb{R}^d$ is the number $\operatorname{diam}(A) = \sup\{\|a - b\| : a, b \in A\}$. Denote by $\operatorname{supp}(P)$ the topological support of $P$.
2.6 Lemma
(a) (Euclidean norms) Let $\|x\| = \langle x, x \rangle^{1/2}$ for some scalar product $\langle \cdot, \cdot \rangle$ on $\mathbb{R}^d$. Then
$C_r(P) \subset \operatorname{cl} \operatorname{conv}(\operatorname{supp}(P)).$
(b) Suppose $\operatorname{supp}(P)$ is compact. Then
$\sup_{a \in C_r(P)}\ \min_{x \in \operatorname{supp}(P)} \|x - a\| \le \operatorname{diam}(\operatorname{supp}(P)).$
Proof (a) Let $K = \operatorname{cl} \operatorname{conv}(\operatorname{supp}(P))$ and $a \notin K$. Then $K$ and $a$ can be strictly separated by a hyperplane $H$, that is, $K$ and $a$ lie in opposite open halfspaces determined by $H$. Denote by $b$ the orthogonal projection of $a$ onto $H$. For $x \in K$, let $y$ be the point on the line segment joining $a$ and $x$ which lies in $H$. Since $\langle y - b, a - b \rangle = 0$, we obtain
$\|y - a\|^2 = \|y - b\|^2 + \|b - a\|^2 > \|y - b\|^2$
and therefore,
$\|x - b\| \le \|x - y\| + \|y - b\| < \|x - y\| + \|y - a\| = \|x - a\|.$
This implies $a \notin C_r(P)$.
(b) Let $a \in \mathbb{R}^d$ such that $d(a, \operatorname{supp}(P)) > \operatorname{diam}(\operatorname{supp}(P))$ and let $y \in \operatorname{supp}(P)$. Then
$\|x - y\| \le \operatorname{diam}(\operatorname{supp}(P)) < \|x - a\|$
for every $x \in \operatorname{supp}(P)$. This implies $a \notin C_r(P)$. □
The assertion of part (a) of the preceding lemma can fail if $\|\cdot\|$ is an arbitrary norm. This is exhibited by the following example.
2.7 Example
Let the underlying norm on $\mathbb{R}^2$ be the $l_\infty$-norm. Consider $P = \frac{1}{2}(\delta_{(-1,0)} + \delta_{(1,0)})$, where $\delta_x$ is the point mass at $x$. Since $P$ is symmetric about $EX = (0,0)$, this point belongs to $C_r(P)$ and thus, $V_r(P) = E\|X\|^r = 1$ for every $1 \le r < \infty$. We find
$C_1(P) = \{\psi_1 = 1\} = \{x \in \mathbb{R}^2 : |x_1| + |x_2| \le 1\},$
$C_r(P) = \{\psi_r = 1\} = \{s(0,1) : -1 \le s \le 1\}$ for $r > 1$;
see Figure 2.1. Clearly, the assertion of Lemma 2.6 (a) does not hold for $P$.
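The sets of centers in Example 2.7 can be probed by evaluating $\psi_r$ at a few points. This is a minimal sketch with hypothetical names; it only spot-checks individual points and does not characterize the whole sets.

```python
from math import isclose

def linf_norm(v):
    """l_infinity-norm in R^2."""
    return max(abs(t) for t in v)

# Support of P = (delta_(-1,0) + delta_(1,0)) / 2 from Example 2.7.
SUPPORT = [(-1.0, 0.0), (1.0, 0.0)]

def psi(a, r):
    """psi_r(a) = E ||X - a||^r for the two-point distribution P."""
    return sum(0.5 * linf_norm((x[0] - a[0], x[1] - a[1])) ** r
               for x in SUPPORT)
```

For instance, `psi((0.3, 0.7), 1)` and `psi((0.0, 1.0), 2)` both equal the minimal value 1 up to rounding, although neither point lies on the segment joining the two support points, whereas `psi((0.3, 0.0), 2)` exceeds 1.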
Figure 2.1: $C_1(P)$ and $C_r(P)$, $r > 1$, with respect to the $l_\infty$-norm for a discrete probability $P$ with two supporting points
2.2 Moments of balls
Balls have minimal moments for measures $\mu$ which vanish on spheres, i.e. $\mu(\partial B(a,s)) = 0$ for every $a \in \mathbb{R}^d$ and every $s \ge 0$. (Note that $\partial B(a,s) = \{x \in \mathbb{R}^d : \|x - a\| = s\}$.) This statement is meant in the following sense.
2.8 Lemma
Let $\mu$ be a Borel measure on $\mathbb{R}^d$ that is finite on compact sets and vanishes on spheres. Then, for every bounded set $A \in \mathcal{B}(\mathbb{R}^d)$ with $\mu(A) > 0$ and every $a \in \mathbb{R}^d$ there is an $s \ge 0$ with $\mu(B(a,s)) = \mu(A)$. Moreover, for such an $s$,
$\int_A \|x - a\|^r\, d\mu(x) \ge \int_{B(a,s)} \|x - a\|^r\, d\mu(x).$
In particular, we have for $a \in C_r(\mu(\cdot|A))$
$V_r(\mu(\cdot|A)) \ge V_r(\mu(\cdot|B(a,s))),$
where $\mu(\cdot|A) = \mu(\cdot \cap A)/\mu(A)$.
Proof Since $A$ is bounded there exists an $s_0 > 0$ with $A \subset B(a, s_0)$, hence $0 < \mu(A) \le \mu(B(a, s_0)) < \infty$. Since the map $\mathbb{R}_+ \to \mathbb{R}_+$, $s \mapsto \mu(B(a,s))$ is continuous under the assumptions for $\mu$, the intermediate value theorem yields the existence of an $s > 0$ with $s \le s_0$ and $\mu(B(a,s)) = \mu(A)$. Then $\mu(A \setminus B(a,s)) = \mu(B(a,s) \setminus A)$ and we have
$\int_A \|x - a\|^r\, d\mu(x) - \int_{B(a,s)} \|x - a\|^r\, d\mu(x) = \int_{A \setminus B(a,s)} \|x - a\|^r\, d\mu(x) - \int_{B(a,s) \setminus A} \|x - a\|^r\, d\mu(x).$
Obviously
$\int_{A \setminus B(a,s)} \|x - a\|^r\, d\mu(x) \ge s^r \mu(A \setminus B(a,s))$
and
$\int_{B(a,s) \setminus A} \|x - a\|^r\, d\mu(x) \le s^r \mu(B(a,s) \setminus A).$
This implies
$\int_{A \setminus B(a,s)} \|x - a\|^r\, d\mu(x) - \int_{B(a,s) \setminus A} \|x - a\|^r\, d\mu(x) \ge s^r\big(\mu(A \setminus B(a,s)) - \mu(B(a,s) \setminus A)\big) = 0.$
Hence, the lemma is proved. □
We can deduce a well known fact about the moments of uniform distributions on balls.
2.9 Lemma
We have
$M_r(B(0,1)) = \min\{M_r(A) : A \in \mathcal{B}(\mathbb{R}^d) \text{ bounded},\ \lambda^d(A) > 0\}$
and $B(0,1)$ is the essentially unique minimizer of $M_r$ in that any bounded set $A \in \mathcal{B}(\mathbb{R}^d)$ with $M_r(A) = M_r(B(0,1))$, $\lambda^d(A) = \lambda^d(B(0,1))$, and $0 \in C_r(U(A))$ satisfies $\lambda^d(A \,\triangle\, B(0,1)) = 0$. If, additionally, $A$ is regularly closed (that is, $A = \operatorname{cl}(\operatorname{int} A)$), then $A = B(0,1)$. Moreover,
(2.5) $M_r(B(0,1)) = \dfrac{d}{(d + r)\,\lambda^d(B(0,1))^{r/d}}.$
Proof The first assertion follows from Lemma 2.8 with the choice $\mu = \lambda^d$ and Lemma 2.1 (b). As for uniqueness, let $A$ be a set with the above properties. Then
$\int_A \|x\|^r\, dx = \int_{B(0,1)} \|x\|^r\, dx.$
It follows that
$\lambda^d(B(0,1) \setminus A) = \lambda^d(A \setminus B(0,1)) \le \int_{A \setminus B(0,1)} \|x\|^r\, dx = \int_{B(0,1) \setminus A} \|x\|^r\, dx \le \lambda^d(B(0,1) \setminus A).$
Therefore,
$\int_{A \setminus B(0,1)} (\|x\|^r - 1)\, dx = 0$
which implies $A \setminus B(0,1) \subset \partial B(0,1)$ $\lambda^d$-a.s. Since $\lambda^d(\partial B(0,1)) = 0$, we obtain $\lambda^d(A \,\triangle\, B(0,1)) = 0$. If $A$ is closed, then $B(0,1) \subset A$. Otherwise $\{\|x\| < 1\} \setminus A$ is a nonempty open set implying $\lambda^d(B(0,1) \setminus A) \ge \lambda^d(\{\|x\| < 1\} \setminus A) > 0$, a contradiction. It follows that $\{\|x\| < 1\} \subset \operatorname{int} A$ and hence
$\lambda^d(B(0,1)) = \lambda^d(\{\|x\| < 1\}) \le \lambda^d(\operatorname{int} A) \le \lambda^d(A).$
Therefore, $\lambda^d(\{\|x\| < 1\}) = \lambda^d(\operatorname{int} A)$ which implies $\{\|x\| < 1\} = \operatorname{int} A$. From regular closedness of $A$ follows $A = B(0,1)$.
Moreover, in view of the symmetry of $U(B(0,1))$ one gets
$V_r(U(B(0,1))) = \int \|x\|^r\, dU(B(0,1))(x) = \int_0^\infty U(B(0,1))(\|x\|^r > t)\, dt = \int_0^1 (1 - t^{d/r})\, dt = \dfrac{d}{d + r}.$
This gives the formula (2.5). □
In view of the above formula for $M_r(B(0,1))$ it is worth recalling that the volume of unit balls with respect to the $l_p$-norms for $1 \le p < \infty$ is given by
(2.6) $\lambda^d(B(0,1)) = \dfrac{\big(2\Gamma(1 + \frac{1}{p})\big)^d}{\Gamma(1 + \frac{d}{p})}$
(cf. e.g. Pisier, 1989, p. 11).
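Formulas (2.5) and (2.6) combine into a closed-form value of $M_r(B(0,1))$ for $l_p$-norms. The sketch below uses hypothetical function names and plain floating-point arithmetic; it is meant only as a quick numerical companion to the two formulas.

```python
from math import gamma, isclose, pi

def lp_unit_ball_volume(d, p):
    """Volume of the l_p unit ball in R^d, formula (2.6), for 1 <= p < infinity."""
    return (2.0 * gamma(1.0 + 1.0 / p)) ** d / gamma(1.0 + d / p)

def normalized_moment_of_ball(d, r, p):
    """M_r(B(0,1)) from formula (2.5), with the volume taken from (2.6)."""
    return d / ((d + r) * lp_unit_ball_volume(d, p) ** (r / d))
```

As sanity checks, the $l_2$ unit disc has area $\pi$, the $l_1$ unit ball in the plane has area 2, and for $d = 1$, $r = 2$ the value is $1/12$, the variance of the uniform distribution on an interval of length 1.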
Notes Among spatial centers the spatial medians have received special attention. We refer to the survey article by Small (1990) for a discussion of several notions of spatial medians. A good source for norm-based medians as defined in (2.1) is Kemperman (1987).
3 The quantization problem
In this section we will give several equivalent formulations of the quantization problem for probability distributions on $\mathbb{R}^d$ with norm-based distortion measure. Let $X$ denote an $\mathbb{R}^d$-valued random variable with distribution $P$. For $n \in \mathbb{N}$, let $\mathcal{F}_n$ be the set of all Borel measurable maps $f : \mathbb{R}^d \to \mathbb{R}^d$ with $|f(\mathbb{R}^d)| \le n$. The elements of $\mathcal{F}_n$ are called $n$-quantizers. For each $f \in \mathcal{F}_n$, $f(X)$ gives a quantized version of $X$. Let $1 \le r < \infty$ and assume
$E\|X\|^r < \infty.$
The $n$-th quantization error for $P$ of order $r$ is defined by
(3.1) $V_{n,r}(P) = \inf_{f \in \mathcal{F}_n} E\|X - f(X)\|^r.$
We will also write $V_{n,r}(X)$ instead of $V_{n,r}(P)$. A quantizer $f \in \mathcal{F}_n$ is called $n$-optimal for $P$ of order $r$ if
$V_{n,r}(P) = E\|X - f(X)\|^r.$
Note that $V_{1,r}(P) = V_r(P)$.
Figure 3.1: Quantization scheme
For fixed $n \in \mathbb{N}$, searching for an $n$-optimal quantizer is equivalent to the $n$-centers problem.
3.1 Lemma
$V_{n,r}(P) = \inf_{\alpha \subset \mathbb{R}^d,\ |\alpha| \le n} E \min_{a \in \alpha} \|X - a\|^r.$
Proof For $f \in \mathcal{F}_n$, let $\alpha = f(\mathbb{R}^d)$ and $A_a = \{f = a\}$, $a \in \alpha$. Then
$E\|X - f(X)\|^r = \sum_{a \in \alpha} \int_{A_a} \|x - a\|^r\, dP(x) \ge E \min_{b \in \alpha} \|X - b\|^r.$
Conversely, for $\alpha \subset \mathbb{R}^d$ with $|\alpha| \le n$, let $\{A_a : a \in \alpha\}$ be a Voronoi partition of $\mathbb{R}^d$ with respect to $\alpha$ and let $f = \sum_{a \in \alpha} a 1_{A_a}$. Then $f \in \mathcal{F}_n$ and
$E \min_{a \in \alpha} \|X - a\|^r = \sum_{a \in \alpha} \int_{A_a} \|x - a\|^r\, dP(x) = E\|X - f(X)\|^r.$ □
A set $\alpha \subset \mathbb{R}^d$ with $|\alpha| \le n$ is called an $n$-optimal set of centers for $P$ of order $r$ if
$V_{n,r}(P) = E \min_{a \in \alpha} \|X - a\|^r.$
The proof of Lemma 3.1 shows that if $f$ is an $n$-optimal quantizer, then $f(\mathbb{R}^d)$ is an $n$-optimal set of centers. Conversely, if $\alpha \subset \mathbb{R}^d$ is an $n$-optimal set of centers and $\{A_a : a \in \alpha\}$ is a Voronoi partition of $\mathbb{R}^d$ with respect to $\alpha$, then $f = \sum_{a \in \alpha} a 1_{A_a}$ is an $n$-optimal quantizer. Recall that by Proposition 1.2, the sets $A_a$ may be chosen to be star-shaped relative to $a$.
Let $C_{n,r}(P)$ denote the set of all $n$-optimal sets of centers for $P$ of order $r$. We also write $C_{n,r}(X)$ instead of $C_{n,r}(P)$. Note that $C_{1,r}(P)$ can be identified with $C_r(P)$. For $A \in \mathcal{B}(\mathbb{R}^d)$ bounded with $\lambda^d(A) > 0$, define the normalized $n$-th quantization error for $A$ of order $r$ by
(3.2) $M_{n,r}(A) = \dfrac{V_{n,r}(U(A))}{\lambda^d(A)^{r/d}}.$
The following equivariance, scaling and invariance properties extend those of Lemma 2.1.
3.2 Lemma
Let $T : \mathbb{R}^d \to \mathbb{R}^d$ be a similarity transformation with scaling number $c > 0$.
(a) $C_{n,r}(T(X)) = T\,C_{n,r}(X)$, $V_{n,r}(T(X)) = c^r V_{n,r}(X)$.
(b) $M_{n,r}(T(A)) = M_{n,r}(A)$, if $A \in \mathcal{B}(\mathbb{R}^d)$ is bounded with $\lambda^d(A) > 0$.
Proof Obvious. □
Next, we show that the quantization problem is equivalent to a partitioning problem for the space $\mathbb{R}^d$.
3.3 Lemma
$V_{n,r}(P) = \inf_{\mathcal{A}} \sum_{A \in \mathcal{A}} V_r(P(\cdot|A))\, P(A),$
where the infimum is taken over all Borel measurable partitions $\mathcal{A}$ of $\mathbb{R}^d$ with $|\mathcal{A}| \le n$.
Proof For $f \in \mathcal{F}_n$, let $\alpha = f(\mathbb{R}^d)$ and $A_a = \{f = a\}$. Then $\{A_a : a \in \alpha\}$ is a partition of $\mathbb{R}^d$ and
$E\|X - f(X)\|^r = \sum_{a \in \alpha} \int_{A_a} \|x - a\|^r\, dP(x) \ge \sum_{a \in \alpha} V_r(P(\cdot|A_a))\, P(A_a).$
Conversely, for a Borel measurable partition $\mathcal{A}$ of $\mathbb{R}^d$ with $|\mathcal{A}| \le n$, choose $a_A \in C_r(P(\cdot|A))$, $A \in \mathcal{A}$, which is possible by Lemma 2.2, and let $f = \sum_{A \in \mathcal{A}} a_A 1_A$. (If $P(A) = 0$, let $a_A$ be an arbitrary point in $\mathbb{R}^d$.) Then $f \in \mathcal{F}_n$ and
$\sum_{A \in \mathcal{A}} V_r(P(\cdot|A))\, P(A) = \sum_{A \in \mathcal{A}} \int_A \|x - a_A\|^r\, dP(x) = E\|X - f(X)\|^r.$ □
A Borel measurable partition $\mathcal{A}$ of $\mathbb{R}^d$ with $|\mathcal{A}| \le n$ is called an $n$-optimal partition for $P$ of order $r$ if
$V_{n,r}(P) = \sum_{A \in \mathcal{A}} V_r(P(\cdot|A))\, P(A).$
The proof of the preceding lemma shows that if $f$ is an $n$-optimal quantizer, then $\{\{f = a\} : a \in f(\mathbb{R}^d)\}$ is an $n$-optimal partition. Conversely, if $\mathcal{A}$ is an $n$-optimal partition and $a_A \in C_r(P(\cdot|A))$ for $A \in \mathcal{A}$, then $f = \sum_{A \in \mathcal{A}} a_A 1_A$ is an $n$-optimal quantizer.
The quantization problem for $P$ is further equivalent to the problem of approximating $P$ by a discrete probability with at most $n$ supporting points. For Borel probability measures $P_1, P_2$ on $\mathbb{R}^d$ with $\int \|x\|^r\, dP_i(x) < \infty$, let
(3.3) $\rho_r(P_1, P_2) = \inf \Big( \int \|x - y\|^r\, d\mu(x,y) \Big)^{1/r},$
where the infimum is taken over all Borel probabilities $\mu$ on $\mathbb{R}^d \times \mathbb{R}^d$ with fixed marginals $P_1$ and $P_2$. The $L_r$-minimal metric $\rho_r$ ($L_r$-Wasserstein metric or $L_r$-Kantorovich metric) is appropriate for the quantization problem. This has been observed by Gray et al. (1975), Gray and Davisson (1975) and Pollard (1982a). By $\mathcal{P}_n$ denote the set of all discrete probabilities $Q$ on $\mathbb{R}^d$ with $|\operatorname{supp}(Q)| \le n$.
3.4 Lemma
$V_{n,r}(P) = \inf_{f \in \mathcal{F}_n} \rho_r^r(P, P^f) = \inf_{Q \in \mathcal{P}_n} \rho_r^r(P, Q),$
where $P^f$ denotes the image measure of $P$ under $f$.
Proof Given $f \in \mathcal{F}_n$, let $\mu_f$ denote the image measure of $P$ under the map $\mathbb{R}^d \to \mathbb{R}^d \times \mathbb{R}^d$, $x \mapsto (x, f(x))$. Then
$E\|X - f(X)\|^r = \int \|x - y\|^r\, d\mu_f(x,y) \ge \rho_r^r(P, P^f).$
This implies
$V_{n,r}(P) \ge \inf_{f \in \mathcal{F}_n} \rho_r^r(P, P^f) \ge \inf_{Q \in \mathcal{P}_n} \rho_r^r(P, Q).$
If $Q \in \mathcal{P}_n$ with $Q(\alpha) = 1$, $|\alpha| \le n$, then for every Borel probability $\mu$ on $\mathbb{R}^d \times \mathbb{R}^d$ with marginals $P$ and $Q$
$\int \|x - y\|^r\, d\mu(x,y) = \int_{\mathbb{R}^d \times \alpha} \|x - y\|^r\, d\mu(x,y) \ge \int_{\mathbb{R}^d \times \alpha} \min_{a \in \alpha} \|x - a\|^r\, d\mu(x,y) = \int \min_{a \in \alpha} \|x - a\|^r\, dP(x),$
hence
$\rho_r^r(P, Q) \ge E \min_{a \in \alpha} \|X - a\|^r.$
I. General properties of the quantization for probability distributions
By Lemma 3.1, this yields
inf pr(p, Q) > v,,r(P).
QET~,~
[] A measure Q E P , is called n - o p t i m a l q u a n t i z i n g m e a s u r e for P o f o r d e r r if Vn,r(P) = prr(p, Q). If f E 9vn is an n-optimal quantizer, then p I E P , is an n-optimal quantizing measure. Conversely, if Q E P . is an n-optimal quantizing measure and {A~ : a E a} is a Voronoi partition with respect to c~ = supp(Q), then f -- ~ alA~ is an n-optimal aE~
quantizer. Several functional descriptions of p~ are known. Among them the most famous is the Kantorovich representation for r = 1
PI(P1,P2) = sup ] / gdPI - / gdP2', g
where the supremum is taken over all functions g: R d -+ N satisfying the Lipschitz condition Ig(x) - g(Y)l <- IIx - Yl[ for all x,y E ]Rd In case d = 1, p~ admits the representation 1
pr (P1, P~) = ( / I F l - ~ ( t ) - F~-l(t)l~dt) 1/~ . ]
0
and
f
px(Pl, P~) = ] IF,(t) - F2(t) ldt, where Fi denotes the distribution function and Fi- t the quantile function of P~ (F~-1(t) = inf{x e R: Fi(x) _> t},t e (0, 1)). For this background on Lr-minimal metrics we refer to Rachev (1991) and Rachev and Riischendorf (1998, Chapters 2.5 and 2.6). The empirical counterpart of quantization is cluster analysis. Somewhat more precisely, partitioning methods of cluster analysis for a finite sample according to a norm-based optimality criterion correspond to quantization for the empirical measure. 3.5 E x a m p l e ( E m p i r i c a l v e r s i o n , c l u s t e r a n a l y s i s ) k Let x l , . . . , xk E R d with xi = (xil,... , xid) and let P = ~ ~ 5~, denote the empirical i=1
measure. We obtain from Lemma 3.3 1 Vn,r(P) = - min ~ min E Hx~ - air, k c C~C~CaERaieC
3. The quantization problem
35
where the infimum is taken over all partitions C of { 1 , . . . , k} with ]C] _< n. If the underlying norm is the/2-norm, then 1 . V~,2(P) = ~ m~n ~
~
]]x~ - ~(C)]]2,
CEC iEC
where E(C) = ~ ,~c X,. This is the variance criterion for optimal grouping of data x l , . . . , Xk. If the underlying norm is the/t-norm, then
1
.
V,,,,(P) = ~-m~n~ ~ I1=, - med(C) ll, CEC iEC
where med(C) is an arbitrary element of X~=1med(xij, i E C) and med(xij, i E C) is the set of empirical medians of the real data x~j, i E C (cf. Example 2.3 (b)). This is the/1-criterion for optimal grouping of data. For treatments of cluster analysis which contain discussions of the above optimality criteria we refer to Bock (1974) and Sp~th (1985). The n-optimal sets of centers for P of order r correspond to global minima of the function (3.4)
¢~,r : (Rd) n -+ R+, ¢~,r(al,... , an) = E min IIX - aiiIr. l
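For a very small sample, the variance criterion of Example 3.5 and the function in (3.4) can be evaluated by brute force. The following sketch assumes the $l_2$-norm and uses hypothetical function names; the exhaustive search over all labelings grows like $n^k$ and is intended for illustration only.

```python
from itertools import product

def empirical_error_l2(points, n):
    """V_{n,2} of the empirical measure of `points` (tuples in R^d),
    by exhaustive search over all assignments of points to <= n groups."""
    k = len(points)
    best = float("inf")
    for labels in product(range(n), repeat=k):
        total = 0.0
        for c in set(labels):
            cluster = [points[i] for i in range(k) if labels[i] == c]
            # For the l2-norm and r = 2 the optimal center is the group mean.
            centroid = [sum(col) / len(cluster) for col in zip(*cluster)]
            total += sum(sum((x - m) ** 2 for x, m in zip(p, centroid))
                         for p in cluster)
        best = min(best, total / k)
    return best

def psi_n_r(points, centers, r=2):
    """Empirical psi_{n,r}(a_1, ..., a_n): mean of min_i ||x - a_i||^r (l2-norm)."""
    return sum(min(sum((x - a) ** 2 for x, a in zip(p, c)) ** (r / 2)
                   for c in centers)
               for p in points) / len(points)
```

With the one-dimensional sample `[(0.0,), (1.0,), (10.0,), (11.0,)]`, the center tuples `((0.5,), (10.5,))` and `((10.5,), (0.5,))` both attain the minimal value 0.25, while their midpoint `((5.5,), (5.5,))` gives 25.25, a concrete instance of $\psi_{2,2}$ failing to be convex.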
Notice that $\psi_{1,r} = \psi_r$. While $\psi_r$ is convex, $\psi_{n,r}$ is typically not convex for $n \ge 2$. Therefore, local minimum points of $\psi_{n,r}$ may not be global minimum points of $\psi_{n,r}$. The lack of any straightforward solution for the quantization problem (at least for $d \ge 2$) is a result of the difficulty in dealing with the nonconvex nature of quantization.
Notes
Treatments of the quantization problem with applications in information theory (analog-to-digital conversion, signal compression, coding theory) are contained in the March 1982 Special Issue of IEEE Transactions on Information Theory (Vol. 28, pp. 127-202), in Gray (1990), Abut (1990), Gersho and Gray (1992), and in Calderbank et al. (1993). Some material may also be found in Fang and Wang (1994). The $n$-th quantization error $V_{n,r}(P)$ appears in error bounds for numerical integration; see Pagès (1997). In the one-dimensional case the quantization problem for $r = 2$ corresponds to the optimal stratification problem for Bowley (or proportional) sampling schemes of Dalenius (1950). Also in the one-dimensional case the quantization problem can be seen as optimal knot selection for piecewise constant $L_r$-approximation. A review of the problem in this spirit for $r = 2$ can be found in Eubank (1988). A fuzzy version of the quantization problem is discussed by Yang and Yu (1991). Let us mention that $n$-optimal sets of centers are sometimes called sets of principal points or representative points.
Occasionally, it may be preferable to use other measures for the quantization error than $L_r$-metrics as in (3.1). The limiting case of the $L_\infty$-metric ("worst-case error")
$\operatorname{ess\,sup} \|X - f(X)\| = \inf\{c \ge 0 : \mathbb{P}(\|X - f(X)\| > c) = 0\}, \quad f \in \mathcal{F}_n,$
is studied in Section 10 and the Ky Fan metric
$\inf\{\varepsilon \ge 0 : \mathbb{P}(\|X - f(X)\| > \varepsilon) \le \varepsilon\}$
is studied in Graf and Luschgy (1999a). While the first metric requires $X$ to be bounded, the latter does not. The Ky Fan error measure leads to the approximation problem for $P$ with respect to the Prohorov metric (in the sense of Lemma 3.4). An investigation of the quantization problem based on the geometric mean error
$\exp E \log \|X - f(X)\|$
as measure of performance can be found in Graf and Luschgy (1999b). Input weighted error measures of the form
$E\,(X - f(X))^t B(X) (X - f(X)),$
where $B(x)$ is a positive definite matrix for every $x \in \mathbb{R}^d$, have proved useful in speech and image compression systems. For various aspects of the quantization problem based on this error see e.g. Gray and Karnin (1982), Gardner and Rao (1995), Li et al. (1999) and Linder et al. (1999). Basically different quantization problems have been treated by Elias (1970) and more recently by Bock (1992) and Pötzelberger and Strasser (1999).
4 Basic properties of optimal quantizers
As in the previous section, let $X$ be an $\mathbb{R}^d$-valued random variable with distribution $P$ such that $E\|X\|^r < \infty$ for some $1 \le r < \infty$. Further, we assume (with the only exception of the last subsection) $n \ge 2$ and, in order to avoid trivial cases, we also assume $P \notin \mathcal{P}_{n-1}$, that is, $|\operatorname{supp}(P)| \ge n$.
4.1 Stationarity and existence
The following two theorems provide necessary conditions for n-optimality of quantizers. They are the gateway to most available algorithmic solutions.

4.1 Theorem (Necessary conditions for optimality)
Let $\alpha \in C_{n,r}(P)$ and let $\{A_a : a \in \alpha\}$ be a Voronoi partition of $\mathbb{R}^d$ with respect to $\alpha$ and $P$. Then $|\alpha| = n$, $P(A_a) > 0$ for every $a \in \alpha$, and
\[ \beta \in C_{m,r}\Bigl(P\bigl(\cdot \,\big|\, \bigcup_{a \in \beta} A_a\bigr)\Bigr) \quad \text{for every } \beta \subset \alpha \text{ with } |\beta| = m. \]
In particular,
\[ P(W(a|\alpha)) > 0, \quad a \in C_r\bigl(P(\cdot\,|\,W(a|\alpha))\bigr) \quad \text{for every } a \in \alpha. \tag{4.1} \]
Proof
Let $\gamma = \{a \in \alpha : P(A_a) > 0\}$ and assume $|\gamma| < n$. Obviously, $\gamma \in C_{n,r}(P)$. Since $P \notin \mathcal{P}_{n-1}$, there exists $a \in \gamma$ such that $P(\cdot\,|A_a)$ is not a point mass. We can conclude that
\[ P(H(a,b)^c \cap A_a) > 0 \quad \text{for some } b \in \mathbb{R}^d. \]
(Recall $H(a,b) = \{x \in \mathbb{R}^d : \|x - a\| \le \|x - b\|\}$.) In fact, we have $P(A_a \setminus \{a\}) > 0$ and hence, there is a compact set $K \subset A_a \setminus \{a\}$ with $P(K) > 0$. Since $K \subset \bigcup_{b \in K} H(a,b)^c$ and $H(a,b)^c$ is open, we can find a finite subset $B$ of $K$ such that $K \subset \bigcup_{b \in B} H(a,b)^c$. This gives the existence of a point $b \in B$ with the required property. It follows that
\[ V_{n,r}(P) = E \min_{a \in \gamma} \|X - a\|^r > E \min_{a \in \gamma \cup \{b\}} \|X - a\|^r \ge V_{n,r}(P), \]
a contradiction.

As for the assertion concerning $\beta$, assume $\beta \notin C_{m,r}(P(\cdot\,|\bigcup_{a \in \beta} A_a))$. Then there exists $\delta \subset \mathbb{R}^d$ with $|\delta| \le m$ and
\[ \int_{\bigcup_{a \in \beta} A_a} \min_{b \in \beta} \|x - b\|^r \, dP(x) > \int_{\bigcup_{a \in \beta} A_a} \min_{b \in \delta} \|x - b\|^r \, dP(x). \]
It follows that
\[ V_{n,r}(P) = E \min_{a \in \alpha} \|X - a\|^r > E \min_{a \in \delta \cup (\alpha \setminus \beta)} \|X - a\|^r \ge V_{n,r}(P), \]
a contradiction. □
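The necessary conditions of Theorem 4.1 can be observed on a small discrete example. The following is only a sketch under stated assumptions: it treats the case $d = 1$, $n = 2$, $r = 2$, and it assumes that the optimal partition may be searched among splits of the sorted support into two intervals with the cell means as centers. The function names and the sample data are illustrative and not taken from the text.

```python
def best_two_centers(points, weights):
    """Brute-force a 2-optimal set of centers of order r = 2 on the real line.

    Assumption: it suffices to scan the splits of the sorted support into a
    left and a right cell, each represented by its conditional mean.
    """
    pairs = sorted(zip(points, weights))
    best = None
    for k in range(1, len(pairs)):
        err, centers = 0.0, []
        for cell in (pairs[:k], pairs[k:]):
            mass = sum(w for _, w in cell)
            mean = sum(w * x for x, w in cell) / mass  # center of order 2 of the cell
            centers.append(mean)
            err += sum(w * (x - mean) ** 2 for x, w in cell)
        if best is None or err < best[0]:
            best = (err, centers, k)
    return best, pairs


def satisfies_necessary_conditions(points, weights):
    """Check P(A_a) > 0 and that every cell lies in the Voronoi region of its center."""
    (err, centers, k), pairs = best_two_centers(points, weights)
    cells = (pairs[:k], pairs[k:])
    for i, cell in enumerate(cells):
        if sum(w for _, w in cell) <= 0:
            return False
        for x, _ in cell:
            # x must be at least as close to its own center as to the other one
            if abs(x - centers[i]) > abs(x - centers[1 - i]) + 1e-12:
                return False
    return True


if __name__ == "__main__":
    print(best_two_centers([0, 1, 2, 6, 7], [0.2] * 5)[0])
    print(satisfies_necessary_conditions([0, 1, 2, 6, 7], [0.2] * 5))
```

For the five-point uniform law used here the best split is $\{0,1,2\}$, $\{6,7\}$ with centers $1$ and $6.5$, and both conditions of the theorem are visibly met.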
We know from (1.8) and Theorem 1.5 that the Voronoi diagram of every finite subset of $\mathbb{R}^d$ is a P-tesselation provided the underlying norm is strictly convex and $P$ is absolutely continuous with respect to $\lambda^d$. So the following result is of interest for probability distributions $P$ which are not absolutely continuous with respect to $\lambda^d$. (Such probabilities are considered in Chapter III.)

4.2 Theorem (Necessary condition for optimality)
Let $\alpha \in C_{n,r}(P)$ and let $r > 1$ or $P(\alpha) = 0$. Suppose the underlying norm is strictly convex and smooth. Then the Voronoi diagram of $\alpha$ is a P-tesselation of $\mathbb{R}^d$.

Proof
We have to prove
\[ P(W(a|\alpha) \cap W(b|\alpha)) = 0 \quad \text{for every } a, b \in \alpha,\ a \ne b. \]
Fix $a, b \in \alpha$, $a \ne b$ and assume $P(W(a|\alpha) \cap W(b|\alpha)) > 0$. Choose a Voronoi partition $\{A_c : c \in \alpha\}$ with respect to $\alpha$ such that $A_a = W(a|\alpha) \setminus W(b|\alpha)$. Then by Theorem 4.1,
\[ a \in C_r(P(\cdot\,|A_a)) \cap C_r(P(\cdot\,|W(a|\alpha))). \]
From Lemma 2.5 it follows that
\[ \int_{A_a \setminus \{a\}} \|x - a\|^{r-1} \nabla\|\cdot\|(a - x) \, dP(x) = 0 \]
and
\[ \int_{W(a|\alpha) \setminus \{a\}} \|x - a\|^{r-1} \nabla\|\cdot\|(a - x) \, dP(x) = 0, \]
which yields
\[ \int_{W(a|\alpha) \cap W(b|\alpha)} \|x - a\|^{r-1} \nabla\|\cdot\|(a - x) \, dP(x) = 0. \]
Therefore, again by Lemma 2.5, $a \in C_r(Q)$ with $Q = P(\cdot\,|W(a|\alpha) \cap W(b|\alpha))$. Since $W(a|\alpha) \cap W(b|\alpha)$ is contained in the separator $S(a,b)$, this implies $b \in C_r(Q)$. Thus, $Q$ has two different centers $a$ and $b$ of order $r$. By Theorem 2.4, this can happen only if $r = 1$ and $Q(L) = 1$, where $L$ is the line through $a$ and $b$. Since
\[ L \cap W(a|\alpha) \cap W(b|\alpha) \subset L \cap S(a,b) = \{(a+b)/2\}, \]
one obtains $Q = \delta_{(a+b)/2}$. It follows
\[ \{a, b\} \subset C_r(Q) = \{(a+b)/2\}, \]
a contradiction. □
A set $\alpha \subset \mathbb{R}^d$ with $|\alpha| = n$ satisfying condition (4.1) is called an n-stationary set of centers for $P$ of order $r$. Let $S_{n,r}(P)$ denote the set of all these n-stationary sets for $P$ and denote by $SS_{n,r}(P)$ the subset of $S_{n,r}(P)$ consisting of all $\alpha \in S_{n,r}(P)$ such that the Voronoi diagram of $\alpha$ is a P-tesselation. Then by Theorem 4.1,
\[ C_{n,r}(P) \subset S_{n,r}(P). \]
Note that any Voronoi partition $\{A_a : a \in \alpha\}$ with respect to $\alpha \in SS_{n,r}(P)$ and $P$ satisfies $A_a = W(a|\alpha)$ P-a.s., $a \in \alpha$. We also write $S_{n,r}(X)$ and $SS_{n,r}(X)$ instead of $S_{n,r}(P)$ and $SS_{n,r}(P)$, respectively.

4.3 Corollary
(a) Let $\mathcal{A}$ be an n-optimal partition for $P$ of order $r$. Then $|\mathcal{A}| = n$, $P(A) > 0$ for every $A \in \mathcal{A}$, $C_r(P(\cdot\,|A)) \cap C_r(P(\cdot\,|B)) = \emptyset$ for every $A, B \in \mathcal{A}$, $A \ne B$, and $\mathcal{A}$ is a Voronoi partition of $\mathbb{R}^d$ with respect to $\alpha = \{a_A : A \in \mathcal{A}\}$ and $P$ for any choice of $a_A \in C_r(P(\cdot\,|A))$.

(b) Let $f \in \mathcal{F}_n$ be an n-optimal quantizer for $P$ of order $r$ and let $\alpha = f(\mathbb{R}^d)$. Then $\alpha \in S_{n,r}(P)$, $\{\{f = a\} : a \in \alpha\}$ is a Voronoi partition of $\mathbb{R}^d$ with respect to $\alpha$ and $P$, $P(\{f = a\}) > 0$ and $a \in C_r(P(\cdot\,|\{f = a\}))$ for every $a \in \alpha$.
Proof
(a) We have
\[ V_{n,r}(P) = \sum_{A \in \mathcal{A}} V_r(P(\cdot\,|A)) P(A) = \sum_{A \in \mathcal{A}} \int_A \|x - a_A\|^r \, dP(x) \ge \sum_{A \in \mathcal{A}} \int_A \min_{b \in \alpha} \|x - b\|^r \, dP(x) = \int \min_{b \in \alpha} \|x - b\|^r \, dP(x) \ge V_{n,r}(P). \]
This implies $\alpha \in C_{n,r}(P)$ and
\[ \int_A \|x - a_A\|^r \, dP(x) = \int_A \min_{b \in \alpha} \|x - b\|^r \, dP(x), \quad A \in \mathcal{A}. \]
Therefore, $\mathcal{A}$ is a Voronoi partition of $\mathbb{R}^d$ with respect to $\alpha$ and $P$. The remaining assertions follow from Theorem 4.1.
(b) As in (a) one can check that $\{\{f = a\} : a \in \alpha\}$ is a Voronoi partition of $\mathbb{R}^d$ with respect to $\alpha$ and $P$. The remaining assertions follow from Theorem 4.1. □

Under the condition $C_{n,r}(P) \subset SS_{n,r}(P)$ there is a characterization of optimal quantizing measures.

4.4 Lemma
Suppose $C_{n,r}(P) \subset SS_{n,r}(P)$, that is, the Voronoi diagram of every $\alpha \in C_{n,r}(P)$ is a P-tesselation of $\mathbb{R}^d$. Then the set of n-optimal quantizing measures for $P$ of order $r$ coincides with the set $\{P^f : f \in \mathcal{F}_n \text{ n-optimal for } P \text{ of order } r\}$.

Proof
Let $Q = \sum_{a \in \alpha} p_a \delta_a$ be an n-optimal quantizing measure of order $r$. Choose a Borel probability $\mu$ on $\mathbb{R}^d \times \mathbb{R}^d$ with marginals $P$ and $Q$ such that $\rho_r(P,Q)^r = \int \|x - y\|^r \, d\mu(x,y)$ and let $f = \sum_{a \in \alpha} a 1_{A_a}$, where $\{A_a : a \in \alpha\}$ denotes a Voronoi partition of $\mathbb{R}^d$ with respect to $\alpha$. Then $f$ is an n-optimal quantizer and $\alpha \in C_{n,r}(P)$. Therefore
\[ \sum_{a \in \alpha} \int_{\mathbb{R}^d \times \{a\}} \|x - a\|^r \, d\mu(x,y) = \int_{\mathbb{R}^d \times \alpha} \|x - y\|^r \, d\mu(x,y) = \rho_r(P,Q)^r = V_{n,r}(P) = \int \min_{b \in \alpha} \|x - b\|^r \, dP(x) = \sum_{a \in \alpha} \int_{\mathbb{R}^d \times \{a\}} \min_{b \in \alpha} \|x - b\|^r \, d\mu(x,y). \]
This implies
\[ \mathbb{R}^d \times \{a\} \subset W(a|\alpha) \times \mathbb{R}^d \quad \mu\text{-a.s.} \]
for every $a \in \alpha$. Hence $p_a \le P(W(a|\alpha))$, $a \in \alpha$. It follows from the assumption that $\sum_{a \in \alpha} P(W(a|\alpha)) = 1$. Since $\sum_{a \in \alpha} p_a = \sum_{a \in \alpha} P(A_a) = 1$, one obtains $p_a = P(W(a|\alpha)) = P(A_a)$, $a \in \alpha$. This gives $Q = P^f$. The converse inclusion was already mentioned in Section 3. □

In general, the Voronoi diagram of an n-optimal set of centers $\alpha \in C_{n,r}(P)$ need not be a P-tesselation and also, the assertion of Lemma 4.4 may fail. This is exhibited by the following example.

4.5 Example
Let the underlying norm on $\mathbb{R}^2$ be the $l_\infty$-norm. Consider $P = \frac14(\delta_{(-1,0)} + \delta_{(0,1)} + \delta_{(1,0)} + \delta_{(0,-1)})$ and let $n = 2$, $r = 1$. It is geometrically rather obvious that $V_{2,1}(P) = 1/2$
and $C_{2,1}(P)$ consists of all sets $\{a,b\}$ with $a, b \in \{x \in \mathbb{R}^2 : |x_1| + |x_2| = 1\}$ such that the line segment joining $a$ and $b$ meets the line $\{x_1 = 0\}$; see Figure 4.1. Now let $a = (-1,0)$ and $b = (1,0)$. Then $\{a,b\} \in C_{2,1}(P)$ and $S(a,b)$ contains the line through $(0,-1)$ and $(0,1)$. One obtains $P(S(a,b)) = 1/2 > 0$ and hence, the Voronoi diagram $\{H(a,b), H(b,a)\}$ of $\{a,b\}$ is not a P-tesselation. Furthermore, the probability $Q = \frac58 \delta_a + \frac38 \delta_b$ is a 2-optimal quantizing measure for $P$. In fact, let $x_1 = (-1,0)$, $x_2 = (0,1)$, $x_3 = (1,0)$, $x_4 = (0,-1)$, and define a discrete probability $\mu$ on $\mathbb{R}^2 \times \mathbb{R}^2$ by
\[ \mu(\{(x_1,a)\}) = \mu(\{(x_3,b)\}) = \mu(\{(x_4,a)\}) = 1/4, \quad \mu(\{(x_2,a)\}) = \mu(\{(x_2,b)\}) = 1/8. \]
Then the marginals of $\mu$ are $P$ and $Q$, respectively, and
\[ \int \|x - y\| \, d\mu(x,y) = 1/2. \]
This yields $V_{2,1}(P) = \rho_1(P,Q)$. Obviously, $Q \ne P^f$ for every $f \in \mathcal{F}_2$.

Figure 4.1: 2-optimal centers of order 1 with respect to the $l_\infty$-norm

4.6 Remark (Euclidean norms)
Let $\|x\| = \langle x, x \rangle^{1/2}$ for some scalar product $\langle \cdot, \cdot \rangle$ on $\mathbb{R}^d$.
(a) We have
\[ \bigcup \{\alpha : \alpha \in S_{n,r}(P)\} \subset \operatorname{cl\,conv}(\operatorname{supp}(P)). \]
This follows from Lemma 2.6(a). Furthermore, by Theorem 4.2, $C_{n,r}(P) \subset SS_{n,r}(P)$ provided $r > 1$. Recall that in case $r = 2$, the second condition of (4.1) means
\[ a = E(X \,|\, X \in W(a|\alpha)), \quad a \in \alpha. \]
(b) If $\alpha \in SS_{n,2}(P)$, then
\[ \sum_{a \in \alpha} a P(W(a|\alpha)) = EX. \]
In case $d = 1$, a simple but sometimes useful consequence is that $\min \alpha < EX < \max \alpha$ holds for $\alpha \in SS_{n,2}(P)$ ($n \ge 2$).
(c) If $\alpha \in SS_{n,2}(P)$, then
\[ E \min_{a \in \alpha} \|X - a\|^2 = E\|X\|^2 - \sum_{a \in \alpha} \|a\|^2 P(W(a|\alpha)) = V_2(X) + \|EX\|^2 - \sum_{a \in \alpha} \|a\|^2 P(W(a|\alpha)). \]
Hence, for $\alpha \subset \mathbb{R}^d$ we have $\alpha \in C_{n,2}(P)$ if and only if $\alpha \in SS_{n,2}(P)$ and
\[ \sum_{a \in \alpha} \|a\|^2 P(W(a|\alpha)) = \max_{\beta \in SS_{n,2}(P)} \sum_{b \in \beta} \|b\|^2 P(W(b|\beta)). \]
(d) If $f \in \mathcal{F}_n$ is an n-optimal quantizer for $P$ of order 2, then
\[ E f(X) = EX, \quad E\langle X - f(X), f(X) \rangle = 0, \quad V_{n,2}(X) = E\|X - f(X)\|^2 = E\|X\|^2 - E\|f(X)\|^2. \]
This follows from (b) and (c).

The sets $S_{n,r}(X)$ and $SS_{n,r}(X)$ have the same equivariance property as $C_{n,r}(X)$.

4.7 Lemma
Let $T : \mathbb{R}^d \to \mathbb{R}^d$ be a similarity transformation. Then
\[ S_{n,r}(T(X)) = T S_{n,r}(X), \quad SS_{n,r}(T(X)) = T SS_{n,r}(X). \]

Proof
Easy consequence of the equivariance properties of Voronoi regions and $C_r(X)$ given in Lemmas 1.6 and 2.1 (a). □

Stationary product quantizers are discussed in the following lemma.

4.8 Lemma (Product quantizers)
Let the underlying norm be the $l_r$-norm. Let $n_i \in \mathbb{N}$, $\beta_i \subset \mathbb{R}$ with $|\beta_i| \le n_i$, $1 \le i \le d$, and $\alpha = \times_{i=1}^d \beta_i$.
(a) Suppose that $X_1, \ldots, X_d$ are independent and let $n = \prod_{i=1}^d n_i$. Then $\alpha \in S_{n,r}(X)$ if and only if $\beta_i \in S_{n_i,r}(X_i)$ for every $i$.
(b) If $\beta_i \in C_{n_i,r}(X_i)$ for every $i$, then
\[ E \min_{a \in \alpha} \|X - a\|^r = \sum_{i=1}^d V_{n_i,r}(X_i). \]
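Proof
(a) Let $P_i$ denote the distribution of $X_i$, $1 \le i \le d$. For $a = (b_1, \ldots, b_d) \in \alpha$ with $b_i \in \beta_i$ for every $i$, we have
\[ W(a|\alpha) = \times_{i=1}^d W(b_i|\beta_i). \]
Assume $\alpha \in S_{n,r}(X)$. Then $\prod_{i=1}^d |\beta_i| = |\alpha| = n$ and
\[ \prod_{i=1}^d P_i(W(b_i|\beta_i)) = P(W(a|\alpha)) > 0, \quad a = (b_1, \ldots, b_d) \in \alpha, \]
which gives $|\beta_i| = n_i$ and $P_i(W(b_i|\beta_i)) > 0$ for every $i$. Fix $i$ and let $c = (c_1, \ldots, c_d) \in \mathbb{R}^d$ with $c_j = b_j$ for $j \ne i$. Then
\[ \sum_{j=1}^d \int_{W(a|\alpha)} |x_j - b_j|^r \, dP(x) = \int_{W(a|\alpha)} \|x - a\|^r \, dP(x) \le \int_{W(a|\alpha)} \|x - c\|^r \, dP(x) = \sum_{j \ne i} \int_{W(a|\alpha)} |x_j - b_j|^r \, dP(x) + \int_{W(a|\alpha)} |x_i - c_i|^r \, dP(x) \]
and hence
\[ \int_{W(a|\alpha)} |x_i - b_i|^r \, dP(x) \le \int_{W(a|\alpha)} |x_i - c_i|^r \, dP(x). \]
Since for $c_i \in \mathbb{R}$
\[ \int_{W(b_i|\beta_i)} |x_i - c_i|^r \, dP_i(x_i) / P_i(W(b_i|\beta_i)) = \int_{W(a|\alpha)} |x_i - c_i|^r \, dP(x) / P(W(a|\alpha)), \]
this yields $b_i \in C_r(P_i(\cdot\,|W(b_i|\beta_i)))$. Therefore, $\beta_i \in S_{n_i,r}(X_i)$.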
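Conversely, assume $\beta_i \in S_{n_i,r}(X_i)$ for every $i$. Then $|\alpha| = n$ and $P(W(a|\alpha)) > 0$. Let $c = (c_1, \ldots, c_d) \in \mathbb{R}^d$. One obtains
\[ \int_{W(a|\alpha)} \|x - a\|^r \, dP(x) / P(W(a|\alpha)) = \sum_{i=1}^d \int_{W(b_i|\beta_i)} |x_i - b_i|^r \, dP_i(x_i) / P_i(W(b_i|\beta_i)) \le \sum_{i=1}^d \int_{W(b_i|\beta_i)} |x_i - c_i|^r \, dP_i(x_i) / P_i(W(b_i|\beta_i)) = \int_{W(a|\alpha)} \|x - c\|^r \, dP(x) / P(W(a|\alpha)) \]
and hence $a \in C_r(P(\cdot\,|W(a|\alpha)))$. Therefore, $\alpha \in S_{n,r}(X)$.
(b) We have
\[ E \min_{a \in \alpha} \|X - a\|^r = \sum_{i=1}^d E \min_{b \in \beta_i} |X_i - b|^r = \sum_{i=1}^d V_{n_i,r}(X_i). \]
□

The n-stationary sets for $P$ are related to the stationary points of the function $\psi_{n,r}$ (see (3.4)).

4.9 Lemma
$\psi_{n,r}$ is continuous on $(\mathbb{R}^d)^n$.

Proof
Immediate consequence of the continuity of $(a_1, \ldots, a_n) \mapsto \min_{1 \le i \le n} \|x - a_i\|^r$ for every $x \in \mathbb{R}^d$ (together with dominated convergence). □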
4.10 Lemma
Let $a_1, \ldots, a_n \in \mathbb{R}^d$ with $a_i \ne a_j$ for $i \ne j$. Suppose the Voronoi diagram of $\alpha = \{a_1, \ldots, a_n\}$ is a P-tesselation of $\mathbb{R}^d$. Then $\psi_{n,r}$ has a one-sided directional derivative at $a = (a_1, \ldots, a_n)$ in every direction $y = (y_1, \ldots, y_n) \in (\mathbb{R}^d)^n$ given by
\[ \nabla^+ \psi_{n,r}(a, y) = r \sum_{i=1}^n \int_{W(a_i|\alpha)} \|x - a_i\|^{r-1} \nabla^+\|\cdot\|(a_i - x, y_i) \, dP(x). \]
If the underlying norm is smooth and furthermore, $r > 1$ or $P(\alpha) = 0$, then $\psi_{n,r}$ is differentiable at the point $a$ with derivative
\[ \nabla \psi_{n,r}(a) = \Bigl( r \int_{W(a_i|\alpha) \setminus \{a_i\}} \|x - a_i\|^{r-1} \nabla\|\cdot\|(a_i - x) \, dP(x) \Bigr)_{1 \le i \le n}. \]
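Proof
Recall that
\[ W_0(a_i|\alpha) = \bigcap_{j \ne i} \{x \in \mathbb{R}^d : \|x - a_i\| < \|x - a_j\|\}. \]
For $b = (b_1, \ldots, b_n) \in (\mathbb{R}^d)^n$ set $d(x,b) = \min_{1 \le i \le n} \|x - b_i\|$. Since the Voronoi diagram of $\alpha$ is a P-tesselation, we have
\[ P\Bigl(\bigcup_{i=1}^n W_0(a_i|\alpha)\Bigr) = 1. \]
Furthermore, we have
\[ |d(x, a+b) - d(x,a)| \le \max_{1 \le i \le n} \bigl|\, \|x - a_i - b_i\| - \|x - a_i\| \,\bigr| \le \max_{1 \le i \le n} \|b_i\| \]
and since
\[ |u^r - v^r| \le r \max\{u^{r-1}, v^{r-1}\} |u - v|, \quad u, v \ge 0, \]
we obtain
\[ |d(x, a+b)^r - d(x,a)^r| \le (C_1 \|x\|^{r-1} + C_2) \max_{1 \le i \le n} \|b_i\|, \tag{4.2} \]
$x \in \mathbb{R}^d$, $b \in (\mathbb{R}^d)^n$ with $\max_{1 \le i \le n} \|b_i\| \le 1$, for some constants $C_1, C_2 \ge 0$ not depending on $x$ and $b$. Let $y = (y_1, \ldots, y_n) \in (\mathbb{R}^d)^n$. Then
\[ t^{-1}(\psi_{n,r}(a + ty) - \psi_{n,r}(a)) = \sum_{i=1}^n \int_{W_0(a_i|\alpha)} t^{-1}\bigl(d(x, a+ty)^r - d(x,a)^r\bigr) \, dP(x). \]
For $x \in W_0(a_i|\alpha)$, there exists $\varepsilon > 0$ such that the $\mathbb{R}^d$-components of $a + ty$ are pairwise different and
\[ x \in W(a_i + t y_i \,|\, \{a_1 + t y_1, \ldots, a_n + t y_n\}) \]
for every $0 < t \le \varepsilon$. This implies
\[ t^{-1}\bigl(d(x, a+ty)^r - d(x,a)^r\bigr) = t^{-1}\bigl(\|x - (a_i + t y_i)\|^r - \|x - a_i\|^r\bigr) \to \nabla^+\|\cdot\|^r(a_i - x, y_i) \quad \text{as } t \to 0+. \]
Thus the assertion about the one-sided directional derivative of $\psi_{n,r}$ follows from Lebesgue's dominated convergence theorem in view of (4.2).

Now assume that the underlying norm is smooth. Then we have
\[ \Bigl(\max_{1 \le j \le n} \|b_j\|\Bigr)^{-1} \Bigl| \psi_{n,r}(a+b) - \psi_{n,r}(a) - \sum_{i=1}^n \Bigl\langle r \int_{W(a_i|\alpha) \setminus \{a_i\}} \|x - a_i\|^{r-1} \nabla\|\cdot\|(a_i - x) \, dP(x), b_i \Bigr\rangle \Bigr| \le \Bigl(\max_{1 \le j \le n} \|b_j\|\Bigr)^{-1} \sum_{i=1}^n \int_{W_0(a_i|\alpha)} \bigl| d(x, a+b)^r - d(x,a)^r - \langle \nabla\|\cdot\|^r(a_i - x), b_i \rangle \bigr| \, dP(x), \]
where $\langle x, z \rangle = \sum_{j=1}^d x_j z_j$, $x, z \in \mathbb{R}^d$. For $x \in W_0(a_i|\alpha)$, there exists $\varepsilon > 0$ such that the $\mathbb{R}^d$-components of $a + b$ are pairwise different and
\[ x \in W(a_i + b_i \,|\, \{a_1 + b_1, \ldots, a_n + b_n\}) \]
for every $b \in (\mathbb{R}^d)^n$ with $\max_{1 \le j \le n} \|b_j\| < \varepsilon$. This implies
\[ \Bigl(\max_{1 \le j \le n} \|b_j\|\Bigr)^{-1} \bigl( d(x, a+b)^r - d(x,a)^r - \langle \nabla\|\cdot\|^r(a_i - x), b_i \rangle \bigr) \to 0 \quad \text{as } \max_{1 \le j \le n} \|b_j\| \to 0 \quad (x \ne a_i \text{ if } r = 1). \]
In view of (4.2), the assertion about $\nabla \psi_{n,r}(a)$ follows from Lebesgue's dominated convergence theorem. □

Consequently, in view of Lemma 2.5, n-stationary sets $\alpha \in SS_{n,r}(P)$ of centers provide stationary points of $\psi_{n,r}$, i.e. $\nabla^+ \psi_{n,r}(a, y) \ge 0$ for every $y \in (\mathbb{R}^d)^n$. The following example taken from Lloyd (1982) shows that an n-stationary set of centers does not necessarily yield a local minimum point of $\psi_{n,r}$.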
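4.11 Example
Let $P = c_1 U([-1,0]) + c_2 U([0,1])$ with $c_2 > c_1 > 0$, $c_1 + c_2 = 1$, and let $n = 2$, $r = 2$. Then $EX = \frac12(c_2 - c_1)$, $EX^2 = \frac13$ and $\{-\frac12, \frac12\} \in S_{2,2}(P)$. For $-1 < a_1 < a_2 < 1$, we have
\[ \psi(a_1, a_2) = \begin{cases} \dfrac{c_1}{3}\Bigl[\dfrac{(a_2 - a_1)^3}{4} + (1 + a_1)^3 - a_2^3\Bigr] + \dfrac{c_2}{3}\bigl[(1 - a_2)^3 + a_2^3\bigr] & \text{if } a_1 + a_2 \le 0, \\[2ex] \dfrac{c_1}{3}\bigl[(1 + a_1)^3 - a_1^3\bigr] + \dfrac{c_2}{3}\Bigl[\dfrac{(a_2 - a_1)^3}{4} + a_1^3 + (1 - a_2)^3\Bigr] & \text{if } a_1 + a_2 > 0, \end{cases} \]
where $\psi = \psi_{2,2}$. One obtains $\psi(-\frac12, \frac12) = \frac{1}{12}$ and
\[ \psi\Bigl(-\frac12 + \varepsilon, \frac12 + \frac{\varepsilon}{3}\Bigr) = \frac{1}{12} + c_1 \varepsilon^2 + c_2 \varepsilon^2 \Bigl(\frac{8\varepsilon}{27} - \frac13\Bigr) < \frac{1}{12} \]
for every $0 < \varepsilon < \frac{9(4c_2 - 3)}{8 c_2}$ provided $c_2 > \frac34$. Thus $(-\frac12, \frac12)$ is not a local minimum point of $\psi$ in case $c_2 > \frac34$. (It is also not a local maximum point of $\psi$.) We have
\[ C_{2,2}(P) = S_{2,2}(P) = \bigl\{\{-\tfrac12, \tfrac12\}\bigr\} \quad \text{if } c_2 \le \tfrac34 \]
and
\[ S_{2,2}(P) = C_{2,2}(P) \cup \bigl\{\{-\tfrac12, \tfrac12\}\bigr\}, \quad C_{2,2}(P) = \Bigl\{\Bigl\{\frac{10 c_2 - 9}{4 c_2}, \frac{6 c_2 - 3}{4 c_2}\Bigr\}\Bigr\} \quad \text{if } c_2 > \tfrac34. \]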
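The saddle behaviour in Example 4.11 can be checked numerically. The sketch below evaluates $\psi_{2,2}$ by a midpoint rule and compares it with the closed form on the branch $a_1 + a_2 > 0$. The perturbation direction $(\varepsilon, \varepsilon/3)$, the value $c_2 = 0.9$ and the step count are illustrative assumptions, not data from the text.

```python
def psi(a1, a2, c2, steps=20000):
    """psi_{2,2}(a1, a2) for P = c1*U[-1,0] + c2*U[0,1], by the midpoint rule."""
    c1 = 1.0 - c2
    h = 2.0 / steps
    total = 0.0
    for k in range(steps):
        x = -1.0 + (k + 0.5) * h
        density = c1 if x < 0 else c2          # Lebesgue density of P on [-1, 1]
        total += density * min((x - a1) ** 2, (x - a2) ** 2) * h
    return total


def psi_closed(a1, a2, c2):
    """Closed form of psi_{2,2} on the branch a1 + a2 > 0."""
    c1 = 1.0 - c2
    return (c1 / 3) * ((1 + a1) ** 3 - a1 ** 3) + (c2 / 3) * (
        (a2 - a1) ** 3 / 4 + a1 ** 3 + (1 - a2) ** 3)


if __name__ == "__main__":
    c2, eps = 0.9, 0.1
    print(psi(-0.5, 0.5, c2))                     # value at the stationary pair
    print(psi(-0.5 + eps, 0.5 + eps / 3, c2))     # smaller value nearby
    print(psi(-0.5 + eps, 0.5 + eps, c2))         # larger value nearby
```

With $c_2 = 0.9 > 3/4$ the first perturbation lowers the error below $1/12$ while the second raises it, so the pair $(-\frac12, \frac12)$ is neither a local minimum nor a local maximum.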
The next theorem ensures the existence of n-optimal quantizers. We follow the lines of Pollard's (1982a) proof for the euclidean case and $r = 2$.

4.12 Theorem (Existence)
We have $V_{n,r}(P) < V_{n-1,r}(P)$. The level set $\{\psi_{n,r} \le c\}$ is compact for every $0 \le c < V_{n-1,r}(P)$. In particular, $C_{n,r}(P)$ is not empty and $\bigcup\{\alpha : \alpha \in C_{n,r}(P)\}$ is a bounded subset of $\mathbb{R}^d$.

Proof
By Lemma 4.9, the level sets of $\psi_{n,r}$ are closed. Choose $0 < s < S$ (depending on $n$, $r$, $P$ and $c$) such that
\[ P(B(0,s)) > 0, \quad (S - s)^r P(B(0,s)) > c, \quad 2^r \int_{B(0,2S)^c} \|x\|^r \, dP(x) < V_{n-1,r}(P) - c. \]
Let $(a_1, \ldots, a_n) \in \{\psi_{n,r} \le c\}$. Since $c < V_{n-1,r}(P)$, we have $a_i \ne a_j$ for $i \ne j$. Assume without loss of generality $\|a_1\| \le \ldots \le \|a_n\|$. Then $\|a_1\| \le S$. Otherwise
\[ c \ge \int_{B(0,s)} \min_{1 \le i \le n} \|x - a_i\|^r \, dP(x) \ge (S - s)^r P(B(0,s)), \]
a contradiction. Furthermore, $\|a_n\| \le 5S$. Otherwise
\[ \|x - a_1\| \le \|x - a_n\| 1_{B(0,2S)}(x) + 2\|x\| 1_{B(0,2S)^c}(x) \]
for every $x \in \mathbb{R}^d$ because $\|a_1\| \le S$. Therefore, if $\{A_1, \ldots, A_n\}$ denotes a Voronoi partition of $\mathbb{R}^d$ with respect to $\{a_1, \ldots, a_n\}$, one gets
\[ V_{n-1,r}(P) \le \psi_{n-1,r}(a_1, \ldots, a_{n-1}) = \sum_{j=1}^n \int_{A_j} \min_{1 \le i \le n-1} \|x - a_i\|^r \, dP(x) \le \sum_{j=1}^{n-1} \int_{A_j} \|x - a_j\|^r \, dP(x) + \int_{A_n} \|x - a_1\|^r \, dP(x) \]
\[ \le \sum_{j=1}^n \int_{A_j} \|x - a_j\|^r \, dP(x) + 2^r \int_{B(0,2S)^c} \|x\|^r \, dP(x) < \psi_{n,r}(a_1, \ldots, a_n) + V_{n-1,r}(P) - c \le V_{n-1,r}(P), \]
a contradiction. We thus obtain $\{\psi_{n,r} \le c\} \subset B(0,5S)^n$. Hence, if $V_{n,r}(P) < c < V_{n-1,r}(P)$, the level set $\{\psi_{n,r} \le c\}$ is not empty and compact. This implies that $\{\psi_{n,r} = V_{n,r}(P)\}$ is not empty and compact and, in particular, $C_{n,r}(P) \ne \emptyset$ provided $V_{n,r}(P) < V_{n-1,r}(P)$. Finally, we observe that this condition holds. We have $C_r(P) \ne \emptyset$ by Lemma 2.2. Therefore, $V_{2,r}(P) < V_r(P)$, since otherwise there exists a 2-optimal set $\alpha$ of centers with $|\alpha| = 1$ which contradicts Theorem 4.1. Proceed inductively: if $V_{m,r}(P) < V_{m-1,r}(P)$ for some $2 \le m \le n - 1$, then $C_{m,r}(P) \ne \emptyset$ by the preceding part of the proof and hence, $V_{m+1,r}(P) < V_{m,r}(P)$ again by Theorem 4.1. □

From Example 4.5 we know that there may be more than one n-optimal set of centers for $P$. Here is another example of this fact for a univariate symmetric distribution.

4.13 Example
Let $P$ denote the uniform distribution on $[-2,-1] \cup [1,2]$ and let $n = 3$, $r = 2$. Then
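\[ C_{3,2}(P) = \Bigl\{ \Bigl\{-\frac74, -\frac54, \frac32\Bigr\}, \Bigl\{-\frac32, \frac54, \frac74\Bigr\} \Bigr\}. \]
(Note that $\alpha = \{-\frac32, 0, \frac32\}$ is not a 3-stationary set, since $P(W(0|\alpha)) = P([-\frac34, \frac34]) = 0$. However, $(-\frac32, 0, \frac32)$ is a stationary point of $\psi_{3,2}$.) Here we have $V_2(P) = \operatorname{Var} X = 7/3$ and $V_{3,2}(P) = 5/96$. Thus already 3-level quantization reduces the variance considerably.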
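The error values quoted in Example 4.13 are easy to reproduce numerically. The following sketch integrates $\min_a |x - a|^2$ against the uniform distribution on $[-2,-1] \cup [1,2]$ with a midpoint rule; the function name and the step count are illustrative choices.

```python
def quantization_error(centers, steps=20000):
    """E min_a |X - a|^2 for X uniform on [-2,-1] U [1,2] (midpoint rule)."""
    h = 1.0 / steps
    total = 0.0
    for lo in (-2.0, 1.0):                 # the two unit intervals of the support
        for k in range(steps):
            x = lo + (k + 0.5) * h
            total += 0.5 * min((x - a) ** 2 for a in centers) * h   # density 1/2
    return total


if __name__ == "__main__":
    print(quantization_error([-1.5, 1.25, 1.75]))   # close to 5/96
    print(quantization_error([-1.75, -1.25, 1.5]))  # the mirrored optimum
    print(quantization_error([-1.5, 0.0, 1.5]))     # close to 1/12
```

The two mirrored sets give the same error $5/96$, while the symmetric triple with the unused center $0$ only achieves $1/12$, the error of the 2-level quantizer $\{-\frac32, \frac32\}$.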
For dimensions $d \ge 2$, typically $|C_{n,r}(P)| \ge 2$ holds. This is related to the equivariance property of $C_{n,r}(P)$ (see Lemma 3.2). A uniqueness criterion for univariate distributions is discussed in the next section.
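4.2 The functional $V_{n,r}$

The following simple properties of the n-th quantization error functional turn out to be useful.

4.14 Lemma
Let $P = \sum_{i=1}^m s_i P_i$, $s_i \ge 0$, $\sum_{i=1}^m s_i = 1$, $\int \|x\|^r \, dP_i(x) < \infty$.
(a) (Concavity)
\[ V_{n,r}(P) \ge \sum_{i=1}^m s_i V_{n,r}(P_i). \]
(b) Let $n_i \in \mathbb{N}$ with $\sum_{i=1}^m n_i \le n$, let $\alpha_i \in C_{n_i,r}(P_i)$ and $\alpha = \bigcup_{i=1}^m \alpha_i$. Then
\[ V_{n,r}(P) \le \int \min_{a \in \alpha} \|x - a\|^r \, dP(x) \le \sum_{i=1}^m s_i V_{n_i,r}(P_i). \]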
Proof
(a) Let $\alpha \in C_{n,r}(P)$. Then
\[ V_{n,r}(P) = \int \min_{a \in \alpha} \|x - a\|^r \, dP(x) = \sum_{i=1}^m s_i \int \min_{a \in \alpha} \|x - a\|^r \, dP_i(x) \ge \sum_{i=1}^m s_i V_{n,r}(P_i). \]
(b) Since $|\alpha| \le n$, we have
\[ V_{n,r}(P) \le \int \min_{a \in \alpha} \|x - a\|^r \, dP(x) = \sum_{i=1}^m s_i \int \min_{a \in \alpha} \|x - a\|^r \, dP_i(x) \le \sum_{i=1}^m s_i \int \min_{a \in \alpha_i} \|x - a\|^r \, dP_i(x) = \sum_{i=1}^m s_i V_{n_i,r}(P_i). \]
□

Let $X = (X_1, \ldots, X_d)$.
4.15 Lemma (One-dimensional marginals)
Let the underlying norm be the $l_r$-norm. If $n_i \in \mathbb{N}$, $\prod_{i=1}^d n_i \le n$, then
\[ V_{n,r}(X) \le \sum_{i=1}^d V_{n_i,r}(X_i) \]
and equality holds if and only if there exists $\alpha \in C_{n,r}(X)$ of the type $\alpha = \times_{i=1}^d \beta_i$ with $\beta_i \subset \mathbb{R}$ and $|\beta_i| = n_i$ for every $i$. Moreover, such n-optimal product sets $\alpha$ satisfy $\beta_i \in C_{n_i,r}(X_i)$ for every $i$.

Proof
For $1 \le i \le d$, let $\beta_i \subset \mathbb{R}$ with $|\beta_i| \le n_i$ and let $\alpha = \times_{i=1}^d \beta_i$. Then $|\alpha| \le \prod_{i=1}^d n_i \le n$ and
\[ V_{n,r}(X) \le E \min_{a \in \alpha} \|X - a\|^r = \sum_{i=1}^d E \min_{b \in \beta_i} |X_i - b|^r. \]
Therefore, if we choose $\beta_i \in C_{n_i,r}(X_i)$ for every $i$, we obtain
\[ V_{n,r}(X) \le \sum_{i=1}^d V_{n_i,r}(X_i) \]
(cf. Lemma 4.8 (b)) and if equality holds, then $\alpha \in C_{n,r}(X)$. In particular, $|\alpha| = n$ by Theorem 4.1 which gives $|\beta_i| = n_i$ for every $i$. Conversely, assume $\alpha \in C_{n,r}(X)$. Then
\[ V_{n,r}(X) = \sum_{i=1}^d E \min_{b \in \beta_i} |X_i - b|^r \ge \sum_{i=1}^d V_{n_i,r}(X_i) \]
implying $V_{n,r}(X) = \sum_{i=1}^d V_{n_i,r}(X_i)$ and $\beta_i \in C_{n_i,r}(X_i)$ for every $i$. □
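The additivity used in Lemma 4.8 (b) and in the proof of Lemma 4.15 holds pointwise: for a product set $\alpha = \times_i \beta_i$ and the $l_r$-norm, $\min_{a \in \alpha} \|x - a\|^r$ is the sum of the coordinatewise minima. The sketch below confirms this with exact rational arithmetic for a small discrete law on $\mathbb{R}^2$; the sample points, the center sets and the helper names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product


def expected_min(sample, centers, cost):
    """E min_{a in centers} cost(x, a) for a finite law given as (point, prob) pairs."""
    return sum(p * min(cost(x, a) for a in centers) for x, p in sample)


def product_identity(sample, center_sets, r=2):
    """Return both sides of E min_{a in alpha} ||X - a||_r^r = sum_i E min_{b in beta_i} |X_i - b|^r."""
    alpha = list(product(*center_sets))               # product set of centers
    lr_cost = lambda x, a: sum(abs(xi - ai) ** r for xi, ai in zip(x, a))
    lhs = expected_min(sample, alpha, lr_cost)
    rhs = sum(
        expected_min([(x[i], p) for x, p in sample], beta,
                     lambda u, b: abs(u - b) ** r)
        for i, beta in enumerate(center_sets)
    )
    return lhs, rhs


if __name__ == "__main__":
    quarter = Fraction(1, 4)
    sample = [((0, 0), quarter), ((1, 3), quarter), ((2, 1), quarter), ((5, 4), quarter)]
    center_sets = [[Fraction(1, 2), Fraction(4)], [Fraction(1), Fraction(4)]]
    print(product_identity(sample, center_sets))
```

Note that the coordinates of the sample are deliberately dependent: independence is needed for the stationarity statement in Lemma 4.8 (a), not for this identity.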
4.3 Quantization error for ball packings

Ball packings consisting of $n$ translates of a ball minimize the normalized n-th quantization error for bounded sets. This observation extends the corresponding statement of Lemma 2.9 for balls to the case $n \ge 2$. By a $\mu$-packing in $\mathbb{R}^d$ we mean a countable family $\{C_j : j \in J\}$ of Borel sets $C_j \subset \mathbb{R}^d$ such that $\mu(C_i \cap C_j) = 0$ for $i \ne j$. A $\lambda^d$-packing is simply called packing.

4.16 Theorem (Ball packing theorem)
Let $s > 0$ and $a_1, \ldots, a_n \in \mathbb{R}^d$ such that $\{B(a_i, s) : i = 1, \ldots, n\}$ is a packing in $\mathbb{R}^d$. Let $B = \bigcup_{i=1}^n B(a_i, s)$. Then
\[ M_{n,r}(B) = \min\{M_{n,r}(A) : A \in \mathcal{B}(\mathbb{R}^d) \text{ bounded}, \lambda^d(A) > 0\}. \]
Moreover,
\[ M_{n,r}(B) = n^{-r/d} M_r(B(0,1)), \quad \{a_1, \ldots, a_n\} \in C_{n,r}(U(B)) \]
and $f = \sum_{i=1}^n a_i 1_{B(a_i,s)}$ is ($U(B)$-a.s. equal to) an n-optimal quantizer for $U(B)$ of order $r$.
Proof
Let $A \in \mathcal{B}(\mathbb{R}^d)$ be bounded with $\lambda^d(A) > 0$ and denote by $Q$ the uniform distribution $U(A)$. Let $\mathcal{C}$ be an n-optimal partition for $Q$ of order $r$. By Corollary 4.3 we know that $|\mathcal{C}| = n$ and $Q(C) > 0$ for every $C \in \mathcal{C}$. Note that $Q(\cdot\,|C) = U(A \cap C)$. One obtains from Lemma 2.9
\[ V_{n,r}(Q) = \sum_{C \in \mathcal{C}} V_r(Q(\cdot\,|C)) Q(C) = \sum_{C \in \mathcal{C}} M_r(A \cap C) \lambda^d(A \cap C)^{r/d} Q(C) \ge M_r(B(0,1)) \lambda^d(A)^{r/d} \sum_{C \in \mathcal{C}} Q(C)^{(d+r)/d}. \]
Hölder's inequality with $p = (d+r)/d$ and $q = (d+r)/r$ gives
\[ 1 = \sum_{C \in \mathcal{C}} Q(C) \le \Bigl(\sum_{C \in \mathcal{C}} Q(C)^p\Bigr)^{1/p} n^{1/q} \]
and hence
\[ \sum_{C \in \mathcal{C}} Q(C)^{(d+r)/d} \ge n^{-r/d}. \]
This implies
\[ V_{n,r}(Q) \ge (\lambda^d(A)/n)^{r/d} M_r(B(0,1)) \]
and
\[ M_{n,r}(A) \ge n^{-r/d} M_r(B(0,1)). \]
Now let $B = \bigcup_{i=1}^n B(a_i, s)$ and denote by $P$ the uniform distribution $U(B)$. Note that $P = \frac1n \sum_{i=1}^n U(B(a_i, s))$. By Lemma 4.14 (b), we have
\[ V_{n,r}(P) \le \int \min_{1 \le i \le n} \|x - a_i\|^r \, dP(x) = V_r(U(B(0,s))) = (\lambda^d(B)/n)^{r/d} M_r(B(0,1)). \]
The last equality follows from the scale invariance of $M_r$ (see Lemma 2.1). Thus we obtain
\[ V_{n,r}(P) = \int \min_{1 \le i \le n} \|x - a_i\|^r \, dP(x) = V_r(U(B(0,s))) \]
and
\[ M_{n,r}(B) = n^{-r/d} M_r(B(0,1)). \]
Furthermore, let $\{A_1, \ldots, A_n\}$ be a Borel measurable partition of $\mathbb{R}^d$ with $B \cap A_i \subset B(a_i, s)$ for every $i$ and let $g = \sum_{i=1}^n a_i 1_{A_i}$. Then $f = g$ P-a.s. and since
\[ W(a_i|\{a_1, \ldots, a_n\}) \cap B = B(a_i, s), \]
$\{A_1, \ldots, A_n\}$ is a Voronoi partition with respect to $\{a_1, \ldots, a_n\}$ and $P$. Hence, the theorem is proved. □
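The decisive step in the proof above is the bound $\sum_C Q(C)^{(d+r)/d} \ge n^{-r/d}$, with equality for cells of equal mass. A minimal numerical illustration follows; the mass vectors are arbitrary examples and the function name is not from the text.

```python
def power_sum(masses, d, r):
    """Sum of Q(C)^((d + r) / d) over the cells of a partition with the given masses."""
    return sum(q ** ((d + r) / d) for q in masses)


def lower_bound(n, d, r):
    """The Hoelder lower bound n^(-r/d) for a partition into n cells."""
    return n ** (-r / d)


if __name__ == "__main__":
    print(power_sum([0.25] * 4, 2, 2), lower_bound(4, 2, 2))            # equal masses
    print(power_sum([0.4, 0.3, 0.2, 0.1], 2, 2), lower_bound(4, 2, 2))  # unequal masses
```

Equal masses attain the bound, which is why a packing by $n$ congruent balls, where every cell carries mass $1/n$, is extremal.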
4.4 Examples
We present some examples of optimal quantizers and stationary sets of centers for dimensions $d \ge 2$. Optimal quantizers for several univariate distributions are given in the next section.

4.17 Example (Uniform distribution on a cube and the cube quantizer)
Let $P = U([0,1]^d)$ and consider a tesselation of $[0,1]^d$ consisting of $n = k^d$ translates $C_1, \ldots, C_n$ of the cube $[0, \frac1k]^d$. Denote by $a_i$ the midpoint of $C_i$. Then $\alpha = \{a_1, \ldots, a_n\} = \{\frac{2i-1}{2k} : i = 1, \ldots, k\}^d$ and by the symmetry of $C_i$ about $a_i$, we have $a_i \in C_r(U(C_i)) = C_r(P(\cdot\,|C_i))$. Let $f_n = \sum_{i=1}^n a_i 1_{C_i}$; see Figure 4.2. From scale and translation invariance of $M_r$ it follows that
\[ E\|X - f_n(X)\|^r = \sum_{i=1}^n \int_{C_i} \|x - a_i\|^r \, dx = \sum_{i=1}^n M_r(C_i) P(C_i)^{(d+r)/d} = n^{-r/d} M_r([0,1]^d). \]

Figure 4.2: Square quantizer for $U([0,1]^2)$

We see that n-level quantization for $P$ reduces $V_r(P) = M_r([0,1]^d)$ at least by a factor $n^{-r/d}$. Note that
\[ M_r([0,1]^d) = \int_{[-\frac12, \frac12]^d} \|x\|^r \, dx. \]
For instance, we have for the $l_r$-norm, $1 \le r < \infty$,
\[ M_r([0,1]^d) = \frac{d}{(1+r) 2^r} \]
and for the $l_\infty$-norm
\[ M_r([0,1]^d) = M_r(B(0,1)) = \frac{d}{(d+r) 2^r} \]
(cf. Lemma 2.9). Note further that $\alpha \in S_{n,r}(P)$ and
\[ E\|X - f_n(X)\|^r = E \min_{1 \le i \le n} \|X - a_i\|^r = \rho_r^r(P, P^{f_n}) \]
provided $C_i = [0,1]^d \cap W(a_i|\alpha)$ for every $i$. This condition is satisfied, for instance, if the underlying norm is the $l_p$-norm for $1 \le p \le \infty$. The error of the cube quantizer $f_n$ is of optimal order $n^{-r/d}$ but the constant $M_r([0,1]^d)$ is conjectured to be not optimal for common norms with one exception. (This will be seen from the asymptotics for the n-th quantization error as $n \to \infty$ treated in Chapter II.) The exceptional case concerns the $l_\infty$-norm in arbitrary dimensions. In this case, we have $C_i = B(a_i, \frac{1}{2k})$ and therefore, by Theorem 4.16, $f_n$ is (P-a.s. equal to) an n-optimal quantizer of order $r$ and $\alpha \in C_{n,r}(P)$ for every $r \ge 1$. In particular, for the $l_\infty$-norm we obtain
\[ V_{n,r}(P) = n^{-r/d} \frac{d}{(d+r) 2^r}. \tag{4.3} \]
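The closed forms in Example 4.17 can be compared with a direct numerical integration. The sketch below treats $d = 2$ only and approximates $E \min_i \|X - a_i\|^r$ on a regular grid by the midpoint rule; the grid size `m`, the restriction to the plane and the function names are assumptions made for illustration.

```python
from itertools import product


def cube_quantizer_error(k, r, norm, m=120):
    """Midpoint-rule approximation of E min_i ||X - a_i||^r for X ~ U([0,1]^2).

    The centers a_i are the midpoints of the k*k subsquares of side 1/k.
    m should be a multiple of k so that no grid cell straddles two subsquares.
    """
    centers = [((2 * i - 1) / (2 * k), (2 * j - 1) / (2 * k))
               for i, j in product(range(1, k + 1), repeat=2)]
    h = 1.0 / m
    total = 0.0
    for p, q in product(range(m), repeat=2):
        x = ((p + 0.5) * h, (q + 0.5) * h)
        total += min(norm(x[0] - a[0], x[1] - a[1]) for a in centers) ** r
    return total * h * h


def l2(u, v):
    """Euclidean norm on R^2."""
    return (u * u + v * v) ** 0.5


def linf(u, v):
    """Maximum norm on R^2."""
    return max(abs(u), abs(v))


if __name__ == "__main__":
    print(cube_quantizer_error(3, 2, l2), 1 / 54)    # n^{-r/d} d / ((1 + r) 2^r)
    print(cube_quantizer_error(3, 1, linf), 1 / 9)   # n^{-r/d} d / ((d + r) 2^r)
```

For $k = 3$ (so $n = 9$) the two reference values are $9^{-1} \cdot \frac{2}{3 \cdot 4} = \frac{1}{54}$ for the $l_2$-norm with $r = 2$ and $9^{-1/2} \cdot \frac{2}{3 \cdot 2} = \frac19$ for the $l_\infty$-norm with $r = 1$.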
4.18 Example (Spherical distributions)
Let the underlying norm be the $l_2$-norm on $\mathbb{R}^d$ and let $P$ be a spherical probability, that is, $P$ is invariant under the orthogonal group $O(\mathbb{R}^d)$. Consider the case $r = 2$ and $n = 2$. Suppose $E\|X\|^2 < \infty$. If $\alpha = \{a_1, a_2\} \in SS_{2,2}(X)$ then $0 = EX = \sum_{i=1}^2 a_i P(W(a_i|\alpha))$ by Remark 4.6 (b) and hence, one can find $T \in O(\mathbb{R}^d)$ such that $Ta_1 = (c_1, 0, \ldots, 0)$ and $Ta_2 = (c_2, 0, \ldots, 0)$ with $c_1 < 0 < c_2$. By Lemma 4.7, $T(\alpha) \in SS_{2,2}(X)$. Since $W(Ta_i|T(\alpha)) = W(c_i|\{c_1, c_2\}) \times \mathbb{R}^{d-1}$, one gets $\{c_1, c_2\} \in SS_{2,2}(X_1)$. Note that $X_1$ is symmetric (about the origin). Now we use a uniqueness result which is discussed in the next section. Suppose the distribution of $X_1$ is strongly unimodal. Then it follows from Theorem 5.1 that $S_{2,2}(X_1) = \{\{-E|X_1|, E|X_1|\}\}$. Therefore, $c_1 = -E|X_1|$ and $c_2 = E|X_1|$. This yields
\[ C_{2,2}(X) = SS_{2,2}(X) = \{\{-a, a\} : a \in \mathbb{R}^d, \|a\| = E|X_1|\}. \]
Since $P(W(\pm a|\{-a, a\})) = \frac12$, it follows from Remark 4.6(c) that
\[ V_{2,2}(X) = E\|X\|^2 - (E|X_1|)^2 = d\,EX_1^2 - (E|X_1|)^2 = (d-1) EX_1^2 + V_{2,2}(X_1). \]
4.19 Example (Uniform distribution on a euclidean ball)
Let the underlying norm be the $l_2$-norm on $\mathbb{R}^d$ and let $P = U(B(0,1))$. Then $P$ is a spherical distribution. Consider the case $n = 2$ and $r = 2$. The distribution function of $X_1$ is given by
\[ F(t) = P([-1,t] \times \mathbb{R}^{d-1}) = \frac{1}{\lambda^d(B_d(0,1))} \int_{-1}^t \lambda^{d-1}\bigl(B_{d-1}(0, \sqrt{1 - y^2})\bigr) \, dy = \frac{\lambda^{d-1}(B_{d-1}(0,1))}{\lambda^d(B_d(0,1))} \int_{-1}^t (1 - y^2)^{(d-1)/2} \, dy = \frac{\Gamma(1 + \frac d2)}{\sqrt{\pi}\,\Gamma(\frac12 + \frac d2)} \int_{-1}^t (1 - y^2)^{(d-1)/2} \, dy, \quad |t| \le 1. \]
We thus see that $X_1$ has a Beta distribution. Since $\log[(1 - y^2)^{(d-1)/2}]$ is concave on $(-1,1)$, the distribution of $X_1$ is strongly unimodal. So Example 4.18 applies. We have
\[ E|X_1| = \frac{2\Gamma(1 + \frac d2)}{\sqrt{\pi}\,\Gamma(\frac12 + \frac d2)} \int_0^1 y (1 - y^2)^{(d-1)/2} \, dy = \frac{2\Gamma(1 + \frac d2)}{(d+1)\sqrt{\pi}\,\Gamma(\frac12 + \frac d2)}, \quad EX_1^2 = \frac{1}{d+2} \]
and hence
\[ V_{2,2}(X) = \frac{d}{d+2} - (E|X_1|)^2. \]
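The formulas of Example 4.19 are straightforward to evaluate. The following sketch uses `math.gamma` from the Python standard library; the function names are illustrative, and the comparison values for $d = 1, 2, 3$ are elementary special cases rather than values taken from the text.

```python
import math


def mean_abs_first_coordinate(d):
    """E|X_1| for X uniform on the euclidean unit ball of R^d."""
    return 2 * math.gamma(1 + d / 2) / (
        (d + 1) * math.sqrt(math.pi) * math.gamma(0.5 + d / 2))


def two_level_error(d):
    """V_{2,2}(X) = d / (d + 2) - (E|X_1|)^2."""
    return d / (d + 2) - mean_abs_first_coordinate(d) ** 2


if __name__ == "__main__":
    for d in (1, 2, 3, 10):
        print(d, mean_abs_first_coordinate(d), two_level_error(d))
```

For $d = 1$ this reduces to the uniform distribution on $[-1,1]$ with $E|X_1| = \frac12$ and $V_{2,2} = \frac{1}{12}$; for $d = 2$ and $d = 3$ one recovers $E|X_1| = \frac{4}{3\pi}$ and $\frac38$.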
4.20 Example (d-dimensional standard normal distribution)
Let the underlying norm be the $l_2$-norm and let $P = N_d(0, I_d)$ where $I_d$ is the unit matrix. Let $r = 2$. Here Example 4.18 applies and we obtain
\[ C_{2,2}(X) = SS_{2,2}(X) = \bigl\{\{-a, a\} : a \in \mathbb{R}^d, \|a\| = \sqrt{2/\pi}\bigr\} \]
and
\[ V_{2,2}(X) = d - \frac{2}{\pi}. \]
Now assume $d = 2$. For $n = 3$, we immediately find two types of 3-stationary sets of centers. Let $\alpha_1 = \gamma \times \{0\}$, where $\gamma = \{-c, 0, c\}$ with $c > 0$ is the uniquely determined 3-optimal set of centers for $X_1$ of order 2 (cf. Theorem 5.1). By Lemma 4.8, $\alpha_1 \in S_{3,2}(X)$ and
\[ E \min_{a \in \alpha_1} \|X - a\|^2 = V_{3,2}(X_1) + 1. \]
The numerical solution is given by $c = 1.2240$ and
\[ E \min_{a \in \alpha_1} \|X - a\|^2 = 1.1902 \]
(cf. Table 5.1). As second configuration consider
\[ \alpha_2 = \Bigl\{ (0, b), \Bigl(-\frac{\sqrt3}{2} b, -\frac b2\Bigr), \Bigl(\frac{\sqrt3}{2} b, -\frac b2\Bigr) \Bigr\} \]
with $b > 0$. Then $\operatorname{conv} \alpha_2$ is an equilateral triangle and $P(W(a|\alpha_2)) = 1/3$, $a \in \alpha_2$. If $b$ satisfies
\[ b = 3 \int_{W((0,b)|\alpha_2)} x_2 \, dN_2(0, I_2)(x) = 3 \int_{-\infty}^{\infty} \int_{|x_1|/\sqrt3}^{\infty} x_2 \, dN(0,1)(x_2) \, dN(0,1)(x_1) = 3 \int \varphi(|x_1|/\sqrt3) \, dN(0,1)(x_1) = \frac{3\sqrt3}{2\sqrt{2\pi}} = 1.0364\ldots, \]
where $\varphi$ denotes the $\lambda$-density of $N(0,1)$, then $\alpha_2 \in S_{3,2}(X)$ and by Remark 4.6 (c)
\[ E \min_{a \in \alpha_2} \|X - a\|^2 = 2 - \sum_{a \in \alpha_2} \|a\|^2 / 3 = 2 - \frac{27}{8\pi} = 0.9257\ldots \]
Note that $\alpha_2$ is considerably better than $\alpha_1$. Flury (1990) provides numerical evidence that $\alpha_2 \in C_{3,2}(X)$. For $n = 4$, we find three types of 4-stationary sets of centers. Let $\beta_1 = \gamma \times \{0\}$, where
$\gamma = \{-c_2, -c_1, c_1, c_2\}$ with $0 < c_1 < c_2$ is the uniquely determined 4-optimal set of centers for $X_1$ of order 2 (cf. Theorem 5.1). Then by Lemma 4.8, $\beta_1 \in S_{4,2}(X)$ and
\[ E \min_{a \in \beta_1} \|X - a\|^2 = V_{4,2}(X_1) + 1. \]
The numerical solution is given by $c_1 = 0.4528$, $c_2 = 1.5104$ and
\[ E \min_{a \in \beta_1} \|X - a\|^2 = 1.1175 \]
(cf. Table 5.1). Next consider
\[ \beta_2 = \Bigl\{ (0,0), (0, b), \Bigl(-\frac{\sqrt3}{2} b, -\frac b2\Bigr), \Bigl(\frac{\sqrt3}{2} b, -\frac b2\Bigr) \Bigr\} \]
with $b > 0$, where $b$ solves the equation
\[ b \int_{b/2}^{\infty} (2\Phi(\sqrt3 y) - 1) \, dN(0,1)(y) = \int_{b/2}^{\infty} (2\Phi(\sqrt3 y) - 1) y \, dN(0,1)(y). \]
Here $\Phi$ denotes the distribution function of $N(0,1)$. Then for $a \in \beta_2$, $a \ne (0,0)$,
\[ a P(W(a|\beta_2)) = \int_{W(a|\beta_2)} x \, dP(x), \]
\[ P(W(a|\beta_2)) = P(W((0,b)|\beta_2)) = \int_{b/2}^{\infty} (2\Phi(\sqrt3 y) - 1) \, dN(0,1)(y). \]
Since $\sum_{a \in \beta_2} a = (0,0)$, this implies
\[ \int_{W((0,0)|\beta_2)} x \, dP(x) = (0,0). \]
Therefore, $\beta_2 \in S_{4,2}(X)$ and by Remark 4.6 (c)
\[ E \min_{a \in \beta_2} \|X - a\|^2 = 2 - \sum_{a \in \beta_2} \|a\|^2 P(W(a|\beta_2)) = 2 - 3b^2 \int_{b/2}^{\infty} (2\Phi(\sqrt3 y) - 1) \, dN(0,1)(y). \]
The numerical solution is given by $b = 1.2791$ and
\[ E \min_{a \in \beta_2} \|X - a\|^2 = 0.8203. \]
The product quantizer $\beta_3 = \{-\sqrt{2/\pi}, \sqrt{2/\pi}\}^2$ beats $\beta_1$ and $\beta_2$. In fact, by Lemma 4.8, $\beta_3 \in S_{4,2}(X)$ and
\[ E \min_{a \in \beta_3} \|X - a\|^2 = 2 V_{2,2}(X_1) = 2 - \frac{4}{\pi} = 0.7267\ldots \]
Gray and Karnin (1982) provide some numerical evidence for their conjecture that $\beta_i$, $i = 1, 2, 3$ are the only 4-stationary sets of centers of order 2 (up to $l_2$-isometries). But a formal proof of this conjecture has not yet been given. Figure 4.3 shows the above stationary sets and the corresponding Voronoi tesselations. (Instead of $\beta_3$, a rotated version of $\beta_3$ is used.) In three dimensions the product quantizer $\{-\sqrt{2/\pi}, \sqrt{2/\pi}\}^3$ can be improved upon. For $n = 8$, Gray and Karnin (1982) give three different configurations that beat the product quantizer. The authors report simulation results to show that these quantizers are superior. Iyengar and Solomon (1983) provide similar results based on numerical integration.
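The numerical values quoted for the bivariate normal case can be reproduced with elementary tools. The sketch below solves the stationarity equation for $b$ in the four-point configuration with one center at the origin by bisection and evaluates the three closed-form errors. Simpson's rule, the truncation of the integrals at $b/2 + 10$, the bracket $[0.5, 2.5]$ and all function names are choices made here for illustration, not part of the text.

```python
import math


def Phi(t):
    """Distribution function of N(0,1)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def phi(t):
    """Lebesgue density of N(0,1)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(lo + k * h)
    return s * h / 3.0


def cell_mass(b):
    """P(W((0,b) | beta_2)): integral of (2 Phi(sqrt(3) y) - 1) phi(y) over [b/2, oo)."""
    f = lambda y: (2.0 * Phi(math.sqrt(3.0) * y) - 1.0) * phi(y)
    return simpson(f, b / 2.0, b / 2.0 + 10.0)


def cell_moment(b):
    """Integral of (2 Phi(sqrt(3) y) - 1) y phi(y) over [b/2, oo)."""
    f = lambda y: (2.0 * Phi(math.sqrt(3.0) * y) - 1.0) * y * phi(y)
    return simpson(f, b / 2.0, b / 2.0 + 10.0)


def solve_b(lo=0.5, hi=2.5, iters=50):
    """Bisection for b * cell_mass(b) = cell_moment(b) on an assumed bracket."""
    g = lambda b: b * cell_mass(b) - cell_moment(b)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def error_triangle():
    """Error of the equilateral 3-point configuration: 2 - 27 / (8 pi)."""
    return 2.0 - 27.0 / (8.0 * math.pi)


def error_product():
    """Error of the 4-point product quantizer: 2 - 4 / pi."""
    return 2.0 - 4.0 / math.pi


if __name__ == "__main__":
    b = solve_b()
    print(b, 2.0 - 3.0 * b * b * cell_mass(b))
    print(error_triangle(), error_product())
```

The bisection relies on the sign change of $b \mapsto b \int_{b/2}^\infty (2\Phi(\sqrt3 y) - 1)\,dN(0,1)(y) - \int_{b/2}^\infty (2\Phi(\sqrt3 y) - 1) y\,dN(0,1)(y)$ on the chosen bracket; a different bracket may be needed if the equation is modified.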
4.5 Stability properties and empirical versions
Stability and consistency results for the quantization problem are well known. See e.g. Pollard (1981, 1982a), Abaya and Wise (1984), Sabin and Gray (1986), Pärna (1988, 1990), Jahnke (1988), Cuesta-Albertos et al. (1988), Graf and Luschgy (1994b). Let $\mathfrak{M}_r = \mathfrak{M}_r(\mathbb{R}^d)$ denote the set of all Borel probability measures $P$ on $\mathbb{R}^d$ such that $\int \|x\|^r \, dP(x) < \infty$, $1 \le r < \infty$. Recall that $\rho_r$ is a metric on $\mathfrak{M}_r$ and for $P_k, P \in \mathfrak{M}_r$,
\[ \rho_r(P_k, P) \to 0 \text{ if and only if } P_k \xrightarrow{D} P \text{ (weak convergence) and } \int \|x\|^r \, dP_k(x) \to \int \|x\|^r \, dP(x) \]
(cf. Rachev and Rüschendorf, 1998, Theorem 2.6.4). A stability property for the n-th quantization error of order $r$ in terms of the $L_r$-minimal metric $\rho_r$ follows immediately from Lemma 3.4. If $P_1, P_2 \in \mathfrak{M}_r$, then
\[ |V_{n,r}(P_1)^{1/r} - V_{n,r}(P_2)^{1/r}| \le \rho_r(P_1, P_2) \tag{4.4} \]
for every $n \in \mathbb{N}$. A stability result for n-optimal quantizing measures can also be based on $\rho_r$. The Hausdorff metric given by
\[ d_H(A, B) = \max\Bigl\{ \max_{a \in A} \min_{b \in B} \|a - b\|, \max_{b \in B} \min_{a \in A} \|a - b\| \Bigr\} \]
for nonempty compact subsets $A, B$ of $\mathbb{R}^d$ is convenient for formulating a stability result for n-optimal sets of centers. Notice that
\[ \Bigl| \Bigl(\int \min_{a \in A} \|x - a\|^r \, dP(x)\Bigr)^{1/r} - \Bigl(\int \min_{b \in B} \|x - b\|^r \, dP(x)\Bigr)^{1/r} \Bigr| \le \Bigl(\int \bigl| \min_{a \in A} \|x - a\| - \min_{b \in B} \|x - b\| \bigr|^r \, dP(x)\Bigr)^{1/r} \le \sup_{x \in \mathbb{R}^d} \bigl| \min_{a \in A} \|x - a\| - \min_{b \in B} \|x - b\| \bigr| \le d_H(A, B). \tag{4.5} \]
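Inequality (4.5) can be tried out on finite data. The sketch below computes the Hausdorff distance of two finite sets of centers in the plane (euclidean norm) and compares it with the difference of the corresponding $L_r$ quantization errors for a discrete probability; the sample, the center sets and the function names are hypothetical.

```python
import math


def hausdorff(A, B):
    """Hausdorff distance d_H(A, B) of two finite nonempty subsets of R^d."""
    return max(max(min(math.dist(a, b) for b in B) for a in A),
               max(min(math.dist(a, b) for a in A) for b in B))


def lr_error(sample, centers, r):
    """(integral of min_a ||x - a||^r dP)^(1/r) for a finite law of (point, prob) pairs."""
    return sum(p * min(math.dist(x, a) for a in centers) ** r
               for x, p in sample) ** (1.0 / r)


if __name__ == "__main__":
    sample = [((0.0, 0.0), 0.25), ((1.0, 2.0), 0.25), ((3.0, 1.0), 0.25), ((6.0, 0.0), 0.25)]
    A = [(0.0, 0.0), (2.0, 0.0)]
    B = [(0.0, 1.0), (2.0, 0.0), (5.0, 0.0)]
    for r in (1, 2, 3):
        gap = abs(lr_error(sample, A, r) - lr_error(sample, B, r))
        print(r, gap, hausdorff(A, B))
```

The bound is usually far from sharp, since $d_H$ is driven by the single worst-matched center while the error difference averages over $P$.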
Figure 4.3: 3- and 4-stationary sets of centers for $P = N_2(0, I_2)$ of order $r = 2$ and Voronoi diagrams with respect to the $l_2$-norm
Let $D_{n,r}(P)$ denote the set of n-optimal quantizing measures for $P \in \mathfrak{M}_r$ of order $r$.

4.21 Theorem
Let $\rho_r(P_k, P) \to 0$ for $P_k, P \in \mathfrak{M}_r$ and suppose $|\operatorname{supp}(P)| \ge n$, $n \in \mathbb{N}$.
(a) Let $Q_k \in D_{n,r}(P_k)$, $k \in \mathbb{N}$. Then the set of $\rho_r$-cluster points of the sequence $(Q_k)_{k \ge 1}$ is a nonempty subset of $D_{n,r}(P)$ and $\rho_r(Q_k, D_{n,r}(P)) \to 0$ as $k \to \infty$.
(b) Let $\alpha_k \in C_{n,r}(P_k)$, $k \in \mathbb{N}$. Then the set of $d_H$-cluster points of the sequence $(\alpha_k)_{k \ge 1}$ is a nonempty subset of $C_{n,r}(P)$ and $d_H(\alpha_k, C_{n,r}(P)) \to 0$ as $k \to \infty$.

The preceding theorem can be derived from a simple statement for arbitrary metric spaces.

4.22 Lemma
Let $(M, d)$ be a metric space and let $N \subset M$ be a nonempty subset.
(a) Let $f : N \to \mathbb{R}_+$ be a lower semicontinuous function and suppose the level set $L(c) = \{y \in N : f(y) \le c\}$ is compact for some $c > \inf_{z \in N} f(z)$. Let $D = \{y \in N : f(y) = \inf_{z \in N} f(z)\}$ and let $(y_k)_{k \ge 1}$ be a minimizing sequence in $N$ for $f$, i.e., $f(y_k) \to \inf_{z \in N} f(z)$. Then the set of cluster points of $(y_k)_{k \ge 1}$ is a nonempty subset of $D$ and $d(y_k, D) \to 0$, $k \to \infty$.
(b) For $x \in M$, set $D(x) = \{y \in N : d(x,y) = d(x,N)\}$. Suppose $x_k \to x$ and let $y_k \in D(x_k)$, $k \in \mathbb{N}$. Then $(y_k)_{k \ge 1}$ is a minimizing sequence in $N$ for $f = d(x, \cdot)$.

Proof
(a) Let $y \in M$ be the limit of a convergent subsequence $(y_{k_n})_{n \ge 1}$ of $(y_k)_{k \ge 1}$. Then $y \in L(c) \subset N$ and $\inf_{z \in N} f(z) = \lim_{n \to \infty} f(y_{k_n}) \ge f(y)$. We deduce $y \in D$. The existence of a cluster point of $(y_k)_{k \ge 1}$ follows immediately from the compactness of $L(c)$ and the fact that $(y_k)_{k \ge 1}$ is eventually in $L(c)$. This proves the first assertion. The second assertion is a consequence of the first one. In fact, assume $\limsup_{k \to \infty} d(y_k, D) > 0$. Then there exists $\varepsilon > 0$ and a convergent subsequence $(y_{k_n})_{n \ge 1}$ of $(y_k)_{k \ge 1}$ satisfying $d(y_{k_n}, D) \ge \varepsilon$ for every $n \ge 1$ and $\lim_{n \to \infty} y_{k_n} \in D$, a contradiction.
(b) Since $d(x,N) \le d(x, y_k) \le d(x, x_k) + d(x_k, y_k) = d(x, x_k) + d(x_k, N)$ and $d(x_k, N) \to d(x, N)$, one gets
\[ d(x, y_k) \to d(x, N) \quad \text{as } k \to \infty. \]
□

To prove the theorem, the following lemma is required.

4.23 Lemma
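Let $B \subset \mathbb{R}^d$ be nonempty and compact. Then $\{\alpha \subset \mathbb{R}^d : 1 \le |\alpha| \le n,\ \alpha \subset B\}$ is $d_H$-compact and $\{Q \in \mathcal{P}_n : Q(B) = 1\}$ is $\rho_r$-compact.

Proof
The $d_H$-compactness of $\{\alpha \subset \mathbb{R}^d : 1 \le |\alpha| \le n,\ \alpha \subset B\}$ follows immediately from the $d_H$-continuity of the map $(a_1, \ldots, a_n) \mapsto \{a_1, \ldots, a_n\}$, $a_i \in \mathbb{R}^d$. Set $\mathcal{Q} = \{Q \in \mathcal{P}_n : Q(B) = 1\}$. It is clear that $\mathcal{Q}$ is relatively $\rho_r$-compact. To show that $\mathcal{Q}$ is $\rho_r$-closed, let $(Q_k)_{k \ge 1}$ be a sequence in $\mathcal{Q}$ and $Q \in \mathcal{M}_r$ such that $\rho_r(Q_k, Q) \to 0$. Set $\alpha_k = \operatorname{supp}(Q_k)$ and let $\alpha \subset \mathbb{R}^d$, $1 \le |\alpha| \le n$, $\alpha \subset B$, be the limit of a $d_H$-convergent subsequence $(\alpha_{k_j})_{j \ge 1}$ of $(\alpha_k)_{k \ge 1}$. Then for $\varepsilon > 0$ there is a $j_0 \in \mathbb{N}$ such that
$$\bigcup_{j \ge j_0} \alpha_{k_j} \subset \bigcup_{a \in \alpha} B(a, \varepsilon).$$
Since
$$\limsup_{j \to \infty} Q_{k_j}\Big(\bigcup_{a \in \alpha} B(a, \varepsilon)\Big) \le Q\Big(\bigcup_{a \in \alpha} B(a, \varepsilon)\Big),$$
one obtains $Q\big(\bigcup_{a \in \alpha} B(a, \varepsilon)\big) = 1$. We deduce $\operatorname{supp}(Q) \subset \alpha$, which yields $Q \in \mathcal{Q}$. □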
Proof of Theorem 4.21
(a) To show that the assertion follows from Lemma 4.22 applied to the metric space $(\mathcal{M}_r, \rho_r)$, $N = \mathcal{P}_n$, and $f = \rho_r(P, \cdot)$, it suffices to verify that
$$L(c) = \{Q \in \mathcal{P}_n : \rho_r(P, Q) \le c\}$$
is $\rho_r$-compact for some $c > V_{n,r}(P)^{1/r}$. Choose $c$ such that $V_{n,r}(P)^{1/r} < c < V_{n-1,r}(P)^{1/r}$, where $V_{0,r}(P) = \infty$ (cf. Theorem 4.12). For $Q \in L(c)$ and $\alpha = \operatorname{supp}(Q)$, we have
$$\int \min_{a \in \alpha} \|x - a\|^r \, dP(x) \le \rho_r^r(P, Q) \le c^r.$$
Hence, by Theorem 4.12 (or Lemma 2.2 in case $n = 1$),
$$L(c) \subset \{Q \in \mathcal{P}_n : Q(B) = 1\}$$
for some compact subset $B$ of $\mathbb{R}^d$. Using Lemma 4.23 we deduce $\rho_r$-compactness of $L(c)$.
(b) The assertion follows from an application of Lemma 4.22 (a) to $N = \{\alpha \subset \mathbb{R}^d : 1 \le |\alpha| \le n\}$ equipped with the Hausdorff metric $d_H$ and $f : N \to \mathbb{R}_+$,
$$f(\alpha) = \int \min_{a \in \alpha} \|x - a\|^r \, dP(x).$$
Note first that $f$ is $d_H$-continuous. This follows from (4.5). Next, consider the level set
$$L(c) = \{\alpha \in N : f(\alpha) \le c\}$$
for $V_{n,r}(P) < c < V_{n-1,r}(P)$. By Theorem 4.12 (or Lemma 2.2 in case $n = 1$), there is a compact set $B \subset \mathbb{R}^d$ such that
$$L(c) \subset \{\alpha \in N : \alpha \subset B\}.$$
Using Lemma 4.23 we deduce $d_H$-compactness of $L(c)$. Finally, we show that $(\alpha_k)_{k \ge 1}$ is a minimizing sequence in $N$ for $f$. For $k \in \mathbb{N}$, let $\{A_{k,a} : a \in \alpha_k\}$ be a Voronoi partition of $\mathbb{R}^d$ with respect to $\alpha_k$. Set $Q_k = \sum_{a \in \alpha_k} P_k(A_{k,a}) \delta_a$. Then $Q_k \in D_{n,r}(P_k)$ and
$$V_{n,r}(P)^{1/r} \le f(\alpha_k)^{1/r} \le \rho_r(P, Q_k).$$
Moreover, $\rho_r(P, Q_k) \to V_{n,r}(P)^{1/r}$ (cf. Lemma 4.22 (b) for $(\mathcal{M}_r, \rho_r)$ and $N = \mathcal{P}_n$). This implies $f(\alpha_k) \to V_{n,r}(P)$, $k \to \infty$. Thus, we see that all assumptions of Lemma 4.22 (a) are satisfied. □
Notice that $C_{n,r}(P)$ is $d_H$-compact and $D_{n,r}(P)$ is $\rho_r$-compact provided $|\operatorname{supp}(P)| \ge n$.

The stability results can be applied to the empirical analysis of the quantization problem. Let $X_1, X_2, \ldots$ be i.i.d. $\mathbb{R}^d$-valued random variables with distribution $P$ and let $P_k = \frac{1}{k} \sum_{i=1}^{k} \delta_{X_i}$ be the empirical measure of $X_1, \ldots, X_k$. The empirical (sample) version of $V_{n,r}(P)$ is given by
$$V_{n,r}(P_k) = \inf_{\alpha \subset \mathbb{R}^d,\ |\alpha| \le n} \frac{1}{k} \sum_{i=1}^{k} \min_{a \in \alpha} \|X_i - a\|^r.$$
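The empirical objective can be illustrated with a short sketch. It assumes $d = 1$, $r = 2$ and $P = U([0,1])$; the function name `empirical_error`, the seed and the sample size are illustrative choices, not part of the text. The function only evaluates the sample average for one fixed set of centers; the empirical quantization error itself is the infimum of this quantity over all sets of at most $n$ centers.

```python
import random

def empirical_error(sample, centers, r=2):
    """(1/k) * sum_i min_a |X_i - a|^r for a real sample and a fixed set of centers."""
    return sum(min(abs(x - a) ** r for a in centers) for x in sample) / len(sample)

random.seed(0)                                   # illustrative seed
n, k = 4, 20000                                  # illustrative sizes
sample = [random.random() for _ in range(k)]     # X_1, ..., X_k i.i.d. U([0, 1])
centers = [(2 * i - 1) / (2 * n) for i in range(1, n + 1)]   # midpoints of n equal cells
emp = empirical_error(sample, centers)
pop = 1 / (12 * n ** 2)                          # population value of the objective, r = 2
print(emp, pop)
```

For large $k$ the two printed numbers should be close, which is the law of large numbers at a fixed set of centers; the consistency statement for the minimizers themselves is the content of the corollary that follows.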
4.24 Corollary (Consistency)
Let $P \in \mathcal{M}_r$.
(a) $V_{n,r}(P_k)^{1/r} \to V_{n,r}(P)^{1/r}$ a.s. as $k \to \infty$, uniformly in $n \in \mathbb{N}$.
(b) Let $Q_k = Q_k(X_1, \ldots, X_k) \in D_{n,r}(P_k)$, $k \in \mathbb{N}$, and suppose $|\operatorname{supp}(P)| \ge n$. Then $\rho_r(Q_k, D_{n,r}(P)) \to 0$ a.s., $k \to \infty$.
(c) Let $\alpha_k = \alpha_k(X_1, \ldots, X_k) \in C_{n,r}(P_k)$, $k \in \mathbb{N}$, and suppose $|\operatorname{supp}(P)| \ge n$. Then $d_H(\alpha_k, C_{n,r}(P)) \to 0$ a.s., $k \to \infty$.

Proof
Since $\rho_r(P_k, P) \to 0$ a.s. by the Glivenko-Cantelli theorem for $\rho_r$, the assertions follow from Theorem 4.21 and (4.4). □

Rates of convergence in empirical quantization can be found in Rhee and Talagrand (1989a), Linder et al. (1994), Bartlett et al. (1998) and Graf and Luschgy (1999c).
Notes

Some material on the issue of this section is contained in Gersho and Gray (1992) and Graf and Luschgy (1994a). Theorem 4.1 belongs to the folklore of this area. Theorem 4.2 seems to be new. The characterization given in Lemma 4.4 is due to Pollard (1982a) for the $l_2$-norm and $r = 2$. The Counterexample 4.5 is new. In case the underlying norm is the $l_2$-norm, the differentiability of $\psi_{n,r}$ (cf. Lemma 4.10) has been proved by Pollard (1982b) for $r = 2$, and for arbitrary $r$ a proof is contained in Pagès (1997). Theorem 4.16 is new. Examples 4.18-4.20 on the quantization of spherical distributions and the $d$-dimensional standard normal distribution are essentially taken from Gray and Karnin (1982), Iyengar and Solomon (1983), Flury (1990), Tarpey et al. (1995), and Tarpey (1995). See also Tarpey (1998). Let us mention that $n$-stationary sets of centers are sometimes called self-consistent sets.

The central limit problem for $n$-optimal empirical centers of order $r = 2$ with respect to the $l_2$-norm has been solved by Pollard (1982b) under a uniqueness condition for the $n$-optimal population centers. A central limit result in a nonregular setting has been given by Serinko and Babu (1992) for the univariate case, $d = 1$, and an extension to non-i.i.d. sampling can be found in Serinko and Babu (1995) for $d = 1$. Hartigan (1978) has conjectured the asymptotic distribution of the empirical quantization error for a special population distribution where the uniqueness condition fails but has given no proof. Consistency results for a quantization (clustering) procedure based on a projection pursuit technique can be found in Stute and Zhu (1995). Stability and consistency results for a trimmed version of the quantization problem are contained in Cuesta-Albertos et al. (1997), and a central limit theorem for trimmed quantizers has been given by García-Escudero et al. (1999).

Theorem 4.1 provides the basis for the famous Lloyd algorithm used to design quantizers. To construct an approximation to an $n$-stationary set of centers for $P$ of order $r$ the iterative method proceeds as follows: Let $\varepsilon > 0$ be given.

Step 1. Choose an initial set $\alpha^{(0)}$ of $n$ points in $\mathbb{R}^d$, calculate $e_0 = E \min_{a \in \alpha^{(0)}} \|X - a\|^r$.
Step 2. Determine a Voronoi partition $\mathcal{A}^{(i)}$ with respect to $\alpha^{(i)}$.
Step 3. For each set $A \in \mathcal{A}^{(i)}$ with $P(A) > 0$ choose a center $a_A$ for the conditional probability $P(\cdot \mid A)$ of order $r$ and set $\alpha^{(i+1)} = \{a_A : A \in \mathcal{A}^{(i)}\}$.
Step 4. Calculate $e_{i+1} = E \min_{a \in \alpha^{(i+1)}} \|X - a\|^r$. If $(e_i - e_{i+1}) < \varepsilon e_i$ then stop. Otherwise increase $i$ by one and repeat Steps 2, 3 and 4.

This algorithm was independently discovered by Steinhaus (1956) and Lloyd in 1957 (see Lloyd 1982). It is often called Lloyd's Method I, since Lloyd developed a second type of algorithm (Method II) to design quantizers in the one-dimensional case. Many people rediscovered Lloyd's method later on. For a description of the history of the algorithm we refer the reader to Gray and Neuhoff (1998). As it stands the algorithm is hard to use in practice, because Steps 2 to 4 require integrals with respect to $P$ over Voronoi cells. But if $P$ is a discrete probability with finite support then the above algorithm can immediately be applied. The properties of Lloyd's algorithm in the context of general deterministic descent algorithms have been discussed in Sabin and Gray (1986). Recently Bouton and Pagès (1997) thoroughly investigated a constant step stochastic gradient descent algorithm for the design of quantizers which is closely related to the Kohonen algorithms used in the theory of neural networks. Mentioning just these two algorithms for the design of quantizers is an arbitrary act since there exists a vast amount of literature concerning this subject. For a survey we refer the reader again to Gray and Neuhoff (1998).
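For a discrete probability with finite support, Steps 1 to 4 can be sketched as follows. This is only a minimal illustration under stated assumptions: order $r = 2$ and the Euclidean norm (so that the center of a cell is its conditional mean), ties in the Voronoi partition broken in favour of the smallest index, and an iteration cap `max_iter` added as a safeguard. The function name `lloyd` and all parameter names are hypothetical.

```python
def lloyd(points, weights, centers, eps=1e-9, max_iter=1000):
    """Lloyd's Method I for P = sum_j weights[j] * delta_{points[j]}, order r = 2.

    points and centers are lists of tuples of equal length; returns (centers, error).
    """
    def sqdist(x, a):
        return sum((xi - ai) ** 2 for xi, ai in zip(x, a))

    def error(cs):
        # E min_{a in cs} ||X - a||^2 under the discrete distribution
        return sum(w * min(sqdist(x, a) for a in cs) for x, w in zip(points, weights))

    e_old = error(centers)                                    # Step 1
    for _ in range(max_iter):
        # Step 2: Voronoi partition (ties go to the center with the smallest index)
        cells = [[] for _ in centers]
        for x, w in zip(points, weights):
            j = min(range(len(centers)), key=lambda i: sqdist(x, centers[i]))
            cells[j].append((x, w))
        # Step 3: conditional mean of every cell with positive mass
        centers = []
        for cell in cells:
            mass = sum(w for _, w in cell)
            if mass > 0:
                dim = len(cell[0][0])
                centers.append(tuple(sum(w * x[t] for x, w in cell) / mass
                                     for t in range(dim)))
        # Step 4: stop as soon as (e_i - e_{i+1}) < eps * e_i
        e_new = error(centers)
        if e_old - e_new < eps * e_old:
            return centers, e_new
        e_old = e_new
    return centers, e_old


points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
weights = [1 / 6] * 6
centers, err = lloyd(points, weights, [(0.0,), (1.0,)])
print(centers, err)
```

In this toy example the iteration moves the two centers to the means of the two clusters. As in the text, the result is only a stationary set; whether it is also optimal depends on the starting set.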
5 Uniqueness and optimality in one dimension
In the one-dimensional case there is a reasonable criterion for the uniqueness of $n$-stationary sets. This immediately gives uniqueness of $n$-optimal sets of centers. In this section let $d = 1$. Let $X$ denote a real random variable with distribution $P$ satisfying $E|X|^r < \infty$ for some $1 \le r < \infty$. The probability $P$ is called strongly unimodal if $P = h\lambda$ such that $I = \{h > 0\}$ is an open (possibly unbounded) interval and $\log h$ is concave on $I$. Note that such distributions have all their moments finite. For this and further properties of (nondegenerate) strongly unimodal distributions we refer to Dharmadhikari and Joag-Dev (1988).
5.1 Uniqueness

The following theorem is due to Kieffer (1983). See also Trushkin (1984).

5.1 Theorem (Uniqueness)
If $P$ is strongly unimodal, then $|S_{n,r}(P)| = 1$ for every $n \in \mathbb{N}$, $1 \le r < \infty$.

Strongly unimodal distributions are unimodal about some mode $a \in \mathbb{R}$, i.e., the $\lambda$-density $h$ of $P$ is increasing on $(-\infty, a)$ and decreasing on $(a, \infty)$. Example 4.11 (as well as the subsequent Example 5.2) shows that the assertion of Theorem 5.1 may fail for unimodal distributions. In view of Lemma 4.7, the unique $n$-optimal set of centers of order $r$ for a symmetric, strongly unimodal distribution is symmetric. It is a surprising fact that symmetric, 2-stationary sets of centers may fail to be 2-optimal for symmetric, unimodal (absolutely continuous) distributions. This is illustrated by the following example taken from Abaya and Wise (1981). The same phenomenon occurs for truncated Cauchy distributions, hyper-exponential distributions and for certain variance mixtures of normal distributions. See Karlin (1982), Tarpey (1994) and Flury (1990).

5.2 Example
Let $P = h\lambda$ with
$$h(x) = \begin{cases} \dfrac{5 - 4|x|}{12}, & |x| \le 1, \\[1ex] \dfrac{7 - |x|}{72}, & 1 < |x| \le 7, \\[1ex] 0, & |x| > 7. \end{cases}$$
$P$ is symmetric and unimodal about 0. Let $n = 2$ and $r = 2$. Then $V_2(P) = \operatorname{Var} X = \frac{101}{18} = 5.611\ldots$ and it is easily verified that $S_{2,2}(P) = \{\alpha_1, \alpha_2, \alpha_3\}$ with
$$\alpha_1 = \{-1, 3\}, \qquad \alpha_2 = \{-3, 1\}, \qquad \alpha_3 = \Big\{-\frac{61}{36}, \frac{61}{36}\Big\}.$$
One obtains
$$E \min_{a \in \alpha_1} |X - a|^2 = E \min_{a \in \alpha_2} |X - a|^2 = \frac{47}{18} = 2.611\ldots, \qquad E \min_{a \in \alpha_3} |X - a|^2 = \frac{3551}{1296} = 2.739\ldots$$
(Use the formula of Remark 4.6 (c).) Hence, $C_{2,2}(P) = \{\alpha_1, \alpha_2\}$ and $V_{2,2}(P) = \frac{47}{18}$; see Figure 5.1. We see that the symmetric, 2-stationary set $\alpha_3$ is not 2-optimal. It is the sharp peak of the density which causes asymmetric optimal sets of centers (and prevents $P$ from being strongly unimodal).
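The numbers in Example 5.2 can be checked with exact rational arithmetic. The sketch below assumes that the density is $h(x) = (5 - 4|x|)/12$ for $|x| \le 1$, $h(x) = (7 - |x|)/72$ for $1 < |x| \le 7$ and $h(x) = 0$ otherwise; this piecewise linear form is a reconstruction chosen to be consistent with the stated variance and stationary sets, and the helper names are hypothetical.

```python
from fractions import Fraction as Fr

# Assumed density on x >= 0, one tuple (lo, hi, c0, c1) per piece: h(x) = c0 + c1 * x.
PIECES = [(Fr(0), Fr(1), Fr(5, 12), Fr(-1, 3)),
          (Fr(1), Fr(7), Fr(7, 72), Fr(-1, 72))]

def moment_pos(k, lo, hi):
    """Exact integral of x^k h(x) over [lo, hi], for 0 <= lo <= hi."""
    total = Fr(0)
    for a, b, c0, c1 in PIECES:
        l, u = max(a, lo), min(b, hi)
        if l < u:
            total += c0 * (u ** (k + 1) - l ** (k + 1)) / (k + 1)
            total += c1 * (u ** (k + 2) - l ** (k + 2)) / (k + 2)
    return total

def moment(k, lo, hi):
    """Exact integral of x^k h(x) over [lo, hi], using the symmetry h(-x) = h(x)."""
    zero = Fr(0)
    pos = moment_pos(k, max(lo, zero), max(hi, zero))
    neg = moment_pos(k, max(-hi, zero), max(-lo, zero))
    return pos + (-1) ** k * neg

def error(centers):
    """E min_a |X - a|^2 for a finite set of centers (order r = 2)."""
    cs = sorted(centers)
    bounds = [Fr(-7)] + [(a + b) / 2 for a, b in zip(cs, cs[1:])] + [Fr(7)]
    return sum(moment(2, lo, hi) - 2 * a * moment(1, lo, hi) + a * a * moment(0, lo, hi)
               for a, lo, hi in zip(cs, bounds, bounds[1:]))

print(moment(2, Fr(-7), Fr(7)))               # variance: 101/18
print(error([Fr(-1), Fr(3)]))                 # asymmetric stationary set: 47/18
print(error([Fr(-61, 36), Fr(61, 36)]))       # symmetric stationary set: 3551/1296
```

The asymmetric set gives the smaller error, in line with the conclusion of the example.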
Figure 5.1: 2-optimal centers of order 2

Quantization for a symmetric distribution is related to quantization for its one-tailed version as follows.

5.3 Remark (Symmetric distributions)
Let $P$ be symmetric with $P(\{0\}) = 0$ and $|\operatorname{supp}(P)| \ge n$. Let $Q = P(\cdot \mid [0, \infty))$, the one-tailed version of $P$.
(a) For $\alpha \subset \mathbb{R}$ and $k \in \mathbb{N}$, $\alpha \in S_{k,r}(Q)$ implies $\alpha \subset (0, \infty)$. This follows from Remark 4.6 (a) and Lemma 2.5. For $\alpha \subset (0, \infty)$, one obviously obtains $\alpha \cup (-\alpha) \in S_{2k,r}(P)$ if and only if $\alpha \in S_{k,r}(Q)$. In particular, there always exists a symmetric, $n$-stationary set for $P$ of order $r$ provided $n$ is even. Example 4.13 shows that this may fail if $n$ is odd. (Zoppè (1997) proved the existence of symmetric, $n$-stationary sets for every $n$ in case $r = 2$ under the assumption that $P$ is absolutely continuous and $\operatorname{supp}(P)$ is convex.) Moreover, $\alpha \cup (-\alpha) \in C_{n,r}(P)$ with $n = 2k$ implies $\alpha \in C_{k,r}(Q)$ for $\alpha \subset (0, \infty)$. This follows from Theorem 4.1 since
$$\bigcup_{a \in \alpha} W(a \mid \alpha \cup (-\alpha)) = [0, \infty).$$
(b) We have $V_{n,r}(P) \le V_{k,r}(Q)$, $n = 2k$, and equality holds if and only if there exists a symmetric set $\beta \in C_{n,r}(P)$. In this case, $\alpha \in C_{k,r}(Q)$ implies $\alpha \cup (-\alpha) \in C_{n,r}(P)$. In fact, for $\alpha \subset (0, \infty)$, $|\alpha| \le k$,
$$V_{n,r}(P) \le \int \min_{b \in \alpha \cup (-\alpha)} |x - b|^r \, dP(x) = \int \min_{a \in \alpha} |x - a|^r \, dQ(x).$$
Choosing $\alpha \in C_{k,r}(Q)$ gives $V_{n,r}(P) \le V_{k,r}(Q)$, and if $V_{n,r}(P) = V_{k,r}(Q)$, then $\beta = \alpha \cup (-\alpha) \in C_{n,r}(P)$. Conversely, if $\beta \in C_{n,r}(P)$ is symmetric and $\alpha = \beta \cap (0, \infty)$, then
$$V_{n,r}(P) = \int \min_{b \in \beta} |x - b|^r \, dP(x) = \int \min_{a \in \alpha} |x - a|^r \, dQ(x) \ge V_{k,r}(Q),$$
implying $V_{n,r}(P) = V_{k,r}(Q)$.

As yet only few examples of distributions $P$ which are not strongly unimodal but satisfy $|S_{n,r}(P)| = 1$ are known. See Fort and Pagès (1999) and, for the particularly simple case $n = 2$ and $r = 2$, Yamamoto and Shinozaki (1999).
5.2 Optimal Quantizers
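Let us describe optimal quantizers of orders $r = 1$ and $r = 2$ for some common univariate distributions. For $n \ge 2$, consider real numbers $a_1 < \ldots < a_n$. Let $m_i = (a_i + a_{i+1})/2$, $1 \le i \le n - 1$, and $\alpha = \{a_1, \ldots, a_n\}$. Then the Voronoi region generated by $a_i$ takes the form
$$W(a_1 \mid \alpha) = (-\infty, m_1], \qquad W(a_i \mid \alpha) = [m_{i-1}, m_i],\ 2 \le i \le n - 1, \qquad W(a_n \mid \alpha) = [m_{n-1}, \infty).$$
We assume that $P$ is continuous so that the boundaries of the Voronoi regions have $P$-measure zero. Let $F$ denote the distribution function of $P$. By Lemma 2.5, we have $\alpha \in S_{n,r}(P)$ if and only if $P(W(a_i \mid \alpha)) > 0$ for every $i$ and
$$\begin{aligned}
\int_{-\infty}^{a_1} (a_1 - x)^{r-1} \, dP(x) &= \int_{a_1}^{m_1} (x - a_1)^{r-1} \, dP(x), \\
\int_{m_{i-1}}^{a_i} (a_i - x)^{r-1} \, dP(x) &= \int_{a_i}^{m_i} (x - a_i)^{r-1} \, dP(x), \quad 2 \le i \le n - 1, \\
\int_{m_{n-1}}^{a_n} (a_n - x)^{r-1} \, dP(x) &= \int_{a_n}^{\infty} (x - a_n)^{r-1} \, dP(x).
\end{aligned} \tag{5.1}$$
In case $r = 1$, the equations (5.1) take the form
$$\begin{aligned}
2F(a_1) &= F(m_1), \\
2F(a_i) &= F(m_i) + F(m_{i-1}), \quad 2 \le i \le n - 1, \\
2F(a_n) &= 1 + F(m_{n-1}).
\end{aligned} \tag{5.2}$$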
One obtains, for instance, $\{F^{-1}(\frac{1}{4}), F^{-1}(\frac{3}{4})\} \in S_{2,1}(P)$ for symmetric probabilities $P$. ($F^{-1}(y) = \inf\{x \in \mathbb{R} : F(x) \ge y\}$, $y \in (0, 1)$.) In case $r = 2$, (5.1) takes the form
$$\begin{aligned}
a_1 F(m_1) &= \int_{-\infty}^{m_1} x \, dP(x), \\
a_i [F(m_i) - F(m_{i-1})] &= \int_{m_{i-1}}^{m_i} x \, dP(x), \quad 2 \le i \le n - 1, \\
a_n [1 - F(m_{n-1})] &= \int_{m_{n-1}}^{\infty} x \, dP(x).
\end{aligned} \tag{5.3}$$
One obtains, for instance, $\{-E|X|, E|X|\} \in S_{2,2}(P)$ for symmetric probabilities $P$.

Now assume that $P = h\lambda$ is strongly unimodal. Then by Theorem 5.1, (5.1) has a unique solution in $I = \{h > 0\}$ which provides the $n$-optimal set of centers $\alpha$ for $P$ of order $r$. If, additionally, $P$ is symmetric, we have $\alpha = -\alpha$, that is, $a_i = -a_{n+1-i}$ for $1 \le i \le n$. Therefore, $m_k = 0$ in case $n = 2k$ and $a_{k+1} = 0$ in case $n = 2k + 1$. In both cases it is enough to solve the first (or last) $k$ equations of (5.1).

A remarkable property appears for the exponential distribution. It is the content of part (a) of the following proposition.
5.4 Proposition
(a) Let $P = E(c)$ and let $\alpha = \{a_1, \ldots, a_n\} \in C_{n,r}(P)$ with $a_1 < \ldots < a_n$. Then
$$V_{n,r}(P) = a_1^r.$$
(b) Let $P = DE(c)$ and let $\alpha = \{a_1, \ldots, a_n\} \in C_{n,r}(P)$ with $a_1 < \ldots < a_n$. Then
$$V_{n,r}(P) = a_{k+1}^r \quad \text{if } n = 2k, \qquad V_{n,r}(P) = r c^r \int_0^{a_{k+2}/2c} x^{r-1} e^{-x} \, dx \quad \text{if } n = 2k + 1.$$

Proof
We may assume without loss of generality that $c = 1$. By Theorem 5.1, $\alpha$ is the unique $n$-stationary set of order $r$.
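(a) We have
$$\begin{aligned}
V_{n,r}(P) &= E \min_{1 \le i \le n} |X - a_i|^r \\
&= \int_0^{m_1} |x - a_1|^r e^{-x} \, dx + \sum_{i=2}^{n-1} \int_{m_{i-1}}^{m_i} |x - a_i|^r e^{-x} \, dx + \int_{m_{n-1}}^{\infty} |x - a_n|^r e^{-x} \, dx \\
&= \int_0^{a_1} (a_1 - x)^r e^{-x} \, dx + \sum_{i=2}^{n} \int_{m_{i-1}}^{a_i} (a_i - x)^r e^{-x} \, dx + \sum_{i=1}^{n-1} \int_{a_i}^{m_i} (x - a_i)^r e^{-x} \, dx + \int_{a_n}^{\infty} (x - a_n)^r e^{-x} \, dx.
\end{aligned}$$
Integration by parts yields for $b < a < c$
$$\begin{aligned}
\int_b^a (a - x)^r e^{-x} \, dx &= (a - b)^r e^{-b} - r \int_b^a (a - x)^{r-1} e^{-x} \, dx, \\
\int_a^c (x - a)^r e^{-x} \, dx &= -(c - a)^r e^{-c} + r \int_a^c (x - a)^{r-1} e^{-x} \, dx, \\
\int_a^{\infty} (x - a)^r e^{-x} \, dx &= r \int_a^{\infty} (x - a)^{r-1} e^{-x} \, dx = r e^{-a} \Gamma(r).
\end{aligned}$$
Therefore
$$\begin{aligned}
V_{n,r}(P) = a_1^r &+ \sum_{i=2}^{n} (a_i - m_{i-1})^r e^{-m_{i-1}} - \sum_{i=1}^{n-1} (m_i - a_i)^r e^{-m_i} \\
&+ r \sum_{i=1}^{n-1} \int_{a_i}^{m_i} (x - a_i)^{r-1} e^{-x} \, dx + r \int_{a_n}^{\infty} (x - a_n)^{r-1} e^{-x} \, dx \\
&- r \int_0^{a_1} (a_1 - x)^{r-1} e^{-x} \, dx - r \sum_{i=2}^{n} \int_{m_{i-1}}^{a_i} (a_i - x)^{r-1} e^{-x} \, dx.
\end{aligned}$$
The two boundary sums cancel because $a_i - m_{i-1} = m_{i-1} - a_{i-1}$, and the integral terms cancel by (5.1). This gives
$$V_{n,r}(P) = a_1^r.$$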
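(b) If $n = 2k$, the assertion follows from (a) and Remark 5.3. Now let $n = 2k + 1$. Then
$$\begin{aligned}
V_{n,r}(P) &= \int_0^{m_{k+1}} x^r e^{-x} \, dx + \sum_{i=2}^{k} \int_{m_{k+i-1}}^{m_{k+i}} |x - a_{k+i}|^r e^{-x} \, dx + \int_{m_{n-1}}^{\infty} |x - a_n|^r e^{-x} \, dx \\
&= \int_0^{m_{k+1}} x^r e^{-x} \, dx + \sum_{i=2}^{k+1} \int_{m_{k+i-1}}^{a_{k+i}} (a_{k+i} - x)^r e^{-x} \, dx + \sum_{i=2}^{k} \int_{a_{k+i}}^{m_{k+i}} (x - a_{k+i})^r e^{-x} \, dx + \int_{a_n}^{\infty} (x - a_n)^r e^{-x} \, dx.
\end{aligned}$$
Again, integration by parts and (5.1) give the desired formula. □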
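Part (a) of Proposition 5.4 can be checked numerically for $P = E(1)$ and $r = 2$. The sketch below is an illustration under stated assumptions: it solves (5.3) by repeatedly replacing each center with the conditional mean of its Voronoi cell (Lloyd's Method I), with an ad hoc starting set and iteration count, and then compares the resulting error with $a_1^2$. The function names are hypothetical.

```python
import math

def cond_mean(lo, hi):
    """Mean of the standard exponential distribution E(1) conditioned on [lo, hi]."""
    if math.isinf(hi):
        return lo + 1.0                       # lack of memory
    num = (lo + 1) * math.exp(-lo) - (hi + 1) * math.exp(-hi)
    return num / (math.exp(-lo) - math.exp(-hi))

def stationary_centers(n, iters=2000):
    """Fixed-point iteration for (5.3) with P = E(1); iters is an ad hoc choice."""
    centers = [float(i) for i in range(1, n + 1)]
    for _ in range(iters):
        mids = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
        bounds = [0.0] + mids + [math.inf]
        centers = [cond_mean(lo, hi) for lo, hi in zip(bounds, bounds[1:])]
    return centers

def error(centers):
    """E min_i |X - a_i|^2 for X ~ E(1), integrated in closed form over the cells."""
    def G(x, a):    # antiderivative of (x - a)^2 e^{-x}, with limit 0 at infinity
        return 0.0 if math.isinf(x) else -((x - a) ** 2 + 2 * (x - a) + 2) * math.exp(-x)
    mids = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
    bounds = [0.0] + mids + [math.inf]
    return sum(G(hi, a) - G(lo, a) for a, lo, hi in zip(centers, bounds, bounds[1:]))

for n in (1, 2, 3):
    a = stationary_centers(n)
    print(n, [round(x, 4) for x in a], round(error(a), 4), round(a[0] ** 2, 4))
```

The last two printed columns should agree, and the centers can be compared with the entries for $E(1)$, $r = 2$ in Table 5.4.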
5.5 Example (Uniform distribution)
Let $P = U([0, 1])$ and let $\alpha = \{\frac{2i - 1}{2n} : i = 1, \ldots, n\}$. By Example 4.17, $\alpha \in C_{n,r}(P)$ for every $r \ge 1$ and
$$V_{n,r}(P) = \frac{1}{n^r (1 + r) 2^r}.$$
Since $P$ is strongly unimodal, $\alpha$ is the unique set of $n$-optimal centers of order $r$.

5.6 Example (Double exponential distribution)
Let $P = DE(1)$, that is, the $\lambda$-density $h$ of $P$ is given by $h(x) = \frac{1}{2} \exp(-|x|)$. Then $F(x) = \frac{1}{2} \exp(x)$ for $x \le 0$ and $P$ is symmetric and strongly unimodal. Let $r = 1$ and note that $V_1(P) = E|X| = 1$. For this distribution there exists a closed-form solution of (5.2). Let $y_i = \exp(a_i/2)$. For $n = 2k$, the first $k$ equations of (5.2) take the form
$$2y_1 = y_2, \qquad 2y_i = y_{i+1} + y_{i-1},\ 2 \le i \le k - 1, \qquad 2y_k^2 = 1 + y_{k-1} y_k \quad (y_0 = 0).$$
The solution of this difference equation is given by $y_i = i y_1$, $1 \le i \le k$, with $y_1 = (k^2 + k)^{-1/2}$. One obtains
$$a_i = 2 \log\Big(\frac{i}{\sqrt{k^2 + k}}\Big),\ 1 \le i \le k, \qquad a_i = 2 \log\Big(\frac{\sqrt{k^2 + k}}{n + 1 - i}\Big),\ k + 1 \le i \le n.$$
For $n = 2k + 1$, the first $k$ equations of (5.2) take the form
$$2y_1 = y_2, \qquad 2y_i = y_{i+1} + y_{i-1},\ 2 \le i \le k - 1, \qquad 2y_k = 1 + y_{k-1}.$$
This leads to $y_i = i y_1$ with $y_1 = (k + 1)^{-1}$ and hence
$$a_i = 2 \log\Big(\frac{i}{k + 1}\Big),\ 1 \le i \le k, \qquad a_{k+1} = 0, \qquad a_i = 2 \log\Big(\frac{k + 1}{n + 1 - i}\Big),\ k + 2 \le i \le n.$$
In both cases $\{a_1, \ldots, a_n\}$ is the unique set of $n$-optimal centers for $P$ of order 1. For the quantization error we have by Proposition 5.4
$$V_{n,1}(P) = \log\Big(1 + \frac{2}{n}\Big) \quad \text{if } n \text{ is even}, \qquad V_{n,1}(P) = \frac{2}{n + 1} \quad \text{if } n \text{ is odd};$$
see Table 5.9.

5.7 Example (Exponential distribution)
Let $P = E(c)$, that is, $P = h\lambda$ with $h(x) = \frac{1}{c} \exp(-\frac{x}{c}) 1_{(0,\infty)}(x)$, $c > 0$. $P$ is strongly unimodal. Let
$$a_i = 2c \log\Big(\frac{\sqrt{n^2 + n}}{n + 1 - i}\Big), \quad 1 \le i \le n.$$
Then $\{a_1, \ldots, a_n\}$ is the unique set of $n$-optimal centers for $P$ of order $r = 1$. This is a consequence of Example 5.6 and Remark 5.3 (and the scaling property of $C_{n,1}(P)$ given in Lemma 3.2 (a)). Furthermore, by Proposition 5.4
$$V_{n,1}(P) = c \log\Big(1 + \frac{1}{n}\Big).$$
Since $V_1(P) = E|X - c \log 2| = c \log 2$, one has to choose $c = (\log 2)^{-1}$ in order to achieve the norming $V_1(P) = 1$; see Table 5.10.

The equations (5.2) and (5.3) have been solved using MATHEMATICA for various strongly unimodal distributions. The numerical solutions are given in the Tables 5.1-5.12. (In case $P = E(1/\log 2)$, $r = 1$ and $P = DE(1)$, $r = 1$, we obtained coincidence with the exact solutions given in Examples 5.6 and 5.7 up to 5 decimal places.) The behaviour of $V_{n,r}(P)$ reflects the value of the $r$-th quantization coefficient of $P$ introduced in Chapter II.
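The closed-form solutions of Examples 5.6 and 5.7 are easy to evaluate. The following sketch assumes the formulas $a_i = 2\log(i/\sqrt{k^2+k})$ (even $n = 2k$) and $a_i = 2\log(i/(k+1))$ (odd $n = 2k+1$) for the negative centers of $DE(1)$, symmetry for the remaining ones, and $a_i = 2c\log(\sqrt{n^2+n}/(n+1-i))$ for $E(c)$; the function names are hypothetical.

```python
import math

def de_centers(n):
    """n-optimal centers of order r = 1 for the double exponential distribution DE(1)."""
    k = n // 2
    if n % 2 == 0:
        s = math.sqrt(k * k + k)
        left = [2 * math.log(i / s) for i in range(1, k + 1)]
        return left + [-a for a in reversed(left)]
    left = [2 * math.log(i / (k + 1)) for i in range(1, k + 1)]
    return left + [0.0] + [-a for a in reversed(left)]

def de_error(n):
    """V_{n,1}(DE(1))."""
    return math.log(1 + 2 / n) if n % 2 == 0 else 2 / (n + 1)

def exp_centers(n, c=1.0):
    """n-optimal centers of order r = 1 for the exponential distribution E(c)."""
    s = math.sqrt(n * n + n)
    return [2 * c * math.log(s / (n + 1 - i)) for i in range(1, n + 1)]

def exp_error(n, c=1.0):
    """V_{n,1}(E(c))."""
    return c * math.log(1 + 1 / n)

c = 1 / math.log(2)      # norming V_1(P) = 1 for the exponential distribution
print([round(a, 4) for a in de_centers(4)], round(de_error(4), 4))
print([round(a, 4) for a in exp_centers(2, c)], round(exp_error(2, c), 4))
```

The printed values can be compared with the columns $n = 4$ of Table 5.9 and $n = 2$ of Table 5.10.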
Notes

In the case of smooth densities and $r = 2$, Theorem 5.1 is due to Fleischer (1964), who provided the first uniqueness result. A proof of Theorem 5.1 (for $r = 2$) based on the "mountain pass theorem" has been given by Lamberton and Pagès (1996). See Cohort (1997) for a detailed exposition. The property of the $n$-th quantization error for the exponential distribution described in Proposition 5.4 is new. In the non-quantization setting $n = 1$ it has been noticed by Gilat (1988). Example 5.6 is essentially contained in Williams (1967). However, Williams intends to find $n$-optimal centers of order $r = 2$, but he deals with the equations (5.2) which correspond to $r = 1$. Example 5.7 is new. Tables of $n$-optimal sets of centers of order $r = 2$ for the normal distribution $N(0, 1)$, the double exponential distribution $DE(1/\sqrt{2})$, the exponential distribution $E(1)$, and the Rayleigh distribution $W(2/\sqrt{4 - \pi}, 2)$ (cf. Tables 5.1, 5.3-5.6) can also be found in Cox (1957), Max (1960), Lloyd (1982), Fang and Wang (1994), Adams and Giesler (1978), and Pearlman and Senge (1979).
n               1        2        3        4        5        6        7        8
a_1             0  -0.7979  -1.2240  -1.5104  -1.7241  -1.8936  -2.0334  -2.1520
a_2                 0.7979        0  -0.4528  -0.7646  -1.0001  -1.1882  -1.3439
a_3                          1.2240   0.4528        0  -0.3177  -0.5606  -0.7560
a_4                                   1.5104   0.7646   0.3177        0  -0.2451
a_5                                            1.7241   1.0001   0.5606   0.2451
a_6                                                     1.8936   1.1882   0.7560
a_7                                                              2.0334   1.3439
a_8                                                                       2.1520
V_{n,2}         1   0.3634   0.1902   0.1175   0.0799   0.0580   0.0440   0.0345

Table 5.1: $n$-optimal centers and $n$-th quantization error for the normal distribution $N(0, 1)$ of order $r = 2$
n               1        2        3        4        5        6        7        8
a_1             0  -0.7643  -1.2621  -1.6382  -1.9422  -2.1978  -2.4185  -2.6129
a_2                 0.7643        0  -0.4569  -0.7947  -1.0671  -1.2971  -1.4971
a_3                          1.2621   0.4569        0  -0.3270  -0.5862  -0.8033
a_4                                   1.6382   0.7947   0.3270        0  -0.2548
a_5                                            1.9422   1.0671   0.5862   0.2548
a_6                                                     2.1978   1.2971   0.8033
a_7                                                              2.4185   1.4971
a_8                                                                       2.6129
V_{n,2}         1   0.4158   0.2307   0.1472   0.1022   0.0752   0.0576   0.0456

Table 5.2: Logistic distribution $L(\sqrt{3}/\pi)$, $r = 2$
n               1        2        3        4        5        6        7        8
a_1             0  -0.7071  -1.4142  -1.8340  -2.2537  -2.5535  -2.8533  -3.0867
a_2                 0.7071        0  -0.4198  -0.8395  -1.1393  -1.4391  -1.6725
a_3                          1.4142   0.4198        0  -0.2998  -0.5996  -0.8330
a_4                                   1.8340   0.8395   0.2998        0  -0.2334
a_5                                            2.2537   1.1393   0.5996   0.2334
a_6                                                     2.5535   1.4391   0.8330
a_7                                                              2.8533   1.6725
a_8                                                                       3.0867
V_{n,2}         1   0.5000   0.2642   0.1762   0.1198   0.0899   0.0681   0.0545

Table 5.3: Double exponential distribution $DE(1/\sqrt{2})$, $r = 2$
n               1        2        3        4        5        6        7        8
a_1             1   0.5936   0.4240   0.3301   0.2704   0.2290   0.1986   0.1753
a_2                 2.5936   1.6112   1.1780   0.9305   0.7697   0.6565   0.5725
a_3                          3.6112   2.3652   1.7784   1.4298   1.1972   1.0305
a_4                                   4.3652   2.9657   2.2777   1.8574   1.5712
a_5                                            4.9657   3.4650   2.7053   2.2313
a_6                                                     5.4650   3.8926   3.0792
a_7                                                              5.8926   4.2665
a_8                                                                       6.2665
V_{n,2}         1   0.3524   0.1797   0.1090   0.0731   0.0524   0.0394   0.0307

Table 5.4: Exponential distribution $E(1)$, $r = 2$
n               1        2        3        4        5        6        7        8
a_1        1.4142   0.9271   0.7108   0.5847   0.5009   0.4407   0.3950   0.3590
a_2                 2.7353   1.8420   1.4269   1.1798   1.0136   0.8932   0.8014
a_3                          3.5501   2.4815   1.9577   1.6363   1.4157   1.2537
a_4                                   4.1445   2.9772   2.3842   2.0119   1.7523
a_5                                            4.6135   3.3829   2.7420   2.3325
a_6                                                     5.0012   3.7266   3.0505
a_7                                                              5.3318   4.0248
a_8                                                                       5.6199
V_{n,2}         1   0.3565   0.1836   0.1120   0.0755   0.0544   0.0410   0.0321

Table 5.5: Gamma distribution $\Gamma(1/\sqrt{2}, 2)$, $r = 2$
n               1        2        3        4        5        6        7        8
a_1        1.9131   1.2657   0.9772   0.8079   0.6947   0.6130   0.5508   0.5016
a_2                 2.9313   2.1140   1.7010   1.4421   1.2615   1.1270   1.0222
a_3                          3.4848   2.6325   2.1738   1.8745   1.6599   1.4966
a_4                                   3.8604   3.0025   2.5237   2.2032   1.9688
a_5                                            4.1425   3.2882   2.8001   2.4675
a_6                                                     4.3670   3.5197   3.0277
a_7                                                              4.5529   3.7136
a_8                                                                       4.7111
V_{n,2}         1   0.3408   0.1724   0.1042   0.0698   0.0501   0.0377   0.0294

Table 5.6: Rayleigh distribution $W(2/\sqrt{4 - \pi}, 2)$, $r = 2$
n               1        2        3        4        5        6        7        8
a_1             0  -0.8453  -1.2898  -1.5864  -1.8067  -1.9810  -2.1244  -2.2460
a_2                 0.8453        0  -0.4734  -0.7974  -1.0412  -1.2353  -1.3959
a_3                          1.2898   0.4734        0  -0.3303  -0.5819  -0.7838
a_4                                   1.5864   0.7974   0.3303        0  -0.2540
a_5                                            1.8067   1.0412   0.5819   0.2540
a_6                                                     1.9810   1.2353   0.7838
a_7                                                              2.1244   1.3959
a_8                                                                       2.2460
V_{n,1}         1   0.5931   0.4258   0.3331   0.2739   0.2327   0.2024   0.1791

Table 5.7: Normal distribution $N(0, \pi/2)$, $r = 1$
n               1        2        3        4        5        6        7        8
a_1             0  -0.7925  -1.2716  -1.6218  -1.9000  -2.1314  -2.3298  -2.5037
a_2                 0.7925        0  -0.4609  -0.7925  -1.0542  -1.2716  -1.4581
a_3                          1.2716   0.4609        0  -0.3265  -0.5817  -0.7925
a_4                                   1.6218   0.7925   0.3265        0  -0.2531
a_5                                            1.9000   1.0542   0.5817   0.2531
a_6                                                     2.1314   1.2716   0.7925
a_7                                                              2.3298   1.4581
a_8                                                                       2.5037
V_{n,1}         1   0.6226   0.4569   0.3620   0.3001   0.2564   0.2239   0.1988

Table 5.8: Logistic distribution $L(1/(2 \log 2))$, $r = 1$
n               1        2        3        4        5        6        7        8
a_1             0  -0.6931  -1.3863  -1.7918  -2.1972  -2.4849  -2.7726  -2.9957
a_2                 0.6931        0  -0.4055  -0.8109  -1.0986  -1.3863  -1.6094
a_3                          1.3863   0.4055        0  -0.2877  -0.5754  -0.7985
a_4                                   1.7918   0.8109   0.2877        0  -0.2231
a_5                                            2.1972   1.0986   0.5754   0.2231
a_6                                                     2.4849   1.3863   0.7985
a_7                                                              2.7726   1.6094
a_8                                                                       2.9957
V_{n,1}         1   0.6931   0.5000   0.4055   0.3333   0.2877   0.2500   0.2231

Table 5.9: Double exponential distribution $DE(1)$, $r = 1$
n               1        2        3        4        5        6        7        8
a_1             1   0.5850   0.4150   0.3219   0.2630   0.2224   0.1926   0.1699
a_2                 2.5850   1.5850   1.1520   0.9069   0.7485   0.6374   0.5552
a_3                          3.5850   2.3219   1.7370   1.3923   1.1635        1
a_4                                   4.3219   2.9069   2.2224   1.8074   1.5261
a_5                                            4.9069   3.3923   2.6374   2.1699
a_6                                                     5.3923   3.8074        3
a_7                                                              5.8074   4.1699
a_8                                                                       6.1699
V_{n,1}         1   0.5850   0.4150   0.3219   0.2630   0.2224   0.1926   0.1699

Table 5.10: Exponential distribution $E(1/\log 2)$, $r = 1$

n               1        2        3        4        5        6        7        8
a_1        1.5958   1.0650   0.8299   0.6922   0.6001   0.5344   0.4826   0.4423
a_2                 2.9008   1.9829   1.5572   1.3029   1.1310   1.0058   0.9099
a_3                          3.6870   2.6090   2.0824   1.7586   1.5355   1.3709
a_4                                   4.2542   3.0882   2.4987   2.1281   1.8688
a_5                                            4.6988   3.4774   2.8449   2.4405
a_6                                                     5.0646   3.8053   3.1416
a_7                                                              5.3754   4.0887
a_8                                                                       5.6457
V_{n,1}         1   0.5883   0.4195   0.3265   0.2675   0.2266   0.1966   0.1737

Table 5.11: Gamma distribution $\Gamma(a, 2)$, $a = 0.9508\ldots$, $r = 1$
n               1        2        3        4        5        6        7        8
a_1        2.2501   1.5347   1.2121   1.0204   0.8906   0.7958   0.7229   0.6647
a_2                 3.3040   2.4268   1.9835   1.7041   1.5079   1.3608   1.2455
a_3                          3.8697   2.9631   2.4770   2.1592   1.9304   1.7555
a_4                                   4.2516   3.3430   2.8389   2.5014   2.2539
a_5                                            4.5375   3.6352   3.1233   2.7748
a_6                                                     4.7648   3.8714   3.3567
a_7                                                              4.9527   4.0688
a_8                                                                       5.1124
V_{n,1}         1   0.5778   0.4091   0.3173   0.2594   0.2194   0.1902   0.1678

Table 5.12: Rayleigh distribution $W(a, 2)$, $a = 2.7027\ldots$, $r = 1$
Chapter II

Asymptotic quantization for nonsingular probability distributions

It is difficult to find $n$-optimal quantizers for a fixed number $n$ of quantizing levels, at least in the multivariate case. This chapter is concerned with the theory of asymptotic quantization for nonsingular probability distributions as $n$ tends to infinity. The asymptotic behaviour of the quantization error is derived and the asymptotic performance of certain classes of quantizers is compared with asymptotically optimal quantizers. We introduce quantization coefficients which provide interesting parameters of probability distributions. They can be evaluated for univariate distributions and some of them also for multivariate distributions. Moreover, asymptotic quantization is related to a geometric covering problem.

6 Asymptotics for the quantization error

In this section we derive the exact asymptotic first order behaviour of $V_{n,r}(P)$ (up to constants) as $n \to \infty$ in case $P$ is not singular with respect to $\lambda^d$.

Let $X$ be an $\mathbb{R}^d$-valued random variable with distribution $P$, let $\|\cdot\|$ denote any norm on $\mathbb{R}^d$, and let $1 \le r < \infty$. Lemma 6.1 reflects the fact that $\bigcup_{n=1}^{\infty} \mathcal{F}_n$ is a dense subset of the Banach space $L_r(P, \mathbb{R}^d)$.

6.1 Lemma
If $E\|X\|^r < \infty$, then
$$\lim_{n \to \infty} V_{n,r}(P) = 0.$$
Proof
Let $\{a_1, a_2, a_3, \ldots\}$ be a countable dense subset of $\mathbb{R}^d$ with $a_1 = 0$. For $\varepsilon > 0$, $\{B(a_k, (\varepsilon/2)^{1/r}) : k \in \mathbb{N}\}$ is a covering of $\mathbb{R}^d$. Therefore, one can find a Borel measurable partition $\{A_k : k \in \mathbb{N}\}$ of $\mathbb{R}^d$ satisfying $A_k \subset B(a_k, (\varepsilon/2)^{1/r})$ for every $k$. Choose $n \in \mathbb{N}$ such that
$$\sum_{k > n} \int_{A_k} \|x\|^r \, dP(x) < \frac{\varepsilon}{2}$$
and let $f_n = \sum_{k=1}^{n} a_k 1_{A_k}$. Then $f_n \in \mathcal{F}_n$ and, since $f_n = 0$ outside $A_1 \cup \ldots \cup A_n$,
$$V_{n,r}(P) \le E\|X - f_n(X)\|^r = \sum_{k=1}^{n} \int_{A_k} \|x - a_k\|^r \, dP(x) + \sum_{k > n} \int_{A_k} \|x\|^r \, dP(x) < \varepsilon.$$
□

The following theorem in its present general form is stated in Bucklew and Wise (1982) for the $l_2$-norm. Under some additional assumptions the result is due to Zador (1963, 1982) (who is also dealing with the $l_2$-norm). See also Fejes Tóth (1959) for a special case. For a Borel measurable function $h : \mathbb{R}^d \to \mathbb{R}$ and $0 < p < \infty$ let
$$\|h\|_p = \Big(\int |h|^p \, d\lambda^d\Big)^{1/p}. \tag{6.1}$$
Furthermore, let $P = P_a + P_s$ be the Lebesgue decomposition of $P$ with respect to $\lambda^d$, where $P_a$ denotes the absolutely continuous part and $P_s$ the singular part of $P$.
6.2 Theorem (Asymptotic quantization error)
Suppose $E\|X\|^{r+\delta} < \infty$ for some $\delta > 0$. Let
$$Q_r([0,1]^d) = \inf_{n \ge 1} n^{r/d} M_{n,r}([0,1]^d). \tag{6.2}$$
Then $Q_r([0,1]^d) > 0$ and
$$\lim_{n \to \infty} n^{r/d} V_{n,r}(P) = Q_r([0,1]^d) \, \Big\| \frac{dP_a}{d\lambda^d} \Big\|_{d/(d+r)}. \tag{6.3}$$

The proof is given below. For singular distributions, (6.3) only yields $V_{n,r}(P) = o(n^{-r/d})$ provided the above moment condition holds. An investigation of the exact order of $V_{n,r}(P)$ for several classes of singular (continuous) distributions $P$ is contained in Chapter III.
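As a worked illustration of (6.2) and (6.3), consider $d = 1$. The following is only a sketch: it assumes that $M_{n,r}([0,1])$ is the $n$-th quantization error of the uniform distribution $U([0,1])$, so that Example 5.5 gives $n^r M_{n,r}([0,1]) = \frac{1}{(1+r)2^r}$ for every $n$. Then, for $P = h\lambda$ satisfying the moment condition,
$$Q_r([0,1]) = \frac{1}{(1+r) 2^r}, \qquad \lim_{n \to \infty} n^r V_{n,r}(P) = \frac{1}{(1+r) 2^r} \Big(\int h^{1/(1+r)} \, d\lambda\Big)^{1+r}.$$
For $P = N(0,1)$ and $r = 2$ one has $\int h^{1/3} \, d\lambda = (2\pi)^{-1/6} \sqrt{6\pi} = (2\pi)^{1/3} \sqrt{3}$, hence
$$\lim_{n \to \infty} n^2 V_{n,2}(N(0,1)) = \frac{1}{12} \cdot 2\pi \cdot 3\sqrt{3} = \frac{\sqrt{3}\,\pi}{2} = 2.7207\ldots$$
For comparison, the entry for $n = 8$ in Table 5.1 gives $8^2 \cdot 0.0345 \approx 2.21$, so the limit is approached rather slowly.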
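6.3 Remark
(a) The moment condition $E\|X\|^{r+\delta} < \infty$ ensures that the limit in (6.3) is finite. In fact, $h \in L_1(\lambda^d)$, $h \ge 0$, and $\int \|x\|^{r+\delta} h(x) \, dx < \infty$ for some $\delta > 0$ implies $h \in L_{d/(d+r)}(\lambda^d)$. To see this, let $s = \frac{d}{d+r}$, $t = \frac{(r+\delta)d}{d+r}$, $p = \frac{d+r}{d}$, and $q = \frac{d+r}{r}$. Then
$$\int_{B(0,1)} h(x)^s \, dx < \infty$$
and by Hölder's inequality
$$\int_{B(0,1)^c} h(x)^s \, dx = \int_{B(0,1)^c} h(x)^s \|x\|^t \|x\|^{-t} \, dx \le \Big(\int_{B(0,1)^c} h(x) \|x\|^{r+\delta} \, dx\Big)^{1/p} \Big(\int_{B(0,1)^c} \|x\|^{-tq} \, dx\Big)^{1/q} < \infty$$
since $tp = r + \delta$ and $tq = \frac{(r+\delta)d}{r} > d$.
(b) The converse of the above implication does not hold. Consider, for instance,
$$h(x) = \frac{1}{2^{n(1+r)} n^{2+r}} \quad \text{if } x \in [2^n, 2^{n+1}),\ n \in \mathbb{N}.$$
Then $h \in L_1(\lambda)$ and
$$\int h^{1/(1+r)} \, d\lambda = \sum_{n=1}^{\infty} \frac{1}{n^{(2+r)/(1+r)}} < \infty$$
but
$$\int |x|^{r+\delta} h(x) \, dx = \sum_{n=1}^{\infty} \frac{1}{2^{n(1+r)} n^{2+r}} \int_{2^n}^{2^{n+1}} |x|^{r+\delta} \, dx \ge \sum_{n=1}^{\infty} \frac{2^{n(r+\delta)} 2^n}{2^{n(1+r)} n^{2+r}} = \sum_{n=1}^{\infty} \frac{2^{n\delta}}{n^{2+r}} = \infty$$
for every $\delta > 0$. Note that $\int |x|^r h(x) \, dx < \infty$.
(c) Without any moment condition we still have
$$\liminf_{n \to \infty} n^{r/d} V_{n,r}(P) \ge Q_r([0,1]^d) \, \Big\| \frac{dP_a}{d\lambda^d} \Big\|_{d/(d+r)}.$$
This is contained in the subsequent proof of Theorem 6.2 (Step 5).

The following example shows that the moment condition in Theorem 6.2 cannot be dropped.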
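6.4 Example
Let $x_k = 3 \cdot 2^{k-1}$ and
$$P(X = x_k) = \frac{c}{2^{kr} k \log^2 k}, \quad k \ge 2,$$
with norming constant $c = \big(\sum_{k=2}^{\infty} \frac{1}{2^{kr} k \log^2 k}\big)^{-1}$. Then
$$E X^r = \frac{3^r c}{2^r} \sum_{k=2}^{\infty} \frac{1}{k \log^2 k} < \infty, \qquad E X^{r+\delta} = \frac{3^{r+\delta} c}{2^{r+\delta}} \sum_{k=2}^{\infty} \frac{2^{k\delta}}{k \log^2 k} = \infty, \quad \delta > 0.$$
For $\alpha \subset \mathbb{R}$ with $|\alpha| = n$, let $I = \{k \ge 2 : \alpha \cap [2^k, 2^{k+1}) = \emptyset\}$. Then for $k \in I$
$$\min_{a \in \alpha} |x_k - a|^r \ge (x_k - 2^k)^r = 2^{(k-1)r}$$
and hence
$$E \min_{a \in \alpha} |X - a|^r = \sum_{k=2}^{\infty} \min_{a \in \alpha} |x_k - a|^r \frac{c}{2^{kr} k \log^2 k} \ge \frac{c}{2^r} \sum_{k \in I} \frac{1}{k \log^2 k} \ge \frac{c}{2^r} \sum_{k=n+2}^{\infty} \frac{1}{k \log^2 k} \ge \frac{c}{2^r} \int_{n+2}^{\infty} \frac{1}{x \log^2 x} \, dx = \frac{c}{2^r \log(n + 2)}.$$
This gives $\lim_{n \to \infty} n^r V_{n,r}(X) = \infty$. Here the order of convergence to zero of $V_{n,r}(X)$ is $(\log n)^{-1}$. In fact, for $\beta = \{x_2, \ldots, x_{n+1}\}$ one obtains
$$\min_{b \in \beta} |x_k - b|^r = 0,\ k \le n + 1, \qquad \min_{b \in \beta} |x_k - b|^r \le x_k^r = \frac{3^r 2^{kr}}{2^r},\ k \ge n + 2,$$
and hence
$$E \min_{b \in \beta} |X - b|^r \le \frac{3^r c}{2^r} \sum_{k=n+2}^{\infty} \frac{1}{k \log^2 k} \le \frac{3^r c}{2^r \log(n + 1)}.$$
It follows that
$$\frac{c}{2^r} \le \liminf_{n \to \infty} \log n \, V_{n,r}(X) \le \limsup_{n \to \infty} \log n \, V_{n,r}(X) \le \frac{3^r c}{2^r}.$$

In case $P_a \ne 0$ and $E\|X\|^{r+\delta} < \infty$ for some $\delta > 0$, the number
$$Q_r(P) = Q_r([0,1]^d) \, \Big\| \frac{dP_a}{d\lambda^d} \Big\|_{d/(d+r)} \tag{6.4}$$
is called the $r$-th quantization coefficient of the probability $P$ on $\mathbb{R}^d$. (Notice that $Q_r(P)$ depends on the underlying norm.) By Remark 6.3 (a), $0 < Q_r(P) < \infty$. We also write $Q_r(X)$ instead of $Q_r(P)$. For $A \in \mathcal{B}(\mathbb{R}^d)$ bounded with $\lambda^d(A) > 0$, define the $r$-th quantization coefficient of $A$ by
$$Q_r(A) = Q_r(U(A)). \tag{6.5}$$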
Quantization coefficients provide interesting parameters of a probability distribution. They are discussed in Sections 8-10. We shall need the following lemmas for the proof of Theorem 6.2.

6.5 Lemma
Let $P = sP_1 + (1 - s)P_2$, $0 < s < 1$, $\int \|x\|^r \, dP_i(x) < \infty$. Suppose $n^{r/d} V_{n,r}(P_1) \to c \in [0, \infty]$ as $n \to \infty$. Then
(a)
$$\liminf_{n \to \infty} n^{r/d} V_{n,r}(P) \ge sc + (1 - s) \liminf_{n \to \infty} n^{r/d} V_{n,r}(P_2),$$
$$\limsup_{n \to \infty} n^{r/d} V_{n,r}(P) \le s(1 - \varepsilon)^{-r/d} c + (1 - s) \varepsilon^{-r/d} \limsup_{n \to \infty} n^{r/d} V_{n,r}(P_2)$$
for every $0 < \varepsilon < 1$.
(b) If $\lim_{n \to \infty} n^{r/d} V_{n,r}(P_2) = 0$, then
$$\lim_{n \to \infty} n^{r/d} V_{n,r}(P) = sc.$$
Proof
(a) The first inequality follows immediately from Lemma 4.14 (a). For $0 < \varepsilon < 1$, let $n_1 = n_1(n, \varepsilon) = [(1 - \varepsilon)n]$ and $n_2 = n_2(n, \varepsilon) = [\varepsilon n]$. ($[x]$ denotes the integer part of $x \in \mathbb{R}$.) Then by Lemma 4.14 (b)
$$n^{r/d} V_{n,r}(P) \le s n^{r/d} V_{n_1,r}(P_1) + (1 - s) n^{r/d} V_{n_2,r}(P_2) = s (n/n_1)^{r/d} n_1^{r/d} V_{n_1,r}(P_1) + (1 - s) (n/n_2)^{r/d} n_2^{r/d} V_{n_2,r}(P_2)$$
for every $n \ge \max\{\frac{1}{\varepsilon}, \frac{1}{1 - \varepsilon}\}$. Since $\frac{n}{n_1} \to \frac{1}{1 - \varepsilon}$ and $\frac{n}{n_2} \to \frac{1}{\varepsilon}$ as $n \to \infty$, this gives the second inequality.
(b) follows from (a). □

The following result is due to Pierce (1970).

6.6 Lemma
Let $d = 1$. Then
$$V_{n,r}(P) \le n^{-r} (C_1 E|X|^{r+\delta} + C_2), \quad \delta > 0,\ n \ge C_3,$$
for numerical constants $C_1, C_2, C_3 > 0$ depending on $\delta$ and $r$ (but not on $P$ and $n$).

Proof
First, assume $P((1, \infty)) = 1$. We use a random quantizer argument. Consider i.i.d. Pareto-distributed random variables $Y_1, \ldots, Y_n$ independent of $X$ with distribution function
$$G(y) = \begin{cases} 1 - y^{-(\delta/r)}, & y > 1, \\ 0, & y \le 1. \end{cases}$$
Set $b = \delta/r$. Then
$$V_{n,r}(P) \le E \min_{1 \le i \le n} |X - Y_i|^r = \int E \min_{1 \le i \le n} |x - Y_i|^r \, dP(x)$$
and for $x > 1$
$$\begin{aligned}
E \min_{1 \le i \le n} |x - Y_i|^r &= r \int_0^{\infty} P\Big(\min_{1 \le i \le n} |x - Y_i| > t\Big) t^{r-1} \, dt \\
&= r \int_0^{\infty} [1 - G(x + t) + G(x - t)]^n t^{r-1} \, dt \\
&= r \int_0^{x-1} [1 - (x - t)^{-b} + (x + t)^{-b}]^n t^{r-1} \, dt + r \int_{x-1}^{\infty} (x + t)^{-bn} t^{r-1} \, dt =: J_1(x) + J_2(x).
\end{aligned}$$
6. A s y m p t o t i c s for the quantization error
83
Since
$$(x-t)^{-b} - (x+t)^{-b} \ge 2bt\,x^{-(1+b)}, \quad 0 < t < x-1$$
and thus
$$[1 - (x-t)^{-b} + (x+t)^{-b}]^n \le [1 - 2bt\,x^{-(1+b)}]^n \le \exp(-2nbt\,x^{-(1+b)}), \quad 0 < t < x-1,$$
one gets
$$J_1(x) \le r\int_0^\infty \exp(-2nbt\,x^{-(1+b)})\,t^{r-1}\,dt = \frac{\Gamma(r+1)\,x^{r(1+b)}}{n^r(2b)^r} = \frac{\Gamma(r+1)\,r^r\,x^{r+\delta}}{n^r(2\delta)^r} =: \frac{c_1x^{r+\delta}}{n^r}, \quad n \ge 1.$$
Furthermore,
$$J_2(x) \le r\int_0^\infty (1+t)^{-bn}\,t^{r-1}\,dt = \frac{\Gamma(r+1)\Gamma(bn-r)}{\Gamma(bn)} \le \frac{c_2}{n^r}, \quad n \ge c_3,$$
where the constants $c_2 \ge 1$ and $c_3 > 0$ depend only on $\delta$ and $r$. This gives
$$V_{n,r}(P) \le n^{-r}\left(c_1E|X|^{r+\delta} + c_2\right), \quad \delta > 0,\ n \ge c_3.$$
Since $V_{n,r}(-X) = V_{n,r}(X)$ by Lemma 3.2, the above inequality also holds in case $P((-\infty,-1)) = 1$. If $P([-1,1]) = 1$, choose $\alpha = \{-1 + (2i-1)/n : i = 1,\dots,n\}$. Then
$$V_{n,r}(P) \le E\min_{a\in\alpha}|X - a|^r \le n^{-r}, \quad n \ge 1.$$
Now let $P$ be arbitrary. We may write $P = s_1P(\cdot\,|\,(-\infty,-1)) + s_2P(\cdot\,|\,[-1,1]) + s_3P(\cdot\,|\,(1,\infty))$ with $s_1 = P((-\infty,-1))$, $s_2 = P([-1,1])$, and $s_3 = P((1,\infty))$. Let $n_1 = [n/3]$. From Lemma 4.14 (b) and the above inequalities we deduce
$$V_{n,r}(P) \le n_1^{-r}\left(c_1E|X|^{r+\delta} + c_2\right), \quad n_1 \ge c_3.$$
This implies
$$V_{n,r}(P) \le n^{-r}\left(5^rc_1E|X|^{r+\delta} + 5^rc_2\right), \quad n \ge 5c_3.$$
□

Lemma 6.6 is easily generalized to arbitrary dimensions $d$.

6.7 Corollary
We have
$$V_{n,r}(P) \le n^{-r/d}\left(C_1E\|X\|^{r+\delta} + C_2\right), \quad \delta > 0,\ n \ge C_3$$
for numerical constants $C_1, C_2, C_3 > 0$ depending only on $\delta$, $r$, $d$ and the underlying norm.

Proof For $p \ge 1$, let $\|\cdot\|_p$ denote the $l_p$-norm on $\mathbb{R}^d$. Recall that all norms on $\mathbb{R}^d$ are equivalent. Hence
$$V_{n,r}(P) \le c_0V_{n,r}(P, \|\cdot\|_r)$$
where $V_{n,r}(P, \|\cdot\|_r)$ denotes the $n$-th quantization error for $P$ of order $r$ with respect to the $l_r$-norm and $c_0 > 0$ is a constant depending only on $r$ and the norm $\|\cdot\|$. Let $n_1 = [n^{1/d}]$. By Lemma 4.15 and Lemma 6.6
$$\begin{aligned}
V_{n,r}(P, \|\cdot\|_r) &\le \sum_{i=1}^d V_{n_1,r}(X_i) \\
&\le n_1^{-r}\Big(c_1\sum_{i=1}^d E|X_i|^{r+\delta} + dc_2\Big), \quad \delta > 0,\ n_1 \ge c_3 \\
&\le n^{-r/d}\Big(2^rc_1\sum_{i=1}^d E|X_i|^{r+\delta} + 2^rdc_2\Big), \quad \delta > 0,\ n \ge (2c_3)^d
\end{aligned}$$
where the constants $c_1, c_2, c_3 > 0$ depend only on $\delta$ and $r$. Furthermore,
$$\sum_{i=1}^d E|X_i|^{r+\delta} = E\|X\|_{r+\delta}^{r+\delta} \le c_4E\|X\|^{r+\delta}$$
for a constant $c_4 > 0$ depending only on $r$, $\delta$ and $\|\cdot\|$. This proves the assertion. □

We further need an elementary lemma.

6.8 Lemma
For $m \in \mathbb{N}$ and numbers $s_i > 0$, let
$$B = \Big\{(v_1,\dots,v_m) \in (0,\infty)^m : \sum_{i=1}^m v_i \le 1\Big\}$$
and
$$t_i = \frac{s_i^{d/(d+r)}}{\sum_{j=1}^m s_j^{d/(d+r)}}, \quad 1 \le i \le m.$$
Then the function
$$F : B \to \mathbb{R}_+, \quad F(v_1,\dots,v_m) = \sum_{i=1}^m s_iv_i^{-r/d}$$
satisfies
$$F(t_1,\dots,t_m) = \min_{(v_1,\dots,v_m)\in B}F(v_1,\dots,v_m) = \Big(\sum_{i=1}^m s_i^{d/(d+r)}\Big)^{(d+r)/d}.$$
Moreover, $(t_1,\dots,t_m)$ is the unique minimizer of $F$.

Proof By Hölder's inequality (for exponents less than 1) with $p = d/(d+r)$ and $q = -d/r$, one obtains
$$F(v_1,\dots,v_m) \ge \Big(\sum_{i=1}^m s_i^p\Big)^{1/p}\Big(\sum_{i=1}^m v_i^{-(r/d)q}\Big)^{1/q} = \Big(\sum_{i=1}^m s_i^p\Big)^{1/p}\Big(\sum_{i=1}^m v_i\Big)^{-r/d} \ge \Big(\sum_{i=1}^m s_i^{d/(d+r)}\Big)^{(d+r)/d} = F(t_1,\dots,t_m)$$
for $(v_1,\dots,v_m) \in B$. To get equality $F(v_1,\dots,v_m) = F(t_1,\dots,t_m)$ it is necessary (and sufficient) that $\sum_{i=1}^m v_i = 1$ and $s_i = cv_i^{1/p}$, $1 \le i \le m$, for some constant $c > 0$. This implies $t_i = v_i$ for every $i$. □
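The minimizer in Lemma 6.8 can be checked numerically. The following is a minimal sketch, not part of the original text: the function names `F`, `minimizer` and `minimum_value` are hypothetical, and the weights, `r` and `d` in the demo are arbitrary sample values.

```python
import random

def F(v, s, r, d):
    """F(v) = sum_i s_i * v_i**(-r/d), the function of Lemma 6.8."""
    return sum(si * vi ** (-r / d) for si, vi in zip(s, v))

def minimizer(s, r, d):
    """t_i = s_i**(d/(d+r)) / sum_j s_j**(d/(d+r))."""
    p = d / (d + r)
    w = [si ** p for si in s]
    total = sum(w)
    return [wi / total for wi in w]

def minimum_value(s, r, d):
    """(sum_i s_i**(d/(d+r)))**((d+r)/d), the claimed minimum of F on B."""
    p = d / (d + r)
    return sum(si ** p for si in s) ** (1 / p)

if __name__ == "__main__":
    random.seed(0)
    s, r, d = [0.5, 0.3, 0.2], 2.0, 3   # arbitrary sample values
    t = minimizer(s, r, d)
    print(F(t, s, r, d), minimum_value(s, r, d))
    # Random points of B (positive entries, sum <= 1) should never beat t.
    for _ in range(1000):
        raw = [random.uniform(0.01, 1.0) for _ in s]
        scale = random.uniform(0.01, 1.0) / sum(raw)
        v = [x * scale for x in raw]
        assert F(v, s, r, d) >= F(t, s, r, d) - 1e-12
```

The random search only probes the inequality; the lemma itself rests on the reverse Hölder inequality used in the proof.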
Proof of Theorem 6.2 The proof is given by a sequence of steps from the uniform case to the general case.

Step 1. Let $P = U([0,1]^d)$. Let $m, n \in \mathbb{N}$, $m < n$ and let $k = k(n,m) = [(n/m)^{1/d}]$. Choose a tesselation of the unit cube $[0,1]^d$ consisting of $k^d$ translates $C_1,\dots,C_{k^d}$ of the cube $[0,1/k]^d$. Then $P = k^{-d}\sum_{i=1}^{k^d}U(C_i)$. By Lemma 4.14 (b) and the translation and scale invariance of $M_{m,r}$ (see Lemma 3.2 (b)),
$$\begin{aligned}
V_{n,r}(P) &\le k^{-d}\sum_{i=1}^{k^d}V_{m,r}(U(C_i)) \\
&= k^{-d}\sum_{i=1}^{k^d}M_{m,r}(C_i)\,k^{-r} \\
&= k^{-r}M_{m,r}([0,1]^d) \\
&= k^{-r}V_{m,r}(P)
\end{aligned}$$
and hence
$$n^{r/d}V_{n,r}(P) \le \Big(\frac{k+1}{k}\Big)^rm^{r/d}V_{m,r}(P).$$
This implies
$$\limsup_{n\to\infty}n^{r/d}V_{n,r}(P) \le m^{r/d}V_{m,r}(P)$$
for every $m \in \mathbb{N}$. Therefore, $\lim_{n\to\infty}n^{r/d}V_{n,r}(P)$ exists in $[0,\infty)$ and
$$\lim_{n\to\infty}n^{r/d}V_{n,r}(P) = Q_r([0,1]^d). \tag{6.6}$$
From Theorem 4.16 it follows that $Q_r([0,1]^d) > 0$.
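For $d = 1$ the limit in Step 1 can be seen directly: the midpoints of $n$ equal subintervals of $[0,1]$ give an error of $(2n)^{-r}/(1+r)$, so $n^r$ times this error equals $1/((1+r)2^r)$, the constant that also appears in (7.7). The sketch below checks this by a midpoint Riemann sum; the function names and the grid size are illustrative assumptions, and the code only evaluates this particular quantizer (it does not prove optimality).

```python
def midpoint_centers(n):
    """Midpoints (2i-1)/(2n), i = 1..n, of n equal subintervals of [0, 1]."""
    return [(2 * i - 1) / (2 * n) for i in range(1, n + 1)]

def error_uniform(centers, r, grid=20_000):
    """Midpoint-rule approximation of E min_a |X - a|**r for X ~ U([0, 1])."""
    total = 0.0
    for j in range(grid):
        x = (j + 0.5) / grid
        total += min(abs(x - a) for a in centers) ** r
    return total / grid

if __name__ == "__main__":
    r = 2
    for n in (1, 2, 4, 8):
        # n**r * error should stay at 1 / ((1 + r) * 2**r) for every n.
        print(n, n ** r * error_uniform(midpoint_centers(n), r), 1 / ((1 + r) * 2 ** r))
```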
Step 2. Let $P = \sum_{i=1}^m s_iU(C_i)$, where $\{C_1,\dots,C_m\}$ is a packing in $\mathbb{R}^d$ consisting of closed cubes whose edges are parallel to the coordinate axes and with common length of the edges $l(C_i) = l > 0$, $s_i \ge 0$, $\sum_{i=1}^m s_i = 1$. Set $h = dP/d\lambda^d = \sum_{i=1}^m s_il^{-d}1_{C_i}$. We may assume without loss of generality that $s_i > 0$ for every $i$. For $n \in \mathbb{N}$, let
$$t_i = \frac{s_i^{d/(d+r)}}{\sum_{j=1}^m s_j^{d/(d+r)}} \quad\text{and}\quad n_i = n_i(n) = [t_in], \quad 1 \le i \le m.$$
Then by Lemma 4.14 (b) and Lemma 3.2
$$V_{n,r}(P) \le \sum_{i=1}^m s_iV_{n_i,r}(U(C_i)) = \sum_{i=1}^m s_iV_{n_i,r}(U([0,1]^d))\,l^r, \quad n \ge \max_{1\le i\le m}(1/t_i).$$
From Step 1 it follows that
$$n^{r/d}V_{n_i,r}(U([0,1]^d)) = (n/n_i)^{r/d}\,n_i^{r/d}V_{n_i,r}(U([0,1]^d)) \to t_i^{-r/d}Q_r([0,1]^d) \quad\text{as } n \to \infty.$$
This implies
$$\limsup_{n\to\infty}n^{r/d}V_{n,r}(P) \le Q_r([0,1]^d)\sum_{i=1}^m s_it_i^{-r/d}l^r = Q_r([0,1]^d)\,\|h\|_{d/(d+r)}. \tag{6.7}$$
To prove that $Q_r([0,1]^d)\|h\|_{d/(d+r)}$ is a lower bound for $L = \liminf_{n\to\infty}n^{r/d}V_{n,r}(P)$, let $\beta = \beta(n) \in C_{n,r}(P)$ for $n \in \mathbb{N}$, $\beta_i = \beta_i(n) = \beta \cap \operatorname{int}C_i$, and $n_i = n_i(n) = |\beta_i|$, $1 \le i \le m$. For $0 < \varepsilon < l/2$, let $C_{i,\varepsilon} \subset C_i$ be a parallel closed cube with the same midpoint as $C_i$ and edge-length $l(C_{i,\varepsilon}) = l - 2\varepsilon$. Choose a finite set $\gamma_i = \gamma_i(\varepsilon) \subset C_{i,\varepsilon}$, $|\gamma_i| = k = k(\varepsilon)$ such that
$$\min_{a\in\gamma_i}\|x - a\| \le \inf_{y\in C_i^c}\|x - y\| \quad\text{for every } x \in C_{i,\varepsilon},\ 1 \le i \le m.$$
Then
$$\begin{aligned}
V_{n,r}(P) &= \sum_{i=1}^m s_i\int_{C_i}\min_{b\in\beta}\|x - b\|^r\,dx\;l^{-d} \\
&\ge \sum_{i=1}^m s_i\int_{C_{i,\varepsilon}}\min_{b\in\beta\cup\gamma_i}\|x - b\|^r\,dx\;l^{-d} \\
&= \sum_{i=1}^m s_i\int_{C_{i,\varepsilon}}\min_{b\in\beta_i\cup\gamma_i}\|x - b\|^r\,dx\;l^{-d} \\
&\ge \sum_{i=1}^m s_iV_{n_i+k,r}(U(C_{i,\varepsilon}))\,(l-2\varepsilon)^dl^{-d} \\
&= \sum_{i=1}^m s_iV_{n_i+k,r}(U([0,1]^d))\,(l-2\varepsilon)^{d+r}l^{-d}, \quad n \in \mathbb{N}.
\end{aligned}$$
Choose a subsequence (also denoted by $(n)$) such that
$$\frac{n_i}{n} \to v_i \in [0,1],\ 1 \le i \le m \quad\text{and}\quad n^{r/d}V_{n,r}(P) \to L \quad\text{as } n \to \infty.$$
Since $\sum_{i=1}^m n_i \le n$, we have $\sum_{i=1}^m v_i \le 1$. Furthermore, $v_i > 0$ for every $i$. Otherwise, Step 1 yields $L = \infty$, which contradicts (6.7). By taking a further subsequence we can assume without loss of generality that
$$\lim_{n\to\infty}(n_i+k)^{r/d}V_{n_i+k,r}(U([0,1]^d)) = Q_r([0,1]^d), \quad 1 \le i \le m.$$
This implies
$$L \ge Q_r([0,1]^d)\sum_{i=1}^m s_iv_i^{-r/d}(l-2\varepsilon)^{d+r}l^{-d}.$$
Since $0 < \varepsilon < l/2$ is arbitrary and by Lemma 6.8, one obtains
$$L \ge Q_r([0,1]^d)\sum_{i=1}^m s_iv_i^{-r/d}l^r \ge Q_r([0,1]^d)\Big(\sum_{i=1}^m s_i^{d/(d+r)}\Big)^{(d+r)/d}l^r = Q_r([0,1]^d)\,\|h\|_{d/(d+r)}.$$
Hence, (6.3) holds in this setting.
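The point allocation used in Step 2 can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, and the cube weights, edge length, `r`, `d` and `n` below are arbitrary sample values. The sketch checks that the allocation $n_i = [t_in]$ uses at most $n$ points and that $\sum_i s_it_i^{-r/d}l^r$ agrees with $\|h\|_{d/(d+r)}$ for the piecewise constant density $h$.

```python
def allocation(s, r, d, n):
    """Weights t_i proportional to s_i**(d/(d+r)) and point counts n_i = [t_i * n]."""
    p = d / (d + r)
    w = [si ** p for si in s]
    total = sum(w)
    t = [wi / total for wi in w]
    return t, [int(ti * n) for ti in t]

def norm_h(s, l, r, d):
    """||h||_{d/(d+r)} for h = sum_i s_i * l**(-d) * 1_{C_i}, disjoint cubes of edge l."""
    p = d / (d + r)
    integral = sum((si * l ** (-d)) ** p * l ** d for si in s)
    return integral ** (1 / p)

def upper_constant(s, l, r, d):
    """sum_i s_i * t_i**(-r/d) * l**r, the factor next to Q_r([0,1]^d) in (6.7)."""
    t, _ = allocation(s, r, d, 1)
    return sum(si * ti ** (-r / d) for si, ti in zip(s, t)) * l ** r

if __name__ == "__main__":
    s, l, r, d, n = [0.6, 0.3, 0.1], 0.5, 2.0, 2, 1000   # arbitrary sample values
    t, counts = allocation(s, r, d, n)
    print(t, counts, sum(counts))
    print(upper_constant(s, l, r, d), norm_h(s, l, r, d))
```

Cubes carrying more mass receive more points, but less than proportionally: the share is $s_i^{d/(d+r)}$ rather than $s_i$.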
Step 3. Let $P$ be absolutely continuous with respect to $\lambda^d$ and assume that $P$ has a compact support. Let $\operatorname{supp}(P) \subset C$ for some closed cube $C$ whose edges are parallel to the coordinate axes with edge-length $l(C) = l$. For $k \in \mathbb{N}$, consider a tesselation of $C$ consisting of $k^d$ closed cubes $C_1,\dots,C_{k^d}$ of common edge-length $l(C_i) = l/k$. Set
$$P_k = \sum_{i=1}^{k^d}P(C_i)U(C_i), \qquad h_k = \frac{dP_k}{d\lambda^d} = \sum_{i=1}^{k^d}\frac{P(C_i)}{\lambda^d(C_i)}1_{C_i}$$
where $\lambda^d(C_i) = (l/k)^d$, and $h = dP/d\lambda^d$. By differentiation of measures
$$h_k \to h \quad \lambda^d\text{-a.s. as } k \to \infty$$
(cf. Cohn, 1980, Theorem 6.2.3). Therefore, by Scheffé's lemma
$$\lim_{k\to\infty}\|h_k - h\|_1 = 0.$$
Since
$$\Big|\,\|h_k\|_{d/(d+r)}^{d/(d+r)} - \|h\|_{d/(d+r)}^{d/(d+r)}\Big| \le \|h_k - h\|_{d/(d+r)}^{d/(d+r)} \quad\text{and}\quad \|h_k - h\|_{d/(d+r)} \le \lambda^d(C)^{r/d}\,\|h_k - h\|_1,$$
this implies
$$\lim_{k\to\infty}\|h_k\|_{d/(d+r)} = \|h\|_{d/(d+r)}.$$
Furthermore, by Step 2
$$\lim_{n\to\infty}n^{r/d}V_{n,r}(P_k) = Q_r([0,1]^d)\,\|h_k\|_{d/(d+r)}, \quad k \in \mathbb{N}. \tag{6.9}$$
For $n \in \mathbb{N}$ and $0 < \varepsilon < 1$, let $n_1 = n_1(n,\varepsilon) = [(1-\varepsilon)n]$ and $n_2 = n_2(n,\varepsilon) = [(\varepsilon n)^{1/d}]^d$. Consider a tesselation of $C$ consisting of $n_2$ closed cubes of edge-length $l/n_2^{1/d}$ and let $\gamma = \gamma(n_2)$ denote the set of its $n_2$ midpoints, that is, $\gamma$ corresponds to the cube $n_2$-quantizer for $C$. Then
$$\min_{a\in\gamma}\|x - a\| \le \frac{c_1l}{2n_2^{1/d}} \le \frac{c_1l}{(\varepsilon n)^{1/d}}, \quad x \in C,\ n \ge 1/\varepsilon$$
for some constant $c_1 > 0$ depending only on the underlying norm. Let $\alpha(n_1,k) \in C_{n_1,r}(P_k)$ and $\delta = \delta(n,k,\varepsilon) = \alpha(n_1,k) \cup \gamma(n_2)$. Then $|\delta| \le n$ and
$$n^{r/d}\Big|\int\min_{a\in\delta(n,k,\varepsilon)}\|x - a\|^r\,dP_k(x) - \int\min_{a\in\delta(n,k,\varepsilon)}\|x - a\|^r\,dP(x)\Big| \le \frac{c_1^rl^r}{\varepsilon^{r/d}}\,\|h_k - h\|_1 =: c_2\|h_k - h\|_1, \quad k \in \mathbb{N},\ n \ge \max\Big\{\frac1\varepsilon, \frac{1}{1-\varepsilon}\Big\}.$$
This implies
$$\begin{aligned}
n^{r/d}V_{n,r}(P) &\le n^{r/d}\int\min_{a\in\delta(n,k,\varepsilon)}\|x - a\|^r\,dP(x) \\
&\le n^{r/d}\int\min_{a\in\delta(n,k,\varepsilon)}\|x - a\|^r\,dP_k(x) + c_2\|h_k - h\|_1 \\
&\le n^{r/d}\int\min_{a\in\alpha(n_1,k)}\|x - a\|^r\,dP_k(x) + c_2\|h_k - h\|_1 \\
&= n^{r/d}V_{n_1,r}(P_k) + c_2\|h_k - h\|_1, \quad k \in \mathbb{N},\ n \ge \max\Big\{\frac1\varepsilon, \frac{1}{1-\varepsilon}\Big\}
\end{aligned}$$
and hence by (6.9)
$$\limsup_{n\to\infty}n^{r/d}V_{n,r}(P) \le (1-\varepsilon)^{-r/d}Q_r([0,1]^d)\,\|h_k\|_{d/(d+r)} + c_2\|h_k - h\|_1, \quad k \in \mathbb{N}.$$
Letting $k$ tend to infinity and then letting $\varepsilon$ tend to zero yields
$$\limsup_{n\to\infty}n^{r/d}V_{n,r}(P) \le Q_r([0,1]^d)\,\|h\|_{d/(d+r)}. \tag{6.10}$$
To prove the converse estimate, let $\beta(n_1) \in C_{n_1,r}(P)$ and $\tau = \tau(n,\varepsilon) = \beta(n_1) \cup \gamma(n_2)$. Then $|\tau| \le n$ and as above
$$n^{r/d}\Big|\int\min_{a\in\tau(n,\varepsilon)}\|x - a\|^r\,dP_k(x) - \int\min_{a\in\tau(n,\varepsilon)}\|x - a\|^r\,dP(x)\Big| \le c_2\|h_k - h\|_1, \quad k \in \mathbb{N},\ n \ge \max\Big\{\frac1\varepsilon, \frac{1}{1-\varepsilon}\Big\}.$$
This implies
$$\begin{aligned}
n^{r/d}V_{n_1,r}(P) &= n^{r/d}\int\min_{a\in\beta(n_1)}\|x - a\|^r\,dP(x) \\
&\ge n^{r/d}\int\min_{a\in\tau(n,\varepsilon)}\|x - a\|^r\,dP(x) \\
&\ge n^{r/d}\int\min_{a\in\tau(n,\varepsilon)}\|x - a\|^r\,dP_k(x) - c_2\|h_k - h\|_1 \\
&\ge n^{r/d}V_{n,r}(P_k) - c_2\|h_k - h\|_1, \quad k \in \mathbb{N},\ n \ge \max\Big\{\frac1\varepsilon, \frac{1}{1-\varepsilon}\Big\}.
\end{aligned}$$
Therefore
$$(1-\varepsilon)^{-r/d}\liminf_{n\to\infty}n_1^{r/d}V_{n_1,r}(P) \ge Q_r([0,1]^d)\,\|h_k\|_{d/(d+r)} - c_2\|h_k - h\|_1, \quad k \in \mathbb{N}.$$
Letting $k$ tend to infinity and then letting $\varepsilon$ tend to zero yields
$$\liminf_{n\to\infty}n^{r/d}V_{n,r}(P) = \liminf_{n\to\infty}n_1^{r/d}V_{n_1,r}(P) \ge Q_r([0,1]^d)\,\|h\|_{d/(d+r)}. \tag{6.11}$$
Hence, (6.3) holds in this setting.

Step 4. Let $P$ be singular with respect to $\lambda^d$ and assume that $P$ has a compact support. Let $\operatorname{supp}(P) \subset C$ for some open cube $C$ whose edges are parallel to the coordinate axes with edge-length $l(C) = l$. For any $\varepsilon > 0$, there is an open set $A \subset C$ such that $P(A) = 1$ and $\lambda^d(A) \le \varepsilon$. Moreover, there exists a countable partition $\{C_i : i \in \mathbb{N}\}$ of $A$ consisting of half-open cubes $C_i \ne \emptyset$ with edges parallel to the coordinate axes (cf. Cohn, 1980, Lemma 1.4.2). Choose $m \in \mathbb{N}$ such that
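$$\sum_{i\ge m+1}P(C_i) \le \varepsilon^{r/d}.$$
For $n \in \mathbb{N}$, let
$$s_i = \frac{\lambda^d(C_i)}{\sum_{j=1}^m\lambda^d(C_j)}, \qquad n_i = n_i(n) = [(s_in/2)^{1/d}]^d,\ 1 \le i \le m, \qquad n_{m+1} = n_{m+1}(n) = [(n/2)^{1/d}]^d.$$
For $1 \le i \le m$, consider a partition of $C_i$ into $n_i$ cubes of common edge-length $(\lambda^d(C_i)/n_i)^{1/d}$ and let $\alpha_i = \alpha_i(n_i)$ denote the set of its $n_i$ midpoints. Furthermore, consider a partition of $C$ into $n_{m+1}$ cubes of common edge-length $l/n_{m+1}^{1/d}$ and let $\alpha_{m+1} = \alpha_{m+1}(n_{m+1})$ be the set of its $n_{m+1}$ midpoints. Let $\alpha = \alpha(n) = \bigcup_{i=1}^{m+1}\alpha_i$. Then $|\alpha| \le n$ and
$$\min_{a\in\alpha(n)}\|x - a\| \le \frac{c\,\lambda^d(C_i)^{1/d}}{2n_i^{1/d}} \le \frac{c\,\lambda^d(C_i)^{1/d}}{(s_in/2)^{1/d}} \le \frac{c\,\varepsilon^{1/d}}{(n/2)^{1/d}}, \quad x \in C_i,\ n \ge 2/s_i,$$
$$\min_{a\in\alpha(n)}\|x - a\| \le \frac{cl}{(n/2)^{1/d}}, \quad x \in C,\ n \ge 2$$
for some constant $c > 0$ depending only on the underlying norm. This implies
$$\begin{aligned}
n^{r/d}V_{n,r}(P) &\le n^{r/d}\int\min_{a\in\alpha(n)}\|x - a\|^r\,dP(x) \\
&= \sum_{i=1}^m n^{r/d}\int_{C_i}\min_{a\in\alpha(n)}\|x - a\|^r\,dP(x) + n^{r/d}\int_{A\setminus\bigcup_{i=1}^mC_i}\min_{a\in\alpha(n)}\|x - a\|^r\,dP(x) \\
&\le c^r2^{r/d}\varepsilon^{r/d}\sum_{i=1}^mP(C_i) + c^r2^{r/d}l^r\sum_{i\ge m+1}P(C_i) \\
&\le (1+l^r)\,c^r2^{r/d}\varepsilon^{r/d}, \quad n \ge \max_{1\le i\le m}(2/s_i)
\end{aligned}$$
and hence
$$\limsup_{n\to\infty}n^{r/d}V_{n,r}(P) \le (1+l^r)\,c^r2^{r/d}\varepsilon^{r/d}.$$
Since $\varepsilon > 0$ is arbitrary, one obtains
$$\lim_{n\to\infty}n^{r/d}V_{n,r}(P) = 0. \tag{6.12}$$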
Step 5. Let $P$ be arbitrary. Set $h = dP_a/d\lambda^d$. For $k \in \mathbb{N}$, let $C_k = [-k,k]^d$. Then
$$P(\cdot\,|\,C_k) = \frac{h1_{C_k}}{P(C_k)}\lambda^d + \frac{P_s(\cdot\cap C_k)}{P(C_k)}$$
is the Lebesgue decomposition of $P(\cdot\,|\,C_k)$ with respect to $\lambda^d$. From Steps 3 and 4 and Lemma 6.5 (b) it follows
$$\lim_{n\to\infty}n^{r/d}V_{n,r}(P(\cdot\,|\,C_k)) = Q_r([0,1]^d)\,\Big\|\frac{h1_{C_k}}{P(C_k)}\Big\|_{d/(d+r)}. \tag{6.13}$$
Since $V_{n,r}(P) \ge P(C_k)V_{n,r}(P(\cdot\,|\,C_k))$, this gives
$$\liminf_{n\to\infty}n^{r/d}V_{n,r}(P) \ge Q_r([0,1]^d)\,\|h1_{C_k}\|_{d/(d+r)}.$$
Therefore, by the monotone convergence theorem as $k \to \infty$
$$\liminf_{n\to\infty}n^{r/d}V_{n,r}(P) \ge Q_r([0,1]^d)\,\|h\|_{d/(d+r)}. \tag{6.14}$$

Step 6. Now suppose $E\|X\|^{r+\delta} < \infty$ for some $\delta > 0$. Set $h = dP_a/d\lambda^d$. For $k \in \mathbb{N}$, let $C_k = [-k,k]^d$. Let $0 < \varepsilon < 1$. Using the decomposition $P = P(C_k)P(\cdot\,|\,C_k) + P(C_k^c)P(\cdot\,|\,C_k^c)$, it follows from (6.13) and Lemma 6.5 (a) that
$$\limsup_{n\to\infty}n^{r/d}V_{n,r}(P) \le (1-\varepsilon)^{-r/d}Q_r([0,1]^d)\,\|h1_{C_k}\|_{d/(d+r)} + P(C_k^c)\,\varepsilon^{-r/d}\limsup_{n\to\infty}n^{r/d}V_{n,r}(P(\cdot\,|\,C_k^c)). \tag{6.15}$$
By Corollary 6.7,
$$P(C_k^c)\,n^{r/d}V_{n,r}(P(\cdot\,|\,C_k^c)) \le c_1\int_{C_k^c}\|x\|^{r+\delta}\,dP(x) + c_2P(C_k^c), \quad n \ge c_3$$
(with constants $c_1, c_2, c_3$ independent of $k$) and the above moment condition implies
$$\lim_{k\to\infty}\int_{C_k^c}\|x\|^{r+\delta}\,dP(x) = 0.$$
Therefore, letting $k$ tend to infinity in (6.15) and then letting $\varepsilon$ tend to zero yields
$$\limsup_{n\to\infty}n^{r/d}V_{n,r}(P) \le Q_r([0,1]^d)\,\|h\|_{d/(d+r)}.$$
In view of (6.14), the proof is complete. □
Notes The approximation (6.3) of the n-th quantization error apparently occurred for the first time in Panter and Dite (1951) for univariate absolutely continuous distributions and r = 2. The proof of Theorem 6.2 for distributions with compact support is a simplified version (and an extension to arbitrary norms) of Bucklew and Wise (1982). The crucial point in their treatment of distributions with unbounded support is a "compander" result whose proof is not complete (cf. Linder, 1991). As shown above, the unbounded case can be resolved via the Pierce Lemma 6.6 and its generalization to arbitrary dimensions (Corollary 6.7). The asymptotics for empirical versions of the quantization problem (or related location problems) when both the level n and the sample size tend to infinity were studied in Hochbaum and Steele (1982), Wong (1984), Zemel (1985), Rhee and Talagrand (1989a), McGivney and Yukich (1997), Yukich (1998), Pötzelberger (1998b), and Graf and Luschgy (1999c).
7 Asymptotically optimal quantizers

Let $X$ be a $\mathbb{R}^d$-valued random variable with distribution $P$, let $\|\cdot\|$ denote any norm on $\mathbb{R}^d$ and let $1 \le r < \infty$. In the light of Theorem 6.2 a sequence $(\alpha_n)_{n\ge1}$ with $\alpha_n \subset \mathbb{R}^d$, $|\alpha_n| \le n$, is called asymptotically n-optimal set of centers for P of order r if
$$\lim_{n\to\infty}n^{r/d}E\min_{a\in\alpha_n}\|X - a\|^r = Q_r(P) \tag{7.1}$$
provided $P_a \ne 0$ and $E\|X\|^{r+\delta} < \infty$ for some $\delta > 0$. Here $Q_r(P)$ denotes the $r$-th quantization coefficient of $P$ as defined in (6.4). Notice that if $(\alpha_n)_{n\ge1}$ is asymptotically $n$-optimal of order $r$ and $\{A_a : a \in \alpha_n\}$ denotes a Voronoi partition of $\mathbb{R}^d$ with respect to $\alpha_n$, then $(f_n)_{n\ge1}$ with $f_n = \sum_{a\in\alpha_n}a1_{A_a} \in \mathcal{F}_n$ is an asymptotically $n$-optimal quantizer of order $r$, that is
$$\lim_{n\to\infty}n^{r/d}E\|X - f_n(X)\|^r = Q_r(P). \tag{7.2}$$

7.1 Mixtures and partitions
The following lemma is related to Lemma 6.8.

7.1 Lemma
Let $P = \sum_{i=1}^m s_iP_i$, $s_i > 0$, $\sum_{i=1}^m s_i = 1$, $\int\|x\|^{r+\delta}\,dP_i(x) < \infty$ for some $\delta > 0$, $P_{i,a} \ne 0$ for every $i$.

(a) $Q_r(P) \le \Big(\sum_{i=1}^m(s_iQ_r(P_i))^{d/(d+r)}\Big)^{(d+r)/d}$.

(b) Suppose
$$Q_r(P) = \Big(\sum_{i=1}^m(s_iQ_r(P_i))^{d/(d+r)}\Big)^{(d+r)/d}. \tag{7.3}$$
For $n \in \mathbb{N}$, let
$$t_i = \frac{(s_iQ_r(P_i))^{d/(d+r)}}{\sum_{j=1}^m(s_jQ_r(P_j))^{d/(d+r)}}, \tag{7.4}$$
$n_i = n_i(n) = [t_in]$, $\alpha_{i,n} \in C_{n_i,r}(P_i)$, $\alpha_n = \bigcup_{i=1}^m\alpha_{i,n}$. Then $(\alpha_n)_{n\ge1}$ is an asymptotically $n$-optimal set of centers for $P$ of order $r$.

Proof (a) and (b). By Lemma 4.14 (b)
$$V_{n,r}(P) \le \int\min_{a\in\alpha_n}\|x - a\|^r\,dP(x) \le \sum_{i=1}^m s_iV_{n_i,r}(P_i), \quad n \ge \max_{1\le i\le m}(1/t_i)$$
and by Theorem 6.2
$$\lim_{n\to\infty}n^{r/d}V_{n_i,r}(P_i) = t_i^{-r/d}Q_r(P_i), \quad 1 \le i \le m.$$
This implies
$$\begin{aligned}
Q_r(P) &\le \liminf_{n\to\infty}n^{r/d}\int\min_{a\in\alpha_n}\|x - a\|^r\,dP(x) \\
&\le \limsup_{n\to\infty}n^{r/d}\int\min_{a\in\alpha_n}\|x - a\|^r\,dP(x) \\
&\le \sum_{i=1}^m s_iQ_r(P_i)\,t_i^{-r/d} \\
&= \Big(\sum_{i=1}^m(s_iQ_r(P_i))^{d/(d+r)}\Big)^{(d+r)/d}.
\end{aligned}$$
□
For $P$ with $P_a = h\lambda^d \ne 0$ and $h \in L_{d/(d+r)}(\lambda^d)$, define
$$h_r = \frac{h^{d/(d+r)}}{\int h^{d/(d+r)}\,d\lambda^d}, \qquad P_r = h_r\lambda^d. \tag{7.5}$$
Recall that by Remark 6.3, the condition $h \in L_{d/(d+r)}(\lambda^d)$ is satisfied in case $E\|X\|^{r+\delta} < \infty$ for some $\delta > 0$. The probability distribution $P_r$ will play a central role.

7.2 Lemma
Suppose $P_a \ne 0$ and $E\|X\|^{r+\delta} < \infty$ for some $\delta > 0$. Let $\{A_1,\dots,A_m\}$ be a $P$-packing in $\mathbb{R}^d$ (i.e., $P(A_i \cap A_j) = 0$ for $i \ne j$) such that $P_a(A_i) > 0$ for every $i$ and $P(\bigcup_{i=1}^mA_i) = 1$. Then for the mixture $P = \sum_{i=1}^mP(A_i)P(\cdot\,|\,A_i)$, (7.3) is satisfied and (7.4) takes the form $t_i = P_r(A_i)$ for every $i$.

Proof Let $P_a = h\lambda^d$. We have $P(\cdot\,|\,A_i)_a = h1_{A_i}\lambda^d/P(A_i) \ne 0$ and thus
$$\begin{aligned}
Q_r(P(\cdot\,|\,A_i)) &= Q_r([0,1]^d)\Big(\int_{A_i}h^{d/(d+r)}\,d\lambda^d\Big)^{(d+r)/d}\Big/P(A_i) \\
&= Q_r([0,1]^d)\Big(P_r(A_i)\int h^{d/(d+r)}\,d\lambda^d\Big)^{(d+r)/d}\Big/P(A_i) \\
&= Q_r(P)\,P_r(A_i)^{(d+r)/d}/P(A_i).
\end{aligned}$$
Therefore
$$\big(P(A_i)\,Q_r(P(\cdot\,|\,A_i))\big)^{d/(d+r)} = Q_r(P)^{d/(d+r)}\,P_r(A_i).$$
Since $P_a$ and $P_r$ are mutually absolutely continuous and hence
$$\sum_{i=1}^mP_r(A_i) = P_r\Big(\bigcup_{i=1}^mA_i\Big) = 1,$$
this gives the assertions. □
An interesting consequence concerns univariate symmetric and nonsingular distributions. While there are not necessarily symmetric n-optimal sets of centers (see Example 5.2 and Remark 5.3), there are symmetric asymptotically n-optimal sets of centers. More generally, we have:

7.3 Corollary (Invariant distributions)
Let $G$ be a finite group of bijective isometries on $\mathbb{R}^d$. Suppose further $P$ is $G$-invariant, $P_a \ne 0$ and $E\|X\|^{r+\delta} < \infty$ for some $\delta > 0$. Suppose there exists $A \in \mathcal{B}(\mathbb{R}^d)$ such that $\{T(A) : T \in G\}$ is a $P$-packing in $\mathbb{R}^d$ and $P(\bigcup_{T\in G}T(A)) = 1$. Let $n_1 = n_1(n) = [n/|G|]$ and $\alpha_n \in C_{n_1,r}(P(\cdot\,|\,A))$. Then $(\bigcup_{T\in G}T(\alpha_n))_{n\ge1}$ is an asymptotically $n$-optimal set of centers for $P$ of order $r$.

Proof For $T \in G$ we have
$$P = P^T = P_a^T + P_s^T,$$
where $P = P_a + P_s$ denotes the Lebesgue decomposition of $P$ with respect to $\lambda^d$. Since $\lambda^d$ is $G$-invariant, $P_a^T$ is absolutely continuous and $P_s^T$ is singular with respect to $\lambda^d$. So by the uniqueness of the Lebesgue decomposition, $P_a$ is $G$-invariant. Let $P_a = h\lambda^d$. Then $h \circ T = h$ $\lambda^d$-a.s., $T \in G$. Therefore, $P_r$ is also $G$-invariant. This implies $P_a(A) = P_a(T(A)) > 0$ and
$$P_r(A) = P_r(T(A)) = 1/|G|, \quad T \in G.$$
Furthermore, since $P(\cdot\,|\,T(A)) = P(\cdot\,|\,A)^T$, $T \in G$, we obtain from Lemma 3.2 that
$$T(\alpha_n) \in C_{n_1,r}(P(\cdot\,|\,T(A))), \quad T \in G,\ n \in \mathbb{N}.$$
The assertion now follows from Lemmas 7.1 and 7.2. □
7.4 Example (Sign-symmetric distributions)
Let $X$ be sign-symmetric, that is, for every choice of signs $\varepsilon_1 = \pm1,\dots,\varepsilon_d = \pm1$, $X = (X_1,\dots,X_d)$ has the same distribution as $(\varepsilon_1X_1,\dots,\varepsilon_dX_d)$. The corresponding group $G$ satisfies $|G| = 2^d$ and consists of isometries if, for instance, the underlying norm is the $l_p$-norm, $1 \le p \le \infty$. Suppose $P(\{x \in \mathbb{R}^d : x_i = 0\}) = 0$ for every $i$ and let $A = [0,\infty)^d$. Then $\{T(A) : T \in G\}$ is a $P$-tesselation of $\mathbb{R}^d$.

7.2 Empirical measures

If $(\alpha_n)_{n\ge1}$ is an asymptotically $n$-optimal set of centers for $P$ of order $r$ and $\{A_a : a \in \alpha_n\}$ denotes a Voronoi partition of $\mathbb{R}^d$ with respect to $\alpha_n$, then $(\sum_{a\in\alpha_n}P(A_a)\delta_a)_{n\ge1}$ is an asymptotically $n$-optimal quantizing measure of order $r$, that is
$$\lim_{n\to\infty}n^{r/d}\rho_r^r\Big(P, \sum_{a\in\alpha_n}P(A_a)\delta_a\Big) = Q_r(P),$$
where $\rho_r$ is the $L_r$-minimal metric defined in (3.3). In particular, $\sum_{a\in\alpha_n}P(A_a)\delta_a$ converges weakly to $P$. On the other hand, $\alpha_n$ is asymptotically $P_r$-distributed in the sense that the empirical measure of $\alpha_n$ converges weakly to $P_r$. This result which is suggested by Lemma 7.2 is the content of the following theorem. A remarkable property is that $P_r$ does not depend on the underlying norm.

7.5 Theorem
Suppose $P$ is absolutely continuous with respect to $\lambda^d$ and $E\|X\|^{r+\delta} < \infty$ for some $\delta > 0$. Let $(\alpha_n)_{n\ge1}$ be an asymptotically $n$-optimal set of centers for $P$ of order $r$. Then
$$\frac1n\sum_{a\in\alpha_n}\delta_a \to P_r \quad\text{weakly as } n \to \infty.$$

Proof The proof relies on Theorem 6.2 and the equality case of Hölder's inequality (cf. Lemma 6.8). Since certainly $\lim_{n\to\infty}|\alpha_n|/n = 1$, we may assume without loss of generality that $|\alpha_n| = n$ for every $n \ge 1$. Let
$$\mu_n = \frac1n\sum_{a\in\alpha_n}\delta_a.$$
It suffices to prove that the limiting measure of any vaguely convergent subsequence of $(\mu_n)_{n\ge1}$ coincides with $P_r$. Suppose for a subsequence (also denoted by $(\mu_n)$) $\mu_n \to \mu$ vaguely for some finite Borel measure $\mu$ on $\mathbb{R}^d$. Then $\mu(\mathbb{R}^d) \le 1$. Consider a $d$-dimensional interval $A = (b,c]$ with $b, c \in \mathbb{R}^d$ such that $\mu(\partial A) = 0$. By vague convergence $\mu_n(A) \to \mu(A)$ and hence $\mu_n(A^c) \to 1 - \mu(A)$. Assume $0 < P(A) < 1$. Since $P$ and $P_r$ are mutually absolutely continuous, this is equivalent to $0 < P_r(A) < 1$. Write $A_1 = A$, $A_2 = A^c$, $s_i = P(A_i)$, $v_1 = \mu(A_1)$, $v_2 = 1 - \mu(A_1)$, $P_i = P(\cdot\,|\,A_i)$ and $\alpha_{i,n} = \alpha_n \cap A_i$. For $0 < \varepsilon < \min_{i=1,2}P_r(A_i)$, choose $b_i, c_i \in \mathbb{R}^d$, $b < b_1 < c_1 < c$, $b_2 < b < c < c_2$ such that the sets $B_1 = [b_1,c_1]$ and $B_2 = [b_2,c_2]^c$ satisfy $P(B_i) > 0$ and
$$P_r(B_i) \ge P_r(A_i) - \varepsilon, \quad i = 1, 2.$$
Then choose a finite set $\gamma_i$ (on the boundary of $B_i$) so that
$$\min_{a\in\gamma_i}\|x - a\| \le \inf_{y\in A_i^c}\|x - y\| \quad\text{for every } x \in B_i,\ i = 1, 2.$$
Say $|\gamma_i| = k$. Then we obtain
$$\begin{aligned}
\int\min_{a\in\alpha_n}\|x - a\|^r\,dP(x) &= \sum_{i=1}^2 s_i\int\min_{a\in\alpha_n}\|x - a\|^r\,dP_i(x) \\
&\ge \sum_{i=1}^2 s_i\int_{B_i}\min_{a\in\alpha_n\cup\gamma_i}\|x - a\|^r\,dP_i(x) \\
&= \sum_{i=1}^2 s_i\int_{B_i}\min_{a\in\alpha_{i,n}\cup\gamma_i}\|x - a\|^r\,dP_i(x) \\
&\ge \sum_{i=1}^2 s_iV_{n_i+k,r}(P(\cdot\,|\,B_i))\,P(B_i)/P(A_i),
\end{aligned}$$
where $n_i = n\mu_n(A_i) = |\alpha_{i,n}|$. This implies $v_i > 0$, $i = 1, 2$. If not, then $Q_r(P) = \infty$, a contradiction. Using Theorem 6.2 we deduce
$$Q_r(P) = \lim_{n\to\infty}n^{r/d}\int\min_{a\in\alpha_n}\|x - a\|^r\,dP(x) \ge \sum_{i=1}^2 s_iv_i^{-r/d}\,Q_r(P(\cdot\,|\,B_i))\,P(B_i)/P(A_i).$$
We have
$$Q_r(P(\cdot\,|\,B_i))\,P(B_i) = Q_r(P)\,P_r(B_i)^{(d+r)/d} \ge Q_r(P)\,(P_r(A_i) - \varepsilon)^{(d+r)/d}.$$
Since $0 < \varepsilon < \min_{i=1,2}P_r(A_i)$ is arbitrary and by Lemmas 6.8 and 7.2, one obtains
$$\begin{aligned}
Q_r(P) &\ge \sum_{i=1}^2 s_iv_i^{-r/d}\,Q_r(P)\,P_r(A_i)^{(d+r)/d}/P(A_i) \\
&= \sum_{i=1}^2 s_iQ_r(P_i)\,v_i^{-r/d} \\
&\ge \sum_{i=1}^2 s_iQ_r(P_i)\,P_r(A_i)^{-r/d} = Q_r(P).
\end{aligned}$$
Using Lemmas 6.8 and 7.2 again, this yields $v_i = P_r(A_i)$, $i = 1, 2$. Thus $\mu(A) = P_r(A)$.

If $P(A) = 0$, then omit the first summands in the above considerations. One gets $v_2 = P_r(A_2) = 1$ and thus we have $\mu(A) = P_r(A) = 0$. If $P(A) = 1$, then omit the second summands. One obtains $v_1 = P_r(A_1) = 1$. Now we have $\mu(A) = P_r(A)$ for every (bounded) $d$-dimensional interval $A = (b,c]$ with $\mu(\partial A) = 0$. This implies $\mu = P_r$. □

Computations of $P_r$ can be found in Tables 7.1 and 7.2.
  $P$                                                                                      $P_r$
  d-dimensional Normal: $N_d(0,\Sigma)$, $\Sigma$ positive definite                        $N_d(0, \frac{d+r}{d}\Sigma)$
  Uniform: $U(B)$, $B \in \mathcal{B}(\mathbb{R}^d)$ bounded, $\lambda^d(B) > 0$           $U(B)$

Table 7.1: Probability distributions $P_r$
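The normal entry of Table 7.1 can be checked numerically in dimension $d = 1$, where $h_r$ is proportional to $h^{1/(1+r)}$ by (7.5). The sketch below is illustrative only: the function name, the integration window and the grid size are assumptions, and the integral is a plain Riemann sum. The variance of the resulting density should come out close to $(1+r)\sigma^2$.

```python
import math

def variance_of_P_r(sigma, r, half_width=60.0, steps=200_001):
    """Variance of the density proportional to h**(1/(1+r)), h the N(0, sigma^2) density (d = 1)."""
    p = 1.0 / (1.0 + r)
    step = 2 * half_width / (steps - 1)
    mass = 0.0
    second = 0.0
    for j in range(steps):
        x = -half_width + j * step
        h = math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        w = h ** p          # unnormalized value of h_r at x
        mass += w
        second += x * x * w
    return second / mass    # the common factor `step` cancels in the ratio

if __name__ == "__main__":
    sigma, r = 1.5, 2.0     # arbitrary sample values
    print(variance_of_P_r(sigma, r), (1 + r) * sigma ** 2)
```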
  $P$                                        $(\otimes_1^d P)_r$
  Logistic: $L(a)$                           …
  Double exponential: $DE(a)$                $\otimes_1^d DE\big(\frac{a(d+r)}{d}\big)$
  Double Gamma: $D\Gamma(a,b)$               $\otimes_1^d D\Gamma\big(\frac{a(d+r)}{d}, \frac{bd+r}{d+r}\big)$
  Hyper-exponential: $HE(a,b)$               $\otimes_1^d HE\big(a(\frac{d+r}{d})^{1/b}, b\big)$
  Exponential: $E(a)$                        $\otimes_1^d E\big(\frac{a(d+r)}{d}\big)$
  Gamma: $\Gamma(a,b)$                       $\otimes_1^d \Gamma\big(\frac{a(d+r)}{d}, \frac{bd+r}{d+r}\big)$
  Weibull: $W(a,b)$                          …
  Pareto: $P(a,b)$, $bd > r$                 $\otimes_1^d P\big(a, \frac{bd-r}{d+r}\big)$

Table 7.2: Probability distributions $(\otimes_1^d P)_r$
7.3 Asymptotic optimality in one dimension

For univariate distributions, the necessary condition of Theorem 7.5 can be turned around and used to construct asymptotically n-optimal sets of centers. Let $d = 1$ and let $P = h\lambda$ such that $I = \{h > 0\}$ is an open (possibly unbounded) interval and $h$ is continuous on $I$. Suppose $E|X|^{r+\delta} < \infty$ for some $\delta > 0$. For $n \in \mathbb{N}$, let $a_i$ denote the $\frac{2i-1}{2n}$-quantile of $P_r$, $1 \le i \le n$, and let $m_i = (a_i + a_{i+1})/2$, $1 \le i \le n-1$. Then $a_i \in I$ and
$$E\min_{1\le i\le n}|X - a_i|^r = \int_{-\infty}^{a_1}(a_1 - x)^rh(x)\,dx + \sum_{i=2}^n\int_{m_{i-1}}^{a_i}(a_i - x)^rh(x)\,dx + \sum_{i=1}^{n-1}\int_{a_i}^{m_i}(x - a_i)^rh(x)\,dx + \int_{a_n}^\infty(x - a_n)^rh(x)\,dx.$$
The mean value theorem yields
$$\int_{m_{i-1}}^{a_i}(a_i - x)^rh(x)\,dx = h(u_i)\int_{m_{i-1}}^{a_i}(a_i - x)^r\,dx = \frac{1}{(1+r)2^{r+1}}\,h(u_i)\,(a_i - a_{i-1})^{r+1}, \quad m_{i-1} \le u_i \le a_i,\ 2 \le i \le n,$$
$$\int_{a_i}^{m_i}(x - a_i)^rh(x)\,dx = h(v_{i+1})\int_{a_i}^{m_i}(x - a_i)^r\,dx = \frac{1}{(1+r)2^{r+1}}\,h(v_{i+1})\,(a_{i+1} - a_i)^{r+1}, \quad a_i \le v_{i+1} \le m_i,\ 1 \le i \le n-1,$$
$$\frac1n = \frac{2i-1}{2n} - \frac{2(i-1)-1}{2n} = \int_{a_{i-1}}^{a_i}h_r(x)\,dx = h_r(w_i)\,(a_i - a_{i-1}), \quad a_{i-1} \le w_i \le a_i,\ 2 \le i \le n.$$
This gives
$$E\min_{1\le i\le n}|X - a_i|^r = \int_{-\infty}^{a_1}(a_1 - x)^rh(x)\,dx + \int_{a_n}^\infty(x - a_n)^rh(x)\,dx + \frac{1}{(1+r)2^{r+1}n^r}\sum_{i=2}^n\Big[\frac{h(u_i)}{h_r(w_i)^r}(a_i - a_{i-1}) + \frac{h(v_i)}{h_r(w_i)^r}(a_i - a_{i-1})\Big].$$
Note that $a_i, u_i, v_i, w_i$ depend on $n$. Under suitable assumptions on the density $h$, we have
$$\lim_{n\to\infty}\sum_{i=2}^n\frac{h(y_i)}{h_r(w_i)^r}(a_i - a_{i-1}) = \int\frac{h}{h_r^r}\,d\lambda = \|h\|_{1/(1+r)},$$
$y_i \in \{u_i, v_i\}$, and the two remaining summands are of order $o(n^{-r})$. One obtains
$$\lim_{n\to\infty}n^rE\min_{1\le i\le n}|X - a_i|^r = \frac{1}{(1+r)2^r}\,\|h\|_{1/(1+r)} = Q_r(P). \tag{7.7}$$
For the latter equation see Example 5.5. Thus $(\{a_1,\dots,a_n\})_{n\ge1}$ is asymptotically $n$-optimal for $P$ of order $r$. The same result holds for the $\frac{i}{n+1}$-quantiles of $P_r$, $1 \le i \le n$. Sufficient conditions for the above result to be valid can be found in Cambanis and Gerr (1983) (with a gap in the proof), Linder (1991), Pötzelberger and Felsenstein (1994).
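The quantile construction can be tried out for $P = N(0,1)$ and $r = 2$, where $P_2 = N(0,3)$ by Table 7.1. The following sketch uses only the Python standard library; the function names are hypothetical, and the closed-form cell integral is a standard normal-moment identity rather than anything taken from the text. The printed errors can be compared with the $N(0,1)$ rows of Table 7.3 (0.5006 and 0.3661 for $n = 2$).

```python
import math
from statistics import NormalDist

STD = NormalDist()  # distribution of X ~ N(0, 1)

def centers(n, r, rule="mid"):
    """Quantiles of P_r = N(0, 1 + r): levels (2i-1)/(2n) for rule "mid", i/(n+1) otherwise."""
    p_r = NormalDist(0.0, math.sqrt(1.0 + r))
    if rule == "mid":
        levels = [(2 * i - 1) / (2 * n) for i in range(1, n + 1)]
    else:
        levels = [i / (n + 1) for i in range(1, n + 1)]
    return [p_r.inv_cdf(q) for q in levels]

def _cdf(x):
    return 0.0 if x == -math.inf else 1.0 if x == math.inf else STD.cdf(x)

def _pdf(x):
    return 0.0 if math.isinf(x) else STD.pdf(x)

def _xpdf(x):
    return 0.0 if math.isinf(x) else x * STD.pdf(x)

def cell_error(a, lo, hi):
    """Integral of (x - a)^2 * phi(x) over (lo, hi), phi the N(0, 1) density."""
    mass = _cdf(hi) - _cdf(lo)
    return (1 + a * a) * mass - (_xpdf(hi) - _xpdf(lo)) - 2 * a * (_pdf(lo) - _pdf(hi))

def squared_error(cs):
    """E min_a (X - a)^2 for X ~ N(0, 1), summed over the Voronoi cells of the centers."""
    cs = sorted(cs)
    edges = [-math.inf] + [(u + v) / 2 for u, v in zip(cs, cs[1:])] + [math.inf]
    return sum(cell_error(a, lo, hi) for a, lo, hi in zip(cs, edges, edges[1:]))

if __name__ == "__main__":
    for n in range(2, 9):
        print(n, squared_error(centers(n, 2, "mid")), squared_error(centers(n, 2, "plus")))
```

Multiplying the errors by $n^2$ for growing $n$ gives a rough impression of the convergence in (7.7); the sketch makes no claim about the rate.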
7.6 Example (Double exponential distribution)
Let $P = DE(c)$. The $\frac{i}{n+1}$-quantiles of $P_r = DE((1+r)c)$ are given by
$$a_i = (1+r)c\,\log\Big(\frac{2i}{n+1}\Big), \quad 1 \le i \le \frac{n+1}{2},$$
$$a_i = (1+r)c\,\log\Big(\frac{n+1}{2n+2-2i}\Big), \quad \frac{n+1}{2} < i \le n.$$
In case $n$ odd and $r = 1$, they coincide with the $n$-optimal set of centers for $P$ of order 1 (cf. Example 5.6). The error of quantizers of the above type for various distributions is evaluated in Tables 7.3 and 7.4. These values should be compared with the $n$-th quantization error given in Tables 5.1 - 5.12.
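The closed form of Example 7.6 is easy to evaluate. The sketch below assumes that $DE(c)$ denotes the double exponential distribution with density $e^{-|x|/c}/(2c)$ (a scale parametrization, which is an assumption about notation fixed earlier in the book); the function names and the integration grid are illustrative. For $c = 1$ and $r = 1$ the computed errors can be compared with the second-line $DE(1)$ entries of Table 7.4 (0.6998 for $n = 2$, 0.5000 for $n = 3$).

```python
import math

def de_cdf(x, c):
    """Distribution function of DE(c), assuming density exp(-|x|/c) / (2c)."""
    return 0.5 * math.exp(x / c) if x < 0 else 1.0 - 0.5 * math.exp(-x / c)

def de_quantile_centers(n, c, r):
    """i/(n+1)-quantiles of DE((1+r)c), using the closed form of Example 7.6."""
    s = (1 + r) * c
    out = []
    for i in range(1, n + 1):
        if 2 * i <= n + 1:
            out.append(s * math.log(2 * i / (n + 1)))
        else:
            out.append(s * math.log((n + 1) / (2 * n + 2 - 2 * i)))
    return out

def abs_error(centers, c, half_width=50.0, steps=200_000):
    """Midpoint-rule approximation of E min_a |X - a| for X ~ DE(c)."""
    step = 2 * half_width * c / steps
    total = 0.0
    for j in range(steps):
        x = -half_width * c + (j + 0.5) * step
        density = math.exp(-abs(x) / c) / (2 * c)
        total += min(abs(x - a) for a in centers) * density * step
    return total

if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, abs_error(de_quantile_centers(n, 1.0, 1), 1.0))
```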
  P \ n                          2       3       4       5       6       7       8
  $N(0,1)$                     0.5006  0.2466  0.1456  0.0960  0.0680  0.0506  0.0392
                               0.3661  0.1913  0.1180  0.0802  0.0581  0.0441  0.0346
  $L(\sqrt{3}/\pi)$            0.7332  0.3373  0.1978  0.1304  0.0925  0.0690  0.0535
                               0.4188  0.2319  0.1478  0.1026  0.0754  0.0578  0.0457
  $DE(1/\sqrt{2})$             1.0826  0.3657  0.2418  0.1508  0.1112  0.0811  0.0641
                               0.5234  0.2648  0.1782  0.1200  0.0903  0.0682  0.0546
  $E(1)$                       0.4836  0.2223  0.1282  0.0834  0.0586  0.0434  0.0335
                               0.6112  0.2763  0.1551  0.0987  0.0680  0.0497  0.0378
  $\Gamma(1/\sqrt{2},2)$       0.4625  0.2232  0.1311  0.0861  0.0609  0.0453  0.0350
                               0.4761  0.2269  0.1323  0.0866  0.0610  0.0453  0.0350
  $W(2/\sqrt{4-\pi},2)$        0.4214  0.2040  0.1196  0.0784  0.0554  0.0411  0.0318
                               0.3896  0.1935  0.1151  0.0762  0.0541  0.0404  0.0313

Table 7.3: $r = 2$, $V_2(P) = 1$. Quantization error for $\frac{2i-1}{2n}$-quantiles (first line) and $\frac{i}{n+1}$-quantiles (second line) of $P_2$, $1 \le i \le n$
  P \ n                              2       3       4       5       6       7       8
  $N(0,\pi/2)$                     0.6512  0.4613  0.3565  0.2904  0.2450  0.2119  0.1866
                                   0.5965  0.4279  0.3345  0.2748  0.2334  0.2029  0.1795
  $L(1/(2\log 2))$                 0.7284  0.5151  0.3987  0.3253  0.2749  0.2380  0.2099
                                   0.6226  0.4569  0.3620  0.3001  0.2564  0.2239  0.1988
  $DE(1)$                          0.8863  0.5556  0.4504  0.3600  0.3091  0.2653  0.2358
                                   0.6998  0.5000  0.4063  0.3333  0.2879  0.2500  0.2232
  $E(1/\log 2)$                    0.6497  0.4459  0.3402  0.2752  0.2310  0.1991  0.1749
                                   0.6890  0.4694  0.3553  0.2856  0.2387  0.2050  0.1796
  $\Gamma(a,2)$, $a = 0.9508\ldots$  0.6396  0.4476  0.3443  0.2797  0.2356  0.2035  0.1791
                                   0.6351  0.4434  0.3409  0.2771  0.2335  0.2017  0.1776
  $W(a,2)$, $a = 2.7027\ldots$     0.6177  0.4324  0.3323  0.2698  0.2270  0.1960  0.1724
                                   0.5990  0.4222  0.3260  0.2655  0.2240  0.1937  0.1706

Table 7.4: $r = 1$, $V_1(P) = 1$. Quantization error for $\frac{2i-1}{2n}$-quantiles (first line) and $\frac{i}{n+1}$-quantiles (second line) of $P_1$, $1 \le i \le n$
7.4 Product quantizers

We will compare the asymptotic performance of n-optimal product quantizers with n-optimal quantizers. The comparison is based on the relation between $Q_r(X)$ and $Q_r(X_i)$, $1 \le i \le d$. For this, the following lemma is useful.

7.7 Lemma
Let
$$B = \Big\{(v_1,\dots,v_d) \in (0,\infty)^d : \prod_{i=1}^d v_i \le 1\Big\}$$
and for $s_i > 0$ let
$$t_i = \frac{s_i^{1/r}}{\prod_{j=1}^d s_j^{1/(rd)}}, \quad 1 \le i \le d.$$
Then the function
$$F : B \to \mathbb{R}_+, \quad F(v_1,\dots,v_d) = \sum_{i=1}^d s_iv_i^{-r}$$
satisfies
$$F(t_1,\dots,t_d) = \min_{(v_1,\dots,v_d)\in B}F(v_1,\dots,v_d) = d\prod_{i=1}^d s_i^{1/d}.$$

Proof By the arithmetic-geometric mean inequality one obtains
$$F(v_1,\dots,v_d) \ge d\Big(\prod_{i=1}^d s_iv_i^{-r}\Big)^{1/d} \ge d\prod_{i=1}^d s_i^{1/d} = F(t_1,\dots,t_d)$$
for $(v_1,\dots,v_d) \in B$. □
Let the underlying norm be the l~-norm with 1 <_ r < co. Suppose E[]X[] ~+~ < co for some 6 > 0 and Q~(Xi) > 0 for I < i < d.
d (a) Qr(X) < d H Qr(Xi) lid i=-1 d (b) I f t i = Q r ( X i ) i / r / H Qr(Xi) 1/rd, ni = [til~l/d] for n E z1N, ~i,n C Cni,r(ii) and /=1
an = Xd=l/~i,~, then d
lim nr/dS min HX - all ~ = d H Q~(Xi)Wd"
n--~O0
aEotn
i=l
104
II. Asymptotic quantization for nonsingular probability distributions
Proof The choice of ti comes from L e m m a 7.7. We have
V,~,~(X) < E min IIX - all r d
d
----~-~ E min ]Xi - b]r = E i=1
Vm,~(Xi)
i=1
provided ni >_ 1 for every i. Since by Theorem 6.2
n~/dV~,,~(X,) = (nl/---~d~n[V~,~(X,) --> t~-~Q~(X,) \ni ] as n -4 co, one obtains
Q~(X) <_ lira nr/dE ~'OO
min ]IX
-
all r
aEO~n
i:1 d :
i=[
[] In view of Lemmas 7.7 and 7.8, the number d
d 1-[ Q~(Xi) 1/d
(7.8)
i:1 Q~(xI,..., Xd)
represents the "vector quantizer advantage" provided the underlying norm is the Itnorm and assumptions of L e m m a 7.8 are satisfied. For the d-asymptotics of the vector quantizer advantage see Remark 9.6 (a). 7.9
Remark
Suppose $E\|X\|^{r+\delta} < \infty$ for some $\delta > 0$. Suppose further that the one-dimensional marginal distributions $P_i$ of $P$ are absolutely continuous with respect to $\lambda$, $1 \le i \le d$. Let $n_i = n_i(n) \in \mathbb{N}$ such that $\prod_{i=1}^d n_i \le n$, let $(\beta_{i,n})_{n\ge1}$ be an asymptotically $n_i$-optimal set of centers for $P_i$ of order $r$ and let $\alpha_n = \times_{i=1}^d\beta_{i,n}$. Then as $n \to \infty$
$$\frac{1}{|\alpha_n|}\sum_{a\in\alpha_n}\delta_a \to \bigotimes_{i=1}^d P_{i,r} \quad\text{weakly}.$$
In fact, we have
$$\frac{1}{|\alpha_n|}\sum_{a\in\alpha_n}\delta_a = \bigotimes_{i=1}^d\frac{1}{|\beta_{i,n}|}\sum_{b\in\beta_{i,n}}\delta_b$$
and by Theorem 7.5
$$\frac{1}{|\beta_{i,n}|}\sum_{b\in\beta_{i,n}}\delta_b \to P_{i,r} \quad\text{weakly},\ 1 \le i \le d.$$
Note that in case $d \ge 2$
$$\bigotimes_{i=1}^d P_{i,r} \ne \Big(\bigotimes_{i=1}^d P_i\Big)_r$$
unless $P_i$ is a uniform distribution for every $i$.

Notes The observation in Corollary 7.3 seems to be new. Theorem 7.5 extends considerably a corresponding result of McClure (1975) for one-dimensional distributions with compact support. See also the review article by McClure (1980). Rates of convergence in Theorem 7.5 for some one-dimensional distributions P (Pareto, exponential and power-function distributions) with respect to various local distances have been computed by Fort and Pagès (1999). A discussion of the vector quantizer advantage as defined in (7.8) can be found in Lookabaugh and Gray (1989). Na and Neuhoff (1995) provide a nice result about the asymptotic performance of suboptimal quantizers (like product quantizers).
8 Regular quantizers and quantization coefficients

As before, let $X$ be a $\mathbb{R}^d$-valued random variable with distribution $P$, let $\|\cdot\|$ denote any norm on $\mathbb{R}^d$, and let $1 \le r < \infty$. The $r$-th quantization coefficient of $P$ as defined in (6.4) consists of a geometric part $Q_r([0,1]^d)$ which depends only on $r$ and the dimension $d$ (and on the underlying norm) and a second part $\|dP_a/d\lambda^d\|_{d/(d+r)}$ which is related to the distribution $P$. For instance, the $r$-th quantization coefficient of a $d$-dimensional normal distribution $N_d(0,\Sigma)$ with positive definite covariance matrix $\Sigma$ is given by
$$Q_r(N_d(0,\Sigma)) = Q_r([0,1]^d)\,(2\pi)^{r/2}\Big(\frac{d+r}{d}\Big)^{(d+r)/2}(\det\Sigma)^{r/(2d)}.$$
The constants $Q_r([0,1]^d)$ are only known for $d = 1$, $d = 2$ ($l_1$-norm, $l_2$-norm), and in "trivial" cases for $d \ge 3$. It appears to be a rather challenging problem to determine the values of these constants. First, note that the scaling property of $V_{n,r}(P)$ carries over to $Q_r(P)$.

8.1 Lemma
Let $T : \mathbb{R}^d \to \mathbb{R}^d$ be a similarity transformation with scaling number $c > 0$.

(a) If $P_a \ne 0$ and $E\|X\|^{r+\delta} < \infty$ for some $\delta > 0$, then
$$Q_r(T(X)) = c^rQ_r(X).$$

(b) If $A \in \mathcal{B}(\mathbb{R}^d)$ is bounded with $\lambda^d(A) > 0$, then $Q_r(T(A)) = c^rQ_r(A)$.

Proof Immediate consequence of Lemma 3.2 (or of (6.4)). □

The following lemma shows that the largest quantization error among distributions concentrated on a fixed bounded set appears (asymptotically) for the uniform distribution.

8.2 Lemma
Let $A \in \mathcal{B}(\mathbb{R}^d)$ be bounded with $\lambda^d(A) > 0$. Then
$$\max\{Q_r(P) : P(A) = 1,\ P_a \ne 0\} = Q_r(A) = Q_r([0,1]^d)\,\lambda^d(A)^{r/d}.$$

Proof Immediate consequence of Hölder's inequality. □

Since one may not expect to be able to find the precise values of $Q_r([0,1]^d)$ for all dimensions $d$ (and all norms), it is of great interest to find bounds. These bounds immediately yield bounds for $Q_r(P)$.
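The distribution-dependent factor in the normal quantization coefficient given earlier in this section can be cross-checked numerically for $d = 1$, where the geometric constant is $1/((1+r)2^r)$ by (7.7). The sketch below is illustrative: the function names, integration window and grid size are assumptions, and the integral is a plain Riemann sum.

```python
import math

def normal_factor(r, d, det_sigma):
    """Closed form (2 pi)^(r/2) * ((d+r)/d)^((d+r)/2) * det(Sigma)^(r/(2d))."""
    return ((2 * math.pi) ** (r / 2)
            * ((d + r) / d) ** ((d + r) / 2)
            * det_sigma ** (r / (2 * d)))

def norm_h_numeric(r, sigma, half_width=60.0, steps=200_001):
    """Riemann-sum value of (integral of h**(1/(1+r)))**(1+r), h the N(0, sigma^2) density."""
    p = 1.0 / (1.0 + r)
    step = 2 * half_width / (steps - 1)
    total = 0.0
    for j in range(steps):
        x = -half_width + j * step
        h = math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        total += h ** p * step
    return total ** (1.0 / p)

def q_r_normal_1d(r, sigma):
    """Q_r(N(0, sigma^2)) for d = 1, using the constant 1/((1+r) 2^r) from (7.7)."""
    return normal_factor(r, 1, sigma ** 2) / ((1 + r) * 2 ** r)

if __name__ == "__main__":
    r, sigma = 2.0, 1.5    # arbitrary sample values
    print(norm_h_numeric(r, sigma), normal_factor(r, 1, sigma ** 2))
    print(q_r_normal_1d(2.0, 1.0))
```

By Lemma 8.1 the coefficient scales like $\sigma^r$, which the closed form reproduces through the factor $(\det\Sigma)^{r/(2d)}$.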
8.1 Ball lower bound

The following lower bound for $Q_r([0,1]^d)$ indicates that the members of an n-optimal partition for a uniform distribution tend to look like a ball. For the $l_2$-norm it is due to Zador (1963, 1982) and for arbitrary norms this bound is contained in Yamada et al. (1980).

8.3 Proposition
$$Q_r([0,1]^d) \ge M_r(B(0,1)) = \frac{d}{(d+r)\,\lambda^d(B(0,1))^{r/d}}.$$

Proof By (6.2) and Theorem 4.16
$$Q_r([0,1]^d) = \inf_{n\ge1}n^{r/d}M_{n,r}([0,1]^d) \ge M_r(B(0,1)).$$
□

For the formula for $M_r(B(0,1))$ see Lemma 2.9.
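For the Euclidean norm the ball lower bound is easy to tabulate, since the unit ball has volume $\pi^{d/2}/\Gamma(d/2+1)$. The sketch below is an illustration restricted to that norm (an assumption; the proposition covers arbitrary norms), with hypothetical function names.

```python
import math

def unit_ball_volume(d):
    """Lebesgue measure of the Euclidean unit ball in R^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def ball_lower_bound(r, d):
    """d / ((d + r) * vol(B(0,1))**(r/d)), the lower bound of Proposition 8.3 (l_2-norm)."""
    return d / ((d + r) * unit_ball_volume(d) ** (r / d))

if __name__ == "__main__":
    for d in (1, 2, 3, 8):
        print(d, ball_lower_bound(2, d))
```

For $d = 1$ the bound equals $1/((1+r)2^r)$, the value of the constant used in (7.7), so it is attained there.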
8.2 Space-filling figures, regular quantizers and upper bounds

For low dimensions good upper bounds for $Q_r([0,1]^d)$ can be obtained by the normalized $r$-th moment of space-filling figures in $\mathbb{R}^d$. Here a set $A \subset \mathbb{R}^d$ is called space-filling if $A$ is compact with $\lambda^d(A) > 0$ and there is a countable family $\mathcal{T}$ of bijective isometries on $\mathbb{R}^d$ such that $\{T(A) : T \in \mathcal{T}\}$ is a tesselation of $\mathbb{R}^d$. This notion depends on the underlying norm. In case the isometries $T \in \mathcal{T}$ can be chosen of the form $T(x) = x + t$, $t \in \mathbb{R}^d$, then $A$ is called space-filling by translation. Note that if $S : \mathbb{R}^d \to \mathbb{R}^d$ is a similarity transformation and $A$ is space-filling (by translation), then $S(A)$ is also space-filling (by translation). Let us mention some properties of space-filling sets.

8.4 Lemma
Let $A \subset \mathbb{R}^d$ be space-filling and let $\mathcal{T}$ be a corresponding family of bijective isometries. Choose $a \in C_r(U(A))$ and let $\alpha_A = \{T(a) : T \in \mathcal{T}\}$.

(a) $\{T(A) : T \in \mathcal{T}\}$ is a locally finite tesselation of $\mathbb{R}^d$.

(b) $\alpha_A$ is locally finite.

(c) $\lambda^d(\partial A) = 0$ and $\operatorname{int}(A) \ne \emptyset$.

Proof Set $u = \operatorname{diam}(A)$.
(a) Let $I = \{T \in \mathcal{T} : T(A) \cap B(0,s) \ne \emptyset\}$ for some $s > 0$. Then
$$\bigcup_{T\in I}T(A) \subset B(0, s+u).$$
Therefore
$$|I|\,\lambda^d(A) = \sum_{T\in I}\lambda^d(T(A)) = \lambda^d\Big(\bigcup_{T\in I}T(A)\Big) \le \lambda^d(B(0,s+u)) < \infty$$
which implies $|I| < \infty$.
(b) Let $s > 0$ and let $I = \{T \in \mathcal{T} : T(A) \cap B(0,s+u) \ne \emptyset\}$. By (a), $I$ is finite. Let $T \in \mathcal{T} \setminus I$. Then $\|y\| > s+u$ for every $y \in T(A)$. Since $T(a) \in C_r(U(T(A)))$ by Lemma 2.1, it follows from Lemma 2.6 (b) that there exists $x \in T(A)$ such that $\|x - T(a)\| \le u$. This gives
$$\|T(a)\| \ge \|x\| - \|x - T(a)\| > s + u - u = s.$$
Hence $\{T \in \mathcal{T} : T(a) \in B(0,s)\} \subset I$. This implies that $\alpha_A \cap B(0,s)$ is finite.
(c) Let $S \in \mathcal{T}$. Then $\{S^{-1}T(A) : T \in \mathcal{T}\}$ is a tesselation of $\mathbb{R}^d$. We have
$$\partial A \subset \bigcup_{T\in\mathcal{T},\,T\ne S}(S^{-1}T(A) \cap A)$$
and thus
$$\lambda^d(\partial A) \le \sum_{T\in\mathcal{T},\,T\ne S}\lambda^d(S^{-1}T(A) \cap A) = 0.$$
This implies $\operatorname{int}(A) \ne \emptyset$. □
Quantizers for $U([0,1]^d)$ can be built with space-filling sets A as follows. For $n \in \mathbb{N}$, consider a rescaled version cA of A, $c = c(n) > 0$, such that $\lambda^d(cA) = 1/n$. Observe that $\{cT(A) : T \in \mathcal{T}\}$ is a tesselation of $\mathbb{R}^d$ consisting of isometric copies of cA. In fact, $T = L + T(0)$, where $L = T - T(0)$ is a bijective isometry with $L(0) = 0$ and hence L is linear. The bijective isometry $\tilde{T} = L + cT(0)$ then satisfies $\tilde{T}(cA) = cT(A)$. Let
$$I = I(n) = \{T \in \mathcal{T} : cT(A) \subset [0,1]^d\} \quad \text{and} \quad m = m(n) = |I|.$$
Then $m \le n$. We thus obtain a tesselation of the cube $[0,1]^d$ consisting of m isometric copies $cT(A)$, $T \in I$, of cA and a region D near the boundary,
$$D = D(n) = [0,1]^d \setminus \bigcup_{T \in I} cT(A);$$
see Figure 8.1. Choose $a \in C_r(U(A))$. Define a regular n-quantizer $f_{n,A}$ with respect to A by
$$f_{n,A} = \sum_{T \in I} cT(a)\, 1_{B_T \cup \left(\left(\bigcup_{S \in I} B_S\right)^c \cap \tilde{B}_T\right)}, \qquad (8.1)$$
provided $I \ne \emptyset$, where $\{B_T : T \in I\}$ is a Borel measurable partition of $\bigcup_{T \in I} cT(A)$ with $B_T \subset cT(A)$ for every $T \in I$ and $\{\tilde{B}_T : T \in I\}$ is a Voronoi partition of $\mathbb{R}^d$ with respect to $\{cT(a) : T \in I\}$.

Figure 8.1: Tesselation of $[0,1]^2$ into m = 6 regular hexagons and a boundary region, n = 10
8.5 Theorem
Let $A \subset \mathbb{R}^d$ be space-filling. Then
$$\lim_{n \to \infty} n^{r/d} \int \|x - f_{n,A}(x)\|^r \, dU([0,1]^d)(x) = M_r(A).$$
In particular
$$Q_r([0,1]^d) \le M_r(A).$$
Proof By Lemma 2.1, $cT(a) \in C_r(U(cT(A)))$ and $M_r(cT(A)) = M_r(A)$. Therefore
$$\int \|x - f_{n,A}(x)\|^r \, dU([0,1]^d)(x) = \sum_{T \in I} \int_{cT(A)} \|x - cT(a)\|^r \, dx + \int_D \min_{T \in I} \|x - cT(a)\|^r \, dx$$
$$= n^{-(d+r)/d}\, m\, M_r(A) + \int_D \min_{T \in I} \|x - cT(a)\|^r \, dx$$
$$= n^{-r/d}\, \frac{m}{n}\, M_r(A) + \int_D \min_{T \in I} \|x - cT(a)\|^r \, dx.$$
Since A is space-filling, we have $\lambda^d(D) = \lambda^d(D(n)) = 1 - m(n)/n \to 0$. Let u denote the diameter of A. Then cu is equal to the diameter of $cT(A)$ and
$$cu = c(n)u = n^{-1/d} \lambda^d(A)^{-1/d} u = O(n^{-1/d}).$$
There exists a constant $\gamma > 1$ such that for every $x \in [0,1]^d$ and sufficiently small $s > 0$ one can find $y \in B(x, \gamma s)$ satisfying
$$B(y, s) \subset B(x, \gamma s) \cap [0,1]^d.$$
This is clearly true for the $l_\infty$-norm and hence for any norm. For $x \in D$ and n large we deduce
$$B(x, \gamma cu) \cap \Big(\bigcup_{T \in I} cT(A)\Big) \ne \emptyset.$$
Otherwise, choose $y \in B(x, \gamma cu)$ such that $B(y, cu) \subset B(x, \gamma cu) \cap [0,1]^d$ and then choose $S \in \mathcal{T}$ such that $y \in cS(A)$. One obtains
$$cS(A) \subset B(y, cu) \subset D,$$
a contradiction. In view of Lemma 2.6 (b) it follows that
$$\min_{T \in I} \|x - cT(a)\| \le (\gamma + 1)cu, \quad n \text{ large}.$$
Therefore
$$\int_D \min_{T \in I} \|x - cT(a)\|^r \, dx \le ((\gamma + 1)cu)^r \lambda^d(D) = o(n^{-r/d}).$$
This implies the assertion. □

Let the r-th regular quantization coefficient of $[0,1]^d$ be defined by
$$Q_r^{(R)}([0,1]^d) = \inf\{M_r(A) : A \subset \mathbb{R}^d \text{ space-filling}\}. \qquad (8.2)$$
By Theorem 8.5, we get
$$Q_r([0,1]^d) \le Q_r^{(R)}([0,1]^d). \qquad (8.3)$$
A basic question is whether equality holds in (8.3). The regular quantizer problem consists in finding a space-filler $A \subset \mathbb{R}^d$ such that $Q_r^{(R)}([0,1]^d) = M_r(A)$. Both problems are unsolved for $d \ge 3$. One technique for obtaining upper bounds is to select space-fillers in higher dimensional spaces by forming products of two (or more) lower dimensional space-fillers.

8.6 Lemma
Let the underlying norm be the $l_r$-norm, $1 \le r < \infty$. Let $A \subset \mathbb{R}^d$ and $B \subset \mathbb{R}^k$ be space-filling. Then $A \times B \subset \mathbb{R}^{d+k}$ is space-filling and
$$M_r(A \times B) = \frac{M_r(A)\lambda^d(A)^{r/d} + M_r(B)\lambda^k(B)^{r/k}}{\lambda^d(A)^{r/(d+k)}\, \lambda^k(B)^{r/(d+k)}}.$$
Proof Clearly $A \times B$ is space-filling in $\mathbb{R}^{d+k}$. Furthermore,
$$V_r(U(A \times B)) = V_r(U(A)) + V_r(U(B)) = M_r(A)\lambda^d(A)^{r/d} + M_r(B)\lambda^k(B)^{r/k}.$$
Dividing by $\lambda^{d+k}(A \times B)^{r/(d+k)} = (\lambda^d(A)\lambda^k(B))^{r/(d+k)}$ gives the assertion. □
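The product formula of Lemma 8.6 can be sketched in a few lines. The example below is illustrative only (the helper names are hypothetical): it rebuilds the cube $[0,1]^d$ from unit intervals, using $M_r([0,1]) = 1/((1+r)2^r)$, and forms a hexagonal prism from the regular hexagon of Example 8.12 (area $\sqrt{3}/2$, $M_2 = 5/(18\sqrt{3})$) and a unit interval.

```python
import math

def product_moment(m_a, vol_a, d, m_b, vol_b, k, r):
    """Normalized r-th moment of A x B under the l_r-norm (Lemma 8.6).

    m_a, m_b are the normalized r-th moments of A (in R^d) and B (in R^k);
    vol_a, vol_b are their Lebesgue volumes.
    """
    numerator = m_a * vol_a ** (r / d) + m_b * vol_b ** (r / k)
    return numerator / (vol_a * vol_b) ** (r / (d + k))

def cube_moment(d, r=2):
    """M_r([0,1]^d) obtained by multiplying d unit intervals one at a time."""
    m_interval = 1.0 / ((1 + r) * 2 ** r)
    moment, dim = m_interval, 1
    for _ in range(d - 1):
        moment = product_moment(moment, 1.0, dim, m_interval, 1.0, 1, r)
        dim += 1
    return moment

def hexagonal_prism_moment():
    """Regular hexagon (area sqrt(3)/2) times [0,1], r = 2, l_2-norm."""
    m_hex = 5.0 / (18.0 * math.sqrt(3.0))
    return product_moment(m_hex, math.sqrt(3.0) / 2.0, 2, 1.0 / 12.0, 1.0, 1, 2)

if __name__ == "__main__":
    print(cube_moment(3), hexagonal_prism_moment())
```

For r = 2 the cube value is d/12, and the prism of height one already improves slightly on the cube in dimension three; the height is not optimized here.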
8.3 Lattice quantizers
The Voronoi regions of lattices in $\mathbb{R}^d$ provide an interesting class of regular quantizers. A (d-dimensional) lattice in $\mathbb{R}^d$ is a locally finite additive subgroup of $\mathbb{R}^d$ which spans $\mathbb{R}^d$. Equivalently, a lattice is a subset of the form $\Lambda = \mathbb{Z}y_1 + \ldots + \mathbb{Z}y_d$ for some (vector space) basis $\{y_1, \ldots, y_d\}$ of $\mathbb{R}^d$; such a basis is called a basis of $\Lambda$. The volume of a fundamental parallelotope
$$\Big\{\sum_{i=1}^d t_i y_i : 0 \le t_i \le 1 \text{ for } 1 \le i \le d\Big\}$$
of a lattice $\Lambda$ does not depend on the choice of the basis $\{y_1, \ldots, y_d\}$ of $\Lambda$ and is denoted by $\det(\Lambda)$. Then $\det(\Lambda)$ is the absolute value of the determinant of the matrix with rows $y_1, \ldots, y_d$. Fundamental parallelotopes of $\Lambda$ are space-filling by translation with $\Lambda$ as set of translation vectors, and any space-filler $A \subset \mathbb{R}^d$ of this type satisfies $\lambda^d(A) = \det(\Lambda)$. Now consider the Voronoi diagram $\{W(a|\Lambda) : a \in \Lambda\}$ of a lattice $\Lambda \subset \mathbb{R}^d$. Obviously
$$W(a|\Lambda) = W(0|\Lambda) + a, \quad a \in \Lambda.$$
If the Voronoi diagram of $\Lambda$ is a tesselation of $\mathbb{R}^d$ we say that $\Lambda$ is admissible. For strictly convex norms every lattice is admissible (see (1.7) and Theorem 1.5).
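As a small illustration of the definition of $\det(\Lambda)$, the sketch below computes the absolute determinant of a basis matrix by Gaussian elimination over exact fractions. The function name and the chosen bases are illustrative assumptions; two different bases of the lattice $\mathbb{Z}(-1,1) + \mathbb{Z}(4,0)$ are used to show that the value does not depend on the basis.

```python
from fractions import Fraction

def lattice_det(basis):
    """Absolute value of the determinant of the matrix whose rows are the basis vectors."""
    m = [[Fraction(x) for x in row] for row in basis]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((i for i in range(col, n) if m[i][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # rows are linearly dependent: not a basis
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n):
                m[i][j] -= factor * m[col][j]
    return abs(det)

if __name__ == "__main__":
    print(lattice_det([(-1, 1), (4, 0)]))                       # one basis
    print(lattice_det([(3, 1), (-1, 1)]))                       # another basis of the same lattice
    print(lattice_det([(-1, -1, 0), (1, -1, 0), (0, 1, -1)]))   # a basis of the checkerboard lattice in R^3
    print(float(lattice_det([(1, 0), (0.5, 3 ** 0.5 / 2)])))    # hexagonal lattice
```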
8.7 Lemma
Let $\Lambda \subset \mathbb{R}^d$ be a lattice.
(a) $W(0|\Lambda)$ is compact and $\lambda^d(W(0|\Lambda)) \ge \det(\Lambda)$.
(b) $\Lambda$ is admissible if and only if $\lambda^d(W(0|\Lambda)) = \det(\Lambda)$ and $\lambda^d(\partial W(0|\Lambda)) = 0$.

Proof (a) Choose $s > 0$ such that $B(0,s)$ contains a fundamental parallelotope of $\Lambda$. Then $\{B(a,s) : a \in \Lambda\}$ is a covering of $\mathbb{R}^d$. This implies $W(0|\Lambda) \subset B(0,s)$ and hence $W(0|\Lambda)$ is compact. Furthermore, if B denotes a fundamental parallelotope of $\Lambda$, then
$$\lambda^d(W) = \sum_{a \in \Lambda} \lambda^d(W \cap (B + a)) = \sum_{a \in \Lambda} \lambda^d((W - a) \cap B) \ge \lambda^d(B) = \det(\Lambda),$$
where $W = W(0|\Lambda)$. Here the inequality follows from the fact that the Voronoi diagram of $\Lambda$ is a covering of $\mathbb{R}^d$; see Proposition 1.1.
(b) If $\Lambda$ is admissible, then in view of (a), $W(0|\Lambda)$ is space-filling by translation with $\Lambda$ as set of translation vectors. This implies $\lambda^d(W(0|\Lambda)) = \det(\Lambda)$. If $\Lambda$ is not admissible and $\lambda^d(\partial W(0|\Lambda)) = 0$, then $\mathrm{int}\, W(a_1|\Lambda) \cap \mathrm{int}\, W(a_2|\Lambda) \ne \emptyset$ for some $a_1, a_2 \in \Lambda$, $a_1 \ne a_2$. Hence, there exist $x_1, x_2 \in \mathrm{int}\, W(0|\Lambda)$, $x_1 \ne x_2$, such that $x_1 - x_2 \in \Lambda$. Let $b = x_1 - x_2$. Choose $\varepsilon > 0$ such that $B(x_i, \varepsilon) \subset W(0|\Lambda)$ and $B(x_1, \varepsilon) \cap B(x_2, \varepsilon) = \emptyset$. Set $A = W(0|\Lambda) \setminus B(x_1, \varepsilon)$. Then $B(x_1, \varepsilon) = B(x_2, \varepsilon) + b \subset A + b$, which yields $W(0|\Lambda) \subset A \cup (A + b)$. This implies that $\{A + a : a \in \Lambda\}$ is a covering of $\mathbb{R}^d$, hence $\lambda^d(A) \ge \det(\Lambda)$. We obtain
$$\lambda^d(W(0|\Lambda)) > \lambda^d(A) \ge \det(\Lambda). \qquad \Box$$

It is remarkable that by part (b) of the preceding lemma, the volume of $W(0|\Lambda)$ does not depend on the underlying norm as long as admissibility holds. There are lattices which are not admissible. This is illustrated by the following example.
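8.8 Example
Let the underlying norm on $\mathbb{R}^2$ be the $l_1$-norm and let $\Lambda = \mathbb{Z}(-1,1) + \mathbb{Z}(4,0)$. Then
$$\Lambda = \{a \in \mathbb{Z}^2 : a_1 + a_2 \in 4\mathbb{Z}\}, \quad \det(\Lambda) = 4,$$
$$W(0|\Lambda) = \{x \in \mathbb{R}^2 : |x_1| + |x_2| \le 1\} \cup ([-2,0]^2 \cap \{x \in \mathbb{R}^2 : x_1 + x_2 \ge -2\}) \cup ([0,2]^2 \cap \{x \in \mathbb{R}^2 : x_1 + x_2 \le 2\})$$
and $\lambda^2(W(0|\Lambda)) = 5$; see Figure 8.2. It follows from Lemma 8.7 that $\Lambda$ is not admissible (for the $l_1$-norm).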
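The area claimed in Example 8.8 can be checked by brute force. The following sketch (hypothetical function name, illustrative grid parameters) counts midpoints of a fine grid that are at least as close to 0 as to any other lattice point in the $l_1$-norm. Coordinates are kept as integers so that the ties, which fill two-dimensional regions for this lattice, are detected exactly.

```python
def l1_voronoi_area(cells_per_unit=20, half_width=3, reach=6):
    """Midpoint-grid estimate of the area of W(0|Lambda) under the l_1-norm
    for Lambda = {a in Z^2 : a1 + a2 in 4Z}.

    All coordinates are integers in units of 1/(2*cells_per_unit).
    """
    k = cells_per_unit
    scale = 2 * k
    # Nonzero lattice points in a box large enough to contain every competitor of 0.
    lattice = [(scale * a1, scale * a2)
               for a1 in range(-reach, reach + 1)
               for a2 in range(-reach, reach + 1)
               if (a1 + a2) % 4 == 0 and (a1, a2) != (0, 0)]
    # Cell midpoints are the odd integers in (-half_width*scale, half_width*scale).
    mids = range(-half_width * scale + 1, half_width * scale, 2)
    count = 0
    for x1 in mids:
        for x2 in mids:
            d0 = abs(x1) + abs(x2)
            if all(d0 <= abs(x1 - a1) + abs(x2 - a2) for a1, a2 in lattice):
                count += 1
    return count / (k * k)

if __name__ == "__main__":
    print(l1_voronoi_area())
```

The estimate should come out close to 5, with a small discretization excess from cells cut by the diagonal edges, and in any case well above $\det(\Lambda) = 4$.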
Figure 8.2: Voronoi region $W(0|\Lambda)$ with respect to the $l_1$-norm for a nonadmissible lattice

As concerns the convexity of $W(0|\Lambda)$ one can modify Remark 1.9 as follows: if $W(0|\Lambda)$ is convex for every lattice $\Lambda \subset \mathbb{R}^d$, then the underlying norm is euclidean (cf. Gruber, 1974, Theorem 2). If $\Lambda \subset \mathbb{R}^d$ is an admissible lattice, then we know from Lemma 8.7 that the Voronoi region $W(0|\Lambda)$ is space-filling by translation with $\Lambda$ as set of translation vectors. Thus Theorem 8.5 applies to the n-quantizer $f_{n,\Lambda} = f_{n,W(0|\Lambda)}$ for $U([0,1]^d)$. Note that $W(0|\Lambda)$ is symmetric (about the origin) and hence
$$M_r(W(0|\Lambda)) = \frac{\int_{W(0|\Lambda)} \|x\|^r \, dx}{\det(\Lambda)^{(d+r)/d}}; \qquad (8.4)$$
cf. Example 2.3. For $n \in \mathbb{N}$, let
$$\alpha_{n,\Lambda} = \{ca : a \in \Lambda,\ cW(a|\Lambda) \subset [0,1]^d\}, \quad c = c(n) = (n \det(\Lambda))^{-1/d}. \qquad (8.5)$$
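For the simplest case $\Lambda = \mathbb{Z}$, d = 1, the set $\alpha_{n,\Lambda}$ of (8.5) and its quantization error can be computed exactly. The sketch below (illustrative helper names) uses rational arithmetic; a direct calculation for this case gives $n^2 \int \min_b |x-b|^2\,dx = (n+6)/(12n)$ for $n \ge 2$, which tends to $M_2([-1/2,1/2]) = 1/12$ from (8.4).

```python
from fractions import Fraction

def lattice_quantizer_z1(n):
    """alpha_{n,Z} for d = 1: points c*a whose scaled cell c*[a-1/2, a+1/2] lies in [0,1], c = 1/n."""
    c = Fraction(1, n)
    half = Fraction(1, 2)
    return [c * a for a in range(-1, n + 2)
            if c * (a - half) >= 0 and c * (a + half) <= 1]

def squared_error(points):
    """Exact value of the integral over [0,1] of min_b |x - b|^2."""
    pts = sorted(points)
    # Nearest-point cells: cut at 0, at midpoints between neighbours, and at 1.
    cuts = [Fraction(0)] + [(p + q) / 2 for p, q in zip(pts, pts[1:])] + [Fraction(1)]
    total = Fraction(0)
    for b, lo, hi in zip(pts, cuts, cuts[1:]):
        total += ((hi - b) ** 3 - (lo - b) ** 3) / 3
    return total

if __name__ == "__main__":
    for n in (2, 10, 100, 1000):
        print(n, float(n ** 2 * squared_error(lattice_quantizer_z1(n))))
```

Here m = n - 1 points are used, and the two boundary pieces of length 1/(2n) account for the extra term 6/(12n).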
8.9 Theorem
Let $\Lambda \subset \mathbb{R}^d$ be an admissible lattice and let X be $U([0,1]^d)$-distributed. Then
$$\lim_{n \to \infty} n^{r/d} \int \min_{b \in \alpha_{n,\Lambda}} \|x - b\|^r \, dU([0,1]^d)(x) = M_r(W(0|\Lambda))$$
and
$$n^{1/d} \min_{b \in \alpha_{n,\Lambda}} \|X - b\| \xrightarrow{D} \mu, \quad n \to \infty,$$
where $\mu$ is a probability on $\mathbb{R}_+$ with distribution function
$$F_\mu(t) = \lambda^d\big((\det(\Lambda)^{-1/d} W(0|\Lambda)) \cap B(0,t)\big).$$
In particular
$$Q_r([0,1]^d) \le M_r(W(0|\Lambda)).$$

Proof We have
$$E \min_{b \in \alpha_{n,\Lambda}} \|X - b\|^r = E\|X - f_{n,\Lambda}(X)\|^r$$
(when $f_{n,\Lambda}$ is defined in (8.1) with respect to the center a = 0). So the first assertion follows from Theorem 8.5. Now observe that $\alpha_{n,\Lambda}$ does not depend on r and
$$M_r(W(0|\Lambda)) = \int z^r \, d\mu(z).$$
Since $\mathrm{supp}(\mu)$ is compact, the convergence of moments,
$$\lim_{n \to \infty} n^{r/d} E \min_{b \in \alpha_{n,\Lambda}} \|X - b\|^r = \int z^r \, d\mu(z) \quad \text{for every } 1 \le r < \infty,$$
implies the desired distributional convergence (cf. Hoffmann-Jørgensen, 1994, 5.13). □

Let the r-th lattice quantization coefficient of $[0,1]^d$ be defined by
$$Q_r^{(L)}([0,1]^d) = \inf\{M_r(W(0|\Lambda)) : \Lambda \subset \mathbb{R}^d \text{ admissible lattice}\}. \qquad (8.6)$$
Then
$$Q_r([0,1]^d) \le Q_r^{(R)}([0,1]^d) \le Q_r^{(L)}([0,1]^d). \qquad (8.7)$$
The lattice quantizer problem consists in finding an admissible lattice $\Lambda$ such that $Q_r^{(L)}([0,1]^d) = M_r(W(0|\Lambda))$.
8.10 Remark
(a) Suppose $A \subset \mathbb{R}^d$ is space-filling by translation, where the corresponding set of translation vectors is an admissible lattice $\Lambda$. Then $M_r(W(0|\Lambda)) \le M_r(A)$. In fact, since
$$E \min_{b \in \alpha_{n,\Lambda}} \|X - b\|^r \le E\|X - f_{n,A}(X)\|^r,$$
where X is $U([0,1]^d)$-distributed, the above inequality follows from Theorems 8.5 and 8.9.
(b) Suppose $A \subset \mathbb{R}^d$ is a convex space-filler by translation. Then A is a centrally symmetric polytope (i.e. $-(A - x) = A - x$ for some $x \in \mathbb{R}^d$) and admits as set of translation vectors a lattice $\Lambda$ (cf. McMullen, 1980). By (a) we have $M_r(W(0|\Lambda)) \le M_r(A)$ provided $\Lambda$ is admissible. Thus we obtain
$$Q_r^{(L)}([0,1]^d) = \inf\{M_r(A) : A \subset \mathbb{R}^d \text{ space-filling polytope by translation}\}$$
for euclidean norms.
(c) Suppose the ball $B(0,1)$ is space-filling. Then the ball is obviously space-filling by translation. By (b), there exists a lattice $\Lambda$ as set of translation vectors. Then $W(0|\Lambda) = B(0,1)$ and hence $\Lambda$ is admissible. In view of Proposition 8.3 we obtain
$$M_r(B(0,1)) = Q_r([0,1]^d) = Q_r^{(R)}([0,1]^d) = Q_r^{(L)}([0,1]^d).$$
Conversely, if $M_r(B(0,1)) = Q_r^{(L)}([0,1]^d)$ holds (for some r) and if the lattice quantizer problem has a solution $\Lambda$, then $B(0,1)$ is space-filling. To see this, choose $s > 0$ such that $\lambda^d(B(0,s)) = \det(\Lambda)$. By (the proof of) Lemma 2.9, we have $B(0,s) \subset W(0|\Lambda)$. To verify the converse inclusion, assume that there exists $x \in W(0|\Lambda)$ with $s < \|x\|$. Since the distance function $d(\cdot, \Lambda)$ is continuous on $\mathbb{R}^d$, one obtains $B(x, \varepsilon) \subset \big(\bigcup_{a \in \Lambda} B(a,s)\big)^c$ for some $\varepsilon > 0$. Choose $a \in \Lambda$ such that $\lambda^d(B(x,\varepsilon) \cap W(a|\Lambda)) > 0$. Then
$$\lambda^d(B(0,s)) < \lambda^d(B(a,s)) + \lambda^d(B(x,\varepsilon) \cap W(a|\Lambda)) \le \lambda^d(W(a|\Lambda)) = \det(\Lambda),$$
a contradiction. Hence $B(0,s) = W(0|\Lambda)$ and so $B(0,1)$ is space-filling.

The lattices in the following examples are related to optimality results.
8.11 Example (Standard lattice $\mathbb{Z}^d$)
Let $\Lambda = \mathbb{Z}^d$ and let the underlying norm be the $l_p$-norm, $1 \le p \le \infty$. Then $\det(\Lambda) = 1$ and $W(0|\Lambda) = [-\frac{1}{2}, \frac{1}{2}]^d$. In particular, $\Lambda$ is admissible. For computations of the normalized r-th moments of $W(0|\Lambda)$ see Example 4.17. In case $p = \infty$, the limiting measure $\mu$ in Theorem 8.9 is given by
$$F_\mu(t) = (2t)^d, \quad 0 \le t \le 1/2.$$
For d = 1, one obtains $\mu = U([0, \frac{1}{2}])$.

8.12 Example (Hexagonal lattice in $\mathbb{R}^2$)
Let d = 2 and let $\Lambda = \mathbb{Z}(1,0) + \mathbb{Z}(1/2, \sqrt{3}/2)$. Here we have $\det(\Lambda) = \sqrt{3}/2$. If the underlying norm is the $l_2$-norm, then $W(0|\Lambda)$ is a regular hexagon,
$$W(0|\Lambda) = \{x \in \mathbb{R}^2 : |x_1| \le 1/2,\ |x_1| + \sqrt{3}|x_2| \le 1\} = \mathrm{conv}\Big\{\pm\big(\tfrac{1}{2}, \tfrac{1}{2\sqrt{3}}\big),\ \pm\big(0, \tfrac{1}{\sqrt{3}}\big),\ \pm\big(\tfrac{1}{2}, -\tfrac{1}{2\sqrt{3}}\big)\Big\};$$
see Figure 8.3.

Figure 8.3: Voronoi region $W(0|\Lambda)$ with respect to the $l_2$-norm for the hexagonal lattice

We have
$$M_r(W(0|\Lambda)) = \frac{8 \cdot 2^{r/2}}{3^{(2+r)/4}} \int_0^{1/2} \int_0^{(1-x_1)/\sqrt{3}} (x_1^2 + x_2^2)^{r/2} \, dx_2 \, dx_1.$$
In case r = 1 and r = 2 one obtains
$$M_1(W(0|\Lambda)) = \frac{2 + 3\log(\sqrt{3})}{3^{7/4}\sqrt{2}} = 0.37719\ldots$$
and
$$M_2(W(0|\Lambda)) = \frac{5}{18\sqrt{3}} = 0.1603\ldots.$$
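The double integral for the hexagon can be evaluated numerically as a sanity check on the two closed forms. This is a rough midpoint-rule sketch with an illustrative function name and step count, not a high-accuracy quadrature.

```python
import math

def hexagon_moment(r, steps=400):
    """Midpoint-rule value of
    8 * 2**(r/2) / 3**((2+r)/4) * int_0^{1/2} int_0^{(1-x1)/sqrt(3)} (x1^2 + x2^2)**(r/2) dx2 dx1.
    """
    h1 = 0.5 / steps
    total = 0.0
    for i in range(steps):
        x1 = (i + 0.5) * h1
        upper = (1.0 - x1) / math.sqrt(3.0)   # upper limit of the inner integral
        h2 = upper / steps
        inner = 0.0
        for j in range(steps):
            x2 = (j + 0.5) * h2
            inner += (x1 * x1 + x2 * x2) ** (r / 2.0)
        total += inner * h2 * h1
    return 8.0 * 2.0 ** (r / 2.0) / 3.0 ** ((2.0 + r) / 4.0) * total

if __name__ == "__main__":
    print(hexagon_moment(1), (2.0 + 3.0 * math.log(math.sqrt(3.0))) / (3.0 ** 1.75 * math.sqrt(2.0)))
    print(hexagon_moment(2), 5.0 / (18.0 * math.sqrt(3.0)))
```

Each printed pair should agree to several decimal places, with values near 0.3772 and 0.1604.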
If the underlying norm is the $l_1$-norm, then $W(0|\Lambda)$ is the (nonregular) hexagon
$$W(0|\Lambda) = \{x \in \mathbb{R}^2 : |x_1| \le 1/2,\ |x_1| + |x_2| \le (1 + \sqrt{3})/4\}.$$
Since $\lambda^2(W(0|\Lambda)) = \sqrt{3}/2$, $\Lambda$ is admissible by Lemma 8.7.
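8.13 Example (Lattices $D_d$)
Let $\Lambda = \{a \in \mathbb{Z}^d : \sum_{i=1}^d a_i \text{ even}\}$. This lattice is usually called of type $D_d$ or checkerboard lattice. Here we have $\det(\Lambda) = 2$. If the underlying norm is the $l_2$-norm, we obtain
$$W(0|\Lambda) = [-1, 1], \quad d = 1,$$
$$W(0|\Lambda) = \Big\{x \in \mathbb{R}^d : \sum_{i \in I} |x_i| \le 1 \text{ for every } I \subset \{1, \ldots, d\},\ |I| = 2\Big\}, \quad d \ge 2.$$
In case d = 2, $W(0|\Lambda)$ is the unit $l_1$-ball and in dimension d = 3, $W(0|\Lambda)$ is a rhombic dodecahedron. Let r = 2 and write $W = W(0|\Lambda)$. Since W is invariant under permutations of the coordinates, one gets
$$\int_W \|x\|^2 \, dx = d \int_W x_1^2 \, dx = d \int_{-1}^{1} x_1^2\, \lambda^{d-1}(W_{x_1}) \, dx_1,$$
where $W_{x_1}$ denotes the $x_1$-section of W,
$$W_{x_1} = \Big\{y \in \mathbb{R}^{d-1} : \sum_{i \in I} |y_i| \le 1 \text{ for every } I \subset \{1, \ldots, d-1\},\ |I| = 2,\ |y_i| \le 1 - |x_1| \text{ for } 1 \le i \le d-1\Big\}, \quad |x_1| \le 1.$$
Using
$$\lambda^{d-1}(W_{x_1}) = 2 - 2^{d-1}|x_1|^{d-1}, \quad |x_1| \le \tfrac{1}{2},$$
$$\lambda^{d-1}(W_{x_1}) = 2^{d-1}(1 - |x_1|)^{d-1}, \quad \tfrac{1}{2} < |x_1| \le 1,$$
we deduce
$$\int_W \|x\|^2 \, dx = \frac{d}{6} + \frac{1}{d+1}.$$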
This implies the formula
$$M_2(W(0|\Lambda)) = \frac{1}{2^{2/d}}\Big(\frac{d}{12} + \frac{1}{2(d+1)}\Big), \quad d \ge 1$$
(cf. Conway and Sloane, 1993, p. 462). In particular
$$M_2(W(0|\Lambda)) = \frac{3}{4^{1/3} \cdot 8} = 0.2362\ldots, \quad d = 3,$$
$$M_2(W(0|\Lambda)) = \frac{13}{30\sqrt{2}} = 0.3064\ldots, \quad d = 4.$$
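To see how the checkerboard lattice compares with the ball lower bound of Proposition 8.3 and with the cube value d/12, one can tabulate the three quantities. The sketch below is illustrative (function names are not from the text) and uses the standard volume $\pi^{d/2}/\Gamma(d/2+1)$ of the euclidean unit ball.

```python
import math

def m2_checkerboard(d):
    """M_2(W(0|D_d)) = 2**(-2/d) * (d/12 + 1/(2*(d+1))) for the l_2-norm."""
    return 2.0 ** (-2.0 / d) * (d / 12.0 + 1.0 / (2.0 * (d + 1)))

def m2_ball(d):
    """Ball lower bound d / ((d+2) * vol(B)**(2/d)) for the euclidean unit ball."""
    log_vol = 0.5 * d * math.log(math.pi) - math.lgamma(0.5 * d + 1.0)
    return d / ((d + 2) * math.exp(2.0 * log_vol / d))

if __name__ == "__main__":
    # Columns: d, ball lower bound, checkerboard lattice, cube.
    for d in (3, 4, 8):
        print(d, round(m2_ball(d), 4), round(m2_checkerboard(d), 4), round(d / 12.0, 4))
```

For d = 1 the formula reduces to the interval value 1/12, and for d = 2 it coincides with the (rotated) square.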
If the underlying norm is the $l_1$-norm, then $W(0|\Lambda)$ coincides with the above $l_2$-Voronoi region and $\Lambda$ is thus admissible. Here we obtain for r = 1
$$M_1(W(0|\Lambda)) = 2^{-1/d}\Big(\frac{d}{4} + \frac{1}{2(d+1)}\Big), \quad d \ge 1.$$
In particular,
$$M_1(W(0|\Lambda)) = \frac{7}{2^{1/3} \cdot 8} = 0.6944\ldots, \quad d = 3.$$

8.14 Example (Dual lattices $D_d^*$)
The dual lattice of the lattice $D_d$ is defined by
$$D_d^* = \Big\{x \in \mathbb{R}^d : \sum_{i=1}^d a_i x_i \in \mathbb{Z} \text{ for every } a \in D_d\Big\}.$$
Then
$$D_d^* = \mathbb{Z}^d + \mathbb{Z}\big(\tfrac{1}{2}, \ldots, \tfrac{1}{2}\big)$$
and
$$2D_d^* = (2\mathbb{Z})^d \cup (2\mathbb{Z} + 1)^d.$$
Note that $2D_1^* = \mathbb{Z}$ and $2D_2^* = D_2$. Let $\Lambda = 2D_d^*$. Then $\det(\Lambda) = 2^d / \det(D_d) = 2^{d-1}$. If the underlying norm is the $l_2$-norm, it is not difficult to verify that
$$W(0|\Lambda) = \Big\{x \in \mathbb{R}^d : \sum_{i=1}^d |x_i| \le \frac{d}{2}\Big\} \cap [-1, 1]^d.$$
For d = 3, this Voronoi region is a truncated octahedron; see Figure 8.4. It is more difficult to compute the normalized second moment of $W(0|\Lambda)$. We obtain
$$M_2(W(0|\Lambda)) = \frac{19}{2^{1/3} \cdot 64} = 0.2356\ldots, \quad d = 3.$$
Figure 8.4: Truncated octahedron
Note that this moment is slightly smaller than $M_2(W(0|D_3))$. For d = 4, $\Lambda$ and $D_4$ are similar. In fact, the similarity transformation $T : \mathbb{R}^4 \to \mathbb{R}^4$,
$$T(x) = (x_1 + x_2,\ x_1 - x_2,\ x_3 + x_4,\ x_3 - x_4)$$
with scaling factor $\sqrt{2}$ satisfies $T(D_4) = \Lambda$. Therefore, $W(0|\Lambda) = TW(0|D_4)$ which yields
$$M_2(W(0|\Lambda)) = M_2(W(0|D_4)) = \frac{13}{2^{1/2} \cdot 30} = 0.3064\ldots, \quad d = 4.$$
A general formula for $M_2(W(0|\Lambda))$ can be found in Conway and Sloane, 1993, pp. 470-471. The above upper bounds for $Q_r([0,1]^d)$, d = 3, 4, are close to the ball lower bounds given in Proposition 8.3. We have
$$M_2(B(0,1)) = \Big(\frac{3}{4\pi}\Big)^{2/3} \frac{3}{5} = 0.2309\ldots, \quad d = 3,$$
$$M_2(B(0,1)) = \frac{2\sqrt{2}}{3\pi} = 0.3001\ldots, \quad d = 4.$$
If the underlying norm is the $l_1$-norm, the Voronoi region generated by 0 does not change and so $\Lambda$ is admissible. For d = 3 and r = 1, we obtain
$$M_1(W(0|\Lambda)) = \frac{35}{4^{1/3} \cdot 32} = 0.6890\ldots, \quad d = 3.$$
This moment is smaller than $M_1(W(0|D_3))$. The corresponding ball lower bound is given by
$$M_1(B(0,1)) = \Big(\frac{3}{4}\Big)^{4/3} = 0.6814\ldots, \quad d = 3.$$
If the underlying norm is the $l_2$-norm, $Q_r([0,1]^d)$ is only known for d = 1 and d = 2. We will see below that for d = 2, $Q_r([0,1]^2) = M_r(W(0|\Lambda))$ holds, where $\Lambda$ is the hexagonal lattice described in Example 8.12. In particular, $\Lambda$ solves the lattice quantizer problem. For dimension d = 3, Example 8.14 shows that the normalized second moment of the lattice $D_3^*$ (truncated octahedron) is very close to the ball lower bound. It is known that $D_3^*$ is a solution of the lattice quantizer problem for r = 2 (cf. Barnes and Sloane, 1983), so
$$Q_2^{(L)}([0,1]^3) = M_2(W(0|D_3^*)) = \frac{19}{2^{1/3} \cdot 64} = 0.2356\ldots, \quad l_2\text{-norm}. \qquad (8.8)$$
For $d \ge 4$, solutions of the lattice quantizer problem are not known. Conway and Sloane (1993) give a comprehensive survey of the best known lattice quantizers for r = 2, among them $D_4^*$ (or $D_4$) and $D_5^*$. For recent improvements see Agrell and Eriksson (1998). (Note that these authors present the value of $M_2(\Lambda)/d$ for $\Lambda \subset \mathbb{R}^d$.)
If the underlying norm is the $l_1$-norm, Example 8.14 shows that the normalized first moment of $D_3^*$ is close to the ball lower bound and hence, $D_3^*$ provides a good quantizer for $U([0,1]^3)$ in case r = 1. However, optimality results are not known for $d \ge 3$. A trivial case occurs for d = 2, where the ball $B(0,1) = W(0|D_2)$ is space-filling. Therefore
$$Q_r([0,1]^2) = M_r(B(0,1)) = \frac{2}{(2+r)2^{r/2}}, \quad l_1\text{-norm}. \qquad (8.9)$$
A further trivial case concerns the $l_\infty$-norm. Here $B(0,1) = W(0|2\mathbb{Z}^d)$ is again space-filling and so
$$Q_r([0,1]^d) = M_r(B(0,1)) = \frac{d}{(d+r)2^r}, \quad l_\infty\text{-norm} \qquad (8.10)$$
(cf. Example 4.17). The following result is due to Fejes Tóth (1959, 1972).

8.15 Theorem
Let d = 2 and suppose the underlying norm is the $l_2$-norm. Let $\Lambda$ be the hexagonal lattice. Then
$$Q_r([0,1]^2) = M_r(W(0|\Lambda)).$$

Proof Set $A = W(0|\Lambda)$ and recall that A is a regular hexagon. For every $n \in \mathbb{N}$ and every $\alpha \subset A$ with $|\alpha| = n$, we have
$$\int_A \min_{a \in \alpha} \|x - a\|^r \, dx \ge n \int_{n^{-1/2}A} \|x\|^r \, dx$$
(cf. Fejes Tóth, 1972, p. 81). Let $\alpha \in C_{n,r}(U(A))$. It follows from Theorem 4.1 and Lemma 2.6 (a) that $\alpha \subset A$. Therefore
$$\det(\Lambda) V_{n,r}(U(A)) = \int_A \min_{a \in \alpha} \|x - a\|^r \, dx \ge n \int_{n^{-1/2}A} \|x\|^r \, dx = n^{-r/2} M_r(A) \det(\Lambda)^{(2+r)/2},$$
which yields
$$n^{r/2} V_{n,r}(U(A)) \ge M_r(A) \det(\Lambda)^{r/2}, \quad n \in \mathbb{N}.$$
This implies
$$Q_r(A) \ge M_r(A) \det(\Lambda)^{r/2}$$
and hence $Q_r([0,1]^2) \ge M_r(A)$. This together with Theorem 8.9 gives the assertion. □
8.16 Remark
The above results allow one to prove by a quantization argument that $\alpha_{n,\Lambda}$ is uniformly distributed in $[0,1]^d$ for every admissible lattice $\Lambda \subset \mathbb{R}^d$ with convex Voronoi region $W(0|\Lambda)$ in the sense that
$$\frac{1}{|\alpha_{n,\Lambda}|} \sum_{b \in \alpha_{n,\Lambda}} \delta_b \xrightarrow{D} U([0,1]^d) \quad \text{as } n \to \infty.$$
First, observe that $A = W(0|\Lambda)$ is the unit ball of some norm $\|\cdot\|_0$. Then forget the underlying norm, which was only used to form the Voronoi region $W(0|\Lambda)$ and through this $\alpha_{n,\Lambda}$, and proceed with the norm $\|\cdot\|_0$. The Voronoi region $W(0|\Lambda, \|\cdot\|_0)$ with respect to $\|\cdot\|_0$ coincides with A. Therefore, by Theorem 8.9 and Proposition 8.3
$$\lim_{n \to \infty} n^{r/d} \int \min_{b \in \alpha_{n,\Lambda}} \|x - b\|_0^r \, dU([0,1]^d)(x) = M_r(A, \|\cdot\|_0) = Q_r([0,1]^d, \|\cdot\|_0).$$
The assertion now follows from Theorem 7.5.

8.4 Quantization coefficients of one-dimensional distributions

In Tables 8.1-8.3 one can find the quantization coefficients
$$Q_r(P) = \frac{1}{(1+r)2^r} \Big(\int \Big(\frac{dP}{d\lambda}\Big)^{1/(1+r)} d\lambda\Big)^{1+r}$$
of several univariate absolutely continuous distributions. As an illustration, Figure 8.5 shows the densities of three hyper-exponential distributions with variance equal to one and small, moderate and large second quantization coefficient, respectively.
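The one-dimensional formula for $Q_r(P)$ lends itself to a direct numerical check. The sketch below is an illustration under stated assumptions: a plain midpoint rule on a truncated interval, with hypothetical function names and hand-picked truncation bounds. For the standard normal and the standard exponential distribution it should reproduce $\sqrt{3}\pi/2 = 2.7206\ldots$ and $9/4$ for r = 2.

```python
import math

def quantization_coefficient(density, lo, hi, r, steps=200000):
    """Q_r(P) = (int density**(1/(1+r)) dx)**(1+r) / ((1+r) * 2**r), midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    s = 1.0 / (1.0 + r)
    integral = sum(density(lo + (i + 0.5) * h) ** s for i in range(steps)) * h
    return integral ** (1.0 + r) / ((1.0 + r) * 2.0 ** r)

def normal_density(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def exponential_density(x):
    return math.exp(-x)

if __name__ == "__main__":
    print(quantization_coefficient(normal_density, -60.0, 60.0, 2))
    print(quantization_coefficient(exponential_density, 0.0, 300.0, 2))
```

The truncation bounds are wide because the integrand is the density raised to the power 1/(1+r), which decays more slowly than the density itself.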
Figure 8.5: Densities of hyper-exponential distributions P = H(a, b) with variance equal to one and Q2(P) = 1.8470 (top), Q2(P) = 3.3106 (center), Qz(P) ~ 8.1000 (bottom).
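P                                   $Q_r(P)$

Normal $N(0, \sigma^2)$              $\left(\frac{\pi\sigma^2}{2}\right)^{r/2} (1+r)^{(r-1)/2}$
Logistic $L(a)$                     $\left(\frac{a}{2}\right)^r \frac{\Gamma\left(\frac{1}{1+r}\right)^{2+2r}}{(1+r)\,\Gamma\left(\frac{2}{1+r}\right)^{1+r}}$
Double Exponential $DE(a)$          $(a(1+r))^r$
Double Gamma $D\Gamma(a,b)$         $a^r (1+r)^{b+r-1}\, \frac{\Gamma\left(\frac{b+r}{1+r}\right)^{1+r}}{\Gamma(b)}$
Hyper-exponential $HE(a,b)$         $\left(\frac{a\Gamma(1/b)}{b}\right)^r (1+r)^{(1+r-b)/b}$
Uniform $U([a,b])$                  $\frac{1}{1+r}\left(\frac{b-a}{2}\right)^r$
Triangular $T(a,b;c)$               $\frac{2}{(2+r)^{1+r}}\left(\frac{(1+r)(b-a)}{2}\right)^r$
Exponential $E(a)$                  $\left(\frac{a}{2}\right)^r (1+r)^r$
Gamma $\Gamma(a,b)$                 $\left(\frac{a}{2}\right)^r (1+r)^{b+r-1}\, \frac{\Gamma\left(\frac{b+r}{1+r}\right)^{1+r}}{\Gamma(b)}$
Weibull $W(a,b)$                    $\left(\frac{a(1+r)^{1/b}}{2b}\right)^r \Gamma\left(\frac{b+r}{b(1+r)}\right)^{1+r}$
Pareto $P(a,b)$, $b > r$            $\frac{b}{b-r}\left(\frac{a(1+r)}{2(b-r)}\right)^r$

Table 8.1: Quantization coefficients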
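P                                                           $Q_2(P)$

$N(0,1)$                                                    $\frac{\sqrt{3}\pi}{2} = 2.7206\ldots$
$L(\sqrt{3}/\pi)$                                           $\frac{\Gamma(\frac{1}{3})^6}{4\pi^2\,\Gamma(\frac{2}{3})^3} = 3.7709\ldots$
$DE(1/\sqrt{2})$                                            $\frac{9}{2} = 4.5$
$D\Gamma(a,b)$, $a^2 = \frac{1}{b(1+b)}$                    $\frac{3^{b+1}\,\Gamma(\frac{b+2}{3})^3}{\Gamma(b+2)}$, range: $(0, 3\Gamma(\frac{2}{3})^3) = (0, 7.4488\ldots)$
$HE(a,b)$, $a^2 = \frac{\Gamma(1/b)}{\Gamma(3/b)}$          $\frac{\Gamma(\frac{1}{b})^3\, 3^{(3-b)/b}}{b^2\,\Gamma(\frac{3}{b})}$, range: $(1, \infty)$
$U([a,b])$, $b = a + 2\sqrt{3}$                             $1$
$T(a,b;\frac{a+b}{2})$, $b = a + 2\sqrt{6}$                 $\frac{27}{16} = 1.6875$
$E(1)$                                                      $\frac{9}{4} = 2.25$
$\Gamma(a,b)$, $a^2 = \frac{1}{b}$                          $\frac{3^{b+1}\,\Gamma(\frac{b+2}{3})^3}{4\Gamma(b+1)}$, range: $\left(\frac{3\Gamma(\frac{2}{3})^3}{4}, \frac{\sqrt{3}\pi}{2}\right) = (1.8622\ldots, 2.7206\ldots)$
$W(a,b)$, $a^2 = \frac{1}{\Gamma(\frac{b+2}{b}) - \Gamma(\frac{b+1}{b})^2}$     $\frac{9^{1/b}\,\Gamma(\frac{b+2}{3b})^3}{4b^2\left[\Gamma(\frac{b+2}{b}) - \Gamma(\frac{b+1}{b})^2\right]}$, range: $[2.1555, \infty)$
$P(a,b)$, $b > 2$, $a^2 = \frac{(b-2)(b-1)^2}{b}$           $\frac{9}{4}\left(\frac{b-1}{b-2}\right)^2$, range: $\left(\frac{9}{4}, \infty\right)$

Table 8.2: r = 2. Quantization coefficients of distributions P with $V_2(P) = 1$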
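P                                                           $Q_1(P)$

$N(0, \frac{\pi}{2})$                                       $\frac{\pi}{2} = 1.5707\ldots$
$L\left(\frac{1}{2\log 2}\right)$                           $\frac{\pi^2}{8\log 2} = 1.7798\ldots$
$DE(1)$                                                     $2$
$D\Gamma(a,b)$, $a = \frac{1}{b}$                           $\frac{2^b\,\Gamma(\frac{b+1}{2})^2}{\Gamma(b+1)}$, range: $(0, \pi)$
$HE(a,b)$, $a = \frac{\Gamma(1/b)}{\Gamma(2/b)}$            $\frac{2^{(2-b)/b}\,\Gamma(\frac{1}{b})^2}{b\,\Gamma(\frac{2}{b})}$, range: $(1, \infty)$
$U([a,b])$, $b = a + 4$                                     $1$
$T(a,b;\frac{a+b}{2})$, $b = a + 6$                         $\frac{4}{3} = 1.3333\ldots$
$E\left(\frac{1}{\log 2}\right)$                            $\frac{1}{\log 2} = 1.4426\ldots$
$P(a,b)$, $b > 1$, $a = \frac{b-1}{b(2^{1/b} - 1)}$         $\frac{1}{(b-1)(2^{1/b} - 1)}$, range: $\left(\frac{1}{\log 2}, \infty\right)$

Table 8.3: r = 1. Quantization coefficients of distributions P with $V_1(P) = 1$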
Notes
The issue of space-filling sets in the quantization setting was raised by Gersho (1979) for the $l_2$-norm. Expositions concerning tesselations (tilings) can be found in Grünbaum and Shephard (1986) and Schulte (1993). For a background on lattices, we refer to the books of Cassels (1959) and Gruber and Lekkerkerker (1987). Gersho (1979) contains upper bounds for $Q_r([0,1]^d)$ of the type (8.3) for the $l_2$-norm. Theorems 8.5 and 8.9 provide a rigorous derivation. The observation in Theorem 8.9 concerning the distributional convergence seems to be new. Different proofs for the Hexagon-Theorem 8.15 can be found in Newman (1982) (for r = 2), Wong (1982) (also for r = 2) and Haimovich and Magnanti (1988). Discussions and applications of this theorem are contained in Bollobás (1972, 1973). The quantization coefficient $Q_1([0,1]^d)$ appears in upper bounds for limiting constants in the euclidean traveling salesman problem; see Goddyn (1990).
According to Theorems 8.9 and 8.15, the hexagon quantizer is asymptotically n-optimal for $P = U([0,1]^2)$ of every order r. This result can be extended to bivariate nonuniform distributions P with a continuous density using a piecewise hexagon quantizer depending on P and r in the spirit of Lemma 7.2 (but now with $m \to \infty$, $m/n \to 0$ and the $n_i$-optimal quantizer for $P(\cdot|A_i)$ replaced by a hexagon quantizer). See McClure (1980, p. 197) for r = 2 with an unpublished proof. Su (1997) showed that this design yields an asymptotically n-optimal quantizer for every r.

8.17 Conjecture
$Q_r([0,1]^3) = M_r(W(0|D_3^*))$ for every $r \in [1, \infty)$ when the underlying norm on $\mathbb{R}^3$ is the $l_2$-norm (cf. Example 8.14, (8.8), and Remark 10.11(c)).
9 Random quantizers and quantization coefficients

In this section we determine the asymptotics of a stochastic version of the quantization problem and derive further upper bounds and the d-asymptotics for the quantization coefficients.
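9.1 Asymptotics for random quantizers

Let X be an $\mathbb{R}^d$-valued random variable with distribution P, let $\|\cdot\|$ denote any norm on $\mathbb{R}^d$, and let $1 \le r < \infty$.

9.1 Theorem
Let $Y_1, Y_2, \ldots$ be i.i.d. $\mathbb{R}^d$-valued random variables with distribution Q which are independent of X. Suppose P is absolutely continuous with respect to $\lambda^d$ and let g denote the $\lambda^d$-density of the absolutely continuous part of Q.
(a) Assume $P(g > 0) = 1$. Then
$$n^{1/d} \min_{1 \le i \le n} \|X - Y_i\| \xrightarrow{D} \nu,$$
where $\nu$ has the distribution function
$$F_\nu(t) = \begin{cases} 0, & t < 0, \\ 1 - \int \exp(-\lambda^d(B(0,1))\, g(x)\, t^d) \, dP(x), & t \ge 0. \end{cases}$$
If additionally $E\|X - Y_1\|^r < \infty$, then the following statements are equivalent:
• $\int g^{-r/d} \, dP < \infty$ and
$$\lim_{n \to \infty} n^{r/d} E \min_{1 \le i \le n} \|X - Y_i\|^r = \Gamma\big(2 + \tfrac{r}{d}\big) M_r(B(0,1)) \int g^{-r/d} \, dP;$$
• $(n^{r/d} \min_{1 \le i \le n} \|X - Y_i\|^r)_{n \ge 1}$ is uniformly integrable.
(b) Assume $P(g = 0) > 0$. Then the sequence $(n^{1/d} \min_{1 \le i \le n} \|X - Y_i\|)_{n \ge 1}$ is stochastically unbounded.

Proof (a) Let $t > 0$. For $x \in \mathbb{R}^d$, we have
$$P\big(n^{1/d} \min_{1 \le i \le n} \|x - Y_i\| \le t\big) = 1 - [1 - Q(B(x, tn^{-1/d}))]^n.$$
Moreover,
$$\frac{nQ(B(x, tn^{-1/d}))}{t^d \lambda^d(B(0,1))} = \frac{Q(B(x, tn^{-1/d}))}{\lambda^d(B(x, tn^{-1/d}))} \to g(x)$$
as $n \to \infty$ for $\lambda^d$-almost all x and hence for P-almost all $x \in \mathbb{R}^d$ (cf. Chatterji, 1973, Chapitre V). Therefore, by Lebesgue's dominated convergence theorem
$$P\big(n^{1/d} \min_{1 \le i \le n} \|X - Y_i\| \le t\big) = \int P\big(n^{1/d} \min_{1 \le i \le n} \|x - Y_i\| \le t\big) \, dP(x) \to F_\nu(t).$$
Since $P(g > 0) = 1$, $F_\nu$ is a distribution function and we obtain
$$n^{1/d} \min_{1 \le i \le n} \|X - Y_i\| \xrightarrow{D} \nu.$$
This, together with
$$\int t^r \, d\nu(t) = \Gamma\big(1 + \tfrac{r}{d}\big) \lambda^d(B(0,1))^{-r/d} \int g^{-r/d} \, dP = \Gamma\big(2 + \tfrac{r}{d}\big) M_r(B(0,1)) \int g^{-r/d} \, dP,$$
implies the second assertion (cf. Hoffmann-Jørgensen, 1994, 5.2).
(b) As above (but now $F_\nu$ is not a distribution function)
$$P\big(n^{1/d} \min_{1 \le i \le n} \|X - Y_i\| > t\big) \to 1 - F_\nu(t) \ge P(g = 0) > 0$$
for every $t \ge 0$. □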
For $P = h\lambda^d$ such that $h \in L_{d/(d+r)}(\lambda^d)$, let $P_r = h_r\lambda^d$ be defined as in (7.5). Set
$$\mathcal{B} = \Big\{g \in L_1(\lambda^d) : g \ge 0,\ \int g \, d\lambda^d \le 1\Big\}.$$
Then the function $F : \mathcal{B} \to [0, \infty]$,
$$F(g) = \int g^{-r/d} \, dP$$
satisfies
$$F(h_r) = \min_{g \in \mathcal{B}} F(g) = \|h\|_{d/(d+r)} \qquad (9.1)$$
(cf. Lemma 6.8). In fact, by Hölder's inequality with $p = d/(d+r)$ and $q = -d/r$, one obtains
$$F(g) \ge \|h\|_p \Big(\int (g^{-r/d})^q \, d\lambda^d\Big)^{1/q} \ge \|h\|_p = F(h_r)$$
for $g \in \mathcal{B}$. This shows that an asymptotically optimal random quantizer for P of order r is given by the distribution $Q = P_r$ provided the uniform integrability condition holds for $P_r$. In this case Theorem 9.1 yields
$$\lim_{n \to \infty} n^{r/d} E \min_{1 \le i \le n} \|X - Y_i\|^r = \Gamma\big(2 + \tfrac{r}{d}\big) M_r(B(0,1)) \|h\|_{d/(d+r)}. \qquad (9.2)$$
We will not discuss uniform integrability of $(n^{r/d} \min_{1 \le i \le n} \|X - Y_i\|^r)_{n \ge 1}$ in general but consider only the case of uniformly distributed $Y_i$.
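The minimization property (9.1) can be illustrated on a concrete density. The sketch below is a hypothetical example, not taken from the text: it uses d = 1, r = 2 and h(x) = 2x on [0,1], for which $h_r \propto h^{1/3}$ and $\|h\|_{1/3} = (2^{1/3} \cdot 3/4)^3 = 27/32$, and compares F at $h_r$ with F at two other sampling densities.

```python
def random_quantizer_cost(g, h, r=2, d=1, steps=100000):
    """F(g) = int_0^1 g(x)**(-r/d) * h(x) dx, evaluated by the midpoint rule."""
    dx = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        total += g(x) ** (-r / d) * h(x)
    return total * dx

def h(x):
    return 2.0 * x                      # density of P on [0, 1]

# int_0^1 (2x)**(1/3) dx = 2**(1/3) * 3/4 normalizes h**(1/3) to a probability density.
NORM_CONST = 2.0 ** (1.0 / 3.0) * 0.75

def h_r(x):
    return (2.0 * x) ** (1.0 / 3.0) / NORM_CONST

def uniform(x):
    return 1.0

def tilted(x):
    return 0.5 + x                      # another probability density on [0, 1]

if __name__ == "__main__":
    print(random_quantizer_cost(h_r, h), random_quantizer_cost(uniform, h), random_quantizer_cost(tilted, h))
```

Sampling the $Y_i$ from P itself (g = h) would give F(g) = ∞ in this example, since $\int_0^1 (2x)^{-1} dx$ diverges; this shows that the optimal sampling density is genuinely different from h.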
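9.2 Theorem
Let $A \subset \mathbb{R}^d$ be a compact set with $\lambda^d(A) > 0$ and let $Y_1, Y_2, \ldots$ be i.i.d. U(A)-distributed random variables independent of X. Assume $P = P_a$ and $\mathrm{supp}(P) \subset A$, or $P(\mathrm{int}\, A) = 1$. Assume further that there are constants $c > 0$ and $t_0 > 0$ such that
$$\lambda^d(B(x,t) \cap A) \ge ct^d \quad \text{for every } x \in \mathrm{supp}(P),\ t \in (0, t_0). \qquad (9.3)$$
Then
$$n^{1/d} \min_{1 \le i \le n} \|X - Y_i\| \xrightarrow{D} \nu$$
and
$$\lim_{n \to \infty} n^{r/d} E \min_{1 \le i \le n} \|X - Y_i\|^r = \Gamma\big(2 + \tfrac{r}{d}\big) M_r(B(0,1)) \lambda^d(A)^{r/d},$$
where $\nu$ denotes the Weibull distribution with distribution function
$$F_\nu(t) = \begin{cases} 0, & t < 0, \\ 1 - \exp(-\lambda^d(B(0,1))\, t^d / \lambda^d(A)), & t \ge 0. \end{cases}$$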
Proof Let $Q = U(A)$. In case $P = P_a$ and $\mathrm{supp}(P) \subset A$, the first assertion follows from Theorem 9.1. In case $P(\mathrm{int}\, A) = 1$, we have for $t > 0$ and $x \in \mathrm{int}\, A$
$$P\big(n^{1/d} \min_{1 \le i \le n} \|x - Y_i\| \le t\big) = 1 - \Big[1 - \frac{\lambda^d(B(0,1))\, t^d}{n \lambda^d(A)}\Big]^n$$
for sufficiently large n. This implies
$$n^{1/d} \min_{1 \le i \le n} \|X - Y_i\| \xrightarrow{D} \nu.$$
Furthermore
$$\sup_{n \ge 1} n^{s/d} E \min_{1 \le i \le n} \|X - Y_i\|^s < \infty \quad \text{for every } s \in [1, \infty). \qquad (9.4)$$
This property implies that $(n^{r/d} \min_{1 \le i \le n} \|X - Y_i\|^r)_{n \ge 1}$ is uniformly integrable, which yields the second assertion.
To prove (9.4) note that $\mathrm{supp}(P) \subset A$. Let $x \in \mathrm{supp}(P)$. We have
$$E \min_{1 \le i \le n} \|x - Y_i\|^s = \int_0^\infty P\big(\min_{1 \le i \le n} \|x - Y_i\| > t^{1/s}\big) \, dt = s \int_0^{\mathrm{diam}(A)} [1 - Q(B(x,t))]^n t^{s-1} \, dt.$$
Observe that (9.3) holds (with a different constant c) for every $t \in (0, \mathrm{diam}(A))$ if $\mathrm{diam}(A) > t_0$. In fact, choose $t_1 \in (0, t_0)$. Then for every $t_0 \le t < \mathrm{diam}(A)$
$$\lambda^d(B(x,t) \cap A) \ge \lambda^d(B(x,t_1) \cap A) \ge ct_1^d \ge c(t_1/\mathrm{diam}(A))^d t^d.$$
Using the inequality $(1 - z)^n \le e^{-nz}$, $0 \le z \le 1$, one obtains with $c_1 = c/\lambda^d(A)$
$$E \min_{1 \le i \le n} \|x - Y_i\|^s \le s \int_0^{\mathrm{diam}(A)} (1 - c_1 t^d)^n t^{s-1} \, dt \le s \int_0^\infty \exp(-nc_1 t^d)\, t^{s-1} \, dt = \frac{s\Gamma(s/d)}{d(nc_1)^{s/d}} = c_2 n^{-s/d}.$$
This gives
$$\sup_{n \ge 1} n^{s/d} E \min_{1 \le i \le n} \|X - Y_i\|^s = \sup_{n \ge 1} n^{s/d} \int E \min_{1 \le i \le n} \|x - Y_i\|^s \, dP(x) \le c_2. \qquad \Box$$

The regularity condition (9.3) with $\mathrm{supp}(P)$ replaced by $\mathrm{supp}(U(A))$ is discussed in Section 12. Compact convex subsets A of $\mathbb{R}^d$ with $\lambda^d(A) > 0$ satisfy this condition. Next we will deal with consequences for the quantization coefficients.
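A quick simulation makes the limit in Theorem 9.2 tangible in the simplest case d = 1, A = [0,1], P = U([0,1]), r = 2, where the limiting constant is $\Gamma(3) \cdot \frac{1}{12} = \frac{1}{2}$. The sketch below is illustrative only: the function names, sample sizes and seed are arbitrary choices, and the Monte Carlo estimate carries sampling noise of a few percent.

```python
import math
import random

def scaled_random_quantizer_error(n, r=2, trials=5000, seed=0):
    """Monte Carlo estimate of n**r * E min_i |X - Y_i|**r for X, Y_1, ..., Y_n i.i.d. U([0,1])."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = rng.random()
        nearest = min(abs(x - rng.random()) for _ in range(n))
        total += nearest ** r
    return n ** r * total / trials

def limit_constant(r, d=1, vol_ball=2.0, vol_a=1.0):
    """Gamma(1 + r/d) * (vol(A) / vol(B(0,1)))**(r/d),
    which equals Gamma(2 + r/d) * M_r(B(0,1)) * vol(A)**(r/d)."""
    return math.gamma(1.0 + r / d) * (vol_a / vol_ball) ** (r / d)

if __name__ == "__main__":
    print(scaled_random_quantizer_error(100), limit_constant(2))
```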
9.2 Random quantizer upper bound

Clearly, random quantizers cannot be better than optimal quantizers, which gives the following upper bound for $Q_r([0,1]^d)$. For the $l_2$-norm, it is due to Zador (1963, 1982).

9.3 Proposition
$$Q_r([0,1]^d) \le \Gamma\big(2 + \tfrac{r}{d}\big) M_r(B(0,1)).$$
Proof Choose $A = [0,1]^d$ and $P = U([0,1]^d)$ in Theorem 9.2. Then
$$M_{n,r}([0,1]^d) \le E \min_{1 \le i \le n} \|X - Y_i\|^r, \quad n \ge 1,$$
which yields the assertion. □
A comparison of the random quantizer and the cube quantizer upper bound shows for the $l_2$-norm and r = 2
$$\Gamma\big(2 + \tfrac{2}{d}\big) M_2(B(0,1)) < M_2([0,1]^d), \quad d \ge 7.$$
For the $l_1$-norm and r = 1 one obtains
$$\Gamma\big(2 + \tfrac{1}{d}\big) M_1(B(0,1)) < M_1([0,1]^d), \quad d \ge 5.$$
A comparison of the random quantizer upper bound and the ball lower bound given in Proposition 8.3 can be found in Tables 9.1 and 9.2.

d      $M_2(B(0,1))$    $\Gamma(2 + \frac{2}{d}) M_2(B(0,1))$
1      0.0833           0.5
2      0.1592           0.3183
3      0.2309           0.3474
4      0.3001           0.3989
5      0.3676           0.4566
6      0.4338           0.5165
7      0.4991           0.5774
8      0.5636           0.6386
9      0.6276           0.7000
10     0.6910           0.7614
20     1.3105           1.3714
30     1.9169           1.9744
40     2.5175           2.5733
50     3.1149           3.1696
100    6.0801           6.1325

Table 9.1: $l_2$-norm, r = 2. Ball lower bound and random quantizer upper bound for $Q_2([0,1]^d)$

d      $M_1(B(0,1))$    $\Gamma(2 + \frac{1}{d}) M_1(B(0,1))$
1      0.25             0.5
2      0.4714           0.6267
3      0.6814           0.8113
4      0.8853           1.0031
5      1.0855           1.1960
6      1.2831           1.3887
7      1.4788           1.5809
8      1.6730           1.7725
9      1.8662           1.9636
10     2.0585           2.1542
20     3.9545           4.0422
30     5.8280           5.9128
40     7.6920           7.7753
50     9.5506           9.6330
100    18.8083          18.8886

Table 9.2: $l_1$-norm, r = 1
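The entries of Tables 9.1 and 9.2, and the dimensions from which the random quantizer bound beats the cube, can be recomputed with a few lines. This is a sketch with illustrative function names; it assumes the standard $l_p$-ball volume formula and uses $M_r([0,1]^d) = d/((1+r)2^r)$ for the $l_r$-norm.

```python
import math

def ball_moment(d, r, p):
    """M_r(B(0,1)) = d / ((d+r) * vol**(r/d)) for the unit l_p-ball, p finite."""
    log_vol = d * math.log(2.0 * math.gamma(1.0 + 1.0 / p)) - math.lgamma(1.0 + d / p)
    return d / ((d + r) * math.exp(r * log_vol / d))

def random_upper_bound(d, r, p):
    """Gamma(2 + r/d) * M_r(B(0,1)), the bound of Proposition 9.3."""
    return math.gamma(2.0 + r / d) * ball_moment(d, r, p)

def cube_moment(d, r):
    """M_r([0,1]^d) for the l_r-norm: d / ((1+r) * 2**r)."""
    return d / ((1.0 + r) * 2.0 ** r)

def first_dimension_beating_cube(r, p, d_max=50):
    """Smallest d for which the random quantizer bound is below the cube value."""
    return next(d for d in range(1, d_max + 1)
                if random_upper_bound(d, r, p) < cube_moment(d, r))

if __name__ == "__main__":
    print(first_dimension_beating_cube(2, 2.0), first_dimension_beating_cube(1, 1.0))
    for d in (1, 2, 3, 10, 40, 100):
        print(d, round(random_upper_bound(d, 2, 2.0), 4), round(random_upper_bound(d, 1, 1.0), 4))
```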
9.3 d-asymptotics and entropy

From the ball lower bound and the random quantizer upper bound for $Q_r([0,1]^d)$ we deduce the following approximation for large d.

9.4 Corollary
Let the underlying norm be the $l_p$-norm, $1 \le p < \infty$. Then
$$\lim_{d \to \infty} d^{-r/p} Q_r([0,1]^d) = \frac{p^r}{2^r (ep)^{r/p}\, \Gamma(\frac{1}{p})^r}.$$
Proof By (2.5) and (2.6)
$$M_r(B(0,1)) = \frac{d\, \Gamma(1 + \frac{d}{p})^{r/d}}{(d+r)\, 2^r\, \Gamma(1 + \frac{1}{p})^r}.$$
From Stirling's formula for the $\Gamma$-function, i.e.
$$\Gamma(x) \sim \sqrt{2\pi}\, x^{x - \frac{1}{2}} e^{-x} \quad \text{as } x \to \infty,$$
we deduce
$$\lim_{d \to \infty} d^{-r/p} M_r(B(0,1)) = \frac{p^r}{2^r (ep)^{r/p}\, \Gamma(\frac{1}{p})^r}.$$
Since $\lim_{d \to \infty} \Gamma(2 + \frac{r}{d}) = \Gamma(2) = 1$, the assertion follows from Propositions 8.3 and 9.3. □

For the $l_2$-norm and r = 2, one obtains from the preceding corollary
$$\lim_{d \to \infty} d^{-1} Q_2([0,1]^d) = \frac{1}{2\pi e} = 0.0585\ldots.$$
In this case the cube quantizer yields
$$d^{-1} M_2([0,1]^d) = \frac{1}{12} = 0.0833\ldots, \quad d \ge 1,$$
and the lattice quantizer based on the lattice $D_d$ gives
$$\lim_{d \to \infty} d^{-1} M_2(W(0|D_d)) = \frac{1}{12}$$
(cf. Example 8.13). For the $l_1$-norm and r = 1, one gets
$$\lim_{d \to \infty} d^{-1} Q_1([0,1]^d) = \frac{1}{2e} = 0.1839\ldots,$$
$$d^{-1} M_1([0,1]^d) = \frac{1}{4}, \quad d \ge 1,$$
$$\lim_{d \to \infty} d^{-1} M_1(W(0|D_d)) = \frac{1}{4}.$$
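The convergence in Corollary 9.4 can be observed numerically by evaluating $d^{-r/p} M_r(B(0,1))$ for growing d. The sketch below uses log-gamma to avoid overflow; the function names are illustrative, and the ball volume formula is the standard one assumed earlier in this chapter's formulas (2.5) and (2.6).

```python
import math

def scaled_ball_moment(d, r, p):
    """d**(-r/p) * M_r(B(0,1)) for the unit l_p-ball in R^d."""
    log_vol = d * math.log(2.0 * math.gamma(1.0 + 1.0 / p)) - math.lgamma(1.0 + d / p)
    return d ** (-r / p) * d / ((d + r) * math.exp(r * log_vol / d))

def limit_constant(r, p):
    """p**r / (2**r * (e*p)**(r/p) * Gamma(1/p)**r), the limit in Corollary 9.4."""
    return p ** r / (2.0 ** r * (math.e * p) ** (r / p) * math.gamma(1.0 / p) ** r)

if __name__ == "__main__":
    for d in (10, 100, 10**4, 10**6):
        print(d, scaled_ball_moment(d, 2, 2.0), scaled_ball_moment(d, 1, 1.0))
    print(limit_constant(2, 2.0), limit_constant(1, 1.0))
```

The convergence is slow, of order log(d)/d, which is why the table values at d = 100 are still visibly above the limits.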
The d-asymptotics for the quantization coefficients of arbitrary product measures with identical one-dimensional marginals now follows from the preceding corollary and the well known fact that the Rényi entropy of order s approaches the Shannon differential entropy as $s \to 1$. For $P = h\lambda^d$, the Rényi entropy of order s is defined by
$$H_s(P) = \frac{1}{1-s} \log \int h^s \, d\lambda^d, \quad 0 < s < 1. \qquad (9.5)$$
The differential entropy is defined by
$$H(P) = -\int h \log h \, d\lambda^d = -\int \log h \, dP \qquad (9.6)$$
provided the integral exists. Note that in (9.5) and (9.6) the entropies are calculated in nats and not in bits. For the $l_2$-norm, the following result is contained in Zador (1963).
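9.5 Proposition
Let $X_1, X_2, \ldots$ be i.i.d. real random variables with distribution P. Suppose $P_a \ne 0$ and $E|X_1|^{r+\delta} < \infty$ for some $\delta > 0$. Let the underlying norm be the $l_p$-norm, $1 \le p < \infty$. Then
$$\lim_{d \to \infty} Q_r(X_1, \ldots, X_d) = 0 \quad \text{if } P_a \ne P,$$
$$\lim_{d \to \infty} d^{-r/p} Q_r(X_1, \ldots, X_d) = \frac{p^r e^{rH(P)}}{2^r (ep)^{r/p}\, \Gamma(\frac{1}{p})^r} \quad \text{if } P_a = P.$$

Proof It follows from the assumptions that
$$\Big(\bigotimes_1^d P\Big)_a = \bigotimes_1^d P_a \quad \text{and} \quad E\|(X_1, \ldots, X_d)\|^{r+\delta} < \infty.$$
The r-th quantization coefficient of $(X_1, \ldots, X_d)$ is given by
$$Q_r(X_1, \ldots, X_d) = Q_r([0,1]^d)\, P_a(\mathbb{R})^d\, \Big\|\bigotimes_1^d \tilde{h}\Big\|_{d/(d+r)},$$
where $\tilde{h} = d\tilde{P}_a/d\lambda$ and $\tilde{P}_a = P_a/P_a(\mathbb{R})$. Since $\tilde{h} \in L_{1/(1+r)}(\lambda)$ by Remark 6.3, the differential entropy $H(\tilde{P}_a)$ is well defined and $H(\tilde{P}_a) \in [-\infty, \infty)$ (cf. Vajda, 1989, p. 316). We have
$$\log \Big\|\bigotimes_1^d \tilde{h}\Big\|_{d/(d+r)} = d \log \|\tilde{h}\|_{d/(d+r)} = (d+r) \log \int \tilde{h}^{d/(d+r)} \, d\lambda = rH_{d/(d+r)}(\tilde{P}_a).$$
Therefore
$$\lim_{d \to \infty} \log \Big\|\bigotimes_1^d \tilde{h}\Big\|_{d/(d+r)} = rH(\tilde{P}_a).$$
Furthermore
$$\lim_{d \to \infty} d^{r/p} P_a(\mathbb{R})^d = 0 \quad \text{if } P_a(\mathbb{R}) < 1.$$
Thus both assertions follow from Corollary 9.4. □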
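The step from the Rényi entropy of order $s = d/(d+r)$ to the differential entropy can be made concrete with two densities for which $\int h^s d\lambda$ has a closed form. The sketch below (hypothetical function names, r = 2) uses $\int_0^\infty e^{-sx}dx = 1/s$ for the standard exponential and $\int h^s d\lambda = (2\pi)^{(1-s)/2} s^{-1/2}$ for the standard normal density; the limits are $1$ and $\frac{1}{2}\log(2\pi e)$, respectively.

```python
import math

def renyi_entropy_exponential(s):
    """H_s(E(1)) = log(1/s) / (1 - s), in nats, for 0 < s < 1."""
    return -math.log(s) / (1.0 - s)

def renyi_entropy_normal(s):
    """H_s(N(0,1)) = 0.5*log(2*pi) - log(s) / (2*(1 - s)), in nats, for 0 < s < 1."""
    return 0.5 * math.log(2.0 * math.pi) - math.log(s) / (2.0 * (1.0 - s))

if __name__ == "__main__":
    r = 2.0
    for d in (1, 10, 100, 1000):
        s = d / (d + r)   # the order appearing in the proof of Proposition 9.5
        print(d, renyi_entropy_exponential(s), renyi_entropy_normal(s))
    print(1.0, 0.5 * math.log(2.0 * math.pi * math.e))
```

In both cases the Rényi entropy decreases towards the differential entropy as d grows.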
We see that $d^{r/p}$ is the correct order of convergence of $Q_r(X_1,\ldots,X_d)$ to infinity (under the $l_p$-norm) provided $P = P_a$ and $H(P) \in \mathbb{R}$.

9.6 Remark
(a) Proposition 9.5 immediately yields the $d$-asymptotics for the vector quantizer advantage as defined in (7.8) in the i.i.d. case:
\[ \lim_{d\to\infty} \frac{d\,Q_r(X_1)}{Q_r(X_1,\ldots,X_d)} = \frac{2^r e\, r\, \Gamma(1/r)^r\, Q_r(X_1)}{r^r e^{rH(X_1)}} \ge 1. \tag{9.7} \]
(Here the underlying norm is the $l_r$-norm, $E|X_1|^{r+\delta} < \infty$ for some $\delta > 0$, and the distribution of $X_1$ is absolutely continuous with respect to $\lambda$.)
(b) We know from Table 8.2 that $\inf Q_2(P) = 0$ and $\sup Q_2(P) = \infty$, where the infimum and the supremum are taken over all univariate symmetric (absolutely continuous) distributions $P$ with variance equal to one. On the other hand, the normal distribution $N(0,1)$ is the unique maximizer of the differential entropy among all probabilities $P$ with mean zero, variance equal to one and $\mathrm{supp}(P) = \mathbb{R}$. For such distributions $P \neq N(0,1)$ it follows from Proposition 9.5 that
\[ Q_2\Big(\bigotimes_1^d P\Big) < Q_2\Big(\bigotimes_1^d N(0,1)\Big) \tag{9.8} \]
for sufficiently large $d$. Consider, for instance, the hyper-exponential distribution $P = HE(a,b)$ with $a^2 = \Gamma(1/b)/\Gamma(3/b)$ and $b = 1/10$. Then $Q_2(P) = 37.0908\ldots$ while $Q_2(N(0,1)) = 2.7206\ldots$ (cf. Table 8.2). The inequality (9.8) holds for $d \ge 2$ (cf. Table 9.4). Analogous statements are valid for distributions which maximize the differential entropy among other classes of univariate distributions.
Notes
A discussion of the uniform integrability condition in Theorem 9.1 can be found in Zador (1963) and Stadje (1995). Gersho (1979) contains an extension of Proposition 9.5 for the $l_2$-norm to stationary ergodic sequences.
P               H(P)
$N(0,\sigma^2)$      $\tfrac12\log(2\pi\sigma^2) + \tfrac12$
$L(a)$          $\log a + 2$
$DE(a)$         $\log(2a) + 1$
$D\Gamma(a,b)$      $\log(2a\Gamma(b)) + (1-b)\psi(b) + b$
$HE(a,b)$       $\log(2a\Gamma(1/b)/b) + 1/b$
$U([a,b])$      $\log(b-a)$
$E(a)$          $\log a + 1$
$\Gamma(a,b)$       $\log(a\Gamma(b)) + (1-b)\psi(b) + b$
$W(a,b)$        $\log(a/b) + \frac{b-1}{b}\gamma + 1$
$P(a,b)$        $\log(a/b) + \frac{b+1}{b}$

Table 9.3: Differential entropies. $\psi = \Gamma'/\Gamma$, $\gamma$ = Euler's constant = 0.5772...
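
The entropy-maximizing role of the normal distribution used in Remark 9.6(b) can be checked numerically from the entries of Table 9.3. The following is a minimal sketch, not part of the original text. It assumes the usual scale parametrizations (variance $2a^2$ for $DE(a)$, variance $(\pi a)^2/3$ for $L(a)$), which are assumptions of this illustration, and chooses the parameters so that every distribution has variance one.

```python
import math

# Differential entropies (natural logarithm) taken from Table 9.3.
def entropy_normal(sigma2):
    return 0.5 * math.log(2 * math.pi * sigma2) + 0.5

def entropy_double_exponential(a):
    # Assumed parametrization: density exp(-|x|/a) / (2a), variance 2 a^2.
    return math.log(2 * a) + 1

def entropy_logistic(a):
    # Assumed parametrization: logistic with scale a, variance (pi a)^2 / 3.
    return math.log(a) + 2

def entropy_uniform(a, b):
    return math.log(b - a)

# All four distributions below have mean zero and variance one.
entropies = {
    "N(0,1)": entropy_normal(1.0),
    "DE(1/sqrt(2))": entropy_double_exponential(1 / math.sqrt(2)),
    "L(sqrt(3)/pi)": entropy_logistic(math.sqrt(3) / math.pi),
    "U([-sqrt(3), sqrt(3)])": entropy_uniform(-math.sqrt(3), math.sqrt(3)),
}

for name, value in entropies.items():
    print(f"{name:>24s}  H = {value:.4f}")
```

In this sketch the normal distribution has the largest entropy of the four, in line with the maximizing property quoted in Remark 9.6(b).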
d
P
Qr(® P)
Q,~
exp(rHd/(d+r)( P) )
=
N(O,o L(a)
t~ (
DE(a)
(2a)r
DF(a, b)
2a~_)f_( d+r~bd+r~{bd+r~d+r Fib)a ~ d / ~ d+r ]
T
!
d+r
HE(a, b) Table 9.4: Quantization coefficients for product probability measures up to Qr([0,1] d)
10 Asymptotics for the covering radius
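Let $P$ be a Borel probability measure on $\mathbb{R}^d$ and let $\|\cdot\|$ denote any norm on $\mathbb{R}^d$. For $1 \le r \le \infty$ and $g: \mathbb{R}^d \to \mathbb{R}$ Borel measurable, define
\[ \|g\|_{P,r} = \|g\|_r = \begin{cases} \big(\int |g|^r\,dP\big)^{1/r}, & 1 \le r < \infty, \\ \inf\{c \ge 0 : |g| \le c \ P\text{-a.s.}\}, & r = \infty, \end{cases} \]
and
\[ e_{n,r}(P) = \inf_{\alpha \subset \mathbb{R}^d,\ |\alpha| \le n} \|d_\alpha\|_r, \tag{10.1} \]
where $d_\alpha(x) = d(x,\alpha) = \inf_{a\in\alpha}\|x - a\|$. It follows from Lemma 3.1 that, for $1 \le r < \infty$,
\[ V_{n,r}(P)^{1/r} = e_{n,r}(P). \tag{10.2} \]
In this section we consider the case $r = \infty$ and discuss its relation to the quantization problem ($r < \infty$).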
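
The passage from $r < \infty$ to $r = \infty$ can be illustrated for a discrete measure. The following is a minimal sketch, not part of the original text. The function name `lr_norm` and the sample data are hypothetical. It evaluates $\|g\|_{P,r}$ for increasing $r$ and compares the result with the essential supremum.

```python
def lr_norm(values, probs, r):
    """||g||_{P,r} for a discrete P with weights `probs` and g-values `values`."""
    if r == float("inf"):
        # Essential supremum: only atoms of positive mass count.
        return max(abs(v) for v, p in zip(values, probs) if p > 0)
    return sum(p * abs(v) ** r for v, p in zip(values, probs)) ** (1.0 / r)

# Hypothetical example: g takes three values under P.
values = [0.1, 0.5, 2.0]
probs = [0.6, 0.3, 0.1]

norms = [lr_norm(values, probs, r) for r in (1, 2, 4, 16, 64)]
sup_norm = lr_norm(values, probs, float("inf"))

for r, value in zip((1, 2, 4, 16, 64), norms):
    print(f"r = {r:3d}   ||g||_r = {value:.4f}")
print(f"r = inf   ||g||_inf = {sup_norm:.4f}")
```

The norms increase with $r$ and approach the essential supremum, which is the mechanism behind Lemma 10.1(a) and (b).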
10.1 Basic properties
Note that $e_{n,\infty}(P) < \infty$ if $\mathrm{supp}(P)$ is compact. It follows from the continuity of $d_\alpha$ that $\|d_\alpha\|_\infty = \sup_{x\in\mathrm{supp}(P)} d_\alpha(x)$ and hence
\[ e_{n,\infty}(P) = \inf_{\alpha\subset\mathbb{R}^d,\ |\alpha|\le n}\ \sup_{x\in\mathrm{supp}(P)}\ \min_{a\in\alpha}\|x-a\|. \]
So, if we define for a nonempty compact set $A \subset \mathbb{R}^d$,
\[ e_{n,\infty}(A) = \inf_{\alpha\subset\mathbb{R}^d,\ |\alpha|\le n}\ \max_{x\in A}\ \min_{a\in\alpha}\|x-a\|, \tag{10.3} \]
then $e_{n,\infty}(P) = e_{n,\infty}(A)$ for every probability $P$ with $\mathrm{supp}(P) = A$. A set $\alpha \subset \mathbb{R}^d$ with $|\alpha| \le n$ for which the above infimum is attained is called an n-optimal set of centers for $A$ of order $\infty$. Let $C_{n,\infty}(A)$ denote the set of all n-optimal sets of centers for $A$ of order $\infty$. Since
\[ \max_{x\in A}\min_{a\in\alpha}\|x-a\| = \min\Big\{s \ge 0 : \bigcup_{a\in\alpha} B(a,s) \supset A\Big\}, \]
searching for $\alpha \in C_{n,\infty}(A)$ is equivalent to the geometric problem of finding the most economical covering of $A$ by at most $n$ balls of equal radius. The number $e_{n,\infty}(A)$ is called the n-th covering radius for $A$. Recall that the Hausdorff metric is given by
\[ d_H(A,B) = \max\Big\{\max_{a\in A}\min_{b\in B}\|a-b\|,\ \max_{b\in B}\min_{a\in A}\|a-b\|\Big\} \]
for nonempty compact sets $A, B \subset \mathbb{R}^d$.
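
For finite sets both quantities above can be evaluated directly. The following is a minimal sketch, not part of the original text. It assumes the Euclidean norm, and the function names are hypothetical. Note that `covering_radius` evaluates $\max_{x\in A}\min_{a\in\alpha}\|x-a\|$ for a given set of centers $\alpha$ and does not perform the minimization over $\alpha$ in (10.3).

```python
import math

def dist_to_set(x, centers):
    """d(x, alpha) = min over a in alpha of ||x - a|| (Euclidean norm)."""
    return min(math.dist(x, a) for a in centers)

def covering_radius(A, centers):
    """Smallest s such that the balls B(a, s), a in centers, cover the finite set A."""
    return max(dist_to_set(x, centers) for x in A)

def hausdorff(A, B):
    """Hausdorff distance between two finite nonempty sets."""
    return max(covering_radius(A, B), covering_radius(B, A))

# Hypothetical example: the four corners of the unit square and one center.
A = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
alpha = [(0.5, 0.5)]

print(covering_radius(A, alpha))  # radius needed to cover A from the midpoint
print(hausdorff(A, alpha))
```

For a single center the two values coincide here, since the midpoint is at the same distance from every corner.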
10.1 Lemma
Let $n \in \mathbb{N}$.
(a) If $1 \le r \le s \le \infty$, then $e_{n,r}(P) \le e_{n,s}(P)$.
(b) If $\mathrm{supp}(P)$ is compact, then $\lim_{r\to\infty} e_{n,r}(P) = e_{n,\infty}(P)$.
(c) Let $A$ denote the support of $P$ and suppose $A$ is compact and $|A| \ge n$. Let $\alpha_r \in C_{n,r}(P)$, $1 \le r < \infty$, and let $(r_k)_{k\ge1}$ be a sequence in $[1,\infty)$ converging to infinity. Then the set of $d_H$-cluster points of the sequence $(\alpha_{r_k})_{k\ge1}$ is a nonempty subset of $C_{n,\infty}(A)$ and
\[ \lim_{r\to\infty} d_H(\alpha_r, C_{n,\infty}(A)) = 0. \]
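Proof (a) follows from the fact that $\|\cdot\|_r \le \|\cdot\|_s$ for $r \le s$.
(b) and (c). Let $u = \mathrm{diam}(A)$ and choose $s > 0$ such that $A \subset B(0,s)$. It follows from Theorem 4.1 and Lemma 2.6 that
\[ C_{n,r}(P) \subset \{\alpha \subset \mathbb{R}^d : 1 \le |\alpha| \le n,\ \alpha \subset B(0, s+u)\} \]
for every $r \in [1,\infty)$ provided $|A| \ge n$. Using Lemma 4.23 we deduce the existence of a $d_H$-cluster point for $(\alpha_{r_k})_{k\ge1}$. Now let $\alpha$ be the $d_H$-limit of a subsequence of $(\alpha_{r_k})_{k\ge1}$ which is again denoted by $(\alpha_{r_k})_{k\ge1}$. We have
\[ \|d_{\alpha_{r_k}} - d_\alpha\|_{r_k} \le \|d_{\alpha_{r_k}} - d_\alpha\|_\infty \le \sup_{x\in\mathbb{R}^d} |d(x,\alpha_{r_k}) - d(x,\alpha)| = d_H(\alpha_{r_k}, \alpha) \]
and hence
\[ \|d_{\alpha_{r_k}}\|_{r_k} \ge \|d_\alpha\|_{r_k} - \|d_{\alpha_{r_k}} - d_\alpha\|_{r_k} \ge \|d_\alpha\|_{r_k} - d_H(\alpha_{r_k},\alpha). \]
By (a), $\lim_{r\to\infty} e_{n,r}(P)$ exists and is less than or equal to $e_{n,\infty}(P)$. Therefore
\[ e_{n,\infty}(P) \ge \lim_{k\to\infty} e_{n,r_k}(P) = \lim_{k\to\infty}\|d_{\alpha_{r_k}}\|_{r_k} \ge \lim_{k\to\infty}\|d_\alpha\|_{r_k} = \|d_\alpha\|_\infty \ge e_{n,\infty}(P). \]
This gives (b) and the first assertion of (c). The second assertion of (c) follows from the first one. □

For $A \subset \mathbb{R}^d$ compact with $\lambda^d(A) > 0$, set
\[ M_{n,\infty}(A) = \frac{e_{n,\infty}(A)}{\lambda^d(A)^{1/d}}. \tag{10.4} \]
In spite of the slight inconsistency in notation we continue to write $M_\infty(A)$, $M_{n,\infty}(A)$, $Q_\infty(P)$ etc. for the corresponding notions in case $r = \infty$.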
10.2 Lemma
Let $A \subset \mathbb{R}^d$ be a nonempty compact set and let $T: \mathbb{R}^d \to \mathbb{R}^d$ be a similarity transformation with scaling number $c > 0$.
(a) $C_{n,\infty}(T(A)) = T\,C_{n,\infty}(A)$.
(b) $e_{n,\infty}(T(A)) = c\,e_{n,\infty}(A)$.
(c) $M_{n,\infty}(T(A)) = M_{n,\infty}(A)$ if $\lambda^d(A) > 0$.

Proof Obvious.
□
The existence of n-optimal sets of centers of order $\infty$ can be derived from the existence of n-optimal sets of centers of order $r < \infty$ (cf. Theorem 4.12) and Lemma 10.1(c).

10.3 Lemma (Existence)
If $A \subset \mathbb{R}^d$ is a nonempty compact set, then $C_{n,\infty}(A) \neq \emptyset$.

Proof We assume without loss of generality that $|A| \ge n$. To show that the assertion follows from Lemma 10.1(c) it suffices to note that $A = \mathrm{supp}(P)$ for some Borel probability measure $P$ on $\mathbb{R}^d$. If $A$ is finite, set $P = \sum_{a\in A}\delta_a/|A|$. Otherwise let $B = \{b_1, b_2, \ldots\}$ be a countable dense subset of $A$ and set $P = \sum_{n=1}^\infty 2^{-n}\delta_{b_n}$. Then $A = \mathrm{supp}(P)$.
□
The covering problem can be formulated in terms of the Hausdorff metric and the $L_\infty$-minimal metric.

10.4 Lemma
Let $A \subset \mathbb{R}^d$ be a nonempty compact set. Then
\[ e_{n,\infty}(A) = \inf_{|\alpha|\le n} d_H(\alpha, A). \]
If $e_{n,\infty}(A) < e_{n-1,\infty}(A)$ ($e_{0,\infty}(A) := \infty$), then
\[ C_{n,\infty}(A) = \{\alpha \subset \mathbb{R}^d : 1 \le |\alpha| \le n,\ d_H(\alpha, A) = e_{n,\infty}(A)\}. \]

Proof Let $\alpha \in C_{n,\infty}(A)$ and set
\[ \beta = \{a \in \alpha : d(a, A) \le e_{n,\infty}(A)\}. \]
Then $\beta \neq \emptyset$ and
\[ e_{n,\infty}(A) = \max_{x\in A} d(x,\alpha) = \max_{x\in A} d(x,\beta) = d_H(\beta, A). \]
This yields
\[ e_{n,\infty}(A) \ge \inf_{|\alpha|\le n} d_H(\alpha, A). \]
The converse inequality is obvious. Furthermore, the inclusion
\[ C_{n,\infty}(A) \supset C := \{\alpha \subset \mathbb{R}^d : 1 \le |\alpha| \le n,\ d_H(\alpha, A) = e_{n,\infty}(A)\} \]
is also obvious and $\beta \in C$. Now assume $e_{n,\infty}(A) < e_{n-1,\infty}(A)$. Since $\beta \in C_{n,\infty}(A)$, one gets $|\beta| = n$. This gives $\alpha = \beta$ and thus $\alpha \in C$. □

The $L_\infty$-minimal metric $\rho_\infty$ is given by
\[ \rho_\infty(P_1, P_2) = \inf\{\varepsilon > 0 : P_1(B) \le P_2(d_B \le \varepsilon) \text{ for all } B \in \mathcal{B}(\mathbb{R}^d)\} \]
for Borel probability measures $P_1, P_2$ on $\mathbb{R}^d$ with compact support.

10.5 Lemma
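Suppose $\mathrm{supp}(P)$ is compact and let $X$ be an $\mathbb{R}^d$-valued random variable with distribution $P$. Then
\[ e_{n,\infty}(P) = \inf_{Q\in\mathcal{P}_n}\rho_\infty(P,Q) = \inf_{f\in\mathcal{F}_n}\rho_\infty(P,P^f) = \inf_{f\in\mathcal{F}_n} \operatorname{ess\,sup}\|X - f(X)\|. \]

Proof Let $A$ denote the support of $P$. If $Q \in \mathcal{P}_n$ with $Q(\alpha) = 1$, $|\alpha| \le n$, let $\varepsilon > 0$ be such that $Q(B) \le P(d_B \le \varepsilon)$ for all $B \in \mathcal{B}(\mathbb{R}^d)$. Then
\[ 1 = Q(\alpha) = P(d_\alpha \le \varepsilon), \]
which gives $A \subset \{d_\alpha \le \varepsilon\}$. Therefore
\[ \|d_\alpha\|_\infty = \max_{x\in A} d(x,\alpha) \le \rho_\infty(P,Q). \]
This implies
\[ e_{n,\infty}(P) \le \inf_{Q\in\mathcal{P}_n}\rho_\infty(P,Q) \le \inf_{f\in\mathcal{F}_n}\rho_\infty(P,P^f). \]
If $\alpha \subset \mathbb{R}^d$ with $1 \le |\alpha| \le n$, let $\varepsilon = \max_{x\in A} d(x,\alpha)$. Choose a Voronoi partition $\{A_a : a\in\alpha\}$ of $\mathbb{R}^d$ with respect to $\alpha$ and let $f = \sum_{a\in\alpha} a\,1_{A_a}$. Since $A_a \cap A \subset B(a,\varepsilon)$ for every $a\in\alpha$, one obtains for $\beta \subset \alpha$
\[ P^f(\beta) = P\Big(\bigcup_{a\in\beta} A_a\Big) \le P\Big(\bigcup_{a\in\beta} B(a,\varepsilon)\Big) = P(d_\beta \le \varepsilon). \]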
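Therefore
\[ \rho_\infty(P, P^f) \le \varepsilon = \max_{x\in A} d(x,\alpha). \]
This implies
\[ \inf_{f\in\mathcal{F}_n}\rho_\infty(P,P^f) \le e_{n,\infty}(P). \]
For $f \in \mathcal{F}_n$, let $\alpha = f(\mathbb{R}^d)$ and $A_a = \{f = a\}$, $a\in\alpha$. Then
\[ \operatorname{ess\,sup}\|X - f(X)\| = \inf\Big\{c \ge 0 : \sum_{a\in\alpha} P\big(A_a \cap \{x\in\mathbb{R}^d : \|x-a\| > c\}\big) = 0\Big\} \ge \inf\Big\{c \ge 0 : \sum_{a\in\alpha} P\big(A_a \cap \{d_\alpha > c\}\big) = 0\Big\} = \|d_\alpha\|_\infty, \]
and if $\{A_a : a\in\alpha\}$ is a Voronoi partition of $\mathbb{R}^d$ with respect to $\alpha$, then
\[ \operatorname{ess\,sup}\|X - f(X)\| = \|d_\alpha\|_\infty. \]
This implies
\[ e_{n,\infty}(P) = \inf_{f\in\mathcal{F}_n}\operatorname{ess\,sup}\|X - f(X)\|. \]
□

Since $\rho_\infty(P_1,P_2) \ge d_H(\mathrm{supp}(P_1), \mathrm{supp}(P_2))$, $\rho_\infty$-convergence implies weak convergence and $d_H$-convergence of the supports.

10.2 Asymptotic covering radius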
Clearly, if $A \subset \mathbb{R}^d$ is nonempty compact then $e_{n,\infty}(A)$ decreases to zero as $n \to \infty$. We need the following simple lemma.

10.6 Lemma
(a) If $A, B \subset \mathbb{R}^d$ are nonempty compact sets with $A \subset B$, then $e_{n,\infty}(A) \le e_{n,\infty}(B)$.
(b) If $A_i \subset \mathbb{R}^d$ are nonempty compact sets and $n_i \in \mathbb{N}$ with $\sum_{i=1}^m n_i \le n$, then
\[ e_{n,\infty}\Big(\bigcup_{i=1}^m A_i\Big) \le \max_{1\le i\le m} e_{n_i,\infty}(A_i). \]

Proof (a) Let $\beta \in C_{n,\infty}(B)$. Then
\[ e_{n,\infty}(A) \le \max_{x\in A}\min_{b\in\beta}\|x-b\| \le e_{n,\infty}(B). \]
(b) Let $\alpha_i \in C_{n_i,\infty}(A_i)$ and let $\alpha = \bigcup_{i=1}^m \alpha_i$. Then $|\alpha| \le n$ and therefore
\[ e_{n,\infty}\Big(\bigcup_{i=1}^m A_i\Big) \le \max_{x\in\bigcup_{i=1}^m A_i}\ \min_{a\in\alpha}\|x-a\| = \max_{1\le i\le m}\max_{x\in A_i}\min_{a\in\alpha}\|x-a\| \le \max_{1\le i\le m}\max_{x\in A_i}\min_{a\in\alpha_i}\|x-a\| = \max_{1\le i\le m} e_{n_i,\infty}(A_i). \]
□

Now we can derive the exact asymptotic first order behaviour of the covering radius $e_{n,\infty}(A)$ for compact Jordan measurable sets $A$ with $\lambda^d(A) > 0$.

10.7 Theorem (Asymptotic covering radius)
Let $A \subset \mathbb{R}^d$ be a nonempty compact set with $\lambda^d(\partial A) = 0$. Let
\[ Q_\infty([0,1]^d) = \inf_{n\ge1} n^{1/d} e_{n,\infty}([0,1]^d). \]
Then $Q_\infty([0,1]^d) > 0$ and
\[ \lim_{n\to\infty} n^{1/d} e_{n,\infty}(A) = Q_\infty([0,1]^d)\,\lambda^d(A)^{1/d}. \]

Proof The proof is given in three steps.
Step 1. Let $A = [0,1]^d$. Let $m, n \in \mathbb{N}$, $m < n$, and let $k = k(n,m) = [(n/m)^{1/d}]$. Choose a tesselation of the unit cube $[0,1]^d$ consisting of $k^d$ translates $C_1, \ldots, C_{k^d}$ of the cube $[0, 1/k]^d$. Then by Lemmas 10.2 and 10.6,
\[ e_{n,\infty}([0,1]^d) \le \max_{1\le i\le k^d} e_{m,\infty}(C_i) = \max_{1\le i\le k^d} M_{m,\infty}(C_i)\,k^{-1} = k^{-1}M_{m,\infty}([0,1]^d) = k^{-1}e_{m,\infty}([0,1]^d) \]
and hence
\[ n^{1/d}e_{n,\infty}([0,1]^d) \le \frac{n^{1/d}}{k\,m^{1/d}}\, m^{1/d}e_{m,\infty}([0,1]^d). \]
Since $n^{1/d}/(k\,m^{1/d}) \to 1$ as $n \to \infty$, this implies
\[ \limsup_{n\to\infty} n^{1/d}e_{n,\infty}([0,1]^d) \le m^{1/d}e_{m,\infty}([0,1]^d) \]
for every $m \in \mathbb{N}$. Therefore, $\lim_{n\to\infty} n^{1/d}e_{n,\infty}([0,1]^d)$ exists in $[0,\infty)$ and
\[ \lim_{n\to\infty} n^{1/d}e_{n,\infty}([0,1]^d) = Q_\infty([0,1]^d). \tag{10.5} \]
From the subsequent Proposition 10.10(a) it follows that $Q_\infty([0,1]^d) > 0$.
Step 2. Let $A = \bigcup_{i=1}^m C_i$, where $\{C_1, \ldots, C_m\}$ is a packing in $\mathbb{R}^d$ consisting of closed cubes whose edges are parallel to the coordinate axes and with common length of the edges $l(C_i) = l > 0$. Let $n_1 = n_1(n) = [n/m]$. Then by Lemmas 10.2 and 10.6
\[ e_{n,\infty}(A) \le \max_{1\le i\le m} e_{n_1,\infty}(C_i) = \max_{1\le i\le m} M_{n_1,\infty}(C_i)\,l = e_{n_1,\infty}([0,1]^d)\,l, \quad n \ge m. \]
From Step 1 it follows that
\[ n^{1/d}e_{n_1,\infty}([0,1]^d) = \Big(\frac{n}{n_1}\Big)^{1/d} n_1^{1/d}e_{n_1,\infty}([0,1]^d) \to m^{1/d}Q_\infty([0,1]^d) \quad\text{as } n\to\infty. \]
This implies
\[ \limsup_{n\to\infty} n^{1/d}e_{n,\infty}(A) \le Q_\infty([0,1]^d)\,m^{1/d}l = Q_\infty([0,1]^d)\,\lambda^d(A)^{1/d}. \tag{10.6} \]
To prove that $Q_\infty([0,1]^d)\lambda^d(A)^{1/d}$ is a lower bound for $L = \liminf_{n\to\infty} n^{1/d}e_{n,\infty}(A)$, let $\beta = \beta(n) \in C_{n,\infty}(A)$ for $n \in \mathbb{N}$, $\beta_i = \beta_i(n) = \beta \cap \mathrm{int}\,C_i$ and $n_i = n_i(n) = |\beta_i|$, $1 \le i \le m$. For $0 < \varepsilon < l/2$, let $C_{i,\varepsilon} \subset C_i$ be a parallel closed cube with the same midpoint as $C_i$ and edge-length $l(C_{i,\varepsilon}) = l - 2\varepsilon$. Choose a finite set $\gamma_i = \gamma_i(\varepsilon) \subset C_{i,\varepsilon}$, $|\gamma_i| = k = k(\varepsilon)$ say, such that
\[ \min_{a\in\gamma_i}\|x-a\| \le \inf_{y\in C_i^c}\|x-y\| \quad\text{for every } x \in C_{i,\varepsilon},\ 1\le i\le m. \]
Then
\[ e_{n,\infty}(A) = \max_{x\in A}\min_{b\in\beta}\|x-b\| \ge \max_{1\le i\le m}\max_{x\in C_i}\min_{b\in\beta\cup\gamma_i}\|x-b\| \ge \max_{1\le i\le m}\max_{x\in C_{i,\varepsilon}}\min_{b\in\beta\cup\gamma_i}\|x-b\| \]
\[ = \max_{1\le i\le m}\max_{x\in C_{i,\varepsilon}}\min_{b\in\beta_i\cup\gamma_i}\|x-b\| \ge \max_{1\le i\le m} e_{n_i+k,\infty}(C_{i,\varepsilon}) = \max_{1\le i\le m} e_{n_i+k,\infty}([0,1]^d)\,(l-2\varepsilon). \]
Choose a subsequence (also denoted by $(n)$) such that
\[ \frac{n_i}{n} \to v_i \in [0,1],\ 1\le i\le m, \quad\text{and}\quad n^{1/d}e_{n,\infty}(A) \to L \quad\text{as } n\to\infty. \]
Since $\sum_{i=1}^m n_i \le n$, we have $\sum_{i=1}^m v_i \le 1$. Furthermore, $v_i > 0$ for every $i$. Otherwise, Step 1 yields $L = \infty$, which contradicts (10.6). By taking a further subsequence we can assume without loss of generality that
\[ \lim_{n\to\infty} (n_i+k)^{1/d}e_{n_i+k,\infty}([0,1]^d) = Q_\infty([0,1]^d). \]
This implies
\[ L \ge \max_{1\le i\le m} Q_\infty([0,1]^d)\,v_i^{-1/d}(l-2\varepsilon). \]
Since $0 < \varepsilon < l/2$ is arbitrary and $\max_{1\le i\le m} v_i^{-1/d} \ge m^{1/d}$, one obtains
\[ L \ge \max_{1\le i\le m} Q_\infty([0,1]^d)\,v_i^{-1/d}\,l \ge Q_\infty([0,1]^d)\,m^{1/d}l = Q_\infty([0,1]^d)\,\lambda^d(A)^{1/d}. \tag{10.7} \]
Step 3. Let $A$ be an arbitrary compact subset of $\mathbb{R}^d$. Let $A \subset C$ for some closed cube $C$ whose edges are parallel to the coordinate axes with edge-length $l(C) = l$. For $k \in \mathbb{N}$ consider a tesselation of $C$ consisting of closed cubes $C_1, \ldots, C_{k^d}$ of common edge-length $l(C_i) = l/k$. Set
\[ A_k = \bigcup\{C_i : A \cap C_i \neq \emptyset,\ i \le k^d\}. \]
Since $A \subset A_k$, it follows from Step 2 and Lemma 10.6(a) that
\[ \limsup_{n\to\infty} n^{1/d}e_{n,\infty}(A) \le \lim_{n\to\infty} n^{1/d}e_{n,\infty}(A_k) = Q_\infty([0,1]^d)\,\lambda^d(A_k)^{1/d}, \quad k \in \mathbb{N}. \]
If $U \subset \mathbb{R}^d$ is an open set with $A \subset U$, then $A_k \subset U$ for sufficiently large $k$. Therefore
\[ \lambda^d(A) \le \inf_{k\ge1}\lambda^d(A_k) \le \inf\{\lambda^d(U) : U \subset \mathbb{R}^d \text{ open},\ A \subset U\} = \lambda^d(A), \]
which yields
\[ \limsup_{n\to\infty} n^{1/d}e_{n,\infty}(A) \le Q_\infty([0,1]^d)\,\lambda^d(A)^{1/d}. \tag{10.8} \]
Now suppose $\lambda^d(\partial A) = 0$. To prove that $Q_\infty([0,1]^d)\lambda^d(A)^{1/d}$ is a lower bound for $\liminf_{n\to\infty} n^{1/d}e_{n,\infty}(A)$ we may assume that $\lambda^d(A) > 0$. Set
\[ B_k = \bigcup\{C_i : C_i \subset A,\ i \le k^d\}. \]
Since $B_k \subset A$, it follows from Step 2 and Lemma 10.6(a) that
\[ \liminf_{n\to\infty} n^{1/d}e_{n,\infty}(A) \ge \lim_{n\to\infty} n^{1/d}e_{n,\infty}(B_k) = Q_\infty([0,1]^d)\,\lambda^d(B_k)^{1/d}, \quad k\in\mathbb{N} \text{ with } B_k \neq \emptyset. \]
Since $\lambda^d(\partial A) = 0$, $A$ is Jordan measurable and hence $\sup_{k\ge1}\lambda^d(B_k) = \lambda^d(A)$. Thus we obtain
\[ \liminf_{n\to\infty} n^{1/d}e_{n,\infty}(A) \ge Q_\infty([0,1]^d)\,\lambda^d(A)^{1/d}. \tag{10.9} \]
Combining (10.8) and (10.9) the theorem is proved.
□
For compact sets $A$ with $\lambda^d(A) = 0$ the preceding theorem only yields $e_{n,\infty}(A) = o(n^{-1/d})$. An investigation of the exact order of $e_{n,\infty}(A)$ for several classes of compact sets $A$ with $\lambda^d(A) = 0$ is contained in Chapter III. For $A \subset \mathbb{R}^d$ compact with $\lambda^d(A) > 0$ and $\lambda^d(\partial A) = 0$, define the covering coefficient (with respect to coverings of $A$ by balls of equal radius) by
\[ Q_\infty(A) = Q_\infty([0,1]^d)\,\lambda^d(A)^{1/d}. \tag{10.10} \]
$Q_\infty(A)$ is sometimes called quantization coefficient of order $\infty$. It follows from Lemma 10.1(a), Theorem 6.2 and Theorem 10.7 that $Q_r(P)^{1/r}$ is increasing in $r$ and
\[ \lim_{r\to\infty} Q_r(P)^{1/r} = \lim_{r\to\infty} Q_r(A)^{1/r} \le Q_\infty(A) \tag{10.11} \]
provided $\mathrm{supp}(P) = A$ and $P$ is absolutely continuous with respect to $\lambda^d$.

10.8 Remark
(a) We conjecture that equality holds in (10.11) (cf. the special cases given in (10.17), (10.19) and (10.20)). This would show in a precise manner that the covering problem is a limiting case of the quantization problem.
(b) Possibly the condition $\lambda^d(\partial A) = 0$ can be dropped in Theorem 10.7. This is true if the conjecture in (a) can be resolved for the unit cube. In fact, we have for an arbitrary nonempty compact set $A \subset \mathbb{R}^d$ with $\lambda^d(A) > 0$
\[ Q_r([0,1]^d)^{1/r}\lambda^d(A)^{1/d} = Q_r(A)^{1/r} = \lim_{n\to\infty} n^{1/d}e_{n,r}(U(A)) \le \liminf_{n\to\infty} n^{1/d}e_{n,\infty}(A) \le \limsup_{n\to\infty} n^{1/d}e_{n,\infty}(A) \le Q_\infty([0,1]^d)\,\lambda^d(A)^{1/d}, \quad 1 \le r < \infty. \]
Here the first inequality follows from Lemmas 10.1 and 10.6(a) while the last inequality follows from (10.8). Since by assumption $\lim_{r\to\infty} Q_r([0,1]^d)^{1/r} = Q_\infty([0,1]^d)$, this implies
\[ \lim_{n\to\infty} n^{1/d}e_{n,\infty}(A) = Q_\infty([0,1]^d)\,\lambda^d(A)^{1/d}. \]
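10.9 Remark
Let $A \subset \mathbb{R}^d$ be an infinite compact set with $\lambda^d(\partial A) = 0$. For $\varepsilon > 0$, let $N(\varepsilon, A)$ be the minimal number of balls of radius $\varepsilon$ which are necessary to cover $A$, i.e.,
\[ N(\varepsilon) = N(\varepsilon, A) = \min\{n \ge 1 : e_{n,\infty}(A) \le \varepsilon\} = \min\Big\{n \ge 1 : \exists\,\alpha\subset\mathbb{R}^d,\ |\alpha| \le n,\ A \subset \bigcup_{a\in\alpha} B(a,\varepsilon)\Big\}. \]
Then $\varepsilon \ge e_{N(\varepsilon),\infty}(A)$, hence by Theorem 10.7
\[ \liminf_{\varepsilon\to0} N(\varepsilon)\varepsilon^d \ge \lim_{\varepsilon\to0} N(\varepsilon)\,e_{N(\varepsilon),\infty}(A)^d = Q_\infty([0,1]^d)^d\,\lambda^d(A). \]
The definition of $N(\varepsilon)$ implies $\varepsilon < e_{N(\varepsilon)-1,\infty}(A)$, hence again by Theorem 10.7
\[ \limsup_{\varepsilon\to0} N(\varepsilon)\varepsilon^d \le \lim_{\varepsilon\to0} N(\varepsilon)\,e_{N(\varepsilon)-1,\infty}(A)^d = Q_\infty([0,1]^d)^d\,\lambda^d(A). \]
We obtain
\[ \lim_{\varepsilon\to0} N(\varepsilon, A)\,\varepsilon^d = Q_\infty([0,1]^d)^d\,\lambda^d(A). \tag{10.12} \]
(Actually, this limit result is equivalent to the assertion of Theorem 10.7.) From (10.12) we deduce that $\lambda^d(B(0, Q_\infty([0,1]^d)))$ coincides with the density of the thinnest covering of the whole space by translates of $B(0,1)$ (cf. Gruber and Lekkerkerker, 1987, p. 237, Definition 6). The existence of the limit $\lim_{\varepsilon\to0} N(\varepsilon, A)\varepsilon^d$ appears in Gruber and Lekkerkerker (1987, p. 237, Theorem 7) for convex compact bodies $A$.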
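
The limit (10.12) can be made concrete in the simplest case $d = 1$, $A = [0,1]$. The following is a minimal sketch, not part of the original text, and the function names are hypothetical. It uses the elementary fact that $n$ intervals of radius $s$ cover a length of at most $2ns$, so that $e_{n,\infty}([0,1]) = 1/(2n)$ and $Q_\infty([0,1]) = 1/2$.

```python
import math

def covering_radius_interval(n):
    # e_{n,infty}([0,1]) = 1/(2n): n intervals of radius s cover length at most 2ns.
    return 1.0 / (2 * n)

def covering_number_interval(eps):
    # N(eps) = min{n >= 1 : 1/(2n) <= eps}.
    return max(1, math.ceil(1.0 / (2 * eps)))

for eps in (0.3, 0.01, 1e-4, 1e-6):
    n = covering_number_interval(eps)
    # By (10.12) the product N(eps) * eps should approach Q_infty([0,1]) = 1/2.
    print(f"eps = {eps:g}   N(eps) = {n}   N(eps) * eps = {n * eps:.6f}")
```

The printed products settle near 1/2, as (10.12) predicts for $d = 1$ and $\lambda^1([0,1]) = 1$.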
10.3 Covering radius of lattices and bounds

As for the quantization coefficients, the covering coefficient $Q_\infty([0,1]^d)$ is only known for $d = 1$, $d = 2$ ($l_1$-norm, $l_2$-norm), and in "trivial" cases for $d \ge 3$. Lower and upper bounds for $Q_\infty([0,1]^d)$ which correspond to those given in Proposition 8.3 and Theorem 8.9 can easily be derived. For $A \subset \mathbb{R}^d$ compact with $\lambda^d(A) > 0$, set
\[ M_\infty(A) = M_{1,\infty}(A). \]
Note that if $\Lambda \subset \mathbb{R}^d$ is an admissible lattice, then
\[ M_\infty(W(0|\Lambda)) = \frac{\max\{\|x\| : x \in W(0|\Lambda)\}}{\det(\Lambda)^{1/d}} = \frac{\sup_{x\in\mathbb{R}^d}\min_{a\in\Lambda}\|x-a\|}{\det(\Lambda)^{1/d}}, \tag{10.13} \]
that is, $M_\infty(W(0|\Lambda))$ is the normalized covering radius of $\Lambda$ (with respect to the whole space).

10.10 Proposition
(a) $\lim_{r\to\infty} Q_r([0,1]^d)^{1/r} \ge M_\infty(B(0,1)) = \dfrac{1}{\lambda^d(B(0,1))^{1/d}}$.
(b) If $\Lambda \subset \mathbb{R}^d$ is an admissible lattice then $Q_\infty([0,1]^d) \le M_\infty(W(0|\Lambda))$.

Proof (a) By Proposition 8.3
\[ \lim_{r\to\infty} Q_r([0,1]^d)^{1/r} \ge \lim_{r\to\infty} M_r(B(0,1))^{1/r} = M_\infty(B(0,1)). \]
(b) For $n \in \mathbb{N}$, let $c = c(n) = (n\det(\Lambda))^{-1/d}$,
\[ \beta_{n,\Lambda} = \{ca : a\in\Lambda,\ cW(a|\Lambda)\cap[0,1]^d \neq \emptyset\}, \]
$k = k(n) = |\beta_{n,\Lambda}|$ and
\[ A_n = \bigcup_{b\in\beta_{n,\Lambda}} W(b|c\Lambda). \]
Then $1 \le \lambda^d(A_n) = k/n$. If $U \subset \mathbb{R}^d$ is an open subset with $[0,1]^d \subset U$, then $A_n \subset U$ for sufficiently large $n$. Therefore
\[ 1 \le \inf_{n\ge1}\frac{k(n)}{n} = \inf_{n\ge1}\lambda^d(A_n) \le \inf\{\lambda^d(U) : U\subset\mathbb{R}^d \text{ open},\ [0,1]^d \subset U\} = \lambda^d([0,1]^d) = 1. \]
Furthermore,
\[ e_{k,\infty}([0,1]^d) \le \max_{x\in[0,1]^d}\min_{b\in\beta_{n,\Lambda}}\|x-b\| \le \max_{x\in A_n}\min_{b\in\beta_{n,\Lambda}}\|x-b\| = \max\{\|x\| : x\in W(0|c\Lambda)\} = c\max\{\|x\| : x\in W(0|\Lambda)\} = n^{-1/d}M_\infty(W(0|\Lambda)). \]
This implies
\[ Q_\infty([0,1]^d) = \inf_{k\ge1} k^{1/d}e_{k,\infty}([0,1]^d) \le \inf_{n\ge1}\Big(\frac{k(n)}{n}\Big)^{1/d}M_\infty(W(0|\Lambda)) = M_\infty(W(0|\Lambda)). \]
□

Let the lattice covering coefficient of $[0,1]^d$ be defined by
\[ Q_\infty^{(L)}([0,1]^d) = \inf\{M_\infty(W(0|\Lambda)) : \Lambda\subset\mathbb{R}^d \text{ admissible lattice}\}. \tag{10.14} \]
Then
\[ Q_\infty([0,1]^d) \le Q_\infty^{(L)}([0,1]^d). \tag{10.15} \]
Note that $Q_r^{(L)}([0,1]^d)^{1/r}$ is increasing in $r$ and
\[ \lim_{r\to\infty} Q_r^{(L)}([0,1]^d)^{1/r} \le Q_\infty^{(L)}([0,1]^d). \tag{10.16} \]
If the underlying norm is the $l_2$-norm, $Q_\infty([0,1]^d)$ is known for $d = 1$ and $d = 2$. For $d = 2$ and the hexagonal lattice $\Lambda \subset \mathbb{R}^2$, it follows from Theorem 8.15 that
\[ \lim_{r\to\infty} Q_r([0,1]^2)^{1/r} = \lim_{r\to\infty} M_r(W(0|\Lambda))^{1/r} = M_\infty(W(0|\Lambda)) \]
and hence, by (10.11) and Proposition 10.10(b),
\[ Q_\infty([0,1]^2) = \lim_{r\to\infty} Q_r([0,1]^2)^{1/r} = M_\infty(W(0|\Lambda)) = \Big(\frac{2}{3\sqrt3}\Big)^{1/2} = 0.6204\ldots, \quad l_2\text{-norm}. \tag{10.17} \]
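
The value in (10.17) can be checked numerically from the definition (10.13) of the normalized covering radius. The following is a minimal sketch, not part of the original text. The basis vectors, the grid resolution and the function names are choices made for this illustration only. It searches a grid over one fundamental parallelogram of the hexagonal lattice for the point farthest from the lattice and normalizes by $\det(\Lambda)^{1/2}$.

```python
import math

# Basis of a hexagonal lattice in the plane (a choice made for this sketch).
b1, b2 = (1.0, 0.0), (0.5, math.sqrt(3) / 2)
det = abs(b1[0] * b2[1] - b1[1] * b2[0])

def lattice_point(i, j):
    """Point i*b1 + j*b2; i and j may be fractional to address the fundamental cell."""
    return (i * b1[0] + j * b2[0], i * b1[1] + j * b2[1])

# Lattice points surrounding the fundamental parallelogram.
neighbours = [lattice_point(i, j) for i in range(-1, 3) for j in range(-1, 3)]

def covering_radius_estimate(steps=60):
    """Grid approximation of sup_x min_{a in lattice} ||x - a|| (Euclidean norm)."""
    worst = 0.0
    for u in range(steps):
        for v in range(steps):
            x = lattice_point(u / steps, v / steps)
            worst = max(worst, min(math.dist(x, a) for a in neighbours))
    return worst

m_inf = covering_radius_estimate() / math.sqrt(det)   # normalization as in (10.13), d = 2
closed_form = (2 / (3 * math.sqrt(3))) ** 0.5          # value stated in (10.17)
print(m_inf, closed_form)
```

With 60 grid steps the deep holes of the lattice lie on the grid, so the estimate agrees with the closed form up to rounding.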
This result is due to Kershner (1939). Solutions of the lattice covering problem are known for dimensions $1 \le d \le 5$, among them the hexagonal lattice for $d = 2$ and $D_3^*$. We have
\[ Q_\infty^{(L)}([0,1]^d) = (d+1)^{1/(2d)}\Big(\frac{d(d+2)}{12(d+1)}\Big)^{1/2}, \quad 1 \le d \le 5,\ l_2\text{-norm}, \tag{10.18} \]
\[ Q_\infty^{(L)}([0,1]^3) = \frac{2^{1/3}5^{1/2}}{4} = 0.7043\ldots, \quad Q_\infty^{(L)}([0,1]^4) = \frac{2^{1/2}}{5^{3/8}} = 0.7733\ldots, \quad Q_\infty^{(L)}([0,1]^5) = \frac{35^{1/2}}{2^{1/2}6^{9/10}} = 0.8340\ldots \]
(cf. Conway and Sloane, 1993, p. 12 and Chapter 2, Section 1.3 and the subsequent Remark 10.11(a)). A trivial case occurs for the $l_1$-norm and $d = 2$, where $B(0,1) = W(0|D_2)$ (cf. Example 8.13). Therefore
\[ Q_\infty([0,1]^2) = \lim_{r\to\infty} Q_r([0,1]^2)^{1/r} = M_\infty(B(0,1)) = \frac{1}{\sqrt2} = 0.7071\ldots, \quad l_1\text{-norm}. \tag{10.19} \]
A further trivial case concerns the $l_\infty$-norm. Then $B(0,1) = W(0|\mathbb{Z}^d)$ and so
\[ Q_\infty([0,1]^d) = \lim_{r\to\infty} Q_r([0,1]^d)^{1/r} = M_\infty(B(0,1)) = \frac12, \quad l_\infty\text{-norm} \tag{10.20} \]
(cf. (8.8) and (8.9)).

10.11 Remark
(a) The lattice covering coefficient $Q_\infty^{(L)}([0,1]^d)$ coincides with the so-called lower absolute inhomogeneous minimum of the ball $B(0,1)$, and $\lambda^d(B(0, Q_\infty^{(L)}([0,1]^d)))$ coincides with the density of the thinnest lattice covering of the whole space with $B(0,1)$ provided every lattice is admissible (cf. Gruber and Lekkerkerker, 1987, p. 230 and p. 236). Note that
\[ Q_\infty^{(L)}([0,1]^d) = \frac{1}{\sup\det(\Lambda)^{1/d}}, \]
where the supremum is taken over all admissible lattices $\Lambda$ such that $\{B(a,1) : a\in\Lambda\}$ is a covering of $\mathbb{R}^d$. For arbitrary (not necessarily admissible) lattices $\Lambda$ one can show that
\[ Q_\infty([0,1]^d) \le \frac{\max\{\|x\| : x\in W(0|\Lambda)\}}{\det(\Lambda)^{1/d}} \]
(cf. Gruber and Lekkerkerker, 1987, Theorem 6, p. 235).
(b) (d-asymptotics) If the underlying norm is the $l_p$-norm, with $1 \le p < \infty$, then
\[ \liminf_{d\to\infty} d^{-1/p}Q_\infty([0,1]^d) \ge \frac{p}{2(ep)^{1/p}\Gamma(1/p)}. \]
This follows from Proposition 9.4. The upper bound
\[ Q_\infty([0,1]^d) \le \frac{(d\log d + d\log\log d + 5d)^{1/d}}{\lambda^d(B(0,1))^{1/d}}, \quad d \ge 3, \]
which is due to Rogers (1957), gives
\[ \limsup_{d\to\infty} d^{-1/p}Q_\infty([0,1]^d) \le \frac{p}{2(ep)^{1/p}\Gamma(1/p)}. \]
Therefore
\[ \lim_{d\to\infty} d^{-1/p}Q_\infty([0,1]^d) = \frac{p}{2(ep)^{1/p}\Gamma(1/p)}, \quad l_p\text{-norm}. \tag{10.21} \]
Thus we find for the covering coefficient $Q_\infty([0,1]^d)$ exactly the same d-asymptotics as for the r-th roots $Q_r([0,1]^d)^{1/r}$ of the r-th quantization coefficients, $1 \le r < \infty$ (cf. Proposition 9.4).
(c) Let the underlying norm be the $l_2$-norm. For $d = 2$, the solution of the lattice quantizer problem, given by the hexagonal lattice, does not depend on $r$ (cf. Theorem 8.15). For $d = 3$, the lattice $D_3^*$ solves the lattice quantizer problem for $r = 2$ (cf. (8.8)) and the lattice covering problem ($r = \infty$). So possibly $D_3^*$ solves the lattice quantizer problem for every $r$. For $d = 4$, the solution of the lattice covering problem is unique up to linear similarity transformations (cf. Baranovskii, 1965) and differs from the best known lattice quantizer $D_4$ for $r = 2$ (cf. Conway and Sloane, 1993, p. 12 and p. 61). Therefore, solutions of the lattice quantizer problem must depend on $r$.

One may use n-optimal (or asymptotically n-optimal) sets of centers of order $r = \infty$ as quantizers. Their asymptotic performance depends on the covering density of the unit ball (cf. Remark 10.8).

10.12 Proposition
Let $A \subset \mathbb{R}^d$ be a nonempty compact set with $\lambda^d(A) > 0$, let $(\alpha_n)_{n\ge1}$ be an asymptotically n-optimal set of centers for $A$ of order $\infty$, i.e., $|\alpha_n| \le n$ and
\[ \lim_{n\to\infty} n^{1/d}\max_{x\in A}\min_{a\in\alpha_n}\|x-a\| = Q_\infty([0,1]^d)\,\lambda^d(A)^{1/d}, \]
and let $1 \le r < \infty$. Then
\[ \limsup_{n\to\infty} n^{r/d}\int\min_{a\in\alpha_n}\|x-a\|^r\,dU(A)(x) \le \vartheta_d^{(d+r)/d}M_r(B(0,1))\,\lambda^d(A)^{r/d}, \]
where $\vartheta_d = \lambda^d(B(0, Q_\infty([0,1]^d)))$. In particular
\[ Q_r([0,1]^d) \le \vartheta_d^{(d+r)/d}M_r(B(0,1)). \]
151
Proof Let sn = mea~ d(x, c~,~). Then s~=max
max
IIx-all
aea, xCW(alan)nA
which gives W ( a l ~ ) n A c B(a, s~), a e ~ . This implies f min IIx - aiF dU(A)(x) acorn
<_~
f
aea~W(alan)nA
IIz-allr dx/)~d(A)
-<)-]" S IIx-allrdx/'>'<~(A) 5 nM,.(B(O, 1)),,td(B(O, S,,))(d+")ldlxd(A )
= ns~+';td(B(O, 1))(d+')ldMr(B(O, 1))l;kd(A). Therefore nr/d
f ] min IIx - allr dU(A)(x) J aE O~n
<_ (nsg~d(U(O,1)))(d+r)ldMr(U(O,1))/~d(A) for every n E N. This yields the assertion.
[]
The above covering density upper bound for Qr([0,1] d) is better than the "trivial" bound Q~([0, 1]d) r as long as ~d < (d + r)ld while for the r-th root lim (#(~d+~)/dMr(B(O,1))) 1/r = Q~([0, lid). 1"~00
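
The d-asymptotics (10.21) of Remark 10.11(b) can be illustrated numerically. The following is a minimal sketch, not part of the original text, and the function names are hypothetical. It evaluates the normalized lower bound $d^{-1/p}M_\infty(B(0,1)) = d^{-1/p}/\lambda^d(B(0,1))^{1/d}$ from Proposition 10.10(a), using the standard volume formula $\lambda^d(B(0,1)) = (2\Gamma(1+1/p))^d/\Gamma(1+d/p)$ for the unit $l_p$-ball, together with the extra factor in the bound of Rogers.

```python
import math

def log_volume_lp_ball(d, p):
    # log of lambda^d(B(0,1)) for the l_p-norm: (2 Gamma(1 + 1/p))^d / Gamma(1 + d/p).
    return d * math.log(2 * math.gamma(1 + 1 / p)) - math.lgamma(1 + d / p)

def normalized_lower_bound(d, p):
    # d^{-1/p} * M_infty(B(0,1)) = d^{-1/p} / lambda^d(B(0,1))^{1/d}.
    return math.exp(-math.log(d) / p - log_volume_lp_ball(d, p) / d)

def rogers_factor(d):
    # (d log d + d log log d + 5d)^{1/d}, the extra factor in the upper bound (d >= 3).
    return (d * math.log(d) + d * math.log(math.log(d)) + 5 * d) ** (1.0 / d)

def limit_constant(p):
    # Right-hand side of (10.21).
    return p / (2 * (math.e * p) ** (1 / p) * math.gamma(1 / p))

for p in (1, 2, 3):
    for d in (10, 10**3, 10**6):
        lower = normalized_lower_bound(d, p)
        print(f"p = {p}  d = {d:>7d}  lower = {lower:.6f}  "
              f"upper = {lower * rogers_factor(d):.6f}  limit = {limit_constant(p):.6f}")
```

As $d$ grows, the lower and upper values both approach the constant in (10.21), because the factor of Rogers tends to one.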
10.4 Stability properties and empirical versions
A stability property for the n-th covering radius in terms of the metric $d_H$ follows immediately from Lemma 10.4. If $A, B \subset \mathbb{R}^d$ are nonempty compact sets, then
\[ |e_{n,\infty}(A) - e_{n,\infty}(B)| \le d_H(A,B) \tag{10.22} \]
for every $n \in \mathbb{N}$. A stability result for n-optimal sets of centers of order $r = \infty$ can be derived from Lemma 4.22.

10.13 Theorem
Let $d_H(A_k, A) \to 0$ for nonempty compact sets $A_k, A \subset \mathbb{R}^d$ and let $\alpha_k \in C_{n,\infty}(A_k)$, $k\in\mathbb{N}$. Suppose
\[ e_{n,\infty}(A) < e_{n-1,\infty}(A). \tag{10.23} \]
Then the set of $d_H$-cluster points of the sequence $(\alpha_k)_{k\ge1}$ is a nonempty subset of $C_{n,\infty}(A)$ and $d_H(\alpha_k, C_{n,\infty}(A)) \to 0$ as $k\to\infty$.

Proof To show that the assertion follows from Lemma 4.22 applied to the space of all nonempty compact subsets of $\mathbb{R}^d$ equipped with the Hausdorff metric $d_H$, the subset $N = \{\alpha\subset\mathbb{R}^d : |\alpha| \le n,\ \alpha \neq \emptyset\}$, and $f = d_H(A, \cdot)$, it suffices to verify that
\[ L(c) = \{\alpha\in N : d_H(\alpha, A) \le c\} \]
is $d_H$-compact for some $c > e_{n,\infty}(A)$. By Lemma 10.4, this setting meets the covering problem because the assumptions imply $e_{n,\infty}(A_k) < e_{n-1,\infty}(A_k)$ for all large $k$. Choose $s > 0$ such that $A \subset B(0,s)$. Then
\[ L(c) \subset \{\alpha\in N : \alpha\subset B(0, c+s)\}. \]
Using Lemma 4.23 we deduce the $d_H$-compactness of $L(c)$.
□

If in the preceding theorem the sets $\alpha_k \in C_{n,\infty}(A_k)$ satisfy $\max_{a\in\alpha_k} d(a, A_k) \le e_{n,\infty}(A_k)$ for all large $k$ (such a choice is always possible), then the assumption (10.23) can be dropped. Under suitable conditions, weak convergence of probability distributions implies the $d_H$-convergence of their (compact) supports. The following special case will be needed.

10.14 Lemma
Let $P_k \xrightarrow{D} P$ for Borel probability measures on $\mathbb{R}^d$ with compact supports $A_k$ and $A$, respectively. Then
\[ \max_{x\in A}\min_{y\in A_k}\|x-y\| \to 0, \quad k\to\infty. \]
Hence, if $A_k \subset A$ for every $k\in\mathbb{N}$, then
\[ \lim_{k\to\infty} d_H(A_k, A) = 0. \]

Proof For $\varepsilon > 0$, choose a finite subset $\alpha$ of $A$ such that $A \subset \bigcup_{a\in\alpha} B(a,\varepsilon)$. For $a\in\alpha$, define a bounded continuous function $f_a: \mathbb{R}^d \to \mathbb{R}_+$ by
\[ f_a(x) = \max\{0,\ 1 - \|x-a\|/\varepsilon\}. \]
Then
\[ \max_{a\in\alpha}\Big|\int f_a\,dP_k - \int f_a\,dP\Big| \to 0, \quad k\to\infty. \]
Since $\min_{a\in\alpha}\int f_a\,dP > 0$, one gets
\[ \min_{a\in\alpha} P_k(B(a,\varepsilon)) \ge \min_{a\in\alpha}\int f_a\,dP_k > 0 \]
and therefore
\[ \max_{a\in\alpha}\min_{y\in A_k}\|a-y\| \le \varepsilon \]
for sufficiently large $k$. This implies the assertion.
□
From the stability properties one immediately obtains consistency results for empirical versions of the covering problem. Let $X_1, X_2, \ldots$ be i.i.d. $\mathbb{R}^d$-valued random variables with distribution $P$. The empirical version of $e_{n,\infty}(P)$ is given by
\[ e_{n,\infty}(P_k) = e_{n,\infty}(\{X_1,\ldots,X_k\}) = \inf_{|\alpha|\le n}\ \max_{1\le i\le k}\ \min_{a\in\alpha}\|X_i - a\|, \]
where $P_k$ denotes the empirical measure of $X_1, \ldots, X_k$.

10.15 Corollary (Consistency)
Let $A$ denote the support of $P$ and suppose $A$ is compact.
(a) $d_H(\{X_1,\ldots,X_k\}, A) = \max_{y\in A}\min_{1\le i\le k}\|y - X_i\| \to 0$ a.s., $k\to\infty$.
(b) $e_{n,\infty}(\{X_1,\ldots,X_k\}) \to e_{n,\infty}(A)$ a.s. as $k\to\infty$ uniformly in $n$.
(c) Let $\alpha_k = \alpha_k(X_1,\ldots,X_k) \in C_{n,\infty}(\{X_1,\ldots,X_k\})$, $k\in\mathbb{N}$. Suppose (10.23) for $A = \mathrm{supp}(P)$. Then $d_H(\alpha_k, C_{n,\infty}(A)) \to 0$ a.s., $k\to\infty$.

Proof Since $P_k \xrightarrow{D} P$ a.s., the assertions follow from Theorem 10.13, Lemma 10.14, and (10.22).
□

Notice that uniqueness $|C_{n,\infty}(A)| = 1$ implies (10.23). Under this uniqueness condition, Corollary 10.15(c) is contained in Cuesta-Albertos et al. (1988, Theorem 12). Part (a) has been observed by Wagner (1971).

Notes
Some material about the issue of this section for the $l_2$-norm and $l_\infty$-norm may be found in Niederreiter (1992), Chapter 6. In particular, the exact order $n^{-1/d}$ of $e_{n,\infty}(A)$ is well known if $\lambda^d(A) > 0$. However, we are not aware of a reference concerning Theorem 10.7. Examples of n-optimal sets of centers for $[0,1]^2$ of order $\infty$ can be found in Johnson et al. (1990) for the $l_1$-norm and the $l_2$-norm. The covering density upper bound for the r-th quantization coefficients given in Proposition 10.12 seems to be new. A discussion of the relation between the quantization problem for $r = 2$, the covering problem and the packing problem can be found in Forney (1993) for the $l_2$-norm. For general treatments of the covering problem we refer to Gruber and Lekkerkerker (1987) and Conway and Sloane (1993).
Consistency and central limit results for a trimmed version of the covering problem have been proved by Cuesta-Albertos et al. (1998) and Cuesta-Albertos et al. (1999). Empirical versions of related covering problems and their asymptotics when both the level $n$ and the sample size $k$ tend to infinity were studied e.g. by Zemel (1985) and Rhee and Talagrand (1989b) for the $l_2$-norm. Let us mention that n-optimal sets of centers of order $\infty$ are often called best n-nets, and Chebyshev centers in case $n = 1$ (cf. Garkavi, 1964, and Singer, 1970, Section II.6.4).

10.16 Conjecture
\[ \lim_{r\to\infty} Q_r([0,1]^d)^{1/r} = Q_\infty([0,1]^d) \]
(cf. (10.11) and Remark 10.8).

If Conjecture 8.17 can be resolved, then Conjecture 10.16 is true for $d = 3$ and the $l_2$-norm. Furthermore, if Conjecture 8.17 can be resolved, then the lattice $D_3^*$ provides a solution of the covering problem in $\mathbb{R}^3$ for the $l_2$-norm, i.e., $Q_\infty([0,1]^3) = M_\infty(W(0|D_3^*))$. This is a long-standing conjecture in geometry.
Chapter III
Asymptotic quantization for singular probability distributions

In this chapter we consider some classes of continuous singular distributions on $\mathbb{R}^d$ and determine the asymptotic first order behaviour of their quantization errors.
11 The quantization dimension

Here we determine the order of convergence for the sequence of quantization errors of a given distribution. $X$ is an $\mathbb{R}^d$-valued random variable and $P$ is its distribution. In some cases we abbreviate $e_{n,r}(P)$ by $e_{n,r}$, $V_{n,r}(P)$ by $V_{n,r}$, and $C_{n,r}(P)$ by $C_{n,r}$. In this section we always assume either that $1 \le r < \infty$ and $E(\|X\|^r) < +\infty$ or that $r = \infty$ and $\mathrm{supp}(P)$ is compact.

11.1 Definition and elementary properties

11.1 Definition
\[ \underline{D}_r := \underline{D}_r(P) = \liminf_{n\to\infty}\frac{\log n}{-\log e_{n,r}} \]
is called the lower quantization dimension of $P$ of order $r$, and
\[ \overline{D}_r := \overline{D}_r(P) = \limsup_{n\to\infty}\frac{\log n}{-\log e_{n,r}} \]
is called the upper quantization dimension of $P$ of order $r$. If the two numbers $\underline{D}_r$ and $\overline{D}_r$ agree then their common value is denoted by $D_r$ ($= D_r(P)$) and called the quantization dimension of $P$ of order $r$.

11.2 Remark
$\underline{D}_r$ and $\overline{D}_r$ do not depend on the underlying norm. $\underline{D}_\infty$ and $\overline{D}_\infty$ depend only on the support of $P$. Using the definition of $e_{n,\infty}(K)$ in (10.3) we also define $\underline{D}_\infty(K)$, $\overline{D}_\infty(K)$, and $D_\infty(K)$ for an arbitrary (nonempty) compact set $K \subset \mathbb{R}^d$.
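
The ratio $\log n/(-\log e_{n,\infty})$ from Definition 11.1 can be evaluated in two simple cases. The following is a minimal sketch, not part of the original text. For the unit interval it uses $e_{n,\infty}([0,1]) = 1/(2n)$. For the middle-third Cantor set it uses the natural covering by $2^k$ intervals of length $3^{-k}$, which gives the upper bound $e_{2^k,\infty} \le 3^{-k}/2$. Treating this bound as the covering radius is an assumption of this illustration.

```python
import math

def dimension_ratio(n, e_n):
    """log n / (-log e_n), whose liminf and limsup define the quantization dimensions."""
    return math.log(n) / (-math.log(e_n))

# Unit interval: e_{n,infty}([0,1]) = 1/(2n).
interval = [dimension_ratio(n, 1 / (2 * n)) for n in (10, 10**3, 10**6, 10**12)]

# Middle-third Cantor set with n = 2^k centers and radius 3^{-k}/2
# (upper bound for the covering radius, used here as an assumption).
cantor = [dimension_ratio(2**k, 3.0 ** (-k) / 2) for k in (5, 50, 400)]

print("interval:", [round(v, 4) for v in interval])                    # slowly approaches 1
print("cantor:  ", [round(v, 4) for v in cantor])                      # approaches log 2 / log 3
print("log 2 / log 3 =", round(math.log(2) / math.log(3), 4))
```

The convergence is only logarithmic in $n$, which is why the ratio for the interval is still visibly below one at $n = 10^{12}$.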
11.3 Proposition
(a) If $0 \le t < \underline{D}_r < s$ then $\lim_{n\to\infty} n\,e_{n,r}^t = +\infty$ and $\liminf_{n\to\infty} n\,e_{n,r}^s = 0$.
(b) If $0 \le t < \overline{D}_r < s$ then $\limsup_{n\to\infty} n\,e_{n,r}^t = +\infty$ and $\lim_{n\to\infty} n\,e_{n,r}^s = 0$.

Proof Let us first prove (a). If $e_{n,r} = 0$ for some $n\in\mathbb{N}$ then $\underline{D}_r = 0$ and (a) is obvious. Suppose $e_{n,r} > 0$ for all $n\in\mathbb{N}$. For $0 \le t < \underline{D}_r$ choose $t' \in (t, \underline{D}_r)$. Then there exists an $n_0\in\mathbb{N}$ with $e_{n,r} < 1$ and
\[ \frac{\log n}{-\log e_{n,r}} > t' \]
for all $n \ge n_0$. This implies $n\,e_{n,r}^{t'} > 1$ and, hence,
\[ n\,e_{n,r}^t > e_{n,r}^{t-t'} \]
for all $n \ge n_0$. Since $\lim_{n\to\infty} e_{n,r} = 0$ we deduce
\[ \lim_{n\to\infty} n\,e_{n,r}^t = +\infty. \]
For $\underline{D}_r < s$ there is an $s' \in (\underline{D}_r, s)$ and a subsequence $(e_{n_k,r})$ of $(e_{n,r})$ with $e_{n_k,r} < 1$ and
\[ \frac{\log n_k}{-\log e_{n_k,r}} \le s' \]
for all $k\in\mathbb{N}$. This implies $n_k e_{n_k,r}^{s'} \le 1$ and, hence,
\[ n_k e_{n_k,r}^s \le e_{n_k,r}^{s-s'}. \]
Since $\lim_{k\to\infty} e_{n_k,r} = 0$ this leads to
\[ \liminf_{n\to\infty} n\,e_{n,r}^s \le \lim_{k\to\infty} n_k e_{n_k,r}^s = 0. \]
Part (b) can be proved in a similar way.
□

11.4 Corollary
(a) If $1 \le r \le s \le \infty$ then $\underline{D}_r \le \underline{D}_s$ and $\overline{D}_r \le \overline{D}_s$.
(b) If $D \in (0, +\infty)$ is such that
\[ 0 < \liminf_{n\to\infty} n\,e_{n,r}^D \le \limsup_{n\to\infty} n\,e_{n,r}^D < +\infty \]
then $D_r = D$.
(c) Let $1 \le r < \infty$ and suppose $E(\|X\|^{r+\delta}) < +\infty$ for some $\delta > 0$. Then $\overline{D}_r \le d$. If the absolutely continuous part $P_a$ of $P$ does not vanish then $D_r = d$.
(d) Let $r = \infty$. Then $\overline{D}_\infty \le d$. If $\lambda^d(\mathrm{supp}(P)) > 0$ then $D_\infty = d$.

Proof (a) follows from Lemma 10.1(a) and Proposition 11.3.
(b) follows immediately from Proposition 11.3.
(c) and (d) follow from Proposition 11.3, Theorem 6.2, (10.8), and the fact that $n\,e_{n,\infty}(P)^d \ge \lambda^d(\mathrm{supp}(P))/\lambda^d(B(0,1))$.
□
11.2 Comparison to the Hausdorff dimension

Next we will investigate the connection of the quantization dimension to other types of dimension. Let us first consider the relationship of $\underline{D}_r(P)$ to the Hausdorff dimension of the support of $P$. For a set $A \subset \mathbb{R}^d$ and $\varepsilon > 0$ an $\varepsilon$-cover of $A$ is a cover of $A$ by sets $U_i$ each of diameter at most $\varepsilon$, i.e., $\mathrm{diam}(U_i) = \sup\{\|x-y\| : x, y\in U_i\} \le \varepsilon$. For $s \ge 0$ let
\[ \mathcal{H}^s_\varepsilon(A) = \inf\Big\{\sum_{i\in I}\mathrm{diam}(U_i)^s : (U_i)_{i\in I} \text{ is an } \varepsilon\text{-cover of } A\Big\}. \]
Then $\mathcal{H}^s(A) = \lim_{\varepsilon\to0}\mathcal{H}^s_\varepsilon(A)$ is the s-dimensional Hausdorff measure of $A$. It is easy to check that $\mathcal{H}^s(A)$ is non-increasing with $s$ and that $\mathcal{H}^t(A) > 0$ implies $\mathcal{H}^s(A) = \infty$ for all $s < t$. The Hausdorff dimension of $A$ is defined as
\[ \dim_H(A) = \sup\{s \ge 0 : \mathcal{H}^s(A) = \infty\} = \inf\{s \ge 0 : \mathcal{H}^s(A) = 0\}. \]
While the definition of the Hausdorff measure depends on the underlying norm, the Hausdorff dimension has the same value for all norms on $\mathbb{R}^d$.

11.5 Proposition
Let $K \subset \mathbb{R}^d$ be compact and let $P$ be any probability measure with $\mathrm{supp}(P) = K$. Then, for every $t \ge 0$,
\[ \mathcal{H}^t(K) \le 2^t\liminf_{n\to\infty} n\,e_{n,\infty}(P)^t, \tag{11.1} \]
in particular
\[ \dim_H(K) \le \underline{D}_\infty(P). \]

Proof Let $\alpha_n \in C_{n,\infty}$. Then $(B(a, e_{n,\infty}))_{a\in\alpha_n}$ is a cover of $K$. Let $\varepsilon > 0$ be arbitrary. Since $(e_{n,\infty})_{n\in\mathbb{N}}$ converges to 0 there is an $n_\varepsilon\in\mathbb{N}$ with $e_{n,\infty} \le \varepsilon/2$ for all $n \ge n_\varepsilon$. This implies
\[ \mathcal{H}^t_\varepsilon(K) \le \inf_{n\ge n_\varepsilon}\sum_{a\in\alpha_n}\mathrm{diam}(B(a, e_{n,\infty}))^t \le \inf_{n\ge n_\varepsilon} n\,(2e_{n,\infty})^t. \]
By letting $\varepsilon$ tend to 0 we obtain
\[ \mathcal{H}^t(K) \le 2^t\liminf_{n\to\infty} n\,e_{n,\infty}^t. \]
The remaining claim in the proposition follows from Proposition 11.3.
□
The Hausdorff dimension of a (probability) measure $P$ is defined to be
\[ \dim_H(P) = \inf\{\dim_H(A) : A \in \mathcal{B}(\mathbb{R}^d),\ P(\mathbb{R}^d \setminus A) = 0\}. \tag{11.2} \]
11.6 Theorem
For all $r \ge 1$,
\[ \dim_H(P) \le \underline{D}_r(P). \]
Proof. The proof will be given in Corollary 12.16. □
Since $\dim_H(P) \le d$, the above inequality can be strict (see Example 6.4). In the case that $r = 2$ and $\mathrm{supp}(P)$ is compact the above theorem was proved by Pötzelberger (1998a).
11.3 Comparison to the box dimension
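Now we will consider the relationship between $\underline{D}_\infty(P)$, $\overline{D}_\infty(P)$ and the upper and lower box dimension of $\mathrm{supp}(P)$. For our purposes the box dimension (entropy dimension, Minkowski dimension) of a compact subset $K$ of $\mathbb{R}^d$ is most conveniently defined in the following way: For $\varepsilon > 0$ let
\[ N(\varepsilon) = N(\varepsilon, K) = \min\{n \in \mathbb{N} : e_{n,\infty}(K) \le \varepsilon\}. \]
Then
\[ \underline{\dim}_B(K) = \liminf_{\varepsilon \to 0} \frac{\log N(\varepsilon)}{-\log \varepsilon} \tag{11.3} \]
is called the lower box dimension of $K$ and
\[ \overline{\dim}_B(K) = \limsup_{\varepsilon \to 0} \frac{\log N(\varepsilon)}{-\log \varepsilon} \tag{11.4} \]
is called the upper box dimension of $K$. If $\underline{\dim}_B(K) = \overline{\dim}_B(K)$ this value is denoted by $\dim_B(K)$ and called the box dimension of $K$. This definition suggests that there is a close relationship between $\underline{D}_\infty(K)$ and $\underline{\dim}_B(K)$ and between $\overline{D}_\infty(K)$ and $\overline{\dim}_B(K)$. We have the following result.
11.7 Theorem
Let $K \subset \mathbb{R}^d$ be compact. Then
(i) $\underline{\dim}_B(K) = \underline{D}_\infty(K)$,
(ii) $\overline{\dim}_B(K) = \overline{D}_\infty(K)$.
Proof. (i) To prove $\underline{\dim}_B(K) \le \underline{D}_\infty(K)$ let $n \ge 1$ be a natural number. Then $N_n := N(e_{n,\infty}) \le n$ and $e_{N_n,\infty} = e_{n,\infty}$. We deduce
\[ \underline{D}_\infty(K) = \liminf_{n \to \infty} \frac{\log n}{-\log e_{n,\infty}} \ge \liminf_{n \to \infty} \frac{\log N_n}{-\log e_{n,\infty}} \ge \liminf_{\varepsilon \to 0} \frac{\log N(\varepsilon)}{-\log \varepsilon} = \underline{\dim}_B(K). \]
Next we show $\underline{\dim}_B(K) \ge \underline{D}_\infty(K)$. For $\varepsilon > 0$ the definition of $N(\varepsilon)$ implies $e_{N(\varepsilon),\infty} \le \varepsilon$, hence
\[ \underline{D}_\infty(K) = \liminf_{n \to \infty} \frac{\log n}{-\log e_{n,\infty}} \le \liminf_{\varepsilon \to 0} \frac{\log N(\varepsilon)}{-\log e_{N(\varepsilon),\infty}} \le \liminf_{\varepsilon \to 0} \frac{\log N(\varepsilon)}{-\log \varepsilon} = \underline{\dim}_B(K). \]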
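(ii) First we will prove that there is a $k \ge 1$ such that $e_{kn,\infty} \le \tfrac12 e_{n,\infty}$ for all $n \ge 1$. For $n \ge 1$ choose $\alpha \in C_{n,\infty}$. Let $\beta \subset \mathbb{R}^d$ be of minimum cardinality with $d_\beta(x) \le \tfrac12$ for all $x \in B(0,1)$. Let $k = |\beta|$. For $y \in \mathbb{R}^d$, $\varepsilon > 0$ and $\beta(y, \varepsilon) = \varepsilon\beta + y$ we have
\[ d_{\beta(y,\varepsilon)}(x) \le \tfrac12 \varepsilon \]
for all $x \in B(y, \varepsilon)$. Let
\[ \alpha' = \bigcup_{y \in \alpha} \beta(y, e_{n,\infty}). \]
Then $|\alpha'| \le kn$ and for every $x \in K$ there is a $y \in \alpha$ with $\|x - y\| \le e_{n,\infty}$ and hence a $z \in \beta(y, e_{n,\infty})$ with
\[ \|x - z\| \le \tfrac12 e_{n,\infty}. \]
Thus we obtain $d(x, \alpha') \le \tfrac12 e_{n,\infty}$. This implies
\[ e_{kn,\infty} \le \sup_{x \in K} d_{\alpha'}(x) \le \tfrac12 e_{n,\infty}. \]
Next we will show that, for $n \in \mathbb{N}$ with $e_{n,\infty} > 0$, we have
\[ N(e_{n,\infty}) \le n < k N(e_{n,\infty}). \tag{11.5} \]
Let $N_n = N(e_{n,\infty})$. Then $N_n \le n$ and since $e_{N_n,\infty} = e_{n,\infty} > 0$, we have
\[ e_{kN_n,\infty} \le \tfrac12 e_{N_n,\infty} < e_{n,\infty}. \]
This implies $n < kN_n$. Using (11.5) and $\lim_{n\to\infty} e_{n,\infty} = 0$ we get
\[ \overline{\dim}_B(K) = \limsup_{\varepsilon \to 0} \frac{\log N(\varepsilon)}{-\log \varepsilon} \ge \limsup_{n \to \infty} \frac{\log N(e_{n,\infty})}{-\log e_{n,\infty}} \ge \limsup_{n \to \infty} \frac{\log \frac{n}{k}}{-\log e_{n,\infty}} = \limsup_{n \to \infty} \Big[ \frac{-\log k}{-\log e_{n,\infty}} + \frac{\log n}{-\log e_{n,\infty}} \Big] = \overline{D}_\infty(K). \]
To prove the converse inequality observe that, for small $\varepsilon > 0$,
\[ e_{N(\varepsilon)-1,\infty} > \varepsilon. \]
This leads to
\[ \overline{\dim}_B(K) = \limsup_{\varepsilon \to 0} \frac{\log N(\varepsilon)}{-\log \varepsilon} \le \limsup_{\varepsilon \to 0} \frac{\log N(\varepsilon)}{-\log e_{N(\varepsilon)-1,\infty}} = \limsup_{\varepsilon \to 0} \frac{\log N(\varepsilon)}{\log(N(\varepsilon)-1)} \cdot \frac{\log(N(\varepsilon)-1)}{-\log e_{N(\varepsilon)-1,\infty}} = \limsup_{\varepsilon \to 0} \frac{\log(N(\varepsilon)-1)}{-\log e_{N(\varepsilon)-1,\infty}} \le \overline{D}_\infty(K) \]
and completes the proof of the theorem.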
[]
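As a numerical illustration of Theorem 11.7, the following sketch computes the covering numbers $N(\varepsilon)$ of the classical Cantor set for $\varepsilon = 3^{-k}/2$ and the resulting quotients $\log N(\varepsilon)/(-\log\varepsilon)$. It is not part of the original text. It works with the finite set of interval endpoints of a fixed generation (a subset of the Cantor set), scaled to integers so that all comparisons are exact; the function names are hypothetical.

```python
import math

def cantor_endpoints(level):
    """Endpoints of the 2**level Cantor intervals of generation `level`,
    scaled by 3**level so that all coordinates are exact integers."""
    intervals = [(0, 3**level)]
    for _ in range(level):
        refined = []
        for lo, hi in intervals:
            third = (hi - lo) // 3
            refined.append((lo, lo + third))   # left third
            refined.append((hi - third, hi))   # right third
        intervals = refined
    return sorted({p for interval in intervals for p in interval})

def covering_number(points, width):
    """Least number of closed intervals of length `width` covering the sorted
    list `points`; on the real line a left-to-right greedy sweep is optimal."""
    count, i = 0, 0
    while i < len(points):
        right_end = points[i] + width
        count += 1
        while i < len(points) and points[i] <= right_end:
            i += 1
    return count

def box_dimension_estimates(level, max_k):
    """Pairs (k, log N(eps) / -log eps) for eps = 3**-k / 2."""
    points = cantor_endpoints(level)
    estimates = []
    for k in range(1, max_k + 1):
        n_eps = covering_number(points, 3**(level - k))  # 2*eps in scaled units
        eps = 0.5 * 3.0**(-k)
        estimates.append((k, math.log(n_eps) / -math.log(eps)))
    return estimates

if __name__ == "__main__":
    for k, estimate in box_dimension_estimates(level=10, max_k=8):
        print(k, round(estimate, 4))
    print("log 2 / log 3 =", round(math.log(2) / math.log(3), 4))
```

In this sketch the covering number for $\varepsilon = 3^{-k}/2$ comes out as $2^k$, so the quotients equal $k\log 2/(k\log 3 + \log 2)$ and increase slowly towards $\log 2/\log 3$.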
11.8 Corollary
Let $K \subset \mathbb{R}^d$ be compact. If the box dimension of $K$ exists then the quantization dimension of $K$ of order $\infty$ also exists and equals the box dimension.
11.9 Proposition
Let $P$ be a probability on $\mathbb{R}^d$ with compact support $K$. Then, for $1 \le r \le s \le \infty$,
\[ \overline{D}_r(P) \le \overline{D}_s(P) \le \overline{D}_\infty(P) = \overline{D}_\infty(K) = \overline{\dim}_B(K) \]
and
\[ \underline{D}_r(P) \le \underline{D}_s(P) \le \underline{D}_\infty(P) = \underline{D}_\infty(K) = \underline{\dim}_B(K). \]
Proof. The result follows immediately from Corollary 11.4 and Theorem 11.7. □

11.4 Comparison to the rate distortion dimension
T. Kawabata and A. Dembo (1994) introduced the concept of rate distortion dimension for probability distributions. They showed that for norms on $\mathbb{R}^d$ the rate distortion dimension is the same as Rényi's information dimension. Here we will compare the quantization dimension to the rate distortion dimension. Let us recall its definition. For $x = 0$ set $x \log(x) = 0$. For a probability $Q$ on the Borel $\sigma$-field of $\mathbb{R}^d \times \mathbb{R}^d$ denote by $Q_1$ and $Q_2$ the marginals on the first and second component, respectively. If $P$ is a probability on $\mathbb{R}^d$ and $Q$ is a probability on $\mathbb{R}^d \times \mathbb{R}^d$ with $P = Q_1$ then the average mutual information $I(P, Q)$ of $Q$ is equal to
\[ \int h(x, y) \log h(x, y)\, d(Q_1 \otimes Q_2)(x, y) \]
if $Q$ is absolutely continuous with respect to $Q_1 \otimes Q_2$ and $h$ is the corresponding Radon-Nikodym derivative, and equal to $\infty$ otherwise. Let $1 \le r < \infty$. The rate distortion function of order $r$, $R_{P,r} : (0, +\infty) \to \mathbb{R}$, is defined by
\[ R_{P,r}(t) = \inf\Big\{ I(P, Q) : Q \text{ probability on } \mathbb{R}^d \times \mathbb{R}^d \text{ with } Q_1 = P \text{ and } \int \|x - y\|^r\, dQ(x, y) \le t \Big\}. \]
The upper rate distortion dimension (of order $r$) of $P$ is defined to be
\[ \overline{\dim}_R(P) = \limsup_{\varepsilon \to 0} \frac{R_{P,r}(\varepsilon^r)}{-\log \varepsilon}. \]
The lower rate distortion dimension (of order $r$) of $P$ is
\[ \underline{\dim}_R(P) = \liminf_{\varepsilon \to 0} \frac{R_{P,r}(\varepsilon^r)}{-\log \varepsilon} \]
and, if the two values agree, it is called the rate distortion dimension (of order $r$) of $P$ and denoted by $\dim_R(P)$. It is shown in Kawabata and Dembo (1994, Proposition 3.3) that the (upper, respectively lower) rate distortion dimension does not depend on $r$ and equals the corresponding (upper, respectively lower) information dimension introduced by Rényi (1959).
11.10 Theorem
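If $1 \le r < \infty$ then $\underline{\dim}_R(P) \le \underline{D}_r(P)$.
Proof. Let $\varepsilon > 0$ be given. Let $n \in \mathbb{N}$ satisfy $e_{n,r} \le \varepsilon$. Let $f : \mathbb{R}^d \to \mathbb{R}^d$ be an n-optimal quantizer of order $r$ and $Q$ the image of $P$ on $\mathbb{R}^d \times \mathbb{R}^d$ under the map $x \to (x, f(x))$. Then $Q_1 = P$, $Q_2 = P^f = P \circ f^{-1}$ and
\[ \int \|x - y\|^r\, dQ(x, y) = \int \|x - f(x)\|^r\, dP(x) = e_{n,r}^r \le \varepsilon^r. \]
Set $\alpha = f(\mathbb{R}^d)$ and define $h : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ by
\[ h(x, y) = \begin{cases} 0, & y \notin \alpha, \\ \dfrac{1}{P(f = y)}\, 1_{\{f = y\}}(x), & y \in \alpha. \end{cases} \]
For a Borel set $A \subset \mathbb{R}^d \times \mathbb{R}^d$ we obtain
\begin{align*}
\int_A h(x, y)\, d(Q_1 \otimes Q_2)(x, y) &= \int\!\!\int 1_A(x, y) h(x, y)\, dQ_2(y)\, dP(x) \\
&= \int \sum_{a \in \alpha} P(f = a)\, 1_A(x, a) h(x, a)\, dP(x) \\
&= \sum_{a \in \alpha} \int P(f = a)\, 1_A(x, a) \frac{1}{P(f = a)} 1_{\{f = a\}}(x)\, dP(x) \\
&= \sum_{a \in \alpha} \int_{\{f = a\}} 1_A(x, a)\, dP(x) \\
&= \sum_{a \in \alpha} P(\{x : f(x) = a \text{ and } (x, f(x)) \in A\}) \\
&= P(\{x : (x, f(x)) \in A\}) = Q(A).
\end{align*}
Thus $Q$ is absolutely continuous with respect to $Q_1 \otimes Q_2$ and $h$ is the corresponding Radon-Nikodym derivative. By the definition of $R_{P,r}$ we get
\begin{align*}
R_{P,r}(\varepsilon^r) \le I(P, Q) &= \int h(x, y) \log h(x, y)\, d(Q_1 \otimes Q_2)(x, y) \\
&= \int \log h(x, y)\, dQ(x, y) = \int \log h(x, f(x))\, dP(x) \\
&= \sum_{a \in \alpha} \int_{\{f = a\}} \log h(x, a)\, dP(x) = \sum_{a \in \alpha} \int_{\{f = a\}} \log \frac{1}{P(f = a)}\, dP(x) \\
&= -\sum_{a \in \alpha} P(f = a) \log P(f = a) \le \log |\alpha|.
\end{align*}
Since $f$ is n-optimal and $\alpha = f(\mathbb{R}^d)$ we know that $|\alpha| = n$ (cf. Theorem 4.1). Thus we have shown that $e_{n,r} \le \varepsilon$ implies $R_{P,r}(\varepsilon^r) \le \log n$. Now let $n_\varepsilon \in \mathbb{N}$ be the smallest natural number with $e_{n_\varepsilon,r} \le \varepsilon$. Then we get (recall $V_{n,r} = e_{n,r}^r$)
\[ R_{P,r}(V_{n_\varepsilon,r}) \le \log n_\varepsilon, \quad\text{hence}\quad \frac{R_{P,r}(V_{n_\varepsilon,r})}{-\log e_{n_\varepsilon,r}} \le \frac{\log n_\varepsilon}{-\log e_{n_\varepsilon,r}}. \]
This implies
\[ \liminf_{\varepsilon \to 0} \frac{R_{P,r}(\varepsilon^r)}{-\log \varepsilon} \le \liminf_{\varepsilon \to 0} \frac{\log n_\varepsilon}{-\log e_{n_\varepsilon,r}}. \]
With $\varepsilon_n = e_{n,r}$ we know that $n_{\varepsilon_n} \le n$ and $e_{n_{\varepsilon_n},r} = e_{n,r}$ and get
\[ \liminf_{\varepsilon \to 0} \frac{\log n_\varepsilon}{-\log e_{n_\varepsilon,r}} \le \liminf_{n \to \infty} \frac{\log n_{\varepsilon_n}}{-\log e_{n,r}} \le \liminf_{n \to \infty} \frac{\log n}{-\log e_{n,r}} = \underline{D}_r(P). \]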
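Remark
It follows from the preceding theorem that, for $1 \le r < \infty$,
\[ \underline{\dim}_R(P) \le \underline{D}_r(P) \le \underline{D}_\infty(P) \le \overline{D}_\infty(P) = \overline{\dim}_B(\mathrm{supp}(P)). \]
It remains an open question whether $\overline{\dim}_R(P) \le \overline{D}_1(P)$.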
[]
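The key step in the proof of Theorem 11.10 is that the mutual information of the pair $(x, f(x))$ equals the entropy $-\sum_{a} P(f = a) \log P(f = a)$ of the quantizer output, which is at most $\log|\alpha|$. The sketch below checks this bound for a small discrete distribution on the real line. It is an illustration only: the distribution, the codebook and the function names are assumptions made here, and the quantizer is an arbitrary nearest-neighbour quantizer, not necessarily an n-optimal one.

```python
import math

def cell_probabilities(atoms, weights, codebook):
    """Probabilities P(f = a) of the nearest-neighbour quantizer f onto `codebook`
    for the discrete distribution with the given atoms and weights."""
    probs = {a: 0.0 for a in codebook}
    for x, w in zip(atoms, weights):
        nearest = min(codebook, key=lambda a: abs(x - a))
        probs[nearest] += w
    return probs

def entropy(probs):
    """Shannon entropy -sum p log p with the convention 0 log 0 = 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

if __name__ == "__main__":
    atoms = [i / 10 for i in range(10)]
    weights = [(i + 1) / 55 for i in range(10)]   # sums to 1
    codebook = [0.1, 0.5, 0.8]
    probs = cell_probabilities(atoms, weights, codebook)
    print("entropy of quantizer output:", entropy(probs.values()))
    print("log n                      :", math.log(len(codebook)))
```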
Notes
The concept of quantization dimension was introduced by Zador (1982). Hausdorff and box dimension are classical mathematical notions which play a central role in fractal geometry. A good survey can be found in the books of Falconer (1985, 1990, 1997). The rate distortion dimension is introduced and thoroughly discussed by Kawabata and Dembo (1994) where the identity with Rényi's information dimension is also pointed out (cf. Rényi, 1959).
12 Regular sets and measures of dimension D

The notion of regular sets and measures of dimension D is an obvious modification of a concept of regularity used by David and Semmes (1993) and attributed to Ahlfors by these authors. The class of regular sets contains, for instance, convex sets in $\mathbb{R}^d$, surfaces of these sets, compact $C^1$-manifolds, and self-similar sets (satisfying the open set condition). Examples of regular measures of dimension D are certain measures which are absolutely continuous with respect to the Hausdorff measure on a regular set of dimension D. Here we give a detailed discussion of the asymptotic behaviour of the quantization errors for regular measures of dimension D. In this section $\|\cdot\|$ is an arbitrary norm on $\mathbb{R}^d$ and $D$ is a nonnegative real number. By $\mathring{B}(a, r)$ we denote the open ball of center $a$ and radius $r$.
12.1 Definition and examples
12.1 Definition
Let $\mu$ be a finite Borel measure on $\mathbb{R}^d$.
(a) $\mu$ is called regular of dimension D if $\mu$ has compact support and satisfies
\[ \exists c > 0\ \exists r_0 > 0\ \forall r \in (0, r_0)\ \forall a \in \mathrm{supp}(\mu) : \tfrac{1}{c} r^D \le \mu(\mathring{B}(a, r)) \le c\, r^D. \]
(b) $M \subset \mathbb{R}^d$ is called regular of dimension D if $M$ is compact, $0 < \mathcal{H}^D(M) < \infty$, and the restriction $\mathcal{H}^D_{|M} = \mathcal{H}^D(\cdot \cap M)$ of $\mathcal{H}^D$ to $M$ is a regular measure of dimension D with support $M$.

12.2 Remark
A set or a measure which is regular of dimension D in $\mathbb{R}^d$ with one given norm is also regular with dimension D in $\mathbb{R}^d$ with any other norm. This follows from the well-known fact that any two norms on $\mathbb{R}^d$ are equivalent, i.e., if $\|\cdot\|$ and $|||\cdot|||$ are norms on $\mathbb{R}^d$ then there is a constant $c > 0$ with $\tfrac{1}{c} |||\cdot||| \le \|\cdot\| \le c\, |||\cdot|||$. The notion of regularity of dimension D remains unchanged if one uses closed balls instead of open balls in the definition. Next we will study the elementary properties of regular sets of dimension D.
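To make the two-sided bound in Definition 12.1 (a) concrete, the following sketch evaluates the quotient $\mu(\mathring{B}(a, r))/r^D$ for a discrete approximation of the natural probability on the classical Cantor set (treated in Example 12.10), with $D = \log 2/\log 3$. It is an illustration under stated assumptions and not part of the original text: the measure puts mass $2^{-m}$ on the left endpoint of each of the $2^m$ Cantor intervals of generation $m$, radii are assumed to be at least $3^{-m}$, and all function names are hypothetical.

```python
import math
from bisect import bisect_left, bisect_right

D = math.log(2) / math.log(3)  # similarity dimension of the Cantor set

def cantor_left_endpoints(level):
    """Left endpoints of the generation-`level` Cantor intervals, scaled by 3**level."""
    pts = [0]
    for j in range(1, level + 1):
        shift = 2 * 3**(level - j)          # ternary digit 2 at position j
        pts = pts + [p + shift for p in pts]
    return sorted(pts)

def ball_ratio(points, level, a, r):
    """mu(open ball(a, r)) / r**D for the measure with mass 2**-level at each
    point of `points` / 3**level; `a` and `r` are given in unscaled units."""
    scale = 3**level
    lo = bisect_right(points, (a - r) * scale)   # first point > a - r
    hi = bisect_left(points, (a + r) * scale)    # first point >= a + r
    mass = (hi - lo) / 2**level
    return mass / r**D

if __name__ == "__main__":
    level = 8
    points = cantor_left_endpoints(level)
    for r in (0.3, 0.1, 0.03, 0.01, 0.003):
        ratios = [ball_ratio(points, level, p / 3**level, r) for p in points]
        print(r, round(min(ratios), 3), round(max(ratios), 3))
```

For radii between $3^{-m}$ and 1 the quotients stay between fixed positive constants, which is the behaviour the definition asks for.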
12.3 Lemma
Let $\mu$ be a finite measure on $\mathbb{R}^d$ such that there is a $c > 0$ and an $r_0 > 0$ with
\[ \mu(\mathring{B}(a, r)) \le c\, r^D \]
for all $a \in \mathrm{supp}(\mu)$ and all $r \in (0, r_0)$. Then there is a $c' > 0$ with
\[ \mu(\mathring{B}(a, r)) \le c' r^D \]
for all $a \in \mathbb{R}^d$ and all $r > 0$.
Proof. First we will show that there is a $\tilde{c} > 0$ with $\mu(\mathring{B}(a, r)) \le \tilde{c}\, r^D$ for all $a \in \mathrm{supp}(\mu)$ and all $r > 0$. To this end let $a \in \mathrm{supp}(\mu)$ be arbitrary and define $\tilde{c} = \max(c, \mu(\mathbb{R}^d) r_0^{-D})$. If $r \in (0, r_0)$ then by assumption we have
\[ \mu(\mathring{B}(a, r)) \le c\, r^D \le \tilde{c}\, r^D. \]
If $r \ge r_0$ then
\[ \tilde{c}\, r^D \ge \mu(\mathbb{R}^d) r_0^{-D} r^D \ge \mu(\mathbb{R}^d) \ge \mu(\mathring{B}(a, r)). \]
We claim that, for arbitrary $a \in \mathbb{R}^d$ and $r > 0$,
\[ \mu(\mathring{B}(a, r)) \le 2^D \tilde{c}\, r^D = c' r^D. \]
If $0 < r \le d(a, \mathrm{supp}(\mu))$ then $\mu(\mathring{B}(a, r)) = 0$ and the claim is true. If $d(a, \mathrm{supp}(\mu)) < r$ choose $b \in \mathrm{supp}(\mu)$ with $\|a - b\| = d(a, \mathrm{supp}(\mu))$. Then we have
\[ \mu(\mathring{B}(a, r)) \le \mu(\mathring{B}(b, r + \|b - a\|)) \le \tilde{c}\,(r + \|b - a\|)^D = \tilde{c}\, r^D \Big(1 + \frac{\|b - a\|}{r}\Big)^D \le 2^D \tilde{c}\, r^D. \]
□
12.4 Lemma
A finite union of regular sets of dimension D is regular of dimension D.
Proof. Let $M_1, \ldots, M_n \subset \mathbb{R}^d$ be regular of dimension D and $M = M_1 \cup \ldots \cup M_n$. Then $M$ is compact and we have
\[ 0 < \mathcal{H}^D(M) \le \sum_{i=1}^n \mathcal{H}^D(M_i) < \infty. \]
Let $c_i > 0$, $r_{i,0} > 0$ be such that
\[ \forall r \in (0, r_{i,0})\ \forall a \in M_i : \tfrac{1}{c_i} r^D \le \mathcal{H}^D(M_i \cap \mathring{B}(a, r)) \le c_i r^D. \]
By Lemma 12.3 there is a constant $c_i' > 0$ with
\[ \mathcal{H}^D(M_i \cap \mathring{B}(a, r)) \le c_i' r^D \]
for all $a \in \mathbb{R}^d$ and all $r > 0$. Without loss of generality we may assume $c_i' \ge c_i$. Set $c = c_1' + \ldots + c_n'$ and $r_0 = \min(r_{1,0}, \ldots, r_{n,0})$. It follows that for all $a \in M$ and all $r \in (0, r_0)$
\[ \tfrac{1}{c} r^D \le \min\big(\tfrac{1}{c_1}, \ldots, \tfrac{1}{c_n}\big) r^D \le \mathcal{H}^D(M \cap \mathring{B}(a, r)) \le \sum_{i=1}^n \mathcal{H}^D(M_i \cap \mathring{B}(a, r)) \le \sum_{i=1}^n c_i' r^D = c\, r^D. \]
□
12.5 Lemma
Let $M \subset \mathbb{R}^d$ be compact. Then $M$ is regular of dimension D if and only if every point $x$ of $M$ has a regular neighbourhood of dimension D in $M$.
Proof. Since $M$ is compact, $M$ can be covered by finitely many regular sets of dimension D and the lemma follows from Lemma 12.4. □
12.6 Lemma
Let $M \subset \mathbb{R}^d$ be regular of dimension D, $U \subset \mathbb{R}^d$ open with $M \subset U$, and $g : U \to \mathbb{R}^d$ a bi-Lipschitz map, i.e., there is a constant $c' > 0$ with
\[ \tfrac{1}{c'} \|x - y\| \le \|g(x) - g(y)\| \le c' \|x - y\| \]
for all $x, y \in U$. Then $g(M)$ is regular of dimension D.
Proof. Obviously $g(U)$ is open and $g(M)$ is a compact subset of $g(U)$. It follows (from Falconer (1990, p. 28, 2.9)) that
\[ 0 < \big(\tfrac{1}{c'}\big)^D \mathcal{H}^D(M) \le \mathcal{H}^D(g(M)) \le c'^D \mathcal{H}^D(M) < \infty. \]
Let $r_1 = \min_{z \in g(M)} d(z, g(U)^c)$. Then we have $r_1 > 0$. Let $r_0 > 0$ and $c > 0$ be such that
\[ \tfrac{1}{c} r^D \le \mathcal{H}^D(M \cap \mathring{B}(a, r)) \le c\, r^D \]
for all $a \in M$ and $r \in (0, r_0)$. Define $r_0' = \min(r_1, \tfrac{1}{c'} r_0)$. For $y \in g(M)$ and $r \in (0, r_0')$ we obtain
\[ \mathring{B}(g^{-1}(y), \tfrac{1}{c'} r) \subset g^{-1}(\mathring{B}(y, r)) \subset \mathring{B}(g^{-1}(y), c' r) \]
and, hence,
\[ \tfrac{1}{c} \big(\tfrac{r}{c'}\big)^D \le \mathcal{H}^D(M \cap \mathring{B}(g^{-1}(y), \tfrac{1}{c'} r)) \le \mathcal{H}^D(M \cap g^{-1}(\mathring{B}(y, r))) \le \mathcal{H}^D(M \cap \mathring{B}(g^{-1}(y), c' r)) \le c\,(c' r)^D. \tag{12.1} \]
Since $g(M \cap g^{-1}(\mathring{B}(y, r))) = g(M) \cap \mathring{B}(y, r)$ an elementary property of Hausdorff measures (see Falconer (1990, p. 28, 2.9)) implies
\[ \tfrac{1}{c'^D} \mathcal{H}^D(M \cap g^{-1}(\mathring{B}(y, r))) \le \mathcal{H}^D(g(M) \cap \mathring{B}(y, r)) \le c'^D \mathcal{H}^D(M \cap g^{-1}(\mathring{B}(y, r))). \tag{12.2} \]
Combining (12.1) and (12.2) yields
\[ \frac{1}{c\, c'^{2D}} r^D \le \mathcal{H}^D(g(M) \cap \mathring{B}(y, r)) \le c\, c'^{2D} r^D. \]
Thus $\mathcal{H}^D_{|g(M)}$ has $g(M)$ as its support and $g(M)$ is regular of dimension D. □
We will now give some examples of regular sets of dimension D.
12.7 Example (Convex sets)
Let $K \subset \mathbb{R}^d$ be a nonempty compact convex set. The dimension D of $K$ is defined as the dimension of the affine subspace of $\mathbb{R}^d$ spanned by $K$. We will show that $K$ is a regular set of dimension D. Without loss of generality we may assume that $K$ spans $\mathbb{R}^d$ (otherwise we take the affine subspace generated by $K$ and transform it by an affine isometry onto some $\mathbb{R}^D$). In this situation $D = d$, $\mathcal{H}^d_{|K}$ is just a non-zero multiple of the Lebesgue measure $\lambda^d_{|K}$, and, obviously, $0 < \lambda^d(K) < \infty$ since $\mathrm{int}\, K \ne \emptyset$ (cf. Webster, 1994, p. 61, Theorem 2.3.1). By Remark 12.2 we may assume that $\mathbb{R}^d$ carries the $l_2$-norm. To prove that $K$ is regular of dimension $d$ it is, therefore, enough to show that $\lambda^d_{|K}$ is regular of dimension $d$. Let $r_0 > 0$ be arbitrary. The map $K \to \mathbb{R}$, $x \to \lambda^d(K \cap \mathring{B}(x, r_0))$ is continuous. Thus
\[ c = \min_{x \in K} \lambda^d(K \cap \mathring{B}(x, r_0)) \]
exists. Since $\mathring{B}(x, r_0) \cap \mathrm{int}\, K \ne \emptyset$ we know that $c > 0$. For $0 < t < 1$ and $x \in K$ the convexity of $K$ yields
\[ x + t(\mathring{B}(0, r_0) \cap (K - x)) \subset K \cap \mathring{B}(x, t r_0). \]
We deduce
\[ \lambda^d(K \cap \mathring{B}(x, t r_0)) \ge t^d \lambda^d(\mathring{B}(0, r_0) \cap (K - x)) = t^d \lambda^d(\mathring{B}(x, r_0) \cap K) \ge \frac{c}{r_0^d} (t r_0)^d. \]
Thus we obtain, for $r \in (0, r_0)$,
\[ \frac{c}{r_0^d} r^d \le \lambda^d(\mathring{B}(x, r) \cap K) \le \lambda^d(\mathring{B}(0, 1))\, r^d. \]
12.8 Example (Surfaces of convex sets)
Let $K \subset \mathbb{R}^d$ be nonempty, convex, and compact. Let $\mathrm{bd}\, K$ denote the relative boundary of $K$, i.e., the boundary of $K$ relative to the affine subspace spanned by $K$. Let $D + 1$ be the dimension of $K$. We will show that $\mathrm{bd}\, K$ is regular of dimension D. As above we may assume that $D + 1 = d$ and that $\mathbb{R}^d$ carries the $l_2$-norm. For $x, y \in \mathbb{R}^d$ the number $\langle x, y \rangle \in \mathbb{R}$ is the standard scalar product of $x$ and $y$, i.e., if $x = (x_1, \ldots, x_d)$ and $y = (y_1, \ldots, y_d)$ then
\[ \langle x, y \rangle = \sum_{i=1}^d x_i y_i. \tag{12.3} \]
Our claim is that, for every $x \in \partial K = \mathrm{bd}\, K$, there is an $r > 0$, a bi-Lipschitz map $f : \mathring{B}(x, r) \to \mathbb{R}^d$ and a hyperplane $H \subset \mathbb{R}^d$ with
\[ f(\mathring{B}(x, r) \cap \partial K) = f(\mathring{B}(x, r)) \cap H. \]
Once this has been proved the argument is finished as follows. The set $f(\mathring{B}(x, r))$ is open. Hence there is an $s > 0$ with $B(f(x), s) \subset f(\mathring{B}(x, r))$. By Example 12.7 the set $B(f(x), s) \cap H$ is regular of dimension $d - 1$. The map $g = f^{-1}$ from $f(\mathring{B}(x, r))$ to $\mathring{B}(x, r)$ maps $B(f(x), s) \cap H$ onto $f^{-1}(B(f(x), s)) \cap \partial K$. Since $g$ is bi-Lipschitz, Lemma 12.6 implies that $f^{-1}(B(f(x), s)) \cap \partial K$ is regular of dimension $d - 1$. Moreover, $f^{-1}(B(f(x), s)) \cap \partial K$ is a neighbourhood of $x$ in $\partial K$. By Lemma 12.5 $\partial K$ is a regular set of dimension $d - 1$.
To prove the claim let $x \in \partial K$ be arbitrary. Then there exists a $\delta > 0$ and a unit vector $u \in \mathbb{R}^d$ with $x + \delta u \in \mathrm{int}\, K$. Moreover there is an $r > 0$ with $B(x + \delta u, r) \subset \mathrm{int}\, K$. Since $x \notin \mathrm{int}\, K$ we know that $r \le \delta$. Define $\varphi : \mathring{B}(x, r) \to \mathbb{R}$ by $\varphi(y) = \min\{t \in \mathbb{R} : y + tu \in K\}$. Then $\varphi$ is well defined since for an arbitrary $y \in \mathring{B}(x, r)$ we have $y + \delta u \in B(x + \delta u, r) \subset \mathrm{int}\, K$. For $y \in \partial K$ the point $y + 0 \cdot u$ belongs to $K$ so that $\varphi(y) \le 0$. Since the open line segment between $y + \varphi(y)u$ and $y + \delta u$ lies in $\mathrm{int}\, K$ the fact that $y$ belongs to $\partial K$ implies $\varphi(y) \ge 0$. Thus $y \in \partial K$ yields $\varphi(y) = 0$. Obviously every $y \in \mathring{B}(x, r)$ with $\varphi(y) = 0$ belongs to $\partial K$.
Next we will show that $\varphi$ is convex. Let $y, y' \in \mathring{B}(x, r)$ and $t \in [0, 1]$ be arbitrary. Then
\[ ty + (1 - t)y' + (t\varphi(y) + (1 - t)\varphi(y'))u = t(y + \varphi(y)u) + (1 - t)(y' + \varphi(y')u) \in K, \]
hence $\varphi(ty + (1 - t)y') \le t\varphi(y) + (1 - t)\varphi(y')$. A convex function on an open set is locally Lipschitz (see Webster, 1994, p. 224/225, proof of Theorem 5.51). Thus, by making $r$ a bit smaller if necessary we may assume that $\varphi$ is Lipschitz, i.e., there is a $c > 0$ with $|\varphi(y) - \varphi(y')| \le c\|y - y'\|$ for all $y, y' \in \mathring{B}(x, r)$. Define $H = \{y \in \mathbb{R}^d : \langle y, u \rangle = 0\}$ and let $P_H$ be the orthogonal projection onto $H$. Define $f : \mathring{B}(x, r) \to \mathbb{R}^d$ by $f(y) = P_H(y) - \varphi(y)u$. Then $f$ is a Lipschitz map, since
\[ \|f(y) - f(y')\| \le \|P_H(y) - P_H(y')\| + |\varphi(y) - \varphi(y')| \le \|y - y'\| + c\|y - y'\| \le (1 + c)\|y - y'\|. \tag{12.4} \]
Now we will show that there is a constant $c' > 0$ such that
\[ \|f(y) - f(y')\| \ge c' \|y - y'\| \tag{12.5} \]
for all $y, y' \in \mathring{B}(x, r)$. Let $y, y' \in \mathring{B}(x, r)$ be arbitrary, set $t = \langle x, u \rangle$ and define $z = P_H(y) + tu$, $z' = P_H(y') + tu$. Since $u$ is orthogonal to $H$ we deduce
\[ \|z - x\|^2 = \|P_H(z - x)\|^2 + \|tu - \langle x, u \rangle u\|^2 = \|P_H(z) - P_H(x)\|^2. \]
Since $P_H(z) = P_H(y)$ we get
\[ \|z - x\|^2 \le \|y - x\|^2 \le r^2, \]
hence $z \in \mathring{B}(x, r)$. Similarly, $z' \in \mathring{B}(x, r)$. By the definition of $\varphi$ we have
\begin{align*}
\varphi(z) &= \min\{s \in \mathbb{R} : z + su \in K\} \\
&= \min\{s \in \mathbb{R} : P_H(y) + tu + su \in K\} \\
&= \min\{s \in \mathbb{R} : P_H(y) + \langle y, u \rangle u + (t + s - \langle y, u \rangle)u \in K\}.
\end{align*}
Since $y = P_H(y) + \langle y, u \rangle u$ we obtain
\[ \varphi(z) = \min\{s \in \mathbb{R} : y + (t + s - \langle y, u \rangle)u \in K\} = \langle y, u \rangle - t + \varphi(y). \tag{12.6} \]
For the same reason
\[ \varphi(z') = \langle y', u \rangle - t + \varphi(y'). \tag{12.7} \]
Using (12.4), (12.6) and (12.7) and the definition of $z$ and $z'$ leads to
\begin{align*}
|\varphi(y) - \varphi(y')| &= |\varphi(z) + t - \langle y, u \rangle - \varphi(z') - t + \langle y', u \rangle| \\
&\ge |\langle y - y', u \rangle| - (1 + c)\|z - z'\| \\
&= |\langle y - y', u \rangle| - (1 + c)\|P_H(y) - P_H(y')\|.
\end{align*}
If $|\langle y - y', u \rangle| \ge 2(1 + c)\|P_H(y) - P_H(y')\|$ then $|\varphi(y) - \varphi(y')| \ge \tfrac12 |\langle y - y', u \rangle|$, hence
\[ \|y - y'\|^2 = \|P_H y - P_H y'\|^2 + |\langle y - y', u \rangle|^2 \le 4\|P_H y - P_H y'\|^2 + 4|\varphi(y) - \varphi(y')|^2 = 4\|f(y) - f(y')\|^2. \tag{12.8} \]
If $|\langle y - y', u \rangle| < 2(1 + c)\|P_H y - P_H y'\|$ then
\[ \|y - y'\|^2 = \|P_H y - P_H y'\|^2 + |\langle y - y', u \rangle|^2 \le (4(1 + c)^2 + 1)\|P_H y - P_H y'\|^2 \le 4((1 + c)^2 + 1)\|f(y) - f(y')\|^2. \tag{12.9} \]
Combining (12.8) and (12.9) yields (12.5), so that $f$ is bi-Lipschitz. Moreover we have
\[ f(\mathring{B}(x, r) \cap \partial K) = H \cap f(\mathring{B}(x, r)). \tag{12.10} \]
This can be seen as follows: If $y \in \partial K \cap \mathring{B}(x, r)$ then $\varphi(y) = 0$ and, hence,
\[ f(y) = P_H(y) \in H \cap f(\mathring{B}(x, r)). \]
If $y \in \mathring{B}(x, r)$ and $f(y) \in H$ then $\varphi(y) = 0$, hence, $y \in \partial K \cap \mathring{B}(x, r)$.
12.9 Example (Compact differentiable manifolds)
Let $D \in \{1, \ldots, d - 1\}$ and $M$ be a compact D-dimensional $C^1$-submanifold of $\mathbb{R}^d$. Then $M$ is regular of dimension D. To prove this we will show that every point in $M$ has a regular neighbourhood of dimension D in $M$. The claim then follows from Lemma 12.5. Let $x \in M$ be arbitrary. By Federer (1969, p. 231, 3.1.19) there exists an open neighbourhood $U$ of $x$, an open set $V$ in $\mathbb{R}^d$, a D-dimensional vector subspace $W$ of $\mathbb{R}^d$, and a $C^1$-diffeomorphism $f : U \to V$ with $f(M \cap U) = W \cap V$. Let $r > 0$ be such that $B(f(x), r) \subset V$. By Example 12.7 the set $B(f(x), r) \cap W$ is regular of dimension D, $g = f^{-1}$ is bi-Lipschitz and maps $B(f(x), r) \cap W$ onto $f^{-1}(B(f(x), r)) \cap M$. By Lemma 12.6 $f^{-1}(B(f(x), r)) \cap M$ is regular of dimension D. Since $f^{-1}(B(f(x), r)) \cap M$ is obviously a neighbourhood of $x$ in $M$ the argument is finished.
12.10 Example (Self-similar sets)
Let $S_1, \ldots, S_N : \mathbb{R}^d \to \mathbb{R}^d$ be contracting similarity transformations with scaling numbers $s_1, \ldots, s_N \in (0, 1)$, i.e., for $i \in \{1, \ldots, N\}$ and all $x, y \in \mathbb{R}^d$,
\[ \|S_i x - S_i y\| = s_i \|x - y\|. \]
It was shown by Hutchinson (1981) that there is a unique non-empty compact set $A$ in $\mathbb{R}^d$ with $A = S_1(A) \cup \ldots \cup S_N(A)$. This set is called the self-similar set corresponding to $S_1, \ldots, S_N$. It is easy to see that there is a unique real number $D \ge 0$ with
\[ \sum_{i=1}^N s_i^D = 1, \]
the similarity dimension of $(S_1, \ldots, S_N)$. $(S_1, \ldots, S_N)$ is said to satisfy the open set condition (OSC) if there exists a nonempty open set $U \subset \mathbb{R}^d$ with $S_i(U) \subset U$ and $S_i(U) \cap S_j(U) = \emptyset$ for $i \ne j$, $i, j = 1, \ldots, N$. Recovering an older result of Moran (1946), Hutchinson (1981, Theorem (1)) proved that $0 < \mathcal{H}^D(A) < \infty$ if $(S_1, \ldots, S_N)$ satisfies the open set condition. Hutchinson (1981, p. 737/738) also showed that there are constants $c_1, c_2 > 0$ and $r_0 > 0$ with
\[ c_1 r^D \le \mathcal{H}^D(A \cap \mathring{B}(x, r)) \le c_2 r^D \]
for all $x \in A$ and all $r \in (0, r_0)$. In our language this means that $A$ is regular of dimension D. Classical examples of self-similar sets satisfying the OSC are the Cantor set, the Sierpinski gasket, and the von Koch curve.
The Cantor set is the self-similar set corresponding to the two contractions $S_1, S_2 : \mathbb{R} \to \mathbb{R}$ with $S_1(x) = \tfrac13 x$ and $S_2(x) = \tfrac13 x + \tfrac23$.
The Sierpinski gasket corresponds to the three contractions $S_1, S_2, S_3 : \mathbb{R}^2 \to \mathbb{R}^2$ with $S_1 x = \tfrac12 x$, $S_2 x = \tfrac12 x + (\tfrac12, 0)$ and $S_3 x = \tfrac12 x + (\tfrac14, \tfrac{\sqrt3}{4})$.
The von Koch curve is generated by the four contractions $S_1, S_2, S_3, S_4 : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $S_1(x) = \tfrac13 x$, $S_2(x) = \tfrac13 D_{\pi/3}\, x + (\tfrac13, 0)$, $S_3(x) = \tfrac13 D_{-\pi/3}\, x + (\tfrac12, \tfrac{\sqrt3}{6})$, $S_4(x) = \tfrac13 x + (\tfrac23, 0)$, where $D_\theta$ is the counter-clockwise rotation about the origin with angle $\theta$.
By Hutchinson's result these sets are regular of dimension D, for $D = \frac{\log 2}{\log 3}$, $D = \frac{\log 3}{\log 2}$ and $D = \frac{\log 4}{\log 3}$ respectively. For more information about these and other self-similar sets we refer the reader to the book of Falconer (1990).
Before we deal with the quantization of regular measures of dimension D let us mention the following characterization of these measures.
12.11 Proposition
Let $\mu$ be a finite measure on $\mathbb{R}^d$ with compact support $K$. Then the following properties are equivalent:
(i) $\mu$ is regular of dimension D.
(ii) $K$ is regular of dimension D and there exists a $c > 0$ with $\tfrac{1}{c} \mathcal{H}^D_{|K} \le \mu \le c\, \mathcal{H}^D_{|K}$.
Proof. That (i) implies (ii) is an immediate consequence of a proposition in Falconer (1990, Proposition 4.9) and the definition of regularity of dimension D. The converse implication (ii) $\Rightarrow$ (i) follows from the definition of regularity of dimension D. □
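The similarity dimension of Example 12.10 is defined implicitly by $\sum_{i=1}^N s_i^D = 1$. Since the left-hand side is strictly decreasing in $D$, the equation can be solved numerically by bisection. The sketch below does this and reproduces the three values quoted for the Cantor set, the Sierpinski gasket and the von Koch curve. The function name and the tolerance are assumptions made for this illustration.

```python
import math

def similarity_dimension(ratios, tol=1e-12):
    """Solve sum(s**D for s in ratios) = 1 for D >= 0 by bisection.
    Every scaling number is assumed to lie in (0, 1)."""
    if not ratios or any(not (0 < s < 1) for s in ratios):
        raise ValueError("scaling numbers must lie in (0, 1)")

    def excess(D):
        # strictly decreasing in D, equal to len(ratios) - 1 >= 0 at D = 0
        return sum(s**D for s in ratios) - 1.0

    lo, hi = 0.0, 1.0
    while excess(hi) > 0:          # enlarge the bracket until the sum drops below 1
        hi *= 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    print("Cantor set       :", similarity_dimension([1/3, 1/3]))
    print("Sierpinski gasket:", similarity_dimension([1/2, 1/2, 1/2]))
    print("von Koch curve   :", similarity_dimension([1/3] * 4))
```

The same routine also handles unequal scaling numbers, for which no closed form is available in general.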
12.2 Asymptotics for the quantization error
12.12 Proposition
Let $P$ be a probability on $\mathbb{R}^d$. Assume that there is a $c > 0$ and an $r_0 > 0$ such that
\[ P(\mathring{B}(a, r)) \le c\, r^D \tag{12.11} \]
for all $a \in \mathrm{supp}(P)$ and all $r \in (0, r_0)$. Then there exists a constant $c' > 0$ with
\[ \int_B \|x - a\|\, dP(x) \ge c' P(B)^{1 + \frac{1}{D}} \tag{12.12} \]
for all $a \in \mathbb{R}^d$ and all Borel sets $B \subset \mathbb{R}^d$.
Proof. By Lemma 12.3 there is a $\tilde{c} > 0$ with $P(\mathring{B}(a, r)) \le \tilde{c}\, r^D$ for all $a \in \mathbb{R}^d$ and all $r > 0$. Let $a \in \mathbb{R}^d$ and let $B$ be a Borel subset of $\mathbb{R}^d$. If $P(B) = 0$ then the conclusion (12.12) obviously holds. Let us assume $P(B) > 0$ and set
\[ r_B = \inf\{r > 0 : P(\mathring{B}(a, r)) \ge \tfrac12 P(B)\}. \]
Since $\lim_{r \to \infty} P(\mathring{B}(a, r)) = 1 \ge P(B)$ there is an $r > 0$ with $P(\mathring{B}(a, r)) \ge \tfrac12 P(B)$. Hence, $r_B < \infty$. For $r > r_B$ we have
\[ \tilde{c}\, r^D \ge P(\mathring{B}(a, r)) \ge \tfrac12 P(B) \]
which implies
\[ \tilde{c}\, r_B^D \ge \tfrac12 P(B). \tag{12.13} \]
Since $P(B) > 0$ there is an $r > 0$ with
\[ P(\mathring{B}(a, r)) \le \tilde{c}\, r^D < \tfrac12 P(B), \]
so that $r_B > 0$. For $r < r_B$ we have
\[ P(\mathring{B}(a, r)) < \tfrac12 P(B). \]
If $(r_n)_{n \in \mathbb{N}}$ is any increasing sequence with $r_n < r_B$ and $\lim r_n = r_B$ we deduce from $\mathring{B}(a, r_B) = \bigcup_{n \in \mathbb{N}} \mathring{B}(a, r_n)$ that
\[ P(\mathring{B}(a, r_B)) = \lim_{n \to \infty} P(\mathring{B}(a, r_n)) \le \tfrac12 P(B). \tag{12.14} \]
Using (12.14) and (12.13) we get
\begin{align*}
\int_B \|x - a\|\, dP(x) &\ge \int_{B \setminus \mathring{B}(a, r_B)} \|x - a\|\, dP(x) \ge r_B P(B \setminus \mathring{B}(a, r_B)) \\
&\ge r_B \big(P(B) - P(\mathring{B}(a, r_B))\big) \ge \tfrac12 r_B P(B) \ge \tfrac12 \Big(\frac{1}{2\tilde{c}}\Big)^{\frac{1}{D}} P(B)^{1 + \frac{1}{D}}.
\end{align*}
□
12.13 Corollary
Let $P$ be a probability on $\mathbb{R}^d$. Then the following conditions are equivalent:
(i) There exists a $c > 0$ with
\[ P(\mathring{B}(a, r)) \le c\, r^D \]
for all $r > 0$ and all $a \in \mathbb{R}^d$.
(ii) There exists a $c' > 0$ with
\[ \int_{\mathring{B}(a, r)} \|x - a\|\, dP(x) \ge c' P(\mathring{B}(a, r))^{1 + \frac{1}{D}} \]
for all $r > 0$ and all $a \in \mathbb{R}^d$.
(iii) There exists a $c'' > 0$ with
\[ \int_B \|x - a\|\, dP(x) \ge c'' P(B)^{1 + \frac{1}{D}} \]
for all $a \in \mathbb{R}^d$ and all Borel sets $B \subset \mathbb{R}^d$.
Proof. That (i) implies (iii) is Proposition 12.12 and that (iii) implies (ii) follows by setting $B = \mathring{B}(a, r)$. It remains to show that (ii) implies (i). Obviously, (ii) yields
\[ r P(\mathring{B}(a, r)) \ge \int_{\mathring{B}(a, r)} \|x - a\|\, dP(x) \ge c' P(\mathring{B}(a, r))^{1 + \frac{1}{D}} \]
and, hence,
\[ \Big(\frac{1}{c'}\Big)^D r^D \ge P(\mathring{B}(a, r)). \]
□
12.14 C o r o l l a r y Let P be a probability on R d. Suppose that there is a c > 0 and an ro > 0 with o
P ( B ( a , r)) < cr o
for all a E supp(P) and a/l r C (0, r0). Then there is a c o n s t a n t b > 0 such that, for every e - p ~ k i n g {B1,... ,B~} in R d with P(Rd\ ~ BO = 0 o~d an ~ , , . . . , ~ e R d i=1
Proof Without loss of generality D > 0. Set p = 1 + -~ and q = 1 + D. Then we have ~+~1t = 1 and p > 1. H61der's inequality yields D
S := ~
1
IIx - adl d P ( x )
I
= n~
IIx - a~ll d P ( x
This implies Sq < n
x - aill d#(x \i:1~,
Using Proposition 12.12 we get for the constant c' > 0 of that proposition D
S = ~
[Ix - a i l [ d f ( x
Thus, the corollary holds, if we set b
>
=
c')D--O~P(Bi) = (c') ~-~.
(6J)D.
[]
12.15 Proposition
Let $P$ be a probability on $\mathbb{R}^d$. Suppose that there are constants $c > 0$ and $r_0 > 0$ with $P(\mathring{B}(a, r)) \le c\, r^D$ for every $a \in \mathrm{supp}(P)$ and every $r \in (0, r_0)$. Then
\[ \liminf_{n \to \infty} n\, e_{n,1}(P)^D > 0. \]
Proof. Let $\alpha_n \in C_{n,1}$ and let $\{A_a : a \in \alpha_n\}$ be a Voronoi partition of $\mathbb{R}^d$ with respect to $\alpha_n$. By Corollary 12.14 we have
\[ n\, e_{n,1}^D = n \Big( \sum_{a \in \alpha_n} \int_{A_a} \|x - a\|\, dP(x) \Big)^D \ge b > 0, \]
where $b$ is as in Corollary 12.14. This implies the proposition. □
12.16 Corollary
Let $P$ be a probability on $\mathbb{R}^d$. Then $\dim_H(P) \le \underline{D}_r(P)$ for all $r \ge 1$.
Proof. By Corollary 11.4 it suffices to prove $\dim_H(P) \le \underline{D}_1(P)$. If $\dim_H(P) = 0$ then there is nothing to show. So let $\dim_H(P) > 0$ and let $t$ with $0 < t < \dim_H(P)$ be arbitrary. By Falconer (1997, Prop. 10.3) we have
\[ \dim_H(P) = \inf\Big\{s \in \mathbb{R} : \liminf_{r \to 0} \frac{\log P(\mathring{B}(x, r))}{\log r} \le s \text{ for } P\text{-a.e. } x\Big\}. \]
This implies
\[ P\Big(\Big\{x \in \mathbb{R}^d : \liminf_{r \to 0} \frac{\log P(\mathring{B}(x, r))}{\log r} > t\Big\}\Big) > 0 \]
and, hence,
\[ P(\{x \in \mathbb{R}^d : \exists r_x > 0\ \forall r \le r_x : P(\mathring{B}(x, r)) \le r^t\}) > 0. \]
Thus, there exists a compact set $K \subset \mathbb{R}^d$ with $P(K) > 0$ and an $r_0 > 0$ such that $P(\mathring{B}(x, r)) \le r^t$ for all $x \in K$ and all $r \le r_0$. Set $Q = \frac{1}{P(K)} P_{|K}$. Then it follows that
\[ Q(\mathring{B}(x, r)) \le \frac{1}{P(K)} r^t \]
for all $x \in \mathrm{supp}(Q)$ and all $r \le r_0$ and Proposition 12.15 yields $\liminf_{n\to\infty} n\, e_{n,1}(Q)^t > 0$. Since $e_{n,1}(Q) \le \frac{1}{P(K)} e_{n,1}(P)$ this leads to $\liminf_{n\to\infty} n\, e_{n,1}(P)^t > 0$. Using Proposition 11.3 we deduce
\[ t \le \underline{D}_1. \]
Since $t < \dim_H(P)$ was arbitrary this implies $\dim_H(P) \le \underline{D}_1$. □
12.17 Proposition
Let $P$ be a probability on $\mathbb{R}^d$ with compact support $K$. Assume that there is a $c > 0$ and an $r_0 > 0$ with
\[ c\, r^D \le P(\mathring{B}(a, r)) \]
for every $a \in K$ and every $r \in (0, r_0)$. Then
\[ \limsup_{n \to \infty} n\, e_{n,\infty}(P)^D < \infty. \]
Proof. If there is an $n_0 \in \mathbb{N}$ with $e_{n_0,\infty} = 0$ then $e_{n,\infty} = 0$ for all $n \ge n_0$ and the assertion of the proposition is obvious. So let us assume that $e_{n,\infty} > 0$ for all $n \in \mathbb{N}$. Since $K$ is compact there exists a finite set $\alpha_n \subset K$ of maximum cardinality satisfying $\|x - y\| \ge e_{n,\infty}$ for all $x, y \in \alpha_n$ with $x \ne y$.
We will show that $|\alpha_n| > n$. Assume the contrary. Then we know that
\[ e_{n,\infty} \le \sup_{x \in K} d_{\alpha_n}(x). \]
Hence there exists a $y \in K$ with $\|y - a\| \ge e_{n,\infty}$ for all $a \in \alpha_n$, which contradicts the maximality of $\alpha_n$. For $x, y \in \alpha_n$ with $x \ne y$ we have
\[ \mathring{B}(x, \tfrac12 e_{n,\infty}) \cap \mathring{B}(y, \tfrac12 e_{n,\infty}) = \emptyset, \]
hence
\[ 1 = P(K) \ge P\Big( \bigcup_{a \in \alpha_n} \mathring{B}(a, \tfrac12 e_{n,\infty}) \Big) = \sum_{a \in \alpha_n} P(\mathring{B}(a, \tfrac12 e_{n,\infty})). \]
Due to $\lim_{n \to \infty} e_{n,\infty} = 0$ there is an $n_1 \in \mathbb{N}$ with
\[ \tfrac12 e_{n,\infty} < r_0 \]
for all $n \ge n_1$. Thus, for $n \ge n_1$,
\[ 1 \ge \sum_{a \in \alpha_n} c\,(\tfrac12 e_{n,\infty})^D = |\alpha_n|\, c\,(\tfrac12 e_{n,\infty})^D > n \frac{c}{2^D} e_{n,\infty}^D \]
and, therefore,
\[ n\, e_{n,\infty}^D \le \frac{2^D}{c}. \]
This proves the proposition. □
Now we can formulate and prove the main result concerning the asymptotics of quantization errors for regular probabilities.
12.18 Theorem
Let $P$ be a regular probability of dimension D on $\mathbb{R}^d$. Then, for $1 \le r \le \infty$,
\[ 0 < \liminf_{n \to \infty} n\, e_{n,r}(P)^D \le \limsup_{n \to \infty} n\, e_{n,r}(P)^D < \infty. \tag{12.15} \]
In particular the quantization dimension $D_r(P)$ agrees with D which is also the Hausdorff dimension of the support of $P$.
Proof. The inequality (12.15) follows immediately from Propositions 12.15, 12.17 and Lemma 10.1 (a). The remaining statements follow from Corollary 11.4 and Proposition 12.11. □
12.19 Remark
It remains an open question for which regular probabilities $P$ of dimension D the limit
\[ \lim_{n \to \infty} n\, e_{n,r}^D \]
exists in $(0, \infty)$. Recall from (6.4) that in this situation, for $1 \le r < \infty$,
\[ Q_r(P) = \lim_{n \to \infty} n^{\frac{r}{D}} V_{n,r}(P) = \Big( \lim_{n \to \infty} n\, e_{n,r}(P)^D \Big)^{\frac{r}{D}} \]
is called the r-th quantization coefficient of $P$. It follows from Theorem 6.2 that for the normalized volume measure $P$ of a convex compact set the r-th quantization coefficient exists. We conjecture that the same is true for the normalized surface measures on convex compact sets and compact $C^1$-manifolds. For the natural Hausdorff measure on a self-similar set the quantization coefficients exist in some cases while in other cases (like the classical Cantor set) they need not exist. We will discuss measures on self-similar sets in Section 14.
Notes
The concept of regularity for sets and measures can be found in several books on geometric measure theory (see, for instance, David and Semmes (1993, 1997) and Mattila (1995)). Since there are several different notions of regularity for sets and measures the above regularity is sometimes called Ahlfors-David regularity (see Mattila, 1995, p. 92). The elementary results on regular sets of dimension D (Lemmas 12.4-12.6) and the results concerning the regularity of convex sets and their boundaries as well as that of compact $C^1$-manifolds are probably well-known. We just could not find an explicit reference. To our knowledge the results concerning the quantization of regular sets and measures of dimension D as stated above are new. A good introduction to the theory of convex sets is Webster (1994). The basic theory concerning self-similar sets as well as many examples can be found in Barnsley (1988). For the canonical normalized Hausdorff measure $P$ on a self-similar set with OSC the inequalities in (12.15) were first proved in Graf and Luschgy (1996). After this book had essentially been finished Pötzelberger (1998a) gave different conditions for a probability $P$ to ensure that $0 < \liminf_{n\to\infty} n\, e_{n,2}(P)^D$ or $\limsup_{n\to\infty} n\, e_{n,2}(P)^D < \infty$ or $\lim_{n\to\infty} n\, e_{n,2}(P)^D$ exists (for the $l_2$-norm), where D is suitably chosen.
13 Rectifiable curves
Here we consider the length measures on rectifiable curves. These measures can be obtained by restricting the one-dimensional Hausdorff measure to the given rectifiable curve. In this way we get an elementary class of singular measures of quantization dimension 1 for which the quantization coefficients exist and will be calculated. Nevertheless there are simple examples that show that the length measure on a rectifiable curve need not be regular of dimension 1 (see below). In this section $\|\cdot\|$ will always denote a Euclidean norm on $\mathbb{R}^d$. First we will collect some basic results about rectifiable curves.
13.1 Definition
Let $a, b \in \mathbb{R}$ with $a < b$. A curve (more exactly, a Jordan curve) $\Gamma$ is the image of a continuous injection $\gamma : [a, b] \to \mathbb{R}^d$. $\gamma$ is called a parametrization of $\Gamma$. A curve is called rectifiable if
\[ L = L(\Gamma) = \sup\Big\{ \sum_{i=1}^n \|\gamma(t_i) - \gamma(t_{i-1})\| : n \in \mathbb{N},\ a = t_0 < \ldots < t_n = b \Big\} < \infty. \]
$L$ is called the length of the curve $\Gamma$.
13.2 Lemma
If $\Gamma \subset \mathbb{R}^d$ is a curve then $\mathcal{H}^1(\Gamma) = L(\Gamma)$, in particular $L(\Gamma)$ does not depend on the parametrization $\gamma$.
Proof. See Falconer (1985, p. 29, Lemma 3.2).
[]
13.3 Definition
Let $\Gamma$ be a rectifiable curve of length $L$. A continuous injection $\gamma: [0,L] \to \Gamma$ is called a parametrization by arc length if $L(\gamma([0,t])) = t$ for all $t \in [0,L]$.

13.4 Remark
Every rectifiable curve admits a parametrization by arc length (see Falconer, 1985, p. 29).

13.5 Lemma
Let $\Gamma$ be a rectifiable curve of length $L$ and $\gamma: [0,L] \to \Gamma$ a parametrization by arc length. Let $\mu$ be 1-dimensional Lebesgue measure restricted to $[0,L]$. Then

(13.1)  $\|\gamma(t) - \gamma(s)\| \le |t - s|$ for all $s, t \in [0,L]$

and $\mathcal{H}^1_{|\Gamma} = \mu \circ \gamma^{-1}$.

Proof
The lemma follows immediately from Falconer (1985, p. 29, (3.2) and p. 30, Corollary 3.3). □
13.6 Lemma
Let $\Gamma$ be a rectifiable curve, $x \in \Gamma$ and $r \in (0, \frac{1}{2}\operatorname{diam}\Gamma)$. Then

(13.2)  $\mathcal{H}^1(\Gamma \cap \mathring{B}(x,r)) \ge r.$

Proof
This follows from Falconer (1985, p. 30, Lemma 3.4). □
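Inequality (13.2) is one half of the condition for regularity of dimension 1 of a rectifiable curve. That, in general, a rectifiable curve need not be regular of dimension 1 is shown by the following example.

13.7 Example
Let $m \in \mathbb{N}$, $m \ge 1$. Then there exists a continuous injection $\gamma_m: [\frac{1}{m+1}, \frac{1}{m}] \to \mathbb{R}^2$ with
$$\gamma_m\big(\tfrac{1}{m+1}\big) = \big(\tfrac{1}{m+1}, 0\big), \quad \gamma_m\big(\tfrac{1}{m}\big) = \big(\tfrac{1}{m}, 0\big), \quad l_m := L\big(\gamma_m([\tfrac{1}{m+1}, \tfrac{1}{m}])\big) = \max\Big(\frac{1}{m} - \frac{1}{m+1},\ \frac{1}{\sqrt{m}} - \frac{1}{\sqrt{m+1}}\Big).$$
Define $\gamma: [0,1] \to \mathbb{R}^2$ by
$$\gamma(t) = \begin{cases} (0,0), & t = 0, \\ \gamma_m(t), & \text{if } m \in \mathbb{N} \text{ with } t \in [\frac{1}{m+1}, \frac{1}{m}]. \end{cases}$$
Then $\Gamma = \gamma([0,1])$ is a rectifiable curve since
$$L(\Gamma) \le \sum_{m=1}^{\infty} l_m < \infty.$$
Moreover, for $m \ge 2$,
$$\mathcal{H}^1\big(\mathring{B}(0, \tfrac{1}{m}) \cap \Gamma\big) = \sum_{k=m}^{\infty} l_k.$$
Since there exists an $m_0 \in \mathbb{N}$ with
$$\frac{1}{\sqrt{m}} - \frac{1}{\sqrt{m+1}} \ge \frac{1}{m} - \frac{1}{m+1}$$
for $m \ge m_0$, we have, for these $m$,
$$\sum_{k=m}^{\infty} l_k = \frac{1}{\sqrt{m}},$$
so that
$$\mathcal{H}^1\big(\mathring{B}(0, \tfrac{1}{m}) \cap \Gamma\big) = \frac{1}{\sqrt{m}}.$$
This shows that
$$\sup_{r > 0} \frac{1}{r}\,\mathcal{H}^1\big(\mathring{B}(0,r) \cap \Gamma\big) = +\infty$$
and $\Gamma$ is not regular of dimension 1.

Before we come to the quantization of rectifiable curves we will prove a result concerning the distance of the $n$-optimal set of centers of order $r$ from the support of the probability in question.

13.8 Lemma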
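Let $P$ be a probability on $\mathbb{R}^d$ with compact support $K$. Let $1 \le r < \infty$ and let $\alpha_n$ be an $n$-optimal set of centers for $P$ of order $r$. Define
$$\delta_n = \max_{a \in \alpha_n}\ \max_{x \in W(a|\alpha_n) \cap K} \|x - a\| = \|d_{\alpha_n}\|_\infty.$$
Then

(13.3)  $\dfrac{\delta_n}{2}\, \min_{x \in K} P\big(\mathring{B}(x, \tfrac{1}{2}\delta_n)\big)^{1/r} \le e_{n,r}(P).$

Proof
Let $a \in \alpha_n$ satisfy $\delta_n = \max_{x \in W(a|\alpha_n) \cap K} \|x - a\|$. Then there exists an $x \in W(a|\alpha_n) \cap K$ with $\delta_n = \|x - a\|$. For every $b \in \alpha_n$ we have
$$\|x - b\| \ge \|x - a\|.$$
For every $y \in \mathring{B}(x, \frac{1}{2}\delta_n)$ and every $b \in \alpha_n$ this yields
$$\|y - b\| \ge \|x - b\| - \|x - y\| \ge \|x - a\| - \|x - y\| = \delta_n - \|x - y\| \ge \tfrac{1}{2}\delta_n.$$
Using this inequality we deduce
$$e_{n,r}^r = V_{n,r} = \int d_{\alpha_n}(z)^r\, dP(z) \ge \int_{K \cap \mathring{B}(x, \frac{1}{2}\delta_n)} d_{\alpha_n}(z)^r\, dP(z) \ge \big(\tfrac{1}{2}\delta_n\big)^r P\big(K \cap \mathring{B}(x, \tfrac{1}{2}\delta_n)\big)$$
and the lemma is proved. □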
13.9 Corollary
Let $P$ be a probability on $\mathbb{R}^d$ with compact support $K$. Let $n \le |K|$, $1 \le r < \infty$ and let $\alpha_n$ be an $n$-optimal set of centers for $P$ of order $r$. Then, for every $a \in \alpha_n$,

(13.4)  $\dfrac{1}{2}\, d(a,K)\, \min_{y \in K} P\big(\mathring{B}(y, \tfrac{1}{2} d(a,K))\big)^{1/r} \le e_{n,r}(P).$

Proof
The corollary follows from Lemma 13.8 if one observes that
$$\delta_n \ge d(a,K)$$
for all $a \in \alpha_n$, since $W(a|\alpha_n) \cap K \ne \emptyset$ for all $a \in \alpha_n$ by (4.1). □
First we give a quantization result for line segments. For $x, y \in \mathbb{R}^d$ let $[x,y]$ be the line segment from $x$ to $y$, i.e. $[x,y] = \{(1-t)x + ty : t \in [0,1]\}$. It is a well-known fact that for $f: [0,1] \to \mathbb{R}^d$ with $f(t) = (1-t)x + ty$ we have

(13.5)  $\dfrac{1}{\|x - y\|}\, \mathcal{H}^1_{|[x,y]} = U([0,1])^f$

(see Lemma 13.5).

13.10 Lemma
Let $x, y \in \mathbb{R}^d$ with $x \ne y$ be given. Let $P = \frac{1}{\|x-y\|}\mathcal{H}^1_{|[x,y]}$. Then, for $1 \le r < \infty$ and $n \ge 1$,

(13.6)  $e_{n,r}(P) = \Big(\dfrac{1}{1+r}\Big)^{1/r} \dfrac{\|x - y\|}{2n}.$
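As a quick numerical sanity check of (13.6), the following Python sketch approximates the $r$-th power distortion of the uniform distribution on $[0,1]$ for the $n$ midpoints $(2k-1)/(2n)$ and compares it with $1/((1+r)(2n)^r)$, the value that the proof quotes from Example 5.5. The function names and the grid size are illustrative assumptions, not part of the text; multiplying the result by $\|x-y\|^r$ gives the corresponding value for the segment $[x,y]$.

```python
def midpoint_centers(n):
    """Midpoints of the n equal subintervals of [0, 1]."""
    return [(2 * k - 1) / (2 * n) for k in range(1, n + 1)]


def distortion_uniform(centers, r, grid=20000):
    """Midpoint-rule approximation of the integral of min_a |t - a|^r over [0, 1]."""
    total = 0.0
    for j in range(grid):
        t = (j + 0.5) / grid
        total += min(abs(t - a) for a in centers) ** r
    return total / grid


def closed_form(n, r):
    """Value 1 / ((1 + r) (2n)^r) of V_{n,r}(U([0, 1])) quoted from Example 5.5."""
    return 1.0 / ((1 + r) * (2 * n) ** r)


if __name__ == "__main__":
    for n in (1, 2, 5):
        for r in (1, 2, 3):
            approx = distortion_uniform(midpoint_centers(n), r)
            print(n, r, approx, closed_form(n, r))
```

Taking the $r$-th root of `closed_form(n, r)` reproduces the factor $(1/(1+r))^{1/r}/(2n)$ in (13.6).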
Proof
By the remark preceding the lemma we have $P = U([0,1])^f = U([0,1]) \circ f^{-1}$.

"$\le$": Let $\alpha \in C_{n,r}(U([0,1]))$ and $\beta = f(\alpha)$. Then
$$V_{n,r}(P) \le \int \min_{b \in \beta} \|z - b\|^r\, dP(z) = \int \min_{a \in \alpha} \|f(t) - f(a)\|^r\, dU([0,1])(t).$$
Since $\|f(t) - f(a)\| = |t - a|\,\|x - y\|$ we obtain
$$V_{n,r}(P) \le \|x - y\|^r \int \min_{a \in \alpha} |t - a|^r\, dU([0,1])(t) = \|x - y\|^r\, V_{n,r}(U([0,1])).$$
According to Example 5.5 we have
$$V_{n,r}(U([0,1])) = \frac{1}{(1+r)(2n)^r}.$$
This yields
$$e_{n,r}(P) \le \Big(\frac{1}{1+r}\Big)^{1/r} \frac{\|x - y\|}{2n}.$$

"$\ge$": Let $\beta \in C_{n,r}(P)$. Since $\operatorname{supp}(P) = [x,y]$ is convex and since the underlying norm is Euclidean, Remark 4.6 yields $\beta \subset [x,y]$. Let $\alpha \subset [0,1]$ equal $f^{-1}(\beta)$. Then $\alpha$ has $n$ points and we get
$$V_{n,r}(P) = \int \min_{b \in \beta} \|z - b\|^r\, dP(z) = \int \min_{a \in \alpha} \|f(t) - f(a)\|^r\, dU([0,1])(t) = \|x - y\|^r \int \min_{a \in \alpha} |t - a|^r\, dU([0,1])(t)$$
$$\ge \|x - y\|^r\, V_{n,r}(U([0,1])) = \|x - y\|^r \frac{1}{(1+r)(2n)^r}$$
and, hence,
$$e_{n,r}(P) \ge \Big(\frac{1}{1+r}\Big)^{1/r} \frac{\|x - y\|}{2n}.$$
□

Now we will give a first lower bound for the quantization errors for a normalized one-dimensional Hausdorff measure on a rectifiable curve.

13.11 Lemma
Let $\gamma: [a,b] \to \mathbb{R}^d$ be a continuous injection which is a parametrization of the rectifiable curve $\Gamma$ with length $L > 0$. Let $P = \frac{1}{L}\mathcal{H}^1_{|\Gamma}$. Then, for $1 \le r < \infty$,

(13.7)  $e_{n,r}(P) \ge \Big(\dfrac{1}{(1+r)L}\Big)^{1/r} \dfrac{\|\gamma(b) - \gamma(a)\|^{1 + 1/r}}{2n}.$
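Proof
Let $G = \{(1-t)\gamma(a) + t\gamma(b) : t \in \mathbb{R}\}$ be the line through $\gamma(a)$ and $\gamma(b)$. Let $p_G$ be the orthogonal projection onto $G$. By $[\gamma(a), \gamma(b)] = \{(1-t)\gamma(a) + t\gamma(b) : t \in [0,1]\}$ we denote the line segment from $\gamma(a)$ to $\gamma(b)$. Let $Q$ denote the image of $P$ with respect to $p_G$. First we will show that

(13.8)  $\mathcal{H}^1_{|p_G(\Gamma)} \le \mathcal{H}^1_{|\Gamma} \circ p_G^{-1}.$

Let $B$ be a Borel set in $\mathbb{R}^d$. Then using the fact that $\|p_G(x) - p_G(y)\| \le \|x - y\|$ for all $x, y \in \mathbb{R}^d$ and Falconer (1985, p. 27, Proposition 2.2) we get
$$\mathcal{H}^1_{|\Gamma}(p_G^{-1}(B)) = \mathcal{H}^1(p_G^{-1}(B) \cap \Gamma) \ge \mathcal{H}^1(p_G(p_G^{-1}(B) \cap \Gamma)) = \mathcal{H}^1(B \cap p_G(\Gamma)).$$
Thus, (13.8) is proved. Now let $\alpha \in C_{n,r}(P)$. Using (13.8), the fact that $[\gamma(a), \gamma(b)] \subset p_G(\Gamma)$, and Lemma 13.10 we obtain
$$e_{n,r}^r = V_{n,r} = \int_\Gamma d(x,\alpha)^r\, dP(x) = \frac{1}{L}\int_\Gamma d(x,\alpha)^r\, d\mathcal{H}^1_{|\Gamma}(x)$$
$$\ge \frac{1}{L}\int_\Gamma d(p_G(x), p_G(\alpha))^r\, d\mathcal{H}^1_{|\Gamma}(x) = \frac{1}{L}\int_{p_G(\Gamma)} d(y, p_G(\alpha))^r\, d\big(\mathcal{H}^1_{|\Gamma} \circ p_G^{-1}\big)(y)$$
$$\ge \frac{1}{L}\int_{p_G(\Gamma)} d(y, p_G(\alpha))^r\, d\mathcal{H}^1_{|p_G(\Gamma)}(y) \ge \frac{1}{L}\int_{[\gamma(a),\gamma(b)]} d(y, p_G(\alpha))^r\, d\mathcal{H}^1(y)$$
$$\ge \frac{1}{L}\,\|\gamma(b) - \gamma(a)\|\; V_{n,r}\Big(\frac{1}{\|\gamma(b) - \gamma(a)\|}\mathcal{H}^1_{|[\gamma(a),\gamma(b)]}\Big) = \frac{1}{L}\,\frac{\|\gamma(b) - \gamma(a)\|}{1+r}\Big(\frac{\|\gamma(b) - \gamma(a)\|}{2n}\Big)^r.$$
Thus, the lemma is proved. □

13.12 Theorem
Let $\Gamma \subset \mathbb{R}^d$ be a rectifiable curve with length $L > 0$. Set $P = \frac{1}{L}\mathcal{H}^1_{|\Gamma}$. Then

(i) for $1 \le r < \infty$, $\lim_{n\to\infty} n e_{n,r}(P) = Q_r([0,1])^{1/r}\, \mathcal{H}^1(\Gamma) = \Big(\dfrac{1}{1+r}\Big)^{1/r} \dfrac{L}{2}$,

(ii) $\lim_{n\to\infty} n e_{n,\infty}(P) = Q_\infty([0,1])\, \mathcal{H}^1(\Gamma) = \dfrac{L}{2}$.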
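Proof
Let $\gamma: [0,L] \to \Gamma$ be a parametrization of $\Gamma$ by arc length. First we will show (i).

"$\ge$": Let $0 = t_0 < t_1 < \dots < t_m = L$ and choose $t_i^-, t_i^+$ such that $t_0^+ = t_0 = 0$, $t_m^- = t_m = L$ and, for $i \in \{1, \dots, m-1\}$,
$$t_{i-1}^+ < t_i^- < t_i < t_i^+ < t_{i+1}^-.$$
Let $\Gamma_i = \gamma([t_{i-1}^+, t_i^-])$ for $i = 1, \dots, m$. Then we know that $\Gamma_i \cap \Gamma_j = \emptyset$ for $i \ne j$. Let $\delta = \min\{d(\Gamma_i, \Gamma_j) : i \ne j\}$, where $d(B,C) = \inf\{d(b,C) : b \in B\}$ for $B, C \subset \mathbb{R}^d$. Then we have $\delta > 0$ and $\delta < \operatorname{diam}(\Gamma)$. By Lemma 13.6 this implies
$$\mathcal{H}^1\big(\Gamma \cap \mathring{B}(x, \tfrac{\delta}{4})\big) \ge \frac{\delta}{4}$$
for all $x \in \Gamma$, and hence,

(13.9)  $\dfrac{\delta}{4}\, \min_{x \in \Gamma} P\big(\Gamma \cap \mathring{B}(x, \tfrac{\delta}{4})\big)^{1/r} \ge \Big(\dfrac{1}{L}\Big)^{1/r} \Big(\dfrac{\delta}{4}\Big)^{1 + 1/r}.$

Let $\alpha_n$ be an $n$-optimal set of centers of order $r$. For $i = 1, \dots, m$ set
$$\alpha_{n,i} = \{a \in \alpha_n : W(a|\alpha_n) \cap \Gamma_i \ne \emptyset\}.$$
We will show that there is an $n_0 \in \mathbb{N}$ such that

(13.10)  $\alpha_{n,i} \cap \alpha_{n,j} = \emptyset$

for $i \ne j$ and all $n \ge n_0$. Since $\lim_{n\to\infty} e_{n,r} = 0$ there exists an $n_0 \in \mathbb{N}$ with

(13.11)  $e_{n_0,r} < \Big(\dfrac{1}{L}\Big)^{1/r} \Big(\dfrac{\delta}{4}\Big)^{1 + 1/r}.$

Using Lemma 13.8 we have

(13.12)  $\dfrac{\|d_{\alpha_n}\|_\infty}{2}\, \min_{y \in \Gamma} P\big(\mathring{B}(y, \tfrac{1}{2}\|d_{\alpha_n}\|_\infty)\big)^{1/r} \le e_{n,r} \le e_{n_0,r}$

for all $n \ge n_0$. Observing that
$$t \mapsto t\, \min_{y \in \Gamma} P(\mathring{B}(y,t))^{1/r}$$
is non-decreasing and using (13.12), (13.11), and (13.9) we deduce
$$\|d_{\alpha_n}\|_\infty < \frac{\delta}{2}$$
for all $n \ge n_0$. By the definition of $\delta$ and $\alpha_{n,i}$ this yields (13.10). It follows from (13.10) that, for $n \ge n_0$,
$$n^r V_{n,r} = n^r \int_\Gamma d(x,\alpha_n)^r\, dP(x) \ge n^r \sum_{i=1}^m \int_{\Gamma_i} d(x,\alpha_n)^r\, dP(x) = n^r \sum_{i=1}^m \int_{\Gamma_i} d(x,\alpha_{n,i})^r\, dP(x).$$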
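Setting $n_i = |\alpha_{n,i}|$, $L_i = \mathcal{H}^1(\Gamma_i)$, and $P_i = \frac{1}{L_i}\mathcal{H}^1_{|\Gamma_i}$, we get
$$n^r V_{n,r} \ge n^r \sum_{i=1}^m \frac{L_i}{L} \int_{\Gamma_i} d(x,\alpha_{n,i})^r\, dP_i(x) \ge n^r \sum_{i=1}^m \frac{L_i}{L}\, V_{n_i,r}(P_i).$$
By Lemma 13.11 this leads to

(13.13)  $n^r V_{n,r} \ge n^r \sum_{i=1}^m \dfrac{L_i}{L}\, \dfrac{1}{(1+r)L_i}\, \dfrac{\|\gamma(t_{i-1}^+) - \gamma(t_i^-)\|^{1+r}}{(2n_i)^r} = \dfrac{1}{2^r(1+r)L} \sum_{i=1}^m \|\gamma(t_{i-1}^+) - \gamma(t_i^-)\|^{1+r} \Big(\dfrac{n_i}{n}\Big)^{-r}.$

Set $s_i = \|\gamma(t_{i-1}^+) - \gamma(t_i^-)\|^{1+r}$. Since $\sum_{i=1}^m n_i \le n$, it follows from Lemma 6.8 that

(13.14)  $\sum_{i=1}^m s_i \Big(\dfrac{n_i}{n}\Big)^{-r} \ge \Big(\sum_{i=1}^m s_i^{\frac{1}{1+r}}\Big)^{1+r},$

hence
$$n^r V_{n,r} \ge \frac{1}{2^r(1+r)L} \Big(\sum_{i=1}^m \|\gamma(t_{i-1}^+) - \gamma(t_i^-)\|\Big)^{1+r}.$$
For $t_i^- \to t_i$, $t_i^+ \to t_i$ we deduce
$$\liminf_{n\to\infty} n e_{n,r} \ge \frac{1}{2}\Big(\frac{1}{(1+r)L}\Big)^{1/r} \Big(\sum_{i=1}^m \|\gamma(t_{i-1}) - \gamma(t_i)\|\Big)^{\frac{1+r}{r}}.$$
Since $L = \sup\{\sum_{i=1}^m \|\gamma(t_{i-1}) - \gamma(t_i)\| : 0 = t_0 < \dots < t_m = L\}$ we obtain

(13.15)  $\liminf_{n\to\infty} n e_{n,r} \ge \dfrac{1}{2}\Big(\dfrac{1}{1+r}\Big)^{1/r} L.$

"$\le$": Now let $\beta \subset [0,L]$ be of cardinality less than or equal to $n$ and set $\alpha = \gamma(\beta)$. Using Lemma 13.5 we deduce
$$n^r V_{n,r} \le n^r \int \min_{a \in \alpha} \|x - a\|^r\, dP(x) = n^r \int_{[0,L]} \min_{b \in \beta} \|\gamma(t) - \gamma(b)\|^r\, dU([0,L])(t) \le n^r \int_{[0,L]} \min_{b \in \beta} |t - b|^r\, dU([0,L])(t).$$
Since $\beta$ with $|\beta| \le n$ was arbitrary in $[0,L]$ we deduce
$$n^r V_{n,r} \le n^r V_{n,r}(U([0,L])).$$
By Example 5.5 we have
$$V_{n,r}(U([0,L])) = \frac{L^r}{n^r(1+r)2^r}.$$
Thus we get

(13.17)  $\limsup_{n\to\infty} n e_{n,r} \le \Big(\dfrac{1}{1+r}\Big)^{1/r} \dfrac{L}{2}.$

Combining (13.15) and (13.17) yields the first part of the theorem.

Now we will prove part (ii).

"$\le$": Let $\alpha$ be as above. Then using Lemma 13.5 we have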
$$n e_{n,\infty}(P) \le n \max_{x \in \Gamma} \min_{a \in \alpha} \|x - a\| = n \max_{t \in [0,L]} \min_{b \in \beta} \|\gamma(t) - \gamma(b)\| \le n \max_{t \in [0,L]} \min_{b \in \beta} |t - b|.$$
Since $\beta$ with $|\beta| \le n$ was arbitrary in $[0,L]$ we get
$$n e_{n,\infty}(P) \le n e_{n,\infty}(U([0,L])) = \frac{L}{2}.$$
Hence

(13.18)  $\limsup_{n\to\infty} n e_{n,\infty}(P) \le \dfrac{L}{2}.$

"$\ge$": On the other hand (11.1) yields
$$L = \mathcal{H}^1(\Gamma) \le 2 \liminf_{n\to\infty} n e_{n,\infty}(P)$$
so that

(13.19)  $\liminf_{n\to\infty} n e_{n,\infty}(P) \ge \dfrac{L}{2}.$

Combining (13.18) and (13.19) implies part (ii) of the theorem.
□
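The limit in Theorem 13.12 (i) can be illustrated numerically. The sketch below is an assumption-laden example, not part of the text: it takes the half circle $\gamma(t) = (\cos t, \sin t)$, $t \in [0,\pi]$, of length $L = \pi$, places $n$ centers at the midpoints of $n$ equal sub-arcs (these centers are not claimed to be optimal), and compares $n$ times the resulting $L_r$-error with $(1/(1+r))^{1/r} L/2$. The function names and the sample count are hypothetical choices.

```python
import math


def error_on_half_circle(n, r, samples_per_arc=2000):
    """L_r-error of the n sub-arc midpoints for normalized arc length on the half circle.

    By symmetry every sub-arc contributes the same amount, so only one
    sub-arc is integrated (midpoint rule). The Euclidean distance between
    two points of the unit circle whose angles differ by u is 2 sin(|u|/2),
    and on the half circle the nearest center is the midpoint of the own sub-arc.
    """
    half_width = math.pi / (2 * n)
    step = 2 * half_width / samples_per_arc
    total = 0.0
    for j in range(samples_per_arc):
        u = -half_width + (j + 0.5) * step
        total += (2 * math.sin(abs(u) / 2)) ** r
    return (total / samples_per_arc) ** (1 / r)


def limit_constant(length, r):
    """Right-hand side of Theorem 13.12 (i): (1/(1+r))^(1/r) * L / 2."""
    return (1 / (1 + r)) ** (1 / r) * length / 2


if __name__ == "__main__":
    for r in (1, 2, 4):
        print(r, 200 * error_on_half_circle(200, r), limit_constant(math.pi, r))
```

For large $n$ the chord length is close to the arc length, which is why this simple choice of centers already reproduces the constant of the theorem to several digits.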
13.13 Remark
In geometric measure theory a measure $\mu$ on $\mathbb{R}^d$ is called $m$-rectifiable if $m \in \{1, \dots, d\}$, $\mu$ is absolutely continuous with respect to $\mathcal{H}^m$, and $\mu$ is supported on a countable union of $m$-dimensional $C^1$-manifolds. A length measure on a rectifiable curve is 1-rectifiable in this sense. We conjecture that, for all $m$-rectifiable measures $\mu$ on $\mathbb{R}^d$ and all $1 \le r < \infty$,
$$\lim_{n\to\infty} n e_{n,r}(\mu)^m$$
exists in $(0, +\infty)$, i.e., every $m$-rectifiable measure has an $r$-th quantization coefficient.

Notes The basic notions about rectifiable curves can, for instance, be found in the book of Falconer (1985, Chapter 3). A good introduction to the theory of rectifiable measures is given by Mattila (1995, §15-20).
14 Self-similar sets and measures
The class of self-similar measures has been a central object of studies in fractal geometry during the last two decades. Most self-similar measures are singular with respect to Lebesgue measure, but the restriction of Lebesgue measure to the d-dimensional cube is, for instance, also self-similar. In this section we determine the quantization dimension of all self-similar measures that satisfy a certain separation condition. As it turns out many self-similar measures have the property that their quantization dimension of order r is strictly increasing with r. In this respect they are different from all measures considered so far in this volume.
14.1 Basic notions and facts
Let $\|\cdot\|$ denote a norm on $\mathbb{R}^d$. A contracting similarity transformation $S$ on $\mathbb{R}^d$ is a map $S: \mathbb{R}^d \to \mathbb{R}^d$ such that there is a constant $s \in (0,1)$ with

(14.1)  $\|S(x) - S(y)\| = s\,\|x - y\|$

for all $x, y \in \mathbb{R}^d$. $s$ is called the scaling number or contraction number of $S$. In what follows $N$ is always a natural number greater than 1, $(S_1, \dots, S_N)$ is an $N$-tuple of contracting similarity transformations of $\mathbb{R}^d$, and $(s_1, \dots, s_N)$ is the corresponding $N$-tuple of scaling numbers. Hutchinson (1981) has shown that there is always a unique nonempty compact subset $A$ of $\mathbb{R}^d$ with

(14.2)  $A = S_1(A) \cup \dots \cup S_N(A).$

$A$ is called the attractor or invariant set of $(S_1, \dots, S_N)$. Sometimes $A$ is also called the self-similar set corresponding to $(S_1, \dots, S_N)$.

It is easy to see that there exists a unique real number $D \ge 0$ with $\sum_{i=1}^N s_i^D = 1$, the similarity dimension of $(S_1, \dots, S_N)$. Let $(p_1, \dots, p_N)$ be a probability vector, i.e., $p_i \ge 0$ and $\sum_{i=1}^N p_i = 1$. Then Hutchinson (1981) showed that there is a unique probability measure $P$ on $\mathbb{R}^d$ with

(14.3)  $P = \sum_{i=1}^N p_i\, P \circ S_i^{-1}.$

If $p_i > 0$ for all $i \in \{1, \dots, N\}$ then the support of $P$ equals the attractor $A$. $P$ is called the self-similar measure corresponding to $(S_1, \dots, S_N; p_1, \dots, p_N)$.

$(S_1, \dots, S_N)$ satisfies the strong separation condition if
$$S_i(A) \cap S_j(A) = \emptyset$$
for $i \ne j$. $(S_1, \dots, S_N)$ satisfies the open set condition (OSC) if there exists a nonempty open set $U \subset \mathbb{R}^d$ with $S_i(U) \subset U$ and $S_i(U) \cap S_j(U) = \emptyset$ for $i \ne j$. Schief (1994) has shown that the open set $U$ in the above definition can always be chosen to be bounded and satisfy $U \cap A \ne \emptyset$. If $(S_1, \dots, S_N)$ satisfies the strong separation condition then it also satisfies the open set condition. If $(S_1, \dots, S_N)$ satisfies the OSC and $P$ is the self-similar measure corresponding to $(S_1, \dots, S_N; s_1^D, \dots, s_N^D)$ then $P$ is the normalized $D$-dimensional Hausdorff measure restricted to the attractor $A$, i.e. $0 < \mathcal{H}^D(A) < \infty$ and

(14.4)  $P = \dfrac{1}{\mathcal{H}^D(A)}\, \mathcal{H}^D_{|A}$

(see Hutchinson, 1981, p. 737/738).

By $\{1, \dots, N\}^*$ we denote the set of all words on the alphabet $\{1, \dots, N\}$ including the empty word $\emptyset$. If $(q_1, \dots, q_N)$ is an $N$-tuple of real numbers and $\sigma = \sigma_1 \dots \sigma_n$ belongs to $\{1, \dots, N\}^*$ then define
$$q_\sigma = \begin{cases} 1, & \sigma = \emptyset, \\ \prod_{i=1}^n q_{\sigma_i}, & \text{otherwise.} \end{cases}$$
For a nonempty word $\sigma = \sigma_1 \dots \sigma_n \in \{1, \dots, N\}^*$ set
$$\sigma^- = \begin{cases} \emptyset, & n = 1, \\ \sigma_1 \dots \sigma_{n-1}, & n > 1. \end{cases}$$
If $\sigma$ is a word then the length of $\sigma$, denoted by $|\sigma|$, is 0 if $\sigma = \emptyset$ and equal to $n$ if $\sigma = \sigma_1 \dots \sigma_n$. For $m \le |\sigma|$ let
$$\sigma|_m = \begin{cases} \emptyset, & m = 0, \\ \sigma_1 \dots \sigma_m, & m \ge 1 \end{cases}$$
be the restriction of $\sigma$ to $m$. A natural order for words is defined by
$$\sigma \le \tau \quad\text{iff}\quad |\sigma| \le |\tau| \text{ and } \tau|_{|\sigma|} = \sigma.$$
For an infinite sequence $\eta \in \{1, \dots, N\}^{\mathbb{N}}$ the restriction $\eta|_m$ is defined in an analogous way. A word $\sigma \in \{1, \dots, N\}^*$ is a predecessor of $\eta$ iff $\eta|_{|\sigma|} = \sigma$.

A finite set $\Gamma \subset \{1, \dots, N\}^*$ is called a finite antichain iff any two elements of $\Gamma$ are incomparable with respect to the order given above. A finite antichain $\Gamma$ is called maximal iff, for every finite antichain $\Gamma' \subset \{1, \dots, N\}^*$ with $\Gamma \subset \Gamma'$, we have $\Gamma = \Gamma'$.
A finite antichain $\Gamma$ is maximal if and only if every sequence in $\{1, \dots, N\}^{\mathbb{N}}$ has a predecessor in $\Gamma$. If $(q_1, \dots, q_N)$ is a probability vector and $\Gamma$ is a maximal finite antichain then

(14.5)  $\sum_{\sigma \in \Gamma} q_\sigma = 1.$

If $0 < \varepsilon < \min\{q_1, \dots, q_N\}$ then $\Gamma(\varepsilon) = \{\sigma \in \{1, \dots, N\}^* : q_{\sigma^-} \ge \varepsilon > q_\sigma\}$ is a maximal finite antichain. For $\sigma \in \{1, \dots, N\}^*$ set
$$S_\sigma = \begin{cases} \mathrm{id}, & \sigma = \emptyset, \\ S_{\sigma_1} \circ \dots \circ S_{\sigma_n}, & \sigma = \sigma_1 \dots \sigma_n, \end{cases}$$
where $\mathrm{id}$ is the identity on $\mathbb{R}^d$.
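The set $\Gamma(\varepsilon)$ can be enumerated by extending words until their weight first drops below $\varepsilon$. The following Python sketch does this for a probability vector with positive entries and checks (14.5) exactly with rational arithmetic. The function names are illustrative assumptions; words are represented as tuples of letters from $1, \dots, N$.

```python
from fractions import Fraction


def word_weight(q, word):
    """q_sigma: product of q_i over the letters of the word (1 for the empty word)."""
    weight = Fraction(1)
    for letter in word:
        weight *= q[letter - 1]
    return weight


def antichain(q, eps):
    """Gamma(eps) = {sigma : q_{sigma^-} >= eps > q_sigma}, assuming 0 < eps < min(q) < 1."""
    result = []
    stack = [((), Fraction(1))]          # words whose weight is still >= eps
    while stack:
        word, weight = stack.pop()
        for i, qi in enumerate(q, start=1):
            child, child_weight = word + (i,), weight * qi
            if child_weight < eps:       # parent weight >= eps > child weight
                result.append(child)
            else:
                stack.append((child, child_weight))
    return result


if __name__ == "__main__":
    q = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
    gamma = antichain(q, Fraction(1, 10))
    print(len(gamma), sum(word_weight(q, w) for w in gamma))
```

Because every word with weight at least $\varepsilon$ is extended by all $N$ letters, each infinite sequence has exactly one predecessor in the result, which is the maximality criterion stated above.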
14.2 An upper bound for the quantization dimension
We use the notation introduced in 14.1. In the following $(p_1, \dots, p_N)$ is always a probability vector with $p_i > 0$ for $i = 1, \dots, N$ and $P$ is the self-similar probability measure on $\mathbb{R}^d$ corresponding to $(S_1, \dots, S_N; p_1, \dots, p_N)$.

14.1 Lemma
For every $n \ge N$ and every $r \in [1, \infty)$,

(14.6)  $V_{n,r}(P) \le \min\Big\{\sum_{i=1}^N p_i s_i^r V_{n_i,r}(P) : 1 \le n_i,\ \sum_{i=1}^N n_i \le n\Big\}.$

Proof
Since $S_i$ is a similarity transformation it follows immediately (see Lemma 3.2) that

(14.7)  $V_{m,r}(P \circ S_i^{-1}) = s_i^r V_{m,r}(P)$

for all $i \in \{1, \dots, N\}$ and all $m \in \mathbb{N}$. Using (14.3), Lemma 4.14 (b) implies, for $n_1, \dots, n_N \in \mathbb{N}$ with $n_i \ge 1$ and $\sum_{i=1}^N n_i \le n$,
$$V_{n,r}(P) \le \sum_{i=1}^N p_i V_{n_i,r}(P \circ S_i^{-1}).$$
Using (14.7) to substitute each summand on the right hand side yields the assertion of the lemma. □
14.2 Corollary
Let $\Gamma \subset \{1, \dots, N\}^*$ be a finite maximal antichain, $n \in \mathbb{N}$ with $n \ge |\Gamma|$ and $r \in [1, +\infty)$. Then

(14.8)  $V_{n,r}(P) \le \min\Big\{\sum_{\sigma \in \Gamma} p_\sigma s_\sigma^r V_{n_\sigma,r}(P) : 1 \le n_\sigma,\ \sum_{\sigma \in \Gamma} n_\sigma \le n\Big\}.$

Proof
The corollary follows from Lemma 14.1 by induction on $\max\{|\sigma| : \sigma \in \Gamma\}$. □

14.3 Lemma
For every $n \ge N$,

(14.9)  $e_{n,\infty}(P) \le \min\Big\{\max_{1 \le i \le N} s_i e_{n_i,\infty}(P) : 1 \le n_i,\ \sum_{i=1}^N n_i \le n\Big\}.$

Proof
Since $\operatorname{supp}(P \circ S_i^{-1}) = S_i(A)$ and $A = \bigcup_{i=1}^N S_i(A)$ the lemma is an immediate consequence of Lemma 10.2(b) and Lemma 10.6(b). □
14.4 Lemma
Let $r \in [1, +\infty)$ be fixed. Then there exists exactly one number $\kappa_r \in (0, +\infty)$ with

(14.10)  $\sum_{i=1}^N (p_i s_i^r)^{\frac{\kappa_r}{r + \kappa_r}} = 1.$

Proof
Since $0 < p_i s_i^r < 1$ the function $t \mapsto \sum_{i=1}^N (p_i s_i^r)^t$ is strictly decreasing and continuous. Since this function tends to $N$ as $t$ tends to 0 and takes a value less than 1 at $t = 1$, the intermediate value theorem implies the existence of a unique $t \in (0,1)$ with
$$\sum_{i=1}^N (p_i s_i^r)^t = 1.$$
Then $\kappa_r = \frac{tr}{1-t}$ satisfies the conclusions of the lemma. □
Let r C [1, q-c~) and let ~r satisfy (14.10). Then lim sup ne~,r < oo, m
in particular, the upper quantization dimension Dr(P) of P is less than or equal to I~ r
.
194
III. Asymptotic quantization for singular probability distributions
Proof qi = (pisS)"+'~, and Co = min{qb .. ,qt¢}. Then we have ~0 > 0. Let m , n E N be arbitrary with m < c ~ a n d s e t ¢ = ¢ 0 - - l m~-. F o r P ( ¢ ) = { a E {1,... ,N}*: q~- > ¢ > q,} it follows by (14.5) that Let
1=
:
~-~q, ,er(~)
~
~er(~)
qa-qawl
> c¢olr(~)I , hence
Ir(e)l _< (e~0) -~ Using Corollary 14.2 we deduce
act(E)
(p~s: )r+~, (p~s;),+~, vm,,(p) <_ (¢ "" )'+',Vm,,(P) ~
q~
aer(¢)
= go'~'/(m)~vm,r(p). Thus we obtain
r
_ r g_
r
n~Vn,r(P) <_~o "r m~Vrn,~(P) which implies
ne,~,,(P) '~" <_ ¢o'mem,r(P) "r. Since, for fixed m, this inequality holds for all but a finite number of n we get lim sup ne,~,r(P) '~" <_¢o lmem,r (P)~" 7"1,-+00
< +cx) []
and the proposition is proved. 14.6 R e m a r k The proof of the preceding proposition shows that
(14.11)
limsupne,~,,(P) '~" < max{(pls~) . , + ~. , , . ... , ( p., , , , ~ )
~,+-,
"
,~,~,,(
)
14.7 Proposition
Let $D$ be the similarity dimension of $(S_1, \dots, S_N)$. Then
$$\limsup_{n\to\infty} n e_{n,\infty}(P)^D < +\infty;$$
in particular, the upper quantization dimension $\overline{D}_\infty(P)$ of $P$ is less than or equal to $D$.

Proof
Set $q_i = s_i^D$ and $\varepsilon_0 = \min\{q_1, \dots, q_N\}$. Let $m, n \in \mathbb{N}$ be arbitrary with $\frac{m}{n} < \varepsilon_0^2$. Set $\varepsilon = \varepsilon_0^{-1}\frac{m}{n}$ and $\Gamma(\varepsilon) = \{\sigma \in \{1, \dots, N\}^* : q_{\sigma^-} \ge \varepsilon > q_\sigma\}$. Since $\Gamma(\varepsilon)$ is a maximal finite antichain it follows by Lemma 14.3 and an induction argument that
$$e_{n,\infty}(P)^D \le \min\Big\{\max_{\sigma \in \Gamma(\varepsilon)} s_\sigma^D e_{n_\sigma,\infty}(P)^D : 1 \le n_\sigma,\ \sum_{\sigma \in \Gamma(\varepsilon)} n_\sigma \le n\Big\}.$$
As in the proof of Proposition 14.5 we see that $|\Gamma(\varepsilon)| \le \frac{n}{m}$ so that
$$e_{n,\infty}(P)^D \le \max_{\sigma \in \Gamma(\varepsilon)} s_\sigma^D e_{m,\infty}(P)^D \le \varepsilon\, e_{m,\infty}(P)^D = \varepsilon_0^{-1}\frac{m}{n}\, e_{m,\infty}(P)^D$$
and, hence,
$$n e_{n,\infty}(P)^D \le \varepsilon_0^{-1} m e_{m,\infty}(P)^D.$$
For fixed $m$ this holds for all but finitely many $n$ and yields $\limsup_{n\to\infty} n e_{n,\infty}(P)^D < +\infty$. The remaining statement of the proposition follows from Proposition 11.3. □

14.8 Remark
The proof of the preceding proposition shows that

(14.12)  $\limsup_{n\to\infty} n e_{n,\infty}(P)^D \le \max(s_1^{-D}, \dots, s_N^{-D})\, \inf_{m \in \mathbb{N}} m e_{m,\infty}(P)^D.$

14.3 A lower bound for the quantization dimension
The general assumptions in this section are the same as those in the preceding section.

14.9 Lemma
For every $\varepsilon > 0$,

(14.13)  $\inf\{P(B(z,\varepsilon)) : z \in A\} > 0.$

Proof
Set $t = \big(\frac{\varepsilon}{\operatorname{diam}(A)}\big)^D$ and $\Gamma(t) = \{\sigma \in \{1, \dots, N\}^* : s_{\sigma^-}^D \ge t > s_\sigma^D\}$. Then $\Gamma(t)$ is a finite maximal antichain. Thus $a = \min\{p_\sigma : \sigma \in \Gamma(t)\} > 0$. Let $z \in A$ be arbitrary. Since $A = \bigcup_{\sigma \in \Gamma(t)} S_\sigma(A)$ there is a $\tau \in \Gamma(t)$ with $z \in S_\tau(A)$. Since $S_\tau$ is a similarity we have $\operatorname{diam}(S_\tau(A)) = s_\tau \operatorname{diam}(A)$ and, hence, $\operatorname{diam}(S_\tau(A)) < \varepsilon$. Thus, we have $S_\tau(A) \subset B(z,\varepsilon)$. From (14.3) it follows that
$$P(S_\tau(A)) = \sum_{\sigma \in \Gamma(t)} p_\sigma\, P \circ S_\sigma^{-1}(S_\tau(A)) \ge p_\tau P(S_\tau^{-1}(S_\tau(A))) = p_\tau.$$
Combining the last two results we obtain
$$P(B(z,\varepsilon)) \ge p_\tau \ge a > 0$$
and the lemma is proved. □
14.10 Lemma
Let $(S_1, \dots, S_N)$ satisfy the strong separation condition and let $r \in [1, +\infty)$ be given. Then

(14.14)  $V_{n,r}(P) = \min\Big\{\sum_{i=1}^N p_i s_i^r V_{n_i,r}(P) : 1 \le n_i,\ \sum_{i=1}^N n_i \le n\Big\}$

for all but finitely many $n \in \mathbb{N}$.

Proof
"$\le$": That $V_{n,r}(P)$ is less than or equal to the right hand side for all but finitely many $n$ is the statement of Lemma 14.1.

"$\ge$": To show the converse inequality let $\delta = \min\{d(S_i(A), S_j(A)) : i \ne j\}$. Then we have $\delta > 0$. From Lemma 14.9 we deduce
By Lemma 6.1 we know that lim V,~,~(P) = O.
r~-~oo
Hence there exists an no E N with Y~,r (P) < fl for all n > no. Let on be an n-optimal set of centers for P of order r. Then L e m m a 13.8 implies t h a t 1
r
Since the function t -+ t~minP(B(y,t)) is
for all n > no and all a E on. --
yEA
non-decreasing it follows t h a t
i.e.
6 for all n _> no and all a E a~. By the definition of 5 we deduce that, for n > no and i , j E { 1 , . . . , N } with i ¢ j , the sets an,i = (a E a~ : W(a[a~)MSi(A) • 0} and an,j -- {a E a~: W(aIan) M S j ( A ) ¢ O} are disjoint. Using (14.3) we obtain
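$$V_{n,r}(P) = \int \min_{a \in \alpha_n} \|x - a\|^r\, dP(x) = \sum_{i=1}^N p_i \int \min_{a \in \alpha_n} \|S_i(x) - a\|^r\, dP(x) = \sum_{i=1}^N p_i \int \min_{a \in \alpha_{n,i}} \|S_i(x) - a\|^r\, dP(x)$$
$$= \sum_{i=1}^N p_i s_i^r \int \min_{b \in S_i^{-1}(\alpha_{n,i})} \|x - b\|^r\, dP(x) \ge \sum_{i=1}^N p_i s_i^r V_{n_i,r}(P),$$
where $n_i = |\alpha_{n,i}| \ge 1$. Since
$$\sum_{i=1}^N n_i \le |\alpha_n| = n$$
we deduce
$$V_{n,r}(P) \ge \min\Big\{\sum_{i=1}^N p_i s_i^r V_{n_i,r}(P) : 1 \le n_i,\ \sum_{i=1}^N n_i \le n\Big\}.$$
□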
14.11 Proposition
Let $(S_1, \dots, S_N)$ satisfy the strong separation condition and let $r \in [1, +\infty)$ be given. Moreover, let $\kappa_r$ satisfy (14.10). Then
$$\liminf_{n\to\infty} n e_{n,r}(P)^{\kappa_r} > 0;$$
in particular, the lower quantization dimension $\underline{D}_r(P)$ is greater than or equal to $\kappa_r$.

Proof
Since $|A| = \infty$ we have $V_{n,r}(P) > 0$ for all $n \in \mathbb{N}$. Let $n_0 \in \mathbb{N}$ be such that (14.14) holds for all $n \ge n_0$. Choose $c > 0$ with $V_{n,r}(P) \ge c\, n^{-\frac{r}{\kappa_r}}$ for all $n < n_0$. We will show by induction on $n$ that

(14.15)  $V_{n,r}(P) \ge c\, n^{-\frac{r}{\kappa_r}}$

for all $n \in \mathbb{N}$. Let $n \ge n_0$ be such that
$$V_{k,r}(P) \ge c\, k^{-\frac{r}{\kappa_r}}$$
for all $k < n$. Using this assumption and (14.14) we obtain
$$V_{n,r}(P) = \min\Big\{\sum_{i=1}^N p_i s_i^r V_{n_i,r}(P) : 1 \le n_i,\ \sum_{i=1}^N n_i \le n\Big\} \ge \min\Big\{\sum_{i=1}^N p_i s_i^r\, c\, n_i^{-\frac{r}{\kappa_r}} : 1 \le n_i,\ \sum_{i=1}^N n_i \le n\Big\}$$
$$= c\, n^{-\frac{r}{\kappa_r}} \min\Big\{\sum_{i=1}^N p_i s_i^r \Big(\frac{n_i}{n}\Big)^{-\frac{r}{\kappa_r}} : 1 \le n_i,\ \sum_{i=1}^N \frac{n_i}{n} \le 1\Big\}.$$
By Lemma 6.8 we have
$$\min\Big\{\sum_{i=1}^N p_i s_i^r \Big(\frac{n_i}{n}\Big)^{-\frac{r}{\kappa_r}} : 1 \le n_i,\ \sum_{i=1}^N \frac{n_i}{n} \le 1\Big\} \ge \Big(\sum_{i=1}^N (p_i s_i^r)^{\frac{\kappa_r}{r+\kappa_r}}\Big)^{\frac{r+\kappa_r}{\kappa_r}} = 1.$$
Thus we get
$$V_{n,r}(P) \ge c\, n^{-\frac{r}{\kappa_r}}$$
and (14.15) is proved. It follows that $n e_{n,r}(P)^{\kappa_r} \ge c^{\frac{\kappa_r}{r}} > 0$ for all $n \in \mathbb{N}$, in particular
$$\liminf_{n\to\infty} n e_{n,r}(P)^{\kappa_r} > 0.$$
The remaining statement in the proposition is an immediate consequence of Proposition 11.3. □

14.12 Problem
Does the conclusion of Proposition 14.11 remain true under the weaker assumption that $(S_1, \dots, S_N)$ satisfies the open set condition?

14.13 Proposition
Let $(S_1, \dots, S_N)$ satisfy the open set condition and let $D$ be the similarity dimension of $(S_1, \dots, S_N)$. Then
$$\liminf_{n\to\infty} n e_{n,\infty}(P)^D > 0;$$
in particular, the lower quantization dimension $\underline{D}_\infty(P)$ is greater than or equal to $D$.

Proof
Since $\operatorname{supp}(P) = A = \operatorname{supp}(\mathcal{H}^D_{|A})$ the statement of the proposition follows immediately from Example 12.10, Theorem 12.18 and Remark 11.2. □
14.4 The quantization dimension
The general assumptions are the same as in Section 14.2. We denote the similarity dimension of $(S_1, \dots, S_N)$ by $D$. In this section we will show that, for most self-similar measures, the quantization dimensions of different orders are different.

14.14 Theorem
Let $r \in [1, +\infty)$, let $\kappa_r \in (0, \infty)$ be defined by

(14.16)  $\sum_{i=1}^N (p_i s_i^r)^{\frac{\kappa_r}{r+\kappa_r}} = 1,$

and let $(S_1, \dots, S_N)$ satisfy the strong separation condition. Then
$$0 < \liminf_{n\to\infty} n e_{n,r}(P)^{\kappa_r} \le \limsup_{n\to\infty} n e_{n,r}(P)^{\kappa_r} < +\infty.$$
In particular, the quantization dimension $D_r(P)$ of order $r$ exists and equals $\kappa_r$.

Proof
The result follows from Proposition 14.5 and Proposition 14.11. □
Next we will prove some auxiliary results concerning the function $K: [1, +\infty) \to (0, +\infty)$, $r \mapsto \kappa_r$. Define $F: [1, +\infty) \times (0, +\infty) \to \mathbb{R}$ by
$$F(r,t) = \sum_{i=1}^N (p_i s_i^r)^{\frac{t}{r+t}} - 1.$$
By definition $K$ is the unique function on $[1, +\infty)$ with $F(r, K(r)) = 0$ for all $r \in [1, \infty)$. Since
$$\frac{\partial F}{\partial r}(r,t) = \sum_{i=1}^N (p_i s_i^r)^{\frac{t}{r+t}}\, \frac{t}{(t+r)^2}\, (-\log p_i + t \log s_i)$$
and
$$\frac{\partial F}{\partial t}(r,t) = \sum_{i=1}^N (p_i s_i^r)^{\frac{t}{r+t}}\, \frac{r}{(t+r)^2}\, (\log p_i + r \log s_i),$$
implicit differentiation yields

(14.17)  $K'(r) = \dfrac{\kappa_r}{r}\; \dfrac{\sum_{i=1}^N (p_i s_i^r)^{\frac{\kappa_r}{r+\kappa_r}} (\log p_i - \kappa_r \log s_i)}{\sum_{i=1}^N (p_i s_i^r)^{\frac{\kappa_r}{r+\kappa_r}} (\log p_i + r \log s_i)}.$
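Formula (14.17) can be cross-checked against a finite-difference quotient of $r \mapsto \kappa_r$. The Python sketch below is only an illustration under assumed parameters (two maps with ratio $1/3$ and weights $0.9$, $0.1$); the function names, tolerance and step size are hypothetical choices.

```python
import math


def kappa(p, s, r, tol=1e-14):
    """Solve (14.16) by bisection on t = kappa_r / (r + kappa_r) in (0, 1)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum((pi * si ** r) ** mid for pi, si in zip(p, s)) > 1:
            lo = mid          # sum is decreasing in t, so the root lies to the right
        else:
            hi = mid
    t = (lo + hi) / 2
    return t * r / (1 - t)


def kappa_prime(p, s, r):
    """Right-hand side of (14.17), evaluated at the numerically computed kappa_r."""
    k = kappa(p, s, r)
    w = [(pi * si ** r) ** (k / (r + k)) for pi, si in zip(p, s)]
    num = sum(wi * (math.log(pi) - k * math.log(si)) for wi, pi, si in zip(w, p, s))
    den = sum(wi * (math.log(pi) + r * math.log(si)) for wi, pi, si in zip(w, p, s))
    return (k / r) * num / den


if __name__ == "__main__":
    p, s, r, h = [0.9, 0.1], [1 / 3, 1 / 3], 2.0, 1e-4
    finite_difference = (kappa(p, s, r + h) - kappa(p, s, r - h)) / (2 * h)
    print(kappa_prime(p, s, r), finite_difference)
```

In this example the derivative is positive, and for $p_i = s_i^D$ the numerator of (14.17) vanishes, in line with the two lemmas that follow.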
14.15 Lemma
If there exists an $r_0 \ge 1$ with $K'(r_0) = 0$ then $p_i = s_i^D$ for $i = 1, \dots, N$ and $\kappa_r = D$ for all $r \in [1, +\infty)$.

Proof
Set $q_i = (p_i s_i^{r_0})^{\frac{\kappa_{r_0}}{r_0 + \kappa_{r_0}}}$. Since $K'(r_0) = 0$ we derive from (14.17) and $\kappa_{r_0} > 0$ that

(14.18)  $\sum_{i=1}^N q_i (\log p_i - \kappa_{r_0} \log s_i) = 0.$

By the definition of $q_i$ we have

(14.19)  $s_i = \Big(\dfrac{q_i^{\frac{r_0 + \kappa_{r_0}}{\kappa_{r_0}}}}{p_i}\Big)^{\frac{1}{r_0}}$

and hence
$$\kappa_{r_0} \log s_i = \frac{\kappa_{r_0}}{r_0}\Big(-\log p_i + \frac{r_0 + \kappa_{r_0}}{\kappa_{r_0}} \log q_i\Big) = -\frac{\kappa_{r_0}}{r_0} \log p_i + \frac{r_0 + \kappa_{r_0}}{r_0} \log q_i.$$
Substituting this value into (14.18) yields
$$\sum_{i=1}^N q_i \Big(\frac{r_0 + \kappa_{r_0}}{r_0} \log p_i - \frac{r_0 + \kappa_{r_0}}{r_0} \log q_i\Big) = 0.$$
Since $\frac{r_0 + \kappa_{r_0}}{r_0} > 0$ this implies
$$\sum_{i=1}^N q_i \log \frac{p_i}{q_i} = 0.$$
Since the logarithm is a strictly concave function this implies
$$p_i = q_i$$
for $i = 1, \dots, N$. Hence (14.19) yields $s_i = p_i^{1/\kappa_{r_0}}$, i.e.
$$p_i = s_i^{\kappa_{r_0}}.$$
Since $\sum_{i=1}^N p_i = 1$ the definition of the similarity dimension of $(S_1, \dots, S_N)$ yields $\kappa_{r_0} = D$ and $p_i = s_i^D$ for $i = 1, \dots, N$. Using this identity in (14.16) we obtain
$$\sum_{i=1}^N \big(s_i^{D+r}\big)^{\frac{\kappa_r}{r+\kappa_r}} = 1.$$
This implies
$$D = (D + r)\frac{\kappa_r}{r + \kappa_r}$$
and, hence, $\kappa_r = D$ for all $r \in [1, +\infty)$. □
14.16 Lemma
If $(p_1, \dots, p_N) = (s_1^D, \dots, s_N^D)$ then $\kappa_r = D$ for all $r \in [1, +\infty)$.

If $(p_1, \dots, p_N) \ne (s_1^D, \dots, s_N^D)$ then $K: [1, +\infty) \to \mathbb{R}$, $r \mapsto \kappa_r$ is strictly increasing with $\lim_{r\to\infty} \kappa_r = D$.

Proof
If $(p_1, \dots, p_N) = (s_1^D, \dots, s_N^D)$ then the last part of the proof of Lemma 14.15 shows that $\kappa_r = D$ for all $r \in [1, +\infty)$. If $(p_1, \dots, p_N) \ne (s_1^D, \dots, s_N^D)$ then it follows from Lemma 14.15 that $K'(r) \ne 0$ for all $r \in [1, +\infty)$. Since, as a consequence of the definition of $\kappa_r$, the function $K$ is increasing this implies that $K$ is strictly increasing. In particular $\kappa_\infty = \lim_{r\to\infty} \kappa_r$ exists in $[0, +\infty]$. Since
$$1 = \sum_{i=1}^N (p_i s_i^r)^{\frac{\kappa_r}{r+\kappa_r}} \le \sum_{i=1}^N s_i^{\frac{r\kappa_r}{r+\kappa_r}}$$
we obtain
$$\frac{r\kappa_r}{r + \kappa_r} \le D,$$
hence
$$\kappa_r \le \frac{Dr}{r - D}$$
for $r > D$. Thus we deduce $\kappa_\infty \le D$. Since
$$1 = \lim_{r\to\infty} \sum_{i=1}^N (p_i s_i^r)^{\frac{\kappa_r}{r+\kappa_r}} = \lim_{r\to\infty} \sum_{i=1}^N p_i^{\frac{\kappa_r}{r+\kappa_r}} s_i^{\frac{r\kappa_r}{r+\kappa_r}} = \sum_{i=1}^N s_i^{\kappa_\infty}$$
we deduce $\kappa_\infty = D$. □

14.17 Theorem
Let $q, r \in [1, +\infty]$ and let $(S_1, \dots, S_N)$ satisfy the strong separation condition.

(i) If $(p_1, \dots, p_N) = (s_1^D, \dots, s_N^D)$ then the quantization dimension $D_r(P)$ of order $r$ exists and equals $D$.

(ii) If $(p_1, \dots, p_N) \ne (s_1^D, \dots, s_N^D)$ then the quantization dimension $D_r(P)$ exists and
$$D_r(P) = \begin{cases} D, & r = +\infty, \\ \kappa_r, & r < +\infty. \end{cases}$$
Moreover, $q < r \Rightarrow D_q(P) < D_r(P)$ and $\lim_{r\to\infty} D_r(P) = D$.

Proof
That $D_\infty(P)$ exists and equals $D$ follows from Proposition 14.7 and Proposition 14.13. The remaining statements follow from Theorem 14.14 and Lemma 14.16. □

14.18 Problem
Does Theorem 14.17 hold under the weaker assumption that $(S_1, \dots, S_N)$ satisfies the open set condition?

14.19 Remark
It is shown by Kawabata and Dembo (1994, Theorem 4.1) that under the assumptions of Theorem 14.17 the rate distortion dimension of $P$ equals
$$\frac{\sum_{i=1}^N p_i \log p_i}{\sum_{i=1}^N p_i \log s_i}$$
and, therefore, equals the Hausdorff dimension $\dim_H(P)$ of $P$ by Cawley and Mauldin (1992, Theorem 2.1).
14.5 The quantization coefficient
In the preceding sections we have shown that, for many self-similar probabilities $P$, the inequality
$$0 < \liminf_{n\to\infty} n e_{n,r}(P)^{D_r} \le \limsup_{n\to\infty} n e_{n,r}(P)^{D_r} < +\infty$$
holds for $r \in [1, \infty]$ and the quantization dimension $D_r$ of order $r$ for $P$. It is, therefore, natural to investigate the problem under what conditions the above sequence has a finite and positive limit. Taking $r < \infty$ and generalizing Theorem 6.2 and (6.4), the $\frac{r}{D_r}$-th power of this limit, if it exists, is called the $r$-th quantization coefficient $Q_r(P)$. For $r = \infty$, $\lim_{n\to\infty} n^{1/D_\infty} e_{n,\infty}$, if it exists, is called the covering coefficient or quantization coefficient of order $\infty$ and depends only on $\operatorname{supp}(P)$ ((10.10) and Theorem 10.7). Little seems to be known about the above problem. We will first state a positive result concerning the quantization coefficient of order $\infty$. To this end we need the following definition.

14.20 Definition
An $N$-tuple $(s_1, \dots, s_N)$ of real numbers is called arithmetic if there is a positive $s \in \mathbb{R}$ with $s_1, \dots, s_N \in s\mathbb{Z} := \{sn : n \in \mathbb{Z}\}$.

14.21 Theorem
Let $(S_1, \dots, S_N)$ be an $N$-tuple of contracting similarity transformations of $\mathbb{R}^d$ satisfying the open set condition and let the corresponding $N$-tuple $(s_1, \dots, s_N)$ of contraction numbers be such that $(\log s_1, \dots, \log s_N)$ is not arithmetic. Let $A$ be the attractor of $(S_1, \dots, S_N)$ and $D$ its similarity dimension. Then $(n e_{n,\infty}(A)^D)_{n \in \mathbb{N}}$ has a finite and positive limit, hence the quantization coefficient $Q_\infty(A)$ of $A$ exists in $(0, +\infty)$.

Proof
With an argument similar to that given in Remark 10.9 one can show that $\lim_{n\to\infty} n e_{n,\infty}(A)^D$ exists if and only if $\lim_{\varepsilon \to 0} N(\varepsilon)\varepsilon^D$ exists, where $N(\varepsilon)$ is the minimal number of balls of radius $\varepsilon > 0$ that cover $A$. If one of the limits exists then so does the other and they agree. Due to a result of Lalley (1988) combined with a result of Schief (1994), $\lim_{\varepsilon \downarrow 0} N(\varepsilon)\varepsilon^D$ exists in $(0, +\infty)$ under the assumptions of the theorem (see also Falconer, 1997, p. 123, Proposition 7.4). □
The following proposition shows that the quantization coefficient $Q_\infty(A)$ need not exist if the assumption is dropped that $(\log s_1, \dots, \log s_N)$ is not arithmetic.

14.22 Proposition
Let $N \ge 2$, $(S_1, \dots, S_N)$, $A$, and $D$ be as above but assume that $(S_1, \dots, S_N)$ satisfies the strong separation condition and that all $S_i$ have the same contraction number $s$. Then
$$0 < \liminf_{n\to\infty} n e_{n,\infty}(A)^D < \limsup_{n\to\infty} n e_{n,\infty}(A)^D < +\infty.$$
Hence the quantization coefficient $Q_\infty(A)$ does not exist.

Proof
According to Proposition 14.7 and 14.13 we know that
$$0 < \liminf_{n\to\infty} n e_{n,\infty}(A)^D \le \limsup_{n\to\infty} n e_{n,\infty}(A)^D < \infty.$$
Therefore, it remains to show that $(n e_{n,\infty}(A)^D)_{n \in \mathbb{N}}$ does not converge. Let $n_0 \in \mathbb{N}$ satisfy $n_0 \ge N$ and
$$e_{n_0,\infty}(A) < \frac{1}{2}\min\{d(S_i(A), S_j(A)) : i \ne j\}.$$
Using Lemma 10.2(b) it follows immediately that, for $n \ge n_0$,

(14.20)  $e_{n,\infty}(A) = \min\Big\{\max_{1 \le i \le N} s\, e_{n_i,\infty}(A) : 1 \le n_i,\ \sum_{i=1}^N n_i \le n\Big\}.$

We claim that

(14.21)  $e_{n,\infty}(A) = s\, e_{[\frac{n}{N}],\infty}(A)$

for all $n \ge n_0$, where $[\frac{n}{N}]$ denotes the greatest integer less than or equal to $\frac{n}{N}$. Setting $n_i = [\frac{n}{N}]$ it follows from (14.20) that
$$e_{n,\infty}(A) \le s\, e_{[\frac{n}{N}],\infty}(A).$$
Now let $n_1, \dots, n_N \in \mathbb{N}$ satisfy $1 \le n_i$, $\sum_{i=1}^N n_i \le n$, and
$$e_{n,\infty}(A) = \max_{1 \le i \le N} s\, e_{n_i,\infty}(A).$$
Without loss of generality we assume $n_1 \le n_2 \le \dots \le n_N$ so that
$$e_{n,\infty}(A) = s\, e_{n_1,\infty}(A)$$
and $n_1 \le [\frac{n}{N}]$. Thus we deduce
$$e_{n,\infty}(A) = s\, e_{n_1,\infty}(A) \ge s\, e_{[\frac{n}{N}],\infty}(A)$$
and our claim is proved. Since $N \ge 2$ the equality (14.21) implies

(14.22)  $e_{2n_0+1,\infty}(A) = s\, e_{n_0,\infty}(A) = e_{2n_0,\infty}(A)$

and, therefore,

(14.23)  $e_{N^k(2n_0+1),\infty}(A) = e_{N^k(2n_0),\infty}(A)$

for all $k \in \mathbb{N}$. Now assume that $(n e_{n,\infty}(A)^D)_{n \in \mathbb{N}}$ converges to some constant $c \in (0, \infty)$. Using (14.23) this implies
$$c = \lim_{k\to\infty} (N^k(2n_0+1))\, e_{N^k(2n_0+1),\infty}(A)^D = \lim_{k\to\infty} (N^k(2n_0+1))\, e_{N^k(2n_0),\infty}(A)^D = \lim_{k\to\infty} \frac{2n_0+1}{2n_0}\,(N^k 2n_0)\, e_{N^k(2n_0),\infty}(A)^D = \Big(\frac{2n_0+1}{2n_0}\Big) c,$$
which yields a contradiction and finishes the proof of the proposition. □
14.23 R e m a r k It remains an open problem to characterize those serf-similar sets A for which the quantization coefficient Q ~ ( A ) exists by a natural condition on the generating Ntuple ( S l , • . . , SN). For 1 < r < co and general serf-similar probabilities P almost nothing is known about the existence of the quantization coefficients Qr(P). The only serf-similar probability P for which the existence of all quantization coefficients Qr(P), 1 < r < co is known seems to be the restriction of Lebesgue measure to the unit cube in R d. The classical Cantor distribution P on R has no quantization coefficient Q2(P) (cf. Graf and Luschgy, 1997). Below we will summarize the known results for the Cantor distribution. Let S1,S2: R -+ R be defined by S i x = ~xl and S2x = .5xl +.5.~ Then $1 and $2 are similarity transformations with contraction number ~. The attractor of the pair
206
III. Asymptotic quantization for singular probability distributions
$(S_1, S_2)$ is the classical Cantor set $C \subset [0,1]$. The similarity dimension of $(S_1, S_2)$ equals $D = \frac{\log 2}{\log 3}$. Let $P$ be the self-similar probability corresponding to $(S_1, S_2, \frac{1}{2}, \frac{1}{2})$ (see (14.3)). According to (14.4) $P$ is the normalized $D$-dimensional Hausdorff measure on $C$. This distribution is called the (classical) Cantor distribution. Since $(S_1, S_2)$ satisfies the strong separation condition and since $(s_1^D, s_2^D) = (\frac{1}{2}, \frac{1}{2})$ we know from Theorem 14.17 that $D$ is the quantization dimension of $P$ of order $r$ for all $r \in [1, +\infty]$. In the following theorem we describe all $n$-optimal sets of centers, the quantization errors $V_{n,r}(P)$, and all limit points of the sequence $(n^{2/D} V_{n,r}(P))_{n \in \mathbb{N}}$ for $r = 2$. In particular we show that the quantization coefficient $Q_2(P)$ does not exist. To do this we need some more notation. For $\sigma \in \{1,2\}^*$ let $a_\sigma = S_\sigma(\frac{1}{2})$. For $n \ge 1$ let $l(n) = [\log_2 n]$. For $I \subset \{1,2\}^{l(n)}$ with $|I| = n - 2^{l(n)}$ let
\[ \alpha_n(I) = \{a_\sigma : \sigma \in \{1,2\}^{l(n)} \setminus I\} \cup \bigcup_{\sigma \in I} \{a_{\sigma 1}, a_{\sigma 2}\}. \]
Define $f : [1,2] \to \mathbb{R}$ by
\[ f(x) = \frac{1}{72}\, x^{2/D} (17 - 8x). \]
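The notation above can be made concrete with a small sketch. It assumes the maps $S_1 x = x/3$ and $S_2 x = x/3 + 2/3$ and the convention $S_\sigma = S_{\sigma_1} \circ \cdots \circ S_{\sigma_k}$; the function names `S`, `a`, `l` and `alpha` are illustrative choices and do not come from the text.

```python
from fractions import Fraction
from itertools import product


def S(i, x):
    """Similarity S_1 x = x/3 or S_2 x = x/3 + 2/3 (exact arithmetic)."""
    return x / 3 if i == 1 else x / 3 + Fraction(2, 3)


def a(sigma):
    """a_sigma = S_sigma(1/2), assuming S_sigma = S_{s_1} o ... o S_{s_k}."""
    x = Fraction(1, 2)
    for i in reversed(sigma):  # the innermost map is applied first
        x = S(i, x)
    return x


def l(n):
    """Integer part of log_2 n for n >= 1."""
    return n.bit_length() - 1


def alpha(n, I):
    """The set alpha_n(I) for a collection I of words of length l(n)."""
    words = list(product((1, 2), repeat=l(n)))
    I = set(I)
    if not (I <= set(words) and len(I) == n - 2 ** l(n)):
        raise ValueError("I must consist of n - 2^l(n) words of length l(n)")
    points = {a(s) for s in words if s not in I}
    for s in I:
        # a word in I is replaced by its two successors
        points |= {a(s + (1,)), a(s + (2,))}
    return sorted(points)
```

For example, `alpha(3, [(1,)])` splits the left first-level piece and keeps the center of the right one.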
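14.24 Theorem Let $P$ be the Cantor distribution and let $D$, $l(n)$, $\alpha_n(I)$, and $f$ be defined as above.
(a) For every natural number $n \ge 1$ the following conditions are equivalent:
(i) $\alpha$ is an $n$-optimal set of centers of order 2 for $P$.
(ii) There exists an $I \subset \{1,2\}^{l(n)}$ with $\alpha = \alpha_n(I)$.
(b) For every natural number $n \ge 1$,
\[ V_{n,2}(P) = \frac{1}{18^{l(n)}} \cdot \frac{1}{8} \left( 2^{l(n)+1} - n + \frac{1}{9} \left( n - 2^{l(n)} \right) \right). \]
(c) The set of all accumulation points of $(n^{2/D} V_{n,2}(P))_{n \in \mathbb{N}}$ is the interval
\[ \left[ \frac{1}{8},\, f\!\left( \frac{17}{8 + 4D} \right) \right]. \]
(Notice that $f(\frac{17}{8+4D}) = \max_{x \in [1,2]} f(x) > \frac{1}{8} = f(1) = f(2)$.)
In particular $P$ has no second quantization coefficient. The proof is given in Graf and Luschgy (1997) and will be omitted here.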
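Parts (b) and (c) lend themselves to a quick numerical illustration. The sketch below evaluates the formula $V_{n,2}(P) = 18^{-l(n)} \cdot \frac{1}{8}\,(2^{l(n)+1} - n + \frac{1}{9}(n - 2^{l(n)}))$ exactly and compares $n^{2/D} V_{n,2}(P)$ with $f(n / 2^{l(n)})$, where $f(x) = \frac{1}{72} x^{2/D}(17 - 8x)$. The names `V`, `f`, `scaled` and `F_MAX` are illustrative, and the check covers only finitely many $n$, so it is a sanity check rather than a proof.

```python
from fractions import Fraction
from math import log

D = log(2) / log(3)  # similarity dimension of the Cantor set


def l(n):
    """Integer part of log_2 n for n >= 1."""
    return n.bit_length() - 1


def V(n):
    """V_{n,2}(P) from part (b), as an exact fraction."""
    k = l(n)
    return Fraction(1, 18 ** k) * Fraction(1, 8) * (
        2 ** (k + 1) - n + Fraction(n - 2 ** k, 9)
    )


def f(x):
    """f(x) = x^(2/D) (17 - 8x) / 72 on [1, 2]."""
    return x ** (2 / D) * (17 - 8 * x) / 72


def scaled(n):
    """The normalized error n^(2/D) V_{n,2}(P) in floating point."""
    return n ** (2 / D) * float(V(n))


# Upper end point of the interval in part (c).
F_MAX = f(17 / (8 + 4 * D))
```

Since $2^{1/D} = 3$, writing $n = 2^{l(n)} x$ with $x \in [1, 2)$ gives $n^{2/D} V_{n,2}(P) = f(x)$, which is why the normalized errors oscillate between $\frac{1}{8}$ and `F_MAX` instead of converging.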
Notes The definition of self-similarity as used in this section was introduced by Hutchinson (1981). His paper also contains the basic results about self-similar sets and measures. Other references concerned with this subject are the books of Barnsley (1988), Falconer (1990, 1997), and Mattila (1995). The book of Barnsley (1988) describes many interesting examples of self-similar sets and measures. The idea of studying the quantization of self-similar probabilities goes back to Zador (1982). But his results are not formulated in a rigorous way. Since then nobody seems to have dealt with the problem. Thus, all the quantization results in this Section 14 seem to be new.
Appendix
Univariate distributions
The following univariate distributions served as examples. Recall that the $r$-th (absolute) moment about the center of a real random variable $X$ is given by
\[ V_r(X) = \inf_{a \in \mathbb{R}} E|X - a|^r. \]
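Normal distribution $N(0, \sigma^2)$
The normal distribution is strongly unimodal. If $X$ is $N(0, \sigma^2)$-distributed, then
\[ V_r(X) = E|X|^r = \sqrt{\frac{2^r}{\pi}}\, \sigma^r\, \Gamma\!\left( \frac{r+1}{2} \right), \quad r \ge 1. \]
In particular
\[ V_1(X) = \sigma \sqrt{\frac{2}{\pi}}, \quad V_2(X) = \sigma^2. \]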
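As a small sanity check of the moment formula for the centered normal distribution, the following sketch evaluates $E|X|^r = (2\sigma^2)^{r/2}\, \Gamma(\frac{r+1}{2}) / \sqrt{\pi}$ with the standard library. The function name `normal_abs_moment` is an illustrative choice.

```python
from math import gamma, pi, sqrt


def normal_abs_moment(r, sigma):
    """E|X|^r for X ~ N(0, sigma^2), via the Gamma function."""
    return (2 * sigma ** 2) ** (r / 2) * gamma((r + 1) / 2) / sqrt(pi)
```

For $r = 1$ this reduces to $\sigma\sqrt{2/\pi}$ and for $r = 2$ to the variance $\sigma^2$.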
Logistic distribution $L(a)$
The density (with respect to $\lambda$) is given by
\[ h(x) = \frac{e^{x/a}}{a (1 + e^{x/a})^2}, \quad x \in \mathbb{R}, \]
where $a > 0$ is a scale parameter. The logistic distribution is symmetric about the origin and strongly unimodal. The distribution function takes the form
\[ F(x) = \frac{1}{1 + e^{-x/a}} = \frac{e^{x/a}}{1 + e^{x/a}}. \]
Suppose that $X$ has distribution $L(a)$. Then
\[ V_r(X) = E|X|^r = 2 a^r\, \Gamma(r+1) \sum_{j=1}^{\infty} (-1)^{j-1} j^{-r}, \quad r \ge 1, \]
\[ \phantom{V_r(X) = E|X|^r} = 2 a^r\, \Gamma(r+1) \left( 1 - 2^{-(r-1)} \right) \zeta(r), \quad r > 1, \]
where $\zeta$ denotes the Riemann zeta function. In particular
\[ V_1(X) = 2a \log 2, \quad V_2(X) = \frac{a^2 \pi^2}{3}. \]
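The series representation for the logistic distribution can be checked numerically against the two special cases $V_1(X) = 2a \log 2$ and $V_2(X) = a^2 \pi^2 / 3$. The sketch below uses a plain partial sum of the alternating series, whose truncation error is bounded by the first omitted term; the names `eta` and `logistic_moment` and the default number of terms are assumptions made for illustration.

```python
from math import gamma, log, pi


def eta(r, terms=100_000):
    """Partial sum of sum_{j>=1} (-1)^(j-1) j^(-r); error <= (terms+1)^(-r)."""
    return sum((-1) ** (j - 1) / j ** r for j in range(1, terms + 1))


def logistic_moment(r, a):
    """Approximation of E|X|^r = 2 a^r Gamma(r+1) eta(r) for X ~ L(a)."""
    return 2 * a ** r * gamma(r + 1) * eta(r)
```

For $r = 1$ the series converges slowly, so only a few digits should be expected from this naive summation.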
Generalized Logistic distribution $GL(a, b)$
The density is defined by
\[ h(x) = \frac{\Gamma(2/b)\, e^{x/(ab)}}{a\, \Gamma(1/b)^2\, (1 + e^{x/a})^{2/b}}, \quad x \in \mathbb{R}, \]
where $a > 0$, $b > 0$. We have $GL(a, 1) = L(a)$.
Double Exponential distribution $DE(a)$
The density is given by
\[ h(x) = \frac{1}{2a}\, e^{-|x|/a}, \quad x \in \mathbb{R}, \]
where $a > 0$ is a scale parameter. The double exponential distribution is strongly unimodal. The distribution function takes the form
\[ F(x) = \begin{cases} \frac{1}{2} e^{x/a}, & x \le 0, \\ 1 - \frac{1}{2} e^{-x/a}, & x > 0. \end{cases} \]
If $X$ is $DE(a)$-distributed, then
\[ V_r(X) = E|X|^r = a^r\, \Gamma(1 + r), \quad r \ge 1. \]
In particular
\[ V_1(X) = a, \quad V_2(X) = 2a^2. \]
Double Gamma distribution $D\Gamma(a, b)$
The density is given by
\[ h(x) = \frac{1}{2 a^b\, \Gamma(b)}\, |x|^{b-1} e^{-|x|/a}, \quad x \in \mathbb{R}, \]
where $a > 0$ is a scale parameter and $b > 0$ is a shape parameter. We have $D\Gamma(a, 1) = DE(a)$. If $X$ is $D\Gamma(a, b)$-distributed, then
\[ V_r(X) = E|X|^r = \frac{a^r\, \Gamma(b + r)}{\Gamma(b)}, \quad r \ge 1. \]
In particular
\[ V_1(X) = ab, \quad V_2(X) = a^2 b (b + 1). \]
Hyper-exponential distribution $HE(a, b)$
The density is given by
\[ h(x) = \frac{b}{2a\, \Gamma(1/b)} \exp\!\left( - \left| \frac{x}{a} \right|^b \right), \quad x \in \mathbb{R}, \]
where $a > 0$ is a scale parameter, $b > 0$. The hyper-exponential distribution is strongly unimodal if $b \ge 1$. We have $HE(a, 1) = DE(a)$ and $HE(a, 2) = N(0, a^2/2)$. Let $X$ be $HE(a, b)$-distributed. Then
\[ V_r(X) = E|X|^r = \frac{a^r\, \Gamma\!\left( \frac{r+1}{b} \right)}{\Gamma\!\left( \frac{1}{b} \right)}, \quad r \ge 1. \]
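The two special cases $HE(a, 1) = DE(a)$ and $HE(a, 2) = N(0, a^2/2)$ give a convenient consistency check of the hyper-exponential moment formula $E|X|^r = a^r\, \Gamma(\frac{r+1}{b}) / \Gamma(\frac{1}{b})$. In the sketch below the helper names are illustrative; the comparison formulas are $a^r \Gamma(1+r)$ for the double exponential case and $(2\sigma^2)^{r/2}\, \Gamma(\frac{r+1}{2}) / \sqrt{\pi}$ for the normal case.

```python
from math import gamma, pi, sqrt


def he_moment(r, a, b):
    """E|X|^r for X ~ HE(a, b)."""
    return a ** r * gamma((r + 1) / b) / gamma(1 / b)


def de_moment(r, a):
    """E|X|^r for X ~ DE(a)."""
    return a ** r * gamma(1 + r)


def normal_abs_moment(r, sigma):
    """E|X|^r for X ~ N(0, sigma^2)."""
    return (2 * sigma ** 2) ** (r / 2) * gamma((r + 1) / 2) / sqrt(pi)
```

With $\sigma = a / \sqrt{2}$ the normal formula agrees with `he_moment(r, a, 2)` because $\Gamma(\frac{1}{2}) = \sqrt{\pi}$.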
Uniform distribution $U([a, b])$
The density is given by
\[ h(x) = \frac{1}{b - a}\, 1_{[a,b]}(x), \]
where $a, b \in \mathbb{R}$, $a < b$. The uniform distribution is symmetric about $(a + b)/2$ and strongly unimodal. Let $X$ be $U([a, b])$-distributed. Then
\[ \mathrm{Med}(X) = \{(a + b)/2\}, \quad EX = (a + b)/2, \]
\[ V_r(X) = E\left| X - \frac{a+b}{2} \right|^r = \frac{(b - a)^r}{(1 + r)\, 2^r}, \quad r \ge 1. \]
Triangular distribution $T(a, b; c)$
The density is given by
\[ h(x) = \frac{2(x - a)}{(c - a)(b - a)}\, 1_{[a,c]}(x) + \frac{2(b - x)}{(b - c)(b - a)}\, 1_{(c,b]}(x), \]
where $a < c < b$. Consider the case $c = (a + b)/2$. Then the triangular distribution is symmetric about $(a + b)/2$ and
\[ h(x) = \frac{4}{(b - a)^2} \left( (x - a)\, 1_{[a,(a+b)/2]}(x) + (b - x)\, 1_{((a+b)/2,b]}(x) \right). \]
If $X$ is $T(a, b; \frac{a+b}{2})$-distributed, then
\[ V_r(X) = E\left| X - \frac{a+b}{2} \right|^r = \frac{(b - a)^r}{(r + 1)(r + 2)\, 2^{r-1}}, \quad r \ge 1. \]
In particular
\[ V_1(X) = \frac{b - a}{6}, \quad V_2(X) = \frac{(b - a)^2}{24}. \]
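The closed form for the symmetric triangular distribution, $V_r(X) = (b-a)^r / ((r+1)(r+2)\, 2^{r-1})$, can be compared with a direct numerical integration of $|x - \frac{a+b}{2}|^r$ against the density $\frac{4}{(b-a)^2}\min(x - a,\, b - x)$ on $[a, b]$. The sketch uses a simple midpoint rule; the function names and the number of steps are illustrative assumptions.

```python
def tri_density(x, a, b):
    """Density of T(a, b; (a+b)/2)."""
    m = (a + b) / 2
    if a <= x <= m:
        return 4 * (x - a) / (b - a) ** 2
    if m < x <= b:
        return 4 * (b - x) / (b - a) ** 2
    return 0.0


def V_numeric(r, a, b, steps=20_000):
    """Midpoint-rule approximation of E|X - (a+b)/2|^r."""
    m = (a + b) / 2
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * h
        total += abs(x - m) ** r * tri_density(x, a, b)
    return total * h


def V_closed(r, a, b):
    """Closed form (b-a)^r / ((r+1)(r+2) 2^(r-1))."""
    return (b - a) ** r / ((r + 1) * (r + 2) * 2 ** (r - 1))
```

With an even number of steps the kink at the midpoint falls on a cell boundary, which keeps the midpoint rule accurate.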
Exponential distribution $E(a)$
The density is given by
\[ h(x) = \frac{1}{a}\, e^{-x/a}\, 1_{(0,\infty)}(x), \]
where $a > 0$ is a scale parameter. The exponential distribution is strongly unimodal. The distribution function takes the form
\[ F(x) = \begin{cases} 1 - e^{-x/a}, & x > 0, \\ 0, & x \le 0. \end{cases} \]
If $X$ is $E(a)$-distributed, then
\[ \mathrm{Med}(X) = \{a \log 2\}, \quad EX = a, \quad V_1(X) = E|X - a \log 2| = a \log 2, \quad V_2(X) = \mathrm{Var}\, X = a^2. \]
Weibull distribution $W(a, b)$
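The density is given by
\[ h(x) = \frac{b}{a^b}\, x^{b-1} \exp\!\left( - \left( \frac{x}{a} \right)^b \right) 1_{(0,\infty)}(x), \]
where $a > 0$ is a scale parameter and $b > 0$ is a shape parameter. The Weibull distribution is strongly unimodal for $b \ge 1$. The distribution function takes the form
\[ F(x) = \begin{cases} 1 - \exp\!\left( - \left( \frac{x}{a} \right)^b \right), & x > 0, \\ 0, & x \le 0. \end{cases} \]
We have $W(a, 1) = E(a)$. The distribution $W(a, 2)$ is called Rayleigh distribution. Let $X$ be $W(a, b)$-distributed. Then
\[ \mathrm{Med}(X) = \{a (\log 2)^{1/b}\}, \quad E X^r = a^r\, \Gamma\!\left( 1 + \frac{r}{b} \right),\ r \ge 0, \quad V_2(X) = \mathrm{Var}\, X = a^2 \left( \Gamma\!\left( \frac{2}{b} + 1 \right) - \Gamma\!\left( \frac{1}{b} + 1 \right)^2 \right). \]
Let $b = 2$. Then, with $\Phi$ denoting the distribution function of $N(0, 1)$,
\[ V_1(X) = E\left| X - a \sqrt{\log 2} \right| = a \left[ \sqrt{\log 2} + \sqrt{\pi} \left( \frac{3}{2} - 2 \Phi\!\left( \sqrt{2 \log 2} \right) \right) \right], \quad V_2(X) = a^2 \left( 1 - \frac{\pi}{4} \right). \]
For
\[ a = \left[ \sqrt{\log 2} + \sqrt{\pi} \left( \frac{3}{2} - 2 \Phi\!\left( \sqrt{2 \log 2} \right) \right) \right]^{-1} = 2.7027\ldots \]
one obtains $V_1(X) = 1$.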
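The constant $2.7027\ldots$ for the Rayleigh case $b = 2$ can be reproduced numerically. The sketch below assumes the closed form $V_1(X) = a\,[\sqrt{\log 2} + \sqrt{\pi}\,(\frac{3}{2} - 2\Phi(\sqrt{2 \log 2}))]$ with $\Phi$ the standard normal distribution function, and cross-checks it by integrating $|x - a\sqrt{\log 2}|$ against the density $\frac{2x}{a^2} e^{-(x/a)^2}$ with a midpoint rule. The names, the truncation point and the step count are illustrative assumptions.

```python
from math import erf, exp, log, pi, sqrt


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))


def rayleigh_V1(a):
    """Closed form of E|X - Med(X)| for X ~ W(a, 2)."""
    return a * (sqrt(log(2)) + sqrt(pi) * (1.5 - 2 * Phi(sqrt(2 * log(2)))))


def rayleigh_V1_numeric(a, upper=8.0, steps=100_000):
    """Midpoint-rule check; the tail beyond upper*a is negligible."""
    med = a * sqrt(log(2))
    h = upper * a / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += abs(x - med) * (2 * x / a ** 2) * exp(-((x / a) ** 2))
    return total * h


# Scale parameter for which V_1(X) = 1.
A_UNIT = 1 / rayleigh_V1(1.0)
```

Since $V_1$ is linear in the scale parameter, the normalizing value of $a$ is simply the reciprocal of `rayleigh_V1(1.0)`.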
Gamma distribution $\Gamma(a, b)$
The density is given by
\[ h(x) = \frac{1}{a^b\, \Gamma(b)}\, x^{b-1} e^{-x/a}\, 1_{(0,\infty)}(x), \]
where $a > 0$ is a scale parameter and $b > 0$ is a shape parameter. The Gamma distribution is strongly unimodal for $b \ge 1$. We have $\Gamma(a, 1) = E(a)$. If $X$ is $\Gamma(a, b)$-distributed, then
\[ E X^r = \frac{a^r\, \Gamma(b + r)}{\Gamma(b)},\ r \ge 0, \quad EX = ab, \quad V_2(X) = \mathrm{Var}\, X = a^2 b. \]
Let $b = 2$. Then
\[ \mathrm{Med}(X) = \{a \cdot 1.6783\ldots\}, \quad V_1(X) = a \cdot 1.0517\ldots \]
and for $a = 0.9508\ldots$ one gets $V_1(X) = 1$.
Generalized Gamma distribution $G\Gamma(a, b, c)$
The density is given by
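\[ h(x) = \frac{c}{a^b\, \Gamma(b/c)}\, x^{b-1} \exp\!\left( - \left( \frac{x}{a} \right)^c \right) 1_{(0,\infty)}(x), \]
where $a > 0$, $b > 0$, $c > 0$. We have $G\Gamma(a, b, 1) = \Gamma(a, b)$.
Pareto distribution $P(a, b)$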
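The density is given by
\[ h(x) = b\, a^b\, x^{-(b+1)}\, 1_{(a,\infty)}(x), \]
where $a > 0$ is a scale parameter, $b > 0$. The distribution function takes the form
\[ F(x) = \begin{cases} 1 - \left( \frac{a}{x} \right)^b, & x > a, \\ 0, & x \le a. \end{cases} \]
Let $X$ be $P(a, b)$-distributed. Then
\[ \mathrm{Med}(X) = \{a\, 2^{1/b}\}, \]
\[ E X^r = \frac{a^r b}{b - r}, \quad b > r \ge 0, \]
\[ V_1(X) = E\left| X - a\, 2^{1/b} \right| = \frac{ab\, (2^{1/b} - 1)}{b - 1}, \quad b > 1, \]
\[ V_2(X) = \mathrm{Var}\, X = \frac{a^2 b}{(b - 2)(b - 1)^2}, \quad b > 2. \]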
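The Pareto formulas can be cross-checked against each other. For a median $m$ of $X$ one has $E|X - m| = EX - 2\int_a^m x\, h(x)\, dx$, and $\mathrm{Var}\, X = E X^2 - (EX)^2$. The sketch below evaluates the closed forms and computes the partial expectation by a midpoint rule; all function names and the step count are illustrative assumptions.

```python
def pareto_median(a, b):
    """Median a * 2^(1/b) of P(a, b)."""
    return a * 2 ** (1 / b)


def pareto_moment(r, a, b):
    """E X^r = a^r b / (b - r), defined for b > r >= 0."""
    if not b > r >= 0:
        raise ValueError("requires b > r >= 0")
    return a ** r * b / (b - r)


def pareto_V1(a, b):
    """Closed form of E|X - Med(X)|, defined for b > 1."""
    return a * b * (2 ** (1 / b) - 1) / (b - 1)


def pareto_V2(a, b):
    """Closed form of Var X, defined for b > 2."""
    return a ** 2 * b / ((b - 2) * (b - 1) ** 2)


def pareto_V1_check(a, b, steps=20_000):
    """EX - 2 * int_a^m x h(x) dx with a midpoint rule, m the median."""
    m = pareto_median(a, b)
    h = (m - a) / steps
    partial = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * h
        partial += x * b * a ** b * x ** (-(b + 1))
    return pareto_moment(1, a, b) - 2 * partial * h
```

For $a = 2$, $b = 3$ the variance formula gives $3$, in agreement with $E X^2 - (EX)^2 = 12 - 9$.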
Cantor distribution
Let $C \subset [0, 1]$ be the (classical) Cantor set and let $D = \frac{\log 2}{\log 3}$ be the Hausdorff dimension of $C$. Then the Cantor distribution $P$ is the normalized $D$-dimensional Hausdorff measure on $C$. Define $S_1, S_2 : \mathbb{R} \to \mathbb{R}$ by $S_1 x = \frac{1}{3} x$ and $S_2 x = \frac{1}{3} x + \frac{2}{3}$. Then $P$ is the unique Borel probability on $\mathbb{R}$ with
\[ P = \frac{1}{2} \left( P^{S_1} + P^{S_2} \right). \]
Let $X$ be $P$-distributed. Then
\[ E(X) = \frac{1}{2}, \quad V_2(X) = \mathrm{Var}\, X = \frac{1}{8}. \]
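The values $E(X) = \frac{1}{2}$ and $\mathrm{Var}\, X = \frac{1}{8}$ can be illustrated with exact arithmetic. The sketch below iterates the maps $S_1 x = x/3$ and $S_2 x = x/3 + 2/3$ on the single point $\frac{1}{2}$, which yields the $2^m$ points $S_\sigma(\frac{1}{2})$, $|\sigma| = m$, each carrying mass $2^{-m}$. Each level-$m$ piece of the Cantor set is a copy scaled by $3^{-m}$, so it contributes the within-piece variance $9^{-m}/8$; the discrete approximation should therefore have variance $\frac{1}{8}(1 - 9^{-m})$. The function names are illustrative.

```python
from fractions import Fraction


def level_points(m):
    """The 2^m points S_sigma(1/2) for words sigma of length m."""
    pts = [Fraction(1, 2)]
    for _ in range(m):
        pts = [x / 3 for x in pts] + [x / 3 + Fraction(2, 3) for x in pts]
    return pts


def mean_and_variance(pts):
    """Mean and variance of the uniform distribution on pts."""
    n = len(pts)
    mean = sum(pts) / n
    var = sum((x - mean) ** 2 for x in pts) / n
    return mean, var
```

As $m$ grows the variance of the level-$m$ approximation increases to $\frac{1}{8}$, while the mean equals $\frac{1}{2}$ at every level.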
Bibliography
Abaya, E.F. and Wise, G.L. (1981). Some notes on optimal quantization. Proceedings of the International Conference on Communications (Denver, Colorado), 30.7.1-10.7.5. IEEE Press, New York.
Abaya, E.F. and Wise, G.L. (1984). Convergence of vector quantizers with applications to optimal quantizers. SIAM J. Appl. Math. 44, 183-189.
Abut, H., editor (1990). Vector Quantization. IEEE Press, New York.
Adams Jr., W.C. and Giesler, C.E. (1978). Quantizing characteristics for signals having Laplacian amplitude probability density function. IEEE Trans. Communications 26, 1295-1297.
Agrell, E. and Eriksson, T. (1998). Optimization of lattices for quantization. IEEE Trans. Inform. Theory 44, 1814-1828.
Anderberg, M.R. (1973). Cluster Analysis for Applications. Academic Press, San Diego.
Aurenhammer, F. (1991). Voronoi diagrams: A survey of a fundamental geometric data structure. ACM Computing Surveys 23, 345-405.
Baranovskii, E.P. (1965). Local density minima of a lattice covering of a four-dimensional Euclidean space by equal spheres. Soviet Math. Dokl. 6, 1131-1133.
Barnes, E.S. and Sloane, N.J.A. (1983). The optimal lattice quantizer in three dimensions. SIAM J. Algebraic Discrete Methods 4, 30-41.
Barnsley, M. (1988). Fractals Everywhere. Academic Press, London.
Bartlett, P.L., Linder, T., and Lugosi, G. (1998). The minimax distortion redundancy in empirical quantizer design. IEEE Trans. Inform. Theory 44, 1802-1813.
Benhenni, K. and Cambanis, S. (1996). The effect of quantization on the performance of sampling designs. Techn. Report No. 481, Center for Stoch. Processes, Univ. of North Carolina, Chapel Hill.
Bennett, W.R. (1948). Spectra of quantized signals. Bell Systems Tech. J. 27, 446-472.
Bock, H.H. (1974). Automatische Klassifikation. Vandenhoeck and Ruprecht, Göttingen.
Bock, H.H. (1992). A clustering technique for maximizing φ-divergence, noncentrality and discriminating power. Analyzing and Modeling Data, 19-36 (ed., M. Schader). Springer, Berlin.
Bollobás, B. (1972). The optimal structure of market areas. J. Economic Theory 4, 174-179.
Bollobás, B. (1973). The optimal arrangement of producers. J. London Math. Soc. 6, 605-613.
Bouton, C. and Pagès, G. (1997). About the multidimensional competitive learning vector quantization algorithm with constant gain. Ann. Appl. Probab. 7, 679-710.
Bucklew, J.A. and Cambanis, S. (1988). Estimating random integrals from noisy observations: Sampling designs and their performance. IEEE Trans. Inform. Theory 34, 111-127.
Bucklew, J.A. and Wise, G.L. (1982). Multidimensional asymptotic quantization theory with r-th power distortion measures. IEEE Trans. Inform. Theory 28, 239-247.
Calderbank, R., Forney Jr., G.D., and Moayeri, N., editors (1993). Coding and Quantization. DIMACS Vol. 14, American Mathematical Society.
Cawley, R. and Mauldin, R.D. (1992). Multifractal decomposition of Moran fractals. Adv. Math. 92, 196-236.
Cambanis, S. and Gerr, N.L. (1983). A simple class of asymptotically optimal quantizers. IEEE Trans. Inform. Theory 29, 664-676.
Cassels, J.W.S. (1971). An Introduction to the Geometry of Numbers. Second Printing. Springer, Berlin.
Chatterji, S.D. (1973). Les martingales et leurs applications analytiques. Lecture Notes in Math. 307 (Ecole d'Eté de Probabilités: Processus Stochastiques), 27-135. Springer, Berlin.
Cohn, D.L. (1980). Measure Theory. Birkhäuser, Boston.
Cohort, P. (1997). Unicité d'un quantifieur localement optimal par le théorème du col. Technical Report, Labo. Probab., Univ. Paris 6.
Conway, J.H. and Sloane, N.J.A. (1993). Sphere Packings, Lattices and Groups. Second Edition. Springer, New York.
Cox, D.R. (1957). Note on grouping. J. Amer. Statist. Assoc. 52, 543-547.
Cuesta-Albertos, J.A. and Matrán, C. (1988). The strong law of large numbers for k-means and best possible nets of Banach valued random variables. Probab. Theory Related Fields 78, 523-534.
Cuesta-Albertos, J.A., Gordaliza, A., and Matrán, C. (1997). Trimmed k-means: an attempt to robustify quantizers. Ann. Statist. 25, 553-576.
Cuesta-Albertos, J.A., Gordaliza, A., and Matrán, C. (1998). Trimmed best k-nets: a robustified version of an L∞-based clustering method. Statist. Probab. Letters 36, 401-413.
Cuesta-Albertos, J.A., García-Escudero, L.A., and Gordaliza, A. (1999). Trimmed best k-nets: asymptotics and applications. Preprint.
Dalenius, T. (1950). The problem of optimum stratification. Scandinavisk Aktuarietidskrift 33, 203-213.
David, G. and Semmes, S. (1993). Analysis of and on Uniformly Rectifiable Sets. Mathematical Surveys and Monographs, Vol. 38, American Mathematical Society, Rhode Island.
David, G. and Semmes, S. (1997). Fractured Fractals and Broken Dreams. Clarendon Press, Oxford.
Dharmadhikari, S. and Joag-Dev, K. (1988). Unimodality, Convexity and Applications. Academic Press, Boston.
Diday, E. and Simon, J.C. (1976). Clustering analysis. Digital Pattern Recognition, 47-94 (ed., K.S. Fu). Springer, New York.
Elias, P. (1970). Bounds and asymptotes for the performance of multivariate quantizers. Ann. Math. Statist. 41, 1249-1259.
Eubank, R.L. (1988). Optimal grouping, spacing, stratification, and piecewise constant approximation. SIAM Review 30, 404-420.
Falconer, K.J. (1985). The Geometry of Fractal Sets. Cambridge University Press, Cambridge.
Falconer, K.J. (1990). Fractal Geometry. Wiley, Chichester.
Falconer, K.J. (1997). Techniques in Fractal Geometry. Wiley, Chichester.
Fang, K.-T. and Wang, Y. (1994). Number-theoretic Methods in Statistics. Chapman and Hall, London.
Federer, H. (1969). Geometric Measure Theory. Springer, Berlin-Heidelberg-New York.
Fejes Tóth, L. (1959). Sur la représentation d'une population infinie par un nombre fini d'éléments. Acta Math. Acad. Sci. Hung. 10, 299-304.
Fejes Tóth, L. (1972). Lagerungen in der Ebene, auf der Kugel und im Raum. Second Edition. Springer, Berlin.
Fleischer, P.E. (1964). Sufficient conditions for achieving minimum distortion in a quantizer. IEEE Int. Conv. Rec., part 1, 104-111.
Flury, B.A. (1990). Principal points. Biometrika 77, 33-41.
Forney Jr., G.D. (1993). On the duality of coding and quantization. Coding and Quantization, 1-14 (eds., R. Calderbank et al.). DIMACS Vol. 14, American Mathematical Society.
Fort, J.C. and Pagès, G. (1999). Asymptotics of optimal quantizers for some scalar distributions. Preprint.
García-Escudero, L.A., Gordaliza, A., and Matrán, C. (1999). A central limit theorem for multivariate generalized trimmed k-means. Ann. Statist. 27, 1061-1079.
Gardner, W.R. and Rao, B.D. (1995). Theoretical analysis of the high rate vector quantization of LPC parameter. IEEE Trans. Speech Audio Processing 3, 367-381.
Garkavi, A.L. (1964). The best possible net and the best possible cross-section of a set in a normed space. Amer. Math. Soc. Translations 39, 111-132.
Gersho, A. (1979). Asymptotically optimal block quantization. IEEE Trans. Inform. Theory 25, 373-380.
Gersho, A. and Gray, R.M. (1992). Vector Quantization and Signal Compression. Kluwer, Boston.
Gilat, D. (1988). On the ratio of the expected maximum of a martingale and the Lp-norm of its last term. Israel J. Math. 63, 270-280.
Goddyn, L.A. (1990). Quantizers and the worst-case Euclidean traveling salesman problem. J. Combinatorial Theory Series B 50, 65-81.
Graf, S. and Luschgy, H. (1994a). Foundations of quantization for random vectors. Research Report No. 16, Applied Mathematics and Computer Science, University of Münster.
Graf, S. and Luschgy, H. (1994b). Consistent estimation in the quantization problem for random vectors. Trans. Twelfth Prague Conf. Inform. Theory, Stat. Decision Functions, Random Processes, 84-87.
Graf, S. and Luschgy, H. (1996). The quantization dimension of self-similar sets. Research Report No. 9, Dept. of Mathematics and Computer Science, University of Passau.
Graf, S. and Luschgy, H. (1997). The quantization of the Cantor distribution. Math. Nachrichten 183, 113-133.
Graf, S. and Luschgy, H. (1999a). Quantization for random vectors with respect to the Ky Fan metric. Submitted.
Graf, S. and Luschgy, H. (1999b). Quantization for probability measures with respect to the geometric mean error. Submitted.
Graf, S. and Luschgy, H. (1999c). Rates of convergence for the empirical quantization error. Submitted.
Gray, R.M. (1990). Source Coding Theory. Kluwer, Boston.
Gray, R.M., Neuhoff, D.L., and Shields, P.C. (1975). A generalization of Ornstein's distance with applications to information theory. Ann. Probab. 3, 315-328.
Gray, R.M. and Davisson, L.D. (1975). Quantizer mismatch. IEEE Trans. Communications 23, 439-443.
Gray, R.M. and Karnin, E.D. (1982). Multiple local optima in vector quantizers. IEEE Trans. Inform. Theory 28, 256-261.
Gray, R.M. and Neuhoff, D.L. (1998). Quantization. IEEE Trans. Inform. Theory 44, 2325-2383.
Gruber, P. (1974). Über kennzeichnende Eigenschaften von euklidischen Räumen und Ellipsoiden I. J. Reine Angew. Math. 265, 61-83.
Gruber, P.M. and Lekkerkerker, C.G. (1987). Geometry of Numbers. Second Edition. North-Holland, Amsterdam.
Grünbaum, B. and Shephard, G.C. (1986). Tilings and Patterns. Freeman and Company, New York.
Haimovich, M. and Magnanti, T.L. (1988). Extremum properties of hexagonal partitioning and the uniform distribution in euclidean location. SIAM J. Discrete Math. 1, 50-64.
Hartigan, J.A. (1978). Asymptotic distributions for clustering criteria. Ann. Statist. 6, 117-131.
Hochbaum, D. and Steele, J.M. (1982). Steinhaus's geometric location problem for random samples in the plane. Adv. Appl. Probab. 14, 56-67.
Hoffmann-Jørgensen, J. (1994). Probability with a View Toward Statistics. Vol. 1. Chapman and Hall, New York.
Hutchinson, J.E. (1981). Fractals and self-similarity. Indiana Univ. Math. J. 30, 713-747.
Iyengar, S. and Solomon, H. (1983). Selecting representative points in normal populations. Recent Advances in Statistics, Papers in Honor of H. Chernoff, 579-591. Academic Press.
Jahnke, H. (1988). Clusteranalyse als Verfahren der schließenden Statistik. Vandenhoeck and Ruprecht, Göttingen.
Johnson, M.E., Moore, L.M., and Ylvisaker, D. (1990). Minimax and maximin distance designs. J. Statist. Plann. Inference 26, 131-148.
Karlin, S. (1982). Some results on optimal partitioning of variance and monotonicity with truncation level. Statistics and Probability: Essays in Honor of C. R. Rao, 375-382 (eds., G. Kallianpur et al.). North-Holland, Amsterdam.
Kawabata, T. and Dembo, A. (1994). The rate distortion dimension of sets and measures. IEEE Trans. Inform. Theory 40, 1564-1572.
Kemperman, J.H.B. (1987). The median of a finite measure on a Banach space. Statistical Data Analysis based on the L1-Norm and related Methods, 217-230 (ed., Y. Dodge). North-Holland, Amsterdam.
Kershner, R. (1939). The number of circles covering a set. Amer. J. Math. 61, 665-671.
Klein, R. (1989). Concrete and Abstract Voronoi Diagrams. Lecture Notes in Computer Science 400. Springer, New York.
Lalley, S. (1988). The packing and covering functions of some self-similar fractals. Indiana Univ. Math. J. 37, 699-709.
Lamberton, D. and Pagès, G. (1996). On the critical points of the 1-dimensional competitive learning vector quantization algorithm. Proceedings of the ESANN'96 (Bruges, Belgium), 97-101.
Li, J., Chaddha, N., and Gray, R.M. (1999). Asymptotic performance of vector quantizers with a perceptual distortion measure. IEEE Trans. Inform. Theory 45, 1082-1091.
Linder, T. (1991). On asymptotically optimal companding quantization. Problems of Control and Information Theory 20, 475-484.
Linder, T., Lugosi, G., and Zeger, K. (1994). Rates of convergence in the source coding theorem, in empirical quantizer design, and in universal lossy source coding. IEEE Trans. Inform. Theory 40, 1728-1740.
Linder, T., Zamir, R., and Zeger, K. (1999). High-resolution source coding for non-difference distortion measures: multidimensional companding. IEEE Trans. Inform. Theory 45, 548-561.
Lloyd, S.P. (1982). Least squares quantization in PCM. IEEE Trans. Inform. Theory 28, 129-137.
Lookabaugh, T.D. and Gray, R.M. (1989). High resolution quantization theory and the vector quantizer advantage. IEEE Trans. Inform. Theory 35, 1020-1033.
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. Proc. Fifth Berkeley Symp. Math. Statist. Prob. 281-297, Univ. California Press, Berkeley.
Mann, H. (1935). Untersuchungen über Wabenzellen bei allgemeiner Minkowski Metrik. Monatsh. Math. Physik 42, 417-424.
Mattila, P. (1995). Geometry of Sets and Measures in Euclidean Spaces. Cambridge Univ. Press, Cambridge.
Max, J. (1960). Quantizing for minimum distortion. IEEE Trans. Inform. Theory 6, 7-12.
McClure, D.E. (1975). Nonlinear segmented function approximation and analysis of line patterns. Quart. Appl. Math. 33, 1-37.
McClure, D.E. (1980). Optimized grouping methods. Part 1 and part 2. Statistik Tidskrift 18, 101-110, 189-198.
McGivney, K. and Yukich, J.E. (1997). Asymptotics for geometric location problems over random samples. Preprint.
McMullen, P. (1980). Convex bodies which tile space by translation. Mathematika 27, 113-121. (Acknowledgement of priority: Mathematika 28, 191.)
Milasevic, P. and Ducharme, G.R. (1987). Uniqueness of the spatial median. Ann. Statist. 15, 1332-1333.
Møller, J. (1994). Lectures on Random Voronoi Tessellations. Lecture Notes in Statistics 87. Springer, New York.
Moran, P.A.P. (1946). Additive functions of intervals and Hausdorff measure. Proc. Cambridge Phil. Soc. 42, 15-23.
Na, S. and Neuhoff, D.L. (1995). Bennett's integral for vector quantizers. IEEE Trans. Inform. Theory 41, 886-900.
Newman, D.J. (1982). The Hexagon theorem. IEEE Trans. Inform. Theory 28, 137-139.
Niederreiter, H. (1992). Random Number Generation and Quasi-Monte Carlo Methods. CBMS-NSF Regional Conference Series in Applied Math. Vol. 63. SIAM.
Okabe, A., Boots, B. and Sugihara, K. (1992). Spatial Tessellations: Concepts and Applications of Voronoi Diagrams. Wiley, Chichester.
Pagès, G. (1997). A space quantization method for numerical integration. J. Comput. Appl. Math. 89, 1-38.
Pärna, K. (1988). On the stability of k-means clustering in metric spaces. Tartu Riikliku Ülikooli Toimetised 798, 19-36.
Pärna, K. (1990). On the existence and weak convergence of k-centres in Banach spaces. Tartu Ülikooli Toimetised 893, 17-28.
Panter, P.F. and Dite, W. (1951). Quantization distortion in pulse-count modulation with nonuniform spacing of levels. Proc. Inst. Radio Eng. 39, 44-48.
Pearlman, W.A. and Senge, G.H. (1979). Optimal quantization of the Rayleigh probability distribution. IEEE Trans. Communications 27, 101-112.
Pierce, J.N. (1970). Asymptotic quantizing error for unbounded random variables. IEEE Trans. Inform. Theory 16, 81-83.
Pisier, G. (1989). The Volume of Convex Bodies and Banach Space Geometry. Cambridge University Press, Cambridge.
Pollard, D. (1981). Strong consistency of k-means clustering. Ann. Statist. 9, 135-140.
Pollard, D. (1982a). Quantization and the method of k-means. IEEE Trans. Inform. Theory 28, 199-205.
Pollard, D. (1982b). A central limit theorem for k-means clustering. Ann. Probab. 10, 919-926.
Pötzelberger, K. and Felsenstein, K. (1994). An asymptotic result on principal points for univariate distributions. Optimization 28, 397-406.
Pötzelberger, K. (1998a). Asymptotik des Quantisierungsfehlers. Quantisierungsdimension, Verallgemeinerung des Satzes von Zador und Verteilung der Prototypen. Preprint.
Pötzelberger, K. (1998b). Asymptotik des empirischen Quantisierungsfehlers und Konsistenz des Schätzers der Quantisierungsdimension. Preprint.
Pötzelberger, K. and Strasser, H. (1999). Clustering and quantization by MSP-partitions. Preprint.
Rachev, S.T. (1991). Probability Metrics and the Stability of Stochastic Models. Wiley, Chichester.
Rachev, S.T. and Rüschendorf, L. (1998). Mass Transportation Problems. Vol. 1 and Vol. 2. Springer, New York.
Rényi, A. (1959). On the dimension and entropy of probability distributions. Acta Math. Sci. Hung. 10, 193-215.
Rhee, W.T. and Talagrand, M. (1989a). A concentration inequality for the k-median problem. Math. Oper. Res. 14, 189-202.
Rhee, W.T. and Talagrand, M. (1989b). On the k-center problem with many centers. Oper. Res. Letters 8, 309-314.
Rogers, C.A. (1957). A note on coverings. Mathematika 4, 1-6.
Sabin, M.J. and Gray, R.M. (1986). Global convergence and empirical consistency of the generalized Lloyd algorithm. IEEE Trans. Inform. Theory 32, 148-155.
Schief, A. (1994). Separation properties for self-similar sets. Proc. Amer. Math. Soc. 122, 111-115.
Schulte, E. (1993). Tilings. Handbook of Convex Geometry, 899-932 (eds., P.M. Gruber and J.M. Wills). Elsevier Science Publishers.
Semadeni, Z. (1971). Banach Spaces of Continuous Functions. Polish Scientific Publishers, Warszawa.
Serinko, R.J. and Babu, G.J. (1992). Weak limit theorems for univariate k-mean clustering under a nonregular condition. J. Multivariate Anal. 41, 273-296.
Serinko, R.J. and Babu, G.J. (1995). Asymptotics of k-mean clustering under non-i.i.d. sampling. Statist. Probab. Letters 24, 57-66.
Shannon, C.E. (1959). Coding theorems for a discrete source with a fidelity criterion. IRE National Convention Record, Part 4, 142-163.
Singer, I. (1970). Best Approximation in Normed Linear Spaces by Elements of Linear Subspaces. Springer, Berlin.
Small, C.G. (1990). A survey of multidimensional medians. Int. Statist. Review 58, 263-277.
Späth, H. (1985). Cluster Dissection and Analysis. Ellis Horwood Limited, Chichester.
Stadje, W. (1995). Two asymptotic inequalities for the stochastic traveling salesman problem. Sankhyā 57, Series A, 33-40.
Steinhaus, H. (1956). Sur la division des corps matériels en parties. Bull. Acad. Polon. Sci. 4, 801-804.
Stute, W. and Zhu, L.X. (1995). Asymptotics of k-means clustering based on projection pursuit. Sankhyā 57, Series A, 462-471.
Su, Y. (1997). On the asymptotics of quantizers in two dimensions. J. Multivariate Anal. 61, 67-85.
Tarpey, T. (1994). Two principal points of symmetric, strongly unimodal distributions. Statist. Probab. Letters 20, 253-257.
Tarpey, T. (1995). Principal points and self-consistent points of symmetric multivariate distributions. J. Multivariate Anal. 53, 39-51.
Tarpey, T. (1998). Self-consistent patterns for symmetric multivariate distributions. J. Classification 15, 57-79.
Tarpey, T., Li, L., and Flury, B.D. (1995). Principal points and self-consistent points of elliptical distributions. Ann. Statist. 23, 103-112.
Tou, J.T. and Gonzales, R.C. (1974). Pattern Recognition Principles. Addison-Wesley, Reading.
Trushkin, A.V. (1984). Monotony of Lloyd's method II for log-concave density and convex error weighting function. IEEE Trans. Inform. Theory 30, 380-383.
Vajda, I. (1989). Theory of Statistical Inference and Information. Kluwer, Dordrecht.
Wagner, T.J. (1971). Convergence of the nearest neighbor rule. IEEE Trans. Inform. Theory 17, 566-571.
Webster, R. (1994). Convexity. Oxford University Press, Oxford.
Williams, G. (1967). Quantization for minimum error with particular reference to speech. Electronics Letters 3, 134-135.
Wong, M.A. (1982). Asymptotic properties of bivariate k-means clusters. Comm. Statist. Theory Methods 11, 1155-1171.
Wong, M.A. (1984). Asymptotic properties of univariate sample k-means clusters. J. Classification 1, 255-270.
Yamada, Y., Tazaki, S., and Gray, R.M. (1980). Asymptotic performance of block quantizers with difference distortion measure. IEEE Trans. Inform. Theory 26, 6-14.
Yamamoto, W. and Shinozaki, N. (1999). On uniqueness of two principal points for univariate location mixtures. Statist. Probab. Letters 46, 33-42.
Yang, M.-S. and Yu, K.F. (1991). On a class of fuzzy c-means clustering procedures. Proceedings of the 1990 Taipei Symposium in Statistics, 635-647 (eds., M.T. Chao and P.E. Cheng). Institute of Statistical Science, Academia Sinica, Taipei.
Yukich, J.E. (1998). Probability Theory of Classical Euclidean Optimization Problems. Lecture Notes in Math. 1675. Springer, New York.
Zador, P.L. (1963). Development and evaluation of procedures for quantizing multivariate distributions. Ph.D. dissertation, Stanford Univ.
Zador, P.L. (1982). Asymptotic quantization error of continuous signals and the quantization dimension. IEEE Trans. Inform. Theory 28, 139-149.
Zemel, E. (1985). Probabilistic analysis of geometric location problems. SIAM J. Algebraic Discrete Methods 6, 189-200.
Zoppè, A. (1997). On uniqueness and symmetry of self-consistent points of univariate continuous distributions. J. Classification 14, 147-158.
Symbols B(~,~) B(a, r) ~(~)
closed ball with center a and radius r, 8
o
cl
Cr,,r(P), C~,r (X) C~,oo(A) cony o~
cr(P), cr(x) Da, D~ DE(c) det(h)
Dr(a,b) dH diam(A) dimB(K) dimB(K) dimB(K) dimn(A) dimn(P) dimn(P) dims(P) dimn(P)
D,,,r(P) D..~,D__r(P) Dr, Dr(P) Dr, Dr(P) D__~(K),D~(K), D~(K)
open ball with center a and radius r, 165 Borel sets, 20 closure, 11 set of all n-optimal sets of centers for P (for X), 31 set of all n-optimal sets of centers for A of order c~, 137 convex hull of ~, 17 set of all centers of P (of the random variable X) of order r, 20, 20 lattices, 117, 118 double exponential distribution, 67 111 double Gamma distribution, 99 Hausdorff metric, 57 diameter of A, 24 lower box dimension of K, 158 upper box dimension of K, 158 box dimension of K, 159 Hausdorff dimension of A, 157 Hansdorff dimension of P, 158 upper rate distortion dimension, 161 lower rate distortion dimension, 161 rate distortion dimension, 162 set of all n-optimal quantizing measures for P of order r, 59 lower quantization dimension (of P) of order r, 155 upper quantization dimension (of P) of order r, 155 quantization dimension of P of order r, 155 (lower, upper) quantization dimension of K of order co, 155
Symbols d(x,A) E(c) e~,r(P) e,~,~(A) en,oo(P)
7~ GL(a,b) ar(~,b) H(a,b) HE(a,b) hr ?-lS(A)
H.(P) H(P), H(X) I6 int
I(P,Q) i(a) Med(X)
M,~,,.(A) M,~,~(A) Mr(A)
M~(A) id(O, ~), N(O, 1) N(e,A) P. P(a,b) p I , p o f -1
Pr es
Qr(A)
Q(L)([0, 1] 6)
Qr(P), Qr(X) Q(~R)([O,1]6)
distance from x to A, 8 exponential distribution, 67 = Vn,r(P)Ur, 137 n-th covering radius for A, 137 138 set of n-quantizers, 30 generalized logistic distribution, 99 generalized Gamma distribution, 99 Leibnitz halfspace, 9 hyper-exponential distribution, 99 94 s-dimensional Hausdorff measure of A, 157 restriction of a (Hausdorff) measure to M, 165 Renyi entropy of P, 133 differential entropy of P (of X), 133, 134 unit matrix, 54 interior, 9 average mutual information of P and Q, 161 logistic distribution, 71 set of medians of a real random variable X, 22 normalized n-th quantization error for A of order r, 31 138 57 normalized r-th moment of A, 20 146 d-dimensional normal distribution, 54, 106 normal distribution, 55 146 absolutely continuous part of P, 78 Pareto distribution, 99 image measure, 33, 162 set of discrete probabilities with at most n points in the support, 33 94 singular part of P, 78 r-th quantization coefficient of A, 78, 81 r-th lattice quantization coefficient of [0, 1]d, 114 r-th quantization coefficient of P (of X), 81 r-th regular quantization coefficient of [0, 1]a, 110
Symbols Qoo(A)
Q~)([o, 1V) q~
Rp,r S(a, b) s~,r(P), s~,r(x) SS,~,r(P), SS,~,~(X)
o% supp(/~)
T(a, b;c) U(A) VarX
U~,~(P), W,~,~(X) Vr(P), V~(X)
W(al~) W(a, b) W0(~l-) A
~x
r(a,b) r(e) A~ #(-IA) pr (7-
Iol tT[m (7_
IAI 1A D ----+ 0
IJhll~
227 covering coefficient of A (or quantization coefficient of order co), 145 lattice covering coefficient of [0, 1] d, 148 191 rate distortion function, 161 separator of a and b, 11 set of all n-stationary sets of centers for P (for X) of order r, 39 set of all elements of S~,r(P) (of S,~,r(X)) whose Voronoi diagram is a P-tesselation, 39 192 topological support of a finite measure #, 24, 165 triangular distribution, 123 uniform distribution on A, 20 variance of a real random variable X, 22 n-th quantization error for P (for X) of order r, 30 r-th moment of P (of the random variable X), 20, 20 Voronoi region generated by a E c~, 8 Weibull distribution, 73, 127 open Voronoi region generated by a E c~, 9 symmetric difference, 27 point mass at x, 25 Gamma distribution, 72 maximal finite antichain generated by e, 192 1-dimensional Lebesgue measure, 55 d-dimenisonal Lebesgue measure, 13 26 Lr-minimM metric, 33, 140 immediate predecessor of a, 191 length of a, 191 restriction of a to m, 191 a is a predecessor of 7-, 191 cardinality of A, 13 indicator function of the set A, 31 weak convergence, 57 boundary, 10 Lr(P)-norm of g, 137 Lp(Ag)-(quasi-)norm of h, 78
V+f(x, y) [~] (x,y) [I TI
one-sided directional derivative, 23 gradient, 23 integer part of the number x, 82 scalar product, 16 norm on R d, 7
Index

admissible, 111
arithmetic, 203
asymptotic covering radius, 142
asymptotic quantization error, 78
asymptotically n-optimal, 93, 96
attractor, 190
average mutual information, 161

Ball packing theorem, 50
Boundary theorem, 13
box dimension, 159

Cantor distribution, 206
Cantor set, 172, 206
center of P of order r, 20
checkerboard, 117
cluster analysis, 34
compact differentiable manifolds, 171
consistency, 62, 153
contracting similarity transformation, 190
contraction number, 190
covering, 9
covering coefficient, 145
cube quantizer, 52
curve, 180

d-asymptotics, 150
density of the thinnest covering, 146
density of the thinnest lattice covering, 149
diameter, 24
differential entropy, 133
directional derivative, 23
double exponential distribution, 69
double Gamma distribution, 99
dual lattice, 118

empirical measure, 34, 61
empirical version, 34, 57, 151
euclidean norm, 16
Existence theorem, 47, 139
exponential distribution, 67, 70

finite antichain, 191
fundamental parallelotope, 111

Gamma distribution, 72
generalized Gamma distribution, 213
generalized logistic distribution, 210

Hausdorff dimension, 157
Hausdorff dimension of a measure, 158
Hausdorff metric, 57
hexagonal lattice, 116
hyper-exponential distribution, 99

invariant distributions, 95
invariant set, 190

l_p-norms, 8
L_r-Kantorovich metric, 33
L_r-minimal metric, 33, 140
L_r-Wasserstein metric, 33
lattice, 111
lattice covering coefficient, 148
length of a curve, 180
length of σ, 191
locally finite, 8
logistic distribution, 71
lower box dimension, 158
lower quantization dimension of order r, 155
lower rate distortion dimension, 161

μ-packing, 50
μ-tesselation, 15

n-optimal partition of order r, 32
n-optimal quantizer of order r, 30
n-optimal quantizing measure of order r, 34
n-optimal set of centers, 31
n-optimal set of centers for A of order ∞, 137
n-quantizer, 30
n-stationary set of centers of order r, 39
n-th covering radius, 137
n-th quantization error of order r, 30
necessary conditions for optimality, 37, 38
normal distribution, 54
normalized n-th quantization error of order r, 31
normalized r-th moment, 20
normalized r-th moment of balls, 27

one-dimensional marginals, 49
one-tailed version, 65
open set condition, 172, 191
open Voronoi region, 9

packing, 50
parametrization, 180
parametrization by arc length, 180
Pareto-distribution, 99
polyhedral set, 17
polytope, 17
predecessor, 191
product quantizer, 42

quantization coefficient of order ∞, 145
quantization dimension of order r, 155

r-th (absolute) moment of P, 20
r-th lattice quantization coefficient, 114
r-th quantization coefficient, 81
r-th regular quantization coefficient, 110
rate distortion function of order r, 161
Rayleigh distribution, 73
rectifiable, 180
regular hexagon, 116
regular n-quantizer, 109
regular of dimension D, 165
Renyi entropy, 133

s-dimensional Hausdorff measure, 157
scaling number, 15, 190
self-similar measure, 190
self-similar set, 172
separator, 11
Sierpinski gasket, 172
sign-symmetric distributions, 96
similarity dimension, 172, 190
similarity transformation, 15
smooth norm, 23
space-filling, 107
space-filling by translation, 107
spherical distribution, 53
standard lattice, 116
star-shaped, 9
strictly convex norm, 11
strong separation condition, 190
strongly unimodal distribution, 64
surfaces of convex sets, 169
symmetric distributions, 65

tesselation, 15
triangular distribution, 123
truncated octahedron, 118

uniform distribution, 20
unimodal distribution, 64
Uniqueness theorem, 22, 64
upper box dimension, 159
upper quantization dimension of order r, 155
upper rate distortion dimension, 161

vector quantizer advantage, 104
volume of l_p-balls, 28
von Koch curve, 172
Voronoi diagram, 8
Voronoi partition, 9
Voronoi region, 7

Weibull distribution, 73, 123