Lecture Notes in Mathematics Edited by A. Dold and B. Eckmann
690 William J. J. Rey
Robust Statistical Methods
Springer-Verlag Berlin Heidelberg New York 1978
Author William J. J. Rey M B L E - Research Laboratory 2, Avenue van Becelaere B-1170 Brussels
Library of Congress Cataloging in Publication Data
Rey, William J J 1940Robust statistical methods. (Lecture notes in ~athematics ; 690) Bibliography: p. Includes indexes. 1. Robust statistics. 2. Nonparametric statistics. 3. Estimation theory. I. Title. II. Series: Lecture notes in mathematics (Berlin) ; 690. QA3.L28 no. 690 [QA2763 510'.8s [519.53 78-24262
AMS Subject Classifications (1970): Primary: 62 G 35 Secondary: 62G25, 62J05 ISBN 3-540-09091-6 ISBN 0-387-09091-6
Springer-Verlag Berlin Heidelberg NewYork Springer-Verlag NewYork Heidelberg Berlin
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically those of translation, reprinting, re-use of illustrations, broadcasting, reproduction by photocopying machine or similar means, and storage in data banks. Under § 54 of the German Copyright Law where copies are made for other than private use, a fee is payable to the publisher, the amount of the fee to be determined by agreement with the publisher. © by Springer-Verlag Berlin Heidelberg 1978 Printed in Germany Printing and binding: Beltz Offsetdruck, Hemsbach/Bergstr. 2141/3140-543210
FOREWORD
During
the
processing These
last nine
of b i o m e d i c a l
problems
assumptions
had
were
from unknown
years,
in common
without
were
the
the
ordinary
had to be r e l i a b l y
estimators
was
partly
presented
robust
estimators.
offset
of the
spurious
and v a r i a n c e estimators assess
in the
estimation.
is justified
their
without
properties,
resorting
The t h e o r e t i c a l function.
Applied
and r e g r e s s i o n aspects.
The primary
estimates
data
in this
sample.
tools
even
for
Due
toward
the
has
concern
is bias
reserved
form which
sample
sizes
design
of
any significant
of an erroneous
emphasis
small
This
methods.
is p r e v e n t i n g
analytical
model
or
reduction
to type M permits
(n=10
to
or 50),
arguments.
are m a i n l y
derivations
analysis.
special
drawn
of the
results.
oriented
second
samples
sophisticated
scatter
of the
selection
The
by their
to involved
are
concern
due to the
The
and the
of robust text
quality
and frequently
nevertheless,
comparison
solved by application
The methods
to
lot;
statistical
by the anthor.
of the usual
Poor
non-normal
estimated
needed to permit
in the
en c o u n t e r e d
fact that most
usually
parameters
been
problems
been
any solid basis.
distributions,
non-stationary,
several
data have
are
the jackknife in the
attention
fields
and the
influence
of location
is devoted
estimation
to c o m p u t a t i o n a l
TABLE OF CONTENTS
I. INTRODUCTION
I
1.1. History
and main contributions
1.2. Why robust
estimations
I
?
4
1.3. Summary
6
2. ON SAMPLING D I S T R I B U T I O N S 2.1.
8
Scope of section
2.2. Metrics
8
for p r o b a b i l i t y d i s t r i b u t i o n s
2.3. D e f i n i t i o n of robustness, 2.4. Estimators
8
breakdownpoint
seen as functional of d i s t r i b u t i o n s
2.5. The influence function of Hampel
15
3. THE JACKKNIFE 3.1.
17
Introduction
17
3.2. Jackknife theory 3.3.
Case
11 12
20
study
24
3.4. Comments
28
h. M - E S T I M A T O R S
30
4.1. Warning
30
4.2. M M - e s t i m a t o r s
32
4.3. M - e s t i m a t o r s 4.3.1.
Location and r e g r e s s i o n
4.3.2.
Least powers
4.3.3.
Can we expand in Taylor
4.3.4.
"Best" robust
4.4. M M - e s t i m a t o ~
4.5.
in location and r e g r e s s i o n
38 38 43
series ?
46
location estimator
~9
in regression
estimation
60
4.4.1. R e l a x a t i o n methods
62
4.4.2.
Simultaneous
64
4.4.3.
Some proposals
4.4.4.
Illustration
solutions
Solution of fixed point
65 68 and non-linear
equations
5. OPEN AVENUES 5.1. Estimators
72 77
seen as functional
of d i s t r i b u t i o n s
77
5.2. Sample d i s t r i b u t i o n of estimators 5.3. A d a p t a t i v e estimators
78 80
5.4. R e e u r s i v e
82
estimators
Vl
5.5.
Other
views
on r o b u s t n e s s
84
6. R E F E R E N C E S
88
APPENDIX
100
Consistent
101
estimator distributions
102
Contaminated
normal
Distribution
space
106
Influence
function
108
Jackknife
technique
109
Prokhorov
metric
114
derivative
120
118
Robustness von M i s e s AUTHOR SUBJECT
INDEX INDEX
~2~129
I.
INTRODUCTION
The t e r m statistical
"robustness" definition.
in
1953 to
cover
by
Kendall
and B u c k l a n d
does It
a rather
not
seems
vague
lend
itself
to h a v e
concept
(1971).
Their
to
been
a clearcut
introduced
by G . E . P .
described
in the
following
dictionary
states
:
Box way
" R o b u s t n e s s : M a n y test p r o c e d u r e s i n v o l v i n g p r o b a b i l i t y l e v e l s d e p e n d for t h e i r e x a c t i t u d e on a s s u m p t i o n s c o n c e r n i n g the g e n e r a t i n g m e c h a n i s m , e.g. t h a t the p a r e n t v a r i a t i o n is N o r m a l ( G a u s s i a n ) . If the i n f e r e n c e s are l i t t l e a f f e c t e d by d e p a r t u r e f r o m t h o s e a s s u m p t i o n s , e.g. if t h e s i g n i f i c a n c e p o i n t s of a t e s t v a r y l i t t l e if the p o p u l a t i o n d e p a r t s q u i t e s u b s t a n t i a l l y f r o m the n o r m a l i t y , the t e s t s on the i n f e r e n c e s are said to be r o b u s t . In a r a t h e r m o r e g e n e r a l sense, a s t a t i s t i c a l p r o c e d u r e is d e s c r i b e d as r o b u s t if it is not v e r y s e n s i t i v e to d e p a r t u r e f r o m the a s s u m p t i o n s on w h i c h it d e p e n d s . " This
quotation
various with
can be
associates
procedures.
expressed
applicability robust
clearly
statistical
as f o l l o w s
of a g i v e n
against
some
robustness The
how
we d e s i g n
a statistical
procedure
to
safe
of p o s s i b l e
data
set
I.I.
History
Due
to t h e
increased
and main
is the
met
domain
equivalently, ?
Second,
robust
or,
uncertainty
of is it
how
in o t h e r
in the
available
contributions
appearance
For
mathematics
relates
:
field
the and
as
analytical
thirty
a mode it has
can
stands
some
be
line
seen
twenty
as
received
Mainly,
algorithms in the
possibly
been
has
years.
in r e c u r s i v e it
as w e l l
of r o b u s t n e s s
last
However,
instance,
of l o c a t i o n ,
Thucydides
the
during
developments.
studies.
of i n v o l v e d
facilities,
attention
in n o n - l i n e a r
estimate
to be
of the
questions
?
computational
the n e w
or,
assumptions
should
in s p i t e
applicability
large
procedure
f r o m the
terms,
remain
with
complementary
: first,
statistical
departure
two
as
four
have
an
progresses permitted
of m a n y
old
a robust centuries
ago.
" D u r i n g t h e same w i n t e r (h28 B . C . ) , the P l a t a e a n s ... and t h e A t h e n i a n s who w e r e b e s i e g e d w i t h t h e m p l a n n e d t o l e a v e the c i t y and climb o v e r the e n e m y ' s w a l l s in the h o p e that t h e y m i g h t be able to f o r c e a p a s s a g e ... T h e y m a d e l a d d e r s e q u a l in h e i g h t to the e n e m y ' s w a l l , g e t t i n g the m e a s u r e by c o u n t i n g the l a y e r s of b r i c k s at a p o i n t w h e r e the e n e m y ' s w a l l on the side f a c i n g P l a t a e a h a p p e n e d not to have b e e n whitewashed. M a n y c o u n t e d the l a y e r s at the same t i m e , and w h i l e s o m e w e r e s u r e to m a k e a m i s t a k e , the m a j o r i t y w e r e l i k e l y to hit the true c o u n t , e s p e c i a l l y since t h e y c o u n t e d t i m e and a g a i n , and, b e s i d e s , w e r e at no g r e a t d i s t a n c e , and the part of the w a l l t h e y w i s h e d t o see was e a s i l y v i s i b l e . The m e a s u r e m e n t of the l a d d e r s , t h e n , t h e y got at in this way, r e c k o n i n g the m e a s u r e f o r m the t h i c k n e s s of t h e b r i c k s . " Eisenhart
(1971)
sophisticated
has
also
compiled
procedures
to
background
can also
other
estimate
examples
a location
of m o r e
or less
parameter
for
a set
of
measurements. Historical of a f f a i r s that
the
around
adoption
to u n m a n a g e a b l e accepted; seen
that
was
done
well
as in the
this
history
papers
surveyed
by H a r t e r
in the
complementarily
important
works
clarified estimation framework
of t h e i r
of H u b e r We w i l l
relations
and
finite
where
Particularly, derivations.
the
they
theory provide
past
of y o n
which
size
in w h a t
exist
time,
in the
an
more
(1972-1973) later
join
best
were
important
procedures report
series
as
of
of
specifically
developments
is r o b u s t n e s s
to
are
and them.
we n o w
sketch
some
:
Mises
practical
dogma
A remarkable
regard
state
largely
the
rejection
and t h e o r e t i c a l
(1973).
recent
that
of t h e
it a p p e a r s
as s e c o n d
was
with
found
With
Lecture
of the
At
can be
1972 W a l d
insight
seen
of n o r m a l i t y
by
Hampel
was
soundness.
time
account
Briefly,
of v a r i o u s
(197h-1976).
investigations the
dogma
discarded.
motivations
some m o r e
- The
techniques
investigation
up to the p r e s e n t
by the
(1973).
in d i s a g r e e m e n t
and h a d to be
history,
To g a i n
The
observations
in the
be g a i n e d
in S t i g l e r
squares
approaches.
assessment
written
robustness
given
of l e a s t
is,
as e r r o n e o u s
effort
1900
(19h7)
between
and P r o k h o r o v
asymptotic
situations.
of r o b u s t n e s s justifications
has
been
for the
(1956)
theory
of
They
provide
able
to grow.
analytical
have
a neat
- A mathematical by
Tukey
and
(1958),
estimate
the
distribution released
- A review
on h o w
of t h e - The
answers He
famous
to t h e
the
principles has
into
question
on h o w to
design
asymptotic sample)
have
Further, family
underlying we
here
augmented
the bias
regard
for the
is thus
rejection and
of
experimental
appearing
in the
places,
for o t h e r
the
normal
estimators
estimation
pioneer
p-values.
work
squares
seems
to be
and c h a r a c t e r i z e s location,
normal.
In this
minimization estimations
of G e n t l e m a n
The m a x i m u m
procedures.
set of u n d e r l y i n g
of the
through
least
of
F (the
suitable
M-estimators
obtained
origin
: "A c o n v e n i e n t
is a c o n t a m i n a t e d
classical
at the
statistical
excerpt
some
for the
estimators the
has b e e n
(n + ~) w h e n
over
introduces
them
distribution
find the
their to
variance
ranges
he
among
p of r e s i d u a l s ;
referred
to t h e
robust
we
for a s y m p t o t i c a l l y
of the
framework,
and
assumptions.
theoretical
196h,
of d i s t r i b u t i o n s ;
of the
power
any
distribution
observations
in
of r o b u s t n e s s
the
without
pertaining
account
(1956)
to reduce
statistician
stimulated
of H u b e r ,
supremum
when
The
paper
distributions"
permits
distributions.
domains
robust
set.
questionable
distribution
a most
Quenouille
estimators
data
(1960)
to t a k e
sample
considers
measure
of the
by
technique,
of m o s t
the
frequently
by A n s c o m b e
researches tails
variance
some
proposed
jackknife
underlying
of
outliers
trick
the
(1965)
likelihood
of some (p = 2)
is
estimators
also
are M - e s t i m a t o r s . A thesis
-
introduces
the
an e s t i m a t o r estimation other
far
we r e f e r
Statistics,
worthy
Among
Berger -
(N.C.H.S.
the
(1971), (1976a,
Thus,
to d e p e n d
work,
the
sensitivity
it p e r m i t
of
to modify
on o u t l i e r s ,
or on any
observations. of t h e
interested with
list
that
question and,
not
to yon M i s e s ' exhibiting
the
period
prior to
r e a d e r to the U.S.
1970
annotated
National
Center
is bibliography
for H e a l t h
- 1972).
the
to m e n t i o n
them,
Jaeckel
the
a tool
values.
literature
arrangement
To c o n c l u d e
-
be
as the
of the
related
as
in o r d e r
feature
under
(1968), curve
observation
procedures
concerned, prepared
Hampel
to the
specific - As
by
influence
of m a i n
many
of what
whether
theoretical
fundamental we
really
estimators
are
contributions,
questions estimate
remain
it m a y
open.
is a p p r o a c h e d
admissible
by
is a n a l y s e d
by
1976b).
Complementary
to the t h e o r e t i c a l
progress,
experience
has
been
gained
through
Princeton has
Monte
Study
displayed
Carlo
(Andrews
computer
et
in an o b v i o u s
al.
runs.
- 1972)
w a y the
In this
specially
need
for
respect, deserves
robust
the mention;
it
statistical
procedures.
1.2.
robust estimat|ons
Why
It seems robust
statistical
computing analysis
robust
with
methods In
computers
sets
then
as w e l l
contribute
the m a i n
lies
in the
short,
rather
softwares.
draws
to the
the
that
classical
elaboration
for making
overwhelming
frequently,
robustness
use
of
of our statistical
sets
are
results
produced
deficiencies
applied
by
of the
statistical
is e s s e n t i a l
statistics.
and to the
data
with
on p o s s i b l e of the
power
to p e r f o r m
Comparison
attention
it a p p e a r s
to
motivation
it is so easy
as on l i m i t a t i o n s
Thus,
complementary
that
that,
by u n s u i t a b l e
methods
procedures. is
author
facilities.
processed
data
to t h i s
?
One
and
validation
of
the
because
it
other must
statistical
conclusions. Many apply has
statisticians
when
been
the
see e n t r y
arrive
than at
normal
by t h e
the m o s t
illustrated
by T u k e y
(1960)
distributions.
deviation.
efficient
robustness
have
a long
normal that
been
layed
excerpt
:
scale
Some
paper down
large
fact m u s t
some
justifications (1973).
kept
study
- and
are r e q u i r e d
deviation
standard
This
by H a m p e l
be
this
observations
the
This
of
appendix
sizes
standard
that
study
extended
of the
thousand
0.95)
they
assumptions.
in his
sample
by the
eight
the m e t h o d s the
further
distribution"
(at l e v e l
estimator.
incentive
We h a v e
quite
of the
to g u a r a n t e e
In a r a t h e r
propose
satisfy
conclusion
mean
can be
strictly
the m e a s u r e m e n t
disposal
how poor
do not
"contaminated
at t h e
justify
do not k n o w
sets
brillantly
contaminated -
data
to
rather should
be
deviation
is
in m i n d .
to the use
of
Hereunder,
we
"What do those "robust estimators" intend ? S h o u l d we g i v e up our f a m i l i a r and s i m p l e m o d e l s , such as our b e a u t i f u l a n a l y s i s of v a r i a n c e , our p o w e r f u l r e g r e s s i o n , or our h i g h - r e a c h i n g c o v a r i a n c e m a t r i c e s in multivariate statistics ? The a n s w e r is no; but it m a y w e l l be a d v a n t a g e o u s to m o d i f y them slightly. In f a c t , g o o d p r a c t i c a l s t a t i s t i c i a n s h a v e d o n e such m o d i f i c a t i o n s all a l o n g in an i n f o r m a l way; we n o w o n l y start to h a v e a t h e o r y a b o u t t h e m . Some l i k e l y a d v a n t a g e s of such a f o r m a l i z a t i o n are a b e t t e r i n t u i t i v e i n s i g h t into t h e s e modifications, improved applied methods (even r o u t i n e m e t h o d s , for some a s p e c t s ) , and the c h a n c e of h a v i n g p u r e m a t h e m a t i c i a n s c o n t r i b u t e s o m e t h i n g to the p r o b l e m . Possible disadvantages may arise along the usual transformations of a t h e o r y w h e n it is u n d e r s t o o d less and less by m o r e and m o r e people. D o g m a t i s t s , who i n s i s t e d on the use of " o p t i m a l " or " a d m i s s i b l e " p r o c e d u r e s as l o n g as m a t h e m a t i c a l t h e o r i e s c o n t a i n e d no o t h e r c r i t e r i a , m a y n o w be g o i n g t o i n s i s t on " o p t i m a l r o b u s t " or " a d m i s s i b l e r o b u s t " e s t i m a t i o n or t e s t i n g . T h o s e who h a b i t u a l l y try t o lie w i t h s t a t i s t i c s , r a t h e r t h a n seek f o r t h r u t h , m a y c l a i m even m o r e d e g r e e s of f r e e d o m for t h e i r w i c k e d doings. "Now w h a t are the r e a s o n s f o r u s i n g r o b u s t procedures ? T h e r e are m a i n l y t w o o b s e r v a t i o n s w h i c h c o m b i n e d g i v e an a n s w e r . O f t e n in s t a t i s t i c s one is u s i n g a p a r a m e t r i c m o d e l i m p l y i n g a v e r y l i m i t e d set of p r o b a b i l i t y d i s t r i b u t i o n s t h o u g h t p o s s i b l e , such as t h e c o m m o n m o d e l of n o r m a l l y d i s t r i b u t e d e r r o r s , or t h a t of exponentially distributed observations. Classical (parametric) statistics derives r e s u l t s u n d e r the a s s u m p t i o n t h a t t h e s e m o d e l s w e r e s t r i c t l y true. However, apart f r o m some s i m p l e d i s c r e t e m o d e l s p e r h a p s , such m o d e l s are n e v e r e x a c t l y true. We m a y t r y to d i s t i n g u i s h t h r e e m a i n r e a s o n s for the d e v i a t i o n s : (i) r o u n d i n g and g r o u p i n g and o t h e r " l o c a l i n a c c u r a c i e s " ; (ii) the o c c u r r e n c e of " g r o s s e r r o r s " such as b l u n d e r s in m e a s u r i n g , w r o n g d e c i m a l p o i n t s , e r r o r s in c o p y i n g , i n a d v e r t e n t m e a s u r e m e n t of a m e m b e r of a d i f f e r e n t p o p u l a t i o n , or just " s o m e t h i n g w e n t w r o n g " ; (iii) the m o d e l m a y have b e e n c o n c e i v e d o n l y as an a p p r o x i m a t i o n a n y w a y , e.g. b y v i r t u e of the central limit theorem".
1.3.
Summary
It a p p e a r s fact
must
be
that
most
faced
difficulty.
When
and
concerning
at d i s p o s a l
as w e l l
factors thus
(e.g.,
parameter. to t h e i r
given
any
model
very at the
is the
robustness Section compare
level;
2 provides
that
difficulty
the to
its
but
the
some
and,
the
by r e f e r e n c e sizes.
estimators
we w i l l
under
encounter
parameter.
appears
may
However
what
be u n c l e a r .
at the
theory
of the
and p e r m i t
root
of all
h is the
on
leads
central
(or
part
and
covariance)
several
generalizations
in m a n y
respects.
5 reconsiders
a few
met,
on the
6 is a l i m i t e d
the
not
are
has
the
been
to form
the
possible
in
We m a y h o w e v e r
scarcely of r o b u s t n e s s
several
of this
side
so far,
through
jackknife are
results.
It is a d e t a i l l e d
M-estimators problems.
they
which were
(denoted
This
developments
questions
because
the
as w e l l
derivations
conjectural
text.
of p r e v i o u s
specific
in e s t i m a t o r s
presented
in r e g r e s s i o n
original
are,
permit
to v a l i d a t e
years.
bias
of s i m u l t a n e o u s
presents
questions
result
to a d e f i n i t i o n
of p o s s i b l e
substantiate
left
has
thirty
of v a l i d i t y
of our k n o w l e d g e ,
applications
but
last
which
function.
variance
of M - e s t i m a t o r s
The
is p r e s e n t e d
demonstration the
directly
reduction
to
derivations
value.
argument
conditions
influence
3 provides
emphasis
an
during
needed
original
Section
and it
asymptotic
a strict
attempts
To the b e s t
of t h e
therefrom,
various
estimate
with
sample
compare
few t h e o r e t i c a l
expansion;
This
as e s t i m a t i o n
Most
infinite to
to
analysed
is e r r o n e o u s
method.
Section
be
assumption
conceptual
as to the
previously
parameter
sample
issues
These
an e s t i m a t o r
will
for
of t h e
the
principles.
restrictive.
with
obtained
quality
function).
of the
it is p o s s i b l e
the
concerning
loss
estimators
some
the m o d e l
several
Section
optimal
this
conceptual
is due to
to
structure,
analysis
model,
uncertainty
some
basis;
it
consistent
spite
Section
to
a weak
are
obtained
as w e l l
partly
of c o m p a r i n g
asymptotic
an e s t i m a t o r
conjecture
all
have
required,
definition
values
which
when
of a T a y l o r - l i k e
of
of an
clear
often
estimators
methods
statistical
any p o s s i b i l i t y
asymptotic
is e s t i m a t e d This
the
are
as, p e r h a p s ,
In p r a c t i c e ,
Furthermore,
robust
methods
selection
prohibit
suppress
robust
can be r e l a t e d
those
uncertainty
involved
of the
and
have not
MM),
part is
been
timely.
open.
bibliography.
We h a v e
tried
to
favor
recent
papers
and
surveys
investigation. very
We
arbitrary.
in the
domains
are w e l l
aware
Furthermore,
we
which
are
of the feel
secondary
fact
that
limited
to
such
our m a i n selection
in r e a d a b i l i t y ,
time
is and
space. Throughout asymptotic finite methods
this
text,
properties
sample
we
have
and h a v e
estimators.
in a p p l i c a t i o n s .
This
not
devoted
placed option
the
much
accent
results
attention
to the
on the b e h a v i o u r
from
the
need
of
of r o b u s t
2.
ON SAMPLING DISTRIBUTIONS
2.1.
Scope o f
the section
Hereinafter,
we
relationships relative
intend
existing
to
the
to
sketch
between
relations
two
theories
distributions
between
relative
for
distributions
one and
of
to
the
them,
and
estimators
for
the
other. To
fix
the
ideas,
consider
m =
[ w i xi,
the
following
standard
set
of
equations
(i=I ..... n ) ,
=
s
X w.(x.-m)
=
1
2
1
,
= lira m, n -* ~, 2 2 c* = lira s , n ÷ ~. Under of
appropriate
the
the
one
dimension
present
relative
conditions, sample
formulation
weight
wi,
it
be
(Xl,...,Xn)
each
which
can
used , as
to
estimate
well
as
the
its
location
scale
o.
observation
could
x. has b e e n a t t r i b u t e d I reflecting its i m p o r t a n c e or
be
In a its
accuracy. To
investigate
underlying such
the
that
The
xi,
interest
the
present the
This
will
case,
the
asymptotic
s
2
=
~
2
+
lie
in t h e
itself
2.2.
Metrics
for
indicated
to
easy
probability by
to
w i as are the
[ w i [(xi-u)2
Munster
Baire
~ on t h e to
lead
sample with
well
to
distributions
"closeness"
distribution
respect
as t o
the
enlightened scale
- ~2I
distribution
compare
of
concern.
dependence
weight
and
a tool
distributions first
relations
lends
As
the
U
need
our
estimators;
which
distributions
also
in
relative
be
of
we
between
will
specifically, to
dependence
"closeness"
estimators.
more
the
observations,
by
parameter
This
will
of m
each
sample
also
be
our
n.
s or,
In t h e
in t e r m s
given
(x i - u)
and
observation
size
expansions is
- [ [ wiw j
analysis.
to
(xj
of
by
- u),
second
concern.
distributions (197h),
functions,
we nor
do the
not
have
to
application
restrict space
our to
Borel
sets;
although,
in p r a c t i c e ,
we
of f u n c t i o n s
and the
Euclidean
distribution
we m a k e
use
o v e r the the
n-dimension
appendix
concerned one
at the
here
another;
be
reduced
only
assess
to the
with
the
are
the
in our
help
They
- They may
- They
context
simple
in
been
distributions
and,
thus,
been
they
proposed
are
revie~
to
generally
of K a n a l
following
not
(197~)
or
deficiencies
of c o n t i n u o u s
appropriate
to
compare
its u n d e r l y i n g and,
we n o w
latter
an e m p i r i c a l
parent.
thus,
are not
applicable
is,
need
accordingly conclude,
former
limit
respect,
the
I.~, in
a subset
The
achieved
over
Prokhorov
subsets
present
metric level
is - see
understanding
the
of
here
association
sample
space;
measure of the
of
the
of the
is d e r i v e d sample
considerations.
of p r o b a b i l i t y holds
an i l l u s t r a t i o n .
it
details
probability
distance
supremum
inequality
1956),
its
to
of a d i s c r e t e
of t h e
of the
of d i m e n s i o n a l i t y to
suitable
just m e n t i o n e d
one t h r o u g h
subset.
to the by
helpful
a continuous
measure
comparison
help
the
a very with
in the
points
the
triangular
ourselves
help
three
with
(sec.
involved
- To
the
is r e l a t i v e
incidentally
in this
rather
with
this
distance
Prokhorov
a continuous
measures
the
only
permits
independent
its d e f i n i t i o n accordingly, only
with
over
probability therefore,
by
consider
inequality, distribution
its p a r e n t . the
appendix
performed
distribution
from the
from
metric
of the
is t h e n
empirical
although
in the
Prokhorov
observation
We
later
the
closeness
triangular
an
proposed
distribution
comparison
the
knowledge
idea
metric"
The
empirical
and,
not ~ith
differing
of our
has
analytics,
above.
fact,
m a y be to
:
one-dimension
compare
its m a i n
"Prokhorov
and
in
only
will
have
the
(1976),
of the
are
satisfy
to
distribution To the b e s t
each
but,
strictly
rarely
condition
its
be
Rn .
on
our
distances
distribution
be
given
- We w i l l
estimators
however
through
of Chen
a measure
distributions,
class
defined
are
distributions
between
definitions
Going
paper
for m o s t
function
space"
close
Baire
discontinuous
precisions
sampling
of d i s t r i b u t i o n s ,
of the
(discrete)
of t h e i r
the
of i n t e r e s t .
context.
provide
how
of c l o s e n e s s
with
fundamental Dirac
"distribution
of " d i s t a n c e "
closeness
observed -
are
at ease The
classical
of m e a s u r i n g
closeness
number
acceptable
on
assessment
distributions
A great
is the
feel R n.
R n - Complementary
section
by ways
the
of
space
only space
space In
measures
true. definition
and will
of i n f o r m a t i o n .
We
10
We
estimate
densities
Distribution
where
the
distance
of p r o b a b i l i t y
two
distributions
some
derived
from
f through
g(x)
=
f(x)
+ t ~(X-Xo),
(l-t)
centered
of the
on x0,
sample
space
integration
simplicity, that
variable
we a s s u m e
g ( x - x 0)
I
is a D i r a c
if x ~ x 0
f ~(X-Xo)dX
the
0 < t <
and
by t h e
space.
i.e.
~(x-x O) = O,
with
defined
n-dimension
g is
x is any p o i n t
function
between
f and g o v e r
that
spanning
f(x)
= I
the
satisfies
whole
R n space.
the m e a n
value
For
theorem
in x 0
is F(r)
=
f f(x)
dx = f(x O) V(r)
I X-Xoi < r where V(r) is the
volume
distance
of the b a l l
between
The
inequality
can be
E may
[~nl2/p(1+nl2)]rn
of r a d i u s
f and g is g i v e n
d(f,g)
therefrom
=
be
= sup inf r E omitted
seen
r centered
Then
the
by
{E:G(r)
when
% F(r+E)
f(x)
as a f u n c t i o n G(r)
in x O.
is
+ E}.
sufficiently
regular,
of r in
= F(r+e)
+ E
or t + (l-t) and the
distance
r-values.
This
is g i v e n supremum
fCx0)
V(r)
= f(x O) V ( r + e )
by the m a x i m u m is r e a l i z e d
g-value
by r=O.
We
for
+ E all p o s i t i v e
obtain
the
recursive
solution d(f,g) where
the
general
last
terms
term we
= t - f(x 0) V [ d ( f , g ) ]
is n e g l e g i b l e
observe
that
with
small
t and n >
I.
In q u i t e
11
d(f,g) but
we m u s t
frequently
2.3.
warn
Definition
This Hampel
the
requires
of
paragraph (1971)
reader fairly
< t <
I;
: Evaluation involved
robustness,
of t h i s
Prokhorov
distances
arguments.
breakdown-point
essentially
restates
in o t h e r
- For technical
details,
the
words
reader
the
viewpoint
is r e f e r r e d
of
to the
appendix. We c o n s i d e r distribution let
e n be the
noted and
terms
e
a more
The
and g(x);
to
drawn
estimate
distribution
f(x).
from
of this
generally
valid model,
say g(x).
We
dependent
upon
scarcely
is we e x p e c t
¢(en,f)
some
some p a r a m e t e r
But
if it is that
{ X l , . . . , x n}
be u s e d
sampling
upon
or less
is r o b u s t
n
f(x)
set w i l l
and d e p e n d s
have
that
between
of o b s e r v a t i o n s
this
estimate.
¢(en,f)
only
a set
f(x);
we
8,
estimate
do not say
know
is f(x)
in c o a r s e
the
difference
and ¢ ( a n , g )
to be
close. More
precisely,
%
is
said to be
robust
with
respect
to
distribution
n
g (and to
f)
if d(f,g)
for
small
although
positive ~ is not
This are
not
value
close
respectively
< ~ ~ d[ ¢ ( a n , f ) , ¢(en,g)]
a and
~.
so s m a l l
~
is t h e
enough
It m a y but
so-called
(differ
on f(x)
and
happen
remains
more
on g(x)
that
<
e
small
inferior
to
breakdown-point.
than may
be
5*),
the
quite
c be a l l o w e d
some
critical
If f(x)
estimates
different.
value
and g(x)
of 8 n b a s e d Then,
e
is n
not
anymore The
robust.
limited
realism
of t h e s e
definitions
must
be
indicated.
The
condition d(f,g) covers fact
any
that
some
extraneous estimator
distribution
is
particularly family.
f(x),
reasons.
f(x)
in the
close
to g ( x ) ,
Thus,
it
seen
as r o b u s t
when
f(x)
Nevertheless,
~-neighbourhood can be
sometimes
for
some
is r e s t r i c t e d there
<
exist
not
acceptable
appears
subset
of
that
despite
the
for
a non-robust
distributions
to be m e m b e r
non-robust
of g(x)
of some
estimators;
f(x),
parametric a startling
12
case
is t h e
mean,
contaminations c ~
I).
the
median
with
In this
~
of
=
leading
I/2.
to
emphasis
on
Up
2.4. In
of
let
seen on
"differentiable and
The as
basic
points
rather The
are
an
and now
of t h e in
are
which
is two
interested
be
in the
of
robustness
already
mentioned to
of
idea
is
least
distributions of
way
sample
some
of the
of t h e
let
sample
as w e l l
a~d
space
g, an
present
between
the
two
Mises.
can
be
estimator, T(f)
assume
as c o n t i n u o u s ,
this
of
of v o n
which
it be we
sample.
of
domain
argument
f and
space,
we
yon
expansion
underlying context
Hereinafter
the
distributions,
Taylor-like
defined
distributions;
difference
the
subset.
distributions
some
special
particularly
respect
introduce
over
with
The
delineation
discrete
of the
(1976)
with
distribution
defined
without
samples.
(1964),
distribution
us
Hampel
contrary
"best"
functions".
let
outlying
(large
robustness
(1969).
a not-clearly
on t h e s e
could
Huber
On the extreme
definition
properties
original
a convenient
a functional,
of
in a h e u r i s t i c
first,
material
in
estimators
both
asymptotic
by
be
small
mean
investigation
one-dimension
above
functionals
statistical
But
the
= 0.
can An
of a d i s t r i b u t i o n
valid
provide
application.
be
~
of the
exhibits
sample
with
Effectively,
offset
is
reported
the
lectures
terms
is
been
that
to
as
the
in
expansion
theory
of
mean.
large
estimation.
approach
introduces
estimators
This
has
add
distribution
(1947)
of t h e
of
estimation
us
any
distribution
a half
estimator
a paper
arithmetical
yield
breakdown-point
error
minimax
Estimators
Mises
to
a group
an
favorable
can
the
utility
the
in
design
case
location
with
surveyed to
5)
important
conclude
concurs
ordinary
a one-dimension
breakdown-point
To
the
(small
thus
viewed or
or T(g). common
to
far.
We
functionals
T(f)
T(g).
and
Consider between
the
f and
functional
g,
T(h)
where
h is
a distribution
intermediate
precisely h(x)
=
(l-t)
f(x)
+ t g(x)
with 0 < t ~ This the
functional real
variable
is, t
under and
suitable
can
be
I.
conditions,
expanded
a continuous
in T a y l o r
series,
function i.e.
of
13
TCN)
which
is c o n v e r g i n g
coefficients to the For
= TCf)
inasmuch
a i are g i v e n
variable
instance,
t
the
aI *
(t2/2)
as T(f)
in t e r m s
t and t h e r e f o r e coefficient = [
a I
or,
+
and TCg)
....
are
of d e r i v a t i v e s
involve
the
a I comes
u(x)
a2 +
out
[g(x)
-
finite.
The
of T(h)
with
difference
respect
[ g(x)-f(x)] .
as
f(x)]
dx
equivalently, = f
a I
Before are
the
proceeding,
high
for t =
order
g(x)
note
that
terms.
Thus,
dx.
the
when
closer g and
h is f r o m
f are
f, the
close,
the
smaller
expansion
I
T(g)
= T(f)
+ #~(x)
can be t r u n c a t e d "derivatives". with
we
~(x)
the
to
its
They
help
I
g(x)
dx + ~ [ f @ ( x , y )
first
involve
of T(f)
and
few terms. functions
f.
They
g(x)
The
~(x),
cancel
g(y)
terms
dx dy +
are the
~(x,y),
for e q u a l
...
so-called
... d e f i n e d
solely
distributions
g and
f. It has required
not to
yon M i s e s although proper many the
appeared
justify
derivation
function the
existence features at the with,
which
"yon
We
we h a v e
if,
of
frequently
two
only
if,
sequence
f(x).
in p r a c t i c e .
in o r d e r
to
can h a r d l y
bounding some
of
in p r a c t i c e . This
distribution f is
- This
"smooth"
continuous
way to
conditions as w e l l
as
involve
the
of a f f a i r
a set
expansion,
they
-See
a
appendix
being
met
of s u f f i c i e n t
is t e n t a t i v e .
properties
{T(fi) ) c o n v e r g e s of
state to
The
short,
entry-
while
an easy
derivatives
in
respect,
of V o l t e r r a ,
the
in
f and g. some
integrals;
ourselves
In this works
substantiate
unknown
limit
conditions
Moreover,
be r e l a t e d
distributions
applicability
I. A d i s t r i b u t i o n
and
Cauchy to
define
helpful
derivative" to
the
derivation.
metric,
preferred
state
to p r e v i o u s
and to t h e
of g e n e r a l
first
Definition
are
Mises
to
intuitive
which
convergence
so far,
by refering
is p r o d u c e d
Prokhorov
and
conditions
appear
appear
T(f)
above
himself
do not
conditions
involve
the
satisfies they
possible,
with
to T(f)
distributions
: respect when
to f u n c t i o n a l
{fi(x)}
uniformely
is
any
converging
T
14
Definition
2. A d i s t r i b u t i o n
functional
T if,
and
f is
only
"domain
limited"
with
respect
to
if,
lim T(f R) = T(f) when
R tends
to
infinity
and
fR(x)
is the
possibly
defective
distribution fR(x)
It m u s t
be
functional Theorem
= f(x),
if I xl
< R
=
if
>
0
,
observed
that
these
T rather
than
the
: A Taylor-like is v a l i d
with
two
definitions
respect
constraining They
of the
T defined
distributions
to the
are
f(x).
in terms
a functional both
R.
distribution
expansion
for
and g, when
Ixl
are
functional.
yon
over
Mises
two
smooth
For the
yield
the
: derivatives
distributions
and d o m a i n real
f
limited
variable
t
satisfying O~t~1, we have T[ (l-t)
f + tg]
= T(f)
+ t I ¢(x) 2 t +7-f ÷
To gain
further
expansion Let
say we have
p-dimension
sample
function
f(x).
thus,
attribute
them
be n o t e d
in t e r m
at d i s p o s a l
space
The
(p >
sample to
f¢(x,y)
each
( W l , . . . , w n).
dx
g(x)
g(y)
dx dy
...
in this
of an e s t i m a t o r
us
we
insight
g(x)
theorem, of
illustrate
its a s y m p t o t i c
some
I) with
population observation The
we
sample
drawn
probability
possibly
a given
empirical
value.
(Xl,...,Xn)
continuous has
by the
been
positive
density
from
density
stratified weight;
functions
a
and,
let
is,
accordingly, g(x) where We now
~(x-xi)
is the
introduce
estimated
= (I/I
the
by T(g),
Dirac
wi)
function
parameter
i.e.
[ W i ~(x-xi) ,
(i=1,...,n)
concentrated
8 defined
by the
on the
observation
functional
T(f)
and
x i.
15
e =
T(f)
= T(g)
we f u r t h e r
assume
observations the
that
= T(x I ..... Xn;
T(g)
w I ..... Wn) ;
is a n a l y t i c a l
with
respect
to the
x. as w e l l as to the w e i g h t s w.. W i t h this r e s t r i c t i o n I i T, the d i s t r i b u t i o n s f and g are s m o o t h and d o m a i n
functional
limited. domain
Effectively,
limited
Distribution converges
per
g(x)
existence is
to T(g)
distribution
smooth
with
g*(x)
of
f(x)
e, the
with
to
smooth
parameter
respect
m tending
is
to
per
definition
t o be
estimated.
functional
infinity
T,
and
for T(g*)
in
hm(X-Xi)
= (II[ w i) [ w i
and lim hm(X) Any to the
sequence Dirac
theorem is
have
of c o n t i n u o u s
function
m a y be
applicability
inasmuch the
as
= ~(x),
functions
{h
considered.
is t h a t
it has b e e n
m + -.
distribution
possible
to
(x)} u n i f o r m e l y c o n v e r g i n g m The last c o n d i t i o n f o r the g(x)
observe
be d o m a i n the
limited;
sample.
Thus
it
we
e~pansion
e
=
T(g)
=
T[ (l-t) f + tg] ,
for t = I
= e + (I/[ wi) [ w i ~(xi) I 2 + ~ (I/[ w i) [ [ w i wj @(xi,x j) + with
the
higher
...
order terms
sample
is r e p r e s e n t a t i v e
terms,
when
f(x)
and g(x)
It m a y
be
noticed
that
estimator
in
section
2.1
2.5.
The
Under section, parameter
influence
the we
appropriate
8 by the
are
of
given the
for the
above
of
regularity
e is r e l a t e d
approximate
e +
f(x)
when or,
the in o t h e r
variance
structure.
Hampel
conditions
=
importance
distribution
expansion precisely
an e s t i m a t o r
simple
neglegible
parent
close.
the has
function
see that
having
of t h e
to
relation
(11[ w i) [ w i @(x i)
met
in
the
corresponding
the
above
on
16
where
the
on the
factor
result
function
~(xi)
8.
or,
rather,
samples.
robustness
one
(1975)
- as w e l l
indicative has
the
one-dimension at
is
Hampel
at
of t h e function
"influence This
dimension as
named
is
curve"
a very
- see
several
influence
of the
~(x),
influence
for
he
powerful
Andrews
et
dimensions
the
al.
- for
was
tool
value
xi
considering to
only
appreciate
(1972)
and
instance,
the
Hampel
Rey
(1975a)
A
Basically The
it m e a s u r e s
influence
the
first
Let
us
von
first
the
function
Mises
sensitivity
of
can
defined,
recall
the
for
the
particular
have,
when
t
is
observation.
and
direction
easily
of
obtained,
a Dirac
as
function.
¢(x)
f(x)
dx = 0.
distribution
g(x) we
be
in t h e
each
tautology
: Then,
also
derivative
e to
small,
=
(l-t)
f(x)
that
is w h e n
@(x)
g(x)
+ t
6 ( x - x O)
f(x)
and
g(x)
are
close
to
one
another, T(g)
Therefrom
the
= T(f)
+ f
= T(f)
+ t : ~(x)
: T(f)
+ t ¢(x0).
following ~(x 0)
Observe
that
applicability derivatives; distribution the
sample The
seen
than
in R e y
definition
the is
g(x)
which
of the
= lira ([ T(g)
above
this
- T(f)] /t),
definition
Taylor-like
due
to t h e involves
influence
has
only
local
domain
in t e r m s
particular
occurs
t ÷ O.
a larger
expansion
very
function
of
of t h e
selection
properties
yon
Mises
of
at p o i n t
x 0 of
space.
concept
situations
the
dx
~ ( x - x 0) d x
of
where
influence several
(1975b).
function
parent
can
be
distributions
immediately are
generalized
concerned
as
can
to be
3. THE JACKKNIFE
3.1.
Introduction
The
so-called
reduce
possible
obtain
jackknife bias
estimation
of v a r i a n c e s .
power
to p r o d u c e
cheap
way,
cheap
in c o m p u t a t i o n
the
that
results
not v e r y
estimator is t o
say
are
defined
been
has
introduced
and t h e n
cheap
and
to be
impressively
circumstances,
but
the
by
cases;
are
to to
its
in a
necessarily
By all
in m o s t
results
extended
assessment
not
considered. good
Quenouille
interesting
estimator
in m e t h o d o l o g y
has
by
progressively
It is e s s e n t i a l l y
improvement
if t h i s
obtained
well
method
in e s t i m a t i o n
standards, but
either
in a few
poor
or
ridiculous. Unfortunately not
easy to
under
check As
And,
them
support in the
the
to d e v i s e
use of
relatively
few details
More
and w i l l
be
reserved
story
starts
possible
bias
Huber
involved
(1972)
more
specific
we d i s a g r e e
that
: "It
technique
to t h e
is h a r d l y under
work
be n e e d e d
might
and b e t t e r the
for
is
it is a p p l i c a b l e
respect
conditions
with
method
?
variance
above
which
the
to
estimate"
viewpoint
all e s t i m a t o r s
and w i t h
We n o w p r e s e n t a relatively
considerations
for the in
with
method
and
falling
2.h.
jackknife
by Miller.
The
more
jackknife
however,
regularity
jackknife
section
is the
to
properties,
sequel,
of t h e
seen,
of the
estimator
precise
useful
in the
frame
But w h a t with
these
than
indicated
down
be
of the
according
to w r i t e has
of a p p l i c a t i o n
It m a y
regularity
observations.
jackknife
scope
delineate.
suitable
worthwhile
the
1956,
of s t a t i s t i c a l
could
next
section.
when
Quenouille
estimators
the
light
obscure
proposes
through
what
technique
notation the m a i n
to r e d u c e appears
used ideas
the
to be
a
A
"mathematical sample
of
sample
size
trick".
size
n can
He
observes
frequently
that
have
an e s t i m a t o r
its bias
e based
expanded
on
in t e r m s
a of the
as f o l l o w s
E
(i
- 8) = a l n
-I
+ a2 n
-2
+
... A
This based
form
is n o w
on the
observation.
compared
sample The
with
of size
new
the
n-l,
expansion
corresponding
the is
same
sample
for the
estimator
without
the
i-th
ei
18
E(; i when
the
observations
proposes
which
to
consider
have
=
are
a 1 (n-l)
+
independent.
a2(n-1)
-2
+
To r e d u c e
...
the
bias,
Quenouille
bias
expansion,
except
that
the
first
order
term
It is
E(~ i Obviously, leading
-1
the v a r i a t e s
a similar
cancels.
8)
the
term
efficiency
e) = - a2/[ n(n-1)]
same m a t h e m a t i c a l
of the
in the
expansion,
estimation,
trick and
he
+
can be
so on.
suggests
...
applied
To a v o i d the
to the
a loss
definition
second
of
of an
average
estimator
8 = =
For
no c l e a r
has
~i ~i [ (n-1)/n]
8
reasons
demonstrates appear
(l!n) n
fairly also
-
at t h a t good
named
A 8i .
[
time,
the
statistical
8i, the
jackknife
estimate
properties.
pseudo-estimates,
Tukey
8 frequently
who w i l l
and 8i,
the
soon
jackknife
pseudo-values. With
regard
to b i a s
reduction
advantageous
to m o d i f y
observation,
say we d e l e t e
pseudo-estimates constituting devoted
these
to the
observation
three
some
and the
serial
selection be m o r e
appear
be
of m o r e
I) and w o r k
There
of
(h >
(nnh)
are
scheme
each
of t h e
group
other.
out t h e ways
of
of
(h = I)
I) c o n s e c u t i v e third,
different
when
the
ways.
parameter
independent.
When
can be m o r e
reliable;
of h d e l e t e d The
one
has b e e n
deletion
possible
strictly
be
than
are m a n y
consideration
equivalent
second
such t h a t
independent
( h >
: first,
deletion
could possibly
of g = n/h p s e u d o - e s t i m a t e s ;
observations
of h m u s t
it
deletion
(n-h).
special
in the
the
that
n by
schemes
t o be r a t h e r
correlation,
or less
and
second,
of h o b s e r v a t i o n s
moderate
of size
following
and c o m p u t a t i o n
schemes
seems
size
h observations
subsamples
three
it
sample
samples
at a time;
observations deletion
with
the
third
The h is
there
is
the
observations
scheme
appears
to
19
be v e r y
valuable
although
it has
experimentally Before mention the it
for theoretical limited
supported
leaving
this
pseudo-estimates
samples. Gray
The takes are
This
and
second in
1958 w h e n
different
at l e n g t h
in the
in t h e i r
progress Tukey
therefrom
This
conjecture the
significant
will
in the
of t h e
he p r o p o s e s
2(~)
retain
book
largely
be
on the
feature
property
of the
jackknife
collection
history
to
sizes,
on the
same
of p a p e r s
by
that
of the
the
jackknife
pseudo-values
of e a c h
~i
specific
variance
estimate
in the
sequel.
We
is,
fact,
_ e)2] ,
supported
attention
is
of e v a l u a t i n g
sample
estimators
jackknife
= E[ (~ i
is w o r t h y
Instead
incidences
the
-
(1972).
conjectures
representative
it
on d i f f e r e n t
account
as
significant
essentially
observation,
as w e l l
(1977)
discussion
for r e f e r e n c e s .
method.
based
Sen
above
reduction,
jackknife
estimators
into
is d e f e n d e d
Schucany
place
from
to t a k e
The
(1974a)
of bias
of the
- see
interest.
- see M i l l e r
subject
generalizations
ispossible
derivations
practical
which
in
now
simply
the m o s t
method.
Bias reduction as well as variance estimation can be achieved without detailed knowledge
of the sample distribution nor involved
analysis of the estimation method. estimation
definition.
- Next
We only need a sample and an
section
will
demonstrate
that
the
third
A
central The large
moment
of
0 is
computation and,
through
accordingly,
analytical
(1966).
also
With
observation
means.
E ranging
deletion,
eventually
and
second the
variable
order
terms
infinitesimal ¢ is k e p t
analytical
derivations can be
To
conclude
from
he
zero,
limits
in the
states
this
it
is
introduction
to c.
the
be a b u s i v e l y as p o s s i b l e
is due
deletion
to D e m p s t e r of one
to
one,
with
the
of the
This
is met
approach
(1972a),
where
concern
differences
of d e r i v a t i v e s . scarcely
published
we w o u l d
like
help
complete
consideration
It is of g r e a t
then,
may
developments
of J a e c k e l
in t e r m s
background,
case
the
no d e l e t i o n ,
variable
because,
method
it as m u s c h
his
himself
jackknife
stated
apply
interesting
infinitesimal.
estimators
of t h e o r e t i c a l
One
jackknife to
to manipulate
formulas,
of a v a r i a b l e
with
by t h e
one t e n d s
difficulties
in his
and
available.
involved
the
first anew above
for
between
But,
due
to the
lack
so far.
to m e n t i o n
three
papers
20
which
could
a naive
introduce
of s t a t i s t i c s also
some
warning
very
complete
advantages In the
line
theoretical
set-up, (1968)
survey
sampling
survey
context
(1975).
The
jackknife
we
should
which
has
a n d the two
is
and F e r g u s o n
but
say,
can be
sharp
the
place
general
to w o u n d
is p o s s i b l y
to b a l a n c e
in
courses
having
: "However
(197ha)
tries
(1971)
(1975)
after
sufficiently
of M i l l e r which
but
with very
mention only
the
interesting are
different
the
the
the
paper
method
justification,
(1976)
partly
objectives
leave-one-out
intuitive
and C a u s e y
papers
Mosteller
in e l e m e n t a r y
of the m e t h o d .
jackknife
of W o o d r u f f
last
review
drawbacks
of t h e
Lachenbrueh
jackknife
of i n f o r m a t i o n
the
Bissel
properties
the
- the
jackknife.
introduction
level;
robust
important
piece
with
its
highschool
on the
stands
The
to the
for
how beneficial
still
unwary".
reader
pleads
at t h e
emphasis
demonstrated
most
the
presentation
which
of G r a y , related
is
of
the
work
adapted
Schucany
to the
and
in
t o the
and
Watkins
infinitesimal
jackknife.
5.2. Jackknife t h e o r y Hereinafter, justifies being
the
rather
we
sketch
jackknife involved,
Particularly,
this
method.
estimators
although account.
points
of the
A few p o i n t s
we r e f e r
section
taken
into
the main
to the
will
be
vector-valued
of this
appendix
solely
derivation
original
for f u r t h e r
concerned
estimators
which
and
with
work
details.
scalar
functionals
could
be
A
Assume respect
we
to
at d i s p o s a l
parameter
observation the
have
8 and b a s e d
x i appearing
estimator
observations
m a y be
with
seen
as
and w e i g h t s .
=
Then,
according
to
a scalar on the
sample
the b o u n d e d
it be w r i t t e n
T(x I ..... Xn;
2.h,
it can
8, c o n s i s t e n t
(Xl,...,Xn),
non-negative
a transformation
Let
section
estimator
w I .....
on the
each
weight set
with
w i.
Then
of
as
Wn).
frequently
be
expanded
in the
form
= e + (i/Di)[w where T(.)
the
coefficients
and the
sample
i *i +
( I / Z wi)2
[[ wiw.J *ij
+
"
'
"
~. a n d @.. are f u n c t i o n s of t h e t r a n s f o r m a t i o n i ij ( X l , . . . , X n ) , but i n d e p e n d e n t of the w e i g h t s .
21 This m o d e l , throughout. but we k n o w
the
method.
few t erms,
happens
modifications
of one w e i g h t
can p o s s i b l y
correspond
w e i g ht
w. is m o d i f i e d i p s e u d o - e s t i m a t e is
however,
Wl,...,wi_1,
g=n).
of the
the
ourselves
to the
This m o d i f i c a t i o n
observation.
When the
(I + t)wi,
wi+1,...,Wn].
by
~i - (~wj)
5] /t
through
e = (~/[ corresponding
we c o n s i d e r
(I + t), the p e r t a i n i n g
~i = [ (twi + ~wj)
The
(h=1,
with the d e l e t i o n
by a f a c t o r
estimate
of
in s i t u a t i o n s
to derive
we limit
at a t i m e
the p s e u d o - v a l u e s
and the j a c k k n i f e
required
for the v a l i d i t y
fails
introduction,
set of w e i g h t s
wi,
ei = T[Xl,...,Xn; We now d e f i n e
in the
section,
modification
condition
the j a c k k n i f e
met
of the
In this
is s t r i c t l y
to be not v e r i f i e d .
to the d e r i v a t i o n
pseudo-values.
will be a s s u m e d
its v a l i d i t y
a sufficient
Moreover,
above m o d e l
Contrary arbitrary
to its first
it c o n s t i t u t e s
the j a c k k n i f e where
limited
We do not k n o w w h e t h e r
expansion
has
wi ) [ ~'i"
approximately
the
form
I
8 = 8 + (I/~wi)[wi~ i + ~ which nearly The
first
multiplied introduce
With derived
the
differing
by the a bias
cancellation, note that
equates
expansion
item
factor
met
is the
(I/lwi)2 with
second
(I + t); this
decreasing
obtained
(I + t)
with the
with t = -I,
sample usually
not b r i n g
any bias
r e g a r d to bias
reduction
the
for a class
+
o
is also the size.
appears
Therefore
reduces
the bias.
- if the w e i g h t s -
to s e c t i o n
w i are i n d e p e n d e n t
if the g g r o u p s
- if each g r o u p
according
have
the
conclusions
may be
of the
2.h, observations
same w e i g h t s ,
is i n d e p e n d e n t
of the
We also
reduction.
following
others;
then,
can
its
of e s t i m a t o r s
If 8 can be e x p a n d e d
•
first which
A
-
•
for 5.
order t e r m w h i c h
t erm
small t does
[[wiwj#ij
xi,
22
- the -
ordinary
jackknife
the
jackknife
the
original
bias
(t = -I)
estimate
e has
estimator
possibly
the
e, but
same
for
reduces
the
asymptotic
a translation
bias,
distribution resulting
as
from
reduction.
In o r d e r attentively
to
obtain
the
a variance
pseudovalues
estimate,
~..
They
we now
have
the
consider
more
particularly
simple
I
expansion
:
~i which
indicates
they
observation
weight
Furthermore,
the
are
w.
wi
e + wi
essentially
and
$i + proper
independent
in the the
the
random
pseudo-values.
former
are,
are the
terms
component
mean
Thus,
up to This
perturbed
other
weights.
A
main
arithmetic
pseudo-values.
'
to the
of the
I
precisely
"'"
some
of the the
omitted
in the
~,2(e)
e, i.e.
main
the
(I/~w i)
~wi~ i,
is
random
components
and t h i r d
central
moments
homologous
moments
of the
second
factor,
argument
of
is the m o r e
correct,
expansions.
the
more
Asymptotically,
= u,2(e ) = (1/[wi)2
g
appearing of
neglegible
we have
U'2(~i)
and
"'3 (~) = "'3(e) The and,
"not-quite
large
in a g r e e m e n t
with
sample" Tukey
: (I/[wi)3 situation
(1958),
we
g ~ ,3el.() is t r e a t e d
obtain
the
in the
appendix
jackknife
variance
estimate
var(e)
var(;)
Moreover var(@)l order
when t
right
hand
= -1.
variance
tolerates as the
bias
number
is
is
due t o
the
in
the
jackknife
estimator
is
obtained
reduction. of groups
independent.
several
2
expression
This
random component
This
are
the
=
Therefore,
observations
when
cancellation
for
it may
be
of
the
class
of
its v a r i a b i l i t y
sufficiently
some
good approximation the
of
second
estimate.
Furthermore, g is
a fairly
-
large
safe
correlation
and
e which
is low
inasmuch
to p e r t u r b is e x p e c t e d .
also
inasmuch
as they
simultaneously This
23
consideration considers
strict
While range
is o p p o s i t e
possible robust
Sharot's
viewpoint
(1976a),
who
only
independency.
computing
of the
to
the
variates
outliers,
jackknife (7.
- w.
i
I
they
estimators
would
estimate,
it
is wise
8) in o r d e r
to
assess
imoair
are j a c k k n i f e d ,
the
these
variance
to the
check
presence
estimate•
variates
the of
When
are b o u n d e d
due
to the
A
limited For
incidence
procedure
as
observation (finite)
obtain
would
of a s m a l l In the
the
be m a r k e d l y
The
an
preferred.
parameter
be
limited
8.
to m i m i c
or c o m p l e t e ei i m p l i e s
the
deletion
the use
(infinitesimal) This
of an
of a
differential
is o b t a i n e d
through
the
t.
of the
pseudo-estimates
estimator
difficult
partial
whereas
version
strictly
The
on the
appears
pseudo-estimate
operator,
perturbing
can
follows.
it
above•
infinitesimal
expansions as
described to
observation
derivations,
difference
operation use
of each
analytical
jackknife,
to t h e i r
t small,
first
leading
the random
terms
are
8 i = e + t w .i ( a / ~ w i) T(x I ' "
" • DEn
; w I'
" • • 'Wn
)
= e + tw i J i and the
jackknife
estimate
is not bias
e
The
corresponding
jackknife
var(~) In his
in
version•
His
with
respect
analytical
estimators
Jaeckel
and
of the
recommend as w e l l
his be
his
comparison h i m to
has
the
is
proposed
is,
of
with
the
the
jackknife
infinitesimal
partial
derivatives
of T
in f a c t ,
as m u c h
as p o s s i b l e
an
derivation
that
treatment
seeing
to m i m i c
the
several
conclude
to m o d i f y
even
computational
line
)2] •
second
advanced
- In a p a r a l l e l
leads
estimate
bias-reduction
involves
t o w i and wj
to
variance
obtain
proposal
could
observation. noteworthy,
to
duplication
We h e s i t a t e treatments
order
8.
= X(wiji)2/t I- X w 2i / ( ~ w i
1972 m e m o r a n d u m ,
procedure
=
reduced
paper types :
that the
of
with
other
full
Sharot
(t = -I). appropriate
deletion (1976b)
of j a c k k n i f e
of an is
variance
24
"On the b a s i s of the M o n t e C a r l o s t u d i e s , w o u l d a p p e a r t h a t the d e s i r e d g a i n in p r e c i s i o n ... is o f t e n a c h i e v e d . A l t e r n a t i v e e s t i m a t o r s d e s i g n e d for a p a r t i c u l a r a p p l i c a t i o n m a y , not s u r p r i s i n g l y , do b e t t e r still. The i n f i n i t e s i m a l j a c k k n i f e is seen t o y i e l d just such an e s t i m a t o r in m a n y c a s e s . " We
frankly
only
regret
appears
reference To
conclude
this
(1976)
transformation assumed
3.3.
than
Case
To
is r e q u i r e d
the
of m e a n
The
the
the
section
has
been
been
in
Suppose an i n i t i a l
be
reported
issued
connection we have time
interest
censored
in Rey
by M i l l e r with
the
and t h e r e
It
the
we
indicate
satisfied
the
last
Taylor-like but
that
by the
one
expansion
equation.
of a p p l i c a t i o n
m a y be
durations
they
are
remaining
x i (m < i < n).
the m e a n
residual
likelihood
sum
(1975);
either
in this
with
application
the
another latter
section
in s u r v i v o r s h i p lies
fashion.
and
items
stopped
observed are
Then,
under
time
met
The
in the
fact
derivation
paper
in the
has p o s s i b l y
same
been
before
also
which
or in f a i l u r e .
durations observed
assumption
failure
are r u n n i n g
from
We w i l l
of the m f a i l i n g
during
known
of c o n s t a n t
is c l a s s i c a l l y
hazard
given
by the
estimator
on all
items,
[ xi, whereas
i=I ..... n. the
denominator
is the
number
ones.
A question consistent
runs
of this
items
= (11m)
of f a i l i n g
investigate
a set of n i n d e p e n d e n t
until
(n-m)
we
frequently
former.
the
the
309,
(1975b)
items;
note
that
in an u n k n o w n
b y x i (I ~ i < m ~ n), the
maximum
theory,
to be
scope
power,
life
denote
rate,
known.
2.4.
jackknife residual
particular
data may
already
field
by
obtains
that
(197ha),
jackknife
- see p a g e
we c o n j e c t u r e
so l i t t l e
help.
conditions
He e v e n t u a l l y
demonstrate
written
on the the
review
is
study
studies.
has
of M i l l e r ' s
section
implied
an e s t i m a t o r
that
jackknife
is of no g r e a t
enumerates T.
Nevertheless, broader
infinitesimal
in one p a g e
to J a e c k e l
Thorburn
we h a v e
the
it
we w i l l
to t h e
not
resolve
real mean
life;
here
is w h e t h e r
estimator
this
is not the
object
e is
of the
present
25
discourse.
This
of c o n s t a n t
hazard
rate.
assume
that
not
anymore
a negative more
important
exponential
reliable
observations free
is
ourselves
the
failure
derived
agree
that
on the
probability
results, do not
seeing
We k e e p
now relax
times
solely
In t h i s
f r o m the
with
the
the
assumption
definition
are d i s t r i b u t e d
function.
strictly
from distribution
we
estimator
data
set,
preconceived
ideas.
we n o t e t h a t
the
do
according
way,
exponential
but
we
when
to
expect the
assumption.
This
We
is a f e a t u r e
of r o b u s t n e s s . Before
applying
the
jackknife
weighted
form
A
e = (IIX wj) [ fits
in the
frame
wixi,
of s e c t i o n
(j=1 ..... m;
2.~
and,
thus,
we
i=I ..... n) are
in a g o o d
position
to p r o c e e d . With
all
procedure. the
weights
equal
Modification
to
one,
of the
we n o w d e v e l o p
weight
w i by
the
a factor
finite
jackknife
(I + t)
produces
pseudo-estimate A
0. = e + t g. 1
1
with 5.
1
The
corresponding
~i
=
(x.
-
e)l(m
i = xi/m ,
+ t),
i <
m,
if i > m.
pseudo-value
= [(n
if
+ t)
is
Oi
= 0 + (n + t)
- n O] / t ~.. i
Therefrom,
we
obtain
the
jackknife
estimate
o = ~ ~iln =
0
+
[ (n
+
t)In]
~
= e + t [ (n + t)/n] The
bias
reduced
introduced,
estimate
see t h a t
[xillm(m
is the
+ t)] ,
expression
for
where
i >m. t = -I has
been
i.e.
T0 We
~i
the
-_ ~ _ n - I e n m (m - I) [ xi'
bias
correction
is
inversely
for
i > m.
proportional
to the
sample
26
size,
when
ratio
is
the
Effectively fluctuates matter
in
of s t o p p e d
with
such
with
size
case
size
to
failing
n, t h e n
the
n and,
the
parameter
items
bias
is given.
correction
e to be e s t i m a t e d
therefore,
what
should
be
If this
is m e a n i n g l e s s . also
estimated
is a
of o p i n i o n .
A variance jackknife size
ratio
changing
estimate
and this
for
finite
is r e a d i l y
variance
derived
is c o r r e c t
t-values.
According
through
to the
to the
the
first
infinitesimal
order
in
sample
definition
e i = e + t w i Ji' we h a v e
in this
case
study
wi and,
immediately,
the
estimator
var(e) = [ ~2i/(
Ji = 6i follows
I - n/n 2 )
or
var(e)
-
n
The
jackknife
confirmed be r e l a t e d t; this
to the
Carlo fact
is i n d i c a t i v e
2 + ~ x i]
(j
< m,
i > m).
J
estimate
by M o n t e
2
[ [(x.-e)
m2(n-1)
and the
jackknife
experiments.
that
the
of r a p i d
The
variance quality
estimate
of the
have
results
been may
6 . - v a l u e s are s c a r c e l y d e p e n d e n t upon l c o n v e r g e n c e of the e x p a n s i o n of s e c t i o n
2.h. It
is h a r d l y
illustration.
However,
consideration. specifically
possible
Assume as
we
We w o u l d on t h e
multidimensional
as w e l l
to d r a w
covariance
that
conclusion the
jackknife
now to retain matrix
f r o m this
the
which
is w o r t h y
attention
comes
simple
out
for
to
estimate
covariance
the mean
matrix,
vector
given
that
of a s a m p l e
(Xl,...,~n),
the m u l t i d i m e n s i o n a l
observations
We d e n o t e
estimator
x. are p o s s i b l y i n c o m p l e t e l y k n o w n . --I of the m e a n and its j-th c o m p o n e n t w i l l
be
j to be u n b i a s e d . absence
of
more
estimators.
we w a n t its
feel
like
any
The
of c o m p o n e n t
: [ ~ . . x al
coefficients x... Jl
Per
zji
ji
/[
are
definition,
given
by 8 the by
zji indices we h a v e
of p r e s e n c e
or
27
z.. = I, J1 = O, Before
applying
if x.. is o b s e r v e d , 31 if x.. j! is m i s s i n g . the
jackknife
independency
assumptions~
the
form
weighted
method
on the
zji
be
w..
These
the
need
of
as on the ~i;
furthermore
of 8,
~j = [ w i zji must
we note
as w e l l
sufficiently
xji
differentiable
conditions
are
/ [ w i zji,
with
respect
to the
x.. J1
and to the
satisfied.
I
Attempts bias.
to
reduce
We will
sequel.
the
therefore
It r e s u l t s
bias
would
only
from
the
here
estimate partial
put
the
to
light
covariance
the
lack
matrix
of
in the
derivatives.
J..j~ = a ej / a wi =zji (xji - ej) / [ w i zji and
has
the
components
{co~(~)Ikl We will say
write
for the
it down
=
[ w~
Jki
for the
unweighted
mean
Jli
/ [ I - ~ w~
uniformly
weighted
/ ([ wi)2]. estimator,
A
n
ZZki(Xki
This
covariance the
does
not
interest be
to
from the
positive
--
~
all
definite
on the
elegancy with
theory.
it w o u l d
1
Zli
The
have
Zi ~ki
matrix.
observations
by c o m p a r i s o n
classical
definite;
t eov(~_)lkl
when
insist
estimator
- 81)
Z~ki Z~li
is a p o s i t i v e
estimator
seem u s e f u l of this
inferred
even
estimator
ordinary
is to
A
- ek)(Xli
{cov(~)]~l - n-1
with
that
vector,
been
(xki
are
It concurs
complete.
of the m e t h o d the
structure
latter
estimator
It
seeing which
the
can
is not
written
_ ~k ) ( Xli
_ ~ 1 ) ~li
where
m = ~i Zki With
regard
to
applications,
they
Zli"
can
be f o u n d
in v a r i o u s
domains;
28
some
are
interesting
apparently not
fails
surprising
Ferguson and
Rey
3.4.
et
on
because
to us.
al.
Gray
(1975),
been
data
However
(1975),
and M a r t i n
it has
correlated
it
et
observed
and on o r d e r
is i n f o r m a t i v e
al.
that
the
jackknife
statistics. to
scan
This
the
(1976),
Miller
(197~b),
to the
review
of M i l l e r
posterior
is
papers Rey
of
(197h)
(197ha).
Comments On t e n d e n c y
conditions g, the
to n o r m a l i t y .
and w i t h
jackknife
degrees
large
estimate
of f r e e d o m .
consequence
tends
To u s ,
of the
It
is k n o w n
group-size
to have
this
performed
that
under
h as w e l l
regularity
as f i n i t e
group-number
a T-distribution
appears
accidental
mkthematical
with
g-1
and e s s e n t i a l l y
derivation.
Expressing
a
the
A
estimator
e in t e r m s
and t r u n c a t i n g with
normal
that
This
of
state
of the
happens but, the
the
affair
frequently
then,
the
Lindeberg
has
of
been
rather
that
original
the
components
regularity
estimate
tends
avoided
in this
than
in t e r m s
jackknife
estimator
also
analysis.
Jackknifing
independency.
When
must
assembled.
All
other
statistics.
This
presentations
jackknife.
Effectively,
of the
continuous
independent
in the
of the
classify,
among
observation
several
information.
of o b s e r v a t i o n s
observations. jackknifed
in
subject
in It
or less
to
the
normal
application
of
classes,
This
can be
Estimator
prior
large
generally does the
not
can be
sizes
h
fulfilled.
justify
estimators
and the
Numerous
an e x t r a n e o u s
information
as
of the
"discriminant
are
under
in v a r i a n c e s
of r e l a t i v e l y
weights
to k n o w n seen
applied
weights
the
are u s u a l l y are
not
values.
probabilities.
belonging
misclassification
grouos
conditions
observation
On m i s c l a s s i f i c a t i o n
prior
then
is r e a c h e d .
by e x p a n d i n g
is m o r e
can be
size
not
And
observations.
stationarity
sample
utilization
permits,
some
and
be
paper
estimate
e was
and b o u n d e d
to n o r m a l i t y
of the
assumed
On o r d e r
are m a n y
conditions).
conditions.
On t i m e - s e r i e s conditions
in the
jackknife
weights
x. as a p o w e r series e x p a n s i o n 1 necessarily pseudo-variates 8 i
produces
if t h e i r
disguised
conclusion
observations
order
distributions
(appropriately
terms
of the
at a low
sample
consists
classes.
a function
Then,
where
the
are
on the
in use basis
frequently the
of t h e s e
misclassification
analysis",
methods
to of
in a set
probability
of
known
probability estimator
can be
depends
29 smoothly some not
on o b s e r v a t i o n
other
importances;
classification
application
such
as
of the
"nearest
method
with
neighbour",
is
justified. On t r a n s f o r m a t i o n s .
is
techniques,
but
also
applicable
differentlable be u s e f u l easy
to
If the
jackknife
to
$ = ~(~)
inasmuch
in the
vicinity
of e.
set
confidence
to m a n i p u l a t e
On r o b u s t n e s s .
than
the
domains original
The p s e u d o - v a l u e s
method as
~(.)
However, when
the
is a p p l i c a b l e
to ~,
is c o n t i n u o u s l y the
once
transformation
distribution
it
may
of $ is more
distribution. ~.
are
indicative
of the
IA
observation version reveal
incidences
of H a m p e l ' s an a b u s i v e l y
on the
influence high
estimator curve
sensitivity
8 and h a p p e n
(197h). to
some
Their
to be a d i s c r e t e
inspection
observations.
may
4. M-ESTIMATORS
~,.I. Warning In t h e s e well-known viewpoint which
sections in the
For
consider
"best"
representativity
real
location
from
each
But
under
what
e is the
estimation.
leads
estimation
without
both
translation
However
disregard
of a l o c a t i o n
to t h e
real meaning.
to e s t i m a t e
to
which
are
the
certain
aspects
concern.
according
mean;
problems
feeling We m a y
are
of t h e the
law f(x
by
end up with with
distribution;
its
a median
respect
both
however
e for
We w i l l
concerned
invarlant
location,
parameter
- 8).
to the
may
they may
differ
other. are t h e s e
generalizations
have
classical
statistical
estimator
or its
be u s e d
a few
and this
the
or an a r i t h m e t i c
frequently
with
be of g r e a t
x distributed
for the
estimate
of
unusual
would
example,
the v a r i a t e search
theory
is r a t h e r
otherwise
we m e e t
M-estimators
of the
parameter
in o b v i o u s
usual
value
we
maximum
maximizing
investigate likelihood
the
?
They
are
estimates.
likelihood
Classically
function,
i.e.
we
notation L
=
n
f(x i
I e)
= max
for
e
or e q u i v a l e n t l y - in L = - Z in f(x. 1 The
estimators
of t y p e
M are
solutions
M = ~ P(Xi, where note
the
function
that
the
correlated The
above
estimators
be
structure
of t y p e
problems,
the m a n y
other
attention
- The
meet
developments
with
may
of the m o r e
e) = m i n
rather
for
general
structure
e,
arbitrary.
is r a r e l y
e.
Before
appropriate
proceeding,
we
to p r o c e s s
observations.
for l o c a t i o n however,
p(.)
I e) = rain for
independency
next
M have
since
been
which which
analysed
initial
estimation
section
properties
the
problems
is o r i g i n a l , are b a s e d usually
in q u i t e
contribution have
in this
many
received respect
on d i f f e r e n t i a b i l i t y do not
hold
respects
of H u b e r
f o r the
(196h);
scarcely - We w i l l and other
two
any
31
great
classes,
respectively and t h r o u g h (some
are
the
L- and the
obtained rank
R-estimators.
through
tests.
linear
The
We r e f e r t o H u b e r
questionable),
and to
latter
combinations
Scholz
(1972)
(1974)
are
of o r d e r
statistics
for their
for their
properties
respective
merits. Developping
the
functions,
their
structures
as w e l l
reading tried need
we p r e s e n t
to maintain a special
We w i l l
theory
of M - e s t i m a t o r s
derivatives,
operators
as
summations
the
notation
the
now
script,
meet
n
sample
P
dimensionality
space,
f(x)
probability
X. 1 n
observations,
w. i
non-negative
~(x
sample
-
x i)
8. S g M. J
Dirac
the
~ C of
its
ease
each t i m e
definition. scripts
~.
density
function
of x, x E ~.
I ~ i ~ n.
weight
function
simultaneous minimized
estimators.
t E R,
(.)*
perturbed
(.)'
transposed
entities
have
(~)
underlined
entities
are
Hampel's
of x to M~,
I ~
j ~ g.
I ~
j ~ g.
function,
x E ~.
~ (x,.). J ,j(x,.). influence
0 ~t
~
I.
entities
have
asymptotic
variance
regression
model,
m
dimensionality
x
= (u, Z')'
,(c)
= (~lac)
~i(c).
= (~l~c)
~(c).
e
a star
superscript.
a prime column
of 8.. J u = X' ~I
of ~ a n d ~I'
in p1(a)
=
+ ~" p = m + I.
= (a/~u i) c i. k
IEI v
superscript.
vectors.
n.
rigidity index, R i exponent
on x.. z I ~ j < g.
by e~,
t
contribution
x.. z
concentrated
and e s t i m a t o r ,
of
function
of o b s e r v a t i o n
the
we have
R p.
= (~lae.) J = (al~e k)
Ri
further,
following
~jCx,.) ,j(x,.) *jk(X,.) a.(x) J
2 o'. J u,v,£
to
as p o s s i b l e
size.
parameter number
notation;
or m a t r i c i a l
In o r d e r
As m u c h
recalled
with
need various
vector
integrals.
in use.
classical
we h a v e
successively
and
we w i l l
with
we
$2
e
= e I for m =
y , D ,F ,e i , n
see
V
2 = oi,
asymptotic
s
scale
of ~.
4.2.
section
I, v =
I.
h.3.h. variance
of 8.
MM-estlmators
Consider
some
distribution
sample
f(x),
unknown
except
account
the
space,
x E ~,
for
some
empirical
sample
on a set
function
section
investigate
we
(81,...,8g)
( X l , . . . , x n)
which
(MI,...,Mg) , that
weights
concentrated
are
a density
this
distribution
and t h e n
we t a k e
is
into
w i) [ w i ~(x - x i)
= (I/X
of n o n - n e g a t i v e
is a D i r a c
on w h i c h
possibly
distribution.
f(x) based
say ~ C R p,
is d e f i n e d ;
the
on the
estimation
such that
is t h e y
are
M. = m i n J
( W l , . . . , w n)
they
such for
and w h e r e
~(x - x i)
observation
x.. In this I of p a r a m e t e r s
of a set minimize
the
functions
that
8., J
j =
1,...,g
where Mj = I Oj This
definition
yields
to
(x,
e I ..... eg)
identity
of the
f(x)
dx.
parameters
with
their
estimates. The
above
classical
framework
M-estimator
interdependence often e.g.
depends the
is e s s e n t i a l l y theory.
of v a r i o u s upon
variance
where
a 2 m a y be
~ is the m e a n
These
are
in fact
- o212
f(x)
of t y p e
of
through
f(x)
another
(x - ~)2
estimators
For
estimation
defined
through
#
M M - e s t i m a t ors.
It is m o t i v a t e d
estimators.
a previous
$ [ (x - U)2
a generalization by t h e
instance, some
of the frequent a scale
location
for
estimator
estimator,
the M - s t r u c t u r e ,
dx = m i n
now
f o r p = I,
a 2,
M-structure
dx
= min
for
Multiple-M
~. or,
as we
call them,
33
In the under
sequel
the
we
integral
- Independence
=
,jk(X,.)
set
respect
meet
to
M. can be d i f f e r e n t i a t e d J assume :
we
01,...,8g ,
of f(x)
the
and
at the
frontier
of the
sample
needs).
differentiability
the M M - e s t i m a t o r s
can
of d e r i v a t i v e s ) .
as w e l l
be
defined
by the
following
of g e q u a t i o n s
e I ..... eg) f(x) dx = 0.
f ~j(x, In our for
functions
(alaej) ~j(x,.), = (~lae k) ,j(x,.),
(existence Accordingly,
the
precisely,
on d e r i v a t i v e s
G generally
- ,j(x,.)
that More
of ~ with
(conditions space
assume sign.
illustration,
an e m p i r i c a l
the
corresponding
distribution
set has
g = 2 equations
and
is,
f(x),
[ wi [(xi
-
,)2
-
O
2]
O,
=
w i (x i - ~) = O. Strictly
speaking,
We w i l l
now
estimators. we give
be
their
to
adaptations are be
of the
According to the
to
estimate
s
2
°
Is an M M - e s t i m a t o r .
distributions
their
influence
conclude
this
of t h e s e functions,
section
by
then
some
robustness.
domains
difficult
(possibly
to
select
of a p p l i c a t i o n s .
derivations,
a prime
whereas
sample
derive
eventually
is r e l a t i v e l y
certain
by
by the
first
and
on t h e i r
column-vectors denoted
we
variances
notation
ourselves
interested
Precisely,
considerations The
~ is an M - e s t i m a t o r ,
we will
assume
of d i m e n s i o n
I).
without
restricting
In o r d e r
to p e r m i t
that
estimators
the
The t r a n s p o s i t i o n
easy 8j
will
superscript.
section
2.5,
8j is g i v e n
the
Hampel's
influence
function
relative
by
Gj(x O) = lim[ (8~. - 8j)/t] , t ÷ 0 where
ej
is d e f i n e d
through
f*(x) for
a given
coordinate
=
the
perturbed
(I - t)
set x 0.
f(x)
Observe
distribution
+ t ~(x that
- x 0)
the
script
Gj(x)
describes
34 a vector-valued
function
w i t h the
sample
space
although
unfortunate;
I C s j , F(.)
of H a m p e l
on the
itself;
sample
the
it s t r i c t l y
space
notation
and should not be c o n f u s e d
£(.)
corresponds
is b e c o m i n g
standard,
with the c u m b e r s o m e
(197h).
The g e x p r e s s i o n s
Cj( x,
I can be e x p a n d e d
as follows
We c o n t r a c t
the
notation
first
in the p e r t u r b a t i o n .
order
[ ej(x) f
= ~ [[
+ ~ Cjk(X)
Cjk(X)
= [ Ajk(e;
linear
see that
the
equations.
or f u n c t i o n a l coefficients estimators
f dx]
respect
to the n o n - p e r t u r b e d into
account
elements
the terms
= 0.
(e~~ - 8j)
They m a y be
depending Ajk
(e~ - e k) + t ~ j ( x O)
difference
scalar
is the
but p o s s i b l y
u p o n the n a t u r e
(k ~ j) are d o m i n a t e d independent
solution
of the
they
of a set of g are v e c t o r i a l
estimators.
by Ajj, that
When the
is when the
f r o m one a n o t h e r ,
a solution
Under
II~k Aik Akk Akj A
II < <
I, k#i,
k#j,
we o b t a i n e~ - e 5 = - t [ A Uj!j Therefrom,
the
influence
~j(Xo)
- [k
function
-1 A~] Ajk Akk
is g i v e n
n j ( x o) = [ AT! -1 Ok(X0) JJ A~k Akk where
the
summation Ajk
runs
over k = 1,...,g
= I Cjk
(x, e I .....
~k(xo)],
by
- 2 A[.jj1 ~ ( x 0 ) and
eg)
f(x)
dx.
k~j.
-
up to the
(Ok - Ok) ] [(I - t) f + t 6(Xo) ] dx
- e k) + t ~ j ( x 0 )
are r e l a t i v e l y
be d e r i v e d .
with
and only take
0
=
dx
= f [~j(x)
We
f*(x) dx
* 81,... , 8g*)
can
35
When
the
distribution
corresponding
sums, Ajk
and the of the
set
of f(x)
treatment
described with
the
This
in s e c t i o n variance
notation
function
with
~hen
If and estimate
a prime,
the
only
latter
if all
3.2
is r e l a t e d
to the
asymptotic
solution. unique
A natural
or at
solutions accepted no
loss
least
to be
-- f
Given local
contexts
that
definite,
but
In case out
subspace
the and,
solution
is
see that
the
they
the
Accordingly
2
we
2
-
transposed
wi] • of the
influence
the
same
weight,
the
variance
variance
attention apart
can be
being
set
is r a t h e r
the
nature
of
troublesome
indeterminacy
must
solutions
with
difficulty
and with
coefficients
inverted
as was
set
of
may
the
the
help
be
and,
is r e s o l v e d
there,
a minimum,
is to be
the
solutions
indeterminacy
no
of the
robust,
is we r e q u i r e
convex
the
to the
from
that
this
when
we
thus
jackknife.
a modification has b e e n
=
unique;
the
unicity,
(Sj) much
happens
to work
some
to
what
~j(x) ~j(x)' f(x) dx
A locally
condition.
on
have
requirement,
discrete.
is a p p r o p r i a t e
limited
expression
"locally"
in c e r t a i n
projections
the
a scalar.
asymptotic
devoted
of g e n e r a l i t y
supplementary it
not
can be
is p r e c i s e l y
[ .] ', is for the
var So far we have
equate
eg),
[ n j ( x i ) | ' / [(~ wi)
is not
2 J
the
integrals
e I .....
on the
observations
(~.
through
the
estimate
var(O.)3 = ~ w~ [n.(xi)]j The
~jk(Xi,
in f*(x)
(~1'''''Wn)"
meet
here
is e x p e r i m e n t a l ,
= CI/~ w i) ~ w i
perturbation weight
f(x)
e.g.,
often, by
a
be p r e s e r v e d , of t h e i r
occurs. requirement
of
A.. must be p o s i t i v e JJ i m p l i c i t e l y a s s u m e d in the
derivation. To be r o b u s t
we must
therefore,
the
first
the
sample
space.
its
distribution
also
have
derivatives
Then, tends
a bounded ~j(x,.)
the M M - e s t i m a t o r
to the
normal
law
influence
must
function
be b o u n d e d
for
and,
any x in
e. has a finite v a r i a n c e J (possibly multivariate)
and
36
according
to the
multivariate An
Lindeberg
central
apparent
conceptual
estimator
definition.
f(x),
a mathematical
have
and
defined
the
corresponding of this What
The the
we
is here,
information
should
p.
be
thus,
designed this
clarified
from
its
and the
concerning
(MI,... , Mg),
some
we
as
Further, of
the
or a sample,
estimator
equivalent.
in
spite
variance.
?
beginning
we have
of
and
an e s t i m a t o r
mess
1966)
128).
a distribution
estimated
and,
have
of F e l l e r ,
the m i n i m i s a t i o n
to be
behind
6.3 1973,
at d i s p o s a l
rule,
to the m i n i m u m
argument
(eq. (Rao,
difficulty
parameter
viewpoint
unique
theorem
Having
"confusion",
is the
condition
limit
on the
to
end
of the
distribution
argument,
is f(x),
that
say for
a
sample fCx) and that
it appears
say g(x), sampling to be
if it theory
estimated
avoids
fictitious
is not and,
thus,
is g i v e n
a parent
this
distribution
assess
2.h. the
In fact, parent
Then,
concerned
notation
U~.
the
the
denote
knowledge
these
definitions
to
form
theory, and,
than to
f(x),
With
of order
~,
or
with
in the
frame
it
with
it has
the
section
to 3.2.
need
here
been
omitted.
where
all the
above
in the
=
f
ordinary
)~
f(x)
dx
of the m e a n
parameters
as follows
-
(x
f
x
f(x)
respectively
dx. 8 I and
of
of
is p o s s i b l e
illustration
stated,
parameter g(x) ; this
existence
we do not
thus,
to
assume
accordingly
a simple
moment
fit
f(x),
usual
of the
definition
=
We
relations
it must
sampling
expllcitely
rather
implicitely
estimators
central
~ implies
we
distribution,
to the
definition
to f(x)
reference
we p r o p o s e
any p a r e n t
The
further
usual
can be
study Its
of the
- xi) ,
is o p p o s i t e
unclear
But
in an e x p l i c i t
elements we
This
with
unique
to the
proceeding,
formalism,
introduce
reference
g(x),
with
distribution
Before
by
relations.
variability
contrary
to
startling.
distribution
cleared
parent
w i) [ w i ~(x
peremptory.
arbitrarily
section
= (II[
82 and t r a n s f o r m
the
any
37
I
[e 1 -
(x
-
e 2 ) v]
[
[e 2
-
x]
f(x)
f(x)
d x = O,
and
Comparing
these
dx
last two e x p r e s s i o n s
=
0.
w i t h the f u n d a m e n t a l
MM-estimator
equation I Wj(x, we o b s e r v e
the
e I .....
correspondence
eg)
dx = 0,
f(x)
and we d e f i n e
~l(X,
el,
e 2)
= e1 -
(x
V2(x,
el,
e 2)
= e2 - x.
-
e2 ) v
and
These
which
equations
have the
following
partial
~11(X,.)
= I,
~12(x,.)
-- ~(x - e 2)
~21(X,.)
= 0,
$22(x,.)
= 1;
will be i n t r o d u c e d
factors
A11
= I,
A12
= v WU-I'
A21
= 0,
A22
=
hi(x)
~ .
,
dx
I.
leads us to the e x p r e s s i o n
of the
influence
function
-I AI 2 A22 -I ~2(x ) - A11 -I ~ I (x) = A11 =
which
f(x)
the
This m a t e r i a l
v-1
in
Ajk = $ ~ j k ( X , . ) to o b t a i n
derivatives
indicates
(x
the
The a s y m p t o t i c
-
~)~
-
~
incidence
-
~
~
_1(x
-
~)
of an o b s e r v a t i o n
variance
~I2 = I
n~(x ) f(x ) ax
x on the
central moment
38
does
not
present
specifically sample
of
estimate
any
devote
size
computational our
n of e q u a l l y
of ~v and the
weighted
=
The v a r i a n c e
of m v comes
reference
a parent
var(mv)
form
concurs
observations. is the
(11n)
of ~
based
We d e n o t e
probability
on a m
density
the function
~ ~(x - xi).
immediately,
without
any
explicit
distribution
= n-11 f ~ ( x )
1 n-I
This
out
N o w we m o r e
on an e s t i m a t i o n
distribution f(x)
to
difficulty.
attention
(m2v
f(x) 2 m~ +
2
2 m 2 m~-I
-
classical
result,
up to the
-
with the
dx
v
2 v m ~-I
m~+1 ) "
second
order
in
n,
M-estlmators
4.3.
In this g =
I and
order
In
section obtain
to h a v e
we
and
specialize
regresslon
the
above
the
conditions
to be
robust
estimates.
This
assessing
the
functions
01(.)
some
location
robustness
but,
and @I(.)
Location
Let us
first
and
provides
to t h e
"best"
means
design
estimators,
in
of
of the best
in
regression
set t h e
stage.
We
are
u = v! e -
u and
In case
¢ are
scalars
of a l o c a t i o n
observation
consists
in the
per
definition,
~I
we
set =
is the
concerned
are m - d i m e n s i o n
simply
(ui, ! , i ) ,
M I = I p~(~,
model
e n
of v a l u e s
At)
f(£)
column-vectors.
have m = I and v =
of p a r a m e t e r s
set
linear
+ E
minimizing
dx
or
MI
by t h e
-I
a n d ~ and ~I
problem,
KI and,
the
situation
or @ i ( . ) ,
sense.
4.5.1.
where
it leads
produce
to t h e
by 0 1 ( . ) ,
presentation
moreover,
which
derivation
satisfied
=(II[ w i) [ ~i °I(~i'
Ai)
I.
Each
39 with
01(~, ~I) = 01(~) = 01(u - Z' !i). The
fact that
writing
observation u is;
u and ~ are
commodity.
one
Both
assembled
~i may be outlying
should
v has been
correspond
reviewed
in the
are c o n c e r n e d as well
with the
by Hill
(1977)
script
at the because
other.
level
than
situation
as Ypelaar
a
and an
~ is abnormal
The
as well
~ is more
same
as because
of outlying
and V e l l e m a n
(1977). As p r e v i o u s l y , mathematical
we observe
rule
of m i n i m i z a t i o n
characterization
of the
is an i n c r e a s i n g
function
seen
as a way
inadequacy
statistical
partly where
linear
the
there
of
emphasis
(1971),
is a "natural"
This
emphasis
is u n d e r l y i n g
what
it should
(1976)
be)
state
possibility
to be
concerns.
of a s t a t i s t i c a l
and only take (1976).
conjectured literature
Dempster
1977)
characterization select
Carlo
mathematical
(1975,
prior,
(the prior
is m o d i f i e d
frequently
consists
and F o r s y t h e
until
good
1975)
This may
case
e.g.
are
in the
pleading
for the use
directly
opposed
of by
statistical
introduced
if one admits estimators
results)
or w i n s o r i z a t i o n ,
specific been
to
can be
in a sequential
of " s a t i s f a c t o r y "
trimming
have
the
finite
properties,
Obviously
require
ignore
study
properties
and robust
obtention
and many methods
authors
have been
is often
and Collins
when they
A nice
on the
recent
(1972b)
sample
framework. very
in arbitrary
(1976).
observations
(1973,
(or
The m i n i m i z a t i o n
asymptotic
and which
on data
and
outlying
papers
problems.
is robustness
the more
simulations.
and
f(x)
of symmetry.
statistical
generally,
finite
only works
way
Yale
the
center
on
(1964),
distribution
1977a).
characterization
rather than
by Huber
of what
Jaeckel
But rather
in a b a y e s i a n
The d e p e n d e n c e
the
most
with
problems;
rules
who
a data-dependent
derived.
1973,
analysed
cases
is the Hampel's
appropriate
as 01(.)
can also be
of possible
rule
skipped
of ~I'
in c o n s i d e r a t i o n
In most
from Monte
process
symmetric
clears
(1972,
of r e g r e s s i o n
samples
a statistical Inasmuch
in spite
the d i s c u s s i o n
by Huber
Maronna
has been
symmetric
critically
their
~.
on the m a t h e m a t i c a l
definition
0(.)
investigations
u by ~' ~I'
by c o n s i d e r i n g
limiting
is more
than through variable
the
model.
Then
viewpoint
rather
through
I¢I, the m i n i m i z a t i o n
characterization
by Jaeckel
is defined
possibly-random
of a p p r o x i m a t i n g
of the
In location, the
that ~I
see
identification
investigated
of
in this
40 regard
since A n s c o m b e
(1976).
However
(1972), Mead
in k e e p i n g
and Pike
After
a critical
having travelled
we will take
asymptotic
properties
more
specifically
move
along
section @(z)
respect
on what
the work values,
is p r o d u c e d ,
or seen as a type
of this
derivation
with
eye
a long way b e t w e e n
procedure
our d e v e l o p m e n t
Noting
let us just m e n t i o n
to us a n e e d to endure
of Garel
as does
Youden
as r e c o m m e n d
(1975).
as a s t a t i s t i c a l resume
(1960);
it seems
particular
into a c c o u n t
we r e f e r the
for r e g r e s s i o n
the r e g r e s s i o n
of a p p r o x i m a t i o n ,
M-estimator.
samples
to Huber
let us
In this
of m o d e r a t e
r e a d e r to H u b e r
problems,
seen e i t h e r
sizes.
For
1972)
and,
(196h,
(1973).
We now
4.2.
and ¢(c)
for the
first
to ¢, the d e r i v a t i v e s
and second d e r i v a t i v e s
with
respect
of 01(s)
to ~I have the
simple
expressions
$I(~,
AI)
(a/a~1)
=
~i (u - z' !I)
= - ~(u - Z' At) Z as w e l l
as
~11(a,
[i ) =
(a/a~1) ~I(~ ,
AI)
= ¢(u - Z' !I ) Z Z'The f a c t o r has the
A11
is a g e n e r a l i z a t i o n
of the
ordinary
design m a t r i x .
It
form A11 = (I/~ w i) ~ w i ¢11(~i , ~i )
= (I/~ w i) ~ w i ¢(c i) Zi Z' i which must
be p o s i t i v e
observation
~i' has the
definite.
The
influence
function,
at the
form -I ~i(~i ) = @(c i) A11 X i
and leads
to a c o v a r i a n c e
estimate,
here w r i t t e n
for equal w e i g h t s ,
v. var(_e I) - n_n--1 [ ~(¢i )2 [[ ¢(ej) vj v'j] -I --z ~'i w h i c h is v a l i d for i n d e p e n d e n t o b s e r v a t i o n s ~i" R o b u s t r e g r e s s i o n s is a f i e l d where the c o n c e r n
[X ~(sj) v. v . ] -1 --a --S
is more
on g e t t i n g
a
41
regression
certainly
risk
of b e i n g
that
the
classical
of o t h e r
two m o s t
by
other
methods
that
instead
of n e g l e c t i n g
the
desire
of p r o d u c i n g
robust
regression
resolve
the
function ~i'
however
cannot
have
But
of
one
a very
hand,
to
Ek m a y
be
importance.
and
The
outlying
abnormally
small.
although
or,
on
The
it is the m o s t
fitted
on the
the d e s i r e
identified
s. t h r o u g h z
likelihood
p1(¢k)
between
having
the
it o c c u r s
term
has b e e n
conflict
taking
outlying
that
important
to d e t e c t ,
tends
ways
in v i e w
choice
~k
minimization
We
of the
already
dependency
to r e c o g n i z e other
particularly
modification defined
the
it u s u a l l y
be u s e d
some ~k"
conflict.
evaluate
of m a x i m u m
sensitive
regression
small
than
outlier
of
as o u t l i e r ) is at the
and
root
of
methods.
an a p p r o p r i a t e
above
to
by
This
data"
Effectively,
a disproportionate
ck (without
all
squares,
on the
residual
it.
a not-small
Hopefully,
them
the
of t h e
are v e r y
difficult
that
bulk
observation.
least
are,
its
: it m e a n s
obtaining
of
contribute
is p a r t i c u l a r l y
important
"the odd
origins
aspects
~k may
hand,
in
some
attributing
obvious
observation
latter
by
probabilistic
observations
the
right
misled
to that
have
of ~I
cancel the
regression
a "rigidity from
the
has
which
of u.. I
been this
influence
observation
residuals
assess
index"
a change
should
specific
small
to
p1(.)
at d i s p o s a l
on any
for
can be p r o p o s e d
¢. r e s u l t i n g i
function
and,
thus,
"locked"
on
dependency;
reflects
we
the
Specifically,
it is
by R. = ~e. / ~u. i I i
and
is the
respect
to
outlier, this
closer
the
index
to
erroneous above
can be
one,
the m o r e
u i.
Conversely,
index
is c l o s e
evaluated
"rigid"
to
while zero.
be
regression
the
fit
In the
is w i t h
is l o c k e d present
on an
framework,
by
R i = I - w i $(ci) as m a y
the
seen by d i f f e r e n t i a t i o n [ w i ~1(x,
-I v' i A11 v i
of
o I) = o
or of w i ,(e i ) v i = O. We m a y
now
infer what
conditions
the
function
p(~)
must
satisfy
in
42
order
to p r o d u c e
observation
a robust
x. has
and w h a t e v e r Bounding
is
estimation,
a bounded
its n o n - n e g a t i v e
the
rigidity
index
bounding
In fact
$(ci)
observations Bounding
of o - s e c o n d
must
be
sense
estimator
that
each
whatever
it is
w.. z
satisfy R.
<
1
I
derivative,
~(~)
>
$(~)
bounded,
strictly
in the
on the
weight to
0 <
implies
robust
incidence
i.e.
0
a.e.
positive
for
at least
m independent
x. in o r d e r to have a p o s i t i v e - d e f i n i t e m a t r i x A11. --I i n f l u e n c e ~1(xi) implies b o u n d i n g of o-first d e r i v a t i v e ,
the
i.e. ~(¢) In o r d e r
to
obtain
bounded,
an u n b i a s e d
estimator, u = v'
is p o s s i b l e
(with
zero
residuals),
The
fact
matter
that
the
by the
if the
realized
with
function
p1(c)
bounding
almost
must
be
everywhere
derivatives
are
probability
zero,
satisfying
also
=
as
for
relation
"any p o s s i b l e "
indication.
bounded,
set
linear
impose
achieved
such
the
0.
(a.e.)
not
this
when
OI
we
~(0)
reminded
a.e.
It does
or u n d e f i n e d ,
~ corresponding
of c o n d i t i o n s
will
for
some
with be
c is not e
~ ~ ~.
A
said
"admissible". To c o n c l u d e
function
01(¢)
this
discussion,
for ¢ = O, have almost strictly add to
we
must be piecewise
sum up
convex
everywhere
: To be admissible,
(i.e., not concave),
bounded first
two derivatives
convex in the vicinity of at least m residuals
guarantee
~I
unicity
: the function
the be minimum
p1(c)
¢i"
and be
We f u r t h e r
must be continuous
convex everywhere. A natural The
answer
invariant
question
is w h e t h e r
is p o s i t i v e , [I
when
the
but sample
there
admissible space
exists p1(c)
is not
any
admissible
cannot
bounded,
produce i.e.
with
01(c). scale
and
43
= R p, p = m + with
respect
I.
to the
of the n o n - z e r o
Effectively,
assume
scale
residuals,
real
of the
variable
we
require that
~I
to be
is to be
independent
independent
X in
w i ~(~c i) Xi = o; then
we m u s t
have
cancellation
of the
a!l
first
/ax
derivative
= o
or
w.1 E.1 ~ ( ~ c i ) We n o w
compare
arbitrary
set
the
last
and the
of o b s e r v a t i o n
c i ¢ ( X a i) is p r o p o r t i o n a l scale
invariant
to
structure,
last
zi but
weights, @(kEi).
/
two
their
expressions.
~(¢)
For
compatibility
Therefrom
for ~ 6 R p,
¢(c)
= o.
we
obtain
an
implies that
the
that only
satisfies
= constant
or
c ~(~) and
= (~-I)
~(~)
is
p1(~)
= kl~l v
k ~ 0,
v ~
with
But this
specific
form
can be a r b i t r a r i l y bound
the
sample
pI(E)
in o r d e r
alternative
is a n a l y s e d variable
4.3.2.
large.
space
G,
to p r o d u c e
will
investigated
is not
in
now
section
and which
admissible,
We t h u s
face
or we m u s t a robust
retain
our
seeing
the
: either
the
scale
knowledge
The
first
second
term
will
Least
structure
of
"scale"
powers.
section
we f u r t h e r
investigate
the
minimizing
M I = ~ wi 01(~ i) 1
of t h e
be
invariant s, the
estimation
we m u s t
function
¢.
In this
with
the d e r i v a t i v e s
a scale-dependent
estimation.
the
that
an a l t e r n a t i v e
use
attention;
~.~ w h e r e
implies
I.
of
8 --I
01(E/s)
of the
44
pl(~)
= I~l ~
and E
We
assume
that
the
in o r d e r to h a v e sample
space
In this most
2, but
the
avoids
method
is
are
the
criterion
v tending
to
S'
At"
and that
large
found
certainly
Chebyshev's
respectively
-
p(¢)
exceedingly
of p(c)
u
v satisfies
a non-concave
family
famous
parameter
=
residuals
several least
and t h e
infinity
some
of the
c i-
well
squares absolute
and to
bounding
I, have
known
schemes.
minimization value
The with
~ =
minimization,
equally
received
much
attention. Large due
values
to the
applications. of the
of t h e
fact
large
they
In fact residuals
fairly moderate terms,
to
the
smaller
not
any
be
considered
interest
~ is,
the
in the
in c o m m o n
smaller
sequel
statistical
is the
incidence
8 e s t i m a t e ; it a p p e a r s that v m u s t be --I a r e l a t i v e l y r o b u s t e s t i m a t o r or, in o t h e r
an e s t i m a t o r
The
~ will
present
on the
to p r o v i d e
provide
observations.
parameter
do not
selection
scarcely
of an
perturbed
optimal
by
outlying
~ is i n v e s t i g a t e d
by R o n n e r
(1977). The
value
arithmetic
v = 2, c o r r e s p o n d i n g
mean,
of the
Princeton
(1972,
p.
239),
is
still too
group they
based
answer
with
large
on M o n t e the
the
as we
least can
Carlo
squares
or t h e
appreciate
by the
runs.
In A n d r e w s
question
W h i c h was the w o r s t e s t i m a t o r in the s t u d y ? If t h e r e is any c l e a r c a n d i d a t e for such an o v e r a l l s t a t e m e n t , it is the a r i t h m e t i c m e a n , l o n g c e l e b r a t e d b e c a u s e of its m a n y " o p t i m a l i t y p r o p e r t i e s " and its r e v e r e d use in a p p l i c a t i o n s . T h e r e is no contradiction : the o p t i m a l i t y q u e s t i o n s of m a t h e m a t i c a l t h e o r y are i m p o r t a n t for t h a t t h e o r y and v e r y u s e f u l , as o r i e n t a t i o n points, for applicable procedures. If t a k e n as a n y t h i n g m o r e t h a n t h a t , t h e y are c o m p l e t e l y i r r e l e v a n t and m i s l e a d i n g for the b r o a d r e q u i r e m e n t s of p r a c t i c e . Good a p p l i e d s t a t i s t i c i a n s w i l l e i t h e r look at the d a t a and set aside ( r e j e c t ) any c l e a r o u t l i e r s b e f o r e u s i n g the " m e a n " (which, as the s t u d y shows, w i l l p r e v e n t t h e w o r s t ) , or
opinion
et al.
45
t h e y w i l l s w i t c h to t a k i n g the m e d i a n if the distribution looks heavy-tailed." d i f f i c u l t i e s h a v e b e e n s u r v e y e d by H u b e r (1972) and
The
concluded extreme
that
only
data.
expected,
For
but
this
Technically incidence
the
solution
limit
procedure who
investigated
totally
great
facilities. The
view
The
recently
is p o s i t i v e
to the
but
et
oriented
viewpoint,
while
as L e w i s
and
(1975)
theoretical fact
that
their
Shisha
a great
these
of p o w e r s vector These
norms
and t h u s
to
the
location
problems.
the
said,
are
Several
many
is p a r t l y
difficulties ~ is in t h e
own
median
with
are
by
range
?
a reluctancy
Hwang
(1975)
as well
The due
algorithms solutions.
permits to
toward
of
in t h e o r y
of m i n i m i z i n g
Boyd
encountered
object
on p r a c t i c e
a vector
interesting
the
of c o m p u t a t i o n a l
a computation
artificial
oriented
this
assumptions.
generalization from
been
in the
instead
(1921)
to the
aspects.
impact
9, this
experience
is t h a t
(1975),
are u s e d
investigated
with
by J a c k s o n
authors,
topological
that
9 = I as the
is it m e a n i n g f u l
question
e x t e n d ~I
However
on ~ has
Shisha
note
are well
existence
and
immediate
problems
theoretical
a serious
residual
force
Before the
sum
to m i n i m i z e
a matrix
(197h)
to the
to
structure.
and R e y
multidimensional
in the
computations
I < 9 < 2,
for
zero
troublesome.
well
first
function
paper
parameter
residuals
Let us
last
be
a n d an
to m u c h
aspect
for most
to p o s s i b l y
are
vary
- Our
the
of t r i c k s
an
but
needed
analyse have
considerations,
of r e s i d u a l s
norm minimizations
As we
Cargo
convergences
(1975a);
when
approach
number
of [I
present
reluctant
considerations
eventual
leaving
(197h)
most
may
techniques.
recommended
one-dimension
compute,
large
is lost
not
median.
the
the
in c o n s i d e r i n g
interesting
of the
Fletcher
to t h e
estimate
corresponding
does
dependency
pertinence
al.
~I
of t h e
we can
bounds
it has b e e n
with
attributed
a good
programming
resolved
- An
extension
Nowadays,
answer
related
The
one-dimension
domain.
attention
be
be
1.2,
convexity
linear
I. I n a s m u c h
this
a natural
multivariate
by
reasonable; the
supports
provides
also
of ~ >
which
strict
result.
solved
may
appears
then
may
are u s u a l l y
indeterminacy
inferior
But
must
of
be
of o p i n i o n
only parameter
is 9 = I.
and
importance vicinity
is a m a t t e r
the
indeterminate known
limited ~ in the
it m a y
known
discard
components
methods
are
at d i s p o s a l
all t h e m e t h o d s (e.g.,
orthogonal
relying
to m i n i m i z e on
some
decomposition);
functions.
separation due to t h e
of fairly
46
involved in this
non-linearities, way for the
techniques very poor function with
of m i n i m i z a t i o n accuracy
minimum,
relaxation
Gentleman this
second
order derivatives. power
implementation whereas
Ekblom
this
Davidon
(1973,
1974)
perturbed
Ekblom m e t h o d
corresponds
generalized
problem
a fictitious
second
to apply
first
the
size, that
is,
to
of Rey
order
to
the
With present
a clear
literature. techniques
4.3.3.
solve
4.5 was the most regard
discussion
Comparison
the
least
2.4 may be inferred mentioned
but,
power
seeing the
demonstrate the
For
given
it is valid
importance
case
u~.
by a
that
the
of the in addition
of
We have p r e f e r e d chosen
of the
step
second
we have met
noted that with
at
time. and SpKth
to be
found
approach
(1974)
in the prior
with
other
? ~I
fits
in the
papers
of the
v >
frame
we have
question,
conditions
of the m i n i m u m
it does
I, the
squares,
we believe fit.
continuity
the T a y l o r - l i k e ~ = 2.
of section
already
sum of absolute
to d e m o n s t r a t e
for least
the
1963),
of the poles
equations
Merle
small
(1977).
estimator
power p a r a m e t e r ,
to v is sufficient
that
powers
series
under what
situation
larger
least
from the t h e o r e t i c a l
invest i g a t e = I.
experience,
to
recommends
a specially
in c o m p u t a t i o n
in Rey
in Taylor
to d i r e c t l y
respect
non-linear
Icil,
in the
to observe
progressing,
be
of the
and Fowell,
the poles
of several methods
of the
proposed
Can we exoand
Whether
highly
in
residuals,
it consists
with
"avoid"
efficient
to c o m p u t a t i o n a l
has been
method
cannot
(1972)
data,
a
we are left
methods
correspond
restricted
have
a "flat"
as used by
for hazards [I
search
for M M - e s t i m a t o r s .
elimination
scalar
experience
such
absolute
Indeed,
to the
gradient
the
algorithm
a somewhat
Then
determination
It is n o t e w o r t h y
in order to with
for
gradient
Forsythe
the
they
algorithms
(Fletcher
(1975a).
However,
the
solution
proposes
method.
derivatives.
section
the
method
dimension
chosen
involve
because
h.4.1
in the
difficulty,
of the
quadratically
gradient
The
trouble
former,
at section
to us to p r o c e e d
range.
often.
order
are responsible
whenever
To bypass
The
arising
They
quite
to first
order
and they
convergence
residuals.
to n u m e r i c a l
methods.
due to the problems
possible
interesting
occurs
and will be p r e s e n t e d
to a n e g a t i v e algorithm
which
are equivalent
the
appear
also be rejected
inclined
and gradient
Unfortunately, considered
not
~ in the
must
and are situation
(1965),
context
second
it does
parameter
useful
We will
residuals, of ~I with expansion
47
In the
sequel,
indeterminacy, obtained
for
Let ~I
we
the
implicitely
entities
9 tending
be the
to
as p r e v i o u s l y ,
probability
in the
domain
the
density
For
may
situation definition
of the
limit
- v'
e I) f(~)
integral
spans
(u,
function
~')'
f(~)
d~ = o
the e
sample
space
G with
~.
is a s s u m e d
sufficiently
regular
satisfying
~ = I, the
value
in per
above.
Iu - X' !ii When
that,
I equate
of
=
The
~ =
I from
solution
Z ~(u where,
assume
for
function
be u s e d
instance,
the
~(¢)
and this above
small.
becomes
will
undefined
not m a t t e r
limiting
scheme
as
leads
at ~ = 0. long
Any
finite
as it is finite.
to
~(u - ~' 9_i ) = - I, if u < !' i I, = =
We now
perturb
corresponding
f(~)
let
it p r o d u c e
the
reorganize
function
this
of 81
(I
-
The
integrand
domain;
~(u
v'
81,
I,
if
u >
v' --
e --I"
distribution
- v'
t)
- v'
v_~(u
~(u
first
integral
Let
the
solution
e I)
f*Cx)
9_ I ) -
- v_' A t ) term,
consider
I
e_1 a c c o r d i n g
dx
the
be
0 ~ t ~
to
= 0.
to
obtain
have
the
~(u
- v'
9_i )]
g(x)
dx =
0
in fact, to
and
perturbation
in o r d e r
immediately
is equal
g(~)
+ t g(~),
equation
We
of the
and the
=
f(~)
and g(x).
+ t f v
u
perturbed
integral
/
if
on ~I'
= (I - t)
f v
We
some
perturbation
f*(~)
and
by
+
0,
9_I as a
implicit
f(x)
is n o n - z e r o
equation
dx
in a v e r y
limited
48
- 2 [ Z
@(u
- Z'
~i ) f ( z )
dz
with =
"between"
~
= Z' ~I and
is thus
implicit far. later are
continuous
equation
only
assume
discuss
that
cancel
for
know
the
the
What the
the are
=
that
would
g(~),
Then But
set
The
sufficiently
we have both
obtained
~,
when
are
~;_ r e m a i n s
dependent function ~I'
interested
as we
(I
(t2/2)
upon
- 8")[
~2'
"'"
vicinity
regularity
comes
I, v =
-
its
2 f
e.
f(y)
ol
I.
out
are
to
of ~I"
conditions
particularly
Then
dy
fCx) dx =
relation
(I - t)
did p r e v i o u s l y
- t)
the m e d i a n
easily
to
f(x)
o,
with
e
+ t g(x)]
in dx = O.
obtain
+ t [+~
~(x
*)
g(x)
dx
= 0
- e)
g(x)
dx
= 0
- e
approximately,
(I
and
- t)
2
(e
- e
Strict needed
e of the
e or,
can
theorem.
m =
by
sum
...
+
in the
the
terms
to t and we
satisfies
Y+~_ @(x We d e v e l o p
is enough
a 2
I~] ~(x we
This
a 1
problem,
f(x)
respect
e 1
+
both
and t h e i r
t and g(~)
expansion
t
that
so
and will
with
Taylor-like
+
of the
distribution
are v e c t o r i a l
of v a l u e s
term
regular
of the
coefficients
distribution
second
an u n s p e c i f i e d
is also
~I"
is when
be
implicit
one-dimension
whereas
to ~I'
M o r e o v e r ~I is c o n t i n u o u s e s o l u t i o n ~I = ~I for t = O.
validity
small,
demonstration apply
to
condition.
e 1
to
respect
a discrete
trivial
t is
u = ~' ~;
g(~)
with
two h y p e r p l a n e s
given.
conjecture
when
with
that
smoothly
arbitrarily
and
is r e l a t i v e
We will
varying
the
) f(e)
+ t S+~_ ¢ ( x
in
49
e Whether
g(~)
question.
allow
the
above
expansion. The
one
hand,
of g(x)
g(x)
e(x 2 f ( ~ ) 8) g(x)
has to be d i f f e r e n t i a b l e
On the
approximation
function
= e + t I +-Z
by
the
We
support
these
point
of the
critical
On the
a very
poor
"'"
everywhere
limiting
scheme,
a differentiable
presentation.
may mean
dx ÷ d2
is a p a r t l y
9 ÷
function,
other
hand,
convergence
open
I, or an is
sufficient
a fairly
of the
to
rough
Taylor-like
remarks. derivation
is the
continuity
of the
second
.
term
in the
implicit
g(~)
is met
with,
equation
the
with
respect
to ~I"
as a f u n c t i o n
a discontinuous
integral
f Z ¢(u - v' e[) g(~) seen
When
of e I, is d i s c o n t i n u o u s
u=v'_
d~,
at each e I such
that
e_1
with = Note
the
parameter have
(u, ~')'
discontinuities space
smoothed
by
of the
a set
these
: discontinuity integral
of h y p e r p l a n e s
discontinuities;
of g(~).
provide - To
but
a partition
allow
they may
the be
of the
presentation,
we
intrinsically
present. Consider
now v a r i a t i o n
of the
parameter
t,
from t = 0 to t = I.
.
That
may
partition
mean
that
and thus
~I m o v e s has
through
a fairly
several
connex
discontinuous
parts
of the
variation.
This
above can
only
.
be
expressed
by
a slowly
neighbourhood
of [1'
expansion
converge
level
can
of its
4.3.4.
"Best"
We d e s i g n
first
that
robust
is in the
very
order
in this
converging
rapidly
distribution
continuous, - the
1ocatlon section
f(u)
curve
part
of the
and t h e r e f o r e
~I
remains
in the
partition,
the
be t r u n c a t e d
at the
estimator the
function
given
is k n o w n ,
for u 6 [ u_,
influence
robustness).
same
When
term.
a minimum a s y m p t o t i c variance - the
expansion.
u+] .
pI(E),
or ~(e),
which
yields
that except Moreover
is e v e r y w h e r e
for
its
f(u
bounded
location,
) = f(u+)
(criterion
and
= O. of
50
- the
solution
Although same
type
only
of a r g u m e n t
estimations. involved
Thus,
we
of the
seeing
intend
an a p p r o p r i a t e
attention defined
on the
up to
in o r d e r
to
In w a y
2
choice fact
that,
a unit
f o r the
of H u b e r ' s
du
@(¢).
best,
this
have.
variance
e) f(u) du]
factor.
the
thought
already
e)] 2 f(u)
function
between
not
the
2
We
first
function
The
latter
retain
@(¢)
the
can
w i l l be
o n l y be
selected
denominator.
of r o b u s t n e s s
frame
-
If ~ ( u -
of t h e
we
asymptotic
= L [~(u
some m u l t i p l i c a t i v e
have
questions.
the
results
the
of a f a i r l y
existing
E; we have
sequel,
regression
expense
relationships
limited
1
of i n t r o d u c t i o n
constraints in t h e
the
of c o n v e x i t y ) . in the
in m u l t i p l e
be to the
residuals
to m i n i m i z e
V
by
would
of u, ~ and the
profitable
(criterion
is c o n c e r n e d
be p r o d u c e d
this
investigation
defined
estimation
could
However,
distributions effort
e is u n i q u e l y location
to the and
paper
derivation,
convexity. (1964)
However
the
approach
to
the a n a l y t i c a l
we
when
he
be
quite
will
first
omit t h e
We t h e r e f o r e
fit m o r e
investigates different
or less
his m i n i m a x in m a n y
respects. In o r d e r throughout
discretisation operators
be
ease
this
section of t h e
sample
space.
we w i l l
notation
Functions
become
use
resulting vectors,
from
a
whereas
are m a t r i c e s .
Let the
space
defined
by t h e
taken
manipulations,
a vectorial-matricial
at the
coordinate set
be d i s c r e t e .
of v a l u e s
regularly
spaced
{...,
Any
function
y(u)
will
then
Y ( U i _ 1 ) , Y(Ui) , Y ( U i + 1 ) , . . . }
coordinates
{...,
ui_1,
ui, u i + 1 , . . . }
with uj = u 0 + jq,
-
Remark
is not
the
representation
essential,
of v a l u e s , We w i l l
we
other
associate
assume
that to
where
the m a t r i x
D has
basis
n infinitesimal. will
representations a vector functions (~/Su)
are
an i n t e r m e d i a t e
can be p r e f e r r e d
Z of p o s s i b l y
y(u),
elements
only be
infinite
sufficiently we
associate
step
- To the
set
dimension.
differentiable, Dy
and
e.g.,
51 d .1O . = =
-
t/(2n),
if
i
= j-1
1/(2n),
if
i
=
= When 2 d o e s n o t
vanish
differentiation, similarity We
also
in o r d e r arbitrary
at
need
an i n t e g r a t i o n
support
y(u) z(u)
one
f(u)
=
to
With
this
moment,
z(u)
with
z(u), du,
set
be
taken
will
of care
of by
calculus. It will
respect
be
of m a t r i c i a l
to
a weight
f(u)
2'
F A = A'
F 2
type
and two
we a s s o c i a t e
and has
elements
diag( .... f(u i) . . . . ).
I, f(u)
formalism,
only
take
Let 2 be the
the
unit
constant
we
associage
~,
we
associate
2'
we t r a n s f o r m
with into
vector
du,
equality account
associated
the and
the
function inequality
is s u b s t i t u t e d
F ! analysis
problem
constraints.
in a
At the
equalities.
to the
V = 2' under
open
e.g.,
is c o n c e r n e d ,
=
minimization we
the
e.g.,
to $ y(u)
standard
~
which
operator.
F is d i a g o n a l
function
other,
and
of
infinitesimal
an o p e r a t i o n
functions
only
occur
ordinary
F
When
frontier
difficulties
the m a t r i x
to the
the
the
to $ y(u) where
O, o t h e r w i s e ,
with
to
j+l
~(u - e),
then
it m i n i m i z e s
F 2
constraints c I = 2 !'
F 2=
0
and c 2 = 2( !' Constraint stands We
c I indicates
for the solve
multipliers;
that
denominator
this let
minimization them
be
8 is
of the
XI'
F D ~ - I) = O. an M - e s t i m a t o r , asymptotic
by the m e t h o d X2"
Then,
V + ~1 C l or,
after
differentiation
with
respect
~ is
+ ~2 c 2 to
~,
whereas
constraint
variance. of the
Lagrange
also m i n i m u m
of
c2
52
F y + ~I
F 1 + ~2 D'
F I = 0
and Y = - ~I 1 + ~2 F-I The
last
can be its
transformation
performed
domain
only
makes if t h e
of d e f i n i t i o n ; D'F
Introduction
of ~
use
DF 1.
of t h e
antisymmetry
distribution
vanishes
of m a t r i x
on the
D,
frontier
and of
i.e.
= - D F * f(u_)
in c I and
= f(u+)
c2 provides
the
= 0.
Lagrange
multipliers
~I = ~2 !' D F !, given !' F ! = I, X 2 = II!' F D F-I D F !" The
former
cancels
accordingly
with
x11x ~ = !' o F ! = $
(alau)
= f(u+) We
collect
the
results
and
f(u)
du
- f(u_)
= 0
obtain
~(u
- e) = x 2 f(u) -I
(alau)
~(u
- e)
in
f(u)
or
Given
ci,
see t h a t
estimator,
f(u).
i.e.
f we
= x 2 (alau)
~(u
any M - e s t i m a t o r except
for
which
of S t e i n ( 1 9 5 5 ) ,
space
boundary.
when
the
asymptotic
du = 0,
is e q u a l
a translation
sense
The
- e) f(u)
to
constant,
distribution variance
the m a x i m u m is
vanishes
of this
=
-
F y 12
where
x2
=
I
I I [ ( a 2 1 a u 2)
in
f(u)]
on t h e
M-estimator
by V = y'
likelihood
admissible
fCu)
du.
in the sample is g i v e n
53
Although
the
classical, relations of the
we
characteristics
illustrate
between
sample
the
the
various
=
[I/r(~
+
I)]
likelihood
findings
elements.
Ul,...,u n drawn
f(u)
of m a x i m u m
above
Let
from
the
gamma
(u
a) v
exp
-
estimators
in o r d e r us
estimate
-
a)],
strictly Note
the
Effectively, respect
to
u >
a
we h a v e
In the
v.
parameter
restriction derived
present
v
on p a r a m e t e r
a differentiable
boundary.
location
,ifu
positif
that
the
of d e n s i t y
if
=0 for
the
distribution
[-(u
are
to d i s p l a y
the m i n i m u m distribution
case,
is p e r e m p t o r y .
variance
M-estimator
vanishing
it is c o n v e n i e n t
on the
to
with
sample
space
set
u_ = a and u+ = ~. This
yields ~Cu
-
e)
=
x2
{[~/(u
-
a)]
-
1}
and V = - X 2 = V - I, f o r Therefrom,
and
accordingly I
we
can
define
I .
with ~(u
e)
-
f(u)
du
= 0,
e through wi
or by t h e
~ >
explicit
{ [ 9/(U.1
result
-
derived
e)]
with
1}
= 0
respect
to
e0,
an a p p r o x i m a t i o n
of 8.
e =
This not
eo +
estimator
robust.
~ wi
[(1/~)
-
is m i n i m u m
- By the
way,
of d i f f i c u l t i e s
the
of s e c t i o n
of e is not high
order
anymore terms
i
-
eo)]
in a s y m p t o t i c
note
is i n d i c a t i v e expansion
1/(u
that
in t h e
inversely the
variance
V tends
too
slowly.
to t h e
first
expansion
n var(@)
= V + 0(n-l).
eo)2].
obviously
is
for v = I; t h i s
conditions
proportional in the
-
but
to c a n c e l
analytical
2.4 to c o n v e r g e
dominate
[ [ wi/(u i
/
which
Then the
sample
size
lead
variance n; the
54
We now p r o c e e d constraints a minimum
which
in the
derivation
by
impose
robustness
to the M - e s t i m a t o r .
variance
e and thus
have
equality
of the
inequality We
search
for
to m i n i m i z e
V=y' under
addition
FZ
constraints e I
=
2
1'
F
y
=
0
e2
=
2( S '
F
D
Z
-
and
We
further
limit
p(.)
to be
I)
=
convex,
0.
that
is we r e s t r i c t
by
C3k = - 2~' k D [ < O, for any basis bound mean may
to the
vector
influence
quadratic be
~k =
stated
curve.
value. in the
Thus
Inspection
reveals
only
This
bound
a second
We
will
group
of
be
also
set
a superior
set r e l a t i v e
inequality
to the
constraints
form
Chl
if, and
(0,...,0,1,0,...,0)'
that
= (e' 1 Z) 2 - 8 [' the
set
F ~ ~ 0.
of e q u a t i o n s
has
a non t r i v i a l
solution
if, 8>I.
The
situation
median We
of f(u)
8 = I, seen for
investigate
method
of Kuhn
section
h.3.3).
be m i n i m u n
as the
solution. this
and
limit
It is the most
inequality
Tucker,
to the
to
above
the
inequalities
lead
the
[ X3k with
the
Beveridge derivation,
following
last
C3k + [ Xhl
two
sums
C3k + [ Xhl
constraints
on the
X3k > 0
and
to
and
Xhl > 0
the
by the
Schechter
the
solution
Chl ,
satisfy
ehl = O,
Lagrange
gives
e.
minimization
of
V + X1c I + X2c 2 + ~ X3k but
8-values,
robust
constrained
according
Similarly
of g r e a t e r
multipliers
(1970, Z must
55
for
all k and
delimited not
I.
There
by the
exterior
to this
as r e s o l v e d
accessible
minimum. now
a further
region.
considered
Consider
is
inequalities;
the
seeing
the
requirement
minimum
This
question
that
8 >
term
conclude
as w e l l
can
the
of a c c e s s i b i l i t y
sequel,
Vajda We order
the
contribute
in a non
at the
accessible
either
k
will
be
existence
of an
= 0.
positive
minimum,
way,
therefrom
we n e c e s s a r i l y
and
or
= 0 3k k3k > 0
and
< 0 3k C3k = 0,
either
~41 = 0
and
c41 < 0
or
ALl >
and
Chl = 0.
alternatives are not
lies
we
will
of the
on the carry
thorough
(1961,
we
have
c
obtain
boundary
the
is o b v i o u s l y
binding of the
attention
discussion
section
differentiate to
0
of these
second
for the
for the
accessibility
on the
of the
that,
and that,
region.
terms;
Kuhn-Tucker
combined
the m i n i m u m .
this
approach
expression,
This
yields
with
respect
terms, In the
is
in the
given
to ~,
to
) F Z + ~I F ! + ~2 D' F !
-
D'
(e'
Z) ~ i = 0
and --
)-I
- [ ~3k is u n d e r
first
second
12.~). the
(I - 8 ~ ~ i
This
region
and thus
condition
constraints
the m i n i m u m
line
the
accessible
as,
The m e a n i n g terms,
only
that,
be
I guarantees
~3k C3k + X ~ 1 % 1 Each
concerning
must
conditions
F -I
D ~k - X ~ I
(e' ~)
--
F -I
~l l"
in
by
56
D'
F = - OF o r
f(u_)
= f(u+)
= 0
and
The
latter
on t h e
frontier
The will
factor
of t h e
are,
types
subsets
Type
be
in
to
that
third,
of
cancel
new
the
definitions
that
- e)
can
They
are
inequalities
on
Lagrange
matrices
assume
$(u
be c o n s t r a i n i n g
multiplicative
a continuous
space.
of
cannot
of t h e
the
accept
note
of b e h a v i o u r s
groups
(ZBk m u s t
obtain
sample
criterion
that
if we
And
D ~k-
in t e r m s
note
Second,
corresponding
= - Z3k
space
omitted,
in o r d e r
two
~k
for ~
First,
of t h e
I : the
obtain
convexity
diagonal.
derivative
different
the
sample
analysed. can
D'
obtained
multipliers.
or n e a r l y
-
be
(.)-I
Lagrange
connex
that
expression
now
first
implies
Z3k
F -I
frontier). multipliers scalar
of
the
and
(F-ID)
f(u)
has
a continuous
~(u
- e).
Then,
be d i s t i n g u i s h e d
are
not
binding.
are,
three
in
Then
we
subsets.
k3k_; Y =
~41
k2 F
= 0
D F I_ - ~,i !
or
$(u
- o) = x 2 [ ( a / a u )
in f(u)]
- ~I
with
[¢(u
-
0)] 2 ,~ Bv
and
-
~(u - Type
2
: in
subsets
where
the C3k
e) > o .
convexity = O,
criterion
Chl < 0
and $(u
- 0)
= constant
with
[~,(u
-
0)]2
<
By.
is b i n d i n g ,
we h a v e
87
-
Type
3
: in t h e
criterion
remaining
is b i n d i n g .
subsets
of
the
sample
space,
the
robustness
Thus c41
= 0,
C3k <
0
~(u
- e) = ± ~ B V .
and
Observe The
that
$(u
above
section
it
M-estimators To the
how
in t h e
We n o w
or
determine
structure. ~(u
case. we
example. the
There
- 8) has
the
We
gamma in
that
~ =
80 w i t h
the
equation
is
estimate
certainly
expression,
the
way
the
robust
location
the
maximum
in H u b e r
(1972,
estimators.
a n d we
to
3 behaviour.
variance
consider
distribution
a similar
require
conjecture
minimum
location
we m a y
mean"
estimate
discourse,
from
a type
likelihood
above
contribute
terms,
to
subsets,
the
maximum
present
the met
in two
supports
demonstrates
previously
"quadratic
fully
robust
Ul,...,u n drawn
precise
at m o s t
concerning
illustrate
observations
has,
derivation
12.3)
moreover,
- 8),
But,
robust
estimation
of t h e
require
incidence
sample
that
estimate.
for
all
In m o r e be
twice
the
find
its
B 0 = 22 = ~. of
~(u
a type written
- e).
First,
let
us
given
B >
I.
I subset, with
unknown
There,
coefficients
b I and
b2, ~(u
Seeing the
that
the
existence
There
can
be
o)
convexity
one
- e)
or
two
bound,
=
b I ll/(u
criterion
of a u n i q u e
corresponding ~(u
-
type
of t y p e
summarize
the
= - blb3, =biE
S/(u
a)
is n e v e r
I subset
subsets
we
-
- h2] "
binding,
and
to
the
3.
Noting
structure
of
- b21 ,
with
b~
b2, =
b5 =
~(u
- e)
if b 4 ~ u - a ~ b 5 if u ~ b 5
b3 ~ 0
I / ( b 2 + b 3) I/(h 2 - b3) ,
if b 2 ~ b 3 if b 2 ~ b 3 .
conclude of t y p e
to 2.
blb 3 the
if u - a ~ b~ - a)
= + blb3,
bl,
we
absence
as f o l l o w s
88
-
It m a y
be n o t e w o r t h y
obtained
by
Huber
distribution The
although
w a y we h a v e
determination variance verify
the
observe section
the
constraint
they
ci,
satisfy
as
such t h a t
is v e r y
nearly
contaminated
the
result
normal
differs.
function
~(u
In o r d e r
to
coefficients
- 8) p e r m i t s produce
must
be
an e a s y
a minimum
such t h a t
they
i.e.
¢(u - e) f(u)
the
constraint
f~ t(~/~u) as w e l l
context
the
the
this
for t h e
coefficients.
M-estimator,
f~ such t h a t
that 6)
present
parametrized
of its
robust
to
(1964,
they
du = 0,
c2,
i.e.
~(u - e)J f(u) du = I,
satisfy
c41
in t h e
max~ ~(u - e)l 2 =
type
(blb3)2
3 subsets,
i.e.
~ sV
with
f~ E~(u - e)J2 f(u) du.
V Due to the
absence
constraints This
set
means. values
of i m p l i c i t
of ~ a n d good
B.
but
less
To set
Our
V = ~;
further
define
have
concern
it
can
omitted
easily
I the r e s u l t s is for
With
this
is m o r e
be
the
solved
obtained
B = 4 and B-value,
efficient
inequality
by n u m e r i c a l
for
we its
than
relative
efficiency
= ~/(~
than
the
non-robust
minimum
relative
efficiency
= ~/(~
illustrate,
~ = 3.
we w r i t e
Accordingly
f ~(u we
we
a few particular
see that
the
asymptotic
the
robust
variance
non-robust
mean,
efficient
B = 4,
equations
in T a b l e
efficiency.
is a p p r o x i m a t e l y arithmetic
2 subset,
CBk.
We p r o p o s e
e exhibits
of t y p e
e through
the
down t h e
+ I), variance - I).
solution
with
- e) f(u)
implicit
du = O,
equation
M-estimator,
for the
parameter
59
[ ~i
~(ui
- e)
=
o
or .2849 [ wj - [ Wk[ I/(u k - e) " .3076]
= 0
where u i = uj, = Uk, Some 9 percents observations
v=2,
v=3,
of their
into account through their weights
and
exact values. V
b5 $b 4 f(u) du
8=4
6.002
.4401
.4698
1.988
.9006
6=9
4.114
.4694
.9171
1.582
.9632
8=25
3.052
.4879
1.878
1.314
.9908
8=100
2.458
.4968
4.353
1.145
.9988
8=4
12.07 8.921
.3076 .3219
.2849 .5309
2.956 2.492
.9086 .9686
8=25 8=100
7.212
.3297
1.031
2.212
6.369
.3327
2.256
2.065
.9932 .9993
8=4
20.27 15.68
.2360 .2444
.1959 .3546
3.942
.9144
3.435
.9723
13.30 12.28
.2485 .2498
.6675
3.154
.9947
1.418
3.032
.9996
8=4
114.1
.0980
.0553
8--9 8=25
97.92
.0994
.0934
9.958 9.295
.9826
91.42
.0999
.1645
9.046
.9981
8=100
90.06
.1ooo
.3331
9.002
I .000
8 large
v(v-1)
1/~
*
v-1
8=100
b3 = 8
there the
b3
8=25
v > I,
lie in the type 3 subset;
b2
6=9
v=10,
1.688 + e.
bI
8=9
v=4,
if u i >
of the weights
are taken
irrespectively
if u i ~ 1.688 + e
1/2 [ v 2 ( ~ - 1 ) ] - 1/2
Table
I
.93O3
60
The be
estimator
applied
8 is c o n s i s t e n t
to the
assessment
approximately
according
to f(u).
distributions
than
may
4.4.
MM-estlmators
We h a v e robust some -
f(u)
M-estimators
we
robustly
We w i l l residuals scatter (1976)
are
a regression
defines
what
a.
sample
e means
It m a y
distributed
for o t h e r
difficulties.
estimation that
to
it was
without
is t r u e
involving
even
include
not
in the
a scale
possible
to
obtain
a dependence
simple
point
estimation
on
location
whenever
we
problem. by the
as w e l l
This
to p a r a m e t e r of a n y
conceptual
h.3.1
concerned
we c o u l d
estimation. who
- This
compelled
be m a i n l y but
section
of r e g r e s s i o n
parameter
Therefore,
solve
at
respect
location
However,
present
in r e g r e s s i o n
observed
"scale"
with
of the
be
is the
estimation
in n e e d object
of
of the
scale
of the
some m u l t i d i m e n s i o n a l
of the
simultaneously
location
and
"scale"
residuals
?
investigation scatter
by
of M a r o n n a
a set of
MM-estimators. But
what
is t h e
disappointing
answer
has
of the
been
proposed
A partially
by H u b e r
(196~).
We
excerpt
:
"The t h e o r y of e s t i m a t i n g a s c a l e p a r a m e t e r is less s a t i s f a c t o r y t h a n t h a t of e s t i m a t i n g a location parameter. P e r h a p s the m a i n s o u r c e of t r o u b l e is that t h e r e is no n a t u r a l " c a n o n i c a l " p a r a m e t e r to be estimated. In the case of a l o c a t i o n p a r a m e t e r , it was c o n v e n i e n t to r e s t r i c t a t t e n t i o n to s y m m e t r i c d i s t r i b u t i o n s ; t h e n t h e r e is a n a t u r a l l o c a t i o n p a r a m e t e r , n a m e l y t h e l o c a t i o n of the c e n t e r of s y m m e t r y , and we c o u l d s e p a r a t e d i f f i c u l t i e s by o p t i m i z i n g the e s t i m a t o r for s y m m e t r i c distributions (where we k n o w w h a t we are e s t i m a t i n g ) and t h e n i n v e s t i g a t e the p r o p e r t i e s of this o p t i m a l e s t i m a t o r for n o n s t a n d a r d c o n d i t i o n s , e.g., for nonsymmetric distributions. In the case of s c a l e p a r a m e t e r s , we m e e t , t y p i c a l l y , h i g h l y a s y m m e t r i c d i s t r i b u t i o n s , and the a b o v e d e v i c e to e n s u r e u n i c i t y of the p a r a m e t e r to be e s t i m a t e d fails. M o r e o v e r , it b e c o m e s q u e s t i o n a b l e , w h e t h e r one s h o u l d m i n i m i z e b i a s or v a r i a n c e of the e s t i m a t o r . " The scale will to
same and
only
some
author
scatter assume
estimator
has
recently
definitions that, ~I
given
through
investigated
(1977a).
For
various our p a r t ,
a set of r e s i d u a l s
approaches
to
at the m o m e n t
we
Cl,...,a n corresponding
61
•
=
cl we have
a minimization
rule
u
-
i
which
Z'i
~I
provides
s=8
' the
scale
s according
to
2
and M 2 = min
for
e2
with
M2 = I In
order
to
function
provide
p2(.)
P2 ( ~ ' ~ 1 ' e 2 )
a measure of
must
be
selected
the
scale
such
that
e 2 = scale is c o n s i s t e n t
f(~)
of
d~.
of
the
residuals,
the
(¢1,...,En)
with
Ixle 2 = scale We now d e v o t e
our
of
attention
(X c 1 . . . . .
X Cn ) , X 6 R.
to the M M - e s t i m a t o r
~I"
It will
be
such
that M I = rain for
eI
with MI where
Pl
is c o n s t r a i n e d ~I
= [ Pl
(~'A1'e2) f(K)
to y i e l d
compatibility
= regression
estimate
on
d~ between
(~1,...,~n)
and ~I The two to have
= regression conditions
the v e r y
on the
natural
Pl(~,
estimate
A 1,
Pl-
on and
(X ~ I , . . . , X P2-structures
two
next
in use
subsections, to
the
e 2 ) = p l ( £ / e 2 , i 1) =
In the
constrain
form
= Pl [ (u -Z'
procedures
~n ), ~ E R+.
solve
the
01
we p r e s e n t set
~l)/e 2]
(c/s).
the m a i n
of i m p l i c i t
computational
equations
former
62
The
last
provide
subsection robust
f
~1(~,
A1,
e 2) f ( ~ )
d~ = O,
f
~2(~,
A1 , e 2) f ( ~ )
d~ = O.
indicates
the m a i n
proposals
designed
in o r d e r
to
estimators.
4.4.1, Relaxation methods Although present be
many
the m a i n
sufficient The
one
are
algorithm
where
to i n d i c a t e
algorithm
repeated
variations
is
sufficient.
Each
and we w i l l
assume
of ~I
and
In
times
that the
we step
corresponding
~2(~,
step,
in a t r e a t m e n t
convergence will
at d i s p o s a l
squares
In p r i n c i p l e practice
these
no great
usually
two
difficulty
an e x p l i c i t
is a k n o w n
reorganized
regression First
an M - e s t i m a t o r notation
in o r d e r
of
previously,
i.e.
k+1)
f(x)
-
could
can
enter
the
method. k
(~
, e 2)
and we
dx = 0 _ •
dx = 0
--
"
be d i f f i c u l t
is e n c o u n t e r e d
k+1 e2
e~ +I-
$.3.
being
Let
according
= S(c~,
of the
to be
of r e g r e s s i o n
section
we
k
of
' 02
We d e t a i l
that,
is
at this
to
solve,
level.
The
but
in
first
solution
function
algorithm. observe
k+l
which
is d e e m e d
solution
f(x) _
solution
A1
only will
of
k+1) e2
A~,
equations
s = S(.)
solution
This
be i d e n t i f i e d by an index k k ~I and e2, a p p r o x i m a t i o n s
Initially,
least
^k+1 ~I , the
$ ~1 ( ~ '
did
the
begins.
_k+1 ' the ~2
- Compute
be
or
to the
$
can
until
we will
appears.
difficulties.
consists
k, we start with the a p p r o x i m a t e in ~• I k+1 ' e2k+1) by the s c h e m e
- Compute
where
the
methods,
factor
step
improve
has
lie
have
in t h e s e
no d a m p i n g
and
repetition,
e2, w h e n
solution
where
iterative
or s e v e r a l
possible
to
.,¢~)
residuals.
solved
by
The
any l e a s t
second
equation
squares
further. already
(g = I). us
..
define
estimated,
Therefore, the
scalar
we
are
we will
estimating
join
function
the
@(g)
as we
63
¢i(£, !I, e2) = ¢I [(u - Z' A1)le2] = - ¢ ( u - Z' = - ¢(~) Z, then
the
equation
AI ) Z
can be w r i t t e n
[ wi ¢(k)i
v--l = 0
for
f(~) = (11[ w i) [ w i ~(~ - ~i ). And,
to
conclude,
the
equation
is t r a n s f o r m e d
k k wi[¢(¢i)/~i]
(u i - -v'
in
AI)
i
Zi = 0
or
this
produces
system.
an
improved
Whether
effectively
n a t u r e of the f u n c t i o n ^k+1 to ~I , by r e p e t i t i o n admissible is quite
in the
general
can be v e r y 8)
clear
as w e l l
important whether
starting
set
that
that,
after
drawback
this
can
be
be a s s e s s e d possibly
and
corresponds
• Ik +-1~Ik
depends
upon
that
converges
seen
Note
situations,
in c o m p u t a t i o n
the
well
time
the
is approach
selected
- see H u b e r
(1973,
(1974).
relaxation
erratic
[I
¢(s)
method
be o b s e r v e d importance situations to the
at s e c t i o n
a few
results
above
in the
linear
be
that
It m a y
in the v i c i n i t y
last
whenever
Dutter
associated
of the
h.3.1.
the u t m o s t
saddle-points)
Thus,
is linear
and
of this
has
easily
specific
cheaper
inversion
computation,
of a few p a t h o l o g i c a l
will
converge.
much
in
it c o n v e r g e s .
(and p o s s i b l y problem
last
section
as H u b e r
(AV , e~)
Investigation author
of the of
by
improvement
It m a y
and that,
section
an
~I
¢(~).
sense
methods
An
estimate
is that
that
existence space
h.5
the
- For
steps,
of the
approximately
the
final
final
revealed
parameter
is not
sometimes
on the has
it
to this
of
several
of
(~I'
time
being
relaxation solution,
minima
82)
- This
we
assume
method the
the
solution.
does
convergence
to
-I -I = A11 A12 A 2 2 A21
(e~ ^k+1) -- - ~I '
where
Ajk = f C j k We omit the
the
argument
derivation
leading
presented
(~' At' e2) f ( ~ )
to this
at s e c t i o n
result ~.2.
d~.
seeing
its
similarity
with
64
In
spite
frequently most
of the
centers least
have
as
software
package
Various sets
methods
we
only
simultaneously perform The
~I
4.5 b a s e d
the m o m e n t ,
of e q u a t i o n s the
and
defining
we have
we want
the
by
e 2.
Each
equation our
the
improved
solve
The
algorithms Recent
as Y p e l a a r Coleman
and
convergence,
This
is
computer generalized
are g e n e r a l l y
experience
and V e l l e m a n
has
been
(1977),
a
et al. (1977).
to
of the
direct
in the and
h.2
simultaneously
a fairly
problem
solution
estimates
general
section
estimate
describe
of [I space
by
e 2.
([I'
It p r e s e n t s
leading
to
method
but,
iterations
and of
the
general
at hand,
can be p r e s e n t e d
an M M - e s t i m a t o r .
of
at
which
In fact,
we
82)" for any
many
number
g
similarities
an e x p r e s s i o n
of the
~(x). at d i s p o s a l
solution
a set
Under
only
of a p p r o x i m a t i o n s
(el,...,eg)
is a p p r o x i m a t e l y
attention
expansions.
to
We w i l l
f Cj fax = f Cj(x,
devote
most
iterations
quite
function
Assume
that
proposed
consider
derivation
influence
been
produce
is
method.
on a m o d i f i c a t i o n
Newton-Raphson argument
a relaxation
software
of
solutions
have
of v a l u e s
at s e c t i o n
that
is p r e s e n t e d
approach
consideration
squares.
as well
above
hazards
problems. least
(1977)
Slmultaneous
4.4.2.
into
of the
possible
an e f f i c i e n t
regression
"reweighted"
by Gross
of the
adopted
taking
at d i s p o s a l
reported
with
have
when
squares
denoted
deficiencies
knowledge
experimenters
understandable
two
obvious
without
(el,...,eg),
and
set of g e q u a t i o n s
e I ..... Og) f(x) dx = 0.
satisfied
to the
the u s u a l
of the
first
conditions
by the
order
set
terms
(el,..., in the
eg)
and we
following
of d i f f e r e n t i a b i l i t y ,
we have
t
f Cj fax = f [¢j + X ¢~k (ek - ek)] fd~ = f Cj fd~ + ~ [f Cjk fax] (e k - e k) = X Ajk(e k - ek). We see
the
equations. that
is w h e n
another,
difference
When the
the
(e~ - e.)j is the
coefficients
estimators
a solution
are
Ajk
(k @
relatively
can be d e r i v e d .
Under
solution
of
a set of g l i n e a r
j) are d o m i n a t e d
by Aj~,
independent
one
from
65
II Ik Aik A k-ik Ak~. A?! jj IT < < I, k ~ i, k # j, we
obtain
ej = e~ - {X A~! jj A j k A k-i k f ~k* fax - 2 A~! JJ f Sj* fdx}
*
•
j fdx
: ej - ;
(x, e I ..... eg) f(x) dx.
= e 5 - $ nj Inspection
of the m a t h e m a t i c a l
expression
is c o r r e c t
upon
treatment
if the
reveals
estimators
that
are
the
strongly
last dependent
one a n o t h e r .
Therefrom finite
we
conclude
increments
section
that
e~ - e~.
robust
This
MM-estimators
is i m p o r t a n t
are
in v i e w
obtained of the
through
next
where
R. b e c o m e s a c o n t i n u o u s f u n c t i o n of some p a r a m e t e r J from (81,... , eg). F u r t h e r if we o b s e r v e large i n c r e m e n t s ,
independent this
even
must
lead
us
to
suspect
lack
of r o b u s t n e s s
of the
concerned
MM-estimator.
4.4.3.
Some
Basically robustness
proposals very
few p r o p o s a l s
is f r e q u e n t l y
functions
~I and
have
difficult
~2 s h o u l d
ve
been
to
select
f ~I (x' ~I'
advanced
and t h e i r
appreciate.
The
level
question
of
is what
in
02) f(~)
d~ = 0
AI , o 2 ) f(~)
d~ = 0
and $ ~2(x, in o r d e r
to
obtain
robust
estimate
of the
u = Z' AI and r o b u s t
scale
estimate
of the
Possibly
that
the most
Princeton
Although
described the
are
exhaustive
Robustness
thoroughly
both
Group
of
c.
simultaneously set
of p r o p o s a l s
and
Study
in the m o d e l
residuals
(see A n d r e w s
by Gross
Princeton
11
+
s = O 2 = scale - Let us r e c a l l
regression
Tukey
has
been
robust has
et al.,
or n o n - r o b u s t
been
1972)
conceived and
it has
by the been
(1973). largely
concerned
by l o c a t i o n
66
problems, field.
many
of t h e i r
In this
although
they
will
retain
not
are o b t a i n e d at
section
well
as
There to
should
already
Assume
that
for k n o w n
to
problem
of the
regression procedures
at hand.
one-step
from
Thus,
estimators
relaxation
can be d r a w n
associate
a given
although
we d i s s o c i a t e
scale
to t h e
method
Bickel
we
which seen
(1975)
as
work.
of @ 2 ( . ) ,
At the m o m e n t ,
to t h e
iteration
details
reason
selection
adapted
the c o m p u t a t i o n a l
on the v a r i o u s
referred
is no o b v i o u s
performed.
adapted
a single
Further
can be
disregard
attention
through
a specific
we
he w e l l
the
h.4.1.
in the
estimators
section,
this
these
selection
of @i(.)
is f r e q u e n t l y selections.
factor
s, we
estimate
M I = ~ w i p1(ei)
= min
for ~I
~I
through
with ¢i = ui then
the
family
two m a i n
classes
2 c ,
=
= ks is a p a r a b o l a
functions proposal
becoming
classes
whatever squares
I~1
-
ks),
prolonged gradually
: first,
a
~ ks
if
Ic]
if
I~1 > k s ,
by two t a n g e n t s constant
for
and,
large
=
I - COS [ e/(CS)] , if
I¢I ~ wCS
=
2
I~I
is ¢ and,
, if
for
have
small
robust
distributions, the m i n i m a x
known
scale
least
bounded
>
first
residuals,
squares.
In o r d e r to has
estimation
Huber
second,
residuals,
families among
of
them
a
~cs.
and
second
are e q u i v a l e n t
s.
His
It p r o d u c e s distribution avoid
suggested
any
(1964)
good
the
can be
results
is not
context
f o r the
proposal
incidence
to u s e
in t h e
demonstrated
solution
parameter
reference
1975)
following
derivatives to t h e
least
method.
as b e i n g
the
C2
of p r o p o s a l s
Investigating normal
are t h e
due to A n d r e w s PiCe)
Both
of p r o p o s a l s
AI
i
due to H u b e r
~1C~)
which
- -v'
of c o n t a m i n a t e d
optimality location seen
as
but p o s s i b l y
of his
problem
proposal with
a "robustified" sub-optimal
normal. of t h e
a function
0(e)
outliers, constant
Andrews
(1974,
for l a r g e
when
67
residuals. 25A
This
in A n d r e w s
had
also
et al.
continuous
~(¢),
c-domain.
Therefore,
these
produce
unforseen
clearly
appear
the
hand,
one
and,
on the
been
1972)
approaches these
results
in the ~I
yield
functions
possibly
hand,
by H a m p e l (1977),
to n e g a t i v e
are
not
at the
$(¢)
are
12A to
a
for
some
and may
way.
of this
with
solutions
using
admissible
end
discontinuities
several
(proposals
but,
in an u n n o t i c e d
illustration
exhibits
other
considered
and by Gross
This
will
section
respect
where,
to p a r a m e t e r
possible
for
some
on c
given
c. We now turn robust
our
attention
definition
to the
is the m e d i a n
scale
of the
factor
residuals
estimate.
A seemingly
in a b s o l u t e
values,
i.e. S = e2 = m e d i a n
(ISll .....
l~nl)
or
e 2 = lim e2v , for v ÷
I, v >
I
in
M 2 = ~ w i p2(¢i
) = min
for
e2v
with
This
definition
definitions statistics
in the
frame
of
gained
normal
be
been
(e.g.,
robustness. be
will
have
section
the two
it r e m a i n s
to p e r f o r m as
to
(1977b)
seen,
avoid propose
obtain
Dutter
illustration. based but
furthermore
of the
the
this
this
on how to tables
of pi(¢)
they
they
Several on
other
some o r d e r
do not
necessarily
sometimes
select
in the
and
computations may
involve
trouble,
to m i n i m i z e
simultaneously
exhibit
a scale
appendix
fit
poor
estimator
can
on c o n t a m i n a t e d
Huber
and
leading rather
Their
been
to the
(1974)
expression
e 2.
have
than
decided
not
seem v e r y
properties.
- Z'i
~1)/sl
appealing
We feel
afraid
to us by the
s + As
calculations. as well (MI,
expression,
= rain for ~I'
in spite level
of
upon,
eventual
expensive
and D u t t e r
another ~I
p2(¢)
M2)
such that
resolved
several
s >
0
favorable
of a r b i t r a r i n e s s .
In
as H u b e r
(1977), wi P[ (ui
does
2.4,
definitions
estimates;
they
range)
insight
inspection
the
distributions.
Once
order
in
frequently
interquartile
Some m o r e by
applied
proposed
by
68
Quite
a different
in d i r e c t l y among
alteri,
by H a m p e l argument factor
approach
minimizing the
(1975)
the
has
scale
attitude
of J a e c k e l
as p r o v i d i n g
appears
dubious
the
to us.
above
procedure
selection
of
it c o n s i s t s
instead
This
(1972b)
optimum When
of M I.
a n d has
breakdown
the
been recommended
point,
definition
is,
but
of the
the
scale
(loll
for v e r y
large
power The
everywhere.
The
Campenhout
a ~l-estimate
the
Pl(S)
= min
v.
It m a y ,
above
corresponding
to
the
1 + l/v]
[ ~ ( l ~ ,l / s' -) ' " therefore,
function
solution
(1977)
escape
produces
I%1),
.....
a function
M-estimators.
to
been mentioned,
is s = median
the
already estimate
is not u n i q u e .
it does
not
combinatorial
be
seen
is a d m i s s i b l e ,
seem
Further,
possible
as the l i m i t
but
not
after
to d e s i g n
of
convex Cover
and v a n
an a l g o r i t h m
able
complexity.
Illustration
@.4.4.
We b r i e f l y of r e g r e s s i o n The
report on
regression
considered for the
(1965,
and
Wood
observations
problem
by m a n y
oxydation
Brownlee Daniel
the
a classical we
are
authors. of
It
amonia
section (1971,
made
in c o m p a r i n g
several
methods
example. investigating is r e l a t i v e
into
13.12), chapter
nitric Draper
has
to t h e
acid and and
5), A n d r e w s
already
been
operation
of a p l a n t
can be f o u n d
Smith
(1966,
(197~)
in
chapter
and D e n b y
6),
and M a l l o w s
(1977). The
data
consists flow,
set has
in the
against
against constant
the
size
regression
n = 21
of ui,
v3i , a c o o l i n g
v4i , an a c i d in the
and
is o f d i m e n s i o n
a stack
water
inlet
concentration;
the
loss,
against
temperature, term Vli=
p = ~.
It
v2i , an air
as w e l l
as
I introduces
a
regression
u i = v'i 8_i + ¢ i. In this
example,
various
observations
(i =
distribution
of t h e
techniques
1,3,~,21)
are
seventeen
have
clearly
others
revealed outlying
which
form
that with
a neat
four respect
to the
cluster.
69
Although be
this
does
ascertained.
not
a method
of n o n - l i n e a r
(1976).
This
2-dimension
method
plane
observations; close. the
In o r d e r
in the
of t h e
observation
21 o b s e r v a t i o n s
multidimensional
produces
while
distant
distance
appear
A plot
preserving
points
to be
a map
the
remain
independent
d,. zj between
scaling
of the
distance
two o b s e r v a t i o n s
this
in F i g u r e
described
~-dimension
distant of the
coordinates,
is g i v e n
in Rey
space
in a
relationships
and
close
coordinate x. and -x. --z -j
can I by
between
points
remain
dimensionalities, has b e e n
defined
according t o d~• j where ~Ik
is m e m b e r
solutions. have
= ~ -~
The
scarcely
of a set
exact
any
- -v' i
[(ui
Ilk ) -
{el- ,I' ~I ,2'
composition
incidence
(u j
of this
on t h e m e t r i c
- -v' ~ Ilk)] 2 '
"" .} of r o b u s t set of
regression
solutions
as long
as t h e y
seems
to
are
independent. Four
different
selections
of t h e
function
~I = X w i p1(~i ) to be m i n i m i z e d
will
results
reported
scale
will
be
factor
s,
n o w be
considered
in T a b l e
table
is c o r r e c t
Weighted outlying 21 to
observations
size
17.
outliers. whereas
up
This
In T a b l e
Fit
to the
least squares.
last
With
procedure
2 corresponds
Least 9-th power.
implies
v between
incidence Rey
of the
(1975a)
result
2 and
or,
17.
the
form
outlying
for v = 1.2
HuberWs
I, a r a n g e
at less
method.
of
=
selection
some
corresponding
digit.
to go
solutions
identification least
squares
with more
is o b t a i n e d
by the m e t h o d
as Fit
of the
smoothly
four
from
size
of the on
size
21
I~I ~
observations
expense,
is r e p o r t e d As
prior
ordinary
size
Selecting
pI(C) with
each
and the
= c 2, the w e i g h t
decreased
I is the with
printed
pi(¢)
is g r a d u a l l y
2, F i t
for
(I~iI ..... ICnl).
s = median This
and
2, n a m e l y [I
we
by the
of next
3.
seen p r e v i o u s l y ,
select
or less algorithm
section.
The
of
70
21
15e 11ol2 Q
20
16 10 19 18
14
§1] 5 1
2
3
4
FIG. 1
? 90 0 6
13
71
2 = ¢ ,
pi(¢)
if E > ks,
= ks(2 The
result
proposed
for
I~I- ks),
k = I is r e p o r t e d
by H u b e r
is the
only
if ~ > ks.
as Fit
4.
admissible
Let us
method
note
among
that
the
the m e t h o d
four
we
are
considering.
Method of Andrews. o1(c)
A result
for
strictly
with
large the
same
solution
given
1.5
c value. c =
and
5.
It does 5, last
The
four
outlying
s on
size
size
21 r e s u l t
the
size
17 r e s u l t
of this way.
21
There
7 and
claim
that
size
be
several
between
due
result
to
is
is n o n s e n s e ,
correct Possibly
it m a y
produce
are d i s c o n t i n u i t i e s may
the
17 d i f f e r
defensible.
is that
correspond I column)
is a n e a r l y
Fit 8 are t h r e e
intermediate
not but
observations and on
is not
method
c and there
6, Fit
a value
I~I > ~ e s
Table
computation.
parameter Fit
1.8,
Icl ~ ~cs
if
the
weakness
of the
as Fit
(1974,
is
if
factor
in an u n n o t i c e d
function
with
in his
the
structure
eos[c/(cs)],
result
scale
whereas
selected
is r e p o r t e d
In fact,
important
results
2,
as w i t h o u t
the
significantly.
most
I -
=
Andrews'
with
that
=
c = 1.5
inaccuracies
seeing
The
in ~I
as a
solutions
solutions
the two A n d r e w s
the
aberrant
for
a
obtained recommends,
2.1.
Fit
8
I
82,1
e3,1
e&,1
-39.920
.71564
1.2953
-.15212
1.9175
L.sq.,
size=21
2
-37.652
.79769
57734
-.O67O6
1.o579
L.sq.,
size=17
3
-38.805
.82643
64760
-.08577
1.2194
L~,
-38.158
.83800
66290
-.10631
1.1330
Huber,
5
-37.132
.81829
51952
-.07255
.96533
Andrews,
c=1.5
6
-37.334
.81018
54199
-.07037
.99926
Andrews,
e=1.8
7
-~1.551
.93911
58026
-.11295
1.4385
Andrews,
c=1.8
8
-~1.990
.93352
.61946
-.11278
1.5710
Andrews,
c=1.8
1,1
Table
Each method which
become
produces
apparent
a different
while
~I
comparing
~ = 1.2 k =
I
2
estimate, the
but t h e r e
respective
are trends
fit r e s i d u a l s ,
~i"
72
All methods differently The
scale
see l a s t
least
There
an
of
last
simultaneous
divergence
we
the
solution
R'(.)
repeat
conceptual been
have
estimated are
than
by the
the
former
be
seen
that
to
We
In our
solution
process
section
non-linear iterative
,...,eg)
(e
an " i m p r o v e d "
regard
involved
are,
to
observed
or of
that
could dissimulate
Even
with
h.3.2).
and
in t h e
rather
of r e l a x a t i o n
M-estimators
This
application
section
will
of t h e
equations.
methods
can be
seen
of M M - e s t i m a t i o n ,
we w o r k
approximate
out
through
as f i x e d
given some
an
arithmetic
solution.
eg ) = R ' ( e
until
also
possibly,
process.
(see
context
methods
have
and,
considerations
solve
the m e t h o d s
and f r e q u e n t l y
attempted.
all c o n v e r g i n g
with
equations
obvious
inaccurate
point
difficulty
.....
stationarity
e )
is o b t a i n e d ,
that
is to
say
until
Then,
The
the
solution
solution
difficulties.
-
The m e t h o d
followed
expensive
~I
peremptory.
estimated.
can be,
encountered
method
the
the
is c l o s e l y
and non-linear
(e 1 ..... and
but
deceiving.
is m o r e
otherwise-iterative
computations.
approximate
and
upon
solutions
is t o t a l l y
relatively
are
could
on f i x e d
Basically
rule
section,
can be
continuation
point
way,
observations.
dependent
latter
has
point
of t h e
centered
point
what
fixed
when
method
difficulties
in a s i m i l a r outlying
competition The
of M M - e s t i m a t o r s
time-consuming
be
the
important
4,5.
one-step
strongly
by A n d r e w s
wins
exactly
computation
non
time.
remains
Solution
the
2 - and f i x e d
power method.
understanding
In the
s is
proposed
in c o m p u t a t i o n
outliers
account
of T a b l e
by H u b e r
v-th
to r e j e c t
into
parameter
column
The method proposed
tend
take
of the
(e 1 .....
eg)
: R,(e 1 .....
eg).
(e I , ....
eg) = R'(e I ..... eg).
is
general
We w i l l
first
fixed
point
consider
problem
the
still
situation
presents
where
many
we k n o w
for
73
sure
there
is a s i n g l e
solution;
then
we w i l l
take
care
of m u l t i p l e
solutions. We n o t e to d e f i n e the
that some
it
impossibility
parameters, section
(1973)
to
in f a i r l y
in t h i s
aspects
apply
subset
situations
- Thus,
we w i l l
and p r a c t i c a l
Brouwer
et al.
theorem
constant
in the
space
- See H e n r i c i
rather
algorithm
in K e l l o g g
the
a Lipschitz
a compact
computational
refined
to
through
trivial
respect
a general
and f u r t h e r
complementary
difficult
mapping
of d e l i n e a t i n g
except
6.12)
attention
is e q u a l l y
contracting
devote
considerations
as to
of the (1974,
our
investigated
(1976).
due
by S c a r f
A few
are
reported
by T o d d
(1976). The
algorithm
following we want
"the"
the
is
based
on t h e
solution
single
when
solution e
"continuation
some
method"
parameter
varies.
and
consists
Precisely,
in
assume
of =
R(e)
with
e = (e'l,...,e'g)' and
assume
we k n o w t h e
solution
e0
than
e is the
solution
for
for
method
consists
e = e 0, up to X = efficiency function The
I where
e(X)
with
problem,
it
+
the
-
X)
R0(e).
solution fixed
greater,
to the
of the
whatever
(1
is the
is t h e
respect
continuation
serious
R0(.)
k = I of
in f o l l o w i n g
of the m e t h o d
rule
= R0(e0),
e = X R(e) The
another
point the
"variable"
solution
from
the method
from
k = 0, w h e r e
solution
smoother
it is
desired.
is the
The
implicit
~.
X = 0 up to
is, as l o n g
X = I presents
as the
no
(matricial)
expression B = 1 remains
positive
involved seeing
definite.
analytical
their
illustration
good
x(~/ae)
RCe)
numerical
(1
-
X)
(a/~e)
Predictor-corrector
treatment
of s e c t i o n
-
can be p r o p o s e d .
stability
4.h.~.
When
algorithms We
and t h e y h a v e the
above
R0(e) as w e l l
f a v o r the been used
expression
as
former in the
becomes
74
singular [0,1] What
that
or,
indicates
in o t h e r
is h a p p e n i n g
More with
involved
some
into
lead
for one
going-in
relevant
theory
of o p e n
concerned,
we
continuation one-to-one e(o)
path
property
then
degree
not
8(k)
are k n o w n . specialize
some
k 6
appropriate.
(~I ~
of a p a r t i t i o n
the
number
compact
vector
fields
space.
on a v e r y R(O)
space
As
are
~ in [ 0 , 1 ] ;
that
when
8 0 is
moreover
with
respect
approximate
discussion
split
the
to the
of the
continuous
when
above
to
~.
e0 = in some
This
fixed
evaluation
point
of
expression
0 = R(e) in its
components;
we w a n t
a fixed
(e, I ..... e'g) where
we m u s t
select
Ri(e)
=
point
solution
of
(R1(e)' ..... R g ( e ) ' )
according
to t h e
estimation
theory,
we
select Ri(e) The
choice
where robust takes
e0i
of t h e
rule
R0(.)
is a c o n s t a n t
estimation the
= e i - I ¢ i ( x , e l .... ,eg)
of e.. i
may
be
fairly
R0i(e)
= e0i,
of the
trivial,
right magnitude
With these
=
(I
-
X)
or,
definitions,
form B
f(x)
I +
XA
on
are here
assume
computations
let us
of the
on the
as we
property
0, f u r t h e r
paths
it is b a s e d
defined
far
useful
and R0(8) of
of the
account 3);
a
path
of g o i n g - o u t
A good
that
I), or to
continuation
chapter
Banach
a continuation
is c o n t i n u o u s
fast
the
First
~I
of t h e
part.
attention
e(~)
some
help
count
a given
= 8 0 for
to v e r y
at
starting
(1976,
for
: assume
We n o w
MM-estimators.
for
e 0 was
in A m a n n
of 8 in t h e
solutions
see that
the
to
of some
the
of 8 ( I ) ,
leads
set
an e x p l o s i o n
entering
subsets
method
to
with
can be f o u n d
mappings
neighborhood
to
is p o s s i b l e
retain
= 8(I),
differentiable
starting
to a t e r m i n a t i o n
Further,
it
the L e r a y - S c h a u d e r closure
permits
possibly,
paths.
space,
is not
the
?
or,
several
parameter
0(~)
that
analysis
80 m a y
discontinuity
that
terms,
dx. say we
select
possibly, the
above
a less factor
B
75
where
A is the m a t r i x
of b l o c k - c o m p o n e n t s
Aij = $ ~ij (x'el ..... eg) f(x) dx encountered
at
section
an a p p r o x i m a t i o n improved
of 0(~)
approximation
e(x) The
with =
but
For
as
instance
continuous
function
and all
the v a l u e
- e.g.,
-1
[~
it has
been
to
This
been
of t h e
has
produced
were
starting
With
00(2.25) a fast
computation
we h a v e With
tried
the
has
to a g a i n
given
the
estimate
a very
fixed To
rapid
computation
but
point
totally solution
conclude
present solution
converges
section
the
X = 0, u p to
c = ~ down
the
was
a
approximately
Difficulties
to
were met
set
0(2.~),
continuation result
associated
from
0 = 0(c)
predicted
-
with,
~.h.h with
that
fairly
only very
as s t a r t i n g
slowly.
set
= 0(2.3), estimator
the
result
fixed of
this point
section
= 2 0(2.3)
has
come
different
0(2.25)
for
c = 2.25.
corresponding
with
Then
c = 2.35.
had been section
be p e r f o r m e d
by
out w i t h
found
for
it m a y be
mathematics 4.h.2,
- 0(2.25),
from the
an
that
.....
estimate
already met
0 close
0(2.35).
to A second
c = 2.35.
suitable
is e q u i v a l e n t
except
pi(x,el must
- an
initialization
00(2.35)
e0(2.35)
of
c from
this
is
*
been met
problem-free.
the
formula
0
) - 0 1.
generally
appared
with
*
continuation
= 2 0(2.35)
to r e a l i z e
X) R 0 ( e
have
the
parameter
0(2.3).
-
described
it has
and w h e n
by a p r e d i c t o r
) + (1
hazards
computations
possible
*
definite
Newton-Raphson
computations
00(2.3) it has
obtained
achieve
of A n d r e w s ,
c = 2.3;
B is p o s i t i v e
by the
R(0
to t i m e
in t h e
method
with
+ B
difficulty
regression
c = 2.35
*
When
is g i v e n
from time
serious
I.
= e
procedure
rapidly
h.2.
to
indicate
to the
substitution
eg)
that
the
simultaneous of the
function
76
~i(X,el ..... eg)
+
(I
- ~)~0i
(x'el ..... eg)
with ~0i(x,el ..... eg) = (I/2) Possibly
the g r e a t e s t
provide
a larger
aspects
as well
f i x e d point
theories
as on the
conce p t
by S w a m i n a t h a n
advantage
insight
as well
the
continuation
attention method.
illustrated
by the
and by K a r a m a r d i a n
as a l g o r i t h m i c
(e i - e0i).
of the fixed p o i n t
retaining
is w e l l
(1976)
(e i - e0i)'
features.
(1977)
approach
is to
on the m u l t i p l i c i t y The g e n e r a l i t y
series
of p a p e r s
concerning
of the edited
mathematical
5. OPEN AVENUES
We p r e s e n t worthy
5.1.
in this
of f u r t h e r
Estimators
Section work
seen
2.2
however,
possible
to
evaluated
and the
This
state
plausibility
groups
possibly,
also
this
What level some and
we
of
the
and, has
basis,
in case been
With
T(.),
to
theory
although
alrpady
gCy) the
noted
high
for
a large
the
attempts
by
failed.
This
conditions
have
complexity
of the
analysts
- Furthermore,
it
...
dx dy +
to the
of m o t i v a t e d
class
problem
but,
with is not
clear
required.
and,
could
gCx)
precise
which
possibly,
expansions.
as they
integrand
to the (19T5),
can be t r u n c a t e d this
Moreover,
have
been
is v e r i f i e d
the
presented
have
previously;
they
it m a y
be p o s s i b l e
to assess
influence fully even
"natural"
defined,...).
introduce
tackle
respect
demonstrated
is an e x p a n s i o n
few terms
derivative,
a few
everywhere
function
hardly
have
even
influence with
for
function
a completely
their by
to the
values
simulation
per
se
what
derived.
von M i s e s
to
been
in this
a functional
with
~(x,y)
been As
is r e a l l y
of d o u b t ,
regard
investigation
involve
state
shortage
infinite
jackknife
heuristic
has
in t o p o l o g i e s .
first
diverging
it has
expanded
is u n s a t i s f a c t o r y
to
need
seem
justified.
presented
conditions
I
attributed
expansion
really
its
be
to the
top-qualification whether
stressed,
and d i s t r i b u t i o n s .
partly
derivations
dx + ~ I#
expansion
of e x p e r t s
can
either
weakly
distributions
what
can be
gCx)
of a f f a i r
of f u n c t i o n a l s
be
which
rather
to
+ # ~(x)
of the
of
under
g,
to us
of m o s t
fact m u s t
precisely
considerations
appear
root
f according
= T(f)
a few
or
functional
on d i s t r i b u t i o n
T(g)
failure
as
is at the
state
distribution
many
chapter
research
~(x,y)
(linearly)
of the
the
for m o d e r a t e
requirements Mallows
a second-order
of the
function,
supports
correlated
integral
and
in an e x p e r i m e n t a l based
samples.
use
function;
influence data.
size
(consistency,
also m a k e s
influence
double
(first-order)
Mallows, definition
This
first
criteria
a statistic, above
it h a p p e n s
is d e f i n e d
function.
His
being of the
on the
as the may
expansion
to be the
be
influence a way to
78 5.2.
Sample d i s t r i b u t i o n
Relatively sample the
few pieces
distributions
jackknife
and t h e
of
information
of r o b u s t
method
symmetry
Furthermore, normal
of estimators
ways
of a s s e s s i n g
(by t h i r d
we m a y
central
safely
distribution
available We have
regarding
already
the
given
the
possible
bias,
moment)
of t h e i r
distributions.
conjecture
is r a p i d
are
estimators.
that
whenever
the
the
tendency
influence
with
the v a r i a n c e
toward
function
the
is
bounded. This
state
of a f f a i r
utmost
practical
limits
have
supplied
distribution the
sample;
difficulty
to us v e r y
on
robust
on the
location
problem.
computation
same
of the
excerpt
estimator
and the
sample
he has
between
of
the
a few
(1970);
distribution
the
the
distribution
proposed
of t e n d e n c y
has
set c o n f i d e n c e
demonstrated
estimates
line
question
to
relations
essentially
of r o b u s t
In the
the
(1968)
on the
Subsequently,
studentization
we
have
although
by H u b e r
insight
attempts
reasonable.
is n o t e w o r t h y ;
Attempts
some m o r e
these
of the
conjectures
paper
of t h e but
is u n s a t i s f a c t o r y
interest.
they
appear
to n o r m a l i t y , by H a m p e l
the
(1973b)
:
"... t h e t h i r d a n d p e r h a p s m o s t i m p o r t a n t p o i n t seems to be e n t i r e l y new. It c o n c e r n s not the q u e s t i o n w h e r e to e x p a n d , but what to e x p a n d . M o s t p a p e r s c o n s i d e r the c u m u l a t i v e d i s t r i b u t i o n Fn, some the d e n s i t y fn'
but
neither
approach
leads
to v e r y
simple expressions. It s h a l l be a r g u e d that the m o s t n a t u r a l and s i m p l e q u a n t i t y to sudy is the d e r i v a t i v e of the l o g a r i t h m of the d e n s i t y , f ' n / f n , a n d this for s e v e r a l reasons whether
his
It m a y use
: ..."
approach
be
is p r a c t i c a l
seen t h a t
of r o b u s t
we
estimators
feel
remains
entangled
because
to be d e m o n s t r a t e d . in a v i c i o u s
we do not k n o w
distribution
of the
sample
at h a n d ,
and we
departs
a given model
in o r d e r
to
from
estimator. derive the
only
the
evident
sizes
We
are
theoretical
afraid
it is f i g h t i n g
distributions
possibility
to
of t h i s assume
f r o m the
method. any
strict
know
: we m a k e
the
how
a sample
the d i s t r i b u t i o n
for a w r o n g
for r o b u s t
is i n f e r e n c e
limitations
it is not w i s e
should
state
circle
precisely
issue
estimators. sample
We f e e l model
itself, that
for
whereas,
of t h e
to t r y
We in
are
spite
small
to
afraid of
sample
for moderate
79
or l a r g e
sample
sizes,
models
are
not
needed
seeing
the
tendency
to
normality. But and
we m a y
the
even
also
scatter
for
small
to
question
whether
apply
studentization.
sizes,
in
with
the
we
help
are
able
to
The
of t h e
estimate
answer
jackknife
the
mean
is p o s i t i v e ,
method.
We
illustrate. For
5000
negative
replicates
exponential
of
size
n =
11,
(Ul,...,u11) , drawn
from
the
distribution
f(u)
= exp[-(u =
- a)] , if u >
0,
we
have
estimated
the
location
we
have
estimated
e such
a,
if
u
parameter
a by
the
method
of H u b e r ,
i.e.
that 0(u i -
e) = m i n
for
8
with =
£
2
=k(21s When
we
note
e 0 the
asymptotic #
we
have
the
that
p(u
e -e 0 is
variance
of
e is
if Isl < k if IEl > k .
- k),
value
- e 0)
f(u)
a consistent
du
V
is t h e
Through giving
to
relatively In T a b l e
= min
%0
for
asymptotic
e0,
of p a r a m e t e r
small 3,
the
sample
we
variance
according
e 0 and
and
Se,
experimental
the
perfect
agreement
size
of the
theoretical
Moreover,
to
the
~e
and
jackknife the
expressions
Thus
experimental
theory.
we
are
results
in
for
a a
size.
report
k = 2.
replicates.
a.
I = n---~-I V
analytical derivations we h a v e o b t a i n e d 2 o 0 as f u n c t i o n s of t h e p a r a m e t e r k. compare
satisfies
approximately
and
small
i.e.
e 0 and
position
5000
e,
estimator
2 a e = var(e) where
of
are
Except
the the mean for
results
for
two
theoretical and
standard
a neglegible
between
theory
sample,
n =
11.
and
specific
values
to
deviation bias
simulation
with
values,
be
compared
observed k =
k =
.I,
.I
with
with
we
note
notwithstanding
the
the a
80
k = .I
k = 2
8o
.6948
.9~75
F
.7370
.953O
se
• 3055
.2768
s e
.2993
.2632
.0422
.0055
9.768
I .ho5
eo
-
(~
eo)/(~eH5OOO)
-
Table
We m a y values
also
of k.
functions
compare Figure
(pdf)
the
positive
from
one
another
further
5.3.
tails
are
due
similar
which
to t h e
a relatively
while be
that
number
the
negative
We
tails
note
differ
We r e f r a i n
figure
in this
two
density
(8 - ~ ) / ~ 8 "
expected. the
for t h e s e
probability
coordinate
could
fact
small
distributions
of t h e two
the r e l a t i v e
in a w a y
commenting
experimental
2 is a d i s p l a y
against
that
replicates,
the
3
is d e r i v e d
from from
5000
regard.
Adaptive estimators
Seeing
the
inappropriate estimators,
inadequacy
Then,
by m e a n s
illustrate,
some
distributions,
each m e m b e r
of d i s t r i b u t i o n s . is d e c i d e d
of
consider
the
many
of the which
either
on a s a m p l e
of a h e a v y - t a i l e d
location
distribution
above
favors
the
of the
alternative of
alternative
heavy-tail
+
of a d a p t i v e
tried
to
to d e s i g n
for
select
setting
to sets
a specific
for a g i v e n weights.
of
class
sample
To
estimator (I - w)
in
(median).
may
indicate
see
Smith,
setting
and to
be to d e f i n e
statistics.
applied
optimal
or by
(e.g.,
consists
hypothesis,
would
some t e s t
and d e f i c i e n c i e s
estimator
when
have
being
(Xl,...,Xn) , a test
of the
function
authors
sets
of t e s t s
8 = w(mean) Based
estimators
w as
high
1975). zero
one o t h e r w i s e ;
- A good
methods
w to
the
The when the
a monotonously account
is p r e s e n t e d
likelihood
of the
by H o g g
first the
term
test
second term
increasing advantages (1974),
with
81
pdf (O)
k=.l / / k =2
.2
.I
-2
-i
o
FIG. 2
82
particular
emphasis
aspects. although
being
arguments
that
man-made value
"corrected" first
lead
O = w I (midrange) situation are
select
"best"
optimality
drawn
from
to v a r i o u s
context,
then,
involved
treatment
of p r i o r
distributions;
prior
appears
To
5.4.
only
among
from
some
a location
but t h e
our p a r t ,
we do not
(I - w I - w2)
(median).
when
class.
the v a r i o u s Yohai
a family
(197h)
of H u b e r ' s
k and p r o v e s
second feel
the
estimator suggests
to
estimators asymptotic
what
but
prefer
the b a y e s i a n
is the
origin
of the
by M i k ~
(1973)
who
possibly
the
soundest
arbitrariness.
introduces treatment
defined
line
of D e m p s t e r
(1968),
significant
application
in the r o b u s t n e s s
to
these
refrain
from
according
viewpoint
only
infering
to u p p e r
would
be to
and l o w e r
but this
too much
An
a family
when
author
has
field.
the
It
sample
evidence. mixed
of R e l l e s
estimators
rather
is p r o p o s e d
little
conclude
feelings,
and Rogers
there
(1977)
is the
comforting
: statisticians
are
fairly
of l o c a t i o n .
Recursive estimators
Mainly which the
is c l e a r
is in the
of a n y
observation robust
it
important
provides
+
parameters
distributions This
not h e a r d
a n d the m e d i a n , For
different
we w o u l d
because,
bounds.
closer
favors
structure
a common
estimator
to d e c l a r i n g
to be
position
the
that most
of the m e t h o d .
In this
use
the m e a n
+ w2(mean)
is not m u c h
components
The
to the m i d r a n g e . composite
of the
to j u s t i f y
saying
distributions
they
with the
corresponding
from
are u n c o n s c i o u s l y
could
the
range
of n o r m a l
between
robust
scope
a discussion
are p r o p o s e d
They
than
viewpoint
The
arguments
should.
as the
a broader
estimators.
data
intermediate
happy
as w e l l
has p o s s i b l y
is e s s e n t i a l l y
structures.
from mixtures
estimator
very
it
adaptive
subjective
estimator
are d r a w n
typical
robust
background
(1975)
informative;
to
fairly
of the
samples
of T a k e u c h i
less
leading
Frequently, choice
on the h i s t o r i c a l
The p a p e r
in t i m e - s e r i e s
permit
to w o r k
estimator
Makhoul linear
(1975) models
on
analysis,
out
size
surveys and t h e y
it is p e r e m p t o r y
an e s t i m a t o r
n as w e l l
as
the v a r i o u s appear
on a s a m p l e
some
summary
methods
non-robust.
of
to h a v e size
information
relevant
expressions
(n +
I), w h e n
are given.
to f o r e c a s t i n g
with
83
The theory
design
of robust
for recursive
Nevel'son
(1975)~
algorithm
and
sample
regard with
to gross
errors
influence,
same
estimation robust
(1976),
In general,
of the
analysis
of s t a t i o n a r i t y
must
hardly
to build
possible
information.
non-linearities can be seen It does
in Rey
investigation (1970) Poisson
have
jackknife
of point
compared with
In t i m e - s e r i e s results
robust
several
They
is optimal as well
can be attributed
assumptions
of the models
sensitivity
to these
assumption
This
question
Devlin
et al.
complicate
with
the
also provides
a
of Kersten
(]976).
recursive
robust
approaches
derived
it is rare to obtain
are,
on the
and that,
one hand,
that lack
on the other hand,
compact
representation
are e n c o u n t e r e d
it is of the
even when the
treatment
estimation
is c o n s i d e r e d
except
estimates placed
are m o d e r a t e
but this
as
to be very
of a r e l i a b i l i t y
that
variance
an estimate
process
to d i f f i c u l t y
analyses,
to assess
and to the d i f f i c u l t y
assumptions.
is only one
Among
with
promising.
and Hoel
factor
in a
and s e n s i t i v i t y derived
them,
what
the help
the
from the
the
possible
are the key the
independency opposed
alternate
of an influence
(1975);
limited
of e v a l u a t i n g
it is f r e q u e n t l y of the
and by Mallows
in the Gaver
respects.
as in point
is approached (1975)
occasionally.
on bias,
conclude in most
is nearly u n m a n a g e a b l e ~
correlation,
work
the proposal
analytical
processes,
emphasis
hypothesis. theory
or less
paper
(1975a).
seem that
process
to Poisson
in the
accessible
a recursive
on heuristic
any a p p r o p r i a t e
sensitivity
a weak
More
et al.
However,
difficulties
involved
not
series
The reasons
be coped with
These
That
finite
in order to
A more
with
with
of Evans
in part,
algorithm.
convergence.
data.
analogous
of time
at least
from the R o b b i n s - M o n r o
past
of serial
the
have
similar.
(1974)
the
a Kalman-Bucy
bounded
offset.
by Rey
Whether
to bound
outliers
is partly
by
question.
proposes
strictly
of a large
b a s e d on the paper
the
Some
the R o b b i n s - M o n r o
an open
in order
is not
scale
with
(1976)
Occasional
(1977)
"median"
is based,
satisfactory
Tollet
influence
unsatisfactory.
M-estimators.
remains
selected
had been p r o p o s e d
of the
estimators
factor
in presence
estimation
and Kurz
satisfactory
and M a r t i n
strategy
so far,
similarities
or outliers. that
tracking
by M a s r e l i e z
are
a gain
but
is,
of type M has been d e v e l o p p e d
to o n e - s t e p
to filtering,
approach
permit
it presents
is relative
properties
With
methods
estimators
the results
to linear hypotheses.
function
are too
by
84 5.5.
Other
views
In s e c t i o n "linear" This
approach
h, we
Sacks
of w o r k
by
regard
deleting
here
facing
by
structures
"abusively"
be
and van
(1976)
this
attitude
is v e r y m u c h
be d i s c u s s e d that
To is,
be
in the
evident
help
while
basis.
The
is not
as
need
for these data,
Rohlf,
1975)
are w e l c o m e . is a c l a s s i c a l
is a r e g r e s s i o n
at h a n d
obtained
and
Most
classical
analysis methods.
by
and K e t t e n r i n g may
methods
is i l l u s t r a t e d squares
by H u b e r ' s
a
is p a r t i c u l a r l y
of o u t l i e r s
and r o b u s t
can
procedures.
by p r o v i d i n g
advocated
least
as
on the
methods
in d a t a
methods
This
and
can h a r d l y
attention
results
as
identification
(see H o e r l
statistical
to the m o r e
computation
structures
appropriate
the
estimation
to
in the v a r i a b l e s ,
clear.
than
and by G n a n a d e s i k a n the
(e.g.,
former
problem
put
or t h e
is the m o s t
we r e t a i n
robust
multivariate
(1969)
the
But
counterpart
domain,
4, t h e
method)
methods
that
identification
latter
on t h e i r
estimated
in v a l i d a t i n g
difficulties
regression
What
the
possible
by r i d g e
upon
text.
recall
and W i l k
this
3 and the
dependent
processing
Non-polynomial
errors
are
it is c l e v e r l y
shortcomings
of the
We
of v a r i a b l e s ;
when
its
(197h).
(1977).
to
(1975) for
and an i n t e r e s t i n g
by D y k e
of
(1974).
present
let us
In the m u l t i v a r i a t e
whilst
al.
an e s s e n t i a l
Gnanadesikan
Figures
et
and P i k e
in s p i t e
attributed
is p r e c i s e l y
comparison
formidable
paper
More
eventual
techniques
selection
is t a k e n
and
is p a r t i c u l a r l y
The
be d i s r e g a r d e d
care
be d o n e
as a p p r o x i m a t i o n
methods
requiring
may
due
seen
f o r us,
reliable
when
what
conclude
Robust
1977)
quadratic;
a tremendous
with
This
(1976)
best
or
by M a r c u s
as by M e a d
is p r e s e n t e d
of the not
can be p a r t l y
by F l o r e n s
better
should
for a r e c e n t
reviewed
fact
problem
a minimax
that
assumed
approach.
by H o c k i n g
data
Campenhout
concerned,
non-linearities
of the
linear
of
validity.
non-linearities.
terms.
as w e l l
with
it seems
hidden
solution
the m o d e l
augmented
frequently
(1975)
are r e v i e w e d
general
non-linearities; Kennard
are
either
cases,
with
introduced
and D r a p e r
(e.g.~ P e a r s a l l ,
Cover
can a l s o
cope
to the
(1975)
slightly
simplest
to
discontinuous
the
implemented light
in the
and bound method
by H u b e r
and
the r o b u s t n e s s
with
attention
questionning
alternative,
Even
Box to
addressed
on the
is n e e d e d
great
without
criticized
variables
situation
branch
based
of the
discussed
devoted
problems
been
polynomial
deletion
with
has
(1977).
general
have
is d i r e c t l y
partly
paper
amount
robustness
regression
question
this
in
(1972). present not by
regression
method
as
85
described
at
of o p i n i o n , important) sample
section but
the
the
4.h.3. second
difference
is w o r t h y
of
Which
is the b e s t
m a y be p r e f e r r e d between
further
the
attention.
two
and
regression (this
results
is
is a m a t t e r
is the m o s t such t h a t
the
86
•
• •
•
O0
8 II
II c-
.
I
F~G. 3
87
•
•
"'X.. oo
\o
q,,,m
II
II C-
•
FIG. Z+
6. REFERENCES
Amann,
H., Fixed point equations
ordered Banach spaces, Andrews,
D.F., A robust method
Technometrics,
and nonlinear
SIAM Roy., 18 (1976)
16 (1974)
for multiple
eigenvalue
problems
linear regression,
523-531.
Andrews,
D.F., Alternative
variance
problems,
Andrews,
D.F., Bickel,
calculations
In Gupta
for regression
and analysis
of
(1975) 2-7.
P.J., Hampel, F.R., Huber, P.J., Rogers,
and Tukey, J.W., Robust
in
620-709.
estimates
of location,
Princeton
Univ.
W.H. Press
(1972). Anscombe,
F.J., Rejection
of outliers,
Barra, J.R., Brodeau, F., Romier, developments
in statistics
Technometrics,
~ (1960)
123-147.
G. and Van Cutsem, B., Recent
(edited by), North-Holland,
Amsterdam
(1977). Beran,
R., Robust location
estimates,
Beran, R., Minimum Hellinger Ann.
Statist.,
Berger, J.O.,
Beveridge, practice, Bickel, Statist.
estimates
results
of a location vector,
G.S.G.
results
and Schechter,
P.J., One-step Ass.,
for parametric
for generalized Ann. Statist.,
for generalized
of a location vector, Ann.
Mc Graw-Hill
~ (1977a)
431-444. models,
445-463.
J.0., Admissibility
coordinates
distance
Inadmissibility
of coordinates Berger,
~ (1977b)
Ann. Statist.,
Statist.,
estimators 302-333.
Bayes estimators
~ (1976b),
R.S., Optimization
Cy., New York
Bayes
~ (1976a),
334-356.
: theory and
(1970).
Huber estimates
in the linear model,
J. Am.
7__O0(1975) h28-43~.
Bissel, A.F.,
and Ferguson,
R.A., The jackknife
edged weapon
? Statistician,
24 (1975) 79-100.
Box, G.E.P., Non-normality (1953) 318-335.
: Toy, tool or two
and tests on variances,
Biometrika,
40
of
89 Box, G.E.P. 3~7-352.
and Draper, N.R., Robust designs,
Biometrika,
6_~2 (1975)
Boyd, D.W., The power method for Lp norms, Linear Algebra Appl., (197~) 95-101. Brownlee, K.A., Statistical theory and methodology engineering,
Wiley, New York, 2nd ed.
in science and
(1965).
Cargo, G.T. and Shisha, 0., Least p-th powers of deviations, Approximation Thy,
J.
15 (1975) 335-355.
Chert, C.H., On information and distance measures, feature selections,
error bounds and
Information Sciences, 10 (1976)
159-17h.
Coleman, D., Holland, P., Kaden, N., Klema, V. and Peters, system of subroutines
for iteratively reweighted
computations, manuscript,
Mass.
Inst. Techn.,
S.C., A
least squares
(1977).
Collins, J.R., Robust estimation of a location parameter in the presence of asymmetry, Ann.
Statist., k (1976) 68-85.
Cover, T.M.
and van Campenhout,
measurement
selection problem,
J.M., On the possible orderings IEEE Trans.
Syst. Man Cyb., SMC-7
in the (1977)
657-661. Daniel,
C. and Wood, F.S., Fitting
equations to data, Wiley, New York
(1971). Dempster, A.P., Estimation (1966) 315-332.
in multivariate
Dempster, A.P., A generalization Statist.
Soc., B-30
analysis,
of bayesian
In Krishnaiah
inference,
J. Roy.
(1968) 205-232.
Dempster, A.P., A subjectivist
look at robustness,
In I.S.I., ! (1975)
3hg-37h. Dempster, A.P., Examples relevant to the robustness inferences, Denby,
In Gupta and Moore
(1977)
L. and Mallows, C.L., Two diagnostic
regression
analysis.
of applied
121-138. displays
Technometrics, 19 (1977)
1-13.
for robust
90 Devlin, S.J., Gnanadesikan, R. and Kettenring, J.R., Robust estimation and outlier detection with correlation coefficients, Biometrika, 6_~2 (1975)
531-5~5.
Draper, N.R. and Smith, H., Applied regression York (1966).
analysis,
Dutter, R., Algorithms for the Huber estimator Computing, 18 (1977) 167-176.
of multiple
Dyke, G'V., Designs regression, Eisenhart,
Appl.
to minimize
Statist.,
loss of information
of the concept
Ekblom,
H., Calculation
Tidskr.
Informationsbehandling),
Ferguson,
Sciences,
estimation
with
1!I (1976) 69-92.
to probability
theory and its applications,
2 (1966).
R.A., Fryer, J.G.
a truncated
BIT (Nordisk
BIT (Nordisk Tidskr.
P. and Kurz, L., Robust recursive
W., An introduction
Wiley, New York, Vol.
1971 Am. Statist.
I__33(1973) 292-300.
Ekblom, H., Lp-methods for robust regression, Informationsbehandling), I~ (197&) 22-32.
Feller,
in polynomial
of linear best Lp approximations.
Information
regression,
of the best mean of a set
of measurements from antiquity to the present day, Ass. Presidential Address, Audience notes (1971).
applications,
New
2__33(197~) 295-299.
C., The development
Evans, J.G., Kersten,
Wiley,
and Mc Whinney,
normal distribution,
I.A., On the estimation
In I.S.I., ~ (1975) 259-263.
Filippova, A.A., Mises' theorem of the asymptotic behavior of functionals of empirical distribution functions and its statistical applications,
Theor.
Probab.
Appl., ~ (1962) 2h-57.
Fine, T.L., Theories of probability; Academic Press, New York (1973). Fletcher,
R., Grant, J.A.
and Hebden,
an examination
of foundations,
M.D., The continuity
and
differentiability of the parameters of best linear Lp approximations, J. Approximation Thy, I__O0(197h) 69-73.
of
91 Fletcher, R. and Powell, M.J.D., A rapidly convergent for minimization, Computer J., 6 (1963) 163-168. Florens,
J.P., Mouchart,
error-in-variables Forsythe, A.B., coefficients
M. and Richard J.F., Bayesian
models,
Robust
estimation
by minimizing
(1972)
159-166.
Garel,
B., D~tection
J. Multivariate
of straight
aberrantes
Th~se, Univ.
Gaver, D.P. and Hoel. D.G., Comparison probability
estimates,
Technometrics,
Gentleman,
W.M., Robust
minimizing
p-th power deviations,
University,
R. and Kettenring,
outlier detection 81-12~. Gnanadesikan, statistical
analysis,
of certain
(1976). small-sample
of multivariate
and Schucany, New York
Gray, H.L., Schucany,
Poisson
location by
Dissertation,
Princeton
Bell Tel. Labs.
J.R., Robust
In Krishnaiah
1_~h
dans un ~chantillon
estimates,
data, Biometrics,
(1969)
(1965). residuals
and
2 8 (1972)
R. and Wilk, M.B., Data analytic methods
Marcel Dekker,
jackknife
Technometrics,
12 (1970) 835-850.
Ph.D.
with multiresponse
in
line regression
Grenoble
and Memorandum MM 65-1215-16,
Gnanadesikan,
Gray, H.L.
estimation
inference
Anal., ~ (197~) 419-452.
p-th power deviations,
des valeurs
gaussien multldimensionnel,
descent method
in multivariate
593-638.
W.R., The generalized
jackknife
statistic,
(1972).
W.R. and Watkins,
T.A., On the generalized
and its relation to statistical
differentials,
Biometrika,
62
(1975) 637-642. Gray, H.L., functions
Schucany,
distributions, Gross, A.M., Am,
W.R. and Woodward,
of the parameters
Statis.
Gross, A.M.
IEEE Trans.
Confidence
and Tukey,
Study, Princeton
Univ.,
Reliability,
intervals
Ass., 72 (1977)
W.A., Best estimates
of the gaussian R-25
of
and the gamma (1976)
for bisquare
95-99.
regression
estimates,
J.
341-35~.
J.W., The estimators Techn.
Rept.
38-2
of the Princeton
(1973).
Robustness
92 Gupta, R.P., Applied Statistics
(edited by), North Holland, Amsterdam
(1975). Gupta, S.S. and Moore, D.S., Statistical decision theory and related topics
(edited by), Academic Press,
Hampel, F.R., Contributions Dissertation,
(1977).
to the theory of robust estimation,
Univ. California,
Berkeley
Ph. D.
(1968).
Hampel, F.R., A general qualitative definitions of robustness, Ann. Math.
Statist.,
42 (1971)
1887-1896.
Hampel, F.R., Robust estimation Wahrscheinlichkeitstheorie Hampel, F.R., Asymptotic
Some small sample asymptotics,
Statist.,
(1973b)
Hampel, F.R., The influence Am. Statist.
: A condensed partial
Proceed.
Prague Symp.
109-126. curve and its role in robust estimation, J.
Ass., 69 (1974)
383-393.
Hampel. F.R., Beyond location parameters In I.S.I.
survey, Z.
verw. Cab., 27 (1973) 87-104.
: Robust concepts and methods,
I (1975) 375-382.
Hampel, F.R., On the breakdown points of some rejection rules with mean, E.T.H. Harter,
Zurich, Res. Rept
11 (1976).
H.L., The method of least squares and some alternatives,
Statist.
Rev., 42 (1974)
125-190,
273-278,
147-174, 235-264,
269-272,
44 (1976)
282, 4 3 (1975)
113-159.
Henrici, P., Applied and computational complex analysis, Vol. New York (1974). Hill, R.W., Robust regression Ph. D. dissertation, Hille, E., Analytic
when there are outliers
Harvard University function theory,
I, Wiley,
in the carriers,
(1977).
Blaisdell,
New York, Vol.
I
(1959). Hocking,
R.R., The analysis
regression,
and selection of variables
Biometrics, 32 (1976)
Int.
1-44,
1-49.
in linear
93
Hoerl, A.E. and Kennard, R.W., Ridge regression of the biasing parameter.
Commun.
Statist.
: iterative estimation
- Theor. Meth., A5 (1976)
77-88. Hogg, R.V., Adaptive robust procedures suggestions
for future applications
: A partial review and some
and theory, J. Am. Statist. Ass.,
69 (1974) 909-923. Huber, P.J., Robust estimation
of a location parameter,
Ann. Math.
Statist., 35 (196~) 73-101. Huber, P.J., Robust verw. Geb.,
confidence
limits, Z. Wahrscheinlichkeitstheorie
I_~0 (1968) 269-270.
Huber, P.J., Th~orie de l'inf~rence Montreal
Huber, P.J., Studentizing robust Huber, P.J., Robust (1972)
statistique robuste, Presses Univ,
(1969).
statistics
estimates,
(1970) h53-h63.
: A review, Ann. Math.
Statist.,
4_~3
1041-1067.
Huber, P.J., Robust regression: Carlo, Ann.
Asymptotics,
conjectures
and designs,
Huber, P.J., Robust covariances,
In Srivastava
In Gupta and Moore
Huber, P.J., Robust methods of estimation Math. 0perationsforsch.
Statist.,
197h,
(197~)
Jackson,
165-191.
~I-53.
solution of robust regression
165-172.
Speech Signal Proc., ASSP-23
I.S.I., Proceedings
(1977a)
of regression coefficients,
S.Y., On monotonicity of Lp and lp norms,
Electroacoustic
(1975) 287-301.
Ser. Statistics, ~ (1977b)
Huber, P.J. and Dutter, R., Numerical problems, COMPSTAT
Int. Stat.
and Monte
Statist., ~ (1973) 799-821.
Huber, P.J., Robustness
Hwang,
In Purl
IEEE, Trans.
(1975) 593-596.
of the bOth Session, Warsaw - 1975, h books, Bull.
Inst., h6 (1975).
D., Note on the median of a set of numbers, Bull. Ann. Math.
Soc., 27 (1921)
160-16~.
94 Jaeckel,
L.A., Robust
contamination,
estimates
Ann. Math.
of location
Statist.,
Jaeckel, L.A., The infinitesimal MH-1215 (1972a). Jaeckel,
L.A., Estimating
dispersion
Kanal, L., Patterns Theory,
IT-20
Karamardian,
(197~)
S., Fixed points
Kellogg,
R.B., Li, T.Y.
(prepared
Kersten,
IEEE Trans.
I~49-1458.
Information
and applications
and Yorke, J., A constructive
(edited by),
and computational
proof of the
results,
SIAM J. Numer.
273-483.
and Buckland,
W.R., A dictionary
3rd ed.
Statistical
of statistical
Institute).
terms
Oliver and
(1971).
P. and Kurz, L., Robustized vector Robbins-Monro
with applications (1976)
algorithms
for the International
Boyd, Edinburgh,
23 (1972b)
the
(1977).
fixed point theorem
Kendall, M.G.
Statist.,
by minimizing
697-722.
Press,
Brouwer
I020-I03~.
coefficients
Ann. Math.
and asymmetric
Bell Lab. Memorandum,
in pattern recognition,
Academic
Anal., 13 (1976)
h_~2 (1971)
jackknife,
regression
of the residuals,
: Symmetry
to M-interval
detection,
Information
algorithm
sciences,
I__!I
121-I~0.
Koml6s, J., Major P. and Tusn~dy, In Revesz (1975) 149-165. Krishnaiah,
G., Weak convergence
and embedding,
P.R., Multivariate
analysis
(edited by), Academic
Press, !
P.R., Multivariate
analysis
(edited by), Academic
Press,
(1966)o Krishnaiah,
(1969). Lachenbrueh, discriminant multiple
P.A., On expected probabilities of misclassification in analysis, necessary sample size, and a relation with the
correlation
coefficient,
Biometrics,
2k (1968) 823-83h.
Lewis, J.T. and Shisha, 0,, Lp-convergence of monotone functions and their uniform convergence, J. Approximation Thy, l h (1975) 281-28h.
95 Makhoul,
J., Linear prediction
: a tutorial
review,
Proc.
IEEE, 63
(1975) 693-708. Mallows,
C.L., On some topics
in robustness,
I.M.S. meeting,
Rochester,
May (1975). Marcus,
M.B.
and Sacks, J., Robust designs
Gupta and Moore Maronna,
(1977)
Masreliez,
Statist.,
C.J.
linear model
k (1976)
and Martin,
of multivariate
Merle,
viewpoint,
the Kalman
Biometrics,
and
filter,
estimation
for the
IEEE Trans.
on Aut.
Computing,
12 (1974)
Mik@, V., Robust Pitman-type Math.,
surface methodology
from
31 (1975) 803-851.
G. and Spath, H., Computational
approximation,
Statist.
location
(1977) 361-371.
Mead, R. and Pike, D.J., A review of response a biometric
In
51-67.
R.D., Robust bayesian
and robustifying
Contr., AC-22
problems,
245-268.
R.A., Robust M-estimators
scatter, Ann.
for regression
experiences
with discrete
Lp
315-321.
estimators
of locations,
Ann.
Inst.
25 (1973) 65-86.
Miller.
R.G., The jackknife
- a review,
Miller,
R.G., An unbalanced
jackknife,
Biometrika, Ann.
61 (1974a)
Statist.,
1-15.
2 (1974b)
880-891. Miller.
R.G., Jackknifing
Biostatistics Mosteller,
censored
Div., Techn. Rept.,
F., The jackknife,
Rev.
data, Stanford University, I~ (1975). Int., Statist.
Inst., 39 (1971)
363-368. Munster,
M., Th~orie g~n~rale
Soc. Roy. Sc. Liege, N.C.H.S.,
Annoted bibliography
procedures,
(1972).
de la mesure
et de l'int6gration,
Bull.
43 (1974) 526-567.
U.S. Dept.
on robustness
Health Educ. Welf.,
studies
of statistical
Publication
(HSM) 72-1051
96 Nevel'son,
M.B., On the properties
of the recursive
functional of an unknown distribution
function,
estimates for a
In Revesz
(1975)
227-251. 01kin,
I., Contributions to probability and statistics
Stanford Univ. Press,
(edited by),
(1960).
Pearsall, E.S., Best subset regression by branch and bound, report, Wayne State University, (1977). Prokhorov,
Y.V., Convergence of random processes
and limit theorems
probability theory, Theor. Probab. Appl., ! (1956) Puri, M.L., Nonparametric
techniques
by), Cambridge Univ. Press,
Internal
in statistical
in
157-21~. inferences
(edited
(1970).
Quenouille, M.H., Notes on bias in estimation,
Biometrika,
~3 (1956)
353-360. Rao, C.R., Linear statistical
inference
and its applications
-
second
edition, Wiley, New York (1973). Reeds, J.A., On the definition of yon Mises functionals, Dissertation
and Res. Rept.
S-&h, Harvard University,
Relles, D.A. and Rogers, W.H., estimators
of location,
Amsterdam
(1976).
are fairly robust
J. Am. Statist. Ass., 72 (1977)
Revesz, P., Limit theorems North-Holland,
Statisticians
Ph. D.
of probability theory
107-111.
(edited by)
(1975).
Rey, W., Robust estimates of quantiles,
location and scale in time
series, Philips Res. Repts, 29 (197h) 67-92. Rey, W., On least p-th power methods location
estimations,
in multiple regressions
BIT (Nordisk Tidskr.
and
Informationsbehandling),
15
Rey, W., Mean life estimation from censored samples, Biom. Praxim.,
15
(1975a) 174-18h.
(1975b)
I~5-159.
Rey, W., M-estimators Brussels, R329
(1976).
in robust regression, MBLE Research Laboratory,
97 Rey, W.J.J., M-estimators Barra et al. Rey, W.J.J. In I.S.I.,
in robust regression
-
a case study, In
(1977) 591-594. and Martin, L.J., Estimation of hazard rate from samples,
4 (1975) 238-240. m
Rohlf, F.J., Generalization multivariate
of the gap test for detection of
outliers, Biometrics,
Ronner, A.E., P-norm estimators Dissertation,
31 (1975) 93-101.
in a linear regression model,
Groningen University,
The Netherlands
Scarf, H., The computation of economic New Haven and London
equilibria,
(1977). Yale Univ. Press,
(1973).
Scholz, F.W., A comparison of efficient location estimators, Ann. Statist., ~ (1974)
1323-1326.
Schucany, W.R., Gray, H.L. and Owen, D.B., On bias reduction estimation, J. Am. Statist. Ass., 66 (1971) Schwartz,
L., Analyse numerlque~
fonctionnelle,
Hermann, Paris
topologie
in
524-533. g~n~rale et analyse
(1970).
Sen, P.K., Some invariance principles
relating to jackknifing and their
role in sequential analysis, Ann. Statist., ~ (1977) 316-329. Sharot,
T., The generalized jackknife
sizes, J. Am. Statist.
: finite samples
and subsample
Ass., 71 (1976a) 451-454.
Sharot, T., Sharpening the jackknife,
Biometrika, 63 (1976b)
315-321.
Smith, V.K., A simulation analysis of the power of several tests for detecting heavy-tailed distributions,
J. Am. Statist.
Ass., 7 0 (1975)
662-665. Srivastava, J.N., A survey of statistical design and linear models (edited by), North-IIolland, Amsterdam
(1975).
Stein, C., A necessary and sufficient condition Math.
for admissibility,
Ann.
Statist., 26 (1955) 518-522.
Stigler,
S.M.,
estimation
Simon Newcomb,
: 1885-1920,
Percy Daniell and the history of robust
J. Am. Statist.
Ass., 68 (1973) 872-879.
98 Swaminathan, Academic
S., Fixed point theory and its applications
Press,
Takeuchi,
K., A survey of robust
procedures,
especially
In I.S.I., ! (1975) Thorburn,
Eisenhart
History
Tukey,
properties
of the Peloponnesian
statistics,
war, ~ (428 B.C.) para 20, In
Washington
forecasting (1976)
Lecture
124 (1976).
IEEE Int. Conf.
Cybernetics
on
&
600-605.
29 (1958)
S., Mathematical
Berlin,
for the linear model with emphasis
ouliers,
J.W., Bias and confidence Statist.,
and applications,
Springer Verlag,
in not-quite
large
samples
(abstract),
614.
Tukey, J.W., A survey of sampling 01kin (1960) 4~8-485.
Mas.,
of jackknife
of fixed points
Syst.,
toward occasional
Ann. Math.
Vajda,
and
quantity,
305-313.
and Math.
I.H., Robust
robustness Society,
: models
of a physical
(1971).
in Econ.
Tollet,
of location
336-348.
Todd, M.J., The computation Notes
estimation
in case of measurement
D., Some asymptotic
Biometrika 63 (1976) Thucydides,
(edited by),
(1976).
from contaminated
programming,
Addison-Wesley
distributions,
In
Inc., Reading,
(1961).
Von Mises, R., On the asymptotic statistical Woodruff,
functions,
R.S.
the variance
Ann. Math.
and Causey,
distribution Statist.,
B.D., Computerized
of a complicated
estimate,
of differentiable
18 (1947) 309-348. method for approximating
J. Am. Statist.
Ass., 7_!I (1976)
315-321. Yale, C. and Forsythe, (1976)
291-300.
Yohai,
V.J., Robust
(1974)
562-567.
Youden,
A.B., Winsorized
estimation
W.J., Enduring values,
regression,
Technometrics,
in the linear model, Ann.
Technometrics,
I_~ (1972)
Statist.,
1-11.
18
99 Ypelaar,
A. and Velleman,
procedures,
Economic
Cornell University,
P.F., The performance
and social 877/007
statistics
(1977).
of robust regression
technical
reprint
series,
APPENDIX
The
following
which have
would been
ranked
developments, made
between
are n o t e d
pages
unduely
in
are
in a l p h a b e t i c a l
wherever the
essentially
load the main
they may
sections
italic
type.
of t h e
devoted
text order
to the
involved
derivations
otherwise.
The v a r i o u s
sections
to
research
specific
ease t h e
be r e q u i r e d . appendix.
Many Then,
of
cross-references the
concerned
are
entries
101 Consistent
estimator
A vector
estimate
tn
is said
to be
consistent
when
it c o n v e r g e s
in
m
probability, which
it
terms,
~n
as the
sample
size
is an e s t i m a t o r
- from
is c o n s i s t e n t
if, and plim
where
p(.,.)
is the m e t r i c
to the v a n i s h i n g distribution
There with
only
interest
to m o m e n t
Chebyshev-like
(1971).
sample
in b o u n d i n g
of p o s i t i v e
~ of In o t h e r
= O, space
the
~.
This
between
Prohkorov
order
is e q u i v a l e n t fn'
metric
m,
= E{[O(~n , ~)]e}.
inequality
P r o b { P ( & n , ~ ) < ~} > 1 leads
parameter
if,
O(~n , ~)
in the
to the
and B u c k l a n d
Prokhorov metric d(.,.)
of the
m The
Kendall
of --n t , and f, the D i r a c d i s t r i b u t i o n centered C o n s i s t e n c y ~ lim d(fn, f) = O.
m a y be
respect
n increases,
-
to
1/(e+1)
d(f n, f) ~ m e
me
8
e
the on
sample
8.
d(fn,
f),
102 Contaminated
normal
With the
desire
of a s s e s s i n g
important,
Tukey
estimators
on the
recall
findings
his
compares
(1960)
and
estimations
from
a purely
drawn
from
the
observations
normally
distribution
symmetry
has
longer
when
The known
model
instead
has
may
of t h e m e a n
contamination
but
is
of s e v e r a l
distributions.
We
own o b s e r v a t i o n s .
He
performed
distribution
with
a sample
a 2) and w i t h
a sample
contaminated
His m o d e l
by e x t r a n e o u s
of c o n t a m i n a t e d
form
N(~, for
shorter
~).
+ c NC.I, ~I = ~ and, tails
than
contamination
unexpected
of the
leads
for m o d e r a t e the
normal
estimation
of
to
some
£, t h e
w h e n ~I < c and
the
better
situation
thousandths
12 % b e t w e e n
readers.
distribution
to have
of one or two
loss
normal
to an e f f i c i e n c y
is s u f f i c i e n t
scale
efficiency
performances
our
N(~,
c)
appear
estimation
Concerning
with
and scale
distribution
the
-
the
assumption
~I > a.
findings that
them
normality
non-normal
distributed.
is m a i n t a i n e d
contaminated
slightly
augment
normal
the
investigated
and
same n o r m a l
also
whether
of l o c a t i o n
(i The
has
normal
drawn
normal
distributions
the mean
loss
b y the m e d i a n
of 36 %, b u t
estimation is r a t h e r
is
It is w e l l
location
exceptional;
sufficient
deviation
a 10 %
by t h e m e d i a n .
to b a l a n c e
and t h e
a an
standard
deviation. Precisely, obtained to the
for
in t h e some
symmetric
following
specific
the
asymmetric
given
table
Location
has
been
asymptotic
comparison.
Scale : the
£.
find Table
the r e s u l t s 4 is r e l a t i v e
-
£) N ( 0 ,
I)
+ e N(0,
32),
- e)
~(0,
I) + ~ ~ ( 2 ,
32 )
5.
respective
estimators
one w i l l
model
(I has
tables,
of p a r a m e t e r
model
(I whereas
two
values
estimated variances
has b e e n
standard
by the m e a n (war)
assessed
deviation,
and t h e m e d i a n .
constitute by t h e t h r e e the m e a n
a natural
Their basis
for
following
deviation
to the m e a n
and
103
the m e d i a n
deviation
semi-interquartile reported we
have
the r e s u l t s with
definition
compare
the
variation
the
at l e v e l whether
lines
0.95
as
estimator,
computed
but
(or s l i g h t l y
the
has
not b e e n
poorer
than)
to the m e d i a n .
Due
it has
appropriate
on the b a s i s
appeared
of t h e i r
of t h e i r
tables
to the
respective
asymptotic
are d r a w n
contaminated
ratio
given
test
relative
size
such
that
with
probability
simple
minded;
estimation
to H I.
vhat
lack
of
to
coefficients
standard
deviations
of t h e
a "discriminatory the
It m u s t
of to
and,
distribution
and the
integrations
H 0 be
we
size"
to t e s t
that
HI,
or
a
with
or 2, 3 2 )
that
should
then,
the
our
take
standard
have
attributed
be n o t e d
realistically
sample
have
supposed
H 0 against
a E are the m e a n
from
must
normal
We h a v e
to t e s t
Numerical
parameters
sample
a strictly
I) + ¢ N ( 0
drawn
0.95.
more
from
performed
~s and
a sample
size
distribution.
was
~, and w h e r e
deviation
provides
is t h e m i n i m u m
H 0 = N(~¢, s H I = (I - ¢) N ( 0 ,
would
same
scale,
ratio
of t h e
which
observations
a given
likelihood
for
been
means.
The l a s t
from
the
A fourth
also
deviation
f o r the
estimators
(c.v.),
has
being
the m e d i a n
natural
their
to the m e d i a n .
range,
provided
the
to H 0 by the
approach into
is v e r y
account
discriminatoy
test
the
sample
sizes
be l a r g e r .
Inspection distribution
of the shape
In p a r t i c u l a r , significantly
tables
is v e r y
a slight impair
Moreover,
to t e s t
optimality
the
long-tail
the
deviation.
reveals
that
dependent
large
of t h e s e
sensitivity nature
contamination
efficiencies
quite
the
upon the
is
of the m e a n
samples
must
estimators.
be
to t h e
of the
estimators.
sufficient and of the at d i s p o s a l
to standard in o r d e r
104 Symmetric model
0.0000
0.0018
0.0282
0. 1006
0.2436
0.5235
0.8141
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
1.0000
1.0140
1.2252
1.8047
2.9488
5.1882
7.5230
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
1.5708
1.5745
1.6315
1.8042
2.2389
3.7066
2.5130
Mean
var
Median
var
Standard d e v i a t i o n -
1.0000
1.0070
1.1069
1.3434
1.7172
2.2778
2.7410
c.v.
0.7071
0.7627
1.1725
1.3540
1.2317
0.9720
0.7929
Mean d e v i a t i o n to the mean -
O.7979
0.8007
0.8428
0.9584
1.1866
1.6333
2.0970
c.v.
0.7555
0.7627
0.8514
0.9822
1.0461
0.9720
0.8417
Median deviation to the m e d i a n 0.6745
0.6754
0.6891
0.7297
0.8259
1.1089
1.6167
1.1664
1.1668
1.1725
1.1899
1.2317
1.3435
1.3600
137
83
123
638
C.V.
Discriminatory
0.95 1
-
sample
7880
size
497
Table 4
105 Asymmetric
model
0.0000
0.0008
0.0115
0.0617
0.2228
0.6246
O.6878
0.0000 1.0000
0.0016 1.0097
0.0230 1.1377
0.123h
0.4456 3.4753
1.2492
1.7254
6.9348
1.3756 7.3613
-I
0.0000
0.0005
0.0072
0.0~01
0.1657
0.7409
0.8986
var
1.5708
1.5727
1.5977
1.7254
2.2901
6.9348
8.7865
Mean
var
Median
Standard
deviation
1.0000
1.0049
1.0666
1.3136
1.8682
0.7611
1.1694
1.5176
1.2520
2.6334 0.8172
2.7132
c.v. I 0.7071
-
I
Mean deviation
0.?837
to the mean
I 0.7979
0.7996
0.8222
0.9301
1.2841
2.0487
2.1355
c.v. I 0.7555
0.7611
0.8263
0.9973
1.0525
0.8077
0.7837
-
Median -
deviation
I O.6745
c.v. I
1.1664
Discriminatory
to the median 0.6749
0.6810
0.7115
0.8368
1.4726
1.6093
1.1666
1.1694
1.1838
1.2520
1.3731
1.3080
121
51
119
164
sample size
1
0.951
8760
769
Table 5
106
Distribution All
space
probability
convex
complete
(1971,
Ch.
11)
For
any
three
functions,
metric and
we
space,
Hille
(1959,
functions
f,
define
a metric
by
concerned
Sec.
g,
a distance
with,
in a B a n a c h
are
defined
space.
See
in
a
Schwartz
h.7).
g and f,
we
are
included
h belonging h E
to
this
space
E,
E,
function
d(.,.)
having
the
ordinary
properties d(f,g) d(f,g)
The
space
sequence
is
0
d(f,g)
= d(g,f)
d(f,g)
> d(f,h)
complete
{fn ) of
>
= 0 ~ f = g
or,
E belongs
+ d(h,g).
in o t h e r
to
E,
{fn} lim The
space
is
convex,
With
the
we
usual
distributions to
some Let
have
not
E,
Q be
associate
the
the
measure
whole
+
must
be
The f, can
f(x)
it has
not
understood sample
however be
is
we
do
extended a unit
not
G
can
measure
Cauchy
consider
in o r d e r
probability
to
be
be
this
function
as b e i n g
to
be
we
either
and
they
the
some
subset,
probability
related
then
we
function
the
integral
~,
distribution
theory.
different
possibility
i.e.,
frequency are
f by
dx.
x E
to
consider.
way.
m C Q be
any
common
measure,
seen
f(x)
of t h e
possibly
~ h E E.
distributions
F(m)
for
I
probability
and
= f
sense
0 ~ t ~
following
space
continuous
in t h e
space
can
in t h e
sample
probability
what
density
F(=) Whenever
of a n y
E
(1-t)g
they
probability
probability
6
t E R,
clarified
terminology, or
limit
i.e.
h = tf far,
the
{fn ) = f ~ f E E.
f,g E
So
terms,
i.e.
all
for and
each we
notation
distribution
assume
distributions
that
f and
it that
107
F(n)
This
possible
distribution
extension
functions
smooth
functions
only
metric
construction
of the
as a sum defined
I.
sample
space
of D i r a c
on
purposes,
=
some
~ leads
functions
discrete
we also
to
sample
require
see d i s c r e t e
rather
the
than
space.
sample
as For
space
~ to be
metric. It must
be
axiomatic Feller
(1966,
constraining of the
Chap. and
The
our
metr4c
constraint
1977b)
discrete
data
as p o s s i b l e
we have
found
to m a n i p u l a t e . reported
avoided
by K o l m o g o r o v ,
A
Except
by
it too
critical
by Fine
the
as r e l a t e d
(1973,
appraisal
Chap.
3)
and
seen
to the
to
in the
first
of
considering
the
then,
include
that
performs
and
papers
metric.
both
escape
For
we have types
of
this
instance,
distribution
estimations
of the
discrete
- Although
simultaneously
The
to
The Prokhorov
space.
space.
a continuous
is due
generalization
real
several
Prokhorov
difficult.
knowledge,
continuous
distribution
substitutes
set and,
is e s p e c i a l l y
of our
p-dimension
indicate
avoid
best
as a m u l t i d i m e n s i o n a l
simultaneously
we m u s t
metric
Beran
to any
by m i n i m i z i n g
the
distance. when
cumulative are
been
~ = R p, the
and thus
(1977a, Hellinger
be
possibility
distributions,
work
proposal,
functions
the
is b e c a u s e
heavy
has
as m u c h
introduced
of an a p p r o p r i a t e
while
permits
required
this
setup
and m a y
probability
we have
attitude.
selection
Levy metric
h);
abusively
satisfactory
Prokhorov
that
of p r o b a b i l i t y
Kolmogorov
justifies
only
observed
theory
~ is the
distribution
relative
real
axis,
functions
to p r o b a b i l i t y
it is not (cdf);
density
thus
possible most
functions
to d e f i n e
arguments (pdf).
in this
108
Influence For
function
regular
in the
functional
vicinity
distribution
of f in terms
g be
in this T(g)
Truncation smaller The named
at the
~(~),
"influence"
of d i s t r i b u t i o n of the
vicinity,
= T(f)
level
the P r o k h o r o v function
obtained
T(f)
distance
function
first
by H a m p e l
g
derivatives.
(~) d~ +
order
d(f,g)
depends
Mise8
is p o s s i b l e Let
then
+ f ~(~)
of the
which
Yon
f, e x p a n s i o n
term
...
is the m o r e
valid,
the
is.
upon
the d i s t r i b u t i o n
(1972).
Its
f, has
evaluation
been
is easily
through
~(~0 ) = f ~(~) ~(~-~0 ) d~ =
where
lim
{T[ ( l - t )
~(~-~0 ) is the D i r a c
important
role
in the
f +
t~]
- T(f)},
distribution
assessmet
t 6
centered
of robustness.
R,
t ÷
0
on ~ = ~0"
It has
an
109 Jackknife The
technique
so-called
and w i t h
the
suitably
regular
justifying
the
infinitesimal
method
of t h e
derivation
functional
regular
distribution
which
g be
~(f)
= X(f)
X and ~(.)
present
produces
an
while
recipe
ordinary
f,
Mise8
uon
on
as w e l l
of n o t a t i o n ,
we
as to the
suppose
sample
expansion
a
space. is p o s s i b l e Let
derivatives.
then
f is the
distribution
g is an e m p i r i c a l
of
centered
bias
by
g(~) d~
(an a r i t h m e t i c
evaluation
to the
of the
defined
g(~) f A(~,~) g(~) d~ d~ +
context,
is u n k n o w n , is some
is o n l y
of d i s t r i b u t i o n o f the
+ [ l(~)
estimation
as a m u l t i d i m e n s i o n a l
vicinity,
I + ~ f
the
of e s t i m a t o r s
section
leads
as w e l l
in this
with
convenience
of f in t e r m s
X(g)
In t h e
This
For
functional
in the v i c i n i t y
deals
variance
functionals.
jackknife.
vector-value For
jackknife
estimation
rule
a vector
. . .
underlying
distribution.
or p o s s i b l y
denoted
~,
some
The
sample
functional
an a l g o r i t h m )
which
i.e.
e = T(f), A = lCg), with
g = [
wi
6(a-
X = (x 1 .....
a i)/[w
i
x n) ,
w. > O. The the
script
6(x_ - xi)
observation
x.
stands
and to
each
for the
Dirac
observation
x.
--I
non-negative
weight
expansion
obtain
distribution
is a s s o c i a t e d
We
introduce
the
expression
of g in the
A = A + [ Z(a i) w i / ( [ w i ) 1
+ 7 [[ which
can
also
a
--i
w.. 1
to
centered
wi ~ ( ~ i '
be w r i t t e n
~j)
wj/([wi
12 + " ' "
in t h e m a t r i c i a l
notation
A
8 = e_ + V w I ( I '
w)
+ 1 _w' @ ~ @ wlC1'
where w =
I'
Wls... =
,Wn) !
(I,...,I)'
w) 2 + ...
on
110
a n d @ is a s c r i p t were
a scalar.
bilinear
form
which
Per with
would
an o r d i n a r y
[_w'
[ elk
where
While due
is a s q u a r e
searching
to t h e
denote
definition,
@
a matricial k-th
matrlcial
¢ @
=
w_] k
product
component product,
_w '
[ ¢] k
if v e c t o r
is g i v e n
by
e
a
i.e.,
w
-
matrix.
for the
quadratic
the
term
origin
of p o s s i b l e
inasmuch
bias,
it a p p e a r s
to be
as
A
8_ is c o n s i s t e n t
with
respect
to _8
and w.
is i n d e p e n d e n t
of
x..
l
That
is
Terms
of o r d e r
superior
to two
have
respect
to t h e
lower
neglegible
with
order
cannot
term
To the
--i
derive
first
introduce
an e x p r e s s i o n
order
term.
any
covariance
Thus
preliminary
derivation
of t h e
of an e s t i m a t o r on d i f f e r e n t
we o n l y
To size group
apply h
the
(n = gh) by
some
=
I is the
+
being
Observe
that
the
are first
of ~,
we l i m i t
ourselves
to
consider
V
w
/I'
w
given
V'}
reported,
method.
by
/ (I' w) 2 we
now produce
It c o n s i s t s
on a set of w e i g h t s
w with
the
in a c l e v e r other
comparison
estimators
based
of w e i g h t s .
method,
we d i s t r i b u t e
and c h a n g e
factor
the
identity
matrix
the
weights
(I + t). w.
where
e
seeing they
consistent estimators.
for
= E { W w w'
results
jackknife
based
sets
orders.
bias
of ~ is a p p r o x i m a t e l y
coy(S) These
neglected
for the v a r i a n c e
8
a n d the
been
Then :
~i
the +
~.~
and E. I
observations
of the
in g g r o u p s
observations
weight-vector
in the
of i-th
w becomes
w
is d i a g o n a l
with
ones
for
the
111 i-th g r o u p usual
and
zeros
presentation
produces
otherwise.
is r e l a t i v e
the o r d i n a r y
It m a y be
appropriate
to a scalar
jackknife,
and
estimator
to note that the
e and that
t = -I
small t the i n f i n i t e s i m a l
jackknife. We d e n o t e i.e.,
by ~i the p s e u d o - e s t i m a t e
based
on the w e i g h t - v e c t o r
Ei ,
A
ei The
corresponding
TCX, _wi).
=
pseudo-value
is g i v e n by A
--~i = [ (1' _wi) _ei - (I' w) ~_]/t and has the e x p a n s i o n ,
~i
for
small t,
= (I' E. w) e + v E i w I--
~ ~
+
I
i
-I
2w @ ¢® Ei_w + tw'
E. @ ¢ ~ E.w
I' E. w -
Averaging
it has
of t h e s e
a relatively
all g r o u p s
-- I v
w~
w' i --
pseudo-values
simple
are e q u a l l y
®
¢
•
leads
expansion
weighted.
wil.
+
"'"
to the j a c k k n i f e
when
Thus,
estimate
¢ is b l o c k - d i a g o n a l
under
and when
conditions
[i[j _w' Ej ® ¢ @ E i w_ = O, for
i # j
and g 1' E i E = 1' w, we d e r i v e ,
for
for all i,
small t/g,
= e + vwl(1,w)
+ ½
(I ÷ t
-
2t/g)
w' • ¢ @ w / ( 1 ' w ) 2 + ...
A
Comparison equates applied.
w i t h the e x p a n s i o n the o r i g i n a l Bias
cancellation producing
estimator
reduction
occurs
of the s e c o n d
term
is g i v e n
of e r e v e a l s e while
the
the j a c k k n i f e
infinitesimal
w i t h the o r d i n a r y
o r d e r term.
by the
that
solution
Exact of the
version
jackknife
cancellation equation
[i(~i @ ¢ @ X i / ! ' ~ i ) = g[' @ ¢ @ w / 1 ' w
estimate is
(t = -I) by of this bias
in t
112
or,
under
the
same
conditions,
by p r e c i s e l y t
whatever
the v a l u e
=
--I
of g is. A
Estimation the
of the
covariance
covariance
of the
of ~
is o b t a i n e d
after
investigating
variables
~mi : ~--i -- (l' E i W) "e'. The
attitude
is that
the
pseudo-value
7.
is e s s e n t i a l l y
N I
of the
i-th
covariance now be
group could
followed
incidence possibly
under
two
E(W
on the
lead
estimator
to the
representative
A
[;
therefore,
covariance
of
e.
its
This
way
will
conditions.
E. w) I --
(E'
E. W') j
= 0, for
i ~
j
and
(I' w) [i (I' E. w) cov(~ E i w) = [i(I' E i w) 2 cov(~w) They
imply,
from
a population
sampled
for the
group
first,
that
of g r o u p
either
has
the
g groups
variates
the
same
and,
are
for the
weight
(I'
Under
these
total
covariance
conditions, 8.
=
W
that
or yields
each a
--
proportional
to
its w e i g h t .
we have E.
--i
1
[i c°v(!i)
i
independently
second,
E. w)
u
to the
contribution
drawn
w
-
(I'
--
E.
--
1
w/1' .
.
w) .
= [ I - [ i ( ! ' E i w/1'
~
w
.
w) 2]
cov(~
X)
and
covCA) = [i covCAi)l[(±' It
is c o n v e n i e n t
derivatives such
of ~(X,
to
state
~)
with
this
z) 2 - [i(±' E i a)2].
last
respect
result
to ~.
Let
in terms
of the
us d e f i n e
the
Jacobian
that A
e_i = T(X, then the
we
evaluate
jackknife
the
wi)
= e_ + t J E i ~,
pseudo-estimate,
estimate
the
pseudo-value
and,
finally,
J
113 A
8" = e + J w = e, while
t is
becomes
small
in t h e
and
due
present
to
the
homogeneity
of ~ ( X ,
~).
The
context
= CA' E)
J Ei
and, t h e r e f r o m , cov(~)
= [i
cov(J
Ei Z)/[ 1 - [i(l
' Ei " w / l '
w)2].
variate
~.
114
Prokhorov This i.e.,
metric
metric
a topology
Prokhorov has
induces
of p o i n t w i s e
(1956),
progressively
investigate
limit
We f i r s t over
by H a m p e l
retained
the
theorems
- e.g.,
sample
space
space
~ and,
~, to
any
n-neighborhood
m R of all p o i n t s
Formally,
0(.,.)
with
n Then
the m e t r i c
a n d g,
= {y
is g i v e n
associated
to m e a s u r e s
H(F,G) U = n(c), arbitrariness
function
n(c)
performed distance
in the
in ~.
distribution
the
A further
functions;
simplified
d(f,g) or,
= max
= 0,
= inf{e
> 0
by
one
~ H(F,G)
metric
f and g
several
associate
than
n from
~.
< n}. between
following
to f u n c t i o n s
It is b o u n d e d
= inf{e
> 0
as f o l l o w s
: G(m)
f
way
~ F ( ~ ~) + ¢, 0,
for
all
m C ~}.
q has
n equal the
increasing to
e.
dimension
is p o s s i b l e
while
This of
= H(G,F).
can he w r i t t e n
< G ( ~ E) + e, for
~ F ( m e) + e, f o r
is
a
f and g are
has
all ~ C ~)
equivalently,
d(f,g)
an o p e n
H(G,F)}
setting
whereas
: F(~)
for
of the m o n o t o n o u s l y
simplification
Prokhorov
less
in the
: F(~)
effectively, = G(n)
functions
~ C ~ we
d(f,g)
(H(F,G),
reduced
to
(1975).
axis.
0(x,y)
n'(~) >
e is a s c a l a r
F(n) Then
F and G,
theory
by
and
~-space,
distance
selection
is o r d i n a r i l y
although
distance
of t h e
= inf{a ~ 0 ~(0)
real
subset
: ~ x E m,
by t h e
d(f,g)
The
at
the m e t r i c
illustrate
on t h e
closed
et al. two
introduced
robustness
in p r o b a b i l i t y
between
then,
been
to a s s e s s
see K o m l 6 s
metric
functions
It has
(1971)
attention
the P r o k h o r o v
distribution
In a m e t r i c
convergence.
proposed
define
a common
particular
distribution space a v a g u e t o p o l o g y ,
over the
all m C ~}.
115
0 ~ dCf,g) To
provide
some
distribution
insight,
functions
in [O,a] , w i t h
a >
0
in [ 0 , b ] , w i t h
b >
a
(I
- t)g,
: a sampling
n
h
a linear sum
is
of
The
size
between
functions
restrictive
distance
between
is
realized
H(H,F).
For
by
values been x I <
and
can
(I
m = [ 0,a]
t = 0 we
the
the
distance
triangular
interpolation.
use
of
derive
the
the
<
distance
H(F,Fn) ; but
realizing
between
~ = R I.
several They
are
a jump g.
at
coordinate
Distribution
f
x = is
n
a; a
- xi) ,
the
... be
for
interval
according
<
xn <
(b
We
will
later
to
a.
performed given
[ O,a].
with
probability
one.
by
- a)/(b
H(F,H),
as
+
I - t)
well
as
~ = [a
+
c,
b]
for
=
(b
- a)/(b
between = t
h and (b
holds
-
+
I).
g is
a)/(b
true,
+
given
by
I)
although
h
is
a linear
have
d(f,g) To
in
- t)
inequality
We
axis,
obtain
d(h,g) and
distances
I
f and
~(x
h is
d(f,g) Similarly,
1.
f.
with
ranked
0 <
=
=
6,
I = ~ ~
f and
t ~
n from
interpolation of D i r a c
d(f,h) and
0 ~
function
x. a r e t h e n s a m p l e d i assume that the sample has
not
with
a step
where
is
real
: uniform
fn
This
the
the
: uniform
Distribution is
consider over
we
defined
g
f
it
= G(n)
f
h = t f +
weighted
~ F(~)
great
infimum,
we
d(f,h)
+ d(h,g),
it
is
I.
f
care
needed
to
define
compelled
to
introduce
is
fn'
t <
between
are
and
0 <
convenient the
to m a k e
subset some
~0
notations
116 for to
this sum
only
small
derivation. connex
The
general
intervals
e.
idea
selected
of t h e
~0
construction
in o r d e r
to
have
by
the
is
I
Fn
(e~)
= 0 and
F(e i)
= max.
Then
~0 = u e i. Denote size
by n,
Yi
and
all by
disjoint
i i their
finite
intervals
lengths;
Yi
= [xi-1'
11
=
created
sample
of
formally xi]
and •
X -
z
-
xi-1
with x 0 = O, Xn+1 The In
selection each
of
interval
e i is Yi'
= a.
implicitely
associated
2e,
i
select
the
the
satisfying 1. >
we
to
intervals e i = [ x i _ I + e, x i - e]
of l e n g t h s
(i. - 2c) 1
and
such
that
[e~] Therefore,
the
distance
definition
H ( F , F n) or,
=
{s
Yi"
=
becomes
: F ( ~ O)
= F n ( ~ ~)
+
e}
immediately, (I/a)
[si(l i -
2c)
=
0 +
s
and d ( f , f n)
= ~ = Is i l i / ( a
with si =
1 ' if
= O,
1.z >
2e
otherwise.
+ 2 [ s i)
knowledge
of
c.
117
The
way
f
converges
to f, w h i l e
the
size
n increases,
is of great
n
interest We
and will
assume
of v a r i a t e
thus
be
investigated.
n sufficiently
i i to its
pdf With
this
only
large
continuous {1}
condition,
leads
to a law
of
=
to the
is
conclude,
involved
iterated -
= c ~
may
we
observe
in the
that
construction
contribute
is so m u c h
faster
Experimental obtained
as
distribution
nl/a).
of the
+ 2n I
[ a/(2n)]
implicit
£ pdf
equation
{i}
dl).
in
why
{[ a/(2n)]
a fraction
[
s,
the
=
l
(2n/a)}.
e of the
intervals
in F r o k h o r o v
theoretical
derivation
results. 1
n
1000
10000
d(f,fn)
.00292
.000392
~(f,fn )
~(f,fn )
dev.
Number
of r e p l i c a t e s
obtain
the
last
two
.00289
.000377
.00032
.000021
5o
20
to lines
metric
metric.
1
Stand.
Yi
~.
convergence
above
following
in
i.e.
Kolmogorov
of the
a
size
only
(l/n)
the
by the
in
of ~0'
explain with
validation
Theoretical Average
to than
indicated
Parameter Sample
sample
logarithm
- [ a/(2n)]
lim This
exp(-
dl/(a
the
distribution
approximation d(f,fn)
To
(n/a)
{i}
£ =
and
assimilate
~ is s o l u t i o n
e = n [ ¢ i pdf This
to
parent
has
been
118
Robustness Let be
f, g, ~ be
a distance
~(t
,g) the
n n drawn if,
distributions
function
such
distribution
from
and only
O, ~
between
(1971)
Under relate
also
based
the
estimator
t
based
on a sample
n t n is r o b u s t
Estimator
O,
<
~
f,
V n
with
some
defines
incompletely 6 when
t
can
n
respect
integral
from
g at the
point
of the
the
<
H - robustness
sample
sizes
E}.
in order
despite
lack
to r e l a t e
of
specified be
conditions,
expanded
in t e r m
term
point
= tn(g)
sample
+ 1%(~)
is the m o r e
where
~(~)
space,
least
different h =
f(m)
important
is the m o s t
i.e.
first
dm +
the m o r e
differing
important.
Itn(f)
produces
an e s t i m a t o r
- tn(g) I
f(~)
d~ I
distribution f(~)
=
(I - t) g(m)
+ t ~(~ - m0)
with t = small
h, we
have
lh/~(~0) I.
asymptotically d(f,g)
(n ÷ ~) =
t
d [ $ ( t n , g ) , ~(tn,f) ] = h and,
to
order
Yon
...
Let
such that
f which
= I.r ~(~)
For
independency
it is p o s s i b l e of the
I*¢"o)I,
is the
to g
We have tn(f)
the
by
of size
:
6 ~ d [ ~ ( t n , g ) , ~(tn,f)]
on d i f f e r e n t
Mise8 derivative.
then
and d(. I.)
we d e n o t e
samples.
E and
and the
g.
6 >
{d(f,g) Hampel
Prokhorov metric, then
if,
~ ¢ >
estimators
of the
distribution
distribution spaces
in some
as the
thus,
a(f,g)
~ ~ = s > ~I~(~o)I.
bias
f is
mO be this
119
yon Mises This T(f)
derivative
derivative
be the
is r e l a t i v e
functional
a n d f,
to a f u n c t i o n a l as w e l l
d~stributioe space d e f i n e d
over
order
(1947)
von Mises
lim where
derivative
{T[ (I - t ) f the
integral The
limit
existence
A by-product
space
of this
presentation
its v a l i t i t y .
to p r o g r e s s
his
is not
work
topologies. that
p(.,.)
of t h e
substitute, and go"
to
These
parameter
space
is t h e
T(f)
zero
and the
seeing
above
insight.
t o our
as d i d needs
If t h e r e
with are
will
set
f o r he
see t h a t unable
which
are
the
to it is
(1976).
However
is d e p e n d e n t
respect
upon
to the m e t r i c we
continuous,
with
f.
of c o n d i t i o n s .
discontinuities,
be d e f i n e d
we d e r i v e
situations,
Reeds
some
distribution
remain
asymptotic
further
to be
convexity
expression,
We w i l l
we w i l l
the
to p r o v i d e
of the
a sufficient
but
appear
that
the h e l p
say f0 of
some
a fine
f0 ) = 0,
c E R,
c ÷ Co,
Prokhorov metric, a n d
to f u l f i l l
continuous
performing
he
continuous
distributions
lim With
first
c satisfying
d(.,.)
expect
of some
the
t E R, t ÷ 0
to
In o r d e r
to the
will
2.
lim d(g, We
d~,
derivative
T,
in the v i c i n i t y
pertinent
lim d(f, where
leads
f and g, n e w d i s t r i b u t i o n s new
g(~)
t tending
Gateaux
Considering
f and g are
sample
~(~)
assumed.
is r e a s o n a b l e
really
We
assume
has b e e n
a few steps
intricate
/ t = f
at p r o v i d i n g
demonstrate
2; t h e n
Let
is, p e r d e f i n i t i o n ,
functional
derivation
expansion
possible
the
which
Taylor-like
space
for positive
of T(f)
aims
as g, be d i s t r i b u t i o n s sample
for this
upon
expansion
a distribution.
2.
conditions
on t h e m e c h a n i s m
a Taylor-like
The
over
depending
distribution
insight
- T(f)}
is u n d e r s t o o d
is e x t e n d e d
essentially of t h e
+ tg]
some
of
go ) = 0,
the
T ( f 0)
c ÷ cO •
condition = T(f),
f and g, t h e partition
c E R,
c E R, c ÷ c O •
principle
of the
sample
of the space
derivation 2, t h e n
consists
obtaining
the
in
120
expansion
of T as a f u n c t i o n
to conclude, First
refine
defined
the p a r t i t i o n
we restrict
the G - s p a c e
in a m u l t i d i m e n s i o n a l
space
in order
to obtain
limit
to a ball
of radius
R centered
and,
expressions. on some
point ~0 • G n R = {K
: p(K,
K0 ) < R}.
In this ball, we c o n s i d e r the p a r t i t i o n in n d i s j o i n t subsets and to each subset we a s s o c i a t e a m e a s u r e A(~i). We have U mi = ~R, and
~i'
I < i < n
mi n ~j = O, i # j
as well
as
A(m i u ~j)
In each
subset
we a r b i t r a r i l y
the c o r r e s p o n d i n g
values
= A(~i ) + A(mj), select
i @ j.
a point ~i
of the d i s t r i b u t i o n s
and denote
by fi and gi
f and g, i.e.
fi = f(~i )' ~i • ~i C nR , and
gi = g(~i )' ~i • ~i C GR . We now
investigate
partitioned notation based
space.
TR(f),
the b e h a v i o u r
Over this
whereas
the
on the n e v a l u a t i o n s
as a f u n c t i o n
defined
in the v i c i n i t y
over
of {fi}
differentiable.
We are
of the f u n c t i o n a l
ball,
script in the
we denote TR({fi})
subsets
an n - d i m e n s i o n
is valid
while
interested
the
on the
functional
denotes
~i"
T(f)
But TR({fi})
space
and
by the
its a p p r o x i m a t i o n a Taylor
can be
seen
expansion
T is s u f f i c i e n t l y
by the
expansion
TR({f i + t (gi - fi )}) = T R ( { f 'i} ) + ~ [ t ( g i - fi) ] h i A(m i) i
+ !2 [~[t(gi ÷
where and
h.l A(~.)I stands similarly for the
The last
part
- fi )] hij A(wi)
A(wj)
[t(gj
- fj)]
I .
.
.
.
for the first p a r t i a l d e r i v a t i v e higher order partial derivatives.
of the d e r i v a t i o n
consists
in o b t a i n i n g
(a/af i) TR{f i} the
limit
121 expressions. with
We
all A ( m i )
first
refine
tending
to
TR[ f + t
the
zero.
(g - f)]
+ t f
h(m)
+ ~I t 2 f
We
assume
existence
far,
the to
ball
GR"
T[ (I
[g(~)
-
limits
sense.
We n o w
c O ) in o r d e r
to
let
- t)
To the
conclude
first
sample we
order
h(~)
reorganize
the
and
passages
as to
T(f)
integral for
= S h(~)
g(~)
d~ - I f
~(=)
- [* g(=)
h(,,,)
-
f(m)]
d= d m
T(f)
are
understood
integration (and
is,
also
thus
possibly
c
dm
h(~,=)
[g(,,,)
- f(=)]
d~ d~
follows
for
the
g(m)
following
h(=) fC=)
f(~)
d~]}
as
ones.
d~] gC~)
d=
regular
to
expansion
=
+ t f ?(u) t2 #
expressions, the
sufficiently
Taylor-llke
f + tg]
domain.
d,,,.
is a f u n c t i o n a l
T[ (I - t)
f
integration
d=
+ 1
[gC=)
integrals of
infinity
- f(=)]
- f(~)]
limits,
the
- f(~)]
[g(~)
= f Inasmuch
h(=,=)
f
= T(f)
similarly
= S Ch(=)
the
infinity
.*.
~ as
f h(=)
f(,,,)]
to
[g(~)
space
term,
dm
domain
R tend
f + tg]
1 t 3 + 6
whole
- f(m)]
and
The
+~I t 2 f [g(~)
the
to
obtain
+ t I
with
n tend
= TR(f)
[ g(m)
of t h e
Riemann-Stieltjes
tend
letting
t 3
1
in the
partition, Then
#
g(m)
d~
~(u,
=)
g(m)
du dm
+
...
permit
all
122
is v a l i d
and
the
expression
of t h e
yon
Mises
derivative
results
immediately. When obtain
as w e l l
The obtain
we
introduce
a tautology
g = f in the
for
any
two
t value,
members
and
of t h e
therefore
we
above
equality,
we
have
as
expansion
in T a y l o r
asymptotic
essentially
jackknife
in study
properties
small
sample
as w e l l
as
series of
has
contexts. for
been
statistical
the
It
used
by F i i i p p o v a
estimators, is t h e
basis
influence function
(1962)
we u s e for
the
concept.
it
to
AUTHOR INDEX
Amann H., 7h, 88. Andrews D.F., 4, 16, 4h, 65, 66, 67, 68, 71, 88. Anseombe F.J., 3, 40, 88. Barra J.R., 88, 97. Beran R., 88, 107. Berger J.O., 3, 88. Beveridge G.S.G., 54, 88. Bickel P.J., 4, 16, 44, 65, 66, 67, 88. Bissel A.F., 20, 88. Box G.E.P., I, 84, 88, 89. Boyd D.W., 46, 89. Brodeau F., 88, 97. Brownlee K.A., 68, 89. Buckland W.R., I, 94, 100. Cargo G.T., ~5, 89. Causey B.D., 20, 98. Chert C.H., 9, 89. Coleman D., 6h, 89. Collins J. R., 39, 89. Cover T.M., 68, 84, 89. Daniel C., 68, 89. Dempster A.P., 19, 39, 82, 89. Denby L., 68, 89. Devlin S.J., 83, 90. Draper N.R., 68, 84, 89, 90. Dutter R., 63, 67, 90, 93. Dyke G.V., 84, 90. Eisenhart C., 2, 90, 98. Ekblom H., h6, 90. Evans J.G., 83, 90. Feller W., 36, 90, 107. Ferguson R.A., 20, 28, 88, 90. Filippova A.A., 90, 123. Fine T.L., 90, 107. Fletcher R., ~5, h6, 90, 91. Florens J.P., 84, 91. Forsythe A.B., 39, ~6, 91, 98.
124
Fryer J.G., 28, 90. Garel B., 40, 91. Gaver D.P., 83, 91. Gentleman W.M., 3, 46, 91. Gnanadesikan R., 83, 84, 90, 91. Grant J.A., 45, 90. Gray H.L., 19, 20, 28, 91, 97. Gross A.M., 64, 65, 66, 91. Gupta R.P., 88, 92. Gupta S.S., 89, 92, 93, 95. Hampel F.R., 2, 3, 4, 11, 12, 16, 29, 34, 39, 44, 65, 66, 67, 68, 78, 88, 92, 108, 114, 118. Harter H.L., 2, 92. Hebden M.D., 45, 90. Henrici P., 73, 92. Hill R.W., 39, 92. Hille E., 92, 106. Hocking R.R., 84, 92. Hoel D.G., 83, 91. Hoerl A.E., 84, 93. Hogg R.V., 80, 93. Holland P., 64, 89. Huber P.J., 2, 3, 4, 12, 16, 17, 30, 31, 39, 40, 44, 45, 50, 57, 58, 60, 63, 65, 66, 67, 78, 84, 88, 93. Hwang S.Y., 45, 93. I.S.I., 89, 90, 93, 97, 98. Jackson D., 45, 93. Jaeckel L.A., 3, 19, 23, 39, 68, 94. Kaden N., 64, 89. Kanal L., 9, 94. Karamardian S., 76, 94. Kellogg R.B., 73, 94. Kendall M.G., I, 94, 100. Kennard R.W., 84, 93. Kersten P., 83, 90, 94. Kettenring J.R., 83, 84, 90, 91. Klema V., 64, 89. Koml6s J., 94, 114. Krishnaiah P.R., 89, 91, 94.
125
Kurz L., 83, 90, 94. Lachenbruch P.A., 20, 9h. Lewis J.T., 45, 94. Li T.Y., 73, 94. Major P., 94, 114. Makhoul J., 82, 95. Mallows C.L., 68, 78, 83, 89, 95. Marcus M.B., 84, 95. Maronna R.A., 39, 60, 95. Martin L.J., 28, 97. Martin R.D., 83, 95. Masreliez C.J., 83, 95. Mc Whinney I.A., 28, 90. Mead R., 40, 84, 95. Merle G., 46, 95. Mik~ V., 82, 95. Miller R.G., 19, 20, 24, 28, 95. Moore D.S., 89, 91, 93, 95. Mosteller F., 20, 95. Mouchart M., 84, 91. Munster M., 8, 95. N.C.H.S., 3, 95. Nevel'son M.B., 83, 96. Olkin I., 96, 98. Owen D.B., 19, 97. Pearsall E.S., 84, 96. Peters S.C., 6~, 89. Pike D.J., 40, 84, 95. Powell M.J.D., 46, 91. Prokhorov Y.V., 2, 9, 96, 114. Purl M.L., 93, 96. Quenouille M.R., 3, 17, 96. Rao C.R., 36, 96. Reeds J.A., 96, 120. Relles D.A., 82, 96. Revesz P., 94, 96. Rey W.J.J., 16, 24, 28, 46, 69, 83, 96, 97. Richard J.F., 84, 91. Rogers W.H., 4, 16, 44, 65, 66, 67, 82, 88, 96.
126
Rohlf F.J., 84, 97. Romier G., 88, 97. Ronner A.E., 44, 97. Sacks J., 84, 95. Scarf H., 73, 97. Schechter R.S., 54, 88. Scholz F.W., 31 , 97. Schucany W.R.,
19, 20, 28, 91, 97.
Schwartz L., 97. 106. Sen P.K. , 19, 97. Sharot T., 23, 97. Shisha 0., 45, 89, 9h. Smith H., 68, 90. Smith V.K., 80, 97. Sp~th H., 46, 95. Srivastava J.N., 93, 97. Stein C., 52, 97. Stigler S.M., 2, 97. Swaminathan S., 76, 98. Takeuchi K., 82, 98. Thorburn D., 24, 98. Thucydides, 2, 98. Todd M.J., 73, 98. Tollet I.H., 83, 98. Tukey J.W., 3, 4, 16, 19, 22, 44, 65, 66, 67, 88, 91, 98, 102. Tusn&dy G., 94, 114. Vajda S., 55, 98. Van Campenhout J.M., 68, 84, 89. Van Cutsem B., 88, 97. Velleman P.F., 39, 64, 99. yon Mises R., 2, 12, 98, 120. Watkins T.A., 19, 20, 91. Wilk M.B., 84, 91. Wood F.S., 68, 89. Woodruff R.S., 20, 98. Woodward W.A., 19, 28, 91. Yale C., 39, 98. Yohai V.J., 82, 98. Yorke J., 73, 94. Youden W.J., 40, 98. Ypelaar A., 39, 64, 99.
SUBJECT
INDEX
Adaptive estimator, 39, 80. Admissibility, h2, 63. Asymptotic variance, 35, ~9. Bayesian viewpoint, 39, 82. Best estimator, 30, ~9. Bias reduction, see "jackknife",
21.
Bibliography, 7, 88. Breakdownpoint, 11. C.D.F., see "distribution". Classification, 28. Conceptual difficulty, 6, 30. Consistent estimator, 6, 20, 101. Contaminated normal distribution, 3, h, 58, 66, 102. Convexity, ~2, hg. Dirac function, 10, I~. Distance function, 9, 106. Distribution space, 9, 77, 106. Example, 10, 2h, 26, 36, 53, 57, 68, 79, 85, 102, 115. Fixed point computation, 72. Functional on distribution, 6, 12, ~6, 77, 120. Influence function, 15, 29, 33, 78, 83, 108. Jackknife, 6, 17, 20, 79, 109. Least powers, h3. Location estimate, 38, hg. M-estimator, 6, 30. Maximum likelihood, 30, 53, 57. Mean (arithmetic), 12, 32, ~ . Median, 12, h5, ~8, 67, 83. Metric, see "distance function". Minimax viewpoint, 3, 12, 66, 8&. MM-estimator, 6, 32, 60. Multiple solutions, 63, 7h. Normality, 2, 28, 35, 36, 78. Notation, 31. Order statistics, 28, 67. Outlier, 5, 23, 39, 41, 66, 69, 8~. P.D.F., see "distribution".
128
Prokhorov metric, 9, 13, 114. Recursive estimator, 82. References, 7, 88. Regression estimate, 38, 60, 65, 84. Rejection techniques, 2, 3. Relaxation method, 62. Rigidity index, ~I. Robustness, I, h, 11, 29, hl, ~9, 8h, 102, 118. Sampling distribution, 8, 28, 35, 78. Scale, 32, 43, 60. Scale minimization, 68. Simultaneous solution, 6h, 75. Time-series, 28, 82. Unicity, 35, h9, 7h. Variance estimate, see "jackknife", 22, 23, 113. yon Mises derivatives, 13, 16, 77, 108, 120. Weight, 14, 20.