ADVANCES IN IMAGING AND ELECTRON PHYSICS VOLUME 128
EDITOR-IN-CHIEF
PETER W. HAWKES CEMES-CNRS Toulouse, France
ASSOCIATE EDITORS
BENJAMIN KAZAN Xerox Corporation Palo Alto Research Center Palo Alto, California
TOM MULVEY Department of Electronic Engineering and Applied Physics Aston University Birmingham, United Kingdom
Advances in Imaging and Electron Physics
EDITED BY PETER W. HAWKES
CEMES-CNRS, Toulouse, France
VOLUME 128
Amsterdam Boston Heidelberg London New York Oxford Paris San Diego San Francisco Singapore Sydney Tokyo
This book is printed on acid-free paper. Copyright © 2003, Elsevier Inc. All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the Publisher. The appearance of the code at the bottom of the first page of a chapter in this book indicates the Publisher's consent that copies of the chapter may be made for personal or internal use of specific clients. This consent is given on the condition, however, that the copier pay the stated per copy fee through the Copyright Clearance Center, Inc. (222 Rosewood Drive, Danvers, Massachusetts 01923), for copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Law. This consent does not extend to other kinds of copying, such as copying for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. Copy fees for pre-2003 chapters are as shown on the title pages; if no fee code appears on the title page, the copy fee is the same as for current chapters. 1076-5670/2003 $35.00 Permissions may be sought directly from Elsevier's Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail:
[email protected]. You may also complete your request on-line via the Elsevier homepage (http://elsevier.com), by selecting ‘‘Customer Support’’ and then ‘‘Obtaining Permissions.’’
Academic Press An Elsevier imprint 525 B Street, Suite 1900, San Diego, California 92101-4495, USA http://www.academicpress.com
Academic Press, 84 Theobald's Road, London WC1X 8RR, UK http://www.academicpress.com
International Standard Book Number: 0-12-014770-X
PRINTED IN THE UNITED STATES OF AMERICA
03 04 05 06 07 08 9 8 7 6 5 4 3 2 1
CONTENTS

CONTRIBUTORS . . . ix
PREFACE . . . xi
FUTURE CONTRIBUTIONS . . . xiii

Fourier, Block, and Lapped Transforms
TIL AACH

I. Introduction: Why Transform Signals Anyway? . . . 1
II. Linear System Theory and Fourier Transforms . . . 3
III. Transform Coding . . . 13
IV. Two-Dimensional Transforms . . . 25
V. Lapped Transforms . . . 28
VI. Image Restoration and Enhancement . . . 39
VII. Discussion . . . 41
Appendix A . . . 42
Appendix B . . . 43
Appendix C . . . 45
Appendix D . . . 47
References . . . 48

On Fuzzy Spatial Distances
ISABELLE BLOCH

I. Introduction . . . 52
II. Some Views on Space and Distances . . . 54
III. Spatial Fuzzy Distances: General Considerations . . . 63
IV. Geodesic Distance in a Fuzzy Set . . . 75
V. Distance from a Point to a Fuzzy Set . . . 80
VI. Distance between Two Fuzzy Sets . . . 85
VII. Spatial Representations of Distance Information . . . 104
VIII. Qualitative Distance in a Symbolic Setting . . . 108
IX. Conclusion . . . 114
References . . . 115

Mathematical Morphology Applied to Circular Data
ALLAN HANBURY

I. Introduction . . . 124
II. Processing Circular Data . . . 126
III. Application Examples . . . 153
IV. 3D Polar Coordinate Color Spaces . . . 169
V. Processing of 3D Polar Coordinate Color Spaces . . . 181
VI. Conclusion . . . 196
Appendix A: Connected Partitions . . . 199
Appendix B: Cyclic Closings on Indexed Partitions . . . 199
References . . . 201

Quantum Tomography
G. MAURO D’ARIANO, MATTEO G. A. PARIS, AND MASSIMILIANO F. SACCHI

I. Introduction . . . 206
II. Wigner Functions and Elements of Detection Theory . . . 209
III. General Tomographic Method . . . 222
IV. Universal Homodyning . . . 243
V. Multimode Homodyne Tomography . . . 255
VI. Applications to Quantum Measurements . . . 265
VII. Tomography of a Quantum Device . . . 281
VIII. Maximum Likelihood Method in Quantum Estimation . . . 287
IX. Classical Imaging by Quantum Tomography . . . 298
References . . . 305

Scanning Low-Energy Electron Microscopy
ILONA MÜLLEROVÁ AND LUDĚK FRANK

I. Introduction . . . 310
II. Motivations to Lower the Electron Energy . . . 314
III. Interaction of Slow Electrons with Solids . . . 319
IV. Emission of Electrons . . . 343
V. Formation of the Primary Beam . . . 361
VI. Detection and Specimen-Related Issues . . . 381
VII. Instruments . . . 399
VIII. Selected Applications . . . 413
IX. Conclusions . . . 431
References . . . 432

Scale-Space Methods and Regularization for Denoising and Inverse Problems
OTMAR SCHERZER

I. Introduction . . . 446
II. Image Smoothing and Restoration via Diffusion Filtering . . . 447
III. Regularization of Inverse Problems . . . 460
IV. Mumford–Shah Filtering . . . 472
V. Regularization and Spline Approximation . . . 474
VI. Scale-Space Methods for Inverse Problems . . . 478
VII. Nonconvex Regularization Models . . . 493
VIII. Discrete BV Regularization and Tube Methods . . . 500
IX. Wavelet Shrinkage . . . 510
X. Regularization and Statistics . . . 517
XI. Conclusions . . . 522
References . . . 523

INDEX . . . 531
CONTRIBUTORS
Numbers in parentheses indicate the pages on which the authors’ contributions begin.
TIL AACH (1), Institute for Signal Processing, University of Lübeck, Ratzeburger Allee 160, D-23538 Lübeck, Germany

ISABELLE BLOCH (51), École Nationale Supérieure des Télécommunications, Département TSI, CNRS URA 820, 46 rue Barrault, 75013 Paris, France

LUDĚK FRANK (309), Institute of Scientific Instruments AS CR, Královopolská 147, CZ-61264 Brno, Czech Republic

ALLAN HANBURY (123), Pattern Recognition and Image Processing Group (PRIP), Vienna University of Technology, Favoritenstraße 9/1832, A-1040 Vienna, Austria

G. MAURO D’ARIANO (205), Quantum Optics and Information Group, Istituto Nazionale per la Fisica della Materia, Unità di Pavia, Dipartimento di Fisica “A. Volta,” Università di Pavia, Italy

ILONA MÜLLEROVÁ (309), Institute of Scientific Instruments AS CR, Královopolská 147, CZ-61264 Brno, Czech Republic

MATTEO G. A. PARIS (205), Quantum Optics and Information Group, Istituto Nazionale per la Fisica della Materia, Unità di Pavia, Dipartimento di Fisica “A. Volta,” Università di Pavia, Italy

MASSIMILIANO F. SACCHI (205), Quantum Optics and Information Group, Istituto Nazionale per la Fisica della Materia, Unità di Pavia, Dipartimento di Fisica “A. Volta,” Università di Pavia, Italy

OTMAR SCHERZER (445), Department of Computer Science, Universität Innsbruck, Technikerstraße 25, A-6020 Innsbruck, Austria
PREFACE
The six contributions in this new volume extend over many themes: mathematical morphology, signal processing, scanning electron microscopy, quantum tomography and regularization.

We begin with a survey of transforms that are used in signal and image processing, by Til Aach. First, the continuous and discrete Fourier transforms are presented, which leads to the notion of block transforms. These are necessary preliminaries to the real subject of this chapter, which is to describe lapped transforms, the purpose of which is to reduce or even eliminate the artefacts introduced by block transforms. The basis functions now extend over more than one block.

The next chapter is a short monograph by Isabelle Bloch on fuzzy spatial distances. Fuzzy sets are being found useful in a host of different areas and this chapter, in which the basic notions and the reasons why they are of practical interest are set out very readably, enables the reader unfamiliar with the subject to master it easily.

Mathematical morphology plays an important role in this work, which leads us naturally to the third contribution, again a short monograph, in which Allan Hanbury discusses the application of this technique to circular data. Such data are represented by angles or by directional information in two dimensions. They arise in many practical situations: wind directions, the orientations of fracture planes in rocks, and the hue component of color representations in three-dimensional polar coordinates are among those cited by the author. More generally, the phase component of complex signals or of complex quantities arising from Fourier transforms are all examples of circular data. This thorough account of a somewhat neglected but very important aspect of image processing will, I am certain, be very much appreciated.

The fourth chapter, by G. Mauro D’Ariano, Matteo Paris and Massimiliano Sacchi, brings us to the newest generation of electronic and optical devices.
This magisterial account of quantum tomography explains how the quantum state of a system can be estimated by a tomographic technique and presents in full detail all the stages of the reasoning and some practical examples.

In the fifth contribution, we return to electron microscopy, this time to the use of the scanning electron microscope at very low energy, typically below 5 keV. For this, the instrument must be redesigned and the image interpretation must be reconsidered carefully. Ilona Müllerová and Luděk Frank examine the instrumental aspect of low-energy SEM in
considerable detail before showing how useful the technique can be in practice. Many areas of image restoration, and indeed of signal processing in general, are bedevilled by the fact that the equations describing the restoration process are ill-posed, which means that there may be no solution compatible with the measurements, or many solutions may correspond to them or again the solution may be highly sensitive to small changes in the data. In order to stabilize the methods, some form of regularization is required, and this is the central theme of the chapter by Otmar Scherzer. In the course of his account, many related questions are examined and, once again, his chapter has the status of a short monograph on this important subject. In conclusion, I thank most warmly all the contributors for taking so much trouble to make their chapters accessible to non-specialists, and on the following pages I list articles promised for future volumes. Peter Hawkes
FUTURE CONTRIBUTIONS

S. van Aert, A. den Dekker, A. van den Bos and D. van Dyck (vol. 130)
Statistical experimental design for quantitative atomic-resolution transmission electron microscopy

G. Abbate
New developments in liquid-crystal-based photonic devices

S. Ando
Gradient operators and edge and corner detection

C. Beeli
Structure and microscopy of quasicrystals

G. Borgefors
Distance transforms

B. L. Breton, D. McMullan and K. C. A. Smith (Eds)
Sir Charles Oatley and the scanning electron microscope

A. Bretto (vol. 130)
Hypergraphs and their use in image modelling

H. Delingette
Surface reconstruction based on simplex meshes

R. G. Forbes
Liquid metal ion sources

E. Förster and F. N. Chukhovsky
X-ray optics

A. Fox
The critical-voltage effect

L. Godo and V. Torra
Aggregation operators

A. Gölzhäuser
Recent advances in electron holography with point sources

A. M. Grigoryan and S. S. Agaian (vol. 130)
Transform-based image enhancement algorithms with performance measure

H. F. Harmuth and B. Meffert (vol. 129)
Calculus of finite differences in quantum electrodynamics

M. I. Herrera
The development of electron microscopy in Spain

D. Hitz
Recent progress on HF ECR ion sources

J. Hormigo and G. Cristobal (vol. 130)
Texture and the Wigner distribution

K. Ishizuka
Contrast transfer and crystal images

G. Kögel
Positron microscopy

W. Krakow
Sideband imaging

N. Krueger (vol. 130)
The application of statistical and deterministic regularities in biological and artificial vision systems

B. Lahme
Karhunen–Loève decomposition

B. Lencová
Modern developments in electron optical calculations

M. A. O’Keefe
Electron image simulation

N. Papamarkos and A. Kesidis
The inverse Hough transform

K. S. Pedersen, A. Lee and M. Nielsen
The scale-space properties of natural images

M. Petrou (vol. 130)
Image registration

R. Piroddi and M. Petrou (vol. 131)
Dealing with irregularly sampled data

M. Rainforth
Recent developments in the microscopy of ceramics, ferroelectric materials and glass

E. Rau
Energy analysers for electron microscopes

H. Rauch
The wave-particle dualism

J. J. W. M. Rosink and N. van der Vaart (vol. 131)
HEC sources for the CRT

G. Schmahl
X-ray microscopy

S. Shirai
CRT gun design methods

T. Soma
Focus-deflection systems and their applications

J.-L. Starck
The curvelet transform

I. Talmon
Study of complex fluids by transmission electron microscopy

M. Tonouchi
Terahertz radiation imaging

N. M. Towghi
lp norm optimal filters

Y. Uchikawa
Electron gun optics

D. van Dyck
Very high resolution electron microscopy

K. Vaeth and G. Rajeswaran
Organic light-emitting arrays

C. D. Wright and E. W. Hill
Magnetic force microscopy-filtering for pattern recognition using wavelet transforms and neural networks

M. Yeadon
Instrumentation for surface studies
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 128
Fourier, Block, and Lapped Transforms

TIL AACH

Institute for Signal Processing, University of Lübeck, Ratzeburger Allee 160, D-23538 Lübeck, Germany

I. Introduction: Why Transform Signals Anyway? . . . 1
II. Linear System Theory and Fourier Transforms . . . 3
   A. Continuous-Time Signals and Systems . . . 3
   B. Discrete-Time Signals and Systems . . . 6
   C. The Discrete Fourier Transform and Block Transforms . . . 8
III. Transform Coding . . . 13
   A. The Role of Transforms: Constrained Source Coding . . . 13
   B. Transform Efficiency . . . 14
   C. Transform Coding Performance . . . 23
IV. Two-Dimensional Transforms . . . 25
V. Lapped Transforms . . . 28
   A. Block Diagonal Transforms . . . 28
   B. Extension to Lapped Transforms . . . 29
   C. The Lapped Orthogonal Transform . . . 30
   D. The Modulated Lapped Transform . . . 33
   E. Extensions . . . 36
VI. Image Restoration and Enhancement . . . 39
VII. Discussion . . . 41
Acknowledgments . . . 42
Appendix A . . . 42
Appendix B . . . 43
Appendix C . . . 45
Appendix D . . . 47
References . . . 48
I. INTRODUCTION: WHY TRANSFORM SIGNALS ANYWAY?

The Fourier transform and its related discrete transforms are of key importance in both the theory and practice of signal and image processing. In the theory of continuous-time systems and signals, the Fourier transform allows one to describe both signal and system properties and the relation between system input and output signals in the frequency domain (Ziemer et al., 1989; Lüke, 1999). Fourier-optical systems based on the diffraction of coherent light are a direct practical realization of the two-dimensional continuous Fourier transform (Papoulis, 1968; Bamler, 1989). The discrete-time Fourier transform (DTFT) describes properties of discrete-time signals and systems. While the DTFT assigns frequency-continuous and periodic
spectra to discrete-time signals, the discrete Fourier transform (DFT) represents a discrete-time signal of finite length by a finite number of discrete-frequency coefficients (Oppenheim and Schafer, 1998; Proakis and Manolakis, 1996; Lüke, 1999). The DFT thus permits one to compute spectral representations numerically. The DFT and other discrete transforms related to it, like the discrete cosine transform (DCT), are also of great practical importance for the implementation of signal and image processing systems, since efficient algorithms for their computation exist, e.g., in the form of the fast Fourier transform (FFT).

However, while continuous-time Fourier analysis generally considers the entire time axis from minus infinity to plus infinity, the DFT is only defined for signals of finite duration. Conceptually, the finite-duration signals are formed by taking single periods from originally periodic signals. Consequently, enhancement and transform coding of, for instance, speech are based on the spectral analysis of short time intervals of the speech waveform (Lim and Oppenheim, 1979; Ephraim and Malah, 1984; van Compernolle, 1992; Cappé, 1994; Aach and Kunz, 1998). The length of the time intervals depends on the nature of the signals, viz. short-time stationarity. Similarly, transform coding (Clarke, 1985) or frequency-domain enhancement (Lim, 1980; Aach and Kunz, 1996a,b, 2000) of images requires spectral analysis of rectangular blocks of finite extent in order to take into account short-space stationarity. Such processing by block transforms often generates audible or visible artifacts at block boundaries. While in some applications these artifacts may be mitigated using overlapping blocks (Lim and Oppenheim, 1979; Lim, 1980; Ephraim and Malah, 1984; Cappé, 1994; van Compernolle, 1992; Aach and Kunz, 1996a,b, 1998; Aach, 2000), this is not practical in applications like transform coding, where overlapping blocks would inflate the data volume.
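Blockwise transform processing of this kind can be sketched in a few lines. The example below is an illustration added here, not the JPEG algorithm itself: it uses SciPy's orthonormal DCT-II on one-dimensional blocks of 8 samples and simply discards all but a few coefficients per block, a crude stand-in for quantization.

```python
import numpy as np
from scipy.fft import dct, idct

def block_dct_threshold(signal, block=8, keep=3):
    """Transform each length-`block` segment with an orthonormal DCT-II,
    keep only the `keep` largest-magnitude coefficients per block,
    and reconstruct the segment by the inverse DCT."""
    out = np.empty_like(signal, dtype=float)
    for start in range(0, len(signal), block):
        c = dct(signal[start:start + block], norm='ortho')
        small = np.argsort(np.abs(c))[:-keep]   # indices of discarded coefficients
        c[small] = 0.0
        out[start:start + block] = idct(c, norm='ortho')
    return out

n = np.arange(64)
s = np.cos(2 * np.pi * 2 * n / 64)              # smooth test signal
rec = block_dct_threshold(s, block=8, keep=3)   # lossy blockwise reconstruction
# With all coefficients kept, the orthonormal DCT pair is lossless:
lossless = block_dct_threshold(s, block=8, keep=8)
```

Because each block is processed independently, reconstruction errors do not match up across block boundaries — the one-dimensional analog of the artifacts visible in Figure 1.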
Transform coders therefore punch out adjacent blocks from the incoming continuous data stream, and encode these individually. To illustrate the block artifacts, Figure 1 shows an image reconstructed after encoding by the JPEG algorithm, which uses a blockwise DCT (Rabbani and Jones, 1991). Lapped transforms aim at reducing or even eliminating block artifacts by the use of overlapping basis functions, which extend over more than one block. The purpose of this chapter is to provide a self-contained introduction to lapped transforms. Our approach is to develop lapped transforms from standard block transforms as a starting point. To introduce the topic of signal transforms, we first summarize the development from the Fourier transform of continuous-time signals to the DFT. An in-depth treatment can be found in many texts on digital signal processing and system theory (e.g., Ziemer et al., 1989; Oppenheim and Schafer, 1998; Lu¨ke, 1999). In Section III, we discuss the relevance of orthogonal block transforms for
FIGURE 1. Left: Portion of size 361 × 390 pixels of the “Marcel” image, 8 bits per pixel. Right: Reconstruction after JPEG compression at about 0.2 bits per pixel.
transform coding, which depends on the covariance structure of the signals. Section IV deals with two-dimensional block transforms. Orthogonal block transforms map a given number of signal samples contained in each block into an identical number of transform coefficients. Each signal block can hence be perfectly reconstructed from its transform coefficients by an inverse transform. In contrast to block transforms, the basis functions of lapped transforms discussed in Section V extend into neighboring blocks. The number of transform coefficients generated is then lower than the number of signal samples covered by the basis functions. Signal blocks can therefore not be perfectly reconstructed from their individual transform coefficients. However, if the transform meets a set of extended orthogonality conditions, the original signal is perfectly reconstructed by superimposing the overlapping, imperfectly reconstructed signal blocks. Two types of lapped transforms will be considered, the lapped orthogonal transform (LOT) and the modulated lapped transform (MLT). We then discuss extensions of these transforms before concluding with some examples comparing the use of block and lapped transforms in image restoration and enhancement.

II. LINEAR SYSTEM THEORY AND FOURIER TRANSFORMS

A. Continuous-Time Signals and Systems

Let s(t) denote a real signal, with t being the independent continuous-time variable. Our aim is to describe the transmission of signals through one or
more systems, where a system is regarded as a black box which maps an input signal s(t) into the output signal g(t) by a mapping M, i.e., g(t) = M(s(t)). Restricting ourselves here to the class of linear time-invariant (LTI) systems, we require the systems to comply with the following conditions.

(i) Linearity: A linear system reacts to any weighted combination of K input signals $s_i(t)$, $i = 1, \ldots, K$, with the same weighted combination of output signals $g_i(t) = M(s_i(t))$:

$$M\left(\sum_{i=1}^{K} a_i s_i(t)\right) = \sum_{i=1}^{K} a_i M(s_i(t)) = \sum_{i=1}^{K} a_i g_i(t), \qquad (1)$$
where $a_i$, $i = 1, \ldots, K$ denote the weighting factors.

(ii) Time invariance: A time-invariant system reacts to an arbitrary delay of the input signal with a correspondingly delayed, but otherwise unchanged output signal:

$$M(s(t)) = g(t) \;\Rightarrow\; M(s(t-\tau)) = g(t-\tau), \qquad (2)$$
where $\tau$ is the delay.

An LTI system is completely characterized by the response to the Dirac delta impulse $\delta(t)$. Denoting the so-called impulse response by h(t), we have $h(t) = M(\delta(t))$. The Dirac impulse $\delta(t)$ is a distribution defined by the integral equation

$$s(t) = \int_{-\infty}^{\infty} s(\tau)\,\delta(t-\tau)\, d\tau, \qquad (3)$$
which essentially represents a signal s(t) by an infinite series of Dirac impulses delayed by $\tau$ and weighted by $s(\tau)$. Since an LTI system reacts to the signal s(t) by the same weighted combination of delayed impulse responses h(t), it suffices to replace $\delta(t)$ in Equation (3) by h(t) to obtain the output g(t):

$$g(t) = \int_{-\infty}^{\infty} s(\tau)\, h(t-\tau)\, d\tau. \qquad (4)$$
This relationship is known as convolution, abbreviated g(t) = s(t) * h(t). Since the convolution is commutative, we may interchange input signal and impulse response, and equally write g(t) = h(t) * s(t). Let us now consider the system reaction to the complex exponential $s_{\mathrm{eig}}(t)$ of frequency f (or radian frequency $\omega = 2\pi f$) given by

$$s_{\mathrm{eig}}(t) = e^{\,j2\pi f t}, \qquad (5)$$
where $j = \sqrt{-1}$. From $g(t) = h(t) * s_{\mathrm{eig}}(t)$, we obtain

$$g(t) = e^{\,j2\pi f t} \int_{-\infty}^{\infty} h(\tau)\, e^{-j2\pi f \tau}\, d\tau = s_{\mathrm{eig}}(t) \cdot H(f), \qquad (6)$$

where¹

$$H(f) = \int_{-\infty}^{\infty} h(t)\, e^{-j2\pi f t}\, dt. \qquad (7)$$
Hence, the input signal is only weighted by the generally complex weighting factor H(f) but otherwise reproduced unchanged; it is therefore called an eigenfunction of LTI systems. The relationship between h(t) and H(f) is the Fourier transform, denoted by $h(t) \leftrightarrow H(f)$. If known for all frequencies, H(f) is called the spectrum of the signal h(t), or the transfer function of the LTI system. Equation (7) is essentially an inner product, or correlation, between h(t) and the complex exponential of frequency f. The signal h(t) can be recovered from its spectrum H(f) by the inverse Fourier transform

$$h(t) = \int_{-\infty}^{\infty} H(f)\, e^{\,j2\pi f t}\, df, \qquad (8)$$
which is a weighted superposition of complex exponentials. (This integral reconstructs discontinuities of h(t) by the average of the left and right limits.) Evidently, an LTI system can also be fully described by its transfer function H(f). When applied to a signal s(t), the Fourier transform S(f) is called the spectrum of s(t). It specifies the weights and phases of the complex exponentials contributing to s(t) in the inverse Fourier transform according to

$$s(t) \leftrightarrow S(f) \;\Rightarrow\; s(t) = \int_{-\infty}^{\infty} S(f)\, e^{\,j2\pi f t}\, df. \qquad (9)$$
The Fourier transform allows one to describe the transfer of a signal s(t) over an LTI system in the frequency domain. According to Equation (6), the LTI system reacts to $e^{\,j2\pi f t}$ with $H(f)\, e^{\,j2\pi f t}$. Equation (9) represents the system input s(t) as a weighted superposition of complex exponentials. Because of linearity, the output signal g(t) is given by the identical weighted superposition of system reactions $H(f)\, e^{\,j2\pi f t}$:

$$g(t) = \int_{-\infty}^{\infty} S(f)\, H(f)\, e^{\,j2\pi f t}\, df. \qquad (10)$$
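The transform pair of Equations (7) and (8) can be checked numerically. The sketch below is an added illustration (the Gaussian test function and the quadrature grid are choices of convenience): under the f-convention used here, the Gaussian $h(t) = e^{-\pi t^2}$ is its own Fourier transform.

```python
import numpy as np

# Dense time grid; the Gaussian is negligible beyond |t| = 6.
t = np.linspace(-6, 6, 4001)
h = np.exp(-np.pi * t**2)

def fourier(f):
    """Trapezoidal approximation of H(f) = integral h(t) exp(-j 2 pi f t) dt."""
    return np.trapz(h * np.exp(-2j * np.pi * f * t), t)

f_grid = np.linspace(-2, 2, 9)
H_vals = np.array([fourier(f) for f in f_grid])
expected = np.exp(-np.pi * f_grid**2)   # the Gaussian reproduces itself
```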
¹ In the following, we assume the Fourier integrals to exist. For h(t) piecewise continuous, a sufficient condition is $\int_{-\infty}^{\infty} |h(\tau)|\, d\tau < \infty$.
Denoting the spectrum of g(t) by G(f), the inverse Fourier transform yields

$$g(t) \leftrightarrow G(f) \;\Rightarrow\; g(t) = \int_{-\infty}^{\infty} G(f)\, e^{\,j2\pi f t}\, df. \qquad (11)$$
i¼1
i¼1
for arbitrary input signals si(n) and weighting factors ai, i ¼ 1, . . . , K. (ii) Time invariance: MðsðnÞÞ ¼ gðnÞ ) Mðsðn mÞÞ ¼ gðn mÞ,
ð13Þ
where m is an integer delay. In the discrete-time case, the Dirac delta impulse is replaced by the unit impulse (n) which is defined by 1 for n ¼ 0 ðnÞ ¼ : ð14Þ 0 otherwise
FOURIER, BLOCK, AND LAPPED TRANSFORMS
7
A discrete-time signal s(n) can then be composed of a sum of weighted and shifted unit impulses according to 1 X
sðnÞ ¼
sðmÞðn mÞ:
ð15Þ
m¼1
To determine the system response g(n), it then suffices to know its impulse response h(n) ¼ M((n)). Because of linearity and time invariance, the output signal is given by the following superposition of weighted and shifted impulse responses: 1 X
gðnÞ ¼
sðmÞhðn mÞ:
ð16Þ
m¼1
This operation is called the discrete-time convolution, and is denoted by g(n) ¼ s(n) * h(n). Like its continuous-time counterpart, the discrete-time convolution is commutative. The eigenfunctions of discrete-time LTI systems are discrete-time complex exponentials given by seig ðnÞ ¼ e j2pfn :
ð17Þ
Note that the frequency variable f is still continuous. Passing seig(n) through our LTI system yields the output signal gðnÞ ¼ e j2pfn
1 X
hðmÞej2pfm ¼ seig ðnÞ HDT ð f Þ,
ð18Þ
m¼1
where HDT ð f Þ ¼
1 X
hðnÞej2pfn
ð19Þ
n¼1
is the DTFT of h(n), which can be regarded as the transfer function of the LTI system, or the spectrum of the signal h(n). We denote this relation by h(n)HDT( f ). Clearly, the spectrum of a discrete-time signal is periodic over f. Indeed, s(n) can be regarded as the Fourier series representation of HDT( f ). To reconstruct h(n) from its spectrum, it therefore suffices to consider a single period of HDT( f ): Z 1=2 hðnÞ HDT ð f Þ ) hðnÞ ¼ HDT ð f Þe j2pfn df : ð20Þ 1=2
8
TIL AACH
As in the continuous-time case, it is straightforward to show that the spectrum of an output signal of an LTI system is the product of the spectrum of the input signal and the transfer function of the LTI system: gðnÞ ¼ sðnÞ * hðnÞ GDT ð f Þ ¼ SDT ð f Þ HDT ð f Þ:
ð21Þ
While the discrete-time convolution in Equation (16) can be implemented on digital signal processing (DSP) systems, the spectral-domain relations are of less practical value, since they depend on a continuous frequency variable. C. The Discrete Fourier Transform and Block Transforms Let us now consider a finite-duration signal s(n), n ¼ 0, . . . , N1 comprising N samples. Seeking a spectral-domain representation for s(n) by N frequency coefficients SDFT(k), k ¼ 0, . . . , N1, we start from its DTFT SDT( f ), which is a sum over N components. SDT( f ) is periodic with period 1, and therefore fully specified by one period, for instance 0 f<1. Seeking to represent the N-sample sequence s(n) by N discrete frequency coefficients, we take N equally spaced samples from one period of SDT( f ), 0 f<1, thus obtaining the DFT of s(n) by N1 X k ¼ sðnÞejð2p=NÞkn , k ¼ 0, . . . , N 1: SDFT ðkÞ ¼ SDT N n¼0
ð22Þ
The finite-duration signal s(n) can be recovered from its DFT SDFT(k) by the inverse DFT (see Appendix A) sðnÞ SDFT ðkÞ ) sðnÞ ¼
1 X 1N SDFT ðkÞe jð2p=NÞkn , n ¼ 0, . . . , N 1: ð23Þ N k¼0
The DFT hence represents a finite-duration discrete-time signal s(n) of N coefficients by N discrete spectral coefficients SDFT(k), and is therefore perfectly suited for numerical implementation. In general, the frequency coefficients are complex, offering 2N degrees of freedom. However, for real * ðN kÞ, which s(n), the DFT obeys the symmetry condition SDFT ðkÞ ¼ SDFT reduces the degrees of freedom to N. Since the DFT applies to finite-length signals, samples of signals of long duration must be collected into successive segments or blocks of finite length, which are then subjected to the DFT. Transforms like the DFT are therefore termed block transforms. The block length is limited by practical considerations, like available memory and performance of the digital signal
FOURIER, BLOCK, AND LAPPED TRANSFORMS
9
processing system. More important, however, is the influence of statistical signal properties on the block length: the notion of power spectrum, for instance, is meaningful only for (wide sense) stationary random signals. Real data, like speech or images, are stationary only over short time intervals and blocks of rather small extent, respectively. Applications of spectral analysis, like power spectrum estimation by block transforms or block transform coding, therefore only make sense when applied to reasonably short and small segments. In the JPEG still image compression standard, images are processed in blocks of 8 8 pixels. Speech can be considered stationary for intervals of the order of 10–50 ms. When sampled at 8 kHz, this translates into blocks with 64–256 samples. Linear block transforms are conveniently expressed as matrix operations. Grouping the signal samples s(n), n ¼ 0, . . . , N1 into a column vector s ¼ [s(0), s(1), . . . , s(N1)]T, and the frequency coefficients into a vector S ¼ [SDFT(0), SDFT(1), . . . , SDFT(N1)]T, Equation (22) can be written as S ¼ W s,
$$\text{with } \mathbf{W} = [W_{kn}], \tag{24}$$

where W is the square N × N transform matrix with entry

$$W_{kn} = e^{-j(2\pi/N)kn} \tag{25}$$

in the (k+1)th row and (n+1)th column. The inverse transform in Equation (23) can be expressed by

$$\mathbf{s} = \frac{1}{N}(\mathbf{W}^*)^T \mathbf{S}. \tag{26}$$

Thus,

$$(\mathbf{W}^*)^T \mathbf{W} = N\,\mathbf{I}, \tag{27}$$
where I is the identity matrix. The DFT transform matrix is hence unitary up to a factor N, or, in other words, the DFT basis functions are orthogonal. We have derived the DFT by sampling the first period of the DTFT of a signal of length N with a sampling period 1/N. What are the consequences of this frequency-domain sampling operation? Comparing the Fourier transform S( f ) of a continuous-time signal s(t) according to Equation (7) to the DTFT in Equation (19), we see that replacing a continuous-time signal by discrete equally spaced samples with sampling period one leads to a periodic spectrum with period one in the DTFT of Equation (19). Also, the Fourier transform and its inverse in Equation (8) are almost identical in structure.
10
TIL AACH
Apart from a sign change in the exponent, the signal s(t) and its spectrum S(f) are simply interchanged, as are the time and frequency variables t and f. Therefore, just as time-domain sampling leads to a periodic spectrum, frequency-domain sampling of the periodic spectrum leads to a periodic discrete signal s_p(n), obtained by periodically repeating s(n) with a period that is the inverse of the sampling period in the frequency domain. Since the frequency-domain sampling period in Equation (22) is 1/N, the periodic signal s_p(n) is given by

$$s_p(n) = \sum_{r=-\infty}^{\infty} s(n + rN). \tag{28}$$
Hence, the DFT represents one period of a periodic discrete-time signal by one period of its discrete-frequency periodic spectrum. Both the "actually" transformed signal and its spectrum are therefore periodic. This implicit, underlying periodicity must not be overlooked when applying the DFT. One consequence is the occurrence of spurious high-frequency artifacts in the DFT spectrum, which are generated when the block-end signal coefficients s(0) and s(N−1) differ strongly. The periodically repeated signal s_p(n) then exhibits abrupt transitions, which "leak" spectral energy into high-frequency spectral coefficients. To illustrate this effect, Figure 2 shows the signal s(n) = cos(4πn/64), n = 0, ..., 63, and its DFT spectrum. Since two periods of the cosine fit perfectly into the analysis interval, periodic repetition of s(n) generates a smooth signal. As expected, the DFT spectrum exhibits two "clean" peaks at k = 2 and k = 62. This is vastly different in Figure 3: the frequency of the cosine is slightly increased such that 2.5 periods now fit into the data interval, with s(n) = cos(5πn/64), n = 0, ..., 63. Periodic repetition generates transitions from almost +1 to −1 between block-end samples. The effect of these transitions is evident in the DFT spectrum, which is now spread over all frequency coefficients. This example also illustrates one important application of block transforms: as Figure 2 shows, a block transform may concentrate the signal energy into only a small number of spectral coefficients. This property is essential for data compression by transform coding. From Figure 3, however, it becomes clear that the DFT is probably not the optimal transform for this purpose, due to problems caused by discontinuities at the block ends. In the next section we examine transform coding in more detail. We will see that, although better transforms than the DFT exist for transform coders, artifacts caused by block boundaries persist.
This is the main motivation for the development and use of lapped transforms.
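The block-DFT behavior described above can be reproduced numerically. The sketch below (a NumPy illustration; the signals are those of Figures 2 and 3, while the numerical tolerance 1e-8 is an assumption made for the demonstration) builds the transform matrix W of Equation (25), checks the orthogonality relation (27), and contrasts the clean and leaky spectra:

```python
import numpy as np

N = 64
n = np.arange(N)
# W[k, n] = e^{-j(2*pi/N)*k*n}, the DFT matrix of Eq. (25).
W = np.exp(-2j * np.pi * np.outer(n, n) / N)

# Eq. (27): (W*)^T W = N I, i.e., the DFT basis functions are orthogonal.
assert np.allclose(W.conj().T @ W, N * np.eye(N))

# Two full periods fit the block: two clean peaks at k = 2 and k = 62.
s_smooth = np.cos(4 * np.pi * n / 64)
S_smooth = W @ s_smooth
peaks = np.flatnonzero(np.abs(S_smooth) > 1e-8)
print(peaks)  # -> [ 2 62]

# 2.5 periods: the implicit periodic repetition is discontinuous,
# and spectral energy leaks into many coefficients.
s_leaky = np.cos(5 * np.pi * n / 64)
S_leaky = W @ s_leaky
print(np.count_nonzero(np.abs(S_leaky) > 1e-8))  # far more than 2
```

The contrast between the two nonzero-coefficient counts is exactly the energy-concentration property exploited by transform coding.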
FIGURE 2. Top: Source signal s(n) = cos(4πn/64) for n = 0, ..., 63. Bottom: Modulus DFT spectrum |S_DFT(k)| of s(n) for k = 0, ..., 63. Since periodic repetition of s(n) does not create discontinuities, the DFT spectrum exhibits the expected two peaks.
Various fast and highly efficient algorithms are available for the computation of the DFT and its inverse (‘‘fast Fourier transform,’’ FFT). These are widely used in applications like power spectrum estimation, fast convolution, adaptive filtering, noise reduction, and signal enhancement, as
FIGURE 3. Top: Source signal s(n) = cos(5πn/64) for n = 0, ..., 63. Bottom: Modulus DFT spectrum |S_DFT(k)| of s(n) for k = 0, ..., 63. Periodic repetition of s(n) results in strong discontinuities between the periods, causing the spreading of signal energy over all frequency coefficients.
well as in many others. Some of these applications require the use of overlapping segments, others the use of segments that are subjected to a smooth window function such that discontinuities at block ends are reduced or eliminated (Oppenheim and Schafer, 1998; Ziemer et al., 1989, Chap. 11).
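The effect of such a smooth window can be checked numerically. The sketch below (an assumption-laden illustration: the Hann window, the out-of-band interval, and the thresholds are hypothetical choices, not taken from the text) windows the leaky cosine of Figure 3 and compares the fraction of energy far from the cosine frequency:

```python
import numpy as np

N = 64
n = np.arange(N)
s = np.cos(5 * np.pi * n / 64)              # 2.5 periods: leaks badly
w = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)   # Hann window (one smooth choice)

S_rect = np.abs(np.fft.fft(s))              # rectangular (no) window
S_hann = np.abs(np.fft.fft(w * s))          # Hann-windowed block

def out_of_band(S, lo=6, hi=59):
    # Energy fraction away from the cosine frequency (k ~ 2.5 and its mirror).
    return np.sum(S[lo:hi] ** 2) / np.sum(S ** 2)

# Windowing suppresses the slowly decaying leakage sidelobes.
print(out_of_band(S_rect) > out_of_band(S_hann))
```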
III. TRANSFORM CODING

A. The Role of Transforms: Constrained Source Coding

The aim of source coding or data compression is to represent discrete signals s(n) with only a small expected number of bits per sample (the so-called bit rate), either with no distortion (lossless compression) or with as low a distortion as possible for a given rate (lossy compression). Since we try to optimize the trade-off between distortion and rate on average, we regard the signals as random and describe them by their statistical properties. The essential step in source coding is quantization (Goyal, 2001, p. 12). A straightforward approach is so-called pulse code modulation (PCM), where each sample is quantized individually at a fixed number of bits, e.g., eight bits for gray-level images. Most signals representing meaningful information, however, exhibit strong statistical dependencies between signal samples. In images, for instance, the gray levels of neighboring pixels tend to be similar. To take such dependencies into account, possibly large sets of adjacent samples should be quantized together. Unfortunately, this unconstrained approach leads to practical problems even for relatively small groups of samples (Goyal, 2001). In transform coding, the signals or images are first decomposed into adjacent blocks or vectors of N input samples each. Each block is then individually transformed such that the statistical dependencies between the samples are reduced, or even eliminated (Clarke, 1985; Zelinski and Noll, 1977; Goyal, 2001). Also, the signal energy, which generally is evenly distributed over all signal samples s(n), should be repacked into only a few transform coefficients. The transform coefficients S(k) can then be quantized individually (scalar quantization). Each quantizer output consists of an index i(k) of the quantization interval into which the corresponding transform coefficient falls. These indices are then coded, e.g., by a fixed-length code or an entropy code.
The decoder first reconverts the incoming bitstream into the quantization indices, and then replaces the quantization index i(k) for each transform coefficient S(k) by the centroid V(i(k)) of the indexed quantization interval, which serves as an approximation, or better, an estimate Ŝ(k) = V(i(k)) of S(k). The relation between the indices i(k) and the centroids V(i(k)) is stored in a look-up table called a codebook. An inverse transform then calculates the reconstructed signal ŝ(n). The principle of a transform coder and decoder (codec) is shown in Figure 4. Clearly, due to quantization, the compression technique is lossy. The distortion caused by uniform scalar quantization is discussed in Appendix B. Optimizing a transform codec requires choosing an
FIGURE 4. Block diagram of a transform coder and decoder. The signal vector s is first transformed into the transform coefficient vector S = As. The transform coefficients are quantized. The quantization indices i(k) are encoded into codewords and multiplexed into the bitstream, which is transmitted over the channel. The decoder first demultiplexes the bitstream into the codewords, which are then reconverted into the quantization indices i(k). The decoded quantization indices are used to access the codebooks, yielding the quantized transform coefficient values Ŝ(k) = V(i(k)). These are subjected to an inverse transform to obtain the reconstructed signal vector ŝ = A⁻¹Ŝ.
optimal transform and optimal scalar quantization of the transform coefficients. Since the optimization is thus constrained by the architecture outlined in Figure 4, we speak of constrained source coding. Practical transform codecs employ linear unitary or orthogonal transforms. Linear transforms explicitly influence linear statistical dependencies, that is, correlations. In the next section we therefore first discuss unitary transforms subject to the criteria of decorrelation and energy concentration. We then show that the optimal transform with respect to these criteria is also optimal with respect to the reconstruction errors incurred at given rates.
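The codec architecture of Figure 4 can be sketched in a few lines. The toy implementation below is an assumption-laden illustration (the orthogonal DCT as the transform, a mid-rise uniform quantizer, and the step size 0.25 are all hypothetical choices); it also checks that, for an orthogonal transform, the quantization error has the same energy in the transform and signal domains:

```python
import numpy as np

def dct_matrix(N):
    # Unitary DCT-II with basis vectors as rows (a standard choice of A).
    n = np.arange(N)
    A = np.sqrt(2.0 / N) * np.cos((2 * n + 1)[None, :] * n[:, None] * np.pi / (2 * N))
    A[0, :] = np.sqrt(1.0 / N)
    return A

N, step = 8, 0.25                   # block length and quantizer step (assumed)
A = dct_matrix(N)
rng = np.random.default_rng(0)
s = rng.standard_normal(N)          # one signal block

S = A @ s                           # transform
S_hat = step * np.round(S / step)   # uniform scalar quantization of each S(k)
s_hat = A.T @ S_hat                 # inverse transform (A orthogonal: A^-1 = A^T)

# Orthogonality preserves the squared error of the quantization.
assert np.isclose(np.sum((S - S_hat) ** 2), np.sum((s - s_hat) ** 2))
```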
B. Transform Efficiency

Modeling the signal s(n) as wide-sense stationary over n = 0, ..., N−1, the mean value is constant for all samples. Without loss of generality, we assume that the mean is zero, if necessary by first subtracting a potential nonzero mean from the data. The autocovariance function (ACF) is then given by c_s(n) = E(s(m)s(m+n)), where E denotes expectation, and the (constant) variance σ² of s(n) is σ² = c_s(0). The ACF can be normalized by c_s(n) = σ² ρ_s(n), with ρ_s(0) = 1 and |ρ_s(n)| ≤ 1. Alternatively, covariances can be expressed by the covariance matrix C_s (Fukunaga, 1972; Therrien, 1989), which is an N × N matrix defined by

$$\mathbf{C}_s = E[\mathbf{s}\mathbf{s}^T] = \sigma^2 \begin{bmatrix} 1 & \rho_s(1) & \rho_s(2) & \cdots & \rho_s(N-1) \\ \rho_s(1) & 1 & \rho_s(1) & \cdots & \rho_s(N-2) \\ \vdots & & \ddots & & \vdots \\ \rho_s(N-1) & \rho_s(N-2) & \rho_s(N-3) & \cdots & 1 \end{bmatrix}. \tag{29}$$
The entry in the (n+1)th row and (k+1)th column of C_s is thus given by c_s(|n−k|). The covariance matrix of a wide-sense stationary signal vector is evidently a positive semidefinite and symmetric Toeplitz matrix (Therrien, 1989; Makhoul, 1981; Akansu and Haddad, 2001); indeed, C_s is symmetric about both main diagonals (persymmetric) (Unser, 1984). We transform the signal vector s into the coefficient vector S = A s by a linear, unitary transform. The transform is described by an N × N matrix A, with A⁻¹ = A^H, where the superscript H denotes conjugate transpose (cf. Equation (27)). For instance, A could be a unitary DFT defined by A = (1/√N) W, with W given by Equation (25). A unitary transform preserves Euclidean lengths:

$$\|\mathbf{S}\|_2^2 = \mathbf{S}^H\mathbf{S} = \mathbf{s}^T\mathbf{A}^H\mathbf{A}\,\mathbf{s} = \mathbf{s}^T\mathbf{I}\,\mathbf{s} = \|\mathbf{s}\|_2^2, \tag{30}$$

where s^H = s^T, since s is real. The covariance matrix C_S of the transform coefficients can then be derived as

$$\mathbf{C}_S = E[\mathbf{S}\mathbf{S}^H] = \mathbf{A}\,E[\mathbf{s}\mathbf{s}^T]\,\mathbf{A}^H = \mathbf{A}\mathbf{C}_s\mathbf{A}^H, \tag{31}$$

and also det(C_S) = det(C_s). Furthermore, the sums of the variances of the signal and transform coefficients are identical:

$$N\sigma^2 = \operatorname{tr}(\mathbf{C}_s) = \operatorname{tr}(\mathbf{C}_S) = \sum_{k=0}^{N-1} \sigma_S^2(k), \tag{32}$$
where tr(C) is the trace of matrix C. In general, the nondiagonal entries of C_s differ more or less strongly from zero, reflecting correlations between the signal samples s(n). We now seek a unitary transform matrix A which decorrelates the input data as much as possible. Hence, we seek a transform matrix such that the covariance matrix C_S of the transform coefficients is diagonal or nearly
diagonal (Fukunaga, 1972; Therrien, 1989; Clarke, 1985; Goyal, 2001). At the same time, we seek to concentrate the signal energy optimally into only a few dominant transform coefficients. The decorrelation efficiency η_d can be measured by comparing the sums of the absolute nondiagonal matrix entries before and after transformation (Akansu and Haddad, 2001, p. 33):

$$\eta_d = 1 - \frac{\sum_{k,l,\,k\neq l} \left| [\mathbf{C}_S]_{kl} \right|}{\sum_{m,n,\,m\neq n} \left| [\mathbf{C}_s]_{mn} \right|}. \tag{33}$$

Energy concentration can be evaluated by the relative energy contribution of the N−L transform coefficients with the smallest variances. Assuming the coefficients are ordered such that σ_S²(0) ≥ σ_S²(1) ≥ ... ≥ σ_S²(N−1), such a measure is

$$D_{BR}(L) = \frac{\sum_{k=L}^{N-1} \sigma_S^2(k)}{\operatorname{tr}(\mathbf{C}_S)} = \frac{\sum_{k=L}^{N-1} \sigma_S^2(k)}{\sum_{k=0}^{N-1} \sigma_S^2(k)}, \tag{34}$$

which is sometimes referred to as the basis restriction error (Jain, 1979; Unser, 1984; Akansu and Haddad, 2001). Denoting the rows of A by N-component row vectors a_k^T, k = 0, ..., N−1, we obtain the variances σ_S²(k) by evaluating Equation (31) for the entries along the main diagonal:

$$\sigma_S^2(k) = \mathbf{a}_k^T \mathbf{C}_s \mathbf{a}_k^*. \tag{35}$$
Minimizing the basis restriction error subject to the orthonormality constraint a_k^T a_l^* = δ(k−l) is equivalent to minimizing the functional

$$J = \sum_{k=L}^{N-1} \left[ \mathbf{a}_k^T \mathbf{C}_s \mathbf{a}_k^* - \lambda_k \left( \mathbf{a}_k^T \mathbf{a}_k^* - 1 \right) \right], \tag{36}$$

with Lagrange multipliers λ_k, and where we have taken into account that the denominator in Equation (34) is invariant under a unitary transform. It can straightforwardly be shown that J is minimized by the normalized eigenvectors u_k, k = 0, ..., N−1, of the data covariance matrix C_s (Therrien, 1992, pp. 50, 694; Therrien, 1989; Akansu and Haddad, 2001). The eigenvectors fulfill

$$\mathbf{C}_s \mathbf{u}_k = \lambda_k \mathbf{u}_k. \tag{37}$$
Since C_s is symmetric and positive semidefinite, its eigenvalues λ_k are real and nonnegative. Its eigenvectors are orthogonal, and, since the eigenvalues
are real, the eigenvectors can always be found such that their elements are real (C_s also has complex eigenvectors, obtained by multiplying the real eigenvectors by a nonzero complex factor). The unitary transform matrix A is given by

$$\mathbf{A} = \begin{bmatrix} \mathbf{u}_0^T \\ \mathbf{u}_1^T \\ \vdots \\ \mathbf{u}_{N-1}^T \end{bmatrix}. \tag{38}$$

This transform is called the Karhunen–Loève transform (KLT) (Fukunaga, 1972; Therrien, 1989). The variances of the transform coefficients are given by the eigenvalues λ_k, since from Equation (35)

$$\sigma_S^2(k) = \mathbf{u}_k^T \mathbf{C}_s \mathbf{u}_k = \mathbf{u}_k^T \lambda_k \mathbf{u}_k = \lambda_k, \tag{39}$$

where we have considered only real eigenvectors. Also, since the eigenvectors are orthogonal, we have for the nondiagonal entries of the covariance matrix C_S

$$[\mathbf{C}_S]_{kl} = \mathbf{u}_k^T \mathbf{C}_s \mathbf{u}_l = \mathbf{u}_k^T \lambda_l \mathbf{u}_l = 0 \quad \text{for } k \neq l. \tag{40}$$
Hence, C_S is a diagonal matrix, and the transform coefficients are perfectly decorrelated. We constrain the eigenvectors to be real, and order them in Equation (38) by rank of their eigenvalues. Up to the sign of the eigenvectors, the KLT then is the unique unitary transform which minimizes the basis restriction error and perfectly diagonalizes the covariance matrix, provided the eigenvalues are all distinct. Also, invoking Hadamard's inequality, which states that the determinant of any symmetric, positive semidefinite matrix is less than or equal to the product of its diagonal elements, we obtain an additional measure for energy concentration: the determinant of a covariance matrix is always less than or equal to the product over all variances, i.e.,

$$\det[\mathbf{C}_s] = \det[\mathbf{C}_S] \leq \prod_{k=0}^{N-1} \sigma_S^2(k). \tag{41}$$

If C_S was obtained by the KLT, we have equality:

$$\det[\mathbf{C}_s] = \det[\mathbf{C}_S] = \prod_{k=0}^{N-1} \lambda_k = \prod_{k=0}^{N-1} \sigma_S^2(k). \tag{42}$$
Hence, the KLT minimizes the geometric mean of the variances (Zelinski and Noll, 1977; Goyal, 2001):

$$\sigma_{GM}^2 = \left[ \prod_{k=0}^{N-1} \sigma_S^2(k) \right]^{1/N} = \left[ \prod_{k=0}^{N-1} \lambda_k \right]^{1/N}. \tag{43}$$
As we will see later on, this measure is directly related to the distortion of a transform coder as a function of the rate. Although thus optimal in theory, the KLT has two drawbacks. First, it depends on the covariance structure of the data. Second, there is no general fast algorithm for computation of the KLT. Fortunately, as we will see subsequently, the KLT is in practice well approximated by sinusoidal transforms like the DCT and lapped transforms. Let us first examine how the DFT is related to the KLT. Rewriting the covariance matrix in Equation (29) as

$$\mathbf{C}_s = \begin{bmatrix} c(0) & c(1) & c(2) & \cdots & c(N-1) \\ c(1) & c(0) & c(1) & \cdots & c(N-2) \\ \vdots & & \ddots & & \vdots \\ c(N-1) & c(N-2) & c(N-3) & \cdots & c(0) \end{bmatrix} = \operatorname{toeplitz}[c_0, c_1, \ldots, c_{N-2}, c_{N-1}], \tag{44}$$

we form another symmetric Toeplitz matrix:

$$\mathbf{D}_s = \operatorname{toeplitz}[c_0, c_{N-1}, c_{N-2}, \ldots, c_1] = \begin{bmatrix} c(0) & c(N-1) & c(N-2) & \cdots & c(1) \\ c(N-1) & c(0) & c(N-1) & \cdots & c(2) \\ \vdots & & \ddots & & \vdots \\ c(1) & c(2) & c(3) & \cdots & c(0) \end{bmatrix}. \tag{45}$$
Similar to the decomposition of a signal s(n) into the sum s(n) = s_e(n) + s_o(n) of an even signal s_e(n) = 0.5[s(n) + s(−n)] and an odd signal s_o(n) = 0.5[s(n) − s(−n)], we can decompose the covariance matrix C_s into the sum of a circulant and a skew-circulant matrix (Unser, 1984). The circulant matrix is calculated by

$$\mathbf{E} = \frac{1}{2}[\mathbf{C}_s + \mathbf{D}_s] = \operatorname{toeplitz}[e_0, e_1, \ldots, e_{N-1}], \tag{46}$$
and the skew-circulant by

$$\mathbf{O} = \frac{1}{2}[\mathbf{C}_s - \mathbf{D}_s] = \operatorname{toeplitz}[o_0, o_1, \ldots, o_{N-1}]. \tag{47}$$

Evidently, e_0 = c_0 and o_0 = 0. The entries e_i and o_i, i = 1, ..., N−1, are related to c_i by

$$e_i = \frac{1}{2}[c_i + c_{N-i}] = e_{N-i} \quad \text{and} \quad o_i = \frac{1}{2}[c_i - c_{N-i}] = -o_{N-i}, \tag{48}$$

and the covariance matrix C_s is the sum

$$\mathbf{C}_s = \mathbf{E} + \mathbf{O}. \tag{49}$$
As shown in Unser (1984, Sect. 4), Therrien (1992, Sect. 4.7.2), and Akansu and Haddad (2001, p. 43), the basis functions of the unitary DFT form complex eigenvectors u_k of the circulant matrix E. Denoting the elements of u_k by u_k(n), we thus have

$$u_k(n) = \frac{1}{\sqrt{N}} \exp\left( j\frac{2\pi}{N}kn \right). \tag{50}$$

Similarly, the basis vectors of a related transform called the discrete odd Fourier transform are eigenvectors of O. The eigenvalues of E are then given by the DFT of its first row:

$$\lambda_k^E = \sum_{n=0}^{N-1} e_n \exp\left( -j\frac{2\pi}{N}kn \right), \qquad k = 0, \ldots, N-1. \tag{51}$$

Because of the symmetry e_n = e_{N-n}, n = 1, ..., N−1, this DFT is real and symmetric, that is, λ_k^E = λ_{N-k}^E, k = 1, ..., N−1. Therefore, eigenvectors with real elements can also be found for E, like the real or imaginary parts of Equation (50). The DFT in Equation (51) can be simplified to

$$\lambda_k^E = \sum_{n=0}^{N-1} e_n \cos\left( \frac{2\pi}{N}kn \right). \tag{52}$$
Recalling from Equation (29) that the elements of a covariance matrix are given by the samples of the ACF, and regarding E as a valid covariance
matrix, the eigenvalues λ_k^E can also be interpreted as power spectral coefficients. Although we have thus found fast KLTs for circulant and skew-circulant matrices, this does not generally solve for the KLT of the sum. We therefore now analyze a specific parametric covariance model, which is often used as an elementary approximation of the short-time behavior of s(n). Let w(n) denote zero-mean white noise with variance σ_w², which is stationary by definition. Its ACF is c_w(n) = σ_w² δ(n), and its covariance matrix is the N × N diagonal matrix C_w = diag[σ_w², σ_w², ..., σ_w²]. We model s(n) as the output of a first-order recursive LTI system with input w(n); s(n) then is also stationary and obeys s(n) = ρ s(n−1) + w(n), with |ρ| < 1. The transfer function H_DT(f) and impulse response h(n) of the LTI system are

$$H_{DT}(f) = \frac{1}{1 - \rho\, e^{-j2\pi f}}, \qquad h(n) = \varepsilon(n)\,\rho^n, \tag{53}$$

where ε(n) is the unit step sequence, i.e., ε(n) = 1 for n ≥ 0, and zero otherwise. The ACF of this first-order autoregressive (AR(1)) or Markov-I process is

$$c_s(n) = \sigma_s^2\, \rho^{|n|}, \quad \text{with } \sigma_s^2 = \frac{\sigma_w^2}{1 - \rho^2}. \tag{54}$$

The covariance matrix C_s then is

$$\mathbf{C}_s = \sigma_s^2\, \operatorname{toeplitz}[1, \rho, \rho^2, \ldots, \rho^{N-1}]. \tag{55}$$
The correlation between samples of s(n) decays exponentially with their distance, and ρ is the correlation between directly adjacent samples. In practice, approximation of the short-time and short-space behavior of speech and image signals, respectively, leads to ρ positive and close to one (Ahmed et al., 1974; Clarke and Tech, 1981; Clarke, 1985; Malvar, 1992b; Goyal, 2001; Akansu and Haddad, 2001). The eigenvectors of the covariance matrix are sinusoids (Ray and Driver, 1970; see also Clarke and Tech, 1981; Akansu and Haddad, 2001, p. 36), the frequencies of which are not equally spaced on the unit circle. No fast algorithm for computing this KLT exists. Fortunately, as shown numerically in Ahmed et al. (1974), the KLT for an AR(1) process with sufficiently large ρ is well
approximated by the DCT. Element n of basis vector k of the DCT is defined as

$$a_k(n) = \begin{cases} \sqrt{\dfrac{1}{N}} & \text{for } k = 0 \\[2ex] \sqrt{\dfrac{2}{N}}\, \cos\left( \dfrac{2n+1}{2N}\,k\pi \right) & \text{for } k = 1, \ldots, N-1 \end{cases} \qquad n = 0, \ldots, N-1. \tag{56}$$

For a visual comparison, Figures 5 and 6 depict the KLT basis functions for ρ = 0.91, N = 8, and the DCT basis vectors. Clarke proved analytically that the KLT of an AR(1) process approaches the DCT as ρ approaches one (Clarke and Tech, 1981). Moreover, the DCT of an N-point signal vector can be regarded as the 2N-point DFT of the concatenation of s(n) and the mirrored signal s(2N−1−n) (Clarke and Tech, 1981; Lim, 1990, p. 148). Periodic repetition of the concatenated signal is not afflicted with discontinuities between the periods, thus avoiding the spreading of spectral
FIGURE 5. Numerically computed KLT basis vectors of an AR(1) process for ρ = 0.91 and N = 8.
FIGURE 6. Basis vectors of the unitary DCT for N ¼ 8. Up to a sign, the similarity to the KLT basis vectors in Figure 5 is evident.
energy caused by the DFT leakage artifacts (Lim, 1990, p. 645). More details are given in Appendix C. Figure 5 also illustrates symmetry properties of the KLT: evidently, half of the eigenvectors are invariant to reversing the order of their elements; they are called (even) symmetric. For these vectors, we have u_i = J u_i, where J denotes the N × N counter-identity matrix (or reverse operator), with ones along the second diagonal and zeros elsewhere. For the other half, we have u_i = −J u_i; these vectors are skew-symmetric. In fact, for persymmetric matrices C with distinct eigenvalues and N even, half of the eigenvectors are symmetric, while the other half are skew-symmetric (Cantoni and Butler, 1976; Makhoul, 1981; Unser, 1984; Akansu and Haddad, 2001). The same symmetry properties hold for the DCT basis vectors, half of which are symmetric, while the other half are skew-symmetric. We will need this property for the construction of lapped transforms. Let us summarize the results of this section:
- The covariance matrix of a wide-sense stationary random signal is a persymmetric Toeplitz matrix.
- The orthogonal linear transform generating perfectly decorrelated transform coefficients from a wide-sense stationary signal is the KLT,
which is unique except for a sign in the eigenvectors if the eigenvectors are constrained to have only real elements. For an even number N of samples, half of the eigenvectors are symmetric, while the other half are skew-symmetric. Also, the KLT maximizes energy concentration as measured by the basis restriction error, and minimizes the geometric mean of the transform coefficient variances.
- The covariance matrix of a wide-sense stationary process can be decomposed into the sum of a circulant and a skew-circulant matrix. A KLT of the circulant matrix is the DFT.
- Real data can often be regarded as a first-order autoregressive (AR(1) or Markov-I) process with relatively high adjacent-sample correlation ρ. The KLT for this model is well approximated by the DCT. As the adjacent-sample correlation ρ approaches one, this KLT approaches the DCT.
For the AR(1) process with ρ = 0.91 and N = 8, the decorrelation efficiency of the DCT is η_d = 98.05% (for the KLT, η_d = 100% by design). The basis restriction errors are given in Table 1.
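These figures can be reproduced numerically. The sketch below (a NumPy illustration; the loose tolerance bounds in the comments are assumptions) builds the AR(1) covariance matrix of Equation (55) with σ² = 1, computes the KLT by eigendecomposition, and evaluates the decorrelation efficiency of Equation (33) for both the KLT and the DCT:

```python
import numpy as np

N, rho = 8, 0.91
# Eq. (55) with unit variance: C_s[m, n] = rho^{|m - n|}.
Cs = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

# KLT: rows are eigenvectors of Cs, ordered by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eigh(Cs)     # eigh returns ascending order
A_klt = eigvecs[:, ::-1].T

# Unitary DCT-II of Eq. (56).
n = np.arange(N)
A_dct = np.sqrt(2.0 / N) * np.cos((2 * n + 1)[None, :] * n[:, None] * np.pi / (2 * N))
A_dct[0, :] = np.sqrt(1.0 / N)

def decorrelation_efficiency(A, Cs):
    # Eq. (33): compare off-diagonal mass of C_S = A Cs A^T against C_s.
    CS = A @ Cs @ A.T
    off = ~np.eye(len(Cs), dtype=bool)
    return 1.0 - np.abs(CS[off]).sum() / np.abs(Cs[off]).sum()

print(decorrelation_efficiency(A_klt, Cs))  # 1.0 up to rounding errors
print(decorrelation_efficiency(A_dct, Cs))  # close to the 98% quoted above
```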
C. Transform Coding Performance

In this section we show how to distribute an allowable maximum bit rate optimally over the transform coefficients in Figure 4 such that the average distortion is minimized, and quantify the distortion. Since a unitary transform preserves Euclidean length, it is straightforward to show that the distortion introduced by quantization in the transform domain is the same as the mean square error of the reconstructed signal (Huang and Schultheiss, 1963; Zelinski and Noll, 1977). Denoting the quantized transform coefficient vector by Ŝ, and the reconstructed signal vector by ŝ, the average distortion is (cf. Equation (30))

$$D = \frac{1}{N} E\left[ (\mathbf{S} - \hat{\mathbf{S}})^H (\mathbf{S} - \hat{\mathbf{S}}) \right] = \frac{1}{N} E\left[ (\mathbf{s} - \hat{\mathbf{s}})^T (\mathbf{s} - \hat{\mathbf{s}}) \right]. \tag{57}$$
TABLE 1
BASIS RESTRICTION ERROR D_BR(L) (%) FOR KLT AND DCT

L      0     1      2     3     4     5     6     7
KLT    100   20.5   8.9   5.2   3.3   2.1   1.3   0.61
DCT    100   20.7   9.1   5.2   3.3   2.2   1.3   0.61
For sufficiently fine quantization, it is shown in Appendix B, Equation (119), that the distortion D(k) of the kth transform coefficient depends on the allocated bit rate R(k) by

$$D(k) = \gamma(k)\, \sigma_S^2(k)\, 2^{-2R(k)}. \tag{58}$$

The required bit rate for a given maximum distortion then is

$$R(k) = \frac{1}{2}\log_2[\gamma(k)] + \frac{1}{2}\log_2\left[ \frac{\sigma_S^2(k)}{D(k)} \right]. \tag{59}$$
The parameters γ(k) depend on the distribution of the coefficients and the type of quantization. Assuming a Gaussian signal, the transform coefficients are also Gaussian. (Transform coefficients perfectly decorrelated by a KLT are then also statistically independent.) The γ(k) are then all identical to a common value γ, and the rate simplifies to

$$R(k) = \frac{1}{2}\log_2[\gamma] + \frac{1}{2}\log_2\left[ \frac{\sigma_S^2(k)}{D(k)} \right]. \tag{60}$$
Minimizing the average distortion

$$D = \frac{1}{N}\sum_{k=0}^{N-1} D(k) \tag{61}$$

subject to a fixed average rate

$$R = \frac{1}{N}\sum_{k=0}^{N-1} R(k) \tag{62}$$
yields that all transform coefficients have to be quantized with the same distortion D(k) = D, k = 0, ..., N−1. The optimum bit rate for the kth transform coefficient is

$$R(k) = R + \frac{1}{2}\log_2\left[ \frac{\sigma_S^2(k)}{\sigma_{GM}^2} \right], \tag{63}$$

where σ_GM² is the geometric mean of the transform coefficient variances introduced in Equation (43). (Potential negative rates for low-variance coefficients may be clipped; see, e.g., Zelinski and Noll, 1977; Goyal, 2001.)
Inserting this result into Equation (58), and with D(k) = D, we obtain for the distortion as a function of rate, given optimal bit allocation,

$$D = 2^{-2R}\, \sigma_{GM}^2. \tag{64}$$

As we saw above, σ_GM² is minimized by the KLT; hence, the KLT is the transform minimizing the distortion under optimal bit allocation. To quantify the performance of a transform coder, the optimal transform coding distortion is compared to the distortion D_PCM of PCM. In the latter, the transform matrix can formally be set to the identity matrix. Then, the transform coefficients are identical to the signal samples, and the coefficient variances are identical to the signal variance σ². We thus obtain for the transform coding gain

$$G_{TC} = \frac{(1/N)\sum_{k=0}^{N-1} \sigma_S^2(k)}{\sigma_{GM}^2} = \frac{\sigma^2}{\sigma_{GM}^2}, \tag{65}$$

where the rightmost identity follows from the energy preservation property of unitary transforms. For an AR(1) process with ρ = 0.91 and N = 8, the transform coding gains of the DCT and KLT are 4.6334 (6.66 dB) and 4.668 (6.69 dB), respectively. Evidently, the DCT is a very good approximation to the KLT. Experiments show that this result also holds for covariance matrices estimated from real speech or image data (Zelinski and Noll, 1977; Malvar, 1992b; Clarke, 1985; Akansu and Haddad, 2001).
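The coding gain of Equation (65) for the DCT can be computed directly from the model covariance. The sketch below assumes the AR(1) model with σ² = 1; the printed value should land near the figure quoted above, but the code is an illustration rather than a reference implementation:

```python
import numpy as np

N, rho = 8, 0.91
# AR(1) covariance, Eq. (55), with unit signal variance.
Cs = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

# Unitary DCT-II, Eq. (56).
n = np.arange(N)
A = np.sqrt(2.0 / N) * np.cos((2 * n + 1)[None, :] * n[:, None] * np.pi / (2 * N))
A[0, :] = np.sqrt(1.0 / N)

variances = np.diag(A @ Cs @ A.T)            # sigma_S^2(k), cf. Eq. (35)
arithmetic_mean = variances.mean()           # equals sigma^2 = 1 here (Eq. (32))
geometric_mean = np.exp(np.log(variances).mean())

G_TC = arithmetic_mean / geometric_mean      # Eq. (65)
print(G_TC, 10 * np.log10(G_TC))             # about 4.63, i.e. roughly 6.7 dB
```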
IV. TWO-DIMENSIONAL TRANSFORMS

So far we have considered only 1D signals and their transformations. In this section we generalize to 2D signals. Let s(m, n) denote a real signal defined over the 2D block m, n = 0, ..., N−1, and S(k, l) the transform coefficients for k, l = 0, ..., N−1. (The restriction to square blocks involves no loss of generality and simplifies notation.) Signal samples and transform coefficients can be regarded as N × N matrices s and S, respectively. The basis vectors a_k = [a_k(0), ..., a_k(N−1)]^T, k = 0, ..., N−1, are then replaced by basis matrices b_kl = [b_kl]_mn. The transform coefficients are calculated by

$$S(k, l) = \sum_{m=0}^{N-1}\sum_{n=0}^{N-1} s(m, n)\, b_{kl}(m, n). \tag{66}$$
With a 4D transform tensor T, this can be expressed as (Malvar, 1992b, p. 22)

$$\mathbf{S} = \mathbf{T}\mathbf{s}, \quad \text{with } \mathbf{T} = [T_{klmn}] = [b_{kl}(m, n)]. \tag{67}$$

Alternatively, we can order the signal samples row by row into an N²-dimensional column vector s_v as

$$\mathbf{s}_v = [s(0,0),\, s(0,1),\, \ldots,\, s(0,N-1),\, s(1,0),\, \ldots,\, s(N-1,N-1)]^T. \tag{68}$$

Similarly, a transform coefficient vector S_v can be formed. Ordering the entries b_kl(m, n) appropriately in an N² × N² matrix B, we can express the 2D transform as the product of a matrix with a vector:

$$\mathbf{S}_v = \mathbf{B}\mathbf{s}_v. \tag{69}$$
Clearly, for real signals and transforms, this product requires O(N⁴) multiplications and additions. In practice, however, so-called separable 2D transforms are used almost exclusively. The entries b_kl(m, n) of the (k+1), (l+1)th basis matrix of a separable transform are calculated from 1D basis vector entries by b_kl(m, n) = a_k(m) a_l(n). For the unitary 2D DFT, this yields

$$b_{kl}(m, n) = \frac{1}{N}\, e^{-j(2\pi/N)(km + ln)}, \tag{70}$$

and for the 2D DCT, we obtain

$$b_{kl}(m, n) = \begin{cases} \dfrac{1}{N} & \text{for } k = l = 0 \\[1.5ex] \dfrac{\sqrt{2}}{N}\cos\left( \dfrac{2m+1}{2N}\,k\pi \right) & \text{for } l = 0,\ k = 1, \ldots, N-1 \\[1.5ex] \dfrac{\sqrt{2}}{N}\cos\left( \dfrac{2n+1}{2N}\,l\pi \right) & \text{for } k = 0,\ l = 1, \ldots, N-1 \\[1.5ex] \dfrac{2}{N}\cos\left( \dfrac{2m+1}{2N}\,k\pi \right)\cos\left( \dfrac{2n+1}{2N}\,l\pi \right) & \text{for } k, l = 1, \ldots, N-1 \end{cases}$$

$$m, n = 0, \ldots, N-1. \tag{71}$$
The matrix B in Equation (69) can then be written as the Kronecker product of the N × N transform matrix A for a 1D signal of length N with itself:

$$\mathbf{B} = \mathbf{A} \otimes \mathbf{A} = \begin{bmatrix} a_{00}\mathbf{A} & a_{01}\mathbf{A} & \cdots & a_{0,N-1}\mathbf{A} \\ \vdots & & & \vdots \\ a_{N-1,0}\mathbf{A} & a_{N-1,1}\mathbf{A} & \cdots & a_{N-1,N-1}\mathbf{A} \end{bmatrix}. \tag{72}$$
In the tensor notation of Equation (67), the transform simplifies to the product of three N × N matrices:

$$\mathbf{S} = \mathbf{A}\,\mathbf{s}\,\mathbf{A}^T, \tag{73}$$
where the multiplication from the right by A^T is a transform of the rows of s, while the multiplication from the left by A transforms the columns. The 2D transform can hence be realized by a 1D transform along each row of the signal block, followed by a 1D transform along each column of the result, or vice versa. Evidently, the number of multiplications and additions needed by Equation (73) is O(N³), down from O(N⁴) for nonseparable transforms, if no fast algorithms are used. As an illustration, Figure 7 depicts the real part of a basis matrix for the DFT computed from Equation (70), and a basis matrix for the DCT according to Equation (71). A comparison shows that in the case of a real transform, separability comes at a price: while the DFT basis matrix exhibits an unambiguous orientation, this is not the case for the DCT, which consists of two cosine waves with different orientations. The separable 2D DFT is
FIGURE 7. Left: Real part of a basis matrix of the 2D DFT for N ¼ 16, k ¼ l ¼ 2. Right: 2D DCT basis matrix for N ¼ 16, k ¼ l ¼ 4.
therefore unambiguously orientation selective, while the separable 2D DCT basis matrices are sensitive to two different orientations. Unambiguous orientation selectivity is desired in applications like adaptive enhancement of oriented structures, such as lines and edges. More on this topic can be found in Section V.E, and in Kunz and Aach (1999) and Aach and Kunz (1996a, 2000). In the following, we will consider only separable transforms. Since these can always be implemented as a sequence of 1D transforms, we will return to the 1D notation for the remainder of this chapter.
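The equivalence of the separable form (73) and the stacked form (69) can be checked numerically. The sketch below (a NumPy illustration using the unitary DCT-II as the 1D transform A and a random test block) relies on the row-major vectorization of Equation (68):

```python
import numpy as np

N = 8
n = np.arange(N)
# Unitary DCT-II of Eq. (56) as the 1D transform A.
A = np.sqrt(2.0 / N) * np.cos((2 * n + 1)[None, :] * n[:, None] * np.pi / (2 * N))
A[0, :] = np.sqrt(1.0 / N)

rng = np.random.default_rng(1)
s = rng.standard_normal((N, N))          # one 2D signal block

S_separable = A @ s @ A.T                # Eq. (73): rows then columns
# Eq. (69) with B = A kron A, Eq. (72); s.ravel() is the row-major
# ordering of Eq. (68).
S_stacked = (np.kron(A, A) @ s.ravel()).reshape(N, N)

assert np.allclose(S_separable, S_stacked)
```

The separable form needs O(N³) operations against O(N⁴) for the matrix–vector form, which is why it is used almost exclusively in practice.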
V. LAPPED TRANSFORMS

A. Block Diagonal Transforms

In the preceding discussion it was sufficient to express the transform operations with respect to single blocks. The development of lapped transforms will require the joint consideration of several neighboring blocks. Denoting the (m+1)th block by s_m = [s(mN), s(mN+1), ..., s(mN+N−1)]^T, a signal s_t consisting of M blocks can be written as s_t^T = [s_0^T, s_1^T, ..., s_{M−1}^T]. Similarly, with S_m = A s_m being the transform coefficients for the (m+1)th block, and stacking these, we obtain

$$\mathbf{S}_t = \begin{bmatrix} \mathbf{S}_0 \\ \mathbf{S}_1 \\ \vdots \\ \mathbf{S}_{M-1} \end{bmatrix} = \begin{bmatrix} \mathbf{A} & & & \mathbf{0} \\ & \mathbf{A} & & \\ & & \ddots & \\ \mathbf{0} & & & \mathbf{A} \end{bmatrix} \begin{bmatrix} \mathbf{s}_0 \\ \mathbf{s}_1 \\ \vdots \\ \mathbf{s}_{M-1} \end{bmatrix} = \mathbf{T}\,\mathbf{s}_t, \tag{74}$$
where the matrix T = diag(A, ..., A) is block-diagonal. The inverse transform is given by

$$\mathbf{s}_t = \mathbf{T}^T \mathbf{S}_t, \quad \text{with } \mathbf{T}^T = \operatorname{diag}(\mathbf{A}^T, \ldots, \mathbf{A}^T), \tag{75}$$
where we have assumed a real transform. Evidently, orthogonality of the blockwise transform can also be expressed as orthogonality of the transform matrix T.
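This observation is easy to verify numerically. The sketch below (a NumPy illustration; the choices N = 4, M = 3, and the DCT as the block transform A are assumptions) builds T = diag(A, ..., A) as a Kronecker product and checks that it is orthogonal and acts blockwise:

```python
import numpy as np

N, M = 4, 3
n = np.arange(N)
# Unitary DCT-II as the block transform; any orthogonal A would do.
A = np.sqrt(2.0 / N) * np.cos((2 * n + 1)[None, :] * n[:, None] * np.pi / (2 * N))
A[0, :] = np.sqrt(1.0 / N)

# T = diag(A, A, A) of Eq. (74), built as I_M kron A.
T = np.kron(np.eye(M), A)

# Orthogonality of A carries over to the block-diagonal T.
assert np.allclose(T @ T.T, np.eye(M * N))

# Applying T to the stacked signal equals transforming each block with A.
s = np.arange(M * N, dtype=float)
blockwise = np.concatenate([A @ s[m * N:(m + 1) * N] for m in range(M)])
assert np.allclose(T @ s, blockwise)
```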
B. Extension to Lapped Transforms

As already shown in Figure 1, independent block processing may create artifacts at the block boundaries. These are caused by the discontinuous transitions to zero at the ends of the transform basis functions (Malvar, 1992b; Aach and Kunz, 2000). Block artifacts could hence be avoided by using basis functions that decay smoothly to zero. Perfect reconstruction by an inverse transform then requires that the basis functions of neighboring blocks overlap, as otherwise "holes" would appear in the reconstructed signal. The basis functions would thus have lengths L > N, while the number of transform coefficients per block must, of course, not exceed N. The square matrix A is then replaced by a nonsquare matrix P of size N × L. We consider now L = 2N. The basis functions for calculating S_m then extend over the blocks s_m and s_{m+1}, i.e., over the samples [s(mN), s(mN+1), ..., s((m+2)N−1)]. The N-dimensional vector S_m of transform coefficients is then given by

$$\mathbf{S}_m = \mathbf{P} \begin{bmatrix} \mathbf{s}_m \\ \mathbf{s}_{m+1} \end{bmatrix}. \tag{76}$$
The next block is taken over the samples [s((m+1)N), ..., s((m+3)N−1)], and so on. This procedure is illustrated in Figure 8. Such a transform is called a lapped transform. Since P is not a square matrix, we cannot invert Equation (76) to obtain s_m and s_{m+1} from S_m. We therefore formulate the transform with respect to the entire signal (or image). Writing the N × 2N matrix P as the concatenation P = [A B] of two N × N matrices, Equation (74) becomes for a lapped transform

$$ S_t = \begin{bmatrix} S_0 \\ S_1 \\ \vdots \\ \vdots \\ S_{M-1} \end{bmatrix} = \begin{bmatrix} A & B & 0 & \cdots & & 0 \\ 0 & A & B & 0 & & \\ & 0 & A & B & & \\ & & & \ddots & \ddots & \\ B & 0 & \cdots & & 0 & A \end{bmatrix} \begin{bmatrix} s_0 \\ s_1 \\ \vdots \\ \vdots \\ s_{M-1} \end{bmatrix} = T\, s_t , \qquad (77) $$
where the wrap-around in the last row corresponds to a periodic repetition of the signal.

TIL AACH

FIGURE 8. Formation of signal blocks s_m and transform vectors S_m in a lapped transform with basis functions of length L = 2N.

As in block transforms, T is a square matrix, which we require to be orthogonal, i.e., T T^T = I. The original image can thus be reconstructed by

$$ s_t = T^T S_t = \begin{bmatrix} A^T & 0 & \cdots & 0 & B^T \\ B^T & A^T & 0 & & 0 \\ 0 & B^T & A^T & & \\ \vdots & & \ddots & \ddots & \\ 0 & \cdots & 0 & B^T & A^T \end{bmatrix} S_t . \qquad (78) $$
This relation shows that the inverse transform consists of two steps. First, each N-dimensional transform vector S_m is multiplied by the 2N × N matrix P^T, yielding a 2N-dimensional signal vector. Neighboring signal vectors overlap by N samples, and are added in a second step to obtain the reconstructed image. Alternatively, Equation (78) may be regarded as another lapped transform applied to the data S_t, yielding

$$ s_m = [\,B^T\ A^T\,] \begin{bmatrix} S_{m-1} \\ S_m \end{bmatrix} = B^T S_{m-1} + A^T S_m , \qquad (79) $$
which is of the same structure as Equation (76).

C. The Lapped Orthogonal Transform

The matrix product T T^T yields a block tridiagonal matrix, with entries P P^T along the main diagonal, entries A B^T along the diagonal immediately to the left, and entries B A^T along the diagonal immediately to the right. From the orthogonality condition T T^T = I, the necessary and sufficient conditions on P = [A B] therefore are

$$ P P^T = A A^T + B B^T = I \quad \text{and} \quad A B^T = B A^T = 0 . \qquad (80) $$
For T T^T = I, we can equivalently write T^T T = I, from which an alternative formulation of the necessary and sufficient condition can be derived:

$$ A^T A + B^T B = I \quad \text{and} \quad A^T B = B^T A = 0 . \qquad (81) $$
We may also approach the orthogonality conditions by rewriting Equation (76) to

$$ S_m = P \begin{bmatrix} s_m \\ s_{m+1} \end{bmatrix} = [\,A\ B\,] \begin{bmatrix} s_m \\ s_{m+1} \end{bmatrix} . \qquad (82) $$
Inserting this into Equation (79), we obtain

$$ s_m = B^T A\, s_{m-1} + A^T B\, s_{m+1} + (A^T A + B^T B)\, s_m . \qquad (83) $$
This equality holds with condition (81). The first condition in Equation (80) states that the rows of P, i.e., the transform basis functions, must be orthogonal, while the second condition requires the overlapping parts of the basis functions to be orthogonal as well. A transform complying with Equation (80) is called a lapped orthogonal transform (LOT). Invoking the shift matrix V defined as

$$ V = \begin{bmatrix} 0 & I \\ 0 & 0 \end{bmatrix} , \qquad (84) $$

where 0 and I are of size N × N, conditions (80) can be more compactly written as

$$ P V^m P^T = \delta(m)\, I, \quad m = 0, 1 . \qquad (85) $$
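As a hedged numerical aside (not from the text): any orthogonal block transform is the degenerate lapped transform with B = 0, and this already satisfies conditions (80) and (85). A NumPy sketch, with all names ad hoc:

```python
import numpy as np

N, M = 4, 5
# Degenerate illustration: with B = 0 and A orthogonal, P = [A 0] reduces to an
# ordinary block transform but still satisfies the LOT conditions (80).
A = np.linalg.qr(np.random.default_rng(1).standard_normal((N, N)))[0]
B = np.zeros((N, N))
P = np.hstack([A, B])

Z0 = np.zeros((N, N))
V = np.block([[Z0, np.eye(N)], [Z0, Z0]])   # shift matrix V, Eq. (84)
assert np.allclose(P @ P.T, np.eye(N))      # Eq. (85), m = 0
assert np.allclose(P @ V @ P.T, Z0)         # Eq. (85), m = 1

# Banded transform matrix of Eq. (77), wrap-around in the last block row:
T = np.zeros((M * N, M * N))
for m in range(M):
    T[m*N:(m+1)*N, m*N:(m+1)*N] = A
    T[m*N:(m+1)*N, ((m+1) % M)*N:((m+1) % M)*N + N] += B
s = np.random.default_rng(2).standard_normal(M * N)
assert np.allclose(T.T @ (T @ s), s)        # orthogonality gives perfect reconstruction
```

The same loop builds the banded T for any valid P = [A B]; only the conditions on A and B change.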
Extending the above considerations towards lapped transforms with basis functions of lengths L = KN, K = 2, 3, ..., the matrix P then has size N × L, and condition (85) becomes (Malvar and Staelin, 1989; Malvar, 1992b)

$$ P V^m P^T = \delta(m)\, I, \quad m = 0, 1, 2, \ldots, K-1 , \qquad (86) $$
where the identity matrix in the shift matrix V is now of order (K−1)N. Of course, for K = 1 this notation includes traditional nonoverlapping block transforms as a special case. If P_0 is a valid LOT matrix, it can be used to generate more valid LOT matrices P by P = Z P_0, where Z is an orthogonal N × N matrix. P will then also comply with condition (86), since

$$ P V^m P^T = Z P_0 V^m P_0^T Z^T = Z\, \delta(m) I\, Z^T = \delta(m)\, I . \qquad (87) $$
In the following, we construct a valid LOT of order N with basis functions of length L = 2N. To obtain a transform which can be realized by a fast algorithm, the initial matrix P_0 is constructed from the unitary DCT basis functions of length N. As we have seen in Section III.B, half of the DCT basis functions are even symmetric, while the other half are odd. Stacking the even basis functions rowwise into the (N/2) × N matrix D_e, and the odd ones into the matrix D_o, a valid LOT matrix is (Malvar, 1992a; Akansu and Wadas, 1992; Akansu and Haddad, 2001)

$$ P_0 = \frac{1}{2} \begin{bmatrix} D_e - D_o & (D_e - D_o)\,J \\ D_e - D_o & -(D_e - D_o)\,J \end{bmatrix} , \qquad (88) $$
where J is the counter identity matrix (or reverse operator) already used in Section III.B. The matrix P_0 is of size N × 2N, where, similar to KLT and DCT, the basis functions in the first N/2 rows are even, while the other N/2 basis functions are odd. It satisfies condition (86), but it will not optimize the transform coding gain of, for example, an AR(1) process. Hence, for a given covariance model C_s, the orthogonal square matrix Z is determined such that its rows are identical to the eigenvectors of the covariance matrix P_0 C_s P_0^T. The LOT Z P_0 thus consists of two steps: a transform by P_0 followed by another transform by Z. The covariance matrix C_S = Z P_0 C_s P_0^T Z^T then is diagonal. Note, however, that the LOT does not preserve the determinant of the covariance matrix. Figure 9 shows the basis functions for N = 8 and L = 16 as computed for an AR(1) process with ρ = 0.91. The coding gain of this LOT is 5.06 (7.05 dB). The fast implementation of this transform reflects its two-step structure (Malvar, 1992b): the matrix P_0 is realized using an N-point DCT, which is followed by a series of plane rotations used to approximate Z (Akansu and Haddad, 2001; Akansu and Wadas, 1992). The numerical values of the basis functions of the approximate LOT can be found for ρ = 0.95 in Malvar (1992b, p. 171).
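The two-step construction above can be sketched numerically (an illustration, not the fast implementation; the coding gain is computed here as the ratio of the arithmetic to the geometric mean of the coefficient variances, and all names are ad hoc):

```python
import numpy as np

N = 8
n = np.arange(N)
D = np.sqrt(2.0 / N) * np.cos(np.pi * np.outer(n, n + 0.5) / N)
D[0, :] /= np.sqrt(2.0)                    # orthonormal DCT-II; rows = basis functions
De, Do = D[0::2, :], D[1::2, :]            # even- and odd-symmetric rows
J = np.fliplr(np.eye(N))

P0 = 0.5 * np.block([[De - Do, (De - Do) @ J],
                     [De - Do, -(De - Do) @ J]])    # Eq. (88)
A, B = P0[:, :N], P0[:, N:]
assert np.allclose(A @ A.T + B @ B.T, np.eye(N))    # Eq. (80)
assert np.allclose(A @ B.T, 0)

rho = 0.91                                 # AR(1) model, as in the text
Cs = rho ** np.abs(np.subtract.outer(np.arange(2 * N), np.arange(2 * N)))
w, V = np.linalg.eigh(P0 @ Cs @ P0.T)
Z = V[:, ::-1].T                           # rows = eigenvectors, descending eigenvalues
P = Z @ P0                                 # the optimized LOT
var = np.diag(P @ Cs @ P.T)
gain = var.mean() / np.exp(np.log(var).mean())      # transform coding gain
# The text reports a coding gain of 5.06 (7.05 dB) for this model.
```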
FIGURE 9. Basis functions of the LOT for N = 8 and L = 16. The computation of the basis functions is based on an AR(1) signal model with ρ = 0.91. The functions are sorted from left to right and top to bottom in descending order of the eigenvalues of P_0 C_s P_0^T.
D. The Modulated Lapped Transform

The above LOT was derived by an eigenvector analysis, leading to basis functions with even or odd symmetry. An alternative approach is motivated by the close relationship between maximally decimated filter banks on the one hand and block and lapped transforms on the other (Akansu and Haddad, 2001, p. 4; Malvar, 1992b). In filter banks, the filters are often realized by a low-pass prototype shifted to N different frequency channels by modulation. In the context of lapped transforms, this leads to the so-called modulated lapped transform (MLT) if the filter length L is equal to 2N. For longer filters (or basis functions), this transform is referred to as the extended lapped transform (ELT). For L = 2N, the basis functions are formed by a cosine-modulated window function h(n), leading to the N × 2N transform matrix P with entries

$$ [P]_{kn} = h(n) \sqrt{\frac{2}{N}} \cos\left[ \left( n + \frac{N+1}{2} \right) \left( k + \frac{1}{2} \right) \frac{\pi}{N} \right] , \qquad (89) $$
FIGURE 10. Basis functions of the MLT for N ¼ 8 and L ¼ 16. The frequency index k increases from left to right and top to bottom.
for k = 0, ..., N−1, and n = 0, ..., 2N−1. The window h(n) obeys

$$ h(n) = \sin\left[ \left( n + \frac{1}{2} \right) \frac{\pi}{2N} \right] . \qquad (90) $$
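Equations (89) and (90) are easily checked numerically; the sketch below (illustrative, NumPy-based) builds the MLT matrix and verifies the orthogonality conditions (81):

```python
import numpy as np

N = 8
n = np.arange(2 * N)
k = np.arange(N)[:, None]
h = np.sin((n + 0.5) * np.pi / (2 * N))                    # half-sine window, Eq. (90)
P = h[None, :] * np.sqrt(2.0 / N) * np.cos(
    np.pi / N * (n[None, :] + (N + 1) / 2) * (k + 0.5))    # MLT basis, Eq. (89)
A, B = P[:, :N], P[:, N:]
assert np.allclose(A.T @ A + B.T @ B, np.eye(N))           # Eq. (81)
assert np.allclose(A.T @ B, 0)
```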
These basis functions are shown in Figure 10. Evidently, they are not symmetric any more. Still, the half-sine window ensures a continuous transition towards zero at the ends of the basis functions. In the following, we will show that this choice of basis functions complies with the orthogonality conditions (81). The window function obeys the conditions

$$ h^2(n) + h^2(n+N) = 1 \qquad (91) $$

and

$$ h(n) = h(2N-1-n) . \qquad (92) $$
Arranging the window samples into two diagonal N × N matrices H_0 and H_1, we obtain

$$ \begin{aligned} H_0 &= \mathrm{diag}[h(0), h(1), \ldots, h(N-1)] \\ H_1 &= \mathrm{diag}[h(N), h(N+1), \ldots, h(2N-1)] \\ &= \mathrm{diag}[h(N-1), h(N-2), \ldots, h(0)] = J H_0 J , \end{aligned} \qquad (93) $$
where J H_0 J reverses both rows and columns of H_0. The modulating cosines are arranged into the N × N matrices Q_0 and Q_1, yielding

$$ [Q_0]_{kn} = \sqrt{\frac{2}{N}} \cos\left[ \left( n + \frac{N+1}{2} \right) \left( k + \frac{1}{2} \right) \frac{\pi}{N} \right] , \quad k, n = 0, \ldots, N-1 \qquad (94) $$

and

$$ [Q_1]_{kn} = \sqrt{\frac{2}{N}} \cos\left[ \left( n + N + \frac{N+1}{2} \right) \left( k + \frac{1}{2} \right) \frac{\pi}{N} \right] , \quad k, n = 0, \ldots, N-1 . \qquad (95) $$
Expressing the transformation matrix P as the concatenation P = [A B], we obtain

$$ A = Q_0 H_0 \quad \text{and} \quad B = Q_1 H_1 . \qquad (96) $$
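The factorization (96), together with Equation (93), can be verified numerically; the following NumPy sketch (illustrative only, names ad hoc) rebuilds the MLT matrix and its factors:

```python
import numpy as np

N = 8
nn = np.arange(2 * N)
k = np.arange(N)[:, None]
h = np.sin((nn + 0.5) * np.pi / (2 * N))
P = h[None, :] * np.sqrt(2.0 / N) * np.cos(
    np.pi / N * (nn[None, :] + (N + 1) / 2) * (k + 0.5))   # MLT matrix, Eq. (89)

n = np.arange(N)[None, :]
Q0 = np.sqrt(2.0 / N) * np.cos(np.pi / N * (n + (N + 1) / 2) * (k + 0.5))      # Eq. (94)
Q1 = np.sqrt(2.0 / N) * np.cos(np.pi / N * (n + N + (N + 1) / 2) * (k + 0.5))  # Eq. (95)
H0, H1 = np.diag(h[:N]), np.diag(h[N:])                                        # Eq. (93)
assert np.allclose(P[:, :N], Q0 @ H0)                      # A = Q0 H0
assert np.allclose(P[:, N:], Q1 @ H1)                      # B = Q1 H1
assert np.allclose(H1, np.fliplr(np.flipud(H0)))           # H1 = J H0 J
```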
For Q_0 and Q_1, the conditions

$$ Q_0^T Q_1 = Q_1^T Q_0 = 0 , \qquad (97) $$

$$ Q_0^T Q_0 = Q_0 Q_0^T = I - J \qquad (98) $$

and

$$ Q_1^T Q_1 = Q_1 Q_1^T = I + J \qquad (99) $$
hold (see Appendix D). Inserting these into condition (81), we obtain

$$ A^T B = H_0 Q_0^T Q_1 H_1 = 0 \qquad (100) $$

and

$$ \begin{aligned} A^T A + B^T B &= H_0 Q_0^T Q_0 H_0 + H_1 Q_1^T Q_1 H_1 \\ &= H_0 [I - J] H_0 + H_1 [I + J] H_1 \\ &= H_0^2 + H_1^2 - H_0 J H_0 + H_1 J H_1 = I , \end{aligned} \qquad (101) $$

since H_0^2 + H_1^2 = I and H_0 J H_0 = H_1 J H_1. This shows that the MLT complies with the orthogonality conditions for the LOT. For the AR(1) model with ρ = 0.91, the MLT coding gain is 5.15 (7.12 dB).

E. Extensions

In this section we discuss three extensions of the MLT and the LOT by introducing additional basis functions which are in a certain sense complementary to the already existing ones. In the MLT, reconstruction from the transform vector S_m = P [s_m^T, s_{m+1}^T]^T only leads to

$$ \begin{bmatrix} \hat{s}_m \\ \hat{s}_{m+1} \end{bmatrix} = P^T S_m = \begin{bmatrix} A^T \\ B^T \end{bmatrix} [\,A\ B\,] \begin{bmatrix} s_m \\ s_{m+1} \end{bmatrix} . \qquad (102) $$
With Equations (96), (98), and (99), we obtain

$$ \begin{bmatrix} A^T \\ B^T \end{bmatrix} [\,A\ B\,] = \begin{bmatrix} A^T A & A^T B \\ B^T A & B^T B \end{bmatrix} = \begin{bmatrix} H_0 (I - J) H_0 & 0 \\ 0 & H_1 (I + J) H_1 \end{bmatrix} , \qquad (103) $$

where 0 is of size N × N. This matrix is evidently not diagonal, thus mixing coefficients from s_m with different time indices into one coefficient of the reconstructed vector ŝ_m, and similarly for ŝ_{m+1}. By analogy to frequency-domain aliasing, where higher frequencies are mapped back onto lower ones during downsampling, this phenomenon is called time-domain aliasing. Since the MLT perfectly reconstructs the entire signal s_t by adding the overlapping signals obtained by individual inverse transforms, time-domain aliasing in the reconstruction from S_m is hence canceled by the reconstructions from S_{m−1} and S_{m+1} (time-domain aliasing cancellation, TDAC). This observation holds if the transform vectors S_i are left unchanged. Frequency-domain processing of the transform coefficients unbalances the time-domain aliasing components contained in the ŝ_i, thus resulting in uncanceled time-domain aliasing in the reconstructed signal. In general, the more strongly the transform coefficients are changed during processing, the larger these uncanceled aliasing components become. Keeping uncanceled aliasing below an acceptable threshold hence restricts how strongly the transform coefficients can be processed. As an example, acoustic echo cancellation using the MLT is mentioned in Malvar (1999), where the occurrence of uncanceled aliasing limits the maximum echo reduction to no more than 10 dB. In Malvar (1999) the MLT is therefore extended by replacing the real basis functions by complex ones defined as

$$ [P]_{kn} = h(n) \sqrt{\frac{2}{N}} \exp\left[ -j \left( n + \frac{N+1}{2} \right) \left( k + \frac{1}{2} \right) \frac{\pi}{N} \right] . \qquad (104) $$
The resulting transform is called the modulated complex lapped transform (MCLT). The inverse transform is carried out by the Hermitian transpose P^H, yielding for Equation (102)

$$ \begin{bmatrix} \hat{s}_m \\ \hat{s}_{m+1} \end{bmatrix} = P^H S_m = P^H P \begin{bmatrix} s_m \\ s_{m+1} \end{bmatrix} \qquad (105) $$

with (Malvar, 1999)

$$ P^H P = \mathrm{diag}\left[ h^2(n) \right] = \begin{bmatrix} H_0^2 & 0 \\ 0 & H_1^2 \end{bmatrix} , \qquad (106) $$
which is a diagonal matrix. Time-domain aliasing therefore does not occur, which allows a stronger degree of processing. Superposition of the reconstructed signal vectors only compensates for the effects of the window h(n). In Malvar (1999) the MCLT permits one to reduce echo by 20 dB, compared to only 5 dB with the MLT. The price to pay is a redundancy by a factor of two, since the MCLT transforms N real signal samples into N complex transform coefficients. In Young and Kingsbury (1993) a similar extension, termed the complex lapped transform (CLT), is proposed for the 2D LOT. The objective is to estimate motion in image sequences by phase correlation between blocks. Since the use of a lapped transform implies smoothly windowed overlapping blocks, smoother motion fields are expected in comparison to motion estimation techniques using nonoverlapping blocks. The transform generates a redundancy by a factor of two in each dimension, resulting in a total redundancy of four. We finally discuss an extension of the 2D MLT which makes the transform unambiguously orientation sensitive. As we have seen in Section IV, the basis functions of real separable 2D transforms are sensitive to two different orientations. For image enhancement and restoration,
however, unambiguous detection of orientated structures is often desired. This can be achieved by complementing the cosine-shaped basis functions by sine-shaped ones. The basis functions of the separable 2D MLT are given by

$$ [P]_{klmn} = \frac{2 h(n) h(m)}{N} \cos\left[ \left( m + \frac{N+1}{2} \right)\left( k + \frac{1}{2} \right) \frac{\pi}{N} \right] \cos\left[ \left( n + \frac{N+1}{2} \right)\left( l + \frac{1}{2} \right) \frac{\pi}{N} \right] , \qquad (107) $$
where k, l = 0, ..., N−1 and m, n = 0, ..., 2N−1. Replacing the cosine functions by sine functions leads to the complementary basis functions

$$ [P']_{klmn} = \frac{2 h(n) h(m)}{N} \sin\left[ \left( m + \frac{N+1}{2} \right)\left( k + \frac{1}{2} \right) \frac{\pi}{N} \right] \sin\left[ \left( n + \frac{N+1}{2} \right)\left( l + \frac{1}{2} \right) \frac{\pi}{N} \right] . \qquad (108) $$
The basis functions of the new, orientation-selective transform are formed by

$$ [P_{L+}]_{klmn} = [P]_{klmn} + [P']_{klmn} = \frac{2 h(n) h(m)}{N} \cos\left[ \frac{\pi}{N} \left( \left( m + \frac{N+1}{2} \right)\left( k + \frac{1}{2} \right) - \left( n + \frac{N+1}{2} \right)\left( l + \frac{1}{2} \right) \right) \right] , \qquad (109) $$
which is an unambiguously orientated windowed cosine wave. Since the [P_{L+}]_{klmn} cover only half the possible orientations, we form additionally the basis functions

$$ [P_{L-}]_{klmn} = [P]_{klmn} - [P']_{klmn} = \frac{2 h(n) h(m)}{N} \cos\left[ \frac{\pi}{N} \left( \left( m + \frac{N+1}{2} \right)\left( k + \frac{1}{2} \right) + \left( n + \frac{N+1}{2} \right)\left( l + \frac{1}{2} \right) \right) \right] . \qquad (110) $$
This transform is termed the lapped directional transform (LDT) (Kunz and Aach, 1999; Aach and Kunz, 2000). The relation between the MLT and LDT basis functions is illustrated in Figure 11. The LDT is real-valued, but not separable. However, both forward and inverse LDT can be computed from the separable fast MLTs in Equations (107) and (108). The LDT generates a redundancy by a factor of only two, and was successfully used for anisotropic image restoration and enhancement in Kunz and Aach (1999) and Aach and Kunz (2000). In a combined image restoration, enhancement, and compression framework, the processed LDT coefficients can be reconverted into the coefficients of the MLT and the complementary MLT by a simple butterfly. Using only the MLT coefficients for compression eliminates the redundancy problem (Aach and Kunz, 2000).

FIGURE 11. Example 2D MLT and LDT basis functions for N = 8, and k = 3, l = 2: (a) MLT, (b) MLT′, (c) LDT (sum), (d) LDT (difference).
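The trigonometric identity behind Equations (109) and (110) — cos x cos y ± sin x sin y = cos(x ∓ y) — can be checked numerically; the sketch below (illustrative, names ad hoc) forms the 2D basis functions for one (k, l) pair:

```python
import numpy as np

N = 8
m = np.arange(2 * N)
h = np.sin((m + 0.5) * np.pi / (2 * N))
k, l = 3, 2                                   # frequency pair, as in Figure 11
a = np.pi / N * (m + (N + 1) / 2)
w = (2.0 / N) * np.outer(h, h)
cos2d = np.outer(np.cos(a * (k + 0.5)), np.cos(a * (l + 0.5)))   # 2D MLT, Eq. (107)
sin2d = np.outer(np.sin(a * (k + 0.5)), np.sin(a * (l + 0.5)))   # complement, Eq. (108)
# Eqs. (109)/(110): sum and difference collapse into single oriented waves.
P_plus = w * np.cos(a[:, None] * (k + 0.5) - a[None, :] * (l + 0.5))
P_minus = w * np.cos(a[:, None] * (k + 0.5) + a[None, :] * (l + 0.5))
assert np.allclose(w * (cos2d + sin2d), P_plus)
assert np.allclose(w * (cos2d - sin2d), P_minus)
```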
VI. IMAGE RESTORATION AND ENHANCEMENT

In this section we compare the block FFT and the LDT within a framework for anisotropic noise reduction by a nonlinear spectral domain filter. The noisy input image is first decomposed into blocks of size 32 × 32 pixels, which are then transformed by the FFT or the LDT. The observed noisy transform coefficients are then attenuated depending on their observed signal-to-noise ratio: the more the magnitude of a coefficient exceeds a corresponding noise estimate, the less it is attenuated. Since directional image information leads to spectral energy concentration, which can unambiguously be detected in both FFT and LDT (but not in real separable transforms, like DCT, LOT, and MLT), coefficients contributing to oriented lines and edges can be identified and treated more carefully than other ones. These algorithms are discussed in detail elsewhere (Aach and Kunz, 1996a, 1998, 2000; Aach, 2000; Kunz and Aach, 1999). Figure 12 shows an original image and its noisy version (white Gaussian noise, peak signal-to-noise ratio 20.2 dB). The processed images are shown in Figure 13. Evidently, processing by the FFT without block overlap reduces the noise level visibly, but the rather strong processing causes the block raster to appear. (In Aach and Kunz (1996a,b) the authors therefore used overlapping blocks, inflating the processed data volume by a factor of four.) The LDT-based processing result reduces noise approximately as much as the FFT-based approach, i.e., by about 6 dB, without causing block artifacts. Enlargements of both processing results are shown in Figure 14.

FIGURE 12. Left: Original "Marcel" image. Right: Noisy version with a peak signal-to-noise ratio of 20.2 dB.

FIGURE 13. Left: Processing result for the noisy "Marcel" image using the block FFT with no overlap. The blocking effect is evident. Right: Processing result for the noisy "Marcel" image using the LDT. The noise reduction performance is almost identical to the FFT-based algorithm, but the blocking effect has disappeared.

FIGURE 14. Enlarged versions of the FFT-processed (left) and LDT-processed (right) noisy "Marcel" image.
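A much-simplified sketch of the per-block spectral attenuation principle is given below. It is NOT the published algorithm: the gain rule and the parameter alpha are illustrative assumptions, and the anisotropic, orientation-sensitive estimators of the cited papers are omitted.

```python
import numpy as np

def denoise_block(block, noise_var, alpha=2.0):
    # Illustrative spectral-subtraction-style gain; the published algorithms use
    # more refined anisotropic estimators (Aach and Kunz, 1996a-2000).
    S = np.fft.fft2(block)
    power = np.abs(S) ** 2
    # The more a coefficient's power exceeds the noise estimate, the less it is attenuated.
    gain = np.maximum(1.0 - alpha * noise_var * block.size / np.maximum(power, 1e-12), 0.0)
    return np.real(np.fft.ifft2(gain * S))

rng = np.random.default_rng(0)
x = np.arange(32)
clean = np.outer(np.sin(2 * np.pi * 4 * x / 32), np.ones(32))   # oriented structure
noisy = clean + 0.3 * rng.standard_normal((32, 32))
out = denoise_block(noisy, noise_var=0.09)
assert np.mean((out - clean) ** 2) < np.mean((noisy - clean) ** 2)
```

Because the oriented structure concentrates its energy in few spectral coefficients, those coefficients pass nearly unattenuated while the noise floor is suppressed.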
VII. DISCUSSION

In this chapter we have summarized the development of lapped transforms. We started with the continuous-time and discrete-time Fourier transforms of time-dependent signals with infinite duration. These transforms were viewed as a decomposition of the signals into frequency-selective basis functions, or eigenfunctions of LTI systems. With the discrete Fourier transform, which decomposes a finite-length signal block into a set of orthogonal basis functions, a transform could be expressed as a multiplication of the signal vector by a unitary matrix, i.e., viewed as a rotation of coordinate axes. We then analyzed the effects of unitary transforms on the covariance structure of random signals, and found optimal transforms with respect to decorrelation and energy concentration. While these optimal transforms are signal dependent and cannot be calculated fast, we showed that Fourier-like fixed transforms, in particular the DCT, are good practical approximations to the optimal transforms. The disadvantage of blockwise processing is the blocking artifacts introduced by independent spectral-domain processing of the blocks. To alleviate the blocking effects, we then turned to finite-length transforms with overlapping basis functions. The transform matrix for a single block then is not square any more; inverse transforms of single blocks therefore do not exist. Under extended orthogonality conditions, however, it was shown that the original signal can be reconstructed from nonperfectly reconstructed individual blocks by overlapping and adding. Two types of lapped transforms were discussed, the lapped orthogonal transform and the modulated lapped transform, where we focused on a 2:1 overlap. For the LOT, a feasible rectangular matrix obeying the extended orthogonality conditions was first constructed using DCT basis functions. This matrix was optimized by multiplication with an orthogonal square matrix derived from an eigenvector analysis. The MLT did not need an eigenvector analysis; rather, it was based on modulated filter banks. We did not delve deeper into the relation between block transforms and filter banks. Suffice it to mention that a block transform can be viewed as a uniform, critically sampled filter bank, where the filter length is equal to the number of subbands. Similarly, a lapped transform can be regarded as a uniform and critically sampled filter bank with filter length equal to, for example, twice the number of subbands. We then discussed extensions of both the LOT and the MLT in speech and image processing. These extensions are based on the additional use of complementary basis functions, thus introducing redundancy. We concluded with an exemplary comparison of block and lapped transforms in image processing.
ACKNOWLEDGMENTS

The author is grateful to Cicero Mota, formerly with the University of Amazonas, Brazil, and now with the University of Lübeck, and to Dietmar Kunz, Cologne University of Applied Sciences, for fruitful discussions.
APPENDIX A

To prove that Equation (23) indeed recovers s(n) from its frequency coefficients, we multiply both sides of Equation (22) by e^{j(2π/N)kr}, sum over all frequency coefficients, and normalize by N, yielding

$$ \frac{1}{N} \sum_{k=0}^{N-1} S_{\mathrm{DFT}}(k)\, e^{j(2\pi/N)kr} = \sum_{n=0}^{N-1} s(n)\, \frac{1}{N} \sum_{k=0}^{N-1} e^{j(2\pi/N)(r-n)k}, \quad r = 0, \ldots, N-1 , \qquad (111) $$

where we have interchanged the order of summations on the right-hand side. The orthogonality of complex sinusoids,

$$ \frac{1}{N} \sum_{k=0}^{N-1} e^{j(2\pi/N)(r-n)k} = \begin{cases} 1 & \text{for } r-n = 0, \pm N, \pm 2N, \ldots \\ 0 & \text{otherwise} \end{cases} = \delta(n - (r - mN)) , \qquad (112) $$

yields

$$ \frac{1}{N} \sum_{k=0}^{N-1} S_{\mathrm{DFT}}(k)\, e^{j(2\pi/N)kr} = \sum_{n=0}^{N-1} s(n)\, \delta(n - (r - mN)) = s(r) , \qquad (113) $$

which concludes the proof.
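The orthogonality relation (112) and the resulting inversion (113) can be checked numerically (an illustration only; the DFT matrix is built explicitly):

```python
import numpy as np

N = 8
rng = np.random.default_rng(0)
s = rng.standard_normal(N)
k = np.arange(N)
W = np.exp(-2j * np.pi * np.outer(k, k) / N)     # DFT matrix
S = W @ s
assert np.allclose(W.conj() @ W / N, np.eye(N))  # orthogonality, Eq. (112)
assert np.allclose(W.conj() @ S / N, s)          # inversion, Eq. (113)
```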
APPENDIX B

Figure 15 shows a scalar uniform quantizer with quantization interval or step size Δ. A transform coefficient S(k) is quantized to multiples V(i(k)) = i(k)·Δ (Goyal, 2001, p. 13; Gray and Neuhoff, 1998). The output of the quantizer hence is an index i(k) = round(S(k)/Δ), where round(x) rounds to the nearest integer. The decoder calculates the quantized transform coefficient values by Ŝ(k) = i(k)·Δ = V(i(k)). Assuming sufficiently fine quantization, the error d(k) = S(k) − Ŝ(k) can be assumed as being

FIGURE 15. Uniform quantization into multiples of the step size Δ.
uniformly distributed between −Δ/2 and Δ/2. Defining the distortion D(k) as D(k) = E[d²(k)], where E denotes the expectation, we obtain

$$ D = \frac{\Delta^2}{12} . \qquad (114) $$
Consider S(k) uniformly distributed over [−S_max, S_max). Its energy σ_S²(k) then is

$$ \sigma_S^2(k) = \frac{(2 S_{\max})^2}{12} . \qquad (115) $$
Dividing the dynamic range [−S_max, S_max) into steps of step size Δ yields 2S_max/Δ quantization steps. Assuming the number of steps to be a power of two, a fixed-length code needs

$$ R = \log_2 \frac{2 S_{\max}}{\Delta} \qquad (116) $$
bits per transform coefficient. The distortion then depends on the rate R according to

$$ D = \sigma_S^2(k)\, 2^{-2R} \qquad (117) $$

and the signal-to-distortion ratio is

$$ \frac{\sigma_S^2(k)}{D} = 2^{2R} \;\Rightarrow\; 10 \log_{10} \frac{\sigma_S^2(k)}{D}\ \mathrm{dB} = R \cdot 6\ \mathrm{dB} . \qquad (118) $$
Each additional bit hence improves this ratio by 6 dB (Lüke, 1999, p. 204; Proakis and Manolakis, 1996, Sect. 9.2.3). In fact, it can be shown that optimal quantizers perform in accordance with (Goyal, 2001; Gray and Neuhoff, 1998)

$$ D = \gamma\, \sigma_S^2(k)\, 2^{-2R} \;\Rightarrow\; 10 \log_{10} \frac{\sigma_S^2(k)}{D}\ \mathrm{dB} = R \cdot 6\ \mathrm{dB} - 10 \log_{10}(\gamma)\ \mathrm{dB} , \qquad (119) $$

where γ is a factor depending on the distribution of the input signal and on the encoding method. For instance, for a Gaussian source and
fixed-length encoding of i(k), we have γ = √3·π/2 ≈ 2.7. Using an entropy code yields γ = πe/6 ≈ 1.4; this improves the signal-to-distortion ratio by about 2.8 dB over the fixed-length code (Goyal, 2001, p. 14; Jayant and Noll, 1984).
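The 6 dB-per-bit rule of Equations (114)–(118) is easy to reproduce by simulation (illustrative Monte Carlo sketch; names ad hoc):

```python
import numpy as np

rng = np.random.default_rng(0)
S_max = 1.0
S = rng.uniform(-S_max, S_max, 200000)       # uniformly distributed coefficients
for R in (4, 6, 8):
    delta = 2 * S_max / 2 ** R               # step size from the rate, Eq. (116)
    S_hat = np.round(S / delta) * delta      # index i(k) = round(S/delta), times delta
    D = np.mean((S - S_hat) ** 2)
    sdr_db = 10 * np.log10(np.var(S) / D)
    assert abs(D - delta ** 2 / 12) < 0.05 * delta ** 2 / 12   # Eq. (114)
    assert abs(sdr_db - 6.02 * R) < 0.5                        # Eq. (118)
```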
APPENDIX C

To eliminate the potential discontinuities in the periodic repetition of s(n), n = 0, ..., N−1, we form the concatenated signal of length 2N

$$ g(n) = \begin{cases} s(n) & \text{for } n = 0, \ldots, N-1 \\ s(2N-1-n) & \text{for } n = N, \ldots, 2N-1 . \end{cases} \qquad (120) $$
Figure 16 shows the concatenated signal g(n) for the cosine wave in Figure 3. Note that the last coefficient of s(n), i.e., s(N−1) = g(N−1), is repeated as g(N); the concatenation therefore is not a perfect cosine wave. Periodic repetition of g(n) will not exhibit unwanted discontinuities, so the DFT of g(n) should not be afflicted by leakage artifacts. Also, if s(n) is of even length, so is g(n), which is convenient when one wants to use fast FFT-like implementations. Moreover, g(n) is symmetric with respect to N − 0.5.

FIGURE 16. Concatenation g(n) of the cosine wave in Figure 3 and its mirrored version according to Equation (120). Note that always g(N−1) = g(N).

The DFT G(k), k = 0, ..., 2N−1, of g(n) should therefore be real apart from a complex linear phase factor e^{jπk/2N}, and even symmetric (recall that s(n) is assumed to be real). Indeed, we have for G(k)

$$ g(n) \;\circ\!\!-\!\!\bullet\; G(k) = \sum_{n=0}^{N-1} s(n)\, e^{-j(\pi k/N)n} + \sum_{m=N}^{2N-1} s(2N-1-m)\, e^{-j(\pi k/N)m} , \qquad (121) $$

which, after substituting n = 2N−1−m in the second sum, yields
$$ G(k) = \sum_{n=0}^{N-1} s(n) \left[ e^{-j(\pi k/N)n} + e^{\,j(\pi k/N)(n+1)} \right] . \qquad (122) $$

Factoring out the complex linear phase factor caused by the (N − 0.5)-point circular shift, we obtain
$$ G(k) = e^{\,j\pi k/2N}\, 2 \sum_{n=0}^{N-1} s(n) \cos\left( \frac{\pi k \left( n + \frac{1}{2} \right)}{N} \right) , \quad k = 0, \ldots, 2N-1 . \qquad (123) $$

Leaving off the complex exponential factor (this corresponds to a reverse circular shift of g(n) by N − 0.5 points) and normalizing to achieve a unitary transform leads to the DCT as defined in Equation (56). Because of the symmetry of the DFT coefficients, the coefficients for k = 0, ..., N−1 suffice. Figure 17 shows |G(k)| for the extended cosine wave in Figure 16; this is proportional to the modulus DCT spectrum of the signal in Figure 3. When comparing the spectra in Figures 3 and 17, the reduction of leakage is immediately evident. The DCT can hence be regarded as a DFT after modifying the signal so that discontinuities do not occur in the periodic extension. Another consequence is that the DCT can be computed efficiently using FFT algorithms where, from Equation (56), no complex number operations are needed any more. Exploiting the symmetry of the concatenated signal g(n), the 2N-point DFT G(k) can actually be computed by an N-point DFT (Lim, 1990, p. 153). The above observations also hold for the inverse DCT.
FIGURE 17. Modulus DFT of the extended cosine in Figure 16, for k = 0, ..., N−1 = 63, which is proportional to the DCT of the cosine wave in Figure 3. Note the improved concentration of spectral energy with respect to the DFT spectrum in Figure 3.
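Equations (120)–(123) can be verified numerically; the sketch below (illustrative, NumPy-based) mirrors a random signal, takes the 2N-point FFT, and removes the linear phase factor:

```python
import numpy as np

N = 16
rng = np.random.default_rng(1)
s = rng.standard_normal(N)
g = np.concatenate([s, s[::-1]])                 # mirrored concatenation, Eq. (120)
G = np.fft.fft(g)                                # 2N-point DFT
k = np.arange(2 * N)
phase = np.exp(1j * np.pi * k / (2 * N))         # linear phase of the (N - 0.5)-point shift
cosine_part = np.real(G / phase) / 2
direct = np.array([np.sum(s * np.cos(np.pi * kk * (np.arange(N) + 0.5) / N))
                   for kk in range(2 * N)])
assert np.allclose(cosine_part, direct)          # Eq. (123)
assert np.allclose(np.imag(G / phase), 0)        # G(k) is real apart from the phase factor
```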
APPENDIX D

With the notation

$$ \theta(k, m) = \left( m + \frac{N+1}{2} \right) \left( k + \frac{1}{2} \right) \frac{\pi}{N} , \qquad \varphi(k, n) = \left( n + N + \frac{N+1}{2} \right) \left( k + \frac{1}{2} \right) \frac{\pi}{N} , \qquad (124) $$

we have for the (m+1, n+1)th element of Q_0^T Q_1

$$ \left[ Q_0^T Q_1 \right]_{mn} = \frac{2}{N} \sum_{k=0}^{N-1} \cos \theta(k, m) \cos \varphi(k, n) = \frac{1}{N} \sum_{k=0}^{N-1} \left\{ \cos[\theta(k, m) + \varphi(k, n)] + \cos[\theta(k, m) - \varphi(k, n)] \right\} . \qquad (125) $$

With

$$ \psi(k) = \theta(k, m) + \varphi(k, n) = (m + n + 1) \left( k + \frac{1}{2} \right) \frac{\pi}{N} + \pi \pmod{2\pi} \qquad (126) $$
and 0 < m + n + 1 < 2N, the sum $\sum_{k=0}^{N-1} \cos \psi(k)$ extends over i full periods if m + n + 1 = 2i is an even number, and is thus zero. For m + n + 1 = 2i + 1 odd, the sequence cos ψ(k), k = 0, ..., N−1, is an odd sequence, and again sums to zero. Similarly, the sum over cos[θ(k, m) − φ(k, n)] is zero, which proves Equation (97). The entries of Q_0^T Q_0 are

$$ \left[ Q_0^T Q_0 \right]_{mn} = \frac{2}{N} \sum_{k=0}^{N-1} \cos \theta(k, m) \cos \theta(k, n) = \frac{1}{N} \sum_{k=0}^{N-1} \left\{ \cos[\theta(k, m) - \theta(k, n)] + \cos[\theta(k, m) + \theta(k, n)] \right\} , \qquad (127) $$

where

$$ \sum_{k=0}^{N-1} \cos[\theta(k, m) - \theta(k, n)] = \begin{cases} 0 & \text{for } m \neq n \\ N & \text{for } m = n \end{cases} \qquad (128) $$

and

$$ \sum_{k=0}^{N-1} \cos[\theta(k, m) + \theta(k, n)] = \begin{cases} 0 & \text{for } m + n \neq N-1 \\ -N & \text{for } m + n = N-1 , \end{cases} \qquad (129) $$

from which Equation (98) follows. The proof of Equation (99) is similar.
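The cosine sums of this appendix can be verified exhaustively for a small N (illustrative sketch; function names are ad hoc):

```python
import numpy as np

N = 8
k = np.arange(N)

def theta(m):                                    # angle of Q0, Eq. (94)
    return np.pi / N * (m + (N + 1) / 2) * (k + 0.5)

def phi(n):                                      # angle of Q1, Eq. (95)
    return np.pi / N * (n + N + (N + 1) / 2) * (k + 0.5)

for m in range(N):
    for n in range(N):
        assert abs(np.sum(np.cos(theta(m) + phi(n)))) < 1e-9    # proves Eq. (97)
        assert abs(np.sum(np.cos(theta(m) - phi(n)))) < 1e-9
        s128 = np.sum(np.cos(theta(m) - theta(n)))              # Eq. (128)
        assert np.isclose(s128, N if m == n else 0.0, atol=1e-9)
        s129 = np.sum(np.cos(theta(m) + theta(n)))              # Eq. (129)
        assert np.isclose(s129, -N if m + n == N - 1 else 0.0, atol=1e-9)
```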
REFERENCES

Aach, T. (2000). Transform-based denoising and enhancement in medical x-ray imaging. European Signal Processing Conference, EURASIP, Tampere, Finland, edited by M. Gabbouj and P. Kuosmanen, pp. 1085–1088.
Aach, T., and Kunz, D. (1996a). Anisotropic spectral magnitude estimation filters for noise reduction and image enhancement. Proc. ICIP-96, Lausanne, Switzerland, pp. 335–338.
Aach, T., and Kunz, D. (1996b). Spectral estimation filters for noise reduction in x-ray fluoroscopy imaging. Proc. EUSIPCO-96, Trieste, Italy, edited by G. Ramponi, G. L. Sicuranza, S. Carrato, and S. Marsi, pp. 571–574.
Aach, T., and Kunz, D. (1998). Spectral amplitude estimation-based x-ray image restoration: An extension of a speech enhancement approach, in Proc. EUSIPCO-98, Patras, edited by S. Theodoridis, I. Pitas, A. Stouraitis, and N. Kalouptsidis, pp. 323–326.
Aach, T., and Kunz, D. (2000). A lapped directional transform for spectral image analysis and its application to restoration and enhancement. Signal Processing 80(11), 2347–2364.
Ahmed, N., Natarajan, T., and Rao, K. R. (1974). Discrete cosine transform. IEEE Trans. Computers 23, 90–93.
Akansu, A. N., and Haddad, R. A. (2001). Multiresolution Signal Decomposition. Boston: Academic Press.
Akansu, A. N., and Wadas, F. E. (1992). On lapped orthogonal transforms. IEEE Trans. Signal Processing 40(2), 439–443.
Bamler, R. (1989). Mehrdimensionale lineare Systeme. Berlin: Springer Verlag.
Cantoni, A., and Butler, P. (1976). Properties of the eigenvectors of persymmetric matrices with applications to communication theory. IEEE Trans. Communications 24(8), 804–809.
Cappé, O. (1994). Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor. IEEE Trans. Speech and Audio Processing 2(2), 345–349.
Clarke, R. J. (1985). Transform Coding of Images. London: Academic Press.
Clarke, R. J., and Tech, B. (1981). Relation between the Karhunen–Loève and cosine transforms. IEE Proc. 128(6), 359–360.
Ephraim, Y., and Malah, D. (1984). Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Trans. Acoustics, Speech, and Signal Processing 32(6), 1109–1121.
Fukunaga, K. (1972). Introduction to Statistical Pattern Recognition. New York: Academic Press.
Goyal, V. K. (2001). Theoretical foundations of transform coding. IEEE Signal Processing Magazine, September, 9–21.
Gray, R. M., and Neuhoff, D. L. (1998). Quantization. IEEE Trans. Information Theory 44, 2325–2383.
Huang, J., and Schultheiss, P. (1963). Block quantization of correlated Gaussian random variables. IEEE Trans. Communication Systems 11, 289–296.
Jain, A. K. (1979). A sinusoidal family of unitary transforms. IEEE Trans. Pattern Analysis and Machine Intelligence 1(4), 356–365.
Jayant, N. S., and Noll, P. (1984). Digital Coding of Waveforms. Englewood Cliffs, NJ: Prentice Hall.
Kunz, D., and Aach, T. (1999). Lapped directional transform: A new transform for spectral image analysis. Proc. ICASSP-99, Phoenix, AZ, pp. 3433–3436.
Lim, J. S. (1980). Image restoration by short space spectral subtraction. IEEE Trans. Acoustics, Speech, and Signal Processing 28(2), 191–197.
Lim, J. S. (1990). Two-Dimensional Signal and Image Processing. Englewood Cliffs, NJ: Prentice-Hall.
Lim, J. S., and Oppenheim, A. V. (1979). Enhancement and bandwidth compression of noisy speech. Proc. IEEE 67(12), 1586–1604.
Lüke, H. D. (1999). Signalübertragung. Berlin, Heidelberg, New York: Springer Verlag.
Makhoul, J. (1981). On the eigenvectors of symmetric Toeplitz matrices. IEEE Trans. Acoustics, Speech, and Signal Processing 29(4), 868–872.
Malvar, H. (1999). A modulated complex lapped transform and its application to audio processing. Proc. ICASSP-99, Phoenix, AZ, pp. 1421–1424.
Malvar, H. S. (1992a). Extended lapped transforms: Properties, applications, and fast algorithms. IEEE Trans. Signal Processing 40(11), 2703–2714.
Malvar, H. S. (1992b). Signal Processing with Lapped Transforms. Norwood, MA: Artech House.
Malvar, H. S., and Staelin, D. H. (1989). The LOT: Transform coding without blocking effects. IEEE Trans. Acoustics, Speech, and Signal Processing 37(4), 553–559.
Oppenheim, A. V., and Schafer, R. W. (1998). Discrete-Time Signal Processing. Englewood Cliffs, NJ: Prentice Hall.
Papoulis, A. (1968). Systems and Transforms with Applications in Optics. New York: McGraw Hill.
Proakis, J. G., and Manolakis, D. G. (1996). Digital Signal Processing. Upper Saddle River, NJ: Prentice Hall.
Rabbani, M., and Jones, P. W. (1991). Digital Image Compression Techniques. Bellingham: SPIE Optical Engineering Press.
Ray, W. D., and Driver, R. M. (1970). Further decomposition of the Karhunen–Loève series representation of a stationary random process. IEEE Trans. Information Theory 16(4), 845–850.
Therrien, C. W. (1989). Decision, Estimation, and Classification. New York: Wiley.
Therrien, C. W. (1992). Discrete Random Signals and Statistical Signal Processing. Englewood Cliffs, NJ: Prentice-Hall.
Unser, M. (1984). On the approximation of the discrete Karhunen–Loève transform for stationary processes. Signal Processing 7, 231–249.
van Compernolle, D. (1992). DSP techniques for speech enhancement. Proc. Speech Processing in Adverse Conditions, Cannes-Mandelieu, pp. 21–30.
Young, R. W., and Kingsbury, N. G. (1993). Frequency domain motion estimation using a complex lapped transform. IEEE Trans. Image Processing 2(1), 2–17.
Zelinski, R., and Noll, P. (1977). Adaptive transform coding of speech signals. IEEE Trans. Acoustics, Speech, and Signal Processing ASSP-25(4), 299–309.
Ziemer, R. E., Tranter, W. H., and Fannin, D. R. (1989). Signals and Systems: Continuous and Discrete. New York: Macmillan.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 128
On Fuzzy Spatial Distances

ISABELLE BLOCH
Ecole Nationale Supérieure des Télécommunications, Département TSI, CNRS URA 820, 46 rue Barrault, 75013 Paris, France
I. Introduction
II. Some Views on Space and Distances
   A. Philosophy
   B. Linguistics
   C. Human Perception
   D. Cognition
III. Spatial Fuzzy Distances: General Considerations
   A. Spatial Fuzzy Sets
   B. Representation Issues
   C. Types of Distances and Problems
   D. General Principles for Defining a Fuzzy Distance
      1. Generalizing a Crisp Distance to a Fuzzy One
      2. Distances from Similarity
      3. Distances from Set Relationships
      4. Distances from Other Relationships
      5. Symbolic Approaches
   E. Properties of Distances and Requirements for Spatial Distances
IV. Geodesic Distance in a Fuzzy Set
   A. Fuzzy Geodesic Distance Defined as a Number
   B. Fuzzy Geodesic Distance Defined as a Fuzzy Number
V. Distance from a Point to a Fuzzy Set
   A. As a Number
   B. As a Fuzzy Number
VI. Distance between Two Fuzzy Sets
   A. Comparison of Membership Functions
      1. Functional Approach
      2. Information Theoretic Approach
      3. Set Theoretic Approach
      4. Pattern Recognition Approach
   B. Accounting for Spatial Distances
      1. Geometrical Approach
      2. Morphological Approach
      3. Tolerance-Based Approach
      4. Graph Theoretic Approach
      5. Histogram of Distances
VII. Spatial Representations of Distance Information
   A. Spatial Fuzzy Sets as a Representation Framework
   B. Spatial Representation of Distance Knowledge to a Given Object
Copyright © 2003 Elsevier Inc. All rights reserved. 1076-5670/2003 $35.00
VIII. Qualitative Distance in a Symbolic Setting
   A. Morpho-Logics
   B. Distances in a Qualitative Setting
IX. Conclusion
References
I. INTRODUCTION

The aim of this chapter is to discuss several definitions related to spatial fuzzy distances, based both on existing work and on new proposals, with respect to their properties, the type of information they represent, and the questions they allow us to answer. The wide literature on fuzzy similarities, dissimilarities, and distances is rather silent on methods dealing with spatial information, and, unfortunately, not all approaches are suited to this purpose. We restrict ourselves here to those that concern spatial information.

The interest in relationships between spatial objects has been highlighted in very different types of work: in vision, for identifying shapes and objects; in database management systems, for supporting spatial data and queries; in artificial intelligence, for planning and reasoning about the spatial properties of objects; in cognitive and perceptual psychology; and in geography, for geographic information systems. All these applications converge toward spatial reasoning. According to the semantic hierarchy proposed in [105], we consider here metric relationships (corresponding to level 4 of this hierarchy). Many authors have stressed the importance of topological relationships (which include part–whole relationships such as inclusion, exclusion, and adjacency), e.g., [2,3,48,52,103,132,135,153]. But distances and directional relative position (constituting the metric relationships) are also important, since positional information is a basic cognitive spatial concept that plays a central role in all applications where spatial knowledge is involved (see, e.g., [49,66,75,102,105,108,128]).

Vision and image processing usually make use of quantitative representations of spatial relationships. In a purely quantitative framework, spatial distances are well defined, but they require precise knowledge of the objects and of the types of questions we want to answer.
These two constraints can be relaxed in a semiqualitative framework, using fuzzy sets. This allows one to deal with imprecisely defined objects and with imprecise questions such as "are these two objects near each other?", and to provide evaluations that may themselves be imprecise, which is useful in several applications where spatial reasoning under imprecision has to be considered.
Fuzzy set theory finds in spatial information processing a growing application domain. This may be explained not only by its ability to model the inherent imprecision of such information (as in image processing, vision, and mobile robotics) together with expert knowledge, but also by the large and powerful toolbox it offers for dealing with spatial information under imprecision [17]. This is highlighted in particular when spatial structures or objects are directly represented by fuzzy sets.

If even less information is available, we may have to reason about space in a purely qualitative way, and a symbolic setting is then more appropriate. In artificial intelligence, mainly symbolic representations have been developed, and several works have addressed the question of qualitative spatial reasoning (see [155] for a survey). For instance, in the context of mereotopology, powerful representation and reasoning tools have been developed, but they are mainly concerned with topological and part–whole relationships, and very little with metric ones. This chapter contains a contribution on this aspect too, in the context of modal logics. Limitations of purely qualitative spatial reasoning have already been stressed in [66], as has the interest of adding semiquantitative extensions to qualitative values (as done in fuzzy set theory for linguistic variables [61,162]) for deriving useful and practical conclusions (as for recognition).

Purely quantitative representations are limited in the case of imprecise statements and of knowledge expressed in linguistic terms. Both quantitative and qualitative knowledge can be integrated using a semiquantitative (or semiqualitative) interpretation of fuzzy sets. As already mentioned in [73], this allows one to provide a computational representation and interpretation of imprecise spatial constraints expressed in a linguistic way, possibly including quantitative knowledge. The fuzzy set framework therefore appears central in this context.
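As a minimal sketch of such a semiquantitative interpretation (an illustration, not taken from the chapter: the function name and the thresholds 50 m and 200 m are invented, context-dependent choices), a linguistic value such as "near" can be encoded as a fuzzy membership function over distances:

```python
def mu_near(d: float, full: float = 50.0, zero: float = 200.0) -> float:
    """Degree to which a distance d (in meters) is considered 'near'.

    Full membership up to `full`, no membership beyond `zero`,
    linear decrease in between (a simple trapezoidal shape).
    """
    if d <= full:
        return 1.0
    if d >= zero:
        return 0.0
    return (zero - d) / (zero - full)

print(mu_near(30.0))   # 1.0  -> clearly near
print(mu_near(125.0))  # 0.5  -> somewhat near
print(mu_near(300.0))  # 0.0  -> not near
```

Changing the thresholds models the context dependence discussed above: "near" in a room and "near" on a map call for different breakpoints, while the same fuzzy machinery applies unchanged.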
Links between mathematical morphology operations (mainly dilation) and several types of distance are well established in the quantitative case (for crisp objects). These links can be exploited to define fuzzy spatial distances, or qualitative ones, based on fuzzy and logical dilations. This framework therefore allows one to represent spatial distances in a unified way in various settings: a purely quantitative one if objects are precisely defined, a semiqualitative one if objects are imprecise and represented as spatial fuzzy sets, and a qualitative one for reasoning about space in a logical framework. This is made possible by the strong algebraic structure of mathematical morphology, which finds equivalents in set-theoretical terms, fuzzy operations, and logical expressions. Several definitions will therefore be proposed based on mathematical morphology operations.

This chapter is organized as follows. In Section II we present some views on space and distances in different domains, not necessarily technical ones.
In Section III we present the general framework for defining fuzzy distances: we define what we call spatial fuzzy objects, and then discuss the different possible representations of distances between imprecisely defined objects; general principles for defining fuzzy distances are summarized, and finally the properties required when dealing with spatial information are discussed. The problem of defining the distance between two points of a fuzzy set in a geodesic sense is addressed in Section IV. In Section V we define the distance from a point to a fuzzy set, based on dilation. An important section (Section VI) is dedicated to distances between fuzzy sets; a classification is proposed, along with a discussion of their ability to handle spatial information. We then propose a spatial representation of distance information relative to a given fuzzy set in Section VII. Finally, purely qualitative distances are modeled in a logical framework in Section VIII, using a modal logic defined from morphological operators applied to logical formulas.
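The dilation–distance link invoked above can be sketched in the crisp, discrete case (an illustrative toy implementation under assumed conventions, not the chapter's own code): on a binary grid, the discrete distance from a point x to a set X equals the smallest n such that x belongs to the n-fold dilation of X by the unit structuring element (4-connectivity here, which yields the city-block distance).

```python
def dilate(X, shape):
    """One dilation of a set of pixels X by the 4-connected unit ball."""
    out = set(X)
    for (i, j) in X:
        for (di, dj) in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < shape[0] and 0 <= nj < shape[1]:
                out.add((ni, nj))
    return out

def distance_to_set(x, X, shape):
    """d(x, X) = min { n >= 0 : x in dilation^n(X) }."""
    current, n = set(X), 0
    while x not in current:
        current, n = dilate(current, shape), n + 1
    return n

X = {(2, 2)}                                # a single-pixel object
print(distance_to_set((2, 5), X, (8, 8)))   # -> 3 (city-block distance)
```

Replacing the crisp dilation by a fuzzy or logical one is exactly the move that extends this construction to the semiqualitative and qualitative settings discussed in the chapter.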
II. SOME VIEWS ON SPACE AND DISTANCES
The issues of the perception and representation of space and of spatial relationships like distances, and the issue of spatial reasoning, have been addressed by researchers in many communities. This can be partly explained by the fact that spatial knowledge is fundamental to common-sense knowledge. In this section we summarize some views on space and distances. They are but some examples, and should definitely not be considered an exhaustive review.

A. Philosophy

Philosophical thinking about space and spatial concepts was influenced by various theories and beliefs related to cosmology, science, and religion. In this section, we point out a few philosophical views of space.

From Pythagoras to Zeno, the concept of space was linked to the first developments in arithmetic and Pythagorean geometry. Zeno's famous paradox highlights the limits of the Pythagorean space based on the possibility of infinite subdivision. Democritus tried to overcome these difficulties by reducing space to the infinite empty room which surrounds atoms. Plato and Aristotle did not accept this mechanism. Plato considered the concept of receptacle as the original cosmic space. Aristotle identified space with the places and limits of bodies. The Stoics and Epicureans, on the contrary, saw space as an infinite and corporeal extension, which extends outside the limits of the world.
With the development of the modern mathematical science of nature, the concept of space became more substantial and gained in autonomy. René Descartes (1596–1650) considered that spatial extension is specific to material entities, governed only by the laws of mechanics. Isaac Newton (1642–1727) introduced absolute space and the "sensorium Dei," which was later criticized by Leibniz. David Hume (1711–1776) reduced space to a pure psychological function, leading Kant to advocate in favor of the objectivity of space. Samuel Clarke (1675–1729), Immanuel Kant (1724–1804), and Gottfried Wilhelm Leibniz (1646–1716) debated the intrinsic nature of space. Leibniz's arguments were that:
God does not need a sense organ to perceive objects (this view was in opposition to Newton's view of space as God's boundless uniform sensorium); space cannot be an absolute reality; motion and position are real and detectable only in relation to other objects, not in relation to space itself, since space itself represents no object (space then becomes an unnecessary hypothesis).

Clarke replied to the last point and argued that motion is detectable in relation to space itself, for an object accelerating or rotating alone in a void betrays the effect of forces that exist in relation to no other object.1

1. http://www.friesian.com/space.htm

Kant considered space as absolute. This viewpoint was accepted until the emergence of relativity. He argued that asymmetrical objects and their mirror-imaged counterparts are genuinely and physically different. No rotation in 3D space can turn one into the other (they could be rotated into each other in 4D space), which shows that space itself is real and independent of the objects. But he proposed reconciling Newton and Leibniz by considering that space is absolute and real for objects in experience, but really nothing among things in themselves. He considered that Euclid's axioms of geometry are not logically necessary (and could be denied), but that they are known prior to experience and depend on our intuition of space. He developed antinomies to show that contradictory arguments can be advocated to consider space and time as finite or infinite (neither one nor the other can be proved) [96].

Hermann Helmholtz (1821–1894) [85] considered space as the "necessary form of outer intuition" prior to experience. He also related the perception of space to movement: the spatial ordering of objects is perceived through moving sensations. He pointed out the disagreement between the conception of intuition and the training (in analytical methods, perspective constructions, optical phenomena) required to represent spatial relationships in meta-mathematical spaces. For him, Kant's proof of the transcendental nature of the geometrical axioms is untenable, and he considered these axioms to be subject to proof or disproof by experience.

Henri Poincaré (1854–1912) [130] had an empiricist point of view and considered that spatial knowledge is mainly derived from motor experience. In this respect, localizing a spatial entity means mentally representing the movements necessary to reach it. But brain maturation also plays an important role. Poincaré argued in favor of the relativity of space.2 More specifically, concerning distances, he claimed that we cannot say that we know the distance between two points, since it can undergo strong variations that we may not perceive if other distances vary in the same proportions. There is no direct intuition of space, of distance, of magnitude, but only relations to a measuring instrument, in particular our own body. This is related to the understanding of space from motor experience mentioned above. A point is thus defined as the succession of movements required to reach it. Moreover, space is not homogeneous, since different points cannot be considered equivalent if the cost to reach them is not the same.

Henri Bergson (1859–1941) addressed the problem of space and time through the notion of multiplicity. He considered two types of multiplicity: a numerical one, which implies space as one of its conditions, and a qualitative one, which implies time as one of its conditions. As opposed to Kant, he considered an ideal space not as a property of things but as an intellectual synthesis. Intuition gives access to pure duration, in opposition to spatialized time. One of Bergson's theses is that a position in space can be considered as an instantaneous cut of the movement, but the movement is more than a sum of positions in space.
But this goes even further: another thesis is that a movement is a cut in time, and should be seen from a temporal perspective rather than a spatial one.

Albert Einstein (1879–1955) considered that geometry is linked to the sensible and perceptible space. The geometrical configuration of the world itself becomes relative: it depends on the distribution of masses and on their speeds, and is better described by a non-Euclidean, Riemannian geometry [68]. Because of Einstein's scientific revolution, the concept of space became a real topic of debate between philosophers and scientists, and several attempts were made to reconcile a philosophical explanation of space with results from physics.
2. http://www.marxists.org/reference/subject/philosophy/works/fr/poincare.htm
In the meantime, purely philosophical views of space were developed by the phenomenologists and the existentialists: Edmund Husserl (1859–1938), Jean-Paul Sartre (1905–1980), and Martin Heidegger (1889–1976). Many other philosophers have considered the question of space and distance. Mentioning all schools of thought is outside the scope of this chapter, although further investigation in this direction could certainly be both fascinating and helpful.
B. Linguistics

Natural languages usually offer a rich variety of lexical terms for describing the spatial location of entities. These terms are not only numerous, they also concern all lexical categories, such as nouns, verbs, adjectives, adverbs, and prepositions [4].

The domain of linguistics is a source of inspiration for many works on qualitative spatial information representation and qualitative spatial reasoning [49]. Modeling qualitative spatial relations relies strongly on the way these relations are expressed verbally. Several properties are exhibited, such as the asymmetry of some expressions, the non-bijective relation between language and spatial concepts (in particular for prepositions [4,86,152]), and the interaction between distances and orientation [86,147].

Another important characteristic of linguistic expressions is the imprecision attached to ternary (or higher-order) spatial relations (for instance, being among people), but also to binary ones. Usually the context allows one to deal with such expressions, and the linguistic statements are generally clear and informative enough to prevent misunderstanding. A remarkable feature is that representation and communication are then achieved without using numbers [4]. Conversely, apparently precise statements (for instance, those containing crisp numbers) should not always be understood as really precise, but rather as orders of magnitude. Consider, for instance, the sentence "Paris and Toulouse are at a distance of 700 km." The number 700 should not be taken as an exact value. It gives an idea of the distance, and its interpretation is subject to considerations such as which areas of Paris and Toulouse are really concerned and the way one travels from one city to the other. Overly precise statements can even become inefficient if they make the message too complex. This appears typically in the problem of route description for helping navigation and pathfinding.
The example of giving directions in Venice is particularly eloquent [58].
Moreover, the way of describing spatial situations and the vision and representation of space are not fixed, and are likely to be modified depending on perceptual data and on the discourse situation [4]. In linguistic statements about space and distance, the geometrical terms of the language involved are usually not sufficient to convey a clear meaning. The context of the statement is also of prime importance, as are the functional properties of the physical entities considered. This appears, for instance, in the use of prepositions, where the shape of an object influences the interpretation of a preposition preceding the name of that object. In [4], three levels are therefore distinguished for analyzing and representing the meaning of spatial expressions:
- the geometrical level, which concerns the objective space;
- the functional level, which accounts for the properties of the entities described in the text and for the nongeometrical relations;
- the pragmatic level, which includes the underlying principles for good communication.
Languages exhibit strong differences in their spatial terms. This concerns the way space is partitioned, the terms describing motion events, and the preferred lexical categories. For instance, French, like other Romance languages, shows a typological preference for lexicalizing the path in the main verb. In Germanic and Slavic languages, on the contrary, the path is rather encoded in satellites associated with the verb (a particle or prefix) [148]. Another subdivision considers a linguistic expression as composed of a theme, a frame of reference, and a medium. The medium can typically be related to distance, quality, etc. Three levels are then distinguished [59]:
- thematic segmentation, involving topology and qualitative thresholds (for instance, "close to"), with a possible multiscale aspect;
- pseudo-analog representation, involving a metric;
- knowledge.

The multiscale aspect allows us to deal with different levels of granularity. This overcomes some of the limits of approaches that have a fixed granularity and cannot properly manage both large-scale and close-range information, and of approaches that deal with infinitesimals but run into Zeno's paradox.

An interesting point worth mentioning to conclude this section is the importance of spatial metaphors in several natural languages, which allow us to communicate knowledge and information that would otherwise be difficult to convey.
It is also interesting to consider how space is used in sign language. Two types of iconicity appear: one, called imagistic, where the signing space is used directly for the spatialization of objects with respect to the ground and to some landmarks; and a second, called diagrammatic, which does not reproduce the real space but rather conceives and constructs it as a diagram. Both types can be combined in the use of space [56,140].
C. Human Perception

A number of factors influence the perception of distance, leading to different measures [49]:
- Purely spatial measures, in a geometric sense, give rise to "metric distances" and are related to intrinsic properties of objects. Note that these characteristics involve not only purely geometrical distances, but also topological, size, and shape properties of objects.
- Temporal measures lead to distances expressed as travel time, and can be considered of extrinsic type, as opposed to the previous class. This argues for treating space and time together (which will not be done in this chapter).
- Economic measures, in terms of costs to be invested, are also of extrinsic type.
- Perceptual measures lead to distances of deictic type. They are related to an external point of view, which can be concrete or just a mental representation, and can be influenced by environmental features and subjective considerations, leading to distances that are not necessarily symmetrical. The discourse situation also plays a role at this level, as mentioned above for the linguistic aspects.

As mentioned in [74,83], the perception of distance between objects also depends on the presence or absence of other objects in the environment. If there are no other objects, perception and human reasoning are mainly of geometrical type and distances are absolute. When there are other objects, on the contrary, the perception of distance becomes relative.

The size of the area and the frame of reference also play a crucial role in the perception of distances [49], in particular by defining the scale and the upper bound of the perceived distances. Perception is therefore not scale independent [123], while language is to a large extent scale independent [147].
Finally, attractiveness of the objects strongly affects the perception of proximity [83].
D. Cognition

Spatial reasoning often has to deal with both quantitative measures and qualitative statements. The advantage of quantitative measures lies in their absolute meaning; qualitative information, on the contrary, is dependent on the context. However, qualitative information is easily handled by humans, is often more meaningful and eloquent, and is therefore preferred [49,83]. This raises the question of the links between quantitative data and qualitative information, which is largely addressed in the fuzzy set community.

The dependence of qualitative information on context is particularly obvious where spatial distances are concerned. The meaning of "A is far from B" depends on the relative sizes of A and B, on other scale factors, on the type of deduction and consequence we expect from this statement (for instance, how I can go from A to B), etc. The translation of this statement into a quantitative value can lead to completely different results depending on this context. For instance, saying that my preferred bookstore is far from my house can be understood as a distance of about a few hundred meters to a few kilometers, while saying that my cousin lives far away can be understood as a distance of about a few hundred to a few thousand kilometers. Such difficulties are related to the linguistic aspects mentioned before, as well as to the subjectivity of perception (in particular concerning the attractiveness of the objects).

The cognitive understanding of a spatial environment, in particular in large-scale spaces, arises from two types of processes [41,49,84]:
- route knowledge acquisition, which consists in learning from sensorimotor experience (i.e., actual navigation) and implies order information between visited landmarks;
- survey knowledge acquisition, from symbolic sources such as maps, leading to a global view ("from above") including global features and relationships, which is independent of the order of landmarks.

During development and growth, children first acquire route knowledge and have a local perception and representation of space. They acquire survey knowledge later, when they become able to perceive space from a more global perspective. The ability to reason about metrical properties of space comes at a very late stage of development [13,67,129,143]. The mapping between spatial words and basic spatial concepts does not appear to be universal, and languages differ in their partitioning of space.
Children are able to distinguish between the different spatial categories of their own language at the age of 18–24 months. Such differences between languages can also be observed in the representation of motion events.

These two processes can be observed in neuroimaging [121]. For instance, a right hippocampal activation can be observed for both mental navigation and mental map tasks. A parahippocampal gyrus activation is additionally observed only for mental navigation, when route information and object landmarks have to be incorporated. Moreover, mental simulation by a subject before reproducing a path from memory affects both map-like and route-like representations of the environment, and allows the subject to better reproduce the path [154]. This is mostly observed for simple shapes, suggesting that the internal representation of space depends on geometric properties of the environment. Experiments on sensory conflicts between visual and nonvisual information have been performed in [107] and show that either visual or nonvisual information can be used according to the task and the sensory context. There are therefore at least two cognitive strategies of memory storage and retrieval for the mental simulation of the same path.

As for the internal representation of space in the brain, a distinction is usually made between egocentric and allocentric representations [49,124]. Although the notion of a "map in the head" has recognized limitations as a cognitive theory, it is still quite popular, and corresponds to the allocentric representations. It is important to note that the psychological space need not mirror the physical space. As shown in [9], the egocentric route strategy requires memory of the movements associated with landmarks and episodes (kinesthetic memory).
Solving Pythagoras's theorem from memory is possible using vestibular information, but requires one to convert an egocentric representation into an allocentric one. The mental representation is also combined with other factors in cognitive processes about space. For instance, a question such as "where am I?" can find different answers corresponding to [125]:
- autobiographical memory;
- semantic memory;
- stress and emotion;
- egocentric spatial representation.
Cognitive studies report that distance and direction are quite dissociated. On the contrary, as mentioned for perception, from a cognitive point of view time and space cannot be easily separated.
The importance of the frame of reference, highlighted in all domains, also has a cognitive flavor: cognitive studies have shown that multiple frames of reference are usually used and appear necessary for understanding and navigating a spatial environment [49,104]. Changes of viewpoint are also strongly involved in social interactions, and are required in order to understand and memorize where others are glancing [9].

These cognitive concepts have been used intensively in several works on the modeling and conception of geographic information systems (GIS), where spatial information is the core [118,128]. Let us mention just two examples. In [110,141], a fuzzy cognitive map framework is introduced for GIS, inspired by the cognitive aspects of space and spatial relationships. It aims at integrating quantitative and qualitative data, taking into account the fuzziness of relations such as spatial distances, and at providing decision support producing cognitive descriptions similar to those a human expert could derive and use. Another example is the geocognostics framework proposed in [67], which aims at integrating into a common framework both formal geometric representations of spatial information and formal representations of cognitive processes. The idea is to express views and trajectories in cognitive terms and then to reinterpret them geometrically.

Another field where cognitive aspects of space inspire the development of frameworks and systems is mobile robotics. The work by Kuipers is fundamental in this respect [103–105]. His spatial semantic hierarchy is a model of knowledge of large-scale space including both qualitative and quantitative representations, and is strongly inspired by the properties of the human cognitive map. It aims at providing methods for robot exploration and map building. The hierarchy consists of sensory, control, causal, topological, and metrical levels.
As mentioned in the introduction, we are concerned in this chapter mainly by the last level. A new approach was proposed in [76], called conceptual spaces. These can be considered as a representation of cognitive systems, intermediate between the high-level symbolic representations and the subconceptual connectionist representations [1,77]. They emphasize orders and measures, and a key notion is distances between concepts, leading to geometrical representations, but using quality dimensions. They offer a nice and natural way to model categories, to express similarities. Distances are therefore put to the fore in such spaces. Ga¨rdenfors shows that ‘‘a conceptual mode based on geometrical and topological representations deserves at least as much attention in cognitive science as the symbolic and
ON FUZZY SPATIAL DISTANCES
the associationistic approaches,’’ and his book is therefore about the ‘‘geometry of thoughts’’ [77].
III. SPATIAL FUZZY DISTANCES: GENERAL CONSIDERATIONS

A. Spatial Fuzzy Sets

In this chapter we deal with spatial information, represented by specific fuzzy sets that model spatial objects and the imprecision attached to them. They are defined as follows. Let us denote by S the spatial domain (usually ℝⁿ, or ℤⁿ in the discrete case). We denote by x, y, etc., the spatial variables, i.e., points of S (called pixels or voxels in the discrete case). We denote by dS(x, y) the spatial distance between two points x and y of S (related to the Cartesian space they belong to and independent of their membership of any possible fuzzy set). Generally dS is taken as the Euclidean distance on S. A crisp object is, as usual, a subset of S. Similarly, a fuzzy object is defined as a fuzzy subset of S. A fuzzy object is defined bi-univocally by its membership function, denoted by Greek letters (μ, ν, etc.). A membership function characterizing a fuzzy object is therefore a function, say μ, from S into [0, 1]. For each x in S, μ(x) is a value in [0, 1] which represents the membership degree of the point x to the fuzzy set μ. Such a representation allows for a direct representation of the spatial information. We denote by F the set of all fuzzy sets defined on S. For any two fuzzy objects μ and ν, we denote by d(μ, ν) their distance. The definition of distances between fuzzy objects is the main scope of this chapter. We will also briefly address the question of defining the distance from a point to a fuzzy set, and the distance between two points of a fuzzy set, in a geodesic sense. Since we are mainly interested here in the type of information that is included in the various distance definitions, we assume that the fuzzy sets satisfy the necessary properties such that all mathematical expressions are well defined. For instance, in the continuous case, several definitions assume that the membership functions are Lebesgue integrable. This will not be specified in the following.
Moreover, in most cases we will restrict the discussion to the discrete bounded case (i.e., membership functions defined on Zn and having a bounded support), since this is the most useful case in applications such as image processing, mobile robotics, and geographic information systems.
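As a small illustration of these definitions, a spatial fuzzy object on a discrete bounded domain can be represented simply as an array of membership degrees; the grid, names, and values below are our own illustrative choices, not the chapter's.

```python
import numpy as np

# A fuzzy spatial object on a 5x5 discrete domain S (a subset of Z^2):
# each cell holds the membership degree mu(x) in [0, 1].
mu = np.zeros((5, 5))
mu[1:4, 1:4] = 0.5   # imprecise boundary zone
mu[2, 2] = 1.0       # core of the object

def d_S(x, y):
    """Euclidean distance between two points of S (independent of mu)."""
    return float(np.hypot(x[0] - y[0], x[1] - y[1]))

support = {tuple(p) for p in np.argwhere(mu > 0)}    # Supp(mu): 9 points
core = {tuple(p) for p in np.argwhere(mu == 1.0)}    # points with mu = 1

print(d_S((0, 0), (3, 4)))       # 5.0
print(len(support), len(core))   # 9 1
```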
ISABELLE BLOCH
B. Representation Issues

Since the spatial objects we consider are imprecisely defined and therefore represented as fuzzy sets, there are several options to represent distances between such objects. Although the most common representation is to consider a distance as a number in ℝ⁺ (or more specifically in [0, 1] for some definitions), different representations may be found more suitable for representing imprecision: if the objects are imprecise, we may expect that the distance between them is imprecise too. This argument is advocated in particular in [62,138], and also in [20,120]. Then the distance is better represented as a fuzzy set, and more precisely as a fuzzy number or a fuzzy interval (a convex upper semicontinuous fuzzy set on ℝ⁺ having a bounded support). In [138], Rosenfeld defines two concepts that will be used in the following: the distance density, denoted by δ(μ, ν), and the distance distribution, denoted by Δ(μ, ν), both being fuzzy sets on ℝ⁺. They are linked together by the following relation:

$$\Delta(\mu, \nu)(n) = \int_0^n \delta(\mu, \nu)(n') \, dn'. \qquad (1)$$
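In the discrete case, the link of Eq. (1) reduces to a cumulative sum; a small sketch with an illustrative density (the sample values are ours):

```python
import numpy as np

# Distance density delta(mu, nu)(n): degree to which the distance equals n,
# sampled on n = 0, 1, ..., 5 (an illustrative fuzzy number, not from the text).
density = np.array([0.0, 0.2, 0.6, 0.2, 0.0, 0.0])

# Distance distribution Delta(mu, nu)(n) = integral from 0 to n of the
# density (Eq. (1)); on a unit-step grid this is a cumulative sum, giving
# the degree to which the distance is less than n.
distribution = np.cumsum(density)

print(distribution.round(3))
```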
While the distance distribution value Δ(μ, ν)(n) represents the degree to which the distance between μ and ν is less than n, the distance density value δ(μ, ν)(n) represents the degree to which the distance is equal to n. A simplified representation, as intervals for instance, can also be considered. It may be easier to handle while keeping some information on imprecision through the length of the interval. The concept of distance can also be represented as a linguistic variable. This assumes a granulation [160] of the set of possible distance values into symbolic classes such as near and far, each of these classes being defined as a fuzzy set. This approach has been taken, for example, in [8,12,83,102]. Spatial relations are then defined as restrictions on linguistic variables (e.g., [83]). They can then be used to produce scene descriptions automatically using fuzzy rules (e.g., [98]). Finally, purely symbolic expressions can be used as logical formulas.

C. Types of Distances and Problems

Several problems can be addressed where fuzzy distances are concerned. We distinguish three of them:
- distances between two points in a fuzzy set,
- distances from a point to a fuzzy set,
- distances between two fuzzy sets.
The first type of distance is the least treated in the literature. In the crisp case, this kind of distance is widely used in classical image processing and pattern recognition [142]. The definition of its fuzzy equivalent should lead to the design of new tools for generalizing classical methods when imprecision in structures and images has to be taken into account. We proposed in [15] to define a distance between two points in a fuzzy set as a fuzzy generalization of the concept of geodesic distance in a crisp set, by introducing fuzzy connectivity. Typical applications in fuzzy spatial information processing consist in finding the best path, in the geodesic sense, in a spatial fuzzy set representing some objective function (satisfiability of a property, security areas around objects, etc.). The fuzzy geodesic distance is also the basis for fuzzy geodesic operators, e.g., morphological ones [18,21]. This type of distance is detailed in Section IV. Distances from a point to a fuzzy set have not received much attention in the literature, although they are useful in several domains: they can be used for classification purposes, where a point has to be attributed to the nearest fuzzy class, and the distance from a point to the complement of a fuzzy set μ provides the basic information for computing a fuzzy skeleton of μ. Additionally, they may serve as a basis for defining distances between two fuzzy sets. We defined such distances based on fuzzy mathematical morphology in [14]. They are mentioned in Section V. The main focus of this chapter is the third kind of distance (between two fuzzy sets), and extends previous work [20]. It is the most widely addressed in the literature, but not often in the context of spatial objects. The specificities of spatial information call for a study of the existing definitions in terms of the spatial properties they include, and even for the definition of new ones.
Applications of such distances cover a very large field, including image registration, assessment of relationships between image components, comparison of imprecise spatial objects, structural pattern recognition, etc. Roughly speaking, these applications can be grouped into two classes. The first class deals with distances dedicated to the comparison of shapes, these shapes being possibly contained in different images, or representing one image object and one model object. The concerned applications are related to registration and to recognition. The second class deals with distances between two objects in the same space, and provides measures for quantifying how far one object is from the other. It can also serve for model-based pattern recognition, as a relationship between image (respectively model) objects. For instance, if we consider a graph-based recognition method, where the objects of the scene are the nodes of the graph, then
distances of the first class provide a way to compare nodes in two graphs, while distances of the second class can be considered as attributes of the arcs between two nodes in each graph [7,127]. While in the crisp case geodesic distance and distance from a point to a compact set are well defined, several definitions exist for distances between two sets. The main ones are the following.
Nearest point or minimum distance:

$$d_N(X, Y) = \inf_{x \in X, \, y \in Y} d_S(x, y), \qquad (2)$$

where X and Y are two crisp subsets of S (in the finite case, the infimum is replaced by a minimum).

Maximum distance:

$$d_M(X, Y) = \sup_{x \in X, \, y \in Y} d_S(x, y). \qquad (3)$$

Average distance:

$$d_A(X, Y) = \frac{1}{|X| \, |Y|} \sum_{x \in X, \, y \in Y} d(x, y), \qquad (4)$$

or in a different form:

$$d_{A'}(X, Y) = \frac{1}{|X|} \sum_{x \in X} d(x, Y) + \frac{1}{|Y|} \sum_{y \in Y} d(y, X), \qquad (5)$$

where d(x, Y) denotes the distance from x to Y:

$$d(x, Y) = \inf_{y \in Y} d_S(x, y). \qquad (6)$$

Hausdorff distance:

$$d_H(X, Y) = \max \left[ \sup_{x \in X} d(x, Y), \; \sup_{y \in Y} d(y, X) \right]. \qquad (7)$$
Note that only the Hausdorff distance is a true distance, satisfying all properties of a metric (see Section III.E). In all these definitions, the objects are supposed to be given, and the aim is to evaluate the distance between them.
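These classical definitions can be sketched directly for finite point sets; a minimal sketch (the function names are ours):

```python
import numpy as np

def d_S(x, y):
    """Euclidean distance between two points."""
    return float(np.linalg.norm(np.asarray(x) - np.asarray(y)))

def nearest_point(X, Y):        # Eq. (2)
    return min(d_S(x, y) for x in X for y in Y)

def maximum(X, Y):              # Eq. (3)
    return max(d_S(x, y) for x in X for y in Y)

def average(X, Y):              # Eq. (4)
    return sum(d_S(x, y) for x in X for y in Y) / (len(X) * len(Y))

def point_to_set(x, Y):         # Eq. (6)
    return min(d_S(x, y) for y in Y)

def hausdorff(X, Y):            # Eq. (7)
    return max(max(point_to_set(x, Y) for x in X),
               max(point_to_set(y, X) for y in Y))

X = [(0, 0), (1, 0)]
Y = [(4, 0), (5, 0)]
print(nearest_point(X, Y))  # 3.0
print(maximum(X, Y))        # 5.0
print(average(X, Y))        # 4.0
print(hausdorff(X, Y))      # 4.0
```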
Another type of question is the satisfaction of some given distance relationship between two objects or with respect to a given object. For instance, we may want to answer questions such as: to what extent are two objects near to each other? or which are the areas of the space that are at a distance of about 10 from a given object? The first type of question only requires a comparison measure between the fuzzy distance and the fuzzy set defining the semantics of ‘‘near.’’ On the contrary, the second type of question requires a completely different approach, since only one set is known. We proposed in [22] spatial representations of spatial relations in order to solve such problems. This approach is detailed for the specific case of distances in Section VII. Finally, if little information is available, it may be useful to handle distance information in a completely qualitative way, using symbolic approaches such as formal logics. This will be addressed in Section VIII, following the formalism proposed in [23,24].
D. General Principles for Defining a Fuzzy Distance

In this section we briefly summarize the main approaches that can be followed in order to define a fuzzy distance. These include:
- approaches that rely on the definition of a crisp distance and try to generalize it,
- approaches that infer a distance from a similarity function,
- approaches that deduce a distance from set relationships between the two sets (or other types of relationships),
- symbolic approaches.
1. Generalizing a Crisp Distance to a Fuzzy One

We first consider the class of approaches that define a fuzzy distance by extending a given crisp distance. They belong to the general problem of extending a relationship RB between two binary objects to its fuzzy equivalent R (a fuzzy relationship between two fuzzy objects). Instantiations of the described methods for the case of distance are provided in Sections VI.A and VI.B.

From α-cuts. A way to define crisp sets from a fuzzy set consists in taking the α-cuts of this set. Therefore one class of methods relies on the application of the relationship RB to each α-cut. This gives rise to two different ‘‘fuzzification’’ methods in the literature.
The first fuzzification method consists in ‘‘stacking’’ the results obtained with binary operations on the α-cuts: the fuzzy equivalent R of RB is defined as (see, e.g., [29,60,102]):

$$R(\mu, \nu) = \int_0^1 R_B(\mu_\alpha, \nu_\alpha) \, d\alpha, \qquad (8)$$

where μα denotes the α-cut of μ, or by a double integration as:

$$R(\mu, \nu) = \int_0^1 \int_0^1 R_B(\mu_\alpha, \nu_\beta) \, d\alpha \, d\beta. \qquad (9)$$

Other fuzzification equations are possible, like:

$$R(\mu, \nu) = \sup_{\alpha \in [0,1]} \min\left(\alpha, R_B(\mu_\alpha, \nu_\alpha)\right) \quad \text{or} \quad R(\mu, \nu) = \sup_{\alpha \in [0,1]} \left(\alpha \, R_B(\mu_\alpha, \nu_\alpha)\right), \qquad (10)$$

the first of these equations being meaningful if RB takes values in [0, 1]. This approach has been applied to the definition of several fuzzy operations, for instance connectivity [137], fuzzy mathematical morphology [29], fuzzy adjacency [30], and of course distances [14,28,60], as will be seen later. As mentioned in [40,71] for instance, this approach has to be used with care in case of empty α-cuts. The second fuzzification method is the extension principle [162], which leads in the general case to a fuzzy number (instead of a crisp number):

$$\forall n \in V(R_B), \quad R(\mu, \nu)(n) = \sup \{ \alpha \in [0,1], \; R_B(\mu_\alpha, \nu_\alpha) = n \}, \qquad (11)$$
where V(RB) denotes the image of RB, i.e., the set of values taken by RB (ℝ⁺ or [0, 1] in the case of distances).

Translating binary equations into fuzzy ones. Another way to proceed, in order to derive a fuzzy definition from a crisp one, consists in translating binary equations into their fuzzy equivalents: intersection is replaced by a t-norm, union by a t-conorm, sets by membership functions, etc. Examples can be found for defining fuzzy morphology [29], fuzzy inclusion [144], etc. This translation is particularly straightforward if the binary relationship can be expressed in set-theoretical and logical terms. This can be obtained in a natural way for several distances, like the nearest point distance or the Hausdorff distance [14]. This remark endows methods based on mathematical morphology with a particular interest, since mathematical morphology is mainly based on set theory. This approach will be used in Section VI.B.
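The first fuzzification method, Eq. (8), can be sketched by applying a crisp distance (here the nearest-point distance of Eq. (2)) to sampled α-cuts; the grid objects and the sampling of α are our illustrative assumptions:

```python
import numpy as np

def d_S(x, y):
    return float(np.hypot(x[0] - y[0], x[1] - y[1]))

def nearest_point(X, Y):
    """Crisp nearest-point distance d_N of Eq. (2); +inf for empty sets."""
    if not X or not Y:
        return float("inf")
    return min(d_S(x, y) for x in X for y in Y)

def alpha_cut(mu, alpha):
    """The crisp set {x in S, mu(x) >= alpha} as a list of grid points."""
    return [tuple(p) for p in np.argwhere(mu >= alpha)]

def fuzzify(mu, nu, R_B, alphas=np.arange(1, 101) / 100):
    """Eq. (8): R(mu, nu) as the integral over alpha of R_B(mu_alpha, nu_alpha),
    approximated by averaging over sampled alpha levels."""
    return float(np.mean([R_B(alpha_cut(mu, a), alpha_cut(nu, a))
                          for a in alphas]))

# Two fuzzy objects on a 1x10 line of pixels.
mu = np.zeros((1, 10)); mu[0, 0:2] = [1.0, 0.5]
nu = np.zeros((1, 10)); nu[0, 8:10] = [0.5, 1.0]

# d_N is 7 on the cuts with alpha <= 0.5 and 9 above, so the integral is 8.
print(fuzzify(mu, nu, nearest_point))  # 8.0
```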
2. Distances from Similarity

Since a distance can be derived formally from a similarity measure (see, e.g., [5,90,95,109,158,161]), the problem amounts to defining the similarity measure. This can be addressed using one of the previous methods, given a similarity between crisp sets. However, because of the links between similarity and pattern recognition problems, this approach is often used for comparing objects based on some features, possibly fuzzy ones, that are extracted from the spatial information in preliminary stages. Then the similarity concerns these features, and not the objects as spatial fuzzy sets. This may explain why this approach leads mainly to distances dealing with membership functions only (Section VI.A). Similarity-based approaches can benefit from the existing algorithms for checking whether a relation is a similarity, in particular whether it satisfies the transitivity property (e.g., [131,149]).
3. Distances from Set Relationships

Set relationships provide a lot of information for the comparison of objects, typically in the case where image objects have to be compared with some models or prototypes. Similar objects are expected to overlap strongly and to have reduced differences. We have chosen to present here the approach proposed in [37,136], where a very useful typology of comparison measures is introduced. In this work, a comparison measure is generally defined as a function of three variables FS[M(μ ∩ ν), M(μ − ν), M(ν − μ)], where M is a fuzzy set measure (e.g., fuzzy cardinality) and − denotes a difference operator (such that μ − μ = ∅, and ν ⊆ ν′ implies μ − ν′ ⊆ μ − ν). This approach is close to Tversky's definitions [151]. Then specific types of comparison measures are defined:

- a similitude measure is a comparison measure such that FS(x, y, z) is nondecreasing with respect to x and nonincreasing with respect to y and z (this corresponds to the fact that two fuzzy sets are more similar if they have a greater intersection and less difference);
- a satisfiability measure is a similitude measure such that FS(0, y, z) = 0, FS(x, 0, z) = 1, and which does not depend on z (this corresponds to the case where the first object is considered as a reference to which the other is compared);
- an inclusion measure is a reflexive similitude measure such that FS(0, y, z) = 0 and FS does not depend on z;
- a resemblance measure is a symmetrical and reflexive measure;
- a dissimilarity measure is a comparison measure taking value 0 if μ = ν, and such that FS is independent of x and increasing with respect to y and z.
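One concrete instance of this typology can be sketched as follows; the choices of M, the intersection, the difference operator, and the ratio form of FS are our illustrative assumptions, in the spirit of Tversky's ratio model:

```python
import numpy as np

# Comparison measure F_S[M(mu ∩ nu), M(mu - nu), M(nu - mu)] on membership
# arrays; all concrete operator choices below are illustrative.
M = lambda f: float(np.sum(f))            # fuzzy cardinality as measure M
inter = lambda a, b: np.minimum(a, b)     # intersection as the min t-norm
diff = lambda a, b: np.maximum(a - b, 0)  # bounded difference as operator "-"

def similitude(mu, nu):
    """Ratio form x / (x + y + z): nondecreasing in x, nonincreasing in y, z."""
    x, y, z = M(inter(mu, nu)), M(diff(mu, nu)), M(diff(nu, mu))
    return x / (x + y + z) if x + y + z > 0 else 1.0

mu = np.array([0.0, 0.5, 1.0, 0.5])
nu = np.array([0.0, 0.5, 1.0, 0.0])
print(similitude(mu, nu))      # 0.75: x = 1.5, y = 0.5, z = 0
print(1 - similitude(mu, nu))  # a normalized distance derived from it
```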
A distance between two fuzzy sets can be derived from a dissimilarity measure, or from 1 − FS if FS defines a similitude measure. Several distances that have been proposed in the literature can be classified from this point of view.

4. Distances from Other Relationships

When distances are mainly used for comparing shapes, they may be derived from other relationships between objects, not only metric ones. Set relationships can be used as shown in the previous section, but so can several other ones, like geometrical features extracted from the objects or any other type of attribute, and topological relationships like ‘‘overlap’’ and ‘‘meet’’ [44,48,118]. Since such measures do not necessarily include information on the spatial distance, they are mainly found in the first class of definitions (Section VI.A) and used for model-based pattern recognition, for approaches relying on prototypes, and for applications like indexing and searching in image databases (e.g., [44,145]). Such methods are often related to similarity-based measures.

5. Symbolic Approaches

By ‘‘symbolic approaches’’ we mean methods that try to define linguistic variables representing distances, or to reason with purely symbolic expressions (the last types of representation mentioned before). For instance, in image processing the problem amounts to deriving symbolic representations from the numerical information carried by the image and from computations on it (see, e.g., [102]). These representations then provide a kind of summarization of the image content related to metric information. Distance information can be represented using words such as ‘‘close’’ and ‘‘far’’ [72], which constitute the coarsest level of granularity, or with further levels of granularity (e.g., ‘‘very close,’’ ‘‘close,’’ ‘‘medium,’’ ‘‘far,’’ ‘‘very far’’) [49].
Relative distances are also useful in qualitative reasoning, and use words such as ‘‘closer.’’ Such information can be handled either in the semiqualitative framework of fuzzy sets, where fuzzy sets are used to define the semantics of the linguistic values, or in a purely qualitative framework, using logics.
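As an illustration, a two-class granulation of distance values into ‘‘near’’ and ‘‘far’’ can be sketched as follows; the trapezoidal breakpoints (10 and 30) are our assumptions, not values from the chapter:

```python
# Linguistic classes "near" and "far" over distance values, each defined
# as a fuzzy set on R+; the breakpoints are illustrative.
def near(d):
    if d <= 10:
        return 1.0
    if d >= 30:
        return 0.0
    return (30 - d) / 20  # linear transition between the two classes

def far(d):
    return 1.0 - near(d)  # complementary class

for d in (5, 20, 40):
    print(d, near(d), far(d))  # 5 -> fully near, 20 -> 0.5/0.5, 40 -> fully far
```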
It should be noted that even a statement containing a precise value, as often used in common language (the distance between town A and town B is 300 km), should often be considered as an imprecise statement, and should preferably be modeled as such.
E. Properties of Distances and Requirements for Spatial Distances For most applications such as pattern recognition, scene interpretation, and path planning, one basic property that should be satisfied by spatial distances is invariance under geometric transformation, in particular rigid ones. Therefore the first required property is expressed as:
P0: invariance under rotation and translation.
Since the definitions summarized in this chapter do not always satisfy strictly the properties of a distance (or metric), we should rather speak of more general proximity functions. However, for the sake of simplicity we will keep the term distance. The main classes of proximity measures are recalled in this section. A metric is a positive function d such that:
P1: ∀μ ∈ F, d(μ, μ) = 0 (reflexivity);
P2: ∀(μ, ν) ∈ F², d(μ, ν) = 0 ⇒ μ = ν (separability);
P3: ∀(μ, ν) ∈ F², d(μ, ν) = d(ν, μ) (symmetry);
P4: ∀(μ, ν, ρ) ∈ F³, d(μ, ν) ≤ d(μ, ρ) + d(ρ, ν) (triangular inequality).
Several kinds of measures can be defined with fewer requirements: a pseudometric is a function satisfying P1, P3, and P4 (separability does not necessarily hold), a semimetric satisfies P1, P2, and P3 (and not the triangular inequality), a semipseudometric satisfies only P1 and P3, etc. (see, e.g., [113]). For instance, in the crisp case the Hausdorff distance is a metric, while the nearest point (or minimum) distance satisfies neither the separability property (since any two intersecting sets are at a zero minimum distance) nor the triangular inequality, and is therefore a semipseudometric. Since distances may be derived from similarity measures, we recall here the definition of this concept. A similarity relation [161] is a function s taking values in [0, 1], such that: (1) ∀μ ∈ F, s(μ, μ) = 1 (reflexivity); (2) ∀(μ, ν) ∈ F², s(μ, ν) = s(ν, μ) (symmetry); (3) ∀(μ, ν, ρ) ∈ F³, t[s(μ, ρ), s(ρ, ν)] ≤ s(μ, ν) (t-transitivity, where t is a t-norm). A similarity relation is also called a t-indistinguishability or t-equivalence.
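These axiom classes can be checked mechanically on finite examples; a sketch (the classifier and its names are ours) that recovers the semipseudometric status of the nearest-point distance mentioned above:

```python
from itertools import product

def classify(d, objs, tol=1e-9):
    """Classify a proximity function according to properties P1-P4."""
    reflexive = all(abs(d(a, a)) <= tol for a in objs)                    # P1
    separable = all(d(a, b) > tol for a in objs for b in objs if a != b)  # P2
    symmetric = all(abs(d(a, b) - d(b, a)) <= tol
                    for a, b in product(objs, repeat=2))                  # P3
    triangle = all(d(a, c) <= d(a, b) + d(b, c) + tol
                   for a, b, c in product(objs, repeat=3))                # P4
    if not (reflexive and symmetric):
        return "not even a semipseudometric"
    if separable and triangle:
        return "metric"
    if triangle:
        return "pseudometric"
    if separable:
        return "semimetric"
    return "semipseudometric"

# Nearest-point distance between finite sets of reals: intersecting sets
# are at distance 0 (no separability) and the triangle inequality fails.
d_N = lambda A, B: min(abs(a - b) for a in A for b in B)
sets = [frozenset({0, 1}), frozenset({1, 2}), frozenset({5})]
print(classify(d_N, sets))  # semipseudometric
```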
If we set d = 1 − s, then d is obviously a semipseudometric. The first property corresponds to P1, the second one to P3. As for the last one, it can be expressed in terms of distance as:

P5: ∀(μ, ν, ρ) ∈ F³, d(μ, ν) ≤ T[d(μ, ρ), d(ρ, ν)], where T is the t-conorm dual to t.

If t = min, then we also have ∀(μ, ν, ρ) ∈ F³, d(μ, ν) ≤ max[d(μ, ρ), d(ρ, ν)], which is the property of an ultrametric. If t is the Lukasiewicz t-norm (i.e., t(a, b) = max(0, a + b − 1), whose dual t-conorm is T(a, b) = min(1, a + b)), then d also satisfies the triangular inequality and is a pseudometric. Therefore P5 implies P4 at least for all t-conorms that are smaller than the Lukasiewicz one. If f is an additive generator (typically like the functions used for generating continuous Archimedean t-norms [63]), then d = f ∘ s is a pseudometric (taking values in ℝ⁺) if and only if the t-norm generated by f is less than t [5]. A similar relationship holds between a metric and a t-equality (i.e., a similarity such that s(μ, ν) = 1 if and only if μ = ν). From a topological point of view, the definition of a metric d on F induces a topology on F, and therefore a continuity. It has been studied for instance in [57] for the case of the fuzzy Hausdorff distance. Partial results can also be obtained if d has fewer properties: if we set cl(μ) = {ν ∈ F, d(μ, ν) = 0} for d a semipseudometric, then the function cl is a preclosure on F, which therefore defines a pretopology on F (see, e.g., [69,112]). Conversely, we may derive a semipseudometric from any (nonidempotent) adherence defined on F. Some other properties, issued from the wide literature on fuzzy similarities, can be transposed to distances (see, e.g., [70,126]):

P6: d(μ, ν) = 1 ⟺ Supp(μ) ∩ Supp(ν) = ∅, where Supp(μ) denotes the support of μ (this property being meaningful for normalized distances).
P7: ∀X ⊆ S (X crisp), d(X, Xᶜ) = max over (μ, ν) ∈ F² of d(μ, ν), which means that the values taken by d are bounded and the maximum value is attained on all crisp sets and their complements; if d is normalized in [0, 1], this maximum value is equal to 1.
P8 (monotony property): ∀(μ, ν, ρ) ∈ F³, μ ⊆ ν ⊆ ρ ⇒ d(μ, ν) ≤ d(μ, ρ) and d(ν, ρ) ≤ d(μ, ρ),
where the fuzzy subsethood ⊆ is defined as the ≤ relation on membership functions.

P9 (additivity with respect to crisp partitions): ∀(μ, ν) ∈ F², ∀X ⊆ S, d(μ, ν) = d(μ ∩ X, ν ∩ X) + d(μ ∩ Xᶜ, ν ∩ Xᶜ), which implies d(μ, ν) = d(μ ∩ ν, μ ∪ ν).
P10: ∀(μ, ν) ∈ F², d(μ, ν) = d(μᶜ, νᶜ), which is a property often required to define approximate proximity.
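The claim that Lukasiewicz t-transitivity forces the triangular inequality for d = 1 − s can be checked numerically; the random-sampling setup below is our own sketch:

```python
import itertools
import random

# Numerical check that t-transitivity for the Lukasiewicz t-norm implies
# the triangular inequality P4 for d = 1 - s.
t = lambda a, b: max(0.0, a + b - 1.0)  # Lukasiewicz t-norm

def lukasiewicz_closure(s, n):
    """Minimally increase s until it is t-transitive (iterate to fixpoint)."""
    changed = True
    while changed:
        changed = False
        for i, j, k in itertools.permutations(range(n), 3):
            v = t(s[i, j], s[j, k])
            if v > s[i, k] + 1e-15:
                s[i, k] = s[k, i] = v
                changed = True
    return s

random.seed(0)
for _ in range(500):
    s = {(i, i): 1.0 for i in range(4)}            # reflexive
    for i, j in itertools.combinations(range(4), 2):
        s[i, j] = s[j, i] = random.random()        # symmetric
    s = lukasiewicz_closure(s, 4)
    d = {p: 1.0 - v for p, v in s.items()}
    for i, j, k in itertools.permutations(range(4), 3):
        assert d[i, k] <= d[i, j] + d[j, k] + 1e-12   # P4 holds
print("d = 1 - s satisfies the triangular inequality")
```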
All these properties are expressed in the case where the distance is a crisp number. If imprecise representations are used, such as intervals or fuzzy numbers, these properties have to be adapted. For instance, saying that a distance is equal to 0 will be replaced by [0, 0] in the case of interval representations, and by a fuzzy number with support and core³ reduced to {0} in the case of fuzzy number representations. Some properties, such as P4, P5, and P8–P10, require addition, inclusion, or union of intervals or of fuzzy numbers. They can also be considered pointwise (in ℝ⁺), for instance at each value n for δ(μ, ν)(n). Although we may speak about distances between spatial objects in a very general way, this expression does not necessarily assume that we are dealing with true metrics. For several applications, it is not certain that all properties are needed. An important use of distances is related to the comparison of shapes, which reinforces the interest of deriving distances from similarities. The concept of similarity between objects, in particular spatial objects, contains some subjective aspects. As already stated by Poincaré at the beginning of the twentieth century, and underlined by several authors in the fuzzy set domain (see, e.g., [64,94]), subjective similarities are not required to be transitive. This induces a loss of the triangular inequality in the derived distance. This question was raised in [50], where the authors suggested more precisely that if t-transitivity is replaced by a pseudometric condition in order to define resemblance relations, then approximate equality is better modeled than when using a fuzzy equivalence relation. This point of view is controversial and has been discussed by several authors [33,34,51,92,101]. This unusual series of successive comments shows that there is no definite answer to the question of transitivity.
Coming back to the spatial domain, typically for applications (in image interpretation for instance) where image objects have to be compared to

³ The support of a fuzzy set is the subset of points having nonzero membership values, while the core is the subset of points having a membership value equal to 1.
models, the triangular inequality is of no use, since the two arguments of the distance function belong to two different sets of objects. For such applications, semimetrics or even semipseudometrics may be sufficient. We may even go further in this direction. Indeed, since a semipseudometric does not satisfy the separability property, the study of the equation d(μ, ν) = 0 can be exploited in terms of pattern recognition. For instance, if we build classes according to prototypes, this equation can be used as a classification rule: every object which is indistinguishable from a prototype will be added to the corresponding class. This has been developed in the context of pretopologies [36,69]. It is the nonidempotency of the adherence function in a pretopology that allows one to aggregate objects into a class. This is again an argument in favor of semipseudometrics. Moreover, when extending crisp distances to fuzzy ones, it is natural to expect that some properties may be lost. In particular, it is difficult to extend a crisp distance while keeping the triangular inequality. Considering the Hausdorff distance, which is a true distance between two sets, i.e., satisfying properties P1–P4, it has been shown in [40] that under reasonable axioms it is not possible to define a fuzzy Hausdorff distance that is a true distance. Another aspect that can be useful in image processing and pattern recognition is the link existing between semimetrics and fuzzy partitions derived from a t-indistinguishability relation. This clearly finds applications as soon as the recognition or classification problem can be stated as the (fuzzy) partitioning of the set of objects. As for property P5, it is in general stronger than P4, and the above discussions apply a fortiori. On the contrary, property P0 (invariance under rigid transformations) is a strong requirement in most applications dealing with spatial distances. Property P1 (reflexivity) is often required and considered quite natural.
On the contrary, the separability property P2 is more difficult to satisfy. As soon as the two objects are considered to play the same role in the evaluation of their distance, it is natural to require P3 (symmetry). Properties P6–P10 are meaningful only for some classes of distances, not involving directly the spatial distance (Section VI.A), and accordingly for problems where an object is compared to a model object by means of distances. For these classes of distances, it can also be interesting to identify them as derived from particular forms of comparison measures, such as similitude, resemblance, and satisfiability. For instance, property P6 is not desirable if the spatial distance has to be taken into account. Let us take as an example the case of two objects μ and ν such that Supp(μ) ∩ Supp(ν) = ∅ and Supp(μ) ∩ Supp(ν + t) = ∅, where ν + t denotes the translation of ν by t. Then we may expect that d(μ, ν) ≠ d(μ, ν + t), which is not possible under property P6. Similarly as for P6, property P7 is meaningful only when comparing the membership functions of the two objects. For instance, if we erode a set X, we can expect that, in a spatial sense, d(E(X), Xᶜ) > d(X, Xᶜ), where E(X) denotes the erosion of X by a structuring element containing the origin of the space, and therefore d(X, Xᶜ) cannot be maximal.
IV. GEODESIC DISTANCE IN A FUZZY SET
Although the concept of geodesy is very important for crisp sets and should be promising for fuzzy sets as well, this topic has not been much addressed in the literature. Besides our previous work [15], we could find only one other work on the subject [139].
A. Fuzzy Geodesic Distance Defined as a Number

We proposed in [15] original definitions for the distance between two points in a fuzzy set, extending the notion of geodesic distance. Among these definitions, one proved to have desirable properties and was therefore considered better than the others. We recall here this definition and the main results we obtained. The geodesic distance between two points x and y represents the length of the shortest path between x and y that ‘‘goes out of μ as little as possible.’’ A formal definition of this concept relies on the degree of connectivity, as defined by Rosenfeld [137]. In the case where S is a discrete bounded space (as is usually the case in image processing), the degree of connectivity in μ between any two points x and y of S is defined as:

$$c_\mu(x, y) = \max_{L_i \in L} \left[ \min_{t \in L_i} \mu(t) \right], \qquad (12)$$

where L denotes the set of all paths from x to y. Each possible path Li from x to y is constituted by a sequence of points of S according to the discrete connectivity defined on S. We denote by L*(x, y) a shortest path between x and y on which cμ is reached (this path, not necessarily unique, can be interpreted as a geodesic path descending as little as possible in the membership degrees), and we denote by l(L*(x, y)) its length (computed in the discrete case from the
FIGURE 1. The geodesic distance in a fuzzy set between two points x and y in a 2D space.
number of points belonging to the path). Then we define the geodesic distance in μ between x and y as:

$$d_\mu(x, y) = \frac{l(L^*(x, y))}{c_\mu(x, y)}. \qquad (13)$$
If cμ(x, y) = 0, we have dμ(x, y) = +∞, which corresponds to the result obtained with the classical geodesic distance in the case where x and y belong to different connected components (actually it corresponds to the generalized geodesic distance, where infinite values are allowed). This definition corresponds to the weighted geodesic distance (in the classical sense) computed in the α-cut of μ at level α = cμ(x, y). In this α-cut, x and y belong to the same connected component (for the considered discrete crisp connectivity). This definition is illustrated in Figure 1. This definition satisfies the following set of properties (see [15] for the proof):

(1) positivity: ∀(x, y) ∈ S², dμ(x, y) ≥ 0;
(2) symmetry: ∀(x, y) ∈ S², dμ(x, y) = dμ(y, x);
(3) separability: ∀(x, y) ∈ S², dμ(x, y) = 0 ⟺ x = y;
(4) dμ depends on the shortest path between x and y that ‘‘goes out’’ of μ ‘‘as little as possible,’’ and dμ tends towards infinity if it is not possible to find a path between x and y without going through a point t such that μ(t) = 0;
(5) dμ is decreasing with respect to μ(x) and μ(y);
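A minimal discrete sketch of Eqs. (12) and (13), assuming 4-connectivity and a path length counted in points; the grid and function names are our own:

```python
import heapq
from collections import deque

# Illustrative membership function mu on a small 2D grid.
mu = [
    [1.0, 0.3, 1.0],
    [0.0, 0.3, 0.0],
    [0.0, 0.0, 0.0],
]

def neighbors(p, h, w):
    r, c = p
    for n in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
        if 0 <= n[0] < h and 0 <= n[1] < w:
            yield n

def connectivity(mu, x, y):
    """c_mu(x, y) of Eq. (12): max over paths of the min membership,
    computed by a widest-path variant of Dijkstra's algorithm."""
    h, w = len(mu), len(mu[0])
    best = {x: mu[x[0]][x[1]]}
    heap = [(-best[x], x)]
    while heap:
        nb, p = heapq.heappop(heap)
        b = -nb
        if p == y:
            return b
        if b < best.get(p, 0.0):
            continue
        for n in neighbors(p, h, w):
            cand = min(b, mu[n[0]][n[1]])
            if cand > best.get(n, -1.0):
                best[n] = cand
                heapq.heappush(heap, (-cand, n))
    return 0.0

def geodesic(mu, x, y):
    """d_mu(x, y) of Eq. (13): shortest path length (in points) within the
    alpha-cut at level c_mu(x, y), divided by c_mu(x, y)."""
    c = connectivity(mu, x, y)
    if c == 0.0:
        return float("inf")
    h, w = len(mu), len(mu[0])
    dist = {x: 1}                      # number of points on the path so far
    q = deque([x])
    while q:
        p = q.popleft()
        if p == y:
            return dist[p] / c
        for n in neighbors(p, h, w):
            if mu[n[0]][n[1]] >= c and n not in dist:
                dist[n] = dist[p] + 1
                q.append(n)
    return float("inf")

print(connectivity(mu, (0, 0), (0, 2)))  # 0.3
print(geodesic(mu, (0, 0), (0, 2)))      # 10.0 (3 points / 0.3)
```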
(6) dμ is decreasing with respect to cμ(x, y);
(7) dμ is equal to the classical geodesic distance if μ is crisp.

The triangular inequality is not satisfied, but from this definition it is possible to build a true distance, satisfying the triangular inequality, while keeping all other properties. This can be achieved in the following way (see [15] for proof and details):

$$d'_\mu(x, y) = \min_{t \in S} \left[ \frac{l(L^*(x, t))}{c_\mu(x, t)} + \frac{l(L^*(t, y))}{c_\mu(t, y)} \right].$$
Unfortunately this is computationally expensive. These properties are in agreement with what can be required from a fuzzy geodesic distance, both mathematically and intuitively. The definition proposed in [139] corresponds to one of the definitions proposed in [15]: it is the length of the shortest path between the two considered points, the length being computed as the integral of the membership values along the path. Unfortunately, this definition does not meet all the requirements we have here, since it does not satisfy the separability property and does not have the appropriate behavior with respect to the membership values (properties (4)–(6) in the preceding discussion). Indeed, the best path can go through points with very low membership values (which tend to decrease the length), i.e., go out of the set to some extent. However, one advantage of this distance is that it allows the authors of [139] to derive algorithms for computing the fuzzy distance transform.
B. Fuzzy Geodesic Distance Defined as a Fuzzy Number

In the previous approach, the geodesic distance between two points is defined as a crisp number (i.e., a standard number). It could also be defined as a fuzzy number, taking into account the fact that, if the set is imprecisely defined, geodesic distances in this set can be imprecise too (as mentioned in Section III.B). This is the scope of this section. One solution to achieve this aim is to use the extension principle, based on a combination of the geodesic distances computed on each α-cut of μ. Let us denote by d_{μ_α}(x, y) the geodesic distance between x and y in the crisp set μ_α. Using the extension principle, we define the degree to which the geodesic distance between x and y in μ is equal to d as:

∀d ∈ ℝ⁺, d_μ(x, y)(d) = sup{α ∈ [0, 1], d_{μ_α}(x, y) = d}.  (14)
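As an illustration of Equation (14), the following sketch computes this fuzzy number on a hypothetical one-dimensional discrete grid: the geodesic distance in each α-cut is obtained by breadth-first search, and α is sampled at 100 levels. The grid, the connectivity, and the sampling are choices made here for illustration, not prescribed by the definition.

```python
from collections import deque

def alpha_cut_geodesic(mu, x, y, alpha):
    """Geodesic distance from x to y by BFS, restricted to cells with mu >= alpha."""
    if mu[x] < alpha or mu[y] < alpha:
        return None                      # x or y is outside this alpha-cut
    seen, queue = {x}, deque([(x, 0)])
    while queue:
        i, d = queue.popleft()
        if i == y:
            return d
        for j in (i - 1, i + 1):         # 1D neighborhood
            if 0 <= j < len(mu) and j not in seen and mu[j] >= alpha:
                seen.add(j)
                queue.append((j, d + 1))
    return None                          # x and y are disconnected in this alpha-cut

def fuzzy_geodesic(mu, x, y, levels=None):
    """Eq. (14): map each finite distance d to sup{alpha : d_{mu_alpha}(x, y) = d}."""
    levels = levels or [k / 100 for k in range(1, 101)]
    out = {}
    for a in levels:
        d = alpha_cut_geodesic(mu, x, y, a)
        if d is not None:
            out[d] = max(out.get(d, 0.0), a)
    return out

mu = [0.2, 0.9, 0.3, 0.9, 0.8]           # hypothetical 1D fuzzy set
print(fuzzy_geodesic(mu, 1, 3))          # → {2: 0.3}
```

On this example the only finite geodesic distance is 2, reached in the α-cuts up to level 0.3 (the height μ_c(1, 3) realized by the intermediate point), so the fuzzy number is concentrated at d = 2 with degree 0.3.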
ISABELLE BLOCH
FIGURE 2. Typical shape of the fuzzy geodesic distance between two points in a fuzzy set, defined as a fuzzy number.
This definition satisfies the following properties:

(1) If α > μ_c(x, y), then x and y belong to two distinct connected components of μ_α.⁴ In this case, the (generalized) geodesic distance is infinite. If we restrict the evaluation of d_μ(x, y)(d) to finite distances d, then d_μ(x, y)(d) = 0 for d > d_{μ_c(x,y)}(x, y).
(2) Let d_S(x, y) denote the Euclidean distance between x and y. It is the shortest of the geodesic distances that can be obtained in any crisp set that contains x and y. This set can be, for instance, the whole space S, which can be assimilated to the α-cut of level 0 (μ_0). Therefore, for d < d_S(x, y), we have d_μ(x, y)(d) = 0.
(3) Since the α-cuts are nested (μ_α ⊆ μ_{α'} for α > α'), it follows that d_{μ_α}(x, y) is increasing in α, for α ≤ μ_c(x, y).

Therefore, d_μ(x, y) is a fuzzy number, with a maximum value for d_{μ_c(x,y)}(x, y), and with a discontinuity at this point. Its shape looks as shown in Figure 2. This definition can be normalized by dividing all values by μ_c(x, y), in order to get a maximum membership value equal to 1. One drawback of this definition is the discontinuity at d_{μ_c(x,y)}(x, y). It also corresponds to the discontinuity existing in the crisp case when x and y belong to parts that become disconnected. Further work aims at exploiting
⁴ Since μ_c(x, y) corresponds to the "height" (in terms of membership values) of the lowest point along the path that connects x and y, i.e., the maximum over paths from x to y of the minimal height along the path.
features of fuzzy set theory in order to avoid this discontinuity, if this is found desirable.

The fuzzy geodesic distance can be used to define geodesic balls, which can serve as structuring elements for defining fuzzy geodesic mathematical morphology, as shown in [21]. Conversely, in the discrete crisp case, geodesic morphology (and hence geodesic distance) can be obtained by iterating Euclidean morphological operations. Now, we exploit this idea to define a new geodesic distance as a fuzzy number. Let D^n_X(Y) denote the geodesic dilation of Y in X of size n. In the discrete crisp case, we have:

D^n_X(Y) = (D(Y) ∩ X)^n,  (15)
where D(Y) denotes the Euclidean dilation of Y of size 1 and the exponent represents the number of iterations of the conditional dilation. This expression allows us to express the geodesic distance from a point x to a set Y conditionally to X as:

d_X(x, Y) = n ⟺ x ∉ (D(Y) ∩ X)^{n−1} and x ∈ (D(Y) ∩ X)^n,  (16)

from which we can derive the geodesic distance between two points x and y in a set X by considering y as a singleton set:

d_X(x, y) = n ⟺ x ∉ (D({y}) ∩ X)^{n−1} and x ∈ (D({y}) ∩ X)^n.  (17)
By extending this equation to the fuzzy case using the translation principle, we define the geodesic distance between two points x and y in a fuzzy set μ by:

d_μ(x, y)(n) = t[ c[[t(D_ν(μ_(y))(x), μ(x))]^{n−1}], [t(D_ν(μ_(y))(x), μ(x))]^{n} ],  (18)

where the exponent still denotes the number of iterations, t is a t-norm, c is a fuzzy complementation (usually c(a) = 1 − a), ν denotes an elementary structuring element, and μ_(y) denotes the fuzzy set of support {y} and value μ(y). The structuring element ν can be the unit crisp structuring element according to the chosen digital connectivity on S (as in the crisp case), or a fuzzy set representing the imprecision attached to the smallest spatial entities.
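In the discrete crisp case, Equations (15)–(17) can be implemented directly by iterating the conditional dilation until x is reached. A minimal sketch on a hypothetical 2D pixel grid with 4-connectivity (so the unit "ball" here is the city-block one, one possible choice of digital connectivity):

```python
def dilate(Y, shape):
    """Unit dilation of a set of pixels, 4-connectivity (city-block unit ball)."""
    h, w = shape
    out = set(Y)
    for (i, j) in Y:
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if 0 <= i + di < h and 0 <= j + dj < w:
                out.add((i + di, j + dj))
    return out

def geodesic_distance(X, x, y, shape):
    """Eq. (17): smallest n such that x belongs to (D({y}) ∩ X)^n."""
    current = {y} & X                    # conditional dilation starts from {y} ∩ X
    n = 0
    while x not in current:
        grown = dilate(current, shape) & X
        if grown == current:             # stable: x lies in another component of X
            return None
        current, n = grown, n + 1
    return n

X = {(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)}          # an L-shaped hypothetical set
print(geodesic_distance(X, (0, 0), (2, 2), (3, 3)))   # → 4
```

On an L-shaped set, the geodesic distance between the two ends is the length of the path inside the set (4 here), larger than the straight-line distance; if x and y lie in different components, the iteration stabilizes and no finite distance is returned.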
V. DISTANCE FROM A POINT TO A FUZZY SET
A. As a Number

Distances from a point to a fuzzy set can be defined using a weighting approach or using a fuzzification from α-cuts. In this way, they are defined as numbers. The idea in the weighting approach is that a point that has a low membership value to μ should have less influence in the computation of the infimum (or minimum). Therefore the distance between x and μ may be defined as:

d(x, μ) = inf_{y∈S} [d_S(x, y) f(μ(y))],  (19)

where f is a decreasing function of μ (e.g., f(μ(y)) = 1/μ(y)) such that f(1) < +∞ (in order to guarantee that if x belongs completely to μ, i.e., if μ(x) = 1, the distance is attained for y = x), and with the convention 0 · f(0) = +∞. If μ(x) = 0, i.e., if x is completely outside of μ, this definition leads to satisfactory results. However, if μ(x) > 0, it always leads to 0, on the whole support of μ. This can be seen as a strong drawback of this definition, since we would intuitively rather expect that d(x, μ) depend on the membership degree of x to μ. Generally speaking, it is required that d(x, μ) be a strictly decreasing function of μ(x), with d(x, μ) = 0 if μ(x) = 1.

Defining a fuzzy function from its crisp equivalent applied on the α-cuts is a very common way to proceed, which has already been used for defining several operations on fuzzy sets [60]. The two following equations express different combinations of the α-cuts for defining d(x, μ):

d(x, μ) = ∫_0^1 d(x, μ_α) dα,  (20)

d(x, μ) = sup_{α ∈ ]0,1]} [α d(x, μ_α)].  (21)
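Both combinations can be approximated on a discrete grid by sampling α. A sketch, assuming a hypothetical one-dimensional space and 100 equally spaced levels (a Riemann-sum approximation of Equation (20), and a sampled version of Equation (21)):

```python
def crisp_point_set_distance(x, cut):
    """d(x, X) on a 1D grid: min |x - y| over y in the crisp set; inf if empty."""
    return min((abs(x - y) for y in cut), default=float("inf"))

def d_integral(x, mu, steps=100):
    """Eq. (20): integral over alpha of d(x, mu_alpha), as a Riemann sum."""
    total = 0.0
    for k in range(1, steps + 1):
        a = k / steps
        cut = [y for y, m in enumerate(mu) if m >= a]
        total += crisp_point_set_distance(x, cut)
    return total / steps

def d_sup(x, mu, steps=100):
    """Eq. (21): sup over alpha of alpha * d(x, mu_alpha), sampled."""
    best = 0.0
    for k in range(1, steps + 1):
        a = k / steps
        cut = [y for y, m in enumerate(mu) if m >= a]
        best = max(best, a * crisp_point_set_distance(x, cut))
    return best

mu = [0.0, 0.5, 1.0, 0.5, 0.0]           # normalized, so no alpha-cut is empty
print(d_integral(0, mu), d_sup(0, mu))   # → 1.5 2.0
```

For this example, d_integral gives 1.5, while d_sup gives 2.0, which is exactly the distance from x = 0 to the core μ_1 = {2}.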
The first one consists in "stacking" the results obtained on each α-cut, while the second one consists in weighting these results by the level of the cut, d(x, μ_α) being the classical distance from a point to a crisp set. Equation (21) does not lead to convenient results, since the obtained distance is always the distance from x to the core of μ, i.e., d(x, μ) = d(x, μ_1), and therefore does not depend on μ(x) if μ(x) ≠ 1. Equation (20) does not share the same disadvantage, since all α-cuts are explicitly involved in the result. For instance, for μ and ν having the same core and μ(x) > ν(x), we have d(x, μ) < d(x, ν).

B. As a Fuzzy Number

In the crisp case, the distance from a point x to a set X can be expressed in morphological terms, for n > 0:

d(x, X) = 0 ⟺ x ∈ X,  (22)

d(x, X) = n ⟺ x ∈ D^n(X) and x ∉ D^{n−1}(X),  (23)

where D^n denotes the dilation by a ball of radius n centered at the origin of S (and D^0(X) = X) (see, e.g., [35] for a study of discrete balls and discrete distances in the crisp case). In this case, the extensivity property of the dilation holds [142], and x ∉ D^{n−1}(X) is equivalent to ∀n' < n, x ∉ D^{n'}(X). Equation (23) is equivalent to:

x ∈ D^n(X) ∩ [D^{n−1}(X)]^C,  (24)

where A^C denotes the complement set of A in S. This is a pure set theoretical expression, which we can now translate into fuzzy terms. This leads to the following definition of the degree to which d(x, μ) is equal to n:

δ_(x,μ)(0) = μ(x),  (25)

δ_(x,μ)(n) = t[ D^n_ν(μ)(x), c[D^{n−1}_ν(μ)(x)] ],  (26)
where t is a t-norm (fuzzy intersection), c a fuzzy complementation (typically c(a) = 1 − a for a ∈ [0, 1]), and ν a fuzzy structuring element used for performing the dilation. As in Section IV, several choices of ν are possible.
It can be simply the unit ball, or a fuzzy set representing for instance the smallest sensitive unit in the image, along with the imprecision attached to it. In this case, ν has to be equal to 1 at the origin of S, such that the extensivity of the dilation still holds [29]. The properties of this definition are the following [14]:

- if μ(x) = 1, then δ_(x,μ)(0) = 1 and ∀n > 0, δ_(x,μ)(n) = 0, i.e., the distance is a crisp number in this case;
- if μ and ν are binary, the proposed definition coincides with the binary one;
- the fuzzy set δ_(x,μ) can be interpreted as a distance density, from which a distance distribution can be deduced by integration (see Section III.B);
- finally, δ_(x,μ) is a nonnormalized fuzzy number (in the discrete finite case).
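Equations (25) and (26) can be sketched on a hypothetical one-dimensional grid, choosing t = min, c(a) = 1 − a, and the crisp unit structuring element (the simplest of the admissible choices discussed above, not the only ones):

```python
def dilate_fuzzy(mu):
    """Unit fuzzy dilation on a 1D grid (crisp unit structuring element)."""
    n = len(mu)
    return [max(mu[max(0, i - 1):i + 2]) for i in range(n)]

def delta_point_set(x, mu, n_max):
    """Eqs. (25)-(26) with t = min and c(a) = 1 - a: degree that d(x, mu) = n."""
    degrees = [mu[x]]                              # Eq. (25): the case n = 0
    prev, cur = mu, dilate_fuzzy(mu)
    for _ in range(1, n_max + 1):
        degrees.append(min(cur[x], 1 - prev[x]))   # Eq. (26)
        prev, cur = cur, dilate_fuzzy(cur)
    return degrees

mu = [0.0, 0.0, 0.3, 1.0, 0.3]
print(delta_point_set(0, mu, 4))   # approximately [0.0, 0.0, 0.3, 0.7, 0.0]
```

For this example, where x = 0 lies two to three steps from the significant part of μ, the resulting nonnormalized fuzzy number puts its mass on n = 2 and n = 3.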
Figure 3 presents an example of the fuzzy numbers δ_(x,μ)(n) obtained for different points, the spatial domain being reduced to a one-dimensional space in this example. The point x_1 is outside the support of μ and at a larger distance from it than x_2. The results correspond to intuition, since the fuzzy number δ_(x_2,μ)(n) is more concentrated around very small values of n than δ_(x_1,μ)(n). An example in a two-dimensional space is given in Figure 4. The distances of three points to the fuzzy set μ are computed, for three different t-norms (min, product, and Lukasiewicz). The coordinates of these points are, respectively, (25, 40) (point A, with high membership value to μ), (26, 25) (point B, at the border of μ, with low membership value), and (60, 10) (point C, outside of the support of μ). These points are superimposed on μ in Figure 4. The results are given in Figure 5.
FIGURE 3. Fuzzy numbers representing δ_(x,μ) (μ being shown on the left) for two different x.
FIGURE 4. A fuzzy set μ in a 2D space and the three points A (25, 40), B (26, 25), and C (60, 10) for which the distance to μ is computed.
FIGURE 5. Distance from a point to a fuzzy set: example of the three points and μ of Figure 4, with three different t-norms (min, product, and Lukasiewicz); each of the nine panels plots membership degrees against distances.
For the first point, which has a high membership to the fuzzy set, the distributions take a high value at 0 (equal to μ(x)) and decrease very fast. For the second point, which belongs to μ with a low membership value, the distributions are more spread. This represents the ambiguity in defining the distance of this point to the fuzzy set. For instance, if we consider some defuzzification process using a threshold value on μ, then, depending on this threshold, the point would be more or less close to μ. The third point is outside of the support of μ; therefore the membership degrees of low distances are all equal to 0, and the distributions are shifted towards higher values.

From this definition of the distance from a point to a fuzzy set, distances between two fuzzy sets can be derived using supremum or infimum computation of fuzzy numbers using the extension principle [61]. The details are given in [14], and summarized in the following. The maximum of p fuzzy numbers representing the fuzzy distances from points x_i to μ is:

∀n ≥ 0, max(δ_(x_1,μ), δ_(x_2,μ), …, δ_(x_p,μ))(n) = sup_{(n_1,…,n_p), n = max(n_1,…,n_p)} min[δ_(x_1,μ)(n_1), …, δ_(x_p,μ)(n_p)].  (27)
In a similar way, the fuzzy minimum is defined as:

∀n ≥ 0, min(δ_(x_1,μ), δ_(x_2,μ), …, δ_(x_p,μ))(n) = sup_{(n_1,…,n_p), n = min(n_1,…,n_p)} min[δ_(x_1,μ)(n_1), …, δ_(x_p,μ)(n_p)].  (28)
These expressions are in particular useful when p = |S| (cardinality of S), and can therefore be used for defining distances between two fuzzy sets. As pointed out in [61], these definitions do not provide in general one of the input fuzzy numbers. Another interesting question may be: what is the greatest of these fuzzy numbers? A degree of possibility for a fuzzy set being greater than another one has been defined in [61]. Methods for ranking fuzzy numbers have also been proposed, e.g., in [150]. We do not make use of this point of view in the following and restrict ourselves to definitions (27) and (28). Now if we consider points in another fuzzy set ν defined on S, i.e., if we want to compute a function of the δ_(x_i,μ) over a set of x_i having nonbinary membership degrees to ν, we have to introduce the values ν(x_i) in Equations (27) and (28), for instance as:

∀n ≥ 0, max_ν(δ_(x_1,μ), δ_(x_2,μ), …, δ_(x_p,μ))(n) = sup_{(n_1,…,n_p), n = max(n_1,…,n_p)} min_{i=1…p} [min[δ_(x_i,μ)(n_i), ν(x_i)]].  (29)
Similarly we may define the minimum of fuzzy numbers as:

∀n ≥ 0, min_ν(δ_(x_1,μ), δ_(x_2,μ), …, δ_(x_p,μ))(n) = sup_{(n_1,…,n_p), n = min(n_1,…,n_p)} min_{i=1…p} [min[δ_(x_i,μ)(n_i), ν(x_i)]].  (30)
Another possibility is to use the fuzzification principle over the α-cuts of ν, which leads to a simpler expression for the maximum of a set of fuzzy numbers over points in a fuzzy set:

max_{x∈ν} δ_(x,μ)(n) = ∫_0^1 max_{x∈ν_α} δ_(x,μ)(n) dα.  (31)

Similarly for the minimum, we have:

min_{x∈ν} δ_(x,μ)(n) = ∫_0^1 min_{x∈ν_α} δ_(x,μ)(n) dα.  (32)
Similar expressions can be used for any function of fuzzy numbers. Since the nearest point distance, for instance, is simply a minimum over distances from a point to a fuzzy set, the fuzzy minimum taken over points in a fuzzy set leads directly to a fuzzy nearest distance between two fuzzy sets (as a fuzzy number). Similarly the Hausdorff distance can be directly derived from the distance from a point to a fuzzy set using the maximum of fuzzy numbers.
VI. DISTANCE BETWEEN TWO FUZZY SETS
We now address the problem of defining distances between two fuzzy sets. The classification we propose considers definitions relying on comparison of membership functions on the one hand, and definitions really taking into account the spatial distance d_S on the other hand. Further subdivisions are based on the type of approach and of formalism. We refer to [20] for a comparison of these distances on a concrete example of spatial objects.
A. Comparison of Membership Functions

In this section we review the main distances proposed in the literature that aim at comparing membership functions. They have generally been proposed in a general fuzzy set framework, and not specifically in the context of image processing. They do not really include information about spatial distances. The classification chosen here is inspired from the one found in [163]. Similar classifications can be found in [47,91,126].

1. Functional Approach

The functional approach is probably the most popular. It relies on an L_p norm between μ and ν, leading to the following generic definition [62,97,113]:

d_p(μ, ν) = [∫_{x∈S} |μ(x) − ν(x)|^p]^{1/p},  (33)

d_∞(μ, ν) = sup_{x∈S} |μ(x) − ν(x)|.  (34)
d_p is a pseudometric, while d_∞ is a metric. In general, d_p does not converge towards d_∞ when p becomes infinite, but it converges towards [113]:

d_EssSup(μ, ν) = inf{k ∈ ℝ, λ({x, |μ(x) − ν(x)| > k}) = 0},  (35)
where λ denotes the Lebesgue measure on S. It has been shown that d_EssSup is a pseudometric, called essential supremum, and related to d_∞ by the relation d_EssSup ≤ d_∞. The equality does not hold in the general continuous case (a counter-example can be found in [113]). In the discrete finite case, these definitions become:

d_p(μ, ν) = [Σ_{x∈S} |μ(x) − ν(x)|^p]^{1/p},  (36)

d_∞(μ, ν) = max_{x∈S} |μ(x) − ν(x)|.  (37)
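In the discrete finite case, Equations (36) and (37) are one-liners; a sketch on hypothetical membership vectors:

```python
def d_p(mu, nu, p):
    """Eq. (36): discrete L_p distance between membership functions."""
    return sum(abs(a - b) ** p for a, b in zip(mu, nu)) ** (1 / p)

def d_inf(mu, nu):
    """Eq. (37): discrete sup-norm distance."""
    return max(abs(a - b) for a, b in zip(mu, nu))

mu = [0.0, 0.5, 1.0, 0.5]
nu = [0.0, 0.0, 0.5, 1.0]
print(d_p(mu, nu, 1), d_inf(mu, nu))   # → 1.5 0.5
```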
In this case, they are all metrics. Therefore, this approach is also called metric based in [91]. A noticeable property of d_p is that it takes a constant value if the supports of μ and ν are disjoint. In such cases, we have:

d_1(μ, ν) = |μ| + |ν|,  (38)
where |μ| denotes the fuzzy cardinality of μ, and for d_∞ we have:

d_∞(μ, ν) = max[sup_{x∈S} μ(x), sup_{x∈S} ν(x)],  (39)

which is equal to 1 if the fuzzy sets are normalized. These equations show that, as soon as the supports of μ and ν are disjoint, the value taken by their distance is constant, irrespective of how far the supports are from each other in S.

A slightly different version of d_1 has been proposed in [47,157], where the distance is normalized by |S| (cardinality of S). This normalization could be applied to any d_p as well (for p finite). However, this normalization does not change the properties or the type of information taken into account. It allows an easier link to similarity. Note that these definitions satisfy property P10 (proximity measure, in the sense of [70]) for p finite, and, for d_∞, for μ and ν being normalized and having a bounded support. The distance d_1 is also called geometrical distance in [47]. However, this definition (as well as the general definition d_p) considers only the geometry of the two fuzzy sets with respect to each other, in terms of shape of the membership function, but does not include the geometry related to d_S. The distance d_1 has been used in a pyramidal approach in image processing in [109] for recognizing objects based on their attributes. In this example, the fuzzy sets do not represent the objects themselves but fuzzy attributes of the objects. Therefore the spatial information is not taken into account at the level of the distance formulation but is rather included implicitly in the type of features used.

Summarizing the properties of the definitions derived from an L_p norm, we get P0–P4, P7 (with a maximum value of |S| for nonnormalized forms and 1 for normalized forms), P8, and P10. Properties P5 and P9 do not hold in general. A weaker form of P6 holds: if the supports are disjoint, then the distance is constant. Other forms of distances can be found in this class. For instance, in [126] the following form is proposed (in the finite discrete case):

d(μ, ν) = Σ_{x∈S} |μ(x) − ν(x)| / (|μ| + |ν|) = Σ_{x∈S} |μ(x) − ν(x)| / Σ_{x∈S} (μ(x) + ν(x)).  (40)

This equation corresponds to a normalization of d_1 by the sum of the cardinalities of μ and ν. Again, its value is constant if the supports of both fuzzy sets are disjoint, the constant being equal to 1.
This equation can be generalized by using any L_p norm as:

d(μ, ν) = [Σ_{x∈S} |μ(x) − ν(x)|^p]^{1/p} / [Σ_{x∈S} (μ(x)^p + ν(x)^p)]^{1/p}.  (41)
It still satisfies property P6. For such a normalization, we do not have P4, P5, P8, P9, and P10.

2. Information Theoretic Approach

Based on their definition of fuzzy entropy E(μ), de Luca and Termini define a pseudometric as [114]:

d(μ, ν) = |E(μ) − E(ν)|,  (42)

with

E(μ) = −K Σ_{x∈S} [μ(x) log μ(x) + (1 − μ(x)) log(1 − μ(x))],  (43)
where K is a normalization constant. This distance does not satisfy the separability condition. This can be overcome by considering the quotient space obtained through the equivalence relation μ ∼ ν ⟺ E(μ) = E(ν). However, this is not suitable for image processing. Indeed, since the entropy of a crisp set is zero, two crisp structures in an image belong to the same equivalence class, even if they are completely different. One main drawback of this approach is that the distance is based on the comparison of two global measures performed on μ and ν separately: there is nothing linking points of μ to points of ν, which is of reduced interest for computing distances. The properties satisfied by this definition are P0, P1, P3, P4, and P10.

Entropy functions under similarity [38,59] combine this approach with the membership comparison approach. This has been applied in decision problems (in particular for questionnaires) but, to our knowledge, not in image processing or other spatial information processing applications. Based on a similar approach, a notion of fuzzy divergence (which can be interpreted as a distance) has been introduced in [11], by mimicking Kullback's approach [106]:

d(μ, ν) = (1/|S|) Σ_{x∈S} [D_x(μ, ν) + D_x(ν, μ)],  (44)
with

D_x(μ, ν) = μ(x) log(μ(x)/ν(x)) + (1 − μ(x)) log((1 − μ(x))/(1 − ν(x))),

and the convention 0/0 = 1. This distance is positive and symmetrical, but does not satisfy the triangular inequality. Moreover, it is always equal to 0 for crisp sets. A slightly different version was then proposed in [10], which solves some indetermination in the computation, by replacing μ by 1 + μ (respectively ν by 1 + ν) in the logarithms:

D_x(μ, ν) = μ(x) log((1 + μ(x))/(1 + ν(x))) + (1 − μ(x)) log((2 − μ(x))/(2 − ν(x))).
The fuzzy divergence is a proximity measure in the sense of [70] (property P10). It also satisfies P0, P1, P2, P3, P7, and P8.

3. Set Theoretic Approach

In this approach, the distance between two fuzzy sets is seen as a set dissimilarity function, based on fuzzy union and intersection. Examples are given in [163]. The basic idea is that the distance should be larger if the two fuzzy sets weakly intersect. Most of the proposed measures are inspired from the work by Tversky [151], who proposes two parametric similarity measures between two sets A and B:

α f(A ∩ B) − β f(A − B) − γ f(B − A),  (45)

and in a rational form:

f(A ∩ B) / [f(A ∩ B) + β f(A ∩ B̄) + γ f(B ∩ Ā)],  (46)

where f(X) is typically the cardinality of X, α, β, and γ are parameters leading to different kinds of measures, and B̄ denotes the complement of B. Let us mention a few examples (they are given in the finite discrete case). A measure derived from the second Tversky measure by setting β = γ = 1 has been used by several authors [47,55,61,91,126,158,163]:

d(μ, ν) = 1 − Σ_{x∈S} min[μ(x), ν(x)] / Σ_{x∈S} max[μ(x), ν(x)].  (47)
This distance is a semimetric, and always takes the constant value 1 as soon as the two fuzzy sets have disjoint supports. It also corresponds to the Jaccard index [55]. With respect to the typology presented in [37], this distance is a comparison measure, and more precisely a dissimilarity measure (see Section III.D). Moreover, 1 − d is a resemblance measure. Applications in image processing can be found, for example, in [156], where it is used on fuzzy sets representing object features (and not directly spatial image objects) for structural pattern recognition on polygonal 2D objects. Equation (47) can be generalized by replacing the min by any t-norm t and the max by any t-conorm T:

d(μ, ν) = 1 − Σ_{x∈S} t[μ(x), ν(x)] / Σ_{x∈S} T[μ(x), ν(x)].  (48)
However, properties P1 and P2 hold only for the min and max, while property P6 holds for the minimum and product t-norms, and the dual t-conorms (but not for the Lukasiewicz ones, for instance). Properties P0, P3, and P7 are satisfied. Properties P4, P5, P8, P9, and P10 are not. A slightly different formula has been proposed in [157], which, however, translates a similar idea:

d(μ, ν) = 1 − (1/|S|) Σ_{x∈S} min[μ(x), ν(x)] / max[μ(x), ν(x)],  (49)
with the convention 0/0 = 1. It is a semimetric. It takes the constant value 1 if the two fuzzy sets have disjoint supports, without any other condition on their relative position in the space. Again this expression can be generalized as:

d(μ, ν) = 1 − (1/|S|) Σ_{x∈S} t[μ(x), ν(x)] / T[μ(x), ν(x)],  (50)
for any t-norm t and t-conorm T. But in general property P6 is not satisfied, and reflexivity (P1) holds only for min and max. The following modified version has been proposed in [53]:

d(μ, ν) = 1 − (1/|Supp(μ) ∪ Supp(ν)|) Σ_{x∈Supp(μ)∪Supp(ν)} t[μ(x), ν(x)] / T[μ(x), ν(x)],  (51)
which satisfies P6. It also satisfies P0, P3, and P7. Properties P1 and P2 are satisfied for the t-norm min and the t-conorm max.
Another measure takes into account only the intersection of the two fuzzy sets [47,91,163]:

d(μ, ν) = 1 − max_{x∈S} min[μ(x), ν(x)].  (52)

It is a semipseudometric if the fuzzy sets are normalized. Again it is a dissimilarity measure, and 1 − d is a resemblance measure. It is always equal to 1 if the supports of μ and ν are disjoint. This definition can be generalized to [89]:

d(μ, ν) = 1 − max_{x∈S} t[μ(x), ν(x)],  (53)
where t is any t-norm. It satisfies P0, P3, and P7. Property P6 is satisfied for the minimum and the product. Property P1 is satisfied for normalized fuzzy sets. If we set

(μ Δ ν)(x) = max[min(μ(x), 1 − ν(x)), min(1 − μ(x), ν(x))],

two other distances can be derived, as [91,163]:

d(μ, ν) = sup_{x∈S} (μ Δ ν)(x),  (54)

d(μ, ν) = Σ_{x∈S} (μ Δ ν)(x).  (55)
These two distances are symmetrical measures (P3). They are separable (P2) only for binary sets. Also, we have d(μ, μ) = 0 (P1) only for binary sets. They are dissimilarity measures. The first one is equal to 1 if μ and ν have disjoint supports and are normalized (if they are not normalized, then this constant value is equal to the maximum membership value of μ and ν). The second measure is always equal to |μ| + |ν| if μ and ν have disjoint supports. These measures actually rely on measures of inclusion of each fuzzy set in the other. Indeed, an inclusion index can be defined as [29,144]:

I(μ, ν) = inf_{x∈S} T[ν(x), 1 − μ(x)],  (56)

where T is a t-conorm. Since the distance should be large if the two sets have a small degree of equality (the equality between μ and ν can be expressed by "μ included in ν and ν included in μ," which leads to an easy transposition to fuzzy equality), a distance may be defined from an inclusion degree as:

d(μ, ν) = 1 − min[I(μ, ν), I(ν, μ)].  (57)
By taking T = max, we recover the definition derived from (μ Δ ν). This approach has been used in [6,158]. Other choices of T may lead to different properties of d. For instance, if T is taken as the Lukasiewicz t-conorm (bounded sum), then (μ Δ ν)(x) = |μ(x) − ν(x)|. Therefore we have:

sup_{x∈S} (μ Δ ν)(x) = d_∞(μ, ν),  (58)

and

Σ_{x∈S} (μ Δ ν)(x) = d_1(μ, ν).  (59)
In this case, both distances are metrics in the discrete finite case. These measures have been applied in image processing for image database applications in [91]. Other inclusion indexes can be defined, e.g., from the Tversky measure by setting β = 1 and γ = 0, leading to f(A ∩ B)/f(A) [55].

The last definitions, given by Equations (52) and (54), are, respectively, equivalent to 1 − Π(μ; ν) and 1 − min[N(μ; ν), N(ν; μ)] (where Π and N are possibility and necessity functions) used in fuzzy pattern matching [42,65], which has a large application domain, including image processing (see, e.g., [93]). The possibility Π is symmetrical in μ and ν and corresponds to a degree of intersection. The necessity is not symmetrical and corresponds to a degree of inclusion. It can be useful, for instance, if we want to compare an object to a model. For instance, if the object is only a substructure, it makes sense to consider its degree of inclusion in the model object. On the contrary, if the object groups several structures, then the degree of inclusion of the model in the object is meaningful. In cases where a direct comparison is possible, a symmetrical expression such as 1 − min[N(μ; ν), N(ν; μ)] is appropriate. A further interest of this approach is that it allows one to evaluate the distance not only as a number, but as an interval such as [1 − Π, 1 − N], which provides more information than only one of these two numbers. It is interesting to note that the necessity and the possibility are related to fuzzy mathematical morphology, since Π(μ; ν) corresponds to the dilation of μ by ν at the origin, while N(μ; ν) corresponds to the erosion of μ by ν at the origin. These definitions can be straightforwardly generalized to fuzzy union and intersection derived from t-norms and t-conorms, leading to a correspondence with other forms of fuzzy mathematical morphology [29]. Such generalizations using any t-norm and t-conorm for set relationships can be done for all definitions presented in this section.
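The possibility and necessity underlying this pattern-matching view can be sketched on hypothetical discrete membership vectors, with Π(μ; ν) = sup_x min(μ(x), ν(x)) and N(μ; ν) = inf_x max(μ(x), 1 − ν(x)); the pair then yields the interval [1 − Π, 1 − N] mentioned above:

```python
def possibility(mu, nu):
    """Pi(mu; nu): degree of intersection (sup of pointwise min)."""
    return max(min(a, b) for a, b in zip(mu, nu))

def necessity(mu, nu):
    """N(mu; nu): degree of inclusion of nu in mu (inf of pointwise max(mu, 1 - nu))."""
    return min(max(a, 1 - b) for a, b in zip(mu, nu))

mu = [0.0, 0.5, 1.0, 0.5]    # model
nu = [0.0, 0.0, 1.0, 1.0]    # object
interval = (1 - possibility(mu, nu), 1 - necessity(mu, nu))
print(interval)   # → (0.0, 0.5)
```

Here the two sets certainly intersect (Π = 1, lower distance bound 0), but ν is only partially included in μ (N = 0.5), which the upper bound of the interval reflects.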
4. Pattern Recognition Approach

This approach consists in first expressing each fuzzy set in a feature space (e.g., cardinality, moments, skewness) and then computing the Euclidean distance between the two feature vectors [163] or attribute vectors [145]. This approach may take advantage of some of the previous approaches, for instance by using entropy or similarity in the set of features. It has been applied for instance for database applications [145]. A similar approach, called signal detection theory, has been proposed in [91]. It is based on counting the number of similar and different features. A particular form of distance between attributes can be found in [47], where the distance is defined from vectorial representations a and b as:

1 − (a · b)/max(a · a, b · b).  (60)
This form is very close to correlation-based approaches, such as the one described in [81,157]:

d(μ, ν) = 1 − Σ_{x∈S} [μ(x)ν(x) + (1 − μ(x))(1 − ν(x))] / √(Σ_{x∈S} [μ(x)² + (1 − μ(x))²] · Σ_{x∈S} [ν(x)² + (1 − ν(x))²]).  (61)
This expression is symmetrical (property P3), reflexive (property P1), and satisfies the separability property P2. It does not satisfy P6. Properties P0, P7, and P10 are satisfied. The Bhattacharyya distance [61] can also be attached to this class. It is defined as:

d(μ, ν) = [1 − ∫_S √(μ(x)ν(x)/(|μ| |ν|)) dx]^{1/2}.  (62)
It has been used in image processing for classification in satellite images in [119].

B. Accounting for Spatial Distances

The second class of methods tries to include the spatial distance d_S in the distance between μ and ν. In contrast to the definitions given in Section VI.A, in this second class the membership values at different points of S are linked using some formal computation, making the introduction of d_S possible. This leads to definitions that do not share the drawbacks of the previous approaches, for instance when the supports of the two fuzzy sets are disjoint.

1. Geometrical Approach

The geometrical approach consists in generalizing one of the distances between crisp sets. This has been done for instance for the nearest point distance [62,138], the mean distance [138], and the Hausdorff distance [62], and could easily be extended to other distances (see, e.g., [31] for a review of crisp set distances). These generalizations follow four main principles. The first one consists in considering fuzzy sets in an n-dimensional space as (n + 1)-dimensional crisp sets and then using classical distances [82]. However, this is often not satisfactory in image processing because the n dimensions of S and the membership dimension (values in [0, 1]) have completely different interpretations, and treating them in a unique way is questionable. The second principle is a fuzzification principle (see Section III.D): let D be a distance between crisp sets; then its fuzzy equivalent is defined by:

d(μ, ν) = ∫_0^1 D(μ_α, ν_α) dα,  (63)
or by a discrete sum if the fuzzy membership functions are piecewise constant [60,163] (μ_α denotes the α-cut of μ). In this way, d(μ, ν) inherits the properties of the chosen crisp distance. Another way to consider the fuzzification principle consists in using a double integration (see Section III.D). However, using this double fuzzification, some properties of the underlying distance may be lost. The third principle consists in weighting distances by membership values. For the average distance this leads for instance to [138]:

d(μ, ν) = Σ_{x∈S} Σ_{y∈S} d_S(x, y) min[μ(x), ν(y)] / Σ_{x∈S} Σ_{y∈S} min[μ(x), ν(y)].  (64)
The last approach consists in defining a fuzzy distance as a fuzzy set on ℝ⁺ instead of as a crisp number, using the extension principle (see Section III.D). For the nearest point distance this leads to [138]:

d(μ, ν)(r) = sup_{x, y, d_S(x,y) ≤ r} min[μ(x), ν(y)],  (65)

which is actually a distance distribution.
A similar approach has been used in [120], and the corresponding distance density is expressed as:

d(μ, ν)(r) = sup_{x, y, d_S(x,y) = r} min[μ(x), ν(y)].  (66)
The Hausdorff distance is probably the set distance whose fuzzy extension has been the most widely studied. One reason for this may be that it is a true metric in the crisp case, while other set distances, such as the minimum or average distances, have weaker properties. Another reason is that it has been used to determine a degree of similarity between two objects, or between an object and a model [88]. Extensions of this distance have been defined using fuzzification over the α-cuts and using the extension principle [39,45,57,133,134,163]. One potential problem with these approaches occurs in the case of empty α-cuts [40,71]. Boxer [39] proposed to add a crisp set to every set, but the result is highly dependent on this additional set, and does not reduce to the classical Hausdorff distance when applied to crisp sets. The solution proposed in [71] consists in clipping the distance at some maximum distance, but similar problems arise. Other authors use the Hausdorff distance between the endographs of the two membership functions [57] (which corresponds to the first principle mentioned above). Several generalizations of the Hausdorff distance have also been proposed under the form of fuzzy numbers [62]. Extensions of the Hausdorff distance based on fuzzy mathematical morphology have also been developed [14] and are presented in the next section. Extensions of these definitions may be obtained by using other weighting functions, for instance by using t-norms instead of min. These distances share most of the advantages and drawbacks of the underlying crisp distance [31]: the computation cost can be high (it is already high for several crisp distances); moreover, interpretation and robustness strongly depend on the chosen distance (for instance, the Hausdorff distance is noise sensitive, whereas the average distance is not).
2. Morphological Approach

We proposed in [14,20] original approaches for defining fuzzy distances that take spatial information into account, based on fuzzy mathematical morphology. They are summarized in the following. These definitions are obtained by directly translating crisp equations expressing distances in terms of mathematical morphology into fuzzy ones (see Section III.D). We give only the examples of the nearest point distance and the Hausdorff distance.
ISABELLE BLOCH
In the binary case, for n > 0, the nearest point distance can be expressed in morphological terms as:
\[ d_N(X,Y) = n \iff D^n(X) \cap Y \neq \emptyset \ \text{and}\ D^{n-1}(X) \cap Y = \emptyset, \qquad (67) \]
together with the symmetrical expression. For n = 0 we have:
\[ d_N(X,Y) = 0 \iff X \cap Y \neq \emptyset. \qquad (68) \]
The translation of these equivalences provides, for n > 0, the following distance density:
\[ \delta_N(\mu,\mu')(n) = t\left[ \sup_{x \in S} t[\mu'(x), D^n(\mu)(x)],\ c\left( \sup_{x \in S} t[\mu'(x), D^{n-1}(\mu)(x)] \right) \right] \qquad (69) \]
or a symmetrical expression derived from this one, and:
\[ \delta_N(\mu,\mu')(0) = \sup_{x \in S} t[\mu(x), \mu'(x)]. \qquad (70) \]
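A minimal sketch of Equations (69)-(70), with t = min, c(a) = 1 - a, and the fuzzy dilation realized by a crisp ball of radius n (the function names and the 1D discrete setting are ours, chosen for illustration):

```python
# Sketch of the distance density of Equations (69)-(70), with t-norm = min,
# complementation c(a) = 1 - a, and D^n = dilation by the crisp ball of radius n.

def dilate(mu, n, points):
    """D^n(mu)(x) = sup over y with d(x, y) <= n of mu(y)."""
    return {x: max((mu.get(y, 0.0) for y in points if abs(x - y) <= n),
                   default=0.0) for x in points}

def inter(a, b, points):
    """Degree of intersection: sup_x min(a(x), b(x))."""
    return max(min(a.get(x, 0.0), b.get(x, 0.0)) for x in points)

def nearest_point_density(mu, mu2, points, n):
    if n == 0:
        return inter(mu, mu2, points)                           # Eq. (70)
    hit = inter(mu2, dilate(mu, n, points), points)             # meets D^n(mu)
    miss = 1.0 - inter(mu2, dilate(mu, n - 1, points), points)  # misses D^{n-1}(mu)
    return min(hit, miss)                                       # Eq. (69)

points = list(range(-3, 7))
mu, mu2 = {0: 1.0}, {2: 1.0}
# crisp sanity check: the density is 1 exactly at the crisp distance 2
print([nearest_point_density(mu, mu2, points, n) for n in range(4)])
```

On crisp inputs the density collapses to the indicator of the classical nearest point distance, as expected from the translation of Equations (67)-(68).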
Equation (70) shows how the membership values are taken into account, without involving the extension principle. As for the nearest point distance, we can extend the Hausdorff distance by directly translating the binary equation defining it:
\[ d_H(X,Y) = \max\left[ \sup_{x \in X} d(x,Y),\ \sup_{y \in Y} d(y,X) \right]. \qquad (71) \]
This distance can be expressed in morphological terms as:
\[ d_H(X,Y) = \inf\{ n,\ X \subseteq D^n(Y) \ \text{and}\ Y \subseteq D^n(X) \}. \qquad (72) \]
From Equation (72), a distance distribution can be defined by introducing fuzzy dilation:
\[ \Delta_H(\mu,\mu')(n) = t\left[ \inf_{x \in S} T[D^n(\mu)(x), c(\mu'(x))],\ \inf_{x \in S} T[D^n(\mu')(x), c(\mu(x))] \right], \qquad (73) \]
where c is a complementation, t a t-norm, and T a t-conorm. A distance density can be derived implicitly from this distance distribution. A direct definition of a distance density can be obtained from:
\[ d_H(X,Y) = 0 \iff X = Y, \qquad (74) \]
ON FUZZY SPATIAL DISTANCES
and for n > 0:
\[ d_H(X,Y) = n \iff \big( X \subseteq D^n(Y) \ \text{and}\ Y \subseteq D^n(X) \big) \ \text{and}\ \big( X \not\subseteq D^{n-1}(Y) \ \text{or}\ Y \not\subseteq D^{n-1}(X) \big). \qquad (75) \]
Translating these equations leads to a definition of the Hausdorff distance between two fuzzy sets μ and μ' as a fuzzy number:
\[ \delta_H(\mu,\mu')(0) = t\left[ \inf_{x \in S} T[\mu(x), c(\mu'(x))],\ \inf_{x \in S} T[\mu'(x), c(\mu(x))] \right], \qquad (76) \]
\[ \delta_H(\mu,\mu')(n) = t\left[ \inf_{x \in S} T[D^n(\mu)(x), c(\mu'(x))],\ \inf_{x \in S} T[D^n(\mu')(x), c(\mu(x))],\ T\left( \sup_{x \in S} t[\mu(x), c(D^{n-1}(\mu')(x))],\ \sup_{x \in S} t[\mu'(x), c(D^{n-1}(\mu)(x))] \right) \right]. \qquad (77) \]
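The distance distribution of Equation (73) can be sketched as follows, with T = max, c(a) = 1 - a, t = min, and fuzzy dilation by a crisp ball of radius n (all names are our own):

```python
# Sketch of the Hausdorff distance distribution of Equation (73), with
# T = max, c(a) = 1 - a, t = min, and dilation by the crisp ball of radius n.

def dilate(mu, n, points):
    return {x: max((mu.get(y, 0.0) for y in points if abs(x - y) <= n),
                   default=0.0) for x in points}

def inclusion(a, b, points):
    """Degree to which a is included in b: inf_x max(b(x), 1 - a(x))."""
    return min(max(b.get(x, 0.0), 1.0 - a.get(x, 0.0)) for x in points)

def hausdorff_distribution(mu, mu2, points, n):
    # degree to which mu' is included in D^n(mu), and conversely
    return min(inclusion(mu2, dilate(mu, n, points), points),
               inclusion(mu, dilate(mu2, n, points), points))

points = list(range(-4, 8))
mu, mu2 = {0: 1.0}, {3: 1.0}
# crisp sanity check: the distribution jumps to 1 at the crisp Hausdorff distance 3
print([hausdorff_distribution(mu, mu2, points, n) for n in range(5)])
```

The distribution is nondecreasing in n, in accordance with the increasing inclusions X ⊆ D^n(Y) of Equation (72).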
The above definitions of fuzzy nearest point and Hausdorff distances (defined as fuzzy numbers) between two fuzzy sets do not necessarily share the same properties as their crisp equivalents. This is due in particular to the fact that, depending on the choice of the t-norms and t-conorms involved, the excluded-middle and noncontradiction laws may not be satisfied.
All distances are positive, in the sense that the defined fuzzy numbers always have a support included in R⁺. By construction, all defined distances are symmetrical with respect to μ and μ' (P3). The separability property (P2) is not always satisfied. For the Hausdorff distance, δ_H(μ,μ')(0) = 1 implies μ = μ' when T is the bounded sum (T(a,b) = min(1, a+b)), while it implies that μ and μ' are crisp and equal for T = max. As for property P1 (reflexivity), if μ is normalized, we have for the nearest point distance δ_N(μ,μ)(0) = 1 and δ_N(μ,μ)(n) = 0 for n ≥ 1. Also, the triangular inequality is not satisfied in general.
Another morphological approach has been suggested in [146], based on links between the minimum distance and the Minkowski difference. In the crisp case, we have:
\[ d_N(X,Y) = \inf\{ \|z\|,\ z \in Y \ominus X \}, \qquad (78) \]
if X and Y are nonintersecting crisp sets, where Y ⊖ X denotes the set of difference vectors {y − x, y ∈ Y, x ∈ X}. In order to account for possible intersection between the two sets, the authors also introduce the notion of
penetration distance, defined along a direction v as the maximum translation of X along v such that X still meets Y:
\[ \pi(v; X, Y) = \max\{ k,\ (X + kv) \cap Y \neq \emptyset \}. \qquad (79) \]
The extension to fuzzy sets is done by assuming fuzzy numbers on each axis. This leads to reasonable computation times, but unfortunately cannot be directly extended to arbitrary fuzzy objects. Finally, we propose a new definition in the discrete case, based on links with mathematical morphology and more operational from a computational point of view. It relies on the idea of a distance transform, which assigns to each point of S the distance to some object. From this distance transform, the nearest point distance between two sets can easily be computed. In the crisp case, this distance transform can be computed by a dilation by a conic structuring element ν, defined as:
\[ \forall x \in S,\ \nu(x) = \max\left( 0,\ 1 - \frac{d(x,O)}{k} \right), \qquad (80) \]
where O is the origin and k a constant used to limit the support of the structuring element to the maximal distance of interest. It is easy to prove in the crisp case that:
\[ D_\nu(X)(x) = 1 - \frac{d(x,X)}{k}, \qquad (81) \]
i.e., the distance from x to X is directly linked to the dilation of X by ν at x. The minimum distance between X and Y is then given by:
\[ d_N(X,Y) = \min_{y \in Y} \big( 1 - D_\nu(X)(y) \big) k. \qquad (82) \]
We now apply similar formulas in the fuzzy case in order to define a fuzzy nearest point distance. We therefore dilate a fuzzy set μ by the conic structuring element ν:
\[ D_\nu(\mu)(x) = \sup_{y \in S} \min\left[ \mu(y),\ \max\left( 0,\ 1 - \frac{d(y - x, O)}{k} \right) \right]. \qquad (83) \]
It is easy to show that this dilation preserves the core, i.e.:
\[ \mathrm{Core}(D_\nu(\mu)) = \mathrm{Core}(\mu). \]
This shows that the points at distance 0 from μ are exactly the points of its core. From this dilation we define the nearest point distance between two fuzzy sets μ and μ' as:
\[ d_N(\mu,\mu') = \sup_{\alpha \in [0,1]} \min_{x \in \mu'_\alpha} \big( 1 - D_\nu(\mu)(x) \big) k, \qquad (84) \]
or a symmetrical expression obtained by exchanging the roles of μ and μ' (this allows one to obtain a symmetrical distance satisfying P3). This defines the distance as a positive number (not a fuzzy number). Moreover, we have d_N(μ,μ) = 0 (P1), and d_N(μ,μ') = 0 iff Core(μ) ∩ Core(μ') ≠ ∅, which is a property similar to the crisp case for this distance (but weaker than P2). Property P0 is obviously satisfied. The Hausdorff distance can be defined as a number in a similar way:
\[ d_H(\mu,\mu') = \sup_{\alpha \in [0,1]} \max\left[ \max_{x \in \mu'_\alpha} \big( 1 - D_\nu(\mu)(x) \big) k,\ \max_{x \in \mu_\alpha} \big( 1 - D_\nu(\mu')(x) \big) k \right]. \qquad (85) \]
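The distance-map construction of Equations (80)-(84) can be sketched in 1D as follows (the function names, the grid, and the α-sampling are our own choices):

```python
# Sketch of Equations (80)-(84): fuzzy dilation by a conic structuring
# element yields a distance map from which the nearest point distance is read.

def conic_dilate(mu, points, k):
    """D_nu(mu)(x) = sup_y min(mu(y), max(0, 1 - d(x, y)/k))  (Eqs. 80, 83)."""
    return {x: max(min(mu.get(y, 0.0), max(0.0, 1.0 - abs(x - y) / k))
                   for y in points) for x in points}

def nearest_point(mu, mu2, points, k, alphas=(0.25, 0.5, 0.75, 1.0)):
    """d_N(mu, mu') = sup_alpha min over the alpha-cut of mu' of (1 - D_nu(mu)) k."""
    dist_map = conic_dilate(mu, points, k)
    best = 0.0
    for a in alphas:  # finite sampling of the sup over alpha in [0, 1]
        cut = [x for x in points if mu2.get(x, 0.0) >= a]
        if cut:
            best = max(best, min((1.0 - dist_map[x]) * k for x in cut))
    return best

points = list(range(-2, 8))
d = nearest_point({0: 1.0}, {3: 1.0}, points, k=10.0)
print(round(d, 6))  # 3.0 -- matches the crisp nearest point distance
```

Note the computational appeal mentioned in the text: the distance map is computed once and then simply thresholded over the α-cuts.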
This expression defines a positive number. The obtained distance satisfies P0 and P3. We do not have P1 and P2, but only d_H(μ,μ') = 0 iff μ = μ' and both are crisp.

3. Tolerance-Based Approach

This approach has been developed in [113]. The basic idea is to combine spatial information and membership values by assuming a tolerance value ρ, indicating the differences that can occur without the objects no longer being considered similar. The proposed definitions are semipseudometrics and are derived from the functional approach (see Section IV.A). The authors first define a local difference between μ and ν at a point x of S as:
\[ d_x^\rho(\mu,\nu) = \inf_{y,z \in B(x,\rho)} |\mu(y) - \nu(z)|, \qquad (86) \]
where B(x, ρ) denotes the (spatial) closed ball centered at x, of radius ρ. Then the functions d_p, d_∞, and d_EssSup are defined up to a tolerance ρ as:
\[ d_p^\rho(\mu,\nu) = \left[ \int_S \big( d_x^\rho(\mu,\nu) \big)^p \, dx \right]^{1/p}, \qquad (87) \]
\[ d_\infty^\rho(\mu,\nu) = \sup_{x \in S} d_x^\rho(\mu,\nu), \qquad (88) \]
\[ d_{\mathrm{EssSup}}^\rho(\mu,\nu) = \inf\{ k \in \mathbb{R},\ \lambda(\{ x \in S,\ d_x^\rho(\mu,\nu) > k \}) = 0 \}, \qquad (89) \]
where λ denotes the underlying measure on S.
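A sketch of Equations (86) and (88) on a 1D discrete space (the names and the spike example are ours); it illustrates how the tolerance absorbs small spatial shifts:

```python
# Sketch of Equations (86) and (88): local difference up to a tolerance rho,
# and the derived sup-pseudometric.

def local_diff(mu, nu, x, rho, points):
    """d_x^rho(mu, nu) = inf over y, z in B(x, rho) of |mu(y) - nu(z)|."""
    ball = [y for y in points if abs(y - x) <= rho]
    return min(abs(mu.get(y, 0.0) - nu.get(z, 0.0)) for y in ball for z in ball)

def d_inf(mu, nu, rho, points):
    """d_infinity^rho(mu, nu) = sup_x d_x^rho(mu, nu)  (Eq. 88)."""
    return max(local_diff(mu, nu, x, rho, points) for x in points)

points = list(range(-2, 4))
mu, nu = {0: 1.0}, {1: 1.0}   # the same spike, shifted by one point
print(d_inf(mu, nu, 0, points), d_inf(mu, nu, 1, points))  # 1.0 0.0
```

With ρ = 0 the two shifted spikes are maximally different, while a tolerance of one point makes them indistinguishable, which is exactly the intended behavior.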
Several results are proved in [113], in particular about convergence: d_p^ρ(μ,ν) converges towards d_EssSup^ρ(μ,ν) when p goes to infinity; all pseudometrics are decreasing with respect to ρ; and d_p^ρ, d_∞^ρ, and d_EssSup^ρ converge towards d_p, d_∞, and d_EssSup when ρ becomes infinitely small, for continuous fuzzy sets. This approach has been extended in [111] by allowing the neighborhood around each point to depend on the point. Note that this approach has strong links with the morphological approaches, since the neighborhood considered around each point can be seen as a structuring element. This approach has been illustrated on an example of noisy character recognition.

4. Graph Theoretic Approach

A similarity function between fuzzy graphs may also induce a distance between fuzzy sets. This approach contrasts with the previous ones, since the objects are no longer represented directly as fuzzy sets on S or as vectors of attributes, but as higher level structures. Fuzzy graphs in image processing can be used for representing objects, as in [116], or a scene, as in [100]. In the first case, nodes are parts of the objects and arcs are links between these parts. In the example presented in [116] for character recognition, nodes are fuzzy sets representing features of a character, extracted by some image processing. In the second case, nodes are objects of the scene and arcs are relationships between these objects. In the example of [100], the nodes represent clouds extracted from satellite images. These two examples use different ways of considering distances (or similarity) between fuzzy graphs. In [116], the distance is defined from a similarity between nodes and between arcs (both being fuzzy sets), given a correspondence between nodes (respectively between arcs). The similarity used compares only membership functions, using a set theoretic approach (see Section VI.A), and corresponds to Equation (47).
Although it has not been considered in this reference, spatial distance can then be taken into account if we include it in the attribute set. This idea is probably worth further development. In a similar way, several distances between graphs have been proposed as an objective function to find the correspondence between graphs. This function compares attributes of nodes of the two graphs to be matched, and attributes of arcs. One of the main difficulties is dealing with nonbijective matching. This has been addressed for instance in [7,43,127], where a formalism for defining fuzzy morphisms between graphs is proposed, as well as optimization methods for finding the best morphism according to an objective function including spatial distance information as an edge attribute.
Another way to consider distances between objects is in terms of the cost of the deformations needed to bring one set into correspondence with the other. Such approaches are particularly powerful in graph-based methods. The distance can then be expressed as the cost of matching two graphs, as done in [100] for image processing applications, or as the Levenshtein distance accounting for the transformations (insertions, substitutions, deletions) necessary for going from the structural representation of one shape to that of the other [54]. In [100], the fuzzy aspect is taken into account through weighting factors; the method is therefore quite close to the weighted Levenshtein distance of [54]. Spatial distances could also be introduced as one of the relationships between objects in these approaches. A distance between conceptual graphs is defined in [115] as an interval [N, Π], where N represents the necessity and Π the possibility, obtained by a fuzzy pattern matching approach. Although the application is not related to image processing, the idea of expressing similarity as an interval is interesting and could certainly be exploited in other domains. A second interest of this approach is that the nodes of the graph are concepts, which could be (although not explicitly mentioned in this reference) represented as fuzzy sets (like linguistic variables). Although these examples are still far from the main concern of this chapter, they are worth mentioning, since they bring an interesting structural aspect that could be further developed.

5. Histogram of Distances

Until now we have considered the problem of evaluating a specific distance (nearest point, Hausdorff, etc.) between two given fuzzy sets. Another question is to check whether two fuzzy sets satisfy a distance property, expressed for instance in linguistic terms.
To answer such questions, we propose here a new approach, which consists of expressing all distance information between the two objects as a fuzzy set on the positive real line, and comparing this fuzzy set to a fuzzy set expressing the semantics of the distance property to be checked, using a fuzzy pattern matching approach [42,65]. This idea is inspired by previous work on directional position [16,19,99,122]. The complete distance information between the two objects is encoded in a distance histogram, and the pattern matching provides an evaluation as an interval. We first define the histogram of distances between two crisp sets X and Y as:
\[ \forall d \in \mathbb{R}^+,\ H(X,Y)(d) = |\{ (x,y),\ x \in X,\ y \in Y,\ d_S(x,y) = d \}|. \qquad (90) \]
In the finite case, H(X,Y)(d) is equal to 0 outside a bounded interval [d_1, d_2]. From this histogram, it is possible to recover several distances between X and Y:
\[ d_N(X,Y) = \min\{ d,\ H(X,Y)(d) \neq 0 \} = d_1, \]
\[ d_M(X,Y) = \max\{ d,\ H(X,Y)(d) \neq 0 \} = d_2, \]
\[ d_A(X,Y) = \frac{\sum_{d=d_1}^{d_2} d\, H(X,Y)(d)}{\sum_{d=d_1}^{d_2} H(X,Y)(d)} = \frac{\sum_{d=d_1}^{d_2} d\, H(X,Y)(d)}{|X|\,|Y|}. \]
The Hausdorff distance cannot be obtained directly from the histogram. A normalized version of this histogram (obtained by dividing each value by |X||Y|, or alternatively by max_{d∈R⁺} H(d)) allows us to consider it as a fuzzy set carrying all distance information between the two objects. The properties of H are:

- H is symmetrical in X and Y: ∀d ∈ R⁺, H(X,Y)(d) = H(Y,X)(d);
- if X = Y, then d_1 = 0 and H(X,Y)(d_1) = |X| = |Y|;
- if d_1 = 0, then X ∩ Y ≠ ∅ (we recognize here a property of the minimum distance).
This idea can be extended to the distance histogram between fuzzy objects μ and ν by weighting the contribution of each pair of points by its membership values:
\[ \forall d \in \mathbb{R}^+,\ H(\mu,\nu)(d) = \sum_{(x,y) \in S^2,\ d_S(x,y) = d} \min[\mu(x), \nu(y)]. \qquad (91) \]
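Equation (91) and the distances recovered from the histogram can be sketched as follows (helper names are ours; the example uses crisp inputs so the recovered values can be checked by hand):

```python
# Sketch of Equation (91) and of the distances recovered from the histogram.

def fuzzy_histogram(mu, nu):
    """H(mu, nu)(d) = sum over pairs at distance d of min(mu(x), nu(y))."""
    h = {}
    for x, mx in mu.items():
        for y, ny in nu.items():
            d = abs(x - y)
            h[d] = h.get(d, 0.0) + min(mx, ny)
    return h

def recovered_distances(h):
    support = [d for d, v in h.items() if v > 0]
    d_n, d_m = min(support), max(support)  # nearest point / maximum distances
    d_a = sum(d * v for d, v in h.items()) / sum(h.values())  # average distance
    return d_n, d_m, d_a

h = fuzzy_histogram({0: 1.0, 1: 1.0}, {3: 1.0})  # crisp X = {0, 1}, Y = {3}
print(recovered_distances(h))  # (2, 3, 2.5)
```

Here sum(h.values()) plays the role of |X||Y| in the crisp normalization.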
The sum in Equation (91) is actually limited to the points of the supports of μ and of ν, respectively, and is therefore finite. Any t-norm could be used instead of min. Again, normalizing this histogram in [0, 1] leads to an appropriate interpretation as a fuzzy set representing the distance information between μ and ν. The properties of H in the fuzzy case are similar to those in the crisp case:

- H is symmetrical in μ and ν: ∀d ∈ R⁺, H(μ,ν)(d) = H(ν,μ)(d);
- if μ = ν, then H(μ,μ)(0) = Σ_x min(μ(x), μ(x)) = |μ| = |ν|;
- if d_1 = 0, then Supp(μ) ∩ Supp(ν) ≠ ∅.
Now, if we want to evaluate the satisfaction of a distance relationship between two objects, such as "near," "far," or "very far," we can compare the normalized histogram with the fuzzy set expressing the semantics of the desired distance value, denoted by μ_dist, where dist denotes any of the linguistic values (near, far, etc.). For instance, for d ∈ R⁺, μ_near(d) represents the degree to which d is considered a near distance value. This comparison can be done using a compatibility approach, as in [122] for directional position, or using a fuzzy pattern matching approach. We detail here this second possibility. Note that all information is encoded on R⁺, and the comparison is done in the same space. In Section VII, we will address similar problems, but directly in the spatial domain S. The pattern matching between μ_dist and the normalized histogram H(μ,ν) provides an evaluation of the relation dist as two numbers, the necessity N and the possibility Π, defined as:
\[ N = \inf_{d \in \mathbb{R}^+} T[1 - \mu_{dist}(d),\ H(\mu,\nu)(d)], \qquad (92) \]
\[ \Pi = \sup_{d \in \mathbb{R}^+} t[\mu_{dist}(d),\ H(\mu,\nu)(d)], \qquad (93) \]
where T is a t-conorm and t a t-norm. The value N represents the degree of inclusion of μ_dist in H(μ,ν), i.e., the degree to which the relation dist is a part of the distance relationships between μ and ν. The value Π represents the degree of intersection of μ_dist and H(μ,ν), i.e., the degree to which dist is compatible with the distance relationships between μ and ν. The length of the interval [N, Π] represents the ambiguity of the relation. Extremal values for N and Π are obtained in the following situations:

- Π = 1 iff Core(μ_dist) ∩ Core(H(μ,ν)) ≠ ∅ (where Core(μ) denotes the set of modal values of μ, i.e., those having a membership value equal to 1);
- if Supp(μ_dist) ∩ Supp(H(μ,ν)) = ∅, then Π = 0, the reverse implication being true for some t-norms, such as min and product;
- if Supp(μ_dist) ∩ Supp(1 − H(μ,ν)) = ∅, then N = 1, the reverse implication being true for some t-conorms, such as max and algebraic sum;
- N = 0 iff Core(μ_dist) ∩ Core(1 − H(μ,ν)) ≠ ∅.
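The pattern matching of Equations (92)-(93) can be sketched with T = max and t = min (the semantics chosen for "near" and the histogram values are purely illustrative):

```python
# Sketch of Equations (92)-(93) with T = max and t = min.

def pattern_match(mu_dist, hist, ds):
    """Necessity and possibility of the relation 'dist' given histogram H."""
    N = min(max(1.0 - mu_dist.get(d, 0.0), hist.get(d, 0.0)) for d in ds)
    Pi = max(min(mu_dist.get(d, 0.0), hist.get(d, 0.0)) for d in ds)
    return N, Pi

ds = range(6)
mu_near = {0: 1.0, 1: 1.0, 2: 0.5}       # illustrative semantics of "near"
hist = {1: 1.0, 2: 0.5}                  # normalized distance histogram
print(pattern_match(mu_near, hist, ds))  # (0.0, 1.0)
```

Here the relation "near" is fully possible (Π = 1, the cores intersect at d = 1) but not at all necessary (N = 0, since d = 0 is in the core of μ_near but carries no histogram mass), so the interval [N, Π] is maximally ambiguous.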
VII. SPATIAL REPRESENTATIONS OF DISTANCE INFORMATION

In this section we propose to represent distance information with respect to an object as a spatial fuzzy set, following the framework proposed in [22] for spatial representations of spatial information of various types.

A. Spatial Fuzzy Sets as a Representation Framework

The main idea here is to translate spatial information or knowledge into a spatial representation, and more precisely into a spatial fuzzy set representing the degree to which distance relationships to a reference object are satisfied at each point of the space S. Such representations can also be derived for very heterogeneous types of knowledge and then used in a fusion process that combines all these fuzzy regions of interest, in order to focus attention by reducing the search space and restricting it to the area that satisfies most relationships, as proposed in [22,25]. This type of situation occurs, for instance, if we want to exploit spatial knowledge for guiding the recognition of an object, or to perform spatial reasoning. Such knowledge is generally heterogeneous: it may concern the object we are looking for (its shape, topology, color, position) or its relationships to other objects (distances, adjacency, relative directional position). It may be generic (typically if derived from a model or from expert knowledge) or factual (if derived from the scene itself). And it may be provided in many different forms. Classically, it can be a number, a distribution, or a binary value. But we can also be concerned with imprecise values and with propositional formulas, which are often used by experts within a given application. Imprecise values are sometimes expressed in linguistic terms: for instance, the expected distance between two objects ("close," "far," etc.). They can also be expressed as an interval, as mentioned previously.
Therefore the proposed framework allows one to represent different pieces of information in the same domain, and is thus suitable for translating heterogeneous knowledge into a form usable for reasoning. In the following, a point (volume element or voxel) in the 3D discrete space S is denoted by v. For each piece of knowledge, we consider its "natural expression," i.e., the usual form in which it is given or available, and translate it into a spatial fuzzy set in the space, the membership function of which is denoted by:
\[ \mu_{\text{knowledge}} : S \to [0,1],\quad v \mapsto \mu_{\text{knowledge}}(v). \qquad (94) \]
In this representation, each piece of knowledge becomes a fuzzy region of the space. If the knowledge is considered as a constraint to be satisfied by the object to be recognized, this fuzzy region represents a search area, or a fuzzy volume of interest, for this object. This type of representation provides a common framework for representing pieces of information of various types (objects, spatial imprecision, relationships to other objects, etc.). Therefore, the fuzzy regions defined in the space S corresponding to these pieces of information may have different semantics. Moreover, this common framework allows the combination of this heterogeneous information, as stated previously. The numerical representation of membership values assumes that we can assign numbers that represent, for instance, degrees of satisfaction of a relationship. These numbers can be derived from prior knowledge or learned from examples, but some quite arbitrary choices usually remain. This might appear as a drawback in comparison to propositional representations. However, it is not necessary to have precise estimations of these values, and we have experimentally observed a good robustness with respect to these estimations in various problems, such as information fusion, object recognition, and scene interpretation [32,80]. This can be explained by two reasons: first, the fuzzy representations are used for rough information and therefore do not have to be precise themselves; and second, several pieces of information are usually combined in a whole reasoning process, which decreases the influence of each particular value. Therefore, the chosen numbers are not crucial. What is important is that ranking is preserved.
For instance, if a region of the space satisfies a relationship to some objects to a higher degree than another region, then this ranking is preserved in the representation for all relationships described in the following sections, assuming the existence of ranking is reasonable for the type of relations we consider.
B. Spatial Representation of Distance Knowledge to a Given Object

We now apply the previous idea to translate expressions of knowledge about distances into spatial volumes of interest within S, taking imprecision and uncertainty into account, in order to handle approximate statements where distances can be expressed as numbers, but also as intervals, fuzzy numbers, linguistic values, etc. In contrast to the approach proposed in [78,79], where linguistic variables about distances are represented as fuzzy sets on each axis, from which distance knowledge in the space can be derived, we choose here to represent distance knowledge directly in the space S, as spatial
fuzzy sets. The method we propose is independent of the dimension of S and uses morphological expressions of distances [20], as detailed in Section VI. We assume that a set A is known, as an already recognized object or a known area of S, and that we want to determine B, subject to satisfying some distance relationship with A. According to the algebraic expressions of distances, dilation of A is an adequate tool for this. Let us consider the following cases:

- If knowledge expresses that d_N(A,B) = n, then the border of B should intersect the region defined by D^n(A) \ D^{n-1}(A), which is made up of the points exactly at distance n from A, and B should be looked for in D^{n-1}(A)^C (the complement of the dilation of size n − 1).
- If knowledge expresses that d_N(A,B) ≤ n, then B should be looked for in A^C, with the constraint that at least one point of B belongs to D^n(A) \ A.
- If knowledge expresses that d_N(A,B) ≥ n, then B should be looked for in D^{n-1}(A)^C.
- If knowledge expresses that n_1 ≤ d_N(A,B) ≤ n_2, then B should be searched for in D^{n_1−1}(A)^C, with the constraint that at least one point of B belongs to D^{n_2}(A) \ D^{n_1−1}(A).
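The crisp cases above can be sketched with binary dilations on 1D integer points (helper names are ours):

```python
# Sketch of the crisp volume-of-interest construction with binary dilations.

def dilate_set(A, n):
    """D^n(A): all points within distance n of A (n >= 0)."""
    return {x + k for x in A for k in range(-n, n + 1)}

def volume_of_interest(A, n1, n2):
    """Region D^{n2}(A) \\ D^{n1-1}(A) where B may lie when its minimum
    distance to A is at least n1 and its Hausdorff distance at most n2."""
    inner = dilate_set(A, n1 - 1) if n1 > 0 else set()
    return dilate_set(A, n2) - inner

print(sorted(volume_of_interest({0}, 2, 4)))  # [-4, -3, -2, 2, 3, 4]
```

The set difference of two dilations directly yields the annulus of admissible positions discussed next.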
The constraints on the border actually lead to the definition of two fuzzy sets, one constraining the object and one constraining its border. However, they can be avoided by considering both minimum and Hausdorff distances, expressing for instance that B should lie between a distance n_1 and a distance n_2 of A, which is a typical type of knowledge we may have in concrete problems. The minimum distance should then be greater than n_1 and the Hausdorff distance less than n_2. In this case, the volume of interest for B reduces to D^{n_2}(A) \ D^{n_1−1}(A). In cases where imprecision has to be taken into account, fuzzy dilations are used, with the corresponding equivalences with fuzzy distances [20,29]. The extension to approximate distances calls for fuzzy structuring elements. We define these structuring elements through their membership functions on S. Structuring elements with a spherical symmetry can typically be used, where the membership degree depends only on the distance to the center of the structuring element. Let us consider the generalization to the fuzzy case of the last case above (minimum distance of at least n_1 and Hausdorff distance of at most n_2 to a fuzzy set μ). Instead of defining an interval [n_1, n_2], we consider a fuzzy interval, defined as a fuzzy set μ_n on R⁺ having a core equal to the interval [n_1, n_2]. The membership function μ_n is increasing between 0 and n_1 and
decreasing after n_2 (this is but one example). Then we define two structuring elements as:
\[ \nu_1(v) = \begin{cases} 1 - \mu_n(d_S(v,O)) & \text{if } d_S(v,O) \le n_1 \\ 0 & \text{otherwise} \end{cases} \qquad (95) \]
\[ \nu_2(v) = \begin{cases} 1 & \text{if } d_S(v,O) \le n_2 \\ \mu_n(d_S(v,O)) & \text{otherwise} \end{cases} \qquad (96) \]
where d_S is the Euclidean distance in S and O the origin. The spatial fuzzy set expressing the approximate relationship about distance to μ is then defined as:
\[ \mu_{\text{distance}} = t[D_{\nu_2}(\mu),\ 1 - D_{\nu_1}(\mu)] \qquad (97) \]
if n_1 ≠ 0, and μ_distance = D_{ν_2}(μ) if n_1 = 0. The increasing nature of fuzzy dilation with respect to both the set to be dilated and the structuring element [29] guarantees that these expressions do not lead to inconsistencies. Indeed, we have ν_1 ≤ ν_2 and ν_1(O) = ν_2(O) = 1, and therefore D_{ν_1}(μ) ≤ D_{ν_2}(μ). In the case where n_1 = 0, we no longer have ν_1(O) = 1, but in this case only the dilation by ν_2 is considered. This case actually corresponds to a distance to μ less than "about n_2." These properties are indeed expected for representations of distance knowledge. Figure 6 illustrates this approach. The two structuring elements ν_1 and ν_2 are derived from a fuzzy interval μ_n and are used for the dilation of an object (buildings extracted from a map), and μ_distance is computed to represent the approximate knowledge about the distance to this object. The resulting fuzzy set represents the area of the space satisfying (to some degree) the relation of semantics μ_n to the building. From an algorithmic point of view, fuzzy dilations may be computationally heavy if the structuring element has a large support. However, in the case of crisp objects and structuring elements with spherical symmetry, fast algorithms can be implemented. The distance to the object A is first computed using chamfer algorithms [35]. This defines a distance map in S, which gives the distance of each voxel to object A. This discrete distance can be made as precise as necessary [117]. Then the translation into a fuzzy volume of interest is made according to a simple look-up table derived from μ_n. This algorithm has a linear complexity in the cardinality of S.
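A 1D sketch of Equations (95)-(97), assuming an illustrative trapezoidal μ_n (core [n_1, n_2], linear slopes) and t = min; all names and parameter values are our own:

```python
# Sketch of Equations (95)-(97): mu_n is a fuzzy interval with core [n1, n2];
# nu1 and nu2 are the derived structuring elements, and mu_distance combines
# the two fuzzy dilations with t = min.

def mu_n(d, n1, n2, slope=2.0):
    if d < n1:
        return max(0.0, 1.0 - (n1 - d) / slope)
    if d <= n2:
        return 1.0
    return max(0.0, 1.0 - (d - n2) / slope)

def nu1(v, n1, n2):  # Eq. (95)
    return 1.0 - mu_n(abs(v), n1, n2) if abs(v) <= n1 else 0.0

def nu2(v, n1, n2):  # Eq. (96)
    return 1.0 if abs(v) <= n2 else mu_n(abs(v), n1, n2)

def mu_distance(mu, points, n1, n2):  # Eq. (97)
    dil = lambda nu: {x: max(min(mu.get(y, 0.0), nu(x - y, n1, n2))
                             for y in points) for x in points}
    d1, d2 = dil(nu1), dil(nu2)
    return {x: min(d2[x], 1.0 - d1[x]) for x in points}

out = mu_distance({0: 1.0}, list(range(-8, 9)), n1=2, n2=4)
print(out[0], out[1], out[3])  # 0.0 0.5 1.0 -- forbidden / partial / allowed
```

Points strictly inside the core [n_1, n_2] of μ_n receive degree 1, points too close to the object are excluded by the 1 − D_{ν_1}(μ) term, and the transition zones are graded.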
FIGURE 6. Buildings extracted from a map, membership function μ_n, structuring elements ν_1 and ν_2, dilation of building 1 with these two structuring elements, and representation of μ_distance (darker gray levels indicate higher membership values).
VIII. QUALITATIVE DISTANCE IN A SYMBOLIC SETTING

In this section we consider distance information in a symbolic setting, using formal logics. This point has not been much addressed in the literature, contrary to other types of relationships, such as topological ones. It is, however, useful when no quantitative information (even in imprecise form) is available, but only purely qualitative information; and it allows for symbolic reasoning thanks to the logical apparatus. In the context of mereotopology, relative distance information has been modeled as a ternary relation [4]. A predicate Closer(x, y, z) reads "x is closer to y than to z" and defines a strict order on pairs of spatial entities (x, y) and (x, z) (not necessarily reduced to points). It also induces an equidistance relation. Several axioms and properties are introduced in [4], which allow one to include this notion in reasoning schemes. Here, we take a different point of view and propose to model distance information between two spatial entities expressed as logical formulas. We show that mathematical morphology can be defined on logical formulas and that dilations and erosions lead to the definition of a modal logic suitable for spatial reasoning [23,24].
A. Morpho-Logics

In this section we express morphological operations in a symbolic framework, using logical formulas. Let us first introduce some notation. Let PS be a finite set of propositional symbols. The language is generated by PS and the usual connectives, to which we will add modal operators in the following. Well-formed formulas are denoted by Greek letters φ, ψ. Kripke semantics is used. Worlds are denoted by ω, ω', and the set of all worlds by Ω. Mod(φ) = {ω ∈ Ω | ω ⊨ φ} is the set of all worlds where φ is satisfied. The underlying idea for constructing morphological operations on logical formulas (as presented in [27]) is to consider set interpretations of formulas and worlds. Since in classical propositional logic the set of formulas is isomorphic to 2^Ω, i.e., knowing a formula is equivalent to knowing the set of worlds where it is satisfied, we can identify φ with Mod(φ) and then apply set theoretic morphological operations. We recall that Mod(φ ∨ ψ) = Mod(φ) ∪ Mod(ψ), Mod(φ ∧ ψ) = Mod(φ) ∩ Mod(ψ), and Mod(φ) ⊆ Mod(ψ) iff φ ⊨ ψ. Using the previous equivalences, and based on set definitions of morphological operators [142], the dilation and erosion of a formula φ have been defined in [26,27] as follows:
\[ \mathrm{Mod}(D_B(\varphi)) = \{ \omega \in \Omega \mid B(\omega) \cap \mathrm{Mod}(\varphi) \neq \emptyset \}, \qquad (98) \]
\[ \mathrm{Mod}(E_B(\varphi)) = \{ \omega \in \Omega \mid B(\omega) \subseteq \mathrm{Mod}(\varphi) \}. \qquad (99) \]
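Since formulas are identified with their sets of models, Equations (98)-(99) can be sketched directly with set operations (worlds are plain integers here; the encoding is ours):

```python
# Sketch of Equations (98)-(99): formulas identified with their sets of models,
# B(w) the set of worlds related to world w.

def dilation(phi, B, worlds):
    """Mod(D_B(phi)): worlds whose neighborhood is consistent with phi."""
    return {w for w in worlds if B(w) & phi}

def erosion(phi, B, worlds):
    """Mod(E_B(phi)): worlds whose whole neighborhood satisfies phi."""
    return {w for w in worlds if B(w) <= phi}

worlds = set(range(5))
B = lambda w: {w - 1, w, w + 1} & worlds   # a reflexive, symmetrical relation
phi = {1, 2, 3}
print(sorted(dilation(phi, B, worlds)), sorted(erosion(phi, B, worlds)))
# [0, 1, 2, 3, 4] [2]
```

As expected, erosion keeps only the worlds all of whose neighbors satisfy φ, while dilation also admits worlds merely consistent with φ.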
In Equations (98) and (99), the structuring element B represents a relationship between worlds, i.e., ω' ∈ B(ω) iff ω' satisfies some relationship with ω. The condition in Equation (98) expresses that the set of worlds in relation to ω should be consistent with φ, i.e.: ∃ω' ∈ B(ω), ω' ⊨ φ. The condition in Equation (99) is stronger and expresses that φ should be satisfied in all worlds in relation to ω. The structuring element B, representing a relationship between worlds, defines a "neighborhood" of worlds. If the relationship is symmetrical, it leads to symmetrical structuring elements. If it is reflexive, it leads to structuring elements such that ω ∈ B(ω), which yields interesting properties, as will be seen later. An interesting way to choose the relationship is to base it on distances between worlds, which is an important piece of information in spatial reasoning. This allows one to define sequences of increasing structuring
elements defined as the balls of a distance. For any distance δ between worlds, a structuring element of size n centered at ω takes the following form:
\[ B_n(\omega) = \{ \omega' \in \Omega \mid \delta(\omega, \omega') \le n \}. \qquad (100) \]
For instance, a distance equal to 1 can represent a connectivity relation between worlds, defined for instance as a difference of one literal (i.e., one literal instantiated differently in both worlds). Now we consider the framework of normal modal logics [46,87] and use an accessibility relation as the relation between worlds. We define an accessibility relation from any structuring element B as follows:
\[ R(\omega, \omega') \ \text{iff}\ \omega' \in B(\omega). \qquad (101) \]
Conversely, a structuring element can be defined from an accessibility relation. The accessibility relation R is reflexive iff ∀ω ∈ Ω, ω ∈ B(ω). It is symmetrical iff ∀(ω, ω') ∈ Ω², ω ∈ B(ω') ⟺ ω' ∈ B(ω). In the following we restrict the discussion to symmetrical relations. In general, accessibility relations derived from a structuring element are not transitive. Let us now consider the two modal operators □ and ◇ defined from the accessibility relation as [46]:
\[ M, \omega \models \Box\varphi \ \text{iff}\ \forall \omega' \in \Omega,\ R(\omega, \omega') \Rightarrow M, \omega' \models \varphi, \qquad (102) \]
\[ M, \omega \models \Diamond\varphi \ \text{iff}\ \exists \omega' \in \Omega,\ R(\omega, \omega') \ \text{and}\ M, \omega' \models \varphi, \qquad (103) \]
where M denotes a standard model related to R, which will be omitted from the notation in the following (it will always be implicitly related to the considered accessibility relation). Equation (102) can be rewritten as:
\[ \omega \models \Box\varphi \iff \{ \omega' \in \Omega \mid R(\omega, \omega') \} \subseteq \mathrm{Mod}(\varphi) \iff B(\omega) \subseteq \mathrm{Mod}(\varphi), \]
which exactly corresponds to the definition of the erosion of a formula according to Equation (99).
In a similar way, Equation (103) can be rewritten as:
\[ \omega \models \Diamond\varphi \iff \{ \omega' \in \Omega \mid R(\omega, \omega') \} \cap \mathrm{Mod}(\varphi) \neq \emptyset \iff B(\omega) \cap \mathrm{Mod}(\varphi) \neq \emptyset, \]
which exactly corresponds to a dilation according to Equation (98). This shows that we can define modal operators derived from an accessibility relation as erosion and dilation with a structuring element:
\[ \Box\varphi = E_B(\varphi), \qquad (104) \]
\[ \Diamond\varphi = D_B(\varphi). \qquad (105) \]
The modal logic constructed from erosion and dilation has the following theorems and rules of inference⁵:
T: □φ → φ and φ → ◇φ (if B is such that ∀ω ∈ Ω, ω ∈ B(ω), leading to a reflexive accessibility relation).
Df: ◇φ ↔ ¬□¬φ and □φ ↔ ¬◇¬φ.
D: □φ → ◇φ.
B: ◇□φ → φ and φ → □◇φ.
5c: □◇φ → ◇φ and □φ → ◇□φ (if B is such that ∀ω ∈ Ω, ω ∈ B(ω)).
4c: □□φ → □φ and ◇φ → ◇◇φ (if B is such that ∀ω ∈ Ω, ω ∈ B(ω)).
N: □⊤ and ¬◇⊥.
M: □(φ ∧ ψ) → (□φ ∧ □ψ) and (◇φ ∨ ◇ψ) → ◇(φ ∨ ψ).
M': ◇(φ ∧ ψ) → (◇φ ∧ ◇ψ) and (□φ ∨ □ψ) → □(φ ∨ ψ).
C: (□φ ∧ □ψ) → □(φ ∧ ψ) and ◇(φ ∨ ψ) → (◇φ ∨ ◇ψ).
R: (□φ ∧ □ψ) ↔ □(φ ∧ ψ) and ◇(φ ∨ ψ) ↔ (◇φ ∨ ◇ψ).
RN: from φ, infer □φ.
RM: from φ → ψ, infer □φ → □ψ and ◇φ → ◇ψ.
RR: from (φ ∧ φ') → ψ, infer (□φ ∧ □φ') → □ψ; and from (φ ∨ φ') → ψ, infer (◇φ ∨ ◇φ') → ◇ψ.

⁵We use notations similar to those of [46] for these theorems and rules of inference.
112
ISABELLE BLOCH
RE: from $\vdash \varphi \leftrightarrow \psi$ infer $\vdash \Box\varphi \leftrightarrow \Box\psi$ and $\vdash \Diamond\varphi \leftrightarrow \Diamond\psi$.
K: $\Box(\varphi \rightarrow \psi) \rightarrow (\Box\varphi \rightarrow \Box\psi)$ and, by duality, $(\neg\Diamond\varphi \wedge \Diamond\psi) \rightarrow \Diamond(\neg\varphi \wedge \psi)$.
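Several of these theorems can be spot-checked mechanically on a finite model. The following sketch is our own illustration (worlds 0–9, a radius-1 interval as structuring element, and formulas identified with model sets); it verifies Df and M on random model sets.

```python
# Sketch: spot-checking theorems Df and M of the morphological modal
# logic on random model sets. [] is erosion, <> is dilation.
import random

OMEGA = set(range(10))

def B(w):
    return {v for v in OMEGA if abs(v - w) <= 1}

def box(mod):
    return {w for w in OMEGA if B(w) <= mod}

def diamond(mod):
    return {w for w in OMEGA if B(w) & mod}

random.seed(0)
for _ in range(100):
    phi = {w for w in OMEGA if random.random() < 0.5}
    psi = {w for w in OMEGA if random.random() < 0.5}
    # Df: <>phi <-> not [] not phi
    assert diamond(phi) == OMEGA - box(OMEGA - phi)
    # M: [](phi ^ psi) -> ([]phi ^ []psi); here it even holds as an equality
    assert box(phi & psi) <= (box(phi) & box(psi))
print("Df and M hold on all sampled model sets")
```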
Let us now denote by $\Box^n$ the iteration of $n$ times $\Box$ (i.e., $n$ erosions by the same structuring element). Since the succession of $n$ erosions by a structuring element is equivalent to one erosion by a larger structuring element of size $n$ (iterativity property of erosion), $\Box^n$ is a new modal operator, constructed as in Equation (104). In a similar way, we denote by $\Diamond^n$ the iteration of $n$ times $\Diamond$, which is again a new modal operator, due to the iterativity property of dilation, constructed as in Equation (105) with a structuring element of size $n$. We set $\Box^1 = \Box$ and $\Diamond^1 = \Diamond$. We have the following additional theorems:

$\Box^n \Box^{n'} \varphi \leftrightarrow \Box^{n+n'} \varphi$ and $\Diamond^n \Diamond^{n'} \varphi \leftrightarrow \Diamond^{n+n'} \varphi$ (iterativity properties of erosion and dilation).

$\Diamond\Box\Diamond\Box\varphi \leftrightarrow \Diamond\Box\varphi$ and $\Box\Diamond\Box\Diamond\varphi \leftrightarrow \Box\Diamond\varphi$ (idempotence of opening and closing).

More generally, from properties of closing and opening:

$$\Diamond^n \Box^n \Diamond^{n'} \Box^{n'} \varphi \leftrightarrow \Diamond^{n'} \Box^{n'} \Diamond^n \Box^n \varphi \leftrightarrow \Diamond^{\max(n,n')} \Box^{\max(n,n')} \varphi,$$

$$\Box^n \Diamond^n \Box^{n'} \Diamond^{n'} \varphi \leftrightarrow \Box^{n'} \Diamond^{n'} \Box^n \Diamond^n \varphi \leftrightarrow \Box^{\max(n,n')} \Diamond^{\max(n,n')} \varphi.$$

For $n < n'$: $\Diamond^n \varphi \rightarrow \Diamond^{n'} \varphi$, $\Box^{n'} \varphi \rightarrow \Box^n \varphi$, $\Box^n \Diamond^n \varphi \rightarrow \Box^{n'} \Diamond^{n'} \varphi$, $\Diamond^{n'} \Box^{n'} \varphi \rightarrow \Diamond^n \Box^n \varphi$.
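The iterativity and monotonicity properties can be checked directly on a finite line of worlds. This is a sketch of our own (the function names `dilate` and `dilate_iter` are illustrative, not from the chapter), using the radius-1 interval as structuring element.

```python
# Sketch: iterativity of dilation. n successive dilations by the
# radius-1 interval coincide with one dilation by the radius-n interval
# (a "structuring element of size n"), and dilations grow with n.

OMEGA = set(range(20))

def dilate(mod_phi, radius=1):
    """<>phi with a symmetric interval structuring element."""
    return {w for w in OMEGA if any(abs(w - v) <= radius for v in mod_phi)}

def dilate_iter(mod_phi, n):
    """n-fold iteration of the radius-1 dilation."""
    for _ in range(n):
        mod_phi = dilate(mod_phi, 1)
    return mod_phi

phi = {8, 9}
assert dilate_iter(phi, 3) == dilate(phi, 3)  # <>^n = one size-n dilation
assert dilate(phi, 2) <= dilate(phi, 4)       # n < n' gives <>^n phi -> <>^n' phi
print("iterativity and monotonicity verified")
```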
All these definitions and properties extend to the fuzzy case if we consider fuzzy formulas, i.e., formulas $\varphi$ for which $\mathrm{Mod}(\varphi)$ is a fuzzy set of $\Omega$. The fuzzy structuring element can be interpreted as a fuzzy relation between worlds. The use of fuzzy structuring elements appears particularly useful for expressing intrinsically vague spatial relationships. For spatial reasoning, interpretations can represent spatial entities, such as regions of space. Formulas then represent combinations of such entities, and define regions, objects, etc. (possibly fuzzy), which may not be connected. For instance, if a formula $\varphi$ is a symbolic representation of a region $X$ of space, it can be interpreted as "the object we are looking at is in $X$." In an epistemic interpretation, it could represent the
belief of an agent that the object is in $X$.⁶ The interest of such representations is also that they can deal with any kind of spatial entities, without referring to points. Using these interpretations, if $\varphi$ represents some knowledge or belief about a region $X$ of space, then $\Box\varphi$ represents a restriction of $X$. If we are looking at an object in $X$, then $\Box\varphi$ is a necessary region for this object. Similarly, $\Diamond\varphi$ represents an extension of $X$, and a possible region for the object.

B. Distances in a Qualitative Setting

We propose here to use the modal operators introduced in Section VIII.A to provide symbolic and qualitative representations of spatial knowledge. Again we use expressions of minimum and Hausdorff distances in terms of morphological dilations. The translation into a logical formalism is straightforward. Expressing that $d_N(X, Y) = n$ leads to:
$$\forall m < n, \ \Diamond^m\varphi \wedge \psi \ \text{inconsistent and} \ \Diamond^m\psi \wedge \varphi \ \text{inconsistent,}$$
$$\text{and} \ \Diamond^n\varphi \wedge \psi \ \text{consistent and} \ \Diamond^n\psi \wedge \varphi \ \text{consistent.} \tag{106}$$
Expressions like $d_N(X, Y) \leq n$ translate into:

$$\Diamond^n\varphi \wedge \psi \ \text{consistent and} \ \Diamond^n\psi \wedge \varphi \ \text{consistent.} \tag{107}$$
Expressions like $d_N(X, Y) \geq n$ translate into:

$$\forall m < n, \ \Diamond^m\varphi \wedge \psi \ \text{inconsistent and} \ \Diamond^m\psi \wedge \varphi \ \text{inconsistent.} \tag{108}$$
Expressions like $n_1 \leq d_N(X, Y) \leq n_2$ translate into:

$$\forall m < n_1, \ \Diamond^m\varphi \wedge \psi \ \text{inconsistent and} \ \Diamond^m\psi \wedge \varphi \ \text{inconsistent,}$$
$$\text{and} \ \Diamond^{n_2}\varphi \wedge \psi \ \text{consistent and} \ \Diamond^{n_2}\psi \wedge \varphi \ \text{consistent.} \tag{109}$$
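These translations can be checked mechanically on a finite model: the minimum distance is the smallest $n$ for which both consistency conditions hold. The following sketch is our own illustration (worlds are integers, dilation uses interval structuring elements, and all model sets are assumed nonempty).

```python
# Sketch: recovering the minimum distance d_N(X, Y) from translation
# (106): the smallest n for which <>^n phi ^ psi and <>^n psi ^ phi are
# both consistent (nonempty model sets).

OMEGA = set(range(30))

def dilate_n(mod, n):
    """Dilation of size n (radius-n interval structuring element)."""
    return {w for w in OMEGA if any(abs(w - v) <= n for v in mod)}

def consistent(mod_a, mod_b):
    """A conjunction is consistent iff the model sets intersect."""
    return bool(mod_a & mod_b)

def d_min(mod_phi, mod_psi):
    # assumes both model sets are nonempty, so the loop terminates
    n = 0
    while not (consistent(dilate_n(mod_phi, n), mod_psi)
               and consistent(dilate_n(mod_psi, n), mod_phi)):
        n += 1
    return n

X, Y = {2, 3, 4}, {10, 11}
print(d_min(X, Y))  # 6: the gap between worlds 4 and 10
```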
The proof of these equations involves mainly T and the property $\Diamond^n\varphi \rightarrow \Diamond^{n'}\varphi$ for $n < n'$ (see Section VIII.A). Similarly, for the Hausdorff distance, we translate $d_H(X, Y) = n$ by:

$$\forall m < n, \ \psi \wedge \neg\Diamond^m\varphi \ \text{consistent or} \ \varphi \wedge \neg\Diamond^m\psi \ \text{consistent,}$$
$$\text{and} \ \psi \rightarrow \Diamond^n\varphi \ \text{and} \ \varphi \rightarrow \Diamond^n\psi. \tag{110}$$

The first condition corresponds to $d_H(X, Y) \geq n$ and the second one to $d_H(X, Y) \leq n$.

⁶This epistemic interpretation is due to Alessandro Saffiotti (personal communication).
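The second condition of (110), implication into a dilation, also gives a direct way to compute the Hausdorff distance on a finite model. This is our own illustrative sketch, under the same finite-worlds assumptions as the distance translations.

```python
# Sketch: recovering the Hausdorff distance from translation (110):
# d_H(X, Y) is the smallest n with psi -> <>^n phi and phi -> <>^n psi,
# i.e., each model set is included in the size-n dilation of the other.

OMEGA = set(range(30))

def dilate_n(mod, n):
    """Dilation of size n (radius-n interval structuring element)."""
    return {w for w in OMEGA if any(abs(w - v) <= n for v in mod)}

def d_hausdorff(mod_phi, mod_psi):
    # assumes both model sets are nonempty, so the loop terminates
    n = 0
    while not (mod_psi <= dilate_n(mod_phi, n)
               and mod_phi <= dilate_n(mod_psi, n)):
        n += 1
    return n

X, Y = {2, 3, 4}, {10, 11}
print(d_hausdorff(X, Y))  # 8: world 2 is at distance 8 from Y
```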
FIGURE 7. Illustration of a distance relation expressed by an interval.
Let us consider an example of a possible use of these representations for spatial reasoning. If we are looking for an object represented by $\psi$ in an area at a distance within an interval $[n_1, n_2]$ of a region represented by $\varphi$, this corresponds to a minimum distance greater than $n_1$ and to a Hausdorff distance less than $n_2$. This is illustrated in Figure 7. Then we have to check the following relation:

$$\psi \rightarrow \neg\Diamond^{n_1}\varphi \wedge \Diamond^{n_2}\varphi, \tag{111}$$

or equivalently:

$$\psi \rightarrow \Box^{n_1}\neg\varphi \wedge \Diamond^{n_2}\varphi. \tag{112}$$
This expresses in a symbolic way imprecise knowledge about distances represented as an interval. If we consider a fuzzy interval, this extends directly by means of fuzzy dilation. These expressions show how we can convert distance information, which is usually defined in an analytical way, into algebraic expressions through mathematical morphology, and then into logical expressions through the morphological expression of modal operators.
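The interval check of relation (111) amounts to two set tests on a finite model: $\mathrm{Mod}(\psi)$ must avoid the size-$n_1$ dilation of $\mathrm{Mod}(\varphi)$ and lie inside its size-$n_2$ dilation. This sketch is our own illustration; `in_distance_band` is a hypothetical helper name.

```python
# Sketch: checking relation (111) for concrete model sets: psi must lie
# outside the size-n1 dilation of phi but inside its size-n2 dilation,
# i.e., at a distance in [n1, n2] from the region phi represents.

OMEGA = set(range(30))

def dilate_n(mod, n):
    """Dilation of size n (radius-n interval structuring element)."""
    return {w for w in OMEGA if any(abs(w - v) <= n for v in mod)}

def in_distance_band(mod_psi, mod_phi, n1, n2):
    """psi -> not <>^n1 phi  and  psi -> <>^n2 phi."""
    return (not (mod_psi & dilate_n(mod_phi, n1))
            and mod_psi <= dilate_n(mod_phi, n2))

phi = {10, 11, 12}
print(in_distance_band({15, 16}, phi, n1=2, n2=5))  # True
print(in_distance_band({13}, phi, n1=2, n2=5))      # False: too close
```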
IX. CONCLUSION

In this chapter we have discussed several ways of defining spatial distances, in different frameworks, ranging from purely quantitative ones to purely
qualitative ones. Issues such as knowledge representation, formal definitions, computation, and reasoning have been addressed, with different answers found depending on the type of available information and on the type of questions we want to answer. In this context, the fuzzy set framework plays a central role, since it elegantly merges quantitative and qualitative aspects. We have also discussed the exploitation of features and properties of mathematical morphology to provide a unified framework for defining and computing distances in a quantitative setting, in a fuzzy (semiqualitative) one, as well as in a purely symbolic and qualitative one. Spatial distances constitute an important part of the spatial relationships linking objects in space, and therefore appear as knowledge or information of major importance in spatial reasoning, which can be combined with other relationships in a fusion process. In particular, qualitative spatial reasoning and spatial reasoning under imprecision can benefit from the proposed approaches. Another interesting research perspective is to further investigate the links between the different views of space, as presented in Section II, and the various formalisms.
REFERENCES

1. Aisbett, J., and Gibbon, G. (2001). A general formulation of conceptual spaces as a meso level representation. Artificial Intelligence 133, 189–232.
2. Allen, J. (1983). Maintaining knowledge about temporal intervals. Commun. of the ACM 26(11), 832–843.
3. Asher, N., and Vieu, L. (1995). Toward a geometry of common sense: A semantics and a complete axiomatization of mereotopology. IJCAI '95, San Mateo, CA, pp. 846–852.
4. Aurnague, M., Vieu, L., and Borillo, A. (1997). Représentation formelle des concepts spatiaux dans la langue, in Langage et cognition spatiale, edited by M. Denis. Paris: Masson, pp. 69–102.
5. De Baets, B., and Mesiar, R. (1997). T-partitions, T-equivalences and pseudo-metrics, in Seventh IFSA World Congress, Prague, Vol. I, pp. 187–192, June 1997.
6. Bandler, W., and Kohout, L. (1980). Fuzzy power sets and fuzzy implication operators. Fuzzy Sets and Systems 4, 13–30.
7. Bengoetxea, E., Larranaga, P., Bloch, I., Perchant, A., and Boeres, C. (2002). Inexact graph matching by means of estimation of distribution algorithms. Pattern Recognition 35, 2867–2880.
8. Benoit, E., and Foulloy, L. (1993). Capteurs flous multicomposantes: applications à la reconnaissance des couleurs, in Les Applications des Ensembles Flous. Nîmes, France, pp. 167–176.
9. Berthoz, A. (2002). Stratégies cognitives et mémoire spatiale, in Colloque Cognitique. Paris.
10. Bhandari, D., and Pal, N. R. (1993). Some new information measures for fuzzy sets. Information Sci. 67, 209–228.
11. Bhandari, D., Pal, N. R., and Majumder, D. D. (1992). Fuzzy divergence, probability measure of fuzzy events and image thresholding. Pattern Recognition Lett. 13, 857–867.
12. Binaghi, E., Della Ventura, A., Rampini, A., and Schettini, R. (1993). Fuzzy reasoning approach to similarity evaluation in image analysis. Int. J. Intelligent Systems 8, 749–769.
13. Blades, M. (1991). The development of the abilities required to understand spatial representations, in Cognitive and Linguistic Aspects of Geographic Space, edited by D. M. Mark and A. U. Frank. NATO ASI, Kluwer, pp. 81–116.
14. Bloch, I. (1996). Distances in fuzzy sets for image processing derived from fuzzy mathematical morphology (invited conference). Information Processing and Management of Uncertainty in Knowledge-Based Systems, Granada, Spain, pp. 1307–1312.
15. Bloch, I. (1996). Fuzzy geodesic distance in images, in Lecture Notes in Artificial Intelligence: Fuzzy Logic in Artificial Intelligence, towards Intelligent Systems, edited by A. Ralescu and T. Martin. Berlin: Springer Verlag, pp. 153–166.
16. Bloch, I. (1996). Fuzzy relative position between objects in images: a morphological approach, in IEEE Int. Conf. on Image Processing ICIP'96, Lausanne, Vol. II, pp. 987–990.
17. Bloch, I. (1996). Image information processing using fuzzy sets. World Automation Congress, Soft Computing with Industrial Applications, Montpellier, France, pp. 79–84.
18. Bloch, I. (1998). Fuzzy morphology and fuzzy distances: New definitions and links in both Euclidean and geodesic cases, in Lecture Notes in Artificial Intelligence: Fuzzy Logic in Artificial Intelligence, edited by A. Ralescu and J. Shanahan. Berlin: Springer Verlag, pp. 149–165.
19. Bloch, I. (1999). Fuzzy relative position between objects in image processing: a morphological approach. IEEE Trans. Pattern Analysis and Machine Intelligence 21(7), 657–664.
20. Bloch, I. (1999). On fuzzy distances and their use in image processing under imprecision. Pattern Recognition 32(11), 1873–1895.
21. Bloch, I. (2000). Geodesic balls in a fuzzy set and fuzzy geodesic mathematical morphology. Pattern Recognition 33(6), 897–905.
22. Bloch, I.
(2000). Spatial representation of spatial relationships knowledge. 7th International Conference on Principles of Knowledge Representation and Reasoning KR 2000, Breckenridge, CO, pp. 247–258.
23. Bloch, I. (2000). Using mathematical morphology operators as modal operators for spatial reasoning. ECAI 2000, Workshop on Spatio-Temporal Reasoning, Berlin, pp. 73–79.
24. Bloch, I. (2002). Modal logics based on mathematical morphology for spatial reasoning. J. Applied Non Classical Logics 12(3–4), 399–424.
25. Bloch, I., Géraud, T., and Maître, H. (2003). Representation and fusion of heterogeneous fuzzy information in the 3D space for model-based structural recognition—Application to 3D brain imaging. Artificial Intelligence 148, 731–741.
26. Bloch, I., and Lang, J. (2000). Towards mathematical morpho-logics. 8th International Conference on Information Processing and Management of Uncertainty in Knowledge-based Systems, IPMU 2000, Madrid, Vol. III, pp. 1405–1412.
27. Bloch, I., and Lang, J. (2002). Towards mathematical morpho-logics, in Technologies for Constructing Intelligent Systems, edited by B. Bouchon-Meunier, J. Gutierrez-Rios, L. Magdalena, and R. Yager. Springer, pp. 367–380.
28. Bloch, I., and Maître, H. (1995). Fuzzy distances and image processing (invited conference). ACM Symposium on Applied Computing, Nashville, TN, pp. 570–574.
29. Bloch, I., and Maître, H. (1995). Fuzzy mathematical morphologies: A comparative study. Pattern Recognition 28(9), 1341–1387.
30. Bloch, I., Maître, H., and Anvari, M. (1997). Fuzzy adjacency between image objects. Int. J. Uncertainty Fuzziness and Knowledge-Based Systems 5(6), 615–653.
31. Bloch, I., Maître, H., and Minoux, M. (1993). Optimal matching of 3-D convex polyhedra with applications to pattern recognition. Pattern Recognition and Image Analysis 3(2), 137–149.
32. Bloch, I., Pellot, C., Sureda, F., and Herment, A. (1996). Fuzzy modelling and fuzzy mathematical morphology applied to 3D reconstruction of blood vessels by multi-modality data fusion, in Fuzzy Set Methods in Information Engineering: A Guided Tour of Applications, edited by D. Dubois, R. Yager, and H. Prade. New York: John Wiley, Chapter 5, pp. 93–110.
33. Bodenhofer, U. (2003). A note on approximate equality versus the Poincaré paradox. Fuzzy Sets and Systems 133, 155–160.
34. Boixader, D. (2003). On the relationship between T-transitivity and approximate equality. Fuzzy Sets and Systems 133, 161–169.
35. Borgefors, G. (1996). Distance transforms in the square grid, in Progress in Picture Processing, Les Houches, Session LVIII, edited by H. Maître. Amsterdam: North-Holland, Chapter 1.4, pp. 46–80.
36. Bouayad, M., Leschi, C., and Emptoz, H. (1995). Contribution de la logique floue à la modélisation de la reconnaissance des formes, in Rencontres francophones sur la logique floue et ses applications. Paris, pp. 49–56.
37. Bouchon-Meunier, B., Rifqi, M., and Bothorel, S. (1996). Towards general measures of comparison of objects. Fuzzy Sets and Systems 84(2), 143–153.
38. Bouchon-Meunier, B., and Yager, R. R. (1993). Entropy of similarity relations in questionnaires and decision trees. Second IEEE Int. Conf. on Fuzzy Systems, San Francisco, CA, pp. 1225–1230.
39. Boxer, L. (1997). On Hausdorff-like metrics for fuzzy sets. Pattern Recognition Lett. 18, 115–118.
40. Brass, P. (2002). On the nonexistence of Hausdorff-like metrics for fuzzy sets. Pattern Recognition Lett. 23, 39–43.
41. Briggs, R. (1973). Urban cognitive distance, in Image and Environment: Cognitive Mapping and Spatial Behavior, edited by R. M. Downs and D. Stea. Chicago: Aldine, pp. 361–388.
42. Cayrol, M., Farreny, H., and Prade, H. (1982). Fuzzy pattern matching. Kybernetes 11, 103–116.
43. Cesar, R., Bengoetxea, E., and Bloch, I. (2002).
Inexact graph matching using stochastic optimization techniques for facial feature recognition. International Conference on Pattern Recognition, ICPR 2002, Québec.
44. Chang, C.-C., and Jiang, J.-H. (1996). A spatial filter for similarity retrieval. Int. J. of Pattern Recognition and Artificial Intelligence 10(6), 711–730.
45. Chauduri, B. B., and Rosenfeld, A. (1996). On a metric distance between fuzzy sets. Pattern Recognition Lett. 17, 1157–1160.
46. Chellas, B. (1980). Modal Logic, an Introduction. Cambridge: Cambridge University Press.
47. Chen, S. M., Yeh, M. S., and Hsio, P. Y. (1995). A comparison of similarity measures of fuzzy values. Fuzzy Sets and Systems 72, 79–89.
48. Clementini, E., and Di Felice, P. (1997). Approximate topological relations. Int. J. Approximate Reasoning 16, 173–204.
49. Clementini, E., Di Felice, P., and Hernandez, D. (1997). Qualitative representation of positional information. Artificial Intelligence 95, 317–356.
50. De Cock, M., and Kerre, E. (2003). On (un)suitable fuzzy relations to model approximate equality. Fuzzy Sets and Systems 133, 137–153.
51. De Cock, M., and Kerre, E. (2003). Why fuzzy T-equivalence relations do not resolve the Poincaré paradox, and related issues. Fuzzy Sets and Systems 133, 181–192.
52. Cohn, A., Bennett, B., Gooday, J., and Gotts, N. M. (1997). Representing and reasoning with qualitative spatial relations about regions, in Spatial and Temporal Reasoning, edited by O. Stock. Kluwer, pp. 97–134.
53. Colliot, O., Bloch, I., and Tuzikov, A. (2002). Characterization of approximate plane symmetries for 3D fuzzy objects, in IPMU 2002, Annecy, France, Vol. III, pp. 1749–1756.
54. Cortelazzo, G., Deretta, G., Mian, G. A., and Zamperoni, P. (1996). Normalized weighted Levensthein distance and triangle inequality in the context of similarity discrimination of bilevel images. Pattern Recognition Lett. 17, 431–436.
55. Cross, V., and Cabello, C. (1995). A mathematical relationship between set-theoretic and metric compatibility measures. ISUMA-NAFIPS'95, College Park, MD, pp. 169–174.
56. Cuxac, C. (1999). French sign language: Proposition of a structural explanation by iconicity. Gesture Workshop, pp. 173–180.
57. de Barros, L. C., Bassanezi, R. C., and Tonelli, P. A. (1997). On the continuity of the Zadeh's extension. Seventh IFSA World Congress, Prague, Vol. II, pp. 3–8.
58. Denis, M., Pazzaglia, F., Cornoldi, C., and Bertolo, L. (1999). Spatial discourse and navigation: An analysis of route directions in the city of Venice. Appl. Cognitive Psychology 13, 145–174.
59. Dessalles, J.-L. (2000). Aux origines du langage. Paris: Hermès.
60. Dubois, D., and Jaulent, M.-C. (1987). A general approach to parameter evaluation in fuzzy digital pictures. Pattern Recognition Lett. 6, 251–259.
61. Dubois, D., and Prade, H. (1980). Fuzzy Sets and Systems: Theory and Applications. New York: Academic Press.
62. Dubois, D., and Prade, H. (1983). On distance between fuzzy points and their use for plausible reasoning. Int. Conf. Systems, Man, and Cybernetics, pp. 300–303.
63. Dubois, D., and Prade, H. (1985). A review of fuzzy set aggregation connectives. Information Sciences 36, 85–121.
64. Dubois, D., and Prade, H. (1991). A glance at non-standard models and logics of uncertainty and vagueness. Technical Report IRIT/91-97/R, IRIT, Toulouse, France.
65. Dubois, D., Prade, H., and Testemale, C. (1988). Weighted fuzzy pattern matching. Fuzzy Sets and Systems 28, 313–331.
66. Dutta, S.
(1991). Approximate spatial reasoning: Integrating qualitative and quantitative constraints. Int. J. of Approximate Reasoning 5, 307–331.
67. Edwards, G. (1997). Geocognostics: A new framework for spatial information theory, in Spatial Information Theory: A Theoretical Basis for GIS, volume 1329 of LNCS. Springer, pp. 455–471.
68. Einstein, A. (1916). General Theory of Relativity.
69. Emptoz, H. (1983). Modèle prétopologique pour la reconnaissance des formes. Applications en neurophysiologie. Doctoral thesis, Univ. Claude Bernard, Lyon I, France.
70. Fan, J., and Xie, W. (1999). Some notes on similarity measure and proximity measure. Fuzzy Sets and Systems 101, 403–412.
71. Fan, J.-L. (1988). Note on Hausdorff-like metrics for fuzzy sets. Pattern Recognition Lett. 23, 793–796.
72. Frank, A. U. (1992). Qualitative spatial reasoning with cardinal directions. J. Visual Languages and Computing 3, 343–371.
73. Freeman, J. (1975). The modelling of spatial relations. Computer Graphics and Image Processing 4(2), 156–171.
74. Gahegan, M. (1995). Proximity operators for qualitative spatial reasoning, in Spatial Information Theory: A Theoretical Basis for GIS, volume 988 of LNCS, edited by A. U. Frank and W. Kuhn. Springer.
75. Gapp, K. P. (1994). Basic meanings of spatial relations: Computation and evaluation in 3D space, in 12th National Conference on Artificial Intelligence, AAAI-94, Seattle, WA, pp. 1393–1398.
76. Gärdenfors, P. (2000). Conceptual Spaces: The Geometry of Thought. Cambridge, MA: MIT Press.
77. Gärdenfors, P., and Williams, M.-A. (2001). Reasoning about categories in conceptual spaces. IJCAI'01, Seattle, WA, pp. 385–392.
78. Gasos, J., and Ralescu, A. (1997). Using imprecise environment information for guiding scene interpretation. Fuzzy Sets and Systems 88, 265–288.
79. Gasós, J., and Saffiotti, A. (2000). Using fuzzy sets to represent uncertain spatial knowledge in autonomous robots. J. Spatial Cognition and Computation 1, 205–226.
80. Géraud, T., Bloch, I., and Maître, H. (1999). Atlas-guided recognition of cerebral structures in MRI using fusion of fuzzy structural information. CIMAF'99 Symposium on Artificial Intelligence, La Havana, Cuba, pp. 99–106.
81. Gerstenkorn, T., and Manko, J. (1991). Correlation of intuitionistic fuzzy sets. Fuzzy Sets and Systems 44, 39–43.
82. Goetschel, R., and Voxman, W. (1983). Topological properties of fuzzy numbers. Fuzzy Sets and Systems 10, 87–99.
83. Guesgen, H. W., and Albrecht, J. (2000). Imprecise reasoning in geographic information systems. Fuzzy Sets and Systems 113, 121–131.
84. Hart, R. A., and Moore, G. T. (1973). The development of spatial cognition: A review, in Image and Environment: Cognitive Mapping and Spatial Behavior, edited by R. M. Downs and D. Stea. Chicago: Aldine.
85. Helmholtz, H. (1878). The Facts of Perception.
86. Herskovits, A. (1986). Language and Spatial Cognition. An Interdisciplinary Study of the Prepositions in English. Cambridge, MA: Cambridge University Press.
87. Hughes, G. E., and Cresswell, M. J. (1968). An Introduction to Modal Logic. London: Methuen.
88. Huttenlocher, D. P., Klanderman, G. A., and Rucklidge, W. J. (1993). Comparing images using the Hausdorff distance. IEEE Trans. Pattern Analysis and Machine Intelligence 15(9), 850–863.
89. Hyung, L. K., Song, Y. S., and Lee, K. M. (1994). Similarity measure between fuzzy sets and between elements.
Fuzzy Sets and Systems 62, 291–293.
90. Jacas, J., and Recasens, J. (1996). One-dimensional indistinguishability operators. Information Processing and Management of Uncertainty in Knowledge-Based Systems, IPMU'96, Granada, Spain, Vol. I, pp. 377–381.
91. Jain, R., Murthy, S. N. J., and Chen, P. L. J. (1995). Similarity measures for image databases. IEEE Int. Conf. on Fuzzy Systems, Yokohama, Japan, pp. 1247–1254.
92. Janis, V. (2003). Resemblance is a nearness. Fuzzy Sets and Systems 133, 171–173.
93. Jaulent, M. C., and Yang, A. (1994). Application of fuzzy pattern matching to the flexible interrogation of a digital angiographies database. IPMU, Paris, pp. 904–909.
94. Kandel, A., and Byatt, W. J. (1978). Fuzzy sets, fuzzy algebra, and fuzzy statistics. Proc. IEEE 66(12), 1619–1639.
95. Kandil, A., Sedamed, E. H., and Morsi, N. N. (1994). Fuzzy proximities induced by functional fuzzy separation. Fuzzy Sets and Systems 63, 227–235.
96. Kant, E. (1781). Critique of Pure Reason.
97. Kaufman, A. (1973). Introduction to the Theory of Fuzzy Subsets: Fundamental Theoretical Elements. New York: Academic Press.
98. Keller, J., and Wang, X. (2000). A fuzzy rule-based approach to scene description involving spatial relationships. Computer Vision and Image Understanding 80, 21–41.
99. Keller, J. M., and Wang, X. (1995). Comparison of spatial relation definitions in computer vision. ISUMA-NAFIPS'95, College Park, MD, pp. 679–684.
100. Kitamoto, A., and Takagi, M. (1995). Retrieval of satellite cloud imagery based on subjective similarity. 9th Scandinavian Conference on Image Analysis, Uppsala, Sweden, pp. 449–456.
101. Klawonn, F. (2003). Should fuzzy equality and similarity satisfy transitivity? Comments on the paper by M. De Cock and E. Kerre. Fuzzy Sets and Systems 133, 175–180.
102. Krishnapuram, R., Keller, J. M., and Ma, Y. (1993). Quantitative analysis of properties and spatial relations of fuzzy image regions. IEEE Trans. on Fuzzy Systems 1(3), 222–233.
103. Kuipers, B. (1978). Modeling spatial knowledge. Cognitive Science 2, 129–153.
104. Kuipers, B. (2000). The spatial semantic hierarchy. Artificial Intelligence 119, 191–233.
105. Kuipers, B. J., and Levitt, T. S. (1988). Navigation and mapping in large-scale space. AI Magazine 9(2), 25–43.
106. Kullback, S. (1959). Information Theory and Statistics. New York: Wiley.
107. Lambrey, S., Viaud-Delmon, I., and Berthoz, A. (2002). Influence of a sensorimotor conflict on the memorization of a path traveled in virtual reality. Cognitive Brain Research 14, 177–186.
108. Liu, J. (1998). A method of spatial reasoning based on qualitative trigonometry. Artificial Intelligence 98, 137–168.
109. Liu, X., Tan, S., Srinivasan, V., Ong, S. H., and Xie, Z. (1994). Fuzzy pyramid-based invariant object recognition. Pattern Recognition 27(5), 741–756.
110. Liu, Z.-Q., and Satur, R. (1999). Contextual fuzzy cognitive map for decision support in geographic information systems. IEEE Trans. Fuzzy Systems 7(5), 495–505.
111. Lowen, R., and Peeters, W. Anisotropic semi-pseudometrics, unpublished manuscript.
112. Lowen, R., and Peeters, W. (1997). On various classes of semi-pseudometrics used in pattern recognition. Seventh IFSA World Congress, Prague, Vol. I, pp. 232–237.
113. Lowen, R., and Peeters, W. (1998). Distances between fuzzy sets representing grey level images. Fuzzy Sets and Systems 99(2), 135–150.
114. De Luca, A., and Termini, S. (1972).
A definition of non-probabilistic entropy in the setting of fuzzy set theory. Information and Control 20, 301–312.
115. Maher, P. E. (1993). A similarity measure for conceptual graphs. Int. J. Intelligent Systems 8, 819–837.
116. Man, G. T., and Poon, J. C. (1993). A fuzzy-attributed graph approach to handwritten character recognition. Second IEEE Int. Conf. on Fuzzy Systems, San Francisco, CA, pp. 570–575.
117. Mangin, J.-F., Bloch, I., Lopez-Krahe, J., and Frouin, V. (1994). Chamfer distances in anisotropic 3D images. EUSIPCO 94, Edinburgh, UK, pp. 975–978.
118. Mark, D. M., and Egenhofer, M. J. (1994). Modeling spatial relations between lines and regions: combining formal mathematical models and human subjects testing. Cartography and Geographic Information Systems 21(4), 195–212.
119. Mascarilla, L. (1994). Rule extraction based on neural networks for satellite image interpretation. SPIE Image and Signal Processing for Remote Sensing, Rome, Vol. 2315, pp. 657–668.
120. Masson, M., and Denœux, T. (2002). Multidimensional scaling of fuzzy dissimilarity data. Fuzzy Sets and Systems 128, 339–352.
121. Mellet, E., Bricogne, S., Tzourio-Mazoyer, N., Ghaëm, O., Petit, L., Zago, L., Etard, O., Berthoz, A., Mazoyer, B., and Denis, M. (2000). Neural correlates of topographic mental exploration: The impact of route versus survey perspective learning. NeuroImage 12(5), 588–600.
122. Miyajima, K., and Ralescu, A. (1994). Spatial organization in 2D segmented images: Representation and recognition of primitive spatial relations. Fuzzy Sets and Systems 65, 225–236.
123. Montello, D. R. (1993). Scale and multiple psychologies of space. COSIT '93, volume 716 of LNCS, Elba Island, Italy, pp. 312–321.
124. Nadel, L. (1995). The psychobiology of spatial behavior: the hippocampal formation and spatial mapping, in Behavioral Brain Research in Naturalistic and Semi-Naturalistic Settings: Possibilities and Perspectives, edited by E. Alleva, H.-P. Lipp, L. Nadel, A. Fasolo, and L. Ricceri. Dordrecht: Kluwer.
125. Nadel, L. (2002). Multiple perspectives in spatial cognition, in Colloque Cognitique. Paris.
126. Pappis, C. P., and Karacapilidis, N. I. (1993). A comparative assessment of measures of similarity of fuzzy values. Fuzzy Sets and Systems 56, 171–174.
127. Perchant, A., and Bloch, I. (2002). Fuzzy morphisms between graphs. Fuzzy Sets and Systems 128(2), 149–168.
128. Peuquet, D. J. (1988). Representations of geographical space: Toward a conceptual synthesis. Annals of the Association of American Geographers 78(3), 375–394.
129. Piaget, J., and Inhelder, B. (1967). The Child's Conception of Space. New York: Norton.
130. Poincaré, H. (1902). La science et l'hypothèse. Flammarion.
131. Potoczny, H. B. (1984). On similarity relations in fuzzy relational databases. Fuzzy Sets and Systems 12, 231–235.
132. Pullar, D., and Egenhofer, M. (1988). Toward formal definitions of topological relations among spatial objects. Third Int. Symposium on Spatial Data Handling, Sydney, Australia, pp. 225–241.
133. Puri, M. L., and Ralescu, D. A. (1981). Différentielle d'une fonction floue. C.R. Acad. Sci. Paris I 293, 237–239.
134. Puri, M. L., and Ralescu, D. A. (1983). Differentials of fuzzy functions. J. Mathematical Analysis and Applications 91, 552–558.
135. Randell, D., Cui, Z., and Cohn, A. (1992). A spatial logic based on regions and connection. Principles of Knowledge Representation and Reasoning, KR'92, San Mateo, CA, pp. 165–176.
136. Rifqi, M. (1995).
Mesures de similitude et leur agrégation, in Rencontres francophones sur la logique floue et ses applications. Paris, pp. 80–87.
137. Rosenfeld, A. (1984). The fuzzy geometry of image subsets. Pattern Recognition Lett. 2, 311–317.
138. Rosenfeld, A. (1985). Distances between fuzzy sets. Pattern Recognition Lett. 3, 229–233.
139. Saha, P. K., Wehrli, E. W., and Gomberg, B. R. (2003). Fuzzy distance transform: Theory, algorithms, and applications. Fuzzy Sets and Systems 86, 171–190.
140. Sallandre, M.-A., and Cuxac, C. (2001). Iconicity in sign language: A theoretical and methodological point of view. Gesture Workshop, pp. 173–180.
141. Satur, R., and Liu, Z.-Q. (1999). A contextual fuzzy cognitive map framework for geographic information systems. IEEE Trans. Fuzzy Systems 7(5), 481–494.
142. Serra, J. (1982). Image Analysis and Mathematical Morphology. London: Academic Press.
143. Siegel, A. W., and White, S. H. (1975). The development of spatial representations of large-scale environments, in Advances in Child Development and Behavior, Vol. 10, edited by H. W. Reese. New York: Academic Press.
144. Sinha, D., and Dougherty, E. R. (1993). Fuzzification of set inclusion: Theory and applications. Fuzzy Sets and Systems 55, 15–42.
145. Sokic, C., and Pavlovic-Lazetic, G. (1997). Homogeneous images in fuzzy databases, in Seventh IFSA World Congress, Prague, Vol. IV, pp. 297–303.
146. Sridharan, K., and Stephanou, H. E. (1999). Fuzzy distances for proximity characterization under uncertainty. Fuzzy Sets and Systems 103, 427–434.
147. Talmy, L. (1983). How language structures space, in Spatial Orientation: Theory, Research and Application, edited by H. L. Pick and L. P. Acredolo. New York: Plenum Press.
148. Talmy, L. (2000). Toward a Cognitive Semantics. Cambridge, MA: MIT Press.
149. Tan, S. K., Teh, H. H., and Wang, P. Z. (1994). Sequential representation of fuzzy similarity relations. Fuzzy Sets and Systems 67, 181–189.
150. Tran, L., and Duckstein, L. (2002). Comparison of fuzzy numbers using a fuzzy distance measure. Fuzzy Sets and Systems 130, 331–341.
151. Tversky, A. (1977). Features of similarity. Psychological Review 84(4), 327–352.
152. Vandeloise, C. (1986). L'espace en français: sémantique des prépositions spatiales. Paris: Seuil, travaux en linguistique.
153. Varzi, A. (1996). Parts, wholes, and part–whole relations: The prospects of mereotopology. Data and Knowledge Engineering 20(3), 259–286.
154. Vieilledent, S., Kosslyn, S. M., Berthoz, A., and Giraudo, M. D. (2003). Does mental simulation of following a path improve navigation performance without vision? Cognitive Brain Research 16(2), 238–249.
155. Vieu, L. (1997). Spatial representation and reasoning in artificial intelligence, in Spatial and Temporal Reasoning, edited by O. Stock. Dordrecht: Kluwer, pp. 5–41.
156. Walker, E. L. (1996). Fuzzy relations for feature–model correspondence in 3D object recognition. NAFIPS, Berkeley, CA, pp. 28–32.
157. Wang, W.-J. (1997). New similarity measures on fuzzy sets and on elements. Fuzzy Sets and Systems 85, 305–309.
158. Wang, X., De Baets, B., and Kerre, E. (1995). A comparative study of similarity measures. Fuzzy Sets and Systems 73, 259–268.
159. Yager, R. R. (1992). Entropy measures under similarity relations. Int. J. General Systems 20, 341–358.
160. Zadeh, L. (1978). Fuzzy sets and information granularity, in Advances in Fuzzy Set Theory and Applications, edited by Gupta, M., Ragade, R., and Yager, R. Amsterdam: North-Holland, pp. 3–18.
161. Zadeh, L. A. (1971). Similarity relations and fuzzy orderings. Information Sci. 3, 177–200.
162. Zadeh, L. A. (1975). The concept of a linguistic variable and its application to approximate reasoning.
Information Sci. 8, 199–249.
163. Zwick, R., Carlstein, E., and Budescu, D. V. (1987). Measures of similarity among fuzzy concepts: A comparative analysis. Int. J. Approximate Reasoning 1, 221–242.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 128
Mathematical Morphology Applied to Circular Data

ALLAN HANBURY*
Pattern Recognition and Image Processing Group (PRIP), Vienna University of Technology, Favoritenstraße 9/1832, A-1040 Vienna, Austria

I. Introduction
II. Processing Circular Data
   A. Circular Data and the Unit Circle
   B. Circular Statistics
   C. Mathematical Morphology Applied to the Unit Circle
   D. Morphology with the Choice of an Origin
   E. Pseudodilation and Pseudoerosion
      1. Morphological Center
      2. Erosion and Dilation
   F. Circular Centered Morphology
      1. Gradient
      2. Top-Hat
   G. Partitions
      1. Connected Partitions
      2. Indexed Partitions
      3. Cyclic Operators
      4. Series Closings
      5. Parallel Openings
      6. Rotationally Invariant Cyclic Opening
   H. Conclusion
III. Application Examples
   A. Homogeneous Phase Extraction in HRTEM Images
   B. Oriented Texture
      1. The Rao and Schunck Algorithm
      2. Segmentation
      3. Defect Detection with the Circular Centered Top-Hat
      4. Defect Detection with the Labeled Opening
   C. Conclusion
IV. 3D Polar Coordinate Color Spaces
   A. Basic Definitions
   B. 3D Polar Coordinate Color Representations
   C. Discussion of the Existing 3D Polar Coordinate Spaces
   D. Derivation of a Useful 3D Polar Coordinate Space
      1. Brightness
      2. Hue
      3. Saturation
*The majority of this work was done while the author was with the Centre for Mathematical Morphology, Paris School of Mines, France. It is supported by the Austrian Science Foundation (FWF) under grants P14445-MAT and P14662-INF.
Copyright © 2003 Elsevier Inc. All rights reserved. 1076-5670/2003 $35.00
      4. Chroma
   E. The IHLS Space
      1. The Simplest RGB to IHLS Transformation
      2. An Alternative RGB to IHLS Transformation
      3. The Inverse Transformation from IHLS to RGB
   F. Conclusion
V. Processing of 3D Polar Coordinate Color Spaces
   A. Color Statistics
   B. Vectorial Mathematical Morphology
      1. Vectorial Orders
      2. Morphological Operators
   C. Lexicographical Orders in the IHLS Color Space
      1. Luminance and Saturation
      2. Hue
      3. Saturation-Weighted Hue
      4. Color Top-Hat
      5. Summary
   D. Conclusion
VI. Conclusion
Appendix A: Connected Partitions
Appendix B: Cyclic Closings on Indexed Partitions
Acknowledgments
References
I. INTRODUCTION

Data represented by angles or by two-dimensional orientations, called circular data, often appear in the analysis of the natural world. Some examples are wind directions, the directions of departure of birds or animals from a point of liberation, and the orientations of fracture planes in rocks. The statistical analysis of circular data is a well-studied subject (Fisher, 1993; Mardia and Jupp, 1999), but in the context of image processing and analysis, in which this type of data is also found, the development of methods for processing it correctly has received less attention.

For color images, the hue component of color representations in 3D polar coordinates is an angular value. For this reason, the hue has properties different from those of its accompanying components, the saturation and brightness. Nevertheless, this difference is often ignored and the same algorithms are applied to the three components. It is also often necessary to process two-dimensional direction fields, for example in the analysis of oriented textures, of the vector fields produced by movement analysis in image sequences, or of the vector field produced by the Fourier transform of any image. Spectrograms, the result of a series of short-time Fourier transforms applied to a unidimensional signal, can also be visualized as a vector
field. With the results of a Fourier transform, one has the tendency to use only the vector amplitudes, leaving aside their directions (the phase). Why is there this reticence to process the angular values? Why is the nature of circular data sometimes ignored, leading them to be processed as linear data?¹

Circular data can be visualized as points on the circumference of the unit circle, the circle with unit radius. By using this representation, one immediately sees the characteristics that render circular data difficult to process. They are cyclic: adding 2π or one of its multiples to a coordinate brings one back to the original position. Furthermore, there is no obvious origin; each position on the circle is equal to every other. It is for this reason that King Arthur chose a round table for his knights, and, as for the knights, one cannot impose an order by magnitude on circular data. But are these problems insurmountable? Some possible solutions are discussed in this chapter. In particular, we show how mathematical morphology operators can be applied to this type of data.

The notion of rotational invariance is important in the context of circular data processing. In a set of directions, the numerical value of the coordinate of each direction depends on the position chosen for the origin. If an operator acting on this set always gives the same direction as a result, independently of the position of the origin (note that this is not necessarily the same numerical value), then the operator is said to be rotationally invariant. For circular data, for which an obvious origin does not exist, this property is desirable, and among the morphological operators that we develop, those which satisfy this property are indicated.

This chapter is concerned mainly with the processing of circular data in the context of image processing and analysis. It is essentially a translation into English of some parts of a Ph.D. thesis (Hanbury, 2002).
It presents expanded versions of the material presented by Hanbury and Serra (2001a,c). In Section II, we develop morphological operators pertinent to circular data. The first applications of these operators are presented in Section III, in the context of processing Fourier transform phase images and oriented textures, applications in which it is often possible to process the circular data component in isolation. We then move on to the case where an angular coordinate forms part of a vector, as found in color images represented in a 3D polar coordinate system. In an effort to simplify the choice of a 3D polar coordinate color representation amongst the multitude available in the literature, Section IV discusses and develops the IHLS (improved hue, luminance, and saturation) system, a system of
¹ The practitioners of image processing are not the only people to err in this direction. At one stage, statistics of wind directions were calculated using standard linear statistics, necessitating the later development of methods for correcting these errors (Fisher, 1993).
3D polar coordinates suitable for use in image processing and analysis. The application of morphological operators in the IHLS space is discussed in Section V.

A detailed introduction to mathematical morphology is outside the scope of this chapter. The reader is referred to Soille (1999) for a practical introduction, and to Serra (1982) and Heijmans (1994) for a more mathematical treatment.

II. PROCESSING CIRCULAR DATA

In image processing and analysis, one is sometimes confronted by images containing angular values at each point. Three applications of this type, which are described in more detail in Sections III and V, are the processing of the hue component of color images, of a direction field describing an oriented texture, and of the phase image produced by a Fourier transform.

Angular data can be visualized as points on the unit circle, a representation discussed in Section II.A. The circle has neither an order of importance of points nor a dominant position which could be taken as an origin. In addition, the data are cyclic: adding 2π to a coordinate on the circle gives a result at the same position as the initial point. In general, for this type of data, the classic operators designed for use on linear data are not valid. For statistical descriptors, one is inconvenienced by the periodicity; for mathematical morphology, also by the lack of an obvious origin based on which one can construct a lattice. The statistics of circular data is a well-developed area of research. We present in Section II.B a brief review of the circular statistics descriptors (Fisher, 1993; Mardia and Jupp, 1999) which are useful in our applications. We then move on to answering the following important question: Can we avoid the difficulty imposed by the lack of an obvious origin, and hence develop morphological operators which are rotationally invariant? In Sections II.C–II.G we consider four approaches to mathematical morphology for circular data.
For the examples in this section, we make use of the hue band of color images. This is discussed in more detail in Section IV, but for this section, it is sufficient to know that the hue is an angular value describing the color of a pixel.

A. Circular Data and the Unit Circle

Two types of circular data exist: vectorial data and axial data (Fisher, 1993). Vectorial data represent directions, for example wind direction, and have a periodicity of 2π. Axial data represent the orientations of undirected lines, for example the orientations of cracks on a surface, and have a periodicity of
π. For the examples we consider, the hue value and the Fourier transform phases are vectorial data, and direction fields are axial data.² In general, axial data are processed by first converting them to vectorial data (by a multiplication by 2 followed by a modulo 2π if necessary), followed by an application of vectorial data techniques, and finally a conversion of the results to axial data values. All the algorithms in this chapter are therefore designed for vectorial data.

Angular valued data can usefully be represented as points on the circumference of the unit circle, the circle with center $o$ and radius of length 1 shown in Figure 1. The points on the circle which indicate directions with respect to center $o$ are written $\dot{a}_i$ with $i \in \mathbb{N}$. Upon choosing an arbitrary origin $a_0$ on the unit circle, the positions of the points $\dot{a}_i$ can be given by the corresponding angles $a_i$, with these angles being measured in an anticlockwise direction with respect to the origin $a_0$. Two points $\dot{a}_1$ and $\dot{a}_2$ are shown in Figure 1, with their angles $a_1$ and $a_2$ measured with respect to the origin $a_0$ indicated. We stress that the points $\dot{a}_i$ are always found in the same position, independent of the position of the origin $a_0$, whereas the values of the associated angles $a_i$ change as a function of the position of the origin. The angles have the property that the values $a_i + 2k\pi$, $k \in \mathbb{Z}$, always correspond to the same point $\dot{a}_i$.

To simplify the comparison of angles, we constrain their values $a_i$ to lie in the interval $[0, 2\pi)$. We therefore define the operator $K(\cdot)$, which takes a value $\alpha \in (-\infty, \infty)$ and moves it into the interval $[0, 2\pi)$. This operator is defined as

$$K(\alpha) = \alpha + 2k\pi, \quad \text{with } k \in \mathbb{Z} \text{ chosen so that } K(\alpha) \in [0, 2\pi). \qquad (1)$$
We proceed to the definition of addition and subtraction operators for angular values on the unit circle, wishing to obtain results in the interval $[0, 2\pi)$. The addition of two angular values $a_i$ and $a_j$ is defined as

$$a_i \dotplus a_j = K(a_i + a_j) \qquad (2)$$

and the subtraction of these two values is defined as

$$a_i \mathbin{\dot{-}} a_j = K(a_i - a_j). \qquad (3)$$
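As an illustration, these angle operators translate directly into code. The following Python sketch (the function names are ours, not from the chapter) implements the wrapping operator of Equation (1), the circular addition and subtraction of Equations (2)–(3), and the acute-angle distance of Equation (4):

```python
import math

TWO_PI = 2.0 * math.pi

def K(alpha):
    """Equation (1): wrap any real-valued angle into [0, 2*pi)."""
    return alpha % TWO_PI        # Python's % returns a value in [0, TWO_PI)

def ang_add(ai, aj):
    """Circular addition, Equation (2)."""
    return K(ai + aj)

def ang_sub(ai, aj):
    """Circular subtraction, Equation (3)."""
    return K(ai - aj)

def acute(ai, aj):
    """Acute-angle distance, Equation (4); assumes both angles lie in [0, 2*pi)."""
    d = abs(ai - aj)
    return d if d <= math.pi else TWO_PI - d
```

Note that the built-in modulo operation already realizes $K$, since for a positive modulus Python's `%` always returns a nonnegative result.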
Another notion of angular difference is the smallest angle formed by the directions $\dot{a}_1$ and $\dot{a}_2$, represented by the acute angle between the two

² A type of circular data known as p-axial also exists, having the property that a direction represented by an angle $\theta$ is the same as the directions represented by the angles $\theta + k(2\pi/p)$, $k \in \mathbb{Z}$, with $p \in \mathbb{N}$, $p > 2$. For example, for $p = 6$, the angles 10°, 70°, 130°, 190°, 250°, and 310° all correspond to the same direction.
FIGURE 1. The unit circle.
angles $a_1$ and $a_2$, which we denote by $a_1 \ominus a_2$, with, in the general case of any two angles $a_i, a_j \in [0, 2\pi)$,

$$a_i \ominus a_j = \begin{cases} |a_i - a_j| & \text{if } |a_i - a_j| \le \pi \\ 2\pi - |a_i - a_j| & \text{if } |a_i - a_j| > \pi \end{cases} \qquad (4)$$

The acute angle $a_1 \ominus a_2$ is indicated in Figure 1.

For digital images, the range of pixel values is usually limited by the number of bits per pixel. For an 8-bit image, the angles between 0 and 2π are represented by integer values between 0 and 255. For the examples presented, we use images having floating point pixel values, allowing more precision. We also freely switch between units of degrees and radians, using whichever is more convenient for the problem at hand.

B. Circular Statistics

One cannot use the classic linear statistical descriptors for circular data, due to the periodicity of this type of data. We present definitions of the circular mean and circular variance applicable to circular data as well as to images containing this type of data. These definitions are from Fisher (1993).

We begin with the circular mean. Given $n$ angular values $\theta_i$, $i = 1, \ldots, n$, the mean direction $\bar{\theta}$ is the direction of the resultant vector of the sum of unit vectors in the $n$ directions $\theta_i$. To find the direction of this resultant vector, one first calculates the values

$$A = \sum_i \cos \theta_i, \qquad B = \sum_i \sin \theta_i \qquad (5)$$
followed by

$$\bar{\theta} = \begin{cases} \arctan(B/A) & \text{if } B > 0,\ A > 0 \\ \arctan(B/A) + \pi & \text{if } A < 0 \\ \arctan(B/A) + 2\pi & \text{if } B < 0,\ A > 0 \\ \pi/2 & \text{if } A = 0,\ B > 0 \\ 3\pi/2 & \text{if } A = 0,\ B < 0 \end{cases} \qquad (6)$$
The arctan function gives angular values in the interval $[-\pi/2, \pi/2]$, and the top three levels of Equation (6) give a value of $\bar{\theta}$ in the interval $[0, 2\pi)$. The final two levels take into account the special case when $A = 0$. The length of the resultant vector is

$$R = \sqrt{A^2 + B^2} \qquad (7)$$

and the average length is

$$\bar{R} = \frac{R}{n}. \qquad (8)$$

The average length has a value between 0 and 1 and can be used as an indicator of the dispersion of the data. If $\bar{R} = 1$, all the $\theta_i$ are coincident. Conversely, a value of zero does not necessarily indicate a homogeneous data distribution, as many nonhomogeneous distributions can also result in a value of zero. The circular variance is defined as

$$V = 1 - \bar{R}. \qquad (9)$$
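Putting Equations (5)–(9) together, the circular mean and variance can be sketched as follows (the function names are ours; `atan2` reproduces the case analysis of Equation (6), including the $A = 0$ cases):

```python
import math

def circular_mean(angles):
    """Mean direction, Equations (5)-(6); angles in radians.
    atan2 folds the five-way case analysis of Equation (6) into one call;
    the result is meaningless when A = B = 0 (resultant length R = 0)."""
    A = sum(math.cos(t) for t in angles)
    B = sum(math.sin(t) for t in angles)
    return math.atan2(B, A) % (2.0 * math.pi)

def circular_variance(angles):
    """Circular variance, Equations (7)-(9): V = 1 - R/n."""
    A = sum(math.cos(t) for t in angles)
    B = sum(math.sin(t) for t in angles)
    return 1.0 - math.hypot(A, B) / len(angles)
```

For example, two angles placed symmetrically about 0 have a circular mean of 0, while two diametrically opposed angles give a variance close to 1, reflecting maximal dispersion.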
Analogously to the linear variance, the concentration of the distribution is inversely proportional to the value of $V$ (note that $V$ also takes values between 0 and 1). Definitions of the angular standard deviation are also available (Fisher, 1993). To calculate statistical measures of axial data, one first multiplies each angular value by two, and proceeds with the calculation of $\bar{\theta}$, $\bar{R}$, and $V$. Lastly, $\bar{\theta}$ is divided by two.

C. Mathematical Morphology Applied to the Unit Circle

Mathematical morphology is usually applied to grayscale images of the form $f : Z \to \mathbb{R}$, where $Z \subseteq E$ is a subspace of the Euclidean space $E$.
The existence of an order relation on $\mathbb{R}$ allows the construction of a complete lattice and hence the application of morphological operators. Morphological operators interact with images by means of small sets called structuring elements (SE). For simplicity, we use square structuring elements in the examples, where a square SE of size $k$ is a square of $(2k+1) \times (2k+1)$ pixels.

In the rest of this section, we consider images containing circular data, i.e., images of the type $a : Z \to C$, where $C$ is the unit circle. In images of this type, there is no predefined order for the angular values. One is free to choose an origin $a_0$ anywhere on the circle, and the order of the values depends on this choice. The application of mathematical morphology to this type of image is discussed in the following sections. We begin by developing, in Section II.D, operators for which it is necessary to initially choose an origin, but which take the periodicity of the circular values into account. Next, we suggest, in Sections II.E–II.G, some approaches which avoid the necessity of choosing an origin, and thereby allow the creation of rotationally invariant morphological operators (Hanbury and Serra, 2001c).
D. Morphology with the Choice of an Origin

Having chosen an origin $a_0$ on the unit circle, it is easy to build an order from 0° to 360°, with infimum 0° and supremum 360° (Vardavoulia et al., 2001; Weeks and Sartor, 1999). We then find ourselves in the unfortunate situation where the infimum and supremum are the same point of the circle, the origin. A solution allowing one to escape from this paradox is to order the points as a function of their distance to a chosen origin, using the acute angle between two points given by Equation (4). An order based on these differences is not total, as two points on opposite sides of the origin can have the same distance from the origin. We can nevertheless impose a total order on the points $a_i$ of the circle by using the following algorithm:

$$a_i \le a_j \quad \begin{array}{l} \text{if } a_i \ominus a_0 < a_j \ominus a_0 \\ \text{or if } a_i \ominus a_0 = a_j \ominus a_0 \text{ and } a_i \mathbin{\dot{-}} a_0 \le 180° \end{array} \qquad (10)$$
A similar relation to this has been used by Peters (1997) for applying morphological operators to hue differences in color images, and by Zhang and Wang (2000) in their definition of a central point of a segment
on the hue circle. Peters defines an erosion $\varepsilon_B^P$ by a structuring element $B$ at point $x$ as

$$\varepsilon_B^P a(x) = \inf\{a(y),\ y \in B_x\} \qquad (11)$$
in which the order given by Equation (10) with an origin $a_0$ is used; that is to say, the infimum of a set of points on the unit circle is the point closest to the chosen origin $a_0$. The supremum is therefore the point furthest from the origin.

When dealing with directions, this definition is not very intuitive. If we are interested in some color of hue $H$ on which we wish to carry out a dilation, it is necessary to choose the origin at $H + 180°$. To simplify the choice of the origin, we define the operators in a way that permits the user to choose the origin at the position of the hue of interest. Consider the simple two-color example shown in Figure 2(a): an image which, in the HLS space (see Section IV), has a brightness $L = 1/2$ and a saturation $S = 1$ constant over the entire image, with the red grains having a hue $H = 0°$, and the yellow background a hue $H = 60°$. We choose the origin equal to the hue of the objects of interest, $a_0 = 0°$. If we dilate the hue by choosing the supremum in the structuring element according to the Peters formulation, the result is shown in Figure 2(b), in which the red objects have been eroded. To allow the user to choose the origin more intuitively by placing it at the position of the hue of the objects of interest, we invert the Peters formulation, and define the erosion as

$$\varepsilon_B a(x) = \sup\{a(y),\ y \in B_x\} \qquad (12)$$

and the dilation as

$$\delta_B a(x) = \inf\{a(y),\ y \in B_x\} \qquad (13)$$
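A minimal sketch of these origin-based operators, with hues in degrees: `order_key` realizes the total order of Equation (10), and `erode_hue`/`dilate_hue` implement Equations (12) and (13) over a square SE. All names are ours, and clipping the SE at the image border is one simplifying choice among several:

```python
def acute_deg(a, b):
    """Acute angle between two hues in degrees (Equation (4))."""
    d = abs(a - b) % 360.0
    return d if d <= 180.0 else 360.0 - d

def order_key(a, a0):
    """Total order of Equation (10): distance to the origin a0 first,
    with the wrapped signed difference breaking ties between the two sides."""
    return (acute_deg(a, a0), (a - a0) % 360.0)

def _morph(img, a0, k, pick):
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            win = [img[j][i]
                   for j in range(max(0, y - k), min(h, y + k + 1))
                   for i in range(max(0, x - k), min(w, x + k + 1))]
            out[y][x] = pick(win, key=lambda v: order_key(v, a0))
    return out

def erode_hue(img, a0, k=1):
    """Erosion of Equation (12): point furthest from the origin in the SE."""
    return _morph(img, a0, k, max)

def dilate_hue(img, a0, k=1):
    """Dilation of Equation (13): point closest to the origin in the SE."""
    return _morph(img, a0, k, min)
```

On a small red/yellow image in the spirit of Figure 2, dilating with $a_0 = 0°$ grows the red (hue 0°) grains, matching the intuitive behavior of Figure 2(c).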
FIGURE 2. (a) Two-color image containing red grains (marked 'R') on a yellow background (marked 'Y'). (b) Dilation using the Peters formulation. (c) Dilation using Equation (13). The hue origin is $a_0 = 0°$ for both operations.
in which we continue to use the order defined by Equation (10) in choosing the supremum and infimum in the structuring element. The dilation of Figure 2(a) using Equation (13) is shown in Figure 2(c). The behavior of this dilation is more intuitive.

The choice of the origin can be made based on the requirements of the user or the characteristics of the images to be treated. For example, the mean or median (Nikolaidis and Pitas, 1998) color (hue) of an image, or information on the color of the objects of interest, may be used. The application of this type of hue morphology to color images which are more complex than the one in Figure 2(a), in which there is an interaction between the hue and saturation components, is discussed in Section V.C.

E. Pseudodilation and Pseudoerosion

To avoid having to choose an origin, we proceed to the development of operators based on the idea of grouped data. Because of the difficulty in determining the number of groups in a sample of circular data (Fisher and Marron, 2001), we introduce a simple definition of grouped data by way of the morphological center, and then use it to define morphological pseudoerosion and pseudodilation operators.

1. Morphological Center

The morphological center is a notion which appears naturally in the context of self-dual morphological filters (Serra, 1988). Given $n$ numerical values $t_i \in \mathbb{R}$ and a supplementary value $t$ which we wish to bring closer to the $t_i$, we apply the morphological center operator $\beta$ as follows:

$$\beta(t) = \begin{cases} \wedge t_i & \text{if } t \le \wedge t_i \\ t & \text{if } \wedge t_i \le t \le \vee t_i \\ \vee t_i & \text{if } \vee t_i \le t \end{cases} \qquad (14)$$
In particular, for $n = 2$ we find the median of the three values $t_1$, $t_2$, and $t$. When we wish to transpose this notion to the unit circle, we immediately come up against an obstacle. In the linear case, it is always possible to say whether a value is outside (superior or inferior to) the set of values $t_i$. Now consider a similar case on the unit circle, where we wish to bring a point $\dot{a}$ closer to a set of points $\dot{a}_i$. In Figure 3, let the origin $O$ be the point to be moved closer to the $\dot{a}_i$ (represented as crosses). In this case, it is possible to make sense of algorithm (14) only for certain distributions, such as those in Figures 3(a), (b), and (c), but not for the distribution of Figure 3(d), in which the data are too dispersed.
FIGURE 3. Four distributions of circular data. (a), (b), and (c) are grouped, (d) is not.
A simple approach is to ignore the grouping of the data, and to unconditionally put $\dot{a}$ at the position of the closest point $\dot{a}_i$:

$$\beta_0(\dot{a}) = \{\dot{a}_i \mid (a_i \ominus a) = \wedge(a_i \ominus a),\ i \in I\}. \qquad (15)$$

Alternatively, we can attempt to construct algorithms similar to algorithm (14). For this approach, it is necessary to formally define the notion of a group of points, of which an intuitive idea is given by Figures 3(a), (b), and (c).

Definition 1. A family $\{\dot{a}_i, i \in I\}$ of points on the unit circle forms an ω-group when an origin $a_0$ exists for which the following is valid:

$$\vee\{a_i, i \in I\} - \wedge\{a_i, i \in I\} \le \omega \le \pi \qquad (16)$$

where ω is an angle less than or equal to π, and $a_i$ is the angle corresponding to point $\dot{a}_i$ measured with respect to the origin $a_0$. The condition $\omega \le \pi$ removes the case shown in Figure 3(d) from consideration.

In practice, it is possible to decide whether an ω-group exists by simply choosing one arbitrary origin, as shown by the following proposition.

Proposition 2. The family $\{\dot{a}_i, i \in I\}$ of points on the unit circle $C$ forms an ω-group if and only if one has

$$\vee\{a_i, i \in I\} - \wedge\{a_i, i \in I\} \le \omega \qquad (17)$$

for an arbitrary origin $a_0$, or for the origin $a_0 + \pi$.

Proof. If the $\dot{a}_i$ are ω-grouped, then it is possible to partition $C$ into two semicircles so that all the $\dot{a}_i$ are in one of the semicircles. With this partition
of the circle, a point at the position of the origin $a_0$ is in one of the semicircles, and the point at the position $a_0 + \pi$ is necessarily in the other. One of these points is therefore in the semicircle opposite to the one which contains the family $\{\dot{a}_i, i \in I\}$, and for this origin, relation (17) is satisfied, as the origin does not belong to the envelope of the group of points (i.e., the smallest sector of the circle which contains them all). Conversely, if relation (17) is satisfied for an origin $a_0$, we have the definition of an ω-group of the $\dot{a}_i$. □

This proof gives rise to a simple algorithm for determining whether a group of points is ω-grouped. Given a family of points $\{\dot{a}_i, i \in I\}$, an arbitrary origin $a_0$ is chosen. If relation (17) is satisfied, then an ω-group exists. If not, then the origin is placed at position $a_0 + \pi$. If relation (17) is satisfied for this origin, then an ω-group exists. Otherwise, there is no grouping of points. If an ω-group exists, then the infimum and supremum of the group can be determined with respect to the origin for which the grouping exists.

The algorithm defining the circular morphological center uses this definition of an ω-group. To begin, we take as origin the point $\dot{a}$ which we wish to bring closer to the family $\{\dot{a}_i\}$. Next, we look at the value of $\Delta = \vee a_i - \wedge a_i$. If $\Delta > \pi$, then either the points $\{\dot{a}_i\}$ do not form an ω-group, or the point $\dot{a}$ is already in the interior of the group. We therefore leave $\dot{a}$ in its initial position. If $\Delta \le \pi$, then the points $\{\dot{a}_i\}$ form an ω-group, and $\dot{a}$ is outside this group. The morphological center is the point of the group $\{\dot{a}_i\}$ which is closest to $\dot{a}$, this point always being one of the extremities of the group. The following definition presents a method for calculating the angular value of the morphological center.

Definition 3. Given a family of points $\{\dot{a}_i, i \in I\}$ on the unit circle, and a point $\dot{a}$ which we wish to bring closer to these points, if we place the origin of the angular values at the position of $\dot{a}$, then the morphological center is

$$\beta(\dot{a}) = \begin{cases} 0 & \text{if } \Delta > \pi \\ \wedge a_i & \text{if } \Delta \le \pi \text{ and } (0 \ominus \wedge\{a_i, i \in I\}) < (0 \ominus \vee\{a_i, i \in I\}) \\ \vee a_i & \text{if } \Delta \le \pi \text{ and } (0 \ominus \vee\{a_i, i \in I\}) < (0 \ominus \wedge\{a_i, i \in I\}) \end{cases} \qquad (18)$$

where $\Delta = \vee a_i - \wedge a_i$.
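The ω-group test of Proposition 2 and the center of Equation (18) can be sketched as follows (names ours; for convenience, `morphological_center` returns the result as an absolute angle rather than as a value relative to the origin placed at the point being moved):

```python
import math

TWO_PI = 2.0 * math.pi

def is_omega_group(points, omega=math.pi):
    """Proposition 2: relation (17) holds for an arbitrary origin (here 0)
    or for that origin shifted by pi."""
    for shift in (0.0, math.pi):
        rel = [(p + shift) % TWO_PI for p in points]
        if max(rel) - min(rel) <= omega:
            return True
    return False

def morphological_center(a, points):
    """Equation (18), with the origin of angular values placed at the point a."""
    rel = [(p - a) % TWO_PI for p in points]
    delta = max(rel) - min(rel)
    if delta > math.pi:
        return a % TWO_PI              # no omega-group, or a already inside it
    def dist_from_origin(x):           # acute distance from 0 (= position of a)
        return x if x <= math.pi else TWO_PI - x
    lo, hi = min(rel), max(rel)
    chosen = lo if dist_from_origin(lo) < dist_from_origin(hi) else hi
    return (a + chosen) % TWO_PI       # nearer extremity of the group
```

For instance, a point at angle 0 facing a group concentrated around π rad moves to whichever extremity of the group is closer.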
The last two levels of Equation (18) choose the extremity of the ω-group closest to $\dot{a}$. For the examples shown in Figure 3, if we take the origin as the point to be moved closer to the others by the application of the morphological center operator, then it does not move
for Figures 3(b) and (d), and moves to the position of the circled points in Figures 3(a) and (c).

2. Erosion and Dilation

The notion of an ω-group (Equation (16)) suggests the introduction of two operators which are similar to the supremum and the infimum. Consider a finite ω-group $\dot{a}_i$, $i \in I$. For all the origins for which Equation (17) is satisfied, the point at position $a_{\max} = \vee\{a_i, i \in I\}$, even if the numerical value of $a_{\max}$ depends on the position of the origin, always corresponds to the same point of the group. The same result applies to the infimum $\wedge\{a_i, i \in I\}$. These two extremities therefore have a significance partially independent of the choice of the origin on the unit circle. The $\vee$ operation leads to the introduction of a "pseudodilation" operator. Consider a function $a : E \to C$, and let $B$ be a structuring element. The pseudodilation $\tilde{\delta} : C \to C$ is defined as follows:

$$\tilde{\delta}a(x) = \begin{cases} \vee\{a(y),\ y \in B_x\} & \text{if } \{a(y),\ y \in B_x\} \text{ forms an } \omega\text{-group} \\ a(x) & \text{otherwise} \end{cases} \qquad (19)$$

The operator $\tilde{\delta}$ is not a true dilation, as one cannot find an underlying order relation. Nevertheless, for every symmetric $B$, we can define, by duality, a "pseudoerosion"

$$\tilde{\varepsilon}a(x) = \begin{cases} \wedge\{a(y),\ y \in B_x\} & \text{if } \{a(y),\ y \in B_x\} \text{ forms an } \omega\text{-group} \\ a(x) & \text{otherwise} \end{cases} \qquad (20)$$
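A sketch of the pseudodilation of Equation (19) and its dual pseudoerosion of Equation (20), again with a square SE clipped at the border; the helper applies the two-origin test of Proposition 2 to each window (all names ours):

```python
import math

TWO_PI = 2.0 * math.pi

def _grouped(vals):
    """Test relation (17) for an origin at 0 and at pi (Proposition 2).
    Return (angles re-measured from the working origin, shift), or None
    if the values do not form an omega-group."""
    for shift in (0.0, math.pi):
        rel = [(v + shift) % TWO_PI for v in vals]
        if max(rel) - min(rel) <= math.pi:
            return rel, shift
    return None

def _pseudo(img, k, pick):
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            win = [img[j][i]
                   for j in range(max(0, y - k), min(h, y + k + 1))
                   for i in range(max(0, x - k), min(w, x + k + 1))]
            g = _grouped(win)
            if g is None:
                out[y][x] = img[y][x]                   # no omega-group: unchanged
            else:
                rel, shift = g
                out[y][x] = (pick(rel) - shift) % TWO_PI  # group extremity
    return out

def pseudo_dilate(img, k=1):
    """Pseudodilation, Equation (19): supremum of the omega-group."""
    return _pseudo(img, k, max)

def pseudo_erode(img, k=1):
    """Pseudoerosion, Equation (20): infimum of the omega-group."""
    return _pseudo(img, k, min)
```

On values straddling the 0/2π discontinuity, the extremity is picked correctly (no collapse toward 0), while windows whose values are too dispersed are left untouched, as in the wine-glass region of Figure 4(c).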
It follows that all classic extensive mathematical morphology operators, such as openings, closings, reconstructions, and levelings, have a "pseudo" version. Figure 4 shows a comparison between a pseudoerosion and a classic erosion. Figure 4(a) is the hue band of a subregion of the color image in Figure A.1(a). A classic erosion is shown in Figure 4(b), and a pseudoerosion in Figure 4(c). The region in which the differences are most visible corresponds to the red fruit at the left. The hue values for red are found on the two sides of the angular discontinuity at 0°/360°. The classic erosion reduces them to the smallest values larger than zero. The pseudoerosion, on the other hand, replaces the pixels with the infimum of the group of angular values around zero. It is nevertheless important to examine other regions, such as the base of the wine glass, where the pseudoerosion operator has no effect, due to the pixels in these regions not forming ω-groups.
FIGURE 4. (a) Hue of a 231 × 134 pixel subregion of the image in Figure A.1(a). (b) Classic erosion of image (a). (c) Pseudoerosion of image (a). Both erosions are done with a square SE of size 2.
By introducing these pseudooperators to avoid the necessity of choosing an origin, we unfortunately lose some of the useful properties of the classic morphological operators. For example, the pseudoopening and pseudoclosing operators are not idempotent (although, in general, they become idempotent after a few iterations). This lack of idempotence is due to the operator not acting on each pixel of the image in the same way, as it leaves some of them in their original state. The decision to change or to leave a pixel depends on the values in the structuring element, which can change with each application of the operator.
F. Circular Centered Morphology

It is clear that even though the order of angular values depends on the choice of the origin $a_0$, the order of the differences between angular values is independent of the position of the origin. It is possible to reformulate the mathematical morphology operators which act only on increments of values so that they can be applied to circular data without requiring any initial choice. In this section, the morphological gradient and top-hat operators are adapted to circular data.

1. Gradient

We define here the morphological gradient operating on circular increments (Equation (4)), which is therefore applicable to images containing circular data. Let $f : E \to \mathbb{R}$ be a differentiable numerical function and $B$ a structuring element. Beucher introduced three morphological gradients, described in Serra (1982): the gradient by erosion

$$f - \varepsilon_B(f), \qquad (21)$$
the gradient by dilation

$$\delta_B(f) - f, \qquad (22)$$

and the symmetric gradient

$$\delta_B(f) - \varepsilon_B(f) \qquad (23)$$
in which $\varepsilon_B(f)$ denotes the erosion of $f$ by $B$, and $\delta_B(f)$ the corresponding dilation. In the Euclidean space, if we use a small sphere $S(x, r)$ centered on $x$ with radius $r$ as the structuring element $B$, the symmetric gradient can also be written as

$$g(x) = \lim_{r \to 0} \{\delta_{S(x,r)}(f) - \varepsilon_{S(x,r)}(f)\}/2r \qquad (24)$$

$$= \lim_{r \to 0} \{\vee[f(x) - f(y),\ y \in S(x, r)] - \wedge[f(x) - f(y),\ y \in S(x, r)]\}/2r \qquad (25)$$
In a discrete space $\mathbb{Z}^d$, this symmetric gradient in terms of erosions and dilations is written as

$$g(x) = \vee[f(y),\ y \in B(x)] - \wedge[f(y),\ y \in B(x)]. \qquad (26)$$

By using the following relation

$$f(x) - \wedge[f(y),\ y \in B(x)] = \vee[f(x) - f(y),\ y \in B(x)] \qquad (27)$$

and the relation obtained by inverting the supremum and infimum operators, one can write Equation (26) in a form analogous to that of Equation (25), which contains only increments:

$$g(x) = \vee[f(x) - f(y),\ y \in B(x)] - \wedge[f(x) - f(y),\ y \in B(x)]. \qquad (28)$$

For the gradients by erosion and by dilation, Equation (27) and its inversion give their forms. The gradient by erosion is

$$g_e(x) = \vee[f(x) - f(y),\ y \in B(x)] \qquad (29)$$

and the gradient by dilation is

$$g_d(x) = -\wedge[f(x) - f(y),\ y \in B(x)] = \vee[f(y) - f(x),\ y \in B(x)]. \qquad (30)$$
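For a numerical image, the discrete symmetric gradient of Equation (26) is simply the local maximum minus the local minimum over the structuring element, as in this sketch (names ours, SE clipped at the border):

```python
def morph_gradient(img, k=1):
    """Discrete symmetric morphological gradient, Equation (26):
    dilation minus erosion, i.e. local max minus local min over a
    (2k+1) x (2k+1) square structuring element."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            win = [img[j][i]
                   for j in range(max(0, y - k), min(h, y + k + 1))
                   for i in range(max(0, x - k), min(w, x + k + 1))]
            out[y][x] = max(win) - min(win)
    return out
```

Applied naively to a hue image, this operator produces the spurious responses at the 0/2π discontinuity discussed below, which is precisely what the circular reformulation avoids.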
To move from numerical functions $f(x)$ to angular functions $a(x)$, it is sufficient to replace the increments $[f(x) - f(y)]$ in Equations (28)–(30) by the angular difference given by Equation (4). For the case where the
structuring element origin is inside the structuring element, the three equations reduce to a unique equation ga ðxÞ ¼ _½aðxÞ aðyÞ, y 2 BðxÞ :
ð31Þ
This is because ½ f ðxÞ f ð yÞ 2 ð1, 1Þ, but ½aðxÞ aðyÞ 2 ½0, 2pÞ, and therefore ^½aðxÞ að yÞ, y 2 BðxÞ in Equation (28) is always equal to zero if the origin is part of the structuring element. Equations (29) and (30) become identical because aðxÞ aðyÞ ¼ aðyÞ aðxÞ. For the case where the structuring element origin is not part of the structuring element, Equation (28) obviously becomes ga ðxÞ ¼ _½aðxÞ að yÞ, y 2 BðxÞ ^½aðxÞ að yÞ, y 2 BðxÞ :
ð32Þ
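A minimal sketch of the circular centered gradient of Equation (31), assuming the angular difference of Equation (4) is the usual acute angular distance on the circle; the function and parameter names are ours.

```python
import numpy as np

def ang_diff(a, b):
    """Angular difference min(|a-b|, 2*pi - |a-b|); assumed form of Eq. (4)."""
    d = np.abs(a - b) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

def circular_centered_gradient(a, size=3):
    """Circular centered gradient of Eq. (31): at each pixel, the supremum
    over a square neighbourhood B(x) of side `size` of the angular
    differences between the centre value and its neighbours."""
    pad = size // 2
    ap = np.pad(a, pad, mode='edge')
    h, w = a.shape
    g = np.zeros_like(a, dtype=float)
    for dy in range(-pad, pad + 1):
        for dx in range(-pad, pad + 1):
            shifted = ap[pad + dy:pad + dy + h, pad + dx:pad + dx + w]
            g = np.maximum(g, ang_diff(a, shifted))
    return g
```

On a hue image whose values straddle the 0/2π origin, this gradient stays small across the discontinuity, unlike the classic gradient.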
We demonstrate the action of this operator on Figure 5(a), which is the hue band of the color image in Figure A.1(b). This image was chosen because it is mainly red and purple, which puts the majority of its pixels on the two sides of the origin of the (circular) histogram. A discontinuity is therefore visible in the hue, with red pixels appearing at both extremities of the straightened-out hue histogram (Figure 5(b)). A classic morphological gradient on the hue (Figure 5(c)) results in a large number of high-valued pixels which do not correspond to highly visible color differences in the initial image. This phenomenon is particularly visible in the outer part of the halo, which has a smooth appearance in the initial image, but which produces strong gradients in Figure 5(c). The circular centered gradient (Equation (31)), shown in Figure 5(d), solves this problem. Note that, for this example, if we add π to each of the hue values, the classic gradient becomes identical to the circular centered gradient. The circular centered gradient, however, remains invariant to rotations of the pixel values. An alternative circular gradient based on measures of angular data dispersion is presented by Nikolaidis and Pitas (1998).

2. Top-Hat

The top-hat operator, developed by F. Meyer and described by Serra (1982), is the residue between a numerical function and its transformation by an opening. It therefore acts only on increments, and hence can be transposed to circular-valued functions. We describe below the algorithm developed for the case of openings by adjunction (i.e., products by composition of an erosion and its adjunct dilation). We first recall the relation which gives the value $\gamma_B(x)$ of the opening by a structuring element B
FIGURE 5. (a) Hue component of Figure A.1(b). (b) Histogram of the hue component. (c) Classic morphological gradient (Equation (23)) of the hue. (d) Circular centered gradient (Equation (31)) of the hue. The gradients were calculated using a square SE of size 1.
at the point x. If we denote by $\{B_i,\ i \in I\}$ the family of structuring elements which contain the point x, the relation is

$$\gamma_B(x) = \sup \{ \inf [f(y),\ y \in B_i],\ i \in I \}. \qquad (33)$$
We now consider the top-hat expression $f(x) - \gamma_B(x)$, which we rewrite in terms of increments of f

$$f(x) - \gamma_B(x) = f(x) - \sup \{ \inf [f(y),\ y \in B_i],\ i \in I \} = -\sup \{ \inf [f(y),\ y \in B_i] - f(x),\ i \in I \} = -\sup \{ \inf [f(y) - f(x),\ y \in B_i],\ i \in I \}.$$

As for the gradient, we replace $[f(y) - f(x)]$ by the angular difference $[a(x) \mathbin{\dot-} a(y)]$. Nevertheless, it is necessary to take into account the fact that we are replacing the expression $[f(y) - f(x),\ y \in B_i] \in (-\infty, \infty)$ by the expression
FIGURE 6. (a) Hue component of a 311 × 227 pixel subregion of Figure A.1(a). (b) Classic top-hat by a square SE of size 1 applied to image (a). (c) Circular centered top-hat by a square SE of size 1 applied to image (a). (d) Histogram of image (b). (e) Histogram of image (c).
$[a(x) \mathbin{\dot-} a(y),\ y \in B_i] \in [0, 2\pi)$, and in consequence, if the structuring element origin forms part of the structuring element, the expression $\inf [a(x) \mathbin{\dot-} a(y),\ y \in B_i]$ is always equal to zero. To avoid the result of this top-hat operator always being zero, it is necessary to use the dual form, which is equivalent, but in which the inner operations are suprema

$$\mathrm{ATH}[a(x)] = \inf \{ \sup [a(x) \mathbin{\dot-} a(y),\ y \in B_i],\ i \in I \}. \qquad (34)$$
An example of the use of this top-hat is shown in Figure 6. Figure 6(a) is the hue band of a subregion of the color image in Figure A.1(a). In the color image, the red regions, i.e., those that are found on the discontinuity of the hue values, have been manually outlined. These hue value discontinuities are visible in Figure 6(a). The result of a classic top-hat operator applied to
Figure 6(a) is shown in Figure 6(b), with its histogram in Figure 6(d). It is clear that even though the colors do not change significantly in the regions indicated, many large-valued pixels appear in the top-hat result, and are also visible in the histogram. The result of the circular centered top-hat is shown in Figure 6(c), with its histogram in Figure 6(e). In this image, the false high values are no longer present.
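Equation (34) can be evaluated by brute force over all translates of the structuring element that contain each pixel. The following sketch assumes a flat square SE and takes the angular difference of Equation (4) to be the acute angular distance; names are ours.

```python
import numpy as np

def ang_diff(a, b):
    """Angular difference min(|a-b|, 2*pi - |a-b|); assumed form of Eq. (4)."""
    d = np.abs(a - b) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

def circular_centered_tophat(a, size=3):
    """Brute-force sketch of the circular centered top-hat, Eq. (34):
    for each pixel x, the infimum over every translate B_i of a square SE
    containing x of the supremum of ang_diff(a(x), a(y)), y in B_i."""
    h, w = a.shape
    pad = size - 1                      # a translate containing x reaches this far
    ap = np.pad(a, pad, mode='edge')
    out = np.full(a.shape, np.inf)
    for oy in range(size):              # position of x inside the translate
        for ox in range(size):
            sup = np.zeros(a.shape)
            for dy in range(size):
                for dx in range(size):
                    sy, sx = dy - oy, dx - ox
                    nb = ap[pad + sy:pad + sy + h, pad + sx:pad + sx + w]
                    sup = np.maximum(sup, ang_diff(a, nb))
            out = np.minimum(out, sup)  # infimum over the translates
    return out
```

A one-pixel angular detail on a flat background gets its full angular deviation as top-hat value, while the surrounding pixels get zero.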
G. Partitions

Up to now, we have tried to construct a lattice on the unit circle C by changing the position of the origin, possibly variable at each point x ∈ E. In this section, we consider the more classic lattice of sets or partitions, in which the directions are used only to label the partition elements. Starting from the concept of connected partitions, we develop opening and closing operators which act on these partitions. Next, the opening is further developed to produce a version which is rotationally invariant, which leads to an alternative definition of the top-hat.

1. Connected Partitions

A connected partition of a space is defined as follows:

Definition 4 A partition of the space E for which each element is connected is a mapping $D : E \to \mathcal{P}(E)$, with a connectivity $\mathcal{C}$ defined on $\mathcal{P}(E)$, such that for all points x and y of E:

(1) $x \in D(x)$;
(2) $x \neq y \Rightarrow D(x) = D(y)$ or $D(x) \cap D(y) = \emptyset$;
(3) $D(x) \in \mathcal{C}$.

The first two axioms require that each x ∈ E forms part of an element of the partition, and that the partition elements do not overlap. These two axioms define partitions in general. The third, more specific to our needs, imposes a connectivity on the partition elements. We call a partition of E for which the partition elements are connected a connected partition. A proof that a family of connected partitions forms a complete lattice is given in Appendix A.

2. Indexed Partitions

We move from a connected partition to an indexed partition by associating an index (e.g., linked to the hue or direction) with each element of the partition.
Definition 5 An indexed partition of a space E, indexed by a finite number N, is a mapping $D : E \to \mathcal{P}(E)$ together with a function $M : \mathcal{P}(E) \to \{1, 2, \ldots, N\}$ which associates an index with each element D(x) of the connected partition. To simplify the notation, we define

$$D(x, i) = \begin{cases} D(x) & \text{if } M[D(x)] = i \\ \emptyset & \text{otherwise} \end{cases} \qquad (35)$$
The N sets associated with the gamut of indices (hue, direction, etc.) are called phases, and the phase $A_i$ is the union of the partition elements associated with the index i

$$A_i = \bigcup \{ D(x, i),\ x \in E \}. \qquad (36)$$
As each point x ∈ E must be associated with an index, there are only N − 1 independent index values: if we know the positions of N − 1 phases, the position of the Nth phase is necessarily known. Appendix B deals with lattices of indexed partitions and the behavior of increasing operators on these partitions. We now consider in more detail opening and closing operators acting on indexed partitions.

3. Cyclic Operators

Indexed partitions constructed on the unit circle are called cyclic partitions. An operator acting on such a partition is called cyclic when it acts on the phases associated with all the indices of the partition. When a cyclic closing is applied to a cyclic partition, it is clear that this operation, being extensive, leads to interactions between the different phases. In order to be able to take these interactions into account, the closing is applied to the phases in series. Conversely, the opening, because of its anti-extensivity, can be applied to the phases in a parallel fashion.

4. Series Closings

We equip the space E with a proper connection, i.e., a connection for which every grain of X is adjacent to at least one pore, and each pore to at least one grain ($X \subseteq E$, except for $X = \emptyset$ and $X = E$). Let $\varphi$ be a connected closing on $\mathcal{P}(E)$. We introduce the following operation, defined phase by phase:

$$\psi_k(A_l) = \begin{cases} \gamma_x \varphi_B(A_k) & \text{if } k = l \\ A_l \setminus [\gamma_x \varphi_B(A_k)] & \text{if } k \neq l \end{cases} \qquad \forall\, l = 1, 2, \ldots, N \qquad (37)$$
in which $A_i$ is given by Equation (36), B is a structuring element, and $\gamma_x$ indicates the point connected opening. The first line of the equation applies a connected closing to the phase $A_k$, and the second line removes the region assigned to the phase $A_k$ from the other phases, thereby ensuring that the properties of a partition are not lost. This closing of one phase is obviously not cyclic. Consider now the product by composition

$$\psi = \psi_N \cdots \psi_2 \psi_1 \qquad (38)$$

applied to the N phases. The operator $\psi$ has the following effect on the partition: $\psi_1$ closes certain pores of the phase $A_1$ according to an increasing criterion, which means that if a certain pore is not closed, then no pore larger than it will be closed. The operations $\psi_2, \psi_3, \ldots, \psi_N$ then transform certain grains of $\gamma_x \varphi(A_1)$ into pores without ever adding more grains. Because the connection on E is proper, each grain of $\gamma_x \varphi(A_1)$ subsequently transformed into a pore can only increase the size of the pores adjacent to $\gamma_x \varphi(A_1)$. Consequently, $\psi_1 \psi = \psi$, and by iteration, $\psi \psi = \psi$. In other words, $\psi$ is idempotent.

The practical effect of each operator $\psi_i$ is to assign the index i to connected components of the partition which are smaller than the structuring element and which are entirely surrounded by the phase i. The result of the operator $\psi$ is not independent of the order of application of the closing operators $\psi_i$, as is demonstrated schematically in Figure 7.

We now give an example of a cyclic closing which simplifies the hue of the color image shown in Figure A.2(a). The hue is first partitioned by using a simple algorithm for constructing the limits of the partition elements. This algorithm constructs indices starting from 0°, requiring that each phase have either a maximum number of pixels (here equal to one-sixth of the total number of pixels in the image), or a maximum width of 45°. The 10 phases generated by this algorithm for the hue of the example image are listed in Table 1. Figure 8(a) shows the hue image containing the labeled phases. A cyclic closing (Equation (38)) by a square SE of size 10, with the phases processed in order of increasing index i, is applied to the indexed partition to produce the closed indexed partition shown in Figure 8(b). To reconstruct a color image, each phase is replaced by its mean hue (Table 1), and this image is recombined with the initial saturation and brightness images to create the image of Figure A.2(b).
In this image, some effects of the closing on the hue, while not striking, are visible. For example, the white elements of the mosaic which are surrounded by red have taken on a light red color in the output image.
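The phase-by-phase mechanics of Equations (37) and (38) can be sketched as follows. As an assumption for illustration, scipy's plain binary closing stands in for the connected closing of the text, which loses the connected-operator properties but shows the series behavior in which each closed phase steals pixels from the others.

```python
import numpy as np
from scipy import ndimage

def cyclic_closing(labels, structure, order=None):
    """Sketch of the series cyclic closing, Eqs. (37)-(38).  Each phase k is
    closed in turn; the pixels gained by phase k are removed from the other
    phases, so the result remains a partition.  A plain binary closing
    stands in for the connected closing of the text."""
    out = labels.copy()
    ids = list(order) if order is not None else list(np.unique(labels))
    for k in ids:
        closed = ndimage.binary_closing(out == k, structure=structure)
        out[closed] = k   # psi_k: assign phase k, removing it from the others
    return out
```

Passing a different `order` changes the result, which is exactly the order dependence demonstrated in Figure 7.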
FIGURE 7. Schematic example of a cyclic closing using the indicated structuring element, which demonstrates that the result depends on the order of the component operators.

TABLE 1. Upper and lower limits of the elements of the indexed partition of the hue used in Figure 8, and the mean hue calculated in each phase.

Phase   Lower limit   Upper limit   Mean hue
  1         0°            35°         22.3°
  2        35°            48°         40.5°
  3        48°            93°         60.1°
  4        93°           138°        115.1°
  5       138°           183°        163.2°
  6       183°           211°        199.7°
  7       211°           241°        218.2°
  8       241°           286°        256.1°
  9       286°           331°        311.7°
 10       331°           360°        353.3°
In summary, the aim of this approach is to replace the processing of spatial data, for which the choice of an origin is not obvious, by the processing of indexed sets. The operator is, however, not rotationally invariant, as the result depends on the order in which the indices are processed. We presented an example of the simplification of the hue of a color image; another good example of the use of this operator, on images of thin polarized sections of silicates, is given by Mlynarczuk et al. (1998).

5. Parallel Openings

The result of a series closing is to relabel some connected components using already existing indices. An opening, on the other hand, completely removes
FIGURE 8. Example of a cyclic closing. (a) Indexed partition of the hue band of Figure A.2 containing 10 phases. (b) Indexed hue partition after the cyclic closing by a square SE of size 10. The corresponding color image is shown in Figure A.2(b).
some of the connected components. In order to remain within the framework of partitions, we solve this problem by beginning with a partition labeled by N − 1 indices, and we label the components which are removed by the opening with the index N. It is clear that there is no interaction between the different components of the partition when applying an opening, in contrast to the cyclic closing, which changes the shape of some of the components. This operator is therefore simply a reindexation of the components of a partition. We assign the index N to the components which are eliminated by a connected opening. Those which are not eliminated keep their initial indices. This operation is cyclic as it acts on all the indices. This reindexation can be symbolically written as

$$M[D(x)] := \begin{cases} M[D(x)] & \text{if } \gamma_B[D(x)] \neq \emptyset \\ N & \text{otherwise} \end{cases} \qquad \forall\, x \in E \qquad (39)$$
in which $\gamma_B$ is a connected opening, and the symbol := indicates that the value on the right is assigned to the one on the left. The phase with label N plays the role of the residue of the opening. We take advantage of the fact that there is no interaction between the phases during the application of a cyclic opening in order to reformulate this opening so that it is applicable to labeled angular images, which permits the development of a simple top-hat operator. In a labeled image, it is not necessary to have a label at every point of the image; the residue is therefore represented more conveniently by the absence of a label, as in the case of sets. With the formulation in terms of labels, we are also no longer limited to using connected openings, as it is no longer necessary to preserve the properties of a partition.
We label an angular image by choosing label boundary points $q_i$, $i = 1, 2, \ldots, N$ with $q_1 = q_N$ on the circle. The label i (equivalent to the phase i in the partition context) is given by

$$A_i = \{ x : x \in E,\ a(x) \in [q_i, q_{i+1}) \}. \qquad (40)$$
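The labeling of Equation (40) is a simple binning of the angular values; a minimal sketch (the function name is ours):

```python
import numpy as np

def angular_labels(a, q):
    """Label an angular image following Eq. (40): the label i is given to
    the pixels whose angle lies in [q_i, q_{i+1}).  `q` holds the boundary
    points in increasing order."""
    labels = np.zeros(a.shape, dtype=int)
    for i in range(len(q) - 1):
        labels[(a >= q[i]) & (a < q[i + 1])] = i + 1
    return labels
```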
The set of labels which are not removed by an opening with structuring element B is

$$\gamma^c_B A = \bigcup_{i=1}^{N-1} \gamma_B(A_i). \qquad (41)$$
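A sketch of the parallel labeled opening in the spirit of Equations (39) and (41): each phase is opened independently and the union is kept, the pixels removed by every opening being left with the residue label. As an illustrative simplification, a plain binary opening stands in for the connected opening $\gamma_B$.

```python
import numpy as np
from scipy import ndimage

def labeled_opening(labels, structure, residue=0):
    """Parallel cyclic opening: open each phase independently (they do not
    interact) and keep the union, Eq. (41); everything removed falls into
    the residue label."""
    out = np.full(labels.shape, residue, dtype=labels.dtype)
    for i in np.unique(labels):
        if i == residue:
            continue
        kept = ndimage.binary_opening(labels == i, structure=structure)
        out[kept] = i
    return out
```

Because the phases do not interact, the loop iterations are independent and could run in parallel.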
The result of this opening is independent of the order of the elementary openings from which it is constructed, as demonstrated schematically in Figure 9. Furthermore, the union in Equation (41) does not necessarily require a finite set of N nonoverlapping labels, which allows the easy development of a rotationally invariant labeled opening.

6. Rotationally Invariant Cyclic Opening

An application of a cyclic opening, as defined in the preceding section, to a labeled image produces an image in which some connected components have lost their labels, but in which none of the components has been assigned a different label. Consequently, it is not necessary to carefully follow the evolution of the connected component labels; it is sufficient to look at the
FIGURE 9. Schematic example of a cyclic opening using the indicated structuring element, demonstrating that the result is independent of the order of the elementary openings from which it is built.
intersection or union of the results of openings acting on labels which overlap. We can therefore transform the images by extracting sets of pixels which satisfy certain criteria, applying an opening to each set, and combining the results in an isotropic way. Let $A(\alpha, \omega)$ be the set of points x ∈ E for which the angular value a(x) lies in the range $[\alpha, \alpha + \omega)$

$$A(\alpha, \omega) = \{ x : x \in E,\ a(x) \in [\alpha, \alpha + \omega) \}.$$

The opening $\gamma_B A(\alpha, \omega)$ behaves like a binary opening, with $A(\alpha, \omega)$ in the foreground and its complement $\overline{A(\alpha, \omega)}$ in the background. To make this operator isotropic, we take the union of the transformed sets $\gamma_B[A(\alpha, \omega)]$ as $\alpha$ moves around the unit circle, i.e.,

$$\Gamma(B, \omega) = \bigcup \{ \gamma_B[A(\alpha, \omega)],\ 0 \le \alpha < 2\pi \}. \qquad (42)$$
One obtains a binary image whose foreground pixels are those which were not removed by the opening for at least one value of the angle $\alpha$. We therefore consider that all the pixels which were removed by the opening for every $\alpha$ correspond to the residue of the operator. As for the top-hat, this residue, denoted by $R(B, \omega)$, can be obtained by the set difference between the union of all the labels $\bigcup \{ A(\alpha, \omega),\ 0 \le \alpha < 2\pi \}$ and the result of the opening $\Gamma(B, \omega)$, which we write

$$R(B, \omega) = \left[ \bigcup \{ A(\alpha, \omega),\ 0 \le \alpha < 2\pi \} \right] \setminus \Gamma(B, \omega). \qquad (43)$$

Given that the union of all the labels covers the entire image, the residue can equivalently be obtained by complementing the result of the opening

$$R(B, \omega) = \overline{\Gamma(B, \omega)}. \qquad (44)$$
This residue contains all the pixels which were eliminated by the opening for every angle $\alpha$. In practice, in order to speed up the calculation, it is often necessary to approximate the variation of the angle $\alpha$ in Equation (42) by a few discrete values, for example by varying $\alpha$ starting from an origin $\alpha_0$ in steps of size $\Delta\alpha$. The use of such an approximation may add some supplementary regions to the residue. This is demonstrated by the pathological situation represented in Figure 10, in which image (a) contains two angular values represented by two graylevels. For the labeled openings applied to this image, we use the structuring element shown, and a value of
FIGURE 10. (a) Schematic image showing two angular values represented by two graylevels. (b–d) Label A(α, ω) shown in gray for the values of α given below each image, and ω = 30°. The structuring element used for the labeled opening is shown at the top.
ω = 30°. Figure 10(b) shows in dark gray the label A(α, ω) for 59° ≤ α < 76°, Figure 10(c) for 76° ≤ α < 89°, and Figure 10(d) for 89° ≤ α < 106°. It is clear that an opening of the labeled region by the structuring element shown leaves Figures 10(b) and (c) in their initial state, and removes the labeled region in Figure 10(d). If we use all the values of α ∈ [0°, 360°), then the central region will not form part of the residue, as it is not eliminated for all values of α. If we instead choose an approximation with α₀ = 0° and Δα = 15°, α only takes on the values 45°, 60°, 75°, 90°, and 105° in the interval of interest, which avoids the configuration of Figure 10(c). The central region therefore forms part of the residue with this approximation.

We now present an example which illustrates the steps in a cyclic opening applied to the hue component (Figure 11(b)) of the subregion of the color image of Figure A.2(a) shown in Figure A.1(c). We apply a labeled connected opening (i.e., an opening with reconstruction) with ω = 90° and a square SE of size λ = 7. The opening (Equation (42)) is done by varying α from α₀ = 0° to 315° in steps of size Δα = 45°. The definition of the labels is shown in Figures 12(a) and (b), and the labeled images are shown in Figures 12(c) and (d) (two labeled images are shown because the labels overlap). The results of the openings applied to each label in Figures 12(c) and (d) are shown in Figures 12(e) and (f), respectively, with the residue indicated in white. This residue corresponds to the labeled regions which were completely removed by the opening. The final residue (Equation (44)) is shown in Figure 11(c). When looking at this result, it is clear that the residue is made up of two types of regions:

(1) Those which have a label different to that of the neighboring regions, and which are smaller than the structuring element.
FIGURE 11. The (a) luminance and (b) hue of Figure A.1(c). (c) The residue of a cyclic opening on the hue with ω = 90° and λ = 7.
(2) Those in which the pixel values have a variation larger than the size ω of a label. In the case of hue images, regions having a low saturation often fall into this category.

These observations can be described in a more rigorous way. When the angle ω varies from 0 to π, it is clear that the opening is an increasing function of ω. In addition, this opening is also a decreasing function of the structuring element size parameter λ, from which we get the following proposition.

Proposition 6 Let $a : E \to C$ be an angular valued function, $\gamma_\lambda$ a granulometry, and $A(\alpha, \omega)$ the set of points having angular values which satisfy the restriction

$$A(\alpha, \omega) = \{ x : x \in E,\ a(x) \in [\alpha, \alpha + \omega) \}.$$

Then the operator

$$\Gamma(\lambda, \omega) = \bigcup \{ \gamma_\lambda [A(\alpha, \omega)],\ 0 \le \alpha < 2\pi \}$$

is an isotropic opening. The family $\{\Gamma(\lambda, \pi - \omega),\ 0 \le \omega \le \pi,\ \lambda > 0\}$ gives rise to a double granulometry with respect to the parameters λ and π − ω.
FIGURE 12. (a), (b) Definition of the labels with parameters ω = 90°, α₀ = 0°, and Δα = 45°. (c), (d) Labeled hue following definitions (a) and (b). (e), (f) Results of labeled openings, in which the residue is marked in white.
An example illustrating this double granulometry is given in Table 2, in which the residues of a labeled opening applied to the hue band shown in Figure 11(b) are shown. The value of ω decreases from left to right, and the size λ of the square SE increases from top to bottom. The area of the residue is therefore largest for the bottom right image. An application of this operator to the detection of defects in an oriented texture is given in Section III.B.3. In practice, the labeled opening is faster than the circular centered opening, and could be accelerated even more as it acts on the data in a parallel way (i.e., each $\gamma_\lambda[A(\alpha, \omega)]$ could be calculated by an independent processor).
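The discrete-α approximation of Equations (42)–(44) described above can be sketched as follows; as before, a plain binary opening stands in for $\gamma_B$ (an illustrative assumption), and `n_steps` plays the role of 2π/Δα.

```python
import numpy as np
from scipy import ndimage

def cyclic_opening_residue(a, structure, omega, n_steps=24):
    """Residue of the rotationally invariant cyclic opening, Eqs. (42)-(44),
    with alpha approximated by n_steps discrete values (alpha_0 = 0)."""
    kept = np.zeros(a.shape, dtype=bool)
    for k in range(n_steps):
        alpha = 2 * np.pi * k / n_steps
        # the label A(alpha, omega): pixels with a(x) in [alpha, alpha + omega)
        label = ((a - alpha) % (2 * np.pi)) < omega
        kept |= ndimage.binary_opening(label, structure=structure)
    return ~kept   # Eq. (44): the residue is the complement of the opening
```

An isolated pixel whose direction differs from its surroundings is removed by the opening for every sampled α, and so falls into the residue.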
H. Conclusion

In this section, we presented four different methods for applying mathematical morphology to images containing circular data. The principal aim was to develop operators which:

(1) take into account the fact that the values on the circle are cyclic (and hence the discontinuity in the values at the origin 0/2π);
(2) are rotationally invariant, i.e., independent of the choice of an origin.

These objectives are met to differing extents by the operators developed, as unfortunately there is no general solution satisfying the two prerequisites. In summary, all the operators introduced take the periodicity into account, but only some are rotationally invariant: the pseudooperators, the circular centered operators, and the labeled cyclic opening. Often, in order to remove the necessity of choosing an origin, and hence to build a rotationally invariant operator, other preliminary choices are necessary. For example, for the pseudooperators, we are obliged to choose a definition of grouped data, and for the labeled openings, a sector size is needed. The only operators for which both prerequisites are satisfied without imposing an alternative choice are the circular centered operators. Nevertheless, only very few morphological operators can be rewritten in this form.

In practice, certain operators show themselves to be more useful or convenient to use than others. The pseudooperator approach, for example, seems to give rise to more inconveniences (loss of basic properties) than advantages (isotropy). The cyclic closing, which requires an initial definition of a set of labels, is very difficult to apply in cases for which the label boundaries are not obvious. The fact that it is not rotationally invariant also limits its applicability.
TABLE 2. Demonstration of the double granulometry: residues (in white) of the labeled opening applied to the hue band of Figure 11(b). The columns show the residues obtained for decreasing values of ω (90°, 45°, and 20°, from left to right), and the rows show the residues obtained for increasing structuring element size λ (1, 3, 5, and 7, from top to bottom).
Without a doubt, the most useful operators are:

- The two top-hats (circular centered and labeled cyclic), which are applied to defect detection in an oriented texture in Sections III.B.3 and III.B.4.
- The circular centered gradient, used for the extraction of smoothly varying regions in a phase image in Section III.A, and for the segmentation of an oriented texture in Section III.B.2.
- The operators requiring the choice of an origin, which are the easiest to apply in vector spaces (Section V.B.2).
III. APPLICATION EXAMPLES

In this section we give examples of applications in which the angular data can often be treated separately. The first example (Section III.A) involves extracting homogeneous regions in Fourier transform phase images using the circular centered gradient operator. We then discuss some applications in oriented texture analysis (Section III.B).

A. Homogeneous Phase Extraction in HRTEM Images

Boulc'h et al. (2001) have measured the size of crystalline domains in yttria-doped nanocrystalline zirconia (yttria-tetragonal zirconia polycrystals, or Y-TZP) by image analysis of high-resolution transmission electron microscope (HRTEM) images. A geometric phase analysis method, developed by Hÿtch et al. (1998), was used to make the crystalline domains visible. As the phase image consists of angular values, the morphological operators developed in Section II are perfectly suited to its analysis. We show here how the use of the circular centered morphological gradient can simplify the automated extraction of the crystalline domains from the phase image.

For completeness, we first briefly describe the construction of the phase image. Figure 13 shows an HRTEM image of Y-TZP, which is used to illustrate the procedure. In order to compute the phase image (Hÿtch et al., 1998), the Fourier transform of the image is first calculated, and one of the peaks in the Fourier transform amplitude (Figure 14) is chosen. The Fourier transform is then multiplied by a Gaussian mask centered on the chosen peak, and the inverse Fourier transform of the masked image is calculated. After subtracting a factor corresponding to the chosen frequency, one obtains the phase image. Figure 15 shows the geometric phase image corresponding to the indicated peak in Figure 14.
FIGURE 13. An HRTEM image of Y-TZP (size 1024 × 1024 pixels). (Image courtesy of F. Boulc'h and P. Donnadieu.)
FIGURE 14. The amplitude of the Fourier transform of Figure 13. The geometric phase image corresponding to the indicated peak was calculated.
The regions of homogeneous phase in the phase image correspond to the crystalline domains. In order to easily extract the regions in which the angular values vary slowly, we apply a circular centered morphological gradient with a square SE of size 5 to the geometric phase image, resulting in the image shown in
FIGURE 15. The geometric phase image corresponding to the Fourier peak indicated in Figure 14.
FIGURE 16. The morphological circular centered gradient of the geometric phase image of Figure 15 calculated using a square SE of size 5.
Figure 16. In this image, the homogeneous (slowly varying) regions of the geometric phase image result in regions of low grayvalue (note that the rate of spatial variation of the regions to be found is selected by the size of the structuring element used). These low-grayvalue regions can easily be extracted using a threshold.
FIGURE 17. The threshold of Figure 16 showing the pixels having graylevels between 0 and 80.
In Figure 17, the regions of the circular centered gradient image (Figure 16) with graylevels between 0 and 80 are extracted (the upper threshold limit was chosen by hand, but it should be quite stable over a range of images). If the small areas included in the threshold are not of interest, they can be removed using a morphological area opening (Soille, 1999).

As a further demonstration of the unsuitability of the standard morphological gradient for circular data, we have applied such a gradient operator to the geometric phase image. The resulting gradient is shown in Figure 18. It is clear that a number of false strong gradients, due to the discontinuity in pixel values between −π and π, have been detected. They are indicated in the figure.
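The threshold-and-cleanup step used to produce Figure 17 can be sketched as follows; connected-component filtering stands in for the morphological area opening, and `min_area` is an illustrative assumption.

```python
import numpy as np
from scipy import ndimage

def extract_homogeneous(grad, t_high=80, min_area=50):
    """Threshold a gradient image at t_high (cf. the 0-80 range used for
    Figure 17) and discard components smaller than min_area pixels, a
    stand-in for the area opening mentioned in the text."""
    mask = grad <= t_high
    lab, n = ndimage.label(mask)
    if n == 0:
        return mask
    sizes = ndimage.sum(mask, lab, index=np.arange(1, n + 1))
    keep = np.concatenate(([False], sizes >= min_area))
    return keep[lab]   # True where a large-enough homogeneous region lies
```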
B. Oriented Texture An oriented texture is an anisotropic texture characterized by a dominant local orientation at each point of the texture. To describe such a texture quantitatively, the principal orientation is calculated in a group of neighborhoods superimposed on the image containing the texture. If we represent each neighborhood by one pixel having as value the dominant orientation in the neighborhood, we create an image which summarizes the
FIGURE 18. The standard morphological gradient operator applied to Figure 15. The false strong gradients due to the angular value discontinuity between −π and π are indicated.
texture in the form of a direction field. In each neighborhood, one can also calculate a measure of coherence or a level of anisotropy so as to construct a second summary image. If the centers of the neighborhoods are separated by a distance of more than one pixel, then the summary images are smaller than the initial texture image. We first describe the Rao and Schunck algorithm for calculating an orientation field summarizing an oriented texture. This orientation field is then used for segmentation in conjunction with the circular centered gradient and watershed operators, and defect detection using the circular centered top-hat and labeled top-hat operators.
1. The Rao and Schunck Algorithm

To calculate the summary images, we use an algorithm developed by Rao and Schunck (Rao, 1990; Rao and Schunck, 1991), based on an approach by Kass and Witkin (1987). It is based on the gradient of a Gaussian filter. In two dimensions, the Fourier transform of the first derivative of a Gaussian function consists of two lobes on opposite sides of the origin in frequency space. An oriented texture would have a dominant frequency component, and the response of the gradient of the Gaussian filter
can be fitted to this dominant component (Rao, 1990). The steps of the algorithm are:

(1) A Gaussian filter of standard deviation $\sigma_1$ is applied to the initial grayscale image in order to choose the scale of the interesting structures.
(2) For each pixel (k, l), the horizontal and vertical gradients $H_{kl}$ and $V_{kl}$ are calculated. This is done by a convolution with the Prewitt or Sobel kernels (Gonzalez and Woods, 1992).
(3) For each pixel (k, l), the modulus $R_{kl}$ and angle $\theta_{kl}$ (between 0° and 360°) are calculated from the gradient values.
(4) A neighborhood W of width $2\Delta h$ and height $2\Delta v$ is moved over the image in steps of $\delta h$ pixels horizontally, and $\delta v$ pixels vertically. At each position (x, y), the local orientation $\hat\theta_{xy}$ (between 0° and 180°) and the orientational coherence $\chi_{xy}$ are calculated:

$$\hat\theta_{xy} = \frac{1}{2} \arctan \frac{\sum_{(k,l) \in W} R^2_{kl} \sin 2\theta_{kl}}{\sum_{(k,l) \in W} R^2_{kl} \cos 2\theta_{kl}} \qquad (45)$$

and

$$\chi_{xy} = \frac{\sum_{(k,l) \in W} |R_{kl} \cos(\hat\theta_{xy} - \theta_{kl})|}{\sum_{(k,l) \in W} R_{kl}}. \qquad (46)$$
The dominant orientation is essentially the angular mean of the directions within the neighborhood, the relation between these two definitions being discussed after the presentation of the algorithm. The coherence is the sum of the lengths of the unit vectors with directions $\theta_{kl}$ projected onto the unit vector in the mean direction $\hat\theta_{xy}$ of the neighborhood. It gives the proportion of the directions which are close to the mean direction.

(5) We lastly build two summary images $\Theta$ and $X$, which represent, respectively, the distribution of the orientations and of the coherences. In these images, each pixel encodes the values calculated at one neighborhood position. In symbolic form,

$$\Theta_{kl} = \hat\theta_{(k\,\delta h)(l\,\delta v)} \quad \text{and} \quad X_{kl} = \chi_{(k\,\delta h)(l\,\delta v)}.$$
The important step in this algorithm is the calculation of the orientation by Equation (45). The form of this equation is similar to that of the angular mean presented in Section II.B (one also has an arc-tangent of a sum of sine terms divided by a sum of cosine terms). The mean direction given by
Equation (45) is in fact the second trigonometric moment (Fisher, 1993). This moment was chosen so that vectors in opposite directions reinforce each other,³ the behavior needed when working with axial data. The sums are weighted by the moduli of the vectors, thereby giving more importance to directions associated with large moduli.⁴ Lastly, the result of the arctan function is divided by two so as to place the resultant angle in the range of axial data.

Figure 19(b) shows the orientation summary image for the plank of wild cherry wood shown in Figure 19(a). For this image, the veins form the dominant oriented texture. We first use a threshold to separate the wood from the background, and then calculate the orientation summary image of the wood using the parameters $\sigma_1 = 1.4$, $2\Delta h = 2\Delta v = 32$, and $\delta h = \delta v = 16$. In this image, the graylevel of each pixel represents an angular value between 0° and 179°. The histogram of the orientation distribution, in other words the histogram of Figure 19(b), is shown in Figure 19(c), and a schematic diagram showing the encoding of the vein directions is given in Figure 19(d). This encoding is used for all the orientation summary images shown in this chapter. As is clear from the histogram, the majority of the veins of Figure 19(a) have orientations in the 40° to 100° range.

The values of the parameters $\sigma_1$, $2\Delta h$, $2\Delta v$, $\delta h$, and $\delta v$ are chosen as a function of the data being analyzed. The $\sigma_1$ parameter, the Gaussian filter standard deviation, chooses the scale of the texture to be processed. Higher values of $\sigma_1$ lead to the removal of small details from the image. The values of the $2\Delta h$ and $2\Delta v$ parameters, which specify the size of the neighborhood in which the mean direction is calculated, have less effect on the result. They should, however, be chosen so that there is at least one oriented structure within each neighborhood. The values of the $\delta h$ and $\delta v$ parameters specify the level of subsampling of the initial image.
To take all the pixels in the initial image into account, it is necessary that Δh ≤ σ₂h and Δv ≤ σ₂v. For practical applications, a more efficient form of Equation (45) is available. One can derive it by using the relations
$$R_{kl}^2 e^{i2\theta_{kl}} = R_{kl}^2 \cos 2\theta_{kl} + iR_{kl}^2 \sin 2\theta_{kl}$$
3 Consider a vector having a representation $Re^{i\theta}$ in polar coordinates. The square of this vector is $R^2e^{i2\theta}$. The vector facing the opposite way to $Re^{i\theta}$ is described by $Re^{i(\theta+\pi)}$ and its square is $R^2e^{i(2\theta+2\pi)} = R^2e^{i2\theta}$. Therefore, the addition of the squares of two vectors in opposite directions gives a vector having modulus $2R^2$ (the vectors reinforce each other).
4 In Section V.A we use a similar weighting in the context of color images.
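The doubling trick of footnote 3 can be checked numerically with Python's complex arithmetic; the variable names are ours:

```python
import cmath

theta, R = 0.7, 2.0
v1 = R * cmath.exp(1j * theta)                 # vector R e^{i theta}
v2 = R * cmath.exp(1j * (theta + cmath.pi))    # same modulus, opposite direction

cancel = abs(v1 + v2)            # summed directly, opposite vectors cancel (~0)
reinforce = abs(v1**2 + v2**2)   # squared (angle doubled), they reinforce: 2 R^2
axial = cmath.phase(v1**2 + v2**2) / 2   # halving recovers the axial direction
```

Here `reinforce` is 2R² = 8 and `axial` recovers the original θ = 0.7, exactly the behavior Equation (45) relies on for axial data.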
ALLAN HANBURY
FIGURE 19. Calculation of an orientation image. (a) Initial image of size 272 × 608 pixels (courtesy of Scanwood System, Pont-à-Mousson, France). (b) Orientation summary image (size 13 × 33 pixels). (c) Histogram of image (b). (d) Schematic diagram showing how the direction angles are encoded by the graylevels of image (b).
and
$$R_{kl}^2 e^{i2\theta_{kl}} = (H_{kl} + iV_{kl})^2 = H_{kl}^2 - V_{kl}^2 + 2iH_{kl}V_{kl}$$
whence
$$R_{kl}^2 \sin 2\theta_{kl} = 2H_{kl}V_{kl}$$
and
$$R_{kl}^2 \cos 2\theta_{kl} = H_{kl}^2 - V_{kl}^2$$
which are substituted into Equation (45) producing a version having fewer trigonometric functions and which directly uses the gradient values
$$\hat{\theta}_{xy} = \frac{1}{2}\arctan\left(\frac{\sum_{(k,l)\in W} 2H_{kl}V_{kl}}{\sum_{(k,l)\in W}\left(H_{kl}^2 - V_{kl}^2\right)}\right).$$
The Rao and Schunck algorithm is a simple and fast way of calculating the principal orientations of a texture. We now present some variations of the algorithm which have been presented in the literature. Bigün et al. (1991) give an alternative derivation of the same algorithm, except that the Gaussian convolution and gradient calculation are combined into a single
step. An alternative to steps 1 and 2, i.e., the choice of the scale and the calculation of the gradients, is a dyadic wavelet transformation (Mallat, 1998), which efficiently gives the horizontal and vertical gradients at several scales. Davies (1997) proposes a set of filter kernels which are well suited to determining the orientations of linear structures, and can be used as a replacement for the kernels in step 2. The main disadvantage of the Rao and Schunck algorithm is its inability to take into account cases in which there is more than one principal orientation in a neighborhood. Andersson and Knutsson (1991) present an approach capable of separating two directions, and Chetverikov (1999) presents a method which can detect several orientations by using a measure of anisotropy. Freeman and Adelson (1991) introduce the notion of steerable filters, a generalization of filter banks allowing the calculation of an orientation and of a coherence at each point of an image, as well as the possibility to deal with multiple principal orientations in a single neighborhood. Picard and Gorkani (1994) give the results of an experiment which compares the principal orientations found in the Brodatz textures by the Freeman and Adelson algorithm to those perceived by humans.

2. Segmentation

Morphological segmentation of a grayscale image is usually done by applying the watershed algorithm to the gradient of the image. The circular centered gradient operator allows one to segment an image containing circular data in the same way. We present an example of the segmentation of an oriented texture. The aim of this type of segmentation is to create regions in which the orientations are homogeneous. The steps in the segmentation algorithm, which are illustrated in Figure 20, are:

(1) The Rao and Schunck algorithm is applied to the initial image (a plank of oak, Figure 20(a)) to calculate the orientation image (Figure 20(b)). For the example, the parameters σ₁ = 1.4, σ₂h = σ₂v = 64, and Δh = Δv = 8 were used.
(2) The circular centered gradient of the orientation image is calculated (Figure 20(c)). For the example, a square SE of size 2 was used.
(3) The minima are extracted. So as to avoid finding a large number of small minima, which would result in an over-segmentation of the image, we close the gradient image with a square SE of size 1, and then find the h-minima (Soille, 1999) of height h = 5 (Figure 20(d)).
(4) The watershed is applied to the gradient image using the minima extracted in the previous step as markers, producing the segmentation shown in Figure 20(e). In this image, the watershed lines are
FIGURE 20. Steps in the segmentation of an oriented texture. (a) Initial image with size 420 × 1040 pixels (courtesy of Scanwood System, Pont-à-Mousson, France). (b) Orientation image with size 50 × 125 pixels. (c) Morphological circular centered gradient (with a square SE of size 2). (d) h-minima. (e) Watershed segmentation of the circular centered gradient image (c) using the markers in (d). The graylevel in each region encodes the mean orientation of the region.
FIGURE 21. Results of segmentations of oriented textures by the watershed algorithm for four oak images. (Images courtesy of Scanwood System, Pont-à-Mousson, France.)
shown in black, and the graylevel of each region encodes the mean orientation of the region, calculated using circular statistics. For visualization purposes, the segmentation obtained is superimposed on the initial image in Figure 21(a), in which the black lines represent the watershed lines.
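A key ingredient of step 2 is a gradient that respects the wrap-around of axial data: orientations of 10° and 170° are only 20° apart, not 160°. The sketch below is a plausible stand-in for the circular centered gradient (not its exact definition from earlier in the chapter): for each pixel it returns the largest pairwise axial distance within a small window. Function names are our own.

```python
import numpy as np
from scipy import ndimage

def axial_distance(a, b, period=180.0):
    """Distance between axial (mod-180 degree) values, wrap-around aware."""
    d = np.abs(a - b) % period
    return np.minimum(d, period - d)

def circular_gradient(ori, size=3):
    """Largest pairwise axial distance inside each size x size window."""
    def local_range(values):
        # values is the flattened window; compare every pair
        d = axial_distance(values[:, None], values[None, :])
        return d.max()
    return ndimage.generic_filter(ori, local_range, size=size)
```

On a synthetic orientation image split between 10° and 170°, the gradient is 20 (the axial distance) along the boundary and zero inside each region; the watershed of step 4 would then be flooded from the h-minima of such a gradient.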
Some further results of the segmentation of oriented textures are shown in Figures 21(b)–(d). In general, this algorithm manages to segment the textures into homogeneous regions, with the more globally homogeneous textures segmented into the fewest regions, as in Figure 21(b) for example. Some problems are nevertheless present in the current formulation of the algorithm. The first is common to almost all watershed segmentations, and involves the choice of markers. If we take all the minima of the gradient as markers, an over-segmentation (segmentation into too many regions) is produced. With the current approach, a small closing followed by the extraction of the h-minima, the number of regions is reduced, but a change in the value of the parameter h can provoke a large difference in the segmentation. The segmentation of an oriented texture can also be modified by changing the scale parameter (σ₁) in the calculation of the orientation image. A last difficulty arises when the orientation variations are not localized enough to be detected by the structuring element used in the gradient calculation, which can lead to the presence of more than one orientation in one of the regions of the segmentation. Several possible solutions to these problems remain to be studied: for example, starting with an over-segmentation of the image and then fusing regions with similar mean orientations using a graph of the partition (Meyer, 1999) so as to eliminate over-segmented regions, or taking into account several partitions of the same texture so as to find the most probable one (Nickels and Hutchinson, 1997).

3. Defect Detection with the Circular Centered Top-Hat

We show the application of the circular centered top-hat operator to the detection of defects in oriented textures.
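As a toy illustration of the underlying idea (deliberately simpler than the circular centered top-hat itself): a residue image that measures, for each pixel, the axial distance of its orientation from the globally dominant orientation, computed with the doubled-angle circular mean. Pixels whose orientation is anomalous get large residues and can be isolated by a threshold. All names here are our own.

```python
import numpy as np

def orientation_residue(ori_deg):
    """Axial distance of each pixel's orientation from the dominant one.

    A simplified stand-in for the circular centered top-hat: defects
    that perturb the orientation field receive large residues.
    """
    ang = np.deg2rad(2.0 * ori_deg)    # double the angles: axial -> circular
    mean = 0.5 * np.arctan2(np.sin(ang).sum(), np.cos(ang).sum())
    mean_deg = np.degrees(mean) % 180.0
    d = np.abs(ori_deg - mean_deg) % 180.0
    return np.minimum(d, 180.0 - d)    # wrap-around aware distance
```

On an orientation field that is 30° everywhere except for a small 120° patch, the residue is zero on the background and 90° (the maximal axial distance) on the patch.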
The examples in this section were used by Chetverikov and Hanbury (2002) in studying the contribution and the limits of using the two most important perceptual properties of texture, regularity and isotropy, in detecting texture defects. Here we show some of the examples which use the orientation-based method so as to illustrate the application of the circular centered top-hat. For texture defects characterized by an orientation anomaly, the circular centered top-hat is a good choice for creating an image in which the defects can be detected by a threshold. To show the application of this operator, five images of size 256 × 256 pixels having a visible texture defect were chosen from the Brodatz (1966) album. Their reference numbers are d050, d052, d053, d077, and d079. The orientation images were calculated using the Rao and Schunck algorithm with parameters σ₁ = 1.75, Δh = Δv = 2, and σ₂h = σ₂v = 16, except for images d052 and d079, for which σ₂h = σ₂v = 32 were used. The threshold
for isolating the defect was chosen by hand for each image. The results for the five images are shown in Figure 22. In each line, the initial image, the orientation image, the result of the circular centered top-hat, and the borders of the thresholded regions superimposed on the initial image are shown. In textures d052 and d053, the defects are very subtle modifications of the structure, yet they perturb the orientations enough to be detected. The defect in texture d077 is easily seen and easily detectable in the orientation image. In textures d050 and d079, the defects cause perturbations in the orientation field, but the borders of these defects are not obvious, even to the naked eye. Among these textures, the only one made up of oriented lines is d050, but the others are anisotropic enough to have a uniform orientation field which is perturbed by the defects. This approach is obviously not applicable to textures which are not anisotropic, some examples of which are given by Chetverikov and Hanbury (2002).

4. Defect Detection with the Labeled Opening

The labeled opening and its associated top-hat (Section II.G.5) have the advantage of being extremely rapid, and are therefore attractive for high-speed industrial inspection problems. In this section we show some examples of its application to the important industrial problem of the automated detection of defects on wood boards (Kim and Koivo, 1994; Silvén and Kauppinen, 1996; Niskanen et al., 2001), part of a project done in collaboration with Scanwood System, Pont-à-Mousson, France. In most of the existing algorithms, the defects are detected using color characteristics. For example, the knots are considered to be the darkest objects on the boards. We briefly consider the possibility of enriching the color information by a preliminary detection which takes into account the fact that certain types of defects cause a perturbation in the orientation of the veins in their neighborhood.
This could allow the detection of defects which are not completely discernible by their color. For wood, the most important structural defects are the knots, some of which do not have a color very different to that of the wood, but which nevertheless cause perturbations in the surrounding vein orientations. The defects identifiable only by a change in color are obviously not detectable by these orientational methods, the same being applicable to defects due to external influences, such as insect stings. For the experiments, we used a database of oak boards with a very high defect occurrence. In order to speed up the calculation, we used a large separation between the neighborhoods in the orientation image calculation. The parameters used were σ₁ = 1.4, σ₂h = σ₂v = 64, and Δh = Δv = 16. Some
FIGURE 22. Results of defect detection by the circular centered top-hat applied to some Brodatz textures. The images labeled ‘‘ini’’ are the initial images, ‘‘ori’’ the orientation images, ‘‘ath’’ the top-hat operator results, and ‘‘det’’ the regions detected by the threshold superimposed on the initial image.
images and their corresponding orientation images are shown in the first two columns of Figure 23. We then calculated the top-hat based on the labeled opening of the orientation images. The opening was done with a sector size ω = 45°, and by varying α from α₀ = 0° to 157.5° in steps of Δα = 22.5°. A square SE of size 3 was used. The residue of this top-hat, enlarged and superimposed on the initial image, is shown in the rightmost column of Figure 23, in which the light regions correspond to the residues. We briefly discuss the results shown in Figure 23:
For image c005, the black vein is evidently not detected, as it does not perturb the orientation. The knot at the top right is detected, but the small knots at the bottom left do not perturb the orientation enough to be detected.

For image c007, the large knot is detected, but the fissures to the left of the knot, which have orientations similar to those of the veins, are not detected. Some false detections near the borders of the image are also present.

Image c034 demonstrates that this method is not very useful on boards which contain veins having elliptical forms. Their large curvature leads to many false detections.

If one calculates the orientation image at a finer resolution, then smaller defects can be detected. For oak, this is useful for detecting the small light patches, some of which are indicated in Figure 24(a). Even if these are not classified as defects, their detection can be important if one wishes to determine the aesthetic appearance of the wood. The detection of these light patches based only on their color is rather difficult, as their color is very similar to that of other structures on the wood. On looking at the orientation image, one can see that because the light patches cut the veins, they produce perturbations in the orientation field which can be detected by a top-hat. The orientation image of Figure 24(a), calculated using the parameters σ₁ = 1.4, σ₂h = σ₂v = 16, and Δh = Δv = 8, is shown in Figure 24(b). The result of a top-hat based on a labeled opening is shown in Figure 24(c). The parameters of this operator are ω = 45°, α₀ = 0°, and Δα = 22.5°, and a square SE of size 4 was used. Globally, it is clear that a perturbation in the vein orientation is not always associated with a defect, and that a defect does not always perturb the surrounding veins. This method of defect detection therefore cannot function as a total solution to the defect detection problem.
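The labeled-opening top-hat described above can be sketched as follows. This is a hedged reading of the operator, with our own names: for each sector of width ω (stepped by Δα, here taken as centered on α), the pixels whose axial orientation falls in the sector form a binary mask; each mask is opened with a square SE; the residue is the set of pixels covered by no opened sector, i.e., pixels that do not belong to any sufficiently large region of coherent orientation.

```python
import numpy as np
from scipy import ndimage

def labeled_opening_residue(ori, omega=45.0, alpha0=0.0, dalpha=22.5, se=3):
    """Residue of a union of per-sector binary openings of an orientation image."""
    covered = np.zeros(ori.shape, dtype=bool)
    structure = np.ones((se, se), dtype=bool)
    for alpha in np.arange(alpha0, 180.0, dalpha):
        d = np.abs(ori - alpha) % 180.0
        d = np.minimum(d, 180.0 - d)          # axial distance to sector center
        mask = d <= omega / 2.0               # pixels oriented within the sector
        covered |= ndimage.binary_opening(mask, structure=structure)
    return ~covered                            # the top-hat residue
```

A small 2×2 patch of anomalous orientation is too small to survive the 3×3 opening in its own sector, and is excluded from every other sector's mask, so exactly those pixels appear in the residue.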
The results can nevertheless be used in a defect classification step, which takes color and other texture variations into account along with the orientation perturbations in the calculation of the probability of a defect being present at a certain position on the board.
[Figure 23 panels: c005ini (438 × 1076), c005ori (21 × 61), c005cyc (438 × 1076); c007ini (421 × 568), c007ori (21 × 30), c007cyc (421 × 568); c034ini (432 × 686), c034ori (23 × 37), c034cyc (432 × 686)]
FIGURE 23. Results of the detection of regions having orientation perturbations using the top-hat based on the labeled opening. The ‘‘ini’’ images are the initial images of oak boards (courtesy of Scanwood System, Pont-à-Mousson, France), on which the defects found by an expert are outlined in black (the dark horizontal lines are red chalk marks on the board, and have no bearing on the experiment). The ‘‘ori’’ images are the orientation images. The light regions of the ‘‘cyc’’ images correspond to the residues of the orientation images detected by the top-hat. The size of the images, in pixels, is given below each image.
FIGURE 24. Detection of defects at a smaller scale. (a) An oak board (courtesy of Scanwood System, Pont-à-Mousson, France) with some of the small white patches manually indicated (size 608 × 955 pixels). (b) Orientation image (size 50 × 112 pixels). (c) Result of a top-hat based on a labeled opening. The light pixels indicate the residue.
C. Conclusion

In this section we consider applications in which the circular data can be processed independently. The first is the processing of the phase image resulting from a Fourier transform of an electron microscopy image. The second is the processing of oriented textures described by orientation fields. Wood veins are a classic example of an oriented texture, in which defects are often characterized by perturbations in the orientation field. Finding these orientations using the labeled opening is demonstrated. The circular centered top-hat operator can also be applied to the wood defect detection problem, but has a far slower processing time (Hanbury and Serra, 2002b). The examples taken from the Brodatz album, even if they are not all made up of linear structures, are anisotropic enough that the texture perturbations are visible in the orientation field, and detectable by a circular centered top-hat. Finally, a method of segmenting oriented textures, demonstrated on wood textures, is described. Even though we treated the angular components of the Fourier transform phase–magnitude pairs, and of the oriented texture orientation–coherence pairs, separately, this is not always possible. In the case of color images, there tends to be a particularly close relation between the hue and saturation
coordinates. Processing of color images is considered in the next two sections.

IV. 3D POLAR COORDINATE COLOR SPACES

The analysis of color images has become very common due to the widespread availability of reasonably priced color cameras. These cameras almost always capture images which are stored in the RGB format. Each pixel of an RGB color image is encoded as a vector containing three values, giving the amount of each of the red, green, and blue primaries making up each color. Color images, due to their vectorial structure, are generally more difficult to process and analyze than grayscale images. Indeed, one of the commonly adopted approaches is to convert the color data to monochrome data by first calculating the luminance or first principal component, for example. Alternatively, each color channel is processed separately, or vector-valued operators which can take the three channels into account simultaneously are used. Lastly, an alternative representation of the vector space can be used, such as one in terms of 3D polar coordinates, describing each pixel color in terms of the possibly more intuitive hue, saturation, and brightness coordinates. In this section we discuss the RGB space, definitions of color intensity measures, and the improved hue, luminance, and saturation (IHLS) space, the latter being a 3D polar coordinate color description well suited to image processing and analysis tasks.

A. Basic Definitions

The RGB color space is a three-dimensional color space constructed from a basis of three primary color stimuli, given by the vectors
$$\mathbf{R} = \begin{pmatrix}1\\0\\0\end{pmatrix}, \quad \mathbf{G} = \begin{pmatrix}0\\1\\0\end{pmatrix}, \quad \mathbf{B} = \begin{pmatrix}0\\0\\1\end{pmatrix}$$
which correspond to the colors red, green, and blue.
A color c is specified in this basis according to one of the laws of Grassman (Wyszecki and Stiles, 1982)
$$\mathbf{c} = R\mathbf{R} + G\mathbf{G} + B\mathbf{B}$$
in which R, G, B ∈ [0, 1], and the RGB cube is the cube [0, 1] × [0, 1] × [0, 1] which contains the coordinates corresponding to valid colors, where the vector corresponding to color c is c = (R, G, B). For digital devices, the
values R, G, and B are most often integers between 0 and 255, but it is easy to generalize from [0, 1] to any range of values. The primary color stimuli usually vary from device to device, making the RGB space device dependent. The images can be made device independent by transforming them to the International Commission on Illumination (CIE) XYZ space, for which calibration information on the coordinates of the primary color stimuli of the camera in the XYZ space and the lighting conditions (white point) is required. Gamma correction also plays a role in the formation of color images (Novak et al., 1992; Poynton, 1999). In general, video display devices have a nonlinear brightness response to the input voltage, of the form
$$L = \alpha V^{\gamma} \qquad (47)$$
where L is the brightness, V is the input voltage, and the values of α and γ are controlled by the brightness and contrast settings of the display. The value of γ is therefore variable, but usually around 2. To take this nonlinearity into account, many video cameras are designed with an inbuilt nonlinear light response, so that when a camera is connected directly to a display, the displayed image will be linearly related to the brightness of the scene. This means that the output voltage $V_{\text{out}}$ of a camera is usually gamma corrected in the following way
$$V_{\text{out}} = I^{1/\gamma} \qquad (48)$$
where I is the light intensity recorded by the camera. For a color camera, this gamma correction is applied to each channel. Taking the device dependence and gamma correction into account requires that the image capture devices be calibrated, and is usually only necessary if one wishes to exchange colorimetric information between observers or devices. If one is only interested in measuring a change in images, such as a variation in the dominant shade of blue, where all images were taken with the same camera under the same conditions, then it is not essential to calibrate the equipment. The last basic facts considered in this section are the definitions of the terms brightness, luminance, and lightness. These terms are often used interchangeably, but they have the following specific definitions assigned to them by the CIE (Commission Internationale de l'Éclairage, 1987; Poynton, 1999):
Brightness: A subjective attribute of visual sensation describing whether an area appears to emit more or less light. It has no units of measurement.
Luminance: Luminance, measured in the SI units of candela per square meter (cd/m²), is the luminous intensity per square meter. Luminous intensity, measured in candela, is the radiant intensity, measured in watts per steradian, weighted by the spectral response of the human eye. This measurement quantitatively describes the fact that if one looks at red, green, and blue light sources having the same radiant intensity in the visible spectrum, the green source will appear the brightest, the red one less bright, and the blue one the dimmest. In the international recommendation for the high-definition television standard (ITU-R Recommendation BT.709, 1990), the following equation for calculating luminance from the (non gamma corrected) red, green, and blue components is given:
$$Y(\mathbf{c}) = 0.2126R + 0.7152G + 0.0722B. \qquad (49)$$
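Equation (49) is a one-line weighted sum; a direct transcription (the function name is ours):

```python
def luminance709(r, g, b):
    """Rec. 709 luminance, Equation (49); inputs are linear
    (non gamma corrected) R, G, B values in [0, 1]."""
    return 0.2126 * r + 0.7152 * g + 0.0722 * b
```

The weights sum to one, so white maps to luminance 1, and the ordering of the weights reproduces the perceptual fact noted above: a pure green source appears brighter than an equally intense red one, which in turn appears brighter than blue.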
Lightness: The human eye has a nonlinear response to luminance, which is taken into account by the lightness measure. A source with a luminance of only 18% of a reference luminance will appear to be half as bright. This measure is used in the CIE L*a*b* and L*u*v* color spaces.
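The 18% figure can be checked against the standard CIE L* formula, which maps relative luminance Y/Yn through a cube root (with a linear toe near black); the function name and the two-branch form are the usual CIE definition, transcribed here as an illustration:

```python
def cie_lightness(Y, Yn=1.0):
    """CIE L* lightness from relative luminance Y/Yn, on a 0-100 scale."""
    t = Y / Yn
    if t > (6 / 29) ** 3:
        f = t ** (1 / 3)                      # cube-root branch
    else:
        f = t / (3 * (6 / 29) ** 2) + 4 / 29  # linear branch near black
    return 116 * f - 16
```

Evaluating `cie_lightness(0.18)` gives roughly 49.5, i.e., a source at 18% of the reference luminance lands near the middle of the 0-100 lightness scale, matching the "half as bright" statement above.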
B. 3D Polar Coordinate Color Representations

These spaces essentially allow RGB rectangular coordinates to be specified in terms of 3D polar (also known as cylindrical) coordinates. As they are only an alternative representation of the RGB space, they do not add any supplementary properties such as device independence to the RGB space, but they are often more intuitive to use, and allow the colors to be treated more homogeneously. The first step in the conversion from an RGB space to a 3D polar coordinate space is to place a new axis between the points (0, 0, 0) and (1, 1, 1) in the RGB space. As this axis passes through all the achromatic points (graylevels) for which R = G = B, it is called the achromatic axis. We then define a set of 3D polar coordinates with respect to this axis:

(1) Brightness L ∈ [0, 1]: This coordinate gives the position of the color on the achromatic axis.
(2) Hue H ∈ [0°, 360°): This angular coordinate specifies whether a color is red, yellow, green, magenta, etc. It is traditionally measured anticlockwise around the achromatic axis with respect to pure red.
(3) Saturation S ∈ [0, 1] or chroma C ∈ [0, 1]: Measure of the distance of a color from the achromatic axis. Pure colors (i.e., highly saturated) are found further away from the achromatic axis.

Upon examining the literature, one is faced with a bewildering array of such 3D polar coordinate color spaces, such as the HLS, HSV, HSI, and HSB spaces.5 We now discuss how a seemingly simple coordinate system transformation could have given rise to so many different conversion methods, examine the disadvantages of these commonly used spaces, and present the IHLS space, which removes many of these disadvantages.
C. Discussion of the Existing 3D Polar Coordinate Spaces

One of the reasons for the existence of such a large variety of 3D polar coordinate color spaces is the number of different definitions of brightness. These definitions lead to spaces which have shapes which are not simply constructed as a pile of planar cross-sections of the cube taken perpendicular to the achromatic axis. Further problems with the existing transforms are due to them originally being developed for the easy numerical specification of colors in computer graphics applications (Smith, 1978). Due to the associated brightness functions, the ‘‘natural’’ shape of the HSV space is a cone, and of the HLS space, a double cone (Levkowitz and Herman, 1993). A vertical slice through the achromatic axis of each of these spaces is shown in Figures A.4(a) and (c). In these images, the achromatic axis is a vertical line in the center, with a hue value of 0° to the right, and 180° to the left. The problem with using these conically shaped representations when specifying a color is that there are large regions which lie outside the cones, i.e., outside the gamut of valid colors. In order to avoid complicated verification (originally on slow 1970s computers) of the validity of a specified color, these spaces were often artificially expanded into cylinders by dividing the saturation values by their maximum possible values for the corresponding brightness. Slices of these cylindrically shaped versions of the HSV and HLS spaces are shown in Figures A.4(b) and (d), respectively. The cylindrically shaped versions have often been carried over into image processing and computer vision, for which they are ill-suited, as discussed here.6 One of the claims often made in respect of the 3D polar coordinate color spaces is that the saturation and brightness coordinates are independent.

5 Shih (1995) summarizes the transforms to and from these spaces.
6 Software already used by the author which implements cylindrically shaped color models includes Matlab release 12.1, Aphelion 3.0, Optimas 6.1, and Paint Shop Pro 7.
However, the expansion of the conically shaped spaces into cylinders introduces a brightness normalization which removes this independence. This can easily be seen by examining the standard saturation component values of the color image in Figure A.1(d). In this image, not all the pixels which appear white have RGB coordinates of exactly (1, 1, 1), and not all the black pixels of exactly (0, 0, 0). The standard HSV saturation is shown in Figure 25(c). Due to the artificial expansion of the bottom part of the HSV cone, some of the pixels which look black, but do not have coordinates of exactly (0, 0, 0), are shown as being fully saturated, implying that they have a higher saturation than most of the colors. With the HLS saturation, shown in Figure 25(d), the HLS double cone is expanded in both the low- and high-brightness regions, leading to an image which is essentially useless for image analysis. The large difference between the HSV and HLS saturation images demonstrates the dependence of the saturation on the brightness function used (the brightness functions being different for the HSV and HLS spaces). We now consider two cases of the confusion that the cylindrically shaped spaces can cause. Demarty and Beucher (1998) applied a constant saturation threshold in the cylindrical HLS space (Figure A.4(d)) to differentiate between chromatic and achromatic colors. This threshold can be represented by a vertical line on either side of the achromatic axis in Figure A.4(d), and it is clear that this does not correspond to a constant saturation. Demarty (2000) later improved the threshold by using a hyperbola in the cylindrical HSV space (Figure A.4(b)), which corresponds to a constant threshold in the conic HSV space (Figure A.4(a)).
Smith (1997) makes the assumption that the cylindrical HSV space is perceptually uniform when a Euclidean metric is used, but upon examining Figure A.4(b), one sees that a certain distance in the high-brightness (top) part of the space corresponds to a far larger perceived change in color than the same distance in the low-brightness part of the space.
D. Derivation of a Useful 3D Polar Coordinate Space

In this section we examine a derivation of a 3D polar coordinate system in the RGB space, pointing out the choices which could (and did) lead to characteristics which are disadvantageous, and ending up with a 3D polar coordinate representation of the RGB space which is useful for image processing and analysis. This derivation is based on the derivation of the generalized lightness, hue, and saturation (GLHS) model by Levkowitz and Herman (1993). As the derivation of color spaces is not the principal theme of this chapter, we present only an outline of the derivation. Full details can be found in Hanbury and Serra (2002a).
FIGURE 25. Various 3D polar color space components for the color image in Figure A.1(d): (a) luminance, (b) hue, (c) HSV cylindrical saturation, (d) HLS cylindrical saturation, (e) suggested IHLS saturation, (f) IHLS chroma, (g) arithmetic difference between images (e) and (f). The highest pixel value in image (g) is 0.127, but the contrast has been stretched to make the differences more visible.
1. Brightness

In order to conform to the terminology suggested by the CIE, we call a subjective measure of luminous intensity the brightness. The brightness function of the GLHS model is
$$L(\mathbf{c}) = w_{\min}\min(\mathbf{c}) + w_{\mathrm{mid}}\,\mathrm{mid}(\mathbf{c}) + w_{\max}\max(\mathbf{c}) \qquad (50)$$
in which the functions min(c), mid(c), and max(c) return, respectively, the minimum, median, and maximum component of a vector c in the RGB space, and $w_{\min}$, $w_{\mathrm{mid}}$, and $w_{\max}$ are weights set by the user, with the constraints $w_{\max} > 0$ and $w_{\min} + w_{\mathrm{mid}} + w_{\max} = 1$. Specific values of the weights give the brightness functions used by the common cylindrically shaped color spaces: $w_{\min} = 0$, $w_{\mathrm{mid}} = 0$, and $w_{\max} = 1$ for HSV; $w_{\min} = 1/2$, $w_{\mathrm{mid}} = 0$, and $w_{\max} = 1/2$ for HLS; and $w_{\min} = 1/3$, $w_{\mathrm{mid}} = 1/3$, and $w_{\max} = 1/3$ for HSI. In the RGB space, one can visualize surfaces of isobrightness (or isoluminance). The surfaces of isobrightness l contain all the points such that L(c) = l and intersect the achromatic axis at l. For the HSV and HLS spaces, these surfaces have a complicated shape, as described by Levkowitz and Herman (1993), and for the HSI space they are planes perpendicular to the achromatic axis. The isoluminance surfaces (Equation (49)) are planes oblique to the achromatic axis. The isobrightness and isoluminance surfaces corresponding to a single brightness or luminance function are by definition parallel to each other.

2. Hue

The hue angle is traditionally measured starting at the direction corresponding to pure red. The simplest way to derive an expression for this angle is to project the vector (1, 0, 0) corresponding to red in the RGB space and an arbitrary vector c onto a plane perpendicular to the achromatic axis, and to calculate the angle between them. This gives the expression
$$H' = \arccos\left(\frac{R - \frac{1}{2}G - \frac{1}{2}B}{\left(R^2 + G^2 + B^2 - RG - RB - GB\right)^{1/2}}\right) \qquad (51)$$
after which, in order to give a value of H ∈ [0°, 360°), we apply
$$H = \begin{cases}360^\circ - H' & \text{if } B > G\\ H' & \text{otherwise}\end{cases} \qquad (52)$$
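The three weight settings that specialize Equation (50) to the HSV, HLS, and HSI brightness functions can be checked on a concrete triple; the function name and the sample color are ours:

```python
def glhs_brightness(r, g, b, wmin, wmid, wmax):
    """Equation (50): weighted min/mid/max of an RGB triple.

    Assumes wmax > 0 and wmin + wmid + wmax == 1, as in the text.
    """
    lo, mid, hi = sorted((r, g, b))
    return wmin * lo + wmid * mid + wmax * hi

c = (0.2, 0.9, 0.4)
hsv = glhs_brightness(*c, 0, 0, 1)           # HSV "value": the max, 0.9
hls = glhs_brightness(*c, 0.5, 0, 0.5)       # HLS lightness: (min + max)/2
hsi = glhs_brightness(*c, 1/3, 1/3, 1/3)     # HSI intensity: the mean
```

The same function thus reproduces all three classical brightness definitions simply by changing the weights, which is the point of the GLHS formulation.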
An approximation to this trigonometric expression is often used, and it is shown by Levkowitz and Herman (1993) that the approximated value differs from the trigonometric value by at most 1.12°. Nevertheless, given that one generally has much processing power available today, the use of the approximation is not recommended, as it tends to suffer to a larger extent from the discretization problems pointed out by Kender (1976).

3. Saturation

For the derivation of an expression for the saturation of an arbitrary color c, we begin by looking at the triangle which contains all the points with the same hue as c, as shown in Figure 26. The intersections of this triangle and the isobrightness surfaces are lines parallel to the line between c and its brightness value on the achromatic axis, L(c) = [L(c), L(c), L(c)]. Traditionally, the saturation is calculated as the length of the vector from L(c) to c divided by the length of the extension of this vector to the surface of the RGB cube. This definition, however, results in the cylindrically shaped color spaces discussed in Section IV.C. Moreover, it is clear that this definition of the saturation depends intimately on the form of the brightness function chosen (i.e., on the slopes of the isobrightness lines). In order to keep the conic or bi-conic forms of the spaces, it is necessary to change the definition of the saturation. Instead of the definition given
FIGURE 26. The triangle which contains all the points with the same hue as c. The circled corners mark the extremities of the edges of the cube containing the points furthest away from the achromatic axis.
MATHEMATICAL MORPHOLOGY AND CIRCULAR DATA
above, we divide the length of the vector from L(c) to c (in Figure 26) by the length of the vector between L[q(c)] and q(c), that is, the longest vector parallel to [L(c), c] included in the isohue triangle, the vector which necessarily intersects the third corner q(c) of the triangle. We then end up with the following expression for the saturation

S = \frac{\| L(c) - c \|}{\| L[q(c)] - q(c) \|}    (53)
in which ‖·‖ indicates the Euclidean norm. This saturation is independent of the choice of the brightness function, which can be shown by using similar triangles (Hanbury and Serra, 2002a). An example of this saturation measurement is shown in Figure 25(e), where it should be compared to the corresponding HSV and HLS examples. The most visible improvement resulting from this definition is that both the white and black regions of the color image are assigned a low saturation value. The points furthest away from the achromatic axis are those on the edges of the RGB cube between the circled corners in Figure 26. These points correspond to the most highly saturated colors, and if we project them onto a plane perpendicular to the achromatic axis, they form the edges of a hexagon, which correspond to the maximum distance a point can be from the achromatic axis for a given hue. A simpler expression for the saturation of point c can be obtained by projecting it onto this hexagon, and dividing the distance of the projected point from the center of the hexagon by the distance from the center to the hexagon edge at the same hue value (Hanbury and Serra, 2002a). By using Equation (53) along with the brightness function

L(c) = \min(R, G, B)    (54)

one can derive the following extremely simple saturation expression (Hanbury and Serra, 2002a):

S' = \max(R, G, B) - \min(R, G, B)    (55)
4. Chroma

Carron (1995) suggests the use of the distance of a point from the achromatic axis, without the maximum-distance normalization, as an approximation to the saturation, which he calls chroma. This distance is multiplied by a constant so that for the six vertices of the projected hexagon (i.e., corresponding to the circled vertices in Figure 26) the chroma has a
maximum value of one. An example of the chroma is shown in Figure 25(f) and the difference between the chroma and the saturation images is shown in Figure 25(g) (the contrast has been enhanced for better visibility; the maximum pixel value in the image is 0.127). The maximum possible difference between a saturation and a chroma value for a color is 0.134.
E. The IHLS Space

In this section we present algorithms for transforming back and forth between the RGB space and the improved HLS (IHLS) space. The latter is an improvement on the standard HLS space that replaces the cylindrical saturation measure with a conic one, thereby allowing the use of any brightness or luminance function (provided that they produce parallel isobrightness surfaces). In these algorithms, we have chosen to use the luminance function because of its psychovisual properties. MATLAB routines implementing the following transformations are available at http://www.prip.tuwien.ac.at/~hanbury. Two transforms from the RGB space to the IHLS space are given, both of which produce exactly the same hue, saturation, and luminance coordinates. The first is extremely rapid, while the second is easier to invert. The inverse transformation from IHLS to RGB is also presented.

1. The Simplest RGB to IHLS Transformation

For the simplest implementation, one calculates a brightness measure (Equation (49) or Equation (50)), the saturation using Equation (55), and the hue using Equations (51) and (52), as summarized here:
Y(c) = 0.2126R + 0.7152G + 0.0722B    (56)

S(c) = \max(R, G, B) - \min(R, G, B)    (57)

H'(c) = \arccos\left[\frac{R - \frac{1}{2}G - \frac{1}{2}B}{\left(R^2 + G^2 + B^2 - RG - RB - BG\right)^{1/2}}\right]    (58)

H(c) = \begin{cases} 360° - H'(c) & \text{if } B > G \\ H'(c) & \text{otherwise} \end{cases}    (59)
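Equations (56) through (59) can be sketched directly in Python (a helper of ours, not the author's MATLAB routine; returning a hue of 0° for achromatic pixels, where the arccos formula is undefined, is an assumption of this sketch):

```python
import math

def rgb_to_ihls(r, g, b):
    """Map RGB components in [0, 1] to (hue in degrees, luminance, saturation).

    Luminance is Equation (56), saturation is Equation (57), and the hue is
    the trigonometric form of Equations (58)-(59).
    """
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    s = max(r, g, b) - min(r, g, b)
    denom = math.sqrt(r * r + g * g + b * b - r * g - r * b - g * b)
    if denom == 0.0:
        h = 0.0  # achromatic (R = G = B): hue undefined; 0 by convention here
    else:
        h_prime = math.degrees(math.acos((r - 0.5 * g - 0.5 * b) / denom))
        h = 360.0 - h_prime if b > g else h_prime
    return h, y, s
```

For example, pure blue (0, 0, 1) yields a hue of 240°, and any gray yields zero saturation.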
2. An Alternative RGB to IHLS Transformation

A method to calculate the luminance, trigonometric hue, chroma, and saturation coordinates is given here, based on the one suggested by Carron (1995). The changes with respect to the version given by Carron are the extension to calculate the saturation from the chroma, and the use of luminance instead of brightness. The algorithm gives precisely the same Y, S, and H component values as the simpler algorithm in the previous section, but is simpler to invert as it contains no max or min functions. The first step is
\begin{bmatrix} Y \\ C_1 \\ C_2 \end{bmatrix} =
\begin{bmatrix} 0.2125 & 0.7154 & 0.0721 \\ 1 & -\frac{1}{2} & -\frac{1}{2} \\ 0 & -\frac{\sqrt{3}}{2} & \frac{\sqrt{3}}{2} \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix},    (60)

followed by the calculation of the chroma C ∈ [0, 1]

C = \sqrt{C_1^2 + C_2^2},

the hue H ∈ [0°, 360°]

H = \begin{cases} \text{undefined} & \text{if } C = 0 \\ \arccos(C_1/C) & \text{if } C \neq 0 \text{ and } C_2 \leq 0 \\ 360° - \arccos(C_1/C) & \text{if } C \neq 0 \text{ and } C_2 > 0 \end{cases}

and, if required, the saturation S ∈ [0, 1]

S = \frac{2C \sin(120° - H^*)}{\sqrt{3}}    (61)

in which

H^* = H - k \cdot 60° \quad\text{where } k \in \{0, 1, 2, 3, 4, 5\} \text{ so that } 0° \leq H^* \leq 60°.    (62)
3. The Inverse Transformation from IHLS to RGB

To transform colors represented in the IHLS coordinate system obtained using either of the above algorithms to RGB coordinates, one first calculates the chroma values from the saturation values (using Equation (61)):

C = \frac{\sqrt{3}\, S}{2 \sin(120° - H^*)}    (63)
in which H* is given by Equation (62). From the chroma, one calculates

C_1 = C \cos(H)    (64)

C_2 = -C \sin(H).    (65)

For the case where the hue is undefined, C_1 = C_2 = 0. Finally, the inverse of the matrix used in Equation (60) is used to obtain R, G, and B:

\begin{bmatrix} R \\ G \\ B \end{bmatrix} =
\begin{bmatrix} 1.0000 & 0.7875 & 0.3714 \\ 1.0000 & -0.2125 & -0.2059 \\ 1.0000 & -0.2125 & 0.9488 \end{bmatrix}
\begin{bmatrix} Y \\ C_1 \\ C_2 \end{bmatrix}.    (66)
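A minimal sketch of the matrix form and its inverse using NumPy (the function names `rgb_to_ycc`, `ycc_to_chs`, and `ycc_to_rgb` are ours; the inverse matrix is obtained numerically rather than hard-coded, and agrees with the rounded values of Equation (66)):

```python
import math
import numpy as np

# Forward matrix of Equation (60): (R, G, B) -> (Y, C1, C2).
M = np.array([
    [0.2125, 0.7154, 0.0721],
    [1.0, -0.5, -0.5],
    [0.0, -math.sqrt(3.0) / 2.0, math.sqrt(3.0) / 2.0],
])
M_INV = np.linalg.inv(M)  # numerical inverse, matching Equation (66)

def rgb_to_ycc(rgb):
    """Equation (60): an RGB vector to (Y, C1, C2)."""
    return M @ np.asarray(rgb, dtype=float)

def ycc_to_rgb(ycc):
    """Equation (66): (Y, C1, C2) back to RGB."""
    return M_INV @ np.asarray(ycc, dtype=float)

def ycc_to_chs(ycc):
    """Chroma, hue (degrees), and saturation from (Y, C1, C2).

    Chroma and hue follow the case analysis after Equation (60);
    saturation is Equation (61) with H* from Equation (62).  A zero hue
    for zero chroma (where the hue is undefined) is our convention.
    """
    _, c1, c2 = ycc
    chroma = math.hypot(c1, c2)
    if chroma == 0.0:
        return 0.0, 0.0, 0.0
    h = math.degrees(math.acos(c1 / chroma))
    if c2 > 0.0:
        h = 360.0 - h
    h_star = h - 60.0 * math.floor(h / 60.0)  # k chosen so that 0 <= H* <= 60
    s = 2.0 * chroma * math.sin(math.radians(120.0 - h_star)) / math.sqrt(3.0)
    return chroma, h, s
```

A round trip rgb → ycc → rgb recovers the input to machine precision, and for a fully saturated primary the saturation equals max − min = 1, matching the simpler algorithm of the previous section.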
F. Conclusion

The commonly used 3D polar coordinate color representation systems, such as the HLS and HSV, are unsuited to image processing and analysis. The principal reason for this is the artificial expansion of the natural conic shapes of the spaces into a cylindrical shape. In this chapter, we propose a generalized 3D polar coordinate representation of the RGB space, called the IHLS space, which has the following advantages over the commonly used cylindrically shaped ones:
- Achromatic or near-achromatic colors always receive a low saturation value.
- As we have removed the normalization of the saturation by the brightness function present in the cylindrically shaped models, these two coordinates are independent, allowing a wide choice of brightness functions.
- The removal of the brightness normalization of the saturation means that comparisons between saturation values are meaningful, which is important in the context of mathematical morphology.

Any 3D polar coordinate color representation is very closely tied to the RGB space, being simply a different representation of it. It therefore does not have any supplementary properties such as device independence. The main advantage of the 3D polar coordinate representation is that it is a more homogeneous color representation, as the colors are not specified in terms of fixed directions corresponding to red, green, and blue, but as angular values. This representation sometimes allows features which are not clearly visible in the RGB space to be seen and exploited.
V. PROCESSING OF 3D POLAR COORDINATE COLOR SPACES
Up to this point we have discussed the application of mathematical morphology to values on the unit circle, and have given some examples of the morphological processing of phase images and of orientation images describing oriented textures, cases in which the angular data can often be processed separately. This separate processing is nevertheless not always possible, as there are often supplementary measures closely associated with the angular value: the amplitude associated with the phase in a Fourier transform, or the luminance and saturation coordinates associated with the hue. In this section we discuss the latter case, mathematical morphology applied to color images represented in a 3D polar coordinate system (i.e., in the IHLS space). Each pixel in this type of image is encoded by a vector containing an angular value, the hue, and two linear values, the luminance and the saturation. We begin, in Section V.A, by discussing circular statistics applied to the hue, and suggest a way of taking the saturation into account when calculating hue means and variances. The application of morphological operators to color images is a special case of vectorial mathematical morphology. Some aspects of this subject, notably the vectorial orders, are discussed in Section V.B. We then present, in Section V.C, the use of lexicographical orders in the IHLS space. We show that in order to obtain usable results with operators using a lexicographical order with hue at the top level, it is necessary to weight the hue by the saturation. The weighting method used here is nonetheless different from the one used in the context of color statistics. A color top-hat operator is also suggested.

A. Color Statistics

In a 3D polar coordinate color space, for example the IHLS space, if we treat each channel separately then the classic linear statistical methods may be used to calculate descriptors of the luminance and saturation.
For the hue, on the other hand, circular statistics (Section II.B) must be used. This processing by color band presents the disadvantage of ignoring the close relationship between the two chrominance bands, the saturation and the hue, thereby giving equal importance to all the hues, irrespective of their associated saturation. A mean of the hue weighted by the saturation allows the simultaneous use of both chrominance components. We begin with n pairs of values, the hues Hi and their associated saturations Si. To calculate the mean of the hue weighted by the saturation, we proceed, as in
Section II.B, by calculating the direction of the resultant vector of the sum of vectors in the directions H_i, except that instead of unit vectors, the vector with direction H_i has a length proportional to the saturation S_i. The hues associated with small saturation values therefore have less influence on the result. We now present the changes to be made to the equations of Section II.B in order to calculate a saturation-weighted hue mean. Equations (5) and (7) are replaced by

A_S = \sum_{i=1}^{n} S_i \cos H_i, \qquad B_S = \sum_{i=1}^{n} S_i \sin H_i, \qquad R_S^2 = A_S^2 + B_S^2    (67)

and we replace A and B in Equation (6) by A_S and B_S, giving

\bar{H}_S = \begin{cases}
\arctan(B_S/A_S) & \text{if } B_S > 0,\ A_S > 0 \\
\arctan(B_S/A_S) + \pi & \text{if } A_S < 0 \\
\arctan(B_S/A_S) + 2\pi & \text{if } B_S < 0,\ A_S > 0 \\
\pi/2 & \text{if } A_S = 0,\ B_S > 0 \\
3\pi/2 & \text{if } A_S = 0,\ B_S < 0
\end{cases}    (68)

where the saturation-weighted hue mean is denoted by \bar{H}_S. The mean length (Equation (8)) becomes

\bar{R}_S = \frac{R_S}{\sum_{i=1}^{n} S_i}.
This expression is still an indicator of the spread of the hue values, and is not linked to the standard saturation mean. The direction of the mean \bar{H}_S is independent of the position of the hue origin, even if its value depends on the position of the origin. In practice, for images which contain only highly saturated colors, there is not a large difference between the values of the weighted and unweighted hue means. Figure A.1(f), whose saturation is shown in Figure 27(a), is an image for which the difference is significant. For this color image, the unweighted hue mean is \bar{H} = 326.9°, and the saturation-weighted hue mean is \bar{H}_S = 19.7°. To show the difference, thresholds on the hue band of the image for the intervals [\bar{H} - 20°, \bar{H} + 20°] and [\bar{H}_S - 20°, \bar{H}_S + 20°] are shown in Figures 27(b) and (c), respectively. It is clear that the saturation-weighted hue mean corresponds to the hue of the regions of the image with the highest saturation, the two cells. The unweighted hue mean is skewed by the low-saturation background.
FIGURE 27. (a) Saturation of Figure A.1(f). (b) Pixels of the initial image which have a hue value in the interval of 20° on either side of the unweighted hue mean. (c) Pixels of the initial image which have a hue value in the interval of 20° on either side of the saturation-weighted hue mean.
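The saturation-weighted circular mean can be sketched as follows (a helper of ours; `atan2` collapses the five quadrant cases of Equation (68) into one call):

```python
import math

def weighted_hue_mean(hues_deg, sats):
    """Saturation-weighted circular mean of hues (Equations (67)-(68)).

    Each hue H_i contributes a vector of length S_i; the direction of the
    resultant is returned in degrees in [0, 360), together with the mean
    length R_S / sum(S_i), an indicator of the spread of the hues.
    """
    a = sum(s * math.cos(math.radians(h)) for h, s in zip(hues_deg, sats))
    b = sum(s * math.sin(math.radians(h)) for h, s in zip(hues_deg, sats))
    mean = math.degrees(math.atan2(b, a)) % 360.0
    return mean, math.hypot(a, b) / sum(sats)
```

Hues with small saturations barely move the mean: averaging 0° (saturation 3) with 180° (saturation 1) gives a mean direction of 0° with mean length 0.5.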
B. Vectorial Mathematical Morphology

Mathematical morphology operates on complete lattices (Serra, 1988), which are based on two notions: the assignment of an order, and the existence of a supremum and infimum. To be able to apply mathematical morphology to color images, it is necessary to be able to impose an order on the colors, and to ensure the existence of suprema and infima, allowing the construction of a lattice to which morphological operators can be applied. The processing of color images is a special case of the processing of vectorial data or images. We consider the first requirement in the construction of a lattice, that of orders, and we present a general introduction to vectorial orders. We note that for certain vectorial orders, the supremum and infimum of a set of vectors do not always form part of the set. In the specific case of color images, this could result in the introduction of false colors into an image processed by a morphological operator (Talbot et al., 1998). A false color is a color present in the image after the application of an operator which was not in the initial image. For filters aiming to simplify an image, the appearance of these new colors is generally unwanted.
1. Vectorial Orders

In the framework of mathematical morphology, the three important vectorial orders are the preorders, partial orders, and total orders. Before we define these orders, we give the definitions of relations useful for characterizing an order relation.
Definition 7. Let R be a binary relation on an arbitrary set A.
(1) R is reflexive iff ∀x ∈ A, xRx.
(2) R is transitive iff ∀x, y, z ∈ A, xRy and yRz ⇒ xRz.
(3) R is antisymmetric iff ∀x, y ∈ A, xRy and yRx ⇒ x = y.

For example, for order relations on ℝ, the binary relation R is generally the relation "≤" or "<" (or their inverses). We now move on to the definitions of order relations.

Definition 8. A binary relation R on a set A is a preorder iff R is reflexive and transitive.

Definition 9. A binary relation R on a set A is a partial order iff R is reflexive, transitive, and antisymmetric.

Definition 10. A partial order is totally ordered iff ∀x, y ∈ A, xRy or yRx.
Hence, a totally ordered set contains no pairs of members which cannot be ordered. We call a partial order which is totally ordered a total order. Using these definitions, the mathematical framework in which mathematical morphology operates, the complete lattice, can be rigorously defined.

Definition 11. A complete lattice is a set L such that

(1) L is provided with a partial order.
(2) For every family of elements \{X_i\} \subseteq L, a supremum and an infimum exist.

Barnett (1976) describes four ways of ordering vectorial data, three of which are often used in the processing of vectorial images, these being the marginal order, reduced order, and conditional or lexicographical order. They have been applied to the definition of median filters (Pitas and Tsakalides, 1991) and morphological operators (Comer and Delp, 1999). We now give the definitions of the three vectorial orders used in image processing. Let V be a set of n vectors x_i of dimension p, with

x_i = (x_{1(i)}, x_{2(i)}, \ldots, x_{p(i)}), \quad i \in \{1, 2, \ldots, n\}.

Definition 12. For the marginal order, the vector components are ordered for each of the p dimensions. For two vectors x_i, x_j \in V

x_i \leq x_j \iff x_{k(i)} \leq x_{k(j)} \quad \forall k \in \{1, 2, \ldots, p\}.
The supremum of the set V is therefore

x_{\sup} = \left( \bigvee_i x_{1(i)},\ \bigvee_i x_{2(i)},\ \ldots,\ \bigvee_i x_{p(i)} \right)

and the infimum is

x_{\inf} = \left( \bigwedge_i x_{1(i)},\ \bigwedge_i x_{2(i)},\ \ldots,\ \bigwedge_i x_{p(i)} \right).
The marginal order is a partial order: we cannot order the vectors (1, 2) and (2, 1), for example. It is clear that the vectors x_sup and x_inf do not necessarily form part of the initial set of vectors V. For example, the infimum of vectors (1, 2) and (2, 1) is the vector (1, 1).

Definition 13. The reduced order uses a function g : \mathbb{R}^p \to \mathbb{R} to impose an order on the vectors. For two vectors x_i, x_j \in V

x_i \leq x_j \iff g(x_i) \leq g(x_j).

The supremum of the set V is

x_{\sup} = \left\{ x_i : g(x_i) = \bigvee_{j=1,2,\ldots,n} g(x_j) \right\}

and the infimum is

x_{\inf} = \left\{ x_i : g(x_i) = \bigwedge_{j=1,2,\ldots,n} g(x_j) \right\}.
For the reduced order, the vectors x_sup and x_inf are members of the initial set of vectors V, as we do not process the vector components separately. The reduced order is a preorder if the function g is not injective, but a total order if it is. For example, in a two-dimensional space, the function g[(x, y)] = x + y is not injective, which means that, as g[(1, 2)] = g[(2, 1)], it is not possible to order these vectors. This problem can be surmounted by using a function g which is injective, or by using a lexicographical order.
Definition 14. The conditional or lexicographical order is based on the following order relation for two vectors x_i, x_j \in V:

x_i \leq x_j \quad \text{if} \quad
\begin{cases}
x_{1(i)} < x_{1(j)} \\
\quad\text{or} \\
x_{1(i)} = x_{1(j)} \ \text{and} \ x_{2(i)} < x_{2(j)} \\
\quad\text{or} \\
\quad\vdots \\
x_{1(i)} = x_{1(j)} \ \text{and} \ x_{2(i)} = x_{2(j)} \ \text{and} \ x_{3(i)} = x_{3(j)} \ \text{and} \ \ldots \ \text{and} \ x_{p(i)} \leq x_{p(j)}
\end{cases}    (69)

or, written more compactly (Chanussot and Lambert, 1998),

x_i \leq x_j \iff \exists k \in \{1, 2, \ldots, p\} : x_{l(i)} = x_{l(j)} \ \forall l \in \{1, 2, \ldots, k-1\} \ \text{and} \ x_{k(i)} \leq x_{k(j)}.

The supremum and infimum of the set V are defined based on this order relation. The lexicographical order is a total vector order, with the property that the supremum and infimum are always members of the initial vector set V. The use of this order necessarily implies the attribution of a priority to the components, as in the majority of cases the order of two vectors will be decided by the first line of Equation (69) (and hence by the first vector component). It is obviously not necessary to limit oneself to a component priority based on the order of the components within the vectors. The components can be placed into Equation (69) in an order of priority defined by the user. It is even possible to place a noninjective function g (from the definition of the reduced order) at the first level of the lexicographical order, thereby creating a total order. Some alternative orders are suggested in the literature. A total order based on space-filling curves is suggested by Chanussot and Lambert (1998) and Chanussot (1998). Serra (1992) suggests an intermediate order between the marginal and lexicographical orders. Comer and Delp (1999) use nontotal orders along with a geometric criterion based on pixel position in the structuring element, allowing vectors for which the order is not defined by the chosen order relation to be ordered. An application of fuzzy mathematical morphology (Bloch and Maître, 1995) to color images, along with a textile inspection application, is presented by Köppen et al. (1999).
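A toy demonstration (ours, not from the text) of the three orders on the pair (1, 2), (2, 1) used as the running example above:

```python
# The pair that cannot be ranked by the marginal order.
x, y = (1, 2), (2, 1)

# Marginal order (Definition 12): component-wise sup and inf.  Note the
# "false color" effect: neither result belongs to the original pair.
sup = tuple(max(a, b) for a, b in zip(x, y))
inf = tuple(min(a, b) for a, b in zip(x, y))
assert sup == (2, 2) and inf == (1, 1)

# Reduced order (Definition 13) with the non-injective g(v) = v1 + v2:
# the two vectors receive the same rank, so this g yields only a preorder.
g = lambda v: v[0] + v[1]
assert g(x) == g(y)

# Lexicographical order (Definition 14): Python's built-in tuple
# comparison implements Equation (69), giving a total order whose
# supremum and infimum are always members of the original set.
assert max(x, y) == (2, 1) and min(x, y) == (1, 2)
```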
2. Morphological Operators

After having chosen or defined a lattice for the color vectors, thereby permitting the choice of a supremum and an infimum of these vectors, the basic morphological operators can be applied. The erosion at point x by structuring element B is

\varepsilon_B f(x) = \{ f(y) : f(y) = \inf[f(z)],\ z \in B_x \}    (70)

and the corresponding dilation is

\delta_B f(x) = \{ f(y) : f(y) = \sup[f(z)],\ z \in B_x \}.    (71)
These operators can be used to build other operators, such as the opening \gamma_B and the closing \varphi_B.

C. Lexicographical Orders in the IHLS Color Space

When a lexicographical order is used with morphological operators, one finds that the majority of decisions on the vector order in a structuring element are taken at the first level of the order relation (Hanbury and Serra, 2001b). The application of a lexicographical order to a color space of type RGB necessarily results in the promotion of one of the red, green, or blue components to a dominant position, which produces operators which treat the color space inhomogeneously. The use of a 3D polar coordinate space, such as IHLS, allows the creation of two operators which use the homogeneous coordinates of luminance and saturation at the first level, or of an operator for which any hue can be chosen to play the dominant role (Hanbury, 2001; Hanbury and Serra, 2001a). In this section, we first present formulations of the lexicographical order with luminance and with saturation at the first level. Next, we consider an order with the hue at the first level. After a demonstration of the inconveniences of this order caused by the close relation between the two chrominance components, the hue and the saturation, we suggest a solution in the form of a lexicographical order with hue weighted by saturation at the first level. Lastly, a color top-hat operator is suggested. The image used in the examples in this section is shown in Figure A.1(e), and its hue, saturation, and luminance components are shown in Figure 28.

1. Luminance and Saturation

The luminance and saturation coordinates each form a complete lattice, and it is therefore easy to use them in a lexicographical order. The angular
FIGURE 28. The IHLS space: (a) hue, (b) saturation, and (c) luminance of the image in Figure A.1(e).
coordinate (the hue), placed at the third level so as to minimize its importance, is ordered based on its distance from an origin (Section II.D). It is therefore necessary to choose an origin for the hues, but this origin intervenes only in the third level of the lexicographical order. It therefore only arbitrates in the cases where two vectors have equal luminance and saturation values. We define the lexicographical order with luminance at the first level, for two vectors c_i = (H_i, Y_i, S_i) and c_j = (H_j, Y_j, S_j) in the IHLS space, as

c_i > c_j \quad \text{if} \quad
\begin{cases}
Y_i > Y_j \\
\quad\text{or} \\
Y_i = Y_j \ \text{and} \ S_i > S_j \\
\quad\text{or} \\
Y_i = Y_j \ \text{and} \ S_i = S_j \ \text{and} \ H_i \ominus H_0 < H_j \ominus H_0
\end{cases}    (72)

where H_0 is the hue origin chosen by the user and the symbol ⊖ indicates the acute angle between the two hues (Equation (4)). If the luminance values of the two vectors being compared are equal, then the vector with the higher saturation value is taken as being larger. If the luminance and saturation values are equal, it is necessary to take the hue values into account. The top
two levels of this relation are invariant to rotation around the achromatic axis. The erosion and dilation obtained by using this order applied to Figure A.1(e) are shown in Figures A.5(a) and (b), respectively (we set H_0 = 0° and use a square SE of size 2). The lexicographical order with saturation at the top level is constructed by inverting the top two levels of relation (72), giving

c_i > c_j \quad \text{if} \quad
\begin{cases}
S_i > S_j \\
\quad\text{or} \\
S_i = S_j \ \text{and} \ Y_i > Y_j \\
\quad\text{or} \\
S_i = S_j \ \text{and} \ Y_i = Y_j \ \text{and} \ H_i \ominus H_0 < H_j \ominus H_0
\end{cases}    (73)
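As a sketch, the luminance-first order of Equation (72) can be realized as a Python sort key driving a flat erosion and dilation over the colors covered by a structuring element (function names are ours; swapping the first two key components gives the saturation-first order of Equation (73)):

```python
def acute(h1, h2):
    """Acute angle between two hues on the circle, in degrees."""
    d = abs(h1 - h2) % 360.0
    return min(d, 360.0 - d)

def lex_key_luminance(c, h0=0.0):
    """Sort key realizing Equation (72) for c = (H, Y, S): a larger key
    means a larger color.  Ties on luminance fall through to saturation,
    then to the (negated) angular distance of the hue from the origin."""
    h, y, s = c
    return (y, s, -acute(h, h0))

def erode(colors, h0=0.0):
    """Infimum over the colors covered by a flat structuring element."""
    return min(colors, key=lambda c: lex_key_luminance(c, h0))

def dilate(colors, h0=0.0):
    """Supremum over the colors covered by a flat structuring element."""
    return max(colors, key=lambda c: lex_key_luminance(c, h0))
```

Because the supremum and infimum are members of the input set, no false colors are introduced.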
The erosion and dilation obtained when using this order are shown in Figures A.5(c) and (d), respectively (a square SE of size 2 was used). The lexicographical orders suggested in Equations (72) and (73) are obviously not the only orders of this type possible. One can easily invert the second and third levels, or the directions of the comparison operators in these two levels, still keeping valid lexicographical orders. The choice between these two orders depends on whether one is interested in luminous and dark objects, or in saturated and nonsaturated objects. For example, if one wishes to eliminate only the white or black writing from the orange card at the bottom of the image in Figure A.1(e), one could use, respectively, a luminance opening or closing.⁷ If both black and white writing is to be removed, a saturation closing is recommended. In general, the operators with luminance at the top level are better at preserving object contours. A good example of an application using a lexicographical order with luminance at the top level is given by Iwanowski (2000) in the context of color image interpolation. He uses a lexicographical order having luminance at the top level, and the values of the green and red components of the RGB space at the second and third levels, respectively.

2. Hue

For the hue, the obvious approach is to construct a lexicographical order with, at the first level, a hue order based on the distance from an origin.

⁷ Due to lack of space for printing color images, the results of the opening and closing are not shown. They can either be mentally extrapolated from the erosion and dilation images, or downloaded from http://www.prip.tuwien.ac.at/~hanbury. Larger versions of the images shown are also available on this web page.
A possible form for this order is
c_i > c_j \quad \text{if} \quad
\begin{cases}
H_i \ominus H_0 < H_j \ominus H_0 \\
\quad\text{or} \\
H_i \ominus H_0 = H_j \ominus H_0 \ \text{and} \ S_i > S_j \\
\quad\text{or} \\
H_i \ominus H_0 = H_j \ominus H_0 \ \text{and} \ S_i = S_j \ \text{and} \ Y_i > Y_j
\end{cases}    (74)
Upon applying morphological operators using this order, one sees that the results are not satisfactory. This is due to the close relationship between the chrominance components, the hue and the saturation. To give an example, we use this order with H_0 = 40° (which corresponds to the color of the orange blobs near the top right of the image) applied to the image in Figure A.1(e). With this choice of origin, we intend a dilation to enlarge the red and orange regions, and to shrink the blue and violet regions with hues around 220° (the lettering on the violet area at the top left of the image has a hue of about 245°, and is surrounded by a low-saturation color with a hue of around 300°). The results are shown in Figure A.5(e) (erosion) and Figure A.5(f) (dilation). Upon examining the result of the dilation, one sees that the red lines have all been covered by the neighboring white pixels, that some black pixels still remain within the upper right orange blobs, that the white letters on the orange card have actually been enlarged, and that the rightmost border of the orange card has become jagged, due to only some of the orange pixels having been expanded over the surrounding white background. In essence, the black and white pixels are sometimes chosen preferentially to the red and orange pixels, contrary to our requirements. The reason is that the low-saturation black and white regions, for which the hue values are rather arbitrary, often have hue values closer to the chosen origin than the high-saturation regions. For example, a white pixel having a hue of 0° is considered to be closer to the chosen origin (40°) than a red pixel with a hue of 350°. The disadvantage of this order by hue only is better illustrated by the simple example of Figure A.3(a), which shows four colors along with their hue, luminance, and saturation coordinates (in this order in the vectors). The positions of these colors on the hue circle are shown in Figure 29(a).
If one chooses the color closest to red a = (0.0°, 0.21, 1.00) in this image by using only the hue values, the result is the brown c = (9.3°, 0.47, 0.20), whereas the orange b = (18.9°, 0.46, 0.90) is visually the most similar. This contradiction is due to the low value of the saturation of color c, which makes it more of a gray than a color, and therefore very far from pure red. The solution proposed is to weight the hue values by their corresponding saturation values before ordering the hues.
FIGURE 29. (a) The positions of the colors of Figure A.3(a) on the hue circle. (b) The positions of the colors of Figure A.3(a) on the hue circle after weighting the hues by the corresponding saturations.
3. Saturation-Weighted Hue

Demarty (2000) introduced an algorithm for dividing the pixels of an image into two classes, highly saturated pixels and monochromatic pixels. This binary approach to separating the pixels is nevertheless not flexible enough to be applied to a weighting of the hues by the saturation. In the context of hue statistics, we presented in Section V.A a method for determining saturation-weighted hues. This method represents each hue by a vector with a length proportional to the associated saturation. If we consider the hues as points on the unit circle, this method represents the weighted hues as points in the interior of the circle. This two-dimensional representation is, however, not convenient if we wish to impose an order based only on the angular values of the hues. The method for weighting the hue by the saturation suggested in this section changes the position of the hue on the unit circle as a function of the saturation and of the origin chosen. Because these weighted hues remain on the unit circle (and do not move into its interior), they can be ordered based on their angular values, as for the nonweighted hues. We start with a set of vectors in the IHLS space in which we wish to find the supremum and infimum with respect to a selected hue origin H_0. These vectors are ordered by weighted hue by using the following algorithm:

(1) We first calculate, for each vector, a saturation-weighted hue H'.
(2) We use H' to order the vectors with respect to the angular distance from the chosen origin H_0.
(3) After the supremum and infimum are chosen, the vectors are reassigned their initial hues so as to avoid introducing false colors.
The principal characteristics of the saturation-weighted hue H' are:

- The vectors with high saturation values keep their initial hue values.
- The vectors with low saturation values are assigned weighted hues close to H_0 + 90° or H_0 − 90°, thereby reducing their likelihood of being chosen as supremum or infimum.
Before giving a general formulation of the hue-weighting algorithm, we illustrate it using the example of Figure A.3(a). We choose the origin H_0 = 0°, and calculate the weighted hue values H' for the four colors. For hues between 0° and 90°, we define the value of H' as follows:

H'_i = \sup[H_i,\ 90°(1 - S_i)]    (75)
The color c, whose hue is H_c = 9.3°, and for which the second argument of the supremum operator in Equation (75) gives 90° − (0.2 × 90°) = 72°, is assigned a weighted hue of H'_c = 72°. For the colors b and d, the expression gives 90° − (0.9 × 90°) = 9°, and for color a, it gives 90° − (1.0 × 90°) = 0°. The weighted hues for these colors are therefore equal to their initial hues: H'_a = H_a, H'_b = H_b, and H'_d = H_d. The positions of the weighted hues on the hue circle are shown in Figure 29(b), from which it is clear that color a remains closest to the origin, with color b now in second position. The general formulation (for hues between 0° and 360°) of the weighting of the hues by their corresponding saturations is now presented. For each vector i, a value H'_i is calculated from H_i and S_i. To simplify the notation, we make the hypothesis that the origin H_0 is placed at 0°. The value of H'_i is
H'_i = \begin{cases}
\sup[H_i,\ 90°(1 - S_i)] & \text{if } 0° \leq H_i \leq 90° \\
\inf[H_i,\ 90°(1 + S_i)] & \text{if } 90° < H_i \leq 180° \\
\sup[H_i,\ 90°(3 - S_i)] & \text{if } 180° < H_i \leq 270° \\
\inf[H_i,\ 90°(3 + S_i)] & \text{if } 270° < H_i < 360°
\end{cases}    (76)
To use any origin H_0, it is sufficient to replace the H_i in Equation (76) by

H_i \to \begin{cases}
H_i - H_0 & \text{if } H_i - H_0 \geq 0 \\
360° + (H_i - H_0) & \text{if } H_i - H_0 < 0
\end{cases}    (77)
A lexicographical order with the saturation-weighted hue in the first position is

c_i > c_j \quad \text{if} \quad
\begin{cases}
(H'_i \ominus 0°) < (H'_j \ominus 0°) \\
\quad\text{or} \\
(H'_i \ominus 0°) = (H'_j \ominus 0°) \ \text{and} \ S_i > S_j \\
\quad\text{or} \\
(H'_i \ominus 0°) = (H'_j \ominus 0°) \ \text{and} \ S_i = S_j \ \text{and} \ Y_i > Y_j
\end{cases}    (78)
Note that by placing the hue at the top level, we have created a morphological operator which by design is not rotationally invariant. The differences between the results obtained with morphological operators using the lexicographical order with the hue at the top level (Equation (74)) and the lexicographical order using the saturation-weighted hue at the top level (Equation (78)) are visible in Figure A.5, in which images (g) and (h) show the erosion and dilation of Figure A.1 using the order by saturation-weighted hue. The origin was chosen to be at 40°, so that the dilation should enlarge the red and orange regions, and shrink the blue and violet regions. The fact that this is not possible with a dilation with only the hue at the top level has already been discussed. It can be seen in Figure A.5(h) that the result that we require is produced, as the white and black pixels are removed from consideration due to their low saturation. It is instructive to compare the results of the operators which use the saturation-weighted hue at the top level (Figures A.5(g) and (h)) with those of the operators which use the saturation at the top level (Figures A.5(c) and (d)). We remark that, except for the nonsymmetric treatment of the colorful regions inherent to the operators which use the weighted (or unweighted) hue, the results are very similar. A disadvantage of using the saturation-weighted hue is evident in regions which do not contain pixels having a saturated color close to the chosen origin, for which the results are less predictable. This is visible, for example, in regions containing only black and white pixels in Figures A.5(g) and (h). A possible solution to this problem could be to check if all the pixels in the structuring element have weighted hue values H' in the intervals (subtended by an acute angle)
h or
i H0 90 , H0 90 þ
and in this case, to use instead the luminance for ordering the pixels. The value of , chosen by the user, gives the size of the interval.
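Read as a comparison rule, and consistently with the dilation behavior described above, Equation (78) ranks a pixel higher when its saturation-weighted hue lies closer to the chosen origin, with saturation and then luminance breaking ties. A minimal Python sketch of this comparator; the function names and the (H′, S, Y) triple layout are this example's assumptions, not the chapter's:

```python
import math

def ang_dist(h, origin=0.0):
    """Acute angular distance in degrees between a hue h and the origin."""
    d = abs(h - origin) % 360.0
    return min(d, 360.0 - d)

def swh_sort_key(origin=0.0):
    """Key realizing the order of Eq. (78): a pixel ranks higher when its
    weighted hue is closer to the origin, then when its saturation is higher,
    then when its luminance is higher. Pixels are (H', S, Y) triples."""
    return lambda v: (-ang_dist(v[0], origin), v[1], v[2])

# Supremum (dilation) and infimum (erosion) over a structuring element:
pixels = [(10.0, 0.9, 0.5),   # saturated, close to the 0-degree origin
          (10.0, 0.2, 0.5),   # same hue, low saturation
          (180.0, 0.9, 0.5)]  # saturated, opposite hue
sup = max(pixels, key=swh_sort_key())
inf = min(pixels, key=swh_sort_key())
```

Because the key is a plain tuple, Python's built-in `max` and `min` realize the lexicographical comparison directly.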
194
ALLAN HANBURY
4. Color Top-Hat

Hanbury and Serra (2002c) suggest taking advantage of the perceptual uniformity of the L*a*b* color space to calculate a type of morphological top-hat (Soille, 1999). One simply calculates the Euclidean distance between the color coordinates of each pixel of an initial color image and its transform by an opening or closing operator. The resulting Euclidean distances are encoded in a graylevel image, and represent the perceptual differences between colors. Even though the IHLS space is not perceptually uniform, an approximation to this top-hat can be calculated using Euclidean distances in the RGB or IHLS space. Even though these Euclidean distances do not represent perceptual differences in a rigorous way, they can still be useful for feature extraction, as shown in the following example. This example also demonstrates a situation in which a lexicographical order with luminance at the top level is not the ‘‘best,’’ contrary to what has been claimed (Ortiz et al., 2001; Louverdis et al., 2002).

Figure A.3(b) is a color image for which we have set ourselves the task of extracting the grayish lines between the mosaic tiles. The luminance of this image is shown in Figure 30(a). It is visible that the luminance of the mosaic tiles is sometimes above and sometimes below that of the gray lines between them. In the saturation image, shown in Figure 30(b), one can see that the lines to be extracted have, in general, a lower saturation than the tiles. A morphological closing operation using a lexicographical order with saturation at the top level (Equation (73)) with a square SE of size 2 was applied to the initial color image to give Figure A.3(c). This closing succeeds in expanding the tiles to cover the gray lines.
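The saturation-first closing and the Euclidean-distance top-hat derived from it can be sketched as follows. This is an illustration only: the key keeps saturation then luminance and omits the hue tie-break of Equation (73), and the (hue, saturation, luminance) channel layout is this example's choice, not the chapter's code:

```python
import numpy as np

def vector_dilate(img, key, r=1):
    """Flat dilation of a vector image: at each pixel, the supremum under
    `key` over a (2r+1) x (2r+1) square SE, clipped at the border."""
    h, w, _ = img.shape
    out = np.empty_like(img)
    for y in range(h):
        for x in range(w):
            win = [img[j, i] for j in range(max(0, y - r), min(h, y + r + 1))
                             for i in range(max(0, x - r), min(w, x + r + 1))]
            out[y, x] = max(win, key=key)
    return out

def vector_erode(img, key, r=1):
    """Flat erosion: the infimum under `key` over the same window."""
    h, w, _ = img.shape
    out = np.empty_like(img)
    for y in range(h):
        for x in range(w):
            win = [img[j, i] for j in range(max(0, y - r), min(h, y + r + 1))
                             for i in range(max(0, x - r), min(w, x + r + 1))]
            out[y, x] = min(win, key=key)
    return out

def color_tophat(img, key, r=1):
    """Closing top-hat: pixelwise Euclidean distance between the image and
    its closing (dilation followed by erosion)."""
    closing = vector_erode(vector_dilate(img, key, r), key, r)
    return np.sqrt(((closing - img) ** 2).sum(axis=-1))

# Saturation at the top level, luminance second (hue tie-break omitted);
# channels are (hue, saturation, luminance).
key = lambda v: (v[1], v[2])

# Toy mosaic: saturated tiles (S = 0.9) crossed by a dull gray line (S = 0.1).
img = np.zeros((5, 5, 3))
img[..., 1], img[..., 2] = 0.9, 0.8
img[2, :, 1], img[2, :, 2] = 0.1, 0.5
th = color_tophat(img, key)
# The top-hat responds on the gray line and is zero on the unchanged tiles.
```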
Finally, the suggested color top-hat was calculated by taking the Euclidean distance between corresponding pixels in Figures A.3(b) and (c) to give the grayscale image in Figure 30(c), in which the pixels of highest graylevel correspond to the features we wish to extract.

5. Summary

We have shown the applicability of lexicographical orders for creating total orders in the IHLS space, although they are also applicable in any 3D polar coordinate color representation. For applications in which the pertinent information is in the luminance or saturation, classic lexicographical orders with one of these components at the first level are applicable. For an application in which we are interested in objects of a specific hue, we have shown that a lexicographical order using only a hue order based on a chosen origin does not give satisfactory results due to the close relationship between the chromatic coordinates. We therefore propose a method for weighting the hue by the corresponding saturation, allowing the application of
MATHEMATICAL MORPHOLOGY AND CIRCULAR DATA
FIGURE 30. (a) Luminance of Figure A.3(b). (b) Saturation of A.3(b). (c) The top-hat—the Euclidean distance between the corresponding pixels in Figures A.3(b) and (c).
morphological operators using a hue order. Even though the IHLS space is not perceptually uniform, a useful graylevel top-hat image can be calculated by taking the pixelwise Euclidean distances between the color coordinates in an initial image and either its opening or closing. The results of this section show the flexibility of the representation of color in terms of hue, luminance, and saturation in the context of mathematical morphology applied to color images.

D. Conclusion

We have considered the use of vectorial mathematical morphology in the special case for which one of the vector components is an angular value. This was done for the specific case of mathematical morphology applied to color images described in a space using an angular hue measure, but applications in other domains are conceivable. It is difficult to take into account the supplementary values associated with the hue when using most of the unit circle morphological operators developed in Section II. Therefore, the approach adopted is to create operators which are as rotationally invariant as possible, by placing the angular value with its associated choice of an origin at the third level of a
lexicographical order, thereby minimizing its role. The lexicographical order was chosen because it imposes a total order on the vectors, avoiding the appearance of false colors brought about when the supremum and infimum of a set of vectors are not part of the initial set, which can happen with a lattice built on a partial order. The third level of the lexicographical order is taken into account when imposing an order on all the vectors in a color space. However, for the extremely reduced set of vectors which are usually found inside a structuring element during the application of a morphological operator, the process of choosing a supremum or infimum of the set almost never gets to the third level of the order, except in pathological cases (Hanbury and Serra, 2001b). The operators based on this minimization of the role of the angular value are those which use a lexicographical order with the luminance or saturation of the IHLS space at the first level.

The opposite extreme is to create operators which by design are not rotationally invariant, such as those using the lexicographical order with saturation-weighted hue at the first level. For these operators, the effect of the hue origin chosen by the user is immediately visible, which is not necessarily disadvantageous if the user is interested in a single hue or group of hues, or if the image contains a dominant hue (subjective or objective). An objective measure of the dominant hue can be obtained by using the saturation-weighted hue mean.

An objection to the use of the lexicographical order is its propensity to elevate one of the vector components to a role which is much more important than that of the others. In a representation in 3D polar coordinates, this characteristic of the order is nevertheless less restrictive than in a rectangular coordinate representation (of type XYZ or RGB), in which we are limited to a choice between the three primary colors of the space.
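The false-color problem that motivates the total order fits in a few lines. A sketch with plain RGB triples (the variable names are this example's):

```python
# Marginal (componentwise) supremum of pure red and pure green yields yellow,
# a "false color" absent from the original set; a lexicographical supremum
# (plain tuple comparison here) always returns a member of the set.
red, green = (255, 0, 0), (0, 255, 0)

marginal_sup = tuple(max(a, b) for a, b in zip(red, green))
lex_sup = max([red, green])  # Python tuples compare lexicographically
```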
It is nevertheless possible to increase the importance of the roles played by the lower levels of the lexicographical order by using a quantization into a smaller number of levels for the components higher up in the order relation. For example, in the order relation with luminance at the top level, if the luminance is represented by 10 levels instead of 255, the saturation at the second level necessarily plays a more important role. Ortiz et al. (2001) use a similar approach, in which a weighting parameter at the first level can augment or reduce its importance.

VI. CONCLUSION

The principal theme of this chapter is the processing and analysis of circular data and of images which contain this type of data. These data
most often represent a set of directions which can be visualized as a set of points on the unit circle. The theory of circular statistics is well developed; we limit ourselves to using simple statistical measures such as the mean and variance, for which we present an extension to the vectorial case in the definition of saturation-weighted hue statistical measures for color images. The principal topic discussed is the development of morphological operators applicable to data on the unit circle. These operators attempt to surmount the two principal disadvantages associated with this data: the absence of an obvious origin and the cyclic nature of the data. We consider four operator formulations:

(1) Operators which require the choice of an origin.
(2) Pseudooperators which use the notion of data grouping.
(3) Circular centered operators which operate on increments.
(4) Operators which begin by labeling an image.
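The simple circular statistics referred to above are the standard ones obtained from the resultant of the unit vectors, and adding weights gives a saturation-weighted hue mean of the kind used for color images. A small sketch (the function name and interface are this example's, not the chapter's):

```python
import math

def circular_mean_var(angles_deg, weights=None):
    """Mean direction (degrees) and circular variance 1 - R of a set of
    angles; optional weights yield, e.g., a saturation-weighted hue mean."""
    if weights is None:
        weights = [1.0] * len(angles_deg)
    c = sum(w * math.cos(math.radians(a)) for a, w in zip(angles_deg, weights))
    s = sum(w * math.sin(math.radians(a)) for a, w in zip(angles_deg, weights))
    r = math.hypot(c, s) / sum(weights)   # mean resultant length R in [0, 1]
    return math.degrees(math.atan2(s, c)) % 360.0, 1.0 - r
```

Note that the arithmetic mean of 350° and 10° is 180°, while the circular mean is 0°, which is exactly why naive statistics fail on directional data.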
For operators applied to circular data, the notion of rotational invariance is very important. One is free to choose the origin of an angular coordinate system at any position, but it is desirable that the results of the morphological operators are not changed by such a change of origin. Even if the values (measured with respect to the origin) of the results change, the directions should stay the same. The development of such rotationally invariant operators is difficult, as requiring this invariance could either lead to the loss of other useful properties (for example, the loss of idempotence for the pseudoopenings and pseudoclosings) or limit us to a small set of operators (for example, the circular centered operators, which are limited to the gradient and top-hat).

The usefulness of these operators is shown in two contexts. For the first, one is free to process the angular data in isolation, which is illustrated by the processing of oriented texture fields and of the phase image of a Fourier transform. For the second, we consider vectorial data for which each vector contains at least one angular value. The latter case is illustrated by the processing of color images represented in the IHLS space, a 3D polar coordinate color system. An oriented texture is a class of texture which has a certain level of orientation specificity at each point, and can be described by a direction field. We use the two top-hat operators adapted to circular data to detect defects associated with singularities in the dominant orientation. Examples are shown for images from the Brodatz album and for images of wooden boards. Finally, the circular centered
gradient is applied to the segmentation of oriented textures, as well as to the extraction of homogeneous regions in Fourier transform phase images.

We then move on to the subject of color images. A more isotropic representation of the RGB space can be made in terms of 3D polar coordinates, for which each coordinate is not linked to a fixed direction in the space. These brightness, saturation, and hue coordinates, the latter being an angular coordinate, are also often more intuitive to use. Before considering the application of mathematical morphology to color images, we examine the multitude of methods described in the literature for doing the simple coordinate system conversion from rectangular to 3D polar coordinates. We show that the principal reason for the proliferation of coordinate conversion algorithms is the (often implicit) normalization of the saturation by the associated brightness function. We suggest the use of the IHLS space, which has independent saturation and brightness coordinates. This allows, for example, the use of a psychovisual luminance function instead of a brightness function.

The most important property when applying mathematical morphology to color images is that the supremum and infimum of a set should be part of that set, thereby avoiding the introduction of false colors. To have this property, we use a total order, the lexicographical order. This order requires that one of the vector components be elevated to a dominant role. To avoid having to use a specific direction in this dominant position, we use the more isotropic 3D polar coordinate system. Due to the vectorial nature of the data, the rotationally invariant morphological operators are not easily applicable. We therefore make the compromise of developing operators which are as rotationally invariant as possible, by placing the hue order at the third level of a lexicographical order.
The possibility of elevating the hue to the first level of importance, thereby creating operators which by design are not rotationally invariant, is also considered. It is shown that the use of a saturation-weighted hue measure at the first level gives better results than the hue alone, due to the close relation between these chrominance characteristics.

The main contribution of this chapter is the development of morphological operators adapted to circular data, and the demonstration of the similarities between the processing of Fourier transform phase images, oriented textures, and color images. Some extensions, both theoretical and practical, remain to be done. These include the investigation of morphological reconstruction operators for color images, the extension of the Fourier transform phase image application, and the investigation of the applicability of these operators to spectrogram processing.
APPENDIX A: CONNECTED PARTITIONS

In Section II.G.1 the definition of a connected partition is presented. Here we repeat this definition, followed by a discussion of the lattice created by the family of connected partitions.

Definition 15 A partition of the space E for which each element is connected is an application D : E → P(E), with connectivity C defined on P(E), such that for all points x and y of E:

(1) x ∈ D(x)
(2) x ≠ y ⟹ D(x) = D(y) or D(x) ∩ D(y) = ∅
(3) D(x) ∈ C

It is known that given two partitions D and D′ (not necessarily with connected classes), the relation D(x) ⊆ D′(x) for all x ∈ E defines an order on the partitions, from which a lattice is derived. If we limit ourselves to the family D_0 of partitions with connected classes, this order relation remains valid, but gives rise to a different lattice. Hence every family {D_i, i ∈ I} of connected partitions has in D_0 a greatest lower bound D, with its class at point x written as

D(x) = γ_x[∩ D_i(x), i ∈ I]

where γ_x is the point connected opening. D(x) is none other than the connected component, containing the point x, of the intersection of the D_i(x). In the same way, the class at point x of the supremum of the D_i is the connected component at x of the smallest set which is a union of classes of D_1, and also of D_2, etc., and which contains the point x.
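The construction of the infimum class D(x) as the connected component of the intersection containing x can be made concrete on a grid. A sketch with partitions given as 2D label arrays and 4-connectivity (pure Python; the names are this example's):

```python
from itertools import product

def connected_partition_inf(p1, p2):
    """Greatest lower bound of two grid partitions given as 2D label arrays:
    the classes are the 4-connected components of the pairwise intersection,
    i.e. D(x) = gamma_x[D1(x) intersect D2(x)]."""
    h, w = len(p1), len(p1[0])
    out = [[None] * w for _ in range(h)]
    label = 0
    for sy, sx in product(range(h), range(w)):
        if out[sy][sx] is not None:
            continue
        cls, stack = (p1[sy][sx], p2[sy][sx]), [(sy, sx)]
        out[sy][sx] = label
        while stack:                       # flood fill one connected class
            y, x = stack.pop()
            for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if (0 <= ny < h and 0 <= nx < w and out[ny][nx] is None
                        and (p1[ny][nx], p2[ny][nx]) == cls):
                    out[ny][nx] = label
                    stack.append((ny, nx))
        label += 1
    return out

# One row: p1 has a single class, p2 splits it. The set intersection of the
# label pairs is disconnected, so the connected infimum has three classes.
p1 = [[0, 0, 0, 0]]
p2 = [[0, 1, 1, 0]]
inf_partition = connected_partition_inf(p1, p2)
```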
APPENDIX B: CYCLIC CLOSINGS ON INDEXED PARTITIONS In Section II.G.2 the notion of an indexed partition is introduced. The definition is repeated here, followed by a presentation of the order relation and the action of increasing operators on these partitions.
Definition 16 An indexed partition of a space E, indexed by a finite number N, is an application D : E → P(E) with a function M : P(E) → [1, 2, ..., N] which associates an index with each element D(x) of the connected partition. To simplify the notation, we define

D(x, i) = D(x) if M[D(x)] = i, and D(x, i) = ∅ otherwise.    (A.1)

The N sets associated with the gamut of indices (hue, direction, etc.) are called phases, and the phase A_i is the union of the partition elements associated with index i:

A_i = ∪{D(x, i), x ∈ E}.    (A.2)

As each point x ∈ E must be associated with an index, there are only N − 1 independent index values: if we know the position of N − 1 phases, the position of the Nth is necessarily known. We therefore limit ourselves to the first N − 1 indices, and consider the relation between two indexed partitions D and D′, defined by

D ≤ D′ ⟺ { D ≤ D′ in the sense of connected partitions, and A_i ⊆ A′_i for i ∈ [1, 2, ..., N − 1] }.    (A.3)

The set D of partitions with N indices is the lattice produced from the N lattices associated with the orders of relation (A.3). This lattice is far from being unique, as the Nth phase plays a particular role, for which we could just as easily choose any of the other phases. Are the orders really all different? In particular, we are interested in the orders of the transformations, most importantly the increasing transformations. Let ψ : D → D be an increasing operation, which hence respects the N inequalities of system (A.3). We obviously have

{A_i ⊆ A′_i ⟹ ψ(A_i) ⊆ ψ(A′_i)} ⟺ {A_i ⊇ A′_i ⟹ ψ(A_i) ⊇ ψ(A′_i)}

and

A_i ⊆ A′_i for i ∈ [1, 2, ..., N − 1] ⟹ {A_N ⊇ A′_N ⟹ ψ(A_N) ⊇ ψ(A′_N)}.

Consequently, if the operator ψ is increasing for one of the lattices D, it is increasing for the others, which all play the same role. Hence the following proposition.

Proposition 17 Given an arbitrary space E and a finite family of N indices, the set D of indexed partitions on E is a complete lattice for every order
defined by the system (A.3), and for all those which are constructed from it by permutation of the indices or by reversing the direction of the inequalities. Every increasing operation ψ : D → D for one of these orders is increasing for all the others.
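The phase decomposition of Equations (A.1) and (A.2), and the redundancy of the Nth phase, can be illustrated directly. A sketch; the dictionary-based index map is this example's choice:

```python
def phases(labels, index_of, n):
    """Phase A_i in the sense of Eq. (A.2): the union, as a set of grid
    points, of the partition classes whose index M[D(x)] equals i."""
    a = [set() for _ in range(n)]
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            a[index_of[lab]].add((y, x))
    return a

labels = [[0, 0, 1],
          [2, 2, 1]]            # a connected partition with three classes
index_of = {0: 0, 1: 1, 2: 2}   # M: class label -> index
a = phases(labels, index_of, 3)
e = {(y, x) for y in range(2) for x in range(3)}
```

Since every point carries exactly one index, the Nth phase is fixed once the other N − 1 phases are known, which is why only N − 1 inclusions appear in relation (A.3).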
ACKNOWLEDGMENTS

The author wishes to thank his colleagues at the Centre for Mathematical Morphology, Paris School of Mines, France and at the PRIP group, Vienna University of Technology, Austria for the many useful discussions leading to the completion of this work. Particular thanks go to Jean Serra, Etienne Decencière and Walter Kropatsch. Part of the texture work presented was undertaken in collaboration with Dmitry Chetverikov of the Hungarian Academy of Sciences, Budapest, Hungary. Thanks also to Florence Boulc'h and Patricia Donnadieu of the Laboratoire de Thermodynamique et Physicochimie Métallurgique, Grenoble, France for the electron microscope images.
REFERENCES

Andersson, M. T., and Knutsson, H. (1992). Orientation estimation in ambiguous neighbourhoods, in Theory & Applications of Image Analysis, edited by P. Johansen and S. Olsen. River Edge, NJ: World Scientific Publishing Co., pp. 189–210.
Barnett, V. (1976). The ordering of multivariate data. J. Royal Statistical Society A 139(3), 318–354.
Bigün, J., Granlund, G. H., and Wiklund, J. (1991). Multidimensional orientation estimation with applications to texture analysis and optical flow. IEEE Trans. Pattern Analysis and Machine Intelligence 13(8), 775–790.
Bloch, I., and Maître, H. (1995). Fuzzy mathematical morphologies: A comparative study. Pattern Recognition 28(9), 1341–1387.
Boulc'h, F., Schouler, M.-C., Donnadieu, P., Chaix, J.-M., and Djurado, E. (2001). Domain size distribution of Y-TZP nano-particles using XRD and HRTEM. Image Analysis and Stereology 20, 157–161.
Brodatz, P. (1966). Textures: A Photographic Album for Artists and Designers. New York: Dover.
Carron, T. (1995). Segmentations d'images couleur dans la base Teinte-Luminance-Saturation: approche numérique et symbolique. Ph.D. thesis, Université de Savoie.
Chanussot, J. (1998). Approches vectorielles ou marginales pour le traitement d'images multicomposantes. Ph.D. thesis, Université de Savoie.
Chanussot, J., and Lambert, P. (1998). Total ordering based on space filling curves for multivalued morphology. Proc. International Symposium on Mathematical Morphology (ISMM '98), pp. 51–58.
Chetverikov, D. (1999). Texture analysis using feature-based pairwise interaction maps. Pattern Recognition 32, 487–502.
Chetverikov, D., and Hanbury, A. (2002). Finding defects in texture using regularity and local orientation. Pattern Recognition 35(10), 2165–2180.
Comer, M. L., and Delp, E. J. (1999). Morphological operations for color image processing. J. Electronic Imaging 8(3), 279–289.
Commission Internationale de l'Éclairage (1987). International Lighting Vocabulary, 4th Edition, No. 17.4. CIE.
Davies, E. R. (1997). Vectorial strategy for designing line segment detectors with high orientation accuracy. Electronics Lett. 33(21), 1775–1777.
Demarty, C.-H. (2000). Segmentation et structuration d'un document vidéo pour la caractérisation et l'indexation de son contenu sémantique. Ph.D. thesis, CMM, Ecole des Mines de Paris.
Demarty, C.-H., and Beucher, S. (1998). Color segmentation algorithm using an HLS transformation. Proc. International Symposium on Mathematical Morphology (ISMM '98), pp. 231–238.
Fisher, N. I. (1993). Statistical Analysis of Circular Data. Cambridge: Cambridge University Press.
Fisher, N. I., and Marron, J. S. (2001). Mode testing via the excess mass estimate. Biometrika 88, 499–517.
Freeman, W. T., and Adelson, E. H. (1991). The design and use of steerable filters. IEEE Trans. Pattern Analysis and Machine Intelligence 13(9), 891–906.
Gonzalez, R. C., and Woods, R. E. (1992). Digital Image Processing. Reading, MA: Addison-Wesley.
Hanbury, A. (2001). Lexicographical order in the HLS colour space. Tech. Rep. N-04/01/MM, CMM, Ecole des Mines de Paris.
Hanbury, A. (2002). Morphologie mathématique sur le cercle unité: avec applications aux teintes et aux textures orientées. Ph.D. thesis, CMM, Ecole des Mines de Paris.
Hanbury, A., and Serra, J. (2001a). Mathematical morphology in the HLS colour space. Proc. British Machine Vision Conference 2001, BMVA, pp. 451–460.
Hanbury, A., and Serra, J. (2001b). Mathematical morphology in the L*a*b* colour space. Tech. Rep. N-36/01/MM, CMM, Ecole des Mines de Paris.
Hanbury, A., and Serra, J. (2001c). Morphological operators on the unit circle. IEEE Trans. Image Processing 10(12), 1842–1850.
Hanbury, A., and Serra, J. (2002a). A 3D-polar coordinate colour representation suitable for image analysis. Tech. Rep. PRIP-TR-077, PRIP, T.U. Wien.
Hanbury, A., and Serra, J. (2002b). Analysis of oriented textures using mathematical morphology, in Vision with Non-Traditional Sensors. Austrian Computer Society, pp. 201–208.
Hanbury, A., and Serra, J. (2002c). Mathematical morphology in the CIELAB space. Image Analysis and Stereology 21, 201–206.
Heijmans, H. (1994). Morphological Image Operators. Boston: Academic Press.
Hÿtch, M. J., Snoeck, E., and Kilaas, R. (1998). Quantitative measurement of displacement and strain fields from HREM micrographs. Ultramicroscopy 74, 131–146.
ITU-R Recommendation BT.709 (1990). Basic parameter values for the HDTV standard for the studio and for international programme exchange. Geneva: ITU.
Iwanowski, M. (2000). Application de la morphologie mathématique pour l'interpolation d'images numériques. Ph.D. thesis, CMM, Ecole des Mines de Paris.
Kass, M., and Witkin, A. (1987). Analyzing oriented patterns. Computer Vision, Graphics and Image Processing 37, 362–385.
Kender, J. R. (1976). Saturation, hue and normalized color: Calculation, digitization effects, and use. Tech. Rep., Department of Computer Science, Carnegie-Mellon University.
Kim, C.-W., and Koivo, A. J. (1994). Hierarchical classification of surface defects on dusty wood boards. Pattern Recognition Lett. 15, 713–721.
Köppen, M., Nowack, C., and Rösel, G. (1999). Pareto-morphology for color image processing. Proc. 11th Scandinavian Conference on Image Analysis 1, 195–202.
Levkowitz, H., and Herman, G. T. (1993). GLHS: A generalised lightness, hue and saturation color model. CVGIP: Graphical Models and Image Processing 55(4), 271–285.
Louverdis, G., Vardavoulia, M. I., Andreadis, I., and Tsalides, P. (2002). A new approach to morphological color image processing. Pattern Recognition 35, 1733–1741.
Mallat, S. (1998). A Wavelet Tour of Signal Processing. London: Academic Press.
Mardia, K. V., and Jupp, P. E. (1999). Directional Statistics, 2nd Edition. Chichester: John Wiley.
Meyer, F. (1999). Graph based morphological segmentation. Proc. 2nd IAPR-TC-15 Workshop on Graph-based Representations, 51–60.
Mlynarczuk, M., Serra, J., Bailly, F., and Bouchet, S. (1998). Segmentation de lames minces polarisées. Tech. Rep. N-48/98/MM, CMM, Ecole des Mines de Paris.
Nickels, K. M., and Hutchinson, S. (1997). Textured image segmentation: Returning multiple solutions. Image and Vision Computing 15, 781–795.
Nikolaidis, N., and Pitas, I. (1998). Nonlinear processing and analysis of angular signals. IEEE Trans. Signal Processing 46(12), 3181–3194.
Niskanen, M., Silvén, O., and Kauppinen, H. (2001). Experiments with SOM based inspection of wood. Proc. International Conference on Quality Control by Artificial Vision (QCAV'2001), pp. 311–316.
Novak, C. L., Shafer, S. A., and Willson, R. G. (1992). Obtaining accurate color images for machine vision research, in Color, edited by G. E. Healey, S. A. Shafer, and L. B. Wolff. Jones and Bartlett, pp. 13–27.
Ortiz, F., Torres, F., Angulo, J., and Puente, S. (2001). Comparative study of vectorial morphological operations in different color spaces. Proc. SPIE on Intelligent Robots and Computer Vision (IRCV) XX: Algorithms, Techniques and Active Vision 4572, 259–268.
Peters II, R. A. (1997). Mathematical morphology for angle-valued images, in Non-Linear Image Processing VIII. SPIE, Vol. 3026.
Picard, R. W., and Gorkani, M. (1994). Finding perceptually dominant orientations in natural textures. Spatial Vision 8(2), 221–253.
Pitas, I., and Tsakalides, P. (1991). Multivariate ordering in color image filtering. IEEE Trans. Circuits and Systems for Video Technology 1(3), 247–259.
Poynton, C. (1999). Frequently asked questions about gamma. URL: http://www.inforamp.net/poynton/PDFs/GammaFAQ.pdf.
Rao, A. R. (1990). A Taxonomy for Texture Description and Identification. New York: Springer-Verlag.
Rao, A. R., and Schunck, B. G. (1991). Computing oriented texture fields. CVGIP: Graphical Models and Image Processing 53(2), 157–185.
Serra, J. (1982). Image Analysis and Mathematical Morphology. London: Academic Press.
Serra, J. (1988). Image Analysis and Mathematical Morphology. Volume 2: Theoretical Advances. London: Academic Press.
Serra, J. (1992). Anamorphosis and function lattices (multivalued morphology), in Mathematical Morphology in Image Processing, edited by E. R. Dougherty. New York: Marcel Dekker, pp. 483–523, Chapter 13.
Shih, T.-Y. (1995). The reversibility of six geometric color spaces. Photogrammetric Engineering and Remote Sensing 61(10), 1223–1232.
Silvén, O., and Kauppinen, H. (1996). Recent developments in wood inspection. Int. J. Pattern Recognition and Artificial Intelligence 10(1), 83–95.
Smith, A. R. (1978). Color gamut transform pairs. Computer Graphics 12(3), 12–19.
Smith, J. R. (1997). Integrated spatial and feature image systems: Retrieval, compression and analysis. Ph.D. thesis, Columbia University.
Soille, P. (1999). Morphological Image Analysis: Principles and Applications. Springer-Verlag.
Talbot, H., Evans, C., and Jones, R. (1998). Complete ordering and multivariate mathematical morphology: Algorithms and applications. Proc. International Symposium on Mathematical Morphology (ISMM'98), pp. 27–34.
Vardavoulia, M. I., Andreadis, I., and Tsalides, P. (2001). A new vector median filter for colour image processing. Pattern Recognition Lett. 22(6–7), 675–689.
Weeks, A. R., and Sartor, L. J. (1999). Color morphological operators using conditional and reduced ordering. Proc. SPIE Conference on Applications of Digital Image Processing XXII. SPIE 3808, 358–366.
Wyszecki, G., and Stiles, W. S. (1982). Color Science: Concepts and Methods, Quantitative Data and Formulae, 2nd Edition. New York: John Wiley.
Zhang, C., and Wang, P. (2000). A new method of color image segmentation based on intensity and hue clustering. Proc. 15th ICPR Barcelona 3, 617–620.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 128
Quantum Tomography

G. MAURO D'ARIANO, MATTEO G. A. PARIS, and MASSIMILIANO F. SACCHI

Quantum Optics and Information Group, Istituto Nazionale per la Fisica della Materia, Unità di Pavia, Dipartimento di Fisica ‘‘A. Volta,’’ Università di Pavia, Italy
I. Introduction ............ 206
II. Wigner Functions and Elements of Detection Theory ............ 209
   A. Wigner Functions ............ 210
   B. Photodetection ............ 213
   C. Balanced Homodyne Detection ............ 215
   D. Heterodyne Detection ............ 218
III. General Tomographic Method ............ 222
   A. Brief Historical Excursus ............ 223
   B. Conventional Tomographic Imaging ............ 224
      1. Extension to the Quantum Domain ............ 225
   C. General Method of Quantum Tomography ............ 227
      1. Basic Statistics ............ 227
      2. Characterization of the Quorum ............ 229
      3. Quantum Estimation for Harmonic Oscillator Systems ............ 232
      4. Some Generalizations ............ 235
      5. Quantum Estimation for Spin Systems ............ 237
      6. Quantum Estimation for a Free Particle ............ 239
   D. Noise Deconvolution and Adaptive Tomography ............ 239
      1. Noise Deconvolution ............ 240
      2. Adaptive Tomography ............ 241
IV. Universal Homodyning ............ 243
   A. Homodyning Observables ............ 243
   B. Noise in Tomographic Measurements ............ 246
      1. Field Intensity ............ 248
      2. Real Field ............ 249
      3. Field Amplitude ............ 249
      4. Phase ............ 251
   C. Comparison between Homodyne Tomography and Heterodyning ............ 253
V. Multimode Homodyne Tomography ............ 255
   A. The General Method ............ 256
      1. Numerical Results for Two-Mode Fields ............ 260
VI. Applications to Quantum Measurements ............ 265
   A. Measuring the Nonclassicality of a Quantum State ............ 266
      1. Single-Mode Nonclassicality ............ 267
      2. Two-Mode Nonclassicality ............ 270
   B. Test of State Reduction ............ 272
   C. Tomography of Coherent Signals and Applications ............ 277
Copyright © 2003, Elsevier Inc. All rights reserved. 1076-5670/2003 $35.00
VII. Tomography of a Quantum Device ............ 281
   A. The Method ............ 283
   B. An Example in the Optical Domain ............ 285
VIII. Maximum Likelihood Method in Quantum Estimation ............ 287
   A. Maximum Likelihood Principle ............ 288
   B. ML Quantum State Estimation ............ 289
   C. Gaussian State Estimation ............ 295
IX. Classical Imaging by Quantum Tomography ............ 298
   A. From Classical to Quantum Imaging ............ 299
Acknowledgements ............ 305
References ............ 305
I. INTRODUCTION

The state of a physical system is the mathematical description that provides complete information on the system. Its knowledge is equivalent to knowing the result of any possible measurement on the system. In classical mechanics it is always possible, at least in principle, to devise a procedure made of multiple measurements which fully recovers the state of the system. In quantum mechanics, on the contrary, this is not possible, due to the fundamental limitations related to the Heisenberg uncertainty principle [1,2] and the no-cloning theorem [3]. In fact, on the one hand one cannot perform an arbitrary sequence of measurements on a single system without inducing on it a back-action of some sort. On the other hand, the no-cloning theorem forbids one to create a perfect copy of the system without already knowing its state in advance. Thus, there is no way, not even in principle, to infer the quantum state of a single system without having some prior knowledge of it [4].

It is possible to estimate the unknown quantum state of a system when many identical copies are available in the same state, so that a different measurement can be performed on each copy. Such a procedure is called quantum tomography. The problem of finding a procedure to determine the state of a system from multiple copies was first addressed in 1957 by Fano [5], who called quorum a set of observables sufficient for a complete determination of the density matrix. However, since for a particle it is difficult to devise concretely measurable observables other than position, momentum, and energy, the fundamental problem of measuring the quantum state remained at the level of mere speculation until almost 10 years ago, when the issue finally entered the realm of experimental physics with the pioneering experiments by Raymer's group [6] in the domain of quantum optics. In quantum optics, in fact, using a
QUANTUM TOMOGRAPHY
balanced homodyne detector one has the unique opportunity of measuring all possible linear combinations of position and momentum of a harmonic oscillator, which here represents a single mode of the electromagnetic field. The first technique to reconstruct the density matrix from homodyne measurements—so-called homodyne tomography—originated from the observation by Vogel and Risken [7] that the collection of probability distributions achieved by homodyne detection is just the Radon transform of the Wigner function W. Therefore, as in classical imaging, by Radon transform inversion one can obtain W, and then from W the matrix elements of the density operator. This first method, however, was affected by uncontrollable approximations, since arbitrary smoothing parameters are needed for the inverse Radon transform. In Ref. [8] the first exact technique was given for measuring experimentally the matrix elements of the density operator in the photon-number representation, by simply averaging functions of homodyne data. After that, the method was further simplified [9], and its feasibility for nonunit quantum efficiency of detectors—above some bounds—was established. The exact homodyne method has been implemented experimentally to measure the photon statistics of a semiconductor laser [10], and the density matrix of a squeezed vacuum [11]. The success of optical homodyne tomography has since stimulated the development of state-reconstruction procedures for atomic beams [12], the experimental determination of the vibrational state of a molecule [13], of an ensemble of helium atoms [14], and of a single ion in a Paul trap [15]. Through quantum tomography the state is perfectly recovered in the limit of an infinite number of measurements, while in the practical finite-measurements case one can always estimate the statistical error that affects the reconstruction.
For infinite dimensions the propagation of statistical errors of the density matrix elements makes them useless for estimating the ensemble average of unbounded operators, and a method for estimating the ensemble average of arbitrary observables of the field without using the density matrix elements has been derived [16]. Further insight on the general method of state reconstruction has led to the generalization of homodyne tomography to any number of modes [17], and then to the extension of the tomographic method from the harmonic oscillator to an arbitrary quantum system using group theory [18–21]. A general data analysis method has been designed in order to unbias the estimation procedure from any known instrumental noise [20]. Moreover, algorithms have been engineered to improve the statistical errors on a given sample of experimental data—the so-called adaptive tomography [22]—and then max-likelihood strategies [23] have been used that dramatically improved the statistical errors; however, this
MAURO D’ARIANO ET AL.
has been at the expense of some bias in the infinite dimensional case, and of exponential complexity versus N for the joint tomography of N quantum systems. The latest technical developments [24] derive the general tomographic method from spanning sets of operators, the previous group theoretical approaches [18–21] being just a particular case of this general method, where the group representation is simply a device to find suitable operator "orthogonality" and "completeness" relations in the linear algebra of operators. Finally, a method for tomographic estimation of the unknown quantum operation of a quantum device has been derived [25]. It uses a single fixed input entangled state, which plays the role of all possible input states in quantum parallel on the tested device, finally making the method a true "quantum radiography" of the functioning of the device. In this chapter we give a self-contained and unified derivation of the methods of quantum tomography, with examples of applications to different kinds of quantum systems, and with particular focus on quantum optics, where some results from experiments are also reexamined. The chapter is organized as follows. In Section II we introduce the generalized Wigner functions [26,27] and we provide the basic elements of detection theory in quantum optics, by giving the description of photodetection, homodyne detection, and heterodyne detection. As we will see, heterodyne detection also provides a method for estimating the ensemble average of polynomials in the field operators; however, it is unsuitable for the density matrix elements in the photon-number representation. The effect of nonunit quantum efficiency is taken into account for all such detection schemes. In Section III we give a brief history of quantum tomography, starting with the first proposal of Vogel and Risken [7] as the extension to the domain of quantum optics of conventional tomographic imaging.
As already mentioned, this method indirectly recovers the state of the system through the reconstruction of the Wigner function, and is affected by uncontrollable bias. The exact homodyne tomography method of Ref. [8] (successively simplified in Ref. [9]) is here presented on the basis of the general tomographic method of spanning sets of operators of Ref. [24]. As another application of the general method, the tomography of spin systems [28] is derived from the group theoretical method of Refs. [18–20]. In this section we also include further developments to improve the method, such as the deconvolution techniques of Ref. [20] to correct the effects of experimental noise by data processing, and the adaptive tomography of Ref. [22] to reduce the statistical fluctuations of tomographic estimators. Section IV is devoted to the evaluation from Ref. [16] of the expectation value of arbitrary operators of a single-mode radiation field via homodyne tomography. Here we also report from Ref. [29] the estimation of the
added noise with respect to the perfect measurement of field observables, for some relevant observables, along with a comparison with the noise that would have been obtained using heterodyne detection. The generalization of Ref. [17] of homodyne tomography to many modes of radiation is reviewed in Section V, where it is shown how tomography of a multimode field can be performed by using only a single local oscillator with a tunable field mode. Some results of Monte Carlo simulations from Ref. [17] are also shown for the state that describes light from parametric downconversion. Section VI reviews some applications of quantum homodyne tomography to perform fundamental tests of quantum mechanics. The first is the proposal of Ref. [30] to measure the nonclassicality of the radiation field. The second is the scheme of Ref. [31] to test the state reduction rule using light from parametric downconversion. Finally, we review some experimental results on tomography of coherent signals, with applications to the estimation of losses introduced by simple optical components [32]. Section VII reviews the tomographic method of Ref. [25] to reconstruct the quantum operation of a device, such as an amplifier or a measuring device, using a single fixed input entangled state, which plays the role of all possible input states in a quantum parallel fashion. Section VIII is devoted to the reconstruction technique of Ref. [23] based on the maximum likelihood principle. As mentioned, for infinite dimensions this method is necessarily biased; however, it is more suited to the estimation of a finite number of parameters, as proposed in Ref. [33], or to state determination in the presence of a very small number of experimental data [23]. Unfortunately, the algorithm of this method has exponential complexity versus the number of quantum systems for the joint tomography of many systems. Finally, in Section IX we briefly review Ref.
[34], showing how quantum tomography could be profitably used as a tool for reconstruction and compression in classical imaging.
II. WIGNER FUNCTIONS AND ELEMENTS OF DETECTION THEORY

In this section we review some simple formulas from Ref. [35] that connect the generalized Wigner functions for s-ordering with the density matrix, and vice versa. These formulas prove very useful for quantum mechanical applications as, for example, for connecting master equations with Fokker–Planck equations, or for evaluating the quantum state from Monte Carlo simulations of Fokker–Planck equations, and finally for studying positivity
of the generalized Wigner functions in the complex plane. Moreover, as we will show in Section III, the first proposal of quantum state reconstruction [7] used the Wigner function as an intermediate step. In the second part of the section we evaluate the probability distribution of the photocurrent of photodetectors, balanced homodyne detectors, and heterodyne detectors. We show that under suitable limits the respective photocurrents provide the measurement of the photon number distribution, of the quadrature, and of the complex amplitude of a single mode of the electromagnetic field. When the effect of nonunit quantum efficiency is taken into account an additional noise affects the measurement, giving a Bernoulli convolution for photodetection, and a Gaussian convolution for homodyne and heterodyne detection. Extensive use of the results in this section will be made in subsequent sections devoted to quantum homodyne tomography.

A. Wigner Functions

Since Wigner's pioneering work [26], generalized phase-space techniques have proved very useful in various branches of physics [36]. As a method to express the density operator in terms of c-number functions, the Wigner functions often lead to considerable simplification of the quantum equations of motion, as, for example, for transforming master equations in operator form into more amenable Fokker–Planck differential equations (see, for example, Ref. [37]). Using the Wigner function one can express quantum mechanical expectation values in the form of averages over the complex plane (the classical phase space), the Wigner function playing the role of a c-number quasiprobability distribution, which generally can also take negative values. More precisely, the original Wigner function allows one to easily evaluate expectations of symmetrically ordered products of the field operators, corresponding to Weyl's quantization procedure [38].
However, with a slight change of the original definition, one defines the generalized s-ordered Wigner function $W_s(\alpha, \alpha^*)$ as follows [27]

$$W_s(\alpha, \alpha^*) = \int_C \frac{d^2\lambda}{\pi^2}\, e^{\alpha\lambda^* - \alpha^*\lambda + (s/2)|\lambda|^2}\, \mathrm{Tr}[D(\lambda)\,\rho], \qquad (1)$$

where $\alpha^*$ denotes the complex conjugate of $\alpha$, the integral is performed on the complex plane with measure $d^2\lambda = d\,\mathrm{Re}\,\lambda\; d\,\mathrm{Im}\,\lambda$, $\rho$ represents the density operator, and

$$D(\alpha) \equiv \exp(\alpha a^\dagger - \alpha^* a) \qquad (2)$$
denotes the displacement operator, where $a$ and $a^\dagger$ ($[a, a^\dagger] = 1$) are the annihilation and creation operators of the field mode of interest. The Wigner functions in Equation (1) allow one to evaluate s-ordered expectation values of the field operators through the following relation

$$\mathrm{Tr}\!\left[\,{:}(a^\dagger)^n a^m{:}_s\; \rho\,\right] = \int_C d^2\alpha\; W_s(\alpha, \alpha^*)\, \alpha^{*n} \alpha^m. \qquad (3)$$
The particular cases $s = -1, 0, 1$ correspond to antinormal, symmetrical, and normal ordering, respectively. In these cases the generalized Wigner functions $W_s(\alpha, \alpha^*)$ are usually denoted by the following symbols and names

$$\frac{1}{\pi} Q(\alpha, \alpha^*) \;\; \text{for } s = -1 \;\; (\text{``Q function''}), \qquad W(\alpha, \alpha^*) \;\; \text{for } s = 0 \;\; (\text{usual Wigner function}), \qquad P(\alpha, \alpha^*) \;\; \text{for } s = 1 \;\; (\text{``P function''}). \qquad (4)$$

For the normal ($s = 1$) and antinormal ($s = -1$) orderings, the following simple relations with the density matrix are well known

$$Q(\alpha, \alpha^*) \equiv \langle \alpha | \rho | \alpha \rangle, \qquad (5)$$

$$\rho = \int_C d^2\alpha\; P(\alpha, \alpha^*)\, |\alpha\rangle\langle\alpha|, \qquad (6)$$
where $|\alpha\rangle$ denotes the customary coherent state $|\alpha\rangle = D(\alpha)|0\rangle$, $|0\rangle$ being the vacuum state of the field. Among the three particular representations (4), the Q function is positive definite and infinitely differentiable (it actually represents the probability distribution for ideal joint measurements of position and momentum of the harmonic oscillator; see Section II.D). On the other hand, the P function is known to be possibly highly singular, and the only pure states for which it is positive are the coherent states [39]. Finally, the usual Wigner function has the remarkable property of providing the probability distribution of the quadratures of the field in the form of a marginal distribution, namely

$$\int_{-\infty}^{+\infty} d\,\mathrm{Im}\,\alpha\; W(\alpha e^{i\varphi}, \alpha^* e^{-i\varphi}) = {}_\varphi\langle \mathrm{Re}\,\alpha |\, \rho\, | \mathrm{Re}\,\alpha \rangle_\varphi, \qquad (7)$$

where $|x\rangle_\varphi$ denotes the (unnormalizable) eigenstate of the field quadrature

$$X_\varphi = \frac{a^\dagger e^{i\varphi} + a\, e^{-i\varphi}}{2} \qquad (8)$$
with real eigenvalue $x$. Notice that any couple of quadratures $X_\varphi$, $X_{\varphi+\pi/2}$ is canonically conjugate, namely $[X_\varphi, X_{\varphi+\pi/2}] = i/2$, and is equivalent to position and momentum of a harmonic oscillator. Usually, negative values of the Wigner function are viewed as a signature of a nonclassical state, the most eloquent example being the Schrödinger-cat state [40], whose Wigner function is characterized by rapid oscillations around the origin of the complex plane. From Equation (1) one can notice that all s-ordered Wigner functions are related to each other through a Gaussian convolution

$$W_s(\alpha, \alpha^*) = \int_C d^2\beta\; W_{s'}(\beta, \beta^*)\, \frac{2}{\pi(s' - s)} \exp\!\left( -\frac{2}{s' - s}\, |\alpha - \beta|^2 \right) \qquad (9)$$

$$= \exp\!\left( \frac{s' - s}{2} \frac{\partial^2}{\partial\alpha\, \partial\alpha^*} \right) W_{s'}(\alpha, \alpha^*), \qquad (s' > s). \qquad (10)$$
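For the vacuum state the s-ordered Wigner functions (for $s < 1$) are the Gaussians $W_s(\alpha, \alpha^*) = [2/(\pi(1-s))] \exp[-2|\alpha|^2/(1-s)]$, so the convolution (9) simply adds Gaussian widths: $(1-s')/2 + (s'-s)/2 = (1-s)/2$. The following Monte Carlo sketch (not part of the original text; plain NumPy, with the illustrative choice $s' = 0$, $s = -1$, i.e., smoothing the usual Wigner function into the Q function) checks this width addition:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000
s_prime, s = 0.0, -1.0   # smooth the Wigner function (s'=0) into the Q function (s=-1)

def complex_normal(var, size):
    """Circular complex Gaussian samples with E|z|^2 = var."""
    return rng.normal(0.0, np.sqrt(var / 2), size) + 1j * rng.normal(0.0, np.sqrt(var / 2), size)

beta = complex_normal((1 - s_prime) / 2, N)    # samples distributed as the vacuum W_{s'}
noise = complex_normal((s_prime - s) / 2, N)   # Gaussian kernel of Equation (9)
alpha = beta + noise                           # samples distributed as the vacuum W_s

# The second moment should equal the vacuum W_s width (1 - s)/2 = 1 for s = -1.
print(np.mean(np.abs(alpha) ** 2))
```

The same sampling picture underlies the statement below that the smoothed functions stay positive: adding independent Gaussian noise to samples of a probability density always yields another probability density.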
Equation (9) shows the positivity of the generalized Wigner function for $s \le -1$, as a consequence of the positivity of the Q function. From a qualitative point of view, the maximum value of $s$ that keeps the generalized Wigner function positive can be considered as an indication of the classical nature of the physical state [41]. An equivalent expression for $W_s(\alpha, \alpha^*)$ can be derived as follows [35]. Equation (1) can be rewritten as

$$W_s(\alpha, \alpha^*) = \mathrm{Tr}\!\left[ \rho\, D(\alpha)\, \hat{W}_s\, D^\dagger(\alpha) \right], \qquad (11)$$

where

$$\hat{W}_s = \int_C \frac{d^2\lambda}{\pi^2}\, e^{(s/2)|\lambda|^2}\, D(\lambda). \qquad (12)$$
Through the customary Baker–Campbell–Hausdorff (BCH) formula

$$\exp A\, \exp B = \exp\!\left( A + B + \tfrac{1}{2}[A, B] \right), \qquad (13)$$

which holds when $[A, [A, B]] = [B, [A, B]] = 0$, one writes the displacement in normal order, and integrating on $\arg(\lambda)$ and $|\lambda|$ one obtains

$$\hat{W}_s = \frac{2}{\pi(1-s)} \sum_{n=0}^{\infty} \frac{1}{n!} \left( \frac{2}{s-1} \right)^{\!n} (a^\dagger)^n a^n = \frac{2}{\pi(1-s)} \left( \frac{s+1}{s-1} \right)^{\!a^\dagger a}, \qquad (14)$$
where we used the normal-ordered forms

$${:}(a^\dagger a)^n{:} = (a^\dagger)^n a^n = a^\dagger a\, (a^\dagger a - 1) \cdots (a^\dagger a - n + 1), \qquad (15)$$

and the identity

$${:}e^{-x a^\dagger a}{:} = \sum_{l=0}^{\infty} \frac{(-x)^l}{l!}\, (a^\dagger)^l a^l = (1 - x)^{a^\dagger a}. \qquad (16)$$
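Since $(a^\dagger)^l a^l |n\rangle = [n!/(n-l)!]\, |n\rangle$ vanishes for $l > n$, the identity (16) reduces to a finite binomial sum on each Fock state and can be checked exactly in a truncated number basis. A small numerical sketch (not part of the original text):

```python
import numpy as np
from math import factorial

x, dim = 0.3, 12   # test value and Fock-space truncation

# Diagonal of (a^+)^l a^l on |n> is n!/(n-l)!, so the sum in (16) stops at l = n.
lhs = np.array([sum((-x) ** l / factorial(l) * (factorial(n) // factorial(n - l))
                    for l in range(n + 1)) for n in range(dim)])

rhs = (1 - x) ** np.arange(dim)   # diagonal of (1 - x)^(a^+ a)
print(np.allclose(lhs, rhs))      # True
```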
The density matrix can be recovered from the generalized Wigner functions using the following expression

$$\rho = \frac{2}{1+s} \int_C d^2\alpha\; W_s(\alpha, \alpha^*)\, e^{-(2/(1+s))|\alpha|^2}\, e^{(2\alpha/(1+s))\, a^\dagger} \left( \frac{s-1}{s+1} \right)^{\!a^\dagger a} e^{(2\alpha^*/(1+s))\, a}. \qquad (17)$$

For the proof of Equation (17) the reader is referred to Ref. [35]. In particular, for $s = 0$ one has the inverse of the Glauber formula

$$\rho = 2 \int_C d^2\alpha\; W(\alpha, \alpha^*)\, D(2\alpha)\, (-1)^{a^\dagger a}, \qquad (18)$$
whereas for $s = 1$ one recovers Equation (6) that defines the P function.

B. Photodetection

Light is revealed by exploiting its interaction with atoms/molecules or electrons in a solid: essentially, each photon ionizes a single atom or promotes an electron to a conduction band, and the resulting charge is then amplified to produce a measurable pulse. In practice, however, available photodetectors do not ideally count all photons, and their performance is limited by a nonunit quantum efficiency $\zeta$, namely only a fraction $\zeta$ of the incoming photons leads to an electric signal, and ultimately to a count: some photons are either reflected from the surface of the detector, or are absorbed without being transformed into electric pulses. Let us consider a light beam entering a photodetector of quantum efficiency $\zeta$, i.e., a detector that transforms just a fraction $\zeta$ of the incoming light pulse into electric signal. If the detector is small with respect to the coherence length of radiation and its window is open for a time interval T, then the Poissonian process of counting gives a probability $p(m; T)$ of revealing $m$ photons written as [42]

$$p(m; T) = \mathrm{Tr}\!\left[ \rho\; {:}\frac{[\zeta I(T) T]^m}{m!} \exp[-\zeta I(T) T]{:} \right], \qquad (19)$$
where $\rho$ is the quantum state of light, ${:}\;{:}$ denotes the normal ordering of field operators, and $I(T)$ is the beam intensity

$$I(T) = \frac{2\epsilon_0 c}{T} \int_0^T \mathbf{E}^{(-)}(\mathbf{r}, t) \cdot \mathbf{E}^{(+)}(\mathbf{r}, t)\, dt, \qquad (20)$$

given in terms of the positive (negative) frequency part of the electric field operator $\mathbf{E}^{(+)}(\mathbf{r}, t)$ ($\mathbf{E}^{(-)}(\mathbf{r}, t)$). The quantity $p(t)\, dt = \zeta\, \mathrm{Tr}[\rho\, I(t)]\, dt$ equals the probability of a single count during the time interval $(t, t + dt)$. Let us now focus our attention on the case of the radiation field excited in a stationary state of a single mode at frequency $\omega$. Equation (19) can be rewritten as

$$p_\eta(m) = \mathrm{Tr}\!\left[ \rho\; {:}\frac{(\eta\, a^\dagger a)^m}{m!} \exp(-\eta\, a^\dagger a){:} \right], \qquad (21)$$
where the parameter $\eta = \zeta\, c\hbar\omega T/V$ denotes the overall quantum efficiency of the photodetector. Using Equations (15) and (16) one obtains

$$p_\eta(m) = \sum_{n=m}^{\infty} \rho_{nn}\, \binom{n}{m}\, \eta^m (1 - \eta)^{n-m}, \qquad (22)$$
where

$$\rho_{nn} \equiv \langle n | \rho | n \rangle = p_{\eta=1}(n). \qquad (23)$$
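The Bernoulli convolution (22) is easy to evaluate numerically. A sketch (not part of the original text) for the illustrative case of a coherent state, whose photon distribution $\rho_{nn}$ is Poissonian; in this case Equation (22) yields again a Poissonian, with mean reduced from $\bar{n}$ to $\eta\bar{n}$:

```python
import numpy as np
from math import comb, exp, factorial

eta, nbar, dim = 0.6, 2.0, 60   # efficiency, mean photon number, truncation

# rho_nn for a coherent state: Poissonian with mean nbar
rho_nn = np.array([exp(-nbar) * nbar ** n / factorial(n) for n in range(dim)])

# Equation (22): p_eta(m) = sum_{n >= m} rho_nn C(n, m) eta^m (1 - eta)^(n - m)
p = np.array([sum(rho_nn[n] * comb(n, m) * eta ** m * (1 - eta) ** (n - m)
                  for n in range(m, dim)) for m in range(dim)])

# Bernoulli thinning of a Poissonian is again Poissonian, with mean eta * nbar.
print(np.isclose(p @ np.arange(dim), eta * nbar))   # True
```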
Hence, for unit quantum efficiency a photodetector measures the photon number distribution of the state, whereas for nonunit quantum efficiency the output distribution of counts is given by a Bernoulli convolution of the ideal distribution. The effects of nonunit quantum efficiency on the statistics of a photodetector, i.e., Equation (22) for the output distribution, can also be described by means of a simple model in which the realistic photodetector is replaced with an ideal photodetector preceded by a beam splitter of transmissivity $\eta$. The reflected mode is absorbed, whereas the transmitted mode is photodetected with unit quantum efficiency. In order to obtain the probability of measuring $m$ clicks, notice that, apart from trivial phase changes, a beam splitter of transmissivity $\eta$ performs the unitary transformation of fields

$$\binom{c}{d} = U^\dagger \binom{a}{b} U = \begin{pmatrix} \sqrt{\eta} & -\sqrt{1-\eta} \\ \sqrt{1-\eta} & \sqrt{\eta} \end{pmatrix} \binom{a}{b}, \qquad (24)$$
where all field modes are considered at the same frequency. Hence, the output mode $c$ hitting the detector is given by the linear combination

$$c = \sqrt{\eta}\, a - \sqrt{1-\eta}\, b, \qquad (25)$$

and the probability of counts reads

$$p_\eta(m) = \mathrm{Tr}\!\left[ U (\rho \otimes |0\rangle\langle 0|) U^\dagger\, (|m\rangle\langle m| \otimes \mathbb{1}) \right] = \sum_{n=m}^{\infty} \rho_{nn}\, \binom{n}{m}\, \eta^m (1-\eta)^{n-m}. \qquad (26)$$
Equation (26) reproduces the probability distribution of Equation (22), with the beam splitter transmissivity playing the role of the quantum efficiency $\eta$. We conclude that a photodetector of quantum efficiency $\eta$ is equivalent to a perfect photodetector preceded by a beam splitter of transmissivity $\eta$, which accounts for the overall losses of the detection process.

C. Balanced Homodyne Detection

The balanced homodyne detector provides the measurement of the quadrature of the field $X_\varphi$ in Equation (8). It was proposed by Yuen and Chan [43], and subsequently demonstrated by Abbas et al. [44]. The scheme of a balanced homodyne detector is depicted in Figure 1. The signal mode $a$ interferes with a strong laser beam mode $b$ in a balanced 50/50 beam splitter. The mode $b$ is the so-called local oscillator (LO) mode of the detector. It operates at the same frequency as $a$, and is excited by the laser in a strong coherent state $|z\rangle$. Since in all experiments that use homodyne detectors the signal and the LO beams are generated by a common source, we assume that they have a fixed phase relation. In this case the LO phase provides a reference for the quadrature measurement, namely we identify the phase of the LO with the phase difference between the two modes. As we will see, by tuning $\varphi = \arg z$ we can measure the quadrature $X_\varphi$ at different phases. After the beam splitter the two modes are detected by two identical photodetectors (usually linear avalanche photodiodes), and finally the difference of photocurrents at zero frequency is electronically processed and rescaled by $2|z|$. According to Equation (24), the modes at the output of the 50/50 beam splitter ($\eta = 1/2$) are written

$$c = \frac{a - b}{\sqrt{2}}, \qquad d = \frac{a + b}{\sqrt{2}}, \qquad (27)$$
FIGURE 1. Scheme of the balanced homodyne detector.
hence the difference of photocurrents is given by the following operator

$$I = \frac{d^\dagger d - c^\dagger c}{2|z|} = \frac{a^\dagger b + b^\dagger a}{2|z|}. \qquad (28)$$
Let us now proceed to evaluate the probability distribution of the output photocurrent $I$ for a generic state $\rho$ of the signal mode $a$. In the following treatment we will follow Refs. [45,46]. Let us consider the moment generating function of the photocurrent $I$

$$\chi(\lambda) = \mathrm{Tr}\!\left[ \rho \otimes |z\rangle\langle z|\; e^{i\lambda I} \right], \qquad (29)$$
which provides the probability distribution of $I$ as the Fourier transform

$$P(I) = \int_{-\infty}^{+\infty} \frac{d\lambda}{2\pi}\, e^{-i\lambda I}\, \chi(\lambda). \qquad (30)$$
Using the BCH formula [47,48] for the SU(2) group, namely

$$\exp\!\left( \zeta a b^\dagger - \zeta^* a^\dagger b \right) = e^{\kappa a b^\dagger}\, \left( 1 + |\kappa|^2 \right)^{(1/2)(b^\dagger b - a^\dagger a)}\, e^{-\kappa^* a^\dagger b}, \qquad \kappa = \frac{\zeta}{|\zeta|} \tan|\zeta|, \qquad (31)$$

one can write the exponential in Equation (29) in normal-ordered form with respect to mode $b$ as follows

$$\chi(\lambda) = \left\langle e^{i\tan(\lambda/(2|z|))\, b^\dagger a}\, \left[ \cos\!\left( \frac{\lambda}{2|z|} \right) \right]^{a^\dagger a - b^\dagger b}\, e^{i\tan(\lambda/(2|z|))\, a^\dagger b} \right\rangle_{ab}. \qquad (32)$$
Since mode $b$ is in a coherent state $|z\rangle$ the partial trace over $b$ can be evaluated as follows

$$\chi(\lambda) = \left\langle e^{i\tan(\lambda/(2|z|))\, z\, a^\dagger}\, \left[ \cos\!\left( \frac{\lambda}{2|z|} \right) \right]^{a^\dagger a}\, e^{i\tan(\lambda/(2|z|))\, z^* a} \right\rangle_a \left\langle z \left|\, \left[ \cos\!\left( \frac{\lambda}{2|z|} \right) \right]^{b^\dagger b} \right| z \right\rangle. \qquad (33)$$

Using now Equation (13), one can rewrite Equation (33) in normal order with respect to $a$, namely

$$\chi(\lambda) = \left\langle e^{iz\sin(\lambda/(2|z|))\, a^\dagger}\, e^{iz^*\sin(\lambda/(2|z|))\, a}\, \exp\!\left[ -2\sin^2\!\left( \frac{\lambda}{4|z|} \right) \left( a^\dagger a + |z|^2 \right) \right] \right\rangle_a. \qquad (34)$$

In the strong-LO limit $z \to \infty$, only the lowest order terms in $\lambda/|z|$ are retained, $a^\dagger a$ is neglected with respect to $|z|^2$, and Equation (34) simplifies as follows

$$\lim_{z\to\infty} \chi(\lambda) = \left\langle e^{i(\lambda/2)e^{i\varphi}\, a^\dagger}\, e^{i(\lambda/2)e^{-i\varphi}\, a}\, e^{-\lambda^2/8} \right\rangle_a = \left\langle \exp(i\lambda X_\varphi) \right\rangle_a, \qquad (35)$$

where $\varphi = \arg z$. The generating function in Equation (35) is then equivalent to the positive operator-valued measure (POVM)

$$\Pi(x) = \int_{-\infty}^{+\infty} \frac{d\lambda}{2\pi}\, \exp[i\lambda(X_\varphi - x)] = \delta(X_\varphi - x) \equiv |x\rangle_{\varphi\,\varphi}\langle x|, \qquad (36)$$
namely the projector on the eigenstate of the quadrature $X_\varphi$ with eigenvalue $x$. In conclusion, the balanced homodyne detector achieves the ideal measurement of the quadrature $X_\varphi$ in the strong LO limit. In this limit, the probability distribution of the output photocurrent $I$ approaches exactly the probability distribution $p(x, \varphi) = {}_\varphi\langle x|\rho|x\rangle_\varphi$ of the quadrature $X_\varphi$, and this for any state $\rho$ of the signal mode $a$. It is easy to take into account nonunit quantum efficiency at the detectors. According to Equation (25) one has the replacements

$$c \Rightarrow \sqrt{\eta}\, c + \sqrt{1-\eta}\, u, \qquad (37)$$

$$d \Rightarrow \sqrt{\eta}\, d + \sqrt{1-\eta}\, v, \qquad u, v \ \text{vacuum modes}, \qquad (38)$$
and now the output current is rescaled by $2\eta|z|$, namely

$$I_\eta \simeq \frac{1}{2|z|} \left\{ \left[ a + \sqrt{\frac{1-\eta}{2\eta}}\, (u + v) \right] b^\dagger + \mathrm{h.c.} \right\}, \qquad (39)$$

where only terms containing the strong LO mode $b$ are retained. The POVM is then obtained by replacing

$$X_\varphi \to X_\varphi + \sqrt{\frac{1-\eta}{2\eta}}\, (u_\varphi + v_\varphi) \qquad (40)$$

in Equation (36), with $w_\varphi = (w^\dagger e^{i\varphi} + w e^{-i\varphi})/2$, $w = u, v$, and tracing over the vacuum modes $u$ and $v$. One then obtains

$$\Pi_\eta(x) = \int_{-\infty}^{+\infty} \frac{d\lambda}{2\pi}\, e^{i\lambda(X_\varphi - x)}\, \left| \langle 0 |\, e^{i\lambda\sqrt{(1-\eta)/(2\eta)}\, u_\varphi}\, | 0 \rangle \right|^2$$
$$= \int_{-\infty}^{+\infty} \frac{d\lambda}{2\pi}\, e^{i\lambda(X_\varphi - x)}\, e^{-\lambda^2 (1-\eta)/(8\eta)}$$
$$= \frac{1}{\sqrt{2\pi\Delta_\eta^2}} \exp\!\left[ -\frac{(x - X_\varphi)^2}{2\Delta_\eta^2} \right]$$
$$= \frac{1}{\sqrt{2\pi\Delta_\eta^2}} \int_{-\infty}^{+\infty} dx'\; e^{-(x - x')^2/(2\Delta_\eta^2)}\; |x'\rangle_{\varphi\,\varphi}\langle x'|, \qquad (41)$$

where

$$\Delta_\eta^2 = \frac{1-\eta}{4\eta}. \qquad (42)$$
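According to Equations (41) and (42), homodyne data at quantum efficiency $\eta$ are distributed as the ideal quadrature outcomes convolved with a Gaussian of variance $\Delta_\eta^2 = (1-\eta)/(4\eta)$. A minimal Monte Carlo sketch (not part of the original text; it assumes a vacuum-state signal, whose ideal quadrature distribution is a zero-mean Gaussian of variance 1/4):

```python
import numpy as np

rng = np.random.default_rng(1)
eta, N = 0.7, 400_000

x_ideal = rng.normal(0.0, 0.5, N)        # vacuum signal: Var[X_phi] = 1/4
delta2 = (1 - eta) / (4 * eta)           # Equation (42)
x_meas = x_ideal + rng.normal(0.0, np.sqrt(delta2), N)

# Measured variance: 1/4 + (1 - eta)/(4 eta) = 1/(4 eta)
print(np.var(x_meas), 1 / (4 * eta))
```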
Thus the POVM, and in turn the probability distribution of the output photocurrent, are just the Gaussian convolution of the ideal ones with rms $\Delta_\eta = \sqrt{(1-\eta)/(4\eta)}$.

D. Heterodyne Detection

Heterodyne detection allows one to perform the joint measurement of two conjugated quadratures of the field [49,50]. The scheme of the heterodyne detector is depicted in Figure 2.
FIGURE 2. Scheme of the heterodyne detector.
A strong local oscillator at frequency $\omega$ in a coherent state $|\beta\rangle$ hits a beam splitter with transmissivity $\tau \to 1$, with the coherent amplitude such that $k \equiv |\beta|\sqrt{1-\tau}$ is kept constant. If the output photocurrent is sampled at the intermediate frequency $\omega_{IF}$, just the field modes $a$ and $b$ at frequencies $\omega \pm \omega_{IF}$ are selected by the detector. Modes $a$ and $b$ are usually referred to as signal band and image band modes, respectively. In the strong LO limit, upon tracing out the LO mode, the output photocurrent $I(\omega_{IF})$ rescaled by $k$ is equivalent to the complex operator

$$Z = \frac{I(\omega_{IF})}{k} = a - b^\dagger, \qquad (43)$$
where the arbitrary phases of the modes have been suitably chosen. The heterodyne photocurrent $Z$ is a normal operator, equivalent to a couple of commuting selfadjoint operators

$$Z = \mathrm{Re}\, Z + i\, \mathrm{Im}\, Z, \qquad [Z, Z^\dagger] = [\mathrm{Re}\, Z, \mathrm{Im}\, Z] = 0. \qquad (44)$$
The POVM of the detector is then given by the orthogonal eigenvectors of $Z$. It is here convenient to introduce the notation of Ref. [51] for vectors in the tensor product of Hilbert spaces $\mathcal{H} \otimes \mathcal{H}$

$$|A\rangle\rangle = \sum_{nm} A_{nm}\, |n\rangle \otimes |m\rangle \equiv (A \otimes I)\, |I\rangle\rangle \equiv (I \otimes A^\tau)\, |I\rangle\rangle, \qquad (45)$$

where $|I\rangle\rangle = \sum_n |n\rangle \otimes |n\rangle$, and $A^\tau$ denotes the transposed operator with respect to some prechosen orthonormal basis. Equation (45) exploits the isomorphism between the Hilbert space of the Hilbert–Schmidt operators $A, B \in \mathrm{HS}(\mathcal{H})$ with scalar product $\langle A, B \rangle = \mathrm{Tr}[A^\dagger B]$, and the Hilbert space of bipartite vectors $|A\rangle\rangle, |B\rangle\rangle \in \mathcal{H} \otimes \mathcal{H}$, where one has $\langle\langle A | B \rangle\rangle \equiv \langle A, B \rangle$.
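The double-ket notation (45) is plain vectorization, and the two identities $(A \otimes I)|I\rangle\rangle = |A\rangle\rangle = (I \otimes A^\tau)|I\rangle\rangle$, together with $\langle\langle A|B\rangle\rangle = \mathrm{Tr}[A^\dagger B]$, can be verified in a few lines (a sketch, not part of the original text; real transposition $A^\tau$ corresponds to `A.T` on the coefficient matrix):

```python
import numpy as np

d = 4
rng = np.random.default_rng(2)
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
B = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))

I_ket = np.eye(d).reshape(-1)   # |I>> = sum_n |n> (x) |n>
A_ket = A.reshape(-1)           # |A>> = sum_nm A_nm |n> (x) |m>  (row-major layout)

print(np.allclose(np.kron(A, np.eye(d)) @ I_ket, A_ket))                    # True
print(np.allclose(np.kron(np.eye(d), A.T) @ I_ket, A_ket))                  # True
print(np.isclose(np.vdot(A_ket, B.reshape(-1)), np.trace(A.conj().T @ B)))  # True
```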
Using the above notation it is easy to write the eigenvectors of $Z$ with eigenvalue $z$ as $(1/\sqrt{\pi})\, |D(z)\rangle\rangle$. In fact one has [52]

$$Z\, |D(z)\rangle\rangle = (a - b^\dagger)(D_a(z) \otimes I_b)\, |I\rangle\rangle = (D_a(z) \otimes I_b)(a - b^\dagger + z) \sum_{n=0}^{\infty} |n\rangle \otimes |n\rangle = z\, (D_a(z) \otimes I_b)\, |I\rangle\rangle = z\, |D(z)\rangle\rangle. \qquad (46)$$
The orthogonality of such eigenvectors can be verified through the relation

$$\langle\langle D(z) | D(z') \rangle\rangle = \mathrm{Tr}[D^\dagger(z) D(z')] = \pi\, \delta^{(2)}(z - z'), \qquad (47)$$

where $\delta^{(2)}(\alpha)$ denotes the Dirac delta function over the complex plane

$$\delta^{(2)}(\alpha) = \int_C \frac{d^2\lambda}{\pi^2}\, \exp(\lambda^* \alpha - \lambda \alpha^*). \qquad (48)$$
In conventional heterodyne detection the image band mode is in the vacuum state, and one is just interested in measuring the field mode $a$. In this case we can evaluate the POVM upon tracing on mode $b$. One has

$$\Pi(z, z^*) = \frac{1}{\pi}\, \mathrm{Tr}_b\!\left[ |D(z)\rangle\rangle\langle\langle D(z)|\; I_a \otimes |0\rangle\langle 0| \right] = \frac{1}{\pi}\, D(z)\, |0\rangle\langle 0|\, D^\dagger(z) = \frac{1}{\pi}\, |z\rangle\langle z|, \qquad (49)$$
namely one obtains the projectors on coherent states. The coherent-state POVM provides the optimal joint measurement of conjugated quadratures of the field [53]. In fact, heterodyne detection allows one to measure the Q function in Equation (4). According to Equation (3) it then provides the expectation value of antinormally ordered field operators. For a state $\rho$ the expectation value of any quadrature $X_\varphi$ is obtained as

$$\langle X_\varphi \rangle = \mathrm{Tr}[\rho X_\varphi] = \int_C \frac{d^2\alpha}{\pi}\, \mathrm{Re}(\alpha e^{-i\varphi})\, Q(\alpha, \alpha^*). \qquad (50)$$
The price to pay for jointly measuring noncommuting observables is an additional noise. The rms fluctuation is evaluated as follows

$$\int_C \frac{d^2\alpha}{\pi}\, \left[ \mathrm{Re}(\alpha e^{-i\varphi}) \right]^2 Q(\alpha, \alpha^*) - \langle X_\varphi \rangle^2 = \langle \Delta X_\varphi^2 \rangle + \frac{1}{4}, \qquad (51)$$
where $\langle \Delta X_\varphi^2 \rangle$ is the intrinsic noise, and the additional term is usually referred to as "the additional 3 dB noise due to the joint measure" [54–56]. The effect of nonunit quantum efficiency can be taken into account in an analogous way as in Section II.C for homodyne detection. The heterodyne photocurrent is rescaled by an additional factor $\eta^{1/2}$, and vacuum modes $u$ and $v$ are introduced, thus giving [57]

$$Z_\eta = a - b^\dagger + \sqrt{\frac{1-\eta}{\eta}}\, u - \sqrt{\frac{1-\eta}{\eta}}\, v^\dagger. \qquad (52)$$
Upon tracing over the modes $u$ and $v$, one obtains the POVM

$$\Pi_\eta(z, z^*) = \int_C \frac{d^2\gamma}{\pi^2}\; {}_u\langle 0|\, {}_v\langle 0|\; e^{\gamma(Z_\eta^\dagger - z^*) - \gamma^*(Z_\eta - z)}\; |0\rangle_u\, |0\rangle_v$$
$$= \int_C \frac{d^2\gamma}{\pi^2}\; e^{\gamma(Z^\dagger - z^*) - \gamma^*(Z - z)}\; e^{-((1-\eta)/\eta)\,|\gamma|^2} \qquad (53)$$
$$= \frac{\eta}{\pi(1-\eta)}\, e^{-(\eta/(1-\eta))\,|Z - z|^2} = \int_C \frac{d^2 z'}{\pi}\; \frac{e^{-|z' - z|^2/\Delta_\eta^2}}{\pi \Delta_\eta^2}\; |D(z')\rangle\rangle\langle\langle D(z')|.$$

The probability distribution is then a Gaussian convolution on the complex plane of the ideal probability, with variance $\Delta_\eta^2 = (1-\eta)/\eta$. Analogously, the coherent-state POVM for conventional heterodyne detection with vacuum image band mode is replaced with

$$\Pi_\eta(z, z^*) = \int_C \frac{d^2 z'}{\pi^2 \Delta_\eta^2}\; e^{-|z' - z|^2/\Delta_\eta^2}\; |z'\rangle\langle z'|. \qquad (54)$$
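Equation (54) states that inefficient heterodyne outcomes are the ideal, Q-function-distributed outcomes smeared by a complex Gaussian of variance $\Delta_\eta^2 = (1-\eta)/\eta$. A Monte Carlo sketch (not part of the original text; it assumes a coherent-state signal $|\alpha_0\rangle$, for which the ideal outcome $z$ is Gaussian around $\alpha_0$ with unit variance):

```python
import numpy as np

rng = np.random.default_rng(3)
eta, N, alpha0 = 0.8, 300_000, 1.5 + 0.5j

def complex_normal(var, size):
    """Circular complex Gaussian samples with E|z|^2 = var."""
    return rng.normal(0.0, np.sqrt(var / 2), size) + 1j * rng.normal(0.0, np.sqrt(var / 2), size)

z_ideal = alpha0 + complex_normal(1.0, N)              # ideal outcomes: Q(z)/pi for |alpha0>
z_meas = z_ideal + complex_normal((1 - eta) / eta, N)  # smearing of Equation (54)

# Total width: 1 + (1 - eta)/eta = 1/eta
print(np.mean(np.abs(z_meas - alpha0) ** 2), 1 / eta)
```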
From Equation (9) we can equivalently say that the heterodyne detection probability density is given by the generalized Wigner function $W_s(\alpha, \alpha^*)$ with $s = 1 - 2/\eta$. Notice that for $\eta < 1$ the average of functions $\alpha^n \alpha^{*m}$ is related to the expectation value of a different ordering of field operators. However, one has the relevant identity [27,58]

$${:}(a^\dagger)^n a^m{:}_s = \sum_{k=0}^{(n,m)} k!\, \binom{n}{k} \binom{m}{k} \left( \frac{t-s}{2} \right)^{\!k} {:}(a^\dagger)^{n-k} a^{m-k}{:}_t\,, \qquad (55)$$
where $(n, m) = \min(n, m)$, and then

$$\int_C d^2\alpha\; W_{1-2/\eta}(\alpha, \alpha^*)\, \alpha^m \alpha^{*n} = \sum_{k=0}^{(n,m)} k!\, \binom{n}{k} \binom{m}{k} \left( \frac{1-\eta}{\eta} \right)^{\!k} \left\langle a^{m-k} (a^\dagger)^{n-k} \right\rangle. \qquad (56)$$
Notice that the measurement of the Q function (or of any smoothed version for $\eta < 1$) does not allow one to recover the expectation value of an arbitrary operator through an average over the heterodyne outcomes. In fact, one needs the admissibility of the antinormal ordered expansion [59] and the convergence of the integral in Equation (56). In particular, the matrix elements of the density operator cannot be recovered. For some operators for which heterodyne measurement is allowed, a comparison with quantum homodyne tomography will be given in Section IV.C. Finally, it is worth mentioning that all results of this section are also valid for an image-band mode at the same frequency as the signal. In this case a measurement scheme based on multiport homodyne detection should be used [50,58,60–66].
III. GENERAL TOMOGRAPHIC METHOD

In this section we review the general tomographic method of spanning sets of operators of Ref. [24], and re-derive in this general framework the exact homodyne tomography method of Ref. [8]. We first give a brief history of quantum tomography, starting with the original proposal of Vogel and Risken [7], which extended conventional tomographic imaging to the domain of quantum optics. Here we will briefly sketch the conventional imaging tomographic method, and show the analogy with the method of Ref. [7]. The latter achieves the quantum state via the Wigner function, which in turn is obtained by inverse Radon transform of the homodyne probability distributions for varying phase with respect to the LO. As already mentioned, the Radon transform inversion is affected by uncontrollable bias: such limitations and the intrinsic unreliability of this method are thoroughly explained in the same section. In contrast to the Radon transform method, the first exact method of Ref. [8] (successively refined in Ref. [9]) allows the reconstruction of the density matrix $\rho$, bypassing the step of the Wigner function, and achieving the matrix elements of $\rho$—or the expectation of any arbitrary operator—by
just averaging the pertaining estimators (also called kernel functions or pattern functions), evaluated on the experimental homodyne data. This method will be re-derived in Section III.C.3 as a special case of the general tomographic method of Ref. [24], here reviewed in Section III.C, where we introduce the concept of "quorum," which is the complete set of observables whose measurement provides the expectation value of any desired operator. Here we also show how some "orthogonality" and "completeness" relations in the linear algebra of operators are sufficient to identify a quorum. As another application of the general method, in Section III.C.5 the tomography of spin systems [28] is reviewed, which was originally derived from the group theoretical methods of Refs. [18–20]. Another application is the quantum tomography of a free particle state, given in Section III.C.6. In Section III.D we include some further developments to improve the tomographic method, such as the deconvolution techniques of Ref. [20] to correct the imperfections of detectors and experimental apparatus with a suitable data processing, and the adaptive tomography of Ref. [22] to reduce the statistical fluctuations of tomographic estimators, by adapting the averaged estimators to the given sample of experimental data. The other relevant topics of homodyning observables, multimode tomography, and tomography of quantum operations will be given a separate treatment in the following sections of the chapter.
A. Brief Historical Excursus

The problem of quantum state determination through repeated measurements on identically prepared systems was originally stated by Fano in 1957 [5], who first recognized the need for measuring more than two noncommuting observables to achieve such a purpose. However, it was only with the proposal by Vogel and Risken [7] that quantum tomography was born. The first experiments, which already showed reconstructions of coherent and squeezed states, were performed by Raymer and his group at the University of Oregon [6]. The main idea at the basis of the first proposal is that it is possible to extend to the quantum domain the algorithms that are conventionally used in medical imaging to recover two-dimensional (mass) distributions from unidimensional projections in different directions. This first tomographic method, however, was unreliable for the reconstruction of an unknown quantum state, since arbitrary smoothing parameters were needed in the Radon transform-based imaging procedure. The first exact unbiased tomographic method was proposed in Ref. [8], and successively simplified in Ref. [9]. Since then, the new exact method has
224
MAURO D’ARIANO ET AL.
been practically implemented in many experiments, such as the measurement of the photon statistics of a semiconductor laser [10], and the reconstruction of the density matrix of a squeezed vacuum [11]. The success of optical homodyne tomography has since stimulated the development of state-reconstruction procedures in other quantum harmonic oscillator systems, such as atomic beams [12], the vibrational state of a molecule [13], an ensemble of helium atoms [14], and a single ion in a Paul trap [15]. After the original exact method, quantum tomography has been generalized to the estimation of arbitrary observables of the field [16], to any number of modes [17], and, finally, to arbitrary quantum systems via group theory [18–21], with further improvements such as noise deconvolution [20], adaptive tomographic methods [22], and the use of maximum-likelihood strategies [23], which has made it possible to reduce dramatically the number of experimental data, by a factor of 10³–10⁵, with negligible bias for most practical cases of interest. Finally, a method for the tomographic estimation of the unknown quantum operation of a quantum device has been proposed [25], where a fixed input entangled state plays the role of all input states in a sort of quantum parallel fashion. Moreover, as another manifestation of such quantum parallelism, one can also estimate the ensemble average of all operators by measuring only one fixed ''universal'' observable on an extended Hilbert space, in a sort of quantum hologram [67]. This latest development is based on the general tomographic method of Ref. [24], where the tomographic reconstruction relies on the existence of spanning sets of operators, of which the irreducible unitary group representations of the group methods of Refs. [18–21] are just a special case.

B. Conventional Tomographic Imaging

In conventional medical tomography, one collects data in the form of marginal distributions of the mass function m(x, y).
In the complex plane the marginal r(x, φ) is a projection of the complex function m(x, y) on the direction indicated by the angle φ ∈ [0, π], namely

$$r(x,\varphi)=\int_{-\infty}^{+\infty}\frac{dy}{\pi}\;m\big((x+iy)e^{i\varphi},(x-iy)e^{-i\varphi}\big).\tag{57}$$
The collection of marginals for different ’ is called ‘‘Radon transform.’’ The tomography process essentially consists in the inversion of the Radon transform (57), in order to recover the mass function m(x, y) from the marginals r(x, ’).
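As a minimal numerical illustration of the Radon transform of Equation (57) (a sketch with an assumed test distribution and grid, not part of the original text), the marginals of an isotropic two-dimensional Gaussian can be computed by direct line integration; by symmetry they are identical for every projection angle, and each is a normalized one-dimensional distribution:

```python
import numpy as np

# Numerical sketch: marginals r(x, phi) of a sample mass distribution m(x, y),
# computed as line integrals along the direction orthogonal to angle phi
# (the Radon transform of Eq. (57), in rotated Cartesian coordinates).
def radon_marginal(m, xs, phi, n_steps=400):
    ts = np.linspace(-5.0, 5.0, n_steps)        # integration variable
    dt = ts[1] - ts[0]
    r = np.empty_like(xs)
    for i, x in enumerate(xs):
        px = x * np.cos(phi) - ts * np.sin(phi)  # rotated coordinates
        py = x * np.sin(phi) + ts * np.cos(phi)
        r[i] = np.sum(m(px, py)) * dt
    return r

# Isotropic 2D Gaussian: every marginal is the same 1D Gaussian,
# independently of the projection angle phi.
m = lambda x, y: np.exp(-(x**2 + y**2) / 2) / (2 * np.pi)
xs = np.linspace(-5, 5, 201)
r0 = radon_marginal(m, xs, 0.0)
r1 = radon_marginal(m, xs, np.pi / 3)
print(np.allclose(r0, r1, atol=1e-6))            # rotation invariance
print(abs(r0.sum() * (xs[1] - xs[0]) - 1.0) < 1e-2)  # each marginal is normalized
```

For a nonsymmetric distribution, the marginals differ with φ, and it is precisely this φ-dependence that the inversion derived below exploits.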
QUANTUM TOMOGRAPHY
Here we derive the inversion of Equation (57). Consider the identity

$$m(\alpha,\alpha^{*})=\int_{\mathbb{C}}d^{2}\beta\;\delta^{(2)}(\alpha-\beta)\,m(\beta,\beta^{*}),\tag{58}$$
where δ^(2)(α) denotes the Dirac delta function of Equation (48), and m(α, α*) ≡ m(x, y) with α = x + iy and α* = x − iy. It is convenient to rewrite Equation (48) as follows

$$\delta^{(2)}(\alpha)=\int_{0}^{+\infty}\frac{dk\,k}{4}\int_{0}^{2\pi}\frac{d\varphi}{\pi^{2}}\,e^{ik\alpha_{\varphi}}=\int_{-\infty}^{+\infty}\frac{dk\,|k|}{4}\int_{0}^{\pi}\frac{d\varphi}{\pi^{2}}\,e^{ik\alpha_{\varphi}},\tag{59}$$
with α_φ = Re(α e^{−iφ}), so that α_{φ+π} = −α_φ. Then, from Equations (58) and (59) the inverse Radon transform is obtained as follows:

$$m(x,y)=\int_{0}^{\pi}\frac{d\varphi}{\pi}\int_{-\infty}^{+\infty}dx'\,r(x',\varphi)\int_{-\infty}^{+\infty}\frac{dk}{4}\,|k|\,e^{ik(x'-\alpha_{\varphi})}.\tag{60}$$
Equation (60) is conventionally written as

$$m(x,y)=\int_{0}^{\pi}\frac{d\varphi}{\pi}\int_{-\infty}^{+\infty}dx'\,r(x',\varphi)\,K(x'-\alpha_{\varphi}),\tag{61}$$
where K(x) is given by

$$K(x)\equiv\int_{-\infty}^{+\infty}\frac{dk}{4}\,|k|\,e^{ikx}=\frac{1}{2}\,\mathrm{Re}\int_{0}^{+\infty}dk\,k\,e^{ikx}=-\frac{1}{2}\,\mathcal{P}\frac{1}{x^{2}},\tag{62}$$
with 𝒫 denoting the Cauchy principal value. Integrating Equation (61) by parts, one obtains the tomographic formula that is usually found in medical imaging, i.e.,

$$m(x,y)=-\frac{1}{2\pi}\int_{0}^{\pi}d\varphi\;\mathcal{P}\int_{-\infty}^{+\infty}dx'\,\frac{1}{x'-\alpha_{\varphi}}\,\frac{\partial}{\partial x'}\,r(x',\varphi),\tag{63}$$
which allows the reconstruction of the mass distribution m(x, y) from its projections r(x, φ) along different directions.

1. Extension to the Quantum Domain

In the ''quantum imaging'' process the goal is to reconstruct a quantum state in the form of its Wigner function starting from its marginal probability distributions. As shown in Section II.A, the Wigner function is a
real normalized function that is in one-to-one correspondence with the state density operator ρ. As noticed in Equation (7), the probability distributions of the quadrature operators X_φ = (a†e^{iφ} + ae^{−iφ})/2 are the marginal probabilities of the Wigner function for the state ρ. Thus, by applying the same procedure outlined in the previous subsection, Vogel and Risken [7] proposed a method to recover the Wigner function via an inverse Radon transform from the quadrature probability distributions p(x, φ), namely

$$W(x,y)=\int_{0}^{\pi}\frac{d\varphi}{\pi}\int_{-\infty}^{+\infty}dx'\,p(x',\varphi)\int_{-\infty}^{+\infty}\frac{dk}{4}\,|k|\,e^{ik(x'-x\cos\varphi-y\sin\varphi)}.\tag{64}$$
(Surprisingly, in the original paper [7] the connection to the tomographic imaging method was never mentioned.) As shown in Section II.C, the experimental measurement of the quadratures of the field is obtained using the homodyne detector. The method proposed by Vogel and Risken, namely the inversion of the Radon transform, was the one used in the first experiments [6]. This first method is, however, not reliable for the reconstruction of an unknown quantum state, due to the intrinsic unavoidable systematic error related to the fact that the integral over k in Equation (64) is unbounded. In fact, in order to evaluate the inverse Radon transform, one would need the analytical form of the marginal distribution of the quadrature p(x, φ), which, in turn, can only be obtained by collecting the experimental data into histograms and then ''spline-ing'' them. This, of course, is not an unbiased procedure, since the degree of spline-ing, the width and the number of the histogram bins, and finally the number of different phases used to collect the experimental data sample introduce systematic errors if they are not set above some minimal values, which actually depend on the unknown quantum state that one wants to reconstruct. Typically, over-spline-ing will wash out the quantum features of the state, whereas, vice versa, under-spline-ing will create negative photon probabilities in the reconstruction (see Ref. [8] for details). A new exact method was then proposed in Ref. [8] as an alternative to the Radon transform technique. This approach, referred to as quantum homodyne tomography, allows one to recover the quantum state of the field (along with any ensemble average of arbitrary operators) by directly averaging functions of the homodyne data, abolishing the intermediate step of the Wigner function, which is the source of all systematic errors. Only statistical errors are present, and they can be reduced arbitrarily by collecting more experimental data.
This exact method will be re-derived from the general tomographic theory in Section III.C.3.
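The direct-averaging idea can be illustrated with a small numerical sketch (the state, estimator, and sample size below are assumptions for illustration, not the chapter's algorithm). For a coherent state, the homodyne outcome at phase φ is Gaussian with mean Re(αe^{−iφ}) and vacuum-limited variance 1/4, and the standard kernel 2x² − 1/2 averaged over random-phase homodyne data gives an unbiased estimate of the mean photon number ⟨n⟩ = |α|²:

```python
import numpy as np

# Illustrative sketch: "directly averaging functions of the homodyne data."
# For a coherent state |alpha>, X_phi is Gaussian with mean Re(alpha e^{-i phi})
# and variance 1/4; averaging the estimator 2 x^2 - 1/2 over random phases
# yields the mean photon number <n> = |alpha|^2 (assumed toy simulation).
rng = np.random.default_rng(0)
alpha, N = 1.5, 200_000

phi = rng.uniform(0.0, np.pi, N)                       # random quadrature phases
x = rng.normal((alpha * np.exp(-1j * phi)).real, 0.5)  # homodyne outcomes
n_est = np.mean(2 * x**2 - 0.5)                        # average the estimator

print(abs(n_est - alpha**2) < 0.05)                    # close to |alpha|^2 = 2.25
```

No histogramming, smoothing, or Wigner-function step appears anywhere: only statistical error remains, and it shrinks with the number of samples.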
C. General Method of Quantum Tomography

In this section the general method of quantum tomography is explained in detail. First, we give the basics of the Monte Carlo integral theory that is needed to implement the tomographic algorithms in actual experiments and in numerical simulations. Then, we derive the formulas on which all schemes of state reconstruction are based.

1. Basic Statistics

The aim of quantum tomography is to estimate, for an arbitrary quantum system, the mean value ⟨O⟩ of a system operator O using only the results of the measurements on a set of observables {Q_λ, λ ∈ Λ}, called the ''quorum.'' The procedure by which this can be obtained needs the estimator or ''kernel function'' R[O](x, λ), which is a function of the eigenvalues x of the quorum operators. Integrating the estimator with the probability p(x, λ) of having outcome x when measuring Q_λ, the mean value of O is obtained as follows

$$\langle O\rangle=\int_{\Lambda}d\lambda\int d\mu_{\lambda}(x)\,p(x,\lambda)\,R[O](x,\lambda),\tag{65}$$
where the first integral is performed over the values of λ that designate all quorum observables, and the second over the eigenvalues of the quorum observable Q_λ determined by the λ variable of the outer integral. For a discrete set Λ and/or a discrete spectrum of the quorum, the integrals in (65) are suitably replaced by sums. The algorithm to estimate ⟨O⟩ with Equation (65) is the following. One chooses a quorum operator Q_λ by drawing λ with uniform probability in Λ and performs a measurement, obtaining the result x_i. By repeating the procedure N times, one collects the set of experimental data {(x_i, λ_i), i = 1, . . . , N}, where λ_i identifies the quorum observable used for the ith measurement, and x_i its result. From the same set of data the mean value of any operator O can be obtained. In fact, one evaluates the estimator of ⟨O⟩ for the quorum Q_λ, and then samples the double integral of (65) using the limit

$$\langle O\rangle=\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}R[O](x_i,\lambda_i).\tag{66}$$
Of course the finite sum

$$F_N=\frac{1}{N}\sum_{i=1}^{N}R[O](x_i,\lambda_i)\tag{67}$$
gives an approximation of ⟨O⟩. To estimate the error in the approximation one applies the central limit theorem, which we recall here.

Central limit theorem. Consider N statistically uncorrelated random variables {z_i, i = 1, . . . , N}, with mean values μ(z_i), variances σ²(z_i), and bounded third-order moments. If the variances σ²(z_i) are all of the same order, then the statistical variable ''average''

$$y_N=\frac{1}{N}\sum_{i=1}^{N}z_i\tag{68}$$

has mean and variance

$$\mu(y_N)=\frac{1}{N}\sum_{i=1}^{N}\mu(z_i),\qquad \sigma^{2}(y_N)=\frac{1}{N^{2}}\sum_{i=1}^{N}\sigma^{2}(z_i).\tag{69}$$
The distribution of y_N approaches asymptotically a Gaussian for N → ∞. In practical cases, the distribution of y_N can be considered Gaussian already for N as low as N ≈ 10.

For our needs the hypotheses are met if the estimator R[O](x_i, λ_i) in Equation (67) has bounded moments up to the third order, since, even though the x_i have different probability densities depending on λ_i, nevertheless, since λ_i is also random, all the z_i, here given by

$$z_i=R[O](x_i,\lambda_i),\tag{70}$$

have common mean

$$\mu(z_i)=\langle O\rangle\tag{71}$$

and variance

$$\sigma^{2}(z_i)=\int_{\Lambda}d\lambda\int d\mu_{\lambda}(x)\,p(x,\lambda)\,R^{2}[O](x,\lambda)-\langle O\rangle^{2}.\tag{72}$$
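The sampling procedure of Equations (65)–(72) can be sketched on a toy model (the distributions and estimator below are assumptions chosen only so that ⟨O⟩ is known exactly; they do not correspond to a physical quorum):

```python
import numpy as np

# Toy sketch of Eqs. (65)-(72): lambda drawn uniformly in [0, pi], outcome x
# Gaussian with mean cos(lambda), estimator R(x, lambda) = 2 x cos(lambda).
# The double average equals <O> = (1/pi) * int 2 cos^2(lambda) d(lambda) = 1.
rng = np.random.default_rng(1)

def run(N):
    lam = rng.uniform(0.0, np.pi, N)
    x = rng.normal(np.cos(lam), 1.0)
    z = 2 * x * np.cos(lam)              # z_i = R[O](x_i, lambda_i), Eq. (70)
    y = z.mean()                         # y_N, Eq. (68)
    err = z.std(ddof=1) / np.sqrt(N)     # statistical error, ~ N^{-1/2}
    return y, err

y1, e1 = run(10_000)
y2, e2 = run(1_000_000)
print(abs(y1 - 1.0) < 5 * e1)            # estimate consistent with <O> = 1
print(e2 < e1)                           # error shrinks with growing N
```

The common mean of the z_i is the target ⟨O⟩, and the spread of their average shrinks as the sample grows, exactly as the central limit theorem prescribes.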
Using the central limit theorem, we can conclude that the experimental average y_N ≡ F_N in Equation (67) is a statistical variable distributed as a Gaussian with mean value μ(y_N) = μ(z_i) = ⟨O⟩ and variance σ²(y_N) = σ²(z_i)/N. Then the tomographic estimation converges with statistical error that
decreases as N^{−1/2}. A statistically precise estimate of the confidence interval is given by

$$\varepsilon_N=\sqrt{\frac{\sum_{i=1}^{N}(z_i-y_N)^{2}}{N(N-1)}},\tag{73}$$
with z_i given by Equation (70) and y_N by Equation (68). In order to test that the confidence intervals are estimated correctly, one can check that the F_N distribution is actually Gaussian. This can be done by comparing the histogram of the block data with a Gaussian, or by using the χ² test.

2. Characterization of the Quorum

Different estimation techniques have been proposed, tailored to different quantum systems, such as the radiation field [9,17], trapped ions and molecular vibrational states [68], and spin systems [69]. All the known quantum estimation techniques can be embodied in the following approach. The tomographic reconstruction of an operator O is possible when there exists a resolution of the form
$$O=\int_{\Lambda}d\lambda\;\mathrm{Tr}\big[O\,B^{\dagger}(\lambda)\big]\,C(\lambda),\tag{74}$$
where λ is a (possibly multidimensional) parameter on a (discrete or continuous) manifold Λ. The only hypothesis in (74) is the existence of the trace. If, for example, O is a trace-class operator, then we do not need to require B(λ) to be of Hilbert–Schmidt class, since it is sufficient to require B(λ) bounded. The operators C(λ) are functions of the quorum of observables measured for the reconstruction, whereas the operators B(λ) form the dual basis of the set C(λ). The term

$$E[O](\lambda)=\mathrm{Tr}\big[O\,B^{\dagger}(\lambda)\big]\,C(\lambda)\tag{75}$$
represents the quantum estimator for the operator O. The expectation value of O is given by the ensemble average

$$\langle O\rangle\equiv\mathrm{Tr}[O\rho]=\int_{\Lambda}d\lambda\,\big\langle E[O](\lambda)\big\rangle=\int_{\Lambda}d\lambda\;\mathrm{Tr}\big[O\,B^{\dagger}(\lambda)\big]\,\mathrm{Tr}\big[C(\lambda)\,\rho\big],\tag{76}$$
where ρ is the density matrix of the quantum system under investigation. Notice that the quantity Tr[C(λ)ρ] depends only on the quantum state, and it is related to the probability distribution of the measurement outcomes,
whereas the term Tr[O B†(λ)] depends only on the quantity to be measured. In particular, the tomography of the quantum state of a system corresponds to writing Equation (74) for the operators O = |k⟩⟨n|, {|n⟩} being a given Hilbert space basis. For a given system, the existence of a set of operators C(λ), together with its dual basis B(λ), allows universal quantum estimation, i.e., the reconstruction of any operator. We now give two characterizations of the sets B(λ) and C(λ) that are necessary and sufficient conditions for writing Equation (74).

Condition 1: bi-orthogonality. Let us consider a complete orthonormal basis of vectors |n⟩ (n = 0, 1, . . .). Equation (74) is equivalent to the bi-orthogonality condition

$$\int_{\Lambda}d\lambda\;\langle q|B^{\dagger}(\lambda)|p\rangle\,\langle m|C(\lambda)|l\rangle=\delta_{mp}\,\delta_{lq},\tag{77}$$
where δ_{ij} is the Kronecker delta. Equation (77) can be straightforwardly generalized to a continuous basis.

Condition 2: completeness. If the set of operators C(λ) is complete, namely if any operator can be written as a linear combination of the C(λ) as

$$O=\int_{\Lambda}d\lambda\;a(\lambda)\,C(\lambda),\tag{78}$$
then Equation (74) is also equivalent to the trace condition

$$\mathrm{Tr}\big[B^{\dagger}(\lambda)\,C(\mu)\big]=\delta(\lambda,\mu),\tag{79}$$
where δ(λ, μ) is a reproducing kernel for the set B(λ), namely a function or a tempered distribution which satisfies

$$\int_{\Lambda}d\lambda\;B(\lambda)\,\delta(\lambda,\mu)=B(\mu).\tag{80}$$
An analogous identity holds for the set of C(λ):

$$\int_{\Lambda}d\lambda\;C(\lambda)\,\delta(\lambda,\mu)=C(\mu).\tag{81}$$
The proofs are straightforward. The completeness condition on the operators C(λ) is essential for the equivalence of (74) and (79). A simple counterexample is provided by the set of projectors P(λ) = |λ⟩⟨λ| over the eigenstates of a self-adjoint operator L. In fact, Equation (79) is satisfied by C(λ) = B(λ) ≡ P(λ). However, since they do not form a complete set in the
sense of Equation (78), it is not possible to express a generic operator in the form O = ∫ dλ ⟨λ|O|λ⟩ |λ⟩⟨λ|. If either the set B(λ) or the set C(λ) satisfies the additional trace condition

$$\mathrm{Tr}\big[B^{\dagger}(\lambda)\,B(\mu)\big]=\delta(\lambda,\mu),\tag{82}$$

$$\mathrm{Tr}\big[C^{\dagger}(\lambda)\,C(\mu)\big]=\delta(\lambda,\mu),\tag{83}$$
then we have C(λ) = B(λ) (notice that neither B(λ) nor C(λ) need be unitary). In this case, Equation (74) can be rewritten as

$$O=\int_{\Lambda}d\lambda\;\mathrm{Tr}\big[O\,C^{\dagger}(\lambda)\big]\,C(\lambda).\tag{84}$$
A set of observables Q_λ constitutes a quorum when there are functions f_λ(Q_λ) = C(λ) such that the C(λ) form an irreducible set. The quantum estimator for O in Equation (75) is then written as a function of the quorum operators

$$E[O](\lambda)\equiv E_{\lambda}[O](Q_{\lambda}).\tag{85}$$
Notice that if a set of observables Q_λ constitutes a quorum, then the set of projectors |q⟩_λλ⟨q| over their eigenvectors provides a quorum too, with the measure dλ in Equation (74) including the measure dμ_λ(q). Notice also that, even once the quorum has been fixed, the unbiased estimator for an operator O will not in general be unique, since there can exist functions 𝒩(Q_λ) that satisfy [22]

$$\int_{\Lambda}d\lambda\;\mathcal{N}(Q_{\lambda})=0,\tag{86}$$
and that will be called ''null estimators.'' Two unbiased estimators that differ by a null estimator yield the same results when estimating the operator mean value. We will see in Section III.D.2 how the null estimators can be used to reduce the statistical noise. In terms of the quorum observables Q_λ, Equation (76) is rewritten as

$$\langle O\rangle=\int_{\Lambda}d\lambda\;\mathrm{Tr}\big[O\,B^{\dagger}(\lambda)\big]\,\mathrm{Tr}\big[\rho\,f_{\lambda}(Q_{\lambda})\big]=\int_{\Lambda}d\lambda\int d\mu_{\lambda}(q)\,p(q,\lambda)\,\mathrm{Tr}\big[O\,B^{\dagger}(\lambda)\big]\,f_{\lambda}(q),\tag{87}$$
where p(q, λ) = _λ⟨q|ρ|q⟩_λ is the probability density of getting the outcome q from the measurement of Q_λ on the state ρ. Equation (87) is equivalent to the expression (65), with estimator

$$R[O](q,\lambda)=\mathrm{Tr}\big[O\,B^{\dagger}(\lambda)\big]\,f_{\lambda}(q).\tag{88}$$

Of course it is of interest to connect a quorum of observables to a resolution of the form (74), since only in this case can there be a feasible reconstruction scheme. If a resolution formula is written in terms of a set of self-adjoint operators, the set itself constitutes the desired quorum. However, in general a quorum of observables is functionally connected to the corresponding resolution formula. If the operators C(λ) are unitary, then they can always be taken as the exponential map of a set of self-adjoint operators, which then are identified with our quorum Q_λ. The quantity Tr[C(λ)ρ] is thus connected with the moment generating function of the set Q_λ, and hence with the probability density p(q, λ) of the measurement outcomes, which plays the role of the Radon transform in the quantum tomography of the harmonic oscillator. In general, the operators C(λ) can be any function (neither self-adjoint nor unitary) of observables and, even more generally, they may be connected to POVMs rather than observables.

The dual set B(λ) can be obtained from the set C(λ) by solving Equation (79). For finite quorums, this reduces to a matrix inversion. An alternative procedure uses the Gram–Schmidt orthogonalization [24]. No such general procedure exists for a continuous spanning set. Many cases, however, satisfy conditions (82) and (83), and thus we can write B(λ) = C†(λ).

3. Quantum Estimation for Harmonic Oscillator Systems

The harmonic oscillator models several systems of interest in quantum mechanics, such as the vibrational states of molecules, the motion of an ion in a Paul trap, and a single-mode radiation field.
Different proposals have been suggested in order to reconstruct the quantum state of a harmonic system, and they all fit the framework of the previous section, which is also useful for devising novel estimation techniques. Here, the basic resolution formula involves the set of displacement operators D(α) = exp(αa† − α*a), which can be viewed as exponentials of the field-quadrature operators X_φ = (a†e^{iφ} + ae^{−iφ})/2. We have shown in Section II.C that for a single-mode radiation field X_φ is measured through homodyne detection. For the vibrational tomography of a molecule or a trapped ion, X_φ corresponds to a time-evolved position or momentum. The set of displacement operators
satisfies Equations (79) and (83), since

$$\mathrm{Tr}\big[D(\alpha)\,D^{\dagger}(\beta)\big]=\pi\,\delta^{(2)}(\alpha-\beta),\tag{89}$$
whereas Equation (84) reduces to the Glauber formula

$$O=\int_{\mathbb{C}}\frac{d^{2}\alpha}{\pi}\;\mathrm{Tr}\big[O\,D^{\dagger}(\alpha)\big]\,D(\alpha).\tag{90}$$
Changing to polar variables α = (i/2)k e^{iφ}, Equation (90) becomes

$$O=\int_{0}^{\pi}\frac{d\varphi}{\pi}\int_{-\infty}^{+\infty}\frac{dk\,|k|}{4}\;\mathrm{Tr}\big[O\,e^{ikX_{\varphi}}\big]\,e^{-ikX_{\varphi}},\tag{91}$$
which shows explicitly the dependence on the quorum X_φ. Taking the ensemble average of both members and evaluating the trace over the set of eigenvectors of X_φ, one obtains

$$\langle O\rangle=\int_{0}^{\pi}\frac{d\varphi}{\pi}\int_{-\infty}^{+\infty}dx\;p(x,\varphi)\,R[O](x,\varphi),\tag{92}$$
where p(x, φ) = _φ⟨x|ρ|x⟩_φ is the probability distribution of the quadrature outcomes. The estimator of the operator ensemble average ⟨O⟩ is given by

$$R[O](x,\varphi)=\mathrm{Tr}\big[O\,K(X_{\varphi}-x)\big],\tag{93}$$
where K(x) is the same as in Equation (62). Equation (92) is the basis of quantum homodyne tomography. Notice that even though K(x) is unbounded, the matrix element ⟨ψ|K(X_φ − x)|ψ⟩ can be bounded, whence it can be used to sample the matrix element ⟨ψ|ρ|ψ⟩ of the state ρ, which, according to Section III.C.1, is directly obtained by averaging the estimator (93) over homodyne experimental values. In fact, for bounded ⟨ψ|K(X_φ − x)|ψ⟩, the central limit theorem guarantees that

$$\langle\psi|\rho|\psi\rangle=\int_{0}^{\pi}\frac{d\varphi}{\pi}\int_{-\infty}^{+\infty}dx\;p(x,\varphi)\,\langle\psi|K(X_{\varphi}-x)|\psi\rangle\tag{94}$$

$$=\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\langle\psi|K(X_{\varphi_{n}}-x_{n})|\psi\rangle,\tag{95}$$
where x_n is the homodyne outcome measured at phase φ_n and distributed with probability p(x, φ). Systematic errors are eliminated by choosing each phase φ_n at which the homodyne measurement is performed at random. As shown in Section III.C.1, for a finite number of measurements N, the estimate (95) of the integral in Equation (94) is Gaussian distributed around the true value ⟨ψ|ρ|ψ⟩, with statistical error decreasing as N^{−1/2}. Notice that the measurability of the density operator matrix element depends only on the boundedness of the matrix element of the estimator, and that no adjustable parameters are needed in the procedure, which thus is unbiased.

The general procedure for noise deconvolution is presented in Section III.D.1. However, we give here the main result for the density matrix reconstruction. As shown in Section II.C, the effect of nonunity quantum efficiency η in homodyne detectors is a Gaussian convolution of the ideal probability p(x, φ), as

$$p_{\eta}(x,\varphi)=\sqrt{\frac{2\eta}{\pi(1-\eta)}}\int_{-\infty}^{+\infty}dx'\,e^{-\frac{2\eta}{1-\eta}(x-x')^{2}}\,p(x',\varphi).\tag{96}$$
The tomographic reconstruction procedure still holds upon replacing p(x, φ) with p_η(x, φ), so that

$$\rho=\int_{0}^{\pi}\frac{d\varphi}{\pi}\int_{-\infty}^{+\infty}dx\;p_{\eta}(x,\varphi)\,K_{\eta}(X_{\varphi}-x),\tag{97}$$
where now the estimator is

$$K_{\eta}(x)=\frac{1}{2}\,\mathrm{Re}\int_{0}^{+\infty}dk\;k\,e^{\frac{1-\eta}{8\eta}k^{2}+ikx}.\tag{98}$$
In fact, by taking the Fourier transform of both members of Equation (96), one can easily check that

$$\rho=\int_{0}^{\pi}\frac{d\varphi}{\pi}\int_{-\infty}^{+\infty}dx\;p_{\eta}(x,\varphi)\,K_{\eta}(X_{\varphi}-x)=\int_{0}^{\pi}\frac{d\varphi}{\pi}\int_{-\infty}^{+\infty}dx\;p(x,\varphi)\,K(X_{\varphi}-x).\tag{99}$$
Notice that the anti-Gaussian in Equation (98) causes a much slower convergence of the Monte Carlo integral (97): the statistical fluctuations increase exponentially for decreasing detector efficiency η. In order to
achieve good reconstructions with nonideal detectors, one then has to collect a larger number of data. It is clear from Equation (95) that the measurability of the density matrix depends on the chosen representation and on the quantum efficiency of the detectors. For example, for the reconstruction of the density matrix in the Fock basis the estimators are given by

$$R_{\eta}[|n\rangle\langle n+d|](x,\varphi)=\int_{-\infty}^{+\infty}\frac{dk\,|k|}{4}\,e^{\frac{1-\eta}{8\eta}k^{2}-ikx}\,\langle n+d|e^{ikX_{\varphi}}|n\rangle$$
$$=e^{id(\varphi+\pi/2)}\sqrt{\frac{n!}{(n+d)!}}\int_{-\infty}^{+\infty}dk\,|k|\,e^{\frac{1-2\eta}{2\eta}k^{2}-2ikx}\,k^{d}\,L_{n}^{d}(k^{2}),\tag{100}$$

where L_n^d(x) denotes the generalized Laguerre polynomials. Notice that the estimator is bounded only for η > 1/2, and below this bound the method would give unbounded statistical errors. However, this bound is well below the values that are reasonably achieved in the laboratory, where actual homodyne detectors have efficiencies ranging between 70% and 90% [11,70]. Moreover, a more efficient algorithm is available that uses the factorization formulas holding for η = 1 [71,72]:

$$R[|n\rangle\langle n+d|](x,\varphi)=e^{id\varphi}\Big[4x\,u_{n}(x)v_{n+d}(x)-2\sqrt{n+1}\,u_{n+1}(x)v_{n+d}(x)-2\sqrt{n+d+1}\,u_{n}(x)v_{n+d+1}(x)\Big],\tag{101}$$
where u_j(x) and v_j(x) are the normalizable and unnormalizable eigenfunctions of the harmonic oscillator with eigenvalue j, respectively. The noise from quantum efficiency can be unbiased via the inversion of the Bernoulli convolution, which holds for η > 1/2 [73].

The use of Equation (92) to estimate arbitrary operators through homodyne tomography will be the subject of Section IV. Notice that Equation (90) cannot be used for unbounded operators; however, the estimators for some unbounded operators will be derived in Section IV.A.

4. Some Generalizations

Using condition (79) one can see that the Glauber formula can be generalized to

$$O=\int_{\mathbb{C}}\frac{d^{2}\alpha}{\pi}\;\mathrm{Tr}\big[O\,F_{1}\,D(\alpha)\,F_{2}\big]\,F_{2}^{-1}\,D^{\dagger}(\alpha)\,F_{1}^{-1},\tag{102}$$
where F₁ and F₂ are two generic invertible operators. By choosing F₁† = F₂ = S(ζ), where S(ζ) is the squeezing operator

$$S(\zeta)=\exp\Big[\tfrac{1}{2}\big(\zeta\,a^{\dagger 2}-\zeta^{*}a^{2}\big)\Big],\qquad \zeta\in\mathbb{C},\tag{103}$$
we obtain the tomographic resolution

$$\langle O\rangle=\int_{0}^{\pi}\frac{d\varphi}{\pi}\int_{-\infty}^{+\infty}dx\;p_{\zeta}(x,\varphi)\;\mathrm{Tr}\big[O\,K(X_{\varphi}^{\zeta}-x)\big],\tag{104}$$
in terms of the probability distribution p_ζ(x, φ) of the generalized squeezed quadrature operators

$$X_{\varphi}^{\zeta}=S^{\dagger}(\zeta)\,X_{\varphi}\,S(\zeta)=\tfrac{1}{2}\Big[(\mu e^{i\varphi}+\nu e^{-i\varphi})\,a^{\dagger}+(\mu e^{-i\varphi}+\nu^{*}e^{i\varphi})\,a\Big],\tag{105}$$
with μ = cosh|ζ| and ν = sinh|ζ| exp[2i arg(ζ)]. Such an estimation technique has been investigated in detail in Ref. [74].

A different estimation technique can be obtained by choosing in Equation (102) F₁ = I, the identity operator, and F₂ = (−1)^{a†a}, the parity operator. In this case one gets

$$O=\int_{\mathbb{C}}\frac{d^{2}\alpha}{\pi}\;\mathrm{Tr}\big[O\,D^{\dagger}(\alpha)\,(-1)^{a^{\dagger}a}\big]\,(-1)^{a^{\dagger}a}\,D(\alpha).\tag{106}$$
Changing variable to α = 2β and using the relation

$$(-1)^{a^{\dagger}a}\,D(2\beta)=D^{\dagger}(\beta)\,(-1)^{a^{\dagger}a}\,D(\beta),\tag{107}$$
it follows that

$$\langle O\rangle=\int_{\mathbb{C}}\frac{d^{2}\beta}{\pi}\;\mathrm{Tr}\big[O\,4D^{\dagger}(\beta)\,(-1)^{a^{\dagger}a}\,D(\beta)\big]\;\mathrm{Tr}\big[D(\beta)\,\rho\,D^{\dagger}(\beta)\,(-1)^{a^{\dagger}a}\big].\tag{108}$$
Hence, it is possible to estimate ⟨O⟩ by repeated measurement of the parity operator on displaced versions of the state under investigation. An approximate implementation of this technique for a single-mode radiation field has been suggested in Refs. [75,76] through the measurement of the photon number probability on states displaced by a beam splitter. A similar
scheme has been used for the experimental determination of the motional quantum state of a trapped atom [15]. In comparison with the approximate methods, Equation (108) allows one to obtain directly the estimator R[O](β) for any operator O for which the trace exists. For instance, the reconstruction of the density matrix in the Fock representation is obtained by averaging the estimator

$$R[|n\rangle\langle n+d|](\beta)=4\,\langle n+d|D^{\dagger}(\beta)\,(-1)^{a^{\dagger}a}\,D(\beta)|n\rangle=4\,(-1)^{n+d}\sqrt{\frac{n!}{(n+d)!}}\,(2\beta)^{d}\,e^{-2|\beta|^{2}}\,L_{n}^{d}(4|\beta|^{2}),\tag{109}$$
without the need of an artificial cutoff in the Fock space [15].

5. Quantum Estimation for Spin Systems

The spin tomographic methods of Refs. [20,28,69] allow the reconstruction of the quantum state of a spin system. These methods utilize measurements of the spin in different directions, i.e., the quorum is the set of operators of the form S⃗ · n⃗, where S⃗ is the spin operator and n⃗ ≡ (cos φ sin ϑ, sin φ sin ϑ, cos ϑ) is a varying unit vector. Different quorums can be used that exploit different sets of directions. The easiest choice for the set of directions n⃗ is to consider all possible directions. The procedure to derive the tomographic formulas for this quorum is analogous to the one employed in Section III.C.3 for homodyne tomography. The reconstruction formula of spin tomography for the estimation of an arbitrary operator O is

$$\langle O\rangle=\sum_{m=-s}^{s}\int_{\Omega}\frac{d\vec{n}}{4\pi}\;p(m,\vec{n})\,R[O](m,\vec{n}),\tag{110}$$
where p(m, n⃗) is the probability of obtaining the eigenvalue m when measuring the spin along direction n⃗, R[O](m, n⃗) is the tomographic estimator for the operator O, and Ω is the unit sphere. In this case the operators C(λ) of Equation (74) are given by the set of projectors over the eigenstates |m, n⃗⟩ of the operators S⃗ · n⃗. Notice that this is a complete set of operators on the system Hilbert space H. In order to find the dual basis B, one must consider the unitary operators obtained by exponentiating the quorum, i.e., D(ψ, n⃗) = exp(iψ S⃗ · n⃗), which satisfy the bi-orthogonality condition (77). In fact, D(ψ, n⃗) constitutes a unitary irreducible representation of the group G = SU(2), and the bi-orthogonality condition is just
the orthogonality relation between the matrix elements of the group representation [77], i.e.,

$$\int_{G}dg\;D_{jr}(g)\,D^{\dagger}_{tk}(g)=\frac{V}{d}\,\delta_{jk}\,\delta_{tr},\tag{111}$$
Z
Z d n~
D
2p
d sin2
2
0
jjei
n~S~
ED jr tjei
n~S~
E jk ¼ jk tr ,
ð112Þ
and hence the spin tomography identity is given by

$$O=\frac{2s+1}{4\pi^{2}}\int_{\Omega}d\vec{n}\int_{0}^{2\pi}d\psi\,\sin^{2}\frac{\psi}{2}\;\mathrm{Tr}\big[O\,D^{\dagger}(\psi,\vec{n})\big]\,D(\psi,\vec{n}).\tag{113}$$
Notice the analogy between Equation (113) and Glauber's formula (90). In fact, both homodyne and spin tomography can be derived using the method of group tomography [20], the underlying groups being the Weyl–Heisenberg group and the SU(2) group, respectively. Formula (110) is obtained from Equation (113) through the expectation value calculated on the eigenstates of S⃗ · n⃗. Thus, the explicit form of the tomographic estimator is

$$R[O](m,\vec{n})=\frac{2s+1}{\pi}\int_{0}^{2\pi}d\psi\,\sin^{2}\frac{\psi}{2}\;\mathrm{Tr}\big[O\,e^{i\psi\,\vec{n}\cdot\vec{S}}\big]\,e^{-i\psi m}.\tag{114}$$
As already noticed, there are other possible quorums for spin tomography. For example, for spin s = 1/2 systems, a self-dual basis for the operator space is given by the identity and the Pauli matrices. In fact, from the properties Tr[σ_α] = 0 and σ_α σ_β = δ_{αβ} I + i ε_{αβγ} σ_γ (α, β, γ = x, y, z),
both the bi-orthogonality relation (77) and the trace condition (79) follow. In this case the reconstruction formula reads

$$\langle O\rangle=\frac{1}{2}\,\mathrm{Tr}[O]+\frac{1}{2}\sum_{\alpha=x,y,z}\;\sum_{m=-1/2}^{1/2}2m\;p(m,\vec{n}_{\alpha})\,\mathrm{Tr}[O\,\sigma_{\alpha}],\tag{115}$$
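The spin-1/2 formula can be checked with a few lines of linear algebra (a sketch: the test state and observable below are assumptions, and exact Born probabilities are used in place of measured frequencies):

```python
import numpy as np

# Sketch: spin-1/2 reconstruction from the Pauli quorum.
# <O> = (1/2)Tr[O] + (1/2) sum_alpha sum_m 2m p(m, n_alpha) Tr[O sigma_alpha],
# with p(m, n_alpha) the probability of outcome m = +-1/2 along axis alpha.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])   # assumed test state
O = np.array([[0.5, 1.0], [1.0, -0.25]], dtype=complex)  # assumed observable

est = 0.5 * np.trace(O)
for s in (sx, sy, sz):
    vals, vecs = np.linalg.eigh(s)                  # eigenvalues -1, +1 (= 2m)
    for col, two_m in zip(vecs.T, vals):
        p = (col.conj() @ rho @ col).real           # Born probability p(m, n_alpha)
        est += 0.5 * two_m * p * np.trace(O @ s)
print(np.isclose(est.real, np.trace(O @ rho).real))  # True: matches Tr[O rho]
```

The inner sum over m reproduces ⟨σ_α⟩, so the formula is just the Bloch decomposition ρ = (I + Σ_α ⟨σ_α⟩σ_α)/2 read backwards; with measured frequencies instead of exact probabilities the same code gives the tomographic estimate.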
In the case of a generic spin s system, Weigert has also shown [69] that by choosing (2s + 1)² arbitrary directions n⃗_j, it is possible to obtain (in almost all cases) a quorum of projectors |s, n⃗_j⟩⟨s, n⃗_j| ( j = 1, . . . , (2s + 1)²), where |s, n⃗_j⟩ is the eigenstate pertaining to the maximum eigenvalue s of S⃗ · n⃗_j.

6. Quantum Estimation for a Free Particle

The state of a moving packet can be inferred from position measurements at different times [78]. Assuming a particle with unit mass and using normalized units ℏ/2 = 1, the free Hamiltonian is the square of the momentum operator, H_F = p². In terms of the eigenvectors |x⟩ of the position operator and of the self-adjoint operator

$$R(x,\tau)=e^{ip^{2}\tau}\,|x\rangle\langle x|\,e^{-ip^{2}\tau},\tag{116}$$
the probability density of the position of the free particle at time τ is obtained as p(x, τ) = Tr[ρ R(x, τ)]. The operators R(x, τ) provide a self-dual basis, and an arbitrary particle state ρ can be written as

$$\rho=\int_{\mathbb{R}}\int_{\mathbb{R}}dx\,d\tau\;p(x,\tau)\,R(x,\tau).\tag{117}$$
D. Noise Deconvolution and Adaptive Tomography

In this section we will analyze: (1) the noise deconvolution scheme of Refs. [20,79], which allows one to eliminate the experimental noise that arises from imperfect detection and lossy devices; and (2) the adaptive tomography technique of Ref. [22], which allows one to tune the unbiased tomographic estimators to a specific sample of experimental data, in order to reduce the statistical noise.
1. Noise Deconvolution

In short, it is possible to eliminate the detection noise whenever the noise map can be inverted. A noise process is described by a trace-preserving, completely positive map Γ. The noise can be deconvolved at the data-analysis stage if

- the inverse of Γ exists, namely Γ⁻¹ : L(H) → L(H), with Γ⁻¹[Γ[O]] = O for all O ∈ L(H);
- the estimator E_λ[O](Q_λ) is in the domain of Γ⁻¹;
- the map Γ⁻¹[E_λ[O](Q_λ)] is a function of Q_λ.
If the above conditions are met, we can recover the ''ideal'' expectation value ⟨O⟩ that we would get without noise. This is achieved by replacing E_λ[O](Q_λ) with Γ⁻¹[E_λ[O](Q_λ)], and evaluating the ensemble average with the state Γ_*(ρ), namely the state affected by the noise (Γ_* represents the dual map, which provides the evolution in the Schrödinger picture). Hence, one has

$$\langle O\rangle=\int_{\Lambda}d\lambda\;\mathrm{Tr}\big[\Gamma^{-1}\big[E_{\lambda}[O](Q_{\lambda})\big]\,\Gamma_{*}(\rho)\big]\equiv\int_{\Lambda}d\lambda\;\big\langle\Gamma^{-1}\big[E_{\lambda}[O](Q_{\lambda})\big]\big\rangle_{\Gamma}.\tag{118}$$
Consider, for example, the noise arising from nonunity quantum efficiency η of homodyne detectors. Recall that the ideal probability density is replaced by a Gaussian convolution with rms Δ_η, where Δ_η² = (1 − η)/(4η). Then, the map Γ_η acts on the quorum as follows

$$\Gamma_{\eta}\big[e^{ikX_{\varphi}}\big]=\int_{-\infty}^{+\infty}dx\;e^{ikx}\,\Gamma_{\eta}\big[|x\rangle_{\varphi\,\varphi}\langle x|\big]=\int_{-\infty}^{+\infty}dx\int_{-\infty}^{+\infty}dx'\;e^{ikx}\,\frac{e^{-(x-x')^{2}/2\Delta_{\eta}^{2}}}{\sqrt{2\pi\Delta_{\eta}^{2}}}\,|x'\rangle_{\varphi\,\varphi}\langle x'|=e^{-\frac{1}{2}\Delta_{\eta}^{2}k^{2}}\,e^{ikX_{\varphi}}.\tag{119}$$
Of course one has

$$\Gamma_{\eta}^{-1}\big[e^{ikX_{\varphi}}\big]=e^{\frac{1}{2}\Delta_{\eta}^{2}k^{2}}\,e^{ikX_{\varphi}}.\tag{120}$$
In terms of the Fourier transform of the estimator

$$\tilde{R}[O](y,\varphi)=\int_{-\infty}^{+\infty}\frac{dx}{2\pi}\;e^{-ixy}\,R[O](x,\varphi),\tag{121}$$
one has

$$\tilde{R}_{\eta}[O](y,\varphi)=e^{\frac{1}{2}\Delta_{\eta}^{2}y^{2}}\,\tilde{R}[O](y,\varphi).\tag{122}$$
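A quick numerical sketch of this Gaussian deconvolution (assumed toy density and grid, not the chapter's data processing): the noise map multiplies each Fourier component by exp(−Δ²k²/2), so multiplying by exp(+Δ²k²/2) in Fourier space undoes it exactly:

```python
import numpy as np

# Toy sketch of Eqs. (119)-(122): convolve a density with a Gaussian of rms
# delta (the noise map), then deconvolve by dividing its Fourier transform by
# exp(-delta^2 k^2 / 2). Grid, delta, and test density are assumptions.
n, L, delta = 256, 20.0, 0.1
x = np.linspace(-L / 2, L / 2, n, endpoint=False)
k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)

p = np.exp(-(x - 1.0) ** 2) + 0.5 * np.exp(-((x + 2.0) ** 2) / 0.5)  # test density

smear = np.exp(-0.5 * delta**2 * k**2)                  # noise in Fourier space
p_noisy = np.fft.ifft(np.fft.fft(p) * smear).real
p_rec = np.fft.ifft(np.fft.fft(p_noisy) / smear).real   # deconvolution

print(np.max(np.abs(p_rec - p)) < 1e-8)                 # True: density recovered
```

Dividing by the smear amplifies high-frequency components, which is the numerical face of the exponentially increased statistical fluctuations noted after Equation (98): with real, noisy data the anti-Gaussian factor magnifies the statistical errors rather than a clean roundoff floor.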
We applied the above result in Section III.C.3, where the effect of nonunity quantum efficiency on the reconstruction of the density matrix elements was discussed. The use of the estimator in Equation (98) and the origin of the bound η > 1/2 are now clearer. Another simple example of noise deconvolution is given here for a spin-1/2 system. Consider the map that describes the ''depolarizing channel''

$$\Gamma_{p}[O]=(1-p)\,O+\frac{p}{2}\,\mathrm{Tr}[O]\;I,\qquad 0\le p\le 1.\tag{123}$$

This map can be inverted for p ≠ 1 as follows:

$$\Gamma_{p}^{-1}[O]=\frac{1}{1-p}\Big(O-\frac{p}{2}\,\mathrm{Tr}[O]\;I\Big).\tag{124}$$
Then Equation (115) can be replaced with

$$\langle O\rangle=\frac{1}{2}\,\mathrm{Tr}[O]+\frac{1}{2(1-p)}\sum_{\alpha=x,y,z}\;\sum_{m=-1/2}^{1/2}2m\;p_{p}(m,\vec{n}_{\alpha})\,\mathrm{Tr}[O\,\sigma_{\alpha}],\tag{125}$$
where now p_p(m, n⃗_α) represents the probability of outcome m when measuring on the noisy state Γ_{p*}[ρ].

2. Adaptive Tomography

The idea of adaptive tomography is that the tomographic null estimators of Equation (86) can be used to reduce the statistical errors. In fact, the addition of a null estimator in the ideal case of infinite statistics does not change the average, since its mean value is zero, but it can change the variance. Thus, one can look for a procedure to reduce the variance by adding suitable null functions. Consider the class of equivalent estimators for O

$$E'_{\lambda}[O](Q_{\lambda})=E_{\lambda}[O](Q_{\lambda})+\sum_{i=1}^{M}\nu_{i}\,\mathcal{N}_{i}(Q_{\lambda}).\tag{126}$$
MAURO D’ARIANO ET AL.
Each estimator in the class E' is identified by the coefficient vector \vec{\lambda}. The variance of the tomographic averages can be evaluated as

\Delta^2 E'[O] = \Delta^2 E[O] + 2 \sum_{i=1}^{M} \lambda_i\, \overline{N_i\, E[O]} + \sum_{i,j=1}^{M} \lambda_i \lambda_j\, \overline{N_i N_j},   (127)

where \overline{F} \doteq \int d\lambda\, \langle F(Q_\lambda) \rangle, and

\Delta^2 E[O] = \overline{E^2[O]} - \overline{E[O]}^2.   (128)
Minimizing \Delta^2 E'[O] with respect to the coefficients \lambda_i, one obtains the linear system

\sum_{j=1}^{M} \lambda_j\, \overline{N_i N_j} = -\overline{E[O]\, N_i},   (129)
which can be solved starting from the estimated mean values, with the vector \vec{\lambda} as unknown. Notice that the obtained vector \vec{\lambda} will depend on the experimental data and has to be recalculated with the above procedure for any new set of data. In this way one obtains an adaptive tomographic algorithm, which consists of the following steps:

1. Find the null estimators N_i(Q_\lambda) (i = 1, ..., M) for the quorum that is being used in the experiment.
2. Execute the experiment and collect the input data.
3. Calculate, using the obtained data, the mean values \overline{N_i N_j} and \overline{E[O] N_i}, and solve the linear system (129) to obtain \vec{\lambda}.
4. Use the vector \vec{\lambda} obtained in the previous step to build the ''optimized estimator'' E'_\lambda[O](Q_\lambda) = E_\lambda[O](Q_\lambda) + \sum_i \lambda_i N_i(Q_\lambda).

Using the data collected in the first step, the mean value \langle O \rangle is now evaluated as

\langle O \rangle = \int d\lambda\, \langle E'_\lambda[O](Q_\lambda) \rangle,   (130)
where the optimized estimator has been used. For each new set of data the whole procedure must be repeated, as \vec{\lambda} depends on the data.
Notice that the experimental mean values themselves are also slightly modified in the adaptive tomographic process, since null estimators leave mean values unchanged only in the limiting case of infinite statistics. Examples of simulations of the adaptive technique that efficiently reduce the statistical noise of homodyne tomographic reconstructions can be found in Ref. [22]. In homodyne tomography, null estimators are obtained as linear combinations of the following functions

N_{k,n}(X_\varphi) = X_\varphi^{k}\, e^{i(k+2+2n)\varphi}, \qquad k, n \ge 0.   (131)
One can easily check that such functions have zero average over \varphi, independently of the state \rho. Hence, for every operator O one actually has an equivalence class of infinitely many unbiased estimators, which differ by a linear combination of functions N_{k,n}(X_\varphi). It is then possible to minimize the rms error within the equivalence class by the least-squares method, obtaining in this way an optimal estimator that is adapted to the particular set of experimental data.
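The procedure can be tried on simulated data. In the following sketch (the coherent-state amplitude, the particular pair of real null functions built from N_{1,0}, the sample size, and the seed are all illustrative assumptions), the least-squares system (129) is solved for the photon-number estimator 2x^2 - 1/2 of Table 1 at unit quantum efficiency:

```python
import numpy as np

# Adaptive null-estimator sketch on simulated homodyne data (eta = 1).
rng = np.random.default_rng(0)
alpha = 1.5                      # assumed coherent amplitude
nbar = alpha ** 2
N = 200_000

phi = rng.uniform(0.0, np.pi, N)
x = rng.normal(alpha * np.cos(phi), 0.5)   # quadrature outcomes, variance 1/4

h = 2 * x ** 2 - 0.5                       # estimator of <a^dag a> (Table 1)
# Two real null functions derived from N_{1,0}(X_phi) = X_phi exp(3 i phi):
f = np.stack([x * np.cos(3 * phi), x * np.sin(3 * phi)])

# Least-squares solution of Eq. (129): <f f^T> lambda = -<h f>
lam = np.linalg.solve(f @ f.T / N, -(f @ h) / N)
h_adapted = h + lam @ f

var0, var1 = h.var(), h_adapted.var()      # variance before / after adaptation
```

The null functions have (nearly) zero sample mean, so the average is essentially unchanged, while the sample variance of the optimized estimator is reduced.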
IV. UNIVERSAL HOMODYNING

As shown in Ref. [16], homodyne tomography can be used as a kind of universal detector for measuring generic field operators, at the expense, however, of some additional noise. In this section the general class of field operators that can be measured in this way is reviewed, a class which also includes operators that are inaccessible to heterodyne detection. In Ref. [29] the most relevant observables were analyzed (the intensity, the real field, the complex field, and the phase), showing how their tomographic measurements are affected by noise that is always larger than the intrinsic noise of the direct detection of the considered observables. On the other hand, by comparing the noise from homodyne tomography with that from heterodyning (for those operators that can be measured in both ways), it was shown in Ref. [29] that for some operators homodyning is better than heterodyning when the mean photon number is sufficiently small, i.e., in the quantum regime; such comparisons will also be reviewed in this section.

A. Homodyning Observables

Homodyne tomography provides the maximum achievable information on the quantum state of a single-mode radiation field through the use of the estimators in Section III.C.3. In principle, the knowledge of the density matrix should allow one to calculate the expectation value even of unbounded operators. However, this is generally true only when one has analytic knowledge of the density matrix; it no longer holds when the matrix has been
obtained experimentally. In fact, the Hilbert space is actually infinite dimensional, whereas experimentally one can achieve only a finite matrix, each element being affected by an experimental error. Notice that, even though the method allows one to extract any matrix element in the Hilbert space from the same bunch of experimental data, it is the way in which errors converge in the Hilbert space that determines the actual possibility of estimating the trace \langle O \rangle = \mathrm{Tr}[O\rho] for an arbitrary operator O. This issue has been debated in the set of papers of Ref. [73]. Consider, for example, the number representation, and suppose that we want to estimate the average photon number \langle a^\dagger a \rangle. In Ref. [80] it has been shown that for nonunit quantum efficiency the statistical error for the diagonal matrix element \langle n|\rho|n \rangle diverges faster than exponentially versus n, whereas for \eta = 1 the error saturates for large n at the universal value \varepsilon_n = \sqrt{2/N}, which depends only on the number N of experimental data but is independent of both n and the quantum state. Even in the unrealistic case \eta = 1, one sees immediately that the estimated expectation value \langle a^\dagger a \rangle = \sum_{n=0}^{H-1} n \rho_{nn}, based on the measured matrix elements \rho_{nn}, will exhibit an unbounded error versus the truncated-space dimension H, because the nonvanishing error of \rho_{nn} versus n multiplies the increasing eigenvalue n. Here, we report the estimators valid for any operator that admits a normal ordered expansion, giving the general class of operators that can be measured in this way, also as a function of the quantum efficiency \eta. Hence, from the same tomographic experiment one can obtain not only the density matrix, but also the expectation value of various field operators, including unbounded operators and some that are inaccessible to heterodyne detection. However, the price to pay for such flexibility is that all measured quantities will be affected by noise.
If one compares this noise with that from heterodyning (for those operators that can be measured in both ways), it turns out that for some operators homodyning is nevertheless less noisy than heterodyning, at least for small mean photon numbers. The procedure for estimating the expectation \langle O \rangle will be referred to as homodyning the observable O. By homodyning the observable O we mean averaging an appropriate estimator R[O](x,\varphi), independent of the state \rho, over the experimental homodyne data, achieving in this way the expectation value \langle O \rangle for every state \rho, as in Equation (92). For unbounded operators one can obtain the explicit form of the estimator R[O](x,\varphi) in a different way. Starting from the identity involving trilinear products of Hermite polynomials [81]

\int_{-\infty}^{+\infty} dx\, e^{-x^2} H_k(x) H_m(x) H_n(x) = \frac{2^{(m+n+k)/2}\, \pi^{1/2}\, k!\, m!\, n!}{(s-k)!\,(s-m)!\,(s-n)!},   (132)
for k + m + n = 2s even, Richter proved the following nontrivial formula for the expectation value of the normally ordered field operators [82]

\langle a^{\dagger n} a^{m} \rangle = \int_0^{\pi} \frac{d\varphi}{\pi} \int_{-\infty}^{+\infty} dx\, p(x,\varphi)\, e^{i(m-n)\varphi}\, \frac{H_{n+m}(\sqrt{2}\, x)}{\sqrt{2^{n+m}}\, \binom{n+m}{n}},   (133)

which corresponds to the estimator

R[a^{\dagger n} a^{m}](x,\varphi) = e^{i(m-n)\varphi}\, \frac{H_{n+m}(\sqrt{2}\, x)}{\sqrt{2^{n+m}}\, \binom{n+m}{n}}.   (134)

This result can easily be extended to the case of nonunit quantum efficiency \eta < 1. Using Equation (122) one obtains

R_\eta[a^{\dagger n} a^{m}](x,\varphi) = e^{i(m-n)\varphi}\, \frac{H_{n+m}(\sqrt{2\eta}\, x)}{\sqrt{(2\eta)^{n+m}}\, \binom{n+m}{n}}.   (135)
From Equation (135), by linearity one can obtain the estimator R_\eta[f](x,\varphi) for any operator function f that has a normal ordered expansion

f \doteq f(a, a^\dagger) = \sum_{n,m=0}^{\infty} f^{(N)}_{nm}\, a^{\dagger n} a^{m}.   (136)

One obtains

R_\eta[f](x,\varphi) = \sum_{s=0}^{\infty} \frac{H_s(\sqrt{2\eta}\, x)}{s!\, (2\eta)^{s/2}} \sum_{n,m=0}^{\infty} f^{(N)}_{nm}\, e^{i(m-n)\varphi}\, n!\, m!\, \delta_{n+m,s}
 = \sum_{s=0}^{\infty} \frac{H_s(\sqrt{2\eta}\, x)\, i^s}{s!\, (2\eta)^{s/2}} \left. \frac{d^s}{dv^s} \right|_{v=0} F_\eta[f](v,\varphi),   (137)

where

F_\eta[f](v,\varphi) = \sum_{n,m=0}^{\infty} f^{(N)}_{nm}\, \binom{n+m}{m}^{-1} (-iv)^{n+m}\, e^{i(m-n)\varphi}.   (138)

Continuing from Equation (137) one has

R_\eta[f](x,\varphi) = \left. \exp\left[ \frac{1}{2\eta} \frac{d^2}{dv^2} + 2ix \frac{d}{dv} \right] F_\eta[f](v,\varphi) \right|_{v=0},   (139)
and finally

R_\eta[f](x,\varphi) = \int_{-\infty}^{+\infty} \frac{dw}{\sqrt{2\pi\eta^{-1}}}\; e^{-(\eta/2) w^2}\, F_\eta[f](w + 2ix, \varphi).   (140)
Hence one concludes that the operator f can be measured by homodyne tomography if the function F_\eta[f](v,\varphi) in Equation (138) grows slower than e^{\eta v^2/2} for v \to \infty, and if the integral in Equation (140) grows at most exponentially for x \to \infty (assuming p(x,\varphi) goes to zero faster than exponentially at x \to \infty). The robustness of this method of homodyning observables against additive phase-insensitive noise has also been analyzed in Ref. [16], where it was shown that just half a photon of thermal noise would completely spoil the measurement of the density matrix elements in the Fock representation. In Table 1 we report the estimator R_\eta[O](x,\varphi) for some operators O. The operator \hat{W}_s gives the generalized Wigner function W_s(\alpha, \alpha^*) for ordering parameter s through the relation in Equation (11). From the expression of R_\eta[\hat{W}_s](x,\varphi) it follows that by homodyning with quantum efficiency \eta one can measure the generalized Wigner function only for s < 1 - \eta^{-1}: in particular, the usual Wigner function for s = 0 cannot be measured for any quantum efficiency.

B. Noise in Tomographic Measurements

In this section we review the analysis of Ref. [29], where the tomographic measurement of four relevant field quantities was studied: the field intensity, the real field (quadrature), the complex field, and the phase. For all these quantities the conditions given after Equation (140) are fulfilled.

TABLE 1
ESTIMATOR R_\eta[O](x,\varphi) FOR SOME OPERATORS O (FROM [16])

  O                       R_\eta[O](x,\varphi)
  a^{\dagger n} a^{m}     e^{i(m-n)\varphi}\, H_{n+m}(\sqrt{2\eta}\, x) / [\sqrt{(2\eta)^{n+m}}\, \binom{n+m}{n}]
  a                       2 e^{i\varphi}\, x
  a^2                     e^{2i\varphi} (4x^2 - 1/\eta)
  a^\dagger a             2x^2 - 1/(2\eta)
  (a^\dagger a)^2         (8/3)x^4 - [(4 - 2\eta)/\eta]\, x^2 + (1 - \eta)/(2\eta^2)
  \hat{W}_s = \frac{2}{\pi(1-s)} \left(\frac{s+1}{s-1}\right)^{a^\dagger a}     \int_0^{\infty} dt\, \frac{2 e^{-t}}{\pi[(1-s) - \eta^{-1}]} \cos\!\left( 2x \sqrt{\frac{2t}{(1-s) - \eta^{-1}}} \right)
  |n\rangle\langle n+d|   R[|n\rangle\langle n+d|](x,\varphi) in Equation (100)
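The estimators of Table 1 can be checked with a small Monte Carlo experiment. The following sketch (coherent-state amplitude, quantum efficiency, sample size, and seed are all arbitrary assumptions) samples homodyne outcomes of a coherent state at efficiency eta, for which the outcome distribution is Gaussian with mean alpha cos(phi) and variance 1/(4 eta), and averages three Table 1 estimators:

```python
import numpy as np

# Monte Carlo homodyning of a coherent state (illustrative sketch).
rng = np.random.default_rng(42)
alpha, eta, N = 1.2, 0.7, 200_000

phi = rng.uniform(0.0, np.pi, N)                     # random LO phases
x = rng.normal(alpha * np.cos(phi), np.sqrt(1 / (4 * eta)))  # homodyne data

# Table 1 estimators averaged over the data:
a_est = (2 * np.exp(1j * phi) * x).mean()            # estimates <a>       = alpha
n_est = (2 * x ** 2 - 1 / (2 * eta)).mean()          # estimates <a^dag a> = alpha^2
a2_est = (np.exp(2j * phi) * (4 * x ** 2 - 1 / eta)).mean()  # <a^2> = alpha^2
```

All three averages converge to the ideal expectation values even though the data were collected with eta < 1, which is the content of Equation (135).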
The tomographic measurement of the observable O is provided in terms of the average \overline{w} of the estimator w \doteq R_\eta[O](x,\varphi) over the homodyne data. The precision of the measurement is given by the confidence interval \sqrt{\Delta \overline{w}^2}. When w is a real quantity, one has

\Delta \overline{w}^2 = \overline{w^2} - \overline{w}^2,   (141)

where

\overline{w^2} \doteq \overline{R^2_\eta[O]} = \int_0^{\pi} \frac{d\varphi}{\pi} \int_{-\infty}^{+\infty} dx\, p_\eta(x,\varphi)\, R^2_\eta[O](x,\varphi).   (142)

When w is complex, one has to consider the eigenvalues of the covariance matrix, namely

\Delta \overline{w}^2 = \frac{1}{2} \left[ \overline{|w|^2} - |\overline{w}|^2 + \left| \overline{w^2} - \overline{w}^2 \right| \right].   (143)

When the observable O can also be directly measured by a specific setup, we can compare the tomographic precision \Delta \overline{w}^2 with \langle \Delta O^2 \rangle_\eta = \langle O^2 \rangle_\eta - \langle O \rangle_\eta^2. Notice that for \eta < 1 the noise \langle \Delta O^2 \rangle_\eta is larger than the quantum fluctuations, because of the smearing effect of nonunit quantum efficiency. As we will see, the tomographic measurement is always more noisy than the corresponding direct measurement, for any observable and at any quantum efficiency \eta. This is not surprising, in view of the larger amount of information retrieved in the tomographic measurement as compared with the direct measurement of a single quantity. According to Equation (142), the evaluation of the added noise requires the average of the squared estimator. For the estimators in Equation (135) it is very useful to consider the following identity for the Hermite polynomials [83]

H_n^2(x) = 2^n (n!)^2 \sum_{k=0}^{n} \frac{H_{2k}(x)}{(k!)^2\, 2^k\, (n-k)!},   (144)

which allows one to write

R^2_\eta[a^{\dagger n} a^{m}](x,\varphi) = e^{2i\varphi(m-n)}\, \frac{(n!)^2 (m!)^2}{\eta^{m+n}} \sum_{k=0}^{m+n} \frac{(2k)!\, \eta^k}{(k!)^4\, (n+m-k)!}\, R_\eta[a^{\dagger k} a^{k}](x,\varphi),   (145)
namely, the squared estimator R^2_\eta[a^{\dagger n} a^{m}](x,\varphi) can be written just in terms of the ''diagonal'' estimators R_\eta[a^{\dagger k} a^{k}](x,\varphi).

1. Field Intensity

Photodetection is the direct measurement of the field intensity. For nonunit quantum efficiency \eta, the probability of detecting m photons is given by the Bernoulli convolution in Equation (22). Let us consider the rescaled photocurrent

I_\eta = \frac{1}{\eta}\, a^\dagger a,   (146)

which traces the photon number, namely

\langle I_\eta \rangle = \frac{1}{\eta} \sum_{m=0}^{\infty} m\, p_\eta(m) = \langle a^\dagger a \rangle \doteq \bar{n}.   (147)

The variance of I_\eta is given by

\langle \Delta I_\eta^2 \rangle = \frac{1}{\eta^2} \sum_{m=0}^{\infty} m^2\, p_\eta(m) - \bar{n}^2 = \langle \Delta n^2 \rangle + \bar{n} \left( \frac{1}{\eta} - 1 \right),   (148)

where \langle \Delta n^2 \rangle denotes the intrinsic photon number variance and \bar{n}(\eta^{-1} - 1) represents the noise introduced by inefficient detection. The tomographic estimator that traces the photon number is given by the phase-independent function w \doteq 2x^2 - (2\eta)^{-1}. Using Equation (145) we can evaluate its variance as follows

\Delta \overline{w}^2 = \langle \Delta n^2 \rangle + \frac{1}{2} \langle n^2 \rangle + \bar{n} \left( \frac{2}{\eta} - \frac{3}{2} \right) + \frac{1}{2\eta^2}.   (149)

The noise N[n] added by tomography in the measurement of the field intensity n is then given by

N[n] = \Delta \overline{w}^2 - \langle \Delta I_\eta^2 \rangle = \frac{1}{2} \left[ \langle n^2 \rangle + \bar{n} \left( \frac{2}{\eta} - 1 \right) + \frac{1}{\eta^2} \right].   (150)

Notice that N[n] is always positive and depends strongly on the state under examination. For coherent states we have the noise ratio

\delta n \doteq \sqrt{ \frac{\Delta \overline{w}^2}{\langle \Delta I_\eta^2 \rangle} } = \left[ 2 + \frac{\eta \bar{n}}{2} + \frac{1}{2 \eta \bar{n}} \right]^{1/2},   (151)

which is minimum for \bar{n} = \eta^{-1}.
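A short numerical check of Equation (151) (the value of eta and the grid are arbitrary assumptions) confirms the location and value of the minimum of the noise ratio:

```python
import numpy as np

# Minimum of the coherent-state noise ratio of Eq. (151):
# delta_n = sqrt(2 + eta*nbar/2 + 1/(2*eta*nbar)), minimized at nbar = 1/eta.
eta = 0.4
nbar = np.linspace(0.05, 20.0, 20000)
delta_n = np.sqrt(2 + eta * nbar / 2 + 1 / (2 * eta * nbar))
n_min = nbar[np.argmin(delta_n)]          # should sit at 1/eta = 2.5
```

At the minimum the ratio equals sqrt(3), independently of eta.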
2. Real Field

For single-mode radiation the electric field is proportional to a quadrature X = (a + a^\dagger)/2, which is just traced by homodyne detection at fixed zero phase with respect to the local oscillator. The tomographic estimator is given by w \doteq R_\eta[X](x,\varphi) = 2x \cos\varphi, independent of \eta, whereas the squared estimator R^2_\eta[X] can be written as

w^2 = \frac{1}{4} \left( R_\eta[a^2](x,\varphi) + R_\eta[a^{\dagger 2}](x,\varphi) \right) + R_\eta[a^\dagger a](x,\varphi) + \frac{1 + \cos(2\varphi)}{2\eta}.   (152)

Then one has

\Delta \overline{w}^2 = \frac{1}{4} \left( \langle a^{\dagger 2} \rangle + \langle a^2 \rangle \right) + \bar{n} + \frac{1}{2\eta} - \frac{1}{4} \langle a + a^\dagger \rangle^2 = \langle \Delta X^2 \rangle + \frac{\bar{n}}{2} + \frac{2 - \eta}{4\eta},   (153)

where \langle \Delta X^2 \rangle represents the intrinsic quadrature fluctuations. The tomographic noise in Equation (153) can be compared with the rms variance of direct homodyne detection (see Section II.C)

\langle \Delta X^2 \rangle_\eta = \langle \Delta X^2 \rangle + \frac{1 - \eta}{4\eta}.   (154)

Then the added noise reads

N[X] = \frac{1}{2} \left( \bar{n} + \frac{1}{2\eta} \right).   (155)

For coherent states \langle \Delta X^2 \rangle = 1/4, and one has the noise ratio

\delta x \doteq \sqrt{ \frac{\Delta \overline{w}^2}{\langle \Delta X^2 \rangle_\eta} } = \sqrt{ 2 \eta \bar{n} + 2 }.   (156)
3. Field Amplitude

The detection of the complex field amplitude of a single-mode light beam is represented by the generalized measurement of the annihilation operator a. The tomographic estimator for a is given by the complex function w \doteq R_\eta[a](x,\varphi) = 2x\, e^{i\varphi}, and the precision of the measurement is evaluated as in Equation (143). From Equation (145) one obtains

w^2 \doteq R^2_\eta[a](x,\varphi) = e^{2i\varphi} \left( \frac{1}{\eta} + 2 R_\eta[a^\dagger a](x,\varphi) \right) = \frac{e^{2i\varphi}}{\eta} + R_\eta[a^2](x,\varphi),   (157)

and

|w|^2 \doteq |R_\eta[a](x,\varphi)|^2 = \frac{1}{\eta} + 2 R_\eta[a^\dagger a](x,\varphi),   (158)

and hence

\Delta \overline{w}^2 = \frac{1}{2} \left[ \frac{1}{\eta} + 2\bar{n} - |\langle a \rangle|^2 + \left| \langle a^2 \rangle - \langle a \rangle^2 \right| \right].   (159)

The optimal measurement of the complex field a is obtained through heterodyne detection. As noticed in Section II.D, the probability distribution is given by the generalized Wigner function W_s(\alpha, \alpha^*), with s = 1 - 2/\eta. Using Equation (56) the precision of the measurement is easily evaluated as

\langle \Delta a^2 \rangle_\eta = \frac{1}{2} \left[ \langle |\alpha|^2 \rangle - |\langle \alpha \rangle|^2 + \left| \langle \alpha^2 \rangle - \langle \alpha \rangle^2 \right| \right] = \frac{1}{2} \left[ \frac{1}{\eta} + \bar{n} - |\langle a \rangle|^2 + \left| \langle a^2 \rangle - \langle a \rangle^2 \right| \right].   (160)
The noise added by quantum tomography then reads

N[a] = \Delta \overline{w}^2 - \langle \Delta a^2 \rangle_\eta = \frac{\bar{n}}{2},   (161)

which is independent of the quantum efficiency. For a coherent state we have

\Delta \overline{w}^2 = \frac{1}{2} \left( \bar{n} + \frac{1}{\eta} \right), \qquad \langle \Delta a^2 \rangle_\eta = \frac{1}{2\eta},   (162)

and the noise ratio is then

\delta a \doteq \sqrt{ \frac{\Delta \overline{w}^2}{\langle \Delta a^2 \rangle_\eta} } = \sqrt{ 1 + \eta \bar{n} }.   (163)
4. Phase

The canonical description of the quantum optical phase is given by the probability operator measure [53,84]

d\mu(\varphi) = \frac{d\varphi}{2\pi} \sum_{n,m=0}^{\infty} e^{i(m-n)\varphi}\, |n\rangle\langle m|.   (164)

However, no feasible setup is known that achieves the optimal measurement (164). For this reason, here we consider the heterodyne measurement of the phase and compare it with the phase of the tomographic estimator for the corresponding field operator a, i.e., w = \arg(2x\, e^{i\varphi}). Notice that the phase w does not coincide with the local oscillator phase \varphi, because x has varying sign. The probability distribution of w can be obtained from the following identity

\int_0^{\pi} \frac{d\varphi}{\pi} \int_{-\infty}^{+\infty} dx\, p_\eta(x,\varphi) = 1 = \int_{-\pi}^{\pi} \frac{dw}{\pi} \int_0^{\infty} dx\, p_\eta(x, w),   (165)

which implies

p_\eta(w) = \frac{1}{\pi} \int_0^{\infty} dx\, p_\eta(x, w).   (166)

The precision of the tomographic phase measurement is given by the rms variance \Delta \overline{w}^2 of the probability (166). In the case of a coherent state with positive amplitude, |\alpha\rangle \doteq ||\alpha|\rangle, Equation (166) gives

p_\eta(w) = \frac{1}{2\pi} \left[ 1 + \mathrm{Erf}\!\left( \sqrt{2\eta}\, |\alpha| \cos w \right) \right],   (167)

which approaches a ''boxed'' distribution on [-\pi/2, \pi/2] for large intensity |\alpha| \gg 1. We compare the tomographic phase measurement with heterodyne detection, namely with the phase of the directly detected complex field a. The outcome probability distribution is the marginal distribution of the generalized Wigner function W_s(\alpha, \alpha^*) (s = 1 - 2/\eta) integrated over the radius

p_\eta(\varphi) = \int_0^{\infty} \rho\, d\rho\, W_s(\rho e^{i\varphi}, \rho e^{-i\varphi}),   (168)
whereas the precision of the phase measurement is given by its rms variance \langle \Delta \varphi^2 \rangle. We are not able to give a closed formula for the added noise N[\varphi] = \Delta \overline{w}^2 - \langle \Delta \varphi^2 \rangle. However, for highly excited coherent states |\alpha\rangle \doteq ||\alpha|\rangle (zero mean phase) one has \Delta \overline{w}^2 = \pi^2/12 and \langle \Delta \varphi^2 \rangle = (2\eta\bar{n})^{-1}. The asymptotic noise ratio is thus given by

\delta \varphi \doteq \sqrt{ \frac{\Delta \overline{w}^2}{\langle \Delta \varphi^2 \rangle} } = \pi \sqrt{ \frac{\eta \bar{n}}{6} }, \qquad \bar{n} \gg 1.   (169)
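The asymptotic value pi^2/12 of the tomographic phase variance can be checked directly from Equation (167). In the following sketch (the amplitude, efficiency, and grid resolution are arbitrary assumptions) the distribution is evaluated on a grid and its variance computed numerically:

```python
import numpy as np
from math import erf, pi

# Variance of the tomographic phase distribution, Eq. (167):
# for large |alpha| it approaches the box on [-pi/2, pi/2], variance pi^2/12.
eta, amp = 1.0, 50.0                      # quantum efficiency, |alpha|
c = np.sqrt(2 * eta) * amp
w = np.linspace(-pi, pi, 20001)
p = np.array([(1 + erf(c * np.cos(wi))) / (2 * pi) for wi in w])
dw = w[1] - w[0]
p /= p.sum() * dw                         # renormalize on the finite grid
var_w = (w ** 2 * p).sum() * dw           # rms variance of the phase
```

For amp = 50 the result already agrees with pi^2/12 to better than one percent.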
A comparison for weakly excited coherent states can be performed numerically. The noise ratio \delta\varphi (expressed in dB) is shown in Figure 3 for some values of the quantum efficiency \eta. It is apparent that the tomographic determination of the phase is noisier than heterodyning in this low-intensity regime as well. In Table 2 a synthesis of the results of this section is reported. We have considered the ratio between the tomographic and the direct-measurement noise. This is an increasing function of the mean photon number \bar{n}, scaled by the quantum efficiency \eta. Therefore, homodyne tomography turns out to be a very robust detection scheme at low quantum efficiency. In Figure 4 the coherent-state noise ratios (in dB) for all the considered quantities are plotted versus \bar{n} for unit quantum efficiency.
FIGURE 3. Ratio between tomographic and heterodyne noise in the measurement of the phase for weakly excited coherent states. The noise ratio is reported versus the mean photon number \bar{n} for some values of the quantum efficiency. From bottom to top: \eta = 0.2, 0.4, 0.6, 0.8, 1.0. (From Ref. [29].)
TABLE 2
ADDED NOISE N[O] IN THE TOMOGRAPHIC MEASUREMENT OF O AND NOISE RATIO \delta O FOR COHERENT STATES. FOR THE PHASE THE RESULTS ARE VALID IN THE ASYMPTOTIC REGIME \bar{n} \gg 1 (FROM REF. [29])

  O              N[O]                                                                \delta O
  a^\dagger a    (1/2)\,[\langle n^2 \rangle + \bar{n}((2/\eta) - 1) + 1/\eta^2]     [2 + \eta\bar{n}/2 + 1/(2\eta\bar{n})]^{1/2}
  X              (1/2)\,[\bar{n} + 1/(2\eta)]                                        [2(1 + \eta\bar{n})]^{1/2}
  a              (1/2)\,\bar{n}                                                      (1 + \eta\bar{n})^{1/2}
  \varphi        (\pi^2/12) - 1/(2\eta\bar{n})                                       \pi \sqrt{\eta\bar{n}/6}
FIGURE 4. The coherent-state noise ratio (in dB) for all the quantities considered in this section. (From Ref. [29].)
In conclusion, homodyne tomography adds larger noise for highly excited states; however, it is not too noisy in the quantum regime of low \bar{n}. It is then very useful in this regime, where currently available photodetectors suffer the most limitations. Indeed, it has been adopted in experiments of photodetection [10,11].
C. Comparison between Homodyne Tomography and Heterodyning

We have seen that homodyne tomography allows one to measure any field observable f \doteq f(a, a^\dagger) having a normal ordered expansion f \doteq f^{(N)}(a, a^\dagger) = \sum_{n,m=0}^{\infty} f^{(N)}_{nm} a^{\dagger n} a^{m} and a bounded integral in Equation (140). On the other hand, as shown in Section II.D, heterodyne detection allows one to measure field observables that admit an antinormal ordered expansion f \doteq f^{(A)}(a, a^\dagger) = \sum_{n,m=0}^{\infty} f^{(A)}_{nm} a^{m} a^{\dagger n}, in which case the expectation value is obtained through the heterodyne average

\langle f \rangle = \int_{\mathbb{C}} \frac{d^2\alpha}{\pi}\, f^{(A)}(\alpha, \alpha^*)\, \langle \alpha | \rho | \alpha \rangle.   (170)
As shown in Section II.D, for ¼ 1 the heterodyne probability is just the Q-function Qð , * Þ ¼ ð1=pÞh jj i, whereas for < 1 it is Gaussian convoluted with rms ð1 Þ=, thus giving the Wigner function Ws ð , * Þ, with s ¼ 1 ð2=Þ. Indeed, the problem of measurability of the observable f through heterodyne detection is not trivial, since one needs the admissibility of antinormal ordered expansion and the convergence of the integral in Equation (170). We refer the reader to Refs. [16,59] for more details and to Refs. [58,60] for analysis of quantum state estimates based on heterodyne detection. The additional noise in homodyning the complex field a has been evaluated in Equation (161), where we found that homodyning is always more noisy than heterodyning. On the other hand, for other field observables it may happen that homodyne tomography is less noisy than heterodyne detection. For example, the added noise in homodyning the intensity aya with respect to direct detection has been evaluated in Equation (150). Analogously, one can easily evaluate the added noise Nhet ½n when heterodyning the photon number n ¼ ay a. According to Equation (56), the random variable corresponding to the photon number for heterodyne detection with quantum efficiency is ð Þ ¼ j j2 ð1=Þ. From the relation j j4 ¼ ha2 ay2 i þ 4
1 1 2 haay i þ 2
ð171Þ
one obtains 2
ð Þ ¼ hn2 i þ n
2 1 1 þ 2:
ð172Þ
Upon comparing with Equation (148), one concludes that the added noise in heterodyning the photon number is given by D E 1 Nhet ½n ¼ 2 ðzÞ I2 ¼ 2 ðn 1Þ:
ð173Þ
QUANTUM TOMOGRAPHY
255
With respect to the added noise in homodyning of Equation (150) one has Nhet ½n ¼ N½n
1 1 hn2 i n 2 : 2
ð174Þ
Since hn2 i n2 , we can conclude that homodyning the photon number is less noisy than heterodyning pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi it for sufficiently low mean photon number hni < (1/2)ð1 þ 1 þ ð4=2 ÞÞ.
V. MULTIMODE HOMODYNE TOMOGRAPHY The generalization of homodyne tomography from a single-mode to a multimode field is quite obvious, the estimator of simple operator tensors O ¼ O1 O2 On being just the product of the estimators of each single-mode operator O1 ,O1 , . . . ,On . By linearity, one then obtains also the estimator for arbitrary multimode operators. Such a simple generalization, however, requires a separate homodyne detector for each mode, which is unfeasible when the modes of the field are not spatiotemporally separated. This is the case, for example, of pulsed fields, for which a general multimode tomographic method is especially needed, also due to the problem of mode matching between the local oscillator and the detected fields (determined by their relative spatiotemporal overlap) [85], which produces a dramatic reduction of the overall quantum efficiency. In this section we review the general method of Ref. [17] for homodyning observables of a multimode electromagnetic field using a single local oscillator (LO), providing the rule to evaluate the estimator of an arbitrary multimode operator. The expectation value of the operator can then be obtained by averaging the estimator over the homodyne outcomes that are collected using a single LO whose mode randomly scans all possible linear combinations of incident modes. We will then specifically consider some observables for a two-mode field in a state corresponding to a twin-beam produced by parametric downconversion, and prove the reliability of the method on the basis of computer simulations. Finally, we report some experimental results [86] obtained in Prem Kumar’s laboratory at Northwestern University. Such an experiment actually represents the first measurement of the joint photon number probability distribution of the twin-beam state.
256
MAURO D’ARIANO ET AL.
A. The General Method The Hilbert–Schmidt operator expansion in Equation (91) can be generalized to any number of modes as follows ( " #) Z 2 Z 2 M X d 2 z0 d z1 d zM y * ... Tr O exp zl al þ zl al O¼ p c p c p c l¼0 " # M X y * z l al z l al , exp Z
ð175Þ
l¼0
where al and ayl , with l ¼ 0, . . . , M and ½al , ayl0 ¼ ll0 , are the annihilation and creation operators of M þ 1 independent modes, and O now denotes an operator over all modes. Using the following hyperspherical parameterization for zl 2 C i ku0 ð~Þei 2 i z1 ¼ ku1 ð~Þei 2 i z2 ¼ ku2 ð~Þei 2
z0 ¼
i i 0 ke cos 1 , 2 i 1 ¼ _ k ei 1 sin 1 cos 2 , 2 i 2 ¼ _ kei 2 sin 1 sin 2 cos 3 , 2 0
¼ _
... i i kuM1 ð~Þei M1 ¼ _ kei M1 sin 1 sin 2 sin M1 cos M , 2 2 i i ¼ kuM ð~Þei M ¼ _ kei M sin 1 sin 2 sin M1 sin M , 2 2
zM1 ¼ zM
ð176Þ
where k 2 ½0, 1Þ; l 2 ½0, 2p for l ¼ 0,1, . . . , M; and l 2 ½0, p=2 for l ¼ 1, 2, . . . ,M, Equation (175) can be rewritten as follows: Z O¼
d½ ~
Z
d½~
Z
þ1
dk 0
2Mþ1 k 1 ~ ~ ~~ Tr½OeikXð, Þ eikXð, Þ : 2 M!
ð177Þ
Here we have used the notation Z
d½ ~ ¼ _
M Z Y l¼0
0
2p
d l , 2p
ð178Þ
257
QUANTUM TOMOGRAPHY
Z
d½~ ¼ _ 2M M!
M Z Y l¼1
p=2
dl sin2ðMlÞþ1 l cos l ,
i 1h Xð~, ~Þ ¼ Ay ð~, ~Þ þ Að~, ~Þ , 2 Að~, ~Þ ¼
M X
ð179Þ
0
ð180Þ
ei l ul ð~Þal :
ð181Þ
l¼0
P 2 ~ From the parameterization in Equation (177), one has M l¼0 ul ðÞ ¼ 1, and y y hence ½Að~, ~Þ, A ð~, ~Þ ¼ 1, namely Að~, ~Þ and A ð~, ~Þ themselves are annihilation and creation operators of a bosonic mode. By scanning all values of l 2 ½0, p=2 and l 2 ½0, 2p , all possible linear combinations of modes al are obtained. For the quadrature operator Xð~, ~Þ in Equation (180), one has the following identity for the moments generating function Z þ1 D E 1 2 ikXð~, ~Þ k ¼ exp e dx eikx p ðx; ~, ~Þ, ð182Þ 8 1 where p ðx; ~, ~Þ denotes the homodyne probability distribution of the quadrature Xð~, ~Þ with quantum efficiency . Generally, can depend on the mode itself, i.e., it is a function ¼ ð~, ~Þ of the selected mode. In the following, for simplicity, we assume to be mode independent, however. By taking the ensemble average on each side of Equation (177) and using Equation (182) one has Z Z Z þ1 hOi ¼ d½ ~ d½~ dx p ðx; ~, ~Þ R ½O ðx; ~, ~Þ, ð183Þ 1
where the estimator R ½O ðx; ~, ~Þ has the following expression k R ½O ðx; ~, ~Þ ¼ M!
Mþ1
Z
þ1
dt eð1ðk=2ÞÞtþ2i
h i pffiffiffi ~ ~ t Tr O e2i ktXð, Þ ,
pffiffiffi kt x M
0
ð184Þ with k ¼ 2=ð2 1Þ. Equations (183) and (184) allow one to obtain the expectation value hOi for any unknown state of the radiation field by averaging over the homodyne outcomes of the quadrature Xð~, ~Þ for ~ and ~ randomly distributed according to d½ ~ and d½~ . Such outcomes can be obtained by using a single LO that is prepared in the multimode coherent i l state M l¼0 jl i with l ¼ e ul ðÞK=2 and K 1. In fact, in this case the
258
MAURO D’ARIANO ET AL.
rescaled zero-frequency photocurrent at the output of a balanced homodyne detector is given by I¼
M 1 X ð * al þ l ayl Þ, K l¼0 l
ð185Þ
which corresponds to the operator Xð~, ~Þ. In the limit of a strong LO (K ! 1), all moments of the current I correspond to the moments of Xð~, ~Þ, and the exact measurement of Xð~, ~Þ is then realized. Notice that for modes al with different frequencies, in the d.c. photocurrent in Equation (185) each LO with amplitude l selects the mode al at the same frequency (and polarization). For less-than-unity quantum efficiency, Equation (182) holds. Equation (184) can be applied to some observables of interest. In particular, one can estimate the matrix element hfnl gjRjfml gi of the multimode density operator R. This will be obtained by averaging the estimator PM
kMþ1 M! sffiffiffiffiffiffi) ( M Y pffiffiffi l ! l l ½i kul ð~Þ l! l¼0
R ½jfml gihfnl gj ðx; ~, ~Þ ¼ ei
Z
l¼0
ðnl ml Þ
þ1
dt etþ2i
l
M PM pffiffiffi Y kt x Mþ l¼0 ðl l Þ=2
t
0
L ll l ½ku2l ð~Þt ,
l¼0
ð186Þ where l ¼ maxðml , nl Þ, l ¼ minðml , nl Þ, and L n ðzÞ denotes the generalized Laguerre polynomial. For diagonal matrix elements, Equation (186) simplifies to k R ½jfnl gihfnl gj ðx; ~, ~Þ ¼ M!
Mþ1
Z
þ1 0
dt etþ2i
M pffiffiffi Y kt x M
t
Lnl ½ku2l ð~Þt ð187Þ
l¼0
with Ln ðzÞ denoting the customary Laguerre polynomial in z. Using the following identity [81] L n 0 þ 1 þ þ M þM ðx0 þ x1 þ þ xM Þ X L i00 ðx0 ÞL i11 ðx1 Þ L iMM ðxM Þ, ¼ i0 þi1 þ þiM ¼n
ð188Þ
259
QUANTUM TOMOGRAPHY
from Equation (187) one can easily derive the estimator of the probability P y distribution of the total number of photons N ¼ M l¼0 al al k R ½jnihnj ðx; ~, ~Þ ¼ M!
Mþ1
Z
þ1
dt etþ2i
pffiffiffi kt x M
0
t LM n ½kt ,
ð189Þ
where jni denotes the eigenvector of N with eigenvalue n. Notice that the estimator in Equation (187) does not depend on the phases l ; only the knowledge of the angles l is needed. For the estimator in Equation (189), even the angles l can be unknown. Now we specialize to the case of only two modes a and b (i.e., M ¼ 1 and ~ is a scalar ). The joint photon number probability distribution is obtained by averaging R ½jn, mihn, mj ðx; , 0 , 1 Þ Z þ1 pffiffiffi 2 ¼k dt etþ2i kt x t Ln ðkt cos2 ÞLm ðkt sin2 Þ:
ð190Þ
0
The estimator (189) of the probability distribution of the total number of photons can be written as Z R ½jnihnj ðx; ,
0,
1Þ
¼k
þ1
2 0
dt etþ2i
pffiffiffi kt x
t L1n ½kt :
ð191Þ
For the total number of photons one can also derive the estimator of the moment generating function, using the generating function for the Laguerre polynomials [81]. One obtains R ½za
y
aþby b
ðx; ,
0,
1Þ ¼
1 1 1z 2 ; x : 2, 2 z þ ðð1 zÞ=kÞ ðz þ ðð1 zÞ=kÞÞ2 ð192Þ
For the first two moments one obtains the simple expressions 2 ¼ 4x2 þ 2, k 24 6 10 y y 2 4 20 x2 þ 2 þ 4: ð193Þ R ½ða a þ b bÞ ðx; , 0 , 1 Þ ¼ 8x þ R ½ay a þ by b ðx; ,
0,
1Þ
It is worth noting that analogous estimators of the photon number difference between the two modes are singular and one needs a cutoff
260
MAURO D’ARIANO ET AL.
procedure, similar to the one used in Ref. [87] for recovering the correlation between the modes by means of the customary two-mode tomography. In fact, in order to extract information pertaining to a single mode only one needs a delta-function at ¼ 0 for mode a, or ¼ p=2 for mode b, and, in this case, one could better use the standard one-mode tomography by setting the LO to the proper mode of interest. Finally, we note that for two-mode tomography the estimators can be averaged by the integral Z
2p
hO i ¼ 0
d 0 2p
Z
2p 0
d 1 2p
R ½O ðx; ,
0,
Z
1
1
dðcos 2Þ 2
Z
þ1
1
dx p ðx; ,
0,
1Þ
1Þ
ð194Þ
over the random parameters cosð2Þ, 0 , and 1 . For example, in the case of two radiation modes having the same frequency but orthogonal polarizations, represents a random rotation of the polarizations, whereas 0 and 1 denote the relative phases between the LO and the two modes, respectively. 1. Numerical Results for Two-Mode Fields In this section we report some Monte Carlo simulations from Ref. [17] to judge the experimental working conditions for performing the single-LO tomography on two-mode fields. We focus our attention on the twin-beam state, usually generated by spontaneous parametric downconversion, namely qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi X 1 ji ¼ SðÞj0ia j0ib ¼ 1 j j2
n jnia jnib ,
ð195Þ
n¼0
where SðÞ ¼ expðay by * abÞ and ¼ e iarg tanhjj. The parameter is related to the average number of photons per beam n ¼ j j2 =ð1 j j2 Þ. For the simulations we need to derive the homodyne probability distribution pðx; , 0 , 1 Þ which is given by pðx; ,
0,
1Þ
¼ Tr U y jx aa hxj 1b Ujihj D E ¼ 0jb 0jSy ðÞU y ½jxiaa hxj 1b USðÞj0 a j0 , a
b
ð196Þ
261
QUANTUM TOMOGRAPHY
where jxia is the eigenvector of the quadrature x ¼ 12 ðay þ aÞ with eigenvalue x and U is the unitary operator achieving the mode transformation Uy
i 0 a cos e U¼ b ei 1 sin
ei 1 sin ei 0 cos
a : b
ð197Þ
In the case of two radiation modes having the same frequency but orthogonal polarizations—the case of Type II phase-matched parametric amplifier—Equation (196) gives the theoretical probability of outcome x for the homodyne measurement at a polarization angle with respect to the polarization of the a mode, and with 0 and 1 denoting the relative phases between the LO and the two modes, respectively. By using the Dirac- representation of the X-quadrature projector Z
þ1
jxihxj ¼ 1
d exp½iðX xÞ , 2p
ð198Þ
Equation (196) can be rewritten as follows [17]

$$p(x;\theta,\psi_0,\psi_1) = \int_{-\infty}^{+\infty} \frac{d\lambda}{2\pi}\, e^{-i\lambda x}\; {}_a\langle 0|\,{}_b\langle 0|\, S^\dagger(\chi)\, U^\dagger\, e^{i\lambda X_a}\, U\, S(\chi)\, |0\rangle_a|0\rangle_b$$
$$= \int_{-\infty}^{+\infty} \frac{d\lambda}{2\pi}\, e^{-i\lambda x}\; {}_a\langle 0|\,{}_b\langle 0| \exp\!\left\{ \frac{i\lambda}{2}\Big[ \big(e^{-i\psi_0}\mu\cos\theta + e^{i\psi_1}\nu^*\sin\theta\big)\,a + \big(e^{i\psi_0}\nu^*\cos\theta + e^{-i\psi_1}\mu\sin\theta\big)\,b + \mathrm{H.c.} \Big] \right\} |0\rangle_a|0\rangle_b\,, \qquad (199)$$

where we have used Equation (197) and the transformation

$$S^\dagger(\chi) \begin{pmatrix} a \\ b^\dagger \end{pmatrix} S(\chi) = \begin{pmatrix} \mu & \nu \\ \nu^* & \mu \end{pmatrix} \begin{pmatrix} a \\ b^\dagger \end{pmatrix} \qquad (200)$$

with $\mu = \cosh|\chi|$ and $\nu = e^{i\arg\chi}\sinh|\chi|$. Upon defining

$$K C = e^{-i\psi_0}\mu\cos\theta + e^{i\psi_1}\nu^*\sin\theta\,, \qquad K D = e^{i\psi_0}\nu^*\cos\theta + e^{-i\psi_1}\mu\sin\theta\,, \qquad (201)$$

where $K \in \mathbb{R}$ and $C, D \in \mathbb{C}$ with $|C|^2 + |D|^2 = 1$, one has

$$K^2 = \mu^2 + |\nu|^2 + 2\mu|\nu|\,\sin 2\theta\, \cos(\psi_0 + \psi_1 - \arg\xi)\,. \qquad (202)$$
MAURO D’ARIANO ET AL.
Now, since the unitary transformation

$$\begin{pmatrix} a \\ b \end{pmatrix} \to \begin{pmatrix} C & D \\ -D^* & C^* \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} \qquad (203)$$

has no effect on the vacuum state, Equation (199) leads to the following Gaussian distribution

$$p(x;\theta,\psi_0,\psi_1) = \int_{-\infty}^{+\infty} \frac{d\lambda}{2\pi}\, e^{-i\lambda x}\; {}_a\langle 0|\,{}_b\langle 0| \exp\!\left\{ \frac{i\lambda K}{2}\big[ (Ca + Db) + \mathrm{H.c.} \big] \right\} |0\rangle_a|0\rangle_b$$
$$= \int_{-\infty}^{+\infty} \frac{d\lambda}{2\pi}\, e^{-i\lambda x}\; {}_a\langle 0| \exp\!\left[ \frac{i\lambda K}{2}\big(a + a^\dagger\big) \right] |0\rangle_a = \frac{1}{K}\,\big|\,{}_a\langle 0 | x/K \rangle_a \big|^2 = \frac{1}{\sqrt{2\pi\sigma^2(\theta,\psi_0,\psi_1)}}\, \exp\!\left( -\frac{x^2}{2\sigma^2(\theta,\psi_0,\psi_1)} \right), \qquad (204)$$

where the variance $\sigma^2(\theta,\psi_0,\psi_1)$ is given by

$$\sigma^2(\theta,\psi_0,\psi_1) = \frac{K^2}{4} = \frac{1 + |\xi|^2 + 2|\xi|\,\sin 2\theta\, \cos(\psi_0 + \psi_1 - \arg\xi)}{4\,(1 - |\xi|^2)}\,. \qquad (205)$$
Taking into account the Gaussian convolution that results from less-than-unity quantum efficiency, the variance just increases as

$$\sigma^2(\theta,\psi_0,\psi_1) \;\to\; \sigma^2_\eta(\theta,\psi_0,\psi_1) = \eta\,\sigma^2(\theta,\psi_0,\psi_1) + \frac{1-\eta}{4}\,. \qquad (206)$$
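Since Equations (204)–(206) make the single-LO homodyne statistics of the twin beam purely Gaussian, they are straightforward to simulate. The sketch below (Python/NumPy; the seed, sample size, and the assumption that $\cos 2\theta$, $\psi_0$, and $\psi_1$ are drawn uniformly are our own illustrative choices, not specified by the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def twin_beam_homodyne(xi, eta, n_samples):
    """Single-LO homodyne outcomes for the twin-beam state: for each event,
    draw cos(2*theta) uniform in [-1, 1] and psi0, psi1 uniform in [0, 2*pi),
    then sample x from the zero-mean Gaussian of Eqs. (205) and (206)."""
    cos2t = rng.uniform(-1.0, 1.0, n_samples)
    sin2t = np.sqrt(1.0 - cos2t**2)              # theta in [0, pi/2]
    psi0 = rng.uniform(0.0, 2 * np.pi, n_samples)
    psi1 = rng.uniform(0.0, 2 * np.pi, n_samples)
    var = (1 + abs(xi)**2
           + 2 * abs(xi) * sin2t * np.cos(psi0 + psi1 - np.angle(xi))) \
          / (4 * (1 - abs(xi)**2))               # Eq. (205)
    var_eta = eta * var + (1 - eta) / 4          # Eq. (206)
    return rng.normal(0.0, np.sqrt(var_eta))
```

Averaged over the random phases the cosine term drops out, so $\langle x^2\rangle = \eta(1+|\xi|^2)/[4(1-|\xi|^2)] + (1-\eta)/4$, a convenient consistency check on the sampler.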
Notice that the probability distribution in Equation (204) corresponds to a squeezed vacuum for $\theta = \pi/4$ and $\psi_0 + \psi_1 - \arg\xi = 0$ or $\pi$.

We study the tomographic measurement of the joint photon number probability distribution and the probability distribution for the total number of photons with use of the estimators in Equations (190) and (191), respectively. Moreover, using the estimator in Equation (186) we reconstruct the matrix elements

$$C_{n,m} \doteq {}_a\langle m|\,{}_b\langle m|\,\chi\rangle\langle\chi\,|n\rangle_a\,|n\rangle_b\,, \qquad (207)$$
FIGURE 5. Two-mode photon number probability $p(n, m)$ of the twin-beam state in Equation (195) for average number of photons per beam $\bar n = 5$, obtained by a Monte Carlo simulation with the estimator in Equation (190) and random parameters $\cos 2\theta$, $\psi_0$, and $\psi_1$. Left: quantum efficiency $\eta = 1$ and $10^6$ data samples were used in the reconstruction; right: $\eta = 0.9$ and $5\times10^6$ data samples. (From Ref. [17].)
which reveal the coherence of the twin-beam state. Theoretically one should have

$$C_{n,m} = (1 - |\xi|^2)\, \xi^m\, \xi^{*n}\,. \qquad (208)$$
The estimators have been numerically evaluated by applying the Gauss method for calculating the integral in Equation (186), which results in a fast and sufficiently precise algorithm with the use of just 150 evaluation points. In Figure 5 a Monte Carlo simulation of the joint photon number probability distribution is reported. The simulated values compare very well with the theoretical ones. In Ref. [17] a careful analysis of the statistical errors was carried out for various twin-beam states, by constructing histograms of the deviations of the results from different simulated experiments with respect to the theoretical values. In comparison with the customary two-LO tomography of Ref. [87], where for $\eta = 1$ the statistical errors saturate for increasingly large $n$ and $m$, here the statistical errors increase slowly versus $n$ and $m$. This is due to the fact that the range of the estimators in Equation (190) increases versus $n$ and $m$. Overall we find that for any given quantum efficiency the statistical errors are generally slightly larger than those obtained with the two-LO method. The convenience of using a single LO thus comes with its own price tag.
FIGURE 6. Probability distribution for the total number of photons of the twin beams in Equation (195) for average number of photons per beam $\bar n = 2$, obtained using the estimator in Equation (191). The oscillation of the total photon number probability due to the perfect correlation of the twin beams has been reconstructed by simulating $10^7$ data samples with quantum efficiency $\eta = 0.9$ (on the left) and $2\times10^7$ data samples with $\eta = 0.8$ (on the right). The theoretical probability (thick solid line) is superimposed onto the result of the Monte Carlo experiment, shown by the thin solid line. Notice the dramatic increase of errors (in gray shade) versus $N$ and for smaller $\eta$. (From Ref. [17].)
FIGURE 7. Tomographic reconstruction of the matrix elements $C_{n,m} \doteq {}_a\langle m|\,{}_b\langle m|\chi\rangle\langle\chi|n\rangle_a\,|n\rangle_b$ of the twin beams in Equation (195) for average number of photons per beam $\bar n = 2$, obtained using the estimator in Equation (186). On the left we used $10^6$ simulated data samples and quantum efficiency $\eta = 0.9$; on the right, $3\times10^6$ data samples and $\eta = 0.8$. The coherence of the twin-beam state is easily recognized as $C_{n,m}$ varies little for $n + m = \mathrm{constant}$ ($\xi$ in Equation (195) has been chosen real). For a typical comparison between theoretical and experimental matrix elements and their relative statistical errors, see the results in Figure 6. (From Ref. [17].)
FIGURE 8. A schematic of the experimental setup. NOPA: nondegenerate optical parametric amplifier; LOs: local oscillators; PBS: polarizing beam splitter; LPFs: low-pass filters.
By using the estimator in Equation (191), the probability distribution for the total number of photons $N$ of the twin beams has also been reconstructed (Figure 6). Notice the dramatic increase of error bars versus $N$ and for smaller $\eta$. Finally, in Figure 7 we report the results of the tomographic measurement of $C_{n,m}$ defined in Equation (207). Because the reconstructed $C_{n,m}$ is close to the theoretically expected value in Equation (208), these results reveal the purity of the twin beams, which cannot be inferred from the thermal diagonal distribution of Figure 5.

The first experimental results of a measurement of the joint photon number probability distribution for a two-mode quantum state created by a nondegenerate optical parametric amplifier have been presented in Ref. [86]. In this experiment, however, the twin beams are detected separately by two balanced-homodyne detectors. A schematic of the experimental setup is reported in Figure 8, and some experimental results are reported in Figure 9. As expected for parametric fluorescence, the experiment showed a measured joint photon number probability distribution that exhibited up to 1.9 dB of quantum correlation between the two modes, with thermal marginal distributions.
VI. APPLICATIONS TO QUANTUM MEASUREMENTS

In this section we review a number of applications of quantum tomography related to some fundamental tests in quantum mechanics.
FIGURE 9. Left: measured joint photon number probability distribution for the twin-beam state with average number of photons per beam $\bar n = 1.5$ and $4\times10^5$ samples. Right: marginal distribution for the signal beam for the same data. The theoretical distribution is also shown. Very similar results are obtained for the idler beam. (From Ref. [86].)
First, we report the proposal of Ref. [30] for testing the nonclassicality of quantum states by means of an operational criterion based on a set of quantities that can be measured experimentally with some given level of confidence, even in the presence of loss, noise, and less-than-unity quantum efficiency. Second, we report the experiment proposed in Ref. [31] for testing quantum state reduction. The state reduction rule is tested using optical homodyne tomography by directly measuring the fidelity between the theoretically expected reduced state and the experimental state. Finally, we review some experimental results obtained at the Quantum Optics Lab of the University of Naples [32] about the reconstruction of coherent signals, together with application to the estimation of the losses introduced by simple optical components.
A. Measuring the Nonclassicality of a Quantum State

The concept of nonclassical states of light has received much attention in quantum optics [41,88–96]. The customary definition of nonclassicality is given in terms of the P-function presented in Section II.A: a nonclassical state does not admit a regular positive P-function representation, namely, it cannot be written as a statistical mixture of coherent states. Such states produce effects that have no classical analogue. These kinds of states are of fundamental relevance not only for the demonstration of the inadequacy of classical description, but also for applications, e.g., in the realms of information transmission and interferometric measurements [91,92,95].
We are interested in testing the nonclassicality of a quantum state by means of a set of quantities that can be measured experimentally with some given level of confidence, even in the presence of loss, noise, and less-than-unity quantum efficiency. The positivity of the P-function itself cannot be adopted as a test, since there is no viable method to measure it. As proved in Section IV.A, only the generalized Wigner functions of order $s < 1 - \eta^{-1}$ can be measured, $\eta$ being the quantum efficiency of homodyne detection. Hence, through this technique, all functions from $s = 1$ to $s = 0$ cannot be recovered, i.e., we cannot obtain the P-function and all its smoothed convolutions up to the customary Wigner function. For the same reason, the nonclassicality parameter proposed by Lee [41], namely the maximum s-parameter that provides a positive distribution, cannot be experimentally measured. Among the many manifestations of nonclassical effects, one finds squeezing, antibunching, even–odd oscillations in the photon-number probability, and negativity of the Wigner function [89–91,95,97–100]. Any of these features alone, however, does not represent the univocal criterion we are looking for. Neither squeezing nor antibunching provides a necessary condition for nonclassicality [93]. The negativity of the Wigner function, which is well exhibited by the Fock states and the Schrödinger-cat-like states, is absent for the squeezed states. As for the oscillations in the photon number probability, some even–odd oscillations can be simply obtained by using a statistical mixture of coherent states. Many authors [93,94,96] have adopted the nonpositivity of the phase-averaged P-function $F(I) = (1/2\pi)\int_0^{2\pi} d\phi\, P(I^{1/2} e^{i\phi})$ as the definition for a nonclassical state, since $F(I) < 0$ invalidates Mandel's semiclassical formula [88] of photon counting, i.e., it does not allow a classical description in terms of a stochastic intensity.
Of course, some states can exhibit a "weak" nonclassicality [96], namely a positive $F(I)$ but a nonpositive P-function (a relevant example being a coherent state undergoing Kerr-type self-phase modulation). However, from the point of view of detection theory, such "weak" nonclassical states still admit a classical description in terms of a positive intensity probability $F(I) > 0$. For this reason, we adopt nonpositivity of $F(I)$ as the definition of nonclassicality.

1. Single-Mode Nonclassicality

The authors of Refs. [93,94,96] have pointed out some relations between $F(I)$ and generalized moments of the photon distribution, which, in turn, can be used to test nonclassicality. The problem is reduced to an infinite set of inequalities that provide both necessary and sufficient conditions for nonclassicality [94]. In terms of the photon number
probability $p(n) = \langle n|\rho|n\rangle$ of the state with density matrix $\rho$, the simplest sufficient condition involves the following three-point relation [94,96]

$$B(n) \doteq (n+2)\,p(n)\,p(n+2) - (n+1)\,\big[\,p(n+1)\,\big]^2 < 0\,. \qquad (209)$$
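As a concrete illustration of Equation (209) (a small Python sketch; the mean photon number and squeezing parameter are arbitrary choices), $B(n)$ vanishes identically for the Poisson statistics of a coherent state, while a squeezed vacuum populates only even Fock states, so $B(n) = -(n+1)\,p(n+1)^2 < 0$ for every odd $n$:

```python
from math import exp, factorial, tanh, cosh

def B(p, n):
    """Three-point nonclassicality indicator of Eq. (209)."""
    return (n + 2) * p(n) * p(n + 2) - (n + 1) * p(n + 1)**2

# Coherent state: Poisson photon statistics, for which B(n) = 0 for all n.
mu = 4.0
def poisson(n):
    return exp(-mu) * mu**n / factorial(n)

# Squeezed vacuum: only even Fock states are populated, so every odd n
# gives B(n) = -(n+1) p(n+1)^2 < 0, a sufficient signature of nonclassicality.
r = 1.0
def squeezed_vacuum(n):
    if n % 2:
        return 0.0
    k = n // 2
    return factorial(n) * tanh(r)**n / (4**k * factorial(k)**2 * cosh(r))
```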
Higher-order sufficient conditions involve five-, seven-, $\ldots$, $(2k+1)$-point relations, always for adjacent values of $n$. It is sufficient that just one of these inequalities is satisfied in order to assure the negativity of $F(I)$. Notice that for a coherent state $B(n) = 0$ identically for all $n$.

In the following we show that quantum tomography can be used as a powerful tool for performing the nonclassicality test in Equation (209). For less-than-unity quantum efficiency ($\eta < 1$), we rely on the concept of a "noisy state" $\rho_\eta$, wherein the effect of quantum efficiency is ascribed to the quantum state itself rather than to the detector. In this model, the effect of quantum efficiency is treated in a Schrödinger-like picture, with the state evolving from $\rho$ to $\rho_\eta$, and with $\eta$ playing the role of a time parameter. Such lossy evolution is described by the master equation [37]

$$\partial_t\, \rho(t) = \frac{\Gamma}{2}\Big[\, 2a\rho(t)a^\dagger - a^\dagger a\,\rho(t) - \rho(t)\,a^\dagger a \,\Big]\,, \qquad (210)$$
wherein $\rho_\eta \equiv \rho(t)$ with $t = -\ln\eta/\Gamma$. For the nonclassicality test, reconstruction in terms of the noisy state has many advantages. In fact, for nonunit quantum efficiency $\eta < 1$ the tomographic method introduces errors for $p(n)$ which are increasingly large versus $n$, with the additional limitation that the quantum efficiency must be greater than the minimum value $\eta = 0.5$. On the other hand, the reconstruction of the noisy-state probabilities $p_\eta(n) = \langle n|\rho_\eta|n\rangle$ does not suffer such limitations, and even though all quantum features are certainly diminished in the noisy-state description, the effect of nonunity quantum efficiency does not change the sign of the P-function, but only rescales it as follows:

$$P(z) \;\to\; P_\eta(z) = \frac{1}{\eta}\, P\big(z/\eta^{1/2}\big)\,. \qquad (211)$$
Hence, the inequality (209) still represents a sufficient condition for nonclassicality when the probabilities $p(n) = \langle n|\rho|n\rangle$ are replaced with $p_\eta(n) = \langle n|\rho_\eta|n\rangle$, the latter being given by a Bernoulli convolution, as shown in Equation (22). When referred to the noisy-state probabilities $p_\eta(n)$, the inequality in Equation (209) keeps its form and is simply rewritten as follows:

$$B_\eta(n) \doteq (n+2)\,p_\eta(n)\,p_\eta(n+2) - (n+1)\,\big[\,p_\eta(n+1)\,\big]^2 < 0\,. \qquad (212)$$
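The Bernoulli convolution that maps $p(n)$ into $p_\eta(n)$ can be sketched as follows (Python/NumPy; the truncation dimension and parameter values are arbitrary choices). A convenient sanity check is that Poisson statistics with mean $\mu$ are mapped onto Poisson statistics with mean $\eta\mu$, so $B_\eta(n)$ remains identically zero for coherent light:

```python
import numpy as np
from math import comb, exp, factorial

def bernoulli_convolve(p, eta):
    """Noisy-state photon distribution p_eta(n): each of j photons is
    detected independently with probability eta (cf. Eq. (212))."""
    dim = len(p)
    return np.array([sum(comb(j, n) * eta**n * (1 - eta)**(j - n) * p[j]
                         for j in range(n, dim))
                     for n in range(dim)])

# Poisson photon statistics of a coherent state, truncated at `dim` Fock states
mu, eta, dim = 4.0, 0.7, 80
poisson = np.array([exp(-mu) * mu**n / factorial(n) for n in range(dim)])
noisy = bernoulli_convolve(poisson, eta)   # Poisson with mean eta*mu = 2.8
```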
The quantities $B(n)$ and $B_\eta(n)$ are nonlinear in the density matrix, so they cannot be measured by averaging a suitable estimator over the homodyne data. Hence, in the evaluation of $B(n)$ one has to reconstruct the photon number probabilities $p(n)$, using the estimator $R_\eta[|n\rangle\langle n|](x,\varphi)$ in Equation (100). The noisy-state probabilities $p_\eta(n)$ are obtained by using the same estimator for $\eta = 1$, namely without recovering the convolution effect of nonunit quantum efficiency. Notice that the estimator does not depend on the phase of the quadrature. Hence, knowledge of the phase of the local oscillator in the homodyne detector is not needed for the tomographic reconstruction, and it can be left fluctuating in a real experiment.

Regarding the estimation of statistical errors, they are generally obtained by dividing the set of homodyne data into blocks, as shown in Section III.C.1. However, in the present case, the nonlinear dependence on the photon number probability introduces a systematic error that is vanishingly small for increasingly larger sets of data. Therefore, the estimated value of $B(n)$ is obtained from the full set of data, instead of averaging the mean values of the different statistical blocks.

In Figures 10 and 11 some numerical results from Ref. [30] are reported, which are obtained by a Monte Carlo simulation of a quantum tomography experiment. The nonclassicality criterion is tested either on a Schrödinger-cat state $|\psi(\alpha)\rangle \propto (|\alpha\rangle + |-\alpha\rangle)$ or on a squeezed state $|\alpha, r\rangle \doteq D(\alpha)S(r)|0\rangle$, wherein $|\alpha\rangle$, $D(\alpha)$, and $S(r)$ denote a coherent state with amplitude $\alpha$,
FIGURE 10. Tomographic measurement of $B(n)$ (dashed trace) with the respective error bars (superimposed in gray shade), along with the theoretical values (solid trace), for a Schrödinger-cat state with average photon number $\bar n = 5$ (left) and for a phase-squeezed state with $\bar n = 5$ and $\bar n_{\mathrm{sq}} = \sinh^2 r = 3$ squeezing photons (right). In both cases the quantum efficiency is $\eta = 0.8$ and the number of simulated experimental data is $10^7$. (From Ref. [30].)
FIGURE 11. Same as Figure 10, but here for $B_\eta(n)$. (From Ref. [30].)

the displacement operator $D(\alpha) = e^{\alpha a^\dagger - \alpha^* a}$, and the squeezing operator $S(r) = e^{r(a^{\dagger 2} - a^2)/2}$, respectively. Figure 10 shows tomographically obtained values of $B(n)$, with the respective error bars superimposed, along with the theoretical values for a Schrödinger-cat state and for a phase-squeezed state ($r > 0$). For the same set of states, the results for $B_\eta(n)$, obtained by tomographic reconstruction of the noisy state, are reported in Figure 11. Let us compare the statistical errors that affect $B(n)$ and $B_\eta(n)$ on the original and the noisy states, respectively. In the first case the error increases with $n$, whereas in the second it remains nearly constant, albeit with less marked oscillations in $B_\eta(n)$ than those in $B(n)$. The nonclassicality of the states analyzed here is experimentally verifiable, as $B_\eta(0) < 0$ by more than five standard deviations. In contrast, for coherent states one obtains small statistical fluctuations around zero for all $n$. Finally, we remark that the simpler test of checking for antibunching or oscillations in the photon number probability in the case of the phase-squeezed state (right of Figures 10 and 11) would not reveal the nonclassical features of such a state.
2. Two-Mode Nonclassicality

In Ref. [30] it is shown how quantum homodyne tomography can also be employed to test the nonclassicality of two-mode states. For a two-mode state, nonclassicality is defined in terms of the nonpositivity of the following phase-averaged two-mode P-function [96]:

$$F(I_1, I_2, \phi) = \frac{1}{2\pi} \int_0^{2\pi} d\phi_1\; P\big(I_1^{1/2} e^{i\phi_1},\, I_2^{1/2} e^{i(\phi_1 + \phi)}\big)\,. \qquad (213)$$
In Ref. [96] it is also proved that a sufficient condition for nonclassicality is

$$C = \big\langle (n_1 - n_2)^2 \big\rangle - \big(\langle n_1 - n_2 \rangle\big)^2 - \langle n_1 + n_2 \rangle < 0\,, \qquad (214)$$

where $n_1$ and $n_2$ are the photon number operators of the two modes. A tomographic test of the inequality in Equation (214) can be performed by averaging the estimators for the involved operators using Table 1. Again, the value $\eta = 1$ can be used to reconstruct the ensemble averages of the noisy state $\rho_\eta$. As an example, we consider the twin-beam state of Equation (195). The theoretical value of $C$ is given by $C = -2|\xi|^2/(1-|\xi|^2) < 0$. With regard to the effect of quantum efficiency $\eta < 1$, the same argument holds as for the single-mode case: one can evaluate $C$ for the twin beams degraded by the effect of loss, and use $\eta = 1$ in the estimators. In this case, the theoretical value of $C$ is simply rescaled, namely

$$C_\eta = -\frac{2\eta^2\, |\xi|^2}{1 - |\xi|^2}\,. \qquad (215)$$
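Equation (215) lends itself to a quick Monte Carlo check (Python/NumPy sketch; the seed and sample size are arbitrary choices): the twin beam has perfectly correlated photon numbers $n_1 = n_2 = n$ with $P(n) = (1-|\xi|^2)|\xi|^{2n}$ from Equation (195), and quantum efficiency acts as independent binomial thinning on each beam:

```python
import numpy as np

rng = np.random.default_rng(1)

def C_eta_estimate(xi2, eta, n_samples=200_000):
    """Monte Carlo estimate of the witness C of Eq. (214) for twin beams
    detected with quantum efficiency eta (independent binomial losses)."""
    # perfectly correlated pair photon number: P(n) = (1 - xi2) * xi2**n
    n = rng.geometric(1 - xi2, n_samples) - 1
    n1 = rng.binomial(n, eta)                 # loss on beam 1
    n2 = rng.binomial(n, eta)                 # loss on beam 2
    d = n1 - n2
    return d.var() - (n1 + n2).mean()         # Eq. (214), <d> = 0 by symmetry
```

For $|\xi|^2 = 0.5$ and $\eta = 0.8$ the estimate fluctuates around $C_\eta = -2\eta^2|\xi|^2/(1-|\xi|^2) = -1.28$, as predicted by Equation (215).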
In Figure 12 we report $C_\eta$ versus $1 - \eta$, with $\eta$ ranging from 1 to 0.3 in steps of 0.05, for the twin beam in Equation (195) with $|\xi|^2 = 0.5$, corresponding to a total average photon number $\langle n_1 + n_2 \rangle = 2$. The values of $C_\eta$ result from

FIGURE 12. Tomographic measurement of the nonclassicality parameter $C_\eta$ for twin beams in Equation (195) with $|\xi|^2 = 0.5$. The results are shown for different values of the quantum efficiency $\eta$ (in steps of 0.05), and for each value the number of simulated data is $4\times10^5$. Statistical errors are shown in gray shade. (From Ref. [30].)
a Monte Carlo simulation of a homodyne tomography experiment with a sample of $4\times10^5$ data. The nonclassicality test in terms of the noisy state gives values of $C_\eta$ that approach the classically positive region as the quantum efficiency decreases. However, the statistical error remains constant and is sufficiently small to allow recognition of the nonclassicality of the twin beams down to $\eta = 0.3$.

We conclude that quantum homodyne tomography allows one to perform nonclassicality tests for single- and two-mode radiation states, even when the quantum efficiency of homodyne detection is rather low. The method involves reconstruction of the photon number probability, or of some suitable function of the number operators, pertaining to the noisy state, namely the state degraded by the less-than-unity quantum efficiency. The noisy-state reconstruction is affected by statistical errors; however, these are sufficiently small that the nonclassicality of the state can be tested even for low values of $\eta$. For the cases considered here, we have shown that the nonclassicality of the states can be proved (deviation from classicality by many error bars) with $10^5$–$10^7$ homodyne data. Moreover, since knowledge of the phase of the local oscillator in the homodyne detector is not needed for the tomographic reconstruction, it can be left fluctuating in a real experiment.
B. Test of State Reduction

In quantum mechanics the state reduction (SR) is still a much-discussed rule. The so-called "projection postulate" was introduced by von Neumann [2] to explain the results of the Compton–Simons experiment, and it was generalized by Lüders [101] for measurements of observables with degenerate spectrum. The consistency of the derivation of the SR rule and its validity for generic measurements have been analyzed with some criticism [102]. In a very general context, the SR rule was derived in a physically consistent way from the Schrödinger equation for the composite system of object and measuring apparatus [103]. An experiment for testing quantum SR is therefore a very interesting matter. Such a test in general is not equivalent to a test of the repeatability hypothesis, since the latter holds only for measurements of observables that are described by self-adjoint operators. For example, joint measurements like that of Arthurs and Kelly [54] are not repeatable, as the reduced states are coherent states, which are not orthogonal.

Quantum optics offers a possibility of testing the SR, because several observables can be chosen to perform different measurements on a fixed system. For instance, one can decide to perform either homodyne or
heterodyne or photon number detection. This is a unique opportunity; in contrast, in particle physics the measurements are mostly quasiclassical and restricted to only a few observables. In addition, optical homodyne tomography allows a precise determination of the quantum system after the SR. A scheme for testing the SR could be based on tomographic measurements of the radiation density matrix after nondemolition measurements. However, such a scheme would reduce the number of observables that are available for the test. Instead, one can take advantage of the correlations between the twin beams of Equation (195) produced by a nondegenerate optical parametric amplifier (NOPA), in which case one can test the SR even for demolitive-type measurements. Indeed, if a measurement is performed on one of the twin beams, the SR can be tested by homodyne tomography on the other beam. This is precisely the scheme for an experimental test of SR proposed in Ref. [31], which is reviewed in the following.

The scheme for the SR test is given in Figure 13. Different kinds of measurements can be performed on beam 1, even though here only the SR for heterodyne detection and photon number detection will be considered.
FIGURE 13. Schematic of the proposed scheme for testing the SR for heterodyne detection. A NOPA generates a pair of twin beams (1 and 2). After heterodyning beam 1, the reduced state of beam 2 is analyzed by homodyne tomography, which is conditioned by the heterodyne outcome. In place of the heterodyne detector one can put any other kind of detector for testing the SR on different observables. We also consider the case of direct photodetection. (From Ref. [31].)
For a system described by a density operator $\rho$, the probability $p(\lambda)\,d\lambda$ that the outcome of a quantum measurement of an observable is in the interval $[\lambda, \lambda + d\lambda)$ is given by Born's rule $p(\lambda)\,d\lambda = \mathrm{Tr}[\rho\,\Pi_\lambda]\,d\lambda$, where $\Pi_\lambda$ is the POVM pertaining to the measurement, satisfying $\Pi_\lambda \geq 0$ and $\int d\lambda\, \Pi_\lambda = I$. For an exact measurement of an observable described by a self-adjoint operator, $\Pi_\lambda$ is just the projector over the eigenvector corresponding to the outcome $\lambda$. In the case of the photon number $a^\dagger a$ the spectrum is discrete and the POVM is $\Pi_m = |m\rangle\langle m|$ for integer eigenvalue $m$. For the Arthurs–Kelly joint measurement of position and momentum (corresponding to a joint measurement of two conjugated quadratures of the field) we have the coherent-state POVM $\Pi_\alpha = \frac{1}{\pi}|\alpha\rangle\langle\alpha|$. When on beam 1 we perform a measurement described by $\Pi_\lambda$, the reduced normalized state of beam 2 is

$$\rho(\lambda) = \frac{\mathrm{Tr}_1\big[\,|\chi\rangle\langle\chi|\,(\Pi_\lambda \otimes 1)\,\big]}{\mathrm{Tr}_{1,2}\big[\,|\chi\rangle\langle\chi|\,(\Pi_\lambda \otimes 1)\,\big]} = \frac{V\,\Pi_\lambda^\tau\,V^\dagger}{p(\lambda)}\,, \qquad (216)$$
where $O^\tau$ denotes the transposed operator (on a fixed basis), $V = (1 - |\xi|^2)^{1/2}\,\xi^{a^\dagger a}$, and $p(\lambda) = \mathrm{Tr}\big[\,V\,\Pi_\lambda^\tau\,V^\dagger\,\big]$ is the probability density of the measurement outcome $\lambda$. In the limit of infinite gain $|\xi| \to 1$ one has $\rho(\lambda) \propto \Pi_\lambda^\tau$. For example, for heterodyne detection with outcome $\alpha$, we have $\rho(\alpha) = |\alpha^*\rangle\langle\alpha^*|$. If the readout detector on beam 1 has quantum efficiency $\eta_r$, Equation (216) is replaced with

$$\rho_{\eta_r}(\lambda) = \frac{V\,\big(\Pi^{\eta_r}_\lambda\big)^\tau\,V^\dagger}{p_{\eta_r}(\lambda)}\,, \qquad (217)$$
where $p_{\eta_r}(\lambda) = \mathrm{Tr}\big[\,V\,(\Pi^{\eta_r}_\lambda)^\tau\,V^\dagger\,\big]$, and $\Pi^{\eta_r}_\lambda$ is the POVM for the measurement with quantum efficiency $\eta_r$. As shown in Section II.D, for heterodyne detection one has the Gaussian convolution

$$\Pi^{\eta_r}_\alpha = \frac{1}{\pi} \int_{\mathbb{C}} \frac{d^2 z}{\pi\,\Delta^2_{\eta_r}}\; e^{-|z - \alpha|^2/\Delta^2_{\eta_r}}\; |z\rangle\langle z|\,, \qquad (218)$$
with $\Delta^2_{\eta_r} = (1 - \eta_r)/\eta_r$. For direct photodetection, $\Pi_m = |m\rangle\langle m|$ is replaced with the Bernoulli convolution

$$\Pi^{\eta_r}_m = \sum_{j=m}^{\infty} \binom{j}{m}\, \eta_r^m\, (1 - \eta_r)^{j - m}\, |j\rangle\langle j|\,. \qquad (219)$$
The experimental test proposed here consists of performing conditional homodyne tomography on beam 2, given the outcome $\lambda$ of the measurement on beam 1. We can directly measure the "fidelity of the test"

$$F(\lambda) = \mathrm{Tr}\big[\,\rho_{\eta_r}(\lambda)\;\rho_{\mathrm{meas}}(\lambda)\,\big]\,, \qquad (220)$$

where $\rho_{\eta_r}(\lambda)$ is the theoretical state in Equation (217), and $\rho_{\mathrm{meas}}(\lambda)$ is the experimentally measured state on beam 2. Notice that we use the term "fidelity" even if $F(\lambda)$ is a proper fidelity only when at least one of the two states is pure, which occurs in the limit of unit quantum efficiency $\eta_r$. In the following we evaluate the theoretical value of $F(\lambda)$ and compare it with the tomographically measured value.

The fidelity (220) can be directly measured by homodyne tomography using the estimator for the operator $\rho_{\eta_r}(\lambda)$, namely

$$F(\lambda) = \int_0^{\pi} \frac{d\varphi}{\pi} \int_{-\infty}^{+\infty} dx\; p_{\eta_h}(x, \varphi; \lambda)\, R_{\eta_h}\big[\rho_{\eta_r}(\lambda)\big](x, \varphi)\,, \qquad (221)$$
where $p_{\eta_h}(x, \varphi; \lambda)$ is the conditional homodyne probability distribution for outcome $\lambda$ at the readout detector. For heterodyne detection on beam 1 with outcome $\alpha \in \mathbb{C}$, the reduced state on beam 2 is given by the displaced thermal state

$$\rho_{\eta_r}(\alpha) = \Delta\, D(\gamma)\,(1 - \Delta)^{a^\dagger a}\, D^\dagger(\gamma)\,, \qquad (222)$$

where

$$\Delta = 1 + (\eta_r - 1)\,|\xi|^2\,, \qquad \gamma = \frac{\eta_r\,\xi\,\alpha^*}{\Delta}\,. \qquad (223)$$

The estimator in Equation (221) is given by

$$R_{\eta_h}\big[\rho_{\eta_r}(\alpha)\big](x, \varphi) = \frac{2\eta_h\,\Delta}{2\eta_h - \Delta}\; \Phi\!\left(1, \tfrac{1}{2};\; -\frac{2\eta_h\,\Delta}{2\eta_h - \Delta}\,(x - \gamma_\varphi)^2 \right), \qquad (224)$$

where $\gamma_\varphi = \mathrm{Re}\,(e^{-i\varphi}\gamma)$, and $\Phi(a, b; z)$ denotes the customary confluent hypergeometric function. The estimator in Equation (224) is bounded for $\eta_h > \Delta/2$; one then needs to have

$$\eta_h > \frac{1 - |\xi|^2\,(1 - \eta_r)}{2}\,. \qquad (225)$$
As one can see from Equation (225), for $\eta_h > 0.5$ the fidelity can be measured for any value of $\eta_r$ and any gain parameter $\xi$ of the NOPA. We recall that the condition $\eta_h > 0.5$ is required for the measurement of the density matrix. However, in this direct measurement of the fidelity the reconstruction of the density matrix is bypassed, and we see from Equation (225) that the bound $\eta_h = 0.5$ can be lowered. The measured fidelity $F(\alpha)$ in Equation (221), with $\rho_{\eta_r}(\alpha)$ as given in Equation (222), must be compared with the theoretical value

$$F_{\mathrm{th}} = \frac{\Delta}{2 - \Delta}\,, \qquad (226)$$
which is independent of $\alpha$.

For direct photodetection on beam 1 with outcome $n$, the reduced state on beam 2 is given by

$$\rho_{\eta_r}(n) = \frac{\Delta^{n+1}}{(1 - \Delta)^n}\, \binom{a^\dagger a}{n}\, (1 - \Delta)^{a^\dagger a}\,. \qquad (227)$$
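Equation (227) can be cross-checked against the defining Equation (217): with $V = (1-|\xi|^2)^{1/2}\,\xi^{a^\dagger a}$, the Bernoulli POVM of Equation (219), and $\Delta = 1 + (\eta_r - 1)|\xi|^2$ from Equation (223), everything is diagonal in the Fock basis. A sketch (Python/NumPy; the parameter values and the truncation are arbitrary choices):

```python
import numpy as np
from math import comb

def reduced_diag_from_povm(n, xi2, eta_r, dim=300):
    """Diagonal of rho_{eta_r}(n) built from Eq. (217): V Pi_n^tau V^dag
    normalized, with V = (1-|xi|^2)^(1/2) xi^(a^dag a).  Both the POVM
    element of Eq. (219) and V are diagonal in the Fock basis (and the
    transpose of a diagonal operator is itself)."""
    j = np.arange(dim)
    povm = np.array([comb(k, n) * eta_r**n * (1 - eta_r)**(k - n)
                     if k >= n else 0.0 for k in range(dim)])
    unnorm = (1 - xi2) * xi2**j * povm
    return unnorm / unnorm.sum()

def reduced_diag_closed(n, xi2, eta_r, dim=300):
    """Diagonal of the closed form Eq. (227)."""
    Delta = 1 + (eta_r - 1) * xi2
    j = np.arange(dim)
    pref = Delta**(n + 1) / (1 - Delta)**n
    return pref * np.array([comb(k, n) for k in range(dim)]) * (1 - Delta)**j
```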
The estimator for the fidelity measurement is

$$R_{\eta_h}\big[\rho_{\eta_r}(n)\big](x, \varphi) = \frac{\Delta^{n+1}}{(1 - \Delta)^n}\; \frac{(\partial_z)^n}{n!}\bigg|_{z=0}\; \frac{2\eta_h}{2\eta_h - \Delta + z}\; \Phi\!\left(1, \tfrac{1}{2};\; -\frac{2\eta_h\,(\Delta - z)}{2\eta_h - \Delta + z}\, x^2 \right). \qquad (228)$$
We see that the same bound of Equation (225) holds. In this case the measured fidelity $F(n)$ must be compared with the theoretical value

$$F_{\mathrm{th}}(n) = \Delta^{2 + 2n}\; F\big(n + 1,\, n + 1;\, 1;\, (1 - \Delta)^2\big)\,, \qquad (229)$$
where $F(a, b; c; z)$ denotes the customary hypergeometric function.

Several simulations have been reported in Ref. [31] for both heterodyne detection and photodetection on beam 1. In the former case the quadrature probability distribution pertaining to the reduced state in Equation (222) on beam 2 was simulated and the estimator in Equation (224) was averaged; in the latter case the reduced state in Equation (227) and the estimator in Equation (228) were used. Numerical results for the fidelity were thus obtained for different values of the quantum efficiencies $\eta_r$ and $\eta_h$, and of the NOPA gain parameter $\xi$. A decisive test can be performed with samples of just a few thousand measurements. The statistical error in the measurement was found to be rather insensitive to both quantum efficiencies and to the NOPA gain.
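The closed form in Equation (229) can be verified against a direct Fock-basis sum of $\mathrm{Tr}[\rho_{\eta_r}(n)^2]$ from Equation (227); the sketch below (plain Python; the parameter values are arbitrary, and the hypergeometric series is summed term by term) checks both that identity and the $n = 0$ limit, Equation (226):

```python
from math import comb

def fidelity_direct(Delta, n, j_max=500):
    """Tr[rho(n)^2] from the diagonal of Eq. (227), summed directly."""
    s = sum(comb(j, n)**2 * (1 - Delta)**(2 * j) for j in range(n, j_max))
    return Delta**(2 * n + 2) * (1 - Delta)**(-2 * n) * s

def fidelity_hypergeometric(Delta, n, k_max=500):
    """Eq. (229): Delta^(2+2n) * 2F1(n+1, n+1; 1; (1-Delta)^2),
    with the Gauss series summed via its term ratio."""
    z = (1 - Delta)**2
    term, total = 1.0, 1.0
    for k in range(k_max):
        term *= (n + 1 + k)**2 / ((1 + k) * (k + 1)) * z
        total += term
    return Delta**(2 + 2 * n) * total
```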
C. Tomography of Coherent Signals and Applications

Quantum homodyne tomography has proved useful in various experimental situations, such as for measuring the photon statistics of a semiconductor laser [10], for determining the density matrix of a squeezed vacuum [11] and the joint photon number probability distribution of a twin beam created by a nondegenerate optical parametric amplifier [86], and for reconstructing the quantum states of spatial modes with an array detector [104]. In this section we review some experimental results on homodyne tomography with coherent states, with application to the estimation of the loss introduced by simple optical components [32].

The experiment was performed in the Quantum Optics Lab of the University of Naples, and a schematic is presented in Figure 14. The principal radiation source is provided by a monolithic Nd:YAG laser (about 50 mW at 1064 nm; Lightwave, model 142). The laser has a linewidth of less than 10 kHz/ms with a frequency jitter of less than 300 kHz/s, while its intensity spectrum is shot-noise limited above 2.5 MHz. The laser emits a linearly polarized beam in a TEM$_{00}$ mode, which is split into two parts by a beam splitter. One part provides the strong local oscillator for the homodyne detector. The other part, typically less than 200 $\mu$W, is the homodyne signal. The optical paths traveled by the local oscillator and
the signal beams are carefully adjusted to obtain a visibility typically above 75%, measured at one of the homodyne output ports. The signal beam is modulated by means of a phase electrooptic modulator (EOM, Linos Photonics PM0202) at frequency $\Omega = 4$ MHz, and a half-wave plate (HWP2, HWP3) is mounted in each path to carefully match the polarization state at the homodyne input. The detector is composed of a 50/50 beam splitter (BS), two amplified photodiodes (PD1, PD2), and a power combiner. The difference photocurrent is demodulated at 4 MHz by means of an electrical mixer. In this way the detection occurs outside any technical noise and, more important, in a spectral region where the laser does not carry excess noise.

The phase modulation added to the signal beam moves a certain number of photons, proportional to the square of the modulation depth, from the carrier optical frequency $\omega$ to the sidebands at $\omega \pm \Omega$, so generating two weak coherent states with engineered average photon number at frequencies $\omega \pm \Omega$. The sum sideband mode is then detected as a controlled perturbation attached to the signal beam. The demodulated current is acquired by a digital oscilloscope (Tektronix TDS 520D) with 8-bit resolution and a record length of 250,000 points per run. The acquisition is triggered by a triangular-shaped waveform applied to the PZT mounted on the local oscillator path. The piezo ramp is adjusted to obtain a $2\pi$ phase variation between the local oscillator and the signal beam in an acquisition window.

The homodyne data to be used for tomographic reconstruction of the state have been calibrated according to the noise of the vacuum state. This is obtained by acquiring a set of data leaving the signal beam undisturbed while scanning the local oscillator phase. It is important to note that in the case of the vacuum state no role is played by the visibility at the homodyne beam splitter.
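The vacuum calibration amounts to fixing the scale of the raw quadrature record so that the vacuum noise has the theoretical variance $1/4$ of $x = (a + a^\dagger)/2$. A minimal sketch (Python/NumPy; the raw-data model below is hypothetical, standing in for real oscilloscope records):

```python
import numpy as np

rng = np.random.default_rng(2)

def calibrate(raw_signal, raw_vacuum):
    """Rescale raw homodyne samples so that the vacuum record has
    quadrature variance 1/4 (shot-noise level for x = (a + a^dag)/2)."""
    scale = 0.5 / raw_vacuum.std()
    return raw_signal * scale

# hypothetical raw records in arbitrary ADC units, with the same unknown gain
raw_vac = rng.normal(0.0, 37.0, 100_000)    # vacuum reference run
raw_sig = rng.normal(55.0, 37.0, 100_000)   # signal run at one LO phase
x = calibrate(raw_sig, raw_vac)             # calibrated quadrature samples
```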
The tomographic samples consist of $N$ homodyne data $\{x_j, \varphi_j\}_{j=1,\ldots,N}$, with phases $\varphi_j$ equally spaced with respect to the local oscillator. Since the piezo ramp is active during the whole acquisition time, we have a single value $x_j$ for any phase $\varphi_j$. From the calibrated data we first reconstruct the quantum state of the homodyne signal. According to the experimental setup, we expect a coherent signal whose nominal amplitude can be adjusted by varying the modulation depth of the optical mixer. However, since we do not compensate for the quantum efficiency of the photodiodes in the homodyne detector ($\simeq 90\%$), we expect to reveal coherent signals with reduced amplitude. In addition, the amplitude is further reduced by the nonmaximum visibility (ranging from 75 to 85%) at the homodyne beam splitter. In Figure 15 we report a typical reconstruction, together with the reconstruction of the vacuum state used for calibration. For both states, we
QUANTUM TOMOGRAPHY
279
FIGURE 15. Reconstruction of the quantum state of the signal, and of the vacuum state used for calibration. For both states, from left to right, we report the raw data, a histogram of the photon number distribution, and a contour plot of the Wigner function. The reconstruction has been performed from a sample of N = 242,250 homodyne data. The coherent signal has an estimated average photon number ⟨a†a⟩ = 8.4. The solid line denotes the theoretical photon distribution of a coherent state with this number of photons. Statistical errors on the matrix elements are about 2%. The slight phase asymmetry in the Wigner distribution corresponds to a value of about 2% of the maximum. (From Ref. [32].)
report the raw data, the photon number distribution ρ_nn, and a contour plot of the Wigner function. The matrix elements are obtained by sampling the corresponding estimators in Equation (100), whereas the confidence intervals for the diagonal elements are given by ε_nn = σ_nn/√N, σ_nn being the rms deviation of the estimator over the data. For the off-diagonal elements the confidence intervals are evaluated for the real and imaginary parts separately. In order to see the quantum state as a whole, we also report the reconstruction of the Wigner function of the field, which can be expressed in terms of the matrix elements as the discrete Fourier transform
\[
W(\alpha,\alpha^{*})=\mathrm{Re}\sum_{d=0}^{\infty}e^{i d\varphi}\sum_{n=0}^{\infty}\Lambda(n,d;|\alpha|)\,\rho_{n,n+d}
\tag{230}
\]
280
MAURO D’ARIANO ET AL.
where φ = arg α, and
\[
\Lambda(n,d;|\alpha|)=(-)^{n}\,
\sqrt{\frac{2\,n!}{2(2-\delta_{d0})\,(n+d)!}}\;
|2\alpha|^{d}\,e^{-2|\alpha|^{2}}\,L_{n}^{d}\!\left(|2\alpha|^{2}\right),
\tag{231}
\]
L_n^d(x) denoting the generalized Laguerre polynomials. Of course, the series in Equation (230) has to be truncated at some point, and therefore the Wigner function can be reconstructed only at some finite resolution. Once the coherence of the signal has been established, we may use homodyne tomography to estimate the loss imposed by a passive optical component, such as an optical filter. The procedure may be outlined as follows. We first estimate the initial mean photon number n̄₀ = |α₀|² of the signal beam, and then the same quantity after inserting an optical neutral density filter in the signal path. If Γ is the loss parameter, then the coherent amplitude is reduced to α_Γ = α₀e^{−Γ}, and the intensity to n̄_Γ = n̄₀e^{−2Γ}. The estimation of the mean photon number can be performed adaptively on the data, using the general method presented in Section III.D.2. One takes the average of the estimator
\[
R[a^{\dagger}a](x,\varphi)=2x^{2}-\frac{1}{2}+\mu\,e^{i2\varphi}+\mu^{*}e^{-i2\varphi},
\tag{232}
\]
where μ is a parameter to be determined in order to minimize fluctuations. As proved in Ref. [22], one has μ = −½⟨a†²⟩, which itself can be obtained from homodyne data. In practice, one uses the data sample twice: first to evaluate μ, then to obtain the estimate of the mean photon number. In Figure 16 the tomographic determinations of n̄ are compared with the expected values for three sets of experiments, corresponding to three different initial amplitudes. The expected values are given by n̄ = n̄₀e^{−2Γ}V, where Γ is the value obtained by comparing the signal d.c. currents I₀ and I_Γ at the homodyne photodiodes, and V = V_Γ/V₀ is the relative visibility. The solid line in Figure 16 denotes these values. The line is not continuous owing to variations of the visibility. It is apparent from the plot that the estimation is reliable over the whole range of values we could explore. It is worth noting that the estimation is absolute, i.e., it does not depend on knowledge of the initial amplitude, and it is robust, since it can be performed independently of the quantum efficiency of the homodyne detector. One may notice that the estimation of loss could also be pursued by measuring an appropriate observable, typically the intensity of the light beam with and without the filter. However, this is a concrete possibility only
FIGURE 16. Estimation of the mean photon number of a coherent signal as a function of the loss imposed by an optical filter. Three sets of experiments, corresponding to three different initial amplitudes, are reported. Open circles are the tomographic determinations, whereas the solid lines denote the expected values, as follow from the nominal values of loss and visibility at the homodyne detector. Statistical errors are within the circles. (From Ref. [32].)
for high-amplitude signals, whereas losses on weak coherent states cannot be properly characterized either by direct photocounting with photodiodes (due to the low quantum efficiency and large fluctuations) or by avalanche photodetectors (due to the impossibility of discriminating among the number of photons). On the contrary, homodyne tomography provides the mean intensity (actually the whole photon distribution) independently of the signal level, thus allowing a precise characterization also in the quantum regime. Indeed, in Ref. [22] the adaptive tomographic determination of the mean photon number has been extensively applied to (numerically simulated) homodyne data for coherent states of various amplitudes. The analysis has shown that the determination is reliable even for small samples and that the precision is not much affected by the intensity of the signal.
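The loss-estimation procedure above can be mimicked on simulated data. The sketch below is our own illustration (not the code of Ref. [22]): it averages the estimator 2x² − 1/2, i.e., Equation (232) with the null-function parameter μ set to zero, over ideal (η = 1) homodyne outcomes of a coherent state, with the attenuated amplitude α₀e^{−Γ} supplied as an assumed input.

```python
import numpy as np

rng = np.random.default_rng(1)

def estimate_nbar(alpha, n_samples=200_000):
    """Average of the estimator R[a^dag a](x, phi) = 2 x^2 - 1/2 over
    simulated ideal homodyne data of a coherent state |alpha> (alpha real)."""
    phi = rng.uniform(0.0, np.pi, n_samples)   # scanned LO phase
    x = rng.normal(alpha * np.cos(phi), 0.5)   # <X_phi> = alpha cos(phi), var 1/4
    return np.mean(2.0 * x**2 - 0.5)

alpha0, gamma = 2.0, 0.5                         # initial amplitude, loss parameter
n0_est = estimate_nbar(alpha0)                   # expect |alpha0|^2 = 4
n_att_est = estimate_nbar(alpha0 * np.exp(-gamma))  # expect n0 * exp(-2*gamma)
```

Averaging 2x² − 1/2 over the scanned phase gives 2⟨x²⟩ − 1/2 = |α|² for any amplitude, which is the sense in which the estimation is absolute: no knowledge of the initial amplitude enters the estimator.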
VII. TOMOGRAPHY OF A QUANTUM DEVICE

If we want to determine experimentally the operation of a quantum device, we need, by definition, quantum tomography. In fact, the characterization of the device operation could be done by running a basis of possible known inputs and determining the corresponding outputs by quantum tomography. In quantum mechanics the inputs are density operators, and the role
of the transfer matrix is played by the so-called quantum operation of the device, here denoted by E. Thus the output state ρ_out (apart from a possible normalization) is given by the quantum operation applied to the input state as follows:
\[
\rho_{\mathrm{out}}=\mathcal{E}(\rho_{\mathrm{in}}).
\tag{233}
\]
Since the set of states actually spans a space of operators, if we want to characterize E completely we need to run a complete orthogonal basis of quantum states |n⟩ (n = 0, 1, 2, …), along with their linear combinations (1/√2)(|n′⟩ + i^k|n″⟩), with k = 0, 1, 2, 3 and i denoting the imaginary unit. However, the availability of such a set of states in the laboratory is, by itself, a very difficult technological problem. For example, for an optical device, the states |n⟩ are those with a precise number n of photons and, apart from very small n—say at most n = 2—they have never been achieved in the laboratory, whereas preparing their superpositions remains a dream for experimentalists, especially if n ≫ 1 (a kind of Schrödinger kitten state). The idea of obtaining the quantum operation of a device by scanning the inputs and performing tomography of the corresponding outputs is the basis of the early methods proposed in Refs. [105,106]. Due to the mentioned problems with the availability of input states, both methods have limited application. The method of Ref. [105] was designed for NMR quantum processing, whereas the method of Ref. [106] was conceived for determining the Liouvillian of a phase-insensitive amplifier, namely for a case in which the quantum operation has no off-diagonal matrix elements, to evaluate which one needs the superpositions (1/√2)(|n′⟩ + i^k|n″⟩) with k = 0, 1, 2, 3 mentioned above. The problem of the availability of input states and their superpositions was partially solved by the method of Ref. [107], where it was suggested to use randomly drawn coherent states to estimate the quantum operation of an optical device via a maximum likelihood approach. This method, however, cannot be used for quantum systems different from e.m. radiation—such as finite dimensional systems, i.e., qubits—due to the peculiarity of coherent states. The solution to the problem came with the method of Ref.
[25], where the problem of the availability of input states was solved by using a single bipartite entangled input, which is equivalent to running all possible input states in a kind of ‘‘quantum parallel’’ fashion (bipartite entangled states are nowadays easily available in most quantum systems of interest). The method is also very simple and effective, and its experimental feasibility (for single-photon polarization-encoded qubits) has already been demonstrated in an experiment performed in the Francesco De Martini laboratory in Rome La Sapienza [108]. In the next sections
we will review the general method and report some computer simulated results from Ref. [25].

A. The Method

As already mentioned, the description of a general state transformation in quantum mechanics is given in terms of the so-called quantum operation. The state transformation due to the quantum operation E is given as follows:
\[
\rho \longrightarrow \frac{\mathcal{E}(\rho)}{\mathrm{Tr}[\mathcal{E}(\rho)]}.
\tag{234}
\]
The transformation occurs with probability p = Tr[E(ρ)] ≤ 1. The quantum operation E is a linear, trace-decreasing, completely positive (CP) map. We recall that a map is completely positive if it preserves positivity in general, i.e., also when applied locally to an entangled state. In other words, upon denoting by I the identical map on the Hilbert space K of a second quantum system, the extended map E ⊗ I on H ⊗ K is positive for any extension K. Typically, the CP map is written using a Kraus decomposition [109] as follows:
\[
\mathcal{E}(\rho)=\sum_{n}K_{n}\,\rho\,K_{n}^{\dagger},
\tag{235}
\]
where the operators K_n satisfy
\[
\sum_{n}K_{n}^{\dagger}K_{n}\leq I.
\tag{236}
\]
The transformation (235) occurs with generally nonunit probability Tr[E(ρ)] ≤ 1, and the probability is unity independently of ρ when E is trace-preserving, i.e., when the equal sign holds in Equation (236). The particular case of unitary transformations corresponds to having just one term K₁ = U in the sum (235), with U unitary. However, one can also consider nonunitary operations with one term only, namely
\[
\mathcal{E}(\rho)=A\,\rho\,A^{\dagger},
\tag{237}
\]
where A is a contraction, i.e., ‖A‖ ≤ 1. Such operations map pure states into pure states and describe, for example, the state reduction due to a measurement apparatus for a particular fixed outcome, which occurs with probability Tr[AρA†] ≤ 1.
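As a concrete instance of Equations (235)–(236), the snippet below builds the Kraus operators of a single-qubit amplitude-damping channel (our example, not one from the text) and checks trace preservation, i.e., the equality case of Equation (236).

```python
import numpy as np

# Example CP map: single-qubit amplitude damping with decay probability p.
p = 0.3
K0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1.0 - p)]])
K1 = np.array([[0.0, np.sqrt(p)], [0.0, 0.0]])

def channel(rho):
    """E(rho) = sum_n K_n rho K_n^dag  (Kraus decomposition, Eq. (235))."""
    return K0 @ rho @ K0.conj().T + K1 @ rho @ K1.conj().T

# Trace preservation: sum_n K_n^dag K_n = I, the equality case of Eq. (236).
completeness = K0.conj().T @ K0 + K1.conj().T @ K1
assert np.allclose(completeness, np.eye(2))

rho = np.array([[0.25, 0.25], [0.25, 0.75]])   # some input state
out = channel(rho)
assert np.isclose(np.trace(out), 1.0)          # occurs with probability one
```

For a trace-decreasing map (strict inequality in Equation (236)) the trace of the output would instead give the probability of the transformation.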
In the following we will use the notation for bipartite pure states introduced in Equation (45), and we will denote by O^τ and O* the transpose and the conjugate of an operator O with respect to some prechosen orthonormal basis. The basic idea of the method of Ref. [25] is the following. An unknown quantum operation E can be determined experimentally through quantum tomography, by exploiting the one-to-one correspondence E ↔ R_E between quantum operations E and positive operators R_E on two copies of the Hilbert space H ⊗ H:
\[
R_{\mathcal{E}}=\mathcal{E}\otimes\mathcal{I}\,\big(|I\rangle\rangle\langle\langle I|\big),
\qquad
\mathcal{E}(\rho)=\mathrm{Tr}_{2}\big[(I\otimes\rho^{\tau})\,R_{\mathcal{E}}\big].
\tag{238}
\]
Notice that the vector |I⟩⟩ represents a (unnormalized) maximally entangled state. If we consider a bipartite input state |ψ⟩⟩ and operate with E only on one Hilbert space, as in Figure 17, the output state is given by
\[
R(\psi)\doteq\mathcal{E}\otimes\mathcal{I}\,\big(|\psi\rangle\rangle\langle\langle\psi|\big).
\tag{239}
\]
For invertible ψ,
the two matrices R(I) ≡ R_E and R(ψ) are related as follows:
\[
R(I)=\big(I\otimes\psi^{-1\,\tau}\big)\,R(\psi)\,\big(I\otimes\psi^{-1\,*}\big).
\tag{240}
\]
Hence, the (four-index) quantum operation matrix R_E can be obtained by estimating via quantum tomography the following ensemble averages:
\[
\langle\langle i,j|\,R(I)\,|l,k\rangle\rangle
=\mathrm{Tr}\Big[R(\psi)\,\Big(|l\rangle\langle i|\otimes\psi^{-1\,*}\,|k\rangle\langle j|\,\psi^{-1\,\tau}\Big)\Big].
\tag{241}
\]
FIGURE 17. General scheme of the method for the tomographic estimation of a quantum operation. Two identical quantum systems are prepared in a bipartite state |ψ⟩⟩, with invertible ψ. One of the two systems undergoes the quantum operation E, whereas the other is left untouched. At the output one performs a quantum tomographic estimation, by jointly measuring two observables X_λ and X_λ′ from two quorums {X_λ} and {X_λ′} for the two Hilbert spaces, such as two different quadratures of the two field modes in two-mode homodyne tomography. (From Ref. [25].)
Then one simply has to perform a quantum tomographic estimation, by jointly measuring two observables X_λ and X_λ′ from two quorums {X_λ} and {X_λ′} for the two entangled quantum systems.

B. An Example in the Optical Domain

In Ref. [25] it is shown that the proposed method for the quantum tomography of a device can actually be performed using joint homodyne tomography on a twin beam from downconversion of the vacuum, with an experimental setup similar to that used in the experiment of Ref. [86]. The feasibility analysis considers, as an example, the experimental determination of the quantum operation corresponding to the unitary displacement operator D(z) = e^{za† − z*a}. The pertaining matrix R(I) is given by
\[
R(I)=|D(z)\rangle\rangle\langle\langle D(z)|,
\tag{242}
\]
which is the (unnormalizable) eigenstate of the operator a − b† with eigenvalue z, as shown in Section II.D. As the input bipartite state, one uses the twin beam from parametric downconversion of Equation (195), which is clearly invertible, since
\[
\psi=\sqrt{1-|\xi|^{2}}\;\xi^{a^{\dagger}a},
\qquad
\psi^{-1}=\frac{1}{\sqrt{1-|\xi|^{2}}}\;\xi^{-a^{\dagger}a}.
\tag{243}
\]
The experimental apparatus is the same as in the experiment of Ref. [86], where the twin beam is provided by a nondegenerate optical parametric amplifier (a KTP crystal) pumped by the second harmonic of a Q-switched mode-locked Nd:YAG laser, which produces a 100-MHz train of 120-ps duration pulses at 1064 nm. The orthogonally polarized twin beams emitted by the KTP crystal (one of which is displaced by D(z) using a nearly transparent beam splitter and a strong local oscillator) are separately detected by two balanced homodyne detectors that use two independent local oscillators derived from the same laser. This provides the joint tomography of the quadratures X_φ′ and X_φ″ needed for the reconstruction. The only experimental problem that still needs to be addressed (even though it is practically solvable) with respect to the original experiment of Ref. [86] is the control of the quadrature phases φ′ and φ″ with respect to the LO, which in the original experiment were random. In Figure 18 the results of a simulated experiment are reported, for displacement parameter z = 1, and for some typical values of the quantum efficiency at the homodyne detectors and of the total average photon
FIGURE 18. Homodyne tomography of the quantum operation corresponding to the unitary displacement operator D(z), with z = 1. The reconstructed diagonal elements A_nn = ⟨n|D(z)|n⟩ are shown (thin solid lines on an extended abscissa range, with their respective error bars in gray shade), compared to the theoretical values (thick solid lines). Similar results are obtained for the off-diagonal terms. The reconstruction has been achieved using at the input the twin-beam state of Equation (195), with total average photon number n̄ and quantum efficiency η at the homodyne detectors. Left: n̄ = 5, η = 0.9, and 150 blocks of 10⁴ data have been used. Right: n̄ = 3, η = 0.7, and 300 blocks of 2×10⁵ data have been used. (From Ref. [25].)
number n̄ of the twin beam. The diagonal elements A_nn = ⟨n|D(z)|n⟩ = [⟨n|⟨n| R_{D(z)} |n⟩|n⟩]^{1/2} are plotted for the displacement operator with z = 1. The reconstructed values are shown by thin solid lines on an extended abscissa range, with their respective error bars in gray shade, and compared to the theoretical probability (thick solid line). A good reconstruction of the matrix can be achieved in the given range with n̄ ≳ 1, quantum efficiency as low as η = 0.7, and 10⁶–10⁷ data. The number of data can be decreased by a factor of 100–1000 using the tomographic max-likelihood techniques of Ref. [23], at the expense, however, of the complexity of the algorithm. Improving the quantum efficiency and increasing the amplifier gain (toward a maximally entangled state) have the effect of making the statistical errors smaller and more uniform versus the photon labels n and m of the matrix A_nm. It is worth emphasizing that the quantum tomographic method of Ref. [25] for measuring the matrix of a quantum operation can be much improved by means of a max-likelihood strategy aimed at the estimation of some unknown parameters of the quantum operation. In this case, instead of obtaining the matrix elements of R(I) from the ensemble averages in Equation (241), one parameterizes R(I) in terms of unknown quantities to be determined experimentally, and the likelihood is maximized for the set of experimental data at various randomly selected (tensor) quorum elements,
keeping the same fixed bipartite input state. This method is especially useful for a very precise experimental comparison between the characteristics of a given device (e.g., the gain and loss of an active fiber) and those of a quantum standard reference.
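The theoretical diagonal elements A_nn = ⟨n|D(z)|n⟩ that appear in Figure 18 satisfy the closed form ⟨n|D(z)|n⟩ = e^{−|z|²/2}L_n(|z|²). The sketch below is our own numerical check (not from the text): it exponentiates the truncated Fock-space generator of D(z), assuming a cutoff large enough that truncation errors are negligible for small n.

```python
import numpy as np
from numpy.polynomial import laguerre

dim, z = 40, 1.0                               # Fock-space cutoff, displacement
a = np.diag(np.sqrt(np.arange(1, dim)), k=1)   # truncated annihilation operator
G = z * a.conj().T - np.conj(z) * a            # generator of D(z) = exp(G)

# G is anti-Hermitian, so iG is Hermitian: exponentiate by diagonalization.
w, V = np.linalg.eigh(1j * G)
D = V @ np.diag(np.exp(-1j * w)) @ V.conj().T  # D(z) = exp(-i (iG))

A_diag = np.real(np.diag(D))                   # A_nn = <n|D(z)|n>, real for real z

# Closed form: <n|D(z)|n> = exp(-|z|^2/2) L_n(|z|^2)
for n in range(6):
    c = np.zeros(n + 1); c[n] = 1.0            # Laguerre-series coefficients of L_n
    expected = np.exp(-abs(z)**2 / 2) * laguerre.lagval(abs(z)**2, c)
    assert abs(A_diag[n] - expected) < 1e-8
```

The same truncated-matrix construction can be used to generate the theoretical curves against which a tomographic reconstruction of R_{D(z)} is compared.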
VIII. MAXIMUM LIKELIHOOD METHOD IN QUANTUM ESTIMATION

Quantum estimation of states, observables, and parameters is, from very basic principles, a matter of statistical inference from a population sampling, and the most comprehensive quantum estimation procedure is quantum tomography. As we have shown in Section III, the expectation value of an operator is obtained by averaging an estimator over the experimental data of a ‘‘quorum’’ of observables. The method is very general and efficient; however, in the averaging procedure we have fluctuations which result in relatively large statistical errors. Another relevant strategy, the maximum likelihood (ML) method, can be used for measuring unknown parameters of a transformation on a given state [33], or for measuring the matrix elements of the density operator itself [23]. The ML strategy [110,111] is an entirely different approach to quantum state measurement compared to the standard quantum tomographic techniques. The ML procedure consists in finding the quantum state, or the values of the parameters, that are most likely to generate the observed data. This idea can be quantified and implemented using the concept of the likelihood functional. As regards state estimation, the ML method estimates the quantum state as a whole. Such a procedure incorporates a priori knowledge about the relations between the elements of the density matrix. This guarantees positivity and normalization of the matrix, with the result of a substantial reduction of statistical errors. Regarding the estimation of specific parameters, we notice that in many cases the resulting estimators are efficient, unbiased, and consistent, thus providing a statistically reliable determination. As we will show, with the ML method only small samples of data are required for a precise determination. However, we want to emphasize that such a method is not always the optimal solution of the tomographic problem, since it suffers from some major limitations.
Besides being biased due to the Hilbert space truncation—even though the bias can be very small if, from other methods, we know where to truncate—it cannot be generalized to the estimation of any ensemble average, but just of a set of
parameters on which the density matrix depends. In addition, for an increasing number of parameters the method has exponential complexity. In the following we will review the ML methods proposed in Refs. [23] and [33], by deriving the likelihood functional, applying the ML method to quantum state reconstruction, with examples for both radiation and spin systems, and, finally, considering the ML estimation for the relevant class of Gaussian states in quantum optics.

A. Maximum Likelihood Principle

Here we briefly review the theory of the ML estimation of a single parameter. The generalization to several parameters, as for example the elements of the density matrix, is straightforward. The only point that should be carefully analyzed is the parameterization of the multidimensional quantity to be estimated. In the next section the specific case of the density matrix will be discussed. Let p(x|λ) be the probability density of a random variable x, conditioned on the value of the parameter λ. The form of p is known, but the true value of λ is unknown, and will be estimated from the result of a measurement of x. Let x₁, x₂, …, x_N be a random sample of size N. The joint probability density of the independent random variables x₁, x₂, …, x_N (the global probability of the sample) is given by
\[
\mathcal{L}(x_{1},x_{2},\ldots,x_{N}\,|\,\lambda)=\prod_{k=1}^{N}p(x_{k}|\lambda),
\tag{244}
\]
and is called the likelihood function of the given data sample (hereafter we will suppress the dependence of L on the data). The maximum likelihood estimator (MLE) of the parameter λ is defined as the quantity λ_ml ≡ λ_ml({x_k}) that maximizes L(λ) for variations of λ, namely λ_ml is given by the solution of the equations
\[
\frac{\partial\mathcal{L}(\lambda)}{\partial\lambda}=0;\qquad
\frac{\partial^{2}\mathcal{L}(\lambda)}{\partial\lambda^{2}}<0.
\tag{245}
\]
The first equation is equivalent to ∂L/∂λ = 0, where
\[
L(\lambda)=\log\mathcal{L}(\lambda)=\sum_{k=1}^{N}\log p(x_{k}|\lambda)
\tag{246}
\]
is the so-called log-likelihood function.
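As a toy illustration of Equations (244)–(246) (our own example, not from the text), the snippet below draws a sample from an exponential density p(x|λ) = λe^{−λx} and locates the maximum of the log-likelihood numerically; for this model the maximum is known analytically to be λ_ml = 1/x̄.

```python
import numpy as np

rng = np.random.default_rng(2)

# Model: p(x | lam) = lam * exp(-lam * x); true parameter to be estimated.
lam_true = 2.5
x = rng.exponential(1.0 / lam_true, 10_000)

def log_likelihood(lam):
    # L(lam) = sum_k log p(x_k | lam), Eq. (246)
    return len(x) * np.log(lam) - lam * x.sum()

# Numerical maximization over a grid (the analytic MLE is 1/mean here).
grid = np.linspace(0.1, 10.0, 100_000)
lam_ml = grid[np.argmax(log_likelihood(grid))]

assert abs(lam_ml - 1.0 / x.mean()) < 1e-3
```

For this model the Fisher information is F = 1/λ², so the Cramér–Rao bound discussed below predicts a variance of at least λ²/N for the estimate.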
In order to obtain a measure for the confidence interval in the determination of λ_ml we consider the variance
\[
\sigma^{2}_{\lambda}=\int\Bigg[\prod_{k}dx_{k}\,p(x_{k}|\lambda)\Bigg]\big[\lambda_{\mathrm{ml}}(\{x_{k}\})-\lambda\big]^{2}.
\tag{247}
\]
In terms of the Fisher information
\[
F=\int dx\,\frac{1}{p(x|\lambda)}\left[\frac{\partial p(x|\lambda)}{\partial\lambda}\right]^{2},
\tag{248}
\]
it is easy to prove that
\[
\sigma^{2}_{\lambda}\,\geq\,\frac{1}{NF},
\tag{249}
\]
where N is the number of measurements. The inequality in Equation (249) is known as the Cramér–Rao bound [112] on the precision of the ML estimation. Notice that this bound holds for any functional form of the probability distribution p(x|λ), provided that the Fisher information exists for all λ and ∂_λ p(x|λ) exists for all x. When an experiment has ‘‘good statistics’’ (i.e., for a large enough data sample) the Cramér–Rao bound is saturated.

B. ML Quantum State Estimation

In this section we review the method of maximum likelihood estimation of the quantum state of Ref. [23], focusing attention on the cases of homodyne and spin tomography. We consider an experiment consisting of N measurements performed on identically prepared copies of a given quantum system. Each measurement is described by a positive operator-valued measure (POVM). The outcome of the ith measurement corresponds to the realization of a specific element of the POVM used in the corresponding run, and we denote this element by Π_i. The likelihood is here a functional L(ρ) of the density matrix, given by the product
\[
\mathcal{L}(\rho)=\prod_{i=1}^{N}\mathrm{Tr}(\rho\,\Pi_{i}),
\tag{250}
\]
which represents the probability of the observed data. The unknown element of the above expression, which we want to infer from data, is the
density matrix describing the measured ensemble. The estimation strategy of the ML technique is to maximize the likelihood functional over the set of density matrices. Several properties of the likelihood functional are easily found if we restrict ourselves to finite dimensional Hilbert spaces. In this case, it can be easily proved that L(ρ) is a concave function defined on the convex and closed set of density matrices. Therefore, its maximum is achieved either on a single isolated point, or on a convex subset of density matrices. In the latter case, the experimental data are insufficient to provide a unique estimate of the density matrix using the ML strategy. On the other hand, the existence of a single maximum allows us to assign the ML estimate of the density matrix unambiguously. The ML estimation of the quantum state, despite its elegant general formulation, results in a highly nontrivial constrained optimization problem, even if we resort to purely numerical means. The main difficulty lies in the appropriate parameterization of the set of all density matrices. The parameter space should be of minimum dimension in order to preserve the maximum of the likelihood function as a single isolated point. Additionally, the expression of quantum expectation values in terms of this parameterization should enable fast evaluation of the likelihood function, as this step is performed many times in the course of the numerical maximization. For such a purpose one introduces [23] a parameterization of the set of density matrices which provides an efficient algorithm for the maximization of the likelihood function. We represent the density matrix in the form
\[
\rho=T^{\dagger}T,
\tag{251}
\]
which automatically guarantees that ρ is positive and Hermitian. The remaining condition of unit trace, Tr ρ = 1, will be taken into account using the method of Lagrange multipliers. In order to achieve the minimal parameterization, we assume that T is a complex lower triangular matrix, with real elements on the diagonal. This form of T is motivated by the Cholesky decomposition known in numerical analysis [113] for an arbitrary nonnegative Hermitian matrix. For an M-dimensional Hilbert space, the number of real parameters in the matrix T is M + 2M(M−1)/2 = M², which equals the number of independent real parameters of a Hermitian matrix. This confirms that such a parameterization is minimal, up to the unit trace condition. In numerical calculations, it is convenient to replace the likelihood functional by its natural logarithm, which of course does not change the location of the maximum. Thus the log-likelihood function subjected to
numerical maximization is given by
\[
L(T)=\sum_{i=1}^{N}\ln\mathrm{Tr}\big(T^{\dagger}T\,\Pi_{i}\big)-\lambda\,\mathrm{Tr}\big(T^{\dagger}T\big),
\tag{252}
\]
where λ is a Lagrange multiplier accounting for the normalization of ρ. Writing ρ in terms of its eigenvectors |ψ_μ⟩ as ρ = Σ_μ y_μ²|ψ_μ⟩⟨ψ_μ|, with real y_μ, the maximum likelihood condition ∂L/∂y_μ = 0 reads
\[
\lambda\,y_{\mu}=\sum_{i=1}^{N}\frac{y_{\mu}\,\langle\psi_{\mu}|\Pi_{i}|\psi_{\mu}\rangle}{\mathrm{Tr}(\rho\,\Pi_{i})},
\tag{253}
\]
which, after multiplication by y_μ and summation over μ, yields λ = N. The Lagrange multiplier then equals the total number of measurements N. This formulation of the maximization problem allows one to apply standard numerical procedures for searching for the maximum over the M² real parameters of the matrix T. The examples presented below use the downhill simplex method [114]. The first example is the ML estimation of a single-mode radiation field. The experimental apparatus used in this technique is the homodyne detector. According to Section II.D the homodyne measurement is described by the POVM
\[
H_{\eta}(x;\varphi)=\sqrt{\frac{2\eta}{\pi(1-\eta)}}\,
\exp\!\left[-\frac{2\eta}{1-\eta}\big(X_{\varphi}-x\big)^{2}\right],
\tag{254}
\]
where η is the detector efficiency, and X_φ = (a†e^{iφ} + ae^{−iφ})/2 is the quadrature operator at phase φ. After N measurements, we obtain a set of pairs (x_i; φ_i), where i = 1, …, N. The log-likelihood functional is given by Equation (252) with Π_i ≡ H_η(x_i; φ_i). Of course, for a light mode it is necessary to truncate the Hilbert space to a finite dimensional basis. We shall assume that the highest Fock state has M − 1 photons, i.e., that the dimension of the truncated Hilbert space is M. For the expectation Tr[T†T H_η(x;φ)] it is necessary to use an expression which is explicitly positive, in order to protect the algorithm against the occurrence of small negative numerical arguments of the logarithm function. A simple derivation yields
\[
\mathrm{Tr}\big[T^{\dagger}T\,H_{\eta}(x;\varphi)\big]
=\sum_{k=0}^{M-1}\sum_{j=0}^{M-1}\left|\sum_{n=0}^{M-1-j}\langle k|T|n+j\rangle\,B_{n+j,n}\,\langle n|x\rangle\,e^{in\varphi}\right|^{2},
\tag{255}
\]
where
\[
B_{n+j,n}=\binom{n+j}{n}^{1/2}\eta^{n/2}\,(1-\eta)^{j/2},
\tag{256}
\]
and
\[
\langle n|x\rangle=\left(\frac{2}{\pi}\right)^{1/4}\frac{1}{\sqrt{2^{n}\,n!}}\,H_{n}(\sqrt{2}\,x)\,\exp(-x^{2})
\tag{257}
\]
are the eigenstates of the harmonic oscillator in the position representation—H_n(x) being the nth Hermite polynomial. The ML technique can be applied to reconstruct the density matrix in the Fock basis from Monte Carlo simulated homodyne statistics. Figure 19 depicts the matrix elements of the density operator as obtained for a coherent state and a squeezed vacuum. Remarkably, only 50,000 homodyne data have been used, for quantum efficiency η = 80%. We recall that in quantum homodyne tomography the statistical errors are known to grow rapidly with decreasing efficiency of the detector [29,80]. In contrast, the elements of the density matrix reconstructed using the ML approach remain bounded, as the whole matrix must satisfy the positivity and normalization constraints. This results in much smaller statistical errors. As a comparison, one can see that the same precision as in the reconstructions of Figure 19 could be achieved using 10⁷–10⁸ data samples with conventional quantum
FIGURE 19. Reconstruction of the density matrix of a single-mode radiation field by the ML method. The plot shows the matrix elements ρ_{n,m} for a coherent state (left) with ⟨a†a⟩ = 1 photon, and for a squeezed vacuum (right) with ⟨a†a⟩ = 0.5 photon. A sample of 50,000 simulated homodyne data for quantum efficiency η = 80% has been used. (From Ref. [23].)
tomography. On the other hand, in order to find the ML estimate numerically we need to set a priori the cut-off parameter for the photon number, and its value is limited by the increasing computation time. Another relevant example is the reconstruction of the quantum state of a two-mode field using the single-LO homodyning of Section V. Here, the full joint density matrix can be measured by scanning the quadratures of all possible linear combinations of modes. For two modes the measured quadrature operator is given by
\[
X(\theta,\psi_{0},\psi_{1})=\frac{1}{2}\big(a\,e^{-i\psi_{0}}\cos\theta+b\,e^{-i\psi_{1}}\sin\theta+\mathrm{h.c.}\big),
\tag{258}
\]
where (θ, ψ₀, ψ₁) ∈ S² × [0, 2π], S² being the Poincaré sphere and one phase ranging between 0 and 2π. In each run these parameters are chosen randomly. The POVM describing the measurement is given by the right-hand side of Equation (254), with X_φ replaced by X(θ, ψ₀, ψ₁). An experiment for the two orthogonal states |Ψ₁⟩ = (|00⟩ + |11⟩)/√2 and |Ψ₂⟩ = (|01⟩ + |10⟩)/√2 has been simulated, in order to reconstruct the density matrix in the two-mode Fock basis using the ML technique. The results are reported in Figure 20.
FIGURE 20. ML reconstruction of the density matrix ρ_{nm,ls} of a two-mode radiation field. On the left, the matrix elements obtained for the state |Ψ₁⟩ = (|00⟩ + |11⟩)/√2; on the right, for |Ψ₂⟩ = (|01⟩ + |10⟩)/√2. For |Ψ₁⟩ we used 100,000 simulated homodyne data and η = 80%; for |Ψ₂⟩ we used 20,000 data and η = 90%. (From Ref. [23].)
direction along which they perform a spin measurement. The obtained result is described by the joint projection operator (onto spin coherent states [115]) F_i = |Ω_i^A, Ω_i^B⟩⟨Ω_i^A, Ω_i^B|, where Ω_i^A and Ω_i^B are the vectors on the Bloch sphere corresponding to the outcomes of the ith run, and the indices A and B refer to the two particles. As in the previous examples, it is convenient to use an expression for the quantum expectation value Tr(T†T F_i) which is explicitly positive. The suitable form is
\[
\mathrm{Tr}\big(T^{\dagger}T\,F_{i}\big)=\sum_{\lambda}\big|\langle\lambda|T|\Omega_{i}^{A},\Omega_{i}^{B}\rangle\big|^{2},
\tag{259}
\]
where {|λ⟩} is an orthonormal basis in the Hilbert space of the two particles. The result of a simulated experiment with only 500 data points for the reconstruction of the density matrix of the singlet state is shown in Figure 21. Summarizing, the ML technique can be used to estimate the density matrix of a quantum system. With respect to conventional quantum tomography this method has the great advantage of needing much smaller experimental samples, making experiments with low data rates feasible, albeit at the price of truncating the Hilbert space dimension. We have shown that the method is general and that the algorithm has a solid methodological background, its reliability being confirmed in a number of Monte Carlo simulations. However, for increasing dimension of the Hilbert space the method has exponential complexity.
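Both ingredients above — the Cholesky-like parameterization ρ = T†T of Equation (251) and the manifestly positive expectation of Equation (259) — can be sketched for the two-qubit case as follows. This is our own illustration; in particular, the unit trace is imposed here by direct normalization rather than by the Lagrange multiplier of Equation (252).

```python
import numpy as np

rng = np.random.default_rng(4)
M = 4                                   # two spin-1/2 particles

# Lower-triangular T with real diagonal: M + 2*M*(M-1)/2 = M^2 real parameters.
T = np.tril(rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M)), k=-1)
T += np.diag(rng.normal(size=M))
T /= np.sqrt(np.trace(T.conj().T @ T))  # normalize so that Tr(T^dag T) = 1
rho = T.conj().T @ T                    # Hermitian, positive, unit trace

assert np.allclose(rho, rho.conj().T)
assert np.isclose(np.trace(rho).real, 1.0)
assert np.linalg.eigvalsh(rho).min() >= -1e-12

def bloch(theta, phi):
    """Spin-1/2 coherent state pointing along (theta, phi) on the Bloch sphere."""
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

# Joint projector F_i = |Omega_A, Omega_B><Omega_A, Omega_B| for one run.
v = np.kron(bloch(0.7, 1.1), bloch(2.0, -0.4))
F = np.outer(v, v.conj())

# Eq. (259): Tr(T^dag T F_i) = sum_lambda |<lambda|T|Omega_A, Omega_B>|^2 >= 0.
positive_form = np.sum(np.abs(T @ v)**2)
assert np.isclose(positive_form, np.trace(rho @ F).real)
```

The last identity is just Parseval's theorem in the computational basis: ‖T|v⟩‖² = ⟨v|T†T|v⟩, which is why the expression fed to the logarithm in Equation (252) can never go negative.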
FIGURE 21. ML reconstruction of the density matrix of a pair of spin-1/2 particles in the singlet state. The particles are shared by two parties. In each run, the parties select randomly and independently from each other a direction along which they perform a spin measurement. The matrix elements have been obtained from a sample of 500 simulated data. (From Ref. [23].)
QUANTUM TOMOGRAPHY
C. Gaussian State Estimation

In this section we review the ML method of Ref. [33] for determining the parameters of Gaussian states. Such states comprise the wide class of coherent, squeezed, and thermal states, all of them characterized by a Gaussian Wigner function. Apart from an irrelevant phase, we consider Wigner functions of the form

    W(x, y) = (2Δ²/π) exp{ −2Δ² [ e^{2r} (x − Re μ)² + e^{−2r} (y − Im μ)² ] },    (260)
and the ML technique with homodyne detection is applied to estimate the four real parameters Δ, r, Re μ, and Im μ. The four parameters provide the number of thermal, squeezing, and coherent-signal photons in the quantum state as follows:

    n_th = (1/2)(1/Δ² − 1),    n_sq = sinh² r,    n_coh = |μ|².    (261)
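As a minimal sketch, the mapping of Eq. (261) can be coded directly; here delta2 stands for Δ², in the convention of Eq. (260) where a coherent state has Δ² = 1:

```python
import numpy as np

def gaussian_photon_numbers(delta2, r, mu):
    """Thermal, squeezing, and coherent photon numbers (Eq. (261))
    of the Gaussian state (260); delta2 is the width parameter Delta^2."""
    n_th = 0.5 * (1.0 / delta2 - 1.0)
    n_sq = np.sinh(r) ** 2
    n_coh = abs(mu) ** 2
    return n_th, n_sq, n_coh

# A coherent state (delta2 = 1, r = 0) carries only "signal" photons:
print(gaussian_photon_numbers(1.0, 0.0, 1 + 1j))  # n_th = 0, n_sq = 0, n_coh = 2
```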
The density matrix corresponding to the Wigner function in Equation (260) is written

    ρ = D(μ) S(r) [1/(n_th + 1)] [n_th/(n_th + 1)]^{a†a} S†(r) D†(μ),    (262)
where S(r) = exp[r(a² − a†²)/2] and D(μ) = exp(μa† − μ*a) denote the squeezing and displacement operators, respectively. The theoretical homodyne probability distribution at phase φ with respect to the local oscillator can be evaluated using Equation (7), and is given by the Gaussian

    p(x, φ) = sqrt{ 2Δ² / [π (e^{−2r} cos²φ + e^{2r} sin²φ)] }
              × exp{ −2Δ² [x − Re(μ e^{−iφ})]² / (e^{−2r} cos²φ + e^{2r} sin²φ) }.    (263)
The log-likelihood function (246) for a set of N homodyne outcomes x_i at random phases φ_i then reads

    L = Σ_{i=1}^{N} { (1/2) log[ 2Δ² / (π (e^{−2r} cos²φ_i + e^{2r} sin²φ_i)) ]
        − 2Δ² [x_i − Re(μ e^{−iφ_i})]² / (e^{−2r} cos²φ_i + e^{2r} sin²φ_i) }.    (264)
The ML estimators Δ_ml, r_ml, Re μ_ml, and Im μ_ml are found by maximizing Equation (264) with respect to Δ, r, Re μ, and Im μ. In order to evaluate the state reconstruction globally, one considers the normalized overlap O between the theoretical and the estimated state,

    O = Tr[ρ ρ_ml] / sqrt{ Tr[ρ²] Tr[ρ_ml²] }.    (265)
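A hedged numerical sketch of this estimation (assuming ideal detection, i.e., unit quantum efficiency, and arbitrarily chosen true parameters) simulates homodyne outcomes from the Gaussian (263) and maximizes the likelihood (264); SciPy's Nelder-Mead optimizer here stands in for whatever maximization routine is used in Ref. [33]:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Hypothetical true parameters: Delta^2, r, and mu = Re mu + i Im mu
d2_t, r_t, mu_t = 0.8, 0.4, 0.5 + 0.3j

def sigma2(phi, d2, r):
    """Variance of the homodyne outcome at phase phi, from Eq. (263)."""
    return (np.exp(-2*r)*np.cos(phi)**2 + np.exp(2*r)*np.sin(phi)**2) / (4*d2)

# Simulate N homodyne outcomes at random phases
N = 20000
phi = rng.uniform(0, np.pi, N)
mean = np.real(mu_t * np.exp(-1j * phi))       # Re(mu e^{-i phi})
x = rng.normal(mean, np.sqrt(sigma2(phi, d2_t, r_t)))

def neg_log_lik(p):
    d2, r, re, im = p
    if d2 <= 0:
        return np.inf
    s2 = sigma2(phi, d2, r)
    m = re*np.cos(phi) + im*np.sin(phi)
    return np.sum(0.5*np.log(2*np.pi*s2) + (x - m)**2 / (2*s2))

res = minimize(neg_log_lik, x0=[0.5, 0.0, 0.0, 0.0], method='Nelder-Mead',
               options={'maxiter': 4000, 'maxfev': 4000})
print(np.round(res.x, 3))   # approaches (0.8, 0.4, 0.5, 0.3)
```

With a few times 10⁴ samples the four estimates typically come out within about a percent of the true values, consistent with the precision quoted below for N = 50,000.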
Notice that O = 1 iff ρ = ρ_ml. Through Monte Carlo simulations, one always finds a value of O around unity, typically with statistical fluctuations in the third digit, for a number of data samples N = 50,000, quantum efficiency at the homodyne detectors η = 80%, and state parameters in the ranges n_th < 3, n_coh < 5, and n_sq < 3. Even with such a small number of data samples, the quality of the state reconstruction is so good that other physical quantities, theoretically evaluated from the experimental values of Δ_ml, r_ml, Re μ_ml, and Im μ_ml, are inferred very precisely. For example, in Ref. [33] the photon number probability of a squeezed thermal state has been evaluated, which is given by the integral

    ⟨n|ρ|n⟩ = ∫_0^{2π} (dφ/2π) [C(φ, n_th, r) − 1]^n / C(φ, n_th, r)^{n+1},    (266)
with C(φ, n_th, r) = (n_th + 1/2)(e^{−2r} sin²φ + e^{2r} cos²φ) + 1/2. The comparison of the theoretical and the experimental results for a state with n_th = 0.1 and n_sq = 3 is reported in Figure 22. The statistical error of the reconstructed number probability affects the third decimal digit, and is not visible on the scale of the plot.
The estimation of the parameters of Gaussian Wigner functions through the ML method also allows one to estimate the parameters of quadratic Hamiltonians of the generic form

    H = α a + α* a† + φ a†a + (1/2) ξ a² + (1/2) ξ* a†².    (267)
FIGURE 22. Photon number probability of a squeezed thermal state (thermal photons n_th = 0.1, squeezing photons n_sq = 3). The probabilities reconstructed by means of the maximum likelihood method and homodyne detection (gray histogram) are compared with the theoretical values (black histogram). Number of data samples N = 50,000, quantum efficiency η = 80%. The statistical error affects the third decimal digit, and is not visible on the scale of the plot. (From Ref. [33].)
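The integral (266) is straightforward to check numerically. The following sketch evaluates it by an average over a uniform phase grid for the state of Figure 22, and verifies that the probabilities sum to unity; the truncation at n = 200 and the grid size are arbitrary choices:

```python
import numpy as np

def C(phi, n_th, r):
    """C(phi, n_th, r) entering Eq. (266)."""
    return (n_th + 0.5) * (np.exp(-2*r)*np.sin(phi)**2
                           + np.exp(2*r)*np.cos(phi)**2) + 0.5

def photon_prob(n, n_th, r, npts=4000):
    """<n|rho|n> of Eq. (266), integrated over phi in [0, 2*pi).
    The ratio form ((C-1)/C)^n / C avoids overflow at large n."""
    phi = np.linspace(0.0, 2*np.pi, npts, endpoint=False)
    c = C(phi, n_th, r)
    return np.mean(((c - 1.0)/c)**n / c)

n_th, r = 0.1, np.arcsinh(np.sqrt(3.0))        # n_sq = sinh^2 r = 3
p = np.array([photon_prob(n, n_th, r) for n in range(200)])
print(p[:4].round(4), p.sum().round(4))        # probabilities sum to ~1
```

Two sanity checks are built in: the total probability must be unity (the geometric series over n resums to ∫ dφ/2π = 1), and the mean photon number must equal n_th + (2n_th + 1) sinh² r = 3.7 for these parameters.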
In fact, the unitary evolution operator U = e^{−iHt} preserves the Gaussian form of an input state with Gaussian Wigner function. In other words, one can use a known Gaussian state to probe and characterize an optical device described by a Hamiltonian as in Equation (267). Assuming t = 1 without loss of generality, the Heisenberg evolution of the radiation mode a is given by

    U† a U = μ a + ν a† + κ,    (268)

with

    μ = cos(Ω) − i (φ/Ω) sin(Ω),
    ν = −i (ξ*/Ω) sin(Ω),
    κ = [(α*φ − αξ*)/Ω²] [cos(Ω) − 1] − i (α*/Ω) sin(Ω),    (269)

where Ω = sqrt(φ² − |ξ|²).
For an input state with known Wigner function W(α, α*), the corresponding output Wigner function is

    W_{UρU†}(α, α*) = W[ μ*(α − κ) − ν(α* − κ*), μ(α* − κ*) − ν*(α − κ) ].    (270)
Hence, by estimating the parameters μ, ν, κ and inverting Equation (269), one obtains the ML values of α, φ, and ξ for the Hamiltonian in Equation (267). In practical applications the present example can be used to estimate the gain of a phase-sensitive amplifier or, equivalently, a squeezing parameter.
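A small sketch of Eq. (269) (with the shorthand Ω = √(φ² − |ξ|²), assuming φ² > |ξ|²) also lets one verify that the transformation preserves the canonical commutator, |μ|² − |ν|² = 1:

```python
import numpy as np

def evolution_coeffs(alpha, phi, xi):
    """Coefficients (mu, nu, kappa) of U^dag a U = mu a + nu a^dag + kappa,
    Eq. (269), for t = 1. Sketch: assumes phi**2 > abs(xi)**2."""
    om = np.sqrt(phi**2 - abs(xi)**2)
    mu = np.cos(om) - 1j * (phi / om) * np.sin(om)
    nu = -1j * (np.conj(xi) / om) * np.sin(om)
    kappa = ((np.conj(alpha)*phi - alpha*np.conj(xi)) / om**2
             * (np.cos(om) - 1.0)
             - 1j * np.conj(alpha) * np.sin(om) / om)
    return mu, nu, kappa

mu, nu, kappa = evolution_coeffs(0.3 + 0.2j, 1.5, 0.4 + 0.1j)
print(abs(mu)**2 - abs(nu)**2)   # canonical commutator preserved: = 1
```

In the free-evolution limit α = ξ = 0 the coefficients reduce to μ = e^{−iφ}, ν = κ = 0, a useful consistency check on the signs.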
IX. CLASSICAL IMAGING BY QUANTUM TOMOGRAPHY

As we showed in Section II, the development of quantum tomography has its origin in the inadequacy of classical imaging procedures to face the quantum problem of Wigner function reconstruction. In this section we briefly illustrate how to go back to classical imaging and profitably use quantum tomography as a tool for image reconstruction and compression: this is the method of fictitious photons tomography of Ref. [34]. The problem of tomographic imaging is to recover a mass distribution m(x, y) in a two-dimensional slab from a finite collection of one-dimensional projections. The situation is schematically sketched in Figure 23, where m(x, y) describes two circular holes in a uniform background. The tomographic machine, say X-ray equipment, collects many stripe photos of the sample from various directions φ, and then numerically performs a mathematical transformation, the so-called inverse Radon transform [116], in order to reconstruct m(x, y) from its radial profiles at different
FIGURE 23. (a) Tomography of a simple object: analytical transmission profiles are reported for φ = 0, π/2. (b) The same object as in (a), but for very weak signals: in this case the transmission profiles are given in terms of random points on a photographic plate (here obtained from a Monte Carlo simulation). (From Ref. [34].)
values of φ. The problem of interest to us arises when the radial profiles are not well-resolved digitized functions, but actually represent the density distribution of random points, as if in our X-ray machine the beam were so weak that the radial photos are just collections of many small spots, each from a single X-ray photon (this situation is sketched in Figure 23(b)). Obviously this case can be reduced to the previous one by counting all points falling in a predetermined one-dimensional mesh, and giving the radial profiles in the form of histograms (this is what actually happens in a real machine, using arrays of photodetectors). However, we want to use the whole available information from each "event", i.e., the exact one-dimensional location of each spot, in a way that is independent of any predetermined mesh. In practice, this situation occurs when the signal is so weak and the machine resolution so high (i.e., the mesh step so tiny) that at most one photon can be collected in each channel. As we will see, this low-signal/high-resolution case naturally brings the imaging problem into the domain of quantum tomography. Images are identified with Wigner functions, so as to obtain a description in terms of density matrices. These are still trace-class matrices (corresponding to "normalizable" images), but are no longer positive definite, because an "image" generally is not a genuine Wigner function and violates the Heisenberg relations on the complex plane (the phase space of a single mode of radiation). Hence, such density matrices are unphysical: they are just a mathematical tool for imaging. This is the reason why the method has been named fictitious photons tomography [34]. As we will see in the following, the image resolution improves with increasing rank of the density matrix, and in this way the present method also provides a new algorithm for image compression, suited to angular image scanning.
A. From Classical to Quantum Imaging

We adopt the complex notation, with α = x + iy representing a point in the image plane. In this way α and α* are considered as independent variables, and the two-dimensional image, here denoted by the same symbol W(α, α*) used for the Wigner function, is just a generic real function of the point α in the plane. In the most general situation W(α, α*) is defined on the whole complex plane, where it is normalized to some finite constant, and it is bounded from both below and above, its range representing the darkness nuance. For X-ray tomography W(α, α*) roughly represents the absorption coefficient as a function of the point α. We consider a linear absorption regime, i.e., the image extension is negligible with respect to the radiation
absorption length in the medium. At the same time we neglect any diffraction effect. As shown in Section III.B, the customary imaging technique is based on the inverse Radon transform. A tomography of a two-dimensional image W(α, α*) is a collection of one-dimensional projections p(x, φ) at different values of the observation angle φ. We rewrite here the definition of the Radon transform of W(α, α*):

    p(x, φ) = ∫_{−∞}^{+∞} (dy/π) W( (x + iy) e^{iφ}, (x − iy) e^{−iφ} ).    (271)
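As a quick numerical check of Eq. (271), the following sketch evaluates the projection integral by a Riemann sum for a uniform disc of unit radius centered at the origin, for which p(x, φ) = (2/π)√(1 − x²) independently of φ:

```python
import numpy as np

def radon_profile(W, x, phi, ylim=3.0, npts=20001):
    """p(x, phi) of Eq. (271), evaluated by a Riemann sum over y."""
    y = np.linspace(-ylim, ylim, npts)
    alpha = (x + 1j*y) * np.exp(1j*phi)
    return np.sum(W(alpha)) * (y[1] - y[0]) / np.pi

# "Image" W: indicator function of the unit disc
disc = lambda a: (np.abs(a) <= 1.0).astype(float)

# The projections are phi-independent (isotropy, cf. Table 3):
for phi in (0.0, np.pi/3):
    print(phi, radon_profile(disc, 0.5, phi))   # ~ (2/pi)*sqrt(0.75)
```

The φ-independence of the result illustrates the first row of Table 3 below: an isotropic image has projections p(x, φ) ≡ p(x).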
In Equation (271) x is the current coordinate along the direction orthogonal to the projection and y is the coordinate along the projection direction. The situation is depicted in Figure 23, where W(α, α*) is plotted along with its profiles p(x, φ) for φ = 0, π/2, for a couple of identical circular holes symmetrically disposed with respect to the origin. The reconstruction of the image W(α, α*) from its projections p(x, φ), also called "back projection," is given by the inverse Radon transform which, following the derivation in Section III.B, leads to the filtering procedure
    W(α, α*) = ∫_0^{π} (dφ/2π) P ∫_{−∞}^{+∞} dx [∂p(x, φ)/∂x] / (x_φ − x),    (272)
where P denotes the Cauchy principal value and x_φ = Re(α e^{−iφ}). Equation (272) is commonly used in conventional tomographic imaging (see, for example, [117]). Let us now critically consider the above procedure in the case of very weak signals, namely when p(x, φ) just represents the probability distribution of random X-ray spots on a fine-mesh multichannel detector: this situation is sketched in Figure 23(b). From Equation (272) one can recover W(α, α*) only when the analytical form of p(x, φ) is known. But the experimental outcomes of each projection actually are random data distributed according to p(x, φ), whereas in order to recover W(α, α*) from Equation (272) one has to evaluate the first-order derivatives of p(x, φ). The need for an analytical form of the projections p(x, φ) requires a filtering procedure on the data, usually obtained by splining them before applying Equation (272). This procedure leads to approximate image reconstructions, and the choice of any smoothing parameter unavoidably affects the statistics of errors in a systematic way. In the following we show how quantum tomography can be used for conventional imaging in the presence
of weak signals, providing both ideally controlled resolution and reliable error statistics. The basic formula we will use is the expansion of the Wigner function in the number representation of Equations (230) and (231). In practice, the Hilbert space has to be truncated at some finite dimension d_H, and this sets the resolution of the reconstruction of W(α, α*). However, as we will show, this resolution can be chosen at will, independently of the number of experimental data. As previously noticed, in general an image does not correspond to the Wigner function of a physical state, since the Heisenberg relations unavoidably produce only smooth Wigner functions, whereas a conventional image can have very sharp edges. However, if one allows the density matrix to be no longer positive definite (but still trace class), a correspondence with images is obtained which holds in general. In this way every image is stored in a trace-class matrix ρ_{n,m} via quantum tomography, and a convenient truncation dimension d_H of the matrix can be chosen. The connection between images and matrices is the main point of this approach: the information needed to reconstruct the image is stored in a d_H × d_H matrix. For suitably chosen dimension d_H the present method can also provide a procedure for image compression. Notice that the correspondence between images and trace-class matrices retains some symmetries of the image, which manifest themselves as algebraic properties of the matrix ρ_{n,m}. For example, an isotropic image (like a uniform circle centered at the origin) is stored in a diagonal matrix. Other symmetries are given in Table 3.
The truncated Hilbert space dimension d_H sets the imaging resolution. The kind of resolution can be understood by studying the behavior of the kernels R[|n + d⟩⟨n|](x, φ) of Equation (100), which are averaged over the experimental data in order to obtain the matrix elements ρ_{n,n+d}. Outside a region that is almost independent of n and d, all functions R[|n + d⟩⟨n|](x, φ)
TABLE 3
GEOMETRICAL SYMMETRIES OF AN IMAGE, ANALYTICAL PROPERTIES OF THE PROJECTIONS, AND ALGEBRAIC PROPERTIES OF THE CORRESPONDING MATRIX (FROM REF. [34])

Symmetry                        p(x, φ)                     ρ_{n,m}
Isotropy                        p(x, φ) ≡ p(x)              ρ_{n,m} = δ_{n,m} ρ_{n,m}
X-axis mirror                   p(x, π − φ) = p(−x, φ)      ρ_{n,m} ∈ ℝ
Y-axis mirror                   p(x, π − φ) = p(x, φ)       i^{n−m} ρ_{n,m} ∈ ℝ
Inversion through the origin    p(−x, φ) = p(x, φ)          ρ_{n,n+2d+1} = 0
FIGURE 24. Tomographic reconstruction of the font "a" for increasing dimension of the truncated matrix, d_H = 2, 4, 8, 16, 32, 48. The plot is obtained by averaging the kernel function R[|n + d⟩⟨n|](x, φ) of Equation (100) with assigned analytic transmission profiles p(x, φ), and then using Equations (230) and (231). (From Ref. [34].)
decrease exponentially, whereas inside this region they oscillate, with a number of oscillations increasing linearly with 2n + d. This behavior produces the effects illustrated in Figure 24, where we report the tomographic reconstruction of the font "a" for increasing dimension d_H. The plot is obtained by numerically integrating the kernel functions with given analytic transmission profiles p(x, φ). As we see from Figure 24, both the radial and the angular resolution improve with d_H, making the details of the image sharper and sharper, already at the relatively small truncation d_H = 48. A quantitative measure of the precision of the tomographic reconstruction can be given in terms of the distance D between the true and the
FIGURE 25. Convergence of the Hilbert distance D of Equation (273) versus the dimensional truncation d_H of the Hilbert space. Here the image is a uniform circle of unit radius centered at the origin. The reconstructed matrix elements are obtained as in Figure 24, whereas the exact matrix elements are provided by Equation (274). (From Ref. [34].)
reconstructed image which, in turn, coincides with the Hilbert distance D between the corresponding density matrices. One has

    D = ∫ d²α |ΔW(α, α*)|² = Tr(Δρ)²
      = Σ_{n=0}^{∞} (Δρ_{n,n})² + 2 Σ_{n=0}^{∞} Σ_{d=1}^{∞} |Δρ_{n,n+d}|²,    (273)
where Δ[…] = […]_true − […]_reconstructed. The convergence of D versus d_H is shown in Figure 25 for a solid circle of unit radius centered at the origin. In this case the density matrix has only diagonal elements, in accordance with Table 3. These are given by

    ρ_{n,n} = 2 Σ_{ν=0}^{n} (−2)^ν \binom{n}{ν} Φ(1 + ν, 2; −2R²),    (274)

where Φ(α, β; z) denotes the confluent hypergeometric function of argument z and parameters α and β. So far we have analyzed the method only on the basis of given analytic profiles p(x, φ). As already said, however, the method is particularly advantageous in the weak-signal/high-resolution situation, where
FIGURE 26. Monte Carlo simulation of an experimental tomographic reconstruction of the font "a." The truncation dimension is fixed at d_H = 48, and the number of scanning phases is F = 100. The plots correspond to 10³, 10⁴, 10⁵, and 10⁶ data per phase, respectively. (From Ref. [34].)
the imaging can be achieved directly by averaging the kernel functions over the data. In this case the procedure allows one to exploit the whole available experimental resolution, while the image resolution is set at will. In Figure 26 we report a Monte Carlo simulation of an experimental tomographic reconstruction of the font "a" for an increasing number of data. All plots are obtained at the maximum available dimension d_H = 48, using F = 100 scanning phases. The situation occurring for a small number of data is given in the first plot, where the highly resolved image still exhibits the natural statistical fluctuations due to the limited sample. For larger samples the image emerges more sharply from the random background, and it is clearly recognizable at 10⁶ data. The method is also efficient from the computational point of view, as the time needed for image reconstruction is quadratic in the number of elements of the density matrix and linear in the number of experimental data. Needless to say, imaging by quantum homodyne tomography is at a very early stage, and further investigation is in order.
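As a closing numerical aside, the second equality of Eq. (273), i.e., the expansion of Tr[(Δρ)²] in terms of the diagonal and off-diagonal elements of the Hermitian matrix Δρ, can be verified on random matrices:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two random Hermitian "density-like" matrices and their difference
A = rng.normal(size=(6, 6)) + 1j * rng.normal(size=(6, 6))
B = rng.normal(size=(6, 6)) + 1j * rng.normal(size=(6, 6))
drho = (A + A.conj().T) / 2 - (B + B.conj().T) / 2

# Left-hand side: Tr[(Delta rho)^2]
lhs = np.trace(drho @ drho).real

# Right-hand side: diagonal terms plus twice the upper off-diagonals (Eq. (273))
rhs = np.sum(np.diag(drho).real ** 2)
for d in range(1, 6):
    rhs += 2 * np.sum(np.abs(np.diag(drho, d)) ** 2)

print(lhs, rhs)   # the two values agree
```

The factor of 2 counts each off-diagonal pair (n, n+d) and (n+d, n) once, using hermiticity; this is what makes the truncation error of D directly readable off the stored matrix elements.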
ACKNOWLEDGMENTS

The writing of this chapter has been cosponsored by the Italian Ministero dell'Istruzione, dell'Università e della Ricerca (MIUR) under the Cofinanziamento 2002 Entanglement assisted high precision measurements, by the Istituto Nazionale di Fisica della Materia under the project PRA-2002-CLON, and by the European Community programs ATESIT (Contract No. IST-2000-29681) and EQUIP (Contract No. IST-1999-11053). G. M. D. acknowledges partial support by the Department of Defense Multidisciplinary University Research Initiative (MURI) program administered by the Army Research Office under Grant No. DAAD19-00-1-0177. M. G. A. P. is a research fellow at Collegio Alessandro Volta.
REFERENCES

1. Heisenberg, W. (1927). Zeit. für Physik 43, 172; Heisenberg, W. (1930). The Physical Principles of the Quantum Theory. Dover, NY: Univ. Chicago Press.
2. von Neumann, J. (1955). Mathematical Foundations of Quantum Mechanics. Princeton, NJ: Princeton Univ. Press.
3. Wootters, W. K., and Zurek, W. H. (1982). Nature 299, 802; Yuen, H. P. (1986). Phys. Lett. A 113, 405.
4. D'Ariano, G. M., and Yuen, H. P. (1996). Phys. Rev. Lett. 76, 2832.
5. Fano, U. (1957). Rev. Mod. Phys. 29, 74.
6. Smithey, D. T., Beck, M., Raymer, M. G., and Faridani, A. (1993). Phys. Rev. Lett. 70, 1244; Raymer, M. G., Beck, M., and McAlister, D. F. (1994). Phys. Rev. Lett. 72, 1137; Smithey, D. T., Beck, M., Cooper, J., and Raymer, M. G. (1993). Phys. Rev. A 48, 3159.
7. Vogel, K., and Risken, H. (1989). Phys. Rev. A 40, 2847.
8. D'Ariano, G. M., Macchiavello, C., and Paris, M. G. A. (1994). Phys. Rev. A 50, 4298.
9. D'Ariano, G. M., Leonhardt, U., and Paul, H. (1995). Phys. Rev. A 52, R1801.
10. Munroe, M., Boggavarapu, D., Anderson, M. E., and Raymer, M. G. (1995). Phys. Rev. A 52, R924.
11. Schiller, S., Breitenbach, G., Pereira, S. F., Müller, T., and Mlynek, J. (1996). Phys. Rev. Lett. 77, 2933; Breitenbach, G., Schiller, S., and Mlynek, J. (1997). Nature 387, 471.
12. Janicke, U., and Wilkens, M. (1995). J. Mod. Opt. 42, 2183; Wallentowitz, S., and Vogel, W. (1995). Phys. Rev. Lett. 75, 2932; Kienle, S. H., Freiberger, M., Schleich, W. P., and Raymer, M. G. (1997). In Experimental Metaphysics: Quantum Mechanical Studies for Abner Shimony, edited by S. Cohen et al. Lancaster: Kluwer, p. 121.
13. Dunn, T. J., Walmsley, I. A., and Mukamel, S. (1995). Phys. Rev. Lett. 74, 884.
14. Kurtsiefer, C., Pfau, T., and Mlynek, J. (1997). Nature 386, 150.
15. Leibfried, D., Meekhof, D. M., King, B. E., Monroe, C., Itano, W. M., and Wineland, D. J. (1996). Phys. Rev. Lett. 77, 4281.
16. D'Ariano, G. M. (1997). In Quantum Communication, Computing, and Measurement, edited by O. Hirota, A. S. Holevo, and C. M. Caves. New York and London: Plenum, p. 253.
17. D'Ariano, G. M., Kumar, P., and Sacchi, M. F. (2000). Phys. Rev. A 61, 013806.
18. D’Ariano, G. M. (2000). Quantum Communication, Computing, and Measurement, edited by P. Kumar, G. M. D’Ariano, and O. Hirota. New York and London: Kluwer Academic/ Plenum, pp. 137. 19. Paini, M. preprint quant-ph/0002078. 20. D’Ariano, G. M. (2000). Phys. Lett. A 268, 151. 21. Cassinelli, G., D’Ariano, G. M., De Vito, E., and Levrero, A. (2000). J. Math. Phys. 41, 7940. 22. D’Ariano, G. M., and Paris, M. G. A. (1998). Acta Phys. Slov. 48, 191; D’Ariano, G. M., and Paris, M. G. A. (1999). Phys. Rev. A 60, 518. 23. Banaszek, K., D’Ariano, G. M., Paris, M. G. A., and Sacchi, M. F. (2000). Phys. Rev. A 61, R010304. 24. D’Ariano, G. M., Maccone, L., and Paris, M. G. A. (2001). Phys. Lett. A 276, 25; (2001) J. Phys. A 34, 93. 25. D’Ariano, G. M., and Lo Presti, P. (2001). Phys. Rev. Lett. 86, 4195. 26. Wigner, E. P. (1932). Phys. Rev. 40, 749. 27. Cahill, K. E., and Glauber, R. J. (1969). Phys. Rev. 177, 1857, 1882. 28. D’Ariano, G. M., Maccone, L., and Paini, M. (2003). J. Opt. B 5, 77. 29. D’Ariano, G. M., and Paris, M. G. A. (1997). Phys. Lett. A 233, 49. 30. D’Ariano, G. M., Kumar, P., and Sacchi, M. F. (1999). Phys. Rev. A 59, 826. 31. D’Ariano, G. M., Kumar, P., Macchiavello, C., Maccone, L., and Sterpi, N. (1999). Phys. Rev. Lett. 83, 2490. 32. D’Ariano, G. M., De Laurentis, M., Paris, M. G. A., Porzio, A., and Solimeno, S. (2002). J. Opt. B 4, 127. 33. D’Ariano, G. M., Paris, M. G. A., and Sacchi, M. F. (2000). Phys. Rev. A 62, 023815; (2001). Phys. Rev. A 64, 019903(E). 34. D’Ariano, G.M., Macchiavello, C., and Paris, M.G.A. (1996). Opt. Comm. 129, 6. 35. D’Ariano, G. M. D. and Sacchi, M. F., (1997). Nuovo Cimento 112B, 881. 36. Kim, Y. S., and Zachary, W. W. (1986). The Physics of Phase Space. Berlin: Springer. 37. Gardiner, C. W. (1991). Quantum Noise. Berlin: Springer-Verlag. 38. Weyl, H. (1950). The Theory of Groups and Quantum Mechanics. New York: Dover. 39. Cahill, K. E. (1965). Phys. Rev. 138, B1566. 40. D’Ariano, G. M., Fortunato, M., and Tombesi, P. 
(1995). Nuovo Cimento 110B, 1127. 41. Lee, C. T. (1991) Phys. Rev. A 44, R2775; (1995). Phys. Rev. A 52, 3374. 42. Kelley, P. L., and Kleiner, W. H. (1994). Phys. Rev. 136, 316. 43. Yuen, H. P., and Chan, V. Y. S. (1983). Opt. Lett. 8, 177. 44. Abbas, G. L., Chan, V. W. S., Yee, S. T. (1983). Opt. Lett. 8, 419 (1985). IEEE J. Light. Tech. LT-3, 1110. 45. D’Ariano, G. M. (1992). Nuovo Cimento 107B, 643. 46. D’Ariano, G. M. (1997). Quantum estimation theory and optical detection, in Quantum Optics and the Spectroscopy of Solids, edited by T. Hakiog˘lu and A. S. Shumovsky, Dordrecht: Kluwer, pp. 139–174. 47. Wodkiewicz, K., and Eberly, J. H. (1985). JOSA B 2, 458. 48. D’Ariano, G. M. (1992). Int. J. Mod. Phys. B 6, 1291. 49. Yuen, H. P., and Shapiro, J. H. (1978). IEEE Trans. Inf. Theory 24, 657. (1979). 25, 179. 50. Yuen, H. P., and Shapiro, J. H. (1980). IEEE Trans. Inf. Theory 26, 78. 51. D’Ariano, G. M., Lo Presti, P., and Sacchi, M. F. (2000). Phys. Lett. A 272, 32. 52. D’Ariano, G. M., and Sacchi, M. F. (1995). Phys. Rev. A 52, R4309. 53. Helstrom, C. W. (1976). Quantum Detection and Estimation Theory. New York: Academic Press. 54. Arthurs, E., and Kelly, J. L., Jr. (1965). Bell System Tech. J. 44, 725.
55. Yuen, H. P. (1982). Phys. Lett. 91A, 101.
56. Arthurs, E., and Goodman, M. S. (1988). Phys. Rev. Lett. 60, 2447.
57. D'Ariano, G. M., Macchiavello, C., and Paris, M. G. A. (1995). Phys. Lett. A 198, 286.
58. Paris, M. G. A. (1996). Phys. Rev. A 53, 2658.
59. Baltin, R. (1983). J. Phys. A 16, 2721; (1984). Phys. Lett. A 102, 332.
60. Paris, M. G. A. (1996). Opt. Comm. 124, 277.
61. Shapiro, J. H., and Wagner, S. S. (1984). IEEE J. Quant. Electron. QE-20, 803.
62. Shapiro, J. H. (1985). IEEE J. Quant. Electron. QE-21, 237.
63. Walker, N. G. (1987). J. Mod. Opt. 34, 15.
64. Lai, Y., and Haus, H. A. (1989). Quantum Opt. 1, 99.
65. Paris, M. G. A., Chizhov, A., and Steuernagel, O. (1997). Opt. Comm. 134, 117.
66. Zucchetti, A., Vogel, W., and Welsch, D.-G. (1996). Phys. Rev. A 54, 856.
67. D'Ariano, G. M. (2002). Phys. Lett. A 300, 1.
68. Opatrny, T., and Welsch, D.-G. (1999). Prog. Opt. XXXIX, 63.
69. Weigert, S. (2000). Phys. Rev. Lett. 84, 802.
70. Kim, C., and Kumar, P. (1994). Phys. Rev. Lett. 73, 1605.
71. Richter, Th. (1996). Phys. Lett. A 211, 327.
72. Leonhardt, U., Munroe, M., Kiss, T., Raymer, M. G., and Richter, Th. (1996). Opt. Comm. 127, 144.
73. Kiss, T., Herzog, U., and Leonhardt, U. (1995). Phys. Rev. A 52, 2433; D'Ariano, G. M., and Macchiavello, C. (1998). Phys. Rev. A 57, 3131; Kiss, T., Herzog, U., and Leonhardt, U. (1998). Phys. Rev. A 57, 3131.
74. D'Ariano, G. M., Mancini, S., Manko, V. I., and Tombesi, P. (1996). Quantum Semiclass. Opt. 8, 1017.
75. Banaszek, K., and Wodkievicz, K. (1996). Phys. Rev. Lett. 76, 4344.
76. Opatrny, T., and Welsch, D.-G. (1997). Phys. Rev. A 55, 1462.
77. Murnaghan, F. D. (1938). The Theory of Group Representation. Baltimore: Johns Hopkins Press, p. 216.
78. Leonhardt, U., and Raymer, M. G. (1996). Phys. Rev. Lett. 76, 1985.
79. D'Ariano, G. M. (1999). Acta Phys. Slov. 49, 513.
80. D'Ariano, G. M., Macchiavello, C., and Sterpi, N. (1997). Quantum Semiclass. Opt. 9, 929.
81. Gradshteyn, I. S., and Ryzhik, I. M. (1980). Table of Integrals, Series, and Products. New York: Academic Press.
82. Richter, Th. (1996). Phys. Rev. A 53, 1197.
83. Orlowsky, A., and Wünsche, A. (1993). Phys. Rev. A 48, 4617.
84. Holevo, A. S. (1982). Probabilistic and Statistical Aspects of Quantum Theory. Amsterdam: North-Holland.
85. Shapiro, J. H., and Shakeel, A. (1997). JOSA B 14, 232.
86. Vasilyev, M., Choi, S.-K., Kumar, P., and D'Ariano, G. M. (2000). Phys. Rev. Lett. 84, 2354.
87. D'Ariano, G. M., Vasilyev, M., and Kumar, P. (1998). Phys. Rev. A 58, 636.
88. Mandel, L. (1958). Proc. Phys. Soc. 72, 1037.
89. Hong, C. K., and Mandel, L. (1985). Phys. Rev. Lett. 54, 323; (1985). Phys. Rev. A 32, 974.
90. Hillery, M. (1987). Phys. Rev. A 36, 3796.
91. Special issues on squeezed states: J. Opt. Soc. Am. B 4 (1987); J. Mod. Opt. 34 (1987).
92. Tombesi, P., and Pike, E. R., eds. (1989). Squeezed and Non-classical Light. New York: Plenum.
93. Agarwal, G. S., and Tara, K. (1992). Phys. Rev. A 46, 485.
94. Klyshko, D. N. (1994). Phys. Usp. 37, 1097; (1996). Phys. Usp. 39, 573; (1996). Phys. Lett. A 213, 7.
95. De Martini, F., et al., eds. (1996). Quantum Interferometry. Weinheim: VCH.
96. Arvind, Mukunda, N., and Simon, R. (1998). J. Phys. A 31, 565.
97. Schleich, W., and Wheeler, J. A. (1987). Nature 326, 574.
98. Yuen, H. P. (1976). Phys. Rev. A 13, 2226.
99. Yamamoto, Y., and Haus, H. A. (1986). Rev. Mod. Phys. 58, 1001.
100. Bandilla, A., Drobný, G., and Jex, I. (1995). Phys. Rev. Lett. 75, 4019; (1996). Phys. Rev. A 53, 507.
101. Lüders, G. (1951). Ann. Physik 8, 322.
102. See, for example, Wigner, E. P. (1963). Am. J. Phys. 31, 6; Imoto, N., Ueda, M., and Ogawa, T. (1990). Phys. Rev. A 41, 4127.
103. Ozawa, M. (1987). Ann. Phys. 259, 121, and references therein.
104. Beck, M. (2000). Phys. Rev. Lett. 84, 5748.
105. Chuang, I. L., and Nielsen, M. A. (1997). J. Mod. Opt. 44, 2455.
106. D'Ariano, G. M., and Maccone, L. (1998). Phys. Rev. Lett. 80, 5465.
107. Sacchi, M. F. (2001). Phys. Rev. A 63, 054104.
108. De Martini, F., D'Ariano, G. M., Mazzei, A., and Ricci, M. (2003). Fortschr. Phys. 51, 342; (2003). Phys. Rev. A 67, 062307.
109. Kraus, K. (1983). States, Effects, and Operations. Berlin: Springer-Verlag.
110. Hradil, Z. (1997). Phys. Rev. A 55, R1561; in the context of phase measurement, see Braunstein, S. L., Lane, A. S., and Caves, C. M. (1992). Phys. Rev. Lett. 69, 2153.
111. Banaszek, K. (1998). Phys. Rev. A 57, 5013.
112. Cramér, H. (1946). Mathematical Methods of Statistics. Princeton, NJ: Princeton Univ. Press.
113. Householder, A. S. (1964). The Theory of Matrices in Numerical Analysis. New York: Blaisdell, Sec. 5.2.
114. Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. (1992). Numerical Recipes in Fortran. Cambridge: Cambridge Univ. Press, Sec. 10.4.
115. Arecchi, F. T., Courtens, E., Gilmore, R., and Thomas, H. (1972). Phys. Rev. A 6, 2211.
116. Natterer, F. (1986). The Mathematics of Computerized Tomography. New York: Wiley.
117. Mansfield, P., and Morris, P. G. (1982). NMR Imaging in Biomedicine. New York: Academic Press.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 128
Scanning Low-Energy Electron Microscopy

ILONA MÜLLEROVÁ and LUDĚK FRANK
Institute of Scientific Instruments AS CR, Královopolská 147, CZ-61264 Brno, Czech Republic
I. Introduction  310
II. Motivations to Lower the Electron Energy  314
   A. Extensions to Conventional Modes of Operation  314
   B. New Opportunities  316
   C. Issues Inherent to Slow Electron Beams  317
III. Interaction of Slow Electrons with Solids  319
   A. Elastic Scattering  320
      1. Scattering on Nuclei  320
      2. Reflection on Energy Gaps  323
   B. Inelastic Scattering  324
      1. Scattering on Electrons  324
      2. Scattering on Atoms  328
   C. Penetration of Electrons  331
   D. Heating and Damage of the Specimen  334
   E. Specimen Charging  336
   F. Tools for Simulation of Electron Scattering  340
IV. Emission of Electrons  343
   A. Electron Backscattering  345
   B. Crystallinity Effects  350
   C. Coherence within the Primary Beam Spot  353
   D. Secondary Electron Emission  354
V. Formation of the Primary Beam  361
   A. The Spot Size  362
   B. Incorporation of the Retarding Field  366
   C. The Cathode Lens  369
   D. The Pixel Size  374
   E. Spurious Effects  377
   F. Testing the Resolution  379
VI. Detection and Specimen-Related Issues  381
   A. Detection Strategies  382
   B. Detectors  387
   C. Signal Composition  393
   D. Specimen Surface  394
   E. Specimen Tilt  397
VII. Instruments  399
   A. Adaptation of Conventional SEMs  399
   B. Dedicated Equipment  401
   C. Alignment and Operation  407
MÜLLEROVÁ AND FRANK
   D. Practical Issues ... 410
VIII. Selected Applications ... 413
   A. Prospective Application Areas ... 414
   B. General Characteristics of Micrograph Series ... 415
   C. Surface Relief ... 417
   D. Critical Energy Mode ... 418
   E. Diffraction Contrast ... 419
   F. Contrast of Crystal Orientation ... 422
   G. Layered Structures ... 422
   H. Material Contrast ... 425
   I. Electronic Contrast in Semiconductors ... 426
   J. Energy-Band Contrast ... 430
IX. Conclusions ... 431
Acknowledgments ... 432
References ... 432
I. INTRODUCTION

Two versions of the electron microscope, the directly imaging one (usually TEM, the transmission electron microscope) and the scanning one (SEM), have coexisted in the instrument market and in laboratories for decades, and neither seems likely to lose ground. At certain periods, one or the other attracts enhanced attention and makes a more significant step forward, afterwards leaving the momentary leadership to the competing principle. SEM designers have recently experienced a period rich in innovations, which brought two successful novelties, namely the environmental SEM, with the specimen surrounded by a gas at a pressure of thousands of Pa, and high-resolution imaging at electron beam energies down to fractions of an eV. Our purpose here is to review the theoretical and practical aspects of the latter and to present the method as being already fully feasible and worth employing in the majority of SEM application areas.

The term "low-energy" electron is obviously of a qualitative nature and should be given some quantitative limits. These limits are best based on the characteristics of the electron interaction with solids, which provides the image signal in SEM. Examining the typical energy dependences of all relevant quantities connected with this interaction, we find good reasons for defining two such energy intervals instead of only one.

Figure 1 shows the atomic number dependence of the so-called second critical energy EC2, i.e., the higher of the two electron impact energies at which the total electron emission yield is equal to one (or to 100%). These energies exist for nearly all solids, with a few exceptions such as conductors of the lowest mean atomic numbers, for which the total emission never exceeds unity. Above EC2 the electron emission decreases monotonically and no thresholds can be
SCANNING LEEM
FIGURE 1. The higher of the critical energies for normal electron impact, EC2, at which the total electron yield is equal to one, plotted versus the atomic number for conductive chemical elements; data were collected from Bauer and Seiler (1984) and Zadražil and El-Gomati (2002).
identified. However, the energy EC2 itself represents a significant breakpoint, at which the specimen charging changes its sign. As the graph shows, a value of about 5 keV can be taken as the upper margin of this interval. Thus, let us consider the "low-energy" range to lie below 5 keV beam energy. As will be shown subsequently, around this threshold the yield of backscattered electrons (BSE) also loses its monotonic dependence on the atomic number, which exists at higher energies, so that the conventional material contrast ceases to be reliably usable.

In Figure 2, the well-known plot of the inelastic mean free path (IMFP) of electrons is shown versus energy for numerous elements and compounds. The deep minimum at approximately 50 eV represents another crucial threshold: the IMFP starts to grow below this point because the main interaction phenomena, the secondary electron (SE) emission in particular, are suppressed here and a fundamentally new situation emerges for scanned imaging. So let us also define the "very low energy" interval as that below 50 eV. Later we will see that this energy range could be further subdivided, but such a subdivision would serve no practical purpose.

Commercial SEM instruments traditionally used primary beam energies of 15–30 keV as a compromise between a sufficiently small spot size and reasonable SE emission. The series of preadjusted beam energies in SEM mostly ended at 5 keV, and even when lower energies were available, good-quality micrographs could not be acquired. Progress in computer-aided design methods for electron optics opened ways to tailoring the objective lenses, and even full columns, to the desired parameters, and the SEM instruments subsequently entered the low-energy range down to about 1 keV.
FIGURE 2. The energy dependence of the inelastic mean free path of electrons; the dots represent various elements and compounds. (Reprinted with permission from Seah and Dench, 1979.)
The motivation included suppressed charging and better visualization of surface relief details, which translated, among other things, into more precise measurement of distances in images. This experience made the low-energy range known and acceptable to the community of microscopists, but no trend to push the energy further down has been apparent, although possible sources of motivation have long existed in experimental areas adjacent to SEM.

The so-called emission electron microscope (EEM) is in fact one of the oldest versions of the EM. In this type of directly imaging microscope the specimen itself is the source of electrons, which are emitted under excitations that include the impact of photons, electrons and/or ions, or high-temperature heating. More than 60 years ago, Recknagel (1941) published a theoretical study showing that the immersion objective lens, the crucial part of the EEM that accelerates the electrons emitted at quite low energy E0 to some final energy E and forms the first image of the emitting surface, has surprisingly good properties. Its basic aberration coefficients are proportional to the ratio E0/E, so that they decrease even for the lowest emission energies. Of the EEM versions mentioned, the photoemission one (PEEM) is most often met in laboratories at present, partly because of progress in this method connected with the wider availability of intense radiation sources at synchrotrons. However, for us another version of the EEM is most important, namely the low-energy electron microscope (LEEM), in which the specimen excitation is made via a parallel coherent wave of slow
electrons. The method and instrument were first proposed by Bauer (1962) and demonstration experiments were later performed by Delong and Drahoš (1971). Only in the 1980s did the first micrographs appear in the literature (Telieps and Bauer, 1985), but since then the method has boomed; for a review see Bauer (1994), while more practical aspects are summarized by Veneklasen (1992). Although the LEEM apparatus remains an expensive tool for top specialists, it has produced some of the most attractive and fruitful results among the surface examination methods except, maybe, the probe microscopies. The scanning LEEM (SLEEM) method described here aims at achieving similar results as regards the observability of surface-localized physical phenomena, with possibilities of extension toward multiple signal detection.

The idea of reversing the function of the immersion objective lens with respect to that in the EEM can be traced back to Zworykin et al. (1942), where an electrostatic SEM with a biased specimen is outlined. An adaptation of a conventional SEM, by inserting a retarding-field element below its objective lens, was published by Paden and Nixon (1968). Yau et al. (1981) demonstrated the lowering of aberration coefficients by means of a retarding field, either overlapped with the focusing magnetic field or arranged sequentially, and even measured aberration coefficients down to tens of µm at very low energies, but their aim was solely to improve tools for electron lithography and annealing and they did not consider any application to scanned low-energy imaging. Many other attempts to retard the primary beam electrons before their impact onto the specimen were published; this history is reviewed by Müllerová and Lenc (1992a).
It is interesting that, although many of the previous studies proposed quite feasible solutions to the problem of decelerating the beam in the SEM, none of the reviewed papers presented convincing results, i.e., micrographs collected throughout the full energy scale. To our knowledge, the first such series was published by us (Müllerová and Frank, 1993), together with practical experience from the adaptation of a commercial SEM to the SLEEM method. The low-energy microscopy program at ISI Brno was started in the 1960s (see above) and, after a long break, it continued with the first demonstration experiments with the SLEEM method (Müllerová and Lenc, 1992b) and a theoretical examination of the properties of the immersion objective lens (IOL) (Lenc and Müllerová, 1992a,b). The problem of the IOL and its optimization was systematically treated by Rose and Preikszas (1992) and Preikszas and Rose (1995). More literature references will be given below. One could easily conclude that the method and corresponding apparatus have been sufficiently explored to appear on the instrument market and to
enter the broad community of users. Moreover, the method can be implemented after quite moderate adaptation of a conventional SEM. Nevertheless, the small number of existing instruments corresponds to only a handful of users, who do not represent a sufficient marketing target, so this barrier has not yet been broken. The first commercial device is expected in 2003. In the following, the SLEEM method will be discussed in detail from all fundamental viewpoints so that the reader can comprehend it and even start to use it. The scope of the application results is still quite limited and awaits additional users who could help fill the obscure areas in the interpretation of the observed contrast.
II. MOTIVATIONS TO LOWER THE ELECTRON ENERGY
The low-energy range below 5 keV is now available in commercial SEM instruments and is widely used for observation of nonconductors, for measurement of dimensions in images, for improved observation of surface relief, etc. In this chapter we summarize the main advantages of working in this energy range and then continue with the very-low-energy range. Let us mention here that practical experience with image contrast (and therefore also the awareness of the motivation for using it) is quite naturally concentrated mainly in the energy ranges of commercial instruments. These mostly provide quality imaging down to 1 keV, where the resolution value is often still guaranteed. Somewhere below 1 keV the imaging properties "break down" and the image quality becomes unacceptable; this threshold is usually met around 500 eV. For microscopes containing compound objective lenses this limit is shifted to about 200 eV, and the performance of devices equipped with an aberration corrector is similar.

A. Extensions to Conventional Modes of Operation
It is well known (and also evident from Figures 13 and 16) that the total electron yield σ per incident electron, acquired from any known specimen, increases when the primary beam energy is reduced below its usual value of 15 to 30 keV. This is because the interaction volume shrinks, together with the path of the generated secondary electrons towards the surface, which reduces their absorption. Because σ is generally less than 1 at high energies, its increase leads to a decrease in the proportion of electrons dissipated in the specimen, suppression of
charging of nonconductive specimens, and lesser demands on measures to make them conductive. At the critical energy EC2 we get σ = 1 and a true noncharging microscopy is possible (Frank and Müllerová, 1994). As the SE yield keeps growing even below EC2, a significant signal increase with respect to the traditional beam energies is achieved, which translates into an improved signal-to-noise ratio (SNR) in the image. Measurements on elemental specimens showed that the SE signal maximum appears at an energy Em between 100 and 800 eV (e.g., Seiler, 1983), and below this energy the yield falls again.

The so-called material contrast, based on the direct proportionality between the mean atomic number of the specimen and the yield of backscattered electrons, which is reliably available at tens of keV, disappears in the low-energy range in the sense that the η(E) curves for various specimens start to cross each other (see Figure 13). Instead, for particular combinations of materials, optimum energies can be found at which the mutual contrast reaches its maximum (see Müllerová, 2001).

As the interaction volume of slower electrons diminishes, the information generated in the microscope becomes better localized and more sensitive to the true surface, which is then also more faithfully visualized. Tiny protrusions and ridges appear on facets that were apparently smooth at higher beam energies. The so-called edge effect, i.e., the overbrightening of steeply inclined facets or side walls of surface steps that is apparent and mostly dominant at tens of keV, diminishes here and fully disappears somewhere below 500 eV (in fact near the energy Em of the maximum SE yield). The reason is that the penetration depth of the primary electrons (PE) shortens and approaches the escape depth of the SE. Consequently, all generated SE are emitted and no surface steps can extend the emitting area.
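The existence of a second, higher crossing of the unit-yield level can be sketched numerically. The "universal" total-yield curve and the parameter values below (the peak yield and its position) are illustrative assumptions, not data from this chapter; the point is only that, above the yield maximum, σ(E) decreases monotonically, so a simple bisection locates the energy where σ = 1:

```python
import math

def total_yield(E_keV, sigma_m=2.0, E_m=0.4):
    """Illustrative semi-empirical total-yield curve sigma(E); sigma_m
    (peak yield) and E_m (peak position, keV) are hypothetical values."""
    x = E_keV / E_m
    return sigma_m * 1.11 * x**-0.35 * (1.0 - math.exp(-2.3 * x**1.35))

def find_ec2(lo=0.4, hi=50.0, tol=1e-6):
    """Bisection for the higher root of sigma(E) = 1; above the yield
    maximum sigma(E) falls monotonically, so bisection is safe."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if total_yield(mid) > 1.0:
            lo = mid   # yield still above one, the root lies higher
        else:
            hi = mid
    return 0.5 * (lo + hi)

ec2 = find_ec2()   # a few keV for these example parameters
```

For these toy parameters the crossing comes out at a few keV, consistent with the margin quoted above; real values vary strongly with material, as Figure 1 shows.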
Experience has shown that, in the range of hundreds of eV, a contrast between various grains appears on polycrystalline specimens. This phenomenon needs to be explored more carefully, because in principle there are several possible explanations: in addition to the dependence of the generation and absorption of SE, and of the electron backscattering, on the crystal orientation, the presence of thin surface films can also play a role, as these layers cease to be transparent here and their thickness (like that of oxides) is also orientation dependent.

The energy dissipation in the specimen is clearly smaller at low energies, since each electron delivers less energy. The increase in the emission of slow SE makes no significant change, and the BSE emission, which is responsible for the great majority of the energy export, is roughly
constant down to hundreds of eV (see Figure 13). Nevertheless, at higher energies, and still at 1 keV, the penetration depth (or electron range) decreases faster than linearly (see Böngeler et al., 1993), so that the spatial density of the dissipated energy grows. This decrease then gradually becomes linear (Joy and Joy, 1996) and the density of dissipated energy stays constant, because the deficit in the energy income is just compensated by the thinning of the interaction layer.

The question of radiation damage is even more complicated than the previous issue. In addition to the changes in the total amount of dissipated energy and in its spatial density, the cross-sections for various inelastic phenomena also depend on the energy of the incident electrons. For example, the cracking of hydrocarbon molecules and the creation of the graphitic contamination layer is most effective for electron impact at about 100 to 200 eV. Only in the very-low-energy range do elastic collisions dominate, so that the radiation damage diminishes and disappears.
B. New Opportunities
In the very-low-energy range, the wavelength of the incident electrons, λ [nm] = 1.226 {E [eV]}^(-1/2), becomes comparable with the interatomic distances. As in the classical LEED (low-energy electron diffraction) apparatus, the angular distribution of the reflected electrons is strongly anisotropic and the intensity is concentrated into diffracted beams. In our case the incident wave is convergent, and one can refer to the CBED (convergent beam electron diffraction) method used in the STEM (scanning transmission electron microscope). Selective detection of some diffracted beams enables one to visualize directly the surface crystallinity and its possible changes.

Flat clean crystal surfaces are composed of terraces that are smooth on the atomic level and separated by steps of a height of one or more atoms. If the primary spot illuminates a terrace margin and the electron wavelength is in a suitable relation to the step height, the two parts of the wavefront, reflected on adjacent terraces, can interfere (divided-wavefront interference) constructively or destructively and reveal the step, although the point resolution of the microscope does not reach the atomic level. A similar phenomenon can be observed when the interference concerns electrons reflected from the upper and lower interfaces of a thin surface film (divided-amplitude interference). At wedge-shaped layers,
equal-thickness stripes (an analogy to Newton's rings) should be observed.

Electrons impacting the solid with an energy just above the surface potential barrier are, according to the laws of quantum mechanics, subject to partial reflection, so that the height and shape of the barrier can be sensed. It is known from LEED experiments that the electron reflection (Bauer, 1994; Bartoš et al., 1996) is inversely proportional to the local density of electron states coupled to the incident wave. This phenomenon can appear only below 20 or 30 eV of landing energy. The contrast based on the local density of states enables one to observe the energy band structure directly, which opens ways to attractive applications, e.g., in the development and diagnostics of semiconductor structures (Frank et al., 2002; Müllerová et al., 2002).

Already in the low-energy range (and especially for heavier specimens), the elastic electron scattering displays behavior that can be described solely by the quantum mechanical Mott cross-sections, which incorporate the screening of the nucleus by electrons, the existence of the spin, and the spin–orbit interaction (see, e.g., Reimer, 1998). Thus, the electron spin influences the image signal and the magnetic microstructure becomes observable, provided that a spin-polarized beam is used for the illumination (Bauer, 1994).

As mentioned above, below about 20 to 30 eV elastic collisions of the incident electrons start to dominate, so that radiation damage becomes negligible. This can be important for the examination of highly sensitive materials and also, for example, for interoperational checks in semiconductor production, where any damage should be avoided.
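The wavelength relation quoted at the start of this section is trivial to evaluate; the short sketch below only illustrates how quickly λ approaches interatomic dimensions as the energy drops:

```python
import math

def wavelength_nm(E_eV):
    """De Broglie wavelength of an electron (nonrelativistic),
    lambda [nm] = 1.226 * E[eV]**(-1/2), as quoted in the text."""
    return 1.226 / math.sqrt(E_eV)

# At 10 keV the wavelength is ~0.012 nm, far below interatomic spacings;
# at 10 eV it is ~0.39 nm, comparable with typical lattice constants.
for E in (10000.0, 100.0, 10.0):
    print(E, wavelength_nm(E))
```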
C. Issues Inherent to Slow Electron Beams

Problems with the generation of sufficiently small electron probes in the low-energy range have been solved reasonably satisfactorily, and these beam energies are available in modern instruments. Nevertheless, the low-energy range has been explored only down to about 200 eV, as already noted. Let us now summarize the problems that have to be overcome when lowering the beam energy in an instrument that keeps this energy constant throughout the microscope column.
The chromatic aberration of electron lenses depends on the ratio ΔE/E, where the energy spread ΔE within the beam is given by the emission mechanism used in the gun and E is the beam energy within the lens.
Obviously, the chromatic aberration disc enlarges with decreasing energy; in the low-energy range it usually dominates the image resolution and affects this crucial characteristic adversely.

The diffraction aberration, i.e., the size of the Airy disc arising from interference of the unscattered electron wave, passing through the aperture-restricting diaphragm, with the marginal wave scattered on the diaphragm edge, is proportional to the wavelength λ, i.e., to E^(-1/2). Thus, this contribution to the final spot size also grows at low energies.

The electron current extracted from the gun is proportional to the extraction voltage. For thermionic cathodes, the gun brightness is linearly proportional to E (Reimer, 1998). For Schottky and field-emission guns this proportionality is not so simple, because the first acceleration voltage controls the emission and the final beam energy is adjusted afterwards. But the beam current always decreases with decreasing energy.

In spite of some screening against the spurious electromagnetic fields coming from the environment, particularly the a.c. ones, which is secured by the material of the magnetic circuits, some undesired influence is usually observed. This grows strongly at low energies and is proportional to the time of flight through the column, i.e., to E^(-1/2). The situation is most critical in ultrahigh-vacuum (UHV) devices, where the chamber walls are traditionally made of nonmagnetic materials.

Any narrow directed beam of charged particles suffers from the mutual interaction of those particles via Coulomb forces. Particularly in crossovers, the mutual repulsion intensifies, so that the size of these crossovers becomes larger than that given by geometrical optics. The consequences of this internal interaction are strongly dependent on the beam current; for low currents and a Gaussian beam profile the crossover broadening is proportional to E^(-3/2) (Spehr, 1985).
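The first two energy scalings in this list can be made concrete with a toy calculation. The chromatic disc is taken as d_c = C_c (ΔE/E) α and the diffraction disc as d_d ≈ 0.6 λ/α; the numerical values of C_c, α, and ΔE below are arbitrary example parameters, not those of any real column:

```python
import math

def spot_contributions(E_eV, alpha=5e-3, Cc_nm=1.0e7, dE_eV=0.7):
    """Two spot-size contributions (in nm) for a column held at beam
    energy E throughout; Cc (chromatic aberration coefficient, here
    10 mm expressed in nm), alpha (aperture angle, rad) and dE
    (source energy spread, eV) are assumed example values."""
    d_chrom = Cc_nm * (dE_eV / E_eV) * alpha          # grows as 1/E
    d_diff = 0.6 * (1.226 / math.sqrt(E_eV)) / alpha  # grows as E**-0.5
    return d_chrom, d_diff

low = spot_contributions(100.0)      # both discs large at 100 eV
high = spot_contributions(10000.0)   # both much smaller at 10 keV
```

With these numbers the chromatic disc swells from a few nm at 10 keV to hundreds of nm at 100 eV, which is exactly the degradation that keeping the column at high energy (Section C, last paragraph) is meant to avoid.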
In connection with the previous point we should also mention that another consequence of the electron–electron interaction within the beam, again mainly in crossovers, is a broadening of the energy spread (the so-called Boersch effect). Here again the situation depends on the beam current and also on the crossover shape and dimensions; for stigmatic focusing the mean quadratic broadening ΔE is proportional to E^(1/4) (Rose and Spehr, 1980), so that the figures are even slightly more favorable at low energies.

The conventional detector of secondary electrons of the Everhart–Thornley (ET) type (Everhart and Thornley, 1960) relies upon the extraction of SE by means of a front-grid bias of about 300 to 500 V and their subsequent acceleration by the scintillator potential of
about 10 kV. These electrostatic fields, oriented laterally to the optic axis, can damage the primary beam geometry in the low-energy range. Thus, novel approaches to the detection strategy are needed.

With decreasing electron energy and the reduction of the active depth of signal generation, the surface cleanliness becomes more important. At energies near the minimum of the IMFP, the penetration depth of the PE becomes comparable with the thickness of the contamination layers, both that of the graphitic carbon from cracked hydrocarbon molecules and that of the oxide or other products of surface reactivity. From this point of view, the vacuum conditions within the specimen chamber become more important, as in the case of the electron spectroscopies. However, in the very-low-energy range the IMFP grows again and normal vacuum demands are restored.

It is obvious from this list of issues that the major obstacles arise from physical principles and can only be avoided by keeping the primary electron beam at high energy for as long as possible and decelerating it only shortly before its impact on the specimen. This approach has already been applied in various modifications that will be outlined here and completed with some new data and experience of the authors.
III. INTERACTION OF SLOW ELECTRONS WITH SOLIDS
The physics of electron scattering and diffusion in solids is described in many original papers and also in good textbooks. A precise and sufficiently detailed analysis of the problem for nonspecialists can be found in the book by Reimer (1998), and a condensed review of the scattering phenomena suffered by very slow electrons has been published by Bauer (1994). In this chapter we summarize the main approaches only briefly and depict the basic differences inherent to the low- and very-low-energy ranges.

Elastic scattering on atomic nuclei and inelastic scattering connected with the excitation of electrons belonging to the target are the fundamental processes determining the range of incident electrons, the in-depth distribution of the ionization processes, and consequently also the emission of the secondary and backscattered electrons. In order to characterize the individual scattering processes, usually the quantity known as the differential cross-section dσ/dΩ is used, which relates the distance of the original electron trajectory from the scatterer to the angle θ of its deflection, and represents the probability that an electron approaching the target will be scattered into a solid angle dΩ.
Integrating over Ω we get the total cross-section σ. Multiple scattering is described via statistical quantities, first of all by the mean free path between collisions. Multiple elastic scattering causes a broadening of the incident beam, up to possible backscattering, while multiple inelastic scattering causes a gradual loss of the electron energy along its trajectory. For us, the most important inelastic process is the release of a secondary electron.
A. Elastic Scattering

1. Scattering on Nuclei

Scattering of incident electrons on the nuclei of the specimen atoms is considered elastic when the mass of the nucleus is regarded as so large with respect to the mass of the electron that after the interaction the nucleus remains at rest. This simplifying assumption neglects the generation of phonons, which becomes apparent particularly at the lowest electron energies, where other losses already vanish. Nevertheless, we will mention this type of scattering among the inelastic types.

In the framework of classical mechanics, we can solve the Newton equation containing the attractive Coulomb force between a positively charged nucleus and the negative charge of an electron. The result of the classical calculation of the differential cross-section, first published by Rutherford in 1911, is (for electron energies negligible with respect to the rest energy E0 = mc² = 511 keV) given by

  dσ_el/dΩ = e⁴ Z² / [16 (4πε₀)² E² sin⁴(θ/2)]    (1)
where e and m are the electron charge and mass, respectively, ε₀ is the permittivity of vacuum, and Z is the atomic number of the nucleus. This relation is acceptable for electrons above 100 keV, but at low energies it represents a bad approximation. It diverges at θ = 0 because the small scattering angles arise for electrons flying far from the nucleus, where in fact its potential is screened by the electrons of the atom. At large scattering angles the approximation also fails, owing to the neglect of relativistic effects (Reimer, 1998).

The screening effects can be described solely by quantum mechanics, by determining the scattering amplitude f(θ) of a spherical wave scattered on the atom and superposing it on the incident plane wave. The differential cross-section is then generally expressed as

  dσ_el/dΩ = |f(θ)|²    (2)
The scattering amplitude for the screened Coulomb potential can be found by solving the Schrödinger equation for the ground states of the atom electrons. A good approximation is the exponential screening with the screening radius equal to R_S = a_H Z^(-1/3), where a_H = 0.0529 nm is the Bohr radius; this gives, after substitution into Equation (2), the so-called screened Rutherford cross-section

  dσ_el/dΩ = e⁴ Z² / {16 (4πε₀)² E² [sin²(θ/2) + sin²(θ₀/2)]²}    (3)
with θ₀ ≅ λ/2πR_S (Reimer, 1998). This cross-section already produces finite values for small θ and can be further improved by taking into account a full series of exponential potentials instead of only one, by incorporating potentials from neighboring atoms (e.g., via the so-called muffin-tin model), by modifying the scattering potential for correlation and exchange phenomena between the incident and target electrons, etc.; and/or a decomposition into partial waves can be used.

The exact cross-sections, the so-called Mott cross-sections (Mott and Massey, 1965), for elastic large-angle scattering may be obtained for the screened Coulomb potential when the relativistic Schrödinger or Pauli–Dirac equation is used. The result is then in the form of a superposition of terms belonging to both spin directions with respect to the direction of propagation, but no analytical expression for dσ/dΩ can be written. For an unpolarized electron beam, the Mott cross-section remains axially symmetric and in the general Equation (2) two formally identical members are summed on the right-hand side. These can then be developed into a series of Legendre functions (e.g., Ding and Shimizu, 1996). In addition to modified values at large scattering angles, the Mott cross-sections exhibit one property not met before, namely a nonmonotonic angular dependence, as shown in Figure 3. Obviously, this behavior emerges in the low-energy range for large Z, while for small Z it is not present until near the very-low-energy range. Data on the Mott cross-sections for chemical elements at low and even very low energies can be taken from numerous sources (see, e.g., Reimer and Lödding, 1984; Czyzewski et al., 1990; Werner, 1992). The role of the electron spin in scattering was examined by Kirschner (1984).
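As a numerical check of Equation (3), the sketch below evaluates the screened Rutherford cross-section in SI units, with the screening angle θ₀ taken from the relations just given and the wavelength from λ [nm] = 1.226 E^(-1/2). The chosen element and energy are arbitrary examples:

```python
import math

E_CHARGE = 1.602176634e-19     # electron charge, C
EPS0 = 8.8541878128e-12        # vacuum permittivity, F/m
A_H = 0.0529e-9                # Bohr radius, m

def screened_rutherford(theta, E_eV, Z):
    """Screened Rutherford differential cross-section, Eq. (3),
    in m^2/sr; theta in rad, E in eV, Z the atomic number."""
    lam = 1.226e-9 / math.sqrt(E_eV)          # electron wavelength, m
    r_s = A_H * Z ** (-1.0 / 3.0)             # screening radius R_S
    theta0 = lam / (2.0 * math.pi * r_s)      # screening angle
    E_J = E_eV * E_CHARGE                     # energy in joules
    num = E_CHARGE ** 4 * Z ** 2
    den = (16.0 * (4.0 * math.pi * EPS0) ** 2 * E_J ** 2
           * (math.sin(theta / 2.0) ** 2 + math.sin(theta0 / 2.0) ** 2) ** 2)
    return num / den

# Unlike Eq. (1), the value stays finite at theta = 0:
d_forward = screened_rutherford(0.0, 1000.0, 29)        # Cu, 1 keV
d_backward = screened_rutherford(math.pi, 1000.0, 29)
```

The sin²(θ₀/2) term is what removes the θ = 0 divergence of the unscreened formula; for large angles it is negligible and the Rutherford behavior is recovered.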
When decreasing the energy of the incident electrons toward the very-low-energy range, the Mott cross-sections, which express relativistic free-electron scattering incorporating partial waves, overestimate the scattering rate and yield unrealistically short elastic mean free paths (EMFP) of only tenths of a nm. It was suggested (Fitting et al., 2001) that they should be
FIGURE 3. Differential cross-sections for elastic scattering of electrons at various energies, calculated by decomposition into partial waves. (Reprinted with permission from Ichimura, 1980.)
replaced here with factors inherent to quasi-elastic scattering on acoustic phonons; this mechanism works down to the thermalization threshold of electrons at the mean energy 3kT/2, i.e., a few tens of meV, and preserves the EMFP in the nm range. The scattered electrons are then considered as quasifree Bloch electrons obeying the dispersion relation of the conduction band of the target. Acoustic phonons have energies of only a few meV, but scattering on them is nearly isotropic, so that they effectively influence any oriented stream of electrons and, for example, lower the electric conductivity.

While the total elastic cross-section σ_el characterizes a single scattering event, multiple scattering is described by the EMFP

  λ_el = 1/(N σ_el)    (4)
where N is the number of atoms per unit volume. As Figure 4 shows, this quantity also exhibits nonmonotonic behavior, starting from the low-energy range. We can conclude that, for slow electrons, anisotropy appears already in scattering on single atoms, and the resulting features then combine with a directional segregation owing to the interference of partial waves from a lattice of scatterers. Further, the path length between scattering events generally shortens down to the lowest energies, but in the very-low-energy range this dependence is far from monotonic.
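Equation (4) converts a cross-section into a mean free path. In the sketch below both the cross-section value and the atomic density are hypothetical example numbers (the density is roughly that of copper), chosen only to show the order of magnitude:

```python
def emfp_nm(sigma_el_nm2, n_atoms_per_nm3):
    """Elastic mean free path from Eq. (4): lambda_el = 1/(N*sigma_el).
    Cross-section in nm^2, atomic density in atoms/nm^3, result in nm."""
    return 1.0 / (n_atoms_per_nm3 * sigma_el_nm2)

# Example: an assumed sigma_el = 0.01 nm^2 with N ~ 85 atoms/nm^3
# (roughly copper) gives an EMFP of about 1.2 nm, i.e., the nm order
# of magnitude seen in Figure 4.
lam_el = emfp_nm(0.01, 85.0)
```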
SCANNING LEEM
FIGURE 4. Calculated energy dependence of elastic mean free paths (EMFP) for electrons. (Reprinted with permission from Ding, 1990.)
2. Reflection on Energy Gaps

Even above the vacuum level, the energy band structure exists and the energy states E(k) are separated by forbidden gaps, particularly where the k-vector touches the boundary of the Brillouin zone. If the incident electron hits the gap, it does not enter allowed states and should be reflected. Nevertheless, total reflection is not obtained because the electron can undergo an inelastic collision and lose energy, or it changes its wave-vector owing to scattering on a phonon or some crystal imperfection; in both cases a shift to the allowed states can occur. In the range of a few eV, elastic reflection is strongly enhanced. Electron microscopists have never had any reason to take this phenomenon into account, as its appearance requires the electron impact to be far below the energy range available in the SEM. But those using the VLEED (very-low-energy electron diffraction) method know the energy-band-structure region existing on the intensity vs. energy (I–V) curve for the specularly reflected (00) spot below the threshold where the first nonspecular diffracted beam appears (Jaklevic and Davis, 1982). It is important to note that the incident electron wave has to couple to the energy states into which it is to penetrate. This means those Bloch states inside the specimen whose surface-parallel wave-vector component is equal to K∥ + g, where K∥ is the parallel component of the incident electron wave-vector and g is any surface reciprocal-lattice vector (Strocov and Starnberg, 1995); in other words, those Bloch states that have dominant Fourier components resembling the incident wave (the coupling bands). The local extremes on the energy dependence of the reflectivity R(E) are first of all connected with critical points of the energy bands E(k) at which ∂E/∂k⊥ exhibits sharp changes or is equal to zero at the band-gap edges. When mapping those
critical points upon variation of K∥, complete bands in the symmetry planes of the Brillouin zone can be compiled (see Strocov et al., 1996). A further crucial condition is low absorption of electrons, which is met below landing energies of 25 to 30 eV. Simulations show that any local R(E) features are washed out at even moderate absorption (appearing for the imaginary part of the crystal potential exceeding 1 eV) and that the model fits the experimental data better when a nonisotropic situation is considered, with absorption reduced in directions along the surface (Bartoš et al., 1996). In addition to the extremes of R(E) revealing the critical points at the coupling bands, oscillations might also appear as minor features. These are connected with surface resonances caused by interference between the specular beam and a nearly emerged nonspecular diffracted beam moving parallel to the surface, and can be used for mapping the surface potential barrier (Jaklevic and Davis, 1982). Figure 5 demonstrates the reflection anisotropy connected with its k-vector dependence, which enables one to obtain contrast between different crystal orientations at suitable energies. Mapping of the local variations in the density of states at the energy of the electron impact, for example those connected with the local doping of semiconductor structures, is also potentially available.

B. Inelastic Scattering

The term inelastic scattering is usually used to describe an interaction between the incident electron and the atomic electrons in the target. More generally, it should include all phenomena at which the impinging electron changes its energy.

1. Scattering on Electrons

The main mechanisms of interaction between electrons include:
quasifree electron–electron collisions (i.e., a Compton-like scattering); excitation of electrons within partially occupied energy bands; excitation of interband transitions; excitation of plasmons, i.e., energy quanta connected with the charge-density waves of valence or conduction electrons; and ionization of the inner shells of the atoms. These mechanisms not only exhibit various cross-sections but also represent very different amounts of energy interchanged between the colliding particles. Although inelastic scattering is often assumed to cause only energy decrease
FIGURE 5. Reflection coefficient R(E ) of the W(110) and W(100) surfaces for slow electrons. (Reprinted with permission from Bauer, 1994.)
but not trajectory deflection, some collisions can be associated with large scattering angles (like Compton scattering). In spite of this, the relevant differential cross-section is often considered on the energy scale instead of the angular one, namely as dσ/dW, where W is the transferred energy. Energies transferred at ionization range from a few eV up to nearly 100 keV, depending on the atomic number and the electron shell involved. Excitation of plasmons and electron transitions causes energy losses not exceeding tens of eV, but the loss due to an electron–electron collision can be up to 50% of the initial energy. If we neglect any binding forces acting upon a target electron at rest and consider the incident electron approaching with energy E, we can use classical mechanics to get the differential cross-section (Reimer, 1998)

$$\frac{d\sigma_{\mathrm{in}}}{dW} = \frac{\pi e^4}{(4\pi\varepsilon_0)^2}\,\frac{1}{E W^2} \qquad (5)$$
This relation is derived upon the assumption that the target electron stays at rest during the collision and only acquires momentum; this is not the case for slow electrons and hence for E → 0 (and also for central collisions) this cross-section diverges. But Equation (5) indicates that this type of scattering is more common at low energies, that small energy losses are
more probable, and, because the same approximation gives the scattering angle as

$$\sin^2\theta = W/E, \qquad (6)$$
small scattering angles are also more probable. If this simplified approach is upgraded, correcting terms due to the indistinguishability of electrons and due to their spin are added to 1/W². These are of the same dimension (energy)⁻² and combine E, W, and E₀. Thus, the added terms also grow at low energies, but for W ≪ E these corrected relations converge to Equation (5). An equation derived by Gryzinski (1965), which takes into account the binding of electrons in atoms, also converges to Equation (5) for low binding energies. In fact, the "continuum" of losses owing to scattering on quasifree electrons appears in EELS (electron energy loss spectroscopy) spectra only in the range of hundreds of meV (Reimer, 1995). The inner-shell ionization can be solved in the same way as the problem of screening of the nuclear potential, i.e., by using the Schrödinger equation for the nucleus, one atomic electron, and one incident electron, which leads to Equation (3). Now, excited states of the target electron are incorporated too and the final result, the total cross-section, e.g., for ionization of the K shell, is (Reimer, 1998)

$$\sigma_K = \frac{\pi e^4 z_K b_K}{(4\pi\varepsilon_0)^2 E_K^2}\,\frac{\ln u}{u} \qquad (7)$$
where z_K is the number of electrons in the shell (z_K = 2), b_K is a constant factor (b_K = 0.35), E_K is the ionization energy of the shell, and u = E/E_K is the overvoltage. A maximum of σ_K appears at u ≅ 3 for all atomic numbers and at lower energies σ_K falls steeply. This means that throughout the low-energy range electron impact ionization is possible for every atom, but in the very-low-energy range this type of scattering does not take place. The differential cross-section from the same calculation is

$$\frac{d\sigma_{\mathrm{in}}}{d\Omega} = \frac{e^4 Z}{(4\pi\varepsilon_0)^2 E^2}\left[1 - \frac{1}{\bigl(1 + (\theta^2 + \theta_E^2)/\theta_0^2\bigr)^2}\right]\frac{1}{(\theta^2 + \theta_E^2)^2} \qquad (8)$$
where the characteristic angle is θ_E = J/4E, with J being the mean ionization potential of the atom (J[eV] ≅ 10Z), and θ₀ is that from Equation (3). This relation is similar to Equation (3) and we can compare two characteristic features. First, at E = 5 keV the inelastic scattering is still confined to smaller deflection angles (e.g., for Z = 30, θ₀ is 10 times larger than θ_E) but this
difference is less marked at lower energies. Second, the ratio of both the differential and total cross-sections for inelastic with respect to elastic scattering is proportional to 1/Z, at least for large scattering angles. Within the so-called dielectric theory, considering the solid described by the complex dielectric constant ε and employing the analogy between the inelastic scattering of electrons and the spatial attenuation of electromagnetic waves, which is proportional to the imaginary dissipative part of ε, the differential cross-section can be written as (e.g., Böngeler et al., 1993)

$$\frac{d^2\sigma_{\mathrm{in}}}{dW\,d\Omega} = \frac{1}{\pi^2 a_H E N}\,\mathrm{Im}\!\left[\frac{-1}{\varepsilon(W,\theta)}\right]\frac{1}{\theta^2 + \theta_D^2} \qquad (9)$$
with θ_D = W/2E. The analogy is based on modeling groups of electrons, similarly strongly bound within the given energy-band structure, by oscillators defined by their strengths and characteristic frequencies. So the problem is now shifted to the determination of the complex dielectric constant ε. Similar relations and results as regards the inelastic cross-sections are obtained when using the formalism characterizing the incident electron as a quasiparticle with self-energy, the imaginary part of which describes the quasiparticle lifetime while the real part expresses the shifts in the energy eigenvalues with respect to the noninteracting system. The same holds for the formalism of the electron–jellium correlation potential, with the imaginary part governing attenuation of the dielectric response of jellium to the electron impact. Equation (9) is also written in the variables W and q (the momentum) or q and ω (with W = hω/2π). An overview of these approaches was published by Nieminen (1988). The energy loss function, written as Im[−1/ε(q, ω)], can be calculated on the basis of EELS experimental data for q = 0 (the "optical" data) when employing, for example, the quadratic dispersion relation (Kuhr and Fitting, 1999)

$$\omega(q) = \omega(0) + \frac{h}{4\pi m}\,q^2 \qquad (10)$$
Figure 6 shows an example of the measured dielectric loss function for SiO2 (Fitting et al., 2001). This contains peaks inherent in scattering on optical phonons, which will be mentioned in the next section. Calculations of quantities characterizing the inelastic scattering, which employed the dielectric function, were performed by many authors (e.g., Cailler et al., 1981; Powell, 1974, 1984, 1985; Penn, 1987; Egerton, 1986; Ding and Shimizu, 1996).
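As an illustration of the loss function entering Equation (9), the sketch below evaluates Im[−1/ε] for an assumed Drude dielectric function; the chapter itself works with measured optical data (e.g., for SiO2), so the plasmon energy and damping used here are free, assumed parameters.

```python
# Loss function Im[-1/eps] for a Drude model eps(w) = 1 - wp^2/(w^2 + i*gamma*w).
# wp = 15 eV and gamma = 1 eV are assumed illustrative values, not data
# from this chapter.
def drude_eps(w, wp, gamma):
    return 1.0 - wp**2 / (w * (w + 1j * gamma))

def loss(w, wp=15.0, gamma=1.0):
    """Im[-1/eps] at energy w (eV); peaks near the plasmon energy wp."""
    return (-1.0 / drude_eps(w, wp, gamma)).imag

# Scan the loss function and locate its peak on a coarse grid.
ws = [0.1 * k for k in range(1, 400)]
peak = max(ws, key=loss)
print("plasmon peak near", round(peak, 1), "eV")   # close to wp = 15 eV
```

This reproduces the familiar behavior that the volume-plasmon peak of the loss function sits near the zero of Re ε, which is what dominates EELS-derived loss data such as Figure 6.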
FIGURE 6. The dielectric loss function for SiO2. (Reprinted with permission from Fitting et al., 2001.)
In Figure 7 we see the calculated energy dependences of σ_in for two elements, including the main contributions to σ_in. These curves reflect the general IMFP curve in Figure 2, which we used for the definition of the very-low-energy range (with the IMFP, λ_in, defined analogously to Equation (4)). Further data regarding the IMFP behavior at low energies can be found in the work of Ding and Shimizu (1996), Powell (1987), Tanuma et al. (1991a,b), and others. We can notice that, starting from the lowest energies, first the electron–electron scattering appears, then the ionization, and finally the plasmon excitation emerges. Let us repeat that the steep drop in σ_in below about 50 eV is the most important feature here and also the reason for considering the use of this range as a separate mode of SEM.

2. Scattering on Atoms

In the dielectric loss function in Figure 6, the noticeable peaks that appear around 100 meV belong to the scattering of electrons on optical phonons. Having formally separated the inelastic phenomena due to scattering on electrons in the previous paragraph, we discuss this mechanism here. The electron–phonon interactions are important mainly in dielectrics and insulators, but also in semiconductors. The forward scattering on longitudinal optical (LO) phonons is strongest. In 1969 Llacer and Garwin calculated (by means of Monte Carlo simulations) the secondary electron transport in alkali halides below 7.5 eV using the time-dependent perturbation of plane waves with the interaction Hamiltonian containing the polarization field caused by the relative displacement of ions in the LO vibrational modes. Schreiber and Fitting (2002) discussed these phenomena in detail for SiO2 and included two LO modes with energies
FIGURE 7. Calculated total inelastic cross-sections (———) and their main contributions, namely the electron–electron scattering (— — ), shell ionization (- - - - - -), and plasmon excitation (— — ). (Reprinted with permission from Ho et al., 1991.)
of 60 and 150 meV and also scattering phenomena representing both emission and annihilation of phonons. They also presented the scattering rates of collisions, which are much higher for the phonon emission phenomena. At about triple the phonon energies, these rates reach their maxima between 10¹⁴ and 10¹⁵ s⁻¹ and toward higher energies they fall approximately as E^−1/3. Thus, this scattering mechanism concerns mostly electrons with energy around 1 eV. It seems clear that, even when incorporating the phonon scattering, the enlargement of the IMFP in the very-low-energy range (see Figure 2) is preserved at least for conductors and possibly for semiconductors. In Figure 8, a comparison is made on the basis of data simulated by a Monte Carlo (MC) program employing the dielectric loss function (Kuhr and Fitting, 1999). Further, Figure 9 details the contributions to the IMFP for SiO2, again presenting results of a MC program specialized to very-low-energy scattering in semiconductors and wide-gap insulators. In addition to the mean free paths, both figures also contain the attenuation lengths characterizing a no-loss escape of electrons. We note that the energy range below 50 eV still has some structure in Figure 9, which could be utilized for a subdivision of this range. Nevertheless, this would be specimen-specific and would not allow any general conclusions to be drawn. In studies of very-low-energy electron scattering, one more scattering mechanism is mentioned, namely the intervalley scattering (e.g., Schreiber and Fitting, 2002). This consists of collisions with suitable optical phonons at which, in addition to the energy loss corresponding to the phonon energy, additional energy and also momentum is transferred because the final state is in a different band or "valley" within a multiple-band structure. For SiO2, this type of scattering occurs much less frequently than the LO scattering.
FIGURE 8. Elastic (el.) and inelastic (inel.) mean free paths and attenuation lengths (atten.) for Ag, Si, and SiO2, calculated by means of a MC program incorporating the Mott cross-sections and the dielectric loss function. (Reprinted with permission from Fitting et al., 2001.)
FIGURE 9. The mean free paths in SiO2 as a function of the electron energy for scattering at optical phonons (LO) and acoustic phonons (ac) and for impact ionization (ii), together with the attenuation length (at) for monoenergetic electrons. (Reprinted with permission from Schreiber and Fitting, 2002.)
For completeness we should also mention here the inelastic scattering of electrons on the screened Coulomb potential of the nucleus, leading to the generation of an X-ray photon of the continuous emission (Bremsstrahlung). The low probability of radiative scattering on the nucleus can be demonstrated by comparing the mean energy loss per unit trajectory S_rad to the analogous quantity for the electron–electron scattering. When using an approximate stopping power S_e–e according
to the Thomson–Whiddington law (see the next section), we get the ratio (Feldman and Mayer, 1986)

$$\frac{S_{\mathrm{rad}}}{S_{e\text{-}e}} \cong \frac{4}{3\pi}\,\frac{Z}{137}\,\frac{v^2}{c^2} \qquad (11)$$
which can be simply written as (Z/161)(E/E₀), with E₀ the electron rest energy, so that in the low-energy range it falls to a value of the order of 10⁻³ or 10⁻⁴.
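As a quick check on the order of magnitude quoted above, the compact form of Equation (11) can be evaluated directly; E₀ = 511 keV is the electron rest energy and the copper example energies are assumed illustrative values.

```python
# S_rad/S_e-e ~ (Z/161)*(E/E0), the compact form of Equation (11).
E0_keV = 511.0           # electron rest energy

def srad_over_see(Z, E_keV):
    """Ratio of radiative to electron-electron stopping, compact form."""
    return (Z / 161.0) * (E_keV / E0_keV)

# Copper at a few assumed low-range energies:
for E in (10.0, 1.0, 0.1):
    print(f"Cu, E = {E} keV: ratio ~ {srad_over_see(29, E):.1e}")
```

At 1 keV the ratio is a few times 10⁻⁴, confirming that Bremsstrahlung losses are negligible throughout the low-energy range.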
C. Penetration of Electrons

The primary beam in SEM strikes the specimen surface at a point, the coordinates of which within the field of view are then used to describe the localization of all information collected during the dwell time of the beam. Nevertheless, the primary electrons penetrate to nonnegligible distances in all directions from the impact point and within this interaction volume they cause scattering phenomena and generate signal species. Thus, the above-described single-scattering mechanisms are important not only for the interpretation of the observed properties of emissions but also for tracing the spatial distribution of the information sources. The analysis of electron penetration relies on the concept of multiple scattering, which can be characterized by statistical quantities only. We have mentioned the mean free paths for the individual types of scattering. From Figures 2 and 4 or from Figure 8 it is obvious that, throughout the low-energy range, the ratio of the rates of elastic and inelastic scattering is approximately constant and dependent on the mean atomic number of the target. The very-low-energy range is characterized by the onset of a strong dominance of elastic scattering. When penetrating into the specimen (and, after undergoing some high-angle scattering events, also in lateral directions) by a distance dx, the electron encounters N dx atoms (where N = N_A ρ/A is the number of atoms per unit volume, N_A the Avogadro number, ρ the target density, and A the atomic mass). Thus, the decrease in the stream of unscattered electrons within the trajectory section dz is dI/I = −Nσ dz, where σ is the total cross-section of one atom for a particular scattering mechanism. The unscattered intensity after passing the thickness l is I = I₀ exp(−l/λ) (with λ the mean free path), p = l/λ is the mean number of collisions in the layer, and P_n = pⁿe⁻ᵖ/n! is the probability of n collisions for one electron.
This simple model can be used only up to about p ≅ 25 (see Reimer, 1998), i.e., only for tracing the penetration to distances of the order of 10¹ nm.
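The single-scattering statistics just described form a Poisson process, which can be sketched directly; the mean free path value is an assumed round number.

```python
import math

# After a path l, the mean number of collisions is p = l/lambda, the
# unscattered fraction is P_0 = exp(-p), and P_n = p^n exp(-p)/n!.
def P(n, p):
    """Poisson probability of exactly n collisions at mean p."""
    return p**n * math.exp(-p) / math.factorial(n)

lam = 1.0                      # mean free path in nm (assumed)
for l in (0.5, 1.0, 5.0):      # traversed thickness in nm
    p = l / lam
    print(f"l = {l} nm: P0 = {P(0, p):.3f}, P1 = {P(1, p):.3f}")

# Sanity check: the collision-number probabilities sum to 1.
assert abs(sum(P(n, 2.0) for n in range(50)) - 1.0) < 1e-12
```

The unscattered fraction P₀ decays exponentially with thickness, which is exactly the I = I₀ exp(−l/λ) law of the text.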
In the course of its penetration, the electron beam broadens as regards both its spread of angles and its cross-section. Within the approximation of small energy losses and small scattering angles, the root-mean-square (RMS) width of the beam increases as l^{3/2} (Reimer, 1998). Nevertheless, this approximation is not good for low energies and successful modeling of the geometry of electron penetration is possible only by using tools such as MC programs. The multiple inelastic scattering is responsible for the finite length of the electron path within the target. The appropriate statistical quantity for the examination of this process is the stopping power S = dE_m/dx (with dE_m the mean energy loss), corresponding to the continuous slowing-down approximation. This approach, which neglects the discreteness of the collisions, does not allow study of the emission of elastically backscattered electrons (eBSE) but is useful for MC programs simulating the SEM image signals. When taking into consideration the e–e interactions only, the so-called Bethe formula, usually written as (Reimer, 1998)

$$S = \frac{2\pi e^4}{(4\pi\varepsilon_0)^2}\,\frac{N_A Z \rho}{A E}\,\ln\!\left(1.166\,\frac{E}{J}\right) \qquad (12)$$
represents the first approximation. For composite targets, the individual stopping powers have to be accumulated so that the relation for S contains a sum of terms like C_i^m (Z_i/A_i) ln(bE/J_i), where the C_i^m are the mass fractions. This sum is often replaced by some energy-independent factor and the resulting expression is then called the Thomson–Whiddington law. Another practically convenient form of the stopping-power relation for elemental targets is (Joy and Luo, 1989)

$$S = 7850\,\frac{\rho Z}{A E}\,\sum_i \frac{Z_i}{Z}\,\ln\!\left(\frac{E}{E_i}\right) \quad [\mathrm{eV/nm}] \qquad (13)$$
where ρ is in g cm⁻³, Z_i is the occupancy of the level i, and E_i its binding energy. Equation (13) is claimed to work down to the binding energy of the outermost occupied level. While the differential cross-sections for the main scattering mechanisms are all proportional to E^−2, the stopping power according to Equations (12) and (13) increases only as E^−1. The validity of approximation (12) is restricted to high energies, notably in the dependence on the ratio E/J, so that for light elements it is acceptable down to about 1 keV. For lower energies, the correction J → J′ = J/(1 + kJ/E) with k ≅ 0.8 is possible. Below E/J = 6.3 the energy dependence of S used to be replaced by S ∝ E^−0.5 (Rao-Sahib and Wittry, 1974) but some authors assert that this parabolic
relation overestimates the energy loss of slow electrons (see Ding and Shimizu, 1996). In the very-low-energy range, the stopping power seems to behave according to the statistical theory of Tung et al. (1979), in which the electrons in the target are considered to form a homogeneous electron gas. Then S ∝ E^{5/2} for all targets (Tung et al., 1979; Nieminen, 1988), which corresponds to the sharp fall obvious in Figure 7. A theoretical model exists also for the most probable electron energy after passing a layer of the target, together with the distribution around this mean value. For SEM applications, the energy distribution of the backscattered electrons, which is mentioned below, is relevant. From the practical point of view, we need to know to what depth the primary electrons penetrate and what is the escape depth of the signal species. Various quantities have been defined to measure these distances and one of them is the attenuation length shown in Figures 8 and 9. The most useful is the electron range R, which can be defined in several different ways according to the method of measurement (see Reimer, 1998). Determination of the electron range is possible via measurement of the number of electrons T(x) passing a foil of a given material with some known thickness x. Because R depends also on the energy E, it is convenient to use one foil thickness and to vary the energy. It is also advantageous to use an extrapolated value R_x (obtained by extrapolating the linear part of T(x) toward T = 0) instead of measuring down to really negligible transmission in order to get some R_max. Most of the experimental data obey a simple law

$$R = aE^n \qquad (14)$$
with a around 10 and n decreasing from 5/3 at high energies to about 4/3 at low energies (Böngeler et al., 1993). This relation seems to be valid down to about 1 keV and only a few data exist below this energy. Salehi and Flinn (1981) verified the power law (14) for the penetration depth using two different amorphous glasses within the energy range 100 to 5000 eV and found n as 1.4 and 1.5, with the larger value for the higher mean atomic number. The theoretical limit for the electron range can be obtained by integrating the stopping power S up to the point where the particle stops, which gives some R_S. Experimental data will provide lower values. According to Reimer (1998), for light elements with Z below about 20 we get R_max ≅ R_S and R_x ≅ 0.75R_S, while for high Z above 50, R_max
toward a nearly energy-independent behavior at 100 eV, with values of 10 nm for Si and 6 nm for Au. One further parameter of the electron beam penetration is the depth distribution of the energy dissipation. The interaction volumes of the beam are roughly the same size for all specimens when distances are measured in the mass thickness ρx. Nevertheless, as the MC simulations show, for light elements the majority of scattering events are concentrated in the central level of the volume and somewhat below it, while for heavy elements more scattering takes place above the central level.
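The range discussion above can be illustrated by integrating the Bethe stopping power of Equation (12), here written in the convenient numerical form with the low-energy correction J → J/(1 + kJ/E) mentioned earlier, and checking that the resulting R_S(E) approximates the power law of Equation (14). All material constants are standard handbook values for Cu inserted as assumptions; this is an order-of-magnitude sketch only.

```python
import math

# Continuous-slowing-down range R_S = integral dE'/S(E') for copper,
# using a Bethe-type stopping power with the J -> J/(1 + kJ/E) correction
# (k ~ 0.8).  J is taken from the standard Berger-Seltzer fit; all numbers
# here are illustrative handbook values, not data from this chapter.
RHO, Z, A = 8.96, 29, 63.55                      # Cu: g/cm^3, Z, g/mol
J_keV = (9.76 * Z + 58.5 * Z**-0.19) / 1000.0    # mean ionization potential, keV

def S_eV_per_nm(E_keV, k=0.8):
    """Corrected Bethe stopping power in eV/nm (E in keV)."""
    Jp = J_keV / (1.0 + k * J_keV / E_keV)
    s_keV_cm = 78500.0 * RHO * Z / (A * E_keV) * math.log(1.166 * E_keV / Jp)
    return s_keV_cm * 1.0e-4                     # keV/cm -> eV/nm

def range_nm(E_keV, E_low=0.05, steps=2000):
    """Trapezoidal integral of dE/S from E_low up to E, in nm."""
    h = (E_keV - E_low) / steps
    Es = [E_low + i * h for i in range(steps + 1)]
    f = [1000.0 / S_eV_per_nm(E) for E in Es]    # (nm/eV) * 1000 = nm per keV
    return h * (0.5 * f[0] + sum(f[1:-1]) + 0.5 * f[-1])

R10, R20 = range_nm(10.0), range_nm(20.0)
n = math.log(R20 / R10) / math.log(2.0)
print(f"R_S(10 keV) ~ {R10:.0f} nm, effective exponent n ~ {n:.2f}")
```

The effective exponent extracted between 10 and 20 keV comes out near 5/3, consistent with the high-energy limit of Equation (14).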
D. Heating and Damage of the Specimen

In introductions to papers dealing with low-energy microscopy, especially those concerning the instrumentation, we often meet formulations stating that low energies are advantageous because of the reduction in specimen radiation damage. However, this opinion is wrong in most instances, at least within the low-energy range usually meant in the statements cited. Although the slow electrons really deliver less energy per impact, their interaction volume shrinks strongly and the spatial density of energy dissipation even increases. If the primary beam spot is considered stationary and its interaction volume hemispherical with radius R/2, the temperature increase at the illuminated point amounts to (Reimer, 1998)

$$\Delta T = \frac{3\,d\,UI}{2\pi c R} \qquad (15)$$
where U and I are the accelerating voltage and beam current, respectively, d is the portion of the incoming power that is dissipated in the specimen, and c [J s⁻¹ m⁻¹ K⁻¹] is the thermal conductivity of the target. Obviously, the heating is proportional to E/R and because of Equation (14) ΔT increases at low energies. This heating is naturally higher for any noncompact material; e.g., for fibers it increases roughly in the ratio of their length to diameter (Reimer, 1998), and cannot be significantly suppressed by surface metalizing because of the insufficient cross-section of the metal layer. When considering the electron probe scanning an island of area S_A of a layer of thickness l ≥ R/2 from a low-thermal-conductivity (e.g., organic) material deposited onto a metal surface, we can derive the simple relation for its heating

$$\Delta T = \frac{d\,jU\,(l - R/4)}{c} \qquad (16)$$
with the illumination current density j = I/S_A. Now we get ΔT decreasing at low energies, but when S_A denotes the size of the field of view on a larger area of this organic material, the lateral heat escape can again reverse the energy dependence of the heating, according to the relation between S_A and R². In the low-energy range, the energies of the emitted SE and BSE are still so different that in spite of the SE yield δ being higher than the BSE yield η, the energy output mediated by SE can be neglected. Thus, d ≅ 1 and it remains nearly constant over the low-energy range. Further, because δ ∝ E^−0.8 (Drescher et al., 1970) at energies sufficiently above the maximum of δ, i.e., above, say, 2 keV (see the next section), we can operate with the primary current decreasing in the same ratio with the decreasing energy. Taking this into account, we find (even in Equation (15)) that the specimen heating slowly decreases at low energies down to about 2 keV, where the increase starts again. The temperature distribution around the moving electron probe was studied by Kohl et al. (1981). For common materials, particularly metals, semiconductors, and even insulators with sufficient thermal conductivity, the temperature increase remains far below 1 K. Nevertheless, some materials such as foams and gels have very low thermal conductivity and their heating might be critical. These specimens were studied by, for example, Brown and Swift (1974), Berry (1988) and Price and McCarthy (1988). Mostly low-energy modes are recommended for sensitive specimens. The direct radiation damage of the specimen material consists first of all in the breaking of chemical bonds, decomposition of molecules, and possibly the release of gaseous components. Because it is generally ionization phenomena that are in question here, the energy dependence can be assessed according to the stopping power S. According to Equation (12), S ∝ ln(E/J)/E, i.e., approximately S ∝ E^−0.8.
This is an even steeper slope than that of E^−1/3, which results from the simplest assumption of the beam energy tIE/e (with t the time) distributed homogeneously into the depth measured by R ∝ E^{4/3}. Hence the spatial density of the radiation damage events increases throughout the low-energy range. For organic molecules the radiation damage mechanisms are diverse and it is not possible to survey them here. Let us briefly mention semiconductors. Incident electrons generate, for example in silicon, electron–hole pairs, and holes can be trapped in the SiO2 layer where they have much lower mobility than electrons. Hence the density of surface states at the SiO2–Si interface increases, which in turn increases the rate of surface recombination, and the generated space charge might even cause layer inversion. For example, in MOS (metal–oxide–semiconductor) structures, the electron bombardment can induce changes in the threshold
voltage, gain, and dark current, i.e., in all the important parameters of the device. Some of these effects disappear only after long heating to temperatures above 250 °C. Nevertheless, significant suppression of these influences has been proved for electron energies below 1 keV, which are also used in IC (integrated circuit) testers. A special kind of radiation damage consists in breaking the carbon bonds to oxygen, hydrogen, and other atoms in hydrocarbon molecules. Owing to their high sticking coefficients, these molecules are always present on the inner walls of vacuum vessels including the specimen surface, unless this has been prepared or cleaned in situ. The carbon atoms then close double bonds and create a "polymerized" graphitic layer, the thickness of which progressively grows owing to the easy diffusion of additional hydrocarbon molecules from the nonilluminated neighborhood of the field of view. Consequently, dark rectangles with darker frames indicate the areas of previous observation; these effects were studied by, for example, Fourie (1976, 1979, 1981) and Reimer and Wächter (1978). At low current densities, the contamination thickness is proportional to the dissipated energy, i.e., to the stopping power (which increases down to the boundary of the very-low-energy range). When all the molecules that have diffused into the illuminated area within a time interval are cracked in that time, the contamination rate saturates and the contaminant accumulation becomes linear in time. As indicated above, the carbon contamination rate increases with decreasing E; according to experience, the situation around 100 to 200 eV is the most critical. Let us underline that, within a certain energy range of a few hundreds of eV, this phenomenon is the main obstacle to the routine taking of micrographs (see, e.g., Figure 65).
Fortunately, from the beginning of the very-low-energy range, this trend is reversed, elastic phenomena start to dominate, and the radiation damage disappears.
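For orientation, Equation (15) can be evaluated for a stationary spot; the beam parameters and thermal conductivities below are assumed round values, chosen to contrast a metal with a poorly conducting material as in the text.

```python
import math

# Stationary-spot heating from Equation (15): dT = 3*d*U*I/(2*pi*c*R).
def delta_T(U_volt, I_amp, R_m, c_thermal, d=1.0):
    """Temperature rise in K; c_thermal in W/(m K), R in m."""
    return 3.0 * d * U_volt * I_amp / (2.0 * math.pi * c_thermal * R_m)

# Assumed beam: 1 keV, 100 pA, electron range R ~ 30 nm.
U, I, R = 1.0e3, 1.0e-10, 3.0e-8
print(f"copper  (c = 400 W/m/K): dT ~ {delta_T(U, I, R, 400.0):.1e} K")
print(f"aerogel (c = 0.1 W/m/K): dT ~ {delta_T(U, I, R, 0.1):.1f} K")
```

A good conductor stays millikelvin-cool while a foam-like material with c ~ 0.1 W m⁻¹ K⁻¹ heats by several kelvin, matching the statement that only low-conductivity specimens are at risk.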
E. Specimen Charging

As already mentioned above, one important reason for using low energies in SEM is to gain an advantage in the perpetual fight of SEM microscopists against specimen charging (see, e.g., Pfefferkorn et al., 1972; Welter and McKee, 1972; Morin et al., 1976). In the next section we will see in more detail that in general the amount of charge emitted from the specimen differs significantly from the incoming charge. The difference is dissipated in the target and when this has a low conductivity, the absorbed current is not carried away from the illuminated area efficiently enough and significant charge density gradients arise. Furthermore, except for glasses and other
FIGURE 10. Typical energy dependence of the total electron emission σ for specimen tilt angles θ₃ > θ₂ > θ₁ indicating the development of charging processes (see text for details).
homogeneous materials, the nonconductors and particularly specimens from the area of the life sciences are as a rule of a heterogeneous and anisotropic nature. Thus, the specimen charging is usually also inhomogeneous, with the result that electric fields that vary strongly both in space and time are created above the surface. These fields destroy the micrograph geometry by deflecting and defocusing the primary beam, and also affect the brightness distribution by influencing the signal electron trajectories toward the detector. Qualitatively the issue can be comprehended from Figure 10. For every specimen, the total electron yield, σ = η + δ, exhibits a maximum, which for the great majority of elements and compounds and for all nonconductors exceeds the value 1.0. When progressing from the conventional SEM energies around 15 keV downwards, the σ(E) curve rises and crosses the unit level at the critical energy E_C2. This is the optimum energy for no charging and we will discuss ways of employing it for practical microscopy in Section VIII. Further, σ(E) reaches its maximum at some energy E_m0 that more or less coincides with the maximum of the SE emission at E_m (see the next section) and then descends, crosses the unit level again at E_C1, and enters the range where no general curve can be drawn owing to the very diverse behavior of the BSE emission. Finally, σ(E) → 1 when approaching the mirror microscopy range at and below zero impact energy. The differences in the curves labeled with angles θ_i express the angular dependence of the emission, which here means the influence of the specimen tilt and also of local inclinations corresponding to the surface relief.
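The behavior of Figure 10 can be mimicked with any total-yield model. The sketch below assumes a simple "universal" form σ(E) = σ_m (E/E_m) exp[2(1 − √(E/E_m))], which is a common textbook approximation and not taken from this chapter, and locates the upper critical energy E_C2 by bisection on the falling branch.

```python
import math

# Assumed universal total-yield curve with peak SIGMA_M at energy E_M;
# both parameter values are illustrative, not measured data.
SIGMA_M, E_M = 2.0, 0.4        # peak yield 2.0 at 0.4 keV (assumed)

def sigma(E_keV):
    u = E_keV / E_M
    return SIGMA_M * u * math.exp(2.0 * (1.0 - math.sqrt(u)))

# Bisection for the crossing sigma = 1 on the falling branch (E > E_M):
lo, hi = E_M, 50.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if sigma(mid) > 1.0 else (lo, mid)
E_C2 = 0.5 * (lo + hi)
print(f"E_C2 ~ {E_C2:.2f} keV")  # negative charging above, positive below
```

For a floating (high leakage resistance) surface bombarded at E₀ > E_C2, iterating the landing energy E = E₀ − eU_S/e against σ(E) drives E toward exactly this crossing, which is the equilibrium point A₁ of the text.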
MÜLLEROVÁ AND FRANK
Suppose now that the primary beam is incident at energy E0 > EC2 on a poorly conducting surface that exhibits a finite leakage resistance RG between the illuminated point and ground; this is a measure of the ability of the specimen to carry the incoming charge away. We can characterize it by a straight line of slope (eRGIP)−1, with IP as the specimen current, which corresponds to a positive surface potential formed on RG. The potential drop across RG partially compensates the negative potential of the accumulated surface charge. The residual local potential decelerates the incoming beam so that its landing energy decreases, causing σ(E) to increase. This iteration continues until an equilibrium is reached at the point A, at the final landing energy E, where the incident and leakage currents are equal, leaving some net surface potential US < 0. Obviously, for RG → 0 no charging occurs, the leakage line is vertical, and E = E0, while for RG → ∞ the line is horizontal, E = EC2 at the equilibrium point A1, and maximum charging takes place. The charging process should be identical for E0 < EC2, only with a positive sign of US, but an important difference is that the positively charged surface reattracts the slowest secondary electrons, so that instead of the "working point" moving along the original curve σ(E), this curve is modified toward lower effective SE emission. This results in a reduced positive surface potential U′S. Let us underline that here we discuss exclusively the above-surface electric fields, which are those influencing the image acquisition. Thus, we consider the specimen illuminated by the given beam current within a certain energy range. Quite a different issue is to interpret the internal charging phenomena and alterations arising inside the specimen due to electron beam bombardment.
These questions have been thoroughly studied by Cazaux and collaborators (see, e.g., Cazaux, 1986, 1996a,b, 1999; Cazaux and Le Gressus, 1991; Cazaux et al., 1991; Cazaux and Lehuede, 1992; Le Gressus et al., 1990). They studied physical phenomena connected with electron bombardment of insulators, electron trapping, internal electric fields, electromigration of ions due to these fields, etc.; when external fields are treated in these works, they are considered as a consequence of internal charge distributions. We will restrict ourselves to a phenomenological assessment of the net surface charge, based on its influence on the σ(E) curve. Thus, we will regard the charging as the creation of a thin charged plate of the same size as the field of view, situated just at the specimen/vacuum interface. Otherwise, even a simple physical model has to adopt a positively charged below-surface layer from which SE are emitted and a deeper negatively charged region where PE are trapped. Between EC1 and EC2, a third layer appears closest to the surface, in which the reattracted SE generate
some negative charge again. The above-surface field is then dependent on the total charge distribution. Furthermore, the σ(E) curve cannot be understood as a static property of the specimen since it exhibits a dynamical behavior, the specimen emission yields being affected by the penetrating charged species. We now look at the dynamics of the charging process based on our phenomenological model; these data will be needed for one important SLEEM application described below. From the σ(E) curve in Figure 10, we easily deduce

σ(E) − 1 = (E − E0) / (eRGIP).    (17)

Let us assume for simplicity that the charging process is characterized by only one time constant τC, so that the accumulated charge develops as Q = Qmax[1 − exp(−t/τC)]. This time constant will now be determined:

τC = Qmax / (dQ/dt)t=0.    (18)

We consider the charged field of view to be a thin circular disc of diameter a and charge density q, situated in a medium of permittivity ε. From the Coulomb law and the principle of superposition, we get the disc potential as US = qa/2ε. Figure 10 gives eUS = E − E0, so that

Qmax ≈ qa² = 2εaUS = (2εa/e)(E − E0).    (19)
At t = 0 the dissipated part of the beam current is [σ(E0) − 1]IP = dQ(0)/dt (< 0 for negative charging). In this equality, we substitute for σ(E0) the first two terms of the Taylor expansion around E and then substitute for σ(E) from Equation (17). Finally we obtain

(dQ/dt)t=0 ≈ [1/(eRG) − (dσ/dE)IP](E − E0).    (20)

The behavior of σ(E) for energies above the critical energy can be estimated, according to the relations η ≈ const and δ ∝ E−0.8 (Reimer et al., 1992), as σ(E) ≈ η + (1 − η)(E/EC2)−0.8, which enables us to take the derivative in Equation (20). Substituting now from Equations (19) and (20) into Equation (18), we find for large RG the final result

τC ≈ 2.5 εaEC2 / [eIP(1 − η)].    (21)
In order to get some quantitative figure, let us use the values η = 0.2, εr = 4, EC2 = 2000 eV, IP = 1 nA as an example. Then the time constant τC varies between 200 μs and 20 ms as the size of the field of view changes from 1 to 100 μm. Practical experience confirms that some charging is nearly instantaneous and might correspond to this range of τC, but afterwards further changes in the image are usually seen for seconds or even longer. We stress that the above calculation differs from the approach often met in the literature (see, e.g., Shaffner and Van Veld, 1971; Welter and McKee, 1972; Cazaux, 1986), which considers the charging dynamics to be identical with the charging of a plate capacitor situated between the specimen surface and a metallic holder; then τC = ρε. At high resistivity ρ, this time constant can be very long, e.g., about 3300 s for SiO2. Nevertheless, we recall that we aimed at getting a time constant for the progress of the image destruction effects caused by the above-surface field, whereas the capacitor field is entirely closed between its plates and hence restricted to the inside of the specimen. A weak point of our approach is that the specimen current is considered to be of an ohmic nature, which might not always be realistic (see Cazaux et al., 1991). The crucial quantity in the above considerations is the critical energy EC2. This quantity can be easily measured on conductors (via the absorbed current) but the opposite holds for the nonconductors that are of interest here; Reimer et al. (1992) summarized possible methods for this case. While in Figure 1 we have the values of EC2 for some conductors, Joy (1989) published values for a choice of technologically important inorganic insulators between 550 and 3000 eV, and for a selection of polymers he found EC2 from 0.4 to 1.8 keV. The dependence of EC2 on the tilt angle θ, governing the scatter of these properties at rough specimens, can be estimated as EC2(θ) = EC2(0) sec²θ (e.g., Joy, 1989). This relation suggests that EC2(60°)/EC2(0) = 4, but detailed studies showed a less steep angular dependence: this ratio was found to be only around 2, and even smaller for insulators (Reimer et al., 1992). In particular, the angular dependence of EC2 should weaken in the low-energy range, together with the same trend regarding δ.
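The equilibrium landing energy defined by Equation (17) and the time constant of Equation (21) can be evaluated numerically. The sketch below is a rough illustration, not code from the original text; the σ(E) model and all numbers are the example values quoted above (η = 0.2, εr = 4, EC2 = 2000 eV, IP = 1 nA). It finds the equilibrium point by bisection between EC2 and E0:

```python
import math

E_C2 = 2000.0     # critical energy [eV] (example value from the text)
ETA = 0.2         # backscattering coefficient eta (illustrative)
EPS0 = 8.854e-12  # vacuum permittivity [F/m]

def sigma(E):
    """Total yield above E_C2: sigma(E) = eta + (1 - eta)*(E/E_C2)**-0.8."""
    return ETA + (1.0 - ETA) * (E / E_C2) ** -0.8

def landing_energy(E0, R_G, I_P, iters=100):
    """Equilibrium landing energy E [eV] from Eq. (17),
    sigma(E) - 1 = (E - E0)/(e*R_G*I_P); with energies in eV the
    right-hand side becomes (E - E0)/(R_G*I_P), R_G*I_P in volts.
    f is positive at E_C2 and negative at E0, so bisection applies."""
    U = R_G * I_P  # leakage voltage scale [V]
    f = lambda E: (sigma(E) - 1.0) - (E - E0) / U
    lo, hi = E_C2, E0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid  # root lies above mid (f decreases with E)
        else:
            hi = mid
    return 0.5 * (lo + hi)

def tau_C(a, I_P, eps_r=4.0):
    """Charging time constant of Eq. (21),
    tau_C ~ 2.5*eps*a*E_C2/(e*I_P*(1 - eta));
    expressing E_C2 in eV cancels the electron charge e."""
    return 2.5 * eps_r * EPS0 * a * E_C2 / (I_P * (1.0 - ETA))

# Example: E0 = 5 keV, R_G = 1e12 ohm, I_P = 1 nA, 1 um field of view
E_eq = landing_energy(5000.0, 1e12, 1e-9)
tc = tau_C(1e-6, 1e-9)
```

For these example numbers the landing energy settles between EC2 and E0, and τC for a 1 μm field of view comes out on the order of a few hundred microseconds.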
F. Tools for Simulation of Electron Scattering

Phenomena connected with electron scattering inside the target, particularly in the case of the multiple scattering met in SEM, are too complex for it to be possible to solve the "direct task," i.e., to reconstruct
the specimen from the SEM image or an image series, as can be done in certain circumstances in the TEM. Thus, efforts were oriented toward simulating the image of a fictitious specimen described by its scattering cross-sections. Two main approaches can be found in the literature: the transport equation approach and the Monte Carlo (MC) procedure. The transport equation is the equation for the phase-space density of electrons, n(r, v, t), dependent on the position vector r, velocity vector v, and time t. All scattering phenomena that influence local values of n(r, v, t), including the generation of the incident beam, can be expressed by integrals containing the corresponding probabilities for movement within phase space (e.g., the probability of scattering from v to v′, for both inelastic and elastic scattering, which differ by whether |v| = |v′| or not), taken over the rest of the phase space. If a stationary case is considered, the terms increasing and decreasing the local density n compensate mutually, composing an integral equation for n(r, v). In order to render the equation solvable, some simplifying assumptions are usually made, such as restriction to semi-infinite, amorphous, and in-depth homogeneous specimens with ideally flat surfaces, to multiple scattering corresponding to the Poisson stochastic process, the inelastic scattering being described by the mean free path, etc.; these assumptions enable one to use the Boltzmann-type classical transport equation (see, e.g., Werner, 1996). The simplest solutions then assume that the scattering is restricted to small angles. Program packages utilizing the MC algorithm also rely on the simplifying assumptions mentioned above, except those regarding the homogeneity, semi-infinity, and flatness of the specimen. Otherwise, similar information about the specimen is needed: the differential elastic and inelastic cross-sections or their equivalents, the differential inverse mean free paths.
Let us briefly summarize the MC procedure for tracing the scattering of one electron (see, e.g., Ding and Shimizu, 1996). The basic concept is the normalized accumulation function A(x) for some probability distribution function P(x) of a physical phenomenon with one parameter x:

A(x) = ∫_{xmin}^{x} P(x′) dx′ / ∫_{xmin}^{xmax} P(x′) dx′.    (22)
Obviously A(x) ∈ (0, 1), and when uniformly distributed random numbers R are taken within this interval and x is calculated from A(x) = R, then after many attempts the assembly of x values obeys P(x). The second basic concept is the rule for deciding which member of a set of n possible phenomena will take place when one must definitely occur. The rule is that the ith phenomenon occurs if

Σ_{j=1}^{i−1} pj / Σ_{j=1}^{n} pj < R < Σ_{j=1}^{i} pj / Σ_{j=1}^{n} pj    (23)
where pj is the probability of the jth alternative. Again, after many attempts the total occurrence of a particular phenomenon corresponds to its probability. Now we can apply Equation (22) to the free path s of the electron, provided the scattering probability distribution is that of the Poisson process, i.e., P(s) = (1/λT) exp(−s/λT) with 1/λT = 1/λel + 1/λin as the total inverse mean free path; we simply get s = −λT ln R. Having determined the free-path section, we use another random number to decide what collision takes place by means of Equation (23), with p1 = λT/λel and p2 = λT/λin. When simulating the scattering in a compound of m different atoms, the elastic scattering phenomenon can be ascribed to the jth atom according to Equation (23) again, now with pj = Cja/λjel, Cja being the atomic fraction. The same procedure can be followed for the inelastic collisions only in alloy-like compounds where the scattering cross-sections can be summed. Otherwise, compound-specific data for 1/λin should be acquired. The scattering angle due to an elastic collision is also calculated from Equation (22) with A(x) = R, provided P(x) is replaced by dσel/dΩ; the scattering angle and energy loss for an inelastic event can likewise be found by using d²λin−1/dΩ dW and dλin−1/dW, respectively, as the probability P(x). When simulating sophisticated processes such as the formation of an angular resolved energy spectrum of electrons with characteristic energies, in which case only very few of the incident electrons create the relevant signal species, simulation of reversed trajectories can also be used (Gries and Werner, 1990), starting at the detector and finishing at the first inelastic collision. Ding and Shimizu (1996) presented MC modeling of SE generation including cascading processes, which produced energy spectra of SE + BSE emission that fitted the experimental data very well.
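The sampling rules of Equations (22) and (23) are simple to implement. The sketch below is an illustrative reimplementation, not code from any of the cited programs: it draws a free path from the Poisson distribution, selects the event type from a list of probabilities, and inverts a tabulated distribution numerically.

```python
import bisect
import math
import random

def sample_free_path(lam_el, lam_in, rng=random):
    """Free path for P(s) = (1/lam_T)*exp(-s/lam_T):
    A(s) = 1 - exp(-s/lam_T) = R inverts to s = -lam_T*ln(1 - R),
    statistically identical to s = -lam_T*ln(R)."""
    lam_T = 1.0 / (1.0 / lam_el + 1.0 / lam_in)
    return -lam_T * math.log(1.0 - rng.random())

def choose_event(probabilities, rng=random):
    """Eq. (23): the i-th event occurs when R falls between the
    normalized cumulative sums up to i-1 and up to i."""
    total = sum(probabilities)
    R = rng.random()
    acc = 0.0
    for i, p in enumerate(probabilities):
        acc += p / total
        if R < acc:
            return i
    return len(probabilities) - 1  # guard against rounding at R ~ 1

def sample_from_table(x_grid, P_vals, rng=random):
    """Eq. (22) for a tabulated P(x) (e.g., d(sigma_el)/d(Omega)):
    build the accumulation function by the trapezoidal rule,
    then solve A(x) = R with linear interpolation."""
    A = [0.0]
    for k in range(1, len(x_grid)):
        A.append(A[-1] + 0.5 * (P_vals[k] + P_vals[k - 1])
                 * (x_grid[k] - x_grid[k - 1]))
    A = [a / A[-1] for a in A]
    R = rng.random()
    k = bisect.bisect_left(A, R)
    if k == 0:
        return x_grid[0]
    t = (R - A[k - 1]) / ((A[k] - A[k - 1]) or 1.0)
    return x_grid[k - 1] + t * (x_grid[k] - x_grid[k - 1])
```

After many samples, the free paths average to λT and each event occurs with a frequency matching its normalized probability, which is exactly the property the text relies on.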
A detailed study of SE emission from SiO2 by MC simulations was performed by Schreiber and Fitting (2002); these data directly relate to the SEM application. Obviously, both transport equation and MC algorithms can be applied on various levels of the physical model, depending on what data for the mean free paths are utilized. For the inelastic scattering, more recent works use the dielectric functions, usually compiled from measured optical data and some dispersion relation; sometimes a combination of discrete events and a continuous slowing-down contribution is used, or even no discrete events
are considered. For the elastic scattering, all the choices from classical Rutherford cross-sections up to the Mott ones can be found, including the scattering on phonons, as described in previous sections of this chapter. Generally the MC algorithm is capable of working down to very low energies, provided it is completed with adequate scattering data; only coherent scattering phenomena and effects connected with the energy band structure are excluded. When in the very-low-energy range the scattered electrons are taken as the Bloch electrons within the energy band structure of the target, the electron trajectory, needed for the MC procedure, has to be extracted from E(k), the tensor of the reciprocal effective mass, etc. Then the electron movement in an electric field F is first considered in the reciprocal space as dk/dt = (2πe/h)F, and transition into the real space is made via the real electron velocity (Schreiber and Fitting, 2002)

v(k) = (2π/h) gradk E = (h/2π) k / [m(1 + 2αE(k))]    (24)

with α as the nonparabolicity parameter of the energy band. The MC methods were developed many years ago by Joy (see Joy, 1995) and his programs are widely used. The work of Ding and Shimizu (e.g., Ding and Shimizu, 1996) is important here, and simulations carried down to even fractions of eV were made by Fitting and co-workers (e.g., Kuhr and Fitting, 1999; Fitting et al., 2001). The MOCASIM program introduced by Reimer (1996) is also popular. Although the MC approach seems to be much more flexible and universal and enables one to model directly various phenomenological quantities, the necessary number of simulated trajectories grows enormously when any spectrum-like data need to be modeled and the computation time exceeds reasonable limits. Specific solutions to the transport equation are then sought, as, for example, in simulation of the energy spectra of BSE (Reimer et al., 1991).
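Writing α for the nonparabolicity parameter of Equation (24), the parabolic limit α = 0 reduces the velocity to the free-electron value, which provides a quick sanity check. The following is our own minimal sketch (SI constants, α in eV⁻¹), not code from the cited simulations:

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant [J*s]
M_E = 9.1093837e-31         # free electron mass [kg]
E_CHARGE = 1.602176634e-19  # [J per eV]

def bloch_velocity(k, E_eV, alpha=0.0, m=M_E):
    """Eq. (24): v(k) = hbar*k / (m*(1 + 2*alpha*E)) for a band with
    nonparabolicity parameter alpha (alpha = 0 gives the parabolic band).
    k in 1/m, E in eV, alpha in 1/eV."""
    return HBAR * k / (m * (1.0 + 2.0 * alpha * E_eV))

def free_electron_k(E_eV, m=M_E):
    """Wave number of a free electron: k = sqrt(2*m*E)/hbar."""
    return math.sqrt(2.0 * m * E_eV * E_CHARGE) / HBAR
```

With α = 0 and E = 1 eV this returns the familiar v = √(2E/m) of about 5.9 × 10⁵ m/s; a positive α flattens the band and reduces the velocity at the same k.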
IV. EMISSION OF ELECTRONS
The previous section provided fundamentals enabling one to comprehend the behavior of the electron emission excited by the impact of primary electrons. We now deal with individual contributions to the total energy spectrum of emitted electrons, shown schematically in Figure 11. This is composed of two main components, SE and BSE emission, which together form the background for electrons with characteristic energies that convey
FIGURE 11. Typical energy spectrum of electrons emitted under the electron beam impact (for explanation of symbols, see text).
spectroscopic information. Secondary electron emission and electron backscattering are quite different phenomena: the first consists of a stream of particles released from atoms by impact ionization, the most important item within the inelastic scattering, while the latter comprises reflected primary electrons that have undergone some elastic scattering events. Nevertheless, owing to the indistinguishability of electrons, there is no way of separating these groups, except according to some statistical quantities connected with their motion after emission, i.e., with their velocity. Even when considering in simulations that the faster electron leaving the inelastic collision connected with ionization is the scattered particle and the slower is the ionization product, we do not arrive at the correct results because of the existence of processes in which the particles exchange their energies (Fitting et al., 2001). By definition, the separation between SE and BSE emission is situated at the threshold emission energy Et = 50 eV, and it is believed that the tails of both distributions, extended beyond Et, compensate each other. This can be the case for high primary energies but, in the low-energy range, the compensation is far from complete, as Fitting et al. (2001) showed by simulating both SE and BSE emissions from Au (see Figure 12). Obviously, below 400 eV the contribution δBE of slow BSE below 50 eV strongly exceeds the contribution ηSE of fast SE above 50 eV, while for EP > 400 eV the opposite but weaker imbalance exists. This indicates that measurements of both yields, with respect to the 50 eV convention, overestimate δ below
FIGURE 12. Contribution ηSE of fast SE (E > 50 eV) to the BSE yield η, together with contribution δBE of slow BSE (E < 50 eV) to the SE yield δ, as calculated for Au. (Reprinted with permission from Fitting et al., 2001.)

a certain energy and underestimate it above the same value, while the opposite holds for η. Nevertheless, any more correct distinction between the two emissions is possible only in simulations and can hardly be made in experiment, so from now on we continue to accept the 50 eV threshold. Naturally, when approaching the very-low-energy range, i.e., somewhere below 100–200 eV, no separation is possible and the total yield has to be considered. In this section we will also deal with phenomenological features of the emissions such as their yields, energy and angular distributions, information depths, and (for eBSE) the coherence.
A. Electron Backscattering

Emission of the backscattered electrons consists of a fraction of electrons reflected without any significant energy loss, the eBSE emission, while the rest, down to 50 eV, are electrons with various energy losses. The eBSE peak is a potential source of information about the probability of incoherent elastic scattering, and hence about relations between elastic and inelastic scattering (Gergely, 1986). To acquire this information, it is necessary to measure the eBSE emission yield ηel with respect to the primary current, namely within the maximum possible
range of emission angles. Measurement on a thin-film-covered substrate under variation of the film thickness is particularly fruitful. The intensity of the eBSE peak generally increases with increasing atomic number and decreasing energy, as does the scattering cross-section σel. Nevertheless, these monotonic dependences break down below about 1500 eV, where the σel(E) curves for different Z start to cross each other (Schmid et al., 1983). Owing to multiple scattering inside bulk specimens in SEM, the discrete peaks connected with ionization losses are usually washed out to a smooth distribution curve of BSE, modulated with low peaks due to Auger electrons (AE) and to plasmon losses. Only under special circumstances can some very weak ionization loss peaks be observed as, for example, the oxygen ionization peak for a specimen covered with a thin oxide layer; the height of this peak can be up to about 10−3 of the eBSE peak (Gergely et al., 1986). The plasmon peaks can normally be observed in electron spectra taken with an Auger electron microprobe (AEM) and used for analytical purposes as in the transmission mode in EELS. The risk of carbon contamination makes it necessary to examine discrete features in the BSE spectrum solely under UHV conditions. Although the UHV-conditioned modes are not excluded from the scope of this text, we do not consider here instruments equipped with analyzers for signal discrimination according to energy. Hence we will not further discuss the reflection EELS (REELS) method, because within the total detected BSE signal the contribution from discrete peaks is negligible. The overall shape of the BSE spectrum exhibits a very broad maximum at EmBSE (see Figure 11). It represents the most probable energy of the emitted BSE and, in addition to the scattering properties of the specimen, it depends also on the experiment geometry, i.e., on the impact and emission angles.
For a tilted specimen and high emission angles (taken from the surface normal), this maximum moves toward EP (Bauer, 1979), and the same holds for higher atomic numbers (Kulenkampff and Spyra, 1954). Reimer et al. (1991) simulated the energy spectra of fast electron backscattering into the full half-space for layered structures and demonstrated that the position and height of the spectral maximum change sensitively with the thickness and material of the overlayer relative to the substrate. Frank (1992b) presented experimental data for similar layered structures, taken in the low-energy range, together with a model interpreting the position of the BSE spectral maximum in terms of the depth of the film/substrate interface and treating the height of this maximum as proportional to the rate of quasi-elastic backscattering on the interface.
Examination of the BSE spectral maximum also requires using an energy analyzer and hence goes beyond the scope of this review, but the mere existence of this feature is important for SEM because it defines the average energy of the BSE signal species, which is crucial for their detection. Nevertheless, even for homogeneous specimens and normal impact, the mean BSE energy varies within a broad range, and is also affected by the acceptance angle of the detector. Most of the available data are taken with the cylindrical mirror analyzer (CMA) of electron energies, in which the input beam is limited between two cones with a mean emission angle of about 42°. In this case one gets EmBSE/EP = 0.83 for Al and 0.87 for Si, but for Cu no maximum develops (Frank, 1992a). Obviously, the energy distribution of BSE should be modeled specifically for a particular detector geometry, and the mean energy of BSE can move anywhere above about 0.6EP. A crucial parameter is the total yield η of BSE. With the normal SEM, it is expected that η will be nearly independent of the energy of the incident electrons and will grow monotonically with the mean atomic number of the specimen, giving the broadly used material contrast in the image. Values of η as well as its energy dependence are available from many sources, but we will abandon citing individual experimental studies of the yields because near-complete experimental data from the literature have been collected by Joy (2001) into a database in which data for the low-energy range can also be found. Nevertheless, the scatter in the published data is significant; e.g., for Al at 1 keV, six values of η can be found in this database, spanning the interval between 0.134 and 0.2346. It is generally assumed that only the SE yield has to be measured on clean surfaces because of the sensitivity of this parameter toward surface contamination. Nevertheless, large variations in the measured BSE yields can also be explained only by the surface status.
In order to confirm this opinion, a targeted study was made, consisting of careful measurement of SE and BSE yields at low energies for 24 conductive elements, both as-inserted into a UHV apparatus and after in situ cleaning by ions. The yields were measured using a device based on a principle published by Reimer and Tollkamp (1980). Here we will quote data from this study (Zadražil and El-Gomati, 2002), which have been only partially published hitherto (Zadražil et al., 1997; Zadražil and El-Gomati, 1998a,b). In Figure 13 the data are shown for in situ cleaned specimens throughout the low-energy range down to 250 eV. Obviously, the η(E) curves for various Z do not tend to one point at the low end of the plot, as was shown in some older published results, but they do so in the dataset for the as-inserted specimens; in Figure 14 we compare the "clean" and "unclean"
FIGURE 13. The BSE yields measured in UHV under normal incidence of primary electrons onto targets in situ cleaned by an ion beam; values at E = 5 keV correspond to the atomic numbers (top to bottom) 79, 78, 82, 73, 72, 74, 64, 50, 47, 41, 40, 30, 42, 29, 48, 32, 28, 24, 26, 23, 22, 14, 13, and 6 (data provided by Zadražil and El-Gomati, 2002).
FIGURE 14. Comparison of the BSE yields for normal impact, measured under UHV conditions for as-inserted (- - - -) and in situ cleaned (——) specimens; data provided by Zadražil and El-Gomati (2002).
data for several elements. It is usual to explain the observations as a consequence of graphitic contamination and oxide layers on as-inserted specimens, which become less transparent at low energies and increasingly contribute to the BSE signal. Then, because of the presence of "standard"
contaminations, very similar data can be obtained for different specimens, at least within groups of similar reactivity. Three of these pairs of measurements were studied in detail by simulation of the BSE yields from those specimens, both clean and covered by a contamination layer of a probable composition and thickness (Frank et al., 2000b). It was found that the observed differences in η due to cleaning can be explained, for example, by the presence of a 3 nm layer of Al2O3·3H2O on Al or 7 nm of carbon on Au. The question of the information depth D of BSE emission was also addressed. At first glance, we can expect a value similar to half the penetration depth, i.e., D ≈ Rx/2. Simulation showed that this is approximately so at 1 keV, while at 3 keV the information depth is 2 to 4 times smaller than Rx/2 (Frank et al., 2000b); Joy and Joy (1996) presented this depth as approximately equal to 0.2RS. These conclusions indicate that, when entering the low-energy range, we have to consider the BSE yield as another surface-sensitive quantity, and when interpreting the SEM image we should take into account data corresponding to the vacuum conditions and surface treatment used. In Figure 13 we notice that, just below 5 keV, the η(E) curves for various Z start to cross each other, which means that the material contrast, i.e., the monotonic η(Z) dependence at constant E, disappears here and cannot be reliably used for interpretation of micrographs in the low-energy range. This represents an additional threshold separating the low-energy range. In the SLEEM instrument the above-surface electric field has to be kept as homogeneous as possible, which restricts the specimen tilt to within narrow limits. Even so, nonnormal electron impact can occur because of relative enhancement of the radial velocity by deceleration.
The BSE yield grows with the specimen tilt so that, according to MC simulations, for example at 1 keV the ratio η(80°)/η(0°) reaches 2.9 for Al, 2.3 for Cu, and 1.8 for Au (Böngeler et al., 1993). The shape of η(θ) does not noticeably vary with the energy of the electron impact. As regards the angular distribution of BSE, at high energies it is circular in the polar diagram, i.e., η(θ) = η(0) cos θ. At energies as low as 1 keV, the same shape of the distribution was simulated for Al, but for heavier elements (Cu, Ag, Au) the distribution is more "pointed," i.e., it increases more steeply at small θ (Böngeler et al., 1993). The same change resulted from the MC simulations of Kuhr and Fitting (1999) at 100 eV, though the Au data appeared much nearer to the cosine law than those for Ag. The eBSE distribution was found to be not only strongly pointed toward the axis but also to exhibit anisotropy similar to that in Figure 3.
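The cosine angular distribution quoted above can be fed directly into the MC machinery of Equation (22). The sketch below is our illustration, not code from the cited simulations; it samples the polar emission angle for an emission density per solid angle proportional to cos θ:

```python
import math
import random

def sample_cosine_theta(rng=random):
    """For dN/dOmega ~ cos(theta) and dOmega = sin(theta) dtheta dphi,
    the polar-angle density is P(theta) ~ cos(theta)*sin(theta), whose
    accumulation function A(theta) = sin^2(theta) inverts (Eq. (22)) to
    theta = arcsin(sqrt(R))."""
    return math.asin(math.sqrt(rng.random()))
```

Averaged over many samples, the mean of cos θ tends to 2/3, and the azimuth is drawn uniformly on (0, 2π); the "pointed" low-energy distributions of the heavier elements would instead need a tabulated η(θ) and a numerical inversion of Equation (22).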
B. Crystallinity Effects

In the previous section we discussed the electron backscattering from a specimen behaving like a homogeneous and isotropic continuum. Experimentally, this corresponds to amorphous substances, but otherwise averaging has to be made over grains of polycrystals or different orientations of a single crystal. In practice, we usually observe details connected with the anisotropy of the scattering properties of crystals. However, electron crystallography is an independent discipline with sophisticated theory and a broad range of experimental data, so within this review we can only briefly touch on several key points. For very thin specimens and higher electron energies, the kinematic theory of diffraction can provide useful results, particularly as regards the geometry of the diffraction pattern from which the crystal structure and orientation can be determined; the basic relation is the Bragg equation

k − k0 = g = g1x1* + g2x2* + g3x3*    (25)
in which k0 and k are the wave-vectors of the incident and scattered wave, respectively, and xi* are the base vectors of the reciprocal lattice, orthonormal to the base vectors xi of the Bravais lattice of the crystal. This equation represents a condition for constructive interference of waves scattered by individual unit cells in the crystal. In SEM and at low energies, large-angle scattering and multiple scattering phenomena always occur together in the signal generation, so that the dynamical diffraction theory is needed. The problem consists in conversion of the incident plane wave into a wave field with the crystal periodicity. This is solved via the Schrödinger equation with periodic potential, with the solution developed into Bloch waves. The topic is again discussed in sufficient detail by Reimer (1998) and by authors cited therein. An important step is the development of the crystal potential into a Fourier series with complex coefficients Vg + iVg′, where g comprises all points of the reciprocal lattice. The imaginary coefficients Vg′ (with the dimension of energy) express the absorption of the electron waves, and V0 is the (mean) inner potential inside the crystal. The Bloch waves have the formal appearance

ψj = Σg aj(g) exp[2πi(kj + g)·r]    (26)

where aj(g) have the periodicity of the crystal potential. The index j should run over all points of the reciprocal lattice but in practice it is sufficient to
consider only the n beams for which the lattice points are near enough to the Ewald sphere. The incident wave then splits into n² partial waves forming n Bloch waves. On substituting Equation (26) into the Schrödinger equation, we get

[K² − (kj + g)²] aj(g) + (2m/h²) Σh≠0 Vh aj(g − h) = 0    (27)

where K = h−1[2m(E + V0)]1/2 is the length of the incident wave-vector inside the crystal. Solution of Equation (27) gives n values of kj and n² values of the amplitudes aj(g) for the incident wave K; naturally, at least two beams should be considered, of which that of g = 0 is the primary beam. It is quite obvious that the intensity of such a scattered wave is anisotropically distributed, which projects itself into anisotropy of the emitted signals, so that their cosine or similar monotonic distributions become modulated. The directions kB of enhanced intensity can be estimated by the Bragg equation, now in the form kB = K + g. Observable consequences of these phenomena include the dependence of η on the crystal orientation (i.e., on the specimen tilt or on the orientation of a crystal grain) and also some structure modulating the η(θ) distribution. Variations in η due to the orientation can be a source of grain contrast at polycrystals. In the two-beam approximation and assuming that the Bloch waves do not interfere, we get (according to Reimer, 1998) for the variation in the backscattering coefficient
Δηa/η0 = 2πD (ω + ξ0′/ξg′) / [ξg′(1 + ω²)]    (28)

where ξg′ is the absorption length, ξg′ = hv/2Vg′, and ω = s ξg is a dimensionless factor in which s = |s| characterizes the deviation from the exact Bragg position (the distance of the reciprocal lattice point from the Ewald sphere) and ξg = hv/2Vg is the extinction length. Further, D is the (already introduced) information depth of the backscattered signal, D = 1/NσB, with σB as the cross-section for scattering through angles greater than 90°. Because, at least down to hundreds of eV, D decreases with decreasing energy faster than ξg′, we find that, among other effects, the grain contrast increases at low energies. Modulation of η(θ) includes all the electron diffraction phenomena that have proved to be sources of extremely interesting image signals in LEEM. In the low-energy range, the Bragg angles θB = arcsin(λ/2dhkl) (with dhkl as the interplanar distance in the real crystal) are still less than 90° and no
MÜLLEROVÁ AND FRANK
regular diffraction pattern is formed by the backscattered waves. An important phenomenon, however, is the formation of EBSP (electron backscattering patterns), whereby diffused BSE, on their return toward the surface, diffract on sets of atomic planes and form Kikuchi bands and lines, which are in fact intersections of so-called Kossel cones with the observation plane. The Kossel cones comprise all the vector directions fulfilling the Bragg equation, so that one cone contains k₀ and the other k, both having g as the cone axis. In EBSP these features can be visible when the backscattered electron suffers only a single elastic scattering event before emission. Thus, a good contrast of EBSP can be achieved solely at high tilt angles, around 70°, and also at higher electron energies. Nevertheless, some modulation of η(θ) should be present even in the low-energy range but, to the authors' knowledge, no successful observation has been made yet. At energies of the order of hundreds of eV, the Bragg angles start to exceed 90° and true diffraction patterns can be formed. At the same time, the penetration depth shrinks, so that the electrons interact with a nearly two-dimensional lattice and the Bragg equation has to be fulfilled only for the vector components parallel to the surface. In other words, the reciprocal lattice, normally three-dimensional with lattice points of a size inversely proportional to the dimensions of the real crystal, is now filled with ``rods'' perpendicular to the surface. In fact the real situation is usually between the two marginal cases, so that the ``rods'' are modulated in ``thickness'' and the third-dimension condition still has to be considered. Formation of a LEED pattern with sharp diffraction spots requires the use of a beam aperture below 1 mrad, as in transmission microscopy. As will be shown below, in SEM the optimum beam aperture, tuned to ultimate resolution, can approach the 1 mrad level, but after deceleration to very low energy the aperture grows to tens of mrad.
Consequently, the diffraction spots extend into discs, as under CBED conditions in STEM. It is a matter of debate whether these mutually overlap, which would further enhance the signal. A position-sensitive multichannel detector situated above the specimen (or in a side position when a through-the-lens detection system, to which the signal electrons are deflected, is employed) enables one to observe local deviations from fulfillment of the diffraction condition. Even a single-channel integral detector will show as brighter those areas where, at the energy used, the Ewald sphere just crosses some reciprocal lattice point. However, owing to dynamical effects, additional maxima of the signal can appear. This imaging mode will be illustrated in Section VIII. Irrespective of the beam aperture, effects connected with channeling of the Bloch functions, such as the formation of Kikuchi lines, take place and are visible in the phenomena outlined above, because their geometry is connected with the crystal structure and not with the beam shape.
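As a numerical illustration of this geometry, the sketch below evaluates the electron wavelength in the nonrelativistic approximation, λ[nm] ≈ 1.226/√E[eV], and the first-order Bragg angle θ_B = arcsin(λ/2d). The interplanar distance d = 0.235 nm is an assumed, purely illustrative value, not one taken from the text:

```python
import math

def wavelength_nm(E_eV):
    """Nonrelativistic electron wavelength, lambda ~ 1.226/sqrt(E) nm (E in eV)."""
    return 1.226 / math.sqrt(E_eV)

def bragg_angle_deg(E_eV, d_nm, order=1):
    """Bragg angle theta_B = arcsin(n*lambda/2d); None when no reflection exists."""
    s = order * wavelength_nm(E_eV) / (2.0 * d_nm)
    return math.degrees(math.asin(s)) if s <= 1.0 else None

d = 0.235  # nm, assumed interplanar distance (illustrative only)
for E in (10000.0, 1000.0, 100.0, 20.0):
    print(E, wavelength_nm(E), bragg_angle_deg(E, d))
```

At conventional SEM energies the Bragg angles stay in the mrad-to-degree range; toward a few eV the arcsine argument exceeds unity for this plane set and no Bragg reflection is possible.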
SCANNING LEEM
C. Coherence within the Primary Beam Spot

A crucial condition for constructive interference of the waves scattered from individual atoms is coherence within the primary spot. This should be assessed in connection with the energy and angular spreads of the beam, which determine whether all contributions from the illuminated spot are summed in amplitude. As discussed by Buseck et al. (1988) for STEM, no amplitude addition between neighboring pixels can appear. The coherence conditions for a single static spot were discussed by Frank et al. (1999). The energy spread in the primary beam is mainly given by the type of electron gun and varies between 0.2 eV for a field-emission cathode at room temperature up to about 2 to 3 eV for thermoemission from tungsten, provided we neglect any additional spreads generated by e–e interactions in crossovers. According to Born and Wolf (1975), the coherence condition for the path difference s, i.e., for the size D_C of the coherently illuminated area, is

D_C = |s| ≤ ⟨λ⟩(2E/ΔE) ≅ 2.45 √E/ΔE   [nm; eV]   (29)
The initial source size, also connected with the type of emission used, gives the maximum diameter of the coherently illuminated diaphragm. If we tolerate a decrease to 88% in the complex degree of coherence from the center to the edge of the illuminated area, then a quasimonochromatic uniform source of angular radius α₀ = ρ/x (see Figure 15) illuminates ``nearly coherently'' a circle of a diameter 2r = 0.16⟨λ⟩/α₀ (Born and Wolf, 1975). Hence a further coherence condition is
D_C ≤ 0.08⟨λ⟩/α₀ ≅ 0.098/(α₀√E)   [nm; eV]   (30)
In diffraction experiments the so-called ``transfer width'' (see Woodruff and Delchar, 1986) plays a role similar to that of the beam coherence area. To understand this concept, we have to recall that the reciprocal lattice points have a ``size'' that is inversely proportional to the crystal dimensions. Hence the diffracted beams have a finite angular size corresponding to the dimensions of the area from which the amplitude addition takes place. If any imperfections of the primary beam exist, such as energy and angular spreads, that cause a change Δk_∥ in the parallel component of the wave-vector, then they also correspond to some distance on the surface, just as we get the surface periodicity length from the Bragg condition, d_hk = 1/k_∥. These lengths are analogously defined as w = 1/Δk_∥, and they determine the maximum distances over
FIGURE 15. Definition of quantities used in the assessment of the primary beam coherence.
which variations in the surface periodicity can be detected. In other words, Δk_∥ now represents a dispersion caused by the finite aperture and energy spread of the illuminating beam. Thus, the area of amplitude addition is limited by the ``angular'' and ``energy'' transfer widths w_α and w_E, so that
D_C ≥ w_α = ⟨λ⟩/(2Δα cosθ) ≅ 0.61/(Δα √E cosθ)   [nm; eV]   (31)
and

D_C ≥ w_E = d_hk (2E/ΔE).   (32)
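To make these conditions concrete, the sketch below evaluates the numerical forms of Equations (29), (31), and (32) as reconstructed here. The beam parameters (landing energy 10 eV, ΔE = 0.3 eV, Δα = 20 mrad, d_hk = 0.3 nm) are assumed, illustrative values, not data from the text:

```python
import math

def coherence_diameter_nm(E_eV, dE_eV):
    """Temporal-coherence limit of Eq. (29): D_C <= 2.45*sqrt(E)/dE (nm; eV)."""
    return 2.45 * math.sqrt(E_eV) / dE_eV

def transfer_width_angular_nm(E_eV, dalpha_rad, theta_rad=0.0):
    """Angular transfer width of Eq. (31): w_alpha ~ 0.61/(dalpha*sqrt(E)*cos(theta))."""
    return 0.61 / (dalpha_rad * math.sqrt(E_eV) * math.cos(theta_rad))

def transfer_width_energy_nm(d_hk_nm, E_eV, dE_eV):
    """Energy transfer width of Eq. (32): w_E = d_hk * 2E/dE."""
    return d_hk_nm * 2.0 * E_eV / dE_eV

# Assumed FEG-like beam after deceleration to a landing energy of 10 eV
E, dE, dalpha, d_hk = 10.0, 0.3, 20e-3, 0.3
print(coherence_diameter_nm(E, dE))          # ~26 nm
print(transfer_width_angular_nm(E, dalpha))  # ~10 nm
print(transfer_width_energy_nm(d_hk, E, dE)) # ~20 nm
```

All three lengths come out in the range of tens of nanometers, i.e., comfortably larger than the primary spots discussed later, which is the point made in the following paragraph.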
Later we will see that in a normal SLEEM configuration all these conditions for sufficient coherence within the primary spot can be satisfied. The diffraction spots can overlap in the radial direction when their angular size, ≅ λ/(d_hk cosθ), is larger than 2α. This can easily happen, and a further intensity increase is then achieved.

D. Secondary Electron Emission

Secondary electrons are released from the target atoms by impact ionization, which forms a substantial contribution to the quantities characterizing
the inelastic scattering. According to conclusions drawn from momentum-resolved coincidence spectroscopy, the main source of SE is the decay of valence band excitations caused by large-momentum-transfer, spatially localized scattering events (Drucker et al., 1993). Further intensive SE generators include the decay of volume and surface plasmons; the yield from electron–electron collisions is substantially weaker. Upon release from an atom, the internal secondary electron possesses a kinetic energy (taken with respect to the bottom of the conduction band) of the order of 10¹ eV. For example, Schreiber and Fitting (2002) studied in detail the SE emission from SiO₂ and found the mean initial kinetic energy to be 13 eV. Owing to further impact ionization and cascading processes, the energy of the SE dropped below 10 eV within 10 fs. Then scattering on phonons dominated, and after 200 fs the electrons were more or less thermalized so that their energy approached 3kT/2, i.e., approximately 40 meV at room temperature. Finally, electron–hole recombination took place, and within 1000 fs the released electrons were almost all recombined or trapped. Emission of an SE therefore has to take place within a very short time after its generation. Data important for understanding the SE signal in SEM were reviewed by many authors, e.g., Bruining (1954), Kollath (1956), Dekker (1958), Hachenberg and Brauer (1959), and Seiler (1983). The measured SE yields are contained in the database of Joy (2001), and we will also quote data from the study targeted at determining the influence of surface cleanliness (Zadražil and El-Gomati, 2002). The SE yield is relatively low at the energies normally used in the SEM; we can verify in the database of Joy (2001) that at 20 keV, δ < η for all except the lightest elements, for which both yields are roughly equal. But at low energies, δ is significantly larger than η; this relation creates another crucial distinction of the low-energy range.
Similarly, the information depth of BSE is normally much larger than that of SE, but this relation also reverses. According to the simulations of Kuhr and Fitting (1998), the relation between the maxima of the depth distributions for SE and BSE from Ag is mutually opposite at electron energies of 3000 eV and 100 eV. The maximum SE yield δ_m, achieved at a certain energy E_m of the incident electrons (located between 100 and 900 eV), remains within 0.5 to 1.7 (Seiler, 1983) or 0.6 to 2.1 (Zadražil and El-Gomati, 2002) for metals. For insulators, owing to the extended escape depth of SE, the yield can reach values even higher than 10 (Seiler, 1983; Joy, 2001), for alkali halides in particular. In Figure 16 a set of data similar to that in Figure 13 is given; these are now values of δ for the same selection of 24 conductive elements.
FIGURE 16. The SE yields measured in UHV with normal incidence of primary electrons onto targets cleaned in situ by an ion beam; values at E = 1 keV correspond to the atomic numbers (top to bottom) 64, 13, 40, 78, 79, 72, 47, 30, 82, 50, 14, 29, 24, 48, 74, 73, 28, 32, 42, 26, 22, 41, 23, and 6 (data provided by Zadražil and El-Gomati, 2002).
FIGURE 17. Comparison of the SE yields for normal impact, measured under UHV conditions for as-inserted (- - - - - -) and in situ cleaned (——) specimens; the dashed curves correspond to (top to bottom) Ag, Al, Pt, Cu, and C at 1 keV (data provided by Zadražil and El-Gomati, 2002).
Further, in Figure 17 the same pairs of measurements are given as in Figure 14, i.e., for the specimen as-inserted into UHV and after being ion-beam cleaned. Here we notice a pronounced similarity between the as-inserted curves, which obviously corresponds to similarly contaminated surfaces, although the specimens were thoroughly precleaned and measured under clean conditions. A semiempirical theory of SE emission, summarized by Seiler (1983), gives a universal (i.e., specimen-independent) curve

δ/δ_m = 1.11 (E/E_m)^(−0.35) {1 − exp[−2.3 (E/E_m)^(1.35)]}.   (33)
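The universal reduced-yield curve of Equation (33) is easy to evaluate; a minimal sketch:

```python
import math

def delta_reduced(E_over_Em):
    """Universal semiempirical SE yield of Eq. (33): delta/delta_m versus E/E_m."""
    x = E_over_Em
    return 1.11 * x**-0.35 * (1.0 - math.exp(-2.3 * x**1.35))

# The curve peaks near E = E_m with delta/delta_m ~ 1 and falls off on both sides
for x in (0.1, 0.5, 1.0, 2.0, 10.0):
    print(x, round(delta_reduced(x), 3))
```

Note that the constants are chosen so that the reduced yield is very close to unity at E = E_m, i.e., at the yield maximum.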
It is also stated that for metals δ_m/E_m is constant (due to the proportionality of both quantities to J^(4/5)) and approximately equal to 2 × 10⁻³ eV⁻¹ (Ono and Kanaya, 1979). Nevertheless, the data tabulated by Seiler (1983) do not indicate constancy of δ_m/E_m, and when extracting this ratio from Figure 13, we find that its average value is around 0.002 eV⁻¹ but the values are scattered between 0.0012 and 0.0053 eV⁻¹. All sources of data confirm that in their Z dependences both δ_m and E_m exhibit a modulation corresponding to the periodic system of elements. This modulation is apparent even when drawing a δ(Z) curve for an arbitrary energy value (Zadražil and El-Gomati, 1998b). At incident electron energies sufficiently higher than E_m, the SE yield decreases as E^(−0.8) (Drescher et al., 1970), which is the energy dependence of the Bethe stopping power. Near and below the yield maximum, no universal relation exists except Equation (33). The energy distribution of the emitted SE has a strong maximum at an energy E_SE^m (see Figure 11), which is smaller for insulators than for metals, for which it moves between 1 and 5 eV, while the width of this distribution, measured at half maximum, ranges from 3 to 15 eV (Schäfer and Hölzl, 1972). Ding and Shimizu (1996) verified that the energy distribution does not depend strongly on the emission angle. The dependence of the position and width of the distribution peak on the material and its surface status was studied by Dietrich and Seiler (1960), Joy (1987), and others. Fitting et al. (2001) found, again for SiO₂, that the value of E_SE^m decreases with increasing escape depth of SE. Chung and Everhart (1974) presented a simple theory leading to a relation for dN_SE/dE_SE for metals. They supposed the surface potential barrier fully transparent for E_SE > 0 (with E_SE measured from the vacuum level) and nonpenetrable otherwise, and the SE generation to be isotropic and depth independent. The resulting
expression was

dN_SE/dE_SE = K E_SE/(E_SE + W)⁴   (34)
(where W is the work function and K is a material constant), giving E_SE^m = W/3. For our next considerations, we need some ``mean'' energy of SE; from Equation (34) the mean value of E_SE is 2W. However, the mean value overestimates the contribution of fast SE, so it is more reasonable to take the median, which here is equal to W. Thus, for detection considerations, we can use 3 to 5 eV as the typical energy of SE. A more exact theory would require incorporation of the processes of SE generation, diffusion inside the target, and penetration through the surface barrier. Reimer (1998) reviewed calculations made for aluminum (see, e.g., Bindi et al., 1980) and hinted at an anisotropy of the internal SE release, which is afterwards quickly randomized, owing to the short mean free path, to the cosine distribution. In practice, the distribution δ(θ) ∝ cosθ is observed generally in all instances (see, e.g., Kanaya and Kawakatsu, 1972). Nevertheless, with single crystals some structure again appears on the angular distribution, caused by channeling of the Bloch functions, as we mentioned for the BSE emission (see, e.g., Burns, 1960). The smooth energy distribution described by Equation (34) can exhibit some additional structure at energies equal to the energies of plasmons. Everhart et al. (1976) observed this structure with aluminum; for an atomically clean surface they found that the energy distribution was broadened and contained features at energies corresponding to surface and volume plasmons. Nevertheless, after very slight oxidation the structure not only disappeared but the main peak also became much narrower. This indicates that SE generation via the decay of plasmons is sensitive to the surface status and is much weaker at ``real'' surfaces. At high energies the dependence of δ on the specimen tilt angle is very important, causing the most pronounced contribution to the image signal, owing to which the SEM image acquires its three-dimensional appearance.
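As an aside, the characteristic values quoted above for the Chung–Everhart spectrum of Equation (34), the peak at W/3, the mean 2W, and the median W, can be checked by direct numerical integration; W = 4.5 eV is an assumed, illustrative value:

```python
# Numerical check of the moments of the Chung-Everhart spectrum, Eq. (34):
# dN/dE proportional to E/(E+W)^4. W = 4.5 eV is an assumed illustrative value.
W = 4.5
dE = 0.002
energies = [i * dE for i in range(1, 1_000_000)]    # grid up to ~2000 eV
weights = [E / (E + W) ** 4 for E in energies]      # unnormalized spectrum
total = sum(weights)

peak = max(zip(weights, energies))[1]               # most probable energy -> W/3
mean = sum(E * w for E, w in zip(energies, weights)) / total   # -> 2W

acc, median = 0.0, None
for E, w in zip(energies, weights):
    acc += w
    if acc >= total / 2.0:
        median = E                                  # -> W
        break

print(peak, median, mean)
```

The numerical mean comes out slightly below 2W because of the finite integration grid; the slow E⁻³ tail of the spectrum is exactly why the mean overweights the fast SE, as argued above.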
This tilt dependence can be written as δ(θ) ∝ sec^n θ, with n decreasing from about 1.3 to 0.8 throughout the Z scale (Seiler, 1983). An extreme demonstration of this dependence is the so-called edge effect, i.e., a strong overbrightening of the side walls of surface steps that dominates micrographs at conventional energies. The phenomenon is simply caused by the SE escape depth being shallower than the penetration depth of the primaries, owing to which any inclined facet represents an additional emitting surface. Thus, the edge effect should disappear at low energies (see Figure 18), near the energy E_m of the maximum SE yield, where all generated SE are emitted; this was
FIGURE 18. Experimental data for the tilt-angle dependence of the SE yield. (Reprinted with permission from Böngeler et al., 1993.)
quantitatively verified by Pejchl et al. (1993). Consequently, the SE contrast at low energies is restricted to the ``shadowing'' connected with the usual side position of the detector, and the image becomes more ``flat'' (see Joy and Joy, 1996). When exciting SE from a single crystal, the monotonic δ(θ) dependence again acquires a structure. This is normally comparable with that of the BSE yield, Δδ/δ ≈ Δη/η (see the comparison made for Si(111) by Seiler and Kuhnle, 1970), but toward low energies Δδ/δ does not grow as distinctly as Δη/η does. Hence any grain contrast in SEM micrographs at low energies is more probably caused by the BSE emission anisotropy. Further studies regarding the angular distribution of δ include those of Salehi and Flinn (1981) and Libinson (1999). An important collection of experimental results concerning the SE emission anisotropy has been acquired by using UHV SEM instruments equipped with detectors featuring an enhanced angular sensitivity, usually achieved by suppression of SE emitted off the direction toward the detector. Thus, Homma et al. (1993) observed alternating 2×1 and 1×2 domains in subsequent atomic layers on Si(100) as well as reconstructed 7×7 domains coexisting with nonreconstructed remains of the 1×1 phase on Si(111). The domains were visible even at an electron energy of 25 keV, but enhanced contrast was demonstrated at 2 keV. Similar instrumentation was used to visualize surface atomic steps, e.g., those on Si(111) (Ishikawa et al., 1985) or on an oxidized Cu surface (Bleloch et al., 1989). Obviously, with careful in situ treatment of the specimen surface, even in the ``incoherent'' SE imaging many phenomena can be observed which would intuitively be
expected to be perceptible solely through diffraction contrast in the LEEM method. An important characteristic is the mean escape depth λ_esc of SE, which governs the information depth of the SE image. The probability of escape P_esc is generally considered to depend exponentially on the depth, i.e., P_esc ∝ exp(−z/λ_esc). Values of λ_esc range between 0.5 and 1.5 nm for metals and between 10 and 20 nm for insulators, while the maximum escape depth is T ≅ 5λ_esc (Seiler, 1967). The larger values of λ_esc for insulators are in accordance with their enhanced SE yield. Fitting et al. (2001) found for SiO₂ that λ_esc decreases with increasing SE energy; for E_SE ≈ 3 eV it amounted to about 10 nm, while for E_SE > 20 eV it dropped below 1 nm. If the escape depth is brought into relation with the electron range R, we get the maximum SE yield at R = 2.3λ_esc (Seiler, 1983). At higher energies the SE generation extends to depths from which no escape is possible, while at lower energies the generation rate (the integral of the stopping power along the trajectory of the incident electron) diminishes. The shallow escape depth, together with the sensitivity toward the ionization energies of the least bound electrons, makes the SE emission very sensitive to the surface status, its cleanliness and contamination, and also to radiation damage. At conventional SEM energies, the secondary electron signal is composed of so-called SE1 and SE2 contributions, the first being excited directly by PE while the latter are due to BSE returning toward the surface. While SE1 escape from an area whose diameter is approximately (d_P² + λ_esc²)^(1/2), with d_P as the primary spot size (see, e.g., Everhart and Chung, 1972), the SE2 emission spot is broadened by the lateral diffusion of BSE, so that the specimen response function consists of two bell-shaped features of different width.
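The exponential escape law and the quoted maximum escape depth T ≅ 5λ_esc can be illustrated with a short sketch; λ_esc = 1 nm is an assumed value from the metal range quoted above:

```python
import math

def escape_probability(z_nm, lambda_esc_nm):
    """Relative escape probability of an SE generated at depth z: exp(-z/lambda_esc)."""
    return math.exp(-z_nm / lambda_esc_nm)

lam = 1.0  # nm, assumed; metals fall in the 0.5-1.5 nm range quoted in the text
for z in (0.5, 1.0, 2.0, 5.0):
    print(z, escape_probability(z, lam))
# At z = 5*lambda_esc the probability is below 1%, consistent with
# T ~ 5*lambda_esc being quoted as the maximum escape depth.
```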
We will discuss this later in connection with the image resolution, but let us mention now that the total SE yield is usually written as

δ = δ_PE + δ_BSE = δ₀[secθ + βη(θ)]   (35)

where δ₀ is the SE1 emission at normal impact of PE and β denotes the ratio of the SE yields due to BSE and to PE. The dependence β(E) is decreasing (Seiler, 1983) and β > 1, because BSE have lower energies than PE and their trajectories are generally also more inclined with respect to the surface normal. Above about 10 keV, we get β ≅ 2.5 with only weak material and energy dependences. For low energies, when the electron range approaches the escape depth of SE, this approach, as well as any distinction between SE1 and SE2, becomes questionable. Nevertheless, at least at the beginning of the low-energy range, i.e., down to, say, 2 to
3 keV, Equation (35) can be considered, probably with an increasing value of β. The role of BSE in the SE emission has been studied by numerous authors (e.g., Kanter, 1961; Kanaya and Kawakatsu, 1972; Joy, 1984; Hasselbach and Krauss, 1988; Böngeler et al., 1993). For us the distinction between SE1 and SE2 is of minor importance, because in SLEEM the standard detectors acquire the total emission σ = δ + η. Nevertheless, we should be aware that, even at low energies, the SE yield from surface films depends on the underlying substrate, and when the two materials have very different Z, the change in δ with the film thickness is very strong, so that in fact the SE2 contribution prevails over that of SE1; see, e.g., the measurements of Thomas and Pattinson (1970). As regards the noise in SE emission, it is usually considered to follow the Poisson distribution. This was proved for energies below 250 eV (Seiler, 1983), but at higher energies some excess noise content is found (see Reimer, 1971) because of the SE2 contribution. This question does not seem to have been fully answered. Finally, let us mention that SE emitted from ferromagnetics are spin polarized (Kirschner, 1984). The degree of polarization is nonnegligible even for very low-energy electrons and increases with decreasing E, the highest polarization being found for the slowest SE. The effect is explained by the different reflectivity of electrons with different spin orientations. This phenomenon would enable one to observe the domain contrast if a detector of polarized electrons were available.
V. FORMATION OF THE PRIMARY BEAM
We have already touched on the important circumstance that in SEM the specimen represents a part of the imaging system. The information collected, coming from the entire interaction volume of the primary beam, is ascribed to a single point labeled by pixel coordinates so that the response function of the specimen, i.e., distribution of the signal excited by a monochromatic infinitely narrow incident pencil, has to be taken into account when assessing the resolution. However, incorporation of the specimen properties prevents us from drawing general conclusions about the instrument quality so that it is usual to evaluate an ‘‘intermediate’’ quantity, namely, the current distribution in the primary beam spot entering the specimen. We will do the same and afterwards we extend the discussion toward the concept of the ‘‘real’’ resolution on a particular specimen.
A. The Spot Size

Within the scope of this text, we cannot go into details of the electron optical theory of the SEM column, of the lens aberrations and their combinations, and related problems. Let us only mention that correct results, particularly for coherent or nearly coherent illumination by various types of field-emission guns, can be obtained only by the wave-optical theory of electron probe formation, which regards lenses as diaphragms filled with a phase-shifting medium that deforms and trims the wavefronts. However, our aim is to explain, using relations as simple as possible, the specifics of low-energy spot formation, and hence we will utilize the simplest approximate figures obtained from the geometric optical theory. For more details we can refer to Reimer (1998) and particularly to an exact analysis of the topic made by Hawkes and Kasper (1996b). We will simply consider the primary spot as a convolution of the current distribution within the demagnified image of the gun crossover with the discs of confusion of the basic aberrations. Assuming the astigmatism and defocusing fully corrected, we take into account contributions to the spot size expressed in the form of discs of confusion, the sizes of which are
d_G = (4I/π²β)^(1/2) (1/α),  d_S = K_S C_S α³,  d_C = K_C C_C (ΔE/E) α,  d_D = K_D λ/α   (36)
where d_G is the demagnified crossover, d_S, d_C, and d_D are the discs of spherical, chromatic, and diffraction aberration, respectively, I is the beam current, β is the gun brightness, α is the specimen-side angular aperture of the primary beam, C_S and C_C are the coefficients of spherical and chromatic aberration, respectively, and K_S, K_C, and K_D are numerical factors dependent on the model of the spot formation. Here the least-confusion planes are assumed for the spherical and chromatic aberrations, and the final aperture-limiting diaphragm is considered uniformly illuminated. When using the full beam diameters in the least-confusion planes of the spherical and chromatic aberrations and the FWHM of the Airy disc for the diffraction aberration, we get the numerical factors as K_S = 0.5, K_C = 1, and K_D = 0.6. The next step is to select a summation rule for combining the contributions of Equation (36) into the overall spotsize d_P. It is traditional to consider the ray radii in the individual discs as mutually independent random variables with normal distributions. Then the
summation rule is given by a convolution of Gaussian functions, the result of which is also Gaussian, and

d_P² = d_G² + d_S² + d_C² + d_D².   (37)
In fact, the individual contributions are neither independent nor normally distributed so that Equation (37) provides only a rough estimate of dP. A more realistic but still reasonably simple relation is obtained by defining the disc sizes as the diameter encircling some current fraction. Using this approach, Barth and Kruit (1996) derived the summation rule (for 50% of encircled signal) dP2 ¼
h
dS4 þ dD4
1:3=4
þdG1:3
i2=1:3
þdC2
ð38Þ
and determined modified values of the numerical factors, namely K_S = 0.18, K_C = 0.34, and K_D = 0.54. Other summation rules exist that provide more exact but at the same time more complicated relations for the spotsize (see, e.g., Kolařík and Lenc, 1997), but we shall use the summation rules of Equations (37) and (38) and compare their results. First of all, let us make the following simple observation: when the electron energy E decreases, the wavelength increases as λ ∝ E^(−1/2). This causes the Airy disc to extend, and in order to suppress the impact on resolution, we have to adjust the beam aperture to the same slope, α ∝ E^(−1/2). But then the spherical and chromatic aberration discs grow as d_S ∝ E^(−3/2) and d_C ∝ E^(−3/2). The same energy dependence would in turn apply to the total spotsize d_P, fully preventing any use of very low energies. To compensate for this, an objective lens would be needed with aberration coefficients C_S and C_C proportional to E^(3/2). However, normal magnetic lenses have energy-independent aberration coefficients. It is true that, for example, for weak lenses C_S is proportional to f³ (Glaser, 1952), i.e., in fact to E³, but after changing the beam energy we have to refocus onto the same specimen plane and hence get the same f and also the same C_S. Consequently, the spotsize enlarges at low energies. The optimum angular aperture α_opt for achieving the ultimate resolution d_P^m is simply calculated from the relation ∂d_P/∂α = 0. In Figure 19 the function α_opt(E) is plotted from a beam energy of 15 keV downwards for both of the above-given summation rules and for two model SEM instruments of different quality. These are defined in order to span the current instrumentation scope; the first, ``TEG SEM,'' represents old instruments probably no longer marketable but still serving in plenty of laboratories
FIGURE 19. The optimum angular aperture, α_opt, for the smallest spotsize, plotted versus electron energy. TEG SEM and FEG SEM denote the two sets of SEM parameters given in the text; the dashed line corresponds to the summation rule (37) and the full line to the rule (38).
while the other, ``FEG SEM,'' is for high-quality modern devices. The parameters were chosen as β = 10⁵ A cm⁻² sr⁻¹, I = 5 pA, ΔE = 2 eV, C_S = 50 mm, C_C = 20 mm for the TEG SEM, and β = 10⁹ A cm⁻² sr⁻¹, I = 100 pA, ΔE = 0.2 eV, C_S = 1.9 mm, C_C = 2.5 mm for the FEG SEM. Naturally, there might be queries about individual parameters but, as we will see, the basic trends that we are now seeking are independent of these details. One general trend is obvious already from Figure 19: along the low-energy range, all curves progressively acquire the same slope, α_opt ∝ E^(1/4). This behavior can easily be obtained from Equation (37) when we retain in it only the members growing at low energies, i.e., d_C and d_D. When substituting α ∝ E^(1/4) into all the terms listed in Equation (36), we get the proportionalities d_G ∝ E^(−1/4),
d_S ∝ E^(3/4),  d_C ∝ E^(−3/4),  d_D ∝ E^(−3/4)   (39)
so that the influence of d_C and d_D dominates, and hence the same slope can be expected for d_P. Figure 20 shows the d_P(E) plot for α = α_opt, which confirms this behavior, again independently of the summation rule and the instrument parameters. The foregoing very simple considerations have yielded the general relation for SEM, namely the proportionality of the spotsize to E^(−3/4). This says that when we want to turn from a conventional energy like 15 keV to units of eV, the resolution in nanometers deteriorates to the same number
FIGURE 20. The ultimate spotsize, d_P^m, for the optimum angular aperture α_opt, calculated for the two sets of SEM parameters denoted TEG SEM and FEG SEM (see text) from the summation rules (37) (- - - - - -) and (38) (——).
in micrometers, i.e., below the level of a standard optical microscope. The proportionality to E^(−3/4) seems to be broken by the parameters of some recent microscopes, which guarantee a spotsize at 1 keV only about three times larger than that at 15 keV, but the improvement is achieved at the cost of a shortened working distance, reduced current, and other restrictions (see, e.g., Nagatani et al., 1987). In general, a conventional SEM without aberration correctors can work at an acceptable quality of micrographs down to 1 keV. Because the E^(−3/4) slope does not depend on the instrument class, we will not discuss in detail the methods of optimizing the objective lenses and detection systems toward improved resolution at low energies. These mostly rely on placing the specimen very close to or even inside the magnetic field, which in turn imposes some limitations on other parameters of the microscope operation. Among possible configurations, the so-called single-polepiece lens (Mulvey, 1984), with the second polepiece shifted far from the optic axis and the primary spot, attracts the most attention. Various configurations based on the single-polepiece principle were studied by Pawley (1984), Bode and Reimer (1985), Shao (1989), Müllerová et al. (1989), Ximen et al. (1993), and others. Some setups achieved very low aberration coefficients, like the C_S = 0.15 mm and C_C = 0.55 mm of Tsai and Crewe (1998).
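The E^(−3/4) trend derived above can be reproduced numerically from Equations (36) and (37). The sketch below uses the FEG SEM parameters quoted in the text (brightness converted to SI units) and a simple brute-force search for α_opt; the factors K_S = 0.5, K_C = 1, K_D = 0.6 are those of the least-confusion model:

```python
import math

def discs(E_eV, alpha, I=100e-12, beta=1e9, dE=0.2, Cs=1.9e-3, Cc=2.5e-3):
    """Discs of confusion of Eq. (36), in metres. Defaults are the FEG SEM
    parameters of the text (beta given in A cm^-2 sr^-1, converted below)."""
    beta_SI = beta * 1e4                      # A m^-2 sr^-1
    lam = 1.226e-9 / math.sqrt(E_eV)          # nonrelativistic wavelength
    dG = math.sqrt(4.0 * I / (math.pi**2 * beta_SI)) / alpha
    dS = 0.5 * Cs * alpha**3                  # K_S = 0.5
    dC = 1.0 * Cc * (dE / E_eV) * alpha       # K_C = 1
    dD = 0.6 * lam / alpha                    # K_D = 0.6
    return dG, dS, dC, dD

def spot_gauss(E_eV, alpha):
    """Gaussian quadrature summation rule of Eq. (37)."""
    return math.sqrt(sum(d * d for d in discs(E_eV, alpha)))

def optimum(E_eV):
    """Brute-force search for alpha_opt (0.1 mrad grid) and the ultimate spotsize."""
    best = min((spot_gauss(E_eV, a * 1e-4), a * 1e-4) for a in range(1, 3000))
    return best[1], best[0]

for E in (15000.0, 1000.0, 100.0, 10.0):
    a_opt, d_min = optimum(E)
    print(E, a_opt, d_min)
# The ultimate spotsize grows roughly as E^(-3/4) toward low energies,
# and alpha_opt shrinks roughly as E^(1/4), cf. Figures 19 and 20.
```

With these assumed parameters the spot is on the nanometer scale at 15 keV and on the 100 nm scale at 10 eV, i.e., roughly the thousandfold deterioration stated in the text.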
B. Incorporation of the Retarding Field

A qualitative step forward as regards the possibilities of SEM operation throughout the full energy scale was achieved by introducing a nonconstant beam energy along the column. The idea is to form and transport the beam at high energy and to retard it to the final low energy only close to the specimen. The underlying principle consists in one property of immersion electrostatic lenses, namely that the magnitude of their aberrations corresponds to the higher of the electron energies on either side of the lens. So an immersion lens, i.e., an electrostatic lens with different potentials on the marginal electrodes, can be inserted into the end part of the column with the negatively biased electrode toward the specimen. The fundamentals of configurations utilizing this principle were studied in detail by Frank and Müllerová (1999). For estimation of the aberrations of the immersion lens we use the approximate equation (Lenc, 1995)

C_S ≅ C_C ≅ (1/2) ∫ from z₀ to z₁ of [φ₀/φ(z)]^(3/2) dz   (40)

where the interval (z₀, z₁) spans the transition region of the potential φ(z) between φ₀ and φ₁. If we consider the electrostatic field strength abruptly changing in the planes of flat electrodes held at φ₀ and φ₁, we get

C_S ≅ C_C ≅ w/2 + l/[√k(√k + 1)]   (41)

(see also Lencová, 1997), with w and l being the distances between the specimen and the first electrode and between the electrodes, respectively, and k the ratio of the electron energies on either side of the lens, i.e., k = E_P/E (E_P is the beam energy in the SEM column and E = E_P + eU_b, with U_b being the retarding potential, is now the lowered energy of impact on the specimen). In Figure 21 we see that the approximation (41) differs appreciably from the results obtained when substituting real potential distributions into Equation (40), but on its basis we can still make at least one simple consideration. At very low energies, i.e., for high values of k, both C_S and C_C approach w/2. So they are still independent of energy but can be quite small. However, for w small enough, both coefficients are approximately proportional to l/k = (l/E_P)E and hence diminish with decreasing energy, as we required in the previous section. Of course, the aberrations according to Equation (41) combine with the aberrations of the magnetic objective lens, but
SCANNING LEEM
FIGURE 21. The aberration coefficients, C_S and C_C, of the immersion electrostatic lens plotted versus the working distance w, with both axes scaled by the length of the retarding field l. (a) Approximate Equation (41) for abrupt field transitions; (b) and (c) calculation from Equation (40) for real potential distributions with the first electrode (nearest to the specimen) of a thickness t = 0.1 l (b) and t = 0.2 l (c).
those are weighted in the summation rule by k^{-3/2} ∝ E^{3/2} (Lencová, 1997), which is exactly the energy dependence that fully suppresses the worsening of resolution at low energies. Obviously, the immersion objective lens eliminates the deterioration of the objective lens parameters for slow electrons and introduces its own, but weaker, tendency toward a larger spotsize. Figure 22 shows the most popular design of a compound lens consisting of a magnetic focusing lens and an electrostatic retarding lens (Frosien et al., 1989), which, together with the above-lens detector, is also called MEDOL (magnetic–electrostatic detector objective lens). The authors report an improvement in the aberration coefficients from C_S = 59 mm and C_C = 15 mm to C_S = 3.7 mm and C_C = 1.8 mm at the immersion ratio k = 17, so that a resolution of 5 nm at 500 eV was achieved (Martin et al., 1994). This design was also used in the first, and still the only, commercial SEM with a retarding field element, and its parameters have subsequently been further upgraded. With a similar configuration, Knell and Plies (1998) obtained 3 nm at 1 keV and 9 nm at 200 eV. The MEDOL-type lens was preceded by a purely electrostatic (three-electrode) lens by Zach and Rose (1988), called EDOL (electrostatic detector objective lens, see Figure 34); further data were then published by Zach (1989) and Zach and Haider (1992). For a beam energy of 8 keV inside the column, they applied the electrode potentials −7.5, +7, and 0 kV (when proceeding from the specimen) and
MÜLLEROVÁ AND FRANK
FIGURE 22. Combined magnetic–electrostatic (compound) objective lens. (Reprinted with permission from Frosien et al., 1989.)
hence reached a landing energy of 500 eV, for which a resolution of 7 nm was reported. Other configurations based on a similar principle include the use of a so-called "booster," i.e., a tube around the optic axis between the anode plane and the lower polepiece of the objective lens, insulated and held at a high positive potential (Beck et al., 1995), so that its lower end fully corresponds to the arrangement in Figure 22. Preikszas and Rose (1995) explored the possibilities of optimizing compound lenses, taking into account the maximum feasible magnetic and electric fields (they considered 5 kV mm⁻¹ and 1 T as limiting values), tolerable fields at the specimen surface, bore diameters in electrodes and polepieces, the maximum immersion ratio, and the energy spread in the beam. Khursheed (2002) also examined the aberrations of a set of compound lens configurations. Let us only briefly mention that adjacent to the SEM instrumentation area is the family of IC testers, i.e., specialized scanning devices for the inspection of semiconductor structures and measurement of critical dimensions on them (see, e.g., Ezumi et al., 1996). Their recent versions work nearly exclusively in the low-energy range around 1 keV, employ
various combinations of compound lenses with energy filters (e.g., Frosien and Plies, 1987) and detectors, and achieve resolutions comparable with those mentioned above. Practice has confirmed the advantages of using the retarding field principle, i.e., immersion or compound lenses, for SEM in the low-energy range. In recent commercial instruments acceptable imaging parameters have been achieved down to about 200 eV, and the limit for reported laboratory configurations and IC testers is similar. A separate class is formed by the first operating versions of aberration correctors. These are capable of achieving the resolutions quoted above even in a device with the beam energy constant within the column. Possible corrector configurations were reviewed by Rose (1987), Rose and Preikszas (1992), Hawkes and Kasper (1996a) and Hawkes (1997). The aberration correctors are, nevertheless, mostly applied to STEM, TEM, and LEEM instruments, where the specimen influence on the real image resolution is either nearly negligible or does not apply, so that any spotsize correction is more efficiently projected into the final result. Only a few applications in SEM have been reported to date; these were briefly reviewed by Frank (2002).

C. The Cathode Lens

In the previous section we noticed that for a very short working distance w of the retarding immersion lens, the aberration coefficients diminish with decreasing electron energy. A promising alternative is thus to choose w = 0, i.e., to apply the retarding potential directly between the specimen and an anode placed closely above it. This configuration is called a cathode lens (CL) and has been known since the beginnings of electron microscopy as the crucial component of emission electron microscopes. As we already mentioned in Section I, Recknagel published the fundamental theory of this optical element as early as 1941 and showed that its basic aberrations are proportional to the ratio of the initial and final electron energies.
The same should be expected for the reversed function in the SEM, and this is indicated by Equation (41). More exact analytical relations for C_S and C_C for a combination of the cathode lens with a focusing magnetic objective lens having aberration coefficients C_Sf and C_Cf were derived by Lenc and Müllerová (1992b):

C_S = (l/k^{3/2}) { [2(√k − 1)²/(√k + 1)] [1 + (l/D) √k/(√k + 1)] + [(3√k − 1)/(2√k)]⁴ (C_Sf/l) }   (42)
C_C = (l/k^{3/2}) { (√k − 1)²/(√k + 1) + [(3√k − 1)/(2√k)]² (C_Cf/l) }   (43)
with D the diameter of the anode bore. Instead of an abrupt potential transition in the electrode plane, a quadratic polynomial shape of the potential was considered here. For our simple characterization of the energy dependences, a development of Equations (42) and (43) into power series for large k (i.e., small E) gives relations that are easier to grasp:

C_S ≅ (2l/E_P)(1 + l/D) E + (81/16)(C_Sf/E_P^{3/2}) E^{3/2} + …,

C_C ≅ (l/E_P) E + (9/4)(C_Cf/E_P^{3/2}) E^{3/2} + …   (44)
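As a numerical cross-check, the full expressions for C_S and C_C can be compared with their large-k power series. The sketch below encodes Equations (42)–(44) in the forms used in this section (readers should verify them against Lenc and Müllerová, 1992b); the parameter values l, D, E_P, C_Sf, C_Cf are illustrative choices, not data quoted in the text.

```python
import math

def cs_cc_full(k, l, D, Csf, Ccf):
    """Cathode lens combined with a magnetic focusing lens, Eqs. (42), (43)."""
    s = math.sqrt(k)
    Cs = (l / k**1.5) * (2 * (s - 1)**2 / (s + 1) * (1 + (l / D) * s / (s + 1))
                         + ((3 * s - 1) / (2 * s))**4 * Csf / l)
    Cc = (l / k**1.5) * ((s - 1)**2 / (s + 1)
                         + ((3 * s - 1) / (2 * s))**2 * Ccf / l)
    return Cs, Cc

def cs_cc_series(E, EP, l, D, Csf, Ccf):
    """Leading terms of the large-k (small-E) expansion, Eq. (44)."""
    Cs = 2 * l / EP * (1 + l / D) * E + 81 / 16 * Csf / EP**1.5 * E**1.5
    Cc = l / EP * E + 9 / 4 * Ccf / EP**1.5 * E**1.5
    return Cs, Cc

# Illustrative parameters: lengths in metres, energies in eV
l, D, EP = 5e-3, 3.5e-3, 15e3
Csf, Ccf = 20e-3, 15e-3
for E in (1.0, 3.0, 10.0):          # landing energies
    k = EP / E
    print(E, cs_cc_full(k, l, D, Csf, Ccf), cs_cc_series(E, EP, l, D, Csf, Ccf))
```

For landing energies of a few eV the full and series values agree to within a few percent, confirming that the expansion captures both the E¹ slope of the cathode lens terms and the E^{3/2} weighting of the focusing lens terms.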
Equation (44) confirms the conclusions of the previous section: the immersion lens introduces the E¹ slope for both the spherical and chromatic aberrations but eliminates the energy dependence of the focusing lens aberrations via the weight proportional to E^{3/2}. The same holds for the "aperture lens," i.e., the optical power of the CL field penetrating the anode bore and forming a divergent lens, as we will discuss below. If we now substitute Equations (42) and (43) into Equation (36), then into Equation (38), and finally calculate again the optimum aperture α_opt, we obtain the results shown in Figure 23. (In this section we complete the two sets of model parameters, FEG SEM and TEG SEM, with D = 3.5 mm, E_P = 15 keV, and l = 1.5 and 15 mm.) We see that the optimum angular aperture in the specimen plane is, at least at the lowest energies, proportional to E^{-1/4}. When substituting this into all four contributions to the spot size (Equation (36)), we get d_S proportional to E^{1/4}, while d_G, d_C and d_D scale as E^{-1/4}, and the E^{-1/4} trend can also be expected for d_P. Because previously we found that these basic proportionalities are the same for both summation rules, with Equation (38) simply providing a 1.6 times larger aperture and a 1.8 times smaller spotsize, we used only one rule here. It is important to note that the optimum angular aperture just below the focusing lens, i.e., the beam aperture α_C formed by the microscope column, remains nearly the same when the cathode lens is switched on. Hence the SLEEM mode does not require any significant realignment of the column. In Figure 24 we again see a comparison of the calculated ultimate spotsizes, d_Pm, for the two sets of SEM parameters defined above; the summation rule (38) was used again. Obviously, the slope E^{-1/4} is actually reached at low energies, namely in the energy range where the higher
FIGURE 23. The optimum angular aperture, α_opt, for the smallest spotsize, plotted versus electron energy for a CL-equipped SLEEM. TEG SEM and FEG SEM denote the two sets of SEM parameters given in the text; the summation rule (38) was used with C_S and C_C substituted from Equations (42) and (43), respectively. For the cathode lens mode, the aperture is shown both between the focusing and cathode lenses (— —) and in the specimen plane (- - - - -); for the latter case the aperture without CL is also shown (———). The numeric labels denote the maximum field within the CL in kV mm⁻¹.
members in Equation (44) become negligible. For larger aberrations of the focusing lens this happens at lower energies, so that, quite paradoxically, the overall drop in resolution between the primary beam energy and, say, 1 eV is smaller for the lower quality device: for the TEG SEM and 10 kV mm⁻¹, these spotsizes are identical in Figure 24. Figure 24 demonstrates one crucial fact: below some threshold of the order of hundreds of eV, even a routine microscope, when equipped with the cathode lens, surpasses the top-quality device as regards image resolution. This advantage is paid for by the fact that the specimen has to be immersed in the electrostatic field, the strength of which governs the spotsize. The optimum aperture varies with energy and is therefore not convenient to use when acquiring a series of micrographs typical for SLEEM operation, i.e., showing the same field of view over a broader energy range. In this case some fixed angular aperture is adjusted, and it is interesting to enquire how this modifies the resolution vs. energy curve. In Figure 25 we see that when fixed apertures are chosen from among those optimum for a certain energy within the low-energy range, the deterioration at higher energies
FIGURE 24. The ultimate spotsize, d_Pm, for the optimum angular aperture α_opt, calculated for the two sets of SEM parameters denoted TEG SEM and FEG SEM (see text) from the summation rule (38): the conventional SEM mode without CL (- - - - -) and the SLEEM mode with the CL excited (———), namely for the maximum field strengths labeled in kV mm⁻¹.
FIGURE 25. The ultimate spotsize, d_Pm, for the optimum angular aperture α_opt, calculated for the model FEG SEM parameters (see text; maximum CL field 10 kV mm⁻¹) from the summation rule (38) (- - - - -), together with the resolutions obtained for three fixed angular apertures of 1, 2, and 4 mrad (———).
FIGURE 26. (a) Simplest configuration of an SEM with the cathode lens introduced below the objective lens; (b) single-polepiece magnetic lens (SPL) installed below the specimen and serving as the focusing lens, while the original objective lens is either switched off or used as an additional condenser lens (in the SLEEM mode, the anode/detector assembly was inserted radially from the side to below the OL).
is only moderate and, in some instances, a resolution really constant throughout the energy scale is obtained. In the previous paragraphs we concentrated on simple relations concerning the energy dependences of the beam aperture and spotsize. We assumed the electrostatic and magnetic fields of the immersion and focusing lenses to be nonoverlapping and, furthermore, the shapes of the electrodes and polepieces were not taken into account. The simplest arrangement, shown in Figure 26(a), can also be realized via adaptation of a conventional SEM (Müllerová and Frank, 1993), as will be mentioned below. An electrostatic focusing lens was used in LEEM by Liebl and Senftinger (1991), while Müllerová and Lenc (1992b) applied the single-polepiece magnetic lens to SLEEM (see Figure 26(b)). Khursheed (2002) compared the ultimate resolutions achievable in three configurations that included the specimen inserted into the magnetic field without any retarding, and both nonoverlapping and overlapping magnetic focusing and electric retarding fields. Using a simple model of very thin electrodes and the bell-shaped magnetic field (Glaser, 1952), he found that the overlapping fields provide a 1.5 to 2 times smaller spotsize than the "sequential" configuration
and at 5 kV mm⁻¹ a spotsize of about 1 nm was calculated for an electron energy of 200 eV.

D. The Pixel Size

As we already mentioned in Section IV.D, the specimen response function for the total electron emission is composed of two bell-shaped contributions of different widths. The narrower peak corresponds to the SE1 part of the SE, released directly by primary electrons; its width is similar to the primary spotsize d_P, amounting approximately to (d_P² + l_esc²)^{1/2}. The broader component is that of SE2 and BSE, and its width is similar to the electron range R. At high energies, the SE and BSE signals are, as a rule, detected separately and the SE resolution is much higher than that of BSE. The SE2 contribution to the SE image is usually smeared so much that it is not visually apparent, and when the resolution is measured between 25 and 75% of the signal rise on a sharp edge, the SE2 signal need not manifest itself at all. The BSE resolution is usually presented on small clusters of heavy metals, so that the localization of information is improved by a sharp structure within the broad three-dimensional distribution of the BSE yield. However, at low energies the electron range approaches the escape depth of SE and the widths of both response functions become similar. As demonstrated for a silicon specimen by Reimer (1998), below 1 keV the SE distribution becomes even broader than that of BSE, owing to lateral diffusion of SE2 after their release by BSE. In the SLEEM method, we usually detect a mixture of SE and BSE and use just the energy range where both distribution widths are comparable. This is why we have to consider the real resolution, or the pixel size, as determined by the full response function incorporating also the specimen. The problem was solved using the response function formalism by Frank (1996a,b). The spatial distribution I_T(r) of the total emitted current in the surface plane can be written as

I_T(r)/I_P = δ C(r) + η(1 + δ) ∫ C(r′) S(r − r′) dr′   (45)
where I_P is the primary current, δ is the SE yield, and η is the BSE yield. Let us assume both the column response C(r) and the specimen response S(r) to be two-dimensional distributions of independent normal random variables. The normal distribution of BSE and SE2 (i.e., the shape of S(r)) was proved by Hasselbach and Rieke (1982) above 20 keV, so at lower energies it can be assumed only as a rough
approximation, and the same holds for the shape of C(r). One way of assessing the pixel size is to take the RMS distance of the emitted electron, d_RMS, which can be calculated for the axially symmetric case as

d_RMS = 2 [ ∫_0^∞ r² I_T(r) dr / ∫_0^∞ I_T(r) dr ]^{1/2}.   (46)
After substituting from Equation (45) and taking two-dimensional Gaussians for both C(r) and S(r), we get

d_RMS = [δ + η(1 + δ)]^{-1/2} [ δ d_P² + η(1 + δ)(d_P² + d_S²) ]^{1/2}   (47)
where d_P is the spotsize and d_S is the RMS width of the specimen response. Equation (47) was then used for the estimation of the best achievable values of d_RMS at low energies. The emission yields were calculated from the approximate relations reviewed above, and the primary spotsize was assumed both for a standard SEM and for a CL-equipped one. The RMS specimen response d_S was determined by MC simulations using software described by Czyzewski and Joy (1989), with the result

d_S ≅ C ρ⁻¹ E^{1.75}   (48)

where C ≅ 9 × 10⁻¹¹ kg m⁻² eV^{-1.75} and ρ is the specimen density. With the approximations described above, the pixel size d_RMS exhibits a minimum (see Figure 27), enabling one to define the optimum imaging conditions for a particular specimen when the total electron emission is detected, as it is in most versions of SLEEM. Hence
FIGURE 27. Comparison of the primary spotsize (———) with the total pixel size d_RMS calculated for Cu (- - - - -). The CL parameters were l = D = 5 mm and the summation rule (37) was used.
optimum energies of the electron impact and ultimate values of d_RMS were calculated for all three configurations indicated in Figure 27 and for the majority of chemical elements. The optimum energies fall between 330 and 4530 eV, while the ultimate resolutions were found to be 5 to 13 times the nominal spotsize at 30 keV for both microscopes without CL, and only 1.6 to 2 times for the CL-equipped model TEG SEM (Frank, 1996a). However, these data provide only broad guidance because of the many simplifications made. The approach employing the specimen response function can be extended one step further, provided only the SE emission is considered (see Frank, 1996b). In the previous derivation we took the specimen to be fully homogeneous, with all yields constant with respect to r. Now we can progress to a specimen composed of a homogeneous substrate with a heterogeneous surface film or surface relief. Then, both δ and η in Equation (47) remain position independent, but the emission distribution I_SE(r) can be written as a convolution, I_SE(r) = δ(r) ∗ i_SE(r), with

i_SE(r)/I_P = C(r) + η [C(r) ∗ S(r)]   (49)
which enables us to separate the imaged surface from the distribution of illumination by both PE and BSE. Because S(r) does not vary over the surface for a homogeneous substrate, we get the true specimen response function, which can be written, for C(r) and S(r) approximated by Gaussians, as

IRF(r) = G₂(σ_P, r) + η(Z) G₂((σ_P² + σ_S²)^{1/2}, r)   (50)
where G₂(σ, r) = (2πσ²)^{-1} exp(−r²/2σ²) is the two-dimensional Gaussian function. Equation (50) opens the possibility of using any acknowledged resolution criterion, such as the Rayleigh one or those based on a certain encircled portion of the signal, in addition to the evaluation via statistical moments that was performed before. In Figure 28 the real resolutions for C, Cu, and Au are compared for the Rayleigh criterion and for the pixel size defined by 80 and 90% of the encircled signal. Obviously, the appearance of the resolution minimum, as in Figure 27, is connected with criteria oriented to the total signal (like d_RMS) or to its major portion (like d_90). In the d_R curves the minimum is not present at all, and the d_nn curves exhibit the minimum (connected with a significant influence of SE2) only for a very high percentage nn. Already at nn = 80% the minimum disappears for the lightest element, and at lower nn it is not found either. The above analysis showed that the real resolution has to be assessed by means of criteria oriented onto the central peak of the total response
FIGURE 28. (a) Resolution d_R calculated from the IRF according to Equation (50) when the Rayleigh criterion is used (i.e., a drop of the IRF to 36.74% of the maximum; see Born and Wolf, 1975) for three elements, with d_R0 representing the first term in Equation (50) only; (b) resolution d_nn for nn = 80 and 90% of the signal encircled within the diameter defining the resolution, again with d_nn0 for the first term in Equation (50). Parameters of the model FEG SEM and Equation (37) were used.
function, i.e., criteria based on a certain decrease of the IRF with respect to its maximum or on some encircled signal. These criteria show only a small extension of the pixel size with respect to the primary spotsize, as exemplified in Figure 28. On the contrary, the statistical moments of the signal distribution in the specimen plane overestimate the influence of signal electrons having diffused to great distances, so that fully unrealistic figures appear at higher energies (see Figure 27). This indicates that even at low energies the conventional resolution tests can be used, provided their evaluation respects the above-mentioned circumstances.

E. Spurious Effects

The spurious effects influencing parameters of the electron probe in SEM are listed in Section II.C. Some of them are connected with the Coulomb forces acting between electrons moving within the beam, so the intensity of these effects depends on the beam energy. The main phenomena include probe size broadening owing to stochastic e–e interactions, broadening of the energy spread (the Boersch effect), and defocus or probe shift caused by the overall space charge. The probe broadening caused by stochastic interactions was studied by Spehr (1985). He found the spotsize enlargement proportional to the normalized beam current

k = [I/(I₀ α²)] (E₀/2E)^{3/2}   (51)
where I is the beam current, α is the angle of beam convergence, I₀ = 3.41 × 10⁴ A, and E₀ is the rest energy of the electron. This E^{-3/2} dependence is further enhanced by another factor that increases with decreasing energy with a progressively varying slope and cannot be characterized by a simple proportionality, but for short slow beams it behaves approximately as ln²(const · E^{-1}). Naturally, the final crossover at the specimen surface is the most critical one because the energy is lowest there. In cathode lenses, the beam aperture grows toward the specimen surface as α² ∝ E^{-1}, so that altogether we get a probe-broadening rate somewhere around E^{-1}. Mankos and Adler (2002) explored the problem of stochastic interactions for cathode lens configurations. Using precise tracing of particle bunches through calculated electric and magnetic fields for both electrostatic and compound lenses with non-overlapping retarding and focusing parts, they obtained the "blur" values for wide ranges of the beam current and current density. Being oriented to direct imaging in the PEEM mode, their data range is shifted to larger currents and lower densities than those corresponding to the SLEEM situation. We can extrapolate their data to our case, a probe current of 5 pA and a spotsize of 10 nm at the lowest energies, i.e., to a current density of 5 × 10³ mA cm⁻², and obtain a broadening of about 1 to 2 nm. Otherwise, a linear increase in the blur with decreasing E_P was found. As regards the increase in the energy spread owing to e–e interactions, we already mentioned the fundamental work of Rose and Spehr (1980). For the stigmatic focus they calculated the extra energy spread to be ⟨ΔE/E⟩ = 2πk (see Equation (51)) for low currents, so that ΔE ∝ I^{1/2}E^{1/4}. This result is independent of the beam aperture provided k ≪ 1 and α² ≪ 12E/E₀.
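The magnitude of the normalized current in Equation (51) is easy to evaluate; a minimal sketch for SLEEM-like figures (a 5 pA beam and a 1 mrad aperture, as used below) follows:

```python
# Normalized beam current of Equation (51); energies in eV.
I0 = 3.41e4        # A, normalization current
E0 = 511e3         # eV, electron rest energy

def k_norm(I, alpha, E):
    """k = [I / (I0 * alpha^2)] * (E0 / (2*E))^(3/2)."""
    return I / (I0 * alpha**2) * (E0 / (2.0 * E))**1.5

# 5 pA beam, 1 mrad convergence, 1 eV landing energy
print(k_norm(5e-12, 1e-3, 1.0))   # ~2e-2
```

The E^{-3/2} dependence means that raising the landing energy from 1 eV to 100 eV already lowers k by a factor of 1000.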
The second condition is easily satisfied, and for a beam current of 5 pA and an aperture of 1 mrad we get k ≈ 2 × 10⁻² at 1 eV, while for larger energies it further decreases as E^{-3/2}. Thus, the Boersch effect is not enhanced at low energies. The average space charge within the whole beam acts as a divergent lens causing some defocus of the primary spot. Spehr (1985) showed that for a constant current density across the beam and k < 10⁻², refocusing of the appropriate lens enables the spot broadening to be corrected with a negligible residual effect. For an electron beam with a Gaussian cross-section, a contribution to the spherical aberration is generated too, with a corresponding disc of confusion that, again for low currents, has a diameter d_e–e = 1.1 k D₀, where D₀ is the diameter of the beam-limiting diaphragm. Nevertheless, at the same time it is claimed that this deviation can also be corrected by readjusting the lens excitations. The proportionality to k, i.e., to E^{-3/2}, requires the effect to be listed here, although there are no reports
about its practical demonstration, so that successful correction via fine focusing can be believed. Important spurious effects in SEM are caused by the penetration of external electromagnetic waves into the column. These phenomena were reviewed by Frank and Müllerová (1999) and found negligible except for the beam deflection Δy caused by a radial magnetic field B_r, which amounts to Δy = (1/2) e B_r (2E/m³)^{1/2} τ², with τ as the time of flight across the region exposed to the magnetic field. The beam trajectory inside magnetic lenses is shielded relatively well by the magnetic circuits against spurious fields, so that the most exposed part is the trajectory along the working distance between the lower polepiece of the objective lens and the specimen. If this region is traversed by slow electrons at energy E, the time of flight is (1 + √k)/2 times longer than when the electrons are decelerated to E from the primary energy E_P along the same trajectory. This means that the beam deflection is reduced (1 + √k)²/4 times, i.e., for example 266 times for k = 1000. From this point of view, insertion of the cathode lens below the magnetic objective lens represents the optimum solution. Finally, let us recall the problem of mechanical vibrations. This issue is common to all types of SEM and its impact simply depends on the demagnification of the gun crossover. The component most sensitive to any vibrations is the cathode itself, but in a TEG SEM its movements are demagnified 10³ to 10⁴ times together with the crossover and become negligible. However, for a FEG operated at room temperature the necessary demagnification remains of the order of units, so serious problems arise unless the device is carefully insulated from vibration sources. This problem is not specific to the SLEEM mode.
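The quoted reduction of the spurious deflection can be checked directly; a small sketch, assuming deceleration from E_P = kE spread uniformly over the working distance:

```python
import math

def deflection_reduction(k):
    """Factor (1 + sqrt(k))^2 / 4 by which the deflection (proportional to
    the squared time of flight) shrinks when the beam crosses the working
    distance while decelerating from E_P = k*E instead of traveling at the
    constant low energy E."""
    return (1.0 + math.sqrt(k))**2 / 4.0

print(round(deflection_reduction(1000)))   # 266, as quoted in the text
```

At k = 1 (no retardation) the factor is 1, as it must be.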
F. Testing the Resolution

We have already mentioned that, even at low energies, the conventional tests of resolution, made with a specimen containing small particles of a heavy metal on a low-atomic-number substrate (most often gold on carbon), can be used provided their evaluation respects the signal composition of SE1, SE2, and BSE. For psychological reasons, it is desirable to extract from these tests numbers that approach very closely the calculated spotsize, without any enlargement owing to diffusion inside the specimen. This means the tests should be performed solely upon the SE1 signal. In previous sections we showed that in the low-energy range, where the lateral spread of the SE2 and BSE emission shrinks and approaches that of SE1, a progressively enhanced fraction of the SE release takes place within l_esc, and hence the SE1 signal relatively grows. It is reasonable to suppress the
distribution tail that appears in the edge width measurement by taking the thresholds far enough from the signal levels on the adjacent facets; the proven algorithm is to measure between levels of 25 and 75% of the signal rise. Reimer (1998) used MC simulations to model the resolution test for the emission distributions of all signals, and also for their integrals on one side of a moving straight edge, and verified the applicability of the 25/75 scheme. As we will discuss in the next section, the SE and BSE signals are detected together in the SLEEM mode, and therefore an extension of the lateral distribution of the total signal with respect to that of SE1 becomes even more probable. Nevertheless, practical experience has shown that the 25/75 rule is suitable here as well (see Figure 30). Another crucial circumstance, not taken into account with a conventional SEM, is the necessity of using a specimen that preserves a sufficient contrast throughout the full energy scale. It is believed that a difference in atomic numbers as large as that between Au and C should secure this. However, we see in Figures 13 and 16 that the signal yields already change their mutual relations drastically above 250 eV; other published data confirm this down to even lower energies, as Schmid et al. (1983) showed for eBSE. The contrast behavior of the standard Au/C specimen was verified in two instruments equipped with the same SLEEM detector and CL assembly described below. The microscopes differed mainly in the vacuum conditions; one had the usual medium vacuum (MV) of the order of 10⁻⁴ Pa and the other used clean UHV at about 2 × 10⁻⁸ Pa. As shown in Figure 29, at 320 eV the Au/C contrast is substantially lower than that at 3 keV for both devices, but at 20 eV the contrast fully disappears for the MV instrument, while in UHV it is inverted and quite high.
Surface contaminant layers, less transparent at 20 eV under the worse vacuum, might cause the difference, but as yet the interpretation is not fully clear (see Müllerová and Frank, 2003). Nevertheless, fine cracks are still apparent at the surface, the contrast of which is obviously due to enhanced electron absorption in deep cavities, and an appreciable signal rise at edges can be observed. Figure 30 shows a linescan across such an edge, taken at an electron energy of 10 eV, which demonstrates a resolution of 9.3 nm, to the authors' knowledge the best achieved to date (Müllerová and Frank, 2002). This value corresponds to the objective lens aberrations C_S = 33 mm and C_C = 15 mm, published by Takashima (1994) for a working distance (WD) of 6.5 mm, and to the WD of 8 mm used by us, which is larger than that appropriate for the guaranteed instrument resolution (1 nm at 15 keV). As regards the UHV SLEEM, the same measurement gave 11.5 nm at 10 keV and 26 nm at 10 eV.
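The 25/75 edge-width evaluation described above can be sketched as follows. The error-function edge is a synthetic stand-in for a real linescan, for which the 25–75% width of a Gaussian-blurred edge should come out as about 1.349σ:

```python
import math

def edge_width_25_75(xs, ys):
    """Distance between the 25% and 75% levels of the signal rise,
    found by linear interpolation on a monotonically rising linescan."""
    lo, hi = min(ys), max(ys)
    def crossing(level):
        t = lo + level * (hi - lo)
        for i in range(1, len(ys)):
            if ys[i - 1] <= t <= ys[i]:
                f = (t - ys[i - 1]) / (ys[i] - ys[i - 1])
                return xs[i - 1] + f * (xs[i] - xs[i - 1])
        raise ValueError("level not crossed")
    return crossing(0.75) - crossing(0.25)

# Synthetic edge: error function blurred with sigma = 3 (e.g., nm)
sigma = 3.0
xs = [-20 + 0.05 * i for i in range(801)]
ys = [0.5 * (1 + math.erf(x / (sigma * math.sqrt(2)))) for x in xs]
print(edge_width_25_75(xs, ys))   # ~4.05, i.e. about 1.349 * sigma
```

On real data the same interpolation is applied between the signal plateaus on the two facets adjacent to the edge, which is exactly how the thresholds suppress the diffuse SE2/BSE tail.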
FIGURE 29. Micrographs of the standard resolution-testing specimen with Au particles on a carbon substrate, taken (from the top) at 3020, 320, and 20 eV. Left column: dedicated UHV SLEEM of ISI Brno; right column: JEOL 6700F adapted for SLEEM. The width of the field of view is 100 mm (top left) and 200 mm (top right). (Reprinted with permission from Müllerová and Frank, 2003.)
VI. DETECTION AND SPECIMEN-RELATED ISSUES
This review concentrates on SEM modes employing primary beam retardation close to the specimen. In the majority of instances the retarding field is also traversed by the signal electrons in the opposite direction so that these are accelerated and, if the field has its axial component strongly
FIGURE 30. Linescan across an edge in a micrograph of the Au/C specimen, taken at an energy of 10 eV (JEOL 6700F adapted for the SLEEM method), with the edge width indicated.
prevailing, also collimated toward the axis. In the cathode lens configurations the specimen itself becomes one electrode of the electron optical system. These facts have a decisive impact on the choice of the detection principles to be used. First of all, the classical Everhart–Thornley detector, extracting slow SE by a lateral electric field, cannot be used, because the emitted electrons are accelerated along the axis and also because the slow primary beam could be undesirably affected. Hence the objective lens, with its focusing and retarding parts, has to be considered together with the detector. Further, owing to the acceleration of the signal electrons, the crucial difference between typical SE and BSE energies is shifted, so that usually both appear in the same order of magnitude. Separate detection of SE and BSE via conventional methods is then no longer efficient, and novel detection principles are needed. Finally, the specimen surface parameters, the roughness in particular, have to be considered for the cathode lens assemblies.

A. Detection Strategies

In previous sections we frequently compared the properties of the immersion objective lens (IOL) in its general form with the particular case of w = 0, in which the retarding part is called the cathode lens. It is worth continuing this comparison here: as demonstrated in Figure 31, the formation of the "signal beam" is significantly different in the two cases.
FIGURE 31. Trajectories of electrons in the electrostatic immersion lens (a) and the cathode lens (b); the potential difference within the lens is 10 kV; the energies of electron emission are 5 eV (bottom half of the bundles) and 200 eV. The simulation was made using the SIMION 3D package (Dahl, 1995). (Reprinted with permission from Müllerová and Frank, 1999.)
For the IOL in the low-energy range, a non-negligible part of the BSE impinge on the first electrode and can be detected in this plane, while the SE emission is concentrated into a bundle that is focused into a crossover and then spreads again. This formation of an image of the emitting pixel is further supported by the focusing part of the IOL. Thus, the SE beam can be acquired even around the axis, in a suitable plane above the IOL, with a detector type normally used for BSE. The arrangement (see Figure 22) is then similar to the so-called "upper" SE detector utilized for acquisition of the SE beam from a specimen immersed in the magnetic field of the objective lens, which collimates the SE emission toward the flux lines of the field (see Kruit and Lenc, 1992). Here the SE beam is already accelerated, so its detection is easier. The upper SE detector is usually situated above the deflection stage, so that the deflection action also influences the trajectories of the signal electrons, and a general issue here is to minimize or avoid the escape of signal through the detector bore. We will discuss this problem further in the next sections (see Figures 40 and 46). If a CL is used (Figure 31(b)), the signal electrons are collimated into a diverging beam the width of which depends on the emission energy and the CL parameters. If we solve the classical equation of motion for an electron emitted with initial energy E_e under an angle θ with respect to the surface normal into the CL field, within which it is accelerated to the energy E_P, we get for its radial coordinate r_a at the end of the field, i.e., in the anode plane,

r_a = [2l/(k_e − 1)] sin θ [(k_e − sin²θ)^{1/2} − cos θ]   (52)
MÜLLEROVÁ AND FRANK
where ke = EP/Ee is the immersion ratio for the emitted electron, defined analogously to k. The entire emitted bundle for θ ∈ (0, π/2) is then concentrated into a spot of radius

ra,max = 2l/√(ke − 1).   (53)
The angle of passage through the anode plane, θa, is given by sin θa = sin θ/√ke, so that the angular aperture of the bundle is θa,max = arcsin(1/√ke). When drawing a ray backward from the anode plane, we find it crossing the optic axis, according to its emission angle, between l and l(√ke − 1)/(√ke + 1) behind the cathode, the latter position corresponding to the paraxial ray. This virtual source is further imaged by the aperture lens in the anode plane and by the focusing part of the IOL. For secondary electrons with a characteristic energy of 3 eV (see Section IV.D) and for typical values l = 7 mm and EP = 15 keV, we get ra,max = 0.2 mm and θa,max = 14 mrad. The beam of backscattered electrons is formed according to the BSE energy, which depends on the landing energy of the primary beam, but at very low energies figures similar to those for SE are obtained. Obviously, detection made below the focusing lens has to be extended up to the close vicinity of the optic axis, or again the through-the-lens principle has to be incorporated. The previous simple considerations indicate a general problem: we have a narrow signal beam along the axis, at least partially escaping detection through the central bore left for the primary beam. The signal losses can be reduced if the signal beam can be broadened again within a suitably arranged electric field, as is done in the EDOL-type lens already mentioned; we will return to this arrangement in the next section. The issue can be fully solved via deflection by means of crossed electric and magnetic fields, i.e., by a so-called Wien filter or E×B filter (see Figure 32). The electric and magnetic forces subtract for the primary beam direction, but for the opposite signal beam direction they add and cause its deflection toward the detector.
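As a numerical sanity check of Equations (52) and (53) and the angular relations above, the following short Python sketch (our own illustration; function names and unit conventions are ours, not from the original) evaluates the formulas for the typical SE values quoted in the text (Ee = 3 eV, l = 7 mm, EP = 15 keV):

```python
import math

def r_a(theta, l, Ee, EP):
    """Radial coordinate in the anode plane, Eq. (52): electron emitted
    with energy Ee at angle theta to the surface normal, accelerated to
    EP over the cathode-lens field length l (lengths in mm, energies in eV)."""
    ke = EP / Ee  # immersion ratio of the emitted electron
    return (2.0 * l * math.sin(theta) / (ke - 1.0)
            * (math.sqrt(ke - math.sin(theta) ** 2) - math.cos(theta)))

def r_a_max(l, Ee, EP):
    """Spot radius of the entire emitted bundle, Eq. (53)."""
    return 2.0 * l / math.sqrt(EP / Ee - 1.0)

def theta_a_max(Ee, EP):
    """Angular aperture in the anode plane: sin(theta_a) = sin(theta)/sqrt(ke)."""
    return math.asin(math.sqrt(Ee / EP))

l, Ee, EP = 7.0, 3.0, 15e3  # typical SE case from the text

# Eq. (53) is just Eq. (52) evaluated at grazing emission, theta = pi/2:
assert abs(r_a(math.pi / 2, l, Ee, EP) - r_a_max(l, Ee, EP)) < 1e-9

print("r_a,max     = %.3f mm" % r_a_max(l, Ee, EP))             # ~0.198 mm, i.e. the quoted 0.2 mm
print("theta_a,max = %.1f mrad" % (1e3 * theta_a_max(Ee, EP)))  # ~14.1 mrad
```

The computed values reproduce the figures quoted above (0.2 mm and 14 mrad), confirming that Equation (53) is the grazing-emission limit of Equation (52).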
The Wien condition for equality of electric and magnetic forces can be easily fulfilled for the homogeneous parts of the fields but it is more difficult to satisfy for the spurious fields at the margins of electrodes/polepieces. In addition, any spread in electron velocities causes beam dispersion. This is why two identical but oppositely oriented filters are often incorporated so that the primary beam passes both and any undesired modifications are mutually compensated while the signal beam escapes between the filters (see Figure 46). In order to further minimize any influence on the primary beam, the Wien filters can be made
FIGURE 32. Principle of the beam deflector employing crossed electric and magnetic fields.
weak, just sufficient to deflect the signal beam to where it can enter some other electric field, not penetrating to the optic axis, that extracts it strongly toward the detector; see, for example, Figure 33; the same principle is shown in Figure 46. Various modifications of detector assemblies containing Wien filters have appeared in the literature since the 1980s (e.g., Schmid and Brunner, 1986; Brunner and Schmid, 1987; Reimer and Kässens, 1994; McKernan, 1998). Zach and Rose (1986) studied the influence of the filter aberrations on the primary beam and proposed using fields of higher order than dipole. A detailed study of the filter properties, including fringing fields, was presented by Kato and Tsuno (1990). A significant effort has also been invested in shifting the range of efficient operation of BSE detectors down to lower energies, i.e., in pushing the traditional threshold of 2 to 3 keV as far down as possible. The improvements include both technological advances in the preparation of scintillator surfaces and the introduction of extraction fields arranged so that secondary electrons are still not collected. Important studies include those of, for example, Autrata and Hejna (1991), Autrata et al. (1992), Hejna (1994), Autrata and Schauer (1994), and Hejna (1998). New detector principles have emerged that employ sophisticated arrangements of electric fields, created by electrodes situated within the magnetic lens bore and gap, permitting a wide range of manipulations of the emitted electrons and extending the scope of operation modes of the upper SE detector. These operation modes include collimation of the SE beam for enhanced detection efficiency, reflection of a portion
FIGURE 33. Scheme of the double Wien filter with intermediate electrostatic mirror for deflection of the signal beam in an SEM. The bundle of SE trajectories is shown for an emission energy of 4 eV and emission angles within 70°; Ub = 10 kV. (Reprinted with permission from Pejchl et al., 1994.)
of the SE in order to achieve the charge balance at a nonconductive surface without charging, and, in combination with a moderate specimen bias, conversion of accelerated SE, impacting a converter surface, to tertiary electrons that are detected normally. In this way, good-quality micrographs were obtained down to 100 eV (Kazumori, 2002). An analogous configuration of electrodes, combined with the E×B filter, enables one to control the proportions of SE and BSE in the mixture detected by a single (upper) detector. In this configuration, BSE are converted to SE and the electrode biases can discriminate between SE from the specimen and those from the converter surface. These combinations with crossed E×B fields appeared for the first time in the early 1990s (see Sato et al., 1993). Inevitably, arrangements of all electrodes are published only schematically because details are considered confidential. Nevertheless, the available CAD systems do enable one to tailor the above-specimen fields in various ways and to optimize the detection efficiency for individual portions of the energy and angular distributions of the total electron emission.
B. Detectors
In the previous section we listed the possible approaches to detection of electrons in systems with retarding-field elements. Now we will describe several actual configurations. For IOL setups, i.e., with the retarding field not directly applied to the specimen, the variety of detector assemblies is very broad. For SE detection, they mostly rely on the through-the-lens principle and detect SE either with a coaxial scintillator-type detector or by deflecting them with an E×B element toward a side-attached detector similar to the ET type. As regards BSE detection, conventional assemblies below the IOL polepiece are utilized, but novel approaches have also appeared that include conversion of BSE to SE and detection of a controllable mixture of SE and BSE, as mentioned above. Let us recall here the simple EDOL arrangement published by Zach and Rose (1988) (see Figure 34(a)), in which an accelerating electrostatic lens is employed, designed so that the conical electrodes generate significant radial components of the field. Within the first, accelerating part of the lens the SE emission is collimated into a beam, while in the decelerating part the beam is, owing to the radial forces, appropriately broadened so that it hits the annular detector with a reasonably large central bore. In the configuration of Figure 34(b), the decelerating part of an electrostatic lens is similarly utilized to broaden the beam, but here the lens is of the decelerating type and hence its first part is employed for this purpose. A further difference consists in using the conical electrode as an electron converter, transforming accelerated signal electrons to tertiary electrons that are directed by the radial field toward a microchannel plate (MCP)-based detection assembly (Frank et al., 2000a).
In this setup, the collection efficiency for 10 eV signal electrons, i.e., the probability of their impact on the converter surface, was calculated to exceed 98%, and after conversion and passage through the MCP, 35% of the emitted electrons still create signal pulses. A setup on this principle, an electrostatic detector lens (EDL), can be utilized more widely as it enables one to introduce a segmented or even two-dimensional collector below the MCP and to acquire data about the angular distribution of the emission. For completeness, let us also recall the MEDOL configuration (Figure 22), in which the signal beam is projected onto the detector by the combined action of both components of the compound lens. Now we will deal specifically with detection in systems that employ a cathode lens, particularly those based on adaptation of a conventional SEM. The authors' first experimental arrangement contained the detector assembly shown in Figure 35(a) (Müllerová and Frank, 1993) in which, on the surfaces of the diaphragm and lower polepiece of the objective lens (OL),
FIGURE 34. (a) Scheme of the EDOL arrangement with an accelerating lens that broadens the signal beam inside its decelerating second part (electrode potentials are shown together with electron energies in parentheses). (Reprinted with permission from Zach and Rose, 1988.) (b) Similar principle combined with a converter of accelerated signal electrons into tertiary electrons detected by a multichannel-plate-based assembly (L, lens; Conv, converter; CPI and CPO, input and output of the MCP, respectively; Coll, collector; A, anode; Sp, specimen). (Reprinted with permission from Frank et al., 2000a.)
the accelerated signal electrons are converted to tertiary electrons, which are attracted to a conventional ET detector with the front grid removed in order to allow the scintillator field to penetrate toward the axis, as shown in Figure 35(b). The advantages of this type include very low price and easy realization, but the drawback is the quite large working distance that is necessary. In fact, the same arrangement was introduced by manufacturers in the form of the upper detector with converter (see above), but for an easy adaptation in users' laboratories the space above the SEM objective lens was, of course, not accessible. Another successful design employed the single-polepiece magnetic lens situated below the specimen (see Figure 26(b)), now completed with the anode of a CL and a BSE detector with a YAG crystal; the scheme of the assembly is shown in Figure 36. In this setup, a micrograph with a resolution of 80 nm at 0.5 eV electron energy was obtained for the first time (Müllerová and Lenc, 1992b). This type is hardly suitable for adaptation of classical SEM instruments but was used, for example, in a specialized low-energy SEM for inspection of semiconductor structures at a landing energy of 800 eV and a primary energy of 20 keV (Meisburger et al., 1992). The most successful arrangement to date is shown in Figure 37. A crucial component is the YAG:Ce³⁺ single-crystal scintillator disc with a small central bore of depth and diameter 300 μm, side-attached to a light guide
FIGURE 35. Configuration of the SLEEM mode detector utilizing conversion of accelerated signal electrons on surfaces of the diaphragm and lower polepiece of an OL, and extraction of tertiary electrons toward a conventional ET detector (left); equipotential surfaces within the assembly when the front grid of the ET detector is removed (right). (Reprinted with permission from Frank and Müllerová, 1999.)
FIGURE 36. A combination of the cathode lens with a single-polepiece lens situated below the specimen.
FIGURE 37. The CL/detector assembly with a YAG:Ce³⁺ single-crystal scintillator. (Reprinted with permission from Frank and Müllerová, 1999.)
made of organic glass (for standard vacuum applications) or of quartz (for bakeable UHV instruments). The bore size was tuned to a balance between reasonable dimensions of the field of view and successful acquisition of very-low-energy electrons collimated toward the close vicinity of the axis. As shown in Figure 38, for one typical set of dimensions (used in the experiment described in Section VIII.E) and for normal impact of PE, signal electrons are detected above 0.5 eV of emission energy. This configuration is similar to the so-called Autrata-type BSE detector (Autrata, 1989), but important differences are the much smaller central bore and the related necessity for fine adjustment of the crystal position along all three axes. Fortunately, this adjustment is decisively facilitated by the fact that the upper crystal surface is also active, so that the detector bore is directly observed on the SEM screen (see Section VII.C). The bore shape with a 45° sink is dictated by several issues that include requirements of the boring technology, feasibility of a conductive coating of the inner bore wall, and an advantageous axial field distribution. It is obvious from Equation (42) that one term in the relation for CS, namely the aberration of the anode field, is inversely proportional to the bore diameter D, and hence D should not be made too small.
FIGURE 38. The maximum emission angle θm of an electron with emission energy Ee for which the electron still escapes through the central bore. Solid line: data calculated from Equation (52) for ra = 0.15 mm, l = 11 mm, EP = 10 keV; squares: exact values obtained via trajectory simulations using software described by Lencová and Wisselink (1990). (Reprinted with permission from Frank et al., 1999.)
FIGURE 39. Derivative ∂φ/∂z of the axial potential distribution φ(z) for the three anode shapes outlined in the inset (calculated using software described by Lencová and Wisselink, 1990).
The shape shown in Figure 37 produces an axial potential distribution that resembles more closely the distribution corresponding to the outer diameter of the sink than that of the inner one (Zobačová et al., 2003); see Figure 39.
FIGURE 40. Scheme of trajectories of primary and signal electrons in a CL-equipped SEM with an upper detector.
With regard to adaptations of commercial SEMs to the SLEEM method, we should also mention an alternative employment of the upper SE detector, either in a coaxial arrangement or a side-attached one with an electron converter, when it is combined with a CL operated between the specimen and the lower polepiece of the OL. Here we gain the space required for the detector as in Figure 37, so that a shorter working distance can be attained. In Figure 40, a sketch of the electron trajectories is shown for the case when none of the above-mentioned additional electric fields for manipulation of the signal electrons is present. All previously described setups are intended for adaptation of conventional SEMs, which naturally have their full columns at ground potential, so any retarding field can be created only via a specimen bias. This can impose serious difficulties, particularly for instruments equipped with an air-lock for insertion of a specimen cartridge. In any case, specimen biasing burdens routine operation with extra tasks and checks. It is much more convenient to use the booster principle with a positively biased central tube that creates the necessary retarding field even toward a specimen at ground potential; the next section will address these questions. Here again we should recall the family of IC testers, in which CL/detector assemblies are increasingly used and designed subject to an extra requirement that is not so important for other SEM applications, namely
that the beam current be as high as possible in order to achieve a high throughput in the production checks (see Section VII.B).
C. Signal Composition
Within the CL field the whole energy spectrum of emitted electrons, outlined in Figure 11, is accelerated by the potential difference between specimen and anode. For larger CL fields, the energies of the SE and BSE are of the same order of magnitude and no type of detector can efficiently separate them unless a true energy filter is incorporated. The particular composition of the signal mixture depends on the parameters of the fields and on the geometry, and here we can show only one typical example. Let us consider the CL/detector assembly according to Figure 37 with a YAG:Ce³⁺ crystal, for which Autrata and Schauer (1998) published the detection quantum efficiency (DQE), defined as the squared ratio of the signal-to-noise ratios (SNR) at the scintillator input and output. For a cosine distribution of both SE and BSE emissions, the portion of electrons hitting the scintillator (the collection efficiency) is simply

P = IDET/I = sin²θmax − sin²θmin   (54)
where θmax and θmin correspond to the marginal rays incident on the scintillator, which can be determined from Equation (52). Figure 41 shows P and DQE for both SE and BSE, plotted with respect to the landing energy of the PE, together with the detection weight

WBSE/SE = (PBSE DQEBSE)/(PSE DQESE).   (55)
This obviously favors BSE in a ratio between 5:3 and 4:3. Since the SE yield usually surpasses that of the BSE, the signal mixture, governed by the weight WBSE/SE, represents a more or less balanced combination of both components. The graph in Figure 41 depends on the particular values of three parameters, EP, D/l, and D0/l (D0 being the outer scintillator diameter), but similar results for WBSE/SE are obtained for other reasonable combinations of these factors. With below-the-lens detectors the BSE/SE ratio cannot be efficiently controlled and generally a sum of both signals is obtained. By contrast, setups employing the upper detector open possibilities for controlling the signal ratio, at least down to about 200 eV where SE and BSE cease to be
FIGURE 41. Collection efficiency P for SE emitted at a mean energy of 3 eV, for BSE emitted at 75% of the landing energy of the PE (above 50 eV), and for eBSE below 50 eV, calculated for the assembly shown in Figure 37 with a crystal outer diameter of 10 mm, a bore diameter of 0.3 mm, and a field length l = 7 mm, together with the DQE values corresponding to the various impact energies and with the weight WBSE/SE according to Equation (55).
distinguishable even in the energy spectrum of emission—see simulation results of Kuhr and Fitting (1998).
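To illustrate how Equations (52) and (54) combine into a collection efficiency, the following Python sketch (our own scaffolding, not taken from the cited works) estimates P for SE in the approximate geometry of Figure 41; for simplicity it treats the scintillator as lying in the anode plane, which neglects the drift between anode and crystal:

```python
import math

def r_a(theta, l, ke):
    """Radial landing position in the anode plane, Eq. (52)."""
    return (2.0 * l * math.sin(theta) / (ke - 1.0)
            * (math.sqrt(ke - math.sin(theta) ** 2) - math.cos(theta)))

def theta_at_radius(r, l, ke):
    """Emission angle whose ray lands at radius r; r_a grows monotonically
    with theta on (0, pi/2) for large ke, so simple bisection suffices."""
    if r >= r_a(math.pi / 2, l, ke):
        return math.pi / 2  # radius beyond the reach of the whole bundle
    lo, hi = 0.0, math.pi / 2
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if r_a(mid, l, ke) < r:
            lo = mid
        else:
            hi = mid
    return hi

def collection_efficiency(r_in, r_out, l, ke):
    """P = sin^2(theta_max) - sin^2(theta_min), Eq. (54), for a cosine
    emission distribution and an annular detector spanning r_in..r_out."""
    t_min = theta_at_radius(r_in, l, ke)
    t_max = theta_at_radius(r_out, l, ke)
    return math.sin(t_max) ** 2 - math.sin(t_min) ** 2

# Figure 41 geometry: bore radius 0.15 mm, crystal radius 5 mm, l = 7 mm;
# SE at a mean emission energy of 3 eV, primary energy 10 keV.
P_SE = collection_efficiency(0.15, 5.0, 7.0, 10e3 / 3.0)
print("P_SE = %.2f" % P_SE)  # the remainder escapes through the central bore
```

Since ra,max for 3 eV SE is only a fraction of a millimeter, every SE that clears the bore hits the crystal, and the losses come entirely from the central bore; the DQE factors of Equation (55) would then have to be taken from the published data of Autrata and Schauer (1998), which are not reproduced here.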
D. Specimen Surface
One important characteristic of an IOL is the magnitude of the electric field penetrating toward the specimen surface. For the CL as the extreme case, the surface field is the maximum retarding field used, while IOL arrangements with a nonzero working distance w, i.e., with the specimen electrically connected to the first electrode, are usually regarded as situating the specimen in a field-free space. Let us assess the field penetration out of the retarding lens for a simple case of flat electrodes of the same thickness t and identical bore diameter D, situated at distances w and w + l from the specimen, as in Figure 21. For three particular IOL geometries we show in Figure 42 the ratio of the surface axial field to the maximum axial field within the lens, plotted versus w/l, together with the immersion ratio k for which the IOL alone focuses the beam onto the specimen surface. We see that for very low energies, i.e., say for k ≥ 500, the surface field does not drop below 10% of Ez,max. Moreover, if a magnetic lens contributes to the probe focusing, the working distance shortens further. If both lenses are of equal optical strength, the surface field is Ez ≅ 0.5 Ez,max.
FIGURE 42. The axial field strength on the specimen surface referred to the maximum field strength within an IOL, and the immersion ratio k for focusing the probe by the electrostatic lens only, plotted for three configurations of the IOL with flat thin electrodes: (a) D = l, t = 0; (b) D = l, t = 0.1l; (c) D = l/2, t = 0.1l (for symbols see text).
Obviously, the field penetration has to be taken into account even for an IOL with a nonzero working distance. To improve the situation, the lens electrodes have to be shaped and their bores tuned to create an axial potential distribution falling as sharply as possible in the close vicinity of the specimen. Powerful simulation software packages are available for this task; the optimization procedures were studied by Preikszas and Rose (1995). We will now address the specimen surface roughness. While no problems are expected or appear with an IOL, it has been repeatedly argued that a CL is unsuitable because the specimen surface would have to be very smooth, if not polished like that of an electrode. Practical experience shows that the real demands are not so strict, although application of the method is without any doubt restricted to observation of flat specimens. Naturally, the maximum applicable field between specimen and anode is limited by the danger of a discharge, which increases with the specimen roughness. The imaging process alone also requires that the CL field be homogeneous up to the very specimen, which can be so only for smooth surfaces. The tolerable roughness depends on the shapes of protrusions and depressions and on the field strength. Any radial forces, connected with "waving" of the equipotential surfaces above the surface relief, shift or smear the primary probe and locally deteriorate the image. Nevertheless, the above-specimen equipotentials do not simply copy the surface itself but depend also on the distribution of surface dipoles and any trapped charges.
FIGURE 43. Si (100) substrate with heterogeneously etched trenches of both width and depth 3 μm, with some traces of Cu decoration, imaged (from the left) at 5 keV, 250 eV, and 1 eV; primary beam energy = 10 keV, CL field = 1.5 kV mm⁻¹, width of the field of view = 20 μm.
To the authors' knowledge, no study focusing on this issue has been published as yet. In Figure 43 we see one practical example of a specimen with known shape and dimensions of the surface relief features, and we can verify that even in the 1 eV micrograph no traces of local deterioration at the trench edge are apparent. Other similar experiments also led to the conclusion that a surface relief up to a peak-to-peak height of a few μm can be tolerated in the SLEEM mode at moderate CL field strengths not exceeding 2 to 3 kV mm⁻¹. In Figure 43 we can also notice a strong shortening of the depth of focus with decreasing electron energy. The restriction to a relief not exceeding a few μm is more or less in accordance with the depth of focus, which is significantly shortened here owing to the enlargement of the beam aperture within the CL by approximately √k times. Otherwise, the limited depth of focus is felt to be the most important disadvantage of the method and can be at least partly suppressed only at the cost of resolution, as indicated in Figure 25. Finally, we recall one more of the traditional misgivings regarding the SLEEM/LEEM methods, namely that UHV conditions and atomic cleanliness of the surface are unavoidable because of extreme surface sensitivity. It is true that at around 50 eV of landing energy, where the minimum penetration depth of PE is reached, the surface sensitivity is high and the image contrast is dominated by that of surface contaminants. However, above and particularly below this threshold the electron penetration grows steeply, so that at a few eV it is comparable to that at tens of keV. This means that the method itself does not put any demands on the surface cleanliness, but such demands might follow from the phenomena to
be observed. For example, phenomena connected with surface crystallinity, reconstructions, phase transformations, etc., can take place solely on surfaces free of amorphous contaminants that might prevent the atoms from arranging according to the distribution of forces inherent in the crystal.
E. Specimen Tilt
When the specimen is not immersed in an electric field, normal conditions for specimen tilting can be expected irrespective of whether the retarding field is used or not. When only a weak field penetrates the anode bore toward the specimen, some balance has to be sought between the image shift and deterioration on one side and the tilt angle on the other. In the CL, any specimen tilt introduces a lateral field component that is not considered in the simple electron-optical theory outlined above. A small specimen tilt can easily be caused by imperfect fixing or sticking of the specimen to the holder, so it is advantageous to have the specimen stage equipped with a double-tilt facility, at least within a small range. However, it is also interesting to explore the limits of an intentional tilt, introduced in order to have the primary beam incident at an angle. Let us now look at the consequences of a moderate tilt by an angle ω. Taking the anode of the CL in the plane z = 0 at potential φ = 0 and the cathode at z = l at potential φ = φ0, the CL field, for a tilt about the y axis, is modified to

φ(x, z) = φ0 z/(l + x tan ω).   (56)
A solution of the equations of electron motion in this field was presented by Frank et al. (1999). The lateral component of the CL field causes some shift of the primary spot in the tilt direction. For small values of ω this shift amounts to

Δ ≅ 2ωl √k (k^{3/2} − 1)/[3(k − 1)²],   (57)
i.e., for very large k it tends to (2/3)ωl. This magnitude of shift is quite significant, so that when tilting the specimen intentionally, we have to do so in very small steps and to correct the position of the field of view in between. A further effect of the tilt is a unidirectional smearing of the primary spot owing to the lateral field created. In scanning devices, it is reasonable to express the image resolution via the number N of spotsizes filling the field of
view and to compare this figure with the number of pixels acquired. Naturally, the optimum operation mode is achieved when these two numbers are equal. Normally, the spotsize and the size of the field of view are to a first approximation independent, so that the optimum mode for a given number of pixels can be adjusted via the magnification. Nevertheless, imaging with some excess nonutilized information or with insufficient information density is also available at magnifications below or above the optimum, respectively. In other words, the number of spotsizes within the field of view is inversely proportional to the magnification, and for any number of pixels acquired some optimum size of the field of view exists. In our case the beam inclination, generated by the deflection coils in order to reach an off-axis pixel, plays its role in the spotsize deformation, so that the optimum number of pixels N results directly from the configuration data as

N ≅ 3/(7 √k θc ω).   (58)
This number corresponds to the margin of the field of view, while toward the optic axis the spotsize diminishes linearly toward its original dimension for no tilt. Taking typical values for the very-low-energy range, say k = 1000 and θc = 2 mrad, and considering a 1° tilt, we get N = 407, i.e., a number on the edge of acceptability. An additional consequence of the tilt is the oblique impact of the primary beam, which is deflected by the lateral field component. Notice that this change in the impact angle is homogeneous within the field of view and has nothing in common with the inclination due to beam rocking around the pivot point of the scanning system. For a certain deceleration, defined by a value of k, we get some tilt angle ωmax for which the illumination becomes glancing:

ωmax ≅ (1/k) √[3(k − 1)/2].   (59)
The independence of ωmax of any other parameter offers an advantageous possibility of calibrating the true tilt scale according to the specimen bias causing a near-glancing impact, easily recognized from the long shadows of relief details. Otherwise, the impact angle can be assessed from Figure 44; this plot depends solely on EP, which is here 10 keV. Obviously, the impact angle is, except in the near-glancing situation, well approximated by ω√k. Even very small tilts of tenths of a degree can, at very low energies, secure a full scale of impact angles, which is important for applications connected with the acquisition of the diffraction or energy-band contrasts.
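The tilt relations are easy to explore numerically; the short Python sketch below (our own illustration) evaluates Equation (59) and the approximate impact angle ω√k for the immersion ratio used in the text, k = 1000:

```python
import math

def omega_max(k):
    """Tilt angle at which the illumination becomes glancing, Eq. (59)."""
    return math.sqrt(1.5 * (k - 1.0)) / k

def impact_angle(omega, k):
    """Approximate beam impact angle for a small specimen tilt omega,
    valid away from the near-glancing situation: alpha ~ omega * sqrt(k)."""
    return omega * math.sqrt(k)

k = 1000.0  # immersion ratio, e.g. EP = 10 keV and a landing energy of 10 eV
deg = math.pi / 180.0

print("omega_max = %.2f deg" % (omega_max(k) / deg))                          # ~2.2 deg
print("alpha(0.2 deg tilt) = %.1f deg" % (impact_angle(0.2 * deg, k) / deg))  # ~6.3 deg
```

With k = 1000 a mechanical tilt of only about 2° already produces glancing incidence, and a tilt of 0.2° yields a roughly 6° impact angle, consistent with the remark above that tilts of tenths of a degree span a full scale of impact angles at very low energies.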
FIGURE 44. Dependence of the beam impact angle on the landing energy for the specimen tilt angles shown (EP = 10 keV); exact values are shown together with those approximated by ω√k. (Reprinted with permission from Frank et al., 1999.)
The above analysis showed that even with the CL the specimen can be tilted sufficiently to observe phenomena or features that require a tilt for this purpose. Moreover, the mechanical tilts necessary for large impact angles of the beam are only tolerably destructive as regards the image quality.
VII. INSTRUMENTS
A. Adaptation of Conventional SEMs
As noted in the introduction, many attempts to design and build an SEM incorporating the retarding principle have been made, and the majority of them can be classified as adaptations of conventional instruments. Müllerová and Lenc (1992a) reviewed the older works and we will not repeat this literature survey here. This section discusses some recently applied approaches without aiming at a complete summary of the relevant publications. It is obvious that in a conventional SEM the way of providing beam deceleration in front of the specimen is usually restricted to biasing the specimen to a high negative potential. An additional electrode, placed above the specimen and electrically connected to it, enables one to arrange a nonzero working distance of an IOL, while without such an electrode we get
the CL. The latter alternative has proved successful in all the groups of application tasks listed below in Section VIII, and experience has been collected from adapting microscopes of all major manufacturers. After gaining preliminary experience with the configuration shown in Figure 35, the setup of Figure 37 was repeatedly installed because of its shorter working distance, easy alignment, and superior efficiency of signal collection, albeit at a significantly higher price. Up to now, in all versions YAG:Ce³⁺ single-crystal scintillators were used, with a thickness of 2 or 2.5 mm and an outer diameter between 10 and 20 mm. The light-guide shape is partly dictated by the arrangement of the OL polepiece and of the xy stage, but otherwise some space exists for optimizing the shape with respect to the light transfer from the scintillator to the side-attached light guide (see Schauer and Autrata, 1998). It is recommended to make the detector retractable so that the resulting restriction of the field of view is not retained in other operation modes. It is sufficient to provide specimen insulation for about 15 kV, while for practical microscopy a primary energy of 10 keV is most suitable. In this case the overall working distance should be not less than 8 mm, of which 5 mm is left for the CL field. In SEM instruments without specimen loading via an air-lock, it is usually easy to design an insulating insert cup for the standard specimen cartridge so that the specimen holder connects to a high-voltage contact via a pin passing along the stage axis. This design leaves intact all specimen movements, including rotation. With air-locks the specimen cartridge is usually side-inserted, so that some additional mechanism is needed for contacting the already loaded specimen from its lower side. Alternatively, the contact can be connected in the direction of the specimen loading and a flexible cable used between the specimen stage and the feedthrough.
These designs have to be tailored to a particular instrument and to its full setup with all options and attachments. The specimen biasing itself needs a negative supply with a ripple not exceeding 10⁻⁵ of the output voltage, finely adjustable in steps corresponding to 1 eV or less in landing energy. Because the supply is usually operated near its maximum output voltage, the parameters should be assessed at full scale. Generating the low landing energy of electrons via the difference between two high voltages is obviously not an optimum solution, and not only because the instabilities of the two supplies add; it would be more advantageous to apply a voltage directly between the gun cathode and the specimen. However, for CL applications this supply would have to be designed to pass smoothly through zero voltage to the opposite polarity, e.g., during alignment, a feature not available with commercial laboratory supplies. The alternative and better solution, consisting in using the beam booster (an insulated and positively biased tube around the optical axis between the
SCANNING LEEM
401
FIGURE 45. Possible design of a booster inserted into a magnetic pinhole lens. (Adapted with permission from Beck et al., 1995.)
gun anode and the bottom of the OL, see Figure 45), is mostly reserved for dedicated instruments, but an adaptation of this type to a conventional SEM has also been reported (Plies et al., 1998). Let us state explicitly an obvious fact: with a commercial SEM containing the booster, the adaptation to the CL mode might be restricted to mere removal of the final electrode of the IOL, or to its connection to the upper electrode; in this way the retarding field is shifted to the space between the end of the booster and the specimen.
B. Dedicated Equipment

To the authors' knowledge, among general-purpose SEM instruments marketed at present no IOL-containing dedicated device exists except the 1500 Series SEM of LEO Electron Microscopy with the Gemini lens shown in Figure 22 (see http://www.leo-usa.com). The new JSM-7400F of JEOL already incorporates specimen biasing, but only to 1 to 2 kV, and its declared purpose consists solely in accelerating the SE toward an electron converter where they are transformed to tertiary electrons detected by the upper detector (Kazumori, 2002). Probably all recent IC testers, i.e., specialized SEMs with very high beam current and a scope of operation modes tailored to the inspection of semiconductor structures, including measurement of critical dimensions and local voltages, generation and sensing of electron-beam-induced currents, and operation with surface charges compensated by controlled
402
MÜLLEROVÁ AND FRANK
FIGURE 46. High-probe-current, low-energy SEM column equipped with a single-pole condenser lens (SPCL) and a single-pole objective lens (SPIOL) (see text for details). (Reprinted with permission from Beck et al., 1995.)
back-streaming of SE, rely on immersion objective lenses (see, e.g., Meisburger et al., 1992; Miyoshi et al., 1999). Let us describe two of these solutions in more detail. Beck et al. (1995) summarized the problems connected with the formation of high-current low-energy probes and designed the dedicated column shown in Figure 46. It incorporates a central booster at +9 kV (consisting of the gun anode and electrodes 1 to 4), so that with the Schottky cathode at −1 kV and the specimen at ground, the primary beam is held at 10 keV throughout the column but within the CL is retarded to the final 1 keV. Using single-polepiece configurations both as the condenser lens (CS = 13.2 mm, CC = 11.5 mm) and as the objective lens (CS = 1.05 mm and CC = 0.83 mm at a
working distance of 8 mm and a focal length of 5 mm, when the retarding field is taken into account), they reached a calculated spot size of 46 nm and a measured resolution of 50 nm. The beam current was calculated to reach 150 nA but measured only 20 nA, which was ascribed to insufficient vacuum conditions in the gun chamber. The lower SE detector was used only with an unbiased booster, while in the low-energy mode the accelerated SE are deflected by the lower Wien filter outside electrode 2 so that they can impinge on the upper SE detector. Although the Wien filter did not deflect the primary beam, it caused dispersion, astigmatism, and some higher-order aberrations that required precompensation by the upper Wien filter with opposite orientation. Meisburger et al. (1992) used a primary beam at 20 keV in the column and decelerated it in front of the specimen, which was held at −19.2 kV together with the nearest electrode of the immersion objective lens; this lens was combined with a magnetic single-polepiece lens below the specimen. Again a resolution of 50 nm was achieved, at a landing energy of 800 eV with 50 nA in the probe. As above, two Wien filters were incorporated, of which the lower one deflected signal electrons to a semiconductor detector with a fiber-optic light guide, while the upper one incorporated in its electrostatic octupole the blanking unit, stigmator, and centering system. More instruments of the type described above can be found from major producers of SEM technology, and a specialized instrument industry for this application also exists. Nevertheless, critical details about equipment developed outside the academic community are very often confidential. Another family of dedicated instruments consists of laboratory equipment built for basic research tasks; although their development often began with a commercial microscope, the volume of modifications was much larger than that described in the previous section.
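The energy bookkeeping in these two retarding-field schemes is simple potential arithmetic. The following sketch (the function name is ours; the numerical values are those quoted above) illustrates how the column energy and landing energy follow from the electrode potentials:

```python
def beam_energies(cathode_kV, column_kV, specimen_kV):
    """Kinetic energy (in keV) of the beam inside the column and at specimen impact,
    for an electron starting from rest at the cathode potential."""
    in_column = column_kV - cathode_kV   # energy while travelling in the column/booster
    landing = specimen_kV - cathode_kV   # landing energy after retardation at the specimen
    return in_column, landing

# Beck et al. (1995): cathode at -1 kV, booster at +9 kV, grounded specimen
print(beam_energies(-1.0, 9.0, 0.0))      # -> (10.0, 1.0): 10 keV in column, 1 keV landing

# Meisburger et al. (1992): cathode at -20 kV, grounded column, specimen at -19.2 kV
print(beam_energies(-20.0, 0.0, -19.2))   # 20 keV in column, 0.8 keV landing
```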
One example is the UHV SEM working within the range 100 eV to 3 keV with a resolution of 60 nm at 250 eV, equipped with a LEED pattern detector consisting of two concentric hemispherical grids, two microchannel plates, and a two-dimensional detector. The specimen was typically inclined 45° toward the detector, which was, moreover, rotatable around the sample. Signal processing functions included selection of a diffraction spot for bright-field imaging and for various dark-field images. This device, based on a Hitachi S-800 but completed with equipment typical of surface analysis devices, including magnetically driven transfer from the air-lock, provided many interesting observations, e.g., of grains on polycrystalline Si, the step structure on a Si(111) surface, the domain structure on the reconstructed Si(110) 16×2 surface (Ichinokawa et al., 1987), and also of superstructures and of the movement of islands on an Au-evaporated Si(111) surface (Ichinokawa et al., 1986). In order to collect and process the image
FIGURE 47. UHV SLEEM for examination of clean and defined surfaces (the magnetically driven specimen transport is not attached to the air-lock on the far right-hand end).
data, the authors of these studies invested enormous effort in their electronic equipment, the quality of which was far below present standards. In the authors' laboratory a UHV SLEEM instrument has been developed, its design representing a combination of the ‘‘adapted SEM’’ outlined above (i.e., biased specimen and detector according to Figure 37) and the facilities usual in surface analysis equipment for the examination of clean and well-defined surfaces (clean UHV conditions of the order of 10⁻⁸ Pa, a separate preparation chamber with an ion gun for cleaning and sputtering the surface, and an attachment for evaporation of metals)— see Figure 47. The basic illumination system is a commercial electrostatic two-lens column with a Schottky cathode (2LE Column, FEI Company; see http://www.feibeamtech.com/pages/electron.html). The apparatus is intended to employ diffraction, interference, and energy-band contrasts and will be equipped with a two-dimensional LEED pattern detector, the design of which has not yet been finished, and with a parallel-operating electron
FIGURE 48. Double-tilt specimen stage for the device in Figure 47, permitting insertion of a specimen cartridge via the air-lock with five high-voltage connections to outside of the vacuum chamber.
energy analyzer of the hyperbolic type (Jacka et al., 1999), which is under preparation. As we saw in Section VI.E, even a tiny specimen tilt manifests itself in large beam impact angles at the lowest energies. In order to acquire full control over the impact direction, a double-tilt specimen stage is necessary, as shown in Figure 48. It features x and y movements of 5 mm, a z-axis movement of 10.5 mm, rotation of 8°, and two mutually perpendicular tilts of up to 5°. One prospective task is to combine the SLEEM method with surface microanalysis such as Auger spectromicroscopy. For this purpose a miniature all-electrostatic SLEEM column has been developed with the built-in detection part shown in Figure 34(b) (see Frank et al., 2000a). The whole device (see Figure 49) is 90 mm long and 45 mm in diameter, and at 5 keV primary energy, a probe current of a few nA, and a working distance of 5 mm, it provides a resolution of 30 nm (El-Gomati et al., 2002), which predicts a value around 100 nm at 10 eV. The column fits inside a cylindrical mirror analyzer for Auger electron spectroscopy, but in a separate installation it can be completely biased to a high positive potential, providing for the CL configuration with an earthed specimen. Bearing all the previous considerations in mind, we can now outline a design for the ideal dedicated SLEEM instrument for general
FIGURE 49. Electrostatic three-lens mini-column, equipped with Schottky cathode, two-stage quadrupole centering system, octupole stigmator/centering and built-in detector (see Figure 34(b)) with a six-segment collector. (Reprinted with permission from Frank et al., 2000a.)
very-low-energy SEM applications. Such a device would be similar to the setup shown in Figure 46, i.e., equipped with a positively biased booster and a CL with the specimen at ground potential. For the acquisition of diffraction and energy-band contrasts, a two-dimensional multichannel detector is desirable, onto which the full LEED pattern would be projected. In order to focus the diffraction pattern, we have to let the signal beam pass through some lens. This can be an IOL, and then deflection toward some kind of upper detector is necessary; here, however, the deflection assembly has to image its input plane onto the detector plane, so that behind a weak Wien filter the simple extraction field has to be replaced by a regular large-angle deflection unit. Altogether the device becomes relatively complicated, so that
a configuration with a single-polepiece lens inserted below the specimen (e.g., a miniature one with permanent magnets) seems more promising. Then the LEED pattern will be formed in the space between the specimen and the SEM column (Müllerová and Lenc, 1992b). Having ample free space available (see Figure 36), we can design the two-dimensional detector on various principles. Of course, the specular beam would still escape detection, but this could be avoided by allowing for a small specimen tilt. Naturally, a field-emission gun of high brightness is desirable, and by designing the specimen chamber as a UHV one we extend significantly the scope of possible applications.
C. Alignment and Operation

If a conventional SEM is adapted via insertion of the CL, the routine procedures for alignment and operation become modified. Let us now make a few remarks on this topic. A key issue is the strong electric field in the specimen chamber. The maximum applicable strength of the field between the specimen and the anode within the CL naturally depends on the quality of both surfaces. It is generally recommended to arrange the specimen holder or cartridge so that no sharp edges or protrusions appear on the side facing the anode, and to cover the specimen with a large flat cap made of a smooth foil, leaving the desired part of the specimen exposed for observation. It is good practice to start the experiments with every new specimen by ‘‘training’’ the specimen biasing, i.e., by a slow stepwise increase of the voltage with the rest of the microscope switched off. Good optical properties are obtained for sufficiently high values of the immersion ratio k, so that the primary energy should not be chosen below 5 keV, while 10 keV seems to be an optimum value. Thus, the field strength is mainly controlled via the working distance, and according to experience the range between 1 and 2.5 kV mm⁻¹ (i.e., l between 4 and 10 mm) usually suits the purpose. Naturally, when predicting the imaging parameters dependent on the working distance, we have to take into account the necessary underfocusing of the SEM column, connected with the CL optical power (see the next section). We now restrict ourselves to the configuration shown in Figure 37, which has proved optimum for the adaptation of general-purpose SEMs. Two points are important here, namely aligning the detector onto the optical axis and tuning the homogeneity of the CL retarding field. The detector alignment is a standard initial routine used not only when the detector is designed to be retractable, as was recommended above, but always before entering the SLEEM mode. This routine should be performed
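As a quick sanity check of the working-distance rule above, the mean field strength is simply the CL voltage drop over the gap. A minimal sketch (the function name is ours):

```python
def cl_field_strength(bias_kV, l_mm):
    """Mean retarding-field strength in the cathode lens, in kV/mm, for a
    specimen biased by bias_kV against a grounded anode at distance l_mm."""
    return bias_kV / l_mm

# A 10 kV bias over l = 4..10 mm spans the recommended 1 to 2.5 kV/mm range
for l in (4.0, 5.0, 10.0):
    print(f"l = {l:4.1f} mm -> {cl_field_strength(10.0, l):.2f} kV/mm")
```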
FIGURE 50. An example of the appearance of the microscope screen when the detector/anode assembly shown in Figure 37 is being adjusted onto the optic axis: (a) the upper surface of the YAG crystal with no specimen bias and only low BSE signal from a specimen composed of light elements; (b) the YAG surface combined with the specimen image in the bore, both images being defocused with the sharp image plane situated between them (specimen biased for an impact energy of 1 keV so that the SE signal dominates).
with a perfectly aligned column, particularly as regards the beam centering. The upper surface of the YAG crystal is treated in the same way as the lower detecting surface, so that at low magnifications the scintillator material around the central bore is struck by primary electrons from above and the YAG surface is visible on the microscope screen as a bright area with a black circular feature, which can then be easily mechanically adjusted to the screen center (see Figure 50). When the magnification is afterwards increased, we restrict the scanning range to within the detector bore and only the lower YAG surface remains active for electrons emitted from the specimen. This lower limit of magnification is usually between 250 and 500, which is one of the drawbacks of this detector type. Nevertheless, modern computer-controlled microscopes enable one to control every active element within the column easily, so if a larger field of view is needed, the pivot point of the deflection system can be shifted nearer to the scintillator. This change increases the OL aberrations because of an off-axis passage of the beam, but the corresponding drop in resolution should not be perceptible owing to the enlarged pixel size. A further step in the recommended procedure is to check whether the position of the detector bore moves on the screen when refocusing from the detector plane to the specimen plane visible inside the bore; if it does, the OL centering has to be improved. Next the specimen bias is increased in steps and the image astigmatism is corrected together with the shift
FIGURE 51. An example of the very-low-energy image in the center of the field of view with the rim of the mirror image connected with glancing electron impact and with reflection of the electrons in front of the surface; electron energies from the left: 2.5, 1, 0, and −2.5 eV.
of the field of view, which is eliminated via a slight specimen tilt. Continuing this until the specimen bias equals the gun voltage, we approach zero energy of electron impact, so that from the margin of the field of view the surface image starts to convert to the mirror image of an above-specimen equipotential surface. Now the most sensitive alignment of the CL field can be made by shifting the residual central area of the very-low-energy image into the screen center by means of small specimen tilts. If a double-tilt facility is not available, a single tilt can be combined with specimen rotation. Sometimes small corrections of the OL centering are made, but this is a pragmatic step not belonging to the consistent alignment procedure. This step of the alignment is illustrated in Figure 51. Roughly below 5 eV, some signal decrease in the center of the field of view starts to appear owing to signal escaping through the detector bore, as follows from Figure 38. At low magnification, the margin of the field of view appears (see the next section, Equation (60)) and shrinks with further reduction of the energy. Around the field of view, a rim of mirror image is seen where electrons are reflected on above-surface equipotentials. When zero energy is reached, the specimen surface acts as a planar mirror reflecting the lower scintillator surface with its central sink and bore, and this image can be focused. The signal deficit in the center of the field of view, analogous to the escape of the (00) diffracted beam, is inherent in coaxial detectors in general. In practice it is avoided by a slight specimen tilt causing only a small loss in resolution. A systematic solution requires a deflection unit and a side-positioned detector. In a well-aligned device, the very-low-energy image centered according to Figure 51 is seen, and the field of view does not move when the specimen bias is decreased throughout its full range.
In Figure 51, some residual ellipticity of the central spot is still visible, which indicates that the configuration is not perfectly axially symmetric.
In the course of the SLEEM mode operation it is always wise not to increase the specimen bias too fast and to use specimen movements, z-shift, tilt, and rotation in particular, very carefully with larger changes better made at a low bias. Interpretation of contrast observed below a few hundreds of eV is often not straightforward and is significantly facilitated when a series of micrographs of the field of view is available within the full energy scale. D. Practical Issues The optical power of the cathode lens influences the beam impact on the specimen and modifies characteristics like the image magnification, the working distance (when assessed according to excitation of the magnetic focusing lens), and the impact angle connected with beam rocking around the pivot point of the scanning system. Also an additional condition restricting the field of view appears, namely that connected with the increase of the impact angle to p/2 as is shown in Figure 51. While the restriction to the field of view and increase in the impact angle can be only recognized and considered when interpreting the image, corrections for focus and magnification should be incorporated into the microscope control software. This section aims at preparing algorithms for these corrections (see Zobacˇova´ et al., 2003). The basic equations were given by Mu¨llerova´ and Frank (1993) while Hutarˇ et al. (2000) solved the correction of magnification. Let the primary beam trajectory (see Figure 52) be initially directed into a point with radial coordinate r0 that in paraxial approximation is given as r0 ¼ (wS þ l ) , where wS is the axial coordinate of the virtual vertex of the scanning system. The field in the anode bore acts as a diverging lens of a focal length fA that moves the virtual vertex to z ¼ wS0 ¼ fAwS(wS – fA)1. The lens also enlarges ! 0 and in the paraxial approximation (i.e., when we can put tan ffi sin ffi ), we get
’ ¼ wS/wS0 . The homogeneous retarding field further deflects the oblique impacting ray along a parabolic trajectory so that it is easy to trace its radial coordinate and velocity; an interesting point is where the axial velocity falls to zero. This takes place on a fictitious ‘‘reflection surface’’ that intersects the specimen surface at a radial coordinate rmax defining the maximum size of the field of view to 4l 4l þ 3wS Vmax ¼ 2rmax ffi pffiffiffi : k 4l þ wS
ð60Þ
FIGURE 52. The primary beam trajectory inside the objective and cathode lenses.
In order to derive Equation (60), the focal length of the aperture lens (Lenc and Müllerová, 1992a),

fA = 4l k/(k − 1),   (61)
was, for near-zero landing energies, i.e., k ≥ 10⁴, approximated by 4l. Assuming wS = 25 mm (see below), l = 5 mm, and EP = 10 keV, we get Vmax = 0.42 mm at a landing energy of 1 eV and 130 μm at a mere 0.1 eV, i.e., quite acceptable figures. It is obvious from Figure 52 that the beam impact angle, which is initially equal to α and, owing to the aperture-lens action, enlarges to α′, increases further within the retarding field. Solving again the parabolic motion up to the specimen surface, we get for the final impact angle αCL the relation

tan αCL = √k sin α′ / √(1 − k sin²α′),   (62)
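The figures quoted for Vmax follow directly from Equation (60); a short script (our own naming, a sketch only) reproduces them:

```python
import math

def v_max_mm(l_mm, w_s_mm, k):
    """Maximum field-of-view size, Eq. (60): V_max = (4l/sqrt(k)) (4l + 3wS)/(4l + wS)."""
    return (4.0 * l_mm / math.sqrt(k)) * (4.0 * l_mm + 3.0 * w_s_mm) / (4.0 * l_mm + w_s_mm)

l, w_s, E_P = 5.0, 25.0, 10_000.0        # mm, mm, eV (values assumed in the text)
for E in (1.0, 0.1):                      # landing energy in eV
    k = E_P / E                           # immersion ratio
    print(f"E = {E:>3} eV: V_max = {v_max_mm(l, w_s, k) * 1000:.0f} um")
```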
i.e., this enlargement is roughly √k-fold. Substituting for α′ and then for fA from Equation (61), we find near the optic axis

αCL ≅ (r0 √k/(wS + l)) [1 + ((k − 1)/k)(wS/4l)].   (63)
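Equation (62) can be evaluated directly; for k = 10⁴ even a 1 mrad angle behind the aperture lens grows to about 0.1 rad at impact, illustrating the roughly √k enlargement. A minimal sketch (function name ours):

```python
import math

def impact_angle(alpha_prime_rad, k):
    """Final impact angle alpha_CL from Eq. (62):
    tan(alpha_CL) = sqrt(k) sin(alpha') / sqrt(1 - k sin^2(alpha'))."""
    s = math.sin(alpha_prime_rad)
    return math.atan(math.sqrt(k) * s / math.sqrt(1.0 - k * s * s))

print(impact_angle(1e-3, 1e4))   # ~0.1 rad, i.e. about sqrt(k) = 100 times larger
```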
For a tilted specimen, this impact angle, increasing linearly with the off-axis distance, combines with the uniform impact inclination shown in Figure 44. The above-described changes in the parameters of the field of view are inherent in the configuration, and although we can modify them to some extent by controlling the vertex position wS, actual correction for them is neither possible nor desired. On the contrary, changes in magnification and in the effective working distance should be automatically corrected in the control software. In the paraxial approximation, the coordinate rC of the impact point (see Figure 52) amounts to rC = (2lε + wS′)α′ with ε = √k/(1 + √k). Let us define the magnification correction factor M = r0/rC < 1, which can be used for updating the size of micron marks or the numerical magnification values. From the previous relations we can write this as

M = fA (wS + l)(√k + 1) / [fA wS (√k + 1) + 2l√k (fA − wS)].   (64)
It should be noted that M does not represent the CL magnification, which is, of course, not wS dependent. Substituting for fA from Equation (61), we get at k ≫ 1 the factor M ∈ (1/2, 2/3) within the full range of wS/l; the most often met value is M ≈ 0.6. Careful measurement of M showed that the approximation (61), derived for an abrupt change of the CL field in the anode plane, does not provide values of fA fitting the measured data with an accuracy sufficient for the purposes of, for example, critical-dimension measurement. Hence a more exact relation was derived, based upon modeling an anode field of finite thickness t within which both the axial potential and the electron trajectory follow parabolic curves. Lenc and Müllerová (1992b) used this approach when deriving the relations (42) and (43) for the CL aberrations. The result was

fA = 4l (k/(k − 1)) [2 Arth X/X + ln(1 − X²)/X²]⁻¹,  X = √((k − 1)/2k) (t/2l).   (65)
Finally, we will look at the magnitude of the underfocusing Δf that has to be applied to the magnetic OL when the cathode lens is excited. In a similar way as above, we find in the paraxial approximation that the surface point is imaged by the retarding field at a distance 2lε below the anode. The aperture lens in the anode plane further images this virtual crossover so that it appears near a point lying at a distance of l/3 below the specimen; more accurately, the axial shift of the focused probe is given by

Δf = l [fA (√k − 1) − 2l√k] / [fA (√k + 1) + 2l√k] > 0.   (66)

In order to obtain a rough quantitative estimate, we use fA from Equation (61) and k ≫ 1 and get Δf ≅ (l/3)[1 − 8/(3√k)]. This leads to a slope of the refocusing, necessary when the energy varies, expressed as

∂(Δf)/∂E ≅ −4l/(9√(EP E)).   (67)
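The refocusing slope of Equation (67) is easy to tabulate against landing energy; this sketch (our own naming) reproduces the magnitude quoted below:

```python
import math

def refocus_slope_mm_per_eV(l_mm, E_P_eV, E_eV):
    """Magnitude of d(Delta f)/dE from Eq. (67): 4l / (9 sqrt(E_P E)), in mm/eV."""
    return 4.0 * l_mm / (9.0 * math.sqrt(E_P_eV * E_eV))

# l = 5 mm, E_P = 10 keV: about 7 um/eV at a landing energy of 10 eV
for E in (1000.0, 100.0, 10.0, 1.0):
    slope_um = refocus_slope_mm_per_eV(5.0, 10_000.0, E) * 1000.0
    print(f"E = {E:7.1f} eV -> {slope_um:6.2f} um/eV")
```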
For example, with l = 5 mm and EP = 10 keV we get 7 μm eV⁻¹ at 10 eV. Equations (64) and (66), each in combination with Equation (65), represent the desired algorithms for the on-line correction of magnification and for the refocusing. Both algorithms contain three parameters, EP, l, and t, while M depends also on the vertex position wS. The values of t and wS cannot be directly measured; in fact they represent effective dimensions obtainable only by fitting experimental data to the model. This was done for one particular SLEEM arrangement, and the result was wS = 25.62 mm (with the OL aperture situated 27 mm above the anode) and t = 5.79 mm for an anode bore of D = 3.5 mm. The ratio t/D = 1.65 for an anode thickness of 2 mm corresponds accordingly to a ratio of 1.25 for a thin anode (Lenc and Müllerová, 1992b). Furthermore, for a broad range of variables the measured magnification factors M(l, E) fitted the above model with deviations below 1.9%, including the measurement errors (Hutař et al., 2000).
VIII. SELECTED APPLICATIONS

This section summarizes the results of some demonstration experiments in which the aim was to map the main features of SLEEM micrographs at low and very low energies for a particular family of specimen types and
to verify the feasibility of obtaining the types of contrast inherent in slow electrons. Only in Sections VIII.D and VIII.I do we quote results of more systematic studies.
A. Prospective Application Areas

First of all, let us briefly characterize the application areas in which the use of the SLEEM mode can contribute to progress in the solution of research tasks. It is well established that the examination of semiconductor structures, both as regards their geometry and critical dimensions and as regards the local voltages and currents, either initiated by powering the structure or induced by the electron beam, is best performed at beam energies around 1 keV. Dedicated IC testers use this energy, and many of them are equipped with some retarding-field element. The doping contrast is highest around 1 keV, too, and a further possibility is to use the elastically backscattered electrons at a tailored very low energy causing no damage. Nonconductors were for a long time observed below their critical energy, where only moderate charging takes place. This mode was, of course, surpassed by observation just at the critical energy, as described in Section VIII.D. However, a new approach has arisen, consisting in the controlled return of the fraction of SE needed to balance the charge, so that a noncharging situation can be secured at any energy, though still only below EC2. Detailed examination of surface topography is best made at an electron energy for which the interaction volume inside the specimen fits in size the relief protrusions. To a certain extent this remains valid even at energies for which the primary electrons do not penetrate below the escape depth of SE—if a raised feature is just filled with the interaction volume, even SE directed quite far from the surface normal might be emitted. For details smaller than 100 nm this means that the low-energy range must be used. Contrast from these small features replaces the topographical contrast of inclined facets and surface steps, which dominates at high energies.
Variations in the electron yield with crystal orientation culminate at a few hundreds of eV, which energy range is then optimum for observing grains in polycrystals, crystalline precipitates, and amorphized areas in crystals. The diffraction and interference contrast below, say, 20 or 30 eV reveals phenomena connected with surface crystallinity and its changes owing to surface reconstruction, adsorption, desorption, growth of layers, sublimation, diffusion, segregation, etc.
This list obviously covers selected but very numerous tasks from virtually any of the application fields of microstructure examination, both in materials science and the life sciences.
B. General Characteristics of Micrograph Series

Because of the lack of experience with the contrast appearing in micrographs below, say, 500 eV, it is good practice always to acquire (or at least with a new type of specimen) a series of micrographs beginning at the primary beam energy used and continuing with increased specimen bias, possibly up to near zero energy. When doing this, attention should be paid to preserving an identical field of view and to correcting for the magnification changes wherever this is not performed automatically. This micrograph series will show some characteristic features that include disappearance of the edge effect, transformations in the material contrast, and enhancement of the relief contrast. Further image changes with decreasing electron energy are then inherent to individual structure types. One example is seen in Figure 53, which shows the surface of a Cu polycrystal with the surface oxides and contaminants removed by chemical etching. While at 5 keV the image is strongly dominated by the edge effect appearing on steps made by etching along grain boundaries and also on other etch pits, at 200 eV these over-brightened features are not visible. Instead, the fine surface relief combined with the grain contrast appears most pronounced. At 10 eV the contrast of residual islands of
FIGURE 53. Surface of a polycrystalline Cu sheet etched in nitric acid, Tesla BS 340 SEM adapted for the SLEEM method, energies from the left 5 keV, 200 eV, and 10 eV; the width of the field of view is 70 μm. (Reprinted with permission from Müllerová and Frank, 1994.)
FIGURE 54. GaAs-based integrated circuit, JEOL JSM T220A SEM adapted for the SLEEM method, electron energies (a) 9800 eV, (b) 4300 eV, (c) 1400 eV, (d) 20 eV; the width of the field of view is 400 μm.
contamination is strongest; the mechanism would require further examination, surface microanalysis in particular, in order to be explained. Another typical example, shown in Figure 54, represents semiconductor structures. Here one can notice primarily the contrast changes caused by the decreasing penetration depth as interfaces between technological layers are crossed, which projects itself into variations in the BSE and SE2 yields. Possible dynamic effects, connected with the injection of electrons into interface states and the creation of space charges within the information depth, would also need further examination. At 20 eV the local charging and the surface details and defects are most obvious. This micrograph series also illustrates the uncorrected changes in the image magnification with decreasing electron energy. The third example, in Figure 55, consists of only a single micrograph representing a typical example of an unexpected contrast that appeared at low energies without being apparent at all at 10 keV. The dots arranged in rows on a cleaved GaAs crystal surface might represent islands of an oxide layer grown preferentially on crystal defects or on the edges of surface steps made during cleaving, but no reliable explanation is at hand. Again, surface microanalysis would greatly facilitate the interpretation.
FIGURE 55. ‘‘Decoration dots’’ on the fracture surface of a low-quality GaAs crystal, Tesla BS 343 SEM adapted for the SLEEM method, electron energy 250 eV; the width of the field of view is 20 μm.
C. Surface Relief

In all the series of micrographs in this section, a strong enhancement of the relief contrast is apparent when they are made within a broader energy range. On flat metal surfaces without any artificial structure, small relief details are best visible around 50 eV, where the penetration depth is shortest. In Figure 56, two frames from the first published series of micrographs taken throughout the full energy scale demonstrate this trend. We notice here that although the detector system shown in Figure 37, which was used to acquire the majority of the micrographs, is of the overhead type and should not produce any shadowing effects, in practice this is not entirely true. The scintillator is placed in an axially symmetric position, but the side-attached light guide breaks the symmetry, and the efficiency of light transport is not identical all over the scintillator crystal even if the optical contact is made properly. Owing to the strong acceleration within the CL field, the emitted electrons more or less preserve their off-axis coordinates, and species emitted from any one pixel impact the detector locally. Consequently, the side of the image situated below the light guide is the brightest (see Figures 50 and 51), and sometimes off-line corrections are needed at very low energies. In connection with this, larger surface facets inclined toward the light-guide direction might also exhibit a higher signal, and hence some kind of moderate shadowing is observed.
FIGURE 56. Chemically etched polycrystalline Ti sheet, Tesla BS 350 UHV SEM adapted for the SLEEM method, electron energies 15 keV (left) and 50 eV (right); the width of the field of view is 50 μm. (Reprinted with permission from Müllerová and Frank, 1993.)
D. Critical Energy Mode

In Section III.E, we discussed phenomena connected with charge accumulation in nonconductive specimens and saw that if the total electron yield curve σ(E) is taken as the process diagram, then spontaneous movement of the working point toward the critical energy EC2 qualitatively explains the observed effects. In addition, it was argued that when the electron energy is below EC2, the ultimate (positive) surface potential is reduced because of recapture of the slower part of the SE. It is important to recall that, irrespective of the initial impact energy, the charging process shifts the impact energy, under the influence of the fields of persisting charges, always toward EC2. This is, of course, connected with corresponding changes in the image signal; Figure 57 illustrates this for the case of positive charging up. This is a well-known effect encountered in the observation of nonconductors and can be exploited via a practical procedure consisting of a temporary increase of the image magnification and subsequent assessment of the signal level from the smaller field of view relative to its surroundings, which reveals in which direction the charging has changed the average emission (see Joy and Joy, 1996). The same approach forms the basis of an automatic method for determining EC2 (or, more exactly, the energy causing minimum damage to the image owing to charging), which is also outlined in Figure 57. The method (see Frank et al., 2001) consists in acquiring a temporal sequence of image signals from individual pixels from their first illumination onward, and in off-line determination of the integral under the S(t) curve, which can be taken as a measure of the total signal change caused by the charge accumulation. By plotting this quantity versus the initial impact
FIGURE 57. Scheme of the spontaneous time development of the image signal in the course of positive charging up: movement of the ‘‘working point’’ from the initial (EB) to the final (EF) impact energy (top), signal vs. time plot (bottom left), and the area below the S(t) curve (the charging rate) as a function of the initial impact energy (bottom right). (Reprinted with permission from Zobačová and Frank, 2003.)
energy, we can find its optimum value where the curve crosses the zero level. The peculiar behavior of this curve below EC2 was explained as a consequence of the SE being ‘‘focused’’ into the detector bore by the radial field component above the charged field of view surrounded by the noncharged specimen (Zobačová and Frank, 2003). The results, demonstrated in Figure 58, are more reliable for flat specimens exhibiting only moderate heterogeneity in conductivity and electron yields. The observation method described in this section requires modifications of the SEM control software that go beyond the scope of a simple adaptation made by the customer. It is mentioned here to demonstrate that, when working exactly at the critical energy, much better results can be achieved than at low energies in general. The same idea led to the above-described detection approach incorporating controlled return of a portion of the SE.

E. Diffraction Contrast

In Section IV.B, we dealt with electron backscattering from single crystals and hinted at the possibility of obtaining image contrast connected with locally varying fulfillment of the diffraction condition. This consists in getting a bright albeit defocused diffracted beam (or beams) incident on the detector, which, for the detector types described here, is achieved automatically except for the specular spot (00). Although at very low energies the reciprocal lattice is theoretically two-dimensional and bright spots are
FIGURE 58. Surface of writing paper, nonprocessed and uncoated, Tesla BS 343 SEM adapted for the SLEEM method, electron energies (a) 3650 eV, (b) 2650 eV (the critical energy EC2), and (c) 1850 eV; the width of the field of view is 40 μm. (Unpublished micrographs courtesy of M. Zadražil.)
received at any energy, in fact significant variations of the spot brightness with electron energy are always observed. Consequently, the eBSE signal from crystals is modulated along the energy scale according to the crystal orientation and the distance of a nearby reciprocal lattice point from the Ewald sphere. Additional features can also appear owing to effects going beyond the kinematical diffraction theory. The first test experiment was published and interpreted in detail by Frank et al. (1999). In Figure 59 we see micrographs taken at normal impact of the slow electron beam. One can compare the brightness of the rectangular (with (100) orientation) and triangular (with (111) orientation) Pb crystals on Si and verify that it varies with energy in different ways in the two cases; this difference can be correlated with the diffraction condition for the individual configurations. The interpretation of Figure 60, in which the micrographs are taken with the specimen tilted by a mere 1.3°, is much more sophisticated.
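The strong energy dependence described above follows already from simple kinematics: a beam (h, k) can leave the surface only while the electron wavelength λ[nm] ≈ √(1.504/E[eV]) does not exceed the corresponding lateral spacing. A minimal sketch (the square surface mesh and the 0.4 nm lattice constant are our illustrative assumptions, not values from the text):

```python
import math

def wavelength_nm(E_eV):
    """Nonrelativistic de Broglie wavelength of an electron, in nm."""
    return math.sqrt(1.504 / E_eV)

def allowed_beams(E_eV, a_nm):
    """List the (h, k) beams that can emerge from a square surface mesh
    of lattice constant a_nm at normal incidence: the kinematic condition
    sin(theta_hk) = lambda * sqrt(h^2 + k^2) / a must stay <= 1."""
    lam = wavelength_nm(E_eV)
    n_max = int(a_nm / lam)
    return [(h, k)
            for h in range(-n_max, n_max + 1)
            for k in range(-n_max, n_max + 1)
            if math.hypot(h, k) * lam <= a_nm]

a = 0.4  # nm, illustrative lattice constant
for E in (5, 50, 500):
    print(E, "eV:", len(allowed_beams(E, a)), "beams, lambda =",
          round(wavelength_nm(E), 3), "nm")
```

At a few eV only the specular (00) beam survives, while raising the energy opens successive beams; the intensity modulation discussed in the text rides on top of this purely geometric count.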
FIGURE 59. Flat Pb islands deposited in situ onto an Si (100) surface, JEOL JAMP 30 UHV SEM adapted for the SLEEM method, electron energies (a) 5, (b) 12.5, (c) 42.5, and (d) 378 eV, the width of the field of view is 60 μm. (Reprinted with permission from Frank et al., 1999.)
FIGURE 60. The same specimen and microscope as in Figure 59, the specimen tilted by approximately 1.3° in the direction inclined at 55° with respect to the horizontal line, electron energies from the top left by rows: 6.5, 7.5, 10.5, 16, 18, 22, 29, and 34.5 eV; the width of the field of view is 50 μm. (Reprinted with permission from Frank et al., 1999.)
Now, not only the crystal orientation and energy but also the impact angles, both polar and azimuthal, play a role, as they define the incident ray orientation with respect to the reciprocal lattice. Hence even crystals with an identical crystalline plane on the surface but mutually rotated exhibit specific behavior of the eBSE signal with energy. In connection with this experiment, the degree of the illumination coherence was also assessed according to the relations given in Section IV.C. The size DC of the coherently illuminated area was determined according to Equations (29), (31), and (32), and the condition (30) for the source size was also verified. The experimental data were taken as ΔE = 0.5 eV, EP = 10 keV, αC = 1 mrad, E = 10 eV, and dhk = 3 nm. Considering the OL demagnification to be 10 times, we have α0 = 0.1 mrad for the diaphragm illumination angle in Figure 15, and finally we get α = 30 mrad for the aperture angle at the specimen. The factors limiting DC then result as |s| = 15.5 nm, wE = 12 nm, and wα = 6.4 nm for φ = π/2, increasing to 9.1 nm at 45°. The real spot size was not measured but can be estimated to be between 10 and 20 nm. Obviously, the constructive interference took place at least over a major part of the primary spot, which means that at
favorable conditions the image signal was increased by a factor approaching the number of unit cells within the coherence area or within the crystal domain, whichever is smaller. Observations like these can obviously be made solely on very clean surfaces and under true UHV conditions. However, it should be underlined once more that these demands do not arise from the observation method employing very slow electrons; they condition the phenomena to be observed. Prospective applications of the diffraction and interference contrast mechanisms can be estimated from the huge variety of experimental results collected by means of the LEEM apparatus (see, e.g., Telieps and Bauer, 1985; Telieps, 1987; Bauer and Telieps, 1988; Tromp and Reuter, 1993; Tromp et al., 1993; Bauer, 1994; Tromp, 2000). A survey of references in this area can be found at http://www.leem-user.com.

F. Contrast of Crystal Orientation

In the previous section the examples showing a crystalline structure in very-low-energy micrographs concerned coherent backscattering, where the detected yield is increased by amplitude instead of intensity addition of the scattered waves. However, in Sections IV.B and IV.D we also mentioned the dependences of both BSE and SE yields on the crystal orientation and argued that these should become more pronounced at low energies. In fact, experiments showed that for metal polycrystals the grain contrast in SLEEM images is highest between 50 and 150 eV (see Figures 53 and 56). Another application of the same effect arises when amorphous and crystalline areas are to be distinguished. The example in Figure 61 presents a lattice of spots amorphized by laser beam exposure of a crystalline layer. Owing to this amorphization, a decrease in both SE and BSE signals can generally be expected, so the brighter centers of the dots, caused by increased laser beam power, need to be examined in more detail.
Figure 61 illustrates the enhanced sensitivity of slow electrons to spurious a.c. electromagnetic fields: the vertical stripes are caused by an excessively high amplitude of the 50 Hz interference from the SEM electronic console.

G. Layered Structures

One trivial consequence of lowering the impact energy is that thin surface layers that were transparent at high energies become opaque, so their structure can be observed. The example in Figure 62 shows a trilayer structure prepared for exploration of the backscattering factor in Auger
FIGURE 61. Structure created by laser beam exposure of microdots (with various beam intensities and exposure times) in a Pt3Si layer made on a glass substrate, dot pitch 2.9 μm; Tesla BS 343 SEM adapted for the SLEEM method, electron energy 200 eV. (Specimen provided by Dr. H. Birecki, HP Labs; reprinted with permission from Müllerová, 1996.)
FIGURE 62. A patterned multilayer structure consisting of islands of a 500 nm thick Au layer (right) on a Si substrate, partially covered with a 200 nm thick layer of GeSi (top left), JEOL JSM T220A SEM adapted for the SLEEM method, electron energies 9800 eV (left) and 850 eV (right), the width of the field of view is 300 μm. (Specimen provided by Professor M.M. El-Gomati, University of York, UK.)
spectromicroscopy (El-Gomati et al., 1992), with an obvious demonstration of this effect. Specimens of semiconductor devices in plan view, like that in Figure 54, exhibit the same features, but combined with other effects, and hence are not so striking.
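The transparency argument can be made semi-quantitative with the empirical Kanaya–Okayama range — our illustrative choice, not a formula used by the authors, and increasingly rough below about 1 keV. A sketch for the Au layer of Figure 62:

```python
def ko_range_nm(E_keV, A, Z, rho):
    """Kanaya-Okayama electron range in nm: R [um] =
    0.0276 * A * E^1.67 / (Z^0.889 * rho), E in keV, rho in g/cm^3.
    Empirical; used here only to show the trend with energy."""
    return 1000.0 * 0.0276 * A * E_keV**1.67 / (Z**0.889 * rho)

# Au: A = 197, Z = 79, rho = 19.3 g/cm^3 (the 500 nm layer of Figure 62)
for E_keV in (9.8, 0.85):
    print(f"Au range at {E_keV} keV: "
          f"{ko_range_nm(E_keV, 197, 79, 19.3):.0f} nm")
```

The range collapses from hundreds of nanometers near 10 keV to a few nanometers at 850 eV, which is the sense in which a layer that was transparent becomes opaque.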
FIGURE 63. A beveled cross-section cratered by oblique impact of a low-energy ion beam across a multilayer structure consisting of 12 pairs of 100 nm GaAs/63 nm AlAs layers; Tesla BS 343 SEM adapted for the SLEEM method, electron energies from the top left by rows: 20, 30, 40, 130, 430, and 2430 eV; the width of the field of view is 600 μm. (Specimen provided by Dr. J. Kováč, TU Bratislava, Slovakia, preparation by Dr. A. Barna, KFKI Budapest.)
Unlike the previous example, the structure in Figure 63 produces contrasts that are not so easy to understand. The beveled section of a multilayer, composed of two alternating semiconductors, shows the outcrops of the layers of one material (GaAs instead of AlAs) at a strongly elevated contrast within a certain energy interval; in addition, three stripes of different intensity instead of two can be distinguished repeating periodically across the structure. One boundary of the ‘‘extra’’ bright stripe, namely that next to the dark part corresponding to the thicker wedge, is not sharp, which indicates that the contrast source might be buried. Moreover, the effect, i.e., both the contrast enhancement and the formation of the third, fuzzy stripe, is clearly of a dynamical nature, as can be seen from Figure 64: none of these features appears at the lowest electron dose, while both progressively emerge with increased current as well as with prolonged frame time. Similar effects were observed with Mo/Si multilayers (Müllerová et al., 1997) but were absent in metallic multilayers such as Ni/Cr. The phenomenon will be studied further because the provisional interpretation, relying upon the influence of charges trapped in the interface states and forming a buried space charge layer, needs to be
FIGURE 64. The same specimen and microscope as in Figure 63; electron energy 450 eV, the width of the field of view is 600 μm, primary beam current 0.2 nA (upper row) and 80 pA (lower row), frame times from the left 3, 11, 30, and 83 s. (Reprinted with permission from Müllerová et al., 1997.)
supported by more experimental data; furthermore, a complete model of the contrast mechanism, even a qualitative one, is not yet available.
H. Material Contrast

The absence of the monotonic material contrast in the BSE emission, i.e., of the direct proportionality η ∝ Z available at conventional beam energies in SEM, is characteristic of the low-energy range. This fact is obvious from the η(E) plots in Figure 13 for clean material surfaces. A comparison of clean and ‘‘real’’ surfaces in Figure 14 indicates that, under standard vacuum conditions and on specimens without any special treatment, some residual traces of this contrast can be observed down to about 1 keV. Below 1 keV, any relations between the BSE yields of different materials have to be specifically reconsidered. As Figure 29 shows, even the contrast between gold and carbon, otherwise representing the extreme in this respect, is inverted or at least disappears at 20 eV, where the eBSE emission already dominates. When following a particular combination of materials throughout the energy scale, even more than one inversion can be registered; Figure 65 shows two of them for the Cu/Si combination, and both are met at energies for which the Cu layer is far from being penetrated, so that no alternative explanation is possible.
FIGURE 65. Islands of a 300 nm thick Cu layer deposited onto the Si substrate through a mask exposed by electron beam lithography, period of the squares is 10 μm; Tesla BS 340 SEM adapted for the SLEEM method, electron energies from the top left by rows: 5000, 500, 250, 100, 50, and 10 eV. (Specimen provided by Mgr. F. Matějka, ISI Brno, Czech Republic.)
On the other hand, for a particular pair of materials that exhibit only moderate contrast at high energies because of a small difference in atomic numbers, an energy can be found in the low-energy range at which a much enhanced contrast is available (Müllerová, 2001). Figure 65 also illustrates the consequences of the bad practice of performing the alignment, stigmation, and focusing inside the field of view selected for the final frame. Rectangles of the graphitic layer of contaminants, which are always formed on specimens but at high energies are usually transparent enough, heavily damage the images at low energies, particularly around 100 eV, albeit the sign of the material contrast remains preserved.
I. Electronic Contrast in Semiconductors

Observation of doped areas with respect to the semiconductor substrate, both in plan view and on cleaved cross-sections, is one of the major tasks of microscopists, imposed by the semiconductor industry, which is faced with requests for continued shrinking of feature sizes and increase of throughput. Several times we have recalled the instrumentation branch of IC testers, represented by low-energy SEMs with special sophisticated attachments. However, the basic question, how to get the best visualization
of the doped areas and what is the correct contrast interpretation, does not seem to have been definitively answered so far. It is obvious that no material contrast can reveal dopant concentrations as low as 10¹⁶ to 10¹⁹ cm⁻³ in a matrix of 5 × 10²² cm⁻³ of silicon atoms. Still, successful observations have been made since the mid-1990s and interpreted via the electronic contrast mechanism. Müllerová et al. (2002) reviewed the previous studies and summarized the present understanding of the dopant contrast. The main points are that this contrast is observed in the SE emission and reaches up to a 10% level when calculated from the equation

Cp/n = (Sp − Sn) / Sn (68)

with Sp and Sn as the mean signal levels in p- and n-type areas, respectively; that p-type generally appears brighter than n-type; and that Cp/n grows toward low energies. Figure 66 represents the main ideas of the contrast model described by Sealy et al. (2000), which relies upon differences in the ionization energy, i.e., the distance between the valence band top and the vacuum level. Because the tiny content of dopant cannot change this characteristic, the ionization energies Ep and En are considered identical, but the local ‘‘vacuum’’ level varies in the model, being then balanced via above-surface patch fields created by surface dipoles of nonconstant density. When the patch fields disappear at a distance comparable with the sizes of the doped areas, some average reference energy level is progressively reached sufficiently far from the specimen. A consequence is that electrons to be emitted from the n-type area have to surmount a barrier higher by some ΔEn. The flat-band situation, shown in Figure 66, is modified when the presence of surface states is taken into account, namely so that the band bending causes a drop in ΔEn and hence a contrast decrease. For the Fermi level pinned mid-gap by a high density of surface states, no contrast should be observed. The SLEEM observations were made on a boron-doped p-type patterned structure fabricated in an n-type Si substrate using two instruments with considerably different vacuum conditions (Müllerová et al., 2002). The experiments (see Figure 67) confirmed the basic premises of the model, i.e., no BSE contrast and a moderate contrast in the SE emission. However, the most important finding was that a significant increase in contrast was registered in the SLEEM mode. Careful contrast quantification verified the high contrast for the specimen inserted into the CL and revealed that even the vacuum conditions play a very important role: at a standard vacuum of the
FIGURE 66. Combined band structures of p and n regions in the same specimen, with no influence of surface states assumed.
FIGURE 67. Boron-doped (1 × 10¹⁹ cm⁻³) p-type patterns on an n-type phosphorus-doped (4 to 6 × 10¹⁴ cm⁻³) Si (111) substrate: (a) BSE image at 10 keV, (b) SE image at 10 keV, (c) SLEEM image at EP = 10 keV, E = 1 keV; Tescan Vega 5130 SEM adapted for the SLEEM method, the width of the field of view is 350 μm for (a) and (b) and 500 μm for (c). (Specimen provided by Ing. B. Nečasová, Tesla Sezam, Inc., Rožnov p/R, Czech Republic.)
order of 10⁻³ to 10⁻⁴ Pa the contrast clearly surpasses that obtained under clean UHV conditions (Figure 68). The existence of the extremely high contrast for a specimen immersed in a moderate electric field not exceeding 2 V μm⁻¹, i.e., weaker than the fields normally applied to semiconductor structures under operation, and the enhancement under
FIGURE 68. The electron energy dependence of the SLEEM image contrast between p and n areas for the specimen shown in Figure 67; (A) dedicated UHV SLEEM microscope (see Section VII.B), (B) standard vacuum conditions, (C) SE signal from a standard ET detector. ((B) and (C) from Tescan Vega 5130 SEM adapted for the SLEEM method.)
FIGURE 69. The p/n contrast measured in the SLEEM mode for constant impact energy E ¼ 1 keV but variable primary energy EP.
routine vacuum conditions are facts very promising for application of the SLEEM method in semiconductor diagnostics and testing. The influence of the CL field is further illustrated by Figure 69, which shows directly the contrast dependence on the field strength. The low-field limit obviously fits the contrast level achieved with the standard ET detector (see Figure 68). The crucial role of the vacuum conditions clearly indicates that basing the contrast interpretation on the notion of a clean crystal surface is not correct. Further experiments showed that the contrast
could be manipulated and even inverted by coating the structure with metals of various work functions. On this basis a new model was proposed (El-Gomati et al., 2003) that considers the surface to be covered by a graphitic layer of contaminants with quasimetallic properties, so that a metal–semiconductor junction is formed beneath the surface. The subsurface fields connected with the junction successfully explain the observed phenomena even in cases when no patch field can be created, for example with a metalized surface that has to be taken as an equipotential.
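The contrast quantification used throughout this section, Equation (68), is straightforward to evaluate on image data. A minimal sketch with synthetic pixel values (function and mask names are hypothetical):

```python
import numpy as np

def dopant_contrast(img, p_mask, n_mask):
    """Equation (68): C_p/n = (S_p - S_n) / S_n, with S_p and S_n the
    mean signal levels over the p- and n-type regions of the image."""
    S_p = img[p_mask].mean()
    S_n = img[n_mask].mean()
    return (S_p - S_n) / S_n

# synthetic frame: n-type background at 100 counts, p-type patch 10% brighter
img = np.full((64, 64), 100.0)
p_mask = np.zeros(img.shape, dtype=bool)
p_mask[16:48, 16:48] = True
img[p_mask] = 110.0
n_mask = ~p_mask
print(dopant_contrast(img, p_mask, n_mask))  # 0.1, i.e., a 10% contrast
```

In practice the masks would be drawn over known p- and n-type regions of the micrograph, and the measurement repeated along the energy scale to produce curves like those of Figure 68.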
J. Energy-Band Contrast

In Section III.A.2, we described the reflection of very slow electrons at energy gaps, i.e., a contrast mechanism quite exotic from the point of view of SEM practice. In Figure 5 this was illustrated by the measured energy dependences of the (00) spot intensity for two crystal orientations of tungsten. However, demonstration of this contrast in an SEM micrograph is difficult: any bicrystal and/or polycrystal specimen exhibits a combination of contrasts caused by phenomena anisotropic with respect to the crystal orientation, so that reliably extracting this contribution is problematic. One exception is a semiconductor structure with patterned doping, observed in plan view. A clean semiconductor surface can be believed to possess identical properties on the doped pattern as well as on the substrate, and the same holds for the crystal orientation in the sense that a small amount of dopant cannot change the electron yields. Nevertheless, additional impurity levels in the energy-band structure, namely those appearing in the energy gaps, can manifest themselves via this contrast mechanism. If such an energy level is hit, electrons penetrate into the doped pattern but not into the surrounding substrate, so that the pattern appears dark. The first successful observation was announced by Müllerová et al. (2001) and is shown in Figure 70. A signal decrease is apparent in the micrographs taken at 3 and 1 eV, and very pronouncedly in the 0.5 eV frame. This first experience has proved that this type of contrast depends strongly on even a tiny mechanical tilt of the specimen incorporated into the cathode lens. Figure 70 was taken with a provisional specimen stage with no tilt facility. Hence no true CL field alignment was possible, and the influences of inhomogeneity of the retarding field could be compensated only by a suitable misalignment of the objective lens, which resulted in lowered resolution and enhanced axially nonsymmetric aberrations.
FIGURE 70. A p-type rectangle on the specimen shown in Figure 67, the SLEEM image at the electron energies from the top left by rows: 7, 4, 3, 2, 1, and 0.5 eV; dedicated UHV SLEEM microscope (see Section VII.B), the width of the field of view is 70 μm.
IX. CONCLUSIONS

The element of instrumentation common to the history of the work summarized in this text, the cathode lens, is in fact a very simple and very old assembly that can easily be incorporated into any electron optical device. In spite of this, it took more than 10 years before it started to appear frequently in the titles of papers in journals devoted to scanning electron microscopy and its applications. The authors of this review feel a certain satisfaction about this development and about the forthcoming commercial devices containing this attachment, which may belong to the family of dedicated instruments for IC technologies or even to general-purpose SEMs. Progress in this direction can break the ‘‘magic ring’’: the increasing number of instruments will expand the community of users, who will quickly extend the application fields, and so on. A UHV version of the instrument, equipped with devices for surface microanalysis methods, opens the way to examination of the fascinating physical phenomena taking place on crystal surfaces that were revealed by the LEEM method. The scanning counterpart can take advantage
of multiple signal acquisition and simultaneous compilation of separate image slices for individual diffraction spots, possibly even complemented by additional signals. More experienced users with some technical background can introduce the method into their commercial SEM instruments with an effort comparable to embarking on any other small nonstandard adaptation. For a booster-equipped SEM the adaptation might be quite trivial. In the near future, the first commercial SEM with the CL mode among its standard operation routines is expected. But, as for any other experimental method, future progress will also depend on its usefulness to a sufficiently broad community of users.
ACKNOWLEDGMENTS

This chapter reviews a major part of the work of the authors' team since the beginning of the 1990s. In the course of this time several particular projects were brought to a successful conclusion with the support of the Grant Agency of the Czech Republic and of the Grant Agency of the Academy of Sciences of the Czech Republic. The final period was supported by the GA ASCR grant no. A1065901. The results presented were naturally obtained in collaboration with other team members, both present and past, in particular Dr. Martin Zadražil, Mr. Pavel Klein, and Mr. Mojmír Sirný. The participation of other members of the Institute of Scientific Instruments of ASCR in Brno and the Institute's background in general were crucial for the whole long-term program. External cooperation was most intensive with Professor E. Bauer (TU Clausthal, Germany, and later Arizona State University) and with Professor M.M. El-Gomati (University of York, UK). The authors express their profound gratitude to all who helped them in their work. The final manuscript was compiled during a fruitful stay at the University of Toyama, Japan, for which sincere thanks are due to Professors S. Ikeno and M. Shiojiri and to Dr. K. Matsuda.
REFERENCES

Autrata, R. (1989). Backscattered electron imaging using single crystal scintillator detectors. Scanning Microsc. 3, 739–763.
Autrata, R., Hermann, R., and Müller, M. (1992). An efficient single crystal BSE detector in SEM. Scanning 14, 127–135.
Autrata, R., and Hejna, J. (1991). Detectors for low voltage scanning electron microscopy. Scanning 13, 275–287.
Autrata, R., and Schauer, P. (1994). Behaviour of planar and annular YAG single crystal detectors for LVSEM operation, in Proceedings of the Thirteenth International Congress on Electron Microscopy, Vol. 1, edited by B. Jouffrey and C. Colliex. Les Ulis, France: Les éditions de Physique, pp. 71–72.
Autrata, R., and Schauer, P. (1998). Single crystal scintillation detectors for LVSEM, in Proceedings of the Fourteenth International Congress on Electron Microscopy, Vol. 1, edited by H. A. Calderon Benavides and M. J. Yacaman. Bristol, UK: Institute of Physics, pp. 437–438.
Barth, J. E., and Kruit, P. (1996). Addition of different contributions to the charged particle probe size. Optik 101, 101–109.
Bartoš, I., van Hove, M. A., and Altman, M. S. (1996). Cu (111) electron band structure and channeling by VLEED. Surface Sci. 352–354, 660–664.
Bauer, E. (1962). Low energy electron reflection microscopy, in Proceedings of the Fifth International Congress on Electron Microscopy, Vol. I, edited by S. S. Bresse, Jr. New York: Academic Press, pp. D11–12.
Bauer, E. (1994). Low energy electron microscopy. Rep. Progr. Phys. 57, 895–938.
Bauer, E., and Telieps, W. (1988). Emission and low-energy reflection electron microscopy, in Surface and Interface Characterization by Electron Optical Methods, edited by A. Howie and U. Valdrè. New York: Plenum Press, pp. 195–233.
Bauer, H. D. (1979). Messungen zur Energieverteilung von Rückstreuelektronen an polykristallinen Festkörpern. Exp. Techn. Phys. 27, 331–344.
Bauer, H. E., and Seiler, H. (1984). Determination of the non-charging electron beam energies of electrically floating metal samples, in Scanning Electron Microscopy, Vol. III, edited by O. Johari. Chicago: SEM, pp. 1081–1088.
Beck, S., Plies, E., and Schiebel, B. (1995). Low-voltage probe forming columns for electrons. Nucl. Instrum. Methods Phys. Res. A 363, 31–42.
Berry, V. K. (1988). Characterization of polymer blends by low voltage scanning electron microscopy. Scanning 10, 19–27.
Bindi, R., Lanteri, H., and Rostaing, P. (1980). A new approach and resolution method of the Boltzmann equation applied to secondary electron emission by reflection from polycrystalline aluminum. J. Phys. D: Appl. Phys. 13, 267–280.
Bleloch, A. L., Howie, A., and Milne, R. H. (1989). High resolution secondary electron imaging and spectroscopy. Ultramicroscopy 31, 99–110.
Bode, M., and Reimer, L. (1985). Detector strategy for a single-polepiece lens. Scanning 7, 125–133.
Böngeler, R., Golla, U., Kässens, M., Reimer, L., Schindler, B., Senkel, R., and Spranck, M. (1993). Electron-specimen interactions in low-voltage scanning electron microscopy. Scanning 15, 1–18.
Born, M., and Wolf, E. (1975). Principles of Optics. Oxford: Pergamon Press.
Brown, A. C., and Swift, J. A. (1974). Low voltage scanning electron microscopy of keratin fibre surfaces, in Scanning Electron Microscopy, edited by O. Johari. Chicago: SEM, pp. 68–74.
Bruining, H. (1954). Physics and Application of Secondary Electron Emission. Oxford: Pergamon Press.
Brunner, M., and Schmid, R. (1987). Characteristics of an electric/magnetic quadrupole detector for low voltage scanning electron microscopy. Scanning Microsc. 1, 1501–1506.
Burns, J. (1960). Angular distribution of secondary electrons from (100) faces of Cu and Ni. Phys. Rev. 119, 102–114.
Buseck, P., Cowley, J., and Eyring, L. (1988). High-Resolution Transmission Electron Microscopy and Associated Techniques. London: Oxford University Press.
Cailler, M., Ganachaud, J. P., and Bourdin, J. P. (1981). The mean free path of an electron in copper between two inelastic collisions. Thin Solid Films 75, 181–189.
Cazaux, J. (1986). Some considerations on the electric field induced in insulators by electron bombardment. J. Appl. Phys. 59, 1418–1430.
Cazaux, J. (1996a). Electron probe microanalysis in insulating materials: quantification problems and some possible solutions. X-ray Spectrom. 25, 265–280.
Cazaux, J. (1996b). The electric image effects at dielectric surfaces. IEEE Trans. Diel. Electr. Insul. 3, 75–79.
Cazaux, J. (1999). Some considerations on the secondary electron emission, δ, from e⁻ irradiated insulators. J. Appl. Phys. 85, 1137–1147.
Cazaux, J., and Le Gressus, C. (1991). Phenomena relating to charge in insulators: macroscopic effects and microscopic causes. Scanning Microsc. 5, 17–27.
Cazaux, J., and Lehuede, P. (1992). Some physical descriptions of the charging effects of insulators under incident particle bombardment. J. El. Spectrosc. Rel. Phenom. 59, 49–71.
Cazaux, J., Kim, K. H., Jbara, O., and Salace, G. (1991). Charging effects of MgO under electron bombardment and nonohmic behaviour of the induced specimen current. J. Appl. Phys. 70, 960–965.
Chung, M. S., and Everhart, T. E. (1974). Simple calculation of energy distribution of low-energy secondary electrons emitted from metals under electron bombardment. J. Appl. Phys. 45, 707–709.
Czyzewski, Z., and Joy, D. C. (1989). Fast Monte Carlo method for simulating electron scattering in solids. J. Microscopy 156, 285–291.
Czyzewski, Z., MacCallum, D. O., Romig, A., and Joy, D. C. (1990). Calculations of Mott scattering cross-section. J. Appl. Phys. 68, 3066–3072.
Dahl, D. A. (1995). SIMION 3D Version 6.0, in Proceedings of the Forty-Third ASMS Conference on Mass Spectrometry and Allied Topics. Santa Fe, NM: American Society for Mass Spectrometry, p. 717.
Dekker, A. J. (1958). Secondary electron emission. Solid State Phys. 6, 251–315.
Delong, A., and Drahoš, V. (1971). Low-energy electron diffraction in an emission electron microscope. Nature Phys. Sci. 230, 196–197.
Dietrich, W., and Seiler, H. (1960). Energieverteilung von Elektronen, die durch Ionen und Elektronen in Durchstrahlung an dünnen Folien ausgelöst werden. Z. Angew. Physik 157, 576–585.
Ding, Z.-J. (1990). Fundamental studies on the interactions of kV electrons with solids for application to electron spectroscopies. Ph.D. Thesis, Osaka University, Japan.
Ding, Z.-J., and Shimizu, R. (1996). A Monte Carlo modeling of electron interaction with solids including cascade secondary electron production. Scanning 18, 92–113.
Drescher, H., Reimer, L., and Seidel, H. (1970). Rückstreukoeffizient und Sekundärelektronen-Ausbeute von 10–100 keV Elektronen und Beziehungen zur Raster-Elektronenmikroskopie. Z. Angew. Physik 29, 331–336.
Drucker, J., Scheinfein, M. R., Liu, J., and Weiss, J. K. (1993). Electron coincidence spectroscopy studies of secondary and Auger-electron generation mechanisms. J. Appl. Phys. 74, 7329–7339.
Egerton, R. F. (1986). Electron Energy-Loss Spectroscopy in the Electron Microscope. New York: Plenum Press.
El-Gomati, M. M., Barkshire, I., Greenwood, J., Kenny, P., Roberts, R., and Prutton, M. (1992). Compositional imaging in scanning Auger microscopy, in Microscopy: The Key Research Tool, edited by C. Lyman. Chicago: Electron Microscopy Society of America, pp. 29–38.
SCANNING LEEM
435
El-Gomati, M. M., Romanovsky´, V., Frank, L., and Mu¨llerova´, I. (2002). A very low energy electron column for surface studies, in Proceedings of the Fifteenth International Congress on Electron Microscopy, Vol. 3, edited by R. Cross, J. Engelbrecht, T. Sewell, M. Witcomb, and P. Richards. Onderstepoort: Microscopy Society of Southern Africa, pp. 323–324. El-Gomati, M. M., Wells, T. C. R., Mu¨llerova´, I., Frank, L., and Jayakody, H. (2003). Why is it that diferently doped regions in semiconductors are visible in low voltage SEM? IEEE Trans. Electron Devices, submitted. Everhart, T. E., and Chung, M. S. (1972). Idealized spatial emission distribution of secondary electrons. J. Appl. Phys. 43, 3708–3711. Everhart, T. E., and Thornley, R. F. M. (1960). Wideband detector for micro-micro-ampere low electron currents. J. Sci. Instrum. 37, 246–248. Everhart, T. E., Saeki, N., Shimizu, R., and Koshikawa, T. (1976). Measurement of structure in the energy distribution of slow secondary electrons from aluminum. J. Appl. Phys. 47, 2941–2945. Ezumi, M., Otaka, T., Mori, H., Todokoro, H., and Ose, Y. (1996). Development of critical dimension measurement scanning electron microscope for ULSI (S-8000 series). Hitachi Instrument News Electron Microscopy Edition 30, 15–21. Feldman, L. C., and Mayer, J. W. (1986). Fundamentals of Surface and Thin Film Analysis. New York: Elsevier. Fitting, H.-J., Schreiber, E., Kuhr, J.-Ch., and von Czarnowski, A. (2001). Attenuation and escape depths of low-energy electron emission. J. El. Spectrosc. Rel. Phenom. 119, 35–47. Fourie, J. T. (1976). Contamination phenomena in cryopumped TEM and ultrahigh vacuum field-emission STEM systems, in Scanning Electron Microscopy, Vol. I, edited by O. Johari. Chicago: SEM, pp. 53–60. Fourie, J. T. (1979). A theory of surface-originating contamination and a method for its elimination, in Scanning Electron Microscopy, Vol. II, edited by O. Johari. Chicago: SEM, pp. 87–102. Fourie, J. T. (1981). 
Electric effects in contamination and electron beam etching, in Scanning Electron Microscopy, Vol. I., edited by O. Johari. Chicago: SEM, pp. 127–134. Frank, L. (1992a). Towards electron beam tomography, in Proceedings of the Tenth European Congress on Electron Microscopy, Vol. 1, edited by A. Rios, J.M. Arias, L. Megias-Megias, and A. Lopez-Galindo. Granada: Univ. de Granada, pp. 141–142. Frank, L. (1992b). Experimental study of electron backscattering at interfaces. Surface Sci. 269/270, 763–771. Frank, L. (1996a). Real image resolution of SEM and low-energy SEM and its optimization. Ultramicroscopy 62, 261–269. Frank, L. (1996b). Width of the SEM and LESEM response function as a tool for the image resolution assessment, in Proceedings of the Ninth Conference on Electron Microscopy of Solids, edited by A. Czyrska-Filemonowicz. Krako´w: State Committee for Scientific Research, pp. 109–112. Frank, L. (2002). Advances in scanning electron microscopy, in Advances in Imaging and Electron Physics, Vol. 123, edited by P. W. Hawkes. San Diego: Academic Press, pp. 327–373. Frank, L., and Mu¨llerova´, I. (1994). Zero-charging electron microscopy in a cathode lens equipped SEM, in Proceedings of the Thirteenth International Congress on Electron Microscopy, Vol. 1, edited by B. Jouffrey and C. Colliex. Les Ulis, France: Les e´ditions de Physique, pp. 139–140. Frank, L., and Mu¨llerova´, I. (1999). Strategies for low- and very-low-energy SEM. J. El. Microsc. 48, 205–219. Frank, L., Mu¨llerova´, I., and El-Gomati, M. M. (2000a). A novel in-lens detector for electrostatic scanning LEEM column. Ultramicroscopy 81, 99–110.
436
MU¨LLEROVA´ AND FRANK
Frank, L., Mu¨llerova´, I., and El-Gomati, M. M. (2002). SEM visualization of doping in semiconductors, in Proceedings of the Fifteenth International Congress on Electron Microscopy, Vol. 1, edited by R. Cross, J. Engelbrecht, M. Witcomb, and P. Richards. Onderstepoort: Microscopy Society of Southern Africa, pp. 39–40. Frank, L., Mu¨llerova´, I., Faulian, K., and Bauer, E. (1999). The scanning low-energy electron microscope: first attainment of diffraction contrast in the scanning electron microscope. Scanning 21, 1–13. Frank, L., Stekly´, R., Zadrazˇil, M., El-Gomati, M. M., and Mu¨llerova´, I. (2000b). Electron backscattering from real and in situ treated surfaces. Mikrochim. Acta 132, 179–188. Frank, L., Zadrazˇil, M., and Mu¨llerova´, I. (2001). Scanning electron microscopy of nonconductive specimens at critical energies in a cathode lens system. Scanning 23, 36–50. Frosien, J., and Plies, E. (1987). High performance electron optical column for testing ICs with submicrometer design rules. Microelectronic Engineering 7, 163–172. Frosien, J., Plies, E., and Anger, K. (1989). Compound magnetic and electrostatic lenses for low voltage applications. J. Vac. Sci. Technol. B 7, 1874–1877. Gergely, G. (1986). Elastic peak electron spectroscopy. Scanning 8, 203–214. Gergely, G., Menyha´rd, M., and Sulyok, A. (1986). Some new possibilities in non-destructive depth profiling using secondary emission spectroscopy: REELS and EPES. Vacuum 36, 471–475. Glaser, W. (1952). Grundlagen der Elektronenoptik. Wien: Springer-Verlag. Gries, W. H., and Werner, W. (1990). Take-off angle and film thickness dependences of the attenuation length of x-ray photoelectrons by a trajectory reversal method. Surf. Interf. Anal. 16, 149–153. Gryzinski, M. (1965). Classical theory of atomic collisions. I. Theory of inelastic collisions. Phys. Rev. A 138, 336–358. Hachenberg, O., and Brauer, W. (1959). Secondary electron emission from solids, in Advances in Electronics and Electron Physics, Vol. 
II, edited by L. Marton. San Diego: Academic Press, pp. 413–499. Hasselbach, F., and Krauss, H.-R. (1988). Backscattered electrons and their influence on contrast in the scanning electron microscope. Scanning Microsc. 2, 1947–1956. Hasselbach, F., and Rieke, U. (1982). Spatial distribution of secondaries released by backscattered electrons in silicon and gold for 20–70 keV primary energy, in Proceedings of the Tenth International Congress on Electron Microscopy, Vol. 1, edited by the Congressional Organization Committee. Hamburg: Deutsche Ges. EM, pp. 253–254. Hawkes, P. W. (1997). Aberrations, in Handbook of Charged Particle Optics, edited by J. Orloff. New York: CRC Press, pp. 223–274. Hawkes, P. W., and Kasper, E. (1996a). Principles of Electron Optics Vol. 2, Applied Geometrical Optics. San Diego: Academic Press. Hawkes, P. W., and Kasper, E. (1996b). Principles of Electron Optics Vol. 3, Wave Optics. San Diego: Academic Press. Hejna, J. (1994). Backscattered electron imaging in low-voltage SEM, in Proceedings of the Thirteenth International Congress on Electron Microscopy, Vol. 1, edited by B. Jouffrey and C. Colliex. Les Ulis, France: Les e´ditions de Physique, pp. 75–76. Hejna, J. (1998). Optimization of an immersion lens design in the BSE detector for the low voltage SEM, in Recent Trends in Charged Particle Optics and Surface Physics Instrumentation, Sixth Seminar, edited by I. Mu¨llerova´ and L. Frank. Brno: Czechoslovak Society for Electron Microscopy, pp. 30–31. Ho, Y. C., Tan, Z. Y., Wang, X. L., and Chen, J. G. (1991). A theory and Monte Carlo calculation on low-energy electron scattering in solids. Scanning Microsc. 4, 945–951.
SCANNING LEEM
437
Homma, Y., Suzuki, M., and Tomita, M. (1993). Atomic configuration dependent secondary electron emission from reconstructed silicon surfaces. Appl. Phys. Lett. 62, 3276–3278. Hutarˇ , O., Oral, M., Mu¨llerova´, I., and Frank, L. (2000). Dimension measurement in a cathode lens equipped low-energy SEM, in Proceedings of the Twelfth European Congress on Electron Microscopy, Vol. 3, edited by L. Frank and F. Cˇiampor. Brno: Czechoslovak Society for Electron Microscopy, pp. 199–200. Ichimura, S. (1980). Basic study of scanning Auger electron microscopy for surface analysis. Ph.D. Thesis, Osaka University, Japan. Ichinokawa, T., Ishikawa, Y., Kemmochi, M., Ikeda, N., Hosokawa, Y., and Kirschner, J. (1986). Low-energy scanning electron microscopy combined with low-energy electron diffraction. Surface Sci. 176, 397–414. Ichinokawa, T., Ishikawa, Y., Kemmochi, M., Ikeda, N., Hosokawa, Y., and Kirschner, J. (1987). Scanning low-energy electron microscopy. Scanning Microsc. Suppl. 1, 93–97. Ishikawa, Y., Ikeda, N., Kemmochi, M., and Ichinokawa, T. (1985). UHV-SEM observations of cleaning process and step formation on silicon (111) surfaces by annealing. Surface Sci. 159, 256–264. Jacka, M., Kirk, M., El-Gomati, M. M., and Prutton, M. (1999). A fast, parallel acquisition, electron energy analyzer: The hyperbolic field analyzer. Rev. Sci. Instrum. 70, 2282–2287. Jaklevic, R. C., and Davis, L. C. (1982). Band signatures in the low-energy-electron reflectance spectra for fcc metals. Phys. Rev. B 26, 5391–5397. Joy, D. C. (1984). Beam interactions, contrast and resolution in the SEM. J. Microsc. 136, 241–258. Joy, D. C. (1987). A model for calculating secondary and backscattered electron yields. J. Microsc. 147, 51–64. Joy, D. C. (1989). Control of charging in low-voltage SEM. Scanning 11, 1–4. Joy, D. C. (1995). Monte Carlo Modeling for Electron Microscopy and Microanalysis. Oxford Series in Optical and Imaging Sciences. London: Oxford University Press. Joy, D. C. (2001). 
A database of electron-solid interactions, Rev. 01–01. http://web.utk.edu/ srcutk/htm/interact.htm. Joy, D. C., and Joy, C. S. (1996). Low voltage scanning electron microscopy. Micron 27, 247–263. Joy, D. C., and Luo, S. (1989). An empirical stopping power relationship for low-energy electrons. Scanning 11, 176–180. Kanaya, K., and Kawakatsu, H. (1972). Secondary electron emission due to primary and backscattered electrons. J. Phys. D: Appl. Phys. 5, 1727–1742. Kanter, H. (1961). Contribution of backscattered electrons to secondary electron formation. Phys. Rev. 121, 681–684. Kato, M., and Tsuno, K. (1990). Numerical analysis of trajectories and aberrations of a Wien filter including the effect of fringing fields. Nucl. Instrum. Meth. Phys. Res. A298, 296–320. Kazumori, H. (2002). Development of JSM-7400F: new secondary electron detection systems permit observation of non-conductive materials. JEOL News 37E(1), 44–47. Khursheed, A. (2002). Aberration characteristics of immersion lenses for LVSEM. Ultramicroscopy 93, 331–338. Kirschner, J. (1984). On the role of the electron spin in scanning electron microscopy, in Scanning Electron Microscopy, Vol. III. edited by O. Johari. Chicago: SEM, pp. 1179–1185. Knell, G., and Plies, E. (1998). Determination of the collection efficiency for combined magnetic–electrostatic SEM objective lenses. Optik 108, 37–42. Kohl, H., Rose, H., and Schnabl, H. (1981). Dose-rate effect at low temperatures in FBEM and STEM due to object heating. Optik 58, 11–24.
438
MU¨LLEROVA´ AND FRANK
Kolarˇ ı´ k, R., and Lenc, M. (1997). An expression for the resolving power of a simple optical system. Optik 106, 135–139. Kollath, R. (1956). Sekunda¨relektronen-Emission fester Ko¨rper bei Bestrahlung mit Elektronen, in Handbuch der Physik, Vol. 21. Berlin: Springer-Verlag, pp. 232–303. Kruit, P., and Lenc, M. (1992). Optical properties of the magnetic monopole field applied to electron microscopy and spectroscopy. J. Appl. Phys. 72, 4505–4513. Kuhr, J.-Ch., and Fitting, H.-J. (1998). Monte-Carlo simulation of low-voltage scanning electron microscopy—LVSEM, in Proceedings of the Fourteenth International Congress on Electron Microscopy, Vol. 1, edited by H.A. Calderon Benavides and M.J. Yacaman. Bristol, UK: Institute of Physics, pp. 451–452. Kuhr, J.-Ch., and Fitting, H.-J. (1999). Monte Carlo simulation of electron emission from solids. J. El. Spectrosc. Rel. Phenom. 105, 257–273. Kulenkampff, H., and Spyra, W. (1954). Energieverteilung ru¨ckdiffundierter Elektronen. Z. Phys. 137, 416–425. Le Gressus, C., Valin, F., Gautier, M., Duraud, J. P., Cazaux, J., and Okuzumi, H. (1990). Charging phenomena on insulating materials: mechanisms and applications. Scanning 12, 203–210. Lenc, M. (1995). Immersion objective lenses for very low energy electron microscopy, in Proceedings of the Multinational Congress on Electron Microscopy, edited by F. Ciampor. Bratislava: Slovak Academic Press, pp. 103–104. Lenc, M., and Mu¨llerova´, I. (1992a). Electron optical properties of a cathode lens. Ultramicroscopy 41, 411–417. Lenc, M., and Mu¨llerova´, I. (1992b). Optical properties and axial aberration coefficients of the cathode lens in combination with a focusing lens. Ultramicroscopy 45, 159–162. Lencova´, B. (1997). Electrostatic lenses. Handbook of Charged Particle Optics, edited by J. Orloff. New York: CRC Press, pp. 177–221. Lencova´, B., and Wisselink, G. (1990). Program package for the computation of lenses and deflectors. Nucl. Instrum. Meth. A298, 56–66. Libinson, A. G. 
(1999). Tilt dependence of secondary electron emission at low excitation energy. Scanning 21, 23–26. Liebel, H., and Senftinger, B. (1991). Low-energy electron microscope of novel design. Ultramicroscopy 36, 91–98. Llacer, J., and Garwin, E. L. (1969). Electron–phonon interaction in alkali halides—I. The transport of secondary electrons with energies between 0.25 and 7.5 eV. J. Appl. Phys. 40, 2766–2775. Mankos, M., and Adler, D. (2002). Electron–electron interactions in cathode objective lenses. Ultramicroscopy 93, 347–354. Martin, J. P., Weimer, E., Frosien, J., and Lanio S. (1994). Ultra-high resolution SEM—A new approach. Microscopy and Analysis (USA), March, 19. McKernan, S. (1998). A comparison of detectors for low voltage contrast in the SEM, in Proceedings of the Fourteenth International Congress on Electron Microscopy, Vol. 1, edited by H. A. Calderon Benavides and M. J. Yacaman. Bristol, UK: Institute of Physics, pp. 481–482. Meisburger, W. D., Brodie, A. D., and Desai, A. A. (1992). Low-voltage electron-optical system for the high-speed inspection of integrated circuits. J. Vac. Sci. Technol. B 10, 2804–2808. Miyoshi, M., Yamazaki, Y., Nagai, T., and Nagahama, I. (1999). Development of a projection imaging electron microscope with electrostatic lenses. J. Vac. Sci. Technol. B 17, 2799–2802. Morin, P., Pitaval, M., and Vicario, E. (1976). Direct observation of insulators with a scanning electron microscope. J. Phys. E: Sci. Instrum. 9, 1017–1020.
SCANNING LEEM
439
Mott, N. F., and Massey, H. S. W. (1965). The Theory of Atomic Collisions. London: Oxford University Press. Mu¨llerova´, I. (1996). Contrast mechanisms in low voltage SEM, in Proceedings of the Ninth Conference on Electron Microscopy of Solids, edited by A. Czyrska-Filemonowicz. Krako´w: State Committee for Scientific Research, pp. 93–96. Mu¨llerova´, I. (2001). Imaging of specimens at optimized low and very low energies in scanning electron microscopes. Scanning 23, 379–394. Mu¨llerova´, I., and Frank, L. (1993). Very low energy microscopy in commercial SEMs. Scanning 15, 193–201. Mu¨llerova´, I., and Frank, L. (1994). Use of cathode lens in scanning electron microscope for low voltage applications. Mikrochim. Acta 114/115, 389–396. Mu¨llerova´, I., and Frank, L. (2002). Practical resolution limit in the scanning lowenergy electron microscope, in Proceedings of the Fifteenth International Congress on Electron Microscopy, Vol. 3, edited by R. Cross, J. Engelbrecht, T. Sewell, M. Witcomb, and P. Richards. Onderstepoort: Microscopy Society of Southern Africa, pp. 99–100. Mu¨llerova´, I., and Frank, L. (2003). Contrast at very low energies of the gold/carbon specimen for resolution testing. Scanning, submitted. Mu¨llerova´, I., and Lenc, M. (1992a). Some approaches to low-voltage scanning electron microscopy. Ultramicroscopy 41, 399–410. Mu¨llerova´, I., and Lenc, M. (1992b). The scanning very-low-energy electron microscope (SVLEEM). Mikrochim. Acta (Suppl.) 12, 173–177. Mu¨llerova´, I., El-Gomati, M. M., and Frank, L. (2002). Imaging of the boron doping in silicon using low-energy SEM. Ultramicroscopy 93, 223–243. Mu¨llerova´, I., Frank, L., and Hutarˇ , O. (2001). Visualization of the energy band contrast in SEM through low-energy electron reflectance. Scanning 23, 115. Mu¨llerova´, I., Lenc, M., and Floria´n, M. (1989). Collection of backscattered electrons with a single polepiece lens and a multiple detector. Scanning Microsc. 3, 419–428. 
Mu¨llerova´, I., Zadrazˇil, M., and Frank, L. (1997). Low-energy SEM imaging of bevelled multilayers. J. Comput. Assist. Microsc. 9, 121–122. Mulvey, T. (1984). Magnetic electron lenses II, in Electron Optical Systems for Microscopy, Microanalysis and Microlithography, edited by J. J. Hren, F. A. Lenz, E. Munro, and P. B. Sewell. Chicago: SEM, pp. 15–27. Nagatani, T., Saito, S., Sato, M., and Yamada, M. (1987). Development of an ultrahigh resolution SEM by means of a field emission source and in-lens system. Scanning Microsc. 1, 901–909. Nieminen, R. M. (1988). Stopping power for low-energy electrons. Scanning Microsc. 2, 1917–1926. Ono, S., and Kanaya, K. (1979). The energy dependence of secondary emission based on the range-energy retardation power formula. J. Phys. D: Appl. Phys. 12, 619–632. Paden, R. S., and Nixon, W. C. (1968). Retarding field scanning electron microscopy. J. Phys. E: Sci. Instrum. 1, 1073–1080. Pawley, J. B. (1984). Low voltage scanning electron microscopy. J. Microsc. 136, 45–68. Pejchl, D., Mu¨llerova´, I., and Frank, L. (1993). Unconventional imaging of surface relief. Czech. J. Phys. 43, 983–992. Pejchl, D., Mu¨llerova´, I., Frank, L., and Kolarˇ ı´ k, V. (1994). Separator of primary and signal electrons for very-low-energy SEM. Czech. J. Phys. 44, 269–276. Penn, D. R. (1987). Electron mean free path calculations using a model dielectric function. Phys. Rev. B 35, 482–486.
440
MU¨LLEROVA´ AND FRANK
Pfefferkorn, G. E., Gruter, H., and Pfautsch, M. (1972). Observations on the prevention of specimen charging, in Scanning Electron Microscopy, Vol. I, edited by O. Johari. Chicago: SEM, pp. 147–152. Plies, E., Degel, B., Hayn, A., Knell, G., and Schiebel, B. (1998). Experimental results using a ‘‘low-voltage booster’’ in a conventional SEM, in Proceedings of the Fifth International Conference on Charged Particle Optics, edited by P. Kruit and P. W. van Amersfoort. Amsterdam: Elsevier, pp. 126–130. Powell, C. J. (1974). Attenuation lengths of low-energy electrons in solids. Surface Sci. 44, 29–46. Powell, C. J. (1984). Inelastic mean free paths and attenuation lengths of low-energy electrons in solids, in Scanning Electron Microscopy, Vol. IV, edited by O. Johari. Chicago: SEM, pp. 1649–1664. Powell, C. J. (1985). Calculations of electron inelastic mean free paths from experimental optical data. Surf. Interf. Anal. 7, 263–274. Powell, C. J. (1987). The energy dependence of electron inelastic mean free path. Surf. Interf. Anal. 10, 349–354. Preikszas, D., and Rose, H. (1995). Procedures for minimizing the aberrations of electromagnetic compound lenses. Optik 100, 179–187. Price, C. W., and McCarthy, P. L. (1988). Low voltage scanning electron microscopy of lowdensity materials. Scanning 10, 29–36. Rao-Sahib, T. S., and Wittry, D. B. (1974). X-ray continuum from thick element targets for 10– 50 keV electrons. J. Appl. Phys. 45, 5060–5068. Recknagel, A. (1941). Theorie des elektrischen Elektronenmikroskops fu¨r Selbststrahler. Z. Phys. 117, 689–708. Reimer, L. (1971). Rauschen der Sekunda¨relektronenemission. Beitra¨ge Elektr. Direktabb. Oberfla¨chen (BEDO) 412, 299–304. Reimer, L. (1995). In Energy-Filtering Transmission Electron Microscopy, edited by L. Reimer. Berlin: Springer-Verlag, pp. 7–9. Reimer, L. (1996). MOCASIM—Ein Monte Carlo Programm fu¨r Forschung und Lehre. Beitr. Elektr. Mikr. Direktabb. Oberfl. (BEDO) 29, 1–10. Reimer, L. (1998). 
Scanning Electron Microscopy. Berlin: Springer-Verlag. Reimer, L., and Ka¨ssens, M. (1994). Application of a two-detector system for secondary and backscattered electrons in LVSEM, in Proceedings of the Thirteenth International Congress on Electron Microscopy, Vol. 1, edited by B. Jouffrey and C. Colliex. Les Ulis, France: Les e´ditions de Physique, pp. 73–74. Reimer, L., and Wa¨chter, M. (1978). Contribution to the contamination problem in TEM. Ultramicroscopy 3, 169–174. Reimer, L., Bo¨ngeler, R., Ka¨ssens, M., Liebscherr, F. F., and Senkel, R. (1991). Calculation of energy spectra from layered structures for backscattered electron spectrometry and relations to Rutherford backscattering spectrometry by ions. Scanning 13, 381–391. Reimer, L., Golla, U., Bo¨ngeler, R., Ka¨ssens, M., Schindler, B., and Senkel, R. (1992). Charging of bulk specimens, insulating layers and free supporting films in scanning electron microscopy. Optik 92, 14–22. Reimer, L., and Lo¨dding, B. (1984). Calculation and tabulation of Mott cross-sections for large-angle scattering. Scanning 6, 128–151. Reimer, L., and Tollkamp, C. (1980). Measuring the backscattering coefficient and secondary electron yield inside a SEM. Scanning 8, 35–39. Rose, H. (1987). The retarding Wien filter as a high-performance imaging filter. Optik 77, 26–34. Rose, H., and Preikszas, D. (1992). Outline of a versatile corrected LEEM. Optik 92, 31–44.
SCANNING LEEM
441
Rose, H., and Spehr, R. (1980). On the theory of the Boersch effect. Optik 57, 339–364. Salehi, M., and Flinn, E. A. (1981). Dependence of secondary electron emission from amorphous materials on primary angle of incidence. J. Appl. Phys. 52, 994–996. Sato, M., Todokoro, H., and Kageyama, K. (1993). A snorkel type objective lens with E B field for detecting secondary electrons. SPIE 2014, 17–23. Scha¨fer, J., and Ho¨lzl, J. (1972). A contribution to the dependence of secondary electron emission from the work function and Fermi energy. Thin Solid Films 18, 81–86. Schauer, P., and Autrata, R. (1998). Computer optimized design of BSE scintillation detector for SEM, in Proceedings of the Eleventh European Congress on Electron Microscopy, Vol. I, edited by Committee of European Societies for Microscopy. Brussels: CESM, pp. 369–370. Schmid, R., and Brunner, M. (1986). Design and application of a quadrupole detector for lowvoltage scanning electron microscopy. Scanning 8, 294–299. Schmid, R., Gaukler, K.H., and Seiler, H. (1983). Measurement of elastically reflected electrons (E 2.5 keV) for imaging of surfaces in a simple ultra high vacuum scanning electron microscope, in Scanning Electron Microscopy, Vol. II, edited by O. Johari. Chicago: AMF O’Hare, pp. 501–509. Schreiber, E., and Fitting, H.-J. (2002). Monte Carlo simulation of secondary electron emission from the insulator SiO2. J. El. Spectrosc. Rel. Phenom. 124, 25–37. Seah, M. P., and Dench, W. A. (1979). Quantitative electron spectroscopy of surfaces: a standard database for electron inelastic mean free paths in solids. Surf. Interf. Anal. 1, 1–11. Sealy, C. P., Castell, M. R., and Wilshaw, P. R. (2000). Mechanism for secondary electron dopant contrast in the SEM. J. El. Microsc. 49, 311–321. Seiler, H. (1967). Some problems of secondary electron emission. Z. Angew. Physik 22, 249–263. Seiler, H. (1983). Secondary electron emission in the scanning electron microscope. J. Appl. Phys. 54, R1–R18. 
Seiler, H., and Kuhnle, G. (1970). Anisotropy of the secondary electron yield as a function of the energy of the primary electrons from 5 to 20 keV. Z. Angew. Physik 29, 254–260. Shaffner, T. J., and Van Veld, R. D. (1971). ‘‘Charging’’ effects in the scanning electron microscope. J. Phys. E: Sci. Instrum. 4, 633–637. Shao, Z. (1989). Extraction of secondary electrons in a newly proposed immersion lens. Rev. Sci. Instrum. 60, 693–699. Spehr, R. (1985). Broadening of charged particle microprobes by stochastic Coulomb interactions. Optik 70, 109–114. Strocov, V. N., and Starnberg, H. I. (1995). Absolute band-structure determination by target current spectroscopy: Application to Cu(100). Phys. Rev. B 52, 8759–8765. Strocov, V. N., Starnberg, H. I., Nilsson, P. O., and Holleboom, L. J. (1996). Determining unoccupied bands of layered materials by VLEED: implications for photoemission band mapping. J. Phys.: Condens. Matter 8, 7549–7559. Takashima, S. (1994). New electron optical technologies in low voltage scanning electron microscope. JEOL News 31E(1), 33–35. Tanuma, S., Powell, C. J., and Penn, D. R. (1991a). Calculations of electron inelastic mean free paths II. Data for 27 elements over the 50–2000 eV range. Surf. Interf. Anal. 17, 911–926. Tanuma, S., Powell, C. J., and Penn, D. R. (1991b). Calculations of electron inelastic mean free paths III. Data for 15 inorganic compounds over the 50–2000 eV range. Surf. Interf. Anal. 17, 927–939. Telieps, W. (1987). Surface imaging with LEEM. Appl. Phys. A 44, 55–61. Telieps, W., and Bauer, E. (1985). An analytical reflection and emission UHV surface electron microscope. Ultramicroscopy 17, 57–66. Thomas, S., and Pattinson, E. B. (1970). Range of electrons and contribution of back-scattered electrons in secondary production in aluminum. J. Phys. D: Appl. Phys. 3, 349–357.
442
MU¨LLEROVA´ AND FRANK
Tromp, R. M. (2000). Low-energy electron microscopy. IBM J. Res. Develop. 44, 503–516. Tromp, R. M., and Reuter, M. C. (1993). Imaging with a low-energy electron microscope. Ultramicroscopy 50, 171–178. Tromp, R. M., Denier van der Gon, A. W., LeGoues, F. K., and Reuter, M. C. (1993). Observation of buried interfaces with low-energy electron microscopy. Phys. Rev. Lett. 71, 3299–3302. Tsai, F. C., and Crewe, A. V. (1998). A gapless magnetic objective lens for low voltage SEM. Optik 109, 5–11. Tung, C. J., Ashley, J. C., and Ritchie, R. H. (1979). Electron inelastic mean free paths and energy losses in solids, II. Electron gas statistical model. Surface Sci. 81, 427–439. Veneklasen, L. H. (1992). The continuing development of low-energy electron microscopy for characterizing surfaces. Rev. Sci. Instrum. 63, 5513–5532. Welter, L. M., and McKee, A. N. (1972). Observations on uncoated, nonconducting or thermally sensitive specimens using a fast scanning field emission source SEM, in Scanning Electron Microscopy, Vol. I, edited by O. Johari. Chicago: SEM, pp. 161–166. Werner, W. S. M. (1992). The role of the attenuation parameter in electron spectroscopy. J. El. Spectrosc. Rel. Phenom. 59, 275–291. Werner, W. S. M. (1996). Transport equation approach to electron microbeam analysis: fundamentals and applications. Mikrochim. Acta (Suppl.) 13, 13–38. Woodruff, D. P., and Delchar, T. A. (1986). Modern Techniques of Surface Science. Cambridge: Cambridge University Press. Ximen, J., Lin, P. S. D., Pawley, J. B., and Schippert, M. (1993). Electron optical design of a high-resolution low-voltage scanning electron microscope with field emission gun. Rev. Sci. Instrum. 64, 2905–2910. Yau, Y. W., Pease, R. F. W., Iranmanesh, A. A., and Polasko, K. J. (1981). Generation and applications of finely focused beams of low-energy electrons. J. Vac. Sci. Technol. 19, 1048–1052. Zach, J. (1989). Design of a high-resolution low-voltage scanning electron microscope. Optik 83, 30–40. 
Zach, J., and Haider, M. (1992). A high-resolution low voltage scanning electron microscope, in Proceedings of the Tenth European Congress on Electron Microscopy, Vol. 1, edited by A. Rios, J.M. Arias, L. Megias-Megias, and A. Lopez-Galindo. Granada: Univ. de Granada, pp. 49–53. Zach, J., and Rose, H. (1986). Efficient detection of secondary electrons in low-voltage scanning electron microscopy. Scanning 8, 285–293. Zach, J., and Rose, H. (1988). High-resolution low-voltage electron microprobe with large SE detection efficiency, in Proceedings of the Ninth European Congress on Electron Microscopy, Vol. 1, edited by P.J. Goodhew and H.G. Dickinson. Bristol, UK: Institute of Physics, pp. 81–82. Zadrazˇil, M., El-Gomati, M. M., and Walker, A. (1997). Measurements of very-low-energy secondary and backscattered electron coefficients. J. Comput. Ass. Microsc. 9, 123–124. Zadrazˇil, M., and El-Gomati, M. M. (1998a). Measurements of the secondary and backscattered electron coefficients in the very-low-energy range, in Recent Trends in Charged Particle Optics and Surface Physics Instrumentation, Sixth Seminar, edited by I. Mu¨llerova´ and L. Frank. Brno: Czechoslovak Society for Electron Microscopy, p. 82. Zadrazˇil, M., and El-Gomati, M. M. (1998b). Measurements of the secondary and backscattered electron coefficients in the energy range 250–5000 eV, in Proceedings of the Fourteenth International Congress on Electron Microscopy, Vol. 1, edited by H.A. Calderon Benavides and M.J. Yacaman. Bristol, UK: Institute of Physics, pp. 495–496. Zadrazˇil, M., and El-Gomati, M. M. (2002). Unpublished data.
SCANNING LEEM
443
Zobacˇova´, J., and Frank, L. (2003). Specimen charging and detection of signal from nonconductors in a cathode lens equipped SEM. Scanning, 25, 150–156. Zobacˇova´, J., Oral, M., Hutarˇ , O., Mu¨llerova´, I., and Frank, L (2003). Corrections of magnification and focusing in a cathode lens equipped SEM. To be submitted. Zworykin, V. A., Hillier, J., and Snyder, R. L. (1942). A scanning electron microscope. ASTM Bull. 117, 15–23.
This Page Intentionally Left Blank
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 128
Scale-Space Methods and Regularization for Denoising and Inverse Problems
OTMAR SCHERZER
Department of Computer Science, Universität Innsbruck, Technikerstraße 25, A-6020 Innsbruck, Austria
I. Introduction 446
II. Image Smoothing and Restoration via Diffusion Filtering 447
   A. Level Set Modeling 453
   B. Morphological Diffusion Filtering 455
   C. Applications of Diffusion Filtering 458
   D. Scale-Space Theory 458
III. Regularization of Inverse Problems 460
   A. Tikhonov-Type Regularization Methods 464
   B. Regularization Models for Denoising 465
   C. Relations between Regularization and Perona–Malik Diffusion Filtering 466
   D. Numerical Experiments 468
IV. Mumford–Shah Filtering 472
V. Regularization and Spline Approximation 474
VI. Scale-Space Methods for Inverse Problems 478
   A. Deblurring with a Scale-Space Method 481
   B. Numerical Simulations 484
VII. Nonconvex Regularization Models 493
   A. Perona–Malik Regularization 493
   B. Relative Error Regularization 494
VIII. Discrete BV Regularization and Tube Methods 500
   A. Discrete BV Regularization (Sampling) 502
   B. Finite Volume BV Regularization 504
   C. The Taut String Algorithm 505
   D. Multidimensional Discrete BV Regularization 506
   E. Numerical Test Examples 508
      1. One-Dimensional Test Example 509
      2. Two-Dimensional Bench-Mark Problem 510
IX. Wavelet Shrinkage 510
   A. Daubechies' Wavelets 511
   B. Denoising by Wavelet Shrinkage 514
      1. Relation to Diffusion Filtering 514
X. Regularization and Statistics 517
XI. Conclusions 522
Acknowledgements 523
References 523
Copyright ß 2003 Elsevier Inc. All rights reserved. 1076-5670/2003 $35.00
DENOISING AND INVERSE PROBLEMS

OTMAR SCHERZER

I. INTRODUCTION

Inverse problems and imaging are two of the fastest growing areas in applied mathematics. Such problems appear in a variety of applications such as medical imaging and nondestructive evaluation. A typical example is computerized tomography (CT), where the density of a body is determined from X-ray measurements at the boundary. Inverse problems can be vaguely characterized as the problems of estimating the cause for an observed effect; in CT the cause is the density of the body and the observed effect is the X-ray data at the boundary of the object. With inverse problems one typically associates ill-posedness, that is, that there may not exist a solution, the solution is nonunique, or the solution does not depend continuously on the input data. In order to overcome these difficulties Tikhonov suggested approximating the ill-posed problem by a scale of well-posed variational problems. This initiated the work on regularization methods for the solution of ill-posed problems.

Partial differential equations (PDEs) have proved to be efficient methods in image processing and computer vision. They are mainly used for smoothing and restoration, in particular noise removal. Their success is partly due to the fact that the approximation is independent of the underlying numerical method. The success of PDE methods in image processing has stimulated the development of new efficient numerical algorithms for the solution of inverse problems by constructing variational methods based on the energy formulations of PDEs. Nowadays the interaction between PDE models and variational formulations is subtle and has led to a fruitful interaction of inverse problems and image processing with splines, wavelets, morphology, and statistics. A goal of this survey is to review these interactions. The second goal of this survey is to compare various reconstruction algorithms.

The outline of this work is as follows. In Section II we review image smoothing and restoration with PDEs.
We compare several noise removal (denoising) techniques and show the effect of filtering as a prerequisite step of image analysis, such as segmentation. Moreover, we use the analogy of fluid flow to motivate PDEs for diffusion filtering. In Section III we review regularization methods for the solution of inverse problems and establish the connection between PDEs and variational methods for denoising. Section IV is devoted to the Mumford–Shah filtering method, which is a combined method for image smoothing and segmentation. In Section V we review the interaction between approximate spline filtering and variational methods. Section VI establishes a diffusion framework for the solution of inverse problems, which is linked to Section VII where nonconvex
variational problems are considered. In Section VIII we introduce a discrete framework for regularization and in Section IX we highlight the relation of variational methods and diffusion filtering with wavelets. In Section X we review the interactions of regularization and statistics.
II. IMAGE SMOOTHING AND RESTORATION VIA DIFFUSION FILTERING
PDE-based models have proved to be efficient in a variety of image processing and computer vision areas such as restoration, denoising, segmentation, shape from shading, histogram modification, optical flow, and stereo vision. To demonstrate the efficiency of diffusion filtering we recall a few models. To this end let $u_\delta$ be an image defined on the open domain $\Omega := (0,1)\times(0,1)$.

(1) The simplest and best investigated diffusion filtering technique for image smoothing is the linear heat equation
$$\frac{\partial u}{\partial t} = \Delta u, \qquad (1)$$
associated with homogeneous Neumann boundary data
$$\frac{\partial u}{\partial \nu} = 0;$$
here and in the following $\partial u/\partial\nu$ denotes the derivative of $u$ in the direction normal to the boundary $\partial\Omega$ of $\Omega$. As initial data we use the input image
$$u(0,x) = u_\delta(x) \quad \text{for } x\in\Omega. \qquad (2)$$
It is well known that the heat equation blurs the initial data and spurious noise is filtered. Figure 1 shows heat equation filtering of ultrasound data at specified times.

(2) The heat equation is equally efficient in removing noise and destroying image details such as edges and corners. The total variation flow equation is able to preserve edges and denoise the image simultaneously. Here the differential equation
$$\frac{\partial u}{\partial t} = \nabla\cdot\Big(\frac{1}{|\nabla u|}\,\nabla u\Big) \qquad (3)$$
FIGURE 1. Solution of the heat equation at time $t = 0, 0.1, 1, 10, 100, 1000$.
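The smoothing behavior of (1)–(2) is easy to reproduce numerically. The following sketch is not from the chapter; it is a minimal explicit finite-difference scheme on a hypothetical synthetic test image, with the homogeneous Neumann condition realized by mirror padding. As expected of a conservation-form diffusion, it preserves the mean gray value while reducing the variance.

```python
import numpy as np

def heat_step(u, tau=0.2):
    # One explicit step of u_t = Laplacian(u); mirror padding realizes
    # the homogeneous Neumann condition du/dnu = 0 on the boundary.
    p = np.pad(u, 1, mode="edge")
    lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u
    return u + tau * lap  # stable for tau <= 0.25 (grid spacing h = 1)

def heat_filter(u0, t, tau=0.2):
    # Approximate the solution of (1)-(2) at time t by repeated stepping.
    u = u0.astype(float)
    for _ in range(int(round(t / tau))):
        u = heat_step(u, tau)
    return u

rng = np.random.default_rng(0)
clean = np.zeros((64, 64))
clean[16:48, 16:48] = 1.0                      # synthetic piecewise constant image
noisy = clean + 0.3 * rng.standard_normal(clean.shape)
smooth = heat_filter(noisy, t=2.0)
```

Because the discrete divergence form telescopes, the mean of `smooth` equals that of `noisy` up to roundoff, while the variance strictly decreases; this is the discrete analogue of the blurring visible in Figure 1.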
together with homogeneous Neumann data is applied to the initial data $u_\delta$. Here and in the following $|\cdot|$ denotes the Euclidean norm. A detailed rigorous mathematical analysis of this partial differential equation has been given (see [14–17,23]). The mathematical analysis impressively supports the remarkable properties of this filtering technique (cf. Figure 2).

(3) The Bingham fluid flow equation is able to preserve flat regions and denoise the image simultaneously. This filtering technique requires one to solve the differential equation
$$\frac{\partial u}{\partial t} = \nu\,\nabla\cdot\Big(\frac{1}{|\nabla u|}\,\nabla u\Big) + k\,\Delta u \qquad (4)$$
FIGURE 2. Solution of the total variation flow equation at time $t = 0, 0.01, 0.05, 0.1, 0.5, 1$.
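The edge-preserving character of (3) can be illustrated with a one-dimensional sketch (not from the chapter). Since the exact diffusivity $1/|\nabla u|$ is singular where the gradient vanishes, the code below uses the common regularization $1/\sqrt{u_x^2+\varepsilon^2}$; the scheme, step sizes, and test signal are all illustrative assumptions. The explicit scheme is conservative (zero flux through the boundary) and monotone for the chosen step size, so the discrete total variation does not increase.

```python
import numpy as np

def tv_flow_1d(u0, t, eps=0.1, tau=0.02):
    # Explicit conservative scheme for the regularized 1D total variation
    # flow u_t = (u_x / sqrt(u_x^2 + eps^2))_x with zero-flux (Neumann)
    # boundary conditions; eps avoids division by zero on flat regions.
    u = u0.astype(float)
    for _ in range(int(round(t / tau))):
        du = np.diff(u)                                # u_{i+1} - u_i, h = 1
        flux = du / np.sqrt(du * du + eps * eps)
        u = u + tau * np.diff(np.concatenate(([0.0], flux, [0.0])))
    return u

rng = np.random.default_rng(1)
edge = np.where(np.arange(200) < 100, 0.0, 1.0)        # ideal step edge
noisy = edge + 0.05 * rng.standard_normal(200)
den = tv_flow_1d(noisy, t=1.0)
tv = lambda v: np.abs(np.diff(v)).sum()                # discrete total variation
```

The noise on the plateaus is flattened out while the jump between the two halves of the signal survives almost unchanged, in contrast to the uniform blurring of the heat equation.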
together with homogeneous Neumann conditions and initial data $u_\delta$. The parameters $\nu$ and $k$ are strictly positive. Bingham fluid flow is a widely investigated model in fluid mechanics (see, e.g., [80]), in which the parameters $\nu$ and $k$ have the physical meaning of yield stress and plastic viscosity. The particular properties of Bingham fluids make them extremely useful for image denoising (see [77]). Figure 3 shows the solution of Equation (4) at specified times.

(4) Linear anisotropic diffusion filtering is based on a matrix-valued diffusivity. Let $u_\sigma$ be a smooth approximation of $u_\delta$; then a linear anisotropic diffusion equation is
$$\frac{\partial u}{\partial t} = \nabla\cdot\big(D(\nabla u_\sigma)\,\nabla u\big), \qquad (5)$$
FIGURE 3. Bingham filtering technique with $k = 0.05$ and $\nu = 0.05$ at $t = 0, 0.2, 2, 10, 100, 1000$.
where
$$D(\nabla u_\sigma) = \frac{1}{|\nabla u_\sigma|^2 + 2\lambda^2}\left(\begin{pmatrix}\partial u_\sigma/\partial y\\ -\,\partial u_\sigma/\partial x\end{pmatrix}\begin{pmatrix}\partial u_\sigma/\partial y\\ -\,\partial u_\sigma/\partial x\end{pmatrix}^{T} + \lambda^2 I\right), \qquad (6)$$
with $\lambda > 0$. In image processing the differential Equation (5) is considered with homogeneous Neumann boundary conditions $(D(\nabla u_\sigma)\nabla u)\cdot\nu = 0$ and initial data $u_\delta$.¹ Here $u_\sigma$ is considered an approximation of the filtered data, which can be obtained, for instance, by solving the heat

¹ The product of two vectors $x = (x_1,\dots,x_n)$ and $y = (y_1,\dots,y_n)$ is defined by $x\cdot y = \sum_{i=1}^n x_i y_i$.
equation with initial data $u_\delta$ up to a certain time. In anisotropic diffusion models the matrix $D(\nabla u_\sigma)$ is designed in such a way that its eigenvectors $\tilde v_1$ and $\tilde v_2$ are parallel, respectively orthogonal, to $\nabla u_\sigma$. These methods prefer diffusion along edges to diffusion perpendicular to them.

(5) Nonlinear anisotropic diffusion utilizes a matrix-valued diffusivity which itself depends on the solution. A typical example is
$$\frac{\partial u}{\partial t} = \nabla\cdot\big(D(\nabla u)\,\nabla u\big), \qquad (7)$$
where $D(\cdot)$ is as defined in (6). In Figure 4 we have evolved an image according to the nonlinear anisotropic diffusion equation.

(6) The classical Perona–Malik filter [134,135] is the oldest nonlinear diffusion filter. It is based on the equation
$$\frac{\partial u}{\partial t} = \nabla\cdot\Big(\frac{1}{1 + |\nabla u|^2/\lambda^2}\,\nabla u\Big) \qquad (8)$$
with a positive parameter $\lambda$. In comparison with total variation flow the diffusivity is smaller near edges. So far no completely successful mathematical analysis for this model has been obtained. A few results concerning the existence of a solution have been given in [102].

(7) Mean curvature motion is a widely inspected model in applied mathematics describing phenomena such as crystal growth and polymer processing. The mean curvature equation
$$\frac{\partial u}{\partial t} = |\nabla u|\,\nabla\cdot\Big(\frac{1}{|\nabla u|}\,\nabla u\Big) \qquad (9)$$
is a paradigm of morphological differential equations. The image evolution according to mean curvature motion is shown in Figure 5.

It is common to distinguish between two classes of diffusion filtering techniques. Paying tribute to Perona and Malik [134] for initiating the use of nonlinear diffusion filtering, we call any of the differential equations
$$\frac{\partial u}{\partial t} = \nabla\cdot\big(D(\nabla u)\,\nabla u\big) \qquad (10)$$
FIGURE 4. Solution of nonlinear anisotropic diffusion at time $t = 0, 10, 500, 1000, 5000, 10{,}000$; $\lambda = 10^{-4}$.
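Why the isotropic Perona–Malik model (8) preserves edges can be seen from the shape of its diffusivity: the flux magnitude $s\,g(s^2) = s/(1+s^2/\lambda^2)$ increases for gradients below the contrast parameter $\lambda$ (ordinary smoothing) and decreases above it, so strong edges are diffused only weakly. The following minimal check is illustrative and not from the chapter; $\lambda = 1$ is an arbitrary choice.

```python
import numpy as np

lam = 1.0                                   # contrast parameter lambda in (8)
g = lambda s2: 1.0 / (1.0 + s2 / lam**2)    # Perona-Malik diffusivity

s = np.linspace(0.0, 5.0, 501)              # gradient magnitudes |grad u|
flux = s * g(s**2)                          # magnitude of the diffusive flux
# g decays monotonically with the gradient, and the flux peaks exactly
# at s = lambda: gradients above lambda see a decreasing flux, which is
# the mechanism behind edge preservation (and possible edge enhancement).
```

In contrast, the heat equation corresponds to the constant diffusivity $g \equiv 1$, whose flux grows without bound and therefore smooths edges and flat regions alike.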
Perona–Malik diffusion filtering. Prototypes are the heat equation, total variation flow, the Bingham model, as well as anisotropic diffusion. Typically one also differentiates between
—isotropic Perona–Malik filtering models, where $D$ is a scalar-valued function, and
—anisotropic Perona–Malik filtering models, where $D$ is a matrix-valued function with nonzero off-diagonal entries.
Morphological partial differential equations are invariant under image transformations such as gray level modification and data set deformation. A paradigm of a morphological differential equation is the mean curvature flow equation.

The use of Perona–Malik diffusion filtering can be motivated by showing its analogy to fluid flow.
FIGURE 5. Solution of the mean curvature equation at time $t = 0, 0.1, 1, 10, 100, 1000$.
A. Level Set Modeling

We consider the movement of a fluid in $\Omega = (0,1)\times(0,1)$ over time. The conservation of mass principle (see, e.g., [48]) is expressed by the differential equation
$$\frac{\partial \rho}{\partial t}(t,x) + \nabla\cdot(\rho\,\vec v)(t,x) = 0, \qquad (11)$$
where $\rho$ is the density of the fluid and $\vec v$ is the velocity field of the fluid. For a fixed time $t$ we consider a level curve of the density. A level curve $x(\tau) = (x_1(\tau), x_2(\tau))$, parameterized by $\tau$, satisfies
$$\rho(t, x(\tau)) = \text{constant for all } \tau.$$
Differentiating this equation with respect to $\tau$ we find that
$$\frac{\partial \rho}{\partial x_1}(t,x(\tau))\,\frac{\partial x_1}{\partial \tau}(\tau) + \frac{\partial \rho}{\partial x_2}(t,x(\tau))\,\frac{\partial x_2}{\partial \tau}(\tau) = 0. \qquad (12)$$
The direction of the tangent to the level curve at $x(\tau)$ is
$$\tilde t := \begin{pmatrix} (\partial x_1/\partial\tau)(\tau) \\ (\partial x_2/\partial\tau)(\tau) \end{pmatrix}.$$
If $\tilde t \neq 0$ and
$$\nabla\rho(t,x(\tau)) = \Big(\frac{\partial\rho}{\partial x_1}, \frac{\partial\rho}{\partial x_2}\Big)(x(\tau)) \neq 0,$$
then from (12) it follows that the vectors $\nabla\rho$ and $\tilde t^{\,T}$ are orthogonal; here and in the following $T$ denotes the transpose of a vector or matrix. Fick's law states that the velocity $\vec v$ is orthogonal to the level curves, i.e., $\vec v := C\,\nabla\rho$ with $C < 0$; the negative sign indicates that the direction of the flow is from regions of high density to regions of low density. Different models can be imagined by adequately choosing $C$:

(1) If $C = -1/\rho$, then we get the diffusion filtering technique (1). This choice of $C$ represents the fact that an object in a fluid of higher density moves slower than an object in a fluid of lower density.
(2) If $C = -1/(\rho\,|\nabla\rho|)$, then we get the diffusion filtering technique (3). This choice of $C$ represents the fact that an object in a fluid moves faster at smooth portions of level curves.
(3) If $C = -(1/\rho)\,D(\nabla\rho_\sigma)$, the flux is biased both in the tangential direction and in the normal direction to the level curve.

To derive the analogy of image diffusion filtering and fluid flow we identify the gray value image data with the density of a fluid. Since the fluid flow equations have been derived from the conservation of mass principle, we have mean gray value invariance for diffusion filtering in image processing, that is,
$$\int_\Omega \rho(t,x)\,dx = \text{constant over time}.$$
Without external forces a fluid flow is subsequently simplified: for instance, we might expect that the entropy of the density increases over time and that finally the density approaches a constant function. The analogy between fluid flow and image diffusion filtering suggests a subsequent simplification of the gray value data leading to a scale space of images. In image processing a quantization of these phenomena via Lyapunov functionals has been given by Weickert [172].
B. Morphological Diffusion Filtering

Morphological diffusion filtering techniques, such as (9), are closely related to shape and curve evolutions. To illustrate this connection we recall the definition of curvature. Let
$$c : [0,2\pi) \to \mathbb{R}^2, \qquad \tau \mapsto \begin{pmatrix} x_1(\tau) \\ x_2(\tau) \end{pmatrix}$$
be a closed parameterized curve in $\mathbb{R}^2$; then the standard definition (see, e.g., [33]) of curvature is
$$K = \frac{(\partial x_1/\partial\tau)\,(\partial^2 x_2/\partial\tau^2) - (\partial^2 x_1/\partial\tau^2)\,(\partial x_2/\partial\tau)}{\big((\partial x_1/\partial\tau)^2 + (\partial x_2/\partial\tau)^2\big)^{3/2}},$$
where $\partial\cdot/\partial\tau$ denotes the derivative with respect to the curve parameter $\tau$. Let
$$C : [0,\infty)\times[0,2\pi) \to \mathbb{R}^2, \qquad (t,\tau) \mapsto \begin{pmatrix} x_1(t,\tau) \\ x_2(t,\tau) \end{pmatrix}$$
be a temporally varying oriented closed curve. We consider the curvature-based evolution process
$$\frac{\partial C}{\partial t}(t,\tau) = \beta(K)(t,\tau)\,\vec n(t,\tau), \qquad (13)$$
where $\vec n$ denotes the normal vector to the curve $C$, and $\beta$ is an appropriate scalar-valued function.
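The parametric curvature formula above is easy to check numerically: a circle of radius $r$, traversed counterclockwise, has constant curvature $K = 1/r$. The sketch below (illustrative, not from the chapter) evaluates the formula with periodic central differences on such a circle.

```python
import numpy as np

r = 2.0                                                  # circle radius; exact K = 1/r
tau = np.linspace(0.0, 2.0 * np.pi, 4000, endpoint=False)
x1, x2 = r * np.cos(tau), r * np.sin(tau)

def dtau(f, h):
    # Periodic central difference, matching the closed-curve parameterization.
    return (np.roll(f, -1) - np.roll(f, 1)) / (2.0 * h)

h = tau[1] - tau[0]
x1p, x2p = dtau(x1, h), dtau(x2, h)                      # first derivatives
x1pp, x2pp = dtau(x1p, h), dtau(x2p, h)                  # second derivatives
K = (x1p * x2pp - x1pp * x2p) / (x1p**2 + x2p**2) ** 1.5
```

Up to the $O(h^2)$ error of the central differences, `K` is the constant $1/r = 0.5$ at every parameter value; reversing the orientation of the curve flips the sign of $K$.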
Let $f \in C^2([0,\infty)\times\Omega)$. We assume that the zero level set
$$L(t) := \{x = (x_1,x_2) : f(t,x) = 0\}$$
can be parameterized by a curve $C(t,\tau)$ which evolves according to (13), and that $f$ is locally invertible in a neighborhood $\mathcal N$ of the zero level set, i.e.,
$$\nabla f = \Big(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}\Big) \neq 0 \quad \text{in } \mathcal N.$$
Then $f(t, C(t,\tau)) = 0$ for all $\tau\in[0,2\pi)$ and $t\in[0,\infty)$. Consequently, by differentiation with respect to $t$ and $\tau$ we get
$$\nabla f(t,C(t,\tau))\cdot\frac{\partial C}{\partial t}(t,\tau) + \frac{\partial f}{\partial t}(t,C(t,\tau)) = 0, \qquad \nabla f(t,C(t,\tau))\cdot\frac{\partial C}{\partial \tau}(t,\tau) = 0, \qquad (14)$$
for all $\tau\in[0,2\pi)$, $t\in[0,\infty)$. The latter equation shows that $\nabla f(t,C(t,\tau))$ and the tangential vector $(\partial C/\partial\tau)(t,\tau) = (\partial x_1/\partial\tau, \partial x_2/\partial\tau)^T(t,\tau)$ on the level curve are orthogonal, which implies that $\nabla f(t,C(t,\tau))$ is proportional to the normal vector $(\partial x_2/\partial\tau, -\partial x_1/\partial\tau)^T(t,\tau)$ on the level curve, that is,
$$\nabla f(t,C(t,\tau)) = \Big(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}\Big)(t,C(t,\tau)) = \alpha(t,\tau)\,\Big(\frac{\partial x_2}{\partial\tau}, -\frac{\partial x_1}{\partial\tau}\Big)^T(t,\tau). \qquad (15)$$
Since we assumed that $|\nabla f| \neq 0$ we find that $\alpha(t,\tau) \neq 0$. Then, by differentiation of (15) with respect to $\tau$ and use of (15) to express $\partial x_1/\partial\tau$ and $\partial x_2/\partial\tau$, we get
$$\frac{\partial\alpha}{\partial\tau}\,\frac{\partial x_2}{\partial\tau}(t,\tau) + \alpha(t,\tau)\,\frac{\partial^2 x_2}{\partial\tau^2}(t,\tau) = \frac{1}{\alpha}\Big(\frac{\partial^2 f}{\partial x_1\partial x_2}\frac{\partial f}{\partial x_1} - \frac{\partial^2 f}{\partial x_1^2}\frac{\partial f}{\partial x_2}\Big)(t,C(t,\tau)),$$
$$-\frac{\partial\alpha}{\partial\tau}\,\frac{\partial x_1}{\partial\tau}(t,\tau) - \alpha(t,\tau)\,\frac{\partial^2 x_1}{\partial\tau^2}(t,\tau) = \frac{1}{\alpha}\Big(\frac{\partial^2 f}{\partial x_2^2}\frac{\partial f}{\partial x_1} - \frac{\partial^2 f}{\partial x_1\partial x_2}\frac{\partial f}{\partial x_2}\Big)(t,C(t,\tau)).$$
Consequently, we have
$$K = \frac{1}{|\nabla f|^3}\left\{\Big(\frac{\partial f}{\partial x_2}\Big)^2\frac{\partial^2 f}{\partial x_1^2} - 2\,\frac{\partial f}{\partial x_1}\,\frac{\partial f}{\partial x_2}\,\frac{\partial^2 f}{\partial x_1\partial x_2} + \Big(\frac{\partial f}{\partial x_1}\Big)^2\frac{\partial^2 f}{\partial x_2^2}\right\}. \qquad (16)$$
Let $\alpha < 0$; then
$$\vec n(t,\tau) = \left(\Big(\frac{\partial x_1}{\partial\tau}\Big)^2 + \Big(\frac{\partial x_2}{\partial\tau}\Big)^2\right)^{-1/2}\begin{pmatrix}\partial x_2/\partial\tau \\ -\,\partial x_1/\partial\tau\end{pmatrix}(t,\tau) = -\frac{\nabla f}{|\nabla f|}.$$
Note that if $f(t,\cdot)$ is monotonically increasing towards the interior of a domain $\tilde\Omega(t)$ with boundary $L(t)$, then $-\nabla f/|\nabla f|$ points in the outward direction of $\tilde\Omega(t)$, which implies that $\alpha < 0$. Using the abbreviation $H_f$ for the Hessian of $f$, it follows that
$$\mathrm{curv}(f) := \nabla\cdot\frac{\nabla f}{|\nabla f|} = \frac{|\nabla f|^2\,\Delta f - \nabla f^{\,T} H_f\,\nabla f}{|\nabla f|^3} = K. \qquad (17)$$
This, together with (14), shows that the level set formulation of (13) is
$$\frac{\partial f}{\partial t}(t,x) = \beta\big(\mathrm{curv}(f)(t,x)\big)\,|\nabla f(t,x)|. \qquad (18)$$
Examples of curvature-based morphological processes are summarized in the following:

$\beta(K)$      morphological process
$1$             dilation
$-1$            erosion
$K$             mean-curvature flow
$K^{1/3}$       affine invariant mean-curvature flow
Morphological diffusion filtering methods have been derived axiomatically in [2–4,37].
C. Applications of Diffusion Filtering

The design and mathematical analysis (including existence, uniqueness, and stability results for solutions) of diffusion filtering techniques are active research areas. Appropriate filtering is important as a prerequisite step for image segmentation and edge detection, to mention but a few applications. In the following we apply Canny's edge detection algorithm from the software package MATLAB [112] to the filtered ultrasound examples in Figures 1, 2, 4, and 5, respectively. There are various parameters to be tuned in the implementation of Canny's edge detection algorithm, which may of course have a considerable effect on the detected edges. For reasons of comparison we have used the standard MATLAB setting, which does not require input parameters. Canny's edge detector is a sophisticated algorithm for extracting edges in image data. For more background on this method we refer the reader to [36,108]. As can be seen from Figures 6–10, appropriate filtering is an important prerequisite for edge detection.

D. Scale-Space Theory

Images contain structures at a variety of scales. Any feature can optimally be recognized at a particular scale; this has already been observed in the edge detection example above. If the optimal scale is not available a priori, it is desirable to have an image representation at multiple scales. A scale space is an image representation at a continuum of scales, embedding the image $u_\delta$ into a family $\{T_t(u_\delta) : t \geq 0\}$ of gradually simplified versions satisfying:

(1) Fidelity: $T_0(u_\delta) = u_\delta$.
(2) Causality: $T_{t+s}(u_\delta) = T_t(T_s(u_\delta))$ for all $s, t \geq 0$.
(3) Regularity: $\lim_{t\to 0^+} T_t(u_\delta) = u_\delta$.
FIGURE 6. Canny's edge detector applied to the solution of the heat equation at time $t = 0, 0.1, 1, 10, 100, 1000$. At $t = 1000$ the image is so blurred that no edges could be detected.
The differential equations introduced above satisfy these properties with $T_t(u_\delta) = u(t,\cdot)$. In mathematics, a family of operators $T_t$ satisfying fidelity, causality, and regularity is called a semi-group. For more background on semi-group theory we refer the reader to Pazy [133] (in the linear case) and Brézis [32] (in the nonlinear case).
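For the linear heat scale space, the three semi-group properties can be verified directly. On a periodic grid (an illustrative simplification of the Neumann setting used above, not from the chapter) the heat semi-group acts in Fourier space by multiplying mode $k$ with $e^{-4\pi^2 k^2 t}$, so causality $T_{t+s} = T_t \circ T_s$ holds exactly:

```python
import numpy as np

def T(t, u):
    # Heat semi-group on the periodic grid [0,1): every Fourier mode k of u
    # decays by the factor exp(-4 pi^2 k^2 t).
    k = np.fft.fftfreq(u.size, d=1.0 / u.size)       # integer frequencies
    return np.fft.ifft(np.exp(-4.0 * np.pi**2 * k**2 * t) * np.fft.fft(u)).real

rng = np.random.default_rng(2)
u0 = rng.standard_normal(128)
# fidelity:   T(0, u0) == u0
# causality:  T(t + s, u0) == T(t, T(s, u0))  -- the exponentials multiply
# regularity: T(t, u0) -> u0 as t -> 0+
```

The same identities hold, up to discretization error, for the nonlinear filters of Section II, which is the content of the (nonlinear) semi-group theory cited above.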
FIGURE 7. Canny's edge detector applied to the solution of the total variation flow equation at time $t = 0, 0.01, 0.05, 0.1, 0.5, 1$.
III. REGULARIZATION OF INVERSE PROBLEMS
A vague characterization of inverse problems is that they are concerned with determining causes for a desired or an observed effect. Such problems appear in a variety of applications like (1) Medical imaging such as CT (see, e.g., [25,94,123,170]). A mathematical framework for CT has been analyzed by Radon [140]. The theory has been applied in other areas including radioastronomy (e.g., [28]) and electron microscopy (e.g., [79]). (2) Signal and Image processing, such as the extrapolation of band-limited functions (see, e.g., [35]).
FIGURE 8. Canny's edge detector applied to the solution of the Bingham filtering technique at time $t = 0, 0.2, 2, 10, 100, 1000$.
It is well known that many inverse problems violate Hadamard’s principle of well-posedness, that is, at least one of the following postulates is violated: (1) There exists a solution. (2) The solution is unique. (3) The solution depends continuously on the input data. If one of these properties is violated the problem is said to be ill-posed or improperly posed. Regularization methods are numerical algorithms for solving ill-posed problems in a stable way. In the linear setting Torre and Poggio [165] emphasize that differentiation is ill-posed, and that applying suitable regularization strategies approximates linear diffusion filtering or—equivalently—Gaussian convolution. Much of the linear-scale-space
FIGURE 9. Canny's edge detector applied to the solution of nonlinear anisotropic diffusion at time $t = 0, 10, 500, 1000, 5000, 10{,}000$; $\lambda = 10^{-4}$.
literature is based on the regularization properties of convolutions with Gaussians. In particular, differential geometric image analysis is performed by replacing derivatives by Gaussian-smoothed derivatives; see, e.g., [76,106,126,156] and references therein.

In order to present a general framework of regularization methods it is convenient to consider an inverse problem as the problem of solving an ill-posed operator equation
$$F(u) = y_0. \qquad (19)$$
Here $F : \mathcal D(F) \subseteq X \to Y$ is an operator defined on an appropriate subset $\mathcal D(F)$ of a space $X$.
FIGURE 10. Canny's edge detector applied to the solution of the mean curvature motion at time $t = 0, 0.1, 1, 10, 100, 1000$.
We use the following terminology:

Linear inverse problems: $F$ is a linear operator.
(1) If $F = I$, the identity operator, then the linear inverse problem is called denoising.
(2) If $F$ is a convolution, that is,
$$Fu(x) = \int_\Omega k(|x-y|^2)\,u(y)\,dy, \qquad (20)$$
with $k$ a smooth function and $|\cdot|$ the Euclidean distance, then the problem is referred to as deblurring.
(3) If F is the Radon transform (see, e.g., [123]), then the problem of solving (19) is the problem of computerized tomography.
Nonlinear inverse problems: if F is a nonlinear operator.
Regularization methods were first considered by Tikhonov in 1930. Since that time regularization theory has developed systematically. (1) During the 1980s there was success in a rigorous analysis of linear ill-posed problems. We mention the books of Louis [107], Groetsch [86], Tikhonov and Arsenin [164], Morozov [116], Nashed [122], Engl and Groetsch [66], Natterer [123,124], Bertero and Boccacci [25], Kirsch [103], and Colton and Kress [52,53,105]. See also Groetsch [83,84] for some elementary introduction in the topic of inverse problems. (2) Since 1989, starting with three fundamental papers of Seidman and Vogel [154] and Engl and co-workers [68,125], regularization theory for nonlinear inverse problems developed systematically. Some expository books on this topic are [21,67,99,116,117], to name but a few. (3) Acar and Vogel [1] and Geman and Yang [78] proposed a novel framework of nondifferentiable regularization of Tikhonov type. This work stimulated the development of regularization methods for efficiently recovering discontinuous solutions in inverse problems.
A. Tikhonov-Type Regularization Methods

Tikhonov proposed approximating the solution of the operator Equation (19) by the minimizer of the functional (Tikhonov regularization)
$$f_\alpha(u) := \|F(u) - y_\delta\|_Y^2 + \alpha\,\|u - u^*\|_X^2. \qquad (21)$$
Here, $u^* \in X$ is some initial (a priori selected) guess of the desired solution and $y_\delta$ is an approximation of the right-hand-side data in (19). The classical theory of regularization methods assumes a Hilbert space setting, that is, $X$ and $Y$ are Hilbert spaces and $F : \mathcal D(F)\subseteq X \to Y$ is
(1) continuous and
(2) weakly (sequentially) closed, that is, for any sequence $\{u_n\}_{n\in\mathbb N}\subseteq \mathcal D(F)$, $u_n \rightharpoonup u$ in $X$ and $F(u_n)\rightharpoonup y$ in $Y$ imply $u\in\mathcal D(F)$ and $F(u)=y$.
The Hilbert spaces $X$ and $Y$ are associated with inner products $\langle\cdot,\cdot\rangle_X$ and $\langle\cdot,\cdot\rangle_Y$ and norms $\|\cdot\|_X$ and $\|\cdot\|_Y$,
respectively. In almost every application considered in practice $Y = L^2(\Omega)$ with inner product
$$\langle f, g\rangle = \int_\Omega fg$$
is used; typically for $X$, Sobolev spaces of weakly differentiable functions are used. There exists a variety of results showing that Tikhonov's approach in fact yields a regularization method, that is,
(1) there exists a minimizer of (21),
(2) for fixed $\alpha > 0$ the minimizers are stable with respect to perturbations in $y_\delta$, and
(3) even more, it has been proved that for an appropriate choice of $\alpha$ the minimizer is an approximation of the solution of (19).
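For a linear operator on $\mathbb{R}^n$ the minimizer of (21) (with $u^* = 0$) can be written down explicitly: it solves the regularized normal equations $(A^T A + \alpha I)u = A^T y_\delta$. The sketch below is illustrative and not from the chapter; the Hilbert matrix is merely a standard severely ill-conditioned test operator, and the noise level and $\alpha$ are arbitrary choices.

```python
import numpy as np

n = 8
# Hilbert matrix: a classical severely ill-conditioned test operator.
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
u_true = np.ones(n)
rng = np.random.default_rng(3)
y = A @ u_true + 1e-4 * rng.standard_normal(n)       # noisy data y_delta

u_naive = np.linalg.solve(A, y)                      # unregularized: noise explodes
alpha = 1e-4
# Tikhonov minimizer of ||A u - y||^2 + alpha * ||u||^2 (u* = 0):
u_alpha = np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ y)

err_naive = np.linalg.norm(u_naive - u_true)
err_tik = np.linalg.norm(u_alpha - u_true)
```

The unregularized solve amplifies the data noise by the (huge) condition number, while the Tikhonov solution trades a small approximation bias for stability; choosing $\alpha$ is exactly the trade-off governed by the parameter choice rules mentioned above.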
B. Regularization Models for Denoising

For denoising we have $F = I$ and $y_\delta = u_\delta$. Tikhonov-type regularization methods for denoising consist in minimizing a functional
$$f_\alpha(u) := \int_\Omega (u - u_\delta)^2 + \alpha\int_\Omega g(|\nabla u|^2). \qquad (22)$$
(1) $g(t) = t$ is referred to as $H^1$-semi-norm regularization.
(2) A popular specific energy functional arises from unconstrained total variation denoising [1,41,43,46]. Here $g(t) = \sqrt{t}$. This method is called BV-semi-norm regularization.
(3) The combination of $H^1$- and BV-semi-norm regularization gives $g(t) = kt + \sqrt{t}$. This method exhibits similar filtering properties as the Bingham fluid flow.
(4) The regularization counterpart to linear anisotropic diffusion filtering consists in minimizing the functional
$$\int_\Omega (u - u_\delta)^2 + \alpha\int_\Omega \big|\Lambda^{1/2} V\,\nabla u\big|^2, \qquad (23)$$
where
$$D(\nabla u_\sigma) = V^T \Lambda V$$
is the singular value decomposition of $D(\nabla u_\sigma)$, with
$$\Lambda^{1/2} = \begin{pmatrix}\lambda_1^{1/2} & 0\\ 0 & \lambda_2^{1/2}\end{pmatrix}$$
and $\lambda_1$ and $\lambda_2$ the singular values of $D(\nabla u_\sigma)$.
(5) Minimizing the functional in (23), where $\Lambda$ and $V$ are dependent on $u$, results in the regularization counterpart to nonlinear anisotropic diffusion.

C. Relations between Regularization and Perona–Malik Diffusion Filtering

Let us assume that the functional (22) is defined on the Sobolev space $H^1(\Omega)$, that is, the space of weakly differentiable functions. Moreover, we assume that there exists a minimizer, which is denoted by $u_\alpha$. Then for any $h\in H^1(\Omega)$ and any real number $t$ the definition of $u_\alpha$ implies that
$$f_\alpha(u_\alpha + th) - f_\alpha(u_\alpha) \geq 0,$$
which is equivalent to
$$\int_\Omega \big\{(u_\alpha + th - u_\delta)^2 - (u_\alpha - u_\delta)^2\big\} + \alpha\int_\Omega \big\{g\big(|\nabla u_\alpha|^2 + 2t\,\nabla u_\alpha\cdot\nabla h + t^2|\nabla h|^2\big) - g\big(|\nabla u_\alpha|^2\big)\big\} \geq 0.$$
If $g$ is twice differentiable, then by making a Taylor series expansion we find
$$g\big(|\nabla u_\alpha|^2 + 2t(\nabla u_\alpha\cdot\nabla h) + t^2|\nabla h|^2\big) = g\big(|\nabla u_\alpha|^2\big) + \big(2t\,\nabla u_\alpha\cdot\nabla h + t^2|\nabla h|^2\big)\,g'\big(|\nabla u_\alpha|^2\big) + O(t^2) = g\big(|\nabla u_\alpha|^2\big) + 2t(\nabla u_\alpha\cdot\nabla h)\,g'\big(|\nabla u_\alpha|^2\big) + O(t^2).$$
Therefore, for $t > 0$ we have
$$0 \leq \int_\Omega \big(2h(u_\alpha - u_\delta) + th^2\big) + 2\alpha\int_\Omega (\nabla u_\alpha\cdot\nabla h)\,g'\big(|\nabla u_\alpha|^2\big) + O(t).$$
Taking the limit $t\to 0^+$ shows
$$0 \leq \int_\Omega h(u_\alpha - u_\delta) + \alpha\int_\Omega (\nabla u_\alpha\cdot\nabla h)\,g'\big(|\nabla u_\alpha|^2\big).$$
Repeating the above calculation with $-t$ instead of $t$ gives
$$0 \geq \int_\Omega h(u_\alpha - u_\delta) + \alpha\int_\Omega (\nabla u_\alpha\cdot\nabla h)\,g'\big(|\nabla u_\alpha|^2\big).$$
Thus, in total, we have
$$0 = \int_\Omega h(u_\alpha - u_\delta) + \alpha\int_\Omega (\nabla u_\alpha\cdot\nabla h)\,g'\big(|\nabla u_\alpha|^2\big).$$
Then, by using Green's formula, we get
$$0 = \int_\Omega h(u_\alpha - u_\delta) - \alpha\int_\Omega \nabla\cdot\big(g'(|\nabla u_\alpha|^2)\,\nabla u_\alpha\big)\,h + \alpha\int_{\partial\Omega} g'\big(|\nabla u_\alpha|^2\big)(\nabla u_\alpha\cdot\nu)\,h = \int_\Omega h\,\big(u_\alpha - u_\delta - \alpha\,\nabla\cdot\big(g'(|\nabla u_\alpha|^2)\,\nabla u_\alpha\big)\big) + \alpha\int_{\partial\Omega} h\,g'\big(|\nabla u_\alpha|^2\big)(\nabla u_\alpha\cdot\nu), \qquad (24)$$
where $\nu$ denotes the unit normal vector on $\partial\Omega$. If $g' > 0$, then since (24) holds for all $h\in H^1(\Omega)$, we find
$$u_\alpha - u_\delta = \alpha\,\nabla\cdot\big(g'(|\nabla u_\alpha|^2)\,\nabla u_\alpha\big) \quad \text{on } \Omega, \qquad 0 = \frac{\partial u_\alpha}{\partial\nu} = (\nabla u_\alpha\cdot\nu) \quad \text{on } \partial\Omega. \qquad (25)$$
In particular, setting $\alpha = t$, $u(0) = u_\delta$, $u(t) = u_\alpha$ shows that Tikhonov regularization with small regularization parameter $t$ provides an approximation of the solution of the diffusion filtering method
$$\frac{\partial u}{\partial t} = \nabla\cdot\big(g'(|\nabla u|^2)\,\nabla u\big) \quad \text{on } \Omega, \qquad \frac{\partial u}{\partial\nu} = 0 \quad \text{on } \partial\Omega, \qquad u(0) = u_\delta \quad \text{on } \Omega,$$
at time $t$. In other words, the regularization parameter and the diffusion time can be identified if regularization is regarded as time-discrete diffusion filtering with a single implicit time step [115,138,139,145,148,158]. Moreover,
iterated regularization with small regularization parameters, consisting in subsequently minimizing the functionals
$$f^{(k)}(u) := \int_\Omega \big(u - u^{(k-1)}\big)^2 + \alpha\int_\Omega g\big(|\nabla u|^2\big), \qquad k = 1, 2, \dots \qquad (26)$$
(with $u^{(0)} := u_\delta$) and denoting the minimizers by $u^{(k)}$, approximates a diffusion process. The basic connection between regularization and diffusion filtering methods is the basis of both practical considerations and fundamental mathematical theory, such as (nonlinear) semi-group theory (see [31,133]).

D. Numerical Experiments

The numerical experiments presented below have been considered in [138,139,148] and illustrate the behavior of different regularization strategies. For more details on the numerical implementation we refer to these papers. Figure 11 shows three common test images and a noisy variant of each of them: an outdoor scene with a camera, a magnetic resonance (MR) image of a human head, and an indoor scene. Gaussian noise with zero mean has been added; its variance was chosen to be a quarter of, equal to, and four times the image variance, respectively. We applied linear and total variation regularization to the three noisy test images, used 1, 4, and 16 regularization steps, and varied the regularization parameter until the optimal restoration was found. Discretizing stabilized total variation regularization with
$$g(x) = \sqrt{\varepsilon^2 + x}$$
leads to a nonlinear system of equations. The system of nonlinear equations was solved numerically for $\varepsilon = 0.1$ by combining convergent fixed point iterations as outer iterations [62] with inner iterations using the Gauss–Seidel algorithm for solving the linear systems of equations. The results are shown in Figures 12 and 13. Figure 14 shows BV-denoised and rendered 3D ultrasound data. This gives rise to the following conclusions (Figure 15):
In all cases total variation (BV) regularization performed better than Tikhonov regularization. As expected, total variation regularization leads to visually sharper edges. The BV-restored images consist of piecewise almost constant patches.
FIGURE 11. Test images. Top left: Camera scene. Top right: Gaussian noise added. Middle left: Magnetic resonance image. Middle right: Gaussian noise added. Bottom left: Office scene. Bottom right: Gaussian noise added.
In the linear case, iterated Tikhonov regularization produced better restorations than noniterated. Visually, noniterated regularization results in images with more high-frequency fluctuations. Improvements caused by iterating the regularization were mainly seen between 1 and 4
FIGURE 12. Optimal restoration results for $H^1$-regularization (cf. (22)). Top left: Camera, 1 iteration. Top right: Camera, 16 iterations. Middle left: MR image, 1 iteration. Middle right: MR image, 16 iterations. Bottom left: Office, 1 iteration. Bottom right: Office, 16 iterations.
iterations. Increasing the iteration number to 16 hardly leads to further improvements. It appears that the theoretical and experimental results in the linear setting do not necessarily carry over to the nonlinear case with total variation regularization. For the slightly degraded camera
FIGURE 13. Optimal restoration results for total variation regularization. Top left: Camera, 1 iteration. Top right: Camera, 16 iterations. Middle left: MR image, 1 iteration. Middle right: MR image, 16 iterations. Bottom left: Office, 1 iteration. Bottom right: Office, 16 iterations.
image, iterated regularization performed worse than noniterated regularization. For the MR image, the differences are negligible, and the highly degraded office scene allows better restoration results with iterated regularization.
FIGURE 14. Bounded variation seminorm denoising with $g(t) = \sqrt{t + \varepsilon^2}$ and $\alpha = 0.001$ of three-dimensional ultrasound data (top). The left column shows the renderings for noniterated, the right column for iterated regularization. The regularization parameter for iterated regularization was $\alpha = 2$.
IV. MUMFORD–SHAH FILTERING

FIGURE 15. Results for the MR image from Figure 11(a) with noniterated and iterated regularization ($\alpha = 0.001$). The left column shows the results for noniterated, the middle column for iterated regularization. The images in the right column depict the modulus of the differences between the results for the iterated and noniterated method.

The Mumford–Shah filtering technique has been proposed in [118] for simultaneous filtering and edge detection of noisy piecewise continuous data. Since then the Mumford–Shah technique has received considerable interest: theoretically, due to the challenging mathematics involved (see, e.g., [6–10,115] and references therein); for segmentation applications; and for its numerical implementation (see, e.g., [29,30,38,39,59,60]). Formally the Mumford–Shah segmentation model looks like a regularization functional, and consists in minimizing the functional
$$f(u, K) := \int_\Omega (u - u_\delta)^2 + \alpha_1\int_{\Omega\setminus K} |\nabla u|^2 + \alpha_2\,\mathcal H^1(K). \qquad (27)$$
Here, 1 > 0, 2 > 0, and K is the discontinuity set (edges and corners) of u, which is assumed to be of finite one-dimensional Hausdorff measure, i.e., H1 ðKÞ < 1; n K denotes the set excluded by the discontinuity set K. For instance for a rectifiable curve, the one-dimensional Hausdorff measure is the length of the curve. The minimizer u :¼ u 1 , 2 is the filtered data with discontinuity set K :¼ K 1 , 2 .
The functional α₁ ∫_{Ω∖K} |∇u|² + α₂ H¹(K) serves as a penalization term; it is designed to simultaneously penalize (1) high oscillations of the filtered data u outside the discontinuity set and (2) complex (long) discontinuity sets K. For the numerical minimization of f(u, K) a common tool is nonlocal approximation, such as the Ambrosio–Tortorelli approximation (see [9–11]), where the minimizer of f(u, K) is approximated by the minimizer of the functional

f_ε(u, w) := ∫_Ω (u − u^δ)² + α₁ ∫_Ω w² |∇u|² + α₂ ∫_Ω ( ε |∇w|² + (1/ε)(1 − w)² ).  (28)

This functional is minimized with respect to (u, w) ∈ H¹(Ω) × H¹(Ω) and no longer involves tedious minimization over a family of discontinuity sets K. The term (1/ε) ∫ (1 − w)² in (28) penalizes w ≠ 1. For ε → 0⁺ this term becomes dominant and the set where w ≠ 1 becomes one-dimensional, e.g., a curve of finite length; eventually the set {w ≠ 1} becomes the discontinuity set K of the minimizer of the Mumford–Shah functional (27). The minimizer (u, w) of the functional (28) satisfies the optimality condition, a system of coupled partial differential equations

(u − u^δ) − α₁ ∇·(w² ∇u) = 0,
α₁ w |∇u|² − α₂ ε Δw − (α₂/ε)(1 − w) = 0,  (29)

together with homogeneous Neumann boundary conditions for both u and w. Figure 16 shows some numerical simulations of Mumford–Shah segmentation and filtering obtained by solving the system of coupled differential equations (29).
V. REGULARIZATION AND SPLINE APPROXIMATION

So far, the regularization models have been presented in an infinite dimensional setting. In this section we review a relation between regularization and cubic spline approximation, using a semi-infinite dimensional setting.
FIGURE 16. Top: Test data u^δ. Bottom left: u solving (29). Bottom right: w approximating the discontinuity set.
Suppose u₀ is a smooth function on 0 ≤ x ≤ 1 and noisy samples u_i^δ of the values u₀(x_i) are known at the points of a uniform grid

Δ = {0 = x₀ < x₁ < ... < x_n = 1}.

Let h = x_{i+1} − x_i be the mesh size of the grid and suppose

|u_i^δ − u₀(x_i)| ≤ δ,  (30)

where δ is a known level of noise in the data. For simplicity of presentation we assume that the boundary data are known exactly:

u₀^δ = u₀(0)  and  u_n^δ = u₀(1).
We are interested in finding a smooth approximation ∂u/∂x of ∂u₀/∂x on (0, 1) from the given data u_i^δ. To make the computations concrete we have to quantify the term "smooth," which we characterize by the size of the second derivative: we consider a function smooth if its second derivative is small. The approximation problem can then be formulated as a constrained optimization problem.

Problem 5.1. Minimize ∫₀¹ (∂²u/∂x²)² among all smooth functions u satisfying u(0) = u₀(0), u(1) = u₀(1), and

(1/(n−1)) Σ_{i=1}^{n−1} (u_i^δ − u(x_i))² ≤ δ².  (31)

Then take the derivative ∂u*/∂x of the minimizing element u* as an approximation of ∂u₀/∂x. In fact, given the uncertainty in the data, every function u satisfying (31) can be considered a solution candidate; the minimizer of Problem 5.1 is the particular candidate that is "smoothest." If the minimizing element u* of Problem 5.1 satisfies the constraint (31) with strict inequality (i.e., the constraint (31) is inactive), then

u*(x) = u₀(0) + x (u₀(1) − u₀(0)),  (32)

i.e., it is the straight line interpolating the two boundary values. This case occurs if and only if the interpolating line itself satisfies the constraint (31). Excluding this trivial case, the minimizer u* satisfies (31) with equality and hence can be calculated with the method of Lagrange multipliers. If 1/α denotes the corresponding Lagrange multiplier for constraint (31), the equivalent formulation of Problem 5.1 is:

Problem 5.2. Minimize

f(u) := (1/(n−1)) Σ_{i=1}^{n−1} (u_i^δ − u(x_i))² + α ∫₀¹ (∂²u/∂x²)²  (33)

among all smooth functions u satisfying u(0) = u₀(0), u(1) = u₀(1), where α is such that the minimizing element u_α of (33) satisfies

(1/(n−1)) Σ_{i=1}^{n−1} (u_i^δ − u_α(x_i))² = δ².  (34)
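A discrete sketch of Problems 5.1/5.2: replace ∫(∂²u/∂x²)² by squared second differences, solve the quadratic problem (33) for a trial α, and bisect on α until the residual condition (34) holds (the residual is monotone in α). The boundary interpolation constraints are dropped for brevity, and the test function, noise level, and tolerances are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 101
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
u0 = np.sin(2 * np.pi * x)                       # smooth "exact" function (assumed)
delta = 0.02
f = u0 + delta * rng.uniform(-1.0, 1.0, n)       # noisy samples, |noise| <= delta

# second-difference operator: discrete analogue of d^2/dx^2
K = (np.eye(n, k=1) - 2 * np.eye(n) + np.eye(n, k=-1))[1:-1] / h**2

def minimizer(alpha):
    # minimizes mean((u - f)^2) + alpha * h * ||K u||^2, cf. (33)
    A = np.eye(n) / n + alpha * h * (K.T @ K)
    return np.linalg.solve(A, f / n)

def residual(alpha):
    u = minimizer(alpha)
    return np.mean((u - f) ** 2)

# discrepancy principle (34): pick alpha so that the residual equals delta^2
lo, hi = 1e-14, 1e2
for _ in range(200):
    mid = np.sqrt(lo * hi)                       # bisection in log scale
    lo, hi = (mid, hi) if residual(mid) < delta**2 else (lo, mid)
u_star = minimizer(lo)
du = np.gradient(u_star, x)                      # smooth estimate of u0'
```

The derivative estimate `du` is far less noisy than differencing the raw data, which is the point of smoothing before differentiating.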
The derivative ∂u_α/∂x of the minimizing function u_α is then an approximation of ∂u₀/∂x. This model differs essentially from the regularization models considered in Section III, since here discrete sample data are available. Also, the goal is to find a smooth approximation of the derivative of u, and not just a denoised approximation of u as in the regularization methods considered in Section III.B. Problem 5.2 is a special instance of Tikhonov regularization; the way of choosing the regularization parameter in Problem 5.2 is called the discrepancy principle [87]. Except for the interpolatory constraints at the boundary of the interval, (33) has been investigated and solved by Schoenberg [151] and Reinsch [142], who showed that the solution of Problem 5.2 is a natural cubic spline over the grid Δ. Reinsch also gives a constructive algorithm for calculating this spline. A more comprehensive account of the interaction between cubic spline approximation and numerical differentiation can be found in [92]; see also Hanke [90]. The interaction between regularization and spline approximation is not limited to cubic splines. In fact, it can be shown that the optimal solution u_α of the functional

Σ_{i∈ℤ} (u_i^δ − u(x_i))² + α ∫ (∂^m u/∂x^m)²,  m = 1, 2, ...,

is a combination of B-splines of order n = 2m − 1, i.e.,

u_α(x) = Σ_{k∈ℤ} c_k β^{2m−1}(x − k),

where β^{2m−1}(·) denotes the B-spline of order 2m − 1, which is defined by the (2m)-fold convolution

β^{2m−1} = β⁰ * β⁰ * ... * β⁰  (2m times),

where * denotes convolution and

β⁰(x) = 1 for −1/2 < x < 1/2,  β⁰(x) = 1/2 for |x| = 1/2,  β⁰(x) = 0 otherwise.

(ℤ denotes the set of integers ..., −1, 0, 1, ...)
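The convolution definition of β^{2m−1} can be checked numerically: repeatedly convolving the sampled box function β⁰ with itself (scaled by the mesh width) approximates the B-spline. A small sketch, with the grid resolution as the only assumed parameter.

```python
import numpy as np

h = 1e-3
beta0 = np.ones(int(round(1.0 / h)))       # beta^0 sampled on (-1/2, 1/2)

def bspline(order):                        # order = 2m - 1, i.e. 2m box factors
    b = beta0.copy()
    for _ in range(order):
        b = np.convolve(b, beta0) * h      # discrete approximation of convolution
    return b

b3 = bspline(3)                            # cubic B-spline, supported on (-2, 2)
# b3 integrates to 1 and attains its maximum 2/3 at the origin.
```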
A survey of the interaction of splines and regularization can be found in Unser [168]. The topic of numerical differentiation has been studied extensively before (see, e.g., Torre and Poggio [165], Murio [119], Groetsch [82–84]). For more background on spline approximation we refer to Schoenberg [151], Reinsch [142], Schultz [152], Strang and Fix [157], de Boor [58], Schumaker [153], Rice and Rosenblatt [143], and Wahba [169], to mention a few.

VI. SCALE-SPACE METHODS FOR INVERSE PROBLEMS

As has been shown in [147], the concept of diffusion filtering cannot be used directly for the solution of ill-posed operator equations. The argument is outlined below. For the moment we restrict our attention to Tikhonov functionals defined on the Sobolev space H¹(Ω) of differentiable functions, where the stabilization term ‖u − u*‖²_X in (21) is replaced by

∫_Ω g(|∇u|²).

Then, arguing as in Section III.C, the minimizer u_α of (22) satisfies, for any h ∈ H¹(Ω) and any real number t,

f(u_α + th) − f(u_α) ≥ 0,

which is equivalent to

∫_Ω [ (F(u_α + th) − y^δ)² − (F(u_α) − y^δ)² ] + α ∫_Ω [ g(|∇u_α|² + 2t ∇u_α·∇h + t² |∇h|²) − g(|∇u_α|²) ] ≥ 0.

A Taylor series expansion of (F(u_α + th) − y^δ)² gives

(F(u_α + th) − y^δ)² = (F(u_α) − y^δ)² + 2t (F′(u_α)h)(F(u_α) − y^δ) + O(t²).

Then, by similar arguments as in Section III.C, we find

0 = ∫_Ω (F′(u_α)h)(F(u_α) − y^δ) + α ∫_Ω (∇u_α·∇h) g′(|∇u_α|²).
Using Green's formula, we get

0 = ∫_Ω h [ F′(u_α)*(F(u_α) − y^δ) − α ∇·( g′(|∇u_α|²) ∇u_α ) ] + α ∫_{∂Ω} h g′(|∇u_α|²) ∂u_α/∂ν.

Here F′(u_α)* denotes the L²-adjoint of F′(u_α), i.e.,

∫_Ω F′(u_α)*(ξ) w = ∫_Ω ξ F′(u_α)(w)  for all ξ ∈ L²(Ω), w ∈ H¹(Ω).

This shows that the optimality criterion for the minimizer u_α of (21) is

F′(u_α)*(F(u_α) − y^δ) = α ∇·( g′(|∇u_α|²) ∇u_α ) on Ω,
∂u_α/∂ν = 0 on ∂Ω.

In the case of noise-free attainable data, that is for y^δ = y₀ = F(u†), we have

F′(u_α)*(F(u_α) − F(u†)) = α ∇·( g′(|∇u_α|²) ∇u_α ),

and there exists an associated diffusion-type methodology

F′(u)* F′(u) ∂u/∂t = ∇·( g′(|∇u|²) ∇u ) on (0, ∞) × Ω,
∂u/∂ν = 0 on (0, ∞) × ∂Ω,
u(0) = u† on Ω.  (35)

Due to the ill-posedness of the operator equation (19) there will in general not exist a solution of (19) when y₀ is replaced by y^δ ≠ y₀. The ill-posedness thus prohibits an a priori estimation of an approximation of u†; since (35) must be initialized with u†, the method is inappropriate for calculating a scale space of an inverse problem. The relation to diffusion filtering becomes apparent if we use F = I, Y = L²(Ω) (the space of square integrable functions), X = H¹(Ω) (the Sobolev space of weakly differentiable functions), and the H¹(Ω)-seminorm for regularization, that is, g(x) = x.
In this setting the minimizer u_α of the Tikhonov functional satisfies

u_α − α Δu_α = u^δ on Ω,
∂u_α/∂ν = 0 on ∂Ω.

Thus for α → 0⁺ the diffusion filtering equation (1) is approximated. The iterative Tikhonov–Morozov method is a variant of Tikhonov regularization for solving inverse problems. It consists in iteratively minimizing the sequence of functionals

f^{(k)}(u) := ‖F(u) − y^δ‖²_Y + α_k ‖u − u^{(k−1)}‖²_X,  k = 1, 2, ...,  (36)

and denoting the minimizer by u^{(k)}. If the functionals f^{(k)} are convex, then the minimizers u^{(k)} satisfy

F′(u^{(k)})*_{XY}(F(u^{(k)}) − y^δ) + α_k (u^{(k)} − u^{(k−1)}) = 0,  k = 1, 2, ...  (37)

Here F′(u)*_{XY} denotes the adjoint of F′(u) with respect to the spaces X and Y, that is,

⟨F′(u)*_{XY}(ξ), w⟩_X = ⟨ξ, F′(u)(w)⟩_Y  for all ξ ∈ Y, w ∈ X.

Typically in the Tikhonov–Morozov method one sets u^{(0)} = 0, but any other choice is suitable as well; for example, a priori information on the solution may be incorporated in the initial approximation u^{(0)}. Taking α_k = 1/(t_k − t_{k−1}) shows that u^{(k)} and u^{(k−1)} can be considered as approximations, at times t_k and t_{k−1}, of the solution u of the asymptotic Tikhonov–Morozov filtering technique

∂u/∂t = −F′(u)*_{XY}(F(u) − y^δ) in (0, ∞) × Ω,
u(0, ·) = u^{(0)} = 0 in Ω.  (38)

For F = I, the embedding operator from H¹(Ω) into L²(Ω), the iterative Tikhonov–Morozov method, where we use the H¹-seminorm for regularization instead of the full norm, generates minimizers u^{(k)} of the functionals

f^{(k)}(u) := ∫_Ω (u − u^δ)² + α_k ∫_Ω |∇u − ∇u^{(k−1)}|²,  k = 1, 2, ...  (39)
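The iteration (39) is easy to realize in one dimension: the Euler equation of (39) gives the linear system (I − α_k Δ_h) u^{(k)} = u^δ − α_k Δ_h u^{(k−1)}, with Δ_h a finite-difference Laplacian with Neumann boundary rows. Starting from u^{(0)} = 0 with an exponentially decreasing α_k sequence produces the inverse scale space described below; grid size, noise level, and the α_k schedule are assumed, illustrative values.

```python
import numpy as np

n = 256
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
rng = np.random.default_rng(2)
u_delta = np.sign(np.sin(3 * np.pi * x)) + 0.1 * rng.standard_normal(n)

# 1-D Laplacian with homogeneous Neumann boundary rows
L = (np.eye(n, k=1) - 2 * np.eye(n) + np.eye(n, k=-1)) / h**2
L[0, 0] = -1 / h**2
L[-1, -1] = -1 / h**2

u = np.zeros(n)                          # u^(0) = 0: totally diffused start
iterates = []
for k in range(20):
    a = 10.0 * 0.5**k                    # exponentially decreasing alpha_k
    # Euler equation of (39): (u - u_delta) - a * L (u - u_prev) = 0
    u = np.linalg.solve(np.eye(n) - a * L, u_delta - a * (L @ u))
    iterates.append(u.copy())
```

The first iterate is nearly constant (the coarsest scale); as α_k decreases the iterates approach the data u^δ, so the family runs from a totally blurred signal toward the input — the inverse scale-space behavior illustrated in Figure 17.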
Accordingly, the asymptotic Tikhonov–Morozov method consists in solving the third-order differential equation

u − u^δ = Δ ∂u/∂t in (0, ∞) × Ω,
∂u/∂ν = 0 on (0, ∞) × ∂Ω,
u(0, ·) = 0 on Ω.  (40)

Figure 17 shows the evolution of the solution of the differential equation (40): it starts with a completely diffused image, and as t → ∞ the input data is restored. In analogy to scale-space theory (cf. Section II.D) we call this method the inverse scale-space method, since it generates a data representation at a continuum of scales, embedding the input data u^δ into a family of gradually simplified versions initialized with a totally blurred image. In Section VI.A we discuss the asymptotic Tikhonov–Morozov method for deblurring images. In this case, F is a linear integral operator. For this particular model problem we can motivate the preferences of different numerical methods in inverse problems and image processing.

A. Deblurring with a Scale-Space Method

We consider a problem of deblurring data to recover a function u† on Ω = (0, 1)², given (blurred) data

y^δ = F u† + noise := ∫_Ω k(|x − y|) u†(y) dy + noise

on Ω. To formulate the Tikhonov–Morozov method we have to specify a similarity measure for the data and an appropriate function space containing u†. In this section we restrict our attention to u† in one of the following three spaces:

(1) The Sobolev space H¹(Ω), that is, the Hilbert space of weakly differentiable functions u that satisfy

‖u‖_{H¹} := ( ∫_Ω |∇u|² + ω|u|² )^{1/2} < ∞

with an appropriate positive weighting parameter ω > 0.
FIGURE 17. The result of nonstationary regularization (39) with regularization parameters α₁ = 100,000, α₂ = 50,000, α₃ = 25,000, α₄ = 12,500, α₅ = 6250, α₆ = 3125, α₇ = 1500, α₈ = 750, α₉ = 300, α₁₀ = 150, α₁₁ = 75, and α₁₂ = 30 (the last image is not visually different from the input data).
(2) The more general Banach space W^{1,p}(Ω), with p > 1, of functions u satisfying

‖u‖_{W^{1,p}} := ( ∫_Ω |∇u|^p + ω|u|^p )^{1/p} < ∞,

with an appropriate positive weighting parameter ω > 0.

(3) The space BV(Ω) of functions of bounded variation, that is, the class of functions u satisfying

‖u‖_{BV(Ω)} := ∫_Ω (|∇u| + ω|u|) < ∞.
For a function u ∈ BV(Ω) the term ∫_Ω |∇u| has to be understood as a measure (see [73]).
An appropriate choice for the similarity measure is the L²(Ω)-norm. Depending on the a priori information on u†, it is instructive to study the Tikhonov–Morozov method in a variety of settings. If u† ∈ H¹(Ω), it is appropriate to consider F as an operator from H¹(Ω) into L²(Ω). Accordingly, the iterated Tikhonov–Morozov method consists in minimizing

f^{(k)}_{H¹}(u) := ‖Fu − y^δ‖²_{L²(Ω)} + α_k ‖u − u^{(k−1)}‖²_{H¹(Ω)}.  (41)

Instead of the H¹-norm, the H¹-seminorm can be used if F does not annihilate constant functions. In particular, for denoising images (that is, if F = I) the H¹-seminorm is suitable; for ill-posed problems, such as deconvolution problems, this seminorm may lead to some numerical difficulties. For u† ∈ W^{1,p}(Ω), p > 1, the corresponding Tikhonov–Morozov method consists in minimizing the functional

f^{(k)}_{W^{1,p}}(u) := ‖Fu − y^δ‖²_{L²(Ω)} + α_k ‖u − u^{(k−1)}‖^p_{W^{1,p}(Ω)}.  (42)

For u† ∈ BV(Ω) the Tikhonov–Morozov method consists in minimizing

f^{(k)}_{BV}(u) := ‖Fu − y^δ‖²_{L²(Ω)} + α_k ‖u − u^{(k−1)}‖_{BV(Ω)}.  (43)

Since the operator F is self-adjoint on L²(Ω), that is F* = F, the asymptotic Tikhonov–Morozov method in the H¹-setting reads as follows:

(F F u)(t, x) − (F y^δ)(x) = (Δ − ωI) ∂u/∂t (t, x)  for (t, x) ∈ (0, ∞) × Ω,
∂u/∂ν (t, x) = 0  for (t, x) ∈ (0, ∞) × ∂Ω,
u(0, x) = 0  for x ∈ Ω.  (44)

The minimizer u^{(k)} of f^{(k)}_{W^{1,p}} has to satisfy

F(F u^{(k)} − y^δ) = (p/2) α_k ∇·( |∇(u^{(k)} − u^{(k−1)})|^{p−2} ∇(u^{(k)} − u^{(k−1)}) ) − (p/2) α_k ω |u^{(k)} − u^{(k−1)}|^{p−2} (u^{(k)} − u^{(k−1)}).  (45)
Introducing the relation

α_k = (2/p) (t_k − t_{k−1})^{−(p−1)}  (46)

between the regularization parameters and the time discretization, we derive the asymptotic Tikhonov–Morozov method on W^{1,p}(Ω):

F(F u − y^δ) = ∇·( |∇(∂u/∂t)|^{p−2} ∇(∂u/∂t) ) − ω |∂u/∂t|^{p−2} ∂u/∂t.  (47)

For p = 1 the relation (46) degenerates, indicating that there is no asymptotic integro-differential equation for the Tikhonov–Morozov method on BV(Ω). One of the most significant differences between diffusion filtering and iterative Tikhonov–Morozov regularization is that a small time-step size in the diffusion filtering method results in very large regularization parameters. This is not inconsistent with standard regularization theory, since we consider an iterative regularization technique which uses the information of the previous iteration cycle. In our numerical simulations an exponentially decreasing sequence α_k for the iterative regularization algorithms (41)–(43) leads to a visually attractive image sequence. This, in turn, implies that the time steps t_k of the diffusion filtering method (47) are exponentially increasing. This compensates for the fact that in the beginning the diffusion process is rather strong and a small step size is required; as the diffusion progresses, the image starts to stagnate and a large time-step size becomes appropriate.

B. Numerical Simulations

The following test cases have been considered in [147]. We discuss the numerical implementation of the asymptotic Tikhonov–Morozov method and present some numerical simulations for deblurring images. In the simulations presented below we have used the kernel function

k(t) = (t² − ε²)⁴/ε⁸ for t ∈ [−ε, ε],  k(t) = 0 otherwise.
For the numerical solution of the integro-differential equation (44) we discretize in time and use a finite element ansatz of products of linear splines on Ω. Let

Π(t_k, x₁, x₂) = Σ_{i,j=0}^{N} c_{ij}(t_k) Φ_{ij}(x₁, x₂)

be the approximation of the solution of (44), where Φ_{ij}(x₁, x₂) = φ_i(x₁) φ_j(x₂) and φ_i is a spline of order 1, that is, φ_i(j/N) = δ_{ij} for i = 0, ..., N and φ_i is piecewise linear on [0, 1]. For the approximation of the time derivative of Π we use a backward difference operator, that is,

(Π(t_k, x) − Π(t_{k−1}, x))/(t_k − t_{k−1}) ≈ ∂Π/∂t (t_k, x).

Using α_k = 1/(t_k − t_{k−1}), the discretized system for an approximation of (44) at time t_k requires solving the following linear equation for the coefficients c_{ij}(t_{k+1}) from the given coefficients c_{ij}(t_k):

Σ_{ij} c_{ij}(t_{k+1}) ( F_{ij,kl} + α_k I^ω_{ij,kl} ) = ∫_Ω y^δ F(φ_k φ_l) + α_k Σ_{ij} c_{ij}(t_k) I^ω_{ij,kl}  (48)

for all k, l ∈ {0, ..., N}. Here

I^ω = [ I^ω_{ij,kl} ]_{ij,kl},  I^ω_{ij,kl} = ∫_Ω ( ∇Φ_{ij} · ∇Φ_{kl} + ω Φ_{ij} Φ_{kl} ),

and

F = [ F_{ij,kl} ]_{ij,kl},  F_{ij,kl} = ∫_Ω F(Φ_{ij}) F(Φ_{kl}).
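A one-dimensional analogue of (48), with the Gram matrix simplified to the identity (an L² rather than H¹ inner product), makes the structure transparent: each step solves (FᵀF + α_k I) c^{(k+1)} = Fᵀ y^δ + α_k c^{(k)}. The kernel is the one given above; grid size, noise level, and the α_k schedule are assumed, illustrative values.

```python
import numpy as np

n, eps = 100, 0.05
x = np.linspace(0.0, 1.0, n)

def kernel(t):                            # the kernel function from the text
    return np.where(np.abs(t) <= eps, (t**2 - eps**2)**4 / eps**8, 0.0)

F = kernel(x[:, None] - x[None, :]) / n   # discretized convolution operator
u_true = ((x > 0.3) & (x < 0.7)).astype(float)
rng = np.random.default_rng(3)
y = F @ u_true + 1e-4 * rng.standard_normal(n)

c = np.zeros(n)
for k in range(25):
    a = 1e-2 * 0.7**k                     # exponentially decreasing alpha_k
    c = np.linalg.solve(F.T @ F + a * np.eye(n), F.T @ y + a * c)
# c now approximates u_true; without the alpha_k-terms the solve would be
# hopeless, since most singular values of F are close to zero (cf. Figure 18).
```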
The solution of the unregularized equation (48) (that is, with α_k = 0) is ill-conditioned. This becomes clear when the singular values of the matrix F are plotted (cf. Figure 18): most of the singular values are comparatively small. Errors in components of the data corresponding to singular functions with singular values near zero are then exceedingly amplified. Thus it is prohibitive to calculate the solution of the unregularized equation.

Example VI.1. In the first example we aim to reconstruct the pattern (top image in the first row of Figure 19) from the blurred and additionally noisy data (cf. Figure 19). Figures 20–22 show the inverse scale-space method for reconstructing the pattern from blurred data. When the blurred data is additionally distorted with Gaussian noise, the ill-posedness of the problem becomes apparent: only for a relatively short period of time is the reconstruction visually attractive, and for t → ∞ the reconstruction becomes useless. This effect becomes more significant as the error in the data increases, as a comparison of Figures 20–22 shows. One of the major concerns in regularization theory is the estimation of appropriate regularization parameters, needed here to stop the iteration process before the image becomes hopelessly distorted by noise. For some references on appropriate stopping rules for the Tikhonov–Morozov method we refer the reader to [85,88,91,145].

Example VI.2. Here we compare the Tikhonov–Morozov method on H¹(Ω) and on BV(Ω). We have chosen a piecewise constant function on a rectangle as a paradigm of a function that is in BV(Ω) but not in H¹(Ω) (cf. Figure 23). This has the effect that the reconstruction with the (asymptotic) Tikhonov–Morozov method on H¹(Ω) always has a blurry character (cf. Figure 24). Figure 25 shows the reconstruction with the Tikhonov–Morozov method on BV(Ω). This method performs worse than the asymptotic Tikhonov–Morozov method on H¹(Ω), which numerically supports the fact that there is no inverse scale-space method on BV(Ω).

This section has been devoted to highlighting the contrasting behavior of scale-space methods for the solution of inverse problems on the one hand and for image smoothing and restoration on the other. One of the significant differences in inverse scale-space theory for inverse problems is the choice of an adequate stopping
FIGURE 18. The singular values of the matrix F.
FIGURE 19. Top: The test pattern, which is to be recovered from the blurred data (middle left), from the blurred data additionally distorted with medium noise (middle right), and from the blurred data distorted with high noise (bottom).
FIGURE 20. Reconstruction from blurred data without noise by the inverse scale-space method (44). The images show the solution u of (44) at the specified times with exponentially decreasing time-steps. At a certain time the test pattern can be completely recovered; the inverse scale-space method stagnated at the test pattern.
FIGURE 21. Reconstruction from blurred data with medium noise using the inverse scale-space method (44). The images show the solution u of (44) at the specified times. Top left shows the optimal time for recovery with medium noise. After that time the reconstruction gets worse, showing the importance of determining an optimal stopping time for the inverse scale-space method.
FIGURE 22. Reconstruction from blurred data with high noise using the inverse scale-space method (44). Middle right shows the optimal time for recovery with high noise. After that time the reconstruction algorithm diverges extremely fast (cf. the scales of the images).
FIGURE 23. Test data for comparing the Tikhonov–Morozov method on H¹(Ω) and BV(Ω). Left: Image to be reconstructed. Right: The available blurred data, from which we intend to recover the left image.
FIGURE 24. Reconstruction with the asymptotic Tikhonov–Morozov method on H¹(Ω) at the specified times.
FIGURE 25. Reconstruction with the asymptotic Tikhonov–Morozov method on BV(Ω) at the specified times.
time; after a certain time the noise is considerably amplified in the reconstruction. This is not an issue in image smoothing, where the effect of noise is weakened over time. We also remark that for image smoothing and restoration the total variation flow filtering in almost all documented cases performed significantly better than the heat equation. This is not always true for inverse problems.

VII. NONCONVEX REGULARIZATION MODELS

In Section III.C we considered regularization functionals of the general form

f(u) := ∫_Ω (u − u^δ)² + α ∫_Ω g(∇u).

The existence of a minimizer is relatively easy to establish under the essential assumptions that g is convex with respect to the gradient variable ∇u and the functional is coercive (see, e.g., Dacorogna [54,55] and Aubert and Kornprobst [19]). The analysis of regularization functionals becomes considerably more involved if the functional f is nonconvex. Such models are outlined below.

A. Perona–Malik Regularization

In the classical Perona–Malik filter [134,135] the diffusivity is

D(∇u) = 1/(1 + |∇u|²).

The corresponding variational technique consists in minimizing the functional

∫_Ω (u − u^δ)² + α ∫_Ω ln(1 + |∇u|²) =: ∫_Ω ĝ(u, ∇u).

The function ĝ is nonconvex with respect to the variable ∇u. In this case it is well known from the calculus of variations (see, e.g., [55]) that the optimization problem is not well posed, in the sense that a minimizer need not exist. Therefore additional regularization concepts are built into the functional, such as

f_RPM(u) := ∫_Ω (u − u^δ)² + α ∫_Ω ln(1 + |∇L_σ u|²),  (49)

where L_σ is a linear convolution operator with a smooth kernel.
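Where the text below juxtaposes the regularizations (51) and (52), the Catté-type variant (52) is the easier one to sketch: the diffusivity is computed from the presmoothed image L_σu, while the flux uses ∇u itself. The following NumPy sketch uses an explicit finite difference scheme with interface-averaged diffusivities; the Gaussian width, time step, test image, and noise level are all assumed, illustrative values.

```python
import numpy as np

def gauss1d(sigma):
    r = int(3 * sigma) + 1
    t = np.arange(-r, r + 1)
    g = np.exp(-t**2 / (2 * sigma**2))
    return g / g.sum()

def L(u, sigma):                       # separable Gaussian convolution L_sigma u
    g = gauss1d(sigma)
    p = len(g) // 2
    v = np.pad(u, p, mode="reflect")
    v = np.apply_along_axis(np.convolve, 0, v, g, "valid")
    v = np.apply_along_axis(np.convolve, 1, v, g, "valid")
    return v

def catte_step(u, tau=0.2, sigma=1.5):
    us = L(u, sigma)
    g0, g1 = np.gradient(us)
    d = 1.0 / (1.0 + g0**2 + g1**2)    # diffusivity from the presmoothed image
    # forward-difference fluxes, diffusivity averaged onto cell interfaces
    f0 = 0.5 * (d[1:, :] + d[:-1, :]) * np.diff(u, axis=0)
    f1 = 0.5 * (d[:, 1:] + d[:, :-1]) * np.diff(u, axis=1)
    div = (np.diff(np.pad(f0, ((1, 1), (0, 0))), axis=0)
           + np.diff(np.pad(f1, ((0, 0), (1, 1))), axis=1))
    return u + tau * div               # explicit step; tau <= 0.25 for stability

rng = np.random.default_rng(5)
u = np.where(np.arange(64)[None, :] < 32, 0.0, 10.0) + 0.5 * rng.standard_normal((64, 64))
for _ in range(20):
    u = catte_step(u)
```

After twenty steps the noise in the flat regions is largely removed while the large jump survives, since the presmoothed gradient makes the diffusivity small across the edge.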
The minimizer of the regularized Perona–Malik functional satisfies

u − u^δ = α L_σ* ∇·( ∇L_σu / (1 + |∇L_σu|²) ).  (50)

The corresponding nonlinear diffusion process associated with this regularization technique is

∂u/∂t = L_σ* ∇·( ∇L_σu / (1 + |∇L_σu|²) ).  (51)

Regularized Perona–Malik filters have been considered in the literature [22,37,130,171,172]. Catté et al. [37], for instance, investigated the nonlinear diffusion process

∂u/∂t = ∇·( ∇u / (1 + |∇L_σu|²) ).  (52)

This technique (as well as other previous regularizations) does not have a corresponding formulation as an optimization problem. In an experiment we juxtapose the regularizations (51) and (52) of the Perona–Malik filter. Both processes have been implemented using an explicit finite difference scheme. The results using the MR image from Figure 11 are shown in Figure 26, where different values of σ, the standard deviation of the Gaussian, have been used. For small values of σ both filters produce rather similar results, while larger values lead to completely different behavior. For (51), the regularization smoothes the diffusive flux, so that it becomes close to zero everywhere, and the image remains unaltered. The regularization in (52), however, creates a diffusivity which gets closer to one at all image locations, so that the filter creates blurry results resembling linear diffusion filtering.

B. Relative Error Regularization

The noise in data detected with common measurement devices frequently correlates with the exact data; relevant situations are when the noise locally correlates with the amplitude or the variation of the data. Assuming such correlation between the data and noise, we are led to fit-to-data terms of the form

(1/2) ∫ (u − u^δ)²/|u|^p  and  (1/p) ∫ |u − u^δ|^p/|∇u|^{p−1},  p = 1, 2, ...
FIGURE 26. Comparison of two regularizations of the Perona–Malik filter (t = 250). Top left: Filter (51), σ = 0.5. Top right: Filter (52), σ = 0.5. Middle left: Filter (51), σ = 2. Middle right: Filter (52), σ = 2. Bottom left: Filter (51), σ = 8. Bottom right: Filter (52), σ = 8.
FIGURE 27. Correlated noise in 1D signals. Top left: Noise-free data. Top right: Uncorrelated noise. Middle left: ∫ (u − u^δ)²/|u| = δ². Middle right: ∫ (u − u^δ)²/|u|² = δ². Bottom left: ∫ (u − u^δ)²/|∇u| = δ². Bottom right: ∫ (u − u^δ)⁴/|∇u|³ = δ².
In Figures 27 and 28 we have plotted noisy data revealing the difference between uncorrelated and correlated noise. We concentrate on Tikhonov-type regularization models with the BV-seminorm as stabilizing functional. This leads to regularization models of the form (relative error regularization):

(1/2) ∫ (u − u^δ)²/|u|^p + α ∫ |∇u|;   (1/p) ∫ |u − u^δ|^p/|∇u|^{p−1} + α ∫ |∇u|.

In order to put this work into context with diffusion filtering techniques it is convenient to consider iterative relative error regularization.
FIGURE 28. Correlated noise in 2D signals. Top left: Noise-free data. Top right: Uncorrelated noise. Middle left: ∫ (u − u^δ)²/|u| = δ². Middle right: ∫ (u − u^δ)²/|u|² = δ². Bottom left: ∫ (u − u^δ)²/|∇u| = δ². Bottom right: ∫ (u − u^δ)⁴/|∇u|³ = δ².
In particular, we consider the models of iteratively minimizing the functionals

(1/2) ∫ (u − u^{(k−1)})²/|u|^p + α ∫ |∇u|   and   (1/p) ∫ |u − u^{(k−1)}|^p/|∇u|^{p−1} + α ∫ |∇u|,  (53)

denoting the minimizers (presuming they exist) by u^{(k)}; moreover, we again use the convention u^{(0)} := u^δ. Since the functionals in (53) are nonconvex and thus quite delicate to handle analytically and numerically, it is convenient to consider semi-implicit variants, namely minimization of

(1/2) ∫ (u − u^{(k−1)})²/|u^{(k−1)}|^p + α ∫ |∇u|   and   (1/p) ∫ |u − u^{(k−1)}|^p/|∇u^{(k−1)}|^{p−1} + α ∫ |∇u|.  (54)

The functionals in (54) are convex and straightforward to analyze (see [146]).
Minimization of the second functional in (54) with p = 2 can be considered as a semi-implicit time step with step size Δt = α for the mean curvature flow equation (9); for p = 4 it is a semi-implicit method for solving the affine invariant mean curvature flow equation

∂u/∂t = |∇u| ( ∇·(∇u/|∇u|) )^{1/3}.  (55)

The first functional in (54) corresponds to a semi-implicit time step for solving

∂u/∂t = |u|^p ∇·(∇u/|∇u|).  (56)

The Euler equation for the minimizer of

(1/2) ∫ (u − u^{(k−1)})²/|∇u| + α ∫ |∇u|  (57)

is

(u − u^{(k−1)})/|∇u| = ∇·( ( α − (1/2)(u − u^{(k−1)})²/|∇u|² ) ∇u/|∇u| ).  (58)

Note that (58) is only formal, since the regularization functional ∫ |∇u| is not differentiable. Division of equation (58) by α gives

(u − u^{(k−1)})/α = |∇u| ∇·( ( 1 − (1/(2α))(u − u^{(k−1)})²/|∇u|² ) ∇u/|∇u| ).

Taking the formal limit α → 0⁺ and considering again u^{(k−1)} ≈ u(t_{k−1}), u^{(k)} ≈ u(t_k) and α = t_k − t_{k−1} gives again

∂u/∂t = |∇u| ∇·(∇u/|∇u|),

the mean curvature flow equation. Since (58) can be considered a Perona–Malik model with positive and negative diffusion, the problem is ill-posed. The ill-posedness in the optimality condition reflects the fact that the underlying energy functional (57) is nonconvex with respect to the gradient variable. By employing generalized solution concepts such as convexification or Γ-limits, the ill-posedness disappears (see [146]). We present two numerical experiments for relative error denoising: (1) We use the artificially generated data set at the top left of Figure 28. The reconstructions in Figure 29 have been created with bounded variation regularization, mean curvature filtering, affine mean curvature filtering, and implicit error regularization. The stopping time in the diffusion filtering method and the regularization parameters are selected such that all reconstructions have about the same amplitudes. (2) The second example is concerned with denoising of ultrasound data sets.

FIGURE 29. Original image (top) and filter images: mean curvature flow (middle left); affine mean curvature flow (middle right); implicit regularization (bottom left); BV regularization (bottom right).
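The mean curvature flow used in the experiments above can be time-stepped explicitly via the standard curvature identity |∇u| ∇·(∇u/|∇u|) = (u_xx u_y² − 2 u_x u_y u_xy + u_yy u_x²)/(u_x² + u_y²). The following sketch regularizes the denominator with a small assumed ε and uses the distance-to-origin function as test data, whose circular level sets must shrink under the flow.

```python
import numpy as np

def mcf_step(u, tau=0.1, eps=1e-8):
    # one explicit step of u_t = |grad u| div(grad u / |grad u|)
    u0 = np.gradient(u, axis=0)
    u1 = np.gradient(u, axis=1)
    u00 = np.gradient(u0, axis=0)
    u11 = np.gradient(u1, axis=1)
    u01 = np.gradient(u0, axis=1)
    num = u00 * u1**2 - 2 * u0 * u1 * u01 + u11 * u0**2
    return u + tau * num / (u0**2 + u1**2 + eps)

i = np.arange(-50, 51)
X, Y = np.meshgrid(i, i)
u = np.sqrt(X**2 + Y**2).astype(float)   # level sets are circles
v = u.copy()
for _ in range(100):
    v = mcf_step(v)
# each circular level set shrinks with speed 1/radius, so sublevel sets lose area
```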
FIGURE 30. Original image (top) and filter images: mean curvature flow (middle left); affine mean curvature flow (middle right); implicit regularization (bottom left); BV regularization (bottom right).
From the numerical reconstructions one finds that mean curvature flow and implicit error regularization produce very similar results if the regularization parameter and the diffusion time are identified (Figure 30).
VIII. DISCRETE BV REGULARIZATION AND TUBE METHODS

So far we have presented regularization models in infinite dimensional settings (cf. Section III) and in a semi-infinite dimensional setting (cf. Section V). In this section we concentrate on completely discrete settings for bounded variation regularization.
The derivation of discrete variants is not straightforward. Several numerical realizations of discrete bounded variation regularization can be derived, some of which are outlined below. For a piecewise constant function g on [0, 1] of the form

g = g_i on Ω_i := ((i−1)/n, i/n],  i = 1, ..., n,  (59)

we define

T̃g := ((Tg)_i)_{i=0,...,n}, where

(Tg)_0 := g_1,
(Tg)_i := (g_i + g_{i+1})/2  for i = 1, ..., n−1,
(Tg)_n := g_n.

We call T̃g the traces of g. A piecewise constant function and its traces are plotted in Figure 31. Using these ingredients we are able to formulate two discrete variants of BV regularization. We restrict our attention to minimization of functionals over the set of piecewise constant functions

S := { u : u(x) = Σ_{i=1}^{n} c_i χ_i(x) with |c_i| < ∞ },  (60)

where χ_i denotes the characteristic function of the interval Ω_i.
FIGURE 31. A piecewise constant function with values g_i, i = 1, ..., n, and the traces (Tg)_i, i = 0, ..., n, symbolized by *.
Discrete bounded variation regularization functionals differ in the way the available discrete data are interpreted. Two possibilities are considered.

(1) The data can be interpreted as measurements of the traces (Tf)(i/n) of a BV-function f. This is a sampling problem. In typical sampling problems one interprets the data as the function values f(i/n). Since in our setting f may be discontinuous at i/n, and point evaluation is then not possible, we are forced to use trace evaluation. This leads us to consider minimization of the functional

TBV_d(u) = (1/(2(n+1))) Σ_{i=0}^{n} |(Tu)_i − (Tf)_i|² + α ∫₀¹ |u_x|  (61)

over S, where (Tf)_i is the given data at i/n.

(2) Alternatively to assuming sampled data, one can interpret the data as values of a piecewise constant function

f = Σ_{i=1}^{n} f_i χ_i.

Given measurement data f_i, i = 1, ..., n, this suggests minimization of the functional

TBV_{d2}(u) := (1/(2n)) Σ_{i=1}^{n} (c_i − f_i)² + α ∫₀¹ |u_x| = (1/(2n)) Σ_{i=1}^{n} (c_i − f_i)² + α Σ_{i=1}^{n−1} |c_{i+1} − c_i|  (62)

over S. These two possibilities will be utilized below.
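One simple way (an assumed approach, not the text's method) to minimize the finite-volume functional (62) numerically is lagged diffusivity: smooth each term |c_{i+1} − c_i| to √((c_{i+1} − c_i)² + ε²) and solve the resulting reweighted quadratic problems, each of which majorizes the smoothed functional, so the objective value never increases. Data, α, and ε are illustrative.

```python
import numpy as np

n = 100
rng = np.random.default_rng(4)
clean = np.where(np.arange(n) < n // 2, 0.0, 1.0)
f = clean + 0.05 * rng.standard_normal(n)

alpha, eps = 1e-3, 1e-6
D = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]        # forward differences, shape (n-1, n)

def objective(c):                                # smoothed version of (62)
    return (np.sum((c - f) ** 2) / (2 * n)
            + alpha * np.sum(np.sqrt((D @ c) ** 2 + eps**2)))

c = f.copy()
objs = [objective(c)]
for _ in range(50):
    w = 1.0 / np.sqrt((D @ c) ** 2 + eps**2)    # lagged TV weights
    A = np.eye(n) / n + alpha * (D.T * w) @ D   # normal equations of the surrogate
    c = np.linalg.solve(A, f / n)
    objs.append(objective(c))
```

The result c is close to piecewise constant: the noise in the two plateaus is flattened while the unit jump survives.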
A. Discrete BV Regularization (Sampling)

The functional TBV_d is well posed, i.e., there exists a unique minimizer in S (see [95]). To further analyze properties of the minimizer u_α = Σ_{i=1}^{n} u_i χ_i of TBV_d it is instructive to study the optimality condition for the coefficients
DENOISING AND INVERSE PROBLEMS
Setting $\beta := 4\alpha n$, the coefficients $u_i$, $i = 1, \ldots, n$, of the minimizer satisfy the set-valued equations

$$(u_{i-1}+u_i) + (u_i+u_{i+1}) + \beta\left(\frac{u_i-u_{i-1}}{|u_i-u_{i-1}|} + \frac{u_i-u_{i+1}}{|u_i-u_{i+1}|}\right) \ni (f_{i-1}+f_i) + (f_i+f_{i+1}) \quad \text{for } i = 2, \ldots, n-1,$$

$$5u_1 + u_2 + \beta\,\frac{u_1-u_2}{|u_1-u_2|} \ni 5f_1 + f_2, \qquad 5u_n + u_{n-1} + \beta\,\frac{u_n-u_{n-1}}{|u_n-u_{n-1}|} \ni 5f_n + f_{n-1}, \tag{63}$$

where we use the abbreviation

$$\frac{e}{|e|} = \begin{cases} \{1\} & \text{if } e > 0, \\ \{-1\} & \text{if } e < 0, \\ [-1,1] & \text{if } e = 0, \end{cases}$$

which is the subgradient of $|e|$. We observe from (63) that for $j = 1, \ldots, n-1$

$$(u_j+u_{j+1}) + 2\sum_{i=1}^{j-1}(u_i+u_{i+1}) + 4u_1 + \beta\,\frac{u_j-u_{j+1}}{|u_j-u_{j+1}|} \ni (f_j+f_{j+1}) + 2\sum_{i=1}^{j-1}(f_i+f_{i+1}) + 4f_1 \tag{64}$$

and

$$4u_n + 2\sum_{i=1}^{n-1}(u_i+u_{i+1}) + 4u_1 = 4f_n + 2\sum_{i=1}^{n-1}(f_i+f_{i+1}) + 4f_1. \tag{65}$$

Let

$$\hat{F}_u(j/n) := (u_j+u_{j+1}) + 2\sum_{i=1}^{j-1}(u_i+u_{i+1}) + 4u_1, \quad j = 1, \ldots, n-1, \qquad \hat{F}_u(1) := 4u_n + 2\sum_{i=1}^{n-1}(u_i+u_{i+1}) + 4u_1. \tag{66}$$

Moreover, let $\hat{F}_f^+$, $\hat{F}_f^-$ be linear splines with respect to the nodes $\{j/n : j = 0, \ldots, n\}$ interpolating

$$\hat{F}_f^\pm(0) = 0, \qquad \hat{F}_f^\pm(j/n) = \hat{F}_f(j/n) \pm \beta, \quad j = 1, \ldots, n-1, \qquad \hat{F}_f^\pm(1) = \hat{F}_f(1). \tag{67}$$

The region between the two linear splines is referred to as the tube $\hat{T}$. $\hat{F}_f^-$ and $\hat{F}_f^+$ mark the lower and upper bounds of $\hat{T}$. Since

$$\frac{u_j - u_{j-1}}{|u_j - u_{j-1}|} \subseteq [-1, 1],$$

we find from (64) and (65) that

$$\hat{F}_u \in \hat{T}. \tag{68}$$

In other words, the antiderivative³ $\hat{F}_u$ of the minimizer $u$ of the discrete bounded variation regularization formulation is in the tube $\hat{T}$. We therefore refer to (61) as a tube method.
³ We refer to $\int_0^t f(s)\,ds$ as the antiderivative of the function $f$.

B. Finite Volume BV Regularization

Minimization of $T_{BVd2}$ as defined in (62) over $S$ is a standard method of formulating discrete bounded variation regularization (cf. Mallat [108]). Note that in this case the data $f_i$ are interpreted as coefficients of a piecewise constant function, while in Section VIII.A the data $(Tf)_i$ are interpreted as sampling data. Since the first term in $T_{BVd2}$ is strictly convex, it is immediate that the functional $T_{BVd2}$ has a unique minimizer. To specify the optimality criteria for a minimizer of the functional $T_{BVd2}$ let $\beta = \alpha n$. Then the minimizer $u$ of $T_{BVd2}$ can be represented as $u := \sum_{i=1}^{n} u_i \chi_i$ with $u_i$ satisfying

$$u_i + \beta\left(\frac{u_i-u_{i-1}}{|u_i-u_{i-1}|} + \frac{u_i-u_{i+1}}{|u_i-u_{i+1}|}\right) \ni f_i \quad \text{for } i = 2, \ldots, n-1,$$

$$u_1 + \beta\,\frac{u_1-u_2}{|u_1-u_2|} \ni f_1, \qquad u_n + \beta\,\frac{u_n-u_{n-1}}{|u_n-u_{n-1}|} \ni f_n. \tag{69}$$
Let

$$F_f(0) = 0, \qquad F_f(j/n) = \sum_{k=1}^{j} f_k \quad \text{for } j = 1, \ldots, n,$$

$$F_f^\pm(0) = 0, \qquad F_f^\pm(j/n) = F_f(j/n) \pm \beta \quad \text{for } j = 1, \ldots, n-1, \qquad F_f^\pm(1) = F_f(1). \tag{70}$$
With $f$ we associate the tube $T$ bounded by the linear splines $F_f^\pm$ connecting the values $F_f^\pm(j/n)$. Thus the minimizer $u$ has the property that its antiderivative $F_u$ satisfies $F_u \in T$, and so finite volume BV regularization is a tube method as well.

C. The Taut String Algorithm

In this section we recall the taut string algorithm (see [57,109]) for denoising discrete one-dimensional data. We choose a description which allows generalization to higher-dimensional data. $T$ denotes the tube from the previous section. The taut string ''algorithm'' is actually the solution to a minimization problem, which we specify next.

Algorithm VIII.1. Let $v = \sum_{i=1}^{n} v_i \chi_i$ with antiderivative $V$ denote the solution to

$$\frac{1}{n} \sum_{i=1}^{n} \sqrt{1+v_i^2} = \int_0^1 \sqrt{1+|V'(\lambda)|^2}\, d\lambda \to \min \tag{71}$$

over all continuous and piecewise linear functions $V$ on $[0,1]$ with function values in $T$.
Physically speaking, $V$ is a string of minimal length contained in the tube $T$, connecting $(0,0)$ and $(1, F_f(1))$; i.e., it is taut. In particular, in regions between two contact points of $V$ with the boundary of $T$, the taut string is affine linear, and $v$ is a piecewise constant function. The values $v_i$ approximate the input data $f_i$, and $v$ constitutes a denoised approximation to $f$. In [57] an algorithm for computing the solution to (71) was presented which proceeds iteratively from one nodal value of the tube to the next. The solution method that we shall propose is completely different. The taut string, as determined from Algorithm VIII.1, and the finite volume BV-regularized solution with $\alpha = \beta/n$ are both contained in the same tube $T$. This in particular shows that the graph of the finite volume BV-regularized solution is at least as long as that of the taut string solution. Finite volume BV regularization and the taut string algorithm share the property that they preserve homogeneous regions of the original data. This is easily seen for the taut string algorithm: in a flat region of the original data $f$ the function $F_f$ is linear, and consequently the taut string is linear in this region too, showing that the flat regions of the filtered data (i.e., of the derivative of the taut string) either correspond with those of the input data or are enlarged. For finite volume BV regularization (as well as other methods) this statement was addressed with rigor in [128].
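The tube construction and the string computation can be sketched numerically. The following Python sketch is ours, not the nodal algorithm of [57]: it relies on the known property that the taut string minimizes every convex function of its increments over the tube, so minimizing the sum of squared increments subject to the tube bounds, a smooth box-constrained problem handled by SciPy's L-BFGS-B, yields the same string. The tube follows (70), with the endpoints pinned so that the string connects $(0,0)$ and $(1, F_f(1))$.

```python
import numpy as np
from scipy.optimize import minimize

def taut_string(f, beta):
    """Tube-constrained denoising of f_1, ..., f_n: build the tube around the
    cumulative sums F_f(j) = f_1 + ... + f_j (pinched at both endpoints),
    find the taut string inside it, and return the string's increments."""
    f = np.asarray(f, dtype=float)
    F = np.concatenate(([0.0], np.cumsum(f)))   # nodes F_f(0), ..., F_f(n)
    lo, hi = F - beta, F + beta
    lo[0] = hi[0] = F[0]                        # endpoints are fixed, so the
    lo[-1] = hi[-1] = F[-1]                     # tube is pinched there

    def length_proxy(w):                        # sum of squared increments
        d = np.diff(w)
        return np.sum(d * d)

    def grad(w):
        g = np.zeros_like(w)
        d = np.diff(w)
        g[:-1] -= 2.0 * d
        g[1:] += 2.0 * d
        return g

    res = minimize(length_proxy, F, jac=grad, method="L-BFGS-B",
                   bounds=list(zip(lo, hi)))
    return np.diff(res.x)   # denoised cell values v_1, ..., v_n
```

Because the endpoints of the string are fixed, the sum of the reconstructed values equals the sum of the data, illustrating the mean-preservation property discussed below.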
D. Multidimensional Discrete BV Regularization

In this section we present multidimensional analogs of sampling and finite volume BV regularization. Moreover, we propose a multidimensional analog of the taut string algorithm. Let $\Omega = (0,1) \times (0,1)$ and let $f$ be piecewise constant with respect to the cells $\Omega_{ij} = \Omega_i \times \Omega_j$. To introduce sampling BV regularization in $\mathbb{R}^2$ we proceed as in Section VIII.A and model the fit-to-data term as in (61) as the sum of both components, in the $x_1$ and $x_2$ directions separately. For $u = \sum_{i,j=1}^{n} c_{ij} \chi_{ij}$ the BV sampling method consists in minimization of the functional

$$T_{BVd2s}(u) := \frac{1}{2n} \sum_{i=1}^{n-1}\sum_{j=1}^{n-1} \left(\frac{c_{i+1,j}+c_{i,j}}{2} - \frac{f_{i+1,j}+f_{i,j}}{2}\right)^2 + \frac{1}{2n} \sum_{i=1}^{n-1}\sum_{j=1}^{n-1} \left(\frac{c_{i,j}+c_{i,j+1}}{2} - \frac{f_{i,j}+f_{i,j+1}}{2}\right)^2 + \alpha \int_\Omega |\nabla u| \tag{72}$$
over

$$S := \left\{ u : u = \sum_{i,j=1}^{n} c_{ij} \chi_{ij} \right\}.$$

A multidimensional analog of finite volume BV regularization consists in minimizing the functional

$$T_{BVd2f}(u) := \frac{1}{2n^2} \sum_{i=1}^{n}\sum_{j=1}^{n} |c_{ij} - f_{ij}|^2 + \alpha \int_\Omega |\nabla u| \tag{73}$$

over $S$. To propose an extension of the taut string algorithm to the case of two-dimensional data it will be useful to reconsider the taut string algorithm in the following form:

(1) Integration of $f_i$, $i = 1, \ldots, n$, gives a linear spline $F_f$.
(2) Determination of the taut string $F_u$ in the tube between $F_f - \beta$ and $F_f + \beta$.
(3) Differentiation of $F_u$ to obtain the reconstruction for $f$.

Generalization to higher dimensions is impeded by the fact that there is no obvious analog for integration step 1 above. To overcome this difficulty, we proceed by introducing an appropriate potential equation and consider the one-dimensional case first. Given $f$, we define $\varphi$ as a solution to

$$\varphi_{xx} = f \quad \text{on } \Omega, \qquad \varphi_x = 0 \quad \text{on } \{0,1\},$$

and set $F_f = \varphi_x$. This replaces step 1 above. Step 2 can then be realized by solving the contact problem:

(1) Find a function $w \in BV(0,1)$ satisfying $w(0) = F_f(0)$ and $w(1) = F_f(1)$ (in the sense of traces) that minimizes

$$\int_0^1 \sqrt{1 + w_x^2} \tag{74}$$

subject to the constraint $F_f - \beta \le w \le F_f + \beta$.
(2) Set $u = w_x$.

This approach can be generalized to higher dimensions in a straightforward way:

(1) Solve

$$\Delta \varphi = f \quad \text{in } \Omega, \qquad \frac{\partial \varphi}{\partial \nu} = 0 \quad \text{on } \partial\Omega,$$

and define $F_f = \nabla \varphi$.

(2) In $\mathbb{R}^2$ find two functions $w_i \in BV(\Omega)$, $i = 1, 2$, minimizing

$$\int_\Omega \sqrt{1 + |\nabla w_i|^2} \tag{75}$$

subject to the constraints

$$(F_f)_i - \beta \le w_i \le (F_f)_i + \beta, \quad i = 1, 2, \tag{76}$$

and $w = F_f$ on $\partial\Omega$. This is a contact problem for finding a ''minimal surface'' in the layer bounded by $(F_f)_i - \beta$ and $(F_f)_i + \beta$.

(3) Set $u = \nabla \cdot w$.

The choice of proper boundary conditions for $\varphi$ and $w$ is not obvious. We tested alternatives to our choice and found that they have no significant influence on the numerical reconstruction. In the one-dimensional case we could have chosen the boundary conditions in such a way that they are consistent with Algorithm VIII.1. This choice, however, has no clear multidimensional analog.

E. Numerical Test Examples

The practical realization of the taut string algorithm in $\mathbb{R}^2$ requires the solution of the bilateral obstacle problem (cf. Algorithm VIII.1), which can be solved efficiently using active set strategies. The particular implementation has been considered in [95], where general references on active set strategies and their numerical implementation can also be found. In the following we present some numerical simulations with this method.

1. One-Dimensional Test Example

We consider the function

$$f(x) := \begin{cases} 1 & \text{in } [0, 1/4), \\ 2 & \text{in } (1/4, 1/2), \\ 4 - 4x & \text{in } (1/2, 3/4), \\ 1 & \text{in } (3/4, 1]. \end{cases}$$
In Figure 32 we display the results of several test runs for the one-dimensional example with three different absolute noise levels, i.e., $\delta_1 = 0$, $\delta_2 = 0.1$, and $\delta_3 = 0.5$ from left to right. The respective regularization parameters are $\alpha_1 = 1.0 \times 10^{-5}$, $\alpha_2 = 5.0 \times 10^{-3}$, and $\alpha_3 = 2.5 \times 10^{-2}$. In our tests, these values produce the best reconstructions. In Figure 32, column $i$ ($i = 1, 2, 3$) corresponds to a test run with $(\delta_i, \alpha_i)$. The first row in Figure 32 shows the data, the second presents the reconstructions. In the third row we plot the string $w$ (solid) together with its bounds (dashed).

FIGURE 32. One-dimensional tests. Column $i$ ($i = 1, 2, 3$) corresponds to $(\delta_i, \alpha_i)$.

In Figure 33 we study the effect of $\alpha$. We have selected values $\lambda_i$ which are larger than the values in Figure 32: $\lambda_1 = 1.0 \times 10^{-2}$, $\lambda_2 = 1.0 \times 10^{-2}$, and $\lambda_3 = 7.5 \times 10^{-2}$. Since we loosen the barriers of the tube, the string becomes flatter.

FIGURE 33. One-dimensional tests. Column $i$ ($i = 1, 2, 3$) corresponds to $(\delta_i, \lambda_i)$.

We next summarize some of the features observed for denoising of one-dimensional images with the taut string algorithm. They are similar to those obtained by nonlinear BV-regularized reconstructions.

(1) The mean value of the registered image intensity is preserved by the filtering method. This important feature of diffusion filtering methods (cf. [172]) and nonlinear regularization models (cf. [148]) does not hold, for instance, for discrete morphological filters such as the median filter.
(2) Spurious noise is removed.
(3) Edges are preserved.
(4) The taut string algorithm produces images which are damped in height. The magnitude of damping is comparable to that observed for BV-regularized solutions.

2. Two-Dimensional Benchmark Problem

Here the solution of the bilateral contact problem is shown for the benchmark image in Figure 34 (upper left).
IX. WAVELET SHRINKAGE

In this section we review the interactions of wavelet filtering, diffusion filtering, and variational methods. For this purpose it is convenient to briefly review orthonormal wavelets.
FIGURE 34. Exact data and noisy data in the first row, reconstruction in the second row.
A. Daubechies' Wavelets

We review Daubechies' construction of orthonormal wavelets (see [56]). The construction is based on the existence of a scaling function $\phi$, such that for $m \in \mathbb{Z}$ the functions

$$\phi_{m,k} := 2^{-m/2}\, \phi(2^{-m} x - k), \quad k \in \mathbb{Z},$$

are orthonormal with respect to the norm on $L^2(\mathbb{R})$. Moreover, $\phi$ is chosen in such a way that for $m \in \mathbb{Z}$ the spaces

$$V_m := \operatorname{span}\{\phi_{m,k} : k \in \mathbb{Z}\} := \left\{ \sum_{k\in\mathbb{Z}} a_k \phi_{m,k} : a_k \ne 0 \text{ for only finitely many } k \in \mathbb{Z} \right\}$$

form a multiresolution analysis on $L^2(\mathbb{R})$, that is,

$$V_m \subseteq V_{m-1}, \quad m \in \mathbb{Z}, \qquad \bigcap_{m\in\mathbb{Z}} V_m = \{0\} \qquad \text{and} \qquad \overline{\bigcup_{m\in\mathbb{Z}} V_m} = L^2(\mathbb{R}).$$
The wavelet spaces $W_m$ are the orthogonal complements of $V_m$ in $V_{m-1}$, that is,

$$W_m := V_m^\perp \cap V_{m-1}.$$

The mother wavelet $\psi$ is chosen such that the functions

$$\psi_{m,k} := 2^{-m/2}\, \psi(2^{-m} x - k), \quad k \in \mathbb{Z},$$

form an orthonormal basis of $W_m$. Since $\phi = \phi_{0,0} \in V_0 \subseteq V_{-1}$, the scaling function must satisfy the dilation equation

$$\phi(x) = \sum_{k\in\mathbb{Z}} h_k\, \phi(2x - k), \tag{77}$$

where the sequence $\{h_k\}$ is known as the filter sequence of the wavelet

$$\psi(x) = \sum_{k\in\mathbb{Z}} (-1)^k h_{1-k}\, \phi(2x - k). \tag{78}$$

The filter coefficients have to satisfy certain conditions in order to guarantee that the scaling function $\phi$ and the wavelet $\psi$ fulfill certain properties. In the orthogonal wavelet theory due to Daubechies the desired properties on the scaling functions and wavelets are:

(1) For a fixed integer $N \ge 1$ the scaling function $\phi$ has support in the interval $[1-N, N]$. This, in particular, holds when the filter coefficients satisfy

$$h_k = 0 \quad \text{for } k < 1-N \text{ and for } k > N. \tag{79}$$

(2) The existence of a scaling function satisfying (77) requires that

$$\sum_{k\in\mathbb{Z}} h_k = 2. \tag{80}$$

(3) In order to impose orthonormality of the integer translates of the scaling function $\phi$, that is $\int_{\mathbb{R}} \phi(x-l)\phi(x)\,dx = \delta_{0,l}$, the filter coefficients $\{h_k\}$ have to satisfy

$$\sum_{k\in\mathbb{Z}} h_k h_{k-2l} = 2\delta_{0,l}, \quad l = 0, \ldots, N-1. \tag{81}$$

(4) The wavelet $\psi$ is postulated to have $N$ vanishing moments, that is,

$$\int_{\mathbb{R}} x^l \psi(x)\,dx = 0, \quad l = 0, \ldots, N-1, \tag{82}$$

which requires the filter sequence to satisfy

$$\sum_{k\in\mathbb{Z}} (-1)^k h_{1-k}\, k^l = 0, \quad l = 0, \ldots, N-1. \tag{83}$$
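Conditions (80), (81), and (83) are easy to verify numerically. A short check (our illustration) for $N = 2$, whose filter $h_{-1}, \ldots, h_2$ is the Daubechies-2 filter, written here with the normalization $\sum_k h_k = 2$ used in (80):

```python
import math

s3 = math.sqrt(3.0)
# Daubechies N = 2 filter, supported on k = -1, 0, 1, 2 as in (79),
# normalized so that the coefficients sum to 2 as in (80)
ks = [-1, 0, 1, 2]
h = {-1: (1 + s3) / 4, 0: (3 + s3) / 4, 1: (3 - s3) / 4, 2: (1 - s3) / 4}

# (80): sum of the filter coefficients equals 2
assert abs(sum(h.values()) - 2.0) < 1e-12

# (81): orthonormality of integer translates, l = 0, 1
for l in range(2):
    s = sum(h[k] * h[k - 2 * l] for k in ks if (k - 2 * l) in h)
    assert abs(s - (2.0 if l == 0 else 0.0)) < 1e-12

# (83): N = 2 vanishing moments, l = 0, 1
for l in range(2):
    s = sum((-1) ** k * h[1 - k] * k ** l for k in ks if (1 - k) in h)
    assert abs(s) < 1e-12

print("Daubechies-2 filter satisfies (80), (81), and (83)")
```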
The oldest wavelet is the Haar wavelet, where $h_0 = h_1 = 1$. In this case the scaling function is

$$\phi(x) = \begin{cases} 1 & \text{for } x \in [0,1], \\ 0 & \text{otherwise.} \end{cases}$$

The wavelet function $\psi$ is given accordingly by

$$\psi(x) = \begin{cases} 1 & \text{for } x \in [0, 1/2), \\ -1 & \text{for } x \in [1/2, 1], \\ 0 & \text{otherwise.} \end{cases}$$
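With the Haar pair, one analysis/synthesis step amounts to averaging and differencing of neighboring samples. A minimal sketch (function names are ours), normalized so that the transform is orthonormal:

```python
import numpy as np

def haar_step(x):
    """One level of the orthonormal Haar transform: scaling (V-part) and
    wavelet (W-part) coefficients; the length of x must be even."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # scaling coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # wavelet coefficients
    return a, d

def haar_inverse(a, d):
    """Exact inverse of haar_step (perfect reconstruction)."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2.0)
    x[1::2] = (a - d) / np.sqrt(2.0)
    return x
```

Orthonormality shows up numerically as energy conservation: the squared norms of `a` and `d` together equal the squared norm of the input.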
Since the functions $\psi_{m,k}$ form an orthonormal basis of $L^2(\mathbb{R})$, any function $f \in L^2(\mathbb{R})$ can be expanded in terms of this basis:

$$f(x) = \sum_{j,k\in\mathbb{Z}} f_{j,k}\, \psi_{j,k}(x).$$

Orthogonal wavelets on $L^2(\mathbb{R})$ form the basis to construct wavelets on compact intervals [50,113] (for a summary of this topic we also refer the reader to [49]). A family of orthonormal scaling functions and wavelets on multidimensional domains can be constructed from products of one-dimensional scaling and wavelet functions:

$$\phi_{\tilde j,\tilde k}(x_1, x_2) = \phi_{j_1,k_1}(x_1)\,\phi_{j_2,k_2}(x_2),$$
$$\psi^1_{\tilde j,\tilde k}(x_1, x_2) = \phi_{j_1,k_1}(x_1)\,\psi_{j_2,k_2}(x_2),$$
$$\psi^2_{\tilde j,\tilde k}(x_1, x_2) = \psi_{j_1,k_1}(x_1)\,\phi_{j_2,k_2}(x_2),$$
$$\psi^3_{\tilde j,\tilde k}(x_1, x_2) = \psi_{j_1,k_1}(x_1)\,\psi_{j_2,k_2}(x_2),$$

where we use the convention that $\tilde j = (j_1, j_2)$, $\tilde k = (k_1, k_2)$. The functions $\phi_{\tilde j,\tilde k}$ are called multidimensional scaling functions and the functions $\psi^i_{\tilde j,\tilde k}$, $i = 1, 2, 3$, are called multidimensional wavelets.
B. Denoising by Wavelet Shrinkage

Donoho and Johnstone [63] introduced a wavelet-based denoising algorithm, the so-called wavelet shrinkage algorithm. This algorithm consists in calculating the wavelet expansion (see, e.g., [56])

$$u^\delta(x_1, x_2) = \sum_{i=1}^{3} \sum_{(\tilde j,\tilde k)\in\mathbb{Z}^2\times\mathbb{Z}^2} u^{\delta,i}_{\tilde j,\tilde k}\, \psi^i_{\tilde j,\tilde k}(x_1, x_2)$$

of the input data and manipulating its coefficients; to be precise, the coefficients $u^{\delta,i}_{\tilde j,\tilde k}$ are approximated by $S_\alpha(u^{\delta,i}_{\tilde j,\tilde k})$, with

$$S_\alpha(t) = \begin{cases} t - \alpha & \text{if } t > \alpha, \\ 0 & \text{if } |t| \le \alpha, \\ t + \alpha & \text{if } t < -\alpha. \end{cases}$$
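The shrinkage function $S_\alpha$ is the classical soft threshold. A short sketch (the name `soft_shrink` is ours), together with a brute-force check of the variational characterization derived below in connection with (84)–(86), namely that $S_\alpha(d)$ minimizes $c \mapsto \tfrac12 (c-d)^2 + \alpha|c|$:

```python
import numpy as np

def soft_shrink(t, alpha):
    """S_alpha: shift toward zero by alpha, clip to zero inside [-alpha, alpha]."""
    return np.sign(t) * np.maximum(np.abs(t) - alpha, 0.0)

# brute-force check: S_alpha(d) minimizes  0.5*(c - d)**2 + alpha*|c|
d, alpha = 1.7, 0.5
grid = np.linspace(-4.0, 4.0, 80001)
c_star = grid[np.argmin(0.5 * (grid - d) ** 2 + alpha * np.abs(grid))]
assert abs(c_star - soft_shrink(d, alpha)) < 1e-3
```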
Figure 35 shows the wavelet denoising algorithm with the Daubechies-2 wavelet (with four coefficients $h_{-1}, h_0, h_1, h_2$) and the Canny edge detector applied to the filtered data (cf. Figure 36). The method of wavelet shrinkage has received considerable attention in the literature (see, e.g., [40,42,63,64,114]) and has been applied to the solution of many practically important problems.

1. Relation to Diffusion Filtering

In [40,42] the relation between wavelet shrinkage and regularization methods on Besov spaces has been established. For the sake of simplicity of presentation we assume that the image data $u^\delta$ is available on $\mathbb{R}^2$. This is not quite consistent with the overall presentation, where we assumed image data on the bounded domain $\Omega = (0,1) \times (0,1)$. In principle one can proceed as outlined below if, instead of wavelets, periodic wavelets are used. The Besov space $B^1_1(L^1(\mathbb{R}^2))$ can be characterized as follows (note that this is not the standard definition): $f \in B^1_1(L^1(\mathbb{R}^2))$ if and only if

$$\Phi(f) = \sum_{i=1}^{3} \sum_{(\tilde j,\tilde k)\in\mathbb{Z}^2\times\mathbb{Z}^2} \left| f^i_{\tilde j,\tilde k} \right| < \infty \qquad \text{with} \qquad f^i_{\tilde j,\tilde k} = \int_{\mathbb{R}^2} f\, \psi^i_{\tilde j,\tilde k}.$$

Here the $\psi^i_{\tilde j,\tilde k}$ are smooth, orthonormal wavelet functions, and thus the $f^i_{\tilde j,\tilde k}$ denote the wavelet coefficients of the function $f$.
FIGURE 35. Wavelet shrinkage with $\alpha = 0, 1, 5, 10, 50, 100$.
Formally, the derivative of $\Phi(f)$ can be calculated as follows. Let $h \in B^1_1(L^1(\mathbb{R}^2)) \cap L^2(\mathbb{R}^2)$; then the derivative of $\Phi(f)$ in direction $h$ is given by

$$\partial\Phi(f)(h) = \lim_{t\to 0} \frac{\Phi(f+th) - \Phi(f)}{t} = \sum_{i=1}^{3} \sum_{(\tilde j,\tilde k)\in\mathbb{Z}^2\times\mathbb{Z}^2} \frac{f^i_{\tilde j,\tilde k}}{\left| f^i_{\tilde j,\tilde k} \right|} \int_{\mathbb{R}^2} h\, \psi^i_{\tilde j,\tilde k},$$

which in turn implies that

$$\partial\Phi(f) = \sum_{i=1}^{3} \sum_{(\tilde j,\tilde k)\in\mathbb{Z}^2\times\mathbb{Z}^2} \frac{f^i_{\tilde j,\tilde k}}{\left| f^i_{\tilde j,\tilde k} \right|}\, \psi^i_{\tilde j,\tilde k},$$
FIGURE 36. Canny's edge detector applied to the wavelet shrinkage data with $\alpha = 0, 1, 5, 10, 50, 100$.
where, of course, the meaning of $f^i_{\tilde j,\tilde k}/|f^i_{\tilde j,\tilde k}|$ is set-valued, as explained in Section VIII.A.

Thus the wavelet coefficients $u^i_{\tilde j,\tilde k}$ of the minimizer $u$ of the regularization functional

$$\frac{1}{2} \int_{\mathbb{R}^2} (u - u^\delta)^2 + \alpha\, \Phi(u) \tag{84}$$

satisfy the optimality condition

$$\sum_{i=1}^{3} \sum_{(\tilde j,\tilde k)\in\mathbb{Z}^2\times\mathbb{Z}^2} \left( u^i_{\tilde j,\tilde k} - u^{\delta,i}_{\tilde j,\tilde k} \right) \psi^i_{\tilde j,\tilde k} + \alpha \sum_{i=1}^{3} \sum_{(\tilde j,\tilde k)\in\mathbb{Z}^2\times\mathbb{Z}^2} \frac{u^i_{\tilde j,\tilde k}}{\left| u^i_{\tilde j,\tilde k} \right|}\, \psi^i_{\tilde j,\tilde k} \ni 0.$$
Since the functions $\psi^i_{\tilde j,\tilde k}$ are orthonormal, we find that

$$u^i_{\tilde j,\tilde k} - u^{\delta,i}_{\tilde j,\tilde k} + \alpha\, \frac{u^i_{\tilde j,\tilde k}}{\left| u^i_{\tilde j,\tilde k} \right|} \ni 0 \quad \text{for all } (\tilde j, \tilde k) \in \mathbb{Z}^2 \times \mathbb{Z}^2, \ i = 1, 2, 3. \tag{85}$$

This shows that

$$u^{\delta,i}_{\tilde j,\tilde k} > \alpha \ \text{ if } u^i_{\tilde j,\tilde k} > 0, \qquad u^{\delta,i}_{\tilde j,\tilde k} < -\alpha \ \text{ if } u^i_{\tilde j,\tilde k} < 0, \qquad \left| u^{\delta,i}_{\tilde j,\tilde k} \right| \le \alpha \ \text{ if } u^i_{\tilde j,\tilde k} = 0.$$

Consequently from (85) it follows that

$$u^i_{\tilde j,\tilde k} = \begin{cases} u^{\delta,i}_{\tilde j,\tilde k} - \alpha & \text{if } u^{\delta,i}_{\tilde j,\tilde k} > \alpha, \\ 0 & \text{if } \left| u^{\delta,i}_{\tilde j,\tilde k} \right| \le \alpha, \\ u^{\delta,i}_{\tilde j,\tilde k} + \alpha & \text{if } u^{\delta,i}_{\tilde j,\tilde k} < -\alpha. \end{cases} \tag{86}$$

This shows that Besov space regularization is Donoho's wavelet shrinkage algorithm. Proceeding as in Section III with the regularization technique (84), the associated diffusion filtering is

$$\frac{\partial u}{\partial t} + \partial\Phi(u) \ni 0 \quad \text{for } t > 0, \qquad u(0) = u^\delta.$$

X. REGULARIZATION AND STATISTICS

There has been considerable interest in incorporating statistical a priori information in regularization techniques. In this section we outline the basic principle. There are several publications in the literature devoted to this topic; an extremely useful overview article is [89], where adequate references can also be found. For an elementary introduction to statistics we refer the reader to [98]. Let

$$u^\delta(x_i) = u(x_i) + n(x_i)$$

be the measured image intensity at the pixel $x_i$, i.e., data $u$ degraded with noise $n$. In the stochastic framework the
intensities $u^\delta(x_i)$ and $n(x_i)$ are considered registered intensities of random variables $U$ and $N$. We denote by $P(\{U < u\})$ and $P(\{N < n\})$ the probabilities that the random variables $U$ and $N$ are less than $u$, $n$, respectively. The probability density functions are accordingly denoted by

$$P(\{U = u\}) = \lim_{du\to 0^+} \frac{P(\{U \in [u, u+du)\})}{du}, \qquad P(\{N = n\}) = \lim_{dn\to 0^+} \frac{P(\{N \in [n, n+dn)\})}{dn}.$$

The notation $P(\{U = u\})$, $P(\{N = n\})$ is typically used in the case of discrete random variables. We find it instructive to use this notation for continuous random variables too. The goal is to recover the image intensity $u$ such that the conditional probability

$$P\bigl(\{U = u\} \cap \{N = u^\delta - u\}\bigr) = P(\{U = u\})\, P(\{N = u^\delta - u\}) \tag{87}$$

is maximized with respect to $u$. Note that the last identity requires that the random variables $U$ and $N$ are independent. If the noise is normally distributed with mean value zero and variance $\sigma^2$, then

$$P(\{N = u^\delta - u\}) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(u^\delta - u)^2/(2\sigma^2)}. \tag{88}$$

It is convenient to set

$$P(\{U = u\}) := e^{-F(u)}, \tag{89}$$

where $F$ is a nonnegative function. In this case maximization of (87) is equivalent to minimizing

$$F(u)(x_i) + \frac{1}{2\sigma^2}\, (u^\delta(x_i) - u(x_i))^2. \tag{90}$$

For image processing applications it is necessary to take into account neighborhood relations of image intensities between pixels. This can, for instance, be achieved by using a nonnegative probabilistic model $F$ which depends on gradient approximations of the image intensity.
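For a single intensity value, the MAP principle (90) can be checked against a closed form: with Gaussian noise (88) and the Gaussian-type prior $F(u) = \beta u^2$ (a toy zero-dimensional stand-in for $\beta|\nabla u|^2$, our illustration), the minimizer of $\beta u^2 + (u^\delta - u)^2/(2\sigma^2)$ is $u^* = u^\delta / (1 + 2\beta\sigma^2)$. A quick numerical confirmation:

```python
import numpy as np

beta, sigma, u_delta = 2.0, 0.5, 3.0

def neg_log_posterior(u):
    # F(u) + (u_delta - u)^2 / (2 sigma^2), with the toy prior F(u) = beta * u^2
    return beta * u ** 2 + (u_delta - u) ** 2 / (2 * sigma ** 2)

grid = np.linspace(-1.0, 4.0, 500001)
u_map = grid[np.argmin(neg_log_posterior(grid))]
u_closed = u_delta / (1 + 2 * beta * sigma ** 2)
assert abs(u_map - u_closed) < 1e-4
```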
In order to minimize (90) for all pixel values $x_i$, $i \in I$, we minimize the functional

$$\sum_{i\in I} \left( F(u)(x_i) + \frac{1}{2\sigma^2}\, (u^\delta(x_i) - u(x_i))^2 \right).$$

To realize the interaction with Tikhonov-type regularization models it is convenient to note that the sum is a quadrature rule approximation of the integral

$$\int_\Omega \left( 2\sigma^2 F(u)(x) + (u^\delta(x) - u(x))^2 \right) dx.$$

Using $F(u) = \beta|\nabla u|^2$, with $\beta > 0$, the stochastic approach is equivalent to Tikhonov regularization with regularization parameter $\alpha = 2\beta\sigma^2$; $F(u) = \beta|\nabla u|$ gives bounded variation regularization; $F(u) = \beta|\nabla u| \log|\nabla u|$ gives entropy regularization. For Tikhonov regularization, $F(u) = \beta|\nabla u|^2$, the associated probability density function

$$P(\{U = u\}) = e^{-F(u)} = e^{-\beta|\nabla u|^2}$$

is large in regions where $u$ is almost constant, and small in regions of high oscillations. Or, in other words, the image intensity $u$ is considered to be reliable if the gradient is low.

In establishing the link between stochastic models and Tikhonov-type regularization we assumed Gaussian white noise, (88) and (89). Following the derivation above, minimization principles become much more complicated if we drop the assumption of Gaussian white noise. To highlight the arising complications we consider, as an example, Rayleigh distributed noise, that is,

$$P(\{N = u^\delta - u\}) = \frac{|u^\delta - u|}{\sigma^2}\, e^{-(u^\delta - u)^2/(2\sigma^2)}. \tag{91}$$

Proceeding as above we find that maximization of the conditional probability results in minimization of the functional

$$\int_\Omega \left( \frac{(u^\delta - u)^2}{2\sigma^2} + \beta|\nabla u|^2 - \log(|u^\delta - u|) \right). \tag{92}$$
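The complication is visible already pointwise: dropping the gradient term, the data term obtained from the negative log of the Rayleigh density (91), $g(u) = (u^\delta - u)^2/(2\sigma^2) - \log|u^\delta - u|$, is not minimized at $u = u^\delta$; setting $g'(u) = 0$ gives $|u - u^\delta| = \sigma$, i.e., two minimizers at distance $\sigma$ from the data value. A quick numerical check (our illustration):

```python
import numpy as np

sigma, u_delta = 0.2, 1.0

def g(u):
    # pointwise negative log-likelihood of Rayleigh noise (91), up to a constant
    return (u - u_delta) ** 2 / (2 * sigma ** 2) - np.log(np.abs(u - u_delta))

# minimize on each side of the singularity at u = u_delta
left = np.linspace(0.0, u_delta - 1e-6, 200001)
right = np.linspace(u_delta + 1e-6, 2.0, 200001)
u_left = left[np.argmin(g(left))]
u_right = right[np.argmin(g(right))]
assert abs(u_left - (u_delta - sigma)) < 1e-3    # minimizer at u_delta - sigma
assert abs(u_right - (u_delta + sigma)) < 1e-3   # minimizer at u_delta + sigma
```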
There is no scale space associated with this regularization functional. However, an inverse scale-space method can be constructed by considering iterative minimization of the functionals

$$\int_\Omega \left( \frac{(u^\delta - u)^2}{2\sigma^2} + \beta\left|\nabla\left(u - u^{(k-1)}\right)\right|^2 - \log(|u^\delta - u|) \right), \qquad k = 1, 2, \ldots, \tag{93}$$

and denoting a minimizer by $u^{(k)}$, which then satisfies the optimality condition

$$\beta\, \Delta\left(u^{(k)} - u^{(k-1)}\right) = \left( \frac{1}{2\sigma^2} - \frac{1}{2|u^\delta - u^{(k)}|^2} \right) \left(u^{(k)} - u^\delta\right).$$

Setting $t_k = k\beta$, $k = 1, \ldots$, and $u^{(k)} = u(t_k)$, we obtain, by taking the limit $\beta \to 0^+$, the inverse scale-space method

$$\Delta\, \frac{\partial u}{\partial t}(t,x) = \left( \frac{1}{2\sigma^2} - \frac{1}{2|u^\delta(x) - u(t,x)|^2} \right) \bigl(u(t,x) - u^\delta(x)\bigr) \quad \text{for } (t,x) \in (0,\infty)\times\Omega, \tag{94}$$

$$u(0,x) = 0 \quad \text{for } x \in \Omega.$$

We present filtering of data degraded with Rayleigh noise, with $\sigma = 0.2$ (cf. Figure 37); Figure 38 shows the solution of (94) at specified times. Finally, we compare the stochastic regularization method with well-established diffusion filtering methods: the quality of the stochastic regularization is completely different from diffusion-type filtering. We have selected test data that are extremely distorted by Rayleigh noise. The filtered images in Figure 39 were obtained with as large a stopping time as possible (to reduce the effect of noise), so that a number of details, like the tripod of the cameraman, could still be recovered. The selected stopping time is too small for the mean curvature motion filtering and the anisotropic diffusion filtering to completely smear out the noise. We find that the number of preserved details in the filtered images is optimal for the total variation flow and the stochastic method. The good performance of these methods is due to the fact that the stochastic method uses a priori information on the noise and the total variation filtering optimally incorporates information on the image data, which is a blocky image.
FIGURE 37. Camera scene (left) is distorted by Rayleigh noise ($\sigma = 0.2$) (right).
FIGURE 38. Denoising of the image represented in Figure 37 with the inverse scale-space method (94) at times $t = 0.04, 0.12, 0.32, 0.56, 1.2, 2.4$.
FIGURE 39. Denoising. Top left: Heat equation. Top right: Total variation flow. Middle left: Perona–Malik diffusion. Middle right: Anisotropic diffusion. Bottom left: Mean curvature motion. Bottom right: Stochastic regularization.
XI. CONCLUSIONS

In this chapter we reviewed interactions between variational methods and diffusion filtering for denoising and image smoothing, and reviewed links to splines, wavelets, and statistical methods. We presented an introduction to inverse problems, such as deblurring and deconvolution, and highlighted numerical methods for their solution. Various other important image processing applications which are solved by variational methods and partial differential equations have not been touched on:
Optical flow models [5,18,27,51,75,97,100,104,120,121,149,150,173,174].
Computational anatomy and image registration [12,13,65,75,81,96,162,163].
Inpainting [20,24,44,110,111,141].
Diffusion and regularization of vector-valued data, such as color images and tensor-valued medical image data [26,136,167].
Blind deconvolution [34,45,47,144].
Level set methods [69–72,93,132,137,155].
Surface smoothing [61,144].
Active contours [161].
Other variational techniques [129,130,159,160].
We did not attempt to give a complete list of references on these topics, since they are not within the main goal of this chapter. We apologize for any reference that has been omitted. For the reader interested in these topics it should be possible to get a complete account from the references listed in these papers.
ACKNOWLEDGMENTS

This work has been supported by the Austrian Fonds zur Förderung der Wissenschaftlichen Forschung (FWF), grant Y-123 INF. The author thanks H. Grossauer, M. Haltmeier, W. Hinterberger, R. Kowar, J. Künstle, S. Leimgruber, and G. Regensburger. Moreover, the author is grateful for the agreement of Ch. Groetsch, M. Hintermüller, K. Kunisch, M. Oehsen, E. Radmoser, and J. Weickert to use some data of previous joint publications.
REFERENCES

1. Acar, R., and Vogel, C. R. (1994). Analysis of bounded variation penalty methods for ill-posed problems. Inverse Probl. 10, 1217–1229.
2. Alvarez, L., Guichard, F., Lions, P.-L., and Morel, J.-M. (1993). Axioms and fundamental equations of image processing. Arch. Ration. Mech. Anal. 123, 199–257.
3. Alvarez, L., Lions, P.-L., and Morel, J.-M. (1992). Image selective smoothing and edge detection by nonlinear diffusion. II. SIAM J. Numer. Anal. 29, 845–866.
4. Alvarez, L., and Morel, J.-M. (1994). Formalization and computational aspects of image analysis. Acta Numerica, 1–59.
5. Alvarez, L., Weickert, J., and Sanchez, J. (1999). A scale-space approach to nonlocal optical flow calculations, in [127], pp. 235–246.
6. Ambrosio, L. (1989). A compactness theorem for a new class of functions of bounded variation. Boll. Un. Mat. Ital. B 3, 857–881.
7. Ambrosio, L. (1989). Variational problems in SBV and image segmentation. Acta Appl. Math. 17, 1–40.
8. Ambrosio, L. (1990). Existence theory for a new class of variational problems. Arch. Rational Mech. Anal. 111, 291–322.
9. Ambrosio, L., Fusco, N., and Pallara, D. (2000). Functions of Bounded Variation and Free Discontinuity Problems. New York: Oxford University Press.
10. Ambrosio, L., and Tortorelli, V. M. (1990). Approximation of functionals depending on jumps by elliptic functionals via Γ-convergence. Commun. Pure Appl. Math. 43, 999–1036.
11. Ambrosio, L., and Tortorelli, V. M. (1992). On the approximation of free discontinuity problems. Boll. Un. Mat. Ital. B (7) 6, 105–123.
12. Amit, Y. (1994). A nonlinear variational problem for image matching. SIAM J. Sci. Comput. 15, 207–224.
13. Amit, Y., Grenander, U., and Piccioni, M. (1991). Structural image restoration through deformable templates. J. Amer. Statist. Assoc. 86, 376–387.
14. Andreu, F., Ballester, C., Caselles, V., and Mazón, J. M. (2000). Minimizing total variation flow. C.R. Acad. Sci. Paris Sér. I Math. 331, 867–872.
15. Andreu, F., Ballester, C., Caselles, V., and Mazón, J. M. (2001). The Dirichlet problem for the total variation flow. J. Funct. Anal. 180, 347–403.
16. Andreu, F., Ballester, C., Caselles, V., and Mazón, J. M. (2001). Minimizing total variation flow. Differential Integral Equations 14, 321–360.
17. Andreu, F., Caselles, V., Díaz, J. I., and Mazón, J. M. (2002). Some qualitative properties for the total variation flow. J. Funct. Anal. 188, 516–547.
18. Aubert, G., Deriche, R., and Kornprobst, P. (1999). Computing optical flow via variational techniques. SIAM J. Appl. Math. 60, 156–182.
19. Aubert, G., and Kornprobst, P. (2002). Mathematical Problems in Image Processing. New York: Springer-Verlag.
20. Ballester, C., Caselles, V., Verdera, J., Bertalmio, M., and Sapiro, G. (2001). A variational model for filling-in gray level and color images, in Proceedings of the Eighth International Conference on Computer Vision (ICCV-01). Los Alamitos, CA: IEEE Computer Society, pp. 10–16.
21. Banks, H. T., and Kunisch, K. (1989). Estimation Techniques for Distributed Parameter Systems. Basel: Birkhäuser.
22. Barenblatt, G. I., Bertsch, M., Dal Passo, R., and Ughi, M. (1993). A degenerate pseudoparabolic regularization of a nonlinear forward–backward heat equation arising in the theory of heat and mass exchange in stably stratified turbulent shear flow. SIAM J. Math. Anal. 24, 1414–1439.
23. Bellettini, G., Caselles, V., and Novaga, M. (2001). The total variation flow in R^N. Technical report, Sezione di analisi matematica e probabilità, Università di Pisa.
24. Bertalmio, M., Sapiro, G., Caselles, V., and Ballester, C. (2000). Image inpainting, in Proceedings of the Computer Graphics Conference 2000 (SIGGRAPH-00). New York: ACM Press, pp. 417–424.
25. Bertero, M., and Boccacci, P. (1998). Introduction to Inverse Problems in Imaging. London: IOP Publishing.
26. Black, M., Sapiro, G., Marimont, D., and Heeger, H. (1997). Robust anisotropic diffusion and sharpening of scalar and vector images, in [166], pp. 263–266.
27. Black, M. J., and Anandan, P. (1991). Robust dynamic motion estimation over time, in Proc. IEEE Comp. Soc. Conf. on Computer Vision and Pattern Recognition (CVPR '91). Los Alamitos, CA: IEEE Computer Society Press, pp. 292–302.
28. Bracewell, R. N., and Riddle, A. C. (1967). Inversion of fan-beam scans in radio astronomy. Astrophys. J. 150, 427–434.
29. Braides, A. (2000). Free discontinuity problems and their non-local approximation, in Calculus of Variations and Partial Differential Equations. Berlin: Springer, pp. 171–180. 30. Braides, A., and Dal Maso, G. (1997). Non-local approximation of the Mumford–Shah functional. Calc. Var. Partial Differential Equations 5, 293–322. 31. Brezis, H. (1973). Operateurs Maximaux Monotones et semi-groupes de contractions dans les espaces de Hilbert. Amsterdam: North-Holland. 32. H. Brezis (1983). Analyse fonctionnelle. Theorie et applications. Collection Mathematiques Appliquees pour la Maitrise. Paris: Masson. 33. Bronstein, I. N., Semendjajew, K. A., Musiol, G., and Muehlig, H. (1997). Taschenbuch der Mathematik (Handbook of Mathematics), 3rd edition. Frankfurt am Main: Deutsch. 34. Burger, M., and Scherzer, O. (2001). Regularization methods for blind deconvolution and blind source separation problems. Math. Control Signals Systems 14, 358–383. 35. Cadzow, J. A. (1979). An extrapolation procedure for band-limited signals. IEEE Trans. Acoust. Speech Signal Process. 27, 4–12. 36. Canny, J. F. (1986). A computational approach to edge detection. IEEE Trans. Pattern Anal. Machine Intell. PAMI-8(6), 679–697. 37. Catte´, F., Lions, P.-L., Morel, J.-M., and Coll, T. (1992). Image selective smoothing and edge detection by nonlinear diffusion. SIAM J. Numer. Anal. 29, 182–193. 38. Chambolle, A. (1999). Finite-differences discretizations of the Mumford–Shah functional. M2AN Math. Model. Numer. Anal. 33, 261–288. 39. Chambolle, A., and Dal Maso, G. (1999). Discrete approximation of the Mumford–Shah functional in dimension two. M2AN Math. Model. Numer. Anal. 33, 651–672. 40. Chambolle, A., DeVore, R. A., Lee, N., and Lucier, B. J. (1998). Nonlinear wavelet image processing: variational problems, compression, and noise removal through wavelet shrinkage. IEEE Trans. Image Process. 7, 319–335. 41. Chambolle, A., and Lions, P. L. (1997). 
Image recovery via total variation minimization and related problems. Numer. Math. 76, 167–188. 42. Chambolle, A., and Lucier, B. J. (2001). Interpreting translation invariant wavelet shrinkage as a new image smoothing scale space. IEEE Trans. Image Process. 10, 993–1000. 43. Chan, T. F., Golub, G. H., and Mulet, P. (1999). A nonlinear primal-dual method for total variation-based image restoration. SIAM J. Sci. Comput. 20, 1964–1977 (electronic). 44. Chan, T. F., and Shen, J. (2002). Mathematical models for local nontexture inpaintings. SIAM J. Appl. Math. 62, 1019–1043 (electronic). 45. Chan, T. F., and Wong, C. K. (2000). Convergence of the alternating minimization algorithm for blind deconvolution. Linear Algebra Appl. 316, 259–285. 46. Chan, T. F., Golub, G. H., and Mulet, P. (1996). A nonlinear primal-dual method for total variation-based image restoration, in Proceedings ICAOS ’96. Berlin: Springer, pp. 241–252. 47. Chan, T. F., and Wong, C. K. (1998). Total variation blind deconvolution. IEEE Trans. Image Process. 7, 370–375. 48. Chorin, A. J., and Marsden, J. E. (1993). A Mathematical Introduction to Fluid Mechanics, 3rd edition. New York: Springer-Verlag. 49. Chyzak, F., Paule, P., Scherzer, O., Schoisswohl, A., and Zimmermann, B. (2001). The construction of orthonormal wavelets using symbolic methods and a matrix analytical approach for wavelets on the interval. Exp. Math. 10, 67–86. 50. Cohen, A., Daubechies, I., and Vial, P. (1993). Wavelets on the interval and fast wavelet transforms. Appl. Comput. Harmon. Anal. 1, 54–81. 51. Cohen, I. (1993). Nonlinear variational method for optical flow computation, in Proc. Eighth Scandinavian Conf. on Image Analysis, Vol. 1, Tromsø, pp. 523–530.
OTMAR SCHERZER
52. Colton, D., and Kress, R. (1983). Integral Equation Methods in Scattering Theory. New York: Wiley. 53. Colton, D., and Kress, R. (1992). Inverse Acoustic and Electromagnetic Scattering Theory. New York: Springer-Verlag. 54. Dacorogna, B. (1982). Weak Continuity and Weak Lower Semi-Continuity of Non-Linear Functionals. Berlin: Springer-Verlag. 55. Dacorogna, B. (1989). Direct Methods in the Calculus of Variations. Berlin: Springer-Verlag. 56. Daubechies, I. (1992). Ten Lectures on Wavelets. Philadelphia, PA: SIAM. 57. Davies, P. L., and Kovac, A. (2001). Local extremes, runs, strings and multiresolution. Ann. Statist. 29, 1–65 (with discussion and rejoinder by the authors). 58. De Boor, C. (1978). A Practical Guide to Splines. New York: Springer. 59. De Giorgi, E., Carriero, M., and Leaci, A. (1989). Existence theorem for a minimum problem with free discontinuity set. Arch. Ration. Mech. Anal. 108, 195–218. 60. Dibos, F., and Séré, E. (1997). An approximation result for the minimizers of the Mumford–Shah functional. Boll. Un. Mat. Ital. A (7) 11, 149–162. 61. Diewald, U., Preusser, T., Rumpf, M., and Strzodka, R. (2000). Diffusion models and their accelerated solution in image and surface processing. Acta Math. Univ. Comenian (NS) 70, 15–31. 62. Dobson, D. C., and Vogel, C. R. (1997). Convergence of an iterative method for total variation denoising. SIAM J. Num. Anal. 34, 1779–1791. 63. Donoho, D. L., and Johnstone, I. (1995). Adapting to unknown smoothness via wavelet shrinkage. J. Amer. Statist. Assoc. 90, 1200–1224. 64. Donoho, D. L., Johnstone, I. M., Kerkyacharian, G. B., and Picard, D. (1996). Density estimation by wavelet thresholding. Ann. Statist. 24, 508–539. 65. Dupuis, P., Grenander, U., and Miller, M. I. (1998). Variational problems on flows of diffeomorphisms for image matching. Quart. Appl. Math. 56, 587–600. 66. Engl, H. W., and Groetsch, C. W., eds. (1987). Inverse and Ill-Posed Problems. Boston: Academic Press. 67. Engl, H.
W., Hanke, M., and Neubauer, A. (1996). Regularization of Inverse Problems. Dordrecht: Kluwer Academic. 68. Engl, H. W., Kunisch, K., and Neubauer, A. (1989). Convergence rates for Tikhonov regularization of nonlinear ill-posed problems. Inverse Probl. 5, 523–540. 69. Evans, L. C., and Spruck, J. (1991). Motion of level sets by mean curvature. I. J. Differ. Geom. 33, 635–681. 70. Evans, L. C., and Spruck, J. (1992). Motion of level sets by mean curvature. II. Trans. Am. Math. Soc. 330, 321–332. 71. Evans, L. C., and Spruck, J. (1992). Motion of level sets by mean curvature. III. J. Geom. Anal. 2, 121–150. 72. Evans, L. C., and Spruck, J. (1995). Motion of level sets by mean curvature. IV. J. Geom. Anal. 5, 79–116. 73. Evans, L. C., and Gariepy, R. F. (1992). Measure Theory and Fine Properties of Functions. Boca Raton: CRC Press. 74. Fasano, A., and Primicerio, M., eds. (1994). Proc. Seventh European Conf. on Mathematics in Industry. Stuttgart: Teubner. 75. Fischer, B., and Modersitzki, J. (1999). Fast inversion of matrices in image processing. Numer. Algor. 22, 1–11. 76. Florack, L. (1997). Image Structure. Dordrecht: Kluwer. 77. Frigaard, I. A., Ngwa, G., and Scherzer, O. (2002). On effective stopping time selection for visco-plastic nonlinear BV diffusion filters. SIAM J. Appl. Math. accepted for publication.
78. Geman, D., and Yang, C. (1995). Nonlinear image recovery with half-quadratic regularization. IEEE Trans. Image Process. 4, 932–945. 79. Gilbert, P. (1972). Iterative methods for the three-dimensional reconstruction of an object from projections. J. Theor. Biol. 36, 105–117. 80. Glowinski, R. (1984). Numerical Methods for Nonlinear Variational Problems. Berlin: Springer. 81. Grenander, U., and Miller, M. I. (1998). Computational anatomy: an emerging discipline. Quart. Appl. Math. 56(4), 617–694. 82. Groetsch, C. W. (1991). Differentiation of approximately specified functions. Amer. Math. Monthly 98, 847–850. 83. Groetsch, C. W. (1993). Inverse Problems in the Mathematical Sciences. Vieweg Mathematics for Scientists and Engineers. Braunschweig: Friedr. Vieweg. 84. Groetsch, C. W. (1999). Inverse Problems. Washington, DC: Mathematical Association of America. 85. Groetsch, C. W., and Scherzer, O. (2000). Nonstationary iterated Tikhonov–Morozov method and third order differential equations for the evaluation of unbounded operators. Math. Meth. Appl. Sci. 23, 1287–1300. 86. Groetsch, C. W. (1983). Comments on Morozov’s discrepancy principle, in Improperly Posed Problems and Their Numerical Treatment, edited by G. Hämmerlin and K. H. Hoffmann. Basel: Birkhäuser, pp. 97–104. 87. Groetsch, C. W. (1984). The Theory of Tikhonov Regularization for Fredholm Equations of the First Kind. Boston: Pitman. 88. Groetsch, C. W., and Scherzer, O. (1993). Optimal order of convergence for stable evaluation of differential operators. Electronic J. Differential Equations 4, 1–10 (http://ejde.math.unt.edu). 89. Hamza, A. B., Krim, H., and Unal, G. B. (2002). Unifying probabilistic and variational estimation. IEEE Signal Processing Magazine 19, 37–47. 90. Hanke, M. (2002). Grundlagen der Numerischen Mathematik und des Wissenschaftlichen Rechnens. Stuttgart: Teubner. 91. Hanke, M., and Groetsch, C. W. (1998). Nonstationary iterated Tikhonov regularization. J. Optim. Theory Appl.
98, 37–53. 92. Hanke, M., and Scherzer, O. (2001). Inverse problems light: numerical differentiation. Amer. Math. Monthly 108, 512–521. 93. Harabetian, E., and Osher, S. (1998). Regularization of ill-posed problems via the level set approach. SIAM J. Appl. Math. 58, 1689–1706. 94. Herman, G. T. (1980). Image Reconstruction from Projections: The Fundamentals of Computed Tomography. New York: Academic Press. 95. Hinterberger, W., Hintermüller, M., Kunisch, K., von Oehsen, M., and Scherzer, O. (2002). Tube methods for BV regularization. J. Math. Imag. Vision, accepted for publication. 96. Hinterberger, W., and Scherzer, O. (2001). Models for image interpolation based on the optical flow. Computing 66, 231–247. 97. Hinterberger, W., Scherzer, O., Schnörr, Ch., and Weickert, J. (2002). Analysis of optical flow models in the framework of calculus of variations. Num. Funct. Anal. Opt. 23, 69–90. 98. Hoel, P. G. (1960). Elementary Statistics. New York: John Wiley. 99. Hofmann, B. (1999). Mathematik inverser Probleme (Mathematics of Inverse Problems). Stuttgart: Teubner. 100. Horn, B., and Schunck, B. (1981). Determining optical flow. Artif. Intell. 17, 185–203. 101. Kerckhove, M., ed. (2001). Scale-Space and Morphology in Computer Vision, Lecture Notes in Computer Science, LNCS 2106. New York: Springer-Verlag.
102. Kichenassamy, S. (1997). The Perona–Malik paradox. SIAM J. Appl. Math. 57, 1328–1342. 103. Kirsch, A. (1996). An Introduction to the Mathematical Theory of Inverse Problems. New York: Springer-Verlag. 104. Kornprobst, P., Deriche, R., and Aubert, G. (1999). Image sequence analysis via partial differential equations. J. Math. Imaging Vision 11, 5–26. 105. Kress, R. (1989). Linear Integral Equations. Berlin: Springer-Verlag. 106. Lindeberg, T. (1994). Scale-Space Theory in Computer Vision. Boston: Kluwer. 107. Louis, A. K. (1989). Inverse und Schlecht Gestellte Probleme. Stuttgart: Teubner. 108. Mallat, S. (1999). A Wavelet Tour of Signal Processing, 2nd edition. San Diego, CA: Academic Press. 109. Mammen, E., and van de Geer, S. (1997). Locally adaptive regression splines. Ann. Statist. 25, 387–413. 110. Masnou, S. (2002). Disocclusion: a variational approach using level lines. IEEE Trans. Image Process. 11, 68–76. 111. Masnou, S., and Morel, J. M. (1998). Level lines based disocclusion, in [175], pp. 259–263. 112. The MathWorks, http://www.mathworks.com/. MATLAB. 113. Meyer, Y. (1991). Ondelettes sur l’intervalle. Rev. Mat. Iberoam. 7(2), 115–133. 114. Meyer, Y. (2001). Oscillating Patterns in Image Processing and Nonlinear Evolution Equations, Vol. 22 of University Lecture Series. Providence, RI: American Mathematical Society. 115. Morel, J. M., and Solimini, S. (1995). Variational Methods in Image Segmentation. Boston: Birkhäuser. 116. Morozov, V. A. (1984). Methods for Solving Incorrectly Posed Problems. New York: Springer-Verlag. 117. Morozov, V. A. (1993). Regularization Methods for Ill-Posed Problems. Boca Raton: CRC Press. 118. Mumford, D., and Shah, J. (1989). Optimal approximations by piecewise smooth functions and associated variational problems. Commun. Pure Appl. Math. 42, 577–685. 119. Murio, D. A. (1993). The Mollification Method and the Numerical Solution of Ill-Posed Problems. New York: John Wiley. 120. Nagel, N. H. (1987).
On the estimation of optical flow: relations between new approaches and some new results. Artif. Intell. 33, 299–324. 121. Nagel, N. H., and Enkelmann, W. (1986). An investigation of smoothness constraints for the estimation of displacement vector fields from image sequences. IEEE Trans. Pattern Anal. Mach. Intell. 8, 565–593. 122. Nashed, M. Z., ed. (1976). Generalized Inverses and Applications. New York: Academic Press. 123. Natterer, F. (1986). The Mathematics of Computerized Tomography. Stuttgart: Teubner. 124. Natterer, F., and Wübbeling, F. (2001). Mathematical Methods in Image Reconstruction. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM). 125. Neubauer, A. (1989). Tikhonov regularization for non-linear ill-posed problems: optimal convergence rates and finite-dimensional approximation. Inverse Probl. 5, 541–557. 126. Nielsen, M., Florack, L., and Deriche, R. (1997). Regularization, scale-space and edge detection filters. J. Math. Imag. Vision 7, 291–307. 127. Nielsen, M., Johansen, P., Olsen, O. F., and Weickert, J., eds. (1999). Scale-Space Theories in Computer Vision. Lecture Notes in Computer Science, Vol. 1683. Springer-Verlag. 128. Nikolova, M. (2000). Local strong homogeneity of a regularized estimator. SIAM J. Appl. Math. 61, 633–658.
129. Nitzberg, M., Mumford, D., and Shiota, T. (1993). Filtering, Segmentation and Depth, Lecture Notes in Computer Science. New York: Springer. 130. Nitzberg, M., and Shiota, T. (1992). Nonlinear image filtering with edge and corner enhancement. IEEE Trans. Pattern Anal. Mach. Intell. 14, 826–833. 131. Orphanoudakis, S., Trahamnias, P., Crowley, J., and Katveas, N., eds. (1998). Proc. Computer Vision and Mobile Robotics Workshop, CMVR’98, Santorini. 132. Paragios, M., ed. (2001). Variational and Level Set Methods in Computer Vision. Los Alamitos, CA: IEEE Computer Society. 133. Pazy, A. (1983). Semigroups of Linear Operators and Applications to Partial Differential Equations. New York: Springer-Verlag. 134. Perona, P., and Malik, J. (1987). Scale space and edge detection using anisotropic diffusion, in Workshop on Computer Vision. Washington, DC: IEEE Computer Society Press, pp. 16–22. 135. Perona, P., and Malik, J. (1990). Scale space and edge detection using anisotropic diffusion. IEEE Trans. Pattern Anal. Mach. Intell. 12, 629–639. 136. Pollak, I., Krim, H., and Willsky, A. (1998). Stabilized inverse diffusion equations and segmentation of vector-valued images, in [175], pp. 246–248. 137. Preusser, T., and Rumpf, M. (2002). A level set method for anisotropic geometric diffusion in 3D image processing. SIAM J. Appl. Math. 62, 1772–1793 (electronic). 138. Radmoser, E., Scherzer, O., and Weickert, J. (1999). Scale-space properties of regularization methods, in [127], pp. 211–222. 139. Radmoser, E., Scherzer, O., and Weickert, J. (2000). Scale-space properties of nonstationary iterative regularization methods. J. Visual Commun. Image Representation 11, 96–114. 140. Radon, J. (1917). Über die Bestimmung von Funktionen durch ihre Integralwerte längs gewisser Mannigfaltigkeiten. Ber. Verh. Sächs. Akad. Wiss. Leipzig Math. Phys. Kl. 69. 141. Ramasubramanian, M., Pattanaik, S. N., and Greenberg, D. P. (1999).
A perceptually based physical error metric for realistic image synthesis, in SIGGRAPH 99 Conference Proceedings, Annual Conference Series, edited by Alyn Rockwood. Addison Wesley, pp. 73–82. 142. Reinsch, Ch. (1967). Smoothing by spline functions. Numer. Math. 10, 177–183. 143. Rice, J., and Rosenblatt, M. (1983). Smoothing splines: regression, derivatives and deconvolution. Ann. Statist. 11, 141–156. 144. Sapiro, G. (2001). Geometric Partial Differential Equations and Image Analysis. Cambridge: Cambridge University Press. 145. Scherzer, O. (1997). Stable evaluation of differential operators and linear and nonlinear multi-scale filtering. Electronic J. Differential Equations 15, 1–12 (http://ejde.math.unt.edu). 146. Scherzer, O. (2002). Explicit versus implicit relative error regularization on the space of functions of bounded variation, in ‘‘Inverse Problems, Image Analysis, and Medical Imaging,’’ Contemp. Math. 313, 177–198. Providence, RI: American Mathematical Society. 147. Scherzer, O., and Groetsch, C. W. (2001). Inverse scale space theory for inverse problems, in [101], pp. 317–325. 148. Scherzer, O., and Weickert, J. (2000). Relations between regularization and diffusion filtering. J. Math. Imag. Vision 12, 43–63. 149. Schnörr, C. (1994). Segmentation of visual motion by minimizing convex non-quadratic functionals, in Proc. 12th Int. Conf. on Pattern Recognition, Vol. A, pp. 661–663. 150. Schnörr, Ch. (1991). Funktionalanalytische Methoden zur Gewinnung von Bewegungsinformation aus TV-Bildfolgen. PhD thesis, Fakultät für Informatik, University of Karlsruhe. 151. Schoenberg, I. J. (1964). Spline functions and the problem of graduation. Proc. Natl. Acad. Sci. USA 52, 947–950.
152. Schultz, M. H. (1973). Spline Analysis. Englewood Cliffs, NJ: Prentice-Hall. 153. Schumaker, L. L. (1981). Spline Functions: Basic Theory. New York: Wiley. 154. Seidman, T. I., and Vogel, C. R. (1989). Well posedness and convergence of some regularisation methods for non-linear ill posed problems. Inverse Probl. 5, 227–238. 155. Sethian, J. A. (1999). Level Set Methods and Fast Marching Methods, 2nd edition. Cambridge: Cambridge University Press. 156. Sporring, J., Nielsen, M., Florack, L., and Johansen, P., eds. (1997). Gaussian Scale-Space Theory. Dordrecht: Kluwer. 157. Strang, G., and Fix, G. J. (1973). An Analysis of the Finite Element Method. Englewood Cliffs, NJ: Prentice-Hall. 158. Strong, D., and Chan, T. F. (1996). Exact solutions to the total variation regularization problem. Technical report, University of California, Los Angeles, CAM 96-41. 159. Terzopoulos, D. (1983). Multilevel computational processes for visual surface reconstructions. Computer Vision, Graphics and Image Processing 24, 52–96. 160. Terzopoulos, D. (1988). The computation of visible-surface representations. IEEE Trans. Pattern Anal. Mach. Intell. 10, 417–438. 161. Terzopoulos, D., Witkin, A., and Kass, M. (1988). Constraints on deformable models: recovering 3D shape and nonrigid motion. Artif. Intell. 36, 91–123. 162. Thirion, J.-P. (1995). Fast non-rigid matching of 3D medical images. Technical Report RR-2547, Inria, Institut National de Recherche en Informatique et en Automatique. 163. Thirion, J.-P. (1996). Non-rigid matching using demons. Preprint, INRIA, France. 164. Tikhonov, A. N., and Arsenin, V. Y. (1977). Solutions of Ill-Posed Problems. Washington, DC: Wiley (translation editor: Fritz John). 165. Torre, V., and Poggio, T. A. (1986). On edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 8, 48–163. 166. Torwick, I., ed. (1997). Proceedings of the 1997 IEEE International Conference on Image Processing (ICIP-97). Los Alamitos, CA: IEEE Computer Society. 167. 
Tschumperlé, D., and Deriche, R. (2002). Diffusion PDEs on vector-valued images. IEEE Signal Processing Magazine 19, 16–25. 168. Unser, M. (1999). Splines—a perfect fit for signal and image processing. IEEE Signal Processing Magazine 16, 22–38. 169. Wahba, G. (1990). Spline Models for Observational Data, Vol. 59. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM). 170. Webb, S., ed. (1988). The Physics of Medical Imaging. Bristol: Institute of Physics Publishing. 171. Weickert, J. (1994). Anisotropic diffusion filters for image processing based quality control, in [74]. 172. Weickert, J. (1998). Anisotropic Diffusion in Image Processing. Stuttgart: Teubner. 173. Weickert, J. (1999). On discontinuity-preserving optic flow, in [131], pp. 115–252. 174. Weickert, J., and Schnörr, Ch. (2001). Variational optic flow computation with a spatiotemporal smoothness constraint. J. Math. Imag. Vision 14, 245–255. 175. Werner, B., ed. (1998). Proceedings of the 1998 IEEE International Conference on Image Processing (ICIP-98). Los Alamitos, CA: IEEE Computer Society.
Index
A
Aberration coefficients, 366–367 Aberration correctors, 369 Accessibility relation, 110, 111 Achromatic axis, 171 Active set strategies, 508 Adaptive tomographic algorithm, 242 Adaptive tomography, 241–243 Affine invariant mean curvature flow equation, 498 α-cuts, 67–68 Ambrosio–Tortorelli approximation, 474 Angular data application examples, 153–169 dispersion, 138 Angular standard deviation, 129 Angular valued data, 127 Angular valued function, 149 Anisotropic diffusion, 522 Anisotropic noise reduction, 39 Annihilation operator, 249, 256, 257 Antinormal ordering, 211 Approximate proximity, 73 Approximation problem, 476 AR(1) process, 21, 23, 25 Archimedian t-norms, 72 Arctan function, 129 Asymptotic Tikhonov-Morozov method, 480–481, 491–492 numerical simulations, 484–493 Auger electron microprobe (AEM), 346 Auger electrons (AE), 346
Autocovariance function (ACF), 14–15, 19–20 Average length, 129 Axial data, 126 statistical measures, 129
B Back projection, 300 Backscattered electrons (BSE), 311, 343–349, 355, 358–361, 374, 379–380, 382–388, 393, 425 Backscattering coefficient, 351 Backward difference operator, 485 Baker–Campbell–Hausdorff (BCH) formula, 212, 216 Balanced homodyne detector, 215–218, 258 Banach space, 482 Band-pass filter, 265 Basis restriction error, 23 Beam splitter (BS), 278 Bench-mark problem, 510 Bernoulli convolution, 268, 274 Besov space, 514 regularization, 517 Bhattacharya distance, 93 Bilateral obstacle problem, 508 Binary equations, translating into fuzzy equations, 68 Binary relation, 184 Bingham filtering technique, 450, 461 Bingham fluid flow equation, 448 Bi-orthogonality condition, 230, 237–238 Bi-orthogonality relation, 239
Bloch electrons, 343 Bloch states, 323 Bloch waves, 350 Block diagonal transforms, 28 Block transforms, 2, 8–12, 39–41 Bosonic mode, 257 Bounded variation (BV) sampling method, 506 Bragg angles, 352 Brain, internal representation of space, 61 Brightness, definition, 170 Brightness function, 175 Brillouin zone, 323, 324 Brodatz textures, 165 B-splines, 477
C Calculus of variations, 493 Cathode lens (CL), 369–374, 383, 388, 389, 392, 393, 395–397, 400, 401, 406, 407, 409, 412, 417, 429 Cauchy principal value, 225, 300 CBED (convergent beam electron diffraction), 316, 352 Central limit theorem, 228–229, 233 Centroids, 13 Chroma, 177–178, 180 Chromatic aberration of electron lenses, 317 Circulant matrix, 18 Circular centered gradient, 138, 153 Circular centered morphology, 136–141 Circular centered top-hat, defect detection with, 163–164 Circular data definition, 124 distributions, 133
mathematical morphology applied to, 123–203 nature of, 125 representation of, 125 types, 126 Circular data processing, 126 rotational invariance in, 125 Circular statistics, 128–129 theory, 197 Circular variance, 129 Closing operators, 142 Codebook, 13 Cognition and spatial distances, 60–63 Coherent signals, tomography of, 277–281 Collection efficiency, 394 Color images, 124, 138, 140, 143, 145, 148, 169 See also Hue Color representations, 124 3D polar coordinate, 171–172 Color spaces 3D polar coordinate, 169–181 derivation of useful 3D polar coordinate, 173–178 existing 3D polar coordinate, 172–173 processing of 3D polar coordinate, 181–196 Color statistics, 181–182 Color top-hat, 194 Combined magnetic-electrostatic (compound) objective lens, 367–368 Complete lattice, 184 Completeness condition, 230 Complex lapped transform (CLT), 37 Computer vision, 446, 447 Computerized tomography (CT), 446 Concatenated signal, 45
Conceptual spaces, 62 Conditional order, 184, 186 Conditional probability, 518, 519 Confidence interval, 229 Connected component labels, 146 Connected partitions, 141, 199 Conservation of mass, 453 Constrained source coding, 13–14 Contact problem, 507, 510 Continuous-time signals and systems, 3–6 Convolution, 4 Coulomb potential, 330 Covariance matrix, 15–20, 22 eigenvalues of, 247 Crame´r–Rao bound, 289 Creation operator, 256 Crisp distances extending to fuzzy distances, 74 generalizing to fuzzy distance, 67–68 Crystal growth, 451 Crystallinity effects, 350–352 Cubic spline approximation, 474 Curvature-based evolution process, 455 Curvature-based morphological processes, 457 Cyclic closing, 143–145 on indexed partitions, 199–201 Cyclic opening, 145–151 Cyclic operators, 142
D Data compression, 13 Data set deformation, 452 Daubechies’ construction of orthonormal wavelets, 511–513 Deblurring, 463, 481–484 with scale space method, 481–484
533
Defect detection with circular centered top-hat, 163–164 with labeled opening, 164–166 in oriented textures, 163 Denoising, 463, 522 regularization models for, 465–466 Density matrix, 209, 243–244 of single-mode radiation field, 292 of spin systems, 293 of two-mode radiation field, 293 Depolarizing channel, 241 Detection quantum efficiency (DQE), 393 Detection theory, elements of, 209–222 Detector of secondary electrons, 318–319 Diagonal matrix, 17 Dielectric loss function, 328 Dielectric theory, 327 Diffraction aberration, 318 Diffusion filtering, 447, 517 applications, 458 and wavelet shrinkage, 514–517 Diffusion filtering method, 467 Diffusion filtering techniques, 496 Digital images, pixel values, 128 Digital signal processing (DSP) system, 8–9 Dilation, 131, 135–136 gradient by, 137 Dilation equation, 512 Dirac delta function, 225 Dirac delta impulse, 6 Dirac impulse, 4 Direct photodetection, 274 Discontinuity set, 473 Discrepancy principle, 477 Discrete bounded variation (BV) regularization, 500–510 sampling, 502–504
Discrete cosine transform (DCT), 2, 18, 21–23, 25–27, 41, 46 Discrete Fourier transform (DFT), 2, 8–12, 15, 18, 19, 22, 26, 27, 45, 46, 279 Discrete frequency coefficients, 8 Discrete-time convolution, 7, 8 Discrete-time Fourier transform (DTFT), 1, 7–9 Discrete-time LTI systems, 7 Discrete-time signals and systems, 6–8 Displacement operator, 270 Dissimilarity measure, 70 Distance between two fuzzy sets, 65, 85–103 accounting for spatial distances geometrical approach, 94–95 graph theoretic approach, 100–101 histogram of distances, 101–103 morphological approach, 95–99 tolerance-based approach, 99–100 comparison of membership functions functional approach, 86–88 information theoretic approach, 86–88 pattern recognition approach, 93 set theoretic approach, 89–92 Distance between two points in a fuzzy set, 64 Distance density, definition, 96 Distance from a point to a fuzzy set, 65 as a fuzzy number, 81–85 as a number, 80–81 Distance from set relationships, 69–70 Distance from similarity, 69
Distance information representation of, 70 spatial representations of, 104–107 Distance knowledge to a given object, spatial representation of, 105–107 Distance relationship between two objects, 103 or with respect to a given object, 67 Distances and linguistics, 70 in qualitative setting, 113–114 views on, 54–63 Donoho’s wavelet shrinkage algorithm, 517 Double granulometry, 149, 151, 152 Dual basis, 229
E E B filter, 384 Edge detection, 458, 462, 463, 516 Elastic mean free path (EMFP), 321–323 Elastic scattering, 319 differential cross-sections, 322 on nuclei, 320–322 Electric field, 249 Electromagnetic fields, screening against, 318 Electron backscattering, 345–349 Electron backscattering patterns (EBSP), 352 Electron crystallography, 350 Electron diffusion, 319 Electron–electron interaction, 318 Electron emission, 343–361 energy dependence, 337 Electron energy loss spectroscopy (EELS), 326, 327, 346
Electron lenses, chromatic aberration of, 317 Electron penetration, 331–334 Electron probe scanning, 334 Electron scattering, 319 simulation tools, 340–343 Electronic amplifier, 265 Electronic contrast in semiconductors, 426–430 Electrooptic modulator (EOM), 278 Electrostatic detector objective lens (EDOL), 367, 384, 387, 388 Electrostatic field strength, 366 Electrostatic immersion lens, 383 Electrostatic lens, 366 Electrostatic SEM, 313 Emission electron microscope (EEM), 312 Energy band structure, 323 Energy concentration, 16, 17 Energy gaps, 323–324 Energy spectrum of emitted electrons, 343 Entropy, 455 Entropy functions under similarity, 88 Erosion, 135–136 definition, 131 gradient by, 136, 137 Euclidean distance, 107, 194–195 Euclidean space, 129, 137 Euler equation, 498 Everhart–Thornley (ET) detector, 318–319, 389, 429 Extended lapped transform (ELT), 33
F Fast Fourier transform (FFT), 2, 11, 39, 45, 46
535
FEG SEM, 364, 365, 370, 371, 379 Fick’s law, 454 Fictitious photons tomography, 298, 299 Fidelity measurement, 275–276 Field amplitude, detection of, 249 Field intensity, direct measurement, 248 Filter sequence, 512 Finite-duration signal, 8 Finite volume bounded variation (BV) regularization, 504–505 First-order autoregressive (AR(1)), 20 Fisher information, 289 Fluid flow, 452, 455 Fock representation, 237 Fokker–Planck equations, 209, 210 Fourier integrals, 5 Fourier-optical systems, 1 Fourier transform, 1, 3–12, 124–125, 127, 216, 234, 241 amplitude, 153 Free particle, quantum estimation for, 239 Frequency coefficients, 42–43 Frequency-domain enhancement, 2 Fuzzification equations, 68 Fuzzification methods, 67–68 Fuzzy cognitive map framework, 62 Fuzzy dilation, 96, 98–99, 106, 107, 114 Fuzzy distance extending crisp distances to, 74 general principles for defining, 67–71 types and problems, 64–67 Fuzzy geodesic distance between two points in a fuzzy set, 78 defined as fuzzy number, 77–79 defined as number, 75–77
Fuzzy mathematical morphology, 92 Fuzzy nearest point distance, 98 Fuzzy set theory, 53 Fuzzy sets geodesic distances in, 75–79 semiquantitative or semiqualitative interpretation, 53 Fuzzy spatial distances, 51–122 Fuzzy structuring elements, 112
Geometrical configuration and space, 56 Geometrical symmetries, 301 Glauber formula, 235, 238 Gradient by dilation, 137 Gradient by erosion, 136, 137 Gram–Schmidt orthogonalization procedure, 232 Gray level modification, 452 Green’s formula, 467, 479 Group tomography, 238
G Gamma correction, 170 Gaussian convolution, 160, 212, 218, 234, 262, 274, 461–462 Gaussian distribution, 229, 262 Gaussian filter, 157, 158 Gaussian function, 157 Gaussian state estimation, 295–298 Gaussian Wigner functions, 296–297 General method of quantum tomography, 227–239 General tomographic method, 222–243 Generalized lightness, hue, and saturation (GLHS) model, 173 Generalized squeezed quadrature operators, 236 Generalized Wigner function, 208, 213, 221, 250, 251 Geocognostics framework, 62 Geodesic dilation, 79 Geodesic distance See also Fuzzy geodesic distance in fuzzy sets, 75–79 between two points in 2D space, 76 Geographic information systems (GIS), 62 Geometric phase image, 154–155
H Haar’s invariant measure, 238 Hadamard’s inequality, 17 Hadamard’s principle of wellposedness, 461 Harmonic oscillator systems, quantum estimation for, 232–235 Hausdorff distance, 66, 74, 85, 94, 95, 99, 102, 106, 113, 114 definition, 97 Hausdorff measure, 473 Heat equation, 447–448, 522 Heisenberg evolution, 297 Heisenberg uncertainty principle, 206 Hermite polynomial, 244, 247, 292 Heterodyne detection, 218–222, 273 and homodyne tomography, 253–255 High-resolution transmission electron microscope. See HRTEM images Hilbert distance, 303 Hilbert–Schmidt operator, 256 Hilbert space, 224, 244, 294, 301, 464, 481 Histogram of distances, 101–103
INDEX
HLS space, 131, 172–173, 177, 178 Homodyne data, 278 Homodyne detector, 234 balanced, 258 Homodyne probability distribution, 257, 260 Homodyne tomography, 207–209, 272 and heterodyne detection, 253–255 multimode, 255–265 observables, 243–246 of quantum operation, 286 as universal detector, 243–245 Homogeneous phase extraction in HRTEM images, 153–156 HRTEM images homogeneous phase extraction in, 153–156 Y-TZP, 153 HSV space, 172–173, 177 Hue, 138, 139, 143, 145, 148, 149, 189–190 saturation-weighted, 182, 191–193 Hue angle, 175 Hue mean, saturation-weighted, 182 Human perception and spatial distances, 59–60 Hypergeometric function, 275, 276, 303 Hyperspherical parameterization, 256
I Iconicity diagrammatic, 59 imagistic, 59 IHLS color space, lexicographical order in, 187–195 IHLS space, 125–126, 169, 178–180, 194–195, 197–198
537
inverse transformation to RGB space, 179–180 transformation to RGB space, 178–180 Ill-posedness, 461, 464, 479, 498–499 Image compression, 301 Image enhancement and restoration, 37–38 Image processing, 74, 446, 460 Image processing and analysis, 125, 126 Image reconstruction, 3, 300 Image restoration and enhancement, 39–41 Image segmentation, 458 Image smoothing, 447 Immersion objective lens (IOL), 313, 366, 382–384, 387, 394, 395, 399, 401, 406 Improved hue, luminance, and saturation space. See IHLS space Impulse response, 4, 20 Inclusion index, 91–92 Inclusion measure, 69 Indexed partition, 141–142, 145 cyclic closings on, 199–201 definition, 200 Inelastic mean free path (IMFP), 311, 319, 328, 329 Inelastic scattering on atoms, 328–331 on electrons, 324–328 Infimum, 131, 132, 137, 184–186, 198 Integro-differential equation, 484 Inverse Fourier transform, 6 Inverse problems, 446 ill-posedness, 446 regularization of, 460–471 scale space methods for, 478–493
538
INDEX
Inverse Radon transform, 207, 222, 225, 226, 298 Inverse scale space method, 488–490, 521 IRF, 377 Isotropic opening, 149 Iterative relative error regularization, 496–497 Iterative Tikhonov-Morozov method, 480
J JPEG algorithm, 2 JPEG compression, 3
K Karhunen–Loe`ve transform (KLT), 17, 18, 20–25 Kernel functions, 223, 227 Kripke’s semantics, 109
L Label boundary points, 146 Labeled angular image, 145, 146, 148 Labeled openings, 150 defect detection with, 164–166 Lagrange multiplier, 16, 290, 291, 476 Laguerre polynomial, 235, 258, 259, 280 Language. See Linguistics Lapped directional transform (LDT), 38–41 Lapped orthogonal transform (LOT), 3, 30–32, 42 basis functions, 33 coding gain, 32
definition, 31 extensions, 36–39 Lapped transforms, 2, 3, 28–39, 42 definition, 29 extension to, 29–30 LEED, 316, 317, 352, 403, 404, 406–407 Level curve, 453, 454 Level set modeling, 453–455 Lexicographical order, 184, 186 in IHLS color space, 187–195 Lightness, definition, 171 Linear anisotropic diffusion equation, 449 Linear anisotropic diffusion filtering, 449 Linear block transforms, 9 Linear ill-posed problems, 464 Linear inverse problems, 463 Linear statistical dependencies, 14 Linear system theory, 3–12 Linear time-invariant (LTI) systems, 4–8, 20, 41 Linear transforms, 14 Linguistics and distances, 70 and spatial distances, 57–59, 64 Local oscillators (LO), 215, 219, 265 Log-likelihood function, 288, 296 Longitudinal optical phonons, 328 Low-energy electron diffraction. See LEED Low-energy electron microscope (LEEM), 312, 351, 369, 422, 431 Low-pass filters, 265 Łukasiewicz t-conorm, 92 Łukasiewicz t-norm, 72 Luminance, 149, 187–189 calculation, 171 definition, 171 Lyapunov functionals, 455
M Magnetic pinhole lens, 401 Magnetic resonance (MR) image, 468–471, 494 Magnification correction factor, 412 Marginal order, 184, 185 Markov-I process, 20 Mathematical morphology, 108 applied to circular data, 123–203 choice of origin, 130–132 operations, 53 unit circle, 129–130 vectorial, 183–187 MATLAB, 458 Maximization problem, 291 Maximum likelihood estimator, 288 Maximum likelihood principle, 209 Maximum likelihood quantum state estimation, 289–294 Mean curvature motion, 451, 453, 463, 522 Mean direction, 128 Median filter, 510 Medical imaging, 446, 460 MEDOL (magnetic-electrostatic detector objective lens), 367, 387 Membership functions, 106–108 comparison of, 86–93 Membership values, numerical representation, 105 Mereotopology, 108 Microchannel plate (MCP), 387 Minimization models, 497–498 Minimizing element, 476 Minimizing function, 477 Minkowski difference, 97 Mobile robotics, 62 MOCASIM program, 343 Modal logics, 53 Modulated complex lapped transform (MCLT), 37
Modulated lapped transform (MLT), 3, 33–36, 37, 42 basis functions, 34 coding gain, 36 extensions, 36–39 Moments generating function, 257 Monte Carlo (MC) procedure, 341–342 Morphological center, 132–135 Morphological differential equations, 451 Morphological diffusion filtering, 455–457 Morphological gradients, 136–138 Morphological operators, 130, 157, 187, 197 set definitions, 109 Morphological partial differential equations, 452 Morphological segmentation of oriented textures, 161 Morphologics, 109–113 MOS (metal–oxide–semiconductor), 335 Mother wavelet, 512 Mott cross-sections, 321 α-cut, 76 Multidimensional discrete BV regularization, 506–508 Multidimensional scaling functions, 513 Multidimensional wavelets, 513 Multimode homodyne tomography, 255–265 Multiplicity numerical, 56 qualitative, 56 Mumford–Shah filtering, 472–474 Mumford–Shah functional, 474 Mumford–Shah segmentation, 473, 474
N Nd:YAG laser, 277, 285 Nearest point distance, 96 Neumann boundary data, 474 Neuroimaging, 61 No-cloning theorem, 206 Noise deconvolution, 234, 239–241 quantum tomography, 250 removal, 446 in tomographic measurements, 246–253 Noise ratio, 248, 250, 252–253 Nonconvex regularization models, 493–500 Nondegenerate optical parametric amplifier (NOPA), 265, 273, 276 Nondestructive evaluation, 446 Nondifferentiable regularization, 464 Nonlinear anisotropic diffusion, 451, 462, 466 Nonlinear BV-regularized reconstructions, 510 Nonlinear inverse problems, 464 Non-local approximations, 474 Normal ordering, 211, 214, 253 Null estimators, 231, 243
O Opening operators, 142 Optical domain, 285–287 Optical filter, 280, 281 Optimal scalar quantization, 14 Optimum angular aperture, 372 Orientation images, 163, 166 Orientation summary image, 159 Oriented textures, 156–166 defect detection in, 163 morphological segmentation, 161
Origin, choice of, 130–132, 153 Orthogonal linear transform, 22 Orthogonal polarizations, 261 Orthogonality relation, 220, 238
P p-axial circular data, 127 P-function, 211, 266–267 Parallel openings, 144–146 Partial differential equations (PDEs), 446, 447 Partial order, 184, 185 Partitions, 141–151 definition, 199 Pattern functions, 223 Pattern recognition, 74 Paul trap, 232 Pauli matrices, 238 PEEM, 378 Periodic wavelets, 514 Perona–Malik diffusion filtering, 451, 452, 466–468, 495, 522 Perona–Malik model, 498 Perona–Malik regularization, 493–494 Peters formulation, 131 Phase-squeezed state, 269 Photodetection, 213–215, 248 Photodiodes, 278, 281 Photoemission electron microscope (PEEM), 312 Photon number, 254, 259–260, 262–265, 271 detection, 273 distribution, 279 probability distribution, 266 Photon statistics, 277 Piecewise constant function and traces, 501 Plastic viscosity, 449
Polymer processing, 451 Positive operator-valued measure (POVM), 217–221, 232, 274, 289 Probability density functions, 518, 519 Probability distribution, 215, 216, 218, 236, 250, 251, 259, 262, 264 Projection postulate, 272 Proximity, perception of, 59–60 Pseudoclosing operator, 136 Pseudodilation, 132–136 Pseudoerosion, 132–136 Pseudoopening operator, 136 Pseudooperators, 136, 151 Pulse code modulation (PCM), 13
Q Q-function, 222 Qualitative distance in symbolic setting, 108–114 Quantitative measures in spatial reasoning, 60 Quantization index, 13 Quantum device, tomography of, 281–287 Quantum domain, extension to, 225–226 Quantum efficiency, 213, 214, 217, 221, 235, 245, 250, 254, 257, 262, 264, 271, 274 Quantum estimation for free particle, 239 for harmonic oscillator systems, 232–235 maximum likelihood method, 287–298 for spin systems, 237–239 Quantum hologram, 224
Quantum homodyne tomography, 226, 233, 270, 272, 304 experimental situations, 277 Quantum imaging, from classical imaging, 299–304 Quantum measurements, 265–281 of observables, 274 Quantum mechanics, 272, 281–282 Quantum operation, 282–285 homodyne tomography of, 286 Quantum optical phase, 251 Quantum optics, 206–207, 272 Quantum radiography, 208 Quantum standard reference, 287 Quantum state, 206, 223 maximum likelihood estimation, 289–294 nonclassicality measurement, 266–272 reconstruction, 279 two-mode field, 293 Quantum tomography, 205–308 aim, 227 applications, 265–281 basic statistics, 227–229 classical imaging by, 298–304 definition, 206 history, 208, 223–224 noise of, 250 overview, 206–209 Quorum, 227, 231–232 characterization, 229–232
R Radon transform, 207, 224, 232, 300, 464 inversion, 207, 222, 225, 226, 298 Radon transform-based imaging procedure, 223 Random variables, 518
Rao and Schunck algorithm, 157–161 Rayleigh criterion, 376, 377 Rayleigh distributed noise, 519 Rayleigh noise, 520 Reconstruction formula, 239 Reconstruction technique, 209 Reduced order, 184, 185 Reflection coefficient, 325 Reflection EELS (REELS), 346 Regularization inverse problems, 460–471 methods, 464 nonstationary, 482 numerical experiments, 468–471 parameters, 482 relative error, 494–500 and spline approximation, 474–478 and statistics, 517–520 Tikhonov, 464–465, 467, 469, 496, 519 Regularization functional, 473 Regularization models for denoising, 465–466 Reindexation, 145 Relative error regularization, 494–500, 496 Reproducing kernel, 230–231 Resemblance measure, 69 Retarding field principle, 369 RGB color image, 169 RGB color space, 169 RGB cube, 169 RGB rectangular coordinates, 171 RGB space, 173, 175, 189, 198 inverse transformation from IHLS, 179–180 transformation to IHLS space, 178–180 Rotational invariance in circular data processing, 125
Rotationally invariant cyclic openings, 146–151 Rotationally invariant operator, 125, 151
S Satisfiability measure, 69 Saturation, 187–189 calculation, 176–177 Saturation-weighted hue, 191–193 Saturation-weighted hue mean, 182 Scalar uniform quantizer, 43 Scale space, 455 definition, 458 Scale-space methods, 445–530 deblurring, 481–484 for inverse problems, 478–493 Scale-space theory, 458–459 Scanning electron microscope (SEM), 310 adaptation, 399–401 dedicated equipment, 401–407 specialized, 401 Scanning low-energy electron microscopy. See SLEEM Scanning transmission electron microscope. See STEM Scanwood System, 164 Schrödinger cat state, 267, 269 Schrödinger equation, 350, 351 Schrödinger kitten state, 282 Secondary electrons (SE), 343–345, 354–361, 374, 376, 379–380, 382–387, 393, 402, 403 Segmentation algorithm, 161 Semiconductor laser, 277 Semiconductors, electronic contrast in, 426–430 Semi-group theory, 459, 468 Semi-implicit time step, 498
Semi-infinite dimensional setting, 474 Semimetrics, 74 Semipseudometrics, 72, 74, 91, 99 Sensory conflicts between visual and nonvisual information, 61 Series closings, 142–144 Set relationships, distances from, 69–70 Set theoretic morphological operations, 109 Shapes, comparison of, 73 Sign language, 59 Signal blocks, 3 Signal processing, 460 Signal-to-noise ratios (SNR), 393 Signal transforms, 2 Similarity distances from, 69 entropy functions under, 88 Similarity relation, 71 Similitude measure, 69 SIMION 3D package, 383 Single-mode nonclassicality, 267–270 Single-mode radiation field, density matrix of, 292 Single-pole condenser lens (SPCL), 402 Single-pole objective lens (SPOL), 402 SLEEM, 309–443 above-surface electric field, 349 aims, 313 alignment and operation, 407–412 applications, 413–430 cathode lens, 369–374 coherence within primary beam spot, 353–354 combination with surface microanalysis, 405 contrast of crystal orientation, 422 critical energy mode, 418–419
detection and specimen-related issues, 381–399 detection strategies, 382–386 detectors, 387–393 diffraction contrast, 419–422 dynamics of charging process, 339 electronic contrast in semiconductors, 426–430 energy-band contrast, 430 extensions to conventional modes of operation, 314–316 first demonstration experiments, 313 formation of primary beam, 361–380 general characteristics of micrograph series, 415–416 heating and damage of specimen, 334–336 ideal dedicated instrument, 405–406 illumination coherence, 421 incorporation of retarding field, 366–369 instruments, 399–413 interaction of slow electrons with solids, 319–343 issues inherent to slow electron beams, 317–319 layered structures, 422–425 material contrast, 425–426 motivations to lower electron energy, 314–319 new opportunities, 316–317 overview, 310–314 pixel size, 374–377 practical issues, 410–413 primary beam trajectory inside objective and cathode lenses, 411 prospective application areas, 414–415
SLEEM (cont.) quantitative limits, 310 secondary electron emission, 354–361 signal composition, 393–394 specimen charging, 336–340 specimen surface, 394–397 specimen tilt, 397–399 spot size, 362–365 spurious effects, 317–319, 377–379 surface relief, 417 testing the resolution, 379–380 tilted specimen, 412 Sobolev space, 465, 478, 481 s-ordered Wigner functions, 210, 212 s-ordering, 209 Source coding, 13 Space 3D, 55 4D, 55 and geometrical configuration, 56 of operators, 282 and spatial concepts, philosophical thinking, 54–57 views on, 54–63 Spatial distances, 53 See also Distance between two fuzzy sets and cognition, 60–63 economic measures, 59 and human perception, 59–60 information as edge attribute, 100 and linguistics, 57–59, 64 measures of, 59 mental representation, 61 perceptual measures, 59 properties of distances and requirements for, 71–75 temporal measures, 59
Spatial environment, cognitive understanding, 60 Spatial expressions, meaning of, 58 Spatial fuzzy distances general consideration, 63–75 representation issues, 64 Spatial fuzzy sets, 63 as representation framework, 104–105 Spatial information, 52, 53 Spatial knowledge, 54, 56 Spatial measures, 59 Spatial metaphors, 58 Spatial ordering, 55 Spatial reasoning, 54, 109, 112 qualitative information in, 60 quantitative measures in, 60 Spatial relationships, 52, 54, 57, 115 Spatial representation of distance information, 104–107 of distance knowledge to a given object, 105–107 Spatial situations, describing, 58 Spectrograms, 124 Spin systems density matrix of, 293 quantum estimation for, 237–239 Spin tomography identity, 238 Spline approximation and regularization, 474–478 Square matrix, 29 Standard morphological gradient operator, 157 State reduction (SR), 272–276 Statistics and regularization, 517–520 STEM (scanning transmission electron microscope), 316, 352, 369 Stochastic interactions, 377–378
Stochastic models and Tikhonov-type regularization, 519 Stochastic regularization, 522 Structuring element, 130, 131, 132, 137, 144, 146, 147, 152 in morphologics, 109–113 origin, 138, 140 Supremum, 131, 132, 137, 184–186, 198 Symmetric gradient, 137 Symmetrical ordering, 211
T t-conorm, 92, 96, 103 t-conorm dual, 72 t-equivalence, 71 t-indistinguishability, 71, 74 t-norm, 92, 96, 102, 103 t-transitivity, 73 Taut string algorithm, 505–506, 508 TEG SEM, 363–365, 370, 371, 376, 379 Thomson–Whiddington law, 331 Tikhonov functional, 478, 480 Tikhonov–Morozov method, 483–484, 492 Tikhonov regularization, 464–465, 467, 469, 496, 519 and stochastic models, 519 Tilt-angle dependence, 359 Time-discrete diffusion filtering, 467 Time-domain aliasing cancellation (TDAC), 36 Toeplitz matrix, 15, 18, 22 Tomographic estimator, 223, 238, 249 Tomographic imaging, 224–226 Tomographic measurements, noise in, 246–253
Tomographic phase measurement, 251–253 Tomographic reconstruction, 229, 234, 264, 270, 278 Tomography See also Quantum tomography of coherent signals, 277–281 of quantum device, 281–287 Top-hat operator, 138–141, 153, 194 Total order, 184 Trace condition, 230–231 Transfer function, 20 Transfer matrix, 282 Transfer width, 353 Transform coder and decoder, 13–14 Transform coding, 2, 13–25 performance, 23–25 Transform coefficient, 15–17, 24–26, 28, 29, 36 Transform efficiency, 14–23 Transform matrix, 27, 28, 35, 42 Transform signals, 1–3 Transform tensor, 26 Transforms, role of, 13–14 Transmission electron microscope (TEM), 310, 369 Transport equation, 341 Triangular inequality, 77 Truncated Hilbert space dimension, 301 Tube method, 504, 505 Tversky definitions, 69 Twin-beam state, 263 Two-color images, 131 Two-dimensional MLT, 37–39 Two-dimensional transforms, 25–28 Two-LO tomography, 263 Two-mode field, quantum state of, 293 Two-mode nonclassicality, 270–272
Two-mode radiation field, density matrix of, 293 Two-mode tomography, 260–265 numerical results, 260–265 Type II phase-matched parametric amplifier, 261
U UHV, 428, 431 UHV SEM, 403 UHV SLEEM, 380–381, 404 Ultimate spotsize, 372 Ultrahigh-vacuum (UHV) devices, 318 Uniform quantization, 43 Uniform scalar quantization, 13 Unit circle, 126–128, 197 mathematical morphology, 129–130 Unitary operator, 261 Unitary transform matrix, 17 Unitary transformation, 262
V Vectorial data, 126–127 Vectorial mathematical morphology, 183–187 Vectorial orders, 183–186 VLEED (very-low-energy electron diffraction), 323
W Wavelet coefficients, 514, 516 Wavelet shrinkage, 510–517 denoising, 514–517 and diffusion filtering, 514–517 Wavelet spaces, 512 Well-posedness, 461 Weyl–Heisenberg group, 238 Weyl’s quantization procedure, 210 Wien condition, 384 Wien filter, 384, 386, 403 Wigner function, 207, 210–213, 222, 225–226, 267, 295, 297, 299, 301 expansion, 301 reconstruction, 279
X Xenon’s paradox, 58 X-ray photon, 330 X-ray tomography, 299
Y YAG:Ce3+ single-crystal scintillator, 388–390, 393, 400 Yield stress, 449 Y-TZP, HRTEM image of, 153
FIGURE A.1. Color example images. (a) Fruit image (with red regions outlined). (b) “The Virgin” by P. Serra in the St. Cugat monastery in Barcelona (size 352 × 334 pixels). (c) Subregion of Figure A.2(a). (d) Map image with the top half inverted. (e) Map image. (f) Cell image.
FIGURE A.2. Example of a cyclic closing. (a) Initial color image (size 441 × 297 pixels). (b) The color image after a cyclic closing of the hue by a square SE of size 10.
FIGURE A.3. (a) Four colors and their values of hue, luminance, and saturation. (b) Lizard image (size 544 × 360 pixels). (c) Morphological closing of image (b) using a lexicographical order with saturation at the first level.
FIGURE A.4. Slices through the conic and cylindrical HSV and HLS spaces: (a) conic HSV, (b) cylindrical HSV, (c) bi-conic HLS, (d) cylindrical HLS.
FIGURE A.5. Results of the color morphological operators.