ADVANCES IN IMAGING AND ELECTRON PHYSICS VOLUME 128
EDITOR-IN-CHIEF
PETER W. HAWKES CEMES-CNRS Toulouse, France
ASSOCIATE EDITORS
BENJAMIN KAZAN Xerox Corporation Palo Alto Research Center Palo Alto, California
TOM MULVEY Department of Electronic Engineering and Applied Physics Aston University Birmingham, United Kingdom
Advances in Imaging and Electron Physics
EDITED BY PETER W. HAWKES
CEMES-CNRS, Toulouse, France
VOLUME 128
Amsterdam Boston Heidelberg London New York Oxford Paris San Diego San Francisco Singapore Sydney Tokyo
This book is printed on acid-free paper. Copyright © 2003, Elsevier Inc. All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the Publisher. The appearance of the code at the bottom of the first page of a chapter in this book indicates the Publisher's consent that copies of the chapter may be made for personal or internal use of specific clients. This consent is given on the condition, however, that the copier pay the stated per copy fee through the Copyright Clearance Center, Inc. (222 Rosewood Drive, Danvers, Massachusetts 01923), for copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Law. This consent does not extend to other kinds of copying, such as copying for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. Copy fees for pre-2003 chapters are as shown on the title pages; if no fee code appears on the title page, the copy fee is the same as for current chapters. 1076-5670/2003 $35.00 Permissions may be sought directly from Elsevier's Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail:
[email protected]. You may also complete your request on-line via the Elsevier homepage (http://elsevier.com), by selecting ‘‘Customer Support’’ and then ‘‘Obtaining Permissions.’’
Academic Press An Elsevier imprint 525 B Street, Suite 1900, San Diego, California 92101-4495, USA http://www.academicpress.com
Academic Press, 84 Theobald's Road, London WC1X 8RR, UK http://www.academicpress.com
International Standard Book Number: 0-12-014770-X
PRINTED IN THE UNITED STATES OF AMERICA
03 04 05 06 07 08 9 8 7 6 5 4 3 2 1
CONTENTS

CONTRIBUTORS . . . ix
PREFACE . . . xi
FUTURE CONTRIBUTIONS . . . xiii

Fourier, Block, and Lapped Transforms
TIL AACH

I. Introduction: Why Transform Signals Anyway? . . . 1
II. Linear System Theory and Fourier Transforms . . . 3
III. Transform Coding . . . 13
IV. Two-Dimensional Transforms . . . 25
V. Lapped Transforms . . . 28
VI. Image Restoration and Enhancement . . . 39
VII. Discussion . . . 41
Appendix A . . . 42
Appendix B . . . 43
Appendix C . . . 45
Appendix D . . . 47
References . . . 48

On Fuzzy Spatial Distances
ISABELLE BLOCH

I. Introduction . . . 52
II. Some Views on Space and Distances . . . 54
III. Spatial Fuzzy Distances: General Considerations . . . 63
IV. Geodesic Distance in a Fuzzy Set . . . 75
V. Distance from a Point to a Fuzzy Set . . . 80
VI. Distance between Two Fuzzy Sets . . . 85
VII. Spatial Representations of Distance Information . . . 104
VIII. Qualitative Distance in a Symbolic Setting . . . 108
IX. Conclusion . . . 114
References . . . 115

Mathematical Morphology Applied to Circular Data
ALLAN HANBURY

I. Introduction . . . 124
II. Processing Circular Data . . . 126
III. Application Examples . . . 153
IV. 3D Polar Coordinate Color Spaces . . . 169
V. Processing of 3D Polar Coordinate Color Spaces . . . 181
VI. Conclusion . . . 196
Appendix A: Connected Partitions . . . 199
Appendix B: Cyclic Closings on Indexed Partitions . . . 199
References . . . 201

Quantum Tomography
G. MAURO D’ARIANO, MATTEO G. A. PARIS, AND MASSIMILIANO F. SACCHI

I. Introduction . . . 206
II. Wigner Functions and Elements of Detection Theory . . . 209
III. General Tomographic Method . . . 222
IV. Universal Homodyning . . . 243
V. Multimode Homodyne Tomography . . . 255
VI. Applications to Quantum Measurements . . . 265
VII. Tomography of a Quantum Device . . . 281
VIII. Maximum Likelihood Method in Quantum Estimation . . . 287
IX. Classical Imaging by Quantum Tomography . . . 298
References . . . 305

Scanning Low-Energy Electron Microscopy
ILONA MÜLLEROVÁ AND LUDĚK FRANK

I. Introduction . . . 310
II. Motivations to Lower the Electron Energy . . . 314
III. Interaction of Slow Electrons with Solids . . . 319
IV. Emission of Electrons . . . 343
V. Formation of the Primary Beam . . . 361
VI. Detection and Specimen-Related Issues . . . 381
VII. Instruments . . . 399
VIII. Selected Applications . . . 413
IX. Conclusions . . . 431
References . . . 432

Scale-Space Methods and Regularization for Denoising and Inverse Problems
OTMAR SCHERZER

I. Introduction . . . 446
II. Image Smoothing and Restoration via Diffusion Filtering . . . 447
III. Regularization of Inverse Problems . . . 460
IV. Mumford–Shah Filtering . . . 472
V. Regularization and Spline Approximation . . . 474
VI. Scale-Space Methods for Inverse Problems . . . 478
VII. Nonconvex Regularization Models . . . 493
VIII. Discrete BV Regularization and Tube Methods . . . 500
IX. Wavelet Shrinkage . . . 510
X. Regularization and Statistics . . . 517
XI. Conclusions . . . 522
References . . . 523

INDEX . . . 531
CONTRIBUTORS
Numbers in parentheses indicate the pages on which the authors’ contributions begin.
TIL AACH (1), Institute for Signal Processing, University of Lübeck, Ratzeburger Allee 160, D-23538 Lübeck, Germany

ISABELLE BLOCH (51), École Nationale Supérieure des Télécommunications, Département TSI, CNRS URA 820, 46 rue Barrault, 75013 Paris, France

LUDĚK FRANK (309), Institute of Scientific Instruments AS CR, Královopolská 147, CZ-61264 Brno, Czech Republic

ALLAN HANBURY (123), Pattern Recognition and Image Processing Group (PRIP), Vienna University of Technology, Favoritenstraße 9/1832, A-1040 Vienna, Austria

G. MAURO D’ARIANO (205), Quantum Optics and Information Group, Istituto Nazionale per la Fisica della Materia, Unità di Pavia, Dipartimento di Fisica “A. Volta,” Università di Pavia, Italy

ILONA MÜLLEROVÁ (309), Institute of Scientific Instruments AS CR, Královopolská 147, CZ-61264 Brno, Czech Republic

MATTEO G. A. PARIS (205), Quantum Optics and Information Group, Istituto Nazionale per la Fisica della Materia, Unità di Pavia, Dipartimento di Fisica “A. Volta,” Università di Pavia, Italy

MASSIMILIANO F. SACCHI (205), Quantum Optics and Information Group, Istituto Nazionale per la Fisica della Materia, Unità di Pavia, Dipartimento di Fisica “A. Volta,” Università di Pavia, Italy

OTMAR SCHERZER (445), Department of Computer Science, Universität Innsbruck, Technikerstraße 25, A-6020 Innsbruck, Austria
PREFACE
The six contributions in this new volume extend over many themes: mathematical morphology, signal processing, scanning electron microscopy, quantum tomography and regularization.

We begin with a survey of transforms that are used in signal and image processing, by Til Aach. First, the continuous and discrete Fourier transforms are presented, which leads to the notion of block transforms. These are necessary preliminaries to the real subject of this chapter, which is to describe lapped transforms, the purpose of which is to reduce or even eliminate the artefacts introduced by block transforms. The basis functions now extend over more than one block.

The next chapter is a short monograph by Isabelle Bloch on fuzzy spatial distances. Fuzzy sets are being found useful in a host of different areas and this chapter, in which the basic notions and the reasons why they are of practical interest are set out very readably, enables the reader unfamiliar with the subject to master it easily.

Mathematical morphology plays an important role in this work, which leads us naturally to the third contribution, again a short monograph, in which Allan Hanbury discusses the application of this technique to circular data. Such data are represented by angles or by directional information in two dimensions. They arise in many practical situations: wind directions, the orientations of fracture planes in rocks, and the hue component of color representations in three-dimensional polar coordinates are among those cited by the author. More generally, the phase component of complex signals or of complex quantities arising from Fourier transforms are all examples of circular data. This thorough account of a somewhat neglected but very important aspect of image processing will, I am certain, be very much appreciated.

The fourth chapter, by G. Mauro D’Ariano, Matteo Paris and Massimiliano Sacchi, brings us to the newest generation of electronic and optical devices.
This magisterial account of quantum tomography explains how the quantum state of a system can be estimated by a tomographic technique and presents in full detail all the stages of the reasoning and some practical examples.

In the fifth contribution, we return to electron microscopy, this time to the use of the scanning electron microscope at very low energy, typically below 5 keV. For this, the instrument must be redesigned and the image interpretation must be reconsidered carefully. Ilona Müllerová and Luděk Frank examine the instrumental aspect of low-energy SEM in
considerable detail before showing how useful the technique can be in practice. Many areas of image restoration, and indeed of signal processing in general, are bedevilled by the fact that the equations describing the restoration process are ill-posed, which means that there may be no solution compatible with the measurements, or many solutions may correspond to them or again the solution may be highly sensitive to small changes in the data. In order to stabilize the methods, some form of regularization is required, and this is the central theme of the chapter by Otmar Scherzer. In the course of his account, many related questions are examined and, once again, his chapter has the status of a short monograph on this important subject. In conclusion, I thank most warmly all the contributors for taking so much trouble to make their chapters accessible to non-specialists, and on the following pages I list articles promised for future volumes. Peter Hawkes
FUTURE CONTRIBUTIONS

S. van Aert, A. den Dekker, A. van den Bos and D. van Dyck (vol. 130)
Statistical experimental design for quantitative atomic-resolution transmission electron microscopy

G. Abbate
New developments in liquid-crystal-based photonic devices

S. Ando
Gradient operators and edge and corner detection

C. Beeli
Structure and microscopy of quasicrystals

G. Borgefors
Distance transforms

B. L. Breton, D. McMullan and K. C. A. Smith (Eds)
Sir Charles Oatley and the scanning electron microscope

A. Bretto (vol. 130)
Hypergraphs and their use in image modelling

H. Delingette
Surface reconstruction based on simplex meshes

R. G. Forbes
Liquid metal ion sources

E. Förster and F. N. Chukhovsky
X-ray optics

A. Fox
The critical-voltage effect

L. Godo and V. Torra
Aggregation operators

A. Gölzhäuser
Recent advances in electron holography with point sources

A. M. Grigoryan and S. S. Agaian (vol. 130)
Transform-based image enhancement algorithms with performance measure

H. F. Harmuth and B. Meffert (vol. 129)
Calculus of finite differences in quantum electrodynamics

M. I. Herrera
The development of electron microscopy in Spain

D. Hitz
Recent progress on HF ECR ion sources

J. Hormigo and G. Cristobal (vol. 130)
Texture and the Wigner distribution

K. Ishizuka
Contrast transfer and crystal images

G. Kögel
Positron microscopy

W. Krakow
Sideband imaging

N. Krueger (vol. 130)
The application of statistical and deterministic regularities in biological and artificial vision systems

B. Lahme
Karhunen–Loève decomposition

B. Lencová
Modern developments in electron optical calculations

M. A. O’Keefe
Electron image simulation

N. Papamarkos and A. Kesidis
The inverse Hough transform

K. S. Pedersen, A. Lee and M. Nielsen
The scale-space properties of natural images

M. Petrou (vol. 130)
Image registration

R. Piroddi and M. Petrou (vol. 131)
Dealing with irregularly sampled data

M. Rainforth
Recent developments in the microscopy of ceramics, ferroelectric materials and glass

E. Rau
Energy analysers for electron microscopes

H. Rauch
The wave-particle dualism

J. J. W. M. Rosink and N. van der Vaart (vol. 131)
HEC sources for the CRT

G. Schmahl
X-ray microscopy

S. Shirai
CRT gun design methods

T. Soma
Focus-deflection systems and their applications

J.-L. Starck
The curvelet transform

I. Talmon
Study of complex fluids by transmission electron microscopy

M. Tonouchi
Terahertz radiation imaging

N. M. Towghi
lp norm optimal filters

Y. Uchikawa
Electron gun optics

D. van Dyck
Very high resolution electron microscopy

K. Vaeth and G. Rajeswaran
Organic light-emitting arrays

C. D. Wright and E. W. Hill
Magnetic force microscopy-filtering for pattern recognition using wavelet transforms and neural networks

M. Yeadon
Instrumentation for surface studies
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 128
Fourier, Block, and Lapped Transforms

TIL AACH

Institute for Signal Processing, University of Lübeck, Ratzeburger Allee 160, D-23538 Lübeck, Germany

I. Introduction: Why Transform Signals Anyway? . . . 1
II. Linear System Theory and Fourier Transforms . . . 3
   A. Continuous-Time Signals and Systems . . . 3
   B. Discrete-Time Signals and Systems . . . 6
   C. The Discrete Fourier Transform and Block Transforms . . . 8
III. Transform Coding . . . 13
   A. The Role of Transforms: Constrained Source Coding . . . 13
   B. Transform Efficiency . . . 14
   C. Transform Coding Performance . . . 23
IV. Two-Dimensional Transforms . . . 25
V. Lapped Transforms . . . 28
   A. Block Diagonal Transforms . . . 28
   B. Extension to Lapped Transforms . . . 29
   C. The Lapped Orthogonal Transform . . . 30
   D. The Modulated Lapped Transform . . . 33
   E. Extensions . . . 36
VI. Image Restoration and Enhancement . . . 39
VII. Discussion . . . 41
Acknowledgments . . . 42
Appendix A . . . 42
Appendix B . . . 43
Appendix C . . . 45
Appendix D . . . 47
References . . . 48
I. INTRODUCTION: WHY TRANSFORM SIGNALS ANYWAY?

The Fourier transform and its related discrete transforms are of key importance in both the theory and practice of signal and image processing. In the theory of continuous-time systems and signals, the Fourier transform allows one to describe both signal and system properties and the relation between system input and output signals in the frequency domain (Ziemer et al., 1989; Lüke, 1999). Fourier-optical systems based on the diffraction of coherent light are a direct practical realization of the two-dimensional continuous Fourier transform (Papoulis, 1968; Bamler, 1989). The discrete-time Fourier transform (DTFT) describes properties of discrete-time signals and systems. While the DTFT assigns frequency-continuous and periodic
spectra to discrete-time signals, the discrete Fourier transform (DFT) represents a discrete-time signal of finite length by a finite number of discrete-frequency coefficients (Oppenheim and Schafer, 1998; Proakis and Manolakis, 1996; Lüke, 1999). The DFT thus permits one to compute spectral representations numerically. The DFT and other discrete transforms related to it, like the discrete cosine transform (DCT), are also of great practical importance for the implementation of signal and image processing systems, since efficient algorithms for their computation exist, e.g., in the form of the fast Fourier transform (FFT).

However, while continuous-time Fourier analysis generally considers the entire time axis from minus infinity to plus infinity, the DFT is only defined for signals of finite duration. Conceptually, the finite-duration signals are formed by taking single periods from originally periodic signals. Consequently, enhancement and transform coding of, for instance, speech are based on the spectral analysis of short time intervals of the speech waveform (Lim and Oppenheim, 1979; Ephraim and Malah, 1984; van Compernolle, 1992; Cappé, 1994; Aach and Kunz, 1998). The length of the time intervals depends on the nature of the signals, viz. short-time stationarity. Similarly, transform coding (Clarke, 1985) or frequency-domain enhancement (Lim, 1980; Aach and Kunz, 1996a,b, 2000) of images requires spectral analysis of rectangular blocks of finite extent in order to take into account short-space stationarity. Such processing by block transforms often generates audible or visible artifacts at block boundaries. While in some applications these artifacts may be mitigated using overlapping blocks (Lim and Oppenheim, 1979; Lim, 1980; Ephraim and Malah, 1984; Cappé, 1994; van Compernolle, 1992; Aach and Kunz, 1996a,b, 1998; Aach, 2000), this is not practical in applications like transform coding, where overlapping blocks would inflate the data volume.
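Blockwise transform processing of this kind can be sketched in a few lines. The example below is an illustration added here, not the JPEG algorithm itself: it uses SciPy's orthonormal DCT-II on one-dimensional blocks of 8 samples and simply discards all but a few coefficients per block, a crude stand-in for quantization.

```python
import numpy as np
from scipy.fft import dct, idct

def block_dct_threshold(signal, block=8, keep=3):
    """Transform each length-`block` segment with an orthonormal DCT-II,
    keep only the `keep` largest-magnitude coefficients per block,
    and reconstruct the segment by the inverse DCT."""
    out = np.empty_like(signal, dtype=float)
    for start in range(0, len(signal), block):
        c = dct(signal[start:start + block], norm='ortho')
        small = np.argsort(np.abs(c))[:-keep]   # indices of discarded coefficients
        c[small] = 0.0
        out[start:start + block] = idct(c, norm='ortho')
    return out

n = np.arange(64)
s = np.cos(2 * np.pi * 2 * n / 64)              # smooth test signal
rec = block_dct_threshold(s, block=8, keep=3)   # lossy blockwise reconstruction
# With all coefficients kept, the orthonormal DCT pair is lossless:
lossless = block_dct_threshold(s, block=8, keep=8)
```

Because each block is processed independently, reconstruction errors do not match up across block boundaries — the one-dimensional analog of the artifacts visible in Figure 1.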
Transform coders therefore punch out adjacent blocks from the incoming continuous data stream, and encode these individually. To illustrate the block artifacts, Figure 1 shows an image reconstructed after encoding by the JPEG algorithm, which uses a blockwise DCT (Rabbani and Jones, 1991). Lapped transforms aim at reducing or even eliminating block artifacts by the use of overlapping basis functions, which extend over more than one block. The purpose of this chapter is to provide a self-contained introduction to lapped transforms. Our approach is to develop lapped transforms from standard block transforms as a starting point. To introduce the topic of signal transforms, we first summarize the development from the Fourier transform of continuous-time signals to the DFT. An in-depth treatment can be found in many texts on digital signal processing and system theory (e.g., Ziemer et al., 1989; Oppenheim and Schafer, 1998; Lu¨ke, 1999). In Section III, we discuss the relevance of orthogonal block transforms for
FIGURE 1. Left: Portion of size 361 × 390 pixels of the “Marcel” image, 8 bits per pixel. Right: Reconstruction after JPEG compression at about 0.2 bits per pixel.
transform coding, which depends on the covariance structure of the signals. Section IV deals with two-dimensional block transforms. Orthogonal block transforms map a given number of signal samples contained in each block into an identical number of transform coefficients. Each signal block can hence be perfectly reconstructed from its transform coefficients by an inverse transform. In contrast to block transforms, the basis functions of lapped transforms discussed in Section V extend into neighboring blocks. The number of transform coefficients generated is then lower than the number of signal samples covered by the basis functions. Signal blocks can therefore not be perfectly reconstructed from their individual transform coefficients. However, if the transform meets a set of extended orthogonality conditions, the original signal is perfectly reconstructed by superimposing the overlapping, imperfectly reconstructed signal blocks. Two types of lapped transforms will be considered, the lapped orthogonal transform (LOT) and the modulated lapped transform (MLT). We then discuss extensions of these transforms before concluding with some examples comparing the use of block and lapped transforms in image restoration and enhancement.

II. LINEAR SYSTEM THEORY AND FOURIER TRANSFORMS

A. Continuous-Time Signals and Systems

Let s(t) denote a real signal, with t being the independent continuous-time variable. Our aim is to describe the transmission of signals through one or
more systems, where a system is regarded as a black box which maps an input signal s(t) into the output signal g(t) by a mapping M, i.e., g(t) = M(s(t)). Restricting ourselves here to the class of linear time-invariant (LTI) systems, we require the systems to comply with the following conditions.

(i) Linearity: A linear system reacts to any weighted combination of K input signals $s_i(t)$, $i = 1, \ldots, K$, with the same weighted combination of output signals $g_i(t) = M(s_i(t))$:

$$M\left(\sum_{i=1}^{K} a_i s_i(t)\right) = \sum_{i=1}^{K} a_i M(s_i(t)) = \sum_{i=1}^{K} a_i g_i(t), \qquad (1)$$
where $a_i$, $i = 1, \ldots, K$ denote the weighting factors.

(ii) Time invariance: A time-invariant system reacts to an arbitrary delay of the input signal with a correspondingly delayed, but otherwise unchanged output signal:

$$M(s(t)) = g(t) \;\Rightarrow\; M(s(t-\tau)) = g(t-\tau), \qquad (2)$$
where $\tau$ is the delay.

An LTI system is completely characterized by the response to the Dirac delta impulse $\delta(t)$. Denoting the so-called impulse response by h(t), we have $h(t) = M(\delta(t))$. The Dirac impulse $\delta(t)$ is a distribution defined by the integral equation

$$s(t) = \int_{-\infty}^{\infty} s(\tau)\,\delta(t-\tau)\, d\tau, \qquad (3)$$
which essentially represents a signal s(t) by an infinite series of Dirac impulses delayed by $\tau$ and weighted by $s(\tau)$. Since an LTI system reacts to the signal s(t) by the same weighted combination of delayed impulse responses h(t), it suffices to replace $\delta(t)$ in Equation (3) by h(t) to obtain the output g(t):

$$g(t) = \int_{-\infty}^{\infty} s(\tau)\, h(t-\tau)\, d\tau. \qquad (4)$$
This relationship is known as convolution, abbreviated g(t) = s(t) * h(t). Since the convolution is commutative, we may interchange input signal and impulse response, and equally write g(t) = h(t) * s(t). Let us now consider the system reaction to the complex exponential $s_{\mathrm{eig}}(t)$ of frequency f (or radian frequency $\omega = 2\pi f$) given by

$$s_{\mathrm{eig}}(t) = e^{\,j2\pi f t}, \qquad (5)$$
where $j = \sqrt{-1}$. From $g(t) = h(t) * s_{\mathrm{eig}}(t)$, we obtain

$$g(t) = e^{\,j2\pi f t} \int_{-\infty}^{\infty} h(\tau)\, e^{-j2\pi f \tau}\, d\tau = s_{\mathrm{eig}}(t) \cdot H(f), \qquad (6)$$

where¹

$$H(f) = \int_{-\infty}^{\infty} h(t)\, e^{-j2\pi f t}\, dt. \qquad (7)$$
Hence, the input signal is only weighted by the generally complex weighting factor H(f) but otherwise reproduced unchanged; it is therefore called an eigenfunction of LTI systems. The relationship between h(t) and H(f) is the Fourier transform, denoted by $h(t) \leftrightarrow H(f)$. If known for all frequencies, H(f) is called the spectrum of the signal h(t), or the transfer function of the LTI system. Equation (7) is essentially an inner product, or correlation, between h(t) and the complex exponential of frequency f. The signal h(t) can be recovered from its spectrum H(f) by the inverse Fourier transform

$$h(t) = \int_{-\infty}^{\infty} H(f)\, e^{\,j2\pi f t}\, df, \qquad (8)$$
which is a weighted superposition of complex exponentials. (This integral reconstructs discontinuities of h(t) by the average of the left and right limits.) Evidently, an LTI system can also be fully described by its transfer function H(f). When applied to a signal s(t), the Fourier transform S(f) is called the spectrum of s(t). It specifies the weights and phases of the complex exponentials contributing to s(t) in the inverse Fourier transform according to

$$s(t) \leftrightarrow S(f) \;\Rightarrow\; s(t) = \int_{-\infty}^{\infty} S(f)\, e^{\,j2\pi f t}\, df. \qquad (9)$$
The Fourier transform allows one to describe the transfer of a signal s(t) over an LTI system in the frequency domain. According to Equation (6), the LTI system reacts to $e^{\,j2\pi f t}$ with $H(f)\, e^{\,j2\pi f t}$. Equation (9) represents the system input s(t) as a weighted superposition of complex exponentials. Because of linearity, the output signal g(t) is given by the identical weighted superposition of system reactions $H(f)\, e^{\,j2\pi f t}$:

$$g(t) = \int_{-\infty}^{\infty} S(f)\, H(f)\, e^{\,j2\pi f t}\, df. \qquad (10)$$
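The transform pair of Equations (7) and (8) can be checked numerically. The sketch below is an added illustration (the Gaussian test function and the quadrature grid are choices of convenience): under the f-convention used here, the Gaussian $h(t) = e^{-\pi t^2}$ is its own Fourier transform.

```python
import numpy as np

# Dense time grid; the Gaussian is negligible beyond |t| = 6.
t = np.linspace(-6, 6, 4001)
h = np.exp(-np.pi * t**2)

def fourier(f):
    """Trapezoidal approximation of H(f) = integral h(t) exp(-j 2 pi f t) dt."""
    return np.trapz(h * np.exp(-2j * np.pi * f * t), t)

f_grid = np.linspace(-2, 2, 9)
H_vals = np.array([fourier(f) for f in f_grid])
expected = np.exp(-np.pi * f_grid**2)   # the Gaussian reproduces itself
```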
¹ In the following, we assume the Fourier integrals to exist. For h(t) piecewise continuous, a sufficient condition is $\int_{-\infty}^{\infty} |h(\tau)|\, d\tau < \infty$.
Denoting the spectrum of g(t) by G(f), the inverse Fourier transform yields

$$g(t) \leftrightarrow G(f) \;\Rightarrow\; g(t) = \int_{-\infty}^{\infty} G(f)\, e^{\,j2\pi f t}\, df. \qquad (11)$$
i¼1
i¼1
for arbitrary input signals si(n) and weighting factors ai, i ¼ 1, . . . , K. (ii) Time invariance: MðsðnÞÞ ¼ gðnÞ ) Mðsðn mÞÞ ¼ gðn mÞ,
ð13Þ
where m is an integer delay. In the discrete-time case, the Dirac delta impulse is replaced by the unit impulse (n) which is defined by 1 for n ¼ 0 ðnÞ ¼ : ð14Þ 0 otherwise
FOURIER, BLOCK, AND LAPPED TRANSFORMS
7
A discrete-time signal s(n) can then be composed of a sum of weighted and shifted unit impulses according to 1 X
sðnÞ ¼
sðmÞðn mÞ:
ð15Þ
m¼1
To determine the system response g(n), it then suffices to know its impulse response h(n) ¼ M((n)). Because of linearity and time invariance, the output signal is given by the following superposition of weighted and shifted impulse responses: 1 X
gðnÞ ¼
sðmÞhðn mÞ:
ð16Þ
m¼1
This operation is called the discrete-time convolution, and is denoted by g(n) ¼ s(n) * h(n). Like its continuous-time counterpart, the discrete-time convolution is commutative. The eigenfunctions of discrete-time LTI systems are discrete-time complex exponentials given by seig ðnÞ ¼ e j2pfn :
ð17Þ
Note that the frequency variable f is still continuous. Passing seig(n) through our LTI system yields the output signal gðnÞ ¼ e j2pfn
1 X
hðmÞej2pfm ¼ seig ðnÞ HDT ð f Þ,
ð18Þ
m¼1
where HDT ð f Þ ¼
1 X
hðnÞej2pfn
ð19Þ
n¼1
is the DTFT of h(n), which can be regarded as the transfer function of the LTI system, or the spectrum of the signal h(n). We denote this relation by h(n)HDT( f ). Clearly, the spectrum of a discrete-time signal is periodic over f. Indeed, s(n) can be regarded as the Fourier series representation of HDT( f ). To reconstruct h(n) from its spectrum, it therefore suffices to consider a single period of HDT( f ): Z 1=2 hðnÞ HDT ð f Þ ) hðnÞ ¼ HDT ð f Þe j2pfn df : ð20Þ 1=2
8
TIL AACH
As in the continuous-time case, it is straightforward to show that the spectrum of an output signal of an LTI system is the product of the spectrum of the input signal and the transfer function of the LTI system: gðnÞ ¼ sðnÞ * hðnÞ GDT ð f Þ ¼ SDT ð f Þ HDT ð f Þ:
ð21Þ
While the discrete-time convolution in Equation (16) can be implemented on digital signal processing (DSP) systems, the spectral-domain relations are of less practical value, since they depend on a continuous frequency variable. C. The Discrete Fourier Transform and Block Transforms Let us now consider a finite-duration signal s(n), n ¼ 0, . . . , N1 comprising N samples. Seeking a spectral-domain representation for s(n) by N frequency coefficients SDFT(k), k ¼ 0, . . . , N1, we start from its DTFT SDT( f ), which is a sum over N components. SDT( f ) is periodic with period 1, and therefore fully specified by one period, for instance 0 f<1. Seeking to represent the N-sample sequence s(n) by N discrete frequency coefficients, we take N equally spaced samples from one period of SDT( f ), 0 f<1, thus obtaining the DFT of s(n) by N1 X k ¼ sðnÞejð2p=NÞkn , k ¼ 0, . . . , N 1: SDFT ðkÞ ¼ SDT N n¼0
ð22Þ
The finite-duration signal s(n) can be recovered from its DFT SDFT(k) by the inverse DFT (see Appendix A) sðnÞ SDFT ðkÞ ) sðnÞ ¼
1 X 1N SDFT ðkÞe jð2p=NÞkn , n ¼ 0, . . . , N 1: ð23Þ N k¼0
The DFT hence represents a finite-duration discrete-time signal s(n) of N coefficients by N discrete spectral coefficients SDFT(k), and is therefore perfectly suited for numerical implementation. In general, the frequency coefficients are complex, offering 2N degrees of freedom. However, for real * ðN kÞ, which s(n), the DFT obeys the symmetry condition SDFT ðkÞ ¼ SDFT reduces the degrees of freedom to N. Since the DFT applies to finite-length signals, samples of signals of long duration must be collected into successive segments or blocks of finite length, which are then subjected to the DFT. Transforms like the DFT are therefore termed block transforms. The block length is limited by practical considerations, like available memory and performance of the digital signal
FOURIER, BLOCK, AND LAPPED TRANSFORMS
9
processing system. More important, however, is the influence of statistical signal properties on the block length: the notion of power spectrum, for instance, is meaningful only for (wide sense) stationary random signals. Real data, like speech or images, are stationary only over short time intervals and blocks of rather small extent, respectively. Applications of spectral analysis, like power spectrum estimation by block transforms or block transform coding, therefore only make sense when applied to reasonably short and small segments. In the JPEG still image compression standard, images are processed in blocks of 8 8 pixels. Speech can be considered stationary for intervals of the order of 10–50 ms. When sampled at 8 kHz, this translates into blocks with 64–256 samples. Linear block transforms are conveniently expressed as matrix operations. Grouping the signal samples s(n), n ¼ 0, . . . , N1 into a column vector s ¼ [s(0), s(1), . . . , s(N1)]T, and the frequency coefficients into a vector S ¼ [SDFT(0), SDFT(1), . . . , SDFT(N1)]T, Equation (22) can be written as S ¼ W s,
$$\text{with } \mathbf{W} = [W_{kn}], \tag{24}$$

where W is the square N × N transform matrix with entry

$$W_{kn} = e^{-j(2\pi/N)kn} \tag{25}$$

in the (k+1)th row and (n+1)th column. The inverse transform in Equation (23) can be expressed by

$$\mathbf{s} = \frac{1}{N}(\mathbf{W}^*)^T \mathbf{S}. \tag{26}$$

Thus,

$$(\mathbf{W}^*)^T \mathbf{W} = N\,\mathbf{I}, \tag{27}$$
where I is the identity matrix. The DFT transform matrix is hence unitary up to a factor N, or, in other words, the DFT basis functions are orthogonal. We have derived the DFT by sampling the first period of the DTFT of a signal of length N with a sampling period 1/N. What are the consequences of this frequency-domain sampling operation? Comparing the Fourier transform S( f ) of a continuous-time signal s(t) according to Equation (7) to the DTFT in Equation (19), we see that replacing a continuous-time signal by discrete equally spaced samples with sampling period one leads to a periodic spectrum with period one in the DTFT of Equation (19). Also, the Fourier transform and its inverse in Equation (8) are almost identical in structure.
10
TIL AACH
Apart from a sign change in the exponent, the signal s(t) and its spectrum S(f) are simply interchanged, as are the time and frequency variables t and f. Therefore, just as time-domain sampling leads to a periodic spectrum, frequency-domain sampling of the periodic spectrum leads to a periodic discrete signal s_p(n), obtained by periodically repeating s(n) with a period that is the inverse of the sampling period in the frequency domain. Since the frequency-domain sampling period in Equation (22) is 1/N, the periodic signal s_p(n) is given by

$$s_p(n) = \sum_{r=-\infty}^{\infty} s(n + rN). \tag{28}$$
Hence, the DFT represents one period of a periodic discrete-time signal by one period of its discrete-frequency periodic spectrum. Both the "actually" transformed signal and its spectrum are therefore periodic. This implicit, underlying periodicity must not be overlooked when applying the DFT. One consequence is the occurrence of spurious high-frequency artifacts in the DFT spectrum, which are generated when the block-end signal coefficients s(0) and s(N−1) differ strongly. The periodically repeated signal s_p(n) then exhibits abrupt transitions, which "leak" spectral energy into high-frequency spectral coefficients. To illustrate this effect, Figure 2 shows the signal s(n) = cos(4πn/64), n = 0, ..., 63, and its DFT spectrum. Since two periods of the cosine fit perfectly into the analysis interval, periodic repetition of s(n) generates a smooth signal. As expected, the DFT spectrum exhibits two "clean" peaks at k = 2 and k = 62. This is vastly different in Figure 3: the frequency of the cosine is slightly increased such that 2.5 periods now fit into the data interval, with s(n) = cos(5πn/64), n = 0, ..., 63. Periodic repetition generates transitions from almost +1 to −1 between block-end samples. The effect of these transitions is evident in the DFT spectrum, which is now spread over all frequency coefficients. This example also illustrates one important application of block transforms: as Figure 2 shows, a block transform may concentrate the signal energy into only a small number of spectral coefficients. This property is essential for data compression by transform coding. From Figure 3, however, it becomes clear that the DFT is probably not the optimal transform for this purpose, due to problems caused by discontinuities at the block ends. In the next section we examine transform coding in more detail. We will see that, although better transforms than the DFT exist for transform coders, artifacts caused by block boundaries persist.
This is the main motivation for the development and use of lapped transforms.
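The block-DFT behavior described above can be reproduced numerically. The sketch below (a NumPy illustration; the signals are those of Figures 2 and 3, while the numerical tolerance 1e-8 is an assumption made for the demonstration) builds the transform matrix W of Equation (25), checks the orthogonality relation (27), and contrasts the clean and leaky spectra:

```python
import numpy as np

N = 64
n = np.arange(N)
# W[k, n] = e^{-j(2*pi/N)*k*n}, the DFT matrix of Eq. (25).
W = np.exp(-2j * np.pi * np.outer(n, n) / N)

# Eq. (27): (W*)^T W = N I, i.e., the DFT basis functions are orthogonal.
assert np.allclose(W.conj().T @ W, N * np.eye(N))

# Two full periods fit the block: two clean peaks at k = 2 and k = 62.
s_smooth = np.cos(4 * np.pi * n / 64)
S_smooth = W @ s_smooth
peaks = np.flatnonzero(np.abs(S_smooth) > 1e-8)
print(peaks)  # -> [ 2 62]

# 2.5 periods: the implicit periodic repetition is discontinuous,
# and spectral energy leaks into many coefficients.
s_leaky = np.cos(5 * np.pi * n / 64)
S_leaky = W @ s_leaky
print(np.count_nonzero(np.abs(S_leaky) > 1e-8))  # far more than 2
```

The contrast between the two nonzero-coefficient counts is exactly the energy-concentration property exploited by transform coding.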
FIGURE 2. Top: Source signal s(n) = cos(4πn/64) for n = 0, ..., 63. Bottom: Modulus DFT spectrum |S_DFT(k)| of s(n) for k = 0, ..., 63. Since periodic repetition of s(n) does not create discontinuities, the DFT spectrum exhibits the expected two peaks.
Various fast and highly efficient algorithms are available for the computation of the DFT and its inverse (‘‘fast Fourier transform,’’ FFT). These are widely used in applications like power spectrum estimation, fast convolution, adaptive filtering, noise reduction, and signal enhancement, as
FIGURE 3. Top: Source signal s(n) = cos(5πn/64) for n = 0, ..., 63. Bottom: Modulus DFT spectrum |S_DFT(k)| of s(n) for k = 0, ..., 63. Periodic repetition of s(n) results in strong discontinuities between the periods, causing the spreading of signal energy over all frequency coefficients.
well as in many others. Some of these applications require the use of overlapping segments, others the use of segments that are subjected to a smooth window function such that discontinuities at block ends are reduced or eliminated (Oppenheim and Schafer, 1998; Ziemer et al., 1989, Chap. 11).
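The effect of such a smooth window can be checked numerically. The sketch below (an assumption-laden illustration: the Hann window, the out-of-band interval, and the thresholds are hypothetical choices, not taken from the text) windows the leaky cosine of Figure 3 and compares the fraction of energy far from the cosine frequency:

```python
import numpy as np

N = 64
n = np.arange(N)
s = np.cos(5 * np.pi * n / 64)              # 2.5 periods: leaks badly
w = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)   # Hann window (one smooth choice)

S_rect = np.abs(np.fft.fft(s))              # rectangular (no) window
S_hann = np.abs(np.fft.fft(w * s))          # Hann-windowed block

def out_of_band(S, lo=6, hi=59):
    # Energy fraction away from the cosine frequency (k ~ 2.5 and its mirror).
    return np.sum(S[lo:hi] ** 2) / np.sum(S ** 2)

# Windowing suppresses the slowly decaying leakage sidelobes.
print(out_of_band(S_rect) > out_of_band(S_hann))
```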
III. TRANSFORM CODING

A. The Role of Transforms: Constrained Source Coding

The aim of source coding or data compression is to represent discrete signals s(n) with only a small expected number of bits per sample (the so-called bit rate), either with no distortion (lossless compression) or with as low a distortion as possible for a given rate (lossy compression). Since we try to optimize the trade-off between distortion and rate on average, we regard the signals as random and describe them by their statistical properties. The essential step in source coding is quantization (Goyal, 2001, p. 12). A straightforward approach is so-called pulse code modulation (PCM), where each sample is quantized individually at a fixed number of bits, e.g., eight bits for gray-level images. Most signals representing meaningful information, however, exhibit strong statistical dependencies between signal samples. In images, for instance, the gray levels of neighboring pixels tend to be similar. To take such dependencies into account, possibly large sets of adjacent samples should be quantized together. Unfortunately, this unconstrained approach leads to practical problems even for relatively small groups of samples (Goyal, 2001). In transform coding, the signals or images are first decomposed into adjacent blocks or vectors of N input samples each. Each block is then individually transformed such that the statistical dependencies between the samples are reduced, or even eliminated (Clarke, 1985; Zelinski and Noll, 1977; Goyal, 2001). Also, the signal energy, which generally is evenly distributed over all signal samples s(n), should be repacked into only a few transform coefficients. The transform coefficients S(k) can then be quantized individually (scalar quantization). Each quantizer output consists of an index i(k) of the quantization interval into which the corresponding transform coefficient falls. These indices are then coded, e.g., by a fixed-length code or an entropy code.
The decoder first reconverts the incoming bitstream into the quantization indices, and then replaces the quantization index i(k) for each transform coefficient S(k) by the centroid V(i(k)) of the indexed quantization interval, which serves as an approximation, or better, an estimate Ŝ(k) = V(i(k)) of S(k). The relation between the indices i(k) and the centroids V(i(k)) is stored in a look-up table called a codebook. An inverse transform then calculates the reconstructed signal ŝ(n). The principle of a transform coder and decoder (codec) is shown in Figure 4. Clearly, due to quantization, the compression technique is lossy. The distortion caused by uniform scalar quantization is discussed in Appendix B. Optimizing a transform codec requires choosing an
FIGURE 4. Block diagram of a transform coder and decoder. The signal vector s is first transformed into the transform coefficient vector S = As. The transform coefficients are quantized. The quantization indices i(k) are encoded into codewords and multiplexed into the bitstream, which is transmitted over the channel. The decoder first demultiplexes the bitstream into the codewords, which are then reconverted into the quantization indices i(k). The decoded quantization indices are used to access the codebooks, yielding the quantized transform coefficient values Ŝ(k) = V(i(k)). These are subjected to an inverse transform to obtain the reconstructed signal vector ŝ = A⁻¹Ŝ.
optimal transform and optimal scalar quantization of the transform coefficients. Since the optimization is thus constrained by the architecture outlined in Figure 4, we speak of constrained source coding. Practical transform codecs employ linear unitary or orthogonal transforms. Linear transforms explicitly influence linear statistical dependencies, that is, correlations. In the next section we therefore first discuss unitary transforms subject to the criteria of decorrelation and energy concentration. We then show that the optimal transform with respect to these criteria is also optimal with respect to the reconstruction errors incurred at given rates.
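The codec architecture of Figure 4 can be sketched in a few lines. The toy implementation below is an assumption-laden illustration (the orthogonal DCT as the transform, a mid-rise uniform quantizer, and the step size 0.25 are all hypothetical choices); it also checks that, for an orthogonal transform, the quantization error has the same energy in the transform and signal domains:

```python
import numpy as np

def dct_matrix(N):
    # Unitary DCT-II with basis vectors as rows (a standard choice of A).
    n = np.arange(N)
    A = np.sqrt(2.0 / N) * np.cos((2 * n + 1)[None, :] * n[:, None] * np.pi / (2 * N))
    A[0, :] = np.sqrt(1.0 / N)
    return A

N, step = 8, 0.25                   # block length and quantizer step (assumed)
A = dct_matrix(N)
rng = np.random.default_rng(0)
s = rng.standard_normal(N)          # one signal block

S = A @ s                           # transform
S_hat = step * np.round(S / step)   # uniform scalar quantization of each S(k)
s_hat = A.T @ S_hat                 # inverse transform (A orthogonal: A^-1 = A^T)

# Orthogonality preserves the squared error of the quantization.
assert np.isclose(np.sum((S - S_hat) ** 2), np.sum((s - s_hat) ** 2))
```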
B. Transform Efficiency

Modeling the signal s(n) as wide-sense stationary over n = 0, ..., N−1, the mean value is constant for all samples. Without loss of generality, we assume that the mean is zero, if necessary by first subtracting a potential nonzero mean from the data. The autocovariance function (ACF) is then given by c_s(n) = E(s(m)s(m+n)), where E denotes expectation, and the (constant) variance σ² of s(n) is σ² = c_s(0). The ACF can be normalized by c_s(n) = σ² ρ_s(n), with ρ_s(0) = 1 and |ρ_s(n)| ≤ 1. Alternatively, covariances can be expressed by the covariance matrix C_s (Fukunaga, 1972; Therrien, 1989), which is an N × N matrix defined by

$$\mathbf{C}_s = E[\mathbf{s}\mathbf{s}^T] = \sigma^2 \begin{bmatrix} 1 & \rho_s(1) & \rho_s(2) & \cdots & \rho_s(N-1) \\ \rho_s(1) & 1 & \rho_s(1) & \cdots & \rho_s(N-2) \\ \vdots & & \ddots & & \vdots \\ \rho_s(N-1) & \rho_s(N-2) & \rho_s(N-3) & \cdots & 1 \end{bmatrix}. \tag{29}$$
The entry in the (n+1)th row and (k+1)th column of C_s is thus given by c_s(|n−k|). The covariance matrix of a wide-sense stationary signal vector is evidently a positive semidefinite and symmetric Toeplitz matrix (Therrien, 1989; Makhoul, 1981; Akansu and Haddad, 2001); indeed, C_s is symmetric about both main diagonals (persymmetric) (Unser, 1984). We transform the signal vector s into the coefficient vector S = A s by a linear, unitary transform. The transform is described by an N × N matrix A, with A⁻¹ = A^H, where the superscript H denotes conjugate transpose (cf. Equation (27)). For instance, A could be a unitary DFT defined by A = (1/√N) W, with W given by Equation (25). A unitary transform preserves Euclidean lengths:

$$\|\mathbf{S}\|_2^2 = \mathbf{S}^H\mathbf{S} = \mathbf{s}^T\mathbf{A}^H\mathbf{A}\,\mathbf{s} = \mathbf{s}^T\mathbf{I}\,\mathbf{s} = \|\mathbf{s}\|_2^2, \tag{30}$$

where s^H = s^T, since s is real. The covariance matrix C_S of the transform coefficients can then be derived as

$$\mathbf{C}_S = E[\mathbf{S}\mathbf{S}^H] = \mathbf{A}\,E[\mathbf{s}\mathbf{s}^T]\,\mathbf{A}^H = \mathbf{A}\mathbf{C}_s\mathbf{A}^H, \tag{31}$$

and also det(C_S) = det(C_s). Furthermore, the sums of the variances of the signal and transform coefficients are identical:

$$N\sigma^2 = \operatorname{tr}(\mathbf{C}_s) = \operatorname{tr}(\mathbf{C}_S) = \sum_{k=0}^{N-1} \sigma_S^2(k), \tag{32}$$
where tr(C) is the trace of matrix C. In general, the nondiagonal entries of C_s differ more or less strongly from zero, reflecting correlations between the signal samples s(n). We now seek a unitary transform matrix A which decorrelates the input data as much as possible. Hence, we seek a transform matrix such that the covariance matrix C_S of the transform coefficients is diagonal or nearly
diagonal (Fukunaga, 1972; Therrien, 1989; Clarke, 1985; Goyal, 2001). At the same time, we seek to concentrate the signal energy optimally into only a few dominant transform coefficients. The decorrelation efficiency η_d can be measured by comparing the sums of the absolute nondiagonal matrix entries before and after transformation (Akansu and Haddad, 2001, p. 33):

$$\eta_d = 1 - \frac{\sum_{k,l,\,k\neq l} \left| [\mathbf{C}_S]_{kl} \right|}{\sum_{m,n,\,m\neq n} \left| [\mathbf{C}_s]_{mn} \right|}. \tag{33}$$

Energy concentration can be evaluated by the relative energy contribution of the N−L transform coefficients with the smallest variances. Assuming the coefficients are ordered such that σ_S²(0) ≥ σ_S²(1) ≥ ... ≥ σ_S²(N−1), such a measure is

$$D_{BR}(L) = \frac{\sum_{k=L}^{N-1} \sigma_S^2(k)}{\operatorname{tr}(\mathbf{C}_S)} = \frac{\sum_{k=L}^{N-1} \sigma_S^2(k)}{\sum_{k=0}^{N-1} \sigma_S^2(k)}, \tag{34}$$

which is sometimes referred to as the basis restriction error (Jain, 1979; Unser, 1984; Akansu and Haddad, 2001). Denoting the rows of A by N-component row vectors a_k^T, k = 0, ..., N−1, we obtain the variances σ_S²(k) by evaluating Equation (31) for the entries along the main diagonal:

$$\sigma_S^2(k) = \mathbf{a}_k^T \mathbf{C}_s \mathbf{a}_k^*. \tag{35}$$
Minimizing the basis restriction error subject to the orthonormality constraint a_k^T a_l^* = δ(k−l) is equivalent to minimizing the functional

$$J = \sum_{k=L}^{N-1} \left[ \mathbf{a}_k^T \mathbf{C}_s \mathbf{a}_k^* - \lambda_k \left( \mathbf{a}_k^T \mathbf{a}_k^* - 1 \right) \right], \tag{36}$$

with Lagrange multipliers λ_k, and where we have taken into account that the denominator in Equation (34) is invariant under a unitary transform. It can straightforwardly be shown that J is minimized by the normalized eigenvectors u_k, k = 0, ..., N−1, of the data covariance matrix C_s (Therrien, 1992, pp. 50, 694; Therrien, 1989; Akansu and Haddad, 2001). The eigenvectors fulfill

$$\mathbf{C}_s \mathbf{u}_k = \lambda_k \mathbf{u}_k. \tag{37}$$
Since C_s is symmetric and positive semidefinite, its eigenvalues λ_k are real and nonnegative. Its eigenvectors are orthogonal, and, since the eigenvalues
are real, the eigenvectors can always be found such that their elements are real (C_s also has complex eigenvectors, obtained by multiplying the real eigenvectors by a nonzero complex factor). The unitary transform matrix A is given by

$$\mathbf{A} = \begin{bmatrix} \mathbf{u}_0^T \\ \mathbf{u}_1^T \\ \vdots \\ \mathbf{u}_{N-1}^T \end{bmatrix}. \tag{38}$$

This transform is called the Karhunen–Loève transform (KLT) (Fukunaga, 1972; Therrien, 1989). The variances of the transform coefficients are given by the eigenvalues λ_k, since from Equation (35)

$$\sigma_S^2(k) = \mathbf{u}_k^T \mathbf{C}_s \mathbf{u}_k = \mathbf{u}_k^T \lambda_k \mathbf{u}_k = \lambda_k, \tag{39}$$

where we have considered only real eigenvectors. Also, since the eigenvectors are orthogonal, we have for the nondiagonal entries of the covariance matrix C_S

$$[\mathbf{C}_S]_{kl} = \mathbf{u}_k^T \mathbf{C}_s \mathbf{u}_l = \mathbf{u}_k^T \lambda_l \mathbf{u}_l = 0 \quad \text{for } k \neq l. \tag{40}$$
Hence, C_S is a diagonal matrix, and the transform coefficients are perfectly decorrelated. We constrain the eigenvectors to be real, and order them in Equation (38) by rank of their eigenvalues. Up to the sign of the eigenvectors, the KLT then is the unique unitary transform which minimizes the basis restriction error and perfectly diagonalizes the covariance matrix, provided the eigenvalues are all distinct. Also, invoking Hadamard's inequality, which states that the determinant of any symmetric, positive semidefinite matrix is less than or equal to the product of its diagonal elements, we obtain an additional measure for energy concentration: the determinant of a covariance matrix is always less than or equal to the product over all variances, i.e.,

$$\det[\mathbf{C}_s] = \det[\mathbf{C}_S] \leq \prod_{k=0}^{N-1} \sigma_S^2(k). \tag{41}$$

If C_S was obtained by the KLT, we have equality:

$$\det[\mathbf{C}_s] = \det[\mathbf{C}_S] = \prod_{k=0}^{N-1} \lambda_k = \prod_{k=0}^{N-1} \sigma_S^2(k). \tag{42}$$
Hence, the KLT minimizes the geometric mean of the variances (Zelinski and Noll, 1977; Goyal, 2001):

$$\sigma_{GM}^2 = \left[ \prod_{k=0}^{N-1} \sigma_S^2(k) \right]^{1/N} = \left[ \prod_{k=0}^{N-1} \lambda_k \right]^{1/N}. \tag{43}$$
As we will see later on, this measure is directly related to the distortion of a transform coder as a function of the rate. Although thus optimal in theory, the KLT has two drawbacks. First, it depends on the covariance structure of the data. Second, there is no general fast algorithm for computation of the KLT. Fortunately, as we will see subsequently, the KLT is in practice well approximated by sinusoidal transforms like the DCT and lapped transforms. Let us first examine how the DFT is related to the KLT. Rewriting the covariance matrix in Equation (29) as

$$\mathbf{C}_s = \begin{bmatrix} c(0) & c(1) & c(2) & \cdots & c(N-1) \\ c(1) & c(0) & c(1) & \cdots & c(N-2) \\ \vdots & & \ddots & & \vdots \\ c(N-1) & c(N-2) & c(N-3) & \cdots & c(0) \end{bmatrix} = \operatorname{toeplitz}[c_0, c_1, \ldots, c_{N-2}, c_{N-1}], \tag{44}$$

we form another symmetric Toeplitz matrix:

$$\mathbf{D}_s = \operatorname{toeplitz}[c_0, c_{N-1}, c_{N-2}, \ldots, c_1] = \begin{bmatrix} c(0) & c(N-1) & c(N-2) & \cdots & c(1) \\ c(N-1) & c(0) & c(N-1) & \cdots & c(2) \\ \vdots & & \ddots & & \vdots \\ c(1) & c(2) & c(3) & \cdots & c(0) \end{bmatrix}. \tag{45}$$
Similar to the decomposition of a signal s(n) into the sum s(n) = s_e(n) + s_o(n) of an even signal s_e(n) = 0.5[s(n) + s(−n)] and an odd signal s_o(n) = 0.5[s(n) − s(−n)], we can decompose the covariance matrix C_s into the sum of a circulant and a skew-circulant matrix (Unser, 1984). The circulant matrix is calculated by

$$\mathbf{E} = \frac{1}{2}[\mathbf{C}_s + \mathbf{D}_s] = \operatorname{toeplitz}[e_0, e_1, \ldots, e_{N-1}], \tag{46}$$
and the skew-circulant by

$$\mathbf{O} = \frac{1}{2}[\mathbf{C}_s - \mathbf{D}_s] = \operatorname{toeplitz}[o_0, o_1, \ldots, o_{N-1}]. \tag{47}$$

Evidently, e_0 = c_0 and o_0 = 0. The entries e_i and o_i, i = 1, ..., N−1, are related to c_i by

$$e_i = \frac{1}{2}[c_i + c_{N-i}] = e_{N-i} \quad \text{and} \quad o_i = \frac{1}{2}[c_i - c_{N-i}] = -o_{N-i}, \tag{48}$$

and the covariance matrix C_s is the sum

$$\mathbf{C}_s = \mathbf{E} + \mathbf{O}. \tag{49}$$
As shown in Unser (1984, Sect. 4), Therrien (1992, Sect. 4.7.2), and Akansu and Haddad (2001, p. 43), the basis functions of the unitary DFT form complex eigenvectors u_k of the circulant matrix E. Denoting the elements of u_k by u_k(n), we thus have

$$u_k(n) = \frac{1}{\sqrt{N}} \exp\left( j\frac{2\pi}{N}kn \right). \tag{50}$$

Similarly, the basis vectors of a related transform called the discrete odd Fourier transform are eigenvectors of O. The eigenvalues of E are then given by the DFT of its first row:

$$\lambda_k^E = \sum_{n=0}^{N-1} e_n \exp\left( -j\frac{2\pi}{N}kn \right), \qquad k = 0, \ldots, N-1. \tag{51}$$

Because of the symmetry e_n = e_{N-n}, n = 1, ..., N−1, this DFT is real and symmetric, that is, λ_k^E = λ_{N-k}^E, k = 1, ..., N−1. Therefore, eigenvectors with real elements can also be found for E, like the real or imaginary parts of Equation (50). The DFT in Equation (51) can be simplified to

$$\lambda_k^E = \sum_{n=0}^{N-1} e_n \cos\left( \frac{2\pi}{N}kn \right). \tag{52}$$
Recalling from Equation (29) that the elements of a covariance matrix are given by the samples of the ACF, and regarding E as a valid covariance
matrix, the eigenvalues λ_k^E can also be interpreted as power spectral coefficients. Although we have thus found fast KLTs for circulant and skew-circulant matrices, this does not generally solve for the KLT of the sum. We therefore now analyze a specific parametric covariance model, which is often used as an elementary approximation of the short-time behavior of s(n). Let w(n) denote zero-mean white noise with variance σ_w², which is stationary by definition. Its ACF is c_w(n) = σ_w² δ(n), and its covariance matrix is the N × N diagonal matrix C_w = diag[σ_w², σ_w², ..., σ_w²]. We model s(n) as the output of a first-order recursive LTI system with input w(n); s(n) then is also stationary and obeys s(n) = ρ s(n−1) + w(n), with |ρ| < 1. The transfer function H_DT(f) and impulse response h(n) of the LTI system are

$$H_{DT}(f) = \frac{1}{1 - \rho\, e^{-j2\pi f}}, \qquad h(n) = \varepsilon(n)\,\rho^n, \tag{53}$$

where ε(n) is the unit step sequence, i.e., ε(n) = 1 for n ≥ 0, and zero otherwise. The ACF of this first-order autoregressive (AR(1)) or Markov-I process is

$$c_s(n) = \sigma_s^2\, \rho^{|n|}, \quad \text{with } \sigma_s^2 = \frac{\sigma_w^2}{1 - \rho^2}. \tag{54}$$

The covariance matrix C_s then is

$$\mathbf{C}_s = \sigma_s^2\, \operatorname{toeplitz}[1, \rho, \rho^2, \ldots, \rho^{N-1}]. \tag{55}$$
The correlation between samples of s(n) decays exponentially with their distance, and ρ is the correlation between directly adjacent samples. In practice, approximation of the short-time and short-space behavior of speech and image signals, respectively, leads to ρ positive and close to one (Ahmed et al., 1974; Clarke and Tech, 1981; Clarke, 1985; Malvar, 1992b; Goyal, 2001; Akansu and Haddad, 2001). The eigenvectors of the covariance matrix are sinusoids (Ray and Driver, 1970; see also Clarke and Tech, 1981; Akansu and Haddad, 2001, p. 36), the frequencies of which are not equally spaced on the unit circle. No fast algorithm for computing this KLT exists. Fortunately, as shown numerically in Ahmed et al. (1974), the KLT for an AR(1) process with sufficiently large ρ is well
approximated by the DCT. Element n of basis vector k of the DCT is defined as

$$a_k(n) = \begin{cases} \sqrt{\dfrac{1}{N}} & \text{for } k = 0 \\[2ex] \sqrt{\dfrac{2}{N}}\, \cos\left( \dfrac{2n+1}{2N}\,k\pi \right) & \text{for } k = 1, \ldots, N-1 \end{cases} \qquad n = 0, \ldots, N-1. \tag{56}$$

For a visual comparison, Figures 5 and 6 depict the KLT basis functions for ρ = 0.91, N = 8, and the DCT basis vectors. Clarke proved analytically that the KLT of an AR(1) process approaches the DCT as ρ approaches one (Clarke and Tech, 1981). Moreover, the DCT of an N-point signal vector can be regarded as the 2N-point DFT of the concatenation of s(n) and the mirrored signal s(2N−1−n) (Clarke and Tech, 1981; Lim, 1990, p. 148). Periodic repetition of the concatenated signal is not afflicted with discontinuities between the periods, thus avoiding the spreading of spectral
FIGURE 5. Numerically computed KLT basis vectors of an AR(1) process for ρ = 0.91 and N = 8.
FIGURE 6. Basis vectors of the unitary DCT for N ¼ 8. Up to a sign, the similarity to the KLT basis vectors in Figure 5 is evident.
energy caused by the DFT leakage artifacts (Lim, 1990, p. 645). More details are given in Appendix C. Figure 5 also illustrates symmetry properties of the KLT: evidently, half of the eigenvectors are invariant to reversing the order of their elements; they are called (even) symmetric. For these vectors, we have u_i = J u_i, where J denotes the N × N counter-identity matrix (or reverse operator), with ones along the second diagonal and zeros elsewhere. For the other half, we have u_i = −J u_i; these vectors are skew-symmetric. In fact, for persymmetric matrices C with distinct eigenvalues and N even, half of the eigenvectors are symmetric, while the other half are skew-symmetric (Cantoni and Butler, 1976; Makhoul, 1981; Unser, 1984; Akansu and Haddad, 2001). The same symmetry properties hold for the DCT basis vectors, half of which are symmetric, while the other half are skew-symmetric. We will need this property for the construction of lapped transforms. Let us summarize the results of this section:
- The covariance matrix of a wide-sense stationary random signal is a persymmetric Toeplitz matrix.
- The orthogonal linear transform generating perfectly decorrelated transform coefficients from a wide-sense stationary signal is the KLT,
which is unique except for a sign in the eigenvectors if the eigenvectors are constrained to have only real elements. For an even number N of samples, half of the eigenvectors are symmetric, while the other half are skew-symmetric. Also, the KLT maximizes energy concentration as measured by the basis restriction error, and minimizes the geometric mean of the transform coefficient variances.
- The covariance matrix of a wide-sense stationary process can be decomposed into the sum of a circulant and a skew-circulant matrix. A KLT of the circulant matrix is the DFT.
- Real data can often be regarded as a first-order autoregressive (AR(1) or Markov-I) process with relatively high adjacent-sample correlation ρ. The KLT for this model is well approximated by the DCT. As the adjacent-sample correlation ρ approaches one, this KLT approaches the DCT.
For the AR(1) process with ρ = 0.91 and N = 8, the decorrelation efficiency of the DCT is η_d = 98.05% (for the KLT, η_d = 100% by design). The basis restriction errors are given in Table 1.
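These figures can be reproduced numerically. The sketch below (a NumPy illustration; the loose tolerance bounds in the comments are assumptions) builds the AR(1) covariance matrix of Equation (55) with σ² = 1, computes the KLT by eigendecomposition, and evaluates the decorrelation efficiency of Equation (33) for both the KLT and the DCT:

```python
import numpy as np

N, rho = 8, 0.91
# Eq. (55) with unit variance: C_s[m, n] = rho^{|m - n|}.
Cs = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

# KLT: rows are eigenvectors of Cs, ordered by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eigh(Cs)     # eigh returns ascending order
A_klt = eigvecs[:, ::-1].T

# Unitary DCT-II of Eq. (56).
n = np.arange(N)
A_dct = np.sqrt(2.0 / N) * np.cos((2 * n + 1)[None, :] * n[:, None] * np.pi / (2 * N))
A_dct[0, :] = np.sqrt(1.0 / N)

def decorrelation_efficiency(A, Cs):
    # Eq. (33): compare off-diagonal mass of C_S = A Cs A^T against C_s.
    CS = A @ Cs @ A.T
    off = ~np.eye(len(Cs), dtype=bool)
    return 1.0 - np.abs(CS[off]).sum() / np.abs(Cs[off]).sum()

print(decorrelation_efficiency(A_klt, Cs))  # 1.0 up to rounding errors
print(decorrelation_efficiency(A_dct, Cs))  # close to the 98% quoted above
```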
C. Transform Coding Performance

In this section we show how to distribute an allowable maximum bit rate optimally over the transform coefficients in Figure 4 such that the average distortion is minimized, and quantify the distortion. Since a unitary transform preserves Euclidean length, it is straightforward to show that the distortion introduced by quantization in the transform domain is the same as the mean square error of the reconstructed signal (Huang and Schultheiss, 1963; Zelinski and Noll, 1977). Denoting the quantized transform coefficient vector by Ŝ, and the reconstructed signal vector by ŝ, the average distortion is (cf. Equation (30))

$$D = \frac{1}{N} E\left[ (\mathbf{S} - \hat{\mathbf{S}})^H (\mathbf{S} - \hat{\mathbf{S}}) \right] = \frac{1}{N} E\left[ (\mathbf{s} - \hat{\mathbf{s}})^T (\mathbf{s} - \hat{\mathbf{s}}) \right]. \tag{57}$$
TABLE 1
BASIS RESTRICTION ERROR D_BR(L) (%) FOR KLT AND DCT

L      0     1      2     3     4     5     6     7
KLT    100   20.5   8.9   5.2   3.3   2.1   1.3   0.61
DCT    100   20.7   9.1   5.2   3.3   2.2   1.3   0.61
For sufficiently fine quantization, it is shown in Appendix B, Equation (119), that the distortion D(k) of the kth transform coefficient depends on the allocated bit rate R(k) by

$$D(k) = \gamma(k)\, \sigma_S^2(k)\, 2^{-2R(k)}. \tag{58}$$

The required bit rate for a given maximum distortion then is

$$R(k) = \frac{1}{2}\log_2[\gamma(k)] + \frac{1}{2}\log_2\left[ \frac{\sigma_S^2(k)}{D(k)} \right]. \tag{59}$$
The parameters γ(k) depend on the distribution of the coefficients and the type of quantization. Assuming a Gaussian signal, the transform coefficients are also Gaussian. (Transform coefficients perfectly decorrelated by a KLT are then also statistically independent.) The γ(k) are then all identical to a common value γ, and the rate simplifies to

$$R(k) = \frac{1}{2}\log_2[\gamma] + \frac{1}{2}\log_2\left[ \frac{\sigma_S^2(k)}{D(k)} \right]. \tag{60}$$
Minimizing the average distortion

$$D = \frac{1}{N}\sum_{k=0}^{N-1} D(k) \tag{61}$$

subject to a fixed average rate

$$R = \frac{1}{N}\sum_{k=0}^{N-1} R(k) \tag{62}$$
yields that all transform coefficients have to be quantized with the same distortion D(k) = D, k = 0, ..., N−1. The optimum bit rate for the kth transform coefficient is

$$R(k) = R + \frac{1}{2}\log_2\left[ \frac{\sigma_S^2(k)}{\sigma_{GM}^2} \right], \tag{63}$$

where σ_GM² is the geometric mean of the transform coefficient variances introduced in Equation (43). (Potential negative rates for low-variance coefficients may be clipped; see, e.g., Zelinski and Noll, 1977; Goyal, 2001.)
Inserting this result into Equation (58), and with D(k) = D, we obtain for the distortion as a function of rate, given optimal bit allocation,

$$D = 2^{-2R}\, \sigma_{GM}^2. \tag{64}$$

As we saw above, σ_GM² is minimized by the KLT; hence, the KLT is the transform minimizing the distortion under optimal bit allocation. To quantify the performance of a transform coder, the optimal transform coding distortion is compared to the distortion D_PCM of PCM. In the latter, the transform matrix can formally be set to the identity matrix. Then, the transform coefficients are identical to the signal samples, and the coefficient variances are identical to the signal variance σ². We thus obtain for the transform coding gain

$$G_{TC} = \frac{(1/N)\sum_{k=0}^{N-1} \sigma_S^2(k)}{\sigma_{GM}^2} = \frac{\sigma^2}{\sigma_{GM}^2}, \tag{65}$$

where the rightmost identity follows from the energy preservation property of unitary transforms. For an AR(1) process with ρ = 0.91 and N = 8, the transform coding gains of the DCT and KLT are 4.6334 (6.66 dB) and 4.668 (6.69 dB), respectively. Evidently, the DCT is a very good approximation to the KLT. Experiments show that this result also holds for covariance matrices estimated from real speech or image data (Zelinski and Noll, 1977; Malvar, 1992b; Clarke, 1985; Akansu and Haddad, 2001).
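The coding gain of Equation (65) for the DCT can be computed directly from the model covariance. The sketch below assumes the AR(1) model with σ² = 1; the printed value should land near the figure quoted above, but the code is an illustration rather than a reference implementation:

```python
import numpy as np

N, rho = 8, 0.91
# AR(1) covariance, Eq. (55), with unit signal variance.
Cs = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

# Unitary DCT-II, Eq. (56).
n = np.arange(N)
A = np.sqrt(2.0 / N) * np.cos((2 * n + 1)[None, :] * n[:, None] * np.pi / (2 * N))
A[0, :] = np.sqrt(1.0 / N)

variances = np.diag(A @ Cs @ A.T)            # sigma_S^2(k), cf. Eq. (35)
arithmetic_mean = variances.mean()           # equals sigma^2 = 1 here (Eq. (32))
geometric_mean = np.exp(np.log(variances).mean())

G_TC = arithmetic_mean / geometric_mean      # Eq. (65)
print(G_TC, 10 * np.log10(G_TC))             # about 4.63, i.e. roughly 6.7 dB
```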
IV. TWO-DIMENSIONAL TRANSFORMS

So far we have considered only 1D signals and their transformations. In this section we generalize to 2D signals. Let s(m, n) denote a real signal defined over the 2D block m, n = 0, ..., N−1, and S(k, l) the transform coefficients for k, l = 0, ..., N−1. (The restriction to square blocks involves no loss of generality and simplifies notation.) Signal samples and transform coefficients can be regarded as N × N matrices s and S, respectively. The basis vectors a_k = [a_k(0), ..., a_k(N−1)]^T, k = 0, ..., N−1, are then replaced by basis matrices b_kl = [b_kl]_mn. The transform coefficients are calculated by

$$S(k, l) = \sum_{m=0}^{N-1}\sum_{n=0}^{N-1} s(m, n)\, b_{kl}(m, n). \tag{66}$$
With a 4D transform tensor T, this can be expressed as (Malvar, 1992b, p. 22)

$$\mathbf{S} = \mathbf{T}\mathbf{s}, \quad \text{with } \mathbf{T} = [T_{klmn}] = [b_{kl}(m, n)]. \tag{67}$$

Alternatively, we can order the signal samples row by row into an N²-dimensional column vector s_v as

$$\mathbf{s}_v = [s(0,0),\, s(0,1),\, \ldots,\, s(0,N-1),\, s(1,0),\, \ldots,\, s(N-1,N-1)]^T. \tag{68}$$

Similarly, a transform coefficient vector S_v can be formed. Ordering the entries b_kl(m, n) appropriately in an N² × N² matrix B, we can express the 2D transform as the product of a matrix with a vector:

$$\mathbf{S}_v = \mathbf{B}\mathbf{s}_v. \tag{69}$$
Clearly, for real signals and transforms, this product requires O(N⁴) multiplications and additions. In practice, however, so-called separable 2D transforms are used almost exclusively. The entries b_kl(m, n) of the (k+1), (l+1)th basis matrix of a separable transform are calculated from 1D basis vector entries by b_kl(m, n) = a_k(m) a_l(n). For the unitary 2D DFT, this yields

$$b_{kl}(m, n) = \frac{1}{N}\, e^{-j(2\pi/N)(km + ln)}, \tag{70}$$

and for the 2D DCT, we obtain

$$b_{kl}(m, n) = \begin{cases} \dfrac{1}{N} & \text{for } k = l = 0 \\[1.5ex] \dfrac{\sqrt{2}}{N}\cos\left( \dfrac{2m+1}{2N}\,k\pi \right) & \text{for } l = 0,\ k = 1, \ldots, N-1 \\[1.5ex] \dfrac{\sqrt{2}}{N}\cos\left( \dfrac{2n+1}{2N}\,l\pi \right) & \text{for } k = 0,\ l = 1, \ldots, N-1 \\[1.5ex] \dfrac{2}{N}\cos\left( \dfrac{2m+1}{2N}\,k\pi \right)\cos\left( \dfrac{2n+1}{2N}\,l\pi \right) & \text{for } k, l = 1, \ldots, N-1 \end{cases}$$

$$m, n = 0, \ldots, N-1. \tag{71}$$
The matrix B in Equation (69) can then be written as the Kronecker product of the N × N transform matrix A for a 1D signal of length N with itself:

$$\mathbf{B} = \mathbf{A} \otimes \mathbf{A} = \begin{bmatrix} a_{00}\mathbf{A} & a_{01}\mathbf{A} & \cdots & a_{0,N-1}\mathbf{A} \\ \vdots & & & \vdots \\ a_{N-1,0}\mathbf{A} & a_{N-1,1}\mathbf{A} & \cdots & a_{N-1,N-1}\mathbf{A} \end{bmatrix}. \tag{72}$$
In the tensor notation of Equation (67), the transform simplifies to the product of three N × N matrices:

$$\mathbf{S} = \mathbf{A}\,\mathbf{s}\,\mathbf{A}^T, \tag{73}$$
where the multiplication from the right by A^T is a transform of the rows of s, while the multiplication from the left by A transforms the columns. The 2D transform can hence be realized by a 1D transform along each row of the signal block, followed by a 1D transform along each column of the result, or vice versa. Evidently, the number of multiplications and additions needed by Equation (73) is O(N³), down from O(N⁴) for nonseparable transforms, if no fast algorithms are used. As an illustration, Figure 7 depicts the real part of a basis matrix for the DFT computed from Equation (70), and a basis matrix for the DCT according to Equation (71). A comparison shows that in the case of a real transform, separability comes at a price: while the DFT basis matrix exhibits an unambiguous orientation, this is not the case for the DCT, which consists of two cosine waves with different orientations. The separable 2D DFT is
FIGURE 7. Left: Real part of a basis matrix of the 2D DFT for N ¼ 16, k ¼ l ¼ 2. Right: 2D DCT basis matrix for N ¼ 16, k ¼ l ¼ 4.
therefore unambiguously orientation selective, while the separable 2D DCT basis matrices are sensitive to two different orientations. Unambiguous orientation selectivity is desired in applications like adaptive enhancement of oriented structures, such as lines and edges. More on this topic can be found in Section V.E, and in Kunz and Aach (1999) and Aach and Kunz (1996a, 2000). In the following, we will consider only separable transforms. Since these can always be implemented as a sequence of 1D transforms, we will return to the 1D notation for the remainder of this chapter.
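The equivalence of the separable form (73) and the stacked form (69) can be checked numerically. The sketch below (a NumPy illustration using the unitary DCT-II as the 1D transform A and a random test block) relies on the row-major vectorization of Equation (68):

```python
import numpy as np

N = 8
n = np.arange(N)
# Unitary DCT-II of Eq. (56) as the 1D transform A.
A = np.sqrt(2.0 / N) * np.cos((2 * n + 1)[None, :] * n[:, None] * np.pi / (2 * N))
A[0, :] = np.sqrt(1.0 / N)

rng = np.random.default_rng(1)
s = rng.standard_normal((N, N))          # one 2D signal block

S_separable = A @ s @ A.T                # Eq. (73): rows then columns
# Eq. (69) with B = A kron A, Eq. (72); s.ravel() is the row-major
# ordering of Eq. (68).
S_stacked = (np.kron(A, A) @ s.ravel()).reshape(N, N)

assert np.allclose(S_separable, S_stacked)
```

The separable form needs O(N³) operations against O(N⁴) for the matrix–vector form, which is why it is used almost exclusively in practice.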
V. LAPPED TRANSFORMS

A. Block Diagonal Transforms

In the preceding discussion it was sufficient to express the transform operations with respect to single blocks. The development of lapped transforms will require the joint consideration of several neighboring blocks. Denoting the (m+1)th block by s_m = [s(mN), s(mN+1), ..., s(mN+N−1)]^T, a signal s_t consisting of M blocks can be written as s_t^T = [s_0^T, s_1^T, ..., s_{M−1}^T]. Similarly, with S_m = A s_m being the transform coefficients for the (m+1)th block, and stacking these, we obtain

$$\mathbf{S}_t = \begin{bmatrix} \mathbf{S}_0 \\ \mathbf{S}_1 \\ \vdots \\ \mathbf{S}_{M-1} \end{bmatrix} = \begin{bmatrix} \mathbf{A} & & & \mathbf{0} \\ & \mathbf{A} & & \\ & & \ddots & \\ \mathbf{0} & & & \mathbf{A} \end{bmatrix} \begin{bmatrix} \mathbf{s}_0 \\ \mathbf{s}_1 \\ \vdots \\ \mathbf{s}_{M-1} \end{bmatrix} = \mathbf{T}\,\mathbf{s}_t, \tag{74}$$
where the matrix T = diag(A, ..., A) is block-diagonal. The inverse transform is given by

$$\mathbf{s}_t = \mathbf{T}^T \mathbf{S}_t, \quad \text{with } \mathbf{T}^T = \operatorname{diag}(\mathbf{A}^T, \ldots, \mathbf{A}^T), \tag{75}$$
where we have assumed a real transform. Evidently, orthogonality of the blockwise transform can also be expressed as orthogonality of the transform matrix T.
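This observation is easy to verify numerically. The sketch below (a NumPy illustration; the choices N = 4, M = 3, and the DCT as the block transform A are assumptions) builds T = diag(A, ..., A) as a Kronecker product and checks that it is orthogonal and acts blockwise:

```python
import numpy as np

N, M = 4, 3
n = np.arange(N)
# Unitary DCT-II as the block transform; any orthogonal A would do.
A = np.sqrt(2.0 / N) * np.cos((2 * n + 1)[None, :] * n[:, None] * np.pi / (2 * N))
A[0, :] = np.sqrt(1.0 / N)

# T = diag(A, A, A) of Eq. (74), built as I_M kron A.
T = np.kron(np.eye(M), A)

# Orthogonality of A carries over to the block-diagonal T.
assert np.allclose(T @ T.T, np.eye(M * N))

# Applying T to the stacked signal equals transforming each block with A.
s = np.arange(M * N, dtype=float)
blockwise = np.concatenate([A @ s[m * N:(m + 1) * N] for m in range(M)])
assert np.allclose(T @ s, blockwise)
```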
B. Extension to Lapped Transforms

As already shown in Figure 1, independent block processing may create artifacts at the block boundaries. These are caused by the discontinuous transitions to zero at the ends of the transform basis functions (Malvar, 1992b; Aach and Kunz, 2000). Block artifacts could hence be avoided by using basis functions that decay smoothly to zero. Perfect reconstruction by an inverse transform then requires that the basis functions of neighboring blocks overlap, as otherwise "holes" would appear in the reconstructed signal. The basis functions would thus have lengths L > N, while the number of transform coefficients per block must, of course, not exceed N. The square matrix A is then replaced by a nonsquare matrix P of size N × L. We consider now L = 2N. The basis functions for calculating S_m then extend over the blocks s_m and s_{m+1}, i.e., over the samples [s(mN), s(mN+1), ..., s((m+2)N−1)]. The N-dimensional vector S_m of transform coefficients is then given by

$$\mathbf{S}_m = \mathbf{P} \begin{bmatrix} \mathbf{s}_m \\ \mathbf{s}_{m+1} \end{bmatrix}. \tag{76}$$
The next block is taken over the samples [s((m+1)N), ..., s((m+3)N−1)], and so on. This procedure is illustrated in Figure 8. Such a transform is called a lapped transform. Since P is not a square matrix, we cannot invert Equation (76) to obtain s_m and s_{m+1} from S_m. We therefore formulate the transform with respect to the entire signal (or image). Writing the N × 2N matrix P as the concatenation P = [A B] of two N × N matrices, Equation (74) becomes for a lapped transform

$$ S_t = \begin{bmatrix} S_0 \\ S_1 \\ \vdots \\ \vdots \\ S_{M-1} \end{bmatrix} = \begin{bmatrix} A & B & 0 & \cdots & & 0 \\ 0 & A & B & 0 & & \\ & 0 & A & B & & \\ & & & \ddots & \ddots & \\ B & 0 & \cdots & & 0 & A \end{bmatrix} \begin{bmatrix} s_0 \\ s_1 \\ \vdots \\ \vdots \\ s_{M-1} \end{bmatrix} = T\, s_t , \qquad (77) $$
where the wrap-around in the last row corresponds to a periodic repetition of the signal.

TIL AACH

FIGURE 8. Formation of signal blocks s_m and transform vectors S_m in a lapped transform with basis functions of length L = 2N.

As in block transforms, T is a square matrix, which we require to be orthogonal, i.e., T T^T = I. The original image can thus be reconstructed by

$$ s_t = T^T S_t = \begin{bmatrix} A^T & 0 & \cdots & 0 & B^T \\ B^T & A^T & 0 & & 0 \\ 0 & B^T & A^T & & \\ \vdots & & \ddots & \ddots & \\ 0 & \cdots & 0 & B^T & A^T \end{bmatrix} S_t . \qquad (78) $$
This relation shows that the inverse transform consists of two steps. First, each N-dimensional transform vector S_m is multiplied by the 2N × N matrix P^T, yielding a 2N-dimensional signal vector. Neighboring signal vectors overlap by N samples, and are added in a second step to obtain the reconstructed image. Alternatively, Equation (78) may be regarded as another lapped transform applied to the data S_t, yielding

$$ s_m = [\,B^T\ A^T\,] \begin{bmatrix} S_{m-1} \\ S_m \end{bmatrix} = B^T S_{m-1} + A^T S_m , \qquad (79) $$
which is of the same structure as Equation (76).

C. The Lapped Orthogonal Transform

The matrix product T T^T yields a block tridiagonal matrix, with entries P P^T along the main diagonal, entries A B^T along the diagonal immediately to the left, and entries B A^T along the diagonal immediately to the right. From the orthogonality condition T T^T = I, the necessary and sufficient conditions on P = [A B] therefore are

$$ P P^T = A A^T + B B^T = I \quad \text{and} \quad A B^T = B A^T = 0 . \qquad (80) $$
For T T^T = I, we can equivalently write T^T T = I, from which an alternative formulation of the necessary and sufficient condition can be derived:

$$ A^T A + B^T B = I \quad \text{and} \quad A^T B = B^T A = 0 . \qquad (81) $$
We may also approach the orthogonality conditions by rewriting Equation (76) to

$$ S_m = P \begin{bmatrix} s_m \\ s_{m+1} \end{bmatrix} = [\,A\ B\,] \begin{bmatrix} s_m \\ s_{m+1} \end{bmatrix} . \qquad (82) $$
Inserting this into Equation (79), we obtain

$$ s_m = B^T A\, s_{m-1} + A^T B\, s_{m+1} + (A^T A + B^T B)\, s_m . \qquad (83) $$
This equality holds with condition (81). The first condition in Equation (80) states that the rows of P, i.e., the transform basis functions, must be orthogonal, while the second condition requires the overlapping parts of the basis functions to be orthogonal as well. A transform complying with Equation (80) is called a lapped orthogonal transform (LOT). Invoking the shift matrix V defined as

$$ V = \begin{bmatrix} 0 & I \\ 0 & 0 \end{bmatrix} , \qquad (84) $$

where 0 and I are of size N × N, conditions (80) can be more compactly written as

$$ P V^m P^T = \delta(m)\, I, \quad m = 0, 1 . \qquad (85) $$
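As a hedged numerical aside (not from the text): any orthogonal block transform is the degenerate lapped transform with B = 0, and this already satisfies conditions (80) and (85). A NumPy sketch, with all names ad hoc:

```python
import numpy as np

N, M = 4, 5
# Degenerate illustration: with B = 0 and A orthogonal, P = [A 0] reduces to an
# ordinary block transform but still satisfies the LOT conditions (80).
A = np.linalg.qr(np.random.default_rng(1).standard_normal((N, N)))[0]
B = np.zeros((N, N))
P = np.hstack([A, B])

Z0 = np.zeros((N, N))
V = np.block([[Z0, np.eye(N)], [Z0, Z0]])   # shift matrix V, Eq. (84)
assert np.allclose(P @ P.T, np.eye(N))      # Eq. (85), m = 0
assert np.allclose(P @ V @ P.T, Z0)         # Eq. (85), m = 1

# Banded transform matrix of Eq. (77), wrap-around in the last block row:
T = np.zeros((M * N, M * N))
for m in range(M):
    T[m*N:(m+1)*N, m*N:(m+1)*N] = A
    T[m*N:(m+1)*N, ((m+1) % M)*N:((m+1) % M)*N + N] += B
s = np.random.default_rng(2).standard_normal(M * N)
assert np.allclose(T.T @ (T @ s), s)        # orthogonality gives perfect reconstruction
```

The same loop builds the banded T for any valid P = [A B]; only the conditions on A and B change.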
Extending the above considerations towards lapped transforms with basis functions of lengths L = KN, K = 2, 3, ..., the matrix P then has size N × L, and condition (85) becomes (Malvar and Staelin, 1989; Malvar, 1992b)

$$ P V^m P^T = \delta(m)\, I, \quad m = 0, 1, 2, \ldots, K-1 , \qquad (86) $$
where the identity matrix in the shift matrix V is now of order (K−1)N. Of course, for K = 1 this notation includes traditional nonoverlapping block transforms as a special case. If P_0 is a valid LOT matrix, it can be used to generate more valid LOT matrices P by P = Z P_0, where Z is an orthogonal N × N matrix. P will then also comply with condition (86), since

$$ P V^m P^T = Z P_0 V^m P_0^T Z^T = Z\, \delta(m) I\, Z^T = \delta(m)\, I . \qquad (87) $$
In the following, we construct a valid LOT of order N with basis functions of length L = 2N. To obtain a transform which can be realized by a fast algorithm, the initial matrix P_0 is constructed from the unitary DCT basis functions of length N. As we have seen in Section III.B, half of the DCT basis functions are even symmetric, while the other half are odd. Stacking the even basis functions rowwise into the (N/2) × N matrix D_e, and the odd ones into the matrix D_o, a valid LOT matrix is (Malvar, 1992a; Akansu and Wadas, 1992; Akansu and Haddad, 2001)

$$ P_0 = \frac{1}{2} \begin{bmatrix} D_e - D_o & (D_e - D_o)\,J \\ D_e - D_o & -(D_e - D_o)\,J \end{bmatrix} , \qquad (88) $$
where J is the counter identity matrix (or reverse operator) already used in Section III.B. The matrix P_0 is of size N × 2N, where, similar to KLT and DCT, the basis functions in the first N/2 rows are even, while the other N/2 basis functions are odd. It satisfies condition (86), but it will not optimize the transform coding gain of, for example, an AR(1) process. Hence, for a given covariance model C_s, the orthogonal square matrix Z is determined such that its rows are identical to the eigenvectors of the covariance matrix P_0 C_s P_0^T. The LOT Z P_0 thus consists of two steps: a transform by P_0 followed by another transform by Z. The covariance matrix C_S = Z P_0 C_s P_0^T Z^T then is diagonal. Note, however, that the LOT does not preserve the determinant of the covariance matrix. Figure 9 shows the basis functions for N = 8 and L = 16 as computed for an AR(1) process with ρ = 0.91. The coding gain of this LOT is 5.06 (7.05 dB). The fast implementation of this transform reflects its two-step structure (Malvar, 1992b): the matrix P_0 is realized using an N-point DCT, which is followed by a series of plane rotations used to approximate Z (Akansu and Haddad, 2001; Akansu and Wadas, 1992). The numerical values of the basis functions of the approximate LOT can be found for ρ = 0.95 in Malvar (1992b, p. 171).
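The two-step construction above can be sketched numerically (an illustration, not the fast implementation; the coding gain is computed here as the ratio of the arithmetic to the geometric mean of the coefficient variances, and all names are ad hoc):

```python
import numpy as np

N = 8
n = np.arange(N)
D = np.sqrt(2.0 / N) * np.cos(np.pi * np.outer(n, n + 0.5) / N)
D[0, :] /= np.sqrt(2.0)                    # orthonormal DCT-II; rows = basis functions
De, Do = D[0::2, :], D[1::2, :]            # even- and odd-symmetric rows
J = np.fliplr(np.eye(N))

P0 = 0.5 * np.block([[De - Do, (De - Do) @ J],
                     [De - Do, -(De - Do) @ J]])    # Eq. (88)
A, B = P0[:, :N], P0[:, N:]
assert np.allclose(A @ A.T + B @ B.T, np.eye(N))    # Eq. (80)
assert np.allclose(A @ B.T, 0)

rho = 0.91                                 # AR(1) model, as in the text
Cs = rho ** np.abs(np.subtract.outer(np.arange(2 * N), np.arange(2 * N)))
w, V = np.linalg.eigh(P0 @ Cs @ P0.T)
Z = V[:, ::-1].T                           # rows = eigenvectors, descending eigenvalues
P = Z @ P0                                 # the optimized LOT
var = np.diag(P @ Cs @ P.T)
gain = var.mean() / np.exp(np.log(var).mean())      # transform coding gain
# The text reports a coding gain of 5.06 (7.05 dB) for this model.
```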
FIGURE 9. Basis functions of the LOT for N = 8 and L = 16. The computation of the basis functions is based on an AR(1) signal model with ρ = 0.91. The functions are sorted from left to right and top to bottom in descending order of the eigenvalues of P_0 C_s P_0^T.
D. The Modulated Lapped Transform

The above LOT was derived by an eigenvector analysis, leading to basis functions with even or odd symmetry. An alternative approach is motivated by the close relationship between maximally decimated filter banks on the one hand and block and lapped transforms on the other (Akansu and Haddad, 2001, p. 4; Malvar, 1992b). In filter banks, the filters are often realized by a low-pass prototype shifted to N different frequency channels by modulation. In the context of lapped transforms, this leads to the so-called modulated lapped transform (MLT) if the filter length L is equal to 2N. For longer filters (or basis functions), this transform is referred to as the extended lapped transform (ELT). For L = 2N, the basis functions are formed by a cosine-modulated window function h(n), leading to the N × 2N transform matrix P with entries

$$ [P]_{kn} = h(n) \sqrt{\frac{2}{N}} \cos\left[ \left( n + \frac{N+1}{2} \right) \left( k + \frac{1}{2} \right) \frac{\pi}{N} \right] , \qquad (89) $$
FIGURE 10. Basis functions of the MLT for N ¼ 8 and L ¼ 16. The frequency index k increases from left to right and top to bottom.
for k = 0, ..., N−1, and n = 0, ..., 2N−1. The window h(n) obeys

$$ h(n) = \sin\left[ \left( n + \frac{1}{2} \right) \frac{\pi}{2N} \right] . \qquad (90) $$
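Equations (89) and (90) are easily checked numerically; the sketch below (illustrative, NumPy-based) builds the MLT matrix and verifies the orthogonality conditions (81):

```python
import numpy as np

N = 8
n = np.arange(2 * N)
k = np.arange(N)[:, None]
h = np.sin((n + 0.5) * np.pi / (2 * N))                    # half-sine window, Eq. (90)
P = h[None, :] * np.sqrt(2.0 / N) * np.cos(
    np.pi / N * (n[None, :] + (N + 1) / 2) * (k + 0.5))    # MLT basis, Eq. (89)
A, B = P[:, :N], P[:, N:]
assert np.allclose(A.T @ A + B.T @ B, np.eye(N))           # Eq. (81)
assert np.allclose(A.T @ B, 0)
```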
These basis functions are shown in Figure 10. Evidently, they are not symmetric any more. Still, the half-sine window ensures a continuous transition towards zero at the ends of the basis functions. In the following, we will show that this choice of basis functions complies with the orthogonality conditions (81). The window function obeys the conditions

$$ h^2(n) + h^2(n+N) = 1 \qquad (91) $$

and

$$ h(n) = h(2N-1-n) . \qquad (92) $$
Arranging the window samples into two diagonal N × N matrices H_0 and H_1, we obtain

$$ \begin{aligned} H_0 &= \mathrm{diag}[h(0), h(1), \ldots, h(N-1)] \\ H_1 &= \mathrm{diag}[h(N), h(N+1), \ldots, h(2N-1)] \\ &= \mathrm{diag}[h(N-1), h(N-2), \ldots, h(0)] = J H_0 J , \end{aligned} \qquad (93) $$
where J H_0 J reverses both rows and columns of H_0. The modulating cosines are arranged into the N × N matrices Q_0 and Q_1, yielding

$$ [Q_0]_{kn} = \sqrt{\frac{2}{N}} \cos\left[ \left( n + \frac{N+1}{2} \right) \left( k + \frac{1}{2} \right) \frac{\pi}{N} \right] , \quad k, n = 0, \ldots, N-1 \qquad (94) $$

and

$$ [Q_1]_{kn} = \sqrt{\frac{2}{N}} \cos\left[ \left( n + N + \frac{N+1}{2} \right) \left( k + \frac{1}{2} \right) \frac{\pi}{N} \right] , \quad k, n = 0, \ldots, N-1 . \qquad (95) $$
Expressing the transformation matrix P as the concatenation P = [A B], we obtain

$$ A = Q_0 H_0 \quad \text{and} \quad B = Q_1 H_1 . \qquad (96) $$
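The factorization (96), together with Equation (93), can be verified numerically; the following NumPy sketch (illustrative only, names ad hoc) rebuilds the MLT matrix and its factors:

```python
import numpy as np

N = 8
nn = np.arange(2 * N)
k = np.arange(N)[:, None]
h = np.sin((nn + 0.5) * np.pi / (2 * N))
P = h[None, :] * np.sqrt(2.0 / N) * np.cos(
    np.pi / N * (nn[None, :] + (N + 1) / 2) * (k + 0.5))   # MLT matrix, Eq. (89)

n = np.arange(N)[None, :]
Q0 = np.sqrt(2.0 / N) * np.cos(np.pi / N * (n + (N + 1) / 2) * (k + 0.5))      # Eq. (94)
Q1 = np.sqrt(2.0 / N) * np.cos(np.pi / N * (n + N + (N + 1) / 2) * (k + 0.5))  # Eq. (95)
H0, H1 = np.diag(h[:N]), np.diag(h[N:])                                        # Eq. (93)
assert np.allclose(P[:, :N], Q0 @ H0)                      # A = Q0 H0
assert np.allclose(P[:, N:], Q1 @ H1)                      # B = Q1 H1
assert np.allclose(H1, np.fliplr(np.flipud(H0)))           # H1 = J H0 J
```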
For Q_0 and Q_1, the conditions

$$ Q_0^T Q_1 = Q_1^T Q_0 = 0 , \qquad (97) $$

$$ Q_0^T Q_0 = Q_0 Q_0^T = I - J \qquad (98) $$

and

$$ Q_1^T Q_1 = Q_1 Q_1^T = I + J \qquad (99) $$
hold (see Appendix D). Inserting these into condition (81), we obtain

$$ A^T B = H_0 Q_0^T Q_1 H_1 = 0 \qquad (100) $$

and

$$ \begin{aligned} A^T A + B^T B &= H_0 Q_0^T Q_0 H_0 + H_1 Q_1^T Q_1 H_1 \\ &= H_0 [I - J] H_0 + H_1 [I + J] H_1 \\ &= H_0^2 + H_1^2 - H_0 J H_0 + H_1 J H_1 = I , \end{aligned} \qquad (101) $$

since H_0^2 + H_1^2 = I and H_0 J H_0 = H_1 J H_1. This shows that the MLT complies with the orthogonality conditions for the LOT. For the AR(1) model with ρ = 0.91, the MLT coding gain is 5.15 (7.12 dB).

E. Extensions

In this section we discuss three extensions of the MLT and the LOT by introducing additional basis functions which are in a certain sense complementary to the already existing ones. In the MLT, reconstruction from the transform vector S_m = P [s_m^T, s_{m+1}^T]^T only leads to

$$ \begin{bmatrix} \hat{s}_m \\ \hat{s}_{m+1} \end{bmatrix} = P^T S_m = \begin{bmatrix} A^T \\ B^T \end{bmatrix} [\,A\ B\,] \begin{bmatrix} s_m \\ s_{m+1} \end{bmatrix} . \qquad (102) $$
With Equations (96), (98), and (99), we obtain

$$ \begin{bmatrix} A^T \\ B^T \end{bmatrix} [\,A\ B\,] = \begin{bmatrix} A^T A & A^T B \\ B^T A & B^T B \end{bmatrix} = \begin{bmatrix} H_0 (I - J) H_0 & 0 \\ 0 & H_1 (I + J) H_1 \end{bmatrix} , \qquad (103) $$

where 0 is of size N × N. This matrix is evidently not diagonal, thus mixing coefficients from s_m with different time indices into one coefficient of the reconstructed vector ŝ_m, and similarly for ŝ_{m+1}. By analogy to frequency-domain aliasing, where higher frequencies are mapped back onto lower ones during downsampling, this phenomenon is called time-domain aliasing. Since the MLT perfectly reconstructs the entire signal s_t by adding the overlapping signals obtained by individual inverse transforms, time-domain aliasing in the reconstruction from S_m is hence canceled by the reconstructions from S_{m−1} and S_{m+1} (time-domain aliasing cancellation, TDAC). This observation holds if the transform vectors S_i are left unchanged. Frequency-domain processing of the transform coefficients unbalances the time-domain aliasing components contained in the ŝ_i, thus resulting in uncanceled time-domain aliasing in the reconstructed signal. In general, the more strongly the transform coefficients are changed during processing, the larger these uncanceled aliasing components become. Keeping uncanceled aliasing below an acceptable threshold hence restricts how strongly the transform coefficients can be processed. As an example, acoustic echo cancellation using the MLT is mentioned in Malvar (1999), where the occurrence of uncanceled aliasing limits the maximum echo reduction to no more than 10 dB. In Malvar (1999) the MLT is therefore extended by replacing the real basis functions by complex ones defined as

$$ [P]_{kn} = h(n) \sqrt{\frac{2}{N}} \exp\left[ -j \left( n + \frac{N+1}{2} \right) \left( k + \frac{1}{2} \right) \frac{\pi}{N} \right] . \qquad (104) $$
The resulting transform is called the modulated complex lapped transform (MCLT). The inverse transform is carried out by the Hermitian transpose P^H, yielding for Equation (102)

$$ \begin{bmatrix} \hat{s}_m \\ \hat{s}_{m+1} \end{bmatrix} = P^H S_m = P^H P \begin{bmatrix} s_m \\ s_{m+1} \end{bmatrix} \qquad (105) $$

with (Malvar, 1999)

$$ P^H P = \mathrm{diag}\left[ h^2(n) \right] = \begin{bmatrix} H_0^2 & 0 \\ 0 & H_1^2 \end{bmatrix} , \qquad (106) $$
which is a diagonal matrix. Time-domain aliasing therefore does not occur, which allows a stronger degree of processing. Superposition of the reconstructed signal vectors only compensates for the effects of the window h(n). In Malvar (1999) the MCLT permits one to reduce echo by 20 dB, compared to only 5 dB with the MLT. The price to pay is a redundancy by a factor of two, since the MCLT transforms N real signal samples into N complex transform coefficients. In Young and Kingsbury (1993) a similar extension, termed the complex lapped transform (CLT), is proposed for the 2D LOT. The objective is to estimate motion in image sequences by phase correlation between blocks. Since the use of a lapped transform implies smoothly windowed overlapping blocks, smoother motion fields are expected in comparison to motion estimation techniques using nonoverlapping blocks. The transform generates a redundancy by a factor of two in each dimension, resulting in a total redundancy of four. We finally discuss an extension of the 2D MLT which makes the transform unambiguously orientation sensitive. As we have seen in Section IV, the basis functions of real separable 2D transforms are sensitive to two different orientations. For image enhancement and restoration,
however, unambiguous detection of orientated structures is often desired. This can be achieved by complementing the cosine-shaped basis functions by sine-shaped ones. The basis functions of the separable 2D MLT are given by

$$ [P]_{klmn} = \frac{2 h(n) h(m)}{N} \cos\left[ \left( m + \frac{N+1}{2} \right)\left( k + \frac{1}{2} \right) \frac{\pi}{N} \right] \cos\left[ \left( n + \frac{N+1}{2} \right)\left( l + \frac{1}{2} \right) \frac{\pi}{N} \right] , \qquad (107) $$
where k, l = 0, ..., N−1 and m, n = 0, ..., 2N−1. Replacing the cosine functions by sine functions leads to the complementary basis functions

$$ [P']_{klmn} = \frac{2 h(n) h(m)}{N} \sin\left[ \left( m + \frac{N+1}{2} \right)\left( k + \frac{1}{2} \right) \frac{\pi}{N} \right] \sin\left[ \left( n + \frac{N+1}{2} \right)\left( l + \frac{1}{2} \right) \frac{\pi}{N} \right] . \qquad (108) $$
The basis functions of the new, orientation-selective transform are formed by

$$ [P_{L+}]_{klmn} = [P]_{klmn} + [P']_{klmn} = \frac{2 h(n) h(m)}{N} \cos\left[ \frac{\pi}{N} \left( \left( m + \frac{N+1}{2} \right)\left( k + \frac{1}{2} \right) - \left( n + \frac{N+1}{2} \right)\left( l + \frac{1}{2} \right) \right) \right] , \qquad (109) $$
which is an unambiguously orientated windowed cosine wave. Since the [P_{L+}]_{klmn} cover only half the possible orientations, we form additionally the basis functions

$$ [P_{L-}]_{klmn} = [P]_{klmn} - [P']_{klmn} = \frac{2 h(n) h(m)}{N} \cos\left[ \frac{\pi}{N} \left( \left( m + \frac{N+1}{2} \right)\left( k + \frac{1}{2} \right) + \left( n + \frac{N+1}{2} \right)\left( l + \frac{1}{2} \right) \right) \right] . \qquad (110) $$
This transform is termed the lapped directional transform (LDT) (Kunz and Aach, 1999; Aach and Kunz, 2000). The relation between the MLT and LDT basis functions is illustrated in Figure 11. The LDT is real-valued, but not separable. However, both forward and inverse LDT can be computed from the separable fast MLTs in Equations (107) and (108). The LDT generates a redundancy by a factor of only two, and was successfully used for anisotropic image restoration and enhancement in Kunz and Aach (1999) and Aach and Kunz (2000). In a combined image restoration, enhancement, and compression framework, the processed LDT coefficients can be reconverted into the coefficients of the MLT and the complementary MLT by a simple butterfly. Using only the MLT coefficients for compression eliminates the redundancy problem (Aach and Kunz, 2000).

FIGURE 11. Example 2D MLT and LDT basis functions for N = 8, and k = 3, l = 2: (a) MLT, (b) MLT′, (c) LDT (sum), (d) LDT (difference).
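The trigonometric identity behind Equations (109) and (110) — cos x cos y ± sin x sin y = cos(x ∓ y) — can be checked numerically; the sketch below (illustrative, names ad hoc) forms the 2D basis functions for one (k, l) pair:

```python
import numpy as np

N = 8
m = np.arange(2 * N)
h = np.sin((m + 0.5) * np.pi / (2 * N))
k, l = 3, 2                                   # frequency pair, as in Figure 11
a = np.pi / N * (m + (N + 1) / 2)
w = (2.0 / N) * np.outer(h, h)
cos2d = np.outer(np.cos(a * (k + 0.5)), np.cos(a * (l + 0.5)))   # 2D MLT, Eq. (107)
sin2d = np.outer(np.sin(a * (k + 0.5)), np.sin(a * (l + 0.5)))   # complement, Eq. (108)
# Eqs. (109)/(110): sum and difference collapse into single oriented waves.
P_plus = w * np.cos(a[:, None] * (k + 0.5) - a[None, :] * (l + 0.5))
P_minus = w * np.cos(a[:, None] * (k + 0.5) + a[None, :] * (l + 0.5))
assert np.allclose(w * (cos2d + sin2d), P_plus)
assert np.allclose(w * (cos2d - sin2d), P_minus)
```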
VI. IMAGE RESTORATION AND ENHANCEMENT

In this section we compare the block FFT and the LDT within a framework for anisotropic noise reduction by a nonlinear spectral domain filter. The noisy input image is first decomposed into blocks of size 32 × 32 pixels, which are then transformed by the FFT or the LDT. The observed noisy transform coefficients are then attenuated depending on their observed signal-to-noise ratio: the more the magnitude of a coefficient exceeds a corresponding noise estimate, the less it is attenuated. Since directional image information leads to spectral energy concentration, which can unambiguously be detected in both FFT and LDT (but not in real separable transforms, like DCT, LOT, and MLT), coefficients contributing to oriented lines and edges can be identified and treated more carefully than other ones. These algorithms are discussed in detail elsewhere (Aach and Kunz, 1996a, 1998, 2000; Aach, 2000; Kunz and Aach, 1999). Figure 12 shows an original image and its noisy version (white Gaussian noise, peak signal-to-noise ratio 20.2 dB). The processed images are shown in Figure 13. Evidently, processing by the FFT without block overlap reduces the noise level visibly, but the rather strong processing causes the block raster to appear. (In Aach and Kunz (1996a,b) the authors therefore used overlapping blocks, inflating the processed data volume by a factor of four.) The LDT-based processing result reduces noise approximately as much as the FFT-based approach, i.e., by about 6 dB, without causing block artifacts. Enlargements of both processing results are shown in Figure 14.

FIGURE 12. Left: Original "Marcel" image. Right: Noisy version with a peak signal-to-noise ratio of 20.2 dB.

FIGURE 13. Left: Processing result for the noisy "Marcel" image using the block FFT with no overlap. The blocking effect is evident. Right: Processing result for the noisy "Marcel" image using the LDT. The noise reduction performance is almost identical to the FFT-based algorithm, but the blocking effect has disappeared.

FIGURE 14. Enlarged versions of the FFT-processed (left) and LDT-processed (right) noisy "Marcel" image.
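A much-simplified sketch of the per-block spectral attenuation principle is given below. It is NOT the published algorithm: the gain rule and the parameter alpha are illustrative assumptions, and the anisotropic, orientation-sensitive estimators of the cited papers are omitted.

```python
import numpy as np

def denoise_block(block, noise_var, alpha=2.0):
    # Illustrative spectral-subtraction-style gain; the published algorithms use
    # more refined anisotropic estimators (Aach and Kunz, 1996a-2000).
    S = np.fft.fft2(block)
    power = np.abs(S) ** 2
    # The more a coefficient's power exceeds the noise estimate, the less it is attenuated.
    gain = np.maximum(1.0 - alpha * noise_var * block.size / np.maximum(power, 1e-12), 0.0)
    return np.real(np.fft.ifft2(gain * S))

rng = np.random.default_rng(0)
x = np.arange(32)
clean = np.outer(np.sin(2 * np.pi * 4 * x / 32), np.ones(32))   # oriented structure
noisy = clean + 0.3 * rng.standard_normal((32, 32))
out = denoise_block(noisy, noise_var=0.09)
assert np.mean((out - clean) ** 2) < np.mean((noisy - clean) ** 2)
```

Because the oriented structure concentrates its energy in few spectral coefficients, those coefficients pass nearly unattenuated while the noise floor is suppressed.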
VII. DISCUSSION

In this chapter we have summarized the development of lapped transforms. We started with the continuous-time and discrete-time Fourier transforms of time-dependent signals with infinite duration. These transforms were viewed as a decomposition of the signals into frequency-selective basis functions, or eigenfunctions of LTI systems. With the discrete Fourier transform, which decomposes a finite-length signal block into a set of orthogonal basis functions, a transform could be expressed as a multiplication of the signal vector by a unitary matrix, i.e., viewed as a rotation of coordinate axes. We then analyzed the effects of unitary transforms on the covariance structure of random signals, and found optimal transforms with respect to decorrelation and energy concentration. While these optimal transforms are signal dependent and cannot be calculated fast, we showed that Fourier-like fixed transforms, in particular the DCT, are good practical approximations to the optimal transforms. The disadvantage of blockwise processing is the blocking artifacts introduced by independent spectral-domain processing of the blocks. To alleviate the blocking effects, we then turned to finite-length transforms with overlapping basis functions. The transform matrix for a single block then is not square any more; inverse transforms of single blocks therefore do not exist. Under extended orthogonality conditions, however, it was shown that the original signal can be reconstructed from nonperfectly reconstructed individual blocks by overlapping and adding. Two types of lapped transforms were discussed, the lapped orthogonal transform and the modulated lapped transform, where we focused on a 2:1 overlap. For the LOT, a feasible rectangular matrix obeying the extended orthogonality conditions was first constructed using DCT basis functions. This matrix was optimized by multiplication with an orthogonal square matrix derived from an eigenvector analysis. The MLT did not need an eigenvector analysis; rather, it was based on modulated filter banks. We did not delve deeper into the relation between block transforms and filter banks. Suffice it to mention that a block transform can be viewed as a uniform, critically sampled filter bank, where the filter length is equal to the number of subbands. Similarly, a lapped transform can be regarded as a uniform and critically sampled filter bank with filter length equal to, for example, twice the number of subbands. We then discussed extensions of both the LOT and the MLT in speech and image processing. These extensions are based on the additional use of complementary basis functions, thus introducing redundancy. We concluded with an exemplary comparison of block and lapped transforms in image processing.
ACKNOWLEDGMENTS

The author is grateful to Cicero Mota, formerly with the University of Amazonas, Brazil, and now with the University of Lübeck, and to Dietmar Kunz, Cologne University of Applied Sciences, for fruitful discussions.
APPENDIX A

To prove that Equation (23) indeed recovers s(n) from its frequency coefficients, we multiply both sides of Equation (22) by e^{j(2π/N)kr}, sum over all frequency coefficients, and normalize by N, yielding

$$ \frac{1}{N} \sum_{k=0}^{N-1} S_{\mathrm{DFT}}(k)\, e^{j(2\pi/N)kr} = \sum_{n=0}^{N-1} s(n)\, \frac{1}{N} \sum_{k=0}^{N-1} e^{j(2\pi/N)(r-n)k}, \quad r = 0, \ldots, N-1 , \qquad (111) $$

where we have interchanged the order of summations on the right-hand side. The orthogonality of complex sinusoids,

$$ \frac{1}{N} \sum_{k=0}^{N-1} e^{j(2\pi/N)(r-n)k} = \begin{cases} 1 & \text{for } r-n = 0, \pm N, \pm 2N, \ldots \\ 0 & \text{otherwise} \end{cases} = \delta(n - (r - mN)) , \qquad (112) $$

yields

$$ \frac{1}{N} \sum_{k=0}^{N-1} S_{\mathrm{DFT}}(k)\, e^{j(2\pi/N)kr} = \sum_{n=0}^{N-1} s(n)\, \delta(n - (r - mN)) = s(r) , \qquad (113) $$

which concludes the proof.
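The orthogonality relation (112) and the resulting inversion (113) can be checked numerically (an illustration only; the DFT matrix is built explicitly):

```python
import numpy as np

N = 8
rng = np.random.default_rng(0)
s = rng.standard_normal(N)
k = np.arange(N)
W = np.exp(-2j * np.pi * np.outer(k, k) / N)     # DFT matrix
S = W @ s
assert np.allclose(W.conj() @ W / N, np.eye(N))  # orthogonality, Eq. (112)
assert np.allclose(W.conj() @ S / N, s)          # inversion, Eq. (113)
```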
APPENDIX B

Figure 15 shows a scalar uniform quantizer with quantization interval or step size Δ. A transform coefficient S(k) is quantized to multiples V(i(k)) = i(k)·Δ (Goyal, 2001, p. 13; Gray and Neuhoff, 1998). The output of the quantizer hence is an index i(k) = round(S(k)/Δ), where round(x) rounds to the nearest integer. The decoder calculates the quantized transform coefficient values by Ŝ(k) = i(k)·Δ = V(i(k)). Assuming sufficiently fine quantization, the error d(k) = S(k) − Ŝ(k) can be assumed as being

FIGURE 15. Uniform quantization into multiples of the step size Δ.
uniformly distributed between −Δ/2 and Δ/2. Defining the distortion D(k) as D(k) = E[d²(k)], where E denotes the expectation, we obtain

$$ D = \frac{\Delta^2}{12} . \qquad (114) $$
Consider S(k) uniformly distributed over [−S_max, S_max). Its energy σ_S²(k) then is

$$ \sigma_S^2(k) = \frac{(2 S_{\max})^2}{12} . \qquad (115) $$
Dividing the dynamic range [−S_max, S_max) into steps of step size Δ yields 2S_max/Δ quantization steps. Assuming the number of steps to be a power of two, a fixed-length code needs

$$ R = \log_2 \frac{2 S_{\max}}{\Delta} \qquad (116) $$
bits per transform coefficient. The distortion then depends on the rate R according to

$$ D = \sigma_S^2(k)\, 2^{-2R} \qquad (117) $$

and the signal-to-distortion ratio is

$$ \frac{\sigma_S^2(k)}{D} = 2^{2R} \;\Rightarrow\; 10 \log_{10} \frac{\sigma_S^2(k)}{D}\ \mathrm{dB} = R \cdot 6\ \mathrm{dB} . \qquad (118) $$
Each additional bit hence improves this ratio by 6 dB (Lüke, 1999, p. 204; Proakis and Manolakis, 1996, Sect. 9.2.3). In fact, it can be shown that optimal quantizers perform in accordance with (Goyal, 2001; Gray and Neuhoff, 1998)

$$ D = \gamma\, \sigma_S^2(k)\, 2^{-2R} \;\Rightarrow\; 10 \log_{10} \frac{\sigma_S^2(k)}{D}\ \mathrm{dB} = R \cdot 6\ \mathrm{dB} - 10 \log_{10}(\gamma)\ \mathrm{dB} , \qquad (119) $$

where γ is a factor depending on the distribution of the input signal and on the encoding method. For instance, for a Gaussian source and
fixed-length encoding of i(k), we have γ = √3·π/2 ≈ 2.7. Using an entropy code yields γ = πe/6 ≈ 1.4; this improves the signal-to-distortion ratio by about 2.8 dB over the fixed-length code (Goyal, 2001, p. 14; Jayant and Noll, 1984).
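The 6 dB-per-bit rule of Equations (114)–(118) is easy to reproduce by simulation (illustrative Monte Carlo sketch; names ad hoc):

```python
import numpy as np

rng = np.random.default_rng(0)
S_max = 1.0
S = rng.uniform(-S_max, S_max, 200000)       # uniformly distributed coefficients
for R in (4, 6, 8):
    delta = 2 * S_max / 2 ** R               # step size from the rate, Eq. (116)
    S_hat = np.round(S / delta) * delta      # index i(k) = round(S/delta), times delta
    D = np.mean((S - S_hat) ** 2)
    sdr_db = 10 * np.log10(np.var(S) / D)
    assert abs(D - delta ** 2 / 12) < 0.05 * delta ** 2 / 12   # Eq. (114)
    assert abs(sdr_db - 6.02 * R) < 0.5                        # Eq. (118)
```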
APPENDIX C

To eliminate the potential discontinuities in the periodic repetition of s(n), n = 0, ..., N−1, we form the concatenated signal of length 2N

$$ g(n) = \begin{cases} s(n) & \text{for } n = 0, \ldots, N-1 \\ s(2N-1-n) & \text{for } n = N, \ldots, 2N-1 . \end{cases} \qquad (120) $$
Figure 16 shows the concatenated signal g(n) for the cosine wave in Figure 3. Note that the last coefficient of s(n), i.e., s(N−1) = g(N−1), is repeated as g(N); the concatenation therefore is not a perfect cosine wave. Periodic repetition of g(n) will not exhibit unwanted discontinuities, so the DFT of g(n) should not be afflicted by leakage artifacts. Also, if s(n) is of even length, so is g(n), which is convenient when one wants to use fast FFT-like implementations. Moreover, g(n) is symmetric with respect to N − 0.5.

FIGURE 16. Concatenation g(n) of the cosine wave in Figure 3 and its mirrored version according to Equation (120). Note that always g(N−1) = g(N).

The DFT G(k), k = 0, ..., 2N−1, of g(n) should therefore be real apart from a complex linear phase factor e^{jπk/2N}, and even symmetric (recall that s(n) is assumed to be real). Indeed, we have for G(k)

$$ g(n) \;\circ\!\!-\!\!\bullet\; G(k) = \sum_{n=0}^{N-1} s(n)\, e^{-j(\pi k/N)n} + \sum_{m=N}^{2N-1} s(2N-1-m)\, e^{-j(\pi k/N)m} , \qquad (121) $$

which, after substituting n = 2N−1−m in the second sum, yields
$$ G(k) = \sum_{n=0}^{N-1} s(n) \left[ e^{-j(\pi k/N)n} + e^{\,j(\pi k/N)(n+1)} \right] . \qquad (122) $$

Factoring out the complex linear phase factor caused by the (N − 0.5)-point circular shift, we obtain
$$ G(k) = e^{\,j\pi k/2N}\, 2 \sum_{n=0}^{N-1} s(n) \cos\left( \frac{\pi k \left( n + \frac{1}{2} \right)}{N} \right) , \quad k = 0, \ldots, 2N-1 . \qquad (123) $$

Leaving off the complex exponential factor (this corresponds to a reverse circular shift of g(n) by N − 0.5 points) and normalizing to achieve a unitary transform leads to the DCT as defined in Equation (56). Because of the symmetry of the DFT coefficients, the coefficients for k = 0, ..., N−1 suffice. Figure 17 shows |G(k)| for the extended cosine wave in Figure 16; this is proportional to the modulus DCT spectrum of the signal in Figure 3. When comparing the spectra in Figures 3 and 17, the reduction of leakage is immediately evident. The DCT can hence be regarded as a DFT after modifying the signal so that discontinuities do not occur in the periodic extension. Another consequence is that the DCT can be computed efficiently using FFT algorithms where, from Equation (56), no complex number operations are needed any more. Exploiting the symmetry of the concatenated signal g(n), the 2N-point DFT G(k) can actually be computed by an N-point DFT (Lim, 1990, p. 153). The above observations also hold for the inverse DCT.
FIGURE 17. Modulus DFT of the extended cosine in Figure 16, for k = 0, ..., N−1 = 63, which is proportional to the DCT of the cosine wave in Figure 3. Note the improved concentration of spectral energy with respect to the DFT spectrum in Figure 3.
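Equations (120)–(123) can be verified numerically; the sketch below (illustrative, NumPy-based) mirrors a random signal, takes the 2N-point FFT, and removes the linear phase factor:

```python
import numpy as np

N = 16
rng = np.random.default_rng(1)
s = rng.standard_normal(N)
g = np.concatenate([s, s[::-1]])                 # mirrored concatenation, Eq. (120)
G = np.fft.fft(g)                                # 2N-point DFT
k = np.arange(2 * N)
phase = np.exp(1j * np.pi * k / (2 * N))         # linear phase of the (N - 0.5)-point shift
cosine_part = np.real(G / phase) / 2
direct = np.array([np.sum(s * np.cos(np.pi * kk * (np.arange(N) + 0.5) / N))
                   for kk in range(2 * N)])
assert np.allclose(cosine_part, direct)          # Eq. (123)
assert np.allclose(np.imag(G / phase), 0)        # G(k) is real apart from the phase factor
```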
APPENDIX D

With the notation

$$ \theta(k, m) = \left( m + \frac{N+1}{2} \right) \left( k + \frac{1}{2} \right) \frac{\pi}{N} , \qquad \varphi(k, n) = \left( n + N + \frac{N+1}{2} \right) \left( k + \frac{1}{2} \right) \frac{\pi}{N} , \qquad (124) $$

we have for the (m+1, n+1)th element of Q_0^T Q_1

$$ \left[ Q_0^T Q_1 \right]_{mn} = \frac{2}{N} \sum_{k=0}^{N-1} \cos \theta(k, m) \cos \varphi(k, n) = \frac{1}{N} \sum_{k=0}^{N-1} \left\{ \cos[\theta(k, m) + \varphi(k, n)] + \cos[\theta(k, m) - \varphi(k, n)] \right\} . \qquad (125) $$

With

$$ \psi(k) = \theta(k, m) + \varphi(k, n) = (m + n + 1) \left( k + \frac{1}{2} \right) \frac{\pi}{N} + \pi \pmod{2\pi} \qquad (126) $$
and 0 < m + n + 1 < 2N, the sum $\sum_{k=0}^{N-1} \cos \psi(k)$ extends over i full periods if m + n + 1 = 2i is an even number, and is thus zero. For m + n + 1 = 2i + 1 odd, the sequence cos ψ(k), k = 0, ..., N−1, is an odd sequence, and again sums to zero. Similarly, the sum over cos[θ(k, m) − φ(k, n)] is zero, which proves Equation (97). The entries of Q_0^T Q_0 are

$$ \left[ Q_0^T Q_0 \right]_{mn} = \frac{2}{N} \sum_{k=0}^{N-1} \cos \theta(k, m) \cos \theta(k, n) = \frac{1}{N} \sum_{k=0}^{N-1} \left\{ \cos[\theta(k, m) - \theta(k, n)] + \cos[\theta(k, m) + \theta(k, n)] \right\} , \qquad (127) $$

where

$$ \sum_{k=0}^{N-1} \cos[\theta(k, m) - \theta(k, n)] = \begin{cases} 0 & \text{for } m \neq n \\ N & \text{for } m = n \end{cases} \qquad (128) $$

and

$$ \sum_{k=0}^{N-1} \cos[\theta(k, m) + \theta(k, n)] = \begin{cases} 0 & \text{for } m + n \neq N-1 \\ -N & \text{for } m + n = N-1 , \end{cases} \qquad (129) $$

from which Equation (98) follows. The proof of Equation (99) is similar.
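The cosine sums of this appendix can be verified exhaustively for a small N (illustrative sketch; function names are ad hoc):

```python
import numpy as np

N = 8
k = np.arange(N)

def theta(m):                                    # angle of Q0, Eq. (94)
    return np.pi / N * (m + (N + 1) / 2) * (k + 0.5)

def phi(n):                                      # angle of Q1, Eq. (95)
    return np.pi / N * (n + N + (N + 1) / 2) * (k + 0.5)

for m in range(N):
    for n in range(N):
        assert abs(np.sum(np.cos(theta(m) + phi(n)))) < 1e-9    # proves Eq. (97)
        assert abs(np.sum(np.cos(theta(m) - phi(n)))) < 1e-9
        s128 = np.sum(np.cos(theta(m) - theta(n)))              # Eq. (128)
        assert np.isclose(s128, N if m == n else 0.0, atol=1e-9)
        s129 = np.sum(np.cos(theta(m) + theta(n)))              # Eq. (129)
        assert np.isclose(s129, -N if m + n == N - 1 else 0.0, atol=1e-9)
```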
REFERENCES

Aach, T. (2000). Transform-based denoising and enhancement in medical x-ray imaging. European Signal Processing Conference, EURASIP, Tampere, Finland, edited by M. Gabbouj and P. Kuosmanen, pp. 1085–1088.
Aach, T., and Kunz, D. (1996a). Anisotropic spectral magnitude estimation filters for noise reduction and image enhancement. Proc. ICIP-96, Lausanne, Switzerland, pp. 335–338.
Aach, T., and Kunz, D. (1996b). Spectral estimation filters for noise reduction in x-ray fluoroscopy imaging. Proc. EUSIPCO-96, Trieste, Italy, edited by G. Ramponi, G. L. Sicuranza, S. Carrato, and S. Marsi, pp. 571–574.
Aach, T., and Kunz, D. (1998). Spectral amplitude estimation-based x-ray image restoration: An extension of a speech enhancement approach, in Proc. EUSIPCO-98, Patras, edited by S. Theodoridis, I. Pitas, A. Stouraitis, and N. Kalouptsidis, pp. 323–326.
Aach, T., and Kunz, D. (2000). A lapped directional transform for spectral image analysis and its application to restoration and enhancement. Signal Processing 80(11), 2347–2364.
Ahmed, N., Natarajan, T., and Rao, K. R. (1974). Discrete cosine transform. IEEE Trans. Computers 23, 90–93.
Akansu, A. N., and Haddad, R. A. (2001). Multiresolution Signal Decomposition. Boston: Academic Press.
Akansu, A. N., and Wadas, F. E. (1992). On lapped orthogonal transforms. IEEE Trans. Signal Processing 40(2), 439–443.
Bamler, R. (1989). Mehrdimensionale lineare Systeme. Berlin: Springer Verlag.
Cantoni, A., and Butler, P. (1976). Properties of the eigenvectors of persymmetric matrices with applications to communication theory. IEEE Trans. Communications 24(8), 804–809.
Cappé, O. (1994). Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor. IEEE Trans. Speech and Audio Processing 2(2), 345–349.
Clarke, R. J. (1985). Transform Coding of Images. London: Academic Press.
Clarke, R. J., and Tech, B. (1981). Relation between the Karhunen–Loève and cosine transforms. IEE Proc. 128(6), 359–360.
Ephraim, Y., and Malah, D. (1984). Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Trans. Acoustics, Speech, and Signal Processing 32(6), 1109–1121.
Fukunaga, K. (1972). Introduction to Statistical Pattern Recognition. New York: Academic Press.
Goyal, V. K. (2001). Theoretical foundations of transform coding. IEEE Signal Processing Magazine, September, 9–21.
Gray, R. M., and Neuhoff, D. L. (1998). Quantization. IEEE Trans. Information Theory 44, 2325–2383.
Huang, J., and Schultheiss, P. (1963). Block quantization of correlated Gaussian random variables. IEEE Trans. Communication Systems 11, 289–296.
Jain, A. K. (1979). A sinusoidal family of unitary transforms. IEEE Trans. Pattern Analysis and Machine Intelligence 1(4), 356–365.
Jayant, N. S., and Noll, P. (1984). Digital Coding of Waveforms. Englewood Cliffs, NJ: Prentice Hall.
Kunz, D., and Aach, T. (1999). Lapped directional transform: A new transform for spectral image analysis. Proc. ICASSP-99, Phoenix, AZ, pp. 3433–3436.
Lim, J. S. (1980). Image restoration by short space spectral subtraction. IEEE Trans. Acoustics, Speech, and Signal Processing 28(2), 191–197.
Lim, J. S. (1990). Two-Dimensional Signal and Image Processing. Englewood Cliffs, NJ: Prentice-Hall.
Lim, J. S., and Oppenheim, A. V. (1979). Enhancement and bandwidth compression of noisy speech. Proc. IEEE 67(12), 1586–1604.
Lüke, H. D. (1999). Signalübertragung. Berlin, Heidelberg, New York: Springer Verlag.
Makhoul, J. (1981). On the eigenvectors of symmetric Toeplitz matrices. IEEE Trans. Acoustics, Speech, and Signal Processing 29(4), 868–872.
Malvar, H. (1999). A modulated complex lapped transform and its application to audio processing. Proc. ICASSP-99, Phoenix, AZ, pp. 1421–1424.
Malvar, H. S. (1992a). Extended lapped transforms: Properties, applications, and fast algorithms. IEEE Trans. Signal Processing 40(11), 2703–2714.
Malvar, H. S. (1992b). Signal Processing with Lapped Transforms. Norwood, MA: Artech House.
Malvar, H. S., and Staelin, D. H. (1989). The LOT: Transform coding without blocking effects. IEEE Trans. Acoustics, Speech, and Signal Processing 37(4), 553–559.
Oppenheim, A. V., and Schafer, R. W. (1998). Discrete-Time Signal Processing. Englewood Cliffs, NJ: Prentice Hall.
Papoulis, A. (1968). Systems and Transforms with Applications in Optics. New York: McGraw Hill.
Proakis, J. G., and Manolakis, D. G. (1996). Digital Signal Processing. Upper Saddle River, NJ: Prentice Hall.
Rabbani, M., and Jones, P. W. (1991). Digital Image Compression Techniques. Bellingham: SPIE Optical Engineering Press.
Ray, W. D., and Driver, R. M. (1970). Further decomposition of the Karhunen–Loève series representation of a stationary random process. IEEE Trans. Information Theory 16(4), 845–850.
Therrien, C. W. (1989). Decision, Estimation, and Classification. New York: Wiley.
Therrien, C. W. (1992). Discrete Random Signals and Statistical Signal Processing. Englewood Cliffs, NJ: Prentice-Hall.
Unser, M. (1984). On the approximation of the discrete Karhunen–Loève transform for stationary processes. Signal Processing 7, 231–249.
van Compernolle, D. (1992). DSP techniques for speech enhancement. Proc. Speech Processing in Adverse Conditions, Cannes-Mandelieu, pp. 21–30.
Young, R. W., and Kingsbury, N. G. (1993). Frequency domain motion estimation using a complex lapped transform. IEEE Trans. Image Processing 2(1), 2–17.
Zelinski, R., and Noll, P. (1977). Adaptive transform coding of speech signals. IEEE Trans. Acoustics, Speech, and Signal Processing ASSP-25(4), 299–309.
Ziemer, R. E., Tranter, W. H., and Fannin, D. R. (1989). Signals and Systems: Continuous and Discrete. New York: Macmillan.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 128
On Fuzzy Spatial Distances

ISABELLE BLOCH
Ecole Nationale Supérieure des Télécommunications, Département TSI, CNRS URA 820, 46 rue Barrault, 75013 Paris, France
I. Introduction
II. Some Views on Space and Distances
   A. Philosophy
   B. Linguistics
   C. Human Perception
   D. Cognition
III. Spatial Fuzzy Distances: General Considerations
   A. Spatial Fuzzy Sets
   B. Representation Issues
   C. Types of Distances and Problems
   D. General Principles for Defining a Fuzzy Distance
      1. Generalizing a Crisp Distance to a Fuzzy One
      2. Distances from Similarity
      3. Distances from Set Relationships
      4. Distances from Other Relationships
      5. Symbolic Approaches
   E. Properties of Distances and Requirements for Spatial Distances
IV. Geodesic Distance in a Fuzzy Set
   A. Fuzzy Geodesic Distance Defined as a Number
   B. Fuzzy Geodesic Distance Defined as a Fuzzy Number
V. Distance from a Point to a Fuzzy Set
   A. As a Number
   B. As a Fuzzy Number
VI. Distance between Two Fuzzy Sets
   A. Comparison of Membership Functions
      1. Functional Approach
      2. Information Theoretic Approach
      3. Set Theoretic Approach
      4. Pattern Recognition Approach
   B. Accounting for Spatial Distances
      1. Geometrical Approach
      2. Morphological Approach
      3. Tolerance-Based Approach
      4. Graph Theoretic Approach
      5. Histogram of Distances
VII. Spatial Representations of Distance Information
   A. Spatial Fuzzy Sets as a Representation Framework
   B. Spatial Representation of Distance Knowledge to a Given Object
Copyright © 2003 Elsevier Inc. All rights reserved. 1076-5670/2003 $35.00
VIII. Qualitative Distance in a Symbolic Setting
   A. Morpho-Logics
   B. Distances in a Qualitative Setting
IX. Conclusion
References
I. INTRODUCTION

The aim of this chapter is to discuss several definitions related to spatial fuzzy distances, based both on existing work and on new proposals, with respect to their properties, the type of information they represent, and the questions they allow us to answer. The wide literature on fuzzy similarities, dissimilarities, and distances is rather silent on methods dealing with spatial information, and, unfortunately, not all approaches are suited to this purpose. We restrict ourselves here to those that concern spatial information.

The interest in relationships between spatial objects has been highlighted in very different types of work: in vision, for identifying shapes and objects; in database management systems, for supporting spatial data and queries; in artificial intelligence, for planning and reasoning about the spatial properties of objects; in cognitive and perceptual psychology; and in geography, for geographic information systems. All these applications converge toward spatial reasoning. According to the semantic hierarchy proposed in [105], we consider here metric relationships (corresponding to level 4 of this hierarchy). Many authors have stressed the importance of topological relationships (which include part–whole relationships such as inclusion, exclusion, and adjacency), e.g., [2,3,48,52,103,132,135,153]. But distances and directional relative position (constituting the metric relationships) are also important, since positional information is a basic cognitive spatial concept that plays a central role in all applications where spatial knowledge is involved (see, e.g., [49,66,75,102,105,108,128]).

Vision and image processing usually make use of quantitative representations of spatial relationships. In a purely quantitative framework, spatial distances are well defined, but they require precise knowledge of the objects and of the types of questions we want to answer.
These two constraints can be relaxed in a semiqualitative framework, using fuzzy sets. This allows one to deal with imprecisely defined objects and with imprecise questions such as "are these two objects near each other?", and to provide evaluations that may themselves be imprecise, which is useful in several applications where spatial reasoning under imprecision has to be considered.
Fuzzy set theory finds in spatial information processing a growing application domain. This may be explained not only by its ability to model the inherent imprecision of such information (as in image processing, vision, and mobile robotics) together with expert knowledge, but also by the large and powerful toolbox it offers for dealing with spatial information under imprecision [17]. This is highlighted in particular when spatial structures or objects are directly represented by fuzzy sets.

If even less information is available, we may have to reason about space in a purely qualitative way, and a symbolic setting is then more appropriate. In artificial intelligence, mainly symbolic representations have been developed, and several works have addressed the question of qualitative spatial reasoning (see [155] for a survey). For instance, in the context of mereotopology, powerful representation and reasoning tools have been developed, but they are mainly concerned with topological and part–whole relationships, and very little with metric ones. This chapter contains a contribution on this aspect too, in the context of modal logics. Limitations of purely qualitative spatial reasoning have already been stressed in [66], as has the interest of adding semiquantitative extensions to qualitative values (as done in fuzzy set theory for linguistic variables [61,162]) for deriving useful and practical conclusions (as for recognition).

Purely quantitative representations are limited in the case of imprecise statements and of knowledge expressed in linguistic terms. Both quantitative and qualitative knowledge can be integrated using a semiquantitative (or semiqualitative) interpretation of fuzzy sets. As already mentioned in [73], this allows one to provide a computational representation and interpretation of imprecise spatial constraints expressed in a linguistic way, possibly including quantitative knowledge. The fuzzy set framework therefore appears central in this context.
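As a minimal sketch of such a semiquantitative interpretation (an illustration, not taken from the chapter: the function name and the thresholds 50 m and 200 m are invented, context-dependent choices), a linguistic value such as "near" can be encoded as a fuzzy membership function over distances:

```python
def mu_near(d: float, full: float = 50.0, zero: float = 200.0) -> float:
    """Degree to which a distance d (in meters) is considered 'near'.

    Full membership up to `full`, no membership beyond `zero`,
    linear decrease in between (a simple trapezoidal shape).
    """
    if d <= full:
        return 1.0
    if d >= zero:
        return 0.0
    return (zero - d) / (zero - full)

print(mu_near(30.0))   # 1.0  -> clearly near
print(mu_near(125.0))  # 0.5  -> somewhat near
print(mu_near(300.0))  # 0.0  -> not near
```

Changing the thresholds models the context dependence discussed above: "near" in a room and "near" on a map call for different breakpoints, while the same fuzzy machinery applies unchanged.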
Links between mathematical morphology operations (mainly dilation) and several types of distance are well established in the quantitative case (for crisp objects). These links can be exploited to define fuzzy spatial distances, or qualitative ones, based on fuzzy and logical dilations. This framework therefore allows one to represent spatial distances in a unified way in various settings: a purely quantitative one if objects are precisely defined, a semiqualitative one if objects are imprecise and represented as spatial fuzzy sets, and a qualitative one for reasoning about space in a logical framework. This is made possible by the strong algebraic structure of mathematical morphology, which finds equivalents in set-theoretical terms, fuzzy operations, and logical expressions. Several definitions will therefore be proposed based on mathematical morphology operations.

This chapter is organized as follows. In Section II we present some views on space and distances in different domains, not necessarily technical ones.
In Section III we present the general framework for defining fuzzy distances: we define what we call spatial fuzzy objects, and then discuss the different possible representations of distances between imprecisely defined objects; general principles for defining fuzzy distances are summarized, and finally the properties required when dealing with spatial information are discussed. The problem of defining the distance between two points of a fuzzy set in a geodesic sense is addressed in Section IV. In Section V we define the distance from a point to a fuzzy set, based on dilation. An important section (Section VI) is dedicated to distances between fuzzy sets; a classification is proposed, along with a discussion of their ability to handle spatial information. We then propose a spatial representation of distance information relative to a given fuzzy set in Section VII. Finally, purely qualitative distances are modeled in a logical framework in Section VIII, using a modal logic defined from morphological operators applied to logical formulas.
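The dilation–distance link invoked above can be sketched in the crisp, discrete case (an illustrative toy implementation under assumed conventions, not the chapter's own code): on a binary grid, the discrete distance from a point x to a set X equals the smallest n such that x belongs to the n-fold dilation of X by the unit structuring element (4-connectivity here, which yields the city-block distance).

```python
def dilate(X, shape):
    """One dilation of a set of pixels X by the 4-connected unit ball."""
    out = set(X)
    for (i, j) in X:
        for (di, dj) in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < shape[0] and 0 <= nj < shape[1]:
                out.add((ni, nj))
    return out

def distance_to_set(x, X, shape):
    """d(x, X) = min { n >= 0 : x in dilation^n(X) }."""
    current, n = set(X), 0
    while x not in current:
        current, n = dilate(current, shape), n + 1
    return n

X = {(2, 2)}                                # a single-pixel object
print(distance_to_set((2, 5), X, (8, 8)))   # -> 3 (city-block distance)
```

Replacing the crisp dilation by a fuzzy or logical one is exactly the move that extends this construction to the semiqualitative and qualitative settings discussed in the chapter.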
II. SOME VIEWS ON SPACE AND DISTANCES
The issues of the perception and representation of space and of spatial relationships like distances, and the issue of spatial reasoning, have been addressed by researchers in many communities. This can be partly explained by the fact that spatial knowledge is fundamental to common-sense knowledge. In this section we summarize some views on space and distances. They are but some examples, and should definitely not be considered an exhaustive review.

A. Philosophy

Philosophical thinking about space and spatial concepts was influenced by various theories and beliefs related to cosmology, science, and religion. In this section, we point out a few philosophical views of space.

From Pythagoras to Zeno, the concept of space was linked to the first developments in arithmetic and Pythagorean geometry. Zeno's famous paradox highlights the limits of the Pythagorean space based on the possibility of infinite subdivision. Democritus tried to overcome these difficulties by reducing space to the infinite empty room which surrounds atoms. Plato and Aristotle did not accept this mechanism. Plato considered the concept of receptacle as the original cosmic space. Aristotle identified space with the places and limits of bodies. The Stoics and Epicureans, on the contrary, saw space as an infinite and corporeal extension, which extends outside the limits of the world.
With the development of the modern mathematical science of nature, the concept of space became more substantial and gained in autonomy. René Descartes (1596–1650) considered that spatial extension is specific to material entities, governed only by the laws of mechanics. Isaac Newton (1642–1727) introduced absolute space and the "sensorium Dei," which was later criticized by Leibniz. David Hume (1711–1776) reduced space to a pure psychological function, leading Kant to advocate in favor of the objectivity of space. Samuel Clarke (1675–1729), Immanuel Kant (1724–1804), and Gottfried Wilhelm Leibniz (1646–1716) debated the intrinsic nature of space. Leibniz's arguments were that:
God does not need a sense organ to perceive objects (this view was in opposition to Newton's view of space as God's boundless uniform sensorium); space cannot be an absolute reality; motion and position are real and detectable only in relation to other objects, not in relation to space itself, since space itself represents no object (space then becomes an unnecessary hypothesis).

Clarke replied to the last point and argued that motion is detectable in relation to space itself, for an object accelerating or rotating alone in a void betrays the effect of forces that exist in relation to no other object.1

1. http://www.friesian.com/space.htm

Kant considered space as absolute. This viewpoint was accepted until the emergence of relativity. He argued that asymmetrical objects and their mirror-imaged counterparts are genuinely and physically different. No rotation in 3D space can turn one into the other (they could be rotated into each other in 4D space), which shows that space itself is real and independent of the objects. But he proposed reconciling Newton and Leibniz by considering that space is absolute and real for objects in experience, but really nothing among things in themselves. He considered that Euclid's axioms of geometry are not logically necessary (and could be denied), but that they are known prior to experience and depend on our intuition of space. He developed antinomies to show that contradictory arguments can be advocated to consider space and time as finite or infinite (neither one nor the other can be proved) [96].

Hermann Helmholtz (1821–1894) [85] considered space as the "necessary form of outer intuition" prior to experience. He also related the perception of space to movement: the spatial ordering of objects is perceived through moving sensations. He pointed out the disagreement between the conception of intuition and the training (in analytical methods, perspective constructions, optical phenomena) required to represent spatial relationships in meta-mathematical spaces. For him, Kant's proof of the transcendental nature of the geometrical axioms is untenable, and he considered these axioms to be subject to proof or disproof by experience.

Henri Poincaré (1854–1912) [130] had an empiricist point of view and considered that spatial knowledge is mainly derived from motor experience. In this respect, localizing a spatial entity means mentally representing the movements necessary to reach it. But brain maturation also plays an important role. Poincaré argued in favor of the relativity of space.2 More specifically, concerning distances, he claimed that we cannot say that we know the distance between two points, since it can undergo strong variations that we may not perceive if other distances vary in the same proportions. There is no direct intuition of space, of distance, of magnitude, but only relations to a measuring instrument, in particular our own body. This is related to the understanding of space from motor experience mentioned above. A point is thus defined as the succession of movements required to reach it. Moreover, space is not homogeneous, since different points cannot be considered equivalent if the cost to reach them is not the same.

Henri Bergson (1859–1941) addressed the problem of space and time through the notion of multiplicity. He considered two types of multiplicity: a numerical one, which implies space as one of its conditions, and a qualitative one, which implies time as one of its conditions. As opposed to Kant, he considered an ideal space not as a property of things but as an intellectual synthesis. Intuition gives access to pure duration, in opposition to spatialized time. One of Bergson's theses is that a position in space can be considered as an instantaneous cut of the movement, but the movement is more than a sum of positions in space.
But this goes even further: another thesis is that a movement is a cut in time, and should be seen from a temporal perspective rather than a spatial one.

Albert Einstein (1879–1955) considered that geometry is linked to the sensible and perceptible space. The geometrical configuration of the world itself becomes relative: it depends on the distribution of masses and on their speeds, and is better described by a non-Euclidean, Riemannian geometry [68]. Because of Einstein's scientific revolution, the concept of space became a real topic of debate between philosophers and scientists, and several attempts were made to reconcile a philosophical explanation of space with results from physics.
2. http://www.marxists.org/reference/subject/philosophy/works/fr/poincare.htm
In the meantime, purely philosophical views of space were developed by the phenomenologists and the existentialists: Edmund Husserl (1859–1938), Jean-Paul Sartre (1905–1980), and Martin Heidegger (1889–1976). Many other philosophers have considered the question of space and distance. Mentioning all schools of thought is outside the scope of this chapter, although further investigation in this direction could certainly be both fascinating and helpful.
B. Linguistics

Natural languages usually offer a rich variety of lexical terms for describing the spatial location of entities. These terms are not only numerous, they also concern all lexical categories, such as nouns, verbs, adjectives, adverbs, and prepositions [4].

The domain of linguistics is a source of inspiration for many works on qualitative spatial information representation and qualitative spatial reasoning [49]. Modeling qualitative spatial relations relies strongly on the way these relations are expressed verbally. Several properties are exhibited, such as the asymmetry of some expressions, the non-bijective relation between language and spatial concepts (in particular for prepositions [4,86,152]), and the interaction between distances and orientation [86,147].

Another important characteristic of linguistic expressions is the imprecision attached to ternary (or higher-order) spatial relations (for instance, being among people), but also to binary ones. Usually the context allows one to deal with such expressions, and the linguistic statements are generally clear and informative enough to prevent misunderstanding. A remarkable feature is that representation and communication are then achieved without using numbers [4]. Conversely, apparently precise statements (for instance, those containing crisp numbers) should not always be understood as really precise, but rather as orders of magnitude. Consider, for instance, the sentence "Paris and Toulouse are at a distance of 700 km." The number 700 should not be taken as an exact value. It gives an idea of the distance, and its interpretation is subject to considerations such as which areas of Paris and Toulouse are really concerned and the way one travels from one city to the other. Overly precise statements can even become inefficient if they make the message too complex. This appears typically in the problem of route description for helping navigation and pathfinding.
The example of giving directions in Venice is particularly eloquent [58].
Moreover, the way of describing spatial situations and the vision and representation of space are not fixed, and are likely to be modified depending on perceptual data and on the discourse situation [4]. In linguistic statements about space and distance, the geometrical terms of the language involved are usually not sufficient to convey a clear meaning. The context of the statement is also of prime importance, as are the functional properties of the physical entities considered. This appears, for instance, in the use of prepositions, where the shape of an object influences the interpretation of a preposition preceding the name of that object. In [4], three levels are therefore distinguished for analyzing and representing the meaning of spatial expressions:
- the geometrical level, which concerns the objective space;
- the functional level, which accounts for the properties of the entities described in the text and for the nongeometrical relations;
- the pragmatic level, which includes the underlying principles for good communication.
Languages exhibit strong differences in their spatial terms. This concerns the way space is partitioned, the terms describing motion events, and the preferred lexical categories. For instance, French, like other Romance languages, shows a typological preference for lexicalizing the path in the main verb. In Germanic and Slavic languages, on the contrary, the path is rather encoded in satellites associated with the verb (a particle or prefix) [148]. Another subdivision considers a linguistic expression as composed of a theme, a frame of reference, and a medium. The medium can typically be related to distance, quality, etc. Three levels are then distinguished [59]:
- thematic segmentation, involving topology and qualitative thresholds (for instance, "close to"), with a possible multiscale aspect;
- pseudo-analog representation, involving a metric;
- knowledge.

The multiscale aspect allows us to deal with different levels of granularity. This overcomes some of the limits of approaches that have a fixed granularity and cannot properly manage both large-scale and close-range information, and of approaches that deal with infinitesimals but run into Zeno's paradox.

An interesting point worth mentioning to conclude this section is the importance of spatial metaphors in several natural languages, which allow us to communicate knowledge and information that would otherwise be difficult to convey.
It is also interesting to consider how space is used in sign language. Two types of iconicity appear: one, called imagistic, where the signing space is used directly for the spatialization of objects with respect to the ground and to some landmarks; and a second, called diagrammatic, which does not reproduce the real space but rather conceives and constructs it as a diagram. Both types can be combined in the use of space [56,140].
C. Human Perception

A number of factors influence the perception of distance, leading to different measures [49]:
- Purely spatial measures, in a geometric sense, give rise to "metric distances" and are related to intrinsic properties of objects. Note that these characteristics involve not only purely geometrical distances, but also topological, size, and shape properties of objects.
- Temporal measures lead to distances expressed as travel time, and can be considered of extrinsic type, as opposed to the previous class. This argues for treating space and time together (which will not be done in this chapter).
- Economic measures, in terms of costs to be invested, are also of extrinsic type.
- Perceptual measures lead to distances of deictic type. They are related to an external point of view, which can be concrete or just a mental representation, and can be influenced by environmental features and subjective considerations, leading to distances that are not necessarily symmetrical. The discourse situation also plays a role at this level, as mentioned above for the linguistic aspects.

As mentioned in [74,83], the perception of distance between objects also depends on the presence or absence of other objects in the environment. If there are no other objects, perception and human reasoning are mainly of geometrical type and distances are absolute. When there are other objects, on the contrary, the perception of distance becomes relative.

The size of the area and the frame of reference also play a crucial role in the perception of distances [49], in particular by defining the scale and the upper bound of the perceived distances. Perception is therefore not scale independent [123], while language is to a large extent scale independent [147].
Finally, attractiveness of the objects strongly affects the perception of proximity [83].
D. Cognition

Spatial reasoning often has to deal with both quantitative measures and qualitative statements. The advantage of quantitative measures lies in their absolute meaning; qualitative information, on the contrary, is dependent on the context. However, qualitative information is easily handled by humans, is often more meaningful and eloquent, and is therefore preferred [49,83]. This raises the question of the links between quantitative data and qualitative information, which is largely addressed in the fuzzy set community.

The dependence of qualitative information on context is particularly obvious where spatial distances are concerned. The meaning of "A is far from B" depends on the relative sizes of A and B, on other scale factors, on the type of deduction and consequence we expect from this statement (for instance, how I can go from A to B), etc. The translation of this statement into a quantitative value can lead to completely different results depending on this context. For instance, saying that my preferred bookstore is far from my house can be understood as a distance of about a few hundred meters to a few kilometers, while saying that my cousin lives far away can be understood as a distance of about a few hundred to a few thousand kilometers. Such difficulties are related to the linguistic aspects mentioned before, as well as to the subjectivity of perception (in particular concerning the attractiveness of the objects).

The cognitive understanding of a spatial environment, in particular in large-scale spaces, arises from two types of processes [41,49,84]:
- route knowledge acquisition, which consists in learning from sensorimotor experience (i.e., actual navigation) and implies order information between visited landmarks;
- survey knowledge acquisition, from symbolic sources such as maps, leading to a global view ("from above") including global features and relationships, which is independent of the order of landmarks.

During development and growth, children first acquire route knowledge and have a local perception and representation of space. They acquire survey knowledge later, when they become able to perceive space from a more global perspective. The ability to reason about metrical properties of space comes at a very late stage of development [13,67,129,143]. The mapping between spatial words and basic spatial concepts does not appear to be universal, and languages differ in their partitioning of space.
Children are able to distinguish between the different spatial categories of their own language at the age of 18–24 months. Such differences between languages can also be observed in the representation of motion events.

These two processes can be observed in neuroimaging [121]. For instance, a right hippocampal activation can be observed for both mental navigation and mental map tasks. A parahippocampal gyrus activation is additionally observed only for mental navigation, when route information and object landmarks have to be incorporated. Moreover, mental simulation by a subject before reproducing a path from memory affects both map-like and route-like representations of the environment, and allows the subject to better reproduce the path [154]. This is mostly observed for simple shapes, suggesting that the internal representation of space depends on geometric properties of the environment. Experiments on sensory conflicts between visual and nonvisual information have been performed in [107] and show that either visual or nonvisual information can be used according to the task and the sensory context. There are therefore at least two cognitive strategies of memory storage and retrieval for the mental simulation of the same path.

As for the internal representation of space in the brain, a distinction is usually made between egocentric and allocentric representations [49,124]. Although the notion of a "map in the head" has recognized limitations as a cognitive theory, it is still quite popular, and corresponds to the allocentric representations. It is important to note that the psychological space need not mirror the physical space. As shown in [9], the egocentric route strategy requires memory of the movements associated with landmarks and episodes (kinesthetic memory).
Solving Pythagoras's theorem from memory is possible using vestibular information, but requires one to convert an egocentric representation into an allocentric one. The mental representation is also combined with other factors in cognitive processes about space. For instance, a question such as "where am I?" can find different answers corresponding to [125]:
- autobiographical memory;
- semantic memory;
- stress and emotion;
- egocentric spatial representation.
Cognitive studies report that distance and direction are quite dissociated. On the contrary, as mentioned for perception, from a cognitive point of view time and space cannot be easily separated.
The importance of the frame of reference, highlighted in all domains, also has a cognitive flavor: cognitive studies have shown that multiple frames of reference are usually used and appear necessary for understanding and navigating a spatial environment [49,104]. Changes of viewpoint are also strongly involved in social interactions, and are required in order to understand and memorize where others are glancing [9].

These cognitive concepts have been used intensively in several works on the modeling and conception of geographic information systems (GIS), where spatial information is the core [118,128]. Let us mention just two examples. In [110,141], a fuzzy cognitive map framework is introduced for GIS, inspired by the cognitive aspects of space and spatial relationships. It aims at integrating quantitative and qualitative data, taking into account the fuzziness of relations such as spatial distances, and at providing decision support producing cognitive descriptions similar to those a human expert could derive and use. Another example is the geocognostics framework proposed in [67], which aims at integrating into a common framework both formal geometric representations of spatial information and formal representations of cognitive processes. The idea is to express views and trajectories in cognitive terms and then to reinterpret them geometrically.

Another field where cognitive aspects of space inspire the development of frameworks and systems is mobile robotics. The work by Kuipers is fundamental in this respect [103–105]. His spatial semantic hierarchy is a model of knowledge of large-scale space including both qualitative and quantitative representations, and is strongly inspired by the properties of the human cognitive map. It aims at providing methods for robot exploration and map building. The hierarchy consists of sensory, control, causal, topological, and metrical levels.
As mentioned in the introduction, we are concerned in this chapter mainly by the last level. A new approach was proposed in [76], called conceptual spaces. These can be considered as a representation of cognitive systems, intermediate between the high-level symbolic representations and the subconceptual connectionist representations [1,77]. They emphasize orders and measures, and a key notion is distances between concepts, leading to geometrical representations, but using quality dimensions. They offer a nice and natural way to model categories, to express similarities. Distances are therefore put to the fore in such spaces. Ga¨rdenfors shows that ‘‘a conceptual mode based on geometrical and topological representations deserves at least as much attention in cognitive science as the symbolic and
ON FUZZY SPATIAL DISTANCES
the associationistic approaches,’’ and his book is therefore about the ‘‘geometry of thoughts’’ [77].
III. SPATIAL FUZZY DISTANCES: GENERAL CONSIDERATIONS

A. Spatial Fuzzy Sets

In this chapter we deal with spatial information, represented by specific fuzzy sets that model spatial objects and the imprecision attached to them. They are defined as follows. Let us denote by S the spatial domain (usually ℝⁿ, or ℤⁿ in the discrete case). We denote by x, y, etc., the spatial variables, i.e., points of S (called pixels or voxels in the discrete case). We denote by dS(x, y) the spatial distance between two points x and y of S (related to the Cartesian space they belong to and independent of their membership of any possible fuzzy set). Generally dS is taken as the Euclidean distance on S. A crisp object is, as usual, a subset of S. Similarly, a fuzzy object is defined as a fuzzy subset of S. A fuzzy object is defined bi-univocally by its membership function, denoted by Greek letters (μ, ν, etc.). A membership function characterizing a fuzzy object is therefore a function, say μ, from S into [0, 1]. For each x in S, μ(x) is a value in [0, 1] which represents the membership degree of the point x to the fuzzy set μ. Such a representation allows for a direct representation of the spatial information. We denote by F the set of all fuzzy sets defined on S. For any two fuzzy objects μ and ν, we denote by d(μ, ν) their distance. The definition of distances between fuzzy objects is the main scope of this chapter. We will also briefly address the question of defining the distance from a point to a fuzzy set, and the distance between two points of a fuzzy set, in a geodesic sense. Since we are mainly interested here in the type of information that is included in the various distance definitions, we assume that the fuzzy sets satisfy the necessary properties such that all mathematical expressions are well defined. For instance, in the continuous case, several definitions assume that the membership functions are Lebesgue integrable. This will not be specified in the following.
Moreover, in most cases we will restrict the discussion to the discrete bounded case (i.e., membership functions defined on Zn and having a bounded support), since this is the most useful case in applications such as image processing, mobile robotics, and geographic information systems.
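As a small illustration of these definitions, a spatial fuzzy object on a discrete bounded domain can be represented simply as an array of membership degrees; the grid, names, and values below are our own illustrative choices, not the chapter's.

```python
import numpy as np

# A fuzzy spatial object on a 5x5 discrete domain S (a subset of Z^2):
# each cell holds the membership degree mu(x) in [0, 1].
mu = np.zeros((5, 5))
mu[1:4, 1:4] = 0.5   # imprecise boundary zone
mu[2, 2] = 1.0       # core of the object

def d_S(x, y):
    """Euclidean distance between two points of S (independent of mu)."""
    return float(np.hypot(x[0] - y[0], x[1] - y[1]))

support = {tuple(p) for p in np.argwhere(mu > 0)}    # Supp(mu): 9 points
core = {tuple(p) for p in np.argwhere(mu == 1.0)}    # points with mu = 1

print(d_S((0, 0), (3, 4)))       # 5.0
print(len(support), len(core))   # 9 1
```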
ISABELLE BLOCH
B. Representation Issues

Since the spatial objects we consider are imprecisely defined and therefore represented as fuzzy sets, there are several options to represent distances between such objects. Although the most common representation is to consider a distance as a number in ℝ⁺ (or more specifically in [0, 1] for some definitions), different representations may be found more suitable for representing imprecision: if the objects are imprecise, we may expect that the distance between them is imprecise too. This argument is advocated in particular in [62,138], and also in [20,120]. Then the distance is better represented as a fuzzy set, and more precisely as a fuzzy number or a fuzzy interval (a convex upper semicontinuous fuzzy set on ℝ⁺ having a bounded support). In [138], Rosenfeld defines two concepts that will be used in the following: the distance density, denoted by δ(μ, ν), and the distance distribution, denoted by Δ(μ, ν), both being fuzzy sets on ℝ⁺. They are linked together by the following relation:

$$\Delta(\mu, \nu)(n) = \int_0^n \delta(\mu, \nu)(n') \, dn'. \qquad (1)$$
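In the discrete case, the link of Eq. (1) reduces to a cumulative sum; a small sketch with an illustrative density (the sample values are ours):

```python
import numpy as np

# Distance density delta(mu, nu)(n): degree to which the distance equals n,
# sampled on n = 0, 1, ..., 5 (an illustrative fuzzy number, not from the text).
density = np.array([0.0, 0.2, 0.6, 0.2, 0.0, 0.0])

# Distance distribution Delta(mu, nu)(n) = integral from 0 to n of the
# density (Eq. (1)); on a unit-step grid this is a cumulative sum, giving
# the degree to which the distance is less than n.
distribution = np.cumsum(density)

print(distribution.round(3))
```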
While the distance distribution value Δ(μ, ν)(n) represents the degree to which the distance between μ and ν is less than n, the distance density value δ(μ, ν)(n) represents the degree to which the distance is equal to n. A simplified representation, as intervals for instance, can also be considered. It may be easier to handle while keeping some information on imprecision through the length of the interval. The concept of distance can also be represented as a linguistic variable. This assumes a granulation [160] of the set of possible distance values into symbolic classes such as near and far, each of these classes being defined as a fuzzy set. This approach has been taken, for example, in [8,12,83,102]. Spatial relations are then defined as restrictions on linguistic variables (e.g., [83]). They can then be used to produce scene descriptions automatically using fuzzy rules (e.g., [98]). Finally, purely symbolic expressions can be used as logical formulas.

C. Types of Distances and Problems

Several problems can be addressed where fuzzy distances are concerned. We distinguish three of them:
- distances between two points in a fuzzy set,
- distances from a point to a fuzzy set,
- distances between two fuzzy sets.
The first type of distance is the least treated in the literature. In the crisp case, this kind of distance is widely used in classical image processing and pattern recognition [142]. The definition of its fuzzy equivalent should lead to the design of new tools for generalizing classical methods when imprecision in structures and images has to be taken into account. We proposed in [15] to define a distance between two points in a fuzzy set as a fuzzy generalization of the concept of geodesic distance in a crisp set, by introducing fuzzy connectivity. Typical applications in fuzzy spatial information processing consist in finding the best path, in the geodesic sense, in a spatial fuzzy set representing some objective function (satisfiability of a property, security areas around objects, etc.). The fuzzy geodesic distance is also the basis for fuzzy geodesic operators, e.g., morphological ones [18,21]. This type of distance is detailed in Section IV. Distances from a point to a fuzzy set have not received much attention in the literature, although they are useful in several domains: they can be used for classification purposes, where a point has to be attributed to the nearest fuzzy class, and the distance from a point to the complement of a fuzzy set μ provides the basic information for computing a fuzzy skeleton of μ. Additionally, they may serve as a basis for defining distances between two fuzzy sets. We defined such distances based on fuzzy mathematical morphology in [14]. They are mentioned in Section V. The main focus of this chapter is the third kind of distance (between two fuzzy sets), and extends previous work [20]. It is the most widely addressed in the literature, but not often in the context of spatial objects. The specificities of spatial information call for a study of the existing definitions in terms of the spatial properties they include, and even for the definition of new ones.
Applications of such distances cover a very large field, including image registration, assessment of relationships between image components, comparison of imprecise spatial objects, structural pattern recognition, etc. Roughly speaking, these applications can be grouped into two classes. The first class deals with distances dedicated to the comparison of shapes, these shapes being possibly contained in different images, or representing one image object and one model object. The concerned applications are related to registration and to recognition. The second class deals with distances between two objects in the same space, and provides measures for quantifying how far one object is from the other. It can also serve for model-based pattern recognition, as a relationship between image (respectively model) objects. For instance, if we consider a graph-based recognition method, where the objects of the scene are the nodes of the graph, then
distances of the first class provide a way to compare nodes in two graphs, while distances of the second class can be considered as attributes of the arcs between two nodes in each graph [7,127]. While in the crisp case geodesic distance and distance from a point to a compact set are well defined, several definitions exist for distances between two sets. The main ones are the following.
Nearest point or minimum distance:

$$d_N(X, Y) = \inf_{x \in X, \, y \in Y} d_S(x, y), \qquad (2)$$

where X and Y are two crisp subsets of S (in the finite case, the infimum is replaced by a minimum).

Maximum distance:

$$d_M(X, Y) = \sup_{x \in X, \, y \in Y} d_S(x, y). \qquad (3)$$

Average distance:

$$d_A(X, Y) = \frac{1}{|X| \, |Y|} \sum_{x \in X, \, y \in Y} d(x, y), \qquad (4)$$

or in a different form:

$$d_{A'}(X, Y) = \frac{1}{|X|} \sum_{x \in X} d(x, Y) + \frac{1}{|Y|} \sum_{y \in Y} d(y, X), \qquad (5)$$

where d(x, Y) denotes the distance from x to Y:

$$d(x, Y) = \inf_{y \in Y} d_S(x, y). \qquad (6)$$

Hausdorff distance:

$$d_H(X, Y) = \max \left[ \sup_{x \in X} d(x, Y), \; \sup_{y \in Y} d(y, X) \right]. \qquad (7)$$
Note that only the Hausdorff distance is a true distance, satisfying all properties of a metric (see Section III.E). In all these definitions, the objects are supposed to be given, and the aim is to evaluate the distance between them.
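These classical definitions can be sketched directly for finite point sets; a minimal sketch (the function names are ours):

```python
import numpy as np

def d_S(x, y):
    """Euclidean distance between two points."""
    return float(np.linalg.norm(np.asarray(x) - np.asarray(y)))

def nearest_point(X, Y):        # Eq. (2)
    return min(d_S(x, y) for x in X for y in Y)

def maximum(X, Y):              # Eq. (3)
    return max(d_S(x, y) for x in X for y in Y)

def average(X, Y):              # Eq. (4)
    return sum(d_S(x, y) for x in X for y in Y) / (len(X) * len(Y))

def point_to_set(x, Y):         # Eq. (6)
    return min(d_S(x, y) for y in Y)

def hausdorff(X, Y):            # Eq. (7)
    return max(max(point_to_set(x, Y) for x in X),
               max(point_to_set(y, X) for y in Y))

X = [(0, 0), (1, 0)]
Y = [(4, 0), (5, 0)]
print(nearest_point(X, Y))  # 3.0
print(maximum(X, Y))        # 5.0
print(average(X, Y))        # 4.0
print(hausdorff(X, Y))      # 4.0
```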
Another type of question is the satisfaction of some given distance relationship between two objects or with respect to a given object. For instance, we may want to answer questions such as: to what extent are two objects near to each other? or which are the areas of the space that are at a distance of about 10 from a given object? The first type of question only requires a comparison measure between the fuzzy distance and the fuzzy set defining the semantics of ‘‘near.’’ On the contrary, the second type of question requires a completely different approach, since only one set is known. We proposed in [22] spatial representations of spatial relations in order to solve such problems. This approach is detailed for the specific case of distances in Section VII. Finally, if little information is available, it may be useful to handle distance information in a completely qualitative way, using symbolic approaches such as formal logics. This will be addressed in Section VIII, following the formalism proposed in [23,24].
D. General Principles for Defining a Fuzzy Distance

In this section we briefly summarize the main approaches that can be followed in order to define a fuzzy distance. These include:
- approaches that rely on the definition of a crisp distance and try to generalize it,
- approaches that infer a distance from a similarity function,
- approaches that deduce a distance from set relationships between the two sets (or other types of relationships),
- symbolic approaches.
1. Generalizing a Crisp Distance to a Fuzzy One

We first consider the class of approaches that define a fuzzy distance by extending a given crisp distance. They belong to the general problem of extending a relationship RB between two binary objects to its fuzzy equivalent R (a fuzzy relationship between two fuzzy objects). Instantiations of the described methods for the case of distance are provided in Sections VI.A and VI.B.

From α-cuts. A way to define crisp sets from a fuzzy set consists in taking the α-cuts of this set. Therefore one class of methods relies on the application of the relationship RB to each α-cut. This gives rise to two different ‘‘fuzzification’’ methods in the literature.
The first fuzzification method consists in ‘‘stacking’’ the results obtained with binary operations on the α-cuts: the fuzzy equivalent R of RB is defined as (see, e.g., [29,60,102]):

$$R(\mu, \nu) = \int_0^1 R_B(\mu_\alpha, \nu_\alpha) \, d\alpha, \qquad (8)$$

where μα denotes the α-cut of μ, or by a double integration as:

$$R(\mu, \nu) = \int_0^1 \int_0^1 R_B(\mu_\alpha, \nu_\beta) \, d\alpha \, d\beta. \qquad (9)$$

Other fuzzification equations are possible, like:

$$R(\mu, \nu) = \sup_{\alpha \in [0,1]} \min\left(\alpha, R_B(\mu_\alpha, \nu_\alpha)\right) \quad \text{or} \quad R(\mu, \nu) = \sup_{\alpha \in [0,1]} \left(\alpha \, R_B(\mu_\alpha, \nu_\alpha)\right), \qquad (10)$$

the first of these equations being meaningful if RB takes values in [0, 1]. This approach has been applied to the definition of several fuzzy operations, for instance connectivity [137], fuzzy mathematical morphology [29], fuzzy adjacency [30], and of course distances [14,28,60], as will be seen later. As mentioned in [40,71] for instance, this approach has to be used with care in case of empty α-cuts. The second fuzzification method is the extension principle [162], which leads in the general case to a fuzzy number (instead of a crisp number):

$$\forall n \in V(R_B), \quad R(\mu, \nu)(n) = \sup \{ \alpha \in [0,1], \; R_B(\mu_\alpha, \nu_\alpha) = n \}, \qquad (11)$$
where V(RB) denotes the image of RB, i.e., the set of values taken by RB (ℝ⁺ or [0, 1] in the case of distances).

Translating binary equations into fuzzy ones. Another way to proceed, in order to derive a fuzzy definition from a crisp one, consists in translating binary equations into their fuzzy equivalents: intersection is replaced by a t-norm, union by a t-conorm, sets by membership functions, etc. Examples can be found for defining fuzzy morphology [29], fuzzy inclusion [144], etc. This translation is particularly straightforward if the binary relationship can be expressed in set-theoretical and logical terms. This can be obtained in a natural way for several distances, like the nearest point distance or the Hausdorff distance [14]. This remark endows methods based on mathematical morphology with a particular interest, since mathematical morphology is mainly based on set theory. This approach will be used in Section VI.B.
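The first fuzzification method, Eq. (8), can be sketched by applying a crisp distance (here the nearest-point distance of Eq. (2)) to sampled α-cuts; the grid objects and the sampling of α are our illustrative assumptions:

```python
import numpy as np

def d_S(x, y):
    return float(np.hypot(x[0] - y[0], x[1] - y[1]))

def nearest_point(X, Y):
    """Crisp nearest-point distance d_N of Eq. (2); +inf for empty sets."""
    if not X or not Y:
        return float("inf")
    return min(d_S(x, y) for x in X for y in Y)

def alpha_cut(mu, alpha):
    """The crisp set {x in S, mu(x) >= alpha} as a list of grid points."""
    return [tuple(p) for p in np.argwhere(mu >= alpha)]

def fuzzify(mu, nu, R_B, alphas=np.arange(1, 101) / 100):
    """Eq. (8): R(mu, nu) as the integral over alpha of R_B(mu_alpha, nu_alpha),
    approximated by averaging over sampled alpha levels."""
    return float(np.mean([R_B(alpha_cut(mu, a), alpha_cut(nu, a))
                          for a in alphas]))

# Two fuzzy objects on a 1x10 line of pixels.
mu = np.zeros((1, 10)); mu[0, 0:2] = [1.0, 0.5]
nu = np.zeros((1, 10)); nu[0, 8:10] = [0.5, 1.0]

# d_N is 7 on the cuts with alpha <= 0.5 and 9 above, so the integral is 8.
print(fuzzify(mu, nu, nearest_point))  # 8.0
```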
2. Distances from Similarity

Since a distance can be derived formally from a similarity measure (see, e.g., [5,90,95,109,158,161]), the problem amounts to defining the similarity measure. This can be addressed using one of the previous methods, given a similarity between crisp sets. However, because of the links between similarity and pattern recognition problems, this approach is often used for comparing objects based on some features, possibly fuzzy ones, that are extracted from the spatial information in preliminary stages. Then the similarity concerns these features, and not the objects as spatial fuzzy sets. This may explain why this approach leads mainly to distances dealing with membership functions only (Section VI.A). Similarity-based approaches can benefit from the existing algorithms for checking whether a relation is a similarity, in particular whether it satisfies the transitivity property (e.g., [131,149]).
3. Distances from Set Relationships

Set relationships provide a lot of information for the comparison of objects, typically in the case where image objects have to be compared with some models or prototypes. Similar objects are expected to overlap strongly and to have reduced differences. We have chosen to present here the approach proposed in [37,136], where a very useful typology of comparison measures is introduced. In this work, a comparison measure is generally defined as a function of three variables FS[M(μ ∩ ν), M(μ − ν), M(ν − μ)], where M is a fuzzy set measure (e.g., fuzzy cardinality) and − denotes a difference operator (such that μ − μ = ∅, and ν ⊆ ν′ implies μ − ν′ ⊆ μ − ν). This approach is close to Tversky's definitions [151]. Then specific types of comparison measures are defined:

- a similitude measure is a comparison measure such that FS(x, y, z) is nondecreasing with respect to x and nonincreasing with respect to y and z (this corresponds to the fact that two fuzzy sets are more similar if they have a greater intersection and less difference);
- a satisfiability measure is a similitude measure such that FS(0, y, z) = 0, FS(x, 0, z) = 1, and which does not depend on z (this corresponds to the case where the first object is considered as a reference to which the other is compared);
- an inclusion measure is a reflexive similitude measure such that FS(0, y, z) = 0 and FS does not depend on z;
- a resemblance measure is a symmetrical and reflexive measure;
- a dissimilarity measure is a comparison measure taking value 0 if μ = ν, and such that FS is independent of x and increasing with respect to y and z.
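One concrete instance of this typology can be sketched as follows; the choices of M, the intersection, the difference operator, and the ratio form of FS are our illustrative assumptions, in the spirit of Tversky's ratio model:

```python
import numpy as np

# Comparison measure F_S[M(mu ∩ nu), M(mu - nu), M(nu - mu)] on membership
# arrays; all concrete operator choices below are illustrative.
M = lambda f: float(np.sum(f))            # fuzzy cardinality as measure M
inter = lambda a, b: np.minimum(a, b)     # intersection as the min t-norm
diff = lambda a, b: np.maximum(a - b, 0)  # bounded difference as operator "-"

def similitude(mu, nu):
    """Ratio form x / (x + y + z): nondecreasing in x, nonincreasing in y, z."""
    x, y, z = M(inter(mu, nu)), M(diff(mu, nu)), M(diff(nu, mu))
    return x / (x + y + z) if x + y + z > 0 else 1.0

mu = np.array([0.0, 0.5, 1.0, 0.5])
nu = np.array([0.0, 0.5, 1.0, 0.0])
print(similitude(mu, nu))      # 0.75: x = 1.5, y = 0.5, z = 0
print(1 - similitude(mu, nu))  # a normalized distance derived from it
```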
A distance between two fuzzy sets can be derived from a dissimilarity measure, or from 1 − FS if FS defines a similitude measure. Several distances that have been proposed in the literature can be classified from this point of view.

4. Distances from Other Relationships

When distances are mainly used for comparing shapes, they may be derived from other relationships between objects, not only metric ones. Set relationships can be used as shown in the previous section, but so can several other ones, like geometrical features extracted from the objects or any other type of attribute, and topological relationships like ‘‘overlap’’ and ‘‘meet’’ [44,48,118]. Since such measures do not necessarily include information on the spatial distance, they are mainly found in the first class of definitions (Section VI.A) and used for model-based pattern recognition, for approaches relying on prototypes, and for applications like indexing and searching in image databases (e.g., [44,145]). Such methods are often related to similarity-based measures.

5. Symbolic Approaches

By ‘‘symbolic approaches’’ we mean methods that try to define linguistic variables representing distances, or to reason with purely symbolic expressions (the last types of representation mentioned before). For instance, in image processing the problem amounts to deriving symbolic representations from the numerical information carried by the image and from computations on it (see, e.g., [102]). These representations then provide a kind of summarization of the image content related to metric information. Distance information can be represented using words such as ‘‘close’’ and ‘‘far’’ [72], which constitute the coarsest level of granularity, or with further levels of granularity (e.g., ‘‘very close,’’ ‘‘close,’’ ‘‘medium,’’ ‘‘far,’’ ‘‘very far’’) [49].
Relative distances are also useful in qualitative reasoning, and use words such as ‘‘closer.’’ Such information can be handled either in the semiqualitative framework of fuzzy sets, where fuzzy sets are used to define the semantics of the linguistic values, or in a purely qualitative framework, using logics.
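As an illustration, a two-class granulation of distance values into ‘‘near’’ and ‘‘far’’ can be sketched as follows; the trapezoidal breakpoints (10 and 30) are our assumptions, not values from the chapter:

```python
# Linguistic classes "near" and "far" over distance values, each defined
# as a fuzzy set on R+; the breakpoints are illustrative.
def near(d):
    if d <= 10:
        return 1.0
    if d >= 30:
        return 0.0
    return (30 - d) / 20  # linear transition between the two classes

def far(d):
    return 1.0 - near(d)  # complementary class

for d in (5, 20, 40):
    print(d, near(d), far(d))  # 5 -> fully near, 20 -> 0.5/0.5, 40 -> fully far
```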
It should be noted that even a statement containing a precise value, as often used in common language (the distance between town A and town B is 300 km), should often be considered as an imprecise statement, and should preferably be modeled as such.
E. Properties of Distances and Requirements for Spatial Distances For most applications such as pattern recognition, scene interpretation, and path planning, one basic property that should be satisfied by spatial distances is invariance under geometric transformation, in particular rigid ones. Therefore the first required property is expressed as:
P0: invariance under rotation and translation.
Since the definitions summarized in this chapter do not always satisfy strictly the properties of a distance (or metric), we should rather speak of more general proximity functions. However, for the sake of simplicity we will keep the term distance. The main classes of proximity measures are recalled in this section. A metric is a positive function d such that:
P1: ∀μ ∈ F, d(μ, μ) = 0 (reflexivity);
P2: ∀(μ, ν) ∈ F², d(μ, ν) = 0 ⇒ μ = ν (separability);
P3: ∀(μ, ν) ∈ F², d(μ, ν) = d(ν, μ) (symmetry);
P4: ∀(μ, ν, ρ) ∈ F³, d(μ, ν) ≤ d(μ, ρ) + d(ρ, ν) (triangular inequality).
Several kinds of measures can be defined with fewer requirements: a pseudometric is a function satisfying P1, P3, and P4 (separability does not necessarily hold), a semimetric satisfies P1, P2, and P3 (and not the triangular inequality), a semipseudometric satisfies only P1 and P3, etc. (see, e.g., [113]). For instance, in the crisp case the Hausdorff distance is a metric, while the nearest point (or minimum) distance satisfies neither the separability property (since any two intersecting sets are at a zero minimum distance) nor the triangular inequality, and is therefore a semipseudometric. Since distances may be derived from similarity measures, we recall here the definition of this concept. A similarity relation [161] is a function s taking values in [0, 1], such that: (1) ∀μ ∈ F, s(μ, μ) = 1 (reflexivity); (2) ∀(μ, ν) ∈ F², s(μ, ν) = s(ν, μ) (symmetry); (3) ∀(μ, ν, ρ) ∈ F³, t[s(μ, ρ), s(ρ, ν)] ≤ s(μ, ν) (t-transitivity, where t is a t-norm). A similarity relation is also called a t-indistinguishability or t-equivalence.
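These axiom classes can be checked mechanically on finite examples; a sketch (the classifier and its names are ours) that recovers the semipseudometric status of the nearest-point distance mentioned above:

```python
from itertools import product

def classify(d, objs, tol=1e-9):
    """Classify a proximity function according to properties P1-P4."""
    reflexive = all(abs(d(a, a)) <= tol for a in objs)                    # P1
    separable = all(d(a, b) > tol for a in objs for b in objs if a != b)  # P2
    symmetric = all(abs(d(a, b) - d(b, a)) <= tol
                    for a, b in product(objs, repeat=2))                  # P3
    triangle = all(d(a, c) <= d(a, b) + d(b, c) + tol
                   for a, b, c in product(objs, repeat=3))                # P4
    if not (reflexive and symmetric):
        return "not even a semipseudometric"
    if separable and triangle:
        return "metric"
    if triangle:
        return "pseudometric"
    if separable:
        return "semimetric"
    return "semipseudometric"

# Nearest-point distance between finite sets of reals: intersecting sets
# are at distance 0 (no separability) and the triangle inequality fails.
d_N = lambda A, B: min(abs(a - b) for a in A for b in B)
sets = [frozenset({0, 1}), frozenset({1, 2}), frozenset({5})]
print(classify(d_N, sets))  # semipseudometric
```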
If we set d = 1 − s, then d is obviously a semipseudometric. The first property corresponds to P1, the second one to P3. As for the last one, it can be expressed in terms of distance as:

P5: ∀(μ, ν, ρ) ∈ F³, d(μ, ν) ≤ T[d(μ, ρ), d(ρ, ν)], where T is the t-conorm dual to t.

If t = min, then we also have ∀(μ, ν, ρ) ∈ F³, d(μ, ν) ≤ max[d(μ, ρ), d(ρ, ν)], which is the property of an ultrametric. If t is the Lukasiewicz t-norm (i.e., t(a, b) = max(0, a + b − 1), whose dual t-conorm is T(a, b) = min(1, a + b)), then d also satisfies the triangular inequality and is a pseudometric. Therefore P5 implies P4 at least for all t-conorms that are smaller than the Lukasiewicz one. If f is an additive generator (typically like the functions used for generating continuous Archimedean t-norms [63]), then d = f ∘ s is a pseudometric (taking values in ℝ⁺) if and only if the t-norm generated by f is less than t [5]. A similar relationship holds between a metric and a t-equality (i.e., a similarity such that s(μ, ν) = 1 if and only if μ = ν). From a topological point of view, the definition of a metric d on F induces a topology on F, and therefore a continuity. It has been studied for instance in [57] for the case of the fuzzy Hausdorff distance. Partial results can also be obtained if d has fewer properties: if we set cl(μ) = {ν ∈ F, d(μ, ν) = 0} for d a semipseudometric, then the function cl is a preclosure on F, which therefore defines a pretopology on F (see, e.g., [69,112]). Conversely, we may derive a semipseudometric from any (nonidempotent) adherence defined on F. Some other properties, issued from the wide literature on fuzzy similarities, can be transposed to distances (see, e.g., [70,126]):

P6: d(μ, ν) = 1 ⟺ Supp(μ) ∩ Supp(ν) = ∅, where Supp(μ) denotes the support of μ (this property being meaningful for normalized distances).
P7: ∀X ⊆ S (X crisp), d(X, Xᶜ) = max over (μ, ν) ∈ F² of d(μ, ν), which means that the values taken by d are bounded and the maximum value is attained on all crisp sets and their complements; if d is normalized in [0, 1], this maximum value is equal to 1.
P8 (monotony property): ∀(μ, ν, ρ) ∈ F³, μ ⊆ ν ⊆ ρ ⇒ d(μ, ν) ≤ d(μ, ρ) and d(ν, ρ) ≤ d(μ, ρ),
where the fuzzy subsethood ⊆ is defined as the ≤ relation on membership functions.

P9 (additivity with respect to crisp partitions): ∀(μ, ν) ∈ F², ∀X ⊆ S, d(μ, ν) = d(μ ∩ X, ν ∩ X) + d(μ ∩ Xᶜ, ν ∩ Xᶜ), which implies d(μ, ν) = d(μ ∩ ν, μ ∪ ν).
P10: ∀(μ, ν) ∈ F², d(μ, ν) = d(μᶜ, νᶜ), which is a property often required to define approximate proximity.
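The claim that Lukasiewicz t-transitivity forces the triangular inequality for d = 1 − s can be checked numerically; the random-sampling setup below is our own sketch:

```python
import itertools
import random

# Numerical check that t-transitivity for the Lukasiewicz t-norm implies
# the triangular inequality P4 for d = 1 - s.
t = lambda a, b: max(0.0, a + b - 1.0)  # Lukasiewicz t-norm

def lukasiewicz_closure(s, n):
    """Minimally increase s until it is t-transitive (iterate to fixpoint)."""
    changed = True
    while changed:
        changed = False
        for i, j, k in itertools.permutations(range(n), 3):
            v = t(s[i, j], s[j, k])
            if v > s[i, k] + 1e-15:
                s[i, k] = s[k, i] = v
                changed = True
    return s

random.seed(0)
for _ in range(500):
    s = {(i, i): 1.0 for i in range(4)}            # reflexive
    for i, j in itertools.combinations(range(4), 2):
        s[i, j] = s[j, i] = random.random()        # symmetric
    s = lukasiewicz_closure(s, 4)
    d = {p: 1.0 - v for p, v in s.items()}
    for i, j, k in itertools.permutations(range(4), 3):
        assert d[i, k] <= d[i, j] + d[j, k] + 1e-12   # P4 holds
print("d = 1 - s satisfies the triangular inequality")
```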
All these properties are expressed in the case where the distance is a crisp number. If imprecise representations are used, such as intervals or fuzzy numbers, these properties have to be adapted. For instance, saying that a distance is equal to 0 will be replaced by [0, 0] in the case of interval representations, and by a fuzzy number with support and core³ reduced to {0} in the case of fuzzy number representations. Some properties, such as P4, P5, and P8–P10, require addition, inclusion, or union of intervals or of fuzzy numbers. They can also be considered pointwise (in ℝ⁺), for instance at each value n for δ(μ, ν)(n). Although we may speak about distances between spatial objects in a very general way, this expression does not necessarily assume that we are dealing with true metrics. For several applications, it is not certain that all properties are needed. An important use of distances is related to the comparison of shapes, which reinforces the interest of deriving distances from similarities. The concept of similarity between objects, in particular spatial objects, contains some subjective aspects. As already stated by Poincaré at the beginning of the twentieth century, and underlined by several authors in the fuzzy set domain (see, e.g., [64,94]), subjective similarities are not required to be transitive. This induces a loss of the triangular inequality in the derived distance. This question was raised in [50], where the authors suggested more precisely that if t-transitivity is replaced by a pseudometric condition in order to define resemblance relations, then approximate equality is better modeled than when using a fuzzy equivalence relation. This point of view is controversial and has been discussed by several authors [33,34,51,92,101]. This unusual series of successive comments shows that there is no definite answer to the question of transitivity.
Coming back to the spatial domain, typically for applications (in image interpretation for instance) where image objects have to be compared to

³ The support of a fuzzy set is the subset of points having nonzero membership values, while the core is the subset of points having a membership value equal to 1.
models, the triangular inequality is of no use, since the two arguments of the distance function belong to two different sets of objects. For such applications, semimetrics or even semipseudometrics may be sufficient. We may even go further in this direction. Indeed, since a semipseudometric does not satisfy the separability property, the study of the equation d(μ, ν) = 0 can be exploited in terms of pattern recognition. For instance, if we build classes according to prototypes, this equation can be used as a classification rule: every object which is indistinguishable from a prototype will be added to the corresponding class. This has been developed in the context of pretopologies [36,69]. It is the nonidempotency of the adherence function in a pretopology that allows one to aggregate objects into a class. This is again an argument in favor of semipseudometrics. Moreover, when extending crisp distances to fuzzy ones, it is natural to expect that some properties may be lost. In particular, it is difficult to extend a crisp distance while keeping the triangular inequality. Considering the Hausdorff distance, which is a true distance between two sets, i.e., satisfying properties P1–P4, it has been shown in [40] that under reasonable axioms it is not possible to define a fuzzy Hausdorff distance that is a true distance. Another aspect that can be useful in image processing and pattern recognition is the link existing between semimetrics and fuzzy partitions derived from a t-indistinguishability relation. This clearly finds applications as soon as the recognition or classification problem can be stated as the (fuzzy) partitioning of the set of objects. As for property P5, it is in general stronger than P4, and the above discussions apply a fortiori. On the contrary, property P0 (invariance under rigid transformations) is a strong requirement in most applications dealing with spatial distances. Property P1 (reflexivity) is often required and considered quite natural.
On the contrary, the separability property P2 is more difficult to satisfy. As soon as the two objects are considered to play the same role in the evaluation of their distance, it is natural to require P3 (symmetry). Properties P6–P10 are meaningful only for some classes of distances, not involving directly the spatial distance (Section VI.A), and accordingly for problems where an object is compared to a model object by means of distances. For these classes of distances, it can also be interesting to identify them as derived from particular forms of comparison measures, such as similitude, resemblance, and satisfiability. For instance, property P6 is not desirable if the spatial distance has to be taken into account. Let us take as an example the case of two objects μ and ν such that Supp(μ) ∩ Supp(ν) = ∅ and Supp(μ) ∩ Supp(ν + t) = ∅, where ν + t denotes the translation of ν by t. Then we may expect that d(μ, ν) ≠ d(μ, ν + t), which is not possible under property P6. Similarly as for P6, property P7 is meaningful only when comparing the membership functions of the two objects. For instance, if we erode a set X, we can expect that, in a spatial sense, d(E(X), Xᶜ) > d(X, Xᶜ), where E(X) denotes the erosion of X by a structuring element containing the origin of the space, and therefore d(X, Xᶜ) cannot be maximal.
IV. GEODESIC DISTANCE IN A FUZZY SET
Although the concept of geodesy is very important for crisp sets and should be promising for fuzzy sets as well, this topic has not been much addressed in the literature. Besides our previous work [15], we could find only one other work on the subject [139].
A. Fuzzy Geodesic Distance Defined as a Number

We proposed in [15] original definitions for the distance between two points in a fuzzy set, extending the notion of geodesic distance. Among these definitions, one proved to have desirable properties and was therefore considered better than the others. We recall here this definition and the main results we obtained. The geodesic distance between two points x and y represents the length of the shortest path between x and y that ‘‘goes out of μ as little as possible.’’ A formal definition of this concept relies on the degree of connectivity, as defined by Rosenfeld [137]. In the case where S is a discrete bounded space (as is usually the case in image processing), the degree of connectivity in μ between any two points x and y of S is defined as:

$$c_\mu(x, y) = \max_{L_i \in L} \left[ \min_{t \in L_i} \mu(t) \right], \qquad (12)$$

where L denotes the set of all paths from x to y. Each possible path Li from x to y is constituted by a sequence of points of S according to the discrete connectivity defined on S. We denote by L*(x, y) a shortest path between x and y on which cμ is reached (this path, not necessarily unique, can be interpreted as a geodesic path descending as little as possible in the membership degrees), and we denote by l(L*(x, y)) its length (computed in the discrete case from the
FIGURE 1. The geodesic distance in a fuzzy set between two points x and y in a 2D space.
number of points belonging to the path). Then we define the geodesic distance in μ between x and y as:

$$d_\mu(x, y) = \frac{l(L^*(x, y))}{c_\mu(x, y)}. \qquad (13)$$
If cμ(x, y) = 0, we have dμ(x, y) = +∞, which corresponds to the result obtained with the classical geodesic distance in the case where x and y belong to different connected components (actually it corresponds to the generalized geodesic distance, where infinite values are allowed). This definition corresponds to the weighted geodesic distance (in the classical sense) computed in the α-cut of μ at level α = cμ(x, y). In this α-cut, x and y belong to the same connected component (for the considered discrete crisp connectivity). This definition is illustrated in Figure 1. This definition satisfies the following set of properties (see [15] for the proof):

(1) positivity: ∀(x, y) ∈ S², dμ(x, y) ≥ 0;
(2) symmetry: ∀(x, y) ∈ S², dμ(x, y) = dμ(y, x);
(3) separability: ∀(x, y) ∈ S², dμ(x, y) = 0 ⟺ x = y;
(4) dμ depends on the shortest path between x and y that ‘‘goes out’’ of μ ‘‘as little as possible,’’ and dμ tends towards infinity if it is not possible to find a path between x and y without going through a point t such that μ(t) = 0;
(5) dμ is decreasing with respect to μ(x) and μ(y);
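A minimal discrete sketch of Eqs. (12) and (13), assuming 4-connectivity and a path length counted in points; the grid and function names are our own:

```python
import heapq
from collections import deque

# Illustrative membership function mu on a small 2D grid.
mu = [
    [1.0, 0.3, 1.0],
    [0.0, 0.3, 0.0],
    [0.0, 0.0, 0.0],
]

def neighbors(p, h, w):
    r, c = p
    for n in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
        if 0 <= n[0] < h and 0 <= n[1] < w:
            yield n

def connectivity(mu, x, y):
    """c_mu(x, y) of Eq. (12): max over paths of the min membership,
    computed by a widest-path variant of Dijkstra's algorithm."""
    h, w = len(mu), len(mu[0])
    best = {x: mu[x[0]][x[1]]}
    heap = [(-best[x], x)]
    while heap:
        nb, p = heapq.heappop(heap)
        b = -nb
        if p == y:
            return b
        if b < best.get(p, 0.0):
            continue
        for n in neighbors(p, h, w):
            cand = min(b, mu[n[0]][n[1]])
            if cand > best.get(n, -1.0):
                best[n] = cand
                heapq.heappush(heap, (-cand, n))
    return 0.0

def geodesic(mu, x, y):
    """d_mu(x, y) of Eq. (13): shortest path length (in points) within the
    alpha-cut at level c_mu(x, y), divided by c_mu(x, y)."""
    c = connectivity(mu, x, y)
    if c == 0.0:
        return float("inf")
    h, w = len(mu), len(mu[0])
    dist = {x: 1}                      # number of points on the path so far
    q = deque([x])
    while q:
        p = q.popleft()
        if p == y:
            return dist[p] / c
        for n in neighbors(p, h, w):
            if mu[n[0]][n[1]] >= c and n not in dist:
                dist[n] = dist[p] + 1
                q.append(n)
    return float("inf")

print(connectivity(mu, (0, 0), (0, 2)))  # 0.3
print(geodesic(mu, (0, 0), (0, 2)))      # 10.0 (3 points / 0.3)
```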
(6) dμ is decreasing with respect to cμ(x, y);
(7) dμ is equal to the classical geodesic distance if μ is crisp.

The triangular inequality is not satisfied, but from this definition it is possible to build a true distance, satisfying the triangular inequality, while keeping all other properties. This can be achieved in the following way (see [15] for proof and details):

$$d'_\mu(x, y) = \min_{t \in S} \left[ \frac{l(L^*(x, t))}{c_\mu(x, t)} + \frac{l(L^*(t, y))}{c_\mu(t, y)} \right].$$
Unfortunately this is computationally expensive. These properties are in agreement with what can be required from a fuzzy geodesic distance, both mathematically and intuitively. The definition proposed in [139] corresponds to one of the definitions proposed in [15]: it is the length of the shortest path between the two considered points, the length being computed as the integral of the membership values along the path. Unfortunately, this definition does not meet all the requirements we have here, since it does not satisfy the separability property and does not have the appropriate behavior with respect to the membership values (properties (4)–(6) in the preceding discussion). Indeed, the best path can go through points with very low membership values (which tend to decrease the length), i.e., go out of the set to some extent. However, one advantage of this distance is that it allows the authors of [139] to derive algorithms for computing the fuzzy distance transform.
B. Fuzzy Geodesic Distance Defined as a Fuzzy Number

In the previous approach, the geodesic distance between two points is defined as a crisp number (i.e., a standard number). It could also be defined as a fuzzy number, taking into account the fact that, if the set is imprecisely defined, geodesic distances in this set can be imprecise too (as mentioned in Section III.B). This is the scope of this section. One solution to achieve this aim is to use the extension principle, based on a combination of the geodesic distances computed on each α-cut of μ. Let us denote by d_{μ_α}(x, y) the geodesic distance between x and y in the crisp set μ_α. Using the extension principle, we define the degree to which the geodesic distance between x and y in μ is equal to d as:

∀d ∈ ℝ⁺, d_μ(x, y)(d) = sup{α ∈ [0, 1], d_{μ_α}(x, y) = d}.  (14)
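As an illustration of Equation (14), the following sketch computes this fuzzy number on a hypothetical one-dimensional discrete grid: the geodesic distance in each α-cut is obtained by breadth-first search, and α is sampled at 100 levels. The grid, the connectivity, and the sampling are choices made here for illustration, not prescribed by the definition.

```python
from collections import deque

def alpha_cut_geodesic(mu, x, y, alpha):
    """Geodesic distance from x to y by BFS, restricted to cells with mu >= alpha."""
    if mu[x] < alpha or mu[y] < alpha:
        return None                      # x or y is outside this alpha-cut
    seen, queue = {x}, deque([(x, 0)])
    while queue:
        i, d = queue.popleft()
        if i == y:
            return d
        for j in (i - 1, i + 1):         # 1D neighborhood
            if 0 <= j < len(mu) and j not in seen and mu[j] >= alpha:
                seen.add(j)
                queue.append((j, d + 1))
    return None                          # x and y are disconnected in this alpha-cut

def fuzzy_geodesic(mu, x, y, levels=None):
    """Eq. (14): map each finite distance d to sup{alpha : d_{mu_alpha}(x, y) = d}."""
    levels = levels or [k / 100 for k in range(1, 101)]
    out = {}
    for a in levels:
        d = alpha_cut_geodesic(mu, x, y, a)
        if d is not None:
            out[d] = max(out.get(d, 0.0), a)
    return out

mu = [0.2, 0.9, 0.3, 0.9, 0.8]           # hypothetical 1D fuzzy set
print(fuzzy_geodesic(mu, 1, 3))          # → {2: 0.3}
```

On this example the only finite geodesic distance is 2, reached in the α-cuts up to level 0.3 (the height μ_c(1, 3) realized by the intermediate point), so the fuzzy number is concentrated at d = 2 with degree 0.3.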
ISABELLE BLOCH
FIGURE 2. Typical shape of the fuzzy geodesic distance between two points in a fuzzy set, defined as a fuzzy number.
This definition satisfies the following properties:

(1) If α > μ_c(x, y), then x and y belong to two distinct connected components of μ_α.⁴ In this case, the (generalized) geodesic distance is infinite. If we restrict the evaluation of d_μ(x, y)(d) to finite distances d, then d_μ(x, y)(d) = 0 for d > d_{μ_c(x,y)}(x, y).
(2) Let d_S(x, y) denote the Euclidean distance between x and y. It is the shortest of the geodesic distances that can be obtained in any crisp set that contains x and y. This set can be, for instance, the whole space S, which can be assimilated to the α-cut of level 0 (μ_0). Therefore, for d < d_S(x, y), we have d_μ(x, y)(d) = 0.
(3) Since the α-cuts are nested (μ_α ⊆ μ_{α'} for α > α'), it follows that d_{μ_α}(x, y) is increasing in α, for α ≤ μ_c(x, y).

Therefore, d_μ(x, y) is a fuzzy number, with a maximum value for d_{μ_c(x,y)}(x, y), and with a discontinuity at this point. Its shape looks as shown in Figure 2. This definition can be normalized by dividing all values by μ_c(x, y), in order to get a maximum membership value equal to 1. One drawback of this definition is the discontinuity at d_{μ_c(x,y)}(x, y). It also corresponds to the discontinuity existing in the crisp case when x and y belong to parts that become disconnected. Further work aims at exploiting
⁴ Since μ_c(x, y) corresponds to the "height" (in terms of membership values) of the lowest point along the path that connects x and y, i.e., the maximum over paths from x to y of the minimal height along the path.
features of fuzzy set theory in order to avoid this discontinuity, if this is found desirable.

The fuzzy geodesic distance can be used to define geodesic balls, which can serve as structuring elements for defining fuzzy geodesic mathematical morphology, as shown in [21]. Conversely, in the discrete crisp case, geodesic morphology (and hence geodesic distance) can be obtained by iterating Euclidean morphological operations. Now, we exploit this idea to define a new geodesic distance as a fuzzy number. Let D^n_X(Y) denote the geodesic dilation of Y in X of size n. In the discrete crisp case, we have:

D^n_X(Y) = (D(Y) ∩ X)^n,  (15)
where D(Y) denotes the Euclidean dilation of Y of size 1 and the exponent represents the number of iterations of the conditional dilation. This expression allows us to express the geodesic distance from a point x to a set Y conditionally to X as:

d_X(x, Y) = n ⟺ x ∉ (D(Y) ∩ X)^{n−1} and x ∈ (D(Y) ∩ X)^n,  (16)

from which we can derive the geodesic distance between two points x and y in a set X by considering y as a singleton set:

d_X(x, y) = n ⟺ x ∉ (D({y}) ∩ X)^{n−1} and x ∈ (D({y}) ∩ X)^n.  (17)
By extending this equation to the fuzzy case using the translation principle, we define the geodesic distance between two points x and y in a fuzzy set μ by:

d_μ(x, y)(n) = t[ c[[t(D_ν(μ_(y))(x), μ(x))]^{n−1}], [t(D_ν(μ_(y))(x), μ(x))]^{n} ],  (18)

where the exponent still denotes the number of iterations, t is a t-norm, c is a fuzzy complementation (usually c(a) = 1 − a), ν denotes an elementary structuring element, and μ_(y) denotes the fuzzy set of support {y} and value μ(y). The structuring element ν can be the unit crisp structuring element according to the chosen digital connectivity on S (as in the crisp case), or a fuzzy set representing the imprecision attached to the smallest spatial entities.
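In the discrete crisp case, Equations (15)–(17) can be implemented directly by iterating the conditional dilation until x is reached. A minimal sketch on a hypothetical 2D pixel grid with 4-connectivity (so the unit "ball" here is the city-block one, one possible choice of digital connectivity):

```python
def dilate(Y, shape):
    """Unit dilation of a set of pixels, 4-connectivity (city-block unit ball)."""
    h, w = shape
    out = set(Y)
    for (i, j) in Y:
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if 0 <= i + di < h and 0 <= j + dj < w:
                out.add((i + di, j + dj))
    return out

def geodesic_distance(X, x, y, shape):
    """Eq. (17): smallest n such that x belongs to (D({y}) ∩ X)^n."""
    current = {y} & X                    # conditional dilation starts from {y} ∩ X
    n = 0
    while x not in current:
        grown = dilate(current, shape) & X
        if grown == current:             # stable: x lies in another component of X
            return None
        current, n = grown, n + 1
    return n

X = {(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)}          # an L-shaped hypothetical set
print(geodesic_distance(X, (0, 0), (2, 2), (3, 3)))   # → 4
```

On an L-shaped set, the geodesic distance between the two ends is the length of the path inside the set (4 here), larger than the straight-line distance; if x and y lie in different components, the iteration stabilizes and no finite distance is returned.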
V. DISTANCE FROM A POINT TO A FUZZY SET
A. As a Number

Distances from a point to a fuzzy set can be defined using a weighting approach or using a fuzzification from α-cuts. In this way, they are defined as numbers. The idea in the weighting approach is that a point that has a low membership value to μ should have less influence in the computation of the infimum (or minimum). Therefore the distance between x and μ may be defined as:

d(x, μ) = inf_{y∈S} [d_S(x, y) f(μ(y))],  (19)

where f is a decreasing function of μ (e.g., f(μ(y)) = 1/μ(y)) such that f(1) < +∞ (in order to guarantee that if x belongs completely to μ, i.e., if μ(x) = 1, the distance is attained for y = x), and with the convention 0 · f(0) = +∞. If μ(x) = 0, i.e., if x is completely outside of μ, this definition leads to satisfactory results. However, if μ(x) > 0, it always leads to 0, on the whole support of μ. This can be seen as a strong drawback of this definition, since we would intuitively rather expect that d(x, μ) depend on the membership degree of x to μ. Generally speaking, it is required that d(x, μ) be a strictly decreasing function of μ(x), with d(x, μ) = 0 if μ(x) = 1.

Defining a fuzzy function from its crisp equivalent applied on the α-cuts is a very common way to proceed, which has already been used for defining several operations on fuzzy sets [60]. The two following equations express different combinations of the α-cuts for defining d(x, μ):

d(x, μ) = ∫_0^1 d(x, μ_α) dα,  (20)

d(x, μ) = sup_{α ∈ ]0,1]} [α d(x, μ_α)].  (21)
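Both combinations can be approximated on a discrete grid by sampling α. A sketch, assuming a hypothetical one-dimensional space and 100 equally spaced levels (a Riemann-sum approximation of Equation (20), and a sampled version of Equation (21)):

```python
def crisp_point_set_distance(x, cut):
    """d(x, X) on a 1D grid: min |x - y| over y in the crisp set; inf if empty."""
    return min((abs(x - y) for y in cut), default=float("inf"))

def d_integral(x, mu, steps=100):
    """Eq. (20): integral over alpha of d(x, mu_alpha), as a Riemann sum."""
    total = 0.0
    for k in range(1, steps + 1):
        a = k / steps
        cut = [y for y, m in enumerate(mu) if m >= a]
        total += crisp_point_set_distance(x, cut)
    return total / steps

def d_sup(x, mu, steps=100):
    """Eq. (21): sup over alpha of alpha * d(x, mu_alpha), sampled."""
    best = 0.0
    for k in range(1, steps + 1):
        a = k / steps
        cut = [y for y, m in enumerate(mu) if m >= a]
        best = max(best, a * crisp_point_set_distance(x, cut))
    return best

mu = [0.0, 0.5, 1.0, 0.5, 0.0]           # normalized, so no alpha-cut is empty
print(d_integral(0, mu), d_sup(0, mu))   # → 1.5 2.0
```

For this example, d_integral gives 1.5, while d_sup gives 2.0, which is exactly the distance from x = 0 to the core μ_1 = {2}.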
The first one consists in "stacking" the results obtained on each α-cut, while the second one consists in weighting these results by the level of the cut, d(x, μ_α) being the classical distance from a point to a crisp set. Equation (21) does not lead to convenient results, since the obtained distance is always the distance from x to the core of μ, i.e., d(x, μ) = d(x, μ_1), and therefore does not depend on μ(x) if μ(x) ≠ 1. Equation (20) does not share the same disadvantage, since all α-cuts are explicitly involved in the result. For instance, for μ and ν having the same core and μ(x) > ν(x), we have d(x, μ) < d(x, ν).

B. As a Fuzzy Number

In the crisp case, the distance from a point x to a set X can be expressed in morphological terms, for n > 0:

d(x, X) = 0 ⟺ x ∈ X,  (22)

d(x, X) = n ⟺ x ∈ D^n(X) and x ∉ D^{n−1}(X),  (23)

where D^n denotes the dilation by a ball of radius n centered at the origin of S (and D^0(X) = X) (see, e.g., [35] for a study of discrete balls and discrete distances in the crisp case). In this case, the extensivity property of the dilation holds [142], and x ∉ D^{n−1}(X) is equivalent to ∀n' < n, x ∉ D^{n'}(X). Equation (23) is equivalent to:

x ∈ D^n(X) ∩ [D^{n−1}(X)]^C,  (24)

where A^C denotes the complement set of A in S. This is a pure set theoretical expression, which we can now translate into fuzzy terms. This leads to the following definition of the degree to which d(x, μ) is equal to n:

δ_(x,μ)(0) = μ(x),  (25)

δ_(x,μ)(n) = t[ D^n_ν(μ)(x), c[D^{n−1}_ν(μ)(x)] ],  (26)
where t is a t-norm (fuzzy intersection), c a fuzzy complementation (typically c(a) = 1 − a for a ∈ [0, 1]), and ν a fuzzy structuring element used for performing the dilation. As in Section IV, several choices of ν are possible.
It can be simply the unit ball, or a fuzzy set representing for instance the smallest sensitive unit in the image, along with the imprecision attached to it. In this case, ν has to be equal to 1 at the origin of S, such that the extensivity of the dilation still holds [29]. The properties of this definition are the following [14]:

- if μ(x) = 1, then δ_(x,μ)(0) = 1 and ∀n > 0, δ_(x,μ)(n) = 0, i.e., the distance is a crisp number in this case;
- if μ and ν are binary, the proposed definition coincides with the binary one;
- the fuzzy set δ_(x,μ) can be interpreted as a distance density, from which a distance distribution can be deduced by integration (see Section III.B);
- finally, δ_(x,μ) is a nonnormalized fuzzy number (in the discrete finite case).
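Equations (25) and (26) can be sketched on a hypothetical one-dimensional grid, choosing t = min, c(a) = 1 − a, and the crisp unit structuring element (the simplest of the admissible choices discussed above, not the only ones):

```python
def dilate_fuzzy(mu):
    """Unit fuzzy dilation on a 1D grid (crisp unit structuring element)."""
    n = len(mu)
    return [max(mu[max(0, i - 1):i + 2]) for i in range(n)]

def delta_point_set(x, mu, n_max):
    """Eqs. (25)-(26) with t = min and c(a) = 1 - a: degree that d(x, mu) = n."""
    degrees = [mu[x]]                              # Eq. (25): the case n = 0
    prev, cur = mu, dilate_fuzzy(mu)
    for _ in range(1, n_max + 1):
        degrees.append(min(cur[x], 1 - prev[x]))   # Eq. (26)
        prev, cur = cur, dilate_fuzzy(cur)
    return degrees

mu = [0.0, 0.0, 0.3, 1.0, 0.3]
print(delta_point_set(0, mu, 4))   # approximately [0.0, 0.0, 0.3, 0.7, 0.0]
```

For this example, where x = 0 lies two to three steps from the significant part of μ, the resulting nonnormalized fuzzy number puts its mass on n = 2 and n = 3.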
Figure 3 presents an example of the fuzzy numbers δ_(x,μ)(n) obtained for different points, the spatial domain being reduced to a one-dimensional space in this example. The point x_1 is outside the support of μ and at a larger distance from it than x_2. The results correspond to intuition, since the fuzzy number δ_(x_2,μ)(n) is more concentrated around very small values of n than δ_(x_1,μ)(n). An example in a two-dimensional space is given in Figure 4. The distances of three points to the fuzzy set μ are computed, for three different t-norms (min, product, and Lukasiewicz). The coordinates of these points are, respectively, (25, 40) (point A, with high membership value to μ), (26, 25) (point B, at the border of μ, with low membership value), and (60, 10) (point C, outside of the support of μ). These points are superimposed on μ in Figure 4. The results are given in Figure 5.
FIGURE 3. Fuzzy numbers representing δ_(x,μ) (μ being shown on the left) for two different x.
FIGURE 4. A fuzzy set μ in a 2D space and the three points A (25, 40), B (26, 25), and C (60, 10) for which the distance to μ is computed.
FIGURE 5. Distance from a point to a fuzzy set: example of the three points and μ of Figure 4, with three different t-norms (min, product, and Lukasiewicz); each of the nine panels plots membership degrees against distances.
For the first point, which has a high membership to the fuzzy set, the distributions take a high value at 0 (equal to μ(x)) and decrease very fast. For the second point, which belongs to μ with a low membership value, the distributions are more spread. This represents the ambiguity in defining the distance of this point to the fuzzy set. For instance, if we consider some defuzzification process using a threshold value on μ, then, depending on this threshold, the point would be more or less close to μ. The third point is outside of the support of μ; therefore the membership degrees of low distances are all equal to 0, and the distributions are shifted towards higher values.

From this definition of the distance from a point to a fuzzy set, distances between two fuzzy sets can be derived using supremum or infimum computation of fuzzy numbers using the extension principle [61]. The details are given in [14], and summarized in the following. The maximum of p fuzzy numbers representing the fuzzy distances from points x_i to μ is:

∀n ≥ 0, max(δ_(x_1,μ), δ_(x_2,μ), …, δ_(x_p,μ))(n) = sup_{(n_1,…,n_p), n = max(n_1,…,n_p)} min[δ_(x_1,μ)(n_1), …, δ_(x_p,μ)(n_p)].  (27)
In a similar way, the fuzzy minimum is defined as:

∀n ≥ 0, min(δ_(x_1,μ), δ_(x_2,μ), …, δ_(x_p,μ))(n) = sup_{(n_1,…,n_p), n = min(n_1,…,n_p)} min[δ_(x_1,μ)(n_1), …, δ_(x_p,μ)(n_p)].  (28)
These expressions are in particular useful when p = |S| (cardinality of S), and can therefore be used for defining distances between two fuzzy sets. As pointed out in [61], these definitions do not provide in general one of the input fuzzy numbers. Another interesting question may be: what is the greatest of these fuzzy numbers? A degree of possibility for a fuzzy set being greater than another one has been defined in [61]. Methods for ranking fuzzy numbers have also been proposed, e.g., in [150]. We do not make use of this point of view in the following and restrict ourselves to definitions (27) and (28). Now if we consider points in another fuzzy set ν defined on S, i.e., if we want to compute a function of the δ_(x_i,μ) over a set of x_i having nonbinary membership degrees to ν, we have to introduce the values ν(x_i) in Equations (27) and (28), for instance as:

∀n ≥ 0, max_ν(δ_(x_1,μ), δ_(x_2,μ), …, δ_(x_p,μ))(n) = sup_{(n_1,…,n_p), n = max(n_1,…,n_p)} min_{i=1…p} [min[δ_(x_i,μ)(n_i), ν(x_i)]].  (29)
Similarly we may define the minimum of fuzzy numbers as:

∀n ≥ 0, min_ν(δ_(x_1,μ), δ_(x_2,μ), …, δ_(x_p,μ))(n) = sup_{(n_1,…,n_p), n = min(n_1,…,n_p)} min_{i=1…p} [min[δ_(x_i,μ)(n_i), ν(x_i)]].  (30)
Another possibility is to use the fuzzification principle over the α-cuts of ν, which leads to a simpler expression for the maximum of a set of fuzzy numbers over points in a fuzzy set:

max_{x∈ν} δ_(x,μ)(n) = ∫_0^1 max_{x∈ν_α} δ_(x,μ)(n) dα.  (31)

Similarly for the minimum, we have:

min_{x∈ν} δ_(x,μ)(n) = ∫_0^1 min_{x∈ν_α} δ_(x,μ)(n) dα.  (32)
Similar expressions can be used for any function of fuzzy numbers. Since the nearest point distance, for instance, is simply a minimum over distances from a point to a fuzzy set, the fuzzy minimum taken over points in a fuzzy set leads directly to a fuzzy nearest distance between two fuzzy sets (as a fuzzy number). Similarly the Hausdorff distance can be directly derived from the distance from a point to a fuzzy set using the maximum of fuzzy numbers.
VI. DISTANCE BETWEEN TWO FUZZY SETS
We now address the problem of defining distances between two fuzzy sets. The classification we propose considers definitions relying on comparison of membership functions on the one hand, and definitions really taking into account the spatial distance d_S on the other hand. Further subdivisions are based on the type of approach and of formalism. We refer to [20] for a comparison of these distances on a concrete example of spatial objects.
A. Comparison of Membership Functions

In this section we review the main distances proposed in the literature that aim at comparing membership functions. They have generally been proposed in a general fuzzy set framework, and not specifically in the context of image processing. They do not really include information about spatial distances. The classification chosen here is inspired from the one found in [163]. Similar classifications can be found in [47,91,126].

1. Functional Approach

The functional approach is probably the most popular. It relies on an L_p norm between μ and ν, leading to the following generic definition [62,97,113]:

d_p(μ, ν) = [∫_{x∈S} |μ(x) − ν(x)|^p]^{1/p},  (33)

d_∞(μ, ν) = sup_{x∈S} |μ(x) − ν(x)|.  (34)
d_p is a pseudometric, while d_∞ is a metric. In general, d_p does not converge towards d_∞ when p becomes infinite, but it converges towards [113]:

d_EssSup(μ, ν) = inf{k ∈ ℝ, λ({x, |μ(x) − ν(x)| > k}) = 0},  (35)
where λ denotes the Lebesgue measure on S. It has been shown that d_EssSup is a pseudometric, called essential supremum, and related to d_∞ by the relation d_EssSup ≤ d_∞. The equality does not hold in the general continuous case (a counter-example can be found in [113]). In the discrete finite case, these definitions become:

d_p(μ, ν) = [Σ_{x∈S} |μ(x) − ν(x)|^p]^{1/p},  (36)

d_∞(μ, ν) = max_{x∈S} |μ(x) − ν(x)|.  (37)
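In the discrete finite case, Equations (36) and (37) are one-liners; a sketch on hypothetical membership vectors:

```python
def d_p(mu, nu, p):
    """Eq. (36): discrete L_p distance between membership functions."""
    return sum(abs(a - b) ** p for a, b in zip(mu, nu)) ** (1 / p)

def d_inf(mu, nu):
    """Eq. (37): discrete sup-norm distance."""
    return max(abs(a - b) for a, b in zip(mu, nu))

mu = [0.0, 0.5, 1.0, 0.5]
nu = [0.0, 0.0, 0.5, 1.0]
print(d_p(mu, nu, 1), d_inf(mu, nu))   # → 1.5 0.5
```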
In this case, they are all metrics. Therefore, this approach is also called metric based in [91]. A noticeable property of d_p is that it takes a constant value if the supports of μ and ν are disjoint. In such cases, we have:

d_1(μ, ν) = |μ| + |ν|,  (38)
where |μ| denotes the fuzzy cardinality of μ, and for d_∞ we have:

d_∞(μ, ν) = max[sup_{x∈S} μ(x), sup_{x∈S} ν(x)],  (39)

which is equal to 1 if the fuzzy sets are normalized. These equations show that, as soon as the supports of μ and ν are disjoint, the value taken by their distance is constant, irrespective of how far the supports are from each other in S.

A slightly different version of d_1 has been proposed in [47,157], where the distance is normalized by |S| (cardinality of S). This normalization could be applied to any d_p as well (for p finite). However, this normalization does not change the properties or the type of information taken into account. It allows an easier link to similarity. Note that these definitions satisfy property P10 (proximity measure, in the sense of [70]) for p finite, and, for d_∞, for μ and ν being normalized and having a bounded support. The distance d_1 is also called geometrical distance in [47]. However, this definition (as well as the general definition d_p) considers only the geometry of the two fuzzy sets with respect to each other, in terms of shape of the membership function, but does not include the geometry related to d_S. The distance d_1 has been used in a pyramidal approach in image processing in [109] for recognizing objects based on their attributes. In this example, the fuzzy sets do not represent the objects themselves but fuzzy attributes of the objects. Therefore the spatial information is not taken into account at the level of the distance formulation but is rather included implicitly in the type of features used.

Summarizing the properties of the definitions derived from an L_p norm, we get P0–P4, P7 (with a maximum value of |S| for nonnormalized forms and 1 for normalized forms), P8, and P10. Properties P5 and P9 do not hold in general. A weaker form of P6 holds: if the supports are disjoint, then the distance is constant. Other forms of distances can be found in this class. For instance, in [126] the following form is proposed (in the finite discrete case):

d(μ, ν) = Σ_{x∈S} |μ(x) − ν(x)| / (|μ| + |ν|) = Σ_{x∈S} |μ(x) − ν(x)| / Σ_{x∈S} (μ(x) + ν(x)).  (40)

This equation corresponds to a normalization of d_1 by the sum of the cardinalities of μ and ν. Again, its value is constant if the supports of both fuzzy sets are disjoint, the constant being equal to 1.
This equation can be generalized by using any L_p norm as:

d(μ, ν) = [Σ_{x∈S} |μ(x) − ν(x)|^p]^{1/p} / [Σ_{x∈S} (μ(x)^p + ν(x)^p)]^{1/p}.  (41)
It still satisfies property P6. For such a normalization, we do not have P4, P5, P8, P9, and P10.

2. Information Theoretic Approach

Based on their definition of fuzzy entropy E(μ), de Luca and Termini define a pseudometric as [114]:

d(μ, ν) = |E(μ) − E(ν)|,  (42)

with

E(μ) = −K Σ_{x∈S} [μ(x) log μ(x) + (1 − μ(x)) log(1 − μ(x))],  (43)
where K is a normalization constant. This distance does not satisfy the separability condition. This can be overcome by considering the quotient space obtained through the equivalence relation μ ∼ ν ⟺ E(μ) = E(ν). However, this is not suitable for image processing. Indeed, since the entropy of a crisp set is zero, two crisp structures in an image belong to the same equivalence class, even if they are completely different. One main drawback of this approach is that the distance is based on the comparison of two global measures performed on μ and ν separately: there is nothing linking points of μ to points of ν, which is of reduced interest for computing distances. The properties satisfied by this definition are P0, P1, P3, P4, and P10.

Entropy functions under similarity [38,59] combine this approach with the membership comparison approach. This has been applied in decision problems (in particular for questionnaires) but, to our knowledge, not in image processing or other spatial information processing applications. Based on a similar approach, a notion of fuzzy divergence (which can be interpreted as a distance) has been introduced in [11], by mimicking Kullback's approach [106]:

d(μ, ν) = (1/|S|) Σ_{x∈S} [D_x(μ, ν) + D_x(ν, μ)],  (44)
with

D_x(μ, ν) = μ(x) log(μ(x)/ν(x)) + (1 − μ(x)) log((1 − μ(x))/(1 − ν(x))),

and the convention 0/0 = 1. This distance is positive and symmetrical, but does not satisfy the triangular inequality. Moreover, it is always equal to 0 for crisp sets. A slightly different version was then proposed in [10], which solves some indetermination in the computation, by replacing μ by 1 + μ (respectively ν by 1 + ν) in the logarithms:

D_x(μ, ν) = μ(x) log((1 + μ(x))/(1 + ν(x))) + (1 − μ(x)) log((2 − μ(x))/(2 − ν(x))).
The fuzzy divergence is a proximity measure in the sense of [70] (property P10). It also satisfies P0, P1, P2, P3, P7, and P8.

3. Set Theoretic Approach

In this approach, the distance between two fuzzy sets is seen as a set dissimilarity function, based on fuzzy union and intersection. Examples are given in [163]. The basic idea is that the distance should be larger if the two fuzzy sets weakly intersect. Most of the proposed measures are inspired from the work by Tversky [151], who proposes two parametric similarity measures between two sets A and B:

α f(A ∩ B) − β f(A − B) − γ f(B − A),  (45)

and in a rational form:

f(A ∩ B) / [f(A ∩ B) + β f(A ∩ B̄) + γ f(B ∩ Ā)],  (46)

where f(X) is typically the cardinality of X, α, β, and γ are parameters leading to different kinds of measures, and B̄ denotes the complement of B. Let us mention a few examples (they are given in the finite discrete case). A measure derived from the second Tversky measure by setting β = γ = 1 has been used by several authors [47,55,61,91,126,158,163]:

d(μ, ν) = 1 − Σ_{x∈S} min[μ(x), ν(x)] / Σ_{x∈S} max[μ(x), ν(x)].  (47)
This distance is a semimetric, and always takes the constant value 1 as soon as the two fuzzy sets have disjoint supports. It also corresponds to the Jaccard index [55]. With respect to the typology presented in [37], this distance is a comparison measure, and more precisely a dissimilarity measure (see Section III.D). Moreover, 1 − d is a resemblance measure. Applications in image processing can be found, for example, in [156], where it is used on fuzzy sets representing object features (and not directly spatial image objects) for structural pattern recognition on polygonal 2D objects. Equation (47) can be generalized by replacing the min by any t-norm t and the max by any t-conorm T:

d(μ, ν) = 1 − Σ_{x∈S} t[μ(x), ν(x)] / Σ_{x∈S} T[μ(x), ν(x)].  (48)
However, properties P1 and P2 hold only for the min and max, while property P6 holds for the minimum and product t-norms, and the dual t-conorms (but not for the Lukasiewicz ones, for instance). Properties P0, P3, and P7 are satisfied. Properties P4, P5, P8, P9, and P10 are not. A slightly different formula has been proposed in [157], which, however, translates a similar idea:

d(μ, ν) = 1 − (1/|S|) Σ_{x∈S} min[μ(x), ν(x)] / max[μ(x), ν(x)],  (49)
with the convention 0/0 = 1. It is a semimetric. It takes the constant value 1 if the two fuzzy sets have disjoint supports, without any other condition on their relative position in the space. Again this expression can be generalized as:

d(μ, ν) = 1 − (1/|S|) Σ_{x∈S} t[μ(x), ν(x)] / T[μ(x), ν(x)],  (50)
for any t-norm t and t-conorm T. But in general property P6 is not satisfied, and reflexivity (P1) holds only for min and max. The following modified version has been proposed in [53]:

d(μ, ν) = 1 − (1/|Supp(μ) ∪ Supp(ν)|) Σ_{x∈Supp(μ)∪Supp(ν)} t[μ(x), ν(x)] / T[μ(x), ν(x)],  (51)
which satisfies P6. It also satisfies P0, P3, and P7. Properties P1 and P2 are satisfied for the t-norm min and the t-conorm max.
Another measure takes into account only the intersection of the two fuzzy sets [47,91,163]:

d(μ, ν) = 1 − max_{x∈S} min[μ(x), ν(x)].  (52)

It is a semipseudometric if the fuzzy sets are normalized. Again it is a dissimilarity measure, and 1 − d is a resemblance measure. It is always equal to 1 if the supports of μ and ν are disjoint. This definition can be generalized to [89]:

d(μ, ν) = 1 − max_{x∈S} t[μ(x), ν(x)],  (53)
where t is any t-norm. It satisfies P0, P3, and P7. Property P6 is satisfied for the minimum and the product. Property P1 is satisfied for normalized fuzzy sets. If we set

(μ Δ ν)(x) = max[min(μ(x), 1 − ν(x)), min(1 − μ(x), ν(x))],

two other distances can be derived, as [91,163]:

d(μ, ν) = sup_{x∈S} (μ Δ ν)(x),  (54)

d(μ, ν) = Σ_{x∈S} (μ Δ ν)(x).  (55)
These two distances are symmetrical measures (P3). They are separable (P2) only for binary sets. Also, we have d(μ, μ) = 0 (P1) only for binary sets. They are dissimilarity measures. The first one is equal to 1 if μ and ν have disjoint supports and are normalized (if they are not normalized, then this constant value is equal to the maximum membership value of μ and ν). The second measure is always equal to |μ| + |ν| if μ and ν have disjoint supports. These measures actually rely on measures of inclusion of each fuzzy set in the other. Indeed, an inclusion index can be defined as [29,144]:

I(μ, ν) = inf_{x∈S} T[ν(x), 1 − μ(x)],  (56)

where T is a t-conorm. Since the distance should be large if the two sets have a small degree of equality (the equality between μ and ν can be expressed by "μ included in ν and ν included in μ," which leads to an easy transposition to fuzzy equality), a distance may be defined from an inclusion degree as:

d(μ, ν) = 1 − min[I(μ, ν), I(ν, μ)].  (57)
By taking T = max, we recover the definition derived from (μ Δ ν). This approach has been used in [6,158]. Other choices of T may lead to different properties of d. For instance, if T is taken as the Lukasiewicz t-conorm (bounded sum), then (μ Δ ν)(x) = |μ(x) − ν(x)|. Therefore we have:

sup_{x∈S} (μ Δ ν)(x) = d_∞(μ, ν),  (58)

and

Σ_{x∈S} (μ Δ ν)(x) = d_1(μ, ν).  (59)
In this case, both distances are metrics in the discrete finite case. These measures have been applied in image processing for image database applications in [91]. Other inclusion indexes can be defined, e.g., from the Tversky measure by setting β = 1 and γ = 0, leading to f(A ∩ B)/f(A) [55].

The last definitions, given by Equations (52) and (54), are, respectively, equivalent to 1 − Π(μ; ν) and 1 − min[N(μ; ν), N(ν; μ)] (where Π and N are possibility and necessity functions) used in fuzzy pattern matching [42,65], which has a large application domain, including image processing (see, e.g., [93]). The possibility Π is symmetrical in μ and ν and corresponds to a degree of intersection. The necessity is not symmetrical and corresponds to a degree of inclusion. It can be useful, for instance, if we want to compare an object to a model. For instance, if the object is only a substructure, it makes sense to consider its degree of inclusion in the model object. On the contrary, if the object groups several structures, then the degree of inclusion of the model in the object is meaningful. In cases where a direct comparison is possible, a symmetrical expression such as 1 − min[N(μ; ν), N(ν; μ)] is appropriate. A further interest of this approach is that it allows one to evaluate the distance not only as a number, but as an interval such as [1 − Π, 1 − N], which provides more information than only one of these two numbers. It is interesting to note that the necessity and the possibility are related to fuzzy mathematical morphology, since Π(μ; ν) corresponds to the dilation of μ by ν at the origin, while N(μ; ν) corresponds to the erosion of μ by ν at the origin. These definitions can be straightforwardly generalized to fuzzy union and intersection derived from t-norms and t-conorms, leading to a correspondence with other forms of fuzzy mathematical morphology [29]. Such generalizations using any t-norm and t-conorm for set relationships can be done for all definitions presented in this section.
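The possibility and necessity underlying this pattern-matching view can be sketched on hypothetical discrete membership vectors, with Π(μ; ν) = sup_x min(μ(x), ν(x)) and N(μ; ν) = inf_x max(μ(x), 1 − ν(x)); the pair then yields the interval [1 − Π, 1 − N] mentioned above:

```python
def possibility(mu, nu):
    """Pi(mu; nu): degree of intersection (sup of pointwise min)."""
    return max(min(a, b) for a, b in zip(mu, nu))

def necessity(mu, nu):
    """N(mu; nu): degree of inclusion of nu in mu (inf of pointwise max(mu, 1 - nu))."""
    return min(max(a, 1 - b) for a, b in zip(mu, nu))

mu = [0.0, 0.5, 1.0, 0.5]    # model
nu = [0.0, 0.0, 1.0, 1.0]    # object
interval = (1 - possibility(mu, nu), 1 - necessity(mu, nu))
print(interval)   # → (0.0, 0.5)
```

Here the two sets certainly intersect (Π = 1, lower distance bound 0), but ν is only partially included in μ (N = 0.5), which the upper bound of the interval reflects.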
4. Pattern Recognition Approach

This approach consists in first expressing each fuzzy set in a feature space (e.g., cardinality, moments, skewness) and then computing the Euclidean distance between the two feature vectors [163] or attribute vectors [145]. This approach may take advantage of some of the previous approaches, for instance by using entropy or similarity in the set of features. It has been applied for instance for database applications [145]. A similar approach, called signal detection theory, has been proposed in [91]. It is based on counting the number of similar and different features. A particular form of distance between attributes can be found in [47], where the distance is defined from vectorial representations a and b as:

1 − (a · b)/max(a · a, b · b).  (60)
This form is very close to correlation-based approaches, such as the one described in [81,157]:

d(μ, ν) = 1 − Σ_{x∈S} [μ(x)ν(x) + (1 − μ(x))(1 − ν(x))] / √(Σ_{x∈S} [μ(x)² + (1 − μ(x))²] · Σ_{x∈S} [ν(x)² + (1 − ν(x))²]).  (61)
This expression is symmetrical (property P3), reflexive (property P1), and satisfies the separability property P2. It does not satisfy P6. Properties P0, P7, and P10 are satisfied. The Bhattacharyya distance [61] can also be attached to this class. It is defined as:

d(μ, ν) = [1 − ∫_S √(μ(x)ν(x)/(|μ| |ν|)) dx]^{1/2}.  (62)
It has been used in image processing for classification in satellite images in [119].

B. Accounting for Spatial Distances

The second class of methods tries to include the spatial distance d_S in the distance between μ and ν. In contrast to the definitions given in Section VI.A, in this second class the membership values at different points of S are linked using some formal computation, making the introduction of d_S possible. This leads to definitions that do not share the drawbacks of the previous approaches, for instance when the supports of the two fuzzy sets are disjoint.

1. Geometrical Approach

The geometrical approach consists in generalizing one of the distances between crisp sets. This has been done for instance for the nearest point distance [62,138], the mean distance [138], and the Hausdorff distance [62], and could easily be extended to other distances (see, e.g., [31] for a review of crisp set distances). These generalizations follow four main principles. The first one consists in considering fuzzy sets in an n-dimensional space as (n + 1)-dimensional crisp sets and then using classical distances [82]. However, this is often not satisfactory in image processing because the n dimensions of S and the membership dimension (values in [0, 1]) have completely different interpretations, and treating them in a unique way is questionable. The second principle is a fuzzification principle (see Section III.D): let D be a distance between crisp sets; then its fuzzy equivalent is defined by:

d(μ, ν) = ∫_0^1 D(μ_α, ν_α) dα,  (63)
or by a discrete sum if the fuzzy membership functions are piecewise constant [60,163] (μ_α denotes the α-cut of μ). In this way, d(μ, ν) inherits the properties of the chosen crisp distance. Another way to consider the fuzzification principle consists in using a double integration (see Section III.D). However, using this double fuzzification, some properties of the underlying distance may be lost. The third principle consists in weighting distances by membership values. For the average distance this leads for instance to [138]:

d(μ, ν) = Σ_{x∈S} Σ_{y∈S} d_S(x, y) min[μ(x), ν(y)] / Σ_{x∈S} Σ_{y∈S} min[μ(x), ν(y)].  (64)
The last approach consists in defining a fuzzy distance as a fuzzy set on ℝ⁺ instead of as a crisp number, using the extension principle (see Section III.D). For the nearest point distance this leads to [138]:

d(μ, ν)(r) = sup_{x, y, d_S(x,y) ≤ r} min[μ(x), ν(y)],  (65)

which is actually a distance distribution.
A similar approach has been used in [120], and the corresponding distance density is expressed as:

d(μ, ν)(r) = sup_{x, y, d_S(x,y) = r} min[μ(x), ν(y)].  (66)
The Hausdorff distance is probably the set distance whose fuzzy extension has been the most widely studied. One reason for this may be that it is a true metric in the crisp case, while other set distances, such as the minimum or average distances, have weaker properties. Another reason is that it has been used to determine a degree of similarity between two objects, or between an object and a model [88]. Extensions of this distance have been defined using fuzzification over the α-cuts and using the extension principle [39,45,57,133,134,163]. One potential problem with these approaches occurs in the case of empty α-cuts [40,71]. Boxer [39] proposed to add a crisp set to every set, but the result is highly dependent on this additional set, and does not reduce to the classical Hausdorff distance when applied to crisp sets. The solution proposed in [71] consists in clipping the distance at some maximum distance, but similar problems arise. Other authors use the Hausdorff distance between the endographs of the two membership functions [57] (which corresponds to the first principle mentioned above). Several generalizations of the Hausdorff distance have also been proposed under the form of fuzzy numbers [62]. Extensions of the Hausdorff distance based on fuzzy mathematical morphology have also been developed [14] and are presented in the next section. Extensions of these definitions may be obtained by using other weighting functions, for instance by using t-norms instead of min. These distances share most of the advantages and drawbacks of the underlying crisp distance [31]: the computation cost can be high (it is already high for several crisp distances); moreover, interpretation and robustness strongly depend on the chosen distance (for instance, the Hausdorff distance is noise sensitive, whereas the average distance is not).
2. Morphological Approach

We proposed in [14,20] original approaches for defining fuzzy distances that take spatial information into account, based on fuzzy mathematical morphology. They are summarized in the following. These definitions are obtained by directly translating crisp equations expressing distances in terms of mathematical morphology into fuzzy ones (see Section III.D). We give only the examples of the nearest point distance and the Hausdorff distance.
ISABELLE BLOCH
In the binary case, for n > 0, the nearest point distance can be expressed in morphological terms as:
\[ d_N(X,Y) = n \iff D^n(X) \cap Y \neq \emptyset \ \text{and}\ D^{n-1}(X) \cap Y = \emptyset, \qquad (67) \]
together with the symmetrical expression. For n = 0 we have:
\[ d_N(X,Y) = 0 \iff X \cap Y \neq \emptyset. \qquad (68) \]
The translation of these equivalences provides, for n > 0, the following distance density:
\[ \delta_N(\mu,\mu')(n) = t\left[ \sup_{x \in S} t[\mu'(x), D^n(\mu)(x)],\ c\left( \sup_{x \in S} t[\mu'(x), D^{n-1}(\mu)(x)] \right) \right] \qquad (69) \]
or a symmetrical expression derived from this one, and:
\[ \delta_N(\mu,\mu')(0) = \sup_{x \in S} t[\mu(x), \mu'(x)]. \qquad (70) \]
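A minimal sketch of Equations (69)-(70), with t = min, c(a) = 1 - a, and the fuzzy dilation realized by a crisp ball of radius n (the function names and the 1D discrete setting are ours, chosen for illustration):

```python
# Sketch of the distance density of Equations (69)-(70), with t-norm = min,
# complementation c(a) = 1 - a, and D^n = dilation by the crisp ball of radius n.

def dilate(mu, n, points):
    """D^n(mu)(x) = sup over y with d(x, y) <= n of mu(y)."""
    return {x: max((mu.get(y, 0.0) for y in points if abs(x - y) <= n),
                   default=0.0) for x in points}

def inter(a, b, points):
    """Degree of intersection: sup_x min(a(x), b(x))."""
    return max(min(a.get(x, 0.0), b.get(x, 0.0)) for x in points)

def nearest_point_density(mu, mu2, points, n):
    if n == 0:
        return inter(mu, mu2, points)                           # Eq. (70)
    hit = inter(mu2, dilate(mu, n, points), points)             # meets D^n(mu)
    miss = 1.0 - inter(mu2, dilate(mu, n - 1, points), points)  # misses D^{n-1}(mu)
    return min(hit, miss)                                       # Eq. (69)

points = list(range(-3, 7))
mu, mu2 = {0: 1.0}, {2: 1.0}
# crisp sanity check: the density is 1 exactly at the crisp distance 2
print([nearest_point_density(mu, mu2, points, n) for n in range(4)])
```

On crisp inputs the density collapses to the indicator of the classical nearest point distance, as expected from the translation of Equations (67)-(68).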
Equation (70) shows how the membership values are taken into account, without involving the extension principle. As for the nearest point distance, we can extend the Hausdorff distance by directly translating the binary equation defining it:
\[ d_H(X,Y) = \max\left[ \sup_{x \in X} d(x,Y),\ \sup_{y \in Y} d(y,X) \right]. \qquad (71) \]
This distance can be expressed in morphological terms as:
\[ d_H(X,Y) = \inf\{ n,\ X \subseteq D^n(Y) \ \text{and}\ Y \subseteq D^n(X) \}. \qquad (72) \]
From Equation (72), a distance distribution can be defined by introducing fuzzy dilation:
\[ \Delta_H(\mu,\mu')(n) = t\left[ \inf_{x \in S} T[D^n(\mu)(x), c(\mu'(x))],\ \inf_{x \in S} T[D^n(\mu')(x), c(\mu(x))] \right], \qquad (73) \]
where c is a complementation, t a t-norm, and T a t-conorm. A distance density can be derived implicitly from this distance distribution. A direct definition of a distance density can be obtained from:
\[ d_H(X,Y) = 0 \iff X = Y, \qquad (74) \]
ON FUZZY SPATIAL DISTANCES
and for n > 0:
\[ d_H(X,Y) = n \iff \big( X \subseteq D^n(Y) \ \text{and}\ Y \subseteq D^n(X) \big) \ \text{and}\ \big( X \not\subseteq D^{n-1}(Y) \ \text{or}\ Y \not\subseteq D^{n-1}(X) \big). \qquad (75) \]
Translating these equations leads to a definition of the Hausdorff distance between two fuzzy sets μ and μ' as a fuzzy number:
\[ \delta_H(\mu,\mu')(0) = t\left[ \inf_{x \in S} T[\mu(x), c(\mu'(x))],\ \inf_{x \in S} T[\mu'(x), c(\mu(x))] \right], \qquad (76) \]
\[ \delta_H(\mu,\mu')(n) = t\left[ \inf_{x \in S} T[D^n(\mu)(x), c(\mu'(x))],\ \inf_{x \in S} T[D^n(\mu')(x), c(\mu(x))],\ T\left( \sup_{x \in S} t[\mu(x), c(D^{n-1}(\mu')(x))],\ \sup_{x \in S} t[\mu'(x), c(D^{n-1}(\mu)(x))] \right) \right]. \qquad (77) \]
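The distance distribution of Equation (73) can be sketched as follows, with T = max, c(a) = 1 - a, t = min, and fuzzy dilation by a crisp ball of radius n (all names are our own):

```python
# Sketch of the Hausdorff distance distribution of Equation (73), with
# T = max, c(a) = 1 - a, t = min, and dilation by the crisp ball of radius n.

def dilate(mu, n, points):
    return {x: max((mu.get(y, 0.0) for y in points if abs(x - y) <= n),
                   default=0.0) for x in points}

def inclusion(a, b, points):
    """Degree to which a is included in b: inf_x max(b(x), 1 - a(x))."""
    return min(max(b.get(x, 0.0), 1.0 - a.get(x, 0.0)) for x in points)

def hausdorff_distribution(mu, mu2, points, n):
    # degree to which mu' is included in D^n(mu), and conversely
    return min(inclusion(mu2, dilate(mu, n, points), points),
               inclusion(mu, dilate(mu2, n, points), points))

points = list(range(-4, 8))
mu, mu2 = {0: 1.0}, {3: 1.0}
# crisp sanity check: the distribution jumps to 1 at the crisp Hausdorff distance 3
print([hausdorff_distribution(mu, mu2, points, n) for n in range(5)])
```

The distribution is nondecreasing in n, in accordance with the increasing inclusions X ⊆ D^n(Y) of Equation (72).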
The above definitions of fuzzy nearest point and Hausdorff distances (defined as fuzzy numbers) between two fuzzy sets do not necessarily share the same properties as their crisp equivalents. This is due in particular to the fact that, depending on the choice of the t-norms and t-conorms involved, the excluded-middle and noncontradiction laws may not be satisfied.
All distances are positive, in the sense that the defined fuzzy numbers always have a support included in R⁺. By construction, all defined distances are symmetrical with respect to μ and μ' (P3). The separability property (P2) is not always satisfied. For the Hausdorff distance, δ_H(μ,μ')(0) = 1 implies μ = μ' when T is the bounded sum (T(a,b) = min(1, a+b)), while it implies that μ and μ' are crisp and equal for T = max. As for property P1 (reflexivity), if μ is normalized, we have for the nearest point distance δ_N(μ,μ)(0) = 1 and δ_N(μ,μ)(n) = 0 for n ≥ 1. Also, the triangular inequality is not satisfied in general.
Another morphological approach has been suggested in [146], based on links between the minimum distance and the Minkowski difference. In the crisp case, we have:
\[ d_N(X,Y) = \inf\{ \|z\|,\ z \in Y \ominus X \}, \qquad (78) \]
if X and Y are nonintersecting crisp sets, where Y ⊖ X denotes the set of difference vectors {y − x, y ∈ Y, x ∈ X}. In order to account for possible intersection between the two sets, the authors also introduce the notion of
penetration distance, defined along a direction v as the maximum translation of X along v such that X still meets Y:
\[ \pi(v; X, Y) = \max\{ k,\ (X + kv) \cap Y \neq \emptyset \}. \qquad (79) \]
The extension to fuzzy sets is done by assuming fuzzy numbers on each axis. This leads to reasonable computation times, but unfortunately cannot be directly extended to arbitrary fuzzy objects. Finally, we propose a new definition in the discrete case, based on links with mathematical morphology and more operational from a computational point of view. It relies on the idea of a distance transform, which assigns to each point of S the distance to some object. From this distance transform, the nearest point distance between two sets can easily be computed. In the crisp case, this distance transform can be computed by a dilation by a conic structuring element ν, defined as:
\[ \forall x \in S,\ \nu(x) = \max\left( 0,\ 1 - \frac{d(x,O)}{k} \right), \qquad (80) \]
where O is the origin and k a constant used to limit the support of the structuring element to the maximal distance of interest. It is easy to prove in the crisp case that:
\[ D_\nu(X)(x) = 1 - \frac{d(x,X)}{k}, \qquad (81) \]
i.e., the distance from x to X is directly linked to the dilation of X by ν at x. The minimum distance between X and Y is then given by:
\[ d_N(X,Y) = \min_{y \in Y} \big( 1 - D_\nu(X)(y) \big) k. \qquad (82) \]
We now apply similar formulas in the fuzzy case in order to define a fuzzy nearest point distance. We therefore dilate a fuzzy set μ by the conic structuring element ν:
\[ D_\nu(\mu)(x) = \sup_{y \in S} \min\left[ \mu(y),\ \max\left( 0,\ 1 - \frac{d(y - x, O)}{k} \right) \right]. \qquad (83) \]
It is easy to show that this dilation preserves the core, i.e.:
\[ \mathrm{Core}(D_\nu(\mu)) = \mathrm{Core}(\mu). \]
This shows that the points at distance 0 from μ are exactly the points of its core. From this dilation we define the nearest point distance between two fuzzy sets μ and μ' as:
\[ d_N(\mu,\mu') = \sup_{\alpha \in [0,1]} \min_{x \in \mu'_\alpha} \big( 1 - D_\nu(\mu)(x) \big) k, \qquad (84) \]
or a symmetrical expression obtained by exchanging the roles of μ and μ' (this allows one to obtain a symmetrical distance satisfying P3). This defines the distance as a positive number (not a fuzzy number). Moreover, we have d_N(μ,μ) = 0 (P1), and d_N(μ,μ') = 0 iff Core(μ) ∩ Core(μ') ≠ ∅, which is a property similar to the crisp case for this distance (but weaker than P2). Property P0 is obviously satisfied. The Hausdorff distance can be defined as a number in a similar way:
\[ d_H(\mu,\mu') = \sup_{\alpha \in [0,1]} \max\left[ \max_{x \in \mu'_\alpha} \big( 1 - D_\nu(\mu)(x) \big) k,\ \max_{x \in \mu_\alpha} \big( 1 - D_\nu(\mu')(x) \big) k \right]. \qquad (85) \]
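The distance-map construction of Equations (80)-(84) can be sketched in 1D as follows (the function names, the grid, and the α-sampling are our own choices):

```python
# Sketch of Equations (80)-(84): fuzzy dilation by a conic structuring
# element yields a distance map from which the nearest point distance is read.

def conic_dilate(mu, points, k):
    """D_nu(mu)(x) = sup_y min(mu(y), max(0, 1 - d(x, y)/k))  (Eqs. 80, 83)."""
    return {x: max(min(mu.get(y, 0.0), max(0.0, 1.0 - abs(x - y) / k))
                   for y in points) for x in points}

def nearest_point(mu, mu2, points, k, alphas=(0.25, 0.5, 0.75, 1.0)):
    """d_N(mu, mu') = sup_alpha min over the alpha-cut of mu' of (1 - D_nu(mu)) k."""
    dist_map = conic_dilate(mu, points, k)
    best = 0.0
    for a in alphas:  # finite sampling of the sup over alpha in [0, 1]
        cut = [x for x in points if mu2.get(x, 0.0) >= a]
        if cut:
            best = max(best, min((1.0 - dist_map[x]) * k for x in cut))
    return best

points = list(range(-2, 8))
d = nearest_point({0: 1.0}, {3: 1.0}, points, k=10.0)
print(round(d, 6))  # 3.0 -- matches the crisp nearest point distance
```

Note the computational appeal mentioned in the text: the distance map is computed once and then simply thresholded over the α-cuts.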
This expression defines a positive number. The obtained distance satisfies P0 and P3. We do not have P1 and P2, but only d_H(μ,μ') = 0 iff μ = μ' and both are crisp.

3. Tolerance-Based Approach

This approach has been developed in [113]. The basic idea is to combine spatial information and membership values by assuming a tolerance value ρ, indicating the differences that can occur without the objects no longer being considered similar. The proposed definitions are semipseudometrics and are derived from the functional approach (see Section IV.A). The authors first define a local difference between μ and ν at a point x of S as:
\[ d_x^\rho(\mu,\nu) = \inf_{y,z \in B(x,\rho)} |\mu(y) - \nu(z)|, \qquad (86) \]
where B(x, ρ) denotes the (spatial) closed ball centered at x, of radius ρ. Then the functions d_p, d_∞, and d_EssSup are defined up to a tolerance ρ as:
\[ d_p^\rho(\mu,\nu) = \left[ \int_S \big( d_x^\rho(\mu,\nu) \big)^p \, dx \right]^{1/p}, \qquad (87) \]
\[ d_\infty^\rho(\mu,\nu) = \sup_{x \in S} d_x^\rho(\mu,\nu), \qquad (88) \]
\[ d_{\mathrm{EssSup}}^\rho(\mu,\nu) = \inf\{ k \in \mathbb{R},\ \lambda(\{ x \in S,\ d_x^\rho(\mu,\nu) > k \}) = 0 \}, \qquad (89) \]
where λ denotes the underlying measure on S.
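A sketch of Equations (86) and (88) on a 1D discrete space (the names and the spike example are ours); it illustrates how the tolerance absorbs small spatial shifts:

```python
# Sketch of Equations (86) and (88): local difference up to a tolerance rho,
# and the derived sup-pseudometric.

def local_diff(mu, nu, x, rho, points):
    """d_x^rho(mu, nu) = inf over y, z in B(x, rho) of |mu(y) - nu(z)|."""
    ball = [y for y in points if abs(y - x) <= rho]
    return min(abs(mu.get(y, 0.0) - nu.get(z, 0.0)) for y in ball for z in ball)

def d_inf(mu, nu, rho, points):
    """d_infinity^rho(mu, nu) = sup_x d_x^rho(mu, nu)  (Eq. 88)."""
    return max(local_diff(mu, nu, x, rho, points) for x in points)

points = list(range(-2, 4))
mu, nu = {0: 1.0}, {1: 1.0}   # the same spike, shifted by one point
print(d_inf(mu, nu, 0, points), d_inf(mu, nu, 1, points))  # 1.0 0.0
```

With ρ = 0 the two shifted spikes are maximally different, while a tolerance of one point makes them indistinguishable, which is exactly the intended behavior.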
Several results are proved in [113], in particular about convergence: d_p^ρ(μ,ν) converges towards d_EssSup^ρ(μ,ν) when p goes to infinity; all pseudometrics are decreasing with respect to ρ; and d_p^ρ, d_∞^ρ, and d_EssSup^ρ converge towards d_p, d_∞, and d_EssSup when ρ becomes infinitely small, for continuous fuzzy sets. This approach has been extended in [111] by allowing the neighborhood around each point to depend on the point. Note that this approach has strong links with the morphological approaches, since the neighborhood considered around each point can be seen as a structuring element. This approach has been illustrated on an example of noisy character recognition.

4. Graph Theoretic Approach

A similarity function between fuzzy graphs may also induce a distance between fuzzy sets. This approach contrasts with the previous ones, since the objects are no longer represented directly as fuzzy sets on S or as vectors of attributes, but as higher level structures. Fuzzy graphs in image processing can be used for representing objects, as in [116], or a scene, as in [100]. In the first case, nodes are parts of the objects and arcs are links between these parts. In the example presented in [116] for character recognition, nodes are fuzzy sets representing features of a character, extracted by some image processing. In the second case, nodes are objects of the scene and arcs are relationships between these objects. In the example of [100], the nodes represent clouds extracted from satellite images. These two examples use different ways of considering distances (or similarity) between fuzzy graphs. In [116], the distance is defined from a similarity between nodes and between arcs (both being fuzzy sets), given a correspondence between nodes (respectively between arcs). The similarity used compares only membership functions, using a set theoretic approach (see Section VI.A), and corresponds to Equation (47).
Although it has not been considered in this reference, spatial distance can then be taken into account if we include it in the attribute set. This idea is probably worth further development. In a similar way, several distances between graphs have been proposed as an objective function to find the correspondence between graphs. This function compares attributes of nodes of the two graphs to be matched, and attributes of arcs. One of the main difficulties is dealing with nonbijective matching. This has been addressed for instance in [7,43,127], where a formalism for defining fuzzy morphisms between graphs is proposed, as well as optimization methods for finding the best morphism according to an objective function including spatial distance information as an edge attribute.
Another way to consider distances between objects is in terms of the cost of the deformations needed to bring one set into correspondence with the other. Such approaches are particularly powerful in graph-based methods. The distance can then be expressed as the cost of matching two graphs, as done in [100] for image processing applications, or as the Levenshtein distance accounting for the transformations (insertions, substitutions, deletions) necessary for going from the structural representation of one shape to that of the other [54]. In [100], the fuzzy aspect is taken into account through weighting factors; the method is therefore quite close to the weighted Levenshtein distance of [54]. Spatial distances could also be introduced as one of the relationships between objects in these approaches. A distance between conceptual graphs is defined in [115] as an interval [N, Π], where N represents the necessity and Π the possibility, obtained by a fuzzy pattern matching approach. Although the application is not related to image processing, the idea of expressing similarity as an interval is interesting and could certainly be exploited in other domains. A second interest of this approach is that the nodes of the graph are concepts, which could be (although not explicitly mentioned in this reference) represented as fuzzy sets (like linguistic variables). Although these examples are still far from the main concern of this chapter, they are worth mentioning, since they bring an interesting structural aspect that could be further developed.

5. Histogram of Distances

Until now we have considered the problem of evaluating a specific distance (nearest point, Hausdorff, etc.) between two given fuzzy sets. Another question is to check whether two fuzzy sets satisfy a distance property, expressed for instance in linguistic terms.
To answer such questions, we propose here a new approach, which consists of expressing all distance information between the two objects as a fuzzy set on the positive real line, and comparing this fuzzy set to a fuzzy set expressing the semantics of the distance property to be checked, using a fuzzy pattern matching approach [42,65]. This idea is inspired by previous work on directional position [16,19,99,122]. The complete distance information between the two objects is encoded in a distance histogram, and the pattern matching provides an evaluation as an interval. We first define the histogram of distances between two crisp sets X and Y as:
\[ \forall d \in \mathbb{R}^+,\ H(X,Y)(d) = |\{ (x,y),\ x \in X,\ y \in Y,\ d_S(x,y) = d \}|. \qquad (90) \]
In the finite case, H(X,Y)(d) is equal to 0 outside a bounded interval [d_1, d_2]. From this histogram, it is possible to recover several distances between X and Y:
\[ d_N(X,Y) = \min\{ d,\ H(X,Y)(d) \neq 0 \} = d_1, \]
\[ d_M(X,Y) = \max\{ d,\ H(X,Y)(d) \neq 0 \} = d_2, \]
\[ d_A(X,Y) = \frac{\sum_{d=d_1}^{d_2} d\, H(X,Y)(d)}{\sum_{d=d_1}^{d_2} H(X,Y)(d)} = \frac{\sum_{d=d_1}^{d_2} d\, H(X,Y)(d)}{|X|\,|Y|}. \]
The Hausdorff distance cannot be obtained directly from the histogram. A normalized version of this histogram (obtained by dividing each value by |X||Y|, or alternatively by max_{d∈R⁺} H(d)) allows us to consider it as a fuzzy set carrying all distance information between the two objects. The properties of H are:

- H is symmetrical in X and Y: ∀d ∈ R⁺, H(X,Y)(d) = H(Y,X)(d);
- if X = Y, then d_1 = 0 and H(X,Y)(d_1) = |X| = |Y|;
- if d_1 = 0, then X ∩ Y ≠ ∅ (we recognize here a property of the minimum distance).
This idea can be extended to the distance histogram between fuzzy objects μ and ν by weighting the contribution of each pair of points by its membership values:
\[ \forall d \in \mathbb{R}^+,\ H(\mu,\nu)(d) = \sum_{(x,y) \in S^2,\ d_S(x,y) = d} \min[\mu(x), \nu(y)]. \qquad (91) \]
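Equation (91) and the distances recovered from the histogram can be sketched as follows (helper names are ours; the example uses crisp inputs so the recovered values can be checked by hand):

```python
# Sketch of Equation (91) and of the distances recovered from the histogram.

def fuzzy_histogram(mu, nu):
    """H(mu, nu)(d) = sum over pairs at distance d of min(mu(x), nu(y))."""
    h = {}
    for x, mx in mu.items():
        for y, ny in nu.items():
            d = abs(x - y)
            h[d] = h.get(d, 0.0) + min(mx, ny)
    return h

def recovered_distances(h):
    support = [d for d, v in h.items() if v > 0]
    d_n, d_m = min(support), max(support)  # nearest point / maximum distances
    d_a = sum(d * v for d, v in h.items()) / sum(h.values())  # average distance
    return d_n, d_m, d_a

h = fuzzy_histogram({0: 1.0, 1: 1.0}, {3: 1.0})  # crisp X = {0, 1}, Y = {3}
print(recovered_distances(h))  # (2, 3, 2.5)
```

Here sum(h.values()) plays the role of |X||Y| in the crisp normalization.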
The sum in Equation (91) is actually limited to the points of the supports of μ and of ν, respectively, and is therefore finite. Any t-norm could be used instead of min. Again, normalizing this histogram in [0, 1] leads to an appropriate interpretation as a fuzzy set representing the distance information between μ and ν. The properties of H in the fuzzy case are similar to those in the crisp case:

- H is symmetrical in μ and ν: ∀d ∈ R⁺, H(μ,ν)(d) = H(ν,μ)(d);
- if μ = ν, then H(μ,μ)(0) = Σ_x min(μ(x), μ(x)) = |μ| = |ν|;
- if d_1 = 0, then Supp(μ) ∩ Supp(ν) ≠ ∅.
Now, if we want to evaluate the satisfaction of a distance relationship between two objects, such as "near," "far," or "very far," we can compare the normalized histogram with the fuzzy set expressing the semantics of the desired distance value, denoted by μ_dist, where dist denotes any of the linguistic values (near, far, etc.). For instance, for d ∈ R⁺, μ_near(d) represents the degree to which d is considered a near distance value. This comparison can be done using a compatibility approach, as in [122] for directional position, or using a fuzzy pattern matching approach. We detail here this second possibility. Note that all information is encoded on R⁺, and the comparison is done in the same space. In Section VII, we will address similar problems, but directly in the spatial domain S. The pattern matching between μ_dist and the normalized histogram H(μ,ν) provides an evaluation of the relation dist as two numbers, the necessity N and the possibility Π, defined as:
\[ N = \inf_{d \in \mathbb{R}^+} T[1 - \mu_{dist}(d),\ H(\mu,\nu)(d)], \qquad (92) \]
\[ \Pi = \sup_{d \in \mathbb{R}^+} t[\mu_{dist}(d),\ H(\mu,\nu)(d)], \qquad (93) \]
where T is a t-conorm and t a t-norm. The value N represents the degree of inclusion of μ_dist in H(μ,ν), i.e., the degree to which the relation dist is a part of the distance relationships between μ and ν. The value Π represents the degree of intersection of μ_dist and H(μ,ν), i.e., the degree to which dist is compatible with the distance relationships between μ and ν. The length of the interval [N, Π] represents the ambiguity of the relation. Extremal values for N and Π are obtained in the following situations:

- Π = 1 iff Core(μ_dist) ∩ Core(H(μ,ν)) ≠ ∅ (where Core(μ) denotes the set of modal values of μ, i.e., those having a membership value equal to 1);
- if Supp(μ_dist) ∩ Supp(H(μ,ν)) = ∅, then Π = 0, the reverse implication being true for some t-norms, such as min and product;
- if Supp(μ_dist) ∩ Supp(1 − H(μ,ν)) = ∅, then N = 1, the reverse implication being true for some t-conorms, such as max and algebraic sum;
- N = 0 iff Core(μ_dist) ∩ Core(1 − H(μ,ν)) ≠ ∅.
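The pattern matching of Equations (92)-(93) can be sketched with T = max and t = min (the semantics chosen for "near" and the histogram values are purely illustrative):

```python
# Sketch of Equations (92)-(93) with T = max and t = min.

def pattern_match(mu_dist, hist, ds):
    """Necessity and possibility of the relation 'dist' given histogram H."""
    N = min(max(1.0 - mu_dist.get(d, 0.0), hist.get(d, 0.0)) for d in ds)
    Pi = max(min(mu_dist.get(d, 0.0), hist.get(d, 0.0)) for d in ds)
    return N, Pi

ds = range(6)
mu_near = {0: 1.0, 1: 1.0, 2: 0.5}       # illustrative semantics of "near"
hist = {1: 1.0, 2: 0.5}                  # normalized distance histogram
print(pattern_match(mu_near, hist, ds))  # (0.0, 1.0)
```

Here the relation "near" is fully possible (Π = 1, the cores intersect at d = 1) but not at all necessary (N = 0, since d = 0 is in the core of μ_near but carries no histogram mass), so the interval [N, Π] is maximally ambiguous.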
VII. SPATIAL REPRESENTATIONS OF DISTANCE INFORMATION

In this section we propose to represent distance information with respect to an object as a spatial fuzzy set, following the framework proposed in [22] for spatial representations of spatial information of various types.

A. Spatial Fuzzy Sets as a Representation Framework

The main idea here is to translate spatial information or knowledge into a spatial representation, and more precisely into a spatial fuzzy set representing the degree to which distance relationships to a reference object are satisfied at each point of the space S. Such representations can also be derived for very heterogeneous types of knowledge and then used in a fusion process that combines all these fuzzy regions of interest, in order to focus attention by reducing the search space and restricting it to the area that satisfies most relationships, as proposed in [22,25]. This type of situation occurs, for instance, if we want to exploit spatial knowledge for guiding the recognition of an object, or to perform spatial reasoning. Such knowledge is generally heterogeneous: it may concern the object we are looking for (its shape, topology, color, position) or its relationships to other objects (distances, adjacency, relative directional position). It may be generic (typically if derived from a model or from expert knowledge) or factual (if derived from the scene itself). And it may be provided in many different forms. Classically, it can be a number, a distribution, or a binary value. But we can also be concerned with imprecise values and with propositional formulas, which are often used by experts within a given application. Imprecise values are sometimes expressed in linguistic terms: for instance, the expected distance between two objects ("close," "far," etc.). They can also be expressed as an interval, as mentioned previously.
Therefore the proposed framework allows one to represent different pieces of information in the same domain, and is thus suitable for translating heterogeneous knowledge into a form usable for reasoning. In the following, a point (volume element or voxel) in the 3D discrete space S is denoted by v. For each piece of knowledge, we consider its "natural expression," i.e., the usual form in which it is given or available, and translate it into a spatial fuzzy set in the space, the membership function of which is denoted by:
\[ \mu_{\text{knowledge}} : S \to [0,1],\quad v \mapsto \mu_{\text{knowledge}}(v). \qquad (94) \]
In this representation, each piece of knowledge becomes a fuzzy region of the space. If the knowledge is considered as a constraint to be satisfied by the object to be recognized, this fuzzy region represents a search area, or a fuzzy volume of interest, for this object. This type of representation provides a common framework for representing pieces of information of various types (objects, spatial imprecision, relationships to other objects, etc.). Therefore, the fuzzy regions defined in the space S corresponding to these pieces of information may have different semantics. Moreover, this common framework allows the combination of this heterogeneous information, as stated previously. The numerical representation of membership values assumes that we can assign numbers that represent, for instance, degrees of satisfaction of a relationship. These numbers can be derived from prior knowledge or learned from examples, but some quite arbitrary choices usually remain. This might appear as a drawback in comparison to propositional representations. However, it is not necessary to have precise estimations of these values, and we have experimentally observed a good robustness with respect to these estimations in various problems, such as information fusion, object recognition, and scene interpretation [32,80]. This can be explained by two reasons: first, the fuzzy representations are used for rough information and therefore do not have to be precise themselves; and second, several pieces of information are usually combined in a whole reasoning process, which decreases the influence of each particular value. Therefore, the chosen numbers are not crucial. What is important is that ranking is preserved.
For instance, if a region of the space satisfies a relationship to some objects to a higher degree than another region, then this ranking is preserved in the representation for all relationships described in the following sections, assuming the existence of ranking is reasonable for the type of relations we consider.
B. Spatial Representation of Distance Knowledge to a Given Object

We now apply the previous idea to translate expressions of knowledge about distances into spatial volumes of interest within S, taking imprecision and uncertainty into account, in order to handle approximate statements where distances can be expressed as numbers, but also as intervals, fuzzy numbers, linguistic values, etc. In contrast to the approach proposed in [78,79], where linguistic variables about distances are represented as fuzzy sets on each axis, from which distance knowledge in the space can be derived, we choose here to represent distance knowledge directly in the space S, as spatial
fuzzy sets. The method we propose is independent of the dimension of S and uses morphological expressions of distances [20], as detailed in Section VI. We assume that a set A is known, as an already recognized object or a known area of S, and that we want to determine B, subject to satisfying some distance relationship with A. According to the algebraic expressions of distances, dilation of A is an adequate tool for this. Let us consider the following cases:

- If knowledge expresses that d_N(A,B) = n, then the border of B should intersect the region defined by D^n(A) \ D^{n-1}(A), which is made up of the points exactly at distance n from A, and B should be looked for in D^{n-1}(A)^C (the complement of the dilation of size n − 1).
- If knowledge expresses that d_N(A,B) ≤ n, then B should be looked for in A^C, with the constraint that at least one point of B belongs to D^n(A) \ A.
- If knowledge expresses that d_N(A,B) ≥ n, then B should be looked for in D^{n-1}(A)^C.
- If knowledge expresses that n_1 ≤ d_N(A,B) ≤ n_2, then B should be searched for in D^{n_1−1}(A)^C, with the constraint that at least one point of B belongs to D^{n_2}(A) \ D^{n_1−1}(A).
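The crisp cases above can be sketched with binary dilations on 1D integer points (helper names are ours):

```python
# Sketch of the crisp volume-of-interest construction with binary dilations.

def dilate_set(A, n):
    """D^n(A): all points within distance n of A (n >= 0)."""
    return {x + k for x in A for k in range(-n, n + 1)}

def volume_of_interest(A, n1, n2):
    """Region D^{n2}(A) \\ D^{n1-1}(A) where B may lie when its minimum
    distance to A is at least n1 and its Hausdorff distance at most n2."""
    inner = dilate_set(A, n1 - 1) if n1 > 0 else set()
    return dilate_set(A, n2) - inner

print(sorted(volume_of_interest({0}, 2, 4)))  # [-4, -3, -2, 2, 3, 4]
```

The set difference of two dilations directly yields the annulus of admissible positions discussed next.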
The constraints on the border actually lead to the definition of two fuzzy sets, one constraining the object and one constraining its border. However, they can be avoided by considering both minimum and Hausdorff distances, expressing for instance that B should lie between a distance n_1 and a distance n_2 of A, which is a typical type of knowledge we may have in concrete problems. The minimum distance should then be greater than n_1 and the Hausdorff distance less than n_2. In this case, the volume of interest for B reduces to D^{n_2}(A) \ D^{n_1−1}(A). In cases where imprecision has to be taken into account, fuzzy dilations are used, with the corresponding equivalences with fuzzy distances [20,29]. The extension to approximate distances calls for fuzzy structuring elements. We define these structuring elements through their membership functions on S. Structuring elements with a spherical symmetry can typically be used, where the membership degree depends only on the distance to the center of the structuring element. Let us consider the generalization to the fuzzy case of the last case above (minimum distance of at least n_1 and Hausdorff distance of at most n_2 to a fuzzy set μ). Instead of defining an interval [n_1, n_2], we consider a fuzzy interval, defined as a fuzzy set μ_n on R⁺ having a core equal to the interval [n_1, n_2]. The membership function μ_n is increasing between 0 and n_1 and
decreasing after n_2 (this is but one example). Then we define two structuring elements as:
\[ \nu_1(v) = \begin{cases} 1 - \mu_n(d_S(v,O)) & \text{if } d_S(v,O) \le n_1 \\ 0 & \text{otherwise} \end{cases} \qquad (95) \]
\[ \nu_2(v) = \begin{cases} 1 & \text{if } d_S(v,O) \le n_2 \\ \mu_n(d_S(v,O)) & \text{otherwise} \end{cases} \qquad (96) \]
where d_S is the Euclidean distance in S and O the origin. The spatial fuzzy set expressing the approximate relationship about distance to μ is then defined as:
\[ \mu_{\text{distance}} = t[D_{\nu_2}(\mu),\ 1 - D_{\nu_1}(\mu)] \qquad (97) \]
if n_1 ≠ 0, and μ_distance = D_{ν_2}(μ) if n_1 = 0. The increasing nature of fuzzy dilation with respect to both the set to be dilated and the structuring element [29] guarantees that these expressions do not lead to inconsistencies. Indeed, we have ν_1 ≤ ν_2 and ν_1(O) = ν_2(O) = 1, and therefore D_{ν_1}(μ) ≤ D_{ν_2}(μ). In the case where n_1 = 0, we no longer have ν_1(O) = 1, but in this case only the dilation by ν_2 is considered. This case actually corresponds to a distance to μ less than "about n_2." These properties are indeed expected for representations of distance knowledge. Figure 6 illustrates this approach. The two structuring elements ν_1 and ν_2 are derived from a fuzzy interval μ_n and are used for the dilation of an object (buildings extracted from a map), and μ_distance is computed to represent the approximate knowledge about the distance to this object. The resulting fuzzy set represents the area of the space satisfying (to some degree) the relation of semantics μ_n to the building. From an algorithmic point of view, fuzzy dilations may be computationally heavy if the structuring element has a large support. However, in the case of crisp objects and structuring elements with spherical symmetry, fast algorithms can be implemented. The distance to the object A is first computed using chamfer algorithms [35]. This defines a distance map in S, which gives the distance of each voxel to object A. This discrete distance can be made as precise as necessary [117]. Then the translation into a fuzzy volume of interest is made according to a simple look-up table derived from μ_n. This algorithm has a linear complexity in the cardinality of S.
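A 1D sketch of Equations (95)-(97), assuming an illustrative trapezoidal μ_n (core [n_1, n_2], linear slopes) and t = min; all names and parameter values are our own:

```python
# Sketch of Equations (95)-(97): mu_n is a fuzzy interval with core [n1, n2];
# nu1 and nu2 are the derived structuring elements, and mu_distance combines
# the two fuzzy dilations with t = min.

def mu_n(d, n1, n2, slope=2.0):
    if d < n1:
        return max(0.0, 1.0 - (n1 - d) / slope)
    if d <= n2:
        return 1.0
    return max(0.0, 1.0 - (d - n2) / slope)

def nu1(v, n1, n2):  # Eq. (95)
    return 1.0 - mu_n(abs(v), n1, n2) if abs(v) <= n1 else 0.0

def nu2(v, n1, n2):  # Eq. (96)
    return 1.0 if abs(v) <= n2 else mu_n(abs(v), n1, n2)

def mu_distance(mu, points, n1, n2):  # Eq. (97)
    dil = lambda nu: {x: max(min(mu.get(y, 0.0), nu(x - y, n1, n2))
                             for y in points) for x in points}
    d1, d2 = dil(nu1), dil(nu2)
    return {x: min(d2[x], 1.0 - d1[x]) for x in points}

out = mu_distance({0: 1.0}, list(range(-8, 9)), n1=2, n2=4)
print(out[0], out[1], out[3])  # 0.0 0.5 1.0 -- forbidden / partial / allowed
```

Points strictly inside the core [n_1, n_2] of μ_n receive degree 1, points too close to the object are excluded by the 1 − D_{ν_1}(μ) term, and the transition zones are graded.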
FIGURE 6. Buildings extracted from a map, membership function μ_n, structuring elements ν_1 and ν_2, dilation of building 1 with these two structuring elements, and representation of μ_distance (darker gray levels indicate higher membership values).
VIII. QUALITATIVE DISTANCE IN A SYMBOLIC SETTING

In this section we consider distance information in a symbolic setting, using formal logics. This point has not been much addressed in the literature, contrary to other types of relationships, such as topological ones. It is, however, useful when no quantitative information (even in imprecise form) is available, but only purely qualitative information; and it allows for symbolic reasoning thanks to the logical apparatus. In the context of mereotopology, relative distance information has been modeled as a ternary relation [4]. A predicate Closer(x, y, z) reads "x is closer to y than to z" and defines a strict order on pairs of spatial entities (x, y) and (x, z) (not necessarily reduced to points). It also induces an equidistance relation. Several axioms and properties are introduced in [4], which allow one to include this notion in reasoning schemes. Here, we take a different point of view and propose to model distance information between two spatial entities expressed as logical formulas. We show that mathematical morphology can be defined on logical formulas and that dilations and erosions lead to the definition of a modal logic suitable for spatial reasoning [23,24].
A. Morpho-Logics

In this section we express morphological operations in a symbolic framework, using logical formulas. Let us first introduce some notation. Let PS be a finite set of propositional symbols. The language is generated by PS and the usual connectives, to which we will add modal operators in the following. Well-formed formulas are denoted by Greek letters φ, ψ. Kripke semantics is used. Worlds are denoted by ω, ω', and the set of all worlds by Ω. Mod(φ) = {ω ∈ Ω | ω ⊨ φ} is the set of all worlds where φ is satisfied. The underlying idea for constructing morphological operations on logical formulas (as presented in [27]) is to consider set interpretations of formulas and worlds. Since in classical propositional logic the set of formulas is isomorphic to 2^Ω, i.e., knowing a formula is equivalent to knowing the set of worlds where it is satisfied, we can identify φ with Mod(φ) and then apply set theoretic morphological operations. We recall that Mod(φ ∨ ψ) = Mod(φ) ∪ Mod(ψ), Mod(φ ∧ ψ) = Mod(φ) ∩ Mod(ψ), and Mod(φ) ⊆ Mod(ψ) iff φ ⊨ ψ. Using the previous equivalences, and based on set definitions of morphological operators [142], the dilation and erosion of a formula φ have been defined in [26,27] as follows:
\[ \mathrm{Mod}(D_B(\varphi)) = \{ \omega \in \Omega \mid B(\omega) \cap \mathrm{Mod}(\varphi) \neq \emptyset \}, \qquad (98) \]
\[ \mathrm{Mod}(E_B(\varphi)) = \{ \omega \in \Omega \mid B(\omega) \subseteq \mathrm{Mod}(\varphi) \}. \qquad (99) \]
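Since formulas are identified with their sets of models, Equations (98)-(99) can be sketched directly with set operations (worlds are plain integers here; the encoding is ours):

```python
# Sketch of Equations (98)-(99): formulas identified with their sets of models,
# B(w) the set of worlds related to world w.

def dilation(phi, B, worlds):
    """Mod(D_B(phi)): worlds whose neighborhood is consistent with phi."""
    return {w for w in worlds if B(w) & phi}

def erosion(phi, B, worlds):
    """Mod(E_B(phi)): worlds whose whole neighborhood satisfies phi."""
    return {w for w in worlds if B(w) <= phi}

worlds = set(range(5))
B = lambda w: {w - 1, w, w + 1} & worlds   # a reflexive, symmetrical relation
phi = {1, 2, 3}
print(sorted(dilation(phi, B, worlds)), sorted(erosion(phi, B, worlds)))
# [0, 1, 2, 3, 4] [2]
```

As expected, erosion keeps only the worlds all of whose neighbors satisfy φ, while dilation also admits worlds merely consistent with φ.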
In Equations (98) and (99), the structuring element B represents a relationship between worlds, i.e., ω' ∈ B(ω) iff ω' satisfies some relationship with ω. The condition in Equation (98) expresses that the set of worlds in relation to ω should be consistent with φ, i.e.: ∃ω' ∈ B(ω), ω' ⊨ φ. The condition in Equation (99) is stronger and expresses that φ should be satisfied in all worlds in relation to ω. The structuring element B, representing a relationship between worlds, defines a "neighborhood" of worlds. If the relationship is symmetrical, it leads to symmetrical structuring elements. If it is reflexive, it leads to structuring elements such that ω ∈ B(ω), which yields interesting properties, as will be seen later. An interesting way to choose the relationship is to base it on distances between worlds, which is an important piece of information in spatial reasoning. This allows one to define sequences of increasing structuring
elements defined as the balls of a distance. For any distance δ between worlds, a structuring element of size n centered at ω takes the following form:
\[ B_n(\omega) = \{ \omega' \in \Omega \mid \delta(\omega, \omega') \le n \}. \qquad (100) \]
For instance, a distance equal to 1 can represent a connectivity relation between worlds, defined for instance as a difference of one literal (i.e., one literal instantiated differently in both worlds). Now we consider the framework of normal modal logics [46,87] and use an accessibility relation as the relation between worlds. We define an accessibility relation from any structuring element B as follows:
\[ R(\omega, \omega') \ \text{iff}\ \omega' \in B(\omega). \qquad (101) \]
Conversely, a structuring element can be defined from an accessibility relation. The accessibility relation R is reflexive iff ∀ω ∈ Ω, ω ∈ B(ω). It is symmetrical iff ∀(ω, ω') ∈ Ω², ω ∈ B(ω') ⟺ ω' ∈ B(ω). In the following we restrict the discussion to symmetrical relations. In general, accessibility relations derived from a structuring element are not transitive. Let us now consider the two modal operators □ and ◇ defined from the accessibility relation as [46]:
\[ M, \omega \models \Box\varphi \ \text{iff}\ \forall \omega' \in \Omega,\ R(\omega, \omega') \Rightarrow M, \omega' \models \varphi, \qquad (102) \]
\[ M, \omega \models \Diamond\varphi \ \text{iff}\ \exists \omega' \in \Omega,\ R(\omega, \omega') \ \text{and}\ M, \omega' \models \varphi, \qquad (103) \]
where M denotes a standard model related to R, which will be omitted from the notation in the following (it will always be implicitly related to the considered accessibility relation). Equation (102) can be rewritten as:
\[ \omega \models \Box\varphi \iff \{ \omega' \in \Omega \mid R(\omega, \omega') \} \subseteq \mathrm{Mod}(\varphi) \iff B(\omega) \subseteq \mathrm{Mod}(\varphi), \]
which exactly corresponds to the definition of the erosion of a formula according to Equation (99).
In a similar way, Equation (103) can be rewritten as:
\[ \omega \models \Diamond\varphi \iff \{ \omega' \in \Omega \mid R(\omega, \omega') \} \cap \mathrm{Mod}(\varphi) \neq \emptyset \iff B(\omega) \cap \mathrm{Mod}(\varphi) \neq \emptyset, \]
which exactly corresponds to a dilation according to Equation (98). This shows that we can define modal operators derived from an accessibility relation as erosion and dilation with a structuring element:
\[ \Box\varphi = E_B(\varphi), \qquad (104) \]
\[ \Diamond\varphi = D_B(\varphi). \qquad (105) \]
The modal logic constructed from erosion and dilation has the following theorems and rules of inference⁵:
T: □φ → φ and φ → ◇φ (if B is such that ∀ω ∈ Ω, ω ∈ B(ω), leading to a reflexive accessibility relation).
Df: ◇φ ↔ ¬□¬φ and □φ ↔ ¬◇¬φ.
D: □φ → ◇φ.
B: ◇□φ → φ and φ → □◇φ.
5c: □◇φ → ◇φ and □φ → ◇□φ (if B is such that ∀ω ∈ Ω, ω ∈ B(ω)).
4c: □□φ → □φ and ◇φ → ◇◇φ (if B is such that ∀ω ∈ Ω, ω ∈ B(ω)).
N: □⊤ and ¬◇⊥.
M: □(φ ∧ ψ) → (□φ ∧ □ψ) and (◇φ ∨ ◇ψ) → ◇(φ ∨ ψ).
M': ◇(φ ∧ ψ) → (◇φ ∧ ◇ψ) and (□φ ∨ □ψ) → □(φ ∨ ψ).
C: (□φ ∧ □ψ) → □(φ ∧ ψ) and ◇(φ ∨ ψ) → (◇φ ∨ ◇ψ).
R: (□φ ∧ □ψ) ↔ □(φ ∧ ψ) and ◇(φ ∨ ψ) ↔ (◇φ ∨ ◇ψ).
RN: from φ, infer □φ.
RM: from φ → ψ, infer □φ → □ψ and ◇φ → ◇ψ.
RR: from (φ ∧ φ') → ψ, infer (□φ ∧ □φ') → □ψ; and from (φ ∨ φ') → ψ, infer (◇φ ∨ ◇φ') → ◇ψ.

⁵We use notations similar to those of [46] for these theorems and rules of inference.
112
ISABELLE BLOCH
RE: from $\vdash \varphi \leftrightarrow \psi$ infer $\vdash \Box\varphi \leftrightarrow \Box\psi$ and $\vdash \Diamond\varphi \leftrightarrow \Diamond\psi$.
K: $\Box(\varphi \rightarrow \psi) \rightarrow (\Box\varphi \rightarrow \Box\psi)$ and, by duality, $(\neg\Diamond\varphi \wedge \Diamond\psi) \rightarrow \Diamond(\neg\varphi \wedge \psi)$.
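Several of these theorems can be spot-checked mechanically on a finite model. The following sketch is our own illustration (worlds 0–9, a radius-1 interval as structuring element, and formulas identified with model sets); it verifies Df and M on random model sets.

```python
# Sketch: spot-checking theorems Df and M of the morphological modal
# logic on random model sets. [] is erosion, <> is dilation.
import random

OMEGA = set(range(10))

def B(w):
    return {v for v in OMEGA if abs(v - w) <= 1}

def box(mod):
    return {w for w in OMEGA if B(w) <= mod}

def diamond(mod):
    return {w for w in OMEGA if B(w) & mod}

random.seed(0)
for _ in range(100):
    phi = {w for w in OMEGA if random.random() < 0.5}
    psi = {w for w in OMEGA if random.random() < 0.5}
    # Df: <>phi <-> not [] not phi
    assert diamond(phi) == OMEGA - box(OMEGA - phi)
    # M: [](phi ^ psi) -> ([]phi ^ []psi); here it even holds as an equality
    assert box(phi & psi) <= (box(phi) & box(psi))
print("Df and M hold on all sampled model sets")
```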
Let us now denote by $\Box^n$ the iteration of $n$ times $\Box$ (i.e., $n$ erosions by the same structuring element). Since the succession of $n$ erosions by a structuring element is equivalent to one erosion by a larger structuring element of size $n$ (iterativity property of erosion), $\Box^n$ is a new modal operator, constructed as in Equation (104). In a similar way, we denote by $\Diamond^n$ the iteration of $n$ times $\Diamond$, which is again a new modal operator, due to the iterativity property of dilation, constructed as in Equation (105) with a structuring element of size $n$. We set $\Box^1 = \Box$ and $\Diamond^1 = \Diamond$. We have the following additional theorems:

$\Box^n \Box^{n'} \varphi \leftrightarrow \Box^{n+n'} \varphi$ and $\Diamond^n \Diamond^{n'} \varphi \leftrightarrow \Diamond^{n+n'} \varphi$ (iterativity properties of erosion and dilation).

$\Diamond\Box\Diamond\Box\varphi \leftrightarrow \Diamond\Box\varphi$ and $\Box\Diamond\Box\Diamond\varphi \leftrightarrow \Box\Diamond\varphi$ (idempotence of opening and closing).

More generally, from properties of closing and opening:

$$\Diamond^n \Box^n \Diamond^{n'} \Box^{n'} \varphi \leftrightarrow \Diamond^{n'} \Box^{n'} \Diamond^n \Box^n \varphi \leftrightarrow \Diamond^{\max(n,n')} \Box^{\max(n,n')} \varphi,$$

$$\Box^n \Diamond^n \Box^{n'} \Diamond^{n'} \varphi \leftrightarrow \Box^{n'} \Diamond^{n'} \Box^n \Diamond^n \varphi \leftrightarrow \Box^{\max(n,n')} \Diamond^{\max(n,n')} \varphi.$$

For $n < n'$: $\Diamond^n \varphi \rightarrow \Diamond^{n'} \varphi$, $\Box^{n'} \varphi \rightarrow \Box^n \varphi$, $\Box^n \Diamond^n \varphi \rightarrow \Box^{n'} \Diamond^{n'} \varphi$, $\Diamond^{n'} \Box^{n'} \varphi \rightarrow \Diamond^n \Box^n \varphi$.
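The iterativity and monotonicity properties can be checked directly on a finite line of worlds. This is a sketch of our own (the function names `dilate` and `dilate_iter` are illustrative, not from the chapter), using the radius-1 interval as structuring element.

```python
# Sketch: iterativity of dilation. n successive dilations by the
# radius-1 interval coincide with one dilation by the radius-n interval
# (a "structuring element of size n"), and dilations grow with n.

OMEGA = set(range(20))

def dilate(mod_phi, radius=1):
    """<>phi with a symmetric interval structuring element."""
    return {w for w in OMEGA if any(abs(w - v) <= radius for v in mod_phi)}

def dilate_iter(mod_phi, n):
    """n-fold iteration of the radius-1 dilation."""
    for _ in range(n):
        mod_phi = dilate(mod_phi, 1)
    return mod_phi

phi = {8, 9}
assert dilate_iter(phi, 3) == dilate(phi, 3)  # <>^n = one size-n dilation
assert dilate(phi, 2) <= dilate(phi, 4)       # n < n' gives <>^n phi -> <>^n' phi
print("iterativity and monotonicity verified")
```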
All these definitions and properties extend to the fuzzy case if we consider fuzzy formulas, i.e., formulas $\varphi$ for which $\mathrm{Mod}(\varphi)$ is a fuzzy set of $\Omega$. The fuzzy structuring element can be interpreted as a fuzzy relation between worlds. The use of fuzzy structuring elements appears particularly useful for expressing intrinsically vague spatial relationships. For spatial reasoning, interpretations can represent spatial entities, such as regions of space. Formulas then represent combinations of such entities, and define regions, objects, etc. (possibly fuzzy), which may not be connected. For instance, if a formula $\varphi$ is a symbolic representation of a region $X$ of space, it can be interpreted as "the object we are looking at is in $X$." In an epistemic interpretation, it could represent the
belief of an agent that the object is in $X$.⁶ The interest of such representations is also that they can deal with any kind of spatial entities, without referring to points. Using these interpretations, if $\varphi$ represents some knowledge or belief about a region $X$ of space, then $\Box\varphi$ represents a restriction of $X$. If we are looking at an object in $X$, then $\Box\varphi$ is a necessary region for this object. Similarly, $\Diamond\varphi$ represents an extension of $X$, and a possible region for the object.

B. Distances in a Qualitative Setting

We propose here to use the modal operators introduced in Section VIII.A to provide symbolic and qualitative representations of spatial knowledge. Again we use expressions of minimum and Hausdorff distances in terms of morphological dilations. The translation into a logical formalism is straightforward. Expressing that $d_N(X, Y) = n$ leads to:
$$\forall m < n, \ \Diamond^m\varphi \wedge \psi \ \text{inconsistent and} \ \Diamond^m\psi \wedge \varphi \ \text{inconsistent,}$$
$$\text{and} \ \Diamond^n\varphi \wedge \psi \ \text{consistent and} \ \Diamond^n\psi \wedge \varphi \ \text{consistent.} \tag{106}$$
Expressions like $d_N(X, Y) \leq n$ translate into:

$$\Diamond^n\varphi \wedge \psi \ \text{consistent and} \ \Diamond^n\psi \wedge \varphi \ \text{consistent.} \tag{107}$$
Expressions like $d_N(X, Y) \geq n$ translate into:

$$\forall m < n, \ \Diamond^m\varphi \wedge \psi \ \text{inconsistent and} \ \Diamond^m\psi \wedge \varphi \ \text{inconsistent.} \tag{108}$$
Expressions like $n_1 \leq d_N(X, Y) \leq n_2$ translate into:

$$\forall m < n_1, \ \Diamond^m\varphi \wedge \psi \ \text{inconsistent and} \ \Diamond^m\psi \wedge \varphi \ \text{inconsistent,}$$
$$\text{and} \ \Diamond^{n_2}\varphi \wedge \psi \ \text{consistent and} \ \Diamond^{n_2}\psi \wedge \varphi \ \text{consistent.} \tag{109}$$
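These translations can be checked mechanically on a finite model: the minimum distance is the smallest $n$ for which both consistency conditions hold. The following sketch is our own illustration (worlds are integers, dilation uses interval structuring elements, and all model sets are assumed nonempty).

```python
# Sketch: recovering the minimum distance d_N(X, Y) from translation
# (106): the smallest n for which <>^n phi ^ psi and <>^n psi ^ phi are
# both consistent (nonempty model sets).

OMEGA = set(range(30))

def dilate_n(mod, n):
    """Dilation of size n (radius-n interval structuring element)."""
    return {w for w in OMEGA if any(abs(w - v) <= n for v in mod)}

def consistent(mod_a, mod_b):
    """A conjunction is consistent iff the model sets intersect."""
    return bool(mod_a & mod_b)

def d_min(mod_phi, mod_psi):
    # assumes both model sets are nonempty, so the loop terminates
    n = 0
    while not (consistent(dilate_n(mod_phi, n), mod_psi)
               and consistent(dilate_n(mod_psi, n), mod_phi)):
        n += 1
    return n

X, Y = {2, 3, 4}, {10, 11}
print(d_min(X, Y))  # 6: the gap between worlds 4 and 10
```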
The proof of these equations involves mainly T and the property $\Diamond^n\varphi \rightarrow \Diamond^{n'}\varphi$ for $n < n'$ (see Section VIII.A). Similarly, for the Hausdorff distance, we translate $d_H(X, Y) = n$ by:

$$\forall m < n, \ \psi \wedge \neg\Diamond^m\varphi \ \text{consistent or} \ \varphi \wedge \neg\Diamond^m\psi \ \text{consistent,}$$
$$\text{and} \ \psi \rightarrow \Diamond^n\varphi \ \text{and} \ \varphi \rightarrow \Diamond^n\psi. \tag{110}$$

The first condition corresponds to $d_H(X, Y) \geq n$ and the second one to $d_H(X, Y) \leq n$.

⁶This epistemic interpretation is due to Alessandro Saffiotti (personal communication).
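The second condition of (110), implication into a dilation, also gives a direct way to compute the Hausdorff distance on a finite model. This is our own illustrative sketch, under the same finite-worlds assumptions as the distance translations.

```python
# Sketch: recovering the Hausdorff distance from translation (110):
# d_H(X, Y) is the smallest n with psi -> <>^n phi and phi -> <>^n psi,
# i.e., each model set is included in the size-n dilation of the other.

OMEGA = set(range(30))

def dilate_n(mod, n):
    """Dilation of size n (radius-n interval structuring element)."""
    return {w for w in OMEGA if any(abs(w - v) <= n for v in mod)}

def d_hausdorff(mod_phi, mod_psi):
    # assumes both model sets are nonempty, so the loop terminates
    n = 0
    while not (mod_psi <= dilate_n(mod_phi, n)
               and mod_phi <= dilate_n(mod_psi, n)):
        n += 1
    return n

X, Y = {2, 3, 4}, {10, 11}
print(d_hausdorff(X, Y))  # 8: world 2 is at distance 8 from Y
```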
FIGURE 7. Illustration of a distance relation expressed by an interval.
Let us consider an example of a possible use of these representations for spatial reasoning. If we are looking for an object represented by $\psi$ in an area at a distance within an interval $[n_1, n_2]$ of a region represented by $\varphi$, this corresponds to a minimum distance greater than $n_1$ and to a Hausdorff distance less than $n_2$. This is illustrated in Figure 7. Then we have to check the following relation:

$$\psi \rightarrow \neg\Diamond^{n_1}\varphi \wedge \Diamond^{n_2}\varphi, \tag{111}$$

or equivalently:

$$\psi \rightarrow \Box^{n_1}\neg\varphi \wedge \Diamond^{n_2}\varphi. \tag{112}$$
This expresses in a symbolic way imprecise knowledge about distances represented as an interval. If we consider a fuzzy interval, this extends directly by means of fuzzy dilation. These expressions show how we can convert distance information, which is usually defined in an analytical way, into algebraic expressions through mathematical morphology, and then into logical expressions through the morphological expression of modal operators.
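The interval check of relation (111) amounts to two set tests on a finite model: $\mathrm{Mod}(\psi)$ must avoid the size-$n_1$ dilation of $\mathrm{Mod}(\varphi)$ and lie inside its size-$n_2$ dilation. This sketch is our own illustration; `in_distance_band` is a hypothetical helper name.

```python
# Sketch: checking relation (111) for concrete model sets: psi must lie
# outside the size-n1 dilation of phi but inside its size-n2 dilation,
# i.e., at a distance in [n1, n2] from the region phi represents.

OMEGA = set(range(30))

def dilate_n(mod, n):
    """Dilation of size n (radius-n interval structuring element)."""
    return {w for w in OMEGA if any(abs(w - v) <= n for v in mod)}

def in_distance_band(mod_psi, mod_phi, n1, n2):
    """psi -> not <>^n1 phi  and  psi -> <>^n2 phi."""
    return (not (mod_psi & dilate_n(mod_phi, n1))
            and mod_psi <= dilate_n(mod_phi, n2))

phi = {10, 11, 12}
print(in_distance_band({15, 16}, phi, n1=2, n2=5))  # True
print(in_distance_band({13}, phi, n1=2, n2=5))      # False: too close
```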
IX. CONCLUSION

In this chapter we have discussed several ways of defining spatial distances, in different frameworks, ranging from purely quantitative ones to purely
qualitative ones. Issues such as knowledge representation, formal definitions, computation, and reasoning have been addressed, with different answers found depending on the type of available information and on the type of questions we want to answer. In this context, the fuzzy set framework plays a central role, since it elegantly merges quantitative and qualitative aspects. We have also discussed the exploitation of features and properties of mathematical morphology to provide a unified framework for defining and computing distances in a quantitative setting, in a fuzzy (semiqualitative) one, as well as in a purely symbolic and qualitative one. Spatial distances constitute an important part of the spatial relationships linking objects in space, and therefore appear as knowledge or information of major importance in spatial reasoning, which can be combined with other relationships in a fusion process. In particular, qualitative spatial reasoning and spatial reasoning under imprecision can benefit from the proposed approaches. Another interesting research perspective is to further investigate the links between the different views of space, as presented in Section II, and the various formalisms.
REFERENCES

1. Aisbett, J., and Gibbon, G. (2001). A general formulation of conceptual spaces as a meso level representation. Artificial Intelligence 133, 189–232.
2. Allen, J. (1983). Maintaining knowledge about temporal intervals. Commun. of the ACM 26(11), 832–843.
3. Asher, N., and Vieu, L. (1995). Toward a geometry of common sense: A semantics and a complete axiomatization of mereotopology. IJCAI '95, San Mateo, CA, pp. 846–852.
4. Aurnague, M., Vieu, L., and Borillo, A. (1997). Représentation formelle des concepts spatiaux dans la langue, in Langage et cognition spatiale, edited by M. Denis. Paris: Masson, pp. 69–102.
5. De Baets, B., and Mesiar, R. (1997). T-partitions, T-equivalences and pseudo-metrics, in Seventh IFSA World Congress, Prague, Vol. I, pp. 187–192, June 1997.
6. Bandler, W., and Kohout, L. (1980). Fuzzy power sets and fuzzy implication operators. Fuzzy Sets and Systems 4, 13–30.
7. Bengoetxea, E., Larranaga, P., Bloch, I., Perchant, A., and Boeres, C. (2002). Inexact graph matching by means of estimation of distribution algorithms. Pattern Recognition 35, 2867–2880.
8. Benoit, E., and Foulloy, L. (1993). Capteurs flous multicomposantes: applications à la reconnaissance des couleurs, in Les Applications des Ensembles Flous. Nîmes, France, pp. 167–176.
9. Berthoz, A. (2002). Stratégies cognitives et mémoire spatiale, in Colloque Cognitique. Paris.
10. Bhandari, D., and Pal, N. R. (1993). Some new information measures for fuzzy sets. Information Sci. 67, 209–228.
11. Bhandari, D., Pal, N. R., and Majumder, D. D. (1992). Fuzzy divergence, probability measure of fuzzy events and image thresholding. Pattern Recognition Lett. 13, 857–867.
12. Binaghi, E., Della Ventura, A., Rampini, A., and Schettini, R. (1993). Fuzzy reasoning approach to similarity evaluation in image analysis. Int. J. Intelligent Systems 8, 749–769.
13. Blades, M. (1991). The development of the abilities required to understand spatial representations, in Cognitive and Linguistic Aspects of Geographic Space, edited by D. M. Mark and A. U. Frank. NATO ASI, Kluwer, pp. 81–116.
14. Bloch, I. (1996). Distances in fuzzy sets for image processing derived from fuzzy mathematical morphology (invited conference). Information Processing and Management of Uncertainty in Knowledge-Based Systems, Granada, Spain, pp. 1307–1312.
15. Bloch, I. (1996). Fuzzy geodesic distance in images, in Lecture Notes in Artificial Intelligence: Fuzzy Logic in Artificial Intelligence, towards Intelligent Systems, edited by A. Ralescu and T. Martin. Berlin: Springer Verlag, pp. 153–166.
16. Bloch, I. (1996). Fuzzy relative position between objects in images: a morphological approach, in IEEE Int. Conf. on Image Processing ICIP'96, Lausanne, Vol. II, pp. 987–990.
17. Bloch, I. (1996). Image information processing using fuzzy sets. World Automation Congress, Soft Computing with Industrial Applications, Montpellier, France, pp. 79–84.
18. Bloch, I. (1998). Fuzzy morphology and fuzzy distances: New definitions and links in both Euclidean and geodesic cases, in Lecture Notes in Artificial Intelligence: Fuzzy Logic in Artificial Intelligence, edited by A. Ralescu and J. Shanahan. Berlin: Springer Verlag, pp. 149–165.
19. Bloch, I. (1999). Fuzzy relative position between objects in image processing: a morphological approach. IEEE Trans. Pattern Analysis and Machine Intelligence 21(7), 657–664.
20. Bloch, I. (1999). On fuzzy distances and their use in image processing under imprecision. Pattern Recognition 32(11), 1873–1895.
21. Bloch, I. (2000). Geodesic balls in a fuzzy set and fuzzy geodesic mathematical morphology. Pattern Recognition 33(6), 897–905.
22. Bloch, I.
(2000). Spatial representation of spatial relationships knowledge. 7th International Conference on Principles of Knowledge Representation and Reasoning KR 2000, Breckenridge, CO, pp. 247–258.
23. Bloch, I. (2000). Using mathematical morphology operators as modal operators for spatial reasoning. ECAI 2000, Workshop on Spatio-Temporal Reasoning, Berlin, pp. 73–79.
24. Bloch, I. (2002). Modal logics based on mathematical morphology for spatial reasoning. J. Applied Non Classical Logics 12(3–4), 399–424.
25. Bloch, I., Géraud, T., and Maître, H. (2003). Representation and fusion of heterogeneous fuzzy information in the 3D space for model-based structural recognition—Application to 3D brain imaging. Artificial Intelligence 148, 731–741.
26. Bloch, I., and Lang, J. (2000). Towards mathematical morpho-logics. 8th International Conference on Information Processing and Management of Uncertainty in Knowledge-based Systems, IPMU 2000, Madrid, Vol. III, pp. 1405–1412.
27. Bloch, I., and Lang, J. (2002). Towards mathematical morpho-logics, in Technologies for Constructing Intelligent Systems, edited by B. Bouchon-Meunier, J. Gutierrez-Rios, L. Magdalena, and R. Yager. Springer, pp. 367–380.
28. Bloch, I., and Maître, H. (1995). Fuzzy distances and image processing (invited conference). ACM Symposium on Applied Computing, Nashville, TN, pp. 570–574.
29. Bloch, I., and Maître, H. (1995). Fuzzy mathematical morphologies: A comparative study. Pattern Recognition 28(9), 1341–1387.
30. Bloch, I., Maître, H., and Anvari, M. (1997). Fuzzy adjacency between image objects. Int. J. Uncertainty Fuzziness and Knowledge-Based Systems 5(6), 615–653.
31. Bloch, I., Maître, H., and Minoux, M. (1993). Optimal matching of 3-D convex polyhedra with applications to pattern recognition. Pattern Recognition and Image Analysis 3(2), 137–149.
32. Bloch, I., Pellot, C., Sureda, F., and Herment, A. (1996). Fuzzy modelling and fuzzy mathematical morphology applied to 3D reconstruction of blood vessels by multi-modality data fusion, in Fuzzy Set Methods in Information Engineering: A Guided Tour of Applications, edited by D. Dubois, R. Yager, and H. Prade. New York: John Wiley, Chapter 5, pp. 93–110.
33. Bodenhofer, U. (2003). A note on approximate equality versus the Poincaré paradox. Fuzzy Sets and Systems 133, 155–160.
34. Boixader, D. (2003). On the relationship between T-transitivity and approximate equality. Fuzzy Sets and Systems 133, 161–169.
35. Borgefors, G. (1996). Distance transforms in the square grid, in Progress in Picture Processing, Les Houches, Session LVIII, edited by H. Maître. Amsterdam: North-Holland, Chapter 1.4, pp. 46–80.
36. Bouayad, M., Leschi, C., and Emptoz, H. (1995). Contribution de la logique floue à la modélisation de la reconnaissance des formes, in Rencontres francophones sur la logique floue et ses applications. Paris, pp. 49–56.
37. Bouchon-Meunier, B., Rifqi, M., and Bothorel, S. (1996). Towards general measures of comparison of objects. Fuzzy Sets and Systems 84(2), 143–153.
38. Bouchon-Meunier, B., and Yager, R. R. (1993). Entropy of similarity relations in questionnaires and decision trees. Second IEEE Int. Conf. on Fuzzy Systems, San Francisco, CA, pp. 1225–1230.
39. Boxer, L. (1997). On Hausdorff-like metrics for fuzzy sets. Pattern Recognition Lett. 18, 115–118.
40. Brass, P. (2002). On the nonexistence of Hausdorff-like metrics for fuzzy sets. Pattern Recognition Lett. 23, 39–43.
41. Briggs, R. (1973). Urban cognitive distance, in Image and Environment: Cognitive Mapping and Spatial Behavior, edited by R. M. Downs and D. Stea. Chicago: Aldine, pp. 361–388.
42. Cayrol, M., Farreny, H., and Prade, H. (1982). Fuzzy pattern matching. Kybernetes 11, 103–116.
43. Cesar, R., Bengoetxea, E., and Bloch, I. (2002).
Inexact graph matching using stochastic optimization techniques for facial feature recognition. International Conference on Pattern Recognition, ICPR 2002, Québec.
44. Chang, C.-C., and Jiang, J.-H. (1996). A spatial filter for similarity retrieval. Int. J. of Pattern Recognition and Artificial Intelligence 10(6), 711–730.
45. Chauduri, B. B., and Rosenfeld, A. (1996). On a metric distance between fuzzy sets. Pattern Recognition Lett. 17, 1157–1160.
46. Chellas, B. (1980). Modal Logic, an Introduction. Cambridge: Cambridge University Press.
47. Chen, S. M., Yeh, M. S., and Hsio, P. Y. (1995). A comparison of similarity measures of fuzzy values. Fuzzy Sets and Systems 72, 79–89.
48. Clementini, E., and Di Felice, P. (1997). Approximate topological relations. Int. J. Approximate Reasoning 16, 173–204.
49. Clementini, E., Di Felice, P., and Hernandez, D. (1997). Qualitative representation of positional information. Artificial Intelligence 95, 317–356.
50. De Cock, M., and Kerre, E. (2003). On (un)suitable fuzzy relations to model approximate equality. Fuzzy Sets and Systems 133, 137–153.
51. De Cock, M., and Kerre, E. (2003). Why fuzzy T-equivalence relations do not resolve the Poincaré paradox, and related issues. Fuzzy Sets and Systems 133, 181–192.
52. Cohn, A., Bennett, B., Gooday, J., and Gotts, N. M. (1997). Representing and reasoning with qualitative spatial relations about regions, in Spatial and Temporal Reasoning, edited by O. Stock. Kluwer, pp. 97–134.
53. Colliot, O., Bloch, I., and Tuzikov, A. (2002). Characterization of approximate plane symmetries for 3D fuzzy objects, in IPMU 2002, Annecy, France, Vol. III, pp. 1749–1756.
54. Cortelazzo, G., Deretta, G., Mian, G. A., and Zamperoni, P. (1996). Normalized weighted Levensthein distance and triangle inequality in the context of similarity discrimination of bilevel images. Pattern Recognition Lett. 17, 431–436.
55. Cross, V., and Cabello, C. (1995). A mathematical relationship between set-theoretic and metric compatibility measures. ISUMA-NAFIPS'95, College Park, MD, pp. 169–174.
56. Cuxac, C. (1999). French sign language: Proposition of a structural explanation by iconicity. Gesture Workshop, pp. 173–180.
57. de Barros, L. C., Bassanezi, R. C., and Tonelli, P. A. (1997). On the continuity of the Zadeh's extension. Seventh IFSA World Congress, Prague, Vol. II, pp. 3–8.
58. Denis, M., Pazzaglia, F., Cornoldi, C., and Bertolo, L. (1999). Spatial discourse and navigation: An analysis of route directions in the city of Venice. Appl. Cognitive Psychology 13, 145–174.
59. Dessalles, J.-L. (2000). Aux origines du langage. Paris: Hermès.
60. Dubois, D., and Jaulent, M.-C. (1987). A general approach to parameter evaluation in fuzzy digital pictures. Pattern Recognition Lett. 6, 251–259.
61. Dubois, D., and Prade, H. (1980). Fuzzy Sets and Systems: Theory and Applications. New York: Academic Press.
62. Dubois, D., and Prade, H. (1983). On distance between fuzzy points and their use for plausible reasoning. Int. Conf. Systems, Man, and Cybernetics, pp. 300–303.
63. Dubois, D., and Prade, H. (1985). A review of fuzzy set aggregation connectives. Information Sciences 36, 85–121.
64. Dubois, D., and Prade, H. (1991). A glance at non-standard models and logics of uncertainty and vagueness. Technical Report IRIT/91-97/R, IRIT, Toulouse, France.
65. Dubois, D., Prade, H., and Testemale, C. (1988). Weighted fuzzy pattern matching. Fuzzy Sets and Systems 28, 313–331.
66. Dutta, S.
(1991). Approximate spatial reasoning: Integrating qualitative and quantitative constraints. Int. J. of Approximate Reasoning 5, 307–331.
67. Edwards, G. (1997). Geocognostics: A new framework for spatial information theory, in Spatial Information Theory: A Theoretical Basis for GIS, volume 1329 of LNCS. Springer, pp. 455–471.
68. Einstein, A. (1916). General Theory of Relativity.
69. Emptoz, H. (1983). Modèle prétopologique pour la reconnaissance des formes. Applications en neurophysiologie. Doctoral thesis, Univ. Claude Bernard, Lyon I, France.
70. Fan, J., and Xie, W. (1999). Some notes on similarity measure and proximity measure. Fuzzy Sets and Systems 101, 403–412.
71. Fan, J.-L. (1988). Note on Hausdorff-like metrics for fuzzy sets. Pattern Recognition Lett. 23, 793–796.
72. Frank, A. U. (1992). Qualitative spatial reasoning with cardinal directions. J. Visual Languages and Computing 3, 343–371.
73. Freeman, J. (1975). The modelling of spatial relations. Computer Graphics and Image Processing 4(2), 156–171.
74. Gahegan, M. (1995). Proximity operators for qualitative spatial reasoning, in Spatial Information Theory: A Theoretical Basis for GIS, volume 988 of LNCS, edited by A. U. Frank and W. Kuhn. Springer.
75. Gapp, K. P. (1994). Basic meanings of spatial relations: Computation and evaluation in 3D space, in 12th National Conference on Artificial Intelligence, AAAI-94, Seattle, WA, pp. 1393–1398.
76. Gärdenfors, P. (2000). Conceptual Spaces: The Geometry of Thought. Cambridge, MA: MIT Press.
77. Gärdenfors, P., and Williams, M.-A. (2001). Reasoning about categories in conceptual spaces. IJCAI'01, Seattle, WA, pp. 385–392.
78. Gasos, J., and Ralescu, A. (1997). Using imprecise environment information for guiding scene interpretation. Fuzzy Sets and Systems 88, 265–288.
79. Gasós, J., and Saffiotti, A. (2000). Using fuzzy sets to represent uncertain spatial knowledge in autonomous robots. J. Spatial Cognition and Computation 1, 205–226.
80. Géraud, T., Bloch, I., and Maître, H. (1999). Atlas-guided recognition of cerebral structures in MRI using fusion of fuzzy structural information. CIMAF'99 Symposium on Artificial Intelligence, La Havana, Cuba, pp. 99–106.
81. Gerstenkorn, T., and Manko, J. (1991). Correlation of intuitionistic fuzzy sets. Fuzzy Sets and Systems 44, 39–43.
82. Goetschel, R., and Voxman, W. (1983). Topological properties of fuzzy numbers. Fuzzy Sets and Systems 10, 87–99.
83. Guesgen, H. W., and Albrecht, J. (2000). Imprecise reasoning in geographic information systems. Fuzzy Sets and Systems 113, 121–131.
84. Hart, R. A., and Moore, G. T. (1973). The development of spatial cognition: A review, in Image and Environment: Cognitive Mapping and Spatial Behavior, edited by R. M. Downs and D. Stea. Chicago: Aldine.
85. Helmholtz, H. (1878). The Facts of Perception.
86. Herskovits, A. (1986). Language and Spatial Cognition. An Interdisciplinary Study of the Prepositions in English. Cambridge, MA: Cambridge University Press.
87. Hughes, G. E., and Cresswell, M. J. (1968). An Introduction to Modal Logic. London: Methuen.
88. Huttenlocher, D. P., Klanderman, G. A., and Rucklidge, W. J. (1993). Comparing images using the Hausdorff distance. IEEE Trans. Pattern Analysis and Machine Intelligence 15(9), 850–863.
89. Hyung, L. K., Song, Y. S., and Lee, K. M. (1994). Similarity measure between fuzzy sets and between elements.
Fuzzy Sets and Systems 62, 291–293.
90. Jacas, J., and Recasens, J. (1996). One-dimensional indistinguishability operators. Information Processing and Management of Uncertainty in Knowledge-Based Systems, IPMU'96, Granada, Spain, Vol. I, pp. 377–381.
91. Jain, R., Murthy, S. N. J., and Chen, P. L. J. (1995). Similarity measures for image databases. IEEE Int. Conf. on Fuzzy Systems, Yokohama, Japan, pp. 1247–1254.
92. Janis, V. (2003). Resemblance is a nearness. Fuzzy Sets and Systems 133, 171–173.
93. Jaulent, M. C., and Yang, A. (1994). Application of fuzzy pattern matching to the flexible interrogation of a digital angiographies database. IPMU, Paris, pp. 904–909.
94. Kandel, A., and Byatt, W. J. (1978). Fuzzy sets, fuzzy algebra, and fuzzy statistics. Proc. IEEE 66(12), 1619–1639.
95. Kandil, A., Sedamed, E. H., and Morsi, N. N. (1994). Fuzzy proximities induced by functional fuzzy separation. Fuzzy Sets and Systems 63, 227–235.
96. Kant, E. (1781). Critique of Pure Reason.
97. Kaufman, A. (1973). Introduction to the Theory of Fuzzy Subsets: Fundamental Theoretical Elements. New York: Academic Press.
98. Keller, J., and Wang, X. (2000). A fuzzy rule-based approach to scene description involving spatial relationships. Computer Vision and Image Understanding 80, 21–41.
99. Keller, J. M., and Wang, X. (1995). Comparison of spatial relation definitions in computer vision. ISUMA-NAFIPS'95, College Park, MD, pp. 679–684.
100. Kitamoto, A., and Takagi, M. (1995). Retrieval of satellite cloud imagery based on subjective similarity. 9th Scandinavian Conference on Image Analysis, Uppsala, Sweden, pp. 449–456.
101. Klawonn, F. (2003). Should fuzzy equality and similarity satisfy transitivity? Comments on the paper by M. De Cock and E. Kerre. Fuzzy Sets and Systems 133, 175–180.
102. Krishnapuram, R., Keller, J. M., and Ma, Y. (1993). Quantitative analysis of properties and spatial relations of fuzzy image regions. IEEE Trans. on Fuzzy Systems 1(3), 222–233.
103. Kuipers, B. (1978). Modeling spatial knowledge. Cognitive Science 2, 129–153.
104. Kuipers, B. (2000). The spatial semantic hierarchy. Artificial Intelligence 119, 191–233.
105. Kuipers, B. J., and Levitt, T. S. (1988). Navigation and mapping in large-scale space. AI Magazine 9(2), 25–43.
106. Kullback, S. (1959). Information Theory and Statistics. New York: Wiley.
107. Lambrey, S., Viaud-Delmon, I., and Berthoz, A. (2002). Influence of a sensorimotor conflict on the memorization of a path traveled in virtual reality. Cognitive Brain Research 14, 177–186.
108. Liu, J. (1998). A method of spatial reasoning based on qualitative trigonometry. Artificial Intelligence 98, 137–168.
109. Liu, X., Tan, S., Srinivasan, V., Ong, S. H., and Xie, Z. (1994). Fuzzy pyramid-based invariant object recognition. Pattern Recognition 27(5), 741–756.
110. Liu, Z.-Q., and Satur, R. (1999). Contextual fuzzy cognitive map for decision support in geographic information systems. IEEE Trans. Fuzzy Systems 7(5), 495–505.
111. Lowen, R., and Peeters, W. Anisotropic semi-pseudometrics, unpublished manuscript.
112. Lowen, R., and Peeters, W. (1997). On various classes of semi-pseudometrics used in pattern recognition. Seventh IFSA World Congress, Prague, Vol. I, pp. 232–237.
113. Lowen, R., and Peeters, W. (1998). Distances between fuzzy sets representing grey level images. Fuzzy Sets and Systems 99(2), 135–150.
114. De Luca, A., and Termini, S. (1972).
A definition of non-probabilistic entropy in the setting of fuzzy set theory. Information and Control 20, 301–312.
115. Maher, P. E. (1993). A similarity measure for conceptual graphs. Int. J. Intelligent Systems 8, 819–837.
116. Man, G. T., and Poon, J. C. (1993). A fuzzy-attributed graph approach to handwritten character recognition. Second IEEE Int. Conf. on Fuzzy Systems, San Francisco, CA, pp. 570–575.
117. Mangin, J.-F., Bloch, I., Lopez-Krahe, J., and Frouin, V. (1994). Chamfer distances in anisotropic 3D images. EUSIPCO 94, Edinburgh, UK, pp. 975–978.
118. Mark, D. M., and Egenhofer, M. J. (1994). Modeling spatial relations between lines and regions: combining formal mathematical models and human subjects testing. Cartography and Geographic Information Systems 21(4), 195–212.
119. Mascarilla, L. (1994). Rule extraction based on neural networks for satellite image interpretation. SPIE Image and Signal Processing for Remote Sensing, Rome, Vol. 2315, pp. 657–668.
120. Masson, M., and Denœux, T. (2002). Multidimensional scaling of fuzzy dissimilarity data. Fuzzy Sets and Systems 128, 339–352.
121. Mellet, E., Bricogne, S., Tzourio-Mazoyer, N., Ghaëm, O., Petit, L., Zago, L., Etard, O., Berthoz, A., Mazoyer, B., and Denis, M. (2000). Neural correlates of topographic mental exploration: The impact of route versus survey perspective learning. NeuroImage 12(5), 588–600.
122. Miyajima, K., and Ralescu, A. (1994). Spatial organization in 2D segmented images: Representation and recognition of primitive spatial relations. Fuzzy Sets and Systems 65, 225–236.
123. Montello, D. R. (1993). Scale and multiple psychologies of space. COSIT '93, volume 716 of LNCS, Elba Island, Italy, pp. 312–321.
124. Nadel, L. (1995). The psychobiology of spatial behavior: the hippocampal formation and spatial mapping, in Behavioral Brain Research in Naturalistic and Semi-Naturalistic Settings: Possibilities and Perspectives, edited by E. Alleva, H.-P. Lipp, L. Nadel, A. Fasolo, and L. Ricceri. Dordrecht: Kluwer.
125. Nadel, L. (2002). Multiple perspectives in spatial cognition, in Colloque Cognitique. Paris.
126. Pappis, C. P., and Karacapilidis, N. I. (1993). A comparative assessment of measures of similarity of fuzzy values. Fuzzy Sets and Systems 56, 171–174.
127. Perchant, A., and Bloch, I. (2002). Fuzzy morphisms between graphs. Fuzzy Sets and Systems 128(2), 149–168.
128. Peuquet, D. J. (1988). Representations of geographical space: Toward a conceptual synthesis. Annals of the Association of American Geographers 78(3), 375–394.
129. Piaget, J., and Inhelder, B. (1967). The Child's Conception of Space. New York: Norton.
130. Poincaré, H. (1902). La science et l'hypothèse. Flammarion.
131. Potoczny, H. B. (1984). On similarity relations in fuzzy relational databases. Fuzzy Sets and Systems 12, 231–235.
132. Pullar, D., and Egenhofer, M. (1988). Toward formal definitions of topological relations among spatial objects. Third Int. Symposium on Spatial Data Handling, Sydney, Australia, pp. 225–241.
133. Puri, M. L., and Ralescu, D. A. (1981). Différentielle d'une fonction floue. C.R. Acad. Sci. Paris I 293, 237–239.
134. Puri, M. L., and Ralescu, D. A. (1983). Differentials of fuzzy functions. J. Mathematical Analysis and Applications 91, 552–558.
135. Randell, D., Cui, Z., and Cohn, A. (1992). A spatial logic based on regions and connection. Principles of Knowledge Representation and Reasoning, KR'92, San Mateo, CA, pp. 165–176.
136. Rifqi, M. (1995).
Mesures de similitude et leur agrégation, in Rencontres francophones sur la logique floue et ses applications. Paris, pp. 80–87.
137. Rosenfeld, A. (1984). The fuzzy geometry of image subsets. Pattern Recognition Lett. 2, 311–317.
138. Rosenfeld, A. (1985). Distances between fuzzy sets. Pattern Recognition Lett. 3, 229–233.
139. Saha, P. K., Wehrli, E. W., and Gomberg, B. R. (2003). Fuzzy distance transform: Theory, algorithms, and applications. Fuzzy Sets and Systems 86, 171–190.
140. Sallandre, M.-A., and Cuxac, C. (2001). Iconicity in sign language: A theoretical and methodological point of view. Gesture Workshop, pp. 173–180.
141. Satur, R., and Liu, Z.-Q. (1999). A contextual fuzzy cognitive map framework for geographic information systems. IEEE Trans. Fuzzy Systems 7(5), 481–494.
142. Serra, J. (1982). Image Analysis and Mathematical Morphology. London: Academic Press.
143. Siegel, A. W., and White, S. H. (1975). The development of spatial representations of large-scale environments, in Advances in Child Development and Behavior, Vol. 10, edited by H. W. Reese. New York: Academic Press.
144. Sinha, D., and Dougherty, E. R. (1993). Fuzzification of set inclusion: Theory and applications. Fuzzy Sets and Systems 55, 15–42.
145. Sokic, C., and Pavlovic-Lazetic, G. (1997). Homogeneous images in fuzzy databases, in Seventh IFSA World Congress, Prague, Vol. IV, pp. 297–303.
146. Sridharan, K., and Stephanou, H. E. (1999). Fuzzy distances for proximity characterization under uncertainty. Fuzzy Sets and Systems 103, 427–434.
147. Talmy, L. (1983). How language structures space, in Spatial Orientation: Theory, Research and Application, edited by H. L. Pick and L. P. Acredolo. New York: Plenum Press.
148. Talmy, L. (2000). Toward a Cognitive Semantics. Cambridge, MA: MIT Press.
149. Tan, S. K., Teh, H. H., and Wang, P. Z. (1994). Sequential representation of fuzzy similarity relations. Fuzzy Sets and Systems 67, 181–189.
150. Tran, L., and Duckstein, L. (2002). Comparison of fuzzy numbers using a fuzzy distance measure. Fuzzy Sets and Systems 130, 331–341.
151. Tversky, A. (1977). Features of similarity. Psychological Review 84(4), 327–352.
152. Vandeloise, C. (1986). L'espace en français: sémantique des prépositions spatiales. Paris: Seuil, travaux en linguistique.
153. Varzi, A. (1996). Parts, wholes, and part–whole relations: The prospects of mereotopology. Data and Knowledge Engineering 20(3), 259–286.
154. Vieilledent, S., Kosslyn, S. M., Berthoz, A., and Giraudo, M. D. (2003). Does mental simulation of following a path improve navigation performance without vision? Cognitive Brain Research 16(2), 238–249.
155. Vieu, L. (1997). Spatial representation and reasoning in artificial intelligence, in Spatial and Temporal Reasoning, edited by O. Stock. Dordrecht: Kluwer, pp. 5–41.
156. Walker, E. L. (1996). Fuzzy relations for feature–model correspondence in 3D object recognition. NAFIPS, Berkeley, CA, pp. 28–32.
157. Wang, W.-J. (1997). New similarity measures on fuzzy sets and on elements. Fuzzy Sets and Systems 85, 305–309.
158. Wang, X., De Baets, B., and Kerre, E. (1995). A comparative study of similarity measures. Fuzzy Sets and Systems 73, 259–268.
159. Yager, R. R. (1992). Entropy measures under similarity relations. Int. J. General Systems 20, 341–358.
160. Zadeh, L. (1978). Fuzzy sets and information granularity, in Advances in Fuzzy Set Theory and Applications, edited by Gupta, M., Ragade, R., and Yager, R. Amsterdam: North-Holland, pp. 3–18.
161. Zadeh, L. A. (1971). Similarity relations and fuzzy orderings. Information Sci. 3, 177–200.
162. Zadeh, L. A. (1975). The concept of a linguistic variable and its application to approximate reasoning.
Information Sci. 8, 199–249.
163. Zwick, R., Carlstein, E., and Budescu, D. V. (1987). Measures of similarity among fuzzy concepts: A comparative analysis. Int. J. Approximate Reasoning 1, 221–242.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 128
Mathematical Morphology Applied to Circular Data

ALLAN HANBURY*
Pattern Recognition and Image Processing Group (PRIP), Vienna University of Technology, Favoritenstraße 9/1832, A-1040 Vienna, Austria

I. Introduction
II. Processing Circular Data
   A. Circular Data and the Unit Circle
   B. Circular Statistics
   C. Mathematical Morphology Applied to the Unit Circle
   D. Morphology with the Choice of an Origin
   E. Pseudodilation and Pseudoerosion
      1. Morphological Center
      2. Erosion and Dilation
   F. Circular Centered Morphology
      1. Gradient
      2. Top-Hat
   G. Partitions
      1. Connected Partitions
      2. Indexed Partitions
      3. Cyclic Operators
      4. Series Closings
      5. Parallel Openings
      6. Rotationally Invariant Cyclic Opening
   H. Conclusion
III. Application Examples
   A. Homogeneous Phase Extraction in HRTEM Images
   B. Oriented Texture
      1. The Rao and Schunck Algorithm
      2. Segmentation
      3. Defect Detection with the Circular Centered Top-Hat
      4. Defect Detection with the Labeled Opening
   C. Conclusion
IV. 3D Polar Coordinate Color Spaces
   A. Basic Definitions
   B. 3D Polar Coordinate Color Representations
   C. Discussion of the Existing 3D Polar Coordinate Spaces
   D. Derivation of a Useful 3D Polar Coordinate Space
      1. Brightness
      2. Hue
      3. Saturation
*The majority of this work was done while the author was with the Centre for Mathematical Morphology, Paris School of Mines, France. It is supported by the Austrian Science Foundation (FWF) under grants P14445-MAT and P14662-INF.
Copyright © 2003 Elsevier Inc. All rights reserved. 1076-5670/2003 $35.00
      4. Chroma
   E. The IHLS Space
      1. The Simplest RGB to IHLS Transformation
      2. An Alternative RGB to IHLS Transformation
      3. The Inverse Transformation from IHLS to RGB
   F. Conclusion
V. Processing of 3D Polar Coordinate Color Spaces
   A. Color Statistics
   B. Vectorial Mathematical Morphology
      1. Vectorial Orders
      2. Morphological Operators
   C. Lexicographical Orders in the IHLS Color Space
      1. Luminance and Saturation
      2. Hue
      3. Saturation-Weighted Hue
      4. Color Top-Hat
      5. Summary
   D. Conclusion
VI. Conclusion
Appendix A: Connected Partitions
Appendix B: Cyclic Closings on Indexed Partitions
Acknowledgments
References
I. INTRODUCTION

Data represented by angles or by two-dimensional orientations, called circular data, often appear in the analysis of the natural world. Some examples are wind directions, the directions of departure of birds or animals from a point of liberation, and the orientations of fracture planes in rocks. The statistical analysis of circular data is a well-studied subject (Fisher, 1993; Mardia and Jupp, 1999), but in the context of image processing and analysis, in which this type of data is also found, the development of methods for processing it correctly has received less attention.

For color images, the hue component of color representations in 3D polar coordinates is an angular value. For this reason, the hue has properties different from those of its accompanying components, the saturation and brightness. Nevertheless, this difference is often ignored and the same algorithms are applied to the three components. It is also often necessary to process two-dimensional direction fields, for example in the analysis of oriented textures, of the vector fields produced by movement analysis in image sequences, or of the vector field produced by the Fourier transform of any image. Spectrograms, the result of a series of short-time Fourier transforms applied to a unidimensional signal, can also be visualized as a vector
field. With the results of a Fourier transform, one has the tendency to use only the vector amplitudes, leaving aside their directions (the phase). Why is there this reticence to process the angular values? Why is the nature of circular data sometimes ignored, leading them to be processed as linear data?¹

Circular data can be visualized as points on the circumference of the unit circle, the circle with unit radius. By using this representation, one immediately sees the characteristics that render circular data difficult to process. They are cyclic: adding 2π or one of its multiples to a coordinate brings one back to the original position. Furthermore, there is no obvious origin; each position on the circle is equal to every other. It is for this reason that King Arthur chose a round table for his knights, and, as for the knights, one cannot impose an order by magnitude on circular data. But are these problems insurmountable? Some possible solutions are discussed in this chapter. In particular, we show how mathematical morphology operators can be applied to this type of data.

The notion of rotational invariance is important in the context of circular data processing. In a set of directions, the numerical value of the coordinate of each direction depends on the position chosen for the origin. If an operator acting on this set always gives the same direction as a result, independently of the position of the origin (note that this is not necessarily the same numerical value), then the operator is said to be rotationally invariant. For circular data, for which an obvious origin does not exist, this property is desirable, and among the morphological operators that we develop, those which satisfy this property are indicated.

This chapter is concerned mainly with the processing of circular data in the context of image processing and analysis. It is essentially a translation into English of some parts of a Ph.D. thesis (Hanbury, 2002).
It presents expanded versions of the material presented by Hanbury and Serra (2001a,c). In Section II, we develop morphological operators pertinent to circular data. The first applications of these operators are presented in Section III, in the context of processing Fourier transform phase images and oriented textures, applications in which it is often possible to process the circular data component in isolation. We then move on to the case where an angular coordinate forms part of a vector, as found in color images represented in a 3D polar coordinate system. In an effort to simplify the choice of a 3D polar coordinate color representation amongst the multitude available in the literature, Section IV discusses and develops the IHLS (improved hue, luminance, and saturation) system, a system of
¹ The practitioners of image processing are not the only people to err in this direction. At one stage, statistics of wind directions were calculated using standard linear statistics, necessitating the later development of methods for correcting these errors (Fisher, 1993).
3D polar coordinates suitable for use in image processing and analysis. The application of morphological operators in the IHLS space is discussed in Section V.

A detailed introduction to mathematical morphology is outside the scope of this chapter. The reader is referred to Soille (1999) for a practical introduction, and to Serra (1982) and Heijmans (1994) for a more mathematical treatment.

II. PROCESSING CIRCULAR DATA

In image processing and analysis, one is sometimes confronted by images containing angular values at each point. Three applications of this type, which are described in more detail in Sections III and V, are the processing of the hue component of color images, of a direction field describing an oriented texture, and of the phase image produced by a Fourier transform.

Angular data can be visualized as points on the unit circle, a representation discussed in Section II.A. The circle has neither an order of importance of points nor a dominant position which could be taken as an origin. In addition, the data are cyclic: adding 2π to a coordinate on the circle gives a result at the same position as the initial point. In general, for this type of data, the classic operators designed for use on linear data are not valid. For statistical descriptors, one is inconvenienced by the periodicity; for mathematical morphology, also by the lack of an obvious origin based on which one can construct a lattice. The statistics of circular data is a well-developed area of research. We present in Section II.B a brief review of the circular statistics descriptors (Fisher, 1993; Mardia and Jupp, 1999) which are useful in our applications. We then move on to answering the following important question: Can we avoid the difficulty imposed by the lack of an obvious origin, and hence develop morphological operators which are rotationally invariant? In Sections II.C–II.G we consider four approaches to mathematical morphology for circular data.
For the examples in this section, we make use of the hue band of color images. This is discussed in more detail in Section IV, but for this section, it is sufficient to know that the hue is an angular value describing the color of a pixel.

A. Circular Data and the Unit Circle

Two types of circular data exist: vectorial data and axial data (Fisher, 1993). Vectorial data represent directions, for example wind direction, and have a periodicity of 2π. Axial data represent the orientations of undirected lines, for example the orientations of cracks on a surface, and have a periodicity of
π. For the examples we consider, the hue value and the Fourier transform phases are vectorial data, and direction fields are axial data.² In general, axial data are processed by first converting them to vectorial data (by a multiplication by 2 followed by a modulo 2π if necessary), followed by an application of vectorial data techniques, and finally a conversion of the results to axial data values. All the algorithms in this chapter are therefore designed for vectorial data.

Angular valued data can usefully be represented as points on the circumference of the unit circle, the circle with center $o$ and radius of length 1 shown in Figure 1. The points on the circle which indicate directions with respect to center $o$ are written $\dot{a}_i$ with $i \in \mathbb{N}$. Upon choosing an arbitrary origin $a_0$ on the unit circle, the positions of the points $\dot{a}_i$ can be given by the corresponding angles $a_i$, with these angles being measured in an anticlockwise direction with respect to the origin $a_0$. Two points $\dot{a}_1$ and $\dot{a}_2$ are shown in Figure 1, with their angles $a_1$ and $a_2$ measured with respect to the origin $a_0$ indicated. We stress that the points $\dot{a}_i$ are always found in the same position, independent of the position of the origin $a_0$, whereas the values of the associated angles $a_i$ change as a function of the position of the origin. The angles have the property that the values $a_i + 2k\pi$, $k \in \mathbb{Z}$, always correspond to the same point $\dot{a}_i$.

To simplify the comparison of angles, we constrain their values $a_i$ to lie in the interval $[0, 2\pi)$. We therefore define the operator $K(\cdot)$, which takes a value $\alpha \in (-\infty, \infty)$ and moves it into the interval $[0, 2\pi)$. This operator is defined as

$$K(\alpha) = \alpha + 2k\pi, \quad \text{with } k \in \mathbb{Z} \text{ chosen so that } K(\alpha) \in [0, 2\pi). \qquad (1)$$
We proceed to the definition of addition and subtraction operators for angular values on the unit circle, wishing to obtain results in the interval $[0, 2\pi)$. The addition of two angular values $a_i$ and $a_j$ is defined as

$$a_i \dotplus a_j = K(a_i + a_j) \qquad (2)$$

and the subtraction of these two values is defined as

$$a_i \mathbin{\dot{-}} a_j = K(a_i - a_j). \qquad (3)$$
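As an illustration, these angle operators translate directly into code. The following Python sketch (the function names are ours, not from the chapter) implements the wrapping operator of Equation (1), the circular addition and subtraction of Equations (2)–(3), and the acute-angle distance of Equation (4):

```python
import math

TWO_PI = 2.0 * math.pi

def K(alpha):
    """Equation (1): wrap any real-valued angle into [0, 2*pi)."""
    return alpha % TWO_PI        # Python's % returns a value in [0, TWO_PI)

def ang_add(ai, aj):
    """Circular addition, Equation (2)."""
    return K(ai + aj)

def ang_sub(ai, aj):
    """Circular subtraction, Equation (3)."""
    return K(ai - aj)

def acute(ai, aj):
    """Acute-angle distance, Equation (4); assumes both angles lie in [0, 2*pi)."""
    d = abs(ai - aj)
    return d if d <= math.pi else TWO_PI - d
```

Note that the built-in modulo operation already realizes $K$, since for a positive modulus Python's `%` always returns a nonnegative result.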
Another notion of angular difference is the smallest angle formed by the directions $\dot{a}_1$ and $\dot{a}_2$, represented by the acute angle between the two

² A type of circular data known as p-axial also exists, having the property that a direction represented by an angle $\theta$ is the same as the directions represented by the angles $\theta + k(2\pi/p)$, $k \in \mathbb{Z}$, with $p \in \mathbb{N}$, $p > 2$. For example, for $p = 6$, the angles 10°, 70°, 130°, 190°, 250°, and 310° all correspond to the same direction.
FIGURE 1. The unit circle.
angles $a_1$ and $a_2$, which we denote by $a_1 \ominus a_2$, with, in the general case of any two angles $a_i, a_j \in [0, 2\pi)$,

$$a_i \ominus a_j = \begin{cases} |a_i - a_j| & \text{if } |a_i - a_j| \le \pi \\ 2\pi - |a_i - a_j| & \text{if } |a_i - a_j| > \pi \end{cases} \qquad (4)$$

The acute angle $a_1 \ominus a_2$ is indicated in Figure 1.

For digital images, the range of pixel values is usually limited by the number of bits per pixel. For an 8-bit image, the angles between 0 and 2π are represented by integer values between 0 and 255. For the examples presented, we use images having floating point pixel values, allowing more precision. We also freely switch between units of degrees and radians, using whichever is more convenient for the problem at hand.

B. Circular Statistics

One cannot use the classic linear statistical descriptors for circular data, due to the periodicity of this type of data. We present definitions of the circular mean and circular variance applicable to circular data as well as to images containing this type of data. These definitions are from Fisher (1993).

We begin with the circular mean. Given $n$ angular values $\theta_i$, $i = 1, \ldots, n$, the mean direction $\bar{\theta}$ is the direction of the resultant vector of the sum of unit vectors in the $n$ directions $\theta_i$. To find the direction of this resultant vector, one first calculates the values

$$A = \sum_i \cos \theta_i, \qquad B = \sum_i \sin \theta_i \qquad (5)$$
followed by

$$\bar{\theta} = \begin{cases} \arctan(B/A) & \text{if } B > 0,\ A > 0 \\ \arctan(B/A) + \pi & \text{if } A < 0 \\ \arctan(B/A) + 2\pi & \text{if } B < 0,\ A > 0 \\ \pi/2 & \text{if } A = 0,\ B > 0 \\ 3\pi/2 & \text{if } A = 0,\ B < 0 \end{cases} \qquad (6)$$
The arctan function gives angular values in the interval $[-\pi/2, \pi/2]$, and the top three levels of Equation (6) give a value of $\bar{\theta}$ in the interval $[0, 2\pi)$. The final two levels take into account the special case when $A = 0$. The length of the resultant vector is

$$R = \sqrt{A^2 + B^2} \qquad (7)$$

and the average length is

$$\bar{R} = \frac{R}{n}. \qquad (8)$$

The average length has a value between 0 and 1 and can be used as an indicator of the dispersion of the data. If $\bar{R} = 1$, all the $\theta_i$ are coincident. Conversely, a value of zero does not necessarily indicate a homogeneous data distribution, as many nonhomogeneous distributions can also result in a value of zero. The circular variance is defined as

$$V = 1 - \bar{R}. \qquad (9)$$
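Putting Equations (5)–(9) together, the circular mean and variance can be sketched as follows (the function names are ours; `atan2` reproduces the case analysis of Equation (6), including the $A = 0$ cases):

```python
import math

def circular_mean(angles):
    """Mean direction, Equations (5)-(6); angles in radians.
    atan2 folds the five-way case analysis of Equation (6) into one call;
    the result is meaningless when A = B = 0 (resultant length R = 0)."""
    A = sum(math.cos(t) for t in angles)
    B = sum(math.sin(t) for t in angles)
    return math.atan2(B, A) % (2.0 * math.pi)

def circular_variance(angles):
    """Circular variance, Equations (7)-(9): V = 1 - R/n."""
    A = sum(math.cos(t) for t in angles)
    B = sum(math.sin(t) for t in angles)
    return 1.0 - math.hypot(A, B) / len(angles)
```

For example, two angles placed symmetrically about 0 have a circular mean of 0, while two diametrically opposed angles give a variance close to 1, reflecting maximal dispersion.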
Analogously to the linear variance, the concentration of the distribution is inversely proportional to the value of $V$ (note that $V$ also takes values between 0 and 1). Definitions of the angular standard deviation are also available (Fisher, 1993). To calculate statistical measures of axial data, one first multiplies each angular value by two, and proceeds with the calculation of $\bar{\theta}$, $\bar{R}$, and $V$. Lastly, $\bar{\theta}$ is divided by two.

C. Mathematical Morphology Applied to the Unit Circle

Mathematical morphology is usually applied to grayscale images of the form $f : Z \to \mathbb{R}$, where $Z \subseteq E$ is a subspace of the Euclidean space $E$.
The existence of an order relation on $\mathbb{R}$ allows the construction of a complete lattice and hence the application of morphological operators. Morphological operators interact with images by means of small sets called structuring elements (SE). For simplicity, we use square structuring elements in the examples, where a square SE of size $k$ is a square of $(2k+1) \times (2k+1)$ pixels.

In the rest of this section, we consider images containing circular data, i.e., images of the type $a : Z \to C$, where $C$ is the unit circle. In images of this type, there is no predefined order for the angular values. One is free to choose an origin $a_0$ anywhere on the circle, and the order of the values depends on this choice. The application of mathematical morphology to this type of image is discussed in the following sections. We begin by developing, in Section II.D, operators for which it is necessary to initially choose an origin, but which take the periodicity of the circular values into account. Next, we suggest, in Sections II.E–II.G, some approaches which avoid the necessity of choosing an origin, and thereby allow the creation of rotationally invariant morphological operators (Hanbury and Serra, 2001c).
D. Morphology with the Choice of an Origin

Having chosen an origin $a_0$ on the unit circle, it is easy to build an order from 0° to 360°, with infimum 0° and supremum 360° (Vardavoulia et al., 2001; Weeks and Sartor, 1999). We then find ourselves in the unfortunate situation where the infimum and supremum are the same point of the circle, the origin. A solution allowing one to escape from this paradox is to order the points as a function of their distance to a chosen origin, using the acute angle between two points given by Equation (4). An order based on these differences is not total, as two points on opposite sides of the origin can have the same distance from the origin. We can nevertheless impose a total order on the points $a_i$ of the circle by using the following algorithm:

$$a_i \le a_j \quad \begin{array}{l} \text{if } a_i \ominus a_0 < a_j \ominus a_0 \\ \text{or if } a_i \ominus a_0 = a_j \ominus a_0 \text{ and } a_i \mathbin{\dot{-}} a_0 \le 180° \end{array} \qquad (10)$$
A similar relation to this has been used by Peters (1997) for applying morphological operators to hue differences in color images, and by Zhang and Wang (2000) in their definition of a central point of a segment
on the hue circle. Peters defines an erosion $\varepsilon_B^P$ by a structuring element $B$ at point $x$ as

$$\varepsilon_B^P a(x) = \inf\{a(y),\ y \in B_x\} \qquad (11)$$
in which the order given by Equation (10) with an origin $a_0$ is used; that is to say, the infimum of a set of points on the unit circle is the point closest to the chosen origin $a_0$. The supremum is therefore the point furthest from the origin.

When dealing with directions, this definition is not very intuitive. If we are interested in some color of hue $H$ on which we wish to carry out a dilation, it is necessary to choose the origin at $H + 180°$. To simplify the choice of the origin, we define the operators in a way that permits the user to choose the origin at the position of the hue of interest. Consider the simple two-color example shown in Figure 2(a): an image which, in the HLS space (see Section IV), has a brightness $L = 1/2$ and a saturation $S = 1$ constant over the entire image, with the red grains having a hue $H = 0°$, and the yellow background a hue $H = 60°$. We choose the origin equal to the hue of the objects of interest, $a_0 = 0°$. If we dilate the hue by choosing the supremum in the structuring element according to the Peters formulation, the result is shown in Figure 2(b), in which the red objects have been eroded. To allow the user to choose the origin more intuitively by placing it at the position of the hue of the objects of interest, we invert the Peters formulation, and define the erosion as

$$\varepsilon_B a(x) = \sup\{a(y),\ y \in B_x\} \qquad (12)$$

and the dilation as

$$\delta_B a(x) = \inf\{a(y),\ y \in B_x\} \qquad (13)$$
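A minimal sketch of these origin-based operators, with hues in degrees: `order_key` realizes the total order of Equation (10), and `erode_hue`/`dilate_hue` implement Equations (12) and (13) over a square SE. All names are ours, and clipping the SE at the image border is one simplifying choice among several:

```python
def acute_deg(a, b):
    """Acute angle between two hues in degrees (Equation (4))."""
    d = abs(a - b) % 360.0
    return d if d <= 180.0 else 360.0 - d

def order_key(a, a0):
    """Total order of Equation (10): distance to the origin a0 first,
    with the wrapped signed difference breaking ties between the two sides."""
    return (acute_deg(a, a0), (a - a0) % 360.0)

def _morph(img, a0, k, pick):
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            win = [img[j][i]
                   for j in range(max(0, y - k), min(h, y + k + 1))
                   for i in range(max(0, x - k), min(w, x + k + 1))]
            out[y][x] = pick(win, key=lambda v: order_key(v, a0))
    return out

def erode_hue(img, a0, k=1):
    """Erosion of Equation (12): point furthest from the origin in the SE."""
    return _morph(img, a0, k, max)

def dilate_hue(img, a0, k=1):
    """Dilation of Equation (13): point closest to the origin in the SE."""
    return _morph(img, a0, k, min)
```

On a small red/yellow image in the spirit of Figure 2, dilating with $a_0 = 0°$ grows the red (hue 0°) grains, matching the intuitive behavior of Figure 2(c).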
FIGURE 2. (a) Two-color image containing red grains (marked 'R') on a yellow background (marked 'Y'). (b) Dilation using the Peters formulation. (c) Dilation using Equation (13). The hue origin is $a_0 = 0°$ for both operations.
in which we continue to use the order defined by Equation (10) in choosing the supremum and infimum in the structuring element. The dilation of Figure 2(a) using Equation (13) is shown in Figure 2(c). The behavior of this dilation is more intuitive.

The choice of the origin can be made based on the requirements of the user or the characteristics of the images to be treated. For example, the mean or median (Nikolaidis and Pitas, 1998) color (hue) of an image, or information on the color of the objects of interest, may be used. The application of this type of hue morphology to color images which are more complex than the one in Figure 2(a), in which there is an interaction between the hue and saturation components, is discussed in Section V.C.

E. Pseudodilation and Pseudoerosion

To avoid having to choose an origin, we proceed to the development of operators based on the idea of grouped data. Because of the difficulty in determining the number of groups in a sample of circular data (Fisher and Marron, 2001), we introduce a simple definition of grouped data by way of the morphological center, and then use it to define morphological pseudoerosion and pseudodilation operators.

1. Morphological Center

The morphological center is a notion which appears naturally in the context of self-dual morphological filters (Serra, 1988). Given $n$ numerical values $t_i \in \mathbb{R}$ and a supplementary value $t$ which we wish to bring closer to the $t_i$, we apply the morphological center operator $\beta$ as follows:

$$\beta(t) = \begin{cases} \wedge t_i & \text{if } t \le \wedge t_i \\ t & \text{if } \wedge t_i \le t \le \vee t_i \\ \vee t_i & \text{if } \vee t_i \le t \end{cases} \qquad (14)$$
In particular, for $n = 2$ we find the median of the three values $t_1$, $t_2$, and $t$. When we wish to transpose this notion to the unit circle, we immediately come up against an obstacle. In the linear case, it is always possible to say whether a value is outside (superior or inferior to) the set of values $t_i$. Now consider a similar case on the unit circle, where we wish to bring a point $\dot{a}$ closer to a set of points $\dot{a}_i$. In Figure 3, let the origin $O$ be the point to be moved closer to the $\dot{a}_i$ (represented as crosses). In this case, it is possible to make sense of algorithm (14) only for certain distributions, such as those in Figures 3(a), (b), and (c), but not for the distribution of Figure 3(d), in which the data are too dispersed.
FIGURE 3. Four distributions of circular data. (a), (b), and (c) are grouped, (d) is not.
A simple approach is to ignore the grouping of the data, and to unconditionally put $\dot{a}$ at the position of the closest point $\dot{a}_i$:

$$\beta_0(\dot{a}) = \{\dot{a}_i \mid (a_i \ominus a) = \wedge(a_i \ominus a),\ i \in I\}. \qquad (15)$$

Alternatively, we can attempt to construct algorithms similar to algorithm (14). For this approach, it is necessary to formally define the notion of a group of points, of which an intuitive idea is given by Figures 3(a), (b), and (c).

Definition 1. A family $\{\dot{a}_i, i \in I\}$ of points on the unit circle forms an ω-group when an origin $a_0$ exists for which the following is valid:

$$\vee\{a_i, i \in I\} - \wedge\{a_i, i \in I\} \le \omega \le \pi \qquad (16)$$

where ω is an angle less than or equal to π, and $a_i$ is the angle corresponding to point $\dot{a}_i$ measured with respect to the origin $a_0$. The condition $\omega \le \pi$ removes the case shown in Figure 3(d) from consideration.

In practice, it is possible to decide whether an ω-group exists by simply choosing one arbitrary origin, as shown by the following proposition.

Proposition 2. The family $\{\dot{a}_i, i \in I\}$ of points on the unit circle $C$ forms an ω-group if and only if one has

$$\vee\{a_i, i \in I\} - \wedge\{a_i, i \in I\} \le \omega \qquad (17)$$

for an arbitrary origin $a_0$, or for the origin $a_0 + \pi$.

Proof. If the $\dot{a}_i$ are ω-grouped, then it is possible to partition $C$ into two semicircles so that all the $\dot{a}_i$ are in one of the semicircles. With this partition
of the circle, a point at the position of the origin $a_0$ is in one of the semicircles, and the point at the position $a_0 + \pi$ is necessarily in the other. One of these points is therefore in the semicircle opposite to the one which contains the family $\{\dot{a}_i, i \in I\}$, and for this origin, relation (17) is satisfied, as the origin does not belong to the envelope of the group of points (i.e., the smallest sector of the circle which contains them all). Conversely, if relation (17) is satisfied for an origin $a_0$, we have the definition of an ω-group of the $\dot{a}_i$. □

This proof gives rise to a simple algorithm for determining whether a group of points is ω-grouped. Given a family of points $\{\dot{a}_i, i \in I\}$, an arbitrary origin $a_0$ is chosen. If relation (17) is satisfied, then an ω-group exists. If not, then the origin is placed at position $a_0 + \pi$. If relation (17) is satisfied for this origin, then an ω-group exists. Otherwise, there is no grouping of points. If an ω-group exists, then the infimum and supremum of the group can be determined with respect to the origin for which the grouping exists.

The algorithm defining the circular morphological center uses this definition of an ω-group. To begin, we take as origin the point $\dot{a}$ which we wish to bring closer to the family $\{\dot{a}_i\}$. Next, we look at the value of $\Delta = \vee a_i - \wedge a_i$. If $\Delta > \pi$, then either the points $\{\dot{a}_i\}$ do not form an ω-group, or the point $\dot{a}$ is already in the interior of the group. We therefore leave $\dot{a}$ in its initial position. If $\Delta \le \pi$, then the points $\{\dot{a}_i\}$ form an ω-group, and $\dot{a}$ is outside this group. The morphological center is the point of the group $\{\dot{a}_i\}$ which is closest to $\dot{a}$, this point always being one of the extremities of the group. The following definition presents a method for calculating the angular value of the morphological center.

Definition 3. Given a family of points $\{\dot{a}_i, i \in I\}$ on the unit circle, and a point $\dot{a}$ which we wish to bring closer to these points, if we place the origin of the angular values at the position of $\dot{a}$, then the morphological center is

$$\beta(\dot{a}) = \begin{cases} 0 & \text{if } \Delta > \pi \\ \wedge a_i & \text{if } \Delta \le \pi \text{ and } (0 \ominus \wedge\{a_i, i \in I\}) < (0 \ominus \vee\{a_i, i \in I\}) \\ \vee a_i & \text{if } \Delta \le \pi \text{ and } (0 \ominus \vee\{a_i, i \in I\}) < (0 \ominus \wedge\{a_i, i \in I\}) \end{cases} \qquad (18)$$

where $\Delta = \vee a_i - \wedge a_i$.
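The ω-group test of Proposition 2 and the center of Equation (18) can be sketched as follows (names ours; for convenience, `morphological_center` returns the result as an absolute angle rather than as a value relative to the origin placed at the point being moved):

```python
import math

TWO_PI = 2.0 * math.pi

def is_omega_group(points, omega=math.pi):
    """Proposition 2: relation (17) holds for an arbitrary origin (here 0)
    or for that origin shifted by pi."""
    for shift in (0.0, math.pi):
        rel = [(p + shift) % TWO_PI for p in points]
        if max(rel) - min(rel) <= omega:
            return True
    return False

def morphological_center(a, points):
    """Equation (18), with the origin of angular values placed at the point a."""
    rel = [(p - a) % TWO_PI for p in points]
    delta = max(rel) - min(rel)
    if delta > math.pi:
        return a % TWO_PI              # no omega-group, or a already inside it
    def dist_from_origin(x):           # acute distance from 0 (= position of a)
        return x if x <= math.pi else TWO_PI - x
    lo, hi = min(rel), max(rel)
    chosen = lo if dist_from_origin(lo) < dist_from_origin(hi) else hi
    return (a + chosen) % TWO_PI       # nearer extremity of the group
```

For instance, a point at angle 0 facing a group concentrated around π rad moves to whichever extremity of the group is closer.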
The last two levels of Equation (18) choose the extremity of the ω-group closest to $\dot{a}$. For the examples shown in Figure 3, if we take the origin as the point to be moved closer to the others by the application of the morphological center operator, then it does not move
for Figures 3(b) and (d), and moves to the position of the circled points in Figures 3(a) and (c).

2. Erosion and Dilation

The notion of an ω-group (Equation (16)) suggests the introduction of two operators which are similar to the supremum and the infimum. Consider a finite ω-group $\dot{a}_i$, $i \in I$. For all the origins for which Equation (17) is satisfied, the point at position $a_{\max} = \vee\{a_i, i \in I\}$, even if the numerical value of $a_{\max}$ depends on the position of the origin, always corresponds to the same point of the group. The same result applies to the infimum $\wedge\{a_i, i \in I\}$. These two extremities therefore have a significance partially independent of the choice of the origin on the unit circle. The $\vee$ operation leads to the introduction of a "pseudodilation" operator. Consider a function $a : E \to C$, and let $B$ be a structuring element. The pseudodilation $\tilde{\delta} : C \to C$ is defined as follows:

$$\tilde{\delta}a(x) = \begin{cases} \vee\{a(y),\ y \in B_x\} & \text{if } \{a(y),\ y \in B_x\} \text{ forms an } \omega\text{-group} \\ a(x) & \text{otherwise} \end{cases} \qquad (19)$$

The operator $\tilde{\delta}$ is not a true dilation, as one cannot find an underlying order relation. Nevertheless, for every symmetric $B$, we can define, by duality, a "pseudoerosion"

$$\tilde{\varepsilon}a(x) = \begin{cases} \wedge\{a(y),\ y \in B_x\} & \text{if } \{a(y),\ y \in B_x\} \text{ forms an } \omega\text{-group} \\ a(x) & \text{otherwise} \end{cases} \qquad (20)$$
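A sketch of the pseudodilation of Equation (19) and its dual pseudoerosion of Equation (20), again with a square SE clipped at the border; the helper applies the two-origin test of Proposition 2 to each window (all names ours):

```python
import math

TWO_PI = 2.0 * math.pi

def _grouped(vals):
    """Test relation (17) for an origin at 0 and at pi (Proposition 2).
    Return (angles re-measured from the working origin, shift), or None
    if the values do not form an omega-group."""
    for shift in (0.0, math.pi):
        rel = [(v + shift) % TWO_PI for v in vals]
        if max(rel) - min(rel) <= math.pi:
            return rel, shift
    return None

def _pseudo(img, k, pick):
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            win = [img[j][i]
                   for j in range(max(0, y - k), min(h, y + k + 1))
                   for i in range(max(0, x - k), min(w, x + k + 1))]
            g = _grouped(win)
            if g is None:
                out[y][x] = img[y][x]                   # no omega-group: unchanged
            else:
                rel, shift = g
                out[y][x] = (pick(rel) - shift) % TWO_PI  # group extremity
    return out

def pseudo_dilate(img, k=1):
    """Pseudodilation, Equation (19): supremum of the omega-group."""
    return _pseudo(img, k, max)

def pseudo_erode(img, k=1):
    """Pseudoerosion, Equation (20): infimum of the omega-group."""
    return _pseudo(img, k, min)
```

On values straddling the 0/2π discontinuity, the extremity is picked correctly (no collapse toward 0), while windows whose values are too dispersed are left untouched, as in the wine-glass region of Figure 4(c).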
It follows that all classic extensive mathematical morphology operators, such as openings, closings, reconstructions, and levelings, have a "pseudo" version. Figure 4 shows a comparison between a pseudoerosion and a classic erosion. Figure 4(a) is the hue band of a subregion of the color image in Figure A.1(a). A classic erosion is shown in Figure 4(b), and a pseudoerosion in Figure 4(c). The region in which the differences are most visible corresponds to the red fruit at the left. The hue values for red are found on the two sides of the angular discontinuity at 0°/360°. The classic erosion reduces them to the smallest values larger than zero. The pseudoerosion, on the other hand, replaces the pixels with the infimum of the group of angular values around zero. It is nevertheless important to examine other regions, such as the base of the wine glass, where the pseudoerosion operator has no effect, due to the pixels in these regions not forming ω-groups.
FIGURE 4. (a) Hue of a 231 × 134 pixel subregion of the image in Figure A.1(a). (b) Classic erosion of image (a). (c) Pseudoerosion of image (a). Both erosions are done with a square SE of size 2.
By introducing these pseudooperators to avoid the necessity of choosing an origin, we unfortunately lose some of the useful properties of the classic morphological operators. For example, the pseudoopening and pseudoclosing operators are not idempotent (although, in general, they become idempotent after a few iterations). This lack of idempotence is due to the operator not acting on each pixel of the image in the same way, as it leaves some of them in their original state. The decision to change or to leave a pixel depends on the values in the structuring element, which can change with each application of the operator.
F. Circular Centered Morphology

It is clear that even though the order of angular values depends on the choice of the origin $a_0$, the order of the differences between angular values is independent of the position of the origin. It is possible to reformulate the mathematical morphology operators which act only on increments of values so that they can be applied to circular data without requiring any initial choice. In this section, the morphological gradient and top-hat operators are adapted to circular data.

1. Gradient

We define here the morphological gradient operating on circular increments (Equation (4)), which is therefore applicable to images containing circular data. Let $f : E \to \mathbb{R}$ be a differentiable numerical function and $B$ a structuring element. Beucher introduced three morphological gradients, described in Serra (1982): the gradient by erosion

$$f - \varepsilon_B(f), \qquad (21)$$
the gradient by dilation

$$\delta_B(f) - f, \qquad (22)$$

and the symmetric gradient

$$\delta_B(f) - \varepsilon_B(f) \qquad (23)$$
in which $\varepsilon_B(f)$ denotes the erosion of $f$ by $B$, and $\delta_B(f)$ the corresponding dilation. In the Euclidean space, if we use a small sphere $S(x, r)$ centered on $x$ with radius $r$ as the structuring element $B$, the symmetric gradient can also be written as

$$g(x) = \lim_{r \to 0} \{\delta_{S(x,r)}(f) - \varepsilon_{S(x,r)}(f)\}/2r \qquad (24)$$

$$= \lim_{r \to 0} \{\vee[f(x) - f(y),\ y \in S(x, r)] - \wedge[f(x) - f(y),\ y \in S(x, r)]\}/2r \qquad (25)$$
In a discrete space $\mathbb{Z}^d$, this symmetric gradient in terms of erosions and dilations is written as

$$g(x) = \vee[f(y),\ y \in B(x)] - \wedge[f(y),\ y \in B(x)]. \qquad (26)$$

By using the following relation

$$f(x) - \wedge[f(y),\ y \in B(x)] = \vee[f(x) - f(y),\ y \in B(x)] \qquad (27)$$

and the relation obtained by inverting the supremum and infimum operators, one can write Equation (26) in a form analogous to that of Equation (25), which contains only increments:

$$g(x) = \vee[f(x) - f(y),\ y \in B(x)] - \wedge[f(x) - f(y),\ y \in B(x)]. \qquad (28)$$

For the gradients by erosion and by dilation, Equation (27) and its inversion give their forms. The gradient by erosion is

$$g_e(x) = \vee[f(x) - f(y),\ y \in B(x)] \qquad (29)$$

and the gradient by dilation is

$$g_d(x) = -\wedge[f(x) - f(y),\ y \in B(x)] = \vee[f(y) - f(x),\ y \in B(x)]. \qquad (30)$$
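For a numerical image, the discrete symmetric gradient of Equation (26) is simply the local maximum minus the local minimum over the structuring element, as in this sketch (names ours, SE clipped at the border):

```python
def morph_gradient(img, k=1):
    """Discrete symmetric morphological gradient, Equation (26):
    dilation minus erosion, i.e. local max minus local min over a
    (2k+1) x (2k+1) square structuring element."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            win = [img[j][i]
                   for j in range(max(0, y - k), min(h, y + k + 1))
                   for i in range(max(0, x - k), min(w, x + k + 1))]
            out[y][x] = max(win) - min(win)
    return out
```

Applied naively to a hue image, this operator produces the spurious responses at the 0/2π discontinuity discussed below, which is precisely what the circular reformulation avoids.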
To move from numerical functions $f(x)$ to angular functions $a(x)$, it is sufficient to replace the increments $[f(x) - f(y)]$ in Equations (28)–(30) by the angular difference given by Equation (4). For the case where the
structuring element origin is inside the structuring element, the three equations reduce to a unique equation ga ðxÞ ¼ _½aðxÞ aðyÞ, y 2 BðxÞ :
ð31Þ
This is because ½ f ðxÞ f ð yÞ 2 ð1, 1Þ, but ½aðxÞ aðyÞ 2 ½0, 2pÞ, and therefore ^½aðxÞ að yÞ, y 2 BðxÞ in Equation (28) is always equal to zero if the origin is part of the structuring element. Equations (29) and (30) become identical because aðxÞ aðyÞ ¼ aðyÞ aðxÞ. For the case where the structuring element origin is not part of the structuring element, Equation (28) obviously becomes ga ðxÞ ¼ _½aðxÞ að yÞ, y 2 BðxÞ ^½aðxÞ að yÞ, y 2 BðxÞ :
ð32Þ
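A minimal sketch of the circular centered gradient of Equation (31), assuming the angular difference of Equation (4) is the usual acute angular distance on the circle; the function and parameter names are ours.

```python
import numpy as np

def ang_diff(a, b):
    """Angular difference min(|a-b|, 2*pi - |a-b|); assumed form of Eq. (4)."""
    d = np.abs(a - b) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

def circular_centered_gradient(a, size=3):
    """Circular centered gradient of Eq. (31): at each pixel, the supremum
    over a square neighbourhood B(x) of side `size` of the angular
    differences between the centre value and its neighbours."""
    pad = size // 2
    ap = np.pad(a, pad, mode='edge')
    h, w = a.shape
    g = np.zeros_like(a, dtype=float)
    for dy in range(-pad, pad + 1):
        for dx in range(-pad, pad + 1):
            shifted = ap[pad + dy:pad + dy + h, pad + dx:pad + dx + w]
            g = np.maximum(g, ang_diff(a, shifted))
    return g
```

On a hue image whose values straddle the 0/2π origin, this gradient stays small across the discontinuity, unlike the classic gradient.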
We demonstrate the action of this operator on Figure 5(a), which is the hue band of the color image in Figure A.1(b). This image was chosen because it is mainly red and purple, which puts the majority of its pixels on the two sides of the origin of the (circular) histogram. A discontinuity is therefore visible in the hue, with red pixels appearing at both extremities of the straightened-out hue histogram (Figure 5(b)). A classic morphological gradient on the hue (Figure 5(c)) results in a large number of high-valued pixels which do not correspond to highly visible color differences in the initial image. This phenomenon is particularly visible in the outer part of the halo, which has a smooth appearance in the initial image, but which produces strong gradients in Figure 5(c). The circular centered gradient (Equation (31)), shown in Figure 5(d), solves this problem. Note that, for this example, if we add π to each of the hue values, the classic gradient becomes identical to the circular centered gradient. The circular centered gradient, however, remains invariant to rotations of the pixel values. An alternative circular gradient based on measures of angular data dispersion is presented by Nikolaidis and Pitas (1998).

2. Top-Hat

The top-hat operator, developed by F. Meyer and described by Serra (1982), is the residue between a numerical function and its transformation by an opening. It therefore acts only on increments, and hence can be transposed to circular-valued functions. We describe below the algorithm developed for the case of openings by adjunction (i.e., products by composition of an erosion and its adjunct dilation). We first recall the relation which gives the value $\gamma_B(x)$ of the opening by a structuring element B
FIGURE 5. (a) Hue component of Figure A.1(b). (b) Histogram of the hue component. (c) Classic morphological gradient (Equation (23)) of the hue. (d) Circular centered gradient (Equation (31)) of the hue. The gradients were calculated using a square SE of size 1.
at the point x. If we denote by $\{B_i,\ i \in I\}$ the family of structuring elements which contain the point x, the relation is

$$\gamma_B(x) = \sup \{ \inf [f(y),\ y \in B_i],\ i \in I \}. \qquad (33)$$
We now consider the top-hat expression $f(x) - \gamma_B(x)$, which we rewrite in terms of increments of f

$$f(x) - \gamma_B(x) = f(x) - \sup \{ \inf [f(y),\ y \in B_i],\ i \in I \} = -\sup \{ \inf [f(y),\ y \in B_i] - f(x),\ i \in I \} = -\sup \{ \inf [f(y) - f(x),\ y \in B_i],\ i \in I \}.$$

As for the gradient, we replace $[f(y) - f(x)]$ by the angular difference $[a(x) \mathbin{\dot-} a(y)]$. Nevertheless, it is necessary to take into account the fact that we are replacing the expression $[f(y) - f(x),\ y \in B_i] \in (-\infty, \infty)$ by the expression
FIGURE 6. (a) Hue component of a 311 × 227 pixel subregion of Figure A.1(a). (b) Classic top-hat by a square SE of size 1 applied to image (a). (c) Circular centered top-hat by a square SE of size 1 applied to image (a). (d) Histogram of image (b). (e) Histogram of image (c).
$[a(x) \mathbin{\dot-} a(y),\ y \in B_i] \in [0, 2\pi)$, and in consequence, if the structuring element origin forms part of the structuring element, the expression $\inf [a(x) \mathbin{\dot-} a(y),\ y \in B_i]$ is always equal to zero. To avoid the result of this top-hat operator always being zero, it is necessary to use the dual form, which is equivalent, but in which the inner operations are suprema

$$\mathrm{ATH}[a(x)] = \inf \{ \sup [a(x) \mathbin{\dot-} a(y),\ y \in B_i],\ i \in I \}. \qquad (34)$$
An example of the use of this top-hat is shown in Figure 6. Figure 6(a) is the hue band of a subregion of the color image in Figure A.1(a). In the color image, the red regions, i.e., those that are found on the discontinuity of the hue values, have been manually outlined. These hue value discontinuities are visible in Figure 6(a). The result of a classic top-hat operator applied to
Figure 6(a) is shown in Figure 6(b), with its histogram in Figure 6(d). It is clear that even though the colors do not change significantly in the regions indicated, many large-valued pixels appear in the top-hat result, and are also visible in the histogram. The result of the circular centered top-hat is shown in Figure 6(c), with its histogram in Figure 6(e). In this image, the false high values are no longer present.
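Equation (34) can be evaluated by brute force over all translates of the structuring element that contain each pixel. The following sketch assumes a flat square SE and takes the angular difference of Equation (4) to be the acute angular distance; names are ours.

```python
import numpy as np

def ang_diff(a, b):
    """Angular difference min(|a-b|, 2*pi - |a-b|); assumed form of Eq. (4)."""
    d = np.abs(a - b) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

def circular_centered_tophat(a, size=3):
    """Brute-force sketch of the circular centered top-hat, Eq. (34):
    for each pixel x, the infimum over every translate B_i of a square SE
    containing x of the supremum of ang_diff(a(x), a(y)), y in B_i."""
    h, w = a.shape
    pad = size - 1                      # a translate containing x reaches this far
    ap = np.pad(a, pad, mode='edge')
    out = np.full(a.shape, np.inf)
    for oy in range(size):              # position of x inside the translate
        for ox in range(size):
            sup = np.zeros(a.shape)
            for dy in range(size):
                for dx in range(size):
                    sy, sx = dy - oy, dx - ox
                    nb = ap[pad + sy:pad + sy + h, pad + sx:pad + sx + w]
                    sup = np.maximum(sup, ang_diff(a, nb))
            out = np.minimum(out, sup)  # infimum over the translates
    return out
```

A one-pixel angular detail on a flat background gets its full angular deviation as top-hat value, while the surrounding pixels get zero.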
G. Partitions

Up to now, we have tried to construct a lattice on the unit circle C by changing the position of the origin, possibly variable at each point x ∈ E. In this section, we consider the more classic lattice of sets or partitions, in which the directions are used only to label the partition elements. Starting from the concept of connected partitions, we develop opening and closing operators which act on these partitions. Next, the opening is further developed to produce a version which is rotationally invariant, which leads to an alternative definition of the top-hat.

1. Connected Partitions

A connected partition of a space is defined as follows:

Definition 4 A partition of the space E for which each element is connected is a mapping $D : E \to \mathcal{P}(E)$, with a connectivity $\mathcal{C}$ defined on $\mathcal{P}(E)$, such that for all points x and y of E:

(1) $x \in D(x)$;
(2) $x \neq y \Rightarrow D(x) = D(y)$ or $D(x) \cap D(y) = \emptyset$;
(3) $D(x) \in \mathcal{C}$.

The first two axioms require that each x ∈ E forms part of an element of the partition, and that the partition elements do not overlap. These two axioms define partitions in general. The third, more specific to our needs, imposes a connectivity on the partition elements. We call a partition of E for which the partition elements are connected a connected partition. A proof that a family of connected partitions forms a complete lattice is given in Appendix A.

2. Indexed Partitions

We move from a connected partition to an indexed partition by associating an index (e.g., linked to the hue or direction) with each element of the partition.
Definition 5 An indexed partition of a space E, indexed by a finite number N, is a mapping $D : E \to \mathcal{P}(E)$ together with a function $M : \mathcal{P}(E) \to \{1, 2, \ldots, N\}$ which associates an index with each element D(x) of the connected partition. To simplify the notation, we define

$$D(x, i) = \begin{cases} D(x) & \text{if } M[D(x)] = i \\ \emptyset & \text{otherwise} \end{cases} \qquad (35)$$
The N sets associated with the gamut of indices (hue, direction, etc.) are called phases, and the phase $A_i$ is the union of the partition elements associated with the index i

$$A_i = \bigcup \{ D(x, i),\ x \in E \}. \qquad (36)$$
As each point x ∈ E must be associated with an index, there are only N − 1 independent index values: if we know the positions of N − 1 phases, the position of the Nth phase is necessarily known. Appendix B deals with lattices of indexed partitions and the behavior of increasing operators on these partitions. We now consider in more detail opening and closing operators acting on indexed partitions.

3. Cyclic Operators

Indexed partitions constructed on the unit circle are called cyclic partitions. An operator acting on such a partition is called cyclic when it acts on the phases associated with all the indices of the partition. When a cyclic closing is applied to a cyclic partition, it is clear that this operation, being extensive, leads to interactions between the different phases. In order to be able to take these interactions into account, the closing is applied to the phases in series. Conversely, the opening, because of its anti-extensivity, can be applied to the phases in a parallel fashion.

4. Series Closings

We equip the space E with a proper connection, i.e., a connection for which every grain of X is adjacent to at least one pore, and each pore to at least one grain ($X \subseteq E$, except for $X = \emptyset$ and $X = E$). Let $\varphi$ be a connected closing on $\mathcal{P}(E)$. We introduce the following operation, defined phase by phase:

$$\psi_k(A_l) = \begin{cases} \gamma_x \varphi_B(A_k) & \text{if } k = l \\ A_l \setminus [\gamma_x \varphi_B(A_k)] & \text{if } k \neq l \end{cases} \qquad \forall\, l = 1, 2, \ldots, N \qquad (37)$$
in which $A_i$ is given by Equation (36), B is a structuring element, and $\gamma_x$ indicates the point connected opening. The first line of the equation applies a connected closing to the phase $A_k$, and the second line removes the region assigned to the phase $A_k$ from the other phases, thereby ensuring that the properties of a partition are not lost. This closing of one phase is obviously not cyclic. Consider now the product by composition

$$\psi = \psi_N \cdots \psi_2 \psi_1 \qquad (38)$$

applied to the N phases. The operator $\psi$ has the following effect on the partition: $\psi_1$ closes certain pores of the phase $A_1$ according to an increasing criterion, which means that if a certain pore is not closed, then no pore larger than it will be closed. The operations $\psi_2, \psi_3, \ldots, \psi_N$ then transform certain grains of $\gamma_x \varphi(A_1)$ into pores without ever adding more grains. Because the connection on E is proper, each grain of $\gamma_x \varphi(A_1)$ subsequently transformed into a pore can only increase the size of the pores adjacent to $\gamma_x \varphi(A_1)$. Consequently, $\psi_1 \psi = \psi$, and by iteration, $\psi \psi = \psi$. In other words, $\psi$ is idempotent.

The practical effect of each operator $\psi_i$ is to assign the index i to connected components of the partition which are smaller than the structuring element and which are entirely surrounded by the phase i. The result of the operator $\psi$ is not independent of the order of application of the closing operators $\psi_i$, as is demonstrated schematically in Figure 7.

We now give an example of a cyclic closing which simplifies the hue of the color image shown in Figure A.2(a). The hue is first partitioned by using a simple algorithm for constructing the limits of the partition elements. This algorithm constructs indices starting from 0°, requiring that each phase have either a maximum number of pixels (here equal to one-sixth of the total number of pixels in the image), or a maximum width of 45°. The 10 phases generated by this algorithm for the hue of the example image are listed in Table 1. Figure 8(a) shows the hue image containing the labeled phases. A cyclic closing (Equation (38)) by a square SE of size 10, with the phases processed in order of increasing index i, is applied to the indexed partition to produce the closed indexed partition shown in Figure 8(b). To reconstruct a color image, each phase is replaced by its mean hue (Table 1), and this image is recombined with the initial saturation and brightness images to create the image of Figure A.2(b).
In this image, some effects of the closing on the hue, while not striking, are visible. For example, the white elements of the mosaic which are surrounded by red have taken on a light red color in the output image.
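The phase-by-phase mechanics of Equations (37) and (38) can be sketched as follows. As an assumption for illustration, scipy's plain binary closing stands in for the connected closing of the text, which loses the connected-operator properties but shows the series behavior in which each closed phase steals pixels from the others.

```python
import numpy as np
from scipy import ndimage

def cyclic_closing(labels, structure, order=None):
    """Sketch of the series cyclic closing, Eqs. (37)-(38).  Each phase k is
    closed in turn; the pixels gained by phase k are removed from the other
    phases, so the result remains a partition.  A plain binary closing
    stands in for the connected closing of the text."""
    out = labels.copy()
    ids = list(order) if order is not None else list(np.unique(labels))
    for k in ids:
        closed = ndimage.binary_closing(out == k, structure=structure)
        out[closed] = k   # psi_k: assign phase k, removing it from the others
    return out
```

Passing a different `order` changes the result, which is exactly the order dependence demonstrated in Figure 7.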
FIGURE 7. Schematic example of a cyclic closing using the indicated structuring element, which demonstrates that the result depends on the order of the component operators.

TABLE 1. Upper and lower limits of the elements of the indexed partition of the hue used in Figure 8, and the mean hue calculated in each phase.

Phase   Lower limit   Upper limit   Mean hue
  1         0°            35°         22.3°
  2        35°            48°         40.5°
  3        48°            93°         60.1°
  4        93°           138°        115.1°
  5       138°           183°        163.2°
  6       183°           211°        199.7°
  7       211°           241°        218.2°
  8       241°           286°        256.1°
  9       286°           331°        311.7°
 10       331°           360°        353.3°
In summary, the aim of this approach is to replace the processing of spatial data, for which the choice of an origin is not obvious, by the processing of indexed sets. The operator is, however, not rotationally invariant, as the result depends on the order in which the indices are processed. We presented an example of the simplification of the hue of a color image; another good example of the use of this operator, on images of thin polarized sections of silicates, is given by Mlynarczuk et al. (1998).

5. Parallel Openings

The result of a series closing is to relabel some connected components using already existing indices. An opening, on the other hand, completely removes
FIGURE 8. Example of a cyclic closing. (a) Indexed partition of the hue band of Figure A.2 containing 10 phases. (b) Indexed hue partition after the cyclic closing by a square SE of size 10. The corresponding color image is shown in Figure A.2(b).
some of the connected components. In order to remain within the framework of partitions, we solve this problem by beginning with a partition labeled by N − 1 indices, and we label the components which are removed by the opening with the index N. It is clear that there is no interaction between the different components of the partition when applying an opening, in contrast to the cyclic closing, which changes the shape of some of the components. This operator is therefore simply a reindexation of the components of a partition. We assign the index N to the components which are eliminated by a connected opening. Those which are not eliminated keep their initial indices. This operation is cyclic as it acts on all the indices. This reindexation can be symbolically written as

$$M[D(x)] := \begin{cases} M[D(x)] & \text{if } \gamma_B[D(x)] \neq \emptyset \\ N & \text{otherwise} \end{cases} \qquad \forall\, x \in E \qquad (39)$$
in which $\gamma_B$ is a connected opening, and the symbol := indicates that the value on the right is assigned to the one on the left. The phase with label N plays the role of the residue of the opening. We take advantage of the fact that there is no interaction between the phases during the application of a cyclic opening in order to reformulate this opening so that it is applicable to labeled angular images, which permits the development of a simple top-hat operator. In a labeled image, it is not necessary to have a label at every point of the image; the residue is therefore represented more conveniently by the absence of a label, as in the case of sets. With the formulation in terms of labels, we are also no longer limited to using connected openings, as it is no longer necessary to preserve the properties of a partition.
We label an angular image by choosing label boundary points $q_i$, $i = 1, 2, \ldots, N$ with $q_1 = q_N$ on the circle. The label i (equivalent to the phase i in the partition context) is given by

$$A_i = \{ x : x \in E,\ a(x) \in [q_i, q_{i+1}) \}. \qquad (40)$$
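The labeling of Equation (40) is a simple binning of the angular values; a minimal sketch (the function name is ours):

```python
import numpy as np

def angular_labels(a, q):
    """Label an angular image following Eq. (40): the label i is given to
    the pixels whose angle lies in [q_i, q_{i+1}).  `q` holds the boundary
    points in increasing order."""
    labels = np.zeros(a.shape, dtype=int)
    for i in range(len(q) - 1):
        labels[(a >= q[i]) & (a < q[i + 1])] = i + 1
    return labels
```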
The set of labels which are not removed by an opening with structuring element B is

$$\gamma^c_B A = \bigcup_{i=1}^{N-1} \gamma_B(A_i). \qquad (41)$$
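A sketch of the parallel labeled opening in the spirit of Equations (39) and (41): each phase is opened independently and the union is kept, the pixels removed by every opening being left with the residue label. As an illustrative simplification, a plain binary opening stands in for the connected opening $\gamma_B$.

```python
import numpy as np
from scipy import ndimage

def labeled_opening(labels, structure, residue=0):
    """Parallel cyclic opening: open each phase independently (they do not
    interact) and keep the union, Eq. (41); everything removed falls into
    the residue label."""
    out = np.full(labels.shape, residue, dtype=labels.dtype)
    for i in np.unique(labels):
        if i == residue:
            continue
        kept = ndimage.binary_opening(labels == i, structure=structure)
        out[kept] = i
    return out
```

Because the phases do not interact, the loop iterations are independent and could run in parallel.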
The result of this opening is independent of the order of the elementary openings from which it is constructed, as demonstrated schematically in Figure 9. Furthermore, the union in Equation (41) does not necessarily require a finite set of N nonoverlapping labels, which allows the easy development of a rotationally invariant labeled opening.

6. Rotationally Invariant Cyclic Opening

An application of a cyclic opening, as defined in the preceding section, to a labeled image produces an image in which some connected components have lost their labels, but in which none of the components has been assigned a different label. Consequently, it is not necessary to carefully follow the evolution of the connected component labels; it is sufficient to look at the
FIGURE 9. Schematic example of a cyclic opening using the indicated structuring element, demonstrating that the result is independent of the order of the elementary openings from which it is built.
intersection or union of the results of openings acting on labels which overlap. We can therefore transform the images by extracting sets of pixels which satisfy certain criteria, applying an opening to each set, and combining the results in an isotropic way. Let $A(\alpha, \omega)$ be the set of points x ∈ E for which the angular value a(x) lies in the range $[\alpha, \alpha + \omega)$

$$A(\alpha, \omega) = \{ x : x \in E,\ a(x) \in [\alpha, \alpha + \omega) \}.$$

The opening $\gamma_B A(\alpha, \omega)$ behaves like a binary opening, with $A(\alpha, \omega)$ in the foreground and its complement $\overline{A(\alpha, \omega)}$ in the background. To make this operator isotropic, we take the union of the transformed sets $\gamma_B[A(\alpha, \omega)]$ as $\alpha$ moves around the unit circle, i.e.,

$$\Gamma(B, \omega) = \bigcup \{ \gamma_B[A(\alpha, \omega)],\ 0 \le \alpha < 2\pi \}. \qquad (42)$$
One obtains a binary image whose foreground pixels are those which were not removed by the opening for at least one value of the angle $\alpha$. We therefore consider that all the pixels which were removed by the opening for every $\alpha$ correspond to the residue of the operator. As for the top-hat, this residue, denoted by $R(B, \omega)$, can be obtained by the set difference between the union of all the labels $\bigcup \{ A(\alpha, \omega),\ 0 \le \alpha < 2\pi \}$ and the result of the opening $\Gamma(B, \omega)$, which we write

$$R(B, \omega) = \left[ \bigcup \{ A(\alpha, \omega),\ 0 \le \alpha < 2\pi \} \right] \setminus \Gamma(B, \omega). \qquad (43)$$

Given that the union of all the labels covers the entire image, the residue can equivalently be obtained by complementing the result of the opening

$$R(B, \omega) = \overline{\Gamma(B, \omega)}. \qquad (44)$$
This residue contains all the pixels which were eliminated by the opening for every angle $\alpha$. In practice, in order to speed up the calculation, it is often necessary to approximate the variation of the angle $\alpha$ in Equation (42) by a few discrete values, for example by varying $\alpha$ starting from an origin $\alpha_0$ in steps of size $\Delta\alpha$. The use of such an approximation may add some supplementary regions to the residue. This is demonstrated by the pathological situation represented in Figure 10, in which image (a) contains two angular values represented by two graylevels. For the labeled openings applied to this image, we use the structuring element shown, and a value of
FIGURE 10. (a) Schematic image showing two angular values represented by two graylevels. (b–d) Label A(α, ω) shown in gray for the values of α given below each image, and ω = 30°. The structuring element used for the labeled opening is shown at the top.
ω = 30°. Figure 10(b) shows in dark gray the label A(α, ω) for 59° ≤ α < 76°, Figure 10(c) for 76° ≤ α < 89°, and Figure 10(d) for 89° ≤ α < 106°. It is clear that an opening of the labeled region by the structuring element shown leaves Figures 10(b) and (c) in their initial state, and removes the labeled region in Figure 10(d). If we use all the values of α ∈ [0°, 360°), then the central region will not form part of the residue, as it is not eliminated for all values of α. If we instead choose an approximation with α₀ = 0° and Δα = 15°, α only takes on the values 45°, 60°, 75°, 90°, and 105° in the interval of interest, which avoids the configuration of Figure 10(c). The central region therefore forms part of the residue with this approximation.

We now present an example which illustrates the steps in a cyclic opening applied to the hue component (Figure 11(b)) of the subregion of the color image of Figure A.2(a) shown in Figure A.1(c). We apply a labeled connected opening (i.e., an opening with reconstruction) with ω = 90° and a square SE of size λ = 7. The opening (Equation (42)) is done by varying α from α₀ = 0° to 315° in steps of size Δα = 45°. The definition of the labels is shown in Figures 12(a) and (b), and the labeled images are shown in Figures 12(c) and (d) (two labeled images are shown because the labels overlap). The results of the openings applied to each label in Figures 12(c) and (d) are shown in Figures 12(e) and (f), respectively, with the residue indicated in white. This residue corresponds to the labeled regions which were completely removed by the opening. The final residue (Equation (44)) is shown in Figure 11(c). When looking at this result, it is clear that the residue is made up of two types of regions:

(1) Those which have a label different to that of the neighboring regions, and which are smaller than the structuring element.
FIGURE 11. The (a) luminance and (b) hue of Figure A.1(c). (c) The residue of a cyclic opening on the hue with ω = 90° and λ = 7.
(2) Those in which the pixel values have a variation larger than the size ω of a label. In the case of hue images, regions having a low saturation often fall into this category.

These observations can be described in a more rigorous way. When the angle ω varies from 0 to π, it is clear that the opening is an increasing function of ω. In addition, this opening is also a decreasing function of the structuring element size parameter λ, from which we get the following proposition.

Proposition 6 Let $a : E \to C$ be an angular valued function, $\gamma_\lambda$ a granulometry, and $A(\alpha, \omega)$ the set of points having angular values which satisfy the restriction

$$A(\alpha, \omega) = \{ x : x \in E,\ a(x) \in [\alpha, \alpha + \omega) \}.$$

Then the operator

$$\Gamma(\lambda, \omega) = \bigcup \{ \gamma_\lambda [A(\alpha, \omega)],\ 0 \le \alpha < 2\pi \}$$

is an isotropic opening. The family $\{\Gamma(\lambda, \pi - \omega),\ 0 \le \omega \le \pi,\ \lambda > 0\}$ gives rise to a double granulometry with respect to the parameters λ and π − ω.
FIGURE 12. (a), (b) Definition of the labels with parameters ω = 90°, α₀ = 0°, and Δα = 45°. (c), (d) Labeled hue following definitions (a) and (b). (e), (f) Results of labeled openings, in which the residue is marked in white.
An example illustrating this double granulometry is given in Table 2, in which the residues of a labeled opening applied to the hue band shown in Figure 11(b) are shown. The value of ω decreases from left to right, and the size λ of the square SE increases from top to bottom. The area of the residue is therefore largest for the bottom right image. An application of this operator to the detection of defects in an oriented texture is given in Section III.B.3. In practice, the labeled opening is faster than the circular centered opening, and could be accelerated even more as it acts on the data in a parallel way (i.e., each $\gamma_\lambda[A(\alpha, \omega)]$ could be calculated by an independent processor).
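The discrete-α approximation of Equations (42)–(44) described above can be sketched as follows; as before, a plain binary opening stands in for $\gamma_B$ (an illustrative assumption), and `n_steps` plays the role of 2π/Δα.

```python
import numpy as np
from scipy import ndimage

def cyclic_opening_residue(a, structure, omega, n_steps=24):
    """Residue of the rotationally invariant cyclic opening, Eqs. (42)-(44),
    with alpha approximated by n_steps discrete values (alpha_0 = 0)."""
    kept = np.zeros(a.shape, dtype=bool)
    for k in range(n_steps):
        alpha = 2 * np.pi * k / n_steps
        # the label A(alpha, omega): pixels with a(x) in [alpha, alpha + omega)
        label = ((a - alpha) % (2 * np.pi)) < omega
        kept |= ndimage.binary_opening(label, structure=structure)
    return ~kept   # Eq. (44): the residue is the complement of the opening
```

An isolated pixel whose direction differs from its surroundings is removed by the opening for every sampled α, and so falls into the residue.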
H. Conclusion

In this section, we presented four different methods for applying mathematical morphology to images containing circular data. The principal aim was to develop operators which:

(1) take into account the fact that the values on the circle are cyclic (and hence the discontinuity in the values at the origin 0/2π);
(2) are rotationally invariant, i.e., independent of the choice of an origin.

These objectives are met to differing extents by the operators developed, as unfortunately there is no general solution satisfying the two prerequisites. In summary, all the operators introduced take the periodicity into account, but only some are rotationally invariant: the pseudooperators, the circular centered operators, and the labeled cyclic opening. Often, in order to remove the necessity of choosing an origin, and hence to build a rotationally invariant operator, other preliminary choices are necessary. For example, for the pseudooperators, we are obliged to choose a definition of grouped data, and for the labeled openings, a sector size is needed. The only operators for which both prerequisites are satisfied without imposing an alternative choice are the circular centered operators. Nevertheless, only very few morphological operators can be rewritten in this form.

In practice, certain operators show themselves to be more useful or convenient to use than others. The pseudooperator approach, for example, seems to give rise to more inconveniences (loss of basic properties) than advantages (isotropy). The cyclic closing, which requires an initial definition of a set of labels, is very difficult to apply in cases for which the label boundaries are not obvious. The fact that it is not rotationally invariant also limits its applicability.
TABLE 2. Demonstration of the double granulometry: residues (in white) of the labeled opening applied to the hue band of Figure 11(b). The columns show the residues obtained for decreasing values of ω (90°, 45°, and 20°, from left to right), and the rows show the residues obtained for increasing structuring element size λ (1, 3, 5, and 7, from top to bottom).
Without a doubt, the most useful operators are:

- The two top-hats (circular centered and labeled cyclic), which are applied to defect detection in an oriented texture in Sections III.B.3 and III.B.4.
- The circular centered gradient, used for the extraction of smoothly varying regions in a phase image in Section III.A, and for the segmentation of an oriented texture in Section III.B.2.
- The operators requiring the choice of an origin, which are the easiest to apply in vector spaces (Section V.B.2).
III. APPLICATION EXAMPLES

In this section we give examples of applications in which the angular data can often be treated separately. The first example (Section III.A) involves extracting homogeneous regions in Fourier transform phase images using the circular centered gradient operator. We then discuss some applications in oriented texture analysis (Section III.B).

A. Homogeneous Phase Extraction in HRTEM Images

Boulc'h et al. (2001) have measured the size of crystalline domains in yttria-doped nanocrystalline zirconia (yttria-tetragonal zirconia polycrystals, or Y-TZP) by image analysis of high-resolution transmission electron microscope (HRTEM) images. A geometric phase analysis method, developed by Hÿtch et al. (1998), was used to make the crystalline domains visible. As the phase image consists of angular values, the morphological operators developed in Section II are perfectly suited to its analysis. We show here how the use of the circular centered morphological gradient can simplify the automated extraction of the crystalline domains from the phase image.

For completeness, we first briefly describe the construction of the phase image. Figure 13 shows an HRTEM image of Y-TZP, which is used to illustrate the procedure. In order to compute the phase image (Hÿtch et al., 1998), the Fourier transform of the image is first calculated, and one of the peaks in the Fourier transform amplitude (Figure 14) is chosen. The Fourier transform is then multiplied by a Gaussian mask centered on the chosen peak, and the inverse Fourier transform of the masked image is calculated. After subtracting a factor corresponding to the chosen frequency, one obtains the phase image. Figure 15 shows the geometric phase image corresponding to the indicated peak in Figure 14.
FIGURE 13. An HRTEM image of Y-TZP (size 1024 × 1024 pixels). (Image courtesy of F. Boulc'h and P. Donnadieu.)
FIGURE 14. The amplitude of the Fourier transform of Figure 13. The geometric phase image corresponding to the indicated peak was calculated.
The regions of homogeneous phase in the phase image correspond to the crystalline domains. In order to easily extract the regions in which the angular values vary slowly, we apply a circular centered morphological gradient with a square SE of size 5 to the geometric phase image, resulting in the image shown in
FIGURE 15. The geometric phase image corresponding to the Fourier peak indicated in Figure 14.
FIGURE 16. The morphological circular centered gradient of the geometric phase image of Figure 15 calculated using a square SE of size 5.
Figure 16. In this image, the homogeneous (slowly varying) regions of the geometric phase image result in regions of low grayvalue (note that the rate of spatial variation of the regions to be found is selected by the size of the structuring element used). These low-grayvalue regions can easily be extracted using a threshold.
FIGURE 17. The threshold of Figure 16 showing the pixels having graylevels between 0 and 80.
In Figure 17, the regions of the circular centered gradient image (Figure 16) with graylevels between 0 and 80 are extracted (the upper threshold limit was chosen by hand, but it should be quite stable over a range of images). If the small areas included in the threshold are not of interest, they can be removed using a morphological area opening (Soille, 1999).

As a further demonstration of the unsuitability of the standard morphological gradient for circular data, we have applied such a gradient operator to the geometric phase image. The resulting gradient is shown in Figure 18. It is clear that a number of false strong gradients, due to the discontinuity in pixel values between −π and π, have been detected. They are indicated in the figure.
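The threshold-and-cleanup step used to produce Figure 17 can be sketched as follows; connected-component filtering stands in for the morphological area opening, and `min_area` is an illustrative assumption.

```python
import numpy as np
from scipy import ndimage

def extract_homogeneous(grad, t_high=80, min_area=50):
    """Threshold a gradient image at t_high (cf. the 0-80 range used for
    Figure 17) and discard components smaller than min_area pixels, a
    stand-in for the area opening mentioned in the text."""
    mask = grad <= t_high
    lab, n = ndimage.label(mask)
    if n == 0:
        return mask
    sizes = ndimage.sum(mask, lab, index=np.arange(1, n + 1))
    keep = np.concatenate(([False], sizes >= min_area))
    return keep[lab]   # True where a large-enough homogeneous region lies
```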
B. Oriented Texture An oriented texture is an anisotropic texture characterized by a dominant local orientation at each point of the texture. To describe such a texture quantitatively, the principal orientation is calculated in a group of neighborhoods superimposed on the image containing the texture. If we represent each neighborhood by one pixel having as value the dominant orientation in the neighborhood, we create an image which summarizes the
FIGURE 18. The standard morphological gradient operator applied to Figure 15. The false strong gradients due to the angular value discontinuity between −π and π are indicated.
texture in the form of a direction field. In each neighborhood, one can also calculate a measure of coherence or a level of anisotropy so as to construct a second summary image. If the centers of the neighborhoods are separated by a distance of more than one pixel, then the summary images are smaller than the initial texture image. We first describe the Rao and Schunck algorithm for calculating an orientation field summarizing an oriented texture. This orientation field is then used for segmentation in conjunction with the circular centered gradient and watershed operators, and defect detection using the circular centered top-hat and labeled top-hat operators.
1. The Rao and Schunck Algorithm

To calculate the summary images, we use an algorithm developed by Rao and Schunck (Rao, 1990; Rao and Schunck, 1991), based on an approach by Kass and Witkin (1987). It is based on the gradient of a Gaussian filter. In two dimensions, the Fourier transform of the first derivative of a Gaussian function consists of two lobes on opposite sides of the origin in frequency space. An oriented texture would have a dominant frequency component, and the response of the gradient of the Gaussian filter
can be fitted to this dominant component (Rao, 1990). The steps of the algorithm are:

(1) A Gaussian filter of standard deviation $\sigma_1$ is applied to the initial grayscale image in order to choose the scale of the interesting structures.
(2) For each pixel (k, l), the horizontal and vertical gradients $H_{kl}$ and $V_{kl}$ are calculated. This is done by a convolution with the Prewitt or Sobel kernels (Gonzalez and Woods, 1992).
(3) For each pixel (k, l), the modulus $R_{kl}$ and angle $\theta_{kl}$ (between 0° and 360°) are calculated from the gradient values.
(4) A neighborhood W of width $2\Delta h$ and height $2\Delta v$ is moved over the image in steps of $\delta h$ pixels horizontally, and $\delta v$ pixels vertically. At each position (x, y), the local orientation $\hat\theta_{xy}$ (between 0° and 180°) and the orientational coherence $\chi_{xy}$ are calculated:

$$\hat\theta_{xy} = \frac{1}{2} \arctan \frac{\sum_{(k,l) \in W} R^2_{kl} \sin 2\theta_{kl}}{\sum_{(k,l) \in W} R^2_{kl} \cos 2\theta_{kl}} \qquad (45)$$

and

$$\chi_{xy} = \frac{\sum_{(k,l) \in W} |R_{kl} \cos(\hat\theta_{xy} - \theta_{kl})|}{\sum_{(k,l) \in W} R_{kl}}. \qquad (46)$$
The dominant orientation is essentially the angular mean of the directions within the neighborhood, the relation between these two definitions being discussed after the presentation of the algorithm. The coherence is the sum of the lengths of the unit vectors with directions $\theta_{kl}$ projected onto the unit vector in the mean direction $\hat\theta_{xy}$ of the neighborhood. It gives the proportion of the directions which are close to the mean direction.

(5) We lastly build two summary images $\Theta$ and $X$, which represent, respectively, the distribution of the orientations and of the coherences. In these images, each pixel encodes the values calculated at one neighborhood position. In symbolic form,

$$\Theta_{kl} = \hat\theta_{(k\,\delta h)(l\,\delta v)} \quad \text{and} \quad X_{kl} = \chi_{(k\,\delta h)(l\,\delta v)}.$$
The important step in this algorithm is the calculation of the orientation by Equation (45). The form of this equation is similar to that of the angular mean presented in Section II.B (one also has an arc-tangent of a sum of sine terms divided by a sum of cosine terms). The mean direction given by
Equation (45) is in fact the second trigonometric moment (Fisher, 1993). This moment was chosen so that vectors in opposite directions reinforce each other,³ the behavior needed when working with axial data. The sums are weighted by the moduli of the vectors, thereby giving more importance to directions associated with large moduli.⁴ Lastly, the result of the arctan function is divided by two so as to place the resultant angle in the range of axial data.

Figure 19(b) shows the orientation summary image for the plank of wild cherry wood shown in Figure 19(a). For this image, the veins form the dominant oriented texture. We first use a threshold to separate the wood from the background, and then calculate the orientation summary image of the wood using the parameters $\sigma_1 = 1.4$, $2\Delta h = 2\Delta v = 32$, and $\delta h = \delta v = 16$. In this image, the graylevel of each pixel represents an angular value between 0° and 179°. The histogram of the orientation distribution, in other words the histogram of Figure 19(b), is shown in Figure 19(c), and a schematic diagram showing the encoding of the vein directions is given in Figure 19(d). This encoding is used for all the orientation summary images shown in this chapter. As is clear from the histogram, the majority of the veins of Figure 19(a) have orientations in the 40° to 100° range.

The values of the parameters $\sigma_1$, $2\Delta h$, $2\Delta v$, $\delta h$, and $\delta v$ are chosen as a function of the data being analyzed. The $\sigma_1$ parameter, the Gaussian filter standard deviation, chooses the scale of the texture to be processed. Higher values of $\sigma_1$ lead to the removal of small details from the image. The values of the $2\Delta h$ and $2\Delta v$ parameters, which specify the size of the neighborhood in which the mean direction is calculated, have less effect on the result. They should, however, be chosen so that there is at least one oriented structure within each neighborhood. The values of the $\delta h$ and $\delta v$ parameters specify the level of subsampling of the initial image.
To take all the pixels in the initial image into account, it is necessary that Δh ≤ σ₂h and Δv ≤ σ₂v. For practical applications, a more efficient form of Equation (45) is available. One can derive it by using the relations
$$R_{kl}^2 e^{i2\theta_{kl}} = R_{kl}^2 \cos 2\theta_{kl} + iR_{kl}^2 \sin 2\theta_{kl}$$
3 Consider a vector having a representation $Re^{i\theta}$ in polar coordinates. The square of this vector is $R^2e^{i2\theta}$. The vector facing the opposite way to $Re^{i\theta}$ is described by $Re^{i(\theta+\pi)}$ and its square is $R^2e^{i(2\theta+2\pi)} = R^2e^{i2\theta}$. Therefore, the addition of the squares of two vectors in opposite directions gives a vector having modulus $2R^2$ (the vectors reinforce each other).
4 In Section V.A we use a similar weighting in the context of color images.
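The doubling trick of footnote 3 can be checked numerically with Python's complex arithmetic; the variable names are ours:

```python
import cmath

theta, R = 0.7, 2.0
v1 = R * cmath.exp(1j * theta)                 # vector R e^{i theta}
v2 = R * cmath.exp(1j * (theta + cmath.pi))    # same modulus, opposite direction

cancel = abs(v1 + v2)            # summed directly, opposite vectors cancel (~0)
reinforce = abs(v1**2 + v2**2)   # squared (angle doubled), they reinforce: 2 R^2
axial = cmath.phase(v1**2 + v2**2) / 2   # halving recovers the axial direction
```

Here `reinforce` is 2R² = 8 and `axial` recovers the original θ = 0.7, exactly the behavior Equation (45) relies on for axial data.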
ALLAN HANBURY
FIGURE 19. Calculation of an orientation image. (a) Initial image of size 272 × 608 pixels (courtesy of Scanwood System, Pont-à-Mousson, France). (b) Orientation summary image (size 13 × 33 pixels). (c) Histogram of image (b). (d) Schematic diagram showing how the direction angles are encoded by the graylevels of image (b).
and
$$R_{kl}^2 e^{i2\theta_{kl}} = (H_{kl} + iV_{kl})^2 = H_{kl}^2 - V_{kl}^2 + 2iH_{kl}V_{kl}$$
whence
$$R_{kl}^2 \sin 2\theta_{kl} = 2H_{kl}V_{kl}$$
and
$$R_{kl}^2 \cos 2\theta_{kl} = H_{kl}^2 - V_{kl}^2$$
which are substituted into Equation (45) producing a version having fewer trigonometric functions and which directly uses the gradient values
$$\hat{\theta}_{xy} = \frac{1}{2}\arctan\left(\frac{\sum_{(k,l)\in W} 2H_{kl}V_{kl}}{\sum_{(k,l)\in W}\left(H_{kl}^2 - V_{kl}^2\right)}\right).$$
The Rao and Schunck algorithm is a simple and fast way of calculating the principal orientations of a texture. We now present some variations of the algorithm which have been presented in the literature. Bigün et al. (1991) give an alternative derivation of the same algorithm, except that the Gaussian convolution and gradient calculation are combined into a single
step. An alternative to steps 1 and 2, i.e., the choice of the scale and the calculation of the gradients, is a dyadic wavelet transformation (Mallat, 1998), which efficiently gives the horizontal and vertical gradients at several scales. Davies (1997) proposes a set of filter kernels which are well suited to determining the orientations of linear structures, and can be used as a replacement for the kernels in step 2. The main disadvantage of the Rao and Schunck algorithm is its inability to take into account cases in which there is more than one principal orientation in a neighborhood. Andersson and Knutsson (1991) present an approach capable of separating two directions, and Chetverikov (1999) presents a method which can detect several orientations by using a measure of anisotropy. Freeman and Adelson (1991) introduce the notion of steerable filters, a generalization of filter banks allowing the calculation of an orientation and of a coherence at each point of an image, as well as the possibility to deal with multiple principal orientations in a single neighborhood. Picard and Gorkani (1994) give the results of an experiment which compares the principal orientations found in the Brodatz textures by the Freeman and Adelson algorithm to those perceived by humans.

2. Segmentation

Morphological segmentation of a grayscale image is usually done by applying the watershed algorithm to the gradient of the image. The circular centered gradient operator allows one to segment an image containing circular data in the same way. We present an example of the segmentation of an oriented texture. The aim of this type of segmentation is to create regions in which the orientations are homogeneous. The steps in the segmentation algorithm, which are illustrated in Figure 20, are:

(1) The Rao and Schunck algorithm is applied to the initial image (a plank of oak, Figure 20(a)) to calculate the orientation image (Figure 20(b)). For the example, the parameters σ₁ = 1.4, σ₂h = σ₂v = 64, and Δh = Δv = 8 were used.
(2) The circular centered gradient of the orientation image is calculated (Figure 20(c)). For the example, a square SE of size 2 was used.
(3) The minima are extracted. So as to avoid finding a large number of small minima, which would result in an over-segmentation of the image, we close the gradient image with a square SE of size 1, and then find the h-minima (Soille, 1999) of height h = 5 (Figure 20(d)).
(4) The watershed is applied to the gradient image using the minima extracted in the previous step as markers, producing the segmentation shown in Figure 20(e). In this image, the watershed lines are
FIGURE 20. Steps in the segmentation of an oriented texture. (a) Initial image with size 420 × 1040 pixels (courtesy of Scanwood System, Pont-à-Mousson, France). (b) Orientation image with size 50 × 125 pixels. (c) Morphological circular centered gradient (with a square SE of size 2). (d) h-minima. (e) Watershed segmentation of the circular centered gradient image (c) using the markers in (d). The graylevel in each region encodes the mean orientation of the region.
FIGURE 21. Results of segmentations of oriented textures by the watershed algorithm for four oak images. (Images courtesy of Scanwood System, Pont-à-Mousson, France.)
shown in black, and the graylevel of each region encodes the mean orientation of the region, calculated using circular statistics. For visualization purposes, the segmentation obtained is superimposed on the initial image in Figure 21(a), in which the black lines represent the watershed lines.
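A key ingredient of step 2 is a gradient that respects the wrap-around of axial data: orientations of 10° and 170° are only 20° apart, not 160°. The sketch below is a plausible stand-in for the circular centered gradient (not its exact definition from earlier in the chapter): for each pixel it returns the largest pairwise axial distance within a small window. Function names are our own.

```python
import numpy as np
from scipy import ndimage

def axial_distance(a, b, period=180.0):
    """Distance between axial (mod-180 degree) values, wrap-around aware."""
    d = np.abs(a - b) % period
    return np.minimum(d, period - d)

def circular_gradient(ori, size=3):
    """Largest pairwise axial distance inside each size x size window."""
    def local_range(values):
        # values is the flattened window; compare every pair
        d = axial_distance(values[:, None], values[None, :])
        return d.max()
    return ndimage.generic_filter(ori, local_range, size=size)
```

On a synthetic orientation image split between 10° and 170°, the gradient is 20 (the axial distance) along the boundary and zero inside each region; the watershed of step 4 would then be flooded from the h-minima of such a gradient.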
Some further results of the segmentation of oriented textures are shown in Figures 21(b)–(d). In general, this algorithm manages to segment the textures into homogeneous regions, with the more globally homogeneous textures segmented into the fewest regions, as in Figure 21(b) for example. Some problems are nevertheless present in the current formulation of the algorithm. The first is common to almost all watershed segmentations, and involves the choice of markers. If we take all the minima of the gradient as markers, an over-segmentation (segmentation into too many regions) is produced. With the current approach, a small closing followed by the extraction of the h-minima, the number of regions is reduced, but a change in the value of the parameter h can provoke a large difference in the segmentation. The segmentation of an oriented texture can also be modified by changing the scale parameter (σ₁) in the calculation of the orientation image. A last difficulty arises when the orientation variations are not localized enough to be detected by the structuring element used in the gradient calculation, which can lead to the presence of more than one orientation in one of the regions of the segmentation. Several possible solutions to these problems remain to be studied: for example, starting with an over-segmentation of the image and then fusing regions with similar mean orientations using a graph of the partition (Meyer, 1999) so as to eliminate over-segmented regions, or taking into account several partitions of the same texture so as to find the most probable one (Nickels and Hutchinson, 1997).

3. Defect Detection with the Circular Centered Top-Hat

We show the application of the circular centered top-hat operator to the detection of defects in oriented textures.
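As a toy illustration of the underlying idea (deliberately simpler than the circular centered top-hat itself): a residue image that measures, for each pixel, the axial distance of its orientation from the globally dominant orientation, computed with the doubled-angle circular mean. Pixels whose orientation is anomalous get large residues and can be isolated by a threshold. All names here are our own.

```python
import numpy as np

def orientation_residue(ori_deg):
    """Axial distance of each pixel's orientation from the dominant one.

    A simplified stand-in for the circular centered top-hat: defects
    that perturb the orientation field receive large residues.
    """
    ang = np.deg2rad(2.0 * ori_deg)    # double the angles: axial -> circular
    mean = 0.5 * np.arctan2(np.sin(ang).sum(), np.cos(ang).sum())
    mean_deg = np.degrees(mean) % 180.0
    d = np.abs(ori_deg - mean_deg) % 180.0
    return np.minimum(d, 180.0 - d)    # wrap-around aware distance
```

On an orientation field that is 30° everywhere except for a small 120° patch, the residue is zero on the background and 90° (the maximal axial distance) on the patch.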
The examples in this section were used by Chetverikov and Hanbury (2002) in studying the contribution and the limits of using the two most important perceptual properties of texture, regularity and isotropy, in detecting texture defects. Here we show some of the examples which use the orientation-based method so as to illustrate the application of the circular centered top-hat. For texture defects characterized by an orientation anomaly, the circular centered top-hat is a good choice for creating an image in which the defects can be detected by a threshold. To show the application of this operator, five images of size 256 × 256 pixels having a visible texture defect were chosen from the Brodatz (1966) album. Their reference numbers are d050, d052, d053, d077, and d079. The orientation images were calculated using the Rao and Schunck algorithm with parameters σ₁ = 1.75, Δh = Δv = 2, and σ₂h = σ₂v = 16, except for images d052 and d079, for which σ₂h = σ₂v = 32 were used. The threshold
for isolating the defect was chosen by hand for each image. The results for the five images are shown in Figure 22. In each line, the initial image, the orientation image, the result of the circular centered top-hat, and the borders of the thresholded regions superimposed on the initial image are shown. In textures d052 and d053, the defects are very subtle modifications of the structure, yet they perturb the orientations enough to be detected. The defect in texture d077 is easily seen and easily detectable in the orientation image. In textures d050 and d079, the defects cause perturbations in the orientation field, but the borders of these defects are not obvious, even to the naked eye. Among these textures, the only one made up of oriented lines is d050, but the others are anisotropic enough to have a uniform orientation field which is perturbed by the defects. This approach is obviously not applicable to textures which are not anisotropic, some examples of which are given by Chetverikov and Hanbury (2002).

4. Defect Detection with the Labeled Opening

The labeled opening and its associated top-hat (Section II.G.5) have the advantage of being extremely rapid, and are therefore attractive for high-speed industrial inspection problems. In this section we show some examples of its application to the important industrial problem of the automated detection of defects on wood boards (Kim and Koivo, 1994; Silvén and Kauppinen, 1996; Niskanen et al., 2001), part of a project done in collaboration with Scanwood System, Pont-à-Mousson, France. In most of the existing algorithms, the defects are detected using color characteristics. For example, the knots are considered to be the darkest objects on the boards. We briefly consider the possibility of enriching the color information by a preliminary detection which takes into account the fact that certain types of defects cause a perturbation in the orientation of the veins in their neighborhood.
This could allow the detection of defects which are not completely discernible by their color. For wood, the most important structural defects are the knots, some of which do not have a color very different to that of the wood, but which nevertheless cause perturbations in the surrounding vein orientations. The defects identifiable only by a change in color are obviously not detectable by these orientational methods, the same being applicable to defects due to external influences, such as insect stings. For the experiments, we used a database of oak boards with a very high defect occurrence. In order to speed up the calculation, we used a large separation between the neighborhoods in the orientation image calculation. The parameters used were σ₁ = 1.4, σ₂h = σ₂v = 64, and Δh = Δv = 16. Some
FIGURE 22. Results of defect detection by the circular centered top-hat applied to some Brodatz textures. The images labeled ‘‘ini’’ are the initial images, ‘‘ori’’ the orientation images, ‘‘ath’’ the top-hat operator results, and ‘‘det’’ the regions detected by the threshold superimposed on the initial image.
images and their corresponding orientation images are shown in the first two columns of Figure 23. We then calculated the top-hat based on the labeled opening of the orientation images. The opening was done with a sector size ω = 45°, and by varying α from α₀ = 0° to 157.5° in steps of Δα = 22.5°. A square SE of size 3 was used. The residue of this top-hat, enlarged and superimposed on the initial image, is shown in the rightmost column of Figure 23, in which the light regions correspond to the residues. We briefly discuss the results shown in Figure 23:
For image c005, the black vein is evidently not detected, as it does not perturb the orientation. The knot at the top right is detected, but the small knots at the bottom left do not perturb the orientation enough to be detected.

For image c007, the large knot is detected, but the fissures to the left of the knot, which have orientations similar to those of the veins, are not detected. Some false detections near the borders of the image are also present.

Image c034 demonstrates that this method is not very useful on boards which contain veins having elliptical forms. Their large curvature leads to many false detections.

If one calculates the orientation image at a finer resolution, then smaller defects can be detected. For oak, this is useful for detecting the small light patches, some of which are indicated in Figure 24(a). Even if these are not classified as defects, their detection can be important if one wishes to determine the aesthetic appearance of the wood. The detection of these light patches based only on their color is rather difficult, as their color is very similar to that of other structures on the wood. On looking at the orientation image, one can see that because the light patches cut the veins, they produce perturbations in the orientation field which can be detected by a top-hat. The orientation image of Figure 24(a), calculated using the parameters σ₁ = 1.4, σ₂h = σ₂v = 16, and Δh = Δv = 8, is shown in Figure 24(b). The result of a top-hat based on a labeled opening is shown in Figure 24(c). The parameters of this operator are ω = 45°, α₀ = 0°, and Δα = 22.5°, and a square SE of size 4 was used. Globally, it is clear that a perturbation in the vein orientation is not always associated with a defect, and that a defect does not always perturb the surrounding veins. This method of defect detection therefore cannot function as a total solution to the defect detection problem.
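The labeled-opening top-hat described above can be sketched as follows. This is a hedged reading of the operator, with our own names: for each sector of width ω (stepped by Δα, here taken as centered on α), the pixels whose axial orientation falls in the sector form a binary mask; each mask is opened with a square SE; the residue is the set of pixels covered by no opened sector, i.e., pixels that do not belong to any sufficiently large region of coherent orientation.

```python
import numpy as np
from scipy import ndimage

def labeled_opening_residue(ori, omega=45.0, alpha0=0.0, dalpha=22.5, se=3):
    """Residue of a union of per-sector binary openings of an orientation image."""
    covered = np.zeros(ori.shape, dtype=bool)
    structure = np.ones((se, se), dtype=bool)
    for alpha in np.arange(alpha0, 180.0, dalpha):
        d = np.abs(ori - alpha) % 180.0
        d = np.minimum(d, 180.0 - d)          # axial distance to sector center
        mask = d <= omega / 2.0               # pixels oriented within the sector
        covered |= ndimage.binary_opening(mask, structure=structure)
    return ~covered                            # the top-hat residue
```

A small 2×2 patch of anomalous orientation is too small to survive the 3×3 opening in its own sector, and is excluded from every other sector's mask, so exactly those pixels appear in the residue.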
The results can nevertheless be used in a defect classification step, which takes color and other texture variations into account along with the orientation perturbations in the calculation of the probability of a defect being present at a certain position on the board.
[Figure 23 panels: c005ini (438 × 1076), c005ori (21 × 61), c005cyc (438 × 1076); c007ini (421 × 568), c007ori (21 × 30), c007cyc (421 × 568); c034ini (432 × 686), c034ori (23 × 37), c034cyc (432 × 686)]
FIGURE 23. Results of the detection of regions having orientation perturbations using the top-hat based on the labeled opening. The ‘‘ini’’ images are the initial images of oak boards (courtesy of Scanwood System, Pont-à-Mousson, France), on which the defects found by an expert are outlined in black (the dark horizontal lines are red chalk marks on the board, and have no bearing on the experiment). The ‘‘ori’’ images are the orientation images. The light regions of the ‘‘cyc’’ images correspond to the residues of the orientation images detected by the top-hat. The size of the images, in pixels, is given below each image.
FIGURE 24. Detection of defects at a smaller scale. (a) An oak board (courtesy of Scanwood System, Pont-à-Mousson, France) with some of the small white patches manually indicated (size 608 × 955 pixels). (b) Orientation image (size 50 × 112 pixels). (c) Result of a top-hat based on a labeled opening. The light pixels indicate the residue.
C. Conclusion

In this section we consider applications in which the circular data can be processed independently. The first is the processing of the phase image resulting from a Fourier transform of an electron microscopy image. The second is the processing of oriented textures described by orientation fields. Wood veins are a classic example of an oriented texture, in which defects are often characterized by perturbations in the orientation field. Finding these orientations using the labeled opening is demonstrated. The circular centered top-hat operator can also be applied to the wood defect detection problem, but has a far slower processing time (Hanbury and Serra, 2002b). The examples taken from the Brodatz album, even if they are not all made up of linear structures, are anisotropic enough that the texture perturbations are visible in the orientation field, and detectable by a circular centered top-hat. Finally, a method of segmenting oriented textures, demonstrated on wood textures, is described. Even though we treated the angular components of the Fourier transform phase–magnitude pairs, and of the oriented texture orientation–coherence pairs, separately, this is not always possible. In the case of color images, there tends to be a particularly close relation between the hue and saturation
coordinates. Processing of color images is considered in the next two sections.

IV. 3D POLAR COORDINATE COLOR SPACES

The analysis of color images has become very common due to the widespread availability of reasonably priced color cameras. These cameras almost always capture images which are stored in the RGB format. Each pixel of an RGB color image is encoded as a vector containing three values, giving the amount of each of the red, green, and blue primaries making up each color. Color images, due to their vectorial structure, are generally more difficult to process and analyze than grayscale images. Indeed, one of the commonly adopted approaches is to convert the color data to monochrome data by first calculating the luminance or first principal component, for example. Alternatively, each color channel is processed separately, or vector-valued operators which can take the three channels into account simultaneously are used. Lastly, an alternative representation of the vector space can be used, such as one in terms of 3D polar coordinates, describing each pixel color in terms of the possibly more intuitive hue, saturation, and brightness coordinates. In this section we discuss the RGB space, definitions of color intensity measures, and the improved hue, luminance, and saturation (IHLS) space, the latter being a 3D polar coordinate color description well suited to image processing and analysis tasks.

A. Basic Definitions

The RGB color space is a three-dimensional color space constructed from a basis of three primary color stimuli, given by the vectors
$$\mathbf{R} = \begin{pmatrix}1\\0\\0\end{pmatrix}, \quad \mathbf{G} = \begin{pmatrix}0\\1\\0\end{pmatrix}, \quad \mathbf{B} = \begin{pmatrix}0\\0\\1\end{pmatrix}$$
which correspond to the colors red, green, and blue.
A color c is specified in this basis according to one of the laws of Grassman (Wyszecki and Stiles, 1982)
$$\mathbf{c} = R\mathbf{R} + G\mathbf{G} + B\mathbf{B}$$
in which R, G, B ∈ [0, 1], and the RGB cube is the cube [0, 1] × [0, 1] × [0, 1] which contains the coordinates corresponding to valid colors, where the vector corresponding to color c is c = (R, G, B). For digital devices, the
values R, G, and B are most often integers between 0 and 255, but it is easy to generalize from [0, 1] to any range of values. The primary color stimuli usually vary from device to device, making the RGB space device dependent. The images can be made device independent by transforming them to the International Commission on Illumination (CIE) XYZ space, for which calibration information on the coordinates of the primary color stimuli of the camera in the XYZ space and the lighting conditions (white point) is required. Gamma correction also plays a role in the formation of color images (Novak et al., 1992; Poynton, 1999). In general, video display devices have a nonlinear brightness response to the input voltage, of the form
$$L = \alpha V^{\gamma} \qquad (47)$$
where L is the brightness, V is the input voltage, and the values of α and γ are controlled by the brightness and contrast settings of the display. The value of γ is therefore variable, but usually around 2. To take this nonlinearity into account, many video cameras are designed with an inbuilt nonlinear light response, so that when a camera is connected directly to a display, the displayed image will be linearly related to the brightness of the scene. This means that the output voltage $V_{\text{out}}$ of a camera is usually gamma corrected in the following way
$$V_{\text{out}} = I^{1/\gamma} \qquad (48)$$
where I is the light intensity recorded by the camera. For a color camera, this gamma correction is applied to each channel. Taking the device dependence and gamma correction into account requires that the image capture devices be calibrated, and is usually only necessary if one wishes to exchange colorimetric information between observers or devices. If one is only interested in measuring a change in images, such as a variation in the dominant shade of blue, where all images were taken with the same camera under the same conditions, then it is not essential to calibrate the equipment. The last basic facts considered in this section are the definitions of the terms brightness, luminance, and lightness. These terms are often used interchangeably, but they have the following specific definitions assigned to them by the CIE (Commission Internationale de l'Éclairage, 1987; Poynton, 1999):
Brightness: A subjective attribute of visual sensation describing whether an area appears to emit more or less light. It has no units of measurement.
Luminance: Luminance, measured in the SI units of candela per square meter (cd/m²), is the luminous intensity per square meter. Luminous intensity, measured in candela, is the radiant intensity, measured in watts per steradian, weighted by the spectral response of the human eye. This measurement quantitatively describes the fact that if one looks at red, green, and blue light sources having the same radiant intensity in the visible spectrum, the green source will appear the brightest, the red one less bright, and the blue one the dimmest. In the international recommendation for the high-definition television standard (ITU-R Recommendation BT.709, 1990), the following equation for calculating luminance from the (non gamma corrected) red, green, and blue components is given:
$$Y(\mathbf{c}) = 0.2126R + 0.7152G + 0.0722B. \qquad (49)$$
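Equation (49) is a one-line weighted sum; a direct transcription (the function name is ours):

```python
def luminance709(r, g, b):
    """Rec. 709 luminance, Equation (49); inputs are linear
    (non gamma corrected) R, G, B values in [0, 1]."""
    return 0.2126 * r + 0.7152 * g + 0.0722 * b
```

The weights sum to one, so white maps to luminance 1, and the ordering of the weights reproduces the perceptual fact noted above: a pure green source appears brighter than an equally intense red one, which in turn appears brighter than blue.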
Lightness: The human eye has a nonlinear response to luminance, which is taken into account by the lightness measure. A source with a luminance of only 18% of a reference luminance will appear to be half as bright. This measure is used in the CIE L*a*b* and L*u*v* color spaces.
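The 18% figure can be checked against the standard CIE L* formula, which maps relative luminance Y/Yn through a cube root (with a linear toe near black); the function name and the two-branch form are the usual CIE definition, transcribed here as an illustration:

```python
def cie_lightness(Y, Yn=1.0):
    """CIE L* lightness from relative luminance Y/Yn, on a 0-100 scale."""
    t = Y / Yn
    if t > (6 / 29) ** 3:
        f = t ** (1 / 3)                      # cube-root branch
    else:
        f = t / (3 * (6 / 29) ** 2) + 4 / 29  # linear branch near black
    return 116 * f - 16
```

Evaluating `cie_lightness(0.18)` gives roughly 49.5, i.e., a source at 18% of the reference luminance lands near the middle of the 0-100 lightness scale, matching the "half as bright" statement above.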
B. 3D Polar Coordinate Color Representations

These spaces essentially allow RGB rectangular coordinates to be specified in terms of 3D polar (also known as cylindrical) coordinates. As they are only an alternative representation of the RGB space, they do not add any supplementary properties such as device independence to the RGB space, but they are often more intuitive to use, and allow the colors to be treated more homogeneously. The first step in the conversion from an RGB space to a 3D polar coordinate space is to place a new axis between the points (0, 0, 0) and (1, 1, 1) in the RGB space. As this axis passes through all the achromatic points (graylevels) for which R = G = B, it is called the achromatic axis. We then define a set of 3D polar coordinates with respect to this axis:

(1) Brightness L ∈ [0, 1]: This coordinate gives the position of the color on the achromatic axis.
(2) Hue H ∈ [0°, 360°): This angular coordinate specifies whether a color is red, yellow, green, magenta, etc. It is traditionally measured anticlockwise around the achromatic axis with respect to pure red.
(3) Saturation S ∈ [0, 1] or chroma C ∈ [0, 1]: Measure of the distance of a color from the achromatic axis. Pure colors (i.e., highly saturated) are found further away from the achromatic axis.

Upon examining the literature, one is faced with a bewildering array of such 3D polar coordinate color spaces, such as the HLS, HSV, HSI, and HSB spaces.5 We now discuss how a seemingly simple coordinate system transformation could have given rise to so many different conversion methods, examine the disadvantages of these commonly used spaces, and present the IHLS space, which removes many of these disadvantages.
C. Discussion of the Existing 3D Polar Coordinate Spaces

One of the reasons for the existence of such a large variety of 3D polar coordinate color spaces is the number of different definitions of brightness. These definitions lead to spaces which have shapes which are not simply constructed as a pile of planar cross-sections of the cube taken perpendicular to the achromatic axis. Further problems with the existing transforms are due to them originally being developed for the easy numerical specification of colors in computer graphics applications (Smith, 1978). Due to the associated brightness functions, the ‘‘natural’’ shape of the HSV space is a cone, and of the HLS space, a double cone (Levkowitz and Herman, 1993). A vertical slice through the achromatic axis of each of these spaces is shown in Figures A.4(a) and (c). In these images, the achromatic axis is a vertical line in the center, with a hue value of 0° to the right, and 180° to the left. The problem with using these conically shaped representations when specifying a color is that there are large regions which lie outside the cones, i.e., outside the gamut of valid colors. In order to avoid complicated verification (originally on slow 1970s computers) of the validity of a specified color, these spaces were often artificially expanded into cylinders by dividing the saturation values by their maximum possible values for the corresponding brightness. Slices of these cylindrically shaped versions of the HSV and HLS spaces are shown in Figures A.4(b) and (d), respectively. The cylindrically shaped versions have often been carried over into image processing and computer vision, for which they are ill-suited, as discussed here.6 One of the claims often made in respect of the 3D polar coordinate color spaces is that the saturation and brightness coordinates are independent.

5 Shih (1995) summarizes the transforms to and from these spaces.
6 Software already used by the author which implements cylindrically shaped color models includes Matlab release 12.1, Aphelion 3.0, Optimas 6.1, and Paint Shop Pro 7.
However, the expansion of the conically shaped spaces into cylinders introduces a brightness normalization which removes this independence. This can easily be seen by examining the standard saturation component values of the color image in Figure A.1(d). In this image, not all the pixels which appear white have RGB coordinates of exactly (1, 1, 1), and not all the black pixels of exactly (0, 0, 0). The standard HSV saturation is shown in Figure 25(c). Due to the artificial expansion of the bottom part of the HSV cone, some of the pixels which look black, but do not have coordinates of exactly (0, 0, 0), are shown as being fully saturated, implying that they have a higher saturation than most of the colors. With the HLS saturation, shown in Figure 25(d), the HLS double cone is expanded in both the low- and high-brightness regions, leading to an image which is essentially useless for image analysis. The large difference between the HSV and HLS saturation images demonstrates the dependence of the saturation on the brightness function used (the brightness functions being different for the HSV and HLS spaces). We now consider two cases of the confusion that the cylindrically shaped spaces can cause. Demarty and Beucher (1998) applied a constant saturation threshold in the cylindrical HLS space (Figure A.4(d)) to differentiate between chromatic and achromatic colors. This threshold can be represented by a vertical line on either side of the achromatic axis in Figure A.4(d), and it is clear that this does not correspond to a constant saturation. Demarty (2000) later improved the threshold by using a hyperbola in the cylindrical HSV space (Figure A.4(b)), which corresponds to a constant threshold in the conic HSV space (Figure A.4(a)).
Smith (1997) makes the assumption that the cylindrical HSV space is perceptually uniform when a Euclidean metric is used, but upon examining Figure A.4(b), one sees that a certain distance in the high-brightness (top) part of the space corresponds to a far larger perceived change in color than the same distance in the low-brightness part of the space.
D. Derivation of a Useful 3D Polar Coordinate Space

In this section we examine a derivation of a 3D polar coordinate system in the RGB space, pointing out the choices which could (and did) lead to characteristics which are disadvantageous, and ending up with a 3D polar coordinate representation of the RGB space which is useful for image processing and analysis. This derivation is based on the derivation of the generalized lightness, hue, and saturation (GLHS) model by Levkowitz and Herman (1993). As the derivation of color spaces is not the principal theme of this chapter, we present only an outline of the derivation. Full details can be found in Hanbury and Serra (2002a).
FIGURE 25. Various 3D polar color space components for the color image in Figure A.1(d): (a) luminance, (b) hue, (c) HSV cylindrical saturation, (d) HLS cylindrical saturation, (e) suggested IHLS saturation, (f) IHLS chroma, (g) arithmetic difference between images (e) and (f). The highest pixel value in image (g) is 0.127, but the contrast has been stretched to make the differences more visible.
1. Brightness

In order to conform to the terminology suggested by the CIE, we call a subjective measure of luminous intensity the brightness. The brightness function of the GLHS model is
$$L(\mathbf{c}) = w_{\min}\min(\mathbf{c}) + w_{\mathrm{mid}}\,\mathrm{mid}(\mathbf{c}) + w_{\max}\max(\mathbf{c}) \qquad (50)$$
in which the functions min(c), mid(c), and max(c) return, respectively, the minimum, median, and maximum component of a vector c in the RGB space, and $w_{\min}$, $w_{\mathrm{mid}}$, and $w_{\max}$ are weights set by the user, with the constraints $w_{\max} > 0$ and $w_{\min} + w_{\mathrm{mid}} + w_{\max} = 1$. Specific values of the weights give the brightness functions used by the common cylindrically shaped color spaces: $w_{\min} = 0$, $w_{\mathrm{mid}} = 0$, and $w_{\max} = 1$ for HSV; $w_{\min} = 1/2$, $w_{\mathrm{mid}} = 0$, and $w_{\max} = 1/2$ for HLS; and $w_{\min} = 1/3$, $w_{\mathrm{mid}} = 1/3$, and $w_{\max} = 1/3$ for HSI. In the RGB space, one can visualize surfaces of isobrightness (or isoluminance). The surfaces of isobrightness l contain all the points such that L(c) = l and intersect the achromatic axis at l. For the HSV and HLS spaces, these surfaces have a complicated shape, as described by Levkowitz and Herman (1993), and for the HSI space they are planes perpendicular to the achromatic axis. The isoluminance surfaces (Equation (49)) are planes oblique to the achromatic axis. The isobrightness and isoluminance surfaces corresponding to a single brightness or luminance function are by definition parallel to each other.

2. Hue

The hue angle is traditionally measured starting at the direction corresponding to pure red. The simplest way to derive an expression for this angle is to project the vector (1, 0, 0) corresponding to red in the RGB space and an arbitrary vector c onto a plane perpendicular to the achromatic axis, and to calculate the angle between them. This gives the expression
$$H' = \arccos\left(\frac{R - \frac{1}{2}G - \frac{1}{2}B}{\left(R^2 + G^2 + B^2 - RG - RB - GB\right)^{1/2}}\right) \qquad (51)$$
after which, in order to give a value of H ∈ [0°, 360°), we apply
$$H = \begin{cases}360^\circ - H' & \text{if } B > G\\ H' & \text{otherwise}\end{cases} \qquad (52)$$
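The three weight settings that specialize Equation (50) to the HSV, HLS, and HSI brightness functions can be checked on a concrete triple; the function name and the sample color are ours:

```python
def glhs_brightness(r, g, b, wmin, wmid, wmax):
    """Equation (50): weighted min/mid/max of an RGB triple.

    Assumes wmax > 0 and wmin + wmid + wmax == 1, as in the text.
    """
    lo, mid, hi = sorted((r, g, b))
    return wmin * lo + wmid * mid + wmax * hi

c = (0.2, 0.9, 0.4)
hsv = glhs_brightness(*c, 0, 0, 1)           # HSV "value": the max, 0.9
hls = glhs_brightness(*c, 0.5, 0, 0.5)       # HLS lightness: (min + max)/2
hsi = glhs_brightness(*c, 1/3, 1/3, 1/3)     # HSI intensity: the mean
```

The same function thus reproduces all three classical brightness definitions simply by changing the weights, which is the point of the GLHS formulation.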
An approximation to this trigonometric expression is often used, and it is shown by Levkowitz and Herman (1993) that the approximated value differs from the trigonometric value by at most 1.12°. Nevertheless, given that one generally has much processing power available today, the use of the approximation is not recommended, as it tends to suffer to a larger extent from the discretization problems pointed out by Kender (1976).

3. Saturation

For the derivation of an expression for the saturation of an arbitrary color c, we begin by looking at the triangle which contains all the points with the same hue as c, as shown in Figure 26. The intersections of this triangle and the isobrightness surfaces are lines parallel to the line between c and its brightness value on the achromatic axis, L(c) = [L(c), L(c), L(c)]. Traditionally, the saturation is calculated as the length of the vector from L(c) to c divided by the length of the extension of this vector to the surface of the RGB cube. This definition, however, results in the cylindrically shaped color spaces discussed in Section IV.C. Moreover, it is clear that this definition of the saturation depends intimately on the form of the brightness function chosen (i.e., on the slopes of the isobrightness lines). In order to keep the conic or bi-conic forms of the spaces, it is necessary to change the definition of the saturation. Instead of the definition given
FIGURE 26. The triangle which contains all the points with the same hue as c. The circled corners mark the extremities of the edges of the cube containing the points furthest away from the achromatic axis.
MATHEMATICAL MORPHOLOGY AND CIRCULAR DATA
above, we divide the length of the vector from L(c) to c (in Figure 26) by the length of the vector between L[q(c)] and q(c), that is, the longest vector parallel to [L(c), c] included in the isohue triangle, the vector which necessarily intersects the third corner q(c) of the triangle. We then end up with the following expression for the saturation

S = \frac{\| L(c) - c \|}{\| L[q(c)] - q(c) \|}    (53)
in which ‖·‖ indicates the Euclidean norm. This saturation is independent of the choice of the brightness function, which can be shown by using similar triangles (Hanbury and Serra, 2002a). An example of this saturation measurement is shown in Figure 25(e), where it should be compared to the corresponding HSV and HLS examples. The most visible improvement resulting from this definition is that both the white and black regions of the color image are assigned a low saturation value. The points furthest away from the achromatic axis are those on the edges of the RGB cube between the circled corners in Figure 26. These points correspond to the most highly saturated colors, and if we project them onto a plane perpendicular to the achromatic axis, they form the edges of a hexagon, which correspond to the maximum distance a point can be from the achromatic axis for a given hue. A simpler expression for the saturation of point c can be obtained by projecting it onto this hexagon, and dividing the distance of the projected point from the center of the hexagon by the distance from the center to the hexagon edge at the same hue value (Hanbury and Serra, 2002a). By using Equation (53) along with the brightness function

L(c) = \min(R, G, B)    (54)

one can derive the following extremely simple saturation expression (Hanbury and Serra, 2002a):

S' = \max(R, G, B) - \min(R, G, B)    (55)
4. Chroma

Carron (1995) suggests the use of the distance of a point from the achromatic axis, without the maximum-distance normalization, as an approximation to the saturation, which he calls chroma. This distance is multiplied by a constant so that for the six vertices of the projected hexagon (i.e., corresponding to the circled vertices in Figure 26) the chroma has a
maximum value of one. An example of the chroma is shown in Figure 25(f) and the difference between the chroma and the saturation images is shown in Figure 25(g) (the contrast has been enhanced for better visibility; the maximum pixel value in the image is 0.127). The maximum possible difference between a saturation and a chroma value for a color is 0.134.
E. The IHLS Space

In this section we present algorithms for transforming back and forth between the RGB space and the improved HLS (IHLS) space. The latter is an improvement on the standard HLS space that replaces the cylindrical saturation measure with a conic one, thereby allowing the use of any brightness or luminance function (provided that they produce parallel isobrightness surfaces). In these algorithms, we have chosen to use the luminance function because of its psychovisual properties. MATLAB routines implementing the following transformations are available at http://www.prip.tuwien.ac.at/~hanbury. Two transforms from the RGB space to the IHLS space are given, both of which produce exactly the same hue, saturation, and luminance coordinates. The first is extremely rapid, while the second is easier to invert. The inverse transformation from IHLS to RGB is also presented.

1. The Simplest RGB to IHLS Transformation

For the simplest implementation, one calculates a brightness measure (Equation (49) or Equation (50)), the saturation using Equation (55), and the hue using Equations (51) and (52), as summarized here:
Y(c) = 0.2126R + 0.7152G + 0.0722B    (56)

S(c) = \max(R, G, B) - \min(R, G, B)    (57)

H'(c) = \arccos\left[\frac{R - \frac{1}{2}G - \frac{1}{2}B}{\left(R^2 + G^2 + B^2 - RG - RB - BG\right)^{1/2}}\right]    (58)

H(c) = \begin{cases} 360° - H'(c) & \text{if } B > G \\ H'(c) & \text{otherwise} \end{cases}    (59)
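Equations (56) through (59) can be sketched directly in Python (a helper of ours, not the author's MATLAB routine; returning a hue of 0° for achromatic pixels, where the arccos formula is undefined, is an assumption of this sketch):

```python
import math

def rgb_to_ihls(r, g, b):
    """Map RGB components in [0, 1] to (hue in degrees, luminance, saturation).

    Luminance is Equation (56), saturation is Equation (57), and the hue is
    the trigonometric form of Equations (58)-(59).
    """
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    s = max(r, g, b) - min(r, g, b)
    denom = math.sqrt(r * r + g * g + b * b - r * g - r * b - g * b)
    if denom == 0.0:
        h = 0.0  # achromatic (R = G = B): hue undefined; 0 by convention here
    else:
        h_prime = math.degrees(math.acos((r - 0.5 * g - 0.5 * b) / denom))
        h = 360.0 - h_prime if b > g else h_prime
    return h, y, s
```

For example, pure blue (0, 0, 1) yields a hue of 240°, and any gray yields zero saturation.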
2. An Alternative RGB to IHLS Transformation

A method to calculate the luminance, trigonometric hue, chroma, and saturation coordinates is given here, based on the one suggested by Carron (1995). The changes with respect to the version given by Carron are the extension to calculate the saturation from the chroma, and the use of luminance instead of brightness. The algorithm gives precisely the same Y, S, and H component values as the simpler algorithm in the previous section, but is simpler to invert as it contains no max or min functions. The first step is
\begin{bmatrix} Y \\ C_1 \\ C_2 \end{bmatrix} =
\begin{bmatrix} 0.2125 & 0.7154 & 0.0721 \\ 1 & -\frac{1}{2} & -\frac{1}{2} \\ 0 & -\frac{\sqrt{3}}{2} & \frac{\sqrt{3}}{2} \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix},    (60)

followed by the calculation of the chroma C ∈ [0, 1]

C = \sqrt{C_1^2 + C_2^2},

the hue H ∈ [0°, 360°]

H = \begin{cases} \text{undefined} & \text{if } C = 0 \\ \arccos(C_1/C) & \text{if } C \neq 0 \text{ and } C_2 \leq 0 \\ 360° - \arccos(C_1/C) & \text{if } C \neq 0 \text{ and } C_2 > 0 \end{cases}

and, if required, the saturation S ∈ [0, 1]

S = \frac{2C \sin(120° - H^*)}{\sqrt{3}}    (61)

in which

H^* = H - k \cdot 60° \quad\text{where } k \in \{0, 1, 2, 3, 4, 5\} \text{ so that } 0° \leq H^* \leq 60°.    (62)
3. The Inverse Transformation from IHLS to RGB

To transform colors represented in the IHLS coordinate system obtained using either of the above algorithms to RGB coordinates, one first calculates the chroma values from the saturation values (using Equation (61)):

C = \frac{\sqrt{3}\, S}{2 \sin(120° - H^*)}    (63)
in which H* is given by Equation (62). From the chroma, one calculates

C_1 = C \cos(H)    (64)

C_2 = -C \sin(H).    (65)

For the case where the hue is undefined, C_1 = C_2 = 0. Finally, the inverse of the matrix used in Equation (60) is used to obtain R, G, and B:

\begin{bmatrix} R \\ G \\ B \end{bmatrix} =
\begin{bmatrix} 1.0000 & 0.7875 & 0.3714 \\ 1.0000 & -0.2125 & -0.2059 \\ 1.0000 & -0.2125 & 0.9488 \end{bmatrix}
\begin{bmatrix} Y \\ C_1 \\ C_2 \end{bmatrix}.    (66)
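A minimal sketch of the matrix form and its inverse using NumPy (the function names `rgb_to_ycc`, `ycc_to_chs`, and `ycc_to_rgb` are ours; the inverse matrix is obtained numerically rather than hard-coded, and agrees with the rounded values of Equation (66)):

```python
import math
import numpy as np

# Forward matrix of Equation (60): (R, G, B) -> (Y, C1, C2).
M = np.array([
    [0.2125, 0.7154, 0.0721],
    [1.0, -0.5, -0.5],
    [0.0, -math.sqrt(3.0) / 2.0, math.sqrt(3.0) / 2.0],
])
M_INV = np.linalg.inv(M)  # numerical inverse, matching Equation (66)

def rgb_to_ycc(rgb):
    """Equation (60): an RGB vector to (Y, C1, C2)."""
    return M @ np.asarray(rgb, dtype=float)

def ycc_to_rgb(ycc):
    """Equation (66): (Y, C1, C2) back to RGB."""
    return M_INV @ np.asarray(ycc, dtype=float)

def ycc_to_chs(ycc):
    """Chroma, hue (degrees), and saturation from (Y, C1, C2).

    Chroma and hue follow the case analysis after Equation (60);
    saturation is Equation (61) with H* from Equation (62).  A zero hue
    for zero chroma (where the hue is undefined) is our convention.
    """
    _, c1, c2 = ycc
    chroma = math.hypot(c1, c2)
    if chroma == 0.0:
        return 0.0, 0.0, 0.0
    h = math.degrees(math.acos(c1 / chroma))
    if c2 > 0.0:
        h = 360.0 - h
    h_star = h - 60.0 * math.floor(h / 60.0)  # k chosen so that 0 <= H* <= 60
    s = 2.0 * chroma * math.sin(math.radians(120.0 - h_star)) / math.sqrt(3.0)
    return chroma, h, s
```

A round trip rgb → ycc → rgb recovers the input to machine precision, and for a fully saturated primary the saturation equals max − min = 1, matching the simpler algorithm of the previous section.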
F. Conclusion

The commonly used 3D polar coordinate color representation systems, such as the HLS and HSV, are unsuited to image processing and analysis. The principal reason for this is the artificial expansion of the natural conic shapes of the spaces into a cylindrical shape. In this chapter, we propose a generalized 3D polar coordinate representation of the RGB space, called the IHLS space, which has the following advantages over the commonly used cylindrically shaped ones:
- Achromatic or near-achromatic colors always receive a low saturation value.
- As we have removed the normalization of the saturation by the brightness function present in the cylindrically shaped models, these two coordinates are independent, allowing a wide choice of brightness functions.
- The removal of the brightness normalization of the saturation means that comparisons between saturation values are meaningful, which is important in the context of mathematical morphology.

Any 3D polar coordinate color representation is very closely tied to the RGB space, being simply a different representation of it. It therefore does not have any supplementary properties such as device independence. The main advantage of the 3D polar coordinate representation is that it is a more homogeneous color representation, as the colors are not specified in terms of fixed directions corresponding to red, green, and blue, but as angular values. This representation sometimes allows features which are not clearly visible in the RGB space to be seen and exploited.
V. PROCESSING OF 3D POLAR COORDINATE COLOR SPACES
Up to this point we have discussed the application of mathematical morphology to values on the unit circle, and have given some examples of the morphological processing of phase images and of orientation images describing oriented textures, cases in which the angular data can often be processed separately. This separate processing is nevertheless not always possible, as there are often supplementary measures closely associated with the angular value: the amplitude associated with the phase in a Fourier transform, or the luminance and saturation coordinates associated with the hue. In this section we discuss the latter case, mathematical morphology applied to color images represented in a 3D polar coordinate system (i.e., in the IHLS space). Each pixel in this type of image is encoded by a vector containing an angular value, the hue, and two linear values, the luminance and the saturation. We begin, in Section V.A, by discussing circular statistics applied to the hue, and suggest a way of taking the saturation into account when calculating hue means and variances. The application of morphological operators to color images is a special case of vectorial mathematical morphology. Some aspects of this subject, notably the vectorial orders, are discussed in Section V.B. We then present, in Section V.C, the use of lexicographical orders in the IHLS space. We show that in order to obtain usable results with operators using a lexicographical order with hue at the top level, it is necessary to weight the hue by the saturation. The weighting method used here is nonetheless different from the one used in the context of color statistics. A color top-hat operator is also suggested.

A. Color Statistics

In a 3D polar coordinate color space, for example the IHLS space, if we treat each channel separately then the classic linear statistical methods may be used to calculate descriptors of the luminance and saturation.
For the hue, on the other hand, circular statistics (Section II.B) must be used. This processing by color band presents the disadvantage of ignoring the close relationship between the two chrominance bands, the saturation and the hue, thereby giving equal importance to all the hues, irrespective of their associated saturation. A mean of the hue weighted by the saturation allows the simultaneous use of both chrominance components. We begin with n pairs of values, the hues Hi and their associated saturations Si. To calculate the mean of the hue weighted by the saturation, we proceed, as in
Section II.B, by calculating the direction of the resultant vector of the sum of vectors in the directions H_i, except that instead of unit vectors, the vector with direction H_i has a length proportional to the saturation S_i. The hues associated with small saturation values therefore have less influence on the result. We now present the changes to be made to the equations of Section II.B in order to calculate a saturation-weighted hue mean. Equations (5) and (7) are replaced by

A_S = \sum_{i=1}^{n} S_i \cos H_i, \qquad B_S = \sum_{i=1}^{n} S_i \sin H_i, \qquad R_S^2 = A_S^2 + B_S^2    (67)

and we replace A and B in Equation (6) by A_S and B_S, giving

\bar{H}_S = \begin{cases}
\arctan(B_S/A_S) & \text{if } B_S > 0,\ A_S > 0 \\
\arctan(B_S/A_S) + \pi & \text{if } A_S < 0 \\
\arctan(B_S/A_S) + 2\pi & \text{if } B_S < 0,\ A_S > 0 \\
\pi/2 & \text{if } A_S = 0,\ B_S > 0 \\
3\pi/2 & \text{if } A_S = 0,\ B_S < 0
\end{cases}    (68)

where the saturation-weighted hue mean is denoted by \bar{H}_S. The mean length (Equation (8)) becomes

\bar{R}_S = \frac{R_S}{\sum_{i=1}^{n} S_i}.
This expression is still an indicator of the spread of the hue values, and is not linked to the standard saturation mean. The direction of the mean \bar{H}_S is independent of the position of the hue origin, even if its value depends on the position of the origin. In practice, for images which contain only highly saturated colors, there is not a large difference between the values of the weighted and unweighted hue means. Figure A.1(f), whose saturation is shown in Figure 27(a), is an image for which the difference is significant. For this color image, the unweighted hue mean is \bar{H} = 326.9°, and the saturation-weighted hue mean is \bar{H}_S = 19.7°. To show the difference, thresholds on the hue band of the image for the intervals [\bar{H} - 20°, \bar{H} + 20°] and [\bar{H}_S - 20°, \bar{H}_S + 20°] are shown in Figures 27(b) and (c), respectively. It is clear that the saturation-weighted hue mean corresponds to the hue of the regions of the image with the highest saturation, the two cells. The unweighted hue mean is skewed by the low-saturation background.
FIGURE 27. (a) Saturation of Figure A.1(f). (b) Pixels of the initial image which have a hue value in the interval of 20° on either side of the unweighted hue mean. (c) Pixels of the initial image which have a hue value in the interval of 20° on either side of the saturation-weighted hue mean.
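The saturation-weighted circular mean can be sketched as follows (a helper of ours; `atan2` collapses the five quadrant cases of Equation (68) into one call):

```python
import math

def weighted_hue_mean(hues_deg, sats):
    """Saturation-weighted circular mean of hues (Equations (67)-(68)).

    Each hue H_i contributes a vector of length S_i; the direction of the
    resultant is returned in degrees in [0, 360), together with the mean
    length R_S / sum(S_i), an indicator of the spread of the hues.
    """
    a = sum(s * math.cos(math.radians(h)) for h, s in zip(hues_deg, sats))
    b = sum(s * math.sin(math.radians(h)) for h, s in zip(hues_deg, sats))
    mean = math.degrees(math.atan2(b, a)) % 360.0
    return mean, math.hypot(a, b) / sum(sats)
```

Hues with small saturations barely move the mean: averaging 0° (saturation 3) with 180° (saturation 1) gives a mean direction of 0° with mean length 0.5.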
B. Vectorial Mathematical Morphology

Mathematical morphology operates on complete lattices (Serra, 1988), which are based on two notions: the assignment of an order, and the existence of a supremum and infimum. To be able to apply mathematical morphology to color images, it is necessary to be able to impose an order on the colors, and to ensure the existence of suprema and infima, allowing the construction of a lattice to which morphological operators can be applied. The processing of color images is a special case of the processing of vectorial data or images. We consider the first requirement in the construction of a lattice, that of orders, and we present a general introduction to vectorial orders. We note that for certain vectorial orders, the supremum and infimum of a set of vectors do not always form part of the set. In the specific case of color images, this could result in the introduction of false colors into an image processed by a morphological operator (Talbot et al., 1998). A false color is a color present in the image after the application of an operator which was not in the initial image. For filters aiming to simplify an image, the appearance of these new colors is generally unwanted.
1. Vectorial Orders

In the framework of mathematical morphology, the three important vectorial orders are the preorders, partial orders, and total orders. Before we define these orders, we give the definitions of relations useful for characterizing an order relation.
Definition 7. Let R be a binary relation on an arbitrary set A.
(1) R is reflexive iff ∀x ∈ A, xRx.
(2) R is transitive iff ∀x, y, z ∈ A, xRy and yRz ⇒ xRz.
(3) R is antisymmetric iff ∀x, y ∈ A, xRy and yRx ⇒ x = y.

For example, for order relations on ℝ, the binary relation R is generally the relation "≤" or "<" (or their inverses). We now move on to the definitions of order relations.

Definition 8. A binary relation R on a set A is a preorder iff R is reflexive and transitive.

Definition 9. A binary relation R on a set A is a partial order iff R is reflexive, transitive, and antisymmetric.

Definition 10. A partial order is totally ordered iff ∀x, y ∈ A, xRy or yRx.
Hence, a totally ordered set contains no pairs of members which cannot be ordered. We call a partial order which is totally ordered a total order. Using these definitions, the mathematical framework in which mathematical morphology operates, the complete lattice, can be rigorously defined.

Definition 11. A complete lattice is a set L such that

(1) L is provided with a partial order.
(2) For every family of elements \{X_i\} \subseteq L, a supremum and an infimum exist.

Barnett (1976) describes four ways of ordering vectorial data, three of which are often used in the processing of vectorial images, these being the marginal order, reduced order, and conditional or lexicographical order. They have been applied to the definition of median filters (Pitas and Tsakalides, 1991) and morphological operators (Comer and Delp, 1999). We now give the definitions of the three vectorial orders used in image processing. Let V be a set of n vectors x_i of dimension p, with

x_i = (x_{1(i)}, x_{2(i)}, \ldots, x_{p(i)}), \quad i \in \{1, 2, \ldots, n\}.

Definition 12. For the marginal order, the vector components are ordered for each of the p dimensions. For two vectors x_i, x_j \in V

x_i \leq x_j \iff x_{k(i)} \leq x_{k(j)} \quad \forall k \in \{1, 2, \ldots, p\}.
The supremum of the set V is therefore

x_{\sup} = \left( \bigvee_i x_{1(i)},\ \bigvee_i x_{2(i)},\ \ldots,\ \bigvee_i x_{p(i)} \right)

and the infimum is

x_{\inf} = \left( \bigwedge_i x_{1(i)},\ \bigwedge_i x_{2(i)},\ \ldots,\ \bigwedge_i x_{p(i)} \right).
The marginal order is a partial order: we cannot order the vectors (1, 2) and (2, 1), for example. It is clear that the vectors x_sup and x_inf do not necessarily form part of the initial set of vectors V. For example, the infimum of vectors (1, 2) and (2, 1) is the vector (1, 1).

Definition 13. The reduced order uses a function g : \mathbb{R}^p \to \mathbb{R} to impose an order on the vectors. For two vectors x_i, x_j \in V

x_i \leq x_j \iff g(x_i) \leq g(x_j).

The supremum of the set V is

x_{\sup} = \left\{ x_i : g(x_i) = \bigvee_{j=1,2,\ldots,n} g(x_j) \right\}

and the infimum is

x_{\inf} = \left\{ x_i : g(x_i) = \bigwedge_{j=1,2,\ldots,n} g(x_j) \right\}.
For the reduced order, the vectors x_sup and x_inf are members of the initial set of vectors V, as we do not process the vector components separately. The reduced order is a preorder if the function g is not injective, but a total order if it is. For example, in a two-dimensional space, the function g[(x, y)] = x + y is not injective, which means that, as g[(1, 2)] = g[(2, 1)], it is not possible to order these vectors. This problem can be surmounted by using a function g which is injective, or by using a lexicographical order.
Definition 14. The conditional or lexicographical order is based on the following order relation for two vectors x_i, x_j \in V:

x_i \leq x_j \quad \text{if} \quad
\begin{cases}
x_{1(i)} < x_{1(j)} \\
\quad\text{or} \\
x_{1(i)} = x_{1(j)} \ \text{and} \ x_{2(i)} < x_{2(j)} \\
\quad\text{or} \\
\quad\vdots \\
x_{1(i)} = x_{1(j)} \ \text{and} \ x_{2(i)} = x_{2(j)} \ \text{and} \ x_{3(i)} = x_{3(j)} \ \text{and} \ \ldots \ \text{and} \ x_{p(i)} \leq x_{p(j)}
\end{cases}    (69)

or, written more compactly (Chanussot and Lambert, 1998),

x_i \leq x_j \iff \exists k \in \{1, 2, \ldots, p\} : x_{l(i)} = x_{l(j)} \ \forall l \in \{1, 2, \ldots, k-1\} \ \text{and} \ x_{k(i)} \leq x_{k(j)}.

The supremum and infimum of the set V are defined based on this order relation. The lexicographical order is a total vector order, with the property that the supremum and infimum are always members of the initial vector set V. The use of this order necessarily implies the attribution of a priority to the components, as in the majority of cases the order of two vectors will be decided by the first line of Equation (69) (and hence by the first vector component). It is obviously not necessary to limit oneself to a component priority based on the order of the components within the vectors. The components can be placed into Equation (69) in an order of priority defined by the user. It is even possible to place a noninjective function g (from the definition of the reduced order) at the first level of the lexicographical order, thereby creating a total order. Some alternative orders are suggested in the literature. A total order based on space-filling curves is suggested by Chanussot and Lambert (1998) and Chanussot (1998). Serra (1992) suggests an intermediate order between the marginal and lexicographical orders. Comer and Delp (1999) use nontotal orders along with a geometric criterion based on pixel position in the structuring element, allowing vectors for which the order is not defined by the chosen order relation to be ordered. An application of fuzzy mathematical morphology (Bloch and Maître, 1995) to color images, along with a textile inspection application, is presented by Köppen et al. (1999).
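A toy demonstration (ours, not from the text) of the three orders on the pair (1, 2), (2, 1) used as the running example above:

```python
# The pair that cannot be ranked by the marginal order.
x, y = (1, 2), (2, 1)

# Marginal order (Definition 12): component-wise sup and inf.  Note the
# "false color" effect: neither result belongs to the original pair.
sup = tuple(max(a, b) for a, b in zip(x, y))
inf = tuple(min(a, b) for a, b in zip(x, y))
assert sup == (2, 2) and inf == (1, 1)

# Reduced order (Definition 13) with the non-injective g(v) = v1 + v2:
# the two vectors receive the same rank, so this g yields only a preorder.
g = lambda v: v[0] + v[1]
assert g(x) == g(y)

# Lexicographical order (Definition 14): Python's built-in tuple
# comparison implements Equation (69), giving a total order whose
# supremum and infimum are always members of the original set.
assert max(x, y) == (2, 1) and min(x, y) == (1, 2)
```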
2. Morphological Operators

After having chosen or defined a lattice for the color vectors, thereby permitting the choice of a supremum and an infimum of these vectors, the basic morphological operators can be applied. The erosion at point x by structuring element B is

\varepsilon_B f(x) = \{ f(y) : f(y) = \inf[f(z)],\ z \in B_x \}    (70)

and the corresponding dilation is

\delta_B f(x) = \{ f(y) : f(y) = \sup[f(z)],\ z \in B_x \}.    (71)
These operators can be used to build other operators, such as the opening \gamma_B and the closing \varphi_B.

C. Lexicographical Orders in the IHLS Color Space

When a lexicographical order is used with morphological operators, one finds that the majority of decisions on the vector order in a structuring element are taken at the first level of the order relation (Hanbury and Serra, 2001b). The application of a lexicographical order to a color space of type RGB necessarily results in the promotion of one of the red, green, or blue components to a dominant position, which produces operators which treat the color space inhomogeneously. The use of a 3D polar coordinate space, such as IHLS, allows the creation of two operators which use the homogeneous coordinates of luminance and saturation at the first level, or of an operator for which any hue can be chosen to play the dominant role (Hanbury, 2001; Hanbury and Serra, 2001a). In this section, we first present formulations of the lexicographical order with luminance and with saturation at the first level. Next, we consider an order with the hue at the first level. After a demonstration of the inconveniences of this order caused by the close relation between the two chrominance components, the hue and the saturation, we suggest a solution in the form of a lexicographical order with hue weighted by saturation at the first level. Lastly, a color top-hat operator is suggested. The image used in the examples in this section is shown in Figure A.1(e), and its hue, saturation, and luminance components are shown in Figure 28.

1. Luminance and Saturation

The luminance and saturation coordinates each form a complete lattice, and it is therefore easy to use them in a lexicographical order. The angular
FIGURE 28. The IHLS space: (a) hue, (b) saturation, and (c) luminance of the image in Figure A.1(e).
coordinate (the hue), placed at the third level so as to minimize its importance, is ordered based on its distance from an origin (Section II.D). It is therefore necessary to choose an origin for the hues, but this origin intervenes only in the third level of the lexicographical order. It therefore only arbitrates in the cases where two vectors have equal luminance and saturation values. We define the lexicographical order with luminance at the first level, for two vectors c_i = (H_i, Y_i, S_i) and c_j = (H_j, Y_j, S_j) in the IHLS space, as

c_i > c_j \quad \text{if} \quad
\begin{cases}
Y_i > Y_j \\
\quad\text{or} \\
Y_i = Y_j \ \text{and} \ S_i > S_j \\
\quad\text{or} \\
Y_i = Y_j \ \text{and} \ S_i = S_j \ \text{and} \ H_i \ominus H_0 < H_j \ominus H_0
\end{cases}    (72)

where H_0 is the hue origin chosen by the user and the symbol ⊖ indicates the acute angle between the two hues (Equation (4)). If the luminance values of the two vectors being compared are equal, then the vector with the higher saturation value is taken as being larger. If the luminance and saturation values are equal, it is necessary to take the hue values into account. The top
two levels of this relation are invariant to rotation around the achromatic axis. The erosion and dilation obtained by using this order applied to Figure A.1(e) are shown in Figures A.5(a) and (b), respectively (we set H_0 = 0° and use a square SE of size 2). The lexicographical order with saturation at the top level is constructed by inverting the top two levels of relation (72), giving

c_i > c_j \quad \text{if} \quad
\begin{cases}
S_i > S_j \\
\quad\text{or} \\
S_i = S_j \ \text{and} \ Y_i > Y_j \\
\quad\text{or} \\
S_i = S_j \ \text{and} \ Y_i = Y_j \ \text{and} \ H_i \ominus H_0 < H_j \ominus H_0
\end{cases}    (73)
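As a sketch, the luminance-first order of Equation (72) can be realized as a Python sort key driving a flat erosion and dilation over the colors covered by a structuring element (function names are ours; swapping the first two key components gives the saturation-first order of Equation (73)):

```python
def acute(h1, h2):
    """Acute angle between two hues on the circle, in degrees."""
    d = abs(h1 - h2) % 360.0
    return min(d, 360.0 - d)

def lex_key_luminance(c, h0=0.0):
    """Sort key realizing Equation (72) for c = (H, Y, S): a larger key
    means a larger color.  Ties on luminance fall through to saturation,
    then to the (negated) angular distance of the hue from the origin."""
    h, y, s = c
    return (y, s, -acute(h, h0))

def erode(colors, h0=0.0):
    """Infimum over the colors covered by a flat structuring element."""
    return min(colors, key=lambda c: lex_key_luminance(c, h0))

def dilate(colors, h0=0.0):
    """Supremum over the colors covered by a flat structuring element."""
    return max(colors, key=lambda c: lex_key_luminance(c, h0))
```

Because the supremum and infimum are members of the input set, no false colors are introduced.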
The erosion and dilation obtained when using this order are shown in Figures A.5(c) and (d), respectively (a square SE of size 2 was used). The lexicographical orders suggested in Equations (72) and (73) are obviously not the only orders of this type possible. One can easily invert the second and third levels, or the directions of the comparison operators in these two levels, still keeping valid lexicographical orders. The choice between these two orders depends on whether one is interested in luminous and dark objects, or in saturated and nonsaturated objects. For example, if one wishes to eliminate only the white or black writing from the orange card at the bottom of the image in Figure A.1(e), one could use, respectively, a luminance opening or closing.⁷ If both black and white writing is to be removed, a saturation closing is recommended. In general, the operators with luminance at the top level are better at preserving object contours. A good example of an application using a lexicographical order with luminance at the top level is given by Iwanowski (2000) in the context of color image interpolation. He uses a lexicographical order having luminance at the top level, and the values of the green and red components of the RGB space at the second and third levels, respectively.

2. Hue

For the hue, the obvious approach is to construct a lexicographical order with, at the first level, a hue order based on the distance from an origin.

⁷ Due to lack of space for printing color images, the results of the opening and closing are not shown. They can either be mentally extrapolated from the erosion and dilation images, or downloaded from http://www.prip.tuwien.ac.at/~hanbury. Larger versions of the images shown are also available on this web page.
A possible form for this order is
c_i > c_j \quad \text{if} \quad
\begin{cases}
H_i \ominus H_0 < H_j \ominus H_0 \\
\quad\text{or} \\
H_i \ominus H_0 = H_j \ominus H_0 \ \text{and} \ S_i > S_j \\
\quad\text{or} \\
H_i \ominus H_0 = H_j \ominus H_0 \ \text{and} \ S_i = S_j \ \text{and} \ Y_i > Y_j
\end{cases}    (74)
Upon applying morphological operators using this order, one sees that the results are not satisfactory. This is due to the close relationship between the chrominance components, the hue and the saturation. To give an example, we use this order with H_0 = 40° (which corresponds to the color of the orange blobs near the top right of the image) applied to the image in Figure A.1(e). With this choice of origin, we intend a dilation to enlarge the red and orange regions, and to shrink the blue and violet regions with hues around 220° (the lettering on the violet area at the top left of the image has a hue of about 245°, and is surrounded by a low-saturation color with a hue of around 300°). The results are shown in Figure A.5(e) (erosion) and Figure A.5(f) (dilation). Upon examining the result of the dilation, one sees that the red lines have all been covered by the neighboring white pixels, that some black pixels still remain within the upper right orange blobs, that the white letters on the orange card have actually been enlarged, and that the rightmost border of the orange card has become jagged, due to only some of the orange pixels having been expanded over the surrounding white background. In essence, the black and white pixels are sometimes chosen preferentially to the red and orange pixels, contrary to our requirements. The reason is that the low-saturation black and white regions, for which the hue values are rather arbitrary, often have hue values closer to the chosen origin than the high-saturation regions. For example, a white pixel having a hue of 0° is considered to be closer to the chosen origin (40°) than a red pixel with a hue of 350°. The disadvantage of this order by hue only is better illustrated by the simple example of Figure A.3(a), which shows four colors along with their hue, luminance, and saturation coordinates (in this order in the vectors). The positions of these colors on the hue circle are shown in Figure 29(a).
If one chooses the color closest to red a = (0.0°, 0.21, 1.00) in this image by using only the hue values, the result is the brown c = (9.3°, 0.47, 0.20), whereas the orange b = (18.9°, 0.46, 0.90) is visually the most similar. This contradiction is due to the low value of the saturation of color c, which makes it more of a gray than a color, and therefore very far from pure red. The solution proposed is to weight the hue values by their corresponding saturation values before ordering the hues.
FIGURE 29. (a) The positions of the colors of Figure A.3(a) on the hue circle. (b) The positions of the colors of Figure A.3(a) on the hue circle after weighting the hues by the corresponding saturations.
3. Saturation-Weighted Hue

Demarty (2000) introduced an algorithm for dividing the pixels of an image into two classes, highly saturated pixels and monochromatic pixels. This binary approach to separating the pixels is nevertheless not flexible enough to be applied to a weighting of the hues by the saturation. In the context of hue statistics, we presented in Section V.A a method for determining saturation-weighted hues. This method represents each hue by a vector with a length proportional to the associated saturation. If we consider the hues as points on the unit circle, this method represents the weighted hues as points in the interior of the circle. This two-dimensional representation is, however, not convenient if we wish to impose an order based only on the angular values of the hues. The method for weighting the hue by the saturation suggested in this section changes the position of the hue on the unit circle as a function of the saturation and of the origin chosen. Because these weighted hues remain on the unit circle (and do not move into its interior), they can be ordered based on their angular values, as for the nonweighted hues. We start with a set of vectors in the IHLS space in which we wish to find the supremum and infimum with respect to a selected hue origin H_0. These vectors are ordered by weighted hue by using the following algorithm:

(1) We first calculate, for each vector, a saturation-weighted hue H'.
(2) We use H' to order the vectors with respect to the angular distance from the chosen origin H_0.
(3) After the supremum and infimum are chosen, the vectors are reassigned their initial hues so as to avoid introducing false colors.
The principal characteristics of the saturation-weighted hue H' are:

- The vectors with high saturation values keep their initial hue values.
- The vectors with low saturation values are assigned weighted hues close to H_0 + 90° or H_0 − 90°, thereby reducing their likelihood of being chosen as supremum or infimum.
Before giving a general formulation of the hue-weighting algorithm, we illustrate it using the example of Figure A.3(a). We choose the origin H_0 = 0°, and calculate the weighted hue values H' for the four colors. For hues between 0° and 90°, we define the value of H' as follows:

H'_i = \sup[H_i,\ 90°(1 - S_i)]    (75)
The color c, whose hue is H_c = 9.3°, and for which the second argument of the supremum operator in Equation (75) gives 90° − (0.2 × 90°) = 72°, is assigned a weighted hue of H'_c = 72°. For the colors b and d, the expression gives 90° − (0.9 × 90°) = 9°, and for color a, it gives 90° − (1.0 × 90°) = 0°. The weighted hues for these colors are therefore equal to their initial hues: H'_a = H_a, H'_b = H_b, and H'_d = H_d. The positions of the weighted hues on the hue circle are shown in Figure 29(b), from which it is clear that color a remains closest to the origin, with color b now in second position. The general formulation (for hues between 0° and 360°) of the weighting of the hues by their corresponding saturations is now presented. For each vector i, a value H'_i is calculated from H_i and S_i. To simplify the notation, we make the hypothesis that the origin H_0 is placed at 0°. The value of H'_i is
H'_i = \begin{cases}
\sup[H_i,\ 90°(1 - S_i)] & \text{if } 0° \leq H_i \leq 90° \\
\inf[H_i,\ 90°(1 + S_i)] & \text{if } 90° < H_i \leq 180° \\
\sup[H_i,\ 90°(3 - S_i)] & \text{if } 180° < H_i \leq 270° \\
\inf[H_i,\ 90°(3 + S_i)] & \text{if } 270° < H_i < 360°
\end{cases}    (76)
To use any origin H_0, it is sufficient to replace the H_i in Equation (76) by

H_i \to \begin{cases}
H_i - H_0 & \text{if } H_i - H_0 \geq 0 \\
360° + (H_i - H_0) & \text{if } H_i - H_0 < 0
\end{cases}    (77)
A lexicographical order with the saturation-weighted hue in the first position is

c_i > c_j \quad \text{if} \quad
\begin{cases}
(H'_i \ominus 0°) < (H'_j \ominus 0°) \\
\quad\text{or} \\
(H'_i \ominus 0°) = (H'_j \ominus 0°) \ \text{and} \ S_i > S_j \\
\quad\text{or} \\
(H'_i \ominus 0°) = (H'_j \ominus 0°) \ \text{and} \ S_i = S_j \ \text{and} \ Y_i > Y_j
\end{cases}    (78)
Note that by placing the hue at the top level, we have created a morphological operator which by design is not rotationally invariant. The differences between the results obtained with morphological operators using the lexicographical order with the hue at the top level (Equation (74)) and the lexicographical order using the saturation-weighted hue at the top level (Equation (78)) are visible in Figure A.5, in which images (g) and (h) show the erosion and dilation of Figure A.1 using the order by saturation-weighted hue. The origin was chosen to be at 40°, so that the dilation should enlarge the red and orange regions, and shrink the blue and violet regions. The fact that this is not possible with a dilation with only the hue at the top level has already been discussed. It can be seen in Figure A.5(h) that the result that we require is produced, as the white and black pixels are removed from consideration due to their low saturation. It is instructive to compare the results of the operators which use the saturation-weighted hue at the top level (Figures A.5(g) and (h)) with those of the operators which use the saturation at the top level (Figures A.5(c) and (d)). We remark that, except for the nonsymmetric treatment of the colorful regions inherent to the operators which use the weighted (or unweighted) hue, the results are very similar. A disadvantage of using the saturation-weighted hue is evident in regions which do not contain pixels having a saturated color close to the chosen origin, for which the results are less predictable. This is visible, for example, in regions containing only black and white pixels in Figures A.5(g) and (h). A possible solution to this problem could be to check if all the pixels in the structuring element have weighted hue values H' in the intervals (subtended by an acute angle)
h or
i H0 90 , H0 90 þ
and in this case, to use instead the luminance for ordering the pixels. The value of , chosen by the user, gives the size of the interval.
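Read as a comparison rule, and consistently with the dilation behavior described above, Equation (78) ranks a pixel higher when its saturation-weighted hue lies closer to the chosen origin, with saturation and then luminance breaking ties. A minimal Python sketch of this comparator; the function names and the (H′, S, Y) triple layout are this example's assumptions, not the chapter's:

```python
import math

def ang_dist(h, origin=0.0):
    """Acute angular distance in degrees between a hue h and the origin."""
    d = abs(h - origin) % 360.0
    return min(d, 360.0 - d)

def swh_sort_key(origin=0.0):
    """Key realizing the order of Eq. (78): a pixel ranks higher when its
    weighted hue is closer to the origin, then when its saturation is higher,
    then when its luminance is higher. Pixels are (H', S, Y) triples."""
    return lambda v: (-ang_dist(v[0], origin), v[1], v[2])

# Supremum (dilation) and infimum (erosion) over a structuring element:
pixels = [(10.0, 0.9, 0.5),   # saturated, close to the 0-degree origin
          (10.0, 0.2, 0.5),   # same hue, low saturation
          (180.0, 0.9, 0.5)]  # saturated, opposite hue
sup = max(pixels, key=swh_sort_key())
inf = min(pixels, key=swh_sort_key())
```

Because the key is a plain tuple, Python's built-in `max` and `min` realize the lexicographical comparison directly.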
194
ALLAN HANBURY
4. Color Top-Hat

Hanbury and Serra (2002c) suggest taking advantage of the perceptual uniformity of the L*a*b* color space to calculate a type of morphological top-hat (Soille, 1999). One simply calculates the Euclidean distance between the color coordinates of each pixel of an initial color image and its transform by an opening or closing operator. The resulting Euclidean distances are encoded in a graylevel image, and represent the perceptual differences between colors. Even though the IHLS space is not perceptually uniform, an approximation to this top-hat can be calculated using Euclidean distances in the RGB or IHLS space. Even though these Euclidean distances do not represent perceptual differences in a rigorous way, they can still be useful for feature extraction, as shown in the following example. This example also demonstrates a situation in which a lexicographical order with luminance at the top level is not the ‘‘best,’’ contrary to what has been claimed (Ortiz et al., 2001; Louverdis et al., 2002).

Figure A.3(b) is a color image for which we have set ourselves the task of extracting the grayish lines between the mosaic tiles. The luminance of this image is shown in Figure 30(a). It is visible that the luminance of the mosaic tiles is sometimes above and sometimes below that of the gray lines between them. In the saturation image, shown in Figure 30(b), one can see that the lines to be extracted have, in general, a lower saturation than the tiles. A morphological closing operation using a lexicographical order with saturation at the top level (Equation (73)) with a square SE of size 2 was applied to the initial color image to give Figure A.3(c). This closing succeeds in expanding the tiles to cover the gray lines.
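The saturation-first closing and the Euclidean-distance top-hat derived from it can be sketched as follows. This is an illustration only: the key keeps saturation then luminance and omits the hue tie-break of Equation (73), and the (hue, saturation, luminance) channel layout is this example's choice, not the chapter's code:

```python
import numpy as np

def vector_dilate(img, key, r=1):
    """Flat dilation of a vector image: at each pixel, the supremum under
    `key` over a (2r+1) x (2r+1) square SE, clipped at the border."""
    h, w, _ = img.shape
    out = np.empty_like(img)
    for y in range(h):
        for x in range(w):
            win = [img[j, i] for j in range(max(0, y - r), min(h, y + r + 1))
                             for i in range(max(0, x - r), min(w, x + r + 1))]
            out[y, x] = max(win, key=key)
    return out

def vector_erode(img, key, r=1):
    """Flat erosion: the infimum under `key` over the same window."""
    h, w, _ = img.shape
    out = np.empty_like(img)
    for y in range(h):
        for x in range(w):
            win = [img[j, i] for j in range(max(0, y - r), min(h, y + r + 1))
                             for i in range(max(0, x - r), min(w, x + r + 1))]
            out[y, x] = min(win, key=key)
    return out

def color_tophat(img, key, r=1):
    """Closing top-hat: pixelwise Euclidean distance between the image and
    its closing (dilation followed by erosion)."""
    closing = vector_erode(vector_dilate(img, key, r), key, r)
    return np.sqrt(((closing - img) ** 2).sum(axis=-1))

# Saturation at the top level, luminance second (hue tie-break omitted);
# channels are (hue, saturation, luminance).
key = lambda v: (v[1], v[2])

# Toy mosaic: saturated tiles (S = 0.9) crossed by a dull gray line (S = 0.1).
img = np.zeros((5, 5, 3))
img[..., 1], img[..., 2] = 0.9, 0.8
img[2, :, 1], img[2, :, 2] = 0.1, 0.5
th = color_tophat(img, key)
# The top-hat responds on the gray line and is zero on the unchanged tiles.
```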
Finally, the suggested color top-hat was calculated by taking the Euclidean distance between corresponding pixels in Figures A.3(b) and (c) to give the grayscale image in Figure 30(c), in which the pixels of highest graylevel correspond to the features we wish to extract.

5. Summary

We have shown the applicability of lexicographical orders for creating total orders in the IHLS space, although they are also applicable in any 3D polar coordinate color representation. For applications in which the pertinent information is in the luminance or saturation, classic lexicographical orders with one of these components at the first level are applicable. For an application in which we are interested in objects of a specific hue, we have shown that a lexicographical order using only a hue order based on a chosen origin does not give satisfactory results due to the close relationship between the chromatic coordinates. We therefore propose a method for weighting the hue by the corresponding saturation, allowing the application of
MATHEMATICAL MORPHOLOGY AND CIRCULAR DATA
FIGURE 30. (a) Luminance of Figure A.3(b). (b) Saturation of A.3(b). (c) The top-hat—the Euclidean distance between the corresponding pixels in Figures A.3(b) and (c).
morphological operators using a hue order. Even though the IHLS space is not perceptually uniform, a useful graylevel top-hat image can be calculated by taking the pixelwise Euclidean distances between the color coordinates in an initial image and either its opening or closing. The results of this section show the flexibility of the representation of color in terms of hue, luminance, and saturation in the context of mathematical morphology applied to color images.

D. Conclusion

We have considered the use of vectorial mathematical morphology in the special case for which one of the vector components is an angular value. This was done for the specific case of mathematical morphology applied to color images described in a space using an angular hue measure, but applications in other domains are conceivable. It is difficult to take into account the supplementary values associated with the hue when using most of the unit circle morphological operators developed in Section II. Therefore, the approach adopted is to create operators which are as rotationally invariant as possible, by placing the angular value with its associated choice of an origin at the third level of a
lexicographical order, thereby minimizing its role. The lexicographical order was chosen because it imposes a total order on the vectors, avoiding the appearance of false colors brought about when the supremum and infimum of a set of vectors are not part of the initial set, which can happen with a lattice built on a partial order. The third level of the lexicographical order is taken into account when imposing an order on all the vectors in a color space. However, for the extremely reduced set of vectors which are usually found inside a structuring element during the application of a morphological operator, the process of choosing a supremum or infimum of the set almost never gets to the third level of the order, except in pathological cases (Hanbury and Serra, 2001b). The operators based on this minimization of the role of the angular value are those which use a lexicographical order with the luminance or saturation of the IHLS space at the first level.

The opposite extreme is to create operators which by design are not rotationally invariant, such as those using the lexicographical order with saturation-weighted hue at the first level. For these operators, the effect of the hue origin chosen by the user is immediately visible, which is not necessarily disadvantageous if the user is interested in a single hue or group of hues, or if the image contains a dominant hue (subjective or objective). An objective measure of the dominant hue can be obtained by using the saturation-weighted hue mean.

An objection to the use of the lexicographical order is its propensity to elevate one of the vector components to a role which is much more important than that of the others. In a representation in 3D polar coordinates, this characteristic of the order is nevertheless less restrictive than in a rectangular coordinate representation (of type XYZ or RGB), in which we are limited to a choice between the three primary colors of the space.
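The false-color problem that motivates the total order fits in a few lines. A sketch with plain RGB triples (the variable names are this example's):

```python
# Marginal (componentwise) supremum of pure red and pure green yields yellow,
# a "false color" absent from the original set; a lexicographical supremum
# (plain tuple comparison here) always returns a member of the set.
red, green = (255, 0, 0), (0, 255, 0)

marginal_sup = tuple(max(a, b) for a, b in zip(red, green))
lex_sup = max([red, green])  # Python tuples compare lexicographically
```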
It is nevertheless possible to increase the importance of the roles played by the lower levels of the lexicographical order by using a quantization into a smaller number of levels for the components higher up in the order relation. For example, in the order relation with luminance at the top level, if the luminance is represented by 10 levels instead of 255, the saturation at the second level necessarily plays a more important role. Ortiz et al. (2001) use a similar approach, in which a weighting parameter at the first level can augment or reduce its importance.

VI. CONCLUSION

The principal theme of this chapter is the processing and analysis of circular data and of images which contain this type of data. These data
most often represent a set of directions which can be visualized as a set of points on the unit circle. The theory of circular statistics is well developed; we limit ourselves to using simple statistical measures such as the mean and variance, for which we present an extension to the vectorial case in the definition of saturation-weighted hue statistical measures for color images. The principal topic discussed is the development of morphological operators applicable to data on the unit circle. These operators attempt to surmount the two principal disadvantages associated with this data: the absence of an obvious origin and the cyclic nature of the data. We consider four operator formulations:

(1) Operators which require the choice of an origin.
(2) Pseudooperators which use the notion of data grouping.
(3) Circular centered operators which operate on increments.
(4) Operators which begin by labeling an image.
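The simple circular statistics referred to above are the standard ones obtained from the resultant of the unit vectors, and adding weights gives a saturation-weighted hue mean of the kind used for color images. A small sketch (the function name and interface are this example's, not the chapter's):

```python
import math

def circular_mean_var(angles_deg, weights=None):
    """Mean direction (degrees) and circular variance 1 - R of a set of
    angles; optional weights yield, e.g., a saturation-weighted hue mean."""
    if weights is None:
        weights = [1.0] * len(angles_deg)
    c = sum(w * math.cos(math.radians(a)) for a, w in zip(angles_deg, weights))
    s = sum(w * math.sin(math.radians(a)) for a, w in zip(angles_deg, weights))
    r = math.hypot(c, s) / sum(weights)   # mean resultant length R in [0, 1]
    return math.degrees(math.atan2(s, c)) % 360.0, 1.0 - r
```

Note that the arithmetic mean of 350° and 10° is 180°, while the circular mean is 0°, which is exactly why naive statistics fail on directional data.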
For operators applied to circular data, the notion of rotational invariance is very important. One is free to choose the origin of an angular coordinate system at any position, but it is desirable that the results of the morphological operators are not changed by such a change of origin. Even if the values (measured with respect to the origin) of the results change, the directions should stay the same. The development of such rotationally invariant operators is difficult, as requiring this invariance could either lead to the loss of other useful properties (for example, the loss of idempotence for the pseudoopenings and pseudoclosings) or limit us to a small set of operators (for example, the circular centered operators, which are limited to the gradient and top-hat).

The usefulness of these operators is shown in two contexts. For the first, one is free to process the angular data in isolation, which is illustrated by the processing of oriented texture fields and of the phase image of a Fourier transform. For the second, we consider vectorial data for which each vector contains at least one angular value. The latter case is illustrated by the processing of color images represented in the IHLS space, a 3D polar coordinate color system. An oriented texture is a class of texture which has a certain level of orientation specificity at each point, and can be described by a direction field. We use the two top-hat operators adapted to circular data to detect defects associated with singularities in the dominant orientation. Examples are shown for images from the Brodatz album and for images of wooden boards. Finally, the circular centered
gradient is applied to the segmentation of oriented textures, as well as to the extraction of homogeneous regions in Fourier transform phase images.

We then move on to the subject of color images. A more isotropic representation of the RGB space can be made in terms of 3D polar coordinates, for which each coordinate is not linked to a fixed direction in the space. These brightness, saturation, and hue coordinates, the latter being an angular coordinate, are also often more intuitive to use. Before considering the application of mathematical morphology to color images, we examine the multitude of methods described in the literature for doing the simple coordinate system conversion from rectangular to 3D polar coordinates. We show that the principal reason for the proliferation of coordinate conversion algorithms is the (often implicit) normalization of the saturation by the associated brightness function. We suggest the use of the IHLS space, which has independent saturation and brightness coordinates. This allows, for example, the use of a psychovisual luminance function instead of a brightness function.

The most important property when applying mathematical morphology to color images is that the supremum and infimum of a set should be part of that set, thereby avoiding the introduction of false colors. To have this property, we use a total order, the lexicographical order. This order requires that one of the vector components be elevated to a dominant role. To avoid having to use a specific direction in this dominant position, we use the more isotropic 3D polar coordinate system. Due to the vectorial nature of the data, the rotationally invariant morphological operators are not easily applicable. We therefore make the compromise of developing operators which are as rotationally invariant as possible, by placing the hue order at the third level of a lexicographical order.
The possibility of elevating the hue to the first level of importance, thereby creating operators which by design are not rotationally invariant, is also considered. It is shown that the use of a saturation-weighted hue measure at the first level gives better results than the hue alone, due to the close relation between these chrominance characteristics.

The main contribution of this chapter is the development of morphological operators adapted to circular data, and the demonstration of the similarities between the processing of Fourier transform phase images, oriented textures, and color images. Some extensions, both theoretical and practical, remain to be done. These include the investigation of morphological reconstruction operators for color images, the extension of the Fourier transform phase image application, and the investigation of the applicability of these operators to spectrogram processing.
APPENDIX A: CONNECTED PARTITIONS

In Section II.G.1 the definition of a connected partition is presented. Here we repeat this definition, followed by a discussion of the lattice created by the family of connected partitions.

Definition 15 A partition of the space E for which each element is connected is an application D : E → P(E), with connectivity C defined on P(E), such that for all points x and y of E:

(1) x ∈ D(x)
(2) x ≠ y ⟹ D(x) = D(y) or D(x) ∩ D(y) = ∅
(3) D(x) ∈ C

It is known that given two partitions D and D′ (not necessarily with connected classes), the relation D(x) ⊆ D′(x) for all x ∈ E defines an order on the partitions, from which a lattice is derived. If we limit ourselves to the family D_0 of partitions with connected classes, this order relation remains valid, but gives rise to a different lattice. Hence every family {D_i, i ∈ I} of connected partitions has in D_0 a greatest lower bound D, with its class at point x written as

D(x) = γ_x[∩ D_i(x), i ∈ I]

where γ_x is the point connected opening. D(x) is none other than the connected component, containing the point x, of the intersection of the D_i(x). In the same way, the class at point x of the supremum of the D_i is the connected component at x of the smallest set which is a union of classes of D_1, and also of D_2, etc., and which contains the point x.
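The construction of the infimum class D(x) as the connected component of the intersection containing x can be made concrete on a grid. A sketch with partitions given as 2D label arrays and 4-connectivity (pure Python; the names are this example's):

```python
from itertools import product

def connected_partition_inf(p1, p2):
    """Greatest lower bound of two grid partitions given as 2D label arrays:
    the classes are the 4-connected components of the pairwise intersection,
    i.e. D(x) = gamma_x[D1(x) intersect D2(x)]."""
    h, w = len(p1), len(p1[0])
    out = [[None] * w for _ in range(h)]
    label = 0
    for sy, sx in product(range(h), range(w)):
        if out[sy][sx] is not None:
            continue
        cls, stack = (p1[sy][sx], p2[sy][sx]), [(sy, sx)]
        out[sy][sx] = label
        while stack:                       # flood fill one connected class
            y, x = stack.pop()
            for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if (0 <= ny < h and 0 <= nx < w and out[ny][nx] is None
                        and (p1[ny][nx], p2[ny][nx]) == cls):
                    out[ny][nx] = label
                    stack.append((ny, nx))
        label += 1
    return out

# One row: p1 has a single class, p2 splits it. The set intersection of the
# label pairs is disconnected, so the connected infimum has three classes.
p1 = [[0, 0, 0, 0]]
p2 = [[0, 1, 1, 0]]
inf_partition = connected_partition_inf(p1, p2)
```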
APPENDIX B: CYCLIC CLOSINGS ON INDEXED PARTITIONS In Section II.G.2 the notion of an indexed partition is introduced. The definition is repeated here, followed by a presentation of the order relation and the action of increasing operators on these partitions.
Definition 16 An indexed partition of a space E, indexed by a finite number N, is an application D : E → P(E) with a function M : P(E) → [1, 2, ..., N] which associates an index with each element D(x) of the connected partition. To simplify the notation, we define

D(x, i) = D(x) if M[D(x)] = i, and D(x, i) = ∅ otherwise.    (A.1)

The N sets associated with the gamut of indices (hue, direction, etc.) are called phases, and the phase A_i is the union of the partition elements associated with index i:

A_i = ∪{D(x, i), x ∈ E}.    (A.2)

As each point x ∈ E must be associated with an index, there are only N − 1 independent index values: if we know the position of N − 1 phases, the position of the Nth is necessarily known. We therefore limit ourselves to the first N − 1 indices, and consider the relation between two indexed partitions D and D′, defined by

D ≤ D′ ⟺ { D ≤ D′ in the sense of connected partitions, and A_i ⊆ A′_i for i ∈ [1, 2, ..., N − 1] }.    (A.3)

The set D of partitions with N indices is the lattice produced from the N lattices associated with the orders of relation (A.3). This lattice is far from being unique, as the Nth phase plays a particular role, for which we could just as easily choose any of the other phases. Are the orders really all different? In particular, we are interested in the orders of the transformations, most importantly the increasing transformations. Let ψ : D → D be an increasing operation, which hence respects the N inequalities of system (A.3). We obviously have

{A_i ⊆ A′_i ⟹ ψ(A_i) ⊆ ψ(A′_i)} ⟺ {A_i ⊇ A′_i ⟹ ψ(A_i) ⊇ ψ(A′_i)}

and

A_i ⊆ A′_i for i ∈ [1, 2, ..., N − 1] ⟹ {A_N ⊇ A′_N ⟹ ψ(A_N) ⊇ ψ(A′_N)}.

Consequently, if the operator ψ is increasing for one of the lattices D, it is increasing for the others, which all play the same role. Hence the following proposition.

Proposition 17 Given an arbitrary space E and a finite family of N indices, the set D of indexed partitions on E is a complete lattice for every order
defined by the system (A.3), and for all those which are constructed from it by permutation of the indices or by reversing the direction of the inequalities. Every increasing operation ψ : D → D for one of these orders is increasing for all the others.
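The phase decomposition of Equations (A.1) and (A.2), and the redundancy of the Nth phase, can be illustrated directly. A sketch; the dictionary-based index map is this example's choice:

```python
def phases(labels, index_of, n):
    """Phase A_i in the sense of Eq. (A.2): the union, as a set of grid
    points, of the partition classes whose index M[D(x)] equals i."""
    a = [set() for _ in range(n)]
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            a[index_of[lab]].add((y, x))
    return a

labels = [[0, 0, 1],
          [2, 2, 1]]            # a connected partition with three classes
index_of = {0: 0, 1: 1, 2: 2}   # M: class label -> index
a = phases(labels, index_of, 3)
e = {(y, x) for y in range(2) for x in range(3)}
```

Since every point carries exactly one index, the Nth phase is fixed once the other N − 1 phases are known, which is why only N − 1 inclusions appear in relation (A.3).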
ACKNOWLEDGMENTS

The author wishes to thank his colleagues at the Centre for Mathematical Morphology, Paris School of Mines, France and at the PRIP group, Vienna University of Technology, Austria for the many useful discussions leading to the completion of this work. Particular thanks go to Jean Serra, Etienne Decencière and Walter Kropatsch. Part of the texture work presented was undertaken in collaboration with Dmitry Chetverikov of the Hungarian Academy of Sciences, Budapest, Hungary. Thanks also to Florence Boulc'h and Patricia Donnadieu of the Laboratoire de Thermodynamique et Physicochimie Métallurgique, Grenoble, France for the electron microscope images.
REFERENCES

Andersson, M. T., and Knutsson, H. (1992). Orientation estimation in ambiguous neighbourhoods, in Theory & Applications of Image Analysis, edited by P. Johansen and S. Olsen. River Edge, NJ: World Scientific Publishing Co., pp. 189–210.
Barnett, V. (1976). The ordering of multivariate data. J. Royal Statistical Society A 139(3), 318–354.
Bigün, J., Granlund, G. H., and Wiklund, J. (1991). Multidimensional orientation estimation with applications to texture analysis and optical flow. IEEE Trans. Pattern Analysis and Machine Intelligence 13(8), 775–790.
Bloch, I., and Maître, H. (1995). Fuzzy mathematical morphologies: A comparative study. Pattern Recognition 28(9), 1341–1387.
Boulc'h, F., Schouler, M.-C., Donnadieu, P., Chaix, J.-M., and Djurado, E. (2001). Domain size distribution of Y-TZP nano-particles using XRD and HRTEM. Image Analysis and Stereology 20, 157–161.
Brodatz, P. (1966). Textures: A Photographic Album for Artists and Designers. New York: Dover.
Carron, T. (1995). Segmentations d'images couleur dans la base Teinte-Luminance-Saturation: approche numérique et symbolique. Ph.D. thesis, Université de Savoie.
Chanussot, J. (1998). Approches vectorielles ou marginales pour le traitement d'images multicomposantes. Ph.D. thesis, Université de Savoie.
Chanussot, J., and Lambert, P. (1998). Total ordering based on space filling curves for multivalued morphology. Proc. International Symposium on Mathematical Morphology (ISMM '98), pp. 51–58.
Chetverikov, D. (1999). Texture analysis using feature-based pairwise interaction maps. Pattern Recognition 32, 487–502.
Chetverikov, D., and Hanbury, A. (2002). Finding defects in texture using regularity and local orientation. Pattern Recognition 35(10), 2165–2180.
Comer, M. L., and Delp, E. J. (1999). Morphological operations for color image processing. J. Electronic Imaging 8(3), 279–289.
Commission Internationale de l'Éclairage (1987). International Lighting Vocabulary, 4th Edition, No. 17.4. CIE.
Davies, E. R. (1997). Vectorial strategy for designing line segment detectors with high orientation accuracy. Electronics Lett. 33(21), 1775–1777.
Demarty, C.-H. (2000). Segmentation et structuration d'un document vidéo pour la caractérisation et l'indexation de son contenu sémantique. Ph.D. thesis, CMM, Ecole des Mines de Paris.
Demarty, C.-H., and Beucher, S. (1998). Color segmentation algorithm using an HLS transformation. Proc. International Symposium on Mathematical Morphology (ISMM '98), pp. 231–238.
Fisher, N. I. (1993). Statistical Analysis of Circular Data. Cambridge: Cambridge University Press.
Fisher, N. I., and Marron, J. S. (2001). Mode testing via the excess mass estimate. Biometrika 88, 499–517.
Freeman, W. T., and Adelson, E. H. (1991). The design and use of steerable filters. IEEE Trans. Pattern Analysis and Machine Intelligence 13(9), 891–906.
Gonzalez, R. C., and Woods, R. E. (1992). Digital Image Processing. Reading, MA: Addison-Wesley.
Hanbury, A. (2001). Lexicographical order in the HLS colour space. Tech. Rep. N-04/01/MM, CMM, Ecole des Mines de Paris.
Hanbury, A. (2002). Morphologie mathématique sur le cercle unité: avec applications aux teintes et aux textures orientées. Ph.D. thesis, CMM, Ecole des Mines de Paris.
Hanbury, A., and Serra, J. (2001a). Mathematical morphology in the HLS colour space. Proc. British Machine Vision Conference 2001, BMVA, pp. 451–460.
Hanbury, A., and Serra, J. (2001b). Mathematical morphology in the L*a*b* colour space. Tech. Rep. N-36/01/MM, CMM, Ecole des Mines de Paris.
Hanbury, A., and Serra, J. (2001c). Morphological operators on the unit circle. IEEE Trans. Image Processing 10(12), 1842–1850.
Hanbury, A., and Serra, J. (2002a). A 3D-polar coordinate colour representation suitable for image analysis. Tech. Rep. PRIP-TR-077, PRIP, T.U. Wien.
Hanbury, A., and Serra, J. (2002b). Analysis of oriented textures using mathematical morphology, in Vision with Non-Traditional Sensors. Austrian Computer Society, pp. 201–208.
Hanbury, A., and Serra, J. (2002c). Mathematical morphology in the CIELAB space. Image Analysis and Stereology 21, 201–206.
Heijmans, H. (1994). Morphological Image Operators. Boston: Academic Press.
Hÿtch, M. J., Snoeck, E., and Kilaas, R. (1998). Quantitative measurement of displacement and strain fields from HREM micrographs. Ultramicroscopy 74, 131–146.
ITU-R Recommendation BT.709 (1990). Basic parameter values for the HDTV standard for the studio and for international programme exchange. Geneva: ITU.
Iwanowski, M. (2000). Application de la morphologie mathématique pour l'interpolation d'images numériques. Ph.D. thesis, CMM, Ecole des Mines de Paris.
Kass, M., and Witkin, A. (1987). Analyzing oriented patterns. Computer Vision, Graphics and Image Processing 37, 362–385.
Kender, J. R. (1976). Saturation, hue and normalized color: Calculation, digitization effects, and use. Tech. Rep., Department of Computer Science, Carnegie-Mellon University.
Kim, C.-W., and Koivo, A. J. (1994). Hierarchical classification of surface defects on dusty wood boards. Pattern Recognition Lett. 15, 713–721.
Köppen, M., Nowack, C., and Rösel, G. (1999). Pareto-morphology for color image processing. Proc. 11th Scandinavian Conference on Image Analysis 1, 195–202.
Levkowitz, H., and Herman, G. T. (1993). GLHS: A generalised lightness, hue and saturation color model. CVGIP: Graphical Models and Image Processing 55(4), 271–285.
Louverdis, G., Vardavoulia, M. I., Andreadis, I., and Tsalides, P. (2002). A new approach to morphological color image processing. Pattern Recognition 35, 1733–1741.
Mallat, S. (1998). A Wavelet Tour of Signal Processing. London: Academic Press.
Mardia, K. V., and Jupp, P. E. (1999). Directional Statistics, 2nd Edition. Chichester: John Wiley.
Meyer, F. (1999). Graph based morphological segmentation. Proc. 2nd IAPR-TC-15 Workshop on Graph-based Representations, 51–60.
Mlynarczuk, M., Serra, J., Bailly, F., and Bouchet, S. (1998). Segmentation de lames minces polarisées. Tech. Rep. N-48/98/MM, CMM, Ecole des Mines de Paris.
Nickels, K. M., and Hutchinson, S. (1997). Textured image segmentation: Returning multiple solutions. Image and Vision Computing 15, 781–795.
Nikolaidis, N., and Pitas, I. (1998). Nonlinear processing and analysis of angular signals. IEEE Trans. Signal Processing 46(12), 3181–3194.
Niskanen, M., Silvén, O., and Kauppinen, H. (2001). Experiments with SOM based inspection of wood. Proc. International Conference on Quality Control by Artificial Vision (QCAV'2001), pp. 311–316.
Novak, C. L., Shafer, S. A., and Willson, R. G. (1992). Obtaining accurate color images for machine vision research, in Color, edited by G. E. Healey, S. A. Shafer, and L. B. Wolff. Jones and Bartlett, pp. 13–27.
Ortiz, F., Torres, F., Angulo, J., and Puente, S. (2001). Comparative study of vectorial morphological operations in different color spaces. Proc. SPIE on Intelligent Robots and Computer Vision (IRCV) XX: Algorithms, Techniques and Active Vision 4572, 259–268.
Peters II, R. A. (1997). Mathematical morphology for angle-valued images, in Non-Linear Image Processing VIII. SPIE, Vol. 3026.
Picard, R. W., and Gorkani, M. (1994). Finding perceptually dominant orientations in natural textures. Spatial Vision 8(2), 221–253.
Pitas, I., and Tsakalides, P. (1991). Multivariate ordering in color image filtering. IEEE Trans. Circuits and Systems for Video Technology 1(3), 247–259.
Poynton, C. (1999). Frequently asked questions about gamma. URL: http://www.inforamp.net/poynton/PDFs/GammaFAQ.pdf.
Rao, A. R. (1990). A Taxonomy for Texture Description and Identification. New York: Springer-Verlag.
Rao, A. R., and Schunck, B. G. (1991). Computing oriented texture fields. CVGIP: Graphical Models and Image Processing 53(2), 157–185.
Serra, J. (1982). Image Analysis and Mathematical Morphology. London: Academic Press.
Serra, J. (1988). Image Analysis and Mathematical Morphology. Volume 2: Theoretical Advances. London: Academic Press.
Serra, J. (1992). Anamorphosis and function lattices (multivalued morphology), in Mathematical Morphology in Image Processing, edited by E. R. Dougherty. New York: Marcel Dekker, pp. 483–523, Chapter 13.
Shih, T.-Y. (1995). The reversibility of six geometric color spaces. Photogrammetric Engineering and Remote Sensing 61(10), 1223–1232.
Silvén, O., and Kauppinen, H. (1996). Recent developments in wood inspection. Int. J. Pattern Recognition and Artificial Intelligence 10(1), 83–95.
Smith, A. R. (1978). Color gamut transform pairs. Computer Graphics 12(3), 12–19.
Smith, J. R. (1997). Integrated spatial and feature image systems: Retrieval, compression and analysis. Ph.D. thesis, Columbia University.
Soille, P. (1999). Morphological Image Analysis: Principles and Applications. Springer-Verlag.
Talbot, H., Evans, C., and Jones, R. (1998). Complete ordering and multivariate mathematical morphology: Algorithms and applications. Proc. International Symposium on Mathematical Morphology (ISMM'98), pp. 27–34.
Vardavoulia, M. I., Andreadis, I., and Tsalides, P. (2001). A new vector median filter for colour image processing. Pattern Recognition Lett. 22(6–7), 675–689.
Weeks, A. R., and Sartor, L. J. (1999). Color morphological operators using conditional and reduced ordering. Proc. SPIE Conference on Applications of Digital Image Processing XXII. SPIE 3808, 358–366.
Wyszecki, G., and Stiles, W. S. (1982). Color Science: Concepts and Methods, Quantitative Data and Formulae, 2nd Edition. New York: John Wiley.
Zhang, C., and Wang, P. (2000). A new method of color image segmentation based on intensity and hue clustering. Proc. 15th ICPR Barcelona 3, 617–620.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 128
Quantum Tomography

G. MAURO D'ARIANO, MATTEO G. A. PARIS, and MASSIMILIANO F. SACCHI

Quantum Optics and Information Group, Istituto Nazionale per la Fisica della Materia, Unità di Pavia, Dipartimento di Fisica ‘‘A. Volta,’’ Università di Pavia, Italy
I. Introduction ............ 206
II. Wigner Functions and Elements of Detection Theory ............ 209
   A. Wigner Functions ............ 210
   B. Photodetection ............ 213
   C. Balanced Homodyne Detection ............ 215
   D. Heterodyne Detection ............ 218
III. General Tomographic Method ............ 222
   A. Brief Historical Excursus ............ 223
   B. Conventional Tomographic Imaging ............ 224
      1. Extension to the Quantum Domain ............ 225
   C. General Method of Quantum Tomography ............ 227
      1. Basic Statistics ............ 227
      2. Characterization of the Quorum ............ 229
      3. Quantum Estimation for Harmonic Oscillator Systems ............ 232
      4. Some Generalizations ............ 235
      5. Quantum Estimation for Spin Systems ............ 237
      6. Quantum Estimation for a Free Particle ............ 239
   D. Noise Deconvolution and Adaptive Tomography ............ 239
      1. Noise Deconvolution ............ 240
      2. Adaptive Tomography ............ 241
IV. Universal Homodyning ............ 243
   A. Homodyning Observables ............ 243
   B. Noise in Tomographic Measurements ............ 246
      1. Field Intensity ............ 248
      2. Real Field ............ 249
      3. Field Amplitude ............ 249
      4. Phase ............ 251
   C. Comparison between Homodyne Tomography and Heterodyning ............ 253
V. Multimode Homodyne Tomography ............ 255
   A. The General Method ............ 256
      1. Numerical Results for Two-Mode Fields ............ 260
VI. Applications to Quantum Measurements ............ 265
   A. Measuring the Nonclassicality of a Quantum State ............ 266
      1. Single-Mode Nonclassicality ............ 267
      2. Two-Mode Nonclassicality ............ 270
   B. Test of State Reduction ............ 272
   C. Tomography of Coherent Signals and Applications ............ 277
Copyright © 2003, Elsevier Inc. All rights reserved. 1076-5670/2003 $35.00
VII. Tomography of a Quantum Device ............ 281
   A. The Method ............ 283
   B. An Example in the Optical Domain ............ 285
VIII. Maximum Likelihood Method in Quantum Estimation ............ 287
   A. Maximum Likelihood Principle ............ 288
   B. ML Quantum State Estimation ............ 289
   C. Gaussian State Estimation ............ 295
IX. Classical Imaging by Quantum Tomography ............ 298
   A. From Classical to Quantum Imaging ............ 299
Acknowledgements ............ 305
References ............ 305
I. INTRODUCTION

The state of a physical system is the mathematical description that provides complete information on the system. Its knowledge is equivalent to knowing the result of any possible measurement on the system. In classical mechanics it is always possible, at least in principle, to devise a procedure made of multiple measurements which fully recovers the state of the system. In quantum mechanics, on the contrary, this is not possible, due to the fundamental limitations related to the Heisenberg uncertainty principle [1,2] and the no-cloning theorem [3]. In fact, on the one hand one cannot perform an arbitrary sequence of measurements on a single system without inducing on it a back-action of some sort. On the other hand, the no-cloning theorem forbids one to create a perfect copy of the system without already knowing its state in advance. Thus, there is no way, not even in principle, to infer the quantum state of a single system without having some prior knowledge of it [4].

It is possible to estimate the unknown quantum state of a system when many identical copies are available in the same state, so that a different measurement can be performed on each copy. Such a procedure is called quantum tomography. The problem of finding a procedure to determine the state of a system from multiple copies was first addressed in 1957 by Fano [5], who called quorum a set of observables sufficient for a complete determination of the density matrix. However, since for a particle it is difficult to devise concretely measurable observables other than position, momentum, and energy, the fundamental problem of measuring the quantum state remained at the level of mere speculation until almost 10 years ago, when the issue finally entered the realm of experimental physics with the pioneering experiments by Raymer's group [6] in the domain of quantum optics. In quantum optics, in fact, using a
QUANTUM TOMOGRAPHY
balanced homodyne detector one has the unique opportunity of measuring all possible linear combinations of position and momentum of a harmonic oscillator, which here represents a single mode of the electromagnetic field. The first technique to reconstruct the density matrix from homodyne measurements—so-called homodyne tomography—originated from the observation by Vogel and Risken [7] that the collection of probability distributions achieved by homodyne detection is just the Radon transform of the Wigner function W. Therefore, as in classical imaging, by Radon transform inversion one can obtain W, and then from W the matrix elements of the density operator. This first method, however, was affected by uncontrollable approximations, since arbitrary smoothing parameters are needed for the inverse Radon transform. In Ref. [8] the first exact technique was given for measuring experimentally the matrix elements of the density operator in the photon-number representation, by simply averaging functions of homodyne data. After that, the method was further simplified [9], and its feasibility for nonunit quantum efficiency of detectors—above some bounds—was established. The exact homodyne method has been implemented experimentally to measure the photon statistics of a semiconductor laser [10], and the density matrix of a squeezed vacuum [11]. The success of optical homodyne tomography has since stimulated the development of state-reconstruction procedures for atomic beams [12], the experimental determination of the vibrational state of a molecule [13], of an ensemble of helium atoms [14], and of a single ion in a Paul trap [15]. Through quantum tomography the state is perfectly recovered in the limit of an infinite number of measurements, while in the practical finite-measurements case one can always estimate the statistical error that affects the reconstruction.
For infinite dimensions the propagation of statistical errors of the density matrix elements makes them useless for estimating the ensemble average of unbounded operators, and a method for estimating the ensemble average of arbitrary observables of the field without using the density matrix elements has been derived [16]. Further insight on the general method of state reconstruction has led to the generalization of homodyne tomography to any number of modes [17], and then to the extension of the tomographic method from the harmonic oscillator to an arbitrary quantum system using group theory [18–21]. A general data analysis method has been designed in order to unbias the estimation procedure from any known instrumental noise [20]. Moreover, algorithms have been engineered to improve the statistical errors on a given sample of experimental data—the so-called adaptive tomography [22]—and then max-likelihood strategies [23] have been used that dramatically improved the statistical errors; however, this
MAURO D’ARIANO ET AL.
has been at the expense of some bias in the infinite dimensional case, and of exponential complexity versus N for the joint tomography of N quantum systems. The latest technical developments [24] derive the general tomographic method from spanning sets of operators, the previous group theoretical approaches [18–21] being just a particular case of this general method, where the group representation is simply a device to find suitable operator "orthogonality" and "completeness" relations in the linear algebra of operators. Finally, a method for tomographic estimation of the unknown quantum operation of a quantum device has been derived [25]. It uses a single fixed input entangled state, which plays the role of all possible input states in quantum parallel on the tested device, finally making the method a true "quantum radiography" of the functioning of the device. In this chapter we give a self-contained and unified derivation of the methods of quantum tomography, with examples of applications to different kinds of quantum systems, and with particular focus on quantum optics, where some results from experiments are also reexamined. The chapter is organized as follows. In Section II we introduce the generalized Wigner functions [26,27] and we provide the basic elements of detection theory in quantum optics, by giving the description of photodetection, homodyne detection, and heterodyne detection. As we will see, heterodyne detection also provides a method for estimating the ensemble average of polynomials in the field operators; however, it is unsuitable for the density matrix elements in the photon-number representation. The effect of nonunit quantum efficiency is taken into account for all such detection schemes. In Section III we give a brief history of quantum tomography, starting with the first proposal of Vogel and Risken [7] as the extension to the domain of quantum optics of conventional tomographic imaging.
As already mentioned, this method indirectly recovers the state of the system through the reconstruction of the Wigner function, and is affected by uncontrollable bias. The exact homodyne tomography method of Ref. [8] (successively simplified in Ref. [9]) is here presented on the basis of the general tomographic method of spanning sets of operators of Ref. [24]. As another application of the general method, the tomography of spin systems [28] is derived from the group theoretical method of Refs. [18–20]. In this section we also include further developments to improve the method, such as the deconvolution techniques of Ref. [20] to correct the effects of experimental noise by data processing, and the adaptive tomography of Ref. [22] to reduce the statistical fluctuations of tomographic estimators. Section IV is devoted to the evaluation from Ref. [16] of the expectation value of arbitrary operators of a single-mode radiation field via homodyne tomography. Here we also report from Ref. [29] the estimation of the
added noise with respect to the perfect measurement of field observables, for some relevant observables, along with a comparison with the noise that would have been obtained using heterodyne detection. The generalization of Ref. [17] of homodyne tomography to many modes of radiation is reviewed in Section V, where it is shown how tomography of a multimode field can be performed by using only a single local oscillator with a tunable field mode. Some results of Monte Carlo simulations from Ref. [17] are also shown for the state that describes light from parametric downconversion. Section VI reviews some applications of quantum homodyne tomography to perform fundamental tests of quantum mechanics. The first is the proposal of Ref. [30] to measure the nonclassicality of the radiation field. The second is the scheme of Ref. [31] to test the state reduction rule using light from parametric downconversion. Finally, we review some experimental results on tomography of coherent signals, with applications to the estimation of losses introduced by simple optical components [32]. Section VII reviews the tomographic method of Ref. [25] to reconstruct the quantum operation of a device, such as an amplifier or a measuring device, using a single fixed input entangled state, which plays the role of all possible input states in a quantum parallel fashion. Section VIII is devoted to the reconstruction technique of Ref. [23] based on the maximum likelihood principle. As mentioned, for infinite dimensions this method is necessarily biased; however, it is more suited to the estimation of a finite number of parameters, as proposed in Ref. [33], or to state determination in the presence of a very small number of experimental data [23]. Unfortunately, the algorithm of this method has exponential complexity versus the number of quantum systems for the joint tomography of many systems. Finally, in Section IX we briefly review Ref.
[34], showing how quantum tomography could be profitably used as a tool for reconstruction and compression in classical imaging.
II. WIGNER FUNCTIONS AND ELEMENTS OF DETECTION THEORY

In this section we review some simple formulas from Ref. [35] that connect the generalized Wigner functions for s-ordering with the density matrix, and vice versa. These formulas prove very useful for quantum mechanical applications as, for example, for connecting master equations with Fokker–Planck equations, or for evaluating the quantum state from Monte Carlo simulations of Fokker–Planck equations, and finally for studying positivity
of the generalized Wigner functions in the complex plane. Moreover, as we will show in Section III, the first proposal of quantum state reconstruction [7] used the Wigner function as an intermediate step. In the second part of the section we evaluate the probability distribution of the photocurrent of photodetectors, balanced homodyne detectors, and heterodyne detectors. We show that under suitable limits the respective photocurrents provide the measurement of the photon number distribution, of the quadrature, and of the complex amplitude of a single mode of the electromagnetic field. When the effect of nonunit quantum efficiency is taken into account an additional noise affects the measurement, giving a Bernoulli convolution for photodetection, and a Gaussian convolution for homodyne and heterodyne detection. Extensive use of the results in this section will be made in subsequent sections devoted to quantum homodyne tomography.

A. Wigner Functions

Since Wigner's pioneering work [26], generalized phase-space techniques have proved very useful in various branches of physics [36]. As a method to express the density operator in terms of c-number functions, the Wigner functions often lead to considerable simplification of the quantum equations of motion, as, for example, for transforming master equations in operator form into more amenable Fokker–Planck differential equations (see, for example, Ref. [37]). Using the Wigner function one can express quantum mechanical expectation values in the form of averages over the complex plane (the classical phase space), the Wigner function playing the role of a c-number quasiprobability distribution, which generally can also take negative values. More precisely, the original Wigner function allows one to easily evaluate expectations of symmetrically ordered products of the field operators, corresponding to Weyl's quantization procedure [38].
However, with a slight change of the original definition, one defines the generalized s-ordered Wigner function $W_s(\alpha, \alpha^*)$ as follows [27]

$$W_s(\alpha, \alpha^*) = \int_C \frac{d^2\lambda}{\pi^2}\, e^{\alpha\lambda^* - \alpha^*\lambda + (s/2)|\lambda|^2}\, \mathrm{Tr}[D(\lambda)\,\rho], \qquad (1)$$

where $\alpha^*$ denotes the complex conjugate of $\alpha$, the integral is performed on the complex plane with measure $d^2\lambda = d\,\mathrm{Re}\,\lambda\; d\,\mathrm{Im}\,\lambda$, $\rho$ represents the density operator, and

$$D(\alpha) \equiv \exp(\alpha a^\dagger - \alpha^* a) \qquad (2)$$
denotes the displacement operator, where $a$ and $a^\dagger$ ($[a, a^\dagger] = 1$) are the annihilation and creation operators of the field mode of interest. The Wigner functions in Equation (1) allow one to evaluate s-ordered expectation values of the field operators through the following relation

$$\mathrm{Tr}\!\left[\,{:}(a^\dagger)^n a^m{:}_s\; \rho\,\right] = \int_C d^2\alpha\; W_s(\alpha, \alpha^*)\, \alpha^{*n} \alpha^m. \qquad (3)$$
The particular cases $s = -1, 0, 1$ correspond to antinormal, symmetrical, and normal ordering, respectively. In these cases the generalized Wigner functions $W_s(\alpha, \alpha^*)$ are usually denoted by the following symbols and names

$$\frac{1}{\pi} Q(\alpha, \alpha^*) \;\; \text{for } s = -1 \;\; (\text{``Q function''}), \qquad W(\alpha, \alpha^*) \;\; \text{for } s = 0 \;\; (\text{usual Wigner function}), \qquad P(\alpha, \alpha^*) \;\; \text{for } s = 1 \;\; (\text{``P function''}). \qquad (4)$$

For the normal ($s = 1$) and antinormal ($s = -1$) orderings, the following simple relations with the density matrix are well known

$$Q(\alpha, \alpha^*) \equiv \langle \alpha | \rho | \alpha \rangle, \qquad (5)$$

$$\rho = \int_C d^2\alpha\; P(\alpha, \alpha^*)\, |\alpha\rangle\langle\alpha|, \qquad (6)$$
where $|\alpha\rangle$ denotes the customary coherent state $|\alpha\rangle = D(\alpha)|0\rangle$, $|0\rangle$ being the vacuum state of the field. Among the three particular representations (4), the Q function is positive definite and infinitely differentiable (it actually represents the probability distribution for ideal joint measurements of position and momentum of the harmonic oscillator; see Section II.D). On the other hand, the P function is known to be possibly highly singular, and the only pure states for which it is positive are the coherent states [39]. Finally, the usual Wigner function has the remarkable property of providing the probability distribution of the quadratures of the field in the form of a marginal distribution, namely

$$\int_{-\infty}^{+\infty} d\,\mathrm{Im}\,\alpha\; W(\alpha e^{i\varphi}, \alpha^* e^{-i\varphi}) = {}_\varphi\langle \mathrm{Re}\,\alpha |\, \rho\, | \mathrm{Re}\,\alpha \rangle_\varphi, \qquad (7)$$

where $|x\rangle_\varphi$ denotes the (unnormalizable) eigenstate of the field quadrature

$$X_\varphi = \frac{a^\dagger e^{i\varphi} + a\, e^{-i\varphi}}{2} \qquad (8)$$
with real eigenvalue $x$. Notice that any couple of quadratures $X_\varphi$, $X_{\varphi+\pi/2}$ is canonically conjugate, namely $[X_\varphi, X_{\varphi+\pi/2}] = i/2$, and is equivalent to position and momentum of a harmonic oscillator. Usually, negative values of the Wigner function are viewed as a signature of a nonclassical state, the most eloquent example being the Schrödinger-cat state [40], whose Wigner function is characterized by rapid oscillations around the origin of the complex plane. From Equation (1) one can notice that all s-ordered Wigner functions are related to each other through a Gaussian convolution

$$W_s(\alpha, \alpha^*) = \int_C d^2\beta\; W_{s'}(\beta, \beta^*)\, \frac{2}{\pi(s' - s)} \exp\!\left( -\frac{2}{s' - s}\, |\alpha - \beta|^2 \right) \qquad (9)$$

$$= \exp\!\left( \frac{s' - s}{2} \frac{\partial^2}{\partial\alpha\, \partial\alpha^*} \right) W_{s'}(\alpha, \alpha^*), \qquad (s' > s). \qquad (10)$$
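For the vacuum state the s-ordered Wigner functions (for $s < 1$) are the Gaussians $W_s(\alpha, \alpha^*) = [2/(\pi(1-s))] \exp[-2|\alpha|^2/(1-s)]$, so the convolution (9) simply adds Gaussian widths: $(1-s')/2 + (s'-s)/2 = (1-s)/2$. The following Monte Carlo sketch (not part of the original text; plain NumPy, with the illustrative choice $s' = 0$, $s = -1$, i.e., smoothing the usual Wigner function into the Q function) checks this width addition:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000
s_prime, s = 0.0, -1.0   # smooth the Wigner function (s'=0) into the Q function (s=-1)

def complex_normal(var, size):
    """Circular complex Gaussian samples with E|z|^2 = var."""
    return rng.normal(0.0, np.sqrt(var / 2), size) + 1j * rng.normal(0.0, np.sqrt(var / 2), size)

beta = complex_normal((1 - s_prime) / 2, N)    # samples distributed as the vacuum W_{s'}
noise = complex_normal((s_prime - s) / 2, N)   # Gaussian kernel of Equation (9)
alpha = beta + noise                           # samples distributed as the vacuum W_s

# The second moment should equal the vacuum W_s width (1 - s)/2 = 1 for s = -1.
print(np.mean(np.abs(alpha) ** 2))
```

The same sampling picture underlies the statement below that the smoothed functions stay positive: adding independent Gaussian noise to samples of a probability density always yields another probability density.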
Equation (9) shows the positivity of the generalized Wigner function for $s \le -1$, as a consequence of the positivity of the Q function. From a qualitative point of view, the maximum value of $s$ that keeps the generalized Wigner function positive can be considered as an indication of the classical nature of the physical state [41]. An equivalent expression for $W_s(\alpha, \alpha^*)$ can be derived as follows [35]. Equation (1) can be rewritten as

$$W_s(\alpha, \alpha^*) = \mathrm{Tr}\!\left[ \rho\, D(\alpha)\, \hat{W}_s\, D^\dagger(\alpha) \right], \qquad (11)$$

where

$$\hat{W}_s = \int_C \frac{d^2\lambda}{\pi^2}\, e^{(s/2)|\lambda|^2}\, D(\lambda). \qquad (12)$$
Through the customary Baker–Campbell–Hausdorff (BCH) formula

$$\exp A\, \exp B = \exp\!\left( A + B + \tfrac{1}{2}[A, B] \right), \qquad (13)$$

which holds when $[A, [A, B]] = [B, [A, B]] = 0$, one writes the displacement in normal order, and integrating on $\arg(\lambda)$ and $|\lambda|$ one obtains

$$\hat{W}_s = \frac{2}{\pi(1-s)} \sum_{n=0}^{\infty} \frac{1}{n!} \left( \frac{2}{s-1} \right)^{\!n} (a^\dagger)^n a^n = \frac{2}{\pi(1-s)} \left( \frac{s+1}{s-1} \right)^{\!a^\dagger a}, \qquad (14)$$
where we used the normal-ordered forms

$${:}(a^\dagger a)^n{:} = (a^\dagger)^n a^n = a^\dagger a\, (a^\dagger a - 1) \cdots (a^\dagger a - n + 1), \qquad (15)$$

and the identity

$${:}e^{-x a^\dagger a}{:} = \sum_{l=0}^{\infty} \frac{(-x)^l}{l!}\, (a^\dagger)^l a^l = (1 - x)^{a^\dagger a}. \qquad (16)$$
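Since $(a^\dagger)^l a^l |n\rangle = [n!/(n-l)!]\, |n\rangle$ vanishes for $l > n$, the identity (16) reduces to a finite binomial sum on each Fock state and can be checked exactly in a truncated number basis. A small numerical sketch (not part of the original text):

```python
import numpy as np
from math import factorial

x, dim = 0.3, 12   # test value and Fock-space truncation

# Diagonal of (a^+)^l a^l on |n> is n!/(n-l)!, so the sum in (16) stops at l = n.
lhs = np.array([sum((-x) ** l / factorial(l) * (factorial(n) // factorial(n - l))
                    for l in range(n + 1)) for n in range(dim)])

rhs = (1 - x) ** np.arange(dim)   # diagonal of (1 - x)^(a^+ a)
print(np.allclose(lhs, rhs))      # True
```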
The density matrix can be recovered from the generalized Wigner functions using the following expression

$$\rho = \frac{2}{1+s} \int_C d^2\alpha\; W_s(\alpha, \alpha^*)\, e^{-(2/(1+s))|\alpha|^2}\, e^{(2\alpha/(1+s))\, a^\dagger} \left( \frac{s-1}{s+1} \right)^{\!a^\dagger a} e^{(2\alpha^*/(1+s))\, a}. \qquad (17)$$

For the proof of Equation (17) the reader is referred to Ref. [35]. In particular, for $s = 0$ one has the inverse of the Glauber formula

$$\rho = 2 \int_C d^2\alpha\; W(\alpha, \alpha^*)\, D(2\alpha)\, (-1)^{a^\dagger a}, \qquad (18)$$
whereas for $s = 1$ one recovers Equation (6) that defines the P function.

B. Photodetection

Light is revealed by exploiting its interaction with atoms/molecules or electrons in a solid: essentially, each photon ionizes a single atom or promotes an electron to a conduction band, and the resulting charge is then amplified to produce a measurable pulse. In practice, however, available photodetectors do not ideally count all photons, and their performance is limited by a nonunit quantum efficiency $\zeta$, namely only a fraction $\zeta$ of the incoming photons leads to an electric signal, and ultimately to a count: some photons are either reflected from the surface of the detector, or are absorbed without being transformed into electric pulses. Let us consider a light beam entering a photodetector of quantum efficiency $\zeta$, i.e., a detector that transforms just a fraction $\zeta$ of the incoming light pulse into electric signal. If the detector is small with respect to the coherence length of radiation and its window is open for a time interval T, then the Poissonian process of counting gives a probability $p(m; T)$ of revealing $m$ photons written as [42]

$$p(m; T) = \mathrm{Tr}\!\left[ \rho\; {:}\frac{[\zeta I(T) T]^m}{m!} \exp[-\zeta I(T) T]{:} \right], \qquad (19)$$
where $\rho$ is the quantum state of light, ${:}\;{:}$ denotes the normal ordering of field operators, and $I(T)$ is the beam intensity

$$I(T) = \frac{2\epsilon_0 c}{T} \int_0^T \mathbf{E}^{(-)}(\mathbf{r}, t) \cdot \mathbf{E}^{(+)}(\mathbf{r}, t)\, dt, \qquad (20)$$

given in terms of the positive (negative) frequency part of the electric field operator $\mathbf{E}^{(+)}(\mathbf{r}, t)$ ($\mathbf{E}^{(-)}(\mathbf{r}, t)$). The quantity $p(t)\, dt = \zeta\, \mathrm{Tr}[\rho\, I(t)]\, dt$ equals the probability of a single count during the time interval $(t, t + dt)$. Let us now focus our attention on the case of the radiation field excited in a stationary state of a single mode at frequency $\omega$. Equation (19) can be rewritten as

$$p_\eta(m) = \mathrm{Tr}\!\left[ \rho\; {:}\frac{(\eta\, a^\dagger a)^m}{m!} \exp(-\eta\, a^\dagger a){:} \right], \qquad (21)$$
where the parameter $\eta = \zeta\, c\hbar\omega T/V$ denotes the overall quantum efficiency of the photodetector. Using Equations (15) and (16) one obtains

$$p_\eta(m) = \sum_{n=m}^{\infty} \rho_{nn}\, \binom{n}{m}\, \eta^m (1 - \eta)^{n-m}, \qquad (22)$$
where

$$\rho_{nn} \equiv \langle n | \rho | n \rangle = p_{\eta=1}(n). \qquad (23)$$
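The Bernoulli convolution (22) is easy to evaluate numerically. A sketch (not part of the original text) for the illustrative case of a coherent state, whose photon distribution $\rho_{nn}$ is Poissonian; in this case Equation (22) yields again a Poissonian, with mean reduced from $\bar{n}$ to $\eta\bar{n}$:

```python
import numpy as np
from math import comb, exp, factorial

eta, nbar, dim = 0.6, 2.0, 60   # efficiency, mean photon number, truncation

# rho_nn for a coherent state: Poissonian with mean nbar
rho_nn = np.array([exp(-nbar) * nbar ** n / factorial(n) for n in range(dim)])

# Equation (22): p_eta(m) = sum_{n >= m} rho_nn C(n, m) eta^m (1 - eta)^(n - m)
p = np.array([sum(rho_nn[n] * comb(n, m) * eta ** m * (1 - eta) ** (n - m)
                  for n in range(m, dim)) for m in range(dim)])

# Bernoulli thinning of a Poissonian is again Poissonian, with mean eta * nbar.
print(np.isclose(p @ np.arange(dim), eta * nbar))   # True
```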
Hence, for unit quantum efficiency a photodetector measures the photon number distribution of the state, whereas for nonunit quantum efficiency the output distribution of counts is given by a Bernoulli convolution of the ideal distribution. The effects of nonunit quantum efficiency on the statistics of a photodetector, i.e., Equation (22) for the output distribution, can also be described by means of a simple model in which the realistic photodetector is replaced with an ideal photodetector preceded by a beam splitter of transmissivity $\eta$. The reflected mode is absorbed, whereas the transmitted mode is photodetected with unit quantum efficiency. In order to obtain the probability of measuring $m$ clicks, notice that, apart from trivial phase changes, a beam splitter of transmissivity $\eta$ performs the unitary transformation of fields

$$\binom{c}{d} = U^\dagger \binom{a}{b} U = \begin{pmatrix} \sqrt{\eta} & -\sqrt{1-\eta} \\ \sqrt{1-\eta} & \sqrt{\eta} \end{pmatrix} \binom{a}{b}, \qquad (24)$$
where all field modes are considered at the same frequency. Hence, the output mode $c$ hitting the detector is given by the linear combination

$$c = \sqrt{\eta}\, a - \sqrt{1-\eta}\, b, \qquad (25)$$

and the probability of counts reads

$$p_\eta(m) = \mathrm{Tr}\!\left[ U (\rho \otimes |0\rangle\langle 0|) U^\dagger\, (|m\rangle\langle m| \otimes \mathbb{1}) \right] = \sum_{n=m}^{\infty} \rho_{nn}\, \binom{n}{m}\, \eta^m (1-\eta)^{n-m}. \qquad (26)$$
Equation (26) reproduces the probability distribution of Equation (22), with the beam splitter transmissivity playing the role of the quantum efficiency $\eta$. We conclude that a photodetector of quantum efficiency $\eta$ is equivalent to a perfect photodetector preceded by a beam splitter of transmissivity $\eta$, which accounts for the overall losses of the detection process.

C. Balanced Homodyne Detection

The balanced homodyne detector provides the measurement of the quadrature of the field $X_\varphi$ in Equation (8). It was proposed by Yuen and Chan [43], and subsequently demonstrated by Abbas et al. [44]. The scheme of a balanced homodyne detector is depicted in Figure 1. The signal mode $a$ interferes with a strong laser beam mode $b$ in a balanced 50/50 beam splitter. The mode $b$ is the so-called local oscillator (LO) mode of the detector. It operates at the same frequency as $a$, and is excited by the laser in a strong coherent state $|z\rangle$. Since in all experiments that use homodyne detectors the signal and the LO beams are generated by a common source, we assume that they have a fixed phase relation. In this case the LO phase provides a reference for the quadrature measurement, namely we identify the phase of the LO with the phase difference between the two modes. As we will see, by tuning $\varphi = \arg z$ we can measure the quadrature $X_\varphi$ at different phases. After the beam splitter the two modes are detected by two identical photodetectors (usually linear avalanche photodiodes), and finally the difference of photocurrents at zero frequency is electronically processed and rescaled by $2|z|$. According to Equation (24), the modes at the output of the 50/50 beam splitter ($\eta = 1/2$) are written

$$c = \frac{a - b}{\sqrt{2}}, \qquad d = \frac{a + b}{\sqrt{2}}, \qquad (27)$$
FIGURE 1. Scheme of the balanced homodyne detector.
hence the difference of photocurrents is given by the following operator

$$I = \frac{d^\dagger d - c^\dagger c}{2|z|} = \frac{a^\dagger b + b^\dagger a}{2|z|}. \qquad (28)$$
Let us now proceed to evaluate the probability distribution of the output photocurrent $I$ for a generic state $\rho$ of the signal mode $a$. In the following treatment we will follow Refs. [45,46]. Let us consider the moment generating function of the photocurrent $I$

$$\chi(\lambda) = \mathrm{Tr}\!\left[ \rho \otimes |z\rangle\langle z|\; e^{i\lambda I} \right], \qquad (29)$$
which provides the probability distribution of $I$ as the Fourier transform

$$P(I) = \int_{-\infty}^{+\infty} \frac{d\lambda}{2\pi}\, e^{-i\lambda I}\, \chi(\lambda). \qquad (30)$$
Using the BCH formula [47,48] for the SU(2) group, namely

$$\exp\!\left( \zeta a b^\dagger - \zeta^* a^\dagger b \right) = e^{\kappa a b^\dagger}\, \left( 1 + |\kappa|^2 \right)^{(1/2)(b^\dagger b - a^\dagger a)}\, e^{-\kappa^* a^\dagger b}, \qquad \kappa = \frac{\zeta}{|\zeta|} \tan|\zeta|, \qquad (31)$$

one can write the exponential in Equation (29) in normal-ordered form with respect to mode $b$ as follows

$$\chi(\lambda) = \left\langle e^{i\tan(\lambda/(2|z|))\, b^\dagger a}\, \left[ \cos\!\left( \frac{\lambda}{2|z|} \right) \right]^{a^\dagger a - b^\dagger b}\, e^{i\tan(\lambda/(2|z|))\, a^\dagger b} \right\rangle_{ab}. \qquad (32)$$
Since mode $b$ is in a coherent state $|z\rangle$ the partial trace over $b$ can be evaluated as follows

$$\chi(\lambda) = \left\langle e^{i\tan(\lambda/(2|z|))\, z\, a^\dagger}\, \left[ \cos\!\left( \frac{\lambda}{2|z|} \right) \right]^{a^\dagger a}\, e^{i\tan(\lambda/(2|z|))\, z^* a} \right\rangle_a \left\langle z \left|\, \left[ \cos\!\left( \frac{\lambda}{2|z|} \right) \right]^{b^\dagger b} \right| z \right\rangle. \qquad (33)$$

Using now Equation (13), one can rewrite Equation (33) in normal order with respect to $a$, namely

$$\chi(\lambda) = \left\langle e^{iz\sin(\lambda/(2|z|))\, a^\dagger}\, e^{iz^*\sin(\lambda/(2|z|))\, a}\, \exp\!\left[ -2\sin^2\!\left( \frac{\lambda}{4|z|} \right) \left( a^\dagger a + |z|^2 \right) \right] \right\rangle_a. \qquad (34)$$

In the strong-LO limit $z \to \infty$, only the lowest order terms in $\lambda/|z|$ are retained, $a^\dagger a$ is neglected with respect to $|z|^2$, and Equation (34) simplifies as follows

$$\lim_{z\to\infty} \chi(\lambda) = \left\langle e^{i(\lambda/2)e^{i\varphi}\, a^\dagger}\, e^{i(\lambda/2)e^{-i\varphi}\, a}\, e^{-\lambda^2/8} \right\rangle_a = \left\langle \exp(i\lambda X_\varphi) \right\rangle_a, \qquad (35)$$

where $\varphi = \arg z$. The generating function in Equation (35) is then equivalent to the positive operator-valued measure (POVM)

$$\Pi(x) = \int_{-\infty}^{+\infty} \frac{d\lambda}{2\pi}\, \exp[i\lambda(X_\varphi - x)] = \delta(X_\varphi - x) \equiv |x\rangle_{\varphi\,\varphi}\langle x|, \qquad (36)$$
namely the projector on the eigenstate of the quadrature $X_\varphi$ with eigenvalue $x$. In conclusion, the balanced homodyne detector achieves the ideal measurement of the quadrature $X_\varphi$ in the strong LO limit. In this limit, the probability distribution of the output photocurrent $I$ approaches exactly the probability distribution $p(x, \varphi) = {}_\varphi\langle x|\rho|x\rangle_\varphi$ of the quadrature $X_\varphi$, and this for any state $\rho$ of the signal mode $a$. It is easy to take into account nonunit quantum efficiency at the detectors. According to Equation (25) one has the replacements

$$c \Rightarrow \sqrt{\eta}\, c + \sqrt{1-\eta}\, u, \qquad (37)$$

$$d \Rightarrow \sqrt{\eta}\, d + \sqrt{1-\eta}\, v, \qquad u, v \ \text{vacuum modes}, \qquad (38)$$
and now the output current is rescaled by $2\eta|z|$, namely

$$I_\eta \simeq \frac{1}{2|z|} \left\{ \left[ a + \sqrt{\frac{1-\eta}{2\eta}}\, (u + v) \right] b^\dagger + \mathrm{h.c.} \right\}, \qquad (39)$$

where only terms containing the strong LO mode $b$ are retained. The POVM is then obtained by replacing

$$X_\varphi \to X_\varphi + \sqrt{\frac{1-\eta}{2\eta}}\, (u_\varphi + v_\varphi) \qquad (40)$$

in Equation (36), with $w_\varphi = (w^\dagger e^{i\varphi} + w e^{-i\varphi})/2$, $w = u, v$, and tracing over the vacuum modes $u$ and $v$. One then obtains

$$\Pi_\eta(x) = \int_{-\infty}^{+\infty} \frac{d\lambda}{2\pi}\, e^{i\lambda(X_\varphi - x)}\, \left| \langle 0 |\, e^{i\lambda\sqrt{(1-\eta)/(2\eta)}\, u_\varphi}\, | 0 \rangle \right|^2$$
$$= \int_{-\infty}^{+\infty} \frac{d\lambda}{2\pi}\, e^{i\lambda(X_\varphi - x)}\, e^{-\lambda^2 (1-\eta)/(8\eta)}$$
$$= \frac{1}{\sqrt{2\pi\Delta_\eta^2}} \exp\!\left[ -\frac{(x - X_\varphi)^2}{2\Delta_\eta^2} \right]$$
$$= \frac{1}{\sqrt{2\pi\Delta_\eta^2}} \int_{-\infty}^{+\infty} dx'\; e^{-(x - x')^2/(2\Delta_\eta^2)}\; |x'\rangle_{\varphi\,\varphi}\langle x'|, \qquad (41)$$

where

$$\Delta_\eta^2 = \frac{1-\eta}{4\eta}. \qquad (42)$$
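According to Equations (41) and (42), homodyne data at quantum efficiency $\eta$ are distributed as the ideal quadrature outcomes convolved with a Gaussian of variance $\Delta_\eta^2 = (1-\eta)/(4\eta)$. A minimal Monte Carlo sketch (not part of the original text; it assumes a vacuum-state signal, whose ideal quadrature distribution is a zero-mean Gaussian of variance 1/4):

```python
import numpy as np

rng = np.random.default_rng(1)
eta, N = 0.7, 400_000

x_ideal = rng.normal(0.0, 0.5, N)        # vacuum signal: Var[X_phi] = 1/4
delta2 = (1 - eta) / (4 * eta)           # Equation (42)
x_meas = x_ideal + rng.normal(0.0, np.sqrt(delta2), N)

# Measured variance: 1/4 + (1 - eta)/(4 eta) = 1/(4 eta)
print(np.var(x_meas), 1 / (4 * eta))
```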
Thus the POVM, and in turn the probability distribution of the output photocurrent, are just the Gaussian convolution of the ideal ones with rms $\Delta_\eta = \sqrt{(1-\eta)/(4\eta)}$.

D. Heterodyne Detection

Heterodyne detection allows one to perform the joint measurement of two conjugated quadratures of the field [49,50]. The scheme of the heterodyne detector is depicted in Figure 2.
FIGURE 2. Scheme of the heterodyne detector.
A strong local oscillator at frequency $\omega$ in a coherent state $|\beta\rangle$ hits a beam splitter with transmissivity $\tau \to 1$, with the coherent amplitude such that $k \equiv |\beta|\sqrt{1-\tau}$ is kept constant. If the output photocurrent is sampled at the intermediate frequency $\omega_{IF}$, just the field modes $a$ and $b$ at frequencies $\omega \pm \omega_{IF}$ are selected by the detector. Modes $a$ and $b$ are usually referred to as signal band and image band modes, respectively. In the strong LO limit, upon tracing out the LO mode, the output photocurrent $I(\omega_{IF})$ rescaled by $k$ is equivalent to the complex operator

$$Z = \frac{I(\omega_{IF})}{k} = a - b^\dagger, \qquad (43)$$
where the arbitrary phases of the modes have been suitably chosen. The heterodyne photocurrent $Z$ is a normal operator, equivalent to a couple of commuting selfadjoint operators

$$Z = \mathrm{Re}\, Z + i\, \mathrm{Im}\, Z, \qquad [Z, Z^\dagger] = [\mathrm{Re}\, Z, \mathrm{Im}\, Z] = 0. \qquad (44)$$
The POVM of the detector is then given by the orthogonal eigenvectors of $Z$. It is here convenient to introduce the notation of Ref. [51] for vectors in the tensor product of Hilbert spaces $\mathcal{H} \otimes \mathcal{H}$

$$|A\rangle\rangle = \sum_{nm} A_{nm}\, |n\rangle \otimes |m\rangle \equiv (A \otimes I)\, |I\rangle\rangle \equiv (I \otimes A^\tau)\, |I\rangle\rangle, \qquad (45)$$

where $|I\rangle\rangle = \sum_n |n\rangle \otimes |n\rangle$, and $A^\tau$ denotes the transposed operator with respect to some prechosen orthonormal basis. Equation (45) exploits the isomorphism between the Hilbert space of the Hilbert–Schmidt operators $A, B \in \mathrm{HS}(\mathcal{H})$ with scalar product $\langle A, B \rangle = \mathrm{Tr}[A^\dagger B]$, and the Hilbert space of bipartite vectors $|A\rangle\rangle, |B\rangle\rangle \in \mathcal{H} \otimes \mathcal{H}$, where one has $\langle\langle A | B \rangle\rangle \equiv \langle A, B \rangle$.
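The double-ket notation (45) is plain vectorization, and the two identities $(A \otimes I)|I\rangle\rangle = |A\rangle\rangle = (I \otimes A^\tau)|I\rangle\rangle$, together with $\langle\langle A|B\rangle\rangle = \mathrm{Tr}[A^\dagger B]$, can be verified in a few lines (a sketch, not part of the original text; real transposition $A^\tau$ corresponds to `A.T` on the coefficient matrix):

```python
import numpy as np

d = 4
rng = np.random.default_rng(2)
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
B = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))

I_ket = np.eye(d).reshape(-1)   # |I>> = sum_n |n> (x) |n>
A_ket = A.reshape(-1)           # |A>> = sum_nm A_nm |n> (x) |m>  (row-major layout)

print(np.allclose(np.kron(A, np.eye(d)) @ I_ket, A_ket))                    # True
print(np.allclose(np.kron(np.eye(d), A.T) @ I_ket, A_ket))                  # True
print(np.isclose(np.vdot(A_ket, B.reshape(-1)), np.trace(A.conj().T @ B)))  # True
```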
Using the above notation it is easy to write the eigenvectors of $Z$ with eigenvalue $z$ as $(1/\sqrt{\pi})\, |D(z)\rangle\rangle$. In fact one has [52]

$$Z\, |D(z)\rangle\rangle = (a - b^\dagger)(D_a(z) \otimes I_b)\, |I\rangle\rangle = (D_a(z) \otimes I_b)(a - b^\dagger + z) \sum_{n=0}^{\infty} |n\rangle \otimes |n\rangle = z\, (D_a(z) \otimes I_b)\, |I\rangle\rangle = z\, |D(z)\rangle\rangle. \qquad (46)$$
The orthogonality of such eigenvectors can be verified through the relation

$$\langle\langle D(z) | D(z') \rangle\rangle = \mathrm{Tr}[D^\dagger(z) D(z')] = \pi\, \delta^{(2)}(z - z'), \qquad (47)$$

where $\delta^{(2)}(\alpha)$ denotes the Dirac delta function over the complex plane

$$\delta^{(2)}(\alpha) = \int_C \frac{d^2\lambda}{\pi^2}\, \exp(\lambda^* \alpha - \lambda \alpha^*). \qquad (48)$$
In conventional heterodyne detection the image band mode is in the vacuum state, and one is just interested in measuring the field mode $a$. In this case we can evaluate the POVM upon tracing on mode $b$. One has

$$\Pi(z, z^*) = \frac{1}{\pi}\, \mathrm{Tr}_b\!\left[ |D(z)\rangle\rangle\langle\langle D(z)|\; I_a \otimes |0\rangle\langle 0| \right] = \frac{1}{\pi}\, D(z)\, |0\rangle\langle 0|\, D^\dagger(z) = \frac{1}{\pi}\, |z\rangle\langle z|, \qquad (49)$$
namely one obtains the projectors on coherent states. The coherent-state POVM provides the optimal joint measurement of conjugated quadratures of the field [53]. In fact, heterodyne detection allows one to measure the Q function in Equation (4). According to Equation (3) it then provides the expectation value of antinormally ordered field operators. For a state $\rho$ the expectation value of any quadrature $X_\varphi$ is obtained as

$$\langle X_\varphi \rangle = \mathrm{Tr}[\rho X_\varphi] = \int_C \frac{d^2\alpha}{\pi}\, \mathrm{Re}(\alpha e^{-i\varphi})\, Q(\alpha, \alpha^*). \qquad (50)$$
The price to pay for jointly measuring noncommuting observables is an additional noise. The rms fluctuation is evaluated as follows

$$\int_C \frac{d^2\alpha}{\pi}\, \left[ \mathrm{Re}(\alpha e^{-i\varphi}) \right]^2 Q(\alpha, \alpha^*) - \langle X_\varphi \rangle^2 = \langle \Delta X_\varphi^2 \rangle + \frac{1}{4}, \qquad (51)$$
where $\langle \Delta X_\varphi^2 \rangle$ is the intrinsic noise, and the additional term is usually referred to as "the additional 3 dB noise due to the joint measure" [54–56]. The effect of nonunit quantum efficiency can be taken into account in an analogous way as in Section II.C for homodyne detection. The heterodyne photocurrent is rescaled by an additional factor $\eta^{1/2}$, and vacuum modes $u$ and $v$ are introduced, thus giving [57]

$$Z_\eta = a - b^\dagger + \sqrt{\frac{1-\eta}{\eta}}\, u - \sqrt{\frac{1-\eta}{\eta}}\, v^\dagger. \qquad (52)$$
Upon tracing over the modes $u$ and $v$, one obtains the POVM

$$\Pi_\eta(z, z^*) = \int_C \frac{d^2\gamma}{\pi^2}\; {}_u\langle 0|\, {}_v\langle 0|\; e^{\gamma(Z_\eta^\dagger - z^*) - \gamma^*(Z_\eta - z)}\; |0\rangle_u\, |0\rangle_v$$
$$= \int_C \frac{d^2\gamma}{\pi^2}\; e^{\gamma(Z^\dagger - z^*) - \gamma^*(Z - z)}\; e^{-((1-\eta)/\eta)\,|\gamma|^2} \qquad (53)$$
$$= \frac{\eta}{\pi(1-\eta)}\, e^{-(\eta/(1-\eta))\,|Z - z|^2} = \int_C \frac{d^2 z'}{\pi}\; \frac{e^{-|z' - z|^2/\Delta_\eta^2}}{\pi \Delta_\eta^2}\; |D(z')\rangle\rangle\langle\langle D(z')|.$$

The probability distribution is then a Gaussian convolution on the complex plane of the ideal probability, with variance $\Delta_\eta^2 = (1-\eta)/\eta$. Analogously, the coherent-state POVM for conventional heterodyne detection with vacuum image band mode is replaced with

$$\Pi_\eta(z, z^*) = \int_C \frac{d^2 z'}{\pi^2 \Delta_\eta^2}\; e^{-|z' - z|^2/\Delta_\eta^2}\; |z'\rangle\langle z'|. \qquad (54)$$
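Equation (54) states that inefficient heterodyne outcomes are the ideal, Q-function-distributed outcomes smeared by a complex Gaussian of variance $\Delta_\eta^2 = (1-\eta)/\eta$. A Monte Carlo sketch (not part of the original text; it assumes a coherent-state signal $|\alpha_0\rangle$, for which the ideal outcome $z$ is Gaussian around $\alpha_0$ with unit variance):

```python
import numpy as np

rng = np.random.default_rng(3)
eta, N, alpha0 = 0.8, 300_000, 1.5 + 0.5j

def complex_normal(var, size):
    """Circular complex Gaussian samples with E|z|^2 = var."""
    return rng.normal(0.0, np.sqrt(var / 2), size) + 1j * rng.normal(0.0, np.sqrt(var / 2), size)

z_ideal = alpha0 + complex_normal(1.0, N)              # ideal outcomes: Q(z)/pi for |alpha0>
z_meas = z_ideal + complex_normal((1 - eta) / eta, N)  # smearing of Equation (54)

# Total width: 1 + (1 - eta)/eta = 1/eta
print(np.mean(np.abs(z_meas - alpha0) ** 2), 1 / eta)
```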
From Equation (9) we can equivalently say that the heterodyne detection probability density is given by the generalized Wigner function $W_s(\alpha, \alpha^*)$ with $s = 1 - 2/\eta$. Notice that for $\eta < 1$ the average of functions $\alpha^n \alpha^{*m}$ is related to the expectation value of a different ordering of field operators. However, one has the relevant identity [27,58]

$${:}(a^\dagger)^n a^m{:}_s = \sum_{k=0}^{(n,m)} k!\, \binom{n}{k} \binom{m}{k} \left( \frac{t-s}{2} \right)^{\!k} {:}(a^\dagger)^{n-k} a^{m-k}{:}_t\,, \qquad (55)$$
where $(n, m) = \min(n, m)$, and then

$$\int_C d^2\alpha\; W_{1-2/\eta}(\alpha, \alpha^*)\, \alpha^m \alpha^{*n} = \sum_{k=0}^{(n,m)} k!\, \binom{n}{k} \binom{m}{k} \left( \frac{1-\eta}{\eta} \right)^{\!k} \left\langle a^{m-k} (a^\dagger)^{n-k} \right\rangle. \qquad (56)$$
Notice that the measurement of the Q function (or of any smoothed version for $\eta < 1$) does not allow one to recover the expectation value of an arbitrary operator through an average over the heterodyne outcomes. In fact, one needs the admissibility of the antinormal ordered expansion [59] and the convergence of the integral in Equation (56). In particular, the matrix elements of the density operator cannot be recovered. For some operators for which heterodyne measurement is allowed, a comparison with quantum homodyne tomography will be given in Section IV.C. Finally, it is worth mentioning that all results of this section are also valid for an image-band mode at the same frequency as the signal. In this case a measurement scheme based on multiport homodyne detection should be used [50,58,60–66].
III. GENERAL TOMOGRAPHIC METHOD

In this section we review the general tomographic method of spanning sets of operators of Ref. [24], and re-derive in this general framework the exact homodyne tomography method of Ref. [8]. We first give a brief history of quantum tomography, starting with the original proposal of Vogel and Risken [7], which extended conventional tomographic imaging to the domain of quantum optics. Here we will briefly sketch the conventional imaging tomographic method, and show the analogy with the method of Ref. [7]. The latter achieves the quantum state via the Wigner function, which in turn is obtained by inverse Radon transform of the homodyne probability distributions for varying phase with respect to the LO. As already mentioned, the Radon transform inversion is affected by uncontrollable bias: such limitations and the intrinsic unreliability of this method are thoroughly explained in the same section. In contrast to the Radon transform method, the first exact method of Ref. [8] (successively refined in Ref. [9]) allows the reconstruction of the density matrix $\rho$, bypassing the step of the Wigner function, and achieving the matrix elements of $\rho$—or the expectation of any arbitrary operator—by
just averaging the pertaining estimators (also called kernel functions or pattern functions), evaluated on the experimental homodyne data. This method will be re-derived in Section III.C.3 as a special case of the general tomographic method of Ref. [24], here reviewed in Section III.C, where we introduce the concept of "quorum," which is the complete set of observables whose measurement provides the expectation value of any desired operator. Here we also show how some "orthogonality" and "completeness" relations in the linear algebra of operators are sufficient to identify a quorum. As another application of the general method, in Section III.C.5 the tomography of spin systems [28] is reviewed, which was originally derived from the group theoretical methods of Refs. [18–20]. Another application is the quantum tomography of a free particle state, given in Section III.C.6. In Section III.D we include some further developments to improve the tomographic method, such as the deconvolution techniques of Ref. [20] to correct the imperfections of detectors and experimental apparatus with a suitable data processing, and the adaptive tomography of Ref. [22] to reduce the statistical fluctuations of tomographic estimators, by adapting the averaged estimators to the given sample of experimental data. The other relevant topics of homodyning observables, multimode tomography, and tomography of quantum operations will be given a separate treatment in the following sections of the chapter.
A. Brief Historical Excursus

The problem of quantum state determination through repeated measurements on identically prepared systems was originally stated by Fano in 1957 [5], who first recognized the need for measuring more than two noncommuting observables to achieve such a purpose. However, it was only with the proposal by Vogel and Risken [7] that quantum tomography was born. The first experiments, which already showed reconstructions of coherent and squeezed states, were performed by Raymer and his group at the University of Oregon [6]. The main idea at the basis of the first proposal is that it is possible to extend to the quantum domain the algorithms that are conventionally used in medical imaging to recover two-dimensional (mass) distributions from unidimensional projections in different directions. This first tomographic method, however, was unreliable for the reconstruction of an unknown quantum state, since arbitrary smoothing parameters were needed in the Radon transform-based imaging procedure. The first exact unbiased tomographic method was proposed in Ref. [8], and successively simplified in Ref. [9]. Since then, the new exact method has
224
MAURO D’ARIANO ET AL.
been practically implemented in many experiments, such as the measurement of the photon statistics of a semiconductor laser [10], and the reconstruction of the density matrix of a squeezed vacuum [11]. The success of optical homodyne tomography has since stimulated the development of state-reconstruction procedures in other quantum harmonic oscillator systems, such as atomic beams [12], the vibrational state of a molecule [13], an ensemble of helium atoms [14], and a single ion in a Paul trap [15]. After the original exact method, quantum tomography has been generalized to the estimation of arbitrary observables of the field [16], to any number of modes [17], and, finally, to arbitrary quantum systems via group theory [18–21], with further improvements such as noise deconvolution [20], adaptive tomographic methods [22], and the use of maximum-likelihood strategies [23], which has made it possible to reduce dramatically the number of experimental data, by a factor of 10³–10⁵, with negligible bias for most practical cases of interest. Finally, a method for the tomographic estimation of the unknown quantum operation of a quantum device has been proposed [25], where a fixed input entangled state plays the role of all input states in a sort of quantum parallel fashion. Moreover, as another manifestation of such quantum parallelism, one can also estimate the ensemble average of all operators by measuring only one fixed ''universal'' observable on an extended Hilbert space, in a sort of quantum hologram [67]. This latest development is based on the general tomographic method of Ref. [24], where the tomographic reconstruction relies on the existence of spanning sets of operators, of which the irreducible unitary group representations of the group methods of Refs. [18–21] are just a special case.

B. Conventional Tomographic Imaging

In conventional medical tomography, one collects data in the form of marginal distributions of the mass function m(x, y).
In the complex plane the marginal r(x, φ) is a projection of the complex function m(x, y) on the direction indicated by the angle φ ∈ [0, π], namely

$$r(x,\varphi)=\int_{-\infty}^{+\infty}\frac{dy}{\pi}\;m\big((x+iy)e^{i\varphi},(x-iy)e^{-i\varphi}\big).\tag{57}$$
The collection of marginals for different ’ is called ‘‘Radon transform.’’ The tomography process essentially consists in the inversion of the Radon transform (57), in order to recover the mass function m(x, y) from the marginals r(x, ’).
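As a minimal numerical illustration of the Radon transform of Equation (57) (a sketch with an assumed test distribution and grid, not part of the original text), the marginals of an isotropic two-dimensional Gaussian can be computed by direct line integration; by symmetry they are identical for every projection angle, and each is a normalized one-dimensional distribution:

```python
import numpy as np

# Numerical sketch: marginals r(x, phi) of a sample mass distribution m(x, y),
# computed as line integrals along the direction orthogonal to angle phi
# (the Radon transform of Eq. (57), in rotated Cartesian coordinates).
def radon_marginal(m, xs, phi, n_steps=400):
    ts = np.linspace(-5.0, 5.0, n_steps)        # integration variable
    dt = ts[1] - ts[0]
    r = np.empty_like(xs)
    for i, x in enumerate(xs):
        px = x * np.cos(phi) - ts * np.sin(phi)  # rotated coordinates
        py = x * np.sin(phi) + ts * np.cos(phi)
        r[i] = np.sum(m(px, py)) * dt
    return r

# Isotropic 2D Gaussian: every marginal is the same 1D Gaussian,
# independently of the projection angle phi.
m = lambda x, y: np.exp(-(x**2 + y**2) / 2) / (2 * np.pi)
xs = np.linspace(-5, 5, 201)
r0 = radon_marginal(m, xs, 0.0)
r1 = radon_marginal(m, xs, np.pi / 3)
print(np.allclose(r0, r1, atol=1e-6))            # rotation invariance
print(abs(r0.sum() * (xs[1] - xs[0]) - 1.0) < 1e-2)  # each marginal is normalized
```

For a nonsymmetric distribution, the marginals differ with φ, and it is precisely this φ-dependence that the inversion derived below exploits.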
QUANTUM TOMOGRAPHY
Here we derive the inversion of Equation (57). Consider the identity

$$m(\alpha,\alpha^{*})=\int_{\mathbb{C}}d^{2}\beta\;\delta^{(2)}(\alpha-\beta)\,m(\beta,\beta^{*}),\tag{58}$$
where δ^(2)(α) denotes the Dirac delta function of Equation (48), and m(α, α*) ≡ m(x, y) with α = x + iy and α* = x − iy. It is convenient to rewrite Equation (48) as follows

$$\delta^{(2)}(\alpha)=\int_{0}^{+\infty}\frac{dk\,k}{4}\int_{0}^{2\pi}\frac{d\varphi}{\pi^{2}}\,e^{ik\alpha_{\varphi}}=\int_{-\infty}^{+\infty}\frac{dk\,|k|}{4}\int_{0}^{\pi}\frac{d\varphi}{\pi^{2}}\,e^{ik\alpha_{\varphi}},\tag{59}$$
with α_φ = Re(α e^{−iφ}), so that α_{φ+π} = −α_φ. Then, from Equations (58) and (59) the inverse Radon transform is obtained as follows:

$$m(x,y)=\int_{0}^{\pi}\frac{d\varphi}{\pi}\int_{-\infty}^{+\infty}dx'\,r(x',\varphi)\int_{-\infty}^{+\infty}\frac{dk}{4}\,|k|\,e^{ik(x'-\alpha_{\varphi})}.\tag{60}$$
Equation (60) is conventionally written as

$$m(x,y)=\int_{0}^{\pi}\frac{d\varphi}{\pi}\int_{-\infty}^{+\infty}dx'\,r(x',\varphi)\,K(x'-\alpha_{\varphi}),\tag{61}$$
where K(x) is given by

$$K(x)\equiv\int_{-\infty}^{+\infty}\frac{dk}{4}\,|k|\,e^{ikx}=\frac{1}{2}\,\mathrm{Re}\int_{0}^{+\infty}dk\,k\,e^{ikx}=-\frac{1}{2}\,\mathcal{P}\frac{1}{x^{2}},\tag{62}$$
with 𝒫 denoting the Cauchy principal value. Integrating Equation (61) by parts, one obtains the tomographic formula that is usually found in medical imaging, i.e.,

$$m(x,y)=-\frac{1}{2\pi}\int_{0}^{\pi}d\varphi\;\mathcal{P}\int_{-\infty}^{+\infty}dx'\,\frac{1}{x'-\alpha_{\varphi}}\,\frac{\partial}{\partial x'}\,r(x',\varphi),\tag{63}$$
which allows the reconstruction of the mass distribution m(x, y) from its projections r(x, φ) along different directions.

1. Extension to the Quantum Domain

In the ''quantum imaging'' process the goal is to reconstruct a quantum state in the form of its Wigner function starting from its marginal probability distributions. As shown in Section II.A, the Wigner function is a
real normalized function that is in one-to-one correspondence with the state density operator ρ. As noticed in Equation (7), the probability distributions of the quadrature operators X_φ = (a†e^{iφ} + ae^{−iφ})/2 are the marginal probabilities of the Wigner function for the state ρ. Thus, by applying the same procedure outlined in the previous subsection, Vogel and Risken [7] proposed a method to recover the Wigner function via an inverse Radon transform from the quadrature probability distributions p(x, φ), namely

$$W(x,y)=\int_{0}^{\pi}\frac{d\varphi}{\pi}\int_{-\infty}^{+\infty}dx'\,p(x',\varphi)\int_{-\infty}^{+\infty}\frac{dk}{4}\,|k|\,e^{ik(x'-x\cos\varphi-y\sin\varphi)}.\tag{64}$$
(Surprisingly, in the original paper [7] the connection to the tomographic imaging method was never mentioned.) As shown in Section II.C, the experimental measurement of the quadratures of the field is obtained using the homodyne detector. The method proposed by Vogel and Risken, namely the inversion of the Radon transform, was the one used in the first experiments [6]. This first method is, however, not reliable for the reconstruction of an unknown quantum state, due to the intrinsic unavoidable systematic error related to the fact that the integral over k in Equation (64) is unbounded. In fact, in order to evaluate the inverse Radon transform, one would need the analytical form of the marginal distribution of the quadrature p(x, φ), which, in turn, can only be obtained by collecting the experimental data into histograms and then ''spline-ing'' them. This, of course, is not an unbiased procedure, since the degree of spline-ing, the width and the number of the histogram bins, and finally the number of different phases used to collect the experimental data sample introduce systematic errors if they are not set above some minimal values, which actually depend on the unknown quantum state that one wants to reconstruct. Typically, over-spline-ing will wash out the quantum features of the state, whereas, vice versa, under-spline-ing will create negative photon probabilities in the reconstruction (see Ref. [8] for details). A new exact method was then proposed in Ref. [8] as an alternative to the Radon transform technique. This approach, referred to as quantum homodyne tomography, allows one to recover the quantum state of the field (along with any ensemble average of arbitrary operators) by directly averaging functions of the homodyne data, abolishing the intermediate step of the Wigner function, which is the source of all systematic errors. Only statistical errors are present, and they can be reduced arbitrarily by collecting more experimental data.
This exact method will be re-derived from the general tomographic theory in Section III.C.3.
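The direct-averaging idea can be illustrated with a small numerical sketch (the state, estimator, and sample size below are assumptions for illustration, not the chapter's algorithm). For a coherent state, the homodyne outcome at phase φ is Gaussian with mean Re(αe^{−iφ}) and vacuum-limited variance 1/4, and the standard kernel 2x² − 1/2 averaged over random-phase homodyne data gives an unbiased estimate of the mean photon number ⟨n⟩ = |α|²:

```python
import numpy as np

# Illustrative sketch: "directly averaging functions of the homodyne data."
# For a coherent state |alpha>, X_phi is Gaussian with mean Re(alpha e^{-i phi})
# and variance 1/4; averaging the estimator 2 x^2 - 1/2 over random phases
# yields the mean photon number <n> = |alpha|^2 (assumed toy simulation).
rng = np.random.default_rng(0)
alpha, N = 1.5, 200_000

phi = rng.uniform(0.0, np.pi, N)                       # random quadrature phases
x = rng.normal((alpha * np.exp(-1j * phi)).real, 0.5)  # homodyne outcomes
n_est = np.mean(2 * x**2 - 0.5)                        # average the estimator

print(abs(n_est - alpha**2) < 0.05)                    # close to |alpha|^2 = 2.25
```

No histogramming, smoothing, or Wigner-function step appears anywhere: only statistical error remains, and it shrinks with the number of samples.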
C. General Method of Quantum Tomography

In this section the general method of quantum tomography is explained in detail. First, we give the basics of the Monte Carlo integral theory that is needed to implement the tomographic algorithms in actual experiments and in numerical simulations. Then, we derive the formulas on which all schemes of state reconstruction are based.

1. Basic Statistics

The aim of quantum tomography is to estimate, for an arbitrary quantum system, the mean value ⟨O⟩ of a system operator O using only the results of the measurements on a set of observables {Q_λ, λ ∈ Λ}, called the ''quorum.'' The procedure by which this can be obtained needs the estimator or ''kernel function'' R[O](x, λ), which is a function of the eigenvalues x of the quorum operators. Integrating the estimator with the probability p(x, λ) of having outcome x when measuring Q_λ, the mean value of O is obtained as follows

$$\langle O\rangle=\int_{\Lambda}d\lambda\int d\mu_{\lambda}(x)\,p(x,\lambda)\,R[O](x,\lambda),\tag{65}$$
where the first integral is performed over the values of λ that designate all quorum observables, and the second over the eigenvalues of the quorum observable Q_λ determined by the λ variable of the outer integral. For a discrete set Λ and/or a discrete spectrum of the quorum, the integrals in (65) are suitably replaced by sums. The algorithm to estimate ⟨O⟩ with Equation (65) is the following. One chooses a quorum operator Q_λ by drawing λ with uniform probability in Λ and performs a measurement, obtaining the result x_i. By repeating the procedure N times, one collects the set of experimental data {(x_i, λ_i), i = 1, . . . , N}, where λ_i identifies the quorum observable used for the ith measurement, and x_i its result. From the same set of data the mean value of any operator O can be obtained. In fact, one evaluates the estimator of ⟨O⟩ for the quorum Q_λ, and then samples the double integral of (65) using the limit

$$\langle O\rangle=\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}R[O](x_i,\lambda_i).\tag{66}$$
Of course the finite sum

$$F_N=\frac{1}{N}\sum_{i=1}^{N}R[O](x_i,\lambda_i)\tag{67}$$
gives an approximation of ⟨O⟩. To estimate the error in the approximation one applies the central limit theorem, which we recall here.

Central limit theorem. Consider N statistically uncorrelated random variables {z_i, i = 1, . . . , N}, with mean values μ(z_i), variances σ²(z_i), and bounded third-order moments. If the variances σ²(z_i) are all of the same order, then the statistical variable ''average''

$$y_N=\frac{1}{N}\sum_{i=1}^{N}z_i\tag{68}$$

has mean and variance

$$\mu(y_N)=\frac{1}{N}\sum_{i=1}^{N}\mu(z_i),\qquad \sigma^{2}(y_N)=\frac{1}{N^{2}}\sum_{i=1}^{N}\sigma^{2}(z_i).\tag{69}$$
The distribution of y_N approaches asymptotically a Gaussian for N → ∞. In practical cases, the distribution of y_N can be considered Gaussian already for N as low as N ≈ 10.

For our needs the hypotheses are met if the estimator R[O](x_i, λ_i) in Equation (67) has bounded moments up to the third order, since, even though the x_i have different probability densities depending on λ_i, nevertheless, since λ_i is also random, all the z_i, here given by

$$z_i=R[O](x_i,\lambda_i),\tag{70}$$

have common mean

$$\mu(z_i)=\langle O\rangle\tag{71}$$

and variance

$$\sigma^{2}(z_i)=\int_{\Lambda}d\lambda\int d\mu_{\lambda}(x)\,p(x,\lambda)\,R^{2}[O](x,\lambda)-\langle O\rangle^{2}.\tag{72}$$
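The sampling procedure of Equations (65)–(72) can be sketched on a toy model (the distributions and estimator below are assumptions chosen only so that ⟨O⟩ is known exactly; they do not correspond to a physical quorum):

```python
import numpy as np

# Toy sketch of Eqs. (65)-(72): lambda drawn uniformly in [0, pi], outcome x
# Gaussian with mean cos(lambda), estimator R(x, lambda) = 2 x cos(lambda).
# The double average equals <O> = (1/pi) * int 2 cos^2(lambda) d(lambda) = 1.
rng = np.random.default_rng(1)

def run(N):
    lam = rng.uniform(0.0, np.pi, N)
    x = rng.normal(np.cos(lam), 1.0)
    z = 2 * x * np.cos(lam)              # z_i = R[O](x_i, lambda_i), Eq. (70)
    y = z.mean()                         # y_N, Eq. (68)
    err = z.std(ddof=1) / np.sqrt(N)     # statistical error, ~ N^{-1/2}
    return y, err

y1, e1 = run(10_000)
y2, e2 = run(1_000_000)
print(abs(y1 - 1.0) < 5 * e1)            # estimate consistent with <O> = 1
print(e2 < e1)                           # error shrinks with growing N
```

The common mean of the z_i is the target ⟨O⟩, and the spread of their average shrinks as the sample grows, exactly as the central limit theorem prescribes.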
Using the central limit theorem, we can conclude that the experimental average y_N ≡ F_N in Equation (67) is a statistical variable distributed as a Gaussian with mean value μ(y_N) = μ(z_i) = ⟨O⟩ and variance σ²(y_N) = σ²(z_i)/N. Then the tomographic estimation converges with statistical error that
decreases as N^{−1/2}. A statistically precise estimate of the confidence interval is given by

$$\varepsilon_N=\sqrt{\frac{\sum_{i=1}^{N}(z_i-y_N)^{2}}{N(N-1)}},\tag{73}$$
with z_i given by Equation (70) and y_N by Equation (68). In order to test that the confidence intervals are estimated correctly, one can check that the F_N distribution is actually Gaussian. This can be done by comparing the histogram of the block data with a Gaussian, or by using the χ² test.

2. Characterization of the Quorum

Different estimation techniques have been proposed, tailored to different quantum systems, such as the radiation field [9,17], trapped ions and molecular vibrational states [68], and spin systems [69]. All the known quantum estimation techniques can be embodied in the following approach. The tomographic reconstruction of an operator O is possible when there exists a resolution of the form
$$O=\int_{\Lambda}d\lambda\;\mathrm{Tr}\big[O\,B^{\dagger}(\lambda)\big]\,C(\lambda),\tag{74}$$
where λ is a (possibly multidimensional) parameter on a (discrete or continuous) manifold Λ. The only hypothesis in (74) is the existence of the trace. If, for example, O is a trace-class operator, then we do not need to require B(λ) to be of Hilbert–Schmidt class, since it is sufficient to require B(λ) bounded. The operators C(λ) are functions of the quorum of observables measured for the reconstruction, whereas the operators B(λ) form the dual basis of the set C(λ). The term

$$E[O](\lambda)=\mathrm{Tr}\big[O\,B^{\dagger}(\lambda)\big]\,C(\lambda)\tag{75}$$
represents the quantum estimator for the operator O. The expectation value of O is given by the ensemble average

$$\langle O\rangle\equiv\mathrm{Tr}[O\rho]=\int_{\Lambda}d\lambda\,\big\langle E[O](\lambda)\big\rangle=\int_{\Lambda}d\lambda\;\mathrm{Tr}\big[O\,B^{\dagger}(\lambda)\big]\,\mathrm{Tr}\big[C(\lambda)\,\rho\big],\tag{76}$$
where ρ is the density matrix of the quantum system under investigation. Notice that the quantity Tr[C(λ)ρ] depends only on the quantum state, and it is related to the probability distribution of the measurement outcomes,
whereas the term Tr[O B†(λ)] depends only on the quantity to be measured. In particular, the tomography of the quantum state of a system corresponds to writing Equation (74) for the operators O = |k⟩⟨n|, {|n⟩} being a given Hilbert space basis. For a given system, the existence of a set of operators C(λ), together with its dual basis B(λ), allows universal quantum estimation, i.e., the reconstruction of any operator. We now give two characterizations of the sets B(λ) and C(λ) that are necessary and sufficient conditions for writing Equation (74).

Condition 1: bi-orthogonality. Let us consider a complete orthonormal basis of vectors |n⟩ (n = 0, 1, . . .). Equation (74) is equivalent to the bi-orthogonality condition

$$\int_{\Lambda}d\lambda\;\langle q|B^{\dagger}(\lambda)|p\rangle\,\langle m|C(\lambda)|l\rangle=\delta_{mp}\,\delta_{lq},\tag{77}$$
where δ_{ij} is the Kronecker delta. Equation (77) can be straightforwardly generalized to a continuous basis.

Condition 2: completeness. If the set of operators C(λ) is complete, namely if any operator can be written as a linear combination of the C(λ) as

$$O=\int_{\Lambda}d\lambda\;a(\lambda)\,C(\lambda),\tag{78}$$
then Equation (74) is also equivalent to the trace condition

$$\mathrm{Tr}\big[B^{\dagger}(\lambda)\,C(\mu)\big]=\delta(\lambda,\mu),\tag{79}$$
where δ(λ, μ) is a reproducing kernel for the set B(λ), namely a function or a tempered distribution which satisfies

$$\int_{\Lambda}d\lambda\;B(\lambda)\,\delta(\lambda,\mu)=B(\mu).\tag{80}$$
An analogous identity holds for the set of C(λ):

$$\int_{\Lambda}d\lambda\;C(\lambda)\,\delta(\lambda,\mu)=C(\mu).\tag{81}$$
The proofs are straightforward. The completeness condition on the operators C(λ) is essential for the equivalence of (74) and (79). A simple counterexample is provided by the set of projectors P(λ) = |λ⟩⟨λ| over the eigenstates of a self-adjoint operator L. In fact, Equation (79) is satisfied by C(λ) = B(λ) ≡ P(λ). However, since they do not form a complete set in the
sense of Equation (78), it is not possible to express a generic operator in the form O = ∫ dλ ⟨λ|O|λ⟩ |λ⟩⟨λ|. If either the set B(λ) or the set C(λ) satisfies the additional trace condition

$$\mathrm{Tr}\big[B^{\dagger}(\lambda)\,B(\mu)\big]=\delta(\lambda,\mu),\tag{82}$$

$$\mathrm{Tr}\big[C^{\dagger}(\lambda)\,C(\mu)\big]=\delta(\lambda,\mu),\tag{83}$$
then we have C(λ) = B(λ) (notice that neither B(λ) nor C(λ) need be unitary). In this case, Equation (74) can be rewritten as

$$O=\int_{\Lambda}d\lambda\;\mathrm{Tr}\big[O\,C^{\dagger}(\lambda)\big]\,C(\lambda).\tag{84}$$
A set of observables Q_λ constitutes a quorum when there are functions f_λ(Q_λ) = C(λ) such that the C(λ) form an irreducible set. The quantum estimator for O in Equation (75) is then written as a function of the quorum operators

$$E[O](\lambda)\equiv E_{\lambda}[O](Q_{\lambda}).\tag{85}$$
Notice that if a set of observables Q_λ constitutes a quorum, then the set of projectors |q⟩_λλ⟨q| over their eigenvectors provides a quorum too, with the measure dλ in Equation (74) including the measure dμ_λ(q). Notice also that, even once the quorum has been fixed, the unbiased estimator for an operator O will not in general be unique, since there can exist functions 𝒩(Q_λ) that satisfy [22]

$$\int_{\Lambda}d\lambda\;\mathcal{N}(Q_{\lambda})=0,\tag{86}$$
and that will be called ''null estimators.'' Two unbiased estimators that differ by a null estimator yield the same results when estimating the operator mean value. We will see in Section III.D.2 how the null estimators can be used to reduce the statistical noise. In terms of the quorum observables Q_λ, Equation (76) is rewritten as

$$\langle O\rangle=\int_{\Lambda}d\lambda\;\mathrm{Tr}\big[O\,B^{\dagger}(\lambda)\big]\,\mathrm{Tr}\big[\rho\,f_{\lambda}(Q_{\lambda})\big]=\int_{\Lambda}d\lambda\int d\mu_{\lambda}(q)\,p(q,\lambda)\,\mathrm{Tr}\big[O\,B^{\dagger}(\lambda)\big]\,f_{\lambda}(q),\tag{87}$$
where p(q, λ) = _λ⟨q|ρ|q⟩_λ is the probability density of getting the outcome q from the measurement of Q_λ on the state ρ. Equation (87) is equivalent to the expression (65), with estimator

$$R[O](q,\lambda)=\mathrm{Tr}\big[O\,B^{\dagger}(\lambda)\big]\,f_{\lambda}(q).\tag{88}$$

Of course it is of interest to connect a quorum of observables to a resolution of the form (74), since only in this case can there be a feasible reconstruction scheme. If a resolution formula is written in terms of a set of self-adjoint operators, the set itself constitutes the desired quorum. However, in general a quorum of observables is functionally connected to the corresponding resolution formula. If the operators C(λ) are unitary, then they can always be taken as the exponential map of a set of self-adjoint operators, which then are identified with our quorum Q_λ. The quantity Tr[C(λ)ρ] is thus connected with the moment generating function of the set Q_λ, and hence with the probability density p(q, λ) of the measurement outcomes, which plays the role of the Radon transform in the quantum tomography of the harmonic oscillator. In general, the operators C(λ) can be any function (neither self-adjoint nor unitary) of observables and, even more generally, they may be connected to POVMs rather than observables.

The dual set B(λ) can be obtained from the set C(λ) by solving Equation (79). For finite quorums, this reduces to a matrix inversion. An alternative procedure uses the Gram–Schmidt orthogonalization [24]. No such general procedure exists for a continuous spanning set. Many cases, however, satisfy conditions (82) and (83), and thus we can write B(λ) = C†(λ).

3. Quantum Estimation for Harmonic Oscillator Systems

The harmonic oscillator models several systems of interest in quantum mechanics, such as the vibrational states of molecules, the motion of an ion in a Paul trap, and a single-mode radiation field.
Different proposals have been suggested in order to reconstruct the quantum state of a harmonic system, and they all fit the framework of the previous section, which is also useful for devising novel estimation techniques. Here, the basic resolution formula involves the set of displacement operators D(α) = exp(αa† − α*a), which can be viewed as exponentials of the field-quadrature operators X_φ = (a†e^{iφ} + ae^{−iφ})/2. We have shown in Section II.C that for a single-mode radiation field X_φ is measured through homodyne detection. For the vibrational tomography of a molecule or a trapped ion, X_φ corresponds to a time-evolved position or momentum. The set of displacement operators
satisfies Equations (79) and (83), since

$$\mathrm{Tr}\big[D(\alpha)\,D^{\dagger}(\beta)\big]=\pi\,\delta^{(2)}(\alpha-\beta),\tag{89}$$
whereas Equation (84) reduces to the Glauber formula

$$O=\int_{\mathbb{C}}\frac{d^{2}\alpha}{\pi}\;\mathrm{Tr}\big[O\,D^{\dagger}(\alpha)\big]\,D(\alpha).\tag{90}$$
Changing to polar variables α = (i/2)k e^{iφ}, Equation (90) becomes

$$O=\int_{0}^{\pi}\frac{d\varphi}{\pi}\int_{-\infty}^{+\infty}\frac{dk\,|k|}{4}\;\mathrm{Tr}\big[O\,e^{ikX_{\varphi}}\big]\,e^{-ikX_{\varphi}},\tag{91}$$
which shows explicitly the dependence on the quorum X_φ. Taking the ensemble average of both members and evaluating the trace over the set of eigenvectors of X_φ, one obtains

$$\langle O\rangle=\int_{0}^{\pi}\frac{d\varphi}{\pi}\int_{-\infty}^{+\infty}dx\;p(x,\varphi)\,R[O](x,\varphi),\tag{92}$$
where p(x, φ) = _φ⟨x|ρ|x⟩_φ is the probability distribution of the quadrature outcomes. The estimator of the operator ensemble average ⟨O⟩ is given by

$$R[O](x,\varphi)=\mathrm{Tr}\big[O\,K(X_{\varphi}-x)\big],\tag{93}$$
where K(x) is the same as in Equation (62). Equation (92) is the basis of quantum homodyne tomography. Notice that even though K(x) is unbounded, the matrix element ⟨ψ|K(X_φ − x)|ψ⟩ can be bounded, whence it can be used to sample the matrix element ⟨ψ|ρ|ψ⟩ of the state ρ, which, according to Section III.C.1, is directly obtained by averaging the estimator (93) over homodyne experimental values. In fact, for bounded ⟨ψ|K(X_φ − x)|ψ⟩, the central limit theorem guarantees that

$$\langle\psi|\rho|\psi\rangle=\int_{0}^{\pi}\frac{d\varphi}{\pi}\int_{-\infty}^{+\infty}dx\;p(x,\varphi)\,\langle\psi|K(X_{\varphi}-x)|\psi\rangle\tag{94}$$

$$=\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\langle\psi|K(X_{\varphi_{n}}-x_{n})|\psi\rangle,\tag{95}$$
where x_n is the homodyne outcome measured at phase φ_n and distributed with probability p(x, φ). Systematic errors are eliminated by choosing each phase φ_n at which the homodyne measurement is performed at random. As shown in Section III.C.1, for a finite number of measurements N, the estimate (95) of the integral in Equation (94) is Gaussian distributed around the true value ⟨ψ|ρ|ψ⟩, with statistical error decreasing as N^{−1/2}. Notice that the measurability of the density operator matrix element depends only on the boundedness of the matrix element of the estimator, and that no adjustable parameters are needed in the procedure, which thus is unbiased.

The general procedure for noise deconvolution is presented in Section III.D.1. However, we give here the main result for the density matrix reconstruction. As shown in Section II.C, the effect of nonunity quantum efficiency η in homodyne detectors is a Gaussian convolution of the ideal probability p(x, φ), as

$$p_{\eta}(x,\varphi)=\sqrt{\frac{2\eta}{\pi(1-\eta)}}\int_{-\infty}^{+\infty}dx'\,e^{-\frac{2\eta}{1-\eta}(x-x')^{2}}\,p(x',\varphi).\tag{96}$$
The tomographic reconstruction procedure still holds upon replacing p(x, φ) with p_η(x, φ), so that

$$\rho=\int_{0}^{\pi}\frac{d\varphi}{\pi}\int_{-\infty}^{+\infty}dx\;p_{\eta}(x,\varphi)\,K_{\eta}(X_{\varphi}-x),\tag{97}$$
where now the estimator is

$$K_{\eta}(x)=\frac{1}{2}\,\mathrm{Re}\int_{0}^{+\infty}dk\;k\,e^{\frac{1-\eta}{8\eta}k^{2}+ikx}.\tag{98}$$
In fact, by taking the Fourier transform of both members of Equation (96), one can easily check that

$$\rho=\int_{0}^{\pi}\frac{d\varphi}{\pi}\int_{-\infty}^{+\infty}dx\;p_{\eta}(x,\varphi)\,K_{\eta}(X_{\varphi}-x)=\int_{0}^{\pi}\frac{d\varphi}{\pi}\int_{-\infty}^{+\infty}dx\;p(x,\varphi)\,K(X_{\varphi}-x).\tag{99}$$
Notice that the anti-Gaussian in Equation (98) causes a much slower convergence of the Monte Carlo integral (97): the statistical fluctuations increase exponentially for decreasing detector efficiency η. In order to
achieve good reconstructions with nonideal detectors, one then has to collect a larger number of data. It is clear from Equation (95) that the measurability of the density matrix depends on the chosen representation and on the quantum efficiency of the detectors. For example, for the reconstruction of the density matrix in the Fock basis the estimators are given by

$$R_{\eta}[|n\rangle\langle n+d|](x,\varphi)=\int_{-\infty}^{+\infty}\frac{dk\,|k|}{4}\,e^{\frac{1-\eta}{8\eta}k^{2}-ikx}\,\langle n+d|e^{ikX_{\varphi}}|n\rangle$$
$$=e^{id(\varphi+\pi/2)}\sqrt{\frac{n!}{(n+d)!}}\int_{-\infty}^{+\infty}dk\,|k|\,e^{\frac{1-2\eta}{2\eta}k^{2}-2ikx}\,k^{d}\,L_{n}^{d}(k^{2}),\tag{100}$$

where L_n^d(x) denotes the generalized Laguerre polynomials. Notice that the estimator is bounded only for η > 1/2, and below this bound the method would give unbounded statistical errors. However, this bound is well below the values that are reasonably achieved in the laboratory, where actual homodyne detectors have efficiencies ranging between 70% and 90% [11,70]. Moreover, a more efficient algorithm is available that uses the factorization formulas holding for η = 1 [71,72]:

$$R[|n\rangle\langle n+d|](x,\varphi)=e^{id\varphi}\Big[4x\,u_{n}(x)v_{n+d}(x)-2\sqrt{n+1}\,u_{n+1}(x)v_{n+d}(x)-2\sqrt{n+d+1}\,u_{n}(x)v_{n+d+1}(x)\Big],\tag{101}$$
where u_j(x) and v_j(x) are the normalizable and unnormalizable eigenfunctions of the harmonic oscillator with eigenvalue j, respectively. The noise from quantum efficiency can be unbiased via the inversion of the Bernoulli convolution, which holds for η > 1/2 [73].

The use of Equation (92) to estimate arbitrary operators through homodyne tomography will be the subject of Section IV. Notice that Equation (90) cannot be used for unbounded operators; however, the estimators for some unbounded operators will be derived in Section IV.A.

4. Some Generalizations

Using condition (79) one can see that the Glauber formula can be generalized to

$$O=\int_{\mathbb{C}}\frac{d^{2}\alpha}{\pi}\;\mathrm{Tr}\big[O\,F_{1}\,D(\alpha)\,F_{2}\big]\,F_{2}^{-1}\,D^{\dagger}(\alpha)\,F_{1}^{-1},\tag{102}$$
where F₁ and F₂ are two generic invertible operators. By choosing F₁† = F₂ = S(ζ), where S(ζ) is the squeezing operator

$$S(\zeta)=\exp\Big[\tfrac{1}{2}\big(\zeta\,a^{\dagger 2}-\zeta^{*}a^{2}\big)\Big],\qquad \zeta\in\mathbb{C},\tag{103}$$
we obtain the tomographic resolution

$$\langle O\rangle=\int_{0}^{\pi}\frac{d\varphi}{\pi}\int_{-\infty}^{+\infty}dx\;p_{\zeta}(x,\varphi)\;\mathrm{Tr}\big[O\,K(X_{\varphi}^{\zeta}-x)\big],\tag{104}$$
in terms of the probability distribution p_ζ(x, φ) of the generalized squeezed quadrature operators

$$X_{\varphi}^{\zeta}=S^{\dagger}(\zeta)\,X_{\varphi}\,S(\zeta)=\tfrac{1}{2}\Big[(\mu e^{i\varphi}+\nu e^{-i\varphi})\,a^{\dagger}+(\mu e^{-i\varphi}+\nu^{*}e^{i\varphi})\,a\Big],\tag{105}$$
with μ = cosh|ζ| and ν = sinh|ζ| exp[2i arg(ζ)]. Such an estimation technique has been investigated in detail in Ref. [74].

A different estimation technique can be obtained by choosing in Equation (102) F₁ = I, the identity operator, and F₂ = (−1)^{a†a}, the parity operator. In this case one gets

$$O=\int_{\mathbb{C}}\frac{d^{2}\alpha}{\pi}\;\mathrm{Tr}\big[O\,D^{\dagger}(\alpha)\,(-1)^{a^{\dagger}a}\big]\,(-1)^{a^{\dagger}a}\,D(\alpha).\tag{106}$$
Changing variable to α = 2β and using the relation

$$(-1)^{a^{\dagger}a}\,D(2\beta)=D^{\dagger}(\beta)\,(-1)^{a^{\dagger}a}\,D(\beta),\tag{107}$$
it follows that

$$\langle O\rangle=\int_{\mathbb{C}}\frac{d^{2}\beta}{\pi}\;\mathrm{Tr}\big[O\,4D^{\dagger}(\beta)\,(-1)^{a^{\dagger}a}\,D(\beta)\big]\;\mathrm{Tr}\big[D(\beta)\,\rho\,D^{\dagger}(\beta)\,(-1)^{a^{\dagger}a}\big].\tag{108}$$
Hence, it is possible to estimate ⟨O⟩ by repeated measurement of the parity operator on displaced versions of the state under investigation. An approximate implementation of this technique for a single-mode radiation field has been suggested in Refs. [75,76] through the measurement of the photon number probability on states displaced by a beam splitter. A similar
scheme has been used for the experimental determination of the motional quantum state of a trapped atom [15]. In comparison with the approximate methods, Equation (108) allows one to obtain directly the estimator R[O](β) for any operator O for which the trace exists. For instance, the reconstruction of the density matrix in the Fock representation is obtained by averaging the estimator

$$R[|n\rangle\langle n+d|](\beta)=4\,\langle n+d|D^{\dagger}(\beta)\,(-1)^{a^{\dagger}a}\,D(\beta)|n\rangle=4\,(-1)^{n+d}\sqrt{\frac{n!}{(n+d)!}}\,(2\beta)^{d}\,e^{-2|\beta|^{2}}\,L_{n}^{d}(4|\beta|^{2}),\tag{109}$$
without the need of an artificial cutoff in the Fock space [15].

5. Quantum Estimation for Spin Systems

The spin tomographic methods of Refs. [20,28,69] allow the reconstruction of the quantum state of a spin system. These methods utilize measurements of the spin in different directions, i.e., the quorum is the set of operators of the form S⃗ · n⃗, where S⃗ is the spin operator and n⃗ ≡ (cos φ sin ϑ, sin φ sin ϑ, cos ϑ) is a varying unit vector. Different quorums can be used that exploit different sets of directions. The easiest choice for the set of directions n⃗ is to consider all possible directions. The procedure to derive the tomographic formulas for this quorum is analogous to the one employed in Section III.C.3 for homodyne tomography. The reconstruction formula of spin tomography for the estimation of an arbitrary operator O is

$$\langle O\rangle=\sum_{m=-s}^{s}\int_{\Omega}\frac{d\vec{n}}{4\pi}\;p(m,\vec{n})\,R[O](m,\vec{n}),\tag{110}$$
where p(m, n⃗) is the probability of obtaining the eigenvalue m when measuring the spin along direction n⃗, R[O](m, n⃗) is the tomographic estimator for the operator O, and Ω is the unit sphere. In this case the operators C(λ) of Equation (74) are given by the set of projectors over the eigenstates |m, n⃗⟩ of the operators S⃗ · n⃗. Notice that this is a complete set of operators on the system Hilbert space H. In order to find the dual basis B, one must consider the unitary operators obtained by exponentiating the quorum, i.e., D(ψ, n⃗) = exp(iψ S⃗ · n⃗), which satisfy the bi-orthogonality condition (77). In fact, D(ψ, n⃗) constitutes a unitary irreducible representation of the group G = SU(2), and the bi-orthogonality condition is just
the orthogonality relation between the matrix elements of the group representation [77], i.e.,

$$\int_{G}dg\;D_{jr}(g)\,D^{\dagger}_{tk}(g)=\frac{V}{d}\,\delta_{jk}\,\delta_{tr},\tag{111}$$
Z
Z d n~
D
2p
d sin2
2
0
jjei
n~S~
ED jr tjei
n~S~
E jk ¼ jk tr ,
ð112Þ
and hence the spin tomography identity is given by

$$O=\frac{2s+1}{4\pi^{2}}\int_{\Omega}d\vec{n}\int_{0}^{2\pi}d\psi\,\sin^{2}\frac{\psi}{2}\;\mathrm{Tr}\big[O\,D^{\dagger}(\psi,\vec{n})\big]\,D(\psi,\vec{n}).\tag{113}$$
Notice the analogy between Equation (113) and Glauber's formula (90). In fact, both homodyne and spin tomography can be derived using the method of group tomography [20], the underlying groups being the Weyl–Heisenberg group and the SU(2) group, respectively. Formula (110) is obtained from Equation (113) through the expectation value calculated on the eigenstates of S⃗ · n⃗. Thus, the explicit form of the tomographic estimator is

$$R[O](m,\vec{n})=\frac{2s+1}{\pi}\int_{0}^{2\pi}d\psi\,\sin^{2}\frac{\psi}{2}\;\mathrm{Tr}\big[O\,e^{i\psi\,\vec{n}\cdot\vec{S}}\big]\,e^{-i\psi m}.\tag{114}$$
As already noticed, there are other possible quorums for spin tomography. For example, for spin s = 1/2 systems, a self-dual basis for the operator space is given by the identity and the Pauli matrices. In fact, from the properties Tr[σ_α] = 0 and σ_α σ_β = δ_{αβ} I + i ε_{αβγ} σ_γ (α, β, γ = x, y, z),
both the bi-orthogonality relation (77) and the trace condition (79) follow. In this case the reconstruction formula reads

$$\langle O\rangle=\frac{1}{2}\,\mathrm{Tr}[O]+\frac{1}{2}\sum_{\alpha=x,y,z}\;\sum_{m=-1/2}^{1/2}2m\;p(m,\vec{n}_{\alpha})\,\mathrm{Tr}[O\,\sigma_{\alpha}],\tag{115}$$
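The spin-1/2 formula can be checked with a few lines of linear algebra (a sketch: the test state and observable below are assumptions, and exact Born probabilities are used in place of measured frequencies):

```python
import numpy as np

# Sketch: spin-1/2 reconstruction from the Pauli quorum.
# <O> = (1/2)Tr[O] + (1/2) sum_alpha sum_m 2m p(m, n_alpha) Tr[O sigma_alpha],
# with p(m, n_alpha) the probability of outcome m = +-1/2 along axis alpha.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])   # assumed test state
O = np.array([[0.5, 1.0], [1.0, -0.25]], dtype=complex)  # assumed observable

est = 0.5 * np.trace(O)
for s in (sx, sy, sz):
    vals, vecs = np.linalg.eigh(s)                  # eigenvalues -1, +1 (= 2m)
    for col, two_m in zip(vecs.T, vals):
        p = (col.conj() @ rho @ col).real           # Born probability p(m, n_alpha)
        est += 0.5 * two_m * p * np.trace(O @ s)
print(np.isclose(est.real, np.trace(O @ rho).real))  # True: matches Tr[O rho]
```

The inner sum over m reproduces ⟨σ_α⟩, so the formula is just the Bloch decomposition ρ = (I + Σ_α ⟨σ_α⟩σ_α)/2 read backwards; with measured frequencies instead of exact probabilities the same code gives the tomographic estimate.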
In the case of a generic spin s system, Weigert has also shown [69] that by choosing (2s + 1)² arbitrary directions n⃗_j, it is possible to obtain (in almost all cases) a quorum of projectors |s, n⃗_j⟩⟨s, n⃗_j| ( j = 1, . . . , (2s + 1)²), where |s, n⃗_j⟩ is the eigenstate pertaining to the maximum eigenvalue s of S⃗ · n⃗_j.

6. Quantum Estimation for a Free Particle

The state of a moving packet can be inferred from position measurements at different times [78]. Assuming a particle with unit mass and using normalized units ℏ/2 = 1, the free Hamiltonian is the square of the momentum operator, H_F = p². In terms of the eigenvectors |x⟩ of the position operator and of the self-adjoint operator

$$R(x,\tau)=e^{ip^{2}\tau}\,|x\rangle\langle x|\,e^{-ip^{2}\tau},\tag{116}$$
the probability density of the position of the free particle at time τ is obtained as p(x, τ) = Tr[ρ R(x, τ)]. The operators R(x, τ) provide a self-dual basis, and an arbitrary particle state ρ can be written as

$$\rho=\int_{\mathbb{R}}\int_{\mathbb{R}}dx\,d\tau\;p(x,\tau)\,R(x,\tau).\tag{117}$$
D. Noise Deconvolution and Adaptive Tomography

In this section we will analyze: (1) the noise deconvolution scheme of Refs. [20,79], which allows one to eliminate the experimental noise that arises from imperfect detection and lossy devices; and (2) the adaptive tomography technique of Ref. [22], which allows one to tune the unbiased tomographic estimators to a specific sample of experimental data, in order to reduce the statistical noise.
1. Noise Deconvolution

In short, it is possible to eliminate the detection noise whenever the noise map can be inverted. A noise process is described by a trace-preserving, completely positive map Γ. The noise can be deconvolved at the data-analysis stage if

- the inverse of Γ exists, namely Γ⁻¹ : L(H) → L(H), with Γ⁻¹[Γ[O]] = O for all O ∈ L(H);
- the estimator E_λ[O](Q_λ) is in the domain of Γ⁻¹;
- the map Γ⁻¹[E_λ[O](Q_λ)] is a function of Q_λ.
If the above conditions are met, we can recover the ''ideal'' expectation value ⟨O⟩ that we would get without noise. This is achieved by replacing E_λ[O](Q_λ) with Γ⁻¹[E_λ[O](Q_λ)], and evaluating the ensemble average with the state Γ_*(ρ), namely the state affected by the noise (Γ_* represents the dual map, which provides the evolution in the Schrödinger picture). Hence, one has

$$\langle O\rangle=\int_{\Lambda}d\lambda\;\mathrm{Tr}\big[\Gamma^{-1}\big[E_{\lambda}[O](Q_{\lambda})\big]\,\Gamma_{*}(\rho)\big]\equiv\int_{\Lambda}d\lambda\;\big\langle\Gamma^{-1}\big[E_{\lambda}[O](Q_{\lambda})\big]\big\rangle_{\Gamma}.\tag{118}$$
Consider, for example, the noise arising from nonunity quantum efficiency η of homodyne detectors. Recall that the ideal probability density is replaced by a Gaussian convolution with rms Δ_η, where Δ_η² = (1 − η)/(4η). Then, the map Γ_η acts on the quorum as follows

$$\Gamma_{\eta}\big[e^{ikX_{\varphi}}\big]=\int_{-\infty}^{+\infty}dx\;e^{ikx}\,\Gamma_{\eta}\big[|x\rangle_{\varphi\,\varphi}\langle x|\big]=\int_{-\infty}^{+\infty}dx\int_{-\infty}^{+\infty}dx'\;e^{ikx}\,\frac{e^{-(x-x')^{2}/2\Delta_{\eta}^{2}}}{\sqrt{2\pi\Delta_{\eta}^{2}}}\,|x'\rangle_{\varphi\,\varphi}\langle x'|=e^{-\frac{1}{2}\Delta_{\eta}^{2}k^{2}}\,e^{ikX_{\varphi}}.\tag{119}$$
Of course one has

$$\Gamma_{\eta}^{-1}\big[e^{ikX_{\varphi}}\big]=e^{\frac{1}{2}\Delta_{\eta}^{2}k^{2}}\,e^{ikX_{\varphi}}.\tag{120}$$
In terms of the Fourier transform of the estimator

$$\tilde{R}[O](y,\varphi)=\int_{-\infty}^{+\infty}\frac{dx}{2\pi}\;e^{-ixy}\,R[O](x,\varphi),\tag{121}$$
one has

$$\tilde{R}_{\eta}[O](y,\varphi)=e^{\frac{1}{2}\Delta_{\eta}^{2}y^{2}}\,\tilde{R}[O](y,\varphi).\tag{122}$$
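A quick numerical sketch of this Gaussian deconvolution (assumed toy density and grid, not the chapter's data processing): the noise map multiplies each Fourier component by exp(−Δ²k²/2), so multiplying by exp(+Δ²k²/2) in Fourier space undoes it exactly:

```python
import numpy as np

# Toy sketch of Eqs. (119)-(122): convolve a density with a Gaussian of rms
# delta (the noise map), then deconvolve by dividing its Fourier transform by
# exp(-delta^2 k^2 / 2). Grid, delta, and test density are assumptions.
n, L, delta = 256, 20.0, 0.1
x = np.linspace(-L / 2, L / 2, n, endpoint=False)
k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)

p = np.exp(-(x - 1.0) ** 2) + 0.5 * np.exp(-((x + 2.0) ** 2) / 0.5)  # test density

smear = np.exp(-0.5 * delta**2 * k**2)                  # noise in Fourier space
p_noisy = np.fft.ifft(np.fft.fft(p) * smear).real
p_rec = np.fft.ifft(np.fft.fft(p_noisy) / smear).real   # deconvolution

print(np.max(np.abs(p_rec - p)) < 1e-8)                 # True: density recovered
```

Dividing by the smear amplifies high-frequency components, which is the numerical face of the exponentially increased statistical fluctuations noted after Equation (98): with real, noisy data the anti-Gaussian factor magnifies the statistical errors rather than a clean roundoff floor.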
We applied the above result in Section III.C.3, where the effect of nonunity quantum efficiency on the reconstruction of the density matrix elements was discussed. The use of the estimator in Equation (98) and the origin of the bound η > 1/2 are now clearer. Another simple example of noise deconvolution is given here for a spin-1/2 system. Consider the map that describes the ''depolarizing channel''

$$\Gamma_{p}[O]=(1-p)\,O+\frac{p}{2}\,\mathrm{Tr}[O]\;I,\qquad 0\le p\le 1.\tag{123}$$

This map can be inverted for p ≠ 1 as follows:

$$\Gamma_{p}^{-1}[O]=\frac{1}{1-p}\Big(O-\frac{p}{2}\,\mathrm{Tr}[O]\;I\Big).\tag{124}$$
Then Equation (115) can be replaced with

$$\langle O\rangle=\frac{1}{2}\,\mathrm{Tr}[O]+\frac{1}{2(1-p)}\sum_{\alpha=x,y,z}\;\sum_{m=-1/2}^{1/2}2m\;p_{p}(m,\vec{n}_{\alpha})\,\mathrm{Tr}[O\,\sigma_{\alpha}],\tag{125}$$
where now p_p(m, n⃗_α) represents the probability of outcome m when measuring on the noisy state Γ_{p*}[ρ].

2. Adaptive Tomography

The idea of adaptive tomography is that the tomographic null estimators of Equation (86) can be used to reduce the statistical errors. In fact, the addition of a null estimator in the ideal case of infinite statistics does not change the average, since its mean value is zero, but it can change the variance. Thus, one can look for a procedure to reduce the variance by adding suitable null functions. Consider the class of equivalent estimators for O

$$E'_{\lambda}[O](Q_{\lambda})=E_{\lambda}[O](Q_{\lambda})+\sum_{i=1}^{M}\nu_{i}\,\mathcal{N}_{i}(Q_{\lambda}).\tag{126}$$
MAURO D’ARIANO ET AL.
Each estimator in the class E' is identified by the coefficient vector \vec{\lambda}. The variance of the tomographic averages can be evaluated as

\Delta^2 E'[O] = \Delta^2 E[O] + 2 \sum_{i=1}^{M} \lambda_i\, \overline{N_i\, E[O]} + \sum_{i,j=1}^{M} \lambda_i \lambda_j\, \overline{N_i N_j},   (127)

where \overline{F} \doteq \int d\lambda\, \langle F(Q_\lambda) \rangle, and

\Delta^2 E[O] = \overline{E^2[O]} - \overline{E[O]}^2.   (128)
Minimizing \Delta^2 E'[O] with respect to the coefficients \lambda_i, one obtains the linear system

\sum_{j=1}^{M} \lambda_j\, \overline{N_i N_j} = -\overline{E[O]\, N_i},   (129)
which can be solved starting from the estimated mean values, with the vector \vec{\lambda} as unknown. Notice that the obtained vector \vec{\lambda} will depend on the experimental data and has to be recalculated with the above procedure for any new set of data. In this way one obtains an adaptive tomographic algorithm, which consists of the following steps:

1. Find the null estimators N_i(Q_\lambda) (i = 1, ..., M) for the quorum that is being used in the experiment.
2. Execute the experiment and collect the input data.
3. Calculate, using the obtained data, the mean values \overline{N_i N_j} and \overline{E[O] N_i}, and solve the linear system (129) to obtain \vec{\lambda}.
4. Use the vector \vec{\lambda} obtained in the previous step to build the ''optimized estimator'' E'_\lambda[O](Q_\lambda) = E_\lambda[O](Q_\lambda) + \sum_i \lambda_i N_i(Q_\lambda).

Using the data collected in the first step, the mean value \langle O \rangle is now evaluated as

\langle O \rangle = \int d\lambda\, \langle E'_\lambda[O](Q_\lambda) \rangle,   (130)
where the optimized estimator has been used. For each new set of data the whole procedure must be repeated, as \vec{\lambda} depends on the data.
Notice that the experimental mean values themselves are also slightly modified in the adaptive tomographic process, since null estimators leave mean values unchanged only in the limiting case of infinite statistics. Examples of simulations of the adaptive technique that efficiently reduce the statistical noise of homodyne tomographic reconstructions can be found in Ref. [22]. In homodyne tomography, null estimators are obtained as linear combinations of the following functions

N_{k,n}(X_\varphi) = X_\varphi^{k}\, e^{i(k+2+2n)\varphi}, \qquad k, n \ge 0.   (131)
One can easily check that such functions have zero average over \varphi, independently of the state \rho. Hence, for every operator O one actually has an equivalence class of infinitely many unbiased estimators, which differ by a linear combination of functions N_{k,n}(X_\varphi). It is then possible to minimize the rms error within the equivalence class by the least-squares method, obtaining in this way an optimal estimator that is adapted to the particular set of experimental data.
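The procedure can be tried on simulated data. In the following sketch (the coherent-state amplitude, the particular pair of real null functions built from N_{1,0}, the sample size, and the seed are all illustrative assumptions), the least-squares system (129) is solved for the photon-number estimator 2x^2 - 1/2 of Table 1 at unit quantum efficiency:

```python
import numpy as np

# Adaptive null-estimator sketch on simulated homodyne data (eta = 1).
rng = np.random.default_rng(0)
alpha = 1.5                      # assumed coherent amplitude
nbar = alpha ** 2
N = 200_000

phi = rng.uniform(0.0, np.pi, N)
x = rng.normal(alpha * np.cos(phi), 0.5)   # quadrature outcomes, variance 1/4

h = 2 * x ** 2 - 0.5                       # estimator of <a^dag a> (Table 1)
# Two real null functions derived from N_{1,0}(X_phi) = X_phi exp(3 i phi):
f = np.stack([x * np.cos(3 * phi), x * np.sin(3 * phi)])

# Least-squares solution of Eq. (129): <f f^T> lambda = -<h f>
lam = np.linalg.solve(f @ f.T / N, -(f @ h) / N)
h_adapted = h + lam @ f

var0, var1 = h.var(), h_adapted.var()      # variance before / after adaptation
```

The null functions have (nearly) zero sample mean, so the average is essentially unchanged, while the sample variance of the optimized estimator is reduced.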
IV. UNIVERSAL HOMODYNING

As shown in Ref. [16], homodyne tomography can be used as a kind of universal detector for measuring generic field operators, at the expense, however, of some additional noise. In this section the general class of field operators that can be measured in this way is reviewed, a class which also includes operators that are inaccessible to heterodyne detection. In Ref. [29] the most relevant observables were analyzed (the intensity, the real field, the complex field, and the phase), showing how their tomographic measurements are affected by noise that is always larger than the intrinsic noise of the direct detection of the considered observables. On the other hand, by comparing the noise from homodyne tomography with that from heterodyning (for those operators that can be measured in both ways), it was shown in Ref. [29] that for some operators homodyning is better than heterodyning when the mean photon number is sufficiently small, i.e., in the quantum regime; such comparisons will also be reviewed in this section.

A. Homodyning Observables

Homodyne tomography provides the maximum achievable information on the quantum state of a single-mode radiation field through the use of the estimators in Section III.C.3. In principle, the knowledge of the density matrix should allow one to calculate the expectation value even of unbounded operators. However, this is generally true only when one has analytic knowledge of the density matrix; it no longer holds when the matrix has been
obtained experimentally. In fact, the Hilbert space is actually infinite dimensional, whereas experimentally one can achieve only a finite matrix, each element being affected by an experimental error. Notice that, even though the method allows one to extract any matrix element in the Hilbert space from the same bunch of experimental data, it is the way in which errors converge in the Hilbert space that determines the actual possibility of estimating the trace \langle O \rangle = \mathrm{Tr}[O\rho] for an arbitrary operator O. This issue has been debated in the set of papers of Ref. [73]. Consider, for example, the number representation, and suppose that we want to estimate the average photon number \langle a^\dagger a \rangle. In Ref. [80] it has been shown that for nonunit quantum efficiency the statistical error for the diagonal matrix element \langle n|\rho|n \rangle diverges faster than exponentially versus n, whereas for \eta = 1 the error saturates for large n at the universal value \varepsilon_n = \sqrt{2/N}, which depends only on the number N of experimental data but is independent of both n and the quantum state. Even in the unrealistic case \eta = 1, one sees immediately that the estimated expectation value \langle a^\dagger a \rangle = \sum_{n=0}^{H-1} n \rho_{nn}, based on the measured matrix elements \rho_{nn}, will exhibit an unbounded error versus the truncated-space dimension H, because the nonvanishing error of \rho_{nn} versus n multiplies the increasing eigenvalue n. Here, we report the estimators valid for any operator that admits a normal ordered expansion, giving the general class of operators that can be measured in this way, also as a function of the quantum efficiency \eta. Hence, from the same tomographic experiment one can obtain not only the density matrix, but also the expectation value of various field operators, including unbounded operators and some that are inaccessible to heterodyne detection. However, the price to pay for such flexibility is that all measured quantities will be affected by noise.
If one compares this noise with that from heterodyning (for those operators that can be measured in both ways), it turns out that for some operators homodyning is nevertheless less noisy than heterodyning, at least for small mean photon numbers. The procedure for estimating the expectation \langle O \rangle will be referred to as homodyning the observable O. By homodyning the observable O we mean averaging an appropriate estimator R[O](x,\varphi), independent of the state \rho, over the experimental homodyne data, achieving in this way the expectation value \langle O \rangle for every state \rho, as in Equation (92). For unbounded operators one can obtain the explicit form of the estimator R[O](x,\varphi) in a different way. Starting from the identity involving trilinear products of Hermite polynomials [81]

\int_{-\infty}^{+\infty} dx\, e^{-x^2} H_k(x) H_m(x) H_n(x) = \frac{2^{(m+n+k)/2}\, \pi^{1/2}\, k!\, m!\, n!}{(s-k)!\,(s-m)!\,(s-n)!},   (132)
for k + m + n = 2s even, Richter proved the following nontrivial formula for the expectation value of the normally ordered field operators [82]

\langle a^{\dagger n} a^{m} \rangle = \int_0^{\pi} \frac{d\varphi}{\pi} \int_{-\infty}^{+\infty} dx\, p(x,\varphi)\, e^{i(m-n)\varphi}\, \frac{H_{n+m}(\sqrt{2}\, x)}{\sqrt{2^{n+m}}\, \binom{n+m}{n}},   (133)

which corresponds to the estimator

R[a^{\dagger n} a^{m}](x,\varphi) = e^{i(m-n)\varphi}\, \frac{H_{n+m}(\sqrt{2}\, x)}{\sqrt{2^{n+m}}\, \binom{n+m}{n}}.   (134)

This result can easily be extended to the case of nonunit quantum efficiency \eta < 1. Using Equation (122) one obtains

R_\eta[a^{\dagger n} a^{m}](x,\varphi) = e^{i(m-n)\varphi}\, \frac{H_{n+m}(\sqrt{2\eta}\, x)}{\sqrt{(2\eta)^{n+m}}\, \binom{n+m}{n}}.   (135)
From Equation (135), by linearity one can obtain the estimator R_\eta[f](x,\varphi) for any operator function f that has a normal ordered expansion

f \doteq f(a, a^\dagger) = \sum_{n,m=0}^{\infty} f^{(N)}_{nm}\, a^{\dagger n} a^{m}.   (136)

One obtains

R_\eta[f](x,\varphi) = \sum_{s=0}^{\infty} \frac{H_s(\sqrt{2\eta}\, x)}{s!\, (2\eta)^{s/2}} \sum_{n,m=0}^{\infty} f^{(N)}_{nm}\, e^{i(m-n)\varphi}\, n!\, m!\, \delta_{n+m,s}
 = \sum_{s=0}^{\infty} \frac{H_s(\sqrt{2\eta}\, x)\, i^s}{s!\, (2\eta)^{s/2}} \left. \frac{d^s}{dv^s} \right|_{v=0} F_\eta[f](v,\varphi),   (137)

where

F_\eta[f](v,\varphi) = \sum_{n,m=0}^{\infty} f^{(N)}_{nm}\, \binom{n+m}{m}^{-1} (-iv)^{n+m}\, e^{i(m-n)\varphi}.   (138)

Continuing from Equation (137) one has

R_\eta[f](x,\varphi) = \left. \exp\left[ \frac{1}{2\eta} \frac{d^2}{dv^2} + 2ix \frac{d}{dv} \right] F_\eta[f](v,\varphi) \right|_{v=0},   (139)
and finally

R_\eta[f](x,\varphi) = \int_{-\infty}^{+\infty} \frac{dw}{\sqrt{2\pi\eta^{-1}}}\; e^{-(\eta/2) w^2}\, F_\eta[f](w + 2ix, \varphi).   (140)
Hence one concludes that the operator f can be measured by homodyne tomography if the function F_\eta[f](v,\varphi) in Equation (138) grows slower than e^{\eta v^2/2} for v \to \infty, and if the integral in Equation (140) grows at most exponentially for x \to \infty (assuming p(x,\varphi) goes to zero faster than exponentially at x \to \infty). The robustness of this method of homodyning observables against additive phase-insensitive noise has also been analyzed in Ref. [16], where it was shown that just half a photon of thermal noise would completely spoil the measurement of the density matrix elements in the Fock representation. In Table 1 we report the estimator R_\eta[O](x,\varphi) for some operators O. The operator \hat{W}_s gives the generalized Wigner function W_s(\alpha, \alpha^*) for ordering parameter s through the relation in Equation (11). From the expression of R_\eta[\hat{W}_s](x,\varphi) it follows that by homodyning with quantum efficiency \eta one can measure the generalized Wigner function only for s < 1 - \eta^{-1}: in particular, the usual Wigner function for s = 0 cannot be measured for any quantum efficiency.

B. Noise in Tomographic Measurements

In this section we review the analysis of Ref. [29], where the tomographic measurement of four relevant field quantities was studied: the field intensity, the real field (quadrature), the complex field, and the phase. For all these quantities the conditions given after Equation (140) are fulfilled.

TABLE 1
ESTIMATOR R_\eta[O](x,\varphi) FOR SOME OPERATORS O (FROM [16])

  O                       R_\eta[O](x,\varphi)
  a^{\dagger n} a^{m}     e^{i(m-n)\varphi}\, H_{n+m}(\sqrt{2\eta}\, x) / [\sqrt{(2\eta)^{n+m}}\, \binom{n+m}{n}]
  a                       2 e^{i\varphi}\, x
  a^2                     e^{2i\varphi} (4x^2 - 1/\eta)
  a^\dagger a             2x^2 - 1/(2\eta)
  (a^\dagger a)^2         (8/3)x^4 - [(4 - 2\eta)/\eta]\, x^2 + (1 - \eta)/(2\eta^2)
  \hat{W}_s = \frac{2}{\pi(1-s)} \left(\frac{s+1}{s-1}\right)^{a^\dagger a}     \int_0^{\infty} dt\, \frac{2 e^{-t}}{\pi[(1-s) - \eta^{-1}]} \cos\!\left( 2x \sqrt{\frac{2t}{(1-s) - \eta^{-1}}} \right)
  |n\rangle\langle n+d|   R[|n\rangle\langle n+d|](x,\varphi) in Equation (100)
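The estimators of Table 1 can be checked with a small Monte Carlo experiment. The following sketch (coherent-state amplitude, quantum efficiency, sample size, and seed are all arbitrary assumptions) samples homodyne outcomes of a coherent state at efficiency eta, for which the outcome distribution is Gaussian with mean alpha cos(phi) and variance 1/(4 eta), and averages three Table 1 estimators:

```python
import numpy as np

# Monte Carlo homodyning of a coherent state (illustrative sketch).
rng = np.random.default_rng(42)
alpha, eta, N = 1.2, 0.7, 200_000

phi = rng.uniform(0.0, np.pi, N)                     # random LO phases
x = rng.normal(alpha * np.cos(phi), np.sqrt(1 / (4 * eta)))  # homodyne data

# Table 1 estimators averaged over the data:
a_est = (2 * np.exp(1j * phi) * x).mean()            # estimates <a>       = alpha
n_est = (2 * x ** 2 - 1 / (2 * eta)).mean()          # estimates <a^dag a> = alpha^2
a2_est = (np.exp(2j * phi) * (4 * x ** 2 - 1 / eta)).mean()  # <a^2> = alpha^2
```

All three averages converge to the ideal expectation values even though the data were collected with eta < 1, which is the content of Equation (135).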
The tomographic measurement of the observable O is provided in terms of the average \overline{w} of the estimator w \doteq R_\eta[O](x,\varphi) over the homodyne data. The precision of the measurement is given by the confidence interval \sqrt{\Delta \overline{w}^2}. When w is a real quantity, one has

\Delta \overline{w}^2 = \overline{w^2} - \overline{w}^2,   (141)

where

\overline{w^2} \doteq \overline{R^2_\eta[O]} = \int_0^{\pi} \frac{d\varphi}{\pi} \int_{-\infty}^{+\infty} dx\, p_\eta(x,\varphi)\, R^2_\eta[O](x,\varphi).   (142)

When w is complex, one has to consider the eigenvalues of the covariance matrix, namely

\Delta \overline{w}^2 = \frac{1}{2} \left[ \overline{|w|^2} - |\overline{w}|^2 + \left| \overline{w^2} - \overline{w}^2 \right| \right].   (143)

When the observable O can also be directly measured by a specific setup, we can compare the tomographic precision \Delta \overline{w}^2 with \langle \Delta O^2 \rangle_\eta = \langle O^2 \rangle_\eta - \langle O \rangle_\eta^2. Notice that for \eta < 1 the noise \langle \Delta O^2 \rangle_\eta is larger than the quantum fluctuations, because of the smearing effect of nonunit quantum efficiency. As we will see, the tomographic measurement is always more noisy than the corresponding direct measurement, for any observable and at any quantum efficiency \eta. This is not surprising, in view of the larger amount of information retrieved in the tomographic measurement as compared with the direct measurement of a single quantity. According to Equation (142), the evaluation of the added noise requires the average of the squared estimator. For the estimators in Equation (135) it is very useful to consider the following identity for the Hermite polynomials [83]

H_n^2(x) = 2^n (n!)^2 \sum_{k=0}^{n} \frac{H_{2k}(x)}{(k!)^2\, 2^k\, (n-k)!},   (144)

which allows one to write

R^2_\eta[a^{\dagger n} a^{m}](x,\varphi) = e^{2i\varphi(m-n)}\, \frac{(n!)^2 (m!)^2}{\eta^{m+n}} \sum_{k=0}^{m+n} \frac{(2k)!\, \eta^k}{(k!)^4\, (n+m-k)!}\, R_\eta[a^{\dagger k} a^{k}](x,\varphi),   (145)
namely, the squared estimator R^2_\eta[a^{\dagger n} a^{m}](x,\varphi) can be written just in terms of the ''diagonal'' estimators R_\eta[a^{\dagger k} a^{k}](x,\varphi).

1. Field Intensity

Photodetection is the direct measurement of the field intensity. For nonunit quantum efficiency \eta, the probability of detecting m photons is given by the Bernoulli convolution in Equation (22). Let us consider the rescaled photocurrent

I_\eta = \frac{1}{\eta}\, a^\dagger a,   (146)

which traces the photon number, namely

\langle I_\eta \rangle = \frac{1}{\eta} \sum_{m=0}^{\infty} m\, p_\eta(m) = \langle a^\dagger a \rangle \doteq \bar{n}.   (147)

The variance of I_\eta is given by

\langle \Delta I_\eta^2 \rangle = \frac{1}{\eta^2} \sum_{m=0}^{\infty} m^2\, p_\eta(m) - \bar{n}^2 = \langle \Delta n^2 \rangle + \bar{n} \left( \frac{1}{\eta} - 1 \right),   (148)

where \langle \Delta n^2 \rangle denotes the intrinsic photon number variance and \bar{n}(\eta^{-1} - 1) represents the noise introduced by inefficient detection. The tomographic estimator that traces the photon number is given by the phase-independent function w \doteq 2x^2 - (2\eta)^{-1}. Using Equation (145) we can evaluate its variance as follows

\Delta \overline{w}^2 = \langle \Delta n^2 \rangle + \frac{1}{2} \langle n^2 \rangle + \bar{n} \left( \frac{2}{\eta} - \frac{3}{2} \right) + \frac{1}{2\eta^2}.   (149)

The noise N[n] added by tomography in the measurement of the field intensity n is then given by

N[n] = \Delta \overline{w}^2 - \langle \Delta I_\eta^2 \rangle = \frac{1}{2} \left[ \langle n^2 \rangle + \bar{n} \left( \frac{2}{\eta} - 1 \right) + \frac{1}{\eta^2} \right].   (150)

Notice that N[n] is always positive and depends strongly on the state under examination. For coherent states we have the noise ratio

\delta n \doteq \sqrt{ \frac{\Delta \overline{w}^2}{\langle \Delta I_\eta^2 \rangle} } = \left[ 2 + \frac{\eta \bar{n}}{2} + \frac{1}{2 \eta \bar{n}} \right]^{1/2},   (151)

which is minimum for \bar{n} = \eta^{-1}.
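A short numerical check of Equation (151) (the value of eta and the grid are arbitrary assumptions) confirms the location and value of the minimum of the noise ratio:

```python
import numpy as np

# Minimum of the coherent-state noise ratio of Eq. (151):
# delta_n = sqrt(2 + eta*nbar/2 + 1/(2*eta*nbar)), minimized at nbar = 1/eta.
eta = 0.4
nbar = np.linspace(0.05, 20.0, 20000)
delta_n = np.sqrt(2 + eta * nbar / 2 + 1 / (2 * eta * nbar))
n_min = nbar[np.argmin(delta_n)]          # should sit at 1/eta = 2.5
```

At the minimum the ratio equals sqrt(3), independently of eta.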
2. Real Field

For single-mode radiation the electric field is proportional to a quadrature X = (a + a^\dagger)/2, which is just traced by homodyne detection at fixed zero phase with respect to the local oscillator. The tomographic estimator is given by w \doteq R_\eta[X](x,\varphi) = 2x \cos\varphi, independent of \eta, whereas the squared estimator R^2_\eta[X] can be written as

w^2 = \frac{1}{4} \left( R_\eta[a^2](x,\varphi) + R_\eta[a^{\dagger 2}](x,\varphi) \right) + R_\eta[a^\dagger a](x,\varphi) + \frac{1 + \cos(2\varphi)}{2\eta}.   (152)

Then one has

\Delta \overline{w}^2 = \frac{1}{4} \left( \langle a^{\dagger 2} \rangle + \langle a^2 \rangle \right) + \bar{n} + \frac{1}{2\eta} - \frac{1}{4} \langle a + a^\dagger \rangle^2 = \langle \Delta X^2 \rangle + \frac{\bar{n}}{2} + \frac{2 - \eta}{4\eta},   (153)

where \langle \Delta X^2 \rangle represents the intrinsic quadrature fluctuations. The tomographic noise in Equation (153) can be compared with the rms variance of direct homodyne detection (see Section II.C)

\langle \Delta X^2 \rangle_\eta = \langle \Delta X^2 \rangle + \frac{1 - \eta}{4\eta}.   (154)

Then the added noise reads

N[X] = \frac{1}{2} \left( \bar{n} + \frac{1}{2\eta} \right).   (155)

For coherent states \langle \Delta X^2 \rangle = 1/4, and one has the noise ratio

\delta x \doteq \sqrt{ \frac{\Delta \overline{w}^2}{\langle \Delta X^2 \rangle_\eta} } = \sqrt{ 2 \eta \bar{n} + 2 }.   (156)
3. Field Amplitude

The detection of the complex field amplitude of a single-mode light beam is represented by the generalized measurement of the annihilation operator a. The tomographic estimator for a is given by the complex function w \doteq R_\eta[a](x,\varphi) = 2x\, e^{i\varphi}, and the precision of the measurement is evaluated as in Equation (143). From Equation (145) one obtains

w^2 \doteq R^2_\eta[a](x,\varphi) = e^{2i\varphi} \left( \frac{1}{\eta} + 2 R_\eta[a^\dagger a](x,\varphi) \right) = \frac{e^{2i\varphi}}{\eta} + R_\eta[a^2](x,\varphi),   (157)

and

|w|^2 \doteq |R_\eta[a](x,\varphi)|^2 = \frac{1}{\eta} + 2 R_\eta[a^\dagger a](x,\varphi),   (158)

and hence

\Delta \overline{w}^2 = \frac{1}{2} \left[ \frac{1}{\eta} + 2\bar{n} - |\langle a \rangle|^2 + \left| \langle a^2 \rangle - \langle a \rangle^2 \right| \right].   (159)

The optimal measurement of the complex field a is obtained through heterodyne detection. As noticed in Section II.D, the probability distribution is given by the generalized Wigner function W_s(\alpha, \alpha^*), with s = 1 - 2/\eta. Using Equation (56) the precision of the measurement is easily evaluated as

\langle \Delta a^2 \rangle_\eta = \frac{1}{2} \left[ \langle |\alpha|^2 \rangle - |\langle \alpha \rangle|^2 + \left| \langle \alpha^2 \rangle - \langle \alpha \rangle^2 \right| \right] = \frac{1}{2} \left[ \frac{1}{\eta} + \bar{n} - |\langle a \rangle|^2 + \left| \langle a^2 \rangle - \langle a \rangle^2 \right| \right].   (160)
The noise added by quantum tomography then reads

N[a] = \Delta \overline{w}^2 - \langle \Delta a^2 \rangle_\eta = \frac{\bar{n}}{2},   (161)

which is independent of the quantum efficiency. For a coherent state we have

\Delta \overline{w}^2 = \frac{1}{2} \left( \bar{n} + \frac{1}{\eta} \right), \qquad \langle \Delta a^2 \rangle_\eta = \frac{1}{2\eta},   (162)

and the noise ratio is then

\delta a \doteq \sqrt{ \frac{\Delta \overline{w}^2}{\langle \Delta a^2 \rangle_\eta} } = \sqrt{ 1 + \eta \bar{n} }.   (163)
4. Phase

The canonical description of the quantum optical phase is given by the probability operator measure [53,84]

d\mu(\varphi) = \frac{d\varphi}{2\pi} \sum_{n,m=0}^{\infty} e^{i(m-n)\varphi}\, |n\rangle\langle m|.   (164)

However, no feasible setup is known that achieves the optimal measurement (164). For this reason, here we consider the heterodyne measurement of the phase and compare it with the phase of the tomographic estimator for the corresponding field operator a, i.e., w = \arg(2x\, e^{i\varphi}). Notice that the phase w does not coincide with the local oscillator phase \varphi, because x has varying sign. The probability distribution of w can be obtained from the following identity

\int_0^{\pi} \frac{d\varphi}{\pi} \int_{-\infty}^{+\infty} dx\, p_\eta(x,\varphi) = 1 = \int_{-\pi}^{\pi} \frac{dw}{\pi} \int_0^{\infty} dx\, p_\eta(x, w),   (165)

which implies

p_\eta(w) = \frac{1}{\pi} \int_0^{\infty} dx\, p_\eta(x, w).   (166)

The precision of the tomographic phase measurement is given by the rms variance \Delta \overline{w}^2 of the probability (166). In the case of a coherent state with positive amplitude, |\alpha\rangle \doteq ||\alpha|\rangle, Equation (166) gives

p_\eta(w) = \frac{1}{2\pi} \left[ 1 + \mathrm{Erf}\!\left( \sqrt{2\eta}\, |\alpha| \cos w \right) \right],   (167)

which approaches a ''boxed'' distribution on [-\pi/2, \pi/2] for large intensity |\alpha| \gg 1. We compare the tomographic phase measurement with heterodyne detection, namely with the phase of the directly detected complex field a. The outcome probability distribution is the marginal distribution of the generalized Wigner function W_s(\alpha, \alpha^*) (s = 1 - 2/\eta) integrated over the radius

p_\eta(\varphi) = \int_0^{\infty} \rho\, d\rho\, W_s(\rho e^{i\varphi}, \rho e^{-i\varphi}),   (168)
whereas the precision of the phase measurement is given by its rms variance \langle \Delta \varphi^2 \rangle. We are not able to give a closed formula for the added noise N[\varphi] = \Delta \overline{w}^2 - \langle \Delta \varphi^2 \rangle. However, for highly excited coherent states |\alpha\rangle \doteq ||\alpha|\rangle (zero mean phase) one has \Delta \overline{w}^2 = \pi^2/12 and \langle \Delta \varphi^2 \rangle = (2\eta\bar{n})^{-1}. The asymptotic noise ratio is thus given by

\delta \varphi \doteq \sqrt{ \frac{\Delta \overline{w}^2}{\langle \Delta \varphi^2 \rangle} } = \pi \sqrt{ \frac{\eta \bar{n}}{6} }, \qquad \bar{n} \gg 1.   (169)
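The asymptotic value pi^2/12 of the tomographic phase variance can be checked directly from Equation (167). In the following sketch (the amplitude, efficiency, and grid resolution are arbitrary assumptions) the distribution is evaluated on a grid and its variance computed numerically:

```python
import numpy as np
from math import erf, pi

# Variance of the tomographic phase distribution, Eq. (167):
# for large |alpha| it approaches the box on [-pi/2, pi/2], variance pi^2/12.
eta, amp = 1.0, 50.0                      # quantum efficiency, |alpha|
c = np.sqrt(2 * eta) * amp
w = np.linspace(-pi, pi, 20001)
p = np.array([(1 + erf(c * np.cos(wi))) / (2 * pi) for wi in w])
dw = w[1] - w[0]
p /= p.sum() * dw                         # renormalize on the finite grid
var_w = (w ** 2 * p).sum() * dw           # rms variance of the phase
```

For amp = 50 the result already agrees with pi^2/12 to better than one percent.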
A comparison for weakly excited coherent states can be performed numerically. The noise ratio \delta\varphi (expressed in dB) is shown in Figure 3 for some values of the quantum efficiency \eta. It is apparent that the tomographic determination of the phase is noisier than heterodyning in this low-intensity regime as well. In Table 2 a synthesis of the results of this section is reported. We have considered the ratio between the tomographic and the direct-measurement noise. This is an increasing function of the mean photon number \bar{n}, scaled by the quantum efficiency \eta. Therefore, homodyne tomography turns out to be a very robust detection scheme at low quantum efficiency. In Figure 4 the coherent-state noise ratios (in dB) for all the considered quantities are plotted versus \bar{n} for unit quantum efficiency.
FIGURE 3. Ratio between tomographic and heterodyne noise in the measurement of the phase for weakly excited coherent states. The noise ratio is reported versus the mean photon number \bar{n} for some values of the quantum efficiency. From bottom to top: \eta = 0.2, 0.4, 0.6, 0.8, 1.0. (From Ref. [29].)
TABLE 2
ADDED NOISE N[O] IN THE TOMOGRAPHIC MEASUREMENT OF O AND NOISE RATIO \delta O FOR COHERENT STATES. FOR THE PHASE THE RESULTS ARE VALID IN THE ASYMPTOTIC REGIME \bar{n} \gg 1 (FROM REF. [29])

  O              N[O]                                                                \delta O
  a^\dagger a    (1/2)\,[\langle n^2 \rangle + \bar{n}((2/\eta) - 1) + 1/\eta^2]     [2 + \eta\bar{n}/2 + 1/(2\eta\bar{n})]^{1/2}
  X              (1/2)\,[\bar{n} + 1/(2\eta)]                                        [2(1 + \eta\bar{n})]^{1/2}
  a              (1/2)\,\bar{n}                                                      (1 + \eta\bar{n})^{1/2}
  \varphi        (\pi^2/12) - 1/(2\eta\bar{n})                                       \pi \sqrt{\eta\bar{n}/6}
FIGURE 4. The coherent-state noise ratio (in dB) for all the quantities considered in this section. (From Ref. [29].)
In conclusion, homodyne tomography adds larger noise for highly excited states; however, it is not too noisy in the quantum regime of low \bar{n}. It is then very useful in this regime, where currently available photodetectors suffer the most limitations. Indeed, it has been adopted in experiments of photodetection [10,11].
C. Comparison between Homodyne Tomography and Heterodyning

We have seen that homodyne tomography allows one to measure any field observable f \doteq f(a, a^\dagger) having a normal ordered expansion f \doteq f^{(N)}(a, a^\dagger) = \sum_{n,m=0}^{\infty} f^{(N)}_{nm} a^{\dagger n} a^{m} and a bounded integral in Equation (140). On the other hand, as shown in Section II.D, heterodyne detection allows one to measure field observables that admit an antinormal ordered expansion f \doteq f^{(A)}(a, a^\dagger) = \sum_{n,m=0}^{\infty} f^{(A)}_{nm} a^{m} a^{\dagger n}, in which case the expectation value is obtained through the heterodyne average

\langle f \rangle = \int_{\mathbb{C}} \frac{d^2\alpha}{\pi}\, f^{(A)}(\alpha, \alpha^*)\, \langle \alpha | \rho | \alpha \rangle.   (170)
As shown in Section II.D, for ¼ 1 the heterodyne probability is just the Q-function Qð , * Þ ¼ ð1=pÞh jj i, whereas for < 1 it is Gaussian convoluted with rms ð1 Þ=, thus giving the Wigner function Ws ð , * Þ, with s ¼ 1 ð2=Þ. Indeed, the problem of measurability of the observable f through heterodyne detection is not trivial, since one needs the admissibility of antinormal ordered expansion and the convergence of the integral in Equation (170). We refer the reader to Refs. [16,59] for more details and to Refs. [58,60] for analysis of quantum state estimates based on heterodyne detection. The additional noise in homodyning the complex field a has been evaluated in Equation (161), where we found that homodyning is always more noisy than heterodyning. On the other hand, for other field observables it may happen that homodyne tomography is less noisy than heterodyne detection. For example, the added noise in homodyning the intensity aya with respect to direct detection has been evaluated in Equation (150). Analogously, one can easily evaluate the added noise Nhet ½n when heterodyning the photon number n ¼ ay a. According to Equation (56), the random variable corresponding to the photon number for heterodyne detection with quantum efficiency is ð Þ ¼ j j2 ð1=Þ. From the relation j j4 ¼ ha2 ay2 i þ 4
1 1 2 haay i þ 2
ð171Þ
one obtains 2
ð Þ ¼ hn2 i þ n
2 1 1 þ 2:
ð172Þ
Upon comparing with Equation (148), one concludes that the added noise in heterodyning the photon number is given by D E 1 Nhet ½n ¼ 2 ðzÞ I2 ¼ 2 ðn 1Þ:
ð173Þ
QUANTUM TOMOGRAPHY
255
With respect to the added noise in homodyning of Equation (150) one has Nhet ½n ¼ N½n
1 1 hn2 i n 2 : 2
ð174Þ
Since hn2 i n2 , we can conclude that homodyning the photon number is less noisy than heterodyning pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi it for sufficiently low mean photon number hni < (1/2)ð1 þ 1 þ ð4=2 ÞÞ.
V. MULTIMODE HOMODYNE TOMOGRAPHY The generalization of homodyne tomography from a single-mode to a multimode field is quite obvious, the estimator of simple operator tensors O ¼ O1 O2 On being just the product of the estimators of each single-mode operator O1 ,O1 , . . . ,On . By linearity, one then obtains also the estimator for arbitrary multimode operators. Such a simple generalization, however, requires a separate homodyne detector for each mode, which is unfeasible when the modes of the field are not spatiotemporally separated. This is the case, for example, of pulsed fields, for which a general multimode tomographic method is especially needed, also due to the problem of mode matching between the local oscillator and the detected fields (determined by their relative spatiotemporal overlap) [85], which produces a dramatic reduction of the overall quantum efficiency. In this section we review the general method of Ref. [17] for homodyning observables of a multimode electromagnetic field using a single local oscillator (LO), providing the rule to evaluate the estimator of an arbitrary multimode operator. The expectation value of the operator can then be obtained by averaging the estimator over the homodyne outcomes that are collected using a single LO whose mode randomly scans all possible linear combinations of incident modes. We will then specifically consider some observables for a two-mode field in a state corresponding to a twin-beam produced by parametric downconversion, and prove the reliability of the method on the basis of computer simulations. Finally, we report some experimental results [86] obtained in Prem Kumar’s laboratory at Northwestern University. Such an experiment actually represents the first measurement of the joint photon number probability distribution of the twin-beam state.
256
MAURO D’ARIANO ET AL.
A. The General Method The Hilbert–Schmidt operator expansion in Equation (91) can be generalized to any number of modes as follows ( " #) Z 2 Z 2 M X d 2 z0 d z1 d zM y * ... Tr O exp zl al þ zl al O¼ p c p c p c l¼0 " # M X y * z l al z l al , exp Z
ð175Þ
l¼0
where al and ayl , with l ¼ 0, . . . , M and ½al , ayl0 ¼ ll0 , are the annihilation and creation operators of M þ 1 independent modes, and O now denotes an operator over all modes. Using the following hyperspherical parameterization for zl 2 C i ku0 ð~Þei 2 i z1 ¼ ku1 ð~Þei 2 i z2 ¼ ku2 ð~Þei 2
z0 ¼
i i 0 ke cos 1 , 2 i 1 ¼ _ k ei 1 sin 1 cos 2 , 2 i 2 ¼ _ kei 2 sin 1 sin 2 cos 3 , 2 0
¼ _
... i i kuM1 ð~Þei M1 ¼ _ kei M1 sin 1 sin 2 sin M1 cos M , 2 2 i i ¼ kuM ð~Þei M ¼ _ kei M sin 1 sin 2 sin M1 sin M , 2 2
zM1 ¼ zM
ð176Þ
where k 2 ½0, 1Þ; l 2 ½0, 2p for l ¼ 0,1, . . . , M; and l 2 ½0, p=2 for l ¼ 1, 2, . . . ,M, Equation (175) can be rewritten as follows: Z O¼
d½ ~
Z
d½~
Z
þ1
dk 0
2Mþ1 k 1 ~ ~ ~~ Tr½OeikXð, Þ eikXð, Þ : 2 M!
ð177Þ
Here we have used the notation Z
d½ ~ ¼ _
M Z Y l¼0
0
2p
d l , 2p
ð178Þ
257
QUANTUM TOMOGRAPHY
Z
d½~ ¼ _ 2M M!
M Z Y l¼1
p=2
dl sin2ðMlÞþ1 l cos l ,
i 1h Xð~, ~Þ ¼ Ay ð~, ~Þ þ Að~, ~Þ , 2 Að~, ~Þ ¼
M X
ð179Þ
0
ð180Þ
ei l ul ð~Þal :
ð181Þ
l¼0
P 2 ~ From the parameterization in Equation (177), one has M l¼0 ul ðÞ ¼ 1, and y y hence ½Að~, ~Þ, A ð~, ~Þ ¼ 1, namely Að~, ~Þ and A ð~, ~Þ themselves are annihilation and creation operators of a bosonic mode. By scanning all values of l 2 ½0, p=2 and l 2 ½0, 2p , all possible linear combinations of modes al are obtained. For the quadrature operator Xð~, ~Þ in Equation (180), one has the following identity for the moments generating function Z þ1 D E 1 2 ikXð~, ~Þ k ¼ exp e dx eikx p ðx; ~, ~Þ, ð182Þ 8 1 where p ðx; ~, ~Þ denotes the homodyne probability distribution of the quadrature Xð~, ~Þ with quantum efficiency . Generally, can depend on the mode itself, i.e., it is a function ¼ ð~, ~Þ of the selected mode. In the following, for simplicity, we assume to be mode independent, however. By taking the ensemble average on each side of Equation (177) and using Equation (182) one has Z Z Z þ1 hOi ¼ d½ ~ d½~ dx p ðx; ~, ~Þ R ½O ðx; ~, ~Þ, ð183Þ 1
where the estimator R ½O ðx; ~, ~Þ has the following expression k R ½O ðx; ~, ~Þ ¼ M!
Mþ1
Z
þ1
dt eð1ðk=2ÞÞtþ2i
h i pffiffiffi ~ ~ t Tr O e2i ktXð, Þ ,
pffiffiffi kt x M
0
ð184Þ with k ¼ 2=ð2 1Þ. Equations (183) and (184) allow one to obtain the expectation value hOi for any unknown state of the radiation field by averaging over the homodyne outcomes of the quadrature Xð~, ~Þ for ~ and ~ randomly distributed according to d½ ~ and d½~ . Such outcomes can be obtained by using a single LO that is prepared in the multimode coherent i l state M l¼0 jl i with l ¼ e ul ðÞK=2 and K 1. In fact, in this case the
258
MAURO D’ARIANO ET AL.
rescaled zero-frequency photocurrent at the output of a balanced homodyne detector is given by I¼
M 1 X ð * al þ l ayl Þ, K l¼0 l
ð185Þ
which corresponds to the operator Xð~, ~Þ. In the limit of a strong LO (K ! 1), all moments of the current I correspond to the moments of Xð~, ~Þ, and the exact measurement of Xð~, ~Þ is then realized. Notice that for modes al with different frequencies, in the d.c. photocurrent in Equation (185) each LO with amplitude l selects the mode al at the same frequency (and polarization). For less-than-unity quantum efficiency, Equation (182) holds. Equation (184) can be applied to some observables of interest. In particular, one can estimate the matrix element hfnl gjRjfml gi of the multimode density operator R. This will be obtained by averaging the estimator PM
kMþ1 M! sffiffiffiffiffiffi) ( M Y pffiffiffi l ! l l ½i kul ð~Þ l! l¼0
R ½jfml gihfnl gj ðx; ~, ~Þ ¼ ei
Z
l¼0
ðnl ml Þ
þ1
dt etþ2i
l
M PM pffiffiffi Y kt x Mþ l¼0 ðl l Þ=2
t
0
L ll l ½ku2l ð~Þt ,
l¼0
ð186Þ where l ¼ maxðml , nl Þ, l ¼ minðml , nl Þ, and L n ðzÞ denotes the generalized Laguerre polynomial. For diagonal matrix elements, Equation (186) simplifies to k R ½jfnl gihfnl gj ðx; ~, ~Þ ¼ M!
Mþ1
Z
þ1 0
dt etþ2i
M pffiffiffi Y kt x M
t
Lnl ½ku2l ð~Þt ð187Þ
l¼0
with Ln ðzÞ denoting the customary Laguerre polynomial in z. Using the following identity [81] L n 0 þ 1 þ þ M þM ðx0 þ x1 þ þ xM Þ X L i00 ðx0 ÞL i11 ðx1 Þ L iMM ðxM Þ, ¼ i0 þi1 þ þiM ¼n
ð188Þ
259
QUANTUM TOMOGRAPHY
from Equation (187) one can easily derive the estimator of the probability P y distribution of the total number of photons N ¼ M l¼0 al al k R ½jnihnj ðx; ~, ~Þ ¼ M!
Mþ1
Z
þ1
dt etþ2i
pffiffiffi kt x M
0
t LM n ½kt ,
ð189Þ
where jni denotes the eigenvector of N with eigenvalue n. Notice that the estimator in Equation (187) does not depend on the phases l ; only the knowledge of the angles l is needed. For the estimator in Equation (189), even the angles l can be unknown. Now we specialize to the case of only two modes a and b (i.e., M ¼ 1 and ~ is a scalar ). The joint photon number probability distribution is obtained by averaging R ½jn, mihn, mj ðx; , 0 , 1 Þ Z þ1 pffiffiffi 2 ¼k dt etþ2i kt x t Ln ðkt cos2 ÞLm ðkt sin2 Þ:
ð190Þ
0
The estimator (189) of the probability distribution of the total number of photons can be written as Z R ½jnihnj ðx; ,
0,
1Þ
¼k
þ1
2 0
dt etþ2i
pffiffiffi kt x
t L1n ½kt :
ð191Þ
For the total number of photons one can also derive the estimator of the moment generating function, using the generating function for the Laguerre polynomials [81]. One obtains R ½za
y
aþby b
ðx; ,
0,
1Þ ¼
1 1 1z 2 ; x : 2, 2 z þ ðð1 zÞ=kÞ ðz þ ðð1 zÞ=kÞÞ2 ð192Þ
For the first two moments one obtains the simple expressions 2 ¼ 4x2 þ 2, k 24 6 10 y y 2 4 20 x2 þ 2 þ 4: ð193Þ R ½ða a þ b bÞ ðx; , 0 , 1 Þ ¼ 8x þ R ½ay a þ by b ðx; ,
0,
1Þ
It is worth noting that analogous estimators of the photon number difference between the two modes are singular and one needs a cutoff
260
MAURO D’ARIANO ET AL.
procedure, similar to the one used in Ref. [87] for recovering the correlation between the modes by means of the customary two-mode tomography. In fact, in order to extract information pertaining to a single mode only one needs a delta-function at ¼ 0 for mode a, or ¼ p=2 for mode b, and, in this case, one could better use the standard one-mode tomography by setting the LO to the proper mode of interest. Finally, we note that for two-mode tomography the estimators can be averaged by the integral Z
2p
hO i ¼ 0
d 0 2p
Z
2p 0
d 1 2p
R ½O ðx; ,
0,
Z
1
1
dðcos 2Þ 2
Z
þ1
1
dx p ðx; ,
0,
1Þ
1Þ
ð194Þ
over the random parameters cosð2Þ, 0 , and 1 . For example, in the case of two radiation modes having the same frequency but orthogonal polarizations, represents a random rotation of the polarizations, whereas 0 and 1 denote the relative phases between the LO and the two modes, respectively. 1. Numerical Results for Two-Mode Fields In this section we report some Monte Carlo simulations from Ref. [17] to judge the experimental working conditions for performing the single-LO tomography on two-mode fields. We focus our attention on the twin-beam state, usually generated by spontaneous parametric downconversion, namely qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi X 1 ji ¼ SðÞj0ia j0ib ¼ 1 j j2
n jnia jnib ,
ð195Þ
n¼0
where SðÞ ¼ expðay by * abÞ and ¼ e iarg tanhjj. The parameter is related to the average number of photons per beam n ¼ j j2 =ð1 j j2 Þ. For the simulations we need to derive the homodyne probability distribution pðx; , 0 , 1 Þ which is given by pðx; ,
0,
1Þ
¼ Tr U y jx aa hxj 1b Ujihj D E ¼ 0jb 0jSy ðÞU y ½jxiaa hxj 1b USðÞj0 a j0 , a
b
ð196Þ
261
QUANTUM TOMOGRAPHY
where jxia is the eigenvector of the quadrature x ¼ 12 ðay þ aÞ with eigenvalue x and U is the unitary operator achieving the mode transformation Uy
i 0 a cos e U¼ b ei 1 sin
ei 1 sin ei 0 cos
a : b
ð197Þ
In the case of two radiation modes having the same frequency but orthogonal polarizations—the case of Type II phase-matched parametric amplifier—Equation (196) gives the theoretical probability of outcome x for the homodyne measurement at a polarization angle with respect to the polarization of the a mode, and with 0 and 1 denoting the relative phases between the LO and the two modes, respectively. By using the Dirac- representation of the X-quadrature projector Z
þ1
jxihxj ¼ 1
d exp½iðX xÞ , 2p
ð198Þ
Equation (196) can be rewritten as follows [17]

$$p(x;\theta,\psi_0,\psi_1) = \int_{-\infty}^{+\infty} \frac{d\lambda}{2\pi}\, e^{-i\lambda x}\; {}_a\langle 0|\,{}_b\langle 0|\, S^\dagger(\chi)\, U^\dagger\, e^{i\lambda X_a}\, U\, S(\chi)\, |0\rangle_a|0\rangle_b$$
$$= \int_{-\infty}^{+\infty} \frac{d\lambda}{2\pi}\, e^{-i\lambda x}\; {}_a\langle 0|\,{}_b\langle 0| \exp\!\left\{ \frac{i\lambda}{2}\Big[ \big(e^{-i\psi_0}\mu\cos\theta + e^{i\psi_1}\nu^*\sin\theta\big)\,a + \big(e^{i\psi_0}\nu^*\cos\theta + e^{-i\psi_1}\mu\sin\theta\big)\,b + \mathrm{H.c.} \Big] \right\} |0\rangle_a|0\rangle_b\,, \qquad (199)$$

where we have used Equation (197) and the transformation

$$S^\dagger(\chi) \begin{pmatrix} a \\ b^\dagger \end{pmatrix} S(\chi) = \begin{pmatrix} \mu & \nu \\ \nu^* & \mu \end{pmatrix} \begin{pmatrix} a \\ b^\dagger \end{pmatrix} \qquad (200)$$

with $\mu = \cosh|\chi|$ and $\nu = e^{i\arg\chi}\sinh|\chi|$. Upon defining

$$K C = e^{-i\psi_0}\mu\cos\theta + e^{i\psi_1}\nu^*\sin\theta\,, \qquad K D = e^{i\psi_0}\nu^*\cos\theta + e^{-i\psi_1}\mu\sin\theta\,, \qquad (201)$$

where $K \in \mathbb{R}$ and $C, D \in \mathbb{C}$ with $|C|^2 + |D|^2 = 1$, one has

$$K^2 = \mu^2 + |\nu|^2 + 2\mu|\nu|\,\sin 2\theta\, \cos(\psi_0 + \psi_1 - \arg\xi)\,. \qquad (202)$$
MAURO D’ARIANO ET AL.
Now, since the unitary transformation

$$\begin{pmatrix} a \\ b \end{pmatrix} \to \begin{pmatrix} C & D \\ -D^* & C^* \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} \qquad (203)$$

has no effect on the vacuum state, Equation (199) leads to the following Gaussian distribution

$$p(x;\theta,\psi_0,\psi_1) = \int_{-\infty}^{+\infty} \frac{d\lambda}{2\pi}\, e^{-i\lambda x}\; {}_a\langle 0|\,{}_b\langle 0| \exp\!\left\{ \frac{i\lambda K}{2}\big[ (Ca + Db) + \mathrm{H.c.} \big] \right\} |0\rangle_a|0\rangle_b$$
$$= \int_{-\infty}^{+\infty} \frac{d\lambda}{2\pi}\, e^{-i\lambda x}\; {}_a\langle 0| \exp\!\left[ \frac{i\lambda K}{2}\big(a + a^\dagger\big) \right] |0\rangle_a = \frac{1}{K}\,\big|\,{}_a\langle 0 | x/K \rangle_a \big|^2 = \frac{1}{\sqrt{2\pi\sigma^2(\theta,\psi_0,\psi_1)}}\, \exp\!\left( -\frac{x^2}{2\sigma^2(\theta,\psi_0,\psi_1)} \right), \qquad (204)$$

where the variance $\sigma^2(\theta,\psi_0,\psi_1)$ is given by

$$\sigma^2(\theta,\psi_0,\psi_1) = \frac{K^2}{4} = \frac{1 + |\xi|^2 + 2|\xi|\,\sin 2\theta\, \cos(\psi_0 + \psi_1 - \arg\xi)}{4\,(1 - |\xi|^2)}\,. \qquad (205)$$
Taking into account the Gaussian convolution that results from less-than-unity quantum efficiency, the variance just increases as

$$\sigma^2(\theta,\psi_0,\psi_1) \;\to\; \sigma^2_\eta(\theta,\psi_0,\psi_1) = \eta\,\sigma^2(\theta,\psi_0,\psi_1) + \frac{1-\eta}{4}\,. \qquad (206)$$
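Since Equations (204)–(206) make the single-LO homodyne statistics of the twin beam purely Gaussian, they are straightforward to simulate. The sketch below (Python/NumPy; the seed, sample size, and the assumption that $\cos 2\theta$, $\psi_0$, and $\psi_1$ are drawn uniformly are our own illustrative choices, not specified by the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def twin_beam_homodyne(xi, eta, n_samples):
    """Single-LO homodyne outcomes for the twin-beam state: for each event,
    draw cos(2*theta) uniform in [-1, 1] and psi0, psi1 uniform in [0, 2*pi),
    then sample x from the zero-mean Gaussian of Eqs. (205) and (206)."""
    cos2t = rng.uniform(-1.0, 1.0, n_samples)
    sin2t = np.sqrt(1.0 - cos2t**2)              # theta in [0, pi/2]
    psi0 = rng.uniform(0.0, 2 * np.pi, n_samples)
    psi1 = rng.uniform(0.0, 2 * np.pi, n_samples)
    var = (1 + abs(xi)**2
           + 2 * abs(xi) * sin2t * np.cos(psi0 + psi1 - np.angle(xi))) \
          / (4 * (1 - abs(xi)**2))               # Eq. (205)
    var_eta = eta * var + (1 - eta) / 4          # Eq. (206)
    return rng.normal(0.0, np.sqrt(var_eta))
```

Averaged over the random phases the cosine term drops out, so $\langle x^2\rangle = \eta(1+|\xi|^2)/[4(1-|\xi|^2)] + (1-\eta)/4$, a convenient consistency check on the sampler.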
Notice that the probability distribution in Equation (204) corresponds to a squeezed vacuum for $\theta = \pi/4$ and $\psi_0 + \psi_1 - \arg\xi = 0$ or $\pi$.

We study the tomographic measurement of the joint photon number probability distribution and the probability distribution for the total number of photons with use of the estimators in Equations (190) and (191), respectively. Moreover, using the estimator in Equation (186) we reconstruct the matrix elements

$$C_{n,m} \doteq {}_a\langle m|\,{}_b\langle m|\,\chi\rangle\langle\chi\,|n\rangle_a\,|n\rangle_b\,, \qquad (207)$$
FIGURE 5. Two-mode photon number probability $p(n, m)$ of the twin-beam state in Equation (195) for average number of photons per beam $\bar n = 5$, obtained by a Monte Carlo simulation with the estimator in Equation (190) and random parameters $\cos 2\theta$, $\psi_0$, and $\psi_1$. Left: quantum efficiency $\eta = 1$ and $10^6$ data samples were used in the reconstruction; right: $\eta = 0.9$ and $5\times10^6$ data samples. (From Ref. [17].)
which reveal the coherence of the twin-beam state. Theoretically one should have

$$C_{n,m} = (1 - |\xi|^2)\, \xi^m\, \xi^{*n}\,. \qquad (208)$$
The estimators have been numerically evaluated by applying the Gauss method for calculating the integral in Equation (186), which results in a fast and sufficiently precise algorithm with the use of just 150 evaluation points. In Figure 5 a Monte Carlo simulation of the joint photon number probability distribution is reported. The simulated values compare very well with the theoretical ones. In Ref. [17] a careful analysis of the statistical errors was carried out for various twin-beam states, by constructing histograms of the deviations of the results from different simulated experiments with respect to the theoretical values. In comparison with the customary two-LO tomography of Ref. [87], where for $\eta = 1$ the statistical errors saturate for increasingly large $n$ and $m$, here the statistical errors increase slowly versus $n$ and $m$. This is due to the fact that the range of the estimators in Equation (190) increases versus $n$ and $m$. Overall we find that for any given quantum efficiency the statistical errors are generally slightly larger than those obtained with the two-LO method. The convenience of using a single LO thus comes with its own price tag.
FIGURE 6. Probability distribution for the total number of photons of the twin beams in Equation (195) for average number of photons per beam $\bar n = 2$, obtained using the estimator in Equation (191). The oscillation of the total photon number probability due to the perfect correlation of the twin beams has been reconstructed by simulating $10^7$ data samples with quantum efficiency $\eta = 0.9$ (on the left) and $2\times10^7$ data samples with $\eta = 0.8$ (on the right). The theoretical probability (thick solid line) is superimposed onto the result of the Monte Carlo experiment, shown by the thin solid line. Notice the dramatic increase of errors (in gray shade) versus $N$ and for smaller $\eta$. (From Ref. [17].)
FIGURE 7. Tomographic reconstruction of the matrix elements $C_{n,m} \doteq {}_a\langle m|\,{}_b\langle m|\chi\rangle\langle\chi|n\rangle_a\,|n\rangle_b$ of the twin beams in Equation (195) for average number of photons per beam $\bar n = 2$, obtained using the estimator in Equation (186). On the left we used $10^6$ simulated data samples and quantum efficiency $\eta = 0.9$; on the right, $3\times10^6$ data samples and $\eta = 0.8$. The coherence of the twin-beam state is easily recognized as $C_{n,m}$ varies little for $n + m = \mathrm{constant}$ ($\xi$ in Equation (195) has been chosen real). For a typical comparison between theoretical and experimental matrix elements and their relative statistical errors, see the results in Figure 6. (From Ref. [17].)
FIGURE 8. A schematic of the experimental setup. NOPA: nondegenerate optical parametric amplifier; LOs: local oscillators; PBS: polarizing beam splitter; LPFs: low-pass filters.
By using the estimator in Equation (191), the probability distribution for the total number of photons $N$ of the twin beams has also been reconstructed (Figure 6). Notice the dramatic increase of error bars versus $N$ and for smaller $\eta$. Finally, in Figure 7 we report the results of the tomographic measurement of $C_{n,m}$ defined in Equation (207). Because the reconstructed $C_{n,m}$ is close to the theoretically expected value in Equation (208), these results reveal the purity of the twin beams, which cannot be inferred from the thermal diagonal distribution of Figure 5.

The first experimental results of a measurement of the joint photon number probability distribution for a two-mode quantum state created by a nondegenerate optical parametric amplifier have been presented in Ref. [86]. In this experiment, however, the twin beams are detected separately by two balanced-homodyne detectors. A schematic of the experimental setup is reported in Figure 8, and some experimental results are reported in Figure 9. As expected for parametric fluorescence, the experiment showed a measured joint photon number probability distribution that exhibited up to 1.9 dB of quantum correlation between the two modes, with thermal marginal distributions.
VI. APPLICATIONS TO QUANTUM MEASUREMENTS

In this section we review a number of applications of quantum tomography related to some fundamental tests in quantum mechanics.
FIGURE 9. Left: measured joint photon number probability distribution for the twin-beam state with average number of photons per beam $\bar n = 1.5$ and $4\times10^5$ samples. Right: marginal distribution for the signal beam for the same data. The theoretical distribution is also shown. Very similar results are obtained for the idler beam. (From Ref. [86].)
First, we report the proposal of Ref. [30] for testing the nonclassicality of quantum states by means of an operational criterion based on a set of quantities that can be measured experimentally with some given level of confidence, even in the presence of loss, noise, and less-than-unity quantum efficiency. Second, we report the experiment proposed in Ref. [31] for testing quantum state reduction. The state reduction rule is tested using optical homodyne tomography by directly measuring the fidelity between the theoretically expected reduced state and the experimental state. Finally, we review some experimental results obtained at the Quantum Optics Lab of the University of Naples [32] about the reconstruction of coherent signals, together with application to the estimation of the losses introduced by simple optical components.
A. Measuring the Nonclassicality of a Quantum State

The concept of nonclassical states of light has received much attention in quantum optics [41,88–96]. The customary definition of nonclassicality is given in terms of the P-function presented in Section II.A: a nonclassical state does not admit a regular positive P-function representation, namely, it cannot be written as a statistical mixture of coherent states. Such states produce effects that have no classical analogue. These kinds of states are of fundamental relevance not only for the demonstration of the inadequacy of classical description, but also for applications, e.g., in the realms of information transmission and interferometric measurements [91,92,95].
We are interested in testing the nonclassicality of a quantum state by means of a set of quantities that can be measured experimentally with some given level of confidence, even in the presence of loss, noise, and less-than-unity quantum efficiency. The positivity of the P-function itself cannot be adopted as a test, since there is no viable method to measure it. As proved in Section IV.A, only the generalized Wigner functions of order $s < 1 - \eta^{-1}$ can be measured, $\eta$ being the quantum efficiency of homodyne detection. Hence, through this technique, all functions from $s = 1$ to $s = 0$ cannot be recovered, i.e., we cannot obtain the P-function and all its smoothed convolutions up to the customary Wigner function. For the same reason, the nonclassicality parameter proposed by Lee [41], namely the maximum s-parameter that provides a positive distribution, cannot be experimentally measured. Among the many manifestations of nonclassical effects, one finds squeezing, antibunching, even–odd oscillations in the photon-number probability, and negativity of the Wigner function [89–91,95,97–100]. Any of these features alone, however, does not represent the univocal criterion we are looking for. Neither squeezing nor antibunching provides a necessary condition for nonclassicality [93]. The negativity of the Wigner function, which is well exhibited by the Fock states and the Schrödinger-cat-like states, is absent for the squeezed states. As for the oscillations in the photon number probability, some even–odd oscillations can be simply obtained by using a statistical mixture of coherent states. Many authors [93,94,96] have adopted the nonpositivity of the phase-averaged P-function $F(I) = (1/2\pi)\int_0^{2\pi} d\phi\, P(I^{1/2} e^{i\phi})$ as the definition for a nonclassical state, since $F(I) < 0$ invalidates Mandel's semiclassical formula [88] of photon counting, i.e., it does not allow a classical description in terms of a stochastic intensity.
Of course, some states can exhibit a "weak" nonclassicality [96], namely a positive $F(I)$ but a nonpositive P-function (a relevant example being a coherent state undergoing Kerr-type self-phase modulation). However, from the point of view of detection theory, such "weak" nonclassical states still admit a classical description in terms of a positive intensity probability $F(I) > 0$. For this reason, we adopt nonpositivity of $F(I)$ as the definition of nonclassicality.

1. Single-Mode Nonclassicality

The authors of Refs. [93,94,96] have pointed out some relations between $F(I)$ and generalized moments of the photon distribution, which, in turn, can be used to test nonclassicality. The problem is reduced to an infinite set of inequalities that provide both necessary and sufficient conditions for nonclassicality [94]. In terms of the photon number
probability $p(n) = \langle n|\rho|n\rangle$ of the state with density matrix $\rho$, the simplest sufficient condition involves the following three-point relation [94,96]

$$B(n) \doteq (n+2)\,p(n)\,p(n+2) - (n+1)\,\big[\,p(n+1)\,\big]^2 < 0\,. \qquad (209)$$
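As a concrete illustration of Equation (209) (a small Python sketch; the mean photon number and squeezing parameter are arbitrary choices), $B(n)$ vanishes identically for the Poisson statistics of a coherent state, while a squeezed vacuum populates only even Fock states, so $B(n) = -(n+1)\,p(n+1)^2 < 0$ for every odd $n$:

```python
from math import exp, factorial, tanh, cosh

def B(p, n):
    """Three-point nonclassicality indicator of Eq. (209)."""
    return (n + 2) * p(n) * p(n + 2) - (n + 1) * p(n + 1)**2

# Coherent state: Poisson photon statistics, for which B(n) = 0 for all n.
mu = 4.0
def poisson(n):
    return exp(-mu) * mu**n / factorial(n)

# Squeezed vacuum: only even Fock states are populated, so every odd n
# gives B(n) = -(n+1) p(n+1)^2 < 0, a sufficient signature of nonclassicality.
r = 1.0
def squeezed_vacuum(n):
    if n % 2:
        return 0.0
    k = n // 2
    return factorial(n) * tanh(r)**n / (4**k * factorial(k)**2 * cosh(r))
```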
Higher-order sufficient conditions involve five-, seven-, $\ldots$, $(2k+1)$-point relations, always for adjacent values of $n$. It is sufficient that just one of these inequalities is satisfied in order to assure the negativity of $F(I)$. Notice that for a coherent state $B(n) = 0$ identically for all $n$.

In the following we show that quantum tomography can be used as a powerful tool for performing the nonclassicality test in Equation (209). For less-than-unity quantum efficiency ($\eta < 1$), we rely on the concept of a "noisy state" $\rho_\eta$, wherein the effect of quantum efficiency is ascribed to the quantum state itself rather than to the detector. In this model, the effect of quantum efficiency is treated in a Schrödinger-like picture, with the state evolving from $\rho$ to $\rho_\eta$, and with $\eta$ playing the role of a time parameter. Such lossy evolution is described by the master equation [37]

$$\partial_t\, \rho(t) = \frac{\Gamma}{2}\Big[\, 2a\rho(t)a^\dagger - a^\dagger a\,\rho(t) - \rho(t)\,a^\dagger a \,\Big]\,, \qquad (210)$$
wherein $\rho_\eta \equiv \rho(t)$ with $t = -\ln\eta/\Gamma$. For the nonclassicality test, reconstruction in terms of the noisy state has many advantages. In fact, for nonunit quantum efficiency $\eta < 1$ the tomographic method introduces errors for $p(n)$ which are increasingly large versus $n$, with the additional limitation that the quantum efficiency must be greater than the minimum value $\eta = 0.5$. On the other hand, the reconstruction of the noisy-state probabilities $p_\eta(n) = \langle n|\rho_\eta|n\rangle$ does not suffer such limitations, and even though all quantum features are certainly diminished in the noisy-state description, the effect of nonunity quantum efficiency does not change the sign of the P-function, but only rescales it as follows:

$$P(z) \;\to\; P_\eta(z) = \frac{1}{\eta}\, P\big(z/\eta^{1/2}\big)\,. \qquad (211)$$
Hence, the inequality (209) still represents a sufficient condition for nonclassicality when the probabilities $p(n) = \langle n|\rho|n\rangle$ are replaced with $p_\eta(n) = \langle n|\rho_\eta|n\rangle$, the latter being given by a Bernoulli convolution, as shown in Equation (22). When referred to the noisy-state probabilities $p_\eta(n)$, the inequality in Equation (209) keeps its form and is simply rewritten as follows:

$$B_\eta(n) \doteq (n+2)\,p_\eta(n)\,p_\eta(n+2) - (n+1)\,\big[\,p_\eta(n+1)\,\big]^2 < 0\,. \qquad (212)$$
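The Bernoulli convolution that maps $p(n)$ into $p_\eta(n)$ can be sketched as follows (Python/NumPy; the truncation dimension and parameter values are arbitrary choices). A convenient sanity check is that Poisson statistics with mean $\mu$ are mapped onto Poisson statistics with mean $\eta\mu$, so $B_\eta(n)$ remains identically zero for coherent light:

```python
import numpy as np
from math import comb, exp, factorial

def bernoulli_convolve(p, eta):
    """Noisy-state photon distribution p_eta(n): each of j photons is
    detected independently with probability eta (cf. Eq. (212))."""
    dim = len(p)
    return np.array([sum(comb(j, n) * eta**n * (1 - eta)**(j - n) * p[j]
                         for j in range(n, dim))
                     for n in range(dim)])

# Poisson photon statistics of a coherent state, truncated at `dim` Fock states
mu, eta, dim = 4.0, 0.7, 80
poisson = np.array([exp(-mu) * mu**n / factorial(n) for n in range(dim)])
noisy = bernoulli_convolve(poisson, eta)   # Poisson with mean eta*mu = 2.8
```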
The quantities $B(n)$ and $B_\eta(n)$ are nonlinear in the density matrix, so they cannot be measured by averaging a suitable estimator over the homodyne data. Hence, in the evaluation of $B(n)$ one has to reconstruct the photon number probabilities $p(n)$, using the estimator $R_\eta[|n\rangle\langle n|](x,\varphi)$ in Equation (100). The noisy-state probabilities $p_\eta(n)$ are obtained by using the same estimator for $\eta = 1$, namely without recovering the convolution effect of nonunit quantum efficiency. Notice that the estimator does not depend on the phase of the quadrature. Hence, knowledge of the phase of the local oscillator in the homodyne detector is not needed for the tomographic reconstruction, and it can be left fluctuating in a real experiment.

Regarding the estimation of statistical errors, they are generally obtained by dividing the set of homodyne data into blocks, as shown in Section III.C.1. However, in the present case, the nonlinear dependence on the photon number probability introduces a systematic error that is vanishingly small for increasingly larger sets of data. Therefore, the estimated value of $B(n)$ is obtained from the full set of data, instead of averaging the mean values of the different statistical blocks.

In Figures 10 and 11 some numerical results from Ref. [30] are reported, which are obtained by a Monte Carlo simulation of a quantum tomography experiment. The nonclassicality criterion is tested either on a Schrödinger-cat state $|\psi(\alpha)\rangle \propto (|\alpha\rangle + |-\alpha\rangle)$ or on a squeezed state $|\alpha, r\rangle \doteq D(\alpha)S(r)|0\rangle$, wherein $|\alpha\rangle$, $D(\alpha)$, and $S(r)$ denote a coherent state with amplitude $\alpha$,
FIGURE 10. Tomographic measurement of $B(n)$ (dashed trace) with the respective error bars (superimposed in gray shade), along with the theoretical values (solid trace), for a Schrödinger-cat state with average photon number $\bar n = 5$ (left) and for a phase-squeezed state with $\bar n = 5$ and $\bar n_{\mathrm{sq}} = \sinh^2 r = 3$ squeezing photons (right). In both cases the quantum efficiency is $\eta = 0.8$ and the number of simulated experimental data is $10^7$. (From Ref. [30].)
FIGURE 11. Same as Figure 10, but here for $B_\eta(n)$. (From Ref. [30].)

the displacement operator $D(\alpha) = e^{\alpha a^\dagger - \alpha^* a}$, and the squeezing operator $S(r) = e^{r(a^{\dagger 2} - a^2)/2}$, respectively. Figure 10 shows tomographically obtained values of $B(n)$, with the respective error bars superimposed, along with the theoretical values for a Schrödinger-cat state and for a phase-squeezed state ($r > 0$). For the same set of states, the results for $B_\eta(n)$, obtained by tomographic reconstruction of the noisy state, are reported in Figure 11. Let us compare the statistical errors that affect $B(n)$ and $B_\eta(n)$ on the original and the noisy states, respectively. In the first case the error increases with $n$, whereas in the second it remains nearly constant, albeit with less marked oscillations in $B_\eta(n)$ than those in $B(n)$. The nonclassicality of the states analyzed here is experimentally verifiable, as $B_\eta(0) < 0$ by more than five standard deviations. In contrast, for coherent states one obtains small statistical fluctuations around zero for all $n$. Finally, we remark that the simpler test of checking for antibunching or oscillations in the photon number probability in the case of the phase-squeezed state (right of Figures 10 and 11) would not reveal the nonclassical features of such a state.
2. Two-Mode Nonclassicality

In Ref. [30] it is shown how quantum homodyne tomography can also be employed to test the nonclassicality of two-mode states. For a two-mode state, nonclassicality is defined in terms of the nonpositivity of the following phase-averaged two-mode P-function [96]:

$$F(I_1, I_2, \phi) = \frac{1}{2\pi} \int_0^{2\pi} d\phi_1\; P\big(I_1^{1/2} e^{i\phi_1},\, I_2^{1/2} e^{i(\phi_1 + \phi)}\big)\,. \qquad (213)$$
In Ref. [96] it is also proved that a sufficient condition for nonclassicality is

$$C = \big\langle (n_1 - n_2)^2 \big\rangle - \big(\langle n_1 - n_2 \rangle\big)^2 - \langle n_1 + n_2 \rangle < 0\,, \qquad (214)$$

where $n_1$ and $n_2$ are the photon number operators of the two modes. A tomographic test of the inequality in Equation (214) can be performed by averaging the estimators for the involved operators using Table 1. Again, the value $\eta = 1$ can be used to reconstruct the ensemble averages of the noisy state $\rho_\eta$. As an example, we consider the twin-beam state of Equation (195). The theoretical value of $C$ is given by $C = -2|\xi|^2/(1-|\xi|^2) < 0$. With regard to the effect of quantum efficiency $\eta < 1$, the same argument holds as for the single-mode case: one can evaluate $C$ for the twin beams degraded by the effect of loss, and use $\eta = 1$ in the estimators. In this case, the theoretical value of $C$ is simply rescaled, namely

$$C_\eta = -\frac{2\eta^2\, |\xi|^2}{1 - |\xi|^2}\,. \qquad (215)$$
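Equation (215) lends itself to a quick Monte Carlo check (Python/NumPy sketch; the seed and sample size are arbitrary choices): the twin beam has perfectly correlated photon numbers $n_1 = n_2 = n$ with $P(n) = (1-|\xi|^2)|\xi|^{2n}$ from Equation (195), and quantum efficiency acts as independent binomial thinning on each beam:

```python
import numpy as np

rng = np.random.default_rng(1)

def C_eta_estimate(xi2, eta, n_samples=200_000):
    """Monte Carlo estimate of the witness C of Eq. (214) for twin beams
    detected with quantum efficiency eta (independent binomial losses)."""
    # perfectly correlated pair photon number: P(n) = (1 - xi2) * xi2**n
    n = rng.geometric(1 - xi2, n_samples) - 1
    n1 = rng.binomial(n, eta)                 # loss on beam 1
    n2 = rng.binomial(n, eta)                 # loss on beam 2
    d = n1 - n2
    return d.var() - (n1 + n2).mean()         # Eq. (214), <d> = 0 by symmetry
```

For $|\xi|^2 = 0.5$ and $\eta = 0.8$ the estimate fluctuates around $C_\eta = -2\eta^2|\xi|^2/(1-|\xi|^2) = -1.28$, as predicted by Equation (215).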
In Figure 12 we report $C_\eta$ versus $1 - \eta$, with $\eta$ ranging from 1 to 0.3 in steps of 0.05, for the twin beam in Equation (195) with $|\xi|^2 = 0.5$, corresponding to a total average photon number $\langle n_1 + n_2 \rangle = 2$. The values of $C_\eta$ result from

FIGURE 12. Tomographic measurement of the nonclassicality parameter $C_\eta$ for twin beams in Equation (195) with $|\xi|^2 = 0.5$. The results are shown for different values of the quantum efficiency $\eta$ (in steps of 0.05), and for each value the number of simulated data is $4\times10^5$. Statistical errors are shown in gray shade. (From Ref. [30].)
a Monte Carlo simulation of a homodyne tomography experiment with a sample of $4\times10^5$ data. The nonclassicality test in terms of the noisy state gives values of $C_\eta$ that approach the classically positive region as the quantum efficiency decreases. However, the statistical error remains constant and is sufficiently small to allow recognition of the nonclassicality of the twin beams down to $\eta = 0.3$.

We conclude that quantum homodyne tomography allows one to perform nonclassicality tests for single- and two-mode radiation states, even when the quantum efficiency of homodyne detection is rather low. The method involves reconstruction of the photon number probability, or of some suitable function of the number operators, pertaining to the noisy state, namely the state degraded by the less-than-unity quantum efficiency. The noisy-state reconstruction is affected by statistical errors; however, these are sufficiently small that the nonclassicality of the state can be tested even for low values of $\eta$. For the cases considered here, we have shown that the nonclassicality of the states can be proved (deviation from classicality by many error bars) with $10^5$–$10^7$ homodyne data. Moreover, since knowledge of the phase of the local oscillator in the homodyne detector is not needed for the tomographic reconstruction, it can be left fluctuating in a real experiment.
B. Test of State Reduction

In quantum mechanics the state reduction (SR) is still a much-discussed rule. The so-called "projection postulate" was introduced by von Neumann [2] to explain the results of the Compton–Simons experiment, and it was generalized by Lüders [101] for measurements of observables with degenerate spectrum. The consistency of the derivation of the SR rule and its validity for generic measurements have been analyzed with some criticism [102]. In a very general context, the SR rule was derived in a physically consistent way from the Schrödinger equation for the composite system of object and measuring apparatus [103]. An experiment for testing quantum SR is therefore a very interesting matter. Such a test in general is not equivalent to a test of the repeatability hypothesis, since the latter holds only for measurements of observables that are described by self-adjoint operators. For example, joint measurements like that of Arthurs and Kelly [54] are not repeatable, as the reduced states are coherent states, which are not orthogonal.

Quantum optics offers a possibility of testing the SR, because several observables can be chosen to perform different measurements on a fixed system. For instance, one can decide to perform either homodyne or
heterodyne or photon number detection. This is a unique opportunity; in contrast, in particle physics the measurements are mostly quasiclassical and restricted to only a few observables. In addition, optical homodyne tomography allows a precise determination of the quantum system after the SR. A scheme for testing the SR could be based on tomographic measurements of the radiation density matrix after nondemolition measurements. However, such a scheme would reduce the number of observables that are available for the test. Instead, one can take advantage of the correlations between the twin beams of Equation (195) produced by a nondegenerate optical parametric amplifier (NOPA), in which case one can test the SR even for demolitive-type measurements. Indeed, if a measurement is performed on one of the twin beams, the SR can be tested by homodyne tomography on the other beam. This is precisely the scheme for an experimental test of SR proposed in Ref. [31], which is reviewed in the following.

The scheme for the SR test is given in Figure 13. Different kinds of measurements can be performed on beam 1, even though here only the SR for heterodyne detection and photon number detection will be considered.
FIGURE 13. Schematic of the proposed scheme for testing the SR for heterodyne detection. A NOPA generates a pair of twin beams (1 and 2). After heterodyning beam 1, the reduced state of beam 2 is analyzed by homodyne tomography, which is conditioned by the heterodyne outcome. In place of the heterodyne detector one can put any other kind of detector for testing the SR on different observables. We also consider the case of direct photodetection. (From Ref. [31].)
For a system described by a density operator $\rho$, the probability $p(\lambda)\,d\lambda$ that the outcome of a quantum measurement of an observable is in the interval $[\lambda, \lambda + d\lambda)$ is given by Born's rule $p(\lambda)\,d\lambda = \mathrm{Tr}[\rho\,\Pi_\lambda]\,d\lambda$, where $\Pi_\lambda$ is the POVM pertaining to the measurement, satisfying $\Pi_\lambda \geq 0$ and $\int d\lambda\, \Pi_\lambda = I$. For an exact measurement of an observable described by a self-adjoint operator, $\Pi_\lambda$ is just the projector over the eigenvector corresponding to the outcome $\lambda$. In the case of the photon number $a^\dagger a$ the spectrum is discrete and the POVM is $\Pi_m = |m\rangle\langle m|$ for integer eigenvalue $m$. For the Arthurs–Kelly joint measurement of position and momentum (corresponding to a joint measurement of two conjugated quadratures of the field) we have the coherent-state POVM $\Pi_\alpha = \frac{1}{\pi}|\alpha\rangle\langle\alpha|$. When on beam 1 we perform a measurement described by $\Pi_\lambda$, the reduced normalized state of beam 2 is

$$\rho(\lambda) = \frac{\mathrm{Tr}_1\big[\,|\chi\rangle\langle\chi|\,(\Pi_\lambda \otimes 1)\,\big]}{\mathrm{Tr}_{1,2}\big[\,|\chi\rangle\langle\chi|\,(\Pi_\lambda \otimes 1)\,\big]} = \frac{V\,\Pi_\lambda^\tau\,V^\dagger}{p(\lambda)}\,, \qquad (216)$$
where $O^\tau$ denotes the transposed operator (on a fixed basis), $V = (1 - |\xi|^2)^{1/2}\,\xi^{a^\dagger a}$, and $p(\lambda) = \mathrm{Tr}\big[\,V\,\Pi_\lambda^\tau\,V^\dagger\,\big]$ is the probability density of the measurement outcome $\lambda$. In the limit of infinite gain $|\xi| \to 1$ one has $\rho(\lambda) \propto \Pi_\lambda^\tau$. For example, for heterodyne detection with outcome $\alpha$, we have $\rho(\alpha) = |\alpha^*\rangle\langle\alpha^*|$. If the readout detector on beam 1 has quantum efficiency $\eta_r$, Equation (216) is replaced with

$$\rho_{\eta_r}(\lambda) = \frac{V\,\big(\Pi^{\eta_r}_\lambda\big)^\tau\,V^\dagger}{p_{\eta_r}(\lambda)}\,, \qquad (217)$$
where $p_{\eta_r}(\lambda) = \mathrm{Tr}\big[\,V\,(\Pi^{\eta_r}_\lambda)^\tau\,V^\dagger\,\big]$, and $\Pi^{\eta_r}_\lambda$ is the POVM for the measurement with quantum efficiency $\eta_r$. As shown in Section II.D, for heterodyne detection one has the Gaussian convolution

$$\Pi^{\eta_r}_\alpha = \frac{1}{\pi} \int_{\mathbb{C}} \frac{d^2 z}{\pi\,\Delta^2_{\eta_r}}\; e^{-|z - \alpha|^2/\Delta^2_{\eta_r}}\; |z\rangle\langle z|\,, \qquad (218)$$
with $\Delta^2_{\eta_r} = (1 - \eta_r)/\eta_r$. For direct photodetection, $\Pi_m = |m\rangle\langle m|$ is replaced with the Bernoulli convolution

$$\Pi^{\eta_r}_m = \sum_{j=m}^{\infty} \binom{j}{m}\, \eta_r^m\, (1 - \eta_r)^{j - m}\, |j\rangle\langle j|\,. \qquad (219)$$
The experimental test proposed here consists of performing conditional homodyne tomography on beam 2, given the outcome $\lambda$ of the measurement on beam 1. We can directly measure the "fidelity of the test"

$$F(\lambda) = \mathrm{Tr}\big[\,\rho_{\eta_r}(\lambda)\;\rho_{\mathrm{meas}}(\lambda)\,\big]\,, \qquad (220)$$

where $\rho_{\eta_r}(\lambda)$ is the theoretical state in Equation (217), and $\rho_{\mathrm{meas}}(\lambda)$ is the experimentally measured state on beam 2. Notice that we use the term "fidelity" even if $F(\lambda)$ is a proper fidelity only when at least one of the two states is pure, which occurs in the limit of unit quantum efficiency $\eta_r$. In the following we evaluate the theoretical value of $F(\lambda)$ and compare it with the tomographically measured value.

The fidelity (220) can be directly measured by homodyne tomography using the estimator for the operator $\rho_{\eta_r}(\lambda)$, namely

$$F(\lambda) = \int_0^{\pi} \frac{d\varphi}{\pi} \int_{-\infty}^{+\infty} dx\; p_{\eta_h}(x, \varphi; \lambda)\, R_{\eta_h}\big[\rho_{\eta_r}(\lambda)\big](x, \varphi)\,, \qquad (221)$$
where $p_{\eta_h}(x, \varphi; \lambda)$ is the conditional homodyne probability distribution for outcome $\lambda$ at the readout detector. For heterodyne detection on beam 1 with outcome $\alpha \in \mathbb{C}$, the reduced state on beam 2 is given by the displaced thermal state

$$\rho_{\eta_r}(\alpha) = \Delta\, D(\gamma)\,(1 - \Delta)^{a^\dagger a}\, D^\dagger(\gamma)\,, \qquad (222)$$

where

$$\Delta = 1 + (\eta_r - 1)\,|\xi|^2\,, \qquad \gamma = \frac{\eta_r\,\xi\,\alpha^*}{\Delta}\,. \qquad (223)$$

The estimator in Equation (221) is given by

$$R_{\eta_h}\big[\rho_{\eta_r}(\alpha)\big](x, \varphi) = \frac{2\eta_h\,\Delta}{2\eta_h - \Delta}\; \Phi\!\left(1, \tfrac{1}{2};\; -\frac{2\eta_h\,\Delta}{2\eta_h - \Delta}\,(x - \gamma_\varphi)^2 \right), \qquad (224)$$

where $\gamma_\varphi = \mathrm{Re}\,(e^{-i\varphi}\gamma)$, and $\Phi(a, b; z)$ denotes the customary confluent hypergeometric function. The estimator in Equation (224) is bounded for $\eta_h > \Delta/2$; one then needs to have

$$\eta_h > \frac{1 - |\xi|^2\,(1 - \eta_r)}{2}\,. \qquad (225)$$
As one can see from Equation (225), for $\eta_h > 0.5$ the fidelity can be measured for any value of $\eta_r$ and any gain parameter $\xi$ of the NOPA. We recall that the condition $\eta_h > 0.5$ is required for the measurement of the density matrix. However, in this direct measurement of the fidelity the reconstruction of the density matrix is bypassed, and we see from Equation (225) that the bound $\eta_h = 0.5$ can be lowered. The measured fidelity $F(\alpha)$ in Equation (221), with $\rho_{\eta_r}(\alpha)$ as given in Equation (222), must be compared with the theoretical value

$$F_{\mathrm{th}} = \frac{\Delta}{2 - \Delta}\,, \qquad (226)$$
which is independent of $\alpha$.

For direct photodetection on beam 1 with outcome $n$, the reduced state on beam 2 is given by

$$\rho_{\eta_r}(n) = \frac{\Delta^{n+1}}{(1 - \Delta)^n}\, \binom{a^\dagger a}{n}\, (1 - \Delta)^{a^\dagger a}\,. \qquad (227)$$
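Equation (227) can be cross-checked against the defining Equation (217): with $V = (1-|\xi|^2)^{1/2}\,\xi^{a^\dagger a}$, the Bernoulli POVM of Equation (219), and $\Delta = 1 + (\eta_r - 1)|\xi|^2$ from Equation (223), everything is diagonal in the Fock basis. A sketch (Python/NumPy; the parameter values and the truncation are arbitrary choices):

```python
import numpy as np
from math import comb

def reduced_diag_from_povm(n, xi2, eta_r, dim=300):
    """Diagonal of rho_{eta_r}(n) built from Eq. (217): V Pi_n^tau V^dag
    normalized, with V = (1-|xi|^2)^(1/2) xi^(a^dag a).  Both the POVM
    element of Eq. (219) and V are diagonal in the Fock basis (and the
    transpose of a diagonal operator is itself)."""
    j = np.arange(dim)
    povm = np.array([comb(k, n) * eta_r**n * (1 - eta_r)**(k - n)
                     if k >= n else 0.0 for k in range(dim)])
    unnorm = (1 - xi2) * xi2**j * povm
    return unnorm / unnorm.sum()

def reduced_diag_closed(n, xi2, eta_r, dim=300):
    """Diagonal of the closed form Eq. (227)."""
    Delta = 1 + (eta_r - 1) * xi2
    j = np.arange(dim)
    pref = Delta**(n + 1) / (1 - Delta)**n
    return pref * np.array([comb(k, n) for k in range(dim)]) * (1 - Delta)**j
```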
The estimator for the fidelity measurement is

$$R_{\eta_h}\big[\rho_{\eta_r}(n)\big](x, \varphi) = \frac{\Delta^{n+1}}{(1 - \Delta)^n}\; \frac{(\partial_z)^n}{n!}\bigg|_{z=0}\; \frac{2\eta_h}{2\eta_h - \Delta + z}\; \Phi\!\left(1, \tfrac{1}{2};\; -\frac{2\eta_h\,(\Delta - z)}{2\eta_h - \Delta + z}\, x^2 \right). \qquad (228)$$
We see that the same bound of Equation (225) holds. In this case the measured fidelity $F(n)$ must be compared with the theoretical value

$$F_{\mathrm{th}}(n) = \Delta^{2 + 2n}\; F\big(n + 1,\, n + 1;\, 1;\, (1 - \Delta)^2\big)\,, \qquad (229)$$
where $F(a, b; c; z)$ denotes the customary hypergeometric function.

Several simulations have been reported in Ref. [31] for both heterodyne detection and photodetection on beam 1. In the former case the quadrature probability distribution pertaining to the reduced state in Equation (222) on beam 2 was simulated and the estimator in Equation (224) was averaged; in the latter case the reduced state in Equation (227) and the estimator in Equation (228) were used. Numerical results for the fidelity were thus obtained for different values of the quantum efficiencies $\eta_r$ and $\eta_h$, and of the NOPA gain parameter $\xi$. A decisive test can be performed with samples of just a few thousand measurements. The statistical error in the measurement was found to be rather insensitive to both quantum efficiencies and to the NOPA gain.
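The closed form in Equation (229) can be verified against a direct Fock-basis sum of $\mathrm{Tr}[\rho_{\eta_r}(n)^2]$ from Equation (227); the sketch below (plain Python; the parameter values are arbitrary, and the hypergeometric series is summed term by term) checks both that identity and the $n = 0$ limit, Equation (226):

```python
from math import comb

def fidelity_direct(Delta, n, j_max=500):
    """Tr[rho(n)^2] from the diagonal of Eq. (227), summed directly."""
    s = sum(comb(j, n)**2 * (1 - Delta)**(2 * j) for j in range(n, j_max))
    return Delta**(2 * n + 2) * (1 - Delta)**(-2 * n) * s

def fidelity_hypergeometric(Delta, n, k_max=500):
    """Eq. (229): Delta^(2+2n) * 2F1(n+1, n+1; 1; (1-Delta)^2),
    with the Gauss series summed via its term ratio."""
    z = (1 - Delta)**2
    term, total = 1.0, 1.0
    for k in range(k_max):
        term *= (n + 1 + k)**2 / ((1 + k) * (k + 1)) * z
        total += term
    return Delta**(2 + 2 * n) * total
```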
C. Tomography of Coherent Signals and Applications

Quantum homodyne tomography has proved useful in various experimental situations, such as for measuring the photon statistics of a semiconductor laser [10], for determining the density matrix of a squeezed vacuum [11] and the joint photon number probability distribution of a twin beam created by a nondegenerate optical parametric amplifier [86], and for reconstructing the quantum states of spatial modes with an array detector [104]. In this section we review some experimental results on homodyne tomography with coherent states, with application to the estimation of the loss introduced by simple optical components [32].

The experiment was performed in the Quantum Optics Lab of the University of Naples, and a schematic is presented in Figure 14. The principal radiation source is provided by a monolithic Nd:YAG laser (about 50 mW at 1064 nm; Lightwave, model 142). The laser has a linewidth of less than 10 kHz/ms with a frequency jitter of less than 300 kHz/s, while its intensity spectrum is shot-noise limited above 2.5 MHz. The laser emits a linearly polarized beam in a TEM$_{00}$ mode, which is split into two parts by a beam splitter. One part provides the strong local oscillator for the homodyne detector. The other part, typically less than 200 $\mu$W, is the homodyne signal. The optical paths traveled by the local oscillator and
the signal beams are carefully adjusted to obtain a visibility typically above 75%, measured at one of the homodyne output ports. The signal beam is modulated by means of a phase electrooptic modulator (EOM, Linos Photonics PM0202) at frequency $\Omega = 4$ MHz, and a half-wave plate (HWP2, HWP3) is mounted in each path to carefully match the polarization state at the homodyne input. The detector is composed of a 50/50 beam splitter (BS), two amplified photodiodes (PD1, PD2), and a power combiner. The difference photocurrent is demodulated at 4 MHz by means of an electrical mixer. In this way the detection occurs outside any technical noise and, more important, in a spectral region where the laser does not carry excess noise.

The phase modulation added to the signal beam moves a certain number of photons, proportional to the square of the modulation depth, from the carrier optical frequency $\omega$ to the sidebands at $\omega \pm \Omega$, so generating two weak coherent states with engineered average photon number at frequencies $\omega \pm \Omega$. The sum sideband mode is then detected as a controlled perturbation attached to the signal beam. The demodulated current is acquired by a digital oscilloscope (Tektronix TDS 520D) with 8-bit resolution and a record length of 250,000 points per run. The acquisition is triggered by a triangular-shaped waveform applied to the PZT mounted on the local oscillator path. The piezo ramp is adjusted to obtain a $2\pi$ phase variation between the local oscillator and the signal beam in an acquisition window.

The homodyne data to be used for tomographic reconstruction of the state have been calibrated according to the noise of the vacuum state. This is obtained by acquiring a set of data leaving the signal beam undisturbed while scanning the local oscillator phase. It is important to note that in the case of the vacuum state no role is played by the visibility at the homodyne beam splitter.
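The vacuum calibration amounts to fixing the scale of the raw quadrature record so that the vacuum noise has the theoretical variance $1/4$ of $x = (a + a^\dagger)/2$. A minimal sketch (Python/NumPy; the raw-data model below is hypothetical, standing in for real oscilloscope records):

```python
import numpy as np

rng = np.random.default_rng(2)

def calibrate(raw_signal, raw_vacuum):
    """Rescale raw homodyne samples so that the vacuum record has
    quadrature variance 1/4 (shot-noise level for x = (a + a^dag)/2)."""
    scale = 0.5 / raw_vacuum.std()
    return raw_signal * scale

# hypothetical raw records in arbitrary ADC units, with the same unknown gain
raw_vac = rng.normal(0.0, 37.0, 100_000)    # vacuum reference run
raw_sig = rng.normal(55.0, 37.0, 100_000)   # signal run at one LO phase
x = calibrate(raw_sig, raw_vac)             # calibrated quadrature samples
```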
The tomographic samples consist of $N$ homodyne data $\{x_j, \varphi_j\}_{j=1,\ldots,N}$, with phases $\varphi_j$ equally spaced with respect to the local oscillator. Since the piezo ramp is active during the whole acquisition time, we have a single value $x_j$ for any phase $\varphi_j$. From the calibrated data we first reconstruct the quantum state of the homodyne signal. According to the experimental setup, we expect a coherent signal whose nominal amplitude can be adjusted by varying the modulation depth of the optical mixer. However, since we do not compensate for the quantum efficiency of the photodiodes in the homodyne detector ($\simeq 90\%$), we expect to reveal coherent signals with reduced amplitude. In addition, the amplitude is further reduced by the nonmaximum visibility (ranging from 75 to 85%) at the homodyne beam splitter. In Figure 15 we report a typical reconstruction, together with the reconstruction of the vacuum state used for calibration. For both states, we
QUANTUM TOMOGRAPHY
279
FIGURE 15. Reconstruction of the quantum state of the signal, and of the vacuum state used for calibration. For both states, from left to right, we report the raw data, a histogram of the photon number distribution, and a contour plot of the Wigner function. The reconstruction has been performed from a sample of N = 242,250 homodyne data. The coherent signal has an estimated average photon number ⟨a†a⟩ = 8.4. The solid line denotes the theoretical photon distribution of a coherent state with this number of photons. Statistical errors on the matrix elements are about 2%. The slight phase asymmetry in the Wigner distribution corresponds to a value of about 2% of the maximum. (From Ref. [32].)
report the raw data, the photon number distribution ρ_nn, and a contour plot of the Wigner function. The matrix elements are obtained by sampling the corresponding estimators in Equation (100), whereas the confidence intervals for the diagonal elements are given by ε_nn = σ_nn/√N, σ_nn being the rms deviation of the estimator over the data. For the off-diagonal elements the confidence intervals are evaluated for the real and imaginary parts separately. In order to see the quantum state as a whole, we also report the reconstruction of the Wigner function of the field, which can be expressed in terms of the matrix elements as the discrete Fourier transform
\[
W(\alpha,\alpha^{*})=\mathrm{Re}\sum_{d=0}^{\infty}e^{i d\varphi}\sum_{n=0}^{\infty}\Lambda(n,d;|\alpha|)\,\rho_{n,n+d}
\tag{230}
\]
280
MAURO D’ARIANO ET AL.
where φ = arg α, and
\[
\Lambda(n,d;|\alpha|)=(-)^{n}\,
\sqrt{\frac{2\,n!}{2(2-\delta_{d0})\,(n+d)!}}\;
|2\alpha|^{d}\,e^{-2|\alpha|^{2}}\,L_{n}^{d}\!\left(|2\alpha|^{2}\right),
\tag{231}
\]
L_n^d(x) denoting the generalized Laguerre polynomials. Of course, the series in Equation (230) has to be truncated at some point, and therefore the Wigner function can be reconstructed only at some finite resolution. Once the coherence of the signal has been established, we may use homodyne tomography to estimate the loss imposed by a passive optical component, such as an optical filter. The procedure may be outlined as follows. We first estimate the initial mean photon number n̄₀ = |α₀|² of the signal beam, and then the same quantity after inserting an optical neutral density filter in the signal path. If Γ is the loss parameter, then the coherent amplitude is reduced to α_Γ = α₀e^{−Γ}, and the intensity to n̄_Γ = n̄₀e^{−2Γ}. The estimation of the mean photon number can be performed adaptively on the data, using the general method presented in Section III.D.2. One takes the average of the estimator
\[
R[a^{\dagger}a](x,\varphi)=2x^{2}-\frac{1}{2}+\mu\,e^{i2\varphi}+\mu^{*}e^{-i2\varphi},
\tag{232}
\]
where μ is a parameter to be determined in order to minimize fluctuations. As proved in Ref. [22], one has μ = −½⟨a†²⟩, which itself can be obtained from homodyne data. In practice, one uses the data sample twice: first to evaluate μ, then to obtain the estimate of the mean photon number. In Figure 16 the tomographic determinations of n̄ are compared with the expected values for three sets of experiments, corresponding to three different initial amplitudes. The expected values are given by n̄ = n̄₀e^{−2Γ}V, where Γ is the value obtained by comparing the signal d.c. currents I₀ and I_Γ at the homodyne photodiodes, and V = V_Γ/V₀ is the relative visibility. The solid line in Figure 16 denotes these values. The line is not continuous owing to variations of the visibility. It is apparent from the plot that the estimation is reliable over the whole range of values we could explore. It is worth noting that the estimation is absolute, i.e., it does not depend on knowledge of the initial amplitude, and it is robust, since it can be performed independently of the quantum efficiency of the homodyne detector. One may notice that the estimation of loss could also be pursued by measuring an appropriate observable, typically the intensity of the light beam with and without the filter. However, this is a concrete possibility only
FIGURE 16. Estimation of the mean photon number of a coherent signal as a function of the loss imposed by an optical filter. Three sets of experiments, corresponding to three different initial amplitudes, are reported. Open circles are the tomographic determinations, whereas the solid lines denote the expected values, as follow from the nominal values of loss and visibility at the homodyne detector. Statistical errors are within the circles. (From Ref. [32].)
for high-amplitude signals, whereas losses on weak coherent states cannot be properly characterized either by direct photocounting with photodiodes (due to the low quantum efficiency and large fluctuations) or by avalanche photodetectors (due to the impossibility of discriminating among the number of photons). On the contrary, homodyne tomography provides the mean intensity (actually the whole photon distribution) independently of the signal level, thus allowing a precise characterization also in the quantum regime. Indeed, in Ref. [22] the adaptive tomographic determination of the mean photon number has been extensively applied to (numerically simulated) homodyne data for coherent states of various amplitudes. The analysis has shown that the determination is reliable even for small samples and that the precision is not much affected by the intensity of the signal.
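The loss-estimation procedure above can be mimicked on simulated data. The sketch below is our own illustration (not the code of Ref. [22]): it averages the estimator 2x² − 1/2, i.e., Equation (232) with the null-function parameter μ set to zero, over ideal (η = 1) homodyne outcomes of a coherent state, with the attenuated amplitude α₀e^{−Γ} supplied as an assumed input.

```python
import numpy as np

rng = np.random.default_rng(1)

def estimate_nbar(alpha, n_samples=200_000):
    """Average of the estimator R[a^dag a](x, phi) = 2 x^2 - 1/2 over
    simulated ideal homodyne data of a coherent state |alpha> (alpha real)."""
    phi = rng.uniform(0.0, np.pi, n_samples)   # scanned LO phase
    x = rng.normal(alpha * np.cos(phi), 0.5)   # <X_phi> = alpha cos(phi), var 1/4
    return np.mean(2.0 * x**2 - 0.5)

alpha0, gamma = 2.0, 0.5                         # initial amplitude, loss parameter
n0_est = estimate_nbar(alpha0)                   # expect |alpha0|^2 = 4
n_att_est = estimate_nbar(alpha0 * np.exp(-gamma))  # expect n0 * exp(-2*gamma)
```

Averaging 2x² − 1/2 over the scanned phase gives 2⟨x²⟩ − 1/2 = |α|² for any amplitude, which is the sense in which the estimation is absolute: no knowledge of the initial amplitude enters the estimator.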
VII. TOMOGRAPHY OF A QUANTUM DEVICE

If we want to determine experimentally the operation of a quantum device, we need, by definition, quantum tomography. In fact, the characterization of the device operation could be done by running a basis of possible known inputs and determining the corresponding outputs by quantum tomography. In quantum mechanics the inputs are density operators, and the role
of the transfer matrix is played by the so-called quantum operation of the device, here denoted by E. Thus the output state ρ_out (apart from a possible normalization) is given by the quantum operation applied to the input state as follows:
\[
\rho_{\mathrm{out}}=\mathcal{E}(\rho_{\mathrm{in}}).
\tag{233}
\]
Since the set of states actually spans a space of operators, if we want to characterize E completely we need to run a complete orthogonal basis of quantum states |n⟩ (n = 0, 1, 2, …), along with their linear combinations (1/√2)(|n′⟩ + i^k|n″⟩), with k = 0, 1, 2, 3 and i denoting the imaginary unit. However, the availability of such a set of states in the laboratory is, by itself, a very difficult technological problem. For example, for an optical device, the states |n⟩ are those with a precise number n of photons and, apart from very small n—say at most n = 2—they have never been achieved in the laboratory, whereas preparing their superpositions remains a dream for experimentalists, especially if n ≫ 1 (a kind of Schrödinger kitten state). The idea of obtaining the quantum operation of a device by scanning the inputs and performing tomography of the corresponding outputs is the basis of the early methods proposed in Refs. [105,106]. Due to the mentioned problems with the availability of input states, both methods have limited application. The method of Ref. [105] was designed for NMR quantum processing, whereas the method of Ref. [106] was conceived for determining the Liouvillian of a phase-insensitive amplifier, namely for a case in which the quantum operation has no off-diagonal matrix elements, to evaluate which one needs the superpositions (1/√2)(|n′⟩ + i^k|n″⟩) with k = 0, 1, 2, 3 mentioned above. The problem of the availability of input states and their superpositions was partially solved by the method of Ref. [107], where it was suggested to use randomly drawn coherent states to estimate the quantum operation of an optical device via a maximum likelihood approach. This method, however, cannot be used for quantum systems different from e.m. radiation—such as finite dimensional systems, i.e., qubits—due to the peculiarity of coherent states. The solution to the problem came with the method of Ref.
[25], where the problem of the availability of input states was solved by using a single bipartite entangled input, which is equivalent to running all possible input states in a kind of ‘‘quantum parallel’’ fashion (bipartite entangled states are nowadays easily available in most quantum systems of interest). The method is also very simple and effective, and its experimental feasibility (for single-photon polarization-encoded qubits) has already been demonstrated in an experiment performed in the Francesco De Martini laboratory in Rome La Sapienza [108]. In the next sections
we will review the general method and report some computer simulated results from Ref. [25].

A. The Method

As already mentioned, the description of a general state transformation in quantum mechanics is given in terms of the so-called quantum operation. The state transformation due to the quantum operation E is given as follows:
\[
\rho \longrightarrow \frac{\mathcal{E}(\rho)}{\mathrm{Tr}[\mathcal{E}(\rho)]}.
\tag{234}
\]
The transformation occurs with probability p = Tr[E(ρ)] ≤ 1. The quantum operation E is a linear, trace-decreasing, completely positive (CP) map. We recall that a map is completely positive if it preserves positivity in general, i.e., also when applied locally to an entangled state. In other words, upon denoting by I the identical map on the Hilbert space K of a second quantum system, the extended map E ⊗ I on H ⊗ K is positive for any extension K. Typically, the CP map is written using a Kraus decomposition [109] as follows:
\[
\mathcal{E}(\rho)=\sum_{n}K_{n}\,\rho\,K_{n}^{\dagger},
\tag{235}
\]
where the operators K_n satisfy
\[
\sum_{n}K_{n}^{\dagger}K_{n}\leq I.
\tag{236}
\]
The transformation (235) occurs with generally nonunit probability Tr[E(ρ)] ≤ 1, and the probability is unity independently of ρ when E is trace-preserving, i.e., when the equal sign holds in Equation (236). The particular case of unitary transformations corresponds to having just one term K₁ = U in the sum (235), with U unitary. However, one can also consider nonunitary operations with one term only, namely
\[
\mathcal{E}(\rho)=A\,\rho\,A^{\dagger},
\tag{237}
\]
where A is a contraction, i.e., ‖A‖ ≤ 1. Such operations map pure states into pure states and describe, for example, the state reduction due to a measurement apparatus for a particular fixed outcome, which occurs with probability Tr[AρA†] ≤ 1.
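As a concrete instance of Equations (235)–(236), the snippet below builds the Kraus operators of a single-qubit amplitude-damping channel (our example, not one from the text) and checks trace preservation, i.e., the equality case of Equation (236).

```python
import numpy as np

# Example CP map: single-qubit amplitude damping with decay probability p.
p = 0.3
K0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1.0 - p)]])
K1 = np.array([[0.0, np.sqrt(p)], [0.0, 0.0]])

def channel(rho):
    """E(rho) = sum_n K_n rho K_n^dag  (Kraus decomposition, Eq. (235))."""
    return K0 @ rho @ K0.conj().T + K1 @ rho @ K1.conj().T

# Trace preservation: sum_n K_n^dag K_n = I, the equality case of Eq. (236).
completeness = K0.conj().T @ K0 + K1.conj().T @ K1
assert np.allclose(completeness, np.eye(2))

rho = np.array([[0.25, 0.25], [0.25, 0.75]])   # some input state
out = channel(rho)
assert np.isclose(np.trace(out), 1.0)          # occurs with probability one
```

For a trace-decreasing map (strict inequality in Equation (236)) the trace of the output would instead give the probability of the transformation.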
In the following we will use the notation for bipartite pure states introduced in Equation (45), and we will denote by O^τ and O* the transpose and the conjugate of an operator O with respect to some prechosen orthonormal basis. The basic idea of the method of Ref. [25] is the following. An unknown quantum operation E can be determined experimentally through quantum tomography, by exploiting the one-to-one correspondence E ↔ R_E between quantum operations E and positive operators R_E on two copies of the Hilbert space H ⊗ H:
\[
R_{\mathcal{E}}=\mathcal{E}\otimes\mathcal{I}\,\big(|I\rangle\rangle\langle\langle I|\big),
\qquad
\mathcal{E}(\rho)=\mathrm{Tr}_{2}\big[(I\otimes\rho^{\tau})\,R_{\mathcal{E}}\big].
\tag{238}
\]
Notice that the vector |I⟩⟩ represents a (unnormalized) maximally entangled state. If we consider a bipartite input state |ψ⟩⟩ and operate with E only on one Hilbert space, as in Figure 17, the output state is given by
\[
R(\psi)\doteq\mathcal{E}\otimes\mathcal{I}\,\big(|\psi\rangle\rangle\langle\langle\psi|\big).
\tag{239}
\]
For invertible ψ,
the two matrices R(I) ≡ R_E and R(ψ) are related as follows:
\[
R(I)=\big(I\otimes\psi^{-1\,\tau}\big)\,R(\psi)\,\big(I\otimes\psi^{-1\,*}\big).
\tag{240}
\]
Hence, the (four-index) quantum operation matrix R_E can be obtained by estimating via quantum tomography the following ensemble averages:
\[
\langle\langle i,j|\,R(I)\,|l,k\rangle\rangle
=\mathrm{Tr}\Big[R(\psi)\,\Big(|l\rangle\langle i|\otimes\psi^{-1\,*}\,|k\rangle\langle j|\,\psi^{-1\,\tau}\Big)\Big].
\tag{241}
\]
FIGURE 17. General scheme of the method for the tomographic estimation of a quantum operation. Two identical quantum systems are prepared in a bipartite state |ψ⟩⟩, with invertible ψ. One of the two systems undergoes the quantum operation E, whereas the other is left untouched. At the output one performs a quantum tomographic estimation, by jointly measuring two observables X_λ and X_λ′ from two quorums {X_λ} and {X_λ′} for the two Hilbert spaces, such as two different quadratures of the two field modes in two-mode homodyne tomography. (From Ref. [25].)
Then one simply has to perform a quantum tomographic estimation, by jointly measuring two observables X_λ and X_λ′ from two quorums {X_λ} and {X_λ′} for the two entangled quantum systems.

B. An Example in the Optical Domain

In Ref. [25] it is shown that the proposed method for the quantum tomography of a device can actually be performed using joint homodyne tomography on a twin beam from downconversion of the vacuum, with an experimental setup similar to that used in the experiment of Ref. [86]. The feasibility analysis considers, as an example, the experimental determination of the quantum operation corresponding to the unitary displacement operator D(z) = e^{za† − z*a}. The pertaining matrix R(I) is given by
\[
R(I)=|D(z)\rangle\rangle\langle\langle D(z)|,
\tag{242}
\]
which is the (unnormalizable) eigenstate of the operator a − b† with eigenvalue z, as shown in Section II.D. As the input bipartite state, one uses the twin beam from parametric downconversion of Equation (195), which is clearly invertible, since
\[
\psi=\sqrt{1-|\xi|^{2}}\;\xi^{a^{\dagger}a},
\qquad
\psi^{-1}=\frac{1}{\sqrt{1-|\xi|^{2}}}\;\xi^{-a^{\dagger}a}.
\tag{243}
\]
The experimental apparatus is the same as in the experiment of Ref. [86], where the twin beam is provided by a nondegenerate optical parametric amplifier (a KTP crystal) pumped by the second harmonic of a Q-switched mode-locked Nd:YAG laser, which produces a 100-MHz train of 120-ps duration pulses at 1064 nm. The orthogonally polarized twin beams emitted by the KTP crystal (one of which is displaced by D(z) using a nearly transparent beam splitter and a strong local oscillator) are separately detected by two balanced homodyne detectors that use two independent local oscillators derived from the same laser. This provides the joint tomography of the quadratures X_φ′ and X_φ″ needed for the reconstruction. The only experimental problem that still needs to be addressed (even though it is practically solvable) with respect to the original experiment of Ref. [86] is the control of the quadrature phases φ′ and φ″ with respect to the LO, which in the original experiment were random. In Figure 18 the results of a simulated experiment are reported, for displacement parameter z = 1, and for some typical values of the quantum efficiency at the homodyne detectors and of the total average photon
FIGURE 18. Homodyne tomography of the quantum operation corresponding to the unitary displacement operator D(z), with z = 1. The reconstructed diagonal elements A_nn = ⟨n|D(z)|n⟩ are shown (thin solid lines on an extended abscissa range, with their respective error bars in gray shade), compared to the theoretical values (thick solid lines). Similar results are obtained for the off-diagonal terms. The reconstruction has been achieved using at the input the twin-beam state of Equation (195), with total average photon number n̄ and quantum efficiency η at the homodyne detectors. Left: n̄ = 5, η = 0.9, and 150 blocks of 10⁴ data have been used. Right: n̄ = 3, η = 0.7, and 300 blocks of 2×10⁵ data have been used. (From Ref. [25].)
number n̄ of the twin beam. The diagonal elements A_nn = ⟨n|D(z)|n⟩ = [⟨n|⟨n| R_{D(z)} |n⟩|n⟩]^{1/2} are plotted for the displacement operator with z = 1. The reconstructed values are shown by thin solid lines on an extended abscissa range, with their respective error bars in gray shade, and compared to the theoretical probability (thick solid line). A good reconstruction of the matrix can be achieved in the given range with n̄ ≳ 1, quantum efficiency as low as η = 0.7, and 10⁶–10⁷ data. The number of data can be decreased by a factor of 100–1000 using the tomographic max-likelihood techniques of Ref. [23], at the expense, however, of the complexity of the algorithm. Improving the quantum efficiency and increasing the amplifier gain (toward a maximally entangled state) have the effect of making the statistical errors smaller and more uniform versus the photon labels n and m of the matrix A_nm. It is worth emphasizing that the quantum tomographic method of Ref. [25] for measuring the matrix of a quantum operation can be much improved by means of a max-likelihood strategy aimed at the estimation of some unknown parameters of the quantum operation. In this case, instead of obtaining the matrix elements of R(I) from the ensemble averages in Equation (241), one parameterizes R(I) in terms of unknown quantities to be determined experimentally, and the likelihood is maximized for the set of experimental data at various randomly selected (tensor) quorum elements,
keeping the same fixed bipartite input state. This method is especially useful for a very precise experimental comparison between the characteristics of a given device (e.g., the gain and loss of an active fiber) and those of a quantum standard reference.
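The theoretical diagonal elements A_nn = ⟨n|D(z)|n⟩ that appear in Figure 18 satisfy the closed form ⟨n|D(z)|n⟩ = e^{−|z|²/2}L_n(|z|²). The sketch below is our own numerical check (not from the text): it exponentiates the truncated Fock-space generator of D(z), assuming a cutoff large enough that truncation errors are negligible for small n.

```python
import numpy as np
from numpy.polynomial import laguerre

dim, z = 40, 1.0                               # Fock-space cutoff, displacement
a = np.diag(np.sqrt(np.arange(1, dim)), k=1)   # truncated annihilation operator
G = z * a.conj().T - np.conj(z) * a            # generator of D(z) = exp(G)

# G is anti-Hermitian, so iG is Hermitian: exponentiate by diagonalization.
w, V = np.linalg.eigh(1j * G)
D = V @ np.diag(np.exp(-1j * w)) @ V.conj().T  # D(z) = exp(-i (iG))

A_diag = np.real(np.diag(D))                   # A_nn = <n|D(z)|n>, real for real z

# Closed form: <n|D(z)|n> = exp(-|z|^2/2) L_n(|z|^2)
for n in range(6):
    c = np.zeros(n + 1); c[n] = 1.0            # Laguerre-series coefficients of L_n
    expected = np.exp(-abs(z)**2 / 2) * laguerre.lagval(abs(z)**2, c)
    assert abs(A_diag[n] - expected) < 1e-8
```

The same truncated-matrix construction can be used to generate the theoretical curves against which a tomographic reconstruction of R_{D(z)} is compared.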
VIII. MAXIMUM LIKELIHOOD METHOD IN QUANTUM ESTIMATION

Quantum estimation of states, observables, and parameters is, from very basic principles, a matter of statistical inference from a population sampling, and the most comprehensive quantum estimation procedure is quantum tomography. As we have shown in Section III, the expectation value of an operator is obtained by averaging an estimator over the experimental data of a ‘‘quorum’’ of observables. The method is very general and efficient; however, in the averaging procedure we have fluctuations which result in relatively large statistical errors. Another relevant strategy, the maximum likelihood (ML) method, can be used for measuring unknown parameters of a transformation on a given state [33], or for measuring the matrix elements of the density operator itself [23]. The ML strategy [110,111] is an entirely different approach to quantum state measurement compared to the standard quantum tomographic techniques. The ML procedure consists in finding the quantum state, or the values of the parameters, that are most likely to generate the observed data. This idea can be quantified and implemented using the concept of the likelihood functional. As regards state estimation, the ML method estimates the quantum state as a whole. Such a procedure incorporates a priori knowledge about the relations between the elements of the density matrix. This guarantees positivity and normalization of the matrix, with the result of a substantial reduction of statistical errors. Regarding the estimation of specific parameters, we notice that in many cases the resulting estimators are efficient, unbiased, and consistent, thus providing a statistically reliable determination. As we will show, with the ML method only small samples of data are required for a precise determination. However, we want to emphasize that such a method is not always the optimal solution of the tomographic problem, since it suffers from some major limitations.
Besides being biased due to the Hilbert space truncation—even though the bias can be very small if, from other methods, we know where to truncate—it cannot be generalized to the estimation of any ensemble average, but just of a set of
parameters on which the density matrix depends. In addition, for an increasing number of parameters the method has exponential complexity. In the following we will review the ML methods proposed in Refs. [23] and [33], by deriving the likelihood functional, applying the ML method to quantum state reconstruction, with examples for both radiation and spin systems, and, finally, considering the ML estimation for the relevant class of Gaussian states in quantum optics.

A. Maximum Likelihood Principle

Here we briefly review the theory of the ML estimation of a single parameter. The generalization to several parameters, as for example the elements of the density matrix, is straightforward. The only point that should be carefully analyzed is the parameterization of the multidimensional quantity to be estimated. In the next section the specific case of the density matrix will be discussed. Let p(x|λ) be the probability density of a random variable x, conditioned on the value of the parameter λ. The form of p is known, but the true value of λ is unknown, and will be estimated from the result of a measurement of x. Let x₁, x₂, …, x_N be a random sample of size N. The joint probability density of the independent random variables x₁, x₂, …, x_N (the global probability of the sample) is given by
\[
\mathcal{L}(x_{1},x_{2},\ldots,x_{N}\,|\,\lambda)=\prod_{k=1}^{N}p(x_{k}|\lambda),
\tag{244}
\]
and is called the likelihood function of the given data sample (hereafter we will suppress the dependence of L on the data). The maximum likelihood estimator (MLE) of the parameter λ is defined as the quantity λ_ml ≡ λ_ml({x_k}) that maximizes L(λ) for variations of λ, namely λ_ml is given by the solution of the equations
\[
\frac{\partial\mathcal{L}(\lambda)}{\partial\lambda}=0;\qquad
\frac{\partial^{2}\mathcal{L}(\lambda)}{\partial\lambda^{2}}<0.
\tag{245}
\]
The first equation is equivalent to ∂L/∂λ = 0, where
\[
L(\lambda)=\log\mathcal{L}(\lambda)=\sum_{k=1}^{N}\log p(x_{k}|\lambda)
\tag{246}
\]
is the so-called log-likelihood function.
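As a toy illustration of Equations (244)–(246) (our own example, not from the text), the snippet below draws a sample from an exponential density p(x|λ) = λe^{−λx} and locates the maximum of the log-likelihood numerically; for this model the maximum is known analytically to be λ_ml = 1/x̄.

```python
import numpy as np

rng = np.random.default_rng(2)

# Model: p(x | lam) = lam * exp(-lam * x); true parameter to be estimated.
lam_true = 2.5
x = rng.exponential(1.0 / lam_true, 10_000)

def log_likelihood(lam):
    # L(lam) = sum_k log p(x_k | lam), Eq. (246)
    return len(x) * np.log(lam) - lam * x.sum()

# Numerical maximization over a grid (the analytic MLE is 1/mean here).
grid = np.linspace(0.1, 10.0, 100_000)
lam_ml = grid[np.argmax(log_likelihood(grid))]

assert abs(lam_ml - 1.0 / x.mean()) < 1e-3
```

For this model the Fisher information is F = 1/λ², so the Cramér–Rao bound discussed below predicts a variance of at least λ²/N for the estimate.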
In order to obtain a measure for the confidence interval in the determination of λ_ml we consider the variance
\[
\sigma^{2}_{\lambda}=\int\Bigg[\prod_{k}dx_{k}\,p(x_{k}|\lambda)\Bigg]\big[\lambda_{\mathrm{ml}}(\{x_{k}\})-\lambda\big]^{2}.
\tag{247}
\]
In terms of the Fisher information
\[
F=\int dx\,\frac{1}{p(x|\lambda)}\left[\frac{\partial p(x|\lambda)}{\partial\lambda}\right]^{2},
\tag{248}
\]
it is easy to prove that
\[
\sigma^{2}_{\lambda}\,\geq\,\frac{1}{NF},
\tag{249}
\]
where N is the number of measurements. The inequality in Equation (249) is known as the Cramér–Rao bound [112] on the precision of the ML estimation. Notice that this bound holds for any functional form of the probability distribution p(x|λ), provided that the Fisher information exists for all λ and ∂_λ p(x|λ) exists for all x. When an experiment has ‘‘good statistics’’ (i.e., for a large enough data sample) the Cramér–Rao bound is saturated.

B. ML Quantum State Estimation

In this section we review the method of maximum likelihood estimation of the quantum state of Ref. [23], focusing attention on the cases of homodyne and spin tomography. We consider an experiment consisting of N measurements performed on identically prepared copies of a given quantum system. Each measurement is described by a positive operator-valued measure (POVM). The outcome of the ith measurement corresponds to the realization of a specific element of the POVM used in the corresponding run, and we denote this element by Π_i. The likelihood is here a functional L(ρ) of the density matrix, given by the product
\[
\mathcal{L}(\rho)=\prod_{i=1}^{N}\mathrm{Tr}(\rho\,\Pi_{i}),
\tag{250}
\]
which represents the probability of the observed data. The unknown element of the above expression, which we want to infer from data, is the
density matrix describing the measured ensemble. The estimation strategy of the ML technique is to maximize the likelihood functional over the set of density matrices. Several properties of the likelihood functional are easily found if we restrict ourselves to finite dimensional Hilbert spaces. In this case, it can be easily proved that L(ρ) is a concave function defined on the convex and closed set of density matrices. Therefore, its maximum is achieved either on a single isolated point, or on a convex subset of density matrices. In the latter case, the experimental data are insufficient to provide a unique estimate of the density matrix using the ML strategy. On the other hand, the existence of a single maximum allows us to assign the ML estimate of the density matrix unambiguously. The ML estimation of the quantum state, despite its elegant general formulation, results in a highly nontrivial constrained optimization problem, even if we resort to purely numerical means. The main difficulty lies in the appropriate parameterization of the set of all density matrices. The parameter space should be of minimum dimension in order to preserve the maximum of the likelihood function as a single isolated point. Additionally, the expression of quantum expectation values in terms of this parameterization should enable fast evaluation of the likelihood function, as this step is performed many times in the course of the numerical maximization. For such a purpose one introduces [23] a parameterization of the set of density matrices which provides an efficient algorithm for the maximization of the likelihood function. We represent the density matrix in the form
\[
\rho=T^{\dagger}T,
\tag{251}
\]
which automatically guarantees that ρ is positive and Hermitian. The remaining condition of unit trace, Tr ρ = 1, will be taken into account using the method of Lagrange multipliers. In order to achieve the minimal parameterization, we assume that T is a complex lower triangular matrix, with real elements on the diagonal. This form of T is motivated by the Cholesky decomposition known in numerical analysis [113] for an arbitrary nonnegative Hermitian matrix. For an M-dimensional Hilbert space, the number of real parameters in the matrix T is M + 2M(M−1)/2 = M², which equals the number of independent real parameters of a Hermitian matrix. This confirms that such a parameterization is minimal, up to the unit trace condition. In numerical calculations, it is convenient to replace the likelihood functional by its natural logarithm, which of course does not change the location of the maximum. Thus the log-likelihood function subjected to
numerical maximization is given by
\[
L(T)=\sum_{i=1}^{N}\ln\mathrm{Tr}\big(T^{\dagger}T\,\Pi_{i}\big)-\lambda\,\mathrm{Tr}\big(T^{\dagger}T\big),
\tag{252}
\]
where λ is a Lagrange multiplier accounting for the normalization of ρ. Writing ρ in terms of its eigenvectors |ψ_μ⟩ as ρ = Σ_μ y_μ²|ψ_μ⟩⟨ψ_μ|, with real y_μ, the maximum likelihood condition ∂L/∂y_μ = 0 reads
\[
\lambda\,y_{\mu}=\sum_{i=1}^{N}\frac{y_{\mu}\,\langle\psi_{\mu}|\Pi_{i}|\psi_{\mu}\rangle}{\mathrm{Tr}(\rho\,\Pi_{i})},
\tag{253}
\]
which, after multiplication by y_μ and summation over μ, yields λ = N. The Lagrange multiplier then equals the total number of measurements N. This formulation of the maximization problem allows one to apply standard numerical procedures for searching for the maximum over the M² real parameters of the matrix T. The examples presented below use the downhill simplex method [114]. The first example is the ML estimation of a single-mode radiation field. The experimental apparatus used in this technique is the homodyne detector. According to Section II.D the homodyne measurement is described by the POVM
\[
H_{\eta}(x;\varphi)=\sqrt{\frac{2\eta}{\pi(1-\eta)}}\,
\exp\!\left[-\frac{2\eta}{1-\eta}\big(X_{\varphi}-x\big)^{2}\right],
\tag{254}
\]
where η is the detector efficiency, and X_φ = (a†e^{iφ} + ae^{−iφ})/2 is the quadrature operator at phase φ. After N measurements, we obtain a set of pairs (x_i; φ_i), where i = 1, …, N. The log-likelihood functional is given by Equation (252) with Π_i ≡ H_η(x_i; φ_i). Of course, for a light mode it is necessary to truncate the Hilbert space to a finite dimensional basis. We shall assume that the highest Fock state has M − 1 photons, i.e., that the dimension of the truncated Hilbert space is M. For the expectation Tr[T†T H_η(x;φ)] it is necessary to use an expression which is explicitly positive, in order to protect the algorithm against the occurrence of small negative numerical arguments of the logarithm function. A simple derivation yields
\[
\mathrm{Tr}\big[T^{\dagger}T\,H_{\eta}(x;\varphi)\big]
=\sum_{k=0}^{M-1}\sum_{j=0}^{M-1}\left|\sum_{n=0}^{M-1-j}\langle k|T|n+j\rangle\,B_{n+j,n}\,\langle n|x\rangle\,e^{in\varphi}\right|^{2},
\tag{255}
\]
where
\[
B_{n+j,n}=\binom{n+j}{n}^{1/2}\eta^{n/2}\,(1-\eta)^{j/2},
\tag{256}
\]
and
\[
\langle n|x\rangle=\left(\frac{2}{\pi}\right)^{1/4}\frac{1}{\sqrt{2^{n}\,n!}}\,H_{n}(\sqrt{2}\,x)\,\exp(-x^{2})
\tag{257}
\]
are the eigenstates of the harmonic oscillator in the position representation—H_n(x) being the nth Hermite polynomial. The ML technique can be applied to reconstruct the density matrix in the Fock basis from Monte Carlo simulated homodyne statistics. Figure 19 depicts the matrix elements of the density operator as obtained for a coherent state and a squeezed vacuum. Remarkably, only 50,000 homodyne data have been used, for quantum efficiency η = 80%. We recall that in quantum homodyne tomography the statistical errors are known to grow rapidly with decreasing efficiency of the detector [29,80]. In contrast, the elements of the density matrix reconstructed using the ML approach remain bounded, as the whole matrix must satisfy the positivity and normalization constraints. This results in much smaller statistical errors. As a comparison, one can see that the same precision as in the reconstructions of Figure 19 could be achieved using 10⁷–10⁸ data samples with conventional quantum
FIGURE 19. Reconstruction of the density matrix of a single-mode radiation field by the ML method. The plot shows the matrix elements ρ_{n,m} for a coherent state (left) with ⟨a†a⟩ = 1 photon, and for a squeezed vacuum (right) with ⟨a†a⟩ = 0.5 photon. A sample of 50,000 simulated homodyne data for quantum efficiency η = 80% has been used. (From Ref. [23].)
tomography. On the other hand, in order to find the ML estimate numerically we need to set a priori the cut-off parameter for the photon number, and its value is limited by the increasing computation time. Another relevant example is the reconstruction of the quantum state of a two-mode field using the single-LO homodyning of Section V. Here, the full joint density matrix can be measured by scanning the quadratures of all possible linear combinations of modes. For two modes the measured quadrature operator is given by
\[
X(\theta,\psi_{0},\psi_{1})=\frac{1}{2}\big(a\,e^{-i\psi_{0}}\cos\theta+b\,e^{-i\psi_{1}}\sin\theta+\mathrm{h.c.}\big),
\tag{258}
\]
where (θ, ψ₀, ψ₁) ∈ S² × [0, 2π], S² being the Poincaré sphere and one phase ranging between 0 and 2π. In each run these parameters are chosen randomly. The POVM describing the measurement is given by the right-hand side of Equation (254), with X_φ replaced by X(θ, ψ₀, ψ₁). An experiment for the two orthogonal states |Ψ₁⟩ = (|00⟩ + |11⟩)/√2 and |Ψ₂⟩ = (|01⟩ + |10⟩)/√2 has been simulated, in order to reconstruct the density matrix in the two-mode Fock basis using the ML technique. The results are reported in Figure 20.
FIGURE 20. ML reconstruction of the density matrix ρ_{nm,ls} of a two-mode radiation field. On the left, the matrix elements obtained for the state |Ψ₁⟩ = (|00⟩ + |11⟩)/√2; on the right, for |Ψ₂⟩ = (|01⟩ + |10⟩)/√2. For |Ψ₁⟩ we used 100,000 simulated homodyne data and η = 80%; for |Ψ₂⟩ we used 20,000 data and η = 90%. (From Ref. [23].)
direction along which they perform a spin measurement. The obtained result is described by the joint projection operator (onto spin coherent states [115]) F_i = |Ω_i^A, Ω_i^B⟩⟨Ω_i^A, Ω_i^B|, where Ω_i^A and Ω_i^B are the vectors on the Bloch sphere corresponding to the outcomes of the ith run, and the indices A and B refer to the two particles. As in the previous examples, it is convenient to use an expression for the quantum expectation value Tr(T†T F_i) which is explicitly positive. The suitable form is
\[
\mathrm{Tr}\big(T^{\dagger}T\,F_{i}\big)=\sum_{\lambda}\big|\langle\lambda|T|\Omega_{i}^{A},\Omega_{i}^{B}\rangle\big|^{2},
\tag{259}
\]
where {|λ⟩} is an orthonormal basis in the Hilbert space of the two particles. The result of a simulated experiment with only 500 data points for the reconstruction of the density matrix of the singlet state is shown in Figure 21. Summarizing, the ML technique can be used to estimate the density matrix of a quantum system. With respect to conventional quantum tomography this method has the great advantage of needing much smaller experimental samples, making experiments with low data rates feasible, albeit at the price of truncating the Hilbert space dimension. We have shown that the method is general and that the algorithm has a solid methodological background, its reliability being confirmed in a number of Monte Carlo simulations. However, for increasing dimension of the Hilbert space the method has exponential complexity.
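Both ingredients above — the Cholesky-like parameterization ρ = T†T of Equation (251) and the manifestly positive expectation of Equation (259) — can be sketched for the two-qubit case as follows. This is our own illustration; in particular, the unit trace is imposed here by direct normalization rather than by the Lagrange multiplier of Equation (252).

```python
import numpy as np

rng = np.random.default_rng(4)
M = 4                                   # two spin-1/2 particles

# Lower-triangular T with real diagonal: M + 2*M*(M-1)/2 = M^2 real parameters.
T = np.tril(rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M)), k=-1)
T += np.diag(rng.normal(size=M))
T /= np.sqrt(np.trace(T.conj().T @ T))  # normalize so that Tr(T^dag T) = 1
rho = T.conj().T @ T                    # Hermitian, positive, unit trace

assert np.allclose(rho, rho.conj().T)
assert np.isclose(np.trace(rho).real, 1.0)
assert np.linalg.eigvalsh(rho).min() >= -1e-12

def bloch(theta, phi):
    """Spin-1/2 coherent state pointing along (theta, phi) on the Bloch sphere."""
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

# Joint projector F_i = |Omega_A, Omega_B><Omega_A, Omega_B| for one run.
v = np.kron(bloch(0.7, 1.1), bloch(2.0, -0.4))
F = np.outer(v, v.conj())

# Eq. (259): Tr(T^dag T F_i) = sum_lambda |<lambda|T|Omega_A, Omega_B>|^2 >= 0.
positive_form = np.sum(np.abs(T @ v)**2)
assert np.isclose(positive_form, np.trace(rho @ F).real)
```

The last identity is just Parseval's theorem in the computational basis: ‖T|v⟩‖² = ⟨v|T†T|v⟩, which is why the expression fed to the logarithm in Equation (252) can never go negative.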
FIGURE 21. ML reconstruction of the density matrix of a pair of spin-1/2 particles in the singlet state. The particles are shared by two parties. In each run, the parties select randomly and independently from each other a direction along which they perform a spin measurement. The matrix elements have been obtained from a sample of 500 simulated data. (From Ref. [23].)
QUANTUM TOMOGRAPHY
C. Gaussian State Estimation

In this section we review the ML method of Ref. [33] for determining the parameters of Gaussian states. Such states comprise the wide class of coherent, squeezed, and thermal states, all of them characterized by a Gaussian Wigner function. Apart from an irrelevant phase, we consider Wigner functions of the form

    W(x, y) = (2Δ²/π) exp{ −2Δ² [ e^{2r} (x − Re μ)² + e^{−2r} (y − Im μ)² ] },    (260)
and the ML technique with homodyne detection is applied to estimate the four real parameters Δ, r, Re μ, and Im μ. The four parameters provide the number of thermal, squeezing, and coherent-signal photons in the quantum state as follows:

    n_th = (1/2)(1/Δ² − 1),    n_sq = sinh² r,    n_coh = |μ|².    (261)
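As a minimal sketch, the mapping of Eq. (261) can be coded directly; here delta2 stands for Δ², in the convention of Eq. (260) where a coherent state has Δ² = 1:

```python
import numpy as np

def gaussian_photon_numbers(delta2, r, mu):
    """Thermal, squeezing, and coherent photon numbers (Eq. (261))
    of the Gaussian state (260); delta2 is the width parameter Delta^2."""
    n_th = 0.5 * (1.0 / delta2 - 1.0)
    n_sq = np.sinh(r) ** 2
    n_coh = abs(mu) ** 2
    return n_th, n_sq, n_coh

# A coherent state (delta2 = 1, r = 0) carries only "signal" photons:
print(gaussian_photon_numbers(1.0, 0.0, 1 + 1j))  # n_th = 0, n_sq = 0, n_coh = 2
```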
The density matrix corresponding to the Wigner function in Equation (260) is written

    ρ = D(μ) S(r) [1/(n_th + 1)] [n_th/(n_th + 1)]^{a†a} S†(r) D†(μ),    (262)
where S(r) = exp[r(a² − a†²)/2] and D(μ) = exp(μa† − μ*a) denote the squeezing and displacement operators, respectively. The theoretical homodyne probability distribution at phase φ with respect to the local oscillator can be evaluated using Equation (7), and is given by the Gaussian

    p(x, φ) = sqrt{ 2Δ² / [π (e^{−2r} cos²φ + e^{2r} sin²φ)] }
              × exp{ −2Δ² [x − Re(μ e^{−iφ})]² / (e^{−2r} cos²φ + e^{2r} sin²φ) }.    (263)
The log-likelihood function (246) for a set of N homodyne outcomes x_i at random phases φ_i then reads

    L = Σ_{i=1}^{N} { (1/2) log[ 2Δ² / (π (e^{−2r} cos²φ_i + e^{2r} sin²φ_i)) ]
        − 2Δ² [x_i − Re(μ e^{−iφ_i})]² / (e^{−2r} cos²φ_i + e^{2r} sin²φ_i) }.    (264)
The ML estimators Δ_ml, r_ml, Re μ_ml, and Im μ_ml are found by maximizing Equation (264) with respect to Δ, r, Re μ, and Im μ. In order to evaluate the state reconstruction globally, one considers the normalized overlap O between the theoretical and the estimated state,

    O = Tr[ρ ρ_ml] / sqrt{ Tr[ρ²] Tr[ρ_ml²] }.    (265)
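A hedged numerical sketch of this estimation (assuming ideal detection, i.e., unit quantum efficiency, and arbitrarily chosen true parameters) simulates homodyne outcomes from the Gaussian (263) and maximizes the likelihood (264); SciPy's Nelder-Mead optimizer here stands in for whatever maximization routine is used in Ref. [33]:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Hypothetical true parameters: Delta^2, r, and mu = Re mu + i Im mu
d2_t, r_t, mu_t = 0.8, 0.4, 0.5 + 0.3j

def sigma2(phi, d2, r):
    """Variance of the homodyne outcome at phase phi, from Eq. (263)."""
    return (np.exp(-2*r)*np.cos(phi)**2 + np.exp(2*r)*np.sin(phi)**2) / (4*d2)

# Simulate N homodyne outcomes at random phases
N = 20000
phi = rng.uniform(0, np.pi, N)
mean = np.real(mu_t * np.exp(-1j * phi))       # Re(mu e^{-i phi})
x = rng.normal(mean, np.sqrt(sigma2(phi, d2_t, r_t)))

def neg_log_lik(p):
    d2, r, re, im = p
    if d2 <= 0:
        return np.inf
    s2 = sigma2(phi, d2, r)
    m = re*np.cos(phi) + im*np.sin(phi)
    return np.sum(0.5*np.log(2*np.pi*s2) + (x - m)**2 / (2*s2))

res = minimize(neg_log_lik, x0=[0.5, 0.0, 0.0, 0.0], method='Nelder-Mead',
               options={'maxiter': 4000, 'maxfev': 4000})
print(np.round(res.x, 3))   # approaches (0.8, 0.4, 0.5, 0.3)
```

With a few times 10⁴ samples the four estimates typically come out within about a percent of the true values, consistent with the precision quoted below for N = 50,000.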
Notice that O = 1 iff ρ = ρ_ml. Through Monte Carlo simulations, one always finds a value of O around unity, typically with statistical fluctuations in the third digit, for a number of data samples N = 50,000, quantum efficiency at the homodyne detectors η = 80%, and state parameters in the ranges n_th < 3, n_coh < 5, and n_sq < 3. Even with such a small number of data samples, the quality of the state reconstruction is so good that other physical quantities, theoretically evaluated from the experimental values of Δ_ml, r_ml, Re μ_ml, and Im μ_ml, are inferred very precisely. For example, in Ref. [33] the photon number probability of a squeezed thermal state has been evaluated, which is given by the integral

    ⟨n|ρ|n⟩ = ∫_0^{2π} (dφ/2π) [C(φ, n_th, r) − 1]^n / C(φ, n_th, r)^{n+1},    (266)
with C(φ, n_th, r) = (n_th + 1/2)(e^{−2r} sin²φ + e^{2r} cos²φ) + 1/2. The comparison of the theoretical and the experimental results for a state with n_th = 0.1 and n_sq = 3 is reported in Figure 22. The statistical error of the reconstructed number probability affects the third decimal digit, and is not visible on the scale of the plot.
The estimation of the parameters of Gaussian Wigner functions through the ML method also allows one to estimate the parameters of quadratic Hamiltonians of the generic form

    H = α a + α* a† + φ a†a + (1/2) ξ a² + (1/2) ξ* a†².    (267)
FIGURE 22. Photon number probability of a squeezed thermal state (thermal photons n_th = 0.1, squeezing photons n_sq = 3). The probabilities reconstructed by means of the maximum likelihood method and homodyne detection (gray histogram) are compared with the theoretical values (black histogram). Number of data samples N = 50,000, quantum efficiency η = 80%. The statistical error affects the third decimal digit, and is not visible on the scale of the plot. (From Ref. [33].)
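The integral (266) is straightforward to check numerically. The following sketch evaluates it by an average over a uniform phase grid for the state of Figure 22, and verifies that the probabilities sum to unity; the truncation at n = 200 and the grid size are arbitrary choices:

```python
import numpy as np

def C(phi, n_th, r):
    """C(phi, n_th, r) entering Eq. (266)."""
    return (n_th + 0.5) * (np.exp(-2*r)*np.sin(phi)**2
                           + np.exp(2*r)*np.cos(phi)**2) + 0.5

def photon_prob(n, n_th, r, npts=4000):
    """<n|rho|n> of Eq. (266), integrated over phi in [0, 2*pi).
    The ratio form ((C-1)/C)^n / C avoids overflow at large n."""
    phi = np.linspace(0.0, 2*np.pi, npts, endpoint=False)
    c = C(phi, n_th, r)
    return np.mean(((c - 1.0)/c)**n / c)

n_th, r = 0.1, np.arcsinh(np.sqrt(3.0))        # n_sq = sinh^2 r = 3
p = np.array([photon_prob(n, n_th, r) for n in range(200)])
print(p[:4].round(4), p.sum().round(4))        # probabilities sum to ~1
```

Two sanity checks are built in: the total probability must be unity (the geometric series over n resums to ∫ dφ/2π = 1), and the mean photon number must equal n_th + (2n_th + 1) sinh² r = 3.7 for these parameters.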
In fact, the unitary evolution operator U = e^{−iHt} preserves the Gaussian form of an input state with Gaussian Wigner function. In other words, one can use a known Gaussian state to probe and characterize an optical device described by a Hamiltonian as in Equation (267). Assuming t = 1 without loss of generality, the Heisenberg evolution of the radiation mode a is given by

    U† a U = μ a + ν a† + κ,    (268)

with

    μ = cos(Ω) − i (φ/Ω) sin(Ω),
    ν = −i (ξ*/Ω) sin(Ω),
    κ = [(α*φ − αξ*)/Ω²] [cos(Ω) − 1] − i (α*/Ω) sin(Ω),    (269)

where Ω = sqrt(φ² − |ξ|²).
For an input state with known Wigner function W(α, α*), the corresponding output Wigner function is

    W_{UρU†}(α, α*) = W[ μ*(α − κ) − ν(α* − κ*), μ(α* − κ*) − ν*(α − κ) ].    (270)
Hence, by estimating the parameters μ, ν, κ and inverting Equation (269), one obtains the ML values of α, φ, and ξ for the Hamiltonian in Equation (267). In practical applications the present example can be used to estimate the gain of a phase-sensitive amplifier or, equivalently, a squeezing parameter.
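A small sketch of Eq. (269) (with the shorthand Ω = √(φ² − |ξ|²), assuming φ² > |ξ|²) also lets one verify that the transformation preserves the canonical commutator, |μ|² − |ν|² = 1:

```python
import numpy as np

def evolution_coeffs(alpha, phi, xi):
    """Coefficients (mu, nu, kappa) of U^dag a U = mu a + nu a^dag + kappa,
    Eq. (269), for t = 1. Sketch: assumes phi**2 > abs(xi)**2."""
    om = np.sqrt(phi**2 - abs(xi)**2)
    mu = np.cos(om) - 1j * (phi / om) * np.sin(om)
    nu = -1j * (np.conj(xi) / om) * np.sin(om)
    kappa = ((np.conj(alpha)*phi - alpha*np.conj(xi)) / om**2
             * (np.cos(om) - 1.0)
             - 1j * np.conj(alpha) * np.sin(om) / om)
    return mu, nu, kappa

mu, nu, kappa = evolution_coeffs(0.3 + 0.2j, 1.5, 0.4 + 0.1j)
print(abs(mu)**2 - abs(nu)**2)   # canonical commutator preserved: = 1
```

In the free-evolution limit α = ξ = 0 the coefficients reduce to μ = e^{−iφ}, ν = κ = 0, a useful consistency check on the signs.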
IX. CLASSICAL IMAGING BY QUANTUM TOMOGRAPHY

As we showed in Section II, the development of quantum tomography has its origin in the inadequacy of classical imaging procedures to face the quantum problem of Wigner function reconstruction. In this section we briefly illustrate how to go back to classical imaging and profitably use quantum tomography as a tool for image reconstruction and compression: this is the method of fictitious photons tomography of Ref. [34]. The problem of tomographic imaging is to recover a mass distribution m(x, y) in a two-dimensional slab from a finite collection of one-dimensional projections. The situation is schematically sketched in Figure 23, where m(x, y) describes two circular holes in a uniform background. The tomographic machine, say X-ray equipment, collects many stripe photos of the sample from various directions φ, and then numerically performs a mathematical transformation, the so-called inverse Radon transform [116], in order to reconstruct m(x, y) from its radial profiles at different
FIGURE 23. (a) Tomography of a simple object: analytical transmission profiles are reported for φ = 0, π/2. (b) The same object as in (a), but for very weak signals: in this case the transmission profiles are given in terms of random points on a photographic plate (here obtained from a Monte Carlo simulation). (From Ref. [34].)
values of φ. The problem of interest to us arises when the radial profiles are not well-resolved digitized functions, but actually represent the density distribution of random points, as if in our X-ray machine the beam were so weak that the radial photos are just collections of many small spots, each from a single X-ray photon (this situation is sketched in Figure 23(b)). Obviously this case can be reduced to the previous one by counting all points falling in a predetermined one-dimensional mesh, and giving the radial profiles in the form of histograms (this is what actually happens in a real machine, using arrays of photodetectors). However, we want to use the whole available information from each "event", i.e., the exact one-dimensional location of each spot, in a way that is independent of any predetermined mesh. In practice, this situation occurs when the signal is so weak and the machine resolution so high (i.e., the mesh step so tiny) that at most one photon can be collected in each channel. As we will see, this low-signal/high-resolution case naturally brings the imaging problem into the domain of quantum tomography. Images are identified with Wigner functions, so as to obtain a description in terms of density matrices. These are still trace-class matrices (corresponding to "normalizable" images), but are no longer positive definite, because an "image" generally is not a genuine Wigner function and violates the Heisenberg relations on the complex plane (the phase space of a single mode of radiation). Hence, such density matrices are unphysical: they are just a mathematical tool for imaging. This is the reason why the method has been named fictitious photons tomography [34]. As we will see in the following, the image resolution improves with increasing rank of the density matrix, and in this way the present method also provides a new algorithm for image compression, suited to angular image scanning.
A. From Classical to Quantum Imaging

We adopt the complex notation, with α = x + iy representing a point in the image plane. In this way α and α* are considered as independent variables, and the two-dimensional image, here denoted by the same symbol W(α, α*) used for the Wigner function, is just a generic real function of the point α in the plane. In the most general situation W(α, α*) is defined on the whole complex plane, where it is normalized to some finite constant, and it is bounded from both below and above, its range representing the darkness nuance. For X-ray tomography W(α, α*) roughly represents the absorption coefficient as a function of the point α. We consider a linear absorption regime, i.e., the image extension is negligible with respect to the radiation
absorption length in the medium. At the same time we neglect any diffraction effect. As shown in Section III.B, the customary imaging technique is based on the inverse Radon transform. A tomography of a two-dimensional image W(α, α*) is a collection of one-dimensional projections p(x, φ) at different values of the observation angle φ. We rewrite here the definition of the Radon transform of W(α, α*):

    p(x, φ) = ∫_{−∞}^{+∞} (dy/π) W( (x + iy) e^{iφ}, (x − iy) e^{−iφ} ).    (271)
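As a quick numerical check of Eq. (271), the following sketch evaluates the projection integral by a Riemann sum for a uniform disc of unit radius centered at the origin, for which p(x, φ) = (2/π)√(1 − x²) independently of φ:

```python
import numpy as np

def radon_profile(W, x, phi, ylim=3.0, npts=20001):
    """p(x, phi) of Eq. (271), evaluated by a Riemann sum over y."""
    y = np.linspace(-ylim, ylim, npts)
    alpha = (x + 1j*y) * np.exp(1j*phi)
    return np.sum(W(alpha)) * (y[1] - y[0]) / np.pi

# "Image" W: indicator function of the unit disc
disc = lambda a: (np.abs(a) <= 1.0).astype(float)

# The projections are phi-independent (isotropy, cf. Table 3):
for phi in (0.0, np.pi/3):
    print(phi, radon_profile(disc, 0.5, phi))   # ~ (2/pi)*sqrt(0.75)
```

The φ-independence of the result illustrates the first row of Table 3 below: an isotropic image has projections p(x, φ) ≡ p(x).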
In Equation (271) x is the current coordinate along the direction orthogonal to the projection and y is the coordinate along the projection direction. The situation is depicted in Figure 23, where W(α, α*) is plotted along with its profiles p(x, φ) for φ = 0, π/2, for a couple of identical circular holes symmetrically disposed with respect to the origin. The reconstruction of the image W(α, α*) from its projections p(x, φ), also called "back projection," is given by the inverse Radon transform which, following the derivation in Section III.B, leads to the filtering procedure
    W(α, α*) = ∫_0^{π} (dφ/2π) P ∫_{−∞}^{+∞} dx [∂p(x, φ)/∂x] / (x_φ − x),    (272)
where P denotes the Cauchy principal value and x_φ = Re(α e^{−iφ}). Equation (272) is commonly used in conventional tomographic imaging (see, for example, [117]). Let us now critically consider the above procedure in the case of very weak signals, namely when p(x, φ) just represents the probability distribution of random X-ray spots on a fine-mesh multichannel detector: this situation is sketched in Figure 23(b). From Equation (272) one can recover W(α, α*) only when the analytical form of p(x, φ) is known. But the experimental outcomes of each projection actually are random data distributed according to p(x, φ), whereas in order to recover W(α, α*) from Equation (272) one has to evaluate the first-order derivatives of p(x, φ). The need for an analytical form of the projections p(x, φ) requires a filtering procedure on the data, usually obtained by splining them before applying Equation (272). This procedure leads to approximate image reconstructions, and the choice of any smoothing parameter unavoidably affects the statistics of errors in a systematic way. In the following we show how quantum tomography can be used for conventional imaging in the presence
of weak signals, providing both ideally controlled resolution and reliable error statistics. The basic formula we will use is the expansion of the Wigner function in the number representation of Equations (230) and (231). In practice, the Hilbert space has to be truncated at some finite dimension d_H, and this sets the resolution of the reconstruction of W(α, α*). However, as we will show, this resolution can be chosen at will, independently of the number of experimental data. As previously noticed, in general an image does not correspond to the Wigner function of a physical state, since the Heisenberg relations unavoidably produce only smooth Wigner functions, whereas a conventional image can have very sharp edges. However, if one allows the density matrix to be no longer positive definite (but still trace class), a correspondence with images is obtained which holds in general. In this way every image is stored in a trace-class matrix ρ_{n,m} via quantum tomography, and a convenient truncation dimension d_H of the matrix can be chosen. The connection between images and matrices is the main point of this approach: the information needed to reconstruct the image is stored in a d_H × d_H matrix. For suitably chosen dimension d_H the present method can also provide a procedure for image compression. Notice that the correspondence between images and trace-class matrices retains some symmetries of the image, which manifest themselves as algebraic properties of the matrix ρ_{n,m}. For example, an isotropic image (like a uniform circle centered at the origin) is stored in a diagonal matrix. Other symmetries are given in Table 3.
The truncated Hilbert space dimension d_H sets the imaging resolution. The kind of resolution can be understood by studying the behavior of the kernels R[|n + d⟩⟨n|](x, φ) of Equation (100), which are averaged over the experimental data in order to obtain the matrix elements ρ_{n,n+d}. Outside a region that is almost independent of n and d, all functions R[|n + d⟩⟨n|](x, φ)
TABLE 3
GEOMETRICAL SYMMETRIES OF AN IMAGE, ANALYTICAL PROPERTIES OF THE PROJECTIONS, AND ALGEBRAIC PROPERTIES OF THE CORRESPONDING MATRIX (FROM REF. [34])

Symmetry                        p(x, φ)                     ρ_{n,m}
Isotropy                        p(x, φ) ≡ p(x)              ρ_{n,m} = δ_{n,m} ρ_{n,m}
X-axis mirror                   p(x, π − φ) = p(−x, φ)      ρ_{n,m} ∈ ℝ
Y-axis mirror                   p(x, π − φ) = p(x, φ)       i^{n−m} ρ_{n,m} ∈ ℝ
Inversion through the origin    p(−x, φ) = p(x, φ)          ρ_{n,n+2d+1} = 0
FIGURE 24. Tomographic reconstruction of the font "a" for increasing dimension of the truncated matrix, d_H = 2, 4, 8, 16, 32, 48. The plot is obtained by averaging the kernel function R[|n + d⟩⟨n|](x, φ) of Equation (100) with assigned analytic transmission profiles p(x, φ), and then using Equations (230) and (231). (From Ref. [34].)
decrease exponentially, whereas inside this region they oscillate, with a number of oscillations increasing linearly with 2n + d. This behavior produces the effects illustrated in Figure 24, where we report the tomographic reconstruction of the font "a" for increasing dimension d_H. The plot is obtained by numerically integrating the kernel functions with given analytic transmission profiles p(x, φ). As we see from Figure 24, both the radial and the angular resolution improve with d_H, making the details of the image sharper and sharper, already at the relatively small truncation d_H = 48. A quantitative measure of the precision of the tomographic reconstruction can be given in terms of the distance D between the true and the
FIGURE 25. Convergence of the Hilbert distance D of Equation (273) versus the dimensional truncation d_H of the Hilbert space. Here the image is a uniform circle of unit radius centered at the origin. The reconstructed matrix elements are obtained as in Figure 24, whereas the exact matrix elements are provided by Equation (274). (From Ref. [34].)
reconstructed image which, in turn, coincides with the Hilbert distance D between the corresponding density matrices. One has

    D = ∫ d²α |ΔW(α, α*)|² = Tr(Δρ)²
      = Σ_{n=0}^{∞} (Δρ_{n,n})² + 2 Σ_{n=0}^{∞} Σ_{d=1}^{∞} |Δρ_{n,n+d}|²,    (273)
where Δ[…] = […]_true − […]_reconstructed. The convergence of D versus d_H is shown in Figure 25 for a solid circle of unit radius centered at the origin. In this case the density matrix has only diagonal elements, in accordance with Table 3. These are given by

    ρ_{n,n} = 2 Σ_{ν=0}^{n} (−2)^ν \binom{n}{ν} Φ(1 + ν, 2; −2R²),    (274)

where Φ(α, β; z) denotes the confluent hypergeometric function of argument z and parameters α and β. So far we have analyzed the method only on the basis of given analytic profiles p(x, φ). As already said, however, the method is particularly advantageous in the weak-signal/high-resolution situation, where
FIGURE 26. Monte Carlo simulation of an experimental tomographic reconstruction of the font "a." The truncation dimension is fixed at d_H = 48, and the number of scanning phases is F = 100. The plots correspond to 10³, 10⁴, 10⁵, and 10⁶ data per phase, respectively. (From Ref. [34].)
the imaging can be achieved directly by averaging the kernel functions over the data. In this case the procedure allows one to exploit the whole available experimental resolution, while the image resolution is set at will. In Figure 26 we report a Monte Carlo simulation of an experimental tomographic reconstruction of the font "a" for an increasing number of data. All plots are obtained at the maximum available dimension d_H = 48, using F = 100 scanning phases. The situation occurring for a small number of data is given in the first plot, where the highly resolved image still exhibits the natural statistical fluctuations due to the limited sample. For larger samples the image emerges more sharply from the random background, and it is clearly recognizable at 10⁶ data. The method is also efficient from the computational point of view, as the time needed for image reconstruction is quadratic in the number of elements of the density matrix and linear in the number of experimental data. Needless to say, imaging by quantum homodyne tomography is at a very early stage, and further investigation is in order.
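As a closing numerical aside, the second equality of Eq. (273), i.e., the expansion of Tr[(Δρ)²] in terms of the diagonal and off-diagonal elements of the Hermitian matrix Δρ, can be verified on random matrices:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two random Hermitian "density-like" matrices and their difference
A = rng.normal(size=(6, 6)) + 1j * rng.normal(size=(6, 6))
B = rng.normal(size=(6, 6)) + 1j * rng.normal(size=(6, 6))
drho = (A + A.conj().T) / 2 - (B + B.conj().T) / 2

# Left-hand side: Tr[(Delta rho)^2]
lhs = np.trace(drho @ drho).real

# Right-hand side: diagonal terms plus twice the upper off-diagonals (Eq. (273))
rhs = np.sum(np.diag(drho).real ** 2)
for d in range(1, 6):
    rhs += 2 * np.sum(np.abs(np.diag(drho, d)) ** 2)

print(lhs, rhs)   # the two values agree
```

The factor of 2 counts each off-diagonal pair (n, n+d) and (n+d, n) once, using hermiticity; this is what makes the truncation error of D directly readable off the stored matrix elements.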
ACKNOWLEDGMENTS

The writing of this chapter has been cosponsored by the Italian Ministero dell'Istruzione, dell'Università e della Ricerca (MIUR) under the Cofinanziamento 2002 Entanglement assisted high precision measurements, by the Istituto Nazionale di Fisica della Materia under the project PRA-2002-CLON, and by the European Community programs ATESIT (Contract No. IST-2000-29681) and EQUIP (Contract No. IST-1999-11053). G. M. D. acknowledges partial support by the Department of Defense Multidisciplinary University Research Initiative (MURI) program administered by the Army Research Office under Grant No. DAAD19-00-1-0177. M. G. A. P. is a research fellow at Collegio Alessandro Volta.
REFERENCES

1. Heisenberg, W. (1927). Zeit. für Physik 43, 172; Heisenberg, W. (1930). The Physical Principles of the Quantum Theory. Dover, NY: Univ. Chicago Press.
2. von Neumann, J. (1955). Mathematical Foundations of Quantum Mechanics. Princeton, NJ: Princeton Univ. Press.
3. Wootters, W. K., and Zurek, W. H. (1982). Nature 299, 802; Yuen, H. P. (1986). Phys. Lett. A 113, 405.
4. D'Ariano, G. M., and Yuen, H. P. (1996). Phys. Rev. Lett. 76, 2832.
5. Fano, U. (1957). Rev. Mod. Phys. 29, 74.
6. Smithey, D. T., Beck, M., Raymer, M. G., and Faridani, A. (1993). Phys. Rev. Lett. 70, 1244; Raymer, M. G., Beck, M., and McAlister, D. F. (1994). Phys. Rev. Lett. 72, 1137; Smithey, D. T., Beck, M., Cooper, J., and Raymer, M. G. (1993). Phys. Rev. A 48, 3159.
7. Vogel, K., and Risken, H. (1989). Phys. Rev. A 40, 2847.
8. D'Ariano, G. M., Macchiavello, C., and Paris, M. G. A. (1994). Phys. Rev. A 50, 4298.
9. D'Ariano, G. M., Leonhardt, U., and Paul, H. (1995). Phys. Rev. A 52, R1801.
10. Munroe, M., Boggavarapu, D., Anderson, M. E., and Raymer, M. G. (1995). Phys. Rev. A 52, R924.
11. Schiller, S., Breitenbach, G., Pereira, S. F., Müller, T., and Mlynek, J. (1996). Phys. Rev. Lett. 77, 2933; Breitenbach, G., Schiller, S., and Mlynek, J. (1997). Nature 387, 471.
12. Janicke, U., and Wilkens, M. (1995). J. Mod. Opt. 42, 2183; Wallentowitz, S., and Vogel, W. (1995). Phys. Rev. Lett. 75, 2932; Kienle, S. H., Freiberger, M., Schleich, W. P., and Raymer, M. G. (1997). In Experimental Metaphysics: Quantum Mechanical Studies for Abner Shimony, edited by S. Cohen et al. Lancaster: Kluwer, p. 121.
13. Dunn, T. J., Walmsley, I. A., and Mukamel, S. (1995). Phys. Rev. Lett. 74, 884.
14. Kurtsiefer, C., Pfau, T., and Mlynek, J. (1997). Nature 386, 150.
15. Leibfried, D., Meekhof, D. M., King, B. E., Monroe, C., Itano, W. M., and Wineland, D. J. (1996). Phys. Rev. Lett. 77, 4281.
16. D'Ariano, G. M. (1997). In Quantum Communication, Computing, and Measurement, edited by O. Hirota, A. S. Holevo, and C. M. Caves. New York and London: Plenum, p. 253.
17. D'Ariano, G. M., Kumar, P., and Sacchi, M. F. (2000). Phys. Rev. A 61, 013806.
18. D’Ariano, G. M. (2000). Quantum Communication, Computing, and Measurement, edited by P. Kumar, G. M. D’Ariano, and O. Hirota. New York and London: Kluwer Academic/ Plenum, pp. 137. 19. Paini, M. preprint quant-ph/0002078. 20. D’Ariano, G. M. (2000). Phys. Lett. A 268, 151. 21. Cassinelli, G., D’Ariano, G. M., De Vito, E., and Levrero, A. (2000). J. Math. Phys. 41, 7940. 22. D’Ariano, G. M., and Paris, M. G. A. (1998). Acta Phys. Slov. 48, 191; D’Ariano, G. M., and Paris, M. G. A. (1999). Phys. Rev. A 60, 518. 23. Banaszek, K., D’Ariano, G. M., Paris, M. G. A., and Sacchi, M. F. (2000). Phys. Rev. A 61, R010304. 24. D’Ariano, G. M., Maccone, L., and Paris, M. G. A. (2001). Phys. Lett. A 276, 25; (2001) J. Phys. A 34, 93. 25. D’Ariano, G. M., and Lo Presti, P. (2001). Phys. Rev. Lett. 86, 4195. 26. Wigner, E. P. (1932). Phys. Rev. 40, 749. 27. Cahill, K. E., and Glauber, R. J. (1969). Phys. Rev. 177, 1857, 1882. 28. D’Ariano, G. M., Maccone, L., and Paini, M. (2003). J. Opt. B 5, 77. 29. D’Ariano, G. M., and Paris, M. G. A. (1997). Phys. Lett. A 233, 49. 30. D’Ariano, G. M., Kumar, P., and Sacchi, M. F. (1999). Phys. Rev. A 59, 826. 31. D’Ariano, G. M., Kumar, P., Macchiavello, C., Maccone, L., and Sterpi, N. (1999). Phys. Rev. Lett. 83, 2490. 32. D’Ariano, G. M., De Laurentis, M., Paris, M. G. A., Porzio, A., and Solimeno, S. (2002). J. Opt. B 4, 127. 33. D’Ariano, G. M., Paris, M. G. A., and Sacchi, M. F. (2000). Phys. Rev. A 62, 023815; (2001). Phys. Rev. A 64, 019903(E). 34. D’Ariano, G.M., Macchiavello, C., and Paris, M.G.A. (1996). Opt. Comm. 129, 6. 35. D’Ariano, G. M. D. and Sacchi, M. F., (1997). Nuovo Cimento 112B, 881. 36. Kim, Y. S., and Zachary, W. W. (1986). The Physics of Phase Space. Berlin: Springer. 37. Gardiner, C. W. (1991). Quantum Noise. Berlin: Springer-Verlag. 38. Weyl, H. (1950). The Theory of Groups and Quantum Mechanics. New York: Dover. 39. Cahill, K. E. (1965). Phys. Rev. 138, B1566. 40. D’Ariano, G. M., Fortunato, M., and Tombesi, P. 
(1995). Nuovo Cimento 110B, 1127. 41. Lee, C. T. (1991) Phys. Rev. A 44, R2775; (1995). Phys. Rev. A 52, 3374. 42. Kelley, P. L., and Kleiner, W. H. (1994). Phys. Rev. 136, 316. 43. Yuen, H. P., and Chan, V. Y. S. (1983). Opt. Lett. 8, 177. 44. Abbas, G. L., Chan, V. W. S., Yee, S. T. (1983). Opt. Lett. 8, 419 (1985). IEEE J. Light. Tech. LT-3, 1110. 45. D’Ariano, G. M. (1992). Nuovo Cimento 107B, 643. 46. D’Ariano, G. M. (1997). Quantum estimation theory and optical detection, in Quantum Optics and the Spectroscopy of Solids, edited by T. Hakiog˘lu and A. S. Shumovsky, Dordrecht: Kluwer, pp. 139–174. 47. Wodkiewicz, K., and Eberly, J. H. (1985). JOSA B 2, 458. 48. D’Ariano, G. M. (1992). Int. J. Mod. Phys. B 6, 1291. 49. Yuen, H. P., and Shapiro, J. H. (1978). IEEE Trans. Inf. Theory 24, 657. (1979). 25, 179. 50. Yuen, H. P., and Shapiro, J. H. (1980). IEEE Trans. Inf. Theory 26, 78. 51. D’Ariano, G. M., Lo Presti, P., and Sacchi, M. F. (2000). Phys. Lett. A 272, 32. 52. D’Ariano, G. M., and Sacchi, M. F. (1995). Phys. Rev. A 52, R4309. 53. Helstrom, C. W. (1976). Quantum Detection and Estimation Theory. New York: Academic Press. 54. Arthurs, E., and Kelly, J. L., Jr. (1965). Bell System Tech. J. 44, 725.
55. Yuen, H. P. (1982). Phys. Lett. 91A, 101.
56. Arthurs, E., and Goodman, M. S. (1988). Phys. Rev. Lett. 60, 2447.
57. D'Ariano, G. M., Macchiavello, C., and Paris, M. G. A. (1995). Phys. Lett. A 198, 286.
58. Paris, M. G. A. (1996). Phys. Rev. A 53, 2658.
59. Baltin, R. (1983). J. Phys. A 16, 2721; (1984). Phys. Lett. A 102, 332.
60. Paris, M. G. A. (1996). Opt. Comm. 124, 277.
61. Shapiro, J. H., and Wagner, S. S. (1984). IEEE J. Quant. Electron. QE-20, 803.
62. Shapiro, J. H. (1985). IEEE J. Quant. Electron. QE-21, 237.
63. Walker, N. G. (1987). J. Mod. Opt. 34, 15.
64. Lai, Y., and Haus, H. A. (1989). Quantum Opt. 1, 99.
65. Paris, M. G. A., Chizhov, A., and Steuernagel, O. (1997). Opt. Comm. 134, 117.
66. Zucchetti, A., Vogel, W., and Welsch, D.-G. (1996). Phys. Rev. A 54, 856.
67. D'Ariano, G. M. (2002). Phys. Lett. A 300, 1.
68. Opatrny, T., and Welsch, D.-G. (1999). Prog. Opt. XXXIX, 63.
69. Weigert, S. (2000). Phys. Rev. Lett. 84, 802.
70. Kim, C., and Kumar, P. (1994). Phys. Rev. Lett. 73, 1605.
71. Richter, Th. (1996). Phys. Lett. A 211, 327.
72. Leonhardt, U., Munroe, M., Kiss, T., Raymer, M. G., and Richter, Th. (1996). Opt. Comm. 127, 144.
73. Kiss, T., Herzog, U., and Leonhardt, U. (1995). Phys. Rev. A 52, 2433; D'Ariano, G. M., and Macchiavello, C. (1998). Phys. Rev. A 57, 3131; Kiss, T., Herzog, U., and Leonhardt, U. (1998). Phys. Rev. A 57, 3131.
74. D'Ariano, G. M., Mancini, S., Manko, V. I., and Tombesi, P. (1996). Quantum Semiclass. Opt. 8, 1017.
75. Banaszek, K., and Wodkievicz, K. (1996). Phys. Rev. Lett. 76, 4344.
76. Opatrny, T., and Welsch, D.-G. (1997). Phys. Rev. A 55, 1462.
77. Murnaghan, F. D. (1938). The Theory of Group Representation. Baltimore: Johns Hopkins Press, p. 216.
78. Leonhardt, U., and Raymer, M. G. (1996). Phys. Rev. Lett. 76, 1985.
79. D'Ariano, G. M. (1999). Acta Phys. Slov. 49, 513.
80. D'Ariano, G. M., Macchiavello, C., and Sterpi, N. (1997). Quantum Semiclass. Opt. 9, 929.
81. Gradshteyn, I. S., and Ryzhik, I. M. (1980). Table of Integrals, Series, and Products. New York: Academic Press.
82. Richter, Th. (1996). Phys. Rev. A 53, 1197.
83. Orlowsky, A., and Wünsche, A. (1993). Phys. Rev. A 48, 4617.
84. Holevo, A. S. (1982). Probabilistic and Statistical Aspects of Quantum Theory. Amsterdam: North-Holland.
85. Shapiro, J. H., and Shakeel, A. (1997). JOSA B 14, 232.
86. Vasilyev, M., Choi, S.-K., Kumar, P., and D'Ariano, G. M. (2000). Phys. Rev. Lett. 84, 2354.
87. D'Ariano, G. M., Vasilyev, M., and Kumar, P. (1998). Phys. Rev. A 58, 636.
88. Mandel, L. (1958). Proc. Phys. Soc. 72, 1037.
89. Hong, C. K., and Mandel, L. (1985). Phys. Rev. Lett. 54, 323; (1985). Phys. Rev. A 32, 974.
90. Hillery, M. (1987). Phys. Rev. A 36, 3796.
91. Special issues on squeezed states: J. Opt. Soc. Am. B 4 (1987); J. Mod. Opt. 34 (1987).
92. Tombesi, P., and Pike, E. R., eds. (1989). Squeezed and Non-classical Light. New York: Plenum.
93. Agarwal, G. S., and Tara, K. (1992). Phys. Rev. A 46, 485.
94. Klyshko, D. N. (1994). Phys. Usp. 37, 1097; (1996). Phys. Usp. 39, 573; (1996). Phys. Lett. A 213, 7.
95. De Martini, F., et al., eds. (1996). Quantum Interferometry. Weinheim: VCH.
96. Arvind, Mukunda, N., and Simon, R. (1998). J. Phys. A 31, 565.
97. Schleich, W., and Wheeler, J. A. (1987). Nature 326, 574.
98. Yuen, H. P. (1976). Phys. Rev. A 13, 2226.
99. Yamamoto, Y., and Haus, H. A. (1986). Rev. Mod. Phys. 58, 1001.
100. Bandilla, A., Drobný, G., and Jex, I. (1995). Phys. Rev. Lett. 75, 4019; (1996). Phys. Rev. A 53, 507.
101. Lüders, G. (1951). Ann. Physik 8, 322.
102. See, for example, Wigner, E. P. (1963). Am. J. Phys. 31, 6; Imoto, N., Ueda, M., and Ogawa, T. (1990). Phys. Rev. A 41, 4127.
103. Ozawa, M. (1987). Ann. Phys. 259, 121, and references therein.
104. Beck, M. (2000). Phys. Rev. Lett. 84, 5748.
105. Chuang, I. L., and Nielsen, M. A. (1997). J. Mod. Opt. 44, 2455.
106. D'Ariano, G. M., and Maccone, L. (1998). Phys. Rev. Lett. 80, 5465.
107. Sacchi, M. F. (2001). Phys. Rev. A 63, 054104.
108. De Martini, F., D'Ariano, G. M., Mazzei, A., and Ricci, M. (2003). Fortschr. Phys. 51, 342; (2003). Phys. Rev. A 67, 062307.
109. Kraus, K. (1983). States, Effects, and Operations. Berlin: Springer-Verlag.
110. Hradil, Z. (1997). Phys. Rev. A 55, R1561; in the context of phase measurement, see Braunstein, S. L., Lane, A. S., and Caves, C. M. (1992). Phys. Rev. Lett. 69, 2153.
111. Banaszek, K. (1998). Phys. Rev. A 57, 5013.
112. Cramér, H. (1946). Mathematical Methods of Statistics. Princeton, NJ: Princeton Univ. Press.
113. Householder, A. S. (1964). The Theory of Matrices in Numerical Analysis. New York: Blaisdell, Sec. 5.2.
114. Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. (1992). Numerical Recipes in Fortran. Cambridge: Cambridge Univ. Press, Sec. 10.4.
115. Arecchi, F. T., Courtens, E., Gilmore, R., and Thomas, H. (1972). Phys. Rev. A 6, 2211.
116. Natterer, F. (1986). The Mathematics of Computerized Tomography. New York: Wiley.
117. Mansfield, P., and Morris, P. G. (1982). NMR Imaging in Biomedicine. New York: Academic Press.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 128
Scanning Low-Energy Electron Microscopy

ILONA MÜLLEROVÁ and LUDĚK FRANK
Institute of Scientific Instruments AS CR, Královopolská 147, CZ-61264 Brno, Czech Republic
I. Introduction  310
II. Motivations to Lower the Electron Energy  314
   A. Extensions to Conventional Modes of Operation  314
   B. New Opportunities  316
   C. Issues Inherent to Slow Electron Beams  317
III. Interaction of Slow Electrons with Solids  319
   A. Elastic Scattering  320
      1. Scattering on Nuclei  320
      2. Reflection on Energy Gaps  323
   B. Inelastic Scattering  324
      1. Scattering on Electrons  324
      2. Scattering on Atoms  328
   C. Penetration of Electrons  331
   D. Heating and Damage of the Specimen  334
   E. Specimen Charging  336
   F. Tools for Simulation of Electron Scattering  340
IV. Emission of Electrons  343
   A. Electron Backscattering  345
   B. Crystallinity Effects  350
   C. Coherence within the Primary Beam Spot  353
   D. Secondary Electron Emission  354
V. Formation of the Primary Beam  361
   A. The Spot Size  362
   B. Incorporation of the Retarding Field  366
   C. The Cathode Lens  369
   D. The Pixel Size  374
   E. Spurious Effects  377
   F. Testing the Resolution  379
VI. Detection and Specimen-Related Issues  381
   A. Detection Strategies  382
   B. Detectors  387
   C. Signal Composition  393
   D. Specimen Surface  394
   E. Specimen Tilt  397
VII. Instruments  399
   A. Adaptation of Conventional SEMs  399
   B. Dedicated Equipment  401
   C. Alignment and Operation  407
MÜLLEROVÁ AND FRANK
   D. Practical Issues ... 410
VIII. Selected Applications ... 413
   A. Prospective Application Areas ... 414
   B. General Characteristics of Micrograph Series ... 415
   C. Surface Relief ... 417
   D. Critical Energy Mode ... 418
   E. Diffraction Contrast ... 419
   F. Contrast of Crystal Orientation ... 422
   G. Layered Structures ... 422
   H. Material Contrast ... 425
   I. Electronic Contrast in Semiconductors ... 426
   J. Energy-Band Contrast ... 430
IX. Conclusions ... 431
Acknowledgments ... 432
References ... 432
I. INTRODUCTION

Two versions of the electron microscope, the directly imaging one (usually TEM, the transmission electron microscope) and the scanning one (SEM), have coexisted in the instrument market and in laboratories for decades, and neither seems likely to lose ground. At certain periods, one or the other attracts enhanced attention and makes a more significant step forward, afterwards leaving the momentary leadership to the competing principle. SEM designers have recently experienced a period rich in innovations, which brought two successful novelties, namely the environmental SEM, with the specimen surrounded by a gas at a pressure of thousands of Pa, and high-resolution imaging at electron beam energies down to fractions of an eV. Our purpose here is to review the theoretical and practical aspects of the latter and to present the method as being already fully feasible and worth employing in the majority of SEM application areas.

The term "low-energy" electron is obviously of a qualitative nature and should be given some quantitative limits. These limits are best based on the characteristics of the electron interaction with solids, which provides the image signal in SEM. Examining the typical energy dependences of all relevant quantities connected with this interaction, we find good reasons for defining two such energy intervals instead of only one.

Figure 1 shows the atomic number dependence of the so-called second critical energy EC2, i.e., the higher of the two electron impact energies at which the total electron emission yield is equal to one (or to 100%). These energies exist for nearly all solids, with a few exceptions such as conductors of the lowest mean atomic numbers, for which the total emission never exceeds unity. Above EC2 the electron emission decreases monotonically and no thresholds can be
SCANNING LEEM
FIGURE 1. The higher of the critical energies for normal electron impact, EC2, at which the total electron yield is equal to one, plotted versus the atomic number for conductive chemical elements; data were collected from Bauer and Seiler (1984) and Zadražil and El-Gomati (2002).
identified. However, the energy EC2 itself represents a significant breakpoint, at which the specimen charging changes its sign. As the graph shows, a value of about 5 keV can be taken as the upper margin of this interval. Thus, let us consider the "low-energy" range to lie below 5 keV beam energy. As will be shown subsequently, around this threshold the yield of backscattered electrons (BSE) also loses its monotonic dependence on the atomic number, which exists at higher energies, so that the conventional material contrast ceases to be reliably usable.

In Figure 2, the well-known plot of the inelastic mean free path (IMFP) of electrons is shown versus energy for numerous elements and compounds. The deep minimum at approximately 50 eV represents another crucial threshold: the IMFP starts to grow below this point because the main interaction phenomena, the secondary electron (SE) emission in particular, are suppressed here and a fundamentally new situation emerges for scanned imaging. So let us also define the "very low energy" interval as that below 50 eV. Later we will see that this energy range could be further subdivided, but such a subdivision would serve no practical purpose.

Commercial SEM instruments traditionally used primary beam energies of 15–30 keV as a compromise between a sufficiently small spot size and reasonable SE emission. The series of preadjusted beam energies in SEM mostly ended at 5 keV, and even when lower energies were available, good-quality micrographs could not be acquired. Progress in computer-aided design methods for electron optics opened ways to tailoring the objective lenses, and even full columns, to the desired parameters, and the SEM instruments subsequently entered the low-energy range down to about 1 keV.
FIGURE 2. The energy dependence of the inelastic mean free path of electrons; the dots represent various elements and compounds. (Reprinted with permission from Seah and Dench, 1979.)
The motivation included suppressed charging and better visualization of surface relief details, which translated, among other things, into more precise measurement of distances in images. This experience made the low-energy range known and acceptable to the community of microscopists, but no trend to push the energy further down has been apparent, although possible sources of motivation have long existed in experimental areas adjacent to SEM.

The so-called emission electron microscope (EEM) is in fact one of the oldest versions of the EM. In this type of directly imaging microscope the specimen itself is the source of electrons, which are emitted under excitations that include the impact of photons, electrons and/or ions, or high-temperature heating. More than 60 years ago, Recknagel (1941) published a theoretical study showing that the immersion objective lens, the crucial part of the EEM that accelerates the electrons emitted at quite low energy E0 to some final energy E and forms the first image of the emitting surface, has surprisingly good properties. Its basic aberration coefficients are proportional to the ratio E0/E, so that they decrease even for the lowest emission energies. Of the EEM versions mentioned, the photoemission one (PEEM) is most often met in laboratories at present, partly because of progress in this method connected with the wider availability of intense radiation sources at synchrotrons. However, for us another version of the EEM is most important, namely the low-energy electron microscope (LEEM), in which the specimen excitation is made via a parallel coherent wave of slow
electrons. The method and instrument were first proposed by Bauer (1962) and demonstration experiments were later performed by Delong and Drahoš (1971). Only in the 1980s did the first micrographs appear in the literature (Telieps and Bauer, 1985), but since then the method has boomed; for a review see Bauer (1994), while more practical aspects are summarized by Veneklasen (1992). Although the LEEM apparatus remains an expensive tool for top specialists, it has produced some of the most attractive and fruitful results among the surface examination methods except, maybe, the probe microscopies. The scanning LEEM (SLEEM) method described here aims at achieving similar results as regards the observability of surface-localized physical phenomena, with possibilities of extension toward multiple signal detection.

The idea of reversing the function of the immersion objective lens with respect to that in the EEM can be traced back to Zworykin et al. (1942), where an electrostatic SEM with a biased specimen is outlined. An adaptation of a conventional SEM, by inserting a retarding-field element below its objective lens, was published by Paden and Nixon (1968). Yau et al. (1981) demonstrated the lowering of aberration coefficients by means of a retarding field, either overlapped with the focusing magnetic field or arranged sequentially, and even measured aberration coefficients down to tens of µm at very low energies, but their aim was solely to improve tools for electron lithography and annealing and they did not consider any application to scanned low-energy imaging. Many other attempts to retard the primary beam electrons before their impact onto the specimen were published; this history is reviewed by Müllerová and Lenc (1992a).
It is interesting that, although many of the previous studies proposed quite feasible solutions to the problem of decelerating the beam in the SEM, none of the reviewed papers presented convincing results, i.e., micrographs collected throughout the full energy scale. To our knowledge, the first such series was published by us (Müllerová and Frank, 1993), together with practical experience from the adaptation of a commercial SEM to the SLEEM method. The low-energy microscopy program at ISI Brno was started in the 1960s (see above) and, after a long break, it continued with the first demonstration experiments with the SLEEM method (Müllerová and Lenc, 1992b) and a theoretical examination of the properties of the immersion objective lens (IOL) (Lenc and Müllerová, 1992a,b). The problem of the IOL and its optimization was systematically treated by Rose and Preikszas (1992) and Preikszas and Rose (1995). More literature references will be given below. One could easily conclude that the method and corresponding apparatus have been sufficiently explored to appear on the instrument market and to
enter the broad community of users. Moreover, the method can be implemented after quite moderate adaptation of a conventional SEM. Nevertheless, the small number of existing instruments corresponds to only a handful of users, who do not represent a sufficient marketing target, so this barrier has not yet been broken. The first commercial device is expected in 2003. In the following, the SLEEM method will be discussed in detail from all fundamental viewpoints so that the reader can comprehend it and even start to use it. The scope of the application results is still quite limited and awaits additional users who could help fill the obscure areas in the interpretation of the observed contrast.
II. MOTIVATIONS TO LOWER THE ELECTRON ENERGY
The low-energy range below 5 keV is now available in commercial SEM instruments and is widely used for observation of nonconductors, for measurement of dimensions in images, for improved observation of surface relief, etc. In this chapter we summarize the main advantages of working in this energy range and then continue with the very-low-energy range. Let us mention here that practical experience with image contrast (and therefore also the awareness of the motivation for using it) is quite naturally concentrated mainly in the energy ranges of commercial instruments. These mostly provide quality imaging down to 1 keV, where the resolution value is often still guaranteed. Somewhere below 1 keV the imaging properties "break down" and the image quality becomes unacceptable; this threshold is usually met around 500 eV. For microscopes containing compound objective lenses this limit is shifted to about 200 eV, and the performance of devices equipped with an aberration corrector is similar.

A. Extensions to Conventional Modes of Operation
It is well known (and also evident from Figures 13 and 16) that the total electron yield σ per incident electron, acquired from any known specimen, increases when the primary beam energy is reduced below its usual value of 15 to 30 keV. This is because the interaction volume shrinks, together with the path of the generated secondary electrons towards the surface, which reduces their absorption. Because σ is generally less than 1 at high energies, its increase leads to a decrease in the proportion of electrons dissipated in the specimen, suppression of
charging of nonconductive specimens, and lesser demands on measures to make them conductive. At the critical energy EC2 we get σ = 1 and a true noncharging microscopy is possible (Frank and Müllerová, 1994). As the SE yield keeps growing even below EC2, a significant signal increase with respect to the traditional beam energies is achieved, which translates into an improved signal-to-noise ratio (SNR) in the image. Measurements on elemental specimens showed that the SE signal maximum appears at an energy Em between 100 and 800 eV (e.g., Seiler, 1983), and below this energy the yield falls again.

The so-called material contrast, based on the direct proportionality between the mean atomic number of the specimen and the yield of backscattered electrons, which is reliably available at tens of keV, disappears in the low-energy range in the sense that the η(E) curves for various specimens start to cross each other (see Figure 13). Instead, for particular combinations of materials, optimum energies can be found at which the mutual contrast reaches its maximum (see Müllerová, 2001).

As the interaction volume of slower electrons diminishes, the information generated in the microscope becomes better localized and more sensitive to the true surface, which is then also more faithfully visualized. Tiny protrusions and ridges appear on facets that were apparently smooth at higher beam energies. The so-called edge effect, i.e., the overbrightening of steeply inclined facets or side walls of surface steps that is apparent and mostly dominant at tens of keV, diminishes here and fully disappears somewhere below 500 eV (in fact near the energy Em of the maximum SE yield). The reason is that the penetration depth of the primary electrons (PE) shortens and approaches the escape depth of the SE. Consequently, all generated SE are emitted and no surface steps can extend the emitting area.
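The existence of a second, higher crossing of the unit-yield level can be sketched numerically. The "universal" total-yield curve and the parameter values below (the peak yield and its position) are illustrative assumptions, not data from this chapter; the point is only that, above the yield maximum, σ(E) decreases monotonically, so a simple bisection locates the energy where σ = 1:

```python
import math

def total_yield(E_keV, sigma_m=2.0, E_m=0.4):
    """Illustrative semi-empirical total-yield curve sigma(E); sigma_m
    (peak yield) and E_m (peak position, keV) are hypothetical values."""
    x = E_keV / E_m
    return sigma_m * 1.11 * x**-0.35 * (1.0 - math.exp(-2.3 * x**1.35))

def find_ec2(lo=0.4, hi=50.0, tol=1e-6):
    """Bisection for the higher root of sigma(E) = 1; above the yield
    maximum sigma(E) falls monotonically, so bisection is safe."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if total_yield(mid) > 1.0:
            lo = mid   # yield still above one, the root lies higher
        else:
            hi = mid
    return 0.5 * (lo + hi)

ec2 = find_ec2()   # a few keV for these example parameters
```

For these toy parameters the crossing comes out at a few keV, consistent with the margin quoted above; real values vary strongly with material, as Figure 1 shows.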
Experience has shown that, in the range of hundreds of eV, a contrast between various grains appears on polycrystalline specimens. This phenomenon needs to be explored more carefully, because in principle there are several possible explanations: in addition to the dependence of the generation and absorption of SE, and of the electron backscattering, on the crystal orientation, the presence of thin surface films can also play a role, as these layers cease to be transparent here and their thickness (like that of oxides) is also orientation dependent.

The energy dissipation in the specimen is clearly smaller at low energies, since each electron delivers less energy. The increase in the emission of slow SE makes no significant change, and the BSE emission, which is responsible for the great majority of the energy export, is roughly
constant down to hundreds of eV (see Figure 13). Nevertheless, at higher energies, and still at 1 keV, the penetration depth (or electron range) decreases faster than linearly (see Böngeler et al., 1993), so that the spatial density of the dissipated energy grows. This decrease then gradually becomes linear (Joy and Joy, 1996) and the density of dissipated energy stays constant, because the deficit in the energy income is just compensated by the thinning of the interaction layer.

The question of radiation damage is even more complicated than the previous issue. In addition to the changes in the total amount of dissipated energy and in its spatial density, the cross-sections for various inelastic phenomena also depend on the energy of the incident electrons. For example, the cracking of hydrocarbon molecules and the creation of the graphitic contamination layer is most effective for electron impact at about 100 to 200 eV. Only in the very-low-energy range do elastic collisions dominate, so that the radiation damage diminishes and disappears.
B. New Opportunities
In the very-low-energy range, the wavelength of the incident electrons, λ [nm] = 1.226 {E [eV]}^(-1/2), becomes comparable with the interatomic distances. As in the classical LEED (low-energy electron diffraction) apparatus, the angular distribution of the reflected electrons is strongly anisotropic and the intensity is concentrated into diffracted beams. In our case the incident wave is convergent, and one can refer to the CBED (convergent beam electron diffraction) method used in the STEM (scanning transmission electron microscope). Selective detection of some diffracted beams enables one to visualize directly the surface crystallinity and its possible changes.

Flat clean crystal surfaces are composed of terraces that are smooth on the atomic level and separated by steps of a height of one or more atoms. If the primary spot illuminates a terrace margin and the electron wavelength is in a suitable relation to the step height, the two parts of the wavefront, reflected on adjacent terraces, can interfere (divided-wavefront interference) constructively or destructively and reveal the step, although the point resolution of the microscope does not reach the atomic level. A similar phenomenon can be observed when the interference concerns electrons reflected from the upper and lower interfaces of a thin surface film (divided-amplitude interference). At wedge-shaped layers,
equal-thickness stripes (an analogy to Newton's rings) should be observed.

Electrons impacting the solid with an energy just above the surface potential barrier are, according to the laws of quantum mechanics, subject to partial reflection, so that the height and shape of the barrier can be sensed. It is known from LEED experiments that the electron reflection (Bauer, 1994; Bartoš et al., 1996) is inversely proportional to the local density of electron states coupled to the incident wave. This phenomenon can appear only below 20 or 30 eV of landing energy. The contrast based on the local density of states enables one to observe the energy band structure directly, which opens ways to attractive applications, e.g., in the development and diagnostics of semiconductor structures (Frank et al., 2002; Müllerová et al., 2002).

Already in the low-energy range (and especially for heavier specimens), the elastic electron scattering displays behavior that can be described solely by the quantum mechanical Mott cross-sections, which incorporate the screening of the nucleus by electrons, the existence of the spin, and the spin–orbit interaction (see, e.g., Reimer, 1998). Thus, the electron spin influences the image signal and the magnetic microstructure becomes observable, provided that a spin-polarized beam is used for the illumination (Bauer, 1994).

As mentioned above, below about 20 to 30 eV elastic collisions of the incident electrons start to dominate, so that radiation damage becomes negligible. This can be important for the examination of highly sensitive materials and also, for example, for interoperational checks in semiconductor production, where any damage should be avoided.
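The wavelength relation quoted at the start of this section is trivial to evaluate; the short sketch below only illustrates how quickly λ approaches interatomic dimensions as the energy drops:

```python
import math

def wavelength_nm(E_eV):
    """De Broglie wavelength of an electron (nonrelativistic),
    lambda [nm] = 1.226 * E[eV]**(-1/2), as quoted in the text."""
    return 1.226 / math.sqrt(E_eV)

# At 10 keV the wavelength is ~0.012 nm, far below interatomic spacings;
# at 10 eV it is ~0.39 nm, comparable with typical lattice constants.
for E in (10000.0, 100.0, 10.0):
    print(E, wavelength_nm(E))
```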
C. Issues Inherent to Slow Electron Beams

Problems with the generation of sufficiently small electron probes in the low-energy range have been solved reasonably satisfactorily, and these beam energies are available in modern instruments. Nevertheless, the low-energy range has been explored only down to about 200 eV, as already noted. Let us now summarize the problems that have to be overcome when lowering the beam energy in an instrument that keeps this energy constant throughout the microscope column.
The chromatic aberration of electron lenses depends on the ratio ΔE/E, where the energy spread ΔE within the beam is given by the emission mechanism used in the gun and E is the beam energy within the lens.
Obviously, the chromatic aberration disc enlarges with decreasing energy; in the low-energy range it usually dominates the image resolution and affects this crucial characteristic adversely.

The diffraction aberration, i.e., the size of the Airy disc arising from interference of the unscattered electron wave, passing through the aperture-restricting diaphragm, with the marginal wave scattered on the diaphragm edge, is proportional to the wavelength λ, i.e., to E^(-1/2). Thus, this contribution to the final spot size also grows at low energies.

The electron current extracted from the gun is proportional to the extraction voltage. For thermionic cathodes, the gun brightness is linearly proportional to E (Reimer, 1998). For Schottky and field-emission guns this proportionality is not so simple, because the first acceleration voltage controls the emission and the final beam energy is adjusted afterwards. But the beam current always decreases with decreasing energy.

In spite of some screening against the spurious electromagnetic fields coming from the environment, particularly the a.c. ones, which is secured by the material of the magnetic circuits, some undesired influence is usually observed. This grows strongly at low energies and is proportional to the time of flight through the column, i.e., to E^(-1/2). The situation is most critical in ultrahigh-vacuum (UHV) devices, where the chamber walls are traditionally made of nonmagnetic materials.

Any narrow directed beam of charged particles suffers from the mutual interaction of those particles via Coulomb forces. Particularly in crossovers, the mutual repulsion intensifies, so that the size of these crossovers becomes larger than that given by geometrical optics. The consequences of this internal interaction are strongly dependent on the beam current; for low currents and a Gaussian beam profile the crossover broadening is proportional to E^(-3/2) (Spehr, 1985).
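The first two energy scalings in this list can be made concrete with a toy calculation. The chromatic disc is taken as d_c = C_c (ΔE/E) α and the diffraction disc as d_d ≈ 0.6 λ/α; the numerical values of C_c, α, and ΔE below are arbitrary example parameters, not those of any real column:

```python
import math

def spot_contributions(E_eV, alpha=5e-3, Cc_nm=1.0e7, dE_eV=0.7):
    """Two spot-size contributions (in nm) for a column held at beam
    energy E throughout; Cc (chromatic aberration coefficient, here
    10 mm expressed in nm), alpha (aperture angle, rad) and dE
    (source energy spread, eV) are assumed example values."""
    d_chrom = Cc_nm * (dE_eV / E_eV) * alpha          # grows as 1/E
    d_diff = 0.6 * (1.226 / math.sqrt(E_eV)) / alpha  # grows as E**-0.5
    return d_chrom, d_diff

low = spot_contributions(100.0)      # both discs large at 100 eV
high = spot_contributions(10000.0)   # both much smaller at 10 keV
```

With these numbers the chromatic disc swells from a few nm at 10 keV to hundreds of nm at 100 eV, which is exactly the degradation that keeping the column at high energy (Section C, last paragraph) is meant to avoid.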
In connection with the previous point we should also mention that another consequence of the electron–electron interaction within the beam, again mainly in crossovers, is a broadening of the energy spread (the so-called Boersch effect). Here again the situation depends on the beam current and also on the crossover shape and dimensions; for stigmatic focusing the mean quadratic broadening ΔE is proportional to E^(1/4) (Rose and Spehr, 1980), so that the figures are even slightly more favorable at low energies.

The conventional detector of secondary electrons of the Everhart–Thornley (ET) type (Everhart and Thornley, 1960) relies upon the extraction of SE by means of a front-grid bias of about 300 to 500 V and their subsequent acceleration by the scintillator potential of
about 10 kV. These electrostatic fields, oriented laterally to the optic axis, can damage the primary beam geometry in the low-energy range. Thus, novel approaches to the detection strategy are needed.

With decreasing electron energy and the reduction of the active depth of signal generation, the surface cleanliness becomes more important. At energies near the minimum of the IMFP, the penetration depth of the PE becomes comparable with the thickness of the contamination layers, both that of the graphitic carbon from cracked hydrocarbon molecules and that of the oxide or other products of surface reactivity. From this point of view, the vacuum conditions within the specimen chamber become more important, as in the case of the electron spectroscopies. However, in the very-low-energy range the IMFP grows again and normal vacuum demands are restored.

It is obvious from this list of issues that the major obstacles arise from physical principles and can only be avoided by keeping the primary electron beam at high energy for as long as possible and decelerating it only shortly before its impact on the specimen. This approach has already been applied in various modifications that will be outlined here and completed with some new data and experience of the authors.
III. INTERACTION OF SLOW ELECTRONS WITH SOLIDS
The physics of electron scattering and diffusion in solids is described in many original papers and also in good textbooks. A precise and sufficiently detailed analysis of the problem for nonspecialists can be found in the book by Reimer (1998), and a condensed review of the scattering phenomena suffered by very slow electrons has been published by Bauer (1994). In this chapter we summarize the main approaches only briefly and depict the basic differences inherent to the low- and very-low-energy ranges.

Elastic scattering on atomic nuclei and inelastic scattering connected with the excitation of electrons belonging to the target are the fundamental processes determining the range of incident electrons, the in-depth distribution of the ionization processes, and consequently also the emission of the secondary and backscattered electrons. In order to characterize the individual scattering processes, usually the quantity known as the differential cross-section dσ/dΩ is used, which relates the distance of the original electron trajectory from the scatterer to the angle θ of its deflection, and represents the probability that an electron approaching the target will be scattered into a solid angle dΩ.
Integrating over Ω we get the total cross-section σ. Multiple scattering is described via statistical quantities, first of all by the mean free path between collisions. Multiple elastic scattering causes a broadening of the incident beam, up to possible backscattering, while multiple inelastic scattering causes a gradual loss of the electron energy along its trajectory. For us, the most important inelastic process is the release of a secondary electron.
A. Elastic Scattering

1. Scattering on Nuclei

Scattering of incident electrons on the nuclei of the specimen atoms is considered elastic when the mass of the nucleus is regarded as so large with respect to the mass of the electron that after the interaction the nucleus remains at rest. This simplifying assumption neglects the generation of phonons, which becomes apparent particularly at the lowest electron energies, where other losses already vanish. Nevertheless, we will mention this type of scattering among the inelastic types.

In the framework of classical mechanics, we can solve the Newton equation containing the attractive Coulomb force between a positively charged nucleus and the negative charge of an electron. The result of the classical calculation of the differential cross-section, first published by Rutherford in 1911, is (for electron energies negligible with respect to the rest energy E0 = mc² = 511 keV) given by

  dσ_el/dΩ = e⁴ Z² / [16 (4πε₀)² E² sin⁴(θ/2)]    (1)
where e and m are the electron charge and mass, respectively, ε₀ is the permittivity of vacuum, and Z is the atomic number of the nucleus. This relation is acceptable for electrons above 100 keV, but at low energies it represents a bad approximation. It diverges at θ = 0 because the small scattering angles arise for electrons flying far from the nucleus, where in fact its potential is screened by the electrons of the atom. At large scattering angles the approximation also fails, owing to the neglect of relativistic effects (Reimer, 1998).

The screening effects can be described solely by quantum mechanics, by determining the scattering amplitude f(θ) of a spherical wave scattered on the atom and superposing it on the incident plane wave. The differential cross-section is then generally expressed as

  dσ_el/dΩ = |f(θ)|²    (2)
The scattering amplitude for the screened Coulomb potential can be found by solving the Schrödinger equation for the ground states of the atom electrons. A good approximation is the exponential screening with the screening radius equal to R_S = a_H Z^(-1/3), where a_H = 0.0529 nm is the Bohr radius; this gives, after substitution into Equation (2), the so-called screened Rutherford cross-section

  dσ_el/dΩ = e⁴ Z² / {16 (4πε₀)² E² [sin²(θ/2) + sin²(θ₀/2)]²}    (3)
with θ₀ ≅ λ/2πR_S (Reimer, 1998). This cross-section already produces finite values for small θ and can be further improved by taking into account a full series of exponential potentials instead of only one, by incorporating potentials from neighboring atoms (e.g., via the so-called muffin-tin model), by modifying the scattering potential for correlation and exchange phenomena between the incident and target electrons, etc.; and/or a decomposition into partial waves can be used.

The exact cross-sections, the so-called Mott cross-sections (Mott and Massey, 1965), for elastic large-angle scattering may be obtained for the screened Coulomb potential when the relativistic Schrödinger or Pauli–Dirac equation is used. The result is then in the form of a superposition of terms belonging to both spin directions with respect to the direction of propagation, but no analytical expression for dσ/dΩ can be written. For an unpolarized electron beam, the Mott cross-section remains axially symmetric and in the general Equation (2) two formally identical members are summed on the right-hand side. These can then be developed into a series of Legendre functions (e.g., Ding and Shimizu, 1996). In addition to modified values at large scattering angles, the Mott cross-sections exhibit one property not met before, namely a nonmonotonic angular dependence, as shown in Figure 3. Obviously, this behavior emerges in the low-energy range for large Z, while for small Z it is not present until near the very-low-energy range. Data on the Mott cross-sections for chemical elements at low and even very low energies can be taken from numerous sources (see, e.g., Reimer and Lödding, 1984; Czyzewski et al., 1990; Werner, 1992). The role of the electron spin in scattering was examined by Kirschner (1984).
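As a numerical check of Equation (3), the sketch below evaluates the screened Rutherford cross-section in SI units, with the screening angle θ₀ taken from the relations just given and the wavelength from λ [nm] = 1.226 E^(-1/2). The chosen element and energy are arbitrary examples:

```python
import math

E_CHARGE = 1.602176634e-19     # electron charge, C
EPS0 = 8.8541878128e-12        # vacuum permittivity, F/m
A_H = 0.0529e-9                # Bohr radius, m

def screened_rutherford(theta, E_eV, Z):
    """Screened Rutherford differential cross-section, Eq. (3),
    in m^2/sr; theta in rad, E in eV, Z the atomic number."""
    lam = 1.226e-9 / math.sqrt(E_eV)          # electron wavelength, m
    r_s = A_H * Z ** (-1.0 / 3.0)             # screening radius R_S
    theta0 = lam / (2.0 * math.pi * r_s)      # screening angle
    E_J = E_eV * E_CHARGE                     # energy in joules
    num = E_CHARGE ** 4 * Z ** 2
    den = (16.0 * (4.0 * math.pi * EPS0) ** 2 * E_J ** 2
           * (math.sin(theta / 2.0) ** 2 + math.sin(theta0 / 2.0) ** 2) ** 2)
    return num / den

# Unlike Eq. (1), the value stays finite at theta = 0:
d_forward = screened_rutherford(0.0, 1000.0, 29)        # Cu, 1 keV
d_backward = screened_rutherford(math.pi, 1000.0, 29)
```

The sin²(θ₀/2) term is what removes the θ = 0 divergence of the unscreened formula; for large angles it is negligible and the Rutherford behavior is recovered.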
When decreasing the energy of the incident electrons toward the very-low-energy range, the Mott cross-sections, which express relativistic free-electron scattering incorporating partial waves, overestimate the scattering rate and yield unrealistically short elastic mean free paths (EMFP) of only tenths of a nm. It was suggested (Fitting et al., 2001) that they should be
FIGURE 3. Differential cross-sections for elastic scattering of electrons at various energies, calculated by decomposition into partial waves. (Reprinted with permission from Ichimura, 1980.)
replaced here with factors inherent to quasi-elastic scattering on acoustic phonons; this mechanism works down to the thermalization threshold of electrons at the mean energy 3kT/2, i.e., a few tens of meV, and preserves the EMFP in the nm range. The scattered electrons are then considered as quasifree Bloch electrons obeying the dispersion relation of the conduction band of the target. Acoustic phonons have energies of only a few meV, but scattering on them is nearly isotropic, so that they effectively influence any oriented stream of electrons and, for example, lower the electric conductivity.

While the total elastic cross-section σ_el characterizes a single scattering event, multiple scattering is described by the EMFP

  λ_el = 1/(N σ_el)    (4)
where N is the number of atoms per unit volume. As Figure 4 shows, this quantity also exhibits nonmonotonic behavior, starting from the low-energy range. We can conclude that, for slow electrons, anisotropy appears already in scattering on single atoms, and the resulting features then combine with a directional segregation owing to the interference of partial waves from a lattice of scatterers. Further, the path length between scattering events generally shortens down to the lowest energies, but in the very-low-energy range this dependence is far from monotonic.
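Equation (4) converts a cross-section into a mean free path. In the sketch below both the cross-section value and the atomic density are hypothetical example numbers (the density is roughly that of copper), chosen only to show the order of magnitude:

```python
def emfp_nm(sigma_el_nm2, n_atoms_per_nm3):
    """Elastic mean free path from Eq. (4): lambda_el = 1/(N*sigma_el).
    Cross-section in nm^2, atomic density in atoms/nm^3, result in nm."""
    return 1.0 / (n_atoms_per_nm3 * sigma_el_nm2)

# Example: an assumed sigma_el = 0.01 nm^2 with N ~ 85 atoms/nm^3
# (roughly copper) gives an EMFP of about 1.2 nm, i.e., the nm order
# of magnitude seen in Figure 4.
lam_el = emfp_nm(0.01, 85.0)
```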
SCANNING LEEM
FIGURE 4. Calculated energy dependence of elastic mean free paths (EMFP) for electrons. (Reprinted with permission from Ding, 1990.)
2. Reflection on Energy Gaps

Even above the vacuum level, the energy band structure exists and the energy states E(k) are separated by forbidden gaps, particularly where the k-vector touches the boundary of the Brillouin zone. If the incident electron hits the gap, it does not enter allowed states and should be reflected. Nevertheless, total reflection is not obtained because the electron can undergo an inelastic collision and lose energy, or it changes its wave-vector owing to scattering on a phonon or some crystal imperfection; in both cases a shift to the allowed states can occur. In the range of a few eV, elastic reflection is strongly enhanced. Electron microscopists have never had any reason to take this phenomenon into account, as its appearance requires the electron impact to be far below the energy range available in the SEM. But those using the VLEED (very-low-energy electron diffraction) method know the energy-band-structure region existing on the intensity vs. energy (I–V) curve for the specularly reflected (00) spot below the threshold where the first nonspecular diffracted beam appears (Jaklevic and Davis, 1982). It is important to note that the incident electron wave has to couple to the energy states into which it is to penetrate. This means those Bloch states inside the specimen whose surface-parallel wave-vector component is equal to K∥ + g, where K∥ is the parallel component of the incident electron wave-vector and g is any surface reciprocal-lattice vector (Strocov and Starnberg, 1995); in other words, those Bloch states that have dominant Fourier components resembling the incident wave (the coupling bands). The local extremes on the energy dependence of the reflectivity R(E) are first of all connected with critical points of the energy bands E(k) at which ∂E/∂k⊥ exhibits sharp changes or is equal to zero at the band-gap edges. When mapping those
critical points upon variation of K∥, complete bands in the symmetry planes of the Brillouin zone can be compiled (see Strocov et al., 1996). A further crucial condition is low absorption of electrons, which is met below landing energies of 25 to 30 eV. Simulations show that any local R(E) features are washed out at even moderate absorption (appearing for the imaginary part of the crystal potential exceeding 1 eV) and that the model fits the experimental data better when a nonisotropic situation is considered, with absorption reduced in directions along the surface (Bartoš et al., 1996). In addition to the extremes of R(E) revealing the critical points at the coupling bands, oscillations might also appear as minor features. These are connected with surface resonances caused by interference between the specular beam and a nearly emerged nonspecular diffracted beam moving parallel to the surface, and can be used for mapping the surface potential barrier (Jaklevic and Davis, 1982). Figure 5 demonstrates the reflection anisotropy connected with its k-vector dependence, which enables one to obtain contrast between different crystal orientations at suitable energies. Mapping of the local variations in the density of states at the energy of the electron impact, for example those connected with the local doping of semiconductor structures, is also potentially available.

B. Inelastic Scattering

The term inelastic scattering is usually used to describe an interaction between the incident electron and the atomic electrons in the target. More generally, it should include all phenomena at which the impinging electron changes its energy.

1. Scattering on Electrons

The main mechanisms of interaction between electrons include:
quasifree electron–electron collisions (i.e., a Compton-like scattering); excitation of electrons within partially occupied energy bands; excitation of interband transitions; excitation of plasmons, i.e., energy quanta connected with the charge-density waves of valence or conduction electrons; and ionization of the inner shells of the atoms. These mechanisms not only exhibit various cross-sections but also represent very different amounts of energy interchanged between the colliding particles. Although inelastic scattering is often assumed to cause only energy decrease
FIGURE 5. Reflection coefficient R(E ) of the W(110) and W(100) surfaces for slow electrons. (Reprinted with permission from Bauer, 1994.)
but not trajectory deflection, some collisions can be associated with large scattering angles (like Compton scattering). In spite of this, the relevant differential cross-section is often considered on the energy scale instead of the angular one, namely as dσ/dW, where W is the transferred energy. Energies transferred at ionization range from a few eV up to nearly 100 keV, depending on the atomic number and the electron shell involved. Excitation of plasmons and electron transitions causes energy losses not exceeding tens of eV, but the loss due to an electron–electron collision can be up to 50% of the initial energy. If we neglect any binding forces acting upon a target electron at rest and consider the incident electron approaching with energy E, we can use classical mechanics to get the differential cross-section (Reimer, 1998)

$$\frac{d\sigma_{\mathrm{in}}}{dW} = \frac{\pi e^4}{(4\pi\varepsilon_0)^2}\,\frac{1}{E W^2} \qquad (5)$$
This relation is derived upon the assumption that the target electron stays at rest during the collision and only acquires momentum; this is not the case for slow electrons and hence for E → 0 (and also for central collisions) this cross-section diverges. But Equation (5) indicates that this type of scattering is more common at low energies, that small energy losses are
more probable, and, because the same approximation gives the scattering angle as

$$\sin^2\theta = W/E, \qquad (6)$$
small scattering angles are also more probable. If this simplified approach is upgraded, correcting terms due to the indistinguishability of electrons and due to their spin are added to 1/W². These are of the same dimension (energy)⁻² and combine E, W, and E₀. Thus, the added terms also grow at low energies, but for W ≪ E these corrected relations converge to Equation (5). An equation derived by Gryzinski (1965), which takes into account the binding of electrons in atoms, also converges to Equation (5) for low binding energies. In fact, the "continuum" of losses owing to scattering on quasifree electrons appears in EELS (electron energy loss spectroscopy) spectra only in the range of hundreds of meV (Reimer, 1995). The inner-shell ionization can be solved in the same way as the problem of screening of the nuclear potential, i.e., by using the Schrödinger equation for the nucleus, one atomic electron, and one incident electron, which leads to Equation (3). Now, excited states of the target electron are incorporated too and the final result, the total cross-section, e.g., for ionization of the K shell, is (Reimer, 1998)

$$\sigma_K = \frac{\pi e^4 z_K b_K}{(4\pi\varepsilon_0)^2 E_K^2}\,\frac{\ln u}{u} \qquad (7)$$
where z_K is the number of electrons in the shell (z_K = 2), b_K is a constant factor (b_K = 0.35), E_K is the ionization energy of the shell, and u = E/E_K is the overvoltage. A maximum of σ_K appears at u ≅ 3 for all atomic numbers and at lower energies σ_K falls steeply. This means that throughout the low-energy range electron impact ionization is possible for every atom, but in the very-low-energy range this type of scattering does not take place. The differential cross-section from the same calculation is

$$\frac{d\sigma_{\mathrm{in}}}{d\Omega} = \frac{e^4 Z}{(4\pi\varepsilon_0)^2 E^2}\left[1 - \frac{1}{\bigl(1 + (\theta^2 + \theta_E^2)/\theta_0^2\bigr)^2}\right]\frac{1}{(\theta^2 + \theta_E^2)^2} \qquad (8)$$
where the characteristic angle is θ_E = J/4E, with J being the mean ionization potential of the atom (J[eV] ≅ 10Z), and θ₀ is that from Equation (3). This relation is similar to Equation (3) and we can compare two characteristic features. First, at E = 5 keV the inelastic scattering is still confined to smaller deflection angles (e.g., for Z = 30, θ₀ is 10 times larger than θ_E) but this
difference is less marked at lower energies. Second, the ratio of both the differential and total cross-sections for inelastic with respect to elastic scattering is proportional to 1/Z, at least for large scattering angles. Within the so-called dielectric theory, considering the solid described by the complex dielectric constant ε and employing the analogy between the inelastic scattering of electrons and the spatial attenuation of electromagnetic waves, which is proportional to the imaginary dissipative part of ε, the differential cross-section can be written as (e.g., Böngeler et al., 1993)

$$\frac{d^2\sigma_{\mathrm{in}}}{dW\,d\Omega} = \frac{1}{\pi^2 a_H E N}\,\mathrm{Im}\!\left[\frac{-1}{\varepsilon(W,\theta)}\right]\frac{1}{\theta^2 + \theta_D^2} \qquad (9)$$
with θ_D = W/2E. The analogy is based on modeling groups of electrons, similarly strongly bound within the given energy-band structure, by oscillators defined by their strengths and characteristic frequencies. So the problem is now shifted to the determination of the complex dielectric constant ε. Similar relations and results as regards the inelastic cross-sections are obtained when using the formalism characterizing the incident electron as a quasiparticle with self-energy, the imaginary part of which describes the quasiparticle lifetime while the real part expresses the shifts in the energy eigenvalues with respect to the noninteracting system. The same holds for the formalism of the electron–jellium correlation potential, with the imaginary part governing attenuation of the dielectric response of jellium to the electron impact. Equation (9) is also written in the variables W and q (the momentum) or q and ω (with W = hω/2π). An overview of these approaches was published by Nieminen (1988). The energy loss function, written as Im[−1/ε(q, ω)], can be calculated on the basis of EELS experimental data for q = 0 (the "optical" data) when employing, for example, the quadratic dispersion relation (Kuhr and Fitting, 1999)

$$\omega(q) = \omega(0) + \frac{h}{4\pi m}\,q^2 \qquad (10)$$
Figure 6 shows an example of the measured dielectric loss function for SiO2 (Fitting et al., 2001). This contains peaks inherent in scattering on optical phonons, which will be mentioned in the next section. Calculations of quantities characterizing the inelastic scattering, which employed the dielectric function, were performed by many authors (e.g., Cailler et al., 1981; Powell, 1974, 1984, 1985; Penn, 1987; Egerton, 1986; Ding and Shimizu, 1996).
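As an illustration of the loss function entering Equation (9), the sketch below evaluates Im[−1/ε] for an assumed Drude dielectric function; the chapter itself works with measured optical data (e.g., for SiO2), so the plasmon energy and damping used here are free, assumed parameters.

```python
# Loss function Im[-1/eps] for a Drude model eps(w) = 1 - wp^2/(w^2 + i*gamma*w).
# wp = 15 eV and gamma = 1 eV are assumed illustrative values, not data
# from this chapter.
def drude_eps(w, wp, gamma):
    return 1.0 - wp**2 / (w * (w + 1j * gamma))

def loss(w, wp=15.0, gamma=1.0):
    """Im[-1/eps] at energy w (eV); peaks near the plasmon energy wp."""
    return (-1.0 / drude_eps(w, wp, gamma)).imag

# Scan the loss function and locate its peak on a coarse grid.
ws = [0.1 * k for k in range(1, 400)]
peak = max(ws, key=loss)
print("plasmon peak near", round(peak, 1), "eV")   # close to wp = 15 eV
```

This reproduces the familiar behavior that the volume-plasmon peak of the loss function sits near the zero of Re ε, which is what dominates EELS-derived loss data such as Figure 6.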
FIGURE 6. The dielectric loss function for SiO2. (Reprinted with permission from Fitting et al., 2001.)
In Figure 7 we see the calculated energy dependences of σ_in for two elements, including the main contributions to σ_in. These curves reflect the general IMFP curve in Figure 2, which we used for the definition of the very-low-energy range (with the IMFP, λ_in, defined analogously to Equation (4)). Further data regarding the IMFP behavior at low energies can be found in the work of Ding and Shimizu (1996), Powell (1987), Tanuma et al. (1991a,b), and others. We can notice that, starting from the lowest energies, first the electron–electron scattering appears, then the ionization, and finally the plasmon excitation emerges. Let us repeat that the steep drop in σ_in below about 50 eV is the most important feature here and also the reason for considering the use of this range as a separate mode of SEM.

2. Scattering on Atoms

In the dielectric loss function in Figure 6, the noticeable peaks that appear around 100 meV belong to the scattering of electrons on optical phonons. Having formally separated the inelastic phenomena due to scattering on electrons in the previous paragraph, we discuss this mechanism here. The electron–phonon interactions are important mainly in dielectrics and insulators, but also in semiconductors. The forward scattering on longitudinal optical (LO) phonons is strongest. In 1969 Llacer and Garwin calculated (by means of Monte Carlo simulations) the secondary electron transport in alkali halides below 7.5 eV using the time-dependent perturbation of plane waves with the interaction Hamiltonian containing the polarization field caused by the relative displacement of ions in the LO vibrational modes. Schreiber and Fitting (2002) discussed these phenomena in detail for SiO2 and included two LO modes with energies
FIGURE 7. Calculated total inelastic cross-sections (———) and their main contributions, namely the electron–electron scattering (— — ), shell ionization (- - - - - -), and plasmon excitation (— — ). (Reprinted with permission from Ho et al., 1991.)
of 60 and 150 meV and also scattering phenomena representing both emission and annihilation of phonons. They also presented the scattering rates of collisions, which are much higher for the phonon emission phenomena. At about triple the phonon energies, these rates reach their maxima between 10¹⁴ and 10¹⁵ s⁻¹ and toward higher energies they fall approximately as E^−1/3. Thus, this scattering mechanism concerns mostly electrons with energy around 1 eV. It seems clear that, even when incorporating the phonon scattering, the enlargement of the IMFP in the very-low-energy range (see Figure 2) is preserved at least for conductors and possibly for semiconductors. In Figure 8, a comparison is made on the basis of data simulated by a Monte Carlo (MC) program employing the dielectric loss function (Kuhr and Fitting, 1999). Further, Figure 9 details the contributions to the IMFP for SiO2, again presenting results of a MC program specialized to very-low-energy scattering in semiconductors and wide-gap insulators. In addition to the mean free paths, both figures also contain the attenuation lengths characterizing a no-loss escape of electrons. We note that the energy range below 50 eV still has some structure in Figure 9, which could be utilized for a subdivision of this range. Nevertheless, this would be specimen-specific and would not allow any general conclusions to be drawn. In studies of very-low-energy electron scattering, one more scattering mechanism is mentioned, namely the intervalley scattering (e.g., Schreiber and Fitting, 2002). This consists of collisions with suitable optical phonons at which, in addition to the energy loss corresponding to the phonon energy, additional energy and also momentum is transferred because the final state is in a different band or "valley" within a multiple-band structure. For SiO2, this type of scattering occurs much less frequently than the LO scattering.
FIGURE 8. Elastic (el.) and inelastic (inel.) mean free paths and attenuation lengths (atten.) for Ag, Si, and SiO2, calculated by means of a MC program incorporating the Mott cross-sections and the dielectric loss function. (Reprinted with permission from Fitting et al., 2001.)
FIGURE 9. The mean free paths in SiO2 as a function of the electron energy for scattering at optical phonons (LO) and acoustic phonons (ac) and for impact ionization (ii), together with the attenuation length (at) for monoenergetic electrons. (Reprinted with permission from Schreiber and Fitting, 2002.)
For completeness we should also mention here the inelastic scattering of electrons on the screened Coulomb potential of the nucleus, leading to the generation of an X-ray photon of the continuous emission (Bremsstrahlung). The low probability of radiative scattering on the nucleus can be demonstrated by comparing the mean energy loss per unit trajectory S_rad to the analogous quantity for the electron–electron scattering. When using an approximate stopping power S_e–e according
to the Thomson–Whiddington law (see the next section), we get the ratio (Feldman and Mayer, 1986)

$$\frac{S_{\mathrm{rad}}}{S_{e\text{-}e}} \cong \frac{4}{3\pi}\,\frac{Z}{137}\,\frac{v^2}{c^2} \qquad (11)$$
which can be simply written as (Z/161)(E/E₀), with E₀ the electron rest energy, so that in the low-energy range it falls to a value of the order of 10⁻³ or 10⁻⁴.
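As a quick check on the order of magnitude quoted above, the compact form of Equation (11) can be evaluated directly; E₀ = 511 keV is the electron rest energy and the copper example energies are assumed illustrative values.

```python
# S_rad/S_e-e ~ (Z/161)*(E/E0), the compact form of Equation (11).
E0_keV = 511.0           # electron rest energy

def srad_over_see(Z, E_keV):
    """Ratio of radiative to electron-electron stopping, compact form."""
    return (Z / 161.0) * (E_keV / E0_keV)

# Copper at a few assumed low-range energies:
for E in (10.0, 1.0, 0.1):
    print(f"Cu, E = {E} keV: ratio ~ {srad_over_see(29, E):.1e}")
```

At 1 keV the ratio is a few times 10⁻⁴, confirming that Bremsstrahlung losses are negligible throughout the low-energy range.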
C. Penetration of Electrons

The primary beam in SEM strikes the specimen surface at a point, the coordinates of which within the field of view are then used to describe the localization of all information collected during the dwell time of the beam. Nevertheless, the primary electrons penetrate to nonnegligible distances in all directions from the impact point and within this interaction volume they cause scattering phenomena and generate signal species. Thus, the above-described single-scattering mechanisms are important not only for the interpretation of the observed properties of emissions but also for tracing the spatial distribution of the information sources. The analysis of electron penetration relies on the concept of multiple scattering, which can be characterized by statistical quantities only. We have mentioned the mean free paths for the individual types of scattering. From Figures 2 and 4 or from Figure 8 it is obvious that, throughout the low-energy range, the ratio of the rates of elastic and inelastic scattering is approximately constant and dependent on the mean atomic number of the target. The very-low-energy range is characterized by the onset of a strong dominance of elastic scattering. When penetrating into the specimen (and, after undergoing some high-angle scattering events, also in lateral directions) by a distance dx, the electron encounters N dx atoms (where N = N_A ρ/A is the number of atoms per unit volume, N_A the Avogadro number, ρ the target density, and A the atomic mass). Thus, the decrease in the stream of unscattered electrons within the trajectory section dz is dI/I = −Nσ dz, where σ is the total cross-section of one atom for a particular scattering mechanism. The unscattered intensity after passing the thickness l is I = I₀ exp(−l/λ) (with λ the mean free path), p = l/λ is the mean number of collisions in the layer, and P_n = pⁿe⁻ᵖ/n! is the probability of n collisions for one electron.
This simple model can be used only up to about p ≅ 25 (see Reimer, 1998), i.e., only for tracing the penetration to distances of the order of 10¹ nm.
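The single-scattering statistics just described form a Poisson process, which can be sketched directly; the mean free path value is an assumed round number.

```python
import math

# After a path l, the mean number of collisions is p = l/lambda, the
# unscattered fraction is P_0 = exp(-p), and P_n = p^n exp(-p)/n!.
def P(n, p):
    """Poisson probability of exactly n collisions at mean p."""
    return p**n * math.exp(-p) / math.factorial(n)

lam = 1.0                      # mean free path in nm (assumed)
for l in (0.5, 1.0, 5.0):      # traversed thickness in nm
    p = l / lam
    print(f"l = {l} nm: P0 = {P(0, p):.3f}, P1 = {P(1, p):.3f}")

# Sanity check: the collision-number probabilities sum to 1.
assert abs(sum(P(n, 2.0) for n in range(50)) - 1.0) < 1e-12
```

The unscattered fraction P₀ decays exponentially with thickness, which is exactly the I = I₀ exp(−l/λ) law of the text.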
In the course of its penetration, the electron beam broadens as regards both its spread of angles and its cross-section. Within the approximation of small energy losses and small scattering angles, the root-mean-square (RMS) width of the beam increases as l^{3/2} (Reimer, 1998). Nevertheless, this approximation is not good for low energies and successful modeling of the geometry of electron penetration is possible only by using tools such as MC programs. The multiple inelastic scattering is responsible for the finite length of the electron path within the target. The appropriate statistical quantity for the examination of this process is the stopping power S = dE_m/dx (with dE_m the mean energy loss), corresponding to the continuous slowing-down approximation. This approach, which neglects the discreteness of the collisions, does not allow study of the emission of elastically backscattered electrons (eBSE) but is useful for MC programs simulating the SEM image signals. When taking into consideration the e–e interactions only, the so-called Bethe formula, usually written as (Reimer, 1998)

$$S = \frac{2\pi e^4}{(4\pi\varepsilon_0)^2}\,\frac{N_A Z \rho}{A E}\,\ln\!\left(1.166\,\frac{E}{J}\right) \qquad (12)$$
represents the first approximation. For composite targets, the individual stopping powers have to be accumulated so that the relation for S contains a sum of terms like C_i^m (Z_i/A_i) ln(bE/J_i), where the C_i^m are the mass fractions. This sum is often replaced by some energy-independent factor and the resulting expression is then called the Thomson–Whiddington law. Another practically convenient form of the stopping-power relation for elemental targets is (Joy and Luo, 1989)

$$S = 7850\,\frac{\rho Z}{A E}\,\sum_i \frac{Z_i}{Z}\,\ln\!\left(\frac{E}{E_i}\right) \quad [\mathrm{eV/nm}] \qquad (13)$$
where ρ is in g cm⁻³, Z_i is the occupancy of the level i, and E_i its binding energy. Equation (13) is claimed to work down to the binding energy of the outermost occupied level. While the differential cross-sections for the main scattering mechanisms are all proportional to E^−2, the stopping power according to Equations (12) and (13) increases only as E^−1. The validity of approximation (12) is restricted to high energies, notably in the dependence on the ratio E/J, so that for light elements it is acceptable down to about 1 keV. For lower energies, the correction J → J′ = J/(1 + kJ/E) with k ≅ 0.8 is possible. Below E/J = 6.3 the energy dependence of S used to be replaced by S ∝ E^−0.5 (Rao-Sahib and Wittry, 1974) but some authors assert that this parabolic
relation overestimates the energy loss of slow electrons (see Ding and Shimizu, 1996). In the very-low-energy range, the stopping power seems to behave according to the statistical theory of Tung et al. (1979), in which the electrons in the target are considered to form a homogeneous electron gas. Then S ∝ E^{5/2} for all targets (Tung et al., 1979; Nieminen, 1988), which corresponds to the sharp fall obvious in Figure 7. A theoretical model exists also for the most probable electron energy after passing a layer of the target, together with the distribution around this mean value. For SEM applications, the energy distribution of the backscattered electrons, which is mentioned below, is relevant. From the practical point of view, we need to know to what depth the primary electrons penetrate and what is the escape depth of the signal species. Various quantities have been defined to measure these distances and one of them is the attenuation length shown in Figures 8 and 9. The most useful is the electron range R, which can be defined in several different ways according to the method of measurement (see Reimer, 1998). Determination of the electron range is possible via measurement of the number of electrons T(x) passing a foil of a given material with some known thickness x. Because R depends also on the energy E, it is convenient to use one foil thickness and to vary the energy. It is also advantageous to use an extrapolated value R_x (obtained by extrapolating the linear part of T(x) toward T = 0) instead of measuring down to really negligible transmission in order to get some R_max. Most of the experimental data obey a simple law

$$R = aE^n \qquad (14)$$
with a around 10 and n decreasing from 5/3 at high energies to about 4/3 at low energies (Böngeler et al., 1993). This relation seems to be valid down to about 1 keV and only a few data exist below this energy. Salehi and Flinn (1981) verified the power law (14) for the penetration depth using two different amorphous glasses within the energy range 100 to 5000 eV and found n as 1.4 and 1.5, with the larger value for the higher mean atomic number. The theoretical limit for the electron range can be obtained by integrating the stopping power S up to the point where the particle stops, which gives some R_S. Experimental data will provide lower values. According to Reimer (1998), for light elements with Z below about 20 we get R_max ≅ R_S and R_x ≅ 0.75R_S, while for high Z above 50, R_max
toward a nearly energy-independent behavior at 100 eV, with values of 10 nm for Si and 6 nm for Au. One further parameter of the electron beam penetration is the depth distribution of the energy dissipation. The interaction volumes of the beam are roughly the same size for all specimens when distances are measured in the mass thickness ρx. Nevertheless, as the MC simulations show, for light elements the majority of scattering events are concentrated in the central level of the volume and somewhat below it, while for heavy elements more scattering takes place above the central level.
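The range discussion above can be illustrated by integrating the Bethe stopping power of Equation (12), here written in the convenient numerical form with the low-energy correction J → J/(1 + kJ/E) mentioned earlier, and checking that the resulting R_S(E) approximates the power law of Equation (14). All material constants are standard handbook values for Cu inserted as assumptions; this is an order-of-magnitude sketch only.

```python
import math

# Continuous-slowing-down range R_S = integral dE'/S(E') for copper,
# using a Bethe-type stopping power with the J -> J/(1 + kJ/E) correction
# (k ~ 0.8).  J is taken from the standard Berger-Seltzer fit; all numbers
# here are illustrative handbook values, not data from this chapter.
RHO, Z, A = 8.96, 29, 63.55                      # Cu: g/cm^3, Z, g/mol
J_keV = (9.76 * Z + 58.5 * Z**-0.19) / 1000.0    # mean ionization potential, keV

def S_eV_per_nm(E_keV, k=0.8):
    """Corrected Bethe stopping power in eV/nm (E in keV)."""
    Jp = J_keV / (1.0 + k * J_keV / E_keV)
    s_keV_cm = 78500.0 * RHO * Z / (A * E_keV) * math.log(1.166 * E_keV / Jp)
    return s_keV_cm * 1.0e-4                     # keV/cm -> eV/nm

def range_nm(E_keV, E_low=0.05, steps=2000):
    """Trapezoidal integral of dE/S from E_low up to E, in nm."""
    h = (E_keV - E_low) / steps
    Es = [E_low + i * h for i in range(steps + 1)]
    f = [1000.0 / S_eV_per_nm(E) for E in Es]    # (nm/eV) * 1000 = nm per keV
    return h * (0.5 * f[0] + sum(f[1:-1]) + 0.5 * f[-1])

R10, R20 = range_nm(10.0), range_nm(20.0)
n = math.log(R20 / R10) / math.log(2.0)
print(f"R_S(10 keV) ~ {R10:.0f} nm, effective exponent n ~ {n:.2f}")
```

The effective exponent extracted between 10 and 20 keV comes out near 5/3, consistent with the high-energy limit of Equation (14).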
D. Heating and Damage of the Specimen

In introductions to papers dealing with low-energy microscopy, especially those concerning the instrumentation, we often meet formulations stating that low energies are advantageous because of the reduction in specimen radiation damage. However, this opinion is wrong in most instances, at least within the low-energy range usually meant in the statements cited. Although the slow electrons really deliver less energy per impact, their interaction volume shrinks strongly and the spatial density of energy dissipation even increases. If the primary beam spot is considered stationary and its interaction volume hemispherical with radius R/2, the temperature increase at the illuminated point amounts to (Reimer, 1998)

$$\Delta T = \frac{3\,d\,UI}{2\pi c R} \qquad (15)$$
where U and I are the accelerating voltage and beam current, respectively, d is the portion of the incoming power that is dissipated in the specimen, and c [J s⁻¹ m⁻¹ K⁻¹] is the thermal conductivity of the target. Obviously, the heating is proportional to E/R and because of Equation (14) ΔT increases at low energies. This heating is naturally higher for any noncompact material; e.g., for fibers it increases roughly in the ratio of their length to diameter (Reimer, 1998), and cannot be significantly suppressed by surface metalizing because of the insufficient cross-section of the metal layer. When considering the electron probe scanning an island of area S_A of a layer of thickness l ≥ R/2 from a low-thermal-conductivity (e.g., organic) material deposited onto a metal surface, we can derive the simple relation for its heating

$$\Delta T = \frac{d\,jU\,(l - R/4)}{c} \qquad (16)$$
with the illumination current density j = I/S_A. Now we get ΔT decreasing at low energies, but when S_A denotes the size of the field of view on a larger area of this organic material, the lateral heat escape can again reverse the energy dependence of the heating, according to the relation between S_A and R². In the low-energy range, the energies of the emitted SE and BSE are still so different that in spite of the SE yield δ being higher than the BSE yield η, the energy output mediated by SE can be neglected. Thus, d ≅ 1 and it remains nearly constant over the low-energy range. Further, because δ ∝ E^−0.8 (Drescher et al., 1970) at energies sufficiently above the maximum of δ, i.e., above, say, 2 keV (see the next section), we can operate with the primary current decreasing in the same ratio with the decreasing energy. Taking this into account, we find (even in Equation (15)) that the specimen heating slowly decreases at low energies down to about 2 keV, where the increase starts again. The temperature distribution around the moving electron probe was studied by Kohl et al. (1981). For common materials, particularly metals, semiconductors, and even insulators with sufficient thermal conductivity, the temperature increase remains far below 1 K. Nevertheless, some materials such as foams and gels have very low thermal conductivity and their heating might be critical. These specimens were studied by, for example, Brown and Swift (1974), Berry (1988) and Price and McCarthy (1988). Mostly low-energy modes are recommended for sensitive specimens. The direct radiation damage of the specimen material consists first of all in the breaking of chemical bonds, decomposition of molecules, and possibly the release of gaseous components. Because it is generally ionization phenomena that are in question here, the energy dependence can be assessed according to the stopping power S. According to Equation (12), S ∝ ln(E/J)/E, i.e., approximately S ∝ E^−0.8.
This is an even steeper slope than that of E^−1/3, which results from the simplest assumption of the beam energy tIE/e (with t the time) distributed homogeneously into the depth measured by R ∝ E^{4/3}. Hence the spatial density of the radiation damage events increases throughout the low-energy range. For organic molecules the radiation damage mechanisms are diverse and it is not possible to survey them here. Let us briefly mention semiconductors. Incident electrons generate, for example in silicon, electron–hole pairs, and holes can be trapped in the SiO2 layer where they have much lower mobility than electrons. Hence the density of surface states at the SiO2–Si interface increases, which in turn increases the rate of surface recombination, and the generated space charge might even cause layer inversion. For example, in MOS (metal–oxide–semiconductor) structures, the electron bombardment can induce changes in the threshold
voltage, gain, and dark current, i.e., in all the important parameters of the device. Some of these effects disappear only after long heating to temperatures above 250 °C. Nevertheless, significant suppression of these influences has been proved for electron energies below 1 keV, which are also used in IC (integrated circuit) testers. A special kind of radiation damage consists in breaking the carbon bonds to oxygen, hydrogen, and other atoms in hydrocarbon molecules. Owing to their high sticking coefficients, these molecules are always present on the inner walls of vacuum vessels including the specimen surface, unless this has been prepared or cleaned in situ. The carbon atoms then close double bonds and create a "polymerized" graphitic layer, the thickness of which progressively grows owing to the easy diffusion of additional hydrocarbon molecules from the nonilluminated neighborhood of the field of view. Consequently, dark rectangles with darker frames indicate the areas of previous observation; these effects were studied by, for example, Fourie (1976, 1979, 1981) and Reimer and Wächter (1978). At low current densities, the contamination thickness is proportional to the dissipated energy, i.e., to the stopping power (which increases down to the boundary of the very-low-energy range). When all the molecules that have diffused into the illuminated area within a time interval are cracked in that time, the contamination rate saturates and the contaminant accumulation becomes linear in time. As indicated above, the carbon contamination rate increases with decreasing E; according to experience, the situation around 100 to 200 eV is the most critical. Let us underline that, within a certain energy range of a few hundreds of eV, this phenomenon is the main obstacle to the routine taking of micrographs (see, e.g., Figure 65).
Fortunately, from the beginning of the very-low-energy range, this trend is reversed, elastic phenomena start to dominate, and the radiation damage disappears.
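For orientation, Equation (15) can be evaluated for a stationary spot; the beam parameters and thermal conductivities below are assumed round values, chosen to contrast a metal with a poorly conducting material as in the text.

```python
import math

# Stationary-spot heating from Equation (15): dT = 3*d*U*I/(2*pi*c*R).
def delta_T(U_volt, I_amp, R_m, c_thermal, d=1.0):
    """Temperature rise in K; c_thermal in W/(m K), R in m."""
    return 3.0 * d * U_volt * I_amp / (2.0 * math.pi * c_thermal * R_m)

# Assumed beam: 1 keV, 100 pA, electron range R ~ 30 nm.
U, I, R = 1.0e3, 1.0e-10, 3.0e-8
print(f"copper  (c = 400 W/m/K): dT ~ {delta_T(U, I, R, 400.0):.1e} K")
print(f"aerogel (c = 0.1 W/m/K): dT ~ {delta_T(U, I, R, 0.1):.1f} K")
```

A good conductor stays millikelvin-cool while a foam-like material with c ~ 0.1 W m⁻¹ K⁻¹ heats by several kelvin, matching the statement that only low-conductivity specimens are at risk.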
E. Specimen Charging

As already mentioned above, one important reason for using low energies in SEM is to gain an advantage in the perpetual fight of SEM microscopists against specimen charging (see, e.g., Pfefferkorn et al., 1972; Welter and McKee, 1972; Morin et al., 1976). In the next section we will see in more detail that in general the amount of charge emitted from the specimen differs significantly from the incoming charge. The difference is dissipated in the target and when this has a low conductivity, the absorbed current is not carried away from the illuminated area efficiently enough and significant charge density gradients arise. Furthermore, except for glasses and other
FIGURE 10. Typical energy dependence of the total electron emission σ for specimen tilt angles θ₃ > θ₂ > θ₁ indicating the development of charging processes (see text for details).
homogeneous materials, the nonconductors and particularly specimens from the area of the life sciences are as a rule of a heterogeneous and anisotropic nature. Thus, the specimen charging is usually also inhomogeneous, with the result that electric fields that vary strongly both in space and time are created above the surface. These fields destroy the micrograph geometry by deflecting and defocusing the primary beam, and also affect the brightness distribution by influencing the signal electron trajectories toward the detector. Qualitatively the issue can be comprehended from Figure 10. For every specimen, the total electron yield, σ = η + δ, exhibits a maximum, which for the great majority of elements and compounds and for all nonconductors exceeds the value 1.0. When progressing from the conventional SEM energies around 15 keV downwards, the σ(E) curve rises and crosses the unit level at the critical energy E_C2. This is the optimum energy for no charging and we will discuss ways of employing it for practical microscopy in Section VIII. Further, σ(E) reaches its maximum at some energy E_m0 that more or less coincides with the maximum of the SE emission at E_m (see the next section) and then descends, crosses the unit level again at E_C1, and enters the range where no general curve can be drawn owing to the very diverse behavior of the BSE emission. Finally, σ(E) → 1 when approaching the mirror microscopy range at and below zero impact energy. The differences in the curves labeled with angles θ_i express the angular dependence of the emission, which here means the influence of the specimen tilt and also of local inclinations corresponding to the surface relief.
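The behavior of Figure 10 can be mimicked with any total-yield model. The sketch below assumes a simple "universal" form σ(E) = σ_m (E/E_m) exp[2(1 − √(E/E_m))], which is a common textbook approximation and not taken from this chapter, and locates the upper critical energy E_C2 by bisection on the falling branch.

```python
import math

# Assumed universal total-yield curve with peak SIGMA_M at energy E_M;
# both parameter values are illustrative, not measured data.
SIGMA_M, E_M = 2.0, 0.4        # peak yield 2.0 at 0.4 keV (assumed)

def sigma(E_keV):
    u = E_keV / E_M
    return SIGMA_M * u * math.exp(2.0 * (1.0 - math.sqrt(u)))

# Bisection for the crossing sigma = 1 on the falling branch (E > E_M):
lo, hi = E_M, 50.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if sigma(mid) > 1.0 else (lo, mid)
E_C2 = 0.5 * (lo + hi)
print(f"E_C2 ~ {E_C2:.2f} keV")  # negative charging above, positive below
```

For a floating (high leakage resistance) surface bombarded at E₀ > E_C2, iterating the landing energy E = E₀ − eU_S/e against σ(E) drives E toward exactly this crossing, which is the equilibrium point A₁ of the text.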
MÜLLEROVÁ AND FRANK
Suppose now that the primary beam is incident at energy E0 > EC2 on a poorly conducting surface that exhibits a finite leakage resistance RG between the illuminated point and ground; this is a measure of the ability of the specimen to carry the incoming charge away. We can characterize it by a straight line of slope (eRGIP)−1, with IP as the specimen current, which corresponds to a positive surface potential formed on RG. The potential drop across RG partially compensates the negative potential of the accumulated surface charge. The residual local potential decelerates the incoming beam so that its landing energy decreases, causing σ(E) to increase. This iteration continues until an equilibrium is reached at the point A, at the final landing energy E, where the incident and leakage currents are equal, leaving some net surface potential US < 0. Obviously, for RG → 0 no charging occurs, the leakage line is vertical, and E = E0, while for RG → ∞ the line is horizontal, E = EC2 at the equilibrium point A1, and maximum charging takes place. The charging process should be identical for E0 < EC2, only with a positive sign of US, but an important difference is that the positively charged surface reattracts the slowest secondary electrons, so that instead of the "working point" moving along the original curve σ(E), this curve is modified toward lower effective SE emission. This results in a reduced positive surface potential U′S. Let us underline that here we discuss exclusively the above-surface electric fields, which are those influencing the image acquisition. Thus, we consider the specimen illuminated by the given beam current within a certain energy range. Quite a different issue is to interpret the internal charging phenomena and alterations arising inside the specimen due to electron beam bombardment.
These questions have been thoroughly studied by Cazaux and collaborators (see, e.g., Cazaux, 1986, 1996a,b, 1999; Cazaux and Le Gressus, 1991; Cazaux et al., 1991; Cazaux and Lehuede, 1992; Le Gressus et al., 1990). They studied physical phenomena connected with electron bombardment of insulators, electron trapping, internal electric fields, electromigration of ions due to these fields, etc.; when external fields are treated in these works, they are considered as a consequence of internal charge distributions. We will restrict ourselves to a phenomenological assessment of the net surface charge, based on its influence on the σ(E) curve. Thus, we will regard the charging as the creation of a thin charged plate of the same size as the field of view, situated just at the specimen/vacuum interface. Otherwise, even a simple physical model has to adopt a positively charged below-surface layer from which SE are emitted and a deeper negatively charged region where PE are trapped. Between EC1 and EC2, a third layer appears closest to the surface, in which the reattracted SE generate
some negative charge again. The above-surface field is then dependent on the total charge distribution. Furthermore, the σ(E) curve cannot be understood as a static property of the specimen since it exhibits a dynamical behavior, the specimen emission yields being affected by the penetrating charged species. We now look at the dynamics of the charging process based on our phenomenological model; these data will be needed for one important SLEEM application described below. From the σ(E) curve in Figure 10, we easily deduce

σ(E) − 1 = (E − E0) / (eRGIP).    (17)

Let us assume for simplicity that the charging process is characterized by only one time constant τC, so that the accumulated charge develops as Q = Qmax[1 − exp(−t/τC)]. This time constant will now be determined:

τC = Qmax / (dQ/dt)t=0.    (18)

We consider the charged field of view to be a thin circular disc of diameter a and charge density q, situated in a medium of permittivity ε. From the Coulomb law and the principle of superposition, we get the disc potential as US = qa/2ε. Figure 10 gives eUS = E − E0, so that

Qmax ≈ qa² = 2εaUS = (2εa/e)(E − E0).    (19)
At t = 0 the dissipated part of the beam current is [σ(E0) − 1]IP = dQ(0)/dt (< 0 for negative charging). In this equality, we substitute for σ(E0) the first two terms of the Taylor expansion around E and then substitute for σ(E) from Equation (17). Finally we obtain

(dQ/dt)t=0 ≈ [1/(eRG) − (dσ/dE)IP](E − E0).    (20)

The behavior of σ(E) for energies above the critical energy can be estimated, according to the relations η ≈ const and δ ∝ E−0.8 (Reimer et al., 1992), as σ(E) ≈ η + (1 − η)(E/EC2)−0.8, which enables us to take the derivative in Equation (20). Substituting now from Equations (19) and (20) into Equation (18), we find for large RG the final result

τC ≈ 2.5 εaEC2 / [eIP(1 − η)].    (21)
In order to get some quantitative figure, let us use the values η = 0.2, εr = 4, EC2 = 2000 eV, IP = 1 nA as an example. Then the time constant τC varies between 200 μs and 20 ms as the size of the field of view changes from 1 to 100 μm. Practical experience confirms that some charging is nearly instantaneous and might correspond to this range of τC, but afterwards further changes in the image are usually seen for seconds or even longer. We stress that the above calculation differs from the approach often met in the literature (see, e.g., Shaffner and Van Veld, 1971; Welter and McKee, 1972; Cazaux, 1986), which considers the charging dynamics to be identical with the charging of a plate capacitor situated between the specimen surface and a metallic holder; then τC = ρε. At high resistivity ρ, this time constant can be very long, e.g., about 3300 s for SiO2. Nevertheless, we recall that we aimed at getting a time constant for the progress of the image destruction effects caused by the above-surface field, whereas the capacitor field is entirely closed between its plates and hence restricted to the inside of the specimen. A weak point of our approach is that the specimen current is considered to be of an ohmic nature, which might not always be realistic (see Cazaux et al., 1991). The crucial quantity in the above considerations is the critical energy EC2. This quantity can be easily measured on conductors (via the absorbed current) but the opposite holds for the nonconductors that are of interest here; Reimer et al. (1992) summarized possible methods for this case. While in Figure 1 we have the values of EC2 for some conductors, Joy (1989) published values for a choice of technologically important inorganic insulators between 550 and 3000 eV, and for a selection of polymers he found EC2 from 0.4 to 1.8 keV. The dependence of EC2 on the tilt angle θ, governing the scatter of these properties at rough specimens, can be estimated as EC2(θ) = EC2(0) sec²θ (e.g., Joy, 1989). This relation suggests that EC2(60°)/EC2(0) = 4, but detailed studies showed a less steep angular dependence: this ratio was found to be only around 2, and even smaller for insulators (Reimer et al., 1992). In particular, the angular dependence of EC2 should weaken in the low-energy range, together with the same trend regarding δ.
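The equilibrium landing energy defined by Equation (17) and the time constant of Equation (21) can be evaluated numerically. The sketch below is a rough illustration, not code from the original text; the σ(E) model and all numbers are the example values quoted above (η = 0.2, εr = 4, EC2 = 2000 eV, IP = 1 nA). It finds the equilibrium point by bisection between EC2 and E0:

```python
import math

E_C2 = 2000.0     # critical energy [eV] (example value from the text)
ETA = 0.2         # backscattering coefficient eta (illustrative)
EPS0 = 8.854e-12  # vacuum permittivity [F/m]

def sigma(E):
    """Total yield above E_C2: sigma(E) = eta + (1 - eta)*(E/E_C2)**-0.8."""
    return ETA + (1.0 - ETA) * (E / E_C2) ** -0.8

def landing_energy(E0, R_G, I_P, iters=100):
    """Equilibrium landing energy E [eV] from Eq. (17),
    sigma(E) - 1 = (E - E0)/(e*R_G*I_P); with energies in eV the
    right-hand side becomes (E - E0)/(R_G*I_P), R_G*I_P in volts.
    f is positive at E_C2 and negative at E0, so bisection applies."""
    U = R_G * I_P  # leakage voltage scale [V]
    f = lambda E: (sigma(E) - 1.0) - (E - E0) / U
    lo, hi = E_C2, E0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid  # root lies above mid (f decreases with E)
        else:
            hi = mid
    return 0.5 * (lo + hi)

def tau_C(a, I_P, eps_r=4.0):
    """Charging time constant of Eq. (21),
    tau_C ~ 2.5*eps*a*E_C2/(e*I_P*(1 - eta));
    expressing E_C2 in eV cancels the electron charge e."""
    return 2.5 * eps_r * EPS0 * a * E_C2 / (I_P * (1.0 - ETA))

# Example: E0 = 5 keV, R_G = 1e12 ohm, I_P = 1 nA, 1 um field of view
E_eq = landing_energy(5000.0, 1e12, 1e-9)
tc = tau_C(1e-6, 1e-9)
```

For these example numbers the landing energy settles between EC2 and E0, and τC for a 1 μm field of view comes out on the order of a few hundred microseconds.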
F. Tools for Simulation of Electron Scattering

Phenomena connected with electron scattering inside the target, particularly in the case of the multiple scattering met in SEM, are too complex for it to be possible to solve the "direct task," i.e., to reconstruct
the specimen from the SEM image or an image series, as can be done in certain circumstances in the TEM. Thus, efforts were oriented toward simulating the image of a fictitious specimen described by its scattering cross-sections. Two main approaches can be found in the literature: the transport equation approach and the Monte Carlo (MC) procedure. The transport equation is the equation for the phase-space density of electrons, n(r, v, t), dependent on the position vector r, velocity vector v, and time t. All scattering phenomena that influence local values of n(r, v, t), including the generation of the incident beam, can be expressed by integrals containing the corresponding probabilities for movement within phase space (e.g., the probability of scattering from v to v′, for both inelastic and elastic scattering, which differ by whether |v| = |v′| or not), taken over the rest of the phase space. If a stationary case is considered, the terms increasing and decreasing the local density n compensate mutually, composing an integral equation for n(r, v). In order to render the equation solvable, some simplifying assumptions are usually made, such as restriction to semi-infinite, amorphous, and in-depth homogeneous specimens with ideally flat surfaces, to multiple scattering corresponding to the Poisson stochastic process, the inelastic scattering being described by the mean free path, etc.; these assumptions enable one to use the Boltzmann-type classical transport equation (see, e.g., Werner, 1996). The simplest solutions then assume that the scattering is restricted to small angles. Program packages utilizing the MC algorithm also rely on the simplifying assumptions mentioned above, except those regarding the homogeneity, semi-infinity, and flatness of the specimen. Otherwise, similar information about the specimen is needed: the differential elastic and inelastic cross-sections or their equivalents, the differential inverse mean free paths.
Let us briefly summarize the MC procedure for tracing the scattering of one electron (see, e.g., Ding and Shimizu, 1996). The basic concept is the normalized accumulation function A(x) for some probability distribution function P(x) of a physical phenomenon with one parameter x:

A(x) = ∫_{xmin}^{x} P(x′) dx′ / ∫_{xmin}^{xmax} P(x′) dx′.    (22)
Obviously A(x) ∈ (0, 1), and when uniformly distributed random numbers R are taken within this interval and x is calculated from A(x) = R, then after many attempts the assembly of x values obeys P(x). The second basic concept is the rule for deciding which member of a set of n possible phenomena will take place when one must definitely occur. The rule is that the ith phenomenon occurs if

Σ_{j=1}^{i−1} pj / Σ_{j=1}^{n} pj < R < Σ_{j=1}^{i} pj / Σ_{j=1}^{n} pj    (23)
where pj is the probability of the jth alternative. Again, after many attempts the total occurrence of a particular phenomenon corresponds to its probability. Now we can apply Equation (22) to the free path s of the electron, provided the scattering probability distribution is that of the Poisson process, i.e., P(s) = (1/λT) exp(−s/λT) with 1/λT = 1/λel + 1/λin as the total inverse mean free path; we simply get s = −λT ln R. Having determined the free-path section, we use another random number to decide what collision takes place by means of Equation (23), with p1 = λT/λel and p2 = λT/λin. When simulating the scattering in a compound of m different atoms, the elastic scattering phenomenon can be ascribed to the jth atom according to Equation (23) again, now with pj = Cja/λjel, Cja being the atomic fraction. The same procedure can be followed for the inelastic collisions only in alloy-like compounds where the scattering cross-sections can be summed. Otherwise, compound-specific data for 1/λin should be acquired. The scattering angle due to an elastic collision is also calculated from Equation (22) with A(x) = R, provided P(x) is replaced by dσel/dΩ; the scattering angle and energy loss for an inelastic event can likewise be found by using d²λin−1/dΩ dW and dλin−1/dW, respectively, as the probability P(x). When simulating sophisticated processes such as the formation of an angular resolved energy spectrum of electrons with characteristic energies, in which case only very few of the incident electrons create the relevant signal species, simulation of reversed trajectories can also be used (Gries and Werner, 1990), starting at the detector and finishing at the first inelastic collision. Ding and Shimizu (1996) presented MC modeling of SE generation including cascading processes, which produced energy spectra of SE + BSE emission that fitted the experimental data very well.
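The sampling rules of Equations (22) and (23) are simple to implement. The sketch below is an illustrative reimplementation, not code from any of the cited programs: it draws a free path from the Poisson distribution, selects the event type from a list of probabilities, and inverts a tabulated distribution numerically.

```python
import bisect
import math
import random

def sample_free_path(lam_el, lam_in, rng=random):
    """Free path for P(s) = (1/lam_T)*exp(-s/lam_T):
    A(s) = 1 - exp(-s/lam_T) = R inverts to s = -lam_T*ln(1 - R),
    statistically identical to s = -lam_T*ln(R)."""
    lam_T = 1.0 / (1.0 / lam_el + 1.0 / lam_in)
    return -lam_T * math.log(1.0 - rng.random())

def choose_event(probabilities, rng=random):
    """Eq. (23): the i-th event occurs when R falls between the
    normalized cumulative sums up to i-1 and up to i."""
    total = sum(probabilities)
    R = rng.random()
    acc = 0.0
    for i, p in enumerate(probabilities):
        acc += p / total
        if R < acc:
            return i
    return len(probabilities) - 1  # guard against rounding at R ~ 1

def sample_from_table(x_grid, P_vals, rng=random):
    """Eq. (22) for a tabulated P(x) (e.g., d(sigma_el)/d(Omega)):
    build the accumulation function by the trapezoidal rule,
    then solve A(x) = R with linear interpolation."""
    A = [0.0]
    for k in range(1, len(x_grid)):
        A.append(A[-1] + 0.5 * (P_vals[k] + P_vals[k - 1])
                 * (x_grid[k] - x_grid[k - 1]))
    A = [a / A[-1] for a in A]
    R = rng.random()
    k = bisect.bisect_left(A, R)
    if k == 0:
        return x_grid[0]
    t = (R - A[k - 1]) / ((A[k] - A[k - 1]) or 1.0)
    return x_grid[k - 1] + t * (x_grid[k] - x_grid[k - 1])
```

After many samples, the free paths average to λT and each event occurs with a frequency matching its normalized probability, which is exactly the property the text relies on.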
A detailed study of SE emission from SiO2 by MC simulations was performed by Schreiber and Fitting (2002); these data directly relate to the SEM application. Obviously, both transport equation and MC algorithms can be applied on various levels of the physical model, depending on what data for the mean free paths are utilized. For the inelastic scattering, more recent works use the dielectric functions, usually compiled from measured optical data and some dispersion relation; sometimes a combination of discrete events and a continuous slowing-down contribution is used, or even no discrete events
are considered. For the elastic scattering, all the choices from classical Rutherford cross-sections up to the Mott ones can be found, including the scattering on phonons, as described in previous sections of this chapter. Generally the MC algorithm is capable of working down to very low energies, provided it is completed with adequate scattering data; only coherent scattering phenomena and effects connected with the energy band structure are excluded. When in the very-low-energy range the scattered electrons are taken as the Bloch electrons within the energy band structure of the target, the electron trajectory, needed for the MC procedure, has to be extracted from E(k), the tensor of the reciprocal effective mass, etc. Then the electron movement in an electric field F is first considered in the reciprocal space as dk/dt = (2πe/h)F, and transition into the real space is made via the real electron velocity (Schreiber and Fitting, 2002)

v(k) = (2π/h) gradk E = (h/2π) k / [m(1 + 2αE(k))]    (24)

with α as the nonparabolicity parameter of the energy band. The MC methods were developed many years ago by Joy (see Joy, 1995) and his programs are widely used. The work of Ding and Shimizu (e.g., Ding and Shimizu, 1996) is important here, and simulations carried down to even fractions of eV were made by Fitting and co-workers (e.g., Kuhr and Fitting, 1999; Fitting et al., 2001). The MOCASIM program introduced by Reimer (1996) is also popular. Although the MC approach seems to be much more flexible and universal and enables one to model directly various phenomenological quantities, the necessary number of simulated trajectories grows enormously when any spectrum-like data need to be modeled and the computation time exceeds reasonable limits. Specific solutions to the transport equation are then sought, as, for example, in simulation of the energy spectra of BSE (Reimer et al., 1991).
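Writing α for the nonparabolicity parameter of Equation (24), the parabolic limit α = 0 reduces the velocity to the free-electron value, which provides a quick sanity check. The following is our own minimal sketch (SI constants, α in eV⁻¹), not code from the cited simulations:

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant [J*s]
M_E = 9.1093837e-31         # free electron mass [kg]
E_CHARGE = 1.602176634e-19  # [J per eV]

def bloch_velocity(k, E_eV, alpha=0.0, m=M_E):
    """Eq. (24): v(k) = hbar*k / (m*(1 + 2*alpha*E)) for a band with
    nonparabolicity parameter alpha (alpha = 0 gives the parabolic band).
    k in 1/m, E in eV, alpha in 1/eV."""
    return HBAR * k / (m * (1.0 + 2.0 * alpha * E_eV))

def free_electron_k(E_eV, m=M_E):
    """Wave number of a free electron: k = sqrt(2*m*E)/hbar."""
    return math.sqrt(2.0 * m * E_eV * E_CHARGE) / HBAR
```

With α = 0 and E = 1 eV this returns the familiar v = √(2E/m) of about 5.9 × 10⁵ m/s; a positive α flattens the band and reduces the velocity at the same k.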
IV. EMISSION OF ELECTRONS
The previous section provided fundamentals enabling one to comprehend the behavior of the electron emission excited by the impact of primary electrons. We now deal with individual contributions to the total energy spectrum of emitted electrons, shown schematically in Figure 11. This is composed of two main components, SE and BSE emission, which together form the background for electrons with characteristic energies that convey
FIGURE 11. Typical energy spectrum of electrons emitted under the electron beam impact (for explanation of symbols, see text).
spectroscopic information. Secondary electron emission and electron backscattering are quite different phenomena: the first consists of a stream of particles released from atoms by impact ionization, the most important item within the inelastic scattering, while the latter comprises reflected primary electrons that have undergone some elastic scattering events. Nevertheless, owing to the indistinguishability of electrons, there is no way of separating these groups, except according to some statistical quantities connected with their motion after emission, i.e., with their velocity. Even when considering in simulations that the faster electron leaving the inelastic collision connected with ionization is the scattered particle and the slower is the ionization product, we do not arrive at the correct results because of the existence of processes in which the particles exchange their energies (Fitting et al., 2001). By definition, the separation between SE and BSE emission is situated at the threshold emission energy Et = 50 eV, and it is believed that the tails of both distributions, extended beyond Et, compensate each other. This can be the case for high primary energies but, in the low-energy range, the compensation is far from complete, as Fitting et al. (2001) showed by simulating both SE and BSE emissions from Au (see Figure 12). Obviously, below 400 eV the contribution δBE of slow BSE below 50 eV strongly exceeds the contribution ηSE of fast SE above 50 eV, while for EP > 400 eV the opposite but weaker imbalance exists. This indicates that measurements of both yields, with respect to the 50 eV convention, overestimate δ below
FIGURE 12. Contribution ηSE of fast SE (E > 50 eV) to the BSE yield η, together with contribution δBE of slow BSE (E < 50 eV) to the SE yield δ, as calculated for Au. (Reprinted with permission from Fitting et al., 2001.)

a certain energy and underestimate it above the same value, while the opposite holds for η. Nevertheless, any more correct distinction between the two emissions is possible only in simulations and can hardly be made in experiment, so from now on we continue to accept the 50 eV threshold. Naturally, when approaching the very-low-energy range, i.e., somewhere below 100–200 eV, no separation is possible and the total yield has to be considered. In this section we will also deal with phenomenological features of the emissions such as their yields, energy and angular distributions, information depths, and (for eBSE) the coherence.
A. Electron Backscattering

Emission of the backscattered electrons consists of a fraction of electrons reflected without any significant energy loss, the eBSE emission, while the rest, down to 50 eV, are electrons with various energy losses. The eBSE peak is a potential source of information about the probability of incoherent elastic scattering, and hence about relations between elastic and inelastic scattering (Gergely, 1986). To acquire this information, it is necessary to measure the eBSE emission yield ηel with respect to the primary current, namely within the maximum possible
range of emission angles. Measurement on a thin-film-covered substrate under variation of the film thickness is particularly fruitful. The intensity of the eBSE peak generally increases with increasing atomic number and decreasing energy, as does the scattering cross-section σel. Nevertheless, these monotonic dependences break down below about 1500 eV, where the σel(E) curves for different Z start to cross each other (Schmid et al., 1983). Owing to multiple scattering inside bulk specimens in SEM, the discrete peaks connected with ionization losses are usually washed out to a smooth distribution curve of BSE, modulated with low peaks due to Auger electrons (AE) and to plasmon losses. Only under special circumstances can some very weak ionization loss peaks be observed as, for example, the oxygen ionization peak for a specimen covered with a thin oxide layer; the height of this peak can be up to about 10−3 of the eBSE peak (Gergely et al., 1986). The plasmon peaks can normally be observed in electron spectra taken with an Auger electron microprobe (AEM) and used for analytical purposes as in the transmission mode in EELS. The risk of carbon contamination makes it necessary to examine discrete features in the BSE spectrum solely under UHV conditions. Although the UHV-conditioned modes are not excluded from the scope of this text, we do not consider here instruments equipped with analyzers for signal discrimination according to energy. Hence we will not further discuss the reflection EELS (REELS) method, because within the total detected BSE signal the contribution from discrete peaks is negligible. The overall shape of the BSE spectrum exhibits a very broad maximum at EmBSE (see Figure 11). It represents the most probable energy of the emitted BSE and, in addition to the scattering properties of the specimen, it depends also on the experiment geometry, i.e., on the impact and emission angles.
For a tilted specimen and high emission angles (taken from the surface normal), this maximum moves toward EP (Bauer, 1979), and the same holds for higher atomic numbers (Kulenkampff and Spyra, 1954). Reimer et al. (1991) simulated the energy spectra of fast electron backscattering into the full half-space for layered structures and demonstrated that the position and height of the spectral maximum change sensitively with the thickness and material of the overlayer relative to the substrate. Frank (1992b) presented experimental data for similar layered structures, taken in the low-energy range, together with a model interpreting the position of the BSE spectral maximum in terms of the depth of the film/substrate interface and treating the height of this maximum as proportional to the rate of quasi-elastic backscattering on the interface.
Examination of the BSE spectral maximum also requires using an energy analyzer and hence goes beyond the scope of this review, but the mere existence of this feature is important for SEM because it defines the average energy of the BSE signal species, which is crucial for their detection. Nevertheless, even for homogeneous specimens and normal impact, the mean BSE energy varies within a broad range, and is also affected by the acceptance angle of the detector. Most of the available data are taken with the cylindrical mirror analyzer (CMA) of electron energies, in which the input beam is limited between two cones with a mean emission angle of about 42°. In this case one gets EmBSE/EP = 0.83 for Al and 0.87 for Si, but for Cu no maximum develops (Frank, 1992a). Obviously, the energy distribution of BSE should be modeled specifically for a particular detector geometry, and the mean energy of BSE can move anywhere above about 0.6EP. A crucial parameter is the total yield η of BSE. With the normal SEM, it is expected that η will be nearly independent of the energy of the incident electrons and will grow monotonically with the mean atomic number of the specimen, giving the broadly used material contrast in the image. Values of η as well as its energy dependence are available from many sources, but we will abandon citing individual experimental studies of the yields because near-complete experimental data from the literature have been collected by Joy (2001) into a database in which data for the low-energy range can also be found. Nevertheless, the scatter in the published data is significant; e.g., for Al at 1 keV, six values of η can be found in this database, spanning the interval between 0.134 and 0.2346. It is generally assumed that only the SE yield has to be measured on clean surfaces because of the sensitivity of this parameter toward surface contamination. Nevertheless, large variations in the measured BSE yields can also be explained only by the surface status.
In order to confirm this opinion, a targeted study was made, consisting of careful measurement of SE and BSE yields at low energies for 24 conductive elements, both as-inserted into a UHV apparatus and after in situ cleaning by ions. The yields were measured using a device based on a principle published by Reimer and Tollkamp (1980). Here we will quote data from this study (Zadražil and El-Gomati, 2002), which have been only partially published hitherto (Zadražil et al., 1997; Zadražil and El-Gomati, 1998a,b). In Figure 13 the data are shown for in situ cleaned specimens throughout the low-energy range down to 250 eV. Obviously, the η(E) curves for various Z do not tend to one point at the low end of the plot, as was shown in some older published results, but they do so in the dataset for the as-inserted specimens; in Figure 14 we compare the "clean" and "unclean"
FIGURE 13. The BSE yields measured in UHV under normal incidence of primary electrons onto targets in situ cleaned by an ion beam; values at E = 5 keV correspond to the atomic numbers (top to bottom) 79, 78, 82, 73, 72, 74, 64, 50, 47, 41, 40, 30, 42, 29, 48, 32, 28, 24, 26, 23, 22, 14, 13, and 6 (data provided by Zadražil and El-Gomati, 2002).
FIGURE 14. Comparison of the BSE yields for normal impact, measured under UHV conditions for as-inserted (- - - -) and in situ cleaned (——) specimens; data provided by Zadražil and El-Gomati (2002).
data for several elements. It is usual to explain the observations as a consequence of graphitic contamination and oxide layers on as-inserted specimens, which become less transparent at low energies and increasingly contribute to the BSE signal. Then, because of the presence of "standard"
contaminations, very similar data can be obtained for different specimens, at least within groups of similar reactivity. Three of these pairs of measurements were studied in detail by simulation of the BSE yields from those specimens, both clean and covered by a contamination layer of a probable composition and thickness (Frank et al., 2000b). It was found that the observed differences in η due to cleaning can be explained, for example, by the presence of a 3 nm layer of Al2O3·3H2O on Al or 7 nm of carbon on Au. The question of the information depth D of BSE emission was also addressed. At first glance, we can expect a value similar to half the penetration depth, i.e., D ≈ Rx/2. Simulation showed that this is approximately so at 1 keV, while at 3 keV the information depth is 2 to 4 times smaller than Rx/2 (Frank et al., 2000b); Joy and Joy (1996) presented this depth as approximately equal to 0.2RS. These conclusions indicate that, when entering the low-energy range, we have to consider the BSE yield as another surface-sensitive quantity, and when interpreting the SEM image we should take into account data corresponding to the vacuum conditions and surface treatment used. In Figure 13 we notice that, just below 5 keV, the η(E) curves for various Z start to cross each other, which means that the material contrast, i.e., the monotonic η(Z) dependence at constant E, disappears here and cannot be reliably used for interpretation of micrographs in the low-energy range. This represents an additional threshold separating the low-energy range. In the SLEEM instrument the above-surface electric field has to be kept as homogeneous as possible, which restricts the specimen tilt to within narrow limits. Even so, nonnormal electron impact can occur because of relative enhancement of the radial velocity by deceleration.
The BSE yield grows with the specimen tilt so that, according to MC simulations, for example at 1 keV the ratio η(80°)/η(0°) reaches 2.9 for Al, 2.3 for Cu, and 1.8 for Au (Böngeler et al., 1993). The shape of η(θ) does not noticeably vary with the energy of the electron impact. As regards the angular distribution of BSE, at high energies it is circular in the polar diagram, i.e., η(θ) = η(0) cos θ. At energies as low as 1 keV, the same shape of the distribution was simulated for Al, but for heavier elements (Cu, Ag, Au) the distribution is more "pointed," i.e., it increases more steeply at small θ (Böngeler et al., 1993). The same change resulted from the MC simulations of Kuhr and Fitting (1999) at 100 eV, though the Au data appeared much nearer to the cosine law than those for Ag. The eBSE distribution was found to be not only strongly pointed toward the axis but also to exhibit anisotropy similar to that in Figure 3.
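The cosine angular distribution quoted above can be fed directly into the MC machinery of Equation (22). The sketch below is our illustration, not code from the cited simulations; it samples the polar emission angle for an emission density per solid angle proportional to cos θ:

```python
import math
import random

def sample_cosine_theta(rng=random):
    """For dN/dOmega ~ cos(theta) and dOmega = sin(theta) dtheta dphi,
    the polar-angle density is P(theta) ~ cos(theta)*sin(theta), whose
    accumulation function A(theta) = sin^2(theta) inverts (Eq. (22)) to
    theta = arcsin(sqrt(R))."""
    return math.asin(math.sqrt(rng.random()))
```

Averaged over many samples, the mean of cos θ tends to 2/3, and the azimuth is drawn uniformly on (0, 2π); the "pointed" low-energy distributions of the heavier elements would instead need a tabulated η(θ) and a numerical inversion of Equation (22).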
B. Crystallinity Effects

In the previous section we discussed the electron backscattering from a specimen behaving like a homogeneous and isotropic continuum. Experimentally, this corresponds to amorphous substances, but otherwise averaging has to be made over grains of polycrystals or different orientations of a single crystal. In practice, we usually observe details connected with the anisotropy of the scattering properties of crystals. However, electron crystallography is an independent discipline with sophisticated theory and a broad range of experimental data, so within this review we can only briefly touch on several key points. For very thin specimens and higher electron energies, the kinematic theory of diffraction can provide useful results, particularly as regards the geometry of the diffraction pattern from which the crystal structure and orientation can be determined; the basic relation is the Bragg equation

k − k0 = g = g1x1* + g2x2* + g3x3*    (25)
in which k0 and k are the wave-vectors of the incident and scattered wave, respectively, and xi* are the base vectors of the reciprocal lattice, orthonormal to the base vectors xi of the Bravais lattice of the crystal. This equation represents a condition for constructive interference of waves scattered by individual unit cells in the crystal. In SEM and at low energies, large-angle scattering and multiple scattering phenomena always occur together in the signal generation, so that the dynamical diffraction theory is needed. The problem consists in conversion of the incident plane wave into a wave field with the crystal periodicity. This is solved via the Schrödinger equation with periodic potential, with the solution developed into Bloch waves. The topic is again discussed in sufficient detail by Reimer (1998) and by authors cited therein. An important step is the development of the crystal potential into a Fourier series with complex coefficients Vg + iVg′, where g comprises all points of the reciprocal lattice. The imaginary coefficients Vg′ (with the dimension of energy) express the absorption of the electron waves, and V0 is the (mean) inner potential inside the crystal. The Bloch waves have the formal appearance

ψj = Σg aj(g) exp[2πi(kj + g)·r]    (26)

where aj(g) have the periodicity of the crystal potential. The index j should run over all points of the reciprocal lattice but in practice it is sufficient to
consider only the n beams for which the lattice points are near enough to the Ewald sphere. The incident wave then splits into n² partial waves forming n Bloch waves. On substituting Equation (26) into the Schrödinger equation, we get

[K² − (kj + g)²] aj(g) + (2m/h²) Σh≠0 Vh aj(g − h) = 0    (27)

where K = h−1[2m(E + V0)]1/2 is the length of the incident wave-vector inside the crystal. Solution of Equation (27) gives n values of kj and n² values of the amplitudes aj(g) for the incident wave K; naturally, at least two beams should be considered, of which that of g = 0 is the primary beam. It is quite obvious that the intensity of such a scattered wave is anisotropically distributed, which projects itself into anisotropy of the emitted signals, so that their cosine or similar monotonic distributions become modulated. The directions kB of enhanced intensity can be estimated by the Bragg equation, now in the form kB = K + g. Observable consequences of these phenomena include the dependence of η on the crystal orientation (i.e., on the specimen tilt or on the orientation of a crystal grain) and also some structure modulating the η(θ) distribution. Variations in η due to the orientation can be a source of grain contrast at polycrystals. In the two-beam approximation and assuming that the Bloch waves do not interfere, we get (according to Reimer, 1998) for the variation in the backscattering coefficient
Δηa/η0 = 2πD (ω + ξ0′/ξg′) / [ξg′(1 + ω²)]    (28)

where ξg′ is the absorption length, ξg′ = hv/2Vg′, and ω = s ξg is a dimensionless factor in which s = |s| characterizes the deviation from the exact Bragg position (the distance of the reciprocal lattice point from the Ewald sphere) and ξg = hv/2Vg is the extinction length. Further, D is the (already introduced) information depth of the backscattered signal, D = 1/NσB, with σB as the cross-section for scattering through angles greater than 90°. Because, at least down to hundreds of eV, D decreases with decreasing energy faster than ξg′, we find that, among other effects, the grain contrast increases at low energies. Modulation of η(θ) includes all the electron diffraction phenomena that have proved to be sources of extremely interesting image signals in LEEM. In the low-energy range, the Bragg angles θB = arcsin(λ/2dhkl) (with dhkl as the interplanar distance in the real crystal) are still less than 90° and no
MÜLLEROVÁ AND FRANK
regular diffraction pattern is formed by the backscattered waves. An important phenomenon, however, is the formation of EBSP (electron backscattering patterns), whereby diffused BSE, on their return toward the surface, diffract on sets of atomic planes and form Kikuchi bands and lines, which are in fact intersections of so-called Kossel cones with the observation plane. The Kossel cones comprise all the vector directions fulfilling the Bragg equation, so that one cone contains k₀ and the other k, both having g as the cone axis. In EBSP these features can be visible when the backscattered electron suffers only a single elastic scattering event before emission. Thus, a good contrast of EBSP can be achieved solely at high tilt angles, around 70°, and also at higher electron energies. Nevertheless, some modulation of η(θ) should be present even in the low-energy range but, to the authors' knowledge, no successful observation has been made yet. At energies of the order of hundreds of eV, the Bragg angles start to exceed 90° and true diffraction patterns can be formed. At the same time, the penetration depth shrinks, so that the electrons interact with a nearly two-dimensional lattice and the Bragg equation has to be fulfilled only for the vector components parallel to the surface. In other words, the reciprocal lattice, normally three-dimensional with lattice points of a size inversely proportional to the dimensions of the real crystal, is now filled with ``rods'' perpendicular to the surface. In fact the real situation is usually between the two marginal cases, so that the ``rods'' are modulated in ``thickness'' and the third-dimension condition still has to be considered. Formation of a LEED pattern with sharp diffraction spots requires the use of a beam aperture below 1 mrad, as in transmission microscopy. As will be shown below, in SEM the optimum beam aperture, tuned to ultimate resolution, can approach the 1 mrad level, but after deceleration to very low energy the aperture grows to tens of mrad.
Consequently, the diffraction spots extend into discs, as under CBED conditions in STEM. It is a matter of debate whether these mutually overlap, which would further enhance the signal. A position-sensitive multichannel detector situated above the specimen (or in a side position when a through-the-lens detection system, to which the signal electrons are deflected, is employed) enables one to observe local deviations from fulfillment of the diffraction condition. Even a single-channel integral detector will show as brighter those areas where, at the energy used, the Ewald sphere just crosses some reciprocal lattice point. However, owing to dynamical effects, additional maxima of the signal can appear. This imaging mode will be illustrated in Section VIII. Irrespective of the beam aperture, effects connected with channeling of the Bloch functions, such as the formation of Kikuchi lines, take place and are visible in the phenomena outlined above, because their geometry is connected with the crystal structure and not with the beam shape.
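As a numerical illustration of this geometry, the sketch below evaluates the electron wavelength in the nonrelativistic approximation, λ[nm] ≈ 1.226/√E[eV], and the first-order Bragg angle θ_B = arcsin(λ/2d). The interplanar distance d = 0.235 nm is an assumed, purely illustrative value, not one taken from the text:

```python
import math

def wavelength_nm(E_eV):
    """Nonrelativistic electron wavelength, lambda ~ 1.226/sqrt(E) nm (E in eV)."""
    return 1.226 / math.sqrt(E_eV)

def bragg_angle_deg(E_eV, d_nm, order=1):
    """Bragg angle theta_B = arcsin(n*lambda/2d); None when no reflection exists."""
    s = order * wavelength_nm(E_eV) / (2.0 * d_nm)
    return math.degrees(math.asin(s)) if s <= 1.0 else None

d = 0.235  # nm, assumed interplanar distance (illustrative only)
for E in (10000.0, 1000.0, 100.0, 20.0):
    print(E, wavelength_nm(E), bragg_angle_deg(E, d))
```

At conventional SEM energies the Bragg angles stay in the mrad-to-degree range; toward a few eV the arcsine argument exceeds unity for this plane set and no Bragg reflection is possible.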
SCANNING LEEM
C. Coherence within the Primary Beam Spot

A crucial condition for constructive interference of the waves scattered from individual atoms is coherence within the primary spot. This should be assessed in connection with the energy and angular spreads of the beam, which determine whether all contributions from the illuminated spot are summed in amplitude. As discussed by Buseck et al. (1988) for STEM, no amplitude addition between neighboring pixels can appear. The coherence conditions for a single static spot were discussed by Frank et al. (1999). The energy spread in the primary beam is mainly given by the type of electron gun and varies between 0.2 eV for a field-emission cathode at room temperature up to about 2 to 3 eV for thermoemission from tungsten, provided we neglect any additional spreads generated by e–e interactions in crossovers. According to Born and Wolf (1975), the coherence condition for the path difference s, i.e., for the size D_C of the coherently illuminated area, is

D_C = |s| ≤ ⟨λ⟩(2E/ΔE) ≅ 2.45 √E/ΔE   [nm; eV]   (29)
The initial source size, also connected with the type of emission used, gives the maximum diameter of the coherently illuminated diaphragm. If we tolerate a decrease to 88% in the complex degree of coherence from the center to the edge of the illuminated area, then a quasimonochromatic uniform source of angular radius α₀ = ρ/x (see Figure 15) illuminates ``nearly coherently'' a circle of a diameter 2r = 0.16⟨λ⟩/α₀ (Born and Wolf, 1975). Hence a further coherence condition is
D_C ≤ 0.08⟨λ⟩/α₀ ≅ 0.098/(α₀√E)   [nm; eV]   (30)
In diffraction experiments the so-called ``transfer width'' (see Woodruff and Delchar, 1986) plays a role similar to that of the beam coherence area. To understand this concept, we have to recall that the reciprocal lattice points have a ``size'' that is inversely proportional to the crystal dimensions. Hence the diffracted beams have a finite angular size corresponding to the dimensions of the area from which the amplitude addition takes place. If any imperfections of the primary beam exist, such as energy and angular spreads, that cause a change Δk_∥ in the parallel component of the wave-vector, then they also correspond to some distance on the surface, just as we get the surface periodicity length from the Bragg condition, d_hk = 1/k_∥. These lengths are analogously defined as w = 1/Δk_∥, and they determine the maximum distances over
FIGURE 15. Definition of quantities used in the assessment of the primary beam coherence.
which variations in the surface periodicity can be detected. In other words, Δk_∥ now represents a dispersion caused by the finite aperture and energy spread of the illuminating beam. Thus, the area of amplitude addition is limited by the ``angular'' and ``energy'' transfer widths w_α and w_E, so that
D_C ≥ w_α = ⟨λ⟩/(2Δα cosθ) ≅ 0.61/(Δα √E cosθ)   [nm; eV]   (31)
and

D_C ≥ w_E = d_hk (2E/ΔE).   (32)
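To make these conditions concrete, the sketch below evaluates the numerical forms of Equations (29), (31), and (32) as reconstructed here. The beam parameters (landing energy 10 eV, ΔE = 0.3 eV, Δα = 20 mrad, d_hk = 0.3 nm) are assumed, illustrative values, not data from the text:

```python
import math

def coherence_diameter_nm(E_eV, dE_eV):
    """Temporal-coherence limit of Eq. (29): D_C <= 2.45*sqrt(E)/dE (nm; eV)."""
    return 2.45 * math.sqrt(E_eV) / dE_eV

def transfer_width_angular_nm(E_eV, dalpha_rad, theta_rad=0.0):
    """Angular transfer width of Eq. (31): w_alpha ~ 0.61/(dalpha*sqrt(E)*cos(theta))."""
    return 0.61 / (dalpha_rad * math.sqrt(E_eV) * math.cos(theta_rad))

def transfer_width_energy_nm(d_hk_nm, E_eV, dE_eV):
    """Energy transfer width of Eq. (32): w_E = d_hk * 2E/dE."""
    return d_hk_nm * 2.0 * E_eV / dE_eV

# Assumed FEG-like beam after deceleration to a landing energy of 10 eV
E, dE, dalpha, d_hk = 10.0, 0.3, 20e-3, 0.3
print(coherence_diameter_nm(E, dE))          # ~26 nm
print(transfer_width_angular_nm(E, dalpha))  # ~10 nm
print(transfer_width_energy_nm(d_hk, E, dE)) # ~20 nm
```

All three lengths come out in the range of tens of nanometers, i.e., comfortably larger than the primary spots discussed later, which is the point made in the following paragraph.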
Later we will see that in a normal SLEEM configuration all these conditions for sufficient coherence within the primary spot can be satisfied. The diffraction spots can overlap in the radial direction when their angular size, ≅ λ/(d_hk cosθ), is larger than 2α. This can easily happen, and a further intensity increase is then achieved.

D. Secondary Electron Emission

Secondary electrons are released from the target atoms by impact ionization, which forms a substantial contribution to the quantities characterizing
the inelastic scattering. According to conclusions drawn from momentum-resolved coincidence spectroscopy, the main source of SE is the decay of valence band excitations caused by large-momentum-transfer, spatially localized scattering events (Drucker et al., 1993). Further intensive SE generators include the decay of volume and surface plasmons; the yield from electron–electron collisions is substantially weaker. Upon release from an atom, the internal secondary electron possesses a kinetic energy (taken with respect to the bottom of the conduction band) of the order of 10¹ eV. For example, Schreiber and Fitting (2002) studied in detail the SE emission from SiO₂ and found the mean initial kinetic energy to be 13 eV. Owing to further impact ionization and cascading processes, the energy of the SE dropped below 10 eV within 10 fs. Then scattering on phonons dominated, and after 200 fs the electrons were more or less thermalized so that their energy approached 3kT/2, i.e., approximately 40 meV at room temperature. Finally, electron–hole recombination took place, and within 1000 fs the released electrons were almost all recombined or trapped. Emission of an SE therefore has to take place within a very short time after its generation. Data important for understanding the SE signal in SEM were reviewed by many authors, e.g., Bruining (1954), Kollath (1956), Dekker (1958), Hachenberg and Brauer (1959), and Seiler (1983). The measured SE yields are contained in the database of Joy (2001), and we will also quote data from the study targeted at determining the influence of surface cleanliness (Zadražil and El-Gomati, 2002). The SE yield is relatively low at the energies normally used in the SEM; we can verify in the database of Joy (2001) that at 20 keV, δ < η for all except the lightest elements, for which both yields are roughly equal. But at low energies, δ is significantly larger than η; this relation creates another crucial distinction of the low-energy range.
Similarly, the information depth of BSE is normally much larger than that of SE, but this relation also reverses. According to the simulations of Kuhr and Fitting (1998), the relation between the maxima of the depth distributions for SE and BSE from Ag is mutually opposite at electron energies of 3000 eV and 100 eV. The maximum SE yield δ_m, achieved at a certain energy E_m of the incident electrons (located between 100 and 900 eV), remains within 0.5 to 1.7 (Seiler, 1983) or 0.6 to 2.1 (Zadražil and El-Gomati, 2002) for metals. For insulators, owing to the extended escape depth of SE, the yield can reach values even higher than 10 (Seiler, 1983; Joy, 2001), for alkali halides in particular. In Figure 16 a set of data similar to that in Figure 13 is given; these are now values of δ for the same selection of 24 conductive elements.
FIGURE 16. The SE yields measured in UHV with normal incidence of primary electrons onto targets cleaned in situ by an ion beam; values at E = 1 keV correspond to the atomic numbers (top to bottom) 64, 13, 40, 78, 79, 72, 47, 30, 82, 50, 14, 29, 24, 48, 74, 73, 28, 32, 42, 26, 22, 41, 23, and 6 (data provided by Zadražil and El-Gomati, 2002).
FIGURE 17. Comparison of the SE yields for normal impact, measured under UHV conditions for as-inserted (- - - - - -) and in situ cleaned (——) specimens; the dashed curves correspond to (top to bottom) Ag, Al, Pt, Cu, and C at 1 keV (data provided by Zadražil and El-Gomati, 2002).
Further, in Figure 17 the same pairs of measurements are given as in Figure 14, i.e., for the specimen as-inserted into UHV and after being ion-beam cleaned. Here we notice a pronounced similarity between the as-inserted curves, which obviously corresponds to similarly contaminated surfaces, although the specimens were thoroughly precleaned and measured under clean conditions. A semiempirical theory of SE emission, summarized by Seiler (1983), gives a universal (i.e., specimen-independent) curve

δ/δ_m = 1.11 (E/E_m)^(−0.35) {1 − exp[−2.3 (E/E_m)^(1.35)]}.   (33)
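The universal reduced-yield curve of Equation (33) is easy to evaluate; a minimal sketch:

```python
import math

def delta_reduced(E_over_Em):
    """Universal semiempirical SE yield of Eq. (33): delta/delta_m versus E/E_m."""
    x = E_over_Em
    return 1.11 * x**-0.35 * (1.0 - math.exp(-2.3 * x**1.35))

# The curve peaks near E = E_m with delta/delta_m ~ 1 and falls off on both sides
for x in (0.1, 0.5, 1.0, 2.0, 10.0):
    print(x, round(delta_reduced(x), 3))
```

Note that the constants are chosen so that the reduced yield is very close to unity at E = E_m, i.e., at the yield maximum.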
It is also stated that for metals δ_m/E_m is constant (due to the proportionality of both quantities to J^(4/5)) and approximately equal to 2 × 10⁻³ eV⁻¹ (Ono and Kanaya, 1979). Nevertheless, the data tabulated by Seiler (1983) do not indicate constancy of δ_m/E_m, and when extracting this ratio from Figure 13, we find that its average value is around 0.002 eV⁻¹ but the values are scattered between 0.0012 and 0.0053 eV⁻¹. All sources of data confirm that in their Z dependences both δ_m and E_m exhibit a modulation corresponding to the periodic system of elements. This modulation is apparent even when drawing a δ(Z) curve for an arbitrary energy value (Zadražil and El-Gomati, 1998b). At incident electron energies sufficiently higher than E_m, the SE yield decreases as E^(−0.8) (Drescher et al., 1970), which is the energy dependence of the Bethe stopping power. Near and below the yield maximum, no universal relation exists except Equation (33). The energy distribution of the emitted SE has a strong maximum at an energy E_SE^m (see Figure 11), which is smaller for insulators than for metals, for which it moves between 1 and 5 eV, while the width of this distribution, measured at half maximum, ranges from 3 to 15 eV (Schäfer and Hölzl, 1972). Ding and Shimizu (1996) verified that the energy distribution does not depend strongly on the emission angle. The dependence of the position and width of the distribution peak on the material and its surface status was studied by Dietrich and Seiler (1960), Joy (1987), and others. Fitting et al. (2001) found, again for SiO₂, that the value of E_SE^m decreases with increasing escape depth of SE. Chung and Everhart (1974) presented a simple theory leading to a relation for dN_SE/dE_SE for metals. They supposed the surface potential barrier fully transparent for E_SE > 0 (with E_SE measured from the vacuum level) and nonpenetrable otherwise, and the SE generation to be isotropic and depth independent. The resulting
expression was

dN_SE/dE_SE = K E_SE/(E_SE + W)⁴   (34)
(where W is the work function and K is a material constant), giving E_SE^m = W/3. For our next considerations, we need some ``mean'' energy of SE; from Equation (34) the mean value of E_SE is 2W. However, the mean value overestimates the contribution of fast SE, so it is more reasonable to take the median, which here is equal to W. Thus, for detection considerations, we can use 3 to 5 eV as the typical energy of SE. A more exact theory would require incorporation of the processes of SE generation, diffusion inside the target, and penetration through the surface barrier. Reimer (1998) reviewed calculations made for aluminum (see, e.g., Bindi et al., 1980) and hinted at an anisotropy of the internal SE release, which is afterwards quickly randomized, owing to the short mean free path, to the cosine distribution. In practice, the distribution δ(θ) ∝ cosθ is observed generally in all instances (see, e.g., Kanaya and Kawakatsu, 1972). Nevertheless, with single crystals some structure again appears on the angular distribution, caused by channeling of the Bloch functions, as we mentioned for the BSE emission (see, e.g., Burns, 1960). The smooth energy distribution described by Equation (34) can exhibit some additional structure at energies equal to the energies of plasmons. Everhart et al. (1976) observed this structure with aluminum; for an atomically clean surface they found that the energy distribution was broadened and contained features at energies corresponding to surface and volume plasmons. Nevertheless, after very slight oxidation the structure not only disappeared but the main peak also became much narrower. This indicates that SE generation via the decay of plasmons is sensitive to the surface status and is much weaker at ``real'' surfaces. At high energies the dependence of δ on the specimen tilt angle is very important, causing the most pronounced contribution to the image signal, owing to which the SEM image acquires its three-dimensional appearance.
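As an aside, the characteristic values quoted above for the Chung–Everhart spectrum of Equation (34), the peak at W/3, the mean 2W, and the median W, can be checked by direct numerical integration; W = 4.5 eV is an assumed, illustrative value:

```python
# Numerical check of the moments of the Chung-Everhart spectrum, Eq. (34):
# dN/dE proportional to E/(E+W)^4. W = 4.5 eV is an assumed illustrative value.
W = 4.5
dE = 0.002
energies = [i * dE for i in range(1, 1_000_000)]    # grid up to ~2000 eV
weights = [E / (E + W) ** 4 for E in energies]      # unnormalized spectrum
total = sum(weights)

peak = max(zip(weights, energies))[1]               # most probable energy -> W/3
mean = sum(E * w for E, w in zip(energies, weights)) / total   # -> 2W

acc, median = 0.0, None
for E, w in zip(energies, weights):
    acc += w
    if acc >= total / 2.0:
        median = E                                  # -> W
        break

print(peak, median, mean)
```

The numerical mean comes out slightly below 2W because of the finite integration grid; the slow E⁻³ tail of the spectrum is exactly why the mean overweights the fast SE, as argued above.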
This tilt dependence can be written as δ(θ) ∝ sec^n θ, with n decreasing from about 1.3 to 0.8 throughout the Z scale (Seiler, 1983). An extreme demonstration of this dependence is the so-called edge effect, i.e., a strong overbrightening of the side walls of surface steps that dominates micrographs at conventional energies. The phenomenon is simply caused by the SE escape depth being shallower than the penetration depth of the primaries, owing to which any inclined facet represents an additional emitting surface. Thus, the edge effect should disappear at low energies (see Figure 18), near the energy E_m of the maximum SE yield, where all generated SE are emitted; this was
FIGURE 18. Experimental data for the tilt-angle dependence of the SE yield. (Reprinted with permission from Böngeler et al., 1993.)
quantitatively verified by Pejchl et al. (1993). Consequently, the SE contrast at low energies is restricted to the ``shadowing'' connected with the usual side position of the detector, and the image becomes more ``flat'' (see Joy and Joy, 1996). When exciting SE from a single crystal, the monotonic δ(θ) dependence again acquires a structure. This is normally comparable with that of the BSE yield, Δδ/δ ≈ Δη/η (see the comparison made for Si(111) by Seiler and Kuhnle, 1970), but toward low energies Δδ/δ does not grow as distinctly as Δη/η does. Hence any grain contrast in SEM micrographs at low energies is more probably caused by the BSE emission anisotropy. Further studies regarding the angular distribution of δ include those of Salehi and Flinn (1981) and Libinson (1999). An important collection of experimental results concerning the SE emission anisotropy has been acquired by using UHV SEM instruments equipped with detectors featuring an enhanced angular sensitivity, usually achieved by suppression of SE emitted off the direction toward the detector. Thus, Homma et al. (1993) observed alternating 2×1 and 1×2 domains in subsequent atomic layers on Si(100) as well as reconstructed 7×7 domains coexisting with nonreconstructed remains of the 1×1 phase on Si(111). The domains were visible even at an electron energy of 25 keV, but enhanced contrast was demonstrated at 2 keV. Similar instrumentation was used to visualize surface atomic steps, e.g., those on Si(111) (Ishikawa et al., 1985) or on an oxidized Cu surface (Bleloch et al., 1989). Obviously, with careful in situ treatment of the specimen surface, even in the ``incoherent'' SE imaging many phenomena can be observed which would intuitively be
expected to be perceptible solely through diffraction contrast in the LEEM method. An important characteristic is the mean escape depth λ_esc of SE, which governs the information depth of the SE image. The probability of escape P_esc is generally considered to depend exponentially on the depth, i.e., P_esc ∝ exp(−z/λ_esc). Values of λ_esc range between 0.5 and 1.5 nm for metals and between 10 and 20 nm for insulators, while the maximum escape depth is T ≅ 5λ_esc (Seiler, 1967). The larger values of λ_esc for insulators are in accordance with their enhanced SE yield. Fitting et al. (2001) found for SiO₂ that λ_esc decreases with increasing SE energy; for E_SE ≈ 3 eV it amounted to about 10 nm, while for E_SE > 20 eV it dropped below 1 nm. If the escape depth is brought into relation with the electron range R, we get the maximum SE yield at R = 2.3λ_esc (Seiler, 1983). At higher energies the SE generation extends to depths from which no escape is possible, while at lower energies the generation rate (the integral of the stopping power along the trajectory of the incident electron) diminishes. The shallow escape depth, together with the sensitivity toward the ionization energies of the least bound electrons, makes the SE emission very sensitive to the surface status, its cleanliness and contamination, and also to radiation damage. At conventional SEM energies, the secondary electron signal is composed of so-called SE1 and SE2 contributions, the first being excited directly by PE while the latter are due to BSE returning toward the surface. While SE1 escape from an area whose diameter is approximately (d_P² + λ_esc²)^(1/2), with d_P as the primary spot size (see, e.g., Everhart and Chung, 1972), the SE2 emission spot is broadened by the lateral diffusion of BSE, so that the specimen response function consists of two bell-shaped features of different width.
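The exponential escape law and the quoted maximum escape depth T ≅ 5λ_esc can be illustrated with a short sketch; λ_esc = 1 nm is an assumed value from the metal range quoted above:

```python
import math

def escape_probability(z_nm, lambda_esc_nm):
    """Relative escape probability of an SE generated at depth z: exp(-z/lambda_esc)."""
    return math.exp(-z_nm / lambda_esc_nm)

lam = 1.0  # nm, assumed; metals fall in the 0.5-1.5 nm range quoted in the text
for z in (0.5, 1.0, 2.0, 5.0):
    print(z, escape_probability(z, lam))
# At z = 5*lambda_esc the probability is below 1%, consistent with
# T ~ 5*lambda_esc being quoted as the maximum escape depth.
```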
We will discuss this later in connection with the image resolution, but let us mention now that the total SE yield is usually written as

δ = δ_PE + δ_BSE = δ₀[secθ + βη(θ)]   (35)

where δ₀ is the SE1 emission at normal impact of PE and β denotes the ratio of the SE yields due to BSE and to PE. The dependence β(E) is decreasing (Seiler, 1983) and β > 1, because BSE have lower energies than PE and their trajectories are generally also more inclined with respect to the surface normal. Above about 10 keV, we get β ≅ 2.5 with only weak material and energy dependences. For low energies, when the electron range approaches the escape depth of SE, this approach, as well as any distinction between SE1 and SE2, becomes questionable. Nevertheless, at least at the beginning of the low-energy range, i.e., down to, say, 2 to
3 keV, Equation (35) can be considered, probably with an increasing value of β. The role of BSE in the SE emission has been studied by numerous authors (e.g., Kanter, 1961; Kanaya and Kawakatsu, 1972; Joy, 1984; Hasselbach and Krauss, 1988; Böngeler et al., 1993). For us the distinction between SE1 and SE2 is of minor importance, because in SLEEM the standard detectors acquire the total emission σ = δ + η. Nevertheless, we should be aware that, even at low energies, the SE yield from surface films depends on the underlying substrate, and when the two materials have very different Z, the change in δ with the film thickness is very strong, so that in fact the SE2 contribution prevails over that of SE1; see, e.g., the measurements of Thomas and Pattinson (1970). As regards the noise in SE emission, it is usually considered to follow the Poisson distribution. This was proved for energies below 250 eV (Seiler, 1983), but at higher energies some excess noise content is found (see Reimer, 1971) because of the SE2 contribution. This question does not seem to have been fully answered. Finally, let us mention that SE emitted from ferromagnetics are spin polarized (Kirschner, 1984). The degree of polarization is nonnegligible even for very low-energy electrons and increases with decreasing E, the highest polarization being found for the slowest SE. The effect is explained by the different reflectivity of electrons with different spin orientations. This phenomenon would enable one to observe the domain contrast if a detector of polarized electrons were available.
V. FORMATION OF THE PRIMARY BEAM
We have already touched on the important circumstance that in SEM the specimen represents a part of the imaging system. The information collected, coming from the entire interaction volume of the primary beam, is ascribed to a single point labeled by pixel coordinates so that the response function of the specimen, i.e., distribution of the signal excited by a monochromatic infinitely narrow incident pencil, has to be taken into account when assessing the resolution. However, incorporation of the specimen properties prevents us from drawing general conclusions about the instrument quality so that it is usual to evaluate an ‘‘intermediate’’ quantity, namely, the current distribution in the primary beam spot entering the specimen. We will do the same and afterwards we extend the discussion toward the concept of the ‘‘real’’ resolution on a particular specimen.
A. The Spot Size

Within the scope of this text, we cannot go into details of the electron optical theory of the SEM column, of the lens aberrations and their combinations, and related problems. Let us only mention that correct results, particularly for coherent or nearly coherent illumination by various types of field-emission guns, can be obtained only by the wave-optical theory of electron probe formation, which regards lenses as diaphragms filled with a phase-shifting medium that deforms and trims the wavefronts. However, our aim is to explain, using relations as simple as possible, the specifics of low-energy spot formation, and hence we will utilize the simplest approximate figures obtained from the geometric optical theory. For more details we can refer to Reimer (1998) and particularly to an exact analysis of the topic made by Hawkes and Kasper (1996b). We will simply consider the primary spot as a convolution of the current distribution within the demagnified image of the gun crossover with the discs of confusion of the basic aberrations. Assuming the astigmatism and defocusing fully corrected, we take into account contributions to the spot size expressed in the form of discs of confusion, the sizes of which are
d_G = (4I/π²β)^(1/2) (1/α),  d_S = K_S C_S α³,  d_C = K_C C_C (ΔE/E) α,  d_D = K_D λ/α   (36)
where d_G is the demagnified crossover, d_S, d_C, and d_D are the discs of spherical, chromatic, and diffraction aberration, respectively, I is the beam current, β is the gun brightness, α is the specimen-side angular aperture of the primary beam, C_S and C_C are the coefficients of spherical and chromatic aberration, respectively, and K_S, K_C, and K_D are numerical factors dependent on the model of the spot formation. Here the least-confusion planes are assumed for the spherical and chromatic aberrations, and the final aperture-limiting diaphragm is considered uniformly illuminated. When using the full beam diameters in the least-confusion planes of the spherical and chromatic aberrations and the FWHM of the Airy disc for the diffraction aberration, we get the numerical factors as K_S = 0.5, K_C = 1, and K_D = 0.6. The next step is to select a summation rule for combining the contributions of Equation (36) into the overall spotsize d_P. It is traditional to consider the ray radii in the individual discs as mutually independent random variables with normal distributions. Then the
summation rule is given by a convolution of Gaussian functions, the result of which is also Gaussian, and

d_P² = d_G² + d_S² + d_C² + d_D².   (37)
In fact, the individual contributions are neither independent nor normally distributed so that Equation (37) provides only a rough estimate of dP. A more realistic but still reasonably simple relation is obtained by defining the disc sizes as the diameter encircling some current fraction. Using this approach, Barth and Kruit (1996) derived the summation rule (for 50% of encircled signal) dP2 ¼
h
dS4 þ dD4
1:3=4
þdG1:3
i2=1:3
þdC2
ð38Þ
and determined modified values of the numerical factors, namely K_S = 0.18, K_C = 0.34, and K_D = 0.54. Other summation rules exist that provide more exact but at the same time more complicated relations for the spotsize (see, e.g., Kolařík and Lenc, 1997), but we shall use the summation rules of Equations (37) and (38) and compare their results. First of all, let us make the following simple observation: when the electron energy E decreases, the wavelength increases as λ ∝ E^(−1/2). This causes the Airy disc to extend, and in order to suppress the impact on resolution, we have to adjust the beam aperture to the same slope, α ∝ E^(−1/2). But then the spherical and chromatic aberration discs grow as d_S ∝ E^(−3/2) and d_C ∝ E^(−3/2). The same energy dependence would in turn apply to the total spotsize d_P, fully preventing any use of very low energies. To compensate for this, an objective lens would be needed with aberration coefficients C_S and C_C proportional to E^(3/2). However, normal magnetic lenses have energy-independent aberration coefficients. It is true that, for example, for weak lenses C_S is proportional to f³ (Glaser, 1952), i.e., in fact to E³, but after changing the beam energy we have to refocus onto the same specimen plane and hence get the same f and also the same C_S. Consequently, the spotsize enlarges at low energies. The optimum angular aperture α_opt for achieving the ultimate resolution d_P^m is simply calculated from the relation ∂d_P/∂α = 0. In Figure 19 the function α_opt(E) is plotted from a beam energy of 15 keV downwards for both of the above-given summation rules and for two model SEM instruments of different quality. These are defined in order to span the current instrumentation scope; the first, ``TEG SEM,'' represents old instruments probably no longer marketable but still serving in plenty of laboratories
FIGURE 19. The optimum angular aperture, α_opt, for the smallest spotsize, plotted versus electron energy. TEG SEM and FEG SEM denote the two sets of SEM parameters given in the text; the dashed line corresponds to the summation rule (37) and the full line to the rule (38).
while the other, ``FEG SEM,'' is for high-quality modern devices. The parameters were chosen as β = 10⁵ A cm⁻² sr⁻¹, I = 5 pA, ΔE = 2 eV, C_S = 50 mm, C_C = 20 mm for the TEG SEM, and β = 10⁹ A cm⁻² sr⁻¹, I = 100 pA, ΔE = 0.2 eV, C_S = 1.9 mm, C_C = 2.5 mm for the FEG SEM. Naturally, there might be queries about individual parameters but, as we will see, the basic trends that we are now seeking are independent of these details. One general trend is obvious already from Figure 19: along the low-energy range, all curves progressively acquire the same slope, α_opt ∝ E^(1/4). This behavior can easily be obtained from Equation (37) when we retain in it only the members growing at low energies, i.e., d_C and d_D. When substituting α ∝ E^(1/4) into all the terms listed in Equation (36), we get the proportionalities d_G ∝ E^(−1/4),
d_S ∝ E^(3/4),  d_C ∝ E^(−3/4),  d_D ∝ E^(−3/4)   (39)
so that the influence of d_C and d_D dominates, and hence the same slope can be expected for d_P. Figure 20 shows the d_P(E) plot for α = α_opt, which confirms this behavior, again independently of the summation rule and the instrument parameters. The foregoing very simple considerations have yielded the general relation for SEM, namely the proportionality of the spotsize to E^(−3/4). This says that when we want to turn from a conventional energy like 15 keV to units of eV, the resolution in nanometers deteriorates to the same number
FIGURE 20. The ultimate spotsize, d_P^m, for the optimum angular aperture α_opt, calculated for the two sets of SEM parameters denoted TEG SEM and FEG SEM (see text) from the summation rules (37) (- - - - - -) and (38) (——).
in micrometers, i.e., below the level of a standard optical microscope. The proportionality to E^(−3/4) seems to be broken by the parameters of some recent microscopes, which guarantee a spotsize at 1 keV only about three times larger than that at 15 keV, but the improvement is achieved at the cost of a shortened working distance, reduced current, and other restrictions (see, e.g., Nagatani et al., 1987). In general, a conventional SEM without aberration correctors can work at an acceptable quality of micrographs down to 1 keV. Because the E^(−3/4) slope does not depend on the instrument class, we will not discuss in detail the methods of optimizing the objective lenses and detection systems toward improved resolution at low energies. These mostly rely on placing the specimen very close to or even inside the magnetic field, which in turn imposes some limitations on other parameters of the microscope operation. Among possible configurations, the so-called single-polepiece lens (Mulvey, 1984), with the second polepiece shifted far from the optic axis and the primary spot, attracts the most attention. Various configurations based on the single-polepiece principle were studied by Pawley (1984), Bode and Reimer (1985), Shao (1989), Müllerová et al. (1989), Ximen et al. (1993), and others. Some setups achieved very low aberration coefficients, like the C_S = 0.15 mm and C_C = 0.55 mm of Tsai and Crewe (1998).
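The E^(−3/4) trend derived above can be reproduced numerically from Equations (36) and (37). The sketch below uses the FEG SEM parameters quoted in the text (brightness converted to SI units) and a simple brute-force search for α_opt; the factors K_S = 0.5, K_C = 1, K_D = 0.6 are those of the least-confusion model:

```python
import math

def discs(E_eV, alpha, I=100e-12, beta=1e9, dE=0.2, Cs=1.9e-3, Cc=2.5e-3):
    """Discs of confusion of Eq. (36), in metres. Defaults are the FEG SEM
    parameters of the text (beta given in A cm^-2 sr^-1, converted below)."""
    beta_SI = beta * 1e4                      # A m^-2 sr^-1
    lam = 1.226e-9 / math.sqrt(E_eV)          # nonrelativistic wavelength
    dG = math.sqrt(4.0 * I / (math.pi**2 * beta_SI)) / alpha
    dS = 0.5 * Cs * alpha**3                  # K_S = 0.5
    dC = 1.0 * Cc * (dE / E_eV) * alpha       # K_C = 1
    dD = 0.6 * lam / alpha                    # K_D = 0.6
    return dG, dS, dC, dD

def spot_gauss(E_eV, alpha):
    """Gaussian quadrature summation rule of Eq. (37)."""
    return math.sqrt(sum(d * d for d in discs(E_eV, alpha)))

def optimum(E_eV):
    """Brute-force search for alpha_opt (0.1 mrad grid) and the ultimate spotsize."""
    best = min((spot_gauss(E_eV, a * 1e-4), a * 1e-4) for a in range(1, 3000))
    return best[1], best[0]

for E in (15000.0, 1000.0, 100.0, 10.0):
    a_opt, d_min = optimum(E)
    print(E, a_opt, d_min)
# The ultimate spotsize grows roughly as E^(-3/4) toward low energies,
# and alpha_opt shrinks roughly as E^(1/4), cf. Figures 19 and 20.
```

With these assumed parameters the spot is on the nanometer scale at 15 keV and on the 100 nm scale at 10 eV, i.e., roughly the thousandfold deterioration stated in the text.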
B. Incorporation of the Retarding Field

A qualitative step forward as regards the possibilities of SEM operation throughout the full energy scale was achieved by introducing a nonconstant beam energy along the column. The idea is to form and transport the beam at high energy and to retard it to the final low energy only close to the specimen. The underlying principle consists in one property of immersion electrostatic lenses, namely that the magnitude of their aberrations corresponds to the higher of the electron energies on either side of the lens. So an immersion lens, i.e., an electrostatic lens with different potentials on the marginal electrodes, can be inserted into the end part of the column with the negatively biased electrode toward the specimen. The fundamentals of configurations utilizing this principle were studied in detail by Frank and Müllerová (1999). For estimation of the aberrations of the immersion lens we use the approximate equation (Lenc, 1995)

C_S ≅ C_C ≅ (1/2) ∫ from z₀ to z₁ of [φ₀/φ(z)]^(3/2) dz   (40)

where the interval (z₀, z₁) spans the transition region of the potential φ(z) between φ₀ and φ₁. If we consider the electrostatic field strength abruptly changing in the planes of flat electrodes held at φ₀ and φ₁, we get

C_S ≅ C_C ≅ w/2 + l/[√k(√k + 1)]   (41)

(see also Lencová, 1997), with w and l being the distances between the specimen and the first electrode and between the electrodes, respectively, and k the ratio of the electron energies on either side of the lens, i.e., k = E_P/E (E_P is the beam energy in the SEM column and E = E_P + eU_b, with U_b being the retarding potential, is now the lowered energy of impact on the specimen). In Figure 21 we see that the approximation (41) differs appreciably from the results obtained when substituting real potential distributions into Equation (40), but on its basis we can still make at least one simple consideration. At very low energies, i.e., for high values of k, both C_S and C_C approach w/2. So they are still independent of energy but can be quite small. However, for w small enough, both coefficients are approximately proportional to l/k = (l/E_P)E and hence diminish with decreasing energy, as we required in the previous section. Of course, the aberrations according to Equation (41) combine with the aberrations of the magnetic objective lens, but
SCANNING LEEM
FIGURE 21. The aberration coefficients, C_S and C_C, of the immersion electrostatic lens plotted versus the working distance w, with both axes scaled by the length of the retarding field l. (a) Approximate Equation (41) for abrupt field transitions; (b) and (c) calculation from Equation (40) for real potential distributions with the first electrode (nearest to the specimen) of a thickness t = 0.1 l (b) and t = 0.2 l (c).
those are weighted in the summation rule by k^{-3/2} ∝ E^{3/2} (Lencová, 1997), which is exactly the energy dependence that fully suppresses the worsening of resolution at low energies. Obviously, the immersion objective lens eliminates the deterioration of the objective lens parameters for slow electrons and introduces its own, but weaker, tendency toward a larger spotsize. Figure 22 shows the most popular design of a compound lens consisting of a magnetic focusing lens and an electrostatic retarding lens (Frosien et al., 1989), which, together with the above-lens detector, is also called MEDOL (magnetic–electrostatic detector objective lens). The authors report an improvement in the aberration coefficients from C_S = 59 mm and C_C = 15 mm to C_S = 3.7 mm and C_C = 1.8 mm at the immersion ratio k = 17, so that a resolution of 5 nm at 500 eV was achieved (Martin et al., 1994). This design was also used in the first, and still the only, commercial SEM with a retarding field element, and its parameters have subsequently been further upgraded. With a similar configuration, Knell and Plies (1998) obtained 3 nm at 1 keV and 9 nm at 200 eV. The MEDOL-type lens was preceded by a purely electrostatic (three-electrode) lens by Zach and Rose (1988), called EDOL (electrostatic detector objective lens, see Figure 34); further data were then published by Zach (1989) and Zach and Haider (1992). For a beam energy of 8 keV inside the column, they applied the electrode potentials −7.5, +7, and 0 kV (when proceeding from the specimen) and
MÜLLEROVÁ AND FRANK
FIGURE 22. Combined magnetic–electrostatic (compound) objective lens. (Reprinted with permission from Frosien et al., 1989.)
hence reached a landing energy of 500 eV, for which a resolution of 7 nm was reported. Other configurations based on a similar principle include the use of a so-called "booster," i.e., a tube around the optic axis between the anode plane and the lower polepiece of the objective lens, insulated and held at a high positive potential (Beck et al., 1995), so that its lower end fully corresponds to the arrangement in Figure 22. Preikszas and Rose (1995) explored the possibilities of optimizing compound lenses, taking into account the maximum feasible magnetic and electric fields (they considered 5 kV mm⁻¹ and 1 T as limiting values), tolerable fields at the specimen surface, bore diameters in electrodes and polepieces, the maximum immersion ratio, and the energy spread in the beam. Khursheed (2002) also examined the aberrations of a set of compound lens configurations. Let us only briefly mention that adjacent to the SEM instrumentation area is the family of IC testers, i.e., specialized scanning devices for the inspection of semiconductor structures and measurement of critical dimensions on them (see, e.g., Ezumi et al., 1996). Their recent versions work nearly exclusively in the low-energy range around 1 keV, employ
various combinations of compound lenses with energy filters (e.g., Frosien and Plies, 1987) and detectors, and achieve resolutions comparable with those mentioned above. Practice has confirmed the advantages of using the retarding field principle, i.e., immersion or compound lenses, for SEM in the low-energy range. In recent commercial instruments acceptable imaging parameters have been achieved down to about 200 eV, and the limit for reported laboratory configurations and IC testers is similar. A separate class is formed by the first operating versions of aberration correctors. These are capable of achieving the resolutions quoted above even in a device with the beam energy constant within the column. Possible corrector configurations were reviewed by Rose (1987), Rose and Preikszas (1992), Hawkes and Kasper (1996a) and Hawkes (1997). The aberration correctors are, nevertheless, mostly applied to STEM, TEM, and LEEM instruments, where the specimen influence on the real image resolution is either nearly negligible or does not apply, so that any spotsize correction is more efficiently projected into the final result. Only a few applications in SEM have been reported to date; these were briefly reviewed by Frank (2002).

C. The Cathode Lens

In the previous section we noticed that for a very short working distance w of the retarding immersion lens, the aberration coefficients diminish with decreasing electron energy. A promising alternative is thus to choose w = 0, i.e., to apply the retarding potential directly between the specimen and an anode placed closely above it. This configuration is called a cathode lens (CL) and has been known since the beginnings of electron microscopy as the crucial component of emission electron microscopes. As we already mentioned in Section I, Recknagel published the fundamental theory of this optical element as early as 1941 and showed that its basic aberrations are proportional to the ratio of the initial and final electron energies.
The same should be expected for the reversed function in the SEM, and this is indicated by Equation (41). More exact analytical relations for C_S and C_C for a combination of the cathode lens with a focusing magnetic objective lens having aberration coefficients C_Sf and C_Cf were derived by Lenc and Müllerová (1992b):

C_S = (l/k^{3/2}) { [2(√k − 1)²/(√k + 1)] [1 + (l/D) √k/(√k + 1)] + [(3√k − 1)/(2√k)]⁴ (C_Sf/l) }   (42)
C_C = (l/k^{3/2}) { (√k − 1)²/(√k + 1) + [(3√k − 1)/(2√k)]² (C_Cf/l) }   (43)
with D the diameter of the anode bore. Instead of an abrupt potential transition in the electrode plane, a quadratic polynomial shape of the potential was considered here. For our simple characterization of the energy dependences, a development of Equations (42) and (43) into power series for large k (i.e., small E) gives relations that are easier to grasp:

C_S ≅ (2l/E_P)(1 + l/D) E + (81/16)(C_Sf/E_P^{3/2}) E^{3/2} + …,

C_C ≅ (l/E_P) E + (9/4)(C_Cf/E_P^{3/2}) E^{3/2} + …   (44)
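As a numerical cross-check, the full expressions for C_S and C_C can be compared with their large-k power series. The sketch below encodes Equations (42)–(44) in the forms used in this section (readers should verify them against Lenc and Müllerová, 1992b); the parameter values l, D, E_P, C_Sf, C_Cf are illustrative choices, not data quoted in the text.

```python
import math

def cs_cc_full(k, l, D, Csf, Ccf):
    """Cathode lens combined with a magnetic focusing lens, Eqs. (42), (43)."""
    s = math.sqrt(k)
    Cs = (l / k**1.5) * (2 * (s - 1)**2 / (s + 1) * (1 + (l / D) * s / (s + 1))
                         + ((3 * s - 1) / (2 * s))**4 * Csf / l)
    Cc = (l / k**1.5) * ((s - 1)**2 / (s + 1)
                         + ((3 * s - 1) / (2 * s))**2 * Ccf / l)
    return Cs, Cc

def cs_cc_series(E, EP, l, D, Csf, Ccf):
    """Leading terms of the large-k (small-E) expansion, Eq. (44)."""
    Cs = 2 * l / EP * (1 + l / D) * E + 81 / 16 * Csf / EP**1.5 * E**1.5
    Cc = l / EP * E + 9 / 4 * Ccf / EP**1.5 * E**1.5
    return Cs, Cc

# Illustrative parameters: lengths in metres, energies in eV
l, D, EP = 5e-3, 3.5e-3, 15e3
Csf, Ccf = 20e-3, 15e-3
for E in (1.0, 3.0, 10.0):          # landing energies
    k = EP / E
    print(E, cs_cc_full(k, l, D, Csf, Ccf), cs_cc_series(E, EP, l, D, Csf, Ccf))
```

For landing energies of a few eV the full and series values agree to within a few percent, confirming that the expansion captures both the E¹ slope of the cathode lens terms and the E^{3/2} weighting of the focusing lens terms.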
Equation (44) confirms the conclusions of the previous section: the immersion lens introduces the E¹ slope for both the spherical and chromatic aberrations but eliminates the energy dependence of the focusing lens aberrations via the weight proportional to E^{3/2}. The same holds for the "aperture lens," i.e., the optical power of the CL field penetrating the anode bore and forming a divergent lens, as we will discuss below. If we now substitute Equations (42) and (43) into Equation (36), then into Equation (38), and finally calculate again the optimum aperture α_opt, we obtain the results shown in Figure 23. (In this section we complete the two sets of model parameters, FEG SEM and TEG SEM, with D = 3.5 mm, E_P = 15 keV, and l = 1.5 and 15 mm.) We see that the optimum angular aperture in the specimen plane is, at least at the lowest energies, proportional to E^{-1/4}. When substituting this into all four contributions to the spot size (Equation (36)), we get d_S proportional to E^{1/4}, while d_G, d_C and d_D scale as E^{-1/4}, and the E^{-1/4} trend can also be expected for d_P. Because previously we found that these basic proportionalities are the same for both summation rules, with Equation (38) simply providing a 1.6 times larger aperture and a 1.8 times smaller spotsize, we used only one rule here. It is important to note that the optimum angular aperture just below the focusing lens, i.e., the beam aperture α_C formed by the microscope column, remains nearly the same when the cathode lens is switched on. Hence the SLEEM mode does not require any significant realignment of the column. In Figure 24 we again see a comparison of the calculated ultimate spotsizes, d_Pm, for the two sets of SEM parameters defined above; the summation rule (38) was used again. Obviously, the slope E^{-1/4} is actually reached at low energies, namely in the energy range where the higher
FIGURE 23. The optimum angular aperture, α_opt, for the smallest spotsize, plotted versus electron energy for a CL-equipped SLEEM. TEG SEM and FEG SEM denote the two sets of SEM parameters given in the text; the summation rule (38) was used with C_S and C_C substituted from Equations (42) and (43), respectively. For the cathode lens mode, the aperture is shown both between the focusing and cathode lenses (— —) and in the specimen plane (- - - - -); for the latter case the aperture without CL is also shown (———). The numeric labels denote the maximum field within the CL in kV mm⁻¹.
members in Equation (44) become negligible. For larger aberrations of the focusing lens this happens at lower energies, so that, quite paradoxically, the overall drop in resolution between the primary beam energy and, say, 1 eV is smaller for the lower quality device: for the TEG SEM and 10 kV mm⁻¹, these spotsizes are identical in Figure 24. Figure 24 demonstrates one crucial fact: below some threshold of the order of hundreds of eV, even a routine microscope, when equipped with the cathode lens, surpasses the top-quality device as regards image resolution. This advantage is paid for by the fact that the specimen has to be immersed in the electrostatic field, the strength of which governs the spotsize. The optimum aperture varies with energy and is therefore not convenient to use when acquiring a series of micrographs typical for SLEEM operation, i.e., showing the same field of view over a broader energy range. In this case some fixed angular aperture is adjusted, and it is interesting to enquire how this modifies the resolution vs. energy curve. In Figure 25 we see that when fixed apertures are chosen from among those optimum for a certain energy within the low-energy range, the deterioration at higher energies
FIGURE 24. The ultimate spotsize, d_Pm, for the optimum angular aperture α_opt, calculated for the two sets of SEM parameters denoted TEG SEM and FEG SEM (see text) from the summation rule (38): the conventional SEM mode without CL (- - - - -) and the SLEEM mode with the CL excited (———), namely for the maximum field strengths labeled in kV mm⁻¹.
FIGURE 25. The ultimate spotsize, d_Pm, for the optimum angular aperture α_opt, calculated for the model FEG SEM parameters (see text; maximum CL field 10 kV mm⁻¹) from the summation rule (38) (- - - - -), together with the resolutions obtained for three fixed angular apertures of 1, 2, and 4 mrad (———).
FIGURE 26. (a) Simplest configuration of an SEM with the cathode lens introduced below the objective lens; (b) single-polepiece magnetic lens (SPL) installed below the specimen and serving as the focusing lens, while the original objective lens is either switched off or used as an additional condenser lens (in the SLEEM mode, the anode/detector assembly was inserted radially from the side to below the OL).
is only moderate and, in some instances, a resolution really constant throughout the energy scale is obtained. In the previous paragraphs we concentrated on simple relations concerning the energy dependences of the beam aperture and spotsize. We assumed the electrostatic and magnetic fields of the immersion and focusing lenses to be nonoverlapping and, furthermore, the shapes of the electrodes and polepieces were not taken into account. The simplest arrangement, shown in Figure 26(a), can also be realized via adaptation of a conventional SEM (Müllerová and Frank, 1993), as will be mentioned below. An electrostatic focusing lens was used in LEEM by Liebl and Senftinger (1991), while Müllerová and Lenc (1992b) applied the single-polepiece magnetic lens to SLEEM (see Figure 26(b)). Khursheed (2002) compared the ultimate resolutions achievable in three configurations that included the specimen inserted into the magnetic field without any retarding, and both nonoverlapping and overlapping magnetic focusing and electric retarding fields. Using a simple model of very thin electrodes and the bell-shaped magnetic field (Glaser, 1952), he found that the overlapping fields provide a 1.5 to 2 times smaller spotsize than the "sequential" configuration
and at 5 kV mm⁻¹ a spotsize of about 1 nm was calculated for an electron energy of 200 eV.

D. The Pixel Size

As we already mentioned in Section IV.D, the specimen response function for the total electron emission is composed of two bell-shaped contributions of different widths. The narrower peak corresponds to the SE1 part of the SE, released directly by primary electrons; its width is similar to the primary spotsize d_P, amounting approximately to (d_P² + l_esc²)^{1/2}. The broader component is that of SE2 and BSE, and its width is similar to the electron range R. At high energies, the SE and BSE signals are, as a rule, detected separately and the SE resolution is much higher than that of BSE. The SE2 contribution to the SE image is usually smeared so much that it is not visually apparent, and when the resolution is measured between 25 and 75% of the signal rise on a sharp edge, the SE2 signal need not manifest itself at all. The BSE resolution is usually presented on small clusters of heavy metals, so that the localization of information is improved by a sharp structure within the broad three-dimensional distribution of the BSE yield. However, at low energies the electron range approaches the escape depth of SE and the widths of both response functions become similar. As demonstrated for a silicon specimen by Reimer (1998), below 1 keV the SE distribution becomes even broader than that of BSE, owing to lateral diffusion of SE2 after their release by BSE. In the SLEEM method, we usually detect a mixture of SE and BSE and use just the energy range where both distribution widths are comparable. This is why we have to consider the real resolution, or the pixel size, as determined by the full response function incorporating also the specimen. The problem was solved using the response function formalism by Frank (1996a,b). The spatial distribution I_T(r) of the total emitted current in the surface plane can be written as

I_T(r)/I_P = δ C(r) + η(1 + δ) ∫ C(r′) S(r − r′) dr′   (45)
where I_P is the primary current, δ is the SE yield, and η is the BSE yield. Let us assume both the column response C(r) and the specimen response S(r) to be two-dimensional distributions of independent normal random variables. The normal distribution of BSE and SE2 (i.e., the shape of S(r)) was proved by Hasselbach and Rieke (1982) above 20 keV, so at lower energies it can be assumed only as a rough
approximation, and the same holds for the shape of C(r). One way of assessing the pixel size is to take the RMS distance of the emitted electron, d_RMS, which can be calculated for the axially symmetric case as

d_RMS = 2 [ ∫_0^∞ r² I_T(r) dr / ∫_0^∞ I_T(r) dr ]^{1/2}.   (46)
After substituting from Equation (45) and taking two-dimensional Gaussians for both C(r) and S(r), we get

d_RMS = [δ + η(1 + δ)]^{-1/2} [ δ d_P² + η(1 + δ)(d_P² + d_S²) ]^{1/2}   (47)
where d_P is the spotsize and d_S is the RMS width of the specimen response. Equation (47) was then used for the estimation of the best achievable values of d_RMS at low energies. The emission yields were calculated from the approximate relations reviewed above, and the primary spotsize was assumed both for a standard SEM and for a CL-equipped one. The RMS specimen response d_S was determined by MC simulations using software described by Czyzewski and Joy (1989), with the result

d_S ≅ C ρ⁻¹ E^{1.75}   (48)

where C ≅ 9 × 10⁻¹¹ kg m⁻² eV^{-1.75} and ρ is the specimen density. With the approximations described above, the pixel size d_RMS exhibits a minimum (see Figure 27), enabling one to define the optimum imaging conditions for a particular specimen when the total electron emission is detected, as it is in most versions of SLEEM. Hence
FIGURE 27. Comparison of the primary spotsize (———) with the total pixel size d_RMS calculated for Cu (- - - - -). The CL parameters were l = D = 5 mm and the summation rule (37) was used.
optimum energies of the electron impact and ultimate values of d_RMS were calculated for all three configurations indicated in Figure 27 and for the majority of chemical elements. The optimum energies fall between 330 and 4530 eV, while the ultimate resolutions were found to be 5 to 13 times the nominal spotsize at 30 keV for both microscopes without CL, and only 1.6 to 2 times for the CL-equipped model TEG SEM (Frank, 1996a). However, these data provide only broad guidance because of the many simplifications made. The approach employing the specimen response function can be extended one step further, provided only the SE emission is considered (see Frank, 1996b). In the previous derivation we took the specimen to be fully homogeneous, with all yields constant with respect to r. Now we can progress to a specimen composed of a homogeneous substrate with a heterogeneous surface film or surface relief. Then, both δ and η in Equation (47) remain position independent, but the emission distribution I_SE(r) can be written as a convolution, I_SE(r) = δ(r) ∗ i_SE(r), with

i_SE(r)/I_P = C(r) + η [C(r) ∗ S(r)]   (49)
which enables us to separate the imaged surface from the distribution of illumination by both PE and BSE. Because S(r) does not vary over the surface for a homogeneous substrate, we get the true specimen response function, which can be written, for C(r) and S(r) approximated by Gaussians, as

IRF(r) = G₂(σ_P, r) + η(Z) G₂((σ_P² + σ_S²)^{1/2}, r)   (50)
where G₂(σ, r) = (2πσ²)^{-1} exp(−r²/2σ²) is the two-dimensional Gaussian function. Equation (50) opens the possibility of using any acknowledged resolution criterion, such as the Rayleigh one or those based on a certain encircled portion of the signal, in addition to the evaluation via statistical moments that was performed before. In Figure 28 the real resolutions for C, Cu, and Au are compared for the Rayleigh criterion and for the pixel size defined by 80 and 90% of the encircled signal. Obviously, the appearance of the resolution minimum, as in Figure 27, is connected with criteria oriented to the total signal (like d_RMS) or to its major portion (like d_90). In the d_R curves the minimum is not present at all, and the d_nn curves exhibit the minimum (connected with a significant influence of SE2) only for a very high percentage nn. Already at nn = 80% the minimum disappears for the lightest element, and at lower nn it is not found either. The above analysis showed that the real resolution has to be assessed by means of criteria oriented onto the central peak of the total response
FIGURE 28. (a) Resolution d_R calculated from the IRF according to Equation (50) when the Rayleigh criterion is used (i.e., a drop of the IRF to 36.74% of the maximum; see Born and Wolf, 1975) for three elements, with d_R0 representing the first term in Equation (50) only; (b) resolution d_nn for nn = 80 and 90% of the signal encircled within the diameter defining the resolution, again with d_nn0 for the first term in Equation (50). Parameters of the model FEG SEM and Equation (37) were used.
function, i.e., criteria based on a certain decrease of the IRF with respect to its maximum or on some encircled signal. These criteria show only a small extension of the pixel size with respect to the primary spotsize, as exemplified in Figure 28. On the contrary, the statistical moments of the signal distribution in the specimen plane overestimate the influence of signal electrons having diffused to great distances, so that fully unrealistic figures appear at higher energies (see Figure 27). This indicates that even at low energies the conventional resolution tests can be used, provided their evaluation respects the above-mentioned circumstances.

E. Spurious Effects

The spurious effects influencing parameters of the electron probe in SEM are listed in Section II.C. Some of them are connected with the Coulomb forces acting between electrons moving within the beam, so the intensity of these effects depends on the beam energy. The main phenomena include probe size broadening owing to stochastic e–e interactions, broadening of the energy spread (the Boersch effect), and defocus or probe shift caused by the overall space charge. The probe broadening caused by stochastic interactions was studied by Spehr (1985). He found the spotsize enlargement proportional to the normalized beam current

k = [I/(I₀ α²)] (E₀/2E)^{3/2}   (51)
where I is the beam current, α is the angle of beam convergence, I₀ = 3.41 × 10⁴ A, and E₀ is the rest energy of the electron. This E^{-3/2} dependence is further enhanced by another factor that increases with decreasing energy with a progressively varying slope and cannot be characterized by a simple proportionality, but for short slow beams it behaves approximately as ln²(const · E^{-1}). Naturally, the final crossover at the specimen surface is the most critical one because the energy is lowest there. In cathode lenses, the beam aperture grows toward the specimen surface as α² ∝ E^{-1}, so that altogether we get a probe-broadening rate somewhere around E^{-1}. Mankos and Adler (2002) explored the problem of stochastic interactions for cathode lens configurations. Using precise tracing of particle bunches through calculated electric and magnetic fields for both electrostatic and compound lenses with non-overlapping retarding and focusing parts, they obtained the "blur" values for wide ranges of the beam current and current density. Being oriented to direct imaging in the PEEM mode, their data range is shifted to larger currents and lower densities than those corresponding to the SLEEM situation. We can extrapolate their data to our case, a probe current of 5 pA and a spotsize of 10 nm at the lowest energies, i.e., to a current density of 5 × 10³ mA cm⁻², and obtain a broadening of about 1 to 2 nm. Otherwise, a linear increase in the blur with decreasing E_P was found. As regards the increase in the energy spread owing to e–e interactions, we already mentioned the fundamental work of Rose and Spehr (1980). For the stigmatic focus they calculated the extra energy spread to be ⟨ΔE/E⟩ = 2πk (see Equation (51)) for low currents, so that ΔE ∝ I^{1/2}E^{1/4}. This result is independent of the beam aperture provided k ≪ 1 and α² ≪ 12E/E₀.
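The magnitude of the normalized current in Equation (51) is easy to evaluate; a minimal sketch for SLEEM-like figures (a 5 pA beam and a 1 mrad aperture, as used below) follows:

```python
# Normalized beam current of Equation (51); energies in eV.
I0 = 3.41e4        # A, normalization current
E0 = 511e3         # eV, electron rest energy

def k_norm(I, alpha, E):
    """k = [I / (I0 * alpha^2)] * (E0 / (2*E))^(3/2)."""
    return I / (I0 * alpha**2) * (E0 / (2.0 * E))**1.5

# 5 pA beam, 1 mrad convergence, 1 eV landing energy
print(k_norm(5e-12, 1e-3, 1.0))   # ~2e-2
```

The E^{-3/2} dependence means that raising the landing energy from 1 eV to 100 eV already lowers k by a factor of 1000.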
The second condition is easily satisfied, and for a beam current of 5 pA and an aperture of 1 mrad we get k ≈ 2 × 10⁻² at 1 eV, while for larger energies it further decreases as E^{-3/2}. Thus, the Boersch effect is not enhanced at low energies. The average space charge within the whole beam acts as a divergent lens causing some defocus of the primary spot. Spehr (1985) showed that for a constant current density across the beam and k < 10⁻², refocusing of the appropriate lens enables the spot broadening to be corrected with a negligible residual effect. For an electron beam with a Gaussian cross-section, a contribution to the spherical aberration is generated too, with a corresponding disc of confusion that, again for low currents, has a diameter d_e–e = 1.1 k D₀, where D₀ is the diameter of the beam-limiting diaphragm. Nevertheless, at the same time it is claimed that this deviation can also be corrected by readjusting the lens excitations. The proportionality to k, i.e., to E^{-3/2}, requires the effect to be listed here, although there are no reports
about its practical demonstration, so that successful correction via fine focusing can be believed. Important spurious effects in SEM are caused by the penetration of external electromagnetic waves into the column. These phenomena were reviewed by Frank and Müllerová (1999) and found negligible except for the beam deflection Δy caused by a radial magnetic field B_r, which amounts to Δy = (1/2) e B_r (2E/m³)^{1/2} τ², with τ as the time of flight across the region exposed to the magnetic field. The beam trajectory inside magnetic lenses is shielded relatively well by the magnetic circuits against spurious fields, so that the most exposed part is the trajectory along the working distance between the lower polepiece of the objective lens and the specimen. If this region is traversed by slow electrons at energy E, the time of flight is (1 + √k)/2 times longer than when the electrons are decelerated to E from the primary energy E_P along the same trajectory. This means that the beam deflection is reduced (1 + √k)²/4 times, i.e., for example 266 times for k = 1000. From this point of view, insertion of the cathode lens below the magnetic objective lens represents the optimum solution. Finally, let us recall the problem of mechanical vibrations. This issue is common to all types of SEM and its impact simply depends on the demagnification of the gun crossover. The component most sensitive to any vibrations is the cathode itself, but in a TEG SEM its movements are demagnified 10³ to 10⁴ times together with the crossover and become negligible. However, for a FEG operated at room temperature the necessary demagnification remains of the order of units, so serious problems arise unless the device is carefully insulated from vibration sources. This problem is not specific to the SLEEM mode.
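The quoted reduction of the spurious deflection can be checked directly; a small sketch, assuming deceleration from E_P = kE spread uniformly over the working distance:

```python
import math

def deflection_reduction(k):
    """Factor (1 + sqrt(k))^2 / 4 by which the deflection (proportional to
    the squared time of flight) shrinks when the beam crosses the working
    distance while decelerating from E_P = k*E instead of traveling at the
    constant low energy E."""
    return (1.0 + math.sqrt(k))**2 / 4.0

print(round(deflection_reduction(1000)))   # 266, as quoted in the text
```

At k = 1 (no retardation) the factor is 1, as it must be.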
F. Testing the Resolution

We have already mentioned that, even at low energies, the conventional tests of resolution, made with a specimen containing small particles of a heavy metal on a low-atomic-number substrate (most often gold on carbon), can be used provided their evaluation respects the signal composition of SE1, SE2, and BSE. For psychological reasons, it is desirable to extract from these tests numbers that approach very closely the calculated spotsize, without any enlargement owing to diffusion inside the specimen. This means the tests should be performed solely upon the SE1 signal. In previous sections we showed that in the low-energy range, where the lateral spread of the SE2 and BSE emission shrinks and approaches that of SE1, a progressively enhanced fraction of the SE release takes place within l_esc, and hence the SE1 signal relatively grows. It is reasonable to suppress the
distribution tail that appears in the edge width measurement by taking the thresholds far enough from the signal levels on the adjacent facets; the proven algorithm is to measure between levels of 25 and 75% of the signal rise. Reimer (1998) used MC simulations to model the resolution test for the emission distributions of all signals, and also for their integrals on one side of a moving straight edge, and verified the applicability of the 25/75 scheme. As we will discuss in the next section, the SE and BSE signals are detected together in the SLEEM mode, and therefore an extension of the lateral distribution of the total signal with respect to that of SE1 becomes even more probable. Nevertheless, practical experience has shown that the 25/75 rule is suitable here as well (see Figure 30). Another crucial circumstance, not taken into account with a conventional SEM, is the necessity of using a specimen that preserves a sufficient contrast throughout the full energy scale. It is believed that a difference in atomic numbers as large as that between Au and C should secure this. However, we see in Figures 13 and 16 that the signal yields already change their mutual relations drastically above 250 eV; other published data confirm this down to even lower energies, as Schmid et al. (1983) showed for eBSE. The contrast behavior of the standard Au/C specimen was verified in two instruments equipped with the same SLEEM detector and CL assembly described below. The microscopes differed mainly in the vacuum conditions; one had the usual medium vacuum (MV) of the order of 10⁻⁴ Pa and the other used clean UHV at about 2 × 10⁻⁸ Pa. As shown in Figure 29, at 320 eV the Au/C contrast is substantially lower than that at 3 keV for both devices, but at 20 eV the contrast fully disappears for the MV instrument, while in UHV it is inverted and quite high.
Surface contaminant layers, less transparent at 20 eV under the worse vacuum, might cause the difference, but as yet the interpretation is not fully clear (see Müllerová and Frank, 2003). Nevertheless, fine cracks are still apparent at the surface, the contrast of which is obviously due to enhanced electron absorption in deep cavities, and an appreciable signal rise at edges can be observed. Figure 30 shows a linescan across such an edge, taken at an electron energy of 10 eV, which demonstrates a resolution of 9.3 nm, to the authors' knowledge the best achieved to date (Müllerová and Frank, 2002). This value corresponds to the objective lens aberrations C_S = 33 mm and C_C = 15 mm, published by Takashima (1994) for a working distance (WD) of 6.5 mm, and to the WD of 8 mm used by us, which is larger than that appropriate for the guaranteed instrument resolution (1 nm at 15 keV). As regards the UHV SLEEM, the same measurement gave 11.5 nm at 10 keV and 26 nm at 10 eV.
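The 25/75 edge-width evaluation described above can be sketched as follows. The error-function edge is a synthetic stand-in for a real linescan, for which the 25–75% width of a Gaussian-blurred edge should come out as about 1.349σ:

```python
import math

def edge_width_25_75(xs, ys):
    """Distance between the 25% and 75% levels of the signal rise,
    found by linear interpolation on a monotonically rising linescan."""
    lo, hi = min(ys), max(ys)
    def crossing(level):
        t = lo + level * (hi - lo)
        for i in range(1, len(ys)):
            if ys[i - 1] <= t <= ys[i]:
                f = (t - ys[i - 1]) / (ys[i] - ys[i - 1])
                return xs[i - 1] + f * (xs[i] - xs[i - 1])
        raise ValueError("level not crossed")
    return crossing(0.75) - crossing(0.25)

# Synthetic edge: error function blurred with sigma = 3 (e.g., nm)
sigma = 3.0
xs = [-20 + 0.05 * i for i in range(801)]
ys = [0.5 * (1 + math.erf(x / (sigma * math.sqrt(2)))) for x in xs]
print(edge_width_25_75(xs, ys))   # ~4.05, i.e. about 1.349 * sigma
```

On real data the same interpolation is applied between the signal plateaus on the two facets adjacent to the edge, which is exactly how the thresholds suppress the diffuse SE2/BSE tail.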
FIGURE 29. Micrographs of the standard resolution-testing specimen with Au particles on a carbon substrate, taken (from the top) at 3020, 320, and 20 eV. Left column: dedicated UHV SLEEM of ISI Brno; right column: JEOL 6700F adapted for SLEEM. The width of the field of view is 100 mm (top left) and 200 mm (top right). (Reprinted with permission from Müllerová and Frank, 2003.)
VI. DETECTION AND SPECIMEN-RELATED ISSUES
This review concentrates on SEM modes employing primary beam retardation close to the specimen. In the majority of instances the retarding field is also traversed by the signal electrons in the opposite direction so that these are accelerated and, if the field has its axial component strongly
FIGURE 30. Linescan across an edge in a micrograph of the Au/C specimen, taken at an energy of 10 eV (JEOL 6700F adapted for the SLEEM method), with the edge width indicated.
prevailing, also collimated toward the axis. In the cathode lens configurations the specimen itself becomes one electrode of the electron optical system. These facts have a decisive impact on the choice of the detection principles to be used. First of all, the classical Everhart–Thornley detector, extracting slow SE by a lateral electric field, cannot be used, because the emitted electrons are accelerated along the axis and also because the slow primary beam could be undesirably affected. Hence the objective lens, with its focusing and retarding parts, has to be considered together with the detector. Further, owing to the acceleration of the signal electrons, the crucial difference between typical SE and BSE energies is shifted, so that usually both appear in the same order of magnitude. Separate detection of SE and BSE via conventional methods is then no longer efficient, and novel detection principles are needed. Finally, the specimen surface parameters, the roughness in particular, have to be considered for the cathode lens assemblies.

A. Detection Strategies

In previous sections we frequently compared the properties of the immersion objective lens (IOL) in its general form with the particular case of w = 0, in which the retarding part is called the cathode lens. It is worth continuing this comparison here: as demonstrated in Figure 31, the formation of the "signal beam" is significantly different in the two cases.
FIGURE 31. Trajectories of electrons in the electrostatic immersion lens (a) and the cathode lens (b); the potential difference within the lens is 10 kV; the energies of electron emission are 5 eV (bottom half of the bundles) and 200 eV. The simulation was made using the SIMION 3D package (Dahl, 1995). (Reprinted with permission from Müllerová and Frank, 1999.)
For the IOL in the low-energy range, a non-negligible part of the BSE impinge on the first electrode and can be detected in this plane, while the SE emission is concentrated into a bundle that is focused into a crossover and then spreads again. This formation of an image of the emitting pixel is further supported by the focusing part of the IOL. Thus, the SE beam can be acquired even around the axis, in a suitable plane above the IOL, with a detector type normally used for BSE. The arrangement (see Figure 22) is then similar to the so-called "upper" SE detector utilized for acquisition of the SE beam from a specimen immersed in the magnetic field of the objective lens, which collimates the SE emission toward the flux lines of the field (see Kruit and Lenc, 1992). Here the SE beam is already accelerated, so its detection is easier. The upper SE detector is usually situated above the deflection stage, so that the deflection action also influences the trajectories of the signal electrons, and a general issue here is to minimize or avoid the escape of signal through the detector bore. We will discuss this problem further in the next sections (see Figures 40 and 46). If a CL is used (Figure 31(b)), the signal electrons are collimated into a diverging beam the width of which depends on the emission energy and the CL parameters. If we solve the classical equation of motion for an electron emitted with initial energy E_e under an angle θ with respect to the surface normal into the CL field, within which it is accelerated to the energy E_P, we get for its radial coordinate r_a at the end of the field, i.e., in the anode plane,

r_a = [2l/(k_e − 1)] sin θ [(k_e − sin²θ)^{1/2} − cos θ]   (52)
MÜLLEROVÁ AND FRANK
where ke = EP/Ee is the immersion ratio for the emitted electron, defined analogously to k. The entire emitted bundle for θ ∈ (0, π/2) is then concentrated into a spot of radius

ra,max = 2l/√(ke − 1).   (53)
The angle of passage through the anode plane, θa, is given by sin θa = sin θ/√ke, so that the angular aperture of the bundle is θa,max = arcsin(1/√ke). When drawing a ray backward from the anode plane, we find it crossing the optic axis, according to its emission angle, between l and l(√ke − 1)/(√ke + 1) behind the cathode, the latter position corresponding to the paraxial ray. This virtual source is further imaged by the aperture lens in the anode plane and by the focusing part of the IOL. For secondary electrons with a characteristic energy of 3 eV (see Section IV.D) and for typical values l = 7 mm and EP = 15 keV, we get ra,max = 0.2 mm and θa,max = 14 mrad. The beam of backscattered electrons is formed according to the BSE energy, which depends on the landing energy of the primary beam, but at very low energies figures similar to those for SE are obtained. Obviously, detection made below the focusing lens has to be extended up to the close vicinity of the optic axis, or again the through-the-lens principle has to be incorporated. The previous simple considerations indicate a general problem: we have a narrow signal beam along the axis, at least partially escaping detection through the central bore left for the primary beam. The signal losses can be reduced if the signal beam can be broadened again within a suitably arranged electric field, as is done in the EDOL-type lens already mentioned; we will return to this arrangement in the next section. The issue can be fully solved via deflection by means of crossed electric and magnetic fields, i.e., by a so-called Wien filter or E×B filter (see Figure 32). The electric and magnetic forces subtract for the primary beam direction, but for the opposite signal beam direction they add and cause its deflection toward the detector.
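As a numerical sanity check of Equations (52) and (53) and the angular relations above, the following short Python sketch (our own illustration; function names and unit conventions are ours, not from the original) evaluates the formulas for the typical SE values quoted in the text (Ee = 3 eV, l = 7 mm, EP = 15 keV):

```python
import math

def r_a(theta, l, Ee, EP):
    """Radial coordinate in the anode plane, Eq. (52): electron emitted
    with energy Ee at angle theta to the surface normal, accelerated to
    EP over the cathode-lens field length l (lengths in mm, energies in eV)."""
    ke = EP / Ee  # immersion ratio of the emitted electron
    return (2.0 * l * math.sin(theta) / (ke - 1.0)
            * (math.sqrt(ke - math.sin(theta) ** 2) - math.cos(theta)))

def r_a_max(l, Ee, EP):
    """Spot radius of the entire emitted bundle, Eq. (53)."""
    return 2.0 * l / math.sqrt(EP / Ee - 1.0)

def theta_a_max(Ee, EP):
    """Angular aperture in the anode plane: sin(theta_a) = sin(theta)/sqrt(ke)."""
    return math.asin(math.sqrt(Ee / EP))

l, Ee, EP = 7.0, 3.0, 15e3  # typical SE case from the text

# Eq. (53) is just Eq. (52) evaluated at grazing emission, theta = pi/2:
assert abs(r_a(math.pi / 2, l, Ee, EP) - r_a_max(l, Ee, EP)) < 1e-9

print("r_a,max     = %.3f mm" % r_a_max(l, Ee, EP))             # ~0.198 mm, i.e. the quoted 0.2 mm
print("theta_a,max = %.1f mrad" % (1e3 * theta_a_max(Ee, EP)))  # ~14.1 mrad
```

The computed values reproduce the figures quoted above (0.2 mm and 14 mrad), confirming that Equation (53) is the grazing-emission limit of Equation (52).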
The Wien condition for equality of electric and magnetic forces can be easily fulfilled for the homogeneous parts of the fields but it is more difficult to satisfy for the spurious fields at the margins of electrodes/polepieces. In addition, any spread in electron velocities causes beam dispersion. This is why two identical but oppositely oriented filters are often incorporated so that the primary beam passes both and any undesired modifications are mutually compensated while the signal beam escapes between the filters (see Figure 46). In order to further minimize any influence on the primary beam, the Wien filters can be made
FIGURE 32. Principle of the beam deflector employing crossed electric and magnetic fields.
weak, just sufficient to deflect the signal beam to where it can enter some other electric field, not penetrating to the optic axis, that extracts it strongly toward the detector; see, for example, Figure 33; the same principle is shown in Figure 46. Various modifications of detector assemblies containing Wien filters have appeared in the literature since the 1980s (e.g., Schmid and Brunner, 1986; Brunner and Schmid, 1987; Reimer and Kässens, 1994; McKernan, 1998). Zach and Rose (1986) studied the influence of the filter aberrations on the primary beam and proposed using fields of higher order than dipole. A detailed study of the filter properties, including fringing fields, was presented by Kato and Tsuno (1990). A significant effort has also been invested in shifting the range of efficient operation of BSE detectors down to lower energies, i.e., in pushing the traditional threshold of 2 to 3 keV as far down as possible. The improvements include both technological advances in the preparation of scintillator surfaces and the introduction of extraction fields arranged so that secondary electrons are still not collected. Important studies include those of, for example, Autrata and Hejna (1991), Autrata et al. (1992), Hejna (1994), Autrata and Schauer (1994), and Hejna (1998). New detector principles have emerged that employ sophisticated arrangements of electric fields, created by electrodes situated within the magnetic lens bore and gap, permitting a wide range of manipulations of the emitted electrons and extending the scope of operation modes of the upper SE detector. These operation modes include collimation of the SE beam for enhanced detection efficiency, reflection of a portion
FIGURE 33. Scheme of the double Wien filter with intermediate electrostatic mirror for deflection of the signal beam in an SEM. The bundle of SE trajectories is shown for an emission energy of 4 eV and emission angles within 70°; Ub = 10 kV. (Reprinted with permission from Pejchl et al., 1994.)
of the SE in order to achieve the charge balance at a nonconductive surface without charging, and, in combination with a moderate specimen bias, conversion of accelerated SE, impacting a converter surface, to tertiary electrons that are detected normally. In this way, good-quality micrographs were obtained down to 100 eV (Kazumori, 2002). An analogous configuration of electrodes, combined with the E×B filter, enables one to control the proportions of SE and BSE in the mixture detected by a single (upper) detector. In this configuration, BSE are converted to SE and the electrode biases can discriminate between SE from the specimen and those from the converter surface. These combinations with crossed E×B fields appeared for the first time in the early 1990s (see Sato et al., 1993). Inevitably, arrangements of all electrodes are published only schematically because details are considered confidential. Nevertheless, the available CAD systems do enable one to tailor the above-specimen fields in various ways and to optimize the detection efficiency for individual portions of the energy and angular distributions of the total electron emission.
B. Detectors
In the previous section we listed the possible approaches to detection of electrons in systems with retarding-field elements. Now we will describe several actual configurations. For IOL setups, i.e., with the retarding field not directly applied to the specimen, the variety of detector assemblies is very broad. For SE detection, they mostly rely on the through-the-lens principle and detect SE either with a coaxial scintillator-type detector or by deflecting them with an E×B element toward a side-attached detector similar to the ET type. As regards BSE detection, conventional assemblies below the IOL polepiece are utilized, but novel approaches have also appeared that include conversion of BSE to SE and detection of a controllable mixture of SE and BSE, as mentioned above. Let us recall here the simple EDOL arrangement published by Zach and Rose (1988) (see Figure 34(a)), in which an accelerating electrostatic lens is employed, designed so that the conical electrodes generate significant radial components of the field. Within the first, accelerating part of the lens the SE emission is collimated into a beam, while in the decelerating part the beam is, owing to the radial forces, appropriately broadened so that it hits the annular detector with a reasonably large central bore. In the configuration of Figure 34(b), the decelerating part of an electrostatic lens is similarly utilized to broaden the beam, but here the lens is of the decelerating type and hence its first part is employed for this purpose. A further difference consists in using the conical electrode as an electron converter, transforming accelerated signal electrons to tertiary electrons that are directed by the radial field toward a microchannel plate (MCP)-based detection assembly (Frank et al., 2000a).
In this setup, the collection efficiency for 10 eV signal electrons, i.e., the probability of their impact on the converter surface, was calculated to exceed 98%, and after conversion and passage through the MCP, 35% of the emitted electrons still create signal pulses. A setup on this principle, an electrostatic detector lens (EDL), can be utilized more widely as it enables one to introduce a segmented or even two-dimensional collector below the MCP and to acquire data about the angular distribution of the emission. For completeness, let us also recall the MEDOL configuration (Figure 22), in which the signal beam is projected onto the detector by the combined action of both components of the compound lens. Now we will deal specifically with detection in systems that employ a cathode lens, particularly those based on adaptation of a conventional SEM. The authors' first experimental arrangement contained the detector assembly shown in Figure 35(a) (Müllerová and Frank, 1993) in which, on the surfaces of the diaphragm and lower polepiece of the objective lens (OL),
FIGURE 34. (a) Scheme of the EDOL arrangement with an accelerating lens that broadens the signal beam inside its decelerating second part (electrode potentials are shown together with electron energies in parentheses). (Reprinted with permission from Zach and Rose, 1988.) (b) Similar principle combined with a converter of accelerated signal electrons into tertiary electrons detected by a multichannel-plate-based assembly (L, lens; Conv, converter; CPI and CPO, input and output of the MCP, respectively; Coll, collector; A, anode; Sp, specimen). (Reprinted with permission from Frank et al., 2000a.)
the accelerated signal electrons are converted to tertiary electrons, which are attracted to a conventional ET detector with the front grid removed in order to allow the scintillator field to penetrate toward the axis, as shown in Figure 35(b). The advantages of this type include very low price and easy realization, but the drawback is the quite large working distance that is necessary. In fact, the same arrangement was introduced by manufacturers in the form of the upper detector with converter (see above), but for an easy adaptation in users' laboratories the space above the SEM objective lens was, of course, not accessible. Another successful design employed the single-polepiece magnetic lens situated below the specimen (see Figure 26(b)), now completed with the anode of a CL and a BSE detector with a YAG crystal; the scheme of the assembly is shown in Figure 36. In this setup, a micrograph with a resolution of 80 nm at 0.5 eV electron energy was obtained for the first time (Müllerová and Lenc, 1992b). This type is hardly suitable for adaptation of classical SEM instruments but was used, for example, in a specialized low-energy SEM for inspection of semiconductor structures at a landing energy of 800 eV and a primary energy of 20 keV (Meisburger et al., 1992). The most successful arrangement to date is shown in Figure 37. A crucial component is the YAG:Ce³⁺ single-crystal scintillator disc with a small central bore of depth and diameter 300 μm, side-attached to a light guide
FIGURE 35. Configuration of the SLEEM mode detector utilizing conversion of accelerated signal electrons on surfaces of the diaphragm and lower polepiece of an OL, and extraction of tertiary electrons toward a conventional ET detector (left); equipotential surfaces within the assembly when the front grid of the ET detector is removed (right). (Reprinted with permission from Frank and Müllerová, 1999.)
FIGURE 36. A combination of the cathode lens with a single-polepiece lens situated below the specimen.
FIGURE 37. The CL/detector assembly with a YAG:Ce³⁺ single-crystal scintillator. (Reprinted with permission from Frank and Müllerová, 1999.)
made of organic glass (for standard vacuum applications) or of quartz (for bakeable UHV instruments). The bore size was tuned to a balance between reasonable dimensions of the field of view and successful acquisition of very-low-energy electrons collimated toward the close vicinity of the axis. As shown in Figure 38, for one typical set of dimensions (used in the experiment described in Section VIII.E) and for normal impact of PE, signal electrons are detected above 0.5 eV of emission energy. This configuration is similar to the so-called Autrata-type BSE detector (Autrata, 1989), but important differences are the much smaller central bore and the related necessity for fine adjustment of the crystal position along all three axes. Fortunately, this adjustment is decisively facilitated by the fact that the upper crystal surface is also active, so that the detector bore is directly observed on the SEM screen (see Section VII.C). The bore shape with a 45° sink is dictated by several issues that include requirements of the boring technology, feasibility of a conductive coating of the inner bore wall, and an advantageous axial field distribution. It is obvious from Equation (42) that one term in the relation for CS, namely the aberration of the anode field, is inversely proportional to the bore diameter D, and hence D should not be made too small.
FIGURE 38. The maximum emission angle θm of an electron with emission energy Ee for which the electron still escapes through the central bore. Solid line: data calculated from Equation (52) for ra = 0.15 mm, l = 11 mm, EP = 10 keV; squares: exact values obtained via trajectory simulations using software described by Lencová and Wisselink (1990). (Reprinted with permission from Frank et al., 1999.)
FIGURE 39. Derivative ∂φ/∂z of the axial potential distribution φ(z) for the three anode shapes outlined in the inset (calculated using software described by Lencová and Wisselink, 1990).
The shape shown in Figure 37 produces an axial potential distribution that resembles more closely the distribution corresponding to the outer diameter of the sink than that of the inner one (Zobačová et al., 2003); see Figure 39.
FIGURE 40. Scheme of trajectories of primary and signal electrons in a CL-equipped SEM with an upper detector.
With regard to adaptations of commercial SEMs to the SLEEM method, we should also mention an alternative employment of the upper SE detector, either in a coaxial arrangement or a side-attached one with an electron converter, when it is combined with a CL operated between the specimen and the lower polepiece of the OL. Here we gain the space required for the detector as in Figure 37, so that a shorter working distance can be attained. In Figure 40, a sketch of the electron trajectories is shown for the case when none of the above-mentioned additional electric fields for manipulation of the signal electrons is present. All previously described setups are intended for adaptation of conventional SEMs, which naturally have their full columns at ground potential, so any retarding field can be created only via a specimen bias. This can impose serious difficulties, particularly for instruments equipped with an air-lock for insertion of a specimen cartridge. In any case, specimen biasing burdens routine operation with extra tasks and checks. It is much more convenient to use the booster principle with a positively biased central tube that creates the necessary retarding field even toward a specimen at ground potential; the next section will address these questions. Here again we should recall the family of IC testers, in which CL/detector assemblies are increasingly used and designed subject to an extra requirement that is not so important for other SEM applications, namely
that the beam current be as high as possible in order to achieve a high throughput in the production checks (see Section VII.B).
C. Signal Composition
Within the CL field the whole energy spectrum of emitted electrons, outlined in Figure 11, is accelerated by the potential difference between specimen and anode. For larger CL fields, the energies of the SE and BSE are of the same order of magnitude and no type of detector can efficiently separate them unless a true energy filter is incorporated. The particular composition of the signal mixture depends on the parameters of the fields and on the geometry, and here we can show only one typical example. Let us consider the CL/detector assembly according to Figure 37 with a YAG:Ce³⁺ crystal, for which Autrata and Schauer (1998) published the detection quantum efficiency (DQE), defined as the squared ratio of the signal-to-noise ratios (SNR) at the scintillator input and output. For a cosine distribution of both SE and BSE emissions, the portion of electrons hitting the scintillator (the collection efficiency) is simply

P = IDET/I = sin²θmax − sin²θmin   (54)
where θmax and θmin correspond to the marginal rays incident on the scintillator, which can be determined from Equation (52). Figure 41 shows P and DQE for both SE and BSE, plotted with respect to the landing energy of the PE, together with the detection weight

WBSE/SE = (PBSE DQEBSE)/(PSE DQESE).   (55)
This obviously favors BSE in a ratio between 5:3 and 4:3. Since the SE yield usually surpasses that of the BSE, the signal mixture, governed by the weight WBSE/SE, represents a more or less balanced combination of both components. The graph in Figure 41 depends on the particular values of three parameters, EP, D/l, and D0/l (D0 being the outer scintillator diameter), but similar results for WBSE/SE are obtained for other reasonable combinations of these factors. With below-the-lens detectors the BSE/SE ratio cannot be efficiently controlled and generally a sum of both signals is obtained. By contrast, setups employing the upper detector open possibilities for controlling the signal ratio, at least down to about 200 eV where SE and BSE cease to be
FIGURE 41. Collection efficiency P for SE emitted at a mean energy of 3 eV, for BSE emitted at 75% of the landing energy of the PE (above 50 eV), and for eBSE below 50 eV, calculated for the assembly shown in Figure 37 with a crystal outer diameter of 10 mm, a bore diameter of 0.3 mm, and a field length l = 7 mm, together with the DQE values corresponding to the various impact energies and with the weight WBSE/SE according to Equation (55).
distinguishable even in the energy spectrum of emission—see simulation results of Kuhr and Fitting (1998).
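To illustrate how Equations (52) and (54) combine into a collection efficiency, the following Python sketch (our own scaffolding, not taken from the cited works) estimates P for SE in the approximate geometry of Figure 41; for simplicity it treats the scintillator as lying in the anode plane, which neglects the drift between anode and crystal:

```python
import math

def r_a(theta, l, ke):
    """Radial landing position in the anode plane, Eq. (52)."""
    return (2.0 * l * math.sin(theta) / (ke - 1.0)
            * (math.sqrt(ke - math.sin(theta) ** 2) - math.cos(theta)))

def theta_at_radius(r, l, ke):
    """Emission angle whose ray lands at radius r; r_a grows monotonically
    with theta on (0, pi/2) for large ke, so simple bisection suffices."""
    if r >= r_a(math.pi / 2, l, ke):
        return math.pi / 2  # radius beyond the reach of the whole bundle
    lo, hi = 0.0, math.pi / 2
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if r_a(mid, l, ke) < r:
            lo = mid
        else:
            hi = mid
    return hi

def collection_efficiency(r_in, r_out, l, ke):
    """P = sin^2(theta_max) - sin^2(theta_min), Eq. (54), for a cosine
    emission distribution and an annular detector spanning r_in..r_out."""
    t_min = theta_at_radius(r_in, l, ke)
    t_max = theta_at_radius(r_out, l, ke)
    return math.sin(t_max) ** 2 - math.sin(t_min) ** 2

# Figure 41 geometry: bore radius 0.15 mm, crystal radius 5 mm, l = 7 mm;
# SE at a mean emission energy of 3 eV, primary energy 10 keV.
P_SE = collection_efficiency(0.15, 5.0, 7.0, 10e3 / 3.0)
print("P_SE = %.2f" % P_SE)  # the remainder escapes through the central bore
```

Since ra,max for 3 eV SE is only a fraction of a millimeter, every SE that clears the bore hits the crystal, and the losses come entirely from the central bore; the DQE factors of Equation (55) would then have to be taken from the published data of Autrata and Schauer (1998), which are not reproduced here.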
D. Specimen Surface
One important characteristic of an IOL is the magnitude of the electric field penetrating toward the specimen surface. For the CL as the extreme case, the surface field is the maximum retarding field used, while IOL arrangements with a nonzero working distance w, i.e., with the specimen electrically connected to the first electrode, are usually regarded as situating the specimen in a field-free space. Let us assess the field penetration out of the retarding lens for a simple case of flat electrodes of the same thickness t and identical bore diameter D, situated at distances w and w + l from the specimen, as in Figure 21. For three particular IOL geometries we show in Figure 42 the ratio of the surface axial field to the maximum axial field within the lens, plotted versus w/l, together with the immersion ratio k for which the IOL alone focuses the beam onto the specimen surface. We see that for very low energies, i.e., say for k ≥ 500, the surface field does not drop below 10% of Ez,max. Moreover, if a magnetic lens contributes to the probe focusing, the working distance shortens further. If both lenses are of equal optical strength, the surface field is Ez ≅ 0.5 Ez,max.
FIGURE 42. The axial field strength on the specimen surface referred to the maximum field strength within an IOL, and the immersion ratio k for focusing the probe by the electrostatic lens only, plotted for three configurations of the IOL with flat thin electrodes: (a) D = l, t = 0; (b) D = l, t = 0.1l; (c) D = l/2, t = 0.1l (for symbols see text).
Obviously, the field penetration has to be taken into account even for an IOL with a nonzero working distance. To improve the situation, the lens electrodes have to be shaped and their bores tuned to create an axial potential distribution falling as sharply as possible in the close vicinity of the specimen. Powerful simulation software packages are available for this task; the optimization procedures were studied by Preikszas and Rose (1995). We will now address the specimen surface roughness. While no problems are expected or appear with an IOL, it has been repeatedly argued that a CL is unsuitable because the specimen surface would have to be very smooth, if not polished like that of an electrode. Practical experience shows that the real demands are not so strict, although application of the method is without any doubt restricted to observation of flat specimens. Naturally, the maximum applicable field between specimen and anode is limited by the danger of a discharge, which increases with the specimen roughness. The imaging process alone also requires that the CL field be homogeneous up to the very specimen, which can be so only for smooth surfaces. The tolerable roughness depends on the shapes of protrusions and depressions and on the field strength. Any radial forces, connected with "waving" of the equipotential surfaces above the surface relief, shift or smear the primary probe and locally deteriorate the image. Nevertheless, the above-specimen equipotentials do not simply copy the surface itself but depend also on the distribution of surface dipoles and any trapped charges.
FIGURE 43. Si (100) substrate with heterogeneously etched trenches of both width and depth 3 μm, with some traces of Cu decoration, imaged (from the left) at 5 keV, 250 eV, and 1 eV; primary beam energy = 10 keV, CL field = 1.5 kV mm⁻¹, width of the field of view = 20 μm.
To the authors' knowledge, no study focusing on this issue has been published as yet. In Figure 43 we see one practical example of a specimen with known shape and dimensions of the surface relief features, and we can verify that even in the 1 eV micrograph no traces of local deterioration at the trench edge are apparent. Other similar experiments also led to the conclusion that a surface relief up to a peak-to-peak height of a few μm can be tolerated in the SLEEM mode at moderate CL field strengths not exceeding 2 to 3 kV mm⁻¹. In Figure 43 we can also notice a strong shortening of the depth of focus with decreasing electron energy. The restriction to a relief not exceeding a few μm is more or less in accordance with the depth of focus, which is significantly shortened here owing to the enlargement of the beam aperture within the CL by approximately √k times. Otherwise, the limited depth of focus is felt to be the most important disadvantage of the method and can be at least partly suppressed only at the cost of resolution, as indicated in Figure 25. Finally, we recall one more of the traditional misgivings regarding the SLEEM/LEEM methods, namely that UHV conditions and atomic cleanliness of the surface are unavoidable because of extreme surface sensitivity. It is true that at around 50 eV of landing energy, where the minimum penetration depth of PE is reached, the surface sensitivity is high and the image contrast is dominated by that of surface contaminants. However, above and particularly below this threshold the electron penetration grows steeply, so that at a few eV it is comparable to that at tens of keV. This means that the method itself does not put any demands on the surface cleanliness, but such demands might follow from the phenomena to
be observed. For example, phenomena connected with surface crystallinity, reconstructions, phase transformations, etc., can take place solely on surfaces free of amorphous contaminants that might prevent the atoms from arranging according to the distribution of forces inherent in the crystal.
E. Specimen Tilt
When the specimen is not immersed in an electric field, normal conditions for specimen tilting can be expected irrespective of whether the retarding field is used or not. When only a weak field penetrates the anode bore toward the specimen, some balance has to be sought between the image shift and deterioration on one side and the tilt angle on the other. In the CL, any specimen tilt introduces a lateral field component that is not considered in the simple electron-optical theory outlined above. A small specimen tilt can easily be caused by imperfect fixing or sticking of the specimen to the holder, so it is advantageous to have the specimen stage equipped with a double-tilt facility, at least within a small range. However, it is also interesting to explore the limits of an intentional tilt, introduced in order to have the primary beam incident at an angle. Let us now look at the consequences of a moderate tilt by an angle ω. Taking the anode of the CL in the plane z = 0 at potential φ = 0 and the cathode at z = l at potential φ = φ0, the CL field, for a tilt about the y axis, is modified to

φ(x, z) = φ0 z/(l + x tan ω).   (56)
A solution of the equations of electron motion in this field was presented by Frank et al. (1999). The lateral component of the CL field causes some shift of the primary spot in the tilt direction. For small values of ω this shift amounts to

Δ ≅ 2ωl √k (k^{3/2} − 1)/[3(k − 1)²],   (57)
i.e., for very large k it tends to (2/3)ωl. This magnitude of shift is quite significant, so that when tilting the specimen intentionally, we have to do so in very small steps and to correct the position of the field of view in between. A further effect of the tilt is a unidirectional smearing of the primary spot owing to the lateral field created. In scanning devices, it is reasonable to express the image resolution via the number N of spotsizes filling the field of
view and to compare this figure with the number of pixels acquired. Naturally, the optimum operation mode is achieved when these two numbers are equal. Normally, the spotsize and the size of the field of view are to a first approximation independent, so that the optimum mode for a given number of pixels can be adjusted via the magnification. Nevertheless, imaging with some excess nonutilized information or with insufficient information density is also available at magnifications below or above the optimum, respectively. In other words, the number of spotsizes within the field of view is inversely proportional to the magnification, and for any number of pixels acquired some optimum size of the field of view exists. In our case the beam inclination, generated by the deflection coils in order to reach an off-axis pixel, plays its role in the spotsize deformation, so that the optimum number of pixels N results directly from the configuration data as

N ≅ 3/(7 √k θc ω).   (58)
This number corresponds to the margin of the field of view, while toward the optic axis the spotsize diminishes linearly toward its original dimension for no tilt. Taking typical values for the very-low-energy range, say k = 1000 and θc = 2 mrad, and considering a 1° tilt, we get N = 407, i.e., a number on the edge of acceptability. An additional consequence of the tilt is the oblique impact of the primary beam, which is deflected by the lateral field component. Notice that this change in the impact angle is homogeneous within the field of view and has nothing in common with the inclination due to beam rocking around the pivot point of the scanning system. For a certain deceleration, defined by a value of k, we get some tilt angle ωmax for which the illumination becomes glancing:

ωmax ≅ (1/k) √[3(k − 1)/2].   (59)
The independence of ωmax of any other parameter offers an advantageous possibility of calibrating the true tilt scale according to the specimen bias causing a near-glancing impact, easily recognized from the long shadows of relief details. Otherwise, the impact angle can be assessed from Figure 44; this plot depends solely on EP, which is here 10 keV. Obviously, the impact angle is, except in the near-glancing situation, well approximated by ω√k. Even very small tilts of tenths of a degree can, at very low energies, secure a full scale of impact angles, which is important for applications connected with the acquisition of the diffraction or energy-band contrasts.
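The tilt relations are easy to explore numerically; the short Python sketch below (our own illustration) evaluates Equation (59) and the approximate impact angle ω√k for the immersion ratio used in the text, k = 1000:

```python
import math

def omega_max(k):
    """Tilt angle at which the illumination becomes glancing, Eq. (59)."""
    return math.sqrt(1.5 * (k - 1.0)) / k

def impact_angle(omega, k):
    """Approximate beam impact angle for a small specimen tilt omega,
    valid away from the near-glancing situation: alpha ~ omega * sqrt(k)."""
    return omega * math.sqrt(k)

k = 1000.0  # immersion ratio, e.g. EP = 10 keV and a landing energy of 10 eV
deg = math.pi / 180.0

print("omega_max = %.2f deg" % (omega_max(k) / deg))                          # ~2.2 deg
print("alpha(0.2 deg tilt) = %.1f deg" % (impact_angle(0.2 * deg, k) / deg))  # ~6.3 deg
```

With k = 1000 a mechanical tilt of only about 2° already produces glancing incidence, and a tilt of 0.2° yields a roughly 6° impact angle, consistent with the remark above that tilts of tenths of a degree span a full scale of impact angles at very low energies.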
FIGURE 44. Dependence of the beam impact angle on the landing energy for the specimen tilt angles shown (EP = 10 keV); exact values are shown together with those approximated by ω√k. (Reprinted with permission from Frank et al., 1999.)
The above analysis showed that even with the CL the specimen can be tilted sufficiently to observe phenomena or features that require a tilt for this purpose. Moreover, the mechanical tilts necessary for large impact angles of the beam are only tolerably destructive as regards the image quality.
VII. INSTRUMENTS
A. Adaptation of Conventional SEMs
As noted in the introduction, many attempts to design and build an SEM incorporating the retarding principle have been made, and the majority of them can be classified as adaptations of conventional instruments. Müllerová and Lenc (1992a) reviewed the older works and we will not repeat this literature survey here. This section discusses some recently applied approaches without aiming at a complete summary of the relevant publications. It is obvious that in a conventional SEM the way of providing beam deceleration in front of the specimen is usually restricted to biasing the specimen to a high negative potential. An additional electrode, placed above the specimen and electrically connected to it, enables one to arrange a nonzero working distance of an IOL, while without such an electrode we get
the CL. The latter alternative has proved successful in all the groups of application tasks listed below in Section VIII, and experience has been collected from adapting microscopes of all major manufacturers. After gaining preliminary experience with the configuration shown in Figure 35, the setup of Figure 37 was repeatedly installed because of its shorter working distance, easy alignment, and superior efficiency of signal collection, albeit at a significantly higher price. Up to now, in all versions YAG:Ce³⁺ single-crystal scintillators were used, with a thickness of 2 or 2.5 mm and an outer diameter between 10 and 20 mm. The light-guide shape is partly dictated by the arrangement of the OL polepiece and of the xy stage, but otherwise some space exists for optimizing the shape with respect to the light transfer from the scintillator to the side-attached light guide (see Schauer and Autrata, 1998). It is recommended to make the detector retractable so that the resulting restriction of the field of view is not retained in other operation modes. It is sufficient to provide specimen insulation for about 15 kV, while for practical microscopy a primary energy of 10 keV is most suitable. In this case the overall working distance should be not less than 8 mm, of which 5 mm is left for the CL field. In SEM instruments without specimen loading via an air-lock, it is usually easy to design an insulating insert cup for the standard specimen cartridge so that the specimen holder connects to a high-voltage contact via a pin passing along the stage axis. This design leaves intact all specimen movements, including rotation. With air-locks the specimen cartridge is usually side-inserted, so that some additional mechanism is needed for contacting the already loaded specimen from its lower side. Alternatively, the contact can be connected in the direction of the specimen loading and a flexible cable used between the specimen stage and the feedthrough.
These designs have to be tailored to a particular instrument and to its full setup with all options and attachments. The specimen biasing itself needs a negative supply with a ripple not exceeding 10⁻⁵ of the output voltage, finely adjustable in steps corresponding to 1 eV or less in landing energy. Because the supply is usually operated near its maximum output voltage, the parameters should be assessed at full scale. Generating the low landing energy of electrons via the difference between two high voltages is obviously not an optimum solution, and not only because the instabilities of the two supplies add; it would be more advantageous to apply a voltage directly between the gun cathode and the specimen. However, for CL applications this supply would have to be designed to pass smoothly through zero voltage to the opposite polarity, e.g., during alignment, a feature not available with commercial laboratory supplies. The alternative and better solution, consisting in using the beam booster (an insulated and positively biased tube around the optical axis between the
SCANNING LEEM
401
FIGURE 45. Possible design of a booster inserted into a magnetic pinhole lens. (Adapted with permission from Beck et al., 1995.)
gun anode and the bottom of the OL, see Figure 45), is mostly reserved for dedicated instruments, but an adaptation of this type to a conventional SEM has also been reported (Plies et al., 1998). Let us state explicitly an obvious fact: with a commercial SEM containing the booster, the adaptation to the CL mode might be restricted to mere removal of the final electrode of the IOL, or to its connection to the upper electrode; in this way the retarding field is shifted to the space between the end of the booster and the specimen.
B. Dedicated Equipment

To the authors' knowledge, among general-purpose SEM instruments marketed at present no IOL-containing dedicated device exists except the 1500 Series SEM of LEO Electron Microscopy with the Gemini lens shown in Figure 22 (see http://www.leo-usa.com). The new JSM-7400F of JEOL already incorporates specimen biasing, but only to 1 to 2 kV, and its declared purpose consists solely in accelerating the SE toward an electron converter where they are transformed to tertiary electrons detected by the upper detector (Kazumori, 2002). Probably all recent IC testers, i.e., specialized SEMs with very high beam current and a scope of operation modes tailored to the inspection of semiconductor structures, including measurement of critical dimensions and local voltages, generation and sensing of electron-beam-induced currents, and operation with surface charges compensated by controlled
402
MÜLLEROVÁ AND FRANK
FIGURE 46. High-probe-current, low-energy SEM column equipped with a single-pole condenser lens (SPCL) and a single-pole objective lens (SPIOL) (see text for details). (Reprinted with permission from Beck et al., 1995.)
back-streaming of SE, rely on immersion objective lenses (see, e.g., Meisburger et al., 1992; Miyoshi et al., 1999). Let us describe two of these solutions in more detail. Beck et al. (1995) summarized the problems connected with the formation of high-current low-energy probes and designed the dedicated column shown in Figure 46. It incorporates a central booster at +9 kV (consisting of the gun anode and electrodes 1 to 4), so that with the Schottky cathode at −1 kV and the specimen at ground, the primary beam is held at 10 keV throughout the column but within the CL is retarded to the final 1 keV. Using single-polepiece configurations both as the condenser lens (CS = 13.2 mm, CC = 11.5 mm) and as the objective lens (CS = 1.05 mm and CC = 0.83 mm at a
working distance of 8 mm and a focal length of 5 mm, when the retarding field is taken into account), they reached a calculated spot size of 46 nm and a measured resolution of 50 nm. The beam current was calculated to reach 150 nA but measured only 20 nA, which was ascribed to insufficient vacuum conditions in the gun chamber. The lower SE detector was used only with an unbiased booster, while in the low-energy mode the accelerated SE are deflected by the lower Wien filter outside electrode 2 so that they can impinge on the upper SE detector. Although the Wien filter did not deflect the primary beam, it caused dispersion, astigmatism, and some higher-order aberrations that required precompensation by the upper Wien filter with opposite orientation. Meisburger et al. (1992) used a primary beam at 20 keV in the column and decelerated it in front of the specimen, which was held at −19.2 kV together with the nearest electrode of the immersion objective lens; this lens was combined with a magnetic single-polepiece lens below the specimen. Again a resolution of 50 nm was achieved, at a landing energy of 800 eV with 50 nA in the probe. As above, two Wien filters were incorporated, of which the lower one deflected signal electrons to a semiconductor detector with a fiber-optic light guide, while the upper one incorporated in its electrostatic octupole the blanking unit, stigmator, and centering system. More instruments of the type described above can be found from major producers of SEM technology, and a specialized instrument industry for this application also exists. Nevertheless, critical details about equipment developed outside the academic community are very often confidential. Another family of dedicated instruments consists of laboratory equipment built for basic research tasks; although their development often began with a commercial microscope, the volume of modifications was much larger than that described in the previous section.
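The energy bookkeeping in these two retarding-field schemes is simple potential arithmetic. The following sketch (the function name is ours; the numerical values are those quoted above) illustrates how the column energy and landing energy follow from the electrode potentials:

```python
def beam_energies(cathode_kV, column_kV, specimen_kV):
    """Kinetic energy (in keV) of the beam inside the column and at specimen impact,
    for an electron starting from rest at the cathode potential."""
    in_column = column_kV - cathode_kV   # energy while travelling in the column/booster
    landing = specimen_kV - cathode_kV   # landing energy after retardation at the specimen
    return in_column, landing

# Beck et al. (1995): cathode at -1 kV, booster at +9 kV, grounded specimen
print(beam_energies(-1.0, 9.0, 0.0))      # -> (10.0, 1.0): 10 keV in column, 1 keV landing

# Meisburger et al. (1992): cathode at -20 kV, grounded column, specimen at -19.2 kV
print(beam_energies(-20.0, 0.0, -19.2))   # 20 keV in column, 0.8 keV landing
```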
One example is the UHV SEM working within the range 100 eV to 3 keV with a resolution of 60 nm at 250 eV, equipped with a LEED pattern detector consisting of two concentric hemispherical grids, two microchannel plates, and a two-dimensional detector. The specimen was typically inclined 45° toward the detector, which was, moreover, rotatable around the sample. Signal processing functions included selection of a diffraction spot for bright-field imaging and for various dark-field images. This device, based on a Hitachi S-800 but completed with equipment typical of surface analysis devices, including magnetically driven transfer from the air-lock, provided many interesting observations, e.g., of grains on polycrystalline Si, the step structure on a Si(111) surface, the domain structure on the reconstructed Si(110) 16×2 surface (Ichinokawa et al., 1987), and also of superstructures and of the movement of islands on an Au-evaporated Si(111) surface (Ichinokawa et al., 1986). In order to collect and process the image
FIGURE 47. UHV SLEEM for examination of clean and defined surfaces (the magnetically driven specimen transport is not attached to the air-lock on the far right-hand end).
data, the authors of these studies invested enormous effort in their electronic equipment, the quality of which was far below present standards. In the authors' laboratory a UHV SLEEM instrument has been developed, its design representing a combination of the ‘‘adapted SEM’’ outlined above (i.e., biased specimen and detector according to Figure 37) and the facilities usual in surface analysis equipment for the examination of clean and well-defined surfaces (clean UHV conditions of the order of 10⁻⁸ Pa, a separate preparation chamber with an ion gun for cleaning and sputtering the surface, and an attachment for evaporation of metals)— see Figure 47. The basic illumination system is a commercial electrostatic two-lens column with a Schottky cathode (2LE Column, FEI Company; see http://www.feibeamtech.com/pages/electron.html). The apparatus is intended to employ diffraction, interference, and energy-band contrasts and will be equipped with a two-dimensional LEED pattern detector, the design of which has not yet been finished, and with a parallel-operating electron
FIGURE 48. Double-tilt specimen stage for the device in Figure 47, permitting insertion of a specimen cartridge via the air-lock with five high-voltage connections to outside of the vacuum chamber.
energy analyzer of the hyperbolic type (Jacka et al., 1999), which is under preparation. As we saw in Section VI.E, even a tiny specimen tilt manifests itself in large beam impact angles at the lowest energies. In order to acquire full control over the impact direction, a double-tilt specimen stage is necessary, as shown in Figure 48. It features x and y movements of 5 mm, a z-axis movement of 10.5 mm, rotation of 8°, and two mutually perpendicular tilts of up to 5°. One prospective task is to combine the SLEEM method with surface microanalysis such as Auger spectromicroscopy. For this purpose a miniature all-electrostatic SLEEM column has been developed with the built-in detection part shown in Figure 34(b) (see Frank et al., 2000a). The whole device (see Figure 49) is 90 mm long and 45 mm in diameter, and at 5 keV primary energy, a probe current of a few nA, and a working distance of 5 mm, it provides a resolution of 30 nm (El-Gomati et al., 2002), which predicts a value around 100 nm at 10 eV. The column fits inside a cylindrical mirror analyzer for Auger electron spectroscopy, but in a separate installation it can be completely biased to a high positive potential, providing for the CL configuration with an earthed specimen. Bearing all the previous considerations in mind, we can now outline a design for the ideal dedicated SLEEM instrument for general
FIGURE 49. Electrostatic three-lens mini-column, equipped with Schottky cathode, two-stage quadrupole centering system, octupole stigmator/centering and built-in detector (see Figure 34(b)) with a six-segment collector. (Reprinted with permission from Frank et al., 2000a.)
very-low-energy SEM applications. Such a device would be similar to the setup shown in Figure 46, i.e., equipped with a positively biased booster and a CL with the specimen at ground potential. For the acquisition of diffraction and energy-band contrasts, a two-dimensional multichannel detector is desirable, onto which the full LEED pattern would be projected. In order to focus the diffraction pattern, we have to let the signal beam pass through some lens. This can be an IOL, and then deflection toward some kind of upper detector is necessary; here, however, the deflection assembly has to image its input plane onto the detector plane, so that behind a weak Wien filter the simple extraction field has to be replaced by a regular large-angle deflection unit. Altogether the device becomes relatively complicated, so that
a configuration with a single-polepiece lens inserted below the specimen (e.g., a miniature one with permanent magnets) seems more promising. Then the LEED pattern will be formed in the space between the specimen and the SEM column (Müllerová and Lenc, 1992b). Having ample free space available (see Figure 36), we can design the two-dimensional detector on various principles. Of course, the specular beam would still escape detection, but this could be avoided by allowing for a small specimen tilt. Naturally, a field-emission gun of high brightness is desirable, and by designing the specimen chamber as a UHV one we extend significantly the scope of possible applications.
C. Alignment and Operation

If a conventional SEM is adapted via insertion of the CL, the routine procedures for alignment and operation become modified. Let us now make a few remarks on this topic. A key issue is the strong electric field in the specimen chamber. The maximum applicable strength of the field between the specimen and the anode within the CL naturally depends on the quality of both surfaces. It is generally recommended to arrange the specimen holder or cartridge so that no sharp edges or protrusions appear on the side facing the anode, and to cover the specimen with a large flat cap made of a smooth foil, leaving the desired part of the specimen exposed for observation. It is good practice to start the experiments with every new specimen by ‘‘training’’ the specimen biasing, i.e., by a slow stepwise increase of the voltage with the rest of the microscope switched off. Good optical properties are obtained for sufficiently high values of the immersion ratio k, so that the primary energy should not be chosen below 5 keV, while 10 keV seems to be an optimum value. Thus, the field strength is mainly controlled via the working distance, and according to experience the range between 1 and 2.5 kV mm⁻¹ (i.e., l between 4 and 10 mm) usually suits the purpose. Naturally, when predicting the imaging parameters dependent on the working distance, we have to take into account the necessary underfocusing of the SEM column, connected with the CL optical power (see the next section). We now restrict ourselves to the configuration shown in Figure 37, which has proved optimum for the adaptation of general-purpose SEMs. Two points are important here, namely aligning the detector onto the optical axis and tuning the homogeneity of the CL retarding field. The detector alignment is a standard initial routine used not only when the detector is designed to be retractable, as was recommended above, but always before entering the SLEEM mode. This routine should be performed
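As a quick sanity check of the working-distance rule above, the mean field strength is simply the CL voltage drop over the gap. A minimal sketch (the function name is ours):

```python
def cl_field_strength(bias_kV, l_mm):
    """Mean retarding-field strength in the cathode lens, in kV/mm, for a
    specimen biased by bias_kV against a grounded anode at distance l_mm."""
    return bias_kV / l_mm

# A 10 kV bias over l = 4..10 mm spans the recommended 1 to 2.5 kV/mm range
for l in (4.0, 5.0, 10.0):
    print(f"l = {l:4.1f} mm -> {cl_field_strength(10.0, l):.2f} kV/mm")
```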
FIGURE 50. An example of the appearance of the microscope screen when the detector/anode assembly shown in Figure 37 is being adjusted onto the optic axis: (a) the upper surface of the YAG crystal with no specimen bias and only low BSE signal from a specimen composed of light elements; (b) the YAG surface combined with the specimen image in the bore, both images being defocused with the sharp image plane situated between them (specimen biased for an impact energy of 1 keV so that the SE signal dominates).
with a perfectly aligned column, particularly as regards the beam centering. The upper surface of the YAG crystal is treated in the same way as the lower detecting surface, so that at low magnifications the scintillator material around the central bore is struck by primary electrons from above and the YAG surface is visible on the microscope screen as a bright area with a black circular feature, which can then be easily mechanically adjusted to the screen center (see Figure 50). When the magnification is afterwards increased, we restrict the scanning range to within the detector bore and only the lower YAG surface remains active for electrons emitted from the specimen. This lower limit of magnification is usually between 250 and 500, which is one of the drawbacks of this detector type. Nevertheless, modern computer-controlled microscopes enable one to control every active element within the column easily, so if a larger field of view is needed, the pivot point of the deflection system can be shifted nearer to the scintillator. This change increases the OL aberrations because of an off-axis passage of the beam, but the corresponding drop in resolution should not be perceptible owing to the enlarged pixel size. A further step in the recommended procedure is to check whether the position of the detector bore moves on the screen when refocusing from the detector plane to the specimen plane visible inside the bore; if it does, the OL centering has to be improved. Next the specimen bias is increased in steps and the image astigmatism is corrected together with the shift
FIGURE 51. An example of the very-low-energy image in the center of the field of view with the rim of the mirror image connected with glancing electron impact and with reflection of the electrons in front of the surface; electron energies from the left: 2.5, 1, 0, and −2.5 eV.
of the field of view, which is eliminated via a slight specimen tilt. Continuing this until the specimen bias equals the gun voltage, we approach zero energy of electron impact, so that from the margin of the field of view the surface image starts to convert to the mirror image of an above-specimen equipotential surface. Now the most sensitive alignment of the CL field can be made by shifting the residual central area of the very-low-energy image into the screen center by means of small specimen tilts. If a double-tilt facility is not available, a single tilt can be combined with specimen rotation. Sometimes small corrections of the OL centering are made, but this is a pragmatic step not belonging to the consistent alignment procedure. This step of the alignment is illustrated in Figure 51. Roughly below 5 eV, some signal decrease in the center of the field of view starts to appear owing to signal escaping through the detector bore, as follows from Figure 38. At low magnification, the margin of the field of view appears (see the next section, Equation (60)) and shrinks with further reduction of the energy. Around the field of view, a rim of mirror image is seen where electrons are reflected on above-surface equipotentials. When zero energy is reached, the specimen surface acts as a planar mirror reflecting the lower scintillator surface with its central sink and bore, and this image can be focused. The signal deficit in the center of the field of view, analogous to the escape of the (00) diffracted beam, is inherent in coaxial detectors in general. In practice it is avoided by a slight specimen tilt causing only a small loss in resolution. A systematic solution requires a deflection unit and a side-positioned detector. In a well-aligned device, the very-low-energy image centered according to Figure 51 is seen, and the field of view does not move when the specimen bias is decreased throughout its full range.
In Figure 51, some residual ellipticity of the central spot is still visible, which indicates that the configuration is not perfectly axially symmetric.
In the course of the SLEEM mode operation it is always wise not to increase the specimen bias too fast and to use specimen movements, z-shift, tilt, and rotation in particular, very carefully with larger changes better made at a low bias. Interpretation of contrast observed below a few hundreds of eV is often not straightforward and is significantly facilitated when a series of micrographs of the field of view is available within the full energy scale. D. Practical Issues The optical power of the cathode lens influences the beam impact on the specimen and modifies characteristics like the image magnification, the working distance (when assessed according to excitation of the magnetic focusing lens), and the impact angle connected with beam rocking around the pivot point of the scanning system. Also an additional condition restricting the field of view appears, namely that connected with the increase of the impact angle to p/2 as is shown in Figure 51. While the restriction to the field of view and increase in the impact angle can be only recognized and considered when interpreting the image, corrections for focus and magnification should be incorporated into the microscope control software. This section aims at preparing algorithms for these corrections (see Zobacˇova´ et al., 2003). The basic equations were given by Mu¨llerova´ and Frank (1993) while Hutarˇ et al. (2000) solved the correction of magnification. Let the primary beam trajectory (see Figure 52) be initially directed into a point with radial coordinate r0 that in paraxial approximation is given as r0 ¼ (wS þ l ) , where wS is the axial coordinate of the virtual vertex of the scanning system. The field in the anode bore acts as a diverging lens of a focal length fA that moves the virtual vertex to z ¼ wS0 ¼ fAwS(wS – fA)1. The lens also enlarges ! 0 and in the paraxial approximation (i.e., when we can put tan ffi sin ffi ), we get
’ ¼ wS/wS0 . The homogeneous retarding field further deflects the oblique impacting ray along a parabolic trajectory so that it is easy to trace its radial coordinate and velocity; an interesting point is where the axial velocity falls to zero. This takes place on a fictitious ‘‘reflection surface’’ that intersects the specimen surface at a radial coordinate rmax defining the maximum size of the field of view to 4l 4l þ 3wS Vmax ¼ 2rmax ffi pffiffiffi : k 4l þ wS
ð60Þ
FIGURE 52. The primary beam trajectory inside the objective and cathode lenses.
In order to derive Equation (60), the focal length of the aperture lens (Lenc and Müllerová, 1992a),

fA = 4l k/(k − 1),   (61)
was, for near-zero landing energies, i.e., k ≥ 10⁴, approximated by 4l. Assuming wS = 25 mm (see below), l = 5 mm, and EP = 10 keV, we get Vmax = 0.42 mm at a landing energy of 1 eV and 130 μm at a mere 0.1 eV, i.e., quite acceptable figures. It is obvious from Figure 52 that the beam impact angle, which is initially equal to α and, owing to the aperture-lens action, enlarges to α′, increases further within the retarding field. Solving again the parabolic motion up to the specimen surface, we get for the final impact angle αCL the relation

tan αCL = √k sin α′ / √(1 − k sin²α′),   (62)
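The figures quoted for Vmax follow directly from Equation (60); a short script (our own naming, a sketch only) reproduces them:

```python
import math

def v_max_mm(l_mm, w_s_mm, k):
    """Maximum field-of-view size, Eq. (60): V_max = (4l/sqrt(k)) (4l + 3wS)/(4l + wS)."""
    return (4.0 * l_mm / math.sqrt(k)) * (4.0 * l_mm + 3.0 * w_s_mm) / (4.0 * l_mm + w_s_mm)

l, w_s, E_P = 5.0, 25.0, 10_000.0        # mm, mm, eV (values assumed in the text)
for E in (1.0, 0.1):                      # landing energy in eV
    k = E_P / E                           # immersion ratio
    print(f"E = {E:>3} eV: V_max = {v_max_mm(l, w_s, k) * 1000:.0f} um")
```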
i.e., this enlargement is roughly √k-fold. Substituting for α′ and then for fA from Equation (61), we find near the optic axis

αCL ≅ (r0 √k/(wS + l)) [1 + ((k − 1)/k)(wS/4l)].   (63)
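Equation (62) can be evaluated directly; for k = 10⁴ even a 1 mrad angle behind the aperture lens grows to about 0.1 rad at impact, illustrating the roughly √k enlargement. A minimal sketch (function name ours):

```python
import math

def impact_angle(alpha_prime_rad, k):
    """Final impact angle alpha_CL from Eq. (62):
    tan(alpha_CL) = sqrt(k) sin(alpha') / sqrt(1 - k sin^2(alpha'))."""
    s = math.sin(alpha_prime_rad)
    return math.atan(math.sqrt(k) * s / math.sqrt(1.0 - k * s * s))

print(impact_angle(1e-3, 1e4))   # ~0.1 rad, i.e. about sqrt(k) = 100 times larger
```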
For a tilted specimen, this impact angle, increasing linearly with the off-axis distance, combines with the uniform impact inclination shown in Figure 44. The above-described changes in the parameters of the field of view are inherent in the configuration, and although we can modify them to some extent by controlling the vertex position wS, actual correction for them is neither possible nor desired. On the contrary, changes in magnification and in the effective working distance should be automatically corrected in the control software. In the paraxial approximation, the coordinate rC of the impact point (see Figure 52) amounts to rC = (2lε + wS′)α′ with ε = √k/(1 + √k). Let us define the magnification correction factor M = r0/rC < 1, which can be used for updating the size of micron marks or the numerical magnification values. From the previous relations we can write this as

M = fA (wS + l)(√k + 1) / [fA wS (√k + 1) + 2l√k (fA − wS)].   (64)
It should be noted that M does not represent the CL magnification, which is, of course, not wS dependent. Substituting for fA from Equation (61), we get at k ≫ 1 the factor M ∈ (1/2, 2/3) within the full range of wS/l; the most often met value is M ≈ 0.6. Careful measurement of M showed that the approximation (61), derived for an abrupt change of the CL field in the anode plane, does not provide values of fA fitting the measured data with an accuracy sufficient for the purposes of, for example, critical-dimension measurement. Hence a more exact relation was derived, based upon modeling an anode field of finite thickness t within which both the axial potential and the electron trajectory follow parabolic curves. Lenc and Müllerová (1992b) used this approach when deriving the relations (42) and (43) for the CL aberrations. The result was

fA = 4l (k/(k − 1)) [2 Arth X/X + ln(1 − X²)/X²]⁻¹,  X = √((k − 1)/2k) (t/2l).   (65)
Finally, we will look at the magnitude of the underfocusing Δf that has to be applied to the magnetic OL when the cathode lens is excited. In a similar way as above, we find in the paraxial approximation that the surface point is imaged by the retarding field at a distance 2lε below the anode. The aperture lens in the anode plane further images this virtual crossover so that it appears near a point lying at a distance of l/3 below the specimen; more accurately, the axial shift of the focused probe is given by

Δf = l [fA (√k − 1) − 2l√k] / [fA (√k + 1) + 2l√k] > 0.   (66)

In order to obtain a rough quantitative estimate, we use fA from Equation (61) and k ≫ 1 and get Δf ≅ (l/3)[1 − 8/(3√k)]. This leads to a slope of the refocusing, necessary when the energy varies, expressed as

∂(Δf)/∂E ≅ −4l/(9√(EP E)).   (67)
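The refocusing slope of Equation (67) is easy to tabulate against landing energy; this sketch (our own naming) reproduces the magnitude quoted below:

```python
import math

def refocus_slope_mm_per_eV(l_mm, E_P_eV, E_eV):
    """Magnitude of d(Delta f)/dE from Eq. (67): 4l / (9 sqrt(E_P E)), in mm/eV."""
    return 4.0 * l_mm / (9.0 * math.sqrt(E_P_eV * E_eV))

# l = 5 mm, E_P = 10 keV: about 7 um/eV at a landing energy of 10 eV
for E in (1000.0, 100.0, 10.0, 1.0):
    slope_um = refocus_slope_mm_per_eV(5.0, 10_000.0, E) * 1000.0
    print(f"E = {E:7.1f} eV -> {slope_um:6.2f} um/eV")
```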
For example, with l = 5 mm and EP = 10 keV we get 7 μm eV⁻¹ at 10 eV. Equations (64) and (66), each in combination with Equation (65), represent the desired algorithms for the on-line correction of magnification and for the refocusing. Both algorithms contain three parameters, EP, l, and t, while M depends also on the vertex position wS. The values of t and wS cannot be directly measured; in fact they represent effective dimensions obtainable only by fitting experimental data to the model. This was done for one particular SLEEM arrangement, and the result was wS = 25.62 mm (with the OL aperture situated 27 mm above the anode) and t = 5.79 mm for an anode bore of D = 3.5 mm. The ratio t/D = 1.65 for an anode thickness of 2 mm corresponds accordingly to a ratio of 1.25 for a thin anode (Lenc and Müllerová, 1992b). Furthermore, for a broad range of variables the measured magnification factors M(l, E) fitted the above model with deviations below 1.9%, including the measurement errors (Hutař et al., 2000).
VIII. SELECTED APPLICATIONS

This section summarizes the results of some demonstration experiments in which the aim was to map the main features of SLEEM micrographs at low and very low energies for a particular family of specimen types and
to verify the feasibility of obtaining the types of contrast inherent in slow electrons. Only in Sections VIII.D and VIII.I do we quote results of more systematic studies.
A. Prospective Application Areas

First of all, let us briefly characterize the application areas in which the use of the SLEEM mode can contribute to progress in the solution of research tasks. It is well established that the examination of semiconductor structures, both as regards their geometry and critical dimensions and as regards the local voltages and currents, either initiated by powering the structure or induced by the electron beam, is best performed at beam energies around 1 keV. Dedicated IC testers use this energy, and many of them are equipped with some retarding-field element. The doping contrast is highest around 1 keV, too, and a further possibility is to use the elastically backscattered electrons at a tailored very low energy causing no damage. Nonconductors were for a long time observed below their critical energy, where only moderate charging takes place. This mode was, of course, surpassed by observation just at the critical energy, as described in Section VIII.D. However, a new approach has arisen, consisting in the controlled return of the fraction of SE needed to balance the charge, so that a noncharging situation can be secured at any energy, though still only below EC2. Detailed examination of surface topography is best made at an electron energy for which the interaction volume inside the specimen fits in size the relief protrusions. To a certain extent this remains valid even at energies for which the primary electrons do not penetrate below the escape depth of SE—if a raised feature is just filled with the interaction volume, even SE directed quite far from the surface normal might be emitted. For details smaller than 100 nm this means that the low-energy range must be used. Contrast from these small features replaces the topographical contrast of inclined facets and surface steps, which dominates at high energies.
Variations in the electron yield with crystal orientation culminate at a few hundreds of eV, which energy range is then optimum for observing grains in polycrystals, crystalline precipitates, and amorphized areas in crystals. The diffraction and interference contrast below, say, 20 or 30 eV reveals phenomena connected with surface crystallinity and its changes owing to surface reconstruction, adsorption, desorption, growth of layers, sublimation, diffusion, segregation, etc.
This list obviously covers selected but very numerous tasks from virtually any of the application fields of microstructure examination, both in materials science and the life sciences.
B. General Characteristics of Micrograph Series

Because of the lack of experience with the contrast appearing in micrographs below, say, 500 eV, it is good practice always to acquire (or at least with a new type of specimen) a series of micrographs beginning at the primary beam energy used and continuing with increased specimen bias, possibly up to near zero energy. When doing this, attention should be paid to preserving an identical field of view and to correcting for the magnification changes wherever this is not performed automatically. This micrograph series will show some characteristic features that include disappearance of the edge effect, transformations in the material contrast, and enhancement of the relief contrast. Further image changes with decreasing electron energy are then inherent to individual structure types. One example is seen in Figure 53, which shows the surface of a Cu polycrystal with the surface oxides and contaminants removed by chemical etching. While at 5 keV the image is strongly dominated by the edge effect appearing on steps made by etching along grain boundaries and also on other etch pits, at 200 eV these over-brightened features are not visible. Instead, the fine surface relief combined with the grain contrast appears most pronounced. At 10 eV the contrast of residual islands of
FIGURE 53. Surface of a polycrystalline Cu sheet etched in nitric acid, Tesla BS 340 SEM adapted for the SLEEM method, energies from the left 5 keV, 200 eV, and 10 eV; the width of the field of view is 70 μm. (Reprinted with permission from Müllerová and Frank, 1994.)
FIGURE 54. GaAs-based integrated circuit, JEOL JSM T220A SEM adapted for the SLEEM method, electron energies (a) 9800 eV, (b) 4300 eV, (c) 1400 eV, (d) 20 eV; the width of the field of view is 400 μm.
contamination is strongest; the mechanism would require further examination, surface microanalysis in particular, in order to be explained. Another typical example, shown in Figure 54, represents semiconductor structures. Here one can notice primarily the contrast changes caused by the decreasing penetration depth as interfaces between technological layers are crossed, which projects itself into variations in the BSE and SE2 yields. Possible dynamic effects, connected with the injection of electrons into interface states and the creation of space charges within the information depth, would also need further examination. At 20 eV the local charging and the surface details and defects are most obvious. This micrograph series also illustrates the uncorrected changes in the image magnification with decreasing electron energy. The third example, in Figure 55, consists of only a single micrograph representing a typical example of an unexpected contrast that appeared at low energies without being apparent at all at 10 keV. The dots arranged in rows on a cleaved GaAs crystal surface might represent islands of an oxide layer grown preferentially on crystal defects or on the edges of surface steps made during cleaving, but no reliable explanation is at hand. Again, surface microanalysis would greatly facilitate the interpretation.
FIGURE 55. ‘‘Decoration dots’’ on the fracture surface of a low-quality GaAs crystal, Tesla BS 343 SEM adapted for the SLEEM method, electron energy 250 eV; the width of the field of view is 20 μm.
C. Surface Relief

In all the series of micrographs in this section, a strong enhancement of the relief contrast is apparent when they are made within a broader energy range. On flat metal surfaces without any artificial structure, small relief details are best visible around 50 eV, where the penetration depth is shortest. In Figure 56, two frames from the first published series of micrographs taken throughout the full energy scale demonstrate this trend. We notice here that although the detector system shown in Figure 37, which was used to acquire the majority of the micrographs, is of the overhead type and should not produce any shadowing effects, in practice this is not entirely true. The scintillator is placed in an axially symmetric position, but the side-attached light guide breaks the symmetry, and the efficiency of light transport is not identical all over the scintillator crystal even if the optical contact is made properly. Owing to the strong acceleration within the CL field, the emitted electrons more or less preserve their off-axis coordinates, and species emitted from any one pixel impact the detector locally. Consequently, the side of the image situated below the light guide is the brightest (see Figures 50 and 51), and sometimes off-line corrections are needed at very low energies. In connection with this, larger surface facets inclined toward the light-guide direction might also exhibit a higher signal, and hence some kind of moderate shadowing is observed.
FIGURE 56. Chemically etched polycrystalline Ti sheet, Tesla BS 350 UHV SEM adapted for the SLEEM method, electron energies 15 keV (left) and 50 eV (right); the width of the field of view is 50 μm. (Reprinted with permission from Müllerová and Frank, 1993.)
D. Critical Energy Mode

In Section III.E, we discussed phenomena connected with charge accumulation in nonconductive specimens and saw that if the total electron yield curve σ(E) is taken as the process diagram, then spontaneous movement of the working point toward the critical energy EC2 qualitatively explains the observed effects. In addition, it was argued that when the electron energy is below EC2, the ultimate (positive) surface potential is reduced because of recapture of the slower part of the SE. It is important to recall that, irrespective of the initial impact energy, the charging process shifts the impact energy, under the influence of the fields of persisting charges, always toward EC2. This is, of course, connected with corresponding changes in the image signal; Figure 57 illustrates this for the case of positive charging up. This is a well-known effect encountered in the observation of nonconductors and can be exploited via a practical procedure consisting of a temporary increase of the image magnification and subsequent assessment of the signal level from the smaller field of view relative to its surroundings, which reveals in which direction the charging has changed the average emission (see Joy and Joy, 1996). The same approach forms the basis of an automatic method for determining EC2 (or, more exactly, the energy causing minimum damage to the image owing to charging), which is also outlined in Figure 57. The method (see Frank et al., 2001) consists in acquiring a temporal sequence of image signals from individual pixels from their first illumination onward, and in off-line determination of the integral under the S(t) curve, which can be taken as a measure of the total signal change caused by the charge accumulation. By plotting this quantity versus the initial impact
FIGURE 57. Scheme of the spontaneous time development of the image signal in the course of positive charging up: movement of the ‘‘working point’’ from the initial (EB) to the final (EF) impact energy (top), signal vs. time plot (bottom left), and the area below the S(t) curve (the charging rate) as a function of the initial impact energy (bottom right). (Reprinted with permission from Zobačová and Frank, 2003.)
energy, we can find its optimum value where the curve crosses the zero level. The peculiar behavior of this curve below EC2 was explained as a consequence of the SE being ‘‘focused’’ into the detector bore by the radial field component above the charged field of view surrounded by the noncharged specimen (Zobačová and Frank, 2003). The results, demonstrated in Figure 58, are more reliable for flat specimens exhibiting only moderate heterogeneity in conductivity and electron yields. The observation method described in this section requires modifications of the SEM control software that go beyond the scope of a simple adaptation made by the customer. It is mentioned here to demonstrate that, when working exactly at the critical energy, much better results can be achieved than at low energies in general. The same idea led to the above-described detection approach incorporating controlled return of a portion of the SE.

E. Diffraction Contrast

In Section IV.B, we dealt with electron backscattering from single crystals and hinted at the possibility of obtaining image contrast connected with locally varying fulfillment of the diffraction condition. This consists in getting a bright albeit defocused diffracted beam (or beams) incident on the detector, which, for the detector types described here, is achieved automatically except for the specular spot (00). Although at very low energies the reciprocal lattice is theoretically two-dimensional and bright spots are
FIGURE 58. Surface of writing paper, nonprocessed and uncoated, Tesla BS 343 SEM adapted for the SLEEM method, electron energies (a) 3650 eV, (b) 2650 eV (the critical energy EC2), and (c) 1850 eV; the width of the field of view is 40 μm. (Unpublished micrographs courtesy of M. Zadražil.)
received at any energy, in fact significant variations of the spot brightness with electron energy are always observed. Consequently, the eBSE signal from crystals is modulated along the energy scale according to the crystal orientation and the distance of a nearby reciprocal lattice point from the Ewald sphere. Additional features can also appear owing to effects going beyond the kinematical diffraction theory. The first test experiment was published and interpreted in detail by Frank et al. (1999). In Figure 59 we see micrographs taken at normal impact of the slow electron beam. One can compare the brightness of the rectangular (with (100) orientation) and triangular (with (111) orientation) Pb crystals on Si and verify that it varies with energy in different ways in the two cases; this difference can be correlated with the diffraction condition for the individual configurations. The interpretation of Figure 60, in which the micrographs are taken with the specimen tilted by a mere 1.3°, is much more sophisticated.
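The strong energy dependence described above follows already from simple kinematics: a beam (h, k) can leave the surface only while the electron wavelength λ[nm] ≈ √(1.504/E[eV]) does not exceed the corresponding lateral spacing. A minimal sketch (the square surface mesh and the 0.4 nm lattice constant are our illustrative assumptions, not values from the text):

```python
import math

def wavelength_nm(E_eV):
    """Nonrelativistic de Broglie wavelength of an electron, in nm."""
    return math.sqrt(1.504 / E_eV)

def allowed_beams(E_eV, a_nm):
    """List the (h, k) beams that can emerge from a square surface mesh
    of lattice constant a_nm at normal incidence: the kinematic condition
    sin(theta_hk) = lambda * sqrt(h^2 + k^2) / a must stay <= 1."""
    lam = wavelength_nm(E_eV)
    n_max = int(a_nm / lam)
    return [(h, k)
            for h in range(-n_max, n_max + 1)
            for k in range(-n_max, n_max + 1)
            if math.hypot(h, k) * lam <= a_nm]

a = 0.4  # nm, illustrative lattice constant
for E in (5, 50, 500):
    print(E, "eV:", len(allowed_beams(E, a)), "beams, lambda =",
          round(wavelength_nm(E), 3), "nm")
```

At a few eV only the specular (00) beam survives, while raising the energy opens successive beams; the intensity modulation discussed in the text rides on top of this purely geometric count.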
FIGURE 59. Flat Pb islands deposited in situ onto an Si (100) surface, JEOL JAMP 30 UHV SEM adapted for the SLEEM method, electron energies (a) 5, (b) 12.5, (c) 42.5, and (d) 378 eV, the width of the field of view is 60 μm. (Reprinted with permission from Frank et al., 1999.)
FIGURE 60. The same specimen and microscope as in Figure 59, the specimen tilted by approximately 1.3° in the direction inclined at 55° with respect to the horizontal line, electron energies from the top left by rows: 6.5, 7.5, 10.5, 16, 18, 22, 29, and 34.5 eV; the width of the field of view is 50 μm. (Reprinted with permission from Frank et al., 1999.)
Now, not only the crystal orientation and energy but also the impact angles, both polar and azimuthal, play a role, as they define the incident ray orientation with respect to the reciprocal lattice. Hence even crystals with an identical crystalline plane on the surface but mutually rotated exhibit specific behavior of the eBSE signal with energy. In connection with this experiment, the degree of the illumination coherence was also assessed according to the relations given in Section IV.C. The size DC of the coherently illuminated area was determined according to Equations (29), (31), and (32), and the condition (30) for the source size was also verified. The experimental data were taken as ΔE = 0.5 eV, EP = 10 keV, αC = 1 mrad, E = 10 eV, and dhk = 3 nm. Considering the OL demagnification to be 10 times, we have α0 = 0.1 mrad for the diaphragm illumination angle in Figure 15, and finally we get α = 30 mrad for the aperture angle at the specimen. The factors limiting DC then result as |s| = 15.5 nm, wE = 12 nm, and wα = 6.4 nm for φ = π/2, increasing to 9.1 nm at 45°. The real spot size was not measured but can be estimated to be between 10 and 20 nm. Obviously, the constructive interference took place at least over a major part of the primary spot, which means that at
favorable conditions the image signal was increased by a factor approaching the number of unit cells within the coherence area or within the crystal domain, whichever is smaller. Observations like these can obviously be made solely on very clean surfaces and under true UHV conditions. However, it should be underlined once more that these demands do not arise from the observation method employing very slow electrons; they condition the phenomena to be observed. Prospective applications of the diffraction and interference contrast mechanisms can be estimated from the huge variety of experimental results collected by means of the LEEM apparatus (see, e.g., Telieps and Bauer, 1985; Telieps, 1987; Bauer and Telieps, 1988; Tromp and Reuter, 1993; Tromp et al., 1993; Bauer, 1994; Tromp, 2000). A survey of references in this area can be found at http://www.leem-user.com.

F. Contrast of Crystal Orientation

In the previous section the examples showing a crystalline structure in very-low-energy micrographs concerned coherent backscattering, where the detected yield is increased by amplitude instead of intensity addition of the scattered waves. However, in Sections IV.B and IV.D we also mentioned the dependences of both BSE and SE yields on the crystal orientation and argued that these should become more pronounced at low energies. In fact, experiments showed that for metal polycrystals the grain contrast in SLEEM images is highest between 50 and 150 eV (see Figures 53 and 56). Another application of the same effect arises when amorphous and crystalline areas are to be distinguished. The example in Figure 61 presents a lattice of spots amorphized by laser beam exposure of a crystalline layer. Owing to this amorphization, a decrease in both SE and BSE signals can generally be expected, so the brighter centers of the dots, caused by increased laser beam power, need to be examined in more detail.
Figure 61 illustrates the enhanced sensitivity of slow electrons to spurious a.c. electromagnetic fields: the vertical stripes are caused by an excessively high amplitude of the 50 Hz interference from the SEM electronic console.

G. Layered Structures

One trivial consequence of lowering the impact energy is that thin surface layers that were transparent at high energies become opaque, so their structure can be observed. The example in Figure 62 shows a trilayer structure prepared for exploration of the backscattering factor in Auger
FIGURE 61. Structure created by laser beam exposure of microdots (with various beam intensities and exposure times) in a Pt3Si layer made on a glass substrate, dot pitch 2.9 μm; Tesla BS 343 SEM adapted for the SLEEM method, electron energy 200 eV. (Specimen provided by Dr. H. Birecki, HP Labs; reprinted with permission from Müllerová, 1996.)
FIGURE 62. A patterned multilayer structure consisting of islands of a 500 nm thick Au layer (right) on a Si substrate, partially covered with a 200 nm thick layer of GeSi (top left), JEOL JSM T220A SEM adapted for the SLEEM method, electron energies 9800 eV (left) and 850 eV (right), the width of the field of view is 300 μm. (Specimen provided by Professor M.M. El-Gomati, University of York, UK.)
spectromicroscopy (El-Gomati et al., 1992), with an obvious demonstration of this effect. Specimens of semiconductor devices in plan view, like that in Figure 54, exhibit the same features, but combined with other effects, and hence are not so striking.
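The transparency argument can be made semi-quantitative with the empirical Kanaya–Okayama range — our illustrative choice, not a formula used by the authors, and increasingly rough below about 1 keV. A sketch for the Au layer of Figure 62:

```python
def ko_range_nm(E_keV, A, Z, rho):
    """Kanaya-Okayama electron range in nm: R [um] =
    0.0276 * A * E^1.67 / (Z^0.889 * rho), E in keV, rho in g/cm^3.
    Empirical; used here only to show the trend with energy."""
    return 1000.0 * 0.0276 * A * E_keV**1.67 / (Z**0.889 * rho)

# Au: A = 197, Z = 79, rho = 19.3 g/cm^3 (the 500 nm layer of Figure 62)
for E_keV in (9.8, 0.85):
    print(f"Au range at {E_keV} keV: "
          f"{ko_range_nm(E_keV, 197, 79, 19.3):.0f} nm")
```

The range collapses from hundreds of nanometers near 10 keV to a few nanometers at 850 eV, which is the sense in which a layer that was transparent becomes opaque.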
FIGURE 63. A beveled cross-section cratered by oblique impact of a low-energy ion beam across a multilayer structure consisting of 12 pairs of 100 nm GaAs/63 nm AlAs layers; Tesla BS 343 SEM adapted for the SLEEM method, electron energies from the top left by rows: 20, 30, 40, 130, 430, and 2430 eV; the width of the field of view is 600 μm. (Specimen provided by Dr. J. Kováč, TU Bratislava, Slovakia, preparation by Dr. A. Barna, KFKI Budapest.)
Unlike the previous example, the structure in Figure 63 produces contrasts that are not so easy to understand. The beveled section of a multilayer, composed of two alternating semiconductors, shows the outcrops of the layers of one material (GaAs instead of AlAs) at a strongly elevated contrast within a certain energy interval; in addition, three stripes of different intensity instead of two can be distinguished repeating periodically across the structure. One boundary of the ‘‘extra’’ bright stripe, namely that next to the dark part corresponding to the thicker wedge, is not sharp, which indicates that the contrast source might be buried. Moreover, the effect, i.e., both the contrast enhancement and the formation of the third, fuzzy stripe, is clearly of a dynamical nature, as can be seen from Figure 64: none of these features appears at the lowest electron dose, while both progressively emerge with increased current as well as with prolonged frame time. Similar effects were observed with Mo/Si multilayers (Müllerová et al., 1997) but were absent in metallic multilayers such as Ni/Cr. The phenomenon will be studied further because the provisional interpretation, relying upon the influence of charges trapped in the interface states and forming a buried space charge layer, needs to be
FIGURE 64. The same specimen and microscope as in Figure 63; electron energy 450 eV, the width of the field of view is 600 μm, primary beam current 0.2 nA (upper row) and 80 pA (lower row), frame times from the left 3, 11, 30, and 83 s. (Reprinted with permission from Müllerová et al., 1997.)
supported by more experimental data; furthermore, a complete model of the contrast mechanism, even a qualitative one, is not yet available.
H. Material Contrast

The absence of the monotonic material contrast in the BSE emission, i.e., of the direct proportionality η ∝ Z available at conventional beam energies in SEM, is characteristic of the low-energy range. This fact is obvious from the η(E) plots in Figure 13 for clean material surfaces. A comparison of clean and ‘‘real’’ surfaces in Figure 14 indicates that, under standard vacuum conditions and on specimens without any special treatment, some residual traces of this contrast can be observed down to about 1 keV. Below 1 keV, any relations between the BSE yields of different materials have to be specifically reconsidered. As Figure 29 shows, even the contrast between gold and carbon, otherwise representing the extreme in this respect, is inverted or at least disappears at 20 eV, where the eBSE emission already dominates. When following a particular combination of materials throughout the energy scale, even more than one inversion can be registered; Figure 65 shows two of them for the Cu/Si combination, and both are met at energies for which the Cu layer is far from being penetrated, so that no alternative explanation is possible.
FIGURE 65. Islands of a 300 nm thick Cu layer deposited onto the Si substrate through a mask exposed by electron beam lithography, period of the squares is 10 μm; Tesla BS 340 SEM adapted for the SLEEM method, electron energies from the top left by rows: 5000, 500, 250, 100, 50, and 10 eV. (Specimen provided by Mgr. F. Matějka, ISI Brno, Czech Republic.)
On the other hand, for a particular pair of materials that exhibit only moderate contrast at high energies because of a small difference in atomic numbers, an energy can be found in the low-energy range at which a much enhanced contrast is available (Müllerová, 2001). Figure 65 also illustrates the consequences of the bad practice of performing the alignment, stigmation, and focusing inside the field of view selected for the final frame. Rectangles of the graphitic layer of contaminants, which are always formed on specimens but at high energies are usually transparent enough, heavily damage the images at low energies, particularly around 100 eV, albeit the sign of the material contrast remains preserved.
I. Electronic Contrast in Semiconductors

Observation of doped areas with respect to the semiconductor substrate, both in plan view and on cleaved cross-sections, is one of the major tasks of microscopists, imposed by the semiconductor industry, which is faced with requests for continued shrinking of feature sizes and increase of throughput. Several times we have recalled the instrumentation branch of IC testers, represented by low-energy SEMs with special sophisticated attachments. However, the basic question, how to get the best visualization
of the doped areas and what is the correct contrast interpretation, does not seem to have been definitively answered so far. It is obvious that no material contrast can reveal dopant concentrations as low as 10¹⁶ to 10¹⁹ cm⁻³ in a matrix of 5 × 10²² cm⁻³ of silicon atoms. Still, successful observations have been made since the mid-1990s and interpreted via the electronic contrast mechanism. Müllerová et al. (2002) reviewed the previous studies and summarized the present understanding of the dopant contrast. The main points are that this contrast is observed in the SE emission and reaches up to a 10% level when calculated from the equation

Cp/n = (Sp − Sn) / Sn (68)

with Sp and Sn as the mean signal levels in p- and n-type areas, respectively; that p-type generally appears brighter than n-type; and that Cp/n grows toward low energies. Figure 66 represents the main ideas of the contrast model described by Sealy et al. (2000), which relies upon differences in the ionization energy, i.e., the distance between the valence band top and the vacuum level. Because the tiny content of dopant cannot change this characteristic, the ionization energies Ep and En are considered identical, but the local ‘‘vacuum’’ level varies in the model, being then balanced via above-surface patch fields created by surface dipoles of nonconstant density. When the patch fields disappear at a distance comparable with the sizes of the doped areas, some average reference energy level is progressively reached sufficiently far from the specimen. A consequence is that electrons to be emitted from the n-type area have to surmount a barrier higher by some ΔEn. The flat-band situation, shown in Figure 66, is modified when the presence of surface states is taken into account, namely so that the band bending causes a drop in ΔEn and hence a contrast decrease. For the Fermi level pinned mid-gap by a high density of surface states, no contrast should be observed. The SLEEM observations were made on a boron-doped p-type patterned structure fabricated in an n-type Si substrate using two instruments with considerably different vacuum conditions (Müllerová et al., 2002). The experiments (see Figure 67) confirmed the basic premises of the model, i.e., no BSE contrast and a moderate contrast in the SE emission. However, the most important finding was that a significant increase in contrast was registered in the SLEEM mode. Careful contrast quantification verified the high contrast for the specimen inserted into the CL and revealed that even the vacuum conditions play a very important role: at a standard vacuum of the
FIGURE 66. Combined band structures of p and n regions in the same specimen, with no influence of surface states assumed.
FIGURE 67. Boron-doped (1 × 10¹⁹ cm⁻³) p-type patterns on an n-type phosphorus-doped (4 to 6 × 10¹⁴ cm⁻³) Si (111) substrate: (a) BSE image at 10 keV, (b) SE image at 10 keV, (c) SLEEM image at EP = 10 keV, E = 1 keV; Tescan Vega 5130 SEM adapted for the SLEEM method, the width of the field of view is 350 μm for (a) and (b) and 500 μm for (c). (Specimen provided by Ing. B. Nečasová, Tesla Sezam, Inc., Rožnov p/R, Czech Republic.)
order of 10⁻³ to 10⁻⁴ Pa the contrast clearly surpasses that obtained under clean UHV conditions (Figure 68). The existence of the extremely high contrast for a specimen immersed in a moderate electric field not exceeding 2 V μm⁻¹, i.e., weaker than the fields normally applied to semiconductor structures under operation, and the enhancement under
FIGURE 68. The electron energy dependence of the SLEEM image contrast between p and n areas for the specimen shown in Figure 67; (A) dedicated UHV SLEEM microscope (see Section VII.B), (B) standard vacuum conditions, (C) SE signal from a standard ET detector. ((B) and (C) from Tescan Vega 5130 SEM adapted for the SLEEM method.)
FIGURE 69. The p/n contrast measured in the SLEEM mode for constant impact energy E ¼ 1 keV but variable primary energy EP.
routine vacuum conditions are facts very promising for application of the SLEEM method in semiconductor diagnostics and testing. The influence of the CL field is further illustrated by Figure 69, which shows directly the contrast dependence on the field strength. The low-field limit obviously fits the contrast level achieved with the standard ET detector (see Figure 68). The crucial role of the vacuum conditions clearly indicates that basing the contrast interpretation on the notion of a clean crystal surface is not correct. Further experiments showed that the contrast
could be manipulated and even inverted by coating the structure with metals of various work functions. On this basis a new model was proposed (El-Gomati et al., 2003) that considers the surface to be covered by a graphitic layer of contaminants with quasimetallic properties, so that a metal–semiconductor junction is formed beneath the surface. The subsurface fields connected with the junction successfully explain the observed phenomena even in cases when no patch field can be created, for example with a metalized surface that has to be taken as an equipotential.
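The contrast quantification used throughout this section, Equation (68), is straightforward to evaluate on image data. A minimal sketch with synthetic pixel values (function and mask names are hypothetical):

```python
import numpy as np

def dopant_contrast(img, p_mask, n_mask):
    """Equation (68): C_p/n = (S_p - S_n) / S_n, with S_p and S_n the
    mean signal levels over the p- and n-type regions of the image."""
    S_p = img[p_mask].mean()
    S_n = img[n_mask].mean()
    return (S_p - S_n) / S_n

# synthetic frame: n-type background at 100 counts, p-type patch 10% brighter
img = np.full((64, 64), 100.0)
p_mask = np.zeros(img.shape, dtype=bool)
p_mask[16:48, 16:48] = True
img[p_mask] = 110.0
n_mask = ~p_mask
print(dopant_contrast(img, p_mask, n_mask))  # 0.1, i.e., a 10% contrast
```

In practice the masks would be drawn over known p- and n-type regions of the micrograph, and the measurement repeated along the energy scale to produce curves like those of Figure 68.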
J. Energy-Band Contrast

In Section III.A.2, we described the reflection of very slow electrons at energy gaps, i.e., a contrast mechanism quite exotic from the point of view of SEM practice. In Figure 5 this was illustrated by the measured energy dependences of the (00) spot intensity for two crystal orientations of tungsten. However, demonstration of this contrast in an SEM micrograph is difficult: any bicrystal and/or polycrystal specimen exhibits a combination of contrasts caused by phenomena anisotropic with respect to the crystal orientation, so that reliably extracting this contribution is problematic. One exception is a semiconductor structure with patterned doping, observed in plan view. A clean semiconductor surface can be believed to possess identical properties on the doped pattern as well as on the substrate, and the same holds for the crystal orientation in the sense that a small amount of dopant cannot change the electron yields. Nevertheless, additional impurity levels in the energy-band structure, namely those appearing in the energy gaps, can manifest themselves via this contrast mechanism. If such an energy level is hit, electrons penetrate into the doped pattern but not into the surrounding substrate, so that the pattern appears dark. The first successful observation was announced by Müllerová et al. (2001) and is shown in Figure 70. A signal decrease is apparent in the micrographs taken at 3 and 1 eV, and very pronouncedly in the 0.5 eV frame. This first experience has proved that this type of contrast depends strongly on even a tiny mechanical tilt of the specimen incorporated into the cathode lens. Figure 70 was taken with a provisional specimen stage with no tilt facility. Hence no true CL field alignment was possible, and the influences of inhomogeneity of the retarding field could be compensated only by a suitable misalignment of the objective lens, which resulted in lowered resolution and enhanced axially nonsymmetric aberrations.
FIGURE 70. A p-type rectangle on the specimen shown in Figure 67, the SLEEM image at the electron energies from the top left by rows: 7, 4, 3, 2, 1, and 0.5 eV; dedicated UHV SLEEM microscope (see Section VII.B), the width of the field of view is 70 μm.
IX. CONCLUSIONS

The element of instrumentation common to the history of the work summarized in this text, the cathode lens, is in fact a very simple and very old assembly that can easily be incorporated into any electron optical device. In spite of this, it took more than 10 years before it started to appear frequently in the titles of papers in journals devoted to scanning electron microscopy and its applications. The authors of this review feel a certain satisfaction about this development and about the forthcoming commercial devices containing this attachment, which may belong to the family of dedicated instruments for IC technologies or even to general-purpose SEMs. Progress in this direction can break the ‘‘magic ring’’: the increasing number of instruments will expand the community of users, who will quickly extend the application fields, and so on. A UHV version of the instrument, equipped with devices for surface microanalysis methods, opens the way to examination of the fascinating physical phenomena taking place on crystal surfaces that were revealed by the LEEM method. The scanning counterpart can take advantage
of multiple signal acquisition and simultaneous compilation of separate image slices for individual diffraction spots, possibly even complemented by additional signals. More experienced users with some technical background can introduce the method into their commercial SEM instruments with an effort comparable to embarking on any other small nonstandard adaptation. For a booster-equipped SEM the adaptation might be quite trivial. In the near future, the first commercial SEM with the CL mode among its standard operation routines is expected. But, as for any other experimental method, future progress will also depend on its usefulness to a sufficiently broad community of users.
ACKNOWLEDGMENTS

This chapter reviews a major part of the work of the authors' team since the beginning of the 1990s. In the course of this time several particular projects were brought to a successful conclusion with the support of the Grant Agency of the Czech Republic and of the Grant Agency of the Academy of Sciences of the Czech Republic. The final period was supported by the GA ASCR grant no. A1065901. The results presented were naturally obtained in collaboration with other team members, both present and past, in particular Dr. Martin Zadražil, Mr. Pavel Klein, and Mr. Mojmír Sirný. The participation of other members of the Institute of Scientific Instruments of ASCR in Brno and the Institute's background in general were crucial for the whole long-term program. External cooperation was most intensive with Professor E. Bauer (TU Clausthal, Germany, and later Arizona State University) and with Professor M.M. El-Gomati (University of York, UK). The authors express their profound gratitude to all who helped them in their work. The final manuscript was compiled during a fruitful stay at the University of Toyama, Japan, for which sincere thanks are due to Professors S. Ikeno and M. Shiojiri and to Dr. K. Matsuda.
REFERENCES

Autrata, R. (1989). Backscattered electron imaging using single crystal scintillator detectors. Scanning Microsc. 3, 739–763.
Autrata, R., Hermann, R., and Müller, M. (1992). An efficient single crystal BSE detector in SEM. Scanning 14, 127–135.
Autrata, R., and Hejna, J. (1991). Detectors for low voltage scanning electron microscopy. Scanning 13, 275–287.
Autrata, R., and Schauer, P. (1994). Behaviour of planar and annular YAG single crystal detectors for LVSEM operation, in Proceedings of the Thirteenth International Congress on Electron Microscopy, Vol. 1, edited by B. Jouffrey and C. Colliex. Les Ulis, France: Les éditions de Physique, pp. 71–72.
Autrata, R., and Schauer, P. (1998). Single crystal scintillation detectors for LVSEM, in Proceedings of the Fourteenth International Congress on Electron Microscopy, Vol. 1, edited by H. A. Calderon Benavides and M. J. Yacaman. Bristol, UK: Institute of Physics, pp. 437–438.
Barth, J. E., and Kruit, P. (1996). Addition of different contributions to the charged particle probe size. Optik 101, 101–109.
Bartoš, I., van Hove, M. A., and Altman, M. S. (1996). Cu (111) electron band structure and channeling by VLEED. Surface Sci. 352–354, 660–664.
Bauer, E. (1962). Low energy electron reflection microscopy, in Proceedings of the Fifth International Congress on Electron Microscopy, Vol. I, edited by S. S. Bresse, Jr. New York: Academic Press, pp. D11–12.
Bauer, E. (1994). Low energy electron microscopy. Rep. Progr. Phys. 57, 895–938.
Bauer, E., and Telieps, W. (1988). Emission and low-energy reflection electron microscopy, in Surface and Interface Characterization by Electron Optical Methods, edited by A. Howie and U. Valdrè. New York: Plenum Press, pp. 195–233.
Bauer, H. D. (1979). Messungen zur Energieverteilung von Rückstreuelektronen an polykristallinen Festkörpern. Exp. Techn. Phys. 27, 331–344.
Bauer, H. E., and Seiler, H. (1984). Determination of the non-charging electron beam energies of electrically floating metal samples, in Scanning Electron Microscopy, Vol. III, edited by O. Johari. Chicago: SEM, pp. 1081–1088.
Beck, S., Plies, E., and Schiebel, B. (1995). Low-voltage probe forming columns for electrons. Nucl. Instrum. Methods Phys. Res. A 363, 31–42.
Berry, V. K. (1988). Characterization of polymer blends by low voltage scanning electron microscopy. Scanning 10, 19–27.
Bindi, R., Lanteri, H., and Rostaing, P. (1980). A new approach and resolution method of the Boltzmann equation applied to secondary electron emission by reflection from polycrystalline aluminum. J. Phys. D: Appl. Phys. 13, 267–280.
Bleloch, A. L., Howie, A., and Milne, R. H. (1989). High resolution secondary electron imaging and spectroscopy. Ultramicroscopy 31, 99–110.
Bode, M., and Reimer, L. (1985). Detector strategy for a single-polepiece lens. Scanning 7, 125–133.
Böngeler, R., Golla, U., Kässens, M., Reimer, L., Schindler, B., Senkel, R., and Spranck, M. (1993). Electron-specimen interactions in low-voltage scanning electron microscopy. Scanning 15, 1–18.
Born, M., and Wolf, E. (1975). Principles of Optics. Oxford: Pergamon Press.
Brown, A. C., and Swift, J. A. (1974). Low voltage scanning electron microscopy of keratin fibre surfaces, in Scanning Electron Microscopy, edited by O. Johari. Chicago: SEM, pp. 68–74.
Bruining, H. (1954). Physics and Application of Secondary Electron Emission. Oxford: Pergamon Press.
Brunner, M., and Schmid, R. (1987). Characteristics of an electric/magnetic quadrupole detector for low voltage scanning electron microscopy. Scanning Microsc. 1, 1501–1506.
Burns, J. (1960). Angular distribution of secondary electrons from (100) faces of Cu and Ni. Phys. Rev. 119, 102–114.
Buseck, P., Cowley, J., and Eyring, L. (1988). High-Resolution Transmission Electron Microscopy and Associated Techniques. London: Oxford University Press.
Cailler, M., Ganachaud, J. P., and Bourdin, J. P. (1981). The mean free path of an electron in copper between two inelastic collisions. Thin Solid Films 75, 181–189.
Cazaux, J. (1986). Some considerations on the electric field induced in insulators by electron bombardment. J. Appl. Phys. 59, 1418–1430.
Cazaux, J. (1996a). Electron probe microanalysis in insulating materials: quantification problems and some possible solutions. X-ray Spectrom. 25, 265–280.
Cazaux, J. (1996b). The electric image effects at dielectric surfaces. IEEE Trans. Diel. Electr. Insul. 3, 75–79.
Cazaux, J. (1999). Some considerations on the secondary electron emission, δ, from e⁻ irradiated insulators. J. Appl. Phys. 85, 1137–1147.
Cazaux, J., and Le Gressus, C. (1991). Phenomena relating to charge in insulators: macroscopic effects and microscopic causes. Scanning Microsc. 5, 17–27.
Cazaux, J., and Lehuede, P. (1992). Some physical descriptions of the charging effects of insulators under incident particle bombardment. J. El. Spectrosc. Rel. Phenom. 59, 49–71.
Cazaux, J., Kim, K. H., Jbara, O., and Salace, G. (1991). Charging effects of MgO under electron bombardment and nonohmic behaviour of the induced specimen current. J. Appl. Phys. 70, 960–965.
Chung, M. S., and Everhart, T. E. (1974). Simple calculation of energy distribution of low-energy secondary electrons emitted from metals under electron bombardment. J. Appl. Phys. 45, 707–709.
Czyzewski, Z., and Joy, D. C. (1989). Fast Monte Carlo method for simulating electron scattering in solids. J. Microscopy 156, 285–291.
Czyzewski, Z., MacCallum, D. O., Romig, A., and Joy, D. C. (1990). Calculations of Mott scattering cross-section. J. Appl. Phys. 68, 3066–3072.
Dahl, D. A. (1995). SIMION 3D Version 6.0, in Proceedings of the Forty-Third ASMS Conference on Mass Spectrometry and Allied Topics. Santa Fe, NM: American Society for Mass Spectrometry, p. 717.
Dekker, A. J. (1958). Secondary electron emission. Solid State Phys. 6, 251–315.
Delong, A., and Drahoš, V. (1971). Low-energy electron diffraction in an emission electron microscope. Nature Phys. Sci. 230, 196–197.
Dietrich, W., and Seiler, H. (1960). Energieverteilung von Elektronen, die durch Ionen und Elektronen in Durchstrahlung an dünnen Folien ausgelöst werden. Z. Angew. Physik 157, 576–585.
Ding, Z.-J. (1990). Fundamental studies on the interactions of kV electrons with solids for application to electron spectroscopies. Ph.D. Thesis, Osaka University, Japan.
Ding, Z.-J., and Shimizu, R. (1996). A Monte Carlo modeling of electron interaction with solids including cascade secondary electron production. Scanning 18, 92–113.
Drescher, H., Reimer, L., and Seidel, H. (1970). Rückstreukoeffizient und Sekundärelektronen-Ausbeute von 10–100 keV Elektronen und Beziehungen zur Raster-Elektronenmikroskopie. Z. Angew. Physik 29, 331–336.
Drucker, J., Scheinfein, M. R., Liu, J., and Weiss, J. K. (1993). Electron coincidence spectroscopy studies of secondary and Auger-electron generation mechanisms. J. Appl. Phys. 74, 7329–7339.
Egerton, R. F. (1986). Electron Energy-Loss Spectroscopy in the Electron Microscope. New York: Plenum Press.
El-Gomati, M. M., Barkshire, I., Greenwood, J., Kenny, P., Roberts, R., and Prutton, M. (1992). Compositional imaging in scanning Auger microscopy, in Microscopy: The Key Research Tool, edited by C. Lyman. Chicago: Electron Microscopy Society of America, pp. 29–38.
SCANNING LEEM
435
El-Gomati, M. M., Romanovsky´, V., Frank, L., and Mu¨llerova´, I. (2002). A very low energy electron column for surface studies, in Proceedings of the Fifteenth International Congress on Electron Microscopy, Vol. 3, edited by R. Cross, J. Engelbrecht, T. Sewell, M. Witcomb, and P. Richards. Onderstepoort: Microscopy Society of Southern Africa, pp. 323–324. El-Gomati, M. M., Wells, T. C. R., Mu¨llerova´, I., Frank, L., and Jayakody, H. (2003). Why is it that diferently doped regions in semiconductors are visible in low voltage SEM? IEEE Trans. Electron Devices, submitted. Everhart, T. E., and Chung, M. S. (1972). Idealized spatial emission distribution of secondary electrons. J. Appl. Phys. 43, 3708–3711. Everhart, T. E., and Thornley, R. F. M. (1960). Wideband detector for micro-micro-ampere low electron currents. J. Sci. Instrum. 37, 246–248. Everhart, T. E., Saeki, N., Shimizu, R., and Koshikawa, T. (1976). Measurement of structure in the energy distribution of slow secondary electrons from aluminum. J. Appl. Phys. 47, 2941–2945. Ezumi, M., Otaka, T., Mori, H., Todokoro, H., and Ose, Y. (1996). Development of critical dimension measurement scanning electron microscope for ULSI (S-8000 series). Hitachi Instrument News Electron Microscopy Edition 30, 15–21. Feldman, L. C., and Mayer, J. W. (1986). Fundamentals of Surface and Thin Film Analysis. New York: Elsevier. Fitting, H.-J., Schreiber, E., Kuhr, J.-Ch., and von Czarnowski, A. (2001). Attenuation and escape depths of low-energy electron emission. J. El. Spectrosc. Rel. Phenom. 119, 35–47. Fourie, J. T. (1976). Contamination phenomena in cryopumped TEM and ultrahigh vacuum field-emission STEM systems, in Scanning Electron Microscopy, Vol. I, edited by O. Johari. Chicago: SEM, pp. 53–60. Fourie, J. T. (1979). A theory of surface-originating contamination and a method for its elimination, in Scanning Electron Microscopy, Vol. II, edited by O. Johari. Chicago: SEM, pp. 87–102. Fourie, J. T. (1981). 
Electric effects in contamination and electron beam etching, in Scanning Electron Microscopy, Vol. I., edited by O. Johari. Chicago: SEM, pp. 127–134. Frank, L. (1992a). Towards electron beam tomography, in Proceedings of the Tenth European Congress on Electron Microscopy, Vol. 1, edited by A. Rios, J.M. Arias, L. Megias-Megias, and A. Lopez-Galindo. Granada: Univ. de Granada, pp. 141–142. Frank, L. (1992b). Experimental study of electron backscattering at interfaces. Surface Sci. 269/270, 763–771. Frank, L. (1996a). Real image resolution of SEM and low-energy SEM and its optimization. Ultramicroscopy 62, 261–269. Frank, L. (1996b). Width of the SEM and LESEM response function as a tool for the image resolution assessment, in Proceedings of the Ninth Conference on Electron Microscopy of Solids, edited by A. Czyrska-Filemonowicz. Krako´w: State Committee for Scientific Research, pp. 109–112. Frank, L. (2002). Advances in scanning electron microscopy, in Advances in Imaging and Electron Physics, Vol. 123, edited by P. W. Hawkes. San Diego: Academic Press, pp. 327–373. Frank, L., and Mu¨llerova´, I. (1994). Zero-charging electron microscopy in a cathode lens equipped SEM, in Proceedings of the Thirteenth International Congress on Electron Microscopy, Vol. 1, edited by B. Jouffrey and C. Colliex. Les Ulis, France: Les e´ditions de Physique, pp. 139–140. Frank, L., and Mu¨llerova´, I. (1999). Strategies for low- and very-low-energy SEM. J. El. Microsc. 48, 205–219. Frank, L., Mu¨llerova´, I., and El-Gomati, M. M. (2000a). A novel in-lens detector for electrostatic scanning LEEM column. Ultramicroscopy 81, 99–110.
436
MU¨LLEROVA´ AND FRANK
Frank, L., Mu¨llerova´, I., and El-Gomati, M. M. (2002). SEM visualization of doping in semiconductors, in Proceedings of the Fifteenth International Congress on Electron Microscopy, Vol. 1, edited by R. Cross, J. Engelbrecht, M. Witcomb, and P. Richards. Onderstepoort: Microscopy Society of Southern Africa, pp. 39–40. Frank, L., Mu¨llerova´, I., Faulian, K., and Bauer, E. (1999). The scanning low-energy electron microscope: first attainment of diffraction contrast in the scanning electron microscope. Scanning 21, 1–13. Frank, L., Stekly´, R., Zadrazˇil, M., El-Gomati, M. M., and Mu¨llerova´, I. (2000b). Electron backscattering from real and in situ treated surfaces. Mikrochim. Acta 132, 179–188. Frank, L., Zadrazˇil, M., and Mu¨llerova´, I. (2001). Scanning electron microscopy of nonconductive specimens at critical energies in a cathode lens system. Scanning 23, 36–50. Frosien, J., and Plies, E. (1987). High performance electron optical column for testing ICs with submicrometer design rules. Microelectronic Engineering 7, 163–172. Frosien, J., Plies, E., and Anger, K. (1989). Compound magnetic and electrostatic lenses for low voltage applications. J. Vac. Sci. Technol. B 7, 1874–1877. Gergely, G. (1986). Elastic peak electron spectroscopy. Scanning 8, 203–214. Gergely, G., Menyha´rd, M., and Sulyok, A. (1986). Some new possibilities in non-destructive depth profiling using secondary emission spectroscopy: REELS and EPES. Vacuum 36, 471–475. Glaser, W. (1952). Grundlagen der Elektronenoptik. Wien: Springer-Verlag. Gries, W. H., and Werner, W. (1990). Take-off angle and film thickness dependences of the attenuation length of x-ray photoelectrons by a trajectory reversal method. Surf. Interf. Anal. 16, 149–153. Gryzinski, M. (1965). Classical theory of atomic collisions. I. Theory of inelastic collisions. Phys. Rev. A 138, 336–358. Hachenberg, O., and Brauer, W. (1959). Secondary electron emission from solids, in Advances in Electronics and Electron Physics, Vol. 
II, edited by L. Marton. San Diego: Academic Press, pp. 413–499. Hasselbach, F., and Krauss, H.-R. (1988). Backscattered electrons and their influence on contrast in the scanning electron microscope. Scanning Microsc. 2, 1947–1956. Hasselbach, F., and Rieke, U. (1982). Spatial distribution of secondaries released by backscattered electrons in silicon and gold for 20–70 keV primary energy, in Proceedings of the Tenth International Congress on Electron Microscopy, Vol. 1, edited by the Congressional Organization Committee. Hamburg: Deutsche Ges. EM, pp. 253–254. Hawkes, P. W. (1997). Aberrations, in Handbook of Charged Particle Optics, edited by J. Orloff. New York: CRC Press, pp. 223–274. Hawkes, P. W., and Kasper, E. (1996a). Principles of Electron Optics Vol. 2, Applied Geometrical Optics. San Diego: Academic Press. Hawkes, P. W., and Kasper, E. (1996b). Principles of Electron Optics Vol. 3, Wave Optics. San Diego: Academic Press. Hejna, J. (1994). Backscattered electron imaging in low-voltage SEM, in Proceedings of the Thirteenth International Congress on Electron Microscopy, Vol. 1, edited by B. Jouffrey and C. Colliex. Les Ulis, France: Les e´ditions de Physique, pp. 75–76. Hejna, J. (1998). Optimization of an immersion lens design in the BSE detector for the low voltage SEM, in Recent Trends in Charged Particle Optics and Surface Physics Instrumentation, Sixth Seminar, edited by I. Mu¨llerova´ and L. Frank. Brno: Czechoslovak Society for Electron Microscopy, pp. 30–31. Ho, Y. C., Tan, Z. Y., Wang, X. L., and Chen, J. G. (1991). A theory and Monte Carlo calculation on low-energy electron scattering in solids. Scanning Microsc. 4, 945–951.
SCANNING LEEM
437
Homma, Y., Suzuki, M., and Tomita, M. (1993). Atomic configuration dependent secondary electron emission from reconstructed silicon surfaces. Appl. Phys. Lett. 62, 3276–3278. Hutarˇ , O., Oral, M., Mu¨llerova´, I., and Frank, L. (2000). Dimension measurement in a cathode lens equipped low-energy SEM, in Proceedings of the Twelfth European Congress on Electron Microscopy, Vol. 3, edited by L. Frank and F. Cˇiampor. Brno: Czechoslovak Society for Electron Microscopy, pp. 199–200. Ichimura, S. (1980). Basic study of scanning Auger electron microscopy for surface analysis. Ph.D. Thesis, Osaka University, Japan. Ichinokawa, T., Ishikawa, Y., Kemmochi, M., Ikeda, N., Hosokawa, Y., and Kirschner, J. (1986). Low-energy scanning electron microscopy combined with low-energy electron diffraction. Surface Sci. 176, 397–414. Ichinokawa, T., Ishikawa, Y., Kemmochi, M., Ikeda, N., Hosokawa, Y., and Kirschner, J. (1987). Scanning low-energy electron microscopy. Scanning Microsc. Suppl. 1, 93–97. Ishikawa, Y., Ikeda, N., Kemmochi, M., and Ichinokawa, T. (1985). UHV-SEM observations of cleaning process and step formation on silicon (111) surfaces by annealing. Surface Sci. 159, 256–264. Jacka, M., Kirk, M., El-Gomati, M. M., and Prutton, M. (1999). A fast, parallel acquisition, electron energy analyzer: The hyperbolic field analyzer. Rev. Sci. Instrum. 70, 2282–2287. Jaklevic, R. C., and Davis, L. C. (1982). Band signatures in the low-energy-electron reflectance spectra for fcc metals. Phys. Rev. B 26, 5391–5397. Joy, D. C. (1984). Beam interactions, contrast and resolution in the SEM. J. Microsc. 136, 241–258. Joy, D. C. (1987). A model for calculating secondary and backscattered electron yields. J. Microsc. 147, 51–64. Joy, D. C. (1989). Control of charging in low-voltage SEM. Scanning 11, 1–4. Joy, D. C. (1995). Monte Carlo Modeling for Electron Microscopy and Microanalysis. Oxford Series in Optical and Imaging Sciences. London: Oxford University Press. Joy, D. C. (2001). 
A database of electron-solid interactions, Rev. 01–01. http://web.utk.edu/ srcutk/htm/interact.htm. Joy, D. C., and Joy, C. S. (1996). Low voltage scanning electron microscopy. Micron 27, 247–263. Joy, D. C., and Luo, S. (1989). An empirical stopping power relationship for low-energy electrons. Scanning 11, 176–180. Kanaya, K., and Kawakatsu, H. (1972). Secondary electron emission due to primary and backscattered electrons. J. Phys. D: Appl. Phys. 5, 1727–1742. Kanter, H. (1961). Contribution of backscattered electrons to secondary electron formation. Phys. Rev. 121, 681–684. Kato, M., and Tsuno, K. (1990). Numerical analysis of trajectories and aberrations of a Wien filter including the effect of fringing fields. Nucl. Instrum. Meth. Phys. Res. A298, 296–320. Kazumori, H. (2002). Development of JSM-7400F: new secondary electron detection systems permit observation of non-conductive materials. JEOL News 37E(1), 44–47. Khursheed, A. (2002). Aberration characteristics of immersion lenses for LVSEM. Ultramicroscopy 93, 331–338. Kirschner, J. (1984). On the role of the electron spin in scanning electron microscopy, in Scanning Electron Microscopy, Vol. III. edited by O. Johari. Chicago: SEM, pp. 1179–1185. Knell, G., and Plies, E. (1998). Determination of the collection efficiency for combined magnetic–electrostatic SEM objective lenses. Optik 108, 37–42. Kohl, H., Rose, H., and Schnabl, H. (1981). Dose-rate effect at low temperatures in FBEM and STEM due to object heating. Optik 58, 11–24.
438
MU¨LLEROVA´ AND FRANK
Kolarˇ ı´ k, R., and Lenc, M. (1997). An expression for the resolving power of a simple optical system. Optik 106, 135–139. Kollath, R. (1956). Sekunda¨relektronen-Emission fester Ko¨rper bei Bestrahlung mit Elektronen, in Handbuch der Physik, Vol. 21. Berlin: Springer-Verlag, pp. 232–303. Kruit, P., and Lenc, M. (1992). Optical properties of the magnetic monopole field applied to electron microscopy and spectroscopy. J. Appl. Phys. 72, 4505–4513. Kuhr, J.-Ch., and Fitting, H.-J. (1998). Monte-Carlo simulation of low-voltage scanning electron microscopy—LVSEM, in Proceedings of the Fourteenth International Congress on Electron Microscopy, Vol. 1, edited by H.A. Calderon Benavides and M.J. Yacaman. Bristol, UK: Institute of Physics, pp. 451–452. Kuhr, J.-Ch., and Fitting, H.-J. (1999). Monte Carlo simulation of electron emission from solids. J. El. Spectrosc. Rel. Phenom. 105, 257–273. Kulenkampff, H., and Spyra, W. (1954). Energieverteilung ru¨ckdiffundierter Elektronen. Z. Phys. 137, 416–425. Le Gressus, C., Valin, F., Gautier, M., Duraud, J. P., Cazaux, J., and Okuzumi, H. (1990). Charging phenomena on insulating materials: mechanisms and applications. Scanning 12, 203–210. Lenc, M. (1995). Immersion objective lenses for very low energy electron microscopy, in Proceedings of the Multinational Congress on Electron Microscopy, edited by F. Ciampor. Bratislava: Slovak Academic Press, pp. 103–104. Lenc, M., and Mu¨llerova´, I. (1992a). Electron optical properties of a cathode lens. Ultramicroscopy 41, 411–417. Lenc, M., and Mu¨llerova´, I. (1992b). Optical properties and axial aberration coefficients of the cathode lens in combination with a focusing lens. Ultramicroscopy 45, 159–162. Lencova´, B. (1997). Electrostatic lenses. Handbook of Charged Particle Optics, edited by J. Orloff. New York: CRC Press, pp. 177–221. Lencova´, B., and Wisselink, G. (1990). Program package for the computation of lenses and deflectors. Nucl. Instrum. Meth. A298, 56–66. Libinson, A. G. 
(1999). Tilt dependence of secondary electron emission at low excitation energy. Scanning 21, 23–26. Liebel, H., and Senftinger, B. (1991). Low-energy electron microscope of novel design. Ultramicroscopy 36, 91–98. Llacer, J., and Garwin, E. L. (1969). Electron–phonon interaction in alkali halides—I. The transport of secondary electrons with energies between 0.25 and 7.5 eV. J. Appl. Phys. 40, 2766–2775. Mankos, M., and Adler, D. (2002). Electron–electron interactions in cathode objective lenses. Ultramicroscopy 93, 347–354. Martin, J. P., Weimer, E., Frosien, J., and Lanio S. (1994). Ultra-high resolution SEM—A new approach. Microscopy and Analysis (USA), March, 19. McKernan, S. (1998). A comparison of detectors for low voltage contrast in the SEM, in Proceedings of the Fourteenth International Congress on Electron Microscopy, Vol. 1, edited by H. A. Calderon Benavides and M. J. Yacaman. Bristol, UK: Institute of Physics, pp. 481–482. Meisburger, W. D., Brodie, A. D., and Desai, A. A. (1992). Low-voltage electron-optical system for the high-speed inspection of integrated circuits. J. Vac. Sci. Technol. B 10, 2804–2808. Miyoshi, M., Yamazaki, Y., Nagai, T., and Nagahama, I. (1999). Development of a projection imaging electron microscope with electrostatic lenses. J. Vac. Sci. Technol. B 17, 2799–2802. Morin, P., Pitaval, M., and Vicario, E. (1976). Direct observation of insulators with a scanning electron microscope. J. Phys. E: Sci. Instrum. 9, 1017–1020.
SCANNING LEEM
439
Mott, N. F., and Massey, H. S. W. (1965). The Theory of Atomic Collisions. London: Oxford University Press. Mu¨llerova´, I. (1996). Contrast mechanisms in low voltage SEM, in Proceedings of the Ninth Conference on Electron Microscopy of Solids, edited by A. Czyrska-Filemonowicz. Krako´w: State Committee for Scientific Research, pp. 93–96. Mu¨llerova´, I. (2001). Imaging of specimens at optimized low and very low energies in scanning electron microscopes. Scanning 23, 379–394. Mu¨llerova´, I., and Frank, L. (1993). Very low energy microscopy in commercial SEMs. Scanning 15, 193–201. Mu¨llerova´, I., and Frank, L. (1994). Use of cathode lens in scanning electron microscope for low voltage applications. Mikrochim. Acta 114/115, 389–396. Mu¨llerova´, I., and Frank, L. (2002). Practical resolution limit in the scanning lowenergy electron microscope, in Proceedings of the Fifteenth International Congress on Electron Microscopy, Vol. 3, edited by R. Cross, J. Engelbrecht, T. Sewell, M. Witcomb, and P. Richards. Onderstepoort: Microscopy Society of Southern Africa, pp. 99–100. Mu¨llerova´, I., and Frank, L. (2003). Contrast at very low energies of the gold/carbon specimen for resolution testing. Scanning, submitted. Mu¨llerova´, I., and Lenc, M. (1992a). Some approaches to low-voltage scanning electron microscopy. Ultramicroscopy 41, 399–410. Mu¨llerova´, I., and Lenc, M. (1992b). The scanning very-low-energy electron microscope (SVLEEM). Mikrochim. Acta (Suppl.) 12, 173–177. Mu¨llerova´, I., El-Gomati, M. M., and Frank, L. (2002). Imaging of the boron doping in silicon using low-energy SEM. Ultramicroscopy 93, 223–243. Mu¨llerova´, I., Frank, L., and Hutarˇ , O. (2001). Visualization of the energy band contrast in SEM through low-energy electron reflectance. Scanning 23, 115. Mu¨llerova´, I., Lenc, M., and Floria´n, M. (1989). Collection of backscattered electrons with a single polepiece lens and a multiple detector. Scanning Microsc. 3, 419–428. 
Mu¨llerova´, I., Zadrazˇil, M., and Frank, L. (1997). Low-energy SEM imaging of bevelled multilayers. J. Comput. Assist. Microsc. 9, 121–122. Mulvey, T. (1984). Magnetic electron lenses II, in Electron Optical Systems for Microscopy, Microanalysis and Microlithography, edited by J. J. Hren, F. A. Lenz, E. Munro, and P. B. Sewell. Chicago: SEM, pp. 15–27. Nagatani, T., Saito, S., Sato, M., and Yamada, M. (1987). Development of an ultrahigh resolution SEM by means of a field emission source and in-lens system. Scanning Microsc. 1, 901–909. Nieminen, R. M. (1988). Stopping power for low-energy electrons. Scanning Microsc. 2, 1917–1926. Ono, S., and Kanaya, K. (1979). The energy dependence of secondary emission based on the range-energy retardation power formula. J. Phys. D: Appl. Phys. 12, 619–632. Paden, R. S., and Nixon, W. C. (1968). Retarding field scanning electron microscopy. J. Phys. E: Sci. Instrum. 1, 1073–1080. Pawley, J. B. (1984). Low voltage scanning electron microscopy. J. Microsc. 136, 45–68. Pejchl, D., Mu¨llerova´, I., and Frank, L. (1993). Unconventional imaging of surface relief. Czech. J. Phys. 43, 983–992. Pejchl, D., Mu¨llerova´, I., Frank, L., and Kolarˇ ı´ k, V. (1994). Separator of primary and signal electrons for very-low-energy SEM. Czech. J. Phys. 44, 269–276. Penn, D. R. (1987). Electron mean free path calculations using a model dielectric function. Phys. Rev. B 35, 482–486.
440
MU¨LLEROVA´ AND FRANK
Pfefferkorn, G. E., Gruter, H., and Pfautsch, M. (1972). Observations on the prevention of specimen charging, in Scanning Electron Microscopy, Vol. I, edited by O. Johari. Chicago: SEM, pp. 147–152. Plies, E., Degel, B., Hayn, A., Knell, G., and Schiebel, B. (1998). Experimental results using a ‘‘low-voltage booster’’ in a conventional SEM, in Proceedings of the Fifth International Conference on Charged Particle Optics, edited by P. Kruit and P. W. van Amersfoort. Amsterdam: Elsevier, pp. 126–130. Powell, C. J. (1974). Attenuation lengths of low-energy electrons in solids. Surface Sci. 44, 29–46. Powell, C. J. (1984). Inelastic mean free paths and attenuation lengths of low-energy electrons in solids, in Scanning Electron Microscopy, Vol. IV, edited by O. Johari. Chicago: SEM, pp. 1649–1664. Powell, C. J. (1985). Calculations of electron inelastic mean free paths from experimental optical data. Surf. Interf. Anal. 7, 263–274. Powell, C. J. (1987). The energy dependence of electron inelastic mean free path. Surf. Interf. Anal. 10, 349–354. Preikszas, D., and Rose, H. (1995). Procedures for minimizing the aberrations of electromagnetic compound lenses. Optik 100, 179–187. Price, C. W., and McCarthy, P. L. (1988). Low voltage scanning electron microscopy of lowdensity materials. Scanning 10, 29–36. Rao-Sahib, T. S., and Wittry, D. B. (1974). X-ray continuum from thick element targets for 10– 50 keV electrons. J. Appl. Phys. 45, 5060–5068. Recknagel, A. (1941). Theorie des elektrischen Elektronenmikroskops fu¨r Selbststrahler. Z. Phys. 117, 689–708. Reimer, L. (1971). Rauschen der Sekunda¨relektronenemission. Beitra¨ge Elektr. Direktabb. Oberfla¨chen (BEDO) 412, 299–304. Reimer, L. (1995). In Energy-Filtering Transmission Electron Microscopy, edited by L. Reimer. Berlin: Springer-Verlag, pp. 7–9. Reimer, L. (1996). MOCASIM—Ein Monte Carlo Programm fu¨r Forschung und Lehre. Beitr. Elektr. Mikr. Direktabb. Oberfl. (BEDO) 29, 1–10. Reimer, L. (1998). 
Scanning Electron Microscopy. Berlin: Springer-Verlag. Reimer, L., and Ka¨ssens, M. (1994). Application of a two-detector system for secondary and backscattered electrons in LVSEM, in Proceedings of the Thirteenth International Congress on Electron Microscopy, Vol. 1, edited by B. Jouffrey and C. Colliex. Les Ulis, France: Les e´ditions de Physique, pp. 73–74. Reimer, L., and Wa¨chter, M. (1978). Contribution to the contamination problem in TEM. Ultramicroscopy 3, 169–174. Reimer, L., Bo¨ngeler, R., Ka¨ssens, M., Liebscherr, F. F., and Senkel, R. (1991). Calculation of energy spectra from layered structures for backscattered electron spectrometry and relations to Rutherford backscattering spectrometry by ions. Scanning 13, 381–391. Reimer, L., Golla, U., Bo¨ngeler, R., Ka¨ssens, M., Schindler, B., and Senkel, R. (1992). Charging of bulk specimens, insulating layers and free supporting films in scanning electron microscopy. Optik 92, 14–22. Reimer, L., and Lo¨dding, B. (1984). Calculation and tabulation of Mott cross-sections for large-angle scattering. Scanning 6, 128–151. Reimer, L., and Tollkamp, C. (1980). Measuring the backscattering coefficient and secondary electron yield inside a SEM. Scanning 8, 35–39. Rose, H. (1987). The retarding Wien filter as a high-performance imaging filter. Optik 77, 26–34. Rose, H., and Preikszas, D. (1992). Outline of a versatile corrected LEEM. Optik 92, 31–44.
SCANNING LEEM
441
Rose, H., and Spehr, R. (1980). On the theory of the Boersch effect. Optik 57, 339–364. Salehi, M., and Flinn, E. A. (1981). Dependence of secondary electron emission from amorphous materials on primary angle of incidence. J. Appl. Phys. 52, 994–996. Sato, M., Todokoro, H., and Kageyama, K. (1993). A snorkel type objective lens with E B field for detecting secondary electrons. SPIE 2014, 17–23. Scha¨fer, J., and Ho¨lzl, J. (1972). A contribution to the dependence of secondary electron emission from the work function and Fermi energy. Thin Solid Films 18, 81–86. Schauer, P., and Autrata, R. (1998). Computer optimized design of BSE scintillation detector for SEM, in Proceedings of the Eleventh European Congress on Electron Microscopy, Vol. I, edited by Committee of European Societies for Microscopy. Brussels: CESM, pp. 369–370. Schmid, R., and Brunner, M. (1986). Design and application of a quadrupole detector for lowvoltage scanning electron microscopy. Scanning 8, 294–299. Schmid, R., Gaukler, K.H., and Seiler, H. (1983). Measurement of elastically reflected electrons (E 2.5 keV) for imaging of surfaces in a simple ultra high vacuum scanning electron microscope, in Scanning Electron Microscopy, Vol. II, edited by O. Johari. Chicago: AMF O’Hare, pp. 501–509. Schreiber, E., and Fitting, H.-J. (2002). Monte Carlo simulation of secondary electron emission from the insulator SiO2. J. El. Spectrosc. Rel. Phenom. 124, 25–37. Seah, M. P., and Dench, W. A. (1979). Quantitative electron spectroscopy of surfaces: a standard database for electron inelastic mean free paths in solids. Surf. Interf. Anal. 1, 1–11. Sealy, C. P., Castell, M. R., and Wilshaw, P. R. (2000). Mechanism for secondary electron dopant contrast in the SEM. J. El. Microsc. 49, 311–321. Seiler, H. (1967). Some problems of secondary electron emission. Z. Angew. Physik 22, 249–263. Seiler, H. (1983). Secondary electron emission in the scanning electron microscope. J. Appl. Phys. 54, R1–R18. 
Seiler, H., and Kuhnle, G. (1970). Anisotropy of the secondary electron yield as a function of the energy of the primary electrons from 5 to 20 keV. Z. Angew. Physik 29, 254–260. Shaffner, T. J., and Van Veld, R. D. (1971). ‘‘Charging’’ effects in the scanning electron microscope. J. Phys. E: Sci. Instrum. 4, 633–637. Shao, Z. (1989). Extraction of secondary electrons in a newly proposed immersion lens. Rev. Sci. Instrum. 60, 693–699. Spehr, R. (1985). Broadening of charged particle microprobes by stochastic Coulomb interactions. Optik 70, 109–114. Strocov, V. N., and Starnberg, H. I. (1995). Absolute band-structure determination by target current spectroscopy: Application to Cu(100). Phys. Rev. B 52, 8759–8765. Strocov, V. N., Starnberg, H. I., Nilsson, P. O., and Holleboom, L. J. (1996). Determining unoccupied bands of layered materials by VLEED: implications for photoemission band mapping. J. Phys.: Condens. Matter 8, 7549–7559. Takashima, S. (1994). New electron optical technologies in low voltage scanning electron microscope. JEOL News 31E(1), 33–35. Tanuma, S., Powell, C. J., and Penn, D. R. (1991a). Calculations of electron inelastic mean free paths II. Data for 27 elements over the 50–2000 eV range. Surf. Interf. Anal. 17, 911–926. Tanuma, S., Powell, C. J., and Penn, D. R. (1991b). Calculations of electron inelastic mean free paths III. Data for 15 inorganic compounds over the 50–2000 eV range. Surf. Interf. Anal. 17, 927–939. Telieps, W. (1987). Surface imaging with LEEM. Appl. Phys. A 44, 55–61. Telieps, W., and Bauer, E. (1985). An analytical reflection and emission UHV surface electron microscope. Ultramicroscopy 17, 57–66. Thomas, S., and Pattinson, E. B. (1970). Range of electrons and contribution of back-scattered electrons in secondary production in aluminum. J. Phys. D: Appl. Phys. 3, 349–357.
442
MU¨LLEROVA´ AND FRANK
Tromp, R. M. (2000). Low-energy electron microscopy. IBM J. Res. Develop. 44, 503–516. Tromp, R. M., and Reuter, M. C. (1993). Imaging with a low-energy electron microscope. Ultramicroscopy 50, 171–178. Tromp, R. M., Denier van der Gon, A. W., LeGoues, F. K., and Reuter, M. C. (1993). Observation of buried interfaces with low-energy electron microscopy. Phys. Rev. Lett. 71, 3299–3302. Tsai, F. C., and Crewe, A. V. (1998). A gapless magnetic objective lens for low voltage SEM. Optik 109, 5–11. Tung, C. J., Ashley, J. C., and Ritchie, R. H. (1979). Electron inelastic mean free paths and energy losses in solids, II. Electron gas statistical model. Surface Sci. 81, 427–439. Veneklasen, L. H. (1992). The continuing development of low-energy electron microscopy for characterizing surfaces. Rev. Sci. Instrum. 63, 5513–5532. Welter, L. M., and McKee, A. N. (1972). Observations on uncoated, nonconducting or thermally sensitive specimens using a fast scanning field emission source SEM, in Scanning Electron Microscopy, Vol. I, edited by O. Johari. Chicago: SEM, pp. 161–166. Werner, W. S. M. (1992). The role of the attenuation parameter in electron spectroscopy. J. El. Spectrosc. Rel. Phenom. 59, 275–291. Werner, W. S. M. (1996). Transport equation approach to electron microbeam analysis: fundamentals and applications. Mikrochim. Acta (Suppl.) 13, 13–38. Woodruff, D. P., and Delchar, T. A. (1986). Modern Techniques of Surface Science. Cambridge: Cambridge University Press. Ximen, J., Lin, P. S. D., Pawley, J. B., and Schippert, M. (1993). Electron optical design of a high-resolution low-voltage scanning electron microscope with field emission gun. Rev. Sci. Instrum. 64, 2905–2910. Yau, Y. W., Pease, R. F. W., Iranmanesh, A. A., and Polasko, K. J. (1981). Generation and applications of finely focused beams of low-energy electrons. J. Vac. Sci. Technol. 19, 1048–1052. Zach, J. (1989). Design of a high-resolution low-voltage scanning electron microscope. Optik 83, 30–40. 
Zach, J., and Haider, M. (1992). A high-resolution low voltage scanning electron microscope, in Proceedings of the Tenth European Congress on Electron Microscopy, Vol. 1, edited by A. Rios, J.M. Arias, L. Megias-Megias, and A. Lopez-Galindo. Granada: Univ. de Granada, pp. 49–53. Zach, J., and Rose, H. (1986). Efficient detection of secondary electrons in low-voltage scanning electron microscopy. Scanning 8, 285–293. Zach, J., and Rose, H. (1988). High-resolution low-voltage electron microprobe with large SE detection efficiency, in Proceedings of the Ninth European Congress on Electron Microscopy, Vol. 1, edited by P.J. Goodhew and H.G. Dickinson. Bristol, UK: Institute of Physics, pp. 81–82. Zadrazˇil, M., El-Gomati, M. M., and Walker, A. (1997). Measurements of very-low-energy secondary and backscattered electron coefficients. J. Comput. Ass. Microsc. 9, 123–124. Zadrazˇil, M., and El-Gomati, M. M. (1998a). Measurements of the secondary and backscattered electron coefficients in the very-low-energy range, in Recent Trends in Charged Particle Optics and Surface Physics Instrumentation, Sixth Seminar, edited by I. Mu¨llerova´ and L. Frank. Brno: Czechoslovak Society for Electron Microscopy, p. 82. Zadrazˇil, M., and El-Gomati, M. M. (1998b). Measurements of the secondary and backscattered electron coefficients in the energy range 250–5000 eV, in Proceedings of the Fourteenth International Congress on Electron Microscopy, Vol. 1, edited by H.A. Calderon Benavides and M.J. Yacaman. Bristol, UK: Institute of Physics, pp. 495–496. Zadrazˇil, M., and El-Gomati, M. M. (2002). Unpublished data.
SCANNING LEEM
443
Zobacˇova´, J., and Frank, L. (2003). Specimen charging and detection of signal from nonconductors in a cathode lens equipped SEM. Scanning, 25, 150–156. Zobacˇova´, J., Oral, M., Hutarˇ , O., Mu¨llerova´, I., and Frank, L (2003). Corrections of magnification and focusing in a cathode lens equipped SEM. To be submitted. Zworykin, V. A., Hillier, J., and Snyder, R. L. (1942). A scanning electron microscope. ASTM Bull. 117, 15–23.
This Page Intentionally Left Blank
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 128
Scale-Space Methods and Regularization for Denoising and Inverse Problems
OTMAR SCHERZER
Department of Computer Science, Universität Innsbruck, Technikerstraße 25, A-6020 Innsbruck, Austria
I. Introduction 446
II. Image Smoothing and Restoration via Diffusion Filtering 447
   A. Level Set Modeling 453
   B. Morphological Diffusion Filtering 455
   C. Applications of Diffusion Filtering 458
   D. Scale-Space Theory 458
III. Regularization of Inverse Problems 460
   A. Tikhonov-Type Regularization Methods 464
   B. Regularization Models for Denoising 465
   C. Relations between Regularization and Perona–Malik Diffusion Filtering 466
   D. Numerical Experiments 468
IV. Mumford–Shah Filtering 472
V. Regularization and Spline Approximation 474
VI. Scale-Space Methods for Inverse Problems 478
   A. Deblurring with a Scale-Space Method 481
   B. Numerical Simulations 484
VII. Nonconvex Regularization Models 493
   A. Perona–Malik Regularization 493
   B. Relative Error Regularization 494
VIII. Discrete BV Regularization and Tube Methods 500
   A. Discrete BV Regularization (Sampling) 502
   B. Finite Volume BV Regularization 504
   C. The Taut String Algorithm 505
   D. Multidimensional Discrete BV Regularization 506
   E. Numerical Test Examples 508
      1. One-Dimensional Test Example 509
      2. Two-Dimensional Bench-Mark Problem 510
IX. Wavelet Shrinkage 510
   A. Daubechies' Wavelets 511
   B. Denoising by Wavelet Shrinkage 514
      1. Relation to Diffusion Filtering 514
X. Regularization and Statistics 517
XI. Conclusions 522
Acknowledgements 523
References 523
Copyright ß 2003 Elsevier Inc. All rights reserved. 1076-5670/2003 $35.00
DENOISING AND INVERSE PROBLEMS

OTMAR SCHERZER

I. INTRODUCTION

Inverse problems and imaging are two of the fastest growing areas in applied mathematics. Such problems appear in a variety of applications such as medical imaging and nondestructive evaluation. A typical example is computerized tomography (CT), where the density of a body is determined from X-ray measurements at the boundary. Inverse problems can be vaguely characterized as the problems of estimating the cause for an observed effect; in CT the cause is the density of the body and the observed effect is the X-ray data at the boundary of the object. With inverse problems one typically associates ill-posedness, that is, that there may not exist a solution, the solution is nonunique, or the solution does not depend continuously on the input data. In order to overcome these difficulties Tikhonov suggested approximating the ill-posed problem by a scale of well-posed variational problems. This initiated the work on regularization methods for the solution of ill-posed problems.

Partial differential equations (PDEs) have proved to be efficient methods in image processing and computer vision. They are mainly used for smoothing and restoration, in particular noise removal. Their success is partly due to the fact that the approximation is independent of the underlying numerical method. The success of PDE methods in image processing has stimulated the development of new efficient numerical algorithms for the solution of inverse problems by constructing variational methods based on the energy formulations of PDEs. Nowadays the interaction between PDE models and variational formulations is subtle and has led to a fruitful interaction of inverse problems and image processing with splines, wavelets, morphology, and statistics. A goal of this survey is to review these interactions. The second goal of this survey is to compare various reconstruction algorithms.

The outline of this work is as follows. In Section II we review image smoothing and restoration with PDEs.
We compare several noise removal (denoising) techniques and show the effect of filtering as a prerequisite step of image analysis, such as segmentation. Moreover, we use the analogy of fluid flow to motivate PDEs for diffusion filtering. In Section III we review regularization methods for the solution of inverse problems and establish the connection between PDEs and variational methods for denoising. Section IV is devoted to the Mumford–Shah filtering method, which is a combined method for image smoothing and segmentation. In Section V we review the interaction between approximate spline filtering and variational methods. Section VI establishes a diffusion framework for the solution of inverse problems, which is linked to Section VII where nonconvex
variational problems are considered. In Section VIII we introduce a discrete framework for regularization and in Section IX we highlight the relation of variational methods and diffusion filtering with wavelets. In Section X we review the interactions of regularization and statistics.
II. IMAGE SMOOTHING AND RESTORATION VIA DIFFUSION FILTERING
PDE-based models have proved to be efficient in a variety of image processing and computer vision areas such as restoration, denoising, segmentation, shape from shading, histogram modification, optical flow, and stereo vision. To demonstrate the efficiency of diffusion filtering we recall a few models. To this end let $u_\delta$ be an image defined on the open domain $\Omega := (0,1)\times(0,1)$.

(1) The simplest and best investigated diffusion filtering technique for image smoothing is the linear heat equation
$$\frac{\partial u}{\partial t} = \Delta u, \qquad (1)$$
associated with homogeneous Neumann boundary data
$$\frac{\partial u}{\partial \nu} = 0;$$
here and in the following $\partial u/\partial\nu$ denotes the derivative of $u$ in the direction normal to the boundary $\partial\Omega$ of $\Omega$. As initial data we use the input image
$$u(0,x) = u_\delta(x) \quad \text{for } x\in\Omega. \qquad (2)$$
It is well known that the heat equation blurs the initial data and spurious noise is filtered. Figure 1 shows heat equation filtering of ultrasound data at specified times.

(2) The heat equation is equally efficient in removing noise and destroying image details such as edges and corners. The total variation flow equation is able to preserve edges and denoise the image simultaneously. Here the differential equation
$$\frac{\partial u}{\partial t} = \nabla\cdot\Big(\frac{1}{|\nabla u|}\,\nabla u\Big) \qquad (3)$$
FIGURE 1. Solution of the heat equation at time $t = 0, 0.1, 1, 10, 100, 1000$.
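The smoothing behavior of (1)–(2) is easy to reproduce numerically. The following sketch is not from the chapter; it is a minimal explicit finite-difference scheme on a hypothetical synthetic test image, with the homogeneous Neumann condition realized by mirror padding. As expected of a conservation-form diffusion, it preserves the mean gray value while reducing the variance.

```python
import numpy as np

def heat_step(u, tau=0.2):
    # One explicit step of u_t = Laplacian(u); mirror padding realizes
    # the homogeneous Neumann condition du/dnu = 0 on the boundary.
    p = np.pad(u, 1, mode="edge")
    lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u
    return u + tau * lap  # stable for tau <= 0.25 (grid spacing h = 1)

def heat_filter(u0, t, tau=0.2):
    # Approximate the solution of (1)-(2) at time t by repeated stepping.
    u = u0.astype(float)
    for _ in range(int(round(t / tau))):
        u = heat_step(u, tau)
    return u

rng = np.random.default_rng(0)
clean = np.zeros((64, 64))
clean[16:48, 16:48] = 1.0                      # synthetic piecewise constant image
noisy = clean + 0.3 * rng.standard_normal(clean.shape)
smooth = heat_filter(noisy, t=2.0)
```

Because the discrete divergence form telescopes, the mean of `smooth` equals that of `noisy` up to roundoff, while the variance strictly decreases; this is the discrete analogue of the blurring visible in Figure 1.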
together with homogeneous Neumann data is applied to the initial data $u_\delta$. Here and in the following $|\cdot|$ denotes the Euclidean norm. A detailed rigorous mathematical analysis of this partial differential equation has been given (see [14–17,23]). The mathematical analysis impressively supports the remarkable properties of this filtering technique (cf. Figure 2).

(3) The Bingham fluid flow equation is able to preserve flat regions and denoise the image simultaneously. This filtering technique requires one to solve the differential equation
$$\frac{\partial u}{\partial t} = \nu\,\nabla\cdot\Big(\frac{1}{|\nabla u|}\,\nabla u\Big) + k\,\Delta u \qquad (4)$$
FIGURE 2. Solution of the total variation flow equation at time $t = 0, 0.01, 0.05, 0.1, 0.5, 1$.
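The edge-preserving character of (3) can be illustrated with a one-dimensional sketch (not from the chapter). Since the exact diffusivity $1/|\nabla u|$ is singular where the gradient vanishes, the code below uses the common regularization $1/\sqrt{u_x^2+\varepsilon^2}$; the scheme, step sizes, and test signal are all illustrative assumptions. The explicit scheme is conservative (zero flux through the boundary) and monotone for the chosen step size, so the discrete total variation does not increase.

```python
import numpy as np

def tv_flow_1d(u0, t, eps=0.1, tau=0.02):
    # Explicit conservative scheme for the regularized 1D total variation
    # flow u_t = (u_x / sqrt(u_x^2 + eps^2))_x with zero-flux (Neumann)
    # boundary conditions; eps avoids division by zero on flat regions.
    u = u0.astype(float)
    for _ in range(int(round(t / tau))):
        du = np.diff(u)                                # u_{i+1} - u_i, h = 1
        flux = du / np.sqrt(du * du + eps * eps)
        u = u + tau * np.diff(np.concatenate(([0.0], flux, [0.0])))
    return u

rng = np.random.default_rng(1)
edge = np.where(np.arange(200) < 100, 0.0, 1.0)        # ideal step edge
noisy = edge + 0.05 * rng.standard_normal(200)
den = tv_flow_1d(noisy, t=1.0)
tv = lambda v: np.abs(np.diff(v)).sum()                # discrete total variation
```

The noise on the plateaus is flattened out while the jump between the two halves of the signal survives almost unchanged, in contrast to the uniform blurring of the heat equation.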
together with homogeneous Neumann conditions and initial data $u_\delta$. The parameters $\nu$ and $k$ are strictly positive. Bingham fluid flow is a widely investigated model in fluid mechanics (see, e.g., [80]), in which the parameters $\nu$ and $k$ have the physical meaning of yield stress and plastic viscosity. The particular properties of Bingham fluids make them extremely useful for image denoising (see [77]). Figure 3 shows the solution of Equation (4) at specified times.

(4) Linear anisotropic diffusion filtering is based on a matrix-valued diffusivity. Let $u_\sigma$ be a smooth approximation of $u_\delta$; then a linear anisotropic diffusion equation is
$$\frac{\partial u}{\partial t} = \nabla\cdot\big(D(\nabla u_\sigma)\,\nabla u\big), \qquad (5)$$
FIGURE 3. Bingham filtering technique with $k = 0.05$ and $\nu = 0.05$ at $t = 0, 0.2, 2, 10, 100, 1000$.
where
$$D(\nabla u_\sigma) = \frac{1}{|\nabla u_\sigma|^2 + 2\lambda^2}\left(\begin{pmatrix}\partial u_\sigma/\partial y\\ -\,\partial u_\sigma/\partial x\end{pmatrix}\begin{pmatrix}\partial u_\sigma/\partial y\\ -\,\partial u_\sigma/\partial x\end{pmatrix}^{T} + \lambda^2 I\right), \qquad (6)$$
with $\lambda > 0$. In image processing the differential Equation (5) is considered with homogeneous Neumann boundary conditions $(D(\nabla u_\sigma)\nabla u)\cdot\nu = 0$ and initial data $u_\delta$.¹ Here $u_\sigma$ is considered an approximation of the filtered data, which can be obtained, for instance, by solving the heat

¹ The product of two vectors $x = (x_1,\dots,x_n)$ and $y = (y_1,\dots,y_n)$ is defined by $x\cdot y = \sum_{i=1}^n x_i y_i$.
equation with initial data $u_\delta$ up to a certain time. In anisotropic diffusion models the matrix $D(\nabla u_\sigma)$ is designed in such a way that its eigenvectors $\tilde v_1$ and $\tilde v_2$ are parallel, respectively orthogonal, to $\nabla u_\sigma$. These methods prefer diffusion along edges to diffusion perpendicular to them.

(5) Nonlinear anisotropic diffusion utilizes a matrix-valued diffusivity which itself depends on the solution. A typical example is
$$\frac{\partial u}{\partial t} = \nabla\cdot\big(D(\nabla u)\,\nabla u\big), \qquad (7)$$
where $D(\cdot)$ is as defined in (6). In Figure 4 we have evolved an image according to the nonlinear anisotropic diffusion equation.

(6) The classical Perona–Malik filter [134,135] is the oldest nonlinear diffusion filter. It is based on the equation
$$\frac{\partial u}{\partial t} = \nabla\cdot\Big(\frac{1}{1 + |\nabla u|^2/\lambda^2}\,\nabla u\Big) \qquad (8)$$
with a positive parameter $\lambda$. In comparison with total variation flow the diffusivity is smaller near edges. So far no completely successful mathematical analysis for this model has been obtained. A few results concerning the existence of a solution have been given in [102].

(7) Mean curvature motion is a widely inspected model in applied mathematics describing phenomena such as crystal growth and polymer processing. The mean curvature equation
$$\frac{\partial u}{\partial t} = |\nabla u|\,\nabla\cdot\Big(\frac{1}{|\nabla u|}\,\nabla u\Big) \qquad (9)$$
is a paradigm of morphological differential equations. The image evolution according to mean curvature motion is shown in Figure 5.

It is common to distinguish between two classes of diffusion filtering techniques. Paying tribute to Perona and Malik [134] for initiating the use of nonlinear diffusion filtering, we call any of the differential equations
$$\frac{\partial u}{\partial t} = \nabla\cdot\big(D(\nabla u)\,\nabla u\big) \qquad (10)$$
FIGURE 4. Solution of nonlinear anisotropic diffusion at time $t = 0, 10, 500, 1000, 5000, 10{,}000$; $\lambda = 10^{-4}$.
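Why the isotropic Perona–Malik model (8) preserves edges can be seen from the shape of its diffusivity: the flux magnitude $s\,g(s^2) = s/(1+s^2/\lambda^2)$ increases for gradients below the contrast parameter $\lambda$ (ordinary smoothing) and decreases above it, so strong edges are diffused only weakly. The following minimal check is illustrative and not from the chapter; $\lambda = 1$ is an arbitrary choice.

```python
import numpy as np

lam = 1.0                                   # contrast parameter lambda in (8)
g = lambda s2: 1.0 / (1.0 + s2 / lam**2)    # Perona-Malik diffusivity

s = np.linspace(0.0, 5.0, 501)              # gradient magnitudes |grad u|
flux = s * g(s**2)                          # magnitude of the diffusive flux
# g decays monotonically with the gradient, and the flux peaks exactly
# at s = lambda: gradients above lambda see a decreasing flux, which is
# the mechanism behind edge preservation (and possible edge enhancement).
```

In contrast, the heat equation corresponds to the constant diffusivity $g \equiv 1$, whose flux grows without bound and therefore smooths edges and flat regions alike.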
Perona–Malik diffusion filtering. Prototypes are the heat equation, total variation flow, the Bingham model, as well as anisotropic diffusion. Typically one also differentiates between
—isotropic Perona–Malik filtering models, where $D$ is a scalar-valued function, and
—anisotropic Perona–Malik filtering models, where $D$ is a matrix-valued function with nonzero off-diagonal entries.
Morphological partial differential equations are invariant under image transformations such as gray level modification and data set deformation. A paradigm of a morphological differential equation is the mean curvature flow equation.

The use of Perona–Malik diffusion filtering can be motivated by showing its analogy to fluid flow.
FIGURE 5. Solution of the mean curvature equation at time $t = 0, 0.1, 1, 10, 100, 1000$.
A. Level Set Modeling

We consider the movement of a fluid in $\Omega = (0,1)\times(0,1)$ over time. The conservation of mass principle (see, e.g., [48]) is expressed by the differential equation
$$\frac{\partial \rho}{\partial t}(t,x) + \nabla\cdot(\rho\,\vec v)(t,x) = 0, \qquad (11)$$
where $\rho$ is the density of the fluid and $\vec v$ is the velocity field of the fluid. For a fixed time $t$ we consider a level curve of the density. A level curve $x(\tau) = (x_1(\tau), x_2(\tau))$, parameterized by $\tau$, satisfies
$$\rho(t, x(\tau)) = \text{constant for all } \tau.$$
Differentiating this equation with respect to $\tau$ we find that
$$\frac{\partial \rho}{\partial x_1}(t,x(\tau))\,\frac{\partial x_1}{\partial \tau}(\tau) + \frac{\partial \rho}{\partial x_2}(t,x(\tau))\,\frac{\partial x_2}{\partial \tau}(\tau) = 0. \qquad (12)$$
The direction of the tangent to the level curve at $x(\tau)$ is
$$\tilde t := \begin{pmatrix} (\partial x_1/\partial\tau)(\tau) \\ (\partial x_2/\partial\tau)(\tau) \end{pmatrix}.$$
If $\tilde t \neq 0$ and
$$\nabla\rho(t,x(\tau)) = \Big(\frac{\partial\rho}{\partial x_1}, \frac{\partial\rho}{\partial x_2}\Big)(x(\tau)) \neq 0,$$
then from (12) it follows that the vectors $\nabla\rho$ and $\tilde t^{\,T}$ are orthogonal; here and in the following $T$ denotes the transpose of a vector or matrix. Fick's law states that the velocity $\vec v$ is orthogonal to the level curves, i.e., $\vec v := C\,\nabla\rho$ with $C < 0$; the negative sign indicates that the direction of the flow is from regions of high density to regions of low density. Different models can be imagined by adequately choosing $C$:

(1) If $C = -1/\rho$, then we get the diffusion filtering technique (1). This choice of $C$ represents the fact that an object in a fluid of higher density moves slower than an object in a fluid of lower density.
(2) If $C = -1/(\rho\,|\nabla\rho|)$, then we get the diffusion filtering technique (3). This choice of $C$ represents the fact that an object in a fluid moves faster at smooth portions of level curves.
(3) If $C = -(1/\rho)\,D(\nabla\rho_\sigma)$, the flux is biased both in the tangential direction and in the normal direction to the level curve.

To derive the analogy of image diffusion filtering and fluid flow we identify the gray value image data with the density of a fluid. Since the fluid flow equations have been derived from the conservation of mass principle, we have mean gray value invariance for diffusion filtering in image processing, that is,
$$\int_\Omega \rho(t,x)\,dx = \text{constant over time}.$$
Without external forces a fluid flow is subsequently simplified: for instance, we might expect that the entropy of the density increases over time and that finally the density approaches a constant function. The analogy between fluid flow and image diffusion filtering suggests a subsequent simplification of the gray value data leading to a scale space of images. In image processing a quantization of these phenomena via Lyapunov functionals has been given by Weickert [172].
B. Morphological Diffusion Filtering

Morphological diffusion filtering techniques, such as (9), are closely related to shape and curve evolutions. To illustrate this connection we recall the definition of curvature. Let
$$c : [0,2\pi) \to \mathbb{R}^2, \qquad \tau \mapsto \begin{pmatrix} x_1(\tau) \\ x_2(\tau) \end{pmatrix}$$
be a closed parameterized curve in $\mathbb{R}^2$; then the standard definition (see, e.g., [33]) of curvature is
$$K = \frac{(\partial x_1/\partial\tau)\,(\partial^2 x_2/\partial\tau^2) - (\partial^2 x_1/\partial\tau^2)\,(\partial x_2/\partial\tau)}{\big((\partial x_1/\partial\tau)^2 + (\partial x_2/\partial\tau)^2\big)^{3/2}},$$
where $\partial\cdot/\partial\tau$ denotes the derivative with respect to the curve parameter $\tau$. Let
$$C : [0,\infty)\times[0,2\pi) \to \mathbb{R}^2, \qquad (t,\tau) \mapsto \begin{pmatrix} x_1(t,\tau) \\ x_2(t,\tau) \end{pmatrix}$$
be a temporally varying oriented closed curve. We consider the curvature-based evolution process
$$\frac{\partial C}{\partial t}(t,\tau) = \beta(K)(t,\tau)\,\vec n(t,\tau), \qquad (13)$$
where $\vec n$ denotes the normal vector to the curve $C$, and $\beta$ is an appropriate scalar-valued function.
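The parametric curvature formula above is easy to check numerically: a circle of radius $r$, traversed counterclockwise, has constant curvature $K = 1/r$. The sketch below (illustrative, not from the chapter) evaluates the formula with periodic central differences on such a circle.

```python
import numpy as np

r = 2.0                                                  # circle radius; exact K = 1/r
tau = np.linspace(0.0, 2.0 * np.pi, 4000, endpoint=False)
x1, x2 = r * np.cos(tau), r * np.sin(tau)

def dtau(f, h):
    # Periodic central difference, matching the closed-curve parameterization.
    return (np.roll(f, -1) - np.roll(f, 1)) / (2.0 * h)

h = tau[1] - tau[0]
x1p, x2p = dtau(x1, h), dtau(x2, h)                      # first derivatives
x1pp, x2pp = dtau(x1p, h), dtau(x2p, h)                  # second derivatives
K = (x1p * x2pp - x1pp * x2p) / (x1p**2 + x2p**2) ** 1.5
```

Up to the $O(h^2)$ error of the central differences, `K` is the constant $1/r = 0.5$ at every parameter value; reversing the orientation of the curve flips the sign of $K$.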
Let $f \in C^2([0,\infty)\times\Omega)$. We assume that the zero level set
$$L(t) := \{x = (x_1,x_2) : f(t,x) = 0\}$$
can be parameterized by a curve $C(t,\tau)$ which evolves according to (13), and that $f$ is locally invertible in a neighborhood $\mathcal N$ of the zero level set, i.e.,
$$\nabla f = \Big(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}\Big) \neq 0 \quad \text{in } \mathcal N.$$
Then $f(t, C(t,\tau)) = 0$ for all $\tau\in[0,2\pi)$ and $t\in[0,\infty)$. Consequently, by differentiation with respect to $t$ and $\tau$ we get
$$\nabla f(t,C(t,\tau))\cdot\frac{\partial C}{\partial t}(t,\tau) + \frac{\partial f}{\partial t}(t,C(t,\tau)) = 0, \qquad \nabla f(t,C(t,\tau))\cdot\frac{\partial C}{\partial \tau}(t,\tau) = 0, \qquad (14)$$
for all $\tau\in[0,2\pi)$, $t\in[0,\infty)$. The latter equation shows that $\nabla f(t,C(t,\tau))$ and the tangential vector $(\partial C/\partial\tau)(t,\tau) = (\partial x_1/\partial\tau, \partial x_2/\partial\tau)^T(t,\tau)$ on the level curve are orthogonal, which implies that $\nabla f(t,C(t,\tau))$ is proportional to the normal vector $(\partial x_2/\partial\tau, -\partial x_1/\partial\tau)^T(t,\tau)$ on the level curve, that is,
$$\nabla f(t,C(t,\tau)) = \Big(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}\Big)(t,C(t,\tau)) = \alpha(t,\tau)\,\Big(\frac{\partial x_2}{\partial\tau}, -\frac{\partial x_1}{\partial\tau}\Big)^T(t,\tau). \qquad (15)$$
Since we assumed that $|\nabla f| \neq 0$ we find that $\alpha(t,\tau) \neq 0$. Then, by differentiation of (15) with respect to $\tau$ and use of (15) to express $\partial x_1/\partial\tau$ and $\partial x_2/\partial\tau$, we get
$$\frac{\partial\alpha}{\partial\tau}\,\frac{\partial x_2}{\partial\tau}(t,\tau) + \alpha(t,\tau)\,\frac{\partial^2 x_2}{\partial\tau^2}(t,\tau) = \frac{1}{\alpha}\Big(\frac{\partial^2 f}{\partial x_1\partial x_2}\frac{\partial f}{\partial x_1} - \frac{\partial^2 f}{\partial x_1^2}\frac{\partial f}{\partial x_2}\Big)(t,C(t,\tau)),$$
$$-\frac{\partial\alpha}{\partial\tau}\,\frac{\partial x_1}{\partial\tau}(t,\tau) - \alpha(t,\tau)\,\frac{\partial^2 x_1}{\partial\tau^2}(t,\tau) = \frac{1}{\alpha}\Big(\frac{\partial^2 f}{\partial x_2^2}\frac{\partial f}{\partial x_1} - \frac{\partial^2 f}{\partial x_1\partial x_2}\frac{\partial f}{\partial x_2}\Big)(t,C(t,\tau)).$$
Consequently, we have
$$K = \frac{1}{|\nabla f|^3}\left\{\Big(\frac{\partial f}{\partial x_2}\Big)^2\frac{\partial^2 f}{\partial x_1^2} - 2\,\frac{\partial f}{\partial x_1}\,\frac{\partial f}{\partial x_2}\,\frac{\partial^2 f}{\partial x_1\partial x_2} + \Big(\frac{\partial f}{\partial x_1}\Big)^2\frac{\partial^2 f}{\partial x_2^2}\right\}. \qquad (16)$$
Let $\alpha < 0$; then
$$\vec n(t,\tau) = \left(\Big(\frac{\partial x_1}{\partial\tau}\Big)^2 + \Big(\frac{\partial x_2}{\partial\tau}\Big)^2\right)^{-1/2}\begin{pmatrix}\partial x_2/\partial\tau \\ -\,\partial x_1/\partial\tau\end{pmatrix}(t,\tau) = -\frac{\nabla f}{|\nabla f|}.$$
Note that if $f(t,\cdot)$ is monotonically increasing towards the interior of a domain $\tilde\Omega(t)$ with boundary $L(t)$, then $-\nabla f/|\nabla f|$ points in the outward direction of $\tilde\Omega(t)$, which implies that $\alpha < 0$. Using the abbreviation $H_f$ for the Hessian of $f$, it follows that
$$\mathrm{curv}(f) := \nabla\cdot\frac{\nabla f}{|\nabla f|} = \frac{|\nabla f|^2\,\Delta f - \nabla f^{\,T} H_f\,\nabla f}{|\nabla f|^3} = K. \qquad (17)$$
This, together with (14), shows that the level set formulation of (13) is
$$\frac{\partial f}{\partial t}(t,x) = \beta\big(\mathrm{curv}(f)(t,x)\big)\,|\nabla f(t,x)|. \qquad (18)$$
Examples of curvature-based morphological processes are summarized in the following:

$\beta(K)$      morphological process
$1$             dilation
$-1$            erosion
$K$             mean-curvature flow
$K^{1/3}$       affine invariant mean-curvature flow
Morphological diffusion filtering methods have been derived axiomatically in [2–4,37].
C. Applications of Diffusion Filtering

The design and mathematical analysis (including existence, uniqueness, and stability results for solutions) of diffusion filtering techniques are active research areas. Appropriate filtering is important as a prerequisite step for image segmentation and edge detection, to mention but a few applications. In the following we apply Canny's edge detection algorithm from the software package MATLAB [112] to the filtered ultrasound examples in Figures 1, 2, 4, and 5, respectively. There are various parameters to be tuned in the implementation of Canny's edge detection algorithm, which may of course have a considerable effect on the detected edges. For reasons of comparison we have used the standard MATLAB setting, which does not require input parameters. Canny's edge detector is a sophisticated algorithm for extracting edges in image data. For more background on this method we refer the reader to [36,108]. As can be seen from Figures 6–10, appropriate filtering is an important prerequisite for edge detection.

D. Scale-Space Theory

Images contain structures at a variety of scales. Any feature can optimally be recognized at a particular scale; this has already been observed in the edge detection example above. If the optimal scale is not available a priori, it is desirable to have an image representation at multiple scales. A scale space is an image representation at a continuum of scales, embedding the image $u_\delta$ into a family $\{T_t(u_\delta) : t \geq 0\}$ of gradually simplified versions satisfying:

(1) Fidelity: $T_0(u_\delta) = u_\delta$.
(2) Causality: $T_{t+s}(u_\delta) = T_t(T_s(u_\delta))$ for all $s, t \geq 0$.
(3) Regularity: $\lim_{t\to 0^+} T_t(u_\delta) = u_\delta$.
FIGURE 6. Canny's edge detector applied to the solution of the heat equation at time $t = 0, 0.1, 1, 10, 100, 1000$. At $t = 1000$ the image is so blurred that no edges could be detected.
The differential equations introduced above satisfy these properties with $T_t(u_\delta) = u(t,\cdot)$. In mathematics, a family of operators $T_t$ satisfying fidelity, causality, and regularity is called a semi-group. For more background on semi-group theory we refer the reader to Pazy [133] (in the linear case) and Brézis [32] (in the nonlinear case).
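For the linear heat scale space, the three semi-group properties can be verified directly. On a periodic grid (an illustrative simplification of the Neumann setting used above, not from the chapter) the heat semi-group acts in Fourier space by multiplying mode $k$ with $e^{-4\pi^2 k^2 t}$, so causality $T_{t+s} = T_t \circ T_s$ holds exactly:

```python
import numpy as np

def T(t, u):
    # Heat semi-group on the periodic grid [0,1): every Fourier mode k of u
    # decays by the factor exp(-4 pi^2 k^2 t).
    k = np.fft.fftfreq(u.size, d=1.0 / u.size)       # integer frequencies
    return np.fft.ifft(np.exp(-4.0 * np.pi**2 * k**2 * t) * np.fft.fft(u)).real

rng = np.random.default_rng(2)
u0 = rng.standard_normal(128)
# fidelity:   T(0, u0) == u0
# causality:  T(t + s, u0) == T(t, T(s, u0))  -- the exponentials multiply
# regularity: T(t, u0) -> u0 as t -> 0+
```

The same identities hold, up to discretization error, for the nonlinear filters of Section II, which is the content of the (nonlinear) semi-group theory cited above.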
FIGURE 7. Canny's edge detector applied to the solution of the total variation flow equation at time $t = 0, 0.01, 0.05, 0.1, 0.5, 1$.
III. REGULARIZATION OF INVERSE PROBLEMS
A vague characterization of inverse problems is that they are concerned with determining causes for a desired or an observed effect. Such problems appear in a variety of applications like (1) Medical imaging such as CT (see, e.g., [25,94,123,170]). A mathematical framework for CT has been analyzed by Radon [140]. The theory has been applied in other areas including radioastronomy (e.g., [28]) and electron microscopy (e.g., [79]). (2) Signal and Image processing, such as the extrapolation of band-limited functions (see, e.g., [35]).
FIGURE 8. Canny's edge detector applied to the solution of the Bingham filtering technique at time $t = 0, 0.2, 2, 10, 100, 1000$.
It is well known that many inverse problems violate Hadamard’s principle of well-posedness, that is, at least one of the following postulates is violated: (1) There exists a solution. (2) The solution is unique. (3) The solution depends continuously on the input data. If one of these properties is violated the problem is said to be ill-posed or improperly posed. Regularization methods are numerical algorithms for solving ill-posed problems in a stable way. In the linear setting Torre and Poggio [165] emphasize that differentiation is ill-posed, and that applying suitable regularization strategies approximates linear diffusion filtering or—equivalently—Gaussian convolution. Much of the linear-scale-space
FIGURE 9. Canny's edge detector applied to the solution of nonlinear anisotropic diffusion at time $t = 0, 10, 500, 1000, 5000, 10{,}000$; $\lambda = 10^{-4}$.
literature is based on the regularization properties of convolutions with Gaussians. In particular, differential geometric image analysis is performed by replacing derivatives by Gaussian-smoothed derivatives; see, e.g., [76,106,126,156] and references therein.

In order to present a general framework of regularization methods it is convenient to consider an inverse problem as the problem of solving an ill-posed operator equation
$$F(u) = y_0. \qquad (19)$$
Here $F : \mathcal D(F) \subseteq X \to Y$ is an operator defined on an appropriate subset $\mathcal D(F)$ of a space $X$.
FIGURE 10. Canny's edge detector applied to the solution of the mean curvature motion at time $t = 0, 0.1, 1, 10, 100, 1000$.
We use the following terminology:

Linear inverse problems: $F$ is a linear operator.
(1) If $F = I$, the identity operator, then the linear inverse problem is called denoising.
(2) If $F$ is a convolution, that is,
$$Fu(x) = \int_\Omega k(|x-y|^2)\,u(y)\,dy, \qquad (20)$$
with $k$ a smooth function and $|\cdot|$ the Euclidean distance, then the problem is referred to as deblurring.
(3) If F is the Radon transform (see, e.g., [123]), then the problem of solving (19) is the problem of computerized tomography.
Nonlinear inverse problems: if F is a nonlinear operator.
Regularization methods were first considered by Tikhonov in 1930. Since that time regularization theory has developed systematically. (1) During the 1980s there was success in a rigorous analysis of linear ill-posed problems. We mention the books of Louis [107], Groetsch [86], Tikhonov and Arsenin [164], Morozov [116], Nashed [122], Engl and Groetsch [66], Natterer [123,124], Bertero and Boccacci [25], Kirsch [103], and Colton and Kress [52,53,105]. See also Groetsch [83,84] for some elementary introduction in the topic of inverse problems. (2) Since 1989, starting with three fundamental papers of Seidman and Vogel [154] and Engl and co-workers [68,125], regularization theory for nonlinear inverse problems developed systematically. Some expository books on this topic are [21,67,99,116,117], to name but a few. (3) Acar and Vogel [1] and Geman and Yang [78] proposed a novel framework of nondifferentiable regularization of Tikhonov type. This work stimulated the development of regularization methods for efficiently recovering discontinuous solutions in inverse problems.
A. Tikhonov-Type Regularization Methods

Tikhonov proposed approximating the solution of the operator Equation (19) by the minimizer of the functional (Tikhonov regularization)
$$f_\alpha(u) := \|F(u) - y_\delta\|_Y^2 + \alpha\,\|u - u^*\|_X^2. \qquad (21)$$
Here, $u^* \in X$ is some initial (a priori selected) guess of the desired solution and $y_\delta$ is an approximation of the right-hand-side data in (19). The classical theory of regularization methods assumes a Hilbert space setting, that is, $X$ and $Y$ are Hilbert spaces and $F : \mathcal D(F)\subseteq X \to Y$ is
(1) continuous and
(2) weakly (sequentially) closed, that is, for any sequence $\{u_n\}_{n\in\mathbb N}\subseteq \mathcal D(F)$, $u_n \rightharpoonup u$ in $X$ and $F(u_n)\rightharpoonup y$ in $Y$ imply $u\in\mathcal D(F)$ and $F(u)=y$.
The Hilbert spaces $X$ and $Y$ are associated with inner products $\langle\cdot,\cdot\rangle_X$ and $\langle\cdot,\cdot\rangle_Y$ and norms $\|\cdot\|_X$ and $\|\cdot\|_Y$,
respectively. In almost every application considered in practice $Y = L^2(\Omega)$ with inner product
$$\langle f, g\rangle = \int_\Omega fg$$
is used; typically for $X$, Sobolev spaces of weakly differentiable functions are used. There exists a variety of results showing that Tikhonov's approach in fact yields a regularization method, that is,
(1) there exists a minimizer of (21),
(2) for fixed $\alpha > 0$ the minimizers are stable with respect to perturbations in $y_\delta$, and
(3) even more, it has been proved that for an appropriate choice of $\alpha$ the minimizer is an approximation of the solution of (19).
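For a linear operator on $\mathbb{R}^n$ the minimizer of (21) (with $u^* = 0$) can be written down explicitly: it solves the regularized normal equations $(A^T A + \alpha I)u = A^T y_\delta$. The sketch below is illustrative and not from the chapter; the Hilbert matrix is merely a standard severely ill-conditioned test operator, and the noise level and $\alpha$ are arbitrary choices.

```python
import numpy as np

n = 8
# Hilbert matrix: a classical severely ill-conditioned test operator.
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
u_true = np.ones(n)
rng = np.random.default_rng(3)
y = A @ u_true + 1e-4 * rng.standard_normal(n)       # noisy data y_delta

u_naive = np.linalg.solve(A, y)                      # unregularized: noise explodes
alpha = 1e-4
# Tikhonov minimizer of ||A u - y||^2 + alpha * ||u||^2 (u* = 0):
u_alpha = np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ y)

err_naive = np.linalg.norm(u_naive - u_true)
err_tik = np.linalg.norm(u_alpha - u_true)
```

The unregularized solve amplifies the data noise by the (huge) condition number, while the Tikhonov solution trades a small approximation bias for stability; choosing $\alpha$ is exactly the trade-off governed by the parameter choice rules mentioned above.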
B. Regularization Models for Denoising

For denoising we have $F = I$ and $y_\delta = u_\delta$. Tikhonov-type regularization methods for denoising consist in minimizing a functional
$$f_\alpha(u) := \int_\Omega (u - u_\delta)^2 + \alpha\int_\Omega g(|\nabla u|^2). \qquad (22)$$
(1) $g(t) = t$ is referred to as $H^1$-semi-norm regularization.
(2) A popular specific energy functional arises from unconstrained total variation denoising [1,41,43,46]. Here $g(t) = \sqrt{t}$. This method is called BV-semi-norm regularization.
(3) The combination of $H^1$- and BV-semi-norm regularization gives $g(t) = kt + \sqrt{t}$. This method exhibits similar filtering properties as the Bingham fluid flow.
(4) The regularization counterpart to linear anisotropic diffusion filtering consists in minimizing the functional
$$\int_\Omega (u - u_\delta)^2 + \alpha\int_\Omega \big|\Lambda^{1/2} V\,\nabla u\big|^2, \qquad (23)$$
where
$$D(\nabla u_\sigma) = V^T \Lambda V$$
is the singular value decomposition of $D(\nabla u_\sigma)$, with
$$\Lambda^{1/2} = \begin{pmatrix}\lambda_1^{1/2} & 0\\ 0 & \lambda_2^{1/2}\end{pmatrix}$$
and $\lambda_1$ and $\lambda_2$ the singular values of $D(\nabla u_\sigma)$.
(5) Minimizing the functional in (23), where $\Lambda$ and $V$ are dependent on $u$, results in the regularization counterpart to nonlinear anisotropic diffusion.

C. Relations between Regularization and Perona–Malik Diffusion Filtering

Let us assume that the functional (22) is defined on the Sobolev space $H^1(\Omega)$, that is, the space of weakly differentiable functions. Moreover, we assume that there exists a minimizer, which is denoted by $u_\alpha$. Then for any $h\in H^1(\Omega)$ and any real number $t$ the definition of $u_\alpha$ implies that
$$f_\alpha(u_\alpha + th) - f_\alpha(u_\alpha) \geq 0,$$
which is equivalent to
$$\int_\Omega \big\{(u_\alpha + th - u_\delta)^2 - (u_\alpha - u_\delta)^2\big\} + \alpha\int_\Omega \big\{g\big(|\nabla u_\alpha|^2 + 2t\,\nabla u_\alpha\cdot\nabla h + t^2|\nabla h|^2\big) - g\big(|\nabla u_\alpha|^2\big)\big\} \geq 0.$$
If $g$ is twice differentiable, then by making a Taylor series expansion we find
$$g\big(|\nabla u_\alpha|^2 + 2t(\nabla u_\alpha\cdot\nabla h) + t^2|\nabla h|^2\big) = g\big(|\nabla u_\alpha|^2\big) + \big(2t\,\nabla u_\alpha\cdot\nabla h + t^2|\nabla h|^2\big)\,g'\big(|\nabla u_\alpha|^2\big) + O(t^2) = g\big(|\nabla u_\alpha|^2\big) + 2t(\nabla u_\alpha\cdot\nabla h)\,g'\big(|\nabla u_\alpha|^2\big) + O(t^2).$$
Therefore, for $t > 0$ we have
$$0 \leq \int_\Omega \big(2h(u_\alpha - u_\delta) + th^2\big) + 2\alpha\int_\Omega (\nabla u_\alpha\cdot\nabla h)\,g'\big(|\nabla u_\alpha|^2\big) + O(t).$$
Taking the limit $t\to 0^+$ shows
$$0 \leq \int_\Omega h(u_\alpha - u_\delta) + \alpha\int_\Omega (\nabla u_\alpha\cdot\nabla h)\,g'\big(|\nabla u_\alpha|^2\big).$$
Repeating the above calculation with $-t$ instead of $t$ gives
$$0 \geq \int_\Omega h(u_\alpha - u_\delta) + \alpha\int_\Omega (\nabla u_\alpha\cdot\nabla h)\,g'\big(|\nabla u_\alpha|^2\big).$$
Thus, in total, we have
$$0 = \int_\Omega h(u_\alpha - u_\delta) + \alpha\int_\Omega (\nabla u_\alpha\cdot\nabla h)\,g'\big(|\nabla u_\alpha|^2\big).$$
Then, by using Green's formula, we get
$$0 = \int_\Omega h(u_\alpha - u_\delta) - \alpha\int_\Omega \nabla\cdot\big(g'(|\nabla u_\alpha|^2)\,\nabla u_\alpha\big)\,h + \alpha\int_{\partial\Omega} g'\big(|\nabla u_\alpha|^2\big)(\nabla u_\alpha\cdot\nu)\,h = \int_\Omega h\,\big(u_\alpha - u_\delta - \alpha\,\nabla\cdot\big(g'(|\nabla u_\alpha|^2)\,\nabla u_\alpha\big)\big) + \alpha\int_{\partial\Omega} h\,g'\big(|\nabla u_\alpha|^2\big)(\nabla u_\alpha\cdot\nu), \qquad (24)$$
where $\nu$ denotes the unit normal vector on $\partial\Omega$. If $g' > 0$, then since (24) holds for all $h\in H^1(\Omega)$, we find
$$u_\alpha - u_\delta = \alpha\,\nabla\cdot\big(g'(|\nabla u_\alpha|^2)\,\nabla u_\alpha\big) \quad \text{on } \Omega, \qquad 0 = \frac{\partial u_\alpha}{\partial\nu} = (\nabla u_\alpha\cdot\nu) \quad \text{on } \partial\Omega. \qquad (25)$$
In particular, setting $\alpha = t$, $u(0) = u_\delta$, $u(t) = u_\alpha$ shows that Tikhonov regularization with small regularization parameter $t$ provides an approximation of the solution of the diffusion filtering method
$$\frac{\partial u}{\partial t} = \nabla\cdot\big(g'(|\nabla u|^2)\,\nabla u\big) \quad \text{on } \Omega, \qquad \frac{\partial u}{\partial\nu} = 0 \quad \text{on } \partial\Omega, \qquad u(0) = u_\delta \quad \text{on } \Omega,$$
at time $t$. In other words, the regularization parameter and the diffusion time can be identified if regularization is regarded as time-discrete diffusion filtering with a single implicit time step [115,138,139,145,148,158]. Moreover,
iterated regularization with small regularization parameters, consisting in subsequently minimizing the functionals
$$f^{(k)}(u) := \int_\Omega \big(u - u^{(k-1)}\big)^2 + \alpha\int_\Omega g\big(|\nabla u|^2\big), \qquad k = 1, 2, \dots \qquad (26)$$
(with $u^{(0)} := u_\delta$) and denoting the minimizers by $u^{(k)}$, approximates a diffusion process. The basic connection between regularization and diffusion filtering methods is the basis of both practical considerations and fundamental mathematical theory, such as (nonlinear) semi-group theory (see [31,133]).

D. Numerical Experiments

The numerical experiments presented below have been considered in [138,139,148] and illustrate the behavior of different regularization strategies. For more details on the numerical implementation we refer to these papers. Figure 11 shows three common test images and a noisy variant of each of them: an outdoor scene with a camera, a magnetic resonance (MR) image of a human head, and an indoor scene. Gaussian noise with zero mean has been added; its variance was chosen to be a quarter of, equal to, and four times the image variance, respectively. We applied linear and total variation regularization to the three noisy test images, used 1, 4, and 16 regularization steps, and varied the regularization parameter until the optimal restoration was found. Discretizing stabilized total variation regularization with
$$g(x) = \sqrt{\varepsilon^2 + x}$$
leads to a nonlinear system of equations. The system of nonlinear equations was solved numerically for $\varepsilon = 0.1$ by combining convergent fixed point iterations as outer iterations [62] with inner iterations using the Gauss–Seidel algorithm for solving the linear systems of equations. The results are shown in Figures 12 and 13. Figure 14 shows BV-denoised and rendered 3D ultrasound data. This gives rise to the following conclusions (Figure 15):
In all cases total variation (BV) regularization performed better than Tikhonov regularization. As expected, total variation regularization leads to visually sharper edges. The BV-restored images consist of piecewise almost constant patches.
FIGURE 11. Test images. Top left: Camera scene. Top right: Gaussian noise added. Middle left: Magnetic resonance image. Middle right: Gaussian noise added. Bottom left: Office scene. Bottom right: Gaussian noise added.
In the linear case, iterated Tikhonov regularization produced better restorations than noniterated. Visually, noniterated regularization results in images with more high-frequency fluctuations. Improvements caused by iterating the regularization were mainly seen between 1 and 4
FIGURE 12. Optimal restoration results for $H^1$-regularization (cf. (22)). Top left: Camera, 1 iteration. Top right: Camera, 16 iterations. Middle left: MR image, 1 iteration. Middle right: MR image, 16 iterations. Bottom left: Office, 1 iteration. Bottom right: Office, 16 iterations.
iterations. Increasing the iteration number to 16 hardly leads to further improvements. It appears that the theoretical and experimental results in the linear setting do not necessarily carry over to the nonlinear case with total variation regularization. For the slightly degraded camera
FIGURE 13. Optimal restoration results for total variation regularization. Top left: Camera, 1 iteration. Top right: Camera, 16 iterations. Middle left: MR image, 1 iteration. Middle right: MR image, 16 iterations. Bottom left: Office, 1 iteration. Bottom right: Office, 16 iterations.
image, iterated regularization performed worse than noniterated regularization. For the MR image, the differences are negligible, and the highly degraded office scene allows better restoration results with iterated regularization.
FIGURE 14. Bounded variation seminorm denoising with $g(t) = \sqrt{t + \varepsilon^2}$ and $\alpha = 0.001$ of three-dimensional ultrasound data (top). The left column shows the renderings for noniterated, the right column for iterated regularization. The regularization parameter for iterated regularization was $\alpha = 2$.
IV. MUMFORD–SHAH FILTERING

FIGURE 15. Results for the MR image from Figure 11(a) with noniterated and iterated regularization ($\alpha = 0.001$). The left column shows the results for noniterated, the middle column for iterated regularization. The images in the right column depict the modulus of the differences between the results for the iterated and noniterated method.

The Mumford–Shah filtering technique has been proposed in [118] for simultaneous filtering and edge detection of noisy piecewise continuous data. Since then the Mumford–Shah technique has received considerable interest: theoretically, due to the challenging mathematics involved (see, e.g., [6–10,115] and references therein); for segmentation applications; and for its numerical implementation (see, e.g., [29,30,38,39,59,60]). Formally the Mumford–Shah segmentation model looks like a regularization functional, and consists in minimizing the functional
$$f(u, K) := \int_\Omega (u - u_\delta)^2 + \alpha_1\int_{\Omega\setminus K} |\nabla u|^2 + \alpha_2\,\mathcal H^1(K). \qquad (27)$$
Here, 1 > 0, 2 > 0, and K is the discontinuity set (edges and corners) of u, which is assumed to be of finite one-dimensional Hausdorff measure, i.e., H1 ðKÞ < 1; n K denotes the set excluded by the discontinuity set K. For instance for a rectifiable curve, the one-dimensional Hausdorff measure is the length of the curve. The minimizer u :¼ u 1 , 2 is the filtered data with discontinuity set K :¼ K 1 , 2 .
The functional α₁ ∫_{Ω∖K} |∇u|² + α₂ H¹(K) serves as a penalization term; it is designed to simultaneously penalize (1) high oscillations of the filtered data u outside the discontinuity set and (2) complex (long) discontinuity sets K. For the numerical minimization of f(u, K) a common tool is nonlocal approximation, such as the Ambrosio–Tortorelli approximation (see [9–11]), where the minimizer of f(u, K) is approximated by the minimizer of the functional

f_ε(u, w) := ∫_Ω (u − u^δ)² + α₁ ∫_Ω w² |∇u|² + α₂ ∫_Ω ( ε |∇w|² + (1/ε)(1 − w)² ).  (28)

This functional is minimized with respect to (u, w) ∈ H¹(Ω) × H¹(Ω) and no longer involves tedious minimization over a family of discontinuity sets K. The term (1/ε) ∫ (1 − w)² in (28) penalizes w ≠ 1. For ε → 0⁺ this term becomes dominant and the set where w ≠ 1 becomes one-dimensional, e.g., a curve of finite length; eventually the set {w ≠ 1} becomes the discontinuity set K of the minimizer of the Mumford–Shah functional (27). The minimizer (u, w) of the functional (28) satisfies the optimality condition, a system of coupled partial differential equations

(u − u^δ) − α₁ ∇·(w² ∇u) = 0,
α₁ w |∇u|² − α₂ ε Δw − (α₂/ε)(1 − w) = 0,  (29)

together with homogeneous Neumann boundary conditions for both u and w. Figure 16 shows some numerical simulations of Mumford–Shah segmentation and filtering obtained by solving the system of coupled differential equations (29).
V. REGULARIZATION AND SPLINE APPROXIMATION

So far, the regularization models have been presented in an infinite dimensional setting. In this section we review a relation between regularization and cubic spline approximation, using a semi-infinite dimensional setting.
FIGURE 16. Top: Test data u^δ. Bottom left: u solving (29). Bottom right: w approximating the discontinuity set.
Suppose u₀ is a smooth function on 0 ≤ x ≤ 1 and noisy samples u_i^δ of the values u₀(x_i) are known at the points of a uniform grid

Δ = {0 = x₀ < x₁ < ... < x_n = 1}.

Let h = x_{i+1} − x_i be the mesh size of the grid and suppose

|u_i^δ − u₀(x_i)| ≤ δ,  (30)

where δ is a known level of noise in the data. For simplicity of presentation we assume that the boundary data are known exactly:

u₀^δ = u₀(0)  and  u_n^δ = u₀(1).
We are interested in finding a smooth approximation ∂u/∂x of ∂u₀/∂x on (0, 1) from the given data u_i^δ. To make the computations concrete we have to quantify the term "smooth," which we characterize by the size of the second derivative: we consider a function smooth if its second derivative is small. The approximation problem can then be formulated as a constrained optimization problem.

Problem 5.1. Minimize ∫₀¹ (∂²u/∂x²)² among all smooth functions u satisfying u(0) = u₀(0), u(1) = u₀(1), and

(1/(n−1)) Σ_{i=1}^{n−1} (u_i^δ − u(x_i))² ≤ δ².  (31)

Then take the derivative ∂u*/∂x of the minimizing element u* as an approximation of ∂u₀/∂x. In fact, given the uncertainty in the data, every function u satisfying (31) can be considered a solution candidate; the minimizer of Problem 5.1 is the particular candidate that is "smoothest." If the minimizing element u* of Problem 5.1 satisfies the constraint (31) with strict inequality (i.e., the constraint (31) is inactive), then

u*(x) = u₀(0) + x (u₀(1) − u₀(0)),  (32)

i.e., it is the straight line interpolating the two boundary values. This case occurs if and only if the interpolating line itself satisfies the constraint (31). Excluding this trivial case, the minimizer u* satisfies (31) with equality and hence can be calculated with the method of Lagrange multipliers. If 1/α denotes the corresponding Lagrange multiplier for constraint (31), the equivalent formulation of Problem 5.1 is:

Problem 5.2. Minimize

f(u) := (1/(n−1)) Σ_{i=1}^{n−1} (u_i^δ − u(x_i))² + α ∫₀¹ (∂²u/∂x²)²  (33)

among all smooth functions u satisfying u(0) = u₀(0), u(1) = u₀(1), where α is such that the minimizing element u_α of (33) satisfies

(1/(n−1)) Σ_{i=1}^{n−1} (u_i^δ − u_α(x_i))² = δ².  (34)
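A discrete sketch of Problems 5.1/5.2: replace ∫(∂²u/∂x²)² by squared second differences, solve the quadratic problem (33) for a trial α, and bisect on α until the residual condition (34) holds (the residual is monotone in α). The boundary interpolation constraints are dropped for brevity, and the test function, noise level, and tolerances are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 101
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
u0 = np.sin(2 * np.pi * x)                       # smooth "exact" function (assumed)
delta = 0.02
f = u0 + delta * rng.uniform(-1.0, 1.0, n)       # noisy samples, |noise| <= delta

# second-difference operator: discrete analogue of d^2/dx^2
K = (np.eye(n, k=1) - 2 * np.eye(n) + np.eye(n, k=-1))[1:-1] / h**2

def minimizer(alpha):
    # minimizes mean((u - f)^2) + alpha * h * ||K u||^2, cf. (33)
    A = np.eye(n) / n + alpha * h * (K.T @ K)
    return np.linalg.solve(A, f / n)

def residual(alpha):
    u = minimizer(alpha)
    return np.mean((u - f) ** 2)

# discrepancy principle (34): pick alpha so that the residual equals delta^2
lo, hi = 1e-14, 1e2
for _ in range(200):
    mid = np.sqrt(lo * hi)                       # bisection in log scale
    lo, hi = (mid, hi) if residual(mid) < delta**2 else (lo, mid)
u_star = minimizer(lo)
du = np.gradient(u_star, x)                      # smooth estimate of u0'
```

The derivative estimate `du` is far less noisy than differencing the raw data, which is the point of smoothing before differentiating.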
The derivative ∂u_α/∂x of the minimizing function u_α is then an approximation of ∂u₀/∂x. This model differs essentially from the regularization models considered in Section III, since here discrete sample data are available. Also, the goal is to find a smooth approximation of the derivative of u, and not just a denoised approximation of u as in the regularization methods considered in Section III.B. Problem 5.2 is a special instance of Tikhonov regularization; the way of choosing the regularization parameter in Problem 5.2 is called the discrepancy principle [87]. Except for the interpolatory constraints at the boundary of the interval, (33) has been investigated and solved by Schoenberg [151] and Reinsch [142], who showed that the solution of Problem 5.2 is a natural cubic spline over the grid Δ. Reinsch also gives a constructive algorithm for calculating this spline. A more comprehensive account of the interaction between cubic spline approximation and numerical differentiation can be found in [92]; see also Hanke [90]. The interaction between regularization and spline approximation is not limited to cubic splines. In fact, it can be shown that the optimal solution u_α of the functional

Σ_{i∈ℤ} (u_i^δ − u(x_i))² + α ∫ (∂^m u/∂x^m)²,  m = 1, 2, ...,

is a combination of B-splines of order n = 2m − 1, i.e.,

u_α(x) = Σ_{k∈ℤ} c_k β^{2m−1}(x − k),

where β^{2m−1}(·) denotes the B-spline of order 2m − 1, which is defined by the (2m)-fold convolution

β^{2m−1} = β⁰ * β⁰ * ... * β⁰  (2m times),

where * denotes convolution and

β⁰(x) = 1 for −1/2 < x < 1/2,  β⁰(x) = 1/2 for |x| = 1/2,  β⁰(x) = 0 otherwise.

(ℤ denotes the set of integers ..., −1, 0, 1, ...)
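The convolution definition of β^{2m−1} can be checked numerically: repeatedly convolving the sampled box function β⁰ with itself (scaled by the mesh width) approximates the B-spline. A small sketch, with the grid resolution as the only assumed parameter.

```python
import numpy as np

h = 1e-3
beta0 = np.ones(int(round(1.0 / h)))       # beta^0 sampled on (-1/2, 1/2)

def bspline(order):                        # order = 2m - 1, i.e. 2m box factors
    b = beta0.copy()
    for _ in range(order):
        b = np.convolve(b, beta0) * h      # discrete approximation of convolution
    return b

b3 = bspline(3)                            # cubic B-spline, supported on (-2, 2)
# b3 integrates to 1 and attains its maximum 2/3 at the origin.
```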
A survey of the interaction of splines and regularization can be found in Unser [168]. The topic of numerical differentiation has been studied extensively before (see, e.g., Torre and Poggio [165], Murio [119], Groetsch [82–84]). For more background on spline approximation we refer to Schoenberg [151], Reinsch [142], Schultz [152], Strang and Fix [157], de Boor [58], Schumaker [153], Rice and Rosenblatt [143], and Wahba [169], to mention a few.

VI. SCALE-SPACE METHODS FOR INVERSE PROBLEMS

As has been shown in [147], the concept of diffusion filtering cannot be used directly for the solution of ill-posed operator equations. The argument is outlined below. For the moment we restrict our attention to Tikhonov functionals defined on the Sobolev space H¹(Ω) of differentiable functions, where the stabilization term ‖u − u*‖²_X in (21) is replaced by

∫_Ω g(|∇u|²).

Then, arguing as in Section III.C, the minimizer u_α of (22) satisfies, for any h ∈ H¹(Ω) and any real number t,

f(u_α + th) − f(u_α) ≥ 0,

which is equivalent to

∫_Ω [ (F(u_α + th) − y^δ)² − (F(u_α) − y^δ)² ] + α ∫_Ω [ g(|∇u_α|² + 2t ∇u_α·∇h + t² |∇h|²) − g(|∇u_α|²) ] ≥ 0.

A Taylor series expansion of (F(u_α + th) − y^δ)² gives

(F(u_α + th) − y^δ)² = (F(u_α) − y^δ)² + 2t (F′(u_α)h)(F(u_α) − y^δ) + O(t²).

Then, by similar arguments as in Section III.C, we find

0 = ∫_Ω (F′(u_α)h)(F(u_α) − y^δ) + α ∫_Ω (∇u_α·∇h) g′(|∇u_α|²).
Using Green's formula, we get

0 = ∫_Ω h [ F′(u_α)*(F(u_α) − y^δ) − α ∇·( g′(|∇u_α|²) ∇u_α ) ] + α ∫_{∂Ω} h g′(|∇u_α|²) ∂u_α/∂ν.

Here F′(u_α)* denotes the L²-adjoint of F′(u_α), i.e.,

∫_Ω F′(u_α)*(ξ) w = ∫_Ω ξ F′(u_α)(w)  for all ξ ∈ L²(Ω), w ∈ H¹(Ω).

This shows that the optimality criterion for the minimizer u_α of (21) is

F′(u_α)*(F(u_α) − y^δ) = α ∇·( g′(|∇u_α|²) ∇u_α ) on Ω,
∂u_α/∂ν = 0 on ∂Ω.

In the case of noise-free attainable data, that is for y^δ = y₀ = F(u†), we have

F′(u_α)*(F(u_α) − F(u†)) = α ∇·( g′(|∇u_α|²) ∇u_α ),

and there exists an associated diffusion-type methodology

F′(u)* F′(u) ∂u/∂t = ∇·( g′(|∇u|²) ∇u ) on (0, ∞) × Ω,
∂u/∂ν = 0 on (0, ∞) × ∂Ω,
u(0) = u† on Ω.  (35)

Due to the ill-posedness of the operator equation (19) there will in general not exist a solution of (19) when y₀ is replaced by y^δ ≠ y₀. The ill-posedness thus prohibits an a priori estimation of an approximation of u†; since (35) must be initialized with u†, the method is inappropriate for calculating a scale space of an inverse problem. The relation to diffusion filtering becomes apparent if we use F = I, Y = L²(Ω) (the space of square integrable functions), X = H¹(Ω) (the Sobolev space of weakly differentiable functions), and the H¹(Ω)-seminorm for regularization, that is, g(x) = x.
In this setting the minimizer u_α of the Tikhonov functional satisfies

u_α − α Δu_α = u^δ on Ω,
∂u_α/∂ν = 0 on ∂Ω.

Thus for α → 0⁺ the diffusion filtering equation (1) is approximated. The iterative Tikhonov–Morozov method is a variant of Tikhonov regularization for solving inverse problems. It consists in iteratively minimizing the sequence of functionals

f^{(k)}(u) := ‖F(u) − y^δ‖²_Y + α_k ‖u − u^{(k−1)}‖²_X,  k = 1, 2, ...,  (36)

and denoting the minimizer by u^{(k)}. If the functionals f^{(k)} are convex, then the minimizers u^{(k)} satisfy

F′(u^{(k)})*_{XY}(F(u^{(k)}) − y^δ) + α_k (u^{(k)} − u^{(k−1)}) = 0,  k = 1, 2, ...  (37)

Here F′(u)*_{XY} denotes the adjoint of F′(u) with respect to the spaces X and Y, that is,

⟨F′(u)*_{XY}(ξ), w⟩_X = ⟨ξ, F′(u)(w)⟩_Y  for all ξ ∈ Y, w ∈ X.

Typically in the Tikhonov–Morozov method one sets u^{(0)} = 0, but any other choice is suitable as well; for example, a priori information on the solution may be incorporated in the initial approximation u^{(0)}. Taking α_k = 1/(t_k − t_{k−1}) shows that u^{(k)} and u^{(k−1)} can be considered as approximations, at times t_k and t_{k−1}, of the solution u of the asymptotic Tikhonov–Morozov filtering technique

∂u/∂t = −F′(u)*_{XY}(F(u) − y^δ) in (0, ∞) × Ω,
u(0, ·) = u^{(0)} = 0 in Ω.  (38)

For F = I, the embedding operator from H¹(Ω) into L²(Ω), the iterative Tikhonov–Morozov method, where we use the H¹-seminorm for regularization instead of the full norm, generates minimizers u^{(k)} of the functionals

f^{(k)}(u) := ∫_Ω (u − u^δ)² + α_k ∫_Ω |∇u − ∇u^{(k−1)}|²,  k = 1, 2, ...  (39)
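The iteration (39) is easy to realize in one dimension: the Euler equation of (39) gives the linear system (I − α_k Δ_h) u^{(k)} = u^δ − α_k Δ_h u^{(k−1)}, with Δ_h a finite-difference Laplacian with Neumann boundary rows. Starting from u^{(0)} = 0 with an exponentially decreasing α_k sequence produces the inverse scale space described below; grid size, noise level, and the α_k schedule are assumed, illustrative values.

```python
import numpy as np

n = 256
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
rng = np.random.default_rng(2)
u_delta = np.sign(np.sin(3 * np.pi * x)) + 0.1 * rng.standard_normal(n)

# 1-D Laplacian with homogeneous Neumann boundary rows
L = (np.eye(n, k=1) - 2 * np.eye(n) + np.eye(n, k=-1)) / h**2
L[0, 0] = -1 / h**2
L[-1, -1] = -1 / h**2

u = np.zeros(n)                          # u^(0) = 0: totally diffused start
iterates = []
for k in range(20):
    a = 10.0 * 0.5**k                    # exponentially decreasing alpha_k
    # Euler equation of (39): (u - u_delta) - a * L (u - u_prev) = 0
    u = np.linalg.solve(np.eye(n) - a * L, u_delta - a * (L @ u))
    iterates.append(u.copy())
```

The first iterate is nearly constant (the coarsest scale); as α_k decreases the iterates approach the data u^δ, so the family runs from a totally blurred signal toward the input — the inverse scale-space behavior illustrated in Figure 17.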
Accordingly, the asymptotic Tikhonov–Morozov method consists in solving the third-order differential equation

u − u^δ = Δ ∂u/∂t in (0, ∞) × Ω,
∂u/∂ν = 0 on (0, ∞) × ∂Ω,
u(0, ·) = 0 on Ω.  (40)

Figure 17 shows the evolution of the solution of the differential equation (40): it starts with a completely diffused image, and as t → ∞ the input data is restored. In analogy to scale-space theory (cf. Section II.D) we call this method the inverse scale-space method, since it generates a data representation at a continuum of scales, embedding the input data u^δ into a family of gradually simplified versions initialized with a totally blurred image. In Section VI.A we discuss the asymptotic Tikhonov–Morozov method for deblurring images. In this case, F is a linear integral operator. For this particular model problem we can motivate the preferences of different numerical methods in inverse problems and image processing.

A. Deblurring with a Scale-Space Method

We consider a problem of deblurring data to recover a function u† on Ω = (0, 1)², given (blurred) data

y^δ = F u† + noise := ∫_Ω k(|x − y|) u†(y) dy + noise

on Ω. To formulate the Tikhonov–Morozov method we have to specify a similarity measure for the data and an appropriate function space containing u†. In this section we restrict our attention to u† in one of the following three spaces:

(1) The Sobolev space H¹(Ω), that is, the Hilbert space of weakly differentiable functions u that satisfy

‖u‖_{H¹} := ( ∫_Ω |∇u|² + ω|u|² )^{1/2} < ∞

with an appropriate positive weighting parameter ω > 0.
FIGURE 17. The result of nonstationary regularization (39) with regularization parameters α₁ = 100,000, α₂ = 50,000, α₃ = 25,000, α₄ = 12,500, α₅ = 6250, α₆ = 3125, α₇ = 1500, α₈ = 750, α₉ = 300, α₁₀ = 150, α₁₁ = 75, and α₁₂ = 30 (the last image is not visually different from the input data).
(2) The more general Banach space W^{1,p}(Ω), with p > 1, of functions u satisfying

‖u‖_{W^{1,p}} := ( ∫_Ω |∇u|^p + ω|u|^p )^{1/p} < ∞,

with an appropriate positive weighting parameter ω > 0.

(3) The space BV(Ω) of functions of bounded variation, that is, the class of functions u satisfying

‖u‖_{BV(Ω)} := ∫_Ω (|∇u| + ω|u|) < ∞.
For a function u ∈ BV(Ω) the term ∫_Ω |∇u| has to be understood as a measure (see [73]).
An appropriate choice for the similarity measure is the L²(Ω)-norm. Depending on the a priori information on u†, it is instructive to study the Tikhonov–Morozov method in a variety of settings. If u† ∈ H¹(Ω), it is appropriate to consider F as an operator from H¹(Ω) into L²(Ω). Accordingly, the iterated Tikhonov–Morozov method consists in minimizing

f^{(k)}_{H¹}(u) := ‖Fu − y^δ‖²_{L²(Ω)} + α_k ‖u − u^{(k−1)}‖²_{H¹(Ω)}.  (41)

Instead of the H¹-norm, the H¹-seminorm can be used if F does not annihilate constant functions. In particular, for denoising images (that is, if F = I) the H¹-seminorm is suitable; for ill-posed problems, such as deconvolution problems, this seminorm may lead to some numerical difficulties. For u† ∈ W^{1,p}(Ω), p > 1, the corresponding Tikhonov–Morozov method consists in minimizing the functional

f^{(k)}_{W^{1,p}}(u) := ‖Fu − y^δ‖²_{L²(Ω)} + α_k ‖u − u^{(k−1)}‖^p_{W^{1,p}(Ω)}.  (42)

For u† ∈ BV(Ω) the Tikhonov–Morozov method consists in minimizing

f^{(k)}_{BV}(u) := ‖Fu − y^δ‖²_{L²(Ω)} + α_k ‖u − u^{(k−1)}‖_{BV(Ω)}.  (43)

Since the operator F is self-adjoint on L²(Ω), that is F* = F, the asymptotic Tikhonov–Morozov method in the H¹-setting reads as follows:

(F F u)(t, x) − (F y^δ)(x) = (Δ − ωI) ∂u/∂t (t, x)  for (t, x) ∈ (0, ∞) × Ω,
∂u/∂ν (t, x) = 0  for (t, x) ∈ (0, ∞) × ∂Ω,
u(0, x) = 0  for x ∈ Ω.  (44)

The minimizer u^{(k)} of f^{(k)}_{W^{1,p}} has to satisfy

F(F u^{(k)} − y^δ) = (p/2) α_k ∇·( |∇(u^{(k)} − u^{(k−1)})|^{p−2} ∇(u^{(k)} − u^{(k−1)}) ) − (p/2) α_k ω |u^{(k)} − u^{(k−1)}|^{p−2} (u^{(k)} − u^{(k−1)}).  (45)
Introducing the relation

α_k = (2/p) (t_k − t_{k−1})^{−(p−1)}  (46)

between the regularization parameters and the time discretization, we derive the asymptotic Tikhonov–Morozov method on W^{1,p}(Ω):

F(F u − y^δ) = ∇·( |∇(∂u/∂t)|^{p−2} ∇(∂u/∂t) ) − ω |∂u/∂t|^{p−2} ∂u/∂t.  (47)

For p = 1 the relation (46) degenerates, indicating that there is no asymptotic integro-differential equation for the Tikhonov–Morozov method on BV(Ω). One of the most significant differences between diffusion filtering and iterative Tikhonov–Morozov regularization is that a small time-step size in the diffusion filtering method results in very large regularization parameters. This is not inconsistent with standard regularization theory, since we consider an iterative regularization technique which uses the information of the previous iteration cycle. In our numerical simulations an exponentially decreasing sequence α_k for the iterative regularization algorithms (41)–(43) leads to a visually attractive image sequence. This, in turn, implies that the time steps t_k of the diffusion filtering method (47) are exponentially increasing. This compensates for the fact that in the beginning the diffusion process is rather strong and a small step size is required; as the diffusion progresses, the image starts to stagnate and a large time-step size becomes appropriate.

B. Numerical Simulations

The following test cases have been considered in [147]. We discuss the numerical implementation of the asymptotic Tikhonov–Morozov method and present some numerical simulations for deblurring images. In the simulations presented below we have used the kernel function

k(t) = (t² − ε²)⁴/ε⁸ for t ∈ [−ε, ε],  k(t) = 0 otherwise.
For the numerical solution of the integro-differential equation (44) we discretize in time and use a finite element ansatz of products of linear splines on Ω. Let

Π(t_k, x₁, x₂) = Σ_{i,j=0}^{N} c_{ij}(t_k) Φ_{ij}(x₁, x₂)

be the approximation of the solution of (44), where Φ_{ij}(x₁, x₂) = φ_i(x₁) φ_j(x₂) and φ_i is a spline of order 1, that is, φ_i(j/N) = δ_{ij} for i = 0, ..., N and φ_i is piecewise linear on [0, 1]. For the approximation of the time derivative of Π we use a backward difference operator, that is,

(Π(t_k, x) − Π(t_{k−1}, x))/(t_k − t_{k−1}) ≈ ∂Π/∂t (t_k, x).

Using α_k = 1/(t_k − t_{k−1}), the discretized system for an approximation of (44) at time t_k requires solving the following linear equation for the coefficients c_{ij}(t_{k+1}) from the given coefficients c_{ij}(t_k):

Σ_{ij} c_{ij}(t_{k+1}) ( F_{ij,kl} + α_k I^ω_{ij,kl} ) = ∫_Ω y^δ F(φ_k φ_l) + α_k Σ_{ij} c_{ij}(t_k) I^ω_{ij,kl}  (48)

for all k, l ∈ {0, ..., N}. Here

I^ω = [ I^ω_{ij,kl} ]_{ij,kl},  I^ω_{ij,kl} = ∫_Ω ( ∇Φ_{ij} · ∇Φ_{kl} + ω Φ_{ij} Φ_{kl} ),

and

F = [ F_{ij,kl} ]_{ij,kl},  F_{ij,kl} = ∫_Ω F(Φ_{ij}) F(Φ_{kl}).
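A one-dimensional analogue of (48), with the Gram matrix simplified to the identity (an L² rather than H¹ inner product), makes the structure transparent: each step solves (FᵀF + α_k I) c^{(k+1)} = Fᵀ y^δ + α_k c^{(k)}. The kernel is the one given above; grid size, noise level, and the α_k schedule are assumed, illustrative values.

```python
import numpy as np

n, eps = 100, 0.05
x = np.linspace(0.0, 1.0, n)

def kernel(t):                            # the kernel function from the text
    return np.where(np.abs(t) <= eps, (t**2 - eps**2)**4 / eps**8, 0.0)

F = kernel(x[:, None] - x[None, :]) / n   # discretized convolution operator
u_true = ((x > 0.3) & (x < 0.7)).astype(float)
rng = np.random.default_rng(3)
y = F @ u_true + 1e-4 * rng.standard_normal(n)

c = np.zeros(n)
for k in range(25):
    a = 1e-2 * 0.7**k                     # exponentially decreasing alpha_k
    c = np.linalg.solve(F.T @ F + a * np.eye(n), F.T @ y + a * c)
# c now approximates u_true; without the alpha_k-terms the solve would be
# hopeless, since most singular values of F are close to zero (cf. Figure 18).
```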
The solution of the unregularized equation (48) (that is, with α_k = 0) is ill-conditioned. This becomes clear when the singular values of the matrix F are plotted (cf. Figure 18): most of the singular values are comparatively small. Errors in components of the data corresponding to singular functions with singular values near zero are then exceedingly amplified. Thus it is prohibitive to calculate the solution of the unregularized equation.

Example VI.1. In the first example we aim to reconstruct the pattern (top image in the first row of Figure 19) from the blurred and additionally noisy data (cf. Figure 19). Figures 20–22 show the inverse scale-space method for reconstructing the pattern from blurred data. When the blurred data is additionally distorted with Gaussian noise, the ill-posedness of the problem becomes apparent: only for a relatively short period of time is the reconstruction visually attractive, and for t → ∞ the reconstruction becomes useless. This effect becomes more significant as the error in the data increases, as a comparison of Figures 20–22 shows. One of the major concerns in regularization theory is the estimation of appropriate regularization parameters, needed here to stop the iteration process before the image becomes hopelessly distorted by noise. For some references on appropriate stopping rules for the Tikhonov–Morozov method we refer the reader to [85,88,91,145].

Example VI.2. Here we compare the Tikhonov–Morozov method on H¹(Ω) and on BV(Ω). We have chosen a piecewise constant function on a rectangle as a paradigm of a function that is in BV(Ω) but not in H¹(Ω) (cf. Figure 23). This has the effect that the reconstruction with the (asymptotic) Tikhonov–Morozov method on H¹(Ω) always has a blurry character (cf. Figure 24). Figure 25 shows the reconstruction with the Tikhonov–Morozov method on BV(Ω). This method performs worse than the asymptotic Tikhonov–Morozov method on H¹(Ω), which numerically supports the fact that there is no inverse scale-space method on BV(Ω).

This section has been devoted to highlighting the contrasting behavior of scale-space methods for the solution of inverse problems on the one hand and for image smoothing and restoration on the other. One of the significant differences in inverse scale-space theory for inverse problems is the choice of an adequate stopping
FIGURE 18. The singular values of the matrix F.
FIGURE 19. Top: The test pattern, which is to be recovered from the blurred data (middle left), from the blurred data additionally distorted with medium noise (middle right), and from the blurred data distorted with high noise (bottom).
FIGURE 20. Reconstruction from blurred data without noise by the inverse scale-space method (44). The images show the solution u of (44) at the specified times with exponentially decreasing time-steps. At a certain time the test pattern can be completely recovered; the inverse scale-space method stagnated at the test pattern.
FIGURE 21. Reconstruction from blurred data with medium noise using the inverse scale-space method (44). The images show the solution u of (44) at the specified times. Top left shows the optimal time for recovery with medium noise. After that time the reconstruction gets worse, showing the importance of determining an optimal stopping time for the inverse scale-space method.
FIGURE 22. Reconstruction from blurred data with high noise using the inverse scale-space method (44). Middle right shows the optimal time for recovery with high noise. After that time the reconstruction algorithm diverges extremely fast (cf. the scales of the images).
FIGURE 23. Test data for comparing the Tikhonov–Morozov method on H¹(Ω) and BV(Ω). Left: Image to be reconstructed. Right: The available blurred data, from which we intend to recover the left image.
FIGURE 24. Reconstruction with the asymptotic Tikhonov–Morozov method on H¹(Ω) at the specified times.
FIGURE 25. Reconstruction with the asymptotic Tikhonov–Morozov method on BV(Ω) at the specified times.
time; after a certain time the noise is considerably amplified in the reconstruction. This is not an issue in image smoothing, where the effect of noise is weakened over time. We also remark that for image smoothing and restoration the total variation flow filtering in almost all documented cases performed significantly better than the heat equation. This is not always true for inverse problems.

VII. NONCONVEX REGULARIZATION MODELS

In Section III.C we considered regularization functionals of the general form

f(u) := ∫_Ω (u − u^δ)² + α ∫_Ω g(∇u).

The existence of a minimizer is relatively easy to establish under the essential assumptions that g is convex with respect to the gradient variable ∇u and the functional is coercive (see, e.g., Dacorogna [54,55] and Aubert and Kornprobst [19]). The analysis of regularization functionals becomes considerably more involved if the functional f is nonconvex. Such models are outlined below.

A. Perona–Malik Regularization

In the classical Perona–Malik filter [134,135] the diffusivity is

D(∇u) = 1/(1 + |∇u|²).

The corresponding variational technique consists in minimizing the functional

∫_Ω (u − u^δ)² + α ∫_Ω ln(1 + |∇u|²) =: ∫_Ω ĝ(u, ∇u).

The function ĝ is nonconvex with respect to the variable ∇u. In this case it is well known from the calculus of variations (see, e.g., [55]) that the optimization problem is not well posed, in the sense that a minimizer need not exist. Therefore additional regularization concepts are built into the functional, such as

f_RPM(u) := ∫_Ω (u − u^δ)² + α ∫_Ω ln(1 + |∇L_σ u|²),  (49)

where L_σ is a linear convolution operator with a smooth kernel.
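Where the text below juxtaposes the regularizations (51) and (52), the Catté-type variant (52) is the easier one to sketch: the diffusivity is computed from the presmoothed image L_σu, while the flux uses ∇u itself. The following NumPy sketch uses an explicit finite difference scheme with interface-averaged diffusivities; the Gaussian width, time step, test image, and noise level are all assumed, illustrative values.

```python
import numpy as np

def gauss1d(sigma):
    r = int(3 * sigma) + 1
    t = np.arange(-r, r + 1)
    g = np.exp(-t**2 / (2 * sigma**2))
    return g / g.sum()

def L(u, sigma):                       # separable Gaussian convolution L_sigma u
    g = gauss1d(sigma)
    p = len(g) // 2
    v = np.pad(u, p, mode="reflect")
    v = np.apply_along_axis(np.convolve, 0, v, g, "valid")
    v = np.apply_along_axis(np.convolve, 1, v, g, "valid")
    return v

def catte_step(u, tau=0.2, sigma=1.5):
    us = L(u, sigma)
    g0, g1 = np.gradient(us)
    d = 1.0 / (1.0 + g0**2 + g1**2)    # diffusivity from the presmoothed image
    # forward-difference fluxes, diffusivity averaged onto cell interfaces
    f0 = 0.5 * (d[1:, :] + d[:-1, :]) * np.diff(u, axis=0)
    f1 = 0.5 * (d[:, 1:] + d[:, :-1]) * np.diff(u, axis=1)
    div = (np.diff(np.pad(f0, ((1, 1), (0, 0))), axis=0)
           + np.diff(np.pad(f1, ((0, 0), (1, 1))), axis=1))
    return u + tau * div               # explicit step; tau <= 0.25 for stability

rng = np.random.default_rng(5)
u = np.where(np.arange(64)[None, :] < 32, 0.0, 10.0) + 0.5 * rng.standard_normal((64, 64))
for _ in range(20):
    u = catte_step(u)
```

After twenty steps the noise in the flat regions is largely removed while the large jump survives, since the presmoothed gradient makes the diffusivity small across the edge.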
The minimizer of the regularized Perona–Malik functional satisfies

u − u^δ = α L_σ* ∇·( ∇L_σu / (1 + |∇L_σu|²) ).  (50)

The corresponding nonlinear diffusion process associated with this regularization technique is

∂u/∂t = L_σ* ∇·( ∇L_σu / (1 + |∇L_σu|²) ).  (51)

Regularized Perona–Malik filters have been considered in the literature [22,37,130,171,172]. Catté et al. [37], for instance, investigated the nonlinear diffusion process

∂u/∂t = ∇·( ∇u / (1 + |∇L_σu|²) ).  (52)

This technique (as well as other previous regularizations) does not have a corresponding formulation as an optimization problem. In an experiment we juxtapose the regularizations (51) and (52) of the Perona–Malik filter. Both processes have been implemented using an explicit finite difference scheme. The results using the MR image from Figure 11 are shown in Figure 26, where different values of σ, the standard deviation of the Gaussian, have been used. For small values of σ both filters produce rather similar results, while larger values lead to completely different behavior. For (51), the regularization smoothes the diffusive flux, so that it becomes close to zero everywhere, and the image remains unaltered. The regularization in (52), however, creates a diffusivity which gets closer to one at all image locations, so that the filter creates blurry results resembling linear diffusion filtering.

B. Relative Error Regularization

The noise in data detected with common measurement devices frequently correlates with the exact data; relevant situations are when the noise locally correlates with the amplitude or the variation of the data. Assuming such correlation between the data and noise, we are led to fit-to-data terms of the form

(1/2) ∫ (u − u^δ)²/|u|^p  and  (1/p) ∫ |u − u^δ|^p/|∇u|^{p−1},  p = 1, 2, ...
FIGURE 26. Comparison of two regularizations of the Perona–Malik filter (t = 250). Top left: Filter (51), σ = 0.5. Top right: Filter (52), σ = 0.5. Middle left: Filter (51), σ = 2. Middle right: Filter (52), σ = 2. Bottom left: Filter (51), σ = 8. Bottom right: Filter (52), σ = 8.
FIGURE 27. Correlated noise in 1D signals. Top left: Noise-free data. Top right: Uncorrelated noise. Middle left: ∫ (u − u^δ)²/|u| = δ². Middle right: ∫ (u − u^δ)²/|u|² = δ². Bottom left: ∫ (u − u^δ)²/|∇u| = δ². Bottom right: ∫ (u − u^δ)⁴/|∇u|³ = δ².
In Figures 27 and 28 we have plotted noisy data revealing the difference between uncorrelated and correlated noise. We concentrate on Tikhonov-type regularization models with the BV-seminorm as stabilizing functional. This leads to regularization models of the form (relative error regularization):

(1/2) ∫ (u − u^δ)²/|u|^p + α ∫ |∇u|;   (1/p) ∫ |u − u^δ|^p/|∇u|^{p−1} + α ∫ |∇u|.

In order to put this work into context with diffusion filtering techniques it is convenient to consider iterative relative error regularization.
FIGURE 28. Correlated noise in 2D signals. Top left: Noise-free data. Top right: Uncorrelated noise. Middle left: ∫ (u − u^δ)²/|u| = δ². Middle right: ∫ (u − u^δ)²/|u|² = δ². Bottom left: ∫ (u − u^δ)²/|∇u| = δ². Bottom right: ∫ (u − u^δ)⁴/|∇u|³ = δ².
In particular, we consider the models of iteratively minimizing the functionals

(1/2) ∫ (u − u^{(k−1)})²/|u|^p + α ∫ |∇u|   and   (1/p) ∫ |u − u^{(k−1)}|^p/|∇u|^{p−1} + α ∫ |∇u|,  (53)

denoting the minimizers (presuming they exist) by u^{(k)}; moreover, we again use the convention u^{(0)} := u^δ. Since the functionals in (53) are nonconvex and thus quite delicate to handle analytically and numerically, it is convenient to consider semi-implicit variants, namely minimization of

(1/2) ∫ (u − u^{(k−1)})²/|u^{(k−1)}|^p + α ∫ |∇u|   and   (1/p) ∫ |u − u^{(k−1)}|^p/|∇u^{(k−1)}|^{p−1} + α ∫ |∇u|.  (54)

The functionals in (54) are convex and straightforward to analyze (see [146]).
Minimization of the second functional in (54) with p = 2 can be considered as a semi-implicit time step with step size Δt = α for the mean curvature flow equation (9); for p = 4 it is a semi-implicit method for solving the affine invariant mean curvature flow equation

∂u/∂t = |∇u| ( ∇·(∇u/|∇u|) )^{1/3}.  (55)

The first functional in (54) corresponds to a semi-implicit time step for solving

∂u/∂t = |u|^p ∇·(∇u/|∇u|).  (56)

The Euler equation for the minimizer of

(1/2) ∫ (u − u^{(k−1)})²/|∇u| + α ∫ |∇u|  (57)

is

(u − u^{(k−1)})/|∇u| = ∇·( ( α − (1/2)(u − u^{(k−1)})²/|∇u|² ) ∇u/|∇u| ).  (58)

Note that (58) is only formal, since the regularization functional ∫ |∇u| is not differentiable. Division of equation (58) by α gives

(u − u^{(k−1)})/α = |∇u| ∇·( ( 1 − (1/(2α))(u − u^{(k−1)})²/|∇u|² ) ∇u/|∇u| ).

Taking the formal limit α → 0⁺ and considering again u^{(k−1)} ≈ u(t_{k−1}), u^{(k)} ≈ u(t_k) and α = t_k − t_{k−1} gives again

∂u/∂t = |∇u| ∇·(∇u/|∇u|),

the mean curvature flow equation. Since (58) can be considered a Perona–Malik model with positive and negative diffusion, the problem is ill-posed. The ill-posedness in the optimality condition reflects the fact that the underlying energy functional (57) is nonconvex with respect to the gradient variable. By employing generalized solution concepts such as convexification or Γ-limits, the ill-posedness disappears (see [146]). We present two numerical experiments for relative error denoising: (1) We use the artificially generated data set at the top left of Figure 28. The reconstructions in Figure 29 have been created with bounded variation regularization, mean curvature filtering, affine mean curvature filtering, and implicit error regularization. The stopping time in the diffusion filtering method and the regularization parameters are selected such that all reconstructions have about the same amplitudes. (2) The second example is concerned with denoising of ultrasound data sets.

FIGURE 29. Original image (top) and filter images: mean curvature flow (middle left); affine mean curvature flow (middle right); implicit regularization (bottom left); BV regularization (bottom right).
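The mean curvature flow used in the experiments above can be time-stepped explicitly via the standard curvature identity |∇u| ∇·(∇u/|∇u|) = (u_xx u_y² − 2 u_x u_y u_xy + u_yy u_x²)/(u_x² + u_y²). The following sketch regularizes the denominator with a small assumed ε and uses the distance-to-origin function as test data, whose circular level sets must shrink under the flow.

```python
import numpy as np

def mcf_step(u, tau=0.1, eps=1e-8):
    # one explicit step of u_t = |grad u| div(grad u / |grad u|)
    u0 = np.gradient(u, axis=0)
    u1 = np.gradient(u, axis=1)
    u00 = np.gradient(u0, axis=0)
    u11 = np.gradient(u1, axis=1)
    u01 = np.gradient(u0, axis=1)
    num = u00 * u1**2 - 2 * u0 * u1 * u01 + u11 * u0**2
    return u + tau * num / (u0**2 + u1**2 + eps)

i = np.arange(-50, 51)
X, Y = np.meshgrid(i, i)
u = np.sqrt(X**2 + Y**2).astype(float)   # level sets are circles
v = u.copy()
for _ in range(100):
    v = mcf_step(v)
# each circular level set shrinks with speed 1/radius, so sublevel sets lose area
```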
FIGURE 30. Original image (top) and filter images: mean curvature flow (middle left); affine mean curvature flow (middle right); implicit regularization (bottom left); BV regularization (bottom right).
From the numerical reconstructions one finds that mean curvature flow and implicit error regularization produce very similar results if the regularization parameter and the diffusion time are identified (Figure 30).
VIII. DISCRETE BV REGULARIZATION AND TUBE METHODS

So far we have presented regularization models in infinite dimensional settings (cf. Section III) and in a semi-infinite dimensional setting (cf. Section V). In this section we concentrate on completely discrete settings for bounded variation regularization.
The derivation of discrete variants is not straightforward. Several numerical realizations of discrete bounded variation regularization can be derived, some of which are outlined below. For a piecewise constant function g on [0, 1] of the form

g = g_i on Ω_i := ((i−1)/n, i/n],  i = 1, ..., n,  (59)

we define

T̃g := ((Tg)_i)_{i=0,...,n}, where

(Tg)_0 := g_1,
(Tg)_i := (g_i + g_{i+1})/2  for i = 1, ..., n−1,
(Tg)_n := g_n.

We call T̃g the traces of g. A piecewise constant function and its traces are plotted in Figure 31. Using these ingredients we are able to formulate two discrete variants of BV regularization. We restrict our attention to minimization of functionals over the set of piecewise constant functions

S := { u : u(x) = Σ_{i=1}^{n} c_i χ_i(x) with |c_i| < ∞ },  (60)

where χ_i denotes the characteristic function of the interval Ω_i.
FIGURE 31. A piecewise constant function with values g_i, i = 1, ..., n, and the traces (Tg)_i, i = 0, ..., n, symbolized by *.
Discrete bounded variation regularization functionals differ in the way the available discrete data are interpreted. Two possibilities are considered.

(1) The data can be interpreted as measurements of the traces (Tf)(i/n) of a BV-function f. This is a sampling problem. In typical sampling problems one interprets the data as the function values f(i/n). Since in our setting f may be discontinuous at i/n, and point evaluation is then not possible, we are forced to use trace evaluation. This leads us to consider minimization of the functional

TBV_d(u) = (1/(2(n+1))) Σ_{i=0}^{n} |(Tu)_i − (Tf)_i|² + α ∫₀¹ |u_x|  (61)

over S, where (Tf)_i is the given data at i/n.

(2) Alternatively to assuming sampled data, one can interpret the data as values of a piecewise constant function

f = Σ_{i=1}^{n} f_i χ_i.

Given measurement data f_i, i = 1, ..., n, this suggests minimization of the functional

TBV_{d2}(u) := (1/(2n)) Σ_{i=1}^{n} (c_i − f_i)² + α ∫₀¹ |u_x| = (1/(2n)) Σ_{i=1}^{n} (c_i − f_i)² + α Σ_{i=1}^{n−1} |c_{i+1} − c_i|  (62)

over S. These two possibilities will be utilized below.
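One simple way (an assumed approach, not the text's method) to minimize the finite-volume functional (62) numerically is lagged diffusivity: smooth each term |c_{i+1} − c_i| to √((c_{i+1} − c_i)² + ε²) and solve the resulting reweighted quadratic problems, each of which majorizes the smoothed functional, so the objective value never increases. Data, α, and ε are illustrative.

```python
import numpy as np

n = 100
rng = np.random.default_rng(4)
clean = np.where(np.arange(n) < n // 2, 0.0, 1.0)
f = clean + 0.05 * rng.standard_normal(n)

alpha, eps = 1e-3, 1e-6
D = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]        # forward differences, shape (n-1, n)

def objective(c):                                # smoothed version of (62)
    return (np.sum((c - f) ** 2) / (2 * n)
            + alpha * np.sum(np.sqrt((D @ c) ** 2 + eps**2)))

c = f.copy()
objs = [objective(c)]
for _ in range(50):
    w = 1.0 / np.sqrt((D @ c) ** 2 + eps**2)    # lagged TV weights
    A = np.eye(n) / n + alpha * (D.T * w) @ D   # normal equations of the surrogate
    c = np.linalg.solve(A, f / n)
    objs.append(objective(c))
```

The result c is close to piecewise constant: the noise in the two plateaus is flattened while the unit jump survives.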
A. Discrete BV Regularization (Sampling)

The functional TBV_d is well posed, i.e., there exists a unique minimizer in S (see [95]). To further analyze properties of the minimizer u_α = Σ_{i=1}^{n} u_i χ_i of TBV_d it is instructive to study the optimality condition for the coefficients
DENOISING AND INVERSE PROBLEMS
Setting $\beta := 4\alpha n$, the coefficients $u_i$, $i = 1, \ldots, n$, of the minimizer satisfy the set-valued equations

$$(u_{i-1}+u_i) + (u_i+u_{i+1}) + \beta\left(\frac{u_i-u_{i-1}}{|u_i-u_{i-1}|} + \frac{u_i-u_{i+1}}{|u_i-u_{i+1}|}\right) \ni (f_{i-1}+f_i) + (f_i+f_{i+1}) \quad \text{for } i = 2, \ldots, n-1,$$

$$5u_1 + u_2 + \beta\,\frac{u_1-u_2}{|u_1-u_2|} \ni 5f_1 + f_2, \qquad 5u_n + u_{n-1} + \beta\,\frac{u_n-u_{n-1}}{|u_n-u_{n-1}|} \ni 5f_n + f_{n-1}, \tag{63}$$

where we use the abbreviation

$$\frac{e}{|e|} = \begin{cases} \{1\} & \text{if } e > 0, \\ \{-1\} & \text{if } e < 0, \\ [-1,1] & \text{if } e = 0, \end{cases}$$

which is the subgradient of $|e|$. We observe from (63) that for $j = 1, \ldots, n-1$

$$(u_j+u_{j+1}) + 2\sum_{i=1}^{j-1}(u_i+u_{i+1}) + 4u_1 + \beta\,\frac{u_j-u_{j+1}}{|u_j-u_{j+1}|} \ni (f_j+f_{j+1}) + 2\sum_{i=1}^{j-1}(f_i+f_{i+1}) + 4f_1 \tag{64}$$

and

$$4u_n + 2\sum_{i=1}^{n-1}(u_i+u_{i+1}) + 4u_1 = 4f_n + 2\sum_{i=1}^{n-1}(f_i+f_{i+1}) + 4f_1. \tag{65}$$

Let

$$\hat{F}_u(j/n) := (u_j+u_{j+1}) + 2\sum_{i=1}^{j-1}(u_i+u_{i+1}) + 4u_1, \quad j = 1, \ldots, n-1, \qquad \hat{F}_u(1) := 4u_n + 2\sum_{i=1}^{n-1}(u_i+u_{i+1}) + 4u_1. \tag{66}$$

Moreover, let $\hat{F}_f^+$, $\hat{F}_f^-$ be linear splines with respect to the nodes $\{j/n : j = 0, \ldots, n\}$ interpolating

$$\hat{F}_f^\pm(0) = 0, \qquad \hat{F}_f^\pm(j/n) = \hat{F}_f(j/n) \pm \beta, \quad j = 1, \ldots, n-1, \qquad \hat{F}_f^\pm(1) = \hat{F}_f(1). \tag{67}$$

The region between the two linear splines is referred to as the tube $\hat{T}$. $\hat{F}_f^-$ and $\hat{F}_f^+$ mark the lower and upper bounds of $\hat{T}$. Since

$$\frac{u_j - u_{j-1}}{|u_j - u_{j-1}|} \subseteq [-1, 1],$$

we find from (64) and (65) that

$$\hat{F}_u \in \hat{T}. \tag{68}$$

In other words, the antiderivative³ $\hat{F}_u$ of the minimizer $u$ of the discrete bounded variation regularization formulation is in the tube $\hat{T}$. We therefore refer to (61) as a tube method.
³ We refer to $\int_0^t f(s)\,ds$ as the antiderivative of the function $f$.

B. Finite Volume BV Regularization

Minimization of $T_{BVd2}$ as defined in (62) over $S$ is a standard method of formulating discrete bounded variation regularization (cf. Mallat [108]). Note that in this case the data $f_i$ are interpreted as coefficients of a piecewise constant function, while in Section VIII.A the data $(Tf)_i$ are interpreted as sampling data. Since the first term in $T_{BVd2}$ is strictly convex, it is immediate that the functional $T_{BVd2}$ has a unique minimizer. To specify the optimality criteria for a minimizer of the functional $T_{BVd2}$ let $\beta = \alpha n$. Then the minimizer $u$ of $T_{BVd2}$ can be represented as $u := \sum_{i=1}^{n} u_i \chi_i$ with $u_i$ satisfying

$$u_i + \beta\left(\frac{u_i-u_{i-1}}{|u_i-u_{i-1}|} + \frac{u_i-u_{i+1}}{|u_i-u_{i+1}|}\right) \ni f_i \quad \text{for } i = 2, \ldots, n-1,$$

$$u_1 + \beta\,\frac{u_1-u_2}{|u_1-u_2|} \ni f_1, \qquad u_n + \beta\,\frac{u_n-u_{n-1}}{|u_n-u_{n-1}|} \ni f_n. \tag{69}$$
Let

$$F_f(0) = 0, \qquad F_f(j/n) = \sum_{k=1}^{j} f_k \quad \text{for } j = 1, \ldots, n,$$

$$F_f^\pm(0) = 0, \qquad F_f^\pm(j/n) = F_f(j/n) \pm \beta \quad \text{for } j = 1, \ldots, n-1, \qquad F_f^\pm(1) = F_f(1). \tag{70}$$
With $f$ we associate the tube $T$ bounded by the linear splines $F_f^\pm$ connecting the values $F_f^\pm(j/n)$. Thus the minimizer $u$ has the property that its antiderivative $F_u$ satisfies $F_u \in T$, and so finite volume BV regularization is a tube method as well.

C. The Taut String Algorithm

In this section we recall the taut string algorithm (see [57,109]) for denoising discrete one-dimensional data. We choose a description which allows generalization to higher-dimensional data. $T$ denotes the tube from the previous section. The taut string ''algorithm'' is actually the solution to a minimization problem, which we specify next.

Algorithm VIII.1. Let $v = \sum_{i=1}^{n} v_i \chi_i$ with antiderivative $V$ denote the solution to

$$\frac{1}{n} \sum_{i=1}^{n} \sqrt{1+v_i^2} = \int_0^1 \sqrt{1+|V'(\lambda)|^2}\, d\lambda \to \min \tag{71}$$

over all continuous and piecewise linear functions $V$ on $[0,1]$ with function values in $T$.
Physically speaking, $V$ is a string of minimal length contained in the tube $T$, connecting $(0,0)$ and $(1, F_f(1))$; i.e., it is taut. In particular, in regions between two contact points of $V$ with the boundary of $T$, the taut string is affine linear, and $v$ is a piecewise constant function. The values $v_i$ approximate the input data $f_i$, and $v$ constitutes a denoised approximation to $f$. In [57] an algorithm for computing the solution to (71) was presented which proceeds iteratively from one nodal value of the tube to the next. The solution method that we shall propose is completely different. The taut string, as determined from Algorithm VIII.1, and the finite volume BV-regularized solution with $\alpha = \beta/n$ are both contained in the same tube $T$. This in particular shows that the graph of the finite volume BV-regularized solution is at least as long as that of the taut string solution. Finite volume BV regularization and the taut string algorithm share the property that they preserve homogeneous regions of the original data. This is easily seen for the taut string algorithm: in a flat region of the original data $f$ the function $F_f$ is linear, and consequently the taut string is linear in this region too, showing that the flat regions of the filtered data (i.e., of the derivative of the taut string) either correspond with those of the input data or are enlarged. For finite volume BV regularization (as well as other methods) this statement was addressed with rigor in [128].
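The tube construction and the string computation can be sketched numerically. The following Python sketch is ours, not the nodal algorithm of [57]: it relies on the known property that the taut string minimizes every convex function of its increments over the tube, so minimizing the sum of squared increments subject to the tube bounds, a smooth box-constrained problem handled by SciPy's L-BFGS-B, yields the same string. The tube follows (70), with the endpoints pinned so that the string connects $(0,0)$ and $(1, F_f(1))$.

```python
import numpy as np
from scipy.optimize import minimize

def taut_string(f, beta):
    """Tube-constrained denoising of f_1, ..., f_n: build the tube around the
    cumulative sums F_f(j) = f_1 + ... + f_j (pinched at both endpoints),
    find the taut string inside it, and return the string's increments."""
    f = np.asarray(f, dtype=float)
    F = np.concatenate(([0.0], np.cumsum(f)))   # nodes F_f(0), ..., F_f(n)
    lo, hi = F - beta, F + beta
    lo[0] = hi[0] = F[0]                        # endpoints are fixed, so the
    lo[-1] = hi[-1] = F[-1]                     # tube is pinched there

    def length_proxy(w):                        # sum of squared increments
        d = np.diff(w)
        return np.sum(d * d)

    def grad(w):
        g = np.zeros_like(w)
        d = np.diff(w)
        g[:-1] -= 2.0 * d
        g[1:] += 2.0 * d
        return g

    res = minimize(length_proxy, F, jac=grad, method="L-BFGS-B",
                   bounds=list(zip(lo, hi)))
    return np.diff(res.x)   # denoised cell values v_1, ..., v_n
```

Because the endpoints of the string are fixed, the sum of the reconstructed values equals the sum of the data, illustrating the mean-preservation property discussed below.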
D. Multidimensional Discrete BV Regularization

In this section we present multidimensional analogs of sampling and finite volume BV regularization. Moreover, we propose a multidimensional analog of the taut string algorithm. Let $\Omega = (0,1) \times (0,1)$ and let $f$ be piecewise constant with respect to the cells $\Omega_{ij} = \Omega_i \times \Omega_j$. To introduce sampling BV regularization in $\mathbb{R}^2$ we proceed as in Section VIII.A and model the fit-to-data term as in (61) as the sum of both components, in the $x_1$ and $x_2$ directions separately. For $u = \sum_{i,j=1}^{n} c_{ij} \chi_{ij}$ the BV sampling method consists in minimization of the functional

$$T_{BVd2s}(u) := \frac{1}{2n} \sum_{i=1}^{n-1}\sum_{j=1}^{n-1} \left(\frac{c_{i+1,j}+c_{i,j}}{2} - \frac{f_{i+1,j}+f_{i,j}}{2}\right)^2 + \frac{1}{2n} \sum_{i=1}^{n-1}\sum_{j=1}^{n-1} \left(\frac{c_{i,j}+c_{i,j+1}}{2} - \frac{f_{i,j}+f_{i,j+1}}{2}\right)^2 + \alpha \int_\Omega |\nabla u| \tag{72}$$
over

$$S := \left\{ u : u = \sum_{i,j=1}^{n} c_{ij} \chi_{ij} \right\}.$$

A multidimensional analog of finite volume BV regularization consists in minimizing the functional

$$T_{BVd2f}(u) := \frac{1}{2n^2} \sum_{i=1}^{n}\sum_{j=1}^{n} |c_{ij} - f_{ij}|^2 + \alpha \int_\Omega |\nabla u| \tag{73}$$

over $S$. To propose an extension of the taut string algorithm to the case of two-dimensional data it will be useful to reconsider the taut string algorithm in the following form:

(1) Integration of $f_i$, $i = 1, \ldots, n$, gives a linear spline $F_f$.
(2) Determination of the taut string $F_u$ in the tube between $F_f - \beta$ and $F_f + \beta$.
(3) Differentiation of $F_u$ to obtain the reconstruction for $f$.

Generalization to higher dimensions is impeded by the fact that there is no obvious analog for integration step 1 above. To overcome this difficulty, we proceed by introducing an appropriate potential equation and consider the one-dimensional case first. Given $f$, we define $\varphi$ as a solution to

$$\varphi_{xx} = f \quad \text{on } \Omega, \qquad \varphi_x = 0 \quad \text{on } \{0,1\},$$

and set $F_f = \varphi_x$. This replaces step 1 above. Step 2 can then be realized by solving the contact problem:

(1) Find a function $w \in BV(0,1)$ satisfying $w(0) = F_f(0)$ and $w(1) = F_f(1)$ (in the sense of traces) that minimizes

$$\int_0^1 \sqrt{1 + w_x^2} \tag{74}$$

subject to the constraint $F_f - \beta \le w \le F_f + \beta$.
(2) Set $u = w_x$.

This approach can be generalized to higher dimensions in a straightforward way:

(1) Solve

$$\Delta \varphi = f \quad \text{in } \Omega, \qquad \frac{\partial \varphi}{\partial \nu} = 0 \quad \text{on } \partial\Omega,$$

and define $F_f = \nabla \varphi$.

(2) In $\mathbb{R}^2$ find two functions $w_i \in BV(\Omega)$, $i = 1, 2$, minimizing

$$\int_\Omega \sqrt{1 + |\nabla w_i|^2} \tag{75}$$

subject to the constraints

$$(F_f)_i - \beta \le w_i \le (F_f)_i + \beta, \quad i = 1, 2, \tag{76}$$

and $w = F_f$ on $\partial\Omega$. This is a contact problem for finding a ''minimal surface'' in the layer bounded by $(F_f)_i - \beta$ and $(F_f)_i + \beta$.

(3) Set $u = \nabla \cdot w$.

The choice of proper boundary conditions for $\varphi$ and $w$ is not obvious. We tested alternatives to our choice and found that they have no significant influence on the numerical reconstruction. In the one-dimensional case we could have chosen the boundary conditions in such a way that they are consistent with Algorithm VIII.1. This choice, however, has no clear multidimensional analog.

E. Numerical Test Examples

The practical realization of the taut string algorithm in $\mathbb{R}^2$ requires the solution of the bilateral obstacle problem (cf. Algorithm VIII.1), which can be solved efficiently using active set strategies. The particular implementation has been considered in [95], where general references on active set strategies and their numerical implementation can also be found. In the following we present some numerical simulations with this method.

1. One-Dimensional Test Example

We consider the function

$$f(x) := \begin{cases} 1 & \text{in } [0, 1/4), \\ 2 & \text{in } (1/4, 1/2), \\ 4 - 4x & \text{in } (1/2, 3/4), \\ 1 & \text{in } (3/4, 1]. \end{cases}$$
In Figure 32 we display the results of several test runs for the one-dimensional example with three different absolute noise levels, i.e., $\delta_1 = 0$, $\delta_2 = 0.1$, and $\delta_3 = 0.5$ from left to right. The respective regularization parameters are $\alpha_1 = 1.0 \times 10^{-5}$, $\alpha_2 = 5.0 \times 10^{-3}$, and $\alpha_3 = 2.5 \times 10^{-2}$. In our tests, these values produce the best reconstructions. In Figure 32, column $i$ ($i = 1, 2, 3$) corresponds to a test run with $(\delta_i, \alpha_i)$. The first row in Figure 32 shows the data, the second presents the reconstructions. In the third row we plot the string $w$ (solid) together with its bounds (dashed).

FIGURE 32. One-dimensional tests. Column $i$ ($i = 1, 2, 3$) corresponds to $(\delta_i, \alpha_i)$.

In Figure 33 we study the effect of $\alpha$. We have selected values $\lambda_i$ which are larger than the values in Figure 32: $\lambda_1 = 1.0 \times 10^{-2}$, $\lambda_2 = 1.0 \times 10^{-2}$, and $\lambda_3 = 7.5 \times 10^{-2}$. Since we loosen the barriers of the tube, the string becomes flatter.

FIGURE 33. One-dimensional tests. Column $i$ ($i = 1, 2, 3$) corresponds to $(\delta_i, \lambda_i)$.

We next summarize some of the features observed for denoising of one-dimensional images with the taut string algorithm. They are similar to those obtained by nonlinear BV-regularized reconstructions.

(1) The mean value of the registered image intensity is preserved by the filtering method. This important feature of diffusion filtering methods (cf. [172]) and nonlinear regularization models (cf. [148]) does not hold, for instance, for discrete morphological filters such as the median filter.
(2) Spurious noise is removed.
(3) Edges are preserved.
(4) The taut string algorithm produces images which are damped in height. The magnitude of damping is comparable to that observed for BV-regularized solutions.

2. Two-Dimensional Benchmark Problem

Here the solution of the bilateral contact problem is shown for the benchmark image in Figure 34 (upper left).
IX. WAVELET SHRINKAGE

In this section we review the interactions of wavelet filtering, diffusion filtering, and variational methods. For this purpose it is convenient to briefly review orthonormal wavelets.
FIGURE 34. Exact data and noisy data in the first row, reconstruction in the second row.
A. Daubechies' Wavelets

We review Daubechies' construction of orthonormal wavelets (see [56]). The construction is based on the existence of a scaling function $\phi$, such that for $m \in \mathbb{Z}$ the functions

$$\phi_{m,k} := 2^{-m/2}\, \phi(2^{-m} x - k), \quad k \in \mathbb{Z},$$

are orthonormal with respect to the norm on $L^2(\mathbb{R})$. Moreover, $\phi$ is chosen in such a way that for $m \in \mathbb{Z}$ the spaces

$$V_m := \operatorname{span}\{\phi_{m,k} : k \in \mathbb{Z}\} := \left\{ \sum_{k\in\mathbb{Z}} a_k \phi_{m,k} : a_k \ne 0 \text{ for only finitely many } k \in \mathbb{Z} \right\}$$

form a multiresolution analysis on $L^2(\mathbb{R})$, that is,

$$V_m \subseteq V_{m-1}, \quad m \in \mathbb{Z}, \qquad \bigcap_{m\in\mathbb{Z}} V_m = \{0\} \qquad \text{and} \qquad \overline{\bigcup_{m\in\mathbb{Z}} V_m} = L^2(\mathbb{R}).$$
The wavelet spaces $W_m$ are the orthogonal complements of $V_m$ in $V_{m-1}$, that is,

$$W_m := V_m^\perp \cap V_{m-1}.$$

The mother wavelet $\psi$ is chosen such that the functions

$$\psi_{m,k} := 2^{-m/2}\, \psi(2^{-m} x - k), \quad k \in \mathbb{Z},$$

form an orthonormal basis of $W_m$. Since $\phi = \phi_{0,0} \in V_0 \subseteq V_{-1}$, the scaling function must satisfy the dilation equation

$$\phi(x) = \sum_{k\in\mathbb{Z}} h_k\, \phi(2x - k), \tag{77}$$

where the sequence $\{h_k\}$ is known as the filter sequence of the wavelet

$$\psi(x) = \sum_{k\in\mathbb{Z}} (-1)^k h_{1-k}\, \phi(2x - k). \tag{78}$$

The filter coefficients have to satisfy certain conditions in order to guarantee that the scaling function $\phi$ and the wavelet $\psi$ fulfill certain properties. In the orthogonal wavelet theory due to Daubechies the desired properties on the scaling functions and wavelets are:

(1) For a fixed integer $N \ge 1$ the scaling function $\phi$ has support in the interval $[1-N, N]$. This, in particular, holds when the filter coefficients satisfy

$$h_k = 0 \quad \text{for } k < 1-N \text{ and for } k > N. \tag{79}$$

(2) The existence of a scaling function satisfying (77) requires that

$$\sum_{k\in\mathbb{Z}} h_k = 2. \tag{80}$$

(3) In order to impose orthonormality of the integer translates of the scaling function $\phi$, that is $\int_{\mathbb{R}} \phi(x-l)\phi(x)\,dx = \delta_{0,l}$, the filter coefficients $\{h_k\}$ have to satisfy

$$\sum_{k\in\mathbb{Z}} h_k h_{k-2l} = 2\delta_{0,l}, \quad l = 0, \ldots, N-1. \tag{81}$$

(4) The wavelet $\psi$ is postulated to have $N$ vanishing moments, that is,

$$\int_{\mathbb{R}} x^l \psi(x)\,dx = 0, \quad l = 0, \ldots, N-1, \tag{82}$$

which requires the filter sequence to satisfy

$$\sum_{k\in\mathbb{Z}} (-1)^k h_{1-k}\, k^l = 0, \quad l = 0, \ldots, N-1. \tag{83}$$
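Conditions (80), (81), and (83) are easy to verify numerically. A short check (our illustration) for $N = 2$, whose filter $h_{-1}, \ldots, h_2$ is the Daubechies-2 filter, written here with the normalization $\sum_k h_k = 2$ used in (80):

```python
import math

s3 = math.sqrt(3.0)
# Daubechies N = 2 filter, supported on k = -1, 0, 1, 2 as in (79),
# normalized so that the coefficients sum to 2 as in (80)
ks = [-1, 0, 1, 2]
h = {-1: (1 + s3) / 4, 0: (3 + s3) / 4, 1: (3 - s3) / 4, 2: (1 - s3) / 4}

# (80): sum of the filter coefficients equals 2
assert abs(sum(h.values()) - 2.0) < 1e-12

# (81): orthonormality of integer translates, l = 0, 1
for l in range(2):
    s = sum(h[k] * h[k - 2 * l] for k in ks if (k - 2 * l) in h)
    assert abs(s - (2.0 if l == 0 else 0.0)) < 1e-12

# (83): N = 2 vanishing moments, l = 0, 1
for l in range(2):
    s = sum((-1) ** k * h[1 - k] * k ** l for k in ks if (1 - k) in h)
    assert abs(s) < 1e-12

print("Daubechies-2 filter satisfies (80), (81), and (83)")
```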
The oldest wavelet is the Haar wavelet, where $h_0 = h_1 = 1$. In this case the scaling function is

$$\phi(x) = \begin{cases} 1 & \text{for } x \in [0,1], \\ 0 & \text{otherwise.} \end{cases}$$

The wavelet function $\psi$ is given accordingly by

$$\psi(x) = \begin{cases} 1 & \text{for } x \in [0, 1/2), \\ -1 & \text{for } x \in [1/2, 1], \\ 0 & \text{otherwise.} \end{cases}$$
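With the Haar pair, one analysis/synthesis step amounts to averaging and differencing of neighboring samples. A minimal sketch (function names are ours), normalized so that the transform is orthonormal:

```python
import numpy as np

def haar_step(x):
    """One level of the orthonormal Haar transform: scaling (V-part) and
    wavelet (W-part) coefficients; the length of x must be even."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # scaling coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # wavelet coefficients
    return a, d

def haar_inverse(a, d):
    """Exact inverse of haar_step (perfect reconstruction)."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2.0)
    x[1::2] = (a - d) / np.sqrt(2.0)
    return x
```

Orthonormality shows up numerically as energy conservation: the squared norms of `a` and `d` together equal the squared norm of the input.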
Since the functions $\psi_{m,k}$ form an orthonormal basis of $L^2(\mathbb{R})$, any function $f \in L^2(\mathbb{R})$ can be expanded in terms of this basis:

$$f(x) = \sum_{j,k\in\mathbb{Z}} f_{j,k}\, \psi_{j,k}(x).$$

Orthogonal wavelets on $L^2(\mathbb{R})$ form the basis to construct wavelets on compact intervals [50,113] (for a summary of this topic we also refer the reader to [49]). A family of orthonormal scaling functions and wavelets on multidimensional domains can be constructed from products of one-dimensional scaling and wavelet functions:

$$\phi_{\tilde j,\tilde k}(x_1, x_2) = \phi_{j_1,k_1}(x_1)\,\phi_{j_2,k_2}(x_2),$$
$$\psi^1_{\tilde j,\tilde k}(x_1, x_2) = \phi_{j_1,k_1}(x_1)\,\psi_{j_2,k_2}(x_2),$$
$$\psi^2_{\tilde j,\tilde k}(x_1, x_2) = \psi_{j_1,k_1}(x_1)\,\phi_{j_2,k_2}(x_2),$$
$$\psi^3_{\tilde j,\tilde k}(x_1, x_2) = \psi_{j_1,k_1}(x_1)\,\psi_{j_2,k_2}(x_2),$$

where we use the convention that $\tilde j = (j_1, j_2)$, $\tilde k = (k_1, k_2)$. The functions $\phi_{\tilde j,\tilde k}$ are called multidimensional scaling functions and the functions $\psi^i_{\tilde j,\tilde k}$, $i = 1, 2, 3$, are called multidimensional wavelets.
B. Denoising by Wavelet Shrinkage

Donoho and Johnstone [63] introduced a wavelet-based denoising algorithm, the so-called wavelet shrinkage algorithm. This algorithm consists in calculating the wavelet expansion (see, e.g., [56])

$$u^\delta(x_1, x_2) = \sum_{i=1}^{3} \sum_{(\tilde j,\tilde k)\in\mathbb{Z}^2\times\mathbb{Z}^2} u^{\delta,i}_{\tilde j,\tilde k}\, \psi^i_{\tilde j,\tilde k}(x_1, x_2)$$

of the input data and manipulating its coefficients; to be precise, the coefficients $u^{\delta,i}_{\tilde j,\tilde k}$ are approximated by $S_\alpha(u^{\delta,i}_{\tilde j,\tilde k})$, with

$$S_\alpha(t) = \begin{cases} t - \alpha & \text{if } t > \alpha, \\ 0 & \text{if } |t| \le \alpha, \\ t + \alpha & \text{if } t < -\alpha. \end{cases}$$
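The shrinkage function $S_\alpha$ is the classical soft threshold. A short sketch (the name `soft_shrink` is ours), together with a brute-force check of the variational characterization derived below in connection with (84)–(86), namely that $S_\alpha(d)$ minimizes $c \mapsto \tfrac12 (c-d)^2 + \alpha|c|$:

```python
import numpy as np

def soft_shrink(t, alpha):
    """S_alpha: shift toward zero by alpha, clip to zero inside [-alpha, alpha]."""
    return np.sign(t) * np.maximum(np.abs(t) - alpha, 0.0)

# brute-force check: S_alpha(d) minimizes  0.5*(c - d)**2 + alpha*|c|
d, alpha = 1.7, 0.5
grid = np.linspace(-4.0, 4.0, 80001)
c_star = grid[np.argmin(0.5 * (grid - d) ** 2 + alpha * np.abs(grid))]
assert abs(c_star - soft_shrink(d, alpha)) < 1e-3
```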
Figure 35 shows the wavelet denoising algorithm with the Daubechies-2 wavelet (with four coefficients $h_{-1}, h_0, h_1, h_2$) and the Canny edge detector applied to the filtered data (cf. Figure 36). The method of wavelet shrinkage has received considerable attention in the literature (see, e.g., [40,42,63,64,114]) and has been applied to the solution of many practically important problems.

1. Relation to Diffusion Filtering

In [40,42] the relation between wavelet shrinkage and regularization methods on Besov spaces has been established. For the sake of simplicity of presentation we assume that the image data $u^\delta$ is available on $\mathbb{R}^2$. This is not quite consistent with the overall presentation, where we assumed image data on the bounded domain $\Omega = (0,1) \times (0,1)$. In principle one can proceed as outlined below if, instead of wavelets, periodic wavelets are used. The Besov space $B^1_1(L^1(\mathbb{R}^2))$ can be characterized as follows (note that this is not the standard definition): $f \in B^1_1(L^1(\mathbb{R}^2))$ if and only if

$$\Phi(f) = \sum_{i=1}^{3} \sum_{(\tilde j,\tilde k)\in\mathbb{Z}^2\times\mathbb{Z}^2} \left| f^i_{\tilde j,\tilde k} \right| < \infty \qquad \text{with} \qquad f^i_{\tilde j,\tilde k} = \int_{\mathbb{R}^2} f\, \psi^i_{\tilde j,\tilde k}.$$

Here the $\psi^i_{\tilde j,\tilde k}$ are smooth, orthonormal wavelet functions, and thus the $f^i_{\tilde j,\tilde k}$ denote the wavelet coefficients of the function $f$.
FIGURE 35. Wavelet shrinkage with $\alpha = 0, 1, 5, 10, 50, 100$.
Formally, the derivative of $\Phi(f)$ can be calculated as follows. Let $h \in B^1_1(L^1(\mathbb{R}^2)) \cap L^2(\mathbb{R}^2)$; then the derivative of $\Phi(f)$ in direction $h$ is given by

$$\partial\Phi(f)(h) = \lim_{t\to 0} \frac{\Phi(f+th) - \Phi(f)}{t} = \sum_{i=1}^{3} \sum_{(\tilde j,\tilde k)\in\mathbb{Z}^2\times\mathbb{Z}^2} \frac{f^i_{\tilde j,\tilde k}}{\left| f^i_{\tilde j,\tilde k} \right|} \int_{\mathbb{R}^2} h\, \psi^i_{\tilde j,\tilde k},$$

which in turn implies that

$$\partial\Phi(f) = \sum_{i=1}^{3} \sum_{(\tilde j,\tilde k)\in\mathbb{Z}^2\times\mathbb{Z}^2} \frac{f^i_{\tilde j,\tilde k}}{\left| f^i_{\tilde j,\tilde k} \right|}\, \psi^i_{\tilde j,\tilde k},$$
FIGURE 36. Canny's edge detector applied to the wavelet shrinkage data with $\alpha = 0, 1, 5, 10, 50, 100$.
where, of course, the meaning of $f^i_{\tilde j,\tilde k}/|f^i_{\tilde j,\tilde k}|$ is set-valued, as explained in Section VIII.A.

Thus the wavelet coefficients $u^i_{\tilde j,\tilde k}$ of the minimizer $u$ of the regularization functional

$$\frac{1}{2} \int_{\mathbb{R}^2} (u - u^\delta)^2 + \alpha\, \Phi(u) \tag{84}$$

satisfy the optimality condition

$$\sum_{i=1}^{3} \sum_{(\tilde j,\tilde k)\in\mathbb{Z}^2\times\mathbb{Z}^2} \left( u^i_{\tilde j,\tilde k} - u^{\delta,i}_{\tilde j,\tilde k} \right) \psi^i_{\tilde j,\tilde k} + \alpha \sum_{i=1}^{3} \sum_{(\tilde j,\tilde k)\in\mathbb{Z}^2\times\mathbb{Z}^2} \frac{u^i_{\tilde j,\tilde k}}{\left| u^i_{\tilde j,\tilde k} \right|}\, \psi^i_{\tilde j,\tilde k} \ni 0.$$
Since the functions $\psi^i_{\tilde j,\tilde k}$ are orthonormal, we find that

$$u^i_{\tilde j,\tilde k} - u^{\delta,i}_{\tilde j,\tilde k} + \alpha\, \frac{u^i_{\tilde j,\tilde k}}{\left| u^i_{\tilde j,\tilde k} \right|} \ni 0 \quad \text{for all } (\tilde j, \tilde k) \in \mathbb{Z}^2 \times \mathbb{Z}^2, \ i = 1, 2, 3. \tag{85}$$

This shows that

$$u^{\delta,i}_{\tilde j,\tilde k} > \alpha \ \text{ if } u^i_{\tilde j,\tilde k} > 0, \qquad u^{\delta,i}_{\tilde j,\tilde k} < -\alpha \ \text{ if } u^i_{\tilde j,\tilde k} < 0, \qquad \left| u^{\delta,i}_{\tilde j,\tilde k} \right| \le \alpha \ \text{ if } u^i_{\tilde j,\tilde k} = 0.$$

Consequently from (85) it follows that

$$u^i_{\tilde j,\tilde k} = \begin{cases} u^{\delta,i}_{\tilde j,\tilde k} - \alpha & \text{if } u^{\delta,i}_{\tilde j,\tilde k} > \alpha, \\ 0 & \text{if } \left| u^{\delta,i}_{\tilde j,\tilde k} \right| \le \alpha, \\ u^{\delta,i}_{\tilde j,\tilde k} + \alpha & \text{if } u^{\delta,i}_{\tilde j,\tilde k} < -\alpha. \end{cases} \tag{86}$$

This shows that Besov space regularization is Donoho's wavelet shrinkage algorithm. Proceeding as in Section III with the regularization technique (84), the associated diffusion filtering is

$$\frac{\partial u}{\partial t} + \partial\Phi(u) \ni 0 \quad \text{for } t > 0, \qquad u(0) = u^\delta.$$

X. REGULARIZATION AND STATISTICS

There has been considerable interest in incorporating statistical a priori information in regularization techniques. In this section we outline the basic principle. There are several publications in the literature devoted to this topic; an extremely useful overview article is [89], where adequate references can also be found. For an elementary introduction to statistics we refer the reader to [98]. Let

$$u^\delta(x_i) = u(x_i) + n(x_i)$$

be the measured image intensity at the pixel $x_i$, i.e., data $u$ degraded with noise $n$. In the stochastic framework the
intensities $u^\delta(x_i)$ and $n(x_i)$ are considered registered intensities of random variables $U$ and $N$. We denote by $P(\{U < u\})$ and $P(\{N < n\})$ the probabilities that the random variables $U$ and $N$ are less than $u$, $n$, respectively. The probability density functions are accordingly denoted by

$$P(\{U = u\}) = \lim_{du\to 0^+} \frac{P(\{U \in [u, u+du)\})}{du}, \qquad P(\{N = n\}) = \lim_{dn\to 0^+} \frac{P(\{N \in [n, n+dn)\})}{dn}.$$

The notation $P(\{U = u\})$, $P(\{N = n\})$ is typically used in the case of discrete random variables. We find it instructive to use this notation for continuous random variables too. The goal is to recover the image intensity $u$ such that the conditional probability

$$P\bigl(\{U = u\} \cap \{N = u^\delta - u\}\bigr) = P(\{U = u\})\, P(\{N = u^\delta - u\}) \tag{87}$$

is maximized with respect to $u$. Note that the last identity requires that the random variables $U$ and $N$ are independent. If the noise is normally distributed with mean value zero and variance $\sigma^2$, then

$$P(\{N = u^\delta - u\}) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(u^\delta - u)^2/(2\sigma^2)}. \tag{88}$$

It is convenient to set

$$P(\{U = u\}) := e^{-F(u)}, \tag{89}$$

where $F$ is a nonnegative function. In this case maximization of (87) is equivalent to minimizing

$$F(u)(x_i) + \frac{1}{2\sigma^2}\, (u^\delta(x_i) - u(x_i))^2. \tag{90}$$

For image processing applications it is necessary to take into account neighborhood relations of image intensities between pixels. This can, for instance, be achieved by using a nonnegative probabilistic model $F$ which depends on gradient approximations of the image intensity.
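For a single intensity value, the MAP principle (90) can be checked against a closed form: with Gaussian noise (88) and the Gaussian-type prior $F(u) = \beta u^2$ (a toy zero-dimensional stand-in for $\beta|\nabla u|^2$, our illustration), the minimizer of $\beta u^2 + (u^\delta - u)^2/(2\sigma^2)$ is $u^* = u^\delta / (1 + 2\beta\sigma^2)$. A quick numerical confirmation:

```python
import numpy as np

beta, sigma, u_delta = 2.0, 0.5, 3.0

def neg_log_posterior(u):
    # F(u) + (u_delta - u)^2 / (2 sigma^2), with the toy prior F(u) = beta * u^2
    return beta * u ** 2 + (u_delta - u) ** 2 / (2 * sigma ** 2)

grid = np.linspace(-1.0, 4.0, 500001)
u_map = grid[np.argmin(neg_log_posterior(grid))]
u_closed = u_delta / (1 + 2 * beta * sigma ** 2)
assert abs(u_map - u_closed) < 1e-4
```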
In order to minimize (90) for all pixel values $x_i$, $i \in I$, we minimize the functional

$$\sum_{i\in I} \left( F(u)(x_i) + \frac{1}{2\sigma^2}\, (u^\delta(x_i) - u(x_i))^2 \right).$$

To realize the interaction with Tikhonov-type regularization models it is convenient to note that the sum is a quadrature rule approximation of the integral

$$\int_\Omega \left( 2\sigma^2 F(u)(x) + (u^\delta(x) - u(x))^2 \right) dx.$$

Using $F(u) = \beta|\nabla u|^2$, with $\beta > 0$, the stochastic approach is equivalent to Tikhonov regularization with regularization parameter $\alpha = 2\beta\sigma^2$; $F(u) = \beta|\nabla u|$ gives bounded variation regularization; $F(u) = \beta|\nabla u| \log|\nabla u|$ gives entropy regularization. For Tikhonov regularization, $F(u) = \beta|\nabla u|^2$, the associated probability density function

$$P(\{U = u\}) = e^{-F(u)} = e^{-\beta|\nabla u|^2}$$

is large in regions where $u$ is almost constant, and small in regions of high oscillations. Or, in other words, the image intensity $u$ is considered to be reliable if the gradient is low.

In establishing the link between stochastic models and Tikhonov-type regularization we assumed Gaussian white noise, (88) and (89). Following the derivation above, minimization principles become much more complicated if we drop the assumption of Gaussian white noise. To highlight the arising complications we consider, as an example, Rayleigh distributed noise, that is,

$$P(\{N = u^\delta - u\}) = \frac{|u^\delta - u|}{\sigma^2}\, e^{-(u^\delta - u)^2/(2\sigma^2)}. \tag{91}$$

Proceeding as above we find that maximization of the conditional probability results in minimization of the functional

$$\int_\Omega \left( \frac{(u^\delta - u)^2}{2\sigma^2} + \beta|\nabla u|^2 - \log(|u^\delta - u|) \right). \tag{92}$$
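The complication is visible already pointwise: dropping the gradient term, the data term obtained from the negative log of the Rayleigh density (91), $g(u) = (u^\delta - u)^2/(2\sigma^2) - \log|u^\delta - u|$, is not minimized at $u = u^\delta$; setting $g'(u) = 0$ gives $|u - u^\delta| = \sigma$, i.e., two minimizers at distance $\sigma$ from the data value. A quick numerical check (our illustration):

```python
import numpy as np

sigma, u_delta = 0.2, 1.0

def g(u):
    # pointwise negative log-likelihood of Rayleigh noise (91), up to a constant
    return (u - u_delta) ** 2 / (2 * sigma ** 2) - np.log(np.abs(u - u_delta))

# minimize on each side of the singularity at u = u_delta
left = np.linspace(0.0, u_delta - 1e-6, 200001)
right = np.linspace(u_delta + 1e-6, 2.0, 200001)
u_left = left[np.argmin(g(left))]
u_right = right[np.argmin(g(right))]
assert abs(u_left - (u_delta - sigma)) < 1e-3    # minimizer at u_delta - sigma
assert abs(u_right - (u_delta + sigma)) < 1e-3   # minimizer at u_delta + sigma
```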
There is no scale space associated with this regularization functional. However, an inverse scale-space method can be constructed by considering iterative minimization of the functionals

$$\int_\Omega \left( \frac{(u^\delta - u)^2}{2\sigma^2} + \beta\left|\nabla\left(u - u^{(k-1)}\right)\right|^2 - \log(|u^\delta - u|) \right), \qquad k = 1, 2, \ldots, \tag{93}$$

and denoting a minimizer by $u^{(k)}$, which then satisfies the optimality condition

$$\beta\, \Delta\left(u^{(k)} - u^{(k-1)}\right) = \left( \frac{1}{2\sigma^2} - \frac{1}{2|u^\delta - u^{(k)}|^2} \right) \left(u^{(k)} - u^\delta\right).$$

Setting $t_k = k\beta$, $k = 1, \ldots$, and $u^{(k)} = u(t_k)$, we obtain, by taking the limit $\beta \to 0^+$, the inverse scale-space method

$$\Delta\, \frac{\partial u}{\partial t}(t,x) = \left( \frac{1}{2\sigma^2} - \frac{1}{2|u^\delta(x) - u(t,x)|^2} \right) \bigl(u(t,x) - u^\delta(x)\bigr) \quad \text{for } (t,x) \in (0,\infty)\times\Omega, \tag{94}$$

$$u(0,x) = 0 \quad \text{for } x \in \Omega.$$

We present filtering of data degraded with Rayleigh noise, with $\sigma = 0.2$ (cf. Figure 37); Figure 38 shows the solution of (94) at specified times. Finally, we compare the stochastic regularization method with well-established diffusion filtering methods: the quality of the stochastic regularization is completely different from diffusion-type filtering. We have selected test data that are extremely distorted by Rayleigh noise. The filtered images in Figure 39 were obtained with as large a stopping time as possible (to reduce the effect of noise), so that a number of details, like the tripod of the cameraman, could still be recovered. The selected stopping time is too small for the mean curvature motion filtering and the anisotropic diffusion filtering to completely smear out the noise. We find that the number of preserved details in the filtered images is optimal for the total variation flow and the stochastic method. The good performance of these methods is due to the fact that the stochastic method uses a priori information on the noise and the total variation filtering optimally incorporates information on the image data, which is a blocky image.
FIGURE 37. Camera scene (left) is distorted by Rayleigh noise ($\sigma = 0.2$) (right).
FIGURE 38. Denoising of the image represented in Figure 37 with the inverse scale-space method (94) at times $t = 0.04, 0.12, 0.32, 0.56, 1.2, 2.4$.
FIGURE 39. Denoising. Top left: Heat equation. Top right: Total variation flow. Middle left: Perona–Malik diffusion. Middle right: Anisotropic diffusion. Bottom left: Mean curvature motion. Bottom right: Stochastic regularization.
XI. CONCLUSIONS

In this chapter we reviewed interactions between variational methods and diffusion filtering for denoising and image smoothing, and reviewed links to splines, wavelets, and statistical methods. We presented an introduction to inverse problems, such as deblurring and deconvolution, and highlighted numerical methods for their solution. Various other important image processing applications which are solved by variational methods and partial differential equations have not been touched on:
Optical flow models [5,18,27,51,75,97,100,104,120,121,149,150,173,174].
Computational anatomy and image registration [12,13,65,75,81,96,162,163].
Inpainting [20,24,44,110,111,141].
Diffusion and regularization of vector-valued data, such as color images and tensor-valued medical image data [26,136,167].
Blind deconvolution [34,45,47,144].
Level set methods [69–72,93,132,137,155].
Surface smoothing [61,144].
Active contours [161].
Other variational techniques [129,130,159,160].
We did not attempt to give a complete list of references on these topics, since they are not within the main goal of this chapter. We apologize for any reference that has been omitted. For the reader interested in these topics it should be possible to get a complete account from the references listed in these papers.
ACKNOWLEDGMENTS

This work has been supported by the Austrian Fonds zur Förderung der Wissenschaftlichen Forschung (FWF), grant Y-123 INF. The author thanks H. Grossauer, M. Haltmeier, W. Hinterberger, R. Kowar, J. Künstle, S. Leimgruber, and G. Regensburger. Moreover, the author is grateful for the agreement of Ch. Groetsch, M. Hintermüller, K. Kunisch, M. Oehsen, E. Radmoser, and J. Weickert to use some data of previous joint publications.
REFERENCES

1. Acar, R., and Vogel, C. R. (1994). Analysis of bounded variation penalty methods for ill-posed problems. Inverse Probl. 10, 1217–1229.
2. Alvarez, L., Guichard, F., Lions, P.-L., and Morel, J.-M. (1993). Axioms and fundamental equations of image processing. Arch. Ration. Mech. Anal. 123, 199–257.
3. Alvarez, L., Lions, P.-L., and Morel, J.-M. (1992). Image selective smoothing and edge detection by nonlinear diffusion. II. SIAM J. Numer. Anal. 29, 845–866.
4. Alvarez, L., and Morel, J.-M. (1994). Formalization and computational aspects of image analysis. Acta Numerica, 1–59.
5. Alvarez, L., Weickert, J., and Sanchez, J. (1999). A scale-space approach to nonlocal optical flow calculations, in [127], pp. 235–246.
6. Ambrosio, L. (1989). A compactness theorem for a new class of functions of bounded variation. Boll. Un. Mat. Ital. B 3, 857–881.
7. Ambrosio, L. (1989). Variational problems in SBV and image segmentation. Acta Appl. Math. 17, 1–40.
8. Ambrosio, L. (1990). Existence theory for a new class of variational problems. Arch. Rational Mech. Anal. 111, 291–322.
9. Ambrosio, L., Fusco, N., and Pallara, D. (2000). Functions of Bounded Variation and Free Discontinuity Problems. New York: Oxford University Press.
10. Ambrosio, L., and Tortorelli, V. M. (1990). Approximation of functionals depending on jumps by elliptic functionals via Γ-convergence. Commun. Pure Appl. Math. 43, 999–1036.
11. Ambrosio, L., and Tortorelli, V. M. (1992). On the approximation of free discontinuity problems. Boll. Un. Mat. Ital. B (7) 6, 105–123.
12. Amit, Y. (1994). A nonlinear variational problem for image matching. SIAM J. Sci. Comput. 15, 207–224.
13. Amit, Y., Grenander, U., and Piccioni, M. (1991). Structural image restoration through deformable templates. J. Amer. Statist. Assoc. 86, 376–387.
14. Andreu, F., Ballester, C., Caselles, V., and Mazón, J. M. (2000). Minimizing total variation flow. C.R. Acad. Sci. Paris Sér. I Math. 331, 867–872.
15. Andreu, F., Ballester, C., Caselles, V., and Mazón, J. M. (2001). The Dirichlet problem for the total variation flow. J. Funct. Anal. 180, 347–403.
16. Andreu, F., Ballester, C., Caselles, V., and Mazón, J. M. (2001). Minimizing total variation flow. Differential Integral Equations 14, 321–360.
17. Andreu, F., Caselles, V., Díaz, J. I., and Mazón, J. M. (2002). Some qualitative properties for the total variation flow. J. Funct. Anal. 188, 516–547.
18. Aubert, G., Deriche, R., and Kornprobst, P. (1999). Computing optical flow via variational techniques. SIAM J. Appl. Math. 60, 156–182.
19. Aubert, G., and Kornprobst, P. (2002). Mathematical Problems in Image Processing. New York: Springer-Verlag.
20. Ballester, C., Caselles, V., Verdera, J., Bertalmio, M., and Sapiro, G. (2001). A variational model for filling-in gray level and color images, in Proceedings of the Eighth International Conference on Computer Vision (ICCV-01). Los Alamitos, CA: IEEE Computer Society, pp. 10–16.
21. Banks, H. T., and Kunisch, K. (1989). Estimation Techniques for Distributed Parameter Systems. Basel: Birkhäuser.
22. Barenblatt, G. I., Bertsch, M., Dal Passo, R., and Ughi, M. (1993). A degenerate pseudoparabolic regularization of a nonlinear forward–backward heat equation arising in the theory of heat and mass exchange in stably stratified turbulent shear flow. SIAM J. Math. Anal. 24, 1414–1439.
23. Bellettini, G., Caselles, V., and Novaga, M. (2001). The total variation flow in R^N. Technical report, Sezione di analisi matematica e probabilità, Università di Pisa.
24. Bertalmio, M., Sapiro, G., Caselles, V., and Ballester, C. (2000). Image inpainting, in Proceedings of the Computer Graphics Conference 2000 (SIGGRAPH-00). New York: ACM Press, pp. 417–424.
25. Bertero, M., and Boccacci, P. (1998). Introduction to Inverse Problems in Imaging. London: IOP Publishing.
26. Black, M., Sapiro, G., Marimont, D., and Heeger, H. (1997). Robust anisotropic diffusion and sharpening of scalar and vector images, in [166], pp. 263–266.
27. Black, M. J., and Anandan, P. (1991). Robust dynamic motion estimation over time, in Proc. IEEE Comp. Soc. Conf. on Computer Vision and Pattern Recognition (CVPR '91). Los Alamitos, CA: IEEE Computer Society Press, pp. 292–302.
28. Bracewell, R. N., and Riddle, A. C. (1967). Inversion of fan-beam scans in radio astronomy. Astrophys. J. 150, 427–434.
29. Braides, A. (2000). Free discontinuity problems and their non-local approximation, in Calculus of Variations and Partial Differential Equations. Berlin: Springer, pp. 171–180. 30. Braides, A., and Dal Maso, G. (1997). Non-local approximation of the Mumford–Shah functional. Calc. Var. Partial Differential Equations 5, 293–322. 31. Brezis, H. (1973). Operateurs Maximaux Monotones et semi-groupes de contractions dans les espaces de Hilbert. Amsterdam: North-Holland. 32. H. Brezis (1983). Analyse fonctionnelle. Theorie et applications. Collection Mathematiques Appliquees pour la Maitrise. Paris: Masson. 33. Bronstein, I. N., Semendjajew, K. A., Musiol, G., and Muehlig, H. (1997). Taschenbuch der Mathematik (Handbook of Mathematics), 3rd edition. Frankfurt am Main: Deutsch. 34. Burger, M., and Scherzer, O. (2001). Regularization methods for blind deconvolution and blind source separation problems. Math. Control Signals Systems 14, 358–383. 35. Cadzow, J. A. (1979). An extrapolation procedure for band-limited signals. IEEE Trans. Acoust. Speech Signal Process. 27, 4–12. 36. Canny, J. F. (1986). A computational approach to edge detection. IEEE Trans. Pattern Anal. Machine Intell. PAMI-8(6), 679–697. 37. Catte´, F., Lions, P.-L., Morel, J.-M., and Coll, T. (1992). Image selective smoothing and edge detection by nonlinear diffusion. SIAM J. Numer. Anal. 29, 182–193. 38. Chambolle, A. (1999). Finite-differences discretizations of the Mumford–Shah functional. M2AN Math. Model. Numer. Anal. 33, 261–288. 39. Chambolle, A., and Dal Maso, G. (1999). Discrete approximation of the Mumford–Shah functional in dimension two. M2AN Math. Model. Numer. Anal. 33, 651–672. 40. Chambolle, A., DeVore, R. A., Lee, N., and Lucier, B. J. (1998). Nonlinear wavelet image processing: variational problems, compression, and noise removal through wavelet shrinkage. IEEE Trans. Image Process. 7, 319–335. 41. Chambolle, A., and Lions, P. L. (1997). 
Image recovery via total variation minimization and related problems. Numer. Math. 76, 167–188. 42. Chambolle, A., and Lucier, B. J. (2001). Interpreting translation invariant wavelet shrinkage as a new image smoothing scale space. IEEE Trans. Image Process. 10, 993–1000. 43. Chan, T. F., Golub, G. H., and Mulet, P. (1999). A nonlinear primal-dual method for total variation-based image restoration. SIAM J. Sci. Comput. 20, 1964–1977 (electronic). 44. Chan, T. F., and Shen, J. (2002). Mathematical models for local nontexture inpaintings. SIAM J. Appl. Math. 62, 1019–1043 (electronic). 45. Chan, T. F., and Wong, C. K. (2000). Convergence of the alternating minimization algorithm for blind deconvolution. Linear Algebra Appl. 316, 259–285. 46. Chan, T. F., Golub, G. H., and Mulet, P. (1996). A nonlinear primal-dual method for total variation-based image restoration, in Proceedings ICAOS ’96. Berlin: Springer, pp. 241–252. 47. Chan, T. F., and Wong, C. K. (1998). Total variation blind deconvolution. IEEE Trans. Image Process. 7, 370–375. 48. Chorin, A. J., and Marsden, J. E. (1993). A Mathematical Introduction to Fluid Mechanics, 3rd edition. New York: Springer-Verlag. 49. Chyzak, F., Paule, P., Scherzer, O., Schoisswohl, A., and Zimmermann, B. (2001). The construction of orthonormal wavelets using symbolic methods and a matrix analytical approach for wavelets on the interval. Exp. Math. 10, 67–86. 50. Cohen, A., Daubechies, I., and Vial, P. (1993). Wavelets on the interval and fast wavelet transforms. Appl. Comput. Harmon. Anal. 1, 54–81. 51. Cohen, I. (1993). Nonlinear variational method for optical flow computation, in Proc. Eighth Scandinavian Conf. on Image Analysis, Vol. 1, Tromsø, pp. 523–530.
OTMAR SCHERZER
52. Colton, D., and Kress, R. (1983). Integral Equation Methods in Scattering Theory. New York: Wiley. 53. Colton, D., and Kress, R. (1992). Inverse Acoustic and Electromagnetic Scattering Theory. New York: Springer-Verlag. 54. Dacorogna, B. (1982). Weak Continuity and Weak Lower Semi-Continuity of Non-Linear Functionals. Berlin: Springer-Verlag. 55. Dacorogna, B. (1989). Direct Methods in the Calculus of Variations. Berlin: Springer-Verlag. 56. Daubechies, I. (1992). Ten Lectures on Wavelets. Philadelphia, PA: SIAM. 57. Davies, P. L., and Kovac, A. (2001). Local extremes, runs, strings and multiresolution. Ann. Statist. 29, 1–65 (with discussion and rejoinder by the authors). 58. De Boor, C. (1978). A Practical Guide to Splines. New York: Springer. 59. De Giorgi, E., Carriero, M., and Leaci, A. (1989). Existence theorem for a minimum problem with free discontinuity set. Arch. Ration. Mech. Anal. 108, 195–218. 60. Dibos, F., and Séré, E. (1997). An approximation result for the minimizers of the Mumford–Shah functional. Boll. Un. Mat. Ital. A (7) 11, 149–162. 61. Diewald, U., Preusser, T., Rumpf, M., and Strzodka, R. (2000). Diffusion models and their accelerated solution in image and surface processing. Acta Math. Univ. Comenian (NS) 70, 15–31. 62. Dobson, D. C., and Vogel, C. R. (1997). Convergence of an iterative method for total variation denoising. SIAM J. Num. Anal. 34, 1779–1791. 63. Donoho, D. L., and Johnstone, I. (1995). Adapting to unknown smoothness via wavelet shrinkage. J. Amer. Statist. Assoc. 90, 1200–1224. 64. Donoho, D. L., Johnstone, I. M., Kerkyacharian, G. B., and Picard, D. (1996). Density estimation by wavelet thresholding. Ann. Statist. 24, 508–539. 65. Dupuis, P., Grenander, U., and Miller, M. I. (1998). Variational problems on flows of diffeomorphisms for image matching. Quart. Appl. Math. 56, 587–600. 66. Engl, H. W., and Groetsch, C. W., eds. (1987). Inverse and Ill-Posed Problems. Boston: Academic Press. 67. Engl, H.
W., Hanke, M., and Neubauer, A. (1996). Regularization of Inverse Problems. Dordrecht: Kluwer Academic. 68. Engl, H. W., Kunisch, K., and Neubauer, A. (1989). Convergence rates for Tikhonov regularization of nonlinear ill-posed problems. Inverse Probl. 5, 523–540. 69. Evans, L. C., and Spruck, J. (1991). Motion of level sets by mean curvature. I. J. Differ. Geom. 33, 635–681. 70. Evans, L. C., and Spruck, J. (1992). Motion of level sets by mean curvature. II. Trans. Am. Math. Soc. 330, 321–332. 71. Evans, L. C., and Spruck, J. (1992). Motion of level sets by mean curvature. III. J. Geom. Anal. 2, 121–150. 72. Evans, L. C., and Spruck, J. (1995). Motion of level sets by mean curvature. IV. J. Geom. Anal. 5, 79–116. 73. Evans, L. C., and Gariepy, R. F. (1992). Measure Theory and Fine Properties of Functions. Boca Raton: CRC Press. 74. Fasano, A., and Primicerio, M., eds. (1994). Proc. Seventh European Conf. on Mathematics in Industry. Stuttgart: Teubner. 75. Fischer, B., and Modersitzki, J. (1999). Fast inversion of matrices in image processing. Numer. Algor. 22, 1–11. 76. Florack, L. (1997). Image Structure. Dordrecht: Kluwer. 77. Frigaard, I. A., Ngwa, G., and Scherzer, O. (2002). On effective stopping time selection for visco-plastic nonlinear BV diffusion filters. SIAM J. Appl. Math. accepted for publication.
78. Geman, D., and Yang, C. (1995). Nonlinear image recovery with half-quadratic regularization. IEEE Trans. Image Process. 4, 932–945. 79. Gilbert, P. (1972). Iterative methods for the three-dimensional reconstruction of an object from projections. J. Theor. Biol. 36, 105–117. 80. Glowinski, R. (1984). Numerical Methods for Nonlinear Variational Problems. Berlin: Springer. 81. Grenander, U., and Miller, M. I. (1998). Computational anatomy: an emerging discipline. Quart. Appl. Math. 56(4), 617–694. 82. Groetsch, C. W. (1991). Differentiation of approximately specified functions. Amer. Math. Monthly 98, 847–850. 83. Groetsch, C. W. (1993). Inverse Problems in the Mathematical Sciences. Vieweg Mathematics for Scientists and Engineers. Braunschweig: Friedr. Vieweg. 84. Groetsch, C. W. (1999). Inverse Problems. Washington, DC: Mathematical Association of America. 85. Groetsch, C. W., and Scherzer, O. (2000). Nonstationary iterated Tikhonov–Morozov method and third order differential equations for the evaluation of unbounded operators. Math. Meth. Appl. Sci. 23, 1287–1300. 86. Groetsch, C. W. (1983). Comments on Morozov’s discrepancy principle, in Improperly Posed Problems and Their Numerical Treatment, edited by G. Hämmerlin and K. H. Hoffmann. Basel: Birkhäuser, pp. 97–104. 87. Groetsch, C. W. (1984). The Theory of Tikhonov Regularization for Fredholm Equations of the First Kind. Boston: Pitman. 88. Groetsch, C. W., and Scherzer, O. (1993). Optimal order of convergence for stable evaluation of differential operators. Electronic J. Differential Equations 4, 1–10 (http://ejde.math.unt.edu). 89. Hamza, A. B., Krim, H., and Unal, G. B. (2002). Unifying probabilistic and variational estimation. IEEE Signal Processing Magazine 19, 37–47. 90. Hanke, M. (2002). Grundlagen der Numerischen Mathematik und des Wissenschaftlichen Rechnens. Stuttgart: Teubner. 91. Hanke, M., and Groetsch, C. W. (1998). Nonstationary iterated Tikhonov regularization. J. Optim. Theory Appl.
98, 37–53. 92. Hanke, M., and Scherzer, O. (2001). Inverse problems light: numerical differentiation. Amer. Math. Monthly 108, 512–521. 93. Harabetian, E., and Osher, S. (1998). Regularization of ill-posed problems via the level set approach. SIAM J. Appl. Math. 58, 1689–1706. 94. Herman, G. T. (1980). Image Reconstruction from Projections: The Fundamentals of Computed Tomography. New York: Academic Press. 95. Hinterberger, W., Hintermüller, M., Kunisch, K., von Oehsen, M., and Scherzer, O. (2002). Tube methods for BV regularization. J. Math. Imag. Vision, accepted for publication. 96. Hinterberger, W., and Scherzer, O. (2001). Models for image interpolation based on the optical flow. Computing 66, 231–247. 97. Hinterberger, W., Scherzer, O., Schnörr, Ch., and Weickert, J. (2002). Analysis of optical flow models in the framework of calculus of variations. Num. Funct. Anal. Opt. 23, 69–90. 98. Hoel, P. G. (1960). Elementary Statistics. New York: John Wiley. 99. Hofmann, B. (1999). Mathematik inverser Probleme (Mathematics of Inverse Problems). Stuttgart: Teubner. 100. Horn, B., and Schunck, B. (1981). Determining optical flow. Artif. Intell. 17, 185–203. 101. Kerckhove, M., ed. (2001). Scale-Space and Morphology in Computer Vision, Lecture Notes in Computer Science, LNCS 2106. New York: Springer-Verlag.
102. Kichenassamy, S. (1997). The Perona–Malik paradox. SIAM J. Appl. Math. 57, 1328–1342. 103. Kirsch, A. (1996). An Introduction to the Mathematical Theory of Inverse Problems. New York: Springer-Verlag. 104. Kornprobst, P., Deriche, R., and Aubert, G. (1999). Image sequence analysis via partial differential equations. J. Math. Imaging Vision 11, 5–26. 105. Kress, R. (1989). Linear Integral Equations. Berlin: Springer-Verlag. 106. Lindeberg, T. (1994). Scale-Space Theory in Computer Vision. Boston: Kluwer. 107. Louis, A. K. (1989). Inverse und Schlecht Gestellte Probleme. Stuttgart: Teubner. 108. Mallat, S. (1999). A Wavelet Tour of Signal Processing, 2nd edition. San Diego, CA: Academic Press. 109. Mammen, E., and van de Geer, S. (1997). Locally adaptive regression splines. Ann. Statist. 25, 387–413. 110. Masnou, S. (2002). Disocclusion: a variational approach using level lines. IEEE Trans. Image Process. 11, 68–76. 111. Masnou, S., and Morel, J. M. (1998). Level lines based disocclusion, in [175], pp. 259–263. 112. The MathWorks, http://www.mathworks.com/. MATLAB. 113. Meyer, Y. (1991). Ondelettes sur l’intervalle. Rev. Mat. Iberoam. 7(2), 115–133. 114. Meyer, Y. (2001). Oscillating Patterns in Image Processing and Nonlinear Evolution Equations, Vol. 22 of University Lecture Series. Providence, RI: American Mathematical Society. 115. Morel, J. M., and Solimini, S. (1995). Variational Methods in Image Segmentation. Boston: Birkhäuser. 116. Morozov, V. A. (1984). Methods for Solving Incorrectly Posed Problems. New York: Springer-Verlag. 117. Morozov, V. A. (1993). Regularization Methods for Ill-Posed Problems. Boca Raton: CRC Press. 118. Mumford, D., and Shah, J. (1989). Optimal approximations by piecewise smooth functions and associated variational problems. Commun. Pure Appl. Math. 42, 577–685. 119. Murio, D. A. (1993). The Mollification Method and the Numerical Solution of Ill-Posed Problems. New York: John Wiley. 120. Nagel, N. H. (1987).
On the estimation of optical flow: relations between new approaches and some new results. Artif. Intell. 33, 299–324. 121. Nagel, N. H., and Enkelmann, W. (1986). An investigation of smoothness constraints for the estimation of displacement vector fields from image sequences. IEEE Trans. Pattern Anal. Mach. Intell. 8, 565–593. 122. Nashed, M. Z., ed. (1976). Generalized Inverses and Applications. New York: Academic Press. 123. Natterer, F. (1986). The Mathematics of Computerized Tomography. Stuttgart: Teubner. 124. Natterer, F., and Wübbeling, F. (2001). Mathematical Methods in Image Reconstruction. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM). 125. Neubauer, A. (1989). Tikhonov regularization for non-linear ill-posed problems: optimal convergence rates and finite-dimensional approximation. Inverse Probl. 5, 541–557. 126. Nielsen, M., Florack, L., and Deriche, R. (1997). Regularization, scale-space and edge detection filters. J. Math. Imag. Vision 7, 291–307. 127. Nielsen, M., Johansen, P., Olsen, O. F., and Weickert, J., eds. (1999). Scale-Space Theories in Computer Vision. Lecture Notes in Computer Science, Vol. 1683. Springer-Verlag. 128. Nikolova, M. (2000). Local strong homogeneity of a regularized estimator. SIAM J. Appl. Math. 61, 633–658.
129. Nitzberg, M., Mumford, D., and Shiota, T. (1993). Filtering, Segmentation and Depth, Lecture Notes in Computer Science. New York: Springer. 130. Nitzberg, M., and Shiota, T. (1992). Nonlinear image filtering with edge and corner enhancement. IEEE Trans. Pattern Anal. Mach. Intell. 14, 826–833. 131. Orphanoudakis, S., Trahamnias, P., Crowley, J., and Katveas, N., eds. (1998). Proc. Computer Vision and Mobile Robotics Workshop, CMVR’98, Santorini. 132. Paragios, M., ed. (2001). Variational and Level Set Methods in Computer Vision. Los Alamitos, CA: IEEE Computer Society. 133. Pazy, A. (1983). Semigroups of Linear Operators and Applications to Partial Differential Equations. New York: Springer-Verlag. 134. Perona, P., and Malik, J. (1987). Scale space and edge detection using anisotropic diffusion, in Workshop on Computer Vision. Washington, DC: IEEE Computer Society Press, pp. 16–22. 135. Perona, P., and Malik, J. (1990). Scale space and edge detection using anisotropic diffusion. IEEE Trans. Pattern Anal. Mach. Intell. 12, 629–639. 136. Pollak, I., Krim, H., and Willsky, A. (1998). Stabilized inverse diffusion equations and segmentation of vector-valued images, in [175], pp. 246–248. 137. Preusser, T., and Rumpf, M. (2002). A level set method for anisotropic geometric diffusion in 3D image processing. SIAM J. Appl. Math. 62, 1772–1793 (electronic). 138. Radmoser, E., Scherzer, O., and Weickert, J. (1999). Scale-space properties of regularization methods, in [127], pp. 211–222. 139. Radmoser, E., Scherzer, O., and Weickert, J. (2000). Scale-space properties of nonstationary iterative regularization methods. J. Visual Commun. Image Representation 11, 96–114. 140. Radon, J. (1917). Über die Bestimmung von Funktionen durch ihre Integralwerte längs gewisser Mannigfaltigkeiten. Ber. Verh. Sächs. Akad. Wiss. Leipzig Math. Phys. Kl. 69. 141. Ramasubramanian, M., Pattanaik, S. N., and Greenberg, D. P. (1999).
A perceptually based physical error metric for realistic image synthesis, in SIGGRAPH 99 Conference Proceedings, Annual Conference Series, edited by Alyn Rockwood. Addison Wesley, pp. 73–82. 142. Reinsch, Ch. (1967). Smoothing by spline functions. Numer. Math. 10, 177–183. 143. Rice, J., and Rosenblatt, M. (1983). Smoothing splines: regression, derivatives and deconvolution. Ann. Statist. 11, 141–156. 144. Sapiro, G. (2001). Geometric Partial Differential Equations and Image Analysis. Cambridge: Cambridge University Press. 145. Scherzer, O. (1997). Stable evaluation of differential operators and linear and nonlinear multi-scale filtering. Electronic J. Differential Equations 15, 1–12 (http://ejde.math.unt.edu). 146. Scherzer, O. (2002). Explicit versus implicit relative error regularization on the space of functions of bounded variation, in ‘‘Inverse Problems, Image Analysis, and Medical Imaging,’’ Contemp. Math. 313, 177–198. Providence, RI: American Mathematical Society. 147. Scherzer, O., and Groetsch, C. W. (2001). Inverse scale space theory for inverse problems, in [101], pp. 317–325. 148. Scherzer, O., and Weickert, J. (2000). Relations between regularization and diffusion filtering. J. Math. Imag. Vision 12, 43–63. 149. Schnörr, C. (1994). Segmentation of visual motion by minimizing convex non-quadratic functionals, in Proc. 12th Int. Conf. on Pattern Recognition, Vol. A, pp. 661–663. 150. Schnörr, Ch. (1991). Funktionalanalytische Methoden zur Gewinnung von Bewegungsinformation aus TV-Bildfolgen. PhD thesis, Fakultät für Informatik, University of Karlsruhe. 151. Schoenberg, I. J. (1964). Spline functions and the problem of graduation. Proc. Natl. Acad. Sci. USA 52, 947–950.
152. Schultz, M. H. (1973). Spline Analysis. Englewood Cliffs, NJ: Prentice-Hall. 153. Schumaker, L. L. (1981). Spline Functions: Basic Theory. New York: Wiley. 154. Seidman, T. I., and Vogel, C. R. (1989). Well posedness and convergence of some regularisation methods for non-linear ill posed problems. Inverse Probl. 5, 227–238. 155. Sethian, J. A. (1999). Level Set Methods and Fast Marching Methods, 2nd edition. Cambridge: Cambridge University Press. 156. Sporring, J., Nielsen, M., Florack, L., and Johansen, P., eds. (1997). Gaussian Scale-Space Theory. Dordrecht: Kluwer. 157. Strang, G., and Fix, G. J. (1973). An Analysis of the Finite Element Method. Englewood Cliffs, NJ: Prentice-Hall. 158. Strong, D., and Chan, T. F. (1996). Exact solutions to the total variation regularization problem. Technical report, University of California, Los Angeles, CAM 96-41. 159. Terzopoulos, D. (1983). Multilevel computational processes for visual surface reconstructions. Computer Vision, Graphics and Image Processing 24, 52–96. 160. Terzopoulos, D. (1988). The computation of visible-surface representations. IEEE Trans. Pattern Anal. Mach. Intell. 10, 417–438. 161. Terzopoulos, D., Witkin, A., and Kass, M. (1988). Constraints on deformable models: recovering 3D shape and nonrigid motion. Artif. Intell. 36, 91–123. 162. Thirion, J.-P. (1995). Fast non-rigid matching of 3D medical images. Technical Report RR-2547, Inria, Institut National de Recherche en Informatique et en Automatique. 163. Thirion, J.-P. (1996). Non-rigid matching using demons. Preprint, INRIA, France. 164. Tikhonov, A. N., and Arsenin, V. Y. (1977). Solutions of Ill-Posed Problems. Washington, DC: Wiley (translation editor: Fritz John). 165. Torre, V., and Poggio, T. A. (1986). On edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 8, 48–163. 166. Torwick, I., ed. (1997). Proceedings of the 1997 IEEE International Conference on Image Processing (ICIP-97). Los Alamitos, CA: IEEE Computer Society. 167. 
Tschumperlé, D., and Deriche, R. (2002). Diffusion PDEs on vector-valued images. IEEE Signal Processing Magazine 19, 16–25. 168. Unser, M. (1999). Splines—a perfect fit for signal and image processing. IEEE Signal Processing Magazine 16, 22–38. 169. Wahba, G. (1990). Spline Models for Observational Data, Vol. 59. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM). 170. Webb, S., ed. (1988). The Physics of Medical Imaging. Bristol: Institute of Physics Publishing. 171. Weickert, J. (1994). Anisotropic diffusion filters for image processing based quality control, in [74]. 172. Weickert, J. (1998). Anisotropic Diffusion in Image Processing. Stuttgart: Teubner. 173. Weickert, J. (1999). On discontinuity-preserving optic flow, in [131], pp. 115–252. 174. Weickert, J., and Schnörr, Ch. (2001). Variational optic flow computation with a spatiotemporal smoothness constraint. J. Math. Imag. Vision 14, 245–255. 175. Werner, B., ed. (1998). Proceedings of the 1998 IEEE International Conference on Image Processing (ICIP-98). Los Alamitos, CA: IEEE Computer Society.
Index
A
Aberration coefficients, 366–367 Aberration correctors, 369 Accessibility relation, 110, 111 Achromatic axis, 171 Active set strategies, 508 Adaptive tomographic algorithm, 242 Adaptive tomography, 241–243 Affine invariant mean curvature flow equation, 498 α-cuts, 67–68 Ambrosio–Tortorelli approximation, 474 Angular data application examples, 153–169 dispersion, 138 Angular standard deviation, 129 Angular valued data, 127 Angular valued function, 149 Anisotropic diffusion, 522 Anisotropic noise reduction, 39 Annihilation operator, 249, 256, 257 Antinormal ordering, 211 Approximate proximity, 73 Approximation problem, 476 AR(1) process, 21, 23, 25 Archimedian t-norms, 72 Arctan function, 129 Asymptotic Tikhonov-Morozov method, 480–481, 491–492 numerical simulations, 484–493 Auger electron microprobe (AEM), 346 Auger electrons (AE), 346
Autocovariance function (ACF), 14–15, 19–20 Average length, 129 Axial data, 126 statistical measures, 129
B Back projection, 300 Backscattered electrons (BSE), 311, 343–349, 355, 358–361, 374, 379–380, 382–388, 393, 425 Backscattering coefficient, 351 Backward difference operator, 485 Baker–Campbell–Hausdorff (BCH) formula, 212, 216 Balanced homodyne detector, 215–218, 258 Banach space, 482 Band-pass filter, 265 Basis restriction error, 23 Beam splitter (BS), 278 Bench-mark problem, 510 Bernoulli convolution, 268, 274 Besov space, 514 regularization, 517 Bhattacharya distance, 93 Bilateral obstacle problem, 508 Binary equations, translating into fuzzy equations, 68 Binary relation, 184 Bingham filtering technique, 450, 461 Bingham fluid flow equation, 448 Bi-orthogonality condition, 230, 237–238 Bi-orthogonality relation, 239
Bloch electrons, 343 Bloch states, 323 Bloch waves, 350 Block diagonal transforms, 28 Block transforms, 2, 8–12, 39–41 Bosonic mode, 257 Bounded variation (BV) sampling method, 506 Bragg angles, 352 Brain, internal representation of space, 61 Brightness, definition, 170 Brightness function, 175 Brillouin zone, 323, 324 Brodatz textures, 165 B-splines, 477
C Calculus of variations, 493 Cathode lens (CL), 369–374, 383, 388, 389, 392, 393, 395–397, 400, 401, 406, 407, 409, 412, 417, 429 Cauchy principal value, 225, 300 CBED (convergent beam electron diffraction), 316, 352 Central limit theorem, 228–229, 233 Centroids, 13 Chroma, 177–178, 180 Chromatic aberration of electron lenses, 317 Circulant matrix, 18 Circular centered gradient, 138, 153 Circular centered morphology, 136–141 Circular centered top-hat, defect detection with, 163–164 Circular data definition, 124 distributions, 133
mathematical morphology applied to, 123–203 nature of, 125 representation of, 125 types, 126 Circular data processing, 126 rotational invariance in, 125 Circular statistics, 128–129 theory, 197 Circular variance, 129 Closing operators, 142 Codebook, 13 Cognition and spatial distances, 60–63 Coherent signals, tomography of, 277–281 Collection efficiency, 394 Color images, 124, 138, 140, 143, 145, 148, 169 See also Hue Color representations, 124 3D polar coordinate, 171–172 Color spaces 3D polar coordinate, 169–181 derivation of useful 3D polar coordinate, 173–178 existing 3D polar coordinate, 172–173 processing of 3D polar coordinate, 181–196 Color statistics, 181–182 Color top-hat, 194 Combined magnetic-electrostatic (compound) objective lens, 367–368 Complete lattice, 184 Completeness condition, 230 Complex lapped transform (CLT), 37 Computer vision, 446, 447 Computerized tomography (CT), 446 Concatenated signal, 45
Conceptual spaces, 62 Conditional order, 184, 186 Conditional probability, 518, 519 Confidence interval, 229 Connected component labels, 146 Connected partitions, 141, 199 Conservation of mass, 453 Constrained source coding, 13–14 Contact problem, 507, 510 Continuous-time signals and systems, 3–6 Convolution, 4 Coulomb potential, 330 Covariance matrix, 15–20, 22 eigenvalues of, 247 Crame´r–Rao bound, 289 Creation operator, 256 Crisp distances extending to fuzzy distances, 74 generalizing to fuzzy distance, 67–68 Crystal growth, 451 Crystallinity effects, 350–352 Cubic spline approximation, 474 Curvature-based evolution process, 455 Curvature-based morphological processes, 457 Cyclic closing, 143–145 on indexed partitions, 199–201 Cyclic opening, 145–151 Cyclic operators, 142
D Data compression, 13 Data set deformation, 452 Daubechies’ construction of orthonormal wavelets, 511–513 Deblurring, 463, 481–484 with scale space method, 481–484
533
Defect detection with circular centered top-hat, 163–164 with labeled opening, 164–166 in oriented textures, 163 Denoising, 463, 522 regularization models for, 465–466 Density matrix, 209, 243–244 of single-mode radiation field, 292 of spin systems, 293 of two-mode radiation field, 293 Depolarizing channel, 241 Detection quantum efficiency (DQE), 393 Detection theory, elements of, 209–222 Detector of secondary electrons, 318–319 Diagonal matrix, 17 Dielectric loss function, 328 Dielectric theory, 327 Diffraction aberration, 318 Diffusion filtering, 447, 517 applications, 458 and wavelet shrinkage, 514–517 Diffusion filtering method, 467 Diffusion filtering techniques, 496 Digital images, pixel values, 128 Digital signal processing (DSP) system, 8–9 Dilation, 131, 135–136 gradient by, 137 Dilation equation, 512 Dirac delta function, 225 Dirac delta impulse, 6 Dirac impulse, 4 Direct photodetection, 274 Discontinuity set, 473 Discrepancy principle, 477 Discrete bounded variation (BV) regularization, 500–510 sampling, 502–504
Discrete cosine transform (DCT), 2, 18, 21–23, 25–27, 41, 46 Discrete Fourier transform (DFT), 2, 8–12, 15, 18, 19, 22, 26, 27, 45, 46, 279 Discrete frequency coefficients, 8 Discrete-time convolution, 7, 8 Discrete-time Fourier transform (DTFT), 1, 7–9 Discrete-time LTI systems, 7 Discrete-time signals and systems, 6–8 Displacement operator, 270 Dissimilarity measure, 70 Distance between two fuzzy sets, 65, 85–103 accounting for spatial distances geometrical approach, 94–95 graph theoretic approach, 100–101 histogram of distances, 101–103 morphological approach, 95–99 tolerance-based approach, 99–100 comparison of membership functions functional approach, 86–88 information theoretic approach, 86–88 pattern recognition approach, 93 set theoretic approach, 89–92 Distance between two points in a fuzzy set, 64 Distance density, definition, 96 Distance from a point to a fuzzy set, 65 as a fuzzy number, 81–85 as a number, 80–81 Distance from set relationships, 69–70 Distance from similarity, 69
Distance information representation of, 70 spatial representations of, 104–107 Distance knowledge to a given object, spatial representation of, 105–107 Distance relationship between two objects, 103 or with respect to a given object, 67 Distances and linguistics, 70 in qualitative setting, 113–114 views on, 54–63 Donoho’s wavelet shrinkage algorithm, 517 Double granulometry, 149, 151, 152 Dual basis, 229
E E B filter, 384 Edge detection, 458, 462, 463, 516 Elastic mean free path (EMFP), 321–323 Elastic scattering, 319 differential cross-sections, 322 on nuclei, 320–322 Electric field, 249 Electromagnetic fields, screening against, 318 Electron backscattering, 345–349 Electron backscattering patterns (EBSP), 352 Electron crystallography, 350 Electron diffusion, 319 Electron–electron interaction, 318 Electron emission, 343–361 energy dependence, 337 Electron energy loss spectroscopy (EELS), 326, 327, 346
Electron lenses, chromatic aberration of, 317 Electron penetration, 331–334 Electron probe scanning, 334 Electron scattering, 319 simulation tools, 340–343 Electronic amplifier, 265 Electronic contrast in semiconductors, 426–430 Electrooptic modulator (EOM), 278 Electrostatic detector objective lens (EDOL), 367, 384, 387, 388 Electrostatic field strength, 366 Electrostatic immersion lens, 383 Electrostatic lens, 366 Electrostatic SEM, 313 Emission electron microscope (EEM), 312 Energy band structure, 323 Energy concentration, 16, 17 Energy gaps, 323–324 Energy spectrum of emitted electrons, 343 Entropy, 455 Entropy functions under similarity, 88 Erosion, 135–136 definition, 131 gradient by, 136, 137 Euclidean distance, 107, 194–195 Euclidean space, 129, 137 Euler equation, 498 Everhart–Thornley (ET) detector, 318–319, 389, 429 Extended lapped transform (ELT), 33
F Fast Fourier transform (FFT), 2, 11, 39, 45, 46
535
FEG SEM, 364, 365, 370, 371, 379 Fick’s law, 454 Fictitious photons tomography, 298, 299 Fidelity measurement, 275–276 Field amplitude, detection of, 249 Field intensity, direct measurement, 248 Filter sequence, 512 Finite-duration signal, 8 Finite volume bounded variation (BV) regularization, 504–505 First-order autoregressive (AR(1)), 20 Fisher information, 289 Fluid flow, 452, 455 Fock representation, 237 Fokker–Planck equations, 209, 210 Fourier integrals, 5 Fourier-optical systems, 1 Fourier transform, 1, 3–12, 124–125, 127, 216, 234, 241 amplitude, 153 Free particle, quantum estimation for, 239 Frequency coefficients, 42–43 Frequency-domain enhancement, 2 Fuzzification equations, 68 Fuzzification methods, 67–68 Fuzzy cognitive map framework, 62 Fuzzy dilation, 96, 98–99, 106, 107, 114 Fuzzy distance extending crisp distances to, 74 general principles for defining, 67–71 types and problems, 64–67 Fuzzy geodesic distance between two points in a fuzzy set, 78 defined as fuzzy number, 77–79 defined as number, 75–77
Fuzzy mathematical morphology, 92 Fuzzy nearest point distance, 98 Fuzzy set theory, 53 Fuzzy sets geodesic distances in, 75–79 semiquantitative or semiqualitative interpretation, 53 Fuzzy spatial distances, 51–122 Fuzzy structuring elements, 112
Geometrical configuration and space, 56 Geometrical symmetries, 301 Glauber formula, 235, 238 Gradient by dilation, 137 Gradient by erosion, 136, 137 Gram–Schmidt orthogonalization procedure, 232 Gray level modification, 452 Green’s formula, 467, 479 Group tomography, 238
G Gamma correction, 170 Gaussian convolution, 160, 212, 218, 234, 262, 274, 461–462 Gaussian distribution, 229, 262 Gaussian filter, 157, 158 Gaussian function, 157 Gaussian state estimation, 295–298 Gaussian Wigner functions, 296–297 General method of quantum tomography, 227–239 General tomographic method, 222–243 Generalized lightness, hue, and saturation (GLHS) model, 173 Generalized squeezed quadrature operators, 236 Generalized Wigner function, 208, 213, 221, 250, 251 Geocognostics framework, 62 Geodesic dilation, 79 Geodesic distance See also Fuzzy geodesic distance in fuzzy sets, 75–79 between two points in 2D space, 76 Geographic information systems (GIS), 62 Geometric phase image, 154–155
H Haar’s invariant measure, 238 Hadamard’s inequality, 17 Hadamard’s principle of wellposedness, 461 Harmonic oscillator systems, quantum estimation for, 232–235 Hausdorff distance, 66, 74, 85, 94, 95, 99, 102, 106, 113, 114 definition, 97 Hausdorff measure, 473 Heat equation, 447–448, 522 Heisenberg evolution, 297 Heisenberg uncertainty principle, 206 Hermite polynomial, 244, 247, 292 Heterodyne detection, 218–222, 273 and homodyne tomography, 253–255 High-resolution transmission electron microscope. See HRTEM images Hilbert distance, 303 Hilbert–Schmidt operator, 256 Hilbert space, 224, 244, 294, 301, 464, 481 Histogram of distances, 101–103
INDEX
HLS space, 131, 172–173, 177, 178 Homodyne data, 278 Homodyne detector, 234 balanced, 258 Homodyne probability distribution, 257, 260 Homodyne tomography, 207–209, 272 and heterodyne detection, 253–255 multimode, 255–265 observables, 243–246 of quantum operation, 286 as universal detector, 243–245 Homogeneous phase extraction in HRTEM images, 153–156 HRTEM images homogeneous phase extraction in, 153–156 Y-TZP, 153 HSV space, 172–173, 177 Hue, 138, 139, 143, 145, 148, 149, 189–190 saturation-weighted, 182, 191–193 Hue angle, 175 Hue mean, saturation-weighted, 182 Human perception and spatial distances, 59–60 Hypergeometric function, 275, 276, 303 Hyperspherical parameterization, 256
I Iconicity diagrammatic, 59 imagistic, 59 IHLS color space, lexicographical order in, 187–195 IHLS space, 125–126, 169, 178–180, 194–195, 197–198
537
inverse transformation to RGB space, 179–180 transformation to RGB space, 178–180 Ill-posedness, 461, 464, 479, 498–499 Image compression, 301 Image enhancement and restoration, 37–38 Image processing, 74, 446, 460 Image processing and analysis, 125, 126 Image reconstruction, 3, 300 Image restoration and enhancement, 39–41 Image segmentation, 458 Image smoothing, 447 Immersion objective lens (IOL), 313, 366, 382–384, 387, 394, 395, 399, 401, 406 Improved hue, luminance, and saturation space. See IHLS space Impulse response, 4, 20 Inclusion index, 91–92 Inclusion measure, 69 Indexed partition, 141–142, 145 cyclic closings on, 199–201 definition, 200 Inelastic mean free path (IMFP), 311, 319, 328, 329 Inelastic scattering on atoms, 328–331 on electrons, 324–328 Infimum, 131, 132, 137, 184–186, 198 Integro-differential equation, 484 Inverse Fourier transform, 6 Inverse problems, 446 ill-posedness, 446 regularization of, 460–471 scale space methods for, 478–493
538
INDEX
Inverse Radon transform, 207, 222, 225, 226, 298 Inverse scale space method, 488–490, 521 IRF, 377 Isotropic opening, 149 Iterative relative error regularization, 496–497 Iterative Tikhonov-Morozov method, 480
J JPEG algorithm, 2 JPEG compression, 3
K Karhunen–Loe`ve transform (KLT), 17, 18, 20–25 Kernel functions, 223, 227 Kripke’s semantics, 109
L Label boundary points, 146 Labeled angular image, 145, 146, 148 Labeled openings, 150 defect detection with, 164–166 Lagrange multiplier, 16, 290, 291, 476 Laguerre polynomial, 235, 258, 259, 280 Language. See Linguistics Lapped directional transform (LDT), 38–41 Lapped orthogonal transform (LOT), 3, 30–32, 42 basis functions, 33 coding gain, 32
definition, 31 extensions, 36–39 Lapped transforms, 2, 3, 28–39, 42 definition, 29 extension to, 29–30 LEED, 316, 317, 352, 403, 404, 406–407 Level curve, 453, 454 Level set modeling, 453–455 Lexicographical order, 184, 186 in IHLS color space, 187–195 Lightness, definition, 171 Linear anisotropic diffusion equation, 449 Linear anisotropic diffusion filtering, 449 Linear block transforms, 9 Linear ill-posed problems, 464 Linear inverse problems, 463 Linear statistical dependencies, 14 Linear system theory, 3–12 Linear time-invariant (LTI) systems, 4–8, 20, 41 Linear transforms, 14 Linguistics and distances, 70 and spatial distances, 57–59, 64 Local oscillators (LO), 215, 219, 265 Log-likelihood function, 288, 296 Longitudinal optical phonons, 328 Low-energy electron diffraction. See LEED Low-energy electron microscope (LEEM), 312, 351, 369, 422, 431 Low-pass filters, 265 Łukasiewicz t-conorm, 92 Łukasiewicz t-norm, 72 Luminance, 149, 187–189 calculation, 171 definition, 171 Lyapunov functionals, 455
M Magnetic pinhole lens, 401 Magnetic resonance (MR) image, 468–471, 494 Magnification correction factor, 412 Marginal order, 184, 185 Markov-I process, 20 Mathematical morphology, 108 applied to circular data, 123–203 choice of origin, 130–132 operations, 53 unit circle, 129–130 vectorial, 183–187 MATLAB, 458 Maximization problem, 291 Maximum likelihood estimator, 288 Maximum likelihood principle, 209 Maximum likelihood quantum state estimation, 289–294 Mean curvature motion, 451, 453, 463, 522 Mean direction, 128 Median filter, 510 Medical imaging, 446, 460 MEDOL (magnetic-electrostatic detector objective lens), 367, 387 Membership functions, 106–108 comparison of, 86–93 Membership values, numerical representation, 105 Mereotopology, 108 Microchannel plate (MCP), 387 Minimization models, 497–498 Minimizing element, 476 Minimizing function, 477 Minkowski difference, 97 Mobile robotics, 62 MOCASIM program, 343 Modal logics, 53 Modulated complex lapped transform (MCLT), 37
Modulated lapped transform (MLT), 3, 33–36, 37, 42 basis functions, 34 coding gain, 36 extensions, 36–39 Moments generating function, 257 Monte Carlo (MC) procedure, 341–342 Morphological center, 132–135 Morphological differential equations, 451 Morphological diffusion filtering, 455–457 Morphological gradients, 136–138 Morphological operators, 130, 157, 187, 197 set definitions, 109 Morphological partial differential equations, 452 Morphological segmentation of oriented textures, 161 Morphologics, 109–113 MOS (metal–oxide–semiconductor), 335 Mother wavelet, 512 Mott cross-sections, 321 α-cut, 76 Multidimensional discrete BV regularization, 506–508 Multidimensional scaling functions, 513 Multidimensional wavelets, 513 Multimode homodyne tomography, 255–265 Multiplicity numerical, 56 qualitative, 56 Mumford–Shah filtering, 472–474 Mumford–Shah functional, 474 Mumford–Shah segmentation, 473, 474
N Nd:YAG laser, 277, 285 Nearest point distance, 96 Neumann boundary data, 474 Neuroimaging, 61 No-cloning theorem, 206 Noise deconvolution, 234, 239–241 quantum tomography, 250 removal, 446 in tomographic measurements, 246–253 Noise ratio, 248, 250, 252–253 Nonconvex regularization models, 493–500 Nondegenerate optical parametric amplifier (NOPA), 265, 273, 276 Nondestructive evaluation, 446 Nondifferentiable regularization, 464 Nonlinear anisotropic diffusion, 451, 462, 466 Nonlinear BV-regularized reconstructions, 510 Nonlinear inverse problems, 464 Non-local approximations, 474 Normal ordering, 211, 214, 253 Null estimators, 231, 243
O Opening operators, 142 Optical domain, 285–287 Optical filter, 280, 281 Optimal scalar quantization, 14 Optimum angular aperture, 372 Orientation images, 163, 166 Orientation summary image, 159 Oriented textures, 156–166 defect detection in, 163 morphological segmentation, 161
Origin, choice of, 130–132, 153 Orthogonal linear transform, 22 Orthogonal polarizations, 261 Orthogonality relation, 220, 238
P p-axial circular data, 127 P-function, 211, 266–267 Parallel openings, 144–146 Partial differential equations (PDEs), 446, 447 Partial order, 184, 185 Partitions, 141–151 definition, 199 Pattern functions, 223 Pattern recognition, 74 Paul trap, 232 Pauli matrices, 238 PEEM, 378 Periodic wavelets, 514 Perona–Malik diffusion filtering, 451, 452, 466–468, 495, 522 Perona–Malik model, 498 Perona–Malik regularization, 493–494 Peters formulation, 131 Phase-squeezed state, 269 Photodetection, 213–215, 248 Photodiodes, 278, 281 Photoemission electron microscope (PEEM), 312 Photon number, 254, 259–260, 262–265, 271 detection, 273 distribution, 279 probability distribution, 266 Photon statistics, 277 Piecewise constant function and traces, 501 Plastic viscosity, 449
Polymer processing, 451 Positive operator-valued measure (POVM), 217–221, 232, 274, 289 Probability density functions, 518, 519 Probability distribution, 215, 216, 218, 236, 250, 251, 259, 262, 264 Projection postulate, 272 Proximity, perception of, 59–60 Pseudoclosing operator, 136 Pseudodilation, 132–136 Pseudoerosion, 132–136 Pseudoopening operator, 136 Pseudooperators, 136, 151 Pulse code modulation (PCM), 13
Q Q-function, 222 Qualitative distance in symbolic setting, 108–114 Quantitative measures in spatial reasoning, 60 Quantization index, 13 Quantum device, tomography of, 281–287 Quantum domain, extension to, 225–226 Quantum efficiency, 213, 214, 217, 221, 235, 245, 250, 254, 257, 262, 264, 271, 274 Quantum estimation for free particle, 239 for harmonic oscillator systems, 232–235 maximum likelihood method, 287–298 for spin systems, 237–239 Quantum hologram, 224
Quantum homodyne tomography, 226, 233, 270, 272, 304 experimental situations, 277 Quantum imaging, from classical imaging, 299–304 Quantum measurements, 265–281 of observables, 274 Quantum mechanics, 272, 281–282 Quantum operation, 282–285 homodyne tomography of, 286 Quantum optical phase, 251 Quantum optics, 206–207, 272 Quantum radiography, 208 Quantum standard reference, 287 Quantum state, 206, 223 maximum likelihood estimation, 289–294 nonclassicality measurement, 266–272 reconstruction, 279 two-mode field, 293 Quantum tomography, 205–308 aim, 227 applications, 265–281 basic statistics, 227–229 classical imaging by, 298–304 definition, 206 history, 208, 223–224 noise of, 250 overview, 206–209 Quorum, 227, 231–232 characterization, 229–232
R Radon transform, 207, 224, 232, 300, 464 inversion, 207, 222, 225, 226, 298 Radon transform-based imaging procedure, 223 Random variables, 518
Rao and Schunck algorithm, 157–161 Rayleigh criterion, 376, 377 Rayleigh distributed noise, 519 Rayleigh noise, 520 Reconstruction formula, 239 Reconstruction technique, 209 Reduced order, 184, 185 Reflection coefficient, 325 Reflection EELS (REELS), 346 Regularization inverse problems, 460–471 methods, 464 nonstationary, 482 numerical experiments, 468–471 parameters, 482 relative error, 494–500 and spline approximation, 474–478 and statistics, 517–520 Tikhonov, 464–465, 467, 469, 496, 519 Regularization functional, 473 Regularization models for denoising, 465–466 Reindexation, 145 Relative error regularization, 494–500, 496 Reproducing kernel, 230–231 Resemblance measure, 69 Retarding field principle, 369 RGB color image, 169 RGB color space, 169 RGB cube, 169 RGB rectangular coordinates, 171 RGB space, 173, 175, 189, 198 inverse transformation from IHLS, 179–180 transformation to IHLS space, 178–180 Rotational invariance in circular data processing, 125
Rotationally invariant cyclic openings, 146–151 Rotationally invariant operator, 125, 151
S Satisfiability measure, 69 Saturation, 187–189 calculation, 176–177 Saturation-weighted hue, 191–193 Saturation-weighted hue mean, 182 Scalar uniform quantizer, 43 Scale space, 455 definition, 458 Scale-space methods, 445–530 deblurring, 481–484 for inverse problems, 478–493 Scale-space theory, 458–459 Scanning electron microscope (SEM), 310 adaptation, 399–401 dedicated equipment, 401–407 specialized, 401 Scanning low-energy electron microscopy. See SLEEM Scanning transmission electron microscope. See STEM Scanwood System, 164 Schrödinger cat state, 267, 269 Schrödinger equation, 350, 351 Schrödinger kitten state, 282 Secondary electrons (SE), 343–345, 354–361, 374, 376, 379–380, 382–387, 393, 402, 403 Segmentation algorithm, 161 Semiconductor laser, 277 Semiconductors, electronic contrast in, 426–430 Semi-group theory, 459, 468 Semi-implicit time step, 498
Semi-infinite dimensional setting, 474 Semimetrics, 74 Semipseudometrics, 72, 74, 91, 99 Sensory conflicts between visual and nonvisual information, 61 Series closings, 142–144 Set relationships, distances from, 69–70 Set theoretic morphological operations, 109 Shapes, comparison of, 73 Sign language, 59 Signal blocks, 3 Signal processing, 460 Signal-to-noise ratios (SNR), 393 Signal transforms, 2 Similarity distances from, 69 entropy functions under, 88 Similarity relation, 71 Similitude measure, 69 SIMION 3D package, 383 Single-mode nonclassicality, 267–270 Single-mode radiation field, density matrix of, 292 Single-pole condenser lens (SPCL), 402 Single-pole objective lens (SPOL), 402 SLEEM, 309–443 above-surface electric field, 349 aims, 313 alignment and operation, 407–412 applications, 413–430 cathode lens, 369–374 coherence within primary beam spot, 353–354 combination with surface microanalysis, 405 contrast of crystal orientation, 422 critical energy mode, 418–419
detection and specimen-related issues, 381–399 detection strategies, 382–386 detectors, 387–393 diffraction contrast, 419–422 dynamics of charging process, 339 electronic contrast in semiconductors, 426–430 energy-band contrast, 430 extensions to conventional modes of operation, 314–316 first demonstration experiments, 313 formation of primary beam, 361–380 general characteristics of micrograph series, 415–416 heating and damage of specimen, 334–336 ideal dedicated instrument, 405–406 illumination coherence, 421 incorporation of retarding field, 366–369 instruments, 399–413 interaction of slow electrons with solids, 319–343 issues inherent to slow electron beams, 317–319 layered structures, 422–425 material contrast, 425–426 motivations to lower electron energy, 314–319 new opportunities, 316–317 overview, 310–314 pixel size, 374–377 practical issues, 410–413 primary beam trajectory inside objective and cathode lenses, 411 prospective application areas, 414–415
SLEEM (cont.) quantitative limits, 310 secondary electron emission, 354–361 signal composition, 393–394 specimen charging, 336–340 specimen surface, 394–397 specimen tilt, 397–399 spot size, 362–365 spurious effects, 317–319, 377–379 surface relief, 417 testing the resolution, 379–380 tilted specimen, 412 Sobolev space, 465, 478, 481 s-ordered Wigner functions, 210, 212 s-ordering, 209 Source coding, 13 Space 3D, 55 4D, 55 and geometrical configuration, 56 of operators, 282 and spatial concepts, philosophical thinking, 54–57 views on, 54–63 Spatial distances, 53 See also Distance between two fuzzy sets and cognition, 60–63 economic measures, 59 and human perception, 59–60 information as edge attribute, 100 and linguistics, 57–59, 64 measures of, 59 mental representation, 61 perceptual measures, 59 properties of distances and requirements for, 71–75 temporal measures, 59
Spatial environment, cognitive understanding, 60 Spatial expressions, meaning of, 58 Spatial fuzzy distances general consideration, 63–75 representation issues, 64 Spatial fuzzy sets, 63 as representation framework, 104–105 Spatial information, 52, 53 Spatial knowledge, 54, 56 Spatial measures, 59 Spatial metaphors, 58 Spatial ordering, 55 Spatial reasoning, 54, 109, 112 qualitative information in, 60 quantitative measures in, 60 Spatial relationships, 52, 54, 57, 115 Spatial representation of distance information, 104–107 of distance knowledge to a given object, 105–107 Spatial situations, describing, 58 Spectrograms, 124 Spin systems density matrix of, 293 quantum estimation for, 237–239 Spin tomography identity, 238 Spline approximation and regularization, 474–478 Square matrix, 29 Standard morphological gradient operator, 157 State reduction (SR), 272–276 Statistics and regularization, 517–520 STEM (scanning transmission electron microscope), 316, 352, 369 Stochastic interactions, 377–378
Stochastic models and Tikhonov-type regularization, 519 Stochastic regularization, 522 Structuring element, 130, 131, 132, 137, 144, 146, 147, 152 in morphologics, 109–113 origin, 138, 140 Supremum, 131, 132, 137, 184–186, 198 Symmetric gradient, 137 Symmetrical ordering, 211
T t-conorm, 92, 96, 103 t-conorm dual, 72 t-equivalence, 71 t-indistinguishability, 71, 74 t-norm, 92, 96, 102, 103 t-transitivity, 73 Taut string algorithm, 505–506, 508 TEG SEM, 363–365, 370, 371, 376, 379 Thomson–Whiddington law, 331 Tikhonov functional, 478, 480 Tikhonov–Morozov method, 483–484, 492 Tikhonov regularization, 464–465, 467, 469, 496, 519 and stochastic models, 519 Tilt-angle dependence, 359 Time-discrete diffusion filtering, 467 Time-domain aliasing cancellation (TDAC), 36 Toeplitz matrix, 15, 18, 22 Tomographic estimator, 223, 238, 249 Tomographic imaging, 224–226 Tomographic measurements, noise in, 246–253
Tomographic phase measurement, 251–253 Tomographic reconstruction, 229, 234, 264, 270, 278 Tomography See also Quantum tomography of coherent signals, 277–281 of quantum device, 281–287 Top-hat operator, 138–141, 153, 194 Total order, 184 Trace condition, 230–231 Transfer function, 20 Transfer matrix, 282 Transfer width, 353 Transform coder and decoder, 13–14 Transform coding, 2, 13–25 performance, 23–25 Transform coefficient, 15–17, 24–26, 28, 29, 36 Transform efficiency, 14–23 Transform matrix, 27, 28, 35, 42 Transform signals, 1–3 Transform tensor, 26 Transforms, role of, 13–14 Transmission electron microscope (TEM), 310, 369 Transport equation, 341 Triangular inequality, 77 Truncated Hilbert space dimension, 301 Tube method, 504, 505 Tversky definitions, 69 Twin-beam state, 263 Two-color images, 131 Two-dimensional MLT, 37–39 Two-dimensional transforms, 25–28 Two-LO tomography, 263 Two-mode field, quantum state of, 293 Two-mode nonclassicality, 270–272
Two-mode radiation field, density matrix of, 293 Two-mode tomography, 260–265 numerical results, 260–265 Type II phase-matched parametric amplifier, 261
U UHV, 428, 431 UHV SEM, 403 UHV SLEEM, 380–381, 404 Ultimate spotsize, 372 Ultrahigh-vacuum (UHV) devices, 318 Uniform quantization, 43 Uniform scalar quantization, 13 Unit circle, 126–128, 197 mathematical morphology, 129–130 Unitary operator, 261 Unitary transform matrix, 17 Unitary transformation, 262
V Vectorial data, 126–127 Vectorial mathematical morphology, 183–187 Vectorial orders, 183–186 VLEED (very-low-energy electron diffraction), 323
W Wavelet coefficients, 514, 516 Wavelet shrinkage, 510–517 denoising, 514–517 and diffusion filtering, 514–517 Wavelet spaces, 512 Well-posedness, 461 Weyl–Heisenberg group, 238 Weyl’s quantization procedure, 210 Wien condition, 384 Wien filter, 384, 386, 403 Wigner function, 207, 210–213, 222, 225–226, 267, 295, 297, 299, 301 expansion, 301 reconstruction, 279
X Xenon’s paradox, 58 X-ray photon, 330 X-ray tomography, 299
Y YAG:Ce3+ single-crystal scintillator, 388–390, 393, 400 Yield stress, 449 Y-TZP, HRTEM image of, 153
FIGURE A.1. Color example images. (a) Fruit image (with red regions outlined). (b) “The Virgin” by P. Serra in the St. Cugat monastery in Barcelona (size 352 × 334 pixels). (c) Subregion of Figure A.2(a). (d) Map image with the top half inverted. (e) Map image. (f) Cell image.
FIGURE A.2. Example of a cyclic closing. (a) Initial color image (size 441 × 297 pixels). (b) The color image after a cyclic closing of the hue by a square SE of size 10.
FIGURE A.3. (a) Four colors and their values of hue, luminance, and saturation. (b) Lizard image (size 544 × 360 pixels). (c) Morphological closing of image (b) using a lexicographical order with saturation at the first level.
FIGURE A.4. Slices through the conic and cylindrical HSV and HLS spaces: (a) conic HSV, (b) cylindrical HSV, (c) bi-conic HLS, (d) cylindrical HLS.
FIGURE A.5. Results of the color morphological operators.