ADVANCES IN IMAGING AND ELECTRON PHYSICS VOLUME 146
EDITOR-IN-CHIEF
PETER W. HAWKES CEMES-CNRS Toulouse, France
HONORARY ASSOCIATE EDITORS
TOM MULVEY BENJAMIN KAZAN
Advances in
Imaging and Electron Physics
EDITED BY
PETER W. HAWKES CEMES-CNRS Toulouse, France
VOLUME 146
AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
Academic Press is an imprint of Elsevier 525 B Street, Suite 1900, San Diego, California 92101-4495, USA 84 Theobald’s Road, London WC1X 8RR, UK ∞ This book is printed on acid-free paper.
Copyright © 2007, Elsevier Inc. All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the Publisher. The appearance of the code at the bottom of the first page of a chapter in this book indicates the Publisher’s consent that copies of the chapter may be made for personal or internal use of specific clients. This consent is given on the condition, however, that the copier pay the stated per copy fee through the Copyright Clearance Center, Inc. (www.copyright.com), for copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Law. This consent does not extend to other kinds of copying, such as copying for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. Copy fees for pre-2007 chapters are as shown on the title pages. If no fee code appears on the title page, the copy fee is the same as for current chapters. 1076-5670/2007 $35.00 Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, E-mail:
[email protected]. You may also complete your request on-line via the Elsevier homepage (http://elsevier.com), by selecting “Support & Contact” then “Copyright and Permission” and then “Obtaining Permissions.” For information on all Elsevier Academic Press publications visit our Web site at www.books.elsevier.com ISBN-13: 978-0-12-373908-7 ISBN-10: 0-12-373908-X PRINTED IN THE UNITED STATES OF AMERICA 07 08 09 10 9 8 7 6 5 4 3 2 1
CONTENTS
CONTRIBUTORS . . . vii
PREFACE . . . ix
FUTURE CONTRIBUTIONS . . . xi
Spiral Phase Microscopy
SEVERIN FÜRHAPTER, ALEXANDER JESACHER, CHRISTIAN MAURER, STEFAN BERNET, AND MONIKA RITSCH-MARTE
I. Introduction . . . 1
II. Isotropic Edge Enhancement with a Spiral Phase Filter . . . 5
III. Asymmetric Edge Enhancement Using a Modified Spiral Phase Filter . . . 18
IV. Rotating Shadow Effect . . . 23
V. Optically Thick Samples—Spiral Interferometry . . . 32
VI. Summary and Outlook . . . 46
References . . . 52
LULU Theory, Idempotent Stack Filters, and the Mathematics of Vision of Marr
CARL H. ROHWER AND MARCEL WILD
I. Introduction . . . 58
II. The LULU Framework for Image Analysis and Decomposition . . . 59
III. Vistas on Idempotency . . . 107
IV. Conclusion . . . 156
References . . . 159
Bayesian Information Geometry: Application to Prior Selection on Statistical Manifolds
HICHEM SNOUSSI
I. Introduction . . . 164
II. Differential Geometry Tools . . . 166
III. Statistical Geometric Learning . . . 176
IV. Prior Selection . . . 181
V. δ-Flat Families . . . 187
VI. Mixture of δ-Flat Families and Singularities . . . 192
VII. Examples . . . 195
VIII. Conclusion and Discussion . . . 203
References . . . 206

INDEX . . . 209
CONTRIBUTORS
Numbers in parentheses indicate the pages on which the authors’ contributions begin.
STEFAN BERNET (1), Division for Biomedical Physics, Innsbruck Medical University, A-6020 Innsbruck, Austria

SEVERIN FÜRHAPTER (1), Division for Biomedical Physics, Innsbruck Medical University, A-6020 Innsbruck, Austria

ALEXANDER JESACHER (1), Division for Biomedical Physics, Innsbruck Medical University, A-6020 Innsbruck, Austria

CHRISTIAN MAURER (1), Division for Biomedical Physics, Innsbruck Medical University, A-6020 Innsbruck, Austria

MONIKA RITSCH-MARTE (1), Division for Biomedical Physics, Innsbruck Medical University, A-6020 Innsbruck, Austria

CARL H. ROHWER (57), Department of Mathematics, University of Stellenbosch, Matieland 7602, South Africa

HICHEM SNOUSSI (163), ICD/LM2S, Charles Delaunay Institute, University of Technology of Troyes, 10010 Troyes, France

MARCEL WILD (57), Department of Mathematics, University of Stellenbosch, Matieland 7602, South Africa
PREFACE
The three subjects examined in this volume are taken from phase-contrast microscopy, the mathematics of vision, and the new field of information geometry.

The need to image phase objects in microscopes has given rise to many ingenious suggestions; the best known is that of Frits Zernike in the 1930s. A recent addition involves the use of a spiral phase filter in the Fourier plane of the instrument; this is the subject of the first chapter, by S. Fürhapter, A. Jesacher, C. Maurer, S. Bernet, and M. Ritsch-Marte, who have explored the technique in detail. Here, they explain how such a phase filter works and give numerous examples of its value in practice. The presentation is extremely clear and helpful and will, I am confident, help to bring the technique into wider use.

This is followed by a chapter inspired by the theory of vision, by C.H. Rohwer and M. Wild. It begins with a presentation of "LULU" theory, where L and U are operators built from the max and min operators. The second part is concerned with stack filters and their design, and includes sections on mathematical morphology and lattice stack filters. This is a broad-ranging article that contains many speculations as well as the formal theory.

The volume concludes with an account by H. Snoussi of information geometry, a relatively new field; the first attempts to employ it in information theory and physics were made in the mid-1980s. Here, information geometry is used for selecting the best priors in Bayesian learning structures. The author shows how this problem can be solved and makes a convincing case for using this new tool in related areas. This lucid presentation of a new subject will surely be much appreciated.

As always, I thank all the authors for contributing to the series and for the trouble they have taken to make their material accessible to a wide readership. Forthcoming contributions are listed in the following pages.

Peter W. Hawkes
FUTURE CONTRIBUTIONS
G. Abbate New developments in liquid-crystal–based photonic devices S. Ando Gradient operators and edge and corner detection P. Batson (special volume on aberration-corrected electron microscopy) Some applications of aberration-corrected electron microscopy C. Beeli Structure and microscopy of quasicrystals V.T. Binh and V. Semet Planar cold cathodes A.B. Bleloch (special volume on aberration-corrected electron microscopy) Aberration correction and the SuperSTEM project C. Bontus and T. Köhler Helical cone-beam tomography G. Borgefors Distance transforms Z. Bouchal Non-diffracting optical beams A. Buchau Boundary element or integral equation methods for static and time-dependent problems B. Buchberger Gröbner bases xi
F. Colonna and G. Easley The generalized discrete Radon transforms and their use in the ridgelet transform T. Cremer Neutron microscopy A.X. Falcão The image foresting transform R.G. Forbes Liquid metal ion sources C. Fredembach Eigenregions for image classification A. Gölzhäuser Recent advances in electron holography with point sources D. Greenfield and M. Monastyrskii Selected problems of computational charged particle optics M. Haider (special volume on aberration-corrected electron microscopy) Aberration correction in electron microscopy M.I. Herrera The development of electron microscopy in Spain N.S.T. Hirata Stack filter design M. Hÿtch, E. Snoeck, and F. Houdellier (special volume on aberrationcorrected electron microscopy) Aberration correction in practice K. Ishizuka Contrast transfer and crystal images J. Isenberg Imaging IR-techniques for the characterization of solar cells A. Jacobo Intracavity type II second-harmonic generation for image processing
K. Jensen Field-emission source mechanisms B. Kabius (special volume on aberration-corrected electron microscopy) Applications of aberration-corrected microscopes L. Kipp Photon sieves A. Kirkland and P.D. Nellist (special volume on aberration-corrected electron microscopy) Aberration-corrected electron microscopy G. Kögel Positron microscopy T. Kohashi Spin-polarized scanning electron microscopy O.L. Krivanek (special volume on aberration-corrected electron microscopy) Aberration correction and STEM R. Leitgeb Fourier domain and time domain optical coherence tomography B. Lencová Modern developments in electron optical calculations H. Lichte (vol. 150) New developments in electron holography W. Lodwick (vol. 147) Interval and fuzzy analysis: A unified approach L. Macaire, N. Vandenbroucke, and J.-G. Postaire Color spaces and segmentation M. Matsuya Calculation of aberration coefficients using Lie algebra S. McVitie Microscopy of magnetic specimens
S. Morfu and P. Marquié Nonlinear systems for image processing T. Nitta Back-propagation and complex-valued neurons M.A. O’Keefe Electron image simulation D. Oulton and H. Owens Colorimetric imaging N. Papamarkos and A. Kesidis The inverse Hough transform R.F.W. Pease (vol. 150) Miniaturization K.S. Pedersen, A. Lee, and M. Nielsen The scale-space properties of natural images S.J. Pennycook (special volume on aberration-corrected electron microscopy) Some applications of aberration-corrected electron microscopy I. Perfilieva Fuzzy transforms: A challenge to conventional transforms V. Randle Electron back-scatter diffraction E. Rau Energy analysers for electron microscopes E. Recami Superluminal solutions to wave equations J. Rodenburg (vol. 150) Ptychography and related diffractive imaging methods H. Rose (special volume on aberration-corrected electron microscopy) The history of aberration correction in electron microscopy
C.M. Parish and P.E. Russell Scanning cathodoluminescence microscopy G. Schmahl X-ray microscopy J. Serra (vol. 150) New aspects of mathematical morphology R. Shimizu, T. Ikuta, and Y. Takai Defocus image modulation processing in real time S. Shirai CRT gun design methods T. Soma Focus-deflection systems and their applications J.-L. Starck Independent component analysis: The sparsity revolution I. Talmon Study of complex fluids by transmission electron microscopy N. Taneka (special volume on aberration-corrected electron microscopy) Using aberration-corrected instruments G. Teschke and I. Daubechies Image restoration and wavelets M.E. Testorf and M. Fiddy Imaging from scattered electromagnetic fields, investigations into an unsolved problem M. Tonouchi Terahertz radiation imaging N.M. Towghi Ip norm optimal filters E. Twerdowski Defocused acoustic transmission microscopy
Y. Uchikawa Electron gun optics K. Urban (special volume on aberration-corrected electron microscopy) Aberration correction in practice C. Vachier-Mammar and F. Meyer Watersheds K. Vaeth and G. Rajeswaran Organic light-emitting arrays M. van Droogenbroeck and M. Buckley Anchors in mathematical morphology R. Withers Disorder, structured diffuse scattering, and local crystal chemistry Y. Zhu (special volume on aberration-corrected electron microscopy) Some applications of aberration-corrected electron microscopy
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 146
Spiral Phase Microscopy SEVERIN FÜRHAPTER, ALEXANDER JESACHER, CHRISTIAN MAURER, STEFAN BERNET, AND MONIKA RITSCH-MARTE Division for Biomedical Physics, Innsbruck Medical University, A-6020 Innsbruck, Austria
I. Introduction . . . 1
II. Isotropic Edge Enhancement with a Spiral Phase Filter . . . 5
   A. Experimental Realization . . . 9
      1. Coherent Illumination . . . 9
      2. Low-Coherence Illumination . . . 12
   B. Results . . . 13
      1. Images Obtained with Laser Illumination . . . 13
      2. Images Obtained with Partially Coherent Illumination . . . 15
III. Asymmetric Edge Enhancement Using a Modified Spiral Phase Filter . . . 18
   A. Influence of the Zeroth-Order Fourier Component . . . 18
      1. Experimental Realization of an Asymmetric Spiral Phase Filter . . . 20
   B. Results . . . 21
IV. Rotating Shadow Effect . . . 23
   A. Reconstruction of the Sample Topography Using a Series of Images . . . 25
   B. Results . . . 28
V. Optically Thick Samples—Spiral Interferometry . . . 32
   A. The Origin of Spiral Fringes . . . 35
   B. Demodulation of Spiral Interferograms . . . 39
      1. Single-Image Demodulation . . . 39
      2. Multi-Image Demodulation . . . 42
VI. Summary and Outlook . . . 46
Acknowledgments . . . 48
Appendix A. Details on the Spiral Kernel . . . 48
Appendix B. Details on the Vortex Filter Expansion . . . 50
Appendix C. Demodulation of Multiple Images . . . 51
References . . . 52
ISSN 1076-5670 • DOI: 10.1016/S1076-5670(06)46001-8

I. INTRODUCTION

The invention of the light microscope allowed a first glimpse into the world of micron- and smaller-sized objects that are otherwise not resolvable by the human eye. The first microscopes used the brightfield mode, in which a specimen is illuminated and the transmitted or reflected light is imaged by a microscope
objective. This method still plays an important role in microscopy. Because the human eye cannot recognize phase changes, a brightfield microscope is only suitable for specimens that show an amplitude contrast. An object is called an amplitude object if it absorbs part of the incoming light due to pigments within the sample. Because the majority of examined biological samples consist largely of water, they show poor contrast against the surrounding medium.

In fluorescence microscopy, biological cells are stained so that specific parts can be examined. The labeling of cells is a complex process that requires extensive preparation: the examiner must know in advance which parts of a sample are to be imaged, and a marker must be selected on this basis. In many cases, the dyes used are harmful and destroy the sample.

These shortcomings led to the development of a variety of microscopy methods whose aim is to enhance the contrast and to reveal parts of transparent specimens that are not visible in brightfield mode. Established methods in optical microscopy that solve this problem are, for instance, darkfield, phase contrast, differential interference contrast, Hoffman contrast, and Dodt contrast imaging.

In order to enhance contrast in light microscopy, the origin of the contrast must be understood (Born and Wolf, 1980). An excellent compendium describing the principles of contrast in microscopy is given, for example, in Microscopy Primer (2006), http://micro.magnet.fsu.edu/primer/techniques/contrast.html, and a general overview of imaging methods for living samples is given in Tadrous (2002) and Stephens and Allan (2003). When a microscopic sample is illuminated (e.g., by a white light source), some of the light passes through the sample without being absorbed or scattered. The remaining part of the light is diffracted by the sample and acquires a phase shift relative to the undiffracted light.
The microscope objective projects all light beams into the image plane, where the undiffracted light evolves into a plane wave. The diffracted light focuses at different positions in the image plane, where it interferes with the plane wave, resulting in an intensity image of the sample.

Darkfield microscopy is one method to increase image contrast. Here the zeroth order of the illumination beam is blocked, such that only light diffracted, refracted, or reflected at the specimen is coupled into the microscope objective, where it can contribute to the formation of the image. The result is an illuminated object in front of a dark background. The sample is illuminated by a hollow cone of light, which is blocked by a ring in the darkfield objective, or the illumination light completely misses the collecting lens of the objective (ultra-darkfield method). This method works well for objects with low contrast and is well suited to enhancing edges and contours. Since the direct illumination beam is blocked, and intensity is thus lost, this microscopy technique requires a bright light source.
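The darkfield principle just described can be sketched numerically. The following is an illustrative toy model (grid size and the 5% absorber are my own choices, not from the chapter): a weak amplitude object imaged through a 4f system in which the undiffracted zeroth order is blocked in the Fourier plane.

```python
import numpy as np

# Darkfield sketch: blocking the undiffracted zeroth order in the Fourier
# plane turns a weak absorber into a bright object on a dark background.
N = 256
y, x = np.mgrid[-N // 2:N // 2, -N // 2:N // 2]

# Weak amplitude object: a disc that absorbs 5% of the field amplitude.
field = np.ones((N, N), dtype=complex)
field[x**2 + y**2 < 20**2] = 0.95

# Fourier plane of the first lens; an opaque stop removes the DC component.
F = np.fft.fftshift(np.fft.fft2(field))
F[N // 2, N // 2] = 0.0

# Second lens: inverse transform back to the image plane.
image = np.abs(np.fft.ifft2(np.fft.ifftshift(F)))**2

background = image[0, 0]               # far from the object: nearly dark
object_center = image[N // 2, N // 2]  # carries the remaining light
```

In this model the object region ends up much brighter than the background, reproducing the "illuminated object in front of a dark background" described above.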
An object is called a phase object if it does not absorb light and only modifies the phase of the incoming light field. The following microscope techniques are based on converting phase differences (Barone-Nugent et al., 2002) into amplitude variations that are visible to the human eye.

Phase contrast microscopy (Zernike, 1934; Zernike, 1935; Zernike, 1955; Noda and Kawata, 1992; Barty et al., 1998; Paganin and Nugent, 1998; Liang et al., 2000; Bellair et al., 2004; Paganin et al., 2004), first introduced by Frits Zernike, images small differences in refractive index or thickness variations between parts of the cell. The original central phase contrast technique is based on a filter placed in a Fourier plane of the imaging pathway, creating a phase difference between the diffracted and undiffracted wavefronts. For small phase variations, Zernike could show (Zernike, 1942a; Zernike, 1942b) that there is a phase difference of a quarter wavelength between the diffracted and the undiffracted light field in a phase sample. This phase variation cannot be seen by the human eye, which is sensitive only to intensities. By shifting the phase of the undiffracted light by another quarter wavelength, these phase variations in the sample can be transformed into amplitude variations in the image plane. If the resulting phase shift between the diffracted and the undiffracted light is half a wavelength, the two light fields interfere destructively, and the method is called positive phase contrast; the specimen appears dark against a bright background. Conversely, if the diffracted and undiffracted light are in phase after the phase filter, the method is called negative phase contrast, and the resulting images show bright specimen details on a dark background. The success of his method earned Zernike the Nobel Prize in Physics in 1953. An advantage of this method is that living samples can be examined.
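Zernike's quarter-wave argument can be checked with a short numerical sketch (grid size and the 0.1 rad phase step are my own illustrative choices): a weak phase object t = exp(i*phi) ≈ 1 + i*phi is invisible in intensity, but shifting the undiffracted zeroth order by a quarter wavelength converts the phase variation into an amplitude variation of roughly 1 − 2*phi.

```python
import numpy as np

# Weak phase object: invisible in brightfield, visible after a quarter-wave
# shift of the undiffracted (zeroth-order) light.
N = 256
phi = np.zeros((N, N))
phi[96:160, 96:160] = 0.1             # weak phase step, 0.1 rad

field = np.exp(1j * phi)
brightfield = np.abs(field)**2         # perfectly uniform: nothing visible

F = np.fft.fft2(field)
F[0, 0] *= np.exp(-1j * np.pi / 2)     # quarter-wave shift of the DC term
image = np.abs(np.fft.ifft2(F))**2

# Positive phase contrast: the object appears dark on a bright background,
# with intensity close to 1 - 2*phi inside the object.
object_intensity = image[128, 128]
background_intensity = image[0, 0]
```

The sign of the quarter-wave shift selects positive versus negative phase contrast; the opposite sign would make the specimen appear bright on a dark background.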
A disadvantage is that "halo effects" (i.e., bright areas around dark objects or dark areas around bright objects) appear when thicker samples are analyzed.

Differential interference contrast (DIC) was introduced by Georges Nomarski (Nomarski, 1955; Padawer, 1968; Allen et al., 1969; Pluta, 1989; Cogswell et al., 1997; Van Munster et al., 1997; Preza, 2000; Franz and Kross, 2001; Arnison et al., 2004) and utilizes phase gradients in the sample for contrast. Linearly polarized light is passed through a first modified Wollaston (or Nomarski) prism, which splits the light into two parts with orthogonal polarizations. Behind the Wollaston prism, the two rays have a small shear in their directions, less than the optical resolution of the microscope. After passing the condenser, the light traverses the sample, and differences in refractive index or thickness affect each beam differently. Subsequently, the two beams are collected by the objective, recombined by a second Wollaston prism, and finally brought to interference behind a second polarizer. This procedure detects the phase difference between the sheared image
waves. The result is an image with a pseudo-three-dimensional (3D) relief. The image is not isotropic: only phase changes along one specific direction, determined by the orientation of the Nomarski prisms, are detected. As opposed to phase contrast imaging, there are no halo effects in DIC. A disadvantage of the method is that birefringent specimens cannot be examined; thus no plastic tissue culture containers can be used. This microscopy type is also not suited to determining the exact height of a sample, because it only displays gradients of the optical thickness. It also cannot distinguish between elevations and depressions: a 180-degree rotation of the sample apparently converts a "hill" into a "valley."

Hoffman modulation contrast (Hoffman and Gross, 1975; Hoffman, 1977), developed by Robert Hoffman, consists of a standard brightfield microscope into which an aperture slit before the condenser and a modulator in the Fourier plane are inserted. The modulator is a spatially varying optical amplitude filter that converts phase gradients into brightness variations; it is inserted into the rear focal plane of a microscope objective. The cross-section of the modulator contains three parallel zones: the first zone has a transmission of 1%, the middle zone passes 15%, and the third zone, which is the largest, has a transmission of 100%. The zones influence different gradient directions: zones 1 and 3 correspond to opposite gradient directions, whose intensity is either reduced to 1% or transmitted completely, respectively. The middle zone, through which the mainly undiffracted light passes, transmits 15% of the incoming light field. In the resulting image, one gradient direction appears nearly dark, whereas the opposite direction appears bright; image parts with smaller gradients appear gray, like the background of the sample.
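The pseudo-relief produced by the three-zone modulator can be illustrated with a 1D sketch (zone widths, the ramp slopes, and the grid are my own assumptions, following only the 1%/15%/100% transmission scheme described above): opposite phase gradients diffract to opposite sides of the Fourier-plane mask and are attenuated differently.

```python
import numpy as np

# Hoffman modulation contrast, 1D toy model: amplitude mask with three
# zones whose intensity transmissions are 1%, 15%, and 100%.
N = 256
k = np.arange(N) - N // 2

# Phase "hill": a ramp up followed by a ramp down, on a flat background.
phi = np.concatenate([np.zeros(96), np.linspace(0.0, 8.0, 32),
                      np.linspace(8.0, 0.0, 32), np.zeros(96)])
field = np.exp(1j * phi)

# Amplitude transmission is the square root of the intensity transmission.
mask = np.where(k < -2, np.sqrt(0.01),
                np.where(k <= 2, np.sqrt(0.15), 1.0))

F = np.fft.fftshift(np.fft.fft(field))
image = np.abs(np.fft.ifft(np.fft.ifftshift(F * mask)))**2

up = image[96:128].mean()      # positive gradient: passes the 100% zone
down = image[128:160].mean()   # negative gradient: attenuated by the 1% zone
```

In this model the rising slope of the "hill" appears bright and the falling slope dark, which is the relief-like asymmetry of the Hoffman image.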
In contrast to the phase contrast method, Hoffman modulation contrast affects only the light amplitude, not its phase. The technique produces a pseudo-relief of the sample because light from positive and negative gradients is attenuated differently. A disadvantage of this filter is that part of the overall image intensity is absorbed, so the total image intensity is decreased. Furthermore, one gradient direction is singled out (the method is not isotropic), and examining thicker samples results in images of lower quality.

A further phase-gradient-sensitive method is Dodt gradient contrast imaging (Dodt et al., 1998; Dodt et al., 1999; Dodt et al., 2003; Yasuda et al., 2004), introduced by Hans-Ulrich Dodt (Dodt, 1995, Patent No. W09529420, Microscopy Contrast Device). A quarter ring diaphragm placed close to the Fourier plane of the illumination pathway of the microscope creates an illumination gradient. The illumination wave is no longer homogeneous, and as a result some parts of the Fourier transform of the image are filtered out completely. To compensate for this effect, a diffuser is placed behind the
quarter ring diaphragm in the Fourier plane of the illumination beam path, such that no Fourier component is totally erased. The arrangement creates a relief-like impression of the microscopic sample, comparable to images obtained with DIC microscopy. In contrast to DIC, birefringent samples can be analyzed. One drawback of the method is that the total light intensity is not conserved, due to the filter configuration.

Another recently developed method to obtain phase-gradient contrast is called graded-field microscopy (Yi et al., 2006). Here an adjustable part of the illumination light and of the image wave is blocked. The result is a phase-gradient image comparable to DIC, together with a variable background suppression.
II. ISOTROPIC EDGE ENHANCEMENT WITH A SPIRAL PHASE FILTER

As described before, standard microscopy methods such as phase contrast microscopy (PCM) use a filter located in a Fourier plane (Steward, 1987) of the imaging pathway. The special property of a Fourier plane is that the Fourier components of the image field are spatially separated and can therefore be influenced individually. Our method uses a vortex filter, i.e., a Fourier filter that imprints a phase factor exp(iϕ) on the image field, where ϕ is the polar angle in a plane transverse to the light propagation direction. In earlier experiments (Larkin et al., 2001), this filter function was introduced as a two-dimensional (2D) generalization of the one-dimensional (1D) Hilbert transform (Lowenthal and Belvaux, 1967; Bracewell, 1978). Other groups have used the spiral phase transform for fringe analysis of interferograms (Villa et al., 2005) or for the analysis of speckle patterns (Wang et al., 2006). In our case, such a filter is realized by displaying the spiral phase filter on a high-resolution spatial light modulator.

The principle of this Fourier filter is shown in Figure 1. As illustrated, the sample is illuminated with a plane wave. Light is scattered into the different local phase or amplitude gradient directions; two example rays are sketched as green and orange arrows. Lens 1 is located a focal length behind the input image and creates a Fourier transform of the image in its right focal plane, where the Fourier filter, in our case a spiral phase plate, is located. The typical design of the spiral phase plate is displayed below. Gray values correspond to phase values in a range between 0 and 2π; this value is added as a phase offset to the incoming light. The major part of the light (marked by the black circle) is undiffracted and focuses on the central part of the Fourier filter, where no phase shift is added. Different gradient directions focus at different
F IGURE 1. Image filtering in the Fourier plane. The principle of a spiral phase filter in the Fourier plane of the optical imaging pathway is shown, illustrating a so-called 4f-setup. The illumination light is scattered into different directions according to the composition of the sample. Different gradient directions (two directions are indicated with green and orange color) are focused at different positions in the Fourier plane and there acquire a phase shift corresponding to their respective position. The resulting image of the sample is a coherent superposition of the undiffracted light with the remaining filtered part of the light field. (See Color Insert.)
positions in the Fourier plane and thus a different phase shift is added. Lens 2 is located a focal length away from the spiral phase plate and performs a reverse Fourier transform, which creates the filtered output image in its right focal plane. The undiffracted light is again a plane wave, which superimposes coherently with the remaining light field, which has a spatially dependent phase offset. Spiral phase plates such as shown in the lower part of Figure 1 have already been used for creating Laguerre–Gauss modes (Arlt et al., 1998) from incoming Gauss beams. These special modes are used, for example, as optical traps (Jesacher et al., 2004a) or for size-selective trapping of microscopic particles (Jesacher et al., 2004b). An illustration of such a setup is shown in Figure 2. A plane wave illuminates the spiral phase plate, and the Laguerre–Gauss mode is obtained by a Fourier transform performed by a lens. The same spiral phase plate that can be used to trap particles can also be used for image filtering. The convolution process of a single image point with the spiral phase filter is illustrated in Figure 3. As demonstrated, the convolution of a single image point with the spiral phase filter results in a doughnut mode in the image plane, which is thus the point spread function of the imaging system. An explanation for the principle of vortex filtering is shown in Figure 4: The left image contains a sample (indicated in green) of constant phase height
F IGURE 2. A conventional application of the spiral phase plate. The spiral phase plate is illuminated with a plane wave. A lens performs a Fourier transform of the image field. The result is a Laguerre–Gauss mode TEM∗01 , which can be used, for example, as an optical trap. (See Color Insert.)
F IGURE 3. Process of convolution with a spiral phase plate in the Fourier plane. Each point in the sample plane evolves to a doughnut in the image plane.
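The point-to-doughnut mapping of Figure 3 can be reproduced numerically. This is a sketch under my own gridding assumptions: a single bright point, multiplied by exp(iϕ) in the Fourier plane, becomes a doughnut with a dark central core in the image plane.

```python
import numpy as np

# Point spread function of spiral phase filtering: a point becomes a
# doughnut with a dark vortex core.
N = 256
point = np.zeros((N, N), dtype=complex)
point[N // 2, N // 2] = 1.0

ky, kx = np.mgrid[-N // 2:N // 2, -N // 2:N // 2]
spiral = np.exp(1j * np.arctan2(ky, kx))     # vortex filter exp(i*phi)

F = np.fft.fftshift(np.fft.fft2(point))
psf = np.abs(np.fft.ifft2(np.fft.ifftshift(F * spiral)))**2

core = psf[N // 2, N // 2]        # nearly dark: the vortex core
ring = psf[N // 2, N // 2 + 2]    # bright doughnut ring around it
```

The dark core arises because the vortex phase exp(iϕ) integrates to zero around any circle centered on the axis, so all Fourier components cancel on the optical axis.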
F IGURE 4. Visualization of the principle of the spiral phase filter. A sample (left image) of constant optical thickness (marked in green) is convoluted with the spiral phase filter. Every image point is convoluted with the Fourier transform of the filter (middle image). Different line thickness corresponds to different phase levels. After summation over all image points, every image point cancels out, except at edges or phase jumps. In the filtered image (right image), edges appear bright (here marked in red) against a dark background. (See Color Insert.)
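The cancellation described in Figure 4 can also be checked numerically. In this sketch the disc size, the phase step, and the handling of the singular central filter value are my own assumptions: the ideal vortex phase is undefined on the axis, so the central Fourier bin is set to zero here to model its annihilation.

```python
import numpy as np

# Vortex filtering of a disc of constant optical thickness: flat regions
# cancel almost completely; only the edge survives, in every direction.
N = 256
y, x = np.mgrid[-N // 2:N // 2, -N // 2:N // 2]

disc = (x**2 + y**2 < 60**2)
field = np.exp(1j * 0.5 * disc)               # phase step of 0.5 rad

spiral = np.exp(1j * np.arctan2(y, x))
spiral[N // 2, N // 2] = 0.0                  # singular on-axis value:
                                              # modeled as annihilated

F = np.fft.fftshift(np.fft.fft2(field))
filtered = np.abs(np.fft.ifft2(np.fft.ifftshift(F * spiral)))**2

inside = filtered[N // 2, N // 2]             # flat interior: dark
edge_right = filtered[N // 2, N // 2 + 60]    # rim at 0 degrees
edge_top = filtered[N // 2 - 60, N // 2]      # rim at 90 degrees
```

The rim is bright on all sides with nearly equal intensity, illustrating the isotropy of the edge enhancement, while the flat interior and exterior remain dark.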
F IGURE 5. Demonstration of isotropic edge enhancement. Image of an oil droplet between two coverslips in brightfield (left) and spiral phase contrast microscopy (right). The size of each image is 400 × 300 µm. With spiral phase contrast the total image intensity is conserved and redistributed to the edges.
that is spiral phase filtered. The spiral phase filter is placed in the Fourier plane of the imaging pathway; according to the convolution theorem, this spatial Fourier filtering corresponds to a convolution of the image field with the point spread function of the imaging system (i.e., with a doughnut field) (Reynolds et al., 1989). This is illustrated in the middle image: every image point is replaced by the Fourier transform of the filter, which is of the form r^(-2) exp(iϕ), weighted by the local field. As indicated, different phase levels are marked with doughnuts of different line thickness. A subsequent summation over all image points results in a complete cancellation of doughnuts in "flat" regions of constant phase. This is caused by the dependence of the vortex filter on the polar angle ϕ: every contribution to the convolution integral of one image point is canceled by a negative contribution of a point-symmetric counterpart. Phase jumps within the sample, on the other hand, do contribute to the filtered image; borderlines between regions of different phase are therefore highlighted by constructive interference. Since our vortex filter is rotationally symmetric, this amplification affects all edge directions (i.e., the procedure is isotropic). The edge enhancement is restricted to the immediate surroundings of an edge because of the factor r^(-2), which makes the intensity amplification fall off rapidly with increasing distance from the edge. In contrast to darkfield microscopy, the total image intensity is maintained, and the intensity distribution represents the intensity gradient of the input image. Fürhapter et al. (2005a) show that the vortex filter can reveal edges that are undetectable with phase contrast microscopy: in numerical simulations, a phase step of 0.25% of the optical wavelength creates 6% total image contrast with phase contrast microscopy, compared to 100% image contrast with spiral phase contrast microscopy.
An example of edge enhancement is presented in Figure 5. An
oil drop sandwiched between two coverslips is imaged with brightfield and with spiral phase contrast microscopy. Brightfield mode is realized in our setup by removing the spiral phase filter. Comparison of the two images shows that the entire background intensity of the brightfield image is redistributed to the phase edges of the spiral phase filtered image, resulting in a significant contrast enhancement.

A. Experimental Realization

Our Fourier filter is displayed on a high-resolution spatial light modulator (SLM) as a blazed phase hologram. Alternatives to the spatial light modulator are a static phase hologram (Heckenberg et al., 1992; Khonina et al., 1992) or a spiral phase plate (Jaroszewicz and Kolodziejczyk, 1993; Oron et al., 2001; Oemrawsingh et al., 2004). Earlier investigations of Fourier plane filtering (Ng et al., 2004) used vortex lenses (Davis et al., 2000; Swartzlander, Jr., 2001; Crabtree et al., 2004).

1. Coherent Illumination

A possible experimental setup using a laser as illumination source is shown in Figure 6. We use a 785-nm single-mode laser diode as illumination for experiments that require a higher degree of coherence. The collimated laser light passes through the specimen, which is placed on a 3D stage, and the transmitted light is captured with a microscope objective. Imaging with an objective, or more generally with a lens, creates a Fourier transform of the image in which all Fourier components of the image field are spatially separated and can be influenced individually. In our case, this Fourier plane is not directly accessible, as it is located inside the multilens part of the objective. Because we use an SLM to display the spiral phase hologram, a set of two lenses forms a telescope that projects this Fourier plane onto our reflective SLM. The two lenses are selected such that the size of the Fourier plane image fits the dimensions of the SLM. At the SLM the spiral phase hologram is displayed.
A sample hologram is illustrated in Figure 7. The forklike discontinuity in its center coincides with the zeroth Fourier order of the incoming light field, which contains most of the total image intensity. Another lens performs a Fourier backtransformation and projects the filtered image onto the charge-coupled device (CCD) camera. The CCD camera used is a TILL Imago with a frame grabber board. A sketch of a gray-value image representing the spiral phase filter is shown in Figure 7. This spiral phase filter is designed as an off-axis hologram in which a blazed grating is superposed on the vortex hologram. Due to this configuration of the hologram, only light diffracted into the first order is relevant for image filtering.
FÜRHAPTER ET AL .
FIGURE 6. Laboratory setup for coherent illumination. The sample is illuminated with collimated light emitted from a 785-nm laser diode. A 10× Olympus objective (type EA 10) with numerical aperture (NA) = 0.25 captures the transmitted light, which is afterward deflected at a mirror. The Fourier transform of the image is projected with lenses L2 and L3 at the reflective spatial light modulator (Holoeye LC-R 3000 with 1920 × 1200 pixels). There the blazed phase hologram is shown (rightmost part of the figure), which performs the Fourier filtering. The diffracted first order containing the filtered light is Fourier backtransformed with lens L4 and the image is recorded with a CCD camera. Focal length of lenses: f1 = 200 mm, f2 = 250 mm, f3 = 450 mm, and f4 = 250 mm.
FIGURE 7. A typical spiral phase hologram for off-axis holography. The hologram has a forklike discontinuity in its center that overlaps with the zeroth Fourier order of the incoming light field. The hologram is displayed at a reflective spatial light modulator where gray values correspond to a phase shift between 0 and 2π, which is imprinted on the incoming light wave. This spiral phase hologram is designed with a helicity of 1 and a grating constant of 50 pixels that corresponds to 500 µm on our spatial light modulator. Gray values appear as phase values on the liquid crystal display (LCD) screen.
Spatial light modulator. The central element of the experimental setup is a reflective SLM LC-R 3000 (Holoeye Photonics AG, Berlin–Adlershof, Germany) with 1920 × 1200 pixels and pixel dimensions of 9.4 × 9.4 µm². It is connected to a second graphics card output of a computer, where a clone of the first screen image is shown. However, the SLM converts the gray-level images displayed on the normal computer screen into phase-modulated images. Our holograms are designed as off-axis holograms, and light diffracted into the first order is used for imaging. A typical hologram has already been shown in Figure 7. First, the standard phase profile proportional to exp(iϕ) is calculated. In a second step, a phase term corresponding to exp(iGx x + iGy y) is multiplied with the spiral phase to create a blazed off-axis hologram that diffracts the incoming light into the direction G = (Gx, Gy). The phase angle of this calculation is determined modulo 2π, and gray values in the image appear as phase values between 0 and 2π on the SLM. This phase value is added to the phase of the incoming light wave. Since we use blazed phase holograms, only light diffracted into the first order is used for image processing. The reason for using off-axis holograms (despite the necessity of subsequent dispersion correction) is that the phase filtering can be done with greatly improved accuracy. Whereas in an on-axis setup the phase shift is introduced by the poorly controllable phase offset of the individual SLM pixels (which in the case of our SLM is limited to a range between 0 and 1.4π), the phase shift in a holographic off-axis setup is introduced by the local phase of the displayed grating structures. Due to the high spatial resolution of the SLM, this grating phase is controllable with very good accuracy. In this case, technical limitations of the SLM phase-shifting capabilities influence only the diffraction efficiency, but not the phase-filtering performance.
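The hologram computation described above (spiral phase plus linear grating phase, taken modulo 2π and mapped to gray levels) can be sketched in a few lines. This is a schematic illustration, not the authors' code; the 500-pixel array and the 50-pixel grating period are chosen to mirror Figure 7.

```python
import numpy as np

def fork_hologram(n=500, period=50, helicity=1):
    """Blazed off-axis spiral phase hologram as a gray-level image.

    phase = (helicity*phi + 2*pi*x/period) mod 2*pi, mapped to 0..255,
    producing the forklike dislocation of Figure 7.
    """
    y, x = np.mgrid[:n, :n]
    phi = np.arctan2(y - n // 2, x - n // 2)   # spiral phase exp(i*phi)
    grating = 2 * np.pi * x / period           # blazed carrier exp(i*Gx*x)
    total = np.mod(helicity * phi + grating, 2 * np.pi)
    return np.round(total / (2 * np.pi) * 255).astype(np.uint8)

holo = fork_hologram()

# The topological charge can be read back from the hologram: the phase
# winds by 2*pi*helicity around the central fork (on a closed loop the
# single-valued grating phase contributes nothing).
phase = holo.astype(float) / 255 * 2 * np.pi
t = np.linspace(0, 2 * np.pi, 400, endpoint=False)
ys = (250 + 8 * np.sin(t)).astype(int)
xs = (250 + 8 * np.cos(t)).astype(int)
loop = phase[ys, xs]
steps = np.angle(np.exp(1j * np.diff(np.append(loop, loop[0]))))
winding = steps.sum() / (2 * np.pi)
print(winding)   # close to the helicity, i.e. ~1
```

The winding check also illustrates why the fork sits exactly at the zeroth Fourier order: the phase singularity is the only point where the local grating phase is undefined.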
Using an SLM as a phase filter in a Fourier plane of the imaging pathway offers the great advantage that different microscope technologies can be “emulated.” Most microscopy techniques can be realized by the choice of an appropriate image sent to the SLM. This means that one can switch between several microscope modalities with the same hardware components. A major advantage is that any Fourier filter can be displayed on the SLM: • A pure brightfield microscope is realized with a normal blazed grating. • By replacing the central area with a structure that diffracts light into another spatial direction, the zeroth Fourier order is discarded, and a darkfield setup can be emulated. • Replacing the central area with a π/2-phase shifted blazed grating with the same grating constant as the remaining part of the filter results in a phase contrast filter. • Discarding one half of the Fourier components results in unidirectional edge enhancement.
• A DIC setup can be realized by a blazed grating where the two halves in the Fourier plane acquire a different phase shift, which defines the shear factor.
• Finally, a spiral phase contrast setup can be realized by displaying the spiral phase hologram with a superposed blazed grating.
Any other combination of these microscopy methods is conceivable. Manipulating the zeroth Fourier order requires knowledge of the dimensions of the zero-order Fourier spot, which can be calculated or experimentally optimized. The influence of the zeroth Fourier order of the image wave and its consequences are explained in more detail in Sections III and IV.

2. Low-Coherence Illumination

For some imaging experiments, such as interferometric setups where a high degree of coherence is necessary, only lasers can be used as an illumination source. In most cases, however, the high degree of coherence is disturbing: laser light illumination causes speckles or interference fringes that contain no additional information and only lower the image quality. To minimize these effects, an illumination source with a lower degree of coherence is used. For this purpose, a modified setup including a white light source can be used. The alternative setup, using a 100-W tungsten halogen lamp as light source, is shown in Figure 8. The light source is coupled into a fiber with a 400-µm core diameter. The output from the fiber is collimated and then illuminates the sample, which is mounted on a manual 3D stage. The light that is transmitted through the specimen is captured by the microscope objective. A telescope realized by a set of two lenses projects the Fourier transform of the image wave into the plane where the SLM is located. The magnification of the telescope is chosen such that the size of the Fourier image matches the dimensions of the active area of the SLM.
White light is problematic because the diffraction process at the spiral phase hologram introduces a strong dispersion, which leads to a blurred image. To compensate for the dispersion, we introduce a complementary diffraction process at another blazed grating that is displayed on the same SLM in a window adjacent to that of the spiral phase filter. Technically this is realized by projecting the spiral phase–filtered image through a lens L5 to a mirror. From there the image is backprojected through the same lens L5 to the second window of the SLM. There a second diffraction process at a plain blazed grating (without spiral phase filtering) compensates for the dispersion introduced by the first diffraction process, before the dispersion-corrected image wave is guided by a further mirror M3 through an imaging lens L6 to the CCD camera.
FIGURE 8. Low coherence illumination. A halogen bulb is used as a white light source. In contrast to the setup used in Figure 6, the frequency bandwidth of the light source requires a correction of the occurring dispersion. Therefore the area of the SLM is split into two halves. The first half displays the spiral phase filter, which diffracts the filtered light into its first order; the second half is filled with a "normal" blazed grating that compensates for the dispersion effects. The first diffraction order contains the dispersion-compensated light, which is then imaged at a CCD camera. Focal length of lenses: f1 = 100 mm, f2 = 110 mm, f3 = 200 mm, f4 = 150 mm, f5 = 400 mm, and f6 = 200 mm. (See Color Insert.)
B. Results

1. Images Obtained with Laser Illumination

The following images are presented to demonstrate the efficiency of the spiral phase filter and to discuss some aspects of a laser as illumination source. The images show oil droplets in water captured with the setup of Figure 6 that uses a coherent illumination. The filtered image was captured with an Olympus EA10 10×, NA = 0.25 microscope objective. This sample is suitable for a quality check of the spiral phase filter, as it mainly consists of phase structures that result in a poor contrast when viewed with a brightfield microscope.
FIGURE 9. Left: brightfield illumination. Oil droplets in water viewed with a brightfield setup. The total size of the image is 660 × 500 µm. Right: the corresponding histogram reflects the poor contrast for a nearly pure phase sample.
FIGURE 10. Left: spiral phase–filtered image. The same specimen as in Figure 9, but now viewed with a spiral phase contrast filter. The oil droplets can be well observed, and the background appears nearly dark. Right: the corresponding histogram. Two widely separated peaks are observable that correspond to the dark and bright areas within the image.
The left image of Figure 9 shows this specimen as it is observed in brightfield mode. No spiral phase filter was used; instead, a normal blazed grating was shown on the SLM to emulate a brightfield microscope. The sample appears almost transparent. A typical speckle pattern and interference fringes resulting from the high coherence are observable; both worsen the image quality. The right image of Figure 9 shows the corresponding histogram of occurrences of the different gray values in the image. There is only one broad peak, and many gray values appear nearly uniformly distributed. Such a histogram corresponds to poor contrast, where details cannot be distinguished. In comparison, Figure 10 shows the same specimen imaged with a spiral phase filter. The result is an image with isotropic edge enhancement and conservation of total image intensity. The speckles and interference fringes are still visible, but the edges of the oil droplets appear very bright around a
FIGURE 11. Brightfield image of a human cheek cell. The upper right corner shows the hologram used.
FIGURE 12. The same cell viewed with an emulated darkfield microscope.
nearly dark background. The corresponding histogram is shown in the right image of Figure 10. In contrast to the brightfield image, two peaks are identifiable in Figure 10: a large and broad peak coinciding with the darker gray values that dominate the image, and a second small peak representing the bright structures within the image. The two peaks are widely separated within the histogram, which illustrates the high contrast achievable with a spiral phase filter.

2. Images Obtained with Partially Coherent Illumination

For images with a more homogeneous background and without disturbing speckles, we have developed the white-light setup of Figure 8. The following images were taken of a human cheek cell. These specimens consist mainly of water and show almost no contrast when imaged with a brightfield microscope, which makes them an excellent sample for testing the contrast resolution of microscopic setups. We performed our tests on this sample by emulating four different microscopic modalities. All images were obtained with the same setup and differ only in the Fourier filter displayed on the SLM. An Olympus UPlanFL 60×, NA = 1.25 oil immersion microscope objective was used. Image acquisition time was typically 2 seconds. The sample size is approximately 50 × 50 µm². Figure 11 shows the human cheek cell imaged with the emulated brightfield microscope. The upper right corner shows the hologram used for imaging. As predicted, nearly the entire structure within the sample remains hidden. Figure 12 presents the same cell with an emulated darkfield microscope. The filter consists of a blazed grating where the central area diffracts the light in a different direction, so that the zeroth Fourier order is discarded. The background of the image appears dark, and parts of the cell (e.g., its nucleus)
FIGURE 13. The cheek cell as it appears with an emulated phase contrast microscope.
FIGURE 14. The cheek cell imaged with the spiral phase contrast filter.
appear bright. The zeroth Fourier order contains most of the light intensity; using a darkfield Fourier filter therefore results in a loss of total image intensity. The result of an emulated phase contrast method is shown in Figure 13. Here the Fourier filter on the SLM is a simple blazed grating with a second blazed grating overlapped in the center of the hologram. The central grating has the same spatial frequency as the rest of the hologram and diffracts the light into the same direction, but it is π/2 phase-shifted relative to the remaining filter, which is the condition for phase contrast filtering. The background of the image is not as dark as in the darkfield method, and more details of the specimen can be recognized. Finally, Figure 14 displays the same cheek cell imaged with the spiral phase contrast filter. All edges appear bright, whereas the background is nearly dark, and many details can be recognized. The image has a relieflike appearance, and the total image intensity is conserved. The resulting image shows that the coherence of the illumination source is high enough to perform these filter tasks without disturbing speckles all over the image. The next sample is a mica fragment placed in immersion oil, which decreases the difference in refractive index between the sample and its surroundings and makes it a difficult specimen for standard brightfield microscopy. Again the setup of Figure 8 was used, with different Fourier filters emulating the various microscope modalities. The objective used is a Zeiss A-Plan 20×, NA = 0.45, and the exposure time is 800 ms. The image size is ≈300 × 310 µm². All images were taken with the same parameters; illumination and camera settings were not changed. Figure 15 shows the mica fragment as it appears in a brightfield microscope. As expected, the specimen is poorly visible; this method is inadequate for imaging such a sample.
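The energy budget behind these comparisons (darkfield discards the dominant zeroth order, the spiral filter conserves everything) can be checked numerically. A minimal sketch, assuming the simplest possible filters: a unit-modulus spiral mask and a darkfield mask that merely zeroes the DC bin.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 128
field = 0.8 + 0.2 * rng.random((n, n))           # weak amplitude sample

ky, kx = np.meshgrid(np.fft.fftfreq(n), np.fft.fftfreq(n), indexing="ij")
spiral = np.exp(1j * np.arctan2(ky, kx))          # unit modulus everywhere
darkfield = np.ones((n, n), dtype=complex)
darkfield[0, 0] = 0                               # discard the zeroth order

def apply_filter(filt, f):
    """Fourier-plane filtering: multiply the spectrum by the mask."""
    return np.fft.ifft2(np.fft.fft2(f) * filt)

e_in = (np.abs(field) ** 2).sum()
e_spiral = (np.abs(apply_filter(spiral, field)) ** 2).sum()
e_dark = (np.abs(apply_filter(darkfield, field)) ** 2).sum()

print(np.isclose(e_spiral, e_in))   # spiral filter conserves total intensity
print(e_dark < 0.05 * e_in)         # darkfield loses most of the intensity
```

For a nearly transparent sample (mean transmission close to 1), the zeroth order carries almost all of the energy, which is why the emulated darkfield images are so much dimmer than the spiral phase–filtered ones.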
An increase in contrast can be achieved by emulating a darkfield microscope, as shown in Figure 16. The background appears slightly
FIGURE 15. A mica fragment in immersion oil imaged with an emulated brightfield microscope.
FIGURE 16. The same sample as in Figure 15 imaged with an emulated darkfield microscope.
FIGURE 17. Imaging with the spiral phase filter. Now the structures in the middle of the field of view become visible.
darker, and some details of the specimen can be recognized. Figure 17 shows the same mica fragment imaged with the spiral phase contrast filter. Many details within the sample can be recognized, and the edges appear bright in comparison to the surroundings. The overall intensity is much higher than for the darkfield setup. All images shown are single-shot recordings, and no further digital postprocessing was performed. In comparison to the images obtained with the laser illumination, the background is more uniform, but as the results show, the coherence is still high enough to perform the mentioned filter tasks.
Very small phase jumps, smaller than a few percent of the wavelength used, are sufficient to obtain a high contrast with a spiral phase contrast filter. With the chosen symmetric layout of the spiral phase contrast filter, the edge enhancement is isotropic and the total image intensity is conserved; the background light is redistributed to the edges of the image. A disadvantage of the white light setup lies in its relatively small total light transmission due to losses at the two diffraction steps. This increases the necessary exposure time. Nevertheless, with a sufficiently bright illumination source this does not impose a serious restriction. A second drawback of the off-axis method is the limited field of view; the spatial separation of the different diffraction orders requires a confinement of the image field by field apertures. Using an objective with a low magnification results in a larger field of view, but the limited numerical aperture also limits the resolution. In all cases, the optical elements must be chosen such that a maximum fraction of the Fourier components is imaged at the SLM. Many of these disadvantages may be avoided in the future by introducing a high-quality, transmissive on-axis spiral phase plate directly in the aperture plane of a specially designed microscope objective. Such an implementation would not reduce the transmission of the objective, thus giving maximal brightness. Since such an on-axis filter is nondispersive, no additional dispersion correction would be required, and the field of view would not be limited.
III. ASYMMETRIC EDGE ENHANCEMENT USING A MODIFIED SPIRAL PHASE FILTER

Using the spiral phase filter results in an isotropic enhancement of edges caused by phase or amplitude gradients within a microscopic sample. Compared to darkfield (using a Fourier filter with an absorptive center) or Dodt microscopy, no absorptive Fourier filter is used. Therefore the total image intensity is conserved but isotropically redistributed to the edges within the sample. However, we will demonstrate that a slightly modified spiral phase filter can introduce a symmetry breaking, resulting in relieflike shadow images of amplitude or phase samples.

A. Influence of the Zeroth-Order Fourier Component

The principle of a spiral phase filter is based on the Hilbert transform (Bracewell, 1978; Lohmann et al., 1997; Arnison et al., 2000; Davis et al., 2000). In the 1D case, it is defined as the convolution of a function with −1/(πx)
(for x ≠ 0). In Fourier space, this corresponds to a multiplication with the Fourier filter i sign(x). The signum function is defined as:

sign(x) = −1 for x < 0, 0 for x = 0, +1 for x > 0.   (1)

The Hilbert transform changes the symmetry of the function. In a polar coordinate system with radius r and angle ϕ, a generalized 2D Hilbert transform can be introduced by applying exp(iϕ) as Fourier filter, which corresponds to "a signum function along each radial direction defined by an angle ϕ":

H{f(r, ϕ)} = F⁻¹{F{f(r, ϕ)} exp(iϕ)}.   (2)

This is equivalent to a convolution with the kernel kH (Larkin et al., 2001):

kH(r, ϕ) = i exp(iϕ)/(2πr²) for r ≠ 0, kH(0, ϕ) = 0,
H{f(r, ϕ)} = f(r, ϕ) ∗ i exp(iϕ)/(2πr²).   (3)

Choosing the kernel to be absorptive in the center, i.e., kH(r = 0, ϕ) = 0, results in an isotropic convolution leading to isotropic edge enhancement, but other choices are possible. In this case, the edge enhancement by the spiral phase transform is isotropic. For some purposes it can be useful to break this circular symmetry. A microscopic sample examined with DIC shows enhanced edges along one direction. This results in a pseudo-relief that microscopists sometimes find helpful in recognizing more details of the specimens. However, DIC does not work for birefringent samples or with plastic coverslips, which are widely used in microscopy. Our spiral phase filter is not subject to this limitation and nevertheless produces images comparable to DIC microscopy. A symmetry break can be introduced in spiral phase contrast imaging by simply using a spiral phase plate with a transmissive central point. Actually this central point is of utmost importance, since in spatial filtering applications the zeroth Fourier order of the image wave, containing the major part of the image intensity, passes through this point.
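The 1D definition above can be checked numerically. A small sketch (with NumPy's default FFT convention; sign conventions for the Hilbert transform vary between texts): applying the filter i sign(k), which corresponds to convolution with −1/(πx), maps a cosine to minus a sine.

```python
import numpy as np

N = 256
x = np.arange(N)
f = np.cos(2 * np.pi * 5 * x / N)          # test signal, 5 full cycles

# Fourier filter i*sign(k), i.e. convolution with -1/(pi*x):
freqs = np.fft.fftfreq(N)
g = np.fft.ifft(np.fft.fft(f) * 1j * np.sign(freqs)).real

# With this sign convention the Hilbert transform of cos is -sin:
print(np.allclose(g, -np.sin(2 * np.pi * 5 * x / N)))
```

The DC bin is multiplied by sign(0) = 0, which is the 1D analog of the absorptive central point of the 2D spiral kernel discussed next.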
In the image plane, this pointlike source evolves into a plane wave that interferes with the remaining filtered wave to create the actual image of the sample. In the case of a spiral phase-filtered wave, the phase of the image field depends on the geometrical direction ϕgrad of the gradients within the sample, due to the
asymmetry introduced by the 2D Hilbert transform. Thus the interference with the plane wave evolving from the zero-order Fourier component leads to an amplification or suppression of the image intensity at gradient positions within the sample, depending on their geometrical directions ϕgrad and the offset phase θ of the zero-order Fourier component. The interference term ∝ exp[i(ϕgrad − θ)] that defines the direction of maximal edge amplification itself depends on the polar angle ϕ, which explains why the shadow effect is asymmetric. Such a filter (dependent on the angle ϕ) forms an image that appears to be illuminated from a certain direction, resulting in a shadow effect; that is, it forms a pseudorelief of the analyzed microscopic sample. Another consequence of this effect is that the shadow directions of amplitude and phase samples differ by π/2, which makes it possible to distinguish between amplitude and phase structures within an image. In contrast to amplitude samples, image waves from phase samples show a phase difference of π/2 between the zeroth- and higher-order Fourier terms (Reynolds et al., 1989); compare also the discussion of Eq. (20). The π/2 offset influences the resulting interference term exp[i(ϕgrad − θ)] and finally yields a π/2 difference in the shadow orientations. We have demonstrated this fact both in simulations and in experiments (Jesacher et al., 2005).

1. Experimental Realization of an Asymmetric Spiral Phase Filter

Our setup in Figure 8 must be modified slightly to realize this shadow effect. The adapted spiral phase filter is shown in Figure 18. The central part of the filter is substituted by a circular area with a "simple" blazed grating. This blazed grating diffracts the incoming zeroth Fourier order into exactly the same direction as the remaining part of the spiral phase hologram.
By varying the phase of this central part between 0 and 2π, the interference angle θ can be varied, which defines a shadow direction between 0 and 360 degrees. In an ideal setup, the diameter of this central Fourier spot is diffraction limited. In our real setup, its size is chosen to fit the dimensions of the focused light source, which is on the order of a few hundred microns. Figure 19 shows the modified setup. It corresponds basically to the white light setup described before (Figure 8), with the difference that the modified spiral phase hologram (Figure 18) is used. As noted, in an ideal setup the zeroth Fourier order consists of only one infinitely small spot. Here the diameter D of the zeroth Fourier order is determined by the diameter of the fiber core Df, the focal length fc of the collimation lens in front of the fiber, the focal length fobj of the microscope objective, and the magnification factor m determined by the lenses L3 and L4. The size of D can be calculated as D = Df m fobj/fc and is on the order of a few hundred µm. For a white light
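The spot-size formula can be evaluated with representative numbers. All values below are illustrative assumptions (the chapter does not tabulate fc or m for this estimate); they are chosen only to reproduce the stated order of magnitude.

```python
# Zeroth-order spot diameter D = Df * m * fobj / fc  (all lengths in mm).
# Illustrative values (assumptions, not taken from the setup tables):
Df = 0.400    # fiber core diameter, 400 um
fc = 30.0     # collimation lens focal length
fobj = 9.0    # microscope objective focal length
m = 2.25      # telescope magnification (lenses L3, L4)

D = Df * m * fobj / fc
print(D * 1000)   # spot diameter in um, approximately 270
```

With these numbers the central grating disk of the modified hologram would have to cover a region of roughly 270 µm on the SLM, consistent with "a few hundred µm."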
FIGURE 18. Modified hologram (not to scale): A spiral phase filter with a blazed grating in its center that has the same grating constant as the remaining part of the hologram. The size of the central area is chosen such that it corresponds to the diameter of the focused light source. Here it is enlarged for explanation purposes only.
source, the dispersion must again be compensated with a second blazed grating with the same grating constant as the first hologram. This is done as described before (Figure 8).

B. Results

The following examples demonstrate the effect of an asymmetric edge enhancement. Figure 20 shows a set of human cheek cells imaged with a Zeiss A-Plan 20×, NA = 0.45 microscope objective and filtered with a modified spiral phase hologram. The image exposure time is 3200 ms, and the full resolution of the camera is used. The only parameter that differs between the two snapshots is the phase of the central blazed grating, which is shifted by 2π/3. To demonstrate an even finer resolution with a very shallow phase sample, Figure 21 shows PtK2 cells, which are a standard sample for testing the quality of a DIC or phase contrast microscope. Because the imaging beam consists only of light diffracted into the first order, only a limited aperture can be used. Therefore this image is assembled from a set of 19 smaller images, each with an exposure time of 4800 ms and the highest resolution of the camera. In a postprocessing step, a brightfield image was subtracted from the spiral phase–filtered image. The limitations on the field of view could be overcome by using an on-axis spiral phase plate (Oemrawsingh et al., 2004) placed in the Fourier plane located in the back aperture plane of the microscope objective. This would
FIGURE 19. Setup used for shadow effects. The main change compared to Figure 8 is the modified spiral phase filter with an adjustable central area, introduced and explained in Figure 18. (See Color Insert.)
be an easy-to-implement solution, requiring no spatial light modulator. Using an on-axis filter without a superposed blazed grating would also simplify the light pathway, as no dispersion would occur and no double filtering would be necessary. Furthermore, the simplification of the imaging pathway would reduce the necessary exposure time. Another way to decrease the exposure time is to use a white light source with higher power density (e.g., an arc lamp). To date, we use a 100-W tungsten halogen lamp with a filament size of 2.3 × 4.2 mm², which results in a power density of ≈10 W/mm². The results shown so far were all achieved for shallow microscopic samples with optical thickness below the illumination wavelength. The effects of "thicker" samples on the filtered image are discussed in Section V. The following section describes methods to increase the resolution of images and some postprocessing methods.
FIGURE 20. Left: shadow-effect image obtained by filtering with a modified spiral phase hologram; a set of human cheek cells that seem to be illuminated from a certain direction. The size of the image is 190 × 200 µm². Right: the same cheek cells as in the left figure, but now the phase of the central blazed grating of the spiral phase filter is shifted by 2π/3.
IV. ROTATING SHADOW EFFECT

This section demonstrates the advantages of a rotating shadow. In fact, the amplitude and phase transmission of a complex sample can be retrieved by numerically postprocessing a sequence of images recorded with different shadow orientations. As mentioned in Section III, a π/2 phase shift in the scattering phases between amplitude and phase structures makes it possible to distinguish amplitude from phase modulations. For thin samples, the direction of maximum constructive interference differs by π/2 between amplitude and phase structures, so that they can be distinguished by the numerical method (Jesacher et al., 2005). An image can be reassembled from a sequence of spiral filtered images because nonisotropic spiral phase filtering is a reversible operation without loss of information. Variations of the spiral phase filter that produce a true 2D gradient of the specimen (Larkin, 2005) are not reversible: a filter of the form F(ρ, φ) = ρ exp(iφ), with ρ being the radial polar coordinate, is zero at the origin and thus deletes the information in the zeroth Fourier order, which therefore cannot be reconstructed. The exact topography could then be restored only up to the information of the zero-order component, but this component is responsible for the plane wave offset in the output image. Since the resulting image is a coherent superposition of the zeroth Fourier order with the filtered light field, the omission of the zero-order Fourier component is not just an unimportant offset, but results in a strongly degraded image.
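The reversibility argument can be illustrated numerically: the unit-modulus filter exp(iφ) is exactly inverted by exp(−iφ), whereas a gradient-type filter ρ exp(iφ) annihilates the zero-order component. A small sketch (not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64
field = rng.random((n, n)) * np.exp(1j * rng.random((n, n)))

ky, kx = np.meshgrid(np.fft.fftfreq(n), np.fft.fftfreq(n), indexing="ij")
phi = np.arctan2(ky, kx)
rho = np.hypot(ky, kx)

# Pure spiral filter: unit modulus, hence exactly invertible by exp(-i*phi).
F = np.fft.fft2(field)
back = np.fft.ifft2(F * np.exp(1j * phi) * np.exp(-1j * phi))
print(np.allclose(back, field))      # exact recovery

# Gradient-type filter rho*exp(i*phi): zero at the origin, so the
# zero-order Fourier component (the image mean) is destroyed for good.
grad_spec = F * rho * np.exp(1j * phi)
print(abs(grad_spec[0, 0]))          # 0: DC information irrecoverably lost
```

This is exactly the distinction drawn in the text: the spiral phase transform can be undone, while the gradient filter loses the plane wave offset that the reconstruction of the next subsection relies on.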
FIGURE 21. PtK2 cells viewed with spiral phase contrast microscopy. The microscope objective is a Zeiss Fluar 40×, NA = 1.30 oil immersion objective. Total image size is 146 × 99 µm; it is concatenated from 19 smaller images, each recorded with an exposure time of 4800 ms.
A. Reconstruction of the Sample Topography Using a Series of Images

The reconstruction of a microscopic specimen from a series of shadow effect images recorded with a nonisotropic spiral phase filter results in an image containing both amplitude and phase information. Many applications require the complete information about the absolute phase and transmission properties of a sample. We now demonstrate that this is achievable by postprocessing a series of at least three shadow images that are recorded at equally distributed shadow rotation angles in an interval between 0 and 2π (Bernet et al., 2006). Using more than three images is a straightforward generalization of the method and can further increase the imaging accuracy. Three images of a sample with equally distributed phases of the zeroth Fourier order are obtained using a nonisotropic spiral phase filter. The resulting intensity distributions Iout1,2,3 = |Eout1,2,3|² in the output plane can be described by:

Iout1,2,3 = |(Ein − Ein0) ∗ F⁻¹{exp[i(φ + α1,2,3)]} + Ein0|².   (4)
The complex amplitude of the input light field is defined as Ein = |Ein(x, y)| exp[iθin(x, y)], and Ein0 = |Ein0| exp(iθin0) describes the constant zero-order Fourier component of the input light field (including its complex phase). The three rotation angles α1,2,3 are the angles of the spiral phase plate, which are adjusted during the recording of the three images. Note that here we assume the spiral phase plate to be rotated from exposure to exposure instead of changing the phase of the central blazed grating. The rotation angles are evenly distributed in the interval between 0 and 2π and are increased in 2π/3-steps. ∗ is the convolution symbol, and F⁻¹{exp(iφ)} denotes the inverse Fourier transform of the spiral phase plate, F⁻¹{exp[iφ(x, y)]} = i exp[iφ(x, y)]/ρ² (Larkin et al., 2001), with radial coordinates (ρ, φ). Eq. (4) shows that the output image is retrieved by a convolution of the input image field without its zero-order Fourier component, (Ein − Ein0), with the inverse Fourier transform of the spiral phase plate. According to the convolution theorem, this corresponds to a multiplication of the Fourier transform of the image field with the spiral phase function (Reynolds et al., 1989). The three images obtained differ by the rotational angle α1,2,3 of the spiral phase filter, which is increased in steps of 2π/3 for each exposure. In the next step, the uninfluenced zero-order Fourier component Ein0 is added as a "reference" plane wave. The total image intensity is the squared absolute value of these three recorded "interferograms."
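The recording process of Eq. (4) can be simulated numerically. The sketch below is our own illustrative numpy code, not the authors' software; the discrete grid, the unit-charge spiral filter, and the handling of the zero order as a single DC pixel are assumptions. It filters everything except the zeroth Fourier order with exp[i(φ + α)], leaves the zero order untouched, and returns the intensity:

```python
import numpy as np

def spiral_shadow_image(E_in, alpha):
    """Simulate Eq. (4): I = |(E_in - E_in0) * F^-1{exp(i*phi)} + E_in0|^2,
    with the spiral phase plate rotated by alpha.  The rotation multiplies
    the filtered part by exp(i*alpha) relative to the zeroth order."""
    ny, nx = E_in.shape
    F = np.fft.fftshift(np.fft.fft2(E_in))
    y, x = np.indices((ny, nx))
    phi = np.arctan2(y - ny // 2, x - nx // 2)
    H = np.exp(1j * (phi + alpha))     # rotated spiral phase filter
    H[ny // 2, nx // 2] = 1.0          # the zeroth order passes unchanged
    E_out = np.fft.ifft2(np.fft.ifftshift(F * H))
    return np.abs(E_out) ** 2
```

Three calls with alpha = 0, 2π/3, and 4π/3 generate the image triple Iout1,2,3 of Eq. (4).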
Eq. (4) can be written as

Iout1,2,3 = |(Ein − Ein0) ∗ F⁻¹{exp(iφ)}|² + |Ein0|²
          + [(Ein − Ein0) ∗ F⁻¹{exp(iφ)}] Ein0* exp(iα1,2,3)
          + [(Ein − Ein0) ∗ F⁻¹{exp(iφ)}]* Ein0 exp(−iα1,2,3).    (5)

The three real output images Iout1,2,3 are multiplied with the three known complex phase factors exp(−iα1,2,3), and after a final averaging a complex sum IC is obtained, which is needed to retrieve the original image information Ein(x, y):

IC = (1/3)[Iout1 exp(−iα1) + Iout2 exp(−iα2) + Iout3 exp(−iα3)].    (6)

The multiplication with the complex phase factors exp(−iα1,2,3) produces a phase factor exp(−iα1,2,3) in the first and exp(−2iα1,2,3) in the third line of Eq. (5), and eliminates the phase term in the second line. Because the three angles are evenly distributed within the interval between 0 and 2π, a summation over all three complex images cancels all terms that retain phase factors. Therefore the result for the image array IC simplifies to:

IC = [(Ein − Ein0) ∗ F⁻¹{exp(iφ)}] Ein0*.    (7)

The convolution in Eq. (7) can be reversed by a numerical deconvolution with the kernel of opposite helicity, F⁻¹{exp(−iφ)}:

(Ein − Ein0) Ein0* = IC ∗ F⁻¹{exp(−iφ)}.    (8)

This step is equal to a numerical "reverse spiral-transformation," which can be performed unambiguously, since the spiral phase transform is reversible. Using the convolution theorem, this is executed by a numerical Fourier transform of IC and a subsequent multiplication with a spiral phase function that has the opposite helicity to the experimentally used spiral phase filter (i.e., with exp[−iφ(x, y)]), followed by an inverse Fourier transform. The phase value of the central point in the spiral phase kernel need not be examined, since the zero-order Fourier component of IC equals zero. From Eq. (8), the original image information Ein(x, y) can finally be restored by:

|Ein(x, y)| exp{i[θin(x, y) − θin0]} = [IC ∗ F⁻¹{exp(−iφ)} + |Ein0|²] / |Ein0|.    (9)

Here Ein(x, y), which is a complex field, has been split into its absolute value and its phase. If the intensity |Ein0|², which corresponds to the constant zero-order Fourier component of the input image, is known, it is possible to uncover
the entire original image topography Ein up to an unimportant phase offset θin0, which corresponds to the spatially constant phase of the zeroth Fourier order. The last remaining step is to obtain the intensity of the zero-order Fourier component of the input image, |Ein0|², from the three spiral-transformed images. First, the average IAv of the three recorded images is calculated; the result is an image consisting of real, positive values:

IAv = (1/3)(Iout1 + Iout2 + Iout3).    (10)

As before, all terms within Eq. (5) that contain a complex phase factor exp(±iα1,2,3) are eliminated by the averaging, again because the three angles α1,2,3 are equally distributed within the interval between 0 and 2π. Inserting Eq. (5) into Eq. (10) yields:

IAv = |(Ein − Ein0) ∗ F⁻¹{exp(iφ)}|² + |Ein0|².    (11)
Finally, an equation is gained by comparing Eq. (11) with Eq. (7):

|Ein0|⁴ − IAv |Ein0|² + |IC|² = 0.    (12)

Now the requested value for |Ein0|² can be determined:

|Ein0|² = (1/2) IAv ± (1/2) √(IAv² − 4|IC|²).    (13)
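Eqs. (6)-(13) translate directly into a numerical recipe. The sketch below is our own illustrative numpy implementation, not the authors' code: it assumes three images recorded at α = 0, 2π/3, 4π/3 with a unit-charge spiral filter on a discrete grid, and it uses the median of the per-pixel positive-sign solutions of Eq. (13) in place of the histogram mode described in the text. The forward model is repeated here so the sketch is self-contained.

```python
import numpy as np

def _spiral_phase(shape):
    """Azimuthal angle phi on a centered Fourier grid."""
    ny, nx = shape
    y, x = np.indices(shape)
    return np.arctan2(y - ny // 2, x - nx // 2)

def spiral_shadow(E_in, alpha):
    """Forward model of Eq. (4): filter everything except the zeroth
    Fourier order with exp[i(phi + alpha)]; return the intensity."""
    ny, nx = E_in.shape
    phi = _spiral_phase(E_in.shape)
    F = np.fft.fftshift(np.fft.fft2(E_in))
    H = np.exp(1j * (phi + alpha))
    H[ny // 2, nx // 2] = 1.0                 # zeroth order unchanged
    return np.abs(np.fft.ifft2(np.fft.ifftshift(F * H))) ** 2

def reconstruct_field(I1, I2, I3):
    """Eqs. (6)-(13): recover |E_in| exp[i(theta_in - theta_in0)] from
    three shadow images taken at alpha = 0, 2*pi/3, 4*pi/3."""
    alphas = (0.0, 2 * np.pi / 3, 4 * np.pi / 3)
    # Eq. (6): complex sum I_C
    IC = sum(I * np.exp(-1j * a) for I, a in zip((I1, I2, I3), alphas)) / 3
    # Eq. (10): plain average I_Av
    IAv = (I1 + I2 + I3) / 3
    # Eq. (13), positive sign; median over pixels replaces the histogram mode
    disc = np.clip(IAv ** 2 - 4 * np.abs(IC) ** 2, 0.0, None)
    E0_sq = np.median(0.5 * (IAv + np.sqrt(disc)))
    # Eq. (8): deconvolution with the opposite-helicity spiral exp(-i*phi)
    phi = _spiral_phase(IC.shape)
    FC = np.fft.fftshift(np.fft.fft2(IC))
    T_E0 = np.fft.ifft2(np.fft.ifftshift(FC * np.exp(-1j * phi)))
    # Eq. (9): add |E_in0|^2 and normalize by |E_in0|
    return (T_E0 + E0_sq) / np.sqrt(E0_sq)
```

For a simulated thin phase object, the returned complex image equals the input field up to the constant phase offset θin0, as Eq. (9) predicts.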
|Ein0|² can be determined at each image pixel individually, although for an ideal sample |Ein0|² should have the same value at each pixel. In real applications, however, image noise creates a jitter around a mean value of |Ein0|². In our simulations and experiments, we found that using the most frequently occurring value of |Ein0|² in a histogram, rather than the mean value, yielded the best output, and therefore we inserted this value into Eq. (9). Eq. (13) delivers two possible solutions for the intensity |Ein0|² of the zeroth Fourier order, corresponding to the sign in front of the square root; the value of the zeroth Fourier order at every image pixel lies either above or below one half of the average image intensity. For the solution with the positive sign, the main part of the total intensity per pixel arises from the plane wave evolving from the zeroth Fourier order, whereas the actual image information is contained in the higher-order Fourier components that spatially modulate this plane "carrier wave." This applies primarily to pure amplitude samples or thin microscopic phase objects with sufficiently small phase modulation. The solution with the negative sign, on the other hand, is suitable for samples with a high degree of phase modulation, in a range of π or larger, and high spatial frequencies. Typical examples are strongly scattering samples such as ground glass, where the zeroth Fourier order is strongly diminished. For our thin and low-scattering microscopic samples, the solution with the positive sign in Eq. (13) is appropriate. Inserting |Ein0|² from Eq. (13) into Eq. (9) delivers the absolute phase topography of the sample when the complex phase angle of the right-hand side of Eq. (9) is calculated. The phase profile of the sample is thus obtained in absolute phase units without any scaling factor, and no calibration is necessary. The square of the absolute value of the right-hand side of Eq. (9) equals a transmission image of the sample, as could otherwise be obtained with a standard brightfield microscope. Compared to a brightfield recording, the spiral phase filtering method results in a strongly reduced background due to the coherent averaging of IC (as explained in Eq. (6)) over a set of images with different shadow directions: disturbing influences that are not altered by the phase stepping between the exposures, such as stray light, readout noise of the CCD sensor, or noise from polluted optical components behind the Fourier plane, are averaged out. As already mentioned, a higher number of shadow images increases image quality and reduces noise. The only requirement for generalizing the method to more than three exposures is that the rotational angles of the spiral phase plate are evenly distributed within the interval between 0 and 2π.

B. Results

The setup explained in Figure 19 is suitable to implement these ideas. For an experimental demonstration of quantitative sample reconstruction from a sequence of shadow effect images, we took a commercially available pure phase sample, a Richardson slide, which consists of transmissive silica with a micropattern imprinted by etching. The structures have an absolute height of h = 240 ± 10 nm, measured with an atomic force microscope (AFM).
Together with its refractive index of n = 1.56, the optical path difference of the etched structures can be calculated as (n − 1)h ≈ 135 nm. Figure 22 shows six images of the Richardson slide obtained with different imaging modalities. The upper row shows three spiral phase contrast images with different shadow directions. Instead of changing the phase of the central blazed grating, the spiral phase filter was rotated in 2π/3-steps. The images appear as if they were illuminated from a certain direction, comparable to images obtained with DIC. The leftmost image in the lower row shows a brightfield image of the sample. Even imaging with a brightfield setup results in a small intensity contrast, although the object is a pure phase sample. This can be
FIGURE 22. Imaging and reconstruction of the Richardson slide imaged with a Zeiss Achroplan 63× NA = 0.95 air microscope objective. The upper row shows three images taken at three different spiral phase plate angles. The second row shows an experimentally recorded brightfield image and two numerically processed images corresponding to the intensity and phase within the sample. Dark gray values correspond to high intensity values in the real images. The real size of each image is ≈30 × 60 µm².
FIGURE 23. Reconstructed phase topography of a section of a so-called Richardson slide. The depth of the structure is measured in absolute units. Our white-light illumination source has limited spatial coherence; therefore the numerically reconstructed image underestimates the true height by 40%. (See Color Insert.)
explained by the fact that some parts of the sample scatter the illumination light at angles higher than the maximum aperture angle of the microscope objective. The intensity and phase images in the middle and on the lower right are numerically processed from the three shadow-effect images in the upper row. As expected, the brightfield image closely matches the intensity values numerically obtained in the intensity image. The best contrast arises from the reconstructed phase of the image, displayed in the lower right corner. For a quantitative evaluation of our method, we compared the results (Bernet et al., 2006) with values retrieved by an AFM. The upper part of the phase sample presented in Figure 22 was numerically reconstructed, and the result is shown in Figure 23. It represents a surface plot of the etched structures with absolute depth information in nanometers. The numerically processed absolute depth of the micropattern is ∼150 ± 20 nm, an underestimation of 40% compared to the AFM measurements. This is in contrast to our theoretical description, which predicts that the method should give quantitatively exact optical thickness values even without a previous calibration. The limited spatial coherence of our illumination source is the main reason for the insufficient depth estimation: A pointlike illumination cannot be used, because the size of our illumination source corresponds to the dimension of the fiber, which has a core diameter of 400 µm. Furthermore, the entire optical setup of microscope objective and the focal lengths of the transformation lenses produces a zero-order spot in
FIGURE 24. Imaging and postprocessing of crystallized proteins. Image A is the brightfield image. Images B–D are three spiral phase-filtered images obtained at spiral phase plate angles of 0, 2π/3, and 4π/3, respectively. Images E and F in the second row show the numerically processed intensity transmission and phase profile of the cell. Image G shows a surface plot gained from the phase profile that displays the optical thickness of the specimens. (See Color Insert.)
the Fourier plane with a size on the order of a few hundred microns. All filtered image-field components are convolved with this spatially extended zero-order Fourier spot. The consequence is that the Fourier transform of the image field in the SLM plane is slightly blurred due to imperfect cancellations, which reduces the optimal edge enhancement. Since the edge enhancement is directly proportional to the height of the phase profile, a diminished shadow effect creates a smaller profile depth. We tested this hypothesis by imaging a similar phase sample with the setup introduced in Figure 6. The laser diode with its coherent TEM00-mode profile reproduced the real profile depth to within ±10% of the AFM results, which confirmed our expectations. This demonstrates that although this implementation of the method is not intrinsically quantitative, owing to experimental shortcomings, quantitative measurements can still be performed after a preliminary calibration of the setup with a reference sample such as a Richardson slide. Figure 24 shows a classic example for testing the performance of a phase contrast setup. It shows a series of images obtained from crystallized proteins extracted from dried oral mucosa. The sample size is ≈20 × 30 µm². The optical thickness of the microscopic specimen is indicated in nanometers. The images were obtained with the same setup as used for Figure 22.
Image A shows a brightfield image of the protein structure, which already has considerable contrast. Images B–D are spiral phase contrast images recorded at three different spiral phase plate angles rotated in 2π/3-steps. These three images were used to reconstruct the transmission (image E) and phase profile (image F) of the sample. The images have high contrast, in this case especially the amplitude transmission image, and both calculated images reveal many details of the sample. Image G shows a surface plot of the phase profile, where the z-axis is the calculated optical thickness of the phase profile in nanometers. This sample shows a maximal optical path length difference of ≈250 nm between the maximum height of the cell and its surroundings. To determine the absolute height of the cell, the relative refractive index must be known, which can be obtained, for example, as presented in Curl et al. (2005) and Rappaz et al. (2005). Summarizing, a spiral phase filter placed in the Fourier plane of the imaging pathway can be used for a quantitative reconstruction of the amplitude transmission and the phase profile of an optically thin sample. The optical thickness of a sample is defined as the product of its refractive index and the height of the object; for a "thin" sample, this product must be smaller than the illumination wavelength. Three images of the sample are recorded with different phase offsets between the zeroth Fourier order and the remaining filtered part of the image wave. The result of numerical postprocessing is a complex image whose amplitude and phase match the amplitude and phase transmission of the object. Theoretically, it is possible to obtain a quantitative phase profile of the sample with no prior calibration. Due to the limited spatial coherence of our white-light illumination source, the height of the phase profile is underestimated.
This underestimation factor can be retrieved by a preliminary calibration with a reference sample. In principle, using a TEM00 laser diode can allow for a reference-free quantitative image; however, the resulting speckles and fringe patterns make it difficult to reconstruct the exact sample profile.
V. OPTICALLY THICK SAMPLES—SPIRAL INTERFEROMETRY

In classical optical interferometry, the phase profile (Teague, 1983; Curl et al., 2004) can be determined with sub-wavelength resolution. Such interferograms normally consist of closed contour lines, comparable to the level curves in topographic maps. The contour lines are generated from interference maxima and minima. Every contour line corresponds to a specific height within the sample and is measured in units of the optical wavelength used for imaging. The fact that the phase level is determined only modulo one wavelength is the first problem of classical interferometry. The second challenge arises in
FIGURE 25. Two different interferograms of deformations in a transparent glue strip. The standard interferogram (left) is compared with an interferogram obtained with spiral interferometry (right).
going from one contour line to the adjacent one: it cannot be determined whether the step is up or down. Therefore various techniques to reconstruct the whole sample topography have been developed. A first method is the so-called phase stepping, where more than one interferogram of the sample is recorded, with variable phase shifts between the object and reference beams. Interferograms can also be recorded from different directions; this is called angular multiplexing. In addition, different illumination wavelengths can be used in a process called wavelength multiplexing. When more than one interferogram is recorded, the sample must remain mechanically stable between the succeeding exposures, which makes it difficult to record high-speed processes interferometrically (Smyth and Moore, 1984). The setup illustrated in Figure 6 represents a self-referenced interferometer, in which the zero-order Fourier component of the image field wave is compared with the remaining light wave. Analogous self-referenced methods use a standard phase contrast approach in which the phase of the zeroth Fourier order is stepped relative to the rest of the image wave (Kadono et al., 1994; Ng et al., 2004; Popescu et al., 2004). The interference contrast of a standard phase contrast setup is proportional to the phase difference, whereas the images obtained with a spiral phase contrast filter always have maximal contrast, because the phase of the resulting filtered image covers the whole range between 0 and 2π. In comparison, shifting the phase of the zeroth Fourier order is much easier to realize with a spiral phase filter: the spiral phase plate simply has to be rotated. Our self-referenced interferometer allows reconstruction of the entire sample topography from one single exposure (Cuche et al., 1999; Fürhapter et al., 2005b; Marquet et al., 2005; Rappaz et al., 2005).
Normally, an interferogram consists of closed contour lines, but placing a spiral phase filter in a Fourier plane of the imaging pathway results in an entirely different image: the closed contour lines become spiral-shaped fringes, as shown in Figure 25.
The phase level is determined by the local tangential direction of the fringes, and the rotation direction of the spirals unambiguously distinguishes between depressions and elevations within the sample. A comparison between classical interferometry and spiral phase contrast interferometry is presented in Figure 25. Both interferograms were obtained with the setup introduced in Figure 6 using a 660-nm single-mode laser diode as illumination source. The microscope objective is a Zeiss A-Plan 20 × NA = 0.45. The classical interferogram shown in the left part of the figure is realized with our standard setup using the SLM to display a Fourier filter, which simulates a “normal” phase contrast method as explained previously (see Figure 13). It consists of a blazed grating where the central part is replaced by a second blazed grating. The inner grating has the same spatial frequency as the outer one and therefore diffracts light into exactly the same direction as the rest of the filter. The difference between the two gratings is a phase shift of π/2. This represents a phase-holographic implementation of self-referenced interferometry. The result of the filtering process is an interferogram consisting of closed contour lines. Every line corresponds to a specific height in terms of optical thickness, and two neighboring fringes have a height difference of one wavelength. This example illustrates the two problems mentioned previously: the phase level can only be determined modulo one wavelength, and it cannot be determined whether the difference between two adjacent fringes is a step up or down, as both cases would lead to the same interferogram. As a result, the sample topography cannot be reconstructed unambiguously from a single interferogram. In principle, both problems can be solved by using a modified spiral phase filter, as seen on the right side of the figure. 
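The transformation of closed fringes into spirals can be reproduced in a short simulation. The sketch below is our own illustrative numpy code, not the authors' software; the grid size, the Gaussian bump shape, and the treatment of the zero order as a single DC pixel are assumptions. A radially symmetric phase elevation several wavelengths deep is vortex-filtered while the zeroth Fourier order is passed unchanged as the plane reference wave; in the resulting interferogram the intensity along a circle around the bump center is strongly modulated, because the fringe spirals through every azimuth, whereas a classical interferogram of the same rotationally symmetric bump would be constant on that circle.

```python
import numpy as np

def spiral_interferogram(phase):
    """Self-referenced spiral interferogram of a pure phase sample:
    vortex-filter everything except the zeroth Fourier order, which is
    kept as the plane reference wave."""
    E = np.exp(1j * phase)
    ny, nx = E.shape
    F = np.fft.fftshift(np.fft.fft2(E))
    y, x = np.indices(E.shape)
    phi = np.arctan2(y - ny // 2, x - nx // 2)
    H = np.exp(1j * phi)
    H[ny // 2, nx // 2] = 1.0          # zeroth order -> plane reference wave
    return np.abs(np.fft.ifft2(np.fft.ifftshift(F * H))) ** 2

# Radially symmetric phase elevation, several 2*pi deep (hypothetical sample)
n = 128
x = np.linspace(-1.0, 1.0, n)
X, Y = np.meshgrid(x, x)
I = spiral_interferogram(6 * np.pi * np.exp(-(X**2 + Y**2) / 0.15))
```

Reversing the sign of the phase bump (a depression instead of an elevation) reverses the sense of rotation of the simulated spiral, in line with the behavior described in the text.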
The closed fringes are replaced by spiral-like interference fringes, and their sense of revolution allows distinguishing between depressions and elevations in the image. Here the SLM displays a hologram similar to that presented in Figure 18, consisting of a blazed spiral phase filter whose center part is replaced by a blazed grating. The corresponding hologram is presented in Figure 26. The size of this center hologram again corresponds to the size of the zeroth Fourier order of the image wave. This central area is responsible for the plane wave in the image plane that is interferometrically superposed with the spiral-filtered image. The local tangential direction of the fringes can be used to determine the phase level of the sample, and thus the whole image allows reconstruction of the exact sample topography from just one single interferogram. The rotational phase of the spirals can be steered by varying the phase of the central grating. Abrupt fringe displacements are an indicator of local phase-step discontinuities. Interferometrically created spiral fringes were already presented (Shannon et al., 1965; Chen et al., 1997; Huguenin et al., 2003) and proposed earlier (Guo et al.,
FIGURE 26. Gray-value image of a hologram: a spiral phase filter with a simple blazed grating in its center, which has the same grating constant as the remaining part of the hologram.
2004) for macroscopic applications. The sample presented here shows a glue strip that was deformed by internal stress induced by local heating. The two spirals have opposite senses of rotation; thus the examined sample contains both a depression and an elevation. Spiral interferograms appear only if the central area of the spiral phase filter is transmissive (a constant phase or a phase grating) rather than absorptive, so that it breaks the radial symmetry and makes the phase filter asymmetric. If the central grating is removed and replaced by an absorbing area, the filtered image shows an isotropic edge enhancement with closed interference fringes, where elevations within the sample cannot be distinguished from depressions.

A. The Origin of Spiral Fringes

Before describing different demodulation techniques for spiral fringes (Jesacher et al., 2006), a few considerations about the mathematical properties of the vortex filter are necessary. Figure 27 shows a standard setup that corresponds to a 4f system, similar to Figure 1. All relevant elements are separated by one focal length along the imaging pathway. The sample is placed in the (x, y) plane and is illuminated with a coherent plane wave. Lens 1 is placed one focal length away and thus performs a Fourier transform of the input light field Ein(x, y), which appears in its right focal plane (μ, ν). The spiral phase filter is located in this plane. The lower part of the figure shows the modified design of the spiral phase filter in detail. The central area is replaced by an area of constant phase shift. As mentioned previously, this central area creates a plane wave in the image plane that is superposed with the spiral-filtered image wave, and its size
FIGURE 27. Classical 4f setup: A modified spiral phase filter in a Fourier plane of the imaging pathway is used to create self-referenced interferograms. The vortex filter is placed in the Fourier plane of the image field. Contrary to the filtered image wave, the zeroth Fourier order acquires a constant phase shift. After another Fourier transform this zero-order component acts as a plane reference wave that is superposed with the vortex-filtered image wave. The corresponding phase filter is displayed in the lower part; gray values correspond to phase shifts between 0 and 2π. The size of the central area matches the dimensions of the zero-order Fourier component and is enlarged here for explanation purposes only.
corresponds to the diameter of the zeroth Fourier order. Therefore this setup creates a self-referenced interferogram of the sample (Fürhapter et al., 2005b). After another Fourier transform, performed by lens 2, the filtered image Eout is created in the plane (x′, y′). The following discussion is restricted to the consequences of elementary vortex filtering; that is, the filter without a modified center, which performs an isotropic edge enhancement. The effect of superposing a plane wave is discussed later. The Fourier convolution theorem can be used to derive the result of the vortex filtering process. The convolution of the input light field distribution Ein(x, y) with the Fourier-transformed vortex filter is described by:

Eout(x′, y′) = (Ein ∗ KV)(x′, y′) = ∫∫ Ein(x, y) KV(x′ − x, y′ − y) dx dy,    (14)

where both integrals run from −∞ to ∞.
Due to the character of the 4f system from Figure 27, the coordinate system must be inverted; that is, a mirrored version of the filter function exp(iθ), which is identical to −exp(iθ), must be used. KV, the Fourier transform of −exp(iθ), is also called the convolution kernel. Because of the symmetry, polar coordinates are introduced, in which the convolution kernel takes the form

KV(r, φ) = i exp(iφ) (2π/λf) ∫₀^ρmax ρ J₁(2πrρ/λf) dρ.    (15)
J₁ is the first-order Bessel function, λ the light wavelength, and f the focal length of the two lenses. A detailed derivation of KV(r, φ) is presented in Appendix A. This Bessel function corresponds to the field distribution of the Laguerre–Gauss mode TEM₀₁*. It is also known as an "optical vortex" or "doughnut mode" (Sundbeck et al., 2005). In the limit ρmax → ∞, KV reduces to iλf exp(iφ)/(2πr²) (Larkin et al., 2001). The point spread function K(r, φ) of a pure two-lens imaging system with a circular aperture of radius ρmax, but without the spiral phase filter, is given by

K(r, φ) = (2π/λf) ∫₀^ρmax ρ J₀(2πrρ/λf) dρ = (ρmax/r) J₁(2πρmax r/λf).    (16)
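The closed-form evaluation of the integral in Eq. (16) rests on the Bessel identity ∫₀^a ρ J₀(kρ) dρ = (a/k) J₁(ka), which follows from d[x J₁(x)]/dx = x J₀(x). This can be checked numerically; the sketch below is our own illustration, computing J₀ and J₁ from their integral representations so that no special-function library is needed (grid sizes are arbitrary choices):

```python
import numpy as np

def bessel_j1(x):
    """J_1(x) via J_n(x) = (1/pi) * int_0^pi cos(n*t - x*sin t) dt."""
    t = np.linspace(0.0, np.pi, 2001)
    y = np.cos(t - x * np.sin(t))
    return np.sum(y[1:] + y[:-1]) * (t[1] - t[0]) / 2.0 / np.pi

def lhs(a, k, m=2001):
    """Numerically integrate int_0^a rho * J0(k*rho) d rho."""
    t = np.linspace(0.0, np.pi, 2001)
    rho = np.linspace(0.0, a, m)
    y = np.cos(np.outer(np.sin(t), k * rho))          # integrand of J0(k*rho)
    j0 = (y[1:] + y[:-1]).sum(axis=0) * (t[1] - t[0]) / 2.0 / np.pi
    g = rho * j0
    return np.sum(g[1:] + g[:-1]) * (rho[1] - rho[0]) / 2.0

def rhs(a, k):
    """Closed form (a/k) * J1(k*a), as used in Eq. (16)."""
    return (a / k) * bessel_j1(k * a)
```

Both sides agree to within the quadrature error of the trapezoidal sums, confirming the closed form of the aperture-limited point spread function.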
The main differences between the convolution kernels of Eqs. (15) and (16) are the different orders of the Bessel functions and the vortex phase factor exp(iφ). This phase factor is responsible for the φ-dependence of the vortex kernel KV, and thus for the anisotropy within the filtered images. For simplification purposes, we define an approximated filter kernel as

K̃V(r, φ) = (1/N) exp(iφ)  for R₁ < r < R₂,  and 0 elsewhere.    (17)

N is a scaling factor defined as N = (R₂² − R₁²)π, where R₁ and R₂ are the inner and outer radii of an annular aperture, respectively. The convolution of Ein with this kernel gives the following expression (Jesacher et al., 2006):

Eout(P) = (1/N) ∫₀^2π ∫_R₁^R₂ exp(iφP) Êin(rP, φP) rP drP dφP.    (18)
(rP , φP ) defines a polar coordinate system that has its origin in the center of the kernel, and Eˆ in is the input light field, expressed in this local coordinate
system. This convolution integral must be calculated for every point P of the input light field Êin. To understand the basic effects of this convolution, Êin(xP, yP) = |Êin(xP, yP)| exp[iψ̂in(xP, yP)] is expanded in a Taylor series to first order:

Êin(xP, yP) ≈ |Êin(0)| exp[iψ̂in(0)] + exp[iψ̂in(0)] [gAm(0) · rP + i|Êin(0)| gPh(0) · rP].    (19)

gAm = ∇|Êin| and gPh = ∇ψ̂in are the amplitude and phase gradients of Êin, respectively, calculated at the point P, which coincides with the kernel center and corresponds to the origin 0 of the local coordinate system (xP, yP). Using Eq. (18) (for details, see Appendix B), the output light field can be written as

Eout(P) ∝ exp[iψin(P)] {gAm(P) exp[iδAm(P)] + i |Ein(P)| gPh(P) exp[iδPh(P)]}.    (20)

δAm(P) and δPh(P) denote the polar angles of the corresponding gradients. The sensitivity of vortex filtering to phase gradients can be explained qualitatively by Eq. (20). It contains two terms that describe the influence of amplitude and phase variations of the input object on the filter result; the terms are proportional to the absolute values of the gradients gAm and gPh, respectively. This explains the observed isotropic enhancement of amplitude and phase edges. The factors exp(iδPh) and exp(iδAm) can be interpreted as gradient-dependent geometric phases on the following grounds: δPh and δAm are geometric angles that describe the spatial directions of the respective gradients. In the filtered image, these factors appear as additional phase offsets of the image wave at the corresponding positions. As in other appearances of geometric phases (Anandan et al., 1997), the phase offset depends only on geometric characteristics, in this case the direction of the field gradient, but not on the magnitude of the amplitude or phase gradient.
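The factor i in Eq. (20), i.e. the 90-degree rotation of the pseudo-illumination between amplitude and phase structures, can be made explicit numerically. In the sketch below (our own illustration; the weak-structure linearization, grid, and bump shape are assumptions), a weak amplitude object 1 + a(x, y) and the corresponding weak phase object exp[i a(x, y)] ≈ 1 + i a(x, y) are vortex-filtered with the zeroth order removed; by linearity of the filter the two edge signals differ, to first order in a, exactly by the factor i, i.e. by a π/2 phase shift.

```python
import numpy as np

def vortex_part(E_in):
    """Vortex-filtered field with the zeroth Fourier order removed,
    i.e. the edge signal S = (E_in - E_in0) * F^-1{exp(i*phi)}."""
    ny, nx = E_in.shape
    F = np.fft.fftshift(np.fft.fft2(E_in))
    y, x = np.indices(E_in.shape)
    phi = np.arctan2(y - ny // 2, x - nx // 2)
    H = np.exp(1j * phi)
    H[ny // 2, nx // 2] = 0.0            # drop the zeroth order
    return np.fft.ifft2(np.fft.ifftshift(F * H))

# Weak Gaussian structure a(x, y) (hypothetical test object)
n = 64
x = np.linspace(-1.0, 1.0, n)
a = 0.05 * np.exp(-(x[None, :] ** 2 + x[:, None] ** 2) / 0.1)
S_amp = vortex_part(1.0 + a)           # pure amplitude object
S_ph = vortex_part(np.exp(1j * a))     # pure phase object, ~ 1 + i*a
```

Up to terms of order a², S_ph equals i·S_amp, so the shadow produced by a phase edge is rotated by 90 degrees with respect to that of an amplitude edge of the same orientation.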
An effect of the factors exp(iδPh) and exp(iδAm) is the anisotropic edge amplification that appears when Eout interferes with a plane reference wave. These phase factors produce a phase difference of π between a rising and a falling edge of equal orientation, which gives the impression of the sample being illuminated from a certain direction (Jesacher et al., 2005). The factor i in Eq. (20) explains the 90-degree rotation of this pseudo-illumination between pure amplitude and pure phase samples. In the following considerations the object is assumed to be an optically thick pure phase sample. Under these circumstances, the first term in Eq. (20) can be neglected. Now the anisotropic edge enhancement that is caused by the
superposition of the filtered image wave with an external plane reference wave changes to interference fringes. The fringe profile depends not only on the phase distribution ψin of the sample, but also on the local phase gradient direction: two points within the specimen that have identical values of ψin but different values of δPh produce different interference results. This reveals the difference from classical interferometry. A closed isoline of equal phase ψin surrounding a local extremum normally leads to a closed fringe of equal brightness. The modified vortex filter, however, leads to an open, spiral-shaped fringe, because the local phase gradients must cover the entire range of direction angles from 0 to 2π along any closed isoline. The rotational direction of the spiral fringe, which depends on the relative sign of ψin and δPh, makes it possible to distinguish between local maxima and minima. The weighting of the resulting amplitude Eout by the phase gradient gPh influences the fringe positions and is therefore unwanted. We found this effect to be of little importance, except in the close surroundings of local extrema and saddle points. Furthermore, using more than one spiral image can eliminate it completely, as explained later with a multi-image demodulation technique.

B. Demodulation of Spiral Interferograms

For the reconstruction of the sample topography from conventional interferograms (Takeda et al., 1982; Ikeda et al., 2005; Popescu et al., 2005), basically three single exposures of the same object with different phase values of the reference wave must be taken. The different images have slightly shifted fringe patterns, which allows reassembly of the entire topography of the examined sample. The combination of the three images is evaluated numerically, and the resulting surface structure of the object is obtained modulo 2π. This ambiguity can finally be resolved with a phase-unwrapping algorithm.
In contrast, object reconstruction from spiral interferograms is possible from only one single spiral-filtered image. The following subsections describe two demodulation possibilities, where only a single spiral interference image is needed to recover the quantitative sample topography. These methods can also supply the topographic information that standard demodulation techniques of classical interferometry lack (Robinson and Reid, 1993; Larkin et al., 2001).

1. Single-Image Demodulation

The following demodulation techniques assume that the filtered wavefront is of the form Eout ∝ exp{i[ψin(P) + δPh(P)]} with a constant field amplitude. In
40
FÜRHAPTER ET AL .
fact, the fringe positions match the local values of ψin + δPh, modulo 2π. This implies that mod[ψin + δPh, 2π] is constant where the fringes have maximum intensity. This constant can be adjusted experimentally via the phase of the zero-order Fourier component, which coincides with the center of the spiral phase plate (Fürhapter et al., 2005b). The phase gradient gPh is always transverse to the tangent of the local spiral fringe, with its characteristic angle δPh. So we can write mod[ψin + δPh, 2π] = C. Since C is an arbitrary constant, it can be set to zero, with the conclusion that along a fringe

\psi_{in} = -\delta_{Ph}   (21)

up to multiples of 2π. In absolute length units, this is

h = -\delta_{Ph}\, \frac{\lambda}{2\pi n}   (22)

up to multiples of λ/n. Here n is the difference between the refractive indices of the object and the surrounding medium, and λ is the illumination wavelength. Using Eq. (22), the topographic information of a pure phase sample can be reconstructed with a standard contouring algorithm that proceeds through the entire spiral. At each position within the sample the local tangential direction is determined; it is proportional to the height h of the phase profile at that point, given in units of the optical wavelength λ. This tangential direction repeats after each complete 2π-revolution, which by itself would not be enough to reconstruct the sample topography uniquely. To retrieve the absolute phase level, the number of complete revolutions must be kept track of. The sample structure is then represented by a 3D spiral curve. For a visual improvement, an interpolation algorithm between adjacent lines can be applied. Two methods of single-image demodulation are now explained.
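A minimal sketch of the height bookkeeping of Eq. (22), with np.unwrap keeping track of the complete revolutions along a traced fringe (function name and numbers are illustrative, not from the original setup):

```python
import numpy as np

def height_along_fringe(delta_ph, wavelength, n_diff):
    """Convert the phase-gradient angles delta_ph (radians), sampled
    along a traced spiral fringe, into heights via Eq. (22):
    h = -delta_ph * wavelength / (2*pi*n_diff).
    np.unwrap removes 2*pi jumps, i.e. it counts the complete
    revolutions along the fringe."""
    delta_cont = np.unwrap(delta_ph)
    return -delta_cont * wavelength / (2.0 * np.pi * n_diff)

# Illustrative numbers: lambda = 633 nm, refractive-index difference 0.05.
angles = np.linspace(0.0, -4.0 * np.pi, 100)   # two full revolutions
h = height_along_fringe(angles, 633e-9, 0.05)  # heights in meters
```

Two full revolutions of the tangent angle correspond to a height change of 2λ/n, here about 25 µm.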
Contour line demodulation. A seemingly straightforward method to reproduce the surface profile makes use of contour lines. Figure 28 shows an example of this demodulation for an immersion oil drop placed on a coverslip. The upper left image shows the corresponding spiral interferogram. The upper right image displays the processed and smoothed contour line. This line is extracted by standard image-processing software and connects points of equal intensity. It is cut into two parts in regions where the surface is flat (extrema or background); the two parts are indicated by a green and a blue line. This cutting process is necessary because the contouring method automatically generates two contour lines for each intensity fringe, displaced to the left and to the right with respect to the fringe center. If the contour line were left unprocessed as a single closed loop, the start and end points of the numerical processing, which are adjacent, would show a height difference of λ/n. This is the result of the sum of clockwise and
FIGURE 28. Contour line demodulation method. Upper left: spiral interferogram; upper right: corresponding contour line, cut into parts; lower left: height information is added; lower right: final contour after an interpolation between adjacent lines. The dimensions of the sample are ≈ 150 × 150 µm²; the optical thickness of the sample is about 2 µm. (See Color Insert.)
counterclockwise revolutions, which in a closed 2D curve must differ by one. In the next step, the height information is added separately to both parts of the contour line. The remaining relative height shift between the (now 3D) lines is determined by finding the best fit of one line into an interpolated surface obtained from the topographic data of the other line (lower left image). Increased accuracy can finally be achieved by finding a new surface based on both lines.

Center line demodulation. In the next demodulation technique, the same sample as in Figure 28 is examined. The spirals correspond to curves that follow values of maximum intensity. The center line is obtained by a reconstruction algorithm that "skeletonizes" the spiral interferogram step by step: pixels are removed consecutively at the spiral boundaries until a "skeleton" remains. Such a skeleton, roughly corresponding to the maxima of the spiral fringes, is shown in the upper left image of Figure 29. After this line has been constructed, the following steps are similar to the ones presented for the contour line demodulation. The local tangential direction is again determined and added as height information. An example is shown in
FIGURE 29. Center line demodulation for the same oil drop, viewed from another perspective. Upper left image: "skeleton" that corresponds to the maxima within the spiral fringes. Upper right: height information is added and the result is a 3D curve representing the surface of the sample. Lower image: after a contour interpolation the surface topography is fully reconstructed.
the upper right image of Figure 29. The error that appears in the contour line method is avoided here. One problem can arise at possible fringe branching points, where a line is cut into two parts. Having obtained the height information according to Eq. (22), a final interpolation is performed to reconstruct a smooth surface (lower image in Figure 29). In comparison, the phase profile reconstruction accuracy of the contour line method from a single interferogram is higher than that of the center line algorithm, because the contour line method extracts more data from the fringes. A disadvantage of the contour line algorithm is that the user must interactively cut out the critical fringe parts.

2. Multi-Image Demodulation

For applications with a high demand for accuracy, we present a technique that evaluates more than one spiral interferogram. Each image is captured at
FIGURE 30. Three interferograms of an immersion oil drop placed on a coverslip. The different interferograms are taken at different phase offsets of the zeroth Fourier order.
different phase values of the reference wave, thus resulting in different interferograms. This method delivers a higher degree of accuracy, as no fringe tracking algorithm need be applied. The intensity distribution of a conventional interference pattern is given by

I(x, y) = \left| A_{obj}(x, y)\, \exp[i\psi_{obj}(x, y)] + A_{ref}\, \exp(i\psi_{ref}) \right|^2.   (23)

The parameters describing the reference wave, Aref and ψref, are assumed to be constant. Our self-referenced setup (Fürhapter et al., 2005b) creates the reference wave from the zeroth Fourier order of the object itself. This spot in the Fourier plane evolves into a plane wave of uniform intensity in the image plane (x, y). A suitable phase shift ψj can be added to the central area of the hologram (which has the size of the zeroth Fourier order) to adjust the reference phase ψref. As explained, our Fourier filter consists of an off-axis hologram; thus the central region is replaced by a blazed grating with adjustable phase offset. The images presented in Figure 30 are three interferograms of an immersion oil drop placed on a coverslip. The three images differ by the phase offset of the zeroth Fourier order, which is increased in 2π/3-steps. As can be seen, the interference spirals rotate as the phase shift of the central grating is changed. The next step after the recording of three interferograms with different phase values ψj of the central area is the addition of the reference phase information. It is added to the intensity distribution by multiplication with exp(iψj), that is, I_{c1,2,3} = I_{1,2,3} \exp(i\psi_{1,2,3}). After calculating the arithmetic mean ĪC of the three complex images Icj, and under the assumption that the reference phase values ψj are evenly distributed within the interval [0, 2π], that is, \sum_{j=1}^{n} \exp(i\psi_j) = 0, the result can be written as

\bar{I}_C(x, y) = \frac{1}{n} \sum_{j=1}^{n} I_{cj} = A_{ref}\, A_{obj}(x, y)\, \exp[i\psi_{obj}(x, y)].   (24)
FIGURE 31. Image demodulation of three interferograms. Left: complex phase angle of the arithmetic mean of the three complex images of Figure 30, calculated with Eq. (24). It already contains the whole sample topography information. Right: result after applying an inverse spiral phase filter.
ĪC(x, y) contains the complete object topography. The left image in Figure 31 presents the phase angle ψobj of the arithmetic mean of the three interferograms from Figure 30. Note that the phase is already "demodulated," that is, it consists of a "sawtooth" phase profile that indicates the directions of increasing or decreasing optical thickness. However, the result of Eq. (24) contains the whole object topography only for interferograms that are superposed with a separate external reference wave. In our self-referenced interferometry, the object's zero-order itself represents the reference beam. By Eq. (23), this implies that Aobj exp(iψobj) describes the object without its zero-order component (i.e., Aref has to be added for a complete reconstruction of the sample topography). The value of the factor Aref is determined from the three interferograms, as explained in detail in Appendix C. The phase angle of the complex image ĪC is not the reconstruction of the original sample; it is only its vortex-filtered image. The next step is therefore to apply an inverse spiral filter, which corresponds to a numerical spatial filtering with the function exp(−iφ). Technically, this is done by a Fourier transformation of ĪC, followed by a multiplication with an array exp[−iφ(μ, ν)] and a final inverse Fourier transform of the result. The right image in Figure 31 shows our image after the spiral back-transform. It contains the "blazed" sample topography. Thus, the final step is to perform a classical phase-unwrapping algorithm. The result after this last step is presented in Figure 32. It shows the topography of the investigated oil drop, which has a diameter of approximately 150 µm and an optical thickness of ≈ 2 µm. To avoid this last unwrapping step, an approach using contour lines is again possible. One gray level within the spiral filtered image is used to retrieve the height information in an analogous manner as presented in the single-image
FIGURE 32. Multi-image demodulation: The resulting image reveals the object topography with high accuracy.
techniques. This algorithm can be performed for every gray value within the image until the entire image topography is extracted. This approach bypasses the unwrapping process, which can be difficult. The method can be straightforwardly generalized to the processing of more than three interferograms to further improve the resolution and noise suppression. The only requirement for our numerical processing algorithm is that the phase shifts of the recorded interferograms are evenly distributed between 0 and 2π. As already mentioned, a shift of the phase of the reference wave corresponds to a rotated interference fringe pattern. If the method is performed with an on-axis spiral phase filter as presented in Figure 27, the filter merely has to be rotated by a defined angle to produce the described effect. Compared to existing self-referenced phase-stepping methods using a phase contrast setup (Ng et al., 2004), the vortex filtering method does not affect the fringe contrast when the phase of the zeroth Fourier order is changed. This allows us to record a large number of interference images with slightly changing phase, which makes it possible to obtain the highest precision. Furthermore, this demodulation technique can also be performed on a "complex" sample consisting of amplitude and phase structures. Like the reconstructed "shadow images" of Section IV, the amplitude of a complex sample processed by the described algorithm is contained in the absolute value of ĪC, whereas the information about its optical thickness is contained purely in the complex phase angle of ĪC. In summary, this Fourier filter can be used for unambiguous reconstruction of the sample topography "at a glance." The spiral phase filter with the
characteristic phase singularity in its center can be used to obtain spiral-like interference fringes, where the sense of rotation of the spirals distinguishes elevations from depressions. This contradicts the general statement of classical interferometry that more than one image of the sample must be recorded for unambiguous reconstruction. This is interesting for very fast processes imaged with pulsed lasers (e.g., rotating parts whose 3D deformation can now be processed in real time). For tasks requiring accuracy rather than speed, more than one image of the specimen can be recorded and afterward evaluated numerically with standard demodulation techniques.
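The multi-image procedure described above, complex averaging per Eq. (24) followed by the inverse spiral filter exp(−iφ) in the Fourier plane, can be sketched numerically. A minimal Python sketch; the function names are ours, and NumPy's unshifted FFT frequency grid stands in for the Fourier-plane coordinates (μ, ν):

```python
import numpy as np

def inverse_spiral_filter(ic):
    """Multiply the Fourier transform of the complex image with
    exp(-i*phi(mu, nu)) and transform back (numerical vortex back-filter).
    The azimuthal angle is evaluated on the unshifted FFT grid, so the
    zero-order Fourier component sits at index (0, 0)."""
    ny, nx = ic.shape
    mu, nu = np.meshgrid(np.fft.fftfreq(nx), np.fft.fftfreq(ny))
    phi = np.arctan2(nu, mu)
    return np.fft.ifft2(np.fft.fft2(ic) * np.exp(-1j * phi))

def demodulate(frames, ref_phases):
    """Eq. (24): arithmetic mean of the interferograms, each multiplied
    with exp(i*psi_j); the reference-phase shifts must be evenly
    distributed over [0, 2*pi)."""
    ic = np.mean([f * np.exp(1j * p) for f, p in zip(frames, ref_phases)],
                 axis=0)
    return inverse_spiral_filter(ic)
```

In practice, a final phase-unwrapping step (and the determination of Aref, Appendix C) would follow, as described in the text.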
VI. SUMMARY AND OUTLOOK

The spiral phase contrast method has interesting applications in optical microscopy and interferometry. In microscopy, a spiral phase contrast filter with an absorptive central point (blocking the zero-order Fourier component of the image wave) can be used for isotropic edge enhancement of both amplitude and phase samples. This type of isotropic edge contrast has advantages in applications such as cell counting with respect to the typically used one-dimensionally oriented contrasting methods, such as DIC microscopy. Conversely, a spiral phase filter with a transmissive center produces a relief-like shadow image of a sample. The image reveals the amplitude or phase topography as a shadow relief, similar to DIC, but with the advantage that spiral phase filtering makes no explicit use of the light polarization. Therefore it can be used for birefringent samples or for imaging specimens through inexpensive plastic coverslips, which are typically birefringent. Another important feature of the relief-like shadow images is that their apparent shadow direction can be easily rotated by shifting the phase of the central point of the spiral phase filter (typically applied for a holographic off-axis implementation of the filtering) or by simply rotating an on-axis spiral phase filter. Such a rotating shadow allows investigation of the sample from different illumination directions to determine its complete topography. A sequence of at least three shadow effect images recorded with different shadow orientations is sufficient for a numerical reconstruction of the complete amplitude and phase profile of a sample with an accuracy on the order of a few percent of the optical wavelength.
If spiral phase microscopy is applied to samples with a larger optical thickness on the order of several wavelengths, a new type of “spiral interferogram” is produced, which contains the information about elevations and depressions of the sample phase profile in the coiling direction of the spiral fringes. In contrast to “normal” interferometric methods that deliver closed contour fringes of the sample profile, one single spiral interferogram contains
sufficient information for unambiguous object reconstruction. In the future, this single-image interferometry may have applications for interferometric measurements of fast processes, such as chemical reactions, fast diffusion, or rapidly moving objects, where there is time for only one single-shot interferogram. This can be done, for example, by using a single laser pulse for illumination. However, the spiral interferograms can also be used in a manner more similar to standard phase-shifting interferometry. There, the phase of the interference spirals can be adjusted easily and with high precision by simply rotating the spiral phase plate (or by shifting the phase of its central spot). A sequence of three spiral interferograms with shifted phases allows a numerical reconstruction in a manner similar to standard interferometry, and it has the advantage that it works even with complex samples that consist of mixed amplitude and phase structures. An advantage with respect to standard interferometry lies in the fact that the spiral filtering method corresponds to a self-referenced interferometer (where the reference wave is obtained from the zero-order Fourier component of the image wave), which is completely insensitive to environmental disturbances such as vibrations. Furthermore, phase stepping by rotation of the spiral phase filter is very convenient and accurate compared with, for example, the piezo-shifted mirrors that are often used in standard interferometers and that can introduce errors through piezo hysteresis or thermal expansion. To demonstrate these effects we have used a high-resolution spatial phase modulator on which the desired phase profiles can be programmed as off-axis holograms (i.e., the filtered image information is contained in the first diffraction order).
Such a holographic SLM system offers the advantage that, in addition to the spiral phase method, a variety of other Fourier filtering methods in microscopy can be implemented by displaying a corresponding hologram, such as darkfield microscopy, central phase contrast microscopy, brightfield microscopy, and others. Therefore such an electronic system is perfectly suited for the development and evaluation of different filtering methods. In the future it may be interesting to investigate the performance of novel phase filters that are based on the spiral phase filter but have helical mode indices ("helical charges") higher than 1. It is known that spiral phase filtering operations with helicities of 1, 2, 3, . . . are related to the generation of the image wave gradient, the image wave Laplacian, its third derivative, and so on. Therefore it can be assumed that an image of a specimen can be numerically generated from a sequence of spiral-filtered images with increasing helical indices, similar to the synthesis of a function from a Taylor series. Each of the filtered images contains information about a different aspect of the sample, such as its gradient ("normal" spiral phase plate with helical index 1) or its curvature (helical index 2). Thus a numerical assembly might result
in a super-resolution image of the sample. For the evaluation of such novel methods, a setup that uses the high resolution of recently available SLMs is an ideal prototype system, since the desired phase filters can there be implemented with high phase precision as off-axis holograms. For a practical microscope, however, it can be advantageous to implement the spatial phase filter as a "hardware" optical element, such as an on-axis spiral phase plate. Since such optical components are transparent (in contrast to our reflective SLM), they might be inserted directly into the back aperture plane of a microscope objective; they are then the only modification of a standard microscope necessary to achieve the described filtering effects, and thus they do not disturb the optical layout. Such a "spiral phase aperture mask" filter acting as an on-axis diffractive element does not introduce any dispersion (since there is only a rotational, but no radial, modulation of the phase mask), and therefore no complicated dispersion control such as in our double-diffraction setup is necessary, even with white light illumination. In such an implementation of the filter as an aperture mask, the centering of the zero-order Fourier component of the image wave with respect to the center of the spiral phase plate can be done by adjusting the direction and collimation of the illumination beam. A rotating shadow effect or phase stepping of interference spirals could be achieved by simply rotating the objective (together with the embedded spiral phase plate) in an appropriate objective mount. Such a hardware version of a spiral phase aperture mask within an objective can be manufactured as a rather low-priced diffractive optical element, and it might be a cost-effective way to upgrade a standard microscope with a powerful contrasting method, or even to convert it into a precision interferometer.
ACKNOWLEDGMENTS

We want to thank Olympus Japan for supplying the PtK2 sample. The Richardson slide was provided by Michael Helgert (Research Center Carl Zeiss, Jena, Germany), who also performed the AFM measurements on the sample. This work was funded by the Austrian Science Foundation (FWF), Project No. P18051-N02. Alexander Jesacher was supported by a DOC Fellowship of the Austrian Academy of Sciences.
APPENDIX A. DETAILS ON THE SPIRAL KERNEL
The convolution kernel KV corresponds to a Fourier transform of the filter function −exp(iθ) (Reynolds et al., 1989):

K_V(x, y) = -\frac{1}{\lambda f} \iint_{\text{Aperture}} \exp[i\theta(\mu, \nu)]\, \exp\!\left[-i\,\frac{2\pi}{\lambda f}(x\mu + y\nu)\right] d\mu\, d\nu.   (A.1)
Here λ is the light wavelength, f is the focal length of the Fourier-transforming lens, and the aperture is assumed to be a circularly symmetric disk. The symmetry suggests the use of polar coordinates:

x = r\cos\varphi, \quad y = r\sin\varphi, \quad \mu = \rho\cos\theta, \quad \nu = \rho\sin\theta.   (A.2)
After a simplification with trigonometric sum formulas, KV can be written as

K_V(r, \varphi) = -\frac{\exp(i\varphi)}{\lambda f} \int_{\rho=0}^{\rho_{\max}} \int_{\theta=0}^{2\pi} \rho\, \exp(-i\theta)\, \exp\!\left[-i\,\frac{2\pi}{\lambda f}\, r\rho\cos\theta\right] d\theta\, d\rho.   (A.3)

After the angular integration, Eq. (A.3) simplifies to

K_V(r, \varphi) = i\,\exp(i\varphi)\, \frac{2\pi}{\lambda f} \int_{\rho=0}^{\rho_{\max}} \rho\, J_1\!\left(\frac{2\pi}{\lambda f}\, r\rho\right) d\rho.   (A.4)
Here Jn(z) denotes the Bessel function of the first kind, in the integral representation (Abramowitz and Stegun, 1970)

J_n(z) = \frac{i^{-n}}{2\pi} \int_{\theta=0}^{2\pi} \exp(iz\cos\theta)\, \exp(in\theta)\, d\theta.   (A.5)
Subsequently, after the radial integration, the kernel takes the form (Khonina et al., 1992)

K_V(r, \varphi) = i\,\exp(i\varphi)\, \frac{\pi\rho_{\max}}{2r} \left[ J_1\!\left(\frac{2\pi}{\lambda f}\, r\rho_{\max}\right) H_0\!\left(\frac{2\pi}{\lambda f}\, r\rho_{\max}\right) - J_0\!\left(\frac{2\pi}{\lambda f}\, r\rho_{\max}\right) H_1\!\left(\frac{2\pi}{\lambda f}\, r\rho_{\max}\right) \right].   (A.6)

Here H0 and H1 are the Struve functions of zeroth and first order, respectively.
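The radial integral that leads from Eq. (A.4) to Eq. (A.6) can be checked numerically with SciPy's Bessel and Struve functions. An illustrative sketch (the helper name and test values are ours, not part of the original derivation):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import jv, struve

def radial_integral_closed_form(k, rho_max):
    """Closed form of int_0^rho_max rho * J1(k*rho) d(rho), the integral
    of Eq. (A.4), expressed with Struve functions as in Eq. (A.6):
    (pi*rho_max / (2*k)) * [J1(z)*H0(z) - J0(z)*H1(z)],  z = k*rho_max."""
    z = k * rho_max
    return (np.pi * rho_max / (2.0 * k)) * (
        jv(1, z) * struve(0, z) - jv(0, z) * struve(1, z))

k, rho_max = 3.7, 1.5   # arbitrary values standing in for 2*pi*r/(lambda*f)
numeric = quad(lambda rho: rho * jv(1, k * rho), 0.0, rho_max)[0]
closed = radial_integral_closed_form(k, rho_max)
```

Direct quadrature of the Bessel integrand and the Struve-function closed form agree to numerical precision.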
APPENDIX B. DETAILS ON THE VORTEX FILTER EXPANSION
The first-order approximation of Eq. (19) is inserted into Eq. (18):

E_{out}(P) \approx \frac{E_{in}(P)}{N} \int_{\varphi_P=0}^{2\pi} \int_{r_P=R_1}^{R_2} \exp(i\varphi_P)\, r_P\, dr_P\, d\varphi_P
+ \frac{\exp[i\psi_{in}(P)]\, g_{Am}(P)}{N} \iint \exp(i\varphi_P)\, \cos[\varphi_P - \delta_{Am}(P)]\, r_P^2\, dr_P\, d\varphi_P
+ i\, \frac{E_{in}(P)\, g_{Ph}(P)}{N} \iint \exp(i\varphi_P)\, \cos[\varphi_P - \delta_{Ph}(P)]\, r_P^2\, dr_P\, d\varphi_P.   (B.1)

The integration over exp(iφP) eliminates the first term. The other terms are simplified using the exponential form of the cosine functions:

E_{out}(P) \approx \frac{\exp[i\psi_{in}(P)]\, g_{Am}(P)}{2N}\, E_1 + i\, \frac{E_{in}(P)\, g_{Ph}(P)}{2N}\, E_2,   (B.2)

where

E_1 := \exp[-i\delta_{Am}(P)] \iint_{\varphi_P, r_P} \exp(i2\varphi_P)\, r_P^2\, dr_P\, d\varphi_P + \exp[i\delta_{Am}(P)] \iint_{\varphi_P, r_P} r_P^2\, dr_P\, d\varphi_P   (B.3)

and

E_2 := \exp[-i\delta_{Ph}(P)] \iint_{\varphi_P, r_P} \exp(i2\varphi_P)\, r_P^2\, dr_P\, d\varphi_P + \exp[i\delta_{Ph}(P)] \iint_{\varphi_P, r_P} r_P^2\, dr_P\, d\varphi_P.   (B.4)

All integrals containing exp(i2φP) vanish. The remaining terms are integrated, which leads to the final form

E_{out}(P) \approx \frac{1}{3}\, \frac{R_2^3 - R_1^3}{R_2^2 - R_1^2} \left\{ \exp[i\psi_{in}(P)]\, g_{Am}(P)\, \exp[i\delta_{Am}(P)] + i\, E_{in}(P)\, g_{Ph}(P)\, \exp[i\delta_{Ph}(P)] \right\}.   (B.5)

The scaling factor N has been replaced by (R_2^2 - R_1^2)\pi.
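Two facts used above, the vanishing of the first term of Eq. (B.1) and the radial prefactor of Eq. (B.5), can be verified with a short numerical check (the radii are chosen arbitrarily):

```python
import numpy as np
from scipy.integrate import dblquad

R1, R2 = 0.4, 1.0
N = (R2**2 - R1**2) * np.pi     # scaling factor used in the text

# int int exp(i*phi) r dr dphi over the annulus: both parts vanish.
re = dblquad(lambda r, p: np.cos(p) * r, 0.0, 2.0 * np.pi,
             lambda p: R1, lambda p: R2)[0]
im = dblquad(lambda r, p: np.sin(p) * r, 0.0, 2.0 * np.pi,
             lambda p: R1, lambda p: R2)[0]

# int int r^2 dr dphi / (2N) reproduces the prefactor of Eq. (B.5).
radial = dblquad(lambda r, p: r**2, 0.0, 2.0 * np.pi,
                 lambda p: R1, lambda p: R2)[0] / (2.0 * N)
expected = (R2**3 - R1**3) / (3.0 * (R2**2 - R1**2))
```

The quadrature confirms that only the δ-dependent terms of E1 and E2 survive, with the stated (R2³ − R1³)/[3(R2² − R1²)] weight.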
APPENDIX C. DEMODULATION OF MULTIPLE IMAGES

For convenience, we repeat the analysis of Section IV explicitly for the case of multi-image demodulation of thick samples. A general interferogram produces an intensity distribution of the form

I = \left| A_{obj}\, \exp(i\psi_{obj}) + A_{ref}\, \exp(i\psi_{ref}) \right|^2 = A_{obj}^2 + A_{ref}^2 + A_{obj} A_{ref} \left\{ \exp[i(\psi_{obj} - \psi_{ref})] + \exp[-i(\psi_{obj} - \psi_{ref})] \right\}.   (C.1)

The corresponding complex image IC is obtained by multiplying the individual image with its corresponding phase term exp(iψref):

I_C = \left( A_{obj}^2 + A_{ref}^2 \right) \exp(i\psi_{ref}) + A_{obj} A_{ref}\, \exp(i\psi_{obj}) + A_{obj} A_{ref}\, \exp(-i\psi_{obj})\, \exp(i2\psi_{ref}).   (C.2)

The mean value ĪC of three images IC, taken with different reference phases, is

\bar{I}_C = \frac{1}{3} \left( A_{obj}^2 + A_{ref}^2 \right) \left[ \exp(i\psi_1) + \exp(i\psi_2) + \exp(i\psi_3) \right] + A_{obj} A_{ref}\, \exp(i\psi_{obj}) + \frac{1}{3}\, A_{obj} A_{ref}\, \exp(-i\psi_{obj}) \left[ \exp(i2\psi_1) + \exp(i2\psi_2) + \exp(i2\psi_3) \right].   (C.3)

For the case that the phase values ψn are equally distributed within the interval [0, 2π], i.e., \sum_{j=1}^{n} \exp(i\psi_j) = 0, the terms in square brackets vanish, which results in the expression of Eq. (24). The mean value of the three interferograms,

\bar{I} = \frac{1}{3} \sum_n I_n = A_{obj}^2 + A_{ref}^2,   (C.4)

can be used to show that the reference amplitude can be written as

A_{ref} = \left[ \bar{I}/2 \pm \sqrt{(\bar{I}/2)^2 - |\bar{I}_C|^2} \right]^{1/2}.   (C.5)
Aref has an ambiguity resulting from the sign of the inner square root. In any given situation, only one of the two solutions is appropriate. This behavior is caused by the fact that two interfering light fields would lead to exactly the same interference pattern if they exchanged their absolute values. Since Aref is usually much more homogeneous than the absolute value of the object light
field (in the case of spiral interferometry this holds even for pure phase objects), it can easily be identified among the two solutions. Nevertheless, it must be noted that the sign of the square root that corresponds to Aref might change within one picture. It switches in regions where A²ref and A²obj exchange their local dominance.
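Equation (C.5) and its sign ambiguity can be illustrated with a small numerical sketch (names ours; a scalar example stands in for the per-pixel maps):

```python
import numpy as np

def aref_candidates(i_mean, ic_abs):
    """Both solutions of Eq. (C.5):
    Aref = sqrt(I/2 +/- sqrt((I/2)**2 - |IC|**2)).
    Exactly one of them equals the reference amplitude at each pixel;
    the other equals the object amplitude."""
    inner = np.sqrt((i_mean / 2.0)**2 - ic_abs**2)
    return np.sqrt(i_mean / 2.0 + inner), np.sqrt(i_mean / 2.0 - inner)

# Synthetic check: with A_obj = 1 and A_ref = 2 one has
# I_mean = A_obj**2 + A_ref**2 = 5 and |IC| = A_obj * A_ref = 2.
a_plus, a_minus = aref_candidates(5.0, 2.0)
# Here the '+' root is A_ref because A_ref > A_obj; for images, the
# smoother (more homogeneous) of the two candidate maps is selected.
```

The two roots are simply the two solutions of x² − Ī x + |ĪC|² = 0 for the squared amplitudes, which is why they swap roles wherever A²ref and A²obj exchange dominance.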
REFERENCES

Abramowitz, M., Stegun, I.A. (1970). Handbook of Mathematical Functions. Dover, New York.
Allen, R.D., David, G.B., Nomarski, G. (1969). The Zeiss–Nomarski differential interference equipment for transmitted light microscopy. Z. Wiss. Mikrosk. 69, 193–221.
Anandan, J., Christian, J., Wanelik, K. (1997). Resource letter GPP-1: Geometric phases in physics. Am. J. Phys. 65, 180–185.
Arlt, J., Dholakia, K., Allen, L., Padgett, M.J. (1998). The production of multiringed Laguerre–Gaussian modes by computer-generated holograms. J. Mod. Opt. 45, 1231–1237.
Arnison, M.R., Cogswell, C.J., Smith, N.I., Fekete, P.W., Larkin, K.G. (2000). Using the Hilbert transform for 3D visualization of differential interference contrast microscope images. J. Microsc. 199, 79–84.
Arnison, M.R., Larkin, K.G., Sheppard, C.J.R., Smith, N.I., Cogswell, C.J. (2004). Linear phase imaging using differential interference contrast microscopy. J. Microsc. 214, 7–12.
Barone-Nugent, E.D., Barty, A., Nugent, K.A. (2002). Quantitative phase-amplitude microscopy I: Optical microscopy. J. Microsc. 206, 194–203.
Barty, A., Nugent, K.A., Roberts, A., Paganin, D. (1998). Quantitative phase microscopy. Opt. Lett. 23, 817–819.
Bellair, C.J., Curl, C.L., Allman, B.E., Harris, P.J., Roberts, A., Delbridge, L.M.D., Nugent, K.A. (2004). Quantitative phase amplitude microscopy IV: Imaging thick specimens. J. Microsc. 214, 62–69.
Bernet, S., Jesacher, A., Fürhapter, S., Maurer, C., Ritsch-Marte, M. (2006). Quantitative imaging of complex samples by spiral phase contrast microscopy. Opt. Exp. 14, 3792–3805.
Born, M., Wolf, E. (1980). Principles of Optics: Electromagnetic Theory of Propagation, Interference and Diffraction of Light, 6th ed. Pergamon, Oxford.
Bracewell, R.N. (1978). The Fourier Transform and Its Applications. McGraw–Hill, New York.
Chen, Z., Segev, M., Wilson, D.W., Muller, R.E., Maker, P.D. (1997). Self-trapping of an optical vortex by use of the bulk photovoltaic effect. Phys. Rev. Lett. 78, 2948–2951.
Cogswell, C.J., Smith, N.I., Larkin, K.G., Hariharan, P. (1997). Quantitative DIC microscopy using a geometric phase shifter. Proc. SPIE 2984, 72–81.
Crabtree, K., Davis, J.A., Moreno, I. (2004). Optical processing with vortex-producing lenses. Appl. Opt. 43, 1360–1367.
Cuche, E., Marquet, P., Depeursinge, C. (1999). Simultaneous amplitude-contrast and quantitative phase-contrast microscopy by numerical reconstruction of Fresnel off-axis holograms. Appl. Opt. 38, 6994–7001.
Curl, C.L., Bellair, C.J., Harris, P.J., Allman, B.E., Roberts, A., Nugent, K.A., Delbridge, L.M.D. (2004). Quantitative phase microscopy: A new tool for investigating the structure and function of unstained live cells. Clin. Exp. Pharm. Phys. 31, 896–901.
Curl, C.L., Bellair, C.J., Harris, T., Allman, B.E., Harris, P.J., Stewart, A.G., Roberts, A., Nugent, K.A., Delbridge, L.M.D. (2005). Refractive index measurement in viable cells using quantitative phase-amplitude microscopy and confocal microscopy. Cytometry A 65A, 88–92.
Davis, J.A., McNamara, D.E., Cottrell, D.M., Campos, J. (2000). Image processing with the radial Hilbert transform: Theory and experiments. Opt. Lett. 25, 99–101.
Dodt, H.-U. (1995). Patent No. WO9529420.
Dodt, H.-U., Frick, A., Kampe, K., Zieglgänsberger, W. (1998). NMDA and AMPA receptors on neocortical neurons are differentially distributed. Eur. J. Neurosci. 10, 3351–3357.
Dodt, H.-U., Eder, M., Frick, A., Zieglgänsberger, W. (1999). Precisely localized LTD in the neocortex revealed by infrared-guided laser stimulation. Science 286, 110–113.
Dodt, H.-U., Schierloh, A., Eder, M., Zieglgänsberger, W. (2003). Circuitry of rat barrel cortex investigated by infrared-guided laser stimulation. Neuroreport 14, 623–627.
Franz, G., Kross, J. (2001). Generation of two-dimensional surface profiles from differential interference contrast (DIC) images. Optik 112, 363–367.
Fürhapter, S., Jesacher, A., Bernet, S., Ritsch-Marte, M. (2005a). Spiral phase contrast imaging in microscopy. Opt. Exp. 13, 689–694.
Fürhapter, S., Jesacher, A., Bernet, S., Ritsch-Marte, M. (2005b). Spiral interferometry. Opt. Lett. 30, 1953–1955.
Guo, C., Cheng, X., Ren, X., Ding, J., Wang, H. (2004). Optical vortex phase-shifting digital holography. Opt. Exp. 12, 5166–5171.
Heckenberg, N.R., McDuff, R., Smith, C.P., White, A.G. (1992). Generation of optical phase singularities by computer-generated holograms. Opt. Lett. 17, 221–223.
Hoffman, R., Gross, L. (1975). The modulation contrast microscope. Nature 254, 586–588.
Hoffman, R. (1977). The modulation contrast microscope: Principles and performance. J. Microsc. 110, 205–222.
Huguenin, J.A.O., Coutinho dos Santos, B., dos Santos, P.A.M., Khoury, A.Z. (2003). Topological defects in moiré fringes with spiral zone plates. J. Opt. Soc. Am. A 20, 1883–1889.
Ikeda, T., Popescu, G., Dasari, R.R., Feld, M.S. (2005). Hilbert phase microscopy for investigating fast dynamics in transparent systems. Opt. Lett. 30, 1165–1167.
Jaroszewicz, Z., Kolodziejczyk, A. (1993). Zone plates performing generalized Hankel transforms and their metrological applications. Opt. Commun. 102, 391–396.
Jesacher, A., Fürhapter, S., Bernet, S., Ritsch-Marte, M. (2004a). Diffractive optical tweezers in the Fresnel regime. Opt. Exp. 12, 2243–2250.
Jesacher, A., Fürhapter, S., Bernet, S., Ritsch-Marte, M. (2004b). Size selective trapping with optical "cogwheel" tweezers. Opt. Exp. 12, 4129–4135.
Jesacher, A., Fürhapter, S., Bernet, S., Ritsch-Marte, M. (2005). Shadow effects in spiral phase contrast microscopy. Phys. Rev. Lett. 94, 233902.
Jesacher, A., Fürhapter, S., Bernet, S., Ritsch-Marte, M. (2006). Spiral interferogram analysis. J. Opt. Soc. Am. A 23, 1400–1409.
Kadono, H., Ogusu, M., Toyooka, S. (1994). Phase shifting common path interferometer using a liquid-crystal phase modulator. Opt. Commun. 110, 391–400.
Khonina, S.N., Kotlyar, V.V., Shinkaryev, M.V., Soifer, V.A., Uspleniev, G.V. (1992). The phase rotor filter. J. Mod. Opt. 39, 1147–1154.
Larkin, K.G., Bone, D.J., Oldfield, M.A. (2001). Natural demodulation of two-dimensional fringe patterns. I. General background of the spiral phase quadrature transform. J. Opt. Soc. Am. A 18, 1862–1870.
Larkin, K.G. (2005). Uniform estimation of orientation using local and nonlocal 2-D energy operators. Opt. Exp. 13, 8097–8121.
Liang, R., Erwin, J.K., Mansuripur, M. (2000). Variation on Zernike's phase-contrast microscope. Appl. Opt. 39, 2152–2158.
Lohmann, A.W., Tepichin, E., Ramirez, J.G. (1997). Optical implementation of the fractional Hilbert transform for two-dimensional objects. Appl. Opt. 36, 6620–6626.
Lowenthal, S., Belvaux, Y. (1967). Observation of phase objects by optically processed Hilbert transform. Appl. Phys. Lett. 11, 49–51.
Marquet, P., Rappaz, B., Magistretti, P.J., Cuche, E., Emery, Y., Colomb, T., Depeursinge, C. (2005). Digital holographic microscopy: A noninvasive contrast imaging technique allowing quantitative visualization of living cells with subwavelength axial accuracy. Opt. Lett. 30, 468–470.
Microscopy Primer (2006). An excellent compendium explaining contrast in optical microscopy. Available at: http://micro.magnet.fsu.edu/primer/techniques/contrast.html.
Ng, A.Y.M., See, C.W., Somekh, M.G. (2004). Quantitative optical microscope with enhanced resolution using a pixelated liquid crystal spatial light modulator. J. Microsc. 214, 334–340.
Noda, T., Kawata, S. (1992). Separation of phase and absorption images in phase-contrast microscopy. J. Opt. Soc. Am. A 9, 924–931.
Nomarski, G. (1955). Microinterféromètrie différentiel à ondes polarisées. J. Phys. Radium 16, 9S–11S.
Oemrawsingh, S.S.R., van Houwelingen, J.A.W., Eliel, E.R., Woerdman, J.P., Verstegen, E.J.K., Kloosterboer, J.G., 't Hooft, G.W. (2004). Production and characterization of spiral phase plates for optical wavelengths. Appl. Opt. 43, 688–694.
Oron, R., Davidson, N., Friesem, A.A., Hasman, E. (2001). Transverse mode shaping and selection in laser resonators. Prog. Opt. 42, 325–386.
Padawer, J. (1968). The Nomarski interference microscope: An experimental basis for image interpretation. J. R. Microsc. Soc. 88, 305–349.
Paganin, D., Nugent, K.A. (1998). Non-interferometric phase imaging with partially coherent light. Phys. Rev. Lett. 80, 2586–2589.
Paganin, D., Barty, A., McMahon, P.J., Nugent, K.A. (2004). Quantitative phase-amplitude microscopy III: The effects of noise. J. Microsc. 214, 51–61.
Pluta, M. (1989). Advanced Light Microscopy, vols. 1–3. Elsevier, Amsterdam.
Popescu, G., Deflores, L.P., Vaughan, J.C., Badizadegan, K., Iwai, H., Dasari, R.R., Feld, M.S. (2004). Fourier phase microscopy for investigation of biological structures and dynamics. Opt. Lett. 29, 2502–2503.
Popescu, G., Ikeda, T., Best, C.A., Badizadegan, K., Dasari, R.R., Feld, M.S. (2005). Erythrocyte structure and dynamics quantified by Hilbert phase microscopy. J. Biomed. Opt. 10, 060503.
Preza, C. (2000). Rotational-diversity phase estimation from differential interference contrast microscopy images. J. Opt. Soc. Am. A 17, 415–424.
Rappaz, B., Marquet, P., Cuche, E., Emery, Y., Depeursinge, C., Magistretti, P. (2005). Measurement of the integral refractive index and dynamic cell morphometry of living cells with digital holographic microscopy. Opt. Exp. 13, 9361–9373.
Reynolds, G.O., DeVelis, J.B., Parrent Jr., G.B., Thompson, B.J. (1989). The New Physical Optics Notebook: Tutorials in Fourier Optics. SPIE Optical Engineering Press, Bellingham, Washington.
Robinson, D.W., Reid, G.T. (1993). Interferogram Analysis: Digital Fringe Pattern Measurement Techniques. IOP Publishing, Bristol.
Shannon, R.R., Weekley, R.E., Shafer, D. (1965). A source of spiral fringes. Appl. Opt. 4, 1193–1196.
Smyth, R., Moore, R. (1984). Instantaneous phase measuring interferometry. Opt. Eng. 23, 361–364.
FÜRHAPTER ET AL.
FIGURE 1.1. Image filtering in the Fourier plane. The principle of a spiral phase filter in the Fourier plane of the optical imaging pathway is shown, illustrating a so-called 4f-setup. The illumination light is scattered into different directions according to the composition of the sample. Different gradient directions (two are indicated in green and orange) are focused at different positions in the Fourier plane and there acquire a phase shift corresponding to their respective positions. The resulting image of the sample is a coherent superposition of the undiffracted light with the remaining filtered part of the light field.
FIGURE 1.2. A conventional application of the spiral phase plate. The spiral phase plate is illuminated with a plane wave. A lens performs a Fourier transform of the image field. The result is a Laguerre–Gauss mode TEM∗01, which can be used, for example, as an optical trap.
FIGURE 1.4. Visualization of the principle of the spiral phase filter. A sample (left image) of constant optical thickness (marked in green) is convolved with the spiral phase filter. Every image point is convolved with the Fourier transform of the filter (middle image). Different line thicknesses correspond to different phase levels. After summation over all image points, every image point cancels out, except at edges or phase jumps. In the filtered image (right image), edges appear bright (here marked in red) against a dark background.
FIGURE 1.8. Low-coherence illumination. A halogen bulb is used as a white-light source. In contrast to the setup used in Figure 6, the frequency bandwidth of the light source requires a correction of the occurring dispersion. Therefore the area of the SLM is split into two halves. The first half displays the spiral phase filter, which diffracts the filtered light into its first order; the second half is filled with a "normal" blazed grating that compensates for the dispersion effects. The first diffraction order contains the dispersion-compensated light, which is then imaged onto a CCD camera. Focal lengths of the lenses: f1 = 100 mm, f2 = 110 mm, f3 = 200 mm, f4 = 150 mm, f5 = 400 mm, and f6 = 200 mm.
FIGURE 1.19. Setup used for shadow effects. The main change compared to Figure 8 is a modified spiral phase filter with an adjustable central area, introduced and explained in Figure 18.
FIGURE 1.23. Reconstructed phase topography of a section of a so-called Richardson slide. The depth of the structure is measured in absolute units. Our white-light illumination source has limited spatial coherence; therefore the numerically reconstructed image underestimates the true height by 40%.
FIGURE 1.24. Imaging and postprocessing of crystallized proteins. Image (A) is the bright-field image. Images (B)–(D) are three spiral phase–filtered images obtained at spiral phase plate angles of 0, 2π/3, and 4π/3, respectively. Images (E) and (F) in the second row show the numerically processed intensity transmission and phase profile of the cell. (G) shows a surface plot derived from the phase profile, displaying the optical thickness of the specimens.
FIGURE 1.28. Contour line demodulation method. Upper left: spiral interferogram; upper right: corresponding contour line, cut into parts; lower left: height information is added; lower right: final contour after an interpolation between adjacent lines. The dimensions of the sample are ≈ 150 × 150 µm²; the optical thickness of the sample is about 2 µm.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 146
LULU Theory, Idempotent Stack Filters, and the Mathematics of Vision of Marr CARL H. ROHWER AND MARCEL WILD Department of Mathematics, University of Stellenbosch, Matieland 7602, South Africa
I. Introduction
II. The LULU Framework for Image Analysis and Decomposition
  A. LULU Theory and the Discrete Pulse Transform
    1. Elementary Particles and Atoms of LULU Theory
    2. The Discrete Pulse Transform
    3. The Lack of the Usual Linear Theory
    4. Preliminaries About Lattices
    5. The Ranges of the LULU Operators
    6. Separators as Generalized Projections
    7. Replacing Pythagoras
    8. The Emerging Perspective
    9. The Ambiguity Problem and Choices
    10. Analysis of the Resolution Components
  B. Component Modification and Consistency
    1. The Fundamental Problem of Nonlinearity in Decompositions
    2. Shape Preservation in the DPT
    3. Full-Trend Preservation and Consistency
  C. Practical Considerations
    1. The Effect of Noise in the DPT
    2. The Relationship with Median Decompositions and Alternatives
    3. Quantizing and Thresholding Components
    4. Numerical Considerations
  D. LULU Image Analysis of Optical Second-Harmonic Imaging of Pb_x Cd_{1−x} Te Ternary Alloys
    1. The Second Dimension
    2. Highlighting of Cd-Rich Crystals
    3. Edge Detection Using the Discrete Pulse Transform
    4. Image Registration
    5. Estimation of Variance of Random Distribution
    6. Results
III. Vistas on Idempotency
  A. Stack Filters With a View on LULU Operators
    1. Positive Boolean Functions on Linearly Ordered Sets
    2. Threshold Decompositions
    3. Stack Filters, Evaluation, and Optimization
    4. Teasing a Little Linearity From Stack Filters
    5. Linear Combinations of Stack Filters
  B. Abstract Semigroups and Galois Connections
    1. Semigroups and Their Idempotent Elements
    2. Ordered Semigroups
    3. Galois Connections
  C. Mathematical Morphology
    1. Structural Openings
    2. Envelopes
  D. The Fine Structure of LULU Operators
    1. The Band B(M, N)
    2. The Invariant Series of a LULU Operator
  E. Lattice Stack Filters
    1. Positive Boolean Functions on Arbitrary Lattices
    2. Lattice Stack Filters, Nonlinear Systems, and Image Algebra
    3. Idempotency of LSFs
    4. Co-Idempotency of LSFs
    5. Concrete Calculations
IV. Conclusion
References

ISSN 1076-5670 • DOI: 10.1016/S1076-5670(06)46002-X
I. INTRODUCTION

To many interested observers from fields close to image processing, the need for a mathematics of vision, as articulated by Meyer (1993), is convincing. This is particularly so when viewed from wavelet theory and mathematical morphology (MM). We present crucial ideas from LULU theory; in particular, we demonstrate that discrete pulse transforms (DPTs) exist that meet realistic representation requirements.

We present our ideas in two distinct parts. The first (Section II) is contributed by the first author, who was involved with nonlinear filtering and its practical applications, often in industrial problems. In 1998 the second author, drawing on his abstract lattice-theoretic background, joined in a fruitful collaboration. His contribution is Section III.

Section II presents the fundamental constituents and perspectives of LULU theory, leading to significant results for image decomposition and analysis, and ending with a specific application that demonstrates the simplicity of application that has typically been found. The remainder of the Introduction mainly previews parts of Section III, in particular stack filters (Section III.A) and mathematical morphology (Section III.C). Although Ln Un and Un Ln (the most prominent LULU operators) fit both theories, we argue that they have been neglected by both. Stack filter theory based on threshold decomposition, which is very efficient in other cases, is infeasible for Ln Un; it can be replaced by another kind of decomposition that has not been appreciated. As to MM, using our own preferred formalism, we investigate mappings induced by structural elements A, and compositions of such mappings. Our smoothers Ln Un and Un Ln are of this type, and while some of their properties therefore derive from general MM, several were established independently by the first author before he became aware of MM. Since MM focuses mainly on structural elements in two or more dimensions, it has missed many extra features of the one-dimensional line segments A that underlie our LULU operators.

FIGURE 1. A Venn diagram of the structure of the article.

Although the second author was initially sparked off by LULU theory, some of the deliberations in Section III take us far from it. Within the domain of signal processing we mention stack filter optimization and its quest for idempotency, but new results and perspectives on ordered semigroups, max–min systems, monotone Boolean networks, and cellular automata are also discussed. These remarks, the outline above, and the accompanying Venn diagram (Figure 1) must suffice at this point; we do not want to distract the reader from LULU theory before it has even started. Idempotence and co-idempotence are key concepts throughout the article. Often the essence or key finding of a section cannot be appreciated sufficiently from the section's introduction. To overcome this, we use boldface type for one or two key statements or equations in each of Sections II.A.1 to III.E.5.
II. THE LULU FRAMEWORK FOR IMAGE ANALYSIS AND DECOMPOSITION

The ideas we present started 20 years ago in practical problems of nonlinear smoothing, for the removal of impulsive noise corrupting a signal, along the lines popularized by Tukey. In the first article on the subject (Rohwer, 1989), a basic underlying structure was exposed, which also led to some investigations into applications in image processing. Tribute must be paid to Collatz (1975) and Eccles and Popper (1981) for planting ideas earlier. Being ignorant of MM at the time, we discovered the connection only almost ten years later, through a remark after a lecture in Antwerp; it had been pointed out independently by Maragos and Schafer (1987). Involvement in wavelet theory resulted in an overlap of two fields: MM and multiresolution analysis. Blessed with ignorance, the search turned into unexplored regions that repeatedly yielded jewels. We believe that this region (baptized LULU theory by a colleague) yields ideas worth studying in pursuit of the mathematics of vision.

Our point of departure is Meyer's discussion of computer vision and human vision in Chapter 8 of his informative book (Meyer, 1993). We start with loose quotations as pegs on which to hang our ideas; specifically, we take from these the following premises of Marr:

1. There exists a science of vision that must be developed for the understanding of human and robotic vision. Its foundation must be found, and the algorithms involved should be physiologically realistic.
2. Iterative loops are to be doubted, and the choice of representation of an image is crucial.

We provocatively set up a class of possible representations as targets to aim for, even if only for constructive criticism. We do not consider ourselves qualified to compare the various alternatives, but at best to present a framework of reasoning: a simple and beautiful theory, often very successful in particular practical cases. We postulate a close relationship between the mathematics of vision and the mathematics of smoothers, as developed in the monograph by Rohwer (2005), which is based on previous publications. We use a hypothetical 1D retina to present this.
Noting that the operators can quite simply be considered two-dimensional (2D), though having only a rotational symmetry of π radians, we restrict our presentation essentially to showing what is possible, not what is optimal or even near optimal.

A. LULU Theory and the Discrete Pulse Transform

1. Elementary Particles and Atoms of LULU Theory

We start with a vector x whose elements are N natural numbers (gray scales) between 0 and 2^m, for some integer m. We ignore alternative treatments at both ends and simply append sufficiently many zeros before and after, so that the vector x can be considered as an element of R^Z with only N nonzero values x_i at the indexes i = 1 to i = N. We can also, for visualization and heuristic arguments, identify the sequence with a sampling at the midpoints of the intervals of a spline function of order 1 (degree 0) on the partition of integers. We can then visualize it as a particular skyline of a Lego city.

We pretend to derive the so-called LULU theory from basic heuristic premises and arguments and provide only some proofs of fundamental, easy, and instructive theorems. The monograph (Rohwer, 2005) on the subject is generally sufficient here for an understanding of the origin of the theory and the proofs needed.

We consider x as an image on a fictitious 1D retina and assume that networks behind the retina can change the stimuli x into a representation for transport to a projection room, where it is reconstructed to be viewed for interpretation. It is accepted that such a conversion in the case of human vision is nonlinear (Mallat, 1998), economical, and useful for some immediate interpretation. We immediately resort to some basic axioms of smoothers that were selected and adapted from those of Mallows (1980), at a time when a theory of nonlinear smoothers was considered difficult if not impossible (Velleman, 1977). We assume that algorithms of vision on an elementary level may well be axis independent and scale independent.

Definition. A smoother S is an operator R^Z → R^Z on sequences x ∈ R^Z such that:

(HT) ES = SE, where E is the shift operator defined by (Ex)_i = x_{i+1}.
(VT) S(x + c) = S(x) + c, for any constant sequence c.
(SI) S(αx) = αS(x) for α ≥ 0.

Here HT, VT, and SI stand for, respectively, horizontal translation invariance, vertical translation invariance, and scale independence. We note that the third axiom is sufficient, and less restrictive than the one chosen by Mallows, where α < 0 is also permitted. Clearly, compositions of smoothers are smoothers (see also Section III.A.5). We choose primitive smoothers ("electrons" and "protons"), denoted by ∧ and ∨, respectively, and defined by:
(∧x)_i = min{x_{i−1}, x_i} and (∨x)_i = max{x_i, x_{i+1}}.

We can build arbitrary finite compositions of these and call them (∧, ∨)-operators, but of particular interest are the ("neutrally charged") computational "atoms" that consist of compositions of equally many ∧ and ∨. Heuristically it is clear why we shall use "neutral atoms" for smoothing. Suppose we have a constant sequence as signal and a single isolated upward impulse is superimposed; then ∧ would remove such an impulse to yield the
pure constant sequence. If a single negative impulse were present, however, it would be widened by ∧. This is destructive to the shape of a signal, and it is easy to see that ∧ should rather be followed immediately by ∨. ∨ is a quasi-inverse of ∧ with respect to at least two norms, in that ‖∨∧x − x‖ ≤ ‖Ax − x‖ for each sequence x and (∧, ∨)-operator A (Rohwer, 2004a, 2004b, 2005). Since ∨ following ∧ cannot recreate the removed positive pulse, it preserves the constant sequence, while reducing the widened downward pulse to its original length. Thus the atom ∨∧ is less destructive than ∧ by itself. A similar argument favors ∧∨ instead of ∨. Similar arguments favor ∨^n∧^n and ∧^n∨^n as smoothers to remove, respectively, upward and downward impulses of width n, with minimal destruction. The couple (∧, ∨) yields arguably the simplest, and the most relevant, erosion-dilation pair in the sense of MM (Section III.C.1). We therefore define our atoms, Ln and Un, as follows.

Definition. For n ≥ 0 put Ln = ∨^n∧^n and Un = ∧^n∨^n. Any finite composition of these (e.g., L7 U3 L2 L5 U6) will be called a LULU operator.

It is an easy matter to verify that Ln and Un are dual to each other in the sense that Ln(−x) = −Un(x) for all x ∈ R^Z.
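These primitives and atoms are easy to experiment with. The following is a minimal sketch of our own (not from the text; the function names are our choice), implementing the width-2 min and max primitives on zero-padded finite sequences, building Ln and Un, and checking the duality Ln(−x) = −Un(x) on a small example:

```python
# Sketch of the LULU primitives on zero-padded finite sequences.
# "wedge" plays the role of the min primitive, "vee" the max primitive.

def wedge(x):
    """(wedge x)_i = min(x_{i-1}, x_i), with x_{-1} taken as 0."""
    xp = [0] + list(x)
    return [min(xp[i], xp[i + 1]) for i in range(len(x))]

def vee(x):
    """(vee x)_i = max(x_i, x_{i+1}), with x_N taken as 0."""
    xp = list(x) + [0]
    return [max(xp[i], xp[i + 1]) for i in range(len(x))]

def iterate(f, n, x):
    for _ in range(n):
        x = f(x)
    return x

def L(n, x):
    """L_n = vee^n wedge^n: removes upward pulses of width <= n."""
    return iterate(vee, n, iterate(wedge, n, x))

def U(n, x):
    """U_n = wedge^n vee^n: removes downward pulses of width <= n."""
    return iterate(wedge, n, iterate(vee, n, x))

x = [0, 0, 2, 2, 2, 9, 2, 2, 0, 0]          # an isolated upward impulse
assert L(1, x) == [0, 0, 2, 2, 2, 2, 2, 2, 0, 0]   # impulse removed
assert U(1, x) == x                          # U_1 keeps upward impulses
# Duality: L_n(-x) = -U_n(x)
assert L(1, [-v for v in x]) == [-v for v in U(1, x)]
```

Note how L1 removes the width-1 upward impulse while leaving the rest of the sequence untouched, exactly as the heuristic argument above predicts.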
Using these operators, we intend to demonstrate a conversion to different representations of x using only two operations, composition and subtraction, together with a reconstruction consisting only of addition. We note that a hypothetical physiology could conceivably perform the mapping onto ∧x and ∨x with a bank of 2N comparisons between neighboring pixels, wired to the back of the retina. Compositions such as ∨∧x and ∧∨x could then be similarly obtained by 2N comparisons on a second layer. This could happen with a time delay of 2δ, where δ is the time for a comparison. Similarly, arbitrary compositions of our computational atoms could be obtained with minimal time delay from such layers of comparators. The operators are, in principle, highly vectorizable. Finally, differences of any two even layers are obtainable as a sequence consisting only of differences of elements of x. Thus no numbers other than those between 0 and 2^m are ever needed (except for signs), and no roundoff error need occur. These are all the outputs needed for our candidate representation to be transmitted either through a serial port or through a parallel port.

2. The Discrete Pulse Transform

We present a transformation analogous to a Haar wavelet decomposition (Figure 2a) on the first level, namely, a projection onto the smoother (= more
FIGURE 2. Haar (wavelet) decomposition of a sequence (a). Decomposition of the same sequence using L1 U1 and U1 L1 (b).
smooth) sequence P1 x, and the (additive) complement W x = (I − P1)x, with

(P1 x)_{2i} = (1/2)[x_{2i−1} + x_{2i}] = (P1 x)_{2i−1}

defining P1. In this Haar decomposition, the sequence W x = (I − P1)x is the projection of x onto the (wavelet) subspace. With the wavelet basis this can be represented by N/2 coefficients. This is the highest resolution component of x. The Parseval identity (Pythagoras' law) yields

‖x‖² = ‖P1 x‖² + ‖(I − P1)x‖²,

so that the energy in the highest resolution level, as well as where it is localized in [1, N], is available for decision making. We replace P1, which is a (linear) projection, by the smoother LU = ∨∧∧∨ (or by U L = ∧∨∨∧), which cannot be a projection as it is nonlinear. In principle, this can be done in two steps: first U x = ∧∨x is used to compute U x − x = (U − I)x; then LU x = L(U x) is used to obtain U x − LU x = (I − L)U x. In Figure 2b the smoother parts of LU x and U Lx are in fact very similar and appear to have "more trend" than P1 x. The residuals are also very similar and sparsely populated.
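The first-level Haar projection and the quoted Pythagoras identity can be checked numerically. A minimal sketch of our own (assuming an even-length sequence):

```python
def haar_P1(x):
    """First-level Haar projection: replace each pair by its mean."""
    out = []
    for i in range(0, len(x), 2):
        m = (x[i] + x[i + 1]) / 2
        out += [m, m]
    return out

x = [3, 1, 4, 1, 5, 9, 2, 6]
p = haar_P1(x)                      # P1 x, the smoother part
r = [a - b for a, b in zip(x, p)]   # (I - P1) x, the wavelet part

energy = lambda v: sum(t * t for t in v)
# x = P1 x + (I - P1) x, and the Parseval (Pythagoras) identity holds:
assert all(abs(a - (b + c)) < 1e-12 for a, b, c in zip(x, p, r))
assert abs(energy(x) - energy(p) - energy(r)) < 1e-9
```

The identity holds exactly because P1 is an orthogonal projection; within each pair the residual (±d/2, ∓d/2) is orthogonal to the constant part.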
How else do LU and U L compare to P1? They seem computationally simpler, as the elements of LU x are none other than some of those of x, representable by m binary digits, and the elements of the first (highest) resolution level of the sequence x use only differences of elements of x, in the two sequences (U − I)x and (I − L)U x. Despite the ease of proving that (U − I)x cannot have two consecutive nonzero elements (neither can (I − L)U x), this does not suggest much yet. An interesting observation in both graphs of the smoother part resulting from the first mapping is that there is a readily observable "trend": each set of three consecutive elements is a monotone set. This will become important.

The crucial further step is in the way the smoothers are chosen for the next stage. This is where the best and simplest choices were missed, and some strong theory remained undiscovered. The usual Haar wavelet projection maps onto the next resolution level, which again is a (wavelet) subspace of dimension half that of the previous one. It is therefore natural to think dyadically. The consensus was often that choosing median smoothers instead of the linear filters results in much better decompositions (than wavelets) in many crucial applications, and that morphological filters do not do as well (Bijaoui et al., 1998). We propose and motivate specific (∧, ∨)-operators for this next stage in a decomposition. However, first we use the analogy of the discrete Fourier transform (DFT) and do not decompose dyadically, but linearly (incrementally). For conceptual convenience and economy, we view this basis transformation (the DFT) as a sequence of projections as follows:

x → P1 x → P2(P1 x) → P3(P2(P1 x)) → etc.,

with residuals given by

(I − P1)x = r^1, (I − P2)P1 x = r^2, etc.

Here (I − P1) is seen as the projection of x onto the highest-frequency component, (I − P2) projects the smoother part P1 x onto the second-highest frequency level, and so on. The aim is clear: we decompose the sequence into resolution (frequency) components, each codable by two numbers. If the sequence x is band limited, we can economize by omitting frequency levels with relatively little energy. For this, the Parseval identity is available, as is well known and exploited in signal processing. Quantizing judiciously (and/or thresholding) will economize without much damage to mapping back near x. LULU theory, and our experience of it, makes us choose to replace Pn as follows.

Definition. We define recursively Cn = Ln Un C_{n−1}, with C0 = I. Similarly, Fn = Un Ln F_{n−1}, with F0 = I.

With these definitions, we can now choose Pn to be Ln Un and have a sequence x decomposed into the resolution sequences r^n by
Definition. r^n = (I − Ln Un)C_{n−1}x, for n = 1, 2, . . . .

An alternative choice would be r^n = (I − Un Ln)F_{n−1}x. At this stage, it may seem strange to have x represented by a sequence of sequences, but further investigation demonstrates that these sequences are generally simple in structure and easy to represent. Instead of resorting to magic tricks, we turn to the heuristics behind the results that led to their discovery (Rohwer, 1989, 1999, 2002a, 2002b, 2002c, 2004a, 2004b, 2005, 2006). The theorems we quote have been proved over many years, in diverse journals, and may have been missed by researchers in image processing. Our choice of decomposition actually yields a simple representation of the signal in terms of block pulses of different widths. Something similar may well have been attempted elsewhere, but the theory behind our strategy yields a particularly useful set of decompositions that could probably not have been motivated before, and it also explains why some others may have been found unsatisfactory. We have found these perspectives useful.

3. The Lack of the Usual Linear Theory

A linear decomposition, like the DFT or the Haar wavelet decomposition, can be considered here as a sequence of successive approximations in smaller-dimensional subspaces (of smoother functions), saving the rougher part, which is the additive complement. In the choice of our wavelet decomposition, the first approximation is the projection H onto the half-dimensional subspace of piecewise constant splines with knots at the even integers. The component (I − H) yields the (additive) complement, which is a projection onto the appropriate wavelet subspace. We try to emulate this as far as possible. The operators at our disposal are (∧, ∨)-operators, and since these are not linear, we need to choose a corresponding approximation algorithm in the absence of a projection. The mappings from x to a particular resolution component in the choices above are therefore not projections.
In fact, the neutral operators, having as many ∧ as ∨ in their composition, are idempotent. Idempotent operators are nearly projections, but not sufficiently so. They are not even homogeneous, since LU(−x) = −U L(x), which is the fundamental duality property. They do, however, have many of the properties of projections. But first the following question should be asked: Why does one aim for projections? The following reasons suffice here:

• Projections are near-best approximations in their ranges in any norm, by the Lebesgue inequality, provided the operator norm is small.
• Pythagoras' law is applicable, ultimately yielding the Parseval identity, which is useful for basic decisions in quantizing, thresholding, and so on.
• There are only two eigenvalues, 0 and 1. These can be used to implicitly define "signal" and "noise."

Can we aim for the same objectives, avoiding the impossible demands of a projection? We do just that, in several consistently motivated stages:

• We characterize useful ranges of potential choices of operators.
• We show that our mappings onto these ranges are good approximations.
• There is a Lebesgue-type inequality to prove near-best approximation.
• We establish a preservation law in a norm on the sequence sets involved.
• We show that, for a given sequence x, the operator behaves "nearly linearly."
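The DPT defined above can be sketched in a few lines. The code below is our own illustration (zero-padded finite sequences, names our own): it extracts the residuals r^n = (I − Ln Un)C_{n−1}x and confirms that addition alone reconstructs x, each pulse appearing at the level matching its width:

```python
def wedge(x):
    xp = [0] + list(x)                      # min primitive, zero-padded
    return [min(xp[i], xp[i + 1]) for i in range(len(x))]

def vee(x):
    xp = list(x) + [0]                      # max primitive, zero-padded
    return [max(xp[i], xp[i + 1]) for i in range(len(x))]

def iterate(f, n, x):
    for _ in range(n):
        x = f(x)
    return x

def LnUn(n, x):
    """The smoother Ln Un: first the closing Un, then the opening Ln."""
    y = iterate(wedge, n, iterate(vee, n, x))     # Un x
    return iterate(vee, n, iterate(wedge, n, y))  # Ln (Un x)

def dpt(x, N):
    """Residuals r^n = (I - Ln Un) C_{n-1} x and the final smooth part C_N x."""
    res, c = [], list(x)
    for n in range(1, N + 1):
        s = LnUn(n, c)
        res.append([a - b for a, b in zip(c, s)])
        c = s
    return res, c

x = [0, 0, 0, 3, 0, 0, 4, 4, 0, 0, 0]   # a width-1 and a width-2 pulse
res, c = dpt(x, 3)
assert res[0] == [0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0]   # r^1: width-1 pulse
assert res[1] == [0, 0, 0, 0, 0, 0, 4, 4, 0, 0, 0]   # r^2: width-2 pulse
# Reconstruction is pure addition: x = r^1 + r^2 + r^3 + C_3 x
recon = [sum(t) for t in zip(*res, c)]
assert recon == x
```

The telescoping construction makes the reconstruction exact by design; the interesting point is that the residual levels separate the pulses by width.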
4. Preliminaries About Lattices

Since the linear theory intrinsically ingrained by general undergraduate education is not applicable, we resort to the order structure that Collatz (1975) always insisted is the more fundamental one, even in numerical analysis. We recall that a partially ordered set L = (L, ≤) is a lattice if any x, y ∈ L admit a supremum (least upper bound) x ∨ y and an infimum (greatest lower bound) x ∧ y. If just ∨ or just ∧ is universally defined, then L is a ∨-semilattice, respectively a ∧-semilattice. For x ≤ y the interval [x, y] consists of all z ∈ L with x ≤ z ≤ y. Generally our lattices L will be of the type L = R^S, where R is a base lattice and S is any set. In this case, x ≤ y for x, y ∈ L is defined component-wise, i.e., x_i ≤ y_i for all i ∈ S. In particular, the class L^L of all self-operators Φ : L → L becomes a lattice by setting Φ ≤ Ψ iff Φx ≤ Ψx for all x ∈ L. Of main interest are operators Φ : L → L that are increasing (other common names: order preserving, monotone, isotone, syntone) in the sense that x ≤ y implies Φx ≤ Φy for all x, y ∈ L. It is clear that a composition of increasing operators is again increasing. A lattice L is complete if every (possibly infinite) subset of L has an infimum and a supremum. For instance, each finite lattice is complete, but (Q, ≤) is not. For the remainder of Section II.A.4, L is supposed to be complete. A self-map α : L → L is a closing (other names: closure, closure operator) if it is increasing, extensive (x ≤ α(x) for all x ∈ L), and idempotent. Dually, β : L → L is an opening (other name: dual closure) if it is increasing, anti-extensive (β(x) ≤ x for all x ∈ L), and idempotent. Given a closing α : L → L, its range F := {αx | x ∈ L} is a closure system (i.e., closed under arbitrary infima). Conversely, when F ⊆ L is any closure system, then α(x) := inf{y ∈ F | y ≥ x} is a closing.
Mutatis mutandis the same relation holds between openings and dual-closure systems (i.e., subsets of L closed under arbitrary suprema).
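As a toy illustration of our own (not from the text, and ignoring that (Z, ≤) is not a complete lattice), take the closure system F of multiples of 4 in the integers; the induced map α(x) = inf{y ∈ F | y ≥ x} rounds up to the next multiple of 4, and the three defining properties of a closing can be checked directly:

```python
def alpha(x):
    """Closing on (Z, <=): round x up to the nearest multiple of 4."""
    return -(-x // 4) * 4

vals = range(-10, 11)
assert all(alpha(x) >= x for x in vals)                 # extensive
assert all(alpha(alpha(x)) == alpha(x) for x in vals)   # idempotent
assert all(alpha(x) <= alpha(y)                         # increasing
           for x in vals for y in vals if x <= y)
assert alpha(7) == 8 and alpha(8) == 8 and alpha(-7) == -4
```

The range of alpha is exactly the closure system F of multiples of 4, matching the correspondence described above.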
Details on all of the material above may be found, for example, in Blyth (2005). Stronger than being increasing, but less relevant in our article, is the property of being a ∨-homomorphism, that is, satisfying Φ(x ∨ y) = Φx ∨ Φy for all x, y ∈ L. Mutatis mutandis the same for ∧-homomorphisms.

5. The Ranges of the LULU Operators

Several efforts were made previously to characterize the range of the popular median operators, with the frustrating near-success obvious to all involved. The problems were often the so-called spurious roots of median smoothers, whose presence, sometimes only locally, spoiled things. Repeating the median, or using recursive versions, did yield roots and complicated characterizations of the main part of the range. These had been characterized laboriously in terms of "edges," "constant neighborhoods," "monotone sections," and so on, but became simpler only with the following type of definition. The idea appeared independently in several places, but we choose the following formulation.

Definition. Let M0 = R^Z be the set of all sequences that can appear in the analysis. Mn is defined as the subset of sequences x with the property that they are n-monotone, which means that {x_i, x_{i+1}, x_{i+2}, . . . , x_{i+n+1}} is monotone for all such subsets of consecutive elements of x.

It is worthwhile investing some thought in this class of properties, specifically to become accustomed to them as concepts of smoothness (or roughness) for sequences, and to the lack of simple alternatives. We note that local monotonicity has been a classical concept of smoothness in real analysis (Royden, 1969), where piecewise monotonicity is a concept linked to differentiability and bounded variation. We visualize the nested sequence of sets in Figure 3. Note that the notation is consistent, in that x ∈ M0 for each x and M_{n+1} ⊂ Mn. The intersection of all these monotonicity classes is M∞, the class of monotone (nonincreasing or nondecreasing) sequences.
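The n-monotonicity property is straightforward to test. A small checker of our own for finite sequences:

```python
def monotone(seg):
    inc = all(a <= b for a, b in zip(seg, seg[1:]))
    dec = all(a >= b for a, b in zip(seg, seg[1:]))
    return inc or dec

def in_Mn(x, n):
    """x is n-monotone: every n+2 consecutive elements form a monotone set."""
    w = n + 2
    return all(monotone(x[i:i + w]) for i in range(len(x) - w + 1))

x = [0, 1, 3, 2, 2, 2, 4, 5]
assert in_Mn(x, 0)                    # M0 contains every sequence
assert not in_Mn(x, 1)                # the window {1, 3, 2} is not monotone
# Nesting M_{n+1} subset of M_n: a 2-monotone sequence is 1-monotone
y = [0, 0, 1, 2, 2, 3]
assert in_Mn(y, 2) and in_Mn(y, 1)
```

Any window of 2 elements is trivially monotone, which is why M0 imposes no restriction at all.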
These properties of local monotonicity define a concept of smoothness, consistent with the interpretation of the popular median smoothers defined by (Mn x)i = median{xi−n, ..., xi, ..., xi+n}, in that Mn x = x for x ∈ Mn, but Mn x ≠ x in general for x ∈ Mj with j < n. This is easily deduced. Basic for the purposes here is that the popular median smoother is included in the LULU interval [Un Ln, Ln Un], since Un Ln ≤ Mn ≤ Ln Un.
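The inclusion Un Ln ≤ Mn ≤ Ln Un is easy to probe numerically. Below is a minimal sketch of the operators; the helper names are ours, and boundaries are handled by replicating end values, which emulates a constant bi-infinite extension of the finite sequence.

```python
import random
import statistics

def L(n, x):
    # Ln: maximum of windowed minima over windows of n+1 points; Ln <= I
    xp = [x[0]] * n + list(x) + [x[-1]] * n   # replicate ends
    mins = [min(xp[j:j + n + 1]) for j in range(len(x) + n)]
    return [max(mins[i:i + n + 1]) for i in range(len(x))]

def U(n, x):
    # Un is the dual of Ln: Un(x) = -Ln(-x); Un >= I
    return [-v for v in L(n, [-v for v in x])]

def M(n, x):
    # median smoother with window of 2n+1 points
    xp = [x[0]] * n + list(x) + [x[-1]] * n
    return [statistics.median(xp[i:i + 2 * n + 1]) for i in range(len(x))]

random.seed(0)
for _ in range(200):
    x = [random.randint(-9, 9) for _ in range(21)]
    for n in (1, 2, 3):
        lo, md, hi = U(n, L(n, x)), M(n, x), L(n, U(n, x))
        assert all(a <= b <= c for a, b, c in zip(lo, md, hi))
```

The test exercises the pointwise order Un Ln ≤ Mn ≤ Ln Un on random integer sequences; with integer data all comparisons are exact.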
ROHWER AND WILD
FIGURE 3. The nested classes of n-monotone sequences.
This was proved long ago (Rohwer, 1989), and is crucial in the analysis and comparison of large classes of smoothers. (See Example 4 in III.A.3 for an alternative proof.) The fundamental rôle of the LULU interval remains central in the theory, since a fundamental result (Rohwer, 2002a, 2002b, 2002c) established a Lebesgue-type inequality for all the operators in the LULU interval [Un Ln, Ln Un] that map into Mn. Such operators all map onto near-best approximations in Mn! Figure 4 shows the mutual order relations among the key LULU operators. Notice that no further order relations hold besides the ones displayed; for example, F4 and L7 U7 are not comparable. Figure 5 shows the complete LULU interval [U2 L2, L2 U2] along with the (non-LULU) operator M2 and the center B2 = U2 L2 ∨ (L2 U2 ∧ I) (see (II.C.2)). That the figures depict the order relation correctly follows from Theorem 14 in III.A.3 and Theorem 6.2 in Rohwer and Wild (2002), which gives a complete set of conditions governing the order relations among arbitrary LULU operators, including exotic instances such as

L10 U8 L9 U6 L7 U5 L5 U2 L2 < U9 L9 U7 L4 U4 L3 U3 L2 U1
and the incomparability of

L7 U4 L5 U2 and U5 L6 U3.

Slight experimenting demonstrates that the range of the first Haar projection H is in M1. (The ranges of H2, H3, ... are M5, M9, etc.) Thus these classes are sufficiently large. These classes of sequences were shown to be natural in the LULU theory, in that the common ranges of Ln Un and Un Ln are precisely Mn! Note that all compositions of Ln and Un are increasing smoothers, as are the medians. This property, together with idempotence, constitutes the axioms for
FIGURE 4. The order relations among the key LULU operators.

FIGURE 5. Some smoothers contained in the LULU interval [U2 L2, L2 U2].
morphological filters (III.C.2), of which the LULU operators turn out to be special choices. Inside a particular LULU interval there is a large choice of smoothers different from Un Ln, Ln Un, Fn, and Cn from which to select, using secondary additional attributes to make specific choices to suit our aims. See Figure 5 for the case n = 2; the morphological center B2 is discussed in II.C.2. It is also significant that all operators in a particular LULU interval [Un Ln, Ln Un] are equivalent in a strong sense. They all preserve precisely the sequences in Mn (the n-monotone ones), as Un Ln and Ln Un do (III.D.2). This guarantees many of the good properties of the well-known median smoothers, and even, eventually, for example, the first proof we have seen that
M1 is variation reducing (Rohwer, 2004a, 2004b), and an understanding of the effect of sections from the spurious roots of the median smoothers, which were characterized and all shown to be periodic (Zhou et al., 1992).

6. Separators as Generalized Projections

A second idea connected with the projection property was apparently universally missed and proved to yield a crucial breakthrough in understanding and theory. For a linear projection P, the complement I − P is automatically also a projection, so that (I − P)^2 = I − P. For a general idempotent operator S, the co-idempotence, that is, the idempotence of the (additive) complement I − S, does not follow in general. See III.E.5 for explicit counterexamples. Investigating this idea, motivated by a simple heuristic argument, initially proved the co-idempotence for some basic LULU smoothers, but a more thorough investigation yielded co-idempotence for the relevant neutral min–max compositions. Much later a different theory yielded this as a corollary. It is instructive to motivate and imprint this (unfamiliar to many) simple heuristic perspective here. We first define the concept of a separator.

Definition. A separator is a smoother that is idempotent and co-idempotent.

In LULU theory a smoother is an operator that is translation and scale independent (see II.A.1). This is easy to motivate and less restrictive than the axioms of Mallows (1980) would be. We note that morphological filters are required to be increasing and idempotent, but not co-idempotent. We seek co-idempotence for the following fundamental reason. We think of a machine (operator) S that is to accept x (signal plus noise) and to ideally have two outputs, Sx as signal and (I − S)x as noise, with the mass being preserved. We can imagine (inefficient) uranium enrichment separators in a cascade, where the signal (wanted isotope) and the noise (other atoms) are required to be separated.
Alternatively, our grandmothers had a simple (and more effective) separator called a milk separator. A simple machine accepted what came from a cow and had one input and two outputs. Pails collected the cream (signal) and the whey (noise) separately. (Note that a cheesemaker may, equally validly, interchange the concepts of signal and noise.) To test the effectiveness of such a separator, one simply took the pail with cream as input and observed what happened to the output. A good separator yielded only cream. On acceptance of the pail with whey, a second (possibly different) result yielded a second opinion on the quality of separation. The first test clearly is for idempotence; the second is for co-idempotence. We consider it worthwhile to fix this fundamental heuristic with the diagram shown in Figure 6. At this stage, it is instructive to show that any separator has only two eigenvalues: 0 and 1. The corresponding eigensequences can then clearly be used to define what is meant by signal and what is meant by noise.
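The two pail tests translate directly into code. Below is a sketch using L2 U2 as candidate separator S; the helper names are ours, and boundaries are handled by replicating end values.

```python
import random

def L(n, x):
    # Ln: maximum of windowed minima (flat window of n+1 points); Ln <= I
    xp = [x[0]] * n + list(x) + [x[-1]] * n
    mins = [min(xp[j:j + n + 1]) for j in range(len(x) + n)]
    return [max(mins[i:i + n + 1]) for i in range(len(x))]

def U(n, x):
    # Un: dual of Ln, Un(x) = -Ln(-x); Un >= I
    return [-v for v in L(n, [-v for v in x])]

def S(x):
    # candidate separator: S = L2 U2
    return L(2, U(2, x))

random.seed(1)
for _ in range(200):
    x = [random.randint(-9, 9) for _ in range(25)]
    cream = S(x)                                  # the "signal" pail: Sx
    whey = [a - b for a, b in zip(x, cream)]      # the "noise" pail: (I - S)x
    assert S(cream) == cream                      # first test: idempotence
    assert S(whey) == [0] * len(x)                # second test: S(I - S) = 0,
                                                  # i.e. co-idempotence of S
```

Feeding the cream pail back through the machine returns only cream; feeding the whey pail back returns nothing, which is exactly the equivalence (I − S)^2 = I − S ⇔ S(I − S) = 0 used in the text.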
FIGURE 6. Diagram of a two-stage separator cascade.
Theorem 1. (The Fundamental Separator Theorem; Rohwer, 2005.) A separator has only two eigenvalues, 0 and 1.

Proof. Let Se = λe, where λ is an eigenvalue and e a corresponding eigensequence. If λ ≥ 0, then scale independence gives S^2 e = S(λe) = λS(e) = λ^2 e, while idempotence gives S^2 e = Se = λe. But (λ^2 − λ)e = 0 implies that λ = 0 or λ = 1. If λ < 0, then (I − S)e = e − λe = (1 − λ)e. By co-idempotence, 0 = S(I − S)e = S((1 − λ)e) = (1 − λ)Se, because 1 − λ ≥ 0 and S is scale independent. But then (1 − λ)λe = 0, which is impossible for λ < 0.

Many readers may immediately ask: What else follows from meeting the requirements of a separator? This question is unexpectedly difficult to answer. Clearly a separator, being a smoother, has S(αx) = α(Sx) for α ≥ 0. (Our grandmothers may have had a stricter set of tests for a good milk separator.) Taking any nonnegative combination of the signal Sx and the noise (I − S)x should be predictable, in that for all α, β ≥ 0:
S(αSx + β(I − S)x) = αSx.   (1)

Does this follow from the axioms chosen? Apparently not in general. If β = 0, Eq. (1) together with scale independence amounts to idempotence. If α = 0, it amounts to the co-idempotence of S, since (I − S)^2 = I − S ⇔ S(I − S) = 0. If α = β, Eq. (1) reads S(αx) = αSx; in other words, it amounts to scale independence. The domain of consistency of S [the set of pairs (α, β) for which Eq. (1) holds] thus can easily be shown to include the three lines α = 0, β = 0, and α = β in the upper right quarter of the parameter plane given by α, β ≥ 0. What
about other parts of this region? This puzzle has recently been reexamined and yielded valuable results for all the idempotent min–max operators under discussion. Other properties than the minimal ones are involved, however. We shall return to the general consistency for α, β ≥ 0 involved in our choices later.

7. Replacing Pythagoras

Since we cannot expect orthogonality in our choices, tackling the problem of finding a conservation law, like that of Pythagoras in the case of projections (linear separators), seems never to have been attempted (at least, we found no evidence presented). The answer appeared incidentally, as a result of a belated response to one of the first questions Mallows asked after becoming acquainted with basic LULU theory: Can one perhaps prove a reduction in variation for the LULU operators Ln Un and Un Ln? For many reasons, the pursuit of an answer to his wise suggestion was shelved for a long time. When the question was revisited, the result was shattering. Initial success with Un and Ln for n = 1 proved to be simple:

T(x) = Σ_{i=−∞}^{∞} |xi+1 − xi| = T(U1 x) + T((I − U1)x).
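This conservation law is easy to check numerically. A sketch with our own helper names follows; boundary handling is by end-value replication, and integer data keeps the identity exact.

```python
import random

def L(n, x):
    # Ln: maximum of windowed minima; Ln <= I
    xp = [x[0]] * n + list(x) + [x[-1]] * n
    mins = [min(xp[j:j + n + 1]) for j in range(len(x) + n)]
    return [max(mins[i:i + n + 1]) for i in range(len(x))]

def U(n, x):
    # Un: dual of Ln; Un >= I
    return [-v for v in L(n, [-v for v in x])]

def T(x):
    # total variation of a finite sequence (a constant extension adds no terms)
    return sum(abs(b - a) for a, b in zip(x, x[1:]))

random.seed(2)
for _ in range(500):
    x = [random.randint(-9, 9) for _ in range(30)]
    u = U(1, x)
    r = [a - b for a, b in zip(x, u)]     # (I - U1)x
    assert T(x) == T(u) + T(r)            # exact conservation, hence T(x) >= T(U1 x)
    l = L(1, x)
    s = [a - b for a, b in zip(x, l)]
    assert T(x) == T(l) + T(s)            # dually for L1
```

The assertions hold with equality, not merely inequality, which is precisely the point of the conservation result.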
This is convenient: a smoother should reduce roughness. This, however, yields not only the required variation reduction T(x) ≥ T(U1 x), but the precise conservation between the input variation and the output variations of the separator U1 (and dually L1). Since the well-known measure of roughness, total variation, is a seminorm, we note that it becomes a norm in ℓ1, since T(x) ≤ 2‖x‖1. It is therefore a natural norm in the theory. For the sake of illustration, we include a simple consequence of the preservation result above. Considering only compositions of L1 and U1, we have a semigroup of only four elements (L1, U1, U1 L1, and L1 U1), fully ordered by L1 ≤ U1 L1 ≤ L1 U1 ≤ U1. The middle inequality is not as easy to prove as the others, but its validity ensures that there are only four elements in the semigroup (see III.D.1). Proving that U1 is variation reducing immediately proves that L1 is also, since U1(−x) = −L1(x). But then

T(x) = T(U1 x) + T((I − U1)x)
     = T(L1 U1 x) + T((I − L1)U1 x) + T((I − U1)x)
     ≥ T(L1 U1 x) + T((I − L1)U1 x + (I − U1)x).

The last step follows by the triangle inequality for seminorms. But (I − L1)U1 x + (I − U1)x = (I − L1 U1)x, so that T(x) ≥ T(L1 U1 x) + T((I − L1 U1)x) and, together with the triangle inequality, the variation preservation also follows for L1 U1. The preservation holds for U1 L1 as well. Thus all compositions of L1 and U1 are variation preserving, and therefore also variation reducing. It is natural to investigate the compositions of L2 and U2 (there are again only four) and, afterward, all compositions of L1, U1, L2, and U2. (There are only finitely many compositions of L1, ..., Ln, U1, ..., Un, and they are all idempotent; see III.D.1.) Originally the proofs for variation preservation by Un and Ln for increasing n were technically demanding, though instructive, but sufficient progress was made to yield a more general theory. A general observation is that, once some of the compositions at a specific value of n are proved variation preserving (VP), some other compositions follow easily, as the above example demonstrates. After many progressively more difficult successes, a simple yet almost unbelievable pattern surfaced. We return to this after completing the presentation of the primary ideas behind the decomposition. Provisionally we state that all compositions of Ln and Un, for all positive integers n, are variation preserving. This is closely linked with a very important type of shape preservation.

8. The Emerging Perspective

A strategy for nonlinear multiresolution decomposition can thus be understood as follows. Given x, separate it into L1 U1 x and the first residual, the sequence r^1 = (I − L1 U1)x. Our x in M0 is thereby mapped into M1 ⊂ M0, yielding the smoother sequence L1 U1 x. We choose L2 U2 as the second separator to map this onto the even smoother part L2 U2(L1 U1 x) ∈ M2 ⊂ M1 and the second residual r^2 = (I − L2 U2)L1 U1 x. Continuing thus, we reach a wavelet-like decomposition scheme, which it is not difficult to consider as analogous to a DFT, visualized as a sequential peeling off of the highest remaining frequency component left in the smoother part.
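The peeling scheme just described can be sketched in a few lines, with Ln Un chosen as the separator at every level (helper names are ours; boundaries by end-value replication).

```python
import random

def L(n, x):
    # Ln: maximum of windowed minima; Ln <= I
    xp = [x[0]] * n + list(x) + [x[-1]] * n
    mins = [min(xp[j:j + n + 1]) for j in range(len(x) + n)]
    return [max(mins[i:i + n + 1]) for i in range(len(x))]

def U(n, x):
    # Un: dual of Ln; Un >= I
    return [-v for v in L(n, [-v for v in x])]

def T(x):
    return sum(abs(b - a) for a, b in zip(x, x[1:]))

def dpt(x, N):
    """Peel off residuals r^n = (I - Ln Un) C_{n-1} x for n = 1..N."""
    c, res = list(x), []
    for n in range(1, N + 1):
        s = L(n, U(n, c))                          # Cn x = Ln Un C_{n-1} x
        res.append([a - b for a, b in zip(c, s)])  # r^n
        c = s
    return res, c                                  # residuals and smooth part CN x

random.seed(3)
x = [random.randint(-5, 5) for _ in range(40)]
res, smooth = dpt(x, 6)
# reconstruction is a telescoping sum, hence exact:
assert [sum(v) for v in zip(smooth, *res)] == x
# variation conservation level by level gives the total conservation:
assert T(x) == sum(T(r) for r in res) + T(smooth)
```

The second assertion is the variation conservation stated below in the text; with integer data it holds exactly.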
In this case, each residual is a 2D shadow of the original signal, codable by two real numbers or one complex number. Figure 7 demonstrates such a decomposition. It is easily observable that

T(x) = Σ_{i=1}^{n} T(r^i) + T(Cn x),

where we define r^i as the sequence (I − Li Ui)C_{i−1}(x), with Cn defined recursively by Cn = Ln Un C_{n−1} and C0 = I. We can illustrate and familiarize some ideas by considering a signal consisting of three block pulses of width 1, 2, and 3, with superimposed independent
FIGURE 7. A wavelet-like decomposition scheme.
and identically distributed (i.i.d.) random noise from a uniform distribution. Figure 8 decomposes the sequence x by two decompositions: one with Ln Un and the other with Un Ln. Keeping track of the reduction in total variation, note that a large quantity is removed at the first resolution level. Thresholding can identify the large pulses in all three levels. Noticeable are the small deviations between the two decompositions, which can be associated with an ambiguity of a fundamental type, arising from the definition of a pulse. This ambiguity will be significant if and where a piece of Nyquist frequency is present. This is left as an observation but can be made more precise. Significantly, a sharp reduction in total variation indicates a significant part of the "signal" appearing at a resolution level. In the case of Figure 8, this is at each of the resolution layers. The application to the smoothing and extraction of pulse-modulated signals is obvious, and this has been very successful in applications over several years. It is instructive to observe the good shape preservation apparent in comparing the resolution levels and the original signal, particularly if a Haar wavelet decomposition is imagined, as can be done by revisiting the example in Figures 2a and 2b. The wavelet decomposition always forces synthetic pairs of pulses to appear in the place of the significant shape of the three pulses representing the signal. We investigate this more thoroughly later. Not surprisingly, as n tends to infinity, Cn x tends to a monotone sequence. In ℓ1 there is only one such sequence, the zero sequence, with zero variation. When analyzing the residual sequences r^i = (I − Li Ui)C_{i−1}(x) later, we shall be convinced that T(Cn x) must be zero once n is larger than the number N of nonzero elements in x. This implies that Σ_{i=1}^{N} r^i is an alternative representation of x.
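For a sequence of finite support, the smooth part is indeed eventually annihilated, so the residuals alone represent x. A sketch, reusing the same hypothetical helpers as above:

```python
import random

def L(n, x):
    xp = [x[0]] * n + list(x) + [x[-1]] * n
    mins = [min(xp[j:j + n + 1]) for j in range(len(x) + n)]
    return [max(mins[i:i + n + 1]) for i in range(len(x))]

def U(n, x):
    return [-v for v in L(n, [-v for v in x])]

def dpt(x, N):
    c, res = list(x), []
    for n in range(1, N + 1):
        s = L(n, U(n, c))
        res.append([a - b for a, b in zip(c, s)])
        c = s
    return res, c

random.seed(4)
core = [random.randint(-5, 5) for _ in range(12)]
x = [0] * 10 + core + [0] * 10          # finite support with zero margins
res, smooth = dpt(x, len(x))
assert smooth == [0] * len(x)           # CN x = 0 once N exceeds the support
assert [sum(v) for v in zip(*res)] == x # x equals the sum of the residuals alone
```

With zero margins the end-replication padding coincides with the zero extension of a finite-support sequence, so the emulation is faithful to the bi-infinite theory.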
FIGURE 8. The LULU decompositions of a sequence x.
There seems to be no conceivable use for our decomposition yet, let alone clear advantages over others. We have simply decomposed x into a sum of sequences r^i, which can be called resolution components, in a nonlinear way. Any hope for use initially comes when the analog of the median transforms is considered. Many of these are judged to be better in practice than wavelet decompositions in many crucial applications (Bijaoui et al., 1998), apparently by consensus. If the medians Mn, with Un Ln ≤ Mn ≤ Ln Un, had been used in the decomposition above, the result would have been visually almost indistinguishable. All the visually pleasing properties are almost always there as well, but not guaranteed.

9. The Ambiguity Problem and Choices

The previous decompositions could, at any level, have used the alternative mapping Un Ln of the smooth component C_{n−1}(x) onto Cn(x) ∈ Mn (instead of Ln Un). Thus there are 2^n choices involved. The choice in the location of the secondary knots in the Haar decomposition yields similar
FIGURE 9. The fundamental ambiguity in the concept of impulsive noise.
choices. This problem is linked to a fundamental ambiguity in connection with pulses that is worth considering (Figure 9). Which choice should be made at each separation stage? It has already been stated that, since Un Ln ≤ Mn ≤ Ln Un for each n, a decomposition with Mn as smoother can be heuristically argued to be similar. There is no reason to expect it to be better, but the argument is not as easy as it first seems. Only upon analyzing the residuals do the advantages of LULU become apparent. At this stage, anyone with elementary computing skills can experiment with these decompositions, as all operations involved can be done with only three subroutines: to compute Ln, to compute Un, and to subtract two sequences. (We postpone discussion of computational expense as n grows, which is a problem with medians, and even with primitive algorithmic choices in the LULU decompositions.) Experimenting with random sequences x quickly yields many initially surprising observations that generally are not difficult to prove using the theory presented so far. A few are difficult to believe, formulate, or prove, and sometimes all of these. But all the theory presented so far is easily observed never to be contradicted for any choice of x. Many students have spent hours fascinated by what is seen experimentally; we encourage it. Here a simple example may be useful: a simple profile of a fort, with added impulsive noise and random noise, which is decomposed into resolution sequences r^n = Dn x = (I − Ln Un)C_{n−1}(x) to demonstrate the economization possible by choosing only a few quantized components. It is instructive to first decompose with the Haar decomposition for the purpose of comparison (Figure 10), and then to use an equivalent LULU decomposition (Figure 11).

10. Analysis of the Resolution Components

Only when the properties of the sequence of component sequences r^i become clearer can any use for the decomposition be appreciated. There are many
FIGURE 10. A Haar (wavelet) decomposition of a fort, without noise.
simplifications, which are not always used here, as our intention is to provide ideas that may be useful, rather than optimal answers. Consider a separator Sn, either Ln Un or Un Ln, at a specific level n in the decomposition. The smooth version that remains as signal at a stage is denoted by z and is in the smoothness class M_{n−1}, so that each subset Λi = {zi, zi+1, ..., zi+n} is monotone. We consider the separation of z into Sn z + (I − Sn)z. Without loss of generality, we may choose the case where z happens to be C_{n−1} x and Sn = Ln Un. Clearly, Sn can be viewed as a two-stage separation, with z separated into Un z + (I − Un)z first, and Un z then separated into Ln Un z + (I − Ln)Un z. Note that Ln Un z ∈ Mn is smoother than z and that the residual is still in M_{n−1}, as will be proved later.
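The fort experiment suggested above and the two-stage split can be sketched together. The fort profile below is an invented stand-in for the one in Figures 10 and 11 (all names and values here are ours, not the authors' data).

```python
import random

def L(n, x):
    xp = [x[0]] * n + list(x) + [x[-1]] * n
    mins = [min(xp[j:j + n + 1]) for j in range(len(x) + n)]
    return [max(mins[i:i + n + 1]) for i in range(len(x))]

def U(n, x):
    return [-v for v in L(n, [-v for v in x])]

def T(x):
    return sum(abs(b - a) for a, b in zip(x, x[1:]))

fort = [0] * 6 + [5] * 8 + [8] * 4 + [5] * 8 + [0] * 6
random.seed(5)
x = [v + random.choice([0] * 8 + [7, -7]) for v in fort]  # sparse impulsive noise

c = list(x)
for n in range(1, 6):
    uz = U(n, c)
    r_minus = [a - b for a, b in zip(c, uz)]   # (I - Un)z: nonpositive, since Un >= I
    s = L(n, uz)
    r_plus = [a - b for a, b in zip(uz, s)]    # (I - Ln)Un z: nonnegative, since Ln <= I
    assert all(v <= 0 for v in r_minus)
    assert all(v >= 0 for v in r_plus)
    assert T(c) >= T(s)                        # total variation never increases
    c = s
```

The sign checks make the two-stage structure of each residual visible, and the variation check records the sharp drops that flag resolution levels carrying "signal."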
FIGURE 11. Fort profile, with impulsive noise added, and significant sequences Ci and Di.
Clearly r^n is the sum of two sequences, since (I − Un)z + (I − Ln)Un z = (I − Ln Un)z. Thus it consists of two components. Since Un ≥ I = U0, the first sequence consists of nonpositive elements, and Ln ≤ I = L0 demonstrates the nonnegativity of the second component. Denoting (I − Un)z by r^−, we condense the original meticulous analysis into a brief form. Clearly, (I − A)^2 = I − A if and only if A(I − A) = 0.
Since Un is co-idempotent, Un(r^−) = Un(I − Un)z = 0, and it is clear that Un maps this component onto the zero sequence. It was easy to show that a separator S can have only the two eigenvalues 1 and 0, a property obviously shared by linear separators, which are projections. (Note that I − S is not a separator, though; while it enjoys (HT) and (SI), it lacks (VT), because (I − S)(x + c) = (I − S)x. See also III.A.5.) Thus r^− is an eigensequence of 0 for Un. A simple sequence of arguments shows that if an eigensequence of 0 is in M_{n−1}, then each nonzero (therefore negative) element of (I − Un)z is part of a block of n equal values. If this block did not have nonnegative elements on both sides, Un would not map this section onto zero, but onto the minimum of its two negative neighbors, which is a contradiction. Also, if any element of (I − Un)z is zero, then it is part of a section of at least n consecutive zeroes. Thus r^− is a sequence of negative block pulses of width n, separated from each other by sections of at least n zeroes. It is easy to store (or code) each block pulse by one integer (say, where it starts) and the absolute value of the amplitude (possibly approximately, in quantized form). A similar analysis yields a similar structure for the residual r^+ = (I − Ln)Un z, except that the pulses are positive, once it is demonstrated that Un z is, like z, in M_{n−1}. This is not difficult (but boring), and a simpler general result is available later. Some arguments (also simply replaceable later) prove that (Ln x)i ≠ xi ⇒ (Un x)i = xi, as well as the dual of this statement, so that the negative and positive blocks of r^− and r^+ cannot overlap. Clearly r^− and r^+ are orthogonal to each other, and so are the individual block pulses in each of these. T(r^−) is just twice the sum of the amplitudes of the pulses involved, and T(r^+) similarly.
Since the pulses do not overlap, it is easily seen that T(r^n) = T((I − Ln Un)z) is just twice the sum of all the absolute amplitudes of the blocks involved and is easy to calculate. Also T(r^n) = (2/n)‖r^n‖1, since r^n contains block pulses of width n that do not overlap. Considering r^n as a sequence containing block pulses of width n, we can now write r^n as Σ_{j∈Λn} bnj, where each bnj is a sequence with one block pulse of width n, starting at index j, and Λn is an index set. We represent x by

x = Σ_{n=1}^{N} r^n = Σ_{n=1}^{N} Σ_{j∈Λn} bnj.

Clearly, also

T(x) = Σ_{n=1}^{N} T(r^n) = Σ_{n=1}^{N} Σ_{j∈Λn} T(bnj) = Σ_{n=1}^{N} Σ_{j∈Λn} (2/n)‖bnj‖1.
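The block structure of the residuals can be checked directly: at level n, the maximal runs of equal nonzero values in r^n should be blocks of width exactly n, and T(r^n) should equal (2/n)‖r^n‖1. A sketch (our helper names; zero margins keep boundary effects away):

```python
import random

def L(n, x):
    xp = [x[0]] * n + list(x) + [x[-1]] * n
    mins = [min(xp[j:j + n + 1]) for j in range(len(x) + n)]
    return [max(mins[i:i + n + 1]) for i in range(len(x))]

def U(n, x):
    return [-v for v in L(n, [-v for v in x])]

def T(x):
    return sum(abs(b - a) for a, b in zip(x, x[1:]))

def dpt(x, N):
    c, res = list(x), []
    for n in range(1, N + 1):
        s = L(n, U(n, c))
        res.append([a - b for a, b in zip(c, s)])
        c = s
    return res, c

def runs(r):
    """Lengths of maximal runs of equal nonzero values."""
    out, i = [], 0
    while i < len(r):
        if r[i] == 0:
            i += 1
            continue
        j = i
        while j < len(r) and r[j] == r[i]:
            j += 1
        out.append(j - i)
        i = j
    return out

random.seed(6)
core = [random.randint(-5, 5) for _ in range(12)]
x = [0] * 12 + core + [0] * 12
res, smooth = dpt(x, len(x))
for n, r in enumerate(res, start=1):
    assert all(w == n for w in runs(r))            # only width-n block pulses
    assert n * T(r) == 2 * sum(abs(v) for v in r)  # T(r^n) = (2/n) * l1-norm
```

The run-length check also shows why each pulse is codable by one starting index and one amplitude.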
Further statements about the blocks of a resolution sequence can be made. One important statement is that, whereas different blocks in a particular resolution level are disjoint (they cannot overlap), blocks from different resolution levels are either disjoint or completely overlapping. This can be proved with a thorough algorithmic analysis or, as seen later, by a subtle argument using a fundamental and powerful consistency result. A representation of x is possible in the form of a matrix of amplitudes with at most N rows (indicating the resolution level) and relatively few columns (indicating the starting index). It is a simple argument to show that this matrix of amplitudes is sparse, in that it contains no more nonzero values than there are nonzero elements in the original sequence x. We note that this representation is not very convenient yet, to store or transmit, but not much worse than the original. Further coding, storage, quantizing, and truncating processes are not addressed here, except for some remarks later. What is clear at this stage is that we have a pulse decomposition of x with many guaranteed properties and connections visible already. Given the choices between Ln Un and Un Ln at each stage, the equivalent median decomposition is clearly related to this class of decompositions, since Un Ln ≤ Mn ≤ Ln Un. In practice, the decompositions are similar in many respects, but the medians have the guaranteed properties only nearly, and no prospect of coding or computational simplicity as simple as that of the LULU alternatives. Even restricted to M_{n−1}, Mn does not generally map onto Mn. For the purpose of perspective, we compare again with an equivalent Fourier decomposition, which is a basis transformation. The highest resolution level in the DFT has a 2D component, easily coded by two numbers (by separating, for instance, into a sine and a cosine wave, each of which consists of an alternating sequence of equal high-resolution pulses).
Each resolution layer (frequency layer) has a similar structure, and the economy of coding is clear for a band-limited signal x, in that only some resolution levels need to be stored. There is no significant loss of information, even in the presence of some random noise. Quantizing and thresholding can augment this economy with little destruction of the signal. This obviously holds for band-limited sequences. Large impulses of noise, however, no matter how brief, result in severe distortion, yielding a broad spectrum (Heisenberg's inequality is useful to remember; Chui, 1997). Visual images have significantly different properties. Edges, constant regions, and so on are important. We can think of such sequences as pulse limited, by analogy, and the block pulse decomposition above would result in the total number of block pulses being significantly less than the number of nonzero elements of the sequence x being decomposed. Simple argumentation shows that at each separation stage the number of pulses appearing in that resolution level reduces the number of different values of the input sequence by a similar amount, to yield the number of different elements in the smooth part (Rohwer, 2005). A section in a row of pixels, as in a typical (gray shade)
image on a television screen, may contain large constant sections. Thus the sequences of gray-scale values may have only a few different values, resulting in the total number of blocks in the pulse decomposition being few. Thus block-limited sequences yield economization. Extending this argument to sequences with superimposed random noise is not easy, as the decomposition is not linear, but in practice it seems to work equally well. Quantizing economizes as expected, if not better. Also note that there is no multiplication involved, as the work is done with integer values, with no rounding error. As the power spectrum in the DFT is useful for decision making, a total variation spectrum can be envisaged through the sequence ti = T(r^i). Thus we define the following formally:

Definition. The sequence t(x) = {ti = T(r^i(x))}, i ∈ Z, is the variation spectrum of x.

A glance at such a sequence yields insight. A pulse-limited sequence transmitted (as a typical signal for pulse-width-modulated control) should have an essentially local spectrum. Thresholding would yield the pulse width intact and all lesser-amplitude noise (impulsive, random, and drift) removed. Practical experimentation with simulation confirms astonishing robustness in the presence of even large-magnitude noise. All the above remarks exploit the analogy with the DFT, but we warn readers not to forget the constraints of nonlinearity on interpretation, particularly in the presence of noise. This must be considered carefully. Even here the DPT is useful for understanding and analysis. (Recall that the DFT is also an analysis tool for the design of filters; similarly, the DPT is useful for the design of smoothers.) Before considering the influence of noise, which cannot be treated in isolation as with a linear filter F, where F(signal + noise) = F(signal) + F(noise), we need to examine the fundamental sanity of even considering analyzing noise separately.

B. Component Modification and Consistency

1. The Fundamental Problem of Nonlinearity in Decompositions

Reconstruction of a pulse representation, as in the example of the decomposition with Ln Un, is simple and stable. We simply add the resolution levels. The more general problem of consistent decomposition is fundamental, as illustrated with the following arguments. Suppose we intend to transmit an (essentially) pulse-limited signal x economically and robustly. By prior agreement we send only the few largest
pulses, as extracted by our DPT, in quantized form. Assuming there is no transmission error, the recipient reconstructs the signal by addition of the pulses and, hopefully, can recognize the essence of the image. What happens when this reconstructed image is then decomposed by the DPT, to determine whether it consists of the pulses sent? To illustrate, a simpler question could be asked first. Given x, suppose only the resolution levels 5, 11, and 21 have significant percentage total variation and are selected. Adding these three sequences to form a sequence z, we expect the recognizable essence of the (image) sequence x. If we were to decompose z, would all the resolution components of x, except those selected, be zero, and would the others be identical to those of x? Recalling that our decomposition consists of nonlinear operators, this is not a trivial question, but it is important. The answer, surprisingly, is yes. But there is more. The proof took a long time and involved many initially unsuccessful attempts, despite belief in the truth observed from simulation. Proofs that yielded insight or explanation were considered important, not merely those yielding the truth. (There are reasonably simple algorithmic proofs not yet published.) It is important to understand this passion. LULU theory came from discoveries in critical classified problems. Many of them, in addition to the later industrial contract applications, had impressive success. This led to the belief in an underlying strong theory that slowly exposed itself. Another important concept, and tool, seemed to hide in the consistency mystery. When the theory had developed sufficiently, it became provable that the DPT of a particular sequence x is consistent to the extent that r^i(z) = αi r^i(x) for each i, whenever z = Σ_i αi r^i(x) with αi ≥ 0. This means that an arbitrary nonnegative combination of resolution levels of x is decomposed consistently.
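The consistency statement is easily probed numerically, as the text suggests. A sketch with our hypothetical helpers; integer data and integer weights keep the comparisons exact.

```python
import random

def L(n, x):
    xp = [x[0]] * n + list(x) + [x[-1]] * n
    mins = [min(xp[j:j + n + 1]) for j in range(len(x) + n)]
    return [max(mins[i:i + n + 1]) for i in range(len(x))]

def U(n, x):
    return [-v for v in L(n, [-v for v in x])]

def dpt(x, N):
    c, res = list(x), []
    for n in range(1, N + 1):
        s = L(n, U(n, c))
        res.append([a - b for a, b in zip(c, s)])
        c = s
    return res, c

random.seed(7)
core = [random.randint(-5, 5) for _ in range(10)]
x = [0] * 8 + core + [0] * 8
res_x, smooth = dpt(x, len(x))
assert smooth == [0] * len(x)

# nonnegative recombination of resolution levels, then re-decomposition
alphas = [random.choice([0, 1, 2, 3]) for _ in res_x]
z = [sum(a * ri for a, ri in zip(alphas, col)) for col in zip(*res_x)]
res_z, _ = dpt(z, len(x))
for a, rz, rx in zip(alphas, res_z, res_x):
    assert rz == [a * v for v in rx]   # r^i(z) = alpha_i * r^i(x)
```

Levels given weight 0 vanish from the decomposition of z, and the others come back scaled exactly, which is the consistency claimed in the text.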
This was refined and generalized, with little extra effort, to yield a similar consistency in the case of

z = Σ_i (αi r^i+(x) + βi r^i−(x)), with αi, βi ≥ 0,

when r^i+(z) = αi r^i+(x) and r^i−(z) = βi r^i−(x). But even more consistency was to be exposed. In a commercial application, the detection of an object in a noisy environment was made spectacular by making only this simple white shape more luminous. A single identified pulse in each resolution sequence had its amplitude multiplied to highlight this. Decomposing the resulting image gave precisely the original image with an amplified (highlighted) white shape, with no distortion elsewhere. No dark rings surrounded this amplified pulse, as is associated with the Gibbs phenomenon. This was found to be true but did not yet have a proof. This so-called Highlight Conjecture lay waiting in the presence of more pressing problems. An algorithmic proof
seemed possible again, but more insight was desired. Dirk Laurie has since proved it algorithmically, but a different pursuit, yielding an expansion of existing theory as well as a proof of the conjecture, was also successful. Here we shall take it for granted, being easily verifiable by simulation, and examine the consequence. We can remember quite simply the following statement: with respect to the cone generated by the individual pulses, the decomposition is as linear as is required for images. (Images do not have negative luminosities.) The general formal statement of the Highlight Conjecture is thus the following:

If x = Σ_{n=1}^{N} Σ_{j∈Λn} bnj(x) and z = Σ_{n=1}^{N} Σ_{j∈Λn} αnj bnj(x), with αnj ≥ 0 arbitrary weights, then bnj(z) = αnj bnj(x) are the pulses of the decomposition of z.

Equivalently, we can say: the DPT acts linearly on the cone generated by the block components of a particular sequence x. One immediate consequence is an argument as to why two of the block components of a particular sequence x either overlap completely or not at all. Consider block pulses B1 and B2 of width m and n, respectively, and let z = (1/|B1|)B1 + (1/|B2|)B2, where |Bi| denotes the absolute amplitude of Bi. Then z decomposes consistently. Assume m > n (otherwise they are on the same resolution level and disjoint by a previous argument); z consists of two pulses of amplitude +1 or −1. Suppose both have amplitude 1. If they were to partially overlap, there would be a block of width less than n that is of amplitude 2, larger than the values outside. Noting that z ∈ M_{n−1}, since it decomposes consistently, this is a contradiction, since we have a pulse of width less than n in z. A similar argument holds for the other amplitudes.

2. Shape Preservation in the DPT

As hoped, the struggle for illuminating proofs of the primary consistency results was worthwhile, yielding a very good idea with important economization consequences in various parts of LULU theory. We introduce the ideas as follows.
One frequent comment in image processing is the good “edge preservation” of median smoothing. Clearly the comparison is with linear smoothers (or digital filters). A failure of wavelet decompositions is associated precisely with the oscillatory impulse responses. The classic example of unsatisfactory decompositions is a photograph of the night sky, with high-amplitude brief
pulses produced by stars. The LULU decompositions do as well as is reported about medians by image-processing experts, and even better. Upon thorough investigation of the theory of LULU operators, we noted progressively more local types of trend preservation.

1. Sequences that are monotone are preserved (as with median smoothers). Linear filters have a weaker preservation here, in that the outputs may again be monotone, but not necessarily identical.
2. The requirement of monotonicity can be weakened to require only monotonicity on the support of the smoother. We note that the support of Un and Ln for the values (Unx)i and (Lnx)i is the set {xi−n, . . . , xi, . . . , xi+n}. Therefore LnUn and UnLn can be considered to have a support of {xi−2n, . . . , xi+2n}.
3. The classes Mn of n-monotone sequences consist of eigensequences w.r.t. the eigenvalue 1 of all operators in the LULU interval [UnLn, LnUn]. Thus local monotonicity is preserved in a precise sense. This is preservation under an even weaker requirement than in 2, since a sequence is preserved where {xi, xi+1, . . . , xi+n+1} is monotone for each i.

Noting that the median M1 ∈ [U1L1, L1U1] preserves the value xi when xi−1 ≤ xi ≤ xi+1, it was observed, and proved, that L1U1 and U1L1 do this as well, despite formally having the wider support {xi−2, xi−1, xi, xi+1, xi+2}. Similarly, all powers of M1, and even the recursive version (with infinite support in the past), clearly preserve xi under this local condition. A more audacious question emerged slowly: Is there some further, even more local, preservation hidden underneath? We consider the following basic trend-preserving property.

Definition. An operator P is neighbor trend preserving (NTP) if for any sequence x, at each index i, xi ≥ xi+1 implies (Px)i ≥ (Px)i+1 and xi ≤ xi+1 implies (Px)i ≤ (Px)i+1.
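The definition translates directly into a finite check. The sketch below is ours (Python; U1 and L1 are implemented with constant extension beyond the ends, an implementation convention): it verifies exhaustively over short sequences that the atoms U1 and L1 are NTP, and exhibits a simple max-of-neighbors operator that is not:

```python
from itertools import product

def U1(x):
    """(U_1 x)_i = min{max{x_{i-1}, x_i}, max{x_i, x_{i+1}}}, constant extension at the ends."""
    p = [x[0]] + list(x) + [x[-1]]
    return [min(max(p[i], p[i + 1]), max(p[i + 1], p[i + 2])) for i in range(len(x))]

def L1(x):
    """The dual atom: max of the two neighboring mins."""
    p = [x[0]] + list(x) + [x[-1]]
    return [max(min(p[i], p[i + 1]), min(p[i + 1], p[i + 2])) for i in range(len(x))]

def is_ntp(op, x):
    """Check the definition pair by pair: op may never reverse the trend of neighbors."""
    y = op(x)
    for i in range(len(x) - 1):
        xd, yd = x[i + 1] - x[i], y[i + 1] - y[i]
        if (xd > 0 and yd < 0) or (xd < 0 and yd > 0) or (xd == 0 and yd != 0):
            return False
    return True

# The atoms are NTP on every sequence over a small alphabet:
assert all(is_ntp(U1, list(x)) and is_ntp(L1, list(x))
           for x in product(range(3), repeat=5))

# A one-sided max of neighbors, by contrast, is not NTP:
vee = lambda x: [max(x[i], x[min(i + 1, len(x) - 1)]) for i in range(len(x))]
assert not is_ntp(vee, [2, 0, 3])
```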
If the atoms Un and Ln, from which all the smoothers involved in the pulse decompositions are assembled as molecules, are NTP, then all the molecules are also NTP: compositions directly inherit the NTP property from their constituents. The atoms turned out to be easily shown to be NTP.¹ The simplest implication is clear: no smoother involved in our decomposition ever changes the order of neighbors. In linear approximation theory, this is a Holy Grail, found only in low-order approximation operators, such as
¹ Note that ∨ and ∧ are not NTP. A polynomial-time NTP test in the general setting of lattice stack filters (III.E) is given in Wild (2005, Theorem 12).
Bernstein operators and variation-diminishing spline approximation operators. Here it is everywhere, but even more is readily apparent. The DPT involves not only LULU operators (morphological filters). These are all NTP, apart from being increasing (which is another form of order preservation) and having all the properties of a separator, including variation preservation. The other operators involved are the complements (or residual operators), like I − S, where S is a smoother. Recall from II.A.10 that I − S is not a smoother. Considering all the compositions of the LULU atoms and their complements, we noted that they trivially inherit variation preservation. Investigation shows that they are all NTP as well, and it is natural to define a stronger property as follows.

Definition. A separator S is fully trend preserving (FTP) if both S and I − S are NTP.

Note, from the definition, that for such separators both S and I − S are difference reducing, and that

|xi+1 − xi| = |Sxi+1 − Sxi| + |(xi+1 − Sxi+1) − (xi − Sxi)|,

so that we have variation preservation on any subset chosen for the summation. In LULU theory the two properties FTP and VP are actually equivalent, in that each implies the other. In the typical decomposition, one of the first simple, and surprising, consequences is that each of the resolution sequences emulates the local order of the original sequence. Note (from the consistency results of the previous paragraph) that the mapping from x onto a particular resolution sequence r^n is as close to a projection as can be permitted, since it can be viewed as coming from a separator. Although we have not proved this yet, it is idempotent and co-idempotent, FTP, and thus also variation preserving. This is also apparent from the Parseval identity
T(x) = Σ_{i} T(r^i) = Σ_{i≠n} T(r^i) + T(r^n).
Even the mapping from x to any specific pulse in the decomposition has this property, as the Highlight Conjecture is true. This merits reflection. Using the analogy of the DFT, where the individual 2D resolution levels can be considered as shadows of a vector x on particular walls orthogonal to each other, if the projection were FTP, each shadow would be order consistent everywhere with the vector x! This clearly is impossible; there are only trivial linear projections that are FTP, the identity I and the zero operator O. However, for the typical DPT, every vector in the cone of the resolution sequences
r i of x is order consistent with x. The same holds for every vector in the cone generated by the individual pulses of all the resolution levels. Stated differently: an edge in any resolution component implies an edge at the same position in the original sequence x. This principle also holds in any economizing reconstruction omitting arbitrary resolution levels. We could not expect more. Edges are preserved in any partial reconstruction. Note how the Haar wavelet in Figure 2a smears the most significant edge in the data and the equivalent LULU smoothing in Figure 2b preserves the same edge. Also note the following trivial but important result (Rohwer, 2005). Theorem 2. Let A be NTP. Then x ∈ Mn ⇒ Ax ∈ Mn . Corollary 3.
r^n(x) = (I − LnUn)Cn−1(x) ∈ Mn−1.
We note that this corollary, previously mentioned, together with the fact that r^n(x) is an eigensequence of LnUn w.r.t. the eigenvalue 0 (by the co-idempotence), proves that r^n(x) consists of nonoverlapping blocks of width n that are separated by at least n zeros if they have the same sign. The residual levels can be characterized in this way, but we do not pursue this here.

3. Full-Trend Preservation and Consistency

NTP and FTP operators abound in the LULU decomposition. These properties are easily inherited under composition, and the idea emerged that they could be the key to a simple proof structure for the observed consistencies. At this stage the following strategy is the simplest attained to date. Two fundamental theorems are established, from which the primary consistency results follow by relatively simple arguments. The first fundamental consistency theorem is rather technical to prove (Rohwer, 2005). It is not a trivial theorem, as Un and Ln are decidedly nonlinear operators. It inspired a more general result, shown in III.A.4.

Theorem 4. Let x ∈ Mn−1 and let A be FTP. Then

Un(I − AUn)x = Unx − UnAUnx

and

Ln(I − ALn)x = Lnx − LnALnx.

The above theorem easily yields the following one. The proof is presented here, as it is simple and instructive.
Theorem 5. Let x ∈ Mn−1 and let A be FTP. Then

LnUn(I − ALnUn)x = LnUnx − LnUnALnUnx

and

UnLn(I − AUnLn)x = UnLnx − UnLnAUnLnx.

Proof.

LnUn(I − ALnUn)x = Ln(Unx − UnALnUnx)        (by Theorem 4)
                 = Ln(I − UnALn)Unx.

Un is FTP, so UnA is also FTP, and Unx ∈ Mn−1. Thus the last expression can be simplified by Theorem 4 to

(Ln − LnUnALn)Unx = LnUnx − LnUnALnUnx.

Using the above theorems (Rohwer, 2005) yields the following corollaries.

Corollary 6.
(1) Cj(I − Cn) = Cj − Cn and Fj(I − Fn) = Fj − Fn for j ≤ n.
(2) Cn, Cm, I − Cn, I − Cm commute with each other for all m and n, and so do Fn, Fm, I − Fn and I − Fm.

Clearly the idempotence and co-idempotence of Cn and Fn have come about as a by-product. There were many alternative proofs of idempotence, but co-idempotence had generally been considered difficult to prove.

Example 1. Let us prove in detail the co-idempotency of C2, that is, that C2(I − C2) = 0. We start with

C1(I − C2) = L1U1(I − C2L1U1)
           = L1(U1 − U1C2L1U1)        (Theorem 4 with A = C2L1)
           = L1(I − U1C2L1)U1
           = (L1 − L1U1C2L1)U1        (Theorem 4 with A = U1C2)
           = L1U1 − C2
           = C1 − C2.

Now proceed as follows:

C2(I − C2) = L2U2C1(I − C2)
           = L2U2(C1 − C2)            (by the above)
           = −U2L2(C2 − C1)           (by duality)
           = −U2L2(L2U2 − I)L1U1
           = L2U2(I − L2U2)L1U1       (by duality)
           = L2(U2 − U2L2U2)L1U1      (Theorem 4 with A = L2)
           = L2(U2L1U1 − L2U2L1U1)    (U2L2U2 = L2U2, III.D.1)
           = L2(I − L2)U2L1U1
           = 0                        (co-idempotency of L2).
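The identity just derived can be confirmed numerically. The sketch below is ours (Python, with constant extension beyond the ends of the sequence as the boundary convention); it checks C2(I − C2)x = 0 exhaustively over short sequences on a small alphabet:

```python
from itertools import product

def U(x, n):
    # (U_n x)_i: min over windows of n+1 consecutive terms containing i of their max.
    p = [x[0]] * n + list(x) + [x[-1]] * n
    return [min(max(p[j:j + n + 1]) for j in range(i, i + n + 1)) for i in range(len(x))]

def L(x, n):
    # Dual of U: max over the same windows of their min.
    p = [x[0]] * n + list(x) + [x[-1]] * n
    return [max(min(p[j:j + n + 1]) for j in range(i, i + n + 1)) for i in range(len(x))]

def C2(x):
    return L(U(L(U(x, 1), 1), 2), 2)                   # C2 = L2 U2 L1 U1

for x in product(range(3), repeat=6):
    residual = [a - b for a, b in zip(x, C2(x))]       # (I - C2) x
    assert all(v == 0 for v in C2(residual))           # C2 (I - C2) x = 0
```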
The second fundamental consistency theorem has a proof similar in technical character to the first (Rohwer, 2005), and is generalized later. For the purposes required here, we simply state the important relevant result.

Theorem 7. Let x ∈ Mn−1. If A and B are NTP, A commutes with Un (respectively Ln), and α, β ≥ 0 are real numbers, then

Un(αA + βBUn)x = (αUnA + βBUn)x,

respectively,

Ln(αA + βBLn)x = (αLnA + βBLn)x.

Roughly speaking, Un and Ln act linearly under the given conditions, since UnβBUn = βBUn, as βB is NTP. (This is easy to prove from the invariant series of Un; see III.D.2.) Some simple corollaries are the following and their duals:

Un[α(I − LnUn) + βLnUn]x = αUn(I − LnUn)x + βLnUnx,

LnUn[α(I − LnUn) + βLnUn]x = βLnUnx.

It is noteworthy to consider the last result as a generalization of the primary cases (α, β) = (0, 1) (idempotence of LnUn) and (α, β) = (1, 0) (co-idempotence of LnUn). Curiously, this generalization does not seem to hold for separators in general. However, we can now argue the primary consistency results of the LULU decomposition of a sequence x, as discussed previously. We consider again the multiresolution decomposition in the form of a mapping onto a finite set of resolution sequences and a residual (smoothed) shape:

DPT(x) = [D1x, D2x, . . . , DNx, CNx].

The following theorem holds (Rohwer, 2005):

Theorem 8. Let x ∈ M0. Let n ≤ N and αi ≥ 0. If

z = Σ_{i=1}^{n} αi Di x,

then

Di z = αi Di x for 1 ≤ i ≤ n.
Proof. Noting that the operators Di = (I − Ci)Ci−1 are all FTP, we see that

Σ_{i=m}^{n} αi Di x = Σ_{i=m}^{n} αi (I − Ci)Ci−1 x = Am Cm−1 x ∈ Mm−1,

since Am is NTP, as αi ≥ 0. Assume that Cj−1 z = (Σ_{i=j}^{n} αi Di) Cj−1 x for j < n. (Clearly this is true for j = 1, by the definition of z.) But then

Cj z = LjUj (αj Dj + Σ_{i=j+1}^{n} αi Di) Cj−1 x
     = LjUj (αj Dj + Aj+1 Cj) Cj−1 x
     = LjUj [αj (I − LjUj) + Aj+1 LjUj] Cj−1 x
     = Lj [αj Uj (I − LjUj) + Aj+1 LjUj] Cj−1 x.

The last equality comes from the second fundamental theorem (Theorem 7), since I − LjUj is NTP and commutes with Uj by Theorem 4. Also Aj+1 Lj is NTP and Cj−1 x ∈ Mj−1. Thus

Cj z = Lj [αj (I − Lj)Uj + Aj+1 LjUj] Cj−1 x
     = Lj [αj (I − Lj) + Aj+1 Lj] Uj Cj−1 x
     = [αj Lj (I − Lj) + Aj+1 Lj] Uj Cj−1 x,

by Theorem 7, since Lj commutes with (I − Lj) and Uj Cj−1 x ∈ Mj−1. Since Lj(I − Lj) = (I − Lj)Lj = 0 by the idempotence and co-idempotence of Lj, we finally derive Cj z = Aj+1 LjUj Cj−1 x = Aj+1 Cj x and Dj z = (Cj−1 − Cj)z = αj Dj x. Thus Cj z = (Σ_{i=j+1}^{n} αi Di) Cj x, and the standard induction argument yields Di z = αi Di x for i ≤ n.

As noted, we can interpret the above result as follows: the decomposition acts linearly on the cone generated by the resolution vectors of x. A similar argument yields a more general consistency result, in that the positive and negative parts of each resolution sequence can be modified by their own factors αj+ and αj− and still yield a consistent decomposition. (The cone is now in a vector space of twice the dimension.) As mentioned, even this result can be generalized to the cone generated by the individual pulses (the Highlight Conjecture).
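Theorem 8 can be watched in action on a small example. In this Python sketch (ours; constant boundary extension assumed as in our earlier sketches), x consists of a width-1 pulse and a well-separated width-3 pulse; rescaling the resolution sequences by nonnegative weights and decomposing the result returns exactly the rescaled sequences:

```python
def U(x, n):
    p = [x[0]] * n + list(x) + [x[-1]] * n
    return [min(max(p[j:j + n + 1]) for j in range(i, i + n + 1)) for i in range(len(x))]

def L(x, n):
    p = [x[0]] * n + list(x) + [x[-1]] * n
    return [max(min(p[j:j + n + 1]) for j in range(i, i + n + 1)) for i in range(len(x))]

def resolution_levels(x, N):
    """D_i x = (C_{i-1} - C_i) x for i = 1..N, with C_i = L_i U_i C_{i-1}."""
    c, levels = list(x), []
    for n in range(1, N + 1):
        cn = L(U(c, n), n)
        levels.append([a - b for a, b in zip(c, cn)])
        c = cn
    return levels

x = [0] * 5 + [5] + [0] * 4 + [2, 2, 2] + [0] * 5   # width-1 and width-3 pulses
D = resolution_levels(x, 3)

alpha = [0.5, 1.0, 3.0]                             # arbitrary nonnegative weights
z = [sum(a * d[i] for a, d in zip(alpha, D)) for i in range(len(x))]

# Decomposing z gives back the rescaled resolution sequences: D_i z = alpha_i D_i x.
for a, d, dz in zip(alpha, D, resolution_levels(z, 3)):
    assert dz == [a * v for v in d]
```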
A proof of the Highlight Conjecture emerges from the theory above, after a refined investigation into the consistency of the primary separators LjUj in the decomposition (an article presenting this has been submitted to Quaestiones Mathematicae). At this stage, we feel convinced that the two fundamental theorems presented here are crucial in a theory of consistency that is sufficient for the representation we propose for the mathematics of vision. We even suspect that such a consistency is appropriate for a conversion to such a representation. It certainly is achievable in the case of our choice.

C. Practical Considerations

1. The Effect of Noise in the DPT

To avoid confusion, we only discuss some useful ideas for specific choices of definitions. Given a signal x, known (or assumed) to be pulse limited (meaning that the chosen DPT gives all its pulses between the widths N and M), we can clearly conclude that the appearance of pulses on other resolution levels must be due to noise contamination, particularly likely additive noise. Quantizing a function when sampling introduces i.i.d. noise from a uniform distribution. Thus a point of departure is to study the decomposition of an i.i.d.-generated random sequence (from a chosen distribution f). A first useful observation is that such a sequence is clearly far from pulse limited, in that more than half the total number of pulses will be in the first resolution level of any of the LULU-based DPTs. The argument is simple but illustrative and important. Analyzing the case with L1U1, and therefore U1 first: a downward pulse comes out iff xi < xi−1 and xi < xi+1, i.e., iff (U1x)i > xi. Since these values are i.i.d., the probability is 1/3 for a continuous distribution. A similar, slightly more elaborate analysis shows that (L1U1x)i < (U1x)i iff xi > xi−1, xi−2, xi+1, xi+2, which occurs with probability 1/5.
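These two probabilities are easy to confirm by simulation; the sketch below (ours, Python, uniform i.i.d. samples) counts the relevant local configurations directly:

```python
import random

random.seed(1)
N = 200_000
x = [random.random() for _ in range(N)]

# x_i is a strict local minimum of three i.i.d. values with probability 1/3 ...
down = sum(1 for i in range(1, N - 1) if x[i] < x[i - 1] and x[i] < x[i + 1])
assert abs(down / (N - 2) - 1 / 3) < 0.01

# ... and exceeds its four nearest neighbors with probability 1/5.
up = sum(1 for i in range(2, N - 2)
         if x[i] > max(x[i - 1], x[i - 2], x[i + 1], x[i + 2]))
assert abs(up / (N - 4) - 1 / 5) < 0.01
```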
Therefore, on average 1/3 + 1/5 of the total number of indexes will carry pulses in r^1, while the total number of pulses in the full DPT is not more than the number of indexes. Similar analysis for lower-resolution pulses is possible but increasingly difficult. Simulation is simple, and the essentially strongly decreasing presence of pulses is easy to remember. This is independent of the distribution involved. The amplitude of the pulses clearly is distribution dependent, as is the variation spectrum, which is, however, also strongly decreasing. If the noise comes from a discrete distribution, the finite probability of equality of two of the three values yields a slight reduction in probability, and so forth. Analysis of the statistics involved has received attention from more than one researcher. One publication has appeared (Conradie, de Wet, and Jankowitz,
2006), and others are in preparation, one of which addresses the estimation of parameters of the distribution from the first resolution level. Rough arguments, along the lines used in the pure-noise case above, indicate that adding a signal consisting of wide pulses only will not have much effect on the first resolution level. The first resolution level therefore provides much information on the added i.i.d. noise present; this can be used to select appropriate thresholding to remove the noise to a significant degree. These ideas are still in their infancy, however, and require more thorough research. Impulsive noise and its effect are relatively easy to investigate; the DPT was intended to be a useful tool for this purpose. If the pulses are brief and rare, there should be relatively little distortion, even when the amplitude is arbitrarily large. Minimal damage may result to the width of wider pulses, but generally this is no more than the width of the noise impulses. The amplitude of the (wide) signal pulses cannot be affected too much, due to the FTP property of all the projections involved, but more precise statements require more analysis, and perhaps new ideas. The addition of slow drift to a signal can also be considered noise. Since drift only marginally affects the local order in the comparisons involved in the DPT, it can be expected to lead to only marginal distortion. Periodic noise of low frequency should have a similar effect. Noise at the Nyquist frequency is interesting, but this is part of a larger research investigation into the differences in the resolution components when any specific projection, say LnUn, is replaced by UnLn. At this stage, it is sufficient to state that simulation on the type of signal used in pulse width-modulated control, as is common in model aircraft, is easy to experiment with and illuminating.
Here the signal is a train of essentially square pulses (updated many times per second), and the information transmitted is in the width of these pulses. Adding noise of the above-mentioned types demonstrates a remarkable robustness. In a problem of estimating a source location from the time-of-arrival differences at receivers, the first attempt with LULU smoothers resulted in a significant improvement on the best previous time-of-arrival triggers of an international company. The FTP property is clearly a tool with which to research these observations. The implication for the problem of edge preservation, clearly important in image processing, is obvious. We refer readers to Section II.D, where the ideas and the simplicity of the tools used are illustrated by novices to the field of image processing.

2. The Relationship with Median Decompositions and Alternatives

Two obvious questions, which also lead to natural extensions, need to be answered. (1) How does an equivalent median decomposition fare? (2) Why have morphological equivalents been considered inferior?
We start with the basic observation that Ln ≤ Mn ≤ Un, which is instructive to derive, given the definitions:

(Mn x)i = median{xi−n, . . . , xi, . . . , xi+n},
(Ln x)i = max{min{xi−n, . . . , xi}, . . . , min{xi, . . . , xi+n}} and
(Un x)i = min{max{xi−n, . . . , xi}, . . . , max{xi, . . . , xi+n}}.

If we note that the median (Mn x)i is the minimum of all the maxima of subsets of n + 1 elements from the set {xi−n, . . . , xi+n}, as well as the maximum of all the minima of all such subsets, the inequalities follow, since Ln and Un involve extrema over only some particular subsets. Less obvious, stronger, and more difficult to prove is that UnLn ≤ Mn ≤ LnUn. It is stronger since Ln ≤ I ≤ Un, and therefore Ln ≤ UnLn ≤ Mn ≤ LnUn ≤ Un, due to the fact that the operators are all syntone. Natural alternatives in the decompositions seem to be Ln, Un and others between these values. We consider first the choice of Ln as separators, which seems natural, given that we have no negative luminosities, and we would expect to decompose a signal by peeling off upward pulses of increasing width as follows: x → L1x → L2L1x → etc., with residuals given by (I − L1)x = r^1, (I − L2)L1x = r^2, etc. Experimentation (and analysis) easily demonstrates that the resolution sequences are composed of pulses. (In this case, they are all positive pulses.) There are some serious disadvantages, however, as quickly becomes apparent. Suppose we omitted the two highest-resolution levels for a more economical representation; then the rest, z = Σ_{i=3}^{N} r^i(x), would be a strongly biased approximation to x, since z = L2L1x ≤ x. Furthermore, the alternative decompositions used previously would not agree that z has no pulses of width less than 3. (There is a lack of consensus on the meaning of signal.) Ln and Un do not have the same range, whereas LnUn and UnLn do.
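The chain of inequalities is easy to spot-check. The sketch below (ours; Python, constant boundary extension, the median taken over the 2n + 1 values centered at i) verifies Ln ≤ UnLn ≤ Mn ≤ LnUn ≤ Un pointwise for n = 1 over all short sequences on a small alphabet:

```python
from itertools import product

def U(x, n):
    p = [x[0]] * n + list(x) + [x[-1]] * n
    return [min(max(p[j:j + n + 1]) for j in range(i, i + n + 1)) for i in range(len(x))]

def L(x, n):
    p = [x[0]] * n + list(x) + [x[-1]] * n
    return [max(min(p[j:j + n + 1]) for j in range(i, i + n + 1)) for i in range(len(x))]

def M(x, n):
    # (M_n x)_i: median of the 2n+1 values centered at i (constant extension).
    p = [x[0]] * n + list(x) + [x[-1]] * n
    return [sorted(p[i:i + 2 * n + 1])[n] for i in range(len(x))]

for x in product(range(4), repeat=5):
    chain = zip(L(x, 1), U(L(x, 1), 1), M(x, 1), L(U(x, 1), 1), U(x, 1))
    assert all(a <= b <= c <= d <= e for a, b, c, d, e in chain)
```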
Our feeling is that this is one of the important defects of this choice and may be the reason for the belief that morphological decompositions are not as good as median decompositions: the choices used may have been suboptimal. We feel the lesser bias and the common range Mn of LnUn and UnLn are the primary properties to target. This would then justify selecting any smoother from the LULU interval [UnLn, LnUn]. Apart from the two endpoints of the interval already discussed, a few primary candidates must be considered: Mn, Bn (the morphological center of the interval) and the strange operator Qn = Un + Ln − I, which generally is not in the LULU interval but is in [Ln, Un]. The median is statistically unbiased but is not a separator. Tukey overcame this problem by iterating to convergence, that is, applying Mn repeatedly until there is no change. Denoting this operation Mn∞, it does map into Mn (stated without proof). Our decomposition can expect a similar type of
structure in the residuals as in the cases of LnUn or UnLn. We have not analyzed this and do not know whether Mn∞ is a separator, but we would conjecture a similar behavior when decomposing any sequence in the cone generated by the residuals. Its lack of bias should be slightly advantageous, but the complexity of the computations is not appealing, particularly in real-time applications with synchronization of pulses in parallel transmission. Using the median Mn itself suffers from a lack of predictability (theory); for instance, there is no Parseval identity or resolution-level structure, like pulses of fixed width and constant amplitude, to ease coding. The approximating properties can be expected to be not worse than our two basic choices; however, since LnUn and UnLn are generally equal on structured sequences, Mn can differ from them only where LnUn and UnLn give different output. We view the situation in a similar light as interval computations on a computer: when the intervals become large, it is useful to know this, instead of unknowingly selecting a particular value in the interval with no simple indication of the error size. The morphological center of LnUn and UnLn is an interesting candidate. For our purposes, it can simply be defined as

(Bn x)i = (LnUn x)i, if xi ≥ (LnUn x)i;
          (UnLn x)i, if xi ≤ (UnLn x)i;
          xi, otherwise.

It is clearly unbiased, and UnLn ≤ Bn ≤ LnUn, so that it is in the interval of ambiguity [UnLn, LnUn] (Rohwer and Toerien, 1991). It can be proved to be a separator, and its use in a decomposition starts well but soon demonstrates less than simple behavior. This is currently a research pursuit. Understanding and predicting the decomposition is clearly not as simple as in the basic cases. Used as a single smoother, however, it does have the advantage of removing pulses with good information preservation, but it does not seem worthwhile for decomposition.
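The piecewise definition transcribes directly into code; the sketch below (ours, same conventions as our earlier sketches) confirms that B1 always lies in the interval of ambiguity [U1L1, L1U1]:

```python
from itertools import product

def U(x, n):
    p = [x[0]] * n + list(x) + [x[-1]] * n
    return [min(max(p[j:j + n + 1]) for j in range(i, i + n + 1)) for i in range(len(x))]

def L(x, n):
    p = [x[0]] * n + list(x) + [x[-1]] * n
    return [max(min(p[j:j + n + 1]) for j in range(i, i + n + 1)) for i in range(len(x))]

def B(x, n):
    """Follow LnUn where x lies above it, UnLn where x lies below it, else keep x_i."""
    lu, ul = L(U(x, n), n), U(L(x, n), n)
    return [a if xi >= a else (b if xi <= b else xi)
            for xi, a, b in zip(x, lu, ul)]

for x in product(range(4), repeat=5):
    for lo, v, hi in zip(U(L(x, 1), 1), B(x, 1), L(U(x, 1), 1)):
        assert lo <= v <= hi
```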
A curiosity is Qn = Un + Ln − I. It is easy to motivate heuristically: "Ln removes the upward pulses from the signal and Un removes the downward pulses; their sum gives twice the signal plus the upward and downward pulses, that is, the signal plus the original sequence. Subtracting the original sequence (signal plus upward and downward pulses) should then yield the signal." A simple investigation shows that Q1 = M1, but Q2 ≠ M2, and Q3 is not even a selector. It is therefore almost mandatory to reject these operators, but they have a curious value. The statement Q1 = M1 can be generalized as follows: x ∈ Mn−1 ⇒ Qn(x) = Mn(x). Thus, in a decomposition where each stage maps Mn−1 into Mn, Qn is identical to the median. As Qn is easier to analyze with LULU theory, for x ∈ Mn−1, (Qn x)i = (Mn x)i =
(UnLn x)i or (LnUn x)i. This demonstrates that where LnUn and UnLn differ, the median Mn yields an alternating oscillation within the envelope of the LULU interval. We have found no other justification for this choice. A further indication of how near Mn and Qn are to LnUn and UnLn is the following set of curiosities, which have simple proofs.

Theorem 9. If A = Qn or A = Mn, then
(1) ALn = UnLn and AUn = LnUn;
(2) Ln ≤ LnA ≤ LnUn and Un ≥ UnA ≥ UnLn.

Proof. (1) QnLn = (Un + Ln − I)Ln = UnLn + LnLn − Ln = UnLn, and the dual statement follows similarly. Also MnLn ≤ UnLn since Mn ≤ Un, and MnLn ≥ (UnLn)Ln since Mn ≥ UnLn. Thus MnLn = UnLn.
(2) Ln is syntone and Ln ≤ A ≤ Un, so that Ln = LnLn ≤ LnA ≤ LnUn. The dual is proved similarly.

A simple corollary is that ALn is idempotent. Also (LnA)^3 = (LnA)^2, but LnA is not idempotent. This is sometimes useful for theoretical purposes, but we see no reason for any choices other than LnUn or UnLn. The average of the two does well but does not map into Mn. Iterating works very well, but we avoid this for several obvious reasons. However, since LnUn and UnLn both map x ∈ Mn−1 into Mn, there is the option of selecting a particular one at each index of a sequence x. This could be alternated in the hope of lessening bias, or we could follow L1U1 by UnLn for all the following separations. This makes some sense statistically, but we cannot yet elaborate on this. The search for secondary criteria remains an open problem. Regarding optimal selection of compositions of Ln and Un, we can report that all compositions A of ∨ and ∧ have left quasi-inverses w.r.t. the total variation as norm, and neutral compositions S have the identity as quasi-inverse. This means that no postcomposition AS can reduce the total variation of the residual, T((S − I)x). The same is true for quasi-inverses w.r.t. the usual 1-norm (Rohwer, 2004a, 2004b).
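The identity Q1 = M1 and its failure at n = 2 can both be checked numerically (our sketch, same conventions as before):

```python
from itertools import product

def U(x, n):
    p = [x[0]] * n + list(x) + [x[-1]] * n
    return [min(max(p[j:j + n + 1]) for j in range(i, i + n + 1)) for i in range(len(x))]

def L(x, n):
    p = [x[0]] * n + list(x) + [x[-1]] * n
    return [max(min(p[j:j + n + 1]) for j in range(i, i + n + 1)) for i in range(len(x))]

def M(x, n):
    p = [x[0]] * n + list(x) + [x[-1]] * n
    return [sorted(p[i:i + 2 * n + 1])[n] for i in range(len(x))]

def Q(x, n):
    return [u + l - v for u, l, v in zip(U(x, n), L(x, n), x)]  # Qn = Un + Ln - I

# Q1 = M1 on every short sequence over a small alphabet ...
assert all(Q(list(x), 1) == M(list(x), 1) for x in product(range(4), repeat=5))

# ... but Q2 and M2 already differ:
x = [0, 0, 2, 1, 3, 0, 0]
assert Q(x, 2) != M(x, 2)
```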
In both cases (total variation and 1-norm), right quasi-inverses do not exist. This implies that there is no a priori reason to prefer either L1U1 or U1L1, whether in the first stage of a decomposition or later. In summary, we believe that the two primary choices of decomposing with LnUn or UnLn are well understood. Discarding the higher-resolution sequences yields

z = Σ_{i=m}^{N} r^i(x) = Cm(x) and y = Fm(x), with Fm = UmLmFm−1,
respectively, as good (near-best) approximations to x in any of the usual p-norms. Each value of z and y is considered as signal by LnUn and UnLn, in that LnUn y = y and UnLn z = z. Using alternative operators in the LULU interval [UnLn, LnUn] at any of the stages would yield an approximation in [Fm x, Cm x]. If Cm(x) and Fm(x) are seen to be far apart in a particular case, it may be worth attempting alternatives. It is important to note that LnUn(UnLn) = UnLn and UnLn(LnUn) = LnUn. Although UnLn ≤ Fn ≤ Cn ≤ LnUn proves that all of these, including the median Mn, agree that elements of Mn are signals, they do not agree on what noise is, since (I − LnUn)(I − UnLn) = I − UnLn, (I − UnLn)(I − LnUn) = I − LnUn, (I − Mn)(I − LnUn) = I − LnUn, etc. We still find it difficult to interpret the full consequences of this and refer to it as the problem of resolution consensus. We are researching this and feel that it is important. However, the consensus on what signal is yields the primary consistency we were seeking, which seems to be sufficient. A good analogy to use as a guide may be the following: in classical (linear) approximation theory, we aim to have spaces of higher-order polynomials, trigonometric polynomials, or splines preserved. Here the nested sequence of sets Mn may be the approximation sets to use as goals. Seeking best approximations seems a good idea, but it is not appealing, because of computational complexity and other serious defects. We do not regard this as important, except perhaps for theoretical purposes.

3. Quantizing and Thresholding Components

It seems clear that, even when using (linear) wavelet decompositions for image processing, we have to become nonlinear. Mallat (1998) argues the case for this convincingly.
Illustration with a simple example in the Haar wavelet decomposition demonstrates that, even when decomposing a simple block signal of width 3 with some random noise added, severe distortion results when attempting to economize for coding. We must quantize in some way and/or threshold to discard all the small components, leaving only a few significant ones. These operations are nonlinear, and the operators themselves introduce noise into the transmitted version of the signal. For analysis purposes, it is convenient to note that such operators merge naturally into the LULU theory, in that many of their properties are complementary to those of the smoothing operators involved. Some typical examples of such operators can be given, with some of the relevant properties.
Definition. The rounding operator R maps the sequence x onto Rx, with
(Rx)i ≡ h sgn(xi) int(|xi|/h + 1/2).

Theorem 10. R is idempotent, syntone, NTP, self-dual, and commutes with ∨ and ∧.

Proof. We note that sgn(Rxi) = sgn(xi) and that
|Rxi|/h + 1/2 = |h sgn(xi) int(|xi|/h + 1/2)|/h + 1/2 = int(|xi|/h + 1/2) + 1/2.

Therefore

R(Rx)i = h sgn(Rxi) int(|Rxi|/h + 1/2) = h sgn(xi) int(int(|xi|/h + 1/2) + 1/2) = Rxi.

Also

R(−x)i = h sgn(−xi) int(|−xi|/h + 1/2) = −h sgn(xi) int(|xi|/h + 1/2) = −(Rx)i.

Thus R is self-dual, and it is clearly increasing. Consider

(R∨)xi = h sgn(∨xi) int(|∨xi|/h + 1/2).

Assume xi+1 ≥ xi ≥ 0. (A similar argument holds if xi ≥ xi+1 ≥ 0.) Then

(R∨)xi = h sgn(xi+1) int(|xi+1|/h + 1/2) = max{Rxi+1, Rxi} = (∨R)xi.

If xi+1 ≥ 0 > xi (similarly if xi ≥ 0 > xi+1), then

(R∨)xi = h sgn(xi+1) int(|xi+1|/h + 1/2) = max{h sgn(xi+1) int(|xi+1|/h + 1/2), h sgn(xi) int(|xi|/h + 1/2)} = (∨R)xi.
If 0 ≥ xi+1 > xi (similarly if 0 ≥ xi > xi+1), then

(R∨)xi = h sgn(∨xi) int(|∨xi|/h + 1/2) = −h int(|xi+1|/h + 1/2) = max{−h int(|xi+1|/h + 1/2), −h int(|xi|/h + 1/2)} = (∨R)xi.

In all cases, therefore, (R∨)xi = (∨R)xi.

Corollary 11. R commutes with all compositions of Ln and Un.

Clearly R∨n = ∨nR by induction. The proof now follows as in the case n = 1: RL = R(∨∧) = (R∨)∧ = (∨R)∧ = ∨(R∧) = ∨(∧R) = (∨∧)R = LR.

It is important to note that R is not co-idempotent. If we replace the upward rounding by downward rounding to the nearest multiple of h, and note that −h/2 < xi − Rxi < h/2, so that |xi − Rxi|/h < 1/2 and int(|xi − Rxi|/h + 1/2) = 0, it is clear that R(I − R) = 0. In some sense, R is therefore nearly co-idempotent.

Definition. The thresholding operator T is given by

(Tt x)i = (T x)i ≡ xi if |xi| ≤ t, and sgn(xi) t if |xi| ≥ t.
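Both operators are a few lines of code, and the properties claimed for R above, together with the analogous ones for T established next, can be spot-checked. The sketch is ours (Python; int is the usual truncation of a nonnegative argument):

```python
def R(x, h):
    """Round each value to the nearest multiple of h, halves away from zero."""
    sgn = lambda v: (v > 0) - (v < 0)
    return [h * sgn(v) * int(abs(v) / h + 0.5) for v in x]

def T(x, t):
    """Clip each value to the interval [-t, t]."""
    return [v if abs(v) <= t else (t if v > 0 else -t) for v in x]

def vee(x):
    """Pairwise maximum of neighbors (last value repeated)."""
    return [max(x[i], x[min(i + 1, len(x) - 1)]) for i in range(len(x))]

x = [3.7, -1.2, 0.4, 8.9, -6.5, 2.0, -0.1]

# R is idempotent, self-dual, and commutes with the pairwise maximum.
assert R(R(x, 1), 1) == R(x, 1)
assert R([-v for v in x], 1) == [-v for v in R(x, 1)]
assert R(vee(x), 1) == vee(R(x, 1))

# T and I - T never reverse the trend of neighbors, i.e., T is FTP.
t = 2
y = T(x, t)
r = [a - b for a, b in zip(x, y)]
for i in range(len(x) - 1):
    xd = x[i + 1] - x[i]
    assert xd * (y[i + 1] - y[i]) >= 0 and xd * (r[i + 1] - r[i]) >= 0
```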
Theorem 12. T is idempotent, self-dual, FTP, and commutes with ∨ and ∧.

Proof. Idempotence is trivial, and so is the proof that T is self-dual: T(−x) = −T(x). Let xi+1 ≥ xi. (The case xi ≥ xi+1 is handled similarly.) Then (Tx)i+1 ≥ (Tx)i trivially, so that T is NTP. Consider (I − T)xi = xi − (Tx)i and (I − T)xi+1 = xi+1 − (Tx)i+1. If xi, xi+1 ∈ [−t, t], both terms are zero. If xi ∈ [−t, t] but xi+1 > t, the first term is zero and the second is positive. If xi+1 ∈ [−t, t] but xi < −t, the first term is negative and the second is zero. If xi, xi+1 ∉ [−t, t] and both lie on the same side of the interval, the terms are xi − t and xi+1 − t (or xi + t and xi+1 + t), and the required inequality is immediate. If they lie on opposite sides, then xi+1 > t > −t > xi, so the first term is negative and the second positive. In each case, therefore, (I − T)xi+1 ≥ (I − T)xi, and I − T is NTP. Consider (∨T)xi = max{Txi, Txi+1}.
If xi+1, xi ∈ [−t, t], then (∨T)xi = ∨xi = max{xi, xi+1} = (T∨)xi. If xi+1 ∈ [−t, t] and xi > t, then (∨T)xi = max{t, xi+1} = t = (T∨)xi. If xi+1 ∈ [−t, t] and xi < −t, then (∨T)xi = max{xi+1, −t} = xi+1 = (T∨)xi. A similar argument holds if xi ∈ [−t, t] and xi+1 ∉ [−t, t]. If xi+1, xi ∉ [−t, t], there are four cases. If xi+1, xi > t, then (∨T)xi = max{t, t} = t = (T∨)xi. If xi+1, xi < −t, then (∨T)xi = max{−t, −t} = −t = (T∨)xi. If xi+1 > t > −t > xi, then (∨T)xi = max{−t, t} = t = (T∨)xi. If xi > t > −t > xi+1, then (∨T)xi = max{t, −t} = t = (T∨)xi. Therefore, T commutes with ∨, and a similar argument proves that T∧ = ∧T.

Corollary 13. T commutes with all compositions of Ln and Un.

Proof. Clearly T∨n = ∨nT by induction, and similarly T∧n = ∧nT. For all other compositions of Ln and Un, a similar argument proves commutativity.

NOTE: T is not co-idempotent, as the simple example of a constant sequence xi = 3t shows.

With the properties of the above operators, it seems that their effect on our decompositions can be investigated with confidence when this is required.

4. Numerical Considerations

The fast Fourier transform (FFT), which is just a fast form of a decomposition of a sequence into its resolution (frequency) components, is generally considered the most important mathematical tool of the previous century. It is simply a DFT with the computational convenience of a complexity of N log N. The decomposition of a sequence into its resolution (block) components with the operators LnUn (for example) can be seen as a DPT (Rohwer, 2005; Laurie and Rohwer, to appear). What are the computational costs involved? We do not go into the virtually limitless vectorization possible but concentrate only on basic serial computations. When a sequence of N nonzero elements is to be decomposed into pulses with separators LnUn for n = 1, 2, . . . , N, the computational effort appears enormous.
Even for n = 1 we get from the definition that L1U1x is obtained by computing (U1x)i = min{max{xi−1, xi}, max{xi, xi+1}} for i = 1 to N. Economizing by first calculating (∨x)i = max{xi, xi+1}, i ∈ [1, N], and
LULU THEORY, IDEMPOTENT STACK FILTERS
following a similar procedure on w = ∨x via (∧w)i = min{wi−1, wi}, i ∈ [1, N], yields U1x in 2N comparisons. Clearly, the computation of L1U1x from U1x requires a similar amount of work. Thus separating x into L1U1x and r = (I − L1U1)x needs at least 4N comparisons and N subtractions. When considering the further N − 1 separators, each with progressively more comparisons to be made (the operators ∨n and ∧n are now involved), the prospects of ending with a complexity of even O(N²) are not good.

Optimists may note that as the inputs get smoother (in Mn), there are fewer different values involved (more constant regions), and this could result in a saving that grows progressively. An example can be given to substantiate this. Suppose x ∈ Mn−1. Then it is (n − 1)-monotone; every set of the type {xj, xj+1, . . . , xj+n} is monotone. Therefore

(Unx)i = min{max{xi−n, . . . , xi}, . . . , max{xi, . . . , xi+n}}

has all its maxima of the type max{xj, . . . , xj+n}, which are easy to evaluate as max{xj, xj+n}. Thus (∨nx)j requires only one comparison instead of n. Care must be taken now, as the sequence ∨nx is no longer (n − 1)-monotone, but the subsequent ∧ computation can still be simplified by careful thought. Once Unx has been computed, it is again (n − 1)-monotone (since Un is FTP), so that the computation of (LnUnx)i allows a similar saving as before. Without providing a full analysis, we can simply state that we have been working for some years now with algorithms that do the full decomposition with N ln N comparisons/subtractions. (See III.A.3 for similar remarks, but in relation to stack filtering.) Furthermore, an old idea was recently discussed with Dirk Laurie, who soon proved an order-N algorithm; a postgraduate student (J.P. du Toit) soon had it running in a practical image-processing application. We hope to publish on this issue soon.

At this stage it is reassuring to know that the mappings LnUn and Cn = LnUnLn−1Un−1 · · · L2U2L1U1 do not substantially differ in complexity. The less biased mapping Cn, and its dual Fn, are as easy to handle as LnUn and UnLn, respectively. They have the advantage that if the DPT of x is Σ_{i=1}^{N} r^i, with r^i = (I − LiUi)Ci−1, then that of Cnx is Σ_{i=n+1}^{N} r^i. The relationship with the decomposition of LnUnx is not simple, and as a smoother Cn is generally superior to LnUn in many significant respects; for a start, it is less biased statistically. Importantly, all the computations involved are comparisons and subtractions. If the numbers are integers, as in the case of image processing, there is no accumulating roundoff error, and no numbers other than the finite few in use ever arise.
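To make the operator definitions above concrete, here is a naive serial sketch (not the order-N algorithm mentioned above). The function names are ours, and window maxima/minima are simply truncated at the boundaries of the finite sequence:

```python
def vee(x, n):
    # (v_n x)_i = max{x_i, ..., x_{i+n}}  (forward window)
    N = len(x)
    return [max(x[i:min(i + n, N - 1) + 1]) for i in range(N)]

def wedge(x, n):
    # (^_n x)_i = min{x_{i-n}, ..., x_i}  (backward window)
    return [min(x[max(i - n, 0):i + 1]) for i in range(len(x))]

def U(x, n):
    # U_n = ^_n v_n : removes "downward" pulses of width <= n
    return wedge(vee(x, n), n)

def L(x, n):
    # L_n = v_n ^_n : removes "upward" pulses of width <= n
    return vee(wedge(x, n), n)

def dpt(x, levels):
    # Discrete pulse transform with separators L_n U_n:
    # r^n = (I - L_n U_n) applied to the current smoothed sequence.
    residuals, current = [], list(x)
    for n in range(1, levels + 1):
        smooth = L(U(current, n), n)
        residuals.append([c - s for c, s in zip(current, smooth)])
        current = smooth
    return residuals, current  # the residuals plus `current` sum back to x
```

By construction the residual levels plus the final smooth component add back up to the input exactly, and only comparisons and subtractions are used, as stated above.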
D. LULU Image Analysis of Optical Second-Harmonic Imaging of PbxCd1−xTe Ternary Alloys

Scheidt et al. (2005) employ optical second-harmonic (SH) imaging in the analysis of PbxCd1−xTe ternary alloys. The sample wafers are prepared by the vertical Bridgman method from a homogenized melt of pure Pb, Cd, and Te and show two distinct segregated phases consisting mainly of either Pb-rich or Cd-rich crystalline material. Areas in the Pb-rich phase enclose fine Cd-rich microcrystals, due to PbTe acting as a solvent for CdTe during the solidification of the melt. The Cd-rich phase occurs in two different crystalline growth directions. Large-area grains (several mm²) of (111) and (411) crystalline growth direction are identified by SH imaging at different azimuthal angles and characterized by a ∼30° phase shift in the rotational SH anisotropy curves. The Cd-rich microcrystals present in the Pb-rich phase are strongly aligned in the (111) direction. The SH response of pure PbTe (rock-salt crystal structure) is found to be at least two orders of magnitude weaker than that of pure CdTe (zinc-blende crystal structure), indicating that the Cd-rich phases dominate the SH response of the PbxCd1−xTe ternary alloy.

We are exploring the use of LULU filters in the analysis of these types of images. First, let us look at the images as measured using optical second-harmonic imaging. The two images in Figure 12 show the same area, but with the azimuthal angle rotated by approximately 30° in the right image. There are two Cd-rich phases present. Each has a large SH response in only one of the two images. This is due to a phase shift (∼30°) in the rotational SH anisotropy curves of the two crystal growth directions. The Pb-rich areas show a weak signal superimposed with a large-variance random distribution, which is attributed by Scheidt et al. (2005) to the presence of Cd-rich microcrystals.
FIGURE 12. SH images of Pb0.2Cd0.8Te (recorded in p-p polarization).
1. The Second Dimension

The discrete pulse transform and the LULU filters on which it is based usually operate on 1D signals, so it is necessary to define how they will be extended to 2D images. Each row of the image is regarded (and decomposed) separately. After the rows have been processed, the processed image can be assembled from them. The same can be done for all the columns. This results in two images: one with the multiresolution analysis done horizontally and another one where it is done vertically. It is sometimes useful to end up with one image only. For the purposes of the problem we are investigating, this final image is created by calculating the mean image of the horizontally and vertically analyzed images.

As an example, we use the problem of removing impulsive noise from an image. An image with some vertical and horizontal lines is used, and 2500 points of impulsive noise are added (Figure 13a). This is decomposed horizontally and vertically using the pulse decomposition based on the LU filter. These images are then reconstructed by skipping the first five resolution levels, which eliminates the noise. The horizontal and vertical lines feature in the larger-scale levels of the horizontally and vertically decomposed images, respectively. Figures 13b and 13c show the preserved lines. The final image is just the mean image of these two, as visible in Figure 13d.
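A minimal sketch of this row/column scheme, using the L1U1 smoother for concreteness (the names are ours; a full implementation would decompose, drop the first resolution levels, and reconstruct, as described above):

```python
def smooth_1d(x):
    # L1 U1: removes up- and down-pulses of width 1
    # (window maxima/minima truncated at the boundaries).
    def vee(x):    # (v x)_i = max{x_i, x_{i+1}}
        return [max(x[i:i + 2]) for i in range(len(x))]
    def wedge(x):  # (^ x)_i = min{x_{i-1}, x_i}
        return [min(x[max(i - 1, 0):i + 1]) for i in range(len(x))]
    return vee(wedge(wedge(vee(x))))   # L1(U1(x))

def smooth_2d(img):
    # Process every row and every column separately with the 1D
    # operator, then take the mean of the two resulting images.
    rows = [smooth_1d(row) for row in img]
    cols = list(zip(*[smooth_1d(list(col)) for col in zip(*img)]))
    return [[(a + b) / 2 for a, b in zip(r, c)] for r, c in zip(rows, cols)]
```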
FIGURE 13. (a) Original image; (b) Horizontal; (c) Vertical; (d) Mean.
2. Highlighting of Cd-Rich Crystals

A feature that stands out in Figure 12 is the high-response Cd-rich area in both of the images. By finding the points of the image that form part of the edge between the high- and low-response areas, it should be possible to determine the equations of the lines that form the boundaries between the different phases. These lines will also make it possible to automatically match the features on the two images, regardless of their differing azimuthal angles.

Conventional edge detection can be problematic due to the noisy nature of the SH images. The large number of microcrystals will yield many edges that we actually want to ignore when trying to find the edges between the different phases. The LULU filters provide us with a natural way to achieve this. By only using some of the resolution levels when reconstructing the image, it is possible to visually enhance the difference between the high-response Cd-rich phase and the high-variance Pb-rich phase. Figure 14 shows the result of removing the first five resolution levels of the LU discrete pulse transform. The average of the column-based and row-based analyses was used as the final image. The Cd-rich areas are highlighted compared to the original SH images. This will simplify the detection of the crystal boundaries.

3. Edge Detection Using the Discrete Pulse Transform

Now that the difference between the two phases is sufficiently highlighted, it should be easier to determine the points that form part of the edges. After this
FIGURE 14. The images with the Cd-rich phases enhanced by the removal of small pulses.
FIGURE 15. Standard width-n block pulse and its replacement in the reconstruction.
FIGURE 16. Edge pixels extracted by skipping the first five resolution levels.
it is possible to extract the line equations from this collection of points. The discrete pulse transform decomposes a sequence into a list of block pulses. The sum of all these block pulses is just the input sequence. It is possible to replace each block pulse in the decomposition with some other sequence when reconstructing. For the purpose of recognizing edges, replace each block pulse with the border elements of that pulse. Figure 15 shows a width-n block pulse and the replacement pulse that will be used in the reconstruction. Wherever a pulse was removed in the original decomposition, only the edges of that pulse remain in the reconstruction. Figure 16 shows the described procedure applied to the second-harmonic images of Figure 12. By leaving out some of the resolution levels when doing the reconstruction, it was possible to remove many of the edges found in the high-variance Pb-rich phase.

It is now possible to use other techniques to find the equations that describe the boundary lines between the crystals. A standard technique to identify straight lines in an image is to calculate its Hough Transform. A straight line consisting of many pixels will correspond to a large peak in the Hough Transform of the edge image. The Hough Transform maps every pixel in the x–y plane into a curve in ρ–θ space, with the curve defined by the following equation:

x sin θ + y cos θ = ρ. (2)
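A minimal accumulator-based sketch of this peak search, using the parameterization of Eq. (2) (the function name and the integer ρ bins are our illustrative choices):

```python
import math

def hough_peak(points, n_theta=180):
    # Vote in (rho, theta) space using the parameterization of Eq. (2):
    # x*sin(theta) + y*cos(theta) = rho, with rho rounded to integer bins.
    acc = {}
    for k in range(n_theta):
        theta = math.pi * k / n_theta
        s, c = math.sin(theta), math.cos(theta)
        for x, y in points:
            cell = (round(x * s + y * c), k)
            acc[cell] = acc.get(cell, 0) + 1
    (rho, k), votes = max(acc.items(), key=lambda kv: kv[1])
    return rho, math.pi * k / n_theta, votes
```

After the strongest line is found, the pixels close to it are removed and the transform recalculated, exactly as described in the text.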
The Hough Transform was calculated using the edge pixels of Figure 16. By taking a weighted average of the area around the highest peak it is possible to find the parameters ρ and θ. This yields the line that is most strongly suggested by the collection of edge pixels. All edge pixels close to the line are then removed before the Hough Transform is recalculated. Again, a weighted
FIGURE 17. SH images with identified crystal boundaries.
average around the highest peak is used to obtain the line parameters. As there are only two prominent crystal boundaries in the data, we stop here. In Figure 17 we can see that this method was successful in obtaining reasonable estimates of the crystal boundaries.

4. Image Registration

Image registration refers to compositing the two SH images of Figure 12 into one image. The detected lines as displayed in Figure 17 provide enough information to do this. First, we need to determine the point where the lines cross in both images. This will make it possible to put the images in the correct place after rotation. From Eq. (2) it follows that

Iy =
(ρ1/sin θ1 − ρ2/sin θ2) / (cos θ1/sin θ1 − cos θ2/sin θ2), (3)
where θi and ρi are the parameters of the lines found in Section II.D.3. The x-value of the intersection point is then easy to determine as well; if θ1 is very close to zero, the second version of Eq. (4) should be used:

Ix = (ρ1 − y cos θ1)/sin θ1   or   Ix = (ρ2 − y cos θ2)/sin θ2. (4)

Equations (3) and (4) allow us to determine the position of the intersection point in both images. Next, we calculate how much each line in Figure 17b must be rotated to fit onto the corresponding lines in Figure 17a. Due to approximation errors and noise, the two calculated angles will differ slightly. With the data analyzed here the difference was about 3°. The mean of these
FIGURE 18. Result of image registration.
two values is used as the angle, α, by which the azimuthal angle was rotated before the second SH image was measured. We then get α = 30.65°. Now transform the second image to place it in the correct position and orientation with respect to the first image. First, the image is translated so as to put its intersection point at (0, 0). Then the image is rotated by α. Finally, the image is moved such that the two intersection points lie at the same position. Selecting appropriate resolution levels yields clearly identifiable regions (Figure 18).

5. Estimation of Variance of Random Distribution

The SH images of the Pb-rich areas show a weak response with a superimposed random distribution. This is caused by the presence of Cd-rich microcrystals (Scheidt et al., 2005). By quantifying the amount of variance, it becomes possible to compare the concentration and growth direction of Cd-rich microcrystals in the Pb-rich area. Signals with a high variance cause large peaks in the first resolution level of the pulse decomposition. The first resolution levels of both SH images were calculated and then smoothed with a disk-shaped averaging filter. This makes it easier to visually examine the difference in concentration (see Figure 20).

6. Results

In this section the collected information is superimposed on our input data. Figure 19 shows the different crystal structures that were recognized and their boundaries. Only the parts that are visible in both of the SH images are displayed. The Cd-rich areas are characterized by a high SH response in one of the images of Figure 14, and a close-to-zero SH response in the other
FIGURE 19. Crystal regions and boundaries.
FIGURE 20. Visualizing the variance in the Pb-rich areas.
image. The remaining area is classified as Pb-rich. These areas are displayed in one image with the detected boundaries superimposed. The line segment that divides the Pb-rich area in Figure 17 was removed by calculating the distances between the four line segments originating at the intersection point and the centers of the three crystal regions. The line segment farthest from the Cd-rich phases was removed. The enhanced images of the Cd-rich areas made it possible to calculate the centers of the different regions. Figure 20 identifies the areas in the Pb-rich phase where the variance is higher. Darker areas in the Pb-rich phase correspond to higher and more closely spaced peaks. It is clear that most of the Cd-rich microcrystals have a higher SH response at the same azimuthal angle at which the Cd-Rich I phase was detected. There might be a correlation between the strength of this response and the concentration of the microcrystals.
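The variance estimation of Section II.D.5 can be sketched row-wise as follows (a rough stand-in only: the helper names, and a square neighborhood in place of a true disk-shaped averaging filter, are our simplifications):

```python
def first_level_magnitude(x):
    # |r^1| = |(I - L1 U1) x| for one row: the first resolution level
    # of the pulse decomposition (window logic truncated at the edges).
    def vee(x):    # max over {x_i, x_{i+1}}
        return [max(x[i:i + 2]) for i in range(len(x))]
    def wedge(x):  # min over {x_{i-1}, x_i}
        return [min(x[max(i - 1, 0):i + 1]) for i in range(len(x))]
    s = vee(wedge(wedge(vee(x))))      # L1 U1 x
    return [abs(a - b) for a, b in zip(x, s)]

def neighborhood_average(img, r=1):
    # Crude square-neighborhood stand-in for the disk-shaped
    # averaging filter mentioned in the text.
    H, W = len(img), len(img[0])
    return [[(lambda v: sum(v) / len(v))(
                [img[a][b]
                 for a in range(max(i - r, 0), min(i + r + 1, H))
                 for b in range(max(j - r, 0), min(j + r + 1, W))])
             for j in range(W)] for i in range(H)]
```

High-variance rows leave large first-level residuals, while constant regions leave none, which is what makes the smoothed residual map a usable concentration indicator.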
III. VISTAS ON IDEMPOTENCY

Recall from the Introduction that Section III reflects the work of the second author. A preview of stack filters (III.A) and MM (III.C) was given in the Introduction. Let us now comment more on III.B, III.D, and especially III.E. In III.B some results about abstract ordered semigroups are derived, relevant mainly to MM and the fine structure of LULU operators (III.D), but also to, for example, natural language processing (Lambek, 2006). Section III.D.2, about the root signals of LULU operators, is situated at the intersection of LULU theory, semigroup theory, stack filters, and MM (see the Venn diagram in the Introduction). Section III.E introduces lattice stack filters (LSFs), which live in two worlds. Within the world of signal and image processing they generalize classic stack filters (III.A) and invite the framework of templates, a central concept of image algebra. The second world is the world of nonlinear systems, such as monotone Boolean networks, cellular automata, and others. Idempotence is a key issue in both worlds. We present a polynomial-time algorithm that decides the idempotence of an LSF. The same is done for co-idempotence, the relevance of which (at least within the first world) was amply demonstrated in Section II.

A. Stack Filters With a View on LULU Operators

After discussing positive Boolean functions (III.A.1), we define stack filters and recall the method of threshold decomposition to evaluate them (III.A.2). For LULU filters, which are special types of stack filters, there is a more efficient way (III.A.3). Section III.A.4 features a proof of Theorem 4 in II.B.3, generalized to stack filters. Finally, we look at linear, affine, and convex combinations of stack filters, thereby touching on smoothers (II.A.1) and so-called topical operators (III.A.5).

1. Positive Boolean Functions on Linearly Ordered Sets

A superficial familiarity with Boolean functions is assumed.
Recall that a Boolean function b : {0, 1}^n → {0, 1} is positive if it is increasing in the usual sense: for all² x, y ∈ {0, 1}^n, it follows from x ≤ y, i.e., xi ≤ yi for all 1 ≤ i ≤ n, that b(x) ≤ b(y). This entails that b has no negated literals x̄i (whence the name positive), and admits a disjunctive normal form

b(x) = ∨_{C∈𝒞} ∧_{i∈C} xi    (DNF). (5)

² It will always be clear from the context whether x, y, . . . are vectors, or (formally) bi-infinite series
such as in Section II.
It may be assumed that C ⊄ C′ for all distinct C, C′ in the family of sets 𝒞, since otherwise a leaf could be canceled without affecting Eq. (5). Such a clean DNF is unique up to the order of the leaves C ⊆ {1, 2, . . . , n}. Dually, b admits a unique clean conjunctive normal form

b(x) = ∧_{D∈𝒟} ∨_{j∈D} xj    (CNF). (6)
The CNF can be derived from the DNF, or vice versa, by using the distributive laws that hold in the Boolean lattice {0, 1}:

α ∧ (β ∨ γ) = (α ∧ β) ∨ (α ∧ γ),    α ∨ (β ∧ γ) = (α ∨ β) ∧ (α ∨ γ). (7)
Alternatively, it is well known that the leaves in 𝒞 are precisely the minimal transversals of the set system 𝒟, and vice versa. See Example 12 in Section III.E for a concrete calculation.

One can evaluate Eqs. (5) and (6) not just for xi ∈ {0, 1} but, more generally, for elements xi coming from any linearly ordered³ set R, such as R = {0, 1, 2, . . . , L} or R = ℝ. Because terminology has not been standardized, we call such a function b : R^n → R a positive Boolean function (PBF) on R. While the PBFs on {0, 1} are exactly the increasing functions {0, 1}^n → {0, 1}, this is false for larger linearly ordered sets (Ronse, 1990, p. 309): Let R be linearly ordered. Then b : R^n → R is a PBF on R iff b is increasing and b commutes with all increasing maps σ : R → R, i.e.,

b(σ(x1), . . . , σ(xn)) = σ(b(x1, . . . , xn)). (8)

Other necessary conditions for a map b : R^n → R to be a PBF on R can be formulated, such as being a selector in the sense that b(α, β, . . . , γ) ∈ {α, β, . . . , γ} for all α, β, . . . , γ ∈ R.

Example 2. Suppose b1(x) and b2(x) have DNFs with clusters of leaves 𝒞 and 𝒟, respectively, such that for each C ∈ 𝒞 there is some D ∈ 𝒟 with D ⊆ C. Then each conjunction ∧_{i∈C} xi of b1(x) is ≤ some conjunction ∧_{i∈D} xi of b2(x). This immediately implies that b1(x) ≤ b2(x) for all x ∈ {0, 1}^n. Conversely, suppose this condition about 𝒞 and 𝒟 is not satisfied. To fix ideas, say

b1(x) = (x1 ∧ x3) ∨ (x1 ∧ x2 ∧ x4),
and
b2 (x) = (x1 ∧ x4 ) ∨ (x2 ∧ x3 ).
No leaf of b2(x) being contained in {1, 3}, we set xi = 1 for i ∈ {1, 3} and xi = 0 otherwise. It follows that b1(1, 0, 1, 0) = 1 > 0 = b2(1, 0, 1, 0).

³ In fact, R could be any lattice, but we postpone that possibility to Section III.E.
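The DNF of a PBF is easy to evaluate mechanically; a sketch of Example 2 (leaf clusters written as tuples of 1-based indices; the names are ours):

```python
from itertools import product

def pbf(leaves):
    # DNF evaluation: the join over leaves of the meet over each leaf.
    return lambda x: max(min(x[i - 1] for i in C) for C in leaves)

b1 = pbf([(1, 3), (1, 2, 4)])   # (x1 ^ x3) v (x1 ^ x2 ^ x4)
b2 = pbf([(1, 4), (2, 3)])      # (x1 ^ x4) v (x2 ^ x3)

def dominated(Cs, Ds):
    # The criterion of Example 2: every C-leaf contains some D-leaf.
    return all(any(set(D) <= set(C) for D in Ds) for C in Cs)
```

Brute force over {0, 1}^4 confirms that b1 ≤ b2 holds exactly when the leaf condition does, and x = (1, 0, 1, 0) is the counterexample constructed above.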
2. Threshold Decompositions

Determining the median Mn(x1, . . . , x2n+1) is a nontrivial computational problem. Sorting the numbers and then picking the middle number costs at least O(n log n), since every comparison-based sorting algorithm is of that complexity. However, finding Mn(x1, . . . , x2n+1) when all xi are either 0 or 1 amounts to a simple majority count. In 1984 Fitch et al. exploited this fact as follows. Fix a natural number (threshold) m. Then σm is defined on [0, L] = {0, 1, . . . , L} by σm(c) := 1 if c ≥ m, and σm(c) := 0 otherwise. For x = (x1, . . . , x2n+1) with xi ∈ [0, L] set
σm(x) := (σm(x1), . . . , σm(x2n+1)).

One verifies at once the threshold decomposition of x:

x = Σ_{m=1}^{L} σm(x). (9)
Less obviously, it turns out that

Mn(x) = Σ_{m=1}^{L} Mn(σm(x)). (10)
Example 3. Here n = 3 and L = 10. The arrow next to σm(x) points to M3(σm(x)).

x = (4, 10, 3, 9, 2, 6, 7),
σ1(x) = (1, 1, 1, 1, 1, 1, 1) → 1,
σ2(x) = (1, 1, 1, 1, 1, 1, 1) → 1,
σ3(x) = (1, 1, 1, 1, 0, 1, 1) → 1,
σ4(x) = (1, 1, 0, 1, 0, 1, 1) → 1,
σ5(x) = (0, 1, 0, 1, 0, 1, 1) → 1,
σ6(x) = (0, 1, 0, 1, 0, 1, 1) → 1,
σ7(x) = (0, 1, 0, 1, 0, 0, 1) → 0,
σ8(x) = (0, 1, 0, 1, 0, 0, 0) → 0,
σ9(x) = (0, 1, 0, 1, 0, 0, 0) → 0,
σ10(x) = (0, 1, 0, 0, 0, 0, 0) → 0.
Thus M3(x) = Σ_{m=1}^{10} M3(σm(x)) = 1 + · · · + 1 + 0 + · · · + 0 = 6.
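Example 3 can be replayed mechanically (function names ours):

```python
def sigma(m, x):
    # Threshold map: component-wise indicator of x_i >= m.
    return tuple(1 if c >= m else 0 for c in x)

def median(x):
    # M_n for a window of length 2n+1 (here 2n+1 = 7).
    return sorted(x)[len(x) // 2]

def median_by_thresholds(x, L):
    # Eq. (10): sum the binary medians over all thresholds m = 1..L.
    # On 0/1 vectors the median is a simple majority count.
    return sum(sum(sigma(m, x)) * 2 > len(x) for m in range(1, L + 1))
```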
Generalizing Mn, the kth-order statistic OS(n, k), which picks the kth smallest of n elements, is equally amenable to this procedure. In fact, Eq. (10) generalizes to arbitrary PBFs b on R = [0, L] (Wendt et al., 1986):

b(x) = Σ_{m=1}^{L} b(σm(x)). (11)
Indeed, since σm : [0, L] → [0, L] is clearly increasing, it follows from Eq. (8) that b(σm(x)) = σm(b(x)), from which Eq. (11) follows at once. In order to access the computational benefits of Eq. (11), suppose b(x) is given in disjunctive normal form

b(x) = ∨_{C∈𝒞} ∧_{i∈C} xi. (12)
To fix ideas, say |𝒞| = s and |C| = t for all leaves C in 𝒞. Let x have components xi in [0, L]. Calculating b(x) via Eq. (12) amounts to computing s times the minimum of a t-subset of [0, L] (plus one maximum operation). On the other hand, using Eq. (11) and deciding L times whether b(σm(x)) = 0 or b(σm(x)) = 1 works fast. Even better, since b(x) is increasing, it follows that there is a threshold of thresholds, namely the unique number m̄ ∈ [0, L] with the property that
b(σm(x)) = 1 if m ≤ m̄, and b(σm(x)) = 0 if m > m̄. For instance, m̄ = 6 in Example 3. Since b(x) = m̄, finding m̄ is all we want. Using a binary search tree (Chen, 1989) is a natural idea that produces m̄ in time O(log L). Binary search can be viewed as the case p = 0 of so-called Fibonacci p-codes, which are sometimes advantageous (Agaian et al., 1995, p. 202). We resume the discussion of threshold decomposition in the next section.

3. Stack Filters, Evaluation, and Optimization

Let R^Z be the set of all bi-infinite series x = (. . . , x−1, x0, x1, . . .). A stack filter is a function Φ : R^Z → R^Z which for some n admits a PBF b of type R^{2n+1} → R such that

(Φx)s = b(xs−n, . . . , xs, . . . , xs+n) (13)
for all x ∈ R^Z and all indices s ∈ Z. Thus, albeit only one PBF is involved, it acts on different (overlapping) parts of the series x. Every stack filter Φ enjoys these properties:

(L) Φ is local in the sense that there is some n such that each component (Φx)s depends only on the xj with s − n ≤ j ≤ s + n.
(I) Φ is increasing.
(C) Φ commutes with all increasing operators σ : R → R, i.e., Φ(σ ◦ x) = σ ◦ Φ(x).
(HT) Φ is horizontally translation invariant, i.e., Φ ◦ E = E ◦ Φ.

As to (L), this is clear from the definition of a stack filter. Properties (I) and (C) also derive from the fact that b(xs−n, . . . , xs+n) has the corresponding properties [see Eq. (8)]. Choosing σ in the obvious way, it follows from Property (C) that Φ preserves all constant series. Recall from Section II that the shift operator E is defined by (Ex)i := xi+1. To see why Property (HT) holds, shift a series x by, say, k = 2 units to the right and then apply Φ: at position i this yields b(xi−2−n, . . . , xi−2+n). Done the other way around (i.e., first applying Φ and then moving the result 2 units to the right), position i carries (Φx)i−2 = b(xi−2−n, . . . , xi−2+n), the same value.

By III.A.1, each PBF on R has a unique DNF and CNF. Accordingly, a stack filter Φ has an essentially unique (up to a shift of indices) DNF and CNF for each component (Φx)i. We sometimes compare the way in which i-leaves relate to j-leaves to topological concepts, although no topologies in the proper sense are present.

Theorem 14 (Rohwer and Wild, 2002, p. 149). Let the stack filters Φ, Ψ have disjunctive normal forms

(Φx)s = ∨_{C∈𝒞} ∧_{j∈C} xs+j,   respectively   (Ψx)s = ∨_{D∈𝒟} ∧_{j∈D} xs+j.
Then Φ ≤ Ψ iff (∀C ∈ C )(∃D ∈ D ) D ⊆ C. This is well known and the essence of the proof is in Example 2. We shall refer to C as the DNF cluster of Φ (the CNF cluster is defined dually).
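For finite data, the sliding-window action of Eq. (13) can be sketched with explicit end-value padding standing in for the bi-infinite setting (names ours):

```python
def stack_filter(b, n, x):
    # Apply the PBF b over a sliding window of width 2n+1.
    # The series is extended by repeating the end values
    # (a stand-in for a bi-infinite series).
    N = len(x)
    padded = [x[0]] * n + list(x) + [x[-1]] * n
    return [b(tuple(padded[s:s + 2 * n + 1])) for s in range(N)]
```

Property (C) is easy to observe on such data: composing the input with any increasing map and filtering gives the same result as filtering first and composing afterwards.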
Example 4. It was mentioned in II.A.5 that UnLn ≤ Mn ≤ LnUn (n ≥ 1). Let us reprove this in terms of DNFs. To fix ideas, let 𝒞1, . . . , 𝒞5 be the DNF clusters of L1 < U1L1 < M1 < L1U1 < U1, respectively. Then

𝒞1 = {{−1, 0}, {0, 1}},
𝒞2 = {{−1, 0}, {0, 1}, {−2, −1, 1, 2}},
𝒞3 = {{−1, 0}, {0, 1}, {−1, 1}},
𝒞4 = {{−2, 0}, {−1, 0}, {−1, 1}, {0, 1}, {0, 2}},
𝒞5 = {{0}, {−1, 1}}.

Referring to Theorem 14, U1L1 ≤ M1 since every C ∈ 𝒞2 contains some D ∈ 𝒞3 (e.g., {−2, −1, 1, 2} contains {−1, 1}). Similarly, M1 ≤ L1U1 since every C ∈ 𝒞3 contains some D ∈ 𝒞4. It can be shown (Wild, 2005, Theorem 6) that the DNF cluster of UnLn consists exactly of these sets:

[k − n, k]   (0 ≤ k ≤ n),
[j − n, j] ∪ [i, i + n]   (1 ≤ i ≤ n) (i − 1 − n ≤ j ≤ −1).

For instance, setting n = 1 forces i = 1, j = −1, and so [j − n, j] ∪ [i, i + n] reduces to {−2, −1, 1, 2} in 𝒞2. Fix any n ≥ 1. All the leaves C in the DNF cluster of UnLn are subsets of [−2n, 2n], and one checks that |C ∩ [−n, n]| ≥ n + 1 for every such C. Because the leaves in the DNF cluster of Mn are precisely all the (n + 1)-subsets of [−n, n], it follows from Theorem 14 that UnLn ≤ Mn. The shape of the DNF cluster of LnUn (= CNF cluster of UnLn) is uglier; all its leaves have cardinality two or three (Wild, 2005, p. 177). In any case, Mn ≤ LnUn follows from UnLn ≤ Mn in view of the self-duality of Mn. Observe that U2L2U1L1 ≰ M2 since, for example, the DNF leaf {0, 1, 3, 4} of the former contains no DNF leaf of the latter. It is likely that Mn ∉ [Fn, Cn] for all n ≥ 2.
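The quoted description of the DNF cluster of UnLn, together with the criterion of Theorem 14, can be checked mechanically for small n (names ours):

```python
from itertools import combinations

def unln_cluster(n):
    # DNF cluster of U_n L_n per Wild (2005), Theorem 6, as quoted above.
    leaves = [frozenset(range(k - n, k + 1)) for k in range(0, n + 1)]
    for i in range(1, n + 1):
        for j in range(i - 1 - n, 0):
            leaves.append(frozenset(range(j - n, j + 1))
                          | frozenset(range(i, i + n + 1)))
    return leaves

def mn_cluster(n):
    # DNF cluster of the median M_n: all (n+1)-subsets of [-n, n].
    return [frozenset(c) for c in combinations(range(-n, n + 1), n + 1)]

def leq(Cs, Ds):
    # Theorem 14: Phi <= Psi iff every C-leaf contains some D-leaf.
    return all(any(D <= C for D in Ds) for C in Cs)
```

For n = 1 this reproduces 𝒞2 exactly, and the comparisons U1L1 ≤ M1 (but not conversely) come out as claimed.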
y1, . . . , y2m+1 ∈ {0, 1}. Thus y2n+2, . . . , y2m+1 are fictitious variables of b′, and so b′ can be replaced by b in the definition of Ψ. It follows that Φx = Ψx for all x ∈ R^Z.

Consider a stack filter Φ, based on b(y), being applied to a series x ∈ R^Z (with components normalized to xi ∈ [0, L]). In Section III.A.2 we saw how threshold decomposition combined with Chen's binary search reduces the calculation of b(y), y ∈ [0, L]^{2n+1} fixed, to the calculation of b(ȳ) for a few vectors ȳ ∈ {0, 1}^{2n+1}. In the context of stack filtering, the PBF b must be evaluated many times, once for each desired component (Φx)i. In view of this, two problems (we oversimplify) remain:

how to exploit (Φx)i for the computation of (Φx)i+1; (14)

albeit y is binary, how to nevertheless speed up the computation of b(y). (15)

Stack filters Φ based on order statistics b = OS(2n + 1, k), and (weighted) variations thereof, are arguably the most popular ones; see Astola and Kuosmanen (1997). For them, Problem (14) works reasonably well, and Problem (15) is not an issue when threshold decomposition is forgone in favor of other methods, such as histogram-type algorithms (Agaian et al., 1995). In another vein, stack filters Φ are now bred with genetic algorithms, using performance on test images as the selection criterion. The outcome Φ is unpredictable, and the best method to cut computational costs is threshold decomposition combined with (15). More generally, there are many articles⁴ devoted to the optimization of stack filters. As in Section II, idempotency is desirable. Section III.E.3 is dedicated to a characterization of idempotent stack filters. We cite from Shmulevich and Coyle (1998): "Such a characterization would prove to be most useful in the theory of optimal stack filtering and would facilitate the search for optimal idempotent stack filters."

The LULU filters Ln, Un and their compositions are also stack filters. How do they fit the picture? The PBF underlying LnUn is complex, although useful for some purposes (e.g., in Example 4 above, or to prove that LnUn is strong, III.C.1). But applying threshold decomposition combined with (14) and (15) to compute (LnUnx)i would be ridiculous. Rather, decompose LnUn as a product of three very simple order statistics:

LnUn = ∨n ◦ ∧n ◦ ∧n ◦ ∨n = ∨n ◦ ∧2n ◦ ∨n = OS(n + 1, n + 1) ◦ OS(2n + 1, 1) ◦ OS(n + 1, n + 1). (16)

⁴ For starters, use the Google search engine with the term "stack filter optimization" and/or see Paredes and Arce (2001) and the homepages of I. Shmulevich, M. Gabbouj, Kuo-Chin Fan, and T. Saramäki.
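A direct sketch of Eq. (16) with running window extrema (names ours; windows are truncated at the boundaries of the finite sequence):

```python
def run_max(x, lo, hi):
    # y_i = max{x_{i+lo}, ..., x_{i+hi}}, truncated at the boundaries
    N = len(x)
    return [max(x[max(i + lo, 0):min(i + hi, N - 1) + 1]) for i in range(N)]

def run_min(x, lo, hi):
    N = len(x)
    return [min(x[max(i + lo, 0):min(i + hi, N - 1) + 1]) for i in range(N)]

def Un(x, n):   # U_n = ^_n v_n
    return run_min(run_max(x, 0, n), -n, 0)

def Ln(x, n):   # L_n = v_n ^_n
    return run_max(run_min(x, -n, 0), 0, n)

def LnUn_os(x, n):
    # Eq. (16): OS(n+1, n+1) o OS(2n+1, 1) o OS(n+1, n+1)
    y = run_max(x, 0, n)        # v_n
    z = run_min(y, -2 * n, 0)   # ^_2n = ^_n o ^_n
    return run_max(z, 0, n)     # v_n
```

Because two backward running minima of width n + 1 compose exactly into one of width 2n + 1, the three-stage form agrees with the direct composition Ln ◦ Un even on truncated windows.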
Although order statistics are core nonlinear digital filters, the decomposition in Eq. (16) is not featured in standard textbooks. The maximum (or minimum) of a window of numbers is the easiest order statistic. No threshold decomposition or histogram-type algorithm is required to compute (∨nx)i. Furthermore, in most cases the computation of (∨nx)i+1 amounts to a comparison of just two numbers: xi+n+1 and (∨nx)i. Once the series y = ∨nx is computed, we get z = ∧2ny in analogous fashion, and similarly from it ∨nz = LnUnx. (See II.C.4 for more on numerics.)

After these computational benefits of LULU operators, some words of a more aesthetic nature are indicated. While all stack filters of window length 2n + 1 arise as suprema of infima of some obvious 2n + 1 (projection) operators, one may alternatively start out with just two operators, the electron and the proton (the elementary operators ∨ and ∧), and take arbitrary compositions thereof. The resulting class of ∨,∧-operators F comprises all LULU operators. In fact, LULU operators are neatly characterized within this class: the following properties are equivalent (see Eq. (34) in III.B.3 and the subsequent remark):

1. F is a LULU operator.
2. F is idempotent.
3. F contains equally many ∨ and ∧.

4. Teasing a Little Linearity From Stack Filters

While trying to distill the essence of the proof of Rohwer (2005, Theorem 6.15), i.e., Theorem 4 in II.B.3, the second author derived the following generalization. When considering a stack filter Φ, can we ever expect that Φ(y + z) = Φ(y) + Φ(z)? Certainly not for all y, z, but how should y, z be related in order to tease a bit of linearity out of Φ? The answer we provide is that it works when y and z are based on a common source x; more precisely, y = Ψ1x and z = Ψ2x for some suitable operators Ψ1 and Ψ2. The details are as follows.

Suppose we are given a concept of a good interval. This can be any property of discrete intervals I = {x1, . . . , xm} (xi ∈ R). Enter a class K ⊆ R^Z of series and a stack filter Φ on R^Z.
We say that Φ likes K if the following holds: for each fixed x ∈ K and i ∈ Z there is some good interval {xj, . . . , xj+m} and a corresponding⁵ PBF b′ such that, for all y ∈ R^Z for which xα → yα (j ≤ α ≤ j + m) happens to be an increasing map,

(Φy)i = b′(yj, . . . , yj+m). (17)
⁵ This b′ is usually different from (and easier than) the PBF b underlying Eq. (13) of Φ. See the concrete application after Theorem 16.
In particular, putting y = x, the result is (Φx)i = b′(xj, . . . , xj+m). So much about Φ. As to Ψ1, Ψ2, call an operator (not necessarily a stack filter) Ψ on R^Z interval trend preserving (ITP) on I = {xj, . . . , xj+m} if (∀xα, xβ ∈ I)
xα ≤ xβ ⇒ (Ψx)α ≤ (Ψx)β. (18)
Notice the relation to NTP (II.B.2), which concerns all series x and all intervals {xi, xi+1} (i ∈ Z). However, restricted to a particular interval I, ITP is clearly stronger than NTP.

Theorem 16. Referring to a fixed concept of good interval, suppose the stack filter Φ likes the class K ⊆ R^Z. Furthermore, suppose Ψ1, Ψ2 are operators on R^Z that are ITP on all good subintervals of series x ∈ K. Then Φ(Ψ1 + Ψ2)x = ΦΨ1x + ΦΨ2x for all x ∈ K. If, additionally, Ψ1 − Ψ2 is ITP on all good subintervals of series x ∈ K, then also Φ(Ψ1 − Ψ2)x = ΦΨ1x − ΦΨ2x for all x ∈ K.

Proof. Fixing x ∈ K and i ∈ Z, we show that [Φ(Ψ1 + Ψ2)x]i = [ΦΨ1x]i + [ΦΨ2x]i. From Fact (17) we have (Φx)i = b′(xj, . . . , xj+m) for some good interval I = {xj, xj+1, . . . , xj+m} (observe that xi need not be in I). Say (Φx)i = xk ∈ I. Set y := Ψ1x. The fact that Ψ1 is ITP on I amounts to saying that σ(xα) := yα (j ≤ α ≤ j + m) is an increasing map. Thus
[Φ(Ψ1 x)]i = (Φy)i = b′(σ(xj), . . . , σ(xj+m)) = σ(b′(xj, . . . , xj+m)) = σ(xk) = yk,

where the second equality holds by (17) and the third by (8). Setting z := Ψ2 x, we analogously get [Φ(Ψ2 x)]i = zk. Clearly, with Ψ1 and Ψ2 also Ψ1 + Ψ2 is ITP on all good subintervals of series x ∈ K. Setting v := (Ψ1 + Ψ2)x, we therefore have [Φ(Ψ1 + Ψ2)x]i = vk = yk + zk = [ΦΨ1 x]i + [ΦΨ2 x]i. It does not follow automatically from Ψ1, Ψ2 being ITP that Ψ1 − Ψ2 is as well. But if Ψ1 − Ψ2 happens to be ITP, the above argument carries over to Ψ1 − Ψ2 (put w := (Ψ1 − Ψ2)x): [Φ(Ψ1 − Ψ2)x]i = wk = yk − zk = [ΦΨ1 x]i − [ΦΨ2 x]i.
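The commutation step labeled (8) in this proof, namely that the real (max/min) extension of a PBF commutes with an increasing map applied coordinate-wise, can be replayed numerically. The sketch below is our own illustration; the particular PBF b and the map sigma are arbitrary choices, not taken from the text.

```python
# Check of the commutation step (8): the real extension of a positive
# Boolean function commutes with any increasing map applied coordinate-wise.
# Both b and sigma below are illustrative choices of our own.

def b(x, y, z):
    # real extension of the PBF b(x, y, z) = xy + yz
    return max(min(x, y), min(y, z))

def sigma(t):
    # an arbitrary strictly increasing map R -> R
    return t ** 3 + 2 * t

samples = [(0.5, -1.2, 3.0), (2.0, 2.0, -0.7), (-3.1, 0.0, 0.4)]
commutes = all(b(*map(sigma, s)) == sigma(b(*s)) for s in samples)
```

Since sigma is strictly increasing, the max/min selection picks the same coordinate before and after applying it, so the two sides agree exactly.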
116
ROHWER AND WILD
Let us prove Theorem 4 of II.B.3 in this framework. Call an interval I good if it either is monotone of cardinality n + 1, or is of type I = {xj > xj+1 = · · · = xj+n < xj+n+1}. Setting

K := Mn−1, Φ := Un, Ψ1 := id, Ψ2 := AUn,
we must verify Condition (17) and check that all of id, AUn, id − AUn are ITP on good intervals. (Of course, id is the identity map.) Fix x ∈ Mn−1 and i ∈ Z. We distinguish two cases below.

First case: (Un x)i = xi. In view of x ∈ Mn−1, it easily follows that either xi ≥ xi+1 ≥ · · · ≥ xi+n or xi−n ≤ · · · ≤ xi or xj = · · · = xi = · · · = xj+n for some j. All three constitute a good interval. If b′ is the PBF that yields the maximum of n + 1 numbers, then in all three cases b′ yields (Un x)i. Let y ∈ RZ be such that either yi ≥ · · · ≥ yi+n or yi−n ≤ · · · ≤ yi or yj = · · · = yi = · · · = yj+n. In all three cases, b′ yields yi, and from the definition of Un it follows at once that yi = (Un y)i (independent of what y is outside this (n + 1)-window).

Second case: (Un x)i ≠ xi, hence (Un x)i > xi. In view of x ∈ Mn−1, this implies that xj > xj+1 = · · · = xi = · · · = xj+n < xj+n+1 for some j, and so (Un x)i = xj ∧ xj+n+1. Thus {xj, . . . , xj+n+1} is a good interval, and we define the corresponding PBF as b′(yj, . . . , yj+n+1) := yj ∧ yj+n+1. Let now y ∈ RZ be such that xα → yα (j ≤ α ≤ j + n + 1) is increasing. If say (Un x)i = xj then b′(yj, . . . , yj+n+1) = yj (since xj ≤ xj+n+1 implies yj ≤ yj+n+1). But yj = (Un y)i by the very definition of Un.

As to id being ITP, this is trivial. As to AUn, with A and Un also AUn is NTP. Obviously NTP is equivalent to ITP on any monotone interval. What about good intervals of type {xj > xj+1 = · · · = xj+n < xj+n+1}? Say w.l.o.g. xj ≤ xj+n+1. The other cases xα ≤ xβ in Eq. (18) being trivial (by NTP), it remains to argue that (AUn x)j ≤ (AUn x)j+n+1. This follows from

(Un x)j = · · · = (Un x)j+n < (Un x)j+n+1,
which is

xj = · · · = xj < xj+n+1, (19)
and from A being NTP. As to id − AUn, by assumption AUn is not just NTP but also difference reducing. This implies that for a monotone interval xα ≤ · · · ≤ xβ, one has 0 ≤ (AUn x)β − (AUn x)α ≤ xβ − xα, and so [(id − AUn)x]α ≤ [(id − AUn)x]β. As to the case xj > xj+1 = · · · = xj+n < xj+n+1, say again xj ≤ xj+n+1. Since A is difference reducing, it follows with Eq. (19) that

(AUn x)j+n+1 − (AUn x)j ≤ (Un x)j+n+1 − (Un x)j = xj+n+1 − xj,
whence again [(id − AUn)x]j ≤ [(id − AUn)x]j+n+1. This completes the proof of Theorem 4 in II.B.3. See also the remarks after Corollary 33 in III.E.4.

5. Linear Combinations of Stack Filters

We saw that each stack filter satisfies Properties (L), (I), (C), and (HT). Actually, in view of Fact (8), the following is true:

An operator Φ : RZ → RZ is a stack filter iff it satisfies Properties (L), (I), (C), and (HT). (20)

Notice that Property (C) entails both Properties (VT) and (SI) because addition of a constant, or multiplication with a nonnegative constant, are both increasing maps σ : R → R. Thus each stack filter is a smoother. But the latter class is bigger, as will be clear from the sequel. In the remainder of III.A.5, becoming increasingly specific, we look at linear, affine, and convex combinations of stack filters.

There is a paucity of data on unrestricted linear combinations Φ = λ1Φ1 + · · · + λnΦn (λi ∈ R) of stack filters Φi. For example, what about a type of normal form akin to Theorem 14? Nevertheless, Properties (L) and (HT) carry over from the Φi's to Φ, the latter because the shift operator E is a linear selfmap of RZ. But lacking Property (VT), such operators Φ are barely useful for smoothing; however, they appear anyway as Φ = Ψ − I (Ψ a stack filter, I the identity map) and challenge us to check their idempotency (see Sections II and III.E.4).

Let us now look at affine combinations Φ = λ1Φ1 + · · · + λnΦn, so by definition λ1 + · · · + λn = 1. Now Property (VT) is inherited:

Φ(c + x) = Σi λiΦi(c + x) = Σi λi(c + Φi(x)) = (Σi λi)c + Σi λiΦi(x) = c + Φ(x).

It follows that the class of smoothers, being defined by Properties (SI), (VT), and (HT), is closed under affine combinations. In particular, each affine combination of stack filters is a smoother; such as the average value in a centered (2n + 1)-window:

(Pn x)i := (1/(2n + 1)) (xi−n + xi−n+1 + · · · + xi + · · · + xi+n).
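The window average Pn can be sketched as follows; we work on a finite cyclic sample purely for convenience (the text works on all of Z, so the wrap-around convention is our own simplification):

```python
# The window average Pn on a finite sample; indices are treated cyclically
# for convenience only (the text works on all of Z).

def P(n, x):
    N = len(x)
    return [sum(x[(i + k) % N] for k in range(-n, n + 1)) / (2 * n + 1)
            for i in range(N)]

x = [0.0, 1.0, 5.0, 1.0, 0.0, 0.0]
y = P(1, x)                            # centered 3-window averages
shifted = P(1, [v + 4.0 for v in x])   # adding a constant: Property (VT)
```

Since the weights 1/(2n + 1) are nonnegative and sum to 1, Pn is even a convex combination of shift operators, hence a smoother by the closure property just established.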
Another useful smoother (cf. II.C.2) is Qn := Ln + Un − I (here 1 + 1 − 1 = 1). A more exotic affine combination of stack filters would be

(Φx)i := (2/3)[(xi−3 ∧ xi+2) ∨ (xi−1 ∧ xi+1)] − (1/4)(xi ∨ xi+1) + (7/12)(xi−3 ∧ xi−2 ∧ xi−1)

(note 2/3 − 1/4 + 7/12 = 1). Question: How much bigger is the family of smoothers than the family of affine combinations of stack filters?

Consider now convex combinations Φ = λ1Φ1 + · · · + λnΦn, so additionally to λ1 + · · · + λn = 1 we have 0 ≤ λi ≤ 1 for all 1 ≤ i ≤ n. The key effect is that now Property (I) is inherited as well. Operators RZ → RZ (actually Rn → Rn) that satisfy Properties (I) and (VT) were coined topical in Gunawardena (2003). It follows that the class of topical operators is closed under convex combinations. In particular, each convex combination of stack filters is a topical operator. Topical operators occur in a wide variety of contexts, for example, in discrete event systems, and recently even the Perron–Frobenius theorem (classically about eigenvalues of positive matrices) was proved in that setting. Whether topical operators befriend nonlinear smoothing remains to be seen. Figure 21 summarizes III.A.5.

B. Abstract Semigroups and Galois Connections

Here we distill, and partly elaborate on, those parts of LULU theory and MM (treated in III.C) that are purely semigroup-theoretic or are a matter of Galois connections. These build the basis for Section III.D. The direction is from the general to the specific: semigroups → ordered semigroups → Galois connections. Albeit later applied to LULU theory, we trust that Theorems 20, 21, and 23 also appeal to the pure semigroup theorist.

1. Semigroups and Their Idempotent Elements

Recall that a semigroup (H, ∗) is a set H endowed with an associative binary operation—that is, (a ∗ b) ∗ c = a ∗ (b ∗ c) for all a, b, c ∈ H. We use the letter H (as in German Halbgruppe) because S is reserved for other purposes (in Sections II.A–II.C, and III.E). For notational convenience, we usually write ab for a ∗ b. An idempotent is an element a ∈ H such that a^2 = aa = a.
FIGURE 21. A Venn diagram summary of III.A.5.
In this section, we are interested mainly in how and when the idempotency of elements is carried over to products thereof. The easiest instance is that a and b are commuting idempotents; then ab is idempotent as well:

(ab)(ab) = a(ba)b = a(ab)b = (aa)(bb) = ab.

Concerning the applications we have in mind, it is rarely the case that idempotents a, b commute, but some weaker properties, such as aba = ba (which is implied by ab = ba but not conversely), may still have the desired effect (Lemma 17). We set Id(H) := {a ∈ H : a^2 = a}. A semigroup B is a band if Id(B) = B. Here are some fundamental examples. A left zero semigroup X is such that xx′ = x for all x, x′ ∈ X. Thus, put on the left, every element acts like zero. Dually, a right zero semigroup Y has yy′ = y′ for all y, y′ ∈ Y. Obviously X and Y are bands. Hence, the direct product X × Y (with the operation defined component-wise) is a band as well. In fact, any a = (x, y) and b = (x′, y′) in X × Y satisfy:

aba = (x, y)(x′, y′)(x, y) = (x(x′x), (yy′)y) = (x, y) = a.
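This computation is easy to replay exhaustively on a small example. The sketch below (our own construction) builds the direct product of a two-element left zero semigroup and a three-element right zero semigroup and verifies aba = a for all pairs:

```python
# Direct product of a left zero semigroup X (xx' = x) and a right zero
# semigroup Y (yy' = y') with component-wise multiplication; the identity
# aba = a of rectangular bands is verified exhaustively.

X = ["x1", "x2"]
Y = ["y1", "y2", "y3"]

def mul(a, b):
    # left zero in the first coordinate, right zero in the second
    return (a[0], b[1])

band = [(x, y) for x in X for y in Y]
rectangular = all(mul(mul(a, b), a) == a for a in band for b in band)
```

The product ab = (x, y′) indeed sits where the row of a meets the column of b, as in the rectangular grid picture of footnote 7.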
It can be shown that conversely any semigroup satisfying aba = a for all a, b must be isomorphic6 to the direct product B of a left zero with a right zero semigroup. Such bands B are called rectangular.7 In each semilattice (T, ≤) (see II.A.4) we trivially have a ∨ a = a and a ∨ b = b ∨ a. Also (a ∨ b) ∨ c = a ∨ (b ∨ c) (exercise), and hence each semilattice is a commutative band (T, ∨). Again, every commutative band (T, ∗) turns out to be a semilattice (with a ≤ b being defined by a ∗ b = a). Surprisingly, every band B is composed in a lucid way as a semilattice of rectangular bands. Namely, there is a semilattice (T, ∨), and rectangular bands Bt (t ∈ T), such that B is their disjoint union and such that a ∈ Bt, b ∈ Bt′ implies ab ∈ Bt∨t′. Here is another crucial fact. Every finitely generated band is in fact finite. If μ(n) is the maximum cardinality of an n-generated band, then

μ(1) = 1, μ(2) = 6, μ(3) = 159, μ(4) = 332380, μ(5) = 2751884514765, μ(n) < ∞. (21)
We reiterate that products of idempotents are generally not idempotent. All that Eq. (21) implies is: If all products of, for example, three idempotents happen to be idempotent, then there are at most 159 such products. An n-generated band with exactly μ(n) elements is called free.

Example 5. Each 2-generated band B = ⟨a, b⟩ has at most μ(2) = 6 elements, in which case the accompanying semilattice is T = {t, t′, t ∨ t′} and B looks as shown in Figure 22:
FIGURE 22. The structure of the free 2-generated band as a semilattice of rectangular bands.
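The bound μ(2) = 6 from Eq. (21) can be reproduced by brute force. In the sketch below (our own check), words over {a, b} are multiplied by concatenation and reduced with the band law ww = w; over a two-letter alphabet every word of length ≥ 4 contains a squared factor, and the square-free words are exactly a, b, ab, ba, aba, bab.

```python
# Brute-force confirmation that a band generated by two elements has at
# most mu(2) = 6 elements: close {a, b} under concatenation followed by
# repeated deletion of a squared factor ww -> w.

def reduce_word(w):
    changed = True
    while changed:
        changed = False
        for i in range(len(w)):
            for l in range(1, (len(w) - i) // 2 + 1):
                if w[i:i + l] == w[i + l:i + 2 * l]:
                    w = w[:i + l] + w[i + 2 * l:]   # delete one copy of ww
                    changed = True
                    break
            if changed:
                break
    return w

elems = {"a", "b"}
while True:
    closure = elems | {reduce_word(u + v) for u in elems for v in elems}
    if closure == elems:
        break
    elems = closure
```

The closure stabilizes at the six elements a, b, ab, ba, aba, bab, matching the diagram of Figure 22.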
As fundamental as normal subgroups for groups are the Green's relations L and R for semigroups H. These are two equivalence relations on H that are

6 Let us just prove parts of that, namely, that aba = a for all a, b implies a^2 = a for all a (Howie, 1976; p. 96). Indeed, setting b = a in aba = a yields a^3 = a for all a ∈ B, whence a^4 = a^2, whence a = a(a^2)a = a^4 = a^2.
7 The name indicates that the elements of B can be displayed in a rectangular grid in such a way that for all a, b ∈ B, the product ab sits where the row of a intersects the column of b. See Example 5.
very much dual to each other. By an application of associativity (Howie, 1976; p. 39), the relational products L ◦ R and R ◦ L coincide. As a consequence, D := L ◦ R = R ◦ L is an equivalence relation as well. Thus,

aDb ⇔ (∃c ∈ H) aLc and cRb.

For us, it suffices to define L and R on the subset Id(H) ⊆ H. In this case, the definition simplifies a bit (Howie, 1976, Example 2, p. 53):

aLb :⇔ ab = a and ba = b,
aRb :⇔ ab = b and ba = a.
In the case of bands B, the D-classes are nothing more than the previously mentioned rectangular bands Bt. To fix ideas, the R-classes contained in the large D-class in Example 5 are {ab, aba} and {bab, ba}, while the L-classes it contains are {ab, bab} and {aba, ba}. One has ab D ba since ab L bab and bab R ba.

Lemma 17. Suppose the elements e, d of a semigroup satisfy the relations (de)e^n d^(n+1) = e^n d^(n+1) and (ed)d^n e^(n+1) = d^n e^(n+1) for all n ≥ 0. Putting an := e^n d^n and bn := d^n e^n yields

am am′ = amax(m,m′), bn bn′ = bmax(n,n′), (22)
am bn am′ = bn am′ (m ≤ m′), bn am bn′ = am bn′ (n ≤ n′), (23)
and each product x of e's and d's can be rewritten in (at least) one of four ways:

1. bn1 am1 bn2 am2 · · · (m1 > m2 > · · ·, n1 > n2 > · · ·),
2. am1 bn1 am2 bn2 · · · (m1 > m2 > · · ·, n1 > n2 > · · ·),
3. d^s e^t bn1 am1 bn2 · · · (m1 > m2 > · · ·, n1 > n2 > · · ·) (0 ≤ s < t),
4. e^t d^s am1 bn1 am2 · · · (m1 > m2 > · · ·, n1 > n2 > · · ·) (0 ≤ t < s).

Furthermore, if x contains the same number of d's and e's, it can be rewritten in form 1 or 2. The proof of Lemma 17 can be modeled almost literally on the proofs of Rohwer and Wild (2002; Theorems 5.1 and 5.8). Note that for n = 0, the hypotheses about e and d in Lemma 17 reduce to ded = d and ede = e. Here is a concrete calculation illustrating Eq. (23):
a2 b3 a4 = e^2 d^2 d^3 e^3 e^4 d^4 = e(ed)d^4 e^7 d^4 = ed^4 e^7 d^4 = (ed)d^3 e^7 d^4 = d^3 e^7 d^4 = b3 a4.

Neither e, d, nor am, bn in Lemma 17 need be idempotent. On the other hand, the elements am, bn in Theorem 18 below are idempotent [that follows
from setting m = m′, n = n′ in Eq. (22)], but not necessarily linked to any other elements e, d. The proof of Theorem 18 is easily modeled on the arguments given in Rohwer and Wild, 2002 (pp. 153–154). In III.D.1, we shall illustrate Theorem 18 in a more specific setting.

Theorem 18. Suppose the elements am (m ≤ M) and bn (n ≤ N) of the semigroup H satisfy (22) and (23). Then the subsemigroup B generated by the am's and bn's is a band of cardinality

|B| ≤ C(M + N + 2, N + 1) − 2,

where C(·, ·) denotes the binomial coefficient. In fact, B is a semilattice of right zero semigroups; each x ∈ B can be written in form 1 or 2 (cf. Lemma 17), and the cardinality bound is sharp iff distinct forms always represent distinct elements of B.

The Green's relations L and R are completely symmetric on an abstract level, but R comes to the fore8 if H is a semigroup of self-maps on P (where P is any fixed set and the semigroup operation is composition of maps). In this case, it makes sense to consider the range of a ∈ H, Ran(a) := {a(x): x ∈ P}, and the domain of invariance

Inv(a) := {x ∈ P : a(x) = x}.
These facts are well known in semigroup theory and easy to verify:

(∀a ∈ H) Inv(a) ⊆ Ran(a), (24)
(∀a, b ∈ H) ab = b ⇔ Ran(b) ⊆ Inv(a), (25)
(∀a ∈ H) Inv(a) = Ran(a) ⇔ a ∈ Id(H), (26)
(∀a, b ∈ Id(H)) Inv(a) = Inv(b) ⇔ aRb. (27)
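Facts (24) and (26) are easy to confirm exhaustively for the semigroup of all self-maps of a small set; the following sketch (our own finite test bed) does so for P = {0, 1, 2}:

```python
# Exhaustive check of Facts (24) and (26) for all self-maps of P = {0, 1, 2}:
# Inv(a) is always contained in Ran(a), and Inv(a) = Ran(a) holds exactly
# for the idempotent maps.

from itertools import product

P = range(3)
maps = list(product(P, repeat=3))      # a[x] is the image of x under a

def ran(a):
    return set(a)

def inv(a):
    return {x for x in P if a[x] == x}

def idempotent(a):
    return all(a[a[x]] == a[x] for x in P)

fact24 = all(inv(a) <= ran(a) for a in maps)
fact26 = all((inv(a) == ran(a)) == idempotent(a) for a in maps)
```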
2. Ordered Semigroups

An ordered semigroup H is endowed with a partial order ≤ that is compatible with the semigroup operation:

(∀a, b, c ∈ H) a ≤ b ⇒ (ca ≤ cb and ac ≤ bc). (28)
For later use, let us verify that in particular every semigroup H generated by increasing selfmaps a, b, . . . on a poset P satisfies (28). Indeed, if a, b, c

8 If mappings were to be written on the right, that is, as (x)a instead of a(x), it would be L.
are such that a(x) ≤ b(x) for all x ∈ P, then c(a(x)) ≤ c(b(x)) (since c is increasing) and a(c(x)) ≤ b(c(x)) (trivial). In case B is an ordered band, we emphasize that the order on B need not bear any relation with the semilattice order that connects the rectangular subbands (= D-classes) of B. Here is an immediate adaptation to ordered semigroups of a result of Schonfeld and Goutsias (1991).

Theorem 19. Let H be an ordered semigroup. (1) If xi, b, c are idempotents of H such that b ≤ xi ≤ c for all 1 ≤ i ≤ m, then any product of these elements that contains both b and c is idempotent. (2) If a1 ≤ a2 ≤ · · · ≤ an are idempotents, then any product of them is idempotent.

Proof. As to (1), it actually suffices that b^2 ≥ b and c^2 ≤ c. To see this, a typical example suffices:

(x2bx1x2cx3x1)(x2bx1x2cx3x1) ≤ x2bx1x2c^8x3x1 ≤ x2bx1x2cx3x1,
(x2bx1x2cx3x1)(x2bx1x2cx3x1) ≥ x2b^8x1x2cx3x1 ≥ x2bx1x2cx3x1.

As to (2), since the ai's are linearly ordered, any product ai1 ai2 · · · aik contains a smallest element b and a largest element c. Thus (1) can be applied.

One says that an element b of a semigroup H is an inverse of a ∈ H if a = aba and b = bab. In this case, a is also an inverse of b, and both ab and ba are obviously idempotent. For instance, any two elements of a rectangular band are mutually inverse. The next theorem identifies the essential ingredients in a result about Galois connections [see Eq. (34)] and makes it a mere matter of ordered semigroups.

Theorem 20.
Let d and e be elements of an ordered semigroup such that

de ≤ ed, and d^n, e^n are mutually inverse for all n ≥ 1. (29)

Then every product x of d's and e's with the same number of both is idempotent.

Proof. As seen above, from d^n and e^n being mutually inverse it follows that both d^n e^n and e^n d^n are idempotent. Hence de ≤ ed implies d^2 e^2 = d(de)e ≤ d(ed)e = (de)(de) = de. Similarly, e^2 d^2 = e(ed)d ≥ e(de)d = (ed)(ed) = ed. Now if d^n e^n ≤ d^(n−1) e^(n−1) (true for n = 2), then d^(n+1) e^(n+1) = d(d^n e^n)e ≤ d(d^(n−1) e^(n−1))e = d^n e^n. Dually for e^n d^n, and so

· · · ≤ d^3 e^3 ≤ d^2 e^2 ≤ de ≤ ed ≤ e^2 d^2 ≤ · · ·. (30)
Next we show that x is a product of idempotents of type ai := d^i e^i and bj := e^j d^j, and so x is idempotent by Eq. (30) and Theorem 19(2). Assume there were products of equally many d's and e's that are not products of ai's and bj's. Among all these, we choose a product y with the minimum total number of d's and e's and seek a contradiction. Since y = d^m1 e^n1 or y = e^n1 d^m1 implies m1 = n1 by assumption, and whence y = am1 or y = bm1, we must have y = · · · e^n2 d^m1 e^n1 or y = · · · d^m2 e^n1 d^m1 (all exponents ≥ 1). W.l.o.g. say the latter.

Case 1: n1 ≥ m1. Then z := · · · d^m2 e^(n1−m1) still has equally many d's and e's, but fewer in total than y, and so z is a product of ai's and bj's. Hence, y = z bm1 is also such a product, a contradiction.

Case 2: n1 < m1. If we had n1 ≤ m2, then, in view of d^n1 e^n1 d^n1 = d^n1, we deduce y = · · · d^(m2−n1) d^n1 e^n1 d^n1 d^(m1−n1) = · · · d^(m2+m1−n1), contradicting the minimality property of y. Therefore m2 < n1 < m1. If we had n2 ≥ m2 (recall m2 < n1), a similar contradiction due to e^m2 d^m2 e^m2 = e^m2 would arise. Hence, n2 < m2. Generally, in our fixed representation

y = e^nr d^mr · · · e^n1 d^m1, respectively y = d^mr · · · e^n1 d^m1,

one must have

nr < mr < · · · < n1 < m1, respectively mr < · · · < n1 < m1.

In both cases ni < mi throughout, so y contains fewer e's than d's—the desired contradiction to y having equally many of both.
Some semigroups H have an identity—a (uniquely determined) element 1 ∈ H such that 1x = x1 = x for all x ∈ H. Such an H is called a monoid. Here are two (individually) sufficient conditions that imply parts of the hypothesis (29) in Theorem 20.

Theorem 21.
Let d, e be elements of an ordered semigroup H .
(1) If H has an identity 1 and de ≤ 1 ≤ ed, then d^n, e^n are mutually inverse for all n ≥ 1.
(2) Suppose only that de ≤ ed but that additionally (de)e^n d^(n+1) = e^n d^(n+1) and (ed)d^n e^(n+1) = d^n e^(n+1) for all n ≥ 0. Then again d^n, e^n are mutually inverse (n ≥ 1).

Proof. As to (1), from de ≤ 1 follows d^(n+1) e^(n+1) = d^n (de) e^n ≤ d^n 1 e^n = d^n e^n. Similarly, 1 ≤ ed implies e^n d^n ≤ e^(n+1) d^(n+1), and so Eq. (30) is
strengthened to

· · · ≤ d^3 e^3 ≤ d^2 e^2 ≤ de ≤ 1 ≤ ed ≤ e^2 d^2 ≤ e^3 d^3 ≤ · · ·.

From d^n e^n ≤ 1 follows d^n e^n d^n ≤ d^n. Dually 1 ≤ e^n d^n yields d^n ≤ d^n e^n d^n. Hence d^n e^n d^n = d^n, and ditto e^n d^n e^n = e^n. As to (2), the fact that d^n and e^n are mutually inverse will be evident already from the case n = 3:

d^3 e^3 d^3 = d^2 (de) e^2 d^3 = d^2 e^2 d^3 = d(de)ed^3 = ded^3 = d^3,
e^3 d^3 e^3 = e^2 (ed) d^2 e^3 = e^2 d^2 e^3 = e(ed)de^3 = ede^3 = e^3.
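Theorem 21(1) can be observed concretely in the ordered monoid of increasing self-maps of cyclic 0/1 sequences, with d the dilation by the segment {0, −1} and e the adjoint erosion (compare II.A.1 and III.C.1). The exhaustive check below is our own instantiation; the length 6 and the exponents tested are arbitrary choices.

```python
# Concrete instance of Theorem 21(1) on cyclic 0/1 sequences of length 6:
# d = dilation ((dx)_s = x_s or x_{s+1}), e = adjoint erosion
# ((ex)_s = x_s and x_{s-1}); then de <= id <= ed, and d^n, e^n are
# mutually inverse.

from itertools import product

N = 6

def d(x):
    return tuple(x[s] | x[(s + 1) % N] for s in range(N))

def e(x):
    return tuple(x[s] & x[(s - 1) % N] for s in range(N))

def power(f, n, x):
    for _ in range(n):
        x = f(x)
    return x

def below(u, v):
    return all(a <= b for a, b in zip(u, v))

sandwich = True   # de <= id <= ed
inverse = True    # d^n e^n d^n = d^n and e^n d^n e^n = e^n
for x in product((0, 1), repeat=N):
    sandwich = sandwich and below(d(e(x)), x) and below(x, e(d(x)))
    for n in (1, 2, 3):
        dn, en = power(d, n, x), power(e, n, x)
        inverse = inverse and power(d, n, power(e, n, dn)) == dn
        inverse = inverse and power(e, n, power(d, n, en)) == en
```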
Theorem 21(1) relates to left pregroups (Lambek, 2006; p. 46), which are ordered monoids H such that each e ∈ H admits an element d ∈ H with de ≤ 1 ≤ ed. Here are two results of Matheron (Serra, 1982; p. 119), recast in the more illuminating setting of ordered semigroups (H, ≤) and Green's relation R. Notice that neither (H, ≤) nor its subposet (Id(H), ≤) need be lattices; that is, suprema and infima may fail to exist.

Lemma 22. Given an ordered semigroup (H, ≤), any g ≤ f in Id(H) generates an at most six-element subsemigroup that forms a sublattice L of (Id(H), ≤).

Proof. According to Theorem 19, the subsemigroup L ⊆ H generated by g, f is a subset of Id(H). By Eq. (21) and Example 5, it then follows that L comprises at most the six elements displayed on the left in Figure 23 (some might coincide). One verifies that the order relations postulated by the diagram are correct. For instance, from g ≤ f follows g(fg) ≤ f(fg) = fg. A priori, fgf is some upper bound of gf and fg. Let c ∈ Id(H) be any upper bound of gf, fg. Then c = c^2 ≥ (fg)(gf) = fgf, whence fgf is the smallest upper bound (i.e., fgf = gf ∨ fg). The same approach shows that gfg = gf ∧ fg.

Consider any elements x ∧ y ≤ x, y ≤ x ∨ y of a lattice. Then obviously x ∧ y = x ⇔ x ∨ y = y, and also x ∧ y = y ⇔ x ∨ y = x. This is neatly visualized by collapsing the transposed links (x ∧ y, x) and (y, x ∨ y) in the Hasse diagram of the lattice (see McKenzie et al., 1987).

Theorem 23. Let g ≤ f be idempotents in an ordered semigroup H. Then these equivalences hold (with the understanding that Inv(· · ·) only applies when H is a semigroup of self-maps):

fg ≤ gf ⇔ gfg = fg ⇔ fgf = gf ⇔ gf R fg
⇔ Inv(gf) = Inv(fg) ⇔ Inv(gf) = Inv(g) ∩ Inv(f) ⇔ Inv(fg) = Inv(g) ∩ Inv(f).

Proof. From fg ≤ gf follows fg = gf ∧ fg = gfg. But a collapsed link (gfg, fg) on the left in Figure 23 is equivalent to the transposed link (gf, fgf) being collapsed; and the latter conversely implies fg ≤ gf:
FIGURE 23. Collapse of a 6-element lattice to a 4-element lattice.
It is immediate that (gfg = fg and fgf = gf) ⇔ gf R fg. Recall that gf R fg ⇔ Inv(gf) = Inv(fg) by Eq. (27). The first inclusion in Inv(g) ∩ Inv(f) ⊆ Inv(gf) ⊆ Inv(g) is trivial, and the second holds by Eqs. (25) and (26) whenever g is idempotent. Hence, Inv(gf) = Inv(g) ∩ Inv(f) iff Inv(gf) ⊆ Inv(f). The latter holds by Eqs. (25) and (26) iff f(gf) = gf. Similarly, Inv(fg) = Inv(f) ∩ Inv(g) is equivalent to gfg = fg.

We promote the name Matheron pair for elements g ≤ f of an ordered semigroup that realize the equivalent statements of Theorem 23. Matheron pairs are encountered in action in Sections III.C.2, III.D.1, and III.E.4. Starting with Saitô (1962), who thoroughly investigated Green's relations in the case of a linearly ordered band, a lot of sporadic results about ordered semigroups have accumulated in various fields of mathematics. The deliberations in II.B.2 are just a minor reflection thereof. The time seems ripe for a coherent monograph dedicated to the algebraic theory of ordered semigroups, with hints to their many applications!

3. Galois Connections

Let P and Q be posets. An increasing map δ : P → Q is called residuated (Blyth, 2005; p. 7) if for each y ∈ Q there is a biggest element in P that
is mapped below y. In formulas, if the biggest element is termed ε(y), this reads:

(∀x ∈ P)(∀y ∈ Q) δ(x) ≤ y ⇔ x ≤ ε(y). (31)
(34) 9 This is in honor of Evariste Galois, killed at the age of 20 in a duel, who investigated these structures 200 years ago in an algebraic setting. (A similar loss to science was David Marr’s death at the youthful age of 36.) Note that the name Galois connection is sometimes reserved to a dual setup where δ and ε are antitone maps; see (Blyth, 2005; p. 14).
For the special case (δ, ε) = (∨, ∧) this has been shown independently in Rohwer and Wild, 2002; Theorem 5.8; moreover, here only products with equally many ∨ and ∧ are idempotent. Much of Blyth (2005) is dedicated to the question of how order-theoretic properties of P are reflected as algebraic properties of Res(P). Of foremost interest is the case where the posets happen to be lattices. This is just the beginning of the theory:

If L, L′ are complete lattices, then δ : L → L′ is residuated iff δ is a complete ∨-homomorphism. (35)

The sufficiency of the stated condition is obvious: Given y ∈ L′, the biggest element x̄ ∈ L mapped below y is x̄ := ∨{x: δ(x) ≤ y}, since δ(x̄) = ∨{δ(x): δ(x) ≤ y} ≤ y. For the necessity, see (Blyth, 2005; Theorem 2.8). Not surprisingly, δ+ : L′ → L is a complete ∧-homomorphism. For powerset lattices L = L′ = P(S) one has the following (Blyth, 2005; p. 8):

Let r be a binary relation on a nonempty set S. Then the mapping δr : P(S) → P(S) given by δr(X) := {y ∈ S: (∃x ∈ X)(x, y) ∈ r} is residuated. Conversely, every residuated mapping δ : P(S) → P(S) is of type δ = δr for some suitable binary relation r. (36)

C. Mathematical Morphology

We cite from the preface of Heijmans (1994): Mathematical morphology is a theory which is concerned with the processing and analysis of images, using operators and functionals based on topological and geometrical concepts. During the last decade, it has acquired a special status within the field of image processing, pattern recognition and, to a lesser extent, computer vision.

The basic role of lattice theory for MM is no secret and can be gleaned from the deliberations below. The results in Section III.B.2 indicate that semigroup theory may eventually entertain similar relations with MM. Here we focus on two cornerstones of MM that tie in well with Section III.E: first on structural openings introduced by Matheron as early as 1967, and second on the theory of envelopes.

1.
Structural Openings

A key concept in MM is that of a Galois connection (δ, ε) between complete lattices L and L′, although mathematical morphologists speak of an adjunction (δ, ε). Furthermore, a residuated map (complete join homomorphism)
δ : L → L′ is called a dilation, and ε : L′ → L is an erosion. These terms are motivated by the specific geometric meaning when L = P(S) is a powerset lattice.

Example 6. Consider the 2 × 2 square A = {(0, 0), (0, 1), (1, 0), (1, 1)}, a subset of Z2. Each X ⊆ Z2 can be viewed as a black-and-white image, the elements (i, j) of X being the black pixels, and Z2 \ X being the white background. The Minkowski sum of X and A is

X ⊕ A := X(0, 0) ∪ X(0, 1) ∪ X(1, 0) ∪ X(1, 1),

while the Minkowski subtraction is

X ⊖ A := X(0, 0) ∩ X(0, −1) ∩ X(−1, 0) ∩ X(−1, −1).

Here, for example, X(1, 1) is defined as the translate

X(1, 1) := {(i, j) + (1, 1) : (i, j) ∈ X} = {(i + 1, j + 1) : (i, j) ∈ X}.

As seen below, the mappings X → X ⊕ A and X → X ⊖ A yield an adjunction (δA, εA) on P(Z2). Hence the map δAεA (denoted X ◦ A in MM) must be an opening by Eq. (32) in III.B.3. It has a neat geometric meaning10 that lies at the root of MM (Heijmans, 1994; p. 89):

δAεA X is the union of all translates of A that are contained in X.
(37)

This can all be stated in terms of PBFs, which represent our language of choice. Namely, if x ∈ {0, 1}^Z2 is the characteristic function corresponding to the image X ⊆ Z2, and if y is the characteristic function corresponding to X ⊕ A, then

y(i,j) = 1 ⇔ (i, j) ∈ X(0, 0) ∪ X(0, 1) ∪ X(1, 0) ∪ X(1, 1) ⇔ x(i,j) = 1 or x(i,j−1) = 1 or x(i−1,j) = 1 or x(i−1,j−1) = 1.

Hence, switching from X to x, the definition of δA X becomes

(δA x)(i,j) := x(i,j) ∨ x(i,j−1) ∨ x(i−1,j) ∨ x(i−1,j−1).

Similarly,

(εA x)(i,j) := x(i,j) ∧ x(i+1,j) ∧ x(i,j+1) ∧ x(i+1,j+1).

It follows that

(δAεA x)(i,j) = (εA x)(i,j) ∨ (εA x)(i,j−1) ∨ (εA x)(i−1,j) ∨ (εA x)(i−1,j−1)

10 We dispense with the slightly less attractive meaning of the closing εAδA X (denoted X • A in MM).
= (x(i,j) ∧ x(i+1,j) ∧ x(i,j+1) ∧ x(i+1,j+1)) ∨ (x(i,j−1) ∧ x(i+1,j−1) ∧ x(i,j) ∧ x(i+1,j)) ∨ (x(i−1,j) ∧ x(i,j) ∧ x(i−1,j+1) ∧ x(i,j+1)) ∨ (x(i−1,j−1) ∧ x(i,j−1) ∧ x(i−1,j) ∧ x(i,j)).

Putting

C := {A(0, 0), A(0, −1), A(−1, 0), A(−1, −1)}, (38)

we may condense this formula to

(δAεA x)(i,j) = ∨_{B∈C} ∧_{(k,ℓ)∈B} x(k,ℓ)+(i,j), (39)

which is reminiscent of the formula in Theorem 14 (III.A.3). Thus δAεA may be considered as a stack filter—not of type RZ → RZ but of type {0, 1}^Z2 → {0, 1}^Z2 (see III.E.2 for a concise definition that extends both). Setting

C((i, j)) := {B(i, j) | B ∈ C} ((i, j) ∈ Z2), (40)

we may rephrase Eq. (39) like this:

(δAεA x)(i,j) = ∨_{C∈C((i,j))} ∧_{(g,h)∈C} x(g,h). (41)
LULU THEORY, IDEMPOTENT STACK FILTERS
131
According to Fact (35) in III.B.3, δA is a dilation. We claim that (εA y)s := {ys+t : t ∈ A} is the corresponding (unique) erosion. Indeed, suppose that δA x ≤ y, so {xs−t : t ∈ A} ≤ ys for all s ∈ S. (42) If we had x ε(y), then xs {ys+t : t ∈ A} for some s ∈ S. Hence xs ys+t0 for some s ∈ S, t0 ∈ A. But xs = x(s+t0 )−t0 must be ≤ ys+t0 according to Eq. (42). This contradiction shows that x ≤ εA y. Similarly, x ≤ εA y entails δA x ≤ y. Thus (δA , εA ) is an adjunction. One calls δA εA the structural opening induced by A. This argument works for any RS (R lattice), not just {0, 1}S = P (S). The following also works for arbitrary lattices RS . If A ⊆ S is a structuring element and C (s) (s ∈ S) is defined as the set of all translates of A that contain s, then trivially
(43) (∀s ∈ S) ∀C ∈ C (s) , (∀t ∈ C) C ∈ C (t). Not so obvious, but mutatis mutandis arguing as in Eq. (41), for all s ∈ S holds xt . (δA εA x)s = (44) C∈C (s) t∈C
For later use (III.E.3) we record the following. Given any collection of clusters C (s)(s ∈ S) that satisfy Eq. (43), there is a fixed subset A ⊆ S, such that each C (s) comprises exactly the translates of A that contain s. Namely, A can be taken as any element of C (0). Letting RS = RZ and looking at our proton and electron , we understand the deeper necessity for plus and minus signs occurring in ( x)s = xs ∨ xs+1 , respectively ( x)s = xs ∧ xs−1 (see II.A.1). Here the underlying structuring element A = {0, −1} ⊆ Z is a 1D line segment. Although arguably the most natural structuring elements, line segments are of minor interest12 to MM, which focuses on higher dimensions. Note that our two sample structuring elements A contained the zero of the Abelian group S = Z2 , respectively, S = Z. The effect is that εA ≤ I ≤ δA . If 0∈ / A, this is false, but always δA εA ≤ I ≤ εA δA [by Eq. (32) in III.B.3]. 12 For instance, the concept of a “strong filter” (Section III.C.2) is essential in both MM and LULU theory. The important fact that (δA εA )(εA δA ) is a strong filter whenever A is a line segment, is mentioned without proof in Serra (1982; p. 175). A proof was given in Wild (2005; Theorem 7), see also Section III.E.5. Similarly, the fact that a structural opening and closing induced by a line segment form a Matheron pair is mentioned without proof in Heijmans (1994; p. 413). See Section III.D.1 for a proof and more.
132
ROHWER AND WILD
Let us return to powersets. By Eq. (36) each adjunction on a powerset P(S) is due to an underlying binary relation. More can be said when S is an Abelian group and translation invariance is postulated (Matheron):
The translation invariant local13 adjunctions on P(S) are exactly the ones of type (δ_A, ε_A) with A ⊆ S finite. (45)
The translation invariant local openings γ on P(S) are exactly the suprema γ = ⋁_{A∈A} δ_A ε_A, where both A and all A ∈ A are finite. (46)
Proof of Fact (45). By Eq. (35) in III.B.3, and in view of P(S) ≅ {0,1}^S, δ is a ∨-homomorphism {0,1}^S → {0,1}^S, whence each component function x ↦ (δx)_s is a ∨-homomorphism from {0,1}^S to {0,1}. What does such a function look like? Locality ensures that for each s ∈ S there is a finite set A_s ⊆ S such that (δx)_s = ⋁{x_{s−t} : t ∈ A_s} for all x ∈ {0,1}^S. By translation invariance, A_s = A for all s ∈ S, and it follows that δ = δ_A.
As to Fact (46), we shall prove it, and even generalize it, along the way. To start with, any supremum of openings is easily seen to be an opening again. Furthermore, due to the finiteness conditions, the locality and translation invariance of all δ_A ε_A (A ∈ A) carry over to γ = ⋁_{A∈A} δ_A ε_A. Here is an example.
Example 7. Put S := Z_8 = {0, 1, ..., 7}, so addition and subtraction are modulo 8 (e.g., 2 − 7 = −5 = 3). If the structuring element is A := {1, 3, 4, 7} and α := δ_A ε_A, then
(αx)_0 = (ε_A x)_{−1} ∨ (ε_A x)_{−3} ∨ (ε_A x)_{−4} ∨ (ε_A x)_{−7}
= (x_0 ∧ x_2 ∧ x_3 ∧ x_6) ∨ (x_{−2} ∧ x_0 ∧ x_1 ∧ x_4) ∨ (x_{−3} ∧ x_{−1} ∧ x_0 ∧ x_3) ∨ (x_{−6} ∧ x_{−4} ∧ x_{−3} ∧ x_0).
For A′ = {3, 5, 6} and α′ := δ_{A′} ε_{A′} one obtains similarly
(α′x)_0 = (x_0 ∧ x_2 ∧ x_3) ∨ (x_{−2} ∧ x_0 ∧ x_1) ∨ (x_{−3} ∧ x_{−1} ∧ x_0).
13 When the word local is dropped in Facts (45) or (46), the statements remain true (Heijmans, 1994), provided the structuring elements A, as well as the family A, are allowed to be infinite.
Locality allows us to stay with standard (calculable) PBFs and to dispense with Minkowski sums and subtractions, which are not even standardized within MM (Heijmans, 1994, p. 84). More information on the calculability of the PBF approach follows in Sections III.E.3 and III.E.4.
LULU THEORY, IDEMPOTENT STACK FILTERS
133
Similarly, A″ = {2, 4} gives α″ := δ_{A″} ε_{A″} with
(α″x)_0 = (x_0 ∧ x_{−2}) ∨ (x_2 ∧ x_0).
With α, α′, α″ also γ := α ∨ α′ ∨ α″ is an opening. In view of various cancellations, for example x_0 ∧ x_2 ∧ x_3 ≤ x_0 ∧ x_2 for all x_0, x_2, x_3 ∈ {0, 1}, the formula for γ simplifies to:
(γx)_0 = (x_0 ∧ x_{−2}) ∨ (x_0 ∧ x_2) ∨ (x_0 ∧ x_{−1} ∧ x_{−3}) ∨ (x_0 ∧ x_{−3} ∧ x_{−4} ∧ x_{−6}). (47)
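Simplifications like Eq. (47) are easy to confirm by machine. The sketch below (our own illustration, with Z_8 arithmetic as in Example 7) checks that (γx)_0 agrees with the simplified formula on all 256 0,1-series, and that γ is idempotent:

```python
from itertools import product

N = 8

def structural_opening(A):
    """δ_A ε_A on {0,1}^{Z_8} for a structuring element A ⊆ Z_8."""
    def erode(x):
        return tuple(min(x[(s + t) % N] for t in A) for s in range(N))
    def dilate(x):
        return tuple(max(x[(s - t) % N] for t in A) for s in range(N))
    return lambda x: dilate(erode(x))

alpha, alpha1, alpha2 = (structural_opening(A)
                         for A in ((1, 3, 4, 7), (3, 5, 6), (2, 4)))

def gamma(x):                       # γ = α ∨ α′ ∨ α″ (pointwise supremum)
    return tuple(map(max, alpha(x), alpha1(x), alpha2(x)))

def gamma0(x):                      # right-hand side of Eq. (47), indices mod 8
    m = lambda *idx: min(x[i % N] for i in idx)
    return max(m(0, -2), m(0, 2), m(0, -1, -3), m(0, -3, -4, -6))

for x in product((0, 1), repeat=N):
    assert gamma(x)[0] == gamma0(x)      # Eq. (47)
    assert gamma(gamma(x)) == gamma(x)   # a supremum of openings is an opening
print("Eq. (47) and idempotence verified")
```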
We shall write B + s rather than B(s) for the translate of a set B. Since γ is translation invariant, its family of s-leaves is C(s) = {B + s | B ∈ C(0)} for all s ∈ Z_8, and so it suffices to give C(0), which is {{0,6}, {0,2}, {0,7,5}, {0,5,4,2}}. Let us scan the elements of the 3-leaf C := {0,5,4,2} + 3 = {3,0,7,5}. The element 7 ∈ C admits a 7-leaf which is contained in C, namely Γ_7 = {7,5}. Similarly, the 0-leaf Γ_0 = {0,7,5} is contained in C. From Example 7 it is fairly clear14 that one type of translation-invariant local opening γ is provided by stack filters P(S) → P(S) whose DNF cluster satisfies this "topological" property:
(∀s ∈ S)(∀C ∈ C(s))(∀i ∈ C)(∃Γ_i ∈ C(i)) Γ_i ⊆ C. (48)
So much for obtaining some translation-invariant local openings on P(S). That there cannot be others (even for many lattices R^S ≠ P(S)) will be an instance of Corollary 28 in III.E.3.
2. Envelopes
Another cornerstone of MM is the theory of envelopes, spearheaded by Matheron in the early 1980s. We draw on Serra (1994),15 and restrict ourselves to the theoretic essence. For a deeper understanding of how this relates to morphological filtering, see Serra (1994) and III.E.3. These are the key definitions: Let L be a complete lattice. An increasing operator Φ : L → L is an underfilter if Φ ∘ Φ ≤ Φ, and an overfilter if Φ ∘ Φ ≥ Φ. Any increasing idempotent operator Φ = Φ² is a (morphological)
14 That conversely any translation-invariant stack filter satisfying (48) is a supremum of structural openings (and hence a local opening) is plausible by reading Example 7 backwards (see also Wild, 2003, Example 14).
15 Despite some annoying typographical errors, this is a clearer introduction to envelopes than Heijmans (1994, Chapter 12) or the encyclopedic Serra (1982, Chapters 6 and 7).
filter.16 Special types of underfilters are the ∨-underfilters, defined by Φ ∘ (Φ ∨ I) = Φ, and special types of overfilters are the ∧-overfilters, defined by Φ ∘ (Φ ∧ I) = Φ. An idempotent ∨-underfilter is a ∨-filter, and an idempotent ∧-overfilter is a ∧-filter. If Φ is both a ∨-underfilter and a ∧-overfilter, then Φ is a strong filter. See III.E.5 for a Venn diagram illustration. From II.A.4 recall the definition of a closure system. On the lattice L^L of all self-maps α : L → L we investigate five crucial closure systems:
(1) {all increasing maps L → L};
(2) {all underfilters L → L};
(3) {all ∨-underfilters L → L};
(4) {all closings L → L};
(5) {all extensive maps L → L}.
For example, to see that (3) constitutes a closure system, consider any family Φ_i of ∨-underfilters (i ∈ I). Then Φ := ⋀_{i∈I} Φ_i is a ∨-underfilter because it is increasing (by (1)) and thus Φ(Φ ∨ I) ≥ ΦI = Φ. On the other hand, Φ_i(Φ ∨ I) ≤ Φ_i(Φ_i ∨ I) = Φ_i (i ∈ I), which yields
Φ(Φ ∨ I) = ⋀_{i∈I} Φ_i(Φ ∨ I) ≤ ⋀_{i∈I} Φ_i = Φ.
Denoting by Inc(α), F(α), F∨(α), α̂ the closure operators (envelopes in Serra's terminology) corresponding to (1), (2), (3), and (4), it is clear that
α ≤ Inc(α) ≤ F(α) ≤ F∨(α) ≤ α̂,  α ≤ I ∨ α  (α ∈ L^L).
The envelope for (5) is generally incomparable to the four above, but it is the easiest to compute; obviously I ∨ α is the least among all extensive maps ≥ α. Observe that (1) is self-dual in that it constitutes a dual-closure system as well. Dualizing the other four yields overfilters, ∧-overfilters, openings, and anti-extensive maps, respectively. The notation for the corresponding envelopes is self-explanatory:
α̌ ≤ G∧(α) ≤ G(α) ≤ Inc′(α) ≤ α,  I ∧ α ≤ α  (α ∈ L^L).
It follows at once from the definitions that:
Inv(F) ∩ Inv(G) = {underfilters} ∩ {overfilters} = {filters},
Inv(F∨) ∩ Inv(G∧) = {∨-underfilters} ∩ {∧-overfilters} = {strong filters}.
When α = Φ is increasing, the following formulas hold:
F(Φ) = Φ̂ ∘ Φ,  F∨(Φ) = Φ ∘ Φ̂,  G(Φ) = Φ̌ ∘ Φ,  G∧(Φ) = Φ ∘ Φ̌. (49)
16 Notice that speaking of a stack filter Φ does not entail that Φ is idempotent, neither is this implied for the many other types of filters that occur in signal processing. Only MM seems to make the link filter ↔ idempotent.
Moreover, in practice, Ran(Φ) is finite and one has
Φ̂ = (I ∨ Φ)^n,  Φ̌ = (I ∧ Φ)^n (50)
for n large enough. Fact (49) goes further still: If one successively calculates Φ → F(Φ) → G ∘ F(Φ), then not only is F(Φ) underpotent and GF(Φ) superpotent, but the operator G also does not destroy the underpotence inherited from the previous step Φ → F(Φ). Therefore FGF = GF. But this entails that (G, F) constitutes a Matheron pair (III.B.2). In particular, Inv(GF) = Inv(F) ∩ Inv(G); that is, only the idempotent maps are fixed by GF. Let us say that an increasing map Φ : L → L induces a closing (respectively opening) if I ∨ Φ is a closing (respectively I ∧ Φ an opening); in other words, if one can use n = 1 in Eq. (50). It is a one-liner to verify that each ∨-underfilter induces a closing, and each ∧-overfilter an opening. We cite Serra (1994, p. 6): "To say the truth, it is precisely this property17 that has suggested to investigate the classes of ∨-underfilters and ∧-overfilters." Again, the associated envelopes F∨ and G∧ form a Matheron pair, just as F and G already do. Hence, for each increasing Φ, one has F∨G∧(Φ) ≤ G∧F∨(Φ) and both are strong filters. Under suitable extra conditions on Φ and the underlying lattice L (modularity), one can guarantee that already G∧(Φ) (dually F∨(Φ)) is a strong filter. Independent of L, strong filters can also be characterized by either of these two properties:
(∀x, y ∈ L) (x ∧ Φ(x) ≤ y ≤ x ∨ Φ(x) ⇒ Φ(x) = Φ(y)),
Φ = (I ∨ Φ)(I ∧ Φ) = (I ∧ Φ)(I ∨ Φ) = Φ̂Φ̌ = Φ̌Φ̂.
The latter says that Φ can be decomposed into the commutative product of its opening and closing envelopes.
D. The Fine Structure of LULU Operators
The general results of III.B suffice to establish that the semigroup B(M, N) generated by L_1, ..., L_M, U_1, ..., U_N is a band: all possible compositions of these operators are idempotent. Section III.D.1, which is dedicated to the fine structure of B(M, N), is admittedly more for the semigroup enthusiast than for the digital filter practitioner.
But in III.D.2 we polish the proof of Rohwer and Wild (2002, Theorem 3.13), which gives Inv(U_n) and Inv(L_n), that is, the root signals of U_n and L_n. The structure of Inv(Φ) for arbitrary LULU operators Φ then follows easily from general results derived in Section III.B.
17 We may add: It does not take a ∨-underfilter to induce a closing; see Section III.E.3.
1. The Band B(M, N)
The basic LULU operators L_n and U_n comprise an equal number of electrons and protons. From the fact that (⋁, ⋀) is a Galois connection on R^Z, it follows from Fact (34) in Section III.B.3 that all LULU operators are idempotent. By mere semigroup theory, Eq. (21) in III.B.1, it follows further that the band B(M, N) generated by L_1, ..., L_M, U_1, ..., U_N is finite. By III.B.1 it must be a semilattice of rectangular bands. Here is the key to the fine structure of B(M, N).
Theorem 24. ⋁⋀^{n+1}⋁^{n+1} = ⋀^n⋁^{n+1} and ⋀⋁^{n+1}⋀^{n+1} = ⋁^n⋀^{n+1} (n ≥ 0).
Proof.18 The case n = 0 is to be interpreted as ⋁⋀⋁ = ⋁, respectively ⋀⋁⋀ = ⋀, and these relations hold for every Galois connection. So let n ≥ 1. To prove the first equality (the second is dual), it suffices by Theorem 15 of III.A.3 to consider 0,1-series x. Thus we show that for every 0,1-series x the 0,1-series z := ⋀^n⋁^{n+1}x is fixed by L_1 = ⋁⋀.
Case 1. z contains no isolated 1s. In other words, whenever z_i = 1, then either z_{i+1} = 1 or z_{i−1} = 1. This forces (L_1 z)_i = (z_{i−1} ∧ z_i) ∨ (z_i ∧ z_{i+1}) = z_i. When z_i = 0, then (L_1 z)_i = 0 ∨ 0 = z_i. Thus L_1 z = z.
Case 2. z contains isolated 1s (we shall show that this is impossible). Say w.l.o.g. z_0 = 1 and z_{−1} = z_1 = 0. That nothing about the values z_i (i ≠ 0, −1, 1) is assumed is indicated by "?" below:

index:              −n−1   −n   ···   −1    0    1    2
⋀^n⋁^{n+1}x =         ?     ?   ···    0    1    0    ?
        ⇓
⋁^{n+1}x =            0     1   ···    1    1    0    ?
        ⇓
x =                   0     0   ···    0    0    1    ?

18 A proof of Theorem 24 using lattice distributivity was given in Rohwer and Wild (2002, Lemma 4.6). Here we give a simpler argument but indulge in distributivity in Section III.E.3.
Setting y := ⋁^{n+1}x we have z = ⋀^n y and can argue like this:
1 = z_0 = y_0 ∧ y_{−1} ∧ ··· ∧ y_{−n+1} ∧ y_{−n} ⇒ y_0 = ··· = y_{−n} = 1,
0 = z_{−1} = y_{−1} ∧ ··· ∧ y_{−n} ∧ y_{−n−1} ⇒ y_{−n−1} = 0,
0 = z_1 = y_1 ∧ y_0 ∧ ··· ∧ y_{−n+1} ⇒ y_1 = 0.
We proceed similarly and conclude:
0 = y_{−n−1} = x_{−n−1} ∨ x_{−n} ∨ ··· ∨ x_0 ⇒ x_{−n−1} = ··· = x_0 = 0,
1 = y_{−n} = x_{−n} ∨ ··· ∨ x_0 ∨ x_1 ⇒ x_1 = 1.
Now x_1 = 1 implies y_1 = x_1 ∨ ··· ∨ x_{n+2} = 1, which is the desired contradiction to y_1 = 0.
Now that Theorem 24 is established, much of the remainder of this section again follows from general semigroup theory. For a start, from Lemma 17 in III.B.1 we get that
L_m L_{m′} = L_{m∨m′} and U_n U_{n′} = U_{n∨n′}, (51)
L_{m′} U_n L_m = U_n L_m (m′ ≤ m),  U_{n′} L_m U_n = L_m U_n (n′ ≤ n). (52)
Lemma 17 further entails that each LULU operator Φ can be written in the form
Φ = U_{n_1} L_{m_1} U_{n_2} ···  or  Φ = L_{m_1} U_{n_1} L_{m_2} ···  (m_1 > m_2 > ···, n_1 > n_2 > ···). (53)
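The relations (51) and (52) can be probed numerically. The sketch below is our own; it represents a series by a finite 0,1-array that vanishes outside a central window (exact, under zero extension, as long as ample margin remains), implements the n-fold proton ⋁^n and electron ⋀^n, and hence L_n = ⋁^n⋀^n and U_n = ⋀^n⋁^n:

```python
import random

def V(x, n):   # n-fold proton: (⋁^n x)_i = x_i ∨ ··· ∨ x_{i+n} (zero extension)
    K = len(x)
    return tuple(max(x[i:min(i + n + 1, K)]) for i in range(K))

def W(x, n):   # n-fold electron: (⋀^n x)_i = x_i ∧ ··· ∧ x_{i−n} (zero extension)
    return tuple(min(x[max(i - n, 0):i + 1]) if i >= n else 0
                 for i in range(len(x)))

def L(n):      # the opening L_n = ⋁^n ⋀^n
    return lambda x: V(W(x, n), n)

def U(n):      # the closing U_n = ⋀^n ⋁^n
    return lambda x: W(V(x, n), n)

L1, L2, U1, U2 = L(1), L(2), U(1), U(2)

random.seed(0)
for _ in range(300):
    # random 0,1-series, zero outside a central window (ample margin)
    x = tuple(random.randint(0, 1) if 20 <= i < 40 else 0 for i in range(60))
    assert U2(U2(x)) == U2(x) and L2(L2(x)) == L2(x)  # closings/openings are idempotent
    assert U1(U2(x)) == U2(x) and L1(L2(x)) == L2(x)  # Eq. (51): U_n U_{n'} = U_{n∨n'}
    assert U1(L2(U2(x))) == L2(U2(x))                 # Eq. (52): U_{n'} L_m U_n = L_m U_n
    phi = lambda t: U2(L2(t))                         # a sample LULU operator
    assert phi(phi(x)) == phi(x)                      # the band property
print("relations (51), (52) and idempotence verified")
```

By the stack-filter reduction (Theorem 15 of III.A.3), checking 0,1-series is again the essential case.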
Now Theorem 18 in III.B.1 implies that the semigroup B(M, N) generated by the L_m s and U_n s is a band. With some effort it turns out (Rohwer and Wild, 2002, Corollary 6.3) that in our particular setting distinct19 forms always yield distinct functions R^Z → R^Z. Hence, Theorem 18 also yields:
|B(M, N)| = \binom{M+N+2}{N+1} − 2. (54)
To achieve the fine structure of the band B(M, N), first observe that B(M, N) is an ordered band by the remark following Eq. (28) in III.B.2. Define the rank of a LULU operator Φ as follows. Write it in canonical form Eq. (53) and put rank(Φ) := (m_1, n_1). Furthermore, set rank(L_m) := (m, 0) and rank(U_n) := (0, n). Let
T := {(m, n) | 0 ≤ m ≤ M, 0 ≤ n ≤ N} − {(0, 0)}
19 For the semigroup expert, this amounts to saying that B(M, N) is isomorphic to the relatively free semigroup subject to the relations in Eqs. (51) and (52). See McKenzie et al. (1987).
be the semilattice defined by (m, n) ∨ (m′, n′) := (m ∨ m′, n ∨ n′), and let B_{(m,n)} ⊆ B(M, N) be the set of all LULU operators of rank (m, n). For instance, the band B(2, 2) is partitioned as follows (Figure 24):
FIGURE 24. The band B(2,2) as a semilattice of right zero semigroups.
In particular, B_{(2,2)} is just the LULU interval [U_2 L_2, L_2 U_2] depicted in II.A.5 (dropping the non-LULU operators M_2 and B_2). In general, again by Theorem 18, each B_{(m,n)} is a right zero semigroup, and hence satisfies
Φ R Ψ for all Φ, Ψ ∈ B_{(m,n)}. (55)
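The right zero behavior behind Eq. (55) can be observed directly. A hedged sketch of our own (finite 0,1-arrays with zero margins, exact under that embedding): with Φ = U_3 L_3 U_1 and Ψ = L_4 U_3 L_2 of rank (4, 3) as discussed next in the text, one confirms Φ ∘ Ψ = Ψ on random series.

```python
import random

def V(x, n):   # (⋁^n x)_i = x_i ∨ ··· ∨ x_{i+n} (zero extension)
    K = len(x)
    return tuple(max(x[i:min(i + n + 1, K)]) for i in range(K))

def W(x, n):   # (⋀^n x)_i = x_i ∧ ··· ∧ x_{i−n} (zero extension)
    return tuple(min(x[max(i - n, 0):i + 1]) if i >= n else 0
                 for i in range(len(x)))

def L(n): return lambda x: V(W(x, n), n)   # opening L_n
def U(n): return lambda x: W(V(x, n), n)   # closing U_n

def compose(*ops):
    def run(x):
        for op in reversed(ops):           # rightmost factor acts first
            x = op(x)
        return x
    return run

Phi = compose(U(3), L(3), U(1))            # Φ = U_3 L_3 U_1
Psi = compose(L(4), U(3), L(2))            # Ψ = L_4 U_3 L_2

random.seed(0)
for _ in range(200):
    x = tuple(random.randint(0, 1) if 45 <= i < 75 else 0 for i in range(120))
    y = Psi(x)
    assert Phi(y) == y                     # Φ ∘ Ψ = Ψ: Ψ acts as a right zero
print("right zero behavior verified")
```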
In fact, proving Φ ∘ Ψ = Ψ is not hard. Since all factors U_{n′} and L_{m′} of Φ have n′ ≤ n and m′ ≤ m, by Eq. (52) they are canceled one by one by the two leading factors of Ψ. For instance, looking at (m, n) = (4, 3) we have
Φ ∘ Ψ = (U_3 L_3 U_1)(L_4 U_3 L_2) = (U_3 L_3)(L_4 U_3 L_2) = U_3 (L_4 U_3 L_2) = L_4 U_3 L_2 = Ψ.
2. The Invariant Series of a LULU Operator
We next describe the invariant series (root signals) of LULU operators and begin with some terminology. Given x ∈ R^Z, say that {x_j, x_{j+1}, ..., x_{j+k}} is a descent of length k if x_j > x_{j+1} > ··· > x_{j+k} and x_{j−1} ≤ x_j and x_{j+k} ≤ x_{j+k+1}. Notice that a descent of length k (k ≥ 1 by definition) has cardinality k + 1. Analogously we define an ascent, and a plateau of length k ≥ 1 (having x_{j−1} ≠ x_j = x_{j+1} = ··· = x_{j+k} ≠ x_{j+k+1}). Each x ∈ R^Z decomposes uniquely into descents, ascents, and plateaus; any two adjacent pieces intersect in precisely one element. A cup is either a descent followed by an ascent, or a descent followed by a plateau followed by an ascent. A series x ∈ R^Z has n-large cups if each cup of x has a plateau of length at least n. To see that each series x with n-large cups is invariant under U_n, i.e., U_n x = x, fix i and distinguish three cases.
Case 1. x_i belongs to a descent; say it is one of x_j > x_{j+1} > ··· > x_t. If this descent is part of a cup, then its plateau has cardinality ≥ n + 1. Otherwise we fare even better; so, on the right, in any case x_t ≥ x_{t+1} ≥ ··· ≥ x_{t+n}. This implies (⋁^n x)_i = x_i ∨ x_{i+1} ∨ ··· ∨ x_{i+n} = x_i and hence (with obvious notation)
(U_n x)_i = (⋁^n x)_i ∧ (⋁^n x)_{i−1} ∧ ··· ∧ (⋁^n x)_{i−n} = x_i ∧ (≥ x_i) ∧ ··· ∧ (≥ x_i) = x_i.
Case 2. x_i belongs to an ascent. Looking to the left of the ascent, one now argues that x_{i−n} ≤ x_{i−n+1} ≤ ··· ≤ x_i, whence (U_n x)_i = x_{i−n} ∨ ··· ∨ x_i = x_i.
Case 3. x_i belongs to a plateau.
Subcase 1. To the right of the plateau is a descent. Then (U_n x)_i = x_i ∨ ··· ∨ x_{i+n} = x_i.
Subcase 2. To the left of the plateau is an ascent. Then (U_n x)_i = x_{i−n} ∨ ··· ∨ x_i = x_i.
Subcase 3. To the left of the plateau is a descent and to the right an ascent. Then the plateau belongs to a cup, so x_j = x_{j+1} = ··· = x_i = ··· = x_{j+k} (k ≥ n) and again (U_n x)_i = x_j ∨ ··· ∨ x_{j+n} = x_i.
Theorem 25 below states that U_n has no other invariant series. In order to state a dual result for L_n we define a cap as an ascent followed by a descent, or an ascent followed by a plateau followed by a descent. Say that y ∈ R^Z has n-large caps if each cap of y has a plateau of length at least n. The following is based on Rohwer and Wild (2002, Theorem 3.13) with a polished proof.
Theorem 25. The invariant series of U_n, respectively L_n, are precisely the series with n-large cups, respectively n-large caps.
Example 8. Figure 25 displays a series x that has 2-large cups but not 2-large caps. Hence U_2 x = x but L_2 x ≠ x (Figure 25).
Proof of Theorem 25. In order to prove the claim about U_n (L_n works dually) we show, for any x ∈ R^Z, the equivalence of these three statements:
(1) x ∈ Inv(U_n);
(2) from i < j < k and x_i > x_j < x_k follows k − i ≥ n + 2;
(3) x has n-large cups.
To see (1) ⇒ (2), let x ∈ Inv(U_n) and suppose there were x_i > x_j < x_k with k ≤ i + n + 1. Proceeding from j − n to the right, passing i, i + 1, and going up to j, we have
FIGURE 25. A segment of a series with 2-large cups.
i ∈ [j−n, j], ..., i ∈ [i, i+n], k ∈ [i+1, i+1+n], ..., k ∈ [j, j+n],
and hence
(⋁^n x)_{j−n} ≥ x_i, ..., (⋁^n x)_i ≥ x_i, (⋁^n x)_{i+1} ≥ x_k, ..., (⋁^n x)_j ≥ x_k,
which yields the contradiction
(U_n x)_j = (⋁^n x)_j ∧ ··· ∧ (⋁^n x)_{j−n} ≥ x_i ∧ x_k > x_j.
It is clear that (2) ⇒ (3) because a cup with short plateau x_j = x_{j+1} = ··· = x_{j+k} (k ≤ n − 1) triggers x_{j−1} > x_j < x_{j+k+1} with (j + k + 1) − (j − 1) ≤ n + 1. The implication (3) ⇒ (1) was proven beforehand in Cases 1 to 3.
With the convention that every x ∈ R^Z has 0-large cups and 0-large caps, define Cap(n), respectively Cup(n), as the set of series x with n-large caps, respectively n-large cups (n ≥ 0). As a side remark, since U_n is a closing, it follows from Theorem 25 and II.A.4 that for any x, y ∈ Cup(n) = Inv(U_n), also x ∧ y ∈ Cup(n) (dually for Cap(n)). (It is a fine exercise to prove that directly.) A moment's thought shows that Cap(n) ∩ Cup(n) coincides with the set M_n of n-monotone series introduced in II.A.5. Having settled the technical Theorem 25, the remainder essentially reduces to the fact that L_m, U_n form a Matheron pair:
Theorem 26 (Rohwer and Wild, 2002, Theorem 5.7). Let Φ ∈ B(M, N) have rank(Φ) = (m, n). Then Inv(Φ) = Cap(m) ∩ Cup(n).
Proof. Having equal rank, both Φ and L_m U_n lie in B_{(m,n)}. It follows from Eqs. (55) and (27) in III.B.2 that Inv(Φ) = Inv(L_m U_n). But Inv(L_m U_n) = Inv(L_m) ∩ Inv(U_n) by Theorem 23 in III.B.2, in view of L_m ≤ U_n and L_m U_n R U_n L_m. Thus the claim follows from Theorem 25.
E. Lattice Stack Filters
Section III.E.1 extends PBFs from linearly ordered sets (III.A.1) to arbitrary lattices R. Accordingly, the classic stack filters R^Z → R^Z from III.A.3 will be generalized to lattice stack filters R^S → R^S in Section III.E.2. The fact that S is any set, not necessarily an Abelian group, entails that in addition to the ground set R, translation invariance is also eliminated. Most of III.E.2 is devoted to the two worlds of lattice stack filters alluded to in the introduction to III. The most complex mathematical sections in the whole article are probably those on idempotency (III.E.3) and co-idempotency (III.E.4). In Section III.E.3 the lattice R needs to be distributive; in III.E.4 it is linearly ordered and endowed with a compatible addition operation. Section III.E.5 refreshes with some concrete calculations.
1. Positive Boolean Functions on Arbitrary Lattices
Note that one can evaluate the right-hand side of Eqs. (5) and (6) in III.A.1 not only for x_i coming from {0, 1} or some other linearly ordered set, but in fact for x_i coming from any lattice R. Thus we call a function b : R^n → R a positive Boolean function on R if b(x) = b(x_1, ..., x_n) is obtained by evaluating a fixed lattice term,20 that is, any well-built formula involving the x_i and the lattice operations ∨ and ∧. The immediate consequence is that b need no longer be a selector when R is not linearly ordered. However, b(c, ..., c) = c for all c ∈ R still holds. If R happens to be distributive (say R = ℝ), every PBF on R can be brought into disjunctive/conjunctive normal form as in III.A.2.
Example 9. Consider the PBF
b(x) = b(x_0, x_1, x_2, x_3) := (x_1 ∧ (x_0 ∨ x_2)) ∨ x_3.
20 The intuitive understanding of lattice term will suffice (e.g., x_1 ∨ x_2 ∧ is none); the precise recursive definition is taken care of by universal algebra (McKenzie et al., 1987). In fact, rather than "PBF on R," the correct name would be "lattice-term function." We opted for the former to keep in touch with "classic" PBFs of type {0, 1}^n → {0, 1}.
For any lattice R, this is a function from R⁴ to R. If R is distributive, then b can be rewritten in disjunctive or conjunctive normal form:
b(x) = (x_1 ∧ x_0) ∨ (x_1 ∧ x_2) ∨ x_3 (DNF),
b(x) = (x_1 ∨ x_3) ∧ (x_0 ∨ x_2 ∨ x_3) (CNF).
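A quick machine check of Example 9 (our own sketch): on the distributive lattice P({1,2,3}) with ∨ = ∪ and ∧ = ∩, the lattice term and its two normal forms agree on all 8⁴ inputs.

```python
from itertools import combinations, product

ground = (1, 2, 3)      # P({1,2,3}) is a finite distributive lattice
subsets = [frozenset(c) for r in range(4) for c in combinations(ground, r)]

def term(x0, x1, x2, x3):   # b(x) = (x1 ∧ (x0 ∨ x2)) ∨ x3
    return (x1 & (x0 | x2)) | x3

def dnf(x0, x1, x2, x3):    # (x1 ∧ x0) ∨ (x1 ∧ x2) ∨ x3
    return (x1 & x0) | (x1 & x2) | x3

def cnf(x0, x1, x2, x3):    # (x1 ∨ x3) ∧ (x0 ∨ x2 ∨ x3)
    return (x1 | x3) & (x0 | x2 | x3)

for x in product(subsets, repeat=4):
    assert term(*x) == dnf(*x) == cnf(*x)
print("DNF and CNF agree with the lattice term on P({1,2,3})")
```

On a non-distributive lattice this rewriting is of course not available; the point of the check is only the distributive case.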
Thus, referring to III.A.1, C = {{1, 0}, {1, 2}, {3}} and D = {{1, 3}, {0, 2, 3}}. Is there an intrinsic characterization of PBFs on an arbitrary lattice R, reminiscent of Fact (8) in III.A.1? For many distributive lattices R there is (Wild, 2003, p. 608), in particular for all finite distributive R. Namely, it turns out that b : R^n → R is a PBF on R if and only if b is increasing and commutes with all thresholdings σ_a : R → R defined by
σ_a(c) := 1 if c ≥ a, and σ_a(c) := 0 if c ≱ a,
where a ∈ R is join irreducible and where 0, 1 ∈ R are the bottom and top elements.
Example 10. A PBF on a finite distributive lattice need not commute with all increasing maps. In fact, take R := {0, α, β, 1} with α ∧ β = 0, α ∨ β = 1. Then b(x_1, x_2) := x_1 ∨ x_2 does not commute with the thresholding σ_1 associated with the join reducible element 1:
b(σ_1(α), σ_1(β)) = b(0, 0) = 0, but σ_1(b(α, β)) = σ_1(1) = 1.
As another example, consider the function b : R³ → R defined by b(x_1, x_2, x_3) := 0 if (x_1, x_2, x_3) = (α, 0, x_3), and b(x_1, x_2, x_3) := x_1 otherwise. One checks that b is increasing and satisfies b(c, c, c) = c for all c ∈ R. Yet b is not a PBF on R because it does not commute with σ_α:
b(σ_α(α), σ_α(0), σ_α(0)) = b(1, 0, 0) = 1 ≠ 0 = σ_α(0) = σ_α(b(α, 0, 0)).
2. Lattice Stack Filters, Nonlinear Systems, and Image Algebra
Let R be any lattice and S any nonempty index set. In this section we indicate the relevance of lattices of type R^S for image processing, respectively nonlinear dynamical systems. First we introduce the kind of operator Φ that accompanies R^S. Namely, associated with each s ∈ S there are indices a[s, 1], ..., a[s, n_s] in S, and a PBF b_s : R^{n_s} → R. The corresponding lattice stack filter (LSF)
Φ : R^S → R^S is defined by
(Φx)_s := b_s(x_{a[s,1]}, x_{a[s,2]}, ..., x_{a[s,n_s]}) (56)
for all s ∈ S. A lattice stack filter Φ is translation invariant if all functions b_s (s ∈ S) coincide.21 In this case it suffices to exhibit C(0) and D(0), because for all s ∈ S: C(s) = {C + s : C ∈ C(0)}, D(s) = {D + s : D ∈ D(0)}. A translation variant lattice stack filter Φ may have completely different DNFs for all s ∈ S, and the same applies to the CNFs:
(Φx)_s = ⋁_{C∈C(s)} ⋀_{i∈C} x_i = ⋀_{D∈D(s)} ⋁_{j∈D} x_j. (57)
Each C(s) will be called a DNF cluster and each D(s) a CNF cluster. As to image (or signal) processing, our lattice stack filters Φ : R^S → R^S obviously comprise the "classic" stack filters of Section III.A and the "morphological" stack filters of III.C. In general, the elements of R^S can be conceived as images in the broad sense of R-colored S-grids. Thus R^Z in III.A is the class of all ℝ-colored Z-grids (= gray-scale bi-infinite series), and P(Z²) ≅ {0,1}^{Z²} in III.C contains all {0,1}-colored Z²-grids (= discrete black-and-white images). Actually, R may have a much richer structure than ℝ or {0, 1}, because it may itself be a lattice R = R_0^{T_0} of images. Then R^S is, so to speak, a lattice of movies (each movie containing |S| images). These ideas are more than musings; this is the heart of image algebra, whose core concept of a template is just that: an image whose pixels are images themselves. Of course, this can be iterated further. In case R_0 is distributive, R_0^{T_0} will be distributive as well. We mention that because distributivity is the conditio sine qua non in III.E.3. We now leave image processing and come to lattice stack filters Φ alias dynamical systems. Details being currently explored, the following remarks are sketchy but hopefully convey a perspective. Broadly speaking, the elements of R^S are viewed as states of a dynamical (nonlinear) system; if the system is in state x, then the successor state is Φx. The fixed points y ∈ Inv(Φ) are viewed as attractors. Actually, when R^S = {0,1}^S, the above describes exactly a monotone Boolean network (MBN). Boolean networks (Kauffman, 1993) can model genetic regulatory networks, and it can be argued from
21 As mere PBFs they coincide indeed, but they are applied to shifted parts of x ∈ R^S. In other words, Φ is translation invariant if S = (S, +, −) is an Abelian group and for some fixed finite "window" T = {t_1, ..., t_n} ⊆ S and some fixed PBF b : R^n → R one has (Φx)_s = b(x_{s+t_1}, x_{s+t_2}, ..., x_{s+t_n}) for all s ∈ S and x ∈ R^S. This subsumes Eq. (13) in III.A.3, since each finite window contained in S = Z is contained in {−n, ..., n} for n large enough.
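A minimal sketch of Eq. (56) (our own toy instance, not from the text): take S = {0, ..., 4}, R = ℝ under max/min, and a translation variant choice of DNF clusters C(s).

```python
# Each site s carries its own DNF cluster C(s) of s-leaves, as in Eq. (57).
C = {
    0: [{0, 1}],             # (Φx)_0 = x_0 ∧ x_1
    1: [{0}, {1, 2}],        # (Φx)_1 = x_0 ∨ (x_1 ∧ x_2)
    2: [{2}],                # (Φx)_2 = x_2
    3: [{2, 3}, {3, 4}],     # (Φx)_3 = (x_2 ∧ x_3) ∨ (x_3 ∧ x_4)
    4: [{3, 4}],             # (Φx)_4 = x_3 ∧ x_4
}

def Phi(x):
    """Lattice stack filter per Eq. (56), with ∨ = max and ∧ = min on ℝ."""
    return tuple(max(min(x[i] for i in B) for B in C[s]) for s in sorted(C))

x = (3.0, 1.0, 4.0, 1.0, 5.0)
print(Phi(x))        # -> (1.0, 3.0, 4.0, 1.0, 1.0)
```

Note that Φ(c, ..., c) = (c, ..., c) for every constant c, as required of PBFs.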
FIGURE 26. The state-transition diagram of an idempotent nonlinear system.
biology to pay special attention to MBNs, which by definition have all their underlying Boolean functions positive (Shmulevich et al., 2002). Cellular automata (Wolfram, 2002) and the cellular neural networks (CNNs; Chua, 2002) provide similar paradigms. Most binary (discrete-time) CNNs arising in applications were found to be monotonic (Fajfar and Bratkovic, 1996), in which case they are MBNs (with the extra benefit of having more localized PBFs). Each MBN is a nonlinear system based on ∨ and ∧. When R^S = ℝ^n and the focus is on ∨, + (dually ∧, +), we are dealing with discrete event systems (Bacelli et al., 1992) with their many applications in operations research. When all of ∧, ∨, + are involved, the mathematics becomes intricate; first strides were made by Gunawardena (1994). Not only classic stack filters (III.A.3), but all kinds of nonlinear systems long for idempotence (Gunawardena et al., 1998; Litinov and Maslov, 2005; Figure 26). With the above as background, distinguish the following problems:
(1) Given an LSF (= nonlinear system) Φ, determine Inv(Φ).
(2) Given an LSF (= nonlinear system) Φ, determine whether Φ² = Φ.
The state-transition diagram D describing Φ by definition has vertex set R^S and arcs (x, Φx). As to problem (1), one looks for the loops of D. As to problem (2), one decides whether or not D looks like Figure 26. While it is NP-complete (Corollary 1 in Yang and Zhao, 2004) to even decide whether an MBN Φ has a nontrivial attractor (the trivial attractors x = (0, 0, ..., 0) and x = (1, 1, ..., 1) are always present), surprisingly the idempotency of Φ can be tested in polynomial time. In fact, Section III.E.3 is dedicated to a characterization of idempotent LSFs where R may be a distributive lattice other than {0, 1}. Thus, only ∧, ∨ are involved there. But in Section III.E.4, which takes up co-idempotence (Section II) in a general setting, all of ∧, ∨, + take center stage.
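For small n, problem (2) can of course be brute-forced. The sketch below (our own toy MBN: every coordinate is the PBF (x_s ∧ x_{s+1}) ∨ (x_{s−1} ∧ x_s), i.e., the structural opening by A = {0, 1} on Z_5, hence idempotent) enumerates all 2⁵ states:

```python
from itertools import product

N = 5

def step(x):
    # MBN update: coordinate s is the PBF (x_s ∧ x_{s+1}) ∨ (x_{s−1} ∧ x_s),
    # i.e., the structural opening by A = {0, 1} on the cyclic group Z_5.
    return tuple((x[s] & x[(s + 1) % N]) | (x[(s - 1) % N] & x[s])
                 for s in range(N))

states = list(product((0, 1), repeat=N))
idempotent = all(step(step(x)) == step(x) for x in states)
attractors = [x for x in states if step(x) == x]   # loops of the diagram D
print(idempotent, len(attractors))                 # -> True 17
```

The exponential enumeration is exactly what the polynomial-time criterion of III.E.3 avoids.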
The link to nonlinear systems having been discussed, we shall concentrate on the mathematics in Section III.E.3 and III.E.4, with occasional comments on LULU theory and MM.
3. Idempotency of LSFs
Section III.C.2 about envelopes gives an impression of how idempotency is handled in MM. While the Matheron theorems are elegant, MM features no class of parametrized operators for which parameter relations sufficient and necessary for their idempotency have been formulated. Here we show that the class of all LSFs R^S → R^S (R distributive) is of this kind. Before we proceed, let us argue that this class covers the core operators Φ investigated in MM. Indeed, most mathematical morphologists would agree that the core Φs are defined on lattices of type L = R^S (R linearly ordered), that they are increasing, and that they "commute with contrast changes," i.e., satisfy (C) in III.A.3. Thus, if Φ is local as well, it is just an LSF in disguise, because R^S can be substituted for R^Z in Fact (20) in III.A.5 and Property (HT) can be dispensed with. From a purely computational point of view, there is no loss of generality in assuming locality. In practice, each operator is local since images have only finitely many pixels.
Example 11. Recall from III.C.2 that every ∧-overfilter induces an opening. The translation invariant LSF Φ on R^Z defined by (Φx)_0 = (x_{−1} ∧ x_0) ∨ x_1 is not even an overfilter (exercise), but we claim it nevertheless induces an opening Φ ∧ I. Indeed,
((Φ ∧ I)x)_0 = ((x_{−1} ∧ x_0) ∨ x_1) ∧ x_0 = (x_{−1} ∧ x_0) ∨ (x_1 ∧ x_0),
and so Φ ∧ I = δ_A ε_A with structuring element A = {0, −1} ⊆ Z. Recall from III.C.1 that an operator of type L := δ_A ε_A must be an opening, independent of the underlying lattice R. The envelope Φ̌ coinciding with Φ ∧ I, it follows from Eq. (49) in III.C.2 that Ψ := G∧(Φ) = Φ ∘ Φ̌ = Φ ∘ (Φ ∧ I) = Φ ∘ L will be a ∧-overfilter. As a prelude to more abstract manipulations, let us assume that R is distributive and compute the DNF of Ψ:
(Ψx)_0 = (Φ(Lx))_0 = ((Lx)_0 ∧ (Lx)_{−1}) ∨ (Lx)_1
= (((x_0 ∧ x_{−1}) ∨ (x_0 ∧ x_1)) ∧ ((x_{−1} ∧ x_{−2}) ∨ (x_{−1} ∧ x_0))) ∨ (x_1 ∧ x_0) ∨ (x_1 ∧ x_2)
= ((y_{−1,0} ∨ y_{0,1}) ∧ (y_{−2,−1} ∨ y_{−1,0})) ∨ (y_{0,1} ∨ y_{1,2})  (where y_{i,j} := x_i ∧ x_j)
= y_{−1,0} ∨ (y_{0,1} ∧ y_{−2,−1}) ∨ y_{0,1} ∨ y_{1,2}  (by distributivity, Eq. (7) in III.A.1)
= y_{−1,0} ∨ y_{0,1} ∨ y_{1,2}  (since y_{0,1} ∧ y_{−2,−1} ≤ y_{0,1})
= (x_{−1} ∧ x_0) ∨ (x_0 ∧ x_1) ∨ (x_1 ∧ x_2).
If we did not know about the theory of envelopes, it would not be clear from the DNF above that Ψ is a ∧-overfilter. Repeating the kind of calculations above could verify that Ψ ∘ (Ψ ∧ I) = Ψ, but this will come out as an instance of general DNF theory (see Example 12). We shall in fact intensely use the following consequence (proved by induction) of distributivity22:
(a_1 ∨ a_2 ∨ ··· ∨ a_r) ∧ (b_1 ∨ b_2 ∨ ··· ∨ b_s) ∧ ··· ∧ (c_1 ∨ c_2 ∨ ··· ∨ c_t)
= (a_1 ∧ b_1 ∧ ··· ∧ c_1) ∨ (a_1 ∧ b_1 ∧ ··· ∧ c_2) ∨ ··· ∨ (a_r ∧ b_s ∧ ··· ∧ c_t). (58)
Replacing ∨ by + and ∧ by ·, Eq. (58) is the familiar expansion of a product of sums in ordinary arithmetic. As opposed to arithmetic, observe that ∨ and ∧ may be switched in Eq. (58). We say that a transversal τ of a family {S(i) | i ∈ D} is a family {τ(i) | i ∈ D} such that (∀i ∈ D) τ(i) ∈ S(i). We allow that τ(j) = τ(i) for j ≠ i. Thus, if in Eq. (58) we take the S(i) (1 ≤ i ≤ n) to be the sets {a_1, ..., a_r} up to {c_1, ..., c_t}, then {τ(i) | 1 ≤ i ≤ n} = {a_3, b_7, ..., c_5} is a transversal. It may happen that, for example, b_7 = c_5; that is, τ(2) = τ(n). In the transversal calculus below, the τ(i)s are actually themselves sets, which may take a while to digest.
So let R be a distributive lattice, S any (index) set, and Φ, Ψ lattice stack filters on R^S with DNFs
(Φx)_s = ⋁_{C∈C(s)} ⋀_{i∈C} x_i,  (Ψx)_s = ⋁_{D∈D(s)} ⋀_{j∈D} x_j (59)
(do not confuse Facts (57) and (59)). Let us compute the DNF of the product Φ ∘ Ψ:
(Φ(Ψx))_s = ⋁_{C∈C(s)} ⋀_{i∈C} (Ψx)_i
=
C∈C (s)
=
xj
i∈C )∈D(i) j ∈)
C∈C (s)
i∈C )∈D(i)
y)
where y) :=
xj
j ∈)
22 It has been tried before in nonlinear filter theory to tackle idempotency by “using strictly logical calculus” (Dougherty and Haralick, 1992, p. 82). It seems fair to say that exploiting distributivity is the key to success.
LULU THEORY, IDEMPOTENT STACK FILTERS (58)
=
C∈C (s)
=
yτ (i)
τ transversal i∈C of {D(i)|i∈C}
C∈C (s) τ transversal of{D(i)|i∈C}
147
xj ,
j ∈A(C,τ )
where A(C, τ ) :=
τ (i) | i ∈ C .
By Theorem 14 in III.A.3 (adapted to the translation variant case in the obvious way), it follows that
Φ ◦ Ψ ≥ Φ ⇔ (∀s ∈ S) ∀C ∈ C (s) ∃C ∈ C (s) (∃τ )
(60) C ⊇ A(C, τ ) . Here, of course, τ is any map on C such that τ (i) ∈ D (i) for all i ∈ C. A subset G ⊆ S is C -good if G = ∅ and
(∀i ∈ G)(∃Γᵢ ∈ C(i)) Γᵢ ⊆ G.

Using this terminology, recall that at the end of III.C.1 we argued that each translation invariant LSF γ : {0, 1}^S → {0, 1}^S with C-good s-leaves is an opening. In fact, essentially the same reasoning shows that each LSF γ : R^S → R^S (R any lattice, translation variance allowed) with C-good s-leaves is an opening.

Question. Does every LSF-opening γ on R^S have C-good s-leaves? The answer comes after Corollary 28.

Theorem 27 (Wild, 2001; Corollary 5). Let R be a distributive lattice and S a nonempty set. Then a lattice stack filter Φ on R^S with a clean DNF (57) is a ∧-overfilter if and only if

(∀s ∈ S)(∀C̄ ∈ C(s)) C̄ is C-good.  (I1)

We stress that C̄ ∈ C(s) does not entail s ∈ C̄ in (I1). In view of Corollary 28, note, however, that s ∈ C̄ for all C̄ ∈ C(s) when Φ is assumed to be anti-extensive.

Corollary 28 (Wild, 2001; Corollary 6). Let R be a distributive lattice, S a nonempty set, and Φ an anti-extensive lattice stack filter on R^S with clean DNF (57). Then Φ is an opening if and only if it satisfies (I1).
Proof of Corollary 28. By assumption Φ ≤ I, whence Φ ∧ I = Φ, whence the idempotency Φ ◦ Φ = Φ is equivalent to Φ ◦ (Φ ∧ I) = Φ. By Theorem 27, the latter amounts to (I1).

As to the question preceding Theorem 27, by Corollary 28 the answer is yes provided R is distributive. In particular, the answer is yes for R = {0, 1}, and this completes the proof of Fact (46) in III.C.1.

Proof of Theorem 27. Because every LSF Φ satisfies Φ ◦ (Φ ∧ I) ≤ Φ ◦ I = Φ, it suffices to show that (I1) is equivalent to Φ ◦ (Φ ∧ I) ≥ Φ. Setting Ψ := Φ ∧ I we have

(Ψx)_s = (Φx)_s ∧ x_s = ⋁_{C∈C(s)} ⋀_{i∈C∪{s}} x_i.
In other words, the cluster of s-leaves in the DNF of Ψ is

D(s) = {B ∪ {s} | B ∈ C(s)}  (s ∈ S).

In order to see that (I1) is equivalent to the right-hand side (RHS) of Eq. (60), suppose first that (I1) holds. For given C̄ ∈ C(s), we need to find a set A(C, τ) contained in C̄. Let C := C̄. For each i ∈ C̄, there is by (I1) some Γᵢ ∈ C(i) with Γᵢ ⊆ C̄. Putting τ(i) := Γᵢ ∪ {i} for all i ∈ C, we have τ(i) ∈ D(i), and clearly A(C, τ) ⊆ C̄. This proves the RHS of Eq. (60).

Conversely, suppose the RHS of Eq. (60) takes place. Fix C̄ ∈ C(s). By assumption, there is some A(C, τ) contained in C̄. From i ∈ τ(i) (i ∈ C) follows that C ⊆ A(C, τ) ⊆ C̄. Our DNFs being clean (III.A.1), we conclude that C = A(C, τ) = C̄. Pick any i ∈ C̄. Then Γᵢ ⊆ Γᵢ ∪ {i} =: τ(i) ⊆ C̄ with Γᵢ ∈ C(i). Hence (I1) holds.

The dual Φ^d of a LSF Φ is obtained by just dualizing the underlying PBFs; if Φ has the DNFs

(Φx)_s = ⋁_{C∈C(s)} ⋀_{i∈C} x_i  (s ∈ S),

then Φ^d has the CNFs

(Φ^d x)_s = ⋀_{C∈C(s)} ⋁_{i∈C} x_i  (s ∈ S).

In brief, C(s) remains but ∧ and ∨ are switched. As for Boolean functions, one verifies:
(Φ ∧ Ψ)^d = Φ^d ∨ Ψ^d,  (Φ ∨ Ψ)^d = Φ^d ∧ Ψ^d,  (Φ ◦ Ψ)^d = Φ^d ◦ Ψ^d.  (61)
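On the Boolean base lattice R = {0, 1} this duality can be checked numerically: evaluating the same clusters as a CNF gives exactly the complement-conjugate of the DNF. The snippet below is our illustration (the clusters are arbitrary toy data, not from the text).

```python
import random

# a toy translation-variant LSF on S = {0,...,4}
C = {s: [[s, (s + 1) % 5], [(s - 1) % 5, s]] for s in range(5)}

dnf = lambda x, s: max(min(x[i] for i in leaf) for leaf in C[s])  # Φ
cnf = lambda x, s: min(max(x[i] for i in leaf) for leaf in C[s])  # Φ^d: same C(s), ∧/∨ switched

random.seed(1)
for _ in range(200):
    x = [random.randint(0, 1) for _ in range(5)]
    xbar = [1 - v for v in x]  # the Boolean complement plays the role of x̄
    # Φ^d(x) equals the complement of Φ(x̄), i.e. dualizing the underlying PBFs
    assert all(cnf(x, s) == 1 - dnf(xbar, s) for s in range(5))
print("CNF with the same clusters is the dual of the DNF")
```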
Corollary 29 (Wild, 2001, p. 1125). Let R be a distributive lattice and S a nonempty set. Then a lattice stack filter Φ on R^S with clean DNF and CNF as in Eq. (57) is a strong filter iff

(∀s ∈ S)(∀C ∈ C(s)) C is C-good,  (I1)
(∀s ∈ S)(∀D ∈ D(s)) D is D-good.  (I1)′
Proof. In view of Eq. (61), one has

Φ = Φ ◦ (Φ ∨ I) ⇔ Φ^d = Φ^d ◦ (Φ^d ∧ I^d) = Φ^d ◦ (Φ^d ∧ I),

and thus the claim follows from Theorem 27.
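C-goodness is easy to test by machine. The following sketch is ours (hypothetical function names): it checks condition (I1) for the translation invariant Ψ of Example 11.

```python
def is_good(G, cluster):
    """G is C-good: G is nonempty and every i in G owns a leaf Γ ∈ C(i) with Γ ⊆ G."""
    G = set(G)
    return bool(G) and all(any(set(leaf) <= G for leaf in cluster(i)) for i in G)

# Ψ from Example 11: C(0) = {{-1,0}, {0,1}, {1,2}}, and C(i) = C(0) + i by translation invariance
C0 = [{-1, 0}, {0, 1}, {1, 2}]
cluster = lambda i: [{j + i for j in leaf} for leaf in C0]

# condition (I1): every 0-leaf is C-good, so Ψ is a ∧-overfilter by Theorem 27
print(all(is_good(leaf, cluster) for leaf in C0))  # True
```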
Example 12. Recall from Example 11 that the LSF Ψ with DNF-cluster C(0) = {{−1, 0}, {0, 1}, {1, 2}} is a ∧-overfilter. This now also follows from Theorem 27 since all 0-leaves are obviously C-good. Let us determine the minimal transversals D of the set system C(0).

Case 1: 0 ∈ D. Then D = {0, 1} and D = {0, 2} are minimal transversals; if moreover −1 ∈ D, then D still contains {0, 1} or {0, 2} and hence is not minimal.

Case 2: 0 ∉ D. Then necessarily −1, 1 ∈ D, and {−1, 1} is already a minimal transversal.

As mentioned in III.A.1, it follows that D(0) = {{0, 1}, {0, 2}, {−1, 1}}. Since none of these leaves is D-good, Ψ is no ∨-underfilter.

Being a strong filter is better than just being an idempotent filter. How are the latter characterized in terms of their DNF? Setting Ψ := Φ in Eq. (60), one shows similarly to the proof of Theorem 27:

Theorem 30 (Wild, 2001; Theorem 4). Let R be a distributive lattice and S a nonempty set. Then a lattice stack filter Φ on R^S with DNF as in (57) is an overfilter iff
(∀s ∈ S)(∀C̄ ∈ C(s))(∃C ∈ C(s))(∀i ∈ C)(∃Γ ∈ C(i)) Γ ⊆ C̄.  (I2)

Strangely enough, while ∧-overfilters are described by the neat Condition (I1), ordinary overfilters must do with the "uglier brother" Condition (I2). Observe²³ that Condition (I1) is the special case of Condition (I2) where C can always be chosen to be C̄. The only beauty discernible in Condition (I2) is perhaps that it matches well one neighborhood axiom of (proper) topology; see (V4) in Schubert (1975; p. 13). If we take as s-neighborhoods all supersets of s-leaves, then we can phrase (I2) like this: For each s-neighborhood C̄, there is an s-neighborhood C such that C̄ is an i-neighborhood for all i ∈ C.

²³ Condition (I2), as opposed to (I1), is essentially due to the fact that we do not have C ⊆ A(C, τ) anymore. Accordingly, a clean DNF is irrelevant in Theorem 30.

Similar to Corollary 29, one shows:

Corollary 31 (Wild, 2001; p. 1125). Let R be a distributive lattice and S a nonempty set. Then a lattice stack filter Φ on R^S with DNF and CNF as in (54) is idempotent if and only if
(∀s ∈ S)(∀C̄ ∈ C(s))(∃C ∈ C(s))(∀i ∈ C)(∃Γ ∈ C(i)) Γ ⊆ C̄,  (I2)
(∀s ∈ S)(∀D̄ ∈ D(s))(∃D ∈ D(s))(∀i ∈ D)(∃Δ ∈ D(i)) Δ ⊆ D̄.  (I2)′
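For small clusters like those in Example 12, minimal transversals — and with them conditions such as (I1) and (I2) — can be found by brute force. A sketch of ours:

```python
from itertools import chain, combinations

def minimal_transversals(leaves):
    """All inclusion-minimal sets meeting every leaf (brute force; fine for tiny systems)."""
    universe = sorted(set(chain.from_iterable(leaves)))
    hitting = [set(c) for r in range(1, len(universe) + 1)
               for c in combinations(universe, r)
               if all(set(leaf) & set(c) for leaf in leaves)]
    return [t for t in hitting if not any(h < t for h in hitting)]

# the set system C(0) of Example 12
C0 = [{-1, 0}, {0, 1}, {1, 2}]
print(sorted(map(sorted, minimal_transversals(C0))))
# → [[-1, 1], [0, 1], [0, 2]], i.e. the cluster D(0) computed in Example 12
```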
When Φ is translation invariant, only C(0) and D(0) matter, and the idempotency of Φ can then be decided in polynomial time.

4. Co-Idempotency of LSFs

In Section II we considered operators such as Ln Un − I. For an LSF Φ on R^S an expression Φ − I only makes sense when an operation minus is defined. Thus we first consider lattice-ordered groups²⁴ (L, ≤, ⊕), where (L, ≤) is a lattice and (L, ⊕) is an Abelian group such that for all a ∈ L the addition of a is an increasing map:

b ≤ c ⇒ a ⊕ b ≤ a ⊕ c.  (62)
We again use 0 for the neutral element of (L, ⊕) and x̄ for the group inverse of x ∈ L, so x ⊕ x̄ = 0. It is convenient to write x ⊖ y for x ⊕ ȳ. It is standard theory (Birkhoff, 1979, p. 292) that L is infinite and that the compatibility (62) of ≤ and ⊕ forces the lattice (L, ≤) to be distributive. Moreover, ⊕ distributes over joins and meets,

a ⊕ ⋁_{i∈C} x_i = ⋁_{i∈C} (a ⊕ x_i)  and  a ⊕ ⋀_{i∈C} x_i = ⋀_{i∈C} (a ⊕ x_i),  (63)

which enables us to maintain the transversal calculus of III.E.3.

²⁴ This is the standard name for the past 60 years, although the author prefers to think of (L, ≤, ⊕) as a group-lattice, i.e., foremost a lattice with some extra structure. Now these tend to be called semirings, an even more unfortunate name (in our context) that wholly dismisses the lattice aspect. Notice that a lattice-ordered group is an ordered semigroup, but other than that, there is little connection between Sections III.B.2 and III.E.4.
Furthermore, the laws of de Morgan hold:

(a ∨ b)¯ = ā ∧ b̄  and  (a ∧ b)¯ = ā ∨ b̄  for all a, b ∈ L.

For any maps Φ, Ψ : L → L the sum Φ ⊕ Ψ and difference Φ ⊖ Ψ are defined component-wise. The dual map Φ^d is defined by

Φ^d(x) := (Φ(x̄))¯.  (64)
In particular, I^d = I. The relations (61) also hold in the present context, along with some more (Wild, 2003; p. 618):

(Φ ⊕ Ψ)^d = Φ^d ⊕ Ψ^d,  (Φ ⊖ Ψ)^d = Φ^d ⊖ Ψ^d.  (65)
Call Φ co-idempotent if I ⊖ Φ, which maps x to x ⊖ Φx, is an idempotent self-map of L. Notice that co-idempotency is preserved by duality:

(I ⊖ Φ^d) ◦ (I ⊖ Φ^d) = (I ⊖ Φ)^d ◦ (I ⊖ Φ)^d  [by (65)]
  = ((I ⊖ Φ) ◦ (I ⊖ Φ))^d  [by (61)]
  = (I ⊖ Φ)^d = I ⊖ Φ^d.  (66)
The type of lattice-ordered group we are most interested in is L = R^S, where R is some base lattice-ordered group and all operations in R^S are declared component-wise. In particular, R being a distributive lattice, both Eq. (64) and the definition of the dual given in Section III.E.3 make sense. Fortunately, they coincide (Wild, 2003; p. 619). R^S = R^Z is our prime example. It satisfies Theorem 32 below in that R = (R, ≤, ⊕) is a linearly ordered group, that is, ≤ is a linear order.

The essence of the next result is Wild (2003, Theorem 14), but the author managed to substitute the clumsy condition (C2) in Wild (2003, p. 621) with a nice CNF–DNF duality argument. Consider these conditions:

(∀s ∈ S)(∀C ∈ C(s))(∃G ⊆ C) G is C-good;  (CI)
(∀s ∈ S)(∀D ∈ D(s))(∃H ⊆ D) H is D-good.  (CI)′

We emphasize that the set G in Condition (CI) need not be an element of C(s), and H not an element of D(s).

Theorem 32. Let R be a lattice-ordered group and Φ a LSF on R^S with DNF-clusters C(s) and CNF-clusters D(s) (s ∈ S). (1) Then (CI), (CI)′ are necessary for the co-idempotence of Φ. (2) In case R is linearly ordered, (CI) and (CI)′ are jointly sufficient for the co-idempotence of Φ.
Proof. Just as in II.A.10, the co-idempotency of Φ is tantamount to

Φ ◦ (I ⊖ Φ) = 0,  (67)

where 0 : R^S → R^S is the constant map x ↦ 0. Using Eqs. (58) and (63), one computes similarly to the calculation following (59) that for all s ∈ S and x ∈ R^S (Wild, 2003; p. 621):

(Φ ◦ (I ⊖ Φ)x)_s = ⋀_{τ transversal} ⋁_{C∈C(s)} y_{τ(C)}.  (68)

We explain the precise meaning of transversal here. It is any set of ordered pairs indexed by the leaves of C(s) that is of this sort:

τ = {(i_C, Γ_C) | C ∈ C(s)}  with i_C ∈ C and Γ_C ∈ C(i_C).  (69)

In Eq. (68) we wrote τ(C) for (i_C, Γ_C); the meaning of y_{τ(C)} is covered by the general definition

y_{i,Γ} := ⋁_{j∈Γ} (x_i ⊖ x_j).

As to Property (1), supposing that Φ is co-idempotent, let us verify (CI) and (CI)′. We concentrate on (CI); afterward (CI)′ will follow by duality. In view of Eq. (67), the co-idempotency of Φ implies that the RHS of Eq. (68) is 0 for all s ∈ S and x ∈ R^S. Because an infimum of elements z_j of a linearly ordered group is 0 iff (z_j ≥ 0 for all j, and z_j = 0 for at least one j), it follows that: For all s ∈ S and x ∈ R^S there is a transversal τ such that

⋁_{C∈C(s)} y_{τ(C)} = ⋁_{C∈C(s)} ⋁_{j∈Γ_C} (x_{i_C} ⊖ x_j) = 0.  (70)

Assume that condition (CI) is false. Then there is s̄ ∈ S and w.l.o.g. C1 ∈ C(s̄) = {C1, …, Cn} such that C1 contains no good subsets G. We define a directed graph G as follows. Because G0 := C1 is not good, there is k1 ∈ G0 such that (∀Γ ∈ C(k1)) Γ ⊈ G0. This yields the arcs k1 → j (Γ ∈ C(k1), j ∈ Γ − G0). Because G1 := G0 − {k1} is not good, there is a k2 ∈ G1 such that (∀Γ ∈ C(k2)) Γ ⊈ G1. This yields the arcs k2 → j (Γ ∈ C(k2), j ∈ Γ − G1). Continuing in this fashion, we arrive at a nongood subset G_{m−1} = {k_m} that yields the arcs k_m → j (Γ ∈ C(k_m), j ∈ Γ − G_{m−1}). Assuming that no directed cycle has been generated up to step i − 1, it is clear that by adding all arcs of type k_i → j no directed cycle will arise. Hence G is an acyclic directed graph whose vertex set is the union of C1 with all leaves Γ ∈ C(k) (k ∈ C1). As for any acyclic digraph and infinite linearly ordered set (Bang-Jensen and Gutin, 2001; p. 13), we can pick suitable elements z_i ∈ R
such that i → j in G implies z_i > z_j. Extend these elements z_i in an arbitrary way to an element z ∈ R^S. For each transversal τ = {(i_1, Γ_1), …, (i_n, Γ_n)} of the type in Eq. (69) (we slightly changed notation), focus on (i_1, Γ_1). By construction of G there is at least one arc i_1 → j_1 with j_1 ∈ Γ_1. Hence z_{i_1} ⊖ z_{j_1} > 0, whence ⋁_{j∈Γ_1} (z_{i_1} ⊖ z_j) > 0, whence ⋁_{1≤α≤n} (⋁_{j∈Γ_α} (z_{i_α} ⊖ z_j)) > 0. This is the desired contradiction to Eq. (70).

As to the truth of (CI)′, we saw in Eq. (66) that with Φ also Φ^d is co-idempotent. By the above, the DNF of Φ^d satisfies (CI). But this DNF amounts to Φ's CNF, and so (CI)′ takes place.

As to Property (2), suppose (CI) and (CI)′ hold. In order to establish the co-idempotency of Φ, it suffices to show that

(CI) ⇒ Φ ◦ (I ⊖ Φ) ≤ 0.  (71)

Indeed, from Eq. (71) and (CI)′ then follows Φ^d ◦ (I^d ⊖ Φ^d) ≤ 0. In view of Eqs. (61) and (65), this yields Φ ◦ (I ⊖ Φ) ≥ 0. Thus Eq. (67) holds, and Φ must be co-idempotent.

Proof of Eq. (71). Fix s ∈ S and let C(s) = {C_1, …, C_n}. By Eq. (68), and since R is linearly ordered, the desired inequality Φ ◦ (I ⊖ Φ) ≤ 0 amounts for each x ∈ R^S to the existence of a transversal τ̄ = {(k_α, Γ_α) | 1 ≤ α ≤ n} with k_α ∈ C_α and Γ_α ∈ C(k_α) such that

⋁_{C∈C(s)} y_{τ̄(C)} = ⋁_{1≤α≤n} ⋁_{j∈Γ_α} (x_{k_α} ⊖ x_j) ≤ 0.

According to (CI), we can fix a good subset G_α ⊆ C_α for all 1 ≤ α ≤ n. Define now τ̄ := {(k_1, Γ_1), …, (k_n, Γ_n)} by letting k_α be the index of the smallest element of the linearly ordered set {x_j | j ∈ G_α} ⊆ R, and by picking Γ_α ∈ C(k_α) in such a way that Γ_α ⊆ G_α (possible by the goodness of G_α). Then x_{k_α} ⊖ x_j ≤ 0 for all 1 ≤ α ≤ n and j ∈ Γ_α, whence ⋁_{C∈C(s)} y_{τ̄(C)} ≤ 0. This proves Eq. (71) and thus Theorem 32.

Corollary 33. Let Φ be a LSF on R^S where R is a linearly ordered group. If Φ is strong, then Φ is co-idempotent.

Proof. This follows on noticing that (I1), (I1)′ in Theorem 27 imply (CI), (CI)′ in Theorem 32.

In particular, each LSF-opening L on R^S, along with the closing U = L^d, is co-idempotent. Consider these desirable extra properties:
(1) UL ≤ LU (that is, L and U are a Matheron pair, cf. III.B.2);
(2) UL and (whence) LU are co-idempotent;
(3) UL and (whence) LU are strong.

Question: Aside from (3) ⇒ (2), are there other implications among Properties (1), (2), and (3)?

We know that Ln, Un form a co-idempotent Matheron pair. What about Matheron pairs L, U on R^{Z²} induced by structuring elements A ⊆ Z²? We investigated six relevant As. In four cases both Properties (1) and (2) held; in two cases both failed. For instance, the A = {(0, 0), (0, 1), (1, 0), (1, 1)} from III.C.1 rejects both (1) and (2); a counterexample to (2) is given in III.E.5. For what kind of A do δA and εA satisfy the analogue of Theorem 24 in III.D.1? It is also hoped that for some 2D structuring elements A the operators L and U will satisfy the hypothesis of Theorem 16 in III.A.4 (adapted to R^{Z²}), giving way to a 2D multiresolution analysis.

As opposed to (I1) or (I2), the filter-theoretic meaning of (CI) alone (without (CI)′) is enigmatic. Observe that an anti-extensive Φ with (CI) must be co-idempotent. Indeed, by anti-extensivity each CNF-cluster D(s) contains the s-leaf {s}. Thus each D ∈ D(s) is itself D-good, and the claim follows from Theorem 32.
Since all leaves in C (0) are C -good, and all leaves in D (0) are D -good, U1 L2 is strong by Corollary 29.
FIGURE 27. A proof by pictures that Φ2 is not co-idempotent: an image x with four nonzero pixels is taken through Ux, LUx, x − LUx, and U(x − LUx), ending with LU(I − LU)x ≠ 0.
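The operators behind Figure 27 — the opening L = δA ◦ εA and closing U = εA ◦ δA for the structuring element A = {(0,0), (0,1), (1,0), (1,1)} of III.C.1 — can be sketched as follows. This is our illustration, working on a small torus to sidestep boundary handling (a simplifying assumption of ours); it verifies the textbook identities L ≤ I ≤ U and the idempotency of L and U.

```python
import random

A = [(0, 0), (0, 1), (1, 0), (1, 1)]  # 2D structuring element from III.C.1
M = N = 6                             # grid Z_6 x Z_6 (torus, to avoid boundary issues)

def erode(x):   # (εA x)(s) = min over a in A of x(s + a)
    return {(i, j): min(x[(i + a) % M, (j + b) % N] for a, b in A) for i, j in x}

def dilate(x):  # (δA x)(s) = max over a in A of x(s - a)
    return {(i, j): max(x[(i - a) % M, (j - b) % N] for a, b in A) for i, j in x}

L = lambda x: dilate(erode(x))  # opening
U = lambda x: erode(dilate(x))  # closing

random.seed(0)
x = {(i, j): random.randint(0, 3) for i in range(M) for j in range(N)}
assert all(L(x)[p] <= x[p] <= U(x)[p] for p in x)  # L ≤ I ≤ U
assert L(L(x)) == L(x) and U(U(x)) == U(x)         # openings and closings are idempotent
print("ok")
```

With these two functions one can retrace the panel-by-panel computation of Figure 27 on an image of one's own choosing.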
As an instance of "co-idempotent, but not idempotent," consider Φ1 with DNF given by (Wild, 2003; p. 623)

C(0) = {{0, −2}, {0, 2}, {0, 1, 3, 4}}.

Since G := {1, 3} is a good subset of {0, 1, 3, 4}, and {0, −2}, {0, 2} are good themselves, Φ1 satisfies (CI). Because Φ1 is anti-extensive, it must be co-idempotent (see the last remark in III.E.4). Because {0, 1, 3, 4} is not good, Φ1 is not idempotent by Corollary 28.

As an instance of "idempotent, but not co-idempotent," take the 2D structuring element A = {(0, 0), (0, 1), (1, 0), (1, 1)}, and put L := δA ◦ εA, U := εA ◦ δA. Since L < U are idempotent, Φ2 := L ◦ U is easily seen to be a ∨-filter (Heijmans, 1994, p. 411). To see that Φ2 is not co-idempotent, we²⁵ falsify Eq. (67) for a concrete image x ∈ R^{Z²} with four nonzero pixels (the value of the empty squares is 0) (Figure 27).

Every extensive map trivially is an ∧-overfilter. One verifies that the operator with

C(0) = {{0, −1}, {0, 1}, {1, 2}, {−1, −2, 2, 3}}  (DNF),
D(0) = {{0, 2}, {1, −1}, {0, 1, 3}, {0, 1, −2}}  (CNF)

is a nonextensive, nonidempotent ∧-overfilter, by glancing at the DNF. (The nonidempotency also follows from Fact (34) in III.B.3.) Neither is this operator co-idempotent. Clearly its dual has all the dual properties.

²⁵ Determining the DNF of Φ2 and checking that (CI) fails is rather more labor-intensive.
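Condition (CI) for Φ1 can be confirmed mechanically. This brute-force sketch is ours (hypothetical function names); it searches each leaf for a good subset.

```python
from itertools import chain, combinations

def is_good(G, cluster):
    G = set(G)
    return bool(G) and all(any(set(g) <= G for g in cluster(i)) for i in G)

def has_good_subset(C, cluster):
    """(CI) for one leaf: some nonempty G ⊆ C is C-good."""
    C = sorted(C)
    subsets = chain.from_iterable(combinations(C, r) for r in range(1, len(C) + 1))
    return any(is_good(G, cluster) for G in subsets)

# Φ1 (Wild, 2003, p. 623): C(0) = {{0,-2}, {0,2}, {0,1,3,4}}, translation invariant
C0 = [{0, -2}, {0, 2}, {0, 1, 3, 4}]
cluster = lambda i: [{j + i for j in leaf} for leaf in C0]

print(all(has_good_subset(C, cluster) for C in C0))  # True  → (CI) holds
print(is_good({0, 1, 3, 4}, cluster))                # False → (I1) fails, so Φ1 is not idempotent
```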
FIGURE 28. A Venn diagram of various classes of lattice stack filters.
Consider Φ3 defined by

C(0) = {{0, 1}, {0, −1}, {0, 2}, {0, −2}, {−1, 1}, {2, 3}, {1, 3, 6}},
D(0) = {{0, 1, 2}, {0, 1, 3}, {0, −1, 3}, {0, −1, 6}, {−2, −1, 1, 2}}.

The DNF satisfies (I2): For C̄ := {1, 3, 6} choose C := {2, 3}. Then C̄ contains the 2-leaf {1, 3} = {−1, 1} + 2 and the 3-leaf {1, 3} = {0, −2} + 3. All leaves in C(0) different from {1, 3, 6} are good, but the latter is not good. Thus (I2) holds, whence Φ3 is an overfilter; but (I1) does not hold, whence Φ3 is no ∧-overfilter. Furthermore, since D(0) falsifies both (I2)′ and (CI)′, Φ3 is neither idempotent nor co-idempotent [notice that C(0) satisfies (CI)].

The reader is invited to find representatives for the remaining regions of Figure 28. Notice, for instance, that the dashed line with the big (small) pieces delimits the class of ∧-overfilters (∨-underfilters). The intersection of these classes is the shaded rectangle of strong filters (which is within the region of co-idempotence) (Figure 28).
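Condition (I2) for Φ3 can likewise be verified by machine. The sketch below is our code (not from the text); it checks (I2) for a translation invariant DNF-cluster and confirms the claims made about Φ3.

```python
def satisfies_I2(C0):
    """(I2) for a translation invariant LSF: only C(0) matters."""
    cluster = lambda i: [{j + i for j in leaf} for leaf in C0]
    covered = lambda i, Cbar: any(set(g) <= set(Cbar) for g in cluster(i))
    # for every leaf C̄ there must be a leaf C whose members all own a leaf inside C̄
    return all(any(all(covered(i, Cbar) for i in C) for C in C0) for Cbar in C0)

def is_good(G, C0):
    cluster = lambda i: [{j + i for j in leaf} for leaf in C0]
    G = set(G)
    return bool(G) and all(any(set(g) <= G for g in cluster(i)) for i in G)

C0 = [{0, 1}, {0, -1}, {0, 2}, {0, -2}, {-1, 1}, {2, 3}, {1, 3, 6}]
print(satisfies_I2(C0))        # True  → Φ3 is an overfilter (Theorem 30)
print(is_good({1, 3, 6}, C0))  # False → (I1) fails, so Φ3 is no ∧-overfilter
```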
IV. CONCLUSION

Having been involved in multiresolution analysis with wavelets and in the establishment of a theory for nonlinear smoothers (LULU theory) in one
dimension, we are intrigued by the clear perspective of Marr that reached us indirectly, in his belief in a mathematics of vision. Occasional specific excursions into image processing using LULU theory have convinced us that such a mathematics is associated with a multiresolution analysis with specially selected morphological filters. The aim should be to have available a theory at least as strong as that established in LULU theory. We feel that in one dimension, the representations to aim for consist essentially of blocks of different width, similar to a Haar decomposition except in a few major aspects. First, the blocks are not restricted to those of wavelet bases, which occur in alternating pairs; the blocks should be more redundant, for more flexibility. This idea agrees with that of Mallet (1998), who notes that simple cells in the visual cortex are known to perform an image decomposition into just such a larger class. Second, the decomposition should preferably be invariant to translation as much as is possible. The Haar decomposition is severely restricted by the phase problem in this respect. A third aspect is the restriction imposed by a dyadic wavelet decomposition. We note, for instance, that the Haar wavelet does map successively into the smoothness classes Mn, like the LULU smoothers, but skips progressively more of them. This yields many obvious problems, for instance, that a simple block of length 13 is represented by more than one block. The LULU decomposition yields precisely this one block in its representation. Generalization of dyadic wavelet decomposition is possible by projecting sequentially, in a least-squares sense, onto subspaces that are only smaller by one dimension. Algorithmically this lacks simplicity and still forces blocks to be coupled. Alternative mappings into more redundant classes (like frames) have been tried.
Meyer (1993) notes that greedy pursuit algorithms, minimizing the residual of a linear sum of such blocks, are considered suboptimal, and that other attempts have been made, but a lack of theory for analysis of performance has been noted. In Section II we presented our strategy for establishing good representations for vision. Our strategy consists in the definition of a concept of smoothness called local monotonicity and the set of sequences Mn corresponding to a level of smoothness. The decomposition consists of a sequence of separators Sn that successively map into Mn from Mn−1, peeling off a resolution layer r^n. Section II strengthens foundations in wide areas of established mathematics and yields insight into alternatives. Our contention is that such separators should map onto a near-best approximation in Mn, which is the case if Sn ∈ [Un Ln, Ln Un], the LULU interval at that level. Failure may lead to a perceived lack of performance of morphological decomposition. For several choices of Sn, primarily Ln Un and Un Ln, the sequence r^n is easy to characterize and code. Despite the decomposition being constructed of highly nonlinear operators, the eventual decomposition of x into a sum of
blocks can be quantized, scaled, and thresholded to the extent of each block being treated differently, and still have the result decomposed consistently. The decomposition acts linearly on the cone generated by the constituent blocks of a particular sequence. It is not a basis transformation, but maps onto a space generated by a linearly independent set of N blocks, depending on the particular sequence. Our aim has been to demonstrate that, in principle, morphological filters can be constructed such that they are near-optimal in approximation error. Our choice maps a vector in R^N onto a subset of ½N(N + 1) possible block pulses, selecting at most N of them, the actual number dependent on the number of different gray shades of the 2n + 1 possibilities present in the elements of the particular sequence x. This guarantees economization in the representation with the blocks. The norm that is natural in our decompositions is the total variation. There is a useful related roughness spectrum, in this case the sequence T(b_{ji}), where the b_{ji} are the blocks of width j, starting at index i, with amplitude b_{ji}. This is associated with a Parseval-type identity preserving total variation in the same way the "energy" in a Fourier or wavelet decomposition is preserved. This is a valuable tool for analysis and automated decision making in practical applications. Thresholding is natural, and, in fact, a thresholding operator (or quantizing operator) commutes with the basic decomposition operators. In the mapping of the sequence x onto the block sequences, edges are perfectly preserved, even in the presence of noise of amplitude nearly that of the edge. This provides a key to possible movement analysis, which we at present feel incapable of arguing, though we are convinced that the ideas presented here are suggestive. Finally, each resolution component preserves the original order between neighboring elements of x.
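For readers who want to experiment, the basic smoothers are easy to sketch. The following is our illustration of Ln and Un on cyclic sequences (cyclic boundaries are a simplifying assumption of ours; the theory is developed for bi-infinite sequences in Section II). It checks Ln ≤ I ≤ Un, idempotency, and the LULU interval Un Ln ≤ Ln Un.

```python
import random

def Ln(x, n):
    """(Ln x)_i: maximum, over the n+1 windows of length n+1 containing i, of the window minimum."""
    m = len(x)
    return [max(min(x[(j + k) % m] for k in range(n + 1)) for j in range(i - n, i + 1))
            for i in range(m)]

def Un(x, n):
    """Dual of Ln: minimum of the window maxima."""
    m = len(x)
    return [min(max(x[(j + k) % m] for k in range(n + 1)) for j in range(i - n, i + 1))
            for i in range(m)]

random.seed(2)
x = [random.randint(0, 9) for _ in range(12)]
lo, hi = Ln(x, 2), Un(x, 2)
assert all(a <= v <= b for a, v, b in zip(lo, x, hi))  # Ln ≤ I ≤ Un
assert Ln(lo, 2) == lo and Un(hi, 2) == hi             # Ln, Un are idempotent
ul, lu = Un(lo, 2), Ln(hi, 2)                          # Un Ln x and Ln Un x
assert all(a <= b for a, b in zip(ul, lu))             # the LULU interval [Un Ln, Ln Un]
print("ok")
```

Note how L1 removes an isolated upward impulse while U1 keeps it, reflecting the one-sided smoothing roles of the two operators.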
This is a remarkable shape preservation, unheard of in any other representation known to us. Reconstruction from any quantized selection of significant resolution components therefore inherits the relative order of the original sequence. One can hardly hope for a better representation, and the fact that we have a large selection of such decompositions from which to choose opens a wide area of research for selection of secondary attributes. We believe the ideas presented may contribute significantly toward a mathematics of vision as articulated by Marr.

As to Section III, its relation to Section II has been previously discussed. With hindsight, we give it another try and perspective. When the second author was approached by P. Hawkes to write a chapter on DNF theory of stack filters for AIEP, he immediately opted for a joint venture with Carl Rohwer. Readers have likely appreciated the resulting shift of balance from theory to practice. But Section II also profits from Section III, if only through III.A.4 and III.D, which are closest to LULU theory. Section III was an
attempt to organize the articles (Wild, 2001, 2003, 2005; Rohwer and Wild, 2002) into a coherent whole. Here is a very brief history of these works. It began in 1998 when Carl Rohwer posed the idempotency problem for long LULU operators such as L2 U1 L1 (the idempotency of Ln, Un, Ln Un, Un Ln had long been settled by Carl Rohwer). Soon I found that distributivity is a powerful tool; only later did it dawn that some of my deliberations reduced to pure semigroup theory or to MM, the latter of which I learned in the process (foremost the lattice-flavored parts). In trying to tackle co-idempotency (a concept invented by Rohwer), the link to max-plus algebra and discrete event systems became apparent. The relation to other structures emerged later still, in this order: (classic) stack filters, monotone Boolean networks, cellular automata, pregroups, and templates. The proofs in Section III that have been previously published have all been streamlined; in particular, the key relation of Rohwer and Wild (2002, Lemma 4.6) got a brand-new footing (Theorem 24 in III.D). The following results have not appeared previously: Theorem 32 in III.E, Theorem 16 in III.A, and Theorems 20 and 21 in III.B.
ACKNOWLEDGMENTS

The authors thank Lauretta Adams for excellent word processing and J.P. du Toit for assistance with artwork and the application in Image Analysis. For the latter, we also thank T. Scheidt and E.G. Rohwer for the background and opportunity to apply our ideas.
REFERENCES

Agaian, S., Astola, J., Egiazarian, K. (1995). Binary polynomial transforms and nonlinear digital filters. Pure Appl. Math. 191.
Astola, J., Kuosmanen, P. (1997). Fundamentals of Nonlinear Digital Filtering. CRC Press, Boca Raton, Florida.
Baccelli, F., Cohen, G., Olsder, G.J., Quadrat, J.P. (1992). Synchronization and Linearity. An Algebra for Discrete Event Systems. Wiley Series in Probability and Mathematical Statistics.
Bang-Jensen, J., Gutin, G. (2001). Digraphs: Theory, Algorithms and Applications. Springer Monographs in Mathematics.
Bijaoui, A., Murtagh, F., Starck, J.L. (1998). Image Processing and Data Analysis—The Multiscale Approach. Cambridge Univ. Press.
Birkhoff, G. (1979). Lattice Theory, 3rd edition. Amer. Math. Soc., Providence, RI.
Blyth, T.C. (2005). Lattices and Ordered Structures. Springer Verlag.
Chen, K. (1989). Bit-serial realizations of a class of nonlinear filters based on positive Boolean functions. IEEE Trans. Circuits Syst. 36, 785–794.
Chua, L. (2002). Cellular Neural Networks and Visual Computing. Cambridge Univ. Press.
Chui, C.K. (1997). Wavelets: A Mathematical Tool for Signal Analysis. SIAM, Philadelphia.
Collatz, L. (1975). Monotonicity in numerical mathematics. In: Proceedings of the 1st SANUM Conference. Durban.
Conradie, W.J., de Wet, T., Jankowitz, M.D. (2006). Exact and asymptotic distribution of LULU smoothers. J. Comput. Appl. Math. 186 (1), 253–267.
Dougherty, E.R., Haralick, R.M. (1992). Unification of nonlinear filtering in the context of binary logical calculus, Part I: Binary filters. J. Math. Imaging Vision 2, 173–183.
Fajfar, I., Bratkovic, F. (1996). Design of monotonic binary-valued cellular neural network. In: Proceedings of CNNA'96 Conference, pp. 321–326.
Gunawardena, J. (2003). From max-plus algebra to nonexpansive mappings: a nonlinear theory of discrete event systems. Theoret. Comput. Sci. 293, 141–167.
Gunawardena, J. (1994). Min–max functions. Discrete Event Dyn. Syst. 4, 377–406.
Gunawardena, J. (Ed.) (1998). Idempotency. Publications of the Newton Institute, vol. 11. Cambridge Univ. Press.
Heijmans, H.J.A.M. (1994). Morphological Image Operators. AIEP Suppl., vol. 24.
Heijmans, H.J.A.M. (1997). Composing morphological filters. IEEE Trans. Image Process. 6, 713–723.
Howie, J.M. (1976). An Introduction to Semigroup Theory. Academic Press.
Kauffman, S.A. (1993). The Origins of Order. Oxford Univ. Press.
Lambek, J. (2006). Pregroups and natural language processing. Math. Intelligencer 28, 41–48.
Laurie, D.P., Rohwer, C.H. The discrete pulse transform. SIAM J. Math. Anal., in preparation.
Litvinov, G.L., Maslov, V.P. (Eds.) (2005). Idempotent mathematics and mathematical physics. Contemp. Math., vol. 377. American Mathematical Society.
Mallet, S. (1998). Applied mathematics meets signal processing. Doc. Math. Extra Volume ICM.I, 319–338.
Mallows, C.L. (1980). Some theory of nonlinear smoothers. Ann. Stat. 8 (4), 695–715.
Maragos, P., Schafer, R.W. (1987). Morphological filters—Part II. Their relations to median, order-statistic, and stack-filters. IEEE Trans. Acoustics, Speech Signal Process. ASSP-35, 1170–1184.
Meyer, Y. (1993). Wavelets, Algorithms and Applications. SIAM, Philadelphia.
McKenzie, R., McNulty, G.F., Taylor, W. (1987). Algebras, Lattices, Varieties. Brooks, Cole.
Paredes, J.L., Arce, G.R. (2001). Recent developments in Stack Filtering and Smoothing. In: AIEP, vol. 117, pp. 173–239.
Eccles, J.C., Popper, K.R. (1981). The Self and Its Brain. Springer.
Rohwer, C.H. (1989). Idempotent one-sided approximation of median smoothers. J. Approx. Theor. 58 (2), 151–163.
Rohwer, C.H. (1999). Projections and separators. Quaest. Math. 22, 219–230.
Rohwer, C.H. (2002a). Fast approximation with locally monotone sequences. In: Proceedings 4th FAAT Conference, Maratea. In: Supplemento ai rendiconti del Circolo Matematico di Palermo, Series II, vol. 68.
Rohwer, C.H. (2002b). Variation reduction and LULU smoothing. Quaest. Math. 25, 163–176.
Rohwer, C.H. (2002c). Multiresolution analysis with pulses. In: Buhmann, M.D., Mache, D.A. (Eds.), Advanced Problems in Constructive Approximation. In: International Series of Numerical Mathematics, vol. 142. Birkhäuser, Basel, pp. 165–186.
Rohwer, C.H. (2004a). Fully trend preserving operators. Quaest. Math. 27, 217–229.
Rohwer, C.H. (2004b). Quasi-inverses and approximation with min–max operators. In: Supplemento ai rendiconti del Circolo Matematico di Palermo, Series II, vol. 76.
Rohwer, C.H. (2005). Nonlinear Smoothing and Multiresolution Analysis. ISNM, vol. 150. Birkhäuser.
Rohwer, C.H. (2006). Quasi-inverses and the approximation with min–max operators in the ℓ1-norm. Quaest. Math. 29.
Rohwer, C.H., Toerien, L.M. (1991). Locally monotone robust approximation of sequences. J. Comput. Appl. Math. 36, 399–408.
Rohwer, C.H., Wild, M. (2002). Natural alternatives for one-dimensional median filtering. Quaest. Math. 25, 135–162.
Ronse, C. (1990). Order-configuration functions: Mathematical characterizations and applications to digital signal and image processing. Inform. Sci. 50, 275–327.
Royden, H.L. (1969). Real Analysis. Macmillan.
Saitô, T. (1962). Ordered idempotent semigroups. J. Math. Soc. Japan 14, 150–169.
Schonfeld, D., Goutsias, J. (1991). Optimal morphology pattern restoration from noisy binary images. IEEE Trans. Pattern Anal. Machine Intell. 13, 14–29.
Schubert, H. (1975). Topologie. Teubner, Stuttgart.
Serra, J. (1982). Image Analysis and Mathematical Morphology. Academic Press, London.
Serra, J. (1994). Morphological filtering: An overview. Signal Processing 38, 3–11.
Scheidt, T., Rohwer, E.G., von Bergmann, H.M., Saucedo, E., Diéguez, E., Fornaro, L., Stafast, H. (2005). Optical second-harmonic imaging of PbxCd1−xTe ternary alloys. J. Appl. Phys. 97, 103104-1–103104-6.
Shmulevich, I., Coyle, E.J. (1998). On the structure of idempotent monotone Boolean functions. In: Proceedings of NOBLESSE on Nonlinear Model Based Image Analysis. Springer-Verlag, Glasgow, pp. 339–344.
Shmulevich, I., Dougherty, E.R., Zhang, W. (2002). From Boolean to probabilistic Boolean networks as models of genetic regulatory networks. Proc. IEEE 90, 1778–1792.
Velleman, P.F. (1977). Robust nonlinear data smoothers: definitions and recommendations. Proc. Natl. Acad. Sci. USA 74 (2), 434–436.
Wendt, P.D., Coyle, E.J., Gallagher, N.C. (1986). Stack filters. IEEE Trans. Acoust., Speech Signal Process. 34, 898–911.
Wild, M. (2001). On the idempotency and co-idempotency of the morphological centre. Int. J. Pattern Recognition Artif. Intell. 15 (7), 1119–1128.
Wild, M. (2003). Idempotent and co-idempotent stack filters and min–max operators. Theoret. Comput. Sci. 299, 603–631.
Wild, M. (2005). The many benefits of putting stack filters into disjunctive or conjunctive normal form. Disc. Appl. Math. 149, 174–191.
Wolfram, S. (2002). A New Kind of Science. Wolfram Media.
Yang, K., Zhao, Q. (2004). The balance problem of min-max systems is co-NP hard. Syst. Control Lett. 53, 303–310.
Zhou, X.-W., Yang, D.-Y., Ding, R.-T. (1992). Infinite length roots of median filters. Sci. China (Ser. A) 35, 1496–1508.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 146
Bayesian Information Geometry: Application to Prior Selection on Statistical Manifolds

HICHEM SNOUSSI

ICD/LM2S, Charles Delaunay Institute, University of Technology of Troyes, 10010 Troyes, France
I. Introduction
II. Differential Geometry Tools
III. Statistical Geometric Learning
   A. Mass and Geometry
   B. Bayesian Learning
   C. Restricted Model
      1. Nonparametric Modeling
      2. Parametric Modeling
IV. Prior Selection
   A. Family of (δ, α)-Priors
   B. Choice of Reference Distribution
V. δ-Flat Families
   A. δ-Optimal Estimates in δ-Flat Families
   B. Prior Selection with δ-Flat Families
   C. Projection of Priors
VI. Mixture of δ-Flat Families and Singularities
   A. Singularities with Mixture Families
VII. Examples
   A. Multivariate Gaussian Mixture
   B. Source Separation
      1. Fisher Information Matrix
      2. δ-Divergence (δ = 0, 1)
VIII. Conclusion and Discussion
Appendix A: Proof of Theorem 1
Appendix B: Proof of Theorem 2
References
ISSN 1076-5670, DOI: 10.1016/S1076-5670(06)46003-1
Copyright 2007, Elsevier Inc. All rights reserved.

Information geometry is an emerging research domain. The work of Amari (1985) can be considered the first attempt to elucidate the power of information geometry tools for solving problems that arise in various information and physical sciences. The field of neurocomputing is a perfect example of the success of applying such an abstract mathematical tool. Information geometry develops novel tools based on differential geometry, which is, in turn, a mature field of mathematics aimed at characterizing the local and global properties of manifolds. This chapter considers the application of information geometry to the problem of prior selection in a Bayesian learning framework. There is an extensive literature on the construction of noninformative priors, and the subject seems far from a definitive solution (Kass and Wasserman, 1994). We consider this problem in light of the recent development of information geometric tools by Amari and Nagaoka (2000). The differential geometric analysis allows the prior selection problem to be formulated on a general manifold-valued set of probability distributions. In order to construct the prior distribution, a criterion expressing the trade-off between decision error and a uniformity constraint was proposed in Snoussi (2005). The solution has an explicit expression obtained by variational calculus. In addition, it has two important invariance properties: (1) invariance to the dominant measure of the data space, and (2) invariance to the parameterization of a restricted parametric manifold. We show that constructing a prior by projection is the best way to take into account the restriction to a particular family of parametric models, and we apply this procedure to autoparallel restricted families. Two practical examples illustrate the proposed prior construction. The first deals with learning a mixture of multivariate Gaussians in a classification perspective; in this problem we show how penalizing the likelihood with the proposed prior eliminates the degeneracy that occurs when approaching singularity points. The second example treats the blind source separation problem.
I. INTRODUCTION

A learning machine can be described as a system mapping some inputs x to some outputs y (Figure 1). The inputs x and the outputs y lie in two general sets, Euclidean or not. Learning consists essentially in extracting information from collected data in order to perform a specific task related to the behavior of the modeled machine. The distinction between inputs and outputs is not, in general, tied to the task performed by the learning machine. For instance, filtering consists in estimating the inputs x given the outputs y; inference consists in finding a parameter θ characterizing the mapping y = f_θ(x); prediction consists in estimating the stochastic behavior of the outputs given previously recorded data; and so on. The complexity of the physical mechanism underlying the input/output mapping, or the lack of information about it, makes the prediction of the outputs given the inputs (forward model) or the estimation of the inputs given the outputs (inverse problem) a difficult task.
FIGURE 1. Learning machine model of experimental science.
When a parametric forward model p(y|x, θ) is assumed to be available from knowledge of the system, the classical maximum likelihood (ML) approach can be used to estimate either the parameter θ or the inputs x given the data (outputs) y. When a prior model p(x, θ) = p(x|θ)p(θ) is also assumed to be available, classical Bayesian methods yield the joint a posteriori p(x, θ|y), and then both p(x|y) and p(θ|y), from which any inference about x and θ can be made. The problem of prediction can be stated as follows: given some training data D = (x_i, y_i), i = 1, …, N, where i is the time index and N is the sample size, our purpose is the estimation of the output probability distribution (prediction). In this work, we focus on the prediction problem. We note that, in this situation, before designing the learning algorithm, two important questions must be confronted: (1) how to choose the parametric model p(y|x, θ) (model selection), and (2) how to select a prior distribution on the parameter θ. The first question concerns the selection of an appropriate manifold in the whole set of probability distributions P, on which the learning algorithm will estimate the prediction p(y|D). This problem is beyond the scope of this chapter; Balasubramanian (1996, 1997) provides geometric insight into the selection of a model among a finite set of available models. This chapter concerns the second question, the selection of an appropriate prior distribution. Assuming a given statistical model (differentiable manifold), parametric or not, we propose a novel method to construct a prior distribution. This method can be interpreted as an inverse problem of geometric Bayesian learning (Zhu and Rohwer, 1995a, 1995b).
In fact, Bayesian learning consists of constructing a decision rule (a mapping from the data space to the manifold Q of predicted distributions; see Figure 12) by minimizing a cost function (generalization error) given a chosen manifold and a prior distribution on this manifold (Zhu and Rohwer, 1995a). However, in the proposed method, we assume a fixed prediction (reference distribution) and we minimize the decision error cost under a uniformity constraint (a measure of ignorance).
In the sequel, we assume that we are given some training data x_1, …, x_N and y_1, …, y_N and some information about the mapping, which consists of a model Q of probability distributions, either parametric (Q = {P(z|θ)}) or nonparametric. The statistical manifold Q is the set of probability distributions on the space Z = X × Y (see Figure 12). The objective of a learning algorithm is to construct a learning rule τ mapping the set D of training data D = (x_1, …, x_N, y_1, …, y_N) to a probability distribution p ∈ Q ⊂ P (P is the whole set of probability densities):

    τ : D → Q,   D ↦ q = τ(D).
Bayesian statistical learning leads to a solution that depends on the prior distribution of the unknown distribution p. In the parametric case, where the points p of the manifold Q are parameterized by a coordinate system θ, this amounts to the prior Π(θ) on the parameter θ. Finding a general expression for Π(θ), and understanding how this expression reflects the relationship between a restricted model Q and the larger set of ignorance containing it, are the main objectives of this work. We show that the prior expression depends on the chosen geometry (a subjective choice) of the set of probability measures. The entropic prior¹ (Rodríguez, 1991, 2001) and the conjugate prior of exponential families are special cases related to particular geometries. Section II briefly recalls some definitions from differential geometry. Section III reviews some concepts of Bayesian geometric statistical learning and the role of differential geometry. Section IV develops the basics of prior selection in a Bayesian decision perspective and discusses the effect of model restriction, both from nonparametric to parametric modeling and from a parametric family to a curved family. Section V studies the particular case of δ-flat families, where the previous results have explicit formulas. Section VI discusses the case of mixtures of δ-flat families. Section VII applies these results to two learning examples: (1) multivariate Gaussian mixture classification and (2) blind source separation. We conclude with a summary and indicate some future directions.
II. DIFFERENTIAL GEOMETRY TOOLS

This section recalls some definitions from differential geometry related to the concept of Riemannian manifolds. For further details, please refer to

¹ Some related work on ignorance and prior selection in a geometric framework can be found at http://omega.albany.edu:8008/ignorance.
FIGURE 2. Topological manifold.
Boothby (1986). First, we define a topological manifold as follows:

Definition 1. A manifold M of dimension n, or n-manifold, is a topological space with the following properties:
1. M is Hausdorff,
2. M is locally Euclidean of dimension n, and
3. M has a countable basis of open sets.

Intuitively, a topological manifold is a set of points that can be considered locally as a flat Euclidean space. In other words, each point p ∈ M has a neighborhood U homeomorphic to an n-ball in R^n. Let φ be such a homeomorphism. The pair (U, φ) is called a coordinate neighborhood: to p ∈ U, we assign the n coordinates ξ^1(p), ξ^2(p), …, ξ^n(p) of its image φ(p) in R^n (Figure 2). If p also lies in a second neighborhood V, let ψ(p) = [ρ^1(p), ρ^2(p), …, ρ^n(p)] be its corresponding coordinate system. The transformation ψ ∘ φ^{-1} on R^n, given by

    ψ ∘ φ^{-1} : (ξ^1, …, ξ^n) ↦ (ρ^1, …, ρ^n),

defines a local coordinate transformation on R^n from φ = [ξ^i] to ψ = [ρ^i]. Differential geometry is interested in intrinsic geometric properties that are invariant with respect to the choice of the coordinate system. This can be achieved by imposing smooth transformations between local coordinate systems (Figure 3). The following definition of a differentiable manifold formalizes this concept in a global setting.

Definition 2. A differentiable (or smooth) manifold M is a topological manifold with a family U = {U_α, φ_α} of coordinate neighborhoods such that:
1. the U_α cover M,
FIGURE 3. Differentiable manifold.
2. for any α, β such that the intersection U_α ∩ U_β is nonempty, the maps φ_α ∘ φ_β^{-1} and φ_β ∘ φ_α^{-1} are diffeomorphisms of the open sets φ_β(U_α ∩ U_β) and φ_α(U_α ∩ U_β) of R^n, and
3. any coordinate neighborhood (V, ψ) meeting Property 2 with every (U_α, φ_α) ∈ U is itself in U.

Tangent spaces. On a differentiable manifold, an important notion (in the sequel) is the tangent space. The tangent space T_p(M) at a point p of the manifold M is the vector space of tangent vectors to the curves passing through the point p. It is, intuitively, the vector space obtained by a local linearization around p. More formally, let f : M → R be a differentiable function on the manifold M and γ : I → M a curve on M; the directional derivative of f along the curve γ is written

    (d/dt) f(γ(t)) = (dγ^i/dt) (∂f/∂ξ^i),

where the derivative operator e_i = ∂/∂ξ^i can be considered a vector belonging to the tangent space at the point p. The tangent space is then the vector space spanned by the differential operators (∂/∂ξ^i)_p:

    T_p(M) = { c^i (∂/∂ξ^i)_p | (c^1, …, c^n) ∈ R^n },
FIGURE 4. Tangent space on the manifold.

FIGURE 5. Vector field on the manifold.
where the differential operator (∂/∂ξ^i)_p can be seen geometrically as the tangent vector to the ith coordinate curve (fixing all coordinates ξ^j, j ≠ i, and varying only the value of ξ^i) (Figure 4).

Vector fields and tensor fields. A vector field X is a map M → ⋃_p T_p that assigns a tangent vector to each point of the manifold: X : p ∈ M ↦ X_p ∈ T_p. The vector field X can be defined by its n component functions {X^i}, i = 1, …, n (Figure 5). X is C^∞ (smooth) if and only if all its scalar components X^i are C^∞. A tensor field A of type (q, r) is a map that assigns to each point p ∈ M a multilinear mapping A_p from T_p^r to T_p^q:

    A : p ↦ A_p,
    A_p : T_p × ⋯ × T_p (r factors) → T_p × ⋯ × T_p (q factors).
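The coordinate expression of the directional derivative given above, (d/dt) f(γ(t)) = (dγ^i/dt)(∂f/∂ξ^i), can be checked numerically. The following is a minimal sketch; the function f and the curve γ are arbitrary choices of ours (not from the text), and both sides are approximated with central finite differences on the chart R²:

```python
import numpy as np

# Check that d/dt f(gamma(t)) equals the chain-rule sum gammadot^i * d f/d xi^i.
# f and gamma below are illustrative choices, not taken from the chapter.

def f(xi):                       # smooth function f : R^2 -> R
    return np.sin(xi[0]) * xi[1]

def gamma(t):                    # smooth curve gamma : R -> R^2
    return np.array([t**2, np.cos(t)])

t0, h = 0.7, 1e-6

# Left-hand side: derivative of t -> f(gamma(t)) at t0 (central difference)
lhs = (f(gamma(t0 + h)) - f(gamma(t0 - h))) / (2 * h)

# Right-hand side: sum_i gammadot^i * partial_i f, evaluated at p = gamma(t0)
p = gamma(t0)
gamma_dot = (gamma(t0 + h) - gamma(t0 - h)) / (2 * h)
grad = np.zeros(2)
for i in range(2):
    e = np.zeros(2); e[i] = h
    grad[i] = (f(p + e) - f(p - e)) / (2 * h)
rhs = gamma_dot @ grad

print(abs(lhs - rhs))            # agreement up to finite-difference error
```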
FIGURE 6. Riemannian metric.
The types (0, r) and (1, r) are called, respectively, tensor fields of covariant degree r and tensor fields of contravariant degree 1 and covariant degree r. For example, a scalar product is a tensor field of type (0, 2): T_p × T_p → R, (X_p, Y_p) ↦ ⟨X_p, Y_p⟩.

Riemannian metric. For each point p in M, assume that an inner product ⟨·,·⟩_p is defined on the tangent space T_p(M). This defines a mapping from the points of the differentiable manifold to their inner products (bilinear forms). If this mapping is smooth, the pair (M, ⟨·,·⟩_p) is called a Riemannian manifold (Figure 6). The Riemannian metric is thus a tensor field g that, in a coordinate system [ξ], is defined by the positive definite matrices G_p:

    G_ij(p) = ⟨(∂/∂ξ^i), (∂/∂ξ^j)⟩_p.

On a manifold M, an infinite number of Riemannian metrics may be defined; the metric is not an intrinsic geometric property of the manifold. Consider now a curve γ : [a, b] → (M, g); its length ‖γ‖ is defined as

    ‖γ‖ = ∫_a^b ‖dγ/dt‖ dt = ∫_a^b sqrt( g_ij γ̇^i γ̇^j ) dt.
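The length formula ‖γ‖ = ∫ sqrt(g_ij γ̇^i γ̇^j) dt can be made concrete with a small numerical example of our own choosing: the unit sphere with coordinates (θ, φ) carries the induced metric g = diag(1, sin²θ), and the latitude circle γ(t) = (θ₀, t), t ∈ [0, 2π], has exact length 2π sin θ₀.

```python
import numpy as np

# Curve length under a Riemannian metric, illustrated on the unit sphere
# (our own example): metric g = diag(1, sin(theta)^2) in (theta, phi) coords.

theta0 = 0.9

def metric(theta, phi):
    return np.diag([1.0, np.sin(theta) ** 2])

ts = np.linspace(0.0, 2 * np.pi, 20001)
gamma_dot = np.array([0.0, 1.0])          # d(theta)/dt = 0, d(phi)/dt = 1
speeds = np.array([np.sqrt(gamma_dot @ metric(theta0, t) @ gamma_dot)
                   for t in ts])

# trapezoidal quadrature of the speed along the curve
length = float(np.sum((ts[1:] - ts[:-1]) * (speeds[1:] + speeds[:-1]) / 2))

print(length, 2 * np.pi * np.sin(theta0))  # numerical vs. exact length
```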
Geodesics. A geodesic between two endpoints γ(a) and γ(b) on a Riemannian manifold M is a curve γ : [a, b] → M that is locally the shortest curve on the manifold connecting these endpoints. More formally, a geodesic is defined as follows.
FIGURE 7. Exponential mapping on the manifold.
Definition 3. The parameterized curve γ(t) is said to be a geodesic if its velocity (tangent vector) dγ/dt is constant (parallel), that is, if it satisfies the condition (D/dt)(dγ/dt) = 0 for a < t < b.

Exponential mapping. The notion of exponential mapping provides an interesting bridge between a Euclidean space and the Riemannian manifold. For a point p and a tangent vector X ∈ T_p(M), let γ : t ↦ γ(t) be the geodesic such that γ(0) = p and (dγ/dt)(0) = X. The exponential mapping of X is defined as E_p(X) = γ(1). In other words, the exponential mapping assigns to the tangent vector X the endpoint of the geodesic whose velocity at time t = 0 is the vector X (Figure 7). It can be shown that there exist a neighborhood U of 0 in T_p(M) and a neighborhood V of p in M such that E_p|_U is a diffeomorphism from U to V. Also, note that since the velocity dγ/dt is constant along the geodesic γ(t), its length L from p to E_p(X) is

    L = ∫_0^1 ‖dγ/dt‖ dt = ∫_0^1 ‖X‖ dt = ‖X‖.
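As a concrete illustration (our own example, not from the text), on the unit sphere S² ⊂ R³ the geodesics through p are great circles, the exponential map has the closed form E_p(X) = cos(‖X‖) p + sin(‖X‖) X/‖X‖ for a tangent vector X ⊥ p, and the geodesic distance from p to E_p(X) is indeed ‖X‖:

```python
import numpy as np

# Exponential map on the unit sphere S^2 (our own illustrative example).
# For a base point p and tangent vector X with <p, X> = 0, the geodesic is a
# great circle and E_p(X) = cos(||X||) p + sin(||X||) X / ||X||.

p = np.array([0.0, 0.0, 1.0])            # base point (north pole)
X = np.array([0.3, 0.4, 0.0])            # tangent vector at p, ||X|| = 0.5
norm_X = np.linalg.norm(X)

def exp_map(p, X):
    n = np.linalg.norm(X)
    return np.cos(n) * p + np.sin(n) * X / n

q = exp_map(p, X)
# geodesic (great-circle) distance between p and q on the sphere
geodesic_dist = np.arccos(np.clip(p @ q, -1.0, 1.0))

print(geodesic_dist, norm_X)             # both are approximately 0.5
```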
The exponential mapping E_p(X) thus corresponds to the unique point on the geodesic whose distance from p is the length of the vector X.

Affine connections. An affine connection is an infinitesimal linear relation Π_{p,p′} between the tangent spaces of two neighboring points p and p′ (Figure 8). It can be defined by its n³ connection coefficients Γ^k_{ij} (with respect to the coordinate system [ξ^i]) as follows:

    Π_{p,p′}((∂_j)_p) = (∂_j)_{p′} − dξ^i (Γ^k_{ij})_p (∂_k)_{p′}.
FIGURE 8. Affine connections.

FIGURE 9. Parallel translation.
Let p and q be two points on M and γ a curve linking p and q. If the tangent vectors X(t) satisfy the following relation along the curve γ,

    X(t + dt) = Π_{γ(t),γ(t+dt)}(X(t)),

then X is parallel along γ and Π_γ is a parallel translation on γ (Figure 9). The covariant derivative of a vector field X along a curve γ is defined as the infinitesimal variation between X(t) and the parallel translation of X(t + h) ∈ T_{γ(t+h)} to the space T_{γ(t)} along γ. The parallel translation is in fact necessary in order to consider the limit of the difference of two vectors belonging to the same vector space: the vectors X(t) and X(t + dt) belong to different tangent spaces, so the quantity dX(t) = X(t + dt) − X(t) may not be defined (Figure 10). The covariant derivative δX/dt then forms a vector
FIGURE 10. Covariant derivative along a curve γ.
field along the curve γ and can be expressed as a function of the connection coefficients as follows:

    δX/dt = [ Π_{γ(t+dt),γ(t)}(X(t + dt)) − X(t) ] / dt
          = ( Ẋ^k(t) + γ̇^i(t) X^j(t) (Γ^k_{ij})_{γ(t)} ) (∂_k)_{γ(t)}.    (1)

The expression (1) of the covariant derivative along a curve γ can be extended to define the directional derivative along a tangent vector D by considering a curve whose tangent vector is D. The directional derivative, denoted ∇_D X, has the following expression:

    ∇_D X = D^i ( (∂_i X^k)_p + X^j_p (Γ^k_{ij})_p ) (∂_k)_p.

The covariant derivative along the curve γ can then be written as

    δX_{γ(t)}/dt = ∇_{γ̇(t)} X.    (2)

Consider now two vector fields X and Y on the manifold M. The covariant derivative ∇_X Y ∈ T_p(M) of Y with respect to X can be defined by the following expression:

    ∇_X Y = X^i ( ∂_i Y^k + Y^j Γ^k_{ij} ) ∂_k.    (3)

The expression Eq. (3) of the covariant derivative can be used as a characterization of the connection coefficients Γ^k_{ij}. In fact, taking X = ∂_i
FIGURE 11. Autoparallel submanifold.
and Y = ∂_j, the connection coefficients are characterized as follows:

    ∇_{∂_i} ∂_j = Γ^k_{ij} ∂_k.

A differentiable manifold M is said to be flat if and only if there exists a coordinate system [ξ^i] such that the connection coefficients {Γ^k_{ij}} are identically 0. This means that all the coordinate vector fields ∂_i are parallel along any curve γ on M.

Let S be a submanifold of M and ∇ a connection defined on M. For any p ∈ S, the tangent space T_p(S) is included in the tangent space T_p(M). However, if two vector fields X and Y are defined on the submanifold S, the covariant derivative ∇_X Y belongs in general to the tangent space T_p(M) and not to T_p(S). If, in particular, the covariant derivatives remain in the tangent space T_p(S) for every point p ∈ S, then the submanifold S is said to be autoparallel with respect to ∇ (Figure 11).

Example. A geodesic γ is a one-dimensional (1D) autoparallel submanifold. Since an autoparallel submanifold S is closed with respect to parallel translation on M, and since a 1D manifold is flat, a geodesic γ can be characterized by the equation

    (δ/dt)(dγ/dt) = 0,

implying that the velocity vector dγ/dt is parallel along the curve γ. An autoparallel submanifold S of a flat manifold M is also a flat manifold. In addition, the affine coordinates of M and S are related by an affine transformation; in other words, there exist a matrix A and a vector b such that

    ξ(p) = A u(p) + b,   ∀p ∈ S,

where [ξ^i] and [u^a] are the affine coordinates of M and S, respectively.
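Equation (1) with δX/dt = 0 gives the parallel transport equation Ẋ^k = −γ̇^i X^j Γ^k_{ij}. The sketch below (an example of our choosing, not from the text) integrates this equation on the unit sphere along a latitude circle, using the well-known Christoffel symbols of the sphere's Riemannian connection, and checks that the metric norm of the transported vector is conserved:

```python
import numpy as np

# Parallel transport on the unit sphere, coords (theta, phi), our own example.
# Nonzero Christoffel symbols of the sphere's Riemannian connection:
#   Gamma^theta_{phi phi} = -sin(theta) cos(theta)
#   Gamma^phi_{theta phi} = Gamma^phi_{phi theta} = cos(theta)/sin(theta)
# Along the latitude gamma(t) = (theta0, t), gammadot = (0, 1), Eq. (1) with
# deltaX/dt = 0 gives  Xdot^k = -gammadot^i X^j Gamma^k_{ij}.

theta0 = 1.0

def rhs(X):
    Xth, Xph = X
    dXth = np.sin(theta0) * np.cos(theta0) * Xph     # -Gamma^theta_{phi phi} X^phi
    dXph = -np.cos(theta0) / np.sin(theta0) * Xth    # -Gamma^phi_{phi theta} X^theta
    return np.array([dXth, dXph])

def sq_norm(X):                  # metric norm: g = diag(1, sin(theta0)^2)
    return X[0] ** 2 + np.sin(theta0) ** 2 * X[1] ** 2

X = np.array([1.0, 0.5])
n0 = sq_norm(X)

steps = 20000
h = 2 * np.pi / steps
for _ in range(steps):           # RK4 integration around the full loop
    k1 = rhs(X); k2 = rhs(X + h/2*k1); k3 = rhs(X + h/2*k2); k4 = rhs(X + h*k3)
    X = X + h/6 * (k1 + 2*k2 + 2*k3 + k4)

print(abs(sq_norm(X) - n0))      # norm drift: tiny (up to integration error)
```

Norm conservation here is a property of the Riemannian (metric-compatible) connection introduced just below.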
Riemannian connection. A Riemannian connection is an affine connection ∇ defined on a Riemannian manifold (M, g = ⟨·,·⟩) such that, ∀X, Y, Z ∈ T(M), the following property holds:

    Z⟨X, Y⟩ = ⟨∇_Z X, Y⟩ + ⟨X, ∇_Z Y⟩,    (4)

where the left-hand side means the differential operator Z applied to the scalar function ⟨X, Y⟩ on the manifold. Let γ be a curve on the manifold M and δX/dt and δY/dt the covariant derivatives of X and Y along γ, respectively. According to the expression of the covariant derivative [Eq. (2)] and the fact that the differential operator γ̇(t) consists of deriving with respect to t, the following identity is obtained concerning the variation of the scalar product on a manifold with a Riemannian connection:

    (d/dt)⟨X(t), Y(t)⟩ = ⟨δX(t)/dt, Y(t)⟩ + ⟨X(t), δY(t)/dt⟩.

This equation means that the scalar product is conserved under parallel translation (δX(t)/dt = δY(t)/dt = 0):

    ⟨Π_γ(X), Π_γ(Y)⟩ = ⟨X, Y⟩.

A particular example is the Euclidean space, which is a flat manifold endowed with a Riemannian connection.

Dual connections. This notion is very important when dealing with manifolds of probability distributions (see Amari and Nagaoka, 2000, for more details). It can be introduced in the following manner. In general, Eq. (4), characterizing a Riemannian connection, does not hold. Instead, there may exist two affine connections ∇ and ∇* on the Riemannian manifold (M, g = ⟨·,·⟩) such that, ∀X, Y, Z ∈ T(M):

    Z⟨X, Y⟩ = ⟨∇_Z X, Y⟩ + ⟨X, ∇*_Z Y⟩.

The connections ∇ and ∇* are called dual connections, and the Riemannian manifold is denoted (M, g, ∇, ∇*). The conservation of the scalar product ⟨X, Y⟩ along a curve γ has an equivalent form obtained by translating the first vector X according to the parallel translation Π_γ (with respect to ∇) and the second vector Y according to the parallel translation Π*_γ (with respect to ∇*):

    ⟨Π_γ(X), Π*_γ(Y)⟩ = ⟨X, Y⟩.

A manifold (M, g, ∇, ∇*) is dually flat if and only if the connection coefficients of ∇ vanish in some coordinate system (which also implies that ∇* is flat).
III. STATISTICAL GEOMETRIC LEARNING

A. Mass and Geometry

Statistical learning consists in constructing a learning rule τ that maps the training measured data D to a probability distribution² q = τ(D) ∈ Q ⊂ P = {p | ∫p = 1} (the predictive distribution). We discuss the consequences of the restriction to a subset Q in Section III.C. Therefore, our target space is a space of distributions, and it is fundamental to provide this space with (at least in this work) two attributes: a mass (a scalar field) and a geometry. The mass is defined by an a priori distribution Π(p) on the space P before collecting the data D, and is modified according to the Bayesian rule after observing the data to give the a posteriori distribution (Figure 12):

    P(p₀|D) ∝ P(D|p₀) Π(p₀),   for all p₀ ∈ P,    (5)
where P(D|p₀) is the likelihood of the probability p₀ generating the data D (the distribution plays the role of the parameter in classic ML methods). In the sequel, z is the couple (x, y) first mentioned in the Introduction. In the case of i.i.d. samples D = {z_i}, i = 1, …, N, the likelihood of the probability p₀ is simply P(D|p₀) = ∏_{i=1}^N p₀(z_i). For the parametric case Q = {p_θ, θ ∈ R^n}, where θ is a coordinate system and n is the dimension of the manifold, one simply replaces p₀ in Eq. (5) by θ to recover the classic Bayesian parametric formulation. We assume that the data D are generated by an unknown distribution p*. As the number N of data samples becomes large, the a posteriori distribution P(p₀|D) concentrates around the true distribution p* (consistency), under some weak regularity conditions.³

The geometry can be defined by the δ-divergence D_δ:

    D_δ(p, q) = ∫p/(1−δ) + ∫q/δ − ∫ p^δ q^{1−δ} / (δ(1−δ)),   δ ≠ 0, 1,    (6)
    D_1(p, q) = D_0(q, p) = ∫q − ∫p + ∫ p log(p/q),

where the integration is defined with respect to a dominant measure. We notice that this definition is parameterization free; therefore, in the case of a parametric restricted manifold Q, this measure is invariant under reparameterization. It is shown in Amari (1985) that, on the parametric manifold Q, the δ-divergence induces a dualistic structure (g, ∇^δ, ∇^{1−δ}), where g is the Fisher metric

² In the literature, the considered subset Q is parametric. This restriction to a parametric manifold is important for computational reasons, which is why Q is also called the computational model. However, for the derivation of the main results in this work, there is no need to restrict Q to be parametric.
³ Section VI.A gives an example illustrating the violation of these conditions and shows how the construction of the prior and the use of the Bayesian approach eliminate the singularity problem and ensure the consistency of the MAP solution.
FIGURE 12. The a posteriori mass is proportional to the product of the a priori mass and the likelihood function. As the number of samples N grows, the a posteriori distribution P(p|D) (the dark ball) is increasingly concentrated around the true distribution p*.
(defining the scalar product in the tangent spaces), ∇^δ the δ-connection with Christoffel symbols Γ^δ_{ij,k}, and ∇* = ∇^{1−δ} its dual connection:

    g_ij = ⟨∂_i, ∂_j⟩ = E_θ[ ∂_i l(θ) ∂_j l(θ) ],    (7)
    Γ^δ_{ij,k} = E_θ[ ( ∂_i ∂_j l(θ) + δ ∂_i l(θ) ∂_j l(θ) ) ∂_k l(θ) ],

where l(θ) denotes the log-likelihood log p(z|θ).
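The Fisher-metric expression of Eq. (7) can be evaluated in closed form for simple families. As a small numerical sketch (our own illustrative example), for the Bernoulli family p(z|θ) = θ^z (1−θ)^{1−z} the expectation E_θ[(∂_θ l)²] reproduces the well-known g(θ) = 1/(θ(1−θ)):

```python
import numpy as np

# Fisher metric g = E_theta[(d l/d theta)^2] for the Bernoulli family
# (our illustrative example; the closed form is g = 1/(theta(1-theta))).

theta = 0.3

def score(z):                    # d/dtheta log p(z|theta)
    return z / theta - (1 - z) / (1 - theta)

# expectation over z in {0, 1} with weights p(1) = theta, p(0) = 1 - theta
g = theta * score(1) ** 2 + (1 - theta) * score(0) ** 2
g_closed = 1.0 / (theta * (1 - theta))

print(g, g_closed)               # the two values agree
```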
The parametric manifold Q is δ-flat if and only if there exists a parameterization [θ_i] such that the Christoffel symbols vanish: Γ^δ_{ij,k}(θ) = 0. The coordinates [θ_i] are then called the affine coordinates. If the connection coefficients are also null for a different coordinate system [θ′_i], then the two coordinate systems [θ_i] and [θ′_i] are related by an affine transformation; that is, there exist an (n × n) matrix A and a vector b such that θ′ = Aθ + b. All the above definitions can be extended to nonparametric families by replacing the partial derivatives with Fréchet derivatives. Embedding the model Q in the whole space of finite measures P̃ (Zhu and Rohwer, 1995a, 1995b), not only in the space of probability distributions P, many results can be proven easily, for the main reason that P̃ is δ-flat and δ-convex for all δ in [0, 1], whereas P is δ-flat only for δ ∈ {0, 1} and δ-convex only for δ = 1. For notational convenience, we use the δ-coordinates l^δ of a point p ∈ P̃, defined as

    l^δ(p) = p^δ / δ.    (8)

A curve linking two points a and b is a function γ : [0, 1] → P̃ such that γ(0) = a and γ(1) = b. A curve is a δ-geodesic in the δ-geometry if it is a straight line in the δ-coordinates:

    l^δ(t) = (1 − t) l^δ(a) + t l^δ(b).

B. Bayesian Learning

The loss of a decision rule τ under a fixed δ-geometry can be measured by the δ-divergence D_δ(p, τ(z)) between the true probability p and the decision τ(z). This divergence is averaged first with respect to all possible measured data z and then with respect to the unknown true probability p, which gives the generalization error E_δ(τ):
    E_δ(τ) = ∫_p P(p) ∫_z P(z|p) D_δ(p, τ(z)).

Therefore, the optimal rule τ_δ is the minimizer of the generalization error:

    τ_δ = arg min_τ E_δ(τ).

The coherence of Bayesian learning, shown in Zhu and Rohwer (1995a, 1995b), means that the optimal estimator τ_δ can be computed pointwise as a function of z; we do not need a general expression of the optimal estimator τ_δ:

    p̂(z) = τ_δ(z) = arg min_q ∫_p P(p|z) D_δ(p, q).    (9)
p
By variational calculation, the solution of Eq. (9) is straightforward and gives: pˆ δ = pδ P (p|z), which is exactly the a posteriori mean of the δ coordinates. The above result can be considered as the extension of the classic parametric Bayesian inference to the more abstract set of probability distributions. For example, consider the estimation of a parameter η from its a posteriori distribution ˆ 2. p(η|z). The δ divergence is to be compared to the quadratic cost η − η The minimization of the expected * cost leads to the a posteriori expectation (EAP) solution: ηˆ = Ep [η|z] = ηp(η|z) dη. From a physical point of view, the above solution is exactly the gravity center of the set P˜ within a mass P (p|z), the a posteriori distribution of p and with the δ-geometry induced by the δ-divergence Dδ . Here, we have the analogy with the static mechanics and the importance of the geometry
BAYESIAN INFORMATION GEOMETRY
179
defined on the space of distributions. The whole space of finite measures P˜ is δ-convex and thus, independently on the a posteriori distribution P (p|z) the solution pˆ belongs to P˜ ∀δ ∈ [0, 1]. C. Restricted Model In practical situations, we restrict the space of decisions to a subset Q ∈ P˜ . Q is in general a parametric manifold that we suppose to be a differentiable manifold. Thus Q is parameterized with a coordinate system [θi ]ni=1 , where n is the dimension of the manifold. Q is also called the computational model because the main reason of the restriction is to design and manipulate the points p with their coordinates that belong to an open subset of Rn . However, the computational model Q is not disconnected from nonparametric manipulations and we will show that both a priori and final decisions can be located outside the model Q. Let us now compare nonparametric learning with parametric learning when we are constrained to a parametric model Q: 1. Nonparametric Modeling The optimal estimate is the minimizer of the generalization error where the true unknown point p is allowed to belong to the whole space P˜ and the minimizer q is constrained to Q (the integration is computed over the whole set P˜ , but the minimization is performed on the subset Q): P (p|z)Dδ (p, q). (10) q(z) ˆ = τδ (z) = arg min q∈Q
p∈P˜
Thus the solution qˆ is the δ-projection of the barycentre pˆ of (P˜ , P (p|z), Dδ ) onto the model Q (Figure 13). A point b is the δ projection of a point a onto the manifold Q if b minimizes the δ divergence Dδ (a, q), ∀q ∈ Q. The projection b can also be characterized by the property that the geodesic line linking a and b is orthogonal to all curves in Q passing through the point b. For details, refer to Zhu and Rohwer (1995a) where the authors define the point pˆ as the ideal δ-estimate and the point qˆ as the δ estimate within the model Q. 2. Parametric Modeling The optimal estimate is the minimizer of the same cost function as in the nonparametric case but the true unknown point p is also constrained to be in Q:
180
SNOUSSI
F IGURE 13. The δ estimate qˆ is the δ projection of the nonparametric solution pˆ onto the computational model Q.
F IGURE 14.
Projection of the barycentre solution onto the parametric model.
q(z) ˆ = τδ (z) = arg min q∈Q
= arg min q∈Q
P (p|z)Dδ (p, q)
p∈Q
P (θ|z)Dδ (pθ , q) dθ.
(11)
θ
The solution is the δ-projection of the barycentre pˆ of (Q, P (θ|z), Dδ ) onto the model Q (Figure 14). The interpretation of the parametric modeling as a nonparametric one and the effect of such restriction can be done in two ways: 1. The cost function to be minimized in Eq. (11) is the same as the cost function in Eq. (10) when p is allowed to belong to the whole set P˜ and the a posteriori P (p|z) is zero outside the model Q. This is the case when the prior P (p) has Q as its support. However, this interpretation implies that
BAYESIAN INFORMATION GEOMETRY
181
the best solution p, ˆ which is the barycentre of Q, can be located outside the model Q and thus has a priori a zero probability. 2. The second interpretation is to say that the cost function to be minimized in Eq. (11) is the same as the cost function in Eq. (10) when the a posteriori P (θ |z) is the projected mass of the a posteriori P (p|z) onto the model Q. This interpretation is more consistent than the first one. In fact, it is more robust with respect to the model deviation. For instance, assume that the data are generated according to a true distribution p∗ outside the manifold Q. As the sample size N gets larger, the a posteriori distribution is increasingly concentrated around the point p∗ . The classic a posteriori measure of the manifold Q will converge to 0. Consequently, the inference on the manifold Q has no meaning. However, when considering the projected a posteriori distribution, the measure on the manifold Q will concentrate around the δ projection of the true distribution p∗ . Therefore, the parametric modeling is equivalent to the nonparametric modeling in the restricted case. We note here the role of the geometry defined on the space P and the relative geometric shape of the manifold. For instance, the ignorance is directly related to the geometry of the model Q. The projected a posteriori or a priori can be computed by: ⊥ f (q) ∝ f (p), p∈Sq
where f(p) designates the a priori or the a posteriori distribution and Sq = {p ∈ P̃ | p⊥ = q} is the set of points p whose δ-projection is the point q in Q (Figure 15). The manipulation of these concepts in the general case is very abstract. However, Section IV presents the explicit computations in the case of a restricted autoparallel parametric submanifold Q1 ⊂ Q of δ-flat families.
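The optimal decision of Eq. (11) can be sketched numerically for discrete distributions. The following is a minimal brute-force illustration, not the author's algorithm: the model Q, the grid, and the posterior weights are all hypothetical, and the geodesic δ-projection is replaced by a grid search.

```python
import numpy as np

def d_delta(p, q, delta):
    """delta-divergence between finite positive measures p, q (vectors),
    in the measure form used by the text; delta = 0 and delta = 1 give the
    two KL-type limits."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    if delta == 0.0:                  # D_0(p, q) = KL(q || p) for probabilities
        return float(np.sum(p - q + q * np.log(q / p)))
    if delta == 1.0:                  # D_1(p, q) = KL(p || q)
        return float(np.sum(q - p + p * np.log(p / q)))
    return float(np.sum(p / (1 - delta) + q / delta
                        - p ** delta * q ** (1 - delta) / (delta * (1 - delta))))

def optimal_decision(weights, points, model, delta):
    """Minimiser over the model Q of the posterior-averaged cost of Eq. (11),
    i.e. the delta-projection of the barycentre onto Q (brute force)."""
    return min(model, key=lambda q: sum(w * d_delta(p, q, delta)
                                        for w, p in zip(weights, points)))

# Hypothetical model Q: Bernoulli-type distributions on {0, 1}.
model = [np.array([1 - t, t]) for t in np.linspace(0.05, 0.95, 19)]
posterior_points = [np.array([0.7, 0.3]), np.array([0.5, 0.5])]
q_star = optimal_decision([0.5, 0.5], posterior_points, model, delta=1.0)
# For delta = 1 the cost is the posterior-averaged KL divergence, whose
# minimiser is the mixture barycentre (0.6, 0.4), itself a grid point here.
```

For δ = 1 the minimizer is the mixture average of the posterior points, which illustrates the barycentre-then-project reading of Eq. (11).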
IV. PRIOR SELECTION

In this section, we report the main results of Snoussi (2005), where the problem of prior selection is addressed in a Bayesian decision framework. By prior selection, we mean how to construct a prior P(p) respecting the following rule: exploit the prior knowledge without adding irrelevant information. This represents a trade-off between some desirable behavior and the uniformity (ignorance) of the prior. Note that the prior selection must be performed before collecting the data z; otherwise the coherence of the Bayesian rule breaks down.
FIGURE 15. Projection of the a priori/a posteriori distribution on the manifold Q leads to an equivalence between the parametric and nonparametric modeling.
In a decision framework, the desirable behavior can be stated as follows. Before collecting the training data, provide a reference distribution p0 as a decision. The reference distribution can be provided by an expert or by our previous experience. We now have the inverse problem of statistical learning: before, the a posteriori distribution (mass) was fixed and we had to find the optimal decision (barycentre); now, the optimal decision p0 (barycentre) is fixed and we must find the optimal repartition Π(p) according to the uniformity constraint. In order to have the usual notions of integration and derivation, we assume that our objective is to find the prior on the parametric model Q = {qθ | θ ∈ Θ ⊂ Rn}.

A. Family of (δ, α)-Priors

The cost function can be constructed as a weighted sum of the generalization error of the reference prior and the divergence of the prior from the Jeffreys prior [the square root of the determinant of the Fisher information (Box and Tiao, 1972)] representing the uniformity. In fact, the Fisher matrix is a bilinear form that is a natural metric of the statistical manifold, and it is shown that the square root of its determinant represents an equal prior for all the distributions of the model (Balasubramanian, 1996). It is worth noting that we are considering two different spaces: the space P̃ of finite measures and the space G = {Π, ∫Π = 1} of prior distributions on the finite measures. Since
we have two distinct spaces, we can choose a different geometry on each. In the sequel, we consider the δ-geometry on the space P̃ and the α-geometry on the space of priors. For the same reason as for the distributions pθ, we embed the space G of priors Π in the corresponding space of finite measure priors G̃ = {Π, Π > 0}. We have the following family of cost functions parameterized⁴ by the couple (δ, α):

  J_{δ,α}(Π) = γe ∫ Π(θ) Dδ(pθ, p0) dθ + γu Dα(Π, √g),   (12)

where Dδ and Dα are the δ-divergence and the α-divergence defined on the spaces P̃ and G̃, respectively, according to Eq. (6). γe is the confidence degree in the reference distribution p0 (reflecting some a priori knowledge) and γu the uniformity degree (constraint of ignorance). Considered independently, these two coefficients are not significant; however, their ratio is relevant in the following. The cost [Eq. (12)] can be rewritten as:

  J_{δ,α}(Π) = γe Eδ(τ0) + γu Dα(Π, √g),  ∂τ0/∂z = 0,
where Eδ (τ0 ) is the generalization error of a fixed learning rule τ0 . This learning rule is fixed because we have not collected any data:
  Eδ(τ0) = ∫_θ Π(θ) ∫_z p(z|θ) Dδ(pθ, τ0(z)) dz dθ = ∫_θ Π(θ) Dδ(pθ, p0) dθ.
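On a parameter grid, the two terms of the cost J can be approximated by quadrature. The sketch below takes the special case δ = α = 1 on a hypothetical Bernoulli model, so that both divergences are of KL type; the grid and all weights are illustrative.

```python
import numpy as np

theta = np.linspace(0.01, 0.99, 99)            # hypothetical Bernoulli model grid
dtheta = theta[1] - theta[0]
sqrt_g = 1.0 / np.sqrt(theta * (1 - theta))    # Jeffreys factor sqrt(g(theta))

def kl(p, q):
    """D_1 between two Bernoulli laws with success probabilities p and q."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def cost(prior, p0, gamma_e, gamma_u):
    """J_{1,1}(Pi) of Eq. (12): generalization error of the reference p0 plus
    the KL divergence (measure form) between the prior and the unnormalised
    Jeffreys measure sqrt(g)."""
    gen_err = np.sum(prior * kl(theta, p0)) * dtheta
    unif = np.sum(sqrt_g - prior + prior * np.log(prior / sqrt_g)) * dtheta
    return gamma_e * gen_err + gamma_u * unif
```

With Π = √g the uniformity term vanishes, so the cost reduces to the pure generalization error γe·Eδ(τ0).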
The cost function represents a balance between a fixed predictive density p0 (the prior knowledge of the user) and the uniformity constraint reflecting our prior ignorance. Its minimization is the inverse problem of the Bayesian statistical learning introduced in the previous section, as the predictive density is fixed and the cost function is minimized with respect to the prior density.

Theorem 1. The following (δ, α) measure:

  Π_{δ,α}(θ) ∝ √g(θ) / [1 + (1−α)(γe/γu) Dδ(pθ, p0)]^{1/(1−α)},  α ≠ 1,
  Π_δ(θ) ∝ exp(−(γe/γu) Dδ(pθ, p0)) √g(θ),  α = 1,   (13)

minimizes the cost function J_{δ,α}(Π) over the space G̃.

⁴ The cost function is also parameterized by the weights γe and γu.
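A grid evaluation of Eq. (13), shown below for a hypothetical Bernoulli model with δ = 1, makes the limiting behavior tangible: as γe/γu → 0 the prior collapses to the Jeffreys prior, and for large γe/γu it concentrates around p0. The grid is illustrative.

```python
import numpy as np

theta = np.linspace(0.01, 0.99, 99)            # hypothetical Bernoulli model grid
sqrt_g = 1.0 / np.sqrt(theta * (1 - theta))    # Jeffreys factor sqrt(g(theta))

def kl(p, q):
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def prior_13(p0, ratio, alpha):
    """Normalised grid version of the (delta, alpha)-prior of Eq. (13) with
    delta = 1 (so D_delta is the KL divergence); ratio = gamma_e / gamma_u."""
    d = kl(theta, p0)
    if alpha == 1.0:
        w = np.exp(-ratio * d)
    else:
        w = (1.0 + (1.0 - alpha) * ratio * d) ** (-1.0 / (1.0 - alpha))
    pi = sqrt_g * w
    return pi / pi.sum()
```

Setting ratio = 0 returns exactly the normalized Jeffreys prior, while a large ratio places the mode of the prior at (or very near) p0.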
See Appendix A for the proof of Theorem 1. The minimization of the function J_{δ,α}(Π) relies on variational calculus. In the sequel, we call this measure the (δ, α)-Prior. For notational convenience, we refer to the particular case of the (δ, 1)-Prior as the δ-Prior.⁵

Remark 1. The obtained (δ, α)-Prior family contains many particular known cases corresponding to particular values of the couple (δ, α) and the ratio γe/γu. For instance:
• If (δ, α) = (1, 1), the cost function (12) is the Kullback–Leibler divergence between the joint distributions of data and parameters. The (1, 1)-Prior is then the entropic prior considered in Rodríguez (1991).
• If (δ, α) = (0, 1), we obtain the conjugate prior for exponential families (see examples in Section VI).
• In the particular case where Q is a δ-Euclidean family (δ-flat + self-dual⁶), we obtain the t-distribution for α ≠ 1 and the Gaussian distribution for α = 1.
• If the ratio γe/γu goes to 0, we obtain the Jeffreys prior √g(θ).
• If the ratio γe/γu goes to ∞, we obtain the Dirac concentrated on p0.

Remark 2. We note that the (δ, α)-Prior (13) can be extended to a coordinate-free space Q. If we consider the prior on the elements p of the nonparametric space Q, then we have the following expression:

  Π_{δ,α}(p) ∝ √g(p) / [1 + (1−α)(γe/γu) Dδ(p, p0)]^{1/(1−α)},  α ≠ 1,  p ∈ Q,
  Π_δ(p) ∝ exp(−(γe/γu) Dδ(p, p0)) √g(p),  α = 1,  p ∈ Q,   (14)

where g(p) is a measure of statistical curvature of the space Q.

B. Choice of Reference Distribution

The model restriction to the parametric manifold Q is made essentially for computational reasons. However, the reference distribution is a prior decision and does not depend on any postprocessing after collecting the data. Therefore, the reference distribution p0 can be located in the whole space of probability measures.
We can also have either a discrete set of N reference distributions (p0^i)_{i=1..N} weighted by (γe^i)_{i=1..N}, or a continuous set of reference distributions

⁵ In the original contribution (Snoussi and Mohammad-Djafari, 2002), the author proposed the particular case of α = 1 and considered the family of the (δ, 1)-Priors. Not restricting to α = 1 was suggested by Carlos Rodriguez in a private communication.
⁶ A differentiable manifold is self-dual if the dual connections are equal: ∇* = ∇^{1−δ} = ∇.
(a region or the whole set of probability distributions) with a probability measure Pr(p0) corresponding, in the continuous case, to the weights (γe^i)_{i=1..N} of the discrete case. We assume in both cases (discrete and continuous) that the weights sum to 1: Σ_i γe^i = ∫ Pr(p0) = 1. We show in the following that the (δ, α)-Prior has the same expression form as Eq. (13), but with additional terms measuring:
• the relative accuracy of the reference distributions (i.e., the mean distance from the reference distributions to the manifold Q);
• the dispersion of the reference distributions.
In the following, we give exact definitions of these two notions (accuracy and dispersion) before introducing the expression of the (δ, α)-Prior.

Definition 4. (β-Barycentre) The distribution pG is the β-barycentre of the discrete set {(p1, γe^1), . . . , (pN, γe^N)} if its β-coordinate l^β (8) is:

  l^β(pG) = Σ_{i=1}^N γe^i l^β(pi).
Definition 5. (β-Barycentre) The distribution pG is the β-barycentre of the continuous set (P̃, Pr) if its β-coordinate l^β (8) is:

  l^β(pG) = ∫ l^β(p0) Pr(p0).
We introduce the notions of accuracy and dispersion of a set of reference distributions (either discrete or continuous).

Definition 6. (β-Accuracy) The β-accuracy of a set of reference distributions (P̃, Pr) [resp. {(pi, γe^i)}] relative to a manifold Q is the inverse of the β-divergence between the β-barycentre of (P̃, Pr) [resp. {(pi, γe^i)}] and its β-projection on the manifold Q (Figure 16):

  Aβ = 1 / Dβ(pG, pG⊥).   (15)

Definition 7. (β-Dispersion) The β-dispersion of a set of reference distributions (P̃, Pr) [resp. {(pi, γe^i)}] is the average of the β-divergence to the β-barycentre (Figure 16):

  Vβ = ∫ Dβ(p0, pG) Pr(p0)  [resp. Σ_i γe^i Dβ(pi, pG)].   (16)
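For β = 1 the β-coordinates of a discrete distribution are the probabilities themselves, so Definitions 4, 6 and 7 can be checked with a few lines of code. The model grid and reference set below are hypothetical, and the projection is a brute-force search rather than a geodesic construction.

```python
import numpy as np

def kl(p, q):
    """1-divergence KL(p || q) between probability vectors."""
    return float(np.sum(p * np.log(p / q)))

def barycentre_1(weights, points):
    """1-barycentre (Definition 4): average of the 1-coordinates,
    i.e. the mixture of the reference distributions."""
    return sum(w * p for w, p in zip(weights, points))

def accuracy_dispersion(weights, points, model):
    """1-accuracy (Eq. (15)) and 1-dispersion (Eq. (16)) of a discrete
    reference set relative to a model grid."""
    pg = barycentre_1(weights, points)
    pg_proj = min(model, key=lambda q: kl(pg, q))      # brute-force projection
    acc = 1.0 / max(kl(pg, pg_proj), 1e-300)           # guard exact projections
    disp = sum(w * kl(p, pg) for w, p in zip(weights, points))
    return acc, disp

model = [np.array([1 - t, t]) for t in np.linspace(0.05, 0.95, 19)]
refs = [np.array([0.7, 0.3]), np.array([0.6, 0.4])]
acc, disp = accuracy_dispersion([0.5, 0.5], refs, model)
# The barycentre (0.65, 0.35) lies on the model grid, so the accuracy is
# numerically infinite, while the dispersion measures the spread of the set.
```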
FIGURE 16. The continuous set of reference distributions is represented by the filled ball. The point pG is the β-barycentre and pG⊥ its β-projection on the manifold Q. The β-accuracy is the inverse of Dβ(pG, pG⊥). The β-dispersion is the mean (according to the distribution Pr) of the divergence to pG.
Theorem 2. In the general case where we are given a set of reference distributions (not necessarily included in the manifold Q) with the corresponding probability measure (Pr in the continuous case and {γe^i} in the discrete case), and if Q is δ-convex,⁷ the (δ, α)-Prior has the following expression:

  Π_{δ,α}(θ) ∝ √g(θ) / [1 + ((1−α)(γe/γu)) / (1 + (1−α)(γe/γu)(1/A_{1−δ} + V_{1−δ})) · Dδ(pθ, pG⊥)]^{1/(1−α)},  α ≠ 1,
  Π_δ(θ) ∝ exp(−(γe/γu)(1/A_{1−δ} + V_{1−δ})) exp(−(γe/γu) Dδ(pθ, pG⊥)) √g(θ),  α = 1,   (17)

where pG is the (1−δ)-barycentre of the reference distributions, pG⊥ its (1−δ)-projection on Q, and A_{1−δ} and V_{1−δ} are the accuracy and the dispersion of the reference distribution set.
See Appendix B for the proof of Theorem 2. First, we notice that the expression of the (δ, α)-Prior has a form similar to the original expression [Eq. (13)] (where the reference distribution belongs to the manifold Q, p0 = pθ0). Second, we notice the additional term (1−α)(γe/γu)(1/A_{1−δ} + V_{1−δ}) in the denominator of the coefficient weighting the divergence Dδ(pθ, pG⊥). The presence of this term is intuitive. In fact, it reduces the confidence coefficient (1−α)(γe/γu) in the reference distribution p0, in particular when the reference distribution is located outside the manifold Q

⁷ A manifold is β-convex if all the β-geodesics are contained in Q.
(A_{1−δ} < ∞) or when there is an uncertainty about p0 (V_{1−δ} > 0). In other words, when the confidence coefficient γe/γu is very high (→ ∞), the resulting weighting coefficient converges to 1/(1/A_{1−δ} + V_{1−δ}). Therefore, the (δ, α)-Prior does not converge to a Dirac at pG⊥ and implicitly takes into account the accuracy and the dispersion of the reference set (Figure 17). The confidence term is bounded as follows:

  1 ≤ (1−α)(γe/γu) / [1 + (1−α)(γe/γu)(1/A_{1−δ} + V_{1−δ})] ≤ 1/(1/A_{1−δ} + V_{1−δ}).
Example 1. In the particular case of only one reference distribution p0 located outside the manifold Q [Figure 17(a)], the barycentre pG is p0 itself. The accuracy is the inverse of the (1−δ)-divergence of p0 to Q, 1/D_{1−δ}(p0, p0⊥), and the dispersion is null. The above results show that, regardless of the choice of the reference distributions, the resulting prior has the same form with a certain (nonarbitrary) reference prior belonging to the model Q. The existence of many reference distributions (or even of a continuous set) indicates implicitly the existence of hyperparameters, and the resulting solution shows that these hyperparameters are integrated out and at the same time optimized, if the a priori average (the barycentre) is considered as an optimization operation.
V. δ-FLAT FAMILIES

This section studies the particular case of δ-flat families. Q is a δ-flat manifold if and only if there exists a coordinate system [θi] in which the connection coefficients Γδ(θ) are null. We call [θi] an affine coordinate system. It is known that δ-flatness is equivalent to (1−δ)-flatness. Therefore, there exist dual affine coordinates [ηi] such that Γ_{1−δ}(η) = 0. One of the many properties of δ-flat families is that the δ-divergence Dδ can be expressed, in a simple way, as a function of the coordinates θ and η, so that any decision can be computed while manipulating the real coordinates. It is shown in Amari (1985) that the dual affine coordinates [θi] and [ηi] are related by Legendre transformations and the canonical divergence is:

  Dδ(p, q) = ψ(p) + φ(q) − θ^i(p) ηi(q),

where ψ and φ are the dual potentials such that:

  ∂ηj/∂θi = gij,  ∂θi/∂ηj = (gij)^{−1},  ∂iψ = ηi,  ∂iφ = θi.
FIGURE 17. (a) The equivalent of the reference distribution p0 located outside Q is its (1−δ)-projection. (b) The equivalent reference distribution is the (1−δ)-projection of the (1−δ)-barycentre of the N reference distributions. (c) The equivalent reference distribution of a continuum reference region is the (1−δ)-projection of the (1−δ)-barycentre.
For example, the exponential families are 0-flat with the canonical parameters as 0-affine coordinates, the mixture families are 1-flat with the mixture coefficients as 1-affine coordinates, and P̃ = {p, ∫p < ∞} is δ-flat for all δ ∈ [0, 1].

A. δ-Optimal Estimates in δ-Flat Families

As indicated in Section II, the δ-optimal estimate is the δ-projection of the barycentre of P(θ|z), that is, the minimizer of the functional ∫_θ P(θ|z) Dδ(pθ, q) dθ. In general, the divergence as a function of the parameters [θi] does not have a simple expression. However, with δ-flat manifolds, we obtain an explicit solution. Noting that:

  ∂i Dδ(pθ, q) = Dδ(pθ, (∂i)q) = θi(q) − θi(p),

the solution is:

  q̂ = q(θ̂),  θ̂ = ∫ θ P(θ|z) dθ = E_{θ|z}[θ].
This means that the δ-optimal estimate is the EAP (expected a posteriori) of the δ-affine coordinates. Since the only degree of freedom of affine coordinates is an affine transformation, this estimate is invariant under affine reparameterization. This invariance property is expected, since a parameter-free geometric construction of the estimates is used. In addition, noting that:

  ∂i D_{1−δ}(p, q) = D_{1−δ}(p, (∂i)q) = ηi(q) − ηi(p),

the EAP of the (1−δ)-affine coordinates is the (1−δ)-optimal estimate. We can obtain this result directly by simply replacing δ by (1−δ), since a δ-flat manifold is also (1−δ)-flat. In general, the δ-estimate is different from the (1−δ)-estimate. They are equal in the case of a Euclidean manifold (∇ = ∇*).

B. Prior Selection with δ-Flat Families

The (δ, α)-Prior Π_{δ,α} has the following general expression:

  Π_{δ,α}(θ) ∝ √g(θ) / [1 + λ Dδ(pθ, p0)]^{1/(1−α)},  α ≠ 1,
  Π_δ(θ) ∝ exp(−(γe/γu) Dδ(pθ, p0)) √g(θ),  α = 1,   (18)

where λ is a fixed coefficient depending on the confidence ratio γe/γu, the accuracy, and the dispersion (see Section IV.B). p0 ∈ Q is the equivalent
reference distribution in the manifold Q. When Q is δ-flat with affine coordinates [θi] and dual affine coordinates [ηi], the expression of the prior becomes:

  Π_{δ,α}(θ) ∝ √g(θ) / [1 + λ(ψ(θ) − θ^i ηi^0)]^{1/(1−α)},  α ≠ 1,
  Π_δ(θ) ∝ exp(−(γe/γu)(ψ(θ) − θ^i ηi^0)) √g(θ),  α = 1,   (19)

where [θi^0] and [ηi^0] are the affine coordinates of p0. Therefore, we have an explicit analytic expression of the prior.

Example 2. In the Euclidean case, that is, when the connection ∇ is equal to its dual connection ∇*, which is equivalent to the equality of the affine coordinates [θi] = [ηi]: (i) for α ≠ 1, the (δ, α)-Prior is a t-distribution with (1+α)/(1−α) degrees of freedom, mean θ^0, and precision λ; (ii) for α = 1, the δ-Prior is Gaussian with mean θ^0 and precision 2γe/γu:

  Π_{δ,α}(θ) ∝ √g(θ) / [1 + λ‖θ − θ^0‖²]^{1/(1−α)},  α ≠ 1,
  Π_δ(θ) ∝ exp(−(γe/γu)‖θ − θ^0‖²) √g(θ),  α = 1.

C. Projection of Priors

We detail here the notion of prior projection. Our objective is to determine a prior (or, in general, a probability mass) on a subspace Qa, taking into account the prior of the embedding space Q. The essence of the projected-mass notion is to define a prior on a restricted set by suitably projecting the prior of the embedding space; then, when working in the restricted space, we do not lose the information about the initial space. This notion is completely different from the common practice of defining the prior on Qa by simply restricting the prior on Q (Figure 18). This idea is very ambitious compared to our limited understanding of the geometry of the space at hand. For this reason, we illustrate the computation in the particular case of ∇*-autoparallel submanifolds Qa ⊂ Q; the general case requires a more abstract mathematical investigation of how to perform the projection. Qa is (1−δ)-autoparallel in Q if and only if, at every point p ∈ Qa, the covariant derivative ∇*_{∂a}∂b remains in the tangent space Tp of the submanifold Qa at the point p. A simple characterization in flat manifolds is that the (1−δ)-affine coordinates [ui] of Qa form an affine subspace of the coordinates [ηi]. We can show that, by a suitable affine reparameterization of Q, the
FIGURE 18. (a) The orthogonal manifold to Qa is the manifold Qa^c obtained by fixing the complementary part of the dual coordinates. The projected mass is then the integral along Qa^c. (b) In the Euclidean case, the dual coordinates are equal. The projected mass is then obtained by marginalizing in the same coordinate system.
submanifold Qa is defined as:

  Qa = {pη ∈ Q | ηI = η0I fixed},  I ⊂ {1, . . . , n},

where n − |I| is the dimension of Qa. If we consider the space Qa^c obtained by fixing the complementary dual affine coordinates θ^{II} = θ^{0,II} (II = {1, . . . , n} − I), then the tangent spaces Tp and Tp^c are orthogonal at the point p(η0I, θ^{0,II}). Consequently, the projected prior from Q onto Qa is simply:

  Π⊥(p) = ∫_{q∈Qa^c} Π(q) = ∫ Π(θ^I, θ^{0,II}) dθ^I.
Hence, the projected prior onto a ∇*-autoparallel manifold is the marginalization in the δ-affine coordinates, and not with respect to the ηI coordinates, as may seem intuitive at first. This is essentially due to the dual-affine structure of the space P̃. In fact, the incorrect intuition comes from our experience with Euclidean spaces: in a Euclidean space, the θ-coordinates are equal to the η-coordinates, so the projection is obtained by simply marginalizing the coordinates (Figure 18).
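In the Euclidean special case the prescription above reduces to ordinary marginalization, which can be sketched on a grid; the two-dimensional prior below is hypothetical.

```python
import numpy as np

# Hypothetical prior on an affine coordinate grid (theta_1, theta_2).
t1 = np.linspace(-3.0, 3.0, 61)
t2 = np.linspace(-3.0, 3.0, 61)
T1, T2 = np.meshgrid(t1, t2, indexing="ij")
prior = np.exp(-0.5 * (T1 ** 2 + 0.5 * (T2 - 1.0) ** 2))
prior /= prior.sum()

# Projected prior on the submanifold indexed by theta_2: integrate out the
# complementary delta-affine coordinate theta_1 (marginalisation in theta,
# as prescribed -- not in the dual eta-coordinates in general).
projected = prior.sum(axis=0)
```

The projected mass stays normalized and keeps the information carried by the full prior along the retained coordinate (here, the mode at theta_2 = 1).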
VI. MIXTURE OF δ-FLAT FAMILIES AND SINGULARITIES

The mixture of distributions has attracted much attention in that it provides a wider exploration of the space of probability distributions based on a simple parametric manifold. For instance, with a mixture of Gaussians (which belong to a 0-flat family) we can approach any probability distribution in total variation norm. In this section, we study the general case of the mixture of δ-flat families. The space can be defined as:

  Q = {pθ | pθ = Σ_{j=1}^k wj pj(·; θj)},  pj ∈ Qj,  Qj δ-flat,
where the manifolds Qj are either distinct or not. The mixture distribution can be viewed as an incomplete model, where the weighted sum is considered as a marginalization over the hidden variable z representing the label of the mixture. Thus pθ = Σ_z p(z) p(x|z, θz), and the weights p(z) are the parameters of a mixture family. We consider now the statistical learning problem within the mixture family. A mixture of δ-flat families is not, in general, δ-flat. Therefore, the δ-optimal estimates no longer have a simple expression. However, with a data augmentation procedure, we can construct iterative algorithms that compute the solution. In this section and the following one, we focus on the computation of the particular case of the δ-Prior (α = 1) of the mixture density. The δ-Prior has the following expression:

  Πδ(θ) ∝ exp(−(γe/γu) Dδ(pθ, p0)) √g(θ).   (20)
The mixture (marginalization) form of the distribution pθ leads to a complex expression of the δ-divergence and of the determinant of the Fisher information. However, the computation of these expressions in the complete data distribution space (Rodríguez, 2001) is feasible and gives explicit formulas. By complete data y, we mean the union of the observed data x and the hidden data z. Therefore, the divergence will be considered between complete data distributions:

  Dδ(p^c, p0^c) = ∫p^c/(1−δ) + ∫p0^c/δ − ∫(p^c)^δ (p0^c)^{1−δ}/(δ(1−δ)),
where pc is the complete likelihood p(x, z|θ) and θ includes the parameters of the conditionals p(x|z, θz ) and the discrete probabilities p(z).
The additivity property of the δ-divergence is not conserved unless δ equals 0 or 1 (Amari, 1985):

  Dδ(p1 p2, q1 q2) = Dδ(p1, q1) + Dδ(p2, q2) − δ(1−δ) Dδ(p1, q1) Dδ(p2, q2).

Consequently, in the special case δ ∈ {0, 1}, we have the following simple formulas:

  D0(p, p^0) = Σ_{j=1}^k wj^0 [D0(pj, pj^0) + log(wj^0/wj)],
  D1(p, p^0) = Σ_{j=1}^k wj [D1(pj, pj^0) + log(wj/wj^0)].
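The product formula can be verified numerically for discrete probability distributions, where the δ-divergence takes the simple form (1 − Σ p^δ q^{1−δ})/(δ(1−δ)); the distributions below are arbitrary examples.

```python
import numpy as np

def d_delta(p, q, delta):
    """delta-divergence between probability vectors (normalised case)."""
    if delta in (0.0, 1.0):
        a, b = (q, p) if delta == 0.0 else (p, q)   # KL-type limits
        return float(np.sum(a * np.log(a / b)))
    return float((1.0 - np.sum(p ** delta * q ** (1 - delta)))
                 / (delta * (1 - delta)))

def product(p, q):
    """Product distribution p (x) q flattened to a vector."""
    return np.outer(p, q).ravel()

p1, q1 = np.array([0.2, 0.8]), np.array([0.5, 0.5])
p2, q2 = np.array([0.6, 0.1, 0.3]), np.array([0.3, 0.3, 0.4])

delta = 0.3
d1, d2 = d_delta(p1, q1, delta), d_delta(p2, q2, delta)
lhs = d_delta(product(p1, p2), product(q1, q2), delta)
rhs = d1 + d2 - delta * (1 - delta) * d1 * d2       # the correction term
```

For δ ∈ {0, 1} the correction term vanishes and the divergence is exactly additive over product distributions.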
A. Singularities with Mixture Families

It is known that, in learning the parameters of Gaussian mixture densities (Snoussi and Mohammad-Djafari, 2001), the ML estimation fails because of the degeneracy of the likelihood function to infinity when certain variances go to zero or certain covariance matrices approach the boundary of singularity. In Snoussi and Mohammad-Djafari (2001), this situation is analyzed in the multivariate Gaussian mixture case. This section gives a general condition leading to this degeneracy problem when learning within mixtures of δ-flat families. Let Q be a δ-flat manifold, [θi] the natural affine coordinates, and [ηi] the dual affine coordinates. The two coordinate systems are related by Legendre transformation (Amari, 1985):

  ∂ηj/∂θi = gij,  ∂θi/∂ηj = (gij)^{−1},  ∂iψ = ηi,  ∂iφ = θi,

where (gij)_{i,j=1,...,n} is the Fisher matrix and ψ and φ are the dual potentials. It is clear from this variable transformation between the two affine coordinate systems that a singularity of the Fisher information matrix g (i.e., a zero determinant) leads to nondifferentiability in the transformation between θ and η. It is therefore interesting to study the behavior of the dual divergences at the boundary of singularity; we will show in an example that the dual divergences may behave differently as the distribution p approaches this boundary. To illustrate such behavior, we take the Gaussian family {N(μ, σ²) | μ ∈ R, σ ∈ R+}, which is a two-dimensional 0-flat statistical manifold. The 0-affine coordinates θ and the 1-affine coordinates η are given by the
following expressions:

  θ1 = μ/σ²,  θ2 = −1/(2σ²),
  η1 = μ,  η2 = μ² + σ².   (21)

The corresponding Fisher determinants are:

  g(θ) ∝ σ⁶,  g(η) ∝ 1/σ⁶.   (22)

The canonical divergence has the following expression:

  Dδ(p1, p2) = D_{1−δ}(p2, p1) = ψ(p1) + φ(p2) − θ^i(p1) ηi(p2),   (23)

where ψ and φ are the potentials given by:

  ψ = μ²/(2σ²) + log √(2π)σ,  φ = −1/2 − log √(2π)σ.   (24)

We see that the degeneracy occurs when the variance σ goes to zero. A detailed study of how this degeneracy occurs in the Gaussian mixture case is given in Snoussi and Mohammad-Djafari (2001) and is recalled in the example of the next section. Here we focus on the difference in behavior of the two canonical divergences D0 and D1. The expression of the δ-Prior is:

  Πδ ∝ exp(−Dδ(pθ, p0)) √g(θ).
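Equations (21)–(24) can be cross-checked numerically: with these coordinates and potentials, the canonical divergence D(p1, p2) of Eq. (23) coincides with the closed-form KL divergence KL(p2 ‖ p1) between the two Gaussians. The sketch below is only a consistency check.

```python
import math

def coords(mu, sigma):
    """Dual affine coordinates of N(mu, sigma^2), Eq. (21)."""
    theta = (mu / sigma ** 2, -1.0 / (2.0 * sigma ** 2))
    eta = (mu, mu ** 2 + sigma ** 2)
    return theta, eta

def potentials(mu, sigma):
    """Dual potentials psi and phi of Eq. (24)."""
    psi = mu ** 2 / (2.0 * sigma ** 2) + math.log(math.sqrt(2.0 * math.pi) * sigma)
    phi = -0.5 - math.log(math.sqrt(2.0 * math.pi) * sigma)
    return psi, phi

def canonical_divergence(p1, p2):
    """D(p1, p2) = psi(p1) + phi(p2) - theta(p1) . eta(p2), Eq. (23)."""
    (t, _), (_, e) = coords(*p1), coords(*p2)
    psi, _ = potentials(*p1)
    _, phi = potentials(*p2)
    return psi + phi - (t[0] * e[0] + t[1] * e[1])

def kl(p_from, p_to):
    """Closed-form KL(N(m0, s0^2) || N(m1, s1^2)), for cross-checking."""
    (m0, s0), (m1, s1) = p_from, p_to
    return math.log(s1 / s0) + (s0 ** 2 + (m0 - m1) ** 2) / (2.0 * s1 ** 2) - 0.5
```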
Following the complete data procedure:

  Π0 ∝ exp(−(γe/γu) Σ_i wi^0 {D0(pθi, p0i) + log(wi^0/wi)}) √g(θ, w),
  Π1 ∝ exp(−(γe/γu) Σ_i wi {D1(pθi, p0i) + log(wi/wi^0)}) √g(θ, w).

The resulting prior factorizes into independent priors on the components of the Gaussian mixture. Combining the expressions of Eqs. (21), (22), (23), and (24), we note the following comparison of the 0- and 1-priors through their dependence on the variance σj as p approaches the boundary of singularity ∂Q:

  δ = 0:  Π0 is O(σj^a exp(−k0/σj²))  (exponential decay),
  δ = 1:  Π1 is O(σj^{2wj γe/γu})  (polynomial decay),

where a and k0 are constants.
We note that:
• For δ = 0, the prior decreases to 0 as p approaches the boundary of singularity ∂Q with an exponential term, leading to an inverse Gamma prior for the variance.
• For δ = 1, the prior decreases to 0 as p approaches the boundary of singularity ∂Q with a polynomial term, leading to a Gamma prior for the variance; we note the presence of the parameter wi in the power term.
This behavior leads us to prefer the 0-prior, in that it is able to eliminate the degeneracy of the likelihood function.
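The two decay rates can be contrasted with a two-line numerical sketch; the constants a, k0, wj and the ratio γe/γu below are purely illustrative.

```python
import math

def pi0_shape(sigma, a=2.0, k0=1.0):
    """Boundary behaviour of the 0-prior: O(sigma^a * exp(-k0 / sigma^2))."""
    return sigma ** a * math.exp(-k0 / sigma ** 2)

def pi1_shape(sigma, w=0.5, ratio=1.0):
    """Boundary behaviour of the 1-prior: O(sigma^(2 w gamma_e/gamma_u))."""
    return sigma ** (2.0 * w * ratio)

# The exponential term drives the 0-prior to zero much faster than any
# polynomial as sigma -> 0, which is what tames the likelihood degeneracy.
```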
VII. EXAMPLES

In this section we develop the δ-Prior in two learning problems: (1) multivariate Gaussian mixture and (2) joint blind source separation and segmentation.

A. Multivariate Gaussian Mixture

The multivariate Gaussian mixture distribution of x ∈ Rn is:

  p(x_i) = Σ_{k=1}^K wk N(x_i; mk, Rk),   (25)

where wk, mk, and Rk are the weight, mean, and covariance of the cluster k. This can be interpreted as an incomplete data problem where the missing data are the labels (zi)_{i=1,...,T} of the clusters. Therefore, the mixture in Eq. (25) is considered as a marginalization over z:

  p(x_i) = Σ_{zi} p(zi) N(x_i | zi, θ),
where θ is the set of the unknown means and covariances. Our objective is the prediction of future observations given the training data x_i, i = 1, . . . , T. The whole parameter characterizing the statistical model is η = (θ, w). We now derive the δ-prior for δ ∈ {0, 1} and compare the two resulting priors. The δ-prior has the following form:

  Πδ(η) ∝ exp(−(γe/γu) Dδ(pη, p0)) √g(η).

Therefore, we need to compute the Dδ divergence and the Fisher information matrix. As noted in the previous section and following Rodríguez (2001), the computation is considered in the complete data space (X × Z)^T of observations x_i and labels zi, where T is the number of observations; in fact, we
mean the number of virtual observations, as the construction of the prior precedes the real observations. We have:

  D0(η : η0) = E_{x_{1..T}, z_{1..T} | η0} [log p(x_{1..T}, z_{1..T} | η0) / p(x_{1..T}, z_{1..T} | η)],
  D1(η : η0) = E_{x_{1..T}, z_{1..T} | η} [log p(x_{1..T}, z_{1..T} | η) / p(x_{1..T}, z_{1..T} | η0)],
  gij(η) = −E_{x_{1..T}, z_{1..T} | η} [∂²/∂i∂j log p(x_{1..T}, z_{1..T} | η)].
By classifying the labels z_{1..T} and using the sequential Bayes' rule between x_{1..T} and z_{1..T}, the δ-divergences become:

  D0(η : η0) = T Σ_{i=1}^k wi^0 [D0(Ni : Ni^0) + log(wi^0/wi)],
  D1(η : η0) = T Σ_{i=1}^k wi [D1(Ni : Ni^0) + log(wi/wi^0)],
where D0(Ni : Ni^0) = D1(Ni^0 : Ni) is the 0-divergence between two multivariate Gaussians:

  D0(Ni ‖ Ni^0) = (1/2)[log |Ri R_{i0}^{−1}| + Tr(R_{i0} Ri^{−1}) − n + (μi − μ_{i0})* Ri^{−1} (μi − μ_{i0})],
  D1(Ni ‖ Ni^0) = D0(Ni^0 ‖ Ni).

The Fisher matrix is block diagonal with K diagonal blocks corresponding to the components of the mixture:

  g = diag([g1], . . . , [gK]),  gi = diag(wi gN(mi, Ri), 1/wi),

where each block gi, of size (n + n² + 1), is itself diagonal (n is the dimension of the vector x_t) and gN is the Fisher matrix of the multivariate Gaussian:

  gN(m, R) = diag(R^{−1}, −(1/2) ∂R^{−1}/∂R),

whose determinant is:

  |gN(m, R)| = |R|^{−(n+2)}.

Thus, the determinant of the block gi is:

  |gi(wi, mi, Ri)| = (1/2)^{n²} wi^{n²+n−1} |Ri|^{−(n+2)}.   (26)
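The 0-divergence between Gaussians and the block determinant of Eq. (26) translate directly into code; this is a plain transcription for sanity checking, with arbitrary test matrices.

```python
import numpy as np

def d0_gauss(mu1, R1, mu0, R0):
    """D_0(N_i || N_i^0) as written above: (1/2)[log|R1 R0^-1| + tr(R0 R1^-1)
    - n + (mu1 - mu0)^T R1^-1 (mu1 - mu0)]."""
    n = len(mu1)
    R1inv = np.linalg.inv(R1)
    dm = mu1 - mu0
    return 0.5 * (np.log(np.linalg.det(R1 @ np.linalg.inv(R0)))
                  + np.trace(R0 @ R1inv) - n + dm @ R1inv @ dm)

def det_block(w, R):
    """Determinant of one Fisher block, Eq. (26):
    (1/2)^(n^2) * w^(n^2 + n - 1) * |R|^-(n + 2)."""
    n = R.shape[0]
    return (0.5 ** (n * n)) * w ** (n * n + n - 1) * np.linalg.det(R) ** (-(n + 2))
```

For n = 1 the square root of the block determinant reduces to w^{1/2}/σ³, which is the unidimensional volume element used again in the source separation example.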
The additive form of the {0, 1} divergences (implying the multiplicative form of their exponentials) and the multiplicative form of the determinant of the Fisher matrix (due to its block diagonal form) lead to independent priors on the components ηi = (wi, mi, Ri): Π(η) = Π_{k=1}^K Π(ηk). The two values δ ∈ {0, 1} lead to two different priors Πδ:
• δ = 0:

  Π0(ηi) ∝ exp(−(γe/γu)[wi^0 D0(Ni : Ni^0) + wi^0 log(wi^0/wi)]) √gi(ηi)
         ∝ N(mi; m0, Ri/(α wi^0)) Wn(Ri^{−1}; ν0, R0) wi^{β0−1}   (27)

with

  α = γe/γu,  ν0 = α wi^0,  β0 = α wi^0 + (n² + n − 1)/2.

Wn is the Wishart distribution of an n × n matrix:

  Wn(R; ν, Σ) ∝ |R|^{(ν−(n+1))/2} exp(−(ν/2) Tr(Σ^{−1} R)).

The 0-prior is normal inverse Wishart for the mean and covariance (mi, Ri) and Dirichlet for the weight wi, that is, the conjugate prior.
• δ = 1:

  Π1(ηi) ∝ exp(−(γe/γu)[wi D1(Ni : Ni^0) + wi log(wi/wi^0)]) √gi(ηi)
         ∝ N(mi; m0, Ri/(α wi)) Wn(Ri; α wi − 1, ((α wi − 1)/(α wi)) R0)
           × wi^{(n²+n−1)/2} (wi^0)^{α wi} wi^{−(1+n/2) α wi} Γn((α wi − 1)/2),   (28)

where Γn is the generalized Gamma function of dimension n (Box and Tiao, 1972, p. 427):

  Γn(b) = π^{n(n−1)/4} Π_{i=1}^n Γ(b + (i − n)/2),  b > (n − 1)/2.
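The generalized Gamma function is straightforward to implement and to check against its low-dimensional special cases (Γ1 = Γ, and Γ2(b) = √π Γ(b) Γ(b − 1/2)):

```python
import math

def gamma_n(b, n):
    """Generalised Gamma function of dimension n (Box and Tiao, 1972):
    Gamma_n(b) = pi^(n(n-1)/4) * prod_{i=1..n} Gamma(b + (i - n)/2),
    defined for b > (n - 1)/2."""
    if b <= (n - 1) / 2.0:
        raise ValueError("gamma_n requires b > (n - 1)/2")
    out = math.pi ** (n * (n - 1) / 4.0)
    for i in range(1, n + 1):
        out *= math.gamma(b + (i - n) / 2.0)
    return out
```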
The 1-prior Π1 [Eq. (28)] generalizes the entropic prior (Rodríguez, 2001) to the multivariate case. We see that the prior Π1 is a Wishart function of the covariance matrices Ri, whereas the prior Π0 is an inverse Wishart function of the covariances. This leads to a difference in the behavior of
these functions on the boundary of singularity (the set of singular matrices). Figure 19 illustrates the problem of degeneracy and highlights the advantage of penalizing the likelihood by a 0-Prior when learning the parameters of the Gaussian mixture. In this simulation example, we considered the ML estimation of a mixture of 10 Gaussians of bidimensional vectors (n = 2). The 10 multivariate Gaussians have the same covariance, and the means are located on a circle. The graph on the left of Figure 19 represents the original distribution, a mixture of 10 Gaussians. The graph in the middle shows the distribution estimated by ML; we note the degeneracy of the ML estimate, which diverges to very sharp Gaussians (because of the singularity of the estimated covariances). The graph on the right shows the regularization effect produced by penalizing the likelihood with a 0-Prior.

B. Source Separation

The second example deals with the source separation problem. The observations x_{1..T} are T samples of m-vectors. At each time t, the vector data x_t is supposed to be a noisy instantaneous mixture of an unobserved n-vector source s_t, with unknown mixing coefficients forming the mixing matrix A. This is simply modeled by the following equation:
t = 1, . . . , T ,
where, given the data x_{1..T}, our objective is the recovery of the original sources s_{1..T} and of the unknown matrix A. The Bayesian approach taken to solve this inverse problem (Knuth, 1999; Mohammad-Djafari, 1999; Snoussi and Mohammad-Djafari, 2002) also requires the estimation of the noise covariance matrix Rn and the learning of the statistical parameters of the original sources s_{1..T}. In the following, we suppose that the sources are statistically independent and that each source is modeled by a mixture of univariate Gaussians, so that we have to learn each set of source parameters ηj, which contains the weights, means, and variances composing the mixture j:

  ηj = (ηi^j)_{i=1,...,Kj},  ηi^j = (wi^j, mi^j, σi^j).

The index j indicates the source j and i indicates the Gaussian component i of the distribution of the source j. Therefore, we do not have a multidimensional Gaussian mixture but rather independent unidimensional Gaussian mixtures. In the following, our parameter of interest is θ = (A, Rn, η): the mixing matrix A, the noise covariance Rn, and η, which contains all the parameters of the
FIGURE 19. (a) Original distribution. (b) Estimated distribution with maximum likelihood, given 100 samples. (c) Estimated distribution with penalized maximum likelihood, given 100 samples.
sources model. Our objective is the computation of the δ-priors for δ ∈ {0, 1}. We have an incomplete data problem with two hierarchies of hidden variables, the sources $s_{1,\dots,T}$ and the labels $z_{1,\dots,T}$, so that the complete data are $(x_{1,\dots,T}, s_{1,\dots,T}, z_{1,\dots,T})$. We begin with the computation of the Fisher information matrix, which is common to both geometries.

1. Fisher Information Matrix

The Fisher matrix $F(\theta)$ is defined as:
$$F_{ij}(\theta) = -\mathop{\mathbb{E}}_{x_{1,\dots,T},\, s_{1,\dots,T},\, z_{1,\dots,T}} \left[ \frac{\partial^2}{\partial \theta_i\, \partial \theta_j} \log p(x_{1,\dots,T}, s_{1,\dots,T}, z_{1,\dots,T} \mid \theta) \right].$$
The factorization of the joint distribution as:
$$p(x_{1,\dots,T}, s_{1,\dots,T}, z_{1,\dots,T} \mid \theta) = p(x_{1,\dots,T} \mid s_{1,\dots,T}, z_{1,\dots,T}, \theta)\; p(s_{1,\dots,T} \mid z_{1,\dots,T}, \theta)\; p(z_{1,\dots,T} \mid \theta),$$
the corresponding factorization of the expectations as:
$$\mathop{\mathbb{E}}_{x,\,s,\,z}[\,\cdot\,] = \mathop{\mathbb{E}}_{z}\Big[ \mathop{\mathbb{E}}_{s \mid z}\Big[ \mathop{\mathbb{E}}_{x \mid s,\, z}[\,\cdot\,] \Big] \Big],$$
and the conditional independencies $(x_{1,\dots,T} \mid s_{1,\dots,T}, z_{1,\dots,T}) \Leftrightarrow (x_{1,\dots,T} \mid s_{1,\dots,T})$ and $(s_{1,\dots,T} \mid z_{1,\dots,T}) \Leftrightarrow \prod_j \big(s^j_{1,\dots,T} \mid z^j_{1,\dots,T}\big)$, imply that the Fisher information matrix has a block diagonal structure as follows:
$$g(\theta) = \begin{bmatrix} g(A, R_n) & & & [0] \\ & g(\eta^1) & & \\ & & \ddots & \\ [0] & & & g(\eta^n) \end{bmatrix}.$$

$(A, R_n)$-Block. The Fisher information matrix of $(A, R_n)$ is:
$$F_{ij}(A, R_n) = -\mathop{\mathbb{E}}_{s}\, \mathop{\mathbb{E}}_{x \mid s} \left[ \frac{\partial^2}{\partial \theta_i\, \partial \theta_j} \log p(x_{1,\dots,T} \mid s_{1,\dots,T}, A, R_n) \right],$$
which is very similar to the Fisher information matrix of the mean and covariance of a multivariate Gaussian distribution. The obtained expression is:
$$g(A, R_n) = \begin{bmatrix} \mathbb{E}_s[R_{ss}] \otimes R_n^{-1} & [0] \\ [0] & -\dfrac{1}{2}\, \dfrac{\partial R_n^{-1}}{\partial R_n} \end{bmatrix},$$
where $R_{ss} = \frac{1}{T} \sum_t s_t s_t^{*}$ and $\otimes$ is the Kronecker product.
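The mixing-matrix block above can be checked numerically. The sketch below (an illustration with hypothetical dimensions and covariance values, not part of the original text) compares the Hessian of the averaged Gaussian negative log-likelihood against the Kronecker-product expression. Note that NumPy's row-major `ravel` swaps the Kronecker order, so the text's $R_{ss} \otimes R_n^{-1}$ appears as `kron(Rn_inv, Rss)`:

```python
import numpy as np

# Hypothetical model: x_t = A s_t + n_t, with n_t ~ N(0, Rn).
rng = np.random.default_rng(0)
m, n, T = 2, 2, 400                       # sensors, sources, samples
A = np.array([[1.0, 0.5], [0.2, 1.0]])    # true mixing matrix (m x n)
Rn = np.array([[0.30, 0.05], [0.05, 0.20]])
Rn_inv = np.linalg.inv(Rn)

S = rng.normal(size=(n, T))               # fixed sources
Rss = S @ S.T / T                         # empirical source covariance
X = A @ S + np.linalg.cholesky(Rn) @ rng.normal(size=(m, T))

def nll(a_vec):
    """Averaged negative log-likelihood of x given s, as a function of
    the (row-major) vectorized mixing matrix."""
    E = X - a_vec.reshape(m, n) @ S
    return 0.5 * np.einsum('it,ij,jt->', E, Rn_inv, E) / T

# Central-difference Hessian at the true A (nll is quadratic in A, so
# this is exact up to rounding).
d, eps = m * n, 1e-4
H = np.zeros((d, d))
a0 = A.ravel()
for i in range(d):
    for j in range(d):
        for si, sj in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
            a = a0.copy()
            a[i] += si * eps
            a[j] += sj * eps
            H[i, j] += si * sj * nll(a) / (4 * eps**2)

# Per-sample Fisher block for vec(A): the text's Rss (x) Rn^{-1},
# i.e. kron(Rn_inv, Rss) in NumPy's row-major vec convention.
assert np.allclose(H, np.kron(Rn_inv, Rss), atol=1e-4)
```

Because the log-likelihood is quadratic in $A$, the finite-difference Hessian recovers the Fisher block exactly, which makes the SNR interpretation below concrete: scaling the sources up scales this block up.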
We note the block diagonality of the $(A, R_n)$-Fisher matrix. The term corresponding to the mixing matrix $A$ is the signal-to-noise ratio (SNR), as can be expected. Thus, the amount of information about the mixing matrix is proportional to the SNR. The induced volume of $(A, R_n)$ is then:
$$\big|g(A, R_n)\big|^{1/2}\, dA\, dR_n = \frac{\big|\mathbb{E}_\eta[R_{ss}]\big|^{m/2}}{|R_n|^{(m+n+1)/2}}\, dA\, dR_n.$$

$(\eta^j)$-Block. Each $g(\eta^j)$ is the Fisher information of a 1D Gaussian distribution. Therefore, it is obtained by setting $n = 1$ in the expression [Eq. (26)] of the previous section:
$$\big|g(\eta^j)\big|^{1/2}\, d\eta^j = \prod_{i=1}^{K_j} \frac{\big(5 w_i^j\big)^{1/2}}{\big(v_i^j\big)^{3/2}}\, d\eta^j,$$
where $v_i^j$ is the variance of component $i$ of source $j$.

2. δ-Divergence (δ = 0, 1)

The δ-divergence between two parameters $\theta = (A, R_n, \eta)$ and $\theta^0 = (A^0, R_n^0, \eta^0)$ for the complete data likelihood $p(x_{1,\dots,T}, s_{1,\dots,T}, z_{1,\dots,T} \mid \theta)$ is:
$$\begin{cases} D_0(\theta : \theta^0) = \mathop{\mathbb{E}}_{x,s,z \mid \theta^0} \left[ \log \dfrac{p(x_{1,\dots,T}, s_{1,\dots,T}, z_{1,\dots,T} \mid \theta^0)}{p(x_{1,\dots,T}, s_{1,\dots,T}, z_{1,\dots,T} \mid \theta)} \right], \\[2ex] D_1(\theta : \theta^0) = \mathop{\mathbb{E}}_{x,s,z \mid \theta} \left[ \log \dfrac{p(x_{1,\dots,T}, s_{1,\dots,T}, z_{1,\dots,T} \mid \theta)}{p(x_{1,\dots,T}, s_{1,\dots,T}, z_{1,\dots,T} \mid \theta^0)} \right]. \end{cases}$$
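For intuition, both divergences can be evaluated in closed form for univariate Gaussians. The following sketch (with hypothetical parameter values chosen for illustration) shows the asymmetry of the two divergences and the duality $D_1(\theta : \theta^0) = D_0(\theta^0 : \theta)$:

```python
import math

def kl(mu_a, va, mu_b, vb):
    """KL(N(mu_a, va) || N(mu_b, vb)) for univariate Gaussians."""
    return 0.5 * (math.log(vb / va) + (va + (mu_a - mu_b) ** 2) / vb - 1.0)

def d0(theta, theta0):
    # D_0(theta : theta0): expectation taken under theta0
    return kl(*theta0, *theta)

def d1(theta, theta0):
    # D_1(theta : theta0): expectation taken under theta
    return kl(*theta, *theta0)

theta, theta0 = (0.0, 1.0), (2.0, 0.5)     # hypothetical (mean, variance)
assert abs(d1(theta, theta0) - d0(theta0, theta)) < 1e-12  # duality
assert d0(theta, theta0) != d1(theta, theta0)              # asymmetry
```

The choice between $\delta = 0$ and $\delta = 1$ therefore matters: the two divergences weight the reference and candidate parameters differently, which is exactly what produces two distinct prior families below.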
Similar developments of the above equation, as in the computation of the Fisher matrix based on the conditional independencies, lead to an affine form of the divergence, which is a sum of the expected divergence between the $(A, R_n)$ parameters and the divergence between the source parameters $\eta$:
$$\begin{cases} D_0(\theta : \theta^0) = \mathop{\mathbb{E}}_{s \mid \eta^0}\big[ D_0^{\mid s}(A, R_n : A^0, R_n^0) \big] + D_0(\eta : \eta^0), \\[1ex] D_1(\theta : \theta^0) = \mathop{\mathbb{E}}_{s \mid \eta}\big[ D_1^{\mid s}(A, R_n : A^0, R_n^0) \big] + D_1(\eta : \eta^0), \end{cases}$$
where $D_\delta^{\mid s}$ means the divergence between the distributions $p(x_{1,\dots,T} \mid A, R_n, s_{1,\dots,T})$ and $p(x_{1,\dots,T} \mid A^0, R_n^0, s_{1,\dots,T})$, keeping the sources $s_{1,\dots,T}$ fixed. The δ-divergence between $\eta$ and $\eta^0$ is the sum of the δ-divergences between each source parameter $\eta^j$ and $\eta^{0j}$, due to the a priori independence between the sources. The divergence between $\eta^j$ and $\eta^{0j}$ is then obtained as a particular case ($n = 1$) of the general expression derived in the multivariate case. Therefore, we have the same form of the prior as in Eqs. (27) and (28).
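The affine splitting above rests on the chain rule for the divergence of a factorized model: for a joint $p(s, x) = p(s)\,p(x \mid s)$, the KL divergence decomposes into the divergence of the marginals plus the expected divergence of the conditionals. A small discrete check (with hypothetical, randomly drawn distributions) makes this concrete:

```python
import numpy as np

rng = np.random.default_rng(2)

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

# Joint models p(s, x) = p(s) p(x|s) and q(s, x) = q(s) q(x|s)
ps, qs = rng.dirichlet(np.ones(3)), rng.dirichlet(np.ones(3))
px_s = rng.dirichlet(np.ones(4), size=3)     # rows: p(x | s)
qx_s = rng.dirichlet(np.ones(4), size=3)     # rows: q(x | s)

joint_kl = kl((ps[:, None] * px_s).ravel(), (qs[:, None] * qx_s).ravel())
cond_kl = sum(ps[i] * kl(px_s[i], qx_s[i]) for i in range(3))

# Chain rule: KL(joint) = KL(marginals) + expected KL(conditionals)
assert np.isclose(joint_kl, kl(ps, qs) + cond_kl)
```

In the source separation model the role of $s$ is played by the sources and labels, and the role of $x \mid s$ by the Gaussian observation model, which yields the sum of an expected $(A, R_n)$ term and an $\eta$ term.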
The expressions of the averaged divergences between the $(A, R_n)$ parameters are:
$$\begin{cases} \mathop{\mathbb{E}}_{s \mid \eta^0}\big[ D_0^{\mid s}(A, R_n : A^0, R_n^0) \big] = \dfrac{1}{2}\Big[ \log \big|R_n (R_n^0)^{-1}\big| + \operatorname{Tr}\big(R_n^{-1} R_n^0\big) + \operatorname{Tr}\Big( R_n^{-1} (A - A^0)\, \mathop{\mathbb{E}}_{s \mid \eta^0}[R_{ss}]\, (A - A^0)^{*} \Big) \Big], \\[2ex] \mathop{\mathbb{E}}_{s \mid \eta}\big[ D_1^{\mid s}(A, R_n : A^0, R_n^0) \big] = \dfrac{1}{2}\Big[ \log \big|R_n^0 R_n^{-1}\big| + \operatorname{Tr}\big((R_n^0)^{-1} R_n\big) + \operatorname{Tr}\Big( (R_n^0)^{-1} (A - A^0)\, \mathop{\mathbb{E}}_{s \mid \eta}[R_{ss}]\, (A - A^0)^{*} \Big) \Big], \end{cases}$$
leading to the following δ-priors on $(A, R_n)$:
$$\begin{cases} \Pi_0(A, R_n^{-1}) \propto \mathcal{N}\big(A;\, A^0,\, \tfrac{1}{\alpha} (R_{ss}^0)^{-1} \otimes R_n\big)\, \mathcal{W}i_m\big(R_n^{-1};\, \alpha,\, (R_n^0)^{-1}\big)\, \big|\mathop{\mathbb{E}}_{s \mid \eta}[R_{ss}]\big|^{m/2}, \\[1ex] \Pi_1(A, R_n) \propto \mathcal{N}\big(A;\, A^0,\, \tfrac{1}{\alpha} \mathop{\mathbb{E}}_{s \mid \eta}[R_{ss}]^{-1} \otimes R_n^0\big)\, \mathcal{W}i_m\big(R_n;\, \alpha - n,\, \tfrac{\alpha - n}{\alpha} R_n^0\big). \end{cases}$$
Therefore, the 0-prior is a normal inverse Wishart prior (conjugate prior). The mixing matrix and the noise covariance are not a priori independent: in fact, the covariance matrix of $A$ is the noise-to-signal ratio $\frac{1}{\alpha} (R_{ss}^0)^{-1} \otimes R_n$. We note a multiplicative term, which is a power of the determinant of the a priori expectation of the source covariance $\mathbb{E}_{s \mid \eta}[R_{ss}]$. This term can be injected in the prior $p(\eta)$, and thus the $(A, R_n)$ parameters and the $\eta$ parameters are a priori independent.
The 1-prior (entropic prior) is normal Wishart. The mixing matrix and the noise covariance are a priori independent, since the noise-to-signal ratio $\frac{1}{\alpha} \mathbb{E}_{s \mid \eta}[R_{ss}]^{-1} \otimes R_n^0$ depends only on the reference parameter $R_n^0$. However, we have in counterpart a dependence between $A$ and $\eta$ through the term $\mathbb{E}_{s \mid \eta}[R_{ss}]^{-1}$ present in the covariance matrix of $A$. In practice, we prefer to replace the expected covariance $\mathbb{E}_{s \mid \eta}[R_{ss}]$, in the two priors, by its reference value $R_{ss}^0$.
We note that the precision matrix for the mixing matrix $A$ ($\alpha R_{ss}^0 \otimes R_n^{-1}$ for $\Pi_0$ and $\alpha\, \mathbb{E}_{s \mid \eta}[R_{ss}] \otimes (R_n^0)^{-1}$ for $\Pi_1$) is the product of the confidence term $\alpha = \gamma_e / \gamma_u$ in the reference parameters and the signal-to-noise ratio. Therefore, the resulting precision on the reference matrix $A^0$ is not only our a priori coefficient $\gamma_e$, but the product of this coefficient and the signal-to-noise ratio.
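Conditional on a fixed noise covariance, the matrix-normal factor of the 0-prior can be sampled with two Cholesky factors. The sketch below (hypothetical reference values, not taken from the text) checks empirically that the draws of $\mathrm{vec}(A)$ have covariance $\frac{1}{\alpha}(R_{ss}^0)^{-1} \otimes R_n$, up to the row-major vec convention used by NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, alpha = 2, 2, 4.0                     # sensors, sources, confidence
A0 = np.zeros((m, n))                       # reference mixing matrix
Rn = np.array([[0.5, 0.1], [0.1, 0.4]])     # noise covariance (held fixed)
Rss0 = np.array([[1.0, 0.3], [0.3, 2.0]])   # reference source covariance

# Matrix normal N(A; A0, (1/alpha) (Rss0)^{-1} (x) Rn): row covariance Rn,
# column covariance (1/alpha) (Rss0)^{-1}.
U, V = Rn, np.linalg.inv(Rss0) / alpha
Lr, Lc = np.linalg.cholesky(U), np.linalg.cholesky(V)

draws = np.array([(A0 + Lr @ rng.normal(size=(m, n)) @ Lc.T).ravel()
                  for _ in range(20000)])
emp_cov = np.cov(draws.T)

# With NumPy's row-major ravel, Cov[vec(A)] = U (x) V.
assert np.allclose(emp_cov, np.kron(U, V), atol=0.02)
```

The design point illustrated here is the one stated in the text: a large confidence $\alpha$ or a large reference source power $R_{ss}^0$ both tighten the prior around $A^0$, since they shrink the column covariance $\frac{1}{\alpha}(R_{ss}^0)^{-1}$.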
VIII. CONCLUSION AND DISCUSSION

In this work, we have shown the importance of providing a geometry (a measure of distinguishability) to the space of distributions. A different geometry gives a different learning rule mapping the training data to the space of predictive distributions. The prior selection procedure, established in a statistical decision framework, must therefore be carried out in a specified geometry. The problem of prior selection is considered as the inverse problem of a geometric statistical decision learning problem. Solving a variational cost function leads to a family of a priori distributions called the (δ, α)-priors. This family contains many known particular cases of probability distributions, such as the exponential family and the Student distribution, which correspond to particular geometries.
All the results in this work can be extended to manifold-valued parametric models. Indeed, when in a specific problem the space of parameters is not Euclidean but rather a manifold, we can apply the results of this work to construct a prior on the manifold. This can be done by:
1. Replacing the statistical manifold Q of probability distributions by the manifold of parameters at hand.
2. Choosing a suitable metric and an affine connection on this manifold.
We have also derived the expression of this family in the more general case of a set of reference distributions by introducing the notions of accuracy and dispersion. We have tried to elucidate the interaction between parametric and nonparametric modeling. The notion of "projected mass" gives the restricted parametric model a nonparametric sense and shows the role of the relative geometry of the parametric model in the whole space of distributions. The same investigations are considered in the interaction between a curved family and the whole parametric model containing it.
Exact expressions are shown in a simple case of auto-parallel families, and we are working on the more abstract space of distributions.
APPENDIX A

A.1 Proof of Theorem 1

Consider the (δ, α)-cost as a function of the prior $\Pi$:
$$J_{\delta,\alpha}(\Pi) = \gamma_e \int \Pi(\theta)\, D_\delta(p_\theta, p_0)\, d\theta + \gamma_u\, D_\alpha\big(\Pi, \sqrt{g}\,\big),$$
where the β-divergence $D_\beta$ is defined as:
$$\begin{cases} D_\beta(p, q) = \displaystyle\int \frac{p}{1-\beta} + \frac{q}{\beta} - \frac{p^\beta q^{1-\beta}}{\beta(1-\beta)}, & \beta \neq 0, 1, \\[2ex] D_1(p, q) = \displaystyle\int q - p + p \log(p/q) = D_0(q, p). \end{cases}$$
For the first case ($\alpha \neq 0, 1$), by variational calculus, we have the following expression of the variation $\delta J_{\delta,\alpha}$:
$$\begin{aligned} \delta J_{\delta,\alpha} &= \gamma_e \int D_\delta(p_\theta, p_0)\, \delta\Pi\, d\theta + \gamma_u\, \delta D_\alpha\big(\Pi, \sqrt{g}\,\big) \\ &= \gamma_e \int D_\delta(p_\theta, p_0)\, \delta\Pi\, d\theta + \frac{\gamma_u}{1-\alpha} \int \left[ 1 - \left( \frac{\Pi}{\sqrt{g(\theta)}} \right)^{\alpha-1} \right] \delta\Pi\, d\theta \\ &= \int \delta\Pi \left[ \gamma_e D_\delta(p_\theta, p_0) + \frac{\gamma_u}{1-\alpha} - \frac{\gamma_u}{1-\alpha} \left( \frac{\Pi}{\sqrt{g(\theta)}} \right)^{\alpha-1} \right] d\theta. \end{aligned}$$
Equating $\delta J_{\delta,\alpha}$ to 0 yields the (δ, α)-prior:
$$\Pi_{\delta,\alpha}(\theta) \propto \frac{\sqrt{g(\theta)}}{\big[ 1 + (1-\alpha)\, \frac{\gamma_e}{\gamma_u}\, D_\delta(p_\theta, p_0) \big]^{1/(1-\alpha)}}, \qquad \alpha \neq 0, 1.$$
We note that the case $\alpha = 0$ can be obtained simply by replacing $\alpha$ by 0 in the previous equation; we have obtained the same result when considering the 0-divergence in the cost function. For the case $\alpha = 1$, the variation of $J_{\delta,1}$ is:
$$\delta J_{\delta,1} = \gamma_e \int D_\delta(p_\theta, p_0)\, \delta\Pi\, d\theta + \gamma_u\, \delta D_1\big(\Pi, \sqrt{g}\,\big) = \int \delta\Pi \left[ \gamma_e D_\delta(p_\theta, p_0) + \gamma_u \log \frac{\Pi}{\sqrt{g(\theta)}} \right] d\theta.$$
$\delta J_{\delta,1} = 0$ yields the δ-prior:
$$\Pi_\delta(\theta) \propto e^{-\frac{\gamma_e}{\gamma_u} D_\delta(p_\theta, p_0)}\, \sqrt{g(\theta)}.$$
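Since the cost functional has no coupling across θ, the stationarity condition can be checked pointwise: for each θ, the integrand is a convex function of the scalar value $u = \Pi(\theta)$, minimized at the closed form derived above. A small numerical sanity check (with arbitrary, hypothetical values of $D_\delta$, $\sqrt{g}$, $\gamma_e$, $\gamma_u$):

```python
import numpy as np

ge, gu, alpha = 2.0, 1.0, 0.5    # arbitrary gamma_e, gamma_u, alpha != 0, 1

def integrand(u, D, sg):
    """Pointwise integrand of J_{delta,alpha} as a function of Pi(theta) = u,
    with D = D_delta(p_theta, p_0) and sg = sqrt(g(theta))."""
    return (ge * D * u
            + gu * (u / (1 - alpha) + sg / alpha
                    - u**alpha * sg**(1 - alpha) / (alpha * (1 - alpha))))

for D, sg in [(0.3, 1.0), (1.2, 0.7), (0.0, 2.0)]:
    # Closed-form stationary point from the proof:
    u_star = sg * (1 + (1 - alpha) * (ge / gu) * D) ** (-1 / (1 - alpha))
    # The integrand is strictly convex in u, so u_star beats perturbations.
    for u in [0.5 * u_star, 0.9 * u_star, 1.1 * u_star, 2.0 * u_star]:
        assert integrand(u_star, D, sg) < integrand(u, D, sg)
```

The second derivative of the integrand in $u$ is $\gamma_u\, u^{\alpha-2} g^{(1-\alpha)/2} > 0$, so the stationary point is indeed a minimum, which the assertions confirm numerically.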
APPENDIX B

B.1 Proof of Theorem 2

Before proving the theorem, we recall some important definitions and results (see Amari, 1985; Amari and Nagaoka, 2000; Zhu and Rohwer, 1995a for details):
Theorem 3 (Pythagorean relation). If the β-geodesic connecting $p$ and $r$ is orthogonal to the $(1-\beta)$-geodesic connecting $r$ and $q$ (the geodesics being considered in a δ-flat space), then
$$D_\beta(p, q) = D_\beta(p, r) + D_\beta(r, q).$$

Corollary 1 (β-Projection). Let $p$ be a point in a dually β-flat space $S$ and $Q$ a $(1-\beta)$-autoparallel manifold. Then a necessary and sufficient condition for a point $q$ in $Q$ to satisfy $D_\beta(p, q) = \min_{r \in Q} D_\beta(p, r)$ is that the β-geodesic connecting $p$ and $q$ be orthogonal to $Q$ at $q$. The point $q$ is called the β-projection of $p$ onto $Q$.

Using the above results, the following decomposition of the divergence is straightforward:

Corollary 2. Let $p$ be a point in a δ-convex $Q$ (with respect to the whole set $\tilde{P}$) and let $p_0$ be a point in $\tilde{P}$. Then
$$D_\delta(p, p_0) = D_\delta\big(p, p_0^\perp\big) + D_\delta\big(p_0^\perp, p_0\big),$$
where $p_0^\perp$ is the $(1-\delta)$-projection of $p_0$ onto $Q$.

Consider the cost function to be minimized, in the general case of non-restricted reference distributions:
$$J_{\delta,\alpha}(\Pi) = \gamma_e \int \Pr(p_0) \int \Pi(\theta)\, D_\delta(p_\theta, p_0)\, d\theta\, dp_0 + \gamma_u\, D_\alpha\big(\Pi, \sqrt{g}\,\big).$$
With the definition of the barycentre $p_G$ (Definition 5 in Section IV.B) and the expression of the δ-divergence [Eq. (6)], we have a simple expression of the integral with respect to the reference distribution $p_0$:
$$\int \Pr(p_0)\, D_\delta(p_\theta, p_0)\, dp_0 = D_\delta(p_\theta, p_G) + \int \Pr(p_0)\, D_\delta(p_G, p_0)\, dp_0. \tag{B.1}$$
Using Corollary 2 with the point $p_G^\perp$, the $(1-\delta)$-projection of $p_G$ onto $Q$, we can decompose the divergence between the points $p_\theta$ and $p_G$, as the geodesics are orthogonal (Figure 20):
$$\int \Pr(p_0)\, D_\delta(p_\theta, p_0)\, dp_0 = D_\delta\big(p_\theta, p_G^\perp\big) + D_\delta\big(p_G^\perp, p_G\big) + \int \Pr(p_0)\, D_\delta(p_G, p_0)\, dp_0 = D_\delta\big(p_\theta, p_G^\perp\big) + 1/A_{1-\delta} + V_{1-\delta}, \tag{B.2}$$
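Equation (B.1) generalizes the classical bias-variance splitting of a mean squared distance. For reference distributions that are Gaussians with a common variance, the KL divergence reduces to a scaled squared distance of the means and the identity can be verified directly. A toy check (hypothetical weights and means, illustrating the structure rather than the general case):

```python
import numpy as np

sigma2 = 0.5

def kl(mu_a, mu_b):
    """KL between N(mu_a, sigma2) and N(mu_b, sigma2)."""
    return (mu_a - mu_b) ** 2 / (2 * sigma2)

ref_means = np.array([0.1, 0.7, 1.3, 2.0])   # means of the references p_0
w = np.array([0.4, 0.3, 0.2, 0.1])           # Pr(p_0)
mu_G = w @ ref_means                          # barycentre p_G
mu_theta = -0.8                               # candidate p_theta

lhs = w @ kl(mu_theta, ref_means)                    # mean divergence
rhs = kl(mu_theta, mu_G) + w @ kl(mu_G, ref_means)   # Eq. (B.1) splitting
assert np.isclose(lhs, rhs)
```

The second term on the right-hand side depends only on the reference set, not on θ, which is why it can be absorbed into the accuracy and dispersion constants of (B.2).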
FIGURE 20. Pythagorean relation: the δ-geodesic $l(p, p_G^\perp)$ is orthogonal to the $(1-\delta)$-geodesic $l(p_G^\perp, p_G)$.
where the accuracy $A_{1-\delta}$ and the dispersion $V_{1-\delta}$ are defined according to Definitions 6 and 7, respectively. Then, replacing the expression of the mean divergence [Eq. (B.2)] in the cost function and minimizing with respect to the prior $\Pi$ using the same variational arguments as in the proof of Theorem 1 in Appendix A, we obtain the expression of the (δ, α)-prior:
$$\begin{cases} \Pi_{\delta,\alpha}(\theta) \propto \dfrac{\sqrt{g(\theta)}}{\left[ 1 + (1-\alpha)\, \dfrac{\frac{\gamma_e}{\gamma_u}\, D_\delta(p_\theta, p_G^\perp)}{1 + (1-\alpha)\, \frac{\gamma_e}{\gamma_u}\, (1/A_{1-\delta} + V_{1-\delta})} \right]^{1/(1-\alpha)}}, & \alpha \neq 1, \\[4ex] \Pi_\delta(\theta) \propto e^{-\frac{\gamma_e}{\gamma_u} (1/A_{1-\delta} + V_{1-\delta})}\; e^{-\frac{\gamma_e}{\gamma_u} D_\delta(p_\theta,\, p_G^\perp)}\; \sqrt{g(\theta)}, & \alpha = 1. \end{cases}$$
REFERENCES

Amari, S. (1985). Differential-Geometrical Methods in Statistics. Lecture Notes in Statistics, vol. 28. Springer-Verlag, New York.
Amari, S., Nagaoka, H. (2000). Methods of Information Geometry. Translations of Mathematical Monographs, vol. 191. American Mathematical Society/Oxford University Press.
Balasubramanian, V. (Jan. 1996). A Geometric Formulation of Occam's Razor for Inference of Parametric Distributions. Technical Report (preprint PUPT-1588 and http://xyz.lanl.gov/adap-org/9601001), Princeton.
Balasubramanian, V. (1997). Statistical inference, Occam's razor and statistical mechanics on the space of probability distributions. Neural Computation 9 (2), 349–368. cond-mat/9601030.
Boothby, W. (1986). An Introduction to Differential Manifolds and Riemannian Geometry. Academic Press.
Box, G.E.P., Tiao, G.C. (1972). Bayesian Inference in Statistical Analysis. Addison–Wesley.
Kass, R.E., Wasserman, L. (1994). Formal Rules for Selecting Prior Distributions: A Review and Annotated Bibliography. Technical Report No. 583. Carnegie Mellon University, Department of Statistics.
Knuth, K. (1999). A Bayesian approach to source separation. In: Cardoso, J.-F., et al. (Eds.), Proceedings of the First International Workshop on Independent Component Analysis and Signal Separation, pp. 283–288.
Mohammad-Djafari, A. (1999). A Bayesian approach to source separation. In: Erikson, J.R.G., Smith, C. (Eds.), Bayesian Inference and Maximum Entropy Methods. MaxEnt Workshops, American Institute of Physics.
Rodríguez, C. (1991). Entropic priors (technical report). http://omega.albany.edu:8008/entpriors.ps.
Rodríguez, C. (2001). Entropic priors for discrete probabilistic networks and for mixtures of Gaussians models. In: Fry, R.L. (Ed.), Bayesian Inference and Maximum Entropy Methods. MaxEnt Workshops, American Institute of Physics, pp. 410–432.
Snoussi, H. (2005). Geometry of prior selection. NeuroComputing 67 (Spec. issue), 214–244.
Snoussi, H., Mohammad-Djafari, A. (2001). Penalized maximum likelihood for multivariate Gaussian mixture. In: Fry, R.L. (Ed.), Bayesian Inference and Maximum Entropy Methods. MaxEnt Workshops, American Institute of Physics, pp. 36–46.
Snoussi, H., Mohammad-Djafari, A. (2002). Information geometry and prior selection. In: Williams, C. (Ed.), Bayesian Inference and Maximum Entropy Methods. MaxEnt Workshops, American Institute of Physics, pp. 307–327.
Snoussi, H., Mohammad-Djafari, A. (2002). MCMC joint separation and segmentation of hidden Markov fields. In: Neural Networks for Signal Processing XII. IEEE Workshop, pp. 485–494.
Zhu, H., Rohwer, R. (1995a). Bayesian invariant measurements of generalisation. Neural Proc. Lett. 2 (6), 28–31.
Zhu, H., Rohwer, R. (1995b). Bayesian invariant measurements of generalisation for continuous distributions (technical report, NCRG/4352). ftp://cs.aston.ac.uk/neural/zhuh/continuous.ps.z. Aston University.
Index
A
β-Accuracy, 185, 186f, 203 definition of, 185 Adjunction, 121 Affine combinations, 111, 111f Affine connections, 171–172, 172f Affine coordinate system, 187 Affine coordinates, 177 AFM. See Atomic force microscope Amplitude objects, 2 Angular multiplexing, 33 Approximation theory, 95 Ascents, 131 Asymmetric edge enhancement, 18–22 experimental realization of, 20, 21f, 22f modified spiral phase filter and, 18–22 results of, 21–22 shadow effects and, 18–22 cheek cell in, 21, 23f setup used for, 22f zeroth-order Fourier component's influence on, 12, 18–20 Atomic force microscope (AFM), 28 Attractors, 136 Auto-parallel submanifold. See Submanifold, auto-parallel

B

Band B (M, N), 128–130 theorem 27 and, 128–129 β-Barycentre, 185, 186f definition of, 185 Bayesian information geometry, 164 Bayesian learning, 178–179. See also Statistical geometric learning Blazed phase holograms, 9, 10f Boolean functions, 100 positive (PBF), 100–101 arbitrary lattices with, 133–135 example 11 and, 101 linearly ordered sets and, 100–101 Brightfield microscopy, 1–2 cheek cell in emulated, 15, 15f crystallized protein in, 31–32, 31f mica fragment in emulated, 17, 17f oil droplet in, 8f, 14, 14f

C

C property. See Commutes property CCD camera. See Charge-coupled device camera Cells, 2 cheek brightfield image of, 15, 15f darkfield image of, 15, 15f shadow effects and, 21, 23f labeling of, 2 PtK2, spiral phase contrast microscopy of, 21, 24f Cellular neural networks (CNNs), 136 Center line demodulation, 41–42, 42f Charge-coupled device (CCD) camera, 9, 10f Cheek cell brightfield image of, 15, 15f darkfield image of, 15, 15f shadow-effect image of, 21, 23f Closings, 66 Closure systems, 66, 126 CNF. See Conjunctive normal form CNNs. See Cellular neural networks Coherent illumination, 9–12 laboratory setup for, 10f Co-idempotency, 59, 69 corollary 40 and, 145–146 LSFs and, 142–146 theorem 39 and, 144–145 Venn diagram with, 148f Commutes (C) property, 103, 104 operators with HT, L, I, and, 110 Computational model, 176n2, 179 Conjugate prior, 166, 184 Conjunctive normal form (CNF), 100 Consistency
corollary 3 and, 87 FTP property and, 86–89 mathematics of vision and, 89 theorems (1, 2, 4, 5) and, 86–89 theory of, 89 Consistency theorem first fundamental, 86–87 second fundamental, 87–89 Contour line demodulation, 40–41, 40f Contrast, 2 Convex combinations, 111, 111f Convolution kernel, 37 details on, 48–49 Convolution theorem, 8, 26, 36 Coordinate neighborhoods, 167 Corollary 3 (FTP/consistency and), 87 8 (rounding operator R and), 97 10 (thresholding operator T and), 98 34 (idempotency and), 140–141 35 (idempotency and), 141 38 (idempotency and), 142 40 (co-idempotency and), 145–146 Covariant derivative along curve γ , 172–173, 173f Crystallized proteins, 31 brightfield image of, 31–32, 31f spiral phase-filtered images of, 31–32, 31f Cups, 131, 132f n-large, 131
D

Darkfield microscopy, 2 cheek cell in emulated, 15, 15f mica fragment in emulated, 16–17, 17f ultra-, 2 Decision rule, 165, 176, 177f Decompositions fundamental problem of nonlinearity in, 81–83 LULU, of sequence x, 74–75, 74f nonlinear multiresolution, 73–74, 73f Demodulation, spiral interferogram, 39–46 multi-image, 42–46, 44f, 45f thick samples and, 51–52 single-image, 39–42 center line, 41–42, 42f contour line, 40–41, 40f Descents, 130, 131 DFT. See Discrete Fourier transform DIC. See Differential interference contrast Differentiable manifold. See Manifold, differentiable Differential geometry, 164 definitions from, 166–175 Riemannian manifolds and, 166–175 Differential interference contrast (DIC), 3–4 Dilation, 121 Directional derivative, 168 Discrete event systems, 111 Discrete Fourier transform (DFT), 64, 65 FFT and, 98 Discrete pulse transforms (DPTs), 58, 62–65 edge detection with, 152–153 LULU theory and, 60–86 noise's influence in, 90–91 shape preservation in, 83–86 Disjunctive normal form (DNF), 100 β-Dispersion, 185, 186f, 203 definition of, 185 δ-Divergence, 201–202 DNF. See Disjunctive normal form Dodt gradient contrast imaging, 4–5 Doughnut field, 8 Doughnut mode, 37 DPTs. See Discrete pulse transforms Dual connections, 175 Dual-closure system, 66

E

EAP. See Expected a posteriori Edge enhancement. See Asymmetric edge enhancement; Isotropic edge enhancement Edge preservation, 91 Eigenvalues, of separators, 70 Electrons, 61 Entropic prior, 166, 184 Envelopes, 125–127 Erosion, 121 Euclidean space, 175 Example 11 (PBFs on linearly ordered sets and), 101 12 (threshold decompositions and), 102 14 (stack filters and), 104–105 17 (semigroups and), 113 25 (structural openings and), 121–122 26 (structural openings and), 124–125
30 (PBFs on lattices and), 134 31 (PBFs on lattices and), 134–135 32 (idempotency and), 137–138 36 (idempotency and), 141 Expected a posteriori (EAP), 178, 189 Experimental science, learning machine model of, 164f Exponential family, 203 Exponential mapping, 171, 171f
F Fast Fourier transform (FFT), 98 DFT and, 98 Fast processes, 46, 47 FFT. See Fast Fourier transform Fisher information matrix, 200–201 source separation problem and, 200–201 δ-Flat families, 187, 189–191 δ optimal estimates in, 189 prior projection and, 190–191, 191f prior selection with, 189–190 singularities and, 192–195 Fluorescence microscopy, 2 Fort profile, 76 Haar decomposition of, 76, 77f impulsive noise, Ci , Di , added to, 78f Fourier plane image filtering in, 5–6, 6f special property of, 5 Fringes. See Interference fringes; Spiral interferograms FTP property. See Fully trend preserving property Fully trend preserving (FTP) property, 85 consistency and, 86–89 corollary 3 and, 87 theorems 1, 2, 4, 5 and, 86–89 VP property’s equality with, 85 Fundamental Separator Theorem, 70
G Galois connections, 119–120 abstract semigroups and, 111–120 Galois, Evariste, 119n Gaussians, 192 Geodesics, 170–171 Glue strip, 35 interferograms of deformations in, 33f Graded-field microscopy, 5
H Haar wavelet decomposition, 62, 63f, 65 of fort profile, 76, 77f Halo effects, 3 Halogen bulb, 12, 13f Heisenberg’s inequality, 80 Helical charges, 47–48 Helical mode indices, 47–48 Highlight conjecture, 82, 85, 89 formal statement of, 83 proof of, 89 Hilbert transform, 18–19 Hoffman modulation contrast, 4 Hoffman, Robert, 4 Holograms blazed phase, 9, 10f off-axis, 9, 10f, 11, 47 spiral phase, 9, 10f static phase, 9 Homomorphisms, 66 Horizontally translation invariant (HT) property, 61, 103, 104 operators with L, I, C, and, 110 smoothers and, 61 Hough Transform, 152, 153 HT property. See Horizontally translation invariant property
I I property. See Increasing property Idempotency, 59, 69 corollary in 34th, 140–141 35th, 141 38th, 142 example in 32nd, 137–138 36th, 141 LSFs and, 137–142 stack filters and, 106 theorem and 33rd, 139–140 37th, 141–142 Venn diagram with, 148f Idempotent, 112 Idempotent operators, 65 Illumination source coherent, 9–12, 10f
212 laser, 13–15, 14f low-coherence, 12, 13f partially coherent, 15–18, 15f, 16f, 17f Image algebra, 135 lattice stack filters and, 135–137 nonlinear systems and, 135–137 Image filtering, in Fourier plane, 5–6, 6f Image registration, 153–154 Impulsive noise. See Noise, impulsive Increasing operators, 66 Increasing (I) property, 103, 104 operators with C, HT, L, and, 110 Infimum, 66 Information geometry, 163, 164 Bayesian, 164 Interference fringes, 12, 14 Interferograms. See also Spiral interferograms angular multiplexing of, 33 glue strip with, 33f phase stepping of, 33, 47 standard, 33, 33f, 34 wavelength multiplexing and, 33 Interferometer, self-referenced, 10f, 33, 34 Interferometry, 32 classical optical, 32–33 problems in, 32–33, 34 Invariant series, of LULU operator, 130–133 Isotone operators, 66 Isotropic edge enhancement, 5–18 advantages of, 46 demonstration of, 8f experimental realization of, 9–13 coherent illumination in, 9–12 low-coherence illumination in, 12, 13f SLM in, 11–12 results in, 13–18 images obtained with laser illumination, 13–15, 14f images obtained with partially coherent illumination, 15–18, 15f, 16f, 17f spiral phase filter and, 5–18
L L property. See Local property Laguerre–Gauss modes, 6, 7f Laser illumination, 13–15, 14f Lattice stack filters (LSFs), 133–148 co-idempotency and, 142–146
INDEX concrete calculations for, 146–148 idempotency and, 137–142 image algebra and, 135–137 nonlinear systems and, 135–137 PBF on lattices in, 133–135 Lattice-ordered groups, 142, 143 Lattices, 66 complete, 66 distributive, 134 PBFs on arbitrary, 133–135 examples (30–31) and, 134–135 Learning machines, 164 experimental science, 164f Learning rule, 176 Lebesgue-type inequality, 65, 66, 68 Lemma 18 (semigroups and), 114 23 (ordered semigroups and), 117–118 Light microscopy, 1–2 Linear combinations, 109–111, 111f Linear decompositions, 65 Linearity, from stack filters, 107–109 theorem 16 and, 108 Linearly ordered groups, 143 Local monotonicity, 67–68, 84, 156 Local (L) property, 103, 104 operators with I, C, HT, and, 110 Low-coherence illumination, 12, 13f LSFs. See Lattice stack filters LULU filters, 100 as stack filters, 106 LULU image analysis, 149–156 edge detection with DPT in, 152–153 estimation of variance of random distribution in, 155 highlighting of CD-rich crystals in, 151 image registration in, 153–154 of optical SH imaging, 149 of Pbx Cd1−x Te ternary alloys, 149–156 results in, 155–156 second dimension in, 150 LULU operators, 62 definition of, 62 fine structure of, 127–133 band B (M, N) in, 128–130 invariant series of, 130–133 theorems (28–29) and, 131–132, 133 mutual order relations among, 68, 68f ranges of, 67–69
rank of, 130 stack filters and, 100–111 strong filters and, 146–147 LULU theory, 58 DPT and, 60–86 elementary particles and atoms of, 60–62 lack of linear theory in, 65–66 lattices and, 66 ranges of LULU operators in, 67–69 Venn diagram with, 59f
M M. See Manifold(s) Manifold(s) (M) definition of, 167 differentiable, 167–168, 168f flat, 174 self-dual, 184, 184n6 tangent space on, 168–169, 169f exponential mapping on, 171, 171f Riemannian, 166, 170, 170f differential geometry tools related to, 166–175 smooth, 167–168, 168f topological, 167, 167f vector field on, 169, 169f Marr, David, 60, 119n Mathematical morphology (MM), 58, 120–127 definition of, 120 envelopes in, 125–127 structural openings in, 121–125 Venn diagram with, 59f Mathematical vision. See Vision, mathematics of Matheron pairs, 118 MBN. See Monotone Boolean network Median decompositions, 91 alternatives and, 91–95 theorem 6 and, 94 Mica fragment brightfield image of, 17, 17f darkfield image, 16–17, 17f spiral phase-filtered image of, 17, 17f Microscope objectives, 2 Olympus EA10, 10f, 13 Olympus UPlanFL 60, 15 Zeiss Achroplan 63, 29f Zeiss A-plan 20, 16, 21, 34
Zeiss Fluar 40, 24f Microscopy brightfield, 1–2 oil droplet in, 8f, 14, 14f contrast in, 2 darkfield, 2 ultra-, 2 DIC, 3–4 Dodt gradient contrast imaging in, 4–5 fluorescence, 2 graded-field, 5 Hoffman modulation contrast, 4 light, 1–2 optical, 1–2 phase contrast (PCM), 3, 5 spiral phase, 1–56 Microscopy Primer, 2 Min–max systems, 59, 70, 71 Mixture families, 193 singularities with, 193–195 MM. See Mathematical morphology Monotone Boolean network (MBN), 136 Monotone operators, 66 Monotonicity, 67, 68 local, 67–68, 84, 156 Morphological filters, 69, 157 axioms for, 69 Multi-image demodulation, 42–46, 44f, 45f thick samples and, 51–52 Multiresolution analysis, 60 Multivariate Gaussian mixture, 164, 195–198, 199f
N Negative phase contrast, 3 Neighbor trend preserving (NTP), 84 Neurocomputing, 163 Neutral atoms, 61 Noise DPT influenced by, 90–91 impulsive, 75, 91 ambiguity problem in, 75–76, 76f fort profile and, 78f periodic, 91 Nomarski prisms, 3, 4 Nonlinear systems, 135 image algebra and, 135–137 lattice stack filters and, 135–137 Nonparametric modeling, 179 NTP. See Neighbor trend preserving
Nyquist frequency, 74, 91
O Objectives. See Microscope objectives Objects amplitude, 2 phase, 3 Off-axis holograms, 9, 10f, 11, 47 Oil droplet brightfield image of, 8f, 14, 14f spiral phase-filtered image of, 8f, 14–15, 14f Olympus EA10 objective, 10f, 13 Olympus UPlanFL 60 objective, 15 One dimensional (1D) line segments A, 59 One dimensional (1D) retina, 60 1D retina. See One dimensional retina Openings, 66 Operators, 61. See also LULU operators idempotent, 65 increasing, 66 isotone, 66 monotone, 66 order preserving, 66 properties (L, I, C, HT) and, 110 rounding (R), 95–96 as stack filters, 110 syntone, 66 thresholding (T), 97–98 topical, 111, 111f Optical microscopy, 1–2 Optical second harmonic (SH) imaging, 149 of Pbx Cd1−x Te ternary alloys, 149 LULU image analysis of, 149–156 Optical SH imaging. See Optical second harmonic imaging Optical thickness, 32 Optical traps, 6, 7f Optical vortex, 37 δ Optimal estimates, 189 Order preserving operators, 66 Ordered semigroups, 115–118 lemma 23 and, 117–118 Overfilters, 125, 148f special types of, 126 Venn diagram with, 148f
P Parallel translation, 172, 172f
INDEX Parametric modeling, 179–181 Parseval identity, 63, 64, 65, 85 PBF. See Boolean functions, positive Pbx Cd1−x Te ternary alloys, 149 optical SH imaging of, 149 LULU image analysis of, 149–156 PCM. See Phase contrast microscopy Periodic noise. See Noise, periodic Perron–Frobenius theorem, 111 Phase contrast microscopy (PCM), 3, 5 Phase gradients, 3 Phase objects, 3 Phase stepping, 33, 47 Plateaus, 131 Positive Boolean functions. See Boolean functions, positive Positive phase contrast, 3 Prediction problem, 165 Prior projection, 190–191, 191f Prior selection, 164, 181–187, 203 choice of reference distribution in, 184–187 with δ-Flat families, 189–190 (δ, α)-Prior family, 182–184, 203 theorem 1 and, 183, 184 Prisms Nomarski, 3, 4 Wollaston, 3 Projection mass, 190, 203 Projections, 65, 69 conservation law and, 71 separators as, 69–71 variation preservation of, 72–73 Protons, 61 PtK2 cells, 21 spiral phase contrast microscopy of, 21, 24f Pythagoras’s law, 63, 65
R R. See Rounding operator Reference distribution, 184 prior selection and choice of, 184–187 Refractive index, 3 Relieflike shadow images, 18 features of, 46 Residuated map, 121 Resolution components, analysis of, 76–81 Resolution consensus, 95
Richardson slide, 28 imaging and reconstruction of, 29f reconstructed phase topography of, 30, 30f Riemannian connection, 175 Riemannian manifolds, 166, 170, 170f differential geometry tools related to, 166–175 Riemannian metrics, 170, 170f Rotating shadow effect, 23–32 reconstruction of sample topography in, 25–28 results in, 28–32 Rounding operator (R), 95–96 corollary 8 and, 97 theorem 7 and, 95–96
S S. See Submanifold Scale independent (SI) property, 61 smoothers and, 61 Self-dual, 184, 184n6 Semigroups example 17 and, 113 Galois connections and abstract, 111–120 idempotent elements of, 112–115 lemma 18 and, 114 ordered, 115–118 theorems (20–22, 24) and, 115–117, 118 theorem 19 and, 114–115 Venn diagram with, 59f Semilattices, 66 Separator cascade, 70 diagram of two-stage, 71f Separators, 69 definition of, 70 eigenvalues of, 70 as generalized projections, 69–71 as smoothers, 70 Shadow effects asymmetric edge enhancement and, 18–22, 22f, 23f cheek cell image and, 21, 23f rotating, 23–32 setup used for, 22f Shape preservation, 73, 75, 157–158 DPT and, 83–86 SI property. See Scale independent property Signal, 95
215 Single-image demodulation, 39–42 center line, 41–42, 42f contour line, 40–41, 40f Singularities δ-Flat families and, 192–195 with mixture families, 193–195 SLM. See Spatial light modulator Slow drift, 91 Smooth manifold. See Manifold, smooth Smoothers, 58, 60, 111f basic axioms of, 61 definition of, 61–62 HT property and, 61 increasing, 69 nonlinear, 59, 61 separators as, 70 SI property and, 61 VT property and, 61 Source separation problem, 164, 198, 200–202 δ-divergence in, 201–202 Fisher information matrix in, 200–201 Spatial light modulator (SLM), 9, 11–12 advantages of, 11–12 Speckles, 12, 14 Spiral fringes, 35 origin of, 35–39 Spiral interferograms, 33, 33f, 34, 35, 46–47 applications of, 47 demodulation of, 39–46 multi-image demodulation of, 42–46, 44f, 45f thick samples and, 51–52 optically thick samples and, 32–46 single-image demodulation of, 39–42 center line, 41–42, 42f contour line, 40–41, 40f Spiral kernel, 37 details on, 48–49 Spiral phase aperture mask/filter, 48 hardware version of, 48 Spiral phase filter asymmetric edge enhancement with modified, 18–22 crystallized protein image, 31–32, 31f isotropic edge enhancement with, 5–18 mica fragment image with, 17, 17f oil droplet image with, 8f, 14–15, 14f principle of, 6f PtK2 cells and, 21, 24f
216 visualization of principle of, 7f Spiral phase holograms, 9, 10f Spiral phase microscopy, 1–56 Spiral phase plates, 5–6, 6f, 9 conventional application of, 7f convolution with, 6, 7f Spline spaces, 95 Stack filters, 58. See also Lattice stack filters C property of, 103, 104 definition of, 103 evaluation/optimization of, 103–107 example 14 and, 104–105 HT property of, 103, 104 I property of, 103, 104 idempotency in, 106 L property of, 103, 104 linear combinations of, 109–111, 111f linearity from, 107–109 theorem 16 and, 108 LULU filters as, 106 LULU operators and, 100–111 operators as, 110 PBFs and, 100–101 properties of, 103 theorem and 13th, 104 15th, 105–106 threshold decompositions and, 101–103 Venn diagram with, 59f Static phase holograms, 9 Statistical geometric learning, 165–166, 176–181 Bayesian learning in, 178–179 decision rule in, 165, 176, 177f mass and geometry in, 176–178 nonparametric modeling in, 179 parametric modeling in, 179–181 restricted model in, 179–181 Statistical manifold Q, 166, 177f Strong filters, 123n, 126, 127 LULU operators as, 146–147 Venn diagram with, 148f Structural openings, 121–125 examples (25–26) and, 121–122, 124–125 Structuring element, 122, 123 Student distribution, 203 Submanifold (S), 174 auto-parallel, 174, 174f example of, 174 Successor state, 136
Supremum, 66 Syntone operators, 66
T T. See Thresholding operator Tangent spaces, 168–169, 169f Tensor fields, 169–170 Theorem(s) 1 (FTP/consistency and), 86 proof of, 108–109 1 (δ, α)-Prior family and, 183, 184 proof of, 203–204 2 (FTP/consistency and), 86–87 2 (δ, α)-Prior family and, 186 proof of, 204–206 corollary 3, 87 4 (FTP/consistency and), 88 5 (FTP/consistency and), 88–89 6 (median decompositions and), 94 7 (rounding operator R and), 95–96 corollary 8, 97 9 (thresholding operator T and), 97–98 corollary 10, 98 example 11, 101 example 12, 102 13 (stack filters and), 104 example 14, 104–105 15 (Transfer Principle), 105–106 stack filters and, 105–106 16 (linearity from stack filters and), 108 example 17, 113 lemma 18, 114 19 (semigroups and), 114–115 20 (ordered semigroups and), 115 21 (ordered semigroups and), 116 22 (ordered semigroups and), 117 lemma 23, 117–118 24 (ordered semigroups and), 118 example 25, 121–122 example 26, 124–125 27 (band B (M, N) and), 128–129 28 (invariant series of LULU operator and), 131–132 29 (invariant series of LULU operator and), 133 example 30, 134 example 31, 134–135 example 32, 137–138 33 (idempotency and), 139–140
corollary 34, 140–141 corollary 35, 141 example 36, 141 37 (idempotency and), 141–142 corollary 38, 142 39 (co-idempotency and), 144–145 corollary 40, 145–146 Threshold decompositions, 58, 101–103 example 12 and, 102 stack filters and, 101–103 Thresholding operator (T), 97–98 corollary 10 and, 98 theorem 9 and, 97–98 TILL Imago, 9 Topical operators, 111, 111f Topological manifold. See Manifold, topological Transfer Principle, 105–106 Translation invariant, 122
U Underfilters, 125, 148f special types of, 126 Venn diagram with, 148f
V Variation preserving (VP) property, 72–73 FTP property’s equality with, 85 Variation spectrum, 81 Vector fields, 169, 169f Venn diagram(s), 59f, 99, 126, 148f, 158 co-idempotent in, 148f idempotent in, 148 LULU theory in, 59f mathematical vision in, 59f
MM in, 59f overfilters in, 148f semigroups in, 59f stack filters in, 59f strong filters in, 148f underfilters in, 148f Vertically translation invariant (VT) property, 61 smoothers and, 61 Vision mathematics of, 58, 60 consistency and, 89 Venn diagram with, 59f robotic, 60 Vortex filtering, 5, 6, 7f, 8 details on expansion in, 50 mathematical properties of, 35–39 VP property. See Variation preserving property VT property. See Vertically translation invariant property
W Wavelength multiplexing, 33 Wavelet theory, 58, 60 Wollaston prisms, 3
Z Zeiss Achroplan 63 objective, 29f Zeiss A-plan 20 objective, 16, 21, 34 Zeiss Fluar 40 objective, 24f Zernike, Frits, 3 Zeroth-order Fourier component, 12 asymmetric edge enhancement influenced by, 12, 18–20