ADVANCES IN IMAGING AND ELECTRON PHYSICS VOLUME 112
EDITOR-IN-CHIEF
PETER W. HAWKES CEMES/Laboratoire d’Optique Electronique du Centre National de la Recherche Scientifique Toulouse, France
ASSOCIATE EDITORS
BENJAMIN KAZAN Xerox Corporation Palo Alto Research Center Palo Alto, California
TOM MULVEY Department of Electronic Engineering and Applied Physics Aston University Birmingham, United Kingdom
Advances in Imaging and Electron Physics

Edited by
PETER W. HAWKES
CEMES/Laboratoire d’Optique Electronique du Centre National de la Recherche Scientifique
Toulouse, France
VOLUME 112
San Diego San Francisco New York Boston London Sydney Tokyo
This book is printed on acid-free paper.

Copyright 2000 by Academic Press

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.

The appearance of the code at the bottom of the first page of a chapter in this book indicates the Publisher’s consent that copies of the chapter may be made for personal or internal use of specific clients. This consent is given on the condition, however, that the copier pay the stated per-copy fee through the Copyright Clearance Center, Inc. (222 Rosewood Drive, Danvers, Massachusetts 01923), for copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Law. This consent does not extend to other kinds of copying, such as copying for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. Copy fees for pre-2000 chapters are as shown on the title pages: if no fee code appears on the title page, the copy fee is the same as for current chapters. 1076-5670/00 $35.00

Explicit permission from Academic Press is not required to reproduce a maximum of two figures or tables from an Academic Press article in another scientific or research publication provided that the material has not been credited to another source and that full credit to the Academic Press article is given.

ACADEMIC PRESS
A Harcourt Science and Technology Company
525 B Street, Suite 1900, San Diego, California 92101-4495, USA
http://www.academicpress.com

Academic Press
Harcourt Place, 52 Jamestown Road, London, NW1 7BY, UK
http://www.hbuk.co.uk/ap/

International Standard Serial Number: 1076-5670
International Standard Book Number: 0-12-014754-8

Printed in the United States of America
00 01 02 03 EB 9 8 7 6 5 4 3 2 1
CONTENTS

Contributors . . . vii
Preface . . . ix
Forthcoming Contributions . . . xi

Second-Generation Image Coding
N. D. Black, R. J. Millar, M. Kunt, M. Reid, and F. Ziliani

I. Introduction . . . 1
II. Introduction to the Human Visual System . . . 4
III. Transform-Based Coding . . . 8
IV. Segmentation-Based Approaches . . . 31
V. Summary and Conclusions . . . 46
References . . . 50

The Aharonov-Bohm Effect — A Second Opinion
Walter C. Henneberger

I. Introduction . . . 56
II. The Vector Potential . . . 63
III. Dynamics of the Aharonov-Bohm Effect . . . 66
IV. Momentum Conservation in the Aharonov-Bohm Effect . . . 69
V. Stability of the AB Effect . . . 70
VI. The AB Effect Can Not Be Shielded . . . 71
VII. Interaction of a Passing Classical Electron with a Rotating Quantum Cylinder . . . 74
VIII. Solution of the Entire Problem of the Closed System . . . 81
IX. The Interior of the Solenoid . . . 86
X. Ambiguity in Quantum Theory, Canonical Transformations, and a Program for Future Work . . . 90
References . . . 93

Well-Composed Sets
Longin Jan Latecki

I. Introduction . . . 95
II. Definition and Basic Properties of Well-Composed Sets . . . 98
III. 3D Well-Composed Sets . . . 103
IV. 2D Well-Composed Sets . . . 113
V. Digitization and Well-Composed Images . . . 142
VI. Application: An Optimal Threshold . . . 154
VII. Generalizations . . . 159
References . . . 161

Non-Stationary Thermal Field Emission
V. E. Ptitsin

I. Introduction . . . 165
II. Electron Emission and Thermal Field Processes Activated by High Electric Fields Acting on Metal Surfaces . . . 167
III. Phenomenological Model of Non-Stationary Thermal Field Emission . . . 191
IV. Discussion and Conclusion . . . 221
Acknowledgments . . . 225
References . . . 225
Appendix . . . 228

Theory of Ranked-Order Filters with Applications to Feature Extraction and Interpretive Transforms
Bart Wilburn

I. Introduction . . . 233
II. Statistical Approach to Ranked-Order Filters . . . 235
III. Mathematical Logic Approach to Ranked-Order Filters . . . 241
IV. A Language Model Based on Ranked-Order Filters . . . 307
V. Conclusions . . . 331
References . . . 332

Index . . . 333
CONTRIBUTORS

Numbers in parentheses indicate the pages on which the authors’ contributions begin.

N. D. Black (1), Information & Software Engineering, University of Ulster, Northern Ireland
Walter C. Henneberger (56), Department of Physics, Southern Illinois University, Carbondale, IL 62901-4401
M. Kunt (1), Swiss Federal Institute of Technology, Lausanne, Switzerland
Longin Jan Latecki (95), Department of Applied Mathematics, University of Hamburg, Bundesstr. 55, 20146 Hamburg
R. J. Millar (1), Information & Software Engineering, University of Ulster, Northern Ireland
V. E. Ptitsin (165), Institute for Analytical Instrumentation RAS, Rizhskij Prospekt 26, 198103, St. Petersburg, Russia
M. Reid (1), Kainos Software Ltd., Belfast, Northern Ireland
Bart Wilburn (233), University of Arizona, Optical Sciences Center, Tucson, Arizona
F. Ziliani (1), Swiss Federal Institute of Technology, Lausanne, Switzerland
PREFACE

The transmission of digital images is by now commonplace, although most of us have encountered innumerable obstacles and difficulties in practice. For transmission, images need to be compressed, and a family of techniques for doing this efficiently has been developed. Sophisticated though these are, the transmitted image may be found imperfect, particularly if it represents everyday objects with which the eye is familiar. For such reasons, a new generation of image coding techniques is being developed, which satisfy to a greater extent the expectations of the visual system. It is these "second-generation" coding methods that form the subject of the first chapter in this volume, by N. D. Black and R. J. Millar of the University of Ulster, F. Ziliani and M. Kunt (who introduced these second-generation approaches) of the EPFL in Lausanne, and M. Reid of Kainos Software Ltd.

The Aharonov-Bohm effect, discovered in a semi-classical form by W. Ehrenberg and R. E. Siday nearly a decade before the seminal paper of Y. Aharonov and D. Bohm, has a huge literature and has been at the heart of innumerable disputes and polemics. The existence of the effect is no longer in doubt, thanks to the conclusive experiments of A. Tonomura, but there is still argument about the correct way of analyzing it. The difficulty concerns the scattering treatment of the phenomenon, and it is here that W. C. Henneberger, who has written numerous thought-provoking papers on the subject, departs from the widely accepted canon. I have no doubt that the argument will continue, but I am delighted to include this carefully reasoned alternative opinion in these pages.

The third contribution is concerned with one of the theoretical problems of analyzing digital images that continues to be a source of nuisance, if nothing worse. It is well known that, in order to avoid paradoxes, it is necessary to use different adjacency relations in different areas of images, which is obviously inconvenient and intellectually unsatisfying. L. J. Latecki has introduced the idea of well-composed sets into binary image studies precisely in order to prevent such paradoxes from arising, and this very readable account of his ideas will, I am sure, be found most helpful.

The quest for ever brighter electron sources, notably for electron lithography, is in a lively phase, and the fourth chapter describes an unusual approach, non-stationary field emission, that is currently under investigation. In addition to the intrinsic interest and scientific relevance of the subject, this chapter has the additional merit of making better known the Russian work in this area; despite the fact that the principal Russian serials are available in English translation, their contents are often less well known than they might be. V. E. Ptitsin first describes the physical processes that occur when high electric fields are applied to metal surfaces, then presents in detail a phenomenological model of the non-stationary effects that are at the origin of the desirable emissive properties of the associated sources.

We conclude with an extended discussion by B. Wilburn of the theory of ranked-order filters and of their applications for feature extraction and even for artificial intelligence. These filters, of which the median filter is the best known, remained for many years somewhat mysterious: their attractive features were known experimentally, but the underlying theory remained obscure. Now, however, the reasons for their performance are better understood and formal analyses of their behavior have been made. The fascinating relation between them and the constructs of mathematical morphology is likewise now understood. B. Wilburn not only presents the theory, both statistical and logical, very fully and clearly, but also includes some new findings which have not yet been published elsewhere. I am particularly pleased that he agreed to include this material in these Advances.

I thank all our contributors, in particular for their efforts to ensure that their contributions are accessible to readers who are not specialists in the same area, and I present below a list of material to appear in future volumes.

Peter Hawkes
FORTHCOMING CONTRIBUTIONS

D. Antzoulatos: Use of the hypermatrix
N. Bonnet (vol. 114): Artificial intelligence and pattern recognition in microscope image processing
G. Borgefors: Distance transforms
A. van den Bos and A. Dekker: Resolution
P. G. Casazza (vol. 115): Frames
J. A. Dayton: Microwave tubes in space
E. R. Dougherty and Y. Chen: Granulometries
J. M. H. Du Buf: Gabor filters and texture analysis
G. Evangelista: Dyadic warped wavelets
R. G. Forbes: Liquid metal ion sources
E. Förster and F. N. Chukhovsky: X-ray optics
A. Fox: The critical-voltage effect
M. I. Herrera: The development of electron microscopy in Spain
K. Ishizuka: Contrast transfer and crystal images
C. Jeffries: Conservation laws in electromagnetics
I. P. Jones: ALCHEMI
M. Jourlin and J.-C. Pinoli (vol. 115): Logarithmic image processing
E. Kasper: Numerical methods in particle optics
A. Khursheed (vol. 115): Scanning electron microscope design
G. Kögel: Positron microscopy
W. Krakow: Sideband imaging
D. J. J. van de Laak-Tijssen and T. Mulvey (vol. 115): Memoir of J. B. Le Poole
C. Mattiussi (vol. 113): The finite volume, finite element and finite difference methods
J. C. McGowan: Magnetic transfer imaging
S. Mikoshiba and F. L. Curzon: Plasma displays
S. A. Nepijko, N. N. Sedov and G. Schönhense (vol. 113): Photoemission microscopy of magnetic materials
P. D. Nellist and S. J. Pennycook (vol. 113): Z-contrast in the STEM and its applications
K. A. Nugent, A. Barty and D. Paganin: Non-interferometric propagation-based techniques
E. Oesterschulze: Scanning tunnelling microscopy
M. A. O’Keefe: Electron image simulation
J. C. Paredes and G. R. Arce: Stack filtering and smoothing
C. Passow: Geometric methods of treating energy transport phenomena
E. Petajan: HDTV
F. A. Ponce: Nitride semiconductors for high-brightness blue and green light emission
J. W. Rabalais: Scattering and recoil imaging and spectrometry
H. Rauch: The wave-particle dualism
G. Schmahl: X-ray microscopy
J. P. F. Sellschop: Accelerator mass spectroscopy
S. Shirai: CRT gun design methods
T. Soma: Focus-deflection systems and their applications
I. Talmon: Study of complex fluids by transmission electron microscopy
I. R. Terol-Villalobos: Morphological image enhancement and segmentation
R. Tolimieri, M. An and A. Brodzik: Hyperspectral imaging
A. Tonazzini and L. Bedini: Image restoration
J. Toulouse: New developments in ferroelectrics
T. Tsutsui and Z. Dechun: Organic electroluminescence, materials and devices
Y. Uchikawa: Electron gun optics
D. van Dyck: Very high resolution electron microscopy
L. Vincent: Morphology on graphs
N. White (vol. 113): Multi-photon microscopy
C. D. Wright and E. W. Hill: Magnetic force microscopy
T. Yang (vol. 114): Cellular Neural Networks
Second-Generation Image Coding N. D. BLACK, R. J. MILLAR, M. KUNT, M. REID, and F. ZILIANI Information & Software Engineering, University of Ulster, Northern Ireland Computing & Mathematical Sciences, University of Ulster, Northern Ireland Swiss Federal Institute of Technology, Lausanne, Switzerland Kainos Software Ltd, Belfast, Northern Ireland
I. Introduction . . . 1
II. Introduction to the Human Visual System . . . 4
III. Transform-Based Coding . . . 8
   A. Overview . . . 8
   B. The Optimum Transform Coder . . . 9
   C. Discrete Cosine Transform Coder . . . 10
   D. Multiscale/Pyramidal Approaches . . . 15
   E. Wavelet-Based Approach . . . 19
   F. Edge Detection . . . 26
   G. Directional Filtering . . . 27
IV. Segmentation-Based Approaches . . . 31
   A. Overview . . . 31
   B. Preprocessing . . . 32
   C. Segmentation Techniques: Brief Overview . . . 34
   D. Texture Coding . . . 38
   E. Contours Coding . . . 39
   F. Region-Growing Techniques . . . 40
   G. Split-and-Merge-Based Techniques . . . 41
   H. Tree/Graph-Based Techniques . . . 43
   I. Fractal-Based Techniques . . . 44
V. Summary and Conclusions . . . 46
References . . . 50
I. Introduction

The thirst for digital signal compression has grown over the last few decades, largely as a result of the demand for consumer products, such as digital TV, commercial tools, such as visual inspection systems and video conferencing, as well as for medical applications. As a result, a number of "standards" have emerged that are in widespread use today, and which exploit some aspect of the particular image they are used on to achieve reasonable compression rates. One such standard is M-JPEG, which was originally developed for the compression of video images. It does this by treating each image as a separate still picture. It works by taking "blocks" of picture elements and processing them using a mathematical technique known as the discrete cosine transform (DCT), resulting in a set of digital data representing particular aspects of the original image. These data are then subject to "lossless" compression to further reduce their size before transmission. The technique is very effective but, as one might expect, affects the resulting image quality to a certain degree. At high data rates, for example, the process has the effect of enhancing picture contrast, whereas at low data rates it introduces "blocking" effects, which degrade the picture quality.

Successive compression techniques often build upon previous designs, as is the case with the MPEG standard, which encodes images rather like the M-JPEG standard but transmits information on the differences between successive image frames. In this way improved compression ratios can sometimes be achieved. The gain in compression is often at the expense of some other feature, such as quality. The MPEG standard, for example, offers higher compression than M-JPEG but produces a recovered picture that is not only less sharp but also subject to significant delays. The International Telecommunications Union (ITU) has defined a number of standards relating to digital video compression, all of which use the H.261 compression standard. This technique is specifically designed for low-bandwidth channels and, as a result, does not produce images that could be considered of TV quality. Currently, the best compression techniques can produce about a 20:1 compression if the picture quality is not to be compromised. These "standards" are all based upon the so-called "first-generation" coding techniques.
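The block-DCT step described above can be sketched in a few lines of Python. This is an illustrative sketch only (the function names are ours, not part of any standard): it builds an orthonormal DCT-II basis with NumPy and applies it to a single 8 x 8 block, the block size used in DCT-based schemes such as M-JPEG.

```python
import numpy as np

N = 8  # typical block size in DCT-based coders

def dct_basis(n=N):
    """Orthonormal DCT-II basis matrix C, so that C @ C.T is the identity."""
    k = np.arange(n).reshape(-1, 1)          # frequency index (rows)
    m = np.arange(n).reshape(1, -1)          # sample index (columns)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)               # the DC row has a different scale
    return C

def block_dct(block):
    """2D DCT of one square block of pixels: coefficients = C B C^T."""
    C = dct_basis(block.shape[0])
    return C @ block @ C.T

def block_idct(coeffs):
    """Inverse 2D DCT: B = C^T coeffs C (valid because C is orthonormal)."""
    C = dct_basis(coeffs.shape[0])
    return C.T @ coeffs @ C
```

For a smooth block, most of the signal energy concentrates in the low-frequency (top-left) coefficients; it is this energy compaction that makes the subsequent lossless entropy coding effective.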
All exploit temporal correlation through block-based motion estimation and compensation techniques, while applying frequency transformation techniques (mainly the discrete cosine transform, DCT) to reduce spatial redundancy. There is a high degree of sophistication in these techniques, and a number of optimization procedures have been introduced that further improve their performance. However, the limits of these approaches have been reached, and further optimizations are unlikely to result in drastically improved performance.

First-generation coding schemes are based on classical information theory (Huffman, 1952; Golomb, 1966; Welch, 1977) and are designed to reduce the statistical redundancies present in the image data. These schemes exploit spatial and temporal redundancies in the video sequence at the level of a pixel or of a fixed-size block of pixels. The various schemes attempt to achieve the least possible coding rate for a given image distortion, and/or to minimize the distortion for a given bit rate. The compression ratios obtained with first-generation lossless techniques are moderate, at around 2:1. With lossy techniques a higher ratio (greater than 30:1) can be achieved, but at the expense of image quality.

The distortion introduced by the coding scheme is generally measured in terms of the mean square error (MSE) between the original image and its reconstructed version. Although MSE is a simple measure of distortion that is easy to compute, it is limited in characterizing the perceptual level of degradation of an image. New image quality measures are necessary for second-generation image coding techniques. These will be introduced and discussed later in this chapter.

Second-generation image coding was first introduced by Kunt et al. (1985). The work stimulated new research aimed at further improvement in compression ratios compared with those produced using existing coding strategies, whose performance has now reached saturation level. The main limitation of first-generation schemes compared with the second-generation approach is that first-generation schemes do not directly take into account characteristics of the human visual system (HVS) and hence the way in which images are perceived by humans. In particular, first-generation coding schemes ignore the semantic content of the image, simply partitioning each frame into artificial blocks. It is this that is responsible for the generation of the strong visible degradation referred to as blocking artifacts, because a block can cover spatially/temporally nonhomogeneous data belonging to different entities (objects) in the scene. Block partitioning also results in a reduced exploitation of spatial and temporal redundancies. In contrast, instead of limiting the image coding to a rigid block-based grid, second-generation approaches attempt to break down the original image into visually meaningful subcomponents. These subcomponents may be defined as image features and include edges, contours, and textures.
These are known to represent the most relevant information enabling the HVS to interpret the scene content (Cornsweet, 1970; Jain, 1989; Rosenfeld and Kak, 1982) and need to be preserved as much as possible to guarantee good perceptual quality of the compressed images. Second-generation coding techniques minimize the loss in terms of human perception so that, when decoded, the reconstructed image does not appear to be different from the original. For second-generation schemes, therefore, MSE is not sufficient as a measure of quality, and a new criterion is required to correctly estimate the distortion introduced.

As an alternative to edges, contours, and textures, the scene may be represented as a set of homogeneous regions or objects. This representation offers several advantages. First, each object is likely to present a high spatial and temporal correlation, improving the efficiency of the compression schemes much beyond the limits imposed by a block-based representation. Second, a description of the scene in terms of objects gives access to a variety of functions. For example, it is possible to assign a priority to each object and to distribute the available bit rate for a single frame accordingly. This functionality, referred to as scalability, enhances the quality of the objects of interest compared with those regions of less importance in the scene. Similarly, it is possible to apply to each object, according to its properties, the corresponding optimum coding strategy. This concept, referred to as dynamic coding (Ebrahimi et al., 1995), may further optimize the overall performance of the coding scheme. As suggested in Kunt (1998), future multimedia systems will strongly exploit all of these new functions; indeed, some have already been introduced in the new video coding standard MPEG-4 (Ebrahimi, 1997).

This chapter is organized into four main sections. In Section II a brief introduction to the human visual system is given, in which characteristics that may be exploited in compression systems are discussed. Throughout the text, and where appropriate, additional and more specific material is referenced. Sections III and IV present the main body of the chapter and cover transform-based techniques and segmentation-based techniques, respectively. Finally, we offer a summary and conclusions in Section V.
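The MSE distortion measure discussed above, together with the peak signal-to-noise ratio (PSNR) usually quoted alongside it, can be sketched as follows. The helper names are ours, and 8-bit images (peak value 255) are assumed:

```python
import numpy as np

def mse(original, reconstructed):
    """Mean square error between an image and its reconstruction."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    return np.mean(diff ** 2)

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means less distortion."""
    err = mse(original, reconstructed)
    if err == 0:
        return np.inf          # identical images
    return 10.0 * np.log10(peak ** 2 / err)
```

As the text points out, two reconstructions with the same MSE can look very different perceptually, which is precisely why second-generation schemes need quality criteria beyond this one.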
II. Introduction to the Human Visual System

We consider essentially two techniques for the coding of image information: transformation and segmentation. These techniques are essentially signal processing strategies, much of which can be designed to exploit aspects of the human visual system (HVS) in order to gain coding efficiency. Part of the process of imaging involves extracting certain elements or features from an image and presenting them to an observer in a way that matches their perceptual abilities or characteristics. In the case of a human observer, there are a number of sensitivities, such as those to amplitude, spatial frequency, and image content, that can be exploited as a means of improving the efficiency of image compression. In this section, we shall introduce the reader to the HVS by identifying some of its basic features, which can be exploited fruitfully in coding strategies. We shall consider quantitatively those aspects of the HVS that are generally important in the imaging process. More explicit information on the HVS, as it relates to specific algorithms described in the text, is referenced throughout.

The HVS is part of the nervous system and, as such, presents a complex array of highly specialized organs and biological structures. Like all other biological organs, the eye, which forms the input to the HVS, consists of highly specialized cells that react with specific yet limited functionality to input stimuli in the form of light. The quantitative measure of light power is luminance, measured in units of candela per meter squared (cd/m²). The luminance of a sheet of white paper reflecting bright sunlight is about 30,000 cd/m² and that in dim moonlight is around 0.03 cd/m²; a comfortable reading level is a page that radiates about 30 cd/m². As can be seen from these examples, the dynamic range of the HVS, defined as the range between luminance values so low as to make the image just discernible and those at which an increase in luminance makes no difference to the perception, is very large and is, in fact, in the order of 80 dB.

From a physical perspective, light enters the eye through the pupil, which varies in diameter between 2 and 9 mm, and is focused onto the imaging retina by the lens. Imperfections in the lens can be modeled by a two-dimensional (2D) lowpass filter, while the pupil can be modeled as a lowpass filter whose cut-off frequency decreases as the pupil enlarges. The retina contains the neurosensory cells that transform incoming light into neural impulses, which are then transmitted to the brain, where image perception occurs. It has two types of cells, cones and the slightly more sensitive rods. Both are responsible for converting the incoming light into electrical signals while compressing its dynamic range. The compression follows a nonlinear law of the form B = L^α, where B represents brightness, L represents luminance, and α is a compressive exponent.

Both the sensitivity and the resolution characteristics of the HVS are largely determined by the retina. At its center is an area known as the fovea, which consists mainly of cones separated by distances small enough to facilitate grating resolutions of up to 50 cycles/degree of subtended angle. The spatial contrast sensitivity of the retina depends on spatial frequency and varies according to luminance level. Figure 1, derived from Pearson, shows the results of this sensitivity at two luminance levels, 0.05 cd/m² and 500 cd/m².

The existence of the peaks in Figure 1 illustrates the important ability of the HVS to identify sharp boundaries, and also its limitations in identifying gradually changing boundaries. The practical effect of Figure 1 is that the HVS is very adept at identifying distinct changes in boundary, grayscale, or color, but important detail can easily be missed if the changes are more gradual. The rods and cones are interconnected in complex arrangements, and this leads to a number of perceptual characteristics, such as lateral inhibition: cells that are activated as a result of stimulation can be inhibited from firing by other activated cells in close proximity. The effect of this is to produce an essentially high-pass response, which is limited to below a radial spatial frequency of approximately 10 cycles/degree of solid angle, beyond which integration takes place. The combined result of lateral inhibition and the previously described processes makes this part of the HVS behave as a linear model with a bandpass frequency response.
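The compressive power law B = L^α mentioned above can be illustrated numerically. The exponent used here is an assumption (a cube-root law, α = 1/3, is a common textbook approximation); the point is only that a very large luminance ratio maps to a much smaller brightness ratio:

```python
ALPHA = 1.0 / 3.0   # assumed exponent; the text gives only the form B = L^alpha

def brightness(luminance, alpha=ALPHA):
    """Perceived brightness under a simple power-law model B = L**alpha."""
    return luminance ** alpha

# Luminance values quoted in the text: dim moonlight vs bright sunlight.
dim, bright = 0.03, 30000.0
# The luminance ratio is 10^6, but the brightness ratio is only
# (10^6)^(1/3) = 100: the retina compresses the dynamic range.
```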
Figure 1. Typical contrast sensitivity of the eye for sine-wave gratings. Evidently the perception of fine detail is dependent on luminance level; this has practical implications, for example on the choice between positive and negative modulation for the display of particular types of image.
The majority of the early work on vision research used the frequency sensitivity of the HVS as described by the modulation transfer function (MTF). This characterizes the degree to which the system or process can image information at any spatial frequency. The MTF is defined, for any stage of an imaging system, as the ratio of the amplitudes of the imaged and the original set of spatial sine-waves representing the object being imaged, plotted as a function of sine-wave frequency. Experiments by Mannos and Sakrison (1974) and Cornsweet (1970) led to a now commonly used model for this function, which relates the sensitivity of the eye to sine-wave gratings at various frequencies. Several authors have made use of these properties (Jangard Rajola, 1990, 1991; Civinlar et al., 1986) in preprocessing strategies, particularly when employing segmentation, where images are preprocessed to take account of the HVS’s greater sensitivity to gradient changes in intensity and threshold boundaries (Civinlar et al., 1986; Marqués et al., 1991).
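The Mannos and Sakrison (1974) model referred to above is usually quoted in the form A(f) = 2.6 (0.0192 + 0.114 f) exp(-(0.114 f)^1.1), with f in cycles/degree. A sketch follows; the coefficient values are the commonly cited ones, reproduced from memory rather than from this chapter, so treat them as indicative:

```python
import numpy as np

def csf(f):
    """Mannos-Sakrison contrast sensitivity model; f in cycles/degree."""
    return 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f) ** 1.1)

# The model is bandpass: sensitivity peaks at a moderate spatial
# frequency (around 8 cycles/degree) and falls off at both ends.
freqs = np.linspace(0.1, 60.0, 600)
peak_freq = freqs[np.argmax(csf(freqs))]
```

A coder can exploit this curve directly, quantizing coefficients at frequencies where the eye is insensitive more coarsely than those near the peak.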
Figure 2. Dependence of the threshold contrast ΔL/L (the Weber ratio) on the size of an observed circular disc object, for two levels of background luminance, with zero noise and a 6 s viewing time. The effect of added noise and/or a shorter viewing time will generally be to increase the threshold contrast relative to the levels indicated here (after Blackwell (1996)).
A considerable amount of research has been carried out to determine the eye’s capability in contrast resolution (the interested reader is referred to Haber and Hershenson, 1973, for detailed information). The contrast resolution threshold is given as ΔL/L, where L is the luminance level of a given image and ΔL is the difference in luminance level that is just noticeable to an observer. This ratio, known as the Weber ratio, is a function of the light falling on the retina and can vary considerably. Under ideal conditions, plots of the Weber ratio indicate that the eye is remarkably efficient at resolving small differences in grayscale level. The response of the HVS is a function of both spatial and temporal frequency, as shown in Fig. 2. Measurements by Blackwell (1996) on the joint spatiotemporal sensitivity indicate that this joint sensitivity is not separable. As shown in Figure 2, distinct peaks in the individual bandpass characteristics appear in both cases.

In the following sections, algorithms derived to facilitate the coding of images will exploit aspects of the HVS with the intention of improving coding efficiency. For lossless transformations the gain does not always take the form of significant compression, but tasks later in the coding sequence, such as quantization and ordering, are made easier if the properties of the HVS are properly exploited.
III. Transform-Based Coding

A. Overview

This section introduces and describes some of the most frequently used coding techniques based on image transformation. Such a scheme basically consists of two successive steps: image decomposition/transformation, and quantization/ordering of the transform coefficients. The general structure of all of these techniques is summarized in Figure 3.

Figure 3. A generic transform coding scheme. The image is first transformed according to the chosen decomposition/transformation function. Then a quantization step, optionally followed by a reordering, provides a series of significant coefficients that are converted into a bit-stream after a bit assignment step.

The basic idea exploited by these techniques is to find a more compact representation of the image content. This is initially achieved by applying a decomposition/transformation step; different decompositions/transformations can be applied. Most transformation techniques (discrete cosine transform, pyramidal decomposition, wavelet decomposition, etc.) distinguish low-frequency contributions from high-frequency contributions. This is a first approximation of what happens in the HVS, as described in Section II. Transformations that are more accurate from an HVS-model point of view are applied in the directional-filtering-based techniques (see Section III.G). Here frequency responses in some preferred spatial directions are used to describe the image content. Generally all of these transformations are lossless; thus they do not by themselves achieve a significant compression of the image. However, the resultant transformed image has the property of highlighting the features that are significant in the HVS model, easing the task of quantizing and ordering the obtained coefficients according to their visual importance. The real compression is obtained in the quantization/ordering step and the following entropy-based coding. Here the continuous transform coefficients are first projected onto a finite set of symbols, each representing a good approximation of the coefficient values. Several quantization methods are possible, from the simplest uniform quantization to the more complex vector quantization (VQ). A reordering of nonzero coefficients is generally performed after the quantization step. This is done to better exploit the statistical occurrence of nonzero coefficients and so improve the performance of the subsequent entropy coding. In a second generation coding framework, this ordering step is also responsible for deciding which coefficients are really significant and which can be discarded with minimum visual distortion. The criteria used to make this choice are based on the HVS model properties, and they balance the compromise between the quality of the final image and the compression ratio. The next section will review some of the most popular coding techniques that belong to this class, identifying the properties that make them second generation coding techniques.
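The two-stage structure just described (transform, then quantize/reorder, then entropy-code) can be sketched end to end. The uniform quantizer and the toy run-length stage below are illustrative stand-ins of our own (the names `quantize`, `run_length`, etc. are hypothetical), not any particular standard's coder:

```python
def quantize(coeffs, step):
    # Uniform quantization: map each continuous coefficient to an
    # integer symbol by dividing by the step size and rounding.
    return [round(c / step) for c in coeffs]

def dequantize(symbols, step):
    # Decoder side: each symbol approximates its original coefficient.
    return [s * step for s in symbols]

def run_length(symbols):
    # Toy "entropy" stage: collapse runs of zeros, which the reordering
    # step is meant to concentrate at the end of the scan.
    out, zeros = [], 0
    for s in symbols:
        if s == 0:
            zeros += 1
        else:
            if zeros:
                out.append(("Z", zeros))
                zeros = 0
            out.append(("V", s))
    if zeros:
        out.append(("Z", zeros))
    return out
```

The better the transform and reordering concentrate energy in a few coefficients, the longer the zero runs and the shorter the final bit-stream.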
First, a brief introduction to general distortion criteria is presented in Section III.B in order to define the optimum transform coder. Then the discrete cosine transform is discussed in Section III.C. Multiscale and pyramidal approaches are introduced in Section III.D, and wavelet-based approaches are discussed in Section III.E. Finally, techniques that make extensive use of edge information are reviewed in the last two sections, III.F and III.G.
B. The Optimum Transform Coder

In a transform-based coding scheme, the first step consists in transforming pixel values to exploit redundancies and to improve the compression rates
of successive entropy encoding. Once the optimality criterion is defined, it is possible to find an optimum transform for that particular criterion. In the framework of image coding, the most commonly used optimality criterion is defined in terms of mean square distortion (MSD), also referred to as mean square error (MSE), between the reconstructed and original images. For such a criterion, it has been shown that an optimum transform exists (Schalkoff, 1989; Burt and Adelson, 1983): the Karhunen-Loève (KL) transform (Karhunen, 1947; Loève, 1948). The KL transform depends on critical factors such as the second-order statistics as well as the size of the image. Due to these dependencies, the basis vectors are not known analytically and their definition requires heavy computation. As a result, the practical use of the KL transform in image coding applications is very limited. Although Jain (1976) proposed a fast algorithm to compute the KL transform, his method is limited to a specific class of image models and thus is not suitable for a general coding system. Fortunately, a good approximation of the KL transform that does not suffer from complexity problems exists: the discrete cosine transform presented in Section III.C. It is important to note that, from an HVS point of view, the MSE criterion is not necessarily optimal. Other methods have been considered and are currently under investigation for measuring the visual distortion introduced by a coding system (van den Branden Lambrecht, 1996; Winkler, 1998; Miyahara et al., 1992). They take into account the properties of the HVS in order to define a visual distance between the original and the coded image and thus to assess the image quality of that particular compression (Mannos and Sakrison, 1974). These investigations have already shown improvements in standard coding systems (Westen et al., 1946; Osberger et al., 1946).
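Under the MSE criterion, the KL basis is given by the eigenvectors of the data covariance matrix, sorted by decreasing variance. A minimal numpy sketch (the synthetic `blocks` data and the function name are our own, for illustration only):

```python
import numpy as np

def kl_basis(blocks):
    """Karhunen-Loeve basis for a set of signal blocks (one per row):
    eigenvectors of the sample covariance, sorted by decreasing
    eigenvalue (variance along each basis vector)."""
    X = np.asarray(blocks, dtype=float)
    X = X - X.mean(axis=0)                 # second-order statistics only
    cov = X.T @ X / len(X)
    eigvals, eigvecs = np.linalg.eigh(cov)  # ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Highly correlated data: almost all variance falls on the first
# basis vector, which is what makes the KL transform optimal for MSE.
rng = np.random.default_rng(0)
t = rng.normal(size=(500, 1))
blocks = t @ np.ones((1, 4)) + 0.01 * rng.normal(size=(500, 4))
vals, basis = kl_basis(blocks)
```

Note that the basis must be recomputed for every image (or image class), which is exactly the computational burden the text describes.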
Future research in this direction may provide more efficient criteria, and the corresponding new optimum transforms could improve the compression ratio without loss in visual image quality.
C. Discrete Cosine Transform Coder

The discrete cosine transform (DCT) coder is one of the most widely used in digital image and video coding. Most of the standards available today, from JPEG to the latest MPEG-4, are based on this technique to perform compression. This is due to the good compromise between computational complexity and coding performance that the DCT is able to offer. A general scheme for a coding system based on the DCT is presented in Fig. 4. The first step is a Block Partitioning of the image, which is divided into N×N pixel blocks f[x, y], where N needs to be defined.
Figure 4. A generic scheme for DCT coding. First the image is divided into 8 × 8 blocks of pixels. Each block is then transformed using the DCT. A quantization step performs the real compression of the data. Finally, a zig-zag scanning from the DC component to the AC components is performed.
Typical values for N are 8 or 16. A larger block size may lead to more efficient coding, as the transform may access more highly correlated data from the image; however, larger blocks also increase the computational cost of the transform, as will be explained. Better compression efficiency can be achieved by using a combination of blocks of different shapes, as suggested by Dinstein et al. (1990). Clearly, this method increases the overall complexity of the coding scheme. In the international standard JPEG, N has been chosen equal to eight; thus the following examples will use the same value. Once the Block Partitioning step is performed, each block is coded independently from the others by applying the DCT. The result of this step is a block, F[u, v], of N×N transformed coefficients. Theoretically, the 2D DCT block, F[u, v], of the N×N image block, f[x, y], is defined according to the following formula:

F[u, v] = \frac{2}{N} C(u) C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f[x, y] \cos\frac{(2x+1)u\pi}{2N} \cos\frac{(2y+1)v\pi}{2N}

where

C(z) = \begin{cases} 1/\sqrt{2}, & z = 0 \\ 1, & \text{otherwise} \end{cases}

for z = u and z = v. Its inverse transform is then given by

f[x, y] = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C(u) C(v) F[u, v] \cos\frac{(2x+1)u\pi}{2N} \cos\frac{(2y+1)v\pi}{2N}

with C(u) and C(v) defined as before. Intuitively, DCT coefficients represent the spatial frequency components of the image block. Each coefficient is a weight that is applied to an appropriate basis function. In Fig. 5 we display the basis functions for an 8 × 8 DCT block. The DCT has some very interesting properties. First, both the forward and inverse DCT are separable. Thus, instead of computing the 2D transform, it is possible to apply a one-dimensional (1D) transform along all the rows of the block and then down the columns of the block. This reduces the number of operations to be performed. As an example, a 1D 8-point DCT requires an upper limit of 64 multiplications and 56 additions, so a 2D 8 × 8-point DCT computed as a set of 8 rows and 8 columns requires 1024 multiplications and 896 additions. Secondly, it can be observed that the transform kernel is a real function. In coding, this is an interesting property because only the real part for each
Figure 5. The basis functions for an 8 × 8 DCT block.
transform coefficient needs to be coded. This is not necessarily true for other transformations. Finally, fast transform techniques (Chen et al., 1977; Narasimha and Peterson, 1978) that take advantage of the symmetries in the DCT equation can further reduce the computational complexity of this technique. For example, the cosine transform of an N × 1 vector can be performed in O(N log N) operations via an N-point FFT. Computational efficiency is not the only important feature of the DCT. Of primary importance is its ability to perform a good energy compaction of the transform coefficients. Here too the DCT performs well; it is verified in practice that the DCT tends towards the optimal KL transform for highly correlated signals, such as natural images, that can be modeled by a first-order Markov process (Caglar et al., 1993). Once the DCT has been computed, it is necessary to perform a quantization of the DCT coefficients. At this point of the scheme, no compression has been performed; the quantization procedure will introduce it. Quantization is an important step in coding and, again, it can be performed in several ways. Its main role is to minimize the average distortion introduced for a fixed desired entropy rate (Burt and Adelson, 1993). In practice, it makes more values look the same, so that the subsequent entropy-based coding can improve its performance while coding the DCT coefficients.
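The transform pair and the separability property discussed above can be checked numerically with a short numpy sketch; the function names are our own, and the matrix normalization below is chosen to match the (2/N)C(u)C(v) scaling of the formulas:

```python
import numpy as np

def dct_matrix(N):
    # Orthonormal DCT-II matrix: entry [u, x] = c(u) cos((2x+1)u*pi/(2N)),
    # with c(0) = sqrt(1/N) and c(u) = sqrt(2/N) otherwise; this matches
    # the (2/N) C(u) C(v) normalization of the 2D formula in the text.
    u = np.arange(N)[:, None]
    x = np.arange(N)[None, :]
    T = np.sqrt(2.0 / N) * np.cos((2 * x + 1) * u * np.pi / (2 * N))
    T[0, :] /= np.sqrt(2.0)
    return T

def dct2(block):
    # Separability: a 1D DCT along the rows followed by one down the
    # columns, written as two matrix products.
    T = dct_matrix(block.shape[0])
    return T @ block @ T.T

def idct2(coeffs):
    # The matrix is orthonormal, so the inverse transform uses its transpose.
    T = dct_matrix(coeffs.shape[0])
    return T.T @ coeffs @ T
```

A constant 8 × 8 block transforms to a single nonzero DC coefficient, which is the energy compaction property in its most extreme form.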
TABLE 1
Example of a JPEG Quantization Table

16  11  10  16  24  40  51  61
12  12  14  19  26  58  60  55
14  13  16  24  40  57  69  56
14  17  22  29  51  87  80  62
18  22  37  56  68 109 103  77
24  35  55  64  81 104 113  92
49  64  78  87 103 121 120 101
72  92  95  98 112 100 103  99
As discussed in Section III.B, a distortion measure can be defined either in the MSE sense or in the more interesting HVS sense. State-of-the-art coding techniques are in general based on the former. In this context, the uniform quantizer is optimal or quasi-optimal for most cases (Burt and Adelson, 1993). A quantizer is said to be uniform when the same distance uniformly separates all the quantization thresholds. The simplicity of this method and its near-optimal performance are the reasons why the uniform quantizer is so widely used in coding schemes and in standards such as JPEG and MPEG. In particular, in these standards a quantization table is constructed to define a quantization step for each DCT coefficient. Table 1 shows the quantization table used in the JPEG standard. Each DCT coefficient is divided by the corresponding quantization step, which dynamically weights its influence. From a perceptual point of view the MSE optimality criterion is not relevant; therefore, in second generation coding other techniques are proposed, such as those described by Macq (1989) and van den Branden (1996). After the quantization is performed, the next step is a reordering of the DCT coefficients in a zig-zag scanning order. This procedure parses the coefficients from the upper left position, which represents the DC coefficient, to the lower right position of the DCT block, which represents the highest-frequency AC coefficient. The exact order is represented in detail in Fig. 4.
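The two steps just described can be sketched as follows; `Q` is the luminance table of Table 1, and the helper names are our own illustration rather than any standard's API:

```python
import numpy as np

# JPEG luminance quantization table (Table 1).
Q = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])

def quantize_block(F):
    # Each DCT coefficient is divided by its quantization step and rounded.
    return np.rint(F / Q).astype(int)

def zigzag(block):
    # Scan anti-diagonals from the DC (top-left) coefficient to the
    # highest-frequency AC (bottom-right) coefficient, reversing direction
    # on alternate diagonals, so zero/small high-frequency values cluster
    # at the end of the sequence.
    N = block.shape[0]
    out = []
    for s in range(2 * N - 1):
        idx = [(i, s - i) for i in range(N) if 0 <= s - i < N]
        if s % 2 == 0:
            idx = idx[::-1]
        out.extend(block[i, j] for i, j in idx)
    return out
```

The first few scanned positions are (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), matching the order drawn in Fig. 4.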
The zig-zag reordering is justified by a hypothesis on the knowledge of both natural images and HVS properties. In fact, it is known that most of the energy in a natural image is concentrated in the low-frequency (DC) components, whereas the high-frequency (AC) components are both less likely to occur and less important from a visual point of view. This is why we can expect most of the nonzero coefficients to be concentrated near the DC component. Ordering the coefficients in such a way that all zero or small coefficients are concentrated at the end improves the performance of entropy-based coding techniques by generating a distribution as far as possible from the uniform distribution. In Fig. 6 the results obtained by applying a JPEG-compliant DCT-based compression scheme to the Lena image are shown. The image is in QCIF format (176 × 144 pixels) and is shown in both color and black-and-white format. Four different visual quality results are represented. Each corresponds to a different compression ratio, as indicated. The higher the compression ratio, the worse the visual quality, as might be expected. Blocking artifacts are evident at very low bit rates, highly degrading the image quality.

Figure 6. Visual performance at different bit rates of a JPEG-compliant, DCT-based compression scheme. The top two images are the originals. The others represent the results obtained at different bit rates.

D. Multiscale/Pyramidal Approaches

Multiscale and pyramidal coding techniques represent an alternative to the block quantization approach based on the DCT (see Section III.C). Both approaches perform a transformation and a filtering of the image in order to compact the energy and improve the coding performance. However, multiscale/pyramidal coding techniques operate on the whole original image instead of operating on blocks of limited dimension. In particular, the image is filtered and subsampled in order to produce various levels of image detail at progressively smaller resolutions. An interesting property of this approach, when compared with the DCT, is the possibility of progressive transmission of the image, as will be described later. Moreover, the fact that no blocks are introduced avoids the generation of blocking artifacts, which represent one of the most annoying drawbacks of the DCT-based coding techniques. Multiresolution approaches have recently been of great interest to the video coding research community. From a complexity point of view, the approach of coding an image through successive approximations is often very efficient. From a theoretical point of view, it is possible to discover striking similarities with HVS models. In fact, experimental results have shown that the HVS uses a multiresolution approach (Schalkoff, 1989) in completing its tasks. Research suggests that a multifrequency channel decomposition seems to take place in the human visual cortex, thereby causing the visual information to be processed separately according to its frequency band (Wandell, 1995). Similarly, the retina is decomposed into several frequency band sensors with uniform bandwidth on an octave scale. All of these considerations justify the keen interest shown by researchers in this direction. In 1983, Burt and Adelson (1983) presented a coding technique based on the Laplacian pyramid. In this approach, a lowpass filtering of the original image is performed as a first step. This is obtained by applying a weighted average function (H). Next, a down-sampling of the image is performed. These two steps are repeated in order to produce progressively smaller images, in both spatial intensity and dimension. Together, the results of these transformations represent the Gaussian Pyramid, represented in Figure 7 by the three top images G0, G1, and G2. Each level of the Gaussian Pyramid, starting from the lowest, smallest level, is interpolated to the size of its predecessor in order to produce the Laplacian Pyramid. In terms of the coding, it is the Laplacian Pyramid, instead of the image itself, which is coded. As in the DCT-based coding method, the original image has been transformed into a specific structure in which each level of the pyramid has a different visual importance. The smallest level of the Gaussian Pyramid represents the roughest representation of the image. If greater quality is required, then successive levels of the Laplacian Pyramid need to be added. If the complete Laplacian Pyramid is available, a perfect reconstruction of the image is possible through the process of adding, with appropriate interpolation, all the different levels from the smallest resolution to the highest. This structure makes progressive transmission particularly simple. As in the DCT-based approach, the real coding process is represented by the successive step: the quantization of each level of the pyramid.
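A minimal 1D sketch of the pyramid construction and its exact reconstruction follows; the two-tap average and sample repetition used here are simple stand-ins for Burt and Adelson's 5-tap generating kernel and interpolation, and the function names are our own:

```python
import numpy as np

def reduce_level(x):
    # Lowpass (two-tap average) followed by down-sampling by 2.
    return (x[0::2] + x[1::2]) / 2.0

def expand(x, n):
    # Interpolate back to length n (here by simple sample repetition).
    return np.repeat(x, 2)[:n]

def laplacian_pyramid(g0, levels):
    # Gaussian pyramid: repeated lowpass + down-sample.
    # Laplacian level: difference between a Gaussian level and the
    # expanded version of the next (coarser) one.
    gaussian, laplacian = [np.asarray(g0, dtype=float)], []
    for _ in range(levels):
        g_next = reduce_level(gaussian[-1])
        laplacian.append(gaussian[-1] - expand(g_next, len(gaussian[-1])))
        gaussian.append(g_next)
    return gaussian, laplacian

def reconstruct(g_top, laplacian):
    # Add back each difference level, from coarsest to finest.
    g = g_top
    for lap in reversed(laplacian):
        g = expand(g, len(lap)) + lap
    return g
```

Because each Laplacian level is defined as exactly the expansion residual, reconstruction is lossless before quantization, and transmitting the levels coarse-to-fine gives the progressive behavior described in the text.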
Again, uniform quantization is the technique preferred by the authors. They achieve this by simply dividing the range of pixel values into bins of a set width; quantization then occurs by representing each pixel value that falls within a bin by the bin centroid. Different compression ratios can be achieved by increasing or decreasing the amount of quantization. As before, there is a trade-off between high compression and visual quality. Burt and Adelson (1983) attempted to exploit areas in the image that are largely similar. These similar areas appear at various resolutions; hence, when the subsampled image is expanded and subtracted from the image at the next higher resolution, the difference image (L0) contains large areas of zeros, indicating commonality between the two images. These areas can be seen in Fig. 7 as the dark zones in L0 and L1. The larger the degree of commonality, the greater the number of zero areas in the difference
Figure 7. The scheme of the Gaussian Pyramid approach proposed by Burt and Adelson (1983). G0 is the original image to be coded. A first lowpass filtering, followed by a down-sampling, generates G1, which is a smaller version of G0 in both spatial intensity and dimension. Iterating this process a certain number of times builds the Gaussian Pyramid, represented here by G0, G1, and G2. This is used to generate the Laplacian Pyramid, which is what is actually coded. L0 and L1 represent two levels of the Laplacian Pyramid in this scheme.
images. Standard first-generation coding methods can then be applied to the difference images to produce good compression ratios. With very good image quality, compression ratios of the order of 10:1 are achievable. Other techniques exist that are based on a similar pyramidal approach. Of particular interest in a second generation coding context are those based on mathematical morphology (Salembier and Kunt, 1992; Zhou and Venetsanopoulos, 1992; Toet, 1989). These techniques provide an analysis of the image based on object shapes and sizes; thus, they include features that are relevant to the HVS. Their advantage is that they do not suffer from ringing effects (cf. Section III.G), even under heavy quantization. However, they produce residual images that still have large entropy and thus cannot be efficiently compressed by first generation coding schemes. Moreover, the residual images obtained are the same size as the original image. These drawbacks prevent a practical application of these techniques in image coding. A more detailed discussion of these techniques will be presented in Section III.G.
E. Wavelet-Based Approach

Although the wavelet transform-based coding approach is a generalization of the multiscale/pyramidal approaches, it deserves to be treated separately. The enormous success it has obtained in the image coding research community and its particular compatibility with the second generation coding philosophy provide the rationale for a more extensive discussion of this category in this overview. Moreover, the future standard for still image compression, JPEG2000, will be based on a wavelet coding system. The wavelet transform is the most commonly used transform in the current domain of research: subband coding (SBC). The idea is similar to that of the Gaussian Pyramid already described here, but much more general. Using the wavelet transform, instead of computing only a lowpass version of the original image, a complete set of subbands is computed by filtering the input image with a set of bandpass filters. In this way, each subband directly represents a particular frequency range of the image spectrum. A primary advantage of this transform is that it does not increase the number of samples over that of the original image, whereas pyramidal decompositions do. Moreover, wavelet-based techniques are able to efficiently preserve important perceptual information such as edges, even if their energy contribution to the entire image is low. Other transform coders, like those based on the DCT, decompose images into representations where each coefficient corresponds to a fixed-size spatial area and frequency band. Edge
information would require many nonzero coefficients to represent it sufficiently. At low bit rates, other transform coders allocate too many bits to signal behavior that is more localized in the time or space domain, and not enough bits to edges. Wavelet techniques therefore offer benefits at low bit rates, since information at all scales is available for edges and regions (Shapiro, 1993). Another important characteristic of wavelets in a second generation coding framework is that each subband can be coded separately from the others. This provides the possibility of allocating the total available bit rate according to the visual importance of each subband. Finally, SBC does not suffer from the annoying blocking artifacts reported for DCT coders. However, it does suffer from an artifact specific to the wavelet transform: ringing. This effect occurs mainly around high-contrast edges and is due to the Gibbs phenomenon of linear filters. The effect of this phenomenon varies according to the specific filter bank used for the decomposition. By taking into account properties of the HVS, SBC and, in particular, wavelet transforms make it possible to achieve high compression ratios with very good visual quality. Moreover, as with the pyramidal approach proposed by Burt and Adelson (1983), they permit a progressive transmission of images through the hierarchical structure they possess. As a general approach, the concept of subband decomposition on which wavelets are based was originally introduced in the speech-coding domain by Crochiere et al. (1976) and Croisier et al. (1976). Later, Smith and Barnwell (1986) proposed a solution to the problem of perfect reconstruction for a 1D multirate system. In 1984, Vetterli (1984) extended perfect reconstruction filter bank theory to two-dimensional signals. In 1986, Woods and O'Neil (1986) proposed the 2D separable quadrature mirror filter (QMF) banks that introduced this theory into the image-coding domain.
The most commonly used filter banks are the QMFs proposed by Johnston (1980). These are 2-band filter banks that minimize a weighted sum of the reconstruction error and the stopband energy of each filter. Fig. 8 represents a generic scheme for 2-band filter banks. As they exhibit linear phase characteristics, these filters are of particular interest to the research community; however, they do not allow perfect reconstruction. An alternative is represented by the conjugate quadrature filters (CQF) proposed by Smith and Barnwell (1986). These allow perfect reconstruction, but do not have linear phase. M-band filters also exist as an alternative to quadrature filters (Vaidyanathan, 1987); however, the overhead introduced by their more complex design and computation has not helped the diffusion of these filters in the coding domain. Finally, some attempts to define filter banks that take further account of HVS properties have been pursued by Caglar et al. (1993) and Akansu et al. (1993).
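The Haar pair is the simplest example of a 2-band bank with perfect reconstruction, and serves here only as an illustrative stand-in for the QMF/CQF designs discussed above (it is orthogonal, so analysis and synthesis use the same coefficients):

```python
import numpy as np

def analysis(x):
    # Haar 2-band analysis: lowpass and highpass outputs, each
    # down-sampled by 2, so the sample count is preserved overall.
    s = 1.0 / np.sqrt(2.0)
    low = s * (x[0::2] + x[1::2])
    high = s * (x[0::2] - x[1::2])
    return low, high

def synthesis(low, high):
    # Up-sample and combine; for Haar the reconstruction is exact
    # (Y(z) equals X(z), with no residual aliasing).
    s = 1.0 / np.sqrt(2.0)
    x = np.empty(2 * len(low))
    x[0::2] = s * (low + high)
    x[1::2] = s * (low - high)
    return x
```

The orthonormal scaling also preserves energy across the two bands, which is what makes per-band bit allocation meaningful.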
Figure 8. Generic scheme for a 2-band analysis/synthesis system. H0 and G0 are, respectively, the analysis and synthesis lowpass filters. H1 and G1 represent the equivalent high-pass filters. Perfect reconstruction is achievable when Y(z) is a delayed version of X(z).
Among the complete set of subband filters that have been developed, an important place in second generation image coding is held by the wavelet decomposition. This approach takes into consideration the fact that most of the power in natural images is concentrated in the low frequencies. Thus a finer partitioning of the lowpass band is performed. This is achieved by a tree-structured system, as represented in Fig. 9. A wavelet decomposition is a hierarchical approach: at each level the available frequency band is decomposed into four subbands using a 2-band filter bank applied both to the lines and to the columns.

Figure 9. A depth-2 wavelet decomposition. On the right, the 2-level tree structure is represented. X is the original image. LH represents a first-level lowpass filter in the horizontal direction and high-pass filter in the vertical direction. The other filter labels obey the same convention. Since each filtering is followed by a down-sampling step, the complete decomposition can be represented with as many coefficients as the size of the original image, as shown on the left-hand side of this figure.

This procedure is repeated until the energy contained in the lowest subband (LL) is less than a prefixed threshold, determined according to an HVS model hypothesis. In Fig. 10 the results of applying a wavelet decomposition to the test image Lena are represented.

Figure 10. An example of wavelet decomposition. The filter used to decompose the image "Lena" (512 × 512) is a 2-level biorthogonal Daubechies 9/7 filter.

A generic scheme for a wavelet transform coder is represented in Fig. 11. The different implementations reported in the literature differ according to the wavelet representation, the method of quantization, or the final entropy encoder.
Figure 11. A generic scheme for wavelet encoders. First a wavelet representation of the image is generated; then a quantization of the wavelet coefficients has to be performed. Finally, an entropy encoder is applied to generate the bit-stream. Several choices for each step are available; some of the most common ones are listed in the figure.
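One level of the separable decomposition of Fig. 9 can be sketched with Haar filters (an illustrative stand-in for the 9/7 filters of Fig. 10; the function name is our own):

```python
import numpy as np

def dwt2_haar(img):
    """One level of a separable 2D Haar decomposition: filter and
    down-sample along one direction, then the other, yielding the
    LL, LH, HL and HH subbands. Together the four subbands hold exactly
    as many coefficients as the input image."""
    s = 1.0 / np.sqrt(2.0)
    # Filter + down-sample along the columns (horizontal direction).
    lo = s * (img[:, 0::2] + img[:, 1::2])
    hi = s * (img[:, 0::2] - img[:, 1::2])
    # Filter + down-sample along the rows (vertical direction).
    LL = s * (lo[0::2, :] + lo[1::2, :])
    LH = s * (lo[0::2, :] - lo[1::2, :])
    HL = s * (hi[0::2, :] + hi[1::2, :])
    HH = s * (hi[0::2, :] - hi[1::2, :])
    return LL, LH, HL, HH
```

Recursing on the LL band produces the tree of Fig. 9, with each further level refining the partition of the low-frequency band where most natural-image power lies.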
Among the different families of wavelet representations, it is worth noting the compactly supported orthogonal wavelets. These belong to the more general family of orthogonal wavelets that generate orthonormal bases of L²(R). The important feature of the compactly supported orthogonal wavelets is that in the discrete wavelet transform (DWT) domain they correspond to finite impulse response (FIR) filters; thus they can be implemented efficiently (Mallat, 1989; Daubechies, 1993, 1998). In this family the Daubechies wavelets and Coifman's wavelets are popular. An important drawback of compactly supported orthogonal wavelets is their asymmetry, which is responsible for the generation of artifacts at the borders of the wavelet subbands, as reported by Mallat (1989). To avoid this drawback, he also investigated noncompact orthogonal wavelets; however, these do not represent an efficient alternative, due to their complex implementation. An alternative wavelet family that presents symmetry properties is the biorthogonal wavelet representation. This representation also offers efficient implementations, and thus it has been adopted in several wavelet image coders. The example shown in Figure 10 was generated using a wavelet belonging to this family. There has been some work carried out in an attempt to define methods that identify the best wavelet basis for a particular image. In this framework a generalized family of multiresolution orthogonal or biorthogonal bases that includes wavelets has been introduced; these are regrouped, following Lu et al. (1996), in the wavelet packets family. Different authors have proposed entropy- or rate-distortion-based criteria to choose the best basis from this wide family (Coifman and Wickerhauser, 1992; Ramchandran and Vetterli, 1993).
In a second generation image coding framework, of particular interest is the research carried out on the zero-crossings and local maxima of wavelet transforms (Mallat, 1991; Froment and Mallat, 1992). These techniques directly introduce into the wavelet framework the concepts of edges and contours, which are so important in the HVS (Croft and Robinson, 1994; Mallat and Zhong, 1991). More detail on this approach will be given in Section III.F. The choice of the wavelet to be used is indeed a key issue in designing a wavelet image coder. The preceding short discussion shows that many different choices are available; not all of them directly take HVS considerations into account. These can, however, be introduced in the subsequent quantization step of the coding process. As was discussed, a wavelet representation generates, for each image, 3D + 1 subbands, where D is the number of levels of the decomposition (dyadic scales). Each subband shows different statistical behavior; thus it is important to apply an optimized quantization to each of them.
As already discussed in Section III.C for the DCT, and as reported by Jain (1989), the uniform quantizer is a quasi-optimal solution under an MSE criterion. In this case we simply need to define a quantizer step for each subband. Note that this solution is similar to the one used in the JPEG standard, where each coefficient in the DCT block is associated with a different quantization step (see Table 1). This choice is used in practice by a well-known software package, EPIC (Simoncelli and Adelson). Here an initial step size is defined and divided by a factor of two at each coarser level of the wavelet decomposition. Thus the lowest subband, which provides most of the visual information, is finely quantized with the smallest step size. Other methods increase the compression by mapping small coefficients in the highest frequency bands to zero. Research has also been performed aimed at the design of HVS-based quantizers. In particular, Lewis and Knowles (1992) designed a quantizer that considers the HVS's spectral response, noise sensitivity in background luminance, and texture masking. For scalar quantization (SQ), uniform quantization performs well; alternatives are represented by the vector quantization (VQ) methods. Generally VQ performs better than SQ, as discussed in Senoo and Girod (1992). The principle is to quantize vectors or blocks of coefficients instead of individual coefficients. This generalization of SQ takes the possible correlation between coefficients into account already at the quantization step. Cicconi et al. (1994) describe a pyramidal vector quantization that accounts for correlation between subbands that share the same frequency orientation. Thus both intra- and interband correlations are taken into account during the quantization process.
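The EPIC-style step-size rule and the zeroing of small high-frequency coefficients described above can be sketched as follows (function names and the dead-zone parameter are our own illustration, not EPIC's actual interface):

```python
def subband_steps(base_step, levels):
    # Per-level step sizes: start from base_step at the finest level and
    # divide by two at each coarser level, so the lowest (most visually
    # important) subband gets the smallest quantization step.
    return [base_step / (2 ** d) for d in range(levels)]

def quantize_subband(coeffs, step, dead_zone=0.0):
    # Uniform scalar quantizer with an optional dead zone that maps
    # small coefficients to zero, as done in the highest-frequency
    # bands to raise the compression ratio.
    out = []
    for c in coeffs:
        out.append(0 if abs(c) <= dead_zone else round(c / step))
    return out
```

An HVS-based quantizer in the spirit of Lewis and Knowles (1992) would additionally modulate `step` per subband by contrast sensitivity, background luminance, and texture masking.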
In the same contribution the authors also introduce a criterion for a perceptual quantization of the coefficients, which is particularly suited to second generation image coding techniques. Another possible solution in wavelet coders is represented by successive-approximation quantization. In this category, it is important to cite the method proposed by Shapiro (1993): the ‘‘embedded zerotree wavelet algorithm’’ (EZW). This method tries to predict the absence of significant information across the subbands generated by the wavelet decomposition. This is achieved by defining a zerotree structure. Starting from the lowest frequency subband, a father-children relationship is defined recursively through all the following subbands, as represented in Fig. 12. Basically, the quantization is performed by successive approximation across the subbands with the same orientation. Similar to the zig-zag scanning reported in Section III.C, a scanning of the different subbands is performed as shown in Fig. 13. This strategy turns out to be an efficient technique to code zero and nonzero quantized values.
Figure 12. Parent-children relationship defined by the EZW algorithm.
Figure 13. Zero-tree scanning order for a 3-scale QMF wavelet decomposition.
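The parent-children relationship of Fig. 12 can be sketched in coordinates: in a dyadic decomposition, a coefficient at (r, c) of a detail subband has four children at the next finer scale, and a zerotree symbol declares the whole descendant set insignificant at once (a sketch of the indexing only, not of the full EZW coder):

```python
# Illustrative sketch of the EZW parent-children indexing: a coefficient
# at subband-local position (r, c) has four children at (2r, 2c)..(2r+1,
# 2c+1) in the next finer subband of the same orientation.

def children(r, c):
    """Four children of a wavelet coefficient one scale finer."""
    return [(2 * r, 2 * c), (2 * r, 2 * c + 1),
            (2 * r + 1, 2 * c), (2 * r + 1, 2 * c + 1)]

def descendants(r, c, levels):
    """All descendants of (r, c) across `levels` finer scales: the set a
    single zerotree symbol declares insignificant."""
    frontier, found = [(r, c)], []
    for _ in range(levels):
        frontier = [kid for (i, j) in frontier for kid in children(i, j)]
        found += frontier
    return found

print(children(1, 2))            # [(2, 4), (2, 5), (3, 4), (3, 5)]
print(len(descendants(0, 0, 3))) # 4 + 16 + 64 = 84 coefficients covered
```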
Research by both Said and Pearlman (1996) and Taubman and Zakhor (1994), based on the same principle developed by Shapiro, provided even better coding performance. Their new techniques are known as set partitioning in hierarchical trees (SPIHT) (Said and Pearlman, 1996) and layered zero coefficient (LZC) (Taubman and Zakhor, 1994). Recently, new efforts have been devoted to the improvement of these coding techniques with special attention to both HVS properties and color components (Lai and Kuo, 1998a,b; Nadenau and Reichel, 1999). An interesting example is the technique proposed by Nadenau and Reichel (1999). This technique is based on an efficient implementation of the LZC method (Taubman and Zakhor, 1994). It applies the lifting-steps approach presented by Daubechies and Sweldens (1998) in order to reduce the memory and the number of operations required to perform the wavelet decomposition. It also performs a progressive coding based on the HVS model and includes color effects. The HVS model is used to predict the best possible bit allocation during the quantization step. In particular, the color image is converted into the opponent color space discussed by Poirson and Wandell (1993, 1996): this representation reflects the properties of color perception in the HVS better than the usual YCbCr representation. Finally, this technique produces a visually embedded bit-stream. This means not only that the quality improves as more bytes are received and that the transmission can be stopped at any time, but also that the partial results are always coded with the best visual quality.
F. Edge Detection Mallat and Zhong (1991) point out that in most cases the structural information required for recognition tasks is provided by the image edges. However, one major difficulty of an edge-based representation is to integrate all the image information into edges. Most edge detectors are based on local measurements of the image variations, and edges are generally defined as points where the image intensity has a maximum variation. Multiscale edge detection is a technique in which the image is smoothed at various scales and edge points are detected by a first- or second-order differential operator. The coding method presented involves two steps. First, the edge points considered important for visual quality are selected. Second, these are efficiently encoded. Edge points are chained together to form edge curves. Selection of the edge points is performed at scale 2², that is, from the image in the pyramidal structure that has been scaled by a factor of four. Boundaries of important structures often
generate long edge curves, so, as a first step, all edge curves whose lengths are smaller than a threshold are removed. Among the remaining curves, the ones that correspond to the sharpest discontinuities in the image are selected. This is achieved by removing all edge curves along which the average value of the wavelet transform modulus is smaller than a given amplitude threshold. After the removal procedures, it is reported that only 8% of the original edge points are retained; however, it is not clear whether this figure is constant for all images. Once the selection has been performed, only the edge curves at scale 2² are coded in order to save bits; the curves at the other scales are approximated from them. Chain coding is used to encode the edge curves at this scale. The compression ratio reported by Mallat and Zhong (1991) with this method is approximately 27:1 with good image quality. G. Directional Filtering Directional filtering is based on the relationship between the presence of an edge in an image and its contribution to the image spectrum. It is motivated by the existence of direction-sensitive neurons in the HVS (Kunt et al., 1985; Ikonomopoulos and Kunt, 1985). It can be seen that the contribution of an edge is distributed all over the spectrum; however, the highest frequency component lies in the direction orthogonal to that of the edge. It can also be seen that the frequency of the contribution diminishes as we turn away from this direction, until it vanishes at right angles to it. A directional filter is one whose frequency response covers a sector or part of a sector in the frequency domain. If f and g are spatial frequencies and r is the cut-off frequency of the lowpass filter, then the ideal frequency response of the ith directional filter of a set of n is given by:
G_i( f, g) = 1 if θ_i ≤ tan⁻¹(g/ f ) < θ′_i, and 0 otherwise,

with

θ_i = (i − 1)π/2n,   θ′_i = (i + 1)π/2n,

and | f |, |g| ≤ 0.5. A directional filter is a high-pass filter along its principal direction and a lowpass filter along the orthogonal direction. The directional filter response is modified, as in all filter design, by an appropriate window function
(Harris, 1978), to minimize the effect of the Gibbs phenomenon (Ziemer et al., 1989). When a discontinuous signal is approximated by the sum of a truncated trigonometric series, overshoots tend to appear near the discontinuity. This is referred to as the Gibbs phenomenon. An ideal filter can be viewed as a step or rectangular pulse waveform, that is, a discontinuous waveform. The reason for the overshoot at discontinuities can be explained using the Fourier transform. Consider a signal x(t) with a Fourier transform X( f ). Reconstructing x(t) from its lowpass part shows that:
x_W(t) = F⁻¹[X( f ) Π( f /2W )],

where

Π( f /2W ) = 1 when | f | ≤ W, and 0 otherwise.

According to the convolution theorem of Fourier transform theory,

x_W(t) = x(t) ∗ F⁻¹[Π( f /2W )] = x(t) ∗ (2W sinc 2Wt).
Bearing in mind that convolution is a folding-product, sliding-integration process, it can be seen that a finite value of W will always result in x(t) being viewed through the sinc window function; as W increases, more of the frequency content of the rectangular pulse is used in the approximation of x(t). In order to eliminate the Gibbs phenomenon it is important to modify the frequency response of the filter by a window function. There are many window functions available, each with a different frequency response. The frequency response of the chosen window function is convolved with the filter response. This ensures that the overall frequency response does not contain the sharp discontinuities that cause the ripple. In a general scheme using directional filters, n directional filters and one lowpass filter are required. An ideal lowpass filter has the following frequency response:

G₀( f, g) = 1 if √( f ² + g²) ≤ r, and 0 otherwise.
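A small numerical experiment (my illustration, not from the chapter) shows both the Gibbs overshoot of an ideal rectangular truncation and its suppression by a window; here a triangular (Fejér-type) taper of the series coefficients plays the role of the window function:

```python
import math

# A unit square wave approximated by N odd harmonics: the sharp cutoff
# overshoots by ~9% near the jump, while a triangular taper of the same
# coefficients removes the ripple at the cost of a slower edge.

def partial_sum(t, N, window=False):
    """Fourier series of a unit square wave truncated to N odd harmonics;
    with window=True the coefficients get a triangular (Fejer) taper."""
    s = 0.0
    for k in range(N):
        n = 2 * k + 1
        w = 1.0 - n / (2.0 * N) if window else 1.0
        s += w * (4.0 / math.pi) * math.sin(n * t) / n
    return s

ts = [i * 0.001 for i in range(1, 3142)]   # half period (0, pi)
peak_rect = max(partial_sum(t, 20) for t in ts)
peak_win = max(partial_sum(t, 20, window=True) for t in ts)
print(peak_rect > 1.05)   # True: the Gibbs overshoot of the sharp cutoff
print(peak_win < 1.0)     # True: the taper suppresses the overshoot
```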
It should be noted that superposition of all the directional images and the lowpass image leads to an exact reconstruction of the original image. Two parameters are involved in the design of a directional filter-based image coding scheme: the number of filters and the cutoff frequency of the lowpass filter. The number of filters may be set a priori and is directly related to the
minimum width of the edge elements. The choice of lowpass cutoff frequency influences the compression ratio and the quality of the decoded image. As reported by Kunt et al. (1985), a very early technique ahead of its time was the synthetic highs system (Schreiber et al., 1959; Schreiber, 1963). Kunt states that the better-known approach of directional filtering is a refinement of the synthetic highs system. In this technique, the original image is split into two parts: the lowpass picture showing general area brightness and the high-pass image containing edge information. Two-dimensional sampling theory suggests that the lowpass image can be represented with very few samples. In order to reduce the amount of information in the high-pass image, thresholding is performed to determine which edge points are important. Once found, the location and magnitude of each edge point are stored. To reconstruct the compressed data, a 2D reconstruction filter, whose properties are determined by the lowpass filter used to produce the lowpass image, is used to synthesize the high-frequency part of the edge information. This synthesized image is then added to the lowpass image to give the final output. Ikonomopoulos and Kunt (1985) describe their technique for image coding based on the refinement of the synthetic highs system: directional filtering. Once the image has been filtered, the result is one lowpass image and 16 directional images. The coding scheme proposed is lossy since high compression is the goal. When the image is filtered with a high-pass filter the result gives zero-crossings at the locations of abrupt changes (edges) in the image. Each directional component is represented by the location and magnitude of the zero-crossings. Given that a small number of points result from this process, typically 6–10% of the total number of points, run-length encoding proves efficient for this purpose. The low frequency component can be coded in two ways.
As the maximum frequency of this component is small, it can be resampled according to the 2D sampling theorem and the resulting pixels can be coded in a standard way. Alternatively, transform coding may be used, with the choice of transform technique being controlled by the filtering procedure used. The transform coefficients may then be quantized and coded via Huffman coding (Huffman, 1952). The compression ratios obtained with this technique depend on many factors. The image being coded and the choice of cutoff frequency both play an important role in the final ratio obtained. The compression scheme can be adapted to the type of image being compressed. Zhou and Venetsanopoulos (1992) present an alternative spatial method called morphological directional coding. In their approach, spatial image features at known resolutions are decomposed using a multiresolution morphological technique referred to as the feature-width morphological pyramid (FMP). Zhou and Venetsanopoulos (1992) report that nontrivial
spatial features, such as edges, lines, and contours within the image, determine the quality of the reproduced image for the human observer. It was this fact that motivated them to include a stage in their coding technique that identifies these nontrivial features so that they may be coded separately. Morphological directional coding schemes were developed to preserve nontrivial spatial features in the image during the coding phase. Such filtering techniques are used for feature separation, as they are spatial methods that are capable of selectively processing features of known geometrical shapes. A multiresolution morphological technique therefore decomposes image features at various resolutions. In this technique image decomposition is a multistage process involving a filter called an open-closing (OC) filter. Each filtered image from the current stage is used as the input to the next stage, and in addition the difference between the input and output images of each stage is calculated. The first N − 1 decomposed subimages (L_1, . . . , L_{N−1}) are termed feature images and each contains image features at known resolutions. For example, L_1 contains image features of width 1, L_2 has features of width 2, and so on. Each OC filter has a structuring element associated with it, with those for stage n progressively larger than those for the previous stage n − 1. The structuring element defines the information content in each of the decomposed images. The decomposed FMP images contain spatial features in arbitrary directions. Therefore directional decomposition filtering techniques are applied to each of the FMP images in order to group features of the same direction together. Before this is implemented, the features in the FMP images, L_1, . . . , L_{N−1}, must be eroded to 1-pixel width. There are two reasons for this feature-thinning phase (Zhou and Venetsanopoulos, 1992).
First, the directional decomposition filter bank gives better results for features of 1-pixel width; second, it is more efficient and simpler to encode features of 1-pixel width. After the FMP images have been directionally decomposed, the features are further quantized by a nonuniform scalar quantizer. Each extracted feature is first encoded with a vector and then each vector is entropy encoded. The coarse image L_N is encoded using conventional methods such as VQ. Both of these methods employ directional decomposition as the basis of their technique. Ikonomopoulos and Kunt (1985) implemented a more traditional approach in that the directional decomposition filters are applied directly to the image. In their method the compression ratio varies from image to image. The filter design depends on many factors, which in turn affect the compression ratio. Therefore Ikonomopoulos and Kunt (1985) state that these parameters should be tuned to the particular image because the quantity, content, and structure of the edges in the image determine the
compression obtained. Despite these factors, compression in the order of 64:1 is reported with good image quality. The morphological filtering technique by Zhou and Venetsanopoulos (1992) separates the features into what they refer to as FMP images. Traditional directional decomposition techniques are applied to these FMP images in order to perform the coding process. The compression ratios reported by this method are reasonable at around 20:1.
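The partition of the frequency plane underlying directional filtering can be sketched as follows (a toy construction with ideal, unwindowed masks; the sector geometry is an illustrative reading of the scheme above, not the authors' code):

```python
import math

# Each frequency sample outside the lowpass disc of radius r is assigned
# to one of n angular sectors, so the lowpass image plus the n directional
# images superpose to the original spectrum exactly.

def sector_index(f, g, n):
    """Sector 0..n-1 of frequency (f, g); orientation is taken modulo pi
    because real images have a symmetric spectrum."""
    theta = math.atan2(g, f) % math.pi
    return min(int(theta / (math.pi / n)), n - 1)

def build_masks(size, n, r):
    """Ideal (unwindowed) masks: one lowpass + n directional, on a
    size x size grid of normalized frequencies in [-0.5, 0.5)."""
    masks = [[[0] * size for _ in range(size)] for _ in range(n + 1)]
    for i in range(size):
        for j in range(size):
            f, g = (i - size // 2) / size, (j - size // 2) / size
            if math.hypot(f, g) <= r:
                masks[0][i][j] = 1                      # lowpass disc
            else:
                masks[1 + sector_index(f, g, n)][i][j] = 1
    return masks

masks = build_masks(size=32, n=8, r=0.1)
coverage = sum(m[i][j] for m in masks for i in range(32) for j in range(32))
print(coverage)  # 1024: every frequency sample is covered exactly once
```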
IV. Segmentation-Based Approaches A. Overview A general scheme of a segmentation-based image coding approach is represented in Fig. 14. The original image is first preprocessed in order to eliminate noise and small details. The segmentation is then performed in order to organize the image as a set of regions. These might represent the objects in the scene, or more generally some homogeneous groups of pixels. Once regions have been generated, the coding step takes place. It is composed of two different procedures: contour coding and texture coding. The former is responsible for coding the shape of each region so that it can be reconstructed later at the decoder site, and the latter is responsible for coding the texture inside each region. These two procedures generate two bit-streams that, together, are used for the reconstruction of the original image. The segmentation-based approaches have strong motivation in the framework of second generation image coding. The visual data to be coded are generally more coherent inside a semantically meaningful region than inside predefined blocks. The introduction of a semantic representation of the scene might increase the decorrelation of the data, thus providing a higher energy compaction and consequently improved compression performance.
Figure 14. Generic scheme for segmentation-based approach to image coding.
Moreover, an object representation of the scene is the key point for dynamic coding (Ebrahimi et al., 1995). Each region can be coded independently of the others; this means that the coding approach that best suits the statistics of each single region can be applied. The introduction of a semantic representation of the image has another advantage: that of object interaction. This concept is particularly suitable for video sequences, and is one of the key points of the new MPEG-4 standard, but it can also be extended to still image coding. As we have mentioned, the HVS is able to recognize objects and automatically assign different priorities to objects of high or low interest. This can be simulated in a segmentation-based approach by associating high bandwidth with visually or semantically important regions and low bandwidth with less crucial objects. Research to predict and dynamically allocate the available bit rate has been performed by Fleury et al. (1996) and Fleury and Egger (1997). Alongside these advantages, segmentation-based coding approaches suffer from some major drawbacks. First, the segmentation process is computationally expensive and generally not very accurate or automatic. Thus, it is still not possible to correctly analyze, in real time, the semantic content of a generic image. This is a severe limitation for practical applications. Second, for each region we want to compress, it is necessary to code not only the texture information but also the contour information. This introduces an overhead that might even outweigh the advantages obtained by coding a more coherent region. Finally, it has been shown that a semantic representation of the scene does not always provide homogeneous regions suitable for high compression purposes. In the next section, a brief review of important preprocessing techniques will be outlined. In Section IV.C, an overview of existing segmentation techniques will be proposed.
A discussion of texture and contour coding will be presented in Sections IV.D and IV.E. Finally, a review of major coding techniques based on a segmentation approach will be presented in the last four sections, IV.F, G, H, and I.
B. Preprocessing The purpose of preprocessing is to eliminate small regions within the image and remove noise generated in the sampling process. It is an attempt to model the action of the HVS and is intended to alter the image in such a way that the preprocessed image more accurately resembles what the human brain processes. There are various methods used to preprocess the image,
all derived from properties of the HVS. Two properties commonly used are Weber's Law and the modulation transfer function (MTF) (Jang and Rajala, 1990, 1991; Civanlar et al., 1986). Marqués et al. (1991) suggest the use of Steven's Law, which accounts for the greater sensitivity of the HVS to gradients in dark areas as compared to light ones. For example, if B is the perceived brightness and I the stimulus intensity, then B = K·I^α. Therefore, by preprocessing according to Steven's Law, visually homogeneous regions will not be split and heterogeneous dark areas will not be falsely merged. In addition, the inverse gradient filter (Wang and Vagnucci, 1981) has also been implemented in order to give a lowpass response inside a region and an all-pass response on the region's contour (Kwon and Chellappa, 1993; Kocher and Kunt, 1986). This is an iterative scheme that employs a 3 × 3 mask of weighting coefficients. These coefficients are the normalized gradient inverses between the central pixel and its neighbors. If the image to be smoothed is expressed as an n × m array whose coefficients p(i, j) are the gray levels of the image pixels at (i, j), with i = 1 . . . n and j = 1 . . . m, the inverse of the absolute gradient at (i, j) is then defined as

δ(i, j : k, l) = 1 / |p(i + k, j + l) − p(i, j)|,

where k, l = −1, 0, 1, but k and l are not both equal to zero at the same time. This means that the δ(i, j : k, l) are calculated for the eight neighbors of (i, j); this neighborhood is denoted the vicinity V(i, j). If p(i + k, j + l) = p(i, j), then the gradient is zero and δ(i, j : k, l) is defined as 2. The proposed 3 × 3 smoothing mask is defined as

W(i, j) =
| w(i − 1, j − 1)  w(i − 1, j)  w(i − 1, j + 1) |
| w(i, j − 1)      w(i, j)      w(i, j + 1)     |
| w(i + 1, j − 1)  w(i + 1, j)  w(i + 1, j + 1) |

where w(i, j) = 1/2 and

w(i + k, j + l) = (1/2) [Σ δ(i, j : k, l)]⁻¹ δ(i, j : k, l)

for k, l = −1, 0, 1, but not both 0 at the same time, the sum running over the vicinity V(i, j). The smoothed image is then given as

p̃(i, j) = Σ_{k=−1}^{1} Σ_{l=−1}^{1} w(i + k, j + l) p(i + k, j + l).

Finally, the anisotropic diffusion filtering (Perona and Malik, 1990; Yon et al., 1996; Szirányi et al., 1998) is worth citing as a preprocessing method
Figure 15. Example of anisotropic diffusion applied to a natural image. On the left-hand side, the original image is displayed; on the right-hand side, the filtered version is represented.
because it is effective in smoothing image details while preserving edge information, as shown in Fig. 15.
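One pass of the gradient-inverse smoothing described in this section (after Wang and Vagnucci, 1981) might be sketched as follows; the centre weight of 1/2 is my reading of the scheme, and the names are illustrative:

```python
def inverse_gradient_smooth(p):
    """One iteration of gradient-inverse weighted smoothing on a 2D list;
    border pixels are left untouched."""
    n, m = len(p), len(p[0])
    out = [row[:] for row in p]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            delta = {}
            for k in (-1, 0, 1):
                for l in (-1, 0, 1):
                    if (k, l) == (0, 0):
                        continue
                    diff = abs(p[i + k][j + l] - p[i][j])
                    # equal neighbours get delta = 2, as in the text
                    delta[(k, l)] = 2.0 if diff == 0 else 1.0 / diff
            total = sum(delta.values())
            acc = 0.5 * p[i][j]        # centre weight 1/2 (assumed)
            for (k, l), d in delta.items():
                acc += 0.5 * (d / total) * p[i + k][j + l]
            out[i][j] = acc
    return out

img = [[10.0] * 5 for _ in range(5)]
img[2][2] = 20.0                        # isolated spike on a flat patch
sm = inverse_gradient_smooth(img)
print(round(sm[2][2], 1))   # 15.0: the spike is pulled toward its neighbours
print(round(sm[1][1], 2))   # 10.04: flat pixels next to the spike barely move
```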
C. Segmentation Techniques: Brief Overview This section introduces some of the commonly used methods for segmenting an image. Segmentation groups similar pixels into regions and separates those pixels that are considered dissimilar. It may be thought of as representing an image by a disjoint covering set of image regions (Biggar et al., 1988). Many segmentation methods have been developed in the past (Pal and Pal, 1993; Haralick, 1983) and it is generally the segmentation method that categorizes the coding technique. Most image segmentation techniques today are applied to video sequences; they thus have access to motion information, which is extremely useful in improving their performance. We have focused here on still image coding, but motion information remains an important HVS feature. Thus, in the following we will also refer to those techniques that integrate both spatial and temporal information to achieve better segmentation of the image. 1. Region Growing Region growing is a process that subdivides a (filtered) image into a set of adjacent regions whose gray-level variation within each region does not exceed a given threshold. The basic idea behind region growing is that, given a starting point within the image, the largest set of pixels whose gray level is within a specified interval is found. This interval is adaptive in that it is allowed to move higher or lower on the grayscale in order to intercept the maximum number of pixels. Figure 16 illustrates the concept of region growing for two contrasting images.
Figure 16. Images a) and b) are, respectively, the original test images ‘‘Table Tennis’’ and ‘‘Akiyo.’’ Images c) and d) are the corresponding segmentations obtained through region growing.
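A minimal region-growing sketch in the spirit of the description above; taking the acceptance interval around the running region mean is one simple way to make it adaptive (an illustrative choice, not the chapter's exact rule):

```python
# Flood-fill all 4-connected pixels whose gray level stays within a fixed
# threshold of the running mean of the region grown so far.

def grow_region(img, seed, thresh):
    n, m = len(img), len(img[0])
    region = {seed}
    total = float(img[seed[0]][seed[1]])
    stack = [seed]
    while stack:
        i, j = stack.pop()
        mean = total / len(region)          # adaptive acceptance interval
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if (0 <= ni < n and 0 <= nj < m and (ni, nj) not in region
                    and abs(img[ni][nj] - mean) <= thresh):
                region.add((ni, nj))
                total += img[ni][nj]
                stack.append((ni, nj))
    return region

img = [[10, 10, 10, 200],
       [10, 12, 11, 200],
       [10, 10, 10, 200]]
r = grow_region(img, seed=(1, 1), thresh=5)
print(len(r))  # 9: the dark block, excluding the bright column
```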
2. Split and Merge Split-&-merge algorithms (Pavlidis, 1982) segment the image into sets of homogeneous regions. In general, they are based around the quadtree (Samet, 1989) data structure. Initially the image is divided into a predefined subdivision; then, depending on the segmentation criteria, adjacent regions are merged if they have similar gray-level variations or a quadrant is further split if large variations exist. An example of this method is displayed in Fig. 17.
Figure 17. An example of quadtree decomposition of the image ‘‘Table Tennis.’’ Initial decomposition in square blocks is iteratively refined through successive split and merging steps.
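The split half of a split-&-merge pass can be sketched as a quadtree recursion (merging of similar neighbours is omitted for brevity; the homogeneity test on the gray-level range is an illustrative choice):

```python
# Recursively split a square block into quadrants while its gray-level
# range exceeds a homogeneity threshold; homogeneous blocks become leaves.

def split(img, top, left, size, thresh, leaves):
    vals = [img[top + i][left + j] for i in range(size) for j in range(size)]
    if size == 1 or max(vals) - min(vals) <= thresh:
        leaves.append((top, left, size))          # homogeneous leaf
        return
    h = size // 2
    for dt, dl in ((0, 0), (0, h), (h, 0), (h, h)):
        split(img, top + dt, left + dl, h, thresh, leaves)

img = [[0] * 8 for _ in range(8)]
for i in range(4):
    for j in range(4):
        img[i][j] = 100                            # one bright quadrant
leaves = []
split(img, 0, 0, 8, thresh=10, leaves=leaves)
print(leaves)  # [(0, 0, 4), (0, 4, 4), (4, 0, 4), (4, 4, 4)]
```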
Figure 18. Segmentation results obtained by applying the method proposed by Ziliani (1998).
3. K-Means Clustering K-means clustering is a segmentation method based on the minimization of the sum of squared distances from all points in a cluster to the cluster center. First, k initial cluster centers are taken and the image vectors are iteratively distributed among the k cluster domains. New cluster centers are then computed from these results in such a way that the sum of the squared distances from all points in a cluster to the new cluster center is minimized. It is interesting to note that this method can characterize each cluster and each pixel of the image with several features, including luminance, color, texture, etc., as described in Castagno (1998) and Ziliani (1998). In Fig. 18, an example of the segmentation obtained by applying the method proposed by Ziliani (1998) is presented. 4. Pyramidal Linking This method, proposed by Burt et al. (1981), uses a pyramid structure in which flexible links between the nodes of each layer are established. The base of the pyramid is the original image. The layers consist of nodes that comprise the feature values and other information, as described by Ziliani and Jensen (1998). The initial value for a node of a layer is obtained by computing the mean of a certain area in the layer below. This is done for all nodes in such a way that they correspond to partially overlapping regions. After this is done for the entire pyramid, father-son relationships are defined between the current layer and the layer below using those nodes that participated in the initial feature computation. Using these links, the feature values of all layers are updated again and afterwards new links are established. This is repeated until a stable state is reached. In Fig. 19 an example of the segmentation obtained by applying the method proposed in Ziliani and Jensen (1998) is represented.
Figure 19. These are the regions obtained by applying to ‘‘Table Tennis’’ the Pyramid Linking segmentation proposed by Ziliani and Jensen (1998).
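The k-means iteration of Section IV.C.3 can be sketched in a few lines; this toy version clusters scalar luminance values only, whereas the cited work clusters richer feature vectors:

```python
# Iteratively assign each value to the nearest centre, then move each
# centre to the mean of its assigned values.

def kmeans(values, centers, iters=20):
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in values:                       # assign to nearest centre
            idx = min(range(len(centers)), key=lambda c: (v - centers[c]) ** 2)
            clusters[idx].append(v)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

pixels = [10, 12, 11, 11, 200, 205, 198, 201]
print(sorted(round(c) for c in kmeans(pixels, centers=[0.0, 255.0])))
# [11, 201]: one dark and one bright cluster centre
```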
5. Graph Theory There are a number of image segmentation techniques that are based on the theory of graphs and their applications (Morris et al., 1986). A graph is composed of a set of ‘‘vertices’’ connected by ‘‘links.’’ In a weighted graph the vertices and links have weights associated with them. Each vertex need not necessarily be linked to every other, but if it is, the graph is said to be complete. A partial graph has the same number of vertices but only a subset of the links of the original graph. A ‘‘spanning tree’’ is a partial graph that is also a tree containing every vertex. A ‘‘shortest spanning tree’’ of a weighted graph is a spanning tree such that the sum of its link weights is a minimum over all possible spanning trees. To analyze images using graph theory, the original image must be mapped onto a graph. The most obvious way to do this is to map every pixel in the original image onto a vertex in the graph. Other techniques generate an initial over-segmentation of the image and map each region instead of each pixel. This reduces complexity and improves segmentation results because each node of the graph is already a coherent structure. Recently, Moscheni et al. (1998) have proposed an effective segmentation technique based on graphs. 6. Fractal Dimension The fractal dimension D is a characteristic of the fractal model (Mandelbrot, 1982), which is related to properties such as the length and surface of a curve. It provides a good measure of the perceived roughness of the surface of the image. Therefore, in order to segment the image, the fractal dimension is computed across the entire image. Various threshold values can then be used to segment the original image according to its fractal dimension.
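A box-counting estimate is one common way to compute the fractal dimension described above (the details below are my sketch, not the chapter's): occupied boxes N(s) are counted at shrinking box sizes s, and D is taken as the slope of log N(s) against log(1/s):

```python
import math

def box_count(points, box):
    """Number of box x box cells containing at least one point."""
    return len({(x // box, y // box) for (x, y) in points})

def fractal_dimension(points, size):
    sizes, counts = [], []
    box = size // 2
    while box >= 1:
        sizes.append(box)
        counts.append(box_count(points, box))
        box //= 2
    # least-squares slope of log N(s) against log(1/s)
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
           sum((x - mx) ** 2 for x in xs)

# A filled square should have dimension ~2; a straight line ~1.
square = [(x, y) for x in range(64) for y in range(64)]
line = [(x, 32) for x in range(64)]
print(round(fractal_dimension(square, 64), 1))  # 2.0
print(round(fractal_dimension(line, 64), 1))    # 1.0
```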
D. Texture Coding According to the scheme presented in Fig. 14, once segmentation of the image has been performed, it is necessary to code each defined region. Under the hypothesis that the segmentation has generated regions of homogeneous luminance, a first approach to coding their texture is the polynomial approximation presented in Section IV.D.1. However, we have already noted that the need for a semantic representation of the scene, which might be useful for dynamic coding applications, does not always correspond to the definition of homogeneous regions. In these cases, a more general approach such as the shape-adaptive DCT transform (Section IV.D.2) is used. 1. Polynomial Approximation In order to efficiently code the gray-level content of the regions, these are represented by an order-n polynomial. The basic idea behind polynomial fitting is to model the gray-level variation within a region by an order-n polynomial while ensuring that the MSE between the predicted and actual values is minimized. An order-0 polynomial represents each pixel in the region by the average intensity value of the region. An order-1 polynomial is represented by z = a + bx + cy, where z is the new intensity value at (x, y). 2. Shape-Adaptive DCT The shape-adaptive DCT (SADCT) proposed by Sikora (1995) and Sikora and Makai (1995) is currently very popular. The transform principles are the same as those introduced in Section III.C: the image is organized in N × N blocks of pixels as usual. Some of these will be completely inside the region to be coded while others will contain pixels belonging to the region and pixels outside it. For those blocks completely contained in the region to be coded, nothing differs from the standard DCT-based coder.
For those blocks that contain some pixels of the region to be coded, a shift of all the pixels of the original shape to the upper bound of the block is first performed. Each column is then transformed, based on the DCT transform matrix defined by Sikora and Makai (1995). Then another shift to the left bound of the block is performed. This is followed by a DCT transform of each line of coefficients. This final step provides the SADCT coefficients for the block. This algorithm is efficient because it is simple and it generates a total
number of coefficients corresponding to the number of pixels in the region to be coded. Its main drawback is the decorrelation of nonadjacent pixels that it introduces. Similar techniques also exist for the wavelet transform (Egger et al., 1996).
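The order-1 polynomial texture model of Section IV.D.1, z = a + bx + cy, can be fitted per region by least squares; the sketch below solves the 3 × 3 normal equations directly (the helper names are mine):

```python
# Least-squares fit of a plane z = a + b*x + c*y over a region of
# (x, y, z) samples, minimizing the MSE as described in the text.

def fit_plane(pixels):
    """pixels: list of (x, y, z). Returns (a, b, c)."""
    n = len(pixels)
    sx = sum(x for x, _, _ in pixels)
    sy = sum(y for _, y, _ in pixels)
    sz = sum(z for _, _, z in pixels)
    sxx = sum(x * x for x, _, _ in pixels)
    syy = sum(y * y for _, y, _ in pixels)
    sxy = sum(x * y for x, y, _ in pixels)
    sxz = sum(x * z for x, _, z in pixels)
    syz = sum(y * z for _, y, z in pixels)
    # Solve the normal equations A [a b c]^T = rhs by Cramer's rule.
    A = [[n, sx, sy], [sx, sxx, sxy], [sy, sxy, syy]]
    rhs = [sz, sxz, syz]
    def det3(M):
        return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
              - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
              + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))
    d = det3(A)
    coeffs = []
    for col in range(3):
        M = [row[:] for row in A]
        for r in range(3):
            M[r][col] = rhs[r]
        coeffs.append(det3(M) / d)
    return tuple(coeffs)

# A synthetic ramp region z = 5 + 2x + 3y is recovered exactly.
region = [(x, y, 5 + 2 * x + 3 * y) for x in range(4) for y in range(4)]
a, b, c = fit_plane(region)
print(round(a), round(b), round(c))  # 5 2 3
```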
E. Contours Coding As illustrated in Fig. 14, a segmentation-based coding approach requires a contour coding step in addition to a texture coding step. This is necessary to correctly reconstruct the shape of the regions defined during the segmentation step. Contour coding can be a complex problem. The simplest solution is to record every pixel position in the region in a bitmap-based representation. This is not the most efficient approach, but it can achieve good compression performance when combined with efficient statistical entropy coding. The trade-off between exact reconstruction of the region and efficient coding of its boundaries has been the subject of much research (Rosenfeld and Kak, 1982; Herman, 1990). Freeman chain coding (1961) is one of the earliest and most referenced techniques; it attempts to code region contours efficiently by representing the given contour with an initial starting position and a set of codes representing relative positions. The Freeman chain codes are shown in Fig. 20. In this coding process an initial starting point on the curve is stored via its (x, y) coordinates. The position of the next point on the curve is then located. This position can be in one of the eight locations illustrated in Fig. 20. If, for example, the next position is (x, y − 1), then the pixel lies in position 2 according to Freeman and hence a 2 is output. This pixel is then taken as the current position and the coding process repeats. The coding
Figure 20. Each number represents the Freeman chain code for each possible movement of the central pixel.
40
N. D. BLACK, R. J. MILLAR, M. KUNT, M. REID, AND F. ZILIANI
is terminated when either the original start point has been reached (closed contour), or no further points on the curve can be found (open contour). The code is an efficient representation of the contour because at least 3 bits are required to store each code in the chain; however further gains can be achieved by applying entropy coders or lossy contour coding techniques to the contours. In addition to chain coding, other approaches have been investigated. We cite the geometrical approximation methods (Gerken, 1994; Schroeder and Mech, 1995) and the methods based on mathematical morphology (Briggar, 1995). A recent technique based on polygonal approximation of the contour that provides progressive and efficient compression of region contours is one that was proposed by Le Buhan et al. (1998).
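Freeman's scheme can be sketched in a few lines of Python. The direction numbering below (0 = east, proceeding counterclockwise, so that a move to (x, y + 1) is coded as 2) is a common convention and is assumed to match Fig. 20.

```python
# 8-neighbour moves and their Freeman codes (numbering assumed: 0 = east,
# counterclockwise, so that a move to (x, y + 1) is coded as 2).
CODES = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
         (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}
MOVES = {c: d for d, c in CODES.items()}

def chain_encode(points):
    """Store the starting (x, y) coordinates plus one 3-bit code per move
    between successive contour points."""
    codes = [CODES[(x1 - x0, y1 - y0)]
             for (x0, y0), (x1, y1) in zip(points, points[1:])]
    return points[0], codes

def chain_decode(start, codes):
    points = [start]
    for c in codes:
        dx, dy = MOVES[c]
        x, y = points[-1]
        points.append((x + dx, y + dy))
    return points
```

Because every move is one of only 8 directions, each code fits in 3 bits, and a run of codes is itself highly compressible by an entropy coder, as noted above.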
F. Region-Growing Techniques

Kocher and Kunt (1983) presented a technique based on region growing called contour texture modeling. The original image is preprocessed by the inverse gradient filter (Wang and Vagnucci, 1981) to remove picture noise in preparation for the region-growing process. After the growing process, a large number of small regions are generated, some of which must be eliminated. This elimination is necessary to reduce the number of bits required to describe the segmented image, and thus increase the compression ratio. It is performed by removing small regions and merging weakly contrasting regions, that is, regions whose gray-level variations differ only slightly. In this technique, contour coding is performed in stages. First, an orientation of each region contour is defined. Then spurious and redundant contour points are deleted, and small regions are merged with nearby valid regions. Finally, the contours are approximated by line and circle segments and coded through differential coding of the successive end-of-segment addresses. Texture coding is achieved by representing the gray-level variation within the region by an nth-order polynomial function. As a final step, pseudorandom noise is added in order to produce a natural-looking image.

Civanlar et al. (1986) present an HVS-based segmentation coding technique in which a variation of the centroid linkage region-growing algorithm (Haralick, 1983) is used to segment the image after preprocessing. In a centroid linkage algorithm the image is scanned in a set manner, for example, left to right or top to bottom. Each pixel is compared to the mean gray-level value of the already partially constructed regions in its neighborhood; if the values are close enough, the pixel is included in the region and a new mean is computed for the region. If no neighboring region has a close enough mean, the pixel is used to create a new segment whose mean is the pixel value. In the technique by Civanlar et al. (1986), this centroid linkage algorithm applies, with HVS visibility thresholds as the closeness criterion: if the intensity difference is less than the visibility threshold, the pixel is joined to the existing segment; if the intensity differences between the pixel and its neighbor segments are all larger than the thresholds, a new segment is started.

The work by Kocher and Kunt (1983) provides the facility to preset the approximate compression ratio prior to the operation. This is achieved by setting the maximum number of regions that will be generated by the region-growing process. The results obtained via their method are good both in terms of reconstructed image quality and compression ratio. However, they point out that the performance of their technique in terms of image compression and quality is optimal for images that are naturally composed of a small number of large regions. Civanlar et al. (1986) report good image quality and compression ratios comparable to those achieved by Kocher and Kunt (1983).

G. Split-and-Merge-Based Techniques

Kwon and Chellappa (1993) and Kunt et al. (1987) present a technique based on a merge-and-threshold algorithm. After the image has been preprocessed, the intensity difference between two adjacent regions is found. If this difference is less than or equal to k, which is initialized to 1, the regions are merged and the average of the intensities is computed. A histogram of the merged image is computed and, if separable clusters exist, the above steps are repeated; otherwise, the original image is segmented by thresholding the intensity clusters. When the overall process is complete, the regions obtained may be represented by an nth-order polynomial.
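The centroid linkage scan described under Region-Growing Techniques can be sketched as follows. This is a minimal raster-scan version; the fixed closeness threshold stands in for the per-pixel HVS visibility thresholds of Civanlar et al. (1986).

```python
def centroid_linkage(image, threshold):
    """Raster-scan centroid linkage: each pixel joins the neighbouring
    region whose running mean is closest, provided the difference is
    within `threshold`; otherwise it starts a new region."""
    h, w = len(image), len(image[0])
    labels = [[-1] * w for _ in range(h)]
    sums, counts = [], []              # per-region accumulators
    for y in range(h):
        for x in range(w):
            v = image[y][x]
            # candidate regions: left and upper neighbours, already labelled
            cand = set()
            if x > 0:
                cand.add(labels[y][x - 1])
            if y > 0:
                cand.add(labels[y - 1][x])
            best, best_d = None, None
            for r in cand:
                d = abs(v - sums[r] / counts[r])
                if d <= threshold and (best is None or d < best_d):
                    best, best_d = r, d
            if best is None:           # no region close enough: new segment
                labels[y][x] = len(sums)
                sums.append(float(v))
                counts.append(1)
            else:                      # join and update the region mean
                labels[y][x] = best
                sums[best] += v
                counts[best] += 1
    return labels
```

Note that the region mean is updated after every accepted pixel, so the "centroid" against which later pixels are compared drifts with the region's content.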
The preceding method of segmentation extracts only homogeneous regions; for textured regions, a large number of small homogeneous regions will therefore be generated. In terms of image coding, it is more efficient to treat a textured area as one region rather than as several small regions. Therefore, in addition to the homogeneous region extraction scheme, textured regions are also extracted and combined with the results of the uniform region segmentation. Multiple features are used in the texture extraction process, along with a recursive thresholding method using multiple 1D histograms. First, the image is regarded as one region. A histogram of the features to be used in the extraction process is then obtained within each region. The histogram showing the best clusters is selected, and the corresponding region is then split by thresholding. These steps are repeated for all regions until none of the histograms exhibits clustering. Final segmentation is achieved by labeling the extracted uniform regions. If the area of such a region is covered by more than 50% of a textured region of type "X," the uniform region is labeled as a textured region of that type. Adjacent uniform regions are merged with a texture region if they share at least one similar texture feature with the corresponding texture region.

In terms of coding, uniform regions are represented by polynomial reconstructions. Texture regions are represented by a texture synthesis technique using the Gaussian Markov random field (GMRF) model (Chellappa et al., 1985). Encoding the image therefore involves storing information about the contours of the regions, the polynomial coefficients of the uniform regions, the GMRF parameters of the textured regions, and a means of identifying each region. Variable numbers of bits are allocated to each component.

Another approach based on a split-and-merge algorithm is that by Cicconi and Kunt (1977). Segmentation is performed by initially clustering the image using a standard K-means clustering algorithm (Section IV.C.3). Once the image has been segmented into feature-homogeneous areas, an attempt is made to further reduce the redundancy inside the regions by looking for symmetries within them. To do this, the medial axis transformation (MAT) (Pavlidis, 1982) is used for shape description. The MAT provides, for each region, a curve-based region descriptor: it corresponds closely to the skeleton that would be produced by applying sequential erosion to the region. Values along the MAT represent the distance to the edge of the region and can be used to find its minimum and maximum widths; the histogram of these values gives the variation of the width.
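The distance values that the MAT carries can be computed with a breadth-first distance transform; the ridge of local maxima of this map approximates the medial axis. A 4-connected BFS (city-block distance) is used here for simplicity, and the region is assumed not to touch the image border.

```python
from collections import deque

def distance_to_edge(mask):
    """For every pixel inside the region (mask True), the number of
    4-connected steps to the nearest background pixel.  Peaks of this
    map lie on the medial axis; the values along it give the local
    half-width of the region, as described above."""
    h, w = len(mask), len(mask[0])
    dist = [[None] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                dist[y][x] = 0          # background: distance zero
                queue.append((y, x))
    while queue:                        # multi-source BFS from the background
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist
```

Twice the maximum of this map estimates the maximum width of the region, and the histogram of the ridge values gives the width variation mentioned in the text.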
Once the MAT has been found, a linear prediction of each pixel on one side of the MAT can be constructed from pixels symmetrically chosen on the other side. Coding of the segmented image is performed in two stages: contour coding and texture coding. As the MAT associated with a region can be reconstructed from a given contour, only the contours have to be coded. The texture components in one part of the region with respect to the MAT may be represented by a polynomial function. However, representing the polynomial coefficients precisely requires a large number of bits. Therefore, the proposed method suggests defining the positions of 6 pixels, which are found in the same way for all regions, and then quantizing these 6 values. The quantized values allow the unique reconstruction of the approximating second-order polynomial.

Both of the preceding techniques are similar in that they employ a split-and-merge algorithm to segment the original image. However, Kwon and Chellappa (1993) state that better compression ratios may be obtained by segmenting the image into uniform and textured regions. These regions may be coded separately; in particular, the textured regions may be more efficiently represented by a texture synthesis method, such as a GMRF model, than by many small uniform regions. Cicconi and Kunt's method (1977) segments the image into uniform regions and, in addition, exploits further redundancy in these regions by identifying symmetry within them. The gray-level variation within each of the uniform regions is represented using polynomial modeling, and Cicconi and Kunt further developed a method for reducing the storage requirements for the polynomial coefficients. Despite the different methods used to represent both the contours and the gray-level variations within the regions, both methods report similar compression ratios.

H. Tree/Graph-Based Techniques

Biggar et al. (1988) developed an image coding technique based on the recursive shortest spanning tree (RSST) algorithm (Morris et al., 1986). The RSST algorithm maps the original image onto a region graph so that each region initially contains only one pixel. Sorted link weights, associated with the links between neighboring regions in the image, are used to decide which link should be eliminated and therefore which regions should be merged. After each merge, the link weights are recalculated and resorted. The removed links define a spanning tree of the original graph. Once the segmentation is complete, the spanning tree is mapped back to image matrix form, thus representing the segmented image. The regions generated are defined by coding the lines that separate the pixels belonging to different regions. The coded segmented image consists of three sources: a list of coordinates from which to start tracing the edges; the edge description; and a description of the intensity profile within each region. Although the intensity profile within the region could be represented as a simple flat intensity plateau, it has been suggested by Kunt et al.
(1985) and Kocher and Kunt (1983) that a better result is achievable by a higher-order polynomial representation. Biggar et al. (1988) suggest that embedding the polynomial fitting procedure at each stage of the region-merging process, as Kocher and Kunt (1983) do, would be computationally too expensive. Therefore, in this case a flat intensity plane is used to generate the regions and polynomials are fitted after the segmentation is complete. The edge information is extracted from the segmented image using the algorithm for thin-line coding by Kaneko and Okudaira (1985).

A similar technique, based on the minimum spanning forest (MSF), is reported by Leou and Chen (1991). Segmentation and contour coding are performed exactly as described by Biggar et al. (1988); however, the intensity values within a segmented region are coded with a polyline representation. Here a texture extraction scheme is used, based on the assumption that light is cast overhead on the picture and that the gray values vary according to the distance to the corresponding region centroid. After texture extraction, the regions have a high pixel-to-pixel correlation. Therefore, for simplicity and efficiency, a polyline representation method is used to encode the texture. This is achieved by representing each row of the image by a polyline.

A different graph theory approach is presented by Kocher and Leonardi (1986), based on the region adjacency graph (RAG) data structure (Pavlidis, 1982). The RAG is again a classical map graph, with each node corresponding to a region and links joining nodes that represent adjacent regions. The basic idea of the segmentation technique is that a value representing the degree of dissimilarity between two adjacent regions is associated with each graph link. The link that exhibits the lowest degree of dissimilarity is removed, and the two nodes it connects are merged into one. This merging process is repeated until a termination criterion is reached. Once complete, the RAG representation is mapped back to the image matrix form, and thus a segmented image is created. The segmented image is coded using a polynomial representation of the regions and gives very good compression ratios.

All of the preceding methods are based on similar graph structures that enable the image to be mapped to graph form in order to perform segmentation. The techniques by Biggar et al. (1988) and Kocher and Leonardi (1986) both model the texture within the image via a polynomial modeling method. However, Kocher and Leonardi report much higher compression ratios than Biggar et al. (1988). Leou and Chen (1991) implement a segmentation technique identical to that presented by Biggar et al.
However, Leou and Chen point out that better compression ratios can be achieved by first performing a texture extraction process and then modeling the texture by polylines as opposed to polynomial functions. The compression ratio achieved via this method is an improvement on that reported by Biggar et al. (1988). A more recent graph-based segmentation technique is that proposed by Moscheni et al. (1998) and Moscheni (1997).
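The merge loop of the RSST algorithm can be sketched as follows. The link weight (absolute difference of region mean intensities) and the brute-force re-evaluation of weights after every merge are simplifications of Morris et al. (1986), whose algorithm keeps the sorted link list up to date incrementally.

```python
def rsst_segment(image, n_regions):
    """RSST sketch: start with one region per pixel; repeatedly remove the
    link between the two most similar adjacent regions (smallest difference
    of region means), merge them, and re-evaluate, until `n_regions` remain."""
    h, w = len(image), len(image[0])
    parent = list(range(h * w))                  # union-find forest
    sums = [float(image[i // w][i % w]) for i in range(h * w)]
    counts = [1] * (h * w)

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]        # path halving
            i = parent[i]
        return i

    links = set()                                # 4-neighbour adjacency
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                links.add((y * w + x, y * w + x + 1))
            if y + 1 < h:
                links.add((y * w + x, (y + 1) * w + x))

    regions = h * w
    while regions > n_regions:
        best = None                              # cheapest inter-region link
        for a, b in links:
            ra, rb = find(a), find(b)
            if ra == rb:
                continue
            wgt = abs(sums[ra] / counts[ra] - sums[rb] / counts[rb])
            if best is None or wgt < best[0]:
                best = (wgt, ra, rb)
        if best is None:
            break
        _, ra, rb = best
        parent[rb] = ra                          # merge rb into ra
        sums[ra] += sums[rb]
        counts[ra] += counts[rb]
        regions -= 1
    return [[find(y * w + x) for x in range(w)] for y in range(h)]
```

Stopping at a preset number of regions corresponds to the facility, noted for Kocher and Kunt (1983), of presetting the approximate compression ratio before coding begins.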
I. Fractal-Based Techniques

In the previous sections, various methods for image segmentation have been suggested that lend themselves to efficient compression of the image. Most of these techniques segment the image into regions of homogeneity; thus, when a highly textured image is encountered, the segmentation produces many small homogeneous regions. Jang and Rajala (1990, 1991) suggest a technique that segments the image in terms of textured regions. They also point out that in many cases previous segmentation-based coding methods are best suited to head-and-shoulders (closeup) images and that results obtained from complex natural images are often poor. In their technique the image is segmented into texturally homogeneous regions as perceived by the HVS. Three measurable quantities are identified for this purpose: the fractal dimension; the expected value; and the just noticeable difference. These quantities are incorporated into a centroid linkage region-growing algorithm that is used to segment the image into three texture classes: perceived constant intensity; smooth texture; and rough texture. An image coding technique appropriate for each class is then employed. The fractal dimension D of a block is thresholded to determine the class of the block: a value of D below a lower threshold indicates perceived constant intensity; a value between the lower and upper thresholds indicates smooth texture; and a value above the upper threshold indicates rough texture. After this segmentation process, the boundaries of the regions are represented as a two-tone image and coded using arithmetic coding. The intensities within each region are coded separately according to their class. Those of class 1, perceived constant intensity, are represented by the average value of the region. Class 2, smooth texture, and class 3, rough texture, are encoded by polynomial modeling. It should be noted from the description in Section IV.D.1 that polynomial modeling leads to some smoothing and hence may not be useful for rough texture. Therefore, it is not clear why Jang and Rajala chose this method of representation for the class 3 regions.
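The fractal dimension that drives this classification can be illustrated with a box-counting estimate. Jang and Rajala estimate D from the gray-level surface of a block; the binary box-counting version below only illustrates the principle, and the set of scales is arbitrary.

```python
import math

def box_counting_dimension(points, size, scales=(1, 2, 4, 8)):
    """Estimate D as the least-squares slope of log N(s) versus
    log(size / s), where N(s) is the number of s x s boxes containing
    at least one point of the set inside a size x size grid."""
    xs, ys = [], []
    for s in scales:
        # count occupied boxes at this scale
        boxes = {(x // s, y // s) for x, y in points}
        xs.append(math.log(size / s))
        ys.append(math.log(len(boxes)))
    # least-squares slope of the log-log plot
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den
```

A straight line yields D near 1 and a filled square D near 2, matching the intuition that rougher, more space-filling intensity surfaces have a larger fractal dimension.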
Each of the various segmentation techniques groups pixels according to some criterion, whether homogeneity, texture, or membership in a range of gray-level values. The problem that arises after segmentation is how to efficiently code the gray-level values within each region. The most basic representation of the gray level within a region is its mean value. This results in good compression, especially if the region is large, but the quality of the decoded image will be poor. In most cases the gray-level variation is approximated by a polynomial function of order two. The results obtained by polynomial approximation can be visually poor, especially for highly textured images. It is for this reason that more researchers are representing highly textured regions by texture synthesis techniques such as GMRF. These methods do not gain over the compression ratios obtained using polynomial approximation, but the quality of the reproduced image is claimed to be improved (Kwon and Chellappa, 1993). Another approach is to represent the gray-level variations by polylines, as was done by Leou and Chen (1991). This method is of similar computational complexity, but the results in terms of compression ratio and image quality are claimed to be better than those of polynomial reconstruction.

As stated by Jang and Rajala (1990, 1991), many of the aforementioned segmentation-based techniques are not sufficient when the input image is a natural one, that is, an image of a real scene. Such images may contain highly textured regions; when these are segmented using conventional methods, the resulting textured region is composed of a large number of small regions. These small regions are often merged or removed in order to increase the compression ratio, and as a result the decoded image appears very unnatural. Therefore, Jang and Rajala (1990, 1991) employed the fractal dimension to segment the image. This ensures that the image is segmented into areas that are similar in terms of surface roughness, the result of visualizing the image in 3D with the third dimension being gray-level intensity. However, once the image is segmented into regions of similar roughness, the method employed to code the identified areas is similar to that of traditional segmentation-based coding methods, that is, polynomial modeling. Such polynomial modeling, as reported by Kwon and Chellappa (1993), does not suffice for the representation of highly textured regions, and they suggest the use of texture synthesis instead.
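The second-order polynomial modeling referred to throughout this section amounts to a least-squares surface fit over the region's pixels. A minimal sketch, using the normal equations and Gaussian elimination:

```python
def fit_quadratic_surface(pixels):
    """Least-squares fit of a second-order polynomial
    f(x, y) = c0 + c1*x + c2*y + c3*x^2 + c4*x*y + c5*y^2
    to a region's gray levels, given as (x, y, value) triples."""
    def basis(x, y):
        return [1.0, x, y, x * x, x * y, y * y]

    # accumulate the normal equations A c = b
    A = [[0.0] * 6 for _ in range(6)]
    b = [0.0] * 6
    for x, y, v in pixels:
        phi = basis(x, y)
        for i in range(6):
            b[i] += phi[i] * v
            for j in range(6):
                A[i][j] += phi[i] * phi[j]
    # Gaussian elimination with partial pivoting
    for col in range(6):
        piv = max(range(col, 6), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 6):
            f = A[r][col] / A[col][col]
            for j in range(col, 6):
                A[r][j] -= f * A[col][j]
            b[r] -= f * b[col]
    # back substitution
    c = [0.0] * 6
    for i in range(5, -1, -1):
        c[i] = (b[i] - sum(A[i][j] * c[j] for j in range(i + 1, 6))) / A[i][i]
    return c
```

Only the six coefficients (plus the region contour) need to be stored to reconstruct an approximation of the region's gray levels; the smoothing this implies is exactly why rough textures are poorly served by it.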
Therefore, it may be concluded that a better segmentation-based coding method might employ the fractal dimension segmentation approach coupled with texture synthesis for textured region representation. As discussed in this section there are a number of different methods that can be used in the segmentation process. Table 2 summarizes the methods used in the techniques that employ a segmentation algorithm as part of the coding process.
TABLE 2
Texture Coding Employed in Segmentation Algorithms

Technique                                           Texture coding method
HVS-based segmentation (Civanlar et al., 1986)      Polynomial function
Segmentation-based (Kwon and Chellappa, 1993)       Polynomial function & GMRF
Symmetry-based coding (Cicconi and Kunt, 1977)      Polynomial function
RSST-based (Biggar et al., 1988)                    Polynomial function
MSF-based (Leou and Chen, 1991)                     Polynomial function
RAG-based (Kocher and Leonardi, 1986)               Polynomial function
Fractal dimension (Jang and Rajala, 1990)           Polynomial function

V. Summary and Conclusions

This chapter has reviewed second-generation image coding techniques. These techniques are characterized by their exploitation of the human visual system. It was noted that first-generation techniques are based on classical information theory and are largely statistical in nature; as a result, they tend to deliver compression ratios of approximately 2:1. The problem with the early techniques is that they ignore the semantic content of an image. In contrast, second-generation methods break an image down into visually meaningful subcomponents, typically the edges, contours, and textures of regions or objects in the image or video. Not surprisingly, therefore, these subdivisions are a strong theme in the emerging MPEG-4 standard for video compression. In addition, second-generation coding techniques often offer scalability, whereby the user can trade picture quality for increased compression.

An overview of the human visual system was presented to demonstrate how many of the more successful techniques closely resemble the operation of the human eye. It was explained that the HVS is particularly sensitive to sharp boundaries in a scene and why it finds gradual changes more difficult to identify, with the result that detail in such scenes can be missed. Early coding work made use of the frequency sensitivity of the eye, and it can be concluded that the eye is particularly efficient at contrast resolution.

The techniques considered herein were categorized into two broad approaches: transform-based coding and segmentation-based coding. Transform-based coding initially decomposes/transforms the image into low and high frequencies (cf. the HVS) to highlight those features that are significant to the HVS. It was observed that directional filtering is a technique that more closely matches the operation of the HVS. Following this initial stage, which is generally lossless, the transformed image is quantized and ordered according to visual importance. A range of methods, from simple uniform quantization to the more complex vector quantization, can achieve this, with differing visual results in terms of image quality. Much research is still ongoing to find a suitable measure of the effect of second-generation coding techniques on the visual quality of an image.
Such measures attempt to quantify the distortion introduced by a technique by providing a figure for the visual distance between the original image and the one that has been coded and decoded. Early measures used in first-generation image coding, such as MSE, are not necessarily optimal in the HVS domain.

The discrete cosine transform is the basis of many transform-based coding schemes. For example, it features in standards ranging from JPEG to MPEG-4. This can be attributed to its useful balance between complexity and performance, and it was noted that it approaches the optimal KL transform for highly correlated signals (such as natural images). The multiscale and pyramidal approaches to image coding are multiresolution approaches, again paralleling the HVS. In addition, they offer the possibility of progressive image transmission, an attractive feature. For these reasons, most of the current research in transform-based coding is focused on wavelet coding; indeed, this will be the basis of JPEG 2000, the new standard for still image coding. Although wavelet coding is a generalization of pyramidal coding, it does not increase the number of samples over the original and it also preserves important perceptual information such as edges. As an example of subband coding, wavelets allow each subband to be coded separately, allowing a greater bit allocation to those subbands considered to be visually important; for example, the power in natural images is concentrated in the lower frequencies. Unlike the DCT, wavelets produce no blocking artifacts, although they do have their own artifact, called "ringing," particularly around high-contrast edges. A number of different wavelets exist, and research is ongoing into criteria to assist in choosing the most suitable wavelet given the nature of the image.

Segmentation-based approaches to image coding segment the original image into a set of regions following a preprocessing step that removes noise and small details. Given a set of regions, it is then "only" necessary to record the contour of each region and to code the texture within it.
The segmentation approach aims to identify regions with semantic meaning (and hence a high correlation) rather than generic blocks. In addition, it is then possible to apply different codings to different regions as required. This is particularly important for video coding, where research is considering how to predict the importance of regions and hence the appropriate allocation of bandwidth. The drawback of the segmentation approach is that it is not currently possible to correctly analyze the semantic content of a generic image in real time.

This section of the chapter considered six approaches to segmentation: region growing; split and merge; K-means clustering; pyramidal linking; graph theory; and fractal dimension. All, except perhaps the first two, are still being actively researched. Methods for texture representation range from using the mean value, through polynomial approximations, to texture synthesis techniques. Both mean value and polynomial approximations (which tend to be second-order) yield poor quality images, especially if the image is highly textured. It must be remembered that the semantic representation does not always correspond to the definition of a homogeneous region. At present, the shape-adaptive DCT, particularly that proposed by Sikora (1995), is very popular. Most current research is concentrating on texture synthesis techniques, for example, GMRF. In coding the contours of segmented regions there is a trade-off between exactness and efficiency. Freeman's (1961) chain code is probably the most referenced technique in the literature, but it has been surpassed by techniques based on geometrical approximations, mathematical morphology, and the polygonal approximation of Le Buhan et al. (1998). We conclude that the best approach to segmentation-based coding is currently a technique that uses fractal dimensions for the segmentation phase and texture synthesis techniques for the texture representation.

The future of image and video coding will probably be driven by multimedia interaction. Coding schemes for such applications must support object functionalities such as dynamic coding and object scalability. Initial research is directed at how to define the objects. Such object-based coding is already being actively pursued in the field of medical imaging. The concept is to define 3D models of organs such as the heart and then, instead of sending an image, to send the parameters of the model that best match the current data, completed with an error function. This work is still in its infancy.

All of the techniques reviewed here have their relative merits and drawbacks. In practice, the choice of technique will often be influenced by nontechnical matters such as the availability of an algorithm or its inclusion in an imaging software library. A direct comparison between techniques is difficult, as each is based on different aspects of the HVS.
This is often the reason why a technique performs well on one type or source of images but not on others. Comparing the compression ratios of lossy techniques is meaningless if image quality is ignored. Therefore, until a quantitative measure of image quality is established, direct comparisons are not really possible. In the meantime, shared experience and experimentation for a particular application will have to remain the best means of determining the appropriateness of a given technique.
Acknowledgments The authors would like to thank Julien Reichel, Marcus Nadenau, and Pascal Fleury for their contributions and useful suggestions.
References

Akansu, A. N., Haddad, R. A., and Caglar, H. (1993). The binomial QMF-wavelet transform for multiresolution signal decomposition, IEEE Trans. Signal Proc., 41: 13—19.
Berger, T. (1972). Optimum quantizers and permutation codes, IEEE Trans. Information Theory, 18: (6), 756—759.
Biggar, M., Morris, O., and Constantinides, A. (1988). Segmented-image coding: Performance comparison with the discrete cosine transform, IEE Proc., 135: (2), 121—132.
Blackwell, H. (1946). Contrast thresholds of the human eye, Jour. Opt. Soc. Am., 36: 624—643.
Brigger, P. (1995). Morphological Shape Representation Using the Skeleton Decomposition: Application to Image Coding, PhD Thesis No. 1448, EPFL, Lausanne, Switzerland.
Burt, P. J. and Adelson, E. H. (1983). The Laplacian pyramid as a compact image code, IEEE Trans. Comm., COM-31: (4), 532—540.
Burt, P., Hong, T. H., and Rosenfeld, A. (1981). Segmentation and estimation of region properties through cooperative hierarchical computation, IEEE Trans. Syst., Man, Cyber., SMC-11: 802—809.
Caglar, H., Liu, Y., and Akansu, A. N. (1993). Optimal PR-QMF design for subband image coding, Jour. Vis. Comm. and Image Represent., 4: 242—253.
Castagno, R. (1998). Video Segmentation Based on Multiple Features for Interactive and Automatic Multimedia Applications, PhD Thesis, Swiss Federal Institute of Technology, Lausanne.
Chellappa, R., Chatterjee, S., and Bagdazian, R. (1985). Texture synthesis and compression using Gaussian-Markov random fields, IEEE Trans. Syst. Man Cybern., SMC-15: 298—303.
Chen, W. H., Smith, C. H., and Fralick, S. C. (1977). A fast computational algorithm for the discrete cosine transform, IEEE Trans. Comm., 1004—1009.
Cicconi, P. and Kunt, M. (1977). Symmetry-based image segmentation, Proc. Soc. Photo-Optical Instrumentation Eng. (SPIE), 378—384.
Cicconi, P. et al. (1994). New trends in image data compression, Comput. Med. Imaging Graph., 18: (2), 107—124.
Civanlar, M., Rajala, S., and Lee, X. (1986). Second generation hybrid image-coding techniques, SPIE-Visual Comm. Image Process., 707: 132—137.
Coifman, R. R. and Wickerhauser, M. V. (1992). Entropy-based algorithms for best basis selection, IEEE Trans. Inform. Theory, 38: 713—718.
Cornsweet, T. N. (1970). Visual Perception, New York: Academic Press.
Crochiere, R. E., Weber, S. A., and Flanagan, J. L. (1976). Digital coding of speech in sub-bands, Bell Syst. Tech. J., 1069—1085.
Croft, L. H. and Robinson, J. A. (1994). Subband image coding using watershed and watercourse lines of the wavelet transform, IEEE Trans. Image Proc., 3: 759—772.
Croisier, A., Esteban, D., and Galand, C. (1976). Perfect channel splitting by use of interpolation, decimation, tree decomposition techniques, Proc. Int'l Conf. Inform. Sci./Systems, 443—446.
Daubechies, I. (1988). Orthonormal bases of compactly supported wavelets, Comm. Pure Appl. Math., 41: 909—996.
Daubechies, I. (1993). Orthonormal bases of compactly supported wavelets II: Variations on a theme, SIAM J. Math. Anal., 24: 499—519.
Daubechies, I. and Sweldens, W. (1998). Factoring wavelet transforms into lifting steps, J. Fourier Anal. Appl., 4: (4), 245—267.
Dinstein, K., Rose, A., and Herman, A. (1990). Variable block-size transform image coder, IEEE Trans. Comm., 2073—2078.
Ebrahimi, T. (1997). MPEG-4 video verification model: A video encoding/decoding algorithm based on content representation, Signal Processing: Image Comm., 9: (4), 367—384.
Ebrahimi, T. et al. (1995). Dynamic coding of visual information, technical description ISO/IEC JTC1/SC2/WG11/M0320, MPEG-4, Swiss Federal Institute of Technology.
Egger, O., Fleury, P., and Ebrahimi, T. (1996). Shape adaptive wavelet transform for zerotree coding, European Workshop on Image Analysis and Coding, Rennes.
Fleury, P. and Egger, O. (1997). Neural network based image coding quality prediction, ICASSP, Munich.
Fleury, P., Reichel, J., and Ebrahimi, T. (1996). Image quality prediction for bitrate allocation, in IEEE Proc. ICIP, 3: 339—342.
Freeman, H. (1961). On the encoding of arbitrary geometric configurations, IRE Trans. Electronic Computers, 10: 260—268.
Froment, J. and Mallat, S. (1992). Second generation compact image coding with wavelets, in Wavelets: A Tutorial in Theory and Applications, C. K. Chui, ed., San Diego: Academic Press.
Gerken, P. (1994). Object-based analysis-synthesis coding of image sequences at very low bit rates, IEEE Trans. Circuits, Systems Video Technol., 4: (3), 228—235.
Golomb, S. (1966). Run length encodings, IEEE Trans. Inf. Theory, IT-12: 399—401.
Haber, R. N. and Hershenson, M. (1973). The Psychology of Visual Perception, New York: Holt, Rinehart and Winston.
Haralick, R. (1983). Image segmentation survey, in Fundamentals in Computer Vision, Cambridge: Cambridge University Press.
Harris, F. (1978). On the use of windows for harmonic analysis with the discrete Fourier transform, Proc. IEEE, 66: (1), 51—83.
Herman, G. T. (1990). On topology as applied to image analysis, Computer Vision Graphics Image Proc., 52: 409—415.
Huffman, D. (1952). A method for the construction of minimum-redundancy codes, Proc. IRE, 40: (9), 1098—1101.
Ikonomopoulos, A. and Kunt, M. (1985). High compression image coding via directional filtering, Signal Processing, 8: 179—203.
Jain, A. K. (1989). Image transforms, in Fundamentals of Digital Image Processing, Chapter 5, Englewood Cliffs, NJ: Prentice-Hall Information and System Science Series.
Jain, A. K. (1976). A fast Karhunen-Loeve transform for a class of random processes, IEEE Trans. Comm., COM-24: 1023—1029.
Jang, J. and Rajala, S. (1991). Texture segmentation-based image coder incorporating properties of the human visual system, in Proc. ICASSP'91, 2753—2756.
Jang, J. and Rajala, S. (1990). Segmentation based image coding using fractals and the human visual system, in Proc. ICASSP'90, 1957—1960.
Jayant, N., Johnston, J., and Safranek, R. (1993). Signal compression based on models of human perception, Proc. IEEE, 81: (10), 1385—1422.
Johnston, J. (1980). A filter family designed for use in quadrature mirror filter banks, Proc. Int'l Conference on Acoustics, Speech and Signal Processing (ICASSP), 291—294.
Jordan, L., Ebrahimi, T., and Kunt, M. (1998). Progressive content-based compression for retrieval of binary images, Computer Vision and Image Understanding, 71: (2), 198—212.
Kaneko, T. and Okudaira, M. (1985). Encoding of arbitrary curves based on the chain code representation, IEEE Trans. Comm., COM-33: (7), 697—707.
Karhunen, K. (1947). Über lineare Methoden in der Wahrscheinlichkeitsrechnung, Ann. Acad. Science Fenn., A.I: (37).
Kocher, M. and Kunt, M. (1983). Image data compression by contour texture modelling, Proc. Soc. Photo-Optical Instrumentation Eng. (SPIE), 397: 132—139.
52
N. D. BLACK, R. J. MILLAR, M. KUNT, M. REID, AND F. ZILIANI
Kocher, M. and Leonardi, R. (1986). Adaptive region growing technique using polynomial functions for image approximation, Signal Processing, 11: 47—60. Kovalevsky, X. (1993). Topological foundations of shape analysis: Shape picture-math. description shape grey-level images, NATO ASI Series F: Comput. Systems Sci., 126: 21—36. Kunt, M. (1998). A vision of the future of multimedia technology, Mobil Multimedia Communication, Chapter 41, pp. 658—669, New York: Academic Press. Kunt, M., Benard, M., and Leonardi, R. (1987). Recent results in high compression image coding, IEEE Trans. Circuits and Systems, CAS-34: 1306—1336. Kunt, M., Ikonomopoulos, A., and Kocher, M. (1985). Second generation image coding, in Proc. IEEE, 73: (4), 549—574. Kwon, O. and Chellappa, R. (1993). Segmentation-based image compression, Optical Engineering, 32: (7), 1581—1587. Lai, Yung-Kai and Kuo, C,-C. Jay (1998a). Wavelet-based perceptual image compression, IEEE International Symposium on Circuits and Systems, Monterey, California, May 31—June 3, 1998. Lai, Yung-Kai and Kuo, C.-C. Jay (1998b). Wavelet image compression with optimized perceptual quality, Conference on ‘‘Applications of Digital Image Processing XXI,’’ SPIE’s Annual Meeting, San Diego, CA, July 19—24, 1998. Leou, F. and Chen, Y. (1991). A contour based image coding technique with its texture information reconstructed by polyline representation, Signal Processing, 25: 81—89. Lewis, S. and Knowles, G. (1992). Image compression using the 2-D wavelet transform, IEEE Trans. Image Processing, 1: 244—250. Lin, Fu-Huei and Mersereau, R. M. (1996). Quality measure based approaches to MPEG encoding, in Proc. ICIP, 3: 323—326, Lausanne, Switzerland, September 1996. Loe`ve, M. (1948). Fonctions aleatoires de second ordre, Processus stochastiques et mouvement brownien, P. Levvey, ed., Paris: Hermann. Lu, J., Algazi, V. R., and Estes, R. R. (1996). A comparative study of wavelet image coders, Optical Engineering, 35: (9), 2605—2619. Macq, B. 
(1989). Perceptual Transforms and Universal Entropy Coding For an Integrated Approach to Picture Coding, PhD Thesis, Universitie Catholique de Louvain, Louvain-laNeuve, Belgium. Mallat, S. G. (1989a). Multifrequency channel decomposition of images and wavelet models, IEEE Trans. Acoustics, Speech and Signal Processing, 37: 2091—2110. Mallat, S. G. (1989b). A theory of multiresolution signal decomposition: the wavelet representation, IEEE Trans. Pattern Anal. and Machine Intell., 11: 674—693. Mallat, S. G. (1991). Zero-crossing of a wavelet transform, IEEE Trans. Inform. T heory, 37: 1019—1033. Mallat, S. G. and Zhong, S. (1991). Compact coding from edges with wavelets, in Proc. ICASSP’91, 1745—2748. Mandlebrot, B. (1982). T he Fractal Geometry of the Nature, 1st edition, New York: Freeman. Mannos, J. L. and Sakrison, D. J. (1974). The effects of a visual fidelity criterion on the encoding of images, IEEE Trans. Information T heory, 20: (4), 525—536. Marques, F., Gasull, A., Reed, T., and Kunt, M. (1991). Coding-oriented segmentation based on Gibbs-Markov random fields and human visual system knowledge, in Proc. ICASSP’91, 2749-2752. Miyahara, M., Kotani, K., and Algazi, V. R. (1992). Objective picture quality scale (PQS) for image coding, Proc. SID Symposium for Image Display, 44: (3), 859—862. Morris, O., Lee, M., and Constantinides, A. (1986). Graph theory for image analysis: An approach based on the shortest spanning tree, IEEE Proc. F, 133: (2), 146—152. Moscheni, F. (1997). Spatio-Temporal Segmentation and Object Tracking: An Application to
SECOND-GENERATION IMAGE CODING
53
Second Generation V ideo Coding, PhD Thesis, Swiss Federa; Institute of Technology, Lausanne. Moscheni, F., Bhattacharjee, S., and Kunt, M. (1998). Spatiotemporal segmentation based on region merging, IEEE Trans. Pattern Anal. Mach. Intell., 20: (9), 897—915. Nadenau, M. and Reichel, J. (1999). Compression of color images with wavelets under consideration of HVS, Human V ision and Electronic Imaging, IV, San Jose. Narasimha, M. J. and Peterson, A. M. (1978). On the computation of the Discrete Cosine Transform, IEEE Trans. Comm., COM-26: 934—936. Osberger, W., Maeder, A. J., and Bergmann, N. (1996). A perceptually based quantization technique for MPEG encoding, Proc. SPIE, Human V ision and Electronic Imaging, 3299: 148—159, San Jose, CA. Pal, N. and Pal, S. (1993). A review on image segmentation techniques, in Pattern Recognition, 26: (9), 1277—1294. Pavlidis, T. (1982). Algorithms for Graphics and Image Processing. 1st edition, Rockville, MD: Computer Science Press. Pearson, D. E. (1975). Transmission and Display of Pictorial Information, London: Pentatech. Perona, P. and Malik, J. (1990) Scale-space and edge detection using anisotropic diffusion, IEEE Trans. Pattern Anal. Machine Intell., 12: (7), 629—639. Poirson, A. B. and Wandell, B. A. (1996). Pattern-color separable pathways predict sensitivity to simple colored patterns, V ision Research, 36: (4), 515—526. Poirson, A. B. and Wandell, B. A. (1993). Appearance of colored patterns: pattern-color separability, Optics and Image Science, 10: (12), 2458—2470. Ramchandran, K. and Vetterli, M. (1993). Best wavelet packet bases in a rate-distortion sense, IEEE Trans. Image Processing, 2: 160—175. Rose, A. (1973). V ision — Human and Electronic, New York: Plenum. Rosenfeld, A. and Kak, A. C. (1982). Digital Picture Processing, San Diego: Academic Press. Said, A. and Pearlman, W. A. (1996). A new, fast, and efficient image codec based on set partitioning in hierarchical trees, IEEE Trans. 
Circuits and Systems for V ideo Technology, 6: (3), 243—250. Salembier, P. and Kunt, M. (1992). Size-sensitive multiresolution decomposition of images with rank order based filters, Signal Processing, 27: 205—241. Samet, H. (1989a). Applications of Spatial Data Struc‘tures, 1st edition, Reading, MA: Addison-Wesley. Samet, H. (1989b). T he Design and Analysis of Spatial Data Structures, 1st edition, Reading, MA: Addison-Wesley. Schalkoff, R. J. (1989). Digital Image Processing and Computer V ision, Singapore: John Wiley and Sons. Schreiber, W. F. (1963). The mathematical foundation of the synmthetic highs systems, MIT, RLE Quart. Progr. Rep., No. 68, p. 140. Schreiber, W. F., Knapp, C. F., and Kay, N. D. (1959). Synthetic highs, an experimental TV bandwidth reduction system, Jour. SMPT E, 68: 525—537. Schroeder, M. R. and Mech, R. (1995). Combined description of shape and motion in an object based coding scheme using curved triangles, IEEE Int. Conf. Image Proc., Washington, 2: 390—393. Senoo, T. and Girod, B. (1992). Vector quantization for entropy coding of image subbands, IEEE Trans. Image Proc., 1: 526—533. Shapiro, J. M. (1993). Embedded image coding using zerotrees of wavelet coefficients, IEEE Trans. Signal Proc., 41: (12), 3445—3462. Sikora, T. (1995). Low complexity shape-adaptive DCT for coding of arbitrarily shaped image segments, Signal Processing: Image Communication, 7: 381—395.
54
N. D. BLACK, R. J. MILLAR, M. KUNT, M. REID, AND F. ZILIANI
Sikora, T. and Makai, B. (1995). Shape-adaptive DCT for generic coding of video, IEEE Trans. Circuits and Systems for V ideo Technol., 5: 59—62. Simoncelli, P. and Adelson, E. H. Efficient Pyramid Image Coder (EPIC), a public domain software available from URL: ftp://ftp.cis.upenn.edu/pub/eero/epic.tar.Z (Jan. 2000). Smith, M. J. T. and Barnwell, T. P. (1986). Exact reconstruction techniques for tree structured subband coders, IEEE Trans. Acoustics, Speech, and Signal Processing, 34: 434—441. Sziranyi, T., Kopilovic, I., Toth, B. P. (1998). Anisotropic diffusion as a preprocessing step for efficient image compression, 14th ICPR, Brisbane, IAPR, Australia, pp. 1565—1567, August 16—20, 1998. Taubman, D. and Zakhor, A. (1994). Multirate 3-D subband coding of video, IEEE Trans. Image Proc., 572—588. Toet, A. (1989). A morphological pyramid image decomposition, Pattern Recognition L etters, 9: 255—261. Vaidyanathan, P. P. (1987). Theory and design of M channel maximally decimated QMF with arbitrary M, having perfect reconstruction property, IEEE Trans. Acoustics, Speech, and Signal Processing. Van den Branden Lambrecht (1996). Perceptual Models and Architectures for V ideo Coding Applications, PhD Thesis, Swiss Federal Institute of Technology, Lausanne, Switzerland. Vetterli, M. (1984). Multi-dimensional subband coding: some theory and algorithms, IEEE Trans. Acoustics, Speech, and Signal Processing, 97—112. Wandell, A. (1995). Foundations of V ision, Sunderland, MA: Sinauer Associates, Inc. Publishers. Wang, T. P. and Vagnucci, A. (1981). Gradient inverse weighted smoothing scheme and the evaluation of its performance, Computer Graphics and Image Processing, 15, 167—181. Welch, T. (1977). A technique for high performance data compression, IEEE Computing, 17: (6), 8—19. Westen, S. J. P., Lagendijk, R. L., and Biemond, J. (1996a). Optimization of JPEG color image coding under a human visual system model, Proc. 
SPIE Human V ision and Electronic Imaging, 2657: 370—381, San Jose, CA. Westen, S. J. P., Lagendijk, R. L., and Biemond, J. (1996b). Spatio-temporal model of human vision for digital video compression, Proc. SPIE, Human V ision and Electronic Imaging, 3016: 260—268. Winkler, S. (1998). A perceptual distortion metric for digital color images, in Proc. ICIP, 1998, Chicago, IL, 3: 399—403. Woods, J. and O’Neil, S. (1986). Subband coding of images, IEEE Trans. Acoustics, Speech, and Signal Processing, 1278—1288. You, Y., Xu, W., Tannenbaum, A., and Kaveh, M. (1996). Behavioral analysis of anisotropic diffusion in image processing, IEEE Trans. Image Processing, 5: (11), 1539—1553. Zhou, Z. and Venetsanopoulos, A. N. (1992). Morphological methods in image coding, Proc. Int’l Conf. Acoust., Speech, and Signal Processing ICASSP, 3: 481—484. Ziemer, R. Tranter, W., and Fannin, D. (1989). Signals and Systems: Continuous and Discrete, 2nd edition, New York: Macmillan. Ziliani, F. (1998). Focus of attention: an image segmentation procedure based on statistical change detection, Internal Report 98.02, LTS, Swiss Federal Institute of Technology, Lausanne, Switzerland. Ziliani, F. and Jensen, B. (1998). Unsupervised image segmentation using the modified pyramidal linking approach, Proc. IEEE Int. Conf. Image Proc., ICIP’98.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 112
The Aharonov-Bohm Effect — A Second Opinion WALTER C. HENNEBERGER Department of Physics, Southern Illinois University, Carbondale, IL 62901-4401
I. Introduction
Objections to the "standard" interpretation of the Aharonov-Bohm effect are discussed in detail. In particular, it may not be interpreted as a "scattering" effect.
II. The Vector Potential
The role of the vector potential in the AB effect is discussed. The transverse vector potential is shown to be related to the electromagnetic momentum, which is, indeed, a physical quantity.
III. Dynamics of the Aharonov-Bohm Effect
A rigorous proof is given that, in Coulomb gauge, (e/c)A is just the electromagnetic momentum of the electron. Thus, in Coulomb gauge, A is an observable. The longitudinal part of A carries no physics; it is merely a computational convenience.
IV. Momentum Conservation in the Aharonov-Bohm Effect
In the AB effect, there is no force on the electron. The electron does, however, exert a force on the flux whisker or solenoid. The force on the solenoid and the time rate of change of electromagnetic momentum constitute an action-reaction pair.
V. Stability of the AB Effect
The reason for the stability of the fringe pattern is discussed.
VI. The AB Effect Can Not Be Shielded
The interaction of a passing electron with a superconducting shield is discussed.
VII. Interaction of a Passing Classical Electron with a Rotating Quantum Cylinder
The solenoid is represented as a charged, rotating cylinder. It is shown that the rotating cylinder suffers a phase shift which is equal and opposite to that of the electron in an AB experiment. It is shown further that this result follows directly from classical mechanics.
VIII. Solution of the Entire Problem of the Closed System
A correct solution of the AB problem is given. The problem involves three degrees of freedom, not two, as is usually thought. The solution involves bringing the Lagrangian of the problem to normal coordinates, and subsequently quantizing the system. It is shown that problems in quantum theory do not (if canonical transformations are allowed) have unique solutions.
IX. The Interior of the Solenoid
A semiclassical theory of an electron in a constant magnetic field is given. The correct treatment of this problem also involves three degrees of freedom. It is shown that the Berry phase has a dynamic origin.
X. Ambiguity in Quantum Theory, Canonical Transformations, and a Program for Future Work
Other examples of ambiguities in quantum theory are cited. The oldest and best known of these is the p · A vs. r · E question. It is argued that one obtains better solutions of problems by eliminating velocity-dependent potentials by means of suitable canonical transformations.
References
I. Introduction In 1949, Ehrenberg and Siday proposed an experiment for detecting phase shifts in an interference pattern due to the presence of a magnetic field confined to a region not accessible to the electrons. Ten years later, Aharonov and Bohm made a detailed study of such a system, in which a
Figure 1a. Idealized Aharonov-Bohm experiment.
coherent beam of electrons is directed around two sides of a solenoid. Since the de Broglie wavelength (which is a gauge-dependent quantity) depends upon the vector potential A, quantum theory predicts a shift in the interference pattern that one obtains. The experiment is typically carried out by concealing a solenoid or a whisker of flux between two slits, as shown in Figure 1a. The slit system serves the purpose of preventing the electrons from entering the flux-carrying regions, as well as providing an initial interference pattern that serves as a basis upon which to compare the pattern with the flux present. Figure 1b shows the result of Möllenstedt and Bayh (1962), obtained by stopping the interference pattern down to a narrow region and moving the rectangular stop vertically as the current in the solenoid was increased. One sees clearly that the Aharonov-Bohm (AB) effect is a right-left shift in an interference pattern. This is the phenomenon that is observed, and this is the phenomenon that the theorist must explain.
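The magnitude of the shift is set by the enclosed flux. For reference (the standard expression, stated here for completeness rather than taken from the text):

```latex
\Delta\varphi \;=\; \frac{e}{\hbar c}\oint \mathbf{A}\cdot d\mathbf{l} \;=\; \frac{e\Phi}{\hbar c},
```

where $\Phi$ is the flux enclosed between the two interfering beams.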
Figure 1b. Result of Möllenstedt and Bayh.
Figure 2. An AB interference pattern of a flux whisker at the center of a wide slit.
It is possible, in principle, to obtain an effect without the slit system, but the experimental difficulties involved are greater and the interference pattern would not be as clear. A theoretical result based on a flux whisker of zero thickness is shown in Figure 2. In this computation (Shapiro and Henneberger, 1989) (based on the Feynman path integral method), the whisker is at the center of a very wide single slit. The result shows clearly the contribution of the slit edge, as well as the interference pattern of the flux whisker. There has been ample experimental evidence for the AB effect, from the early experiments to the elegant experiments of Tonomura and coworkers. This writer has never doubted the existence of an AB effect. Physics is, after all, an experimental science. Quantum theory predicts an AB effect, and the theory, at least in a limited fashion, appears to be well understood by the experimental community that works in electron optics. The point at which this author dissents strongly from the viewpoint of his theoretical colleagues is on the topic of AB "scattering." In 1959, Aharonov and Bohm treated the problem described here as a scattering problem, with electrons being scattered by an external vector potential. In units in which $\mu = \hbar = 1$ ($\mu$ is the electron mass, and $\hbar$ is Planck's constant divided by $2\pi$),
the Hamiltonian of the system is

$$H = \frac{1}{2}\left(\mathbf{P} - \frac{e}{c}\mathbf{A}(\mathbf{r})\right)^{2} \qquad (1)$$

with $A_r = 0$ and $A_\phi = \Phi/2\pi r$. $\Phi$ is the total flux in the whisker or solenoid. AB found the stationary states

$$\psi_m = J_{|m+\alpha|}(kr)\,e^{im\phi}, \qquad m = 0, \pm 1, \pm 2, \ldots \qquad (2)$$

which led to the differential cross section

$$\frac{d\sigma}{d\phi} = \frac{\sin^{2}(\pi\alpha)}{2\pi k\,\sin^{2}(\phi/2)} \qquad (3)$$

where $\alpha = -e\Phi/ch$ and $k$ is the electron momentum. The AB calculation has been repeated many times. It is generally agreed that the mathematics of the derivation is flawless. The problem lies with the result. One may make the following objections.

1. The total cross section is divergent. Of course, a divergent cross section also occurs in Coulomb scattering. In the Coulomb case, however, one recognizes that this result is due to the infinite range of the Coulomb force. In the AB effect, there is no force at all.

2. $d\sigma/d\phi$ is symmetric in $\phi$. Experiments all give the right-left shift in an interference pattern. The AB result is not even qualitatively correct. It is not reasonable to believe that one can change the symmetry properties of the result by going to thinner flux whiskers.

3. The velocity operator is

$$\mathbf{v} = \mathbf{P} - \frac{e}{c}\mathbf{A}(\mathbf{r}) = -i\nabla - \frac{e}{c}\mathbf{A}(\mathbf{r}). \qquad (4)$$

It has components

$$v_x = v_r\cos\phi - v_\phi\sin\phi \qquad (5a)$$
$$v_y = v_r\sin\phi + v_\phi\cos\phi. \qquad (5b)$$

It is convenient to introduce the operators

$$v_+ = v_x + iv_y \qquad (6a)$$
$$v_- = v_+^{\dagger} = v_x - iv_y. \qquad (6b)$$

By straightforward substitution, we have

$$v_+ = -ie^{i\phi}\,\frac{\partial}{\partial r} + \frac{e^{i\phi}}{r}\,\frac{\partial}{\partial\phi} + \frac{i\alpha e^{i\phi}}{r} \qquad (7a)$$

$$v_- = v_+^{\dagger} = -ie^{-i\phi}\,\frac{\partial}{\partial r} - \frac{e^{-i\phi}}{r}\,\frac{\partial}{\partial\phi} - \frac{i\alpha e^{-i\phi}}{r}. \qquad (7b)$$
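As a quick numerical sanity check (not part of the original text; the values of alpha, k, m, and r below are arbitrary), one can verify that the radial factor produced by applying $v_+$ to an AB state $J_{m+\alpha}(kr)e^{im\phi}$ is proportional to $J_{m+1+\alpha}(kr)$, in accordance with the Bessel recurrence relations invoked in the discussion that follows:

```python
import math

def bessel_j(nu, x, terms=40):
    # Truncated ascending series for J_nu(x); adequate for moderate x
    return sum((-1) ** j / (math.factorial(j) * math.gamma(j + nu + 1))
               * (x / 2.0) ** (2 * j + nu) for j in range(terms))

# Arbitrary sample values: 0 < alpha < 1, wavenumber k, state index m, radius r
alpha, k, m, r = 0.3, 1.0, 2, 3.0
nu = m + alpha  # |m + alpha| for m >= 0

f = lambda rr: bessel_j(nu, k * rr)      # radial part of the AB state psi_m
h = 1.0e-6
dfdr = (f(r + h) - f(r - h)) / (2 * h)   # numerical d/dr of J_nu(kr)

# Radial factor produced by v_+ (the overall factor i e^{i(m+1)phi} removed):
lhs = -dfdr + (nu / r) * f(r)
# The Bessel recurrence predicts k * J_{nu+1}(kr), the radial part of psi_{m+1}:
rhs = k * bessel_j(nu + 1, k * r)
```

The agreement of `lhs` and `rhs` is just the recurrence $(\nu/x)J_\nu(x) - J_\nu'(x) = J_{\nu+1}(x)$ evaluated numerically.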
In the Hilbert space of functions that vanish at the origin (the electrons are excluded from the flux-carrying region), we have

$$[v_+, v_-] = 0 \qquad (8)$$

and

$$H = \tfrac{1}{2}\,v_+ v_- . \qquad (9)$$

We see that $[H, \mathbf{v}] = 0$, so that $\mathbf{v}$ is an integral of motion. This implies that the electrons move in a straight line with constant speed while they are being scattered. This is not, as is generally claimed, a quantum effect. It is a contradiction. In a world in which statements are simultaneously true and false, there can be no such thing as knowledge, and science becomes an illusion. The situation becomes still worse (Henneberger, 1981). The canonical angular momentum operator is

$$M = -i\,\partial/\partial\phi. \qquad (10)$$

The commutation relations for $M$, $v_+$, and $v_-$ are

$$[M, v_+] = v_+ \qquad (11a)$$
$$[M, v_-] = -v_- \qquad (11b)$$
$$[M, H] = 0. \qquad (11c)$$

Thus, $M$ and $H$ commute, so that we may use the eigenvalue $m$ of the angular momentum operator to characterize the eigenstates that comprise the basis of a Hilbert space of states having energy $k^2/2$. Therefore, if $M|m\rangle = m|m\rangle$, then Eq. (11a) implies that

$$M v_+|m\rangle = (m+1)\,v_+|m\rangle \qquad (12a)$$

and Eq. (11b) implies that

$$M v_-|m\rangle = (m-1)\,v_-|m\rangle. \qquad (12b)$$

Therefore $v_+$ is an operator that raises the eigenvalues of $M$. Similarly, $v_-$ is a lowering operator. For clarity, we restrict $\alpha$ to the range $0 < \alpha < 1$. Other values of $\alpha$ can be treated by making obvious changes in the following discussion. Let us consider the action of the operators $v_+$ and $v_-$ on the AB states of Eq. (2). This is illustrated in Fig. 3a. We see that the operator $v_+$ can be used to generate the positive $m$ states from the state $|0\rangle$. The operator $v_-$ may similarly be used to generate the negative $m$ states from the state $|{-1}\rangle$. These results follow easily from the recurrence relations for Bessel functions. However, there are two distinct chains of eigenstates that are not linked to each other by $v_+$ and $v_-$. From the recurrence relations for Bessel functions,
Figure 3a. Action of $v_+$ and $v_-$ on Aharonov-Bohm eigenstates.
we see that $v_-|0\rangle$ involves $J_{\alpha-1}(kr)$. This function is not in the Hilbert space of AB eigenfunctions. It is infinite at the z axis, while the AB eigenfunctions all vanish there. Applying $v_-$ twice to the state $|0\rangle$ produces a function that is not square integrable over any region containing the z axis. Similarly, the function $v_+|{-1}\rangle$ does not lie in the AB Hilbert space (Fig. 3a). Pauli (1939, 1958) has emphasized that such states are unsuitable. Pauli wrote that "A general criterion for the admissibility of eigenfunctions, which does not assume single-valuedness at the outset, was given by W. Pauli. This says that the repeated application of operators corresponding to physical properties may not lead to functions outside the space of quadratically integrable functions." (Author's translation.) There is good reason for Pauli's criterion. Any perturbation leading to nonvanishing matrix elements between states would eventually lead to a state that is not normalized. Thus, the AB states are not suitable for any perturbation calculation unless one makes further assumptions that modify the system (Henneberger, 1981). It is noteworthy that, if one chooses the multivalued functions

$$\psi_l = J_{|l|}(kr)\,e^{il\phi}\,e^{-i\alpha\phi}, \qquad l = 0, \pm 1, \pm 2, \ldots \qquad (13)$$

one obtains a consistent set of normalizable states, which is closed under the operators $v_+$ and $v_-$, as shown in Fig. 3b.
Figure 3b. Action of $v_+$ and $v_-$ on eigenstates of Eq. (13).
It is further noteworthy that the propagator that satisfies the integral equation

$$\psi(\mathbf{r}, t) = \int K(\mathbf{r}, \mathbf{r}'; t)\,\psi(\mathbf{r}', 0)\,d\mathbf{r}' \qquad (14)$$

is given by

$$K(\mathbf{r}', \mathbf{r}; t) = \sum_n \psi_n^*(\mathbf{r}')\,\psi_n(\mathbf{r})\,\exp(-iE_n t/\hbar), \qquad (15)$$

where the $\psi_n(\mathbf{r})$ are the stationary state solutions of the Schrödinger equation. Insertion of the states of Eq. (13) into Eq. (15) yields the Feynman propagator. Theoretical results that show a displacement in an interference pattern in AB calculations are all based on the Feynman path integral method. It should be noted that there is considerable literature involving the path integral method in which electron paths winding around the flux one or more times are included. In this writer's opinion, such paths are forbidden
by quantum theory, as the velocity vector is an integral of the motion. The result of this section leads to an interesting debacle. On one hand, the correct set of eigenstates for the AB problem "must be" the states of Eq. (13). On the other hand, these states "can not be" the correct ones, as they are not single valued. Yang (1983) has written: "We emphasize that to challenge the single valuedness requirement of the wave function is to challenge the very foundation of quantum mechanics itself." How is this dilemma to be resolved? It turns out that these two points of view are not contradictory. The reader is urged to read on.
II. The Vector Potential

The vector potential is probably the most misunderstood function in physics. It is generally agreed that the AB effect is an effect of the vector potential. However, this statement takes on meaning only when the role of the vector potential in physics is clarified. Conventional wisdom states that the vector potential is not a physical quantity because of the freedom to make gauge transformations. This view, while not completely wrong, has spawned many papers that are almost metaphysical. Let us bring the vector potential back to the real world by considering the following experiment. A parallel plate capacitor having plates of area A separated by a distance d is discharged in a uniform magnetic field B, as shown in Fig. 4. The magnetic field permeates all of space. The resistance in the circuit is R. At t = 0, the switch S is closed and a current i flows in the circuit. The forces on the wire that do not cancel are those on a projection of the wire on the distance d. The force to the right at any time is F = Bid/c, and the final momentum of the capacitor, wire, resistance, and switch (assumed to be rigidly connected) is

$$\int F\,dt = \frac{Bd}{c}\int i\,dt = \frac{BQd}{c}.$$

We will not be surprised to find that momentum is conserved. The initial electromagnetic momentum is

$$\mathbf{P}_{em} = \frac{1}{4\pi c}\int \mathbf{E}\times\mathbf{B}\,d\mathbf{r} = \frac{EBAd}{4\pi c} = \frac{BQd}{c}.$$

Let us now compute $eA_x/c$ in Landau gauge. We have

$$A_x = -By, \qquad A_y = A_z = 0.$$
Figure 4. Apparatus demonstrating potential momentum.
If we let y = 0 be the plane of the positively charged plate, then we find

$$eA_x/c = (-Q/c)(-Bd) = QBd/c.$$

In his lectures on relativity at Purdue University, Belinfante described the quantity $e\mathbf{A}/c$ as a "potential momentum." The motivation there was the fact that $\boldsymbol{\pi}$ (kinetic momentum) and $W/c$, where $W$ is the proper energy, form a 4-vector, as do $eV$ and $e\mathbf{A}$. The reader will recall that the proper energy of a particle is $\sqrt{\pi^2 c^2 + \mu^2 c^4}$. Therefore, $e\mathbf{A}/c$ is a "potential momentum." The example of the discharging capacitor shows vividly the nature of this potential momentum. Here, potential momentum is converted into kinetic momentum, just as a ball rolling down a plane has its potential energy converted into kinetic energy.
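The momentum bookkeeping in the capacitor example can be checked with arbitrary numbers (a sketch, not from the original; all numerical values are hypothetical, Gaussian units throughout):

```python
import math

# Hypothetical Gaussian-unit values: charge, plate area, separation, field, c
Q, A, d, B, c = 2.0, 5.0, 0.1, 3.0, 3.0e10

E = 4.0 * math.pi * Q / A                      # field between the plates
p_field = E * B * A * d / (4.0 * math.pi * c)  # (1/4 pi c) * integral of E x B over the volume A*d
p_impulse = (B * d / c) * Q                    # integral of F dt = (Bd/c) * integral of i dt
```

Both routes give $BQd/c$: the initial field momentum is exactly the mechanical impulse delivered during the discharge.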
So it is in the AB effect. The flux whisker or solenoid is penetrated by the electric field of the passing electron. (It is not the tail of the wave function that penetrates the flux, as is sometimes thought.) At this point, the reader will, no doubt, object strenuously on two grounds:

1. In the AB effect, there is no force on the electron.
2. The elegant experiments of Tonomura et al. (1986a,b) have shown that the AB effect persists when there is no net electric field in the flux-carrying region.

Each of these objections will be discussed in detail. However, let us proceed with a logical development of the theory. We return to a discussion of the vector potential. Any vector potential that is due to localized sources can be uniquely decomposed into two vector functions:

$$\mathbf{A}(\mathbf{r}) = \mathbf{A}^t(\mathbf{r}) + \mathbf{A}^l(\mathbf{r}) \quad\text{with}\quad \nabla\cdot\mathbf{A}^t(\mathbf{r}) = 0 \quad\text{and}\quad \nabla\times\mathbf{A}^l(\mathbf{r}) = 0. \qquad (16)$$

$\mathbf{A}^t(\mathbf{r})$, the "transverse part" of $\mathbf{A}(\mathbf{r})$, is given by Jackson (1975)

$$\mathbf{A}^t(\mathbf{r}) = \frac{1}{4\pi}\,\nabla\times\nabla\times\int \frac{\mathbf{A}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}'. \qquad (17)$$

The "longitudinal part" of $\mathbf{A}(\mathbf{r})$ is given by

$$\mathbf{A}^l(\mathbf{r}) = -\frac{1}{4\pi}\,\nabla\int \frac{\nabla'\cdot\mathbf{A}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}'. \qquad (18)$$

Two facts immediately come to one's attention: (1) The "transverse part" is a global concept. One must know $\mathbf{A}(\mathbf{r})$ everywhere in space in order to extract the transverse part. (2) The transverse part of a vector potential will, in general, not be unique if there are fields at infinity. Thus, a uniform magnetic field in the z direction may be described in Landau gauge (already discussed) or by $A_\phi = \tfrac{1}{2}Br$, $A_r = 0$, $A_z = 0$. Another example is the flux whisker in the AB problem. Here $\mathbf{A}$ may be given by $A_\phi = \Phi/(2\pi r)$, $A_r = 0$, and also by $A_x = 0$, $A_y = \Phi H(x)\delta(y)$. In the latter example, $H(x)$ is the Heaviside step function: $H(x) = 0$ for $x < 0$, $H(x) = 1$ for $x > 0$. $\delta(y)$ is the usual Dirac delta function. It is clear that

$$\oint \mathbf{A}\cdot d\mathbf{l} = \Phi$$

for all paths that encircle the origin, and zero for all paths that do not. This particular vector potential allows the flux whisker to be treated as a phase plate that covers the positive x axis.
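A numerical sketch (not in the original; the flux value is arbitrary) confirming the two statements about $\oint \mathbf{A}\cdot d\mathbf{l}$ for the whisker potential $A_\phi = \Phi/2\pi r$:

```python
import math

Phi = 2.5  # arbitrary flux value

def A_field(x, y):
    # A_phi = Phi/(2 pi r) in Cartesian components: Phi/(2 pi r^2) * (-y, x)
    r2 = x * x + y * y
    return (-Phi * y / (2.0 * math.pi * r2), Phi * x / (2.0 * math.pi * r2))

def loop_integral(cx, cy, radius, n=20000):
    # Midpoint-rule line integral of A . dl around a circle centered at (cx, cy)
    total = 0.0
    for i in range(n):
        t0 = 2.0 * math.pi * i / n
        t1 = 2.0 * math.pi * (i + 1) / n
        x0, y0 = cx + radius * math.cos(t0), cy + radius * math.sin(t0)
        x1, y1 = cx + radius * math.cos(t1), cy + radius * math.sin(t1)
        ax, ay = A_field(0.5 * (x0 + x1), 0.5 * (y0 + y1))
        total += ax * (x1 - x0) + ay * (y1 - y0)
    return total

enclosing = loop_integral(0.0, 0.0, 1.0)      # path encircles the whisker
non_enclosing = loop_integral(2.0, 0.0, 0.5)  # path does not
```

The first loop returns (numerically) the full flux $\Phi$; the second returns zero, since $\nabla\times\mathbf{A}$ vanishes everywhere off the whisker.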
It is the unique (in all physically realizable problems) transverse part of the vector potential that carries all of the physics. This function is, in every case, due to the penetration of some magnetic flux-carrying region by the electron's electric field. This was already known to Thomson (1904), and it appears in a text by Konopinski (1981). A rigorous proof will be given in the following section. $\mathbf{A}^l(\mathbf{r})$ carries no physics. One takes advantage of the existence of $\mathbf{A}^l(\mathbf{r})$ in order to do manifestly covariant calculations. All of the physics is contained in the electric and magnetic fields, and hence, in the transverse part of the vector potential. In any problem involving a magnetic field, one should imagine the charged particle's electric field lines intersecting magnetic field lines. This is what the transverse vector potential (the physical part of the vector potential!) is all about.

III. Dynamics of the Aharonov-Bohm Effect

In order to fully understand the forces involved in the AB effect, it is essential to consider the effect of the electron's own fields on the solenoid. We follow the approach of Zhu and Henneberger (1990). The interaction energy in the AB problem is purely magnetic.

$$E_{int} = \frac{1}{4\pi}\int \mathbf{B}_e(\mathbf{r}'-\mathbf{r})\cdot\mathbf{B}_s(\mathbf{r}')\,d\mathbf{r}' \qquad (19)$$

with

$$\mathbf{B}_e = \frac{\mathbf{v}}{c}\times\mathbf{E}(\mathbf{r}'-\mathbf{r}) \qquad (20)$$

for an electron at point $\mathbf{r}$ moving with velocity $\mathbf{v}$. It is useful to compute

$$\frac{1}{4\pi}\,\nabla\int \mathbf{B}_e(\mathbf{r}'-\mathbf{r})\cdot\mathbf{B}_s(\mathbf{r}')\,d\mathbf{r}' = \frac{1}{4\pi c}\,\nabla\int \mathbf{v}\times\mathbf{E}(\mathbf{r}'-\mathbf{r})\cdot\mathbf{B}_s(\mathbf{r}')\,d\mathbf{r}' = \frac{1}{4\pi c}\,\nabla\int \mathbf{v}\cdot\mathbf{E}(\mathbf{r}'-\mathbf{r})\times\mathbf{B}_s(\mathbf{r}')\,d\mathbf{r}'$$
$$= \nabla(\mathbf{v}\cdot\mathbf{P}_{em}) = (\mathbf{v}\cdot\nabla)\mathbf{P}_{em} + \mathbf{v}\times(\nabla\times\mathbf{P}_{em}) = \frac{d}{dt}\mathbf{P}_{em} + \mathbf{v}\times(\nabla\times\mathbf{P}_{em}). \qquad (21)$$

We remind the reader that $\mathbf{P}_{em} = \frac{1}{4\pi c}\int \mathbf{E}\times\mathbf{B}_s\,d\mathbf{r}'$. As the reader has probably guessed, the term $\mathbf{v}\times(\nabla\times\mathbf{P}_{em})$ is just the Lorentz force. This
can be seen as follows:

$$\frac{1}{4\pi c}\,\nabla\times\int \mathbf{E}(\mathbf{r}'-\mathbf{r})\times\mathbf{B}_s(\mathbf{r}')\,d\mathbf{r}' = -\frac{1}{4\pi c}\int \mathbf{B}_s(\mathbf{r}')\,[\nabla\cdot\mathbf{E}(\mathbf{r}'-\mathbf{r})]\,d\mathbf{r}' + \frac{1}{4\pi c}\int [\mathbf{B}_s(\mathbf{r}')\cdot\nabla]\,\mathbf{E}(\mathbf{r}'-\mathbf{r})\,d\mathbf{r}'. \qquad (22)$$

The relation $\nabla\cdot\mathbf{E}(\mathbf{r}'-\mathbf{r}) = -\nabla'\cdot\mathbf{E}(\mathbf{r}'-\mathbf{r})$ yields

$$\nabla\times\mathbf{P}_{em} = \frac{1}{4\pi c}\int \mathbf{B}_s\,4\pi e\,\delta(\mathbf{r}'-\mathbf{r})\,d\mathbf{r}' + \frac{1}{4\pi c}\int [\mathbf{B}_s(\mathbf{r}')\cdot\nabla]\,\mathbf{E}(\mathbf{r}'-\mathbf{r})\,d\mathbf{r}'. \qquad (23)$$

The first term of Eq. (23) is $(e/c)\mathbf{B}_s(\mathbf{r})$. The second term can be shown to vanish. Let $\boldsymbol{\epsilon}$ be an arbitrary constant vector:

$$\boldsymbol{\epsilon}\cdot\int [\mathbf{B}_s(\mathbf{r}')\cdot\nabla]\,\mathbf{E}(\mathbf{r}'-\mathbf{r})\,d\mathbf{r}' = -\int [\mathbf{B}_s(\mathbf{r}')\cdot\nabla']\,[\boldsymbol{\epsilon}\cdot\mathbf{E}(\mathbf{r}'-\mathbf{r})]\,d\mathbf{r}' = -\int \nabla'\cdot\{[\boldsymbol{\epsilon}\cdot\mathbf{E}(\mathbf{r}'-\mathbf{r})]\,\mathbf{B}_s(\mathbf{r}')\}\,d\mathbf{r}' + \int [\boldsymbol{\epsilon}\cdot\mathbf{E}(\mathbf{r}'-\mathbf{r})]\,[\nabla'\cdot\mathbf{B}_s(\mathbf{r}')]\,d\mathbf{r}'. \qquad (24)$$

The first integral can be converted into a surface integral that vanishes at infinity. The second integral vanishes since $\nabla\cdot\mathbf{B}_s = 0$. We thus have the result
$$\frac{1}{4\pi}\,\nabla\int \mathbf{B}_e(\mathbf{r}'-\mathbf{r})\cdot\mathbf{B}_s(\mathbf{r}')\,d\mathbf{r}' = \frac{d\mathbf{P}_{em}}{dt} + \frac{e}{c}\,\mathbf{v}\times\mathbf{B}_s. \qquad (25)$$

Equation (25) provides insight into the dynamics of the AB problem. The Lorentz force is the time rate of change of the kinetic momentum $\boldsymbol{\pi}$. The left-hand side of Eq. (25) reads $\mathbf{F} = \nabla(E_{int})$, not $\mathbf{F} = -\nabla(E_{int})$, with $E_{int}$ given by Eq. (19). We are assuming that the current in the solenoid windings and the probability current density of the electron are kept constant. The consequences of this assumption will be discussed later. It is well known that the magnetic force between current-carrying conductors is given by $\mathbf{F} = \nabla(E_{int})$ when the currents are kept constant. Our force now reads

$$\frac{1}{4\pi}\,\nabla\int \mathbf{B}_e(\mathbf{r}'-\mathbf{r})\cdot\mathbf{B}_s(\mathbf{r}')\,d\mathbf{r}' = \frac{d\mathbf{P}}{dt} \qquad (26)$$

with $\mathbf{P} = \boldsymbol{\pi} + \mathbf{P}_{em}$. The usual statement that there is no force in the AB problem requires clarification. There is no mechanical force (i.e., no force on
the electron). However, the rate of change of total momentum is not zero, because the electric and magnetic fields of the electron penetrate the solenoid. (In cases in which the interior of the solenoid is shielded, the electric and magnetic fields of the electron interact with currents in the shielding materials. An extensive discussion of this will be given shortly.) In the preceding paragraphs, we have seen that

$$\nabla\times\mathbf{P}_{em} = \frac{e}{c}\,\mathbf{B}. \qquad (27)$$

It is therefore clear that there exists a "natural" gauge in which

$$\mathbf{P}_{em} = \frac{e}{c}\,\mathbf{A}. \qquad (28)$$
The reader should not be surprised to learn that this "natural" gauge is the Coulomb gauge.
\[
\begin{aligned}
\mathbf{P}_{\rm em}(\mathbf{r}) &= \frac{1}{4\pi c}\int \mathbf{E}(\mathbf{r}'-\mathbf{r})\times[\nabla'\times\mathbf{A}(\mathbf{r}')]\, d^3r' \\
&= \frac{1}{4\pi c}\int \nabla'[\mathbf{E}(\mathbf{r}'-\mathbf{r})\cdot\mathbf{A}(\mathbf{r}')]\, d^3r'
- \frac{1}{4\pi c}\int [\mathbf{A}(\mathbf{r}')\cdot\nabla']\,\mathbf{E}(\mathbf{r}'-\mathbf{r})\, d^3r' \\
&\quad - \frac{1}{4\pi c}\int [\mathbf{E}(\mathbf{r}'-\mathbf{r})\cdot\nabla']\,\mathbf{A}(\mathbf{r}')\, d^3r'
- \frac{1}{4\pi c}\int \mathbf{A}(\mathbf{r}')\times[\nabla'\times\mathbf{E}(\mathbf{r}'-\mathbf{r})]\, d^3r'.
\end{aligned} \tag{29}
\]
The fourth term of Eq. (29) vanishes because ∇′ × E = 0 for the Coulomb field of the electron. The first term vanishes by a corollary of Gauss' theorem. The second integral vanishes in Coulomb gauge. (We assume that the external magnetic field has a finite source, so that the fields vanish at infinity.) Again, taking ε to be an arbitrary constant vector, we find
\[
\boldsymbol{\varepsilon}\cdot\int [\mathbf{A}(\mathbf{r}')\cdot\nabla']\,\mathbf{E}(\mathbf{r}'-\mathbf{r})\, d^3r'
= \int \nabla'\cdot\{\mathbf{A}(\mathbf{r}')\,[\boldsymbol{\varepsilon}\cdot\mathbf{E}(\mathbf{r}'-\mathbf{r})]\}\, d^3r'
- \int [\boldsymbol{\varepsilon}\cdot\mathbf{E}(\mathbf{r}'-\mathbf{r})]\,\nabla'\cdot\mathbf{A}(\mathbf{r}')\, d^3r'. \tag{30}
\]
The first integral of Eq. (30) can be converted into a vanishing surface integral. The second term vanishes in Coulomb gauge, ∇′ · A(r′) = 0. The
third integral of Eq. (29) is then
\[
-\frac{1}{4\pi c}\int [\mathbf{E}(\mathbf{r}'-\mathbf{r})\cdot\nabla']\,\mathbf{A}(\mathbf{r}')\, d^3r'
= \frac{1}{4\pi c}\int \mathbf{A}(\mathbf{r}')\,[\nabla'\cdot\mathbf{E}(\mathbf{r}'-\mathbf{r})]\, d^3r'
= \frac{1}{c}\int \mathbf{A}(\mathbf{r}')\, e\,\delta(\mathbf{r}'-\mathbf{r})\, d^3r'
= \frac{e}{c}\,\mathbf{A}(\mathbf{r}). \tag{31}
\]
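The identification P_em = (e/c)A can be checked numerically for the textbook case of an ideal infinite solenoid (uniform field B inside radius R, zero outside) with the electron a distance d > R from the axis. The z′ integration can be done analytically (∫ dz′/(ρ² + z′²)^{3/2} = 2/ρ²), leaving a two-dimensional integral over the cross section. The following is an illustrative sketch, not part of the original text; all parameter values (d = 3, R = B = e = c = 1, Gaussian-style units) are arbitrary choices for the check.

```python
import math

def p_em_transverse(d, R=1.0, B=1.0, e=1.0, c=1.0, n=300):
    """Transverse component of (1/4*pi*c) * integral of E_e x B over the
    solenoid interior, for an electron at distance d from the axis.
    The z' integral has been done analytically, leaving
    P = (e*B/(2*pi*c)) * double-integral over the disk of
        (d - x)/((d - x)**2 + y**2)."""
    total = 0.0
    for i in range(n):                      # polar midpoint grid on the disk
        rho = (i + 0.5) * R / n
        for j in range(n):
            phi = (j + 0.5) * 2.0 * math.pi / n
            x, y = rho * math.cos(phi), rho * math.sin(phi)
            total += (d - x) / ((d - x) ** 2 + y ** 2) * rho
    total *= (R / n) * (2.0 * math.pi / n)  # area weights
    return e * B * total / (2.0 * math.pi * c)

d = 3.0
numeric = p_em_transverse(d)
# Coulomb-gauge vector potential outside the solenoid: A = Phi/(2*pi*d)
coulomb = (1.0 * math.pi * 1.0 ** 2) / (2.0 * math.pi * d)
print(numeric, coulomb)
```

Both numbers agree (here 1/6), confirming that the field-momentum integral reproduces (e/c)A with A evaluated in Coulomb gauge.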
Thus, we have a rigorous proof that for any problem involving sources of finite extent in space, the electromagnetic momentum is given by (e/c)A(r), where A(r) is the uniquely defined vector potential in Coulomb gauge.

IV. Momentum Conservation in the Aharonov-Bohm Effect

We consider the more general problem of a moving electron in or near a magnetic field, as discussed by Al-Jaber and Henneberger (1992). We consider a source of flux that is rigid and fixed in spatial orientation. Let a fixed point in the body of the source have a displacement R. Let r be the displacement vector of the electron. The interaction energy of the current source with the electron is
\[
E_{\rm int} = \frac{1}{4\pi}\int \mathbf{B}_e(\mathbf{r}'-\mathbf{r})\cdot\mathbf{B}_s(\mathbf{r}'-\mathbf{R})\, d^3r'. \tag{32}
\]
Equation (32) is a trivial generalization of Eq. (19). The force on the whisker is
\[
\mathbf{F}_s = \nabla_{\mathbf{R}}\,\frac{1}{4\pi}\int \mathbf{B}_e(\mathbf{r}'-\mathbf{r})\cdot\mathbf{B}_s(\mathbf{r}'-\mathbf{R})\, d^3r'. \tag{33}
\]
In the nonrelativistic limit
\[
\mathbf{B}_e(\mathbf{r}'-\mathbf{r}) = \frac{\mathbf{v}}{c}\times\mathbf{E}(\mathbf{r}'-\mathbf{r}), \tag{34}
\]
so that F_s is given by
\[
\begin{aligned}
\mathbf{F}_s &= \nabla_{\mathbf{R}}\,\frac{1}{4\pi c}\int [\mathbf{v}\times\mathbf{E}(\mathbf{r}'-\mathbf{r})]\cdot\mathbf{B}_s(\mathbf{r}'-\mathbf{R})\, d^3r' \\
&= \nabla_{\mathbf{R}}\,\mathbf{v}\cdot\frac{1}{4\pi c}\int \mathbf{E}(\mathbf{r}'-\mathbf{r})\times\mathbf{B}_s(\mathbf{r}'-\mathbf{R})\, d^3r' \\
&= \nabla_{\mathbf{R}}\,[\mathbf{v}\cdot\mathbf{P}_{\rm em}(\mathbf{r}-\mathbf{R})]
= -\nabla\,[\mathbf{v}\cdot\mathbf{P}_{\rm em}(\mathbf{r}-\mathbf{R})].
\end{aligned}
\]
We have seen in Eq. (21) that
\[
\nabla(\mathbf{v}\cdot\mathbf{P}_{\rm em}) = \frac{d\mathbf{P}_{\rm em}}{dt} + \frac{e}{c}\,\mathbf{v}\times\mathbf{B}_s. \tag{35}
\]
We thus have
\[
\mathbf{F}_s + \frac{d\mathbf{P}_{\rm em}}{dt} + \frac{e}{c}\,\mathbf{v}\times\mathbf{B}_s = 0, \tag{36}
\]
where F_s is the force exerted on the source of the external field. In the case of the AB effect, the Lorentz force is zero. The force on the flux whisker or solenoid is the negative of the time rate of change of the electromagnetic momentum. If no other forces act on the flux whisker, the total momentum of the flux whisker plus that of the electromagnetic field is conserved. The forces on the field and on the flux whisker form an action-reaction pair. The role of the electromagnetic momentum is central to the Aharonov-Bohm and Aharonov-Casher effects. This has been discussed by several authors (Goldhaber, 1989; Zhu and Henneberger, 1990).
V. Stability of the AB Effect

In the previous section, we have assumed the flux in the solenoid to be absolutely constant. Let us examine what this assumption entails. We assume an ideal solenoid of n turns per cm. In order to maintain the solenoid flux constant, the current must be well regulated. The passing electron induces a voltage in a length dz of the windings given by
\[
d\mathcal{E} = -\frac{n\, dz}{c}\,\frac{d}{dt}\int B_{e,z}(\mathbf{r})\, da, \tag{37}
\]
where the integration is over the cross section of the solenoid. The total induced voltage in the solenoid is therefore
\[
\mathcal{E} = -\frac{n}{c}\,\frac{d}{dt}\int B_{e,z}\, d^3r, \tag{38}
\]
where the integration is over the volume of the solenoid. To maintain the current (and thus, the solenoid flux) fixed, the current source must produce an additional voltage given by the negative of Eq. (38). The source must therefore produce an additional instantaneous power
\[
P = \frac{nI}{c}\,\frac{d}{dt}\int B_{e,z}\, d^3r = \frac{1}{4\pi}\,\frac{d}{dt}\int \mathbf{B}_s\cdot\mathbf{B}_e\, d^3r. \tag{39}
\]
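The sign claim made below for Eq. (39) can be illustrated numerically. For a nonrelativistic electron moving along x with impact parameter b, the z component of its magnetic field in the plane z = 0 is B_z = (ev/c)(y − b)/[(x − vt)² + (y − b)²]^{3/2}; integrating this over the solenoid cross section for passage on one side and then the other shows that the induced flux, and hence the power, changes sign with the side of passage. This is an illustrative sketch with arbitrary units (e = v = c = 1), not part of the original text.

```python
import math

def electron_flux(b, t, R=1.0, v=1.0, e=1.0, c=1.0, n=200):
    """Flux of the passing electron's B_z through the solenoid cross
    section (disk of radius R centered at the origin, z = 0 plane).
    Electron position: (v*t, b, 0), velocity along +x."""
    total = 0.0
    for i in range(n):                      # polar midpoint grid on the disk
        rho = (i + 0.5) * R / n
        for j in range(n):
            phi = (j + 0.5) * 2 * math.pi / n
            x, y = rho * math.cos(phi), rho * math.sin(phi)
            s2 = (x - v * t) ** 2 + (y - b) ** 2
            total += e * v / c * (y - b) / s2 ** 1.5 * rho
    return total * (R / n) * (2 * math.pi / n)

left = electron_flux(+2.0, 0.5)   # electron passes on one side
right = electron_flux(-2.0, 0.5)  # ... and on the other
print(left, right)
```

The two fluxes come out equal in magnitude and opposite in sign, so the compensating power delivered by the current source flips sign with the side on which the electron passes.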
Thus, the additional field energy must be supplied by the current source of the solenoid. It is interesting that, in an ideal AB experiment, this power is positive or negative depending on whether the electron passes to the right
THE AHARONOV—BOHM EFFECT — A SECOND OPINION
71
or to the left of the solenoid. In an ideal AB experiment, the power source itself must become a quantum system! The conditions of an ideal AB experiment clearly can not be realized in the laboratory. There are many magnetic fields acting on the solenoid that together induce a larger voltage than the field of the passing electron, for example, fluctuations in the flux due to current fluctuations, fields of other passing charged particles, and the field due to the electron's own magnetic moment. In the following, it is shown that small voltage fluctuations that are not correlated with the motion of the electron do not affect the fringe pattern. The wave properties of a particle in quantum theory depend upon the canonical momentum (the sum of π and P_em). Let δΦ be the fluctuation in the flux that occurs in time δt due to all random causes. This fluctuation in the flux induces an electric field on an electron
\[
E_\phi = -\frac{1}{2\pi r c}\,\frac{\delta\Phi}{\delta t}, \tag{40}
\]
which gives rise to the momentum change
\[
\delta\pi_\phi = eE_\phi\,\delta t = -\frac{e\,\delta\Phi}{2\pi r c}.
\]
The change in P_em is (with A in Coulomb gauge, of course)
\[
\delta P_{{\rm em},\phi} = \frac{e}{c}\,\delta A_\phi = \frac{e\,\delta\Phi}{2\pi r c}. \tag{41}
\]
We see that
\[
\delta p_\phi = \delta\pi_\phi + \delta P_{{\rm em},\phi} = 0. \tag{42}
\]
The total momentum (and hence, the local wavelength) is unaffected by variations in the flux. Were it not for Eq. (42), observations of the AB effect would be much more difficult. Equation (42) guarantees the stability of the fringes.
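Equations (40)-(42) can be checked with a short numerical experiment: impose an arbitrary flux history Φ(t), accumulate the azimuthal impulse ∫ eE_φ dt delivered to the electron by the induced field, and compare it with the change in (e/c)A_φ. The flux function below is an arbitrary invention for the demonstration; this sketch (arbitrary units) is not part of the original text.

```python
import math

e, c, r = 1.0, 1.0, 2.0        # arbitrary units; electron at radius r

def flux(t):
    # an arbitrary, smooth flux fluctuation
    return 5.0 + 0.3 * math.sin(3.0 * t) + 0.1 * t

T, n = 4.0, 4000
dt = T / n
impulse = 0.0                   # integral of e * E_phi dt, Eq. (40)
for k in range(n):
    t = k * dt
    dphi_dt = (flux(t + dt) - flux(t)) / dt
    impulse += e * (-dphi_dt / (2 * math.pi * r * c)) * dt

# change in (e/c) A_phi with A_phi = Phi/(2*pi*r), Eq. (41)
delta_P = e * (flux(T) - flux(0.0)) / (2 * math.pi * r * c)
print(impulse + delta_P)        # ~ 0: canonical momentum unchanged, Eq. (42)
```

The two contributions cancel to rounding error for any flux history, which is the content of Eq. (42).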
VI. The AB Effect Can Not Be Shielded The first of the objections of Section II has now been discussed in detail, and we have seen the role played by the electromagnetic momentum, or ‘‘potential momentum’’ in the AB effect. We now turn our attention to the observation that fields can be shielded from the interior of the flux-carrying solenoid, and the AB effect persists.
The beautiful and well-known experiment of Tonomura and coworkers (1986a,b), which demonstrates the AB effect in a flux-carrying torus surrounded by a superconducting shield, has an aspect that is seldom discussed. The purpose of the shield was to ensure that no flux leakage occurs. The shield enabled flux quantization to be observed when it was at a superconducting temperature. A thoughtful person may wonder why, at superconducting temperature, there was any effect at all. Since neither electric nor magnetic fields penetrate the shield, how does information regarding the flux reach the electron? This question may be answered at two levels. The obvious answer is the superficial one: The AB effect is an effect of the vector potential, and the vector potential is always related to the flux by Stokes' theorem. Stokes' theorem is a mathematical identity. It can not be violated. Hence, the result of the Tonomura experiment is the only one possible. Many readers probably find this explanation satisfactory. On the other hand, we have seen that, in Coulomb gauge, the vector potential is related to the electromagnetic momentum of the passing electron by
\[
\frac{e}{c}\,\mathbf{A}(\mathbf{r}) = \mathbf{P}_{\rm em} = \frac{1}{4\pi c}\int \mathbf{E}_e(\mathbf{r}'-\mathbf{r})\times\mathbf{B}_s(\mathbf{r}')\, d^3r'.
\]
Thus, in a manner of speaking, the vector potential (modulo a gauge transformation) is just another name for the electromagnetic momentum. This appears to present a problem. If the electric field can not overlap the magnetic flux because of the presence of the superconducting shield, how can there be a vector potential? How is information regarding the flux transmitted to the electron? The answer to this question is not difficult. Consider an AB effect, either with a flux whisker or a solenoid along the z axis, or a toroidal flux, as in the Tonomura experiment. In either case, let the flux be completely surrounded by a superconducting shield. Imagine an electron passing in the x-y plane. The inner surface of the shield will, in general, carry some current because of flux quantization. The total flux is the sum of the flux due to the whisker and the flux due to the interior surface current. In the absence of passing charged particles, the outer surface of the shield can carry no current. A current on the outer surface would create a magnetic field inside the superconducting material. Now consider the changes brought about by a passing electron. In the following, we represent the electromagnetic momentum of the electron by P_e and the electromagnetic momentum of the superconducting electrons on
the outer surface of the shield by P_s. These quantities are then given by
\[
\mathbf{P}_e = \frac{1}{4\pi c}\int \mathbf{E}_e(\mathbf{r}'-\mathbf{r})\times\mathbf{B}_s(\mathbf{r}')\, d^3r' \tag{43}
\]
\[
\mathbf{P}_s = \frac{1}{4\pi c}\int \mathbf{E}_s(\mathbf{r}')\times\mathbf{B}_s(\mathbf{r}')\, d^3r'. \tag{44}
\]
In the foregoing, E_e(r′ − r) is the electric field at a point r′ due to an electron at the point r. E_s(r′) is the electric field due to the superconducting electrons (together with the background charge) at the point r′. The integrals are over all space, but the integrands are nonvanishing only in the region enclosed by the inner surface of the shield. As the net electric field inside the outer surface of the superconductor must vanish, we have
\[
\mathbf{E}_e(\mathbf{r}'-\mathbf{r}) = -\mathbf{E}_s(\mathbf{r}') \tag{45}
\]
at all points r′ inside the outer superconductor surface, and therefore we have
\[
\mathbf{P}_e = -\mathbf{P}_s. \tag{46}
\]
The electromagnetic momenta of the electron and of the surface charges of the superconducting shield are equal and opposite. However, they do not cancel. They are momenta of different charges. Differentiation of Eq. (46) with respect to time yields a relation that is reminiscent of Newton's third law. It is important to realize that the field of the passing electron is seen as an external field (as opposed to a self-field) by the flux whisker and shield. The fields of the superconducting shield are, likewise, seen as external fields by the passing electron. Thus, the passing electron experiences a change in its electromagnetic momentum given by
\[
\Delta\mathbf{P}_e = \frac{1}{4\pi c}\int \mathbf{E}_e(\mathbf{r}'-\mathbf{r})\times\mathbf{B}_{sc}(\mathbf{r}')\, d^3r', \tag{47}
\]
where B_sc(r′) is the magnetic field of the superconducting electrons which, in the interior of the superconductor, just cancels the magnetic field of the passing electron. Finally, we recall that the wave properties of particles depend upon the canonical momentum, which is
\[
\mathbf{p} = \boldsymbol{\pi} + \mathbf{P}_{\rm em}. \tag{48}
\]
At a given time, the vector potential seen by a passing electron is
\[
\mathbf{A}(\mathbf{r}, t) = \mathbf{A}_0(\mathbf{r}) + \Delta\mathbf{A}(\mathbf{r}, t) \tag{49}
\]
with ΔP_em = (e/c)ΔA(r, t). A₀(r) is the vector potential in the absence of passing electrons. Thus, if in a time dt, the change in the vector potential due to a supercurrent is d(ΔA), then the electric field experienced by the electron will be
\[
\mathbf{E} = -\frac{1}{c}\,\frac{\partial(\Delta\mathbf{A})}{\partial t}
\]
so that
\[
d\boldsymbol{\pi} = e\mathbf{E}\, dt = -\frac{e}{c}\, d(\Delta\mathbf{A}). \tag{50}
\]
Therefore, dp = dπ + (e/c) d(ΔA) = 0. Clearly, the superconducting shielding can have no effect on an AB experiment (except, of course, to bring about the usual flux quantization). The result has been derived independently of geometry. It is further clear that the problem of shielding is not of a new type. It is merely an example of the stability of the AB effect under small fluctuations in the flux. This was discussed in the previous section. The field of the supercurrent is just one example of such fluctuations.

VII. Interaction of a Passing Classical Electron with a Rotating Quantum Cylinder

In Section I it was argued that the usual single-valued stationary states, while mathematically correct, made no physical sense. The physics associated with them exhibits many contradictions. On the other hand, the total wave function "must" be single-valued. The way out of this dilemma is fairly easy to guess: The Schrödinger equation, as written down by Aharonov and Bohm, contains too few degrees of freedom. The AB effect is directly concerned with phase. A correct mathematical description of the effect depends critically on getting phase information right. In this section, the solenoid will be represented by a charged, rotating cylinder. The treatment is that of Henneberger and Opatrny (1994). The reader has already seen that there is a force exerted on the solenoid by the passing electron. The reader will also see that the passing electron exerts a torque on the rotating cylinder. Thus, in the external field approximation, phase information is lost. Therefore the external field approximation is inappropriate for discussion of the AB effect. In this section, this is shown to be the case. The model of a charged, rotating cylinder enables one to avoid a discussion of the quantum theory of the power source that supplies the
current to the solenoid (see the discussion associated with Eq. (39)). The model adopted here is similar to that of Peshkin et al. (1961). However, these authors discuss the change in angular velocity of the cylinder. The essential thing here, however, is not the change in the motion of the rotor, but the shift in its phase. We thus arrive at a conclusion that is just the opposite of the one drawn by these authors. We consider a cylinder of radius a and length l carrying a surface charge density , as shown in Fig. 4. The axis of the cylinder carries a line charge : 92a, so that the electric field of the cylinder is confined to its interior. Thus, the electrostatic potential vanishes in the region exterior to the cylinder. The cylinder is free to rotate about its axis (the z axis). An exact treatment of the problem in which an electron passes the rotating cylinder with fixed angular momentum consists, in general, of an infinite series of rapidly converging corrections to the unperturbed motion. The passing electron affects the motion of the cylinder and the change in the cylinder’s motion gives a further correction to the usual AB effect (which turns out to vanish, because of the AB effect’s stability, as already discussed). However, an angular acceleration of the cylinder would exert a small force on the electron. By the argument of Peshkin et al., in the limit of an infinitely massive cylinder, there will be no change in the cylinder’s angular velocity. The cylinder will, however, undergo a phase shift. We may therefore consider the treatment to be given here as exact. The method of this section is a semiclassical one. We write the state function for the electron—cylinder system as a product (r, , t) : ( , r, t)(r, t).
(51)
In Eq. (51), the function depends upon the electron coordinate r in order to allow for the influence of the passing electron on the cylinder. It is (r, , t) that describes an isolated system. It is this function that must be single valued. The product Ansatz of Eq. (51) is based on the fact that, in the limit of an extremely massive cylinder, the electron sees a constant flux, and the cylinder moves with virtually constant angular velocity, but undergoes a finite phase shift. We next assume that ( , r, t) is the solution of the quantum problem for the cylinder interacting with a passing classical electron. It may appear strange to the reader to do a quantum mechanical computation for a massive macroscopic object, but in light of the fact that the electron experiences no force (and thus moves in a straight line with constant speed, in accordance with the fact that the electron’s velocity is an integral of the motion, as seen earlier), the treatment here is exact, in the context of a semiclassical method. The reader who is unhappy with this method will find consolation in the fact that the complete quantum problem is solved exactly
in the next section. Because of the large number of skeptics who have no problem with the usual treatment of AB scattering, the author feels the need for at least two independent derivations of his result! We consider that a classical electron passes the cylinder with constant speed along a line parallel to the x axis with impact parameter b, as shown in Fig. 5. From Eqs. (26) and (28), we obtain
\[
\nabla\,\frac{1}{4\pi}\int \mathbf{B}_e(\mathbf{r}'-\mathbf{r})\cdot\mathbf{B}_s(\mathbf{r}')\, d^3r' = \frac{e}{c}\,\frac{d\mathbf{A}(\mathbf{r})}{dt} \tag{52}
\]
as dπ/dt = 0 in the AB problem. The reader is again reminded that Eq. (52) holds only in Coulomb gauge, the "natural" gauge for solving problems involving constant magnetic fields. The magnitude of B_s is related to the current per unit length K by the relation B_s = 4πK/c, with K = σaω. The angular
Figure 5. Electron passing a charged, rotating cylinder.
velocity of the cylinder is ω, so that B_s is given by
\[
B_s = \frac{4\pi}{c}\,\sigma a \omega. \tag{53}
\]
Equation (52), together with the fact that the electron has only one degree of freedom, yields
\[
\frac{\partial}{\partial x}\,\frac{\sigma a \omega}{c}\int_{\rm vol} B_{e,z}(\mathbf{r}'-\mathbf{r})\, d^3r'
= \frac{\partial}{\partial x}\,\frac{\sigma a \omega}{c}\int_0^l \Phi_e(z', x)\, dz'
= \frac{e}{c}\,\frac{dA_x(\mathbf{r})}{dt} = \frac{e}{c}\, v\,\frac{\partial A_x}{\partial x}. \tag{54}
\]
The last integral of Eq. (54) is over the length l of the cylinder. Φ_e(z′, x) denotes the magnetic flux due to the passage of an electron at point x through an area of the cylinder parallel to the x, y plane a distance z′ above it; vol denotes the volume of the cylinder, and v is the velocity of the electron. x and t are related by x = vt. The constancy of v assumes neglect of electric fields caused by the angular acceleration of the cylinder. In this approximation (which is exact in the limit of an infinitely massive cylinder), one may integrate Eq. (54) directly, obtaining
\[
\frac{e}{c}\, v A_x = \frac{\sigma a \omega}{c}\int_0^l \Phi_e(z', x)\, dz'. \tag{55}
\]
The cylinder has length l and (mechanical) moment of inertia I_m. The electromagnetic angular momentum of the cylinder is
\[
\mathbf{L}_{\rm em} = \frac{1}{4\pi c}\int \mathbf{r}\times(\mathbf{E}\times\mathbf{B})\, d^3r \tag{56}
\]
\[
L_{\rm em} = \frac{l}{4\pi c}\int_0^a r\,\frac{4\pi a\sigma}{r}\,\frac{4\pi}{c}\,\sigma a\omega\; 2\pi r\, dr
= \frac{4\pi^2\sigma^2 a^4 l}{c^2}\,\omega \tag{57}
\]
and I = I_m + 4π²σ²a⁴l/c² is the effective moment of inertia of a charged cylinder of length l. The interaction Lagrangian is given by (e/c)v·A, so that the Lagrangian for the cylinder is
\[
L_{\rm cyl} = \frac{I}{2}\,\omega^2 + \frac{e}{c}\, v A_x. \tag{58}
\]
The angle φ is the angle subtended by some fiducial mark on the cylinder with the perpendicular to the trajectory indicated by the vector b in Fig. 4. The flux is given by
\[
\Phi = \pi a^2 B_s = \frac{4\pi^2 a^3 \sigma}{c}\,\omega. \tag{59}
\]
The relations
\[
A = \frac{2\pi\sigma a^3\omega}{c\sqrt{b^2 + x^2}}, \qquad \cos\theta = \frac{b}{\sqrt{b^2 + x^2}} \tag{60a}
\]
\[
A_x = A\cos\theta = \frac{2\pi\sigma a^3 b\,\omega}{c\,(b^2 + x^2)} \tag{60b}
\]
yield
\[
L_{\rm cyl} = \frac{1}{2}\, I\omega^2 + \frac{2\pi e\sigma a^3 b v\,\omega}{c\,(b^2 + x^2)}. \tag{61}
\]
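A quick numerical consistency check of Eqs. (59) and (60b): the line integral of A_x over the entire straight-line trajectory should equal half the flux, ∫A_x dx = Φ/2, independently of the impact parameter b. This is the familiar statement that passage on one side of the flux picks up half of a full encirclement. The sketch below, with arbitrary parameter values, is not part of the original text.

```python
import math

sigma, a, omega, c, b = 0.7, 0.5, 1.3, 1.0, 2.0   # arbitrary units

def A_x(x):
    # Eq. (60b)
    return 2 * math.pi * sigma * a**3 * b * omega / (c * (b**2 + x**2))

# midpoint rule over a wide window approximating (-inf, inf)
X, n = 4000.0, 400000
h = 2 * X / n
integral = sum(A_x(-X + (k + 0.5) * h) for k in range(n)) * h

flux = 4 * math.pi**2 * a**3 * sigma * omega / c   # Eq. (59)
print(integral, flux / 2)
```

The numerical line integral reproduces Φ/2 to the accuracy of the truncated tails, for any b, as the analytic integral ∫ dx/(b² + x²) = π/b shows.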
The canonical angular momentum is
\[
P_\phi = \frac{\partial L_{\rm cyl}}{\partial\omega} = I\omega + R(x), \qquad
R(x) = \frac{2\pi e\sigma a^3 b v}{c\,(b^2 + x^2)}, \tag{62}
\]
and x = vt. The relation ∂L_cyl/∂φ = 0 indicates that P_φ is conserved. The wave function χ(φ, r, t) of Eq. (51) is of the form
\[
\chi(\phi, \mathbf{r}, t) = \exp\!\left[\frac{i}{\hbar}\left(\int^{\phi} P_\phi(\phi')\, d\phi' - \int_{-\infty}^{t} E(t')\, dt'\right)\right]. \tag{63}
\]
The interaction between the electron and the rotating cylinder is very weak. The kinetic energy of the electron is conserved. Energy conservation therefore dictates that the sum of the energy of the cylinder plus the magnetic interaction energy is conserved. This is easily checked. We define the quantity Ω as the value of ω at t = −∞. Then Eq. (62) yields
\[
I\Omega = I\omega + R(x). \tag{64}
\]
Then
\[
\omega = \Omega - R(x)/I, \qquad \Delta\omega = -R(x)/I \tag{65}
\]
represents the change in ω due to the passage of the electron. The change in kinetic energy is then
\[
\Delta({\rm K.E.}) = I\omega\,\Delta\omega \simeq I\Omega\,\Delta\omega
= -\frac{2\pi e\sigma a^3 b v\,\Omega}{c\,(b^2 + x^2)}. \tag{66}
\]
The change in magnetic energy is
\[
\frac{1}{4\pi}\int \mathbf{B}_e(\mathbf{r}'-\mathbf{r})\cdot\mathbf{B}_s(\mathbf{r}')\, d^3r'
= \frac{\sigma a \Omega}{c}\int_0^l \Phi_e(z', x)\, dz'.
\]
It is now clear that any phase shift of the cylinder must be due to a lag in the angle φ(t).
\[
\frac{e}{c}\, v A_x = \frac{2\pi e\sigma a^3 b v\,\Omega}{c\,(b^2 + x^2)} = -\Delta({\rm K.E.}) \tag{67}
\]
Indeed, the phase shift in Eq. (63) is given by
\[
\frac{1}{\hbar}\int_{-\infty}^{\infty} P_\phi\,\Delta\omega\, dt
= -\frac{2\pi e\sigma a^3 b\,\Omega}{\hbar c}\int_{-\infty}^{\infty} \frac{dx'}{b^2 + x'^2}
= -\frac{e}{\hbar c}\int_{-\infty}^{\infty} A_x(x')\, dx'. \tag{68}
\]
We see that the phase shift of χ(φ, r, t) is just the negative of the Dirac phase shift that a quantum electron experiences in the vector potential of a classical flux. The preceding result has serious consequences. Returning to the requirement that the total wave function Ψ(r, φ, t) = χ(φ, r, t)ψ(r, t) be single-valued, we see that ψ(r, t) can not be a single-valued function. It must contain a phase factor
\[
\exp\!\left[\frac{ie}{\hbar c}\int^{\mathbf{r}} \mathbf{A}(\mathbf{r}')\cdot d\mathbf{r}'\right]
= \exp\!\left[\frac{ie}{\hbar c}\int^{\theta} \frac{\Phi}{2\pi r'}\, r'\, d\theta'\right]
= \exp(-i\alpha\theta) \tag{69}
\]
with α defined in Eq. (3). The cylinder may be arbitrarily massive; one must nevertheless take into account the phase shift of the cylinder. Failure to do so yields incorrect eigenfunctions for the system and physics that is simply nonsense. One cannot avoid this phase factor of the source of the magnetic field. In order to have true stationary states, one must have a closed system, that is, there can be no transfer of energy to or from the system. It will not do to avoid discussion of the source of the magnetic field by simply powering the solenoid with power from the local power company. As has been discussed in Section IV, this merely complicates the problem by making the local power company part of the quantum system! This cancellation of phases between parts of an isolated system is not accidental. It always occurs. It has been shown by Yu and Henneberger (1996) that this also occurs in the Aharonov-Casher effect. The authors go on to show that, in every closed system, the sum of the phase shifts must be zero. The proof follows.
We consider an isolated system having N degrees of freedom with coordinates q₁, q₂, . . . , q_N. The system is described by a Lagrangian L, with canonically conjugate momenta given by
\[
p_k = \frac{\partial L}{\partial \dot q_k}, \qquad k = 1, \ldots, N.
\]
The principle of least action states that
\[
\delta\int L\, dt = 0, \tag{70}
\]
with
\[
L = \sum_k p_k \dot q_k - H, \tag{71}
\]
with the second of these equations defining the Hamilton function H. Equations (70) and (71) then yield
\[
\delta\int \sum_k p_k \dot q_k\, dt - \delta\int H\, dt = 0. \tag{72}
\]
These equations hold for arbitrary infinitesimal variations in coordinates. Instead of considering arbitrary variations, we first consider the system without interaction. Interactions that result only in a path-dependent phase factor are very weak. We therefore consider the variations δq and δp to be those resulting from an adiabatic switching on and off of the interaction leading to the phase shifts. The treatment here is especially suited to very weak interactions involving free particles, such as in the AB effect. The Hamiltonian H is conserved, and the adiabatic switching on and off of the interaction occurs when the interacting parts are separated by very great distances. Hence, we have
\[
\delta\int H\, dt = 0. \tag{73}
\]
Equation (72) now tells us that
\[
\delta\int \sum_k p_k \dot q_k\, dt = \sum_k \delta\oint p_k\, dq_k = 0. \tag{74}
\]
But δ∮p_k dq_k is just the phase shift associated with the coordinate q_k, multiplied by ħ. We note that in the case considered here the δq_k do not necessarily vanish at the endpoints of the integral. The variations here are not virtual displacements, but real ones. This is quite legitimate. In the usual theory, the
δq_k are chosen to vanish at the endpoints of the time integral for convenience, as the displacements are only virtual. All that is required is the variational condition of Eq. (70). Our result, expressed in terms of phases, is
\[
\frac{1}{\hbar}\sum_k \delta\oint p_k\, dq_k = 0 \tag{75}
\]
for any isolated, weakly interacting system. This result appears almost obvious; however it has profound implications. One must use great care in splitting an interacting system into a particle and an external field. Whenever the wave function of a particle in such a system undergoes a phase shift, the wave function of the rest of the system undergoes an equal and opposite phase shift. To ignore this phase shift of the remainder of the system is to violate the postulate of quantum theory that requires the wave function of the completely isolated system (which, to be completely accurate, is the wave function of the system!) to be single valued. The external field approximation is, as we have just seen, an approximation.

VIII. Solution of the Entire Problem of the Closed System

Because of widespread misunderstanding of the AB problem, it was useful to demonstrate the phase shift of the rotating cylinder in the previous section. In a problem where skepticism abounds, it is good to have two independent derivations, one corroborating the other. The complete problem is solved in this section by finding normal modes of the system and quantizing these normal modes. The method illustrates a shortcoming of quantum theory: The processes of canonical transformation and quantization do not commute! It is possible for two (or more!) persons to produce different solutions of a physical problem, all of which are mathematically correct. A wider discussion of this rather disconcerting phenomenon will be given at the end of this section. Returning to the problem of an electron in the region exterior to a charged, rotating cylinder, we find that the Lagrangian of the total system is
\[
L = \frac{1}{2}\,\mu \dot r^2 + \frac{1}{2}\,\mu r^2\dot\theta^2
+ \frac{e\mathfrak{f}}{2\pi c}\,\dot\theta\dot\phi + \frac{1}{2}\, I\dot\phi^2, \tag{76}
\]
where μ is the electron mass, r and θ are the electron coordinates, and φ is the angle turned by some fiducial mark on the cylinder, as before. The treatment here follows one that was published earlier by the author (Henneberger, 1997).
In this section, we must modify the model of Section VII slightly. The line charge along the z axis is replaced by a stationary charged cylinder having a radius a + ε, where ε is infinitesimal. The cylinder carries a surface charge −σ. In this way, all electric fields of the rotating cylinder have been eliminated (except for the infinitesimal region between the cylinders). The electromagnetic moment of inertia of the rotating cylinder is easily shown to be unchanged from the value found in Section VII. The vector potential at a distance r from the flux whisker is
\[
A_\theta = \frac{\mathfrak{f}\,\dot\phi}{2\pi r} \qquad \text{with} \qquad \mathfrak{f} = \frac{4\pi^2 a^3\sigma}{c}. \tag{77}
\]
As before, σ is the surface charge density on the rotating cylinder and a is the radius of the cylinder (assumed to be small). The moment of inertia of the cylinder is the total moment (the sum of mechanical and electromagnetic moments) as defined earlier. In Eq. (76), the interaction of the electron with the vector potential has been replaced by the interaction with its source, the rotating charged cylinder. The most direct method of treating this Lagrangian (and also the method that inspires the most confidence!) is to transform it to normal coordinates. The reader is reminded that Planck discovered the quantum of action by quantizing the normal modes of a cavity. If there is any system at all to which quantum theory can be applied without any sort of caveat, surely it is a system having normal modes. We begin with an orthogonal transformation to new variables χ₁ and χ₂, such that
\[
\chi_1 = \phi\cos\xi + \theta\sin\xi, \qquad
\chi_2 = -\phi\sin\xi + \theta\cos\xi. \tag{78}
\]
Equation (78) is a rotation through an angle ξ in the φ, θ space. The inverse transformation is
\[
\phi = \chi_1\cos\xi - \chi_2\sin\xi, \qquad
\theta = \chi_1\sin\xi + \chi_2\cos\xi. \tag{79}
\]
The angle ξ is assumed to be time-independent. This assumption will be seen to be justified in the limit I ≫ μr². This is in keeping with the usual assumption that the rotating cylinder is extremely massive. The assumption that ξ is time-independent yields
\[
\dot\phi = \dot\chi_1\cos\xi - \dot\chi_2\sin\xi, \qquad
\dot\theta = \dot\chi_1\sin\xi + \dot\chi_2\cos\xi. \tag{80}
\]
The Lagrangian of Eq. (76) now becomes
\[
\begin{aligned}
L = \frac{1}{2}\,\mu\dot r^2
&+ \frac{1}{2}\,\mu r^2(\dot\chi_1^2\sin^2\xi + \dot\chi_2^2\cos^2\xi + 2\dot\chi_1\dot\chi_2\sin\xi\cos\xi) \\
&+ \frac{e\mathfrak{f}}{2\pi c}\,(\dot\chi_1^2\sin\xi\cos\xi - \dot\chi_2^2\sin\xi\cos\xi
+ \dot\chi_1\dot\chi_2\cos^2\xi - \dot\chi_1\dot\chi_2\sin^2\xi) \\
&+ \frac{1}{2}\, I(\dot\chi_1^2\cos^2\xi - 2\dot\chi_1\dot\chi_2\sin\xi\cos\xi + \dot\chi_2^2\sin^2\xi).
\end{aligned} \tag{81}
\]
The angle ξ is chosen to have a value that causes the χ̇₁χ̇₂ term to vanish. This condition yields
\[
\mu r^2\sin\xi\cos\xi + \frac{e\mathfrak{f}}{2\pi c}\,(\cos^2\xi - \sin^2\xi) - I\sin\xi\cos\xi = 0. \tag{82}
\]
Equation (82) has the solution
\[
\tan 2\xi = \frac{e\mathfrak{f}}{\pi c\,(I - \mu r^2)} \simeq \frac{e\mathfrak{f}}{\pi c I}. \tag{83}
\]
Equation (83) shows that the assumption ξ̇ = 0 is valid for I ≫ μr². The angle ξ is very small, but it may not be assumed to vanish. Equation (83) yields
\[
\sin\xi \simeq \frac{e\mathfrak{f}}{2\pi c I} \qquad \text{and} \qquad \cos\xi \simeq 1. \tag{84}
\]
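The rotation of Eqs. (78)-(84) is an ordinary diagonalization of the kinetic-energy (mass) matrix in the (φ̇, θ̇) velocities, with coupling κ = e𝔣/2πc, and can be checked numerically. The parameter values below are arbitrary, and the matrix notation is an interpretive sketch of the step, not part of the original text.

```python
import math

mu_r2 = 1.0        # mu * r^2  (electron part)
I = 1.0e4          # cylinder moment of inertia, I >> mu*r^2
kappa = 0.05       # e*f/(2*pi*c), the weak coupling

# kinetic energy T = (1/2) [phi_dot, theta_dot] M [phi_dot, theta_dot]^T
M = [[I, kappa], [kappa, mu_r2]]

# Eq. (83): rotation angle that kills the cross term
xi = 0.5 * math.atan2(2 * kappa, I - mu_r2)

c, s = math.cos(xi), math.sin(xi)
# Eq. (78): chi1 = phi*cos + theta*sin, chi2 = -phi*sin + theta*cos,
# so the off-diagonal element of R M R^T must vanish:
off = (c * M[0][0] + s * M[0][1]) * (-s) + (c * M[0][1] + s * M[1][1]) * c
print(off)                 # ~ 0: the chi coordinates are normal coordinates
print(s, kappa / I)        # Eq. (84): sin(xi) ~ kappa/I for large I
```

The off-diagonal element vanishes at the angle of Eq. (83), and sin ξ reduces to κ/I in the massive-cylinder limit, as Eq. (84) states.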
The Lagrangian then becomes
\[
L = \frac{1}{2}\,\mu\dot r^2
+ \left(\frac{1}{2}\,\mu r^2\sin^2\xi + \frac{e\mathfrak{f}}{2\pi c}\sin\xi\cos\xi + \frac{1}{2}\, I\cos^2\xi\right)\dot\chi_1^2
+ \left(\frac{1}{2}\,\mu r^2\cos^2\xi - \frac{e\mathfrak{f}}{2\pi c}\sin\xi\cos\xi + \frac{1}{2}\, I\sin^2\xi\right)\dot\chi_2^2. \tag{85}
\]
The canonical momenta are
\[
p_r = \frac{\partial L}{\partial\dot r} = \mu\dot r, \qquad
p_{\chi_1} = \frac{\partial L}{\partial\dot\chi_1}
= \left(\mu r^2\sin^2\xi + \frac{e\mathfrak{f}}{\pi c}\sin\xi\cos\xi + I\cos^2\xi\right)\dot\chi_1, \qquad
p_{\chi_2} = \frac{\partial L}{\partial\dot\chi_2}
= \left(\mu r^2\cos^2\xi - \frac{e\mathfrak{f}}{\pi c}\sin\xi\cos\xi + I\sin^2\xi\right)\dot\chi_2. \tag{86}
\]
It is convenient to introduce the quantities Ĩ and μ̃:
\[
\tilde I = \mu r^2\sin^2\xi + \frac{e\mathfrak{f}}{\pi c}\sin\xi\cos\xi + I\cos^2\xi,
\qquad
\tilde\mu = \mu\cos^2\xi - \frac{e\mathfrak{f}}{\pi c r^2}\sin\xi\cos\xi + \frac{I}{r^2}\sin^2\xi. \tag{87}
\]
In the limit I → ∞, Ĩ → I and μ̃ → μ. The Lagrangian now has the simple form
\[
L = \frac{1}{2}\,\mu\dot r^2 + \frac{1}{2}\,\tilde\mu r^2\dot\chi_2^2 + \frac{1}{2}\,\tilde I\dot\chi_1^2 \tag{88}
\]
and the Hamiltonian is
\[
H = \frac{1}{2}\,\mu\dot r^2 + \frac{1}{2}\,\tilde\mu r^2\dot\chi_2^2 + \frac{1}{2}\,\tilde I\dot\chi_1^2
= \frac{p_r^2}{2\mu} + \frac{p_{\chi_2}^2}{2\tilde\mu r^2} + \frac{p_{\chi_1}^2}{2\tilde I}. \tag{89}
\]
In the limit of extremely large I, we may drop the tildes on Ĩ and μ̃. Then H becomes
\[
H = \frac{p_r^2}{2\mu} + \frac{p_{\chi_2}^2}{2\mu r^2} + \frac{p_{\chi_1}^2}{2I}. \tag{90}
\]
The eigenstates of this Hamiltonian are
\[
\psi(k, r, \chi_1, \chi_2) = J_m(kr)\, e^{im\chi_2}\, e^{iM\chi_1}, \tag{91}
\]
where m and M are integers and the electron kinetic energy is given by ħ²k²/2μ. We must, of course, exclude the m = 0 states, as these are nonvanishing at r = 0. We assume the radius of the cylinder to be effectively zero, as did Aharonov and Bohm. Equations (76) and (88) show that p_θ, p_φ, p_χ₁, and p_χ₂ are conserved quantities. We now turn to the physical significance of the angles χ₁ and χ₂.
\[
p_\phi = I\dot\phi + \frac{e\mathfrak{f}}{2\pi c}\,\dot\theta, \qquad
p_\theta = \mu r^2\dot\theta + \frac{e\mathfrak{f}}{2\pi c}\,\dot\phi, \qquad
p_{\chi_1} = \tilde I\dot\chi_1 = M\hbar, \qquad
p_{\chi_2} = \tilde\mu r^2\dot\chi_2 = m\hbar.
\]
\[
p_\theta - p_{\chi_2} = \mu r^2(\dot\theta - \dot\chi_2) + \frac{e\mathfrak{f}}{2\pi c}\,\dot\phi. \tag{92}
\]
With the second of Eqs. (80), this becomes
\[
p_\theta - p_{\chi_2} = \mu r^2\dot\chi_1\sin\xi + \frac{e\mathfrak{f}}{2\pi c}\,\dot\phi
\simeq \mu r^2\dot\chi_1\,\frac{e\mathfrak{f}}{2\pi c I} + \frac{e\mathfrak{f}}{2\pi c}\,\omega. \tag{93}
\]
The approximation μr² ≪ I then yields
\[
p_\theta - p_{\chi_2} = -\alpha\hbar \tag{94}
\]
with
\[
\alpha = -\frac{e\Phi}{ch},
\]
as defined by AB. Equation (94) shows that p_χ₂ is the kinetic angular momentum of the electron. In the AB problem, it is the kinetic angular momentum (not the canonical angular momentum) that is quantized. This kinetic angular momentum is, of course, the canonical angular momentum in the new coordinates. For the past 18 years the author has been emphasizing that this must be the case (Henneberger, 1981). The angles χ₁ and χ₂ are the angles φ and θ in the absence of any interaction. Multiplication of the first of Eqs. (79) by the (conserved) canonical angular momentum of the cylinder yields
\[
M\hbar\,(\phi - \chi_1) = -\chi_2\, M\hbar\sin\xi = -M\hbar\,\chi_2\,\frac{e\mathfrak{f}}{2\pi c I}, \tag{95}
\]
where terms of order e² have been neglected in the last equality. The relation Mħ/I = ω yields
\[
M\hbar\,(\phi - \chi_1) = -\frac{e\Phi}{ch}\,\hbar\,\chi_2 = \alpha\hbar\,\chi_2. \tag{96}
\]
Equation (96) may be written
\[
\Delta\!\oint p_\phi\, d\phi = -\,\Delta\!\oint p_\theta\, d\theta, \tag{97}
\]
where the symbol Δ refers to the change in these quantities due to the AB interaction. This is the result of Henneberger and Opatrny (1994), as was already given in the previous section. There is a second proof that it is the kinetic angular momentum that takes on integral values of ħ in AB problems. We combine the equations
\[
\frac{\hbar}{i}\,\frac{\partial\psi}{\partial\chi_2} = m\hbar\,\psi \tag{98}
\]
and
\[
\frac{\partial}{\partial\chi_2} = \cos\xi\,\frac{\partial}{\partial\theta} - \sin\xi\,\frac{\partial}{\partial\phi}. \tag{99}
\]
The approximation sin ξ ≈ tan ξ and cos ξ ≈ 1 yields
\[
m\hbar\,\psi = \frac{\hbar}{i}\,\frac{\partial\psi}{\partial\theta}
- \frac{e\mathfrak{f}}{2\pi c I}\,\frac{\hbar}{i}\,\frac{\partial\psi}{\partial\phi}. \tag{100}
\]
Putting (ħ/i)(∂/∂φ) = Iω and Φ = 𝔣ω then yields
\[
m\hbar\,\psi = \frac{\hbar}{i}\,\frac{\partial\psi}{\partial\theta} - \frac{e\Phi}{ch}\,\hbar\,\psi. \tag{101}
\]
Equation (101) shows that when the original canonical angular momentum operator acts on an eigenstate of (ħ/i)(∂/∂χ₂), one obtains
\[
m\hbar = L_{\rm can} - \frac{e\Phi}{ch}\,\hbar = L_{\rm can} + \alpha\hbar. \tag{102}
\]
Again, we see that, in the AB problem, the eigenvalues of (1/ħ)L_can are m − α, where m is an integer. The result of this section depends critically upon the quantization of the normal modes of the system. The author believes strongly that in the case of ambiguities it is the normal modes of the system that should be quantized. A person wishing to argue against this point of view might argue that one could find the Hamiltonian corresponding to the original Lagrangian of Eq. (76) and then simply postulate stationary states of the form
\[
\Psi(r, \theta, \phi) = f(r)\, e^{im\theta}\, e^{iM\phi}, \tag{103}
\]
where M and m are integers. This approach forces the quantization of the original canonical angular momenta, and leads directly to the solution for the electron found by Aharonov and Bohm. Several arguments against this solution have already been given. In classical theory, all sets of canonically conjugate variables are equivalent for solving a problem. This is not true in quantum theory. In general, a person who makes a canonical transformation before quantizing a system will arrive at a different result than one who does not. There are several other examples of this in the literature. These will be discussed in Section X.

IX. The Interior of the Solenoid

In the last section, we saw that the usual formulation of the AB problem suffers from the difficulty of having too few degrees of freedom. The reader may now wonder whether the well-known solutions of the Schrödinger equation in a constant magnetic field suffer from the same defect. It will be seen in this section that, sadly, they do. These solutions are adequate for
almost all problems. However, they cannot correctly describe interference experiments. Let us turn now to the problem of the electron in the interior of the solenoid. The system has the Lagrangian
\[
L = \frac{1}{2}\,\mu\dot r^2 + \frac{1}{2}\,\mu r^2\dot\theta^2
+ \frac{e\mathfrak{f}\, r^2}{2\pi c a^2}\,\dot\theta\dot\phi + \frac{1}{2}\, I\dot\phi^2. \tag{104}
\]
The canonical momenta are
\[
p_\phi = \frac{\partial L}{\partial\dot\phi} = I\dot\phi + \frac{e\mathfrak{f}\, r^2}{2\pi c a^2}\,\dot\theta, \qquad
p_r = \frac{\partial L}{\partial\dot r} = \mu\dot r, \qquad
p_\theta = \frac{\partial L}{\partial\dot\theta} = \mu r^2\dot\theta + \frac{e\mathfrak{f}\, r^2}{2\pi c a^2}\,\dot\phi. \tag{105}
\]
Because of the factor r in the interaction term of Eq. (104), it is no longer possible to transform the Lagrangian to normal coordinates by means of a single rotation, as in the previous section. The assumption that the angle ' is time-independent fails. It is the author’s hope that the reader has become convinced that, in cases where normal coordinates exist, it is the normal coordinates which must be quantized. The perplexing question is how to proceed in the absence of normal coordinates. Here, we deal with just such a case. It is possible to arrive at a somewhat satisfactory discussion in terms of the original coordinates. As before, the angular momenta p and p are conserved quantities. The value of p is I/, where is the value of the flux when r : 0 (or equivalently, when no electron is present). The cylinder’s angular velocity is given by er" " : ; : ; ". caI
(106)
The cylinder phase shift is

$$\frac{1}{\hbar}\int p_\varphi\,d\varphi = \frac{1}{\hbar}\int p_\varphi\dot\varphi\,dt = -\frac{1}{\hbar}\int \frac{e r^{2}\dot\varphi}{2ca}\,d\theta = -\frac{eB}{2\hbar c}\int r^{2}\,d\theta. \tag{107}$$

Over one revolution of the electron in its orbit, the phase shift is −eBA/ħc = −eΦ/ħc, where Φ is the flux encircled by the orbit of area A.
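The per-revolution value follows from the polar-coordinate area formula; a brief check of this step, using the same symbols as the surrounding derivation:

```latex
% Area enclosed by a closed orbit, in polar coordinates:
A = \frac{1}{2}\oint r^{2}\,d\theta
\quad\Longrightarrow\quad
-\frac{eB}{2\hbar c}\oint r^{2}\,d\theta
  = -\frac{eB}{\hbar c}\,A
  = -\frac{e\Phi}{\hbar c},
\qquad \Phi = B A .
```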
WALTER C. HENNEBERGER
The conserved electron canonical angular momentum is

$$p_\theta = \mu r^{2}\dot\theta + \frac{e r^{2}}{2ca}\,\dot\varphi = \mu r^{2}\dot\theta + \frac{e r^{2}}{2ca}\,\dot\varphi_0 - \frac{e^{2} r^{4}}{4c^{2}a^{2}I}\,\dot\theta. \tag{108}$$
In the limit I → ∞, the term in e²/I may be dropped. The second term of Eq. (108) is the vector potential term. It gives a phase shift per cycle of

$$\frac{e}{\hbar c}\oint A_\theta\, r\,d\theta = \frac{e}{\hbar c}\oint \frac{r^{2} B}{2}\,d\theta = \frac{e}{\hbar c}\,B\cdot(\text{area}) = \frac{e\Phi}{\hbar c}. \tag{109}$$
The phase shifts are again equal and opposite, as Eq. (75) requires. For an AB experiment performed inside the solenoid, the phase difference would be given by Eq. (109) with Φ being the flux between the two paths open to the electron. The phase shift of Eq. (109) is an example of the Berry phase (Berry, 1984). We see that, in closed systems, that is, systems that do not involve external potentials, the Berry phase follows directly from the dynamics of the system.

Let us return to the cylinder phase shift of Eq. (107). Consider an electron passing through the axis of rotation of the cylinder (r = 0). For such an electron, p_θ vanishes. The change in the angular velocity of the rotating cylinder is then

$$\Delta\dot\varphi = -\frac{e r^{2}\dot\theta}{2caI}. \tag{110}$$
The energy change of the rotating cylinder is

$$\Delta E = I\dot\varphi\,\Delta\dot\varphi = -\frac{e r^{2}\dot\theta}{2ca}\,\dot\varphi = -\frac{B e r^{2}\dot\theta}{2c} = -\frac{e}{c}\,\mathbf{v}\cdot\mathbf{A}. \tag{111}$$
This last term is the negative of the overlap magnetic field energy (i.e., the energy term involving the scalar product of the cylinder's magnetic field and the electron's magnetic field). The reader will recall that φ̇ = φ̇₀ when the electron is at the origin, in accordance with Eq. (106). When r = 0, v·A vanishes. Equation (111) shows that the energy shift is time dependent, in general, reflecting a lack of constancy in the angular velocity of the cylinder. As the electron moves in its orbit (which is fixed in space), kinetic energy of the cylinder is continually being exchanged with magnetic energy. The average energy shift may be computed by means of a semiclassical argument. The phase shift per cycle for the cylinder was found in Eq. (107) to be −eΦ/(ħc). A continuous shift in phase is just a shift in angular frequency. This frequency shift is just the phase shift divided by the period
of revolution of the electron. This is

$$-\frac{e\Phi/\hbar c}{2\pi\mu c/eB} = -\frac{e^{2}B^{2}}{2\pi\mu\hbar c^{2}}\,(\text{orbit area}). \tag{112}$$

Multiplication by ħ gives an average energy shift per cycle of ⟨ΔE⟩ = −½μv², where v is the velocity of the electron in its orbit. Once more, this illustrates the difference in behavior of the cylinder when the electron is in the magnetic field, and when it is in the exterior region. When the electron is in the magnetic field, the cylinder undergoes a continuous phase shift, that is, a shift in energy. This energy will (at least partially, depending on the orbit) continuously oscillate between kinetic energy and the energy of the magnetic field. As in the AB effect, the kinetic energy of the electron is unchanged. An electron in the exterior region travels in a straight line. The cylinder gets only a one-time phase shift, as does the electron wave function.

We have already seen that, in the AB problem, the price paid for making an external field approximation is the sacrifice of single-valuedness of the electron wave function. Were one to go further and use an external field approximation over all space, one would omit a degree of freedom that is correlated with the motion of the electron. This solution would be adequate in the interior region, as long as one does not treat interference phenomena. However, we see that the behavior of the omitted cylinder is quite different as the electron approaches the boundary of the cylinder from the two sides of the cylinder wall. It seems highly unlikely to this author that the Schrödinger equation is valid over all space in the external field approximation.

In 1984, the author published his most controversial paper on the AB effect. At that time, wave functions corresponding to the external field approximation were considered exact solutions. The AB problem was assumed to be 2D. The author postulated two boundary conditions on the current density: (A) n·j continuous, and (B)
∂(n × j)/∂n continuous,
where n is the normal to a given surface (in this case, the boundary of the solenoid). These boundary conditions, along with the assumption that the wave function in the interior of the solenoid is single-valued, give a wave
function in the exterior region having the form f(r, θ)e^(−iαθ), where f(r, θ) is a single-valued function. This result is no longer especially interesting, as the complete AB problem involves 3 degrees of freedom, not 2. In the external field approximation, there is no guarantee that the wave function in the interior of the cylinder is single-valued. Only the total wave function of the electron plus spinning cylinder system need be single-valued. Whether the result of Henneberger (1984) is right or wrong will only be determined when a solution of the complete problem of the electron plus spinning cylinder system is available in all of space. It is highly desirable to have a transformation of the type of Eq. (78) that is valid in the interior region. One would like this transformation to join smoothly with Eq. (78) at the cylinder boundary. To be sure, the interior system does not have normal coordinates. It is unclear how one would determine such a transformation. One highly desirable property of such a transformation would be that it eliminate velocity-dependent potentials. In every case of which this author is aware, elimination of velocity-dependent potentials by means of a suitable canonical transformation brings with it not only an enormous simplification of the problem, but also physics that is reasonable and unquestionably correct. This will be discussed in detail in the upcoming last section of this work.
X. Ambiguity in Quantum Theory, Canonical Transformations, and a Program for Future Work

In classical theory, canonical transformations are simply changes in coordinate systems. The new coordinate system is equivalent to the old one in that both are equally valid for description of the system under consideration. The motivation for the transformation is generally convenience; for example, one may wish to transform to normal coordinates, as in the case of coupled oscillators. In quantum theory, canonical transformations can bring ambiguities with them. The reader who has carefully read Section VIII has already seen one example of this. The earliest example known to this author is the transformation by Maria Göppert-Mayer (1931) in her well-known paper on the Raman effect and two-quantum emission. She noted that the variational principle allows one to add the total time derivative of any function to the Lagrangian. She added

$$\Delta L = -\frac{e}{c}\frac{d}{dt}(\mathbf{r}\cdot\mathbf{A}) = -\frac{e}{c}\,\dot{\mathbf{r}}\cdot\mathbf{A} + e\,\mathbf{r}\cdot\mathbf{E} - \frac{e}{c}\,\mathbf{r}\cdot(\mathbf{v}\cdot\nabla)\mathbf{A} \tag{113}$$
to the Lagrangian, which left her with the new interaction Lagrangian er·E, after making a dipole approximation (which neglects the r·(v·∇)A term). This interaction has been used by many authors over the years. It has come to be the interaction of choice in quantum optics. It was found by Lamb (1952) that, in some problems, the interaction Hamiltonians −(e/mc)p·A and −er·E give different results. This problem has been discussed extensively by Power and Zienau (1959), who showed that −er·E is the better interaction Hamiltonian. One cannot make an argument supporting one on the basis that the other was derived from mathematical errors. Mathematically, one has two quantum systems. Unfortunately, at most one corresponds to the physical system that one is trying to describe. It seems to this author that one reason for preferring r·E over p·A is the following: In perturbation theory, one begins with eigenstates of −ħ²∇²/(2m) + V. In the absence of a perturbation, (ħ/i)∇ = π, the kinetic momentum. If one now introduces the perturbation −(e/mc)(p·A), then (ħ/i)∇(ψ) becomes (π + (e/c)A)(ψ). Thus, the "unperturbed" state is no longer unperturbed. This does not happen if the perturbation is −er·E. In this case, p is always π. The Göppert-Mayer transformation has done away with the offensive velocity-dependent potential!

The author has encountered quantum ambiguities several times in his career. The earliest was in his doctoral dissertation (1959). The problem involved a canonical transformation by van Kampen (1951), later obtained by Steinwedel (1955), who used an alternative derivation. The transformation involved bringing a charged harmonic oscillator coupled to the radiation field (in dipole approximation) to normal coordinates. One of the author's tasks was to use the transformation to show that a displaced ground state wave packet would oscillate with the usual radiation damping while holding its shape, and eventually settle at the origin.
The author began by formulating the problem in terms of the original particle and field variables. The problem was not only monstrously complicated, but (to the great relief of the author!), had the property that the mean square deviation of the new variables from their averages was a divergent integral! He quickly concluded that the formulation of the problem in terms of the original particle and field variables made no sense. It then turned out that, when the problem was formulated in terms of the new variables, the normal coordinates, the problem was relatively simple. Again, formulation of the problem in terms of the normal coordinates does not involve velocity-dependent potentials. Another example is a canonical transformation due to Kramers (1938). The author obtained the same transformation many years later, starting with a time-dependent unitary transformation acting on the wave function in the Schro¨dinger equation (Henneberger, 1968). A few years after he
published his result, the author realized that the two transformations were the same. The transformation is now known as the K-H transformation. The author (1970) quantized the electromagnetic field after first making a K-H transformation to the new variables. When one transforms the quantized system back to the original particle and field variables, one finds that, in addition to the usual Hamiltonian, a term identical with the one usually put in "by hand," called the mass renormalization term, is obtained. Here we have another example of an improvement of the system by quantizing it in a form that does not contain a velocity-dependent potential.

The AB effect is merely the latest in a series of examples demonstrating that quantum systems involving velocity-dependent potentials, that is, systems involving electrodynamics, have more than one solution. Indeed, there is probably a different solution for every possible canonical transformation that one might wish to make. The reader has hopefully noted that the AB effect is the simplest of all problems in electrodynamics. Even the problem of an electron in a constant magnetic field is more difficult when solved correctly. The importance of the AB effect lies in its simplicity. In the AB effect, nature is trying to tell us something. We should be listening!

Finally, one should be concerned about a program for the future. It appears that physicists have chosen to quantize the obvious variables, while nature has in fact quantized some mysterious set of variables that is related to that of the physicist by a canonical transformation. One can only hope that (as in the examples given in this work) the transformation is linear. In the general case of QED (not in the dipole approximation!) this is probably wishful thinking. Nature has hidden her secrets well. In the meantime, renormalization techniques make possible computations that yield an amazing agreement with experiments.
A good beginning might be to start with the easiest problem, that of an electron interacting with a rotating charged solenoid, as discussed in the previous sections. To a young physicist in search of a problem, the author would suggest seeking a transformation in the interior region of the solenoid (rotating charged cylinder) that eliminates the velocity-dependent coupling between particle and field, and that joins Eq. (78) smoothly at the boundary of the cylinder. This would give an improved solution for an electron in a constant magnetic field. It would also shed light on the problem of what happens at the boundary of the solenoid.

There have, in the author's opinion, been many errors made over the years regarding the AB effect. These have been based on one or more of the following erroneous assumptions:

1. If one allows the moment of inertia of the rotating cylinder to become infinite, the external field approximation becomes exact. Actually, the phase shift of the rotating cylinder is independent of its mass.
2. The vector potential is not a physical quantity. This, as hopefully the reader has seen, is at best a half-truth. The transverse part of the vector potential is indeed a physical quantity. It is the electromagnetic momentum due to the presence of a charge at the point r (when multiplied by e/c). The remainder (the longitudinal part) is merely a computational convenience. It is perhaps noteworthy that there was a group in Italy that argued (quite correctly, in this writer's opinion) that unphysical quantities cannot cause physical effects. Their failure to recognize that the vector potential is a physical quantity is what led them to conclude that there could be no AB effect, and that all of the experiments had to be wrong.

3. Problems in quantum theory have unique solutions. This, it appears, is only true if one forbids canonical transformations. It now appears that there is a different solution for every canonical transformation, assuming, of course, that the transformation is carried out before the quantization.

Failure to recognize these problems has led to an enormous amount of literature, all of which has generated an immense amount of heat, but very little light.

References

Aharonov, Y. and Bohm, D. (1959). Phys. Rev., 115: 485.
Al-Jaber, S. M. and Henneberger, W. C. (1992). Il Nuovo Cim., 107B: 485.
Berry, M. V. (1984). Proc. Roy. Soc. London, A392: 45.
Ehrenberg, W. and Siday, R. E. (1949). Proc. Phys. Soc., 62B: 8.
Goldhaber, A. S. (1989). Phys. Rev. Lett., 62: 482.
Göppert-Mayer, M. (1931). Annalen der Physik, 9: 273.
Henneberger, W. C. (1997). Int. Journal of Theor. Physics, 36: 2067.
Henneberger, W. C. (1984). Phys. Rev. Lett., 52: 573.
Henneberger, W. C. (1981). J. Math. Phys., 22: 116.
Henneberger, W. C. (1970). Nuclear Physics, B23: 365.
Henneberger, W. C. (1968). Phys. Rev. Lett., 21: 838.
Henneberger, W. C. (1959). Zeitschr. für Physik, 155: 296.
Henneberger, W. C. and Opatrny, T. (1994). Int. Journal of Theor. Physics, 33: 1783.
Jackson, J. D. (1975). Classical Electrodynamics, 2nd edition, New York, London, Sydney, Toronto: John Wiley and Sons.
Konopinski, E. J. (1981). Electromagnetic Fields and Relativistic Particles, New York: McGraw-Hill, p. 158.
Kramers, H. A. (1938). Report to the 8th Solvay Congress.
Lamb, W. E. (1952). Physical Review, 85: 259.
Möllenstedt, G. and Bayh, W. (1962). Phys. Blätter, 18: 299.
Pauli, W. (1958). Encyclopedia of Physics, 46, Berlin: Springer Verlag.
Pauli, W. (1939). Helv. Physica Acta, 12: 147.
Peshkin, M., Talmi, I. and Tassie, L. J. (1961). Annals of Physics, 12: 426.
Power, E. A. and Zienau, S. (1959). Proc. Roy. Soc. London, A251: 54.
Shapiro, D. and Henneberger, W. C. (1989). J. Phys. A: Math. Gen., 22: 3605.
Steinwedel, H. (1955). Annalen der Physik, 15: 207.
Thomson, J. J. (1904). Elements of the Mathematical Theory of Electricity and Magnetism, 3rd edition, Cambridge: Cambridge University Press.
Tonomura, A., Osakabe, N., Matsuda, T., Kawasaki, T., Endo, J., Yano, S., and Yamada, H. (1986a). Phys. Rev. Lett., 56: 792.
Tonomura, A., Yano, S., Nobuzuki, O., Matsuda, T., Yamada, H., Kawasaki, T., and Endo, J. (1986b). Proc. 2nd Int. Symposium of Foundations of Quantum Mechanics, Tokyo, 97.
van Kampen, N. G. (1951). Dan. Mat. Fys. Medd., 26: (15).
Yang, C. N. (1983). Proc. International Symposium on Foundations of Quantum Mechanics, Tokyo, 5.
Yu, X. and Henneberger, W. C. (1996). Int. Journal of Theor. Physics, 35: 393.
Zhu, X. and Henneberger, W. C. (1990). Jour. Physics A: Math. Gen., 23: 3983.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 112
Well-Composed Sets LONGIN JAN LATECKI Department of Applied Mathematics, University of Hamburg, Bundesstr. 55, 20146 Hamburg
I. Introduction 95
II. Definition and Basic Properties of Well-Composed Sets 98
III. 3D Well-Composed Sets 103
  A. Local Properties of the Continuous Analog 103
  B. Digital Characterization of Well-Composed Sets 106
  C. Jordan-Brouwer Separation Theorem 109
  D. Properties of Boundary Faces 111
  E. Connected Components in 3D Well-Composed Pictures 112
IV. 2D Well-Composed Sets 113
  A. Jordan Curve Theorem and Euler Characteristic 117
  B. Thinning 117
  C. Irreducible Well-Composed Sets 123
  D. Graph Structure of Irreducible Sets 126
  E. Parallel Thinning on Well-Composed Sets 127
  F. Making a Binary Image Well-Composed 135
  G. Making a Gray-Level Image Well-Composed 140
V. Digitization and Well-Composed Images 142
  A. Continuous Representation of Real Objects 142
  B. Digitization and Segmentation 146
  C. Digitizations Produce Well-Composed Images 149
VI. Application: An Optimal Threshold 154
  A. Thresholding 154
  B. Histogram of Checkerboard Patterns 157
VII. Generalizations 159
References 161
I. Introduction

Under a digital set we understand a subset X of Zⁿ such that either X or its complement Xᶜ = Zⁿ\X is finite. Digital sets play a primary role in computer vision and computer graphics, because every object in a digital image is represented by a finite subset of Zⁿ, where n is mostly equal to 2 or 3. For example, in order to recognize which real object is depicted in a digital image, it is necessary to analyze properties of digital sets obtained by segmenting the digital image into objects. Among the most important
properties are shape properties, because shape plays an important role in human perception. To analyze the shape properties, like differential-geometric properties, of all the possible subsets of Rⁿ seems to be impossible. Therefore, shape analysis is restricted mostly to some class of subsets, like subsets of Rⁿ whose boundaries are (n−1)D manifolds; for example, a 2D manifold in R³ looks locally like an open subset of the plane. The idea of defining the class of well-composed sets is analogous. Well-composed sets are exactly the class of subsets of Zⁿ whose continuous analog in Rⁿ has a boundary that is an (n−1)D manifold. The continuous analog has an intuitive and natural meaning in computer vision and computer graphics. Each point in a digital image has a dual nature:

1. It is a point (mostly with integer coordinates); or
2. it is a region in the plane R² or in space R³.

For algorithmic processing, it is necessary to treat each point in a digital image as a point with integer coordinates, that is, as a topological object of dimension zero. For example, a 2D digital image is a finite rectangular array of points with integer coordinates. However, what is seen on any display is a set of regions with nonempty area, which can be more adequately modeled as subsets of the plane of topological dimension two. Mostly, a point in a 2D digital image is modeled as a unit square centered at the point with integer coordinates. Modeling points in digital images as sets with nonempty area or volume is useful not only for visualization purposes, but also for analyzing geometrical properties of objects in digital images. We will use the region interpretation of digital images to define well-composed images based on topological properties of the regions. Then we will translate the obtained definition to the point interpretation of digital images, which is based on graph structure.
Following Wang and Bhattacharya (1997) we will call the regions in the plane pixels (for picture elements) and regions in the space voxels (for volumetric elements). Since the correspondence between points in digital images and pixels plays an important role in this contribution, we will describe it formally in Section II. The concept of well-composed sets was first introduced in Latecki et al. (1995) with the goal of distinguishing a special class of subsets of 2D binary digital pictures called ‘‘well-composed pictures.’’ The idea is not to allow the ‘‘critical configuration’’ shown in Figure 1 to occur in a digital picture. Note that this critical configuration can be detected locally. The 2D well-composed pictures have very nice topological properties; for example, the Jordan curve theorem holds for them, their Euler characteristic is locally computable, and they have only one connectedness relation,
Figure 1. This pattern and its 90° rotation do not occur in a well-composed digital image.
because 4- and 8-connectedness are equivalent. Therefore, when we restrict our attention to well-composed pictures, a number of very difficult problems in digital geometry as well as complicated algorithms become relatively simple. This is demonstrated in Latecki et al. (1995) with the example of thinning algorithms. There are practical advantages in applying thinning algorithms to well-composed pictures. The thinning process (sequential as well as parallel) is greatly simplified and the resulting skeletons are "one-point thick." Thus, the problems with irreducible "thick" skeletons disappear. On the other hand, if a set lacks the property of being well-composed, then it may not be possible to reduce it to a "thin" skeleton. Examples of irreducible sets with large interiors are given in Arcelli (1981). We discuss point-based properties of 2D well-composed sets in Section IV. Moreover, Gross and Latecki (1995) show that if the resolution of a digitization process is fine enough to ensure topology preservation, then the obtained segmented 2D image must be well-composed (Section V).

An important motivation for 2D well-composed sets was the connectivity paradoxes that occur if only one adjacency relation (e.g., either 4- or 8-adjacency) is used in the whole picture (presented at the beginning of Section IV). Such paradoxes are pointed out in Rosenfeld and Pfaltz (1966) (see also Kong and Rosenfeld (1989)). The most popular solution was that of using different adjacency relations for the foreground and the background: 8-adjacency for black points and 4-adjacency for white points, or vice versa (first recommended in Duda et al., 1967). Rosenfeld (1979) developed the foundations of digital topology based on this idea, and showed that the Jordan curve theorem is then satisfied.
However, the solution with two different adjacency relations does not work if one wants to distinguish more than two colors, that is, to distinguish among different objects in a segmented image, as shown in Latecki (1995). The same paradoxes appear in 3D segmented images. The author (1997) defines and analyzes 3D segmented "well-composed pictures" in which the connectivity paradoxes do not occur. He also gives a
topological motivation for well-composed sets based on the concept of continuous analog. We will treat 3D well-composed sets in Section III. The class of well-composed sets seems to be general enough, in the sense that it is possible to determine the digital sets in digital images obtained in practical applications in such a way that they are well-composed sets. In Section VI we show an example application of well-composed sets to determine an optimal threshold for gray-level document images. This application was inspired by the digitization theorem that is presented in Section V.

Section II contains the main definitions. Section VII gives various generalizations of the concept of well-composedness. Sections III–VII are independent of each other. Therefore, each of them can be read directly after Section II. The definitions in Section II are also sufficient in order to read Sections IV.F and G. The content of Sections IV.F and G is independent of the content of the other Section IV subsections.
II. Definitions and Basic Properties of Well-Composed Sets

We will interpret Zⁿ, where n = 2, 3, . . . , as the set of points with integer coordinates in Rⁿ. We call a set X ⊆ Zⁿ a digital set if either X or Xᶜ = Zⁿ\X is finite. The following concept of a continuous analog gives us an intuitive and simple correspondence between points in Zⁿ and cubes in Rⁿ. For example, a 3D digital set can be identified with a union of upright unit cubes that are centered at its points.

Definition 1 The continuous analog CA(p) of a point p = (p₁, . . . , pₙ) ∈ Zⁿ is given by

CA((p₁, . . . , pₙ)) = [p₁ − ½, p₁ + ½] × ··· × [pₙ − ½, pₙ + ½].

For example, CA(p) of a point p ∈ Z³ is the closed unit cube centered at this point with faces parallel to the coordinate planes. The continuous analog of a digital set X ⊆ Zⁿ is defined as CA(X) = ∪{CA(x) : x ∈ X}; for example, see Figure 2. Formally, CA can be viewed as a function CA : P(Zⁿ) → P(Rⁿ), and CA(p) as a short form for CA({p}) if p ∈ Zⁿ. We denote Cⁿ = {CA(p) : p ∈ Zⁿ}. In particular, C² is the set of closed squares and C³ is the set of closed cubes centered at points with integer coordinates.
Figure 2. The continuous analog of the digital set composed of the eight points is the union of the eight cubes.
Definition 2 We also define a dual function Dig_Z to CA, which we call Zⁿ subset (or element) digitization: Dig_Z : P(Rⁿ) → P(Zⁿ) is given by Dig_Z(Y) = {p ∈ Zⁿ : p ∈ Y}.

Clearly, we have Dig_Z(CA(X)) = X for every X ⊆ Zⁿ. The equation CA(Dig_Z(Y)) = Y holds only if Y ⊆ Rⁿ is a union of some cubes in Cⁿ. Now we give the main definition of this chapter.

Definition 3 A digital set X ⊆ Zⁿ is well-composed iff CA(X) is an n-dimensional bordered manifold.

Recall that a subset M of Rⁿ is an n-dimensional bordered manifold if each point in M has a neighborhood homeomorphic to a relatively open subset of a closed half-space in Rⁿ and M is not an n-dimensional manifold. A subset N of Rⁿ is an n-dimensional manifold if each point in N has a neighborhood homeomorphic to Rⁿ. We call each connected component of an n-dimensional manifold an n-dimensional surface. For example, the 3D digital set X in Figure 2 is well-composed, because CA(X) is a 3D bordered manifold (i.e., every interior point of CA(X) has a neighborhood that is an open subset of R³ and every boundary point of CA(X) has a neighborhood homeomorphic to a neighborhood of a boundary point of a closed half-space in R³). The two digital sets in Figure 3 are not well-composed; for example, the joint point of the two cubes in Figure 3(2) does not have a neighborhood homeomorphic to a relatively open subset of a closed half-space in R³. According to their definition, we could name well-composed sets digital bordered manifolds.
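The identity Dig_Z(CA(X)) = X can be checked mechanically. The following sketch (the helper names are illustrative assumptions, not from the text) tests CA-membership coordinatewise and restricts Dig_Z to a finite window of candidate integer points:

```python
def in_ca(y, X):
    """True iff the real point y lies in CA(X), the union of closed unit
    cubes centered at the integer points of the digital set X."""
    return any(all(abs(yi - xi) <= 0.5 for yi, xi in zip(y, x)) for x in X)

def dig(region, candidates):
    """Element digitization Dig_Z (Definition 2), restricted to a finite
    set of candidate integer points: keep those lying in the region."""
    return {p for p in candidates if region(p)}

# Dig_Z(CA(X)) = X for every digital set X.
X = {(0, 0, 0), (1, 0, 0), (0, 1, 0)}
window = {(i, j, k) for i in range(-2, 4)
                    for j in range(-2, 4)
                    for k in range(-2, 4)}
assert dig(lambda p: in_ca(p, X), window) == X
```

The converse identity CA(Dig_Z(Y)) = Y would fail for a region Y that is not a union of cubes in C³, exactly as the text notes.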
Figure 3. A digital set X ⊆ Z³ is well-composed iff the critical configurations of cubes (1) and (2) (modulo reflections and rotations) do not occur in CA(X) or CA(Xᶜ).
For Y ⊆ Rⁿ we denote by bdY the topological boundary of Y in the standard topology on Rⁿ induced by the Euclidean metric. Equivalently, one can define well-composed sets in the following way:

Definition 4 We call a digital set X ⊆ Zⁿ well-composed if bdCA(X) is an (n−1)D manifold.

For example, a 3D digital set is well-composed if the boundary surface of its continuous analog is a 2D manifold, that is, it "looks" locally like a planar open set. This definition implies a simple correspondence between boundaries of a digital set (that are defined based on graph theory) and the boundary surface of its continuous analog, since the digital set is identified with the union of cubes centered at its points. Thus, we can use well-known properties of continuous boundary surfaces, like the Jordan-Brouwer separation theorem, to determine and analyze properties of the digital set. The equivalence of both definitions follows from the fact that a subset M ⊆ Rⁿ that is a finite union of n-dimensional cubes is an n-dimensional bordered manifold iff bdM is an (n−1)D manifold.

Although well-composed sets are defined for any dimension n, we will concentrate on their properties for n = 2 and 3, since these dimensions play the most important role in digital image processing. Now we consider more carefully the definition and properties of 2D well-composed sets. Let X ⊆ Z² be a digital set. The continuous analog CA(X) is the union of closed unit squares centered at points of X. The boundary bdCA(X) is the union of the set of unit line segments, each of which is the common edge of a square in CA(X) and a square in CA(Xᶜ). Observe that there is only one kind of adjacency for line segments contained in bdCA(X): two segments are adjacent if they have an endpoint in common. Hence, there is only one kind of connectedness for bdCA(X). The unit line segments contained in bdCA(X) correspond to pairs of 4-adjacent points (p, q) such that p ∈ X and q ∉ X.
(The adjacency relations are defined in what follows.) In the 2D case, we obtain the following definition: A digital set X is
Figure 4. The continuous analog of a 2D well-composed picture does not contain this critical configuration and its 90° rotation.
well-composed if bdCA(X) is a 1D manifold (each point in bdCA(X) has a neighborhood homeomorphic to R). The following characterization of 2D well-composed sets can be easily proven:

Theorem 1 A digital set X ⊆ Z² is well-composed iff the critical configuration shown in Figure 4 and its 90° rotation do not occur in CA(X) iff the only possible 2×2 configurations of boundary squares in CA(X) (modulo 90° rotation) are as in Figure 5. ■

The preceding definition of well-composed sets is based on the region interpretation of digital sets. Now we give an equivalent point-based definition of 2D well-composed sets. First we need to define adjacency relations for digital points: The 4-neighbors (or direct neighbors) of a point (x, y) in Z² are its four horizontal and vertical neighbors (x + 1, y), (x − 1, y) and (x, y + 1), (x, y − 1) (see Figure 6(a)). The 8-neighbors of a point (x, y) in Z² are its four horizontal and vertical neighbors together with its four diagonal neighbors (x+1, y+1), (x+1, y−1) and (x−1, y+1), (x−1, y−1) (see Figure 6(b)). If two points P, Q ∈ Z² are n-neighbors, we also call them n-adjacent and write n-adj(P, Q), where n = 4 or 8. For n = 4 or 8, the n-neighborhood of a point P = (x, y) in Z² is the set Nₙ(P) consisting of P and its n-neighbors. Nₙ*(P) is the set of all neighbors of P without P itself. Note that Nₙ*(P) = Nₙ(P)\{P}. The points in N₈*(P)
Figure 5. The only possible 2 × 2 configurations of boundary squares in CA(X) of a well-composed set X ⊆ Z² (modulo 90° rotation).
LONGIN JAN LATECKI
Figure 6. (a) shows four 4-adjacent points (or 4-neighbors) of the center point and (b) shows eight 8-adjacent points (or 8-neighbors) of the center point.
are numbered 0 to 7 according to a fixed scheme, so that the neighbors N₀(P), . . . , N₇(P) occupy the eight cells of the 3 × 3 grid surrounding P.
Each Nᵢ, i = 0, . . . , 7, is a function from Z² to Z² that maps a point P to one of its neighbors according to this scheme. A digital set X is n-connected if for every pair of points P, Q in X, there is an n-path contained in X from P to Q. Sometimes we will say that X is connected if the adjacency relation is clear from the context. A (connected) n-component of a set S is a greatest n-connected subset of S. In particular, if S is connected, then the only component of S is S itself.

The following proposition (which can be easily proven) translates the region-based definition of 2D well-composed sets to a point-based definition that requires the graph structure introduced by adjacency relations between points:

Proposition 1 A digital set X ⊆ Z² is well-composed iff for every x, y ∈ X such that x is 8-adjacent but not 4-adjacent to y, there exists z ∈ X that is 4-adjacent to both x and y. ■

As a simple consequence we obtain Proposition 2, below. Under a binary digital image we understand a pair (Z², X), where X ⊆ Z² and either X or its complement Xᶜ is finite. On X and its complement Xᶜ, there is defined an adjacency relation. Every point in X is called a foreground or black point and assigned value 1; every point in Xᶜ = Z² \ X is called a background or white point and assigned value 0.

Definition 5 We call a binary digital image (Z², X) well-composed if the set X is a well-composed set.
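Proposition 1 yields a direct local test. The following is a minimal sketch of such a test; the function name and the set-of-tuples representation of a digital set are our own choices, not from the text.

```python
# Sketch of a 2D well-composedness test (Proposition 1): a digital set
# X ⊆ Z² is well-composed iff every pair of diagonally adjacent points
# (8-adjacent but not 4-adjacent) shares a common 4-neighbor in X.

def is_well_composed_2d(X):
    """X is a finite set of (x, y) integer tuples (the black points)."""
    X = set(X)
    for (x, y) in X:
        # Check the two diagonal directions "ahead" of (x, y); scanning
        # all points of X covers every diagonal pair exactly once.
        for dx, dy in ((1, 1), (1, -1)):
            q = (x + dx, y + dy)
            if q in X:
                # The two common 4-neighbors of (x, y) and q:
                if (x + dx, y) not in X and (x, y + dy) not in X:
                    return False
    return True

# The critical configuration: two diagonal points whose shared
# 4-neighbors are both missing.
assert not is_well_composed_2d({(0, 0), (1, 1)})
# Filling one shared 4-neighbor repairs the configuration.
assert is_well_composed_2d({(0, 0), (1, 1), (1, 0)})
```

Because the test only inspects 2 × 2 neighborhoods, it is exactly the kind of local property that, as discussed later, can be evaluated efficiently in parallel.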
Figure 7. The union of two well-composed sets need not be a well-composed set.
Proposition 2 An image (Z², X) is well-composed iff every 8-component of X is also a 4-component of X and every 8-component of Xᶜ is also a 4-component of Xᶜ. ■

Under a segmented digital image we understand (Z², X₀, . . . , Xₖ), where Xᵢ ⊆ Z² and either Xᵢ or its complement is finite for i = 0, . . . , k, and Xᵢ ∩ Xⱼ = ∅ if i ≠ j. A binary digital image is a special case of a segmented image in which we distinguish only two objects, the foreground and the background.

Definition 6 We call a segmented digital image (Z², X₀, . . . , Xₖ) well-composed if each Xᵢ is a well-composed set for i = 0, . . . , k.

Observe that the set A ∪ B does not need to be well-composed even if A and B are well-composed (see Figure 7). This definition of a well-composed segmented image is equivalent to Definition 3.2 in Wang and Bhattacharya (1997) applied to X = X₀ ∪ ⋯ ∪ Xₖ, since then X₀, . . . , Xₖ is a partition of X in their work. The case of 3D well-composed sets is treated in the next section.
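Proposition 2 can be observed computationally by comparing 4- and 8-component structures. Below is a self-contained sketch restricted to finite foregrounds (so the complement half of the proposition is not exercised); all names and the set-of-tuples representation are our own.

```python
# Sketch of Proposition 2: for a well-composed set the 8-components and
# 4-components of the foreground coincide; for the critical diagonal
# configuration they differ.

from collections import deque

def components(X, neighbors):
    """Partition the finite set X into connected components (BFS)."""
    X, comps = set(X), []
    while X:
        seed = X.pop()
        comp, queue = {seed}, deque([seed])
        while queue:
            p = queue.popleft()
            for q in neighbors(p):
                if q in X:
                    X.remove(q)
                    comp.add(q)
                    queue.append(q)
        comps.append(frozenset(comp))
    return set(comps)

def n4(p):
    x, y = p
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

def n8(p):
    x, y = p
    return n4(p) + [(x + 1, y + 1), (x + 1, y - 1),
                    (x - 1, y + 1), (x - 1, y - 1)]

# Two diagonal points (not well-composed): one 8-component, two 4-components.
X = {(0, 0), (1, 1)}
assert len(components(X, n8)) == 1 and len(components(X, n4)) == 2

# Adding a shared 4-neighbor makes the set well-composed; the component
# structures now agree, as Proposition 2 predicts.
Y = {(0, 0), (1, 0), (1, 1)}
assert components(Y, n8) == components(Y, n4)
```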
III. 3D Well-Composed Sets

A. Local Properties of the Continuous Analog

This section is based on Latecki (1997). The definition of well-composed 3D sets is nicely visualized by Proposition 3, which shows the equivalence of this definition to simple local conditions on boundary faces. Since CA(X) for X ⊆ Z³ is a union of cubes, bdCA(X) is a union of squares, that is, sides of the cubes that are contained in bdCA(X). We call a corner point of such a square a corner point of bdCA(X).

Proposition 3 A digital set X ⊆ Z³ is well-composed iff for any corner point x ∈ bdCA(X), the boundary faces of CA(X) that contain x have one of the six configurations shown in Figure 8 (modulo reflections and rotations).
Figure 8. In the continuous analog of a 3D well-composed picture, only these configurations of boundary faces can occur around a corner point on the object boundary.
Proof "⇐" Clearly, if a point x ∈ bdCA(X) lies in the interior of some square contained in the boundary bdCA(X), then x has a neighborhood homeomorphic to R². If one of the configurations shown in Figure 8 (modulo reflections and rotations) appears at a corner point x of some square in bdCA(X), then x has a neighborhood homeomorphic to R². Hence bdCA(X) is a 2-manifold.

"⇒" Let x ∈ bdCA(X) be a corner point of some square in bdCA(X). In this case eight cubes share x as their common corner point; some of them are contained in CA(X) and some are not. By simple analysis of all possible configurations of the eight cubes, we will show that the boundary faces of CA(X) that contain point x can have only the configurations shown in Fig. 8 (modulo rotations and reflections), since x has a neighborhood homeomorphic to an open subset of R².

We start this analysis with one cube q ⊆ R³ whose corner point is x such that q = CA(p) for some point p ∈ X ⊆ Z³. If all other cubes whose corner point is x are contained in CA(Xᶜ), then the boundary faces of q that contain x form configuration (a) in Figure 8. If there is one more cube r contained in CA(X) that shares x with q, then r must share a face with q, since configurations (1) and (2) in Figure 3 are not allowed (x would not have a neighborhood homeomorphic to R²). Thus, the boundary faces of q ∪ r that contain x form configuration (b) in Figure 8. By similar arguments, if we add a third cube, we only obtain configuration (c) of boundary faces. If we add a fourth cube, we obtain one of configurations (d), (e), or (f).
Adding a fifth cube will transform configuration (d), (e), or (f) of boundary faces to configuration (c), which is now viewed as having five cubes in CA(X). Adding a sixth cube will transform configuration (c) of boundary faces (of five cubes) to configuration (b), which is now viewed as having six cubes in CA(X). Adding a seventh cube will yield configuration (a) of boundary faces of seven cubes in CA(X). Thus, we have shown that the boundary faces of CA(X) that contain point x can have only the six configurations in Figure 8 (modulo rotations and reflections). ■

Observe that the six face neighborhoods of a corner point shown in Figure 8 are exactly the same as shown in Chen and Zhang (1993) and Francon (1995). The fact that only the six boundary configurations shown in Figure 8 are possible for a corner point x ∈ bdCA(X) of a well-composed set X ⊆ Z³ (Proposition 3) gives a nice motivation for calling the boundary of a 3D well-composed set a digital 2D manifold. The points of this digital manifold are the corner points x ∈ bdCA(X). Analogous to continuous 2-manifolds, we can define derivatives at corner points x ∈ bdCA(X) and their curvature. Clearly, these definitions are based on finite differences with respect to the neighboring corner boundary points. Further, we can classify corner points x ∈ bdCA(X) according to their discrete derivatives (see Figure 8):

1. a corner point x in (a) and (c) is an elliptic point;
2. a corner point x in (b) and (d) is a parabolic point; and
3. a corner point x in (e) and (f) is a hyperbolic point.

Observe that there is only one connectedness relation on faces contained in the boundary of the continuous analog CA(X) of a well-composed picture (Z³, X): A set of boundary faces S is a corner-connected component of bdCA(X) iff S is an edge-connected component of bdCA(X).
As every boundary bdCA(X) is a finite union of some set of closed faces S, that is, bdCA(X) = ∪S, the statement that bdCA(X) is a simple closed surface means here that bdCA(X) is a connected 2D manifold in R³. Hence, we obtain the following proposition.

Proposition 4 A digital set X ⊆ Z³ is well-composed iff every component of bdCA(X) is a simple closed surface. ■

The definition of well-composed sets is nicely visualized by Proposition 5, which shows the equivalence of this definition to two simple local conditions on voxels in the continuous analog.

Proposition 5 A digital set X ⊆ Z³ is well-composed iff the critical configurations of cubes (1) and (2) in Figure 3 (modulo reflections and rotations) do not occur in CA(X) or CA(Xᶜ).
Proof The proof follows directly from Proposition 3, since for any corner point x ∈ bdCA(X), the boundary faces of CA(X) that contain x have one of the six configurations shown in Figure 8 iff the critical configurations of cubes (1) and (2) do not occur in CA(X) or CA(Xᶜ). ■

In Artzy et al. (1981) the digital 3D sets that do not contain configuration (1) in Figure 3 (modulo reflections and rotations) are defined to be solid. However, configuration (2) can occur in a solid set.

B. Digital Characterization of Well-Composed Sets

In this section we give a "digital characterization" (using only points in Z³) of well-composed sets. First we need definitions of adjacency relations on points in Z³ that induce a graph structure. Due to the duality between points in Z³ and voxels (i.e., cubes centered at these points), the following definitions apply to points as well as to voxels in R³. While the definitions for points reflect the graph-theoretical structure of Z³, the definitions for voxels are based on topological properties of the continuous analogs.

Two distinct points p = (p₁, p₂, p₃), q = (q₁, q₂, q₃) ∈ Z³ are said to be 26-adjacent (26-neighbors) if |p₁ − q₁| ≤ 1, |p₂ − q₂| ≤ 1, and |p₃ − q₃| ≤ 1, that is, all three coordinates of p and q differ by at most 1. All 26 points different from x in Figure 9 illustrate the 26-neighbors of x. Two distinct points p, q ∈ Z³ are said to be 18-adjacent (18-neighbors) if they are 26-adjacent and |p₁ − q₁| + |p₂ − q₂| + |p₃ − q₃| ≤ 2, that is, if at most two of the coordinates of p and q differ by 1. Two distinct points p, q ∈ Z³ are said to be 6-adjacent (6-neighbors) if |p₁ − q₁| + |p₂ − q₂| + |p₃ − q₃| = 1, that is, if two of the coordinates of p and q are the same and the third coordinates differ by 1.

We will denote the set of closed faces of cubes in C by F, that is, each f ∈ F is a unit closed square in R³ parallel to one of the coordinate planes. Two distinct points p, q ∈ Z³ are 6-adjacent iff the voxels CA(p) and CA(q) share a face. In this case, we say that CA(p) and CA(q) are face-adjacent.
We say that voxels CA(p) and CA(q) are edge-adjacent if the cubes CA(p) and CA(q) share an edge but not a face (i.e., CA(p) ∩ CA(q) is a line segment), which is equivalent to the fact that p, q ∈ Z³ are 18-adjacent but not 6-adjacent (see Figure 9). We say that voxels CA(p) and CA(q) are corner-adjacent if the cubes CA(p) and CA(q) share a vertex but not an edge (i.e., CA(p) ∩ CA(q) is a single point), which is equivalent to the fact that p, q ∈ Z³ are 26-adjacent but not 18-adjacent (see Figure 9). In this case, we say that p and q are diagonally adjacent. A set X ⊆ Z³ is k-adjacent to a point p ∈ Z³ if there exists q ∈ X such that p and q are k-adjacent, where k = 6, 18, 26.
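The three coordinate-wise definitions above can be sketched as small predicates; the function names are our own, and the encoding exploits the fact that, among points all of whose coordinates differ by at most 1, the 1-norm of the difference counts how many coordinates differ.

```python
# Sketch of the 6-, 18-, and 26-adjacency relations on Z³.

def adjacency(p, q):
    """Return the tightest adjacency class (6, 18, or 26) of two
    distinct points, or None if they are not 26-adjacent."""
    d = [abs(a - b) for a, b in zip(p, q)]
    if p == q or max(d) > 1:
        return None          # identical, or too far apart to be neighbors
    s = sum(d)               # number of coordinates differing (each by 1)
    return {1: 6, 2: 18, 3: 26}[s]

def k_adjacent(p, q, k):
    """k-adjacency is cumulative: 6-neighbors are also 18- and 26-neighbors."""
    a = adjacency(p, q)
    return a is not None and a <= k

assert adjacency((0, 0, 0), (1, 0, 0)) == 6    # voxels share a face
assert adjacency((0, 0, 0), (1, 1, 0)) == 18   # voxels share only an edge
assert adjacency((0, 0, 0), (1, 1, 1)) == 26   # voxels share only a corner
assert k_adjacent((0, 0, 0), (1, 0, 0), 26)    # 6-neighbors are 26-adjacent
```

The three return values correspond exactly to the face-, edge-, and corner-adjacency of the voxels CA(p) and CA(q) described in the text.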
Figure 9. The slightly larger black balls illustrate the six 6-neighbors of the center point x, the gray balls illustrate the edge neighbors of x, and the small black balls illustrate the corner neighbors of x.
Nₖ(p) denotes the set containing p ∈ Z³ and all points k-adjacent to p, and N*ₖ(p) denotes Nₖ(p) \ {p}, where k = 6, 18, 26. N₂₆(p) is also referred to as N(p) and is called the neighborhood of p, whereas N₂₆(p) \ {p} is referred to as N*(p).

A common face of two cubes centered at points p, q ∈ Z³ (i.e., a unit square parallel to one of the coordinate planes) can be identified with the pair (p, q). Such pairs are called "surface elements" in Herman (1992), as they are constituent parts of object surfaces. We can extend CA to apply also to pairs of points by defining CA((p, q)) = CA(p) ∩ CA(q) for p, q ∈ Z³, and CA(B) = ∪{CA(x) : x ∈ B}, where B is a set of pairs of points in Z³. In particular, we have F = {CA((p, q)) : p, q ∈ Z³ and p is 6-adjacent to q}.

The (face) boundary of a continuous analog CA(X) of a digital set X ⊆ Z³ is defined as the union of the set of closed faces each of which is the common face of a cube in CA(X) and a cube not in CA(X). Observe that the face boundary of CA(X) is just the topological boundary bdCA(X) in R³. The face boundary bdCA(X) can also be defined using only cubes of the set CA(X) as the union of the set of closed faces each of which is a face of exactly one cube in CA(X). We have bdCA(X) = bdCA(Xᶜ), where Xᶜ = Z³ \ X is the complement of X.

The (6-)boundary of a digital set X ⊆ Z³ can be defined as the set of pairs bd₆X = {(p, q) : p ∈ X and q ∉ X and p is 6-adjacent to q}. We have bdCA(X) = CA(bd₆X) = CA(bd₆(Xᶜ)).
As bdCA(X) = bdCA(Xᶜ), a set X is well-composed iff the set Xᶜ is well-composed. Two distinct faces f₁, f₂ ∈ F are edge-adjacent if they share an edge, that is, if f₁ ∩ f₂ is a line segment in R³. Two distinct faces f₁, f₂ are corner-adjacent if they share a vertex but not an edge, that is, if f₁ ∩ f₂ is a single point in R³.

Proposition 6 A 3D digital set X ⊆ Z³ is well-composed iff the following conditions hold for i = 0, 1, where X₀ = X and X₁ = Xᶜ:

(C1) for every two 18-adjacent but not 6-adjacent points x, y in Xᵢ, there is a point z in Xᵢ that is 6-adjacent to both x and y; and

(C2) for every two 26-adjacent but not 18-adjacent points x, y in Xᵢ, there is a 6-path joining x to y in N(x) ∩ N(y) ∩ Xᵢ.

Proof Let X = Xᵢ, where i = 0, 1. We show first that the negation of condition (C1) is equivalent to the fact that configuration (1) (Figure 3) occurs in CA(X). If configuration (1) occurs in CA(X), then there exist four distinct points x, y ∈ X and a, b ∉ X such that CA(x), CA(y), CA(a), CA(b) share an edge. Then x, y ∈ X are 18- but not 6-adjacent in X. Only the points a, b are 6-adjacent to both x and y, but a, b ∉ X. Conversely, if there exist two 18- but not 6-adjacent points x, y in X such that there does not exist a point z in X that is 6-adjacent to both x and y, then configuration (1) (Figure 3) occurs in CA(X).

Now we show that if configuration (2) (Figure 3) occurs in CA(X), then condition (C2) does not hold. Let x, y ∈ X be such that CA(x) and CA(y) form configuration (2). Then x, y ∈ X are 26- but not 18-adjacent in X. Figure 10 shows the intersection N(x) ∩ N(y) of two 26- but not 18-adjacent points x and y. It is easily seen that the other six points in N(x) ∩ N(y) do not belong to X. Therefore, there is no 6-path joining x to y in N(x) ∩ N(y) ∩ X.

Finally, we assume the negation of condition (C2). Let x, y in X be two 26-adjacent but not 18-adjacent points such that there is no 6-path joining x to y in N(x) ∩ N(y) ∩ X.
This implies that configuration (2) or configuration (1) occurs in CA(X). ■

The following proposition implies that there is only one kind of connected component in a well-composed picture, because the 26-, 18-, and 6-connected components are equal.

Proposition 7 Let X ⊆ Z³ be well-composed. Then each 26-component of Xᵢ is a 6-component of Xᵢ and each 18-component of Xᵢ is a 6-component of Xᵢ, where i = 0, 1 and X₀ = X and X₁ = Xᶜ.
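The local criteria of Propositions 5 and 6 can be sketched as a scan over 2 × 2 × 2 blocks of voxels. In the sketch below, configuration (1) appears as a checkerboard pattern on some 2 × 2 square of voxels sharing an edge, and configuration (2) as a block whose only occupied (or only unoccupied) voxels are two antipodal corners; this bit-pattern encoding of the critical configurations is our own, not quoted from the text.

```python
# Sketch of a local 3D well-composedness test (Propositions 5/6):
# reject X iff some 2x2 square of voxels carries the checkerboard of
# configuration (1), or some 2x2x2 block realizes configuration (2)
# in X or in its complement.

from itertools import product

def is_well_composed_3d(X):
    """X is a finite set of (x, y, z) integer tuples (the black voxels)."""
    X = set(X)
    corners = set()                     # candidate 2x2x2 block origins
    for (x, y, z) in X:
        for dx, dy, dz in product((-1, 0), repeat=3):
            corners.add((x + dx, y + dy, z + dz))
    for (x, y, z) in corners:
        # Occupancy of the 2x2x2 block with origin (x, y, z).
        b = {(dx, dy, dz): (x + dx, y + dy, z + dz) in X
             for dx, dy, dz in product((0, 1), repeat=3)}
        # Configuration (1): checkerboard on any 2x2 face of the block
        # (the same pattern covers both X and its complement).
        for axis in range(3):
            for c in (0, 1):
                face = [b[d] for d in sorted(product((0, 1), repeat=3))
                        if d[axis] == c]
                if face in ([True, False, False, True],
                            [False, True, True, False]):
                    return False
        # Configuration (2): exactly two antipodal voxels of the block in
        # X, or exactly two antipodal voxels in the complement.
        ones = [d for d, v in b.items() if v]
        zeros = [d for d, v in b.items() if not v]
        for pts in (ones, zeros):
            if len(pts) == 2 and all(a + c == 1 for a, c in zip(*pts)):
                return False
    return True

assert not is_well_composed_3d({(0, 0, 0), (1, 1, 1)})   # configuration (2)
assert not is_well_composed_3d({(0, 0, 0), (1, 1, 0)})   # configuration (1)
# A 6-connected "staircase" of voxels passes the test.
assert is_well_composed_3d({(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)})
```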
Figure 10. The slightly larger black balls illustrate the intersection N(x) ∩ N(y) of two 26- but not 18-adjacent points x and y.
Proof Let x = x₁, x₂, . . . , xₙ = y be a 26-path joining x to y in Xᵢ. By condition (C2) in Proposition 6, for any two 26-neighbors xⱼ, xⱼ₊₁, j = 1, . . . , n − 1, there is a 6-path joining xⱼ to xⱼ₊₁ in Xᵢ. Thus, there exists a 6-path joining x to y in Xᵢ. The argument for 18-components is similar. ■

C. Jordan-Brouwer Separation Theorem

An important motivation for introducing 3D well-composed pictures is the following digital version of the Jordan-Brouwer separation theorem. We recall that in a digital picture (Z³, X) either X or its complement Xᶜ is finite and nonempty.

Theorem 2 If a 3D digital set X is well-composed, then for every connected component S of bdCA(X), R³ \ S has precisely two connected components, of which S is the common boundary.

Proof The proof of this theorem follows directly from Theorem 3, which is stated at the end of this section. It is sufficient to observe that by
Proposition 3, a connected component of bdCA(X) is a strongly connected polyhedral surface without boundary, which we define in what follows. ■

Note that if a digital picture is not well-composed, Theorem 2 does not hold; for example, if X is a two-point digital set such that CA(X) is as shown in Figure 3.

Now we define polyhedral surfaces in R³. They were used in Kong and Roscoe (1985) to prove 3D digital analogs of the Jordan curve theorem. Let n ≥ 0 and let {Tᵢ : 0 ≤ i ≤ n} be a set of closed triangles in R³. The set ∪{Tᵢ : 0 ≤ i ≤ n} is called a polyhedral surface if the following conditions both hold:

(i) If i ≠ j, then Tᵢ ∩ Tⱼ is either a side of both Tᵢ and Tⱼ, or a corner of both Tᵢ and Tⱼ, or the empty set; and
(ii) each side of a triangle Tᵢ is a side of at most one other triangle.

The (1D) boundary of a polyhedral surface S = ∪{Tᵢ : 0 ≤ i ≤ n} is defined as ∪{s : s is a side of exactly one Tᵢ}. Observe that this definition produces the same boundary of S for every dissection of S into triangles fulfilling (i) and (ii). We say that S is a polyhedral surface without boundary if the boundary of S is the empty set.

A polyhedral surface S is strongly connected if for any finite set of points F ⊆ S, the set S \ F is polygonally connected, where the definition of a polygonally connected set is the following: If u and v are two distinct points in R³, then uv denotes the straight line segment joining u to v. Suppose n > 0 and {xᵢ : 0 ≤ i ≤ n} is a set of distinct points in R³ such that whenever i ≠ j, xᵢxᵢ₊₁ ∩ xⱼxⱼ₊₁ = {xᵢ, xᵢ₊₁} ∩ {xⱼ, xⱼ₊₁}; then arc(x₀, xₙ) = ∪{xᵢxᵢ₊₁ : 0 ≤ i < n} is a simple polygonal arc joining x₀ to xₙ. We call a subset S of R³ polygonally connected if any two points in S can be joined by a simple polygonal arc contained in S.

Now we can state the Jordan-Brouwer separation theorem for a strongly connected polyhedral surface without boundary. This theorem is a very important result of combinatorial topology (e.g., see Aleksandrov, 1960).
It was applied in Kong and Roscoe (1985) to establish separation theorems for digital surfaces:

Theorem 3 If S is a strongly connected polyhedral surface without boundary, then R³ \ S has precisely two components, and one of the components is bounded. S is the boundary of each component.

Our proof of Theorem 2 follows directly from the Jordan-Brouwer separation theorem stated in Theorem 3, because every connected component S of bdCA(X) of a well-composed set X is a strongly connected polyhedral surface without boundary.
D. Properties of Boundary Faces

Recall that we interpret Z³ as a set of points with integer coordinates in the space R³, C is the set of closed unit upright cubes centered at points of Z³, and F is the set of closed faces of cubes in C, that is, each f ∈ F is a unit closed square in R³ parallel to one of the coordinate planes. Note that C = {CA(p) : p ∈ Z³} and F = {CA((p, q)) : p, q ∈ Z³ and p is 6-adjacent to q}. We also recall that the function Dig : P(R³) → P(Z³) is defined by Dig(Y) = {p ∈ Z³ : p ∈ Y}. We begin this section with a theorem relating well-composed pictures to simple closed surfaces composed of faces in F.

Theorem 4 Let S ⊆ F be a finite and nonempty set of faces in R³. ∪S is a simple closed surface (i.e., ∪S is a connected and compact 2D manifold in R³) iff R³ \ ∪S has precisely two components X₁ and X₂, ∪S is the common boundary of X₁ and X₂, and the set Dig(X₁) ⊆ Z³ is well-composed.

The proof of this theorem will be given below. Observe that the implication "⇐" in Theorem 4 would not be true if the set Dig(X₁) were not required to be well-composed. Let S = bdCA(D), where D is a digital set of 1's in the following 2 × 2 × 2 configuration (on a background of 0's), shown as its two 2 × 2 layers:

1 0    1 1
1 1    0 1
Then R³ \ S has precisely two components, but S is not a simple closed surface, as the common corner of the six black (i.e., 1-) voxels does not have a neighborhood homeomorphic to R². To better understand the equivalence in Theorem 4, we consider in Theorem 5 the six simple local configurations of faces shown in Figure 8.

Theorem 5 If S ⊆ F is a finite and nonempty set of faces in R³, then the following conditions are equivalent:

(i) ∪S is a simple closed surface (i.e., ∪S is a connected and compact 2D manifold in R³); and
(ii) S is corner-connected and for every corner point x ∈ ∪S, the boundary faces of S that contain x as their corner point have one of the six configurations shown in Figure 8 (modulo reflections and rotations).

Proof "(i) ⇒ (ii)" As ∪S is a simple closed surface, each point s ∈ ∪S has a neighborhood homeomorphic to R². Thus, in particular, each corner point x of a face in S has a neighborhood homeomorphic to R². By simple case checking as in the second part of the proof of Proposition 3, it can be shown that Figure 8 shows all possible configurations (modulo rotations and reflections) of faces in F that share a common corner point x such that
x has a neighborhood homeomorphic to R². Now as ∪S is connected, the set of faces S must be corner-connected. Thus, we obtain (i) ⇒ (ii).

"(ii) ⇒ (i)" We assume (ii). Then every point in the 2D interior of a face in S clearly has a neighborhood homeomorphic to R². As every edge belongs to exactly two faces in S, every point of an edge (except the two corner points) has a neighborhood homeomorphic to R². Because for every corner point x of a face in S, the set of faces sharing x has one of the six configurations of faces shown in Figure 8, x has a neighborhood homeomorphic to R². Thus, ∪S is a 2D manifold. ∪S is a connected subset of R³, because S is corner-connected. As ∪S is a finite union of closed squares in R³, ∪S is compact. Therefore, ∪S is a simple closed surface. ■

Now we are ready to prove Theorem 4.

Proof of Theorem 4 "⇒" Let ∪S be a simple closed surface. Then S satisfies condition (ii) of Theorem 5. Consequently, ∪S is a strongly connected polyhedral surface without boundary. By Theorem 3, R³ \ ∪S has precisely two components X₁ and X₂, and ∪S is the common boundary of X₁ and X₂. It remains to show that the digital set Dig(X₁) is well-composed. As Xᵢ ∪ ∪S = CA(Dig(Xᵢ)), we have ∪S = bd(CA(Dig(Xᵢ))) for i = 1, 2. Thus, bd(CA(Dig(Xᵢ))) is a 2D manifold for i = 1, 2. We obtain that (Z³, Dig(Xᵢ)) is well-composed for i = 1, 2.

"⇐" Because Dig(X₁) is well-composed, bd(CA(Dig(X₁))) is a 2D manifold. As the closed set X₁ ∪ ∪S is a union of some cubes in C, we obtain X₁ ∪ ∪S = CA(Dig(X₁)). Hence ∪S = bd(CA(Dig(X₁))), which means that ∪S is a 2D manifold in R³. As ∪S is a finite union of closed squares in R³, it is compact. It remains to show that ∪S is connected. If ∪S were not connected, then there would be more than two components of R³ \ ∪S, because every connected component of ∪S would be a strongly connected polyhedral surface without boundary, and therefore, it would satisfy Theorem 3. ■

E.
Connected Components in 3D Well-Composed Pictures

Rosenfeld and Kong (1991) proved the following theorem for 2D digital pictures:

Theorem 6 For every finite and nonempty set X ⊆ Z², the boundary bdCA(X) is a simple closed curve (i.e., bdCA(X) is connected and each line segment in bdCA(X) is adjacent to exactly two others) iff X and Xᶜ are both 4-connected. ■

As is shown in Rosenfeld and Kong (1991), an analogous theorem does
not hold in 3D: Let X be the set of 1's in the following 2 × 2 × 2 configuration (on a background of 0's), shown as its two 2 × 2 layers:

1 1    1 1
0 1    1 0
Then X and Xᶜ are both 6-connected, but bdCA(X) is not a simple closed surface. However, the inverse implication is proved in Rosenfeld and Kong (1991), Proposition 9:

Theorem 7 If the boundary bdCA(X) of a set X ⊆ Z³ is a simple closed surface, then X and Xᶜ are both 6-connected. ■

Using the concept of well-composedness, we can generalize Theorem 6 to three dimensions:

Theorem 8 For every finite and nonempty set X ⊆ Z³, the boundary bdCA(X) is a simple closed surface iff X and Xᶜ are both 6-connected and the set X (and consequently Xᶜ) is well-composed.

Proof "⇒" By Theorem 7, we obtain that X and Xᶜ are both 6-connected. As a simple closed surface is in particular a 2D manifold, we obtain that X is well-composed.

"⇐" As X and Xᶜ are 6-connected, CA(X) and CA(Xᶜ) are connected subsets of R³ and bdCA(X) is their common boundary. Therefore, bdCA(X) is also a connected subset of R³. As X ⊆ Z³ is finite, bdCA(X) is compact. By definition, the fact that (Z³, X) is well-composed implies that bdCA(X) is a 2D manifold. Consequently, bdCA(X) is a simple closed surface. ■
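The two connectivity hypotheses of Theorem 8 are easy to check algorithmically. The sketch below tests 6-connectedness of a finite X and of its (infinite) complement; for the complement we work inside a bounding box padded by one voxel, which suffices because everything outside the box is plainly 6-connected to the box's border shell. Names and the set-of-tuples representation are our own.

```python
# Sketch of the 6-connectivity checks appearing in Theorem 8.

from collections import deque

def six_connected(S):
    """BFS over a finite voxel set S using 6-adjacency."""
    S = set(S)
    if not S:
        return True
    start = next(iter(S))
    seen, queue = {start}, deque([start])
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                           (0, -1, 0), (0, 0, 1), (0, 0, -1)):
            q = (x + dx, y + dy, z + dz)
            if q in S and q not in seen:
                seen.add(q)
                queue.append(q)
    return seen == S

def complement_six_connected(X):
    """Test 6-connectedness of Z³ \\ X inside a padded bounding box."""
    X = set(X)
    lo = [min(p[i] for p in X) - 1 for i in range(3)]
    hi = [max(p[i] for p in X) + 1 for i in range(3)]
    box = {(x, y, z)
           for x in range(lo[0], hi[0] + 1)
           for y in range(lo[1], hi[1] + 1)
           for z in range(lo[2], hi[2] + 1)}
    return six_connected(box - X)

# A solid 2x2x2 block: both X and its complement are 6-connected.
block = {(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)}
assert six_connected(block) and complement_six_connected(block)
# Two far-apart voxels: X itself is not 6-connected.
assert not six_connected({(0, 0, 0), (3, 0, 0)})
```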
IV. 2D Well-Composed Sets This section is based on Latecki et al. (1995). A digital analog of a continuous concept is defined based on a discrete theory, mostly graph theory. Then it is shown using graph-theoretical tools that some particular properties of the digital concept are the same as the corresponding properties of its continuous original. For example, a digital version of a simple closed curve can be defined as a connected subgraph of the digital plane (with a certain minimal number of points) in which each point is adjacent to exactly two other points. Further, it can be proved that a simple closed curve separates the digital plane into exactly two components, which can be interpreted as the interior and the exterior of the curve (Rosenfeld, 1979). Thus, a digital simple closed curve has exactly the same separability property as its continuous original. This property is known as the Jordan
curve theorem. Rosenfeld proved this property for a special graph structure of Z², which we will describe in what follows.

An important property of a continuous arc is that its connectivity is destroyed by the removal of one point other than an endpoint. A continuous simple closed curve has the property that deleting any of its points makes it an arc. These statements can be used for general definitions of a discrete arc and a discrete simple closed curve in any graph, which reduce to the following definitions in the case of a digital plane or space. A finite set A is called a (digital) n-arc if it is n-connected, and all but two of its points have exactly two n-neighbors in A, while these two have exactly one. These two points are called endpoints. A finite set C is called a (digital) simple closed n-curve (Jordan n-curve) if it is n-connected, and each of its points has exactly two n-neighbors in C. To avoid pathological situations, we require that a simple closed n-curve contain a certain minimal number of points; for example, a 4-curve contains at least 8 points and an 8-curve at least 4 points (Rosenfeld, 1979). By these definitions, an important and common property of a continuous simple closed curve and a digital simple closed curve is that their connectivity is destroyed by the removal of any two points.

A typical situation in defining discrete analogs of continuous concepts is the following: It is not enough to define a digital concept; it is additionally necessary to specify the graph structure of the digital space in which this concept should have properties analogous to its continuous original. For example, there are graph structures on Z² in which a simple closed curve does not satisfy the Jordan curve theorem. This fact was noted early in the history of image analysis and referred to as "connectivity paradoxes." We illustrate some of the paradoxes pointed out in Rosenfeld and Pfaltz (1966) (see also Kong and Rosenfeld (1989)). If 8-adjacency is used for black as well as white points in Figure 11a, then the black points form a discrete simple closed 8-curve, but this curve does not separate the
If 8-adjacency is used for black as well as white points in Figure 11a, then the black points form a discrete simple closed 8-curve, but this curve does not separate the
Figure 11. Connectivity paradoxes.
complement (i.e., the white points), as there is an 8-path between any pair of white points, which means that the set of white points is 8-connected. This example shows that a digital version of the Jordan curve theorem does not hold for 8-adjacency. Figure 11b shows that the situation is equally problematic for 4-adjacency: the black points constitute a simple closed 4-curve, but there exist three 4-connected components of the complement. Thus, in both cases, a digital version of the Jordan curve theorem does not hold. Observe also that, if 4-adjacency is used for black as well as white points in Figure 11a, then the black points are totally disconnected. However, they separate the set of white points into two 4-components.

The most popular solution to these problems was the idea of using different adjacency relations for the foreground and the background: 8-adjacency for black points and 4-adjacency for white points, or vice versa (first recommended in Duda et al. (1967)). If we consider 8-adjacency for the black points in Figure 11a, then the set of black points forms a simple closed 8-curve which separates the white background into exactly two 4-components. Similarly, if we consider 4-adjacency for the black points in Figure 11b, then the set of black points forms a simple closed 4-curve, which separates the white background into exactly two 8-components. Rosenfeld (1979) developed the foundations of digital topology based on this idea, and showed that the Jordan curve theorem is then satisfied.

The price one has to pay for this solution is that there are two different adjacency relations in one digital picture, which depend on the objects being represented. Therefore, the adjacency relation is not an intrinsic feature of a digital picture as a representing medium. Consequently, connected components are also not intrinsic features, as is the case for R² with the usual topology.
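The 8-adjacency paradox can be reproduced computationally. Since Figure 11a is not reproduced here, the sketch below uses a four-point diamond as a stand-in configuration of our own choosing: under 8-adjacency it is a simple closed 8-curve, yet the white points of a surrounding window form a single 8-connected region, so the curve separates nothing.

```python
# Sketch of the 8-adjacency connectivity paradox (cf. Figure 11a).

from collections import deque

def count_components(cells, adj):
    """Number of connected components of a finite cell set."""
    cells, n = set(cells), 0
    while cells:
        n += 1
        queue = deque([cells.pop()])
        while queue:
            p = queue.popleft()
            for q in adj(p):
                if q in cells:
                    cells.remove(q)
                    queue.append(q)
    return n

def adj8(p):
    x, y = p
    return [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)]

black = {(1, 0), (0, 1), (2, 1), (1, 2)}          # a diamond of black points
window = {(x, y) for x in range(-1, 4) for y in range(-1, 4)}
white = window - black

# Every black point has exactly two 8-neighbors on the curve, so the
# black points form a simple closed 8-curve ...
assert all(sum(q in black for q in adj8(p)) == 2 for p in black)
# ... yet the white points, interior point (1, 1) included, form ONE
# 8-component: the Jordan curve theorem fails for pure 8-adjacency.
assert count_components(white, adj8) == 1
```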
As we have one connectedness relation for the foreground (e.g., the set of black points) and another for the background (e.g., the set of white points), interchanging the foreground and the background also changes the connectedness relations of the digital picture. By the foreground we mean the objects whose properties we want to analyze, and by the background all the other objects of a digital image. Hence, the choice of foreground and of background is critical, especially in cases where it is not clear at the beginning of the analysis what constitutes the foreground and what constitutes the background, since this choice immediately determines the connectedness structure of the digital picture. A solution that allows us to avoid the connectivity paradoxes while having only one connectedness relation for the entire digital picture is to restrict the class of all possible binary images to well-composed images. This kind of solution is commonly applied in mathematical theories, because in most fields of mathematics we do not treat all subsets of a given space, but
only a class of subsets that have "nice properties" with regard to the features we are especially interested in. As pictures such as those shown in Figure 11 are not well-composed, their connectivity paradoxes cannot occur in well-composed pictures.

At first glance, it may seem that by requiring well-composedness we restrict the variety of digital pictures. However, requiring digital pictures to be well-composed is actually a consequence of requiring that the process of digitization preserve topology, since, as we show in Section V, if the digitization resolution is fine enough to ensure topology preservation, then the output digital picture is well-composed. This also gives us the right to "repair" a non-well-composed picture, because if the neighborhood of a point is not well-composed, it must be due to noise. As well-composedness is a local property, that is, it depends on the colors of single picture points, it can be decided very efficiently in parallel whether a given set is well-composed. Therefore, a picture can be "repaired" by adding (or subtracting) single points by a parallel algorithm (Section IV.F). The other possibility, which is more promising for applications, is to impose local conditions on the segmentation process that ensure that the obtained segmented image is well-composed.

In a well-composed image, not only does the concept of Jordan curve have separation properties analogous to its continuous original, but additionally every well-composed image can be regarded as a planar graph, that is, it can be embedded into the plane R² in such a way that its links do not intersect, as only 4-adjacency links need to be considered. The simpler adjacency structure of a well-composed image suggests that algorithms used in image processing should become simpler when restricted to well-composed images. We will investigate this conjecture on thinning algorithms.
We will show that the resulting skeletons have a very simple structure if the input set is well-composed. We will show that simple thinning algorithms can be defined for well-composed sets that generate really ‘‘thin’’ skeletons which are ‘‘one point thick,’’ a notion we also define formally. Consequently, the problem of irreducible ‘‘thick’’ sets disappears in well-composed pictures. We will also show that skeletons have a graph structure, and we define what this means. Restricting thinning to well-composed sets leads to an interesting situation: although we delete points in fewer configurations, the resulting skeletons are thinner, and only 1/3 as many neighborhood configurations need be considered to decide whether a given point can be deleted. In general, there are 18 types of 3 × 3 neighborhoods of simple points (other than endpoints), which generate 108 neighborhood configurations by 90° rotations and reflections; of these, only 7 types are neighborhoods of simple points (other than endpoints) in well-composed sets, which generate only 36 configurations.
WELL-COMPOSED SETS
117
A. Jordan Curve Theorem and Euler Characteristic

The Jordan curve theorem holds for well-composed sets. Thus, if we consider only subsets of Z² that are well-composed, then we have no problems with the paradoxes presented in the introduction.

Theorem 9 (Jordan curve theorem) The complement of a simple closed curve in a well-composed image has exactly two components.

Proof Let C be a simple closed curve in a well-composed image. Then C is a 4-curve. Rosenfeld (1979) proved, in particular, that if we consider 4-adjacency for C and 8-adjacency for its complement, then the complement of C has exactly two components. Our theorem follows easily from his theorem, since every 8-component is also a 4-component (Proposition 2). ■

Kong and Rosenfeld (1990) showed that if we use 4- (or 8-) connectedness for both a set and its complement, the Euler characteristic cannot be computed by counting local patterns. It is well known (Minsky and Papert, 1969) that the Euler characteristic is locally computable if we use changeable 8/4- (or 4/8-) connectedness. Theorem 10 shows that the Euler characteristic is also locally computable for well-composed images.

Definition 7 Let S be a digital set. If S has n components and its complement has n̄ components, then χ(S) = n − n̄ + 1 is called the Euler characteristic of S.

Theorem 10 The Euler characteristic is locally computable for well-composed images.

Proof Minsky and Papert (1969, Chapter 5.8.1) proved this theorem using 4-adjacency for black points and 8-adjacency for white points. Our theorem follows easily from their theorem, since every 4-component is an 8-component of S and vice versa (Proposition 2), and therefore, we can use 4-adjacency for black points and 8-adjacency for white points in the computation of the Euler characteristic. ■

B. Thinning

Thinning is a common pre-processing operation in digital image processing. Its goal is to reduce a set S to a ‘‘skeleton’’ in a ‘‘topology-preserving’’ way.
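The local computability asserted by Theorem 10 can be made concrete: for a well-composed set, the Euler characteristic χ(S) = n − n̄ + 1 of Definition 7 equals a difference of 2 × 2 pattern counts. The quad-count formula below is the classical one for sets without diagonal configurations (so the 4- and 8-connected versions coincide); the code is a sketch with names of our choosing:

```python
def euler_characteristic(black):
    """Euler characteristic of a well-composed pixel set, computed
    purely locally: chi = (Q1 - Q3) / 4, where Q1 (resp. Q3) is the
    number of 2x2 windows containing exactly one (resp. three) black
    pixels. Valid because well-composed sets contain no diagonal
    2x2 patterns."""
    xs = [x for x, _ in black]
    ys = [y for _, y in black]
    q1 = q3 = 0
    # slide a 2x2 window over the bounding box plus 1 pixel of padding
    for x in range(min(xs) - 1, max(xs) + 1):
        for y in range(min(ys) - 1, max(ys) + 1):
            n = sum((x + dx, y + dy) in black
                    for dx in (0, 1) for dy in (0, 1))
            if n == 1:
                q1 += 1
            elif n == 3:
                q3 += 1
    return (q1 - q3) // 4
```

A single pixel gives χ = 1; a square ring with one hole gives χ = 1 − 2 + 1 = 0, since its complement has two components (the hole and the outside).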
Rosenfeld (1975) stated three requirements that a 2D thinning algorithm should satisfy: (1) connectedness is preserved, for both the objects and their complements; (2) curves, arcs, and isolated points remain unchanged; and (3) upright rectangles, whose length and width are both greater than 1, do not remain unchanged.
In this section we present a sequential algorithm, and in Section IV.E a parallel algorithm, which fulfill these requirements. In addition, these algorithms preserve well-composedness and produce really thin sets (a concept which we will also define precisely). We begin with the definition of endpoints:

Definition 8 A black point P is an n-endpoint if it has exactly one black n-neighbor in N*(P), where n = 4 or 8.

There exists only one type of 8-endpoint, characterized by the fact that it has exactly one black 8-neighbor. There exist two different types of 4-endpoints which are endpoints in well-composed sets, namely those having exactly one 8-neighbor which is a 4-neighbor and configuration 3 in Figure 12 (both of these configurations can occur as endpoints of 4-arcs). Before we give the standard 2D definition of a simple point (Rosenfeld, 1979; Kong and Rosenfeld, 1989), we want to remind the reader that either 8-connectedness is used for the foreground and 4-connectedness for the background, or vice versa.

Definition 9 A black point p is said to be n-simple if (C1) p is n-adjacent to only one black n-component in N*(p) and (C2) p is m-adjacent to only one white m-component in N*(p), where (n, m) = (8, 4) or (4, 8).

An equivalent definition is that a point P ∈ S is n-simple in a set S iff N*(P) contains just one n-component of S which is n-adjacent to P and P is an m-boundary point, where (n, m) = (8, 4) or (4, 8). For example, in Figure 12, the configurations of all 8-simple points (except 8-endpoints) are shown. Algorithms for checking whether a point is simple in 2D, as well as in 3D images, are presented in Latecki and Ma (1996), for example. Simple points are used in thinning algorithms:

Definition 10 n-Thinning a digital set means repeated removal of n-simple points that are not n-endpoints. A digital set is termed n-irreducible if its only n-simple points are n-endpoints. An irreducible set obtained from a set by means of thinning is called a skeleton.
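Definitions 9 and 10 can be turned into a small sequential 4-thinning sketch. We test 4-simplicity with the Yokoi connectivity number (a standard equivalent of Definition 9 for (n, m) = (4, 8)), read ‘‘endpoint’’ as ‘‘fewer than two black direct neighbors,’’ and additionally refuse any deletion that would leave a critical configuration behind, anticipating the restriction to the safe configurations used for well-composed sets. All of this is our sketch, not the chapter's literal algorithm:

```python
# Neighbor offsets in cyclic spatial order: E, NE, N, NW, W, SW, S, SE.
CYCLE = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def yokoi4(black, p):
    """Yokoi connectivity number for 4-connectivity; p is 4-simple iff 1."""
    b = [(p[0] + dx, p[1] + dy) in black for dx, dy in CYCLE]
    return sum(b[k] and not (b[k + 1] and b[(k + 2) % 8])
               for k in (0, 2, 4, 6))

def creates_critical(black, p):
    """Would deleting p leave a 2x2 square whose only blacks are diagonal?"""
    rest = black - {p}
    for cx in (p[0] - 1, p[0]):
        for cy in (p[1] - 1, p[1]):
            v = [(cx, cy) in rest, (cx + 1, cy) in rest,
                 (cx, cy + 1) in rest, (cx + 1, cy + 1) in rest]
            if v in ([True, False, False, True], [False, True, True, False]):
                return True
    return False

def thin(black):
    """Sequential 4-thinning: repeatedly delete 4-simple non-endpoints."""
    black = set(black)
    changed = True
    while changed:
        changed = False
        for p in sorted(black):
            directs = sum((p[0] + dx, p[1] + dy) in black
                          for dx, dy in ((1, 0), (0, 1), (-1, 0), (0, -1)))
            if (directs >= 2 and yokoi4(black, p) == 1
                    and not creates_critical(black, p)):
                black.remove(p)
                changed = True
    return black
```

The loop terminates because every pass that changes anything strictly shrinks the set, and the endpoint guard keeps the skeleton nonempty.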
One step in a sequential thinning algorithm consists of the removal of a single simple point. One step in a parallel thinning algorithm consists of the simultaneous removal of some set of simple points. It is required that after every step of a thinning algorithm the connectedness of a set and of its complement are not changed (Rosenfeld's condition (1)). This is a simple consequence of the definition of simple points for sequential thinning, but
Figure 12. Configurations of all possible 8-simple points (except 8-endpoints). The middle point of configuration 3 is a 4-endpoint but not an 8-endpoint.
requires a separate proof for parallel thinning. The special treatment of endpoints guarantees that Rosenfeld's condition (2) is satisfied. If we want thinning to preserve well-composedness, only 4-simple points that are not endpoints can be deleted. If we use changeable 8/4-connectedness, there are 18 types of 8-neighborhood configurations of 8-simple points
(not including endpoint configurations), which generate 108 8-neighborhood configurations of 8-simple points by rotations and reflections (Eckhardt and Maderlechner, 1993). Figure 12 shows all 18 types of 8-simple points. An important advantage of dealing only with well-composed sets in thinning processes is the fact that we have to treat only 7 types of 4-simple point neighborhoods (without endpoints): 7, 14, 15, 31, 62, 63, and 191 (see Figure 12). This corresponds to 36 neighborhood configurations obtained by rotations and reflections. The idea of deleting only 4-simple points in thinning an 8-connected object dates back to Rutovitz in 1966 and was proposed by different authors (Lü and Wang, 1985; Stefanelli and Rosenfeld, 1971; Zhang and Suen, 1984); however, they did not use the concept of well-composed sets. In general, if we delete only 4-simple points to thin any subset of Z², we have problems with 8-components, since 4-simple points are not necessarily 8-simple, as the following configuration shows:
Obviously, at least one of the critical configurations shown in Figure 13 appears in such a situation. If we treat well-composed images, these problems cannot occur. In fact, in well-composed sets every 4-simple point is 8-simple. Another very important property of well-composed sets is given in the following theorem. As already noted, in a well-composed set we can have only 4-simple points, since every component of a well-composed set is 4-connected. Therefore, thinning a well-composed set means removal of 4-simple points that are not endpoints. Theorem 11 Sequential 4-thinning is an internal operation on well-composed sets, that is, applying sequential 4-thinning to a well-composed set results in a well-composed set.
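The configuration counts quoted in this chapter — 108 deletable 8-neighborhoods in general and 36 in the well-composed case — can be checked by brute force over all 2⁸ neighborhoods of a black center point. The sketch below uses Yokoi connectivity numbers as the simplicity tests and reads ‘‘endpoint’’ as ‘‘fewer than two black neighbors’’ (respectively, fewer than two black direct neighbors); these readings, and all names, are ours:

```python
from itertools import product

# b[0..7] = E, NE, N, NW, W, SW, S, SE in cyclic spatial order;
# even indices are the direct (4-)neighbors, odd indices the diagonals.

def yokoi(b, n):
    """Yokoi connectivity number; the centre is n-simple iff this is 1."""
    def f(k):
        if n == 4:   # black direct neighbors not absorbed by the next corner
            return b[k] and not (b[k + 1] and b[(k + 2) % 8])
        else:        # n == 8: dual formula on the white pixels
            return (not b[k]) and (b[k + 1] or b[(k + 2) % 8])
    return sum(f(k) for k in (0, 2, 4, 6))

def locally_well_composed(b):
    # centre is black: forbid a black diagonal whose two flanking
    # direct neighbors are both white (a critical configuration)
    return all(not (b[k] and not b[k - 1] and not b[(k + 1) % 8])
               for k in (1, 3, 5, 7))

n8 = sum(1 for b in product((0, 1), repeat=8)
         if sum(b) >= 2 and yokoi(b, 8) == 1)
n4wc = sum(1 for b in product((0, 1), repeat=8)
           if locally_well_composed(b)
           and sum(b[k] for k in (0, 2, 4, 6)) >= 2
           and yokoi(b, 4) == 1)
print(n8, n4wc)   # prints: 108 36
```

The general figure 148 = 256 − 108 quoted in Section IV.C then follows by subtraction.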
Figure 13. The critical configurations that appear in non-well-composed images.
Figure 14. Arcelli’s set with five interior points.
Proof If we delete any 4-simple point from a well-composed set, we obtain a well-composed set. To see this it is sufficient to analyze the neighborhood configurations of 4-simple points that are not endpoints 7, 14, 15, 31, 62, 63, and 191 in Figure 12. ■ We conclude that thinning is an internal operation on well-composed sets if and only if only 4-simple points are deleted. It is easy to see that if we eliminate any 8-simple point which is not 4-simple, the resulting set will no longer be well-composed (see Figure 12). One of the most important goals of thinning is to obtain a skeleton of the input set. Therefore, the resulting skeleton, which cannot be further reduced by thinning, should not have any interior points. However, skeletons may have components of interior points of arbitrarily large sizes as shown in Eckhardt and Maderlechner (1993). The first example of such a skeleton was given by Arcelli (1981) (see Figure 14 here). For well-composed sets the situation is very simple, since there is only one type of kernel component, namely a set with only one point, as will be shown in Theorem 12. This type of interior point cannot be further eliminated, since it indicates a very useful property, namely that we have an intersection of two lines at this point, that is, locally the following situation:
Eliminating such interior points would mean that a skeleton could not have such intersections of two line segments, which is an unrealistic assumption.

Definition 11 A point of a digital set X that has all of its n-neighbors in X is called an n-interior point. The set of all interior points of a set X is termed the n-interior or n-kernel of X. A point of X that has an n-neighbor in the complement of X is called an n-boundary point.

Again for these concepts, as for all other concepts already defined here, we will drop the prefix n if the n-adjacency relation is clear from the context. For illustration purposes, we will denote the different types of points by the following symbols:
● : black point
○ (or blank position) : white point
⊗ : interior point
· : point of either black or white color
Theorem 12 Any 8-connected component of the kernel of a 4-irreducible well-composed set contains at most one point.

Proof We prove this by showing that it is not possible to have two adjacent interior points within a well-composed irreducible set. Without loss of generality we may assume that we consider interior points having boundary points as direct neighbors. We distinguish two cases (in the pictures given in the proof, the one on the left gives the start situation and the one on the right gives the situation constructed during the proof):

(a) P and Q are two directly neighboring interior points such that the points N(P) and N(Q) are not interior points. If N(Q) were black then N(Q) necessarily is a (4-)simple point or an interior point. So, N(Q) must be white. Because N(Q) cannot be simple and as the set is required to be well-composed, N(N(Q)) is black. The same argument holds for P. Now, N(Q) becomes 4-simple, which is a contradiction.
(b) Assume now that there are two indirectly neighboring interior points P and Q. Without loss of generality Q = N(P) and N(P) is not an interior point. This means that N(N(P)) or N(P) is white. If one of these neighbors is black, then N(P) is simple, hence both are white. Now N(N(P)) must be black, for otherwise N(P) would again be simple. The configuration thus obtained is no longer well-composed.
■
Definition 12 The crossing number (see, e.g., Hilditch, 1969, p. 411) is the number of white-black (0-1) transitions in the cyclic sequence of the eight neighbors of P, traversed in circular order.

Remark For well-composed sets, the crossing number is equal to the number of black 4-components in N*(P), since the crossing number is equal to the number of black 4-components in N*(P) if all 8-components in N*(P) are directly connected to P. This is the case if the critical configurations (Figure 13) are not contained in N(P).

C. Irreducible Well-Composed Sets

We now investigate sets having the property of being 4-irreducible and well-composed. Such sets can be obtained by applying a 4-thinning process to a well-composed set (see Theorem 11). Irreducible sets obtained by ordinary thinning can contain all point configurations which are not simple. There are 148 such configurations (256 configurations of 3 × 3 neighborhoods minus 108 configurations of simple points that are not endpoints); they are generated by 33 neighborhood types. The situation is more favorable if thinning algorithms are applied to well-composed sets.
obtained from them by 90° rotations and reflections) are possible: 1. One direct black neighbor (4-endpoint)
2. Two direct black neighbors
3. Three direct black neighbors
4. Four direct black neighbors
Proof The proof is by enumeration of all possible cases, showing that if P occurs in any other configuration, it must be a 4-simple point, which is not possible in a 4-irreducible set.

1. Let N(P) be black. N(P) and N(P) cannot be black, because then the set would not be well-composed. If N(P) were black, then P is a simple point and we have case 1b. The same holds for N(P). If N(P) and N(P) are white, we have case 1a. Therefore, the two configurations shown in the foregoing are the only possible configuration types with one direct black neighbor.

2. This case is obvious.

3. Let N(P), N(P), and N(P) be black. If N(P) and N(P) are white, then we have case 3b. If only one of them is white, say N(P), then we have case 3a. As N(P) cannot be simple, N(N(P)) must be black and N(N(P)) must be white. As N(P) cannot be simple, N(N(P)) must be black and N(N(P)) must be white.

4. In this case P is an interior point. Assume that N(P) is black (see the picture that follows, where the set on the left represents the start situation, and the set on the right represents the situation constructed during the proof).
If N(P) were black, then N(P) is either an interior point, which contradicts Theorem 12, or else it is necessarily simple. The same argument applies to N(P), so N(P) and N(P) must be white. N(N(P)) is not white, because otherwise either N(P) would be simple (N(N(P)) white) or the set would not be well-composed (N(N(P)) black). Again, by symmetry, N(N(P)) is black. N(N(P)) must be white, as otherwise N(P) would be simple. Now, regardless of the color of N(N(P)), N(P) is always simple (as the set is assumed to be well-composed). It is now easily seen that the configuration is as shown in the preceding. ■

Remark If P is an interior point of an irreducible well-composed set, then we have locally only the configuration presented in part 4 of Theorem 13.
So, P can be treated as an intersection point of a vertical and a horizontal line segment.

D. Graph Structure of Irreducible Sets

The goal of thinning is formulated by different authors as obtaining a set that is ‘‘one pixel thick’’: ‘‘. . . a single pixel wide . . .’’ in Pavlidis (1982b, p. 143); ‘‘. . . until all that remains is lines which are one point wide . . .’’ in Hilditch (1969, p. 407); ‘‘Thus, a final step might be necessary to reduce the set of the skeletal pixels to the unit width skeleton’’ in Arcelli and Sanniti di Baja (1989, p. 411); ‘‘A unitary skeleton is a single pixel thickness skeleton, in which each of its pixels is connected to not more than two adjacent pixels unless it represents a treepoint’’ in Abdulla et al. (1988, p. 13); ‘‘Overall, these applications employ thinning (a) to reduce line images to medial lines of unit width, (b) to enable objects to be represented as simplified data structures (e.g., by chain-coding) . . .’’ in Davies and Plummer (1981). Bearing in mind Arcelli's example (Figure 14), one might wonder how to give this requirement a precise meaning. Using the concept of well-composed sets, we can now propose the following definition:

Definition 13 A digital set is one pixel thick if it is well-composed and 4-irreducible.

Thus, we may give Theorem 12 an alternative formulation:

Theorem 14 The skeleton of a well-composed set is one pixel thick.

On the other hand, by ‘‘thinness’’ is meant intuitively that the skeleton should have a ‘‘graph-like’’ structure, as is expressed in the informal definitions in Abdulla et al. (1988) and Davies and Plummer (1981).
As any digital set that is equipped with a neighborhood relation is a graph, we should make this concept more precise by formalizing the notion of a ‘‘graph-like structure.’’

Definition 14 For any point P, the 8-connection number C₈(P) is the number of black 8-components in N*(P), and the 4-connection number C₄(P) is the number of black 4-components in N*(P) that are directly connected to P. Obviously, C₈(P) is never greater than C₄(P).

Definition 15 An n-graph point of a digital set is a point having the property that its n-connection number equals the number of its black n-neighbors, where n = 4 or 8. A digital set is termed an n-graph if all of its points are n-graph points, where n = 4 or 8.

The 8-skeleton of a digital set is not necessarily an 8-graph. The following
set is 8-irreducible, but P is not an 8-graph point:
The same negative assertion holds for the 4-skeleton. In configuration 3.a of Theorem 13, point P is not a 4-graph point. However, if this configuration is 8-thinned, we obtain the following configuration which consists entirely of 8-graph points:
Definition 16 We define a point of a digital set to be a 4/8-graph point if it is either a 4-graph point or an 8-graph point. A 4/8-graph is a digital set consisting entirely of 4/8-graph points.

Now we can formulate the ideas of the last part of this section in the following theorem:

Theorem 15 If a 4-irreducible well-composed set is postprocessed by 8-thinning applied to all configurations of type 3a of Theorem 13, the resulting set is a 4/8-graph.

E. Parallel Thinning on Well-Composed Sets

In sequential thinning, where only one simple point is deleted at each step, connectivity is evidently preserved. Parallel thinning, on the other hand, may not preserve connectivity. For example, if the simple points in the central column of the following set are deleted simultaneously, the set becomes disconnected. In order for parallel thinning algorithms to be topologically correct, one must avoid such situations.
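A concrete instance (our own small example; not necessarily the set pictured in the original) is a 2 × 3 block: each of the two middle-column points is 8-simple on its own, yet deleting both at once splits the set in two, and after one is deleted the other is no longer simple:

```python
# 2x3 block; the middle column is {(1, 0), (1, 1)}.
block = {(x, y) for x in range(3) for y in range(2)}

def is_8_simple(black, p):
    """Yokoi test: p is 8-simple iff the connectivity number is 1."""
    cyc = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]
    b = [(p[0] + dx, p[1] + dy) in black for dx, dy in cyc]
    return sum((not b[k]) and (b[k + 1] or b[(k + 2) % 8])
               for k in (0, 2, 4, 6)) == 1

def components(black):
    """Number of 8-connected components (flood fill)."""
    black, count = set(black), 0
    while black:
        count += 1
        stack = [black.pop()]
        while stack:
            x, y = stack.pop()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    q = (x + dx, y + dy)
                    if q in black:
                        black.remove(q)
                        stack.append(q)
    return count
```

Deleting (1, 0) and (1, 1) simultaneously leaves the two outer columns, which are not 8-adjacent to each other; this is exactly the situation Ronse's second condition (used in Theorem 17 below) rules out.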
When we investigate parallel thinning methods on well-composed sets, we are faced with an additional dilemma. Parallel elimination of points, even if it is designed so as to be topologically correct, may destroy the well-composedness of sets, as shown by the example in Figure 15. Here simultaneous elimination of two 4-simple points yields a set that is not well-composed. Our goal in this section is to construct a parallel thinning algorithm for well-composed sets that is topologically correct and preserves well-composedness. We also want the resulting skeletons to be ‘‘thin,’’ that is, we want their kernel components to be as small as possible, as is the case for sequential thinning of well-composed sets. The simplest possibility is to use a 4-phase thinning algorithm as described by Rosenfeld (1975). This algorithm removes only one type of border point at each phase: north border points are removed in the first phase, east in the second, and then south and west border points (where, for example, a north border point of S is one whose north neighbor is in the complement of S). Rosenfeld showed that this algorithm is topologically correct. If we allow only 4-simple points to be deleted, this algorithm also preserves well-composedness, and the resulting skeletons are 4-irreducible. Therefore, Theorem 12 also holds for this 4-phase thinning. However, 4-phase algorithms are not very efficient, as only 1/4 of the possible points can be deleted in one phase. It is also possible to thin well-composed sets using one-phase parallel algorithms. However, such algorithms must examine a relatively large neighborhood of every point, since it is well known (Rosenfeld, 1975) that a parallel
Figure 15. Parallel 4-thinning is not necessarily an internal operation on well-composed sets.
thinning algorithm based on a 3 × 3 neighborhood cannot preserve connectedness. We will now present a parallel thinning method for well-composed sets that minimizes the number of phases while at the same time minimizing the size of the neighborhoods used. It is a two-phase method consisting of one marking phase and one elimination phase. In the first phase the candidates for elimination are marked, and in the second phase these candidates are eliminated if they fulfill a simple condition described in what follows. In Eckhardt and Maderlechner (1993) the concept of a perfect point was introduced.

Definition 17 A point Q of a digital set is termed perfect if
· Q has a direct neighbor that is an interior point, say N_k(Q), where k ∈ {0, 1, 2, 3}.

· The direct neighbor of Q opposite the interior point N_k(Q) is white.
For example, if Q in the following configuration is black, then it is perfect.
Definition 18 The south neighborhood SN(P) of a point P consists of P, its east and west direct neighbors, and its three southern neighbors:
In a similar way the north neighborhood of P can be defined. In the first step of the parallel thinning algorithm, perfect points are marked with ‘‘c’’ as candidates for deletion. In the second step, only points marked ‘‘c’’ can be deleted. While deleting the
marked points, we must take care that in the following situations, at most one of the points marked ‘‘c’’ can be deleted:
Parallel Thinning Algorithm Let S be a set of black points to be thinned and let P be any point in S.

First Step Candidates for deletion are marked. If P is a perfect point, then P is marked c (candidate for deletion).

Second Step Deletion of marked points. If a point P is marked c and no critical configuration shown here occurs in the south neighborhood of P, then P is deleted (i.e., its color is changed from black to white).
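The two phases can be sketched as follows. Since the critical-configuration figures are not reproduced here, we implement the reading used in the proof of Theorem 16: a marked point is deleted only if neither of its two south-diagonal neighbors is marked (grid coordinates with y increasing northward, so ‘‘south’’ is y − 1). We take ‘‘interior’’ to mean all eight neighbors black. All names, and these readings, are ours:

```python
DIRECT = [(1, 0), (0, 1), (-1, 0), (0, -1)]           # E, N, W, S
CYCLE = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def interior(black, p):
    return all((p[0] + dx, p[1] + dy) in black
               for dx in (-1, 0, 1) for dy in (-1, 0, 1))

def perfect(black, p):
    # Definition 17: some direct neighbor is interior and the opposite
    # direct neighbor is white.
    return any(interior(black, (p[0] + dx, p[1] + dy))
               and (p[0] - dx, p[1] - dy) not in black
               for dx, dy in DIRECT)

def yokoi4(black, p):
    """p is 4-simple iff this Yokoi connectivity number equals 1."""
    b = [(p[0] + dx, p[1] + dy) in black for dx, dy in CYCLE]
    return sum(b[k] and not (b[k + 1] and b[(k + 2) % 8])
               for k in (0, 2, 4, 6))

def parallel_pass(black):
    marked = {p for p in black if perfect(black, p)}          # first step
    return black - {(x, y) for (x, y) in marked               # second step
                    if (x - 1, y - 1) not in marked
                    and (x + 1, y - 1) not in marked}
```

On a 6 × 4 rectangle one pass marks the twelve perfect border points and deletes eight of them; every candidate is 4-simple (Proposition 8) and the result is still well-composed (Theorem 16).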
In the first step exactly the perfect points are marked as candidates, which are automatically simple points in a well-composed set, as the following proposition states. In fact every parallel thinning algorithm can be modified using the above two-step technique so that it maps a well-composed set to a well-composed set. In the first step the points that would have been deleted by the original algorithm are marked. The second step is identical with the above one. Proposition 8 Every candidate point in a well-composed set is a simple point. Proof Because the point is a candidate point, we have the following situation (possibly rotated by a multiple of 90°):
Whatever color the points marked ‘‘·’’ have, the candidate point ‘‘c’’ is obviously simple, as the set is well-composed. ■

As every candidate is a perfect point, we are also guaranteed that spikes will be left unchanged. Note also that endpoints are protected from deletion, because points that are deleted are simple and perfect, and a perfect point has a direct neighbor that is an interior point. The condition in the second step of this algorithm prevents deletion of a candidate if there is a critical configuration involving another candidate to the south. However, it cannot be that the deletion of every candidate is prevented by this condition, as there always is a candidate having no critical configuration to the south. Therefore, the number of interior points decreases after every application of this algorithm. Hence, after we finish applying the algorithm, there are no 4-simple perfect points in the resulting set. Now we prove that the algorithm preserves both well-composedness and connectivity.

Theorem 16 The two-step parallel thinning method described in the preceding is an internal operation on well-composed sets.

Proof Well-composedness is destroyed only when a critical configuration occurs and both candidates (marked with ‘‘c’’) are deleted. This is prevented by deleting only candidates having no diagonal neighbors to the south that are also candidates. ■

Theorem 17 The two-step parallel thinning method is topologically correct on well-composed sets.

Proof The proof is based on Ronse's conditions (1988), of which a simplified version for well-composed sets is the following: A parallel thinning algorithm on a well-composed set S preserves connectedness of the set and its complement if:

1. only simple points are deleted; and
2. for any two 8-adjacent points P and Q of S, P is simple after Q has been deleted, and Q is simple after P has been deleted.

By Proposition 8, only simple points can be candidates, and therefore only simple points can be deleted.
So it remains to show the second condition. Assume that P and Q are both candidates and direct neighbors. The only possibility for such a situation is indicated in the configuration here (up to
rotations by multiples of 90°).
If, for example, N(Q) is an interior point and N(Q) is white, then N(P) must be an interior point and N(P) must be white. It is now easy to see that P will be simple in S \ {Q}, and that the same holds with P and Q interchanged. Let P and Q be diagonal neighbors. Note that both points have to be simple and perfect in S in order to be deleted. In this case deletion of a diagonal neighbor cannot disconnect or delete a black 4-component in the 8-neighborhood of either P or Q. Therefore, P will be simple in S \ {Q}, and the same holds with P and Q interchanged. ■

Theorem 18 The 4-kernel of a well-composed set that contains no 4-simple perfect points is either empty or consists of 8-isolated components having one of the following forms:
Proof The proof follows from Propositions 9 and 10 in the sequel. In a horizontal or vertical line there can be at most two adjacent kernel-boundary points. The configuration in which the kernel contains two successive kernel-boundary points in a diagonal line such that one of their two common direct neighbors is not an interior point is impossible. Thus if the kernel contains two successive kernel-boundary points in a diagonal line, then their two common direct neighbors are also interior points. In this case the four kernel points form a square. ■

Definition 19 A point in the kernel of a set is termed a kernel-boundary point if it has at least one direct neighbor that is not in the kernel.
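Definitions 11 and 19 are again purely local and straightforward to compute; a sketch (names ours):

```python
DIRECT = [(1, 0), (-1, 0), (0, 1), (0, -1)]
ALL8 = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]

def kernel(black, nbrs=DIRECT):
    """n-interior points of Definition 11: all n-neighbors are black
    (pass nbrs=ALL8 for the 8-kernel)."""
    return {(x, y) for (x, y) in black
            if all((x + dx, y + dy) in black for dx, dy in nbrs)}

def kernel_boundary(black, nbrs=DIRECT):
    """Kernel points with a direct neighbor outside the kernel."""
    k = kernel(black, nbrs)
    return {p for p in k
            if any((p[0] + dx, p[1] + dy) not in k for dx, dy in DIRECT)}
```

For a solid 6 × 6 block the kernel is the inner 4 × 4 square, and its kernel-boundary points are the twelve points on the rim of that square.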
Proposition 9 If the 4-kernel of S contains three successive kernel-boundary points in a horizontal or vertical line then S contains a point that is (4- and 8-) simple and perfect. Proof We have the following situation:
At least one of the points marked ‘‘·’’ is necessarily white, because otherwise the middle point would not be a kernel-boundary point. ■ Proposition 10 If the (4-)kernel of S contains two successive kernel-boundary points in a diagonal line, and one of their two common direct neighbors is not an interior point, then either S is not well-composed or it contains a point that is (4- and 8-) simple and perfect. Proof We have the following situation:
At least one of the points marked ‘‘·’’ must be white. Without loss of generality assume that the lower ‘‘·’’, call it Q, is white. If N(Q) were black, then N(Q) would be simple; hence both Q and N(Q) are white. If N(Q) were black, the set would not be well-composed; if it were white, N(Q) would be simple. ■

If the parallel thinning method described in the preceding is applied repeatedly to a well-composed set, the final remaining set does not contain
any points that are simple and perfect. This set might contain kernel components as in Theorem 18. We will now show that the application of a sequential 4-thinning process to this remaining set results in a 4-irreducible well-composed set which satisfies Theorem 12 and which automatically has all the properties described in Sections IV.C and D. In this sequential thinning process each point in the set is examined once, and if it is simple, it is eliminated.

Theorem 19 Assume that a well-composed set S contains no points that are simple and perfect. If the sequential 4-thinning process is applied to S, examining each point of S just once, the resulting set is 4-irreducible.

Proof There can be points of S that are simple but not perfect. We will show that after the sequential 4-thinning process is applied to S, no 4-simple points remain in the resulting set (which is also well-composed). It is clear that after the sequential 4-thinning process is applied to S, all simple points in S either have been deleted or are no longer simple. We now have to show that no point in the resulting set can become simple. More precisely, we have to show that the following situation is impossible: There is a point P ∈ S that is not 4-simple in S, and there is a set SP ⊆ N*(P) of 4-simple points in S such that P is 4-simple in S \ SP. We distinguish two cases:

1. The 4-connection number C₄(P) is equal to 1 with respect to S. In this case, P has at most one white indirect neighbor and as P is not simple in S, it must be an interior point. Therefore we have (up to 90° rotations) the following situation:
All points N(N(P)), i = 0, 1, 2, 3, must be black, because otherwise N(P) would be simple and perfect. Thus, N(P), i = 2, 3, are interior points. By the same argument N(N(P)), N(N(P)), N(N(P)), and N(N(P)) are black. Thus we have a kernel component containing five points (P and N(P), i = 3, 4, 5, 6), which contradicts Theorem 18.
2. C₄(P) > 1 with respect to S. The point P should be 4-simple with respect to the set S \ SP. Therefore, it has in this set one of the configurations 7, 14, 15, 31, 62, 63, or 191 in Figure 12. Configurations 63 and 191, however, are not possible because C₄(P) > 1 with respect to S. By the same argument configuration 62 is not possible because the set S is well-composed. To obtain configuration 31, SP consists of only one point, the 6-neighbor of P. As this latter point must be simple, it would have to be perfect, which is a contradiction. There remain only configurations 7, 14, and 15. We start with configuration 15.
As C₄(P) > 1, and N(P) and possibly N(P) are black in S, the points N(P) and N(P) are necessarily white. If N(P) is black, then N(P) is not 4-simple. As a consequence, N(P) must be eliminated as the last point in SP. For N(P) to be eliminated, it must be simple in S \ {N(P)}. But then N(P) is an endpoint (configuration 3). A similar argument applies to configuration 14 with point N(P) as the last point in SP to be eliminated. In case of configuration 7, we can apply the same argument either to N(P) or to N(P) as the last point in SP. ■

F. Making a Binary Image Well-Composed

It may happen that a segmented image is not well-composed. In this section we present two different approaches to ‘‘repair’’ a binary image so that the resulting image is well-composed.
In the first approach we locally repair it by adding (or deleting) a minimal number of black points. If a picture is not well-composed, then there is an 8-component of it which is not a 4-component. Consequently, repairing this situation changes the topology of the set. Our goal will be to repair a picture while keeping the unavoidable changes minimal. In the second approach we obtain a well-composed image without changing its topology. We extend the input binary image to double resolution by inserting black, white, and boundary points. In the obtained image all three sets will be well-composed, that is, the sets of black, white, and boundary points will be well-composed. The other possibility, which may be more promising for applications, is to impose local conditions on the segmentation process that guarantee that the obtained segmented image is well-composed, or to repair a gray-level image as described in Section IV.G. The following theorem describes the first approach.

Theorem 20 The following parallel algorithm makes a well-composed picture from a given picture by adding a minimal number of points.

Repairing Algorithm In the following pictures the possible local configurations are presented (up to rotations and reflections). The left-hand configuration gives the original situation; the right-hand configuration gives the repaired situation obtained by changing white points to black.
1.
2.
At least one of the points marked ‘‘*’’ must be black.
WELL-COMPOSED SETS
137
3.
4. (a)
It may happen that a new critical configuration occurs here. This is the case if and only if the following situation (in the larger neighborhood) is given:

(b)
Proof If in cases 3 and 4.a we add only the black point marked ‘‘1’’, then a new critical configuration occurs. Therefore we have to add the black point marked ‘‘2’’. If in case 4.b we add only the black point marked ‘‘1’’, then a new critical configuration occurs. Therefore we have to add the black point marked ‘‘2’’. But adding the point marked ‘‘2’’ again causes a critical situation, which disappears when the point marked ‘‘3’’ is added. In the remaining cases no further critical situations occur. ■

The repairing algorithm described by Theorem 20 is invariant up to 90° rotations and reflections. If this invariance does not matter, we can repair a given picture to obtain a well-composed picture by adding fewer points. This can be achieved by considering only south-neighborhoods in the configurations already given and their reflections at a vertical axis.

We can also repair any set to obtain a well-composed set by deleting black points. An algorithm for this purpose can be formulated if black and white points are interchanged. This is due to the duality of well-composedness (i.e., a set S is well-composed if and only if its complement Sᶜ is well-composed).

The second approach to making a binary image well-composed is based on the image expansion described in Köthe (2000). We expand every 2 × 2 square in the original image to a 3 × 3 square; see Figure 16. We need to determine the values a, b, c, d, m as functions of the original binary values A, B, C, D. As the configurations are considered modulo 90° rotations and reflection, it is sufficient to determine the value of a as a function of A and B in Figure 16:

Figure 16. Expanding a binary image.

1. If pixels A and B are both black or both white, then the new pixel a is assigned their color.
2. If A and B have different colors, then a = +, which means that a is a boundary pixel.

It remains to assign a value to the middle pixel in the extended image, which is pixel m in Figure 16b:

3. If a = + or b = + or c = + or d = +, then m = +.
4. Otherwise the pixels A, B, C, D, a, b, c, d are all either white or black, and m is assigned their color.

An example is given in Figure 17, where (b) → (c) is done by rules 1 and 2 and (c) → (d) is done by rules 3 and 4.

Proposition 11 The preceding procedure given by rules 1–4 always yields a well-composed image when applied to a binary image, that is, the sets of black, white, and boundary points are all well-composed sets.

Proof The proof of this proposition is very simple. We consider a 2 × 2 square in the resulting image, for example, square a, B, b, m in Figure 16b. If a = + and b = +, then m = + by rule 3, and consequently, the set of boundary points is well-composed. If a is black, then B must also be black (by rules 1 and 2), and consequently the set of black points is well-composed. The same argument applies to the set of white points. ■
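Rules 1–4 for expanding a binary image to a well-composed one can be sketched as follows. This is an illustrative implementation, not the author's code; black = 1, white = 0, and the boundary label is encoded as 2:

```python
import numpy as np

BLACK, WHITE, BOUND = 1, 0, 2   # BOUND encodes the boundary label

def expand_binary(img: np.ndarray) -> np.ndarray:
    """Expand a binary image (1 = black, 0 = white) to double resolution
    so that black, white, and boundary points each form a well-composed set."""
    h, w = img.shape
    out = np.empty((2 * h - 1, 2 * w - 1), dtype=int)
    out[::2, ::2] = img                     # original pixels keep their color
    # rules 1 and 2: a pixel inserted between two originals copies their
    # common color, and becomes a boundary pixel if they differ
    out[::2, 1::2] = np.where(img[:, :-1] == img[:, 1:], img[:, :-1], BOUND)
    out[1::2, ::2] = np.where(img[:-1, :] == img[1:, :], img[:-1, :], BOUND)
    # rules 3 and 4: the center of each expanded 2x2 square is a boundary
    # pixel iff any of the four surrounding edge pixels is; otherwise all
    # eight surrounding pixels share one color, which the center copies
    a, b = out[::2, 1::2][:-1], out[::2, 1::2][1:]        # top, bottom edges
    c, d = out[1::2, ::2][:, :-1], out[1::2, ::2][:, 1:]  # left, right edges
    any_bound = (a == BOUND) | (b == BOUND) | (c == BOUND) | (d == BOUND)
    out[1::2, 1::2] = np.where(any_bound, BOUND, a)
    return out
```

For the 2 × 2 input with one white and three black pixels, the expanded 3 × 3 image keeps the corners and threads a boundary between the white corner and its black neighbors.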
Figure 17. Expanding a binary image to a well-composed image.
G. Making a Gray-Level Image Well-Composed

Following Köthe (2000), we present a simple procedure to make a gray-level image well-composed and at the same time improve its quality. It is based on image expansion by bilinear interpolation.

Definition 20 A gray-level image is well-composed if for every threshold the binarization of the gray values results in a binary well-composed image.

The property of well-composedness can be locally tested by the following condition:

Proposition 12 A gray-level image is well-composed iff for every 2 × 2 square ABCD (see Figure 18a) the diagonal intervals have a nonempty intersection:

[min(A, D), max(A, D)] ∩ [min(B, C), max(B, C)] ≠ ∅,

where A, B, C, D are gray values.
■
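The local condition of Proposition 12 translates directly into a vectorized test; a sketch under the assumption that the image is a NumPy array of gray values (function name mine):

```python
import numpy as np

def is_gray_well_composed(img: np.ndarray) -> bool:
    """Proposition 12 test: every 2x2 square of gray values must have
    overlapping diagonal intervals [min(A,D), max(A,D)] and
    [min(B,C), max(B,C)]."""
    A, B = img[:-1, :-1], img[:-1, 1:]
    C, D = img[1:, :-1], img[1:, 1:]
    lo1, hi1 = np.minimum(A, D), np.maximum(A, D)
    lo2, hi2 = np.minimum(B, C), np.maximum(B, C)
    overlap = (lo1 <= hi2) & (lo2 <= hi1)   # nonempty interval intersection
    return bool(overlap.all())
```

A constant image trivially passes; a gray-level checkerboard with well-separated values fails, because its diagonal intervals are disjoint.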
Expanding a Gray-Level Image to a Well-Composed One

We expand every 2 × 2 square to a 3 × 3 square by a slightly modified bilinear interpolation. Let A, B, C, D be gray values of a 2 × 2 square in the input image arranged as shown in Figure 18a. We assign the following values to a, b, c, d, m in Figure 18b:

1. a = (A + B)/2, b = (B + D)/2, c = (A + C)/2, d = (C + D)/2

2. m = (A + B + C + D)/4 if [min(A, D), max(A, D)] ∩ [min(B, C), max(B, C)] ≠ ∅

3. m = (max(A, D) + min(B, C))/2 if [min(A, D), max(A, D)] ∩ [min(B, C), max(B, C)] = ∅ and max(A, D) < min(B, C)
Figure 18. Expanding a gray-level image.
Figure 19. Expanding a gray-level image to a well-composed image.
4. m = (max(B, C) + min(A, D))/2 if [min(A, D), max(A, D)] ∩ [min(B, C), max(B, C)] = ∅ and max(B, C) < min(A, D)
An example is given in Figure 19. Using the condition in Proposition 12, we prove that these expanding equations always yield a well-composed gray-level image.

Theorem 21 By the slightly modified bilinear interpolation given by rules 1–4, every gray-level image can be expanded to a well-composed image.

Proof For the proof we assume that A ≤ B ≤ C. If B ≤ D, then

[min(A, D), max(A, D)] ∩ [min(B, C), max(B, C)] = [A, D] ∩ [B, C] ≠ ∅,

which means that the original square ABCD is well-composed. In this case, the value of m is assigned by rule 2, and it is easy to check that the obtained 3 × 3 square is well-composed.

Now we consider the case when D < B. We additionally assume that A ≤ D; the case where D < A is very similar. Thus, we have A ≤ D < B ≤ C. We obtain

[min(A, D), max(A, D)] ∩ [min(B, C), max(B, C)] = [A, D] ∩ [B, C] = ∅,

which means that the original square is not well-composed. In this case we have by rule 3

m = (max(A, D) + min(B, C))/2 = (D + B)/2.

Again it is easy to check that the obtained 3 × 3 square is well-composed. ■

The gray-level image obtained by the slightly modified bilinear interpolation is not only well-composed, but its quality is significantly improved. Köthe (2000) verified by numerous experiments that thresholding an expanded image leads to a significantly better binarization than thresholding the original image. The same applies to edge-detecting filters.
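The slightly modified bilinear interpolation of rules 1–4 can be sketched as follows (an illustrative implementation, not the author's code; pixel layout as in Figure 18, with A top-left, B top-right, C bottom-left, D bottom-right):

```python
import numpy as np

def expand_gray(img: np.ndarray) -> np.ndarray:
    """Expand a gray-level image to double resolution by the modified
    bilinear interpolation of rules 1-4, yielding a well-composed image."""
    img = img.astype(float)
    h, w = img.shape
    out = np.empty((2 * h - 1, 2 * w - 1))
    out[::2, ::2] = img                               # original gray values
    out[::2, 1::2] = (img[:, :-1] + img[:, 1:]) / 2   # rule 1: edge pixels
    out[1::2, ::2] = (img[:-1, :] + img[1:, :]) / 2
    A, B = img[:-1, :-1], img[:-1, 1:]
    C, D = img[1:, :-1], img[1:, 1:]
    lo1, hi1 = np.minimum(A, D), np.maximum(A, D)     # diagonal interval A-D
    lo2, hi2 = np.minimum(B, C), np.maximum(B, C)     # diagonal interval B-C
    m = (A + B + C + D) / 4                           # rule 2: plain bilinear
    m = np.where(hi1 < lo2, (hi1 + lo2) / 2, m)       # rule 3: A-D below B-C
    m = np.where(hi2 < lo1, (hi2 + lo1) / 2, m)       # rule 4: B-C below A-D
    out[1::2, 1::2] = m
    return out
```

For the square A = 0, B = C = 10, D = 2 the diagonal intervals [0, 2] and [10, 10] are disjoint, so rule 3 assigns m = (2 + 10)/2 = 6 instead of the bilinear value 5.5.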
V. Digitization and Well-Composed Images

This section is based on Latecki et al. (1998). We model a digitization process by a direct relation between a continuous object and a segmented object in the digital image (see Figure 20, lower part). This approach allows us to avoid tedious details of signal analysis (see Figure 20, upper part). Due to this object-based approach, we can directly compare the features of a continuous object and its digital image. This kind of interpretation of digitization processes for relating topological properties is used in Pavlidis (1982a), Serra (1982), Gross and Latecki (1995), and Latecki et al. (1998).

In this section we show that an output digital image must be well-composed if the resolution of the digitization process is fine enough to guarantee topology preservation. Because in every mathematical model of a digitization process we first need to model real ‘‘continuous’’ objects, we begin with a definition of continuous planar sets, called parallel regular sets, that are reasonable models of real objects. The properties of parallel regular sets will allow us to determine the resolution of the digitization process.

A. Continuous Representation of Real Objects

For the digitization approach, it is necessary to explicitly characterize continuous representations of real objects, as these representations will be mapped to discrete representations by functions modeling real digitization processes. Thus, continuous representations of real objects are the starting point for this approach. Therefore, in this section we describe the classes of continuous representations of real objects that will be used as input to digitization functions. Any continuous model of some class of real objects should on the one hand be able to reflect relevant shape properties as exactly as possible, and on the other hand be mathematically tractable, in the sense that it should allow for a precise, formal description of the relevant properties.
For example, it does not make much sense to model the boundaries of 2D projections of real objects as arbitrary curves in R². This class is too general to allow us to formally describe any shape properties of sets in this class, and it contains curves with very unnatural properties, for example, plane-filling curves, that definitely do not model boundaries of planar objects. Therefore, some restrictions must be added.

The class of parallel regular sets, which we define in what follows, is defined using osculating balls or, equivalently, using normal vectors at boundary points. However, we will not use the classical definition of these tools of differential geometry. Differential geometry is based on the concept
Figure 20. A comparison of signal-based and object-based approaches to relating spatial objects and their digital images.
of derivative, which requires the calculation of limits of infinite sequences of numbers. As this calculation cannot be transferred into discrete spaces, no analog of the concept of derivative with similar properties exists in discrete spaces. Instead we will define osculating balls and normal vectors in such a way that their definitions can be directly translated to digital spaces.

Let A be a planar set. We denote by Aᶜ the complement of A, by bdA the topological boundary of A, by intA the topological interior of A, and by clA the topological closure of A in the usual topology of the plane induced by the Euclidean metric. The connected components of the boundary bdA are called contours. We denote by d(x, y) the Euclidean distance of points x, y and by B(c, r) a closed ball of radius r centered at a point c.

Definition 21 We will say that a closed ball B(c, r) is tangent to bdA at point x ∈ bdA if bdA ∩ bd(B(c, r)) = {x} (see Figure 21).

We will say that a closed ball iob(x, r) of radius r is an inside osculating ball of radius r to bdA at point x ∈ bdA if bdA ∩ bd(iob(x, r)) = {x} and iob(x, r) ⊆ intA ∪ {x} (see Figure 21).

We will say that a closed ball oob(x, r) of radius r is an outside osculating ball of radius r to bdA at point x ∈ bdA if bdA ∩ bd(oob(x, r)) = {x} and oob(x, r) ⊆ Aᶜ ∪ {x} (see Figure 21).

Note that x is a boundary point, not the center, of both iob(x, r) and oob(x, r). According to this definition, for every boundary point of a given ball B(c, s) of radius s, there exist inside osculating balls of radii r, where 0 < r < s. However, B(c, s) itself is not an inside osculating ball for any of its boundary points.

Now we define parallel regular subsets of the plane:

Definition 22 We assume that A is a closed subset of the plane such that its boundary bdA is compact.
Figure 21. The inside and outside osculating balls of radius r to the boundary of the set A at point x.
Figure 22. The set A is par(r)-regular while the set B is not par(r)-regular, where r is the radius of the depicted circles.
A set A will be called par(r, +)-regular if there exists an outside osculating ball oob(x, r) of radius r at every point x ∈ bdA.

A set A will be called par(r, −)-regular if there exists an inside osculating ball iob(x, r) of radius r at every point x ∈ bdA.

A set A will be called par(r)-regular (or r parallel regular) if it is par(r, +)-regular and par(r, −)-regular. A set A will be called parallel regular if there exists a constant r such that A is par(r)-regular.

In Figure 22 the set A is par(r)-regular while the set B is not par(r)-regular, where r is the radius of the depicted circles. Note that a parallel regular set, as well as its boundary, does not have to be connected. We have the following equivalence:

Theorem 22 A set A is par(r)-regular iff, for every two distinct points x, y ∈ bdA, the outer normal vectors n(x, r) and n(y, r) exist and do not intersect, and the inner normal vectors −n(x, r) and −n(y, r) exist and do not intersect. ■

The proof of Theorem 22 as well as a definition of normal vectors that is not based on the concept of derivative are given in Latecki et al. (1998). In Figure 23, for example, set X is not par(r)-regular while set Y is par(r)-regular, where r is the length of the depicted vectors.

Proposition 13 Let A be a par(r)-regular set. If x and y belong to two different components of bdA, then d(x, y) > 2r.

Proof Let C₁, . . . , Cₙ be all connected components of bdA (there is only a finite number of them, since bdA is compact), where n ≥ 2. For every i ≠ j, i, j ∈ {1, . . . , n}, let dᵢⱼ : Cᵢ × Cⱼ → R be the Euclidean distance d restricted to Cᵢ × Cⱼ. Since dᵢⱼ is a continuous function on a compact set, there exists
Figure 23. X is not par(r)-regular, but Y is par(r)-regular.
(cᵢ, cⱼ) ∈ Cᵢ × Cⱼ such that dᵢⱼ(cᵢ, cⱼ) > 0 is the minimal value of dᵢⱼ. Let a pair (cₖ, cₘ), k ≠ m, be such that dₖₘ(cₖ, cₘ) ≤ dᵢⱼ(cᵢ, cⱼ) for all i, j ∈ {1, . . . , n} with i ≠ j. We obtain that d(x, y) ≥ d(cₖ, cₘ) = dₖₘ(cₖ, cₘ) for every x and y belonging to two different components of bdA.

We now show that d(cₖ, cₘ) > 2r. Assume that d(cₖ, cₘ) ≤ 2r. Consider the closed ball B such that cₖ, cₘ ∈ bdB and the line segment cₖcₘ is the diagonal of B (see Figure 24). Clearly, the radius of B is not greater than r and B ∩ bdA = {cₖ, cₘ}. Therefore, either B ⊆ A or intB ⊆ Aᶜ. We assume intB ⊆ Aᶜ; the proof in the second case is analogous. Every closed ball OB such that OB is a proper subset of B and OB ∩ bdB = {cₖ} is an outside osculating ball of A at cₖ. Since the radius of B is not greater than r and the center of B is collinear with all centers of balls OB, we obtain that B is an outside osculating ball of A at cₖ. Yet, this contradicts the fact that B ∩ bdA = {cₖ, cₘ}. Therefore, d(cₖ, cₘ) > 2r, and consequently d(x, y) > 2r for every x and y belonging to two different components of bdA. ■
B. Digitization and Segmentation

Definition 23 Let Q be a cover of the plane with closed squares with diagonal of length r such that if two squares intersect, then their intersection is either their common side or a corner point. Such a cover is called a square grid with diameter r.

A digital image can be described as a set of points located at the centers of the squares of a grid Q, each assigned some value in a gray-level or color scale. By a digitization process we understand a function mapping a planar set X to a digital image. By a segmentation process we understand a process grouping digital points into a set representing a digital object. Therefore, the output of a segmentation process can be interpreted as a binary digital image, where each point is either black or
Figure 24.
white. We assume that digital objects are represented as sets of black points. Thus, the input of a digitization and segmentation process is a planar set X and the output is a binary digital image, which will be called a digitization of X with diameter r and denoted Dig(X, r).

We will interpret a black point p ∈ Dig(X, r) as a closed (black) square of cover Q centered at p and the digitization Dig(X, r) as the union of closed squares centered at black points; that is, Dig(X, r) will denote a closed subset of the plane. We will treat digitization and segmentation processes satisfying the following conditions relating a planar par(r)-regular set X to its digital image Dig(X, r):

ds1 If a square q ∈ Q is contained in X, then q ∈ Dig(X, r) (i.e., q is black).
ds2 If a square q ∈ Q is disjoint from X, then q ∉ Dig(X, r) (i.e., q is white).
ds3 If a square q is black and area(X ∩ q) ≤ area(X ∩ p) for some square p ∈ Q, then square p is black.

These conditions seem to form an acceptable model of the digitization and segmentation process in image document processing, where an image is captured by a scanner and segmented by thresholding, if we exclude digitization and segmentation errors:

(a) The sensor values are monotone with respect to the area of the object ‘‘seen’’ by the sensors; that is, if the area of the object seen by sensor s₁ is greater than the area of the object seen by sensor s₂, then the gray-level value of s₁ is greater than the gray-level value of s₂.
(b) The influence of blurring is so small that it can be neglected, due to the fact that the distance of objects from the sensor is known and the scanner is calibrated accordingly.
(c) The gray-level images obtained by the digitization process are segmented by thresholding, which is the standard segmentation technique in document analysis.
Figure 25. (a) The union of all squares represents the intersection digitization of the ellipse. (b) The two squares represent the square subset digitization of the ellipse. (c) The eight squares represent a digitization of the ellipse with the area ratio equal to 1/5.
In the following, we define some important digitization and segmentation processes satisfying conditions ds1, ds2, and ds3.

Definition 24 Let X be any set in the plane. A square p ∈ Q is black (belongs to a digital object) iff p ∩ X ≠ ∅, and white otherwise. We will call such a digital image an intersection digitization with diameter r of set X, and denote it Dig∩(X, r), namely

Dig∩(X, r) = ∪{p ∈ Q : p ∩ X ≠ ∅}.

See, for example, Figure 25a, where the union of all depicted squares represents the intersection digitization of an ellipse. With respect to real camera digitization and segmentation, the intersection digitization corresponds to the procedure of coloring a pixel black iff there is part of the object X in the field ‘‘seen’’ by the corresponding sensor. When digital straight line models were first studied by Freeman and Rosenfeld, the digitization models that were assumed were based on the intersection digitization, which is called square-box quantization by Freeman (1970).

Now we consider digitizations corresponding to the procedure of coloring a pixel black iff the object X fills the whole field ‘‘seen’’ by the corresponding sensor. For such digitizations, a square p is black iff p ⊆ X and white otherwise. We will refer to such a digital image of a set X as a square subset digitization and denote it by Dig⊆(X, r), where

Dig⊆(X, r) = ∪{p ∈ Q : p ⊆ X}.

In Figure 25b the two squares represent Dig⊆(X, r), where X is the ellipse.

Next, let us consider a digitization and segmentation process in which a pixel is colored black iff the ratio of the area of the continuous object in a sensor square to the area of the square is greater than some constant threshold value v. An example is given in Figure 25c, where the squares represent a digitization of the ellipse with the area ratio equal to 1/5. A square p ∈ Q is black iff area(p ∩ X)/area(p) > v and white otherwise, where 0 ≤ v < 1 is a constant. We will refer to such a digital image of a set X as a v-digitization of X with diameter r. This process models a segmentation by applying a threshold value to a gray-level digital image for all real devices
in which the sensor values can be assumed to be monotonic with respect to the area of the object in the sensor square. We will denote such digitizations by Dig_v(X, r). We recall that we identify the digitization of X with the union of black closed squares; thus Dig_v(X, r) denotes the digital picture which is the union of black closed squares. We will also denote by Dig₁(X, r) the digitization in which the area ratio is equal to 1. We have the following inclusions:

Dig⊆(X, r) ⊆ Dig_v(X, r) ⊆ Dig∩(X, r) for every v ∈ [0, 1], and Dig_v(X, r) ⊆ Dig_w(X, r) if w ≤ v, for every v, w ∈ [0, 1].

We will hereafter use Dig(X, r) without a subscript to denote any digitization and segmentation process satisfying ds1, ds2, and ds3. Thus, in particular, Dig(X, r) denotes Dig∩(X, r), Dig⊆(X, r), and Dig_v(X, r) for every v ∈ [0, 1].

C. Digitizations Produce Well-Composed Images

In Latecki et al. (1998), we proved the following result:

Theorem 23 Let A be a par(r)-regular bordered 2D manifold. Then A and Dig(A, r) are homeomorphic for every digitization Dig(A, r) (which satisfies conditions ds1, ds2, and ds3). ■

An important step in proving Theorem 23 is the following theorem, whose proof we restate here:

Theorem 24 If A is par(r)-regular, then Dig(A, r) is well-composed, that is, the pattern shown in Figure 26 and its 90° rotation cannot occur in any Dig(A, r).

Proof Let c be the common vertex of all four closed squares. We first assume that c ∉ A and show that the pattern shown in Figure 26 and its 90° rotation cannot occur in the configuration of the four squares. Let S₁ and S₂ be black, and S₃ and S₄ be white, as shown in Figure 27a, where S₁, S₂, S₃, and S₄ are closed squares. We prove that this assumption leads to a contradiction.

As A is closed and c ∉ A, there is an e > 0 such that B(c, e) ∩ A = ∅, where B(c, e) denotes (as always) a closed ball. There must be points of A in both squares S₁ and S₂, because if S₁ ∩ A = ∅, then S₁ would be white by ds2. Therefore, S₁ ∩ A ≠ ∅, and similarly S₂ ∩ A ≠ ∅.
Figure 26. This pattern and its 90° rotation cannot occur in any Dig(A, r).
Let p₁ be a point with the shortest distance t to c in S₁ ∩ A. Let p₂ be a point with the shortest distance d to c in S₂ ∩ A. Clearly, points p₁, p₂ belong to bdA and t > 0, d > 0, because c ∉ A, c ∈ S₁, and c ∈ S₂. Without loss of generality, we assume that t ≤ d. Consider the closed ball B(c, t). We show that p₁ and p₂ belong to two different components of B(c, t) ∩ bdA. Assume that this is not the case. Then, for some component C of bdA, it follows that C = arc₁(p₁, p₂) ∪ arc₂(p₁, p₂), arc₁(p₁, p₂) ∩ arc₂(p₁, p₂) = {p₁, p₂}, and arc₁(p₁, p₂) ⊆ B(c, t) or arc₂(p₁, p₂) ⊆ B(c, t). Assume arc₁(p₁, p₂) ⊆ B(c, t) and, without loss of generality, assume that arc₁(p₁, p₂) goes through S₃. Then arc₁(p₁, p₂) ∩ face(S₁, S₃) ∩ (B(c, t) \ B(c, e)) ≠ ∅ and arc₁(p₁, p₂) ∩ face(S₂, S₃) ∩ (B(c, t) \ B(c, e)) ≠ ∅, where face denotes the common face of two squares (see Figure 27b). In this case, arc₁(p₁, p₂) ∩ S₃ ⊆ (B(c, t) \ B(c, e)) ∩ S₃.
Figure 27. The small circle illustrates ball B(c, e) and the big circle illustrates ball B(c, t).
As the diameter of square S₃ is r, no component of bdA other than C intersects S₃, by Proposition 13. Therefore, A ∩ S₃ contains (S₃ \ B(c, t)) together with the part of A between arc₁(p₁, p₂) and bdB(c, t) in S₃. We also have that A ∩ S₁ ⊆ (S₁ \ intB(c, t)), since no point in S₁ ∩ A is closer to c than distance t (by the definition of the constant t). Consequently, area(A ∩ S₁) ≤ area(A ∩ S₃). Thus, square S₃ should be black. This contradiction implies that arcᵢ(p₁, p₂) ⊄ B(c, t) for i = 1, 2. Therefore, bdA ∩ B(c, t) has at least two components, one containing p₁ and the second containing p₂. In each of these components there is a point with the shortest distance (≤ t) to c; call them x₁ and x₂. Then c ∈ n(x₁, r) ∩ n(x₂, r), a contradiction.

We have thus shown for a par(r)-regular set A that

(*) if c ∉ A, then the pattern shown in Figure 26 and its 90° rotation cannot occur in the four squares of Dig(A, r) which have c as their common vertex.

The case in which c ∈ A \ bdA follows directly from the result already given, applied to the digitization of the complement Aᶜ of A (i.e., the roles of A and Aᶜ are interchanged). For completeness, the proof is given in what follows. Let S₁ and S₂ be black, and S₃ and S₄ be white, as shown in Figure 27a, where S₁, S₂, S₃, and S₄ are closed squares. Without loss of generality, we assume that area(S₁ ∩ A) ≤ area(S₂ ∩ A) and area(S₃ ∩ A) ≥ area(S₄ ∩ A). Then, by ds3, area(S₃ ∩ A) < area(S₁ ∩ A). We will digitize the set B that is the closure of the complement of A, that is, B = cl(Aᶜ). As A is par(r)-regular, B is also par(r)-regular. Clearly, area(Sᵢ ∩ B) = area(Sᵢ) − area(Sᵢ ∩ A) for i = 1, 2, 3, 4. As area(Sᵢ ∩ B) = −area(Sᵢ ∩ A) + area(Sᵢ), where area(Sᵢ) is a constant value, we obtain that area(S₃ ∩ B) > area(S₁ ∩ B) as well as area(S₁ ∩ B) ≥ area(S₂ ∩ B) and area(S₄ ∩ B) ≥ area(S₃ ∩ B). Thus, in Dig(B, r) we have the pattern: S₁ and S₂ are white, and S₃ and S₄ are black.
With c ∉ B, we obtain by the preceding result (*) applied to B that this pattern cannot occur in squares S₁, S₂, S₃, and S₄, which belong to Dig(B, r). The obtained contradiction proves that if c ∈ A \ bdA, then the pattern shown in Figure 26 and its 90° rotation cannot occur in the four squares of Dig(A, r) which have c as their common vertex.
Figure 28. |area(S ∩ A) − area((S + v) ∩ A)| ≤ area(S △ (S + v)).
It remains to consider the case in which c ∈ bdA. Let again S₁ and S₂ be black, and S₃ and S₄ be white in Dig(A, r), as shown in Figure 27a, where S₁, S₂, S₃, and S₄ are closed squares. This implies that

ε = min{area(S₁ ∩ A), area(S₂ ∩ A)} − max{area(S₃ ∩ A), area(S₄ ∩ A)} > 0.   (1)

We denote by X + v the translation of a set X by the vector v and by A △ B = (A − B) ∪ (B − A) the symmetric difference of two sets. It is easy to observe that (see Figure 28)

|area(S ∩ A) − area((S + v) ∩ A)| ≤ area(S △ (S + v))   (2)
for every square S ∈ Q and every vector v, where |r| denotes the absolute value of r. With c ∈ bdA, there are points of the complement Aᶜ in every neighborhood of c. Therefore, there exists a vector v such that c + v ∉ A and area(S △ (S + v)) < ε/2 for every square S ∈ Q. As a consequence of this fact and inequalities (1) and (2), we obtain

min{area((S₁ + v) ∩ A), area((S₂ + v) ∩ A)} − max{area((S₃ + v) ∩ A), area((S₄ + v) ∩ A)} > 0.   (3)

Therefore, S₁ + v and S₂ + v are black, and S₃ + v and S₄ + v are white, in the digitization Dig(A, r) of A with respect to the square cover Q translated by v. This contradicts (*), because c + v ∉ A and c + v is the common vertex of the four squares. The obtained contradiction proves that if c ∈ bdA, then the pattern shown in Figure 26 and its 90° rotation cannot occur in the four squares of Dig(A, r) which have c as their common vertex. ■

The following theorem is a simple consequence of Theorem 24.

Theorem 25 Let A be par(r)-regular. Then Dig(A, r) is a bordered 2D manifold and the boundary of Dig(A, r) is a 1D manifold.

Proof Since the configuration shown in Figure 26 (and its 90° rotation) cannot occur in Dig(A, r) by Theorem 24, there exist only three 2 × 2
configurations of boundary squares in Dig(A, r), shown in Figure 5 (modulo reflection and 90° rotation). Therefore, if we view Dig(A, r) as a subset of R², every point in Dig(A, r) has a neighborhood homeomorphic to a relatively open subset of a closed half-plane. Hence Dig(A, r) is a bordered 2D manifold and the boundary of Dig(A, r) is a 1D manifold. ■

The well-composedness of the output digital image under the intersection digitization can also be guaranteed without the requirement that the input continuous set be parallel regular.

Theorem 26 Let G be a planar set with the property that the intersection of G with every open ball of radius d is connected. Then the intersection digitization Dig∩(G, d) of G with diameter d is well-composed.

Proof Let a, b, c, and d be four closed squares of cover Q sharing a common corner point x, arranged so that a and d are diagonally opposite, as are b and c.
Let O(x, d) be the open ball centered at x with radius d. We will show that the configurations a ∩ G ≠ ∅, d ∩ G ≠ ∅, and c ∩ G = b ∩ G = ∅, or b ∩ G ≠ ∅, c ∩ G ≠ ∅, and a ∩ G = d ∩ G = ∅, which lead to non-well-composedness of the digitization of G, are impossible. For example, if a ∩ G ≠ ∅, d ∩ G ≠ ∅, and b ∩ G = c ∩ G = ∅,
then G ∩ O(x, d) is disconnected, because G ∩ O(x, d) = G ∩ (O(x, d) \ (b ∪ c)), which is clearly a disconnected subset of O(x, d). ■
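The digitizations of Definition 24 (intersection, square subset, and v-digitization) can be approximated numerically by estimating the area ratio area(p ∩ X)/area(p) with supersampling. A sketch, where the predicate-based interface and all names are my choices, not from the text:

```python
import numpy as np

def digitize(inside, grid_n, cell, v=None, samples=8):
    """Digitize a planar set given by the predicate inside(x, y) on a
    grid_n x grid_n square grid with cells of side `cell`.
    v=None -> intersection digitization (black iff the cell meets the set);
    v=1.0  -> square-subset digitization (black iff the cell lies in the set);
    0<v<1  -> v-digitization (black iff the area ratio exceeds v).
    Area ratios are estimated by supersampling each cell."""
    black = np.zeros((grid_n, grid_n), dtype=bool)
    offs = (np.arange(samples) + 0.5) / samples
    for i in range(grid_n):
        for j in range(grid_n):
            xs = (j + offs[None, :]) * cell
            ys = (i + offs[:, None]) * cell
            ratio = inside(xs, ys).mean()   # ~ area(p ∩ X) / area(p)
            if v is None:
                black[i, j] = ratio > 0
            else:
                black[i, j] = ratio >= v if v == 1.0 else ratio > v
    return black

# Example: a disk of radius 3 centered in a 10 x 10 grid of unit squares
disk = lambda x, y: (x - 5.0) ** 2 + (y - 5.0) ** 2 <= 9.0
inter = digitize(disk, 10, 1.0)            # intersection digitization
subset = digitize(disk, 10, 1.0, v=1.0)    # square-subset digitization
half = digitize(disk, 10, 1.0, v=0.5)      # v-digitization with v = 1/2
```

The stated inclusions can be observed directly: every black square of the subset digitization is black in the 1/2-digitization, and every black square of the latter is black in the intersection digitization.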
VI. Application: An Optimal Threshold

This section is based on Latecki and Gross (1998). It seems to be a common opinion that the problem of finding an optimal threshold for gray-level images of black objects on a white background (e.g., obtained by a scanner) has been solved. Usually, this threshold is determined by analyzing a gray-level histogram. In this section we show that the gray-level histogram alone does not provide sufficient information to solve this problem and that this threshold depends on topological properties of the image. Based on these considerations, we propose a new method for determining the optimal threshold that is based on the digital topology of the gray-level image. We tested this method on several hundred document images scanned at different resolutions (and in several different languages). In all these tests, the threshold determined by our topological method was more suitable than the threshold determined by analyzing gray-level histograms.
A. Thresholding

One of the important problems that generally needs to be solved in analyzing document images is that of finding a threshold to convert the document from a gray-level digital image to a binary one. Finding a good threshold value is important in the document domain for many subsequent applications, from OCR to symbolic compression (e.g., see O’Gorman and Kasturi, 1997). There exist several approaches to finding an optimal threshold of a gray-level image (e.g., see Pratt, 1978, and Weszka, 1978); however, as we will show in what follows, none of them can be optimal.

With the following experiment we demonstrate that the gray-level histogram is often insufficient to decide where to place an optimal threshold. Therefore, there is no apparent reason to assume that by analyzing the gray-level image histogram, for example, by finding its minima, an optimal threshold can be detected.

Consider the two gray-level images of the word ‘‘should’’ shown in Figure 29. They differ only by the inter-letter spacing (i.e., the distance between letters). The lower image was generated from the top one by taking the bounding boxes around each gray-level letter and moving these bounding boxes so as to make them almost adjacent, while moving all the columns containing only background pixels to the outside. Because the distribution of pixels in the image itself remained the same, the gray-level histograms of the two images are identical.
Figure 29. The two gray-level images differ only by the inter-letter spacing. The corresponding binary images are obtained by thresholding at 215.
The two gray-level images have been thresholded at the value 215. The obtained binary images are shown in Figure 29 below the original gray-level images. While this threshold leads to a correct object segmentation by grouping into connected components for the upper image, this is not the case for the lower one, where the letters ‘‘u’’ and ‘‘l’’ form a single connected component. Thus, this threshold value, when applied to the lower gray-level image, results in a false connection.

Because the text font in the top gray-level image has wider spaces between letters than the font in the lower one, the gray-level image of the first font can be thresholded at a higher gray-level value than the second font while still yielding a correct segmentation of the letters. The second font, however, having smaller inter-letter distances, clearly requires a threshold value lower than 215. Observe that this is not a distinction that can be made from their identical gray-level histograms. From a digital topology perspective, the digitization and segmentation process for the lower image is not topology preserving, that is, the continuous original ‘‘image’’ containing the underlying letters and the binary image are not topologically equivalent (homeomorphic).

In this example we have demonstrated that two images with identical gray-level histograms may require considerably different thresholds in order to ensure that the digitization is topology preserving, which is necessary for correct object segmentation by connected component grouping. Thus, it is impossible to determine a topology-preserving threshold using the gray-level histogram. Consequently, significant points of the gray-level histogram, like minima, are not related to the topology of the underlying image. Clearly, there is a connection between threshold values and the preservation of topology, but a topology-preserving threshold cannot be determined
156
LONGIN JAN LATECKI
Figure 30. A gray-level document image.
by analyzing the gray-level histogram. The following example illustrates this connection. Gray-level document images often have bimodal histograms whose two peaks are quite distinct, but the proper threshold value between these peaks can be very hard to find. This is analogous to the problem in color segmentation of knowing the number and colors of the regions into which to segment the image, but not necessarily knowing exactly where to delineate the boundaries. A good candidate for the threshold value is the minimum between the two peaks (see, e.g., Pratt, 1978, Section 18.5.1). Consider the gray-level image shown in Figure 30. This image was captured with a scanner set at 400 dpi. The minimum (between the two peaks) of the histogram of the image gray-level values occurs at approximately 169. The image in Figure 30 is shown thresholded at gray level 169 in Figure 31. As is evident, this is not a particularly good threshold: it is considerably lower than the desired threshold value and results in many
Figure 31. The gray-level document image thresholded at the gray-level value of 169, which is the approximate minimum of the histogram of image gray-level values.
WELL-COMPOSED SETS
157
Figure 32. The resulting binary image when the threshold is set to 232.
false disconnections, where components that were clearly connected in the original image have become disconnected in the thresholded image. A topology-preserving segmentation is one in which there are neither false connections nor false disconnections of connected components. Next, let us consider the binary image shown in Figure 32, which is once again a thresholded version of the image shown in Figure 30. It can be seen that in this image the problem is reversed: it is primarily one of false connection, with letters connecting to other letters in the text. One way to view thresholding of a digital document is that lowering the threshold effectively thins out each letter, or component, while raising the threshold effectively thickens each text component. Clearly, then, there is a tradeoff between the false connection rate and the false disconnection rate. Assuming the initial document was scanned at a resolution that was not completely topology preserving, this false connection/disconnection tradeoff will almost always exist.

B. Histogram of Checkerboard Patterns

In the last section, we showed that it is impossible to determine an optimal threshold using the gray-level histogram. Thus, a new indicator is necessary in order to determine which threshold is the one most likely to preserve topology. We propose such an indicator in this section. Our starting point is the observation from the last section that an optimal threshold is closely related to a topology-preserving segmentation by thresholding. According to Theorem 23, if the resolution of a digitization process is sufficient, then the resulting segmented digital image and the underlying continuous object are topologically equivalent. Moreover, by Theorem 24, the obtained digital image is well-composed.
Our mathematical model of the digitization process closely approximates the digitization process of scanners; if there were no noise and if the resolution of the digitization process were sufficient, the binary digital image obtained by the digitization and segmentation process would contain no checkerboard patterns. Since there is some noise and some parts of continuous objects violate the requirement of sufficient resolution, it is natural to require that the number of checkerboard patterns be minimal. Consequently, the threshold that minimizes the number of checkerboard patterns should be chosen. However, this is not so simple, because such a threshold is not unique: for example, for the threshold values 0 and 255, which give completely white and completely black digital images, no checkerboard patterns occur. The problem is thus to find the ‘‘right’’ minimum in the number of checkerboard patterns. The histogram of the number of checkerboard patterns per threshold value is bimodal for almost all the digital documents we have tried in our experiments, which comprise several hundred document images scanned at different resolutions and in several different languages. The two maxima of the histogram correspond to the two gray-level values at which the number of topological false connections or false disconnections, respectively, reaches an extremum. We justify this fact here. For the document shown in Figure 30, the two extrema occurred at gray-level values of 86 and 246. The first extremum occurred as a result of letters falsely disconnecting; the second occurred as a result of letters falsely connecting and of noisy background pixels forming checkerboard patterns. Both of these images are topologically very unstable in that the underlying topological structure of the image is rapidly changing.
Conversely, the minimum of checkerboard patterns that occurs between these two maxima is topologically very stable in that the rate of topological change is at a minimum. We claim that an optimal threshold is at the gray-level value for which the minimum of checkerboard patterns occurs and which lies between the two maxima of checkerboard patterns. We call this optimal threshold topology-preserving. For example, for the image in Figure 30 the minimum of checkerboard patterns between the two maxima occurs at the gray-level value of 213. The resulting thresholded image is shown in Figure 33, where only 1 of 859,308 2×2-neighborhoods was a checkerboard pattern. Observe that the resulting binary image is topologically very close to the original document. For the more than several hundred documents we have studied, the minimum of checkerboard patterns seems to result in a thresholding of the
Figure 33. This image was thresholded at the minimum of the histogram of checkerboard patterns, which occurred at a gray-level value of 213. The resulting binary image is topologically very close to the original document.
gray-level image into binary that is either topologically optimal, that is, the total number of false connections and disconnections is minimized, or very close to optimal. Unlike the minima of the gray-level histogram, which are often flat or not well-defined, the minima of the histogram of checkerboard patterns are generally very well-defined. In addition, in all of the experiments we have conducted, thresholding at the checkerboard minimum considerably outperforms thresholding at the gray-level minimum. For example, the character recognition rate of the Omni-Page OCR program was significantly higher for binary document images obtained using our threshold than by using Omni-Page directly on the gray-level images. For document images scanned at 200 dpi, the number of mismatches made by Omni-Page using our thresholded version was 10 times smaller. If we revisit the two thresholded images shown in Figures 31 and 32, there is a clear indication from the number of checkerboard patterns in each case that neither gray-level threshold is optimal. The image in Figure 31 is under-thresholded; its checkerboard patterns are almost entirely the result of false disconnections. The image in Figure 32 is over-thresholded; its checkerboard patterns are almost entirely the result of false connections. A more detailed discussion on thresholding document images based on the checkerboard histogram can be found in Latecki and Gross (1998).
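The procedure used in this section can be summarized in a short sketch: for every candidate threshold, count the 2×2 checkerboard configurations in the thresholded image, then take the minimum of this checkerboard histogram that lies between its two maxima. This is a reading of the method rather than the authors' code; in particular, locating one maximum in each half of the gray-level range is a simplifying assumption of the sketch:

```python
import numpy as np

def checkerboard_histogram(gray):
    """counts[t] = number of 2x2 checkerboard patterns in the image
    thresholded at t, for t = 0..255. A 2x2 block is a checkerboard
    when its two diagonal pairs agree and the diagonals disagree."""
    counts = np.empty(256, dtype=np.int64)
    for t in range(256):
        b = (gray >= t).astype(np.int8)            # binarize at threshold t
        a, r, l, d = b[:-1, :-1], b[:-1, 1:], b[1:, :-1], b[1:, 1:]
        counts[t] = int(((a == d) & (r == l) & (a != r)).sum())
    return counts

def topology_preserving_threshold(gray):
    """Minimum of the checkerboard histogram between its two maxima
    (assumed here to lie one in each half of the gray-level range)."""
    counts = checkerboard_histogram(gray)
    m1 = int(np.argmax(counts[:128]))
    m2 = 128 + int(np.argmax(counts[128:]))
    return m1 + int(np.argmin(counts[m1:m2 + 1]))
```

For a real document image, the returned gray level plays the role of the value 213 found for Figure 30; on a degenerate image the between-maxima search may be meaningless, so the half-range assumption matters.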
VII. Generalizations

In Wang and Bhattacharya (1997) the concept of 2D well-composed sets is extended in two directions: to an arbitrary grid system and to segmented images with objects labeled with more than two gray values. Our definition
Figure 34. The partition in (b) is a regular partition while the one in (a) is not.
of a well-composed segmented image given at the end of Section I is equivalent to Definition 3.2 in Wang and Bhattacharya (1997). Now we extend the definition of well-composed sets to subsets of arbitrary grid systems. In order to simplify the presentation, the following definitions are closely related to, but not identical to, the ones in Wang and Bhattacharya (1997).

Definition 25 We call S ⊆ P(R²) a regular partition of R² if

1. ∪S = R²,
2. for every S ∈ S the boundary bd S is a simple closed curve,
3. each S ∈ S intersects only a finite number of elements in S,
4. for every A, B ∈ S, A ∩ B is either empty or a point or an arc.
We will call each element of a regular partition a pixel; that is, a pixel is a subset of the continuous plane. For example, the partition in Figure 34b is a regular partition, while the one in (a) is not, because p ∩ q is not an arc. A collection S that satisfies conditions 1—3 is called a grid system in Wang and Bhattacharya (1997).

Definition 26 A subcollection W of a regular partition S (i.e., a set of pixels) is well-composed if ∪W is a 2D bordered manifold (or, equivalently, if bd(∪W) is a 1D manifold).

These definitions can be easily generalized to Rⁿ:

Definition 27 We call S ⊆ P(Rⁿ) a regular partition of Rⁿ if

1. ∪S = Rⁿ,
2. for every S ∈ S the boundary bd S is a simply connected closed (n − 1)-manifold,
3. each S ∈ S intersects only a finite number of other sets in S,
4. for every A, B ∈ S, A ∩ B is either empty or is a simply connected bordered manifold of dimension less than n.

Definition 28 A subcollection W of a regular partition S ⊆ P(Rⁿ) is well-composed if ∪W is a bordered n-manifold (or, equivalently, if bd(∪W) is an (n − 1)-manifold).
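On the standard square grid, where each pixel is a closed unit square, Definition 26 specializes to the 2D criterion used in the preceding section: a binary digital image is well-composed exactly when it contains no 2×2 checkerboard pattern. A minimal check of this condition (the function name and the NumPy array convention are ours, not the chapter's):

```python
import numpy as np

def is_well_composed(binary):
    """True iff the binary image contains no 2x2 checkerboard pattern,
    i.e., the union of its (closed unit square) pixels is a 2D bordered
    manifold, or equivalently its boundary is a 1D manifold."""
    b = np.asarray(binary, dtype=np.int8)
    # the four corners of every 2x2 neighborhood
    a, r, l, d = b[:-1, :-1], b[:-1, 1:], b[1:, :-1], b[1:, 1:]
    # checkerboard: diagonal pairs agree, but the two diagonals disagree
    return not bool(((a == d) & (r == l) & (a != r)).any())
```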
We call a function G : S → [0, 1] a gray-level digital image and a function B : S → {0, 1} a binary digital image.

Acknowledgments

This contribution is based on articles that the author published together with the following researchers: Azriel Rosenfeld (University of Maryland at College Park), Ari Gross (Queens College, CUNY, New York), Ulrich Eckhardt (University of Hamburg), and Christopher Conrad. Their cooperation is gratefully acknowledged. This contribution was also influenced by many helpful comments from Siegfried Stiehl (University of Hamburg), Ullrich Köthe (University of Hamburg), Ralph Kopperman (The City College, CUNY, New York), Paul Meyer (The City College, CUNY, New York), and Atsushi Imiya (Chiba University, Japan).

References

Abdulla, W. H., Saleh, A. O. M., and Morad, A. H. (1988). A preprocessing algorithm for hand-written character recognition, Pattern Recognition Letters, 7:13–18.
Aleksandrov, P. S. (1960). Combinatorial Topology, vol. 3, Albany, New York: Graylock Press.
Arcelli, C. (1981). Pattern thinning by contour tracing, Computer Graphics and Image Processing, 17:130–144.
Arcelli, C., and Sanniti di Baja, G. (1989). A one-pass two-operation process to detect the skeletal pixels on the 4-distance transform, IEEE Trans. PAMI, 11:411–414.
Artzy, E., Frieder, G., and Herman, G. T. (1981). The theory, design, implementation and evaluation of a three-dimensional surface detection algorithm, Computer Graphics and Image Processing, 15:1–24.
Chen, L., and Zhang, J. (1993). Classification of simple surface points and a global theorem for simple closed surfaces in three dimensional digital spaces, in Proc. SPIE's Vision Geometry, 2060:179–188.
Davies, E. R., and Plummer, A. P. N. (1981). Thinning algorithms: A critique and a new methodology, Pattern Recognition, 14:53–63.
Duda, R. O., Hart, P. E., and Munson, J. H. (1967). Graphical Data Processing Research Study and Experimental Investigation, AD650926, March 1967.
Eckhardt, U., and Maderlechner, G. (1993). Invariant thinning, Int. J. of Pattern Recognition and Artificial Intelligence, 7:1115–1144.
Françon, J. (1995). Discrete combinatorial surfaces, Graphical Models and Image Processing, 57:20–26.
Freeman, H. (1970). Boundary encoding and processing, in B. S. Lipkin and A. Rosenfeld, editors, Picture Processing and Psychopictorics, pp. 241–266, New York: Academic Press.
Gross, A., and Latecki, L. J. (1995). Digitizations preserving topological and differential geometric properties, Computer Vision and Image Understanding, 62:370–381.
Herman, G. (1992). Discrete multidimensional Jordan surfaces, CVGIP: Graphical Models and Image Processing, 54:507–515.
Hilditch, C. J. (1969). Linear skeletons from square cupboards, in B. Meltzer and D. Michie, editors, Machine Intelligence IV, pp. 403–420, New York: American Elsevier and Edinburgh: Edinburgh University Press.
Kong, T. Y., and Roscoe, A. W. (1985). A theory of binary digital pictures, Computer Vision, Graphics, and Image Processing, 32:221–243.
Kong, T. Y., and Rosenfeld, A. (1989). Digital topology: Introduction and survey, Computer Vision, Graphics, and Image Processing, 48:357–393.
Kong, T. Y., and Rosenfeld, A. (1990). If we use 4- or 8-connectedness for both the objects and the background, the Euler characteristic is not locally computable, Pattern Recognition Letters, 11:231–232.
Köthe, U. (2000). Generische Programmierung für Computer Vision, Doctoral dissertation, Dept. of Computer Science, University of Hamburg.
Latecki, L. J. (1995). Multicolor well-composed pictures, Pattern Recognition Letters, 16:425–431.
Latecki, L. J. (1997). 3D well-composed pictures, Graphical Models and Image Processing, 59:131–142.
Latecki, L. J., Conrad, Ch., and Gross, A. (1998). Preserving topology by a digitization process, Jour. Mathematical Imaging and Vision, 8:131–159.
Latecki, L. J., Eckhardt, U., and Rosenfeld, A. (1995). Well-composed sets, Computer Vision and Image Understanding, 61:70–83.
Latecki, L. J., and Gross, A. (1998). From mathematical digitization models to discrete shape constraints, in R. Klette, A. Rosenfeld, and F. Sloboda, editors, Advances in Digital and Computational Geometry, pp. 185–226, Singapore: Springer-Verlag.
Latecki, L. J., and Ma, C. M. (1996). An algorithm for a 3D simplicity test, Computer Vision and Image Understanding, 63:388–393.
Lü, H. E., and Wang, P. S. P. (1985). An improved fast parallel thinning algorithm for digital patterns, in Proc. IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, CA, pp. 364–367.
Minsky, M., and Papert, S. (1969). Perceptrons:
An Introduction to Computational Geometry, Cambridge, MA: MIT Press.
O'Gorman, L., and Kasturi, R. (1997). Document Image Analysis, Los Alamitos: IEEE Computer Society.
Pavlidis, T. (1982a). Algorithms for Graphics and Image Processing, Berlin: Springer-Verlag.
Pavlidis, T. (1982b). An asynchronous thinning algorithm, Computer Graphics and Image Processing, 20:133–157.
Pratt, W. K. (1978). Digital Image Processing, New York: John Wiley and Sons.
Ronse, C. (1988). Minimal test patterns for connectivity preservation in parallel thinning for binary images, Discrete Applied Mathematics, 21:67–79.
Rosenfeld, A. (1975). A characterization of parallel thinning algorithms, Information and Control, 29:286–291.
Rosenfeld, A. (1979). Digital topology, American Mathematical Monthly, 86:621–630.
Rosenfeld, A., and Kong, T. Y. (1991). Connectedness of a set, its complement, and their common boundary, Contemporary Mathematics, 119:125–128.
Rosenfeld, A., and Pfaltz, J. L. (1966). Sequential operations in digital picture processing, Jour. Association for Computing Machinery, 13:471–494.
Rutovitz, D. (1966). Pattern recognition, J. Royal Statist. Soc., 129:504–530.
Serra, J. (1982). Image Analysis and Mathematical Morphology, New York: Academic Press.
Stefanelli, R., and Rosenfeld, A. (1971). Some parallel thinning algorithms for digital pictures, Jour. Association for Computing Machinery, 18:255–264.
Wang, Y., and Bhattacharya, P. (1997). Digital connectivity and extended well-composed sets for gray images, Computer Vision and Image Understanding, 68:330–345.
Weszka, J. S. (1978). A survey of threshold selection techniques, Computer Graphics and Image Processing, 7:259–265.
Zhang, T. Y., and Suen, C. Y. (1984). A fast parallel algorithm for thinning digital patterns, Communications of the ACM, 27:236–239.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 112
Non-Stationary Thermal Field Emission

V. E. Ptitsin

Institute for Analytical Instrumentation RAS, Rizhskij Prospekt 26, 198103, St. Petersburg, Russia
I. Introduction 165
II. Electron Emission and Thermal Field Processes Activated by High Electric Fields Acting on Metal Surfaces 167
   A. Thermal Field Emission of Electrons from Metal Surfaces: Canonical Concepts of the Emission Mechanism 167
   B. Inadequacy of the Concepts Regarding the Thermal Field Emission Mechanism for Intense Electric Fields Inducing High Density Emission Currents (J > 10 A/cm²) 171
III. Phenomenological Model of Non-Stationary Thermal Field Emission 191
   A. Heating of the Pointed Microcrystal Emitter Tip by the Emission Current Flow 191
   B. Instability of the Microcrystal Emitter Tip Surface During Non-Stationary Thermal Field Emission 195
   C. Determination of the Surface Concentration of the Microcrystal Emitter Substance Native Atoms in the Two-Dimensional Gas State 198
   D. Characteristic Features of the Emitter Native Neutral Atoms Motion After the Evaporation from the Emitting Surface into Vacuum 202
   E. Ionization Probability of the Emitter Substance Native Atoms After the Evaporation 207
   F. Processes at the Interface: Emitter Surface — Microplasma Layer 209
   G. Non-Stationary Thermal Field Emission Current Kinetics 217
IV. Discussion and Conclusion 221
Acknowledgments 225
References 225
Appendix 228
I. Introduction

This paper deals with one of the most important problems related to forming high-power-density submicron electron probes, that is, the development and practical preparation of stable electron sources with superhigh brightness (up to 10 A/cm²·sr) and angular emission intensity (up to 10 A/sr). The timeliness of the development of electron-optical systems capable of forming on the sample (target) surface a submicron electron probe with a
power density of (10—10) W/cm² and relatively low energy (about 10 eV and below) is because such probes could be very useful in advanced scientific and technical projects such as resistless (‘‘dry’’) electron lithography; high-intensity (up to 10 W/cm²) ‘‘point’’ X-ray sources for X-ray lithography; electron-beam semiconductor surface profiling and modification; ‘‘soft’’ desorption ionization of adsorbates for mass spectral analysis of complex organic and bioorganic compounds; next-generation Auger spectrometers for high-locality chemical and structural analysis of molecules adsorbed on various matrices; and so on. The main difficulty hindering the practical development and utilization of such probes is that the existing thermal field cathodes-emitters, based on, for example, refractory transition metal microcrystals and also ZrO/W(100) composite emitters, do not demonstrate stable operation in the high-density (above 10 A/cm²) emission mode, since in such conditions an explosive breakdown is initiated in the interelectrode gap between the emitter surface and the first anode of the electron-optical system. In this connection the present paper concentrates on the results of investigations of the physical mechanisms of the thermal field processes leading to the explosive breakdown phenomenon. A detailed analysis of the processes accompanying the development of non-stationary electron emission and explosive breakdown initiation has shown that the non-stationary behavior of the electron emission current from the transition metal microcrystal (MC) tip surface in high electric fields cannot be adequately interpreted based on canonical conceptions of the thermal field emission (TFE) mechanism. It has been found that in such conditions the non-stationary electron emission mode develops due to (a) evaporation of native atoms from the emitting MC emitter surface, (b) field and collisional ionization of the evaporating atoms, and (c) interaction of the ions thus formed with the emitting surface.
The development of those processes in time causes a phase transition of the emitter condensed matter to dense plasma. The studies have revealed substantial differences in the physical mechanisms of the stationary and non-stationary electron-emission processes, which suggests treating them as essentially different phenomena. The phenomenological model developed for the explosive breakdown phenomenon has scientific value by itself, but it is also of great practical interest for applications in electron-optical systems, such as electron ‘‘quasilasers’’ intended to produce intense submicron electron beams whose power density is comparable with that of modern high-power laser sources. Such instruments would make the scientific and technical projects listed above practical. The investigations carried out have provided the basis for developing an original nanoprocess for making pointed electron emitters capable of stable
operation in intense electric fields with an electron brightness of up to 10 A/cm²·sr and an angular emission intensity of up to 10 A/sr. A brief description of the technology for preparing emitters with such unique electron-optical parameters is given in the Appendix.
II. Electron Emission and Thermal Field Processes Activated by High Electric Fields Acting on Metal Surfaces

A. Thermal Field Emission of Electrons from Metal Surfaces: Canonical Concepts of the Emission Mechanism

The phenomenon of field emission (FE) of electrons, experimentally detected by R. W. Wood (1897), consists in the emission of electrons by substances exposed to a sufficiently high electric field (F of about 1 V/nm). R. Fowler and L. Nordheim (1928), based on the idea of tunneling of the electrons of the substance through the potential barrier at the metal surface—vacuum boundary, gave a theoretical explanation of the FE phenomenon. The expression describing the dependence of the field emission current density (J) on the field strength (F) and the work function (φ) has the form (see Fowler and Nordheim, 1928, and Modinos, 1990)

J(F) = [1.537·10¹⁰ F²/(φ t²(y))] exp[−0.683 φ^(3/2) v(y)/F]   (1)

Here J(F) is expressed in A/cm², φ in eV, and F in V/Å, and t(y) and v(y) are tabulated functions of the argument y = 3.79 √F/φ (Modinos, 1990). Note that formula (1) is obtained using several simplifying assumptions and computational techniques (Fowler and Nordheim, 1928; Modinos, 1990; Elinson, 1974).

1. The problem was stated as one-dimensional (i.e., the metal—vacuum interface was considered an ideally flat surface, and hence the boundary field was considered homogeneous).
2. The transparency of the potential barrier was calculated by means of the quasi-classical WKB approximation.
3. As a model of the metal, the model of free electrons in a potential ‘‘box’’ (the Sommerfeld model of a metal) was selected, according to which the electrons of the metal form a degenerate gas obeying the Fermi—Dirac statistics.
4. The theory is built for the metal temperature T = 0 K.

The plot of ln(J/F²) as a function of the argument 1/F is a straight line and, consequently, is called the Fowler—Nordheim (F—N) straight line. The
analytical expression describing the electron emission current density from the surface of a substance with work function φ as a function of the field strength F and the absolute surface temperature T was derived by E. L. Murphy and R. H. Good in 1956 (Murphy and Good, 1956). This relation has the form (Elinson, 1974)

J(F, T) = J(F) · πθ/sin(πθ)   (2)

where θ = (8m/ħ²)^(1/2) (φ^(1/2)/(eF)) t(y) kT; m is the electron mass, ħ is the Planck constant, e is the electron charge, and k is the Boltzmann constant. Formula (2) is correct to within an error of up to 40%, provided certain conditions interrelating the emission parameters (F, φ, T) are fulfilled (Modinos, 1990; Elinson, 1974). The range of temperatures and field strengths where (with the indicated error) the use of formula (2) is valid is shown in Modinos, 1990 (Figure 1.4). Analytical expressions describing the relation J(F, T) in an intermediate range of temperatures and field strengths were obtained by Christov (1966). However, the relationships derived in Christov (1966) are not given here, as they are cumbersome and can be applied only after appropriate numerical calculations.

In addition to the function J(F, T), another essentially important physical characteristic of the field emission process is the energy distribution of the emitted electrons. The number of electrons emitted from a unit area per unit time with total energy between E and E + dE can be defined as j(E) dE, where j(E) is the distribution function over total energies. Within the framework of the theory of metals, in the free-electron approximation, j(E) can be expressed as (Modinos, 1990)

j(E) = ∫ N(E, w) D(w, F) dw   (3)

where N(E, w) is determined by the relationship

N(E, w) dE dw = (m/2π²ħ³) f(E) dE dw

Here f(E) is the Fermi—Dirac distribution function; w is the normal component of the electron total energy; and D(w, F) is the coefficient of electron passage through the potential barrier at the metal—vacuum boundary:

D(w, F) = {1 + exp[Q(w)]}⁻¹,   Q(w) = −2i ∫_{Z₁}^{Z₂} κ(z) dz

κ(z) = [(2m/ħ²)(w − E_F − φ + e²/(4z) + eFz)]^(1/2)
where Z₁ and Z₂ are the roots of the equation κ(z) = 0, and E_F is the Fermi energy. The expression for j(E), valid in the range of F and T for which formula (2) was obtained, was derived by R. Young (1959):

j(E) = [m d/(2π²ħ³)] f(E) exp[−(4√(2m)/(3ħeF)) (φ + E_F − E)^(3/2) v(y₀)],   d = ħeF/[2(2mφ)^(1/2) t(y)]   (4)
where y₀ = (e³F)^(1/2) (E_F + φ − E)⁻¹. As shown by the refined experiments performed by Swanson and Crouser (1967), Young's formula (4) agrees satisfactorily with experimental results up to fields F ≈ 4 V/nm for the crystallographic planes (111), (112), (116), and (310) of W. For higher field strengths, F > 5 V/nm (at current densities of about 10 A/cm² for W), Young's formula no longer holds because of the limitations under which it was obtained. In such conditions the integral (3) should be evaluated numerically without any reduction. However, in Bell and Swanson (1979) it was experimentally shown that in high electric fields (F > 5 V/nm) the energy distribution function of the electrons proves to be broader than follows from expression (4). According to Bell and Swanson (1979), the experimentally observed significant broadening of the electron energy distribution function in comparison with the theoretically predicted distribution (3) is largely due to the influence of the space charge (SC) field of the emitted electrons. Figure 1 shows curves plotted using formulas (1) and (2), which quantitatively and qualitatively characterize the theoretically predicted behavior of the ‘‘current—voltage characteristics’’ (CVC) of the TFE phenomenon depending on T (playing here the role of a parameter) for a W tip-like emitter with work function φ = 4.5 eV at field strengths ranging from 1.5 V/nm to 3 V/nm. Formally, from curves 2—5 in Figure 1 it follows that, if the influence of the SC field were negligible, the CVC of the TFE process should ‘‘pass’’ above the F—N straight line at all values of F. Actually, such a supposition is true only at rather low TFE current densities. In this connection the shape of curves 2—5 in Figure 1, and also their position relative to the F—N straight line, agrees well with experiment only at rather low fields (F < 4.5 V/nm). As revealed by W. P.
Dyke and his colleagues (Dyke and Dolan, 1956), the CVC of actual vacuum diodes with pointed cathodes-emitters in high electric fields (F > 5 V/nm) deviate from the F—N straight line towards lower current densities (Figure 2). The authors (1956) believed that such

Note that further in the text the abbreviation TFE will denote the phenomenon of electron emission activated by the effect of an electric field of about 1 V/nm and higher on the surface of a condensed substance (metal) having a temperature different from 0 K. In our opinion, physically this is quite correct. Such a generalization allows one to consider the FE phenomenon and also the so-called Schottky mode of electron emission as more specific phenomena produced by TFE.
Figure 1. The current density versus field strength at different emitting W surface temperatures (φ = 4.5 eV): 1. T = 0 K (the ‘‘F—N straight line’’); 2. T = 850 K; 3. T = 1050 K; 4. T = 1350 K; 5. T = 1600 K.
behavior of the CVC in high electric fields could be attributed both to the influence of the SC of the emitted electrons and to self-heating of the emitter tip apex by the high-density current. It is natural that at a specific fixed value of F the process of emitter self-heating should result in growth of the TFE current density, whereas the influence of the SC field, on the contrary, should cause a decrease in the TFE current density. It formally follows that for a correct description of the electron emission process in high electric fields it is necessary to seek the solution of a set of equations consisting of Equation (2), the Poisson equation, and the heat conduction equation. Apparently, owing to the complexity of such a problem statement for the description of the emission process in high electric fields, it has not yet been posed.
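Formulas (1) and (2) can be evaluated numerically to reproduce the qualitative behavior of the curves in Figure 1. The sketch below is illustrative only: the Nordheim functions are replaced by the crude approximations t²(y) ≈ 1.1 and v(y) ≈ 1 − y², the physical constants are rounded, and neither the space-charge field nor emitter self-heating is modeled:

```python
import math

def fn_current_density(F, phi, t_sq=1.1):
    """Cold (T = 0 K) field emission current density, Eq. (1).
    F in V/Angstrom, phi in eV, result in A/cm^2.
    t^2(y) ~ 1.1 and v(y) ~ 1 - y^2 are assumed approximations."""
    y = 3.79 * math.sqrt(F) / phi                  # Nordheim parameter
    v = 1.0 - y * y                                # rough stand-in for v(y)
    return (1.537e10 * F * F / (phi * t_sq)) * math.exp(-0.683 * phi**1.5 * v / F)

def murphy_good_factor(F, phi, T, t_y=1.05):
    """Thermal correction pi*theta/sin(pi*theta) of Eq. (2), with
    theta = 2*sqrt(2*m*phi)*t(y)*k*T/(hbar*e*F); t(y) ~ 1.05 assumed."""
    HBAR, ME, E0, KB = 1.0546e-34, 9.109e-31, 1.602e-19, 1.381e-23
    F_si = F * 1e10                                # V/Angstrom -> V/m
    d = HBAR * E0 * F_si / (2.0 * math.sqrt(2.0 * ME * phi * E0) * t_y)
    theta = KB * T / d
    if not 0.0 < theta < 1.0:
        raise ValueError("outside the validity range of Eq. (2)")
    return math.pi * theta / math.sin(math.pi * theta)

def tfe_current_density(F, phi, T):
    """J(F, T) = J(F) * pi*theta/sin(pi*theta), cf. the curves of Figure 1."""
    return fn_current_density(F, phi) * murphy_good_factor(F, phi, T)
```

With these approximations, raising T at fixed F multiplies the cold-emission density by the Murphy-Good factor, which grows with temperature; this is the ordering of curves 1 through 5 in Figure 1.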
Figure 2. Typical experimental CVC for a vacuum diode with a W cathode-emitter. ACE is the F—N straight line; J_c is the emission current density corresponding to the transition from stationary TFE to a non-stationary process of electron emission; J_m is the maximum, or limiting, value of the current density in the non-stationary pulsed emission mode for the given initial experimental conditions (r, φ, τ, and so on). (At J = J_m, the explosive breakdown process is initiated.)
B. Inadequacy of the Concepts Regarding the Thermal Field Emission Mechanism for Intense Electric Fields Inducing High-Density Emission Currents (J > 10 A/cm²)

The theoretical conceptions of the physical mechanism of TFE from a metal surface briefly outlined in the preceding section agree quite satisfactorily, both quantitatively and qualitatively, with the results of numerous experimental studies of the field emission process at field strengths up to
F ≈ 5 V/nm. These results are well known to specialists in field emission phenomena. They are comprehensively covered in a number of excellent review papers, monographs, and books (Modinos, 1990; Elinson, 1974; Dyke and Dolan, 1956) and, therefore, will not be discussed here in detail. However, the satisfactory agreement between the TFE theory and experimental data holds only for the field range 1 V/nm < F < 5 V/nm and at relatively low temperatures of the metal emitter (T < (1000—1500) K). In high electric fields (F > 5 V/nm) the canonical conceptions of the TFE phenomenon do not allow adequate interpretation of certain experimental data. The question of the validity of using canonical conceptions of the TFE phenomenon to interpret thermal field processes occurring under the above conditions originally arose from the pioneering studies by W. P. Dyke and his colleagues (1956). In the course of those studies they revealed certain deviations from the normal TFE behavior. In particular, in high electric fields the curves of J = J(F) or I = I(V) (here I is the total emission current from the field emitter and V is the extraction voltage, or the potential difference between the emitter and the anode of the vacuum diode) plotted in the coordinates ln(J/F²), 1/F (or in the coordinates log I, 1/V) deviate from the F—N straight line (Figure 2). Note that in the deviation region of the curves (Figure 2) the electron emission process is not stationary, and therefore the experiments were carried out in the pulsed emission mode. Dyke's and his colleagues' experiments have shown that during the square voltage pulse the emission current increases with time. If the voltage pulse height is then gradually increased slightly (by about 1%), the emission current grows increasingly fast at a fixed voltage pulse width τ. Such non-stationary behavior of the emission process is usually called the effect of spontaneous current growth.
Simultaneously, one can observe a bright emission ring ("ring" effect) around the emission image of the W field emitter surface on the fluorescent screen of the vacuum diode (electron microscope–Müller projector) (Figure 3). The spontaneous-growth rate of the emission current (dI/dt) increases slightly with voltage pulse height upon repeated exposure of the emitter surface to field pulses and, finally, at a certain arbitrary moment t (0 < t < τ) the next voltage pulse in the sequence causes an abrupt increase in the total emission current (Figure 4). Over the period of about 50 ns during which this jump (or burst) of the total emission current takes place, the current may grow by a factor of (10–100), depending on τ. This causes irreversible destruction of the emitter tip. Electron microscope observation of the destroyed and melted emitter tip shows that the linear dimensions of the melted region reach a few micrometers (up to (10–30) μm). This phenomenon of abrupt emission current growth, resulting in irreversible destruction and melting of the emitter tip, was called
NON-STATIONARY THERMAL FIELD EMISSION
Figure 3. Emission image of the W MC surface in the non-stationary electron emission mode. (The ring effect can be observed at an emission current density satisfying the inequality J₁ < J < J_cr.)
by the authors who first discovered it the phenomenon of explosive breakdown (Dyke and Dolan, 1956). The explosive breakdown studies have shown the following (Dyke and Dolan, 1956). 1. Explosive breakdown development is not an absolutely random, uncontrollable process, since it is preceded by typical and reproducible effects such as spontaneous current growth and the "ring" effect. It was shown that by varying the square voltage pulse height one can either (by increasing the pulse height) strengthen the ring effect (i.e., the ring brightness) and raise the spontaneous current growth rate or, inversely, (by slightly decreasing the pulse height) gradually reduce the ring brightness and the spontaneous current growth rate down to complete fading of these effects. If, after the effects vanish, the voltage pulse height is increased again, the pre-explosion effects are well reproduced.
Figure 4. Typical oscillograms of current in the non-stationary emission mode. (Oscillograms 1–4 were obtained at successively increasing (by 1%) rectangular voltage pulse heights of fixed duration; the marked portion of oscillogram 4 corresponds to the current "break" stage due to explosive breakdown.)
2. Activation and development of explosive breakdown is not connected with bombardment of the emitter by ions formed on the anode surface as a result of electron-stimulated desorption with a rather low but non-zero probability. This was proved by numerous experiments showing that the development of explosive breakdown depends only on the emission current density and not on the ion energy, which is evidently defined by the extraction voltage. Moreover, the independence of the breakdown activation process from bombardment by ions desorbed from the vacuum diode's anode was confirmed by direct experiments: in those experiments the explosive breakdown phenomenon was observed at pulsed field exposure times below the ions' time of flight from anode to emitter.
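The time-of-flight argument is easy to check with a back-of-envelope estimate. In the sketch below the gap length, extraction voltage, and ion species are assumed illustrative values (they are not quoted in the experiments cited); for a singly charged ion starting at rest in a roughly uniform field the transit time is about 2d/v_final.

```python
import math

# Rough anode-to-cathode flight time for a desorbed ion accelerated
# through the full gap.  d, V, and the ion mass are assumed,
# illustrative values, not parameters from the cited experiments.

e = 1.602e-19          # elementary charge, C
u = 1.661e-27          # atomic mass unit, kg

def ion_flight_time(d_m, V, mass_amu):
    """Flight time, s, for a singly charged ion starting at rest
    in a (roughly) uniform field across a gap d_m at voltage V."""
    v_final = math.sqrt(2 * e * V / (mass_amu * u))
    return 2 * d_m / v_final

t_flight = ion_flight_time(d_m=0.01, V=1.0e4, mass_amu=184)  # W+ ion, 1 cm gap, 10 kV
print(f"flight time ~ {t_flight * 1e9:.0f} ns")
```

The result is on the order of a few hundred nanoseconds, so field pulses much shorter than this indeed end before any anode-desorbed ion can reach the tip — consistent with the direct experiments described above.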
Figure 5. Half-width at half-height of the emitted-electron energy distribution (ΔE) versus field strength at different MC surface temperatures. (a) MC of W(100): (1) T = 1685 K; (2) T = 84 K. (b) MC of ZrO/W(100): (1) T = 1455 K; (2) T = 84 K. (Curves obtained from the experimental data (Bell et al., 1979) are shown by solid lines; curves obtained from Young's formula (Equation 4) are shown by dashed lines.)
3. Breakdown initiation is not related to bombardment of the emitter surface by residual gas ions. To prove this statement, a spherical Müller projector was made (Dyke and Dolan, 1956) with two emitters whose ion bombardment probabilities were approximately the same, whereas their emission current densities differed by about a factor of two at a given extraction voltage. The explosive breakdown (with the preceding effects of spontaneous current growth and the ring) was initiated at the emitter with the higher current density, and the
explosive breakdown initiation at that emitter did not affect the emission current image of the other one. The experimental data given above demonstrate that explosive breakdown is initiated by thermal field processes in the emitter itself, and the governing factor here is the value of the current density and/or the field strength F. The results listed above were obtained in experiments on emitters made of W. Later these results were confirmed in the works of other authors (Elinson, 1974), which involved emitters both of W and of other refractory transition metals such as Ta, Mo, Re, and Nb, as well as of a number of refractory metal-like compounds. Another specific feature of the electron-emission process in high electric fields is the substantial broadening of the electron-emission total energy spectra (Figure 5) experimentally revealed in Bell and Swanson (1979). The experiments in Bell and Swanson (1979) were carried out on emitters with essentially different work functions. Both emitters were made of W with crystallographic orientation (100), and the work function of one of them was reduced by selective adsorption of Zr atoms. The curves in Figure 5 show that the significant departure of the experimental data from the TFE theory takes place only in the high electric field region. So the set of unambiguously established and quite reliable experimental data discussed here allows us to assert that in high electric fields an abnormal behavior and certain peculiarities of the emission process are observed, which go beyond the canonical conceptions of the physical mechanism of TFE. To conclude this section we list the anomalies found in high electric fields. 1. Deviation of the current–voltage characteristic (CVC) of the process from the F–N straight line. 2. Broadening of the emission-electron total energy spectra. 3. The emission process is non-stationary, accompanied by the ring effect, and ends in explosive breakdown. For a long time these anomalies did not receive unambiguous interpretation.
As for the question of the CVC departure from the F–N straight line, Dyke and Trolan (1953), in particular, explain it by the influence of the space-charge (SC) field of the emission electrons. A different point of view on this subject was offered in Lewis (1956). In that work, the departure from linearity is related to a possible deviation of the true potential barrier shape at the metal–vacuum boundary from the model one, which is known to be defined by the combined effect of the electric image forces and the electrostatic field on emitted electrons. It turns out that in high
electric fields the typical barrier widths become comparable or close to the interatomic distance (~0.3 nm) and the conceptions of classical electrodynamics are no longer valid. The SC field effect on the CVC of the vacuum diode with a pointed cathode was studied in Aizenberg (1954, 1964), Kompaneets (1959), and Barbour et al. (1953). Those studies have shown that the CVC departure from the F–N straight line at field strengths F ≥ (7–8) V/nm, with the current density of emission from the W emitter being, accordingly, as high as J ≈ (1–5)×10⁸ A/cm², may indeed be caused by the emitted-electron SC field. This result was in satisfactory agreement with the experimental data obtained by Dyke and his colleagues and did not raise serious objections for a long time. However, further investigations (Ptitsin et al., 1985, 1986, 1996, 1998) have cast doubt on the validity of those conceptions, since for the emitters made of W, Mo, and Nb used in these works the departure of the CVC from the F–N straight line was observed in much weaker fields and, hence, at lower emission current densities (F ≤ 5 V/nm, J ≈ (1–2)×10⁷ A/cm²). Besides, it has been found that in the CVC departure region, over a time τ equal to a single field (voltage) pulse length, the emitting surface microstructure undergoes a substantial change (Krotevich et al., 1985). Moreover, it was shown (Krotevich et al., 1986) that, when going from the linear portion of the CVC to the portion where the CVC deviates from the F–N straight line, one could observe an increase in the resolution (δ) of the pulsed electron microscope–Müller projector (up to (0.3–0.6) nm). Note that in the linear portion of the CVC, where the emission process is stationary, the electron microscope resolution retains its typical range of values, (2.5–5.0) nm (Modinos, 1990). The experimental data presented above cannot be adequately explained based on the idea of a possible influence of the emitted-electron SC field on the emission process.
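The claim that the barrier width shrinks toward the interatomic distance can be illustrated with the standard image-rounded barrier. With x in nm, F in V/nm, and the work function φ in eV, the barrier is U(x) = φ − F·x − 0.36/x (0.36 eV·nm being e²/16πε₀ in these units), and its width at the Fermi level is the distance between the two roots of U(x) = 0. The tungsten-like value φ = 4.5 eV below is an assumed illustration, not a figure from the text.

```python
import math

# Width of the image-rounded surface barrier at the Fermi level:
#   U(x) = phi - F*x - 0.36/x   (x in nm, F in V/nm, phi in eV)
# The barrier edges are the roots of U(x) = 0, i.e. of
#   F*x**2 - phi*x + 0.36 = 0.

def barrier_width(phi, F):
    """Distance between the roots of phi - F*x - 0.36/x = 0, in nm."""
    disc = phi ** 2 - 4 * F * 0.36
    if disc <= 0:
        return 0.0          # barrier top pulled below the Fermi level
    return math.sqrt(disc) / F

print(barrier_width(4.5, 2.0))    # moderate field: roughly 2 nm
print(barrier_width(4.5, 10.0))   # high field: ~0.24 nm, near-interatomic
```

At F around 10 V/nm the computed width is indeed of the order of 0.3 nm, which is where the applicability of classical electrodynamics to the barrier problem becomes questionable.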
Because of the inconsistency indicated above between the experimental data and the theoretical conceptions, we performed calculations to obtain quantitative estimates of the emitted-electron SC field contribution to the total field strength at the emitter tip surface (Ptitsin et al., 1996, 1998). Note again that a similar problem was solved earlier by other authors (Aizenberg, 1954, 1964; Kompaneets, 1959; Barbour et al., 1953). However, the correctness of their results raises some doubt. The doubt stems from the fact that, to find a solution to the self-consistent problem for the Poisson equation, the authors mentioned above used essential simplifications that cannot always be considered physically justified. In particular, the emitted-electron SC distribution (ρ(x, y, z)) was described by a spherically symmetric function, which is a substantial oversimplification for the vacuum diode with a thermal field pointed cathode. Our
calculations (Ptitsin et al., 1996, 1998) have shown that such a simplification yields values of the SC contribution to the field magnitude at the emitter tip surface (F*) overestimated by a factor of 2–3. Besides, the initial velocity of the field emission electrons (v₀) in Aizenberg (1954, 1964) and Kompaneets (1959) was taken to be zero and, hence, ρ → ∞ at the emitter surface. This last assumption was not justified by the authors of those works. In contrast to Aizenberg (1954, 1964) and Kompaneets (1959), the authors of Ptitsin et al. (1996, 1998) used a different approach to account for the SC field effect on the process of electron emission. It is based on the possibility of using the image method to calculate the field magnitudes created by the SC of emitted electrons at the emitter surface during intense emission. As opposed to the earlier works, in Ptitsin (1996) and Ptitsin et al. (1996, 1998) the initial electron velocity was assumed to be different from zero and, besides, the distribution ρ(x, y, z) was approximated by an axially symmetric function corresponding to the real conditions. It is shown that the suggested approach yields physically correct estimates of the magnitude sought. The potential distribution function φ₀ was found by the integral equation method (Molokovsky et al., 1991). As a model of the vacuum diode there was considered a spherical condenser in which the SC of the emitted electrons is distributed within a solid angle of 2π (Figure 6). The SC field strength at the thermal field emitter tip apex was calculated by the image technique (Landau and Lifshits, 1982). The field potential at an arbitrary point N on the z-axis of the diode was sought as a superposition of three functions:

φ_Σ = φ₀ + φ_q + φ*    (5)
where φ₀ is the solution of the Laplace equation meeting the boundary conditions φ₀(r_c) = 0, φ₀(r_a) = V; φ_q is the potential of the field created by the emitted electrons; φ* is the potential of the field created by the virtual "image" charges; r_c is the inner sphere (cathode) radius; and r_a is the outer sphere (anode) radius. The sum φ_q + φ* should satisfy the boundary condition (φ_q + φ*)(r_c) = 0. According to the image technique, the value of the function φ_q + φ* at the point N is defined by

φ_q + φ* = (1/4πε₀) ∫ (1/R − r_c/(r′·R*)) dq    (6)

where r′ is the radial coordinate of the point charge dq, and R, R* are the distances from the observation point to dq and to its image dq*, respectively (Figure 6). Bearing in
Figure 6. Vacuum diode model for calculation of the emitted-electron SC field strength at the thermal field emitter apex. 𝐫′ and 𝐫*′ are the radius vectors of the point charge dq and its image dq*, respectively; r_a/r_c ≫ 1. (The anode surface is shown only partly; geometrical scale proportions are not observed.)
mind that in the spherical coordinate system

dq = ρ dΩ = −(J₀ r_c² sin θ / v(r′)) dθ dφ dr′

where J₀ is the TFE current density at the emitter surface and v(r′) is the electron velocity,

v(r′) = v₀ (1 + 2eφ₀(r′)/(m v₀²))^(1/2)

R = (r′² + r² − 2 r′ r cos θ)^(1/2);   R* = (r² + r_c⁴/r′² − 2 r r_c² cos θ/r′)^(1/2)
Equation (6) can be rewritten as

φ_q+*(r) = −(J₀ r_c²/4πε₀) ∫∫∫ [sin θ dθ dφ dr′ / (v₀ (1 + 2eφ₀(r′)/(m v₀²))^(1/2))] × [1/(r′² + r² − 2 r r′ cos θ)^(1/2) − r_c/(r′ (r² + r_c⁴/r′² − 2 r r_c² cos θ/r′)^(1/2))]    (7)

This integral equation is formally the solution of the self-consistent Dirichlet problem for the Poisson equation. In view of the fact that in high electric fields the potential distribution φ(r) differs from φ₀(r) only slightly (based on preliminary estimates, by two or three percent at most), to a first approximation the function φ(r) under the integral in Equation (7) may be replaced by the known function φ₀(r):

φ₀(r) = V (1 − r_c/r)/(1 − r_c/r_a) ≈ V (1 − r_c/r)

Integrating (7) over φ and θ yields
φ_q+*(r) = −Λ ∫ from r_c to r_a [ (1 + r′²/r² − 2(r′/r) cos θ_s)^(1/2) − (r′/r − 1) − (r_c/r′)·((1 + r_c⁴/(r′r)² − 2(r_c²/(r′r)) cos θ_s)^(1/2) − (1 − r_c²/(r′r))) ] (1 + η(1 − r_c/r′))^(−1/2) dr′    (8)

here Λ = J₀ r_c²/(2ε₀ v₀ r) and η = 2eV/(m v₀²). The field strength at the emitter apex can be found from (8):

F_Σ = −dφ_Σ/dr |_(r = r_c)

After differentiation and substitution of the dimensionless variable t = r′/r_c, we obtain

F_Σ = −V/r_c − (J₀ r_c/(2ε₀ v₀))·[3/2 + ln(r_a/r_c) − I]    (9)

where I denotes the remaining integral over t, whose integrand contains the factor (1 + t² − 2t cos θ_s)^(−1/2).
The integral I in the latter relationship is elliptic. It may be expressed as a sum of simpler elliptic integrals by using a known transformation technique for such integrals (Smirnov, 1969):

I = (b² − 1)·I₋₁ − (2b + c₁/2)·I₀ + (a₁/2)·I₁ − C    (10)

where

I_k = ∫ z^k dz/(a₁z³ + b₁z² + c₁z + d₁)^(1/2),   (k = −1, 0, 1)

with a₁ = ab − b² − b; b₁ = 3b² − 2ab + 1; c₁ = a − 3b; d₁ = 1; the integration limits are (b + r_c/r_a)⁻¹ and (1 + b)⁻¹; a = −2 cos θ_s; b = −r_c/r_a; and

C = (P(ζ₂))^(1/2) − (P(ζ₁))^(1/2)

where (P(ζ₁))^(1/2), (P(ζ₂))^(1/2) are the values of the denominator of the integrand of Equation (10) at the integration limits. So Equation (9) can be rewritten in the form

F_Σ = −V/r_c − (J₀ r_c/(2ε₀ v₀))·[3/2 + ln(r_a/r_c) − (b² − 1)·I₋₁ + (2b + c₁/2)·I₀ − (a₁/2)·I₁ + C]    (11)

where I₋₁, I₀, and I₁ are elliptic integrals of the first, second, and third kind, respectively, with known integration limits and coefficients of the polynomial P(z). Equation (11) is easily evaluated by numerical techniques at any given values of the coefficients. Equation (11) can also be reduced to a simple analytical expression. In particular, such an expression may be obtained for the conditions of thermal field emitter functioning in the high emission current density mode. Under these conditions θ_s equals π/3 (Dyke and Dolan, 1956). After the transformation of the integrals in Equation (11) to the Legendre form and integration we have

F* ≈ (J₀ r_c/(2ε₀ v₀))·[1.75 + ln(r_a/r_c) + (√3/2)·E − (3/2)·K]    (12)

where F* is the magnitude of the SC field vector, and K and E are complete elliptic integrals. Numerical calculations by Equation (12) show that for typical values of the emitter tip radius and interelectrode space the field F*
does not exceed two percent of the Laplace field up to J ≈ 10⁸ A/cm². In other words, the result obtained implies that, in agreement with experiment, for the indicated values of J the SC field should not noticeably affect the shape of the CVC (Ptitsin, 1996). The agreement between the calculations performed and the experiment proves that the present approach allows adequate evaluation of the SC field effect to an accuracy sufficient for unequivocal interpretation of the experimental results. If, however, one sets θ_s = π in Equation (11), in accordance with the earlier adopted models of the vacuum diode with a spherically symmetric SC distribution (Aizenberg, 1954, 1964; Kompaneets, 1959), then the magnitude of F*, all other things being equal, will be 2.5–3.2 times higher (as compared with the realistic model with θ_s = π/3). For a spherically symmetric SC distribution, the SC field must affect the CVC shape already at J of about 5×10⁷ A/cm², which is inconsistent with the experimental data (Ptitsin, 1996). As for the possible effect of the initial velocity v₀ of the TFE electrons on the F* magnitude, it follows from Equation (12) that, to ignore the initial velocity by setting it equal to zero, the following relationship should hold
v₀ ≪ (2eV/m)^(1/2)
This is always the case with TFE, and in this respect the earlier adopted model simplification v₀ = 0 is quite admissible. Thus the suggested approach yields physically correct, experimentally confirmed estimates of the SC field magnitudes created by emitted electrons at intense TFE. It is worth noting, to conclude, that the approach developed for quantitative evaluation of the SC field effect can also be used to calculate the SC field created by emitted ions during the operation of liquid metal ion sources. The suggested method for SC field calculations permits one to build the CVC of the TFE process for the vacuum diode model considered. The theoretical dependence J(F, T) was defined by Equation (2). The emitter tip surface temperature (T_s) was calculated by the formula obtained in Ptitsin (1996):
T_s ≈ (T₀ + J ε_N r/(e λ tan α))·(1 − J² r² L/(2 λ² tan²α))⁻¹

where T₀ is the initial temperature of the emitting surface at the moment preceding the process of high-density current emission that causes heating of the emitter tip up to the temperature T_s (T_s > T₀, T₀ ≈ 300 K); ε_N is the mean energy transferred from the electronic to the phonon subsystem of the metal via a single electron's tunneling to vacuum (Ptitsin, 1996); λ is the heat conduction coefficient at high metal temperatures; L is the Lorentz number; and α is the emitter apex cone opening half-angle.
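Coupling J(F, T) with such a tip-temperature formula requires a self-consistent solution. The following is a minimal sketch of one way to iterate the two relations to self-consistency; the functional forms and all coefficients are invented stand-ins for illustration, not Equation (2) or the exact heating formula above.

```python
import math

# Toy self-consistent coupling of a current-density law J(T, F) and a
# tip-heating relation T(J).  Both functions below are simplified,
# made-up stand-ins; only the iteration scheme itself is the point.

def J_of_T(T, F):
    """Illustrative thermally enhanced current density, A/cm^2."""
    return 1.0e6 * (F / 5.0) ** 2 * math.exp(T / 4000.0)

def T_of_J(J, T0=300.0):
    """Illustrative tip temperature: base + linear (Nottingham-like)
    + quadratic (Joule-like) terms, with invented coefficients."""
    return T0 + 2.0e-4 * J + 1.0e-12 * J ** 2

def self_consistent_J(F, T0=300.0, rel_tol=0.01, max_iter=50):
    """Iterate J -> T -> J until successive J values agree to rel_tol."""
    T = T0
    J_prev = J_of_T(T, F)
    for n in range(1, max_iter + 1):
        T = T_of_J(J_prev, T0)
        J_next = J_of_T(T, F)
        if abs(J_next - J_prev) <= rel_tol * J_prev:
            return J_next, T, n
        J_prev = J_next
    raise RuntimeError("iteration did not converge")

J, T, n_iter = self_consistent_J(F=5.0)
print(f"J = {J:.3e} A/cm^2 (toy units), T = {T:.0f} K, iterations = {n_iter}")
```

With this mild coupling the fixed point is reached in a few iterations, mirroring the behavior reported for the actual calculation (convergence to 1% within about four iterations).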
Figure 7. Predicted CVCs of TFE for a spherical vacuum diode with an axially symmetric SC distribution of the emitted electrons. 1: T = T₀, the SC effect on the CVC shape ignored; 2–4: CVCs of TFE with the SC field effect and the self-heating of the emitter tip by the emission current taken into account (2: α = 1°; 3: α = 2°; 4: α = 3°).
The current density values for specified values of r, α, and ε_N were numerically evaluated by successive iterations in the following manner: J₁(T₀, F) → T₁(J₁, F) → J₂(T₁, F) → T₂(J₂, F) → J₃(T₂, F), and so on, until J_(n+1) ≈ 0.99 J_n. Note that to reach such accuracy four iterations were enough. The theoretical CVC curves thus obtained are presented in Figure 7. They show that the thermal factor, that is, heating of the emitter tip by the high-density emission current, has a greater effect than the SC field of the emitted
electrons has. Besides, comparison of the theoretical CVCs with the experimental ones reveals that in high electric fields the physical mechanism of electron emission may change substantially. As for the existing views on the broadening of the emission-electron energy spectra in high electric fields, at present there is only one experimental work devoted to the study of the emission-electron energy distribution spectra in strong electric fields (Bell and Swanson, 1979). The generalized results of this study are presented in Figure 5. The graphs given in this figure show a significant energy spectrum broadening (at half-height) in high electric fields. The unambiguously established (Bell and Swanson, 1979) energy-spectrum broadening that occurs at high-density emission currents cannot be adequately explained by the theory of thermal field electron emission (Murphy and Good, 1956). The authors of Bell and Swanson (1979) suggested that the spectrum broadening might be caused by the Coulomb interaction between emitted electrons (Boersch, 1954). However, at that time the authors did not make any quantitative estimates of the possible effect of such interaction on the full width of the energy distribution (ΔE). To obtain such estimates we will use the results of Knauer (1981), where an analytical expression for ΔE was derived:
ΔE = 1.45·(1/4πε₀)·(m/(2eV))^(1/2)·(1/R)·(dI/dΩ)

where V is the voltage applied between the field emitter and the anode of the vacuum diode; R is the distance between the emitter apex and the intersection point of the emitter axis with the tangent to the boundary path of the emission electrons; dI/dΩ is the angular emission intensity; and dΩ is the element of solid angle (Ω) incorporating the emission current dI. To estimate ΔE we can use the known empirical relationship F = βV ≈ V/(5r), where β is the so-called field factor, or β factor, and r is the emitter tip radius, and also the relationship (Dyke and Dolan, 1956)

dI/dΩ = I/(2π(1 − cos θ₀))
where I is the total emission current and 2θ₀ is the angular opening of the cone encompassing the electron emission. Substitution of the typical (for the pre-explosion phase of non-stationary electron emission from the W emitter surface) parameter values I ≈ 30 mA, θ₀ = π/3, r ≈ 300 nm into the above expression yields the following upper numerical estimate: ΔE ≲ 3.0 eV. Comparison of this estimate with the experimental data (Figure 5) shows that the full width of the experimental energy distribution spectra for electrons emitted in high electric fields substantially exceeds the above upper numerical estimate. The difference between the theoretical estimates and the experimental data becomes even greater if the high- and low-energy tails of the distribution are taken into account. Thus, based on the estimates made, one may conclude that in strong electric fields the Coulomb interaction is only one of the possible factors contributing to the emitted-electron total energy spectrum broadening. As for the mechanism of the instability of electron emission from pointed metal microcrystals (with tip radii r ≤ 0.1 μm), according to Dyke and his colleagues (1956), the electron-emission process may remain stationary (up to J ≈ 10⁸ A/cm²) only if special techniques are used, precluding the development of emission instability. If no special techniques are applied, the emission instability appears at much lower emission current densities (about 5×10⁶ A/cm²). This was established by Dyke in the course of elegant experiments carried out specially to minimize bombardment of the emitter tip by residual gas ions. They were performed in ultrahigh vacuum. Ion bombardment of the emitting tip surface by the ions formed as a result of impact ionization of residual gas atoms in the bulk of the vacuum diode was eliminated by a magnetic field whose induction vector was normal to the longitudinal axis of the field emitter. Nevertheless, even under such "refined" conditions the emission process could remain stationary only at J ≤ 10⁷ A/cm². Note here that the experimentally defined stationary current density limit J_st, which is 10⁷ A/cm² for the W emitter, represents only some mean value obtained after analysis and processing of the data of numerous experiments carried out under the same conditions.
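The upper estimate quoted above (ΔE of about 3.0 eV for I ≈ 30 mA, θ₀ = π/3) can be checked numerically. In the sketch below only the angular intensity dI/dΩ = I/(2π(1 − cos θ₀)) is taken directly from the text; the Knauer-type conversion to an energy spread and the values of the extraction voltage V and the distance R are assumptions made for the sake of an order-of-magnitude illustration.

```python
import math

# Angular emission intensity from the relation quoted in the text,
# then a hedged, order-of-magnitude Knauer-type energy-spread estimate
#   dE ~ 1.45 * (1/4*pi*eps0) * sqrt(m/2eV) * (dI/dOmega) / R
# The voltage V and distance R are assumed values, not from the text.

I_tot = 30e-3            # total emission current, A
theta0 = math.pi / 3     # half-angle of the emission cone, rad

dI_dOmega = I_tot / (2 * math.pi * (1 - math.cos(theta0)))   # A/sr

e = 1.602e-19; m = 9.109e-31; eps0 = 8.854e-12
V = 5.0e3                # assumed extraction voltage, V
R = 1.0                  # assumed axial distance parameter, m
v = math.sqrt(2 * e * V / m)                      # exit electron speed, m/s
dE = 1.45 * dI_dOmega / (4 * math.pi * eps0 * v * R)   # numerically in eV

print(f"dI/dOmega = {dI_dOmega:.2e} A/sr, dE ~ {dE:.1f} eV")
```

With these assumed values the estimate lands at the electron-volt scale, consistent with the ~3 eV upper bound discussed above, while the measured spectra of Figure 5 are substantially wider.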
Thus, it was in Dyke's works that it was first established that the transition from the stationary mode of field electron emission to a non-stationary emission process is activated not by secondary processes in the vacuum diode, such as interaction of the electron flux with the anode surface or with the residual gas atoms, but by the initiation of new subprocesses developing as a result of the effect of strong electric fields on the condensed emitter matter in the high emission current density mode. This non-stationary process, which ends in the development of instability of the total emission current and destruction of the emitter tip, was called by Dyke the explosive breakdown, as mentioned earlier. Further investigations (Elinson, 1974; Ptitsin, 1996) of the phenomenon of explosive breakdown have shown, in particular, that if there is no magnetic field normal to the longitudinal axis of the emitter, then the numerical value of J_st does not usually exceed 10⁶ A/cm². It has also been shown that in the pulsed electron-emission mode the current density limit depends on the voltage pulse width τ. To distinguish the current density limit in the pulsed emission mode from that in the stationary mode, we introduce the notation J_τ. Investigations of non-stationary electron emission have shown that J_τ substantially depends on τ, and decreasing τ may change it from 10⁷ A/cm² at millisecond pulse widths up to 10⁸ A/cm² in the nanosecond pulse width range. Accordingly, the total emission current from a single emitter may vary with τ from the milliampere range up to about 1 A. It is also worth noting that in Ptitsin (1996) it was unambiguously shown that the portion of the vacuum diode CVC corresponding to the non-stationary mode of electron emission does not coincide with the F–N straight line once the emission process is no longer stationary. Dyke et al. (1953, 1956) related the non-stationary behavior of the electron-emission process to heating of the field emitter tip by the high-density emission current. They believed that the unsteadiness of emission might be caused by the development of thermal instability. The latter is the result of emitter self-heating by the flowing current, which leads to a rise of temperature and, hence, of emission current density with time. As was suggested in Dolan et al. (1953), the interrelated processes of heating and the respective emission current density growth, developing in time, may lead to an avalanche growth of the emission current and, consequently, to emitter tip melting. The temperature regime of emitter operation in the intense electron-emission mode was calculated in Dolan et al. (1953) by solving the non-stationary heat conduction equation. It was assumed that the emitter heating was due only to the Joule heat. The emitter tip geometry was approximated by a figure of revolution whose section is shaped like a truncated cone bounded by a portion of a circle. An approximate solution of the equation was obtained with the following simplifying assumptions: 1. the physical constants of the emitter substance were assumed to be independent of temperature and equal to some mean values; 2. energy losses by heat radiation were ignored; 3. the
temperature at the truncated cone base was assumed to be equal to an initial pre-set value; and 4. the conical part of the emitter was assumed to be much longer than the emitter tip radius, which is true only for small values of the emitter cone apex angular opening (α). The results of the calculations were used to obtain the curves of temperature distribution along the emitter axis as well as the kinetic curves T_a = T_a(t), where T_a is the temperature at the emitter apex and t is time. The calculated dependencies T_a = T_a(t) showed saturation. The calculated value of the current density corresponding to a temperature close to the melting point (T_m) of the W emitter is 10⁸ A/cm². So the calculations in Dolan et al. (1953) did not allow a direct explanation of the experimentally observed emission current growth with time based on the original qualitative assumptions of a possible transition of the emitter condensed matter from the stationary thermal mode to a non-stationary one at high emission current densities. Note that Dyke also suggested (1953, 1956) that the non-stationary character of the emission process might additionally be caused by evaporation and impact ionization of native atoms of the emitter substance. It was supposed (1953, 1956) that the impact ionization could result in partial cancellation of the SC field, strengthening of the electron extraction field, emission current growth, and so on. However, no quantitative calculations or estimates justifying the above model for the mechanism of emission unsteadiness in strong electric fields were made. An attempt at more correct calculations of emitter heating by the high-density emission current was made later in Gor'kov et al. (1962). Being essentially the same as in Dolan et al. (1953), the approach adopted in Gor'kov et al. (1962) additionally took into account the dependence of the emitter substance constants on temperature. Besides, the calculation results of that work are applicable to emitters with high α values (up to π/2). The main results of Gor'kov et al. (1962) are as follows. 1. The function T_a = T_a(t) is represented by a curve with saturation if J < J_cr, where J_cr is some critical value of the emission current density.
2. At J ≥ J_cr the solution of the heat conduction equation is non-stationary and, accordingly, the T_a = T_a(t) curve rises abruptly with time. Furthermore, the calculations of Gor'kov et al. (1962) qualitatively correctly represent such experimentally defined relationships as the strong dependence of the maximum stationary current density on the angle α, as well as the sharp transition from the stationary emission mode to a non-stationary one upon a relatively small increase of the voltage between the anode and cathode of the vacuum diode. However, the calculation results of Gor'kov et al. (1962) agree rather poorly with the experiments of Dyke et al. (1956) quantitatively since, according to the estimates made, J_cr ≈ 10⁸ A/cm², which is an order of magnitude higher than the maximum experimental value of the stationary current density for W emitters. In later studies of the problem of non-stationary electron emission in high electric fields (Martin, 1960; Mitterauer et al., 1975; Glazanov et al., 1989; Ptitsin et al., 1992, 1993) the Nottingham (1941) effect was taken into account in the heat calculations. As is known, this effect consists in the following: at rather low temperatures field emission proceeds from the energy region below the Fermi level, and the electrons escaping the metal are replaced by electrons injected into the metal from an external circuit with an energy approximately equal to the Fermi
energy. This dynamic process leads to metal lattice heating due to thermalization of the injected electrons. Upon emitter heating the maximum of the electron energy distribution may rise above the Fermi level and, as a consequence, the emitter tip is cooled. The transition from heating to cooling is called the inversion of the Nottingham effect, and the respective lattice temperature at which balance is reached is called the inversion temperature (T*). The influence of the Nottingham effect on the process of TFE was first discussed in Swanson et al. (1966). It was shown there that the thermal processes of lattice heating and cooling caused by this effect are confined to the emitter subsurface layer, since the electron free path length before an electron–phonon relaxation event is very small, of the order of (1–10) nm. Besides, Swanson et al. (1966) gives an expression for the inversion temperature T*. Let us consider the inferences of Swanson et al. (1966) in more detail. The average energy released in the microcrystal-emitter lattice as a result of emission of a single electron (ε_N) equals the difference between the mean energies of the emission (⟨ε⟩) and conduction (⟨ε_c⟩) electrons:

ε_N = ⟨ε⟩ − ⟨ε_c⟩

In Nottingham (1941) it is supposed that ⟨ε_c⟩ = E_F, where E_F is the Fermi energy at T = 0 K. Then the average energy of exchange ε_N is given by Levine (1962):

ε_N = πkT·cot(πkT/d),   where   d ≈ 9.76×10⁻⁹·F/(φ^(1/2)·t(y))

Obviously, if kT/d = 0.5, no energy exchange occurs and, hence, the inversion temperature is

T*(K) = 5.67×10⁻⁵·F/(φ^(1/2)·t(y))

where F comes in V/cm and φ in eV. If the temperatures of the electron (T_e) and phonon (T_ph) subsystems of the metal can be assumed to be the same (Kaganov et al., 1956), that is, T_e = T_ph = T, and the Nottingham effect is considered a purely surface one, the non-stationary thermal problem concerning field-emitter heating by the flowing TFE current is formulated as follows:

ρ(T)·c(T)·∂T/∂t = ∇·(λ(T)·∇T) + J²/σ(T) − μ(T)·(J⃗·∇T)

∇·(σ(T)·∇φ) = 0
The boundary and initial conditions for these equations are:
T J(F , T ) % (F , T ) 9 (T ) S T : $ n e 3 (T ) : J(F , T ) T : T ; T (t : 0) T n % Here 1(T ) stands for the emitter substance density; c(T ) for specific heat, 7(T ) for the thermal conduction coefficient; s(T ) for the electrical conductance coefficient; *(T ) for the Thompson coefficient; F and T for the field strength and temperature, respectively, at the emitting surface; (T ) for the surface blackness degree; S for the Stefan-Boltzmann constant; n# for the $ vector of the normal to the emitting surface; J(F , T ) for the current density at the emission boundary; T for the temperature at the emitter base; and 3 for the potential. The solution to the thermal problem thus formulated is very difficult to obtain, even numerically. Therefore, various simplifications are applied. A solution of the one-dimensional stationary thermal problem for the conical model of the emitter tip bounded by a spherical surface portion of radius r was first found in Martin, (1960); Swanson et al., (1966). The contribution of the Thompson effect, energy losses by thermal radiation, as well as temperature dependence of the coefficients 7(T ) and (T ), were ignored and the distribution of the T (6) along the emitter axis was derived in the form of 7(T )
T(x) = T₀ + Iε/(πeκ·tan²γ)·(1/x) + I²/(π²σκ·tan⁴γ)·[1/(x₀x) − 1/(2x²)]   (13)

here x is the spherical coordinate along the emitter axis, γ is the emitter cone apex opening half-angle, I is the emission current, and x₀ = r₀/tanγ. Taking x = x₀ and J = I/(πx₀²·tan²γ), we have

T_s = T₀ + Iε/(πeκ·r₀·tanγ) + I²/(2π²σκ·r₀²·tan²γ)   (14)

or

T_s = T₀ + Jεr₀/(eκ·tanγ) + J²r₀²/(2σκ·tan²γ)   (15)

After differentiating Equation (13), we obtain

∂T/∂x = −Iε/(πeκ·x²·tan²γ) − I²/(π²σκ·tan⁴γ)·[1/(x₀x²) − 1/x³]

whence it follows that the maximum temperature is reached at the point
x = x_m
V. E. PTITSIN
x_m = x₀·[1 + εσ/(eJx₀)]⁻¹

The derivative of the function T(x) at the point x = x_m is defined by

(∂T/∂x)|_{x=x_m} = −Iε/(πeκ·x_m²·tan²γ) = −εJ(x_m)/(eκ)
Substitution of typical mean experimental values of the parameters (σ, κ, γ, ε_N) for the W emitter into Equation (15) yields numerical estimates of the J values at which the emitter surface temperature T_s reaches (2000–2500) K. Calculations show that for γ = (0.01 to 0.03), κ = 1 W/(cm·K), σ = (1.3–2.0)×10⁵ Ω⁻¹cm⁻¹, ε_N = (0.28 to 0.33) eV (Swanson et al., 1966, 1973), and T₀ = 300 K, the above-mentioned temperatures are reached at J ≈ (1–3)×10⁸ A/cm². The obtained estimate of J agrees well with the experimental data of Ptitsin et al. (1985, 1986, 1996), where it was shown that it is at precisely these emission current densities that the emission anomalies outlined above go beyond the theoretical conceptions of the physical mechanism of TFE. The agreement between the theoretical calculations and the experimental data shows that the thermal problem in Martin (1960) and Swanson et al. (1966) is stated quite correctly. However, the solution of the stationary thermal problem obtained in Martin (1960) and Swanson et al. (1966) does not answer the main question: what, in fact, is the mechanism that causes the emission current to grow with time if the emitter tip temperature at J ~ 10⁸ A/cm² remains substantially below the melting point? The studies of Glazanov et al. (1989) and Vibrans (1964) have also not clarified this question, since their results do not agree with experimental data because of the considerable simplifications adopted in the statement of the non-stationary emitter-heating problem. Note that interesting results were obtained in Glazanov et al. (1989), where the authors calculated the kinetics of pulsed heating of pointed emitters of real geometry by the high-density emission current. The main results of that work can be formulated in the following way: if the initial electron emission current density exceeds some critical value (~10⁸ A/cm²), then the bulk emitter temperature, the emission current, and the bulk heat-release power show an avalanche-like increase with time due to the development of thermal instability.
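The estimate above can be reproduced with a few lines of Python. The sketch assumes the conical-model expression of the form T_s = T₀ + Jεr₀/(eκ·tanγ) + J²r₀²/(2σκ·tan²γ) and one illustrative parameter set (r₀ = 0.1 μm, tanγ = 0.2 — assumed values, not the author's), to show T_s crossing the (2000–2500) K window as J passes through the 10⁸ A/cm² range.

```python
def surface_temperature(J, r0=1e-5, tan_g=0.2, kappa=1.0,
                        sigma=1.5e5, eps=0.3, T0=300.0):
    """Conical-model surface temperature, Eq. (15) style.
    J in A/cm^2, r0 in cm, kappa in W/(cm K), sigma in 1/(Ohm cm),
    eps is the Nottingham energy per electron in eV (eps/e in volts).
    All parameter values here are illustrative assumptions."""
    nottingham = J * eps * r0 / (kappa * tan_g)
    joule = J**2 * r0**2 / (2.0 * sigma * kappa * tan_g**2)
    return T0 + nottingham + joule
```

With these inputs surface_temperature(1e8) ≈ 1.9×10³ K and surface_temperature(1.5e8) ≈ 2.7×10³ K, i.e., the (2000–2500) K range is crossed at J of the order of 10⁸ A/cm².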
During heating of the emitter by the emission current, the bulk of the emitter exhibits a region of higher temperature (as compared with that at the emitter surface), which causes thermal and elastic stresses sufficient for mechanical destruction of the emitter.
III. Phenomenological Model of Non-Stationary Thermal Field Emission

A. Heating of the Pointed Microcrystal Emitter Tip by the Emission Current Flow

As follows from the data given in the preceding sections, in the high-emission-current-density mode one can observe a number of effects and processes that are not amenable to adequate treatment in the framework of the existing canonical conceptions applied to the phenomenon of TFE. However, the fact that under such conditions the emission process is influenced by heating of the MC tip by the high-density emission current is beyond doubt. As mentioned above, thermal calculations of emitter tip heating by the emission current agree poorly with experimental data. Analysis of the results of these calculations, together with new experimental data of Ptitsin et al. (1985, 1986) that characterize some specific features of the emission process in high electric fields, shows that the disparity between the predicted and experimental data may be related to the fact that all the works mentioned ignored the nonlocality of "hot" hole energy dissipation. These holes are formed in the subsurface layer of the emitter as a result of electron emission from the energy states below the Fermi level (Nottingham, 1941; Swanson and Bell, 1973). Besides, these works ignored the fact that at intense heating of the MC tip a noticeable contribution to the thermal balance might be made by thermally activated self-diffusion and also by evaporation of the MC native atoms into vacuum. Simultaneous consideration of all these factors when solving the heat conduction equation does not seem possible. Therefore, it was suggested (Ptitsin et al., 1992, 1993) to separate the solution of the heat conduction equation and the analysis of the thermally activated processes into two interrelated problems. Solving the heat conduction equation with regard to nonlocal hole energy dissipation yielded the temperature distribution at the MC tip as a function of J, ε, and other parameters.
Then the energy balance equation for the surface processes was analyzed at a given surface temperature T_s to evaluate the contribution of each thermally activated surface process to the dissipation of the heat-energy flux, which (due to the nonlocal dissipation of hole energy) propagates from the inside of the MC substance toward its emitting surface. The energy relaxation of hot holes produced in the course of tunneling emission from the states below E_F was considered in the framework of the conceptions developed in Gadzuk and Plummer (1973). According to those conceptions, a "hot" hole is localized in a subsurface layer about 0.5 nm thick.
The energy of the hot holes is reduced through scattering of conduction electrons by the holes, and the excess energy passes to electrons whose energy is close to E_F. The characteristic time of this process is the time of electron–electron interaction, ~10⁻¹⁴ s (Ziman, 1962). After that, electrons of energy above E_F give their energy to the lattice at a distance approximately equal to the characteristic length of electron–phonon interaction, λ_ep, from the emitting surface. It is natural to assume that the distribution of the distances (from the emitter surface) at which electrons give up their energy to the lattice is normal. Then, in spherical coordinates, the stationary heat conduction equation in which the energy dissipation processes connected with the Nottingham effect are considered in terms of a volume heat-release source can be written as (Ptitsin, 1992, 1993)

∂²T/∂x² + (2/x)·∂T/∂x + I²/(π²σκ·x⁴·tan⁴γ) + [Iε/(√(2πD)·πeκ·x²·tan²γ)]·exp[−(2D)⁻¹·(x − x₀ − ⟨λ_ep⟩)²] = 0   (16)

where D is the variance of the random value λ_ep; ⟨λ_ep⟩ is the mean value of λ_ep; κ and σ are the coefficients of heat and electrical conductivity, respectively. The boundary conditions for this equation are
(∂T/∂x)|_{x=x₀} = C,   T|_{x→∞} = T₀

where T₀ is the emitter base temperature, and C (C ≥ 0) is a dimensional constant for a given field strength value at the emitter tip (F_s). The constant value C may be determined from the condition that the derivative of the function T(x) vanishes at the point x = x₀ + ⟨λ_ep⟩. This condition follows from the known proposition that the temperature maximum in the bulk of a substance is located at the point corresponding to the maximum volume density of heat-release power. Ignoring the temperature dependence of the functions κ(T) and σ(T), the solution of the heat conduction equation for the temperature distribution along the emitter axis T(x) can be written as
T(x) = T₀ − (x₀²C)/x + I²/(π²σκ·tan⁴γ)·[1/(x₀x) − 1/(2x²)] + Iε/(2πeκ·tan²γ)·∫ₓ^∞ [erf(⟨λ_ep⟩/√(2D)) + erf((z − x₀ − ⟨λ_ep⟩)/√(2D))]·dz/z²

≈ T₀ − (x₀²C)/x + I²/(π²σκ·tan⁴γ)·[1/(x₀x) − 1/(2x²)] + Iε/(2πeκ·x·tan²γ)·[erf(⟨λ_ep⟩/√(2D)) + erf((x − x₀ − ⟨λ_ep⟩)/√(2D))],   x₀ ≤ x < ∞
where the constant C for a given value of the field strength at the emitter tip is defined by the following relationship:

C ≈ Iε/(2πeκ·x₀²·tan²γ)·erf(⟨λ_ep⟩/√(2D)) + I²/(π²σκ·tan⁴γ)·(x_m − x₀)/(x₀³·x_m)
  ≈ (Jε/(2eκ))·erf(⟨λ_ep⟩/√(2D)) + J²⟨λ_ep⟩/(σκ)   (17)
Substituting the relationships x = x₀, x₀ = r₀/tanγ, and J = I/(πr₀²) into T(x) yields the expression for the surface temperature at the emitter tip:

T_s ≈ T₀ + J²r₀²/(2σκ·tan²γ) + (Jεr₀/(2eκ·tanγ))·[1 + 2·erf(⟨λ_ep⟩/√(2D))·(⟨λ_ep⟩·tanγ/r₀)]   (18)
This expression, accurate to within the factor 1/2 in the addend defining the contribution from the Nottingham effect, coincides with the similar expression (14) given in (Martin, 1960; Swanson and Bell, 1973). This implies that consideration of the nonlocality of hot-hole energy dissipation does not result in a substantial correction of the earlier obtained numerical estimates of T_s. However, it is worth noting that the solution of the thermal problem given above nevertheless differs basically from the solutions obtained earlier (Martin, 1960; Swanson et al., 1966, 1973). The difference is that, owing to the nonlocality of hole energy dissipation, the function T(x) has a maximum in the subsurface layer of the emitter at the point x = x_m. The maximum of T(x) at high J values means that in the intense TFE mode there exists a heat flux propagating from the bulk of the emitter material toward the emitter surface. The flux power density (q⃗ = −κ∇T) at high J levels may reach values up to (10⁷–10⁸) W/cm². For the emission stability conditions to be retained, this heat flux directed toward the emitter surface must be effectively dissipated through the activation of various surface processes. In this connection a natural question arises about the energy dissipation mechanism of this heat flux. The analysis of this question is given in the subsequent sections. However, prior to this analysis it is useful to consider a consequence of the solution found for the thermal problem. In particular, the solution given above can provide numerical estimates of the maximum, or so-called limiting, values of the current density (J_lim) in the
stationary mode of TFE. It would be natural to assume that, by definition, J_lim corresponds to those J values at which the temperature T_s becomes equal or close to the melting point T_m. Of course, the actual experimental values of the current-density limit should satisfy the inequality J < J_lim. By using the Wiedemann–Franz law, Equation (18) is easily transformed into

T_s ≈ [T₀ + Jεr₀/(2eκ·tanγ)]·[1 − J²r₀²L/(2κ²·tan²γ)]⁻¹   (19)
where L is the Lorenz number. The latter expression is conveniently represented in the dimensionless form

θ = (1 + αu)·(1 − u²)⁻¹   (20)

where

θ ≡ T_s/T₀;   u ≡ J/J₀;   α ≡ (2L)⁻¹ᐟ²·ε/(eT₀);   J₀ ≡ √2·κ·tanγ/(√L·r₀)

Note that J₀ formally corresponds to the emission current density value at which T_s → ∞. From Equation (20), after rearrangements, we obtain

J_lim = J₀·[√(α² + 4θ(θ − 1)) − α]/(2θ)   (21)
The latter expression can yield numerical estimates of J_lim for emitters made of W, Mo, and Nb. Taking T₀ = 300 K, tanγ ≈ 0.5, r₀ ≈ 0.3 μm, ε ≈ 0.3 eV (Swanson et al., 1966, 1973), and T_s = T_m and substituting the respective tabular data gives the following estimates: J_lim ≈ 1.8×10⁸ A/cm² (for W); J_lim ≈ 1.6×10⁸ A/cm² (for Mo); J_lim ≈ 1.2×10⁸ A/cm² (for Nb). Comparison of the theoretical estimates of J_lim with the experimental data (Table 1) suggests the following implications.
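The limiting-current estimate can be sketched numerically. The code below implements Equations (20)–(21) in the reconstructed form θ = (1 + αu)(1 − u²)⁻¹, u_lim = [√(α² + 4θ(θ−1)) − α]/(2θ); the inputs κ = 1 W/(cm·K), tanγ = 0.5, r₀ = 0.3 μm are assumptions, so the result reproduces only the order of magnitude of the estimates quoted above.

```python
import math

def j_limit(T_m, T0=300.0, eps=0.3, kappa=1.0, tan_g=0.5, r0=3e-5,
            L=2.44e-8):
    """Limiting current density (A/cm^2) from the dimensionless form of
    the surface-temperature equation. eps/e in volts; L is the Lorenz
    number in V^2/K^2; r0 in cm. Parameter values are illustrative."""
    theta = T_m / T0                            # melting point over base temperature
    alpha = eps / (T0 * math.sqrt(2.0 * L))     # Nottingham strength parameter
    J0 = math.sqrt(2.0) * kappa * tan_g / (r0 * math.sqrt(L))  # divergence density
    u = (math.sqrt(alpha**2 + 4.0 * theta * (theta - 1.0)) - alpha) / (2.0 * theta)
    return u * J0

J_W = j_limit(3695.0)   # tungsten, T_m = 3695 K
```

With these assumptions J_W comes out at ≈1.2×10⁸ A/cm², the same order as the estimates in the text; the ratio u = J_lim/J₀ stays below 1, as it must, since T_s formally diverges at J → J₀.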
TABLE 1
Dependence of the Average Experimental Values ⟨J_c⟩ and ⟨J_lim⟩ on the Emitter Substance

Emitter substance            W       Mo      Nb
⟨τ⟩ value, ms                0.3     0.6     0.5
⟨J_c⟩ × 10⁻⁷, A/cm²          9.3     7.5     4.3
⟨J_lim⟩ × 10⁻⁷, A/cm²        10      8.0     5.0
1. The theoretical value of J_lim for W agrees well with the results of Dyke's special experiments, in which a magnetic field transverse to the emitter axis was created at the emitter surface in ultrahigh vacuum. This agreement between theory and experiment allows one to accept that the solution of the thermal problem given above is quite correct. Based on the assumption that the solution obtained is correct, one may now conclude that the disparity between the theoretical estimates of J_lim and the experimental data (Table 1) is due to the fact that the instability arising at current densities of about 10⁸ A/cm² with no magnetic field applied takes place at temperatures substantially below the melting point of W (T ≈ (1800–2000) K). The transverse magnetic field "shifts" the onset of the emission instability development to the region of higher current densities and, hence, higher emitter-substance temperatures (up to T_m).

2. The numerical theoretical estimates of J_lim for the various emitters (W, Mo, Nb) differ only slightly, which may be explained by the closeness of the respective coefficients κ to each other at temperatures near the melting point.

3. The theoretical estimates of J_lim exceed the experimental ones by a factor of 2–3, which, in accordance with Implication 1, proves that the emission instability leading to explosive breakdown develops at T ≤ (0.7–0.8)T_m.

4. In accordance with the experimental data (Ptitsin, 1996), the current-density limit J_lim ∝ tanγ/r₀.

To summarize, it may be stated that the emission instability in strong electric fields is undoubtedly activated by heating of the metal lattice by the high-density emission current, but this is not the only reason for the development of explosive breakdown, since the instability appears at relatively low metal temperatures T ≤ (0.7–0.8)T_m.

B. Instability of the Microcrystal Emitter Tip Surface During Non-Stationary Thermal Field Emission

As follows from the thermal calculations, the nonlocality of hot-hole energy dissipation is responsible for the temperature maximum occurring in the subsurface layer of the emitter material at a distance ≈⟨λ_ep⟩ from the emitting surface. This, according to general thermodynamic concepts, implies that intense TFE is accompanied by a heat flux whose density vector is directed from the inside of the emitter material toward the emitting surface. This suggests the necessity of analyzing the possible dissipation mechanisms of this heat flux. To this end, a local equation of balance for the energy flux densities across the emitting surface was considered in Ptitsin et al. (1992, 1993). According to
Equation (17), one may write
(Jε/(2e))·erf(⟨λ_ep⟩/√(2D)) + J²⟨λ_ep⟩/σ ≈ η·σ_SB·T_s⁴ + (ω·n_s·D_s·(∇T)_τ/(kT_s))·(l/ΔS) + Λ·n_s·f·exp(−Λ/kT_s)

Here n_s is the concentration of the emitter material native atoms; ω is the heat of thermal transfer (Geguzin et al., 1984); D_s is the coefficient of surface self-diffusion; (∇T)_τ is the tangential component of the vector ∇T; ΔS is a physically infinitesimal element of the emitting surface at the emitter tip; l is the length of the outline enclosing the element ΔS; Λ is the binding energy of the adatom with the emitting surface; and f ≈ kT_s/h is the adatom thermal-vibration frequency. Note that the physical meaning of the balance equation is nothing but the energy conservation law for the thermal field processes that take place at the emitting surface at intense electron emission. For ⟨λ_ep⟩ ≈ (5–10) nm (College papers, 1973), ⟨λ_ep⟩ ≈ √(2D), ε ≈ 0.3 eV (Swanson et al., 1966, 1973), and an emitter made of W, at J ≈ J_lim the numerical value of the left-hand side of the balance equation will be below ~10⁸ W/cm², while the heat-radiation power density will not exceed ~10² W/cm². To evaluate the contribution of the second addend on the right-hand side of the energy balance equation, we will rearrange it first. Using the conical model of the emitter bounded by a portion of a spherical surface of radius r₀ and choosing an element of the spherical surface at the emitter tip as the element ΔS, one can write

∇T = (∇T)_τ·e⃗_τ + (∇T)_n·e⃗_n,   (∇T)_τ = ∂T/∂s = (∂T/∂J)·(∂J/∂s)   (22)

where it is assumed that the temperature distribution in the vicinity of the emitter tip apex is described by a certain function T = T(J(ψ(s))); e⃗_τ, e⃗_n are the unit vectors, tangent and normal, respectively, at the point M ∈ ΔS (note that, because of the axial symmetry, the vectors e⃗_τ, e⃗_n, and ∇T lie in the meridional plane passing through this point and the axis of the emitter); ψ is the polar angle; ∂T/∂s, ∂T/∂n are the derivatives of the function T = T(J(ψ(s))) with respect to the corresponding directions; and s, n are the curvilinear coordinates along the e⃗_τ, e⃗_n directions.
Taking J(ψ) = J_a·cosψ, where J_a is the current density at the emitter tip apex, and using Equations (20) and (22) together with the relationship ∂ψ/∂s = −1/r₀, after rearrangement one obtains
(Ptitsin, 1993)

(ω·n_s·D_s·(∇T)_τ/(kT_s))·(l/ΔS) ≈ (1 − cosψ)·n_s·D_s·ω·[α + 2u + αu²]/(kT_s·r₀·(1 + u))   (23)

where u ≡ J/J₀ and α are the dimensionless parameters introduced in Equation (20). It is evident that in the limit ψ → 0 this expression defines the local energy flux density due to heat transfer at the emitter tip. To evaluate the right-hand side of Equation (23) numerically, one should define three parameters: n_s, ω, and D_s. Determining the numerical value of n_s at intense electron emission is a separate, nontrivial problem. The solution of this problem is given in Section III.C. Here we only note that, according to the solution found (Ptitsin, 1990, 1991), at T ≤ (0.7–0.8)T_m the concentration of the emitter material native atoms residing at the emitting surface in the two-dimensional gas state is n_s ≈ 10¹⁵ cm⁻². To obtain an "upper" estimate of the coefficient D_s, we use the known expression for the coefficient of surface self-diffusion in the Arrhenius form

D_s = D₀·exp(−E_d/kT)

where D₀ is the so-called pre-exponential factor, and E_d is the activation energy of surface migration. As far as we know, at present there are no analytical expressions for the factor D₀ at T ≤ (0.7–0.8)T_m. So we will use the experimental data of Zhdanov (1988), according to which the maximum value of the coefficient D_s for the W MC does not exceed ~10⁻⁵ cm²/s. At intense electron emission (Ptitsin, 1991), as a result of the thermally activated destruction of the monoatomic terrace ledges exposed at the emitting microcrystal surface, the value of ω can be evaluated on the basis of the TLK (terrace–ledge–kink) model (College papers, 1959). This model gives for W(110) the value ω ≈ 5 eV (Geguzin et al., 1984). With these data in mind, we find (Ptitsin, 1990, 1991) that at J ≈ J_lim the thermal-transfer energy flux density (due to surface self-diffusion over the nonisothermal emitter surface) does not exceed ~10² W/cm².
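The disparity between the three dissipation channels can be illustrated with order-of-magnitude numbers (all inputs below — J = 10⁸ A/cm², ε = 0.3 eV, ⟨λ_ep⟩ = √(2D), emissivity 0.3, T_s = 2500 K — are assumed, and the surface-diffusion channel is represented simply by its ~10² W/cm² upper bound):

```python
import math

SIGMA_SB = 5.67e-12   # Stefan-Boltzmann constant, W/(cm^2 K^4)

def incoming_flux(J, eps=0.3, erf_arg=1.0):
    """Nottingham part of the incoming heat flux, (J*eps/2e)*erf(...), W/cm^2."""
    return 0.5 * J * eps * math.erf(erf_arg)

def radiation_flux(T_s, emissivity=0.3):
    """Thermal radiation power density, W/cm^2."""
    return emissivity * SIGMA_SB * T_s**4

q_in = incoming_flux(1.0e8)     # ~1e7 W/cm^2 arriving at the surface
q_rad = radiation_flux(2500.0)  # ~7e1 W/cm^2 radiated away
q_diff_max = 1.0e2              # assumed upper bound for diffusive transfer
```

Radiation and surface-diffusion transfer together fall short of the incoming flux by several orders of magnitude, which is the quantitative content of the conclusion that activated evaporation must carry the remainder.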
Taking into account that the obtained value of the thermal-transfer energy flux density is an "upper" estimate, we conclude that at T ≤ (0.7–0.8)T_m the energy balance equation holds only if the general dissipation mechanism of the thermal flux directed from the inside of the emitter substance toward the emitting surface includes, in addition to the processes of heat radiation and transfer, the process of activated evaporation of the emitter material native atoms, which go from the binding states in the two-dimensional gas to free states in vacuum. Based on this conclusion and using Equations (19)–(21) for J ≈ J_lim, we obtain the mean effective binding energy of the adatom with the W emitter surface: Λ ≈ 2.5 eV. Note that this value of Λ is less than the binding
energy of the W adatom with the W(110) surface when the metal surface is not exposed to a high external electric field. The difference in the binding-energy values is quite significant and equals ≈2 eV. This question calls for further investigation, since for now it is difficult to give a reasonable explanation for this disparity. Nevertheless, the general inference of this section, based on the fundamental concepts of thermodynamics, is beyond question: during intense electron emission (J ≥ 10⁸ A/cm²) the emitting surface of a pointed metal MC is transformed into an effective emitter of native neutral atoms. In other words, under these conditions the emitter surface becomes unstable and acts as a source of neutrals (Ptitsin, 1990, 1991, 1993).

C. Determination of the Surface Concentration of the Microcrystal Emitter Substance Native Atoms in the Two-Dimensional Gas State

To calculate the concentration of native atoms on the emitting surface in the two-dimensional gas state at intense emission, consider a pointed W(110) MC in an ultrahigh-vacuum environment. The estimation of the sought concentration n_s at the MC emitter apex will be carried out in the context of the TLK model (College papers, 1959). The activation energies of atom transitions from the various surface states will be calculated in terms of the theory of pairwise interactions. As a model of the real emitter we consider a W MC of ⟨110⟩ orientation bounded by a portion of a spherical surface (Muller and Tsong, 1972; Figure 96). In this model the upper part of the hemisphere consists of a stack of atomic W(110) planes superimposed on one another. An average value ⟨n(x)⟩ will be used as a mean concentration of n_s on the concentric monoatomic ledges of width b and height s (Muller and Tsong, 1972):
s = a/(δ·√(h² + k² + l²))
where a is the lattice parameter; h, k, l are the Miller indices for the given orientation; δ is a coefficient equal to 1 if (h + k + l) is an even number, and δ = 2 if (h + k + l) is an odd number. In our case s = a√2/2, a = 0.316 nm. The ledge sizes b_i were calculated from the relationships (Muller and Tsong, 1972)

b_i = r₀·(sinθ_i − sinθ_{i−1}),   i = 2, 3, …

θ_i = arccos(1 − i·s/r₀),   i = 1, 2, …

b₁ = r₀·sinθ₁
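A short Python sketch of the ledge-geometry relations above. It reproduces the mean terrace widths quoted in the text for the W(110) step height s = a√2/2 ≈ 0.223 nm; the choice of averaging over the ten ring terraces following the central cap is an assumption that matches the quoted numbers.

```python
import math

def terrace_widths(r_nm, s_nm=0.2234, n=12):
    """Ring-terrace widths b_i on a spherical cap of radius r (TLK model).
    theta_i is the polar angle of the i-th step edge, depth i*s below the apex."""
    theta = [math.acos(1.0 - i * s_nm / r_nm) for i in range(n)]
    return [r_nm * (math.sin(theta[i + 1]) - math.sin(theta[i]))
            for i in range(n - 1)]

def mean_terrace_width(r_nm):
    """Arithmetic mean over ten ring terraces (central cap b_1 excluded)."""
    b = terrace_widths(r_nm)
    return sum(b[1:11]) / 10.0
```

mean_terrace_width gives ≈2.7 nm for r₀ = 300 nm, ≈3.5 nm for r₀ = 500 nm, and ≈4.9 nm for r₀ = 1000 nm, in line with the values quoted below.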
As a measure of the terrace width at the emitter tip of a given radius, we take the arithmetic mean ⟨b⟩ over the first ten terraces. Then, for example, for r₀ = 300 nm, ⟨b⟩ ≈ 2.7 nm; for r₀ = 500 nm, ⟨b⟩ ≈ 3.5 nm; and for r₀ = 1000 nm, ⟨b⟩ ≈ 4.9 nm. This means that for the adopted model the terraces are narrow concentric rings. Therefore, based on the model symmetry, the problem of adatom diffusion on a terrace may be solved not for ring terraces, but rather for long plane-parallel ledges in Cartesian coordinates in the plane. Note that such a simplification eliminates Bessel functions from the solution of the diffusion problem. To define the n(x) distribution over a certain terrace, it is necessary to find a solution to the equation (College papers, 1959)

dj/dx = j_v   (24)

where j = −D_s·(∂n/∂x) is the atomic flux escaping the ledge; x is the coordinate counted along the normal to the ledge kink; and j_v is the atom evaporation–condensation flux density at a given point of the ledge. For evaporation into vacuum
j_v ≈ n(x)·(kT/h)·exp(−E_ev/kT) − ν   (25)

where E_ev is the activation energy of adatom evaporation from the terrace surface, and ν is the condensation flux density. As shown in Ptitsin (1990) and
Hirth and Pound (1957), at intense electron emission ν → 0. According to the Hirth and Pound theory (1957), the relationship between n(x) and the adatom concentration at the equilibrium pressure, n_eq(x), is of the form

n(x)/n_eq(x) = 2P/(3P_eq) + 1/3

where P_eq and P are the equilibrium and nonequilibrium metal-vapor pressures, respectively (P ≤ P_eq). From this relationship it is easily seen that at P = 0 (ν = 0) the values of n(x) and n_eq(x) will differ at most by a factor of 3. So if the term ν is neglected, the error of the evaporation–condensation flux density calculation also does not exceed a factor of 3, which is quite acceptable for our purposes. Thus, in view of the aforesaid, Equation (24) can be written as (Ptitsin, 1991)
∂²n(x)/∂x² − n(x)·(kT/(hD_s))·exp(−E_ev/kT) = 0   (26)

To specify the boundary conditions, in addition to this equation, it is necessary to define the average adatom displacement ⟨δ⟩ for the time
preceding the evaporation event, τ_ev. The time τ_ev can be evaluated based on the conceptions developed by Frenkel (1958):

τ_ev ≈ (h/kT)·exp(E_ev/kT)

If the adatom movement over the W(110) terrace is considered as a two-dimensional random walk, we obtain

⟨δ⟩ = √(4·D_s·τ_ev)

For W(110) we assume the following values of the parameters: E_ev = 4.4 eV (Nakamura and Kuroda, 1969), E_d ≈ 3.1 eV (Sokolskaya, 1956; Barbour et al., 1960), D₀ ≈ 3.6×10⁻² cm²/s (Barbour et al., 1960). Note here that the difference in the numerical values of E_d given in Sokolskaya (1956) and Barbour et al. (1960) may perhaps be explained by the contribution of the entropy factor (Muller and Tsong, 1972). Calculations of ⟨δ⟩ for various T show that at relatively high temperatures (≥1500 K) the relationship ⟨b⟩ ≪ ⟨δ⟩ holds well. Therefore, the boundary conditions for Equation (26) can be expressed as
n(x)|_{x=0} = n(0) ≡ n_st,   n(x) ≥ 0

In view of the remarks made and the boundary conditions specified above, the solution of Equation (26) for the adatom concentration on the terrace will be (Ptitsin, 1990)
n(x) = n_st·sh[q·(⟨b⟩ − x)]/sh[q·⟨b⟩],   q ≡ (kT/(hD_s))^{1/2}·exp(−E_ev/(2kT))

The mean concentration of adatoms may be given by
kT E (b exp 9 hD 2kT (n (6) : n · kT E (b exp 9 hD 2kT In the above expression, n is an uncertain quantity. To define it, we will use the following relationship (College papers, 1959) cth
j·y = q₊ − q₋

where q₊ is the atomic flux density when the atoms go from their states at
the ledge step to the two-dimensional gas state on the terrace; q₋ is the atomic flux density when the atoms go from the two-dimensional gas state on the terrace to their states at the ledge; and y is the ledge length per unit crystal surface. Substitution of the expression for ⟨n(x)⟩ as well as the known expressions for q₊, q₋ given in (College papers, 1959) into the last relationship and simple rearrangements yield (Ptitsin, 1991, 1993)

⟨n(x)⟩ ≈ (n₀/⟨b⟩)·exp(−(E₁ + E₂ + E₃)/kT)·[exp(−E₁/kT) + (1/(q⟨b⟩))·exp(−E_d/kT)·cth(q⟨b⟩)]⁻¹

where n₀ is the number of adsorption sites per unit monoatomic ledge length (3.8×10⁷ cm⁻¹); E₁ is the activation energy of the transition from the kink to adsorption at the ledge; E₂ is the activation energy of the transition from adsorption at the ledge to adsorption in the plane; and E₃ is the energy of kink formation at the ledge. In the framework of the theory of pairwise interactions, the activation energies E₁, E₂, E₃ can be expressed in terms of two parameters: φ and Δ*, where φ is the enthalpy of surface-diffusion activation, and Δ* is the energy of bond breaking at the kink–ledge adsorption transition. Based on the results of Wang and Tsong (1982), φ ≈ 0.9 eV can be considered quite a credible value of φ for W(110). Then the value of Δ* for W(110) can be obtained from the relationship (College papers, 1959) Λ_s = 4φ + 3Δ*, where Λ_s is the sublimation energy for W. The numerical value of Λ_s is 8.66 eV, so Δ* ≈ 1.7 eV. Using the thermal-calculation data for T_s and the expressions for the activation energies given in Barbour et al. (1960), we obtain ⟨n(x)⟩ ≈ (10¹⁴–10¹⁵) cm⁻² at T_s ≈ (2000–2500) K. This means that, as a result of emitter tip heating by the high-density emission current, the emitting surface will be coated by about a monolayer of native atoms in the two-dimensional gas state (Ptitsin, 1991). This agrees with the data of direct experiments (Drechsler, 1988). Note here that, generally speaking, the activation energies for the various transitions depend on the field strength, because the values of the polarizability and of the surface-induced dipole moment for adatoms differ from zero. However, the contribution of these polarization terms can be ignored, since in this case the relative changes in the activation energies do not usually exceed ±10%.
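The adatom profile obtained from Equation (26) can be checked numerically. The sketch below (Python; E_ev = 4.4 eV, E_d = 3.1 eV, D₀ = 3.6×10⁻² cm²/s as above, with n_st normalized to 1) verifies that the sh-profile satisfies the diffusion–evaporation equation and evaluates the mean coverage factor ⟨n⟩/n_st.

```python
import math

K_B = 8.617e-5   # Boltzmann constant, eV/K
H = 4.1357e-15   # Planck constant, eV*s

def q_inv_length(T, E_ev=4.4, D0=3.6e-2, E_d=3.1):
    """Decay constant q = sqrt(nu_ev / D_s), 1/cm."""
    nu_ev = (K_B * T / H) * math.exp(-E_ev / (K_B * T))   # evaporation rate, 1/s
    D_s = D0 * math.exp(-E_d / (K_B * T))                 # diffusion coeff., cm^2/s
    return math.sqrt(nu_ev / D_s)

def profile(x, b, q, n_st=1.0):
    """n(x) = n_st * sh(q(b-x))/sh(qb): solution of n'' = q^2 n, n(0)=n_st, n(b)=0."""
    return n_st * math.sinh(q * (b - x)) / math.sinh(q * b)

def mean_coverage(b, q):
    """<n>/n_st = (ch(qb) - 1)/(qb * sh(qb))."""
    return (math.cosh(q * b) - 1.0) / (q * b * math.sinh(q * b))
```

For T = 2000 K and ⟨b⟩ = 2.7 nm, q⟨b⟩ ≈ 0.2, so the profile is nearly linear across the terrace and the mean coverage factor is close to 1/2; a finite-difference check confirms n″ = q²n to high accuracy.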
The calculations made for ⟨n(x)⟩ and T_s can be used, in particular, to estimate the emitter shortening rate due to adatom evaporation from the MC surface. The rate of pointed-emitter length decrease with time, ∂l/∂t, in
the course of emitter heating can be expressed as
∂l/∂t ≈ a³·⟨n(x)⟩·(kT_s/h)·exp(−E_ev/kT_s)

This gives ∂l/∂t ≈ 1 μm/s at T_s ≈ 2500 K and E_ev ≈ 4.4 eV for W⟨110⟩ emitters. The obtained shortening rate may at first glance appear rather significant. However, this estimate is true only for small tip radii and small angles γ. As a real emitter shortens, its parameters r₀ and γ increase substantially and, hence, the factor n₀/⟨b⟩, the concentration ⟨n(x)⟩, and ∂l/∂t will essentially decrease with time during high-temperature baking. This "dimensional" effect is well known to specialists in field-emission electron and ion microscopy: at high-temperature heating of small-radius (r₀ ≤ 0.1 μm) pointed microcrystals with no electric field applied, the MC tips are rapidly blunted up to about 1 μm as a result of thermal evaporation and surface self-diffusion. In the course of emitter tip blunting, the tip retains a flattened roundedness. Besides, this process is usually accompanied by deterioration of the vacuum conditions inside the experimental instrument or system. A qualitatively different picture is observed at high-temperature heating of tips in a strong external electric field. As known from Sokolskaya (1956), under such conditions a so-called thermal rearrangement phenomenon takes place. In addition, a high-density current pulse flowing through the emitter tip initiates the effect of spontaneous rearrangement (Krotevich et al., 1985). In this connection it is useful to consider in more detail the processes developing at the MC–vacuum boundary during high-temperature heating of a pointed emitter in a strong external electric field.

D. Characteristic Features of the Emitter Native Neutral Atoms' Motion After Evaporation from the Emitting Surface into Vacuum

In a high inhomogeneous electric field a neutral atom at the MC tip surface experiences a polarization force (Ptitsin, 1991; Muller and Tsong, 1972)

f⃗_p = (α_p/2)·grad(F²)

where α_p is the atom polarizability.
To find qualitative estimates of the main characteristics of the atom behavior upon evaporation, consider a model problem with a spherically symmetric electric field distribution at the tip apex:

F(x) = F₀·(r₀/x)²

here F₀ is the field strength at the apex of a spherical emitter of radius r₀. After
an evaporation event, the atom energy W at an arbitrary point above the emitter surface is defined by

W = Mẋ²/2 + L²/(2Mx²) − α_p·F²(x)/2   (27)

where M is the mass of the emitter substance atom, and L is the angular momentum of the atom relative to the field center (the center of the sphere). The radial part of the motion in this expression may be considered as one-dimensional motion in a force field with the "effective" potential energy U_eff (Landau and Lifshits, 1973)

U_eff = L²/(2Mx²) − α_p·F²(x)/2

The values x = x_t at which U_eff = W define the bounds of the atom motion region because, when this condition is met, the radial velocity ẋ goes to zero. The equality ẋ = 0 corresponds to the turning point of the trajectory, at which the function x(t) changes from increasing to decreasing. The atom movement will be finite if the inequality U_eff = W has a solution. If this condition is satisfied, upon evaporation the atom will be "locked" in a potential trap due to the effect of the polarization force on this atom. It may be supposed that the mean kinetic energy of the atom at the moment of evaporation is equal to ≈kT_s, where T_s is the emitter surface temperature, and that the probability density of the atom leaving the surface at an angle ϑ to the normal to the surface is ≈2/π; hence, the mean angle ⟨ϑ⟩ equals π/4. In the general case, based on the energy conservation law, one may write

kT_s − α_p F₀²/2 = kT_s·sin²ϑ·(r₀/x_t)² − (α_p F₀²/2)·(r₀/x_t)⁴   (28)

whence it follows that for finite movement

x_t = r₀·{sin²ϑ·[1 + √(1 − 4η(1 − η)/sin⁴ϑ)]/(2η)}^{−1/2}   (29)

where the parameter η ≡ α_p F₀²/(2kT_s) must obviously satisfy the relationship

η ≥ (1/2)·(1 + √(1 − sin⁴ϑ))

If one takes ϑ = ⟨ϑ⟩ = π/4, then η ≥ 0.933. Note that at ϑ = ⟨ϑ⟩ = π/4 and η = 1 the maximum separation of the atom from the emitter surface is
x_t − r₀ = (√2 − 1)·r₀, i.e., x_t = √2·r₀. For the specific case of finite movement when the initial velocity vector of the atom is collinear with the vector F⃗ (ϑ = 0) and η ≫ 1, relationship (29) may be simplified (Ptitsin, 1990, 1991):

x_t − r₀ ≈ r₀/(4η) = r₀·kT_s/(2α_p F₀²)   (30)

Further analysis will also require an estimate of the typical jump time t_j of the atom after the evaporation event for finite movement. In view of the energy conservation law, from Equation (28) follows
t_j = √(M/(2kT_s))·∫_{r₀}^{x_t} [1 − η + η·(r₀/x)⁴ − sin²ϑ·(r₀/x)²]^{−1/2}·dx

The latter integral cannot be expressed in terms of elementary functions. To evaluate the atom jump time t_j in the potential-trap region, we restrict ourselves to the specific case ϑ = ⟨ϑ⟩ = π/4, η = 1. Then x_t = √2·r₀ and the jump time will be defined by

t_j ≈ 3r₀·√(M/(kT_s))   (31)

Substituting the typical values of the W emitter parameters for the pre-explosive emission phase into Equation (31) yields t_j ≈ 2 ns and, hence, the time of the atom's upward movement up to the point with the coordinate x = x_t will be ≈1 ns. The calculations given above refer to the model problem (a single sphere) and, therefore, cannot be directly applied to respective estimations for emitters of real geometries. To use the above results for such emitters, the following considerations can be applied: (1) if a single sphere whose surface potential is equal to V is replaced by a pointed emitter of real geometry with the same tip radius of curvature r₀ and the same potential value, then the field strength near and at the tip apex, F_t, will be related to the field strength F_s at the sphere surface (F_s = V/r₀) by the formula F_t = F_s/β, where β is the field factor (or β-factor); (2) since the turning points are located near the emitter surface, to a first approximation Equation (28) remains valid. In view of the aforesaid, we have
   F₀ ≥ { (2kT/α)·(r₀/ξ)·(1 − r₀/2ξ)·(1 − (1/2)sin²θ₀) }^{1/2}        (32)
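A quick numerical check of the jump-time estimate (31) and of the borderline field scale (2kT/α)^{1/2}, at which γ = 1, can be made as follows. This is a sketch: the tip radius r₀ ≈ 0.2 μm and T ≈ 2000 K are assumed values typical of the pre-explosive phase, and the W atomic polarizability α = 1.867 × 10⁻³⁹ C·m²/V is the value quoted below in the text.

```python
import math

k = 1.380649e-23        # Boltzmann constant, J/K
u = 1.66054e-27         # atomic mass unit, kg

M = 183.84 * u          # mass of a W atom, kg
T = 2000.0              # emitter temperature, K (assumed)
r0 = 0.2e-6             # tip radius, m (assumed ~0.2 um)
alpha = 1.867e-39       # W atom polarizability, C*m^2/V

# Equation (31): atom jump time in the polarization trap
t_jump = 3.0 * r0 * math.sqrt(M / (k * T))

# Field scale at which gamma = alpha*F^2/(2kT) equals 1; below this
# scale the movement of the evaporated atom ceases to be finite.
F_thr = math.sqrt(2.0 * k * T / alpha)

print(f"t_jump = {t_jump*1e9:.2f} ns")      # ~2 ns
print(f"F_thr  = {F_thr*1e-9:.2f} V/nm")    # ~5.4 V/nm
```

With these inputs the jump time comes out near 2 ns and the field scale near 5.4 V/nm, matching the estimates quoted in the surrounding text.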
NON-STATIONARY THERMAL FIELD EMISSION
205
Using Equation (32) and recalling that α = 1.867 × 10⁻³⁹ C·m²/V for W atoms (Ptitsin, 1991), one can determine the threshold F values at T ≈ 2000 K. The calculations show that these F values range from ≈2.7 V/nm to ≈5.4 V/nm (depending on β and θ₀). Thus the calculations made above prove that, during electron emission in the presence of a high electric field, the atoms of the emitter material evaporating from the emitting surface turn out to be ''locked'' in a polarization trap. The linear dimensions of the polarization trap (ξ_m − r₀) are very small (about 100 nm and below). Numerical calculations of F for other MCs (of Ta, Mo, Nb) show that the finiteness condition for the movement of evaporating atoms is also met for these emitters if the emission takes place in high electric fields.
It is worth noting that, since the direction of the polarization force does not depend on the direction of the vector F with respect to the normal to the emitter surface, all the results and implications of this section apply both to electron emission in a high electric field and to intense heating of a pointed emitter by an external energy source in an electric field of the reverse (or ''field ionization'') direction. It is known that, during heating of emitter tips in an electric field of the ''field ionization'' direction, the initially flattened, rounded surface of the MC tip is substantially transformed. These evolutionary changes of the MC surface, caused by high temperatures and intense electric fields, have been called the phenomenon of thermal field rearrangement. Studies of this phenomenon have shown that thermal field rearrangement may involve both infinite (Zhukov et al., 1989) and finite (Krotevich et al., 1984, 1985; Fursey et al., 1984) neutral atom movement.
For further analysis it is of interest to estimate the vapor atom concentration N_a in the polarization trap region. For the single-sphere emitter model we will do this with the following simplifying assumptions.
In particular, we will suppose that:
1. The vapor atoms constitute an ideal gas of particles and, hence, such an ensemble of particles can be described in terms of Maxwell-Boltzmann statistics.
2. Upon evaporation the emitter atoms in the polarization trap region remain un-ionized.
To calculate N_a, we assume that in the steady state the evaporation rate ν_e is equal to the condensation rate ν_c:

   ν_e = n_s·(kT/h)·exp(−E_a/kT) = ν_c(N_a, T_v)        (33)

where T_v is the vapor atom temperature. The vapor atom could be assigned
a definite temperature if, in the polarization trap region, there existed interatomic collisions accompanied by kinetic energy transfer. As a criterion of the absence of interatomic collisions, the following relationship can be adopted:

   λ = 1/(√2·N_a·σ_a) ≥ 2(ξ_m − r₀)        (34)

where λ is the free-path length of the atom and σ_a is the effective collision cross section. Taking σ_a ≈ πR_a², where R_a is the atom radius, we obtain from Equation (34) that effective interatomic collisions will take place if N_a is of the order of 10¹⁹ cm⁻³ or more. The calculations below will show that the actual N_a values are somewhat less than this. In the absence of interatomic collisions it is impossible to speak of temperature as a constant quantity, independent of the coordinates of the thermodynamic system and characterizing the equilibrium state of the vapor atoms, since in this case the kinetic energy of a single vapor atom E_k is not constant and depends on the coordinate ξ as follows:
   E_k(ξ) = kT − (1/2)αF₀² + (1/2)αF₀²·(r₀/ξ)⁴        (35)
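The collision criterion of Equation (34) is easy to evaluate numerically. The sketch below assumes a W atom radius R_a ≈ 0.14 nm and takes the limiting trap size 2(ξ_m − r₀) = 2√2·r₀ of the γ = 1, θ₀ = π/4 case with r₀ ≈ 0.2 μm; both inputs are assumptions, not values fixed by the text.

```python
import math

R_a = 0.14e-9                  # W atom radius, m (assumed)
r0 = 0.2e-6                    # tip radius, m (assumed)
sigma = math.pi * R_a**2       # effective collision cross section, m^2

trap = 2.0 * math.sqrt(2.0) * r0   # trap size 2*(xi_m - r0) at gamma = 1

# Equation (34): collisions matter when the mean free path
# lambda = 1/(sqrt(2)*N*sigma) drops below the trap size.
N_threshold = 1.0 / (math.sqrt(2.0) * sigma * trap)

print(f"N_threshold = {N_threshold*1e-6:.2e} cm^-3")   # ~2e19 cm^-3
```

The threshold comes out near 2 × 10¹⁹ cm⁻³, consistent with the order-of-magnitude figure in the criterion above.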
To find the form of the function N_a(ξ), we will use the relationships of classical statistical physics and define the effective temperature of the vapor atoms T_v by

   T_v ≈ (1/k)·⟨E_k⟩

where ⟨E_k⟩ is the kinetic energy averaged over the entire ensemble of atoms and over the coordinate ξ. Note that this definition of T_v assumes that the angular momentum of an atom moving in the central force field remains constant and, hence, the atom trajectory lies in one plane. Therefore, the number of degrees of freedom for this atom can be taken equal to 2. In view of the above remarks, ⟨E_k⟩ can be expressed as

   ⟨E_k⟩ = [1/(ξ_m − r₀)] · ∫_{r₀}^{ξ_m} E_k(ξ) dξ        (36)

where the averaging is taken over the total number N of atoms within the polarization trap. Using Equation (36), upon integration we obtain ⟨E_k⟩ ≈ kT and, hence, T_v ≈ T. Using the expression derived for the effective temperature and the relationship for the magnitude of the vapor atom velocity,

   v = (2kT_v/M)^{1/2}
the condensation flux density can be defined by averaging the flux of Maxwell-distributed vapor atoms toward the surface, which gives (Ptitsin, 1990, 1991; Landau and Lifshits, 1976)

   ν_c ≈ 0.3·N_a·(kT_v/M)^{1/2}        (37)

To determine the particle concentration distribution along the coordinate ξ, we use the condition of constancy of the chemical potential μ for an ideal gas of particles in a force field (Landau and Lifshits, 1976):

   μ(ξ, T_v) + U(ξ) = const

where

   μ(ξ, T_v) = kT_v·ln[ N_a(ξ)·(h²/2πMkT_v)^{3/2} ]

   U(ξ) = −(1/2)·αF₀²·(r₀/ξ)⁴
This yields
   N_a(ξ) = N₀·exp{ −(αF₀²/2kT_v)·[1 − (r₀/ξ)⁴] }        (38)

After rearrangements, in view of Equations (33) and (37), we have
   N₀ ≈ 3.4·(n_s/h)·(MkT)^{1/2}·exp(−E_a/kT)        (39)

The substitution into Equation (39) of the numerical values of parameters typical of the pre-explosion emission phase, T ≈ (2000–2500) K, F ≈ 5 V/nm, and E_a ≈ 3.0 eV, shows that in this case N₀ may be of the order of (10¹⁸–10¹⁹) cm⁻³. This means that the pressure of metallic vapor atoms in the polarization trap region during the pre-explosion emission phase approaches atmospheric pressure. These results raise the natural question of the probability of vapor atom ionization, due either to impact ionization (electron impact) or to field ionization. This question is discussed in the next section.
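Equation (39) is straightforward to evaluate. The sketch below takes E_a = 3.0 eV and T = 2300 K from the text; the surface atom concentration n_s ≈ 10¹⁵ cm⁻² is an assumed, typical value.

```python
import math

k = 1.380649e-23      # Boltzmann constant, J/K
h = 6.62607015e-34    # Planck constant, J*s
e = 1.602176634e-19   # elementary charge, C
u = 1.66054e-27       # atomic mass unit, kg

M = 183.84 * u        # W atom mass, kg
T = 2300.0            # emitter temperature, K
E_a = 3.0 * e         # evaporation activation energy, J
n_s = 1.0e19          # surface atom concentration, m^-2 (assumed ~1e15 cm^-2)

# Equation (39): steady-state vapor concentration in the trap region
N0 = 3.4 * (n_s / h) * math.sqrt(M * k * T) * math.exp(-E_a / (k * T))
P = N0 * k * T        # ideal-gas pressure of the metallic vapor

print(f"N0 = {N0*1e-6:.2e} cm^-3")
print(f"P  = {P/101325:.2f} atm")
```

With these inputs N₀ comes out near 10¹⁸ cm⁻³ and the vapor pressure is a sizable fraction of an atmosphere, in line with the estimates above.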
E. Ionization Probability of the Native Atoms of the Emitter Substance after Evaporation

According to the results of the preceding sections, intense evaporation of the native atoms of the MC material can take place both at intense
heating of the MC tip by an external energy source and in the course of high-current-density electron emission. In the presence of a high electric field the evaporating atoms can be ionized by different mechanisms. If the electric field is accelerating for electrons, atom ionization may be caused by electron-atom collisions (impact ionization) or by field ionization (FI) of free atoms. If the high electric field is retarding for electrons, a free atom near the emitter surface is likely to be ionized only through FI.
To calculate the probability of impact ionization P_i during intense electron emission, we will use the known expression

   P_i = 1 − exp(−ν_i·t)

where ν_i is the atom ionization probability per unit time due to electron-atom collisions, and t is the time of atom movement in the polarization trap region; ν_i is defined by (Ptitsin, 1996)

   ν_i = (1/e)·⟨J·σ_i⟩ ≤ (1/e)·J·σ_max

where J is the emission current density, σ_i is the impact ionization cross section as a function of the relative velocity of the ionizing electron and the atom, and σ_max is the maximum value of the impact ionization cross section, corresponding to an incident electron energy approximately equal to (2–5)·I₁, where I₁ is the atom ionization energy.
After calculating σ_max from Drawin's empirical formula (1961) for W atoms, we obtain σ_max ≈ 10⁻¹⁶ cm², whence it follows that at J ≈ 10⁸ A/cm² the value of ν_i satisfies the inequality ν_i ≳ 10¹⁰ s⁻¹. Accordingly, P_i for a time equal to the atom jump time (≈1 ns) turns out to be close to unity.
The FI probability per unit time ν_FI for a free neutral atom is (Ptitsin, 1996; Müller and Tsong, 1972)

   ν_FI = A·ν₀·exp(−2S/ℏ)        (40)

where A = const ≈ 1, ν₀ is the electron oscillation frequency in the atom, and

   S = (2m)^{1/2} · ∫_{z₁}^{z₂} [ I₁ − e²/(16πε₀z) − ezF(z) ]^{1/2} dz        (41)

where ε₀ is the permittivity of vacuum and z₁, z₂ are the turning point coordinates. Note that Equation (41) ignores the contribution from the image-force potential because for z ≳ 0.5 nm this contribution is negligible. Assuming F(z) ≈ const yields

   z_{1,2} = (I₁/2eF)·[ 1 ∓ (1 − e³F/4πε₀I₁²)^{1/2} ]        (42)

After calculating (41), we have

   S ≈ (2mI₁)^{1/2}·z₂·(2/3)·[ E(κ)·(1 + z₁/z₂) − 2(z₁/z₂)·K(κ) ]        (43)

where E(κ) and K(κ) are complete elliptic integrals and κ = (1 − z₁/z₂)^{1/2}. Calculation of Equations (40)–(43) shows that in high electric fields F ≈ (4.5–5.0) V/nm a free W atom in the ground state or in a low-lying excited energy state is field-ionized at a rate ν_FI ≳ 2.5 × 10¹³ s⁻¹ and, hence, the FI probability P_FI = 1 − exp(−ν_FI·t) turns out to be close (or equal) to unity already after a residence time of the W atom in the polarization trap region equal to (or less than) 10⁻¹³ s. The numerical estimates obtained suggest that in high electric fields (of magnitude F ≳ 4.5–5.0 V/nm) the native atoms of the W MC that reside in the polarization trap after evaporation will be ionized predominantly
by field ionization (ν_FI ≫ ν_i). Note that this conclusion is true not only for W emitters but also for the other pointed MCs studied (Mo, Nb, Ta) and can therefore be considered rather general.
The results presented above also suggest that at a high emission current density (J of about 10⁸ A/cm²) there will exist in the polarization trap region a space charge (SC) consisting of ions of the MC substance and of electrons. Among the electrons are those produced by FI of neutrals; besides these, the electronic component of the SC naturally includes the emitted electrons. In other words, under these conditions a plasma blob, or plasmoid, appears at the emitter tip surface.

F. Processes at the Interface: Emitter Surface–Microplasma Layer

As seen from the results given in the preceding sections, electron emission from the emitter tip surface in the high-density emission current mode initiates secondary thermal field processes, which produce a layer of SC–microplasma (MP) near the emitting surface of the MC tip. This statement is confirmed by experiment.
From the data of Slivkov (1986), obtained in studies devoted to the initiation and development of vacuum breakdown, it is known that immediately before vacuum breakdown, on single local microinhomogeneities (microtips) at the macrosurface of the vacuum gap cathode, there appear luminous regions, which in the course of the breakdown development are
transformed into intensely radiating plasma blobs expanding with time: cathode flares. The MP radiation spectra first show the lines of excited neutral atoms belonging to the chemical composition of the cathode material (Mesyats and Proskurovsky, 1984). Studies of electron emission from the surface of single pointed emitters at emission current densities of about 10⁸ A/cm² have also revealed a similar luminescence of the forming MP layer (Ptitsin, 1996).
As stated in Section II.B, simultaneously with the flashes of radiation emitted from the local region near the emitter apex, the non-stationary emission process is accompanied by: (a) a significant change of the emitting surface microstructure (due to the effect of spontaneous rearrangement), (b) the ring effect, and (c) an increase in the resolving power of the electron microscope (Muller's projector).
Applying the ideas developed in the previous sections, the effects listed above can be interpreted in the following way. The effect of spontaneous rearrangement is believed to be due to the interaction of the ions formed by field ionization with the emitting MC surface. According to estimates made in Ptitsin (1996), the energy E of an ion interacting with the emitter surface may reach from 10 eV up to 100 eV and above. At such energies there is a high probability (Kaminsky, 1967; Ion Bombardment . . . , 1984) of knocking surface atoms out of their lattice sites and also of transferring to adatoms an initial energy sufficient for their transition from bound states at the surface to free states in vacuum. The atoms knocked from the emitter surface as a result of this cathode self-sputtering will, with a high degree of probability, be ionized by the high field. During the current pulse the secondary processes caused by ion interaction with the emitter surface may undergo multiple multiplication, which leads to emission instability and to changes in the emitting surface microstructure.
The ions produced by FI will have different energies depending on the local near-surface region in which they are accelerated. Since the field near the emitter surface is inhomogeneous both in the coordinate ξ and in the polar angle coordinate θ, the ions accelerated in vacuum in the local regions adjacent to the edges and corners of the emitting crystal surface will possess the maximum energy. This means that, all other things being equal, the most substantial change of the emitter surface microstructure must take place at the local sites near the edges and corners forming the emitting surface. It was just this microstructure change in the course of spontaneous rearrangement that was observed in our experiments (Krotevich et al., 1984, 1985; Ptitsin, 1990).
The ring effect can also be qualitatively interpreted in terms of the interaction of the emitted electrons with the MP layer. It is known that during such interactions electrons undergo scattering at the boundary of a dense plasma blob. Based on the axial symmetry of the process, the plasma blob boundary
is evidently close to a circle. The electron scattering process at this boundary is displayed on the luminescent screen of the electron microscope (Muller's projector) as a diffraction pattern. To justify this approach, let us make some estimates. As is known, microparticle diffraction is most clearly noticeable when the de Broglie wavelength λ_dB of the microparticle is comparable with the distance between the scattering centers. This means that diffraction can be observed if the condition N_a ≳ λ_dB⁻³ holds. The mean de Broglie wavelength for electrons at TFE can be evaluated from the relationship (Ptitsin, 1996) λ_dB ≈ 4z_b, where z_b is the width of the potential barrier at the metal–vacuum boundary. Using the data of Modinos (1990) and Elinson (1974), we find that, according to the calculations of Section III.D, the concentration N_a in the MP layer for the W emitter must be of the order of (10¹⁸–10¹⁹) cm⁻³.
Another proof of this interpretation of the ring effect is that the suggested approach can give a satisfactory quantitative explanation of the known experimental fact (Elinson, 1974; Ptitsin, 1996; Krotevich, 1985) that the ring effect for W and Ta emitters is observed with a probability close to 1, while for Mo and Nb emitters the probability of observing the ring effect turns out to be 3–4 times lower than for W and Ta emitters. This seems to be caused by the fact that the intensity I_s of electron flux scattering on the MP layer boundaries depends on the atomic number A of the element in the periodic table in the following way:
   I_s ∝ m·e²·λ_dB²·(A − f) / [2h²·sin²(θ/2)]        (44)
where θ is the scattering angle and f is the atomic scattering amplitude. From this it follows that, in accordance with the atomic numbers of the emitter materials studied, the intensity ratios of the diffraction maxima agree well with the experimentally established probability of observing the ring effect.
Finally, the growth in resolution of the pulsed electron microscope (Muller's projector) in the pre-explosion phase of non-stationary electron emission may be adequately explained as follows. At the initial stage of MP layer formation, the electron flux through the anode surface of the vacuum diode is created both by the TFE electrons and by the electrons produced in the course of field ionization of evaporating neutrals. Therefore, the emission image of the emitting surface is evidently created by electron fluxes of various ''origin.''
To define the resolving power δ of Muller's projector in ''field ionization electrons,'' recall that the event of field ionization occurs at a short (≲100 nm) distance from the emitting surface. Therefore, to a first approximation, δ can be calculated from the known formula (Modinos,
1990) for the resolution of Muller's projector in ''field emission electrons'' (δ_FE):

   δ_FE = 4ζr₀·(E_t/eV₀)^{1/2}        (45)

where ζ is the image compression coefficient, E_t is the tangential component of the field-emission electron initial energy, and V₀ is the potential difference between the field emitter and the anode of Muller's projector. Using Equation (45) for δ_FI yields

   δ_FI ≈ δ_FE·(E_τ/E_t)^{1/2}        (46)

where E_τ is the tangential component of the initial energy of an electron produced by field ionization of a neutral. E_τ will be estimated from the Heisenberg uncertainty principle:

   (2m·E_τ)^{1/2}·[D + (z₂ − z₁)] ≈ ℏ/2        (47)

where D is the diameter of the neutral, and z₁ and z₂ are the turning point coordinates. From Equations (46) and (47) and the expression for E_t given in Modinos (1990), we obtain that for the W emitter at F ≈ 5 V/nm the resolution of Muller's projector ''in field ionization electrons'' is δ_FI ≈ 0.16·δ_FE. For a typical value δ_FE ≈ 2.5 nm (Modinos, 1990), the resolution in ''field ionization electrons'' will be about 0.45 nm; that is, in agreement with experimental data (Krotevich et al., 1986), the resolution of Muller's projector in the pre-explosion phase of emission turns out to be close to atomic resolution.
Note that the increase in resolution of Muller's projector under these conditions is observed only at local sites of the emitting surface adjacent to closely packed faces (Krotevich et al., 1986). This peculiarity of the emission image becomes quite understandable if we estimate the electron current density due to field ionization of evaporating neutrals in vacuum. Taking the field ionization probability equal to 1, one can easily obtain the field ionization electron current density from Equation (25). The appropriate calculations show that the field ionization electron current density J_FI at an emitter surface temperature T ≈ (2000–2500) K may reach (10⁶–10⁷) A/cm². These current density values are 1–2 orders of magnitude less than the mean (or effective) current density at the pre-explosion stage of non-stationary emission, which is close to 10⁸ A/cm², as follows from the preceding sections.
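The resolution estimate of Equations (46) and (47) can be sketched numerically. In the sketch below the ionization-zone width L = D + (z₂ − z₁) ≈ 1 nm and the tangential energy E_t ≈ 0.4 eV of field-emitted electrons are assumed values, not figures fixed by the text; only the resulting ratio should be compared with the quoted δ_FI ≈ 0.16·δ_FE.

```python
import math

hbar = 1.054571817e-34   # reduced Planck constant, J*s
m_e = 9.1093837015e-31   # electron mass, kg
eV = 1.602176634e-19     # J per electronvolt

# Equation (47): sqrt(2*m*E_tau) * L ~ hbar/2, with an assumed
# ionization-zone width L = D + (z2 - z1) ~ 1 nm.
L = 1.0e-9
E_tau = (hbar / 2.0)**2 / (2.0 * m_e * L**2)

# Equation (46): delta_FI ~ delta_FE * sqrt(E_tau / E_t), with an
# assumed typical tangential energy E_t ~ 0.4 eV of FE electrons.
E_t = 0.4 * eV
delta_FE = 2.5                          # nm, typical value (Modinos, 1990)
delta_FI = delta_FE * math.sqrt(E_tau / E_t)

print(f"E_tau    = {E_tau/eV*1000:.1f} meV")
print(f"delta_FI = {delta_FI:.2f} nm")
```

Under these assumptions the ratio comes out near 0.15 and δ_FI near 0.4 nm, close to the ≈0.45 nm near-atomic figure quoted above.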
However, since at the borders of closely packed faces, which have a high value of the work function, the TFE current density is rather low (see Equations (1) and (2)), and the temperature is evidently close to the average value over the entire surface, it is easy to see that it is precisely at those local sites of the surface that the conditions for higher resolution are fulfilled. It would be natural to assume that field ionization electron fluxes are also formed above surface areas with a lower work function than that of the closely packed faces; but since these fluxes are 1–2 orders of magnitude weaker than the TFE electron flux, it is impossible to distinguish the contribution of the field ionization electrons against ''the background'' of TFE electrons on the luminescent screen of Muller's projector without special techniques. So the ideas developed above, which suggest that at intense emission in strong electric fields an MP layer is formed at the MC tip surface, agree with experimental data.
Let us now turn to the analysis of some processes that follow from the established fact of MP formation at the MC tip surface. To analyze the processes occurring in the system pointed emitter–MP layer–vacuum gap of a diode, we start from the continuity equation for the current flowing through the diode:

   I_e = I_L = I_v        (48)

where I_e and I_L are the electric current intensities in the emitter and in the cathode drop layer (the so-called Langmuir layer), respectively, and I_v is the current intensity in vacuum at the MP–vacuum boundary.
Next we can write the following relationships:

   I_L = I_eT + I_n = I_eT + η·I_i = I_v        (49)

   I_i = J_i·S_e        (50)

   Γ_n ≥ χ·I_i/e        (51)

where I_eT is the electron current equal to the emission current from the metal surface into the MP; I_n is the component of the total electron current in the metal corresponding to neutralization of the ion flux incident on the metal surface; I_i is the ion current from the MP to the metal surface (its density J_i can be defined by an expression given, for example, in Molokovsky et al. (1991) and Krendel (1977)); η is the coefficient (or probability) of ion neutralization at the metal surface; S_e is the area of the emitting MC surface–MP boundary; Γ_n is the flux of neutrals from the metal surface; and χ is the ion accommodation coefficient.
Note that the phenomenological equations (48)–(51) describe a non-stationary process and, therefore, all the ''currents,'' as well as the accommodation and neutralization coefficients, are in general time dependent. The meaning of these equations at an arbitrary time moment seems straightforward
and does not require detailed comments. It should only be mentioned that the current component I_eT defines the electron current in the Langmuir layer; the mechanism of this electron emission, however, is not discussed here. The last equation in the above set reflects the fact that under these conditions the injection of substance into the MP layer takes place due to evaporation of neutrals and, therefore, during the MP lifetime the rate of neutrals generation must be no less than the rate of neutrals recombination.
Note that, according to the data of Kaminsky (1967) and Rakhovsky (1970), the neutralization probability for a metal ion interacting with a metal surface consisting of its native atoms is close to 1. However, the accommodation coefficient χ of such ions differs from unity and, according to various sources (Rakhovsky, 1970; Rayzer, 1987), may reach χ ≈ 0.5–1.
To find out the mechanism of electron emission at complete screening of the external (Laplace) field by the plasma SC, we will apply the known Langmuir-McCown equation (Rakhovsky, 1970; Rayzer, 1987):

   F_c² = (4/ε₀)·J_L·V_c^{1/2}·s·[ (1 − s)·(M/2e)^{1/2} − s·(m/2e)^{1/2} ]        (52)

where F_c is the field strength created at the metal surface by the SC of plasma ions and electrons; J_L = I_L/S_e; s ≡ J_eT/(J_eT + J_i) ≈ I_eT/(I_eT + I_i) = I_eT/I; V_c is the potential drop (or cathode drop) at the plasma–metal boundary; I is the total current through the vacuum diode; and M is the ion mass. Using the straightforward relationships J_i = (I − I_eT)/S_e and s = I_eT/I, the Langmuir-McCown equation is easily transformed into

   F_c² = (4V_c^{1/2}/ε₀S_e)·(I_eT/I)·[ (I − I_eT)·(M/2e)^{1/2} − I_eT·(m/2e)^{1/2} ]        (53)

Ignoring the second term in square brackets, which is of the order of (m/M)^{1/2} ≈ 10⁻³ of the first one, and solving the equation for I_eT at given I and F_c yields

   I_eT = (I/2)·[ 1 ± (1 − 4Ψ/I)^{1/2} ]        (54)

where Ψ ≡ ε₀S_e·F_c²·(e/8MV_c)^{1/2}. The condition that the radicand be non-negative implies that

   F_c ≤ [ (I/ε₀S_e)·(MV_c/2e)^{1/2} ]^{1/2}        (55)

According to Equation (54), two solutions I_eT′ and I_eT″ correspond to given values of I and F_c satisfying Equation (55). Formally, in view of the form of Equations (52) and (53), this evidently means that one and the same field F_c
value may be created by either of two possible combinations, (I_eT′, I_i′) or (I_eT″, I_i″), at a given total current I; in the general case I_eT′ ≠ I_eT″ and I_i′ ≠ I_i″. Based on general physical principles, it was shown in Rakhovsky (1970) that at the interaction of a dense quasi-neutral plasma with the metallic electrode (the cathode of a vacuum diode) the equality I_eT = I_i should hold. This implies that under these conditions only the doubly degenerate solution of Equation (53) for the variable I_eT exists, that is,

   I_eT′ = I_eT″ = I/2        (56)

   I_i′ = I_i″ = I/2,   s = 1/2        (57)

   F_c = [ (I/ε₀S_e)·(MV_c/2e)^{1/2} ]^{1/2}        (58)

Note that Equation (58) has quite a simple physical meaning: under such conditions the negative pressure of the ponderomotive forces of the SC field at the metal surface must be equal to the gas-kinetic pressure of the flux of ions accelerated in the cathode drop region up to the energy eV_c (ε₀F_c²/2 = e·n_i·V_c = J_i·(MV_c/2e)^{1/2}, where n_i is the ion concentration in the MP at the boundary of the cathode drop region). Equation (58) allows one to estimate the value of F_c from experimental data in real conditions, when the values of I, S_e, and V_c can be specified with sufficient accuracy. According to Ptitsin (1990) and Rayzer (1987), in the non-stationary emission mode the total current through a vacuum diode with a pointed cathode-emitter abruptly (in less than 0.1 μs) rises from its initial level to ≈1 A and then, over ≈0.1 μs, slowly increases or remains constant (the quasi-stationary stage). The process either ends in a transition to vacuum breakdown, if the vacuum gap is bridged by plasma, or the current rapidly drops to zero (a so-called current ''break'' or ''cut-off'') if the gap is relatively wide. In the electron Muller projector, wherein the emitter-to-anode distance is relatively large (about 1 cm), vacuum breakdown does not usually develop and the non-stationary process ends in the current break. Electron-microscopic studies of the emitter tip geometry upon completion of the current instability and break development show (Elinson, 1974; Mesyats et al., 1984) that the emitter tip turns out to be melted, with an average radius r ≈ (1.0–1.5) μm. In view of these results and using empirical values for the cathode drop (Rakhovsky, 1970), we obtain V_c ≈ 16 V, I ≈ 1 A, J ≈ (3–5) × 10⁶ A/cm², J_i = J_eT ≈ (1.5–2.5) × 10⁶ A/cm², and F_c ≈ (4.5–5.0) V/nm for the W emitter.
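Equation (58) can be evaluated directly from the numbers just quoted. The sketch below approximates the emitting MC surface–MP boundary as a sphere of the melted-tip radius, with r ≈ 1.25 μm chosen as an assumed value within the quoted 1.0–1.5 μm range.

```python
import math

eps0 = 8.8541878128e-12   # vacuum permittivity, F/m
e = 1.602176634e-19       # elementary charge, C
u = 1.66054e-27           # atomic mass unit, kg

M = 183.84 * u            # W ion mass, kg
I = 1.0                   # total diode current, A
V_c = 16.0                # cathode potential drop, V
r = 1.25e-6               # melted tip radius, m (assumed)
S_e = 4.0 * math.pi * r**2   # emitting area (sphere approximation)

# Degenerate solution (56)-(57): I_i = I_eT = I/2, so J_i = I/(2*S_e)
J_i = I / (2.0 * S_e)

# Equation (58) as a pressure balance: eps0*F^2/2 = J_i*sqrt(M*V_c/(2e))
F = math.sqrt(2.0 * J_i * math.sqrt(M * V_c / (2.0 * e)) / eps0)

print(f"J_i = {J_i*1e-4:.2e} A/cm^2")
print(f"F   = {F*1e-9:.2f} V/nm")
```

With these inputs J_i comes out near 2.5 × 10⁶ A/cm² and F near 4.7 V/nm, within the quoted (4.5–5.0) V/nm range.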
These estimates have a very important physical meaning: they quantitatively characterize the non-stationary electron emission process after the emitter tip is screened by the plasma layer. First of all, the estimates suggest that the electronic component of the total current in the Langmuir layer is due to thermal field electron emission from the metal, since the field F_c and the value of J_eT numerically satisfy Equation (2), which describes the TFE process. Besides, from the obtained estimates it follows, on the basis of thermal calculations and the concepts of TFE theory, that the values obtained for J_eT are not sufficient to heat the MC tip up to the melting point. This evidently means that the experimentally observed melting of the emitter tip upon explosive breakdown takes place either at the end of the abrupt total current growth or already at the quasi-stationary stage, owing to injection into the emitter substance of the concentrated energy flux carried by the ion current component in the Langmuir layer.
The transition from the stage of abrupt current growth to a quasi-stationary stage followed by current break seems to be connected precisely with the phase transition of the emitter substance from the crystalline to the liquid state. This statement can be justified qualitatively as follows. The crystal-liquid phase transition may lead to current break if such a transition results in an abrupt decrease of the field F_c, which is mainly defined by the ion SC in the Langmuir layer. A decrease of the ion concentration in the plasma is possible if, as a result of the phase transition, the flow of neutrals
   Γ_n ≈ n_s·(kT/h)·S_e·exp(−E_a/kT)        (59)
from the emitter surface is considerably reduced. Since the emitter temperature, the surface atom concentration, and the oscillation frequency change continuously at the phase transition, the flow of neutrals can be reduced only by a sharp increase in the binding energy E_a of the surface atoms. According to Frenkel (1958), the binding energy of surface atoms in a liquid is close to that of atoms located at intralattice sites. As applied to the problem studied, this means that after the transition to the liquid phase the physically isolated adatom states, loosely bound to the closely packed MC faces, disappear and, hence, evaporation of neutrals from the liquid metal surface requires much more energy. By our estimate, the binding energy of surface atoms of liquid W is about 6 eV. This estimate takes into account that after melting the excitation of the electron subsystem of the metal continues for some time (Ptitsin, 1996). At such binding energies the neutrals generation rate becomes negligible, and the plasma will decay due to surface and volume recombination.
To summarize the above, it is worth noting that, though the electronic component I_eT of the total current is defined by the TFE mechanism, the
process of non-stationary electron emission is not identical to, and not reducible to, TFE since, first, the bulk emitter current in the course of non-stationary emission is the sum of two components, the thermal-field one and the neutralization one, and the latter becomes numerically equal to the thermal-field component as a result of MP formation (see Equation (57)); and, second, owing to the interaction of the emitter substance with concentrated energy fluxes (due to the electron and ion components of the total current), the non-stationary emission is accompanied by a continuous phase transition (sublimation) of the condensed emitter matter into the state of ionized metallic vapor and then into the plasma state. As shown in Ptitsin (1996), the plasma ion interaction with the emitter substance accelerates the intense sublimation of the surface layer of the MC substance, which leads to expansion of the dense plasma blob (cathode flare plasma) into vacuum.
G. Non-Stationary Thermal Field Emission Current Kinetics

To obtain a more complete picture of the physical mechanism underlying the processes of non-stationary electron emission, let us also consider the process of current flow in vacuum. To describe this process it is necessary to find the form of the function I = I(V₀, t), where V₀ is the potential difference between the cathode and the anode of the vacuum diode. As the diode model we choose a spherical capacitor.
Based on the general concepts of plasma physics, the form of the instantaneous potential distribution φ(ξ, t) and of the self-consistent field strength F(ξ, t) in a spherical vacuum diode can be qualitatively described
by the curves shown in Figure 8. To define the function I = I(V₀, t), a non-stationary self-consistent problem for the Poisson equation was solved. To a first approximation it was assumed that the function characterizing the dependence of the electron velocity v on the potential, v = v(φ), is identical to that in the absence of the SC field. From the solution of the Laplace equation for R ≤ ξ ≤ R_a with the boundary conditions φ(R_a) = V₀ and φ(R) = V_R ≈ 0 (V_R ≪ V₀), it follows that

   v(ξ) = { (2e/m)·[ V₀·(R_a/(R_a − R))·(1 − R/ξ) − V_R ] + v₀² }^{1/2}        (60)

where v₀ is the initial velocity of the electrons.
Figure 8. Instantaneous self-consistent field potential φ(ξ, t) and strength F(ξ, t) distribution curves in a spherical vacuum diode in the course of microplasma blob expansion during the non-stationary high-current-density electron emission process. (H is the Langmuir layer width, R is the instantaneous coordinate of the emission boundary, r is the instantaneous value of the MC tip radius, and V_R is the instantaneous value of the potential φ(R, t); scale proportions along the coordinates are not observed.)
Then, in the general case, the problem can be stated in the spherical coordinate system as follows:

   d²φ/dξ² + (2/ξ)·dφ/dξ = I/[4πε₀ξ²·v(ξ)]        (61)

   φ(R) ≈ V_R,   F(R) = 0

The solution of Equation (61) will be sought in two steps: first we will find F(ξ, t) and then, after integrating the function found, we will determine φ(ξ, t). Rewriting Equation (61) in the form

   dF/dξ + (2/ξ)·F = ω·ξ^{−3/2}·(ξ − R)^{−1/2}        (62)
here

   ω ≡ (I/4πε₀)·[ (2e/m)·(V₀ − V_R)·R_a/(R_a − R) ]^{−1/2}

Upon integrating, we obtain

   F(ξ, t) = (ω/ξ²)·{ [ξ(ξ − R)]^{1/2} + R·ln[ (ξ^{1/2} + (ξ − R)^{1/2})/R^{1/2} ] }        (63)
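As a consistency check, Equation (63) can be verified numerically against the differential equation (62). The sketch below uses arbitrary dimensionless values of ω and R and compares the two sides of (62) at one point.

```python
import math

# Check that F(xi) of Eq. (63) solves the ODE (62),
# dF/dxi + (2/xi)*F = omega * xi^(-3/2) * (xi - R)^(-1/2),  F(R) = 0.
omega, R = 1.0, 1.0   # arbitrary positive values (dimensionless sketch)

def F(xi):
    # Equation (63)
    return (omega / xi**2) * (math.sqrt(xi * (xi - R))
                              + R * math.log((math.sqrt(xi) + math.sqrt(xi - R))
                                             / math.sqrt(R)))

def rhs(xi):
    return omega * xi**-1.5 / math.sqrt(xi - R)

xi, h = 2.0, 1e-6
lhs = (F(xi + h) - F(xi - h)) / (2.0 * h) + 2.0 * F(xi) / xi
rv = rhs(xi)
print(f"lhs = {lhs:.6f}, rhs = {rv:.6f}")   # the two sides agree
```

The boundary condition F(R) = 0 is also satisfied by construction, since both terms in braces vanish at ξ = R.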
Note that the last expression allows us to find an approximation to the sought relationship I = I(V₀, t) without the subsequent integration. This is because the process of instability development lasts at most ≈50 ns. During this time a plasmoid (cathode flare) moving at a velocity of ≈2 × 10⁶ cm/s (Elinson, 1974; Mesyats et al., 1984) expands to a size of about 1 mm. Therefore, for vacuum breakdown in Muller's projector, which is characterized by relatively large vacuum gaps (R_a ≈ 1 cm), it can be assumed that the self-consistent field strength near the anode surface at an arbitrary time moment t (after the onset of instability development) is, to a first approximation, equal to

   F(R_a, t) ≈ −[V₀/(R_a − R)]·(R/R_a)

This ''immediately'' yields an approximation for the unknown function I(V₀, t). A general analytical expression valid for any gap can be derived after integrating F(ξ, t). The integration, followed by rearrangements, gives the following expression for the time dependence of the anode current:
I(t) \approx 4\pi\varepsilon_0\sqrt{\frac{2e}{m}}\;\sqrt{V_a}\,(V_a - V_d)\left\{\sqrt{1 - \xi}\left[(2 - \xi)\ln\!\left(1 + \sqrt{1 - \xi}\right) - \left(1 - \frac{\xi}{2}\right)\ln\xi - \sqrt{1 - \xi}\right]\right\}^{-1}     (64)
in which, for convenience, the notation \xi \equiv R_d/R_a = vt/R_a is introduced, where v \approx 2 × 10⁶ cm/s is the velocity of the leading front of the plasma (Mesyats et al., 1984). Note that the functional relationship of Equation (64), according to which I \propto V_a^{3/2}, agrees well with the results of Bellustin (1939). Equation (64) describes a non-stationary emission process for the model problem (in the spherical geometry of a vacuum diode) and, hence, cannot be used directly for qualitative comparison with experimental data obtained for real diodes (Müller projectors). To make such a comparison, one should take into account that the full solid angle of emission in a diode with an actual electrode geometry is substantially less than 4\pi and approximately equals 2\pi(1 - \cos\theta), where \theta is the half-angle of the cone opening incorporating 90% of the entire electron flux. According to Elinson (1974) and Dyke and Dolan (1956), \theta \approx \pi/6. Another thing to be accounted for when refining Equation (64) is that the initiation of the non-stationary emission process and its further development in time depend on the initial field strength created at the emitter surface as a result of applying a potential difference V_a between the anode and the emitter of the vacuum diode. However, the magnitude of the voltage V* needed in a diode of real geometry to create an initial critical field F* \approx 5 V/nm is not equal to that in a spherical diode. The relationship between V* and V_a can be defined if a certain approximation of the actual pointed-emitter surface shape is specified. As shown by Dyke and Dolan (1956), the shape of a real emitter surface is very close to a hyperboloid of revolution, with a known expression for the so-called \beta-factor (or field factor). Assuming this approximation, we introduce the following substitution into Equation (64):
V_a = 2V^{*} \ln^{-1}\!\left(\frac{4R_a}{r_t}\right)     (65)
here V* is the voltage at which the instability process is initiated in a diode of real geometry. To change from the variable \xi to t, we use the formula t = \xi R_a/v. Figure 9 shows a kinetic curve of the current I = I(V*, t, R_a, r_t) calculated from Equation (64) with the appropriate corrections for V_a.
The following typical experimental values of the parameters were used: V* = 8 kV, R_a = 5 cm, and r_t = 300 nm. A schematic diagram of the different stages in the transition of thermal field electron emission to vacuum breakdown is presented in Figure 10. Figures 9 and 10 show a satisfactory agreement between theory and experiment.
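Two of the numbers involved here can be checked with elementary arithmetic. The sketch below (Python) evaluates the field-factor substitution as read here, V_a = 2V*/ln(4R_a/r_t), for the quoted parameters, and the distance covered by the plasma front in 50 ns at 2 × 10⁶ cm/s.

```python
import math

# Arithmetic checks for the quoted parameters. The formula for V_a is
# the hyperboloidal-tip field-factor substitution as read here.
V_star = 8e3       # V   (V* = 8 kV)
R_a = 5e-2         # m   (anode radius, 5 cm)
r_t = 300e-9       # m   (tip radius, 300 nm)
v_front = 2e4      # m/s (plasma front velocity, 2e6 cm/s)

# Equivalent spherical-diode voltage, V_a = 2 V* / ln(4 R_a / r_t):
V_a = 2.0 * V_star / math.log(4.0 * R_a / r_t)
print(V_a)                 # ~1.2e3 V

# Distance covered by the plasma boundary in 50 ns of expansion:
d_50ns = v_front * 50e-9
print(d_50ns)              # ~1e-3 m, i.e. about 1 mm
```

The second check reproduces the statement that the cathode flare expands to about 1 mm during the ~50 ns of instability development.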
Figure 9. Theoretical curve of the current kinetics in the spherical vacuum diode during the non-stationary high-current-density electron emission process (V* = 8 kV; R_a = 5 cm; r_t = 300 nm).
IV. Discussion and Conclusion

Thus the analysis has shown that the stationary TFE process naturally passes into a non-stationary emission process at high emission current densities (J ≥ 10⁷ A/cm²), which results in a roughly 10-fold increase in the total emission current (but not in its density!) in about 10⁻⁷ s. The abrupt rise of the current is accompanied by sublimation of the emitter material and dense plasma formation. The integral length of this non-stationary process, including a quasi-stationary stage, does not exceed 10⁻⁴ s (Rakhovsky, 1970). The "lifetime" of the non-stationary process is defined mainly by the time needed
Figure 10. Diagram of the different stages in the transition of TFE to vacuum breakdown, based on experimental data (Elinson, 1974). (The abrupt emission-current rise stage lasts about 100 ns; the duration of the other stages of this curve depends on the experimental conditions.)
for the crystal-to-liquid phase transition of the emitter tip substance. Note that a detailed calculation of the current kinetics and the lifetime requires solving a non-stationary two-temperature thermal problem, which at present could hardly even be stated correctly, because the interaction of intense, high-power-density (10⁷–10⁸ W/cm²) ion fluxes with matter has not yet been well studied. In fact, for example, the question of the interrelation between non-stationary thermal field electron emission and so-called "explosive emission" (EE) (Mesyats et al., 1984) still remains unclear. The studies carried out in this work have shown that non-stationary thermal field electron emission is a specific variant, or consequence, of the interaction between a concentrated energy flux and the emitter material, which takes place in a high electric field when, owing to the high-density (J ≥ 10⁷ A/cm²) TFE current flowing through the MC substance (the emitter), the rate of entry of the concentrated energy flux into the MC material exceeds the rate of its dissipation through heat conduction and surface
self-diffusion processes. An additional contribution to energy dissipation comes from the activated evaporation of native MC atoms; the subsequent ionization of the vapor atoms and the interaction of the resulting high-power-density (over 10⁷ W/cm²) ion flux with the emitter material lead to avalanche sublimation of the MC material and the simultaneous reproduction (generation) of dense plasma. The conduction current in the metal is produced both by TFE from the emitting surface, which increases with time, and by the neutralization of the plasma ion flow at the emitter surface. Electron emission into vacuum from the surface of the expanding MP follows a known mechanism (Molokovsky et al., 1991; Krendel, 1977). The conceptions offered for the physical mechanism of non-stationary thermal field electron emission also show that, contrary to the existing views of the EE mechanism (Elinson, 1974; Mesyats et al., 1984), the non-stationary process of electron emission is caused not by thermal instability and volume explosion of the emitter substance at high field electron emission current densities (J ~ 10⁸–10⁹ A/cm²), as believed earlier (Elinson, 1974; Mesyats et al., 1984), but rather is the result of a combination of the above-listed, interrelated secondary thermal field processes initiated at much lower TFE current densities (J ~ 10⁷ A/cm²). Therefore, another very important question arises, namely: why is it possible to attain such high current magnitudes (up to 1 A) and, respectively, high current densities (up to 10⁸ A/cm² and above (Fursey et al., 1984)), with the current jump simultaneously decreasing from about 10-fold to zero with decreasing pulse duration \tau? To answer this question we refer to the experimental data of Fursey et al. (1984). According to these data, at small \tau the increase in the total emission current (which is considered to be the thermal field electron emission current) up to 1 A is accompanied by a twofold increase of the height of the voltage pulse V_a.
This means that, provided the emission mechanism remains purely thermal-field, the field strength F also increases twofold and at a current of 1 A is equal to 10 V/nm. At such a field strength at the emitter tip apex, the drop of the field strength at the emitter tip periphery down to (4–5) V/nm, for the conventional hyperbolic or parabolic approximations of the MC tip surface shape, occurs at \theta \approx \pi/6 (Gor'kov et al., 1962). Then, according to Equations (2.7) and (2.8) in Ptitsin (1996), the emitting area S_em is considerably greater than \pi r_t^2 and, hence, the current densities calculated (Fursey et al., 1984) from the formula J \approx I/(4\pi r_t^2) turn out to be overestimated about 5 times. However, the experimental results of Zhukov et al. (1988), where the ring effect in the nanosecond range of \tau was observed, show that an MP layer appears also at these values of \tau. The MP layer alters the emitter "geometry" so that it approaches a so-called approximation shape: a sphere-hyperboloid, or a sphere on a cone, with a certain degree (\delta) of closeness to the spherical
approximation. At \delta \approx 0.25 the angle increases up to \theta \approx 5.6\,\pi/6 and, hence, the area S_em of the real emitting surface is much greater than \pi r_t^2. The calculations show that S_em \approx (10–15)\,\pi r_t^2 and above. So the estimates made indicate that for typical pointed (W) emitters of radius 0.1 µm ≤ r_t ≤ 0.5 µm, the maximum total current density does not exceed 10⁸ A/cm² and, hence, the maximum TFE current density at the emitter–MP boundary does not exceed 5 × 10⁷ A/cm² over a wide range of \tau. Another characteristic feature of the non-stationary electron emission is that, with decreasing \tau, the delay time t_d at which the intense sublimation of the MC emitter material begins also decreases, from 10⁻⁴ s (and above) to a few nanoseconds (Elinson, 1974; Mesyats et al., 1984). In the context of the phenomenological theory developed here, such a significant variation of t_d is determined, as mentioned above, by the time taken for dense plasma formation at various initial values of F and T owing to the flowing TFE current. The estimates show that the main contributions to t_d come from the adatom lifetime (according to Frenkel) on closely packed emitter planes and also from the electron–ion relaxation time in the dense plasma. When the initial temperature T changes from 1800 K to 3600 K, the lifetime varies from 10⁻⁵ s to 10⁻⁹ s (Ptitsin, 1996). To summarize, it may be concluded that the conceptions of the non-stationary thermal field electron emission mechanism developed in this work essentially complement and extend the existing approaches to explosive breakdown, or so-called EE (Elinson, 1974; Mesyats et al., 1984).
In our opinion, the conceptions of non-stationary thermal field electron emission, explosive breakdown, and EE are conceptually similar in that, according to different authors, the non-stationary emission process in a vacuum diode with a pointed metal emitter is initiated in high electric fields when the rate of entry of the concentrated energy flow into the emitter material exceeds its dissipation rate. In the present work, in contrast to Elinson (1974) and Mesyats et al. (1984), it has been taken into account that at high-current-density TFE the energy dissipation takes place not only through the heat-conduction mechanism, but also through self-diffusion and the activated evaporation of native atoms from the emitter surface. Analysis of these phenomena and their consequences has, first, shown that the non-stationary electron emission can be initiated already at an initial current density of about 10⁷ A/cm², instead of (10⁸–10⁹) A/cm² as supposed earlier, and, second, helped to explain the mechanism of dense plasma formation and to abandon the known concepts of volume explosion of the emitter tip. Owing to this "historically established" state of affairs, one and the same physical process of non-stationary high-current-density electron emission
has different names. In fact, it is called "non-stationary thermal field emission" here and "explosive breakdown" (Dyke and Dolan, 1956) or EE elsewhere (Nonheating Cathodes, 1974; Mesyats et al., 1984). Perhaps it would be better to give this process of non-stationary high-current-density electron emission a common name, for example, "phase transition emission," which most closely characterizes its physical meaning and mechanism.
Acknowledgments

The author is grateful to Mrs. G. D. Gelever and Mrs. G. G. Levina for their valuable assistance in this work. The author also wishes to thank the Russian Foundation for Basic Research for financial support of this work (Grant No. 98-02-18101).
References

Aizenberg, N. B. (1964). "About the Influence of the Space Charge on the Form of the Current-Voltage Characteristics of the Field Emission Cathodes." Radiotekhnika i Elektronika 9, 2147 (in Russian).
Aizenberg, N. B. (1954). "About the Role of the Space Charge in the Spherical Electron Projectors." Z. Tekh. Fiz. 24, 2079–2082 (in Russian).
Barbour, J. P., Charbonnier, F. M., Dolan, W. W., Dyke, W. P., Martin, E. E., and Trolan, J. K. (1960). "Determination of the Surface Tension and Surface Migration Constants for Tungsten." Phys. Rev. 117, 1452.
Barbour, J. P., Dolan, W. W., Trolan, J. K., Martin, E. E., and Dyke, W. P. (1953). "Space-Charge Effects in Field Emission." Phys. Rev. 92, 45.
Bell, A. E., and Swanson, L. W. (1979). "Total Energy Distribution of Field Emitted Electrons at High Current Density." Phys. Rev. B 19, 3353.
Bellustin, S. V. (1939). "To the Theory of Current in Vacuum. III. Spherical Electrode Case." Z. Experimentalnoi i Teoreticheskoi Fiziki 9, 857 (in Russian).
Boersch, H. (1954). "Experimentelle Bestimmung der Energieverteilung in thermisch ausgelösten Elektronenstrahlen." Z. Phys. 139, 115.
Christov, S. G. (1966). "General Theory of Electron Emission from Metals." Phys. Stat. Sol. 17, 11.
Dolan, W. W., Dyke, W. P., and Trolan, J. K. (1953). "The Field Emission Initiated Vacuum Arc. II. The Resistively Heated Emitter." Phys. Rev. 91, 1054.
Drawin, H. W. (1961). "Zur formelmäßigen Darstellung der Ionisierungsquerschnitte gegenüber Elektronenstoß." Z. für Physik 164, 521.
Drechsler, M. (1988). "Microscopy of the Thermal Roughening of Crystal Faces." Journ. de Physique, Coll. C6, suppl. No. 11, Tome 49, C6-87.
Dyke, W. P., and Dolan, W. W. (1956). "Field Emission." In Advances in Electronics and Electron Physics, Vol. 8 (Ed. L. Marton). New York: Academic Press.
Dyke, W. P., and Trolan, J. K. (1953). "Field Emission: Large Current Densities, Space Charge and the Vacuum Arc." Phys. Rev. 89, 799.
Fowler, R. H., and Nordheim, L. (1928). "Electron Emission in Intense Electric Fields." Proc. Roy. Soc. Ser. A 119, 173.
Frenkel, Ya. I. (1958). "On the Surface Particle Walk in Crystals with Natural Roughness of Faces." In Collection of Selected Papers by Ya. I. Frenkel, Vol. 2. Moscow–Leningrad: Publ. USSR Ac. Sciences.
Fursey, G. N., Ptitsin, V. E., and Krotevich, D. N. (1984). "Spontaneous Migration of the Surface Atoms at Top Current Densities of Field Emission Initiating Vacuum Breakdown." Proc. XI Intern. Symp. on Discharges and Electrical Insulation in Vacuum 1, 69.
Fursey, G. N., Zhukov, V. M., and Baskin, L. M. (1984). "Limiting Values of Field Emission Current Density and Pre-Explosion Effects." In High Emission Current Electronics (Ed. G. A. Mesyats). Novosibirsk: Nauka Publ., Siberian Branch (in Russian).
Gadzuk, J. W., and Plummer, E. W. (1973). "Field Emission Energy Distribution." Rev. Modern Phys. 45, 487.
Geguzin, Ya. E., and Koganovsky, Yu. S. (1984). Diffusion Processes at the Crystal Surface. Moscow: Energoatomizdat (in Russian).
Glazanov, D. V., Baskin, L. M., and Fursey, G. N. (1989). "Kinetics of Pulsed Heating of Pointed Field Emission Cathodes of Real Geometry by High Density Emission Current." Zhurnal Tekhnicheskoi Fiziki 59, 60 (in Russian).
Gor'kov, V. A., Elinson, M. I., and Yakovleva, G. D. (1962). "Theoretical and Experimental Studies of Pre-Breakdown Phenomena in Field Electron Emission." Radiotekhnika i Elektronika 7, 1501 (in Russian).
Hirth, J., and Pound, G. (1957). "Evaporation of Metal Crystals." J. Chem. Phys. 26, 1216.
Ion Bombardment Sputtering of Solids (1984). Issue 1 (Ed. R. Behrisch). Moscow (in Russian).
Kaganov, M. I., Lifshits, I. M., and Tanatarov, L. V. (1956). "Relaxation between the Electrons and the Phonons." Zhurnal Experimentalnoi i Teoreticheskoi Fiziki 31, 232 (in Russian).
Kaminsky, M. (1967). Atomic and Ionic Collisions at the Metal Surface. Moscow: Mir, 338 (in Russian).
Knauer, W. (1981). "Energy Broadening in Field Emitted Electron and Ion Beams." Optik 59, 335.
Kompaneets, A. S. (1959). "About the Influence of the Space Charge on the Field Emission." Dokladi Akademii Nauk SSSR 128, 1160 (in Russian).
Krendel, Y. E. (1977). Plasma Electron Sources. Moscow (in Russian).
Krotevich, D. N. (1985). Ph.D. Degree Thesis, All-Union Research Center for Surface and Vacuum Properties Studies, Moscow (in Russian).
Krotevich, D. N., Ptitsin, V. E., and Fursey, G. N. (1986). "Observation of Fine Structure of the Restructured Microcrystal Surface by Pulsed Field Emission Microscopy." Fizika Tverdogo Tela 28, 3722 (in Russian).
Krotevich, D. N., Ptitsin, V. E., and Fursey, G. N. (1985). "Spontaneous Restructurization of the Field Emission Cathode at the Ultimate Field Emission Current Density Take-off." Z. Tekh. Phys. 55, 625 (in Russian).
Landau, L. D., and Lifshits, E. M. (1982). Electrodynamics of Continuous Media. Moscow: Nauka (in Russian).
Landau, L. D., and Lifshits, E. M. (1973). Mechanics. Moscow: Nauka (in Russian).
Landau, L. D., and Lifshits, E. M. (1976). Statistical Physics, Pt. 1. Moscow: Nauka (in Russian).
Levine, P. H. (1962). "Thermoelectronic Phenomena Associated with Field Emission." J. Appl. Phys. 33, 582.
Lewis, T. J. (1956). Phys. Rev. 101, 1694.
Martin, E. E. (1960). "Research on Field Emission Cathodes." Air Develop. Div., Ohio, Tech. Rep. No. 59-20 (AD-272760), Field Emission Corp., McMinnville, Oregon.
Mesyats, G. A., and Proskurovsky, D. I. (1984). Pulsed Electric Discharge in Vacuum. Novosibirsk: Nauka (in Russian).
Mitterauer, J., Till, R., and Fraunschiel, E. (1975). "The Temperature of Field Emitting Surface." Proc. XII Int. Conf. on Phenomena in Ionised Gases, Eindhoven, 1975, Contr. Papers, Amsterdam, Part 1, p. 249.
Modinos, A. (1990). Field, Thermionic and Secondary Electron Emission Spectroscopy (Ed. G. N. Fursey). Moscow: Nauka.
Molokovsky, S. I., and Sushkov, A. D. (1991). Intense Electron and Ion Beams. Moscow: Energoatomizdat, p. 35 (in Russian).
Muller, E. W., and Tsong, T. T. (1972). Field Ion Microscopy. Moscow: Metallurgy (in Russian).
Murphy, E. L., and Good, R. H. (1956). "Thermionic Emission, Field Emission, and the Transition Region." Phys. Rev. 102, 1464.
Nakamura, S., and Kuroda, T. (1969). "On Field Evaporation and Forms of a bcc Metal Surface Observed by a Field Ion Microscope." Surf. Sci. 17, 346.
Nonheating Cathodes (1974). (Ed. M. I. Elinson). Moscow: Sov. Radio (in Russian).
Nottingham, W. B. (1941). Phys. Rev. 59, 907.
Primary Processes of Crystal Growth (1959). Coll. Papers (Eds. G. G. Lemmlein and A. A. Chernov). Moscow: Publ. House of Foreign Literature (in Russian).
Ptitsin, V. E. (1990). "On a Mechanism of Vacuum Breakdown Initiated in a Point-Cathode Diode." Proc. XIV Int. Symp. on Discharges and Electrical Insulation in Vacuum, Santa Fe, USA, p. 77.
Ptitsin, V. E. (1990). "Surface Diffusion and Prebreakdown Phenomena." Proc. XIV Int. Symp. on Discharges and Electrical Insulation in Vacuum, Santa Fe, USA, p. 269.
Ptitsin, V. E. (1991). "Instability of Thermal Field Electron Emission." Surface Science 246, 373.
Ptitsin, V. E. (1992). "Atom Desorption Effects on Electron Emission in Intense Electric Fields." IAI RAS (in Russian).
Ptitsin, V. E. (1992). "To the Problem of the Vacuum Breakdown." Pisma v Zhurnal Experimentalnoi i Teoreticheskoi Fiziki 55, 325 (in Russian).
Ptitsin, V. E. (1993). "Instability of the Metal Microcrystal Surface at Intense Electron Emission." J. Vac. Science and Technology A 11(5), 2447.
Ptitsin, V. E. (1996). "Thermal Field Processes Activated by Exposure of Condensed Matter to Strong Electric Fields and Concentrated Energy Fluxes." Doct. Sci. Degree Thesis, Inst. Analytical Instrumentation, Russian Academy of Sciences, St. Petersburg.
Ptitsin, V. E. (1996). "Thermal Field Processes Activated by Exposure of Condensed Matter to Strong Electric Fields and Concentrated Energy Fluxes." Autoreview of Doct. Sci. Degree Thesis, Inst. Analytical Instrumentation, Russian Academy of Sciences, St. Petersburg.
Ptitsin, V. E., and Koltsov, S. N. (1998). "Calculation of Field Strengths Caused by Emitted Electrons at the Field Emission Cathode Surface Using the Imaging Technique." Izvestia Akademii Nauk Russian Fed. (Fiz. Ser.), No. 10, p. 1991 (in Russian).
Ptitsin, V. E., Komyak, N. I., and Koltsov, S. N. (1998). "Emitted Electron Space Charge Effect on Thermal Field Emission." Doklady Akademii Nauk Russian Fed., No. 3 (in Russian).
Rakhovsky, V. I. (1970). Physical Bases of Electrical Current Switching in Vacuum. Moscow: Nauka (in Russian).
Rayzer, Yu. P. (1987). Physics of Gas Discharge. Moscow: Nauka (in Russian).
Shrednik, V. N., Pavlov, V. G., Rabinovich, A. A., and Shaikhin, B. M. (1974). "Intense Electric Field and Heating Effects on Metallic Tips." Izvestia AN SSSR (Phys. Ser.) 38, 296 (in Russian).
Slivkov, I. N. (1986). High-Voltage Processes in Vacuum. Moscow: Energoatomizdat (in Russian).
Smirnov, V. I. (1969). Course of Higher Mathematics, Vol. 3. Moscow: Nauka (in Russian).
Sokolskaya, I. L. (1956). "Surface Migration of the W Atoms in the Electric Field." Z. Techn. Fiz. 26, 1177 (in Russian).
Swanson, L. W., and Bell, A. E. (1973). In Advances in Electronics and Electron Physics 32, 193 (Ed. L. Marton). New York: Academic Press.
Swanson, L. W., Crouser, L. C., and Charbonnier, F. M. (1966). "Energy Exchanges Attending Field Electron Emission." Phys. Rev. 151, 327.
Swanson, L. W., and Crouser, L. C. (1967). "Total Energy Distribution of Field-Emitted Electrons and Single-Plane Work Functions for Tungsten." Phys. Rev. 163, 622.
Tunneling Phenomena in Solids (1973). (Ed. Ya. I. Perel). Moscow: Mir (in Russian).
Vibrans, G. E. (1964). "Vacuum Voltage Breakdown as Thermal Instability of the Emitting Protrusions." J. Appl. Phys. 35, 2855.
Wang, S. C., and Tsong, T. T. (1982). "Field and Temperature Dependence of the Directional Walk of Single Adsorbed W Atoms on the W(110) Plane." Phys. Rev. B 26, 6470.
Wood, R. W. (1897). "A New Form of Cathode Discharge and the Production of X-Rays, together with Some Notes on Diffraction." Phys. Rev. 5, 1.
Young, R. D. (1959). "Theoretical Total-Energy Distribution of Field-Emitted Electrons." Phys. Rev. 113, 110.
Zhdanov, V. P. (1988). Physicochemical Surface Processes. Novosibirsk: Nauka (in Russian).
Zhukov, V. M., and Polezhaev, S. A. (1988). "Changing of the Pointed Emitter Surface in Nanosecond Electric Fields." Radiotekhnika i Elektronika 33, 2360 (in Russian).
Zhukov, V. M., and Polezhaev, S. A. (1989). "Evolution of the Microcrystal Surface at the Point Tip in the Thermal Field Environment." Z. Tekhn. Fiz. 59, 130 (in Russian).
Ziman, J. (1962). Electrons and Phonons. Moscow: Publ. House of Foreign Literature (in Russian).
Appendix

About the Feasibility of Creating a High Brightness and Angular Emission Intensity Thermal Field Cathode for Electron "Quasi-Lasers"

The inferences drawn on the mechanism of emission instability in high electric fields have helped, first, to identify the physical factors limiting the emission capacity of conventional thermal field pointed emitters, which are usually fabricated by electrochemical etching of wires made of refractory transition metals (W, Mo, Ta, Nb), and, second, to offer a conceptually new nanotechnology for making pointed emitters with unique electron-optical characteristics: a brightness of up to 10⁹ A/(cm²·sr) and an angular emission intensity of 10⁻² A/sr. Based on the conceptions developed, it has become possible to formulate the physical conditions which should be met by an "ideal" emitter (electron source) for an electron "quasi-laser" (EQL). These conditions are as follows.

1. The emitting surface of a pointed emitter should be a closely packed MC plane. This is necessary to minimize the surface concentration of native atoms of the emitter substance, which, at high emission current densities, go from
tightly bound intralattice states to loosely bound surface states, that is, adatom states, owing to self-heating of the MC tip and excitation of the electron subsystem of the subsurface layer. A minimum surface concentration of atoms in adatom states minimizes the surface self-diffusion fluxes and the activated evaporation of adatoms into vacuum. The latter provides the maximum stability of emission, owing to the reduction of emission current fluctuations (noise) and of the counterflow of ions bombarding the emitting surface.

2. The mean binding (or cohesion) energy of the atoms forming the emitting surface should be as high as possible. This condition also implies minimum self-diffusion and activated-evaporation flows, thus ensuring a stable geometry of the emitting surface and, hence, the long-term stability of the electron-optical parameters of the cathode-emitter.

3. The work function of the emitting surface should be as low as possible. This is necessary to keep the stationary TFE from passing into an unstable stage up to the maximum possible (or limiting) values of the emission current density J_max. In particular, our calculations have shown that reducing the work function of the emitting surface from 4.5 eV (for W) to 2.8 eV (for ZrO/W(100)) results in "shifting" the J_max value from 10⁷ A/cm² to 10⁸ A/cm².

4. Thermal calculations relating to emitter tip self-heating by the high-density emission current have shown that J_max depends also on the curvature radius of the emitting surface. The value of J_max was calculated to be on the order of 10⁸ A/cm², with the surface curvature radius being comparable to the characteristic electron mean free path with respect to electron-phonon scattering.

It has been found that the above combination of properties of an "ideal" electron emitter for an EQL is exhibited by ZrO/W(100) thermal-field emitters made using special emitting-surface-forming techniques.
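The sensitivity to work function invoked in condition 3 can be illustrated with the elementary Fowler-Nordheim expression for the field-emission current density, J = a F²/φ · exp(−b φ^{3/2}/F). This is a standard textbook formula without image-force corrections, not the author's calculation, and the field value used below is an assumption:

```python
import math

# Elementary Fowler-Nordheim current density (no image-force correction).
# Constants a and b are the standard elementary FN values.
A_FN = 1.541434e-6    # A eV / V^2
B_FN = 6.830890e9     # eV^-3/2 V / m

def j_fn(field_V_per_m, phi_eV):
    """Fowler-Nordheim current density estimate, A/m^2."""
    return (A_FN * field_V_per_m**2 / phi_eV) * math.exp(
        -B_FN * phi_eV**1.5 / field_V_per_m)

F = 5e9                 # 5 V/nm (assumed illustrative field)
j_w = j_fn(F, 4.5)      # clean W
j_zro = j_fn(F, 2.8)    # ZrO/W(100)
print(j_zro / j_w)      # lowering phi raises J by roughly three orders
```

The exponential dependence on φ^{3/2} is what lets a modest work-function reduction shift the attainable current density by orders of magnitude at a fixed field.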
Omitting the details of the nanotechnology developed, we only note that it is based on known data according to which, under thermal-field rearrangement in an intense electric field, it is possible to facet the original rounded-smoothed W MC tip in a controllable and reproducible manner, as well as on the fact that thermal-field emission electrons can be confined in small solid angles by means of selective adsorption of Zr atoms on closely packed W(100)-type faces. In view of the aforesaid, the physical-nanotechnological principle of the new technique for enhancing the emission capacity, which we called the "dual localization" of emission, consists of the successive application of two known localization technologies. This was implemented as follows. The first step consisted in the rearrangement of the W(100) MC in an intense electric field, which lasted until a tetrahedral angle formed in the (100)
direction. It was followed by the deposition of a Zr monolayer on the facetted MC surface with a calibrated molecular gun. The thermal-field ZrO/W(100) cathode thus fabricated exhibited, in the stationary mode of TFE, the unparalleled brightness and angular-emission-intensity values claimed above. Other electron-optical characteristics of the cathodes, such as the half-width of the emission-electron energy distribution and the noise spectral power density, were measured at various stages of the ZrO/W(100) cathode development in automated ultrahigh-vacuum units. The measurements have shown that the above characteristics substantially depend on the thermal-field conditions of the ZrO/W(100) cathode operation. The optimal conditions were found to be those of the Schottky emission mode.

Emission and Electron-Optical Parameters of the Thermal Field ZrO/W(100) Cathode with High Brightness and Angular Emission Intensity

1. Brightness in the stationary mode of emission: up to 10⁹ A/(cm²·sr)
2. Angular emission intensity in the stationary mode of emission: up to 10 mA/sr
3. Total current in the stationary mode of emission: up to 5–10 mA
4. Half-width at half-height of the emitted-electron energy distribution in the Schottky emission mode: 1.5 eV
5. Operating temperatures: 1500–1800 K
6. Emission current stability: 1 %/hr
7. Operating vacuum: 10⁻⁷ Pa
8. Service life: 2000 hr
So the theoretical and experimental results obtained have demonstrated the theoretical and practical feasibility of creating electron emitters providing electron flow densities comparable to the photon densities reached with high-power laser sources. To solve the problem of forming high-power-density submicron electron probes from the electron fluxes emitted by ZrO/W(100) cathodes, a versatile multilens electron-optical system has been developed, whereby it is possible to find an optimal aperture and to correct for the aberrations of the lenses in the electron gun in accordance with the probe current, electron energy, and working distance to the object. Based on calculations of the aberration coefficients (AC) of a lens system with a virtual intermediate electron-source image, the possibility has been demonstrated of building an electron gun with stepwise correction of aberrations, wherein each preceding lens reduces the AC of the subsequent lenses. By combining double-pole and single-pole magnetic lenses with unarmored coils and electrostatic
lenses, it is possible to obtain a versatile lens system with minimum aberrations for the entire range of probe currents, electron energies, and working distances to the object. The calculation of electron beam power densities in an electron gun with the developed thermal field cathodes shows that, at quite attainable average AC values of about 0.5 cm and probe sizes of 0.1 µm, the power density values at the specimen surface are equal to (10⁶–10⁷) W/cm². Further optimization of the electron gun for the EQL would result in even higher power density levels, up to 10⁸ W/cm² and over. The numerically calculated estimates of the attainable electron-optical parameters of such electron guns with a thermal field ZrO/W(100) cathode are given below.

Electron-Optical Parameters of the Electron Gun for EQL

1. Electron probe power density range on the exposed surface: 10⁶–10⁷ W/cm²
2. Electron probe energy range on the exposed surface: 0.2–5.0 keV
3. Electron probe diameter range on the exposed surface: 2–1000 nm
4. Electron probe minimum diameter: 2 nm
5. Beam current: 10⁻⁸–10⁻⁶ A
6. Working distance: 0.6–5 cm
7. Residual gas pressure in the cathode region: 10⁻⁷ Pa
8. Overall dimensions of the electron gun (diameter × length): 5 × 14 cm
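The probe power density follows from elementary arithmetic, P/A = VI/(π(d/2)²). In the sketch below (Python), the beam voltage and current are illustrative assumptions chosen within ranges of the kind listed above, not the table's own figures:

```python
import math

# Rough power-density arithmetic for a focused electron probe.
# All values below are illustrative assumptions.
V = 5.0e3        # V   (beam energy 5 keV)
I = 1.0e-6       # A   (assumed probe current)
d = 0.1e-6       # m   (0.1 um probe diameter)

area = math.pi * (d / 2.0)**2            # spot area, m^2
p_density_W_per_cm2 = V * I / area / 1e4 # convert W/m^2 -> W/cm^2
print(p_density_W_per_cm2)               # ~6e7 W/cm^2
```

A milliwatt-scale beam focused into a 0.1 µm spot thus already reaches tens of MW/cm², which is the scale of power density under discussion.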
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 112
Theory of Ranked-Order Filters with Applications to Feature Extraction and Interpretive Transforms

BART WILBURN

University of Arizona, Optical Sciences Center, Tucson, Arizona
I. Introduction 233
II. Statistical Approach to Ranked-Order Filters 235
III. Mathematical Logic Approach to Ranked-Order Filters 241
   A. Logical Construction 243
   B. Logical Investigation 247
   C. Two-Dimensional Analysis 255
   D. The Grammar of (L) Fixed-Point Root Combinations 268
   E. Oscillating Roots 282
   F. Octagonal, Hexagonal, and 3D Filters 292
IV. A Language Model Based on Ranked-Order Filters 307
   A. The Necessity and Possibility of Interpretive Transforms of Imagery 308
   B. Satisfaction of a Propositional Language System 314
   C. Reflections 323
   D. Ontological Considerations 325
V. Conclusions 331
References 332
I. Introduction

In every discipline there are objects and constructs of intellect that are almost trivially simple in form, and yet are fascinating because they exhibit complex behavior. The discipline of mathematics is the host of many such constructs, and the ranked-order filter is one of them. The median window filter is the most common manifestation of the ranked-order filter, and it is extraordinarily simple in construction. However, an understanding of its behavior, especially in two dimensions, has been slow in coming. Indeed, for lack of understanding, the median window filter has until recently fallen from grace and been regarded as a minor tool of limited use. The purpose of this essay is to shed some light on the behavior of the general class of ranked-order filters exemplified by the median window filter. An additional purpose of this essay is to suggest some rather promising applications of this understanding for feature extraction from imagery, and perhaps also for linking feature extraction to automated image interpretation, or artificial intelligence.
BART WILBURN
We have known about ranked-order filters in various forms for a long time, the best-known form being the median window filter. These filters take various names, such as stack filters, median, minimum, or mean-square windows, and so on. The commonality of these filters, explaining the term "ranked-order," is that they all determine the filter output according to rank in a sorted list of the data, or by a function of a sorted list of the data. The data in the sorted list are those selected by a window as it moves incrementally along a stream of data. Nothing could be simpler, yet we may wonder about all the attention given to them in this and many previous studies. The simplicity of the median window filter belies the elegant insight of Tukey (1970), who introduced it as a simple scheme for presenting the essential characteristics of data that have meaning. He referred to it as the "box" method of sampling (or presenting) the essential information-bearing characteristics of a set of data. The underlying philosophy of the method was the assumption that the meaning of a set of data was conveyed by the relationship of the data, one to another, in its context. Following this introduction, the "box" method was extended to become the "repeated median filter," or RMF. This underlying philosophy seems to have been largely overlooked, and the filter was popularly applied to sequential data for removing spiky noise while preserving edge gradients, that is, the features regarded as important. The median window indeed does this, but this application ignores its other, more interesting and sometimes vexing, characteristics associated with preserving edge gradients that became apparent after its introduction. Those characteristics were the phenomena of "roots," or patterns of data that are transparent to the filter. We will explore the roots and see how they may be applied to the problem of feature extraction without knowing the feature a priori.
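The windowed rank selection just described is easy to state concretely. The following sketch (illustrative only, not code from this chapter) slides a window of length N along a data stream and emits the element of a chosen rank from each sorted window; rank k of N = 2k + 1 gives the median window, rank 0 the minimum window:

```python
def ranked_order_filter(x, N, rank):
    # Slide a window of length N along x; emit the element of the given
    # rank (0 = minimum, N - 1 = maximum) from each sorted window.
    return [sorted(x[i:i + N])[rank] for i in range(len(x) - N + 1)]

x = [3, 1, 4, 1, 5, 9, 2, 6]
print(ranked_order_filter(x, 3, 1))   # median window, N = 3
print(ranked_order_filter(x, 3, 0))   # minimum window, N = 3
```

Note that the end samples are simply not emitted here; how the boundaries are treated varies among implementations.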
This author, at least, regards the possibility of this application as what distinguishes the median window from other filters, many of which are superior in noise-reduction capability. This chapter is divided into two parts: development of the theory of the filter, and then applications of the filter. Both are derived from other works (Wilburn, 1998a, 1998b), with some new developments not previously published. The development of the theory is approached from two perspectives, statistical and logical. The statistical analysis is presented in Section II and is concerned with the probability distribution of the filtered data and the associated signal-to-noise ratio (SNR). As noted, the filter has not been widely used for noise reduction, and the SNR theory of the filter has not been fully described. Nevertheless, the SNR theory is offered here for completeness. However, it is the logical theory that opens the door to our understanding of how the filter functions in a way that enables us to use it as a tool based on the morphology of the roots of
THEORY OF RANKED-ORDER FILTERS
the filter in one, two, and possibly three dimensions. The logical analysis of the filter is much richer in content and leads to the logical theory of the ranked-order filter presented in Section III.

II. Statistical Approach to Ranked-Order Filters

The roots of the filter are a relatively small number of patterns of data that satisfy a relational structure and are transparent to the filter. We will find that because these roots are defined by a relational structure, the co-joining, or conjunction, of them is similarly constrained by a relational structure. This understanding suggests that we may constrain the filter to explore applications such as feature extraction and the notion of automated image interpretation, and that we may design the filter to achieve other desired outcomes in signal processing.

The statistical approach to the ranked-order filter, whether formulated by a distribution function or by order statistics, has been the approach of most analyses of the filter prior to this one. The method of order statistics led to a description of the 1D roots, both fixed-point and oscillating, primarily presented for bi-valued data, but it fell short of a functional understanding of the filter sufficient to control it for detection of roots. The method of order statistics will be referred to again in Section III when we address the roots of the filter. The formulation of the filter as a multinomial distribution function leads to a solution for estimating the signal-to-noise ratio (SNR). As a practical matter, however, estimating the SNR has not been a subject of concern in most applications of the median window filter. This is because most applications of the filter have been for morphological effects, for example, to remove spikes or to preserve edges, and in those cases distortion, rather than SNR, was the matter of concern.
Nevertheless, the SNR solution is interesting in its own right mathematically, and is given here for completeness and because the method may find utility in other problems that can be presented in a similar form.

As remarked in the foregoing, the ranked-order (RO) filters derive their name from the datum output of the filter being a member of an ordered subset. We may express this somewhat more rigorously: the output datum of the filter is a member of an ordered subset (x_(1), . . . , x_(N)) of a set of input data {x_i}; {x_(j)} ⊂ {x_i}. In the case of the median window filter, the output of the filter is the median of (x_(1), . . . , x_(N)); thus N is odd and generally expressed as N = 2k + 1. The derivation of the generalized SNR for RO filters is based on the SNR defined as SNR = ⟨x⟩/σ(x), where ⟨x⟩ is the mean value of multivalued data and σ(x) is the standard deviation of those data. The input data to the filter {x_i} are assumed to be
independent and identically distributed (iid) according to a probability density function p_u(x), and the filtered data, or output data, are distributed according to a multinomial distribution p_w(x). The datum selected from the ordered subset, x ∈ (x_(1), . . . , x_(N)), must satisfy three conditions defining x as x = w, an output datum. The conditions are:

(a) that w ∈ (x, x + dx); probability p_u(x) dx;
(b) that n of (x_(1), . . . , x_(N)) are greater than w; probability (1 − P_u(x))^n;
(c) that m of (x_(1), . . . , x_(N)) are less than w; probability (P_u(x))^m.

For m + n + 1 = N, the rank of the filter is determined by m and n; for example, for a median window m = n. The multinomial construction (Frieden, 1998) of P_w(x) = ∫ p_w(x) dx is

  P_w(x) = N (N−1 choose m) ∫_−∞^x [∫_−∞^x′ p_u(t) dt]^m [1 − ∫_−∞^x′ p_u(t) dt]^n p_u(x′) dx′.   (1)

The task before us is to solve Eq. (1) for ⟨x_w⟩ and σ_w(x) to estimate the SNR of data filtered by a ranked-order filter. To do this, we must stipulate that the probability density p_u(x) is defined to be integrable over x ∈ X; P_u(x) = ∫_−∞^x p_u(t) dt. With this stipulation, we have:

  P_w(x) = N (N−1 choose m) ∫_−∞^x (P_u(t))^m (1 − P_u(t))^n p_u(t) dt.   (2)

We may now make use of a probability integral transform, y = P_u(x); dy = p_u(x) dx, and have P_w(x) expressed in Y-space as P_w(y):

  P_w(y) = B⁻¹(m + 1, n + 1) ∫_0^y t^m (1 − t)^n dt.   (3)

Please notice that the multinomial coefficient N (N−1 choose m) is identified as a reciprocal beta function, B⁻¹(m + 1, n + 1). The density function p_w(y) follows now as p_w(y) = B⁻¹(m + 1, n + 1) y^m (1 − y)^n. We can now see immediately that the first and second moments of p_w(y) are ratios of beta functions:

  ⟨y^q⟩ = B⁻¹(m + 1, n + 1) ∫_0^1 y^(m+q) (1 − y)^n dy,

  ⟨y⟩ = B(m + 2, n + 1) / B(m + 1, n + 1),   (4)

  ⟨y²⟩ = B(m + 3, n + 1) / B(m + 1, n + 1).   (5)
From these expressions, ⟨y⟩ and σ(y) are readily computed using

  B(α, β) = Γ(α)Γ(β)/Γ(α + β),   Γ(α + 1) = α!

The results are shown as follows for the median, maximum, and minimum windows:

  m = n = (N − 1)/2:   ⟨y⟩ = 1/2,   σ_med(y) = [4(N + 2)]^(−1/2);   (6)

  m = N − 1, n = 0:   ⟨y⟩ = N/(N + 1),   σ_max(y) = √N / [(N + 1)√(N + 2)];   (7)

  m = 0, n = N − 1:   ⟨y⟩ = 1/(N + 1),   σ_min(y) = σ_max(y).   (8)
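Eqs. (4) to (8) are easy to verify numerically. The following sketch (an illustrative check, not code from this chapter) computes ⟨y⟩ and σ(y) from the beta-function moments and compares them with the closed forms of Eqs. (6) and (7) for a window of length N = 9:

```python
from math import gamma, sqrt

def beta(a, b):
    # Euler beta function, B(a, b) = gamma(a) * gamma(b) / gamma(a + b)
    return gamma(a) * gamma(b) / gamma(a + b)

def y_moments(m, n):
    # Mean and standard deviation of p_w(y) = B^-1(m+1, n+1) y^m (1-y)^n
    mean = beta(m + 2, n + 1) / beta(m + 1, n + 1)      # Eq. (4)
    second = beta(m + 3, n + 1) / beta(m + 1, n + 1)    # Eq. (5)
    return mean, sqrt(second - mean ** 2)

N = 9
k = (N - 1) // 2
mean_med, sd_med = y_moments(k, k)        # median window, compare Eq. (6)
mean_max, sd_max = y_moments(N - 1, 0)    # maximum window, compare Eq. (7)
print(mean_med, sd_med)   # 0.5 and (4*(N+2))**-0.5
print(mean_max, sd_max)   # N/(N+1) and sqrt(N)/((N+1)*sqrt(N+2))
```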
As implied in the results, σ(y) is symmetric about ⟨y⟩. The reader may see quite readily that ⟨y⟩ and σ(y) could be evaluated for any arbitrary value of m or n, subject to m + n + 1 = N, corresponding to the rank of the output selected by the filter. The SNR(y) could, of course, be computed from these quantities, but it is of little value. The desired quantities are ⟨x_w⟩ and σ_w(x), and they are obtained by transforming from Y to X by P_u⁻¹(y), utilizing its properties as a transform function and as a probability function. It is important to realize that P_u(x) is in fact P_u(−∞, x); thus P_u(x) is isomorphic from X to Y and P_u⁻¹(y) is an inverse transform. Because P_u(x) is a probability transform, P_u⁻¹(y) is the fractile in X at probability y; thus P_u⁻¹(⟨y⟩) is the fractile at probability ⟨y⟩ and we have:

  P_u⁻¹(⟨y⟩) = ⟨x_w⟩.   (9)

It is not normally the case to deal with the mean value of a probability, thus it may be worth a cautionary note that ⟨x_w⟩ is not the mean value of the inverse transform, but is the inverse transform of the mean value of the probability, ⟨y⟩. The transform of σ(y) is not direct, by the same reasoning. The standard deviation is normally thought of as simply the square root of the variance, but here it is a measure in Y of probability y in the interval Δy of ⟨y⟩ + σ(y), or ⟨y⟩ − σ(y). For this reason we must have a reference point for σ(y), and that reference is ⟨y⟩. This is especially true because P_u⁻¹(y) is not symmetric in units of X about ⟨x_w⟩ in the general case of all (m, n) for some p_u(x) dx, or of all p_u(x) dx for any (m, n). In other words, σ_w(x) can depend on which side of ⟨x_w⟩ it is measured, and on the choice of (m, n), with respect to the symmetry of p_u(x). The solution is to form two measures, t⁺ = ⟨y⟩ + σ(y) and t⁻ = ⟨y⟩ − σ(y), such that P_u⁻¹(t^±) = ⟨x_w⟩ ± σ^±(x). The properties of this measure are that, whereas the intervals (⟨x_w⟩, ⟨x_w⟩ + σ⁺(x)) and (⟨x_w⟩, ⟨x_w⟩ − σ⁻(x)) are not equal for all cases of (m, n), and their sum is an unknown quantity according to p_u(x) dx, the corresponding intervals in Y, (⟨y⟩, t⁺) + (⟨y⟩, t⁻) = const., are computable according to Eqs. (6) to (8), invariant of p_u(x), for all cases. These conditions allow us to compute a σ_w(x) that subsumes asymmetric cases of p_u(x) dx and choices of (m, n) as:

  σ_w(x) = (1/2) [P_u⁻¹(y)] evaluated from t⁻ to t⁺.   (10)
The SNR_w(x) may now be computed as defined:

  SNR_w(x) = ⟨x_w⟩ / σ_w(x).   (11)

We will apply this formalism to three distributions of data: Rect(x), Normal(x), and Exp(x). The window filter used for explication is the median, because it is simple and also because it is the most commonly used ranked-order filter. For this case, m = n = k; k = (N − 1)/2.

Rect(x): The general formulation is:

  p_u(x) = (1/b) Rect((x − a)/b),   (12)

  ⟨y⟩ = (1/b)(⟨x⟩ − a + b/2),

  ⟨x_w⟩ = b⟨y_w⟩ + a − b/2.

When we apply the case of the median window, that is, "w" is "med" in these expressions, we get:

  ⟨x_med⟩ = b⟨y_med⟩ + a − b/2 = a.   (13)
Because p_u(x) is symmetric about a and m = n, P_u⁻¹(y) is symmetric about ⟨x_med⟩, P_u⁻¹(t⁺) = 2⟨x_med⟩ − P_u⁻¹(t⁻), and σ⁺(x) = σ⁻(x). Thus:

  σ_med(x) = b σ_med(y).   (14)

The resulting SNR for Rect(x) is:

  SNR_med(x) = (2a/b) √(N + 2).   (15)
Normal(x): The Normal distribution N(⟨x⟩, σ(x)) is most easily analyzed by temporarily transforming it to the Gauss distribution N(0, 1) by

  z = (x − ⟨x⟩) / (√2 σ(x));

thus, for p_u(z) = N(0, 1), P_u(z) is

  P_u(z) = (1/2)(1 + erf(z)).

Here again, for the case of the median window, we have the symmetries of m = n and of P_w(y) about ⟨x_med⟩, but in this case P_u(z) is nonlinear. The general formalism is:

  ⟨y⟩ = (1/2)(1 + erf(⟨z_w⟩)),

  σ(z) = (1/2) [erf⁻¹(2y − 1)] evaluated from t⁻ to t⁺.

From these relationships, and the odd symmetry erf(ζ) = −erf(−ζ), we realize the benefits of symmetry and find:

  ⟨z_med⟩ = 0,
  erf(σ(z)) = 2σ(y),
  σ(z) = erf⁻¹(2σ(y)).

For practical application in most cases, we may note that for σ(y) sufficiently small, that is, less than about 0.45 (N ≥ 3), erf(σ(z)) may be approximated by σ(z), within an error of 0.10, at the user's discretion. In any case, we have σ(z) in terms of N, and we may transform from Z back
to X with the result of:

  ⟨x_med⟩ = ⟨x⟩,   (16)

  σ_med(x) = √2 σ(x) σ(z),   (17)

  SNR_med(x) = ⟨x⟩ / [√2 σ(x) erf⁻¹((N + 2)^(−1/2))].   (18)

In the case of the maximum and minimum windows applied to Normal(x), the reader will note that the approximation of σ(z) cannot be made, that is, erf(σ(z)) ≠ σ(z). Moreover, the symmetries are lost, and the solution must proceed with tabulated data of the error function for a particular window of length N, using Eqs. (7) to (11).

Exp(x): The Exp(x) density function is asymmetric about any point in its domain and is nonlinear, yet the solution is closed for all choices of (m, n). The solution is shown for an arbitrary window filter and is derived in the same manner as for Rect(x) and Normal(x):

  P_u(x) = 1 − exp(−x/a),   (19)

  ⟨x_w⟩ = −a ln(1 − ⟨y_w⟩),   (20)

  σ_w(x) = −(a/2) ln[(1 − ⟨y⟩ − σ(y)) / (1 − ⟨y⟩ + σ(y))],   (21)

  SNR_w(x) = −a ln(1 − ⟨y_w⟩) / σ_w(x).   (22)

N.B.: The SNR_w(x) is not a function of the parameter a of the distribution, Eq. (19).

Verification

The estimates of SNR_w(x) derived in the foregoing are verified by a computer simulation of a median filter of length N = 9. The basis of comparison is the SNR gain G, defined as G = SNR_w(x)/SNR_u(x). The filter was applied to three strings of data X_n, n = 650, distributed according to:

  Rect(x): x = 2a·RANF(y); a = 5, b = 2a
  Exp(x): x = −a ln(1 − y); y = RANF(y), a = 5
  Normal(x): x = Rnorml(a, σ); a = 5, σ = a/(2√3).

The results are shown in Table I for comparison of the estimated and measured values of ⟨x⟩, ⟨x_w⟩, σ(x), σ_w(x), SNR(x), SNR_w(x), and G.
TABLE I
Verification of SNR_w(x)

                 Rect(x)           Normal(x)          Exp(x)
               est.    meas.     est.    meas.     est.    meas.
  ⟨x⟩:         5.00    4.96      5.00    5.04      5.00    5.08
  ⟨x_w⟩:       5.00    5.00      5.00    5.06      3.47    3.70
  σ(x):        2.89    2.87      1.44    1.50      5.00    5.24
  σ_w(x):      1.51    1.47      0.58    0.60      1.55    1.65
  SNR(x):      1.73    1.72      3.47    3.37      1.00    0.97
  SNR_w(x):    3.32    3.40      8.61    8.43      2.23    2.24
  G:           1.92    1.98      2.48    2.50      2.23    2.31
The reader may verify that in all cases, except for the maximum window applied to Rect(x), the SNR gain of the RO filter is less than that of an averaging window of the same length. We pay a price in the SNR gain of the RO filters to realize the benefit of their morphological characteristics. The morphological characteristics are determined by the roots of the filter, which leads us to the following section.

III. Mathematical Logic Approach to Ranked-Order Filters

As mentioned earlier, the characteristic of the ranked-order filter that has attracted attention is that the median window (MW) has roots. This notion derives from the property of the MW to preserve edge gradients while suppressing noise. This property, however, is a consequence of the more general characteristic that "some" patterns of data simply pass through the filter unchanged. The peculiarity of these patterns is that they are not defined by shape or value, but instead by the relationship of the data values to each other. Furthermore, these patterns are not restricted to edge gradients. This characteristic marks an important difference, often not appreciated, between the MW and almost all other digital filters. Almost all other filters replace the original data with another, different set of data; that is, they generate a replacement set of data, as in the cases of the average window, the Wiener filter, or the maximum-entropy filter. The MW instead does one of the following two things to data: (1) it rearranges the
data, replicating some and throwing some away; or (2) it leaves a string of data unchanged if it is a root of the filter. However, there is this caveat: it leaves a string of data unchanged if it is a root AND if the filter is correctly implemented. We shall see later how it is possible to implement the filter incorrectly, resulting in pseudoroots. Nevertheless, even then, it does not generate new data. Given in more mathematical terms: the range of the MW is a subset of the domain of the filter such that the output is an onto map of the input.

The relational nature of the roots of the MW, and the fact that the roots are characteristic of the filter, suggest that the methods of mathematical logic (Schoenfeld, 1967) would be a fruitful approach to the development of the theory of the RO filter exemplified by the MW. This is a departure from the previous theory based on order statistics (Justasson, 1982; Bovick et al., 1983), yet it builds directly on its results. The order statistics approach succeeded in the classification of roots of the MW in 1D, and provided the phenomenological basis for the logical theory by establishing the existence of fixed-point (FP) roots defined in terms of local monotonicity (Eberly et al., 1991; Longbotham, 1989).

The results of order statistics provided a description of roots and data types as follows. If we posit a subsequence of data S_r = (u_i, . . . , u_{i+r}) within any sequence S of data, and a median window filter Φ_med(N), such that the convolution of Φ_med(N) and S leaves S_r unchanged, Φ_med(N) ∗ S = S ⊇ S_r, then S_r is a root of Φ_med(N). This defines S_r as type-I data with respect to Φ_med(N). Data not having this property are called type-II data with respect to Φ_med(N). There are two kinds of roots: fixed-point roots and oscillating roots. Fixed-point roots have the property of contiguous data and are defined for both multivalued and binary (a, b; a ≠ b) data, whereas oscillating roots are defined only for binary data that oscillate between "a" and "b" as "abab . . . a." The fixed-point roots of the median filter have been analyzed by Tyan (1982) and Longbotham (1989) using the methods of order statistics and described in terms of monotonicity (paraphrased here in the language of sets) as: a root of a median filter of length N, N = 2k + 1, is a set of data {u} = (u_1, . . . , u_n) such that for each u_j ∈ {u}, the subsequence (u_{j−k}, . . . , u_{j+k}) is either monotonically nonincreasing, u_{j−k} ≥ · · · ≥ u_{j+k}, or monotonically nondecreasing, u_{j−k} ≤ · · · ≤ u_{j+k}, but not both, within a window of length N spanning j − k to j + k and connecting each successive datum u_j ∈ {u}. Such a sequence of data is called locally monotonic of order k + 1, and designated LOMO(k + 1). This theorem is proved in this investigation as part of the development of a computable structure of a filter function that extends easily from one dimension to two dimensions and implies the existence of a logical grammar of fixed points.
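Both the local-monotonicity property and the root property can be tested directly. In the sketch below (illustrative only; the boundary treatment, replication of the end samples, is an assumption not specified in the text), a locally monotonic ramp-plateau sequence passes through a median window of length N = 9 unchanged, while an isolated spike, type-II data, does not:

```python
def is_lomo(u, order):
    # True if every run of `order` consecutive samples is monotonic
    for i in range(len(u) - order + 1):
        w = list(u[i:i + order])
        if w != sorted(w) and w != sorted(w, reverse=True):
            return False
    return True

def median_pass(u, N):
    # One pass of the median window; the ends are padded by replication
    k = (N - 1) // 2
    p = [u[0]] * k + list(u) + [u[-1]] * k
    return [sorted(p[i:i + N])[k] for i in range(len(u))]

N, k = 9, 4
root = [0, 1, 2, 3, 5, 5, 5, 5, 5, 3, 2, 1, 0]     # ramp up, plateau, ramp down
spike = [0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0]    # an isolated spike
print(is_lomo(root, k + 1), median_pass(root, N) == root)     # True True
print(is_lomo(spike, k + 1), median_pass(spike, N) == spike)  # False False
```

The spike is both non-monotonic locally and altered by the filter; the ramp-plateau object is transparent to it.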
The development of the logical theory will involve two steps: (1) logical construction, and (2) logical investigation. The logical construction will develop the schema for a mathematical model of the filter based on its structure and function. The logical investigation will employ this model to investigate the structure of data satisfying the conditions for roots, and will further investigate the mathematical properties of the model for applications to signal and feature extraction. The investigation will begin with the 1D case for fixed-point (FP) roots and oscillating roots, and then introduce the coded window filter as a generalization of oscillating roots. The investigation will continue with analysis of root structure and behavior for the 2D case in terms of root morphology and syntactic structure, and will include example applications to feature extraction based on FP root representation of features. The chapter will conclude with an exploration of the development of a language model of features in imagery based on FP representation of features.

A. Logical Construction

We must begin this development with a description of the data u_i. This may seem trivial, but it is not. Data are usually regarded as a sequence of values, but here the data are regarded as entities that have properties. The properties we are concerned with are value, or amplitude, and position in a sequence. This is an important concept because it will enable us to separate the notions of value and position. The construction proceeds as follows. We may consider the data as being composed of terms designating individual quantities having property u. Let us take note that if t_1, . . . , t_n are terms, and a function f is n-ary, then f t_1 . . . t_n is a term. The data set u may be considered to be of this form, where u is the symbol that is n-ary such that u t_1 . . . t_n is a term. The number n is a natural number determined by u and is the index of u.

We may use this form to associate with every u_i in a sequence u a term designating its position in that sequence. This convention enables separation of the index of a variable from the value of the variable, so that we may construct functions of its position in a sequence independently of its value, or functions of its value independently of its position. The notational convention adopted is u_i = φ(u)_i ν_i, i ≤ n, such that with every n-tuple u = (u_1, . . . , u_n) are associated functions (φ(u_1), . . . , φ(u_n)) and (ν_1, . . . , ν_n), where the φ(u_i) are the values of the data elements in the positions indicated by the ν_i. We may begin to reflect the structure of the MW with this representation of the data by defining a subsequence of u, u_i^N, that extends from i to i + 2k:

  u_i^N = φ(u_i^N) ν_i^N:  φ(u_i^N) = (φ(u_i), φ(u_{i+1}), . . . , φ(u_{i+2k})),  ν_i^N = (ν_i, ν_{i+1}, . . . , ν_{i+2k}).
We notice some notational difficulties here if we want to retain identification of the designator with the index i. To avoid this, we may define a recursive function θ(a_i, j), a_i = (a_0, . . . , a_{N−1}):

  θ(a_i, j) = a_j,  j = 0, . . . , N − 1,
  θ(φ(u_i), j) = φ(u_i)_j,
  θ(ν_i, j) = ν_j,
  φ(u_i) = (φ(u_i)_0, . . . , φ(u_i)_{N−1}),
  ν_i = (ν_0, . . . , ν_{N−1}),
  u_i^N = φ(u_i) ν_i.

This function allows us to define u_i^N = φ(u_i) ν_i, where φ(u_i)_j is the value of the jth term of u_i^N, separable from its position in an ordered sequence (ν_0, . . . , ν_{N−1}), and it preserves all necessary information to recover u from the u_i^N by logical addition of the u_i^N. This representation of the data defines a window u_i^N of length N that selects N of u and associates an index j = 0, . . . , N − 1 with them for every increment of i, i = 1, . . . , n − (N − 1), and constitutes a sampling function u_i^N of u. The u_i^N form a set u^N = (u_1^N, . . . , u_{n−2k}^N), where every u_i^N is u_i^N = (u_i, u_{i+1}, . . . , u_{i+2k}), i = 1, . . . , n − 2k. The set u^N is a subset of the power set of u, and u = ⋃_{i=1}^{n−2k} u_i^N. We will employ this sampling function to construct the filter function applied to a data sequence u.

The structure of the logical theory of the median filter is in terms of predicates and functions having the recursive properties of K_R, "=", and μ:

  F(a) = μx(M(a, x) = 0),

where μx(. . . x . . .) is the μ-operator, defined as the minimum x such that . . . x . . . is true. Some readers may not be familiar with the use of a predicate in this context. It is used here as a representation of the data that has semantic content; that is, the predicate "says something" about the data that is either true or false. In the simplest cases, it may seem a trivially assumed state by construction, for example, that the data are represented by the sampling function. We shall see, however, that the use of a predicate allows more complex statements about the data that enable construction of a filter function useful for applications such as we shall seek.

Thus we define a recursive predicate R(a) and its representing function K_R(a):

  K_R(a) = 0 if R(a) (read as: R(a) is true)
  K_R(a) = 1 if ¬R(a);  Def.: "¬" is negation.

We further define a function δ(a), subject to the predicate being true, as δ(a) ≃ R(a), to mean that both R and δ are defined for the argument a, or that both are undefined. In a pragmatic sense, we may read this as δ(a) having a value, not necessarily "0," as a function of a if and only if R(a) is true of a. We may generalize this notion to δ(a_1) ≃ R(a_1), δ(a_2) ≃ R(a_2), . . . , δ(a_q) ≃ R(a_q), and apply the generalization to the median filter Φ_med(N) as a function of u_i^N, ψ(u_i^N, u′), where u is the input and u′ is the output. The result is the filtered output u′ = (φ(u′)_1, . . . , φ(u′)_n), represented by a filter function I(u, u′) partially defined as follows:

  I(u_1^N, u′) = ψ(u_1^N, u′) iff R(u_1^N),
  . . .
  I(u_{n−2k}^N, u′) = ψ(u_{n−2k}^N, u′) iff R(u_{n−2k}^N),

where R(u_i^N) ≃ μx(K_R(u_i^N, x) = 0). In order to complete the definition of the filter function, we must define the function ψ(u_i^N, u′) more explicitly as a window function:

  ψ(u_i^N, u′) = μx[H(G(u_i^N, x), u′)],  u = ⋃_{i=1}^{n−2k} u_i^N.

The function G(u_i^N, x) involves the introduction of two functions: a selection function σ_m((x_0, . . . , x_{N−1})) = x_m, and an ordering function

  O(x) = (φ(x)_{λ_0}, φ(x)_{λ_1}, . . . , φ(x)_{λ_{N−1}});  φ(x)_{λ_0} ≤ φ(x)_{λ_1} ≤ · · · ≤ φ(x)_{λ_{N−1}}.

The ordering function is where we make first use of the separability of the value φ(u) from its designator ν, and assign a different designator, λ_l, l = 0, . . . , N − 1. (N.B.: the index of λ_l is determined by the value φ(u)_j.) The ordering function may be expressed in terms of the sampling function u_i^N as:

  O(u_i^N) = (φ(u_i)_{λ_0}, . . . , φ(u_i)_{λ_{N−1}}).

We continue by defining G in terms of the selection function and the ordering function as:

  G(u_i^N, φ(u_i)_m) = μx[x = σ_m(O(u_i^N)) ∧ φ(u_i)_m = x],

where m is the rank selected by the filter; for example, for Φ_med(N), m = k. (As mentioned, we shall be concerned with the median filter, thus m = k.) The purpose of G is to select the output of the filter for each window, thus we
think of G as the output selection function. The function H, in turn, is defined as a writing function:

  H(G(u_i^N, φ(u_i)_m), u′) = μx[φ(u′)_{i+k} = φ(u_i)_m ∧ φ(u′)_j = (0)_j, j ≠ i + k],

so that:

  ψ(u_i^N, u′) = ((0)_1, . . . , (0)_{i+k−1}, φ(u_i)_m, (0)_{i+k+1}, . . . , (0)_n).

We may make use of the properties of the sampling function as a member of a power set, and of the representing function of the predicate K_R(a), to construct the total filter function I(u, u′) as:

  I(u, u′) = ψ(u_1^N, u′)·K_R(u_1^N) ∪ · · · ∪ ψ(u_q^N, u′)·K_R(u_q^N),  q = n − 2k,  k = (N − 1)/2,

or

  I(u, u′) = ⋃_{i=1}^{n−2k} ψ(u_i^N, u′)·K_R(u_i^N).   (23)

An understanding of the filter function will be gained through the investigation of its roots in the following section, but before that let us consider its structure. The entire structure is based on the global union of the window function subject to the predicate being true. The arguments of the window function are the sampling function of the filter and the data selected by it, but otherwise it places no constraints on the form of the sampling function. This is most easily understood by realizing that the only reference in ψ(u_i^N, u′) to 1D as opposed to 2D is the dimension of the indexical scheme. This will become clearer when we undertake the analysis of 2D roots. The point is that it is the predicate R(u_i^N) that does all the work in defining the filter design and determining the filter response. It does this by representing the sampling function, and by imposing on the data presented by the sampling function to the window function the conditions necessary for the predicate to be true. The criterion of truth is the representation of intent and filter design. This is where we begin to see the potential of a predicate of the data to control the filter for specific, intended applications. The predicate R(u_i^N) represents the form and geometry of the data such that I(u, u′) is defined and true with respect to the intended use of I(u, u′).

The distinction among filter designs is determined by the geometry of the sampling function, which distinguishes between 1D and 2D filter designs. The satisfaction of filter intent is determined by conditions imposed on the data, such as to be either or both type-I or type-II, multivalued or binary, data. The predicate accomplishes these distinctions with the sampling function, and by forcing conditions (Schoenfeld, 1967) c to be correct, cor(c): conditions imposed on R(u_i^N) for R(u_i^N) to be true such that I(u_i^N, u′) = ψ(u_i^N, u′) iff R(u_i^N). Thus, the predicate is defined by the sampling function u_i^N, and any
forcing conditions, represented as:

  u_c(u_1, . . . , u_n) = [x : R_c(x, u_1, . . . , u_n, cor(c))],

read as: the set of all x such that R_c is true. The subscript c indicates correct conditions, cor(c), that force R_c(u_i^N) to reflect constraints on the data such that R: ψ(u_i^N, u′) ≃ R_c(u_i^N). It follows then that the intent to discover such special phenomena as FP or oscillating roots for a 1D or 2D filter becomes the discovery of u_c and cor(c) in the representation of the data by the predicate such that ψ(u_i^N, u′) ≃ R_c(u_i^N). The notion of the predicate constraint is central to the application of the filter to feature extraction, and the implementation will become clear by example in Section III.B, where we will illustrate these notions.

In summary, the predicate and the functions ψ(u_i^N, u′) = μx[H(G(u_i^N, x), u′)] and I(u, u′) constitute a conceptual framework of analysis wherein the predicate is the defining condition of the filter, as remarked in the foregoing. The functions consist of a window function ψ(u_i^N, u′), comprised of an output selection function G of the output datum and a writing function H of u′, and I(u, u′) is a composing function of u′ from u.

B. Logical Investigation

We may now address the solution of R_c(u_i^N) for an FP root of the filter
represented by I(u, u′). We will develop the notion in 1D and then extend it easily to 2D. The notion of an FP root is that there is a sequence of either multivalued or binary data, of some continuous length r in u, that is invariant to the filter. This means that if every value of r is selected by the filter, preserving its value and its identity in the sequence, then every datum of r must be in the kth position of O(u_i^N) and coincident with the kth position in u_i^N for each increment of i of a median filter of length N, N = 2k + 1. This requirement, that the output median data value be coincident with the median data position in u within u_i^N, defines a condition

  c:  φ(u_i)_k ν_k = φ(u_i)_k λ_k,  k = (N − 1)/2;  ν = λ = k at the median,   (24)

as necessary for R_c(u_i^N) to be true. This is a forcing condition in the predicate.

We must note here a very important consequence of the representation of the data by u_i = φ(u)_i ν_i in Eq. (24). The condition on the data described in Eq. (24) results in identity of the datum, regarded as an entity, in the median position of the sampling function with the entity in the median position of the ordering function. This is important not only because it permits the formalism of Eq. (23), but because it is the difference between
correct and incorrect implementation of the filter mentioned much earlier. The most common error in implementation is to sort the data in the ordering function according to the equivalence value of the data rather than according to the identity of entities. Failure to implement the filter according to Eq. (24) will result in false roots.

The satisfaction of Eq. (24) defines φ(u_i)_k to be the median value of u_i^N, and results in the following arrangement of data in the ordering function:

  O(u_i^N) = (φ(u_i)_{λ_0} ≤ · · · ≤ φ(u_i)_{λ_{k−1}} ≤ φ(u_i)_k ≤ φ(u_i)_{λ_{k+1}} ≤ · · · ≤ φ(u_i)_{λ_{N−1}}),  φ(u_i)_k = φ(u_i)_{λ_k},  ν_k = λ_k.

If we increment to u_{i+1}^N, then we have φ(u_{i+1})_{k−1} = φ(u_i)_k. If we do this and maintain the condition of coincidence, Eq. (24), then we also have ν_{k−1} = λ_{k−1} or ν_{k−1} = λ_{k+1}, depending on whether φ(u_{i+1})_k is greater or less than φ(u_i)_k. N.B.: This amounts to a corollary to the coincidence condition. We may incorporate this result into O(u_{i+1}^N) to obtain:

  O(u_{i+1}^N) = (φ(u_{i+1})_{λ_0} ≤ · · · ≤ φ(u_{i+1})_{k−1} ≤ φ(u_{i+1})_k ≤ · · · ≤ φ(u_{i+1})_{λ_{N−1}}),
or
  O(u_{i+1}^N) = (φ(u_{i+1})_{λ_0} ≤ · · · ≤ φ(u_{i+1})_k ≤ φ(u_{i+1})_{k−1} ≤ · · · ≤ φ(u_{i+1})_{λ_{N−1}}),  j ≤ k.   (25)

We can see now that the forcing condition cor(c(u_i^N)): ν_j = λ_j for j = i, . . . , i + k results in a predicate representation of the data u_i^N as a k + 1 sequence that is either monotonically nondecreasing or nonincreasing, as shown in Eqs. (26) and (27). This is a condition imposed on the data by the predicate for it to be an FP root of a median window filter Φ_med(N):

  O(u_i^N) = (φ(u_i)_{λ_0} ≤ · · · ≤ φ(u_i)_j ≤ · · · ≤ φ(u_i)_{j+k} ≤ · · · ≤ φ(u_i)_{λ_{N−1}}),  j ≤ k,   (26)
or
  O(u_i^N) = (φ(u_i)_{λ_0} ≥ · · · ≥ φ(u_i)_j ≥ · · · ≥ φ(u_i)_{j+k} ≥ · · · ≥ φ(u_i)_{λ_{N−1}}),  j ≤ k.   (27)

If we were to begin the analysis at i + 1 and decrement to i, we would produce the converse of Eqs. (26) and (27). This means that for a 1D filter along the x-axis, an object that is a root must be symmetrical in terms of local monotonicity, such that the sequence of data constituting the data object is locally monotonic on entry as well as on exit by the filter. This
defines a data sequence to be locally monotonic of order k + 1, denoted LOMO(k + 1).

This analysis illustrates how the predicate R_c(u_i), constrained by forcing conditions, satisfies the intended use of I(u, u_i). The subscript c indicates correct conditions, cor(c), that force R_c(u_i) to reflect the filter design and intent, in this case that I(u, u_i) of the median window filter of window N selects only FP roots of the filter, that is, output iff R_FP(u_i). The forcing condition is hereafter referred to for convenience as the coincidence condition. We must note that this case is LOMO(k + 1) defining an FP because the coincidence condition is applied to every datum in a k + 1 sequence. Another way to say this, which will prove useful later in the discussion of 2D feature extraction, is as follows: the predicate condition cor(c) on the data is that they must satisfy the coincidence condition for a k + 1 set of contiguous data. This condition is not given explicitly in Eq. (24), but could of course be incorporated as a super-predicate on I(u, u_i) as R_FP(u_i). We will see later that if the coincidence condition is applied not to every datum but, say, to every other datum, then it results in an oscillation root, and later still in a root defined by a repeated pattern of data in a coded median filter.

1. Example of 1D Fixed-Point Root Detection

Let us suppose a sequence of data described as follows, where φ(u) is listed in the upper row and the index ν in the lower row:

φ(u): 0 1 2 1 4 5 5 5 5 6 6 7 6
ν:    1 2 3 4 5 6 7 8 9 10 11 12 13
Now let us apply the sampling function u_i = φ(u_i)ν_i for a median window of N = 9, that is, j = 0–8, for each increment of i = 1–5. We may then write the ordering function followed by the resulting window function. This analysis will illustrate the conditions for u_i to satisfy LOMO(k + 1). The output of the selection function G(u_i, φ(u_i)^{r_m}), m = k, operating on O(u_i), is marked in brackets:

i = 1 (window over ν = 1–9, j = 0–8):
O(u_1) = (5 5 5 5 [4] 2 1 1 0),  ω(u, u_1) = 0 0 0 0 4 0 0 0 0

i = 2 (window over ν = 2–10):
O(u_2) = (6 5 5 5 [5] 4 2 1 1),  ω(u, u_2) = 0 0 0 0 5 0 0 0 0

i = 3 (window over ν = 3–11):
O(u_3) = (6 6 5 5 [5] 5 4 2 1),  ω(u, u_3) = 0 0 0 0 5 0 0 0 0

i = 4 (window over ν = 4–12):
O(u_4) = (7 6 6 5 [5] 5 5 4 1),  ω(u, u_4) = 0 0 0 0 5 0 0 0 0

i = 5 (window over ν = 5–13):
O(u_5) = (7 6 6 6 [5] 5 5 5 4),  ω(u, u_5) = 0 0 0 0 5 0 0 0 0

I(u, u_i) = ω(u, u_1) · K(R_c(u_1)) + ⋯ + ω(u, u_5) · K(R_c(u_5))
I(u, u) = 0 0 0 0 4 5 5 5 5 0 0 0 0.

We can see now that if we were to reverse the process and decrement i from 5 to 1, then we would produce the same result. An important observation can be made here that will be applied later in feature extraction. The implementation of the filter in the preceding incorporated the condition of coincidence R(u_i) = R_c(u_i), that is, for i = 1, …, 5, without explicitly stating so. The root was detected because the data set was a type-I sequence of data. If we had used R_FP(u_i) for I(u, u_i) where u was a somewhat longer set of data that included the type-I sequence used earlier but was otherwise type-II, then the result would have been the exclusive detection of the type-I sequence, with all else returned as zero. This follows from K(R_FP) = 0 in Eq. (23) for all but the type-I sequence.

We can also illustrate the phenomenon of the false root described earlier. Suppose we modified u to u′ by changing the last two data to zero:

φ(u′): 0 1 2 1 4 5 5 5 5 6 6 0 0
ν:     1 2 3 4 5 6 7 8 9 10 11 12 13
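The run above can be reproduced mechanically. The sketch below (hypothetical helper names, not the author's implementation) tests coincidence on values only, that is, it asks whether the datum at the window center equals the window median; as the text goes on to show, this relaxed, value-level test cannot distinguish the modified data u′ and so also reports the false root:

```python
def fp_root_detect(data, N):
    """I(u, u_i): pass the window median through only where the center
    datum has the median value of its window (a value-level stand-in
    for the coincidence condition); elsewhere output 0."""
    k = N // 2
    out = [0] * len(data)
    for i in range(k, len(data) - k):
        window = data[i - k : i + k + 1]
        median = sorted(window)[k]
        if data[i] == median:   # center value coincides with the median value
            out[i] = median
    return out

phi = [0, 1, 2, 1, 4, 5, 5, 5, 5, 6, 6, 7, 6]
print(fp_root_detect(phi, 9))
# [0, 0, 0, 0, 4, 5, 5, 5, 5, 0, 0, 0, 0], matching I(u, u) above

phi_prime = [0, 1, 2, 1, 4, 5, 5, 5, 5, 6, 6, 0, 0]
print(fp_root_detect(phi_prime, 9))
# the same output: the false root that identity-based ordering would reject
```

The second call illustrates exactly the failure mode discussed next: sorting by value alone cannot tell u from u′.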
This change will illustrate the subtle importance of the identity of the median datum and the median rank implied by the coincidence condition. Let us relax the coincidence condition from an identity defined on the entities to an equality defined on the values φ(u)_i, and then repeat the analysis for i = 1–5. This would be the same as sorting the data while allowing substitution of equivalent values. The data u′ fail to satisfy R_FP(u_i), but under a sort allowing substitution of equivalent values the analysis would produce the same result; it is, however, a false root. This false root is made possible by the substitution between equal values of φ(u) permitted by '=' but otherwise denied by identity on the entities, together with the corollary to the coincidence condition. The reason it is a false root, other than violation of the corollary, is that detection of the FP by substitution occurs only if the data φ(u)_i, i = 6–9, are of the same value, greater than or equal to φ(u)_5, even if the data are otherwise a monotonic sequence. Furthermore, basing the filter on substitution of data having equivalent value prohibits the control of the filter by forcing conditions on the predicate, which is necessary for feature extraction.

Finally, one additional comment on the 1D median window filter, which applies also to 2D filters: a filter unconstrained by the coincidence condition, applied to type-II data, produces type-I data as output. This comes about because the output of the filter is always the median value of the sampling function u_i; thus the datum in the median position of a sampling function of the filtered output is always the median value; ergo: type-I.

2. The Structure of 1D Fixed-Point Roots

The FP root is an object in itself, having the property that the data composing it are related and describable as one of several distinct types of locally monotonic patterns of relationships. In the case of a 1D filter, we see the following patterns:

(a) A monotonically increasing sequence of k + 1 terms, for example, an up-ramp of length k + 1, followed by a "ragged" plateau of k data all greater than any of the monotonic data in the ramp, and preceded by a "ragged" plateau of k data all less than any of the monotonic data in the ramp (Fig. 1).

(b) A monotonically decreasing sequence of k + 1 terms, for example, a down-ramp, followed by a "ragged" floor of k data all less than any of the monotonic data in the down-ramp, and preceded by a "ragged" plateau of k data all greater than any of the monotonic data in the down-ramp (Fig. 2).

(c) If the k + 1 monotonic data are related by '=' rather than '≥' or '≤,' then the modes of the leading and trailing sequences of k terms may be independently derived from the neutrality of '=,' allowing for a "pulse" of k + 1 binary data (Fig. 3).
Figure 1. Up-ramp.
Figure 2. Down-ramp.
Figure 3. Pulse.
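The three root shapes can be checked mechanically: a root is a sequence left unchanged by the filter. A short sketch (our own helper names; k = 2, so N = 5) verifying that an up-ramp with its ragged plateaus, and a binary pulse, are invariant under the median window filter:

```python
def median_filter(data, N):
    """One pass of the median window filter; border samples pass through."""
    k = N // 2
    return [sorted(data[i - k : i + k + 1])[k] if k <= i < len(data) - k else v
            for i, v in enumerate(data)]

k = 2
up_ramp = [1, 0] + [2, 3, 4] + [6, 5]   # k-plateau below, k+1 ramp, k-plateau above
pulse   = [0, 0] + [1, 1, 1] + [0, 0]   # binary pulse of k+1 data
assert median_filter(up_ramp, 2 * k + 1) == up_ramp   # an FP root
assert median_filter(pulse, 2 * k + 1) == pulse       # also an FP root
print("both sequences are fixed points")
```

Note that the plateaus may be "ragged" in any order, as long as every plateau datum stays on its side of the ramp.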
Figure 4. Alphabet of 1D FP roots.
The important point is that these FPs are independent of the absolute pixel values. The characteristic of a fixed-point pattern being a monotonic relationship of each of the φ(u)_j to all other φ(u)_j, j = 0, …, 2k, in a window i − k to i + k suggests that there is a definable grammar for the composition of subsets u_i to comprise a fixed-point set u, or sentence of data. The notion of a grammar of FP roots can be illustrated as follows. Suppose we assign the mnemonics ↑, ↓ and Π to the monotonic sequences of the three FP patterns described in the foregoing in (a), (b) and (c), and k⁺ and k⁻ to the increasing and decreasing "ragged" precursor and trailing data (Fig. 4). We may now see that the FP described in (a) is captured by the label k⁻↑k⁺, that of (b) by k⁺↓k⁻, and (c) by (k⁺ or k⁻)Π(k⁺ or k⁻). We can also see that the FPs denoted by these labels may be combined in certain allowable ways that preserve the character of being an FP. If we may assume the first and last k sequences of data appropriate for k⁺ or k⁻ for ↓ or ↑, respectively, we have the following allowable and nonallowable dyadic combinations that satisfy the conditions for an FP of the combination (Table II).

TABLE II
Grammar of 1D FPs

Allowed: ↑Π, Π↑, ↓Π, Π↓
Not allowed: ↑↓, ↓↑, ↑…↑, ↓…↓, Π…Π

In this structure, ↑, ↓ and Π are primitive characters, and k⁺ and k⁻ are constraints on the preceding and trailing k sequences of connective data. The character Π is universal, and ↑ and ↓ have a sense of polarity. Furthermore, the combination ↑…↑ denotes not a new dyad but a single k + 1 + n monotonically increasing sequence, and similarly for ↓…↓ or Π…Π. With these rules of combination, or grammar, we may compose allowable combinations of FP roots as words (Fig. 5) that are themselves FP roots, such as those shown in Fig. 4. It is important to realize here that the patterns ↑, ↓ and Π are patterns of relationships of the data and are independent of the actual values of the
data. There is much more work needed to be done on this notion of a grammatical structure, and there are variants of the aforementioned combination, such as overlaps of FP root patterns, still to be defined, but the possibilities of application are intriguing. For example, realizing our ability to constrain I(u, u_i) by R_FP(u_i), we may detect a sequence of such words as those in the preceding, separated by some specified interval of data. If such a sequence is found and represents a relationship of data that is a characteristic feature of some known object, then the sequence can be said to have semantic content and be regarded as a sentence. For example, a sentence S = k⁻[…Π…Π↓]k⁻ k⁻[ΠΠ↓Π]k⁻ would represent the structure shown in Fig. 6. It is important to note, however, that the sentence S described in the preceding is independent of the values of the data except insofar as they satisfy the relationship of values expressed by k⁻[…Π…Π↓]k⁻ k⁻[ΠΠ↓Π]k⁻; thus S is an intensional representation of Fig. 6. The notion of a syntactical structure of FP roots having semantic content, such as described earlier, will be explored more rigorously later. It is more important now to continue this line of thought into two dimensions.
Figure 5. Example FP word: k⁻[Π↓]k⁻.
Figure 6. Example FP sentence.
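One reading of the dyadic grammar above can be sketched as a word checker (our own symbols: U for the up-ramp, D for the down-ramp, P for the pulse; the rules follow the prose: the pulse is universal, the ramps have polarity and may not join each other directly, and X…X concatenations merge into a single longer primitive rather than forming a new dyad):

```python
# Hypothetical encoding of the dyadic FP grammar: the pulse P is
# universal, while the ramps U (up) and D (down) have polarity and
# may not be joined to each other directly.
ALLOWED = {("U", "P"), ("P", "U"), ("D", "P"), ("P", "D")}

def is_fp_word(word):
    """True if every adjacent pair of primitives is an allowed dyad."""
    return all(pair in ALLOWED for pair in zip(word, word[1:]))

assert is_fp_word("UPD")      # up-ramp, pulse, down-ramp: a "bump" word
assert not is_fp_word("UD")   # opposite ramps cannot join without a pulse
print("grammar check passed")
```

A sentence detector in the sense of the text would then look for several such words separated by specified intervals of connective k data.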
C. Two-Dimensional FP Analysis

We may construct 2D filters by 2D extension of 1D filters. The most obvious forms of this extension of the standard 1D median window filter of window N are the square filter □(L), L = N × N, or the cross filter ✚(L), L = 2N − 1, formed from a horizontal and a vertical window of N samples sharing a common center, for L = 2q + 1.
Examples of 2D filters ✚(L) and □(L), representing these structures derived from a 1D filter of N = 5, are presented in Fig. 7. These two filters are not the only kinds of 2D filter that one can construct, but they will serve to illustrate the analysis of 2D filter properties. The analysis of these filters will proceed with a mathematical investigation of the predicate and the ordering function, followed by a phenomenological investigation addressing FP roots. We mentioned earlier, in the derivation of the 1D case for I(u, u_i), that the distinction between 1D and 2D lies in the predicate given by the indexical scheme of the data and the sampling function. We will show this now, beginning with the indexical scheme. The data are assumed to be an array in X, Y for the case of 2D filters, with the origin in the upper left. The
Figure 7. 2D MW filter designs.
set of data u = {u_α} are defined on the array D ⊆ X × Y, so that when expressed in the form of designator terms the data are u_α = φ(u_α)ν_α, where the index α is an ordered pair α = (x, y), α = ((1, 1), …, (n, m)); thus u = (u_{(1,1)}, …, u_{(n,m)}). Following the same form used in the general derivation in the preceding, we may form the sampling function of u by a subsequence u_β = φ(u_β)ν_β, with β an ordered pair ranging over the window, β = ((0, 0), …, (2k, 2k)). Hereafter, the ordered-pair indices β and α will be abbreviated when convenient, and no confusion will result with the understanding that they are ordered pairs, that is, α = (x, y), or (x + k, y + k), etc. Hereafter, the distinction in 2D filters is provided by the sampling function, and the analysis proceeds as before for the 1D case. The constraints on the range of β reflect the design of the filter as the sampling of u. We further define an analogous ordering function O(u_α) = (φ(u_α)^{r_j}). The predicate R(u_α) is defined by u_α and any cor(c), such as the coincidence condition defined in 2D as the coincidence of the median entity of u_α with the median rank of O(u_α).

The square filter □(L) is introduced in the following with some trepidation because for L > 9 it is actually a false filter, in the sense that it does not have an isolated FP root; that is, its root is a quasi-root that spans the horizontal space of the data. An isolated root is one that is invariant under repeated passes of the filter. Furthermore, for L = 9, □(9) is actually not a square filter at all, but the smallest member of the class of octagonal ⊛(L) filters. We will show that the ⊛(L) filter is in fact a logical extension of the cross filter ✚(L).
For these reasons, we will start with the cross filter ✚(L), as its analysis is relatively straightforward and serves to introduce the methodology of 2D analysis.

1. ✚(L) Fixed Points

As we mentioned earlier, everything hinges on the predicate, with the sampling function and the correct conditions, specifically the coincidence condition and the condition for a contiguous set of data satisfying the coincidence condition. For that reason, let us begin with the ✚(L) sampling function. It is

u_α = (u_h ∪ u_v),  L = 2N − 1,  N = 2k + 1,  (28)

where u_h and u_v are the horizontal and vertical windows of N samples through the center (x, y). This sampling function results in the double sequence of Eq. (29), representing the intersection of the two arms of the "cross" filter ✚(L) shown in
Fig. 7:

u_h = (φ(u)_{(x−k, y)}, …, φ(u)_{(x+k, y)}),  u_v = (φ(u)_{(x, y−k)}, …, φ(u)_{(x, y+k)}),  u_h ∩ u_v = u_{(x, y)}.  (29)

We may increment u_α on x to x + k in the same way as for the 1D case, subject to the predicate forcing condition of coincidence as the filter slides along the X axis, to get a series analogous to Eq. (26). We may describe this series with an auxiliary ordering function O_h(u_α) for the horizontal arm, ordered over N:

O_h(u_α) = (φ(u)_{(x+k, y)} ≥ ⋯ ≥ φ(u)_{(x, y)} ≥ ⋯ ≥ φ(u)_{(x−k, y)}),  (30)

or the converse expression for a nonincreasing series analogous to Eq. (27). We may now increment u_α on y to y + k and get a similar series along the Y axis:

O_v(u_α) = (φ(u)_{(x, y+k)} ≥ ⋯ ≥ φ(u)_{(x, y)} ≥ ⋯ ≥ φ(u)_{(x, y−k)}).  (31)

We may produce the analogous converse for the predicate symbol '≤.' In similar fashion, if we decrement x and y by k from x + k and y + k, we get series analogous to the cases discussed for Eqs. (26) and (27). Equations (30) and (31), and their implied converses, describe two orthogonal monotonic sequences u_h and u_v connected by a common intersection, constrained by the coincidence condition such that φ(u)_{(x, y)} is the median of the ordering function, as the predicate for a fixed point of ✚(L).
It is important to note that the ordering function O(u_α) is not defined on the diagonal, reflecting the sampling-function geometry; that is, the adjacent diagonal datum is not in the sample space of the sampling function. If we tried to increment u_α on the diagonal, the median value would drop out of the ordering function on each increment. This means that diagonal monotonicity is not a requirement for the fixed points of ✚(L), in contrast to what we will find for ⊛(L). The functions G, H and ω are 2D extensions of the set of functions described by Eq. (24), wherein the unary indices of Eq. (24) are replaced by the ordered pairs α and β. The final condition for a 2D FP root of ✚(L) is that the data satisfying the coincidence condition are also members of a (k + 1) × (k + 1) set of contiguous data, all satisfying the coincidence condition.
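A minimal numeric check of the cross window (hypothetical helper names; the cross gathers N = 2k + 1 samples along each arm, counting the shared center once, for L = 4k + 1 samples): a uniform (k + 1) × (k + 1) pulse in a field of zeros passes through unchanged, consistent with its being an FP root:

```python
def cross_median(img, x, y, k):
    """Median of the cross-shaped window centered at (x, y)."""
    arm_h = [img[y][x + j] for j in range(-k, k + 1)]
    arm_v = [img[y + j][x] for j in range(-k, k + 1)]
    window = arm_h + arm_v[:k] + arm_v[k + 1:]   # center counted once: L = 4k + 1
    return sorted(window)[len(window) // 2]

k, size = 2, 9
img = [[0] * size for _ in range(size)]
for y in range(3, 3 + k + 1):                    # a (k+1) x (k+1) pulse of ones
    for x in range(3, 3 + k + 1):
        img[y][x] = 1
out = [[cross_median(img, x, y, k) if k <= x < size - k and k <= y < size - k
        else img[y][x] for x in range(size)] for y in range(size)]
assert out == img   # the pulse is invariant under the cross filter
print("pulse is a fixed point of the cross filter")
```

Border samples are passed through here only to keep the sketch small; the text's analysis concerns interior windows.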
Figure 8. (a) A uniform pulse pattern.
The 2D FP roots of ✚(L) are more complex than the 1D FP roots, as might be expected, and their combinations to form FP structures are similarly more complex. We may best visualize the FP roots of ✚(L) as a box with the top surface formed by a sheet connecting the 1D FP root patterns on each side. Imagine that the top surface is only moderately flexible in that it does not allow any kinks, much like the way a sheet of stiff cardboard might be flexed in various tilts and twists. The amplitude of the
Figure 8. (b) A saddle pattern.
top surface is the envelope of the values of the data in the pattern. A few examples of the FP structures are shown in Figs. 8a–8e; these are FPs for ✚(17), thus k = 4. The associated example data sets for these patterns are shown using arbitrary data values between 0 and 5 to illustrate the pattern. Please note that the (k + 1) × (k + 1) FP is shown with the precursor and trailing k⁺ or k⁻ patterns of the "ragged" plateaus of data of length k. The patterns are shown in X, indexed by x, and Y, indexed by y. We should also
Figure 8. (c) A wedge pattern.
note that these patterns are defined by the types of relationships on the boundaries, and by local monotonicity in the interior. As such, the data satisfying those boundary relationships and that monotonicity are not a unique pattern of values in all cases, but in all cases are independent of the absolute values of the data. The first example is of a uniform pulse of data wherein the data are all
Figure 8. (d) An inclined saddle pattern.
Π-type monotonic patterns, that is, u_h = (φ(u)_{(x−k, y)} = ⋯ = φ(u)_{(x+k, y)}) and u_v = (φ(u)_{(x, y−k)} = ⋯ = φ(u)_{(x, y+k)}) for every α from (x, y) to (x + k, y + k). This pulse is of uniform amplitude 5, and we should note, here and in what follows where applicable, that the precursor and trailing plateaus are shown as k⁻ patterns but could
Figure 8. (e) A diagonal wedge pattern.
just as easily have been k⁺ because of the '=' predicate relating the data for a pulse.

The FPs detected by ✚(L) are, as shown in the preceding, a pattern of data determined by a monotonic structure within themselves that also satisfies a relation with neighboring k data, all '≥' or '≤,' forming a finite set of possible types of roots. If we restrict the predicates of the monotonic structures of the roots to '=' and '≥' or '≤' for (k + 1) × (k + 1) sets, or "tiles," we have potentially 81 possible types of FP patterns. This number
Figure 9. Complex FP of a pulse, wedge, and diagonal wedge: (a) original image; (b) ✚(13) iff R_FP(u); (c) and (d) ✚(13) iff R_FP±(u).
is reduced to 54, however, by the constraint of being connected at the vertices of the FP, and further reduced to 15 fundamental patterns by disregarding patterns equivalent under rotation. Thus, we have an alphabet of 15 fundamental patterns {e_b}, b = 1, …, 15, each e_b satisfying R_FP(u), including the five shown in Fig. 8. The important fact is that there are only 15 patterns.

Now we can see quite readily how some of these few patterns in Fig. 8 can be combined into a complex FP pattern. The following example, shown in Fig. 9, involving the wedge, pulse and diagonal wedge, is an FP pattern buffered by a k⁻ of "0." This pattern could be further combined with π/2 rotations of the wedge and diagonal wedge on matching edges, and buffered by k⁻ of "0," to form a 2D image of Fig. 5. Suffice it to say that it could also be combined with π/2 rotations of itself to form a similar image, and other, more complex and extensive FP surfaces could be imagined involving the other patterns. The solid lines in the complex FP shown in Fig. 9 indicate that the four co-joined FPs share a common set of pixels on the adjoining edge. This results in each pattern satisfying the k-data requirements of the other patterns as well as the monotonic requirements of each pattern. This aspect of the co-joining of FP patterns will be addressed again in Sections III.D and III.E. With some understanding of the structure of a complex FP, we may illustrate the application of the predicate-constrained ✚(L) for extracting features of an object in an image.
2. Application to Feature Extraction

The application of the RO filter to feature extraction may be realized by enforcing a condition on the predicate such that the filter function is not necessarily zero only where the data satisfy the correct conditions of an FP root. The implementation of this logic is described by Eq. (23), I(u, u_i) = ω(u, u_1) · K(R_c(u_1)) + ⋯ + ω(u, u_n) · K(R_c(u_n)), where K(R_c(u)) = 1 when the data satisfy cor(c), that is, R_c(u_i) is true, and K(R_c(u)) = 0 when R_c(u_i) is false, in which case the associated ω(u, u_i) contributes 0 to I(u, u_i). A collection of juxtaposed, and thus grammatically compatible, R_FP(u_α) represents a predicate of an object in an image composed of FP roots and is denoted as R(u) = {R_FP(u_α)}, α = 1, …, n.

The application of RO filters for feature extraction assumes that objects of interest are composed of a set of related features distinguished by their monotonic structure from features not comprising objects of interest, and that the composition of features constituting the object is a relational structure. This assumption clearly does not apply to all images and objects of interest, but may apply to many cases, such as man-made objects embedded in clutter. The potential of the RO filter for feature extraction of objects embedded in clutter is illustrated with preliminary results in Fig. 10. This application is of ✚(L), L = 13, constrained by the condition of coincidence and inclusion in (k + 1) × (k + 1) sets, denoted by ✚(13) iff R_FP(u). The representation of the features is by the FPs detected by this constrained filter and is shown both with and without the precursor and trailing k⁺ and k⁻ patterns that define the complete root. The representation with the k⁺ and k⁻ data is denoted by ✚(13) iff R_FP±(u).

The results shown in Fig. 10 are: (a) original image of a navigation marker; (b) filtered image subject to constraints for selecting LOMO(q + 1) FP structures; (c) filtered image composed of complete FP roots, that is, ✚(13) iff R_FP±(u); and (d) ✚(13) iff R_FP±(u) applied to (a) rotated by 90°. These results demonstrate that feature extraction by this method is independent of the shape and gray level of the object. The dimension of the input image was 840 × 1024 pixels and required approximately 100 s of computation on a Sun workstation to produce Fig. 10(c).

We can see that the constrained filter is rather efficient in detecting the feature of the numbered panel, and conversely is efficient in rejecting the various kinds of clutter surrounding it. We can also see that the FP roots have internal variation in intensity and that they combine to form a recognizable object. The numbered panel is a reasonably smooth, albeit
(a) Figure 10. Chesapeake Navigation Marker.
lumpy, structure, and as such it satisfies the predicate constraints for many of the FP patterns {e_b}. The other parts of the image do not satisfy these constraints, by the nature of being clutter, except by chance, and the result is the occasional appearance of isolated patterns that are detected. The exception is the detection of the sky. An observer might note, however, that the sky appears as an unstructured feature, as does the partially obscured orthogonal panel, both being recognized by association in the local context. It is important to note that some of the facing panel could be obscured and we would still recognize it as part of a feature. Finally, we can see that the application is fairly robust under rotation, as seen in image (d).
(b) Figure 10. Continued.
By inspection of Figs. 10(b), (c) and (d), we can verify that the FP patterns {e_b} are a set of relational patterns independent of absolute pixel value. This means that they represent the property of relation in that region of the image independently of the extension of the image; thus the {e_b} are an intension of the image in that region. This explains the claim that an image of a feature represented by an ordered sequence of e_b, (e_b1, …, e_bn), is an intensional representation of that feature. As a practical matter, we cannot say that any image of that feature will have the same representation by (e_b1, …, e_bn). We can say, however, that any image from a set of images of that feature obtained within the limits of linear exposure, equivalent range, perspective and resolution, and digitized at the same level, will have the same intensional representation. Furthermore, as individual objects are members
(c) Figure 10. Continued.
of a sort of object, features are members of a sort of feature defined by a common set of intensions referred to as intensional properties. Images of members of that sort form families of images over range and perspective. This family of images is usually a fairly large set of possible images for which the intensional property of the feature holds true in practice. We should also point out that it is certainly possible to ‘‘fool’’ the filter with camouflage in the same way that human observers are deceived by camouflage. The relations required between the monotonic data and neighboring k data, or ‘‘ragged’’ plateaus, result in the roots being constrained by rules of combination so that the co-joining or partial superposition of roots satisfy a kind of grammatical compatibility. The demonstrated capability to extract
(d) Figure 10. Continued.
and represent features intensionally with a finite set of grammatically constrained {e_b} implies that it may be possible to define a language system of imagery based on a syntax of {e_b}. If we can devise a language system, then we may be justified in exploring a language model of this system that would provide a semantic valuation in the calculus of the system. The purpose of such a language model would be to link an image directly to a logical model of object recognition in order to implement a system of automated image interpretation. This subject will be addressed in more detail later in this chapter. The next section will address the grammar, or compatibility, for combining the various FP patterns.

D. The Grammar of ✚(L) Fixed-Point Root Combinations
The discussion in this section is new and has not been previously presented or published. The purpose here is to develop the schema for computing the natural grammar of allowable combinations of FP patterns e_b. The development of this schema is presented in a detailed, step-by-step progression and is not abstracted into more elegant and parsimonious mathematics, as the concern here is for understanding the dynamics of the situation.
Figure 11. Join of two FPs by '∘'.
The method for realizing the compatibility (or grammar) of the various FP patterns e_b to join together involves determining the relational predicate R_γ(e_a, e_b) for combining any two e_a and e_b by a connective γ such that R_γ(e_a, e_b) is true (T), that is, they are compatible and form a complex FP, or false (F), that is, they are not compatible to form a complex FP. Thus, the problem is the evaluation of R_γ(e_a, e_b) ∈ {T, F}. This relational predicate is central to the language of imagery referred to in Section III.C.2, so as to constitute a nonclassical propositional language L(I) to be discussed later.

The compatibility of e_a and e_b, that is, that the truth value of R_γ(e_a, e_b) is T for γ(e_a, e_b), is a problem of matching the LOMO(k + 1, k + 1) relationship in X and Y of e_a and e_b. For our purposes here, we will assume, without any loss of generality, that γ is the simple connective '∘', interpreted as two FPs e_a and e_b joined as in Fig. 11. The FPs in Fig. 11 are indicated by ↑ and ↓, and the dashed enclosures of k denote the k⁺ and k⁻ data patterns as appropriate to the FPs. If we may assume R_γ(e_a, e_b) is T, then the join of the FPs by ∘ forms a complex FP. Our purpose here is to develop the method of testing R(e_a, e_b) such that γ(e_a, e_b) iff R(e_a, e_b). The method of evaluating R(e_a, e_b) must also include the rotation of the FPs. This means that R(e_a, e_b) is more properly stated as R(e_a(θ), e_b(θ)), where θ is the rotation (0, π/2, π, or 3π/2) of the FPs. We will defer this complication for now and address it later.

1. Representation of Patterns

An FP pattern e_b may be described in vector notation defined in a right-handed space with unit vectors e_x, e_y and e_z of X, Y and Z, as shown in Fig. 12. The length of the e_x and e_y components of an FP is either 0 or at least k + 1, and we shall denote this as 0 or κ. The e_z component in actual patterns is a variable, but this analysis need not be concerned with that complexity. This is because we are concerned only with the continuity of the slope of the vectors as a monotonic sequence of data that are all '≥'
Figure 12. Coordinate system of FPs.
or ‘‘ .’’ This simplification follows from the e@ being an intensional representation of the actual data. The continuity of slope from one FP to another is the measure of compatibility and we denote the magnitude of the e component as 1, 0 or 91. With the conventions described in the foregoing, we now imagine an FP pattern as a square composed of four vectors a, b, c and d, and points indexed 1, 2, 3, and 4 of the four corners of the pattern as shown in Fig. 13.
Figure 13. Vector representation of FP.
We may now describe the FP as a matrix of vector components in e_x, e_y and e_z:

e_b: a = (a_x, a_y, a_z), b = (b_x, b_y, b_z), c = (c_x, c_y, c_z), d = (d_x, d_y, d_z).  (32)

The components of a, b, c and d are computed as coordinate differences of the corner points they connect, indexed by the points of the FP pattern, for example,

a_x = x_2 − x_1,  a_y = y_2 − y_1,  a_z = z_2 − z_1,  (33)

and similarly for b, c and d. With this convention (recall that the pattern length of k + 1 or greater is denoted κ), we may see that any pattern may be described by a matrix as follows:

e_b: a = (0, κ, a_z), b = (κ, 0, b_z), c = (κ, 0, c_z), d = (0, κ, d_z).  (34)

This representation permits the distinction of all ✚(L) FP pattern "types" by the e_z components of the vectors a, b, c and d as 1, 0, or −1. We will refer to this representation as the e_b form, where b ∈ {P}, P being the isomorphic mapping of the finite set of FP pattern types to the set of positive integers.
272
BART WILBURN
subtle, principle also at play here and it is that these patterns are defined to already exist by hypothesis. The significance of this statement lies in the realization that we are not designing actual patterns to be joined. Rather, we are testing for the recognition, that is, the ‘‘name,’’ of the types of each of two patterns that can share related data as one existing complex FP. We must keep in mind that each of the patterns constituting the complex FP also accounts for the k and k\ data of each other extending from the mated edges. This principle governs the distinction between the necessary and sufficient conditions for compatibility and, indeed, implies an important, if not subtle, constraint: Two FP patterns joined along some edge necessarily share the data on the mated edge. For example, two patterns joined along d:a, implying the left and right placement, have a common edge that is the vector d of the pattern (i) on the left, and the vector a of the pattern ( j ) on the right such that d a. This constraint results in some unexpected realizations. For example, suppose we were to join two patterns that when isolated appeared as two wedges, one inverted with respect to the other. Upon joining constrained by a common edge, one pattern appears as the original wedge in e@ form corresponding to the left-hand side of the join, but the other appears as a saddle in e@ form. The saddle is defined by its form in the e@ representation even though it is a ‘‘step down’’ and a ‘‘step up’’ in the first column after the a edge at points 1 and 2, respectively. This illustrates the significance of the prior existence assumption of the patterns. Their combination already exists as a complex FP of kinds of e@, and the objective is to identify their type knowing that only certain types may exist in combination. 
The problem is to be able to compute the possibilities of types that may exist in combination, as opposed to computing the types of independent patterns that may be combined. It is a subtle but important distinction that we will illustrate as we continue. We may return to the point made in the preceding, before the caveat on existence and common edges: the surface of an e^β may be characterized by the four normal vectors at the corners. What we have is the situation of the four vectors a, b, c and d, and the points 1, 2, 3 and 4, as shown in Fig. 13. The normal vectors are indexed by the points and computed according to:

n₁ at point 1: b × a
n₂ at point 2: c × a
n₃ at point 3: c × d
n₄ at point 4: b × d.     (35)
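The corner normals can be sketched numerically. In this sketch the edge layout is an assumption consistent with Eqs. (35) and (36): edges b and c run along X, edges a and d along Y, each of length 7 (the pattern size), with z-components b₃, c₃, a₃, d₃.

```python
# Sketch of Eqs. (35)-(36): corner normals of an FP pattern surface.
# Edge layout is assumed: b, c run along X; a, d run along Y; length 7.

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

def corner_normals(a3, b3, c3, d3, size=7):
    a = (0, size, a3)   # edges a, d run in Y
    d = (0, size, d3)
    b = (size, 0, b3)   # edges b, c run in X
    c = (size, 0, c3)
    n1 = cross(b, a)    # point 1
    n2 = cross(c, a)    # point 2
    n3 = cross(c, d)    # point 3
    n4 = cross(b, d)    # point 4
    # normalize by 1/size, as in Eq. (36)
    return [tuple(x / size for x in n) for n in (n1, n2, n3, n4)]

# wedge pattern: a3 = d3 = 1, b3 = c3 = 0
print(corner_normals(1, 0, 0, 1))
```

Each normalized normal comes out as (−b₃ or −c₃, −a₃ or −d₃, 7), matching Eq. (36).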
THEORY OF RANKED-ORDER FILTERS
Thus, at point 1, we have n₁ computed as:

n₁ = b × a = det | e₁  e₂  e₃ |
                 | 7   0   b₃ |
                 | 0   7   a₃ |

n₁ = −7b₃ e₁ − 7a₃ e₂ + 7² e₃.

The other normal vectors follow in a similar manner, and we may write them normalized by 7⁻¹ with the result:

n₁ = −b₃ e₁ − a₃ e₂ + 7 e₃
n₂ = −c₃ e₁ − a₃ e₂ + 7 e₃     (36)
n₃ = −c₃ e₁ − d₃ e₂ + 7 e₃
n₄ = −b₃ e₁ − d₃ e₂ + 7 e₃.

In this description the normal vectors are scaled by a constant 7, the size of the pattern, and referenced to a common origin in coordinate space.

3. Criteria for Compatibility

We may test for the compatibility of two FP patterns by using the normal vectors to describe the slope of the surfaces of the two patterns. We do this by computing the direction cosines of the normal vectors in X and Y at mated vertices, or points. The test for compatibility is that the sign of the direction cosine for the e₁ component of one pattern does not change at the junction with the other pattern at the adjoining point. This is also the case for the e₂ component. We may visualize this in 1D. Suppose we imagine two patterns e^α and e^β, a ramp and a pulse, respectively, joined along the d:a vectors at points 4:1 and 3:2. (The superscripts refer to e^α and e^β.) Suppose further that we project the junction at point 4:1 onto the X-Z plane. The situation may be presented as shown in Fig. 14 for the junction of b^α and b^β, and the projection of the normal vectors n₄^α and n₁^β of patterns 1 and 2 for points 4 and 1, respectively.

In the case shown in Fig. 14, we can see that the direction cosine of n₄^α is negative, cos θ^α < 0, and that of n₁^β is zero, cos θ^β = 0. Thus, the direction cosine does not change sign in the transition to e^β, and the two patterns are compatible at the junction of 4:1. The computational test is sign(cos θ^α) · sign(cos θ^β) ≥ 0 for the junction of k:l. The same test
Figure 14. Normal vectors of joined FP edges.
applies for the Y component in the Y-Z plane, by virtue of

cos θ_X = (n · e₁)/|n|  and  cos θ_Y = (n · e₂)/|n|.

This test is tantamount to:

sign(n_Xⁱ · n_Xʲ) = 0 or 1, and sign(n_Yⁱ · n_Yʲ) = 0 or 1     (37)

for any two patterns at the junction of points (k:l). We can see, by reference to Eq. (36), that the criterion of Eq. (37) is expressed in terms of the Z components a₃, b₃, c₃ and d₃. Indeed, we will find it useful to define two matrices S_X and S_Y as follows:

        pt.1  −b₃            pt.1  −a₃
S_X :   pt.2  −c₃    S_Y :   pt.2  −a₃     (38)
        pt.3  −c₃            pt.3  −d₃
        pt.4  −b₃            pt.4  −d₃

The S_X and S_Y matrices represent the X and Y components of the normal vectors of the ith pattern for each of the four points of the pattern. We may
recall that the Z components were defined to be 1, 0, or −1. Thus, the S_X and S_Y matrices provide an immediate representation of the sign of the direction cosines for use in the test of compatibility. The S_X and S_Y matrices are useful for two purposes: for providing the direction of the slope of the surface at the four points as shown in the foregoing, but also for computing the slope of the surfaces under rotation through π/2, π, and 3π/2. To do this, we hold the labels of the points 1 to 4 and the vectors a, b, c and d fixed in space and transform the values of a₃, b₃, c₃ and d₃ to form S_X(θ) and S_Y(θ). The rotational state of a pattern is incorporated into the notation as e^β(θ), where the condition θ = 0 is the normal form of the pattern. The transform is computed by a permutation matrix P, applied once per quarter turn to the z-components of an e^β pattern in normal form, as shown in what follows:

| a₃(θ) |        | a₃(0) |          |  0   0   1   0 |
| b₃(θ) | = Pⁿ · | b₃(0) | ,   P =  | −1   0   0   0 |
| c₃(θ) |        | c₃(0) |          |  0   0   0  −1 |
| d₃(θ) |        | d₃(0) |          |  0   1   0   0 |

θ = nπ/2,  n = 0, 1, 2, 3.     (39)
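The rotation bookkeeping can be sketched in a few lines. The quarter-turn map used here is an assumption, chosen to be consistent with the wedge example given later in this section (b₃ = c₃ = 1 at θ = 3π/2):

```python
# Sketch of Eq. (39): rotating a pattern's z-components (a3, b3, c3, d3)
# by theta = n*pi/2 via repeated application of a quarter-turn permutation.
# The map below is an assumption consistent with the worked wedge example.

def rotate_quarter(z):
    a3, b3, c3, d3 = z
    return (c3, -a3, -d3, b3)

def rotate(z, n):
    for _ in range(n % 4):
        z = rotate_quarter(z)
    return z

wedge = (1, 0, 0, 1)          # a3 = d3 = 1, b3 = c3 = 0
print(rotate(wedge, 3))       # theta = 3*pi/2 gives (0, 1, 1, 0)
```

Four applications return the pattern to its normal form, as a rotation by 2π should.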
This matrix provides us with what we need for computing the compatibility of any two patterns in some state of rotation from the normal form.

4. Computation of Compatibility

We argued in the preceding that the criterion for compatibility was the continuation of the slope in X and Y, either nondecreasing or nonincreasing, from one pattern to another. We also showed that the criterion is satisfied at the points of a mated corner (k:l) of eⁱ and eʲ if the X and Y components of n_kⁱ and n_lʲ are of the same sign or neutral. From this it follows that the corners of a mated edge must be a compatible pair of corners defining that edge, namely, (k:l) & (m:n). (The symbol "&" is the logical conjunction of sentences, whereas "∧" is the logical conjunction of atomic sentences.) Furthermore, the satisfaction of the slope criteria at mated corners, even for two sets of corners, is a necessary but not a sufficient condition for compatibility. Sufficiency is provided by an additional condition defining a common edge, and we will address that later. Nevertheless, for a necessary condition we have (n_Xkⁱ n_Xlʲ ≥ 0 ∧ n_Xmⁱ n_Xnʲ ≥ 0) & (n_Ykⁱ n_Ylʲ ≥ 0 ∧ n_Ymⁱ n_Ynʲ ≥ 0). Let us consider an example. Suppose two patterns eⁱ and eʲ are found joined as shown in Fig. 15, where (k = 4 : l = 1) & (m = 3 : n = 2). (N.B.: k and m are points in eⁱ, and l and n are in eʲ.)
Figure 15. Corner notation of joined FPs.
A necessary but not sufficient condition for identification as a compatible pair is:

n_Xkⁱ n_Xlʲ ≥ 0 ∧ n_Xmⁱ n_Xnʲ ≥ 0  &  n_Ykⁱ n_Ylʲ ≥ 0 ∧ n_Ymⁱ n_Ynʲ ≥ 0.

By referring to the S_X and S_Y matrices in Eq. (38), we see that this example of satisfying a necessary condition for compatibility amounts to:

Points (4:1): (−b₃ⁱ)(−b₃ʲ) ≥ 0 ∧ (−d₃ⁱ)(−a₃ʲ) ≥ 0
&
Points (3:2): (−c₃ⁱ)(−c₃ʲ) ≥ 0 ∧ (−d₃ⁱ)(−a₃ʲ) ≥ 0.

We need to link the example calculation for compatibility of Fig. 15 to a scheme for testing all possible mated edges. We may accomplish this linkage with a computational scheme based on the formation of a 2 × 2 matrix

m_kⁱ = | n_X^k    0   |
       |   0    n_Y^k | ,   k = 1, . . . , 4 points of the ith FP pattern.

We may now form a vector of the four m_kⁱ characterizing the ith FP as:

mⁱ = (m₁ⁱ, m₂ⁱ, m₃ⁱ, m₄ⁱ)

and then use this vector to represent the join of the ith FP to the jth FP with the computation of M^{ij} = mⁱmʲ′ (the prime denotes transpose). The calculation of M^{ij} = mⁱmʲ′ is another function subject to a predicate condition, in fact two of them. These conditions are in the form of
combinatorial constraints resulting from the indexing scheme of the points of the patterns. We will dispense with the formal expression and simply state them. The first constraint determines the form of the matrix, and the second determines the interpretation of it. The first constraint is that the mated corners allowable without rotation of either the ith or the jth pattern are those whose index sum, for example (k + l), is an odd number. This constraint results in:

         |   0      m₁m₂′     0      m₁m₄′ |
M^{ij} = | m₂m₁′      0      m₂m₃′     0   |     (40)
         |   0      m₃m₂′     0      m₃m₄′ |
         | m₄m₁′      0      m₄m₃′     0   |

where m_k m_l′ abbreviates m_kⁱ m_lʲ′. The second constraint is simply that the sum of the four corner indices of a single pattern is the sum of the indices of two mated edges, namely, (k + l) + (m + n) = 10. In other words, the four corners of any two mated edges must also be the four corners of a single pattern. This constraint, together with the first one, that (k + l) and (m + n) are both odd numbers, links the m_kⁱm_lʲ′ pairs that are junctions of a possible common edge. These constraints result in the following possibilities of mated edges:

A: m₂ⁱm₁ʲ′ & m₃ⁱm₄ʲ′ : c = b
B: m₄ⁱm₁ʲ′ & m₃ⁱm₂ʲ′ : d = a     (41)
C: m₁ⁱm₂ʲ′ & m₄ⁱm₃ʲ′ : b = c
D: m₁ⁱm₄ʲ′ & m₂ⁱm₃ʲ′ : a = d.

The question before us now is to compute the satisfaction of the condition for compatibility of two FP patterns mated on one or more of the possibilities given by Eq. (41). To compute this compatibility, we need to introduce the notion of truth of Eq. (41) as a necessary condition for compatibility. The notion of truth in the context of Eq. (41) applies to the predicate statement, for any two given FPs present in some state of rotation, that they satisfy the condition for compatibility. The notion of truth was presented in the beginning of this section as R_c(e^α, e^β) ∈ {T, F}, where R_c(e^α, e^β) is a predicate of e^α and e^β that they satisfy some conditions described by R_c, and where ⊕_c is a conjunctive for the join of e^α and e^β. Those conditions must be the necessary and sufficient conditions for compatibility such that R_c(e^α, e^β) ∈ {T, F} and (e^α ⊕_c e^β) iff R_c(e^α, e^β). That is, a complex FP can be formed by the join according to ⊕_c of e^α and e^β if and only if what the predicate R_c(e^α, e^β) says about them is true. We have developed the basis for a
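The two index constraints can be enumerated directly. In this sketch the corner-to-edge incidence (a:{1,2}, c:{2,3}, d:{3,4}, b:{4,1}, read off the normal-vector definitions of Eq. (35)) and the requirement that mated edges be opposite edges of the quadrilateral are assumptions:

```python
from itertools import product

# Corner-to-edge incidence assumed from Eq. (35):
# n1 = b x a, n2 = c x a, n3 = c x d, n4 = b x d.
EDGES = {frozenset({1, 2}): 'a', frozenset({2, 3}): 'c',
         frozenset({3, 4}): 'd', frozenset({4, 1}): 'b'}
OPPOSITE = {'a': 'd', 'd': 'a', 'b': 'c', 'c': 'b'}

def mated_edges():
    found = set()
    for k, l, m, n in product(range(1, 5), repeat=4):
        odd = (k + l) % 2 == 1 and (m + n) % 2 == 1   # first constraint
        if odd and (k + l) + (m + n) == 10:           # second constraint
            ei = EDGES.get(frozenset({k, m}))
            ej = EDGES.get(frozenset({l, n}))
            if ei and ej and OPPOSITE[ei] == ej:
                found.add((ei, ej))
    return sorted(found)

print(mated_edges())   # the four joins of Eq. (41): a:d, b:c, c:b, d:a
```

Only the four edge pairings of Eq. (41) survive the two constraints.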
necessary condition in Eq. (41), and we may apply the notion of truth to it: at least one of the possibilities of Eq. (41) must be logically true, in the sense that eⁱ and eʲ are compatible on at least one of the possibilities of Eq. (41). We will soon add another condition to Eq. (41) such that the two together are the necessary and sufficient conditions represented in R_c(e^α, e^β) ∈ {T, F}. Let us first tackle the incorporation of Eq. (41) in R_c(e^α, e^β).

The test of R_c(e^α, e^β) ∈ {T, F} is the logical truth of the sentences A, B, C and D, that at least one of them is true, namely, that the sentence (A ∨ B ∨ C ∨ D), where "∨" is the logical symbol of disjunction read as "or," is true. The sentences A, B, C and D are closed under subsentences, or atomic sentences, in conjunction. (An atomic sentence is an element having semantic content, namely, it is T or F, and is indivisible in semantic content.) This means that the valuation of A, B, C and D, namely V(A) ∈ {T, F}, is determined by the conjunction of the truth values of all of the subsentences of each of them. This leads us to the question of determining the truth value of the atomic sentences of A, B, C and D, and for this we need to define a logical operator L on m_kⁱm_lʲ′: L(m_kⁱm_lʲ′) ∈ {T, F}. With L we may, for example, interpret A in the context of (e^α ⊕_c e^β) iff R_c(e^α, e^β) as:

For eⁱ and eʲ, V(A) = V[L(m₂ⁱm₁ʲ′) & L(m₃ⁱm₄ʲ′)];
V(A) = T iff L(m₂ⁱm₁ʲ′) = T & L(m₃ⁱm₄ʲ′) = T.
If V(A) = T, then R_A(eⁱ, eʲ) is T, and (eⁱ ⊕_A eʲ) is T, where ⊕_A is a join on c = b.

The other conditions B, C and D follow in a similar manner. The logical operator L is based on an assignment of the truth value of an atomic sentence "p": "V(p) = T" if p ≥ 0 and "V(p) = F" if p < 0, and is applied to the diagonal elements of m_kⁱm_lʲ′:

m_kⁱm_lʲ′ = | n_X^k n_X^l       0       |
            |      0       n_Y^k n_Y^l |

L(m_kⁱm_lʲ′) = [V(n_X^k n_X^l) & V(n_Y^k n_Y^l)]     (42)

L(m_kⁱm_lʲ′) = T iff n_X^k n_X^l ≥ 0 and n_Y^k n_Y^l ≥ 0.
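Since the m matrices are diagonal, Eq. (42) reduces to a pair of sign tests on the diagonal; a minimal sketch:

```python
# L of Eq. (42): given m_k = diag(n_X, n_Y) for each pattern, the product
# m_k^i m_l^j' is diagonal, and L is T iff both diagonal entries are >= 0.

def L(mk, ml):
    nx_prod = mk[0] * ml[0]     # X diagonal of the matrix product
    ny_prod = mk[1] * ml[1]     # Y diagonal
    return nx_prod >= 0 and ny_prod >= 0

print(L((1, -1), (0, -1)))   # True: diagonal products are 0 and 1
print(L((-1, 1), (0, -1)))   # False: the Y product is -1
```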
5. Example Calculation

Let us assume a test of the compatibility of a saddle pattern e¹(θ) and a wedge pattern e²(θ), both in normal form (θ = 0). The e^β(θ) forms are given in Fig. 16. The test is the satisfaction of at least one of the conditions of Eq. (41), and this is found by inspection of M¹² = m¹m²′ with L(m_k¹m_l²′). The m_k for the two patterns follow the general form

m_k = | −b₃ or −c₃       0       |
      |      0      −a₃ or −d₃  |

so that for the saddle (a₃ = −1, b₃ = 1, c₃ = −1, d₃ = 1):

m₁¹ = | −1  0 |   m₂¹ = | 1  0 |   m₃¹ = | 1   0 |   m₄¹ = | −1   0 |
      |  0  1 |         | 0  1 |         | 0  −1 |         |  0  −1 |

and for the wedge (a₃ = d₃ = 1, b₃ = c₃ = 0):

m₁² = m₂² = m₃² = m₄² = | 0   0 |
                        | 0  −1 |

With these data, we compute M¹² = m¹m²′. Every product m_k¹m_l²′ is diagonal with entries (0, −n_Y^k), so that

m₁¹m_l²′ = m₂¹m_l²′ = | 0   0 |    and    m₃¹m_l²′ = m₄¹m_l²′ = | 0  0 |
                      | 0  −1 |                                 | 0  1 |

We can see the following results immediately by inspection:

V(A): L(m₂¹m₁²′) = F; L(m₃¹m₄²′) = T; thus V(A) = F and R_A(e¹(0), e²(0)) is F.
V(B): L(m₄¹m₁²′) = T; L(m₃¹m₂²′) = T; thus V(B) = T and R_B(e¹(0), e²(0)) is T.
V(C): L(m₁¹m₂²′) = F; L(m₄¹m₃²′) = T; thus V(C) = F and R_C(e¹(0), e²(0)) is F.
V(D): L(m₁¹m₄²′) = F; L(m₂¹m₃²′) = F; thus V(D) = F and R_D(e¹(0), e²(0)) is F.

Thus (e¹(0) ⊕_B e²(0)) is T, where ⊕_B is a join on d = a.
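The whole example can be replayed in a few lines. The z-components used below are those of the saddle and wedge as reconstructed above, and the corner pairings are those of Eq. (41):

```python
# Worked sketch of the saddle/wedge example. Each m_k is the diagonal
# (n_X, n_Y) at point k, per Eq. (38).

def corner_ms(a3, b3, c3, d3):
    return [(-b3, -a3), (-c3, -a3), (-c3, -d3), (-b3, -d3)]

def L(mk, ml):
    return mk[0]*ml[0] >= 0 and mk[1]*ml[1] >= 0

# corner pairings of Eq. (41): A: c=b, B: d=a, C: b=c, D: a=d
PAIRS = {'A': ((2, 1), (3, 4)), 'B': ((4, 1), (3, 2)),
         'C': ((1, 2), (4, 3)), 'D': ((1, 4), (2, 3))}

def valuations(mi, mj):
    out = {}
    for cond, ((k, l), (m, n)) in PAIRS.items():
        out[cond] = L(mi[k-1], mj[l-1]) and L(mi[m-1], mj[n-1])
    return out

saddle = corner_ms(-1, 1, -1, 1)
wedge = corner_ms(1, 0, 0, 1)
print(valuations(saddle, wedge))   # {'A': False, 'B': True, 'C': False, 'D': False}
```

Only condition B (a join on d = a) holds; rotating the wedge z-components to (0, 1, 1, 0), i.e. θ = 3π/2, makes condition C hold instead, matching the variation worked next.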
Figure 16. e¹(0): saddle and e²(0): wedge.
To further illustrate the computational scheme for compatibility, let us imagine a variation on the foregoing example wherein the wedge is rotated by θ = 3π/2. For this case, we consult the S_X(θ) and S_Y(θ) matrices. Recall

S_X: pt.1 −b₃, pt.2 −c₃, pt.3 −c₃, pt.4 −b₃
S_Y: pt.1 −a₃, pt.2 −a₃, pt.3 −d₃, pt.4 −d₃.

These matrices for the wedge in normal form (cf. Fig. 16) are:

S_X(0) = |  0 |        S_Y(0) = | −1 |
         |  0 |                 | −1 |
         |  0 |                 | −1 |
         |  0 |                 | −1 |

or S_X(0) = [0] and S_Y(0) = −[1]. For θ = 3π/2, we consult Eq. (39) and get S_X(3π/2) = −[1] and S_Y(3π/2) = [0] (Fig. 17). Thus, denoting the rotated wedge e²(3π/2) by e²′, b₃′ = c₃′ = 1 and a₃′ = d₃′ = 0.
Figure 17. e²(3π/2): wedge rotated by 3π/2.
We substitute the foregoing values for m_l²′, l = 1, . . . , 4, and repeat the computation of M¹² = m¹m²′ to obtain the result:

V(C): L(m₁¹m₂²′) = T; L(m₄¹m₃²′) = T; thus V(C) = T and R_C(e¹(0), e²(3π/2)) is T.

Thus (e¹(0) ⊕_C e²(3π/2)) is T, where ⊕_C is a join on b = c′.

6. Necessary and Sufficient Conditions for Compatibility

We have shown how satisfying at least one of the four truth conditions of Eq. (41) is a necessity, but we can illustrate quite easily how it is not a sufficient condition. Consider the case of a pulse joined to any pattern, but in particular to a saddle. In this case, the m_k are all [0], and thus so are the entries of M^{ij}, implying satisfaction of all of Eq. (41). However, we readily see that a pulse is incompatible with a saddle. In the language of logic, if Φ is the state of satisfaction of one of Eq. (41) and "Comp" is the state of compatibility, then what we have shown is the material implication (Φ ⊃ Comp), whereas what we need to show is the necessary entailment (Φ ⇒ Comp). The problem of the pulse arises out of the ambiguity of the end points of the flat edge relative to the end points of the adjoining orthogonal edge. We can solve this problem by recalling the fundamental assumption that any two patterns joined together already exist as FPs in their combined state, namely, that the k and k⁻¹ conditions are satisfied. Furthermore, we must remember that the intent in testing the patterns is not to design compatible pairs, but to compute the natural grammar of allowable patterns that can be found joined to form a complex FP. Finally, we must not lose sight of the relational nature of the data comprising the patterns. The solution is to
recognize that any two patterns found joined together share a common edge that stands in relation to the data on either side of it, satisfying the requirements of an FP root. This recognition of the shared edge is enforced if we impose an additional condition for compatibility: that the cross product of the joined edges is zero. Thus, for example, if condition A of Eq. (41) is satisfied, V(A) = T, then the additional condition to be satisfied is that c × b = 0. Thus V(A) = T & c × b = 0 are the necessary and sufficient conditions for this case. If we let N stand for any of the Eq. (41) conditions and u for any of the vectors a, b, c, or d associated with N, then the general statement of the necessary and sufficient conditions for compatibility of any two patterns eⁱ and eʲ is:

Necessary and Sufficient Conditions for (eⁱ ⊕_c eʲ):  V(N) = T & uⁱ × uʲ = 0.     (43)

In reflecting on this section, we can readily see how this formalization lends itself to a representation of a symmetric group. The reformulation, however, will be a task for another day. This schema shows that a grammar exists and is computable, and provides the requirements of the computation of compatibility. An elegant reformulation is necessary, however, to provide guidance for developing an algorithmic test of existing patterns found in combination. This is work yet to be done.

7. Catalog of Fixed-Point Roots to (L)
The number of types of FP roots to (L) is finite and indeed, when reduced by the constraints of piecewise continuity at the four corners and the elimination of redundancies under rotation, the final count is 15 fundamental types of FP roots to (L). We showed earlier that the e₃ components of the vectors a, b, c and d distinguish all patterns; thus we may list all 15 patterns in terms of their e₃ column vectors in normal form, and all rotations may be computed by Eq. (39).

E. Oscillating Roots

This section addresses the other kind of root of the MW filter, the so-called oscillating roots. We will develop the analysis in a manner parallel to the previous approach for the FP roots, starting with the 1D filter of length N and extending to the 2D case of (L). The reader is referred to Sections III.A and III.B for notation. The oscillating root is referred to pejoratively because it is not a true root; it gives rise, however, to a relative referred to as the coded MW root. The phenomenon of oscillating roots involves
TABLE III
Catalog of FP Roots for (L): the fifteen fundamental root types e¹, . . . , e¹⁵, each listed as the e₃ column vector (a₃, b₃, c₃, d₃)ᵀ of its normal form, with entries drawn from {1, 0, −1}.
invariant patterns of bi-valued, or binary, data, a ≠ b, a > b, that oscillate in the sense of "ababab . . . a." Oscillating roots have been investigated by Tyan (1982), Longbotham (1989), and others for the 1D MW, and they concluded that oscillating roots were quasi-roots in that the first and last datum were lost by the filter. This means that the oscillating sequence must exceed the domain of the filter in order for the output to qualify as a root. This investigation will verify that result, and will also show that the dimension of the MW must be N = 2k + 1 with k even. It will show further that the oscillating sequence need not be binary data, but may be oscillating data modulated by an envelope that is locally monotonic. Finally, as already noted, it will show that there is a way out of the quasi-root result through a generalization of the MW to a "coded" MW. We should also note that the use of binary (0, 1) data is an acceptable generalization for discussing oscillating roots, but not for discussing SNR or the suppression of (0, 1) noise. Binary data allow the discussion of noise to be a discussion of "bit error."
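The quasi-root behavior is easy to reproduce. In this sketch the filter output is simply not produced where the full window does not fit, matching the remark above that the sequence must exceed the filter domain:

```python
# Sketch: the alternating sequence abab...a is preserved in the interior by
# a median window of length N = 2k + 1 only when k is even (N = 5, 9, ...).
# Endpoints fall outside the filter domain, hence "quasi-root".

def median_window(data, N):
    k = N // 2
    # output only where the full window fits
    return [sorted(data[i - k:i + k + 1])[k]
            for i in range(k, len(data) - k)]

x = [1, 0] * 6 + [1]                     # abab...a with a = 1, b = 0
print(median_window(x, 5) == x[2:-2])    # True: interior preserved (k = 2, even)
print(median_window(x, 3) == x[1:-1])    # False: k = 1 is odd and destroys it
```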
1. Oscillating Roots

The solution of oscillating roots is easily expressed in the same formalism we used for FP roots. As mentioned in the foregoing, we will begin with the 1D case and then extend it to the 2D case. The forcing conditions for an oscillating root are fairly simple. The binary condition and the coincidence condition necessary for root definition of the 1D filter of length N force the ordering function to alternate:

O(uᵢ) = (a₍₁₎, . . . , a₍ₖ₊₁₎, b₍₁₎, . . . , b₍ₖ₎),   i odd,
O(uᵢ) = (a₍₁₎, . . . , a₍ₖ₎, b₍₁₎, . . . , b₍ₖ₊₁₎),   i even,

with O(uᵢ) = O(uᵢ₊₂) for all i. We may realize a generalization of the data by noting that the bi-valued constraint on the data (a, b) is not strict for the b's if all of the a's are monovalued and greater than any of the b's. We may effectively analyze the oscillating root case using (0, 1) based on a threshold distinction between a and b. These arguments define the ordering function as an oscillating function alternating between (k + 1) a's and (k) b's, and (k) a's and (k + 1) b's, for alternate odd and even i of uᵢ. The interesting aspect of this analysis is that the forcing condition of coincidence for alternating binary data, abab . . . a, means that not every median filter has an oscillating root. The coincidence condition for oscillating binary data forces k to be an even number, such that the allowable filter dimensions for an oscillating root are N = 5, 9, 13, . . . , and the first and last instances of coincidence of an "a" with the k + 1 position of the filter fail to satisfy the median coincidence of the ordering function. Thus, the first and last data elements are lost by the filter; namely, it is a quasi-root. The conditions forcing R(u) such that the ordering function has this behavior are:

cor(c(1)): u = a ≠ b, a > b
cor(c(2)): the coincidence condition
cor(c(3)): uⱼ = a (j odd), uⱼ = b (j even), j = 0, . . . , n; n ≥ 3k, k even.

We can see from these conditions that the description of the oscillating roots as quasi-roots simply means that the data described by cor(c(1)) and cor(c(3)) must exceed the data space of u, which is a contradiction. We
also see that the cor(c) prescribing the coincidence condition does not prescribe contiguity for k + 1 pixels. The next section introduces a way out of the quasi-root problem for oscillating roots.

2. Coded Median Window Filter

The discussion of oscillating roots suggests another possibility: that of a filter having a symmetrical periodic structure incorporated into its design. We may interpret the discussion of oscillating data given in the preceding as the notion of repeated occurrence, and apply that notion to subsets of data. In this perspective, the oscillating data of (0, 1) described previously are regarded now as a repeated pattern of binary data: 10. We may further generalize this to any pattern of binary data regarded as a unit of a periodic, or repeated, sequence of data patterns. With this perspective, we may generalize the requirement that k be "even" for the oscillating-data quasi-root to a condition tied to the "length of the binary pattern," with the number of patterns in U required to be greater than one. We may further observe that the oscillating data are distributed across the extent of the filter, and consider a scheme that would optimize the filter by effectively compacting the data of the binary pattern into one side of the filter for a LOMO(k + 1) root response function analogous to the conventional filter. To implement this repeated-pattern scheme, we may combine the notions of the median window and the matched filter to construct a filter as a symmetrical structure matching the intended repeated pattern E of "0" 's and "1" 's. This filter is referred to as a coded window filter I(N), and is
applicable for patterns of binary data having at least one repetition. The oscillating root of this filter is not a quasi-root, but a true root. An example of an I(N) is:

I(3) = [● ■ ● ■ ●],

where ■ are inactive elements of the filter and ● are active elements. The following application of I(3) to a data set containing a finite pattern of binary data matching the symmetry of the filter illustrates the root characteristic:

S₁(23) = 00000101010101010100000
I(3) * S₁(23) = 00000101010101010100000.

Note that the active dimension of the filter is 3 although the extent is 5. Another example is I(5), defined as

I(5) = [● ● ■ ■ ● ■ ■ ● ●]

S₂(28) = 0000100110011001100110010000
I(5) * S₂(28) = 0000100110011001100110010000.
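The coded window can be sketched as code. The active-tap placement below is derived from the reflected-pattern construction described for these examples, and the edge-replication padding at the sequence boundaries is an assumption:

```python
# Sketch of the coded window filter I(2k+1): active taps at the center, at
# +/-p (p = pattern length), and wherever the reflected pattern E has a '1'.
# Boundary windows repeat the edge value; that padding choice is assumed.

def coded_offsets(pattern):                 # pattern E as a string, e.g. "10"
    p = len(pattern)
    offs = {-p, 0, p}
    for j in range(1, p):                   # epsilon_{j+1} sits at -j and +j
        if pattern[j] == '1':
            offs.update({-j, j})
    return sorted(offs)

def coded_median(data, pattern):
    offs = coded_offsets(pattern)
    p = len(pattern)
    ext = [data[0]] * p + list(data) + [data[-1]] * p
    out = []
    for i in range(p, p + len(data)):
        win = sorted(ext[i + o] for o in offs)
        out.append(win[len(win) // 2])
    return out

s1 = [int(c) for c in "00000101010101010100000"]
s2 = [int(c) for c in "0000100110011001100110010000"]
print(coded_median(s1, "10") == s1)     # True: S1 is a root of I(3)
print(coded_median(s2, "1001") == s2)   # True: S2 is a root of I(5)
```

For E = "10" the active offsets are {−2, 0, 2}, i.e. the [● ■ ● ■ ●] shape shown above.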
A somewhat more complicated example is:

I(7) = [● ● ■ ● ■ ● ■ ● ■ ● ●]

S₃(28) = 0000101011010110101101010000
I(7) * S₃(28) = 0000101011010110101101010000,

with equal results. The symmetry of a coded filter is derived from the binary pattern, for example, 10 for S₁, 1001 for S₂, and 10101 for S₃, represented in the filter by ● and ■ corresponding to 1 and 0, respectively. The pattern is reflected about the (k + 1)th active element of the filter, plus ● on each end, with the constraint that the output k + 1 position of the filter must be an active element. We may define the binary pattern of (1, 0) as the root E = (ε₁, . . . , εₚ), mapped into (●, ■) and denoted by ε̂; then for a given E having k 1's and p − k 0's, the filter construction is defined by the sampling function u′:

u′ = ((u)₋ₚ, ε̂ₚ(u)₋ₚ₊₁, . . . , ε̂₂(u)₋₁, (u)₀, ε̂₂(u)₁, . . . , ε̂ₚ(u)ₚ₋₁, (u)ₚ);
I(2k + 1) = [● ε̂ₚ . . . ε̂₂ ● ε̂₂ . . . ε̂ₚ ●],  E = (ε₁, . . . , εₚ).     (44)

In this design, the root pattern is sampled in the active elements of the window as the filter slides across the pattern. The active length of this filter is N = 2k + 1, and the total length, or extent, of this filter is m = 2p + 1, where p is the length of the binary pattern. This defines the root of the coded filter as a sequence of binary data composed of at least one repetition of a binary pattern satisfying R(u).

The requirement of at least one repetition of a pattern, constituting a periodic binary pattern containing n + 2 copies of E, n ∈ {0, 1, 2, . . .}, augments the local monotonicity requirement of a root for this class of filters. (We should note that the conventional filter required at least two repetitions of the simple binary pattern S₁ to produce a quasi response.) The coded filter of extent m = 2p + 1 also has a fixed-point behavior of LOMO(p + 1) for multivalued data represented by R(u) reflecting E, and the noise suppression of multivalued data follows the estimates given earlier by Eqs. (14), (18), and (21) for filters of length N. Such a fixed point is included in a fixed point of LOMO(p + 1) for a conventional median filter of length m, and may be regarded as a modulation envelope of the E pattern. We may incorporate this information about the filter with the notation I(2k + 1, m). The forcing conditions on R(u) for the intent of detecting a repeated pattern are that the data are binary, a ≠ b, a > b, and the pattern is defined by "a" such that it matches the symmetry of the filter as noted in
the preceding by E, and is repeated at least once. It is important to note also that the writing function H is defined over the extent m of the sampling function, but the ordering function is defined over N. Thus, the selection function assigns the output datum to j = p in the interval [0, 2p]. The analysis of the noise response of the coded filters is primarily concerned with bit errors in the detection of a binary pattern or code. The analysis of bit errors in a binary pattern detected by I(2k + 1, m) centers on the requirement for a "1" to occur in a "0" position of the binary pattern when the output position is coincident with a "0" of the binary pattern. This must occur as many times as necessary for (k + 1) "1" 's to be sampled by the filter centered on a "0." The coded filter has an extent of m sampled in 2k + 1 positions; thus there are 2(p − k) zero positions in the pattern, divided symmetrically on the two sides of the filter, when the filter is centered on a "1." When the filter is centered on a "0," however, there are 2(p − k) + 1 "0" 's within the window. The sampling pattern determines the number of "1" 's coincident with active filter elements. Therefore, at most (p − k) + 1 "0" 's must have an error bit, with a binomial probability, to cause the filter to register a bit error. This error rate is effectively multiplied by 2(p − k) to account for the window length.

Another source of bit errors derives from a data set composed of more than one kind of E. The rejection ability of I(2k + 1, m) to distinguish between different E binary patterns is mixed. Repeated application of a given filter, for a repeated binary pattern occurring at least once in a stream of other patterns similarly repeated, converges to a separation of the different E by 0's or by a LOMO(2p + 1) sequence of 1's. This suggests a combination of filtering and image subtraction, employing median and coded window filters, to find and locate a binary pattern given knowledge of the intended pattern.

3. Representation of 2D Oscillating Data

The oscillating roots discussed earlier for the 1D filter of length N are expanded here to 2D
for a finite set of data. First, we must address a functional description of the data in order to discuss the conditions imposed by the coincidence condition. There are two basic types of oscillating data: points and bars. However, as we shall see, bar data may be composed from point data. The 2D representation of oscillating point data may be composed by the matrix products of 1D forms A and B, one shifted by one datum with respect to the other, B = shifted A, and the unions of the matrix products. We define the data types A and B as oscillating data, for example, A = 1010 . . . and B = 0101 . . . initially, and later we will generalize to binary patterns E. If we consider these data types as vectors of data, then we may define the following matrices: A = A′A, B = B′B, C = B′A, and
D = A′B, where the prime denotes "transpose." The A and B data result in:

A = | 1 0 1 0 |      B = | 0 0 0 0 |
    | 0 0 0 0 |          | 0 1 0 1 |
    | 1 0 1 0 |          | 0 0 0 0 |
    | 0 0 0 0 |          | 0 1 0 1 |

C = | 0 0 0 0 |      D = | 0 1 0 1 |
    | 1 0 1 0 |          | 0 0 0 0 |
    | 0 0 0 0 |          | 0 1 0 1 |
    | 1 0 1 0 |          | 0 0 0 0 |

These sets of data, and the unions of these sets in all combinations, are the possibilities of patterns of oscillating data. The general conditions on the data, that they are patterns of oscillating 2D data in the 2D space of X × Y, are the properties of the matrices A, B, C, and D as follows:

(a) Each matrix is its own identity: AA = A, BB = B, CC = C, DD = D.
(b) The intersection of dissimilar matrices is a zero matrix: A ∩ B = {0} = C ∩ D, A ∩ D = {0} = A ∩ C, B ∩ C = {0} = B ∩ D.
(c) The multiplication of dissimilar matrices is a zero matrix: MᵢMⱼ = {0}; i, j ∈ {A, B, C, D} and i ≠ j.     (45)
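The matrix construction can be checked directly. This pure-Python sketch builds the 4 × 4 case and verifies the intersection property (b):

```python
# Sketch of the 2D oscillating data matrices: A = A'A, B = B'B, C = B'A,
# D = A'B built from the 1D forms A = 1010..., B = 0101... (4x4 case).
# The check below covers the intersection property (b) of Eq. (45).

def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]

def hadamard(M, N):
    # elementwise product, i.e. the intersection of binary matrices
    return [[a * b for a, b in zip(r, s)] for r, s in zip(M, N)]

Arow, Brow = [1, 0, 1, 0], [0, 1, 0, 1]
A, B = outer(Arow, Arow), outer(Brow, Brow)
C, D = outer(Brow, Arow), outer(Arow, Brow)

zero = [[0] * 4 for _ in range(4)]
pairs = [(A, B), (C, D), (A, C), (A, D), (B, C), (B, D)]
print(all(hadamard(M, N) == zero for M, N in pairs))   # True
```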
4. Oscillating Roots of (L)

The first observation we can make is that, just as in the 1D case, 2D oscillating data will not satisfy the predicate conditions for oscillating roots of all (L) filters, and the oscillating roots, quasi- and otherwise, of (L) are more complex. By inspection against the q + 1 criterion for coincidence and the predicate conditions of the data, we find that (L) has quasi- and real roots for the following:

A, B, C, D
A ∪ B, C ∪ D, A ∪ C, B ∪ C, A ∪ D, B ∪ D     (46)
(A ∪ C ∪ D) or (B ∪ C ∪ D).

For purposes of describing filter response, this set of matrices can be reduced by eliminating shift-equivalent matrices. For example, we may note that B, C, and D are shifted versions of A, A ∪ B is a shifted version of
Figure 18. Elementary quasi-root of (L).
C ∪ D, and A ∪ D and B ∪ D are shifted versions of B ∪ C and A ∪ C, respectively. Thus, we may adequately describe the filter response with A, A ∪ B, A ∪ C, A ∪ D, and (A ∪ C ∪ D), closed under A, B, C, and D. The result is the elementary point root of (L) shown in Fig. 18.

By the same reasoning as applied to the 1D case, for (L): L = 2N − 1 = 2q + 1 and N = 2k + 1, so q = 2k, and k even by the requirement of coincidence implies q even and q ≥ 4. (See Eqs. (30) and (31) and cor(c(3)).) The (L) has a quasi-root response to the oscillating patterns given by A and A ∪ B, and to the parallel bar pattern given by A ∪ C or A ∪ D. (The quasi-root of (L) bar data also includes single monovalued bars spanning the data set in the dimension of the bar.) The (L) also has a true root response to the juxtaposition of a bar pattern on both X and Y axes, as a consequence of the sampling function of (L) having no diagonal components. This root constitutes a crosshatch pattern reflecting (A ∪ C ∪ D), shown in Fig. 12. These results permit the occurrence of oscillating roots juxtaposed with fixed-point roots in images (Fig. 19).

5. Roots of Coded Window Based on (L)
The result shown in the preceding suggests that the structure of (L) is the class of structures that includes 2D coded filters for repeated 2D binary functions, I(L, M): L = 2N − 1 = 2q + 1, q = 2k; M = 2s + 1. The construction of the coded filter I(L, M) is entirely a 2D analog of the 1D I(N, m), where the 2D parameter L corresponds to the 1D parameter N as the length of the filter, namely, L is the number of active elements. In similar fashion, M corresponds to m as the extent of the filter including the inactive elements. The filter I(L, M) is based on (L), so that the sampling function is of the same form, but modified to represent E in the same way as for I(N, m), except that now E is represented in 2D as Rep.{E′E}.
The data compaction of the coded window construction allows the filter to
Figure 19. Crosshatch root of (L).
apply to repeated binary patterns within the 2D space in the same manner as for the 1D case. By virtue of membership in the (L) class of filters, however, I(L, M) retains the classification strictures of the (L) given by Eq. (45). The conditions of Eqs. (45) and (46) are necessary to ensure the independence of the repeated binary patterns on the X and Y axes. The representation of data for the 2D coded window filter I(L, M) is based on the same matrix formulations as for the oscillating roots to (L).
In this case, however, the matrices are formed by multiplication of E E . . . E 0 and 0E E . . . E instead of 1010 . . . and 0101 . . . , where E must be repeated at least once. Thus, we let vectors: A : E E 0 and B : 0E E , and form the matrices A, B, C, and D. The coded filter response is described then as before with A, A / B, A / C, A / D, and (A / C / D). Because the roots are true roots, and (A / C / D) subsumes A / B and A / C, we may further simplify the description by eliminating A / C and A / D in favor of (A / C / D) to result in rt.(1): A, rt.(2): A / B, and rt.(3): (A / C / D). For N (L , M), type-II noise suppression follows f (L ), and type-I fixed
point response follows LOMO(s ; 1). An example of N (L , M) is pro vided by (13, 21) derived from (9, 11) wherein E : 10101, and
shown in Fig. 20. The predicate is defined by the sampling function and conditions of Eq. (28) modified by the 2D adaptation of Eq. (32), and the requirement of repeated binary patterns. The writing function is defined over M. Using the symbol ‘‘*’’ to indicate a binary ‘‘1’’ and a blank to be a ‘‘0,’’ the images of the elementary roots of (13, 21); rt(1): A, rt(2): A / B, and
rt(3): (A / C / D) are shown in Fig. 21.
THEORY OF RANKED-ORDER FILTERS

Figure 20. N(13, 21).
5.a. Verification and Application of N(L₂, M)
The verification of the 2D coded window filter N(13, 21) is shown in Fig. 22. The verification is a test against a computer-generated 64 × 64 pixel binary image a^b, a = 180, b = 0, of the root rt(1), with normally distributed noise added according to Normal(x): x = Rnormal(μ, σ), μ = 128, σ = a/(2√3). Figure 22 shows: (a) the target image of rt(1); (b) the noisy image of rt(1); (c) image (b) filtered by N(13, 21) under R(u); and (d) image (b) filtered by N(13, 21) under R(u) with cor(c). The filtered images (c) and (d) differ in that (c) is simple filtering and (d) is filtering constrained for feature extraction, that is, R(u) with cor(c), the forcing condition of coincidence. The measured background SNR(x) of the noisy image (b) and the measured background SNR_f(x) of the filtered image (c) are SNR(x) = 3.41 and SNR_f(x) = 10.49, respectively. This compares well to the prediction of Eq. (18) using L₂ in place of N for the 2D application, L₂ = 13: SNR(x) = 3.46, SNR_f(x) = 10.65.
292
BART WILBURN
Figure 21. Elementary Roots of (13, 21).
The result in Fig. 22(d) could be improved considerably by simply applying N(13, 21) subject to R(u) with cor(c) a second time. This is because the target pattern satisfies R(u) by virtue of the companion data in E₂, whereas many (if not most) of the spurious responses in the background do not have those companion data and therefore will not satisfy R(u).
F. Octagonal, Hexagonal, and 3D Filters

The octagonal filters are best introduced by explaining why the square filter does not exist as a class of true filters. The introduction is
Figure 22. N(13, 21). (a) Original image, (b) noisy image.
facilitated by explaining an apparent contradiction to this assertion in the form of the popular 3 × 3 MW filter. The 3 × 3 MW filter is, by all outward appearances, a square filter, and it is a true filter, which accounts for its popularity. We will show, however, that it is not a square filter but is in fact the smallest member of the class of octagonal filters.

Figure 22. Continued. (c) Simple filtered image, and (d) feature extraction image.

A class of filter is established by the properties of having true roots of the sort described in the preceding, namely, FP and oscillating, specified by the order of the filter as a member of a set of integers, for example, N = 2k + 1, k ∈ {positive integers}, or k ∈ {positive even integers}. That is, if a filter is a true filter, specifically, has true roots, then all sizes of that type of filter have true roots.
1. Morphological Perspective

Let us analyze the response of a 3 × 3 MW filter from a morphological point of view. We will conduct a thought experiment as follows. Suppose that the 3 × 3 MW filter is a square filter □(L); L = 2q + 1, q ≥ 4, and L = N², N = 2k + 1. Now imagine that □(9) is placed in a field containing some isolated object composed of contiguous pixels. Let us suppose further that the object is an FP root. The description of the object as an isolated object is important insofar as it means that its edges are removed from the boundary of the field. This may seem a trivial statement, but it enters into the analysis in the following way. Imagine that the filter is scanned over the field from left to right and top to bottom. In the course of scanning, the filter encounters the object so that the leftmost pixel of the uppermost edge of the object is sampled by the q + 1 (or center) element of the filter. The object is, by hypothesis, an FP object. Therefore, the pixel sampled by the center (or output) element of the filter must satisfy the requirement of being the output of the MW filter, namely, the coincidence condition. For simplicity, and without any loss of generality, we may suppose that the object is a set of “1’s” surrounded by “0’s.” For this case, the output requirement is that q + 1 elements of the filter “downstream” in sample space are filled. The necessary pattern of data sampled by the first encounter of the □(9) with the object to result in an FP response is shown in Fig. 23. We may continue this experiment by supposing that the pixel shown in Fig. 23 sampled by the leftmost element of the bottom row of the filter, that is, element 3:1, is the uppermost pixel of the leftmost edge of the object. Let us repeat the experiment with the filter shifted down one row such that the pixel marking the top of the leftmost edge is sampled by the q + 1 (or center) element of the filter.
For this object pixel to be an output of the filter, the required pattern of the object sampled by the filter is shown in Fig. 24. We can readily see that the pattern shown in Fig. 24 is compatible with being part of the same object shown in Fig. 23. We repeat the experiment again with the filter shifted down another row and obtain a necessary pattern shown in Fig. 25. This pattern is also compatible with being part of the same object shown in Figs. 23 and 24.
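The scanning argument can also be run mechanically: an isolated binary pattern is an FP root of the 3 × 3 MW filter exactly when the moving median reproduces it. In the sketch below (our own test patterns, anticipating the conclusion of this section), a small corner-cut octagonal pulse passes unchanged while a solid 3 × 3 square loses its corners:

```python
import numpy as np

def mw3(x):
    """3 x 3 moving median (majority vote for binary data); borders copied."""
    y = x.copy()
    for i in range(1, x.shape[0] - 1):
        for j in range(1, x.shape[1] - 1):
            y[i, j] = np.median(x[i - 1:i + 2, j - 1:j + 2])
    return y

def is_fp_root(img):
    """An isolated pattern is an FP root iff the filter reproduces it."""
    return np.array_equal(mw3(img), img)

def embed(rows, size=10, offset=3):
    """Place a small '*'/'.' pattern in a zero field with margin all around."""
    img = np.zeros((size, size), dtype=int)
    for i, r in enumerate(rows):
        for j, c in enumerate(r):
            img[offset + i, offset + j] = 1 if c == "*" else 0
    return img

octagon = embed([".**.", "****", "****", ".**."])   # corners cut off
square = embed(["***", "***", "***"])               # solid 3 x 3 block

print(is_fp_root(octagon), is_fp_root(square))      # True False
```

Each pixel of the octagonal pulse has a majority of like-valued neighbors in its 3 × 3 window, so the coincidence condition holds everywhere; the square's corner pixels see only four object pixels out of nine and are eroded.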
Figure 23. First encounter of □(9) with object.
Figure 24. Second encounter of □(9) with object.
By imposing the symmetry required for an object to be an FP on exit as well as on entry of a filter, we see that an elemental FP satisfying the □(9) is the pattern shown in Fig. 26. If the □(9) is indeed a member of a class of filters that are square filters, then we should be able to repeat the □(9) experiment for □(L), L > 9, with the result being a larger FP pattern. Indeed, the result should have a similarity relationship to that shown in Fig. 26, scaled by the size of the filter so as to include the pattern shown in Fig. 26. The next available size of □(L) is L = 25. (N.B.: L = N² and L = 2q + 1; N = 2k + 1, thus k = 2 ⇒ q = 12.) Imagine conducting the aforementioned □(9) experiment with □(25). Then the analog of the pattern shown in Fig. 23 is as shown in Fig. 27 for □(25). If, as before, we assume the pixel in the 4:1 filter element of Fig. 27 to be the topmost pixel of the leftmost edge of the object and shift the filter down one row as we did for □(9), then we see a problem. When the center filter element samples that top pixel of the left edge, the pattern of data sampled by the filter window fails to permit that pixel to satisfy the coincidence condition for it to be the output of the filter. The pattern of data is shown in Fig. 28 and, as we can see, contains only 10 object pixels. Whereas the aspect ratio of the stepped corners of the □(9) FP object shown in Fig. 26 is 1:1, the aspect ratio of the stepped corner in Fig. 28 is 2:1. To be sure, we could redraw the object in Fig. 28, maintaining the left edge from Fig. 27, and fill 13 filter elements. However, it would not be compatible with being part of the same object shown in Fig. 27. The only
Figure 25. Third encounter of □(9) with object.
Figure 26. FP of □(9).
Figure 27. First encounter of □(25) with object.
Figure 28. Second encounter of □(25) with object.
Figure 29. New class of filter.
way an object with an aspect ratio n:1, n > 1, can satisfy the output requirement of the coincidence condition for every shift down by one row is for that object to extend beyond the field of data in the X direction, or in the Y direction for an aspect ratio 1:n, n > 1. This means that the roots of □(L), L > 9, are quasi-roots, and thus □(L), L > 9, is a false filter and □(L) is not a legitimate class of filters. So what do we make of the 3 × 3 MW filter that apparently has a true FP root? What we make of it is that we suspect it is a member of a different class of filter. Suppose we construct a filter as shown in Fig. 29. This filter is denoted by Ω(Q), where Q = 2v + 1, and for this instance has a dimension of Q = 25. Let us suppose that Ω(Q) is a class of filters. We may also note that Q = 4N − 3, or Q = 2(2N − 1) − 1, that is, Q = 2L₂ − 1. There is an interesting analog to take note of here between Q = 2L₂ − 1 and L₂ = 2N − 1. This leads us to note that as N = 2k + 1, then L₂ = 4k + 1 and Q = 8k + 1; and as L₂ = 2q + 1 and Q = 2v + 1, then q = 2k and v = 4k. If Q = 25, then k = 3; conversely, if k = 1, then Q = 9, and thus Ω(9) is the smallest member of Ω(Q). (N.B.: Ω(9) is a 3 × 3 MW filter.) We need to test the supposition that Ω(Q) is a class of filters. To do this, we will conduct the same experiment for Ω(25) that we did for “□(25).” (We use scare quotes because □(25) is not a legitimate filter.) The analog to Fig. 24 for the sampling of the leftmost pixel of the topmost edge by the center filter element results in a necessary data pattern for satisfying the coincidence condition as shown in Fig. 30. If we continue the experiment for Ω(25) as we did in the foregoing, we will find that there exists an isolated object pattern that satisfies the requirements of an FP root to Ω(25), as shown in Fig. 31. We can see by inspection that this isolated object is an octagon with edge k + 1 in length, and thus it has symmetry about its axes. Furthermore, we
Figure 30. First encounter of Ω(25) with object.
can see that the response for Ω(25) includes the Ω(9) response. Therefore, we can conclude by induction that this FP response continues for Ω(Q) for all Q = 2v + 1; v = 4k, k ≥ 1. This establishes that Ω(Q) is a class of filters. The octagonal response gives rise to the name of Ω(Q) as the class of octagonal filters, by virtue of its FP roots being a family of octagonal
Figure 31. Pulse-type FP of Ω(25).
patterns. Finally, as we indicated much earlier, because the FP roots of Ω(Q) include the 3 × 3 MW, and the square filter is not a filter class, the so-called 3 × 3 square filter is really the smallest member of Ω(Q), at Q = 9 for k = 1.

2. Construction of Ω(Q) Fixed-Point Roots

It will not be necessary to describe the details of the root computation for Ω(Q), as the rationale is straightforward and parallels the computation for Ξ(L₂) described in Section III.B. We may take a clue for the rationale from
the analog mentioned earlier between Ω(Q) and Ξ(L₂), namely, L₂ = 2N − 1, L₂ = 2q + 1, q = 2k, and Q = 2L₂ − 1, Q = 2v + 1, v = 4k. Recall that the structure of Ξ(L₂) was the juxtaposition of two 1D filters Φ(N), one rotated with respect to the other by π/2 and constrained to share the intersecting pixel in evaluating the coincidence condition. The result of this construction of the Ξ(L₂) predicate was an FP that was a square structure, (k + 1) × (k + 1), with monotonicity in X and Y. The FP root of Ω(Q) is found in a completely analogous way, as the juxtaposition of two Ξ(L₂) filters, one rotated with respect to the other by π/4 and sharing the intersecting pixel in evaluating the coincidence condition. This construction is shown conceptually in Fig. 32.
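The element count Q = 8k + 1 can be read directly from this construction: a "+" cross and a "×" cross of half-length-k arms sharing one center pixel give 4(2k + 1) − 3 = 8k + 1 active elements. A sketch (the mask-building helper is ours):

```python
import numpy as np

def octagonal_support(k):
    """Support of the octagonal window: a '+' cross and a 'x' cross of
    half-length-k arms sharing the center pixel (cf. the Fig. 32 construction)."""
    n = 2 * k + 1
    m = np.zeros((n, n), dtype=bool)
    m[k, :] = True                        # horizontal arm
    m[:, k] = True                        # vertical arm
    np.fill_diagonal(m, True)             # one diagonal arm
    np.fill_diagonal(np.fliplr(m), True)  # the other diagonal arm
    return m

for k in (1, 2, 3):
    assert octagonal_support(k).sum() == 8 * k + 1   # Q = 8k + 1

# For k = 1 the mask fills the entire 3 x 3 window, consistent with the
# smallest member of the class being the 3 x 3 MW filter:
print(octagonal_support(1).all())   # True
```

The four shared center pixels account for the −3 in 4(2k + 1) − 3, and the k = 1 case recovers the full 3 × 3 window discussed above.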
Figure 32. Construction of Ω(Q).
Figure 33. Ω(17) example FP with k-data.
The FP of the Ω(Q) is an octagonal structure that is monotonic in X and Y and also on both diagonals. The requirement of diagonal monotonicity in addition to X and Y monotonicity constrains the FP of Ω(Q) to be a more rigidly defined structure with a somewhat more complex k-data structure. The k-data structure, being all greater or all less than any of the values in the monotonic k + 1 sequence, must reflect the diagonal as well as the orthogonal monotonic structure of the root. This means that a given column or row of k-data must satisfy the structure constraint simultaneously for the four axes of monotonicity of the root, as illustrated by example in Fig. 33 for Ω(17), k = 2, showing the FP with the k-data pattern and the Ω(17) overlaid on the pattern.

2.a. Discussion of Complex Ω(17) Fixed-Point Roots

The computation of the grammar for Ω(Q), such as the analysis given in Section III.E for Ξ(L₂), is incomplete and cannot be presented at this time.
Nevertheless, we can discuss some of the properties of a complex of Ω(Q)
Figure 34. Unsuccessful complex of Ω(17) FP roots.
FP roots. The first thing we notice about forming a complex of Ω(Q) FP roots, which are octagonal, is that we cannot simply join them along shared edges and form a continuous FP structure. This is because octagons do not pack into a grid as do squares or hexagons. In order to form a continuous FP surface based on a complex of octagonal patterns, the octagons must overlap in a complex manner represented in the grammar of their combinations. The complexity of successful and unsuccessful patterns of overlapping octagons can be seen by examination of Figs. 34 and 35. Figure 34 shows a complex of four octagons, each one a monovalued pulse: Oct. A: [1], B: [2], C: [3], and D: [4]. The complex is formed by having C overlap D, B overlap C, and A overlap B in the manner shown in Fig. 34. The test for success is monotonicity in all four axes such that all pixels satisfy the coincidence condition. The pattern shown in Fig. 34 is unsuccessful by virtue of the pixels shown encircled in boldface, which are forced to have the values “3” and “4.” These pixels violate the monotonicity requirement, “4” in the vertical axis and “3” in a diagonal axis, needed to satisfy the coincidence condition as well as the k-data pattern requirement. The errors of course multiply, except for selection by substitution, which is not strictly an FP condition. Under closer examination, we can see that the errors can almost be removed by moving C down two rows. In this case of moving C down by two rows, a datum is left uncovered as shown in Fig. 35, that is, the
Figure 35. Successful complex of Ω(17) FP roots.
datum in the center indicated by the boldface circle, which as such is constrained only to satisfy the k-data requirement. Thus we are free to set the value of that datum so as to be locally monotonic, in this case to the value “2,” which of course satisfies the k-data requirement. This leaves one remaining errant pixel, indicated by the character “a.” If left unchanged, it would have the value of D, or “4,” and would violate vertical monotonicity. If we intercede, however, and change it to a = 3, then the system has the integrity of a complex FP, and indeed, we can find another pattern “E,” indicated by the dashed octagon, to cover the “hole.” The result is a complete FP pattern over the set contained within the complex borders of the octagonal “tiles,” or a fully successful complex of Ω(17) FPs. This exercise is not the only arrangement of Ω(Q) FP roots possible to result in a complex FP, and is only intended to illustrate the delicate balance involved in the conjunction of Ω(Q) FP roots to satisfy a complex FP structure. Furthermore, it is not a prelude to developing the alphabet or the grammar of the roots; that is a “work in progress.” An interesting consequence of this exercise, however, is that we notice that E, formed by the conjunction of A through D, begins to take on a complex form as an independent FP type or element of an alphabet of Ω(Q) FP roots. Determining that
alphabet is a prerequisite to computing the grammar of the conjunction of the roots. The octagonal structure of the Ω(Q) FP roots allows the surface of the FPs to have local maxima or minima internally and still satisfy the requirements of monotonicity. This means that the Ω(Q) FP roots cannot be characterized as first-order monotonic surfaces, as were the Ξ(L₂) FP
roots. Nevertheless, the method of characterizing the surfaces of Ω(Q) FP roots, and analyzing their grammar, based on normal vectors may still apply, but with some modifications and perhaps in a more complex manner. The notion of the approach at present is that the Ω(Q) FP roots are themselves decomposable into subroots that are first-order surfaces, and that they satisfy a subgrammar of root composition to types of roots. We should mention here that a hexagonal filter, denoted by Η(Y), exists as a class and is formed and analyzed in a manner analogous to the octagonal filter. The hexagonal filter is formed from the octagonal filter by excluding either the vertical or the horizontal 1D arm. In this way, it explicitly requires diagonal and either vertical or horizontal monotonicity, but excludes horizontal or vertical monotonicity, respectively. Further, we should note that the hexagonal filter packs into a grid.

3. The Notion of 3D Feature Extraction

The discussion of 3D filters is included in this section because of the similarity of their construction with the octagonal and hexagonal filters. The construction of 3D filters parallels that of Ω(Q), with the juxtaposition of 1D filters on axes of monotonicity to construct 3D forms of Ξ(L₂), Η(Y),
and Ω(Q), denoted Ξ³(T), Η³(U), and Ω³(V), respectively. The dimensions of Ξ³(T), Η³(U), and Ω³(V) are:

T = 3N − 2, N = 2k + 1 ⇒ T = 6k + 1; thus for T = 2q + 1, q = 3k.
U = 5N − 4, N = 2k + 1 ⇒ U = 10k + 1; thus for U = 2u + 1, u = 5k.
V = 7N − 6, N = 2k + 1 ⇒ V = 14k + 1; thus for V = 2v + 1, v = 7k.
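The size relations for the 1D, 2D, and 3D windows can be collected in one place (the symbol names stand in for the original notation; the arithmetic follows the relations quoted above):

```python
def window_sizes(k):
    """Window sizes as functions of k: 1D length N, 2D cross L2,
    octagonal Q, and the 3D cross/hexagonal/octagonal dimensions T, U, V."""
    N = 2 * k + 1        # 1D filter length
    L2 = 2 * N - 1       # 2D cross filter
    Q = 2 * L2 - 1       # octagonal filter, Q = 8k + 1
    T = 3 * N - 2        # 3D cross, T = 6k + 1
    U = 5 * N - 4        # 3D hexagonal-based, U = 10k + 1
    V = 7 * N - 6        # 3D octagonal-based, V = 14k + 1
    return N, L2, Q, T, U, V

print(window_sizes(2))   # (5, 9, 17, 13, 21, 29); T = 13 is the case of Fig. 36
```

Note that k = 3 recovers the octagonal example Q = 25 used earlier, and k = 1 gives Q = 9, the 3 × 3 MW filter.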
The construction of Ξ³(T) is illustrated in Fig. 36 for T = 13, and the
construction of Η³(U) and Ω³(V) is entirely analogous. No attempt will be made here to present the analysis of 3D root structure because, as with Η(Y) and Ω(Q), it is incomplete at this time, awaiting further work. We can, however, discuss the roots. They are most easily envisaged as a cube in the case of the Ξ³(T). As we will see, this structure
leads to an interesting notion for application to feature extraction that may provide some physical insight into the structure and composition of materials, both man-made and natural.
Figure 36. Construction of Ξ³(13).
Let us imagine a 3D FP root of Ξ³(T) to be a cube composed of
(k + 1) × (k + 1) × (k + 1) voxels (“volume picture elements,” analogous to pixels in a 2D image), as shown in Fig. 37. The immediate question for delineating the types of FP roots and their grammar is: How do we represent the value of the voxel? For the case of Ξ(L₂) FP roots, the value was represented by the amplitude of a surface (see
Fig. 8). We cannot do this in the case of Ξ³(T) because the three dimensions
of representation are used in defining the voxel. This concern may seem unimportant until we consider how to approach the computation of the
Figure 37. Geometry of FP root of Ξ³(13).
alphabet of roots and their grammar. The method of computing the grammar of the Ξ(L₂) FP root was based on a model of the root as a surface
constrained to be a first-order curve. The notion of how to visualize a 3D root as a complex of voxels is important in the same way; namely, how we visualize it is a model for the mathematical development of its structure, from which the grammar therefore follows. We cannot visualize the Ξ³(T) FP root as a surface, but we may visualize it
as a cube having an internal structure related to a detectable state of a 3D region. The structure detected by Ξ³(T) is the relationship of the states of
voxels satisfying the locally monotonic structure of a Ξ³(T) FP. The
representation of this model in mathematical terms cannot follow based on the continuity of the slope of a surface as in Ξ(L₂) FP roots, but must access
a more abstract and elegant representation, one that may subsume the model used for Ξ(L₂) FPs. We have seen that the valuation of pixels can be either
binary or multivalued, and we may easily extend this flexibility to the 3D case. For example, we may envision that a voxel of material may or may not be polarized along some axis, P ∈ {0, 1}, or that it may be polarized to some degree. Either measure can satisfy the LOMO(v + 1) FP requirement of Ξ³(T) in an abstract hyperdimensional space. The binary case is particularly
easy to incorporate into an abstract space of Ξ(L₂) FP structure, and the
multivalued case as well, but in a fourth, or hyper-, dimension. Whereas the Ξ(L₂) FPs described the brightness, or state, of a region of
a surface, the Ξ³(T) FPs describe the distribution of some detectable state,
for example, brightness or polarization, in three dimensions. In this light, the analysis of Ξ³(T) FPs opens up a much wider scope of application to study
the internal structure of states of a material surface, or a volume of material. For example, in the case of a material surface, or a complex astronomical object, we may consider feature extraction from a hyperspectral image expressed in terms of the image cube I(X, Y, λ). (In the next section we will discuss an interpretive transform of imagery based on a representation of features in terms of FP roots. In that case, the utility of this representation is particularly appropriate because our experience does not provide for an intuitive model of recognition of an object in an image cube.) In a similar fashion, we may consider extracting features of regions of polarization in terms of the Stokes vector components S(X, Y, λ), or the volume distribution of polarization at a given wavelength, S_λ(X, Y, Z). The case of feature extraction of the volume distribution of species in a material brings to mind several applications in the form of I(X, Y, Z), for example, those derived from composites of tomographic X-ray or acoustic imagery. These are seen mainly in medical imagery, but we should not overlook the cases of man-made microstructures, or of larger-scale 3D imagery such as seismic or marine mapping. It is implicit in these discussions that although the detection of species in a volume of material is dependent on the probe, the method of feature extraction remains the same. Finally, we should also note that these applications also apply to the 3D coded window version of Ξ³(T), and the grammar is almost trivial, as there
are only three types of roots. The application of a 3D filter constrained by a predicate condition for FP or binary patterns amounts to a feature extraction of internal structure expressed in a finite alphabet of terms, just as in the case of the 2D filters. The significance of the expression in terms of a finite alphabet will be explored in the next section.

IV. A Language Model Based on Ranked-Order Filters

The central aim of this section is to advance the technology of object recognition. The development of automated object recognition discussed here is based on a top-down approach of simulating the natural phenomena of object recognition rather than a bottom-up approach of inventing an apparent function of object recognition with current technology. To that end, the development described here seeks to fulfill a necessary step in this top-down approach, and we will show how the advances in MW filters described herein factor into the development of object recognition technology. To begin, we must briefly discuss what we understand natural object recognition to be, and then discuss the perspective of this development based on that understanding. We assume that automated object recognition is a tool that is useful to us. Thus it is one that we understand, and its function is compatible with our understanding of objects. This understanding derives from the phenomena of natural object recognition by human beings. We assume the phenomenological perspective that natural object recognition by human beings is the predication of an object by a name conveying what it means to us for an object to be what it is. In this way, natural object recognition is the relationship of an observer to the world in a language community such that an observer interprets the world by language. For example, a human observer may assert the following: “That object is a desk.” Other people understand what that object is in terms of function, or application, and some notion of extension.
In the case of automated object recognition, we must distinguish between recognition and classification. A computing machine, functioning in the capacity of a tool to recognize objects, must, like a human observer, point out an object in the context of an image and assign a name to it. Thus the tool identifies that object. In this case of automated machine recognition of an object, the meaning of the object is conveyed by its name and is understood, rather than given, by the user of the tool.
The notion of the meaning of an object is derived from the collection of relationships of an object to a user’s various experiences of that object, or one like it, in terms of purposive ends and events. This notion of meaning in the present-tense context of a particular object is the significance of that object, and it is paramount to recognition of it. A machine (and here we mean a tool), then, must simulate the notion of significance in the local context of objects. Simulating significance involves the simulation of a user’s experience relevant to the object as if the user were in that local context having the present-tense experience of that object. This is the classical “frame problem” of artificial intelligence (AI), and it has been the major stumbling block to efforts to develop automated object recognition. The purpose of this project is to take a step toward addressing the “frame problem” by establishing a commonality of language between the machine and the “perception” of the object by the machine. In the following paragraphs it is argued why establishing such a commonality of language is a necessary step toward fulfilling the goal of developing a useful tool for automated object recognition and, further, how recent developments in ranked-order filters open a door to this development. We anticipate further that in pursuing this project we will also satisfy other useful applications of automated feature extraction from imagery.
A. The Necessity and Possibility of Interpretive Transforms of Imagery

The experience involved in automated object recognition by a machine is simulated experience given by the user of the tool in terms of his or her language, that is, with propositions. In order to relate the object to experience, the representation of the object must be logically compatible with the representation of experience; that is, both representations must be in propositional terms. In past efforts on this problem, the user of a computing tool interpreted an object to a representation in propositional terms, and the result was variable from one user to another, making the process indeterminate. The first problem then, before that of simulating cognition, is a translation or transform of imagery into a natural language of imagery that is logically compatible with the language of the tool. The second problem is the interpretation of the language of imagery to a representation of meaning in the user’s language. We may discuss object recognition in somewhat greater detail by defining it as a true assertion, by an observer, of an object to be some noun object that conveys the meaning of the object to be what it is. This means that the assertion of an object S to be a P is true if it “makes sense or is possible” in the context of the observer’s experience as a being-in-the-world with objects. This
includes the collective experience of being in a community. That is, the assertion of the object to be some thing, P, must be logically consistent with the experience of the user in all past contexts having some relationship to the present tense of the local context where that object is occurring. For this logical consistency to be robust, the set of propositions defining past experience must be revisable by verification in the actual world. The objective before us here is, of course, a limited scope of simulating this phenomenon in a propositional logic model representing the object and experience in propositions subject to logical consistency. In order to evaluate the logical consistency of the present-tense context of the object and the relevant experience, the representation of the object must be by propositions in a direct transform from the image. This transform is a simulation of the natural interpretation of ideational complexes of objects to sentential complexes of propositions. This further means that the representation must be by a propositional language system, and hence it must be defined on a finite alphabet. This approach differs significantly from conventional approaches based on classification of object data in terms of templates, or by Bayesian estimates of joint probabilities of occurrence. This approach is most similar to knowledge-based reasoning and truth maintenance systems, yet differs from them dramatically in the method of representing the object in language. The detailed arguments of this approach may be found in Wilburn (1998). The well-known limitation of template matching as a basis of object recognition follows from templates being extensional representations of features. As such, they are a limited sample of a very large, denumerably infinite, number of possibilities of how that object may be found in whole or recognizable part.
The distinction between the approach described here and Bayesian methods is subtler and needs further explanation. The conditional probability of the Bayesian estimate rests on the posterior and assumed prior probabilities of occurrence of objects to estimate a particular likely occurrence of an object being one of an assumed set of objects. The classical weakness of the Bayesian method is the assumption of the prior probability. The major shortcoming, however, is that it is based purely on the probability of occurrence. Considerable advances have been made in recent years with Bayesian-inference methods in the form of hybrids with knowledge-based systems, for example, by incorporating null constraints and descriptive terms of proximity and appearance into the object data to result in a classification of objects by decision trees. Nevertheless, Bayesian-inference methods remain based fundamentally on a probability of joint occurrence. The Bayesian method does not convey information about an object’s identity inferred from the
relational structure of object data independent of any particular occurrence, or from the relationship of the object in question to other objects with respect to function. Methods of image reconstruction, or enhancement, such as maximum entropy, and most recently the pixon approach of Puetter and Pina (1998), which is related to Bayesian estimates, have demonstrated utility in estimating features by enhancing noisy and blurred images. These methods are also based on pixel-bound statistics, but in the case of the pixon, the formalism is augmented by a measure of structure in the form of a correlation length. Nevertheless, none of these statistical methods can “logically deduce” an object’s determinate identity in terms of function or effect on a purposive end. Statistical methods cannot provide a sense of meaning about an object because they lack the semantic content of a “concept.” A concept must be represented by the intensional properties of a sort of object that are true of it in any individual instance, and the intensions of an individual object true in any situation where it may be found. A concept of an object represented in this way can be related to experience of that object, or one like it, namely, a member of that sort of object. Intensional properties are those properties common to a sort of object that are true of it under any conditions of occurrence; for example, no matter where or under what conditions you find them, all balls of any size, color, or surface design are spheres. Because Bayesian inferences lack this ability to represent the intensional property of an object, they fall short in the AI “frame problem” of recognition in contexts of apparent contradiction or partial data.
Consider, for example, the following situations: partially obstructed, or occluding, objects, such as a side-view image of two flatbed trucks parked beside each other but headed in opposite directions so that the beds exactly overlap; and, perhaps more common, objects in strange local contexts, such as a cartoon of a soldier standing in a flowerpot. These problems derive from the predicate of the object being defined on pixel-bound statistics, that is, extensional data, rather than on relational structure true in all instances of the object and compatible with a grammatical logic indexed by function and events. The outcome of statistical methods is a probability estimate founded on historical occurrence rather than an abductive inference to the ‘‘best’’ answer based on logical consistency of available evidence and historical relationships associated with purposive ends. The assertion of ‘‘what an object is’’ is the answer to the question: ‘‘What does it mean to us for it to be whatever it is?’’ The answer is an intentional interpretation of the extensional object in terms of application — what we do with it, or to it, or what it does to us — and identity, namely, existential quantification. Object recognition then is a pointing-out-
THEORY OF RANKED-ORDER FILTERS
as of the individual object by an observer. That is, object recognition is an assertion by an observer, in the language understood by his or her community, to correctly signify the truth of a particular object to have the meaning to his or her community of being what it is. The recognition of an object is accomplished by language in a name conveying its meaning in terms of application, and we refer to the noun-phrase of a name as a substance-sort. Individual objects are members of a class, or sort, of object. The application of a sort of object is the consistent tautological deduction of all experience of that sort of object to be the ‘‘concept’’ of it shared by a community of observers. An individual member of a sort is distinguished from all other sorts of objects by intensional properties not shared by any other sort. A sort, then, is defined by a nonempty logical intersection of the intensions of individuals. It follows therefore that we must extract features of objects in terms of intensions in order to be compatible with recognition of an object in terms of identity and application, or meaning. This requirement for an intensional representation of features leads us to the constrained MW filter for feature extraction in terms of fixed-point (FP) roots of the filter.

We have established that the {e@} are a set of relational patterns independent of absolute pixel value, and that they represent the property of relation in that region of the image independently of the extension of the image. Further, we claim from experience that the relationship of values of data is reasonably invariant of any particular occurrence of an object. This means that the members of {e@} are an intension of the image in that region. We have further shown in Section III.D that the relations required by R_d(u) between the monotonic data and neighboring data of the e@_d ∈ D have the result that the e@ are constrained by rules of combination, so that the co-joining or partial superposition of e@ satisfies a kind of grammatical compatibility. This entire argument means that an image of a feature represented by an ordered sequence of e@, (e@_d, . . . , e@_d'), is then an intensional representation of that feature. A collection of such co-joined, thus grammatically compatible, e@ satisfies a collection of predicates R_d(u): R_d(u) ∧ . . . ∧ R_d'(u), representing the predicate of an object in an image composed of FP roots (e@_d, . . . , e@_d') ⊨ R_d(u) ∧ . . . ∧ R_d'(u). As a practical matter, we cannot say that any image of that feature will have the same representation by (e@_d, . . . , e@_d'). We can say, however, that any image from a set of images of that feature obtained within the limits of linear exposure, equivalent range, perspective, and resolution, and digitized at the same level, will have the same intensional representation. Furthermore, as individual objects are members of a sort of object, features are members of a sort of feature, and images of them form families of images over range and perspective. The result is usually a fairly large set of possible images for which the intensional property of the feature
holds true in practice. The demonstrated utility of the filters for feature extraction further suggests that they may satisfy the requirement for detection of features of many, although certainly not all, objects of interest. The satisfaction of a grammar, or syntax, for conjunction of a finite set of FP roots that are intensions of objects leads us to the notion of a language system for an interpretive transform of features. Our motivation is that we may employ an interpretive transform for automated object recognition by a logical model based on propositions describing applications indexed by event, including other objects in the worlds of various events. What we need, then, is to develop a representation of features that can have semantic content, namely, that can be true or false insofar as being in or not in the vocabulary of a semantic language and indexable to propositions of experience. This transform is the simulation of the transform of an ideational complex into a sentential complex, as happens in natural object recognition. We must discuss the structure of a syntax based on e@ and then show how this syntactic structure satisfies a semantic language model. In the simplest terms, a semantic language is a model of a propositional syntactic system (PCS). The system is a triple (A, L, S) where:

A: A set of denumerable atomic sentences.
L: A set of logical symbols, for example, {¬, →, ∧, ∨, (, )}. Respectively, ‘‘¬’’: negation, read as ‘‘not’’; ‘‘→’’: conditional or entailment, read as ‘‘if . . . then . . .’’; ‘‘∧’’: conjunction, read as ‘‘and’’; ‘‘∨’’: disjunction, read as ‘‘or’’; and ‘‘(, )’’: parentheses.
S: The smallest set of sentences including A such that if A, B ∈ S, then so are ¬A and (A ∧ B).

A language model of this system includes the concept of valuation of sentences composed by the logical connectives of atomic sentences; thus a language model includes the semantics of the calculus.
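The triple (A, L, S) can be made concrete with a small sketch. This is not from the text: the tuple encoding, the `closure` helper, and the depth cutoff are illustrative assumptions (the real S is infinite, so we build it only to a fixed nesting depth).

```python
# A minimal sketch of the PCS triple (A, L, S): atomic sentences A,
# logical symbols L (here just negation and conjunction), and S as the
# smallest set containing A and closed under those connectives.
# Sentences are nested tuples; all names are illustrative only.

ATOMS = [("atom", "p"), ("atom", "q"), ("atom", "r")]

def closure(atoms, depth=1):
    """Build S up to a fixed nesting depth (the full S is infinite)."""
    s = set(atoms)
    for _ in range(depth):
        new = set()
        for a in s:
            new.add(("not", a))          # if A is in S, so is (not A)
            for b in s:
                new.add(("and", a, b))   # if A, B are in S, so is (A and B)
        s |= new
    return s

S = closure(ATOMS, depth=1)
print(("not", ("atom", "p")) in S)                       # True
print(("and", ("atom", "p"), ("atom", "q")) in S)        # True
```

Iterating `closure` to greater depths approximates the full closure set; the relatedness constraint introduced later would prune disallowed conjunctions from this set.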
Let us describe a language with the construction L = L(○_1, ○_2, . . . , ○_i; p_1, p_2, . . .). The p's are propositional variables, and in the case of this language they are substituted with e@. The ○_i are i(j)-ary connectives of the ith type connecting j variables, for example, ○_i(p_1, p_2, . . . , p_j). In our case, the variables are the e@, and the connection of any two may or may not be allowable depending on their types. Further, any logically allowable combination of the e@ must also be in the vocabulary of the language for it to have semantic value. In this case, we say the ○_i are truth-and-relation functional connectives if there is a truth table for determining the truth value of ○_i(e_1, e_2, . . . , e_j) to be either true (T) or false (F) when it is in the vocabulary of the language. The truth value of any ○_i(e_1, e_2, . . . , e_j) is determined according to the truth values of the individual e_k, and according to whether the e_k are related by a relationship R_A(e_1, e_2, . . . , e_j) governing ○_i permitted by the language. The relationship
R_A(e_1, e_2, . . . , e_j) introduces the relatedness component of the logical system for a language model to be a semantic structure of nonclassical logic. This form of nonclassical logic is distinguished from classical logic, which considers only form and truth value, neglecting content and relationship. The language L is based on a nonclassical logic. Nonclassical logic is defined by two types of relational structure, namely, a set-assignment semantics and a relations-based semantics, although in the end the sentences defined by either are semantically equivalent. A complete model must consider both, but our immediate concern here with the structure of the language will focus on the relations-based type. In the case of a relations-based semantics, we have for A and B the valuation ν of A and B connected according to ○: ν(○(A, B)) ∈ {T, F}, determined as: ν(○(A, B)) = T iff R_A(A, B) is true, that is, ○(A, B) → R_A(A, B). The relation R_A(A, B) is simply the relationship governing allowable combinations of A and B. The truth-and-relation function f is the calculation of the truth value in a language of the combination of A and B given their individual existence as T or F. Suppose A and B are e_1 and e_2, respectively. If ν(e_1) = T or F and ν(e_2) = T or F, then for a simple logical operation of ○ = ‘‘∧’’ in the language of ordinary propositional logic, f(ν(e_1), ν(e_2)) is: f(T, T) = T, f(T, F) = F, f(F, T) = F, f(F, F) = F, if and only if the combination of e_1 and e_2 by ∧ is an allowed combination according to R(e_1, e_2) = T. A formal relations-based model M for a language L_ε based on (e@_d) is given then by M = (ν, R_A, ○_1, ○_2, . . . , ○_n; e_1, e_2, . . . , e_j; Φ(e@_d)_1, Φ(e@_d)_2, . . .) where:

Φ(e@_d)_1, Φ(e@_d)_2, . . . are complex propositions composed using ○_i,
ν is the valuation ν(p) ∈ {T, F}, and
R_A ⊆ Sub(wffs(○_i)) is the relation governing the truth table for ○_i allowed by logical compatibility of the language.

The valuation ν is applied to the complex propositions Φ(e@_d) = ○_i(e_1, e_2, . . . , e_j) by:

ν(○_i(e_1, e_2, . . . , e_j)) = T iff R_A(e_1, . . . , e_j) and f(ν(e_1), . . . , ν(e_j)) = T.
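A minimal sketch of this valuation rule, assuming a Python encoding in which the relatedness predicate R_A is a set of allowed atom tuples and f is classical conjunction; all names here are illustrative, not the author's.

```python
# Hedged sketch of a truth-and-relation functional connective:
# v(o(e1, ..., ej)) = T iff R_A(e1, ..., ej) holds (the combination is
# allowed by the language) and f(v(e1), ..., v(ej)) = T (the classical
# truth function succeeds). In the text, disallowed combinations are
# absurd (not in the language); returning False here is a simplification.

def f_conj(*truth_values):
    """Classical conjunction as the truth-functional component f."""
    return all(truth_values)

# Relatedness predicate R_A: only these ordered combinations are allowed.
ALLOWED = {("p", "q"), ("q", "r"), ("p", "q", "r"), ("q", "q")}

def valuation(v, *atoms):
    """v maps atom names to True/False; returns the nonclassical value."""
    if atoms not in ALLOWED:        # R_A(...) fails: no semantic value
        return False
    return f_conj(*(v[a] for a in atoms))

v = {"p": True, "q": True, "r": False}
print(valuation(v, "p", "q"))   # True: allowed and both conjuncts true
print(valuation(v, "q", "r"))   # False: allowed, but r is false
print(valuation(v, "r", "p"))   # False: R_A forbids (r, p)
```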
We can see now that the R_A defines the syntax of the nonclassical language L_ε. In the case of this language, the syntax we are addressing refers to the allowable combinations of characters, rather than the more complex case we are accustomed to, including word order. In the end we will incorporate that as well, but that involves the predicate calculus of a quantificational syntactic system (QCS) and is beyond the scope of what is intended here. The semantic content is established by R_A and f: R_A establishes that ○ is logically possible based on the methods developed in Section III.D, and f establishes that the combination according to ○ is in the vocabulary of L_ε. The notion of a language based on an alphabet of e@_d is that a feature composed of e@ is an ordered sequence Φ = (e@_d, . . . , e@_d'); Φ(e@_d) ⊨ R_d(u) ∧ . . . ∧ R_d'(u), that satisfies a syntax and has semantic content. The sets Φ(e@_d) that have semantic content, in the sense of being a true or false valuation in the vocabulary of a semantic model, namely, ν(Φ(e@_d)) ∈ {T, F}, are atomic sentences closed under subsentences e@_d. Sets Φ(e@_d) that do not have this kind of atomic semantic value, that is, ν(Φ(e@_d)) ∉ {T, F}, are molecular expressions. Thus, if any Φ(e@_d) belongs to a vocabulary of features it is atomic, and if not, then it is molecular. We should point out that the Φ(e@_d) are perhaps better understood here as well-formed formulas (wffs), which are the individual e@_d and their combinations by the logical connectives. N.B.: A sentence A with ν(A) ∈ {T, F} is a wff. The details of the structure of the e@_d, and the structure of a language based on them, are described in Wilburn (1998b) and briefly outlined here.

B. Satisfaction of a Propositional Language System

We must now show how L_ε satisfies the requirements of being a propositional language system (PCS); that is, we need to look at the properties a propositional language has that a language of imagery must also have.
To do this, we need to define a few concepts of classical logic systems incorporating the notions of syntax and semantics. A property that stands out in importance is the property of finitary entailment, and explaining it will serve to illustrate these notions. The theorems for transitivity and semantic-syntactic deduction for finite consequences can describe finitary entailment. In this example, Σ is a set of sentences in a language model, and A and B are sentences, or propositions, in that model.

Transitivity: Σ ∪ {A_1, . . . , A_n} ⊨ B iff, whenever Σ ⊨ A_i, i = 1 . . . n, then Σ ⊨ B.

Semantic consequences: Σ ∪ {A_1, . . . , A_n} ⊨ B iff Σ ⊨ (A_1 ∧ A_2 ∧ . . . ∧ A_n) → B,

where ‘‘⊨’’ means ‘‘validates’’; for example, A ⊨ B: A validates B, or B is a semantic consequence of A. The case of ⊨ A means: A is a tautology, or is
true in every model. A wff that is always true in every model is a valid wff. The theorem of semantic consequences is the analog of the case of finite syntactic consequences, as follows:

Σ ∪ {A_1, . . . , A_n} ⊢ B iff Σ ⊢ A_1 → (A_2 → (. . . (A_n → B) . . . )), or
Σ ∪ {A_1, . . . , A_n} ⊢ B iff Σ ⊢ (A_1 ∧ A_2 ∧ . . . ∧ A_n) → B.

In these expressions, ‘‘⊢’’ is read as ‘‘deducible from’’; for example, Γ ⊢ B: B is deducible from Γ in a logical proof. In the case of B deducible from the empty set, we refer to B as a theorem of the system and may denote it as ⊢ B. In this case, we may also refer to a theory closed under a rule, for example, modus ponens, denoted by Th(Γ) = {A: Γ ⊢ A}. We can understand these expressions of finitary entailment as two forms of expressing a logic — one is semantic and the other is syntactic. A semantic description is in terms of the truth of propositions, and a syntactic description is in terms of the theoremhood of propositions. Finitary entailment is founded on the notions of consistency and completeness of a system, and includes the important property of compactness.

(a) A system is consistent if all of its theorems are valid well-formed formulas (wffs), meaning always true in all models, namely, all theorems are tautologies: if ⊢ A, then ⊨ A.
(b) A system is complete if all valid wffs are also theorems of the system: if ⊨ A, then ⊢ A.
(c) A system that is both consistent and complete is strongly complete: ⊢ A iff ⊨ A.
(d) A system is compact in either a semantic or a syntactic sense: Σ ⊨ A iff there is a finite Δ ⊆ Σ such that Δ ⊨ A, and similarly for ‘‘⊢.’’

Thus, compactness incorporates the notion of finiteness. The notions of consistency, completeness, and compactness imply some other important properties of systems. A compact system has a finite model, it is without contradiction (consistent), and it is a model of all its elements (complete). Succinctly stated (Epstein, 1995):

(a) Consistency means that for every A, Σ ⊢ A or Σ ⊢ ¬A, but not both.
(b) Completeness means that for every A, A or ¬A is in Σ.
(c) If a system is consistent and there is a D such that Γ ⊬ D (using ‘‘⊬’’ to mean: D is not deducible from Γ), then there is a strongly complete Σ such that D ∉ Σ and Γ ⊆ Σ.
(d) Every consistent set of wffs has a model.
(e) If Σ is strongly complete, then Σ has a model and every finite subset of Σ has a model.
(f) If Σ is consistent, then there is some A such that Σ ⊬ A.

These definitions and theorems serve to introduce the basic notions of semantic and syntactic entailment, consistency, completeness, and compactness as properties of language systems. To explore the structure of a nonclassical language based on these properties, however, we need to employ a more powerful or flexible metalanguage. We introduce this metalanguage by recalling the notion of valuation given earlier, ν ∈ {T, F}, to be a mapping of sentences into {T, F}. If a valuation maps all sentences of a language into {T, F}, the language is a bivalent language, in this case specifically a bivalent propositional language. A valuation ν is an admissible valuation if it is a member of a set of points V_L, ν ∈ V_L, associated with the closure set of wffs of a language L. The truth valuation space of a wff A is H(A) = {ν ∈ V_L : ν(A) = T}; H(A) is the truth set of A, meaning the set of all points in V_L where A is true, or, more logically stated, the set of points where A is satisfied. The valuation space of the language L is H = (V_L, {H(A) : A ∈ L}). We have several useful definitions that follow from this concept (Van Fraasen, 1971a):

(a) A wff A is a valid wff, ⊨ A, in L iff every admissible valuation in H of L satisfies A, that is, ν(A) = T for all ν ∈ V_L.
(b) A set of wffs X is unassailable if every admissible valuation of L satisfies some member of X.
(c) A set X of L semantically entails A, X ⊨ A, in L iff every ν of L that satisfies X also satisfies A.
(d) The set H(X) = ∩_{A∈X} H(A) is the elementary class of X. A union of elementary classes that spans H is the cover of H.

These results are a kind of restatement of those given earlier for consistency and completeness, but defined here in set-theoretic terms.
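These set-theoretic definitions can be sketched directly for a toy two-atom language; the tuple encoding and the names `ev` and `H` are illustrative assumptions, not the author's notation.

```python
# Set-theoretic sketch of truth sets: H(A) = {v in V_L : v(A) = T},
# with validity |= A iff H(A) = H, and the elementary class of a set X
# as the intersection of the truth sets of its members.

from itertools import product

ATOMS = ["p", "q"]
V_L = [dict(zip(ATOMS, bits)) for bits in product([True, False], repeat=2)]

def ev(sentence, v):
    """Evaluate a nested-tuple sentence at valuation v."""
    op = sentence[0]
    if op == "atom":
        return v[sentence[1]]
    if op == "not":
        return not ev(sentence[1], v)
    return ev(sentence[1], v) and ev(sentence[2], v)   # "and"

def H(sentence):
    """Truth set of a sentence as a set of valuation indices."""
    return frozenset(i for i, v in enumerate(V_L) if ev(sentence, v))

p, q = ("atom", "p"), ("atom", "q")
taut = ("not", ("and", p, ("not", p)))         # not(p and not p): valid
print(H(taut) == frozenset(range(len(V_L))))   # True: H(A) = H, so |= A
X = [p, q]
print(frozenset.intersection(*map(H, X)))      # elementary class of X
```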
With this approach, we may summarize these definitions in theorems as follows (Def.: ∅ = the null set):

(a) ⊨ A iff H(A) = H.
(b) X is unassailable iff ∪_{A∈X} H(A) = H, that is, if the truth sets of all of X form the cover of H.
(c) X is satisfiable iff ∩_{A∈X} H(A) ≠ ∅, that is, if the elementary class of X is not empty.
(d) B ⊨ A iff H(B) ∩ H(A) ≠ ∅.
(e) X ⊨ A iff ∩_{B∈X} H(B) ⊆ H(A), that is, if the elementary class of X is a subset of the truth set of A.

We may now redefine compactness in terms of intersection and union. Compactness is described in two forms (Van Fraasen, 1971b): I-compact (intersection) and U-compact (union).

(a) A language L and its valuation space H are I-compact iff for any set X of wffs in L, ∩_{A∈X} H(A) = ∅ only if ∩_{A∈Y} H(A) = ∅ for some finite subset Y of X. This is the same as saying that the property of I-compactness means that any set in L is satisfiable iff all of its finite subsets are satisfiable.
(b) A language L and its valuation space are U-compact iff for any set X of L, ∪_{A∈X} H(A) = H only if ∪_{A∈Y} H(A) = H for some finite subset Y of X. This is the same as saying that the property of U-compactness means that any set in L is unassailable only if it has a finite unassailable subset.
(c) A language that is both I-compact and U-compact is compact.

Finitary semantic entailment is definable in these terms now as a property of a language: X ⊨ A iff, for any X of L and a wff A of L, H(X) ⊆ H(A) only if H(Y) ⊆ H(A) for some finite subset Y of X. There are some other more subtle aspects of this subject not given here, and the reader is referred to Wilburn (1998b) for a more complete discussion. The language we have discussed in connection with the model M is, as we have said, a bivalent language. A bivalent language has the inherent property of exclusion negation. A language has this property if for every wff A of L there is an A* of L such that H(A*) = H − H(A). A basic theorem given by Van Fraasen (1971c) connects compactness and finitary entailment for a language having exclusion negation:

Theorem IV.4-A: If a language L has exclusion negation, then:
(a) L is I-compact.
(b) L is U-compact.
(c) L is compact.
(d) L has finitary entailment.
This theorem result for finitary entailment is conditional, however, and not necessary. For finitary semantic entailment to be necessary, we need the property of convergence, which is supplied by the construction of a filter. The customary use of filters is to prove compactness. We introduce the logical construct of filters to illustrate how a nonclassical propositional language is represented and distinguishes propositions. Finally, we will show by example that
L_ε satisfies the requirements of being a nonclassical propositional language system. In the foregoing we saw from Van Fraasen (1971a, 1971b, 1971c) that a language L having exclusion negation and compactness implies finitary entailment, but not necessarily. The missing condition is the convergence of a filter. As noted, the notion of filters has found application in logic as a tool for proving compactness and finitary entailment, and it is used here as well. However, we also appeal to filters as a tool for distinguishing features by finitary semantic entailment, namely, semantic deduction for finite consequences. To do this, we must understand filters and show that the language L_ε is compatible with the structure of filters. Finitary entailment is intuitively necessary in order to make a deducible assertion in a language based on finite evidence. Convergence is also intuitively necessary in order that an assertion be consistent, or unambiguously understandable, in that language. Even if the assertion is a disjunction, for example, S is P or Q or . . . , it must be the same disjunction given the same evidence for the assertion. A filter I is defined on a set of sentences X, in terms of the valuation space H(X), to be the set of subsets of X in I such that:

(a) ∅ ∉ I.
(b) If Y ∈ I and Y ⊆ Z ⊆ X, then Z ∈ I.
(c) If Y ∈ I and Z ∈ I, then Y ∩ Z ∈ I.

This definition in terms of H on L leads to L(I) = {A: H(A) ∈ I}. From this it follows that if, for i = 1, . . . , n, {A_i} ⊆ L(I), and {A_i} ⊨ B in L, then B ∈ L(I) and L has finitary entailment; thus L(I) is a system. A filter I may contain subfilters, or filter bases B that can generate the filter I such that I contains B. A filter I on X is an ultrafilter if there is no filter on X that contains I as a proper part; namely, it is a maximal element as the basis for including all of X subject to H(X); ν(X) = T. That is: Ultrafilter I = {A: H(A); ν(X) = T for every A ∈ X in L}, and every filter base is contained in an ultrafilter.
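The three filter axioms can be checked mechanically. A sketch, assuming finite sets of integers stand in for truth sets; `check_filter` and the principal-filter example are illustrative, not part of the text.

```python
# Sketch of the three filter axioms on a set X: (a) the empty set is
# excluded, (b) supersets (within X) of members stay in, and (c) the
# filter is closed under intersection.

from itertools import chain, combinations

def powerset(s):
    """All subsets of s, as tuples."""
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def check_filter(I, X):
    """Return True iff I is a filter on X per the three axioms."""
    if frozenset() in I:                       # (a) null set not in I
        return False
    for Y in I:
        for Z in map(frozenset, powerset(X)):  # (b) Y <= Z <= X => Z in I
            if Y <= Z and Z not in I:
                return False
    return all(Y & Z in I for Y in I for Z in I)   # (c) closed under ∩

X = frozenset({1, 2, 3})
# Principal filter generated by {1}: all subsets of X containing 1.
I = {frozenset(s) for s in powerset(X) if 1 in s}
print(check_filter(I, X))   # True
```

Dropping any superset from `I`, or adding the empty set, makes `check_filter` fail, which is a quick way to see which axiom a candidate collection violates.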
The notion of a maximal element in itself implies finiteness, but more rigorously, X must be finite because the system is defined for A, B, ¬A, and (A ∧ B), and we cannot have a sentence that is a maximal element of an infinite conjunction. (N.B.: In this case, we have the system defined for the truth-functionally complete L(¬, ∧).) This construction of the system also defines the filter to be defined on H over the closure set of X, and it is finite. Thus the union of the elementary classes is the closure set of X.
If I is an ultrafilter on X, then (Van Fraasen, 1971a):

(a) Y ∪ Z ∈ I iff Y ∈ I or Z ∈ I, for all Y, Z ⊆ X.
(b) For every Y ⊆ X, either Y ∈ I or X − Y ∈ I.

With this result, we can see that if L(I) is bivalent, compact, and convergent, then L(I) has finitary semantic entailment, and thus L(I) is a language system. As we shall see directly, convergence is closely allied to compactness and is important. Convergence is defined analogously to compactness in terms of intersection and union:

(a) U-convergent to ν in H iff every elementary class containing ν belongs to I.
(b) I-convergent to ν in H iff every elementary class in I contains ν.
(c) Convergent to ν if every elementary class belongs to I iff it contains ν.

Let us imagine a simple language of atomic sentences p, q, and r, and the truth-functionally complete syntax of L(¬, ∧). Further, let us imagine the complex connectives ○_i to be represented by the connective ‘‘∧’’ of atomic sentences A and B as (A∧B), defined as A co-joined with B and subject to the relatedness-logic predicate R(A, B). It is most important to emphasize here again that (A∧B) does not mean the occurrence of A and B in an image, but rather the occurrence of A joined with B in an image. For our purposes here, we will assume that ‘‘∧’’ means simple joining, but in actual fact, as mentioned earlier, it is a little more complex than that, allowing for overlaps of the data satisfying FP constraints. However, for now this interpretation will do. Let us imagine that the variables are defined as p: an up-ramp (ramp+) of levels 0–5, q: a pulse of level 5, and r: a down-ramp (ramp−) of levels 5–0. Now suppose the set of sentences involving p, q, r is subject to R(A, B) ∈ {T, F}; (A∧B) → R(A, B), as follows:

(p∧q): R(p, q) = T          (p∧r): R(p, r) = F       (q∧p): R(p, q) = F
(q∧r): R(q, r) = T          (r∧p): R(p, r) = F       (r∧q): R(q, r) = F
(p∧q∧r): R(p, q, r) = T     (p∧p): R(p, p) = F       (q∧r∧p): R(p, q, r) = F
(q∧q): R(q, q) = T          (r∧r): R(r, r) = F       (r∧p∧q): R(p, q, r) = F
                                                     (r∧q∧p): R(p, q, r) = F
A language based on this set of sentences may be thought to correspond roughly to a language based on the 1D FP roots described in the foregoing. (We should also postulate a second type of pulse, of level 0, but that would lead to a larger V_L that does not materially add to the explication intended here.) We may now compose a partial truth table in the usual fashion for sentences. We need to pause, however, and take note that the
nonclassical aspect of this logic imposes a condition on the set of sentences that might be overlooked. Those sentences for which R(A, B) = F, such as (p∧r), are absurd; that is, they simply do not exist in X. In this case, the axiom of (A, L, S), namely, ‘‘S: The smallest set of sentences including A such that if A, B ∈ S, then so are ¬A and (A∧B),’’ is subject to R(A, B) = T. With this in mind, we may proceed with p, q, r, (p∧q), (q∧r), and (p∧q∧r). With the three atomic sentences (wffs) p, q, r, we have |V_L| = 2^3 = 8 valuations ν_1, . . . , ν_8 comprising V_L. The truth table is shown in Table IV (T = 1, F = 0), with ν_i corresponding to row i. We can demonstrate the properties of L using the tools developed in the foregoing. We may construct a filter base B_Y on a subset Y of X, Y = {p, q}, as B_Y = {H(p), H(q), H(p∧q)}:

B_Y = {ν_1, ν_2, ν_3, ν_4}, {ν_1, ν_2, ν_5, ν_6}, {ν_1, ν_2}.

The base B_Y is a base because Y = {p, q, (p∧q)} is satisfiable, that is, ∩_{A∈Y} H(A) = {ν_1, ν_2}, which is the elementary class of Y. We may also observe that the filter base B_Y is the valuation space of the closure set of p∧q. The filter base B_Y generates a filter I_Y over the complete subset Y ⊆ X that contains B_Y, B_Y ⊆ I_Y [N.B.: (a∨b) = ¬(¬a∧¬b)]:

I_Y = {H(p), H(q), H(p∧q), H((p∧q)∨(¬p∧¬q)), H(¬(p∧¬q)), H(¬(¬p∧q)), H(¬(¬p∧¬q)), H(p∨¬p)}.

When we represent this in terms of V_L, this corresponds to:

I_Y = {ν_1, ν_2, ν_3, ν_4}, {ν_1, ν_2, ν_5, ν_6}, {ν_1, ν_2}, {ν_1, ν_2, ν_7, ν_8}, {ν_1, ν_2, ν_5, ν_6, ν_7, ν_8}, {ν_1, ν_2, ν_3, ν_4, ν_7, ν_8}, {ν_1, ν_2, ν_3, ν_4, ν_5, ν_6}, {ν_1, . . . , ν_8}.

This is a more interesting filter than B_Y. We see that the tautology H(p∨¬p) = H, and the contradiction H(p∧¬p) = ∅. Note that exclusion negation accounts for not including H(¬(p∧q)). Further, ∩_{A∈Y} H(A) = {ν_1, ν_2} and ∪_{A∈Y} H(A) = H, and ∩_{A∈Y} H(A) ⊆ H(¬p∨q), that is, {ν_1, ν_2} ⊆ {ν_1, ν_2, ν_5, ν_6, ν_7, ν_8}; thus Y ⊨ p → q [N.B.: p → q = ¬(p∧¬q)]. This shows that L is I-compact and U-compact for a finite subset Y of X. Thus for every X in L, L is compact and has finitary semantic entailment, including finite axiomatizability. (We could generate a similar filter based on B = {H(q), H(r), H(q∧r)}.)
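The filter base just constructed can be verified numerically, assuming the Table IV row order ν_1, . . . , ν_8 (indexed 0–7 here); the encoding is illustrative.

```python
# Sketch of the filter base B_Y = {H(p), H(q), H(p & q)} over the eight
# valuations v1..v8, checking that Y = {p, q, (p & q)} is satisfiable,
# i.e., that its elementary class (the intersection) is nonempty.

from itertools import product

ROWS = list(product([True, False], repeat=3))   # (p, q, r) for v1..v8

def H(fn):
    """Truth set of a sentence given as a function of (p, q, r)."""
    return frozenset(i for i, (p, q, r) in enumerate(ROWS) if fn(p, q, r))

B_Y = [H(lambda p, q, r: p),
       H(lambda p, q, r: q),
       H(lambda p, q, r: p and q)]
elem = frozenset.intersection(*B_Y)
print(sorted(elem))                             # [0, 1]: v1 and v2
print(elem != frozenset())                      # True: Y is satisfiable
```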
The question now looms: is the filter I_Y an ultrafilter? The answer is no. It is itself contained in a filter I over X = {p, q, r} and its family of allowable conjunctions. Adding r to the subset Y, to give the set X of L, results in a filter base B_X:

B_X = {H(p), H(q), H(r), H(p∧q), H(q∧r), H(p∧q∧r)}.

This base generates a filter I.
TABLE IV
Truth Table of L(¬, ∧; R; p, q, r)

      p   q   r   ¬p  ¬q  ¬r  p∧q: R(p,q)  q∧r: R(q,r)  p∧q∧r: R(p,q,r)
ν_1   1   1   1   0   0   0   1            1            1
ν_2   1   1   0   0   0   1   1            0            0
ν_3   1   0   1   0   1   0   0            0            0
ν_4   1   0   0   0   1   1   0            0            0
ν_5   0   1   1   1   0   0   0            1            0
ν_6   0   1   0   1   0   1   0            0            0
ν_7   0   0   1   1   1   0   0            0            0
ν_8   0   0   0   1   1   1   0            0            0
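The R-constrained columns of Table IV can be regenerated mechanically, assuming a conjunction takes value 1 exactly when its combination is allowed by R and every conjunct is true; the encoding is illustrative.

```python
# Regenerating the R-constrained conjunction columns of Table IV.
# Rows follow the table order v1..v8; ALLOWED encodes the relatedness
# predicate R for the permitted combinations.

from itertools import product

ROWS = list(product([1, 0], repeat=3))          # (p, q, r) for v1..v8
ALLOWED = {("p", "q"), ("q", "r"), ("p", "q", "r")}

def col(*names):
    """Column of the truth table for the conjunction of the named atoms."""
    idx = {"p": 0, "q": 1, "r": 2}
    ok = names in ALLOWED                       # R permits this combination
    return [int(ok and all(row[idx[n]] for n in names)) for row in ROWS]

print(col("p", "q"))        # [1, 1, 0, 0, 0, 0, 0, 0]
print(col("q", "r"))        # [1, 0, 0, 0, 1, 0, 0, 0]
print(col("p", "q", "r"))   # [1, 0, 0, 0, 0, 0, 0, 0]
```

A combination absent from `ALLOWED`, such as `col("p", "r")`, yields an all-zero column, mirroring the text's point that such sentences are absurd rather than merely false.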
I = {H(p), H(q), H(r), H(p∧q), H((p∧q)∨(¬p∧¬q)), H(¬(p∧¬q)), H(¬(¬p∧q)), H(¬(¬p∧¬q)), H(q∧r), H((q∧r)∨(¬q∧¬r)), H(¬(q∧¬r)), H(¬(¬q∧r)), H(¬(¬q∧¬r)), H(p∨¬p), H(p∧q∧r), H((p∧q∧r)∨(¬p∧¬q∧¬r)), H(¬(¬p∧(q∧r))), H(¬(¬p∧¬(q∧r))), H(¬(p∧¬(q∧r))), H(¬(p∧(¬q∧r))), H(¬(p∧¬(¬q∧r))), H(¬(p∧(q∧¬r))), H(¬(¬p∧(¬q∧r))), H(¬(¬p∧(q∧¬r))), H(¬(¬p∧(¬q∧¬r))), H(¬((¬p∧q)∧r)), H(¬(¬(p∧q)∧r)), H(¬(¬(p∧q)∧¬r)), H(¬((p∧q)∧¬r)), H(¬((p∧¬q)∧r)), H(¬((p∧¬q)∧¬r))}.

When we represent this in terms of V_L, the filter I shows that the language L is I-convergent to ν_1 and U-convergent to ∪ H(A) = H, and has finitary semantic entailment. Furthermore, the filter I is an ultrafilter, as it contains I_Y (B_Y ⊆ I_Y ⊆ I) and there is no filter that contains I as a proper part. Thus L(I) is a nonclassical propositional language system. The effect of R(A, B) = 0 is to reduce the number of truth sets in I, but it has no effect on the compactness and convergence of L(I). Let us examine finitary entailment with a few examples. Suppose we find (p∧q). Then may we infer (p∧q∧r)? The case then is: (p∧q) → (p∧q∧r)? (p∧q) → (p∧q∧r) = ¬((p∧q) ∧ ¬(p∧q∧r)). The test is made by

∩_{A∈X} H(A) ⊆ H((p∧q) → (p∧q∧r)).
The set X is the filter base of I. Thus, in the case of p, q, and r, ∩_{A∈X} H(A) = {ν_1}, the elementary class of X. The truth set of the entailment is

H(¬((p∧q) ∧ ¬(p∧q∧r))) = H(¬((p∧q) ∧ ¬r)) = H − {ν_2}.

Clearly {ν_1} ⊆ H − {ν_2}. Thus, the test succeeds; therefore (p∧q) → (p∧q∧r) is true in I at {ν_1}. This means that if we find an isolated object of (p∧q), then we may infer that it is possible that (p∧q∧r) is the actual object. Let us suppose that instead of (p∧q) we find (q∧r). Can we infer that (p∧q∧r) is possible? By the same analysis we know that the elementary class is {ν_1} and find the truth set of the entailment to be

H((q∧r) → (p∧q∧r)) = H − {ν_5} ⊇ {ν_1}.

Thus, the entailment succeeds. Finally, for completeness, let us suppose we find q. Then can we infer (p∧q)? Here again, the elementary class of p and q is {ν_1, ν_2}, and the truth set of (q → (p∧q)) is

H(q → (p∧q)) = H − {ν_5, ν_6} ⊇ {ν_1, ν_2}.

Thus, the test succeeds and the inference can be made in I_Y at {ν_1, ν_2}. A similar example can be shown for (p → (p∧q)), and for (q∧r) entailed by q or r, but in the filter generated by {H(q), H(r), H(q∧r)} at {ν_1, ν_5}. Let us consider another case. Suppose we find (¬p∧q). Then may we infer (¬p∧q∧r)? We apply the same procedures as before, that is, X = {p, q, r} and the elementary class is {ν_1}. The truth set of the entailment sentence (¬p∧q) → (¬p∧q∧r) is

H(¬((¬p∧q) ∧ ¬(¬p∧q∧r))) = H(¬(¬p∧q∧¬r)) = H − {ν_6},

and {ν_1} ⊆ H − {ν_6}. Thus, the test shows that the entailment of (¬p∧q) → (¬p∧q∧r) succeeds, and if we find (¬p∧q), we can infer that (¬p∧q∧r) is possible in I at {ν_1}. The complement works as well: (¬q∧r) → (¬p∧q∧r). A similar analysis, however, shows that if we find (¬p∧q) we cannot infer (p∧q∧r). This is because we have asserted (¬p∧q), and replacement is not allowed by this
logic. If, on the other hand, we find p or q or r, we can infer (p∧q∧r) in I; for example, p → (p∧q∧r): {ν_1} ⊆ H − {ν_2, ν_3, ν_4}. The difference between finding (¬p∧q) instead of p, q, r, or (p∧q) distinguishes the possibility of (¬p∧q∧r) from the possibility of (p∧q∧r). There are not many degrees of freedom in this language of only eight valuations in V_L, but we can see enough to understand how this structure will work. We see that p implies q, (p → q), in I_Y at {ν_1, ν_2}, and that p or q implies (p∧q), also at {ν_1, ν_2} in I_Y. In similar fashion, we saw how q or r implies (q∧r) at {ν_1, ν_5}. We have also seen how p, q, or r, or (p∧q) or (q∧r), implies (p∧q∧r) in I at {ν_1}, and finally we have seen how (¬p∧q) or (¬q∧r) implies (¬p∧q∧r) in I at {ν_1}, but excludes the possibility of (p∧q∧r). The notion behind these examples of entailment is that we can infer an object from a sufficient subset of its parts, and we can distinguish some objects from others by identification, or by implication from a sufficient subset of their parts. When we recall that the V_L for L_ε has 2^n points for an alphabet of n characters, and that the connectives ○_i account for overlap and joins on four faces in 2D, then we can begin to appreciate the complexity of the language system required to satisfy delineation of many objects. If we were to substitute (e_1, e_2, . . . , e_n) for (p, q, r, . . . , s), with the ○_i and f_i being truth-functionally complete, then L is L_ε, and we can see that L_ε would satisfy the requirements for L_ε(I) to be a satisfactory language structure for a linguistic transform of features in imagery. This is the principal result: If f_1, f_2, . . . , f_n for connectives ○(e_1, e_2, . . . , e_j), ○_1, ○_2, . . . , ○_n; R_A(e_1, . . . , e_j) in L_ε are truth-functionally complete, then L_ε(I) is a nonclassical propositional language system that is compact, bivalent, and convergent.
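The finitary-entailment test used in these examples reduces to a truth-set inclusion that can be checked directly. A sketch, assuming valuations ν_1, . . . , ν_8 are indexed 0–7 and sentences are encoded as Boolean functions (the names are illustrative).

```python
# Worked sketch of the entailment test: (p & q) entails (p & q & r) at
# the elementary class of X iff the intersection of the truth sets of X
# is contained in H(~((p & q) & ~(p & q & r))). Here '~' and '&' stand
# for the text's negation and joining connectives.

from itertools import product

ROWS = list(product([True, False], repeat=3))   # valuations v1..v8

def H(fn):
    """Truth set of a sentence given as a function of (p, q, r)."""
    return frozenset(i for i, (p, q, r) in enumerate(ROWS) if fn(p, q, r))

# Filter base of I: p, q, r and the allowed conjunctions.
X = [lambda p, q, r: p, lambda p, q, r: q, lambda p, q, r: r,
     lambda p, q, r: p and q, lambda p, q, r: q and r,
     lambda p, q, r: p and q and r]
elem = frozenset.intersection(*map(H, X))
print(elem)                                     # only v1 satisfies all of X

entail = H(lambda p, q, r: not ((p and q) and not (p and q and r)))
print(elem <= entail)                           # True: the test succeeds
```

Swapping in other antecedents, such as `q` alone or `(not p and q)`, reproduces the other inferences worked through above.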
C. Reflections

A student new to this subject might question why the representation I does not include logical connectives of p and r. It is worth taking some time to reflect on this, as the reason it does not is instructive. It is easy to see that a filter base defined on p and r is not satisfiable, because the intersection is a null set, and thus it follows that a subfilter cannot be constructed based on it. The ardent student might suppose we could construct a filter base B′ for Y′ = {¬p, ¬r}:

B′ = {H(¬p), H(¬r), H(¬p∧¬r)}
B′ = {ν_5, ν_6, ν_7, ν_8}, {ν_2, ν_4, ν_6, ν_8}, {ν_6, ν_8}.
The elementary class of Y is , , -, thus Y is satisfiable. Suppose we continue with a filter I , based on B . I : H( p), H( r), H( p r), H(( pr)^( p r)), H( ( p r)), H( ( pr)), H( ( p r), H( p^ p) I : , , , , -, , , , , -, , , -, , , -, , , , , , , , , -, , , , , , , -, , , , , , , -, , , , , , , , , -. We could stop right here because (pr) is not defined in H(( pr)^ ( p r)). Thus the set of sentences would seem to be not unassailable. Nevertheless, let us push on and we will see how the notion falls apart. We can see that > H(A) : , , - and + H(A) : H, and thus Y /Z1 /Z1 is compact. The problem is that the ultrafilter I is I-convergent to rather than I-convergent to , , -. Nevertheless, it would seem that Y is compact and bivalent, therefore it should have finitary entailment:
Y′ ⊨ ¬p → ¬r  iff  ∩H(A) ⊆ H(¬p → ¬r),
{…} ⊆ {…}, and indeed it does; furthermore, X ⊨ ¬p → ¬r by virtue of {…} ⊆ {…}. So how does this happen? The resolution of the paradox comes from realizing that I′ may be a filter, but it is not a subfilter of I: I′ ⊄ I. The sentences involving ¬p and ¬r may appear, in principle, as though they can be included in X subject to finitary entailment under I, even though I does not delineate them. (Recall the theorems defining an ultrafilter: (a) Y ∪ Z ∈ I iff Y ∈ I or Z ∈ I, for all Y, Z ⊆ X; or (b) for every Y ⊆ X, either Y ∈ I or X − Y ∈ I.) Is Y′ ⊆ X a problem? Yes, it is, but the assailability of the subset and the disagreement of I-convergence prohibit I′ from being contained in I. The sentences of finitary entailment involving ¬p and ¬r may seem to be mathematically included in X. However, there is good reason why they are not included in I and should not be included in the language derived from it. The variables of our example language L are defined as the occurrence of a 1D FP; that is, p = 1 means that the p-type FP occurred. A sentence expressing the conjunction of p and q, pq, satisfies R(p, q) and has semantic content in a language whose syntax is founded on the notion that it is possible for pq to occur. The connection between the semantic language expression of pq and the actual-world occurrence of p, q, or pq
THEORY OF RANKED-ORDER FILTERS
325
is the basic problem, and we will address it later in a discussion of the ontology of language. Nevertheless, the relations R of L are based on governing the occurrence, not the nonoccurrence, of conjunctions. The conjunctions involving ¬p and ¬r are for the nonoccurrence of one or both of them; namely, ¬p = 1 or ¬r = 1 is true, or ¬p = 1 and ¬r = 1 is true. The filter base B′ is satisfiable only if R(¬p, ¬r) = 1. Thus B′ is compatible with a language structure based on nonoccurrence, on ¬p, ¬q, ¬r and R(¬p, ¬q), R(¬q, ¬r), R(¬p, ¬r), and so on. For a language based on nonoccurrence to interpret the actual world of p, q, and r, the governing relations must be defined for nonoccurrence, and the resulting language structure would be the complement of I, I-converging to {…}. The filter bases B and B′ are, in a sense, negatives of each other, and mixing them in the same system defined by the occurrence relations R is a bit like mixing apples and oranges, or, more appropriately, apples and nonapples. We must respect the primacy and consistency of the relatedness of the nonclassical logic defining L(I), which we redefine in terms of the e′ to be L′(I).

D. Ontological Considerations

Ontological considerations concern the relationship between a language of objects and the existence of objects in imagery. We must understand that L is an artificial language, and we use natural language as the metalanguage to discuss it. The language L samples the image and detects patterns of data in imagery that satisfy its characters and syntax. There is much more to the relationship than this simple statement, however, and we discover this richness in resolving what may appear to be a contradiction found upon close examination of the sentences permitted by I in L(I). For example, in our pedagogical language L based on p, q, and r, both sentences of finitary entailment, that is, If p, then q: p → q, and If not p, then not q: ¬p → ¬q, are permitted following R(pq) = 1 in I.
Both of these sentences derive from the truth conditions of pq, and therefore of p → q, being true. We use the term entailment here by convention, but it is more accurately material implication, denoted A ⊃ B in ordinary propositional logic for a contingent universe. The truth conditions are that the sentence A ⊃ B is T for all cases of A, B ∈ {T, F} except A = T and B = F. The truth tables for the sentences p → q, exemplified by H(¬(p ∧ ¬q)), and ¬p → ¬q, exemplified by H(¬(¬p ∧ q)), are given in Table V. As can be seen in Table V, the conditions of rows (b) and (c) are contradictory. So what is the relationship between L and the actual world of images? There can be only one actual world, yet the legitimate language permits contradictory expressions of it. The resolution lies in analyzing the terms
TABLE V
Truth Table for p → q and ¬p → ¬q

        p    q    ¬(p ∧ ¬q)    ¬(¬p ∧ q)
   a    T    T        T            T
   b    T    F        F            T
   c    F    T        T            F
   d    F    F        T            T
‘‘expression’’ and ‘‘actual world.’’ We will consider them separately and then relate them together. We will employ the notion of a sentence as a well-formed-formula (wff) and develop the notion of wff in terms of atomic sentences. The reader may think the use of wff is redundant, and some people do, but the notion of wffs permits a delineation of properties without confusion more easily than does the conventional understanding of the term sentence in a metalanguage. We must, however, distinguish between a sentence as a wff, and as a molecular expression that is not in the vocabulary of a language, and further still, we must distinguish between a wff and a valid wff. This brings us into the realm of modal logic addressing necessity and possibility, a realm we will delve into only sparingly in this discourse. If we imagine an image of objects being referenced to time and place, then we consider it as a snapshot of a piece of the actual world at some time. We may think of an image as a representation of some state-of-affairs or relationships linking objects of that place at that time. In this way, we may regard the image as a ‘‘world’’ unto itself. Other images of the same place at different times and from the same, or different, perspective are themselves distinct worlds. Of course images of different places at any time are also distinct worlds in the same sense of representing an event involving places, time and relationships. If we relate an image to our experience of that place at that time, and even further still our experience of those objects and places at other times, then we define a world that has meaning to us in terms of our experience. The point is that an image is an actual world, but there are an indefinitely large number of worlds that are possible involving any given object representing the context of that object in terms of relationships binding it to other objects and features comprising objects. 
Finally, these possible worlds can be related to experience. With our understanding of wff and possible worlds given in the foregoing, we may begin to link the two in a context of necessity and possibility. The reader may recall the definition of a valid wff as a wff that is true in all models of a syntactic system. We have defined a syntactic system PCS as
the triple (A, L, S) consisting of A, the set of atomic sentences; L, the logical operators; and the grammar S defined in L by R. We further defined a model of a syntactic system to be the incorporation of the notion of valuation of a sentence to have semantic content in the sense of being T or F. Earlier, we incorporated semantic content based only on the logical consequences of the syntax to define the truth space H. Now we wish to relate the syntactic system of this artificial language to the actual world, so that the truth space of the language is relevant to the actual world. We relate the syntax of L to the actual world by interpreting a model of the PCS as one having its vocabulary defined on its truth space, but constrained by the existence of its terms in the set of worlds, or images. In the ideal case, the set of all images has denumerably infinite possibilities, and in the real case, it is a class of images having an indefinitely large number of possibilities. We will address the justification for this constraint on the vocabulary later; for now, an image, be it of real or imagined objects, is a possible world that may be an actual world at some instant of time, and the set of all images in some class is the semantic basis of L. A valid wff of L, then, is a wff in L that is true in all such images spanning all time, and an ordinary wff is true in at least one such image. Because a valid wff is true in all worlds (images), we may say that it is necessarily true; that is, it is not possible to find an image where it is not true. For example, a tautology is necessarily true: ¬(p ∧ ¬p) is always true in all images. For an ordinary wff that is true in some, but not all, worlds (images), we may say that it is possibly true; that is, it is possible to find an image where it is true, or where it is false, or perhaps both. (Here we understand that being false means that it does not exist.)
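The modal reading of validity just given can be sketched directly (an illustration of mine, not the chapter's machinery): worlds are the valuations induced by a class of images, a wff is necessary if it holds in every world, and possible if it holds in at least one.

```python
# Each "image" fixes which feature patterns occur: a world is an
# assignment of truth values to the atoms p, q, r. These three
# sample worlds are hypothetical.
worlds = [
    {"p": True,  "q": True,  "r": False},
    {"p": False, "q": True,  "r": True},
    {"p": False, "q": False, "r": False},
]

def necessary(wff):
    """True in every world (image): a valid wff."""
    return all(wff(w) for w in worlds)

def possible(wff):
    """True in at least one world (image): an ordinary wff."""
    return any(wff(w) for w in worlds)

tautology = lambda w: not (w["p"] and not w["p"])   # ¬(p ∧ ¬p)
contingent = lambda w: w["p"]                       # just p

print(necessary(tautology))   # True: holds regardless of the image
print(necessary(contingent))  # False: p fails in some images
print(possible(contingent))   # True: p holds in at least one image
```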
In this sense, a tautology is independent of any actual world and is an artifact of the language. Propositions that are not tautologies may be valid wffs, but generally they are contingent on the actual world. An object of data, or data object, in an image that satisfies an atomic sentence in L is contingent. For example (and hereafter we will use L as a stand-in for L′, with the understanding that p, q, or r can be some choice of the e′ in L′), an object that may be described by p may occur in some image but not in another, and similarly for q and r. Furthermore, objects described by the conjunction (pq) or (qr), meaning, for example, p co-joined with q as described in Section III.D, may occur in some image and not in another. It is possible that p may occur co-joined with q, but it is not necessary. Thus it is possible that p, q, (pq), (p → q), or (¬p → ¬q) are true statements about objects in some image (and similarly for q and r). The structure of L reflects this possibility. The problem comes about from knowing that (N) cannot detect a conjunction described by (pr)
in any image. We cannot discuss the problem of p and r in the same manner as for p and q, or q and r; we cannot even say that it is necessary that (p → r) or (¬p → ¬r) are not possible, because these sentences are logically absurd. The absurdity occurs because they are entailed by p, r, and (pr), but (pr) is not possible. These sentences simply do not exist in the vocabulary of L. The entailment sentences of p and r would permit the possibility of p occurring co-joined with r, which is not possible in the actual world, and coincidentally (pr) is not a possible sentence of L; that is, it is not a wff of L. We cannot even say that ¬(pr) is a necessary property of the actual world, as this would still imply (pr), because (pr) is entailed by it. We cannot say anything about (pr) in L. If this seems severe, then we would observe that this situation can be generalized. The reader may be impatient, complaining that of course (pr) is impossible: you simply cannot put these two patterns together with a single set of data and satisfy the monotonicity requirements with the shared data. The consequences of the nonexistence of (pr), however, apply equally to any other pattern of data that does not satisfy the requirements of simply being an FP, yet does exist, to our senses, in an actual image. The ontological implications of this are interesting. Consider the question of the existence of (p → q), or any equivalent variables, in the language structure. We have shown that the construct of finitary entailment is essential to the logical structure of a language. Furthermore, we postulate from philosophical arguments that finitary entailment is an essential mode of conscious thought, expressing the temporality of the ego to project a possible relationship of the observer with his or her world onto the object-horizon of his or her existence. This suggests that the logical structure of language reflects the essence of the ego, that is, temporality.
The interesting part of the question motivated by L comes about because we have also shown that the construction of finitary entailment is predicated on the detection of existent objects and their possible combinations in the actual world. The full impact of finitary entailment may not be illustrated by the binary combination of two characters, but the logical extension of (∧(e₁, e₂, . . . , eₙ)) → (∧(e₁, e₂, . . . , eₙ)) begins to tell the story. The relationship between the conscious construction of finitary entailment and the actual world is subtle and complex, turning on the notion of the observer being embedded in the actual world as a part of it. The issue is not that some pattern of data does not exist in some possible world, but that for some N, (N) would not detect it and L cannot express it. In a world perceived by (N), an object described by (pr) or any non-FP does not exist; thus there is no basis in the experience of a monolingual observer in the language community of (N) for describing it. This
implies that patterns of data in an image that do not satisfy any wff of a language are not experienced by the user of that language in the perception of that image because they are, in a sense, unintelligible to a monolingual observer in that language community. This seems like a rather strong statement until we remember that we have been developing this line of thought based on an artificial language. The conclusion may be easier to understand if stated in the converse: Any patterns of data that are not represented in the experience of an observer are not expressible by that observer, thus there is no evidence of his or her perception of it shared in that community. The structure of the artificial language is a simulacrum of natural language, so this conclusion serves to give us an appreciation of the richness of natural language. This conclusion and the ontological question bring us to the notion of interpreting an ideational complex of an object into a sentential complex about an object and to understand why this is not a neo-Kantian idea that the existence of objects conforms to our knowledge of them. The remaining discussion is based on the developments in phenomenological research spanning the twentieth century. The line of phenomenological thought began with Husserl (1970) circa 1900 continuing through Heidegger (1982), G. Frege (1952), and L. Wittgenstein (1945) in the first half of the century, and it has attracted the attention of several contemporary thinkers too numerous to cite here, but good references are Smith and McIntyre (1982), and Dreyfus (1995). The reader will find, however, that the argument is reminiscent of Aristotle and Plato. The delineation of objects, the notion of predication, and the terms of intensional properties and intensions are reminiscent of Aristotelian substances, sorts, essences and accidents. 
Indeed, the reader will find much in phenomenology derived from Plato and Aristotle, which may tempt some to comment that we have not really come all that far since Plato. This might be a rather harsh judgment, nevertheless, it is fair to say that the foundation of modern analysis was formed by Greek philosophy culminating in Aristotelian logic. The modern contribution relevant here is the philosophy of mind and language accounting for accessibility of intellect to the actual world, and the formalisms of modern mathematics and logic. Details of the discussion on modeling object recognition given here may be found in Wilburn (1998). The notion of an artificial interpretative transform of a complex of objects is the transform of a complex of data objects to a complex of sentences. These sentences must be compatible with sentences expressing experience relevant to the actual world of the object in order that they can be subject to an abductive inference — a judgment — modeled by logical consistency. This notion is a simulation of object recognition by a natural observer. A natural observer performs an interpretive transform of an identical complex
of objects and experience into a complex of sentences and makes a judgment of the evidence to assert an object in the act of recognition. The ideational complexes are grounded on the actual world and, as such, are the experience of an observer expressible in terms of language. The assertion is the evidence of an observer's understanding of an object in terms of identity and application, and it is understandable within a language community. The ground for the transform of an ideational complex of an object, then, is the observer's experience of being in the world as an undetached part of it. His or her experience in the world is both a disclosure of it to the observer and the observer's discovery of it [Dreyfus (1995)]. The world is disclosed to the observer in ontological transcendence as a being-in-the-world, and he or she discovers the world in ontic transcendence, in coping with things in the world. The evidence of the observer's experience is reflected in his or her language as a mode of understanding what it means for an object to be what it is, and as a mode of coping with it. Finally, the observer's language is a representation of a grammatical logic structure derived from his or her ontological/ontic transcendence, and it necessarily entails a community. The employment of the artificial language L for automated image understanding would be a two-level simulation of the grammatical logic of a natural observer in the context of an actual world of an image. The first level is the simulation of the disclosure of the world modeled by the (L)
filter and represented by the nonclassical PCS modeled by L described here. The second level is the simulation of the discovery of the world modeled by an indexical structure incorporated into a quantificational language system (QCS) not described here. The artificial language is a system L(I), and it includes sentences that are valid wffs, for example, ¬(p ∧ ¬p), and wffs true in possible worlds, but it does not include any sentences describing objects that (L) cannot detect. Objects that (L) cannot detect are ignored. The structure of L(I) into subfilters, I′ ⊆ I, can be employed to delineate nouns as the transform of the e′. This approach to the assignment of nouns means that the system must be "taught" by the user of this artificial intelligence, and it represents the artificial experience of the artificial intelligent system. The nouns, then, constitute the vocabulary of the system. Sentences of the e′ that are not distinguishable by subfilters may be considered as molecular expressions and, as such, are not atomic sentences even though they may be in H of L(I). The approach to an interpretive transform explored here is an effort to address the classical "frame problem" of artificial intelligence. The "frame problem" is the representation of the primordial relation of an observer to an object in the context including both of them in the act of object recognition. The idea of an AI system based on L(I) is tenable because the {e′} of (L) are a finite set of intensional representations of features and
they satisfy a grammar permitting L(I). This means that the representation of the detected object by a machine is in a form compatible with a logical model of an observer recognizing an object to be some thing. Clearly, this kind of AI system would be a very restricted simulation; nevertheless, such a system could prove to be a very useful tool. As time goes by, other methods of feature extraction may be developed based on the MW filters discussed here, or on other filters altogether that satisfy the requirements of a language system, and they may be incorporated into an AI system.
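The fixed-point behavior on which this whole construction rests can be illustrated in a few lines (a toy of my own, using a three-sample window and an assumed hold-the-ends convention, not the (L) filter itself): a locally monotone signal is a root that the median filter leaves unchanged, while an isolated impulse is removed in one pass.

```python
def median3(signal):
    """One pass of a 1D rank-order filter at the median rank with a
    three-sample window; the two end samples are held fixed (one
    common convention for handling the boundary)."""
    out = list(signal)
    for i in range(1, len(signal) - 1):
        out[i] = sorted(signal[i - 1:i + 2])[1]
    return out

# A monotone signal is a fixed point ("root") of the filter...
root = [0, 0, 1, 2, 2, 3]
print(median3(root) == root)   # True

# ...while an isolated impulse is not: one pass removes it.
noisy = [0, 0, 5, 0, 0, 0]
print(median3(noisy))          # [0, 0, 0, 0, 0, 0]
```

Cataloging exactly which signals survive such a pass unchanged is what yields the finite alphabet of fixed-point roots that the language system is built on.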
V. Conclusion

This chapter presented a review of recent work investigating the structure and properties of 1D and 2D ranked-order filters, with particular attention given to the median rank, or median window, filter. We also presented some new developments in coded window, octagonal, and 3D filters, and we explored some applications. The approach that permitted the full understanding of structure and function enabling application was a mathematical logic investigation, in contrast to the customary statistical approach. The filter described as the (L) filter received the most attention because it is the most completely understood variant of 2D ranked-order filters, but we described the structure and root forms of the octagonal and 3D filters and alluded to the hexagonal filter. We also briefly discussed applications of these filters, with the 3D variant in particular showing promise of useful applications to interpreting hyperspectral imagery. The hexagonal and octagonal filters suggest a somewhat grander scheme than previously thought: a hierarchy of dialects differing in complexity, defined on degrees of monotonicity.

The application of the (L) filter to feature extraction was easily anticipated from the formalism, and it was shown to be remarkably effective for some kinds of imagery. It was that utility, and the discovery that the fixed-point roots of (L) satisfy a propositional language, that proved to be the most intriguing. The juxtaposition of utility for feature extraction and a syntactic structure of the elements of features suggested the possibility of an automated interpretive transform of features prerequisite to an AI system for image understanding. The key element of this idea is that (L) enables a machine to "read" an image in terms of a language compatible with a model for abductive inference of what an object is. It is the hope of this investigator that this work will continue with success and encourage the communities of ontology, artificial intelligence, and image processing to learn each other's languages so that they may work together in pursuit of developing visual artificial intelligence.
References

Bovik, A., Huang, T. S., and Munson, D. (1983). A generalization of median filtering using linear combinations of order statistics, IEEE Trans. Acoust. Speech and Signal Process., 31:1342—1350.
Dreyfus, H. (1995). Being-in-the-World: A Commentary on Heidegger's Being and Time, Division I, Cambridge, MA/London: The MIT Press.
Eberly, D., Longbotham, H. G., and Aragon, J. (1991). Complete classification of roots to 1-dimensional median and ranked-order filters, IEEE Trans. Acoust. Speech and Signal Process., 39:197—200.
Epstein, R. L. (1995). The Semantic Foundations of Logic, London: Oxford University Press, Chapter IV.H.
Frege, G. (1892). Über Sinn und Bedeutung, Zeitschrift für Philosophie und philosophische Kritik 100 (1892). (Trans., On Sense and Reference, in Frege, Philosophical Writings, P. Geach and M. Black, eds., London: Oxford-Blackwell, 1952.)
Frieden, B. R. (1998). Probability, Statistical Optics, and Data Testing, Berlin, Heidelberg, New York: Springer-Verlag, p. 257.
Heidegger, M. (1982). The Basic Problems of Phenomenology (Trans., Introduction and Lexicon, Albert Hofstadter, Indiana University Press). Published in German as Die Grundprobleme der Phänomenologie, Vittorio Klostermann, 1975.
Husserl, E. (1970). Logical Investigations, J. N. Findlay, translator, London: Routledge & Kegan Paul. Critical edition: Logische Untersuchungen. Bd. I, Elmar Holenstein, editor, Husserliana XVIII. The Hague: Nijhoff, 1975; Bd. II (in 2 parts), Ursula Panzer, ed., Husserliana XIX. The Hague: Nijhoff, 1984. First edition 1900 (Vol. I), 1901 (Vol. II).
Justusson, B. I. (1982). Median filtering: statistical properties, in Two-Dimensional Signal Processing II: Transforms and Median Filters, T. S. Huang, ed., Berlin: Springer-Verlag, pp. 161—196.
Longbotham, H. G. (1989). Theory of order statistic filters and their relationship to FIR filters, IEEE Trans. Acoust. Speech and Signal Process., 37:275—287.
Puetter, R. C. and Pina, R. K. (0000). Pixon-based image restoration, http://www.stsci.edu./stsci/meetings/irw/proceedings/puetter.dir/puetterr.html.
Shoenfield, J. R. (1967a). Mathematical Logic, Reading, MA: Addison-Wesley, Chapters 6 and 7.
Shoenfield, J. R. (1967b). Mathematical Logic, Reading, MA: Addison-Wesley, pp. 282—292.
Smith, D. W. and McIntyre, R. (1982). Husserl and Intentionality, Dordrecht/Boston: D. Reidel Publishing Company.
Tukey, J. W. (1970). Exploratory Data Analysis, Reading, MA: Addison-Wesley Publishing Co., Chapter 5, pp. 5-11—5-31.
Tyan, S. G. (1982). Median filtering: statistical properties, in Two-Dimensional Signal Processing II: Transforms and Median Filters, T. S. Huang, ed., Berlin: Springer-Verlag, pp. 197—218.
van Fraassen, B. C. (1971a). Formal Semantics and Logic, New York: Macmillan, pp. 31—34.
van Fraassen, B. C. (1971b). Formal Semantics and Logic, New York: Macmillan, p. 36.
van Fraassen, B. C. (1971c). Formal Semantics and Logic, New York: Macmillan, pp. 40—51.
Wilburn, J. B. (1998a). Developments in generalized ranked-order filters, JOSA A, 15:1084—1099.
Wilburn, J. B. (1998b). Exploring a language model of features in imagery, Proc. 4th Army Conf. Appl. Stat. 1998 and JETAI (submitted 10 Nov. 98).
Wilburn, J. B. (1998c). A possible worlds model of object recognition, Synthese, 116(3):403—438.
Wittgenstein, L. (1945). Philosophical Investigations, Trans. G. E. M. Anscombe, Basil Blackwell & Mott, Ltd., 1967.
Index
A Adjacency relations, 101—102 in 3D well-composed sets, 106—109 in 2D well-composed sets, 115 Aharonov-Bohm (AB) effect dynamics of, 69 electromagnetic momentum conservation, 69—70 interaction of passing classical electron with rotating quantum cylinder, 74—81 interior of solenoid, 86—90 objections to standard interpretation, 56—63 quantum and canonical transformation ambiguities and future work on, 90—93 scattering, 58—59 shielding and, 71—74 solution of closed system, 81—86 stability of, 70—71 vector potential, 63—66, 72—74 Aharonov-Casher effect, 79 Ajudgment, 329—330 Anisotropic diffusion filtering, 33—34 Artificial intelligence frame problem, interpretive transforms and automated object recognition defined, 307—308, 310—311 conclusions, 331 consistency, completeness, and compactness, 315—317 convergence, 317, 318—319 exclusion negation, 317 finitary entailment, 314—323 logical connectives, 323—325 ontological considerations, 325—331
propositional syntactic/language system (PCS), 312, 314—323 translation/transform of imagery, 308—314 ultrafilter, 318—319, 324
B Background, 102 Bayesian estimates, 309—310 Bessel functions, 60—61 Bilinear interpolation, 140—141 Binary digital image, 102, 161 making well-composed, 135—139 Biorthogonal wavelets, 23 Black point, 102 Blocking artifacts, 3, 15, 20 Block partitioning, 10, 12 Boundary faces, properties of, 111—112 Box sampling method, 234
C Canonical concepts of emission mechanism, 167—171 Canonical momentum, 73, 78 Canonical transformation ambiguities, 90—93 Centroid linkage algorithm, 40—41, 45 Chain coding, 27 Closed surface, simple, 105, 111—113 digital version, 113—114 Coded window filter, 285—287, 289—292 Coifman’s wavelets, 23 Coincidence condition, 249, 256 Common face, 107 Compatibility, fixed-point roots and computation of, 275—278
Compatibility (Cont.) conditions needed for, 281—282 criteria for, 273—275 example of calculation, 279—281 Cones, 5 Conjugate quadrature filter (CQF), 20 Connection number, 126 Connectivity paradoxes, 114—116 Continuous analog, of a digital set, 98 local properties of, in 3D well-composed sets, 103—106 Continuous representation of real objects, 142—146 Contour coding, 31, 39—40 Contour texture modeling, 40 Contrast resolution, 7 Corner adjacency, 106, 108 Corner points, 103—106 Coulomb gauge, 68—69, 76 Crossing number, 123
D Daubechies' wavelets, 23 Decomposition/transformation, 8—9 Diagonal adjacency, 106 Digital bordered manifolds, 99 Digital characterization, of 3D well-composed sets, 106—109 Digital image, 146 Digital n-arc, 114 Digital sets, 98 Digitization of well-composed images continuous representation of real objects, 142—146 defined, 146 outcome of, 149—153 segmentation and, 146—149 Directional filtering, 27—31 Discrete cosine transform (DCT), 2 coder, 10—15 shape-adaptive, 38—39 Discrete wavelet transform (DWT), 23 Dynamic coding, 4
E Edge adjacency, 106, 108 Edge detection, 26—27 Electromagnetic momentum conservation, 69—70 Electron-optical systems, applications for, 165—166 Electron quasi-lasers (EQLs), 228—231 Embedded zerotree wavelet (EZW), 24 Endpoints, 114, 118—121 Entropy, maximum, 310 EPIC, 24 Euler characteristic, 117 Exclusion negation, 317 Expanding binary images, 137—139 by bilinear interpolation, 140—141 Explosive breakdown, 172—176, 185
F Face adjacency, 106 Fast transform techniques, 13 Feature extraction, 264—268, 304—307 Feature-width morphological pyramid (FMP), 29, 30, 31 Fermi-Dirac distribution function, 168 Feynman propagator, 62 Filter banks, 20 Finitary entailment, 314—323 First-generation image coding, 2 limitations of, 3 Fixed-point roots. See Rank-order (RO) filters Forcing condition, 247—249 Foreground, 102 Fourier tansform, 28 Fovea, 5 Fowler-Nordheim straight line, 167 Fractal model, 37, 44—46 Freeman chain coding, 39—40
G Gaussian Markov random field (GMRF) model, 42, 46 Gaussian pyramid, 17
Gibbs phenomenon of linear filters, 20, 28 Göppert-Mayer transformation, 90—91 Grammar of fixed-point roots, 253—254, 268—282 Graph/tree theory, 37, 43—44 Gray-level images, 161 histogram of checkerboard patterns, 157—159 making well-composed, 140—141 thresholding, 154—157 Grid system, 160
H Heisenberg uncertainty principle, 212 Hexagonal roots, 304 Hilbert space of functions, 60, 61 Histogram of checkerboard patterns, 157—159 H.261 compression standard, 2 Human visual system (HVS), 3, 4—8
I Image features, 3 Intensional properties, 267, 310 International Telecommunications Union (ITU), 2 Inverse gradient filter, 33, 40 Irreducible well-composed sets, 118, 123—126 graph structure of, 126—127
L Landau gauge, 63—64, 65 Langmuir-McCown equation, 214—215 Laplacian pyramid, 17 Layered zero coefficient (LZC), 26 Lorentz force, 66—67, 70 Luminance, 5
M Mathematical morphology, 19, 40 M-band filters, 20 Mean square distortion (MSD), 10 Mean square error (MSE), 3, 10, 14 Medial axis transformation (MAT), 42 Median window filter, 233, 234 See also Rank-order (RO) filters Minimum spanning forest (MSF), 43 M-JPEG standard, 1—2 Modulation transfer function (MTF), 6, 33 Morphological directional coding, 29—30 MPEG-4 standard, 4, 10, 32, 47 MPEG standard, 2, 14 Muller’s projector, 211—213, 215, 220 Multiscale/pyramidal coding, 15—19
N
Jordan-Brouwer separation theorem, 100, 109—110 Jordan curve theorem, 96, 97, 113—114, 117 Jordan n-curve, 114 JPEG standard, 10, 12, 14
n-boundary point, 122 n-dimensional bordered manifold, 99 n-interior/kernel of x, 122 n-irreducible, 118, 123—126 graph structure of, 126—127 Non-stationary thermal field emission. See Thermal field emission, nonstationary Nottingham effect, 187—188 n-thinning, 118
K
O
Karhunen-Loève (KL) transform, 10 K-H transformation, 92 k-means clustering, 36, 42
Octagonal roots, 292—304 Ordering function, 257, 284 Order statistics, 235 Orthogonal wavelets, 23
J
Oscillating roots, 282—283 coded window filter, 285—287, 289— 292 solution of, 284—285 two-dimensional representation, 287— 288 Output selection function, 246
P Parallel thinning, 127—135 Perfect points, 129 Pixels, 96, 126, 160 Pixon approach, 310 Polygonal approximation, 40 Polyhedral surface, 110 without boundary, 110 Polynomial approximation, 38 Potential momentum, 64 Preprocessing techniques, 32—34 Propositional syntactic/language system (PCS), 312, 314—323 Pyramidal coding, 15—19 Pyramidal linking, 36—37
Q Quadrature mirror filter (QMF), 20 Quantificational syntactic/language system (QCS), 314, 330 Quantization/ordering of transform coefficients, 8, 9, 13—14, 17, 24 Quantum theory ambiguities, 90—93
R

Rank-order (RO) filters
  coded window filter, 285–287, 289–292
  commonality of, 234
  median window filter, 233, 234
  statistical analysis of, 235–241
Rank-order (RO) filters, language model based on
  automated object recognition defined, 307–308, 310–311
  conclusions, 331
  consistency, completeness, and compactness, 315–317
  convergence, 317, 318–319
  exclusion negation, 317
  finitary entailment, 314–323
  logical connectives, 323–325
  ontological considerations, 325–331
  propositional syntactic/language system (PCS), 312, 314–323
  translation/transform of imagery, 308–314
  ultrafilter, 318–319, 324
Rank-order (RO) filters, mathematical logic approach to, 241–243
  compatibility, 273–282
  construction, 243–247
  feature extraction, 264–268, 304–307
  fixed-point roots, catalog of, 282
  fixed-point roots, detection of 1D, 249–251
  fixed-point roots, grammar of, 253–254, 268–282
  fixed-point roots, structure of 1D, 251–254
  fixed-point surfaces, characterization of, 271–273
  hexagonal roots, 304
  octagonal roots, 292–304
  oscillating roots, 282–292
  solutions, 247–254
  three dimensional roots, 304–307
  two-dimensional fixed points, 255–268
Recursive shortest spanning tree (RSST) algorithm, 43
Region adjacency graph (RAG), 44
Region growing, 34, 40–41
Regions, 96
Regular partition, 160
Repairing algorithm, 136–137
Repeated median filter (RMF), 234
Retina, 5
Ring effect, 172, 173, 210
Ringing, 20
Rods, 5
S

Sampling function, 244–247, 256–257
Scalability, 4
Scalar quantization (SQ), 24
Schrödinger equation, 74, 86–90, 91
Second-generation image coding
  conclusions, 46–49
  development of, 3–4
Second-generation image coding, segmentation-based
  contour coding, 31, 39–40
  fractal dimension, 37, 44–46
  graph/tree theory, 37, 43–44
  k-means clustering, 36
  overview of, 31–32
  preprocessing, 32–34
  pyramidal linking, 36–37
  region growing, 34, 40–41
  split and merge, 35, 41–43
  texture coding, 31, 38–39
Second-generation image coding, transform-based
  directional filtering, 27–31
  discrete cosine transform (DCT) coder, 10–15
  edge detection, 26–27
  multiscale/pyramidal, 15–19
  optimum coder, 9–10
  overview of, 8–9
  wavelet, 19–26
Segmentation, digitization and, 146–149
Segmentation-based coding
  contour coding, 31, 39–40
  fractal dimension, 37, 44–46
  graph/tree theory, 37, 43–44
  k-means clustering, 36
  overview of, 31–32
  preprocessing, 32–34
  pyramidal linking, 36–37
  region growing, 34, 40–41
  split and merge, 35, 41–43
  texture coding, 31, 38–39
Segmented digital image, 103
Sequential thinning, 117–123
Set partitioning in hierarchical trees (SPIHT), 26
Shape-adaptive DCT (SADCT), 38–39
Shape analysis, 96
Signal-to-noise ratio (SNR), 234, 235–241
Skeleton, 118
Split and merge, 35, 41–43
Spontaneous current growth, 172, 173
Square grid, 146
Square subset digitization, 148
Stevens' Law, 33
Stokes' theorem, 72
Subband coding (SBC), 19–21, 24
Subset/element digitization, 99
Surface elements, 107
Synthetic highs system, 29
T

Texture coding, 31, 38–39
Thermal field emission (TFE)
  broadening of emission-electron energy spectra in high electric fields, 184–185
  current-voltage characteristics for, 169–170
  electron quasi-lasers (EQLs), 228–231
  of electrons from metal surfaces, 167–171
  emission instability, 185
  explosive breakdown, 172–176, 185
  image technique, 178–182
  inadequacy of concepts regarding high electric fields, 171–190
  Nottingham effect, 187–188
Thermal field emission model, nonstationary
  calculating surface concentration of emitter atoms, 198–202
  conclusions, 221–225
  current kinetics, 217–220
  emitter surface-microplasma layer, processes at, 209–217
  heating of emitter tip by emission current flow, 191–195
  instability of emitter tip surface during, 195–198
  ionization probability of emitter atoms after evaporation, 207–209
  motion of emitter atoms after evaporation, 202–207
Thinning algorithms, 97
  parallel, 127–135
  sequential, 117–123
Three dimensional roots, 304–307
3D well-composed sets. See Well-composed sets, 3D
Thresholding, 154–157
  histogram of checkerboard patterns, 157–159
Topology-preserving threshold, 158
Transform-based coding
  directional filtering, 27–31
  discrete cosine transform (DCT) coder, 10–15
  edge detection, 26–27
  multiscale/pyramidal, 15–19
  optimum coder, 9–10
  overview of, 8–9
  wavelet, 19–26
Two-dimensional fixed points, 255–268
2D well-composed sets. See Well-composed sets, 2D
V

Vector potential, 63–66, 72–74
Vector quantization (VQ), 9, 24
W

Wavelet transform-based coding, 19–26
Weber ratio, 7
Weber's Law, 33
Well-composed sets
  adjacency relations, 101–102
  applications, 95–96
  definitions and basic properties, 98–103
  development of, 96–97
  digitization of, 142–153
  generalizations, 159–161
  histogram of checkerboard patterns, 157–159
  thinning algorithms, 97
  thresholding, 154–157
Well-composed sets, 3D
  adjacency relations, 106–109
  boundary faces, properties of, 111–112
  connected components, 112–113
  digital characterization, 106–109
  Jordan-Brouwer separation theorem, 100, 109–110
  local properties of continuous analog, 103–106
Well-composed sets, 2D
  adjacency relations, 115
  definitions and properties, 113–116
  Euler characteristic, 117
  irreducible, 118, 123–126
  irreducible, graph structure of, 126–127
  Jordan curve theorem, 96, 97, 113–114, 117
  making a binary image well-composed, 135–139
  making a gray-level image well-composed, 140–141
  thinning, parallel, 127–135
  thinning, sequential, 117–123
White point, 102
Wiedemann-Franz law, 194
Window function, 245
Writing function, 246