ADVANCES IN IMAGING AND ELECTRON PHYSICS VOLUME 112
EDITOR-IN-CHIEF
PETER W. HAWKES CEMES/Laboratoire d’Optique Electronique du Centre National de la Recherche Scientifique Toulouse, France
ASSOCIATE EDITORS
BENJAMIN KAZAN Xerox Corporation Palo Alto Research Center Palo Alto, California
TOM MULVEY Department of Electronic Engineering and Applied Physics Aston University Birmingham, United Kingdom
Advances in Imaging and Electron Physics

Edited by
PETER W. HAWKES
CEMES/Laboratoire d’Optique Electronique du Centre National de la Recherche Scientifique
Toulouse, France
VOLUME 112
San Diego San Francisco New York Boston London Sydney Tokyo
This book is printed on acid-free paper.

Copyright 2000 by Academic Press

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.

The appearance of the code at the bottom of the first page of a chapter in this book indicates the Publisher’s consent that copies of the chapter may be made for personal or internal use of specific clients. This consent is given on the condition, however, that the copier pay the stated per-copy fee through the Copyright Clearance Center, Inc. (222 Rosewood Drive, Danvers, Massachusetts 01923), for copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Law. This consent does not extend to other kinds of copying, such as copying for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. Copy fees for pre-2000 chapters are as shown on the title pages: if no fee code appears on the title page, the copy fee is the same as for current chapters. 1076-5670/00 $35.00

Explicit permission from Academic Press is not required to reproduce a maximum of two figures or tables from an Academic Press article in another scientific or research publication provided that the material has not been credited to another source and that full credit to the Academic Press article is given.

ACADEMIC PRESS
A Harcourt Science and Technology Company
525 B Street, Suite 1900, San Diego, California 92101-4495, USA
http://www.academicpress.com

Academic Press
Harcourt Place, 52 Jamestown Road, London, NW1 7BY, UK
http://www.hbuk.co.uk/ap/

International Standard Serial Number: 1076-5670
International Standard Book Number: 0-12-014754-8

Printed in the United States of America
00 01 02 03 EB 9 8 7 6 5 4 3 2 1
CONTENTS

Contributors . . . vii
Preface . . . ix
Forthcoming Contributions . . . xi

Second-Generation Image Coding
N. D. Black, R. J. Millar, M. Kunt, M. Reid, and F. Ziliani

I. Introduction . . . 1
II. Introduction to the Human Visual System . . . 4
III. Transform-Based Coding . . . 8
IV. Segmentation-Based Approaches . . . 31
V. Summary and Conclusions . . . 46
References . . . 50

The Aharonov-Bohm Effect — A Second Opinion
Walter C. Henneberger

I. Introduction . . . 56
II. The Vector Potential . . . 63
III. Dynamics of the Aharonov-Bohm Effect . . . 66
IV. Momentum Conservation in the Aharonov-Bohm Effect . . . 69
V. Stability of the AB Effect . . . 70
VI. The AB Effect Can Not Be Shielded . . . 71
VII. Interaction of a Passing Classical Electron with a Rotating Quantum Cylinder . . . 74
VIII. Solution of the Entire Problem of the Closed System . . . 81
IX. The Interior of the Solenoid . . . 86
X. Ambiguity in Quantum Theory, Canonical Transformations, and a Program for Future Work . . . 90
References . . . 93

Well-Composed Sets
Longin Jan Latecki

I. Introduction . . . 95
II. Definition and Basic Properties of Well-Composed Sets . . . 98
III. 3D Well-Composed Sets . . . 103
IV. 2D Well-Composed Sets . . . 113
V. Digitization and Well-Composed Images . . . 142
VI. Application: An Optimal Threshold . . . 154
VII. Generalizations . . . 159
References . . . 161

Non-Stationary Thermal Field Emission
V. E. Ptitsin

I. Introduction . . . 165
II. Electron Emission and Thermal Field Processes Activated by High Electric Fields Acting on Metal Surfaces . . . 167
III. Phenomenological Model of Non-Stationary Thermal Field Emission . . . 191
IV. Discussion and Conclusion . . . 221
Acknowledgments . . . 225
References . . . 225
Appendix . . . 228

Theory of Ranked-Order Filters with Applications to Feature Extraction and Interpretive Transforms
Bart Wilburn

I. Introduction . . . 233
II. Statistical Approach to Ranked-Order Filters . . . 235
III. Mathematical Logic Approach to Ranked-Order Filters . . . 241
IV. A Language Model Based on Ranked-Order Filters . . . 307
V. Conclusions . . . 331
References . . . 332

Index . . . 333
CONTRIBUTORS

Numbers in parentheses indicate the pages on which the authors’ contributions begin.

N. D. Black (1), Information & Software Engineering, University of Ulster, Northern Ireland
Walter C. Henneberger (56), Department of Physics, Southern Illinois University, Carbondale, IL 62901-4401
M. Kunt (1), Swiss Federal Institute of Technology, Lausanne, Switzerland
Longin Jan Latecki (95), Department of Applied Mathematics, University of Hamburg, Bundesstr. 55, 20146 Hamburg
R. J. Millar (1), Information & Software Engineering, University of Ulster, Northern Ireland
V. E. Ptitsin (165), Institute for Analytical Instrumentation RAS, Rizhskij Prospekt 26, 198103, St. Petersburg, Russia
M. Reid (1), Kainos Software Ltd., Belfast, Northern Ireland
Bart Wilburn (233), University of Arizona, Optical Sciences Center, Tucson, Arizona
F. Ziliani (1), Swiss Federal Institute of Technology, Lausanne, Switzerland
PREFACE

The transmission of digital images is by now commonplace, although most of us have encountered innumerable obstacles and difficulties in practice. For transmission, images need to be compressed, and a family of techniques for doing this efficiently has been developed. Sophisticated though these are, the transmitted image may be found imperfect, particularly if it represents everyday objects with which the eye is familiar. For such reasons, a new generation of image coding techniques is being developed, which satisfy to a greater extent the expectations of the visual system. It is these "second-generation" coding methods that form the subject of the first chapter in this volume, by N. D. Black and R. J. Millar of the University of Ulster, F. Ziliani and M. Kunt (who introduced these second-generation approaches) of the EPFL in Lausanne, and M. Reid of Kainos Software Ltd.

The Aharonov-Bohm effect, discovered in a semi-classical form by W. Ehrenberg and R. E. Siday nearly a decade before the seminal paper of Y. Aharonov and D. Bohm, has a huge literature and has been at the heart of innumerable disputes and polemics. The existence of the effect is no longer in doubt, thanks to the conclusive experiments of A. Tonomura, but there is still argument about the correct way of analyzing it. The difficulty concerns the scattering treatment of the phenomenon, and it is here that W. C. Henneberger, who has written numerous thought-provoking papers on the subject, departs from the widely accepted canon. I have no doubt that the argument will continue, but I am delighted to include this carefully reasoned alternative opinion in these pages.

The third contribution is concerned with one of the theoretical problems of analyzing digital images that continues to be a source of nuisance, if nothing worse. It is well known that, in order to avoid paradoxes, it is necessary to use different adjacency relations in different areas of images, which is obviously inconvenient and intellectually unsatisfying. L. J. Latecki has introduced the idea of well-composed sets into binary image studies precisely in order to prevent such paradoxes from arising, and this very readable account of his ideas will, I am sure, be found most helpful.

The quest for ever brighter electron sources, notably for electron lithography, is in a lively phase, and the fourth chapter describes an unusual approach, non-stationary field emission, that is currently under investigation. In addition to the intrinsic interest and scientific relevance of the subject, this chapter has the additional merit of making better known the Russian work in this area; despite the fact that the principal Russian serials are available in English translation, their contents are often less well known than they might be. V. E. Ptitsin first describes the physical processes that occur when high electric fields are applied to metal surfaces, then presents in detail a phenomenological model of the non-stationary effects that are at the origin of the desirable emissive properties of the associated sources.

We conclude with an extended discussion by B. Wilburn of the theory of ranked-order filters and of their applications for feature extraction and even for artificial intelligence. These filters, of which the median filter is the best known, remained for many years somewhat mysterious: their attractive features were known experimentally, but the underlying theory remained obscure. Now, however, the reasons for their performance are better understood and formal analyses of their behavior have been made. The fascinating relation between them and the constructs of mathematical morphology is likewise now understood. B. Wilburn not only presents the theory, both statistical and logical, very fully and clearly, but also includes some new findings which have not yet been published elsewhere. I am particularly pleased that he agreed to include this material in these Advances.

I thank all our contributors, in particular for their efforts to ensure that their contributions are accessible to readers who are not specialists in the same area, and I present below a list of material to appear in future volumes.

Peter Hawkes
FORTHCOMING CONTRIBUTIONS

D. Antzoulatos: Use of the hypermatrix
N. Bonnet (vol. 114): Artificial intelligence and pattern recognition in microscope image processing
G. Borgefors: Distance transforms
A. van den Bos and A. Dekker: Resolution
P. G. Casazza (vol. 115): Frames
J. A. Dayton: Microwave tubes in space
E. R. Dougherty and Y. Chen: Granulometries
J. M. H. Du Buf: Gabor filters and texture analysis
G. Evangelista: Dyadic warped wavelets
R. G. Forbes: Liquid metal ion sources
E. Förster and F. N. Chukhovsky: X-ray optics
A. Fox: The critical-voltage effect
M. I. Herrera: The development of electron microscopy in Spain
K. Ishizuka: Contrast transfer and crystal images
C. Jeffries: Conservation laws in electromagnetics
I. P. Jones: ALCHEMI
M. Jourlin and J.-C. Pinoli (vol. 115): Logarithmic image processing
E. Kasper: Numerical methods in particle optics
A. Khursheed (vol. 115): Scanning electron microscope design
G. Kögel: Positron microscopy
W. Krakow: Sideband imaging
D. J. J. van de Laak-Tijssen and T. Mulvey (vol. 115): Memoir of J. B. Le Poole
C. Mattiussi (vol. 113): The finite volume, finite element and finite difference methods
J. C. McGowan: Magnetic transfer imaging
S. Mikoshiba and F. L. Curzon: Plasma displays
S. A. Nepijko, N. N. Sedov and G. Schönhense (vol. 113): Photoemission microscopy of magnetic materials
P. D. Nellist and S. J. Pennycook (vol. 113): Z-contrast in the STEM and its applications
K. A. Nugent, A. Barty and D. Paganin: Non-interferometric propagation-based techniques
E. Oesterschulze: Scanning tunnelling microscopy
M. A. O’Keefe: Electron image simulation
J. C. Paredes and G. R. Arce: Stack filtering and smoothing
C. Passow: Geometric methods of treating energy transport phenomena
E. Petajan: HDTV
F. A. Ponce: Nitride semiconductors for high-brightness blue and green light emission
J. W. Rabalais: Scattering and recoil imaging and spectrometry
H. Rauch: The wave-particle dualism
G. Schmahl: X-ray microscopy
J. P. F. Sellschop: Accelerator mass spectroscopy
S. Shirai: CRT gun design methods
T. Soma: Focus-deflection systems and their applications
I. Talmon: Study of complex fluids by transmission electron microscopy
I. R. Terol-Villalobos: Morphological image enhancement and segmentation
R. Tolimieri, M. An and A. Brodzik: Hyperspectral imaging
A. Tonazzini and L. Bedini: Image restoration
J. Toulouse: New developments in ferroelectrics
T. Tsutsui and Z. Dechun: Organic electroluminescence, materials and devices
Y. Uchikawa: Electron gun optics
D. van Dyck: Very high resolution electron microscopy
L. Vincent: Morphology on graphs
N. White (vol. 113): Multi-photon microscopy
C. D. Wright and E. W. Hill: Magnetic force microscopy
T. Yang (vol. 114): Cellular Neural Networks
Second-Generation Image Coding N. D. BLACK, R. J. MILLAR, M. KUNT, M. REID, and F. ZILIANI Information & Software Engineering, University of Ulster, Northern Ireland Computing & Mathematical Sciences, University of Ulster, Northern Ireland Swiss Federal Institute of Technology, Lausanne, Switzerland Kainos Software Ltd, Belfast, Northern Ireland
I. Introduction . . . 1
II. Introduction to the Human Visual System . . . 4
III. Transform-Based Coding . . . 8
   A. Overview . . . 8
   B. The Optimum Transform Coder . . . 9
   C. Discrete Cosine Transform Coder . . . 10
   D. Multiscale/Pyramidal Approaches . . . 15
   E. Wavelet-Based Approach . . . 19
   F. Edge Detection . . . 26
   G. Directional Filtering . . . 27
IV. Segmentation-Based Approaches . . . 31
   A. Overview . . . 31
   B. Preprocessing . . . 32
   C. Segmentation Techniques: Brief Overview . . . 34
   D. Texture Coding . . . 38
   E. Contours Coding . . . 39
   F. Region-Growing Techniques . . . 40
   G. Split-and-Merge-Based Techniques . . . 41
   H. Tree/Graph-Based Techniques . . . 43
   I. Fractal-Based Techniques . . . 44
V. Summary and Conclusions . . . 46
References . . . 50
I. Introduction

The thirst for digital signal compression has grown over the last few decades, largely as a result of the demand for consumer products, such as digital TV, commercial tools, such as visual inspection systems and video conferencing, as well as for medical applications. As a result, a number of "standards" have emerged that are in widespread use today, and which exploit some aspect of the particular image they are used on to achieve reasonable compression rates. One such standard is M-JPEG, which was originally developed for the compression of video images. It does this by treating each image as a separate still picture. It works by taking "blocks" of picture elements and processing them using a mathematical technique known as the discrete cosine transform (DCT), resulting in a set of digital data representing particular aspects of the original image. These data are then subject to "lossless" compression to further reduce their size before transmission. The technique is very effective but, as one might expect, affects the resulting image quality to a certain degree. At high data rates, for example, the process has the effect of enhancing picture contrast, whereas at low data rates it introduces "blocking" effects, which degrade the picture quality.

Successive compression techniques often build upon previous designs, as is the case with the MPEG standard, which encodes images rather like the M-JPEG standard but transmits information on the differences between successive image frames. In this way improved compression ratios can sometimes be achieved. The gain in compression is often at the expense of some other feature, such as quality. The MPEG standard, for example, offers higher compression than M-JPEG but produces a recovered picture that is not only less sharp but also subject to significant delays. The International Telecommunications Union (ITU) has defined a number of standards relating to digital video compression, all of which use the H.261 compression standard. This technique is specifically designed for low-bandwidth channels and, as a result, does not produce images that could be considered of TV quality. Currently, the best compression techniques can produce about a 20:1 compression if the picture quality is not to be compromised. These "standards" are all based upon the so-called "first-generation" coding techniques.
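The block-DCT step described above can be sketched in a few lines of Python. This is an illustrative sketch only (the function names are ours, not part of any standard): it builds an orthonormal DCT-II basis with NumPy and applies it to a single 8 x 8 block, the block size used in DCT-based schemes such as M-JPEG.

```python
import numpy as np

N = 8  # typical block size in DCT-based coders

def dct_basis(n=N):
    """Orthonormal DCT-II basis matrix C, so that C @ C.T is the identity."""
    k = np.arange(n).reshape(-1, 1)          # frequency index (rows)
    m = np.arange(n).reshape(1, -1)          # sample index (columns)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)               # the DC row has a different scale
    return C

def block_dct(block):
    """2D DCT of one square block of pixels: coefficients = C B C^T."""
    C = dct_basis(block.shape[0])
    return C @ block @ C.T

def block_idct(coeffs):
    """Inverse 2D DCT: B = C^T coeffs C (valid because C is orthonormal)."""
    C = dct_basis(coeffs.shape[0])
    return C.T @ coeffs @ C
```

For a smooth block, most of the signal energy concentrates in the low-frequency (top-left) coefficients; it is this energy compaction that makes the subsequent lossless entropy coding effective.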
All exploit temporal correlation through block-based motion estimation and compensation techniques, while applying frequency transformation techniques (mainly the discrete cosine transform, DCT) to reduce spatial redundancy. There is a high degree of sophistication in these techniques, and a number of optimization procedures have been introduced that further improve their performance. However, the limits of these approaches have been reached, and further optimizations are unlikely to result in drastically improved performance.

First-generation coding schemes are based on classical information theory (Huffman, 1952; Golomb, 1966; Welch, 1977) and are designed to reduce the statistical redundancies present in the image data. These schemes exploit spatial and temporal redundancies in the video sequence at the level of a pixel or of a fixed-size block of pixels. The various schemes attempt to achieve the least possible coding rate for a given image distortion, and/or to minimize the distortion for a given bit rate. The compression ratios obtained with first-generation lossless techniques are moderate, at around 2:1. With lossy techniques a higher ratio (greater than 30:1) can be achieved, but at the expense of image quality.

The distortion introduced by the coding scheme is generally measured in terms of the mean square error (MSE) between the original image and its reconstructed version. Although MSE is a simple measure of distortion that is easy to compute, it is limited in characterizing the perceptual level of degradation of an image. New image quality measures are necessary for second-generation image coding techniques. These will be introduced and discussed later in this chapter.

Second-generation image coding was first introduced by Kunt et al. (1985). The work stimulated new research aimed at further improvement in compression ratios compared with those produced using existing coding strategies, whose performance has now reached saturation level. The main limitation of first-generation schemes compared with the second-generation approach is that first-generation schemes do not directly take into account characteristics of the human visual system (HVS) and hence the way in which images are perceived by humans. In particular, first-generation coding schemes ignore the semantic content of the image, simply partitioning each frame into artificial blocks. It is this that is responsible for the generation of the strong visible degradation referred to as blocking artifacts, because a block can cover spatially/temporally nonhomogeneous data belonging to different entities (objects) in the scene. Block partitioning also results in a reduced exploitation of spatial and temporal redundancies. In contrast, instead of limiting the image coding to a rigid block-based grid, second-generation approaches attempt to break down the original image into visually meaningful subcomponents. These subcomponents may be defined as image features and include edges, contours, and textures.
These are known to represent the most relevant information enabling the HVS to interpret the scene content (Cornsweet, 1970; Jain, 1989; Rosenfeld and Kak, 1982) and need to be preserved as much as possible to guarantee good perceptual quality of the compressed images. Second-generation coding techniques minimize the loss in terms of human perception so that, when decoded, the reconstructed image does not appear to be different from the original. For second-generation schemes, therefore, MSE is not sufficient as a measure of quality, and a new criterion is required to correctly estimate the distortion introduced.

As an alternative to edges, contours, and textures, the scene may be represented as a set of homogeneous regions or objects. This representation offers several advantages. First, each object is likely to present a high spatial and temporal correlation, improving the efficiency of the compression schemes much beyond the limits imposed by a block-based representation. Second, a description of the scene in terms of objects gives access to a variety of functions. For example, it is possible to assign a priority to each object and to distribute the available bit rate for a single frame accordingly. This functionality, referred to as scalability, enhances the quality of the objects of interest compared with those regions of less importance in the scene. Similarly, it is possible to apply to each object, according to its properties, the corresponding optimum coding strategy. This concept, referred to as dynamic coding (Ebrahimi et al., 1995), may further optimize the overall performance of the coding scheme. As suggested in Kunt (1998), future multimedia systems will strongly exploit all of these new functions; indeed, some have already been introduced in the new video coding standard MPEG-4 (Ebrahimi, 1997).

This chapter is organized into four main sections. In Section II a brief introduction to the human visual system is given, in which characteristics that may be exploited in compression systems are discussed. Throughout the text, and where appropriate, additional and more specific material is referenced. Sections III and IV present the main body of the chapter and cover transform-based techniques and segmentation-based techniques, respectively. Finally, we offer a summary and conclusions in Section V.
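The MSE distortion measure discussed above, together with the peak signal-to-noise ratio (PSNR) usually quoted alongside it, can be sketched as follows. The helper names are ours, and 8-bit images (peak value 255) are assumed:

```python
import numpy as np

def mse(original, reconstructed):
    """Mean square error between an image and its reconstruction."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    return np.mean(diff ** 2)

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means less distortion."""
    err = mse(original, reconstructed)
    if err == 0:
        return np.inf          # identical images
    return 10.0 * np.log10(peak ** 2 / err)
```

As the text points out, two reconstructions with the same MSE can look very different perceptually, which is precisely why second-generation schemes need quality criteria beyond this one.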
II. Introduction to the Human Visual System

We consider essentially two techniques for the coding of image information: transformation and segmentation. These techniques are essentially signal processing strategies, much of which can be designed to exploit aspects of the human visual system (HVS) in order to gain coding efficiency. Part of the process of imaging involves extracting certain elements or features from an image and presenting them to an observer in a way that matches their perceptual abilities or characteristics. In the case of a human observer, there are a number of sensitivities, such as those to amplitude, spatial frequency, and image content, that can be exploited as a means of improving the efficiency of image compression. In this section, we shall introduce the reader to the HVS by identifying some of its basic features, which can be exploited fruitfully in coding strategies. We shall consider quantitatively those aspects of the HVS that are generally important in the imaging process. More explicit information on the HVS, as it relates to specific algorithms described in the text, is referenced throughout.

The HVS is part of the nervous system and, as such, presents a complex array of highly specialized organs and biological structures. Like all other biological organs, the eye, which forms the input to the HVS, consists of highly specialized cells that react with specific yet limited functionality to input stimuli in the form of light. The quantitative measure of light power is luminance, measured in units of candela per meter squared (cd/m²). The luminance of a sheet of white paper reflecting bright sunlight is about 30,000 cd/m² and that in dim moonlight is around 0.03 cd/m²; a comfortable reading level is a page that radiates about 30 cd/m². As can be seen from these examples, the dynamic range of the HVS, defined as the range between luminance values so low as to make the image just discernible and those at which an increase in luminance makes no difference to the perception, is very large and is, in fact, in the order of 80 dB.

From a physical perspective, light enters the eye through the pupil, which varies in diameter between 2 and 9 mm, and is focused onto the imaging retina by the lens. Imperfections in the lens can be modeled by a two-dimensional (2D) lowpass filter, while the pupil can be modeled as a lowpass filter whose cut-off frequency decreases as the pupil enlarges. The retina contains the neurosensory cells that transform incoming light into neural impulses, which are then transmitted to the brain, where image perception occurs. It has two types of cells, cones and the slightly more sensitive rods. Both are responsible for converting the incoming light into electrical signals while compressing its dynamic range. The compression follows a nonlinear law of the form B = L^α, where B represents brightness, L represents luminance, and α is a compressive exponent.

Both the sensitivity and the resolution characteristics of the HVS are largely determined by the retina. At its center is an area known as the fovea, which consists mainly of cones separated by distances small enough to facilitate grating resolutions of up to 50 cycles/degree of subtended angle. The spatial contrast sensitivity of the retina depends on spatial frequency and varies according to luminance level. Figure 1, derived from Pearson, shows the results of this sensitivity at two luminance levels, 0.05 cd/m² and 500 cd/m².

The existence of the peaks in Figure 1 illustrates the important ability of the HVS to identify sharp boundaries, and also its limitations in identifying gradually changing boundaries. The practical effect of Figure 1 is that the HVS is very adept at identifying distinct changes in boundary, grayscale, or color, but important detail can easily be missed if the changes are more gradual. The rods and cones are interconnected in complex arrangements, and this leads to a number of perceptual characteristics, such as lateral inhibition: cells that are activated as a result of stimulation can be inhibited from firing by other activated cells in close proximity. The effect of this is to produce an essentially high-pass response, which is limited to below a radial spatial frequency of approximately 10 cycles/degree of solid angle, beyond which integration takes place. The combined result of lateral inhibition and the previously described processes makes this part of the HVS behave as a linear model with a bandpass frequency response.
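The compressive power law B = L^α mentioned above can be illustrated numerically. The exponent used here is an assumption (a cube-root law, α = 1/3, is a common textbook approximation); the point is only that a very large luminance ratio maps to a much smaller brightness ratio:

```python
ALPHA = 1.0 / 3.0   # assumed exponent; the text gives only the form B = L^alpha

def brightness(luminance, alpha=ALPHA):
    """Perceived brightness under a simple power-law model B = L**alpha."""
    return luminance ** alpha

# Luminance values quoted in the text: dim moonlight vs bright sunlight.
dim, bright = 0.03, 30000.0
# The luminance ratio is 10^6, but the brightness ratio is only
# (10^6)^(1/3) = 100: the retina compresses the dynamic range.
```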
Figure 1. Typical contrast sensitivity of the eye for sine-wave gratings. Evidently the perception of fine detail is dependent on luminance level; this has practical implications, for example on the choice between positive and negative modulation for the display of particular types of image.
The majority of the early work on vision research used the frequency sensitivity of the HVS as described by the modulation transfer function (MTF). This characterizes the degree to which the system or process can image information at any spatial frequency. The MTF is defined, for any stage of an imaging system, as the ratio of the amplitudes of the imaged and the original set of spatial sine-waves representing the object being imaged, plotted as a function of sine-wave frequency. Experiments by Mannos and Sakrison (1974) and Cornsweet (1970) led to a now commonly used model for this function, which relates the sensitivity of the eye to sine-wave gratings at various frequencies. Several authors have made use of these properties (Jangard Rajola, 1990, 1991; Civinlar et al., 1986) in preprocessing strategies, particularly when employing segmentation, where images are preprocessed to take account of the HVS’s greater sensitivity to gradient changes in intensity and threshold boundaries (Civinlar et al., 1986; Marqués et al., 1991).
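The Mannos and Sakrison (1974) model referred to above is usually quoted in the form A(f) = 2.6 (0.0192 + 0.114 f) exp(-(0.114 f)^1.1), with f in cycles/degree. A sketch follows; the coefficient values are the commonly cited ones, reproduced from memory rather than from this chapter, so treat them as indicative:

```python
import numpy as np

def csf(f):
    """Mannos-Sakrison contrast sensitivity model; f in cycles/degree."""
    return 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f) ** 1.1)

# The model is bandpass: sensitivity peaks at a moderate spatial
# frequency (around 8 cycles/degree) and falls off at both ends.
freqs = np.linspace(0.1, 60.0, 600)
peak_freq = freqs[np.argmax(csf(freqs))]
```

A coder can exploit this curve directly, quantizing coefficients at frequencies where the eye is insensitive more coarsely than those near the peak.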
Figure 2. Dependence of the threshold contrast ΔL/L (the Weber ratio) on the size of an observed circular disc object, for two levels of background luminance, with zero noise and a 6 s viewing time. The effect of added noise and/or a shorter viewing time will generally be to increase the threshold contrast relative to the levels indicated here (after Blackwell (1996)).
A considerable amount of research has been carried out to determine the eye’s capability in contrast resolution (the interested reader is referred to Haber and Hershenson, 1973, for detailed information). The contrast resolution threshold is given as ΔL/L, where L is the luminance level of a given image and ΔL is the difference in luminance level that is just noticeable to an observer. This ratio, known as the Weber ratio, is a function of the light falling on the retina and can vary considerably. Under ideal conditions, plots of the Weber ratio indicate that the eye is remarkably efficient at resolving small differences in grayscale level. The response of the HVS is a function of both spatial and temporal frequency, as shown in Fig. 2. Measurements by Blackwell (1996) on the joint spatiotemporal sensitivity indicate that this joint sensitivity is not separable. As shown in Figure 2, distinct peaks in the individual bandpass characteristics appear in both cases.

In the following sections, algorithms derived to facilitate the coding of images will exploit aspects of the HVS with the intention of improving coding efficiency. For lossless transformations the gain does not always take the form of significant compression, but tasks later in the coding sequence, such as quantization and ordering, are made easier if the properties of the HVS are properly exploited.
III. Transform-Based Coding

A. Overview

This section introduces and describes some of the most frequently used coding techniques based on image transformation. Such a scheme basically consists of two successive steps: image decomposition/transformation, and quantization/ordering of the transform coefficients. The general structure of all of these techniques is summarized in Figure 3.

Figure 3. A generic transform coding scheme. The image is first transformed according to the chosen decomposition/transformation function. Then a quantization step, optionally followed by a reordering, provides a series of significant coefficients that are converted into a bit-stream after a bit assignment step.

The basic idea exploited by these techniques is to find a more compact representation of the image content. This is initially achieved by applying a decomposition/transformation step; different decompositions/transformations can be applied. Most transformation techniques (discrete cosine transform, pyramidal decomposition, wavelet decomposition, etc.) distinguish low-frequency contributions from high-frequency contributions. This is a first approximation of what happens in the HVS, as described in Section II. Transformations that are more accurate from an HVS-model point of view are applied in the directional-filtering-based techniques (see Section III.G). Here frequency responses in some preferred spatial directions are used to describe the image content. Generally all of these transformations are lossless; thus they do not by themselves achieve a significant compression of the image. However, the resultant transformed image has the property of highlighting the features that are significant in the HVS model, easing the task of quantizing and ordering the obtained coefficients according to their visual importance. The real compression is obtained in the quantization/ordering step and the following entropy-based coding. Here the continuous transform coefficients are first projected onto a finite set of symbols, each representing a good approximation of the coefficient values. Several quantization methods are possible, from the simplest uniform quantization to the more complex vector quantization (VQ). A reordering of nonzero coefficients is generally performed after the quantization step. This is done to better exploit the statistical occurrence of nonzero coefficients and so improve the performance of the subsequent entropy coding. In a second generation coding framework, this ordering step is also responsible for deciding which coefficients are really significant and which can be discarded with minimum visual distortion. The criteria used to make this choice are based on the HVS model properties, and they balance the compromise between the quality of the final image and the compression ratio. The next section will review some of the most popular coding techniques that belong to this class, identifying the properties that make them second generation coding techniques.
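The two-stage structure just described (transform, then quantize/reorder, then entropy-code) can be sketched end to end. The uniform quantizer and the toy run-length stage below are illustrative stand-ins of our own (the names `quantize`, `run_length`, etc. are hypothetical), not any particular standard's coder:

```python
def quantize(coeffs, step):
    # Uniform quantization: map each continuous coefficient to an
    # integer symbol by dividing by the step size and rounding.
    return [round(c / step) for c in coeffs]

def dequantize(symbols, step):
    # Decoder side: each symbol approximates its original coefficient.
    return [s * step for s in symbols]

def run_length(symbols):
    # Toy "entropy" stage: collapse runs of zeros, which the reordering
    # step is meant to concentrate at the end of the scan.
    out, zeros = [], 0
    for s in symbols:
        if s == 0:
            zeros += 1
        else:
            if zeros:
                out.append(("Z", zeros))
                zeros = 0
            out.append(("V", s))
    if zeros:
        out.append(("Z", zeros))
    return out
```

The better the transform and reordering concentrate energy in a few coefficients, the longer the zero runs and the shorter the final bit-stream.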
First, a brief introduction to general distortion criteria is presented in Section III.B in order to define the optimum transform coder. Then the discrete cosine transform is discussed in Section III.C. Multiscale and pyramidal approaches are introduced in Section III.D, and wavelet-based approaches are discussed in Section III.E. Finally, techniques that make extensive use of edge information are reviewed in the last two sections, III.F and III.G.
B. The Optimum Transform Coder

In a transform-based coding scheme, the first step consists in transforming pixel values to exploit redundancies and to improve the compression rates
of successive entropy encoding. Once the optimality criterion is defined, it is possible to find an optimum transform for that particular criterion. In the framework of image coding, the most commonly used optimality criterion is defined in terms of mean square distortion (MSD), also referred to as mean square error (MSE), between the reconstructed and original images. For such a criterion, it has been shown that an optimum transform exists (Schalkoff, 1989; Burt and Adelson, 1983): the Karhunen-Loève (KL) transform (Karhunen, 1947; Loève, 1948). The KL transform depends on critical factors such as the second-order statistics as well as the size of the image. Due to these dependencies, the basis vectors are not known analytically and their definition requires heavy computation. As a result, the practical use of the KL transform in image coding applications is very limited. Although Jain (1976) proposed a fast algorithm to compute the KL transform, his method is limited to a specific class of image models and thus is not suitable for a general coding system. Fortunately, a good approximation of the KL transform that does not suffer from complexity problems exists: the discrete cosine transform presented in Section III.C. It is important to note that, from an HVS point of view, the MSE criterion is not necessarily optimal. Other methods have been considered and are currently under investigation for measuring the visual distortion introduced by a coding system (van den Branden Lambrecht, 1996; Winkler, 1998; Miyahara et al., 1992). They take into account the properties of the HVS in order to define a visual distance between the original and the coded image and thus to assess the image quality of that particular compression (Mannos and Sakrison, 1974). These investigations have already shown improvements in standard coding systems (Westen et al., 1946; Osberger et al., 1946).
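Under the MSE criterion, the KL basis is given by the eigenvectors of the data covariance matrix, sorted by decreasing variance. A minimal numpy sketch (the synthetic `blocks` data and the function name are our own, for illustration only):

```python
import numpy as np

def kl_basis(blocks):
    """Karhunen-Loeve basis for a set of signal blocks (one per row):
    eigenvectors of the sample covariance, sorted by decreasing
    eigenvalue (variance along each basis vector)."""
    X = np.asarray(blocks, dtype=float)
    X = X - X.mean(axis=0)                 # second-order statistics only
    cov = X.T @ X / len(X)
    eigvals, eigvecs = np.linalg.eigh(cov)  # ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Highly correlated data: almost all variance falls on the first
# basis vector, which is what makes the KL transform optimal for MSE.
rng = np.random.default_rng(0)
t = rng.normal(size=(500, 1))
blocks = t @ np.ones((1, 4)) + 0.01 * rng.normal(size=(500, 4))
vals, basis = kl_basis(blocks)
```

Note that the basis must be recomputed for every image (or image class), which is exactly the computational burden the text describes.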
Future research in this direction may provide more efficient criteria, and the corresponding new optimum transforms could improve the compression ratio without loss in visual image quality.
C. Discrete Cosine Transform Coder

The discrete cosine transform (DCT) coder is one of the most widely used in digital image and video coding. Most of the standards available today, from JPEG to the latest MPEG-4, are based on this technique to perform compression. This is due to the good compromise between computational complexity and coding performance that the DCT is able to offer. A general scheme for a coding system based on the DCT is presented in Fig. 4. The first step is a Block Partitioning of the image, which is divided into N×N pixel blocks f[x, y], where N needs to be defined.
Figure 4. A generic scheme for DCT coding. First the image is divided into 8 × 8 blocks of pixels. Each block is then transformed using the DCT. A quantization step performs the real compression of the data. Finally, a zig-zag scanning from the DC component to the AC components is performed.
Typical values for N are 8 or 16. A larger block size may lead to more efficient coding, as the transform may access more highly correlated data from the image; however, larger blocks also increase the computational cost of the transform, as will be explained. Better compression efficiency can be achieved by using a combination of blocks of different shapes, as suggested by Dinstein et al. (1990). Clearly, this method increases the overall complexity of the coding scheme. In the international standard JPEG, N has been chosen equal to eight; thus the following examples will use the same value. Once the Block Partitioning step is performed, each block is coded independently from the others by applying the DCT. The result of this step is a block, F[u, v], of N×N transformed coefficients. Theoretically, the 2D DCT block, F[u, v], of the N×N image block, f[x, y], is defined according to the following formula:

F[u, v] = \frac{2}{N} C(u) C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f[x, y] \cos\frac{(2x+1)u\pi}{2N} \cos\frac{(2y+1)v\pi}{2N}

where

C(z) = \begin{cases} 1/\sqrt{2}, & z = 0 \\ 1, & \text{otherwise} \end{cases}

for z = u and z = v. Its inverse transform is then given by

f[x, y] = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C(u) C(v) F[u, v] \cos\frac{(2x+1)u\pi}{2N} \cos\frac{(2y+1)v\pi}{2N}

with C(u) and C(v) defined as before. Intuitively, DCT coefficients represent the spatial frequency components of the image block. Each coefficient is a weight that is applied to an appropriate basis function. In Fig. 5 we display the basis functions for an 8 × 8 DCT block. The DCT has some very interesting properties. First, both the forward and inverse DCT are separable. Thus, instead of computing the 2D transform, it is possible to apply a one-dimensional (1D) transform along all the rows of the block and then down the columns of the block. This reduces the number of operations to be performed. As an example, a 1D 8-point DCT requires an upper limit of 64 multiplications and 56 additions, so a 2D 8 × 8-point DCT computed as a set of 8 rows and 8 columns requires 1024 multiplications and 896 additions. Secondly, it can be observed that the transform kernel is a real function. In coding, this is an interesting property because only the real part for each
Figure 5. The basis functions for an 8 × 8 DCT block.
transform coefficient needs to be coded. This is not necessarily true for other transformations. Finally, fast transform techniques (Chen et al., 1977; Narasimha and Peterson, 1978) that take advantage of the symmetries in the DCT equation can further reduce the computational complexity of this technique. For example, the cosine transform of an N × 1 vector can be performed in O(N log N) operations via an N-point FFT. Computational efficiency is not the only important feature of the DCT. Of primary importance is its ability to perform a good energy compaction of the transform coefficients. Here too the DCT performs well; it is verified in practice that the DCT tends towards the optimal KL transform for highly correlated signals, such as natural images, that can be modeled by a first-order Markov process (Caglar et al., 1993). Once the DCT has been computed, it is necessary to perform a quantization of the DCT coefficients. At this point of the scheme, no compression has been performed; the quantization procedure will introduce it. Quantization is an important step in coding and, again, it can be performed in several ways. Its main role is to minimize the average distortion introduced for a fixed desired entropy rate (Burt and Adelson, 1993). In practice, it makes more values look the same, so that the subsequent entropy-based coding can improve its performance while coding the DCT coefficients.
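The transform pair and the separability property discussed above can be checked numerically with a short numpy sketch; the function names are our own, and the matrix normalization below is chosen to match the (2/N)C(u)C(v) scaling of the formulas:

```python
import numpy as np

def dct_matrix(N):
    # Orthonormal DCT-II matrix: entry [u, x] = c(u) cos((2x+1)u*pi/(2N)),
    # with c(0) = sqrt(1/N) and c(u) = sqrt(2/N) otherwise; this matches
    # the (2/N) C(u) C(v) normalization of the 2D formula in the text.
    u = np.arange(N)[:, None]
    x = np.arange(N)[None, :]
    T = np.sqrt(2.0 / N) * np.cos((2 * x + 1) * u * np.pi / (2 * N))
    T[0, :] /= np.sqrt(2.0)
    return T

def dct2(block):
    # Separability: a 1D DCT along the rows followed by one down the
    # columns, written as two matrix products.
    T = dct_matrix(block.shape[0])
    return T @ block @ T.T

def idct2(coeffs):
    # The matrix is orthonormal, so the inverse transform uses its transpose.
    T = dct_matrix(coeffs.shape[0])
    return T.T @ coeffs @ T
```

A constant 8 × 8 block transforms to a single nonzero DC coefficient, which is the energy compaction property in its most extreme form.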
TABLE 1
Example of a JPEG Quantization Table

16  11  10  16  24  40  51  61
12  12  14  19  26  58  60  55
14  13  16  24  40  57  69  56
14  17  22  29  51  87  80  62
18  22  37  56  68 109 103  77
24  35  55  64  81 104 113  92
49  64  78  87 103 121 120 101
72  92  95  98 112 100 103  99
As discussed in Section III.B, a distortion measure can be defined either in the MSE sense or in the more interesting HVS sense. State-of-the-art coding techniques are in general based on the former. In this context, the uniform quantizer is optimal or quasi-optimal for most cases (Burt and Adelson, 1993). A quantizer is said to be uniform when the same distance uniformly separates all the quantization thresholds. The simplicity of this method and its near-optimal performance are the reasons why the uniform quantizer is so widely used in coding schemes and in standards such as JPEG and MPEG. In particular, in these standards a quantization table is constructed to define a quantization step for each DCT coefficient. Table 1 shows the quantization table used in the JPEG standard. Each DCT coefficient is divided by the corresponding quantization step, which dynamically weights its influence. From a perceptual point of view the MSE optimality criterion is not relevant; therefore, in second generation coding other techniques are proposed, such as those described by Macq (1989) and van den Branden (1996). After the quantization is performed, the next step is a reordering of the DCT coefficients in a zig-zag scanning order. This procedure parses the coefficients from the upper left position, which represents the DC coefficient, to the lower right position of the DCT block, which represents the highest-frequency AC coefficient. The exact order is represented in detail in Fig. 4.
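The two steps just described can be sketched as follows; `Q` is the luminance table of Table 1, and the helper names are our own illustration rather than any standard's API:

```python
import numpy as np

# JPEG luminance quantization table (Table 1).
Q = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])

def quantize_block(F):
    # Each DCT coefficient is divided by its quantization step and rounded.
    return np.rint(F / Q).astype(int)

def zigzag(block):
    # Scan anti-diagonals from the DC (top-left) coefficient to the
    # highest-frequency AC (bottom-right) coefficient, reversing direction
    # on alternate diagonals, so zero/small high-frequency values cluster
    # at the end of the sequence.
    N = block.shape[0]
    out = []
    for s in range(2 * N - 1):
        idx = [(i, s - i) for i in range(N) if 0 <= s - i < N]
        if s % 2 == 0:
            idx = idx[::-1]
        out.extend(block[i, j] for i, j in idx)
    return out
```

The first few scanned positions are (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), matching the order drawn in Fig. 4.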
The zig-zag reordering is justified by a hypothesis on the knowledge of both natural images and HVS properties. In fact, it is known that most of the energy in a natural image is concentrated in the low-frequency (DC) components, whereas the high-frequency (AC) components are both less likely to occur and less important from a visual point of view. This is why we can expect most of the nonzero coefficients to be concentrated near the DC component. Ordering the coefficients in such a way that all zero or small coefficients are concentrated at the end improves the performance of entropy-based coding techniques by generating a distribution as far as possible from the uniform distribution. In Fig. 6 the results obtained by applying a JPEG-compliant DCT-based compression scheme to the Lena image are shown. The image is in QCIF format (176 × 144 pixels) and is shown in both color and black-and-white format. Four different visual quality results are represented. Each corresponds to a different compression ratio, as indicated. The higher the compression ratio, the worse the visual quality, as might be expected. Blocking artifacts are evident at very low bit rates, highly degrading the image quality.

Figure 6. Visual performance at different bit rates of a JPEG-compliant, DCT-based compression scheme. The top two images are the originals. The others represent the results obtained at different bit rates.

D. Multiscale/Pyramidal Approaches

Multiscale and pyramidal coding techniques represent an alternative to the block quantization approach based on the DCT (see Section III.C). Both approaches perform a transformation and a filtering of the image in order to compact the energy and improve the coding performance. However, multiscale/pyramidal coding techniques operate on the whole original image instead of operating on blocks of limited dimension. In particular, the image is filtered and subsampled in order to produce various levels of image detail at progressively smaller resolutions. An interesting property of this approach, when compared with the DCT, is the possibility of progressive transmission of the image, as will be described later. Moreover, the fact that no blocks are introduced avoids the generation of blocking artifacts, which represent one of the most annoying drawbacks of the DCT-based coding techniques. Multiresolution approaches have recently been of great interest to the video coding research community. From a complexity point of view, the approach of coding an image through successive approximations is often very efficient. From a theoretical point of view, it is possible to discover striking similarities with HVS models. In fact, experimental results have shown that the HVS uses a multiresolution approach (Schalkoff, 1989) in completing its tasks. Research suggests that a multifrequency channel decomposition seems to take place in the human visual cortex, thereby causing the visual information to be processed separately according to its frequency band (Wandell, 1995). Similarly, the retina is decomposed into several frequency band sensors with uniform bandwidth on an octave scale. All of these considerations justify the keen interest shown by researchers in this direction. In 1983, Burt and Adelson (1983) presented a coding technique based on the Laplacian pyramid. In this approach, a lowpass filtering of the original image is performed as a first step. This is obtained by applying a weighted average function (H). Next, a down-sampling of the image is performed. These two steps are repeated in order to produce progressively smaller images, in both spatial intensity and dimension. Together, the results of these transformations represent the Gaussian Pyramid, represented in Figure 7 by the three top images G0, G1, and G2. Each level of the Gaussian Pyramid, starting from the lowest, smallest level, is interpolated to the size of its predecessor in order to produce the Laplacian Pyramid. In terms of the coding, it is the Laplacian Pyramid, instead of the image itself, which is coded. As in the DCT-based coding method, the original image has been transformed into a specific structure in which each level of the pyramid has a different visual importance. The smallest level of the Gaussian Pyramid represents the roughest representation of the image. If greater quality is required, then successive levels of the Laplacian Pyramid need to be added. If the complete Laplacian Pyramid is available, a perfect reconstruction of the image is possible through the process of adding, with appropriate interpolation, all the different levels from the smallest resolution to the highest. This structure makes progressive transmission particularly simple. As in the DCT-based approach, the real coding process is represented by the successive step: the quantization of each level of the pyramid.
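A minimal 1D sketch of the pyramid construction and its exact reconstruction follows; the two-tap average and sample repetition used here are simple stand-ins for Burt and Adelson's 5-tap generating kernel and interpolation, and the function names are our own:

```python
import numpy as np

def reduce_level(x):
    # Lowpass (two-tap average) followed by down-sampling by 2.
    return (x[0::2] + x[1::2]) / 2.0

def expand(x, n):
    # Interpolate back to length n (here by simple sample repetition).
    return np.repeat(x, 2)[:n]

def laplacian_pyramid(g0, levels):
    # Gaussian pyramid: repeated lowpass + down-sample.
    # Laplacian level: difference between a Gaussian level and the
    # expanded version of the next (coarser) one.
    gaussian, laplacian = [np.asarray(g0, dtype=float)], []
    for _ in range(levels):
        g_next = reduce_level(gaussian[-1])
        laplacian.append(gaussian[-1] - expand(g_next, len(gaussian[-1])))
        gaussian.append(g_next)
    return gaussian, laplacian

def reconstruct(g_top, laplacian):
    # Add back each difference level, from coarsest to finest.
    g = g_top
    for lap in reversed(laplacian):
        g = expand(g, len(lap)) + lap
    return g
```

Because each Laplacian level is defined as exactly the expansion residual, reconstruction is lossless before quantization, and transmitting the levels coarse-to-fine gives the progressive behavior described in the text.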
Again, uniform quantization is the technique preferred by the authors. They achieve this by simply dividing the range of pixel values into bins of a set width; quantization then occurs by representing each pixel value that falls within a bin by the bin centroid. Different compression ratios can be achieved by increasing or decreasing the amount of quantization. As before, there is a trade-off between high compression and visual quality. Burt and Adelson (1983) attempted to exploit areas in the image that are largely similar. These similar areas appear at various resolutions; hence, when the subsampled image is expanded and subtracted from the image at the next higher resolution, the difference image (L0) contains large areas of zeros, indicating commonality between the two images. These areas can be seen in Fig. 7 as the dark zones in L0 and L1. The larger the degree of commonality, the greater the number of zero areas in the difference
Figure 7. The scheme of the Gaussian Pyramid approach proposed by Burt and Adelson (1983). G0 is the original image to be coded. A first lowpass filtering, followed by a down-sampling, generates G1, which is a smaller version of G0 in both spatial intensity and dimension. Iterating this process a certain number of times builds the Gaussian Pyramid, represented here by G0, G1, and G2. This is used to generate the Laplacian Pyramid, which is what is actually coded. L0 and L1 represent two levels of the Laplacian Pyramid in this scheme.
images. Standard first-generation coding methods can then be applied to the difference images to produce good compression ratios. With very good image quality, compression ratios of the order of 10:1 are achievable. Other techniques exist that are based on a similar pyramidal approach. Of particular interest in a second generation coding context are those based on mathematical morphology (Salembier and Kunt, 1992; Zhou and Venetsanopoulos, 1992; Toet, 1989). These techniques provide an analysis of the image based on object shapes and sizes; thus, they include features that are relevant to the HVS. Their advantage is that they do not suffer from ringing effects (cf. Section III.G), even under heavy quantization. However, they produce residual images that still have large entropy and thus cannot be efficiently compressed by first generation coding schemes. Moreover, the residual images obtained are the same size as the original image. These drawbacks prevent a practical application of these techniques in image coding. A more detailed discussion of these techniques will be presented in Section III.G.
E. Wavelet-Based Approach

Although the wavelet transform-based coding approach is a generalization of the multiscale/pyramidal approaches, it deserves to be treated separately. The enormous success it has obtained in the image coding research community and its particular compatibility with the second generation coding philosophy provide the rationale for a more extensive discussion of this category in this overview. Moreover, the future standard for still image compression, JPEG2000, will be based on a wavelet coding system. The wavelet transform is the most commonly used transform in the current domain of research: subband coding (SBC). The idea is similar to that of the Gaussian Pyramid already described here, but much more general. Using the wavelet transform, instead of computing only a lowpass version of the original image, a complete set of subbands is computed by filtering the input image with a set of bandpass filters. In this way, each subband directly represents a particular frequency range of the image spectrum. A primary advantage of this transform is that it does not increase the number of samples over that of the original image, whereas pyramidal decompositions do. Moreover, wavelet-based techniques are able to efficiently preserve important perceptual information such as edges, even if their energy contribution to the entire image is low. Other transform coders, like those based on the DCT, decompose images into representations where each coefficient corresponds to a fixed-size spatial area and frequency band. Edge
information would require many nonzero coefficients to represent it sufficiently. At low bit rates, other transform coders allocate too many bits to signal behavior that is more localized in the time or space domain, and not enough bits to edges. Wavelet techniques therefore offer benefits at low bit rates, since information at all scales is available for edges and regions (Shapiro, 1993). Another important characteristic of wavelets in a second generation coding framework is that each subband can be coded separately from the others. This provides the possibility of allocating the total available bit rate according to the visual importance of each subband. Finally, SBC does not suffer from the annoying blocking artifacts reported for DCT coders. However, it does suffer from an artifact specific to the wavelet transform: ringing. This effect occurs mainly around high-contrast edges and is due to the Gibbs phenomenon of linear filters. The effect of this phenomenon varies according to the specific filter bank used for the decomposition. By taking into account properties of the HVS, SBC and, in particular, wavelet transforms make it possible to achieve high compression ratios with very good visual quality. Moreover, as with the pyramidal approach proposed by Burt and Adelson (1983), they permit a progressive transmission of images through the hierarchical structure they possess. As a general approach, the concept of subband decomposition on which wavelets are based was originally introduced in the speech-coding domain by Crochiere et al. (1976) and Croisier et al. (1976). Later, Smith and Barnwell (1986) proposed a solution to the problem of perfect reconstruction for a 1D multirate system. In 1984, Vetterli (1984) extended perfect reconstruction filter bank theory to two-dimensional signals. In 1986, Woods and O'Neil (1986) proposed the 2D separable quadrature mirror filter (QMF) banks that introduced this theory into the image-coding domain.
The most commonly used filter banks are the QMFs proposed by Johnston (1980). These are 2-band filter banks that minimize a weighted sum of the reconstruction error and the stopband energy of each filter. Fig. 8 represents a generic scheme for 2-band filter banks. As they exhibit linear phase characteristics, these filters are of particular interest to the research community; however, they do not allow perfect reconstruction. An alternative is represented by the conjugate quadrature filters (CQF) proposed by Smith and Barnwell (1986). These allow perfect reconstruction, but do not have linear phase. M-band filters also exist as an alternative to quadrature filters (Vaidyanathan, 1987); however, the overhead introduced by their more complex design and computation has not helped the diffusion of these filters in the coding domain. Finally, some attempts to define filter banks that take further account of HVS properties have been pursued by Caglar et al. (1993) and Akansu et al. (1993).
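The Haar pair is the simplest example of a 2-band bank with perfect reconstruction, and serves here only as an illustrative stand-in for the QMF/CQF designs discussed above (it is orthogonal, so analysis and synthesis use the same coefficients):

```python
import numpy as np

def analysis(x):
    # Haar 2-band analysis: lowpass and highpass outputs, each
    # down-sampled by 2, so the sample count is preserved overall.
    s = 1.0 / np.sqrt(2.0)
    low = s * (x[0::2] + x[1::2])
    high = s * (x[0::2] - x[1::2])
    return low, high

def synthesis(low, high):
    # Up-sample and combine; for Haar the reconstruction is exact
    # (Y(z) equals X(z), with no residual aliasing).
    s = 1.0 / np.sqrt(2.0)
    x = np.empty(2 * len(low))
    x[0::2] = s * (low + high)
    x[1::2] = s * (low - high)
    return x
```

The orthonormal scaling also preserves energy across the two bands, which is what makes per-band bit allocation meaningful.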
Figure 8. Generic scheme for a 2-band analysis/synthesis system. H0 and G0 are, respectively, the analysis and synthesis lowpass filters. H1 and G1 represent the equivalent high-pass filters. Perfect reconstruction is achievable when Y(z) is a delayed version of X(z).
Among the complete set of subband filters that have been developed, an important place in second generation image coding is held by the wavelet decomposition. This approach takes into consideration the fact that most of the power in natural images is concentrated in the low frequencies. Thus a finer partitioning of the lowpass band is performed. This is achieved by a tree-structured system, as represented in Fig. 9. A wavelet decomposition is a hierarchical approach: at each level the available frequency band is decomposed into four subbands using a 2-band filter bank applied both to the lines and to the columns.

Figure 9. A depth-2 wavelet decomposition. On the right, the 2-level tree structure is represented. X is the original image. LH represents a first-level lowpass filter in the horizontal direction and high-pass filter in the vertical direction. The other filter labels obey the same convention. Since each filtering is followed by a down-sampling step, the complete decomposition can be represented with as many coefficients as the size of the original image, as shown on the left-hand side of this figure.

This procedure is repeated until the energy contained in the lowest subband (LL) is less than a prefixed threshold, determined according to an HVS model hypothesis. In Fig. 10 the results of applying a wavelet decomposition to the test image Lena are represented.

Figure 10. An example of wavelet decomposition. The filter used to decompose the image "Lena" (512 × 512) is a 2-level biorthogonal Daubechies 9/7 filter.

A generic scheme for a wavelet transform coder is represented in Fig. 11. The different implementations reported in the literature differ according to the wavelet representation, the method of quantization, or the final entropy encoder.
Figure 11. A generic scheme for wavelet encoders. First a wavelet representation of the image is generated; then a quantization of the wavelet coefficients has to be performed. Finally, an entropy encoder is applied to generate the bit-stream. Several choices for each step are available; some of the most common ones are listed in the figure.
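One level of the separable decomposition of Fig. 9 can be sketched with Haar filters (an illustrative stand-in for the 9/7 filters of Fig. 10; the function name is our own):

```python
import numpy as np

def dwt2_haar(img):
    """One level of a separable 2D Haar decomposition: filter and
    down-sample along one direction, then the other, yielding the
    LL, LH, HL and HH subbands. Together the four subbands hold exactly
    as many coefficients as the input image."""
    s = 1.0 / np.sqrt(2.0)
    # Filter + down-sample along the columns (horizontal direction).
    lo = s * (img[:, 0::2] + img[:, 1::2])
    hi = s * (img[:, 0::2] - img[:, 1::2])
    # Filter + down-sample along the rows (vertical direction).
    LL = s * (lo[0::2, :] + lo[1::2, :])
    LH = s * (lo[0::2, :] - lo[1::2, :])
    HL = s * (hi[0::2, :] + hi[1::2, :])
    HH = s * (hi[0::2, :] - hi[1::2, :])
    return LL, LH, HL, HH
```

Recursing on the LL band produces the tree of Fig. 9, with each further level refining the partition of the low-frequency band where most natural-image power lies.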
Among the different families of wavelet representations, it is worth noting the compactly supported orthogonal wavelets. These belong to the more general family of orthogonal wavelets that generate orthonormal bases of L²(R). The important feature of the compactly supported orthogonal wavelets is that in the discrete wavelet transform (DWT) domain they correspond to finite impulse response (FIR) filters; thus they can be implemented efficiently (Mallat, 1989; Daubechies, 1993, 1998). In this family the Daubechies wavelets and Coifman's wavelets are popular. An important drawback of compactly supported orthogonal wavelets is their asymmetry, which is responsible for the generation of artifacts at the borders of the wavelet subbands, as reported by Mallat (1989). To avoid this drawback, he also investigated noncompact orthogonal wavelets; however, these do not represent an efficient alternative, due to their complex implementation. An alternative wavelet family that presents symmetry properties is the biorthogonal wavelet representation. This representation also offers efficient implementations, and thus it has been adopted in several wavelet image coders. The example shown in Figure 10 was generated using a wavelet belonging to this family. There has been some work carried out in an attempt to define methods that identify the best wavelet basis for a particular image. In this framework a generalized family of multiresolution orthogonal or biorthogonal bases that includes wavelets has been introduced; these are regrouped, following Lu et al. (1996), in the wavelet packets family. Different authors have proposed entropy- or rate-distortion-based criteria to choose the best basis from this wide family (Coifman and Wickerhauser, 1992; Ramchandran and Vetterli, 1993).
In a second generation image coding framework, of particular interest is the research carried out on the zero-crossings and local maxima of wavelet transforms (Mallat, 1991; Froment and Mallat, 1992). These techniques directly introduce into the wavelet framework the concepts of edges and contours, which are so important in the HVS (Croft and Robinson, 1994; Mallat and Zhong, 1991). More detail on this approach will be given in Section III.F. The choice of the wavelet to be used is indeed a key issue in designing a wavelet image coder. The preceding short discussion shows that many different choices are available; not all of them directly take HVS considerations into account. These can, however, be introduced in the subsequent quantization step of the coding process. As was discussed, a wavelet representation generates, for each image, 3D + 1 subbands, where D is the number of levels of the decomposition (dyadic scales). Each subband shows different statistical behavior; thus it is important to apply an optimized quantization to each of them.
As already discussed in Section III.C for the DCT, and as reported by Jain (1989), the uniform quantizer is a quasi-optimal solution under an MSE criterion. In this case we simply need to define a quantizer step for each subband. Note that this solution is similar to the one used in the JPEG standard, where each coefficient in the DCT block is associated with a different quantization step (see Table 1). This choice is used in practice by a well-known software package, EPIC (Simoncelli and Adelson). Here an initial step size is defined and divided by a factor of two at each coarser level of the wavelet decomposition. Thus the lowest subband, which provides most of the visual information, is finely quantized with the smallest step size. Other methods increase the compression by mapping small coefficients in the highest frequency bands to zero. Research has also been performed aimed at the design of HVS-based quantizers. In particular, Lewis and Knowles (1992) designed a quantizer that considers the HVS's spectral response, noise sensitivity in background luminance, and texture masking. For scalar quantization (SQ), uniform quantization performs well; alternatives are represented by the vector quantization (VQ) methods. Generally VQ performs better than SQ, as discussed in Senoo and Girod (1992). The principle is to quantize vectors or blocks of coefficients instead of individual coefficients. This generalization of SQ takes the possible correlation between coefficients into account already at the quantization step. Cicconi et al. (1994) describe a pyramidal vector quantization that accounts for correlation between subbands that share the same frequency orientation. Thus both intra- and interband correlations are taken into account during the quantization process.
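The EPIC-style step-size rule and the zeroing of small high-frequency coefficients described above can be sketched as follows (function names and the dead-zone parameter are our own illustration, not EPIC's actual interface):

```python
def subband_steps(base_step, levels):
    # Per-level step sizes: start from base_step at the finest level and
    # divide by two at each coarser level, so the lowest (most visually
    # important) subband gets the smallest quantization step.
    return [base_step / (2 ** d) for d in range(levels)]

def quantize_subband(coeffs, step, dead_zone=0.0):
    # Uniform scalar quantizer with an optional dead zone that maps
    # small coefficients to zero, as done in the highest-frequency
    # bands to raise the compression ratio.
    out = []
    for c in coeffs:
        out.append(0 if abs(c) <= dead_zone else round(c / step))
    return out
```

An HVS-based quantizer in the spirit of Lewis and Knowles (1992) would additionally modulate `step` per subband by contrast sensitivity, background luminance, and texture masking.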
In the same contribution the authors also introduce a criterion for a perceptual quantization of the coefficients, which is particularly suited to second generation image coding techniques. Another possible solution in wavelet coders is represented by successive-approximation quantization. In this category, it is important to cite the method proposed by Shapiro (1993): the ‘‘embedded zerotree wavelet algorithm’’ (EZW). This method tries to predict the absence of significant information across the subbands generated by the wavelet decomposition. This is achieved by defining a zerotree structure. Starting from the lowest frequency subband, a father-children relationship is defined recursively through all the following subbands, as represented in Fig. 12. Basically, the quantization is performed by successive approximation across the subbands with the same orientation. Similar to the zig-zag scanning reported in Section III.C, a scanning of the different subbands is performed as shown in Fig. 13. This strategy turns out to be an efficient technique to code zero and nonzero quantized values.
Figure 12. Parent-children relationship defined by the EZW algorithm.
Figure 13. Zero-tree scanning order for a 3-scale QMF wavelet decomposition.
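The parent-children relationship of Fig. 12 can be sketched in coordinates: in a dyadic decomposition, a coefficient at (r, c) of a detail subband has four children at the next finer scale, and a zerotree symbol declares the whole descendant set insignificant at once (a sketch of the indexing only, not of the full EZW coder):

```python
# Illustrative sketch of the EZW parent-children indexing: a coefficient
# at subband-local position (r, c) has four children at (2r, 2c)..(2r+1,
# 2c+1) in the next finer subband of the same orientation.

def children(r, c):
    """Four children of a wavelet coefficient one scale finer."""
    return [(2 * r, 2 * c), (2 * r, 2 * c + 1),
            (2 * r + 1, 2 * c), (2 * r + 1, 2 * c + 1)]

def descendants(r, c, levels):
    """All descendants of (r, c) across `levels` finer scales: the set a
    single zerotree symbol declares insignificant."""
    frontier, found = [(r, c)], []
    for _ in range(levels):
        frontier = [kid for (i, j) in frontier for kid in children(i, j)]
        found += frontier
    return found

print(children(1, 2))            # [(2, 4), (2, 5), (3, 4), (3, 5)]
print(len(descendants(0, 0, 3))) # 4 + 16 + 64 = 84 coefficients covered
```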
Research by both Said and Pearlman (1996) and Taubman and Zakhor (1994), based on the same principle developed by Shapiro, provided even better coding performance. Their new techniques are known as set partitioning in hierarchical trees (SPIHT) (Said and Pearlman, 1996) and layered zero coefficient (LZC) (Taubman and Zakhor, 1994). Recently, new efforts have been devoted to the improvement of these coding techniques with special attention to both HVS properties and color components (Lai and Kuo, 1998a,b; Nadenau and Reichel, 1999). An interesting example is the technique proposed by Nadenau and Reichel (1999). This technique is based on an efficient implementation of the LZC method (Taubman and Zakhor, 1994). It applies the lifting-steps approach presented by Daubechies and Sweldens (1998) in order to reduce the memory and the number of operations required to perform the wavelet decomposition. It also performs a progressive coding based on the HVS model and includes color effects. The HVS model is used to predict the best possible bit allocation during the quantization step. In particular, the color image is converted into the opponent color space discussed by Poirson and Wandell (1993, 1996): this representation reflects the properties of color perception in the HVS better than the usual YCbCr representation. Finally, this technique produces a visually embedded bit-stream. This means not only that the quality improves as more bytes are received and that the transmission can be stopped at any time, but also that the partial results are always coded with the best visual quality.
F. Edge Detection Mallat and Zhong (1991) point out that in most cases the structural information required for recognition tasks is provided by the image edges. However, one major difficulty of an edge-based representation is to integrate all the image information into edges. Most edge detectors are based on local measurements of the image variations, and edges are generally defined as points where the image intensity has a maximum variation. Multiscale edge detection is a technique in which the image is smoothed at various scales and edge points are detected by a first- or second-order differential operator. The coding method presented involves two steps. First, the edge points considered important for visual quality are selected. Second, these are efficiently encoded. Edge points are chained together to form edge curves. Selection of the edge points is performed at scale 2², that is, from the image in the pyramidal structure that has been scaled by a factor of four. Boundaries of important structures often
generate long edge curves, so, as a first step, all edge curves whose lengths are smaller than a threshold are removed. Among the remaining curves, the ones that correspond to the sharpest discontinuities in the image are selected. This is achieved by removing all edge curves along which the average value of the wavelet transform modulus is smaller than a given amplitude threshold. After the removal procedures, it is reported that only 8% of the original edge points are retained; however, it is not clear whether this figure is constant for all images. Once the selection has been performed, only the edge curves at scale 2² are coded in order to save bits; the curves at the other scales are approximated from them. Chain coding is used to encode the edge curves at this scale. The compression ratio reported by Mallat and Zhong (1991) with this method is approximately 27:1 with good image quality. G. Directional Filtering Directional filtering is based on the relationship between the presence of an edge in an image and its contribution to the image spectrum. It is motivated by the existence of direction-sensitive neurons in the HVS (Kunt et al., 1985; Ikonomopoulos and Kunt, 1985). It can be seen that the contribution of an edge is distributed all over the spectrum; however, the highest frequency component lies in the direction orthogonal to that of the edge. It can also be seen that the frequency of the contribution diminishes as we turn away from this direction, until it vanishes at right angles to it. A directional filter is one whose frequency response covers a sector or part of a sector in the frequency domain. If f and g are spatial frequencies and r is the cut-off frequency of the lowpass filter, then the ideal frequency response of the ith directional filter of a set of n is given by:
G_i( f, g) = 1 if θ_i ≤ tan⁻¹(g/ f ) < θ′_i, and 0 otherwise,

with

θ_i = (i − 1)π/2n,   θ′_i = (i + 1)π/2n,

and | f |, |g| ≤ 0.5. A directional filter is a high-pass filter along its principal direction and a lowpass filter along the orthogonal direction. The directional filter response is modified, as in all filter design, by an appropriate window function
(Harris, 1978), to minimize the effect of the Gibbs phenomenon (Ziemer et al., 1989). When a discontinuous signal is approximated by the sum of a truncated trigonometric series, overshoots tend to appear near the discontinuity. This is referred to as the Gibbs phenomenon. An ideal filter can be viewed as a step or rectangular pulse waveform, that is, a discontinuous waveform. The reason for the overshoot at discontinuities can be explained using the Fourier transform. Consider a signal x(t) with a Fourier transform X( f ). Reconstructing x(t) from its lowpass part shows that:
x_W(t) = F⁻¹[X( f ) Π( f /2W )],

where

Π( f /2W ) = 1 when | f | ≤ W, and 0 otherwise.

According to the convolution theorem of Fourier transform theory,

x_W(t) = x(t) ∗ F⁻¹[Π( f /2W )] = x(t) ∗ (2W sinc 2Wt).
Bearing in mind that convolution is a folding-product, sliding-integration process, it can be seen that a finite value of W will always result in x(t) being viewed through the sinc window function; as W increases, more of the frequency content of the rectangular pulse is used in the approximation of x(t). In order to eliminate the Gibbs phenomenon it is important to modify the frequency response of the filter by a window function. There are many window functions available, each with a different frequency response. The frequency response of the chosen window function is convolved with the filter response. This ensures that the overall frequency response does not contain the sharp discontinuities that cause the ripple. In a general scheme using directional filters, n directional filters and one lowpass filter are required. An ideal lowpass filter has the following frequency response:

G₀( f, g) = 1 if √( f ² + g²) ≤ r, and 0 otherwise.
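A small numerical experiment (my illustration, not from the chapter) shows both the Gibbs overshoot of an ideal rectangular truncation and its suppression by a window; here a triangular (Fejér-type) taper of the series coefficients plays the role of the window function:

```python
import math

# A unit square wave approximated by N odd harmonics: the sharp cutoff
# overshoots by ~9% near the jump, while a triangular taper of the same
# coefficients removes the ripple at the cost of a slower edge.

def partial_sum(t, N, window=False):
    """Fourier series of a unit square wave truncated to N odd harmonics;
    with window=True the coefficients get a triangular (Fejer) taper."""
    s = 0.0
    for k in range(N):
        n = 2 * k + 1
        w = 1.0 - n / (2.0 * N) if window else 1.0
        s += w * (4.0 / math.pi) * math.sin(n * t) / n
    return s

ts = [i * 0.001 for i in range(1, 3142)]   # half period (0, pi)
peak_rect = max(partial_sum(t, 20) for t in ts)
peak_win = max(partial_sum(t, 20, window=True) for t in ts)
print(peak_rect > 1.05)   # True: the Gibbs overshoot of the sharp cutoff
print(peak_win < 1.0)     # True: the taper suppresses the overshoot
```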
It should be noted that superposition of all the directional images and the lowpass image leads to an exact reconstruction of the original image. Two parameters are involved in the design of a directional filter-based image coding scheme: the number of filters and the cutoff frequency of the lowpass filter. The number of filters may be set a priori and is directly related to the
minimum width of the edge elements. The choice of lowpass cutoff frequency influences the compression ratio and the quality of the decoded image. As reported by Kunt et al. (1985), a very early technique ahead of its time was the synthetic highs system (Schreiber et al., 1959; Schreiber, 1963). Kunt states that the better-known approach of directional filtering is a refinement of the synthetic highs system. In this technique, the original image is split into two parts: the lowpass picture showing general area brightness and the high-pass image containing edge information. Two-dimensional sampling theory suggests that the lowpass image can be represented with very few samples. In order to reduce the amount of information in the high-pass image, thresholding is performed to determine which edge points are important. Once found, the location and magnitude of each edge point are stored. To reconstruct the compressed data, a 2D reconstruction filter, whose properties are determined by the lowpass filter used to produce the lowpass image, is used to synthesize the high-frequency part of the edge information. This synthesized image is then added to the lowpass image to give the final output. Ikonomopoulos and Kunt (1985) describe their technique for image coding based on the refinement of the synthetic highs system: directional filtering. Once the image has been filtered, the result is one lowpass image and 16 directional images. The coding scheme proposed is lossy since high compression is the goal. When the image is filtered with a high-pass filter the result gives zero-crossings at the locations of abrupt changes (edges) in the image. Each directional component is represented by the location and magnitude of the zero-crossings. Given that a small number of points result from this process, typically 6–10% of the total number of points, run-length encoding proves efficient for this purpose. The low frequency component can be coded in two ways.
As the maximum frequency of this component is small, it can be resampled according to the 2D sampling theorem and the resulting pixels can be coded in a standard way. Alternatively, transform coding may be used, with the choice of transform technique being controlled by the filtering procedure used. The transform coefficients may then be quantized and coded via Huffman coding (Huffman, 1952). The compression ratios obtained with this technique depend on many factors. The image being coded and the choice of cutoff frequency both play an important role in the final ratio obtained. The compression scheme can be adapted to the type of image being compressed. Zhou and Venetsanopoulos (1992) present an alternative spatial method called morphological directional coding. In their approach, spatial image features at known resolutions are decomposed using a multiresolution morphological technique referred to as the feature-width morphological pyramid (FMP). Zhou and Venetsanopoulos (1992) report that nontrivial
spatial features, such as edges, lines, and contours within the image, determine the quality of the reproduced image for the human observer. It was this fact that motivated them to include a stage in their coding technique that identifies these nontrivial features so that they may be coded separately. Morphological directional coding schemes were developed to preserve nontrivial spatial features in the image during the coding phase. Such filtering techniques are used for feature separation, as they are spatial methods that are capable of selectively processing features of known geometrical shapes. A multiresolution morphological technique therefore decomposes image features at various resolutions. In this technique image decomposition is a multistage process involving a filter called an open-closing (OC) filter. Each filtered image from the current stage is used as the input to the next stage, and in addition the difference between the input and output images of each stage is calculated. The first N − 1 decomposed subimages (L_1, . . . , L_{N−1}) are termed feature images and each contains image features at known resolutions. For example, L_1 contains image features of width 1, L_2 has features of width 2, and so on. Each OC filter has a structuring element associated with it, with those for stage n progressively larger than those for the previous stage n − 1. The structuring element defines the information content in each of the decomposed images. The decomposed FMP images contain spatial features in arbitrary directions. Therefore directional decomposition filtering techniques are applied to each of the FMP images in order to group features of the same direction together. Before this is implemented, the features in the FMP images, L_1, . . . , L_{N−1}, must be eroded to 1-pixel width. There are two reasons for this feature-thinning phase (Zhou and Venetsanopoulos, 1992).
First, the directional decomposition filter bank gives better results for features of 1-pixel width; second, it is more efficient and simpler to encode features of 1-pixel width. After the FMP images have been directionally decomposed, the features are further quantized by a nonuniform scalar quantizer. Each extracted feature is first encoded with a vector and then each vector is entropy encoded. The coarse image L_N is encoded using conventional methods such as VQ. Both of these methods employ directional decomposition as the basis of their technique. Ikonomopoulos and Kunt (1985) implemented a more traditional approach in that the directional decomposition filters are applied directly to the image. In their method the compression ratio varies from image to image. The filter design depends on many factors, which in turn affect the compression ratio. Therefore Ikonomopoulos and Kunt (1985) state that these parameters should be tuned to the particular image because the quantity, content, and structure of the edges in the image determine the
compression obtained. Despite these factors, compression in the order of 64:1 is reported with good image quality. The morphological filtering technique by Zhou and Venetsanopoulos (1992) separates the features into what they refer to as FMP images. Traditional directional decomposition techniques are applied to these FMP images in order to perform the coding process. The compression ratios reported by this method are reasonable at around 20:1.
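The partition of the frequency plane underlying directional filtering can be sketched as follows (a toy construction with ideal, unwindowed masks; the sector geometry is an illustrative reading of the scheme above, not the authors' code):

```python
import math

# Each frequency sample outside the lowpass disc of radius r is assigned
# to one of n angular sectors, so the lowpass image plus the n directional
# images superpose to the original spectrum exactly.

def sector_index(f, g, n):
    """Sector 0..n-1 of frequency (f, g); orientation is taken modulo pi
    because real images have a symmetric spectrum."""
    theta = math.atan2(g, f) % math.pi
    return min(int(theta / (math.pi / n)), n - 1)

def build_masks(size, n, r):
    """Ideal (unwindowed) masks: one lowpass + n directional, on a
    size x size grid of normalized frequencies in [-0.5, 0.5)."""
    masks = [[[0] * size for _ in range(size)] for _ in range(n + 1)]
    for i in range(size):
        for j in range(size):
            f, g = (i - size // 2) / size, (j - size // 2) / size
            if math.hypot(f, g) <= r:
                masks[0][i][j] = 1                      # lowpass disc
            else:
                masks[1 + sector_index(f, g, n)][i][j] = 1
    return masks

masks = build_masks(size=32, n=8, r=0.1)
coverage = sum(m[i][j] for m in masks for i in range(32) for j in range(32))
print(coverage)  # 1024: every frequency sample is covered exactly once
```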
IV. Segmentation-Based Approaches A. Overview A general scheme of a segmentation-based image coding approach is represented in Fig. 14. The original image is first preprocessed in order to eliminate noise and small details. The segmentation is then performed in order to organize the image as a set of regions. These might represent the objects in the scene, or more generally some homogeneous groups of pixels. Once regions have been generated, the coding step takes place. It is composed of two different procedures: contour coding and texture coding. The former is responsible for coding the shape of each region so that it can be reconstructed later at the decoder site, and the latter is responsible for coding the texture inside each region. These two procedures generate two bit-streams that, together, are used for the reconstruction of the original image. The segmentation-based approaches have strong motivation in the framework of second generation image coding. The visual data to be coded are generally more coherent inside a semantically meaningful region than inside predefined blocks. The introduction of a semantic representation of the scene might increase the decorrelation of the data, thus providing a higher energy compaction and consequently improved compression performance.
Figure 14. Generic scheme for segmentation-based approach to image coding.
Moreover, an object representation of the scene is the key point for dynamic coding (Ebrahimi et al., 1995). Each region can be coded independently of the others; this means that the coding approach that best suits the statistics of each single region can be applied. The introduction of a semantic representation of the image has another advantage: that of object interaction. This concept is particularly suitable for video sequences, and is one of the key points of the new MPEG-4 standard, but it can also be extended to still image coding. As we have mentioned, the HVS is able to recognize objects and automatically assign different priorities to objects of high or low interest. This can be simulated in a segmentation-based approach by associating high bandwidth with visually or semantically important regions and low bandwidth with less crucial objects. Research to predict and dynamically allocate the available bit rate has been performed by Fleury et al. (1996) and Fleury and Egger (1997). Alongside these advantages, segmentation-based coding approaches suffer from some major drawbacks. First, the segmentation process is computationally expensive and generally not very accurate or automatic. Thus, it is still not possible to correctly analyze, in real time, the semantic content of a generic image. This is a severe limitation for practical applications. Second, for each region we want to compress, it is necessary to code not only the texture information but also the contour information. This introduces an overhead that might even outweigh the advantages obtained by coding a more coherent region. Finally, it has been shown that a semantic representation of the scene does not always provide homogeneous regions suitable for high compression purposes. In the next section, a brief review of important preprocessing techniques will be outlined. In Section IV.C, an overview of existing segmentation techniques will be proposed.
A discussion of texture and contour coding will be presented in Sections IV.D and IV.E. Finally, a review of major coding techniques based on a segmentation approach will be presented in the last four sections, IV.F, G, H, and I.
B. Preprocessing The purpose of preprocessing is to eliminate small regions within the image and remove noise generated in the sampling process. It is an attempt to model the action of the HVS and is intended to alter the image in such a way that the preprocessed image more accurately resembles what the human brain processes. There are various methods used to preprocess the image,
all derived from properties of the HVS. Two properties commonly used are Weber's Law and the modulation transfer function (MTF) (Jang and Rajala, 1990, 1991; Civanlar et al., 1986). Marqués et al. (1991) suggest the use of Steven's Law, which accounts for the greater sensitivity of the HVS to gradients in dark areas as compared to light ones. For example, if B is the perceived brightness and I the stimulus intensity, then B = K·I^α. Therefore, by preprocessing according to Steven's Law, visually homogeneous regions will not be split and heterogeneous dark areas will not be falsely merged. In addition, the inverse gradient filter (Wang and Vagnucci, 1981) has also been implemented in order to give a lowpass response inside a region and an all-pass response on the region's contour (Kwon and Chellappa, 1993; Kocher and Kunt, 1986). This is an iterative scheme that employs a 3 × 3 mask of weighting coefficients. These coefficients are the normalized gradient inverses between the central pixel and its neighbors. If the image to be smoothed is expressed as an n × m array whose coefficients p(i, j) are the gray levels of the image pixels at (i, j), with i = 1 . . . n and j = 1 . . . m, the inverse of the absolute gradient at (i, j) is then defined as

δ(i, j : k, l) = 1 / |p(i + k, j + l) − p(i, j)|,

where k, l = −1, 0, 1, but k and l are not both equal to zero at the same time. This means that the δ(i, j : k, l) are calculated for the eight neighbors of (i, j); this neighborhood is denoted the vicinity V(i, j). If p(i + k, j + l) = p(i, j), then the gradient is zero and δ(i, j : k, l) is defined as 2. The proposed 3 × 3 smoothing mask is defined as

W(i, j) =
| w(i − 1, j − 1)  w(i − 1, j)  w(i − 1, j + 1) |
| w(i, j − 1)      w(i, j)      w(i, j + 1)     |
| w(i + 1, j − 1)  w(i + 1, j)  w(i + 1, j + 1) |

where w(i, j) = 1/2 and

w(i + k, j + l) = (1/2) [Σ δ(i, j : k, l)]⁻¹ δ(i, j : k, l)

for k, l = −1, 0, 1, but not both 0 at the same time, the sum running over the vicinity V(i, j). The smoothed image is then given as

p̃(i, j) = Σ_{k=−1}^{1} Σ_{l=−1}^{1} w(i + k, j + l) p(i + k, j + l).

Finally, the anisotropic diffusion filtering (Perona and Malik, 1990; Yon et al., 1996; Szirányi et al., 1998) is worth citing as a preprocessing method
Figure 15. Example of anisotropic diffusion applied to a natural image. On the left-hand side, the original image is displayed; on the right-hand side, the filtered version is represented.
because it is effective in smoothing image details while preserving edge information, as shown in Fig. 15.
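One pass of the gradient-inverse smoothing described in this section (after Wang and Vagnucci, 1981) might be sketched as follows; the centre weight of 1/2 is my reading of the scheme, and the names are illustrative:

```python
def inverse_gradient_smooth(p):
    """One iteration of gradient-inverse weighted smoothing on a 2D list;
    border pixels are left untouched."""
    n, m = len(p), len(p[0])
    out = [row[:] for row in p]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            delta = {}
            for k in (-1, 0, 1):
                for l in (-1, 0, 1):
                    if (k, l) == (0, 0):
                        continue
                    diff = abs(p[i + k][j + l] - p[i][j])
                    # equal neighbours get delta = 2, as in the text
                    delta[(k, l)] = 2.0 if diff == 0 else 1.0 / diff
            total = sum(delta.values())
            acc = 0.5 * p[i][j]        # centre weight 1/2 (assumed)
            for (k, l), d in delta.items():
                acc += 0.5 * (d / total) * p[i + k][j + l]
            out[i][j] = acc
    return out

img = [[10.0] * 5 for _ in range(5)]
img[2][2] = 20.0                        # isolated spike on a flat patch
sm = inverse_gradient_smooth(img)
print(round(sm[2][2], 1))   # 15.0: the spike is pulled toward its neighbours
print(round(sm[1][1], 2))   # 10.04: flat pixels next to the spike barely move
```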
C. Segmentation Techniques: Brief Overview This section introduces some of the commonly used methods for segmenting an image. Segmentation groups similar pixels into regions and separates those pixels that are considered dissimilar. It may be thought of as representing an image by a disjoint covering set of image regions (Biggar et al., 1988). Many segmentation methods have been developed in the past (Pal and Pal, 1993; Haralick, 1983) and it is generally the segmentation method that categorizes the coding technique. Most image segmentation techniques today are applied to video sequences; they thus have access to motion information, which is extremely useful in improving their performance. We have focused here on still image coding, but motion information remains an important HVS feature. Thus, in the following we will also refer to those techniques that integrate both spatial and temporal information to achieve better segmentation of the image. 1. Region Growing Region growing is a process that subdivides a (filtered) image into a set of adjacent regions whose gray-level variation within each region does not exceed a given threshold. The basic idea behind region growing is that, given a starting point within the image, the largest set of pixels whose gray level is within a specified interval is found. This interval is adaptive in that it is allowed to move higher or lower on the grayscale in order to intercept the maximum number of pixels. Figure 16 illustrates the concept of region growing for two contrasting images.
Figure 16. Images a) and b) are, respectively, the original test images ‘‘Table Tennis’’ and ‘‘Akiyo.’’ Images c) and d) are the corresponding segmentations obtained through region growing.
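A minimal region-growing sketch in the spirit of the description above; taking the acceptance interval around the running region mean is one simple way to make it adaptive (an illustrative choice, not the chapter's exact rule):

```python
# Flood-fill all 4-connected pixels whose gray level stays within a fixed
# threshold of the running mean of the region grown so far.

def grow_region(img, seed, thresh):
    n, m = len(img), len(img[0])
    region = {seed}
    total = float(img[seed[0]][seed[1]])
    stack = [seed]
    while stack:
        i, j = stack.pop()
        mean = total / len(region)          # adaptive acceptance interval
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if (0 <= ni < n and 0 <= nj < m and (ni, nj) not in region
                    and abs(img[ni][nj] - mean) <= thresh):
                region.add((ni, nj))
                total += img[ni][nj]
                stack.append((ni, nj))
    return region

img = [[10, 10, 10, 200],
       [10, 12, 11, 200],
       [10, 10, 10, 200]]
r = grow_region(img, seed=(1, 1), thresh=5)
print(len(r))  # 9: the dark block, excluding the bright column
```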
2. Split and Merge Split-&-merge algorithms (Pavlidis, 1982) segment the image into sets of homogeneous regions. In general, they are based around the quadtree (Samet, 1989) data structure. Initially the image is divided into a predefined subdivision; then, depending on the segmentation criteria, adjacent regions are merged if they have similar gray-level variations or a quadrant is further split if large variations exist. An example of this method is displayed in Fig. 17.
Figure 17. An example of quadtree decomposition of the image ‘‘Table Tennis.’’ Initial decomposition in square blocks is iteratively refined through successive split and merging steps.
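The split half of a split-&-merge pass can be sketched as a quadtree recursion (merging of similar neighbours is omitted for brevity; the homogeneity test on the gray-level range is an illustrative choice):

```python
# Recursively split a square block into quadrants while its gray-level
# range exceeds a homogeneity threshold; homogeneous blocks become leaves.

def split(img, top, left, size, thresh, leaves):
    vals = [img[top + i][left + j] for i in range(size) for j in range(size)]
    if size == 1 or max(vals) - min(vals) <= thresh:
        leaves.append((top, left, size))          # homogeneous leaf
        return
    h = size // 2
    for dt, dl in ((0, 0), (0, h), (h, 0), (h, h)):
        split(img, top + dt, left + dl, h, thresh, leaves)

img = [[0] * 8 for _ in range(8)]
for i in range(4):
    for j in range(4):
        img[i][j] = 100                            # one bright quadrant
leaves = []
split(img, 0, 0, 8, thresh=10, leaves=leaves)
print(leaves)  # [(0, 0, 4), (0, 4, 4), (4, 0, 4), (4, 4, 4)]
```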
Figure 18. Segmentation results obtained by applying the method proposed by Ziliani (1998).
3. K-Means Clustering K-means clustering is a segmentation method based on the minimization of the sum of squared distances from all points in a cluster to the cluster center. First, k initial cluster centers are taken and the image vectors are iteratively distributed among the k cluster domains. New cluster centers are then computed from these results in such a way that the sum of the squared distances from all points in a cluster to the new cluster center is minimized. It is interesting to note that this method can characterize each cluster and each pixel of the image with several features, including luminance, color, texture, etc., as described in Castagno (1998) and Ziliani (1998). In Fig. 18, an example of the segmentation obtained by applying the method proposed by Ziliani (1998) is presented. 4. Pyramidal Linking This method, proposed by Burt et al. (1981), uses a pyramid structure in which flexible links between the nodes of each layer are established. The base of the pyramid is the original image. The layers consist of nodes that comprise the feature values and other information, as described by Ziliani and Jensen (1998). The initial value for a node of a layer is obtained by computing the mean of a certain area in the layer below. This is done for all nodes in such a way that they correspond to partially overlapping regions. After this is done for the entire pyramid, father-son relationships are defined between the current layer and the layer below using those nodes that participated in the initial feature computation. Using these links, the feature values of all layers are updated again and afterwards new links are established. This is repeated until a stable state is reached. In Fig. 19 an example of the segmentation obtained by applying the method proposed in Ziliani and Jensen (1998) is represented.
Figure 19. These are the regions obtained by applying to ‘‘Table Tennis’’ the Pyramid Linking segmentation proposed by Ziliani and Jensen (1998).
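The k-means iteration of Section IV.C.3 can be sketched in a few lines; this toy version clusters scalar luminance values only, whereas the cited work clusters richer feature vectors:

```python
# Iteratively assign each value to the nearest centre, then move each
# centre to the mean of its assigned values.

def kmeans(values, centers, iters=20):
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in values:                       # assign to nearest centre
            idx = min(range(len(centers)), key=lambda c: (v - centers[c]) ** 2)
            clusters[idx].append(v)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

pixels = [10, 12, 11, 11, 200, 205, 198, 201]
print(sorted(round(c) for c in kmeans(pixels, centers=[0.0, 255.0])))
# [11, 201]: one dark and one bright cluster centre
```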
5. Graph Theory There are a number of image segmentation techniques that are based on the theory of graphs and their applications (Morris et al., 1986). A graph is composed of a set of ‘‘vertices’’ connected by ‘‘links.’’ In a weighted graph the vertices and links have weights associated with them. Each vertex need not necessarily be linked to every other, but if it is, the graph is said to be complete. A partial graph has the same number of vertices but only a subset of the links of the original graph. A ‘‘spanning tree’’ is a partial graph that is also a tree containing every vertex. A ‘‘shortest spanning tree’’ of a weighted graph is a spanning tree such that the sum of its link weights is a minimum over all possible spanning trees. To analyze images using graph theory, the original image must be mapped onto a graph. The most obvious way to do this is to map every pixel in the original image onto a vertex in the graph. Other techniques generate an initial over-segmentation of the image and map each region instead of each pixel. This reduces complexity and improves segmentation results because each node of the graph is already a coherent structure. Recently, Moscheni et al. (1998) have proposed an effective segmentation technique based on graphs. 6. Fractal Dimension The fractal dimension D is a characteristic of the fractal model (Mandelbrot, 1982), which is related to properties such as the length and surface of a curve. It provides a good measure of the perceived roughness of the surface of the image. Therefore, in order to segment the image, the fractal dimension is computed across the entire image. Various threshold values can then be used to segment the original image according to its fractal dimension.
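A box-counting estimate is one common way to compute the fractal dimension described above (the details below are my sketch, not the chapter's): occupied boxes N(s) are counted at shrinking box sizes s, and D is taken as the slope of log N(s) against log(1/s):

```python
import math

def box_count(points, box):
    """Number of box x box cells containing at least one point."""
    return len({(x // box, y // box) for (x, y) in points})

def fractal_dimension(points, size):
    sizes, counts = [], []
    box = size // 2
    while box >= 1:
        sizes.append(box)
        counts.append(box_count(points, box))
        box //= 2
    # least-squares slope of log N(s) against log(1/s)
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
           sum((x - mx) ** 2 for x in xs)

# A filled square should have dimension ~2; a straight line ~1.
square = [(x, y) for x in range(64) for y in range(64)]
line = [(x, 32) for x in range(64)]
print(round(fractal_dimension(square, 64), 1))  # 2.0
print(round(fractal_dimension(line, 64), 1))    # 1.0
```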
D. Texture Coding According to the scheme presented in Fig. 14, once segmentation of the image has been performed, it is necessary to code each defined region. Under the hypothesis that the segmentation has generated regions of homogeneous luminance, a first approach to coding their texture is the polynomial approximation presented in Section IV.D.1. However, we have already noted that the need for a semantic representation of the scene, which might be useful for dynamic coding applications, does not always correspond to the definition of homogeneous regions. In these cases, a more general approach such as the shape-adaptive DCT transform (Section IV.D.2) is used. 1. Polynomial Approximation In order to efficiently code the gray-level content of the regions, these are represented by an order-n polynomial. The basic idea behind polynomial fitting is to model the gray-level variation within a region by an order-n polynomial while ensuring that the MSE between the predicted and actual values is minimized. An order-0 polynomial represents each pixel in the region by the average intensity value of the region. An order-1 polynomial is represented by z = a + bx + cy, where z is the new intensity value at (x, y). 2. Shape-Adaptive DCT The shape-adaptive DCT (SADCT) proposed by Sikora (1995) and Sikora and Makai (1995) is currently very popular. The transform principles are the same as those introduced in Section III.C: the image is organized in N × N blocks of pixels as usual. Some of these will be completely inside the region to be coded while others will contain pixels belonging to the region and pixels outside it. For those blocks completely contained in the region to be coded, nothing differs from the standard DCT-based coder.
For those blocks that contain some pixels of the region to be coded, a shift of all the pixels of the original shape to the upper bound of the block is first performed. Each column is then transformed, based on the DCT transform matrix defined by Sikora and Makai (1995). Then another shift to the left bound of the block is performed. This is followed by a DCT transform of each line of coefficients. This final step provides the SADCT coefficients for the block. This algorithm is efficient because it is simple and it generates a total
number of coefficients corresponding to the number of pixels in the region to be coded. Its main drawback is the decorrelation of nonadjacent pixels that it introduces. Similar techniques also exist for the wavelet transform (Egger et al., 1996).
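The order-1 polynomial texture model of Section IV.D.1, z = a + bx + cy, can be fitted per region by least squares; the sketch below solves the 3 × 3 normal equations directly (the helper names are mine):

```python
# Least-squares fit of a plane z = a + b*x + c*y over a region of
# (x, y, z) samples, minimizing the MSE as described in the text.

def fit_plane(pixels):
    """pixels: list of (x, y, z). Returns (a, b, c)."""
    n = len(pixels)
    sx = sum(x for x, _, _ in pixels)
    sy = sum(y for _, y, _ in pixels)
    sz = sum(z for _, _, z in pixels)
    sxx = sum(x * x for x, _, _ in pixels)
    syy = sum(y * y for _, y, _ in pixels)
    sxy = sum(x * y for x, y, _ in pixels)
    sxz = sum(x * z for x, _, z in pixels)
    syz = sum(y * z for _, y, z in pixels)
    # Solve the normal equations A [a b c]^T = rhs by Cramer's rule.
    A = [[n, sx, sy], [sx, sxx, sxy], [sy, sxy, syy]]
    rhs = [sz, sxz, syz]
    def det3(M):
        return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
              - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
              + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))
    d = det3(A)
    coeffs = []
    for col in range(3):
        M = [row[:] for row in A]
        for r in range(3):
            M[r][col] = rhs[r]
        coeffs.append(det3(M) / d)
    return tuple(coeffs)

# A synthetic ramp region z = 5 + 2x + 3y is recovered exactly.
region = [(x, y, 5 + 2 * x + 3 * y) for x in range(4) for y in range(4)]
a, b, c = fit_plane(region)
print(round(a), round(b), round(c))  # 5 2 3
```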
E. Contours Coding As illustrated in Fig. 14, a segmentation-based coding approach requires a contour coding step in addition to a texture coding step. This is necessary to correctly reconstruct the shape of the regions defined during the segmentation step. Contour coding can be a complex problem. The simplest solution is to record every pixel position in the region in a bitmap-based representation. This is not the most efficient approach, but it can achieve good compression performance when combined with efficient statistical entropy coding. The trade-off between exact reconstruction of the region and efficient coding of its boundaries has been the subject of much research (Rosenfeld and Kak, 1982; Herman, 1990). Freeman chain coding (1961) is one of the earliest and most referenced techniques; it attempts to code region contours efficiently by representing the given contour with an initial starting position and a set of codes representing relative positions. The Freeman chain codes are shown in Fig. 20. In this coding process an initial starting point on the curve is stored via its (x, y) coordinates. The position of the next point on the curve is then located. This position can be in one of the eight locations illustrated in Fig. 20. If, for example, the next position is (x, y − 1), then the pixel lies in position 2 according to Freeman and hence a 2 is output. This pixel is then taken as the current position and the coding process repeats. The coding
Figure 20. Each number represents the Freeman chain code for each possible movement of the central pixel.
40
N. D. BLACK, R. J. MILLAR, M. KUNT, M. REID, AND F. ZILIANI
is terminated when either the original start point has been reached (closed contour), or no further points on the curve can be found (open contour). The code is an efficient representation of the contour because at least 3 bits are required to store each code in the chain; however further gains can be achieved by applying entropy coders or lossy contour coding techniques to the contours. In addition to chain coding, other approaches have been investigated. We cite the geometrical approximation methods (Gerken, 1994; Schroeder and Mech, 1995) and the methods based on mathematical morphology (Briggar, 1995). A recent technique based on polygonal approximation of the contour that provides progressive and efficient compression of region contours is one that was proposed by Le Buhan et al. (1998).
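Freeman's scheme can be sketched in a few lines of Python. The direction numbering below (0 = east, proceeding counterclockwise, so that a move to (x, y + 1) is coded as 2) is a common convention and is assumed to match Fig. 20.

```python
# 8-neighbour moves and their Freeman codes (numbering assumed: 0 = east,
# counterclockwise, so that a move to (x, y + 1) is coded as 2).
CODES = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
         (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}
MOVES = {c: d for d, c in CODES.items()}

def chain_encode(points):
    """Store the starting (x, y) coordinates plus one 3-bit code per move
    between successive contour points."""
    codes = [CODES[(x1 - x0, y1 - y0)]
             for (x0, y0), (x1, y1) in zip(points, points[1:])]
    return points[0], codes

def chain_decode(start, codes):
    points = [start]
    for c in codes:
        dx, dy = MOVES[c]
        x, y = points[-1]
        points.append((x + dx, y + dy))
    return points
```

Because every move is one of only 8 directions, each code fits in 3 bits, and a run of codes is itself highly compressible by an entropy coder, as noted above.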
F. Region-Growing Techniques

Kocher and Kunt (1983) presented a technique based on region growing called contour texture modeling. The original image is preprocessed by the inverse gradient filter (Wang and Vagnucci, 1981) to remove picture noise in preparation for the region-growing process. After the growing process, a large number of small regions are generated, some of which must be eliminated. This elimination is necessary to reduce the number of bits required to describe the segmented image, and thus increase the compression ratio. It is performed by removing small regions and merging weakly contrasting regions, that is, regions whose gray-level variations differ only slightly. In this technique, contour coding is performed in stages. First, an orientation of each region contour is defined. Then spurious and redundant contour points are deleted, and small regions are merged with nearby valid regions. Finally, the contours are approximated by line and circle segments and coded through differential coding of the successive end-of-segment addresses. Texture coding is achieved by representing the gray-level variation within the region by an nth-order polynomial function. As a final step, pseudorandom noise is added in order to produce a natural-looking image.

Civanlar et al. (1986) present an HVS-based segmentation coding technique in which a variation of the centroid linkage region-growing algorithm (Haralick, 1983) is used to segment the image after preprocessing. In a centroid linkage algorithm the image is scanned in a set manner, for example, left to right or top to bottom. Each pixel is compared to the mean gray-level value of the already partially constructed regions in its neighborhood; if the values are close enough, the pixel is included in the region and a new mean is computed for the region. If no neighboring region has a close enough mean, the pixel is used to create a new segment whose mean is the pixel value. In the technique by Civanlar et al. (1986), this centroid linkage algorithm applies, with HVS visibility thresholds as the closeness criterion: if the intensity difference is less than the visibility threshold, the pixel is joined to the existing segment; if the intensity differences between the pixel and its neighbor segments are all larger than the thresholds, a new segment is started.

The work by Kocher and Kunt (1983) provides the facility to preset the approximate compression ratio prior to the operation. This is achieved by setting the maximum number of regions that will be generated by the region-growing process. The results obtained via their method are good both in terms of reconstructed image quality and compression ratio. However, they point out that the performance of their technique in terms of image compression and quality is optimal for images that are naturally composed of a small number of large regions. Civanlar et al. (1986) report good image quality and compression ratios comparable to those achieved by Kocher and Kunt (1983).

G. Split-and-Merge-Based Techniques

Kwon and Chellappa (1993) and Kunt et al. (1987) present a technique based on a merge-and-threshold algorithm. After the image has been preprocessed, the intensity difference between two adjacent regions is found. If this difference is less than or equal to k, which is initialized to 1, the regions are merged and the average of the intensities is computed. A histogram of the merged image is computed and, if separable clusters exist, the above steps are repeated; otherwise, the original image is segmented by thresholding the intensity clusters. When the overall process is complete, the regions obtained may be represented by an nth-order polynomial.
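The centroid linkage scan described under Region-Growing Techniques can be sketched as follows. This is a minimal raster-scan version; the fixed closeness threshold stands in for the per-pixel HVS visibility thresholds of Civanlar et al. (1986).

```python
def centroid_linkage(image, threshold):
    """Raster-scan centroid linkage: each pixel joins the neighbouring
    region whose running mean is closest, provided the difference is
    within `threshold`; otherwise it starts a new region."""
    h, w = len(image), len(image[0])
    labels = [[-1] * w for _ in range(h)]
    sums, counts = [], []              # per-region accumulators
    for y in range(h):
        for x in range(w):
            v = image[y][x]
            # candidate regions: left and upper neighbours, already labelled
            cand = set()
            if x > 0:
                cand.add(labels[y][x - 1])
            if y > 0:
                cand.add(labels[y - 1][x])
            best, best_d = None, None
            for r in cand:
                d = abs(v - sums[r] / counts[r])
                if d <= threshold and (best is None or d < best_d):
                    best, best_d = r, d
            if best is None:           # no region close enough: new segment
                labels[y][x] = len(sums)
                sums.append(float(v))
                counts.append(1)
            else:                      # join and update the region mean
                labels[y][x] = best
                sums[best] += v
                counts[best] += 1
    return labels
```

Note that the region mean is updated after every accepted pixel, so the "centroid" against which later pixels are compared drifts with the region's content.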
The preceding method of segmentation extracts only homogeneous regions; for textured regions, a large number of small homogeneous regions will therefore be generated. In terms of image coding, it is more efficient to treat a textured area as one region rather than as several small regions. Therefore, in addition to the homogeneous region extraction scheme, textured regions are also extracted and combined with the results of the uniform region segmentation. Multiple features are used in the texture extraction process, along with a recursive thresholding method using multiple 1D histograms. First, the image is regarded as one region. A histogram of the features to be used in the extraction process is then obtained within each region. The histogram showing the best clusters is selected, and the corresponding region is then split by thresholding. These steps are repeated for all regions until none of the histograms exhibits clustering. Final segmentation is achieved by labeling the extracted uniform regions. If the area of such a region is covered by more than 50% of a textured region of type "X," the uniform region is labeled as a textured region of that type. Adjacent uniform regions are merged with a texture region if they share at least one similar texture feature with the corresponding texture region.

In terms of coding, uniform regions are represented by polynomial reconstructions. Texture regions are represented by a texture synthesis technique using the Gaussian Markov random field (GMRF) model (Chellappa et al., 1985). Encoding the image therefore involves storing information about the contours of the regions, the polynomial coefficients of the uniform regions, the GMRF parameters of the textured regions, and a means of identifying each region. Variable numbers of bits are allocated to each component.

Another approach based on a split-and-merge algorithm is that by Cicconi and Kunt (1977). Segmentation is performed by initially clustering the image using a standard K-means clustering algorithm (Section IV.C.3). Once the image has been segmented into feature-homogeneous areas, an attempt is made to further reduce the redundancy inside the regions by looking for symmetries within them. To do this, the medial axis transformation (MAT) (Pavlidis, 1982) is used for shape description. The MAT provides, for each region, a curve-based region descriptor: it corresponds closely to the skeleton that would be produced by applying sequential erosion to the region. Values along the MAT represent the distance to the edge of the region and can be used to find its minimum and maximum widths; the histogram of these values gives the variation of the width.
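The distance values that the MAT carries can be computed with a breadth-first distance transform; the ridge of local maxima of this map approximates the medial axis. A 4-connected BFS (city-block distance) is used here for simplicity, and the region is assumed not to touch the image border.

```python
from collections import deque

def distance_to_edge(mask):
    """For every pixel inside the region (mask True), the number of
    4-connected steps to the nearest background pixel.  Peaks of this
    map lie on the medial axis; the values along it give the local
    half-width of the region, as described above."""
    h, w = len(mask), len(mask[0])
    dist = [[None] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                dist[y][x] = 0          # background: distance zero
                queue.append((y, x))
    while queue:                        # multi-source BFS from the background
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist
```

Twice the maximum of this map estimates the maximum width of the region, and the histogram of the ridge values gives the width variation mentioned in the text.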
Once the MAT has been found, a linear prediction of each pixel on one side of the MAT can be constructed from pixels symmetrically chosen on the other side. Coding of the segmented image is performed in two stages: contour coding and texture coding. As the MAT associated with a region can be reconstructed from a given contour, only the contours have to be coded. The texture components in one part of the region with respect to the MAT may be represented by a polynomial function. However, representing the polynomial coefficients precisely requires a large number of bits. Therefore, the proposed method suggests defining the positions of 6 pixels, which are found in the same way for all regions, and then quantizing these 6 values. The quantized values allow the unique reconstruction of the approximating second-order polynomial.

Both of the preceding techniques are similar in that they employ a split-and-merge algorithm to segment the original image. However, Kwon and Chellappa (1993) state that better compression ratios may be obtained by segmenting the image into uniform and textured regions. These regions may be coded separately; in particular, the textured regions may be more efficiently represented by a texture synthesis method, such as a GMRF model, than by many small uniform regions. Cicconi and Kunt's method (1977) segments the image into uniform regions and, in addition, exploits further redundancy in these regions by identifying symmetry within them. The gray-level variation within each of the uniform regions is represented using polynomial modeling, and Cicconi and Kunt further developed a method for reducing the storage requirements for the polynomial coefficients. Despite the different methods used to represent both the contours and the gray-level variations within the regions, both methods report similar compression ratios.

H. Tree/Graph-Based Techniques

Biggar et al. (1988) developed an image coding technique based on the recursive shortest spanning tree (RSST) algorithm (Morris et al., 1986). The RSST algorithm maps the original image onto a region graph so that each region initially contains only one pixel. Sorted link weights, associated with the links between neighboring regions in the image, are used to decide which link should be eliminated and therefore which regions should be merged. After each merge, the link weights are recalculated and resorted. The removed links define a spanning tree of the original graph. Once the segmentation is complete, the spanning tree is mapped back to image matrix form, thus representing the segmented image. The regions generated are defined by coding the lines that separate the pixels belonging to different regions. The coded segmented image consists of three sources: a list of coordinates from which to start tracing the edges; the edge description; and a description of the intensity profile within each region. Although the intensity profile within the region could be represented as a simple flat intensity plateau, it has been suggested by Kunt et al.
(1985) and Kocher and Kunt (1983) that a better result is achievable by a higher-order polynomial representation. Biggar et al. (1988) suggest that embedding the polynomial fitting procedure at each stage of the region-merging process, as Kocher and Kunt (1983) do, would be computationally too expensive. Therefore, in this case a flat intensity plane is used to generate the regions and polynomials are fitted after the segmentation is complete. The edge information is extracted from the segmented image using the algorithm for thin-line coding by Kaneko and Okudaira (1985).

A similar technique, based on the minimum spanning forest (MSF), is reported by Leou and Chen (1991). Segmentation and contour coding are performed exactly as described by Biggar et al. (1988); however, the intensity values within a segmented region are coded with a polyline representation. Here a texture extraction scheme is used, based on the assumption that light is cast overhead on the picture and that the gray values vary according to the distance to the corresponding region centroid. After texture extraction, the regions have a high pixel-to-pixel correlation. Therefore, for simplicity and efficiency, a polyline representation method is used to encode the texture. This is achieved by representing each row of the image by a polyline.

A different graph theory approach is presented by Kocher and Leonardi (1986), based on the region adjacency graph (RAG) data structure (Pavlidis, 1982). The RAG is again a classical map graph, with each node corresponding to a region and links joining nodes that represent adjacent regions. The basic idea of the segmentation technique is that a value representing the degree of dissimilarity between two adjacent regions is associated with each graph link. The link that exhibits the lowest degree of dissimilarity is removed, and the two nodes it connects are merged into one. This merging process is repeated until a termination criterion is reached. Once complete, the RAG representation is mapped back to the image matrix form, and thus a segmented image is created. The segmented image is coded using a polynomial representation of the regions and gives very good compression ratios.

All of the preceding methods are based on similar graph structures that enable the image to be mapped to graph form in order to perform segmentation. The techniques by Biggar et al. (1988) and Kocher and Leonardi (1986) both model the texture within the image via a polynomial modeling method. However, Kocher and Leonardi report much higher compression ratios than Biggar et al. (1988). Leou and Chen (1991) implement a segmentation technique identical to that presented by Biggar et al.
However, Leou and Chen point out that better compression ratios can be achieved by first performing a texture extraction process and then modeling the texture by polylines as opposed to polynomial functions. The compression ratio achieved via this method is an improvement on that reported by Biggar et al. (1988). A more recent graph-based segmentation technique is that proposed by Moscheni et al. (1998) and Moscheni (1997).
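The merge loop of the RSST algorithm can be sketched as follows. The link weight (absolute difference of region mean intensities) and the brute-force re-evaluation of weights after every merge are simplifications of Morris et al. (1986), whose algorithm keeps the sorted link list up to date incrementally.

```python
def rsst_segment(image, n_regions):
    """RSST sketch: start with one region per pixel; repeatedly remove the
    link between the two most similar adjacent regions (smallest difference
    of region means), merge them, and re-evaluate, until `n_regions` remain."""
    h, w = len(image), len(image[0])
    parent = list(range(h * w))                  # union-find forest
    sums = [float(image[i // w][i % w]) for i in range(h * w)]
    counts = [1] * (h * w)

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]        # path halving
            i = parent[i]
        return i

    links = set()                                # 4-neighbour adjacency
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                links.add((y * w + x, y * w + x + 1))
            if y + 1 < h:
                links.add((y * w + x, (y + 1) * w + x))

    regions = h * w
    while regions > n_regions:
        best = None                              # cheapest inter-region link
        for a, b in links:
            ra, rb = find(a), find(b)
            if ra == rb:
                continue
            wgt = abs(sums[ra] / counts[ra] - sums[rb] / counts[rb])
            if best is None or wgt < best[0]:
                best = (wgt, ra, rb)
        if best is None:
            break
        _, ra, rb = best
        parent[rb] = ra                          # merge rb into ra
        sums[ra] += sums[rb]
        counts[ra] += counts[rb]
        regions -= 1
    return [[find(y * w + x) for x in range(w)] for y in range(h)]
```

Stopping at a preset number of regions corresponds to the facility, noted for Kocher and Kunt (1983), of presetting the approximate compression ratio before coding begins.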
I. Fractal-Based Techniques

In the previous sections, various methods for image segmentation have been suggested that lend themselves to efficient compression of the image. Most of these techniques segment the image into regions of homogeneity; thus, when a highly textured image is encountered, the segmentation produces many small homogeneous regions. Jang and Rajala (1990, 1991) suggest a technique that segments the image in terms of textured regions. They also point out that in many cases previous segmentation-based coding methods are best suited to head-and-shoulders (closeup) images and that results obtained from complex natural images are often poor. In their technique the image is segmented into texturally homogeneous regions as perceived by the HVS. Three measurable quantities are identified for this purpose: the fractal dimension; the expected value; and the just noticeable difference. These quantities are incorporated into a centroid linkage region-growing algorithm that is used to segment the image into three texture classes: perceived constant intensity; smooth texture; and rough texture. An image coding technique appropriate for each class is then employed. The fractal dimension D of a block is thresholded to determine the class of the block: a value of D below a lower threshold indicates perceived constant intensity; a value between the lower and upper thresholds indicates smooth texture; and a value above the upper threshold indicates rough texture. After this segmentation process, the boundaries of the regions are represented as a two-tone image and coded using arithmetic coding. The intensities within each region are coded separately according to their class. Those of class 1, perceived constant intensity, are represented by the average value of the region. Class 2, smooth texture, and class 3, rough texture, are encoded by polynomial modeling. It should be noted from the description in Section IV.D.1 that polynomial modeling leads to some smoothing and hence may not be useful for rough texture. Therefore, it is not clear why Jang and Rajala chose this method of representation for the class 3 regions.
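The fractal dimension that drives this classification can be illustrated with a box-counting estimate. Jang and Rajala estimate D from the gray-level surface of a block; the binary box-counting version below only illustrates the principle, and the set of scales is arbitrary.

```python
import math

def box_counting_dimension(points, size, scales=(1, 2, 4, 8)):
    """Estimate D as the least-squares slope of log N(s) versus
    log(size / s), where N(s) is the number of s x s boxes containing
    at least one point of the set inside a size x size grid."""
    xs, ys = [], []
    for s in scales:
        # count occupied boxes at this scale
        boxes = {(x // s, y // s) for x, y in points}
        xs.append(math.log(size / s))
        ys.append(math.log(len(boxes)))
    # least-squares slope of the log-log plot
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den
```

A straight line yields D near 1 and a filled square D near 2, matching the intuition that rougher, more space-filling intensity surfaces have a larger fractal dimension.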
Each of the various segmentation techniques groups pixels according to some criterion, whether homogeneity, texture, or membership in a range of gray-level values. The problem that arises after segmentation is how to efficiently code the gray-level values within each region. The most basic representation of the gray level within a region is its mean value. This results in good compression, especially if the region is large, but the quality of the decoded image will be poor. In most cases the gray-level variation is approximated by a polynomial function of order two. The results obtained by polynomial approximation can be visually poor, especially for highly textured images. It is for this reason that more researchers are representing highly textured regions by texture synthesis techniques such as GMRF. These methods do not gain over the compression ratios obtained using polynomial approximation, but the quality of the reproduced image is claimed to be improved (Kwon and Chellappa, 1993). Another approach is to represent the gray-level variations by polylines, as was done by Leou and Chen (1991). This method is of similar computational complexity, but the results in terms of compression ratio and image quality are claimed to be better than those of polynomial reconstruction.

As stated by Jang and Rajala (1990, 1991), many of the aforementioned segmentation-based techniques are not sufficient when the input image is a natural one, that is, an image of a real scene. Such images may contain highly textured regions; when these are segmented using conventional methods, the resulting textured region is composed of a large number of small regions. These small regions are often merged or removed in order to increase the compression ratio, and as a result the decoded image appears very unnatural. Therefore, Jang and Rajala (1990, 1991) employed the fractal dimension to segment the image. This ensures that the image is segmented into areas that are similar in terms of surface roughness, the result of visualizing the image in 3D with the third dimension being gray-level intensity. However, once the image is segmented into regions of similar roughness, the method employed to code the identified areas is similar to that of traditional segmentation-based coding methods, that is, polynomial modeling. Such polynomial modeling, as reported by Kwon and Chellappa (1993), does not suffice for the representation of highly textured regions, and they suggest the use of texture synthesis instead.
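The second-order polynomial modeling referred to throughout this section amounts to a least-squares surface fit over the region's pixels. A minimal sketch, using the normal equations and Gaussian elimination:

```python
def fit_quadratic_surface(pixels):
    """Least-squares fit of a second-order polynomial
    f(x, y) = c0 + c1*x + c2*y + c3*x^2 + c4*x*y + c5*y^2
    to a region's gray levels, given as (x, y, value) triples."""
    def basis(x, y):
        return [1.0, x, y, x * x, x * y, y * y]

    # accumulate the normal equations A c = b
    A = [[0.0] * 6 for _ in range(6)]
    b = [0.0] * 6
    for x, y, v in pixels:
        phi = basis(x, y)
        for i in range(6):
            b[i] += phi[i] * v
            for j in range(6):
                A[i][j] += phi[i] * phi[j]
    # Gaussian elimination with partial pivoting
    for col in range(6):
        piv = max(range(col, 6), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 6):
            f = A[r][col] / A[col][col]
            for j in range(col, 6):
                A[r][j] -= f * A[col][j]
            b[r] -= f * b[col]
    # back substitution
    c = [0.0] * 6
    for i in range(5, -1, -1):
        c[i] = (b[i] - sum(A[i][j] * c[j] for j in range(i + 1, 6))) / A[i][i]
    return c
```

Only the six coefficients (plus the region contour) need to be stored to reconstruct an approximation of the region's gray levels; the smoothing this implies is exactly why rough textures are poorly served by it.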
Therefore, it may be concluded that a better segmentation-based coding method might employ the fractal dimension segmentation approach coupled with texture synthesis for textured region representation. As discussed in this section there are a number of different methods that can be used in the segmentation process. Table 2 summarizes the methods used in the techniques that employ a segmentation algorithm as part of the coding process.
TABLE 2
Texture Coding Employed in Segmentation Algorithms

Technique                                           Texture coding method
HVS-based segmentation (Civanlar et al., 1986)      Polynomial function
Segmentation-based (Kwon and Chellappa, 1993)       Polynomial function & GMRF
Symmetry-based coding (Cicconi and Kunt, 1977)      Polynomial function
RSST-based (Biggar et al., 1988)                    Polynomial function
MSF-based (Leou and Chen, 1991)                     Polynomial function
RAG-based (Kocher and Leonardi, 1986)               Polynomial function
Fractal dimension (Jang and Rajala, 1990)           Polynomial function

V. Summary and Conclusions

This chapter has reviewed second-generation image coding techniques. These techniques are characterized by their exploitation of the human visual system. It was noted that first-generation techniques are based on classical information theory and are largely statistical in nature; as a result, they tend to deliver compression ratios of approximately 2:1. The problem with the early techniques is that they ignore the semantic content of an image. In contrast, second-generation methods break an image down into visually meaningful subcomponents, typically the edges, contours, and textures of regions or objects in the image or video. Not surprisingly, therefore, these subdivisions are a strong theme in the emerging MPEG-4 standard for video compression. In addition, second-generation coding techniques often offer scalability, whereby the user can trade picture quality for increased compression.

An overview of the human visual system was presented to demonstrate how many of the more successful techniques closely resemble the operation of the human eye. It was explained that the HVS is particularly sensitive to sharp boundaries in a scene and why it finds gradual changes more difficult to identify, with the result that detail in such scenes can be missed. Early coding work made use of the frequency sensitivity of the eye, and it can be concluded that the eye is particularly efficient at contrast resolution.

The techniques considered herein were categorized into two broad approaches: transform-based coding and segmentation-based coding. Transform-based coding initially decomposes/transforms the image into low and high frequencies (cf. the HVS) to highlight those features that are significant to the HVS. It was observed that directional filtering is a technique that more closely matches the operation of the HVS. Following this initial stage, which is generally lossless, the transformed image is quantized and ordered according to visual importance. A range of methods, from simple uniform quantization to the more complex vector quantization, can achieve this, with differing visual results in terms of image quality. Much research is still ongoing to find a suitable measure of the effect of second-generation coding techniques on the visual quality of an image.
Such measures attempt to quantify the distortion introduced by a technique by providing a figure for the visual distance between the original image and the one that has been coded and decoded. Early measures used in first-generation image coding, such as MSE, are not necessarily optimal in the HVS domain.

The discrete cosine transform is the basis of many transform-based coding schemes. For example, it features in standards ranging from JPEG to MPEG-4. This can be attributed to its useful balance between complexity and performance, and it was noted that it approaches the optimal KL transform for highly correlated signals (such as natural images). The multiscale and pyramidal approaches to image coding are multiresolution approaches, again paralleling the HVS. In addition, they offer the possibility of progressive image transmission, an attractive feature. For these reasons, most of the current research in transform-based coding is focused on wavelet coding; indeed, this will be the basis of JPEG 2000, the new standard for still image coding. Although wavelet coding is a generalization of pyramidal coding, it does not increase the number of samples over the original and it also preserves important perceptual information such as edges. As an example of subband coding, wavelets allow each subband to be coded separately, allowing a greater bit allocation to those subbands considered to be visually important; for example, the power in natural images is concentrated in the lower frequencies. Unlike the DCT, wavelets produce no blocking artifacts, although they do have their own artifact, called "ringing," particularly around high-contrast edges. A number of different wavelets exist, and research is ongoing into criteria to assist in choosing the most suitable wavelet given the nature of the image.

Segmentation-based approaches to image coding segment the original image into a set of regions following a preprocessing step that removes noise and small details. Given a set of regions, it is then "only" necessary to record the contour of each region and to code the texture within it.
The segmentation approach aims to identify regions with semantic meaning (and hence a high correlation) rather than generic blocks. In addition, it is then possible to apply different codings to different regions as required. This is particularly important for video coding, where research is considering how to predict the importance of regions and hence the appropriate allocation of bandwidth. The drawback of the segmentation approach is that it is not currently possible to correctly analyze the semantic content of a generic image in real time.

This section of the chapter considered six approaches to segmentation: region growing; split and merge; K-means clustering; pyramidal linking; graph theory; and fractal dimension. All, except perhaps the first two, are still being actively researched. Methods for texture representation range from using the mean value, through polynomial approximations, to texture synthesis techniques. Both mean value and polynomial approximations (which tend to be second-order) yield poor quality images, especially if the image is highly textured. It must be remembered that the semantic representation does not always correspond to the definition of a homogeneous region. At present, the shape-adaptive DCT, particularly that proposed by Sikora (1995), is very popular. Most current research is concentrating on texture synthesis techniques, for example, GMRF. In coding the contours of segmented regions there is a trade-off between exactness and efficiency. Freeman's (1961) chain code is probably the most referenced technique in the literature, but it has been surpassed by techniques based on geometrical approximations, mathematical morphology, and the polygonal approximation of Le Buhan et al. (1998). We conclude that the best approach to segmentation-based coding is currently a technique that uses fractal dimensions for the segmentation phase and texture synthesis techniques for the texture representation.

The future of image and video coding will probably be driven by multimedia interaction. Coding schemes for such applications must support object functionalities such as dynamic coding and object scalability. Initial research is directed at how to define the objects. Such object-based coding is already being actively pursued in the field of medical imaging. The concept is to define 3D models of organs such as the heart and then, instead of sending an image, to send the parameters of the model that best match the current data, completed with an error function. This work is still in its infancy.

All of the techniques reviewed here have their relative merits and drawbacks. In practice, the choice of technique will often be influenced by nontechnical matters such as the availability of an algorithm or its inclusion in an imaging software library. A direct comparison between techniques is difficult, as each is based on different aspects of the HVS.
This is often the reason why a technique performs well on one type or source of images but not on others. Comparing the compression ratios of lossy techniques is meaningless if image quality is ignored. Therefore, until a quantitative measure of image quality is established, direct comparisons are not really possible. In the meantime, shared experience and experimentation for a particular application will have to remain the best means of determining the appropriateness of a given technique.
Acknowledgments The authors would like to thank Julien Reichel, Marcus Nadenau, and Pascal Fleury for their contributions and useful suggestions.
References

Akansu, A. N., Haddad, R. A., and Caglar, H. (1993). The binomial QMF-wavelet transform for multiresolution signal decomposition, IEEE Trans. Signal Proc., 41: 13—19.
Berger, T. (1972). Optimum quantizers and permutation codes, IEEE Trans. Information Theory, 18: (6), 756—759.
Biggar, M., Morris, O., and Constantinides, A. (1988). Segmented-image coding: Performance comparison with the discrete cosine transform, IEE Proc., 135: (2), 121—132.
Blackwell, H. (1946). Contrast thresholds of the human eye, Jour. Opt. Soc. Am., 36: 624—643.
Brigger, P. (1995). Morphological Shape Representation Using the Skeleton Decomposition: Application to Image Coding, PhD Thesis No. 1448, EPFL, Lausanne, Switzerland.
Burt, P. J. and Adelson, E. H. (1983). The Laplacian pyramid as a compact image code, IEEE Trans. Comm., COM-31: (4), 532—540.
Burt, P., Hong, T. H., and Rosenfeld, A. (1981). Segmentation and estimation of region properties through cooperative hierarchical computation, IEEE Trans. Syst., Man, Cyber., SMC-11: 802—809.
Caglar, H., Liu, Y., and Akansu, A. N. (1993). Optimal PR-QMF design for subband image coding, Jour. Vis. Comm. and Image Represent., 4: 242—253.
Castagno, R. (1998). Video Segmentation Based on Multiple Features for Interactive and Automatic Multimedia Applications, PhD Thesis, Swiss Federal Institute of Technology, Lausanne.
Chellappa, R., Chatterjee, S., and Bagdazian, R. (1985). Texture synthesis and compression using Gaussian-Markov random fields, IEEE Trans. Syst. Man Cybern., SMC-15: 298—303.
Chen, W. H., Smith, C. H., and Fralick, S. C. (1977). A fast computational algorithm for the discrete cosine transform, IEEE Trans. Comm., 1004—1009.
Cicconi, P. and Kunt, M. (1977). Symmetry-based image segmentation, Proc. Soc. Photo-Optical Instrumentation Eng. (SPIE), 378—384.
Cicconi, P. et al. (1994). New trends in image data compression, Comput. Med. Imaging Graph., 18: (2), 107—124.
Civanlar, M., Rajala, S., and Lee, X. (1986). Second generation hybrid image-coding techniques, SPIE-Visual Comm. Image Process., 707: 132—137.
Coifman, R. R. and Wickerhauser, M. V. (1992). Entropy-based algorithms for best basis selection, IEEE Trans. Inform. Theory, 38: 713—718.
Cornsweet, T. N. (1970). Visual Perception, New York: Academic Press.
Crochiere, R. E., Weber, S. A., and Flanagan, J. L. (1976). Digital coding of speech in sub-bands, Bell Syst. Tech. J., 1069—1085.
Croft, L. H. and Robinson, J. A. (1994). Subband image coding using watershed and watercourse lines of the wavelet transform, IEEE Trans. Image Proc., 3: 759—772.
Croisier, A., Esteban, D., and Galand, C. (1976). Perfect channel splitting by use of interpolation, decimation, tree decomposition techniques, Proc. Int'l Conf. Inform. Sci./Systems, 443—446.
Daubechies, I. (1988). Orthonormal bases of compactly supported wavelets, Comm. Pure Appl. Math., 41: 909—996.
Daubechies, I. (1993). Orthonormal bases of compactly supported wavelets II: Variations on a theme, SIAM J. Math. Anal., 24: 499—519.
Daubechies, I. and Sweldens, W. (1998). Factoring wavelet transforms into lifting steps, J. Fourier Anal. Appl., 4: (4), 245—267.
Dinstein, K., Rose, A., and Herman, A. (1990). Variable block-size transform image coder, IEEE Trans. Comm., 2073—2078.
Ebrahimi, T. (1997). MPEG-4 video verification model: A video encoding/decoding algorithm based on content representation, Signal Processing: Image Comm., 9: (4), 367—384.
Ebrahimi, T. et al. (1995). Dynamic coding of visual information, technical description ISO/IEC JTC1/SC2/WG11/M0320, MPEG-4, Swiss Federal Institute of Technology.
Egger, O., Fleury, P., and Ebrahimi, T. (1996). Shape adaptive wavelet transform for zerotree coding, European Workshop on Image Analysis and Coding, Rennes.
Fleury, P. and Egger, O. (1997). Neural network based image coding quality prediction, ICASSP, Munich.
Fleury, P., Reichel, J., and Ebrahimi, T. (1996). Image quality prediction for bitrate allocation, in IEEE Proc. ICIP, 3: 339—342.
Freeman, H. (1961). On the encoding of arbitrary geometric configurations, IRE Trans. Electronic Computers, 10: 260—268.
Froment, J. and Mallat, S. (1992). Second generation compact image coding with wavelets, in Wavelets: A Tutorial in Theory and Applications, C. K. Chui, ed., San Diego: Academic Press.
Gerken, P. (1994). Object-based analysis-synthesis coding of image sequences at very low bit rates, IEEE Trans. Circuits, Systems Video Technol., 4: (3), 228—235.
Golomb, S. (1966). Run length encodings, IEEE Trans. Inf. Theory, IT-12: 399—401.
Haber, R. N. and Hershenson, M. (1973). The Psychology of Visual Perception, New York: Holt, Rinehart and Winston.
Haralick, R. (1983). Image segmentation survey, in Fundamentals in Computer Vision, Cambridge: Cambridge University Press.
Harris, F. (1978). On the use of windows for harmonic analysis with the discrete Fourier transform, Proc. IEEE, 66: (1), 51—83.
Herman, G. T. (1990). On topology as applied to image analysis, Computer Vision Graphics Image Proc., 52: 409—415.
Huffman, D. (1952). A method for the construction of minimum-redundancy codes, Proc. IRE, 40: (9), 1098—1101.
Ikonomopoulos, A. and Kunt, M. (1985). High compression image coding via directional filtering, Signal Processing, 8: 179—203.
Jain, A. K. (1989). Image transforms, in Fundamentals of Digital Image Processing, Chapter 5, Englewood Cliffs, NJ: Prentice-Hall Information and System Science Series.
Jain, A. K. (1976). A fast Karhunen-Loeve transform for a class of random processes, IEEE Trans. Comm., COM-24: 1023—1029.
Jang, J. and Rajala, S. (1991). Texture segmentation-based image coder incorporating properties of the human visual system, in Proc. ICASSP'91, 2753—2756.
Jang, J. and Rajala, S. (1990). Segmentation based image coding using fractals and the human visual system, in Proc. ICASSP'90, 1957—1960.
Jayant, N., Johnston, J., and Safranek, R. (1993). Signal compression based on models of human perception, Proc. IEEE, 81: (10), 1385—1422.
Johnston, J. (1980). A filter family designed for use in quadrature mirror filter banks, Proc. Int'l Conference on Acoustics, Speech and Signal Processing (ICASSP), 291—294.
Jordan, L., Ebrahimi, T., and Kunt, M. (1998). Progressive content-based compression for retrieval of binary images, Computer Vision and Image Understanding, 71: (2), 198—212.
Kaneko, T. and Okudaira, M. (1985). Encoding of arbitrary curves based on the chain code representation, IEEE Trans. Comm., COM-33: (7), 697—707.
Karhunen, K. (1947). Über lineare Methoden in der Wahrscheinlichkeitsrechnung, Ann. Acad. Science Fenn., A.I: (37).
Kocher, M. and Kunt, M. (1983). Image data compression by contour texture modelling, Proc. Soc. Photo-Optical Instrumentation Eng. (SPIE), 397: 132—139.
52
N. D. BLACK, R. J. MILLAR, M. KUNT, M. REID, AND F. ZILIANI
Kocher, M. and Leonardi, R. (1986). Adaptive region growing technique using polynomial functions for image approximation, Signal Processing, 11: 47—60. Kovalevsky, X. (1993). Topological foundations of shape analysis: Shape picture-math. description shape grey-level images, NATO ASI Series F: Comput. Systems Sci., 126: 21—36. Kunt, M. (1998). A vision of the future of multimedia technology, Mobil Multimedia Communication, Chapter 41, pp. 658—669, New York: Academic Press. Kunt, M., Benard, M., and Leonardi, R. (1987). Recent results in high compression image coding, IEEE Trans. Circuits and Systems, CAS-34: 1306—1336. Kunt, M., Ikonomopoulos, A., and Kocher, M. (1985). Second generation image coding, in Proc. IEEE, 73: (4), 549—574. Kwon, O. and Chellappa, R. (1993). Segmentation-based image compression, Optical Engineering, 32: (7), 1581—1587. Lai, Yung-Kai and Kuo, C,-C. Jay (1998a). Wavelet-based perceptual image compression, IEEE International Symposium on Circuits and Systems, Monterey, California, May 31—June 3, 1998. Lai, Yung-Kai and Kuo, C.-C. Jay (1998b). Wavelet image compression with optimized perceptual quality, Conference on ‘‘Applications of Digital Image Processing XXI,’’ SPIE’s Annual Meeting, San Diego, CA, July 19—24, 1998. Leou, F. and Chen, Y. (1991). A contour based image coding technique with its texture information reconstructed by polyline representation, Signal Processing, 25: 81—89. Lewis, S. and Knowles, G. (1992). Image compression using the 2-D wavelet transform, IEEE Trans. Image Processing, 1: 244—250. Lin, Fu-Huei and Mersereau, R. M. (1996). Quality measure based approaches to MPEG encoding, in Proc. ICIP, 3: 323—326, Lausanne, Switzerland, September 1996. Loe`ve, M. (1948). Fonctions aleatoires de second ordre, Processus stochastiques et mouvement brownien, P. Levvey, ed., Paris: Hermann. Lu, J., Algazi, V. R., and Estes, R. R. (1996). A comparative study of wavelet image coders, Optical Engineering, 35: (9), 2605—2619. Macq, B. 
(1989). Perceptual Transforms and Universal Entropy Coding For an Integrated Approach to Picture Coding, PhD Thesis, Universitie Catholique de Louvain, Louvain-laNeuve, Belgium. Mallat, S. G. (1989a). Multifrequency channel decomposition of images and wavelet models, IEEE Trans. Acoustics, Speech and Signal Processing, 37: 2091—2110. Mallat, S. G. (1989b). A theory of multiresolution signal decomposition: the wavelet representation, IEEE Trans. Pattern Anal. and Machine Intell., 11: 674—693. Mallat, S. G. (1991). Zero-crossing of a wavelet transform, IEEE Trans. Inform. T heory, 37: 1019—1033. Mallat, S. G. and Zhong, S. (1991). Compact coding from edges with wavelets, in Proc. ICASSP’91, 1745—2748. Mandlebrot, B. (1982). T he Fractal Geometry of the Nature, 1st edition, New York: Freeman. Mannos, J. L. and Sakrison, D. J. (1974). The effects of a visual fidelity criterion on the encoding of images, IEEE Trans. Information T heory, 20: (4), 525—536. Marques, F., Gasull, A., Reed, T., and Kunt, M. (1991). Coding-oriented segmentation based on Gibbs-Markov random fields and human visual system knowledge, in Proc. ICASSP’91, 2749-2752. Miyahara, M., Kotani, K., and Algazi, V. R. (1992). Objective picture quality scale (PQS) for image coding, Proc. SID Symposium for Image Display, 44: (3), 859—862. Morris, O., Lee, M., and Constantinides, A. (1986). Graph theory for image analysis: An approach based on the shortest spanning tree, IEEE Proc. F, 133: (2), 146—152. Moscheni, F. (1997). Spatio-Temporal Segmentation and Object Tracking: An Application to
SECOND-GENERATION IMAGE CODING
53
Second Generation V ideo Coding, PhD Thesis, Swiss Federa; Institute of Technology, Lausanne. Moscheni, F., Bhattacharjee, S., and Kunt, M. (1998). Spatiotemporal segmentation based on region merging, IEEE Trans. Pattern Anal. Mach. Intell., 20: (9), 897—915. Nadenau, M. and Reichel, J. (1999). Compression of color images with wavelets under consideration of HVS, Human V ision and Electronic Imaging, IV, San Jose. Narasimha, M. J. and Peterson, A. M. (1978). On the computation of the Discrete Cosine Transform, IEEE Trans. Comm., COM-26: 934—936. Osberger, W., Maeder, A. J., and Bergmann, N. (1996). A perceptually based quantization technique for MPEG encoding, Proc. SPIE, Human V ision and Electronic Imaging, 3299: 148—159, San Jose, CA. Pal, N. and Pal, S. (1993). A review on image segmentation techniques, in Pattern Recognition, 26: (9), 1277—1294. Pavlidis, T. (1982). Algorithms for Graphics and Image Processing. 1st edition, Rockville, MD: Computer Science Press. Pearson, D. E. (1975). Transmission and Display of Pictorial Information, London: Pentatech. Perona, P. and Malik, J. (1990) Scale-space and edge detection using anisotropic diffusion, IEEE Trans. Pattern Anal. Machine Intell., 12: (7), 629—639. Poirson, A. B. and Wandell, B. A. (1996). Pattern-color separable pathways predict sensitivity to simple colored patterns, V ision Research, 36: (4), 515—526. Poirson, A. B. and Wandell, B. A. (1993). Appearance of colored patterns: pattern-color separability, Optics and Image Science, 10: (12), 2458—2470. Ramchandran, K. and Vetterli, M. (1993). Best wavelet packet bases in a rate-distortion sense, IEEE Trans. Image Processing, 2: 160—175. Rose, A. (1973). V ision — Human and Electronic, New York: Plenum. Rosenfeld, A. and Kak, A. C. (1982). Digital Picture Processing, San Diego: Academic Press. Said, A. and Pearlman, W. A. (1996). A new, fast, and efficient image codec based on set partitioning in hierarchical trees, IEEE Trans. 
Circuits and Systems for V ideo Technology, 6: (3), 243—250. Salembier, P. and Kunt, M. (1992). Size-sensitive multiresolution decomposition of images with rank order based filters, Signal Processing, 27: 205—241. Samet, H. (1989a). Applications of Spatial Data Struc‘tures, 1st edition, Reading, MA: Addison-Wesley. Samet, H. (1989b). T he Design and Analysis of Spatial Data Structures, 1st edition, Reading, MA: Addison-Wesley. Schalkoff, R. J. (1989). Digital Image Processing and Computer V ision, Singapore: John Wiley and Sons. Schreiber, W. F. (1963). The mathematical foundation of the synmthetic highs systems, MIT, RLE Quart. Progr. Rep., No. 68, p. 140. Schreiber, W. F., Knapp, C. F., and Kay, N. D. (1959). Synthetic highs, an experimental TV bandwidth reduction system, Jour. SMPT E, 68: 525—537. Schroeder, M. R. and Mech, R. (1995). Combined description of shape and motion in an object based coding scheme using curved triangles, IEEE Int. Conf. Image Proc., Washington, 2: 390—393. Senoo, T. and Girod, B. (1992). Vector quantization for entropy coding of image subbands, IEEE Trans. Image Proc., 1: 526—533. Shapiro, J. M. (1993). Embedded image coding using zerotrees of wavelet coefficients, IEEE Trans. Signal Proc., 41: (12), 3445—3462. Sikora, T. (1995). Low complexity shape-adaptive DCT for coding of arbitrarily shaped image segments, Signal Processing: Image Communication, 7: 381—395.
54
N. D. BLACK, R. J. MILLAR, M. KUNT, M. REID, AND F. ZILIANI
Sikora, T. and Makai, B. (1995). Shape-adaptive DCT for generic coding of video, IEEE Trans. Circuits and Systems for V ideo Technol., 5: 59—62. Simoncelli, P. and Adelson, E. H. Efficient Pyramid Image Coder (EPIC), a public domain software available from URL: ftp://ftp.cis.upenn.edu/pub/eero/epic.tar.Z (Jan. 2000). Smith, M. J. T. and Barnwell, T. P. (1986). Exact reconstruction techniques for tree structured subband coders, IEEE Trans. Acoustics, Speech, and Signal Processing, 34: 434—441. Sziranyi, T., Kopilovic, I., Toth, B. P. (1998). Anisotropic diffusion as a preprocessing step for efficient image compression, 14th ICPR, Brisbane, IAPR, Australia, pp. 1565—1567, August 16—20, 1998. Taubman, D. and Zakhor, A. (1994). Multirate 3-D subband coding of video, IEEE Trans. Image Proc., 572—588. Toet, A. (1989). A morphological pyramid image decomposition, Pattern Recognition L etters, 9: 255—261. Vaidyanathan, P. P. (1987). Theory and design of M channel maximally decimated QMF with arbitrary M, having perfect reconstruction property, IEEE Trans. Acoustics, Speech, and Signal Processing. Van den Branden Lambrecht (1996). Perceptual Models and Architectures for V ideo Coding Applications, PhD Thesis, Swiss Federal Institute of Technology, Lausanne, Switzerland. Vetterli, M. (1984). Multi-dimensional subband coding: some theory and algorithms, IEEE Trans. Acoustics, Speech, and Signal Processing, 97—112. Wandell, A. (1995). Foundations of V ision, Sunderland, MA: Sinauer Associates, Inc. Publishers. Wang, T. P. and Vagnucci, A. (1981). Gradient inverse weighted smoothing scheme and the evaluation of its performance, Computer Graphics and Image Processing, 15, 167—181. Welch, T. (1977). A technique for high performance data compression, IEEE Computing, 17: (6), 8—19. Westen, S. J. P., Lagendijk, R. L., and Biemond, J. (1996a). Optimization of JPEG color image coding under a human visual system model, Proc. 
SPIE Human V ision and Electronic Imaging, 2657: 370—381, San Jose, CA. Westen, S. J. P., Lagendijk, R. L., and Biemond, J. (1996b). Spatio-temporal model of human vision for digital video compression, Proc. SPIE, Human V ision and Electronic Imaging, 3016: 260—268. Winkler, S. (1998). A perceptual distortion metric for digital color images, in Proc. ICIP, 1998, Chicago, IL, 3: 399—403. Woods, J. and O’Neil, S. (1986). Subband coding of images, IEEE Trans. Acoustics, Speech, and Signal Processing, 1278—1288. You, Y., Xu, W., Tannenbaum, A., and Kaveh, M. (1996). Behavioral analysis of anisotropic diffusion in image processing, IEEE Trans. Image Processing, 5: (11), 1539—1553. Zhou, Z. and Venetsanopoulos, A. N. (1992). Morphological methods in image coding, Proc. Int’l Conf. Acoust., Speech, and Signal Processing ICASSP, 3: 481—484. Ziemer, R. Tranter, W., and Fannin, D. (1989). Signals and Systems: Continuous and Discrete, 2nd edition, New York: Macmillan. Ziliani, F. (1998). Focus of attention: an image segmentation procedure based on statistical change detection, Internal Report 98.02, LTS, Swiss Federal Institute of Technology, Lausanne, Switzerland. Ziliani, F. and Jensen, B. (1998). Unsupervised image segmentation using the modified pyramidal linking approach, Proc. IEEE Int. Conf. Image Proc., ICIP’98.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 112
The Aharonov-Bohm Effect — A Second Opinion WALTER C. HENNEBERGER Department of Physics, Southern Illinois University, Carbondale, IL 62901-4401
I. Introduction
Objections to the "standard" interpretation of the Aharonov-Bohm effect are discussed in detail. In particular, it may not be interpreted as a "scattering" effect.
II. The Vector Potential
The role of the vector potential in the AB effect is discussed. The transverse vector potential is shown to be related to the electromagnetic momentum, which is, indeed, a physical quantity.
III. Dynamics of the Aharonov-Bohm Effect
A rigorous proof is given that, in Coulomb gauge, (e/c)A is just the electromagnetic momentum of the electron. Thus, in Coulomb gauge, A is an observable. The longitudinal part of A carries no physics; it is merely a computational convenience.
IV. Momentum Conservation in the Aharonov-Bohm Effect
In the AB effect, there is no force on the electron. The electron does, however, exert a force on the flux whisker or solenoid. The force on the solenoid and the time rate of change of electromagnetic momentum constitute an action-reaction pair.
V. Stability of the AB Effect
The reason for the stability of the fringe pattern is discussed.
VI. The AB Effect Can Not Be Shielded
The interaction of a passing electron with a superconducting shield is discussed.
VII. Interaction of a Passing Classical Electron with a Rotating Quantum Cylinder
The solenoid is represented as a charged, rotating cylinder. It is shown that the rotating cylinder suffers a phase shift which is equal and opposite to that of the electron in an AB experiment. It is shown further that this result follows directly from classical mechanics.
VIII. Solution of the Entire Problem of the Closed System
A correct solution of the AB problem is given. The problem involves three degrees of freedom, not two, as is usually thought. The solution involves bringing the Lagrangian of the problem to normal coordinates, and subsequently quantizing the system. It is shown that problems in quantum theory do not (if canonical transformations are allowed) have unique solutions.
IX. The Interior of the Solenoid
A semiclassical theory of an electron in a constant magnetic field is given. The correct treatment of this problem also involves three degrees of freedom. It is shown that the Berry phase has a dynamic origin.
X. Ambiguity in Quantum Theory, Canonical Transformations, and a Program for Future Work
Other examples of ambiguities in quantum theory are cited. The oldest and best known of these is the p · A vs. r · E question. It is argued that one obtains better solutions of problems by eliminating velocity-dependent potentials by means of suitable canonical transformations.
References
I. Introduction In 1949, Ehrenberg and Siday proposed an experiment for detecting phase shifts in an interference pattern due to the presence of a magnetic field confined to a region not accessible to the electrons. Ten years later, Aharonov and Bohm made a detailed study of such a system, in which a
Figure 1a. Idealized Aharonov-Bohm experiment.
coherent beam of electrons is directed around two sides of a solenoid. Since the de Broglie wavelength (which is a gauge-dependent quantity) depends upon the vector potential A, quantum theory predicts a shift in the interference pattern that one obtains. The experiment is typically carried out by concealing a solenoid or a whisker of flux between two slits, as shown in Figure 1a. The slit system serves the purpose of preventing the electrons from entering the flux-carrying regions, as well as providing an initial interference pattern that serves as a basis upon which to compare the pattern with the flux present. Figure 1b shows the result of Möllenstedt and Bayh (1962), obtained by stopping the interference pattern down to a narrow region and moving the rectangular stop vertically as the current in the solenoid was increased. One sees clearly that the Aharonov-Bohm (AB) effect is a right-left shift in an interference pattern. This is the phenomenon that is observed, and this is the phenomenon that the theorist must explain.
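The magnitude of the shift is set by the enclosed flux. For reference (the standard expression, stated here for completeness rather than taken from the text):

```latex
\Delta\varphi \;=\; \frac{e}{\hbar c}\oint \mathbf{A}\cdot d\mathbf{l} \;=\; \frac{e\Phi}{\hbar c},
```

where $\Phi$ is the flux enclosed between the two interfering beams.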
Figure 1b. Result of Möllenstedt and Bayh.
Figure 2. An AB interference pattern of a flux whisker at the center of a wide slit.
It is possible, in principle, to obtain an effect without the slit system, but the experimental difficulties involved are greater and the interference pattern would not be as clear. A theoretical result based on a flux whisker of zero thickness is shown in Figure 2. In this computation (Shapiro and Henneberger, 1989) (based on the Feynman path integral method), the whisker is at the center of a very wide single slit. The result shows clearly the contribution of the slit edge, as well as the interference pattern of the flux whisker. There has been ample experimental evidence for the AB effect, from the early experiments to the elegant experiments of Tonomura and coworkers. This writer has never doubted the existence of an AB effect. Physics is, after all, an experimental science. Quantum theory predicts an AB effect, and the theory, at least in a limited fashion, appears to be well understood by the experimental community that works in electron optics. The point at which this author dissents strongly from the viewpoint of his theoretical colleagues is on the topic of AB "scattering." In 1959, Aharonov and Bohm treated the problem described here as a scattering problem, with electrons being scattered by an external vector potential. In units in which $\mu = \hbar = 1$ ($\mu$ is the electron mass, and $\hbar$ is Planck's constant divided by $2\pi$),
the Hamiltonian of the system is

$$H = \frac{1}{2}\left(\mathbf{P} - \frac{e}{c}\mathbf{A}(\mathbf{r})\right)^{2} \qquad (1)$$

with $A_r = 0$ and $A_\phi = \Phi/2\pi r$. $\Phi$ is the total flux in the whisker or solenoid. AB found the stationary states

$$\psi_m = J_{|m+\alpha|}(kr)\,e^{im\phi}, \qquad m = 0, \pm 1, \pm 2, \ldots \qquad (2)$$

which led to the differential cross section

$$\frac{d\sigma}{d\phi} = \frac{\sin^{2}(\pi\alpha)}{2\pi k\,\sin^{2}(\phi/2)} \qquad (3)$$

where $\alpha = -e\Phi/ch$ and $k$ is the electron momentum. The AB calculation has been repeated many times. It is generally agreed that the mathematics of the derivation is flawless. The problem lies with the result. One may make the following objections.

1. The total cross section is divergent. Of course, a divergent cross section also occurs in Coulomb scattering. In the Coulomb case, however, one recognizes that this result is due to the infinite range of the Coulomb force. In the AB effect, there is no force at all.

2. $d\sigma/d\phi$ is symmetric in $\phi$. Experiments all give the right-left shift in an interference pattern. The AB result is not even qualitatively correct. It is not reasonable to believe that one can change the symmetry properties of the result by going to thinner flux whiskers.

3. The velocity operator is

$$\mathbf{v} = \mathbf{P} - \frac{e}{c}\mathbf{A}(\mathbf{r}) = -i\nabla - \frac{e}{c}\mathbf{A}(\mathbf{r}). \qquad (4)$$

It has components

$$v_x = v_r\cos\phi - v_\phi\sin\phi \qquad (5a)$$
$$v_y = v_r\sin\phi + v_\phi\cos\phi. \qquad (5b)$$

It is convenient to introduce the operators

$$v_+ = v_x + iv_y \qquad (6a)$$
$$v_- = v_+^{\dagger} = v_x - iv_y. \qquad (6b)$$

By straightforward substitution, we have

$$v_+ = -ie^{i\phi}\,\frac{\partial}{\partial r} + \frac{e^{i\phi}}{r}\,\frac{\partial}{\partial\phi} + \frac{i\alpha e^{i\phi}}{r} \qquad (7a)$$

$$v_- = v_+^{\dagger} = -ie^{-i\phi}\,\frac{\partial}{\partial r} - \frac{e^{-i\phi}}{r}\,\frac{\partial}{\partial\phi} - \frac{i\alpha e^{-i\phi}}{r}. \qquad (7b)$$
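As a quick numerical sanity check (not part of the original text; the values of alpha, k, m, and r below are arbitrary), one can verify that the radial factor produced by applying $v_+$ to an AB state $J_{m+\alpha}(kr)e^{im\phi}$ is proportional to $J_{m+1+\alpha}(kr)$, in accordance with the Bessel recurrence relations invoked in the discussion that follows:

```python
import math

def bessel_j(nu, x, terms=40):
    # Truncated ascending series for J_nu(x); adequate for moderate x
    return sum((-1) ** j / (math.factorial(j) * math.gamma(j + nu + 1))
               * (x / 2.0) ** (2 * j + nu) for j in range(terms))

# Arbitrary sample values: 0 < alpha < 1, wavenumber k, state index m, radius r
alpha, k, m, r = 0.3, 1.0, 2, 3.0
nu = m + alpha  # |m + alpha| for m >= 0

f = lambda rr: bessel_j(nu, k * rr)      # radial part of the AB state psi_m
h = 1.0e-6
dfdr = (f(r + h) - f(r - h)) / (2 * h)   # numerical d/dr of J_nu(kr)

# Radial factor produced by v_+ (the overall factor i e^{i(m+1)phi} removed):
lhs = -dfdr + (nu / r) * f(r)
# The Bessel recurrence predicts k * J_{nu+1}(kr), the radial part of psi_{m+1}:
rhs = k * bessel_j(nu + 1, k * r)
```

The agreement of `lhs` and `rhs` is just the recurrence $(\nu/x)J_\nu(x) - J_\nu'(x) = J_{\nu+1}(x)$ evaluated numerically.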
In the Hilbert space of functions that vanish at the origin (the electrons are excluded from the flux-carrying region), we have

$$[v_+, v_-] = 0 \qquad (8)$$

and

$$H = \tfrac{1}{2}\,v_+ v_- . \qquad (9)$$

We see that $[H, \mathbf{v}] = 0$, so that $\mathbf{v}$ is an integral of motion. This implies that the electrons move in a straight line with constant speed while they are being scattered. This is not, as is generally claimed, a quantum effect. It is a contradiction. In a world in which statements are simultaneously true and false, there can be no such thing as knowledge, and science becomes an illusion. The situation becomes still worse (Henneberger, 1981). The canonical angular momentum operator is

$$M = -i\,\partial/\partial\phi. \qquad (10)$$

The commutation relations for $M$, $v_+$, and $v_-$ are

$$[M, v_+] = v_+ \qquad (11a)$$
$$[M, v_-] = -v_- \qquad (11b)$$
$$[M, H] = 0. \qquad (11c)$$

Thus, $M$ and $H$ commute, so that we may use the eigenvalue $m$ of the angular momentum operator to characterize the eigenstates that comprise the basis of a Hilbert space of states having energy $k^2/2$. Therefore, if $M|m\rangle = m|m\rangle$, then Eq. (11a) implies that

$$M v_+|m\rangle = (m+1)\,v_+|m\rangle \qquad (12a)$$

and Eq. (11b) implies that

$$M v_-|m\rangle = (m-1)\,v_-|m\rangle. \qquad (12b)$$

Therefore $v_+$ is an operator that raises the eigenvalues of $M$. Similarly, $v_-$ is a lowering operator. For clarity, we restrict $\alpha$ to the range $0 < \alpha < 1$. Other values of $\alpha$ can be treated by making obvious changes in the following discussion. Let us consider the action of the operators $v_+$ and $v_-$ on the AB states of Eq. (2). This is illustrated in Fig. 3a. We see that the operator $v_+$ can be used to generate the positive $m$ states from the state $|0\rangle$. The operator $v_-$ may similarly be used to generate the negative $m$ states from the state $|{-1}\rangle$. These results follow easily from the recurrence relations for Bessel functions. However, there are two distinct chains of eigenstates that are not linked to each other by $v_+$ and $v_-$. From the recurrence relations for Bessel functions,
Figure 3a. Action of $v_+$ and $v_-$ on Aharonov-Bohm eigenstates.
we see that $v_-|0\rangle$ involves $J_{\alpha-1}(kr)$. This function is not in the Hilbert space of AB eigenfunctions. It is infinite at the z axis, while the AB eigenfunctions all vanish there. Applying $v_-$ twice to the state $|0\rangle$ produces a function that is not square integrable over any region containing the z axis. Similarly, the function $v_+|{-1}\rangle$ does not lie in the AB Hilbert space (Fig. 3a). Pauli (1939, 1958) has emphasized that such states are unsuitable. Pauli wrote that "A general criterion for the admissibility of eigenfunctions, which does not assume single-valuedness at the outset, was given by W. Pauli. This says that the repeated application of operators corresponding to physical properties may not lead to functions outside the space of quadratically integrable functions." (Author's translation.) There is good reason for Pauli's criterion. Any perturbation leading to nonvanishing matrix elements between states would eventually lead to a state that is not normalized. Thus, the AB states are not suitable for any perturbation calculation unless one makes further assumptions that modify the system (Henneberger, 1981). It is noteworthy that, if one chooses the multivalued functions

$$\psi_l = J_{|l|}(kr)\,e^{il\phi}\,e^{-i\alpha\phi}, \qquad l = 0, \pm 1, \pm 2, \ldots \qquad (13)$$

one obtains a consistent set of normalizable states, which is closed under the operators $v_+$ and $v_-$, as shown in Fig. 3b.
Figure 3b. Action of $v_+$ and $v_-$ on eigenstates of Eq. (13).
It is further noteworthy that the propagator that satisfies the integral equation

$$\psi(\mathbf{r}, t) = \int K(\mathbf{r}, \mathbf{r}'; t)\,\psi(\mathbf{r}', 0)\,d\mathbf{r}' \qquad (14)$$

is given by

$$K(\mathbf{r}', \mathbf{r}; t) = \sum_n \psi_n^*(\mathbf{r}')\,\psi_n(\mathbf{r})\,\exp(-iE_n t/\hbar), \qquad (15)$$

where the $\psi_n(\mathbf{r})$ are the stationary state solutions of the Schrödinger equation. Insertion of the states of Eq. (13) into Eq. (15) yields the Feynman propagator. Theoretical results that show a displacement in an interference pattern in AB calculations are all based on the Feynman path integral method. It should be noted that there is considerable literature involving the path integral method in which electron paths winding around the flux one or more times are included. In this writer's opinion, such paths are forbidden
by quantum theory, as the velocity vector is an integral of the motion. The result of this section leads to an interesting debacle. On one hand, the correct set of eigenstates for the AB problem "must be" the states of Eq. (13). On the other hand, these states "can not be" the correct ones, as they are not single valued. Yang (1983) has written: "We emphasize that to challenge the single valuedness requirement of the wave function is to challenge the very foundation of quantum mechanics itself." How is this dilemma to be resolved? It turns out that these two points of view are not contradictory. The reader is urged to read on.
II. The Vector Potential

The vector potential is probably the most misunderstood function in physics. It is generally agreed that the AB effect is an effect of the vector potential. However, this statement takes on meaning only when the role of the vector potential in physics is clarified. Conventional wisdom states that the vector potential is not a physical quantity because of the freedom to make gauge transformations. This view, while not completely wrong, has spawned many papers that are almost metaphysical. Let us bring the vector potential back to the real world by considering the following experiment. A parallel plate capacitor having plates of area A separated by a distance d is discharged in a uniform magnetic field B, as shown in Fig. 4. The magnetic field permeates all of space. The resistance in the circuit is R. At t = 0, the switch S is closed and a current i flows in the circuit. The forces on the wire that do not cancel are those on a projection of the wire on the distance d. The force to the right at any time is F = Bid/c, and the final momentum of the capacitor, wire, resistance, and switch (assumed to be rigidly connected) is

$$\int F\,dt = \frac{Bd}{c}\int i\,dt = \frac{BQd}{c}.$$

We will not be surprised to find that momentum is conserved. The initial electromagnetic momentum is

$$\mathbf{P}_{em} = \frac{1}{4\pi c}\int \mathbf{E}\times\mathbf{B}\,d\mathbf{r} = \frac{EBAd}{4\pi c} = \frac{BQd}{c}.$$

Let us now compute $eA_x/c$ in Landau gauge. We have

$$A_x = -By, \qquad A_y = A_z = 0.$$
Figure 4. Apparatus demonstrating potential momentum.
If we let y = 0 be the plane of the positively charged plate, then we find

$$eA_x/c = (-Q/c)(-Bd) = QBd/c.$$

In his lectures on relativity at Purdue University, Belinfante described the quantity $e\mathbf{A}/c$ as a "potential momentum." The motivation there was the fact that $\boldsymbol{\pi}$ (kinetic momentum) and $W/c$, where $W$ is the proper energy, form a 4-vector, as do $eV$ and $e\mathbf{A}$. The reader will recall that the proper energy of a particle is $\sqrt{\pi^2 c^2 + \mu^2 c^4}$. Therefore, $e\mathbf{A}/c$ is a "potential momentum." The example of the discharging capacitor shows vividly the nature of this potential momentum. Here, potential momentum is converted into kinetic momentum, just as a ball rolling down a plane has its potential energy converted into kinetic energy.
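The momentum bookkeeping in the capacitor example can be checked with arbitrary numbers (a sketch, not from the original; all numerical values are hypothetical, Gaussian units throughout):

```python
import math

# Hypothetical Gaussian-unit values: charge, plate area, separation, field, c
Q, A, d, B, c = 2.0, 5.0, 0.1, 3.0, 3.0e10

E = 4.0 * math.pi * Q / A                      # field between the plates
p_field = E * B * A * d / (4.0 * math.pi * c)  # (1/4 pi c) * integral of E x B over the volume A*d
p_impulse = (B * d / c) * Q                    # integral of F dt = (Bd/c) * integral of i dt
```

Both routes give $BQd/c$: the initial field momentum is exactly the mechanical impulse delivered during the discharge.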
So it is in the AB effect. The flux whisker or solenoid is penetrated by the electric field of the passing electron. (It is not the tail of the wave function that penetrates the flux, as is sometimes thought.) At this point, the reader will, no doubt, object strenuously on two grounds:

1. In the AB effect, there is no force on the electron.
2. The elegant experiments of Tonomura et al. (1986a,b) have shown that the AB effect persists when there is no net electric field in the flux-carrying region.

Each of these objections will be discussed in detail. However, let us proceed with a logical development of the theory. We return to a discussion of the vector potential. Any vector potential that is due to localized sources can be uniquely decomposed into two vector functions:

$$\mathbf{A}(\mathbf{r}) = \mathbf{A}^t(\mathbf{r}) + \mathbf{A}^l(\mathbf{r}) \quad\text{with}\quad \nabla\cdot\mathbf{A}^t(\mathbf{r}) = 0 \quad\text{and}\quad \nabla\times\mathbf{A}^l(\mathbf{r}) = 0. \qquad (16)$$

$\mathbf{A}^t(\mathbf{r})$, the "transverse part" of $\mathbf{A}(\mathbf{r})$, is given by Jackson (1975)

$$\mathbf{A}^t(\mathbf{r}) = \frac{1}{4\pi}\,\nabla\times\nabla\times\int \frac{\mathbf{A}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}'. \qquad (17)$$

The "longitudinal part" of $\mathbf{A}(\mathbf{r})$ is given by

$$\mathbf{A}^l(\mathbf{r}) = -\frac{1}{4\pi}\,\nabla\int \frac{\nabla'\cdot\mathbf{A}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}'. \qquad (18)$$

Two facts immediately come to one's attention: (1) The "transverse part" is a global concept. One must know $\mathbf{A}(\mathbf{r})$ everywhere in space in order to extract the transverse part. (2) The transverse part of a vector potential will, in general, not be unique if there are fields at infinity. Thus, a uniform magnetic field in the z direction may be described in Landau gauge (already discussed) or by $A_\phi = \tfrac{1}{2}Br$, $A_r = 0$, $A_z = 0$. Another example is the flux whisker in the AB problem. Here $\mathbf{A}$ may be given by $A_\phi = \Phi/(2\pi r)$, $A_r = 0$, and also by $A_x = 0$, $A_y = \Phi H(x)\delta(y)$. In the latter example, $H(x)$ is the Heaviside step function: $H(x) = 0$ for $x < 0$, $H(x) = 1$ for $x > 0$. $\delta(y)$ is the usual Dirac delta function. It is clear that

$$\oint \mathbf{A}\cdot d\mathbf{l} = \Phi$$

for all paths that encircle the origin, and zero for all paths that do not. This particular vector potential allows the flux whisker to be treated as a phase plate that covers the positive x axis.
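A numerical sketch (not in the original; the flux value is arbitrary) confirming the two statements about $\oint \mathbf{A}\cdot d\mathbf{l}$ for the whisker potential $A_\phi = \Phi/2\pi r$:

```python
import math

Phi = 2.5  # arbitrary flux value

def A_field(x, y):
    # A_phi = Phi/(2 pi r) in Cartesian components: Phi/(2 pi r^2) * (-y, x)
    r2 = x * x + y * y
    return (-Phi * y / (2.0 * math.pi * r2), Phi * x / (2.0 * math.pi * r2))

def loop_integral(cx, cy, radius, n=20000):
    # Midpoint-rule line integral of A . dl around a circle centered at (cx, cy)
    total = 0.0
    for i in range(n):
        t0 = 2.0 * math.pi * i / n
        t1 = 2.0 * math.pi * (i + 1) / n
        x0, y0 = cx + radius * math.cos(t0), cy + radius * math.sin(t0)
        x1, y1 = cx + radius * math.cos(t1), cy + radius * math.sin(t1)
        ax, ay = A_field(0.5 * (x0 + x1), 0.5 * (y0 + y1))
        total += ax * (x1 - x0) + ay * (y1 - y0)
    return total

enclosing = loop_integral(0.0, 0.0, 1.0)      # path encircles the whisker
non_enclosing = loop_integral(2.0, 0.0, 0.5)  # path does not
```

The first loop returns (numerically) the full flux $\Phi$; the second returns zero, since $\nabla\times\mathbf{A}$ vanishes everywhere off the whisker.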
It is the unique (in all physically realizable problems) transverse part of the vector potential that carries all of the physics. This function is, in every case, due to the penetration of some magnetic flux-carrying region by the electron's electric field. This was already known to Thomson (1904), and it appears in a text by Konopinski (1981). A rigorous proof will be given in the following section. $\mathbf{A}^l(\mathbf{r})$ carries no physics. One takes advantage of the existence of $\mathbf{A}^l(\mathbf{r})$ in order to do manifestly covariant calculations. All of the physics is contained in the electric and magnetic fields, and hence, in the transverse part of the vector potential. In any problem involving a magnetic field, one should imagine the charged particle's electric field lines intersecting magnetic field lines. This is what the transverse vector potential (the physical part of the vector potential!) is all about.

III. Dynamics of the Aharonov-Bohm Effect

In order to fully understand the forces involved in the AB effect, it is essential to consider the effect of the electron's own fields on the solenoid. We follow the approach of Zhu and Henneberger (1990). The interaction energy in the AB problem is purely magnetic.

$$E_{int} = \frac{1}{4\pi}\int \mathbf{B}_e(\mathbf{r}'-\mathbf{r})\cdot\mathbf{B}_s(\mathbf{r}')\,d\mathbf{r}' \qquad (19)$$

with

$$\mathbf{B}_e = \frac{\mathbf{v}}{c}\times\mathbf{E}(\mathbf{r}'-\mathbf{r}) \qquad (20)$$

for an electron at point $\mathbf{r}$ moving with velocity $\mathbf{v}$. It is useful to compute

$$\frac{1}{4\pi}\,\nabla\int \mathbf{B}_e(\mathbf{r}'-\mathbf{r})\cdot\mathbf{B}_s(\mathbf{r}')\,d\mathbf{r}' = \frac{1}{4\pi c}\,\nabla\int \mathbf{v}\times\mathbf{E}(\mathbf{r}'-\mathbf{r})\cdot\mathbf{B}_s(\mathbf{r}')\,d\mathbf{r}' = \frac{1}{4\pi c}\,\nabla\int \mathbf{v}\cdot\mathbf{E}(\mathbf{r}'-\mathbf{r})\times\mathbf{B}_s(\mathbf{r}')\,d\mathbf{r}'$$
$$= \nabla(\mathbf{v}\cdot\mathbf{P}_{em}) = (\mathbf{v}\cdot\nabla)\mathbf{P}_{em} + \mathbf{v}\times(\nabla\times\mathbf{P}_{em}) = \frac{d}{dt}\mathbf{P}_{em} + \mathbf{v}\times(\nabla\times\mathbf{P}_{em}). \qquad (21)$$

We remind the reader that $\mathbf{P}_{em} = \frac{1}{4\pi c}\int \mathbf{E}\times\mathbf{B}_s\,d\mathbf{r}'$. As the reader has probably guessed, the term $\mathbf{v}\times(\nabla\times\mathbf{P}_{em})$ is just the Lorentz force. This
can be seen as follows:

$$\frac{1}{4\pi c}\,\nabla\times\int \mathbf{E}(\mathbf{r}'-\mathbf{r})\times\mathbf{B}_s(\mathbf{r}')\,d\mathbf{r}' = -\frac{1}{4\pi c}\int \mathbf{B}_s(\mathbf{r}')\,[\nabla\cdot\mathbf{E}(\mathbf{r}'-\mathbf{r})]\,d\mathbf{r}' + \frac{1}{4\pi c}\int [\mathbf{B}_s(\mathbf{r}')\cdot\nabla]\,\mathbf{E}(\mathbf{r}'-\mathbf{r})\,d\mathbf{r}'. \qquad (22)$$

The relation $\nabla\cdot\mathbf{E}(\mathbf{r}'-\mathbf{r}) = -\nabla'\cdot\mathbf{E}(\mathbf{r}'-\mathbf{r})$ yields

$$\nabla\times\mathbf{P}_{em} = \frac{1}{4\pi c}\int \mathbf{B}_s\,4\pi e\,\delta(\mathbf{r}'-\mathbf{r})\,d\mathbf{r}' + \frac{1}{4\pi c}\int [\mathbf{B}_s(\mathbf{r}')\cdot\nabla]\,\mathbf{E}(\mathbf{r}'-\mathbf{r})\,d\mathbf{r}'. \qquad (23)$$

The first term of Eq. (23) is $(e/c)\mathbf{B}_s(\mathbf{r})$. The second term can be shown to vanish. Let $\boldsymbol{\epsilon}$ be an arbitrary constant vector:

$$\boldsymbol{\epsilon}\cdot\int [\mathbf{B}_s(\mathbf{r}')\cdot\nabla]\,\mathbf{E}(\mathbf{r}'-\mathbf{r})\,d\mathbf{r}' = -\int [\mathbf{B}_s(\mathbf{r}')\cdot\nabla']\,[\boldsymbol{\epsilon}\cdot\mathbf{E}(\mathbf{r}'-\mathbf{r})]\,d\mathbf{r}' = -\int \nabla'\cdot\{[\boldsymbol{\epsilon}\cdot\mathbf{E}(\mathbf{r}'-\mathbf{r})]\,\mathbf{B}_s(\mathbf{r}')\}\,d\mathbf{r}' + \int [\boldsymbol{\epsilon}\cdot\mathbf{E}(\mathbf{r}'-\mathbf{r})]\,[\nabla'\cdot\mathbf{B}_s(\mathbf{r}')]\,d\mathbf{r}'. \qquad (24)$$

The first integral can be converted into a surface integral that vanishes at infinity. The second integral vanishes since $\nabla\cdot\mathbf{B}_s = 0$. We thus have the result
$$\frac{1}{4\pi}\,\nabla\int \mathbf{B}_e(\mathbf{r}'-\mathbf{r})\cdot\mathbf{B}_s(\mathbf{r}')\,d\mathbf{r}' = \frac{d\mathbf{P}_{em}}{dt} + \frac{e}{c}\,\mathbf{v}\times\mathbf{B}_s. \qquad (25)$$

Equation (25) provides insight into the dynamics of the AB problem. The Lorentz force is the time rate of change of the kinetic momentum $\boldsymbol{\pi}$. The left-hand side of Eq. (25) reads $\mathbf{F} = \nabla(E_{int})$, not $\mathbf{F} = -\nabla(E_{int})$, with $E_{int}$ given by Eq. (19). We are assuming that the current in the solenoid windings and the probability current density of the electron are kept constant. The consequences of this assumption will be discussed later. It is well known that the magnetic force between current-carrying conductors is given by $\mathbf{F} = \nabla(E_{int})$ when the currents are kept constant. Our force now reads

$$\frac{1}{4\pi}\,\nabla\int \mathbf{B}_e(\mathbf{r}'-\mathbf{r})\cdot\mathbf{B}_s(\mathbf{r}')\,d\mathbf{r}' = \frac{d\mathbf{P}}{dt} \qquad (26)$$

with $\mathbf{P} = \boldsymbol{\pi} + \mathbf{P}_{em}$. The usual statement that there is no force in the AB problem requires clarification. There is no mechanical force (i.e., no force on
the electron). However, the rate of change of total momentum is not zero, because the electric and magnetic fields of the electron penetrate the solenoid. (In cases in which the interior of the solenoid is shielded, the electric and magnetic fields of the electron interact with currents in the shielding materials. An extensive discussion of this will be given shortly.) In the preceding paragraphs, we have seen that

$$\nabla\times\mathbf{P}_{em} = \frac{e}{c}\,\mathbf{B}. \qquad (27)$$

It is therefore clear that there exists a "natural" gauge in which

$$\mathbf{P}_{em} = \frac{e}{c}\,\mathbf{A}. \qquad (28)$$
The reader should not be surprised to learn that this "natural" gauge is the Coulomb gauge.
\[
\begin{aligned}
\mathbf{P}_{\rm em}(\mathbf{r}) &= \frac{1}{4\pi c}\int \mathbf{E}(\mathbf{r}'-\mathbf{r})\times[\nabla'\times\mathbf{A}(\mathbf{r}')]\, d^3r' \\
&= \frac{1}{4\pi c}\int \nabla'[\mathbf{E}(\mathbf{r}'-\mathbf{r})\cdot\mathbf{A}(\mathbf{r}')]\, d^3r'
- \frac{1}{4\pi c}\int [\mathbf{A}(\mathbf{r}')\cdot\nabla']\,\mathbf{E}(\mathbf{r}'-\mathbf{r})\, d^3r' \\
&\quad - \frac{1}{4\pi c}\int [\mathbf{E}(\mathbf{r}'-\mathbf{r})\cdot\nabla']\,\mathbf{A}(\mathbf{r}')\, d^3r'
- \frac{1}{4\pi c}\int \mathbf{A}(\mathbf{r}')\times[\nabla'\times\mathbf{E}(\mathbf{r}'-\mathbf{r})]\, d^3r'.
\end{aligned} \tag{29}
\]
The fourth term of Eq. (29) vanishes because ∇′ × E = 0 for the Coulomb field of the electron. The first term vanishes by a corollary of Gauss' theorem. The second integral vanishes in Coulomb gauge. (We assume that the external magnetic field has a finite source, so that the fields vanish at infinity.) Again, taking ε to be an arbitrary constant vector, we find
\[
\boldsymbol{\varepsilon}\cdot\int [\mathbf{A}(\mathbf{r}')\cdot\nabla']\,\mathbf{E}(\mathbf{r}'-\mathbf{r})\, d^3r'
= \int \nabla'\cdot\{\mathbf{A}(\mathbf{r}')\,[\boldsymbol{\varepsilon}\cdot\mathbf{E}(\mathbf{r}'-\mathbf{r})]\}\, d^3r'
- \int [\boldsymbol{\varepsilon}\cdot\mathbf{E}(\mathbf{r}'-\mathbf{r})]\,\nabla'\cdot\mathbf{A}(\mathbf{r}')\, d^3r'. \tag{30}
\]
The first integral of Eq. (30) can be converted into a vanishing surface integral. The second term vanishes in Coulomb gauge, ∇′ · A(r′) = 0. The
third integral of Eq. (29) is then
\[
-\frac{1}{4\pi c}\int [\mathbf{E}(\mathbf{r}'-\mathbf{r})\cdot\nabla']\,\mathbf{A}(\mathbf{r}')\, d^3r'
= \frac{1}{4\pi c}\int \mathbf{A}(\mathbf{r}')\,[\nabla'\cdot\mathbf{E}(\mathbf{r}'-\mathbf{r})]\, d^3r'
= \frac{1}{c}\int \mathbf{A}(\mathbf{r}')\, e\,\delta(\mathbf{r}'-\mathbf{r})\, d^3r'
= \frac{e}{c}\,\mathbf{A}(\mathbf{r}). \tag{31}
\]
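The identification P_em = (e/c)A can be checked numerically for the textbook case of an ideal infinite solenoid (uniform field B inside radius R, zero outside) with the electron a distance d > R from the axis. The z′ integration can be done analytically (∫ dz′/(ρ² + z′²)^{3/2} = 2/ρ²), leaving a two-dimensional integral over the cross section. The following is an illustrative sketch, not part of the original text; all parameter values (d = 3, R = B = e = c = 1, Gaussian-style units) are arbitrary choices for the check.

```python
import math

def p_em_transverse(d, R=1.0, B=1.0, e=1.0, c=1.0, n=300):
    """Transverse component of (1/4*pi*c) * integral of E_e x B over the
    solenoid interior, for an electron at distance d from the axis.
    The z' integral has been done analytically, leaving
    P = (e*B/(2*pi*c)) * double-integral over the disk of
        (d - x)/((d - x)**2 + y**2)."""
    total = 0.0
    for i in range(n):                      # polar midpoint grid on the disk
        rho = (i + 0.5) * R / n
        for j in range(n):
            phi = (j + 0.5) * 2.0 * math.pi / n
            x, y = rho * math.cos(phi), rho * math.sin(phi)
            total += (d - x) / ((d - x) ** 2 + y ** 2) * rho
    total *= (R / n) * (2.0 * math.pi / n)  # area weights
    return e * B * total / (2.0 * math.pi * c)

d = 3.0
numeric = p_em_transverse(d)
# Coulomb-gauge vector potential outside the solenoid: A = Phi/(2*pi*d)
coulomb = (1.0 * math.pi * 1.0 ** 2) / (2.0 * math.pi * d)
print(numeric, coulomb)
```

Both numbers agree (here 1/6), confirming that the field-momentum integral reproduces (e/c)A with A evaluated in Coulomb gauge.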
Thus, we have a rigorous proof that for any problem involving sources of finite extent in space, the electromagnetic momentum is given by (e/c)A(r), where A(r) is the uniquely defined vector potential in Coulomb gauge.

IV. Momentum Conservation in the Aharonov-Bohm Effect

We consider the more general problem of a moving electron in or near a magnetic field, as discussed by Al-Jaber and Henneberger (1992). We consider a source of flux that is rigid and fixed in spatial orientation. Let a fixed point in the body of the source have a displacement R. Let r be the displacement vector of the electron. The interaction energy of the current source with the electron is
\[
E_{\rm int} = \frac{1}{4\pi}\int \mathbf{B}_e(\mathbf{r}'-\mathbf{r})\cdot\mathbf{B}_s(\mathbf{r}'-\mathbf{R})\, d^3r'. \tag{32}
\]
Equation (32) is a trivial generalization of Eq. (19). The force on the whisker is
\[
\mathbf{F}_s = \nabla_{\mathbf{R}}\,\frac{1}{4\pi}\int \mathbf{B}_e(\mathbf{r}'-\mathbf{r})\cdot\mathbf{B}_s(\mathbf{r}'-\mathbf{R})\, d^3r'. \tag{33}
\]
In the nonrelativistic limit
\[
\mathbf{B}_e(\mathbf{r}'-\mathbf{r}) = \frac{\mathbf{v}}{c}\times\mathbf{E}(\mathbf{r}'-\mathbf{r}), \tag{34}
\]
so that F_s is given by
\[
\begin{aligned}
\mathbf{F}_s &= \nabla_{\mathbf{R}}\,\frac{1}{4\pi c}\int [\mathbf{v}\times\mathbf{E}(\mathbf{r}'-\mathbf{r})]\cdot\mathbf{B}_s(\mathbf{r}'-\mathbf{R})\, d^3r' \\
&= \nabla_{\mathbf{R}}\,\mathbf{v}\cdot\frac{1}{4\pi c}\int \mathbf{E}(\mathbf{r}'-\mathbf{r})\times\mathbf{B}_s(\mathbf{r}'-\mathbf{R})\, d^3r' \\
&= \nabla_{\mathbf{R}}\,[\mathbf{v}\cdot\mathbf{P}_{\rm em}(\mathbf{r}-\mathbf{R})]
= -\nabla\,[\mathbf{v}\cdot\mathbf{P}_{\rm em}(\mathbf{r}-\mathbf{R})].
\end{aligned}
\]
We have seen in Eq. (21) that
\[
\nabla(\mathbf{v}\cdot\mathbf{P}_{\rm em}) = \frac{d\mathbf{P}_{\rm em}}{dt} + \frac{e}{c}\,\mathbf{v}\times\mathbf{B}_s. \tag{35}
\]
We thus have
\[
\mathbf{F}_s + \frac{d\mathbf{P}_{\rm em}}{dt} + \frac{e}{c}\,\mathbf{v}\times\mathbf{B}_s = 0, \tag{36}
\]
where F_s is the force exerted on the source of the external field. In the case of the AB effect, the Lorentz force is zero. The force on the flux whisker or solenoid is the negative of the time rate of change of the electromagnetic momentum. If no other forces act on the flux whisker, the total momentum of the flux whisker plus that of the electromagnetic field is conserved. The forces on the field and on the flux whisker form an action-reaction pair. The role of the electromagnetic momentum is central to the Aharonov-Bohm and Aharonov-Casher effects. This has been discussed by several authors (Goldhaber, 1989; Zhu and Henneberger, 1990).
V. Stability of the AB Effect

In the previous section, we have assumed the flux in the solenoid to be absolutely constant. Let us examine what this assumption entails. We assume an ideal solenoid of n turns per cm. In order to maintain the solenoid flux constant, the current must be well regulated. The passing electron induces a voltage in a length dz of the windings given by
\[
d\mathcal{E} = -\frac{n\, dz}{c}\,\frac{d}{dt}\int B_{e,z}(\mathbf{r})\, da, \tag{37}
\]
where the integration is over the cross section of the solenoid. The total induced voltage in the solenoid is therefore
\[
\mathcal{E} = -\frac{n}{c}\,\frac{d}{dt}\int B_{e,z}\, d^3r, \tag{38}
\]
where the integration is over the volume of the solenoid. To maintain the current (and thus, the solenoid flux) fixed, the current source must produce an additional voltage given by the negative of Eq. (38). The source must therefore produce an additional instantaneous power
\[
P = \frac{nI}{c}\,\frac{d}{dt}\int B_{e,z}\, d^3r = \frac{1}{4\pi}\,\frac{d}{dt}\int \mathbf{B}_s\cdot\mathbf{B}_e\, d^3r. \tag{39}
\]
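The sign claim made below for Eq. (39) can be illustrated numerically. For a nonrelativistic electron moving along x with impact parameter b, the z component of its magnetic field in the plane z = 0 is B_z = (ev/c)(y − b)/[(x − vt)² + (y − b)²]^{3/2}; integrating this over the solenoid cross section for passage on one side and then the other shows that the induced flux, and hence the power, changes sign with the side of passage. This is an illustrative sketch with arbitrary units (e = v = c = 1), not part of the original text.

```python
import math

def electron_flux(b, t, R=1.0, v=1.0, e=1.0, c=1.0, n=200):
    """Flux of the passing electron's B_z through the solenoid cross
    section (disk of radius R centered at the origin, z = 0 plane).
    Electron position: (v*t, b, 0), velocity along +x."""
    total = 0.0
    for i in range(n):                      # polar midpoint grid on the disk
        rho = (i + 0.5) * R / n
        for j in range(n):
            phi = (j + 0.5) * 2 * math.pi / n
            x, y = rho * math.cos(phi), rho * math.sin(phi)
            s2 = (x - v * t) ** 2 + (y - b) ** 2
            total += e * v / c * (y - b) / s2 ** 1.5 * rho
    return total * (R / n) * (2 * math.pi / n)

left = electron_flux(+2.0, 0.5)   # electron passes on one side
right = electron_flux(-2.0, 0.5)  # ... and on the other
print(left, right)
```

The two fluxes come out equal in magnitude and opposite in sign, so the compensating power delivered by the current source flips sign with the side on which the electron passes.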
Thus, the additional field energy must be supplied by the current source of the solenoid. It is interesting that, in an ideal AB experiment, this power is positive or negative depending on whether the electron passes to the right
THE AHARONOV—BOHM EFFECT — A SECOND OPINION
71
or to the left of the solenoid. In an ideal AB experiment, the power source itself must become a quantum system! The conditions of an ideal AB experiment clearly can not be realized in the laboratory. There are many magnetic fields acting on the solenoid that together induce a larger voltage than the field of the passing electron, for example, fluctuations in the flux due to current fluctuations, fields of other passing charged particles, and the field due to the electron's own magnetic moment. In the following, it is shown that small voltage fluctuations that are not correlated with the motion of the electron do not affect the fringe pattern. The wave properties of a particle in quantum theory depend upon the canonical momentum (the sum of π and P_em). Let δΦ be the fluctuation in the flux that occurs in time δt due to all random causes. This fluctuation in the flux induces an electric field on an electron
\[
E_\phi = -\frac{1}{2\pi r c}\,\frac{\delta\Phi}{\delta t}, \tag{40}
\]
which gives rise to the momentum change
\[
\delta\pi_\phi = eE_\phi\,\delta t = -\frac{e\,\delta\Phi}{2\pi r c}.
\]
The change in P_em is (with A in Coulomb gauge, of course)
\[
\delta P_{{\rm em},\phi} = \frac{e}{c}\,\delta A_\phi = \frac{e\,\delta\Phi}{2\pi r c}. \tag{41}
\]
We see that
\[
\delta p_\phi = \delta\pi_\phi + \delta P_{{\rm em},\phi} = 0. \tag{42}
\]
The total momentum (and hence, the local wavelength) is unaffected by variations in the flux. Were it not for Eq. (42), observations of the AB effect would be much more difficult. Equation (42) guarantees the stability of the fringes.
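Equations (40)-(42) can be checked with a short numerical experiment: impose an arbitrary flux history Φ(t), accumulate the azimuthal impulse ∫ eE_φ dt delivered to the electron by the induced field, and compare it with the change in (e/c)A_φ. The flux function below is an arbitrary invention for the demonstration; this sketch (arbitrary units) is not part of the original text.

```python
import math

e, c, r = 1.0, 1.0, 2.0        # arbitrary units; electron at radius r

def flux(t):
    # an arbitrary, smooth flux fluctuation
    return 5.0 + 0.3 * math.sin(3.0 * t) + 0.1 * t

T, n = 4.0, 4000
dt = T / n
impulse = 0.0                   # integral of e * E_phi dt, Eq. (40)
for k in range(n):
    t = k * dt
    dphi_dt = (flux(t + dt) - flux(t)) / dt
    impulse += e * (-dphi_dt / (2 * math.pi * r * c)) * dt

# change in (e/c) A_phi with A_phi = Phi/(2*pi*r), Eq. (41)
delta_P = e * (flux(T) - flux(0.0)) / (2 * math.pi * r * c)
print(impulse + delta_P)        # ~ 0: canonical momentum unchanged, Eq. (42)
```

The two contributions cancel to rounding error for any flux history, which is the content of Eq. (42).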
VI. The AB Effect Can Not Be Shielded The first of the objections of Section II has now been discussed in detail, and we have seen the role played by the electromagnetic momentum, or ‘‘potential momentum’’ in the AB effect. We now turn our attention to the observation that fields can be shielded from the interior of the flux-carrying solenoid, and the AB effect persists.
The beautiful and well-known experiment of Tonomura and coworkers (1986a,b), which demonstrates the AB effect in a flux-carrying torus surrounded by a superconducting shield, has an aspect that is seldom discussed. The purpose of the shield was to ensure that no flux leakage occurs. The shield enabled flux quantization to be observed when it was at a superconducting temperature. A thoughtful person may wonder why, at superconducting temperature, there was any effect at all. Since neither electric nor magnetic fields penetrate the shield, how does information regarding the flux reach the electron? This question may be answered at two levels. The obvious answer is the superficial one: The AB effect is an effect of the vector potential, and the vector potential is always related to the flux by Stokes' theorem. Stokes' theorem is a mathematical identity. It can not be violated. Hence, the result of the Tonomura experiment is the only one possible. Many readers probably find this explanation satisfactory. On the other hand, we have seen that, in Coulomb gauge, the vector potential is related to the electromagnetic momentum of the passing electron by
\[
\frac{e}{c}\,\mathbf{A}(\mathbf{r}) = \mathbf{P}_{\rm em} = \frac{1}{4\pi c}\int \mathbf{E}_e(\mathbf{r}'-\mathbf{r})\times\mathbf{B}_s(\mathbf{r}')\, d^3r'.
\]
Thus, in a manner of speaking, the vector potential (modulo a gauge transformation) is just another name for the electromagnetic momentum. This appears to present a problem. If the electric field can not overlap the magnetic flux because of the presence of the superconducting shield, how can there be a vector potential? How is information regarding the flux transmitted to the electron? The answer to this question is not difficult. Consider an AB effect, either with a flux whisker or a solenoid along the z axis, or a toroidal flux, as in the Tonomura experiment. In either case, let the flux be completely surrounded by a superconducting shield. Imagine an electron passing in the x-y plane. The inner surface of the shield will, in general, carry some current because of flux quantization. The total flux is the sum of the flux due to the whisker and the flux due to the interior surface current. In the absence of passing charged particles, the outer surface of the shield can carry no current. A current on the outer surface would create a magnetic field inside the superconducting material. Now consider the changes brought about by a passing electron. In the following, we represent the electromagnetic momentum of the electron by P_e and the electromagnetic momentum of the superconducting electrons on
the outer surface of the shield by P_s. These quantities are then given by
\[
\mathbf{P}_e = \frac{1}{4\pi c}\int \mathbf{E}_e(\mathbf{r}'-\mathbf{r})\times\mathbf{B}_s(\mathbf{r}')\, d^3r' \tag{43}
\]
\[
\mathbf{P}_s = \frac{1}{4\pi c}\int \mathbf{E}_s(\mathbf{r}')\times\mathbf{B}_s(\mathbf{r}')\, d^3r'. \tag{44}
\]
In the foregoing, E_e(r′ − r) is the electric field at a point r′ due to an electron at the point r. E_s(r′) is the electric field due to the superconducting electrons (together with the background charge) at the point r′. The integrals are over all space, but the integrands are nonvanishing only in the region enclosed by the inner surface of the shield. As the net electric field inside the outer surface of the superconductor must vanish, we have
\[
\mathbf{E}_e(\mathbf{r}'-\mathbf{r}) = -\mathbf{E}_s(\mathbf{r}') \tag{45}
\]
at all points r′ inside the outer superconductor surface, and therefore we have
\[
\mathbf{P}_e = -\mathbf{P}_s. \tag{46}
\]
The electromagnetic momenta of the electron and of the surface charges of the superconducting shield are equal and opposite. However, they do not cancel. They are momenta of different charges. Differentiation of Eq. (46) with respect to time yields a relation that is reminiscent of Newton's third law. It is important to realize that the field of the passing electron is seen as an external field (as opposed to a self-field) by the flux whisker and shield. The fields of the superconducting shield are, likewise, seen as external fields by the passing electron. Thus, the passing electron experiences a change in its electromagnetic momentum given by
\[
\Delta\mathbf{P}_e = \frac{1}{4\pi c}\int \mathbf{E}_e(\mathbf{r}'-\mathbf{r})\times\mathbf{B}_{sc}(\mathbf{r}')\, d^3r', \tag{47}
\]
where B_sc(r′) is the magnetic field of the superconducting electrons which, in the interior of the superconductor, just cancels the magnetic field of the passing electron. Finally, we recall that the wave properties of particles depend upon the canonical momentum, which is
\[
\mathbf{p} = \boldsymbol{\pi} + \mathbf{P}_{\rm em}. \tag{48}
\]
At a given time, the vector potential seen by a passing electron is
\[
\mathbf{A}(\mathbf{r}, t) = \mathbf{A}_0(\mathbf{r}) + \Delta\mathbf{A}(\mathbf{r}, t) \tag{49}
\]
with ΔP_em = (e/c)ΔA(r, t). A₀(r) is the vector potential in the absence of passing electrons. Thus, if in a time dt, the change in the vector potential due to a supercurrent is d(ΔA), then the electric field experienced by the electron will be
\[
\mathbf{E} = -\frac{1}{c}\,\frac{\partial(\Delta\mathbf{A})}{\partial t}
\]
so that
\[
d\boldsymbol{\pi} = e\mathbf{E}\, dt = -\frac{e}{c}\, d(\Delta\mathbf{A}). \tag{50}
\]
Therefore, dp = dπ + (e/c) d(ΔA) = 0. Clearly, the superconducting shielding can have no effect on an AB experiment (except, of course, to bring about the usual flux quantization). The result has been derived independently of geometry. It is further clear that the problem of shielding is not of a new type. It is merely an example of the stability of the AB effect under small fluctuations in the flux. This was discussed in the previous section. The field of the supercurrent is just one example of such fluctuations.

VII. Interaction of a Passing Classical Electron with a Rotating Quantum Cylinder

In Section I it was argued that the usual single-valued stationary states, while mathematically correct, made no physical sense. The physics associated with them exhibits many contradictions. On the other hand, the total wave function "must" be single-valued. The way out of this dilemma is fairly easy to guess: The Schrödinger equation, as written down by Aharonov and Bohm, contains too few degrees of freedom. The AB effect is directly concerned with phase. A correct mathematical description of the effect depends critically on getting phase information right. In this section, the solenoid will be represented by a charged, rotating cylinder. The treatment is that of Henneberger and Opatrny (1994). The reader has already seen that there is a force exerted on the solenoid by the passing electron. The reader will also see that the passing electron exerts a torque on the rotating cylinder. Thus, in the external field approximation, phase information is lost. Therefore the external field approximation is inappropriate for discussion of the AB effect. In this section, this is shown to be the case. The model of a charged, rotating cylinder enables one to avoid a discussion of the quantum theory of the power source that supplies the
current to the solenoid (see the discussion associated with Eq. (39)). The model adopted here is similar to that of Peshkin et al. (1961). However, these authors discuss the change in angular velocity of the cylinder. The essential thing here, however, is not the change in the motion of the rotor, but the shift in its phase. We thus arrive at a conclusion that is just the opposite of the one drawn by these authors. We consider a cylinder of radius a and length l carrying a surface charge density , as shown in Fig. 4. The axis of the cylinder carries a line charge : 92a, so that the electric field of the cylinder is confined to its interior. Thus, the electrostatic potential vanishes in the region exterior to the cylinder. The cylinder is free to rotate about its axis (the z axis). An exact treatment of the problem in which an electron passes the rotating cylinder with fixed angular momentum consists, in general, of an infinite series of rapidly converging corrections to the unperturbed motion. The passing electron affects the motion of the cylinder and the change in the cylinder’s motion gives a further correction to the usual AB effect (which turns out to vanish, because of the AB effect’s stability, as already discussed). However, an angular acceleration of the cylinder would exert a small force on the electron. By the argument of Peshkin et al., in the limit of an infinitely massive cylinder, there will be no change in the cylinder’s angular velocity. The cylinder will, however, undergo a phase shift. We may therefore consider the treatment to be given here as exact. The method of this section is a semiclassical one. We write the state function for the electron—cylinder system as a product (r, , t) : ( , r, t)(r, t).
(51)
In Eq. (51), the function depends upon the electron coordinate r in order to allow for the influence of the passing electron on the cylinder. It is (r, , t) that describes an isolated system. It is this function that must be single valued. The product Ansatz of Eq. (51) is based on the fact that, in the limit of an extremely massive cylinder, the electron sees a constant flux, and the cylinder moves with virtually constant angular velocity, but undergoes a finite phase shift. We next assume that ( , r, t) is the solution of the quantum problem for the cylinder interacting with a passing classical electron. It may appear strange to the reader to do a quantum mechanical computation for a massive macroscopic object, but in light of the fact that the electron experiences no force (and thus moves in a straight line with constant speed, in accordance with the fact that the electron’s velocity is an integral of the motion, as seen earlier), the treatment here is exact, in the context of a semiclassical method. The reader who is unhappy with this method will find consolation in the fact that the complete quantum problem is solved exactly
in the next section. Because of the large number of skeptics who have no problem with the usual treatment of AB scattering, the author feels the need for at least two independent derivations of his result! We consider that a classical electron passes the cylinder with constant speed along a line parallel to the x axis with impact parameter b, as shown in Fig. 5. From Eqs. (26) and (28), we obtain
\[
\nabla\,\frac{1}{4\pi}\int \mathbf{B}_e(\mathbf{r}'-\mathbf{r})\cdot\mathbf{B}_s(\mathbf{r}')\, d^3r' = \frac{e}{c}\,\frac{d\mathbf{A}(\mathbf{r})}{dt} \tag{52}
\]
as dπ/dt = 0 in the AB problem. The reader is again reminded that Eq. (52) holds only in Coulomb gauge, the "natural" gauge for solving problems involving constant magnetic fields. The magnitude of B_s is related to the current per unit length K by the relation B_s = 4πK/c, with K = σaω. The angular
Figure 5. Electron passing a charged, rotating cylinder.
velocity of the cylinder is ω, so that B_s is given by
\[
B_s = \frac{4\pi}{c}\,\sigma a \omega. \tag{53}
\]
Equation (52), together with the fact that the electron has only one degree of freedom, yields
\[
\frac{\partial}{\partial x}\,\frac{\sigma a \omega}{c}\int_{\rm vol} B_{e,z}(\mathbf{r}'-\mathbf{r})\, d^3r'
= \frac{\partial}{\partial x}\,\frac{\sigma a \omega}{c}\int_0^l \Phi_e(z', x)\, dz'
= \frac{e}{c}\,\frac{dA_x(\mathbf{r})}{dt} = \frac{e}{c}\, v\,\frac{\partial A_x}{\partial x}. \tag{54}
\]
The last integral of Eq. (54) is over the length l of the cylinder. Φ_e(z′, x) denotes the magnetic flux due to the passage of an electron at point x through an area of the cylinder parallel to the x, y plane a distance z′ above it; vol denotes the volume of the cylinder, and v is the velocity of the electron. x and t are related by x = vt. The constancy of v assumes neglect of electric fields caused by the angular acceleration of the cylinder. In this approximation (which is exact in the limit of an infinitely massive cylinder), one may integrate Eq. (54) directly, obtaining
\[
\frac{e}{c}\, v A_x = \frac{\sigma a \omega}{c}\int_0^l \Phi_e(z', x)\, dz'. \tag{55}
\]
The cylinder has length l and (mechanical) moment of inertia I_m. The electromagnetic angular momentum of the cylinder is
\[
\mathbf{L}_{\rm em} = \frac{1}{4\pi c}\int \mathbf{r}\times(\mathbf{E}\times\mathbf{B})\, d^3r \tag{56}
\]
\[
L_{\rm em} = \frac{l}{4\pi c}\int_0^a r\,\frac{4\pi a\sigma}{r}\,\frac{4\pi}{c}\,\sigma a\omega\; 2\pi r\, dr
= \frac{4\pi^2\sigma^2 a^4 l}{c^2}\,\omega \tag{57}
\]
and I = I_m + 4π²σ²a⁴l/c² is the effective moment of inertia of a charged cylinder of length l. The interaction Lagrangian is given by (e/c)v·A, so that the Lagrangian for the cylinder is
\[
L_{\rm cyl} = \frac{I}{2}\,\omega^2 + \frac{e}{c}\, v A_x. \tag{58}
\]
The angle φ is the angle subtended by some fiducial mark on the cylinder with the perpendicular to the trajectory indicated by the vector b in Fig. 4. The flux is given by
\[
\Phi = \pi a^2 B_s = \frac{4\pi^2 a^3 \sigma}{c}\,\omega. \tag{59}
\]
The relations
\[
A = \frac{2\pi\sigma a^3\omega}{c\sqrt{b^2 + x^2}}, \qquad \cos\theta = \frac{b}{\sqrt{b^2 + x^2}} \tag{60a}
\]
\[
A_x = A\cos\theta = \frac{2\pi\sigma a^3 b\,\omega}{c\,(b^2 + x^2)} \tag{60b}
\]
yield
\[
L_{\rm cyl} = \frac{1}{2}\, I\omega^2 + \frac{2\pi e\sigma a^3 b v\,\omega}{c\,(b^2 + x^2)}. \tag{61}
\]
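A quick numerical consistency check of Eqs. (59) and (60b): the line integral of A_x over the entire straight-line trajectory should equal half the flux, ∫A_x dx = Φ/2, independently of the impact parameter b. This is the familiar statement that passage on one side of the flux picks up half of a full encirclement. The sketch below, with arbitrary parameter values, is not part of the original text.

```python
import math

sigma, a, omega, c, b = 0.7, 0.5, 1.3, 1.0, 2.0   # arbitrary units

def A_x(x):
    # Eq. (60b)
    return 2 * math.pi * sigma * a**3 * b * omega / (c * (b**2 + x**2))

# midpoint rule over a wide window approximating (-inf, inf)
X, n = 4000.0, 400000
h = 2 * X / n
integral = sum(A_x(-X + (k + 0.5) * h) for k in range(n)) * h

flux = 4 * math.pi**2 * a**3 * sigma * omega / c   # Eq. (59)
print(integral, flux / 2)
```

The numerical line integral reproduces Φ/2 to the accuracy of the truncated tails, for any b, as the analytic integral ∫ dx/(b² + x²) = π/b shows.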
The canonical angular momentum is
\[
P_\phi = \frac{\partial L_{\rm cyl}}{\partial\omega} = I\omega + R(x), \qquad
R(x) = \frac{2\pi e\sigma a^3 b v}{c\,(b^2 + x^2)}, \tag{62}
\]
and x = vt. The relation ∂L_cyl/∂φ = 0 indicates that P_φ is conserved. The wave function χ(φ, r, t) of Eq. (51) is of the form
\[
\chi(\phi, \mathbf{r}, t) = \exp\!\left[\frac{i}{\hbar}\left(\int^{\phi} P_\phi(\phi')\, d\phi' - \int_{-\infty}^{t} E(t')\, dt'\right)\right]. \tag{63}
\]
The interaction between the electron and the rotating cylinder is very weak. The kinetic energy of the electron is conserved. Energy conservation therefore dictates that the sum of the energy of the cylinder plus the magnetic interaction energy is conserved. This is easily checked. We define the quantity Ω as the value of ω at t = −∞. Then Eq. (62) yields
\[
I\Omega = I\omega + R(x). \tag{64}
\]
Then
\[
\omega = \Omega - R(x)/I, \qquad \Delta\omega = -R(x)/I \tag{65}
\]
represents the change in ω due to the passage of the electron. The change in kinetic energy is then
\[
\Delta({\rm K.E.}) = I\omega\,\Delta\omega \simeq I\Omega\,\Delta\omega
= -\frac{2\pi e\sigma a^3 b v\,\Omega}{c\,(b^2 + x^2)}. \tag{66}
\]
The change in magnetic energy is
\[
\frac{1}{4\pi}\int \mathbf{B}_e(\mathbf{r}'-\mathbf{r})\cdot\mathbf{B}_s(\mathbf{r}')\, d^3r'
= \frac{\sigma a \Omega}{c}\int_0^l \Phi_e(z', x)\, dz'.
\]
It is now clear that any phase shift of the cylinder must be due to a lag in the angle φ(t).
\[
\frac{e}{c}\, v A_x = \frac{2\pi e\sigma a^3 b v\,\Omega}{c\,(b^2 + x^2)} = -\Delta({\rm K.E.}) \tag{67}
\]
Indeed, the phase shift in Eq. (63) is given by
\[
\frac{1}{\hbar}\int_{-\infty}^{\infty} P_\phi\,\Delta\omega\, dt
= -\frac{2\pi e\sigma a^3 b\,\Omega}{\hbar c}\int_{-\infty}^{\infty} \frac{dx'}{b^2 + x'^2}
= -\frac{e}{\hbar c}\int_{-\infty}^{\infty} A_x(x')\, dx'. \tag{68}
\]
We see that the phase shift of χ(φ, r, t) is just the negative of the Dirac phase shift that a quantum electron experiences in the vector potential of a classical flux. The preceding result has serious consequences. Returning to the requirement that the total wave function Ψ(r, φ, t) = χ(φ, r, t)ψ(r, t) be single-valued, we see that ψ(r, t) can not be a single-valued function. It must contain a phase factor
\[
\exp\!\left[\frac{ie}{\hbar c}\int^{\mathbf{r}} \mathbf{A}(\mathbf{r}')\cdot d\mathbf{r}'\right]
= \exp\!\left[\frac{ie}{\hbar c}\int^{\theta} \frac{\Phi}{2\pi r'}\, r'\, d\theta'\right]
= \exp(-i\alpha\theta) \tag{69}
\]
with α defined in Eq. (3). The cylinder may be arbitrarily massive; one must nevertheless take into account the phase shift of the cylinder. Failure to do so yields incorrect eigenfunctions for the system and physics that is simply nonsense. One cannot avoid this phase factor of the source of the magnetic field. In order to have true stationary states, one must have a closed system, that is, there can be no transfer of energy to or from the system. It will not do to avoid discussion of the source of the magnetic field by simply powering the solenoid with power from the local power company. As has been discussed in Section IV, this merely complicates the problem by making the local power company part of the quantum system! This cancellation of phases between parts of an isolated system is not accidental. It always occurs. It has been shown by Yu and Henneberger (1996) that this also occurs in the Aharonov-Casher effect. The authors go on to show that, in every closed system, the sum of the phase shifts must be zero. The proof follows.
We consider an isolated system having N degrees of freedom with coordinates q₁, q₂, . . . , q_N. The system is described by a Lagrangian L, with canonically conjugate momenta given by
\[
p_k = \frac{\partial L}{\partial \dot q_k}, \qquad k = 1, \ldots, N.
\]
The principle of least action states that
\[
\delta\int L\, dt = 0, \tag{70}
\]
with
\[
L = \sum_k p_k \dot q_k - H, \tag{71}
\]
with the second of these equations defining the Hamilton function H. Equations (70) and (71) then yield
\[
\delta\int \sum_k p_k \dot q_k\, dt - \delta\int H\, dt = 0. \tag{72}
\]
These equations hold for arbitrary infinitesimal variations in coordinates. Instead of considering arbitrary variations, we first consider the system without interaction. Interactions that result only in a path-dependent phase factor are very weak. We therefore consider the variations δq and δp to be those resulting from an adiabatic switching on and off of the interaction leading to the phase shifts. The treatment here is especially suited to very weak interactions involving free particles, such as in the AB effect. The Hamiltonian H is conserved, and the adiabatic switching on and off of the interaction occurs when the interacting parts are separated by very great distances. Hence, we have
\[
\delta\int H\, dt = 0. \tag{73}
\]
Equation (72) now tells us that
\[
\delta\int \sum_k p_k \dot q_k\, dt = \sum_k \delta\oint p_k\, dq_k = 0. \tag{74}
\]
But δ∮p_k dq_k is just the phase shift associated with the coordinate q_k, multiplied by ħ. We note that in the case considered here the δq_k do not necessarily vanish at the endpoints of the integral. The variations here are not virtual displacements, but real ones. This is quite legitimate. In the usual theory, the
δq_k are chosen to vanish at the endpoints of the time integral for convenience, as the displacements are only virtual. All that is required is the variational condition of Eq. (70). Our result, expressed in terms of phases, is
\[
\frac{1}{\hbar}\sum_k \delta\oint p_k\, dq_k = 0 \tag{75}
\]
for any isolated, weakly interacting system. This result appears almost obvious; however it has profound implications. One must use great care in splitting an interacting system into a particle and an external field. Whenever the wave function of a particle in such a system undergoes a phase shift, the wave function of the rest of the system undergoes an equal and opposite phase shift. To ignore this phase shift of the remainder of the system is to violate the postulate of quantum theory that requires the wave function of the completely isolated system (which, to be completely accurate, is the wave function of the system!) to be single valued. The external field approximation is, as we have just seen, an approximation.

VIII. Solution of the Entire Problem of the Closed System

Because of widespread misunderstanding of the AB problem, it was useful to demonstrate the phase shift of the rotating cylinder in the previous section. In a problem where skepticism abounds, it is good to have two independent derivations, one corroborating the other. The complete problem is solved in this section by finding normal modes of the system and quantizing these normal modes. The method illustrates a shortcoming of quantum theory: The processes of canonical transformation and quantization do not commute! It is possible for two (or more!) persons to produce different solutions of a physical problem, all of which are mathematically correct. A wider discussion of this rather disconcerting phenomenon will be given at the end of this section. Returning to the problem of an electron in the region exterior to a charged, rotating cylinder, we find that the Lagrangian of the total system is
\[
L = \frac{1}{2}\,\mu \dot r^2 + \frac{1}{2}\,\mu r^2\dot\theta^2
+ \frac{e\mathfrak{f}}{2\pi c}\,\dot\theta\dot\phi + \frac{1}{2}\, I\dot\phi^2, \tag{76}
\]
where μ is the electron mass, r and θ are the electron coordinates, and φ is the angle turned by some fiducial mark on the cylinder, as before. The treatment here follows one that was published earlier by the author (Henneberger, 1997).
In this section, we must modify the model of Section VII slightly. The line charge along the z axis is replaced by a stationary charged cylinder having a radius a + ε, where ε is infinitesimal. The cylinder carries a surface charge −σ. In this way, all electric fields of the rotating cylinder have been eliminated (except for the infinitesimal region between the cylinders). The electromagnetic moment of inertia of the rotating cylinder is easily shown to be unchanged from the value found in Section VII. The vector potential at a distance r from the flux whisker is
\[
A_\theta = \frac{\mathfrak{f}\,\dot\phi}{2\pi r} \qquad \text{with} \qquad \mathfrak{f} = \frac{4\pi^2 a^3\sigma}{c}. \tag{77}
\]
As before, σ is the surface charge density on the rotating cylinder and a is the radius of the cylinder (assumed to be small). The moment of inertia of the cylinder is the total moment (the sum of mechanical and electromagnetic moments) as defined earlier. In Eq. (76), the interaction of the electron with the vector potential has been replaced by the interaction with its source, the rotating charged cylinder. The most direct method of treating this Lagrangian (and also the method that inspires the most confidence!) is to transform it to normal coordinates. The reader is reminded that Planck discovered the quantum of action by quantizing the normal modes of a cavity. If there is any system at all to which quantum theory can be applied without any sort of caveat, surely it is a system having normal modes. We begin with an orthogonal transformation to new variables χ₁ and χ₂, such that
\[
\chi_1 = \phi\cos\xi + \theta\sin\xi, \qquad
\chi_2 = -\phi\sin\xi + \theta\cos\xi. \tag{78}
\]
Equation (78) is a rotation through an angle ξ in the φ, θ space. The inverse transformation is
\[
\phi = \chi_1\cos\xi - \chi_2\sin\xi, \qquad
\theta = \chi_1\sin\xi + \chi_2\cos\xi. \tag{79}
\]
The angle ξ is assumed to be time-independent. This assumption will be seen to be justified in the limit I ≫ μr². This is in keeping with the usual assumption that the rotating cylinder is extremely massive. The assumption that ξ is time-independent yields
\[
\dot\phi = \dot\chi_1\cos\xi - \dot\chi_2\sin\xi, \qquad
\dot\theta = \dot\chi_1\sin\xi + \dot\chi_2\cos\xi. \tag{80}
\]
The Lagrangian of Eq. (76) now becomes
\[
\begin{aligned}
L = \frac{1}{2}\,\mu\dot r^2
&+ \frac{1}{2}\,\mu r^2(\dot\chi_1^2\sin^2\xi + \dot\chi_2^2\cos^2\xi + 2\dot\chi_1\dot\chi_2\sin\xi\cos\xi) \\
&+ \frac{e\mathfrak{f}}{2\pi c}\,(\dot\chi_1^2\sin\xi\cos\xi - \dot\chi_2^2\sin\xi\cos\xi
+ \dot\chi_1\dot\chi_2\cos^2\xi - \dot\chi_1\dot\chi_2\sin^2\xi) \\
&+ \frac{1}{2}\, I(\dot\chi_1^2\cos^2\xi - 2\dot\chi_1\dot\chi_2\sin\xi\cos\xi + \dot\chi_2^2\sin^2\xi).
\end{aligned} \tag{81}
\]
The angle ξ is chosen to have a value that causes the χ̇₁χ̇₂ term to vanish. This condition yields
\[
\mu r^2\sin\xi\cos\xi + \frac{e\mathfrak{f}}{2\pi c}\,(\cos^2\xi - \sin^2\xi) - I\sin\xi\cos\xi = 0. \tag{82}
\]
Equation (82) has the solution
\[
\tan 2\xi = \frac{e\mathfrak{f}}{\pi c\,(I - \mu r^2)} \simeq \frac{e\mathfrak{f}}{\pi c I}. \tag{83}
\]
Equation (83) shows that the assumption ξ̇ = 0 is valid for I ≫ μr². The angle ξ is very small, but it may not be assumed to vanish. Equation (83) yields
\[
\sin\xi \simeq \frac{e\mathfrak{f}}{2\pi c I} \qquad \text{and} \qquad \cos\xi \simeq 1. \tag{84}
\]
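The rotation of Eqs. (78)-(84) is an ordinary diagonalization of the kinetic-energy (mass) matrix in the (φ̇, θ̇) velocities, with coupling κ = e𝔣/2πc, and can be checked numerically. The parameter values below are arbitrary, and the matrix notation is an interpretive sketch of the step, not part of the original text.

```python
import math

mu_r2 = 1.0        # mu * r^2  (electron part)
I = 1.0e4          # cylinder moment of inertia, I >> mu*r^2
kappa = 0.05       # e*f/(2*pi*c), the weak coupling

# kinetic energy T = (1/2) [phi_dot, theta_dot] M [phi_dot, theta_dot]^T
M = [[I, kappa], [kappa, mu_r2]]

# Eq. (83): rotation angle that kills the cross term
xi = 0.5 * math.atan2(2 * kappa, I - mu_r2)

c, s = math.cos(xi), math.sin(xi)
# Eq. (78): chi1 = phi*cos + theta*sin, chi2 = -phi*sin + theta*cos,
# so the off-diagonal element of R M R^T must vanish:
off = (c * M[0][0] + s * M[0][1]) * (-s) + (c * M[0][1] + s * M[1][1]) * c
print(off)                 # ~ 0: the chi coordinates are normal coordinates
print(s, kappa / I)        # Eq. (84): sin(xi) ~ kappa/I for large I
```

The off-diagonal element vanishes at the angle of Eq. (83), and sin ξ reduces to κ/I in the massive-cylinder limit, as Eq. (84) states.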
The Lagrangian then becomes
\[
L = \frac{1}{2}\,\mu\dot r^2
+ \left(\frac{1}{2}\,\mu r^2\sin^2\xi + \frac{e\mathfrak{f}}{2\pi c}\sin\xi\cos\xi + \frac{1}{2}\, I\cos^2\xi\right)\dot\chi_1^2
+ \left(\frac{1}{2}\,\mu r^2\cos^2\xi - \frac{e\mathfrak{f}}{2\pi c}\sin\xi\cos\xi + \frac{1}{2}\, I\sin^2\xi\right)\dot\chi_2^2. \tag{85}
\]
The canonical momenta are
\[
p_r = \frac{\partial L}{\partial\dot r} = \mu\dot r, \qquad
p_{\chi_1} = \frac{\partial L}{\partial\dot\chi_1}
= \left(\mu r^2\sin^2\xi + \frac{e\mathfrak{f}}{\pi c}\sin\xi\cos\xi + I\cos^2\xi\right)\dot\chi_1, \qquad
p_{\chi_2} = \frac{\partial L}{\partial\dot\chi_2}
= \left(\mu r^2\cos^2\xi - \frac{e\mathfrak{f}}{\pi c}\sin\xi\cos\xi + I\sin^2\xi\right)\dot\chi_2. \tag{86}
\]
It is convenient to introduce the quantities Ĩ and μ̃:
\[
\tilde I = \mu r^2\sin^2\xi + \frac{e\mathfrak{f}}{\pi c}\sin\xi\cos\xi + I\cos^2\xi,
\qquad
\tilde\mu = \mu\cos^2\xi - \frac{e\mathfrak{f}}{\pi c r^2}\sin\xi\cos\xi + \frac{I}{r^2}\sin^2\xi. \tag{87}
\]
In the limit I → ∞, Ĩ → I and μ̃ → μ. The Lagrangian now has the simple form
\[
L = \frac{1}{2}\,\mu\dot r^2 + \frac{1}{2}\,\tilde\mu r^2\dot\chi_2^2 + \frac{1}{2}\,\tilde I\dot\chi_1^2 \tag{88}
\]
and the Hamiltonian is
\[
H = \frac{1}{2}\,\mu\dot r^2 + \frac{1}{2}\,\tilde\mu r^2\dot\chi_2^2 + \frac{1}{2}\,\tilde I\dot\chi_1^2
= \frac{p_r^2}{2\mu} + \frac{p_{\chi_2}^2}{2\tilde\mu r^2} + \frac{p_{\chi_1}^2}{2\tilde I}. \tag{89}
\]
In the limit of extremely large I, we may drop the tildes on Ĩ and μ̃. Then H becomes
\[
H = \frac{p_r^2}{2\mu} + \frac{p_{\chi_2}^2}{2\mu r^2} + \frac{p_{\chi_1}^2}{2I}. \tag{90}
\]
The eigenstates of this Hamiltonian are
\[
\psi(k, r, \chi_1, \chi_2) = J_m(kr)\, e^{im\chi_2}\, e^{iM\chi_1}, \tag{91}
\]
where m and M are integers and the electron kinetic energy is given by ħ²k²/2μ. We must, of course, exclude the m = 0 states, as these are nonvanishing at r = 0. We assume the radius of the cylinder to be effectively zero, as did Aharonov and Bohm. Equations (76) and (88) show that p_θ, p_φ, p_χ₁, and p_χ₂ are conserved quantities. We now turn to the physical significance of the angles χ₁ and χ₂.
\[
p_\phi = I\dot\phi + \frac{e\mathfrak{f}}{2\pi c}\,\dot\theta, \qquad
p_\theta = \mu r^2\dot\theta + \frac{e\mathfrak{f}}{2\pi c}\,\dot\phi, \qquad
p_{\chi_1} = \tilde I\dot\chi_1 = M\hbar, \qquad
p_{\chi_2} = \tilde\mu r^2\dot\chi_2 = m\hbar.
\]
\[
p_\theta - p_{\chi_2} = \mu r^2(\dot\theta - \dot\chi_2) + \frac{e\mathfrak{f}}{2\pi c}\,\dot\phi. \tag{92}
\]
With the second of Eqs. (80), this becomes
\[
p_\theta - p_{\chi_2} = \mu r^2\dot\chi_1\sin\xi + \frac{e\mathfrak{f}}{2\pi c}\,\dot\phi
\simeq \mu r^2\dot\chi_1\,\frac{e\mathfrak{f}}{2\pi c I} + \frac{e\mathfrak{f}}{2\pi c}\,\omega. \tag{93}
\]
The approximation μr² ≪ I then yields
\[
p_\theta - p_{\chi_2} = -\alpha\hbar \tag{94}
\]
with
\[
\alpha = -\frac{e\Phi}{ch},
\]
as defined by AB. Equation (94) shows that p_χ₂ is the kinetic angular momentum of the electron. In the AB problem, it is the kinetic angular momentum (not the canonical angular momentum) that is quantized. This kinetic angular momentum is, of course, the canonical angular momentum in the new coordinates. For the past 18 years the author has been emphasizing that this must be the case (Henneberger, 1981). The angles χ₁ and χ₂ are the angles φ and θ in the absence of any interaction. Multiplication of the first of Eqs. (79) by the (conserved) canonical angular momentum of the cylinder yields
\[
M\hbar\,(\phi - \chi_1) = -\chi_2\, M\hbar\sin\xi = -M\hbar\,\chi_2\,\frac{e\mathfrak{f}}{2\pi c I}, \tag{95}
\]
where terms of order e² have been neglected in the last equality. The relation Mħ/I = ω yields
\[
M\hbar\,(\phi - \chi_1) = -\frac{e\Phi}{ch}\,\hbar\,\chi_2 = \alpha\hbar\,\chi_2. \tag{96}
\]
Equation (96) may be written
\[
\Delta\!\oint p_\phi\, d\phi = -\,\Delta\!\oint p_\theta\, d\theta, \tag{97}
\]
where the symbol Δ refers to the change in these quantities due to the AB interaction. This is the result of Henneberger and Opatrny (1994), as was already given in the previous section. There is a second proof that it is the kinetic angular momentum that takes on integral values of ħ in AB problems. We combine the equations
\[
\frac{\hbar}{i}\,\frac{\partial\psi}{\partial\chi_2} = m\hbar\,\psi \tag{98}
\]
and
\[
\frac{\partial}{\partial\chi_2} = \cos\xi\,\frac{\partial}{\partial\theta} - \sin\xi\,\frac{\partial}{\partial\phi}. \tag{99}
\]
The approximation sin ξ ≈ tan ξ and cos ξ ≈ 1 yields
\[
m\hbar\,\psi = \frac{\hbar}{i}\,\frac{\partial\psi}{\partial\theta}
- \frac{e\mathfrak{f}}{2\pi c I}\,\frac{\hbar}{i}\,\frac{\partial\psi}{\partial\phi}. \tag{100}
\]
Putting (ħ/i)(∂/∂φ) = Iω and Φ = 𝔣ω then yields
\[
m\hbar\,\psi = \frac{\hbar}{i}\,\frac{\partial\psi}{\partial\theta} - \frac{e\Phi}{ch}\,\hbar\,\psi. \tag{101}
\]
Equation (101) shows that when the original canonical angular momentum operator acts on an eigenstate of (ħ/i)(∂/∂χ₂), one obtains
\[
m\hbar = L_{\rm can} - \frac{e\Phi}{ch}\,\hbar = L_{\rm can} + \alpha\hbar. \tag{102}
\]
Again, we see that, in the AB problem, the eigenvalues of (1/ħ)L_can are m − α, where m is an integer. The result of this section depends critically upon the quantization of the normal modes of the system. The author believes strongly that in the case of ambiguities it is the normal modes of the system that should be quantized. A person wishing to argue against this point of view might argue that one could find the Hamiltonian corresponding to the original Lagrangian of Eq. (76) and then simply postulate stationary states of the form
\[
\Psi(r, \theta, \phi) = f(r)\, e^{im\theta}\, e^{iM\phi}, \tag{103}
\]
where M and m are integers. This approach forces the quantization of the original canonical angular momenta, and leads directly to the solution for the electron found by Aharonov and Bohm. Several arguments against this solution have already been given. In classical theory, all sets of canonically conjugate variables are equivalent for solving a problem. This is not true in quantum theory. In general, a person who makes a canonical transformation before quantizing a system will arrive at a different result than one who does not. There are several other examples of this in the literature. These will be discussed in Section X.

IX. The Interior of the Solenoid

In the last section, we saw that the usual formulation of the AB problem suffers from the difficulty of having too few degrees of freedom. The reader may now wonder whether the well-known solutions of the Schrödinger equation in a constant magnetic field suffer from the same defect. It will be seen in this section that, sadly, they do. These solutions are adequate for
almost all problems. However, they cannot correctly describe interference experiments. Let us turn now to the problem of the electron in the interior of the solenoid. The system has the Lagrangian
\[
L = \frac{1}{2}\,\mu\dot r^2 + \frac{1}{2}\,\mu r^2\dot\theta^2
+ \frac{e\mathfrak{f}\, r^2}{2\pi c a^2}\,\dot\theta\dot\phi + \frac{1}{2}\, I\dot\phi^2. \tag{104}
\]
The canonical momenta are
\[
p_\phi = \frac{\partial L}{\partial\dot\phi} = I\dot\phi + \frac{e\mathfrak{f}\, r^2}{2\pi c a^2}\,\dot\theta, \qquad
p_r = \frac{\partial L}{\partial\dot r} = \mu\dot r, \qquad
p_\theta = \frac{\partial L}{\partial\dot\theta} = \mu r^2\dot\theta + \frac{e\mathfrak{f}\, r^2}{2\pi c a^2}\,\dot\phi. \tag{105}
\]
Because of the factor r in the interaction term of Eq. (104), it is no longer possible to transform the Lagrangian to normal coordinates by means of a single rotation, as in the previous section. The assumption that the angle ' is time-independent fails. It is the author’s hope that the reader has become convinced that, in cases where normal coordinates exist, it is the normal coordinates which must be quantized. The perplexing question is how to proceed in the absence of normal coordinates. Here, we deal with just such a case. It is possible to arrive at a somewhat satisfactory discussion in terms of the original coordinates. As before, the angular momenta p and p are conserved quantities. The value of p is I/, where is the value of the flux when r : 0 (or equivalently, when no electron is present). The cylinder’s angular velocity is given by er" " : ; : ; ". caI
(106)
The cylinder phase shift is

$$\frac{1}{\hbar}\int p_\varphi\,d\varphi = \frac{1}{\hbar}\int p_\varphi\dot\varphi\,dt = -\frac{1}{\hbar}\int \frac{e r^{2}\dot\varphi}{2ca}\,d\theta = -\frac{eB}{2\hbar c}\int r^{2}\,d\theta. \tag{107}$$

Over one revolution of the electron in its orbit, the phase shift is −eBA/ħc = −eΦ/ħc, where Φ is the flux encircled by the orbit of area A.
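The per-revolution value follows from the polar-coordinate area formula; a brief check of this step, using the same symbols as the surrounding derivation:

```latex
% Area enclosed by a closed orbit, in polar coordinates:
A = \frac{1}{2}\oint r^{2}\,d\theta
\quad\Longrightarrow\quad
-\frac{eB}{2\hbar c}\oint r^{2}\,d\theta
  = -\frac{eB}{\hbar c}\,A
  = -\frac{e\Phi}{\hbar c},
\qquad \Phi = B A .
```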
WALTER C. HENNEBERGER
The conserved electron canonical angular momentum is

$$p_\theta = \mu r^{2}\dot\theta + \frac{e r^{2}}{2ca}\,\dot\varphi = \mu r^{2}\dot\theta + \frac{e r^{2}}{2ca}\,\dot\varphi_0 - \frac{e^{2} r^{4}}{4c^{2}a^{2}I}\,\dot\theta. \tag{108}$$
In the limit I → ∞, the term in e²/I may be dropped. The second term of Eq. (108) is the vector potential term. It gives a phase shift per cycle of

$$\frac{e}{\hbar c}\oint A_\theta\, r\,d\theta = \frac{e}{\hbar c}\oint \frac{r^{2} B}{2}\,d\theta = \frac{e}{\hbar c}\,B\cdot(\text{area}) = \frac{e\Phi}{\hbar c}. \tag{109}$$
The phase shifts are again equal and opposite, as Eq. (75) requires. For an AB experiment performed inside the solenoid, the phase difference would be given by Eq. (109) with Φ being the flux between the two paths open to the electron. The phase shift of Eq. (109) is an example of the Berry phase (Berry, 1984). We see that, in closed systems, that is, systems that do not involve external potentials, the Berry phase follows directly from the dynamics of the system.

Let us return to the cylinder phase shift of Eq. (107). Consider an electron passing through the axis of rotation of the cylinder (r = 0). For such an electron, p_θ vanishes. The change in the angular velocity of the rotating cylinder is then

$$\Delta\dot\varphi = -\frac{e r^{2}\dot\theta}{2caI}. \tag{110}$$
The energy change of the rotating cylinder is

$$\Delta E = I\dot\varphi\,\Delta\dot\varphi = -\frac{e r^{2}\dot\theta}{2ca}\,\dot\varphi = -\frac{B e r^{2}\dot\theta}{2c} = -\frac{e}{c}\,\mathbf{v}\cdot\mathbf{A}. \tag{111}$$
This last term is the negative of the overlap magnetic field energy (i.e., the energy term involving the scalar product of the cylinder's magnetic field and the electron's magnetic field). The reader will recall that φ̇ = φ̇₀ when the electron is at the origin, in accordance with Eq. (106). When r = 0, v·A vanishes. Equation (111) shows that the energy shift is time dependent, in general, reflecting a lack of constancy in the angular velocity of the cylinder. As the electron moves in its orbit (which is fixed in space), kinetic energy of the cylinder is continually being exchanged with magnetic energy. The average energy shift may be computed by means of a semiclassical argument. The phase shift per cycle for the cylinder was found in Eq. (107) to be −eΦ/(ħc). A continuous shift in phase is just a shift in angular frequency. This frequency shift is just the phase shift divided by the period
of revolution of the electron. This is

$$-\frac{e\Phi/\hbar c}{2\pi\mu c/eB} = -\frac{e^{2}B^{2}}{2\pi\mu\hbar c^{2}}\,(\text{orbit area}). \tag{112}$$

Multiplication by ħ gives an average energy shift per cycle of ⟨ΔE⟩ = −½μv², where v is the velocity of the electron in its orbit. Once more, this illustrates the difference in behavior of the cylinder when the electron is in the magnetic field, and when it is in the exterior region. When the electron is in the magnetic field, the cylinder undergoes a continuous phase shift, that is, a shift in energy. This energy will (at least partially, depending on the orbit) continuously oscillate between kinetic energy and the energy of the magnetic field. As in the AB effect, the kinetic energy of the electron is unchanged. An electron in the exterior region travels in a straight line. The cylinder gets only a one-time phase shift, as does the electron wave function.

We have already seen that, in the AB problem, the price paid for making an external field approximation is the sacrifice of single-valuedness of the electron wave function. Were one to go further and use an external field approximation over all space, one would omit a degree of freedom that is correlated with the motion of the electron. This solution would be adequate in the interior region, as long as one does not treat interference phenomena. However, we see that the behavior of the omitted cylinder is quite different as the electron approaches the boundary of the cylinder from the two sides of the cylinder wall. It seems highly unlikely to this author that the Schrödinger equation is valid over all space in the external field approximation.

In 1984, the author published his most controversial paper on the AB effect. At that time, wave functions corresponding to the external field approximation were considered exact solutions. The AB problem was assumed to be 2D. The author postulated two boundary conditions on the current density: (A) n·j continuous, and (B)
∂(n × j)/∂n continuous,
where n is the normal to a given surface (in this case, the boundary of the solenoid). These boundary conditions, along with the assumption that the wave function in the interior of the solenoid is single-valued, give a wave
function in the exterior region having the form f(r, θ)e^(−iαθ), where f(r, θ) is a single-valued function. This result is no longer especially interesting, as the complete AB problem involves 3 degrees of freedom, not 2. In the external field approximation, there is no guarantee that the wave function in the interior of the cylinder is single-valued. Only the total wave function of the electron plus spinning cylinder system need be single-valued. Whether the result of Henneberger (1984) is right or wrong will only be determined when a solution of the complete problem of the electron plus spinning cylinder system is available in all of space. It is highly desirable to have a transformation of the type of Eq. (78) that is valid in the interior region. One would like this transformation to join smoothly with Eq. (78) at the cylinder boundary. To be sure, the interior system does not have normal coordinates. It is unclear how one would determine such a transformation. One highly desirable property of such a transformation would be that it eliminate velocity-dependent potentials. In every case of which this author is aware, elimination of velocity-dependent potentials by means of a suitable canonical transformation brings with it not only an enormous simplification of the problem, but also physics that is reasonable and unquestionably correct. This will be discussed in detail in the upcoming last section of this work.
X. Ambiguity in Quantum Theory, Canonical Transformations, and a Program for Future Work

In classical theory, canonical transformations are simply changes in coordinate systems. The new coordinate system is equivalent to the old one in that both are equally valid for description of the system under consideration. The motivation for the transformation is generally convenience; for example, one may wish to transform to normal coordinates, as in the case of coupled oscillators. In quantum theory, canonical transformations can bring ambiguities with them. The reader who has carefully read Section VIII has already seen one example of this. The earliest example known to this author is the transformation by Maria Göppert-Mayer (1931) in her well-known paper on the Raman effect and two-quantum emission. She noted that the variational principle allows one to add the total time derivative of any function to the Lagrangian. She added

$$\Delta L = -\frac{e}{c}\frac{d}{dt}(\mathbf{r}\cdot\mathbf{A}) = -\frac{e}{c}\,\dot{\mathbf{r}}\cdot\mathbf{A} + e\,\mathbf{r}\cdot\mathbf{E} - \frac{e}{c}\,\mathbf{r}\cdot(\mathbf{v}\cdot\nabla)\mathbf{A} \tag{113}$$
to the Lagrangian, which left her with the new interaction Lagrangian er·E, after making a dipole approximation (which neglects the r·(v·∇)A term). This interaction has been used by many authors over the years. It has come to be the interaction of choice in quantum optics. It was found by Lamb (1952) that, in some problems, the interaction Hamiltonians −(e/mc)p·A and −er·E give different results. This problem has been discussed extensively by Power and Zienau (1959), who showed that −er·E is the better interaction Hamiltonian. One cannot make an argument supporting one on the basis that the other was derived from mathematical errors. Mathematically, one has two quantum systems. Unfortunately, at most one corresponds to the physical system that one is trying to describe. It seems to this author that one reason for preferring r·E over p·A is the following: In perturbation theory, one begins with eigenstates of −ħ²∇²/(2m) + V. In the absence of a perturbation, (ħ/i)∇ = π, the kinetic momentum. If one now introduces the perturbation −(e/mc)(p·A), then (ħ/i)∇(ψ) becomes (π + (e/c)A)(ψ). Thus, the "unperturbed" state is no longer unperturbed. This does not happen if the perturbation is −er·E. In this case, p is always π. The Göppert-Mayer transformation has done away with the offensive velocity-dependent potential!

The author has encountered quantum ambiguities several times in his career. The earliest was in his doctoral dissertation (1959). The problem involved a canonical transformation by van Kampen (1951), later obtained by Steinwedel (1955), who used an alternative derivation. The transformation involved bringing a charged harmonic oscillator coupled to the radiation field (in dipole approximation) to normal coordinates. One of the author's tasks was to use the transformation to show that a displaced ground state wave packet would oscillate with the usual radiation damping while holding its shape, and eventually settle at the origin.
The author began by formulating the problem in terms of the original particle and field variables. The problem was not only monstrously complicated, but (to the great relief of the author!), had the property that the mean square deviation of the new variables from their averages was a divergent integral! He quickly concluded that the formulation of the problem in terms of the original particle and field variables made no sense. It then turned out that, when the problem was formulated in terms of the new variables, the normal coordinates, the problem was relatively simple. Again, formulation of the problem in terms of the normal coordinates does not involve velocity-dependent potentials. Another example is a canonical transformation due to Kramers (1938). The author obtained the same transformation many years later, starting with a time-dependent unitary transformation acting on the wave function in the Schro¨dinger equation (Henneberger, 1968). A few years after he
published his result, the author realized that the two transformations were the same. The transformation is now known as the K-H transformation. The author (1970) quantized the electromagnetic field after first making a K-H transformation to the new variables. When one transforms the quantized system back to the original particle and field variables, one finds that, in addition to the usual Hamiltonian, a term identical with the one usually put in "by hand," called the mass renormalization term, is obtained. Here we have another example of an improvement of the system by quantizing it in a form that does not contain a velocity-dependent potential.

The AB effect is merely the latest in a series of examples demonstrating that quantum systems involving velocity-dependent potentials, that is, systems involving electrodynamics, have more than one solution. Indeed, there is probably a different solution for every possible canonical transformation that one might wish to make. The reader has hopefully noted that the AB effect is the simplest of all problems in electrodynamics. Even the problem of an electron in a constant magnetic field is more difficult when solved correctly. The importance of the AB effect lies in its simplicity. In the AB effect, nature is trying to tell us something. We should be listening!

Finally, one should be concerned about a program for the future. It appears that physicists have chosen to quantize the obvious variables, while nature has in fact quantized some mysterious set of variables that is related to that of the physicist by a canonical transformation. One can only hope that (as in the examples given in this work) the transformation is linear. In the general case of QED (not in the dipole approximation!) this is probably wishful thinking. Nature has hidden her secrets well. In the meantime, renormalization techniques make possible computations that yield an amazing agreement with experiments.
A good beginning might be to start with the easiest problem, that of an electron interacting with a rotating charged solenoid, as discussed in the previous sections. To a young physicist in search of a problem, the author would suggest seeking a transformation in the interior region of the solenoid (rotating charged cylinder) that eliminates the velocity-dependent coupling between particle and field, and that joins Eq. (78) smoothly at the boundary of the cylinder. This would give an improved solution for an electron in a constant magnetic field. It would also shed light on the problem of what happens at the boundary of the solenoid.

There have, in the author's opinion, been many errors made over the years regarding the AB effect. These have been based on one or more of the following erroneous assumptions:

1. If one allows the moment of inertia of the rotating cylinder to become infinite, the external field approximation becomes exact. Actually, the phase shift of the rotating cylinder is independent of its mass.
2. The vector potential is not a physical quantity. This, as hopefully the reader has seen, is at best a half-truth. The transverse part of the vector potential is indeed a physical quantity. It is the electromagnetic momentum due to the presence of a charge at the point r (when multiplied by e/c). The remainder (the longitudinal part) is merely a computational convenience. It is perhaps noteworthy that there was a group in Italy that argued (quite correctly, in this writer's opinion) that unphysical quantities cannot cause physical effects. Their failure to recognize that the vector potential is a physical quantity is what led them to conclude that there could be no AB effect, and that all of the experiments had to be wrong.

3. Problems in quantum theory have unique solutions. This, it appears, is only true if one forbids canonical transformations. It now appears that there is a different solution for every canonical transformation, assuming, of course, that the transformation is carried out before the quantization.

Failure to recognize these problems has led to an enormous amount of literature, all of which has generated an immense amount of heat, but very little light.

References

Aharonov, Y. and Bohm, D. (1959). Phys. Rev., 115: 485.
Al-Jaber, S. M. and Henneberger, W. C. (1992). Il Nuovo Cim., 107B: 485.
Berry, M. V. (1984). Proc. Roy. Soc. London, A392: 45.
Ehrenberg, W. and Siday, R. E. (1949). Proc. Phys. Soc., 62B: 8.
Goldhaber, A. S. (1989). Phys. Rev. Lett., 62: 482.
Göppert-Mayer, M. (1931). Annalen der Physik, 9: 273.
Henneberger, W. C. (1997). Int. Journal of Theor. Physics, 36: 2067.
Henneberger, W. C. (1984). Phys. Rev. Lett., 52: 573.
Henneberger, W. C. (1981). J. Math. Phys., 22: 116.
Henneberger, W. C. (1970). Nuclear Physics, B23: 365.
Henneberger, W. C. (1968). Phys. Rev. Lett., 21: 838.
Henneberger, W. C. (1959). Zeitschr. für Physik, 155: 296.
Henneberger, W. C. and Opatrny, T. (1994). Int. Journal of Theor. Physics, 33: 1783.
Jackson, J. D. (1975). Classical Electrodynamics, 2nd edition, New York, London, Sydney, Toronto: John Wiley and Sons.
Konopinski, E. J. (1981). Electromagnetic Fields and Relativistic Particles, New York: McGraw-Hill, p. 158.
Kramers, H. A. (1938). Report to the 8th Solvay Congress.
Lamb, W. E. (1952). Physical Review, 85: 259.
Möllenstedt, G. and Bayh, W. (1962). Phys. Blätter, 18: 299.
Pauli, W. (1958). Encyclopedia of Physics, 46, Berlin: Springer Verlag.
Pauli, W. (1939). Helv. Physica Acta, 12: 147.
Peshkin, M., Talmi, I. and Tassie, L. J. (1961). Annals of Physics, 12: 426.
Power, E. A. and Zienau, S. (1959). Proc. Roy. Soc. London, A251: 54.
Shapiro, D. and Henneberger, W. C. (1989). J. Phys. A: Math. Gen., 22: 3605.
Steinwedel, H. (1955). Annalen der Physik, 15: 207.
Thomson, J. J. (1904). Elements of the Mathematical Theory of Electricity and Magnetism, 3rd edition, Cambridge: Cambridge University Press.
Tonomura, A., Osakabe, N., Matsuda, T., Kawasaki, T., Endo, J., Yano, S., and Yamada, H. (1986a). Phys. Rev. Lett., 56: 792.
Tonomura, A., Yano, S., Nobuzuki, O., Matsuda, T., Yamada, H., Kawasaki, T., and Endo, J. (1986b). Proc. 2nd Int. Symposium of Foundations of Quantum Mechanics, Tokyo, 97.
van Kampen, N. G. (1951). Dan. Mat. Fys. Medd., 26: (15).
Yang, C. N. (1983). Proc. International Symposium on Foundations of Quantum Mechanics, Tokyo, 5.
Yu, X. and Henneberger, W. C. (1996). Int. Journal of Theor. Physics, 35: 393.
Zhu, X. and Henneberger, W. C. (1990). Jour. Physics A: Math. Gen., 23: 3983.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 112
Well-Composed Sets LONGIN JAN LATECKI Department of Applied Mathematics, University of Hamburg, Bundesstr. 55, 20146 Hamburg
I. Introduction 95
II. Definition and Basic Properties of Well-Composed Sets 98
III. 3D Well-Composed Sets 103
  A. Local Properties of the Continuous Analog 103
  B. Digital Characterization of Well-Composed Sets 106
  C. Jordan-Brouwer Separation Theorem 109
  D. Properties of Boundary Faces 111
  E. Connected Components in 3D Well-Composed Pictures 112
IV. 2D Well-Composed Sets 113
  A. Jordan Curve Theorem and Euler Characteristic 117
  B. Thinning 117
  C. Irreducible Well-Composed Sets 123
  D. Graph Structure of Irreducible Sets 126
  E. Parallel Thinning on Well-Composed Sets 127
  F. Making a Binary Image Well-Composed 135
  G. Making a Gray-Level Image Well-Composed 140
V. Digitization and Well-Composed Images 142
  A. Continuous Representation of Real Objects 142
  B. Digitization and Segmentation 146
  C. Digitizations Produce Well-Composed Images 149
VI. Application: An Optimal Threshold 154
  A. Thresholding 154
  B. Histogram of Checkerboard Patterns 157
VII. Generalizations 159
References 161
I. Introduction

Under a digital set we understand a subset X of Zⁿ such that either X or its complement Xᶜ = Zⁿ\X is finite. Digital sets play a primary role in computer vision and computer graphics, because every object in a digital image is represented by a finite subset of Zⁿ, where n is mostly equal to 2 or 3. For example, in order to recognize which real object is depicted in a digital image, it is necessary to analyze properties of digital sets obtained by segmenting the digital image into objects. Among the most important
properties are shape properties, because shape plays an important role in human perception. To analyze the shape properties, like differential-geometric properties, of all the possible subsets of Rⁿ seems to be impossible. Therefore, shape analysis is restricted mostly to some class of subsets, like subsets of Rⁿ whose boundaries are (n−1)D manifolds; for example, a 2D manifold in R³ looks locally like an open subset of the plane. The idea of defining the class of well-composed sets is analogous. Well-composed sets are exactly the class of subsets of Zⁿ whose continuous analog in Rⁿ has a boundary that is an (n−1)D manifold. The continuous analog has an intuitive and natural meaning in computer vision and computer graphics. Each point in a digital image has a dual nature:

1. It is a point (mostly with integer coordinates); or
2. it is a region in the plane R² or in space R³.

For algorithmic processing, it is necessary to treat each point in a digital image as a point with integer coordinates, that is, as a topological object of dimension zero. For example, a 2D digital image is a finite rectangular array of points with integer coordinates. However, what is seen on any display is a set of regions with nonempty area, which can be more adequately modeled as subsets of the plane of topological dimension two. Mostly, a point in a 2D digital image is modeled as a unit square centered at the point with integer coordinates. Modeling points in digital images as sets with nonempty area or volume is useful not only for visualization purposes, but also for analyzing geometrical properties of objects in digital images. We will use the region interpretation of digital images to define well-composed images based on topological properties of the regions. Then we will translate the obtained definition to the point interpretation of digital images, which is based on graph structure.
Following Wang and Bhattacharya (1997) we will call the regions in the plane pixels (for picture elements) and regions in the space voxels (for volumetric elements). Since the correspondence between points in digital images and pixels plays an important role in this contribution, we will describe it formally in Section II. The concept of well-composed sets was first introduced in Latecki et al. (1995) with the goal of distinguishing a special class of subsets of 2D binary digital pictures called ‘‘well-composed pictures.’’ The idea is not to allow the ‘‘critical configuration’’ shown in Figure 1 to occur in a digital picture. Note that this critical configuration can be detected locally. The 2D well-composed pictures have very nice topological properties; for example, the Jordan curve theorem holds for them, their Euler characteristic is locally computable, and they have only one connectedness relation,
Figure 1. This pattern and its 90° rotation do not occur in a well-composed digital image.
because 4- and 8-connectedness are equivalent. Therefore, when we restrict our attention to well-composed pictures, a number of very difficult problems in digital geometry as well as complicated algorithms become relatively simple. This is demonstrated in Latecki et al. (1995) with the example of thinning algorithms. There are practical advantages in applying thinning algorithms to well-composed pictures. The thinning process (sequential as well as parallel) is greatly simplified and the resulting skeletons are "one-point thick." Thus, the problems with irreducible "thick" skeletons disappear. On the other hand, if a set lacks the property of being well-composed, then it may not be possible to reduce it to a "thin" skeleton. Examples of irreducible sets with large interiors are given in Arcelli (1981). We discuss point-based properties of 2D well-composed sets in Section IV. Moreover, Gross and Latecki (1995) show that if the resolution of a digitization process is fine enough to ensure topology preservation, then the obtained segmented 2D image must be well-composed (Section V).

An important motivation for 2D well-composed sets was the connectivity paradoxes that occur if only one adjacency relation (e.g., either 4- or 8-adjacency) is used in the whole picture (presented at the beginning of Section IV). Such paradoxes are pointed out in Rosenfeld and Pfaltz (1966) (see also Kong and Rosenfeld (1989)). The most popular solution was that of using different adjacency relations for the foreground and the background: 8-adjacency for black points and 4-adjacency for white points, or vice versa (first recommended in Duda et al., 1967). Rosenfeld (1979) developed the foundations of digital topology based on this idea, and showed that the Jordan curve theorem is then satisfied.
However, the solution with two different adjacency relations does not work if one wants to distinguish more than two colors, that is, to distinguish among different objects in a segmented image, as shown in Latecki (1995). The same paradoxes appear in 3D segmented images. The author (1997) defines and analyzes 3D segmented "well-composed pictures" in which the connectivity paradoxes do not occur. He also gives a
topological motivation for well-composed sets based on the concept of continuous analog. We will treat 3D well-composed sets in Section III. The class of well-composed sets seems to be general enough, in the sense that it is possible to determine the digital sets in digital images obtained in practical applications in such a way that they are well-composed sets. In Section VI we show an example application of well-composed sets to determine an optimal threshold for gray-level document images. This application was inspired by the digitization theorem that is presented in Section V.

Section II contains the main definitions. Section VII gives various generalizations of the concept of well-composedness. Sections III–VII are independent of each other. Therefore, each of them can be read directly after Section II. The definitions in Section II are also sufficient in order to read Sections IV.F and G. The content of Sections IV.F and G is independent of the content of the other Section IV subsections.
II. Definitions and Basic Properties of Well-Composed Sets

We will interpret Zⁿ, where n = 2, 3, . . . , as the set of points with integer coordinates in Rⁿ. We call a set X ⊆ Zⁿ a digital set if either X or Xᶜ = Zⁿ\X is finite. The following concept of a continuous analog gives us an intuitive and simple correspondence between points in Zⁿ and cubes in Rⁿ. For example, a 3D digital set can be identified with a union of upright unit cubes that are centered at its points.

Definition 1 The continuous analog CA(p) of a point p = (p₁, . . . , pₙ) ∈ Zⁿ is given by

CA((p₁, . . . , pₙ)) = [p₁ − ½, p₁ + ½] × ··· × [pₙ − ½, pₙ + ½].

For example, CA(p) of a point p ∈ Z³ is the closed unit cube centered at this point with faces parallel to the coordinate planes. The continuous analog of a digital set X ⊆ Zⁿ is defined as CA(X) = ∪{CA(x) : x ∈ X}; for example, see Figure 2. Formally, CA can be viewed as a function CA : P(Zⁿ) → P(Rⁿ), and CA(p) as a short form for CA({p}) if p ∈ Zⁿ. We denote Cⁿ = {CA(p) : p ∈ Zⁿ}. In particular, C² is the set of closed squares and C³ is the set of closed cubes centered at points with integer coordinates.
Figure 2. The continuous analog of the digital set composed of the eight points is the union of the eight cubes.
Definition 2 We also define a dual function Dig_Z to CA, which we call Zⁿ subset (or element) digitization: Dig_Z : P(Rⁿ) → P(Zⁿ) is given by Dig_Z(Y) = {p ∈ Zⁿ : p ∈ Y}.

Clearly, we have Dig_Z(CA(X)) = X for every X ⊆ Zⁿ. The equation CA(Dig_Z(Y)) = Y holds only if Y ⊆ Rⁿ is a union of some cubes in Cⁿ. Now we give the main definition of this chapter.

Definition 3 A digital set X ⊆ Zⁿ is well-composed iff CA(X) is an n-dimensional bordered manifold.

Recall that a subset M of Rⁿ is an n-dimensional bordered manifold if each point in M has a neighborhood homeomorphic to a relatively open subset of a closed half-space in Rⁿ and M is not an n-dimensional manifold. A subset N of Rⁿ is an n-dimensional manifold if each point in N has a neighborhood homeomorphic to Rⁿ. We call each connected component of an n-dimensional manifold an n-dimensional surface. For example, the 3D digital set X in Figure 2 is well-composed, because CA(X) is a 3D bordered manifold (i.e., every interior point of CA(X) has a neighborhood that is an open subset of R³ and every boundary point of CA(X) has a neighborhood homeomorphic to a neighborhood of a boundary point of a closed half-space in R³). The two digital sets in Figure 3 are not well-composed; for example, the joint point of the two cubes in Figure 3(2) does not have a neighborhood homeomorphic to a relatively open subset of a closed half-space in R³. According to their definition, we could name well-composed sets digital bordered manifolds.
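The identity Dig_Z(CA(X)) = X can be checked mechanically. The following sketch (the helper names are illustrative assumptions, not from the text) tests CA-membership coordinatewise and restricts Dig_Z to a finite window of candidate integer points:

```python
def in_ca(y, X):
    """True iff the real point y lies in CA(X), the union of closed unit
    cubes centered at the integer points of the digital set X."""
    return any(all(abs(yi - xi) <= 0.5 for yi, xi in zip(y, x)) for x in X)

def dig(region, candidates):
    """Element digitization Dig_Z (Definition 2), restricted to a finite
    set of candidate integer points: keep those lying in the region."""
    return {p for p in candidates if region(p)}

# Dig_Z(CA(X)) = X for every digital set X.
X = {(0, 0, 0), (1, 0, 0), (0, 1, 0)}
window = {(i, j, k) for i in range(-2, 4)
                    for j in range(-2, 4)
                    for k in range(-2, 4)}
assert dig(lambda p: in_ca(p, X), window) == X
```

The converse identity CA(Dig_Z(Y)) = Y would fail for a region Y that is not a union of cubes in C³, exactly as the text notes.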
Figure 3. A digital set X ⊆ Z³ is well-composed iff the critical configurations of cubes (1) and (2) (modulo reflections and rotations) do not occur in CA(X) or CA(Xᶜ).
For Y ⊆ Rⁿ we denote by bdY the topological boundary of Y in the standard topology on Rⁿ induced by the Euclidean metric. Equivalently, one can define well-composed sets in the following way:

Definition 4 We call a digital set X ⊆ Zⁿ well-composed if bdCA(X) is an (n−1)D manifold.

For example, a 3D digital set is well-composed if the boundary surface of its continuous analog is a 2D manifold, that is, it "looks" locally like a planar open set. This definition implies a simple correspondence between boundaries of a digital set (that are defined based on graph theory) and the boundary surface of its continuous analog, since the digital set is identified with the union of cubes centered at its points. Thus, we can use well-known properties of continuous boundary surfaces, like the Jordan-Brouwer separation theorem, to determine and analyze properties of the digital set. The equivalence of both definitions follows from the fact that a subset M ⊆ Rⁿ that is a finite union of n-dimensional cubes is an n-dimensional bordered manifold iff bdM is an (n−1)D manifold.

Although well-composed sets are defined for any dimension n, we will concentrate on their properties for n = 2 and 3, since these dimensions play the most important role in digital image processing. Now we consider more carefully the definition and properties of 2D well-composed sets. Let X ⊆ Z² be a digital set. The continuous analog CA(X) is the union of closed unit squares centered at points of X. The boundary bdCA(X) is the union of the set of unit line segments, each of which is the common edge of a square in CA(X) and a square in CA(Xᶜ). Observe that there is only one kind of adjacency for line segments contained in bdCA(X): two segments are adjacent if they have an endpoint in common. Hence, there is only one kind of connectedness for bdCA(X). The unit line segments contained in bdCA(X) correspond to pairs of 4-adjacent points (p, q) such that p ∈ X and q ∉ X.
(The adjacency relations are defined in what follows.) In the 2D case, we obtain the following definition: A digital set X is
Figure 4. The continuous analog of a 2D well-composed picture does not contain this critical configuration and its 90° rotation.
well-composed if bdCA(X) is a 1D manifold (each point in bdCA(X) has a neighborhood homeomorphic to R). The following characterization of 2D well-composed sets can be easily proven:

Theorem 1 A digital set X ⊆ Z² is well-composed iff the critical configuration shown in Figure 4 and its 90° rotation do not occur in CA(X) iff the only possible 2×2 configurations of boundary squares in CA(X) (modulo 90° rotation) are as in Figure 5. ■

The preceding definition of well-composed sets is based on the region interpretation of digital sets. Now we give an equivalent point-based definition of 2D well-composed sets. First we need to define adjacency relations for digital points: The 4-neighbors (or direct neighbors) of a point (x, y) in Z² are its four horizontal and vertical neighbors (x + 1, y), (x − 1, y) and (x, y + 1), (x, y − 1) (see Figure 6(a)). The 8-neighbors of a point (x, y) in Z² are its four horizontal and vertical neighbors together with its four diagonal neighbors (x+1, y+1), (x+1, y−1) and (x−1, y+1), (x−1, y−1) (see Figure 6(b)). If two points P, Q ∈ Z² are n-neighbors, we also call them n-adjacent and write n-adj(P, Q), where n = 4 or 8. For n = 4 or 8, the n-neighborhood of a point P = (x, y) in Z² is the set Nₙ(P) consisting of P and its n-neighbors. Nₙ*(P) is the set of all neighbors of P without P itself. Note that Nₙ*(P) = Nₙ(P)\{P}. The points in N₈*(P)
Figure 5. The only possible 2 × 2 configurations of boundary squares in CA(X) of a well-composed set X ⊆ Z² (modulo 90° rotation).
LONGIN JAN LATECKI
Figure 6. (a) shows four 4-adjacent points (or 4-neighbors) of the center point and (b) shows eight 8-adjacent points (or 8-neighbors) of the center point.
are numbered 0 to 7 according to a fixed scheme, so that the neighbors N₀(P), . . . , N₇(P) occupy the eight cells of the 3 × 3 grid surrounding P.
Each Nᵢ, i = 0, . . . , 7, is a function from Z² to Z² that maps a point P to one of its neighbors according to this scheme. A digital set X is n-connected if for every pair of points P, Q in X, there is an n-path contained in X from P to Q. Sometimes we will say that X is connected if the adjacency relation is clear from the context. A (connected) n-component of a set S is a greatest n-connected subset of S. In particular, if S is connected, then the only component of S is S itself.

The following proposition (which can be easily proven) translates the region-based definition of 2D well-composed sets to a point-based definition that requires the graph structure introduced by adjacency relations between points:

Proposition 1 A digital set X ⊆ Z² is well-composed iff for every x, y ∈ X such that x is 8-adjacent but not 4-adjacent to y, there exists z ∈ X that is 4-adjacent to both x and y. ■

As a simple consequence we obtain Proposition 2, below. Under a binary digital image we understand a pair (Z², X), where X ⊆ Z² and either X or its complement Xᶜ is finite. On X and its complement Xᶜ, there is defined an adjacency relation. Every point in X is called a foreground or black point and assigned value 1; every point in Xᶜ = Z² \ X is called a background or white point and assigned value 0.

Definition 5 We call a binary digital image (Z², X) well-composed if the set X is a well-composed set.
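Proposition 1 yields a direct local test. The following is a minimal sketch of such a test; the function name and the set-of-tuples representation of a digital set are our own choices, not from the text.

```python
# Sketch of a 2D well-composedness test (Proposition 1): a digital set
# X ⊆ Z² is well-composed iff every pair of diagonally adjacent points
# (8-adjacent but not 4-adjacent) shares a common 4-neighbor in X.

def is_well_composed_2d(X):
    """X is a finite set of (x, y) integer tuples (the black points)."""
    X = set(X)
    for (x, y) in X:
        # Check the two diagonal directions "ahead" of (x, y); scanning
        # all points of X covers every diagonal pair exactly once.
        for dx, dy in ((1, 1), (1, -1)):
            q = (x + dx, y + dy)
            if q in X:
                # The two common 4-neighbors of (x, y) and q:
                if (x + dx, y) not in X and (x, y + dy) not in X:
                    return False
    return True

# The critical configuration: two diagonal points whose shared
# 4-neighbors are both missing.
assert not is_well_composed_2d({(0, 0), (1, 1)})
# Filling one shared 4-neighbor repairs the configuration.
assert is_well_composed_2d({(0, 0), (1, 1), (1, 0)})
```

Because the test only inspects 2 × 2 neighborhoods, it is exactly the kind of local property that, as discussed later, can be evaluated efficiently in parallel.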
Figure 7. The union of two well-composed sets need not be a well-composed set.
Proposition 2 An image (Z², X) is well-composed iff every 8-component of X is also a 4-component of X and every 8-component of Xᶜ is also a 4-component of Xᶜ. ■

Under a segmented digital image we understand (Z², X₀, . . . , Xₖ), where Xᵢ ⊆ Z² and either Xᵢ or its complement is finite for i = 0, . . . , k, and Xᵢ ∩ Xⱼ = ∅ if i ≠ j. A binary digital image is a special case of a segmented image in which we distinguish only two objects, the foreground and the background.

Definition 6 We call a segmented digital image (Z², X₀, . . . , Xₖ) well-composed if each Xᵢ is a well-composed set for i = 0, . . . , k.

Observe that the set A ∪ B does not need to be well-composed even if A and B are well-composed (see Figure 7). This definition of a well-composed segmented image is equivalent to Definition 3.2 in Wang and Bhattacharya (1997) applied to X = X₀ ∪ ⋯ ∪ Xₖ, since then X₀, . . . , Xₖ is a partition of X in their work. The case of 3D well-composed sets is treated in the next section.
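Proposition 2 can be observed computationally by comparing 4- and 8-component structures. Below is a self-contained sketch restricted to finite foregrounds (so the complement half of the proposition is not exercised); all names and the set-of-tuples representation are our own.

```python
# Sketch of Proposition 2: for a well-composed set the 8-components and
# 4-components of the foreground coincide; for the critical diagonal
# configuration they differ.

from collections import deque

def components(X, neighbors):
    """Partition the finite set X into connected components (BFS)."""
    X, comps = set(X), []
    while X:
        seed = X.pop()
        comp, queue = {seed}, deque([seed])
        while queue:
            p = queue.popleft()
            for q in neighbors(p):
                if q in X:
                    X.remove(q)
                    comp.add(q)
                    queue.append(q)
        comps.append(frozenset(comp))
    return set(comps)

def n4(p):
    x, y = p
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

def n8(p):
    x, y = p
    return n4(p) + [(x + 1, y + 1), (x + 1, y - 1),
                    (x - 1, y + 1), (x - 1, y - 1)]

# Two diagonal points (not well-composed): one 8-component, two 4-components.
X = {(0, 0), (1, 1)}
assert len(components(X, n8)) == 1 and len(components(X, n4)) == 2

# Adding a shared 4-neighbor makes the set well-composed; the component
# structures now agree, as Proposition 2 predicts.
Y = {(0, 0), (1, 0), (1, 1)}
assert components(Y, n8) == components(Y, n4)
```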
III. 3D Well-Composed Sets

A. Local Properties of the Continuous Analog

This section is based on Latecki (1997). The definition of well-composed 3D sets is nicely visualized by Proposition 3, which shows the equivalence of this definition to simple local conditions on boundary faces. Since CA(X) for X ⊆ Z³ is a union of cubes, bdCA(X) is a union of squares, that is, sides of the cubes that are contained in bdCA(X). We call a corner point of such a square a corner point of bdCA(X).

Proposition 3 A digital set X ⊆ Z³ is well-composed iff for any corner point x ∈ bdCA(X), the boundary faces of CA(X) that contain x have one of the six configurations shown in Figure 8 (modulo reflections and rotations).
Figure 8. In the continuous analog of a 3D well-composed picture, only these configurations of boundary faces can occur around a corner point on the object boundary.
Proof "⇐" Clearly, if a point x ∈ bdCA(X) lies in the interior of some square contained in the boundary bdCA(X), then x has a neighborhood homeomorphic to R². If one of the configurations shown in Figure 8 (modulo reflections and rotations) appears at a corner point x of some square in bdCA(X), then x has a neighborhood homeomorphic to R². Hence bdCA(X) is a 2-manifold.

"⇒" Let x ∈ bdCA(X) be a corner point of some square in bdCA(X). In this case eight cubes share x as their common corner point; some of them are contained in CA(X) and some are not. By simple analysis of all possible configurations of the eight cubes, we will show that the boundary faces of CA(X) that contain point x can have only the configurations shown in Fig. 8 (modulo rotations and reflections), since x has a neighborhood homeomorphic to an open subset of R².

We start this analysis with one cube q ⊆ R³ whose corner point is x such that q = CA(p) for some point p ∈ X ⊆ Z³. If all other cubes whose corner point is x are contained in CA(Xᶜ), then the boundary faces of q that contain x form configuration (a) in Figure 8. If there is one more cube r contained in CA(X) that shares x with q, then r must share a face with q, since configurations (1) and (2) in Figure 3 are not allowed (x would not have a neighborhood homeomorphic to R²). Thus, the boundary faces of q ∪ r that contain x form configuration (b) in Figure 8. By similar arguments, if we add a third cube, we only obtain configuration (c) of boundary faces. If we add a fourth cube, we obtain one of configurations (d), (e), or (f).
Adding a fifth cube will transform configuration (d), (e), or (f) of boundary faces to configuration (c), which is now viewed as having five cubes in CA(X). Adding a sixth cube will transform configuration (c) of boundary faces (of five cubes) to configuration (b), which is now viewed as having six cubes in CA(X). Adding a seventh cube will yield configuration (a) of boundary faces of seven cubes in CA(X). Thus, we have shown that the boundary faces of CA(X) that contain point x can have only the six configurations in Figure 8 (modulo rotations and reflections). ■

Observe that the six face neighborhoods of a corner point shown in Figure 8 are exactly the same as shown in Chen and Zhang (1993) and Francon (1995). The fact that only the six boundary configurations shown in Figure 8 are possible for a corner point x ∈ bdCA(X) of a well-composed set X ⊆ Z³ (Proposition 3) gives a nice motivation for calling the boundary of a 3D well-composed set a digital 2D manifold. The points of this digital manifold are the corner points x ∈ bdCA(X). Analogous to continuous 2-manifolds, we can define derivatives at corner points x ∈ bdCA(X) and their curvature. Clearly, these definitions are based on finite differences with respect to the neighboring corner boundary points. Further, we can classify corner points x ∈ bdCA(X) according to their discrete derivatives (see Figure 8):

1. a corner point x in (a) and (c) is an elliptic point;
2. a corner point x in (b) and (d) is a parabolic point; and
3. a corner point x in (e) and (f) is a hyperbolic point.

Observe that there is only one connectedness relation on faces contained in the boundary of the continuous analog CA(X) of a well-composed picture (Z³, X): A set of boundary faces S is a corner-connected component of bdCA(X) iff S is an edge-connected component of bdCA(X).
As every boundary bdCA(X) is a finite union of some set of closed faces S, that is, bdCA(X) = ∪S, the statement that bdCA(X) is a simple closed surface means here that bdCA(X) is a connected 2D manifold in R³. Hence, we obtain the following proposition.

Proposition 4 A digital set X ⊆ Z³ is well-composed iff every component of bdCA(X) is a simple closed surface. ■

The definition of well-composed sets is nicely visualized by Proposition 5, which shows the equivalence of this definition to two simple local conditions on voxels in the continuous analog.

Proposition 5 A digital set X ⊆ Z³ is well-composed iff the critical configurations of cubes (1) and (2) in Figure 3 (modulo reflections and rotations) do not occur in CA(X) or CA(Xᶜ).
Proof The proof follows directly from Proposition 3, since for any corner point x ∈ bdCA(X), the boundary faces of CA(X) that contain x have one of the six configurations shown in Figure 8 iff the critical configurations of cubes (1) and (2) do not occur in CA(X) or CA(Xᶜ). ■

In Artzy et al. (1981) the digital 3D sets that do not contain configuration (1) in Figure 3 (modulo reflections and rotations) are defined to be solid. However, configuration (2) can occur in a solid set.

B. Digital Characterization of Well-Composed Sets

In this section we give a "digital characterization" (using only points in Z³) of well-composed sets. First we need definitions of adjacency relations on points in Z³ that induce a graph structure. Due to the duality between points in Z³ and voxels (i.e., cubes centered at these points), the following definitions apply to points as well as to voxels in R³. While the definitions for points reflect the graph-theoretical structure of Z³, the definitions for voxels are based on topological properties of the continuous analogs.

Two distinct points p = (p₁, p₂, p₃), q = (q₁, q₂, q₃) ∈ Z³ are said to be 26-adjacent (26-neighbors) if |p₁ − q₁| ≤ 1, |p₂ − q₂| ≤ 1, and |p₃ − q₃| ≤ 1, that is, all three coordinates of p and q differ by at most 1. All 26 points different from x in Figure 9 illustrate the 26-neighbors of x. Two distinct points p, q ∈ Z³ are said to be 18-adjacent (18-neighbors) if they are 26-adjacent and |p₁ − q₁| + |p₂ − q₂| + |p₃ − q₃| ≤ 2, that is, if at most two of the coordinates of p and q differ by 1. Two distinct points p, q ∈ Z³ are said to be 6-adjacent (6-neighbors) if |p₁ − q₁| + |p₂ − q₂| + |p₃ − q₃| = 1, that is, if two of the coordinates of p and q are the same and the third coordinates differ by 1.

We will denote the set of closed faces of cubes in C by F, that is, each f ∈ F is a unit closed square in R³ parallel to one of the coordinate planes. Two distinct points p, q ∈ Z³ are 6-adjacent iff the voxels CA(p) and CA(q) share a face. In this case, we say that CA(p) and CA(q) are face-adjacent.
We say that voxels CA(p) and CA(q) are edge-adjacent if the cubes CA(p) and CA(q) share an edge but not a face (i.e., CA(p) ∩ CA(q) is a line segment), which is equivalent to the fact that p, q ∈ Z³ are 18-adjacent but not 6-adjacent (see Figure 9). We say that voxels CA(p) and CA(q) are corner-adjacent if the cubes CA(p) and CA(q) share a vertex but not an edge (i.e., CA(p) ∩ CA(q) is a single point), which is equivalent to the fact that p, q ∈ Z³ are 26-adjacent but not 18-adjacent (see Figure 9). In this case, we say that p and q are diagonally adjacent. A set X ⊆ Z³ is k-adjacent to a point p ∈ Z³ if there exists q ∈ X such that p and q are k-adjacent, where k = 6, 18, 26.
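The three coordinate-wise definitions above can be sketched as small predicates; the function names are our own, and the encoding exploits the fact that, among points all of whose coordinates differ by at most 1, the 1-norm of the difference counts how many coordinates differ.

```python
# Sketch of the 6-, 18-, and 26-adjacency relations on Z³.

def adjacency(p, q):
    """Return the tightest adjacency class (6, 18, or 26) of two
    distinct points, or None if they are not 26-adjacent."""
    d = [abs(a - b) for a, b in zip(p, q)]
    if p == q or max(d) > 1:
        return None          # identical, or too far apart to be neighbors
    s = sum(d)               # number of coordinates differing (each by 1)
    return {1: 6, 2: 18, 3: 26}[s]

def k_adjacent(p, q, k):
    """k-adjacency is cumulative: 6-neighbors are also 18- and 26-neighbors."""
    a = adjacency(p, q)
    return a is not None and a <= k

assert adjacency((0, 0, 0), (1, 0, 0)) == 6    # voxels share a face
assert adjacency((0, 0, 0), (1, 1, 0)) == 18   # voxels share only an edge
assert adjacency((0, 0, 0), (1, 1, 1)) == 26   # voxels share only a corner
assert k_adjacent((0, 0, 0), (1, 0, 0), 26)    # 6-neighbors are 26-adjacent
```

The three return values correspond exactly to the face-, edge-, and corner-adjacency of the voxels CA(p) and CA(q) described in the text.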
Figure 9. The slightly larger black balls illustrate the six 6-neighbors of the center point x, the gray balls illustrate the edge neighbors of x, and the small black balls illustrate the corner neighbors of x.
Nₖ(p) denotes the set containing p ∈ Z³ and all points k-adjacent to p, and N*ₖ(p) denotes Nₖ(p) \ {p}, where k = 6, 18, 26. N₂₆(p) is also referred to as N(p) and is called the neighborhood of p, whereas N₂₆(p) \ {p} is referred to as N*(p).

A common face of two cubes centered at points p, q ∈ Z³ (i.e., a unit square parallel to one of the coordinate planes) can be identified with the pair (p, q). Such pairs are called "surface elements" in Herman (1992), as they are constituent parts of object surfaces. We can extend CA to apply also to pairs of points by defining CA((p, q)) = CA(p) ∩ CA(q) for p, q ∈ Z³, and CA(B) = ∪{CA(x) : x ∈ B}, where B is a set of pairs of points in Z³. In particular, we have F = {CA((p, q)) : p, q ∈ Z³ and p is 6-adjacent to q}.

The (face) boundary of a continuous analog CA(X) of a digital set X ⊆ Z³ is defined as the union of the set of closed faces each of which is the common face of a cube in CA(X) and a cube not in CA(X). Observe that the face boundary of CA(X) is just the topological boundary bdCA(X) in R³. The face boundary bdCA(X) can also be defined using only cubes of the set CA(X) as the union of the set of closed faces each of which is a face of exactly one cube in CA(X). We have bdCA(X) = bdCA(Xᶜ), where Xᶜ = Z³ \ X is the complement of X.

The (6-)boundary of a digital set X ⊆ Z³ can be defined as the set of pairs bd₆X = {(p, q) : p ∈ X and q ∉ X and p is 6-adjacent to q}. We have bdCA(X) = CA(bd₆X) = CA(bd₆(Xᶜ)).
As bdCA(X) = bdCA(Xᶜ), a set X is well-composed iff the set Xᶜ is well-composed. Two distinct faces f₁, f₂ ∈ F are edge-adjacent if they share an edge, that is, if f₁ ∩ f₂ is a line segment in R³. Two distinct faces f₁, f₂ are corner-adjacent if they share a vertex but not an edge, that is, if f₁ ∩ f₂ is a single point in R³.

Proposition 6 A 3D digital set X ⊆ Z³ is well-composed iff the following conditions hold for i = 0, 1, where X₀ = X and X₁ = Xᶜ:

(C1) for every two 18-adjacent but not 6-adjacent points x, y in Xᵢ, there is a point z in Xᵢ that is 6-adjacent to both x and y; and

(C2) for every two 26-adjacent but not 18-adjacent points x, y in Xᵢ, there is a 6-path joining x to y in N(x) ∩ N(y) ∩ Xᵢ.

Proof Let X = Xᵢ, where i = 0, 1. We show first that the negation of condition (C1) is equivalent to the fact that configuration (1) (Figure 3) occurs in CA(X). If configuration (1) occurs in CA(X), then there exist four distinct points x, y ∈ X and a, b ∉ X such that CA(x), CA(y), CA(a), CA(b) share an edge. Then x, y ∈ X are 18- but not 6-adjacent in X. Only the points a, b are 6-adjacent to both x and y, but a, b ∉ X. Conversely, if there exist two 18- but not 6-adjacent points x, y in X such that there does not exist a point z in X that is 6-adjacent to both x and y, then configuration (1) (Figure 3) occurs in CA(X).

Now we show that if configuration (2) (Figure 3) occurs in CA(X), then condition (C2) does not hold. Let x, y ∈ X be such that CA(x) and CA(y) form configuration (2). Then x, y ∈ X are 26- but not 18-adjacent in X. Figure 10 shows the intersection N(x) ∩ N(y) of two 26- but not 18-adjacent points x and y. It is easily seen that the other six points in N(x) ∩ N(y) do not belong to X. Therefore, there is no 6-path joining x to y in N(x) ∩ N(y) ∩ X.

Finally, we assume the negation of condition (C2). Let x, y in X be two 26-adjacent but not 18-adjacent points such that there is no 6-path joining x to y in N(x) ∩ N(y) ∩ X.
This implies that configuration (2) or configuration (1) occurs in CA(X). ■

The following proposition implies that there is only one kind of connected component in a well-composed picture, because the 26-, 18-, and 6-connected components are equal.

Proposition 7 Let X ⊆ Z³ be well-composed. Then each 26-component of Xᵢ is a 6-component of Xᵢ and each 18-component of Xᵢ is a 6-component of Xᵢ, where i = 0, 1 and X₀ = X and X₁ = Xᶜ.
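The local criteria of Propositions 5 and 6 can be sketched as a scan over 2 × 2 × 2 blocks of voxels. In the sketch below, configuration (1) appears as a checkerboard pattern on some 2 × 2 square of voxels sharing an edge, and configuration (2) as a block whose only occupied (or only unoccupied) voxels are two antipodal corners; this bit-pattern encoding of the critical configurations is our own, not quoted from the text.

```python
# Sketch of a local 3D well-composedness test (Propositions 5/6):
# reject X iff some 2x2 square of voxels carries the checkerboard of
# configuration (1), or some 2x2x2 block realizes configuration (2)
# in X or in its complement.

from itertools import product

def is_well_composed_3d(X):
    """X is a finite set of (x, y, z) integer tuples (the black voxels)."""
    X = set(X)
    corners = set()                     # candidate 2x2x2 block origins
    for (x, y, z) in X:
        for dx, dy, dz in product((-1, 0), repeat=3):
            corners.add((x + dx, y + dy, z + dz))
    for (x, y, z) in corners:
        # Occupancy of the 2x2x2 block with origin (x, y, z).
        b = {(dx, dy, dz): (x + dx, y + dy, z + dz) in X
             for dx, dy, dz in product((0, 1), repeat=3)}
        # Configuration (1): checkerboard on any 2x2 face of the block
        # (the same pattern covers both X and its complement).
        for axis in range(3):
            for c in (0, 1):
                face = [b[d] for d in sorted(product((0, 1), repeat=3))
                        if d[axis] == c]
                if face in ([True, False, False, True],
                            [False, True, True, False]):
                    return False
        # Configuration (2): exactly two antipodal voxels of the block in
        # X, or exactly two antipodal voxels in the complement.
        ones = [d for d, v in b.items() if v]
        zeros = [d for d, v in b.items() if not v]
        for pts in (ones, zeros):
            if len(pts) == 2 and all(a + c == 1 for a, c in zip(*pts)):
                return False
    return True

assert not is_well_composed_3d({(0, 0, 0), (1, 1, 1)})   # configuration (2)
assert not is_well_composed_3d({(0, 0, 0), (1, 1, 0)})   # configuration (1)
# A 6-connected "staircase" of voxels passes the test.
assert is_well_composed_3d({(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)})
```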
Figure 10. The slightly larger black balls illustrate the intersection N(x) ∩ N(y) of two 26- but not 18-adjacent points x and y.
Proof Let x = x₁, x₂, . . . , xₙ = y be a 26-path joining x to y in Xᵢ. By condition (C2) in Proposition 6, for any two 26-neighbors xⱼ, xⱼ₊₁, j = 1, . . . , n − 1, there is a 6-path joining xⱼ to xⱼ₊₁ in Xᵢ. Thus, there exists a 6-path joining x to y in Xᵢ. The argument for 18-components is similar. ■

C. Jordan-Brouwer Separation Theorem

An important motivation for introducing 3D well-composed pictures is the following digital version of the Jordan-Brouwer separation theorem. We recall that in a digital picture (Z³, X) either X or its complement Xᶜ is finite and nonempty.

Theorem 2 If a 3D digital set X is well-composed, then for every connected component S of bdCA(X), R³ \ S has precisely two connected components, of which S is the common boundary.

Proof The proof of this theorem follows directly from Theorem 3, which is stated at the end of this section. It is sufficient to observe that by
Proposition 3, a connected component of bdCA(X) is a strongly connected polyhedral surface without boundary, which we define in what follows. ■

Note that if a digital picture is not well-composed, Theorem 2 does not hold; for example, if X is a two-point digital set such that CA(X) is as shown in Figure 3.

Now we define polyhedral surfaces in R³. They were used in Kong and Roscoe (1985) to prove 3D digital analogs of the Jordan curve theorem. Let n ≥ 0 and let {Tᵢ : 0 ≤ i ≤ n} be a set of closed triangles in R³. The set ∪{Tᵢ : 0 ≤ i ≤ n} is called a polyhedral surface if the following conditions both hold:

(i) If i ≠ j, then Tᵢ ∩ Tⱼ is either a side of both Tᵢ and Tⱼ, or a corner of both Tᵢ and Tⱼ, or the empty set; and
(ii) each side of a triangle Tᵢ is a side of at most one other triangle.

The (1D) boundary of a polyhedral surface S = ∪{Tᵢ : 0 ≤ i ≤ n} is defined as ∪{s : s is a side of exactly one Tᵢ}. Observe that this definition produces the same boundary of S for every dissection of S into triangles fulfilling (i) and (ii). We say that S is a polyhedral surface without boundary if the boundary of S is the empty set.

A polyhedral surface S is strongly connected if for any finite set of points F ⊆ S, the set S \ F is polygonally connected, where the definition of a polygonally connected set is the following: If u and v are two distinct points in R³, then uv denotes the straight line segment joining u to v. Suppose n > 0 and {xᵢ : 0 ≤ i ≤ n} is a set of distinct points in R³ such that whenever i ≠ j, xᵢxᵢ₊₁ ∩ xⱼxⱼ₊₁ = {xᵢ, xᵢ₊₁} ∩ {xⱼ, xⱼ₊₁}; then arc(x₀, xₙ) = ∪{xᵢxᵢ₊₁ : 0 ≤ i < n} is a simple polygonal arc joining x₀ to xₙ. We call a subset S of R³ polygonally connected if any two points in S can be joined by a simple polygonal arc contained in S.

Now we can state the Jordan-Brouwer separation theorem for a strongly connected polyhedral surface without boundary. This theorem is a very important result of combinatorial topology (e.g., see Aleksandrov, 1960).
It was applied in Kong and Roscoe (1985) to establish separation theorems for digital surfaces:

Theorem 3 If S is a strongly connected polyhedral surface without boundary, then R³ \ S has precisely two components, and one of the components is bounded. S is the boundary of each component.

Our proof of Theorem 2 follows directly from the Jordan-Brouwer separation theorem stated in Theorem 3, because every connected component S of bdCA(X) of a well-composed set X is a strongly connected polyhedral surface without boundary.
D. Properties of Boundary Faces

Recall that we interpret Z³ as a set of points with integer coordinates in the space R³, C is the set of closed unit upright cubes centered at points of Z³, and F is the set of closed faces of cubes in C, that is, each f ∈ F is a unit closed square in R³ parallel to one of the coordinate planes. Note that C = {CA(p) : p ∈ Z³} and F = {CA((p, q)) : p, q ∈ Z³ and p is 6-adjacent to q}. We also recall that the function Dig : P(R³) → P(Z³) is defined by Dig(Y) = {p ∈ Z³ : p ∈ Y}. We begin this section with a theorem relating well-composed pictures to simple closed surfaces composed of faces in F.

Theorem 4 Let S ⊆ F be a finite and nonempty set of faces in R³. ∪S is a simple closed surface (i.e., ∪S is a connected and compact 2D manifold in R³) iff R³ \ ∪S has precisely two components X₁ and X₂, ∪S is the common boundary of X₁ and X₂, and the set Dig(X₁) ⊆ Z³ is well-composed.

The proof of this theorem will be given below. Observe that the implication "⇐" in Theorem 4 would not be true if the set Dig(X₁) were not required to be well-composed. Let S = bdCA(D), where D is a digital set of 1's in the following 2 × 2 × 2 configuration (on a background of 0's), shown as its two 2 × 2 layers:

1 0    1 1
1 1    0 1
Then R³ \ S has precisely two components, but S is not a simple closed surface, as the common corner of the six black (i.e., 1-) voxels does not have a neighborhood homeomorphic to R². To better understand the equivalence in Theorem 4, we consider in Theorem 5 the six simple local configurations of faces shown in Figure 8.

Theorem 5 If S ⊆ F is a finite and nonempty set of faces in R³, then the following conditions are equivalent:

(i) ∪S is a simple closed surface (i.e., ∪S is a connected and compact 2D manifold in R³); and
(ii) S is corner-connected and for every corner point x ∈ ∪S, the boundary faces of S that contain x as their corner point have one of the six configurations shown in Figure 8 (modulo reflections and rotations).

Proof "(i) ⇒ (ii)" As ∪S is a simple closed surface, each point s ∈ ∪S has a neighborhood homeomorphic to R². Thus, in particular, each corner point x of a face in S has a neighborhood homeomorphic to R². By simple case checking as in the second part of the proof of Proposition 3, it can be shown that Figure 8 shows all possible configurations (modulo rotations and reflections) of faces in F that share a common corner point x such that
x has a neighborhood homeomorphic to R². Now as ∪S is connected, the set of faces S must be corner-connected. Thus, we obtain (i) ⇒ (ii).

"(ii) ⇒ (i)" We assume (ii). Then every point in the 2D interior of a face in S clearly has a neighborhood homeomorphic to R². As every edge belongs to exactly two faces in S, every point of an edge (except the two corner points) has a neighborhood homeomorphic to R². Because for every corner point x of a face in S, the set of faces sharing x has one of the six configurations of faces shown in Figure 8, x has a neighborhood homeomorphic to R². Thus, ∪S is a 2D manifold. ∪S is a connected subset of R³, because S is corner-connected. As ∪S is a finite union of closed squares in R³, ∪S is compact. Therefore, ∪S is a simple closed surface. ■

Now we are ready to prove Theorem 4.

Proof of Theorem 4 "⇒" Let ∪S be a simple closed surface. Then S satisfies condition (ii) of Theorem 5. Consequently, ∪S is a strongly connected polyhedral surface without boundary. By Theorem 3, R³ \ ∪S has precisely two components X₁ and X₂, and ∪S is the common boundary of X₁ and X₂. It remains to show that the digital set Dig(X₁) is well-composed. As Xᵢ ∪ ∪S = CA(Dig(Xᵢ)), we have ∪S = bd(CA(Dig(Xᵢ))) for i = 1, 2. Thus, bd(CA(Dig(Xᵢ))) is a 2D manifold for i = 1, 2. We obtain that (Z³, Dig(Xᵢ)) is well-composed for i = 1, 2.

"⇐" Because Dig(X₁) is well-composed, bd(CA(Dig(X₁))) is a 2D manifold. As the closed set X₁ ∪ ∪S is a union of some cubes in C, we obtain X₁ ∪ ∪S = CA(Dig(X₁)). Hence ∪S = bd(CA(Dig(X₁))), which means that ∪S is a 2D manifold in R³. As ∪S is a finite union of closed squares in R³, it is compact. It remains to show that ∪S is connected. If ∪S were not connected, then there would be more than two components of R³ \ ∪S, because every connected component of ∪S would be a strongly connected polyhedral surface without boundary, and therefore, it would satisfy Theorem 3. ■

E.
Connected Components in 3D Well-Composed Pictures

Rosenfeld and Kong (1991) proved the following theorem for 2D digital pictures:

Theorem 6 For every finite and nonempty set X ⊆ Z², the boundary bdCA(X) is a simple closed curve (i.e., bdCA(X) is connected and each line segment in bdCA(X) is adjacent to exactly two others) iff X and Xᶜ are both 4-connected. ■

As is shown in Rosenfeld and Kong (1991), an analogous theorem does
not hold in 3D: Let X be the set of 1's in the following 2 × 2 × 2 configuration (on a background of 0's), shown as its two 2 × 2 layers:

1 1    1 1
0 1    1 0
Then X and Xᶜ are both 6-connected, but bdCA(X) is not a simple closed surface. However, the inverse implication is proved in Rosenfeld and Kong (1991), Proposition 9:

Theorem 7 If the boundary bdCA(X) of a set X ⊆ Z³ is a simple closed surface, then X and Xᶜ are both 6-connected. ■

Using the concept of well-composedness, we can generalize Theorem 6 to three dimensions:

Theorem 8 For every finite and nonempty set X ⊆ Z³, the boundary bdCA(X) is a simple closed surface iff X and Xᶜ are both 6-connected and the set X (and consequently Xᶜ) is well-composed.

Proof "⇒" By Theorem 7, we obtain that X and Xᶜ are both 6-connected. As a simple closed surface is in particular a 2D manifold, we obtain that X is well-composed.

"⇐" As X and Xᶜ are 6-connected, CA(X) and CA(Xᶜ) are connected subsets of R³ and bdCA(X) is their common boundary. Therefore, bdCA(X) is also a connected subset of R³. As X ⊆ Z³ is finite, bdCA(X) is compact. By definition, the fact that (Z³, X) is well-composed implies that bdCA(X) is a 2D manifold. Consequently, bdCA(X) is a simple closed surface. ■
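The two connectivity hypotheses of Theorem 8 are easy to check algorithmically. The sketch below tests 6-connectedness of a finite X and of its (infinite) complement; for the complement we work inside a bounding box padded by one voxel, which suffices because everything outside the box is plainly 6-connected to the box's border shell. Names and the set-of-tuples representation are our own.

```python
# Sketch of the 6-connectivity checks appearing in Theorem 8.

from collections import deque

def six_connected(S):
    """BFS over a finite voxel set S using 6-adjacency."""
    S = set(S)
    if not S:
        return True
    start = next(iter(S))
    seen, queue = {start}, deque([start])
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                           (0, -1, 0), (0, 0, 1), (0, 0, -1)):
            q = (x + dx, y + dy, z + dz)
            if q in S and q not in seen:
                seen.add(q)
                queue.append(q)
    return seen == S

def complement_six_connected(X):
    """Test 6-connectedness of Z³ \\ X inside a padded bounding box."""
    X = set(X)
    lo = [min(p[i] for p in X) - 1 for i in range(3)]
    hi = [max(p[i] for p in X) + 1 for i in range(3)]
    box = {(x, y, z)
           for x in range(lo[0], hi[0] + 1)
           for y in range(lo[1], hi[1] + 1)
           for z in range(lo[2], hi[2] + 1)}
    return six_connected(box - X)

# A solid 2x2x2 block: both X and its complement are 6-connected.
block = {(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)}
assert six_connected(block) and complement_six_connected(block)
# Two far-apart voxels: X itself is not 6-connected.
assert not six_connected({(0, 0, 0), (3, 0, 0)})
```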
IV. 2D Well-Composed Sets This section is based on Latecki et al. (1995). A digital analog of a continuous concept is defined based on a discrete theory, mostly graph theory. Then it is shown using graph-theoretical tools that some particular properties of the digital concept are the same as the corresponding properties of its continuous original. For example, a digital version of a simple closed curve can be defined as a connected subgraph of the digital plane (with a certain minimal number of points) in which each point is adjacent to exactly two other points. Further, it can be proved that a simple closed curve separates the digital plane into exactly two components, which can be interpreted as the interior and the exterior of the curve (Rosenfeld, 1979). Thus, a digital simple closed curve has exactly the same separability property as its continuous original. This property is known as the Jordan
curve theorem. Rosenfeld proved this property for a special graph structure of Z², which we will describe in what follows.

An important property of a continuous arc is that its connectivity is destroyed by the removal of one point other than an endpoint. A continuous simple closed curve has the property that deleting any of its points makes it an arc. These statements can be used for general definitions of a discrete arc and a discrete simple closed curve in any graph, which reduce to the following definitions in the case of a digital plane or space. A finite set A is called a (digital) n-arc if it is n-connected, and all but two of its points have exactly two n-neighbors in A, while these two have exactly one. These two points are called endpoints. A finite set C is called a (digital) simple closed n-curve (Jordan n-curve) if it is n-connected, and each of its points has exactly two n-neighbors in C. To avoid pathological situations, we require that a simple closed n-curve contain a certain minimal number of points; for example, a 4-curve contains at least 8 points and an 8-curve at least 4 points (Rosenfeld, 1979). By these definitions, an important and common property of a continuous simple closed curve and a digital simple closed curve is that their connectivity is destroyed by the removal of any two points.

A typical situation in defining discrete analogs of continuous concepts is the following: It is not enough to define a digital concept; it is additionally necessary to specify the graph structure of the digital space in which this concept should have properties analogous to its continuous original. For example, there are graph structures on Z² in which a simple closed curve does not satisfy the Jordan curve theorem. This fact was noted early in the history of image analysis and referred to as "connectivity paradoxes." We illustrate some of the paradoxes pointed out in Rosenfeld and Pfaltz (1966) (see also Kong and Rosenfeld (1989)). If 8-adjacency is used for black as well as white points in Figure 11a, then the black points form a discrete simple closed 8-curve, but this curve does not separate the
If 8-adjacency is used for black as well as white points in Figure 11a, then the black points form a discrete simple closed 8-curve, but this curve does not separate the
Figure 11. Connectivity paradoxes.
complement (i.e., the white points), as there is an 8-path between any pair of white points, which means that the set of white points is 8-connected. This example shows that a digital version of the Jordan curve theorem does not hold for 8-adjacency. Figure 11b shows that the situation is equally problematic for 4-adjacency: the black points constitute a simple closed 4-curve, but there exist three 4-connected components of the complement. Thus, in both cases, a digital version of the Jordan curve theorem does not hold. Observe also that, if 4-adjacency is used for black as well as white points in Figure 11a, then the black points are totally disconnected. However, they separate the set of white points into two 4-components.

The most popular solution to these problems was the idea of using different adjacency relations for the foreground and the background: 8-adjacency for black points and 4-adjacency for white points, or vice versa (first recommended in Duda et al. (1967)). If we consider 8-adjacency for the black points in Figure 11a, then the set of black points forms a simple closed 8-curve which separates the white background into exactly two 4-components. Similarly, if we consider 4-adjacency for the black points in Figure 11b, then the set of black points forms a simple closed 4-curve, which separates the white background into exactly two 8-components. Rosenfeld (1979) developed the foundations of digital topology based on this idea, and showed that the Jordan curve theorem is then satisfied.

The price one has to pay for this solution is that there are two different adjacency relations in one digital picture, which depend on the objects being represented. Therefore, the adjacency relation is not an intrinsic feature of a digital picture as a representing medium. Consequently, connected components are also not intrinsic features, as is the case for R² with the usual topology.
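The 8-adjacency paradox can be reproduced computationally. Since Figure 11a is not reproduced here, the sketch below uses a four-point diamond as a stand-in configuration of our own choosing: under 8-adjacency it is a simple closed 8-curve, yet the white points of a surrounding window form a single 8-connected region, so the curve separates nothing.

```python
# Sketch of the 8-adjacency connectivity paradox (cf. Figure 11a).

from collections import deque

def count_components(cells, adj):
    """Number of connected components of a finite cell set."""
    cells, n = set(cells), 0
    while cells:
        n += 1
        queue = deque([cells.pop()])
        while queue:
            p = queue.popleft()
            for q in adj(p):
                if q in cells:
                    cells.remove(q)
                    queue.append(q)
    return n

def adj8(p):
    x, y = p
    return [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)]

black = {(1, 0), (0, 1), (2, 1), (1, 2)}          # a diamond of black points
window = {(x, y) for x in range(-1, 4) for y in range(-1, 4)}
white = window - black

# Every black point has exactly two 8-neighbors on the curve, so the
# black points form a simple closed 8-curve ...
assert all(sum(q in black for q in adj8(p)) == 2 for p in black)
# ... yet the white points, interior point (1, 1) included, form ONE
# 8-component: the Jordan curve theorem fails for pure 8-adjacency.
assert count_components(white, adj8) == 1
```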
As we have one connectedness relation for the foreground (e.g., the set of black points) and another for the background (e.g., the set of white points), interchanging the foreground and the background also changes the connectedness relations of the digital picture. By the foreground we mean the objects whose properties we want to analyze, and by the background all the other objects of a digital image. Hence, the choice of foreground and of background is critical, especially in cases where it is not clear at the beginning of the analysis what constitutes the foreground and what constitutes the background, since this choice immediately determines the connectedness structure of the digital picture. A solution that allows us to avoid the connectivity paradoxes while having only one connectedness relation for the entire digital picture is to restrict the class of all possible binary images to well-composed images. This kind of solution is commonly applied in mathematical theories, because in most fields of mathematics we do not treat all subsets of a given space, but
only a class of subsets that have "nice properties" with regard to the features we are especially interested in. As pictures such as those shown in Figure 11 are not well-composed, their connectivity paradoxes cannot occur in well-composed pictures.

At first glance, it may seem that by requiring well-composedness we restrict the variety of digital pictures. However, requiring digital pictures to be well-composed is actually a consequence of requiring that the process of digitization preserve topology, since, as we show in Section V, if the digitization resolution is fine enough to ensure topology preservation, then the output digital picture is well-composed. This also gives us the right to "repair" a non-well-composed picture, because if the neighborhood of a point is not well-composed, it must be due to noise. As well-composedness is a local property, that is, it depends on the colors of single picture points, it can be decided very efficiently in parallel whether a given set is well-composed. Therefore, a picture can be "repaired" by adding (or subtracting) single points by a parallel algorithm (Section IV.F). The other possibility, which is more promising for applications, is to impose local conditions on the segmentation process that ensure that the obtained segmented image is well-composed.

In a well-composed image, not only does the concept of Jordan curve have separation properties analogous to its continuous original, but additionally every well-composed image can be regarded as a planar graph, that is, it can be embedded into the plane R² in such a way that its links do not intersect, as only 4-adjacency links need to be considered. The simpler adjacency structure of a well-composed image suggests that algorithms used in image processing should become simpler when restricted to well-composed images. We will investigate this conjecture on thinning algorithms.
We will show that the resulting skeletons have a very simple structure if the input set is well-composed. We will show that simple thinning algorithms can be defined for well-composed sets that generate really ‘‘thin’’ skeletons which are ‘‘one point thick,’’ a notion we also define formally. Consequently, the problem of irreducible ‘‘thick’’ sets disappears in well-composed pictures. We will also show that skeletons have a graph structure, and we define what this means. Restricting thinning to well-composed sets leads to an interesting situation: although we delete points in fewer configurations, the resulting skeletons are thinner, and only 1/3 as many neighborhood configurations need be considered to decide whether a given point can be deleted. In general, there are 18 types of 3 × 3 neighborhoods of simple points (other than endpoints), which generate 108 neighborhood configurations by 90° rotations and reflections; of these, only 7 types are neighborhoods of simple points (other than endpoints) in well-composed sets, which generate only 36 configurations.
WELL-COMPOSED SETS
117
A. Jordan Curve Theorem and Euler Characteristic

The Jordan curve theorem holds for well-composed sets. Thus, if we consider only subsets of Z² that are well-composed, then we have no problems with the paradoxes presented in the introduction.

Theorem 9 (Jordan curve theorem) The complement of a simple closed curve in a well-composed image has exactly two components.

Proof Let C be a simple closed curve in a well-composed image. Then C is a 4-curve. Rosenfeld (1979) proved, in particular, that if we consider 4-adjacency for C and 8-adjacency for its complement, then the complement of C has exactly two components. Our theorem follows easily from his theorem, since every 8-component is also a 4-component (Proposition 2). ■

Kong and Rosenfeld (1990) showed that if we use 4- (or 8-) connectedness for both a set and its complement, the Euler characteristic cannot be computed by counting local patterns. It is well known (Minsky and Papert, 1969) that the Euler characteristic is locally computable if we use changeable 8/4- (or 4/8-) connectedness. Theorem 10 shows that the Euler characteristic is also locally computable for well-composed images.

Definition 7 Let S be a digital set. If S has n components and its complement has n̄ components, then χ(S) = n − n̄ + 1 is called the Euler characteristic of S.

Theorem 10 The Euler characteristic is locally computable for well-composed images.

Proof Minsky and Papert (1969, Chapter 5.8.1) proved this theorem using 4-adjacency for black points and 8-adjacency for white points. Our theorem follows easily from their theorem, since every 4-component is an 8-component of S and vice versa (Proposition 2), and therefore, we can use 4-adjacency for black points and 8-adjacency for white points in the computation of the Euler characteristic. ■

B. Thinning

Thinning is a common pre-processing operation in digital image processing. Its goal is to reduce a set S to a ‘‘skeleton’’ in a ‘‘topology-preserving’’ way.
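The local computability asserted by Theorem 10 can be made concrete: for a well-composed set, the Euler characteristic χ(S) = n − n̄ + 1 of Definition 7 equals a difference of 2 × 2 pattern counts. The quad-count formula below is the classical one for sets without diagonal configurations (so the 4- and 8-connected versions coincide); the code is a sketch with names of our choosing:

```python
def euler_characteristic(black):
    """Euler characteristic of a well-composed pixel set, computed
    purely locally: chi = (Q1 - Q3) / 4, where Q1 (resp. Q3) is the
    number of 2x2 windows containing exactly one (resp. three) black
    pixels. Valid because well-composed sets contain no diagonal
    2x2 patterns."""
    xs = [x for x, _ in black]
    ys = [y for _, y in black]
    q1 = q3 = 0
    # slide a 2x2 window over the bounding box plus 1 pixel of padding
    for x in range(min(xs) - 1, max(xs) + 1):
        for y in range(min(ys) - 1, max(ys) + 1):
            n = sum((x + dx, y + dy) in black
                    for dx in (0, 1) for dy in (0, 1))
            if n == 1:
                q1 += 1
            elif n == 3:
                q3 += 1
    return (q1 - q3) // 4
```

A single pixel gives χ = 1; a square ring with one hole gives χ = 1 − 2 + 1 = 0, since its complement has two components (the hole and the outside).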
Rosenfeld (1975) stated three requirements that a 2D thinning algorithm should satisfy: (1) connectedness is preserved, for both the objects and their complements; (2) curves, arcs, and isolated points remain unchanged; and (3) upright rectangles, whose length and width are both greater than 1, do not remain unchanged.
In this section we present a sequential algorithm, and in Section IV.E a parallel algorithm, which fulfill these requirements. In addition, these algorithms preserve well-composedness and produce really thin sets (a concept which we will also define precisely). We begin with the definition of endpoints:

Definition 8 A black point P is an n-endpoint if it has exactly one black n-neighbor in N*(P), where n = 4 or 8.

There exists only one type of 8-endpoint, characterized by the fact that it has exactly one black 8-neighbor. There exist two different types of 4-endpoints which are endpoints in well-composed sets, namely those having exactly one 8-neighbor which is a 4-neighbor and configuration 3 in Figure 12 (both of these configurations can occur as endpoints of 4-arcs). Before we give the standard 2D definition of a simple point (Rosenfeld, 1979; Kong and Rosenfeld, 1989), we want to remind the reader that either 8-connectedness is used for the foreground and 4-connectedness for the background, or vice versa.

Definition 9 A black point p is said to be n-simple if (C1) p is n-adjacent to only one black n-component in N*(p) and (C2) p is m-adjacent to only one white m-component in N*(p), where (n, m) = (8, 4) or (4, 8).

An equivalent definition is that a point P ∈ S is n-simple in a set S iff N*(P) contains just one n-component of S which is n-adjacent to P and P is an m-boundary point, where (n, m) = (8, 4) or (4, 8). For example, in Figure 12, the configurations of all 8-simple points (except 8-endpoints) are shown. Algorithms for checking whether a point is simple in 2D, as well as in 3D images, are presented in Latecki and Ma (1996), for example. Simple points are used in thinning algorithms:

Definition 10 n-Thinning a digital set means repeated removal of n-simple points that are not n-endpoints. A digital set is termed n-irreducible if its only n-simple points are n-endpoints. An irreducible set obtained from a set by means of thinning is called a skeleton.
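Definitions 9 and 10 can be turned into a small sequential 4-thinning sketch. We test 4-simplicity with the Yokoi connectivity number (a standard equivalent of Definition 9 for (n, m) = (4, 8)), read ‘‘endpoint’’ as ‘‘fewer than two black direct neighbors,’’ and additionally refuse any deletion that would leave a critical configuration behind, anticipating the restriction to the safe configurations used for well-composed sets. All of this is our sketch, not the chapter's literal algorithm:

```python
# Neighbor offsets in cyclic spatial order: E, NE, N, NW, W, SW, S, SE.
CYCLE = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def yokoi4(black, p):
    """Yokoi connectivity number for 4-connectivity; p is 4-simple iff 1."""
    b = [(p[0] + dx, p[1] + dy) in black for dx, dy in CYCLE]
    return sum(b[k] and not (b[k + 1] and b[(k + 2) % 8])
               for k in (0, 2, 4, 6))

def creates_critical(black, p):
    """Would deleting p leave a 2x2 square whose only blacks are diagonal?"""
    rest = black - {p}
    for cx in (p[0] - 1, p[0]):
        for cy in (p[1] - 1, p[1]):
            v = [(cx, cy) in rest, (cx + 1, cy) in rest,
                 (cx, cy + 1) in rest, (cx + 1, cy + 1) in rest]
            if v in ([True, False, False, True], [False, True, True, False]):
                return True
    return False

def thin(black):
    """Sequential 4-thinning: repeatedly delete 4-simple non-endpoints."""
    black = set(black)
    changed = True
    while changed:
        changed = False
        for p in sorted(black):
            directs = sum((p[0] + dx, p[1] + dy) in black
                          for dx, dy in ((1, 0), (0, 1), (-1, 0), (0, -1)))
            if (directs >= 2 and yokoi4(black, p) == 1
                    and not creates_critical(black, p)):
                black.remove(p)
                changed = True
    return black
```

The loop terminates because every pass that changes anything strictly shrinks the set, and the endpoint guard keeps the skeleton nonempty.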
One step in a sequential thinning algorithm consists of the removal of a single simple point. One step in a parallel thinning algorithm consists of the simultaneous removal of some set of simple points. It is required that after every step of a thinning algorithm the connectedness of a set and of its complement are not changed (Rosenfeld's condition (1)). This is a simple consequence of the definition of simple points for sequential thinning, but
Figure 12. Configurations of all possible 8-simple points (except 8-endpoints). The middle point of configuration 3 is a 4-endpoint but not an 8-endpoint.
requires a separate proof for parallel thinning. The special treatment of endpoints guarantees that Rosenfeld's condition (2) is satisfied. If we want thinning to preserve well-composedness, only 4-simple points that are not endpoints can be deleted. If we use changeable 8/4-connectedness, there are 18 types of 8-neighborhood configurations of 8-simple points
(not including endpoint configurations), which generate 108 8-neighborhood configurations of 8-simple points by rotations and reflections (Eckhardt and Maderlechner, 1993). Figure 12 shows all 18 types of 8-simple points. An important advantage of dealing only with well-composed sets in thinning processes is the fact that we have to treat only 7 types of 4-simple point neighborhoods (without endpoints): 7, 14, 15, 31, 62, 63, and 191 (see Figure 12). This corresponds to 36 neighborhood configurations obtained by rotations and reflections. The idea of deleting only 4-simple points in thinning an 8-connected object dates back to Rutovitz in 1966 and was proposed by different authors (Lü and Wang, 1985; Stefanelli and Rosenfeld, 1971; Zhang and Suen, 1984); however, they did not use the concept of well-composed sets. In general, if we delete only 4-simple points to thin any subset of Z², we have problems with 8-components, since 4-simple points are not necessarily 8-simple, as the following configuration shows:
Obviously, at least one of the critical configurations shown in Figure 13 appears in such a situation. If we treat well-composed images, these problems cannot occur. In fact, in well-composed sets every 4-simple point is 8-simple. Another very important property of well-composed sets is given in the following theorem. As already noted, in a well-composed set we can have only 4-simple points, since every component of a well-composed set is 4-connected. Therefore, thinning a well-composed set means removal of 4-simple points that are not endpoints. Theorem 11 Sequential 4-thinning is an internal operation on well-composed sets, that is, applying sequential 4-thinning to a well-composed set results in a well-composed set.
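The configuration counts quoted in this chapter — 108 deletable 8-neighborhoods in general and 36 in the well-composed case — can be checked by brute force over all 2⁸ neighborhoods of a black center point. The sketch below uses Yokoi connectivity numbers as the simplicity tests and reads ‘‘endpoint’’ as ‘‘fewer than two black neighbors’’ (respectively, fewer than two black direct neighbors); these readings, and all names, are ours:

```python
from itertools import product

# b[0..7] = E, NE, N, NW, W, SW, S, SE in cyclic spatial order;
# even indices are the direct (4-)neighbors, odd indices the diagonals.

def yokoi(b, n):
    """Yokoi connectivity number; the centre is n-simple iff this is 1."""
    def f(k):
        if n == 4:   # black direct neighbors not absorbed by the next corner
            return b[k] and not (b[k + 1] and b[(k + 2) % 8])
        else:        # n == 8: dual formula on the white pixels
            return (not b[k]) and (b[k + 1] or b[(k + 2) % 8])
    return sum(f(k) for k in (0, 2, 4, 6))

def locally_well_composed(b):
    # centre is black: forbid a black diagonal whose two flanking
    # direct neighbors are both white (a critical configuration)
    return all(not (b[k] and not b[k - 1] and not b[(k + 1) % 8])
               for k in (1, 3, 5, 7))

n8 = sum(1 for b in product((0, 1), repeat=8)
         if sum(b) >= 2 and yokoi(b, 8) == 1)
n4wc = sum(1 for b in product((0, 1), repeat=8)
           if locally_well_composed(b)
           and sum(b[k] for k in (0, 2, 4, 6)) >= 2
           and yokoi(b, 4) == 1)
print(n8, n4wc)   # prints: 108 36
```

The general figure 148 = 256 − 108 quoted in Section IV.C then follows by subtraction.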
Figure 13. The critical configurations that appear in non-well-composed images.
Figure 14. Arcelli’s set with five interior points.
Proof If we delete any 4-simple point from a well-composed set, we obtain a well-composed set. To see this it is sufficient to analyze the neighborhood configurations of 4-simple points that are not endpoints 7, 14, 15, 31, 62, 63, and 191 in Figure 12. ■ We conclude that thinning is an internal operation on well-composed sets if and only if only 4-simple points are deleted. It is easy to see that if we eliminate any 8-simple point which is not 4-simple, the resulting set will no longer be well-composed (see Figure 12). One of the most important goals of thinning is to obtain a skeleton of the input set. Therefore, the resulting skeleton, which cannot be further reduced by thinning, should not have any interior points. However, skeletons may have components of interior points of arbitrarily large sizes as shown in Eckhardt and Maderlechner (1993). The first example of such a skeleton was given by Arcelli (1981) (see Figure 14 here). For well-composed sets the situation is very simple, since there is only one type of kernel component, namely a set with only one point, as will be shown in Theorem 12. This type of interior point cannot be further eliminated, since it indicates a very useful property, namely that we have an intersection of two lines at this point, that is, locally the following situation:
Eliminating such interior points would mean that a skeleton could not have such intersections of two line segments, which is an unrealistic assumption.

Definition 11 A point of a digital set X that has all of its n-neighbors in X is called an n-interior point. The set of all interior points of a set X is termed the n-interior or n-kernel of X. A point of X that has an n-neighbor in the complement of X is called an n-boundary point.

Again for these concepts, as for all other concepts already defined here, we will drop the prefix n if the n-adjacency relation is clear from the context. For illustration purposes, we will denote the different types of points by the following symbols:
● : black point
○ (or blank position) : white point
⊗ : interior point
· : point of either black or white color
Theorem 12 Any 8-connected component of the kernel of a 4-irreducible well-composed set contains at most one point.

Proof We prove this by showing that it is not possible to have two adjacent interior points within a well-composed irreducible set. Without loss of generality we may assume that we consider interior points having boundary points as direct neighbors. We distinguish two cases (in the pictures given in the proof, the one on the left gives the start situation and the one on the right gives the situation constructed during the proof):

(a) P and Q are two directly neighboring interior points such that the points N(P) and N(Q) are not interior points. If N(Q) were black then N(Q) necessarily is a (4-)simple point or an interior point. So, N(Q) must be white. Because N(Q) cannot be simple and as the set is required to be well-composed, N(N(Q)) is black. The same argument holds for P. Now, N(Q) becomes 4-simple, which is a contradiction.
(b) Assume now that there are two indirectly neighboring interior points P and Q. Without loss of generality Q = N(P) and N(P) is not an interior point. This means that N(N(P)) or N(P) is white. If one of these neighbors is black, then N(P) is simple, hence both are white. Now N(N(P)) must be black, for otherwise N(P) would again be simple. The configuration thus obtained is no longer well-composed.
■
Definition 12 The crossing number (see, e.g., Hilditch, 1969, p. 411) is the number of white-black (0-1) transitions in the cyclic sequence of the eight neighbors of P, traversed in circular order.

Remark For well-composed sets, the crossing number is equal to the number of black 4-components in N*(P), since the crossing number is equal to the number of black 4-components in N*(P) if all 8-components in N*(P) are directly connected to P. This is the case if the critical configurations (Figure 13) are not contained in N(P).

C. Irreducible Well-Composed Sets

We now investigate sets having the property of being 4-irreducible and well-composed. Such sets can be obtained by applying a 4-thinning process to a well-composed set (see Theorem 11). Irreducible sets obtained by ordinary thinning can contain all point configurations which are not simple. There are 148 such configurations (256 configurations of 3 × 3 neighborhoods minus 108 configurations of simple points that are not endpoints); they are generated by 33 neighborhood types. The situation is more favorable if thinning algorithms are applied to well-composed sets.
obtained from them by 90° rotations and reflections) are possible: 1. One direct black neighbor (4-endpoint)
2. Two direct black neighbors
3. Three direct black neighbors
4. Four direct black neighbors
Proof The proof is by enumeration of all possible cases, showing that if P occurs in any other configuration, it must be a 4-simple point, which is not possible in a 4-irreducible set.

1. Let N(P) be black. N(P) and N(P) cannot be black, because then the set would not be well-composed. If N(P) were black, then P is a simple point and we have case 1b. The same holds for N(P). If N(P) and N(P) are white, we have case 1a. Therefore, the two configurations shown in the foregoing are the only possible configuration types with one direct black neighbor.

2. This case is obvious.

3. Let N(P), N(P), and N(P) be black. If N(P) and N(P) are white, then we have case 3b. If only one of them is white, say N(P), then we have case 3a. As N(P) cannot be simple, N(N(P)) must be black and N(N(P)) must be white. As N(P) cannot be simple, N(N(P)) must be black and N(N(P)) must be white.

4. In this case P is an interior point. Assume that N(P) is black (see the picture that follows, where the set on the left represents the start situation, and the set on the right represents the situation constructed during the proof).
If N(P) were black, then N(P) is either an interior point, which contradicts Theorem 12, or else it is necessarily simple. The same argument applies to N(P), so N(P) and N(P) must be white. N(N(P)) is not white, because otherwise either N(P) would be simple (N(N(P)) white) or the set would not be well-composed (N(N(P)) black). Again, by symmetry, N(N(P)) is black. N(N(P)) must be white, as otherwise N(P) would be simple. Now, regardless of the color of N(N(P)), N(P) is always simple (as the set is assumed to be well-composed). It is now easily seen that the configuration is as shown in the preceding. ■

Remark If P is an interior point of an irreducible well-composed set, then we have locally only the configuration presented in part 4 of Theorem 13.
So, P can be treated as an intersection point of a vertical and a horizontal line segment.

D. Graph Structure of Irreducible Sets

The goal of thinning is formulated by different authors as obtaining a set that is ‘‘one pixel thick’’: ‘‘. . . a single pixel wide . . .’’ in Pavlidis (1982b, p. 143); ‘‘. . . until all that remains is lines which are one point wide . . .’’ in Hilditch (1969, p. 407); ‘‘Thus, a final step might be necessary to reduce the set of the skeletal pixels to the unit width skeleton’’ in Arcelli and Sanniti di Baja (1989, p. 411); ‘‘A unitary skeleton is a single pixel thickness skeleton, in which each of its pixels is connected to not more than two adjacent pixels unless it represents a treepoint’’ in Abdulla et al. (1988, p. 13); ‘‘Overall, these applications employ thinning (a) to reduce line images to medial lines of unit width, (b) to enable objects to be represented as simplified data structures (e.g., by chain-coding) . . .’’ in Davies and Plummer (1981). Bearing in mind Arcelli's example (Figure 14), one might wonder how to give this requirement a precise meaning. Using the concept of well-composed sets, we can now propose the following definition:

Definition 13 A digital set is one pixel thick if it is well-composed and 4-irreducible.

Thus, we may give Theorem 12 an alternative formulation:

Theorem 14 The skeleton of a well-composed set is one pixel thick.

On the other hand, by ‘‘thinness’’ is meant intuitively that the skeleton should have a ‘‘graph-like’’ structure, as is expressed in the informal definitions in Abdulla et al. (1988) and Davies and Plummer (1981).
As any digital set that is equipped with a neighborhood relation is a graph, we should make this concept more precise by formalizing the notion of a ‘‘graph-like structure.’’

Definition 14 For any point P, the 8-connection number C₈(P) is the number of black 8-components in N*(P), and the 4-connection number C₄(P) is the number of black 4-components in N*(P) that are directly connected to P. Obviously, C₈(P) is never greater than C₄(P).

Definition 15 An n-graph point of a digital set is a point having the property that its n-connection number equals the number of its black n-neighbors, where n = 4 or 8. A digital set is termed an n-graph if all of its points are n-graph points, where n = 4 or 8.

The 8-skeleton of a digital set is not necessarily an 8-graph. The following
set is 8-irreducible, but P is not an 8-graph point:
The same negative assertion holds for the 4-skeleton. In configuration 3.a of Theorem 13, point P is not a 4-graph point. However, if this configuration is 8-thinned, we obtain the following configuration which consists entirely of 8-graph points:
Definition 16 We define a point of a digital set to be a 4/8-graph point if it is either a 4-graph point or an 8-graph point. A 4/8-graph is a digital set consisting entirely of 4/8-graph points.

Now we can formulate the ideas of the last part of this section in the following theorem:

Theorem 15 If a 4-irreducible well-composed set is postprocessed by 8-thinning applied to all configurations of type 3a of Theorem 13, the resulting set is a 4/8-graph.

E. Parallel Thinning on Well-Composed Sets

In sequential thinning, where only one simple point is deleted at each step, connectivity is evidently preserved. Parallel thinning, on the other hand, may not preserve connectivity. For example, if the simple points in the central column of the following set are deleted simultaneously, the set becomes disconnected. In order for parallel thinning algorithms to be topologically correct, one must avoid such situations.
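A concrete instance (our own small example; not necessarily the set pictured in the original) is a 2 × 3 block: each of the two middle-column points is 8-simple on its own, yet deleting both at once splits the set in two, and after one is deleted the other is no longer simple:

```python
# 2x3 block; the middle column is {(1, 0), (1, 1)}.
block = {(x, y) for x in range(3) for y in range(2)}

def is_8_simple(black, p):
    """Yokoi test: p is 8-simple iff the connectivity number is 1."""
    cyc = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]
    b = [(p[0] + dx, p[1] + dy) in black for dx, dy in cyc]
    return sum((not b[k]) and (b[k + 1] or b[(k + 2) % 8])
               for k in (0, 2, 4, 6)) == 1

def components(black):
    """Number of 8-connected components (flood fill)."""
    black, count = set(black), 0
    while black:
        count += 1
        stack = [black.pop()]
        while stack:
            x, y = stack.pop()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    q = (x + dx, y + dy)
                    if q in black:
                        black.remove(q)
                        stack.append(q)
    return count
```

Deleting (1, 0) and (1, 1) simultaneously leaves the two outer columns, which are not 8-adjacent to each other; this is exactly the situation Ronse's second condition (used in Theorem 17 below) rules out.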
When we investigate parallel thinning methods on well-composed sets, we are faced with an additional dilemma. Parallel elimination of points, even if it is designed so as to be topologically correct, may destroy the well-composedness of sets, as shown by the example in Figure 15. Here simultaneous elimination of two 4-simple points yields a set that is not well-composed. Our goal in this section is to construct a parallel thinning algorithm for well-composed sets that is topologically correct and preserves well-composedness. We also want the resulting skeletons to be ‘‘thin,’’ that is, we want their kernel components to be as small as possible, as is the case for sequential thinning of well-composed sets. The simplest possibility is to use a 4-phase thinning algorithm as described by Rosenfeld (1975). This algorithm removes only one type of border point at each phase: north border points are removed in the first phase, east in the second, and then south and west border points (where, for example, a north border point of S is one whose north neighbor is in the complement of S). Rosenfeld showed that this algorithm is topologically correct. If we allow only 4-simple points to be deleted, this algorithm also preserves well-composedness, and the resulting skeletons are 4-irreducible. Therefore, Theorem 12 also holds for this 4-phase thinning. However, 4-phase algorithms are not very efficient, as only 1/4 of the possible points can be deleted in one phase. It is also possible to thin well-composed sets using one-phase parallel algorithms. However, such algorithms must examine a relatively large neighborhood of every point, since it is well known (Rosenfeld, 1975) that a parallel
Figure 15. Parallel 4-thinning is not necessarily an internal operation on well-composed sets.
thinning algorithm based on a 3 × 3 neighborhood cannot preserve connectedness. We will now present a parallel thinning method for well-composed sets that minimizes the number of phases while at the same time minimizing the size of the neighborhoods used. It is a two-phase method consisting of one marking phase and one elimination phase. In the first phase the candidates for elimination are marked, and in the second phase these candidates are eliminated if they fulfill a simple condition described in what follows. In Eckhardt and Maderlechner (1993) the concept of a perfect point was introduced.

Definition 17 A point Q of a digital set is termed perfect if
· Q has a direct neighbor that is an interior point, say N_k(Q), where k ∈ {0, 1, 2, 3}.

· The direct neighbor of Q opposite the interior point N_k(Q) is white.
For example, if Q in the following configuration is black, then it is perfect.
Definition 18 The south neighborhood SN(P) of a point P consists of P, its east and west direct neighbors, and its three southern neighbors:
In a similar way the north neighborhood of P can be defined. In the first step of the parallel thinning algorithm, perfect points are marked with ‘‘c’’ as candidates for deletion. In the second step, only points marked ‘‘c’’ can be deleted. While deleting the
marked points, we must take care that in the following situations, at most one of the points marked ‘‘c’’ can be deleted:
Parallel Thinning Algorithm Let S be a set of black points to be thinned and let P be any point in S.

First Step Candidates for deletion are marked. If P is a perfect point, then P is marked c (candidate for deletion).

Second Step Deletion of marked points. If a point P is marked c and no critical configuration shown here occurs in the south neighborhood of P, then P is deleted (i.e., its color is changed from black to white).
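The two phases can be sketched as follows. Since the critical-configuration figures are not reproduced here, we implement the reading used in the proof of Theorem 16: a marked point is deleted only if neither of its two south-diagonal neighbors is marked (grid coordinates with y increasing northward, so ‘‘south’’ is y − 1). We take ‘‘interior’’ to mean all eight neighbors black. All names, and these readings, are ours:

```python
DIRECT = [(1, 0), (0, 1), (-1, 0), (0, -1)]           # E, N, W, S
CYCLE = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def interior(black, p):
    return all((p[0] + dx, p[1] + dy) in black
               for dx in (-1, 0, 1) for dy in (-1, 0, 1))

def perfect(black, p):
    # Definition 17: some direct neighbor is interior and the opposite
    # direct neighbor is white.
    return any(interior(black, (p[0] + dx, p[1] + dy))
               and (p[0] - dx, p[1] - dy) not in black
               for dx, dy in DIRECT)

def yokoi4(black, p):
    """p is 4-simple iff this Yokoi connectivity number equals 1."""
    b = [(p[0] + dx, p[1] + dy) in black for dx, dy in CYCLE]
    return sum(b[k] and not (b[k + 1] and b[(k + 2) % 8])
               for k in (0, 2, 4, 6))

def parallel_pass(black):
    marked = {p for p in black if perfect(black, p)}          # first step
    return black - {(x, y) for (x, y) in marked               # second step
                    if (x - 1, y - 1) not in marked
                    and (x + 1, y - 1) not in marked}
```

On a 6 × 4 rectangle one pass marks the twelve perfect border points and deletes eight of them; every candidate is 4-simple (Proposition 8) and the result is still well-composed (Theorem 16).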
In the first step exactly the perfect points are marked as candidates, which are automatically simple points in a well-composed set, as the following proposition states. In fact every parallel thinning algorithm can be modified using the above two-step technique so that it maps a well-composed set to a well-composed set. In the first step the points that would have been deleted by the original algorithm are marked. The second step is identical with the above one. Proposition 8 Every candidate point in a well-composed set is a simple point. Proof Because the point is a candidate point, we have the following situation (possibly rotated by a multiple of 90°):
Whatever color the points marked ‘‘·’’ have, the candidate point ‘‘c’’ is obviously simple, as the set is well-composed. ■

As every candidate is a perfect point, we are also guaranteed that spikes will be left unchanged. Note also that endpoints are protected from deletion, because points that are deleted are simple and perfect, and a perfect point has a direct neighbor that is an interior point. The condition in the second step of this algorithm prevents deletion of a candidate if there is a critical configuration involving another candidate to the south. However, it cannot be that the deletion of every candidate is prevented by this condition, as there always is a candidate having no critical configuration to the south. Therefore, the number of interior points decreases after every application of this algorithm. Hence, after we finish applying the algorithm, there are no 4-simple perfect points in the resulting set. Now we prove that the algorithm preserves both well-composedness and connectivity.

Theorem 16 The two-step parallel thinning method described in the preceding is an internal operation on well-composed sets.

Proof Well-composedness is destroyed only when a critical configuration occurs and both candidates (marked with ‘‘c’’) are deleted. This is prevented by deleting only candidates having no diagonal neighbors to the south that are also candidates. ■

Theorem 17 The two-step parallel thinning method is topologically correct on well-composed sets.

Proof The proof is based on Ronse's conditions (1988), of which a simplified version for well-composed sets is the following: A parallel thinning algorithm on a well-composed set S preserves connectedness of the set and its complement if:

1. only simple points are deleted; and
2. for any two 8-adjacent points P and Q of S, P is simple after Q has been deleted, and Q is simple after P has been deleted.

By Proposition 8, only simple points can be candidates, and therefore only simple points can be deleted.
So it remains to show the second condition. Assume that P and Q are both candidates and direct neighbors. The only possibility for such a situation is indicated in the configuration here (up to
rotations by multiples of 90°).
If, for example, N(Q) is an interior point and N(Q) is white, then N(P) must be an interior point and N(P) must be white. It is now easy to see that P will be simple in S \ {Q}, and that the same holds with P and Q interchanged. Let P and Q be diagonal neighbors. Note that both points have to be simple and perfect in S in order to be deleted. In this case deletion of a diagonal neighbor cannot disconnect or delete a black 4-component in the 8-neighborhood of either P or Q. Therefore, P will be simple in S \ {Q}, and the same holds with P and Q interchanged. ■

Theorem 18 The 4-kernel of a well-composed set that contains no 4-simple perfect points is either empty or consists of 8-isolated components having one of the following forms:
Proof The proof follows from Propositions 9 and 10 in the sequel. In a horizontal or vertical line there can be at most two adjacent kernel-boundary points. The configuration in which the kernel contains two successive kernel-boundary points in a diagonal line such that one of their two common direct neighbors is not an interior point is impossible. Thus if the kernel contains two successive kernel-boundary points in a diagonal line, then their two common direct neighbors are also interior points. In this case the four kernel points form a square. ■

Definition 19 A point in the kernel of a set is termed a kernel-boundary point if it has at least one direct neighbor that is not in the kernel.
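Definitions 11 and 19 are again purely local and straightforward to compute; a sketch (names ours):

```python
DIRECT = [(1, 0), (-1, 0), (0, 1), (0, -1)]
ALL8 = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]

def kernel(black, nbrs=DIRECT):
    """n-interior points of Definition 11: all n-neighbors are black
    (pass nbrs=ALL8 for the 8-kernel)."""
    return {(x, y) for (x, y) in black
            if all((x + dx, y + dy) in black for dx, dy in nbrs)}

def kernel_boundary(black, nbrs=DIRECT):
    """Kernel points with a direct neighbor outside the kernel."""
    k = kernel(black, nbrs)
    return {p for p in k
            if any((p[0] + dx, p[1] + dy) not in k for dx, dy in DIRECT)}
```

For a solid 6 × 6 block the kernel is the inner 4 × 4 square, and its kernel-boundary points are the twelve points on the rim of that square.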
Proposition 9 If the 4-kernel of S contains three successive kernel-boundary points in a horizontal or vertical line then S contains a point that is (4- and 8-) simple and perfect. Proof We have the following situation:
At least one of the points marked ‘‘·’’ is necessarily white, because otherwise the middle point would not be a kernel-boundary point. ■ Proposition 10 If the (4-)kernel of S contains two successive kernel-boundary points in a diagonal line, and one of their two common direct neighbors is not an interior point, then either S is not well-composed or it contains a point that is (4- and 8-) simple and perfect. Proof We have the following situation:
At least one of the points marked ‘‘·’’ must be white. Without loss of generality assume that the lower ‘‘·’’, call it Q, is white. If N(Q) were black, then N(Q) would be simple; hence both Q and N(Q) are white. If N(Q) were black, the set would not be well-composed; if it were white, N(Q) would be simple. ■

If the parallel thinning method described in the preceding is applied repeatedly to a well-composed set, the final remaining set does not contain
any points that are simple and perfect. This set might contain kernel components as in Theorem 18. We will now show that the application of a sequential 4-thinning process to this remaining set results in a 4-irreducible well-composed set which satisfies Theorem 12 and which automatically has all the properties described in Sections IV.C and D. In this sequential thinning process each point in the set is examined once, and if it is simple, it is eliminated.

Theorem 19 Assume that a well-composed set S contains no points that are simple and perfect. If the sequential 4-thinning process is applied to S, examining each point of S just once, the resulting set is 4-irreducible.

Proof There can be points of S that are simple but not perfect. We will show that after the sequential 4-thinning process is applied to S, no 4-simple points remain in the resulting set (which is also well-composed). It is clear that after the sequential 4-thinning process is applied to S, all simple points in S either have been deleted or are no longer simple. We now have to show that no point in the resulting set can become simple. More precisely, we have to show that the following situation is impossible: There is a point P ∈ S that is not 4-simple in S, and there is a set SP ⊆ N*(P) of 4-simple points in S such that P is 4-simple in S \ SP. We distinguish two cases:

1. The 4-connection number C₄(P) is equal to 1 with respect to S. In this case, P has at most one white indirect neighbor and as P is not simple in S, it must be an interior point. Therefore we have (up to 90° rotations) the following situation:
All points N(N(P)), i = 0, 1, 2, 3, must be black, because otherwise N(P) would be simple and perfect. Thus, N(P), i = 2, 3, are interior points. By the same argument N(N(P)), N(N(P)), N(N(P)), and N(N(P)) are black. Thus we have a kernel component containing five points (P and N(P), i = 3, 4, 5, 6), which contradicts Theorem 18.
2. C₄(P) > 1 with respect to S. The point P should be 4-simple with respect to the set S \ SP. Therefore, it has in this set one of the configurations 7, 14, 15, 31, 62, 63, or 191 in Figure 12. Configurations 63 and 191, however, are not possible because C₄(P) > 1 with respect to S. By the same argument configuration 62 is not possible because the set S is well-composed. To obtain configuration 31, SP consists of only one point, the 6-neighbor of P. As this latter point must be simple, it would have to be perfect, which is a contradiction. There remain only configurations 7, 14, and 15. We start with configuration 15.
As C₄(P) > 1, and N(P) and possibly N(P) are black in S, the points N(P) and N(P) are necessarily white. If N(P) is black, then N(P) is not 4-simple. As a consequence, N(P) must be eliminated as the last point in SP. For N(P) to be eliminated, it must be simple in S \ {N(P)}. But then N(P) is an endpoint (configuration 3). A similar argument applies to configuration 14 with point N(P) as the last point in SP to be eliminated. In case of configuration 7, we can apply the same argument either to N(P) or to N(P) as the last point in SP. ■

F. Making a Binary Image Well-Composed

It may happen that a segmented image is not well-composed. In this section we present two different approaches to ‘‘repair’’ a binary image so that the resulting image is well-composed.
In the first approach we locally repair it by adding (or deleting) a minimal number of black points. If a picture is not well-composed, then there is an 8-component of it which is not a 4-component. Consequently, repairing this situation changes the topology of the set. Our goal will be to repair a picture while keeping the unavoidable changes minimal. In the second approach we obtain a well-composed image without changing its topology. We extend the input binary image to double resolution by inserting black, white, and boundary points. In the obtained image all three sets will be well-composed, that is, the sets of black, white, and boundary points will be well-composed. The other possibility, which may be more promising for applications, is to impose local conditions on the segmentation process that guarantee that the obtained segmented image is well-composed, or to repair a gray-level image as described in Section IV.G. The following theorem describes the first approach.

Theorem 20 The following parallel algorithm makes a well-composed picture from a given picture by adding a minimal number of points.

Repairing Algorithm In the following pictures the possible local configurations are presented (up to rotations and reflections). The left-hand configuration gives the original situation; the right-hand configuration gives the repaired situation obtained by changing white points to black.
1.
2.
At least one of the points marked ‘‘*’’ must be black.
WELL-COMPOSED SETS
137
3.
4. (a)
It may happen that a new critical configuration occurs here. This is the case if and only if the following situation (in the larger neighborhood) is given:

(b)
Proof If in cases 3 and 4.a we add only the black point marked ‘‘1’’, then a new critical configuration occurs. Therefore we have to add the black point marked ‘‘2’’. If in case 4.b we add only the black point marked ‘‘1’’, then a new critical configuration occurs. Therefore we have to add the black point marked ‘‘2’’. But adding the point marked ‘‘2’’ again causes a critical situation, which disappears when the point marked ‘‘3’’ is added. In the remaining cases no further critical situations occur. ■

The repairing algorithm described by Theorem 20 is invariant up to 90° rotations and reflections. If this invariance does not matter, we can repair a given picture to obtain a well-composed picture by adding fewer points. This can be achieved by considering only south-neighborhoods in the configurations already given and their reflections at a vertical axis.

We can also repair any set to obtain a well-composed set by deleting black points. An algorithm for this purpose can be formulated if black and white points are interchanged. This is due to the duality of well-composedness (i.e., a set S is well-composed if and only if its complement Sᶜ is well-composed).

The second approach to making a binary image well-composed is based on the image expansion described in Köthe (2000). We expand every 2 × 2 square in the original image to a 3 × 3 square; see Figure 16. We need to determine the values a, b, c, d, m as functions of the original binary values A, B, C, D. As the configurations are considered modulo 90° rotations and reflection, it is sufficient to determine the value of a as a function of A and B in Figure 16:

Figure 16. Expanding a binary image.

1. If pixels A and B are both black or both white, then the new pixel a is assigned their color.
2. If A and B have different colors, then a = +, which means that a is a boundary pixel.

It remains to assign a value to the middle pixel in the extended image, which is pixel m in Figure 16b:

3. If a = + or b = + or c = + or d = +, then m = +.
4. Otherwise the pixels A, B, C, D, a, b, c, d are all either white or black, and m is assigned their color.

An example is given in Figure 17, where (b) → (c) is done by rules 1 and 2 and (c) → (d) is done by rules 3 and 4.

Proposition 11 The preceding procedure given by rules 1–4 always yields a well-composed image when applied to a binary image, that is, the sets of black, white, and boundary points are all well-composed sets.

Proof The proof of this proposition is very simple. We consider a 2 × 2 square in the resulting image, for example, square a, B, b, m in Figure 16b. If a = + and b = +, then m = + by rule 3, and consequently, the set of boundary points is well-composed. If a is black, then B must also be black (by rules 1 and 2), and consequently the set of black points is well-composed. The same argument applies to the set of white points. ■
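Rules 1–4 for expanding a binary image to a well-composed one can be sketched as follows. This is an illustrative implementation, not the author's code; black = 1, white = 0, and the boundary label is encoded as 2:

```python
import numpy as np

BLACK, WHITE, BOUND = 1, 0, 2   # BOUND encodes the boundary label

def expand_binary(img: np.ndarray) -> np.ndarray:
    """Expand a binary image (1 = black, 0 = white) to double resolution
    so that black, white, and boundary points each form a well-composed set."""
    h, w = img.shape
    out = np.empty((2 * h - 1, 2 * w - 1), dtype=int)
    out[::2, ::2] = img                     # original pixels keep their color
    # rules 1 and 2: a pixel inserted between two originals copies their
    # common color, and becomes a boundary pixel if they differ
    out[::2, 1::2] = np.where(img[:, :-1] == img[:, 1:], img[:, :-1], BOUND)
    out[1::2, ::2] = np.where(img[:-1, :] == img[1:, :], img[:-1, :], BOUND)
    # rules 3 and 4: the center of each expanded 2x2 square is a boundary
    # pixel iff any of the four surrounding edge pixels is; otherwise all
    # eight surrounding pixels share one color, which the center copies
    a, b = out[::2, 1::2][:-1], out[::2, 1::2][1:]        # top, bottom edges
    c, d = out[1::2, ::2][:, :-1], out[1::2, ::2][:, 1:]  # left, right edges
    any_bound = (a == BOUND) | (b == BOUND) | (c == BOUND) | (d == BOUND)
    out[1::2, 1::2] = np.where(any_bound, BOUND, a)
    return out
```

For the 2 × 2 input with one white and three black pixels, the expanded 3 × 3 image keeps the corners and threads a boundary between the white corner and its black neighbors.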
Figure 17. Expanding a binary image to a well-composed image.
G. Making a Gray-Level Image Well-Composed

Following Köthe (2000), we present a simple procedure to make a gray-level image well-composed and at the same time improve its quality. It is based on image expansion by bilinear interpolation.

Definition 20 A gray-level image is well-composed if for every threshold the binarization of the gray values results in a binary well-composed image.

The property of well-composedness can be locally tested by the following condition:

Proposition 12 A gray-level image is well-composed iff for every 2 × 2 square ABCD (see Figure 18a) the diagonal intervals have a nonempty intersection:

[min(A, D), max(A, D)] ∩ [min(B, C), max(B, C)] ≠ ∅,

where A, B, C, D are gray values.
■
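The local condition of Proposition 12 translates directly into a vectorized test; a sketch under the assumption that the image is a NumPy array of gray values (function name mine):

```python
import numpy as np

def is_gray_well_composed(img: np.ndarray) -> bool:
    """Proposition 12 test: every 2x2 square of gray values must have
    overlapping diagonal intervals [min(A,D), max(A,D)] and
    [min(B,C), max(B,C)]."""
    A, B = img[:-1, :-1], img[:-1, 1:]
    C, D = img[1:, :-1], img[1:, 1:]
    lo1, hi1 = np.minimum(A, D), np.maximum(A, D)
    lo2, hi2 = np.minimum(B, C), np.maximum(B, C)
    overlap = (lo1 <= hi2) & (lo2 <= hi1)   # nonempty interval intersection
    return bool(overlap.all())
```

A constant image trivially passes; a gray-level checkerboard with well-separated values fails, because its diagonal intervals are disjoint.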
Expanding a Gray-Level Image to a Well-Composed One

We expand every 2 × 2 square to a 3 × 3 square by a slightly modified bilinear interpolation. Let A, B, C, D be gray values of a 2 × 2 square in the input image arranged as shown in Figure 18a. We assign the following values to a, b, c, d, m in Figure 18b:

1. a = (A + B)/2, b = (B + D)/2, c = (A + C)/2, d = (C + D)/2

2. m = (A + B + C + D)/4 if [min(A, D), max(A, D)] ∩ [min(B, C), max(B, C)] ≠ ∅

3. m = (max(A, D) + min(B, C))/2 if [min(A, D), max(A, D)] ∩ [min(B, C), max(B, C)] = ∅ and max(A, D) < min(B, C)
Figure 18. Expanding a gray-level image.
Figure 19. Expanding a gray-level image to a well-composed image.
4. m = (max(B, C) + min(A, D))/2 if [min(A, D), max(A, D)] ∩ [min(B, C), max(B, C)] = ∅ and max(B, C) < min(A, D)
An example is given in Figure 19. Using the condition in Proposition 12, we prove that these expanding equations always yield a well-composed gray-level image.

Theorem 21 By the slightly modified bilinear interpolation given by rules 1–4, every gray-level image can be expanded to a well-composed image.

Proof For the proof we assume that A ≤ B ≤ C. If B ≤ D, then

[min(A, D), max(A, D)] ∩ [min(B, C), max(B, C)] = [A, D] ∩ [B, C] ≠ ∅,

which means that the original square ABCD is well-composed. In this case, the value of m is assigned by rule 2, and it is easy to check that the obtained 3 × 3 square is well-composed.

Now we consider the case when D < B. We additionally assume that A ≤ D; the case where D < A is very similar. Thus, we have A ≤ D < B ≤ C. We obtain

[min(A, D), max(A, D)] ∩ [min(B, C), max(B, C)] = [A, D] ∩ [B, C] = ∅,

which means that the original square is not well-composed. In this case we have by rule 3

m = (max(A, D) + min(B, C))/2 = (D + B)/2.

Again it is easy to check that the obtained 3 × 3 square is well-composed. ■

The gray-level image obtained by the slightly modified bilinear interpolation is not only well-composed, but its quality is significantly improved. Köthe (2000) verified by numerous experiments that thresholding an expanded image leads to a significantly better binarization than thresholding the original image. The same applies to edge-detecting filters.
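The slightly modified bilinear interpolation of rules 1–4 can be sketched as follows (an illustrative implementation, not the author's code; pixel layout as in Figure 18, with A top-left, B top-right, C bottom-left, D bottom-right):

```python
import numpy as np

def expand_gray(img: np.ndarray) -> np.ndarray:
    """Expand a gray-level image to double resolution by the modified
    bilinear interpolation of rules 1-4, yielding a well-composed image."""
    img = img.astype(float)
    h, w = img.shape
    out = np.empty((2 * h - 1, 2 * w - 1))
    out[::2, ::2] = img                               # original gray values
    out[::2, 1::2] = (img[:, :-1] + img[:, 1:]) / 2   # rule 1: edge pixels
    out[1::2, ::2] = (img[:-1, :] + img[1:, :]) / 2
    A, B = img[:-1, :-1], img[:-1, 1:]
    C, D = img[1:, :-1], img[1:, 1:]
    lo1, hi1 = np.minimum(A, D), np.maximum(A, D)     # diagonal interval A-D
    lo2, hi2 = np.minimum(B, C), np.maximum(B, C)     # diagonal interval B-C
    m = (A + B + C + D) / 4                           # rule 2: plain bilinear
    m = np.where(hi1 < lo2, (hi1 + lo2) / 2, m)       # rule 3: A-D below B-C
    m = np.where(hi2 < lo1, (hi2 + lo1) / 2, m)       # rule 4: B-C below A-D
    out[1::2, 1::2] = m
    return out
```

For the square A = 0, B = C = 10, D = 2 the diagonal intervals [0, 2] and [10, 10] are disjoint, so rule 3 assigns m = (2 + 10)/2 = 6 instead of the bilinear value 5.5.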
V. Digitization and Well-Composed Images

This section is based on Latecki et al. (1998). We model a digitization process by a direct relation between a continuous object and a segmented object in the digital image (see Figure 20, lower part). This approach allows us to avoid tedious details of signal analysis (see Figure 20, upper part). Due to this object-based approach, we can directly compare the features of a continuous object and its digital image. This kind of interpretation of digitization processes for relating topological properties is used in Pavlidis (1982a), Serra (1982), Gross and Latecki (1995), and Latecki et al. (1998).

In this section we show that an output digital image must be well-composed if the resolution of the digitization process is fine enough to guarantee topology preservation. Because in every mathematical model of a digitization process we first need to model real ‘‘continuous’’ objects, we begin with a definition of continuous planar sets, called parallel regular sets, that are reasonable models of real objects. The properties of parallel regular sets will allow us to determine the resolution of the digitization process.

A. Continuous Representation of Real Objects

For the digitization approach, it is necessary to explicitly characterize continuous representations of real objects, as these representations will be mapped to discrete representations by functions modeling real digitization processes. Thus, continuous representations of real objects are the starting point for this approach. Therefore, in this section we describe the classes of continuous representations of real objects that will be used as input to digitization functions. Any continuous model of some class of real objects should on the one hand be able to reflect relevant shape properties as exactly as possible, and on the other hand be mathematically tractable, in the sense that it should allow for a precise, formal description of the relevant properties.
For example, it does not make much sense to model the boundaries of 2D projections of real objects as arbitrary curves in R². This class is too general to allow us to formally describe any shape properties of sets in this class, and it contains curves with very unnatural properties, for example, plane-filling curves, that definitely do not model boundaries of planar objects. Therefore, some restrictions must be added.

The class of parallel regular sets, which we define in what follows, is defined using osculating balls or, equivalently, using normal vectors at boundary points. However, we will not use the classical definition of these tools of differential geometry. Differential geometry is based on the concept
Figure 20. A comparison of signal-based and object-based approaches to relating spatial objects and their digital images.
of derivative, which requires the calculation of limits of infinite sequences of numbers. As this calculation cannot be transferred into discrete spaces, no analog of the concept of derivative with similar properties exists in discrete spaces. Instead we will define osculating balls and normal vectors in such a way that their definitions can be directly translated to digital spaces.

Let A be a planar set. We denote by Aᶜ the complement of A, by bdA the topological boundary of A, by intA the topological interior of A, and by clA the topological closure of A in the usual topology of the plane induced by the Euclidean metric. The connected components of the boundary bdA are called contours. We denote by d(x, y) the Euclidean distance of points x, y and by B(c, r) a closed ball of radius r centered at a point c.

Definition 21 We will say that a closed ball B(c, r) is tangent to bdA at point x ∈ bdA if bdA ∩ bd(B(c, r)) = {x} (see Figure 21).

We will say that a closed ball iob(x, r) of radius r is an inside osculating ball of radius r to bdA at point x ∈ bdA if bdA ∩ bd(iob(x, r)) = {x} and iob(x, r) ⊆ intA ∪ {x} (see Figure 21).

We will say that a closed ball oob(x, r) of radius r is an outside osculating ball of radius r to bdA at point x ∈ bdA if bdA ∩ bd(oob(x, r)) = {x} and oob(x, r) ⊆ Aᶜ ∪ {x} (see Figure 21).

Note that x is a boundary point, not the center, of both iob(x, r) and oob(x, r). According to this definition, for every boundary point of a given ball B(c, s) of radius s, there exist inside osculating balls of radii r, where 0 < r < s. However, B(c, s) itself is not an inside osculating ball for any of its boundary points.

Now we define parallel regular subsets of the plane:

Definition 22 We assume that A is a closed subset of the plane such that its boundary bdA is compact.
Figure 21. The inside and outside osculating balls of radius r to the boundary of the set A at point x.
Figure 22. The set A is par(r)-regular while the set B is not par(r)-regular, where r is the radius of the depicted circles.
A set A will be called par(r, +)-regular if there exists an outside osculating ball oob(x, r) of radius r at every point x ∈ bdA.

A set A will be called par(r, −)-regular if there exists an inside osculating ball iob(x, r) of radius r at every point x ∈ bdA.

A set A will be called par(r)-regular (or r parallel regular) if it is par(r, +)-regular and par(r, −)-regular. A set A will be called parallel regular if there exists a constant r such that A is par(r)-regular.

In Figure 22 the set A is par(r)-regular while the set B is not par(r)-regular, where r is the radius of the depicted circles. Note that a parallel regular set, as well as its boundary, does not have to be connected. We have the following equivalence:

Theorem 22 A set A is par(r)-regular iff, for every two distinct points x, y ∈ bdA, the outer normal vectors n(x, r) and n(y, r) exist and do not intersect, and the inner normal vectors −n(x, r) and −n(y, r) exist and do not intersect. ■

The proof of Theorem 22 as well as a definition of normal vectors that is not based on the concept of derivative are given in Latecki et al. (1998). In Figure 23, for example, set X is not par(r)-regular while set Y is par(r)-regular, where r is the length of the depicted vectors.

Proposition 13 Let A be a par(r)-regular set. If x and y belong to two different components of bdA, then d(x, y) > 2r.

Proof Let C₁, . . . , Cₙ be all connected components of bdA (there is only a finite number of them, since bdA is compact), where n ≥ 2. For every i ≠ j, i, j ∈ {1, . . . , n}, let dᵢⱼ : Cᵢ × Cⱼ → R be the Euclidean distance d restricted to Cᵢ × Cⱼ. Since dᵢⱼ is a continuous function on a compact set, there exists
Figure 23. X is not par(r)-regular, but Y is par(r)-regular.
(cᵢ, cⱼ) ∈ Cᵢ × Cⱼ such that dᵢⱼ(cᵢ, cⱼ) > 0 is the minimal value of dᵢⱼ. Let a pair (cₖ, cₘ), k ≠ m, be such that dₖₘ(cₖ, cₘ) ≤ dᵢⱼ(cᵢ, cⱼ) for all i, j ∈ {1, . . . , n} with i ≠ j. We obtain that d(x, y) ≥ d(cₖ, cₘ) = dₖₘ(cₖ, cₘ) for every x and y belonging to two different components of bdA.

We now show that d(cₖ, cₘ) > 2r. Assume that d(cₖ, cₘ) ≤ 2r. Consider the closed ball B such that cₖ, cₘ ∈ bdB and the line segment cₖcₘ is the diagonal of B (see Figure 24). Clearly, the radius of B is not greater than r and B ∩ bdA = {cₖ, cₘ}. Therefore, either B ⊆ A or intB ⊆ Aᶜ. We assume intB ⊆ Aᶜ; the proof in the second case is analogous. Every closed ball OB such that OB is a proper subset of B and OB ∩ bdB = {cₖ} is an outside osculating ball of A at cₖ. Since the radius of B is not greater than r and the center of B is collinear with all centers of balls OB, we obtain that B is an outside osculating ball of A at cₖ. Yet, this contradicts the fact that B ∩ bdA = {cₖ, cₘ}. Therefore, d(cₖ, cₘ) > 2r, and consequently d(x, y) > 2r for every x and y belonging to two different components of bdA. ■
B. Digitization and Segmentation

Definition 23 Let Q be a cover of the plane with closed squares with diagonal of length r such that if two squares intersect, then their intersection is either their common side or a corner point. Such a cover is called a square grid with diameter r.

A digital image can be described as a set of points located at the centers of the squares of a grid Q, each assigned some value in a gray-level or color scale. By a digitization process we understand a function mapping a planar set X to a digital image. By a segmentation process we understand a process grouping digital points into a set representing a digital object. Therefore, the output of a segmentation process can be interpreted as a binary digital image, where each point is either black or
Figure 24.
white. We assume that digital objects are represented as sets of black points. Thus, the input of a digitization and segmentation process is a planar set X and the output is a binary digital image, which will be called a digitization of X with diameter r and denoted Dig(X, r).

We will interpret a black point p ∈ Dig(X, r) as a closed (black) square of cover Q centered at p and the digitization Dig(X, r) as the union of closed squares centered at black points; that is, Dig(X, r) will denote a closed subset of the plane. We will treat digitization and segmentation processes satisfying the following conditions relating a planar par(r)-regular set X to its digital image Dig(X, r):

ds1 If a square q ∈ Q is contained in X, then q ∈ Dig(X, r) (i.e., q is black).
ds2 If a square q ∈ Q is disjoint from X, then q ∉ Dig(X, r) (i.e., q is white).
ds3 If a square q is black and area(X ∩ q) ≤ area(X ∩ p) for some square p ∈ Q, then square p is black.

These conditions seem to form an acceptable model of the digitization and segmentation process in image document processing, where an image is captured by a scanner and segmented by thresholding, if we exclude digitization and segmentation errors:

(a) The sensor values are monotone with respect to the area of the object ‘‘seen’’ by the sensors; that is, if the area of the object seen by sensor s₁ is greater than the area of the object seen by sensor s₂, then the gray-level value of s₁ is greater than the gray-level value of s₂.
(b) The influence of blurring is so small that it can be neglected, due to the fact that the distance of objects from the sensor is known and the scanner is calibrated accordingly.
(c) The gray-level images obtained by the digitization process are segmented by thresholding, which is the standard segmentation technique in document analysis.
Figure 25. (a) The union of all squares represents the intersection digitization of the ellipse. (b) The two squares represent the square subset digitization of the ellipse. (c) The eight squares represent a digitization of the ellipse with the area ratio equal to 1/5.
In the following, we define some important digitization and segmentation processes satisfying conditions ds1, ds2, and ds3.

Definition 24 Let X be any set in the plane. A square p ∈ Q is black (belongs to a digital object) iff p ∩ X ≠ ∅, and white otherwise. We will call such a digital image an intersection digitization with diameter r of set X, and denote it Dig∩(X, r), namely

Dig∩(X, r) = ∪{p ∈ Q : p ∩ X ≠ ∅}.

See, for example, Figure 25a, where the union of all depicted squares represents the intersection digitization of an ellipse. With respect to real camera digitization and segmentation, the intersection digitization corresponds to the procedure of coloring a pixel black iff there is part of the object X in the field ‘‘seen’’ by the corresponding sensor. When digital straight line models were first studied by Freeman and Rosenfeld, the digitization models that were assumed were based on the intersection digitization, which is called square-box quantization by Freeman (1970).

Now we consider digitizations corresponding to the procedure of coloring a pixel black iff the object X fills the whole field ‘‘seen’’ by the corresponding sensor. For such digitizations, a square p is black iff p ⊆ X and white otherwise. We will refer to such a digital image of a set X as a square subset digitization and denote it by Dig⊆(X, r), where

Dig⊆(X, r) = ∪{p ∈ Q : p ⊆ X}.

In Figure 25b the two squares represent Dig⊆(X, r), where X is the ellipse.

Next, let us consider a digitization and segmentation process in which a pixel is colored black iff the ratio of the area of the continuous object in a sensor square to the area of the square is greater than some constant threshold value v. An example is given in Figure 25c, where the squares represent a digitization of the ellipse with the area ratio equal to 1/5. A square p ∈ Q is black iff area(p ∩ X)/area(p) > v and white otherwise, where 0 ≤ v < 1 is a constant. We will refer to such a digital image of a set X as a v-digitization of X with diameter r. This process models a segmentation by applying a threshold value to a gray-level digital image for all real devices
in which the sensor values can be assumed to be monotonic with respect to the area of the object in the sensor square. We will denote such digitizations by Dig_v(X, r). We recall that we identify the digitization of X with the union of black closed squares; thus Dig_v(X, r) denotes the digital picture which is the union of black closed squares. We will also denote by Dig₁(X, r) the digitization in which the area ratio is equal to 1. We have the following inclusions:

Dig⊆(X, r) ⊆ Dig_v(X, r) ⊆ Dig∩(X, r) for every v ∈ [0, 1], and Dig_v(X, r) ⊆ Dig_w(X, r) if w ≤ v, for every v, w ∈ [0, 1].

We will hereafter use Dig(X, r) without a subscript to denote any digitization and segmentation process satisfying ds1, ds2, and ds3. Thus, in particular, Dig(X, r) denotes Dig∩(X, r), Dig⊆(X, r), and Dig_v(X, r) for every v ∈ [0, 1].

C. Digitizations Produce Well-Composed Images

In Latecki et al. (1998), we proved the following result:

Theorem 23 Let A be a par(r)-regular bordered 2D manifold. Then A and Dig(A, r) are homeomorphic for every digitization Dig(A, r) (which satisfies conditions ds1, ds2, and ds3). ■

An important step in proving Theorem 23 is the following theorem, whose proof we restate here:

Theorem 24 If A is par(r)-regular, then Dig(A, r) is well-composed, that is, the pattern shown in Figure 26 and its 90° rotation cannot occur in any Dig(A, r).

Proof Let c be the common vertex of all four closed squares. We first assume that c ∉ A and show that the pattern shown in Figure 26 and its 90° rotation cannot occur in the configuration of the four squares. Let S₁ and S₂ be black, and S₃ and S₄ be white, as shown in Figure 27a, where S₁, S₂, S₃, and S₄ are closed squares. We prove that this assumption leads to a contradiction.

As A is closed and c ∉ A, there is an e > 0 such that B(c, e) ∩ A = ∅, where B(c, e) denotes (as always) a closed ball. There must be points of A in both squares S₁ and S₂, because if S₁ ∩ A = ∅, then S₁ would be white by ds2. Therefore, S₁ ∩ A ≠ ∅, and similarly S₂ ∩ A ≠ ∅.
Figure 26. This pattern and its 90° rotation cannot occur in any Dig(A, r).
Let p₁ be a point with the shortest distance t to c in S₁ ∩ A. Let p₂ be a point with the shortest distance d to c in S₂ ∩ A. Clearly, points p₁, p₂ belong to bdA and t > 0, d > 0, because c ∉ A, c ∈ S₁, and c ∈ S₂. Without loss of generality, we assume that t ≤ d. Consider the closed ball B(c, t). We show that p₁ and p₂ belong to two different components of B(c, t) ∩ bdA. Assume that this is not the case. Then, for some component C of bdA, it follows that C = arc₁(p₁, p₂) ∪ arc₂(p₁, p₂), arc₁(p₁, p₂) ∩ arc₂(p₁, p₂) = {p₁, p₂}, and arc₁(p₁, p₂) ⊆ B(c, t) or arc₂(p₁, p₂) ⊆ B(c, t). Assume arc₁(p₁, p₂) ⊆ B(c, t) and, without loss of generality, assume that arc₁(p₁, p₂) goes through S₃. Then arc₁(p₁, p₂) ∩ face(S₁, S₃) ∩ (B(c, t) \ B(c, e)) ≠ ∅ and arc₁(p₁, p₂) ∩ face(S₂, S₃) ∩ (B(c, t) \ B(c, e)) ≠ ∅, where face denotes the common face of two squares (see Figure 27b). In this case, arc₁(p₁, p₂) ∩ S₃ ⊆ (B(c, t) \ B(c, e)) ∩ S₃.
Figure 27. The small circle illustrates ball B(c, e) and the big circle illustrates ball B(c, t).
As the diameter of square S₃ is r, no component of bdA other than C intersects S₃, by Proposition 13. Therefore, A ∩ S₃ contains (S₃ \ B(c, t)) together with the part of A between arc₁(p₁, p₂) and bdB(c, t) in S₃. We also have that A ∩ S₁ ⊆ (S₁ \ intB(c, t)), since no point in S₁ ∩ A is closer to c than distance t (by the definition of the constant t). Consequently, area(A ∩ S₁) ≤ area(A ∩ S₃). Thus, square S₃ should be black. This contradiction implies that arcᵢ(p₁, p₂) ⊄ B(c, t) for i = 1, 2. Therefore, bdA ∩ B(c, t) has at least two components, one containing p₁ and the second containing p₂. In each of these components there is a point with the shortest distance (≤ t) to c; call them x₁ and x₂. Then c ∈ n(x₁, r) ∩ n(x₂, r), a contradiction.

We have thus shown for a par(r)-regular set A that

(*) if c ∉ A, then the pattern shown in Figure 26 and its 90° rotation cannot occur in the four squares of Dig(A, r) which have c as their common vertex.

The case in which c ∈ A \ bdA follows directly from the result already given, applied to the digitization of the complement Aᶜ of A (i.e., the roles of A and Aᶜ are interchanged). For completeness, the proof is given in what follows. Let S₁ and S₂ be black, and S₃ and S₄ be white, as shown in Figure 27a, where S₁, S₂, S₃, and S₄ are closed squares. Without loss of generality, we assume that area(S₁ ∩ A) ≤ area(S₂ ∩ A) and area(S₃ ∩ A) ≥ area(S₄ ∩ A). Then, by ds3, area(S₃ ∩ A) < area(S₁ ∩ A). We will digitize the set B that is the closure of the complement of A, that is, B = cl(Aᶜ). As A is par(r)-regular, B is also par(r)-regular. Clearly, area(Sᵢ ∩ B) = area(Sᵢ) − area(Sᵢ ∩ A) for i = 1, 2, 3, 4. As area(Sᵢ ∩ B) = −area(Sᵢ ∩ A) + area(Sᵢ), where area(Sᵢ) is a constant value, we obtain that area(S₃ ∩ B) > area(S₁ ∩ B) as well as area(S₁ ∩ B) ≥ area(S₂ ∩ B) and area(S₄ ∩ B) ≥ area(S₃ ∩ B). Thus, in Dig(B, r) we have the pattern: S₁ and S₂ are white, and S₃ and S₄ are black.
With c ∉ B, we obtain by the preceding result (*) applied to B that this pattern cannot occur in squares S₁, S₂, S₃, and S₄, which belong to Dig(B, r). The obtained contradiction proves that if c ∈ A \ bdA, then the pattern shown in Figure 26 and its 90° rotation cannot occur in the four squares of Dig(A, r) which have c as their common vertex.
Figure 28. |area(S ∩ A) − area((S + v) ∩ A)| ≤ area(S △ (S + v)).
It remains to consider the case in which c ∈ bdA. Let again S₁ and S₂ be black, and S₃ and S₄ be white in Dig(A, r), as shown in Figure 27a, where S₁, S₂, S₃, and S₄ are closed squares. This implies that

ε = min{area(S₁ ∩ A), area(S₂ ∩ A)} − max{area(S₃ ∩ A), area(S₄ ∩ A)} > 0.   (1)

We denote by X + v the translation of a set X by the vector v and by A △ B = (A − B) ∪ (B − A) the symmetric difference of two sets. It is easy to observe that (see Figure 28)

|area(S ∩ A) − area((S + v) ∩ A)| ≤ area(S △ (S + v))   (2)
for every square S ∈ Q and every vector v, where |r| denotes the absolute value of r. With c ∈ bdA, there are points of the complement Aᶜ in every neighborhood of c. Therefore, there exists a vector v such that c + v ∉ A and area(S △ (S + v)) < ε/2 for every square S ∈ Q. As a consequence of this fact and inequalities (1) and (2), we obtain

min{area((S₁ + v) ∩ A), area((S₂ + v) ∩ A)} − max{area((S₃ + v) ∩ A), area((S₄ + v) ∩ A)} > 0.   (3)

Therefore, S₁ + v and S₂ + v are black, and S₃ + v and S₄ + v are white, in the digitization Dig(A, r) of A with respect to the square cover Q translated by v. This contradicts (*), because c + v ∉ A and c + v is the common vertex of the four squares. The obtained contradiction proves that if c ∈ bdA, then the pattern shown in Figure 26 and its 90° rotation cannot occur in the four squares of Dig(A, r) which have c as their common vertex. ■

The following theorem is a simple consequence of Theorem 24.

Theorem 25 Let A be par(r)-regular. Then Dig(A, r) is a bordered 2D manifold and the boundary of Dig(A, r) is a 1D manifold.

Proof Since the configuration shown in Figure 26 (and its 90° rotation) cannot occur in Dig(A, r) by Theorem 24, there exist only three 2 × 2
configurations of boundary squares in Dig(A, r), shown in Figure 5 (modulo reflection and 90° rotation). Therefore, if we view Dig(A, r) as a subset of R², every point in Dig(A, r) has a neighborhood homeomorphic to a relatively open subset of a closed half-plane. Hence Dig(A, r) is a bordered 2D manifold and the boundary of Dig(A, r) is a 1D manifold. ■

The well-composedness of the output digital image under the intersection digitization can also be guaranteed without the requirement that the input continuous set be parallel regular.

Theorem 26 Let G be a planar set with the property that the intersection of G with every open ball of radius d is connected. Then the intersection digitization Dig∩(G, d) of G with diameter d is well-composed.

Proof Let a, b, c, and d be four closed squares of cover Q sharing a common corner point x, arranged so that a and d are diagonally opposite, as are b and c.
Let O(x, d) be the open ball centered at x with radius d. We will show that the configurations a ∩ G ≠ ∅, d ∩ G ≠ ∅, and c ∩ G = b ∩ G = ∅, or b ∩ G ≠ ∅, c ∩ G ≠ ∅, and a ∩ G = d ∩ G = ∅, which lead to non-well-composedness of the digitization of G, are impossible. For example, if a ∩ G ≠ ∅, d ∩ G ≠ ∅, and b ∩ G = c ∩ G = ∅,
then G ∩ O(x, d) is disconnected, because G ∩ O(x, d) = G ∩ (O(x, d) \ (b ∪ c)), which is clearly a disconnected subset of O(x, d). ■
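The digitizations of Definition 24 (intersection, square subset, and v-digitization) can be approximated numerically by estimating the area ratio area(p ∩ X)/area(p) with supersampling. A sketch, where the predicate-based interface and all names are my choices, not from the text:

```python
import numpy as np

def digitize(inside, grid_n, cell, v=None, samples=8):
    """Digitize a planar set given by the predicate inside(x, y) on a
    grid_n x grid_n square grid with cells of side `cell`.
    v=None -> intersection digitization (black iff the cell meets the set);
    v=1.0  -> square-subset digitization (black iff the cell lies in the set);
    0<v<1  -> v-digitization (black iff the area ratio exceeds v).
    Area ratios are estimated by supersampling each cell."""
    black = np.zeros((grid_n, grid_n), dtype=bool)
    offs = (np.arange(samples) + 0.5) / samples
    for i in range(grid_n):
        for j in range(grid_n):
            xs = (j + offs[None, :]) * cell
            ys = (i + offs[:, None]) * cell
            ratio = inside(xs, ys).mean()   # ~ area(p ∩ X) / area(p)
            if v is None:
                black[i, j] = ratio > 0
            else:
                black[i, j] = ratio >= v if v == 1.0 else ratio > v
    return black

# Example: a disk of radius 3 centered in a 10 x 10 grid of unit squares
disk = lambda x, y: (x - 5.0) ** 2 + (y - 5.0) ** 2 <= 9.0
inter = digitize(disk, 10, 1.0)            # intersection digitization
subset = digitize(disk, 10, 1.0, v=1.0)    # square-subset digitization
half = digitize(disk, 10, 1.0, v=0.5)      # v-digitization with v = 1/2
```

The stated inclusions can be observed directly: every black square of the subset digitization is black in the 1/2-digitization, and every black square of the latter is black in the intersection digitization.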
VI. Application: An Optimal Threshold

This section is based on Latecki and Gross (1998). It seems to be a common opinion that the problem of finding an optimal threshold for gray-level images of black objects on a white background (e.g., obtained by a scanner) has been solved. Usually, this threshold is determined by analyzing a gray-level histogram. In this section we show that the gray-level histogram alone does not provide sufficient information to solve this problem and that this threshold depends on topological properties of the image. Based on these considerations, we propose a new method for determining the optimal threshold that is based on the digital topology of the gray-level image. We tested this method on several hundred document images scanned at different resolutions (and in several different languages). In all these tests, the threshold determined by our topological method was more suitable than the threshold determined by analyzing gray-level histograms.
A. Thresholding

One of the important problems that generally needs to be solved in analyzing document images is that of finding a threshold to convert the document from a gray-level digital image to a binary one. Finding a good threshold value is important in the document domain for many subsequent applications, from OCR to symbolic compression (e.g., see O’Gorman and Kasturi, 1997). There exist several approaches to finding an optimal threshold of a gray-level image (e.g., see Pratt, 1978, and Weszka, 1978); however, as we will show in what follows, none of them can be optimal.

With the following experiment we demonstrate that the gray-level histogram is often insufficient to decide where to place an optimal threshold. Therefore, there is no apparent reason to assume that by analyzing the gray-level image histogram, for example, by finding its minima, an optimal threshold can be detected.

Consider the two gray-level images of the word ‘‘should’’ shown in Figure 29. They differ only by the inter-letter spacing (i.e., the distance between letters). The lower image was generated from the top one by taking the bounding boxes around each gray-level letter and moving these bounding boxes so as to make them almost adjacent, while moving all the columns containing only background pixels to the outside. Because the distribution of pixels in the image itself remained the same, the gray-level histograms of the two images are identical.
Figure 29. The two gray-level images differ only by the inter-letter spacing. The corresponding binary images are obtained by thresholding at 215.
The two gray-level images have been thresholded at the value 215. The obtained binary images are shown in Figure 29 below the original gray-level images. While this threshold leads to a correct object segmentation by grouping into connected components for the upper image, this is not the case for the lower one, where the letters ‘‘u’’ and ‘‘l’’ form a single connected component. Thus, this threshold value, when applied to the lower gray-level image, results in a false connection.

Because the text font in the top gray-level image has wider spaces between letters than the font in the lower one, the gray-level image of the first font can be thresholded at a higher gray-level value than the second font while still yielding a correct segmentation of the letters. The second font, however, having smaller inter-letter distances, clearly requires a threshold value lower than 215. Observe that this is not a distinction that can be made from their identical gray-level histograms. From a digital topology perspective, the digitization and segmentation process for the lower image is not topology preserving, that is, the continuous original ‘‘image’’ containing the underlying letters and the binary image are not topologically equivalent (homeomorphic).

In this example we have demonstrated that two images with identical gray-level histograms may require considerably different thresholds in order to ensure that the digitization is topology preserving, which is necessary for correct object segmentation by connected component grouping. Thus, it is impossible to determine a topology-preserving threshold using the gray-level histogram. Consequently, significant points of the gray-level histogram, like minima, are not related to the topology of the underlying image. Clearly, there is a connection between threshold values and the preservation of topology, but a topology-preserving threshold cannot be determined
156
LONGIN JAN LATECKI
Figure 30. A gray-level document image.
by analyzing the gray-level histogram. The following example illustrates this connection. Gray-level document images often have bimodal histograms whose two peaks are quite distinct, but the proper threshold value between these peaks can be very hard to find. This is analogous to the problem in color segmentation of knowing the number and colors of the regions into which to segment the image, but not necessarily knowing exactly where to delineate the boundaries. A good candidate for the threshold value is the minimum between the two peaks (see, e.g., Pratt, 1978, Section 18.5.1). Consider the gray-level image shown in Figure 30. This image was captured with a scanner set at 400 dpi. The minimum (between the two peaks) of the histogram of the image gray-level values occurs at approximately 169. The image in Figure 30 is shown thresholded at gray level 169 in Figure 31. As is evident, this is not a particularly good threshold: it is considerably lower than the desired threshold value and results in many
Figure 31. The gray-level document image thresholded at the gray-level value of 169, which is the approximate minimum of the histogram of image gray-level values.
WELL-COMPOSED SETS
157
Figure 32. The resulting binary image when the threshold is set to 232.
false disconnections, where components that were clearly connected in the original image have become disconnected in the thresholded image. A topology-preserving segmentation is one in which there are neither false connections nor false disconnections of connected components. Next, let us consider the binary image shown in Figure 32, which is once again a thresholded version of the image shown in Figure 30. It can be seen that in this image the problem is reversed: it is primarily one of false connection, with letters connecting to other letters in the text. One way to view thresholding of a digital document is that lowering the threshold effectively thins out each letter, or component, while raising the threshold effectively thickens each text component. Clearly, then, there is a tradeoff between the false connection rate and the false disconnection rate. Assuming the initial document was scanned at a resolution that was not completely topology preserving, this false connection/disconnection tradeoff will almost always exist.

B. Histogram of Checkerboard Patterns

In the last section, we showed that it is impossible to determine an optimal threshold using the gray-level histogram. Thus, a new indicator is necessary in order to determine which threshold is the one most likely to preserve topology. We propose such an indicator in this section. Our starting point is the observation from the last section that an optimal threshold is closely related to a topology-preserving segmentation by thresholding. According to Theorem 23, if the resolution of a digitization process is sufficient, then the resulting segmented digital image and the underlying continuous object are topologically equivalent. Moreover, by Theorem 24, the obtained digital image is well-composed.
Our mathematical model of the digitization process closely approximates the digitization process of scanners; if there were no noise and if the resolution of the digitization process were sufficient, the binary digital image obtained by the digitization and segmentation process would contain no checkerboard patterns. Since there is some noise and some parts of continuous objects violate the requirement of sufficient resolution, it is natural to require that the number of checkerboard patterns be minimal. Consequently, the threshold that minimizes the number of checkerboard patterns should be chosen. However, this is not so simple, because such a threshold is not unique: for example, for the threshold values 0 and 255, which give completely white and completely black digital images, no checkerboard patterns occur. The problem is thus to find the ‘‘right’’ minimum in the number of checkerboard patterns. The histogram of the number of checkerboard patterns per threshold value is bimodal for almost all the digital documents we have tried in our experiments, which comprise several hundred document images scanned at different resolutions and in several different languages. The two maxima of the histogram correspond to the two gray-level values at which the number of topological false connections or false disconnections, respectively, reaches an extremum. We justify this fact here. For the document shown in Figure 30, the two extrema occurred at gray-level values of 86 and 246. The first extremum occurred as a result of letters falsely disconnecting; the second occurred as a result of letters falsely connecting and of noisy background pixels forming checkerboard patterns. Both of these images are topologically very unstable in that the underlying topological structure of the image is rapidly changing.
Conversely, the minimum of checkerboard patterns that occurs between these two maxima is topologically very stable in that the rate of topological change is at a minimum. We claim that an optimal threshold is at the gray-level value for which the minimum of checkerboard patterns occurs and which lies between the two maxima of checkerboard patterns. We call this optimal threshold topology-preserving. For example, for the image in Figure 30 the minimum of checkerboard patterns between the two maxima occurs at the gray-level value of 213. The resulting thresholded image is shown in Figure 33, where only 1 of 859,308 2×2-neighborhoods was a checkerboard pattern. Observe that the resulting binary image is topologically very close to the original document. For the more than several hundred documents we have studied, the minimum of checkerboard patterns seems to result in a thresholding of the
Figure 33. This image was thresholded at the minimum of the histogram of checkerboard patterns, which occurred at a gray-level value of 213. The resulting binary image is topologically very close to the original document.
gray-level image into binary that is either topologically optimal, that is, the total number of false connections and disconnections is minimized, or very close to optimal. Unlike the minima of the gray-level histogram, which are often flat or not well-defined, the minima of the histogram of checkerboard patterns are generally very well-defined. In addition, in all of the experiments we have conducted, thresholding at the checkerboard minimum considerably outperforms thresholding at the gray-level minimum. For example, the character recognition rate of the Omni-Page OCR program was significantly higher for binary document images obtained using our threshold than by using Omni-Page directly on the gray-level images. For document images scanned at 200 dpi, the number of mismatches made by Omni-Page using our thresholded version was 10 times smaller. If we revisit the two thresholded images shown in Figures 31 and 32, there is a clear indication from the number of checkerboard patterns in each case that neither gray-level threshold is optimal. The image in Figure 31 is under-thresholded; its checkerboard patterns are almost entirely the result of false disconnections. The image in Figure 32 is over-thresholded; its checkerboard patterns are almost entirely the result of false connections. A more detailed discussion on thresholding document images based on the checkerboard histogram can be found in Latecki and Gross (1998).
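The procedure used in this section can be summarized in a short sketch: for every candidate threshold, count the 2×2 checkerboard configurations in the thresholded image, then take the minimum of this checkerboard histogram that lies between its two maxima. This is a reading of the method rather than the authors' code; in particular, locating one maximum in each half of the gray-level range is a simplifying assumption of the sketch:

```python
import numpy as np

def checkerboard_histogram(gray):
    """counts[t] = number of 2x2 checkerboard patterns in the image
    thresholded at t, for t = 0..255. A 2x2 block is a checkerboard
    when its two diagonal pairs agree and the diagonals disagree."""
    counts = np.empty(256, dtype=np.int64)
    for t in range(256):
        b = (gray >= t).astype(np.int8)            # binarize at threshold t
        a, r, l, d = b[:-1, :-1], b[:-1, 1:], b[1:, :-1], b[1:, 1:]
        counts[t] = int(((a == d) & (r == l) & (a != r)).sum())
    return counts

def topology_preserving_threshold(gray):
    """Minimum of the checkerboard histogram between its two maxima
    (assumed here to lie one in each half of the gray-level range)."""
    counts = checkerboard_histogram(gray)
    m1 = int(np.argmax(counts[:128]))
    m2 = 128 + int(np.argmax(counts[128:]))
    return m1 + int(np.argmin(counts[m1:m2 + 1]))
```

For a real document image, the returned gray level plays the role of the value 213 found for Figure 30; on a degenerate image the between-maxima search may be meaningless, so the half-range assumption matters.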
VII. Generalizations

In Wang and Bhattacharya (1997) the concept of 2D well-composed sets is extended in two directions: to an arbitrary grid system and to segmented images with objects labeled with more than two gray values. Our definition
Figure 34. The partition in (b) is a regular partition while the one in (a) is not.
of a well-composed segmented image given at the end of Section I is equivalent to Definition 3.2 in Wang and Bhattacharya (1997). Now we extend the definition of well-composed sets to subsets of arbitrary grid systems. In order to simplify the presentation, the following definitions are closely related to, but not identical to, the ones in Wang and Bhattacharya (1997).

Definition 25 We call S ⊆ P(R²) a regular partition of R² if

1. ∪S = R²,
2. for every S ∈ S the boundary bd S is a simple closed curve,
3. each S ∈ S intersects only a finite number of elements in S,
4. for every A, B ∈ S, A ∩ B is either empty or a point or an arc.
We will call each element of a regular partition a pixel; that is, a pixel is a subset of the continuous plane. For example, the partition in Figure 34b is a regular partition, while the one in (a) is not, because p ∩ q is not an arc. A collection S that satisfies conditions 1—3 is called a grid system in Wang and Bhattacharya (1997).

Definition 26 A subcollection W of a regular partition S (i.e., a set of pixels) is well-composed if ∪W is a 2D bordered manifold (or, equivalently, if bd(∪W) is a 1D manifold).

These definitions can be easily generalized to Rⁿ:

Definition 27 We call S ⊆ P(Rⁿ) a regular partition of Rⁿ if

1. ∪S = Rⁿ,
2. for every S ∈ S the boundary bd S is a simply connected closed (n − 1)-manifold,
3. each S ∈ S intersects only a finite number of other sets in S,
4. for every A, B ∈ S, A ∩ B is either empty or is a simply connected bordered manifold of dimension less than n.

Definition 28 A subcollection W of a regular partition S ⊆ P(Rⁿ) is well-composed if ∪W is a bordered n-manifold (or, equivalently, if bd(∪W) is an (n − 1)-manifold).
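On the standard square grid, where each pixel is a closed unit square, Definition 26 specializes to the 2D criterion used in the preceding section: a binary digital image is well-composed exactly when it contains no 2×2 checkerboard pattern. A minimal check of this condition (the function name and the NumPy array convention are ours, not the chapter's):

```python
import numpy as np

def is_well_composed(binary):
    """True iff the binary image contains no 2x2 checkerboard pattern,
    i.e., the union of its (closed unit square) pixels is a 2D bordered
    manifold, or equivalently its boundary is a 1D manifold."""
    b = np.asarray(binary, dtype=np.int8)
    # the four corners of every 2x2 neighborhood
    a, r, l, d = b[:-1, :-1], b[:-1, 1:], b[1:, :-1], b[1:, 1:]
    # checkerboard: diagonal pairs agree, but the two diagonals disagree
    return not bool(((a == d) & (r == l) & (a != r)).any())
```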
We call a function G : S → [0, 1] a gray-level digital image and a function B : S → {0, 1} a binary digital image.

Acknowledgments

This contribution is based on articles that the author published together with the following researchers: Azriel Rosenfeld (University of Maryland at College Park), Ari Gross (Queens College, CUNY, New York), Ulrich Eckhardt (University of Hamburg), and Christopher Conrad. Their cooperation is gratefully acknowledged. This contribution was also influenced by many helpful comments from Siegfried Stiehl (University of Hamburg), Ullrich Köthe (University of Hamburg), Ralph Kopperman (The City College, CUNY, New York), Paul Meyer (The City College, CUNY, New York), and Atsushi Imiya (Chiba University, Japan).

References

Abdulla, W. H., Saleh, A. O. M., and Morad, A. H. (1988). A preprocessing algorithm for hand-written character recognition, Pattern Recognition Letters, 7:13–18.
Aleksandrov, P. S. (1960). Combinatorial Topology, vol. 3, Albany, New York: Graylock Press.
Arcelli, C. (1981). Pattern thinning by contour tracing, Computer Graphics and Image Processing, 17:130–144.
Arcelli, C., and Sanniti di Baja, G. (1989). A one-pass two-operation process to detect the skeletal pixels on the 4-distance transform, IEEE Trans. PAMI, 11:411–414.
Artzy, E., Frieder, G., and Herman, G. T. (1981). The theory, design, implementation and evaluation of a three-dimensional surface detection algorithm, Computer Graphics and Image Processing, 15:1–24.
Chen, L., and Zhang, J. (1993). Classification of simple surface points and a global theorem for simple closed surfaces in three dimensional digital spaces, in Proc. SPIE's Vision Geometry, 2060:179–188.
Davies, E. R., and Plummer, A. P. N. (1981). Thinning algorithms: A critique and a new methodology, Pattern Recognition, 14:53–63.
Duda, R. O., Hart, P. E., and Munson, J. H. (1967). Graphical Data Processing Research Study and Experimental Investigation, AD650926, March 1967.
Eckhardt, U., and Maderlechner, G. (1993). Invariant thinning, Int. J. of Pattern Recognition and Artificial Intelligence, 7:1115–1144.
Françon, J. (1995). Discrete combinatorial surfaces, Graphical Models and Image Processing, 57:20–26.
Freeman, H. (1970). Boundary encoding and processing, in B. S. Lipkin and A. Rosenfeld, editors, Picture Processing and Psychopictorics, pp. 241–266, New York: Academic Press.
Gross, A., and Latecki, L. J. (1995). Digitizations preserving topological and differential geometric properties, Computer Vision and Image Understanding, 62:370–381.
Herman, G. (1992). Discrete multidimensional Jordan surfaces, CVGIP: Graphical Models and Image Processing, 54:507–515.
Hilditch, C. J. (1969). Linear skeletons from square cupboards, in B. Meltzer and D. Michie, editors, Machine Intelligence IV, pp. 403–420, New York: American Elsevier and Edinburgh: Edinburgh University Press.
Kong, T. Y., and Roscoe, A. W. (1985). A theory of binary digital pictures, Computer Vision, Graphics, and Image Processing, 32:221–243.
Kong, T. Y., and Rosenfeld, A. (1989). Digital topology: Introduction and survey, Computer Vision, Graphics, and Image Processing, 48:357–393.
Kong, T. Y., and Rosenfeld, A. (1990). If we use 4- or 8-connectedness for both the objects and the background, the Euler characteristic is not locally computable, Pattern Recognition Letters, 11:231–232.
Köthe, U. (2000). Generische Programmierung für Computer Vision, Doctoral dissertation, Dept. of Computer Science, University of Hamburg.
Latecki, L. J. (1995). Multicolor well-composed pictures, Pattern Recognition Letters, 16:425–431.
Latecki, L. J. (1997). 3D well-composed pictures, Graphical Models and Image Processing, 59:131–142.
Latecki, L. J., Conrad, Ch., and Gross, A. (1998). Preserving topology by a digitization process, Jour. Mathematical Imaging and Vision, 8:131–159.
Latecki, L. J., Eckhardt, U., and Rosenfeld, A. (1995). Well-composed sets, Computer Vision and Image Understanding, 61:70–83.
Latecki, L. J., and Gross, A. (1998). From mathematical digitization models to discrete shape constraints, in R. Klette, A. Rosenfeld, and F. Sloboda, editors, Advances in Digital and Computational Geometry, pp. 185–226, Singapore: Springer-Verlag.
Latecki, L. J., and Ma, C. M. (1996). An algorithm for a 3D simplicity test, Computer Vision and Image Understanding, 63:388–393.
Lü, H. E., and Wang, P. S. P. (1985). An improved fast parallel thinning algorithm for digital patterns, in Proc. IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, CA, pp. 364–367.
Minsky, M., and Papert, S. (1969). Perceptrons:
An Introduction to Computational Geometry, Cambridge, MA: MIT Press.
O'Gorman, L., and Kasturi, R. (1997). Document Image Analysis, Los Alamitos: IEEE Computer Society.
Pavlidis, T. (1982a). Algorithms for Graphics and Image Processing, Berlin: Springer-Verlag.
Pavlidis, T. (1982b). An asynchronous thinning algorithm, Computer Graphics and Image Processing, 20:133–157.
Pratt, W. K. (1978). Digital Image Processing, New York: John Wiley and Sons.
Ronse, C. (1988). Minimal test patterns for connectivity preservation in parallel thinning for binary images, Discrete Applied Mathematics, 21:67–79.
Rosenfeld, A. (1975). A characterization of parallel thinning algorithms, Information and Control, 29:286–291.
Rosenfeld, A. (1979). Digital topology, American Mathematical Monthly, 86:621–630.
Rosenfeld, A., and Kong, T. Y. (1991). Connectedness of a set, its complement, and their common boundary, Contemporary Mathematics, 119:125–128.
Rosenfeld, A., and Pfaltz, J. L. (1966). Sequential operations in digital picture processing, Jour. Association for Computing Machinery, 13:471–494.
Rutovitz, D. (1966). Pattern recognition, J. Royal Statist. Soc., 129:504–530.
Serra, J. (1982). Image Analysis and Mathematical Morphology, New York: Academic Press.
Stefanelli, R., and Rosenfeld, A. (1971). Some parallel thinning algorithms for digital pictures, Jour. Association for Computing Machinery, 18:255–264.
Wang, Y., and Bhattacharya, P. (1997). Digital connectivity and extended well-composed sets for gray images, Computer Vision and Image Understanding, 68:330–345.
Weszka, J. S. (1978). A survey of threshold selection techniques, Computer Graphics and Image Processing, 7:259–265.
Zhang, T. Y., and Suen, C. Y. (1984). A fast parallel algorithm for thinning digital patterns, Communications of the ACM, 27:236–239.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 112
Non-Stationary Thermal Field Emission

V. E. Ptitsin

Institute for Analytical Instrumentation RAS, Rizhskij Prospekt 26, 198103, St. Petersburg, Russia
I. Introduction 165
II. Electron Emission and Thermal Field Processes Activated by High Electric Fields Acting on Metal Surfaces 167
   A. Thermal Field Emission of Electrons from Metal Surfaces: Canonical Concepts of the Emission Mechanism 167
   B. Inadequacy of the Concepts Regarding the Thermal Field Emission Mechanism for Intense Electric Fields Inducing High Density Emission Currents (J > 10 A/cm²) 171
III. Phenomenological Model of Non-Stationary Thermal Field Emission 191
   A. Heating of the Pointed Microcrystal Emitter Tip by the Emission Current Flow 191
   B. Instability of the Microcrystal Emitter Tip Surface During Non-Stationary Thermal Field Emission 195
   C. Determination of the Surface Concentration of the Microcrystal Emitter Substance Native Atoms in the Two-Dimensional Gas State 198
   D. Characteristic Features of the Emitter Native Neutral Atoms Motion After the Evaporation from the Emitting Surface into Vacuum 202
   E. Ionization Probability of the Emitter Substance Native Atoms After the Evaporation 207
   F. Processes at the Interface: Emitter Surface — Microplasma Layer 209
   G. Non-Stationary Thermal Field Emission Current Kinetics 217
IV. Discussion and Conclusion 221
Acknowledgments 225
References 225
Appendix 228
I. Introduction

This paper deals with one of the most important problems related to forming high-power-density submicron electron probes, that is, the development and practical preparation of stable electron sources with superhigh brightness (up to 10 A/cm²·sr) and angular emission intensity (up to 10 A/sr). The timeliness of the development of electron-optical systems capable of forming on the sample (target) surface a submicron electron probe with a
power density of (10—10) W/cm² and relatively low energy (about 10 eV and below) is because such probes could be very useful in advanced scientific and technical projects such as resistless (‘‘dry’’) electron lithography; high-intensity (up to 10 W/cm²) ‘‘point’’ X-ray sources for X-ray lithography; electron-beam semiconductor surface profiling and modification; ‘‘soft’’ desorption ionization of adsorbates for mass spectral analysis of complex organic and bioorganic compounds; next-generation Auger spectrometers for high-locality chemical and structural analysis of molecules adsorbed on various matrices; and so on. The main difficulty hindering the practical development and utilization of such probes is that the existing thermal field cathodes-emitters, based on, for example, refractory transition metal microcrystals and also ZrO/W(100) composite emitters, do not demonstrate stable operation in the high-density (above 10 A/cm²) emission mode, since in such conditions an explosive breakdown is initiated in the interelectrode gap between the emitter surface and the first anode of the electron-optical system. In this connection the present paper concentrates on the results of investigations of the physical mechanisms of the thermal field processes leading to the explosive breakdown phenomenon. A detailed analysis of the processes accompanying the development of non-stationary electron emission and explosive breakdown initiation has shown that the non-stationary behavior of the electron emission current from the transition metal microcrystal (MC) tip surface in high electric fields cannot be adequately interpreted based on canonical conceptions of the thermal field emission (TFE) mechanism. It has been found that in such conditions the non-stationary electron emission mode develops due to (a) evaporation of native atoms from the emitting MC emitter surface, (b) field and collisional ionization of the evaporating atoms, and (c) interaction of the ions thus formed with the emitting surface.
The development of those processes in time causes a phase transition of the emitter condensed matter to dense plasma. The studies have revealed substantial differences in the physical mechanisms of the stationary and non-stationary electron-emission processes, which suggests treating them as essentially different phenomena. The phenomenological model developed for the explosive breakdown phenomenon has scientific value by itself, but it is also of great practical interest for applications in electron-optical systems, such as electron ‘‘quasilasers’’ intended to produce intense submicron electron beams whose power density is comparable with that of modern high-power laser sources. Such instruments would make the scientific and technical projects listed above practical. The investigations carried out have provided the basis for developing an original nanoprocess for making pointed electron emitters capable of stable
operation in intense electric fields with an electron brightness of up to 10 A/cm²·sr and an angular emission intensity of up to 10 A/sr. A brief description of the technology for preparing emitters with such unique electron-optical parameters is given in the Appendix.
II. Electron Emission and Thermal Field Processes Activated by High Electric Fields Acting on Metal Surfaces

A. Thermal Field Emission of Electrons from Metal Surfaces: Canonical Concepts of the Emission Mechanism

The phenomenon of field emission (FE) of electrons, experimentally detected by R. W. Wood (1897), consists in the emission of electrons by substances exposed to a sufficiently high electric field (F of about 1 V/nm). R. Fowler and L. Nordheim (1928), based on the idea of tunneling of the electrons of the substance through the potential barrier at the metal surface—vacuum boundary, gave a theoretical explanation of the FE phenomenon. The expression describing the dependence of the field emission current density (J) on the field strength (F) and the work function (φ) has the form (see Fowler and Nordheim, 1928, and Modinos, 1990)

J(F) = [1.537·10¹⁰ F²/(φ t²(y))] exp[−0.683 φ^(3/2) v(y)/F]   (1)

Here J(F) is expressed in A/cm², φ in eV, and F in V/Å, and t(y) and v(y) are tabulated functions of the argument y = 3.79 √F/φ (Modinos, 1990). Note that formula (1) is obtained using several simplifying assumptions and computational techniques (Fowler and Nordheim, 1928; Modinos, 1990; Elinson, 1974).

1. The problem was stated as one-dimensional (i.e., the metal—vacuum interface was considered an ideally flat surface, and hence the boundary field was considered homogeneous).
2. The transparency of the potential barrier was calculated by means of the quasi-classical WKB approximation.
3. As a model of the metal, the model of free electrons in a potential ‘‘box’’ (the Sommerfeld model of a metal) was selected, according to which the electrons of the metal form a degenerate gas obeying the Fermi—Dirac statistics.
4. The theory is built for the metal temperature T = 0 K.

The plot of ln(J/F²) as a function of the argument 1/F is a straight line and, consequently, is called the Fowler—Nordheim (F—N) straight line. The
analytical expression describing the electron emission current density from the surface of a substance with work function φ as a function of the field strength F and the absolute surface temperature T was derived by E. L. Murphy and R. H. Good in 1956 (Murphy and Good, 1956). This relation has the form (Elinson, 1974)

J(F, T) = J(F) · πθ/sin(πθ)   (2)

where θ = (8m/ħ²)^(1/2) (φ^(1/2)/(eF)) t(y) kT; m is the electron mass, ħ is the Planck constant, e is the electron charge, and k is the Boltzmann constant. Formula (2) is correct to within an error of up to 40%, provided certain conditions interrelating the emission parameters (F, φ, T) are fulfilled (Modinos, 1990; Elinson, 1974). The range of temperatures and field strengths where (with the indicated error) the use of formula (2) is valid is shown in Modinos, 1990 (Figure 1.4). Analytical expressions describing the relation J(F, T) in an intermediate range of temperatures and field strengths were obtained by Christov (1966). However, the relationships derived in Christov (1966) are not given here, as they are cumbersome and can be applied only after appropriate numerical calculations.

In addition to the function J(F, T), another essentially important physical characteristic of the field emission process is the energy distribution of the emitted electrons. The number of electrons emitted from a unit area per unit time with total energy between E and E + dE can be defined as j(E) dE, where j(E) is the distribution function over total energies. Within the framework of the theory of metals, in the free-electron approximation, j(E) can be expressed as (Modinos, 1990)

j(E) = ∫ N(E, w) D(w, F) dw   (3)

where N(E, w) is determined by the relationship

N(E, w) dE dw = (m/2π²ħ³) f(E) dE dw

Here f(E) is the Fermi—Dirac distribution function; w is the normal component of the electron total energy; and D(w, F) is the coefficient of electron passage through the potential barrier at the metal—vacuum boundary:

D(w, F) = {1 + exp[Q(w)]}⁻¹,   Q(w) = −2i ∫_{Z₁}^{Z₂} κ(z) dz

κ(z) = [(2m/ħ²)(w − E_F − φ + e²/(4z) + eFz)]^(1/2)
where Z₁ and Z₂ are the roots of the equation κ(z) = 0, and E_F is the Fermi energy. The expression for j(E), valid in the range of F and T for which formula (2) was obtained, was derived by R. Young (1959):

j(E) = [m d/(2π²ħ³)] f(E) exp[−(4√(2m)/(3ħeF)) (φ + E_F − E)^(3/2) v(y₀)],   d = ħeF/[2(2mφ)^(1/2) t(y)]   (4)
where y₀ = (e³F)^(1/2) (E_F + φ − E)⁻¹. As shown by the refined experiments performed by Swanson and Crouser (1967), Young's formula (4) agrees satisfactorily with experimental results up to fields F ≈ 4 V/nm for the crystallographic planes (111), (112), (116), and (310) of W. For higher field strengths, F > 5 V/nm (at current densities of about 10 A/cm² for W), Young's formula no longer holds because of the limitations under which it was obtained. In such conditions the integral (3) should be evaluated numerically without any reduction. However, in Bell and Swanson (1979) it was experimentally shown that in high electric fields (F > 5 V/nm) the energy distribution function of the electrons proves to be broader than follows from expression (4). According to Bell and Swanson (1979), the experimentally observed significant broadening of the electron energy distribution function in comparison with the theoretically predicted distribution (3) is largely due to the influence of the space charge (SC) field of the emitted electrons. Figure 1 shows curves plotted using formulas (1) and (2), which quantitatively and qualitatively characterize the theoretically predicted behavior of the ‘‘current—voltage characteristics’’ (CVC) of the TFE phenomenon depending on T (playing here the role of a parameter) for a W tip-like emitter with work function φ = 4.5 eV at field strengths ranging from 1.5 V/nm to 3 V/nm. Formally, from curves 2—5 in Figure 1 it follows that, if the influence of the SC field were negligible, the CVC of the TFE process should ‘‘pass’’ above the F—N straight line at all values of F. Actually, such a supposition is true only at rather low TFE current densities. In this connection the shape of curves 2—5 in Figure 1, and also their position relative to the F—N straight line, agrees well with experiment only at rather low fields (F < 4.5 V/nm). As revealed by W. P.
Dyke and his colleagues (Dyke and Dolan, 1956), the CVC of actual vacuum diodes with pointed cathodes-emitters in high electric fields (F > 5 V/nm) deviate from the F—N straight line towards lower current densities (Figure 2). The authors (1956) believed that such

Note that further in the text the abbreviation TFE will denote the phenomenon of electron emission activated by the effect of an electric field of about 1 V/nm and higher on the surface of a condensed substance (metal) having a temperature different from 0 K. In our opinion, physically this is quite correct. Such a generalization allows one to consider the FE phenomenon and also the so-called Schottky mode of electron emission as more specific phenomena produced by TFE.
Figure 1. The current density versus field strength at different emitting W surface temperatures (φ = 4.5 eV): 1. T = 0 K (the ‘‘F—N straight line’’); 2. T = 850 K; 3. T = 1050 K; 4. T = 1350 K; 5. T = 1600 K.
behavior of the CVC in high electric fields could be attributed both to the influence of the SC of the emitted electrons and to self-heating of the emitter tip apex by the high-density current. It is natural that at a specific fixed value of F the process of emitter self-heating should result in growth of the TFE current density, whereas the influence of the SC field, on the contrary, should cause a decrease in the TFE current density. It formally follows that for a correct description of the electron emission process in high electric fields it is necessary to seek the solution of a set of equations consisting of Equation (2), the Poisson equation, and the heat conduction equation. Apparently, owing to the complexity of such a problem statement for the description of the emission process in high electric fields, it has not yet been posed.
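Formulas (1) and (2) can be evaluated numerically to reproduce the qualitative behavior of the curves in Figure 1. The sketch below is illustrative only: the Nordheim functions are replaced by the crude approximations t²(y) ≈ 1.1 and v(y) ≈ 1 − y², the physical constants are rounded, and neither the space-charge field nor emitter self-heating is modeled:

```python
import math

def fn_current_density(F, phi, t_sq=1.1):
    """Cold (T = 0 K) field emission current density, Eq. (1).
    F in V/Angstrom, phi in eV, result in A/cm^2.
    t^2(y) ~ 1.1 and v(y) ~ 1 - y^2 are assumed approximations."""
    y = 3.79 * math.sqrt(F) / phi                  # Nordheim parameter
    v = 1.0 - y * y                                # rough stand-in for v(y)
    return (1.537e10 * F * F / (phi * t_sq)) * math.exp(-0.683 * phi**1.5 * v / F)

def murphy_good_factor(F, phi, T, t_y=1.05):
    """Thermal correction pi*theta/sin(pi*theta) of Eq. (2), with
    theta = 2*sqrt(2*m*phi)*t(y)*k*T/(hbar*e*F); t(y) ~ 1.05 assumed."""
    HBAR, ME, E0, KB = 1.0546e-34, 9.109e-31, 1.602e-19, 1.381e-23
    F_si = F * 1e10                                # V/Angstrom -> V/m
    d = HBAR * E0 * F_si / (2.0 * math.sqrt(2.0 * ME * phi * E0) * t_y)
    theta = KB * T / d
    if not 0.0 < theta < 1.0:
        raise ValueError("outside the validity range of Eq. (2)")
    return math.pi * theta / math.sin(math.pi * theta)

def tfe_current_density(F, phi, T):
    """J(F, T) = J(F) * pi*theta/sin(pi*theta), cf. the curves of Figure 1."""
    return fn_current_density(F, phi) * murphy_good_factor(F, phi, T)
```

With these approximations, raising T at fixed F multiplies the cold-emission density by the Murphy-Good factor, which grows with temperature; this is the ordering of curves 1 through 5 in Figure 1.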
Figure 2. Typical experimental CVC for a vacuum diode with a W cathode-emitter. ACE is the F—N straight line; J_c is the emission current density corresponding to the transition from stationary TFE to a non-stationary process of electron emission; J_m is the maximum, or limiting, value of the current density in the non-stationary pulsed emission mode for the given initial experimental conditions (r, φ, τ, and so on). (At J = J_m, the explosive breakdown process is initiated.)
B. Inadequacy of the Concepts Regarding the Thermal Field Emission Mechanism for Intense Electric Fields Inducing High-Density Emission Currents (J > 10 A/cm²)

The theoretical conceptions of the physical mechanism of TFE from a metal surface briefly outlined in the preceding section agree quite satisfactorily, both quantitatively and qualitatively, with the results of numerous experimental studies of the field emission process at field strengths up to
F ≈ 5 V/nm. These results are well known to specialists in field emission phenomena. They are comprehensively covered in a number of excellent review papers, monographs, and books (Modinos, 1990; Elinson, 1974; Dyke and Dolan, 1956) and, therefore, will not be discussed here in detail. However, the satisfactory agreement between the TFE theory and experimental data holds only for the field range 1 V/nm < F < 5 V/nm and at relatively low temperatures of the metal emitter (T < (1000—1500) K). In high electric fields (F > 5 V/nm) the canonical conceptions of the TFE phenomenon do not allow adequate interpretation of certain experimental data. The question of the validity of using canonical conceptions of the TFE phenomenon to interpret thermal field processes occurring under the above conditions originally arose from the pioneering studies by W. P. Dyke and his colleagues (1956). In the course of those studies they revealed certain deviations from the normal TFE behavior. In particular, in high electric fields the curves of J = J(F) or I = I(V) (here I is the total emission current from the field emitter and V is the extraction voltage, or the potential difference between the emitter and the anode of the vacuum diode) plotted in the coordinates ln(J/F²), 1/F (or in the coordinates log I, 1/V) deviate from the F—N straight line (Figure 2). Note that in the deviation region of the curves (Figure 2) the electron emission process is not stationary, and therefore the experiments were carried out in the pulsed emission mode. Dyke's and his colleagues' experiments have shown that during the square voltage pulse the emission current increases with time. If the voltage pulse height is then gradually increased slightly (by about 1%), the emission current grows increasingly fast at a fixed voltage pulse width τ. Such non-stationary behavior of the emission process is usually called the effect of spontaneous current growth.
Simultaneously, one can observe a bright emission ring ("ring" effect) around the emission image of the W field emitter surface on the fluorescent screen of the vacuum diode (electron microscope–Müller projector) (Figure 3). The spontaneous-growth rate of the emission current (dI/dt) increases slightly with voltage pulse height upon repeated exposure of the emitter surface to field pulses and, finally, at a certain arbitrary moment t (0 < t < τ) the next voltage pulse in the sequence causes an abrupt increase in the total emission current (Figure 4). Over the period of about 50 ns during which this jump (or burst) of the total emission current takes place, the current may grow by a factor of (10–100), depending on τ. This causes irreversible destruction of the emitter tip. Electron microscope observation of the destroyed and melted emitter tip shows that the linear dimensions of the melted region reach a few micrometers (up to (10–30) μm). This phenomenon of abrupt emission current growth, resulting in irreversible destruction and melting of the emitter tip, was called
NON-STATIONARY THERMAL FIELD EMISSION
Figure 3. Emission image of the W MC surface in the non-stationary electron emission mode. (The ring effect can be observed at an emission current density satisfying the inequality J₁ < J < J_cr.)
by the authors who first discovered it the phenomenon of explosive breakdown (Dyke and Dolan, 1956). The explosive breakdown studies have shown the following (Dyke and Dolan, 1956). 1. Explosive breakdown development is not an absolutely random, uncontrollable process, since it is preceded by typical and reproducible effects such as spontaneous current growth and the "ring" effect. It was shown that by varying the square voltage pulse height one can either (by increasing the pulse height) strengthen the ring effect (i.e., the ring brightness) and raise the spontaneous current growth rate or, inversely, (by slightly decreasing the pulse height) gradually reduce the ring brightness and the spontaneous current growth rate down to complete fading of these effects. If, after the effects vanish, the voltage pulse height is increased again, the pre-explosion effects are well reproduced.
Figure 4. Typical oscillograms of current in the non-stationary emission mode. (Oscillograms 1–4 were obtained at successively increasing (by 1%) rectangular voltage pulse heights of fixed duration; the marked portion of oscillogram 4 corresponds to the current "break" stage due to explosive breakdown.)
2. Activation and development of explosive breakdown is not connected with bombardment of the emitter by ions formed on the anode surface as a result of electron-stimulated desorption with a rather low but non-zero probability. This was proved by numerous experiments showing that the development of explosive breakdown depends only on the emission current density and not on the ion energy, which is evidently defined by the extraction voltage. Moreover, the independence of the breakdown activation process from bombardment by ions desorbed from the vacuum diode's anode was confirmed by direct experiments: in those experiments the explosive breakdown phenomenon was observed at pulsed field exposure times below the ions' time of flight from anode to emitter.
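The time-of-flight argument is easy to check with a back-of-envelope estimate. In the sketch below the gap length, extraction voltage, and ion species are assumed illustrative values (they are not quoted in the experiments cited); for a singly charged ion starting at rest in a roughly uniform field the transit time is about 2d/v_final.

```python
import math

# Rough anode-to-cathode flight time for a desorbed ion accelerated
# through the full gap.  d, V, and the ion mass are assumed,
# illustrative values, not parameters from the cited experiments.

e = 1.602e-19          # elementary charge, C
u = 1.661e-27          # atomic mass unit, kg

def ion_flight_time(d_m, V, mass_amu):
    """Flight time, s, for a singly charged ion starting at rest
    in a (roughly) uniform field across a gap d_m at voltage V."""
    v_final = math.sqrt(2 * e * V / (mass_amu * u))
    return 2 * d_m / v_final

t_flight = ion_flight_time(d_m=0.01, V=1.0e4, mass_amu=184)  # W+ ion, 1 cm gap, 10 kV
print(f"flight time ~ {t_flight * 1e9:.0f} ns")
```

The result is on the order of a few hundred nanoseconds, so field pulses much shorter than this indeed end before any anode-desorbed ion can reach the tip — consistent with the direct experiments described above.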
Figure 5. Half-width at half-height of the emitted-electron energy distribution (ΔE) versus field strength at different MC surface temperatures. (a) MC of W(100): (1) T = 1685 K; (2) T = 84 K. (b) MC of ZrO/W(100): (1) T = 1455 K; (2) T = 84 K. (Curves obtained from the experimental data (Bell et al., 1979) are shown by solid lines; curves obtained from Young's formula (Equation 4) are shown by dashed lines.)
3. Breakdown initiation is not related to bombardment of the emitter surface by residual gas ions. To prove this statement, a spherical Müller projector was made (Dyke and Dolan, 1956) with two emitters whose ion bombardment probabilities were approximately the same, whereas their emission current densities differed by about a factor of two at a given extraction voltage. The explosive breakdown (with the preceding effects of spontaneous current growth and the ring) was initiated at the emitter with the higher current density, and the
explosive breakdown initiation at that emitter did not affect the emission current image of the other one. The experimental data given above demonstrate that explosive breakdown is initiated by thermal field processes in the emitter itself, and the governing factor here is the value of the current density and/or the field strength F. The results listed above were obtained in experiments on emitters made of W. Later these results were confirmed in the works of other authors (Elinson, 1974), which involved emitters both of W and of other refractory transition metals such as Ta, Mo, Re, and Nb, as well as of a number of refractory metal-like compounds. Another specific feature of the electron-emission process in high electric fields is the substantial broadening of the electron-emission total energy spectra (Figure 5) experimentally revealed in Bell and Swanson (1979). The experiments in Bell and Swanson (1979) were carried out on emitters with essentially different work functions. Both emitters were made of W with crystallographic orientation (100), and the work function of one of them was reduced by selective adsorption of Zr atoms. The curves in Figure 5 show that the significant departure of the experimental data from the TFE theory takes place only in the high electric field region. So the set of unambiguously established and quite reliable experimental data discussed here allows us to assert that in high electric fields an abnormal behavior and certain peculiarities of the emission process are observed, which go beyond the canonical conceptions of the physical mechanism of TFE. To conclude this section we list the anomalies found in high electric fields. 1. Deviation of the current–voltage characteristic (CVC) of the process from the F–N straight line. 2. Broadening of the emission-electron total energy spectra. 3. The emission process is non-stationary, accompanied by the ring effect, and ends in explosive breakdown. For a long time these anomalies did not receive unambiguous interpretation.
As for the question of the CVC departure from the F–N straight line, Dyke and Trolan (1953), in particular, explain it by the influence of the space-charge (SC) field of the emission electrons. A different point of view on this subject was offered in Lewis (1956). In that work, the departure from linearity is related to a possible deviation of the true potential barrier shape at the metal–vacuum boundary from the model one, which is known to be defined by the combined effect of the electric image forces and the electrostatic field on emitted electrons. It turns out that in high
electric fields the typical barrier widths become comparable or close to the interatomic distance (~0.3 nm) and the conceptions of classical electrodynamics are no longer valid. The SC field effect on the CVC of the vacuum diode with a pointed cathode was studied in Aizenberg (1954, 1964), Kompaneets (1959), and Barbour et al. (1953). Those studies have shown that the CVC departure from the F–N straight line at field strengths F ≥ (7–8) V/nm, with the current density of emission from the W emitter being, accordingly, as high as J ≈ (1–5)×10⁸ A/cm², may indeed be caused by the emitted-electron SC field. This result was in satisfactory agreement with the experimental data obtained by Dyke and his colleagues and did not raise serious objections for a long time. However, further investigations (Ptitsin et al., 1985, 1986, 1996, 1998) have cast doubt on the validity of those conceptions, since for the emitters made of W, Mo, and Nb used in these works the departure of the CVC from the F–N straight line was observed in much weaker fields and, hence, at lower emission current densities (F ≤ 5 V/nm, J ≈ (1–2)×10⁷ A/cm²). Besides, it has been found that in the CVC departure region, over a time τ equal to a single field (voltage) pulse length, the emitting surface microstructure undergoes a substantial change (Krotevich et al., 1985). Moreover, it was shown (Krotevich et al., 1986) that, when going from the linear portion of the CVC to the portion where the CVC deviates from the F–N straight line, one could observe an increase in the resolution (δ) of the pulsed electron microscope–Müller projector (up to (0.3–0.6) nm). Note that in the linear portion of the CVC, where the emission process is stationary, the electron microscope resolution retains its typical range of values, (2.5–5.0) nm (Modinos, 1990). The experimental data presented above cannot be adequately explained based on the idea of a possible influence of the emitted-electron SC field on the emission process.
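The claim that the barrier width shrinks toward the interatomic distance can be illustrated with the standard image-rounded barrier. With x in nm, F in V/nm, and the work function φ in eV, the barrier is U(x) = φ − F·x − 0.36/x (0.36 eV·nm being e²/16πε₀ in these units), and its width at the Fermi level is the distance between the two roots of U(x) = 0. The tungsten-like value φ = 4.5 eV below is an assumed illustration, not a figure from the text.

```python
import math

# Width of the image-rounded surface barrier at the Fermi level:
#   U(x) = phi - F*x - 0.36/x   (x in nm, F in V/nm, phi in eV)
# The barrier edges are the roots of U(x) = 0, i.e. of
#   F*x**2 - phi*x + 0.36 = 0.

def barrier_width(phi, F):
    """Distance between the roots of phi - F*x - 0.36/x = 0, in nm."""
    disc = phi ** 2 - 4 * F * 0.36
    if disc <= 0:
        return 0.0          # barrier top pulled below the Fermi level
    return math.sqrt(disc) / F

print(barrier_width(4.5, 2.0))    # moderate field: roughly 2 nm
print(barrier_width(4.5, 10.0))   # high field: ~0.24 nm, near-interatomic
```

At F around 10 V/nm the computed width is indeed of the order of 0.3 nm, which is where the applicability of classical electrodynamics to the barrier problem becomes questionable.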
Because of the inconsistency indicated above between the experimental data and the theoretical conceptions, we performed calculations to obtain quantitative estimates of the emitted-electron SC field contribution to the total field strength at the emitter tip surface (Ptitsin et al., 1996, 1998). Note again that a similar problem was solved earlier by other authors (Aizenberg, 1954, 1964; Kompaneets, 1959; Barbour et al., 1953). However, the correctness of their results raises some doubt. The doubt stems from the fact that, to find a solution to the self-consistent problem for the Poisson equation, the authors mentioned above used essential simplifications that cannot always be considered physically justified. In particular, the emitted-electron SC distribution (ρ(x, y, z)) was described by a spherically symmetric function, which is a substantial oversimplification for the vacuum diode with a thermal field pointed cathode. Our
calculations (Ptitsin et al., 1996, 1998) have shown that such a simplification yields values of the SC contribution to the field magnitude at the emitter tip surface (F*) overestimated by a factor of 2–3. Besides, the initial velocity of the field emission electrons (v₀) in Aizenberg (1954, 1964) and Kompaneets (1959) was taken to be zero and, hence, ρ → ∞ at the emitter surface. This last assumption was not justified by the authors of those works. In contrast to Aizenberg (1954, 1964) and Kompaneets (1959), the authors of Ptitsin et al. (1996, 1998) used a different approach to account for the SC field effect on the process of electron emission. It is based on the possibility of using the image method to calculate the field magnitudes created by the SC of emitted electrons at the emitter surface during intense emission. As opposed to the earlier works, in Ptitsin (1996) and Ptitsin et al. (1996, 1998) the initial electron velocity was assumed to be different from zero and, besides, the distribution ρ(x, y, z) was approximated by an axially symmetric function corresponding to the real conditions. It is shown that the suggested approach yields physically correct estimates of the magnitude sought. The potential distribution function φ₀ was found by the integral equation method (Molokovsky et al., 1991). As a model of the vacuum diode there was considered a spherical condenser in which the SC of the emitted electrons is distributed within a solid angle of 2π (Figure 6). The SC field strength at the thermal field emitter tip apex was calculated by the image technique (Landau and Lifshits, 1982). The field potential at an arbitrary point N on the z-axis of the diode was sought as a superposition of three functions:

φ_Σ = φ₀ + φ_q + φ*    (5)
where φ₀ is the solution of the Laplace equation meeting the boundary conditions φ₀(r_c) = 0, φ₀(r_a) = V; φ_q is the potential of the field created by the emitted electrons; φ* is the potential of the field created by the virtual "image" charges; r_c is the inner sphere (cathode) radius; and r_a is the outer sphere (anode) radius. The sum φ_q + φ* should satisfy the boundary condition (φ_q + φ*)(r_c) = 0. According to the image technique, the value of the function φ_q + φ* at the point N is defined by

φ_q + φ* = (1/4πε₀) ∫ (1/R − r_c/(r′·R*)) dq    (6)

where r′ is the radial coordinate of the point charge dq, and R, R* are the distances from the observation point to dq and to its image dq*, respectively (Figure 6). Bearing in
Figure 6. Vacuum diode model for calculation of the emitted-electron SC field strength at the thermal field emitter apex. 𝐫′ and 𝐫*′ are the radius vectors of the point charge dq and its image dq*, respectively; r_a/r_c ≫ 1. (The anode surface is shown only partly; geometrical scale proportions are not observed.)
mind that in the spherical coordinate system

dq = ρ dΩ = −(J₀ r_c² sin θ / v(r′)) dθ dφ dr′

where J₀ is the TFE current density at the emitter surface and v(r′) is the electron velocity,

v(r′) = v₀ (1 + 2eφ₀(r′)/(m v₀²))^(1/2)

R = (r′² + r² − 2 r′ r cos θ)^(1/2);   R* = (r² + r_c⁴/r′² − 2 r r_c² cos θ/r′)^(1/2)
Equation (6) can be rewritten as

φ_q+*(r) = −(J₀ r_c²/4πε₀) ∫∫∫ [sin θ dθ dφ dr′ / (v₀ (1 + 2eφ₀(r′)/(m v₀²))^(1/2))] × [1/(r′² + r² − 2 r r′ cos θ)^(1/2) − r_c/(r′ (r² + r_c⁴/r′² − 2 r r_c² cos θ/r′)^(1/2))]    (7)

This integral equation is formally the solution of the self-consistent Dirichlet problem for the Poisson equation. In view of the fact that in high electric fields the potential distribution φ(r) differs from φ₀(r) only slightly (based on preliminary estimates, by two or three percent at most), to a first approximation the function φ(r) under the integral in Equation (7) may be replaced by the known function φ₀(r):

φ₀(r) = V (1 − r_c/r)/(1 − r_c/r_a) ≈ V (1 − r_c/r)

Integrating (7) over φ and θ yields
φ_q+*(r) = −Λ ∫ from r_c to r_a [ (1 + r′²/r² − 2(r′/r) cos θ_s)^(1/2) − (r′/r − 1) − (r_c/r′)·((1 + r_c⁴/(r′r)² − 2(r_c²/(r′r)) cos θ_s)^(1/2) − (1 − r_c²/(r′r))) ] (1 + η(1 − r_c/r′))^(−1/2) dr′    (8)

here Λ = J₀ r_c²/(2ε₀ v₀ r) and η = 2eV/(m v₀²). The field strength at the emitter apex can be found from (8):

F_Σ = −dφ_Σ/dr |_(r = r_c)

After differentiation and substitution of the dimensionless variable t = r′/r_c, we obtain

F_Σ = −V/r_c − (J₀ r_c/(2ε₀ v₀))·[3/2 + ln(r_a/r_c) − I]    (9)

where I denotes the remaining integral over t, whose integrand contains the factor (1 + t² − 2t cos θ_s)^(−1/2).
The integral I in the latter relationship is elliptic. It may be expressed as a sum of simpler elliptic integrals by using a known transformation technique for such integrals (Smirnov, 1969):

I = (b² − 1)·I₋₁ − (2b + c₁/2)·I₀ + (a₁/2)·I₁ − C    (10)

where

I_k = ∫ z^k dz/(a₁z³ + b₁z² + c₁z + d₁)^(1/2),   (k = −1, 0, 1)

with a₁ = ab − b² − b; b₁ = 3b² − 2ab + 1; c₁ = a − 3b; d₁ = 1; the integration limits are (b + r_c/r_a)⁻¹ and (1 + b)⁻¹; a = −2 cos θ_s; b = −r_c/r_a; and

C = (P(ζ₂))^(1/2) − (P(ζ₁))^(1/2)

where (P(ζ₁))^(1/2), (P(ζ₂))^(1/2) are the values of the denominator of the integrand of Equation (10) at the integration limits. So Equation (9) can be rewritten in the form

F_Σ = −V/r_c − (J₀ r_c/(2ε₀ v₀))·[3/2 + ln(r_a/r_c) − (b² − 1)·I₋₁ + (2b + c₁/2)·I₀ − (a₁/2)·I₁ + C]    (11)

where I₋₁, I₀, and I₁ are elliptic integrals of the first, second, and third kind, respectively, with known integration limits and coefficients of the polynomial P(z). Equation (11) is easily evaluated by numerical techniques at any given values of the coefficients. Equation (11) can also be reduced to a simple analytical expression. In particular, such an expression may be obtained for the conditions of thermal field emitter functioning in the high emission current density mode. Under these conditions θ_s equals π/3 (Dyke and Dolan, 1956). After the transformation of the integrals in Equation (11) to the Legendre form and integration we have

F* ≈ (J₀ r_c/(2ε₀ v₀))·[1.75 + ln(r_a/r_c) + (√3/2)·E − (3/2)·K]    (12)

where F* is the magnitude of the SC field vector, and K and E are complete elliptic integrals. Numerical calculations by Equation (12) show that for typical values of the emitter tip radius and interelectrode space the field F*
does not exceed two percent of the Laplace field up to J ≈ 10⁸ A/cm². In other words, the result obtained implies that, in agreement with experiment, for the indicated values of J the SC field should not noticeably affect the shape of the CVC (Ptitsin, 1996). The agreement between the calculations performed and the experiment proves that the present approach allows adequate evaluation of the SC field effect to an accuracy sufficient for unequivocal interpretation of the experimental results. If, however, one sets θ_s = π in Equation (11), in accordance with the earlier adopted models of the vacuum diode with a spherically symmetric SC distribution (Aizenberg, 1954, 1964; Kompaneets, 1959), then the magnitude of F*, all other things being equal, will be 2.5–3.2 times higher (as compared with the realistic model with θ_s = π/3). For a spherically symmetric SC distribution, the SC field must affect the CVC shape already at J of about 5×10⁷ A/cm², which is inconsistent with the experimental data (Ptitsin, 1996). As for the possible effect of the initial velocity v₀ of the TFE electrons on the F* magnitude, it follows from Equation (12) that, to ignore the initial velocity by setting it equal to zero, the following relationship should hold
v₀ ≪ (2eV/m)^(1/2)
This is always the case with TFE, and in this respect the earlier adopted model simplification v₀ = 0 is quite admissible. Thus the suggested approach yields physically correct, experimentally confirmed estimates of the SC field magnitudes created by emitted electrons at intense TFE. It is worth noting, to conclude, that the approach developed for quantitative evaluation of the SC field effect can also be used to calculate the SC field created by emitted ions during the operation of liquid metal ion sources. The suggested method for SC field calculations permits one to build the CVC of the TFE process for the vacuum diode model considered. The theoretical dependence J(F, T) was defined by Equation (2). The emitter tip surface temperature (T_s) was calculated by the formula obtained in Ptitsin (1996):
T_s ≈ (T₀ + J ε_N r/(e λ tan α))·(1 − J² r² L/(2 λ² tan²α))⁻¹

where T₀ is the initial temperature of the emitting surface at the moment preceding the process of high-density current emission that causes heating of the emitter tip up to the temperature T_s (T_s > T₀, T₀ ≈ 300 K); ε_N is the mean energy transferred from the electronic to the phonon subsystem of the metal via a single electron's tunneling to vacuum (Ptitsin, 1996); λ is the heat conduction coefficient at high metal temperatures; L is the Lorentz number; and α is the emitter apex cone opening half-angle.
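Coupling J(F, T) with such a tip-temperature formula requires a self-consistent solution. The following is a minimal sketch of one way to iterate the two relations to self-consistency; the functional forms and all coefficients are invented stand-ins for illustration, not Equation (2) or the exact heating formula above.

```python
import math

# Toy self-consistent coupling of a current-density law J(T, F) and a
# tip-heating relation T(J).  Both functions below are simplified,
# made-up stand-ins; only the iteration scheme itself is the point.

def J_of_T(T, F):
    """Illustrative thermally enhanced current density, A/cm^2."""
    return 1.0e6 * (F / 5.0) ** 2 * math.exp(T / 4000.0)

def T_of_J(J, T0=300.0):
    """Illustrative tip temperature: base + linear (Nottingham-like)
    + quadratic (Joule-like) terms, with invented coefficients."""
    return T0 + 2.0e-4 * J + 1.0e-12 * J ** 2

def self_consistent_J(F, T0=300.0, rel_tol=0.01, max_iter=50):
    """Iterate J -> T -> J until successive J values agree to rel_tol."""
    T = T0
    J_prev = J_of_T(T, F)
    for n in range(1, max_iter + 1):
        T = T_of_J(J_prev, T0)
        J_next = J_of_T(T, F)
        if abs(J_next - J_prev) <= rel_tol * J_prev:
            return J_next, T, n
        J_prev = J_next
    raise RuntimeError("iteration did not converge")

J, T, n_iter = self_consistent_J(F=5.0)
print(f"J = {J:.3e} A/cm^2 (toy units), T = {T:.0f} K, iterations = {n_iter}")
```

With this mild coupling the fixed point is reached in a few iterations, mirroring the behavior reported for the actual calculation (convergence to 1% within about four iterations).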
Figure 7. Predicted CVCs of TFE for a spherical vacuum diode with an axially symmetric SC distribution of the emitted electrons. 1: T = T₀, the SC effect on the CVC shape ignored; 2–4: CVCs of TFE with the SC field effect and the self-heating of the emitter tip by the emission current taken into account (2: α = 1°; 3: α = 2°; 4: α = 3°).
The current density values for specified values of r, α, and ε_N were numerically evaluated by successive iterations in the following manner: J₁(T₀, F) → T₁(J₁, F) → J₂(T₁, F) → T₂(J₂, F) → J₃(T₂, F), and so on, until J_(n+1) ≈ 0.99 J_n. Note that to reach such accuracy four iterations were enough. The theoretical CVC curves thus obtained are presented in Figure 7. They show that the thermal factor, that is, heating of the emitter tip by the high-density emission current, has a greater effect than the SC field of the emitted
electrons has. Besides, comparison of the theoretical CVCs with the experimental ones reveals that in high electric fields the physical mechanism of electron emission may change substantially. As for the existing views on the broadening of the emission-electron energy spectra in high electric fields, at present there is only one experimental work devoted to the study of the emission-electron energy distribution spectra in strong electric fields (Bell and Swanson, 1979). The generalized results of this study are presented in Figure 5. The graphs given in this figure show a significant energy spectrum broadening (at half-height) in high electric fields. The unambiguously established (Bell and Swanson, 1979) energy-spectrum broadening that occurs at high-density emission currents cannot be adequately explained by the theory of thermal field electron emission (Murphy and Good, 1956). The authors of Bell and Swanson (1979) suggested that the spectrum broadening might be caused by the Coulomb interaction between emitted electrons (Boersch, 1954). However, at that time the authors did not make any quantitative estimates of the possible effect of such interaction on the full width of the energy distribution (ΔE). To obtain such estimates we will use the results of Knauer (1981), where an analytical expression for ΔE was derived:
ΔE = 1.45·(1/4πε₀)·(m/(2eV))^(1/2)·(1/R)·(dI/dΩ)

where V is the voltage applied between the field emitter and the anode of the vacuum diode; R is the distance between the emitter apex and the intersection point of the emitter axis with the tangent to the boundary path of the emission electrons; dI/dΩ is the angular emission intensity; and dΩ is the element of solid angle (Ω) incorporating the emission current dI. To estimate ΔE we can use the known empirical relationship F = βV ≈ V/(5r), where β is the so-called field factor, or β factor, and r is the emitter tip radius, and also the relationship (Dyke and Dolan, 1956)

dI/dΩ = I/(2π(1 − cos θ₀))
where I is the total emission current and 2θ₀ is the angular opening of the cone encompassing the electron emission. Substitution of the typical (for the pre-explosion phase of non-stationary electron emission from the W emitter surface) parameter values I ≈ 30 mA, θ₀ = π/3, r ≈ 300 nm into the above expression yields the following upper numerical estimate: ΔE ≲ 3.0 eV. Comparison of this estimate with the experimental data (Figure 5) shows that the full width of the experimental energy distribution spectra for electrons emitted in high electric fields substantially exceeds the above upper numerical estimate. The difference between the theoretical estimates and the experimental data becomes even greater if the high- and low-energy tails of the distribution are taken into account. Thus, based on the estimates made, one may conclude that in strong electric fields the Coulomb interaction is only one of the possible factors contributing to the emitted-electron total energy spectrum broadening. As for the mechanism of the instability of electron emission from pointed metal microcrystals (with tip radii r ≤ 0.1 μm), according to Dyke and his colleagues (1956), the electron-emission process may remain stationary (up to J ≈ 10⁸ A/cm²) only if special techniques are used, precluding the development of emission instability. If no special techniques are applied, the emission instability appears at much lower emission current densities (about 5×10⁶ A/cm²). This was established by Dyke in the course of elegant experiments carried out specially to minimize bombardment of the emitter tip by residual gas ions. They were performed in ultrahigh vacuum. Ion bombardment of the emitting tip surface by the ions formed as a result of impact ionization of residual gas atoms in the bulk of the vacuum diode was eliminated by a magnetic field whose induction vector was normal to the longitudinal axis of the field emitter. Nevertheless, even under such "refined" conditions the emission process could remain stationary only at J ≤ 10⁷ A/cm². Note here that the experimentally defined stationary current density limit J_st, which is 10⁷ A/cm² for the W emitter, represents only some mean value obtained after analysis and processing of the data of numerous experiments carried out under the same conditions.
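The upper estimate quoted above (ΔE of about 3.0 eV for I ≈ 30 mA, θ₀ = π/3) can be checked numerically. In the sketch below only the angular intensity dI/dΩ = I/(2π(1 − cos θ₀)) is taken directly from the text; the Knauer-type conversion to an energy spread and the values of the extraction voltage V and the distance R are assumptions made for the sake of an order-of-magnitude illustration.

```python
import math

# Angular emission intensity from the relation quoted in the text,
# then a hedged, order-of-magnitude Knauer-type energy-spread estimate
#   dE ~ 1.45 * (1/4*pi*eps0) * sqrt(m/2eV) * (dI/dOmega) / R
# The voltage V and distance R are assumed values, not from the text.

I_tot = 30e-3            # total emission current, A
theta0 = math.pi / 3     # half-angle of the emission cone, rad

dI_dOmega = I_tot / (2 * math.pi * (1 - math.cos(theta0)))   # A/sr

e = 1.602e-19; m = 9.109e-31; eps0 = 8.854e-12
V = 5.0e3                # assumed extraction voltage, V
R = 1.0                  # assumed axial distance parameter, m
v = math.sqrt(2 * e * V / m)                      # exit electron speed, m/s
dE = 1.45 * dI_dOmega / (4 * math.pi * eps0 * v * R)   # numerically in eV

print(f"dI/dOmega = {dI_dOmega:.2e} A/sr, dE ~ {dE:.1f} eV")
```

With these assumed values the estimate lands at the electron-volt scale, consistent with the ~3 eV upper bound discussed above, while the measured spectra of Figure 5 are substantially wider.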
Thus, it was in Dyke's works that it was first established that the transition from the stationary mode of field electron emission to a non-stationary emission process is activated not by secondary processes in the vacuum diode, such as interaction of the electron flux with the anode surface or with the residual gas atoms, but by the initiation of new subprocesses developing as a result of the effect of strong electric fields on the condensed emitter matter in the high emission current density mode. This non-stationary process, which ends in the development of instability of the total emission current and destruction of the emitter tip, was called by Dyke the explosive breakdown, as mentioned earlier. Further investigations (Elinson, 1974; Ptitsin, 1996) of the phenomenon of explosive breakdown have shown, in particular, that if there is no magnetic field normal to the longitudinal axis of the emitter, then the numerical value of J_st does not usually exceed 10⁶ A/cm². It has also been shown that in the pulsed electron-emission mode the current density limit depends on the voltage pulse width τ. To distinguish the current density limit in the pulsed emission mode from that in the stationary mode, we introduce the notation J_τ. Investigations of non-stationary electron emission have shown that J_τ substantially depends on τ, and decreasing τ may change it from 10⁷ A/cm² at millisecond pulse widths up to 10⁸ A/cm² in the nanosecond pulse width range. Accordingly, the total emission current from a single emitter may vary with τ from the milliampere range up to about 1 A. It is also worth noting that in Ptitsin (1996) it was unambiguously shown that the portion of the vacuum diode CVC corresponding to the non-stationary mode of electron emission does not coincide with the F–N straight line once the emission process is no longer stationary. Dyke et al. (1953, 1956) related the non-stationary behavior of the electron-emission process to heating of the field emitter tip by the high-density emission current. They believed that the unsteadiness of emission might be caused by the development of thermal instability. The latter is the result of emitter self-heating by the flowing current, which leads to a rise of temperature and, hence, of emission current density with time. As was suggested in Dolan et al. (1953), the interrelated processes of heating and the respective emission current density growth, developing in time, may lead to an avalanche growth of the emission current and, consequently, to emitter tip melting. The temperature regime of emitter operation in the intense electron-emission mode was calculated in Dolan et al. (1953) by solving the non-stationary heat conduction equation. It was assumed that the emitter heating was due only to the Joule heat. The emitter tip geometry was approximated by a figure of revolution whose section is shaped like a truncated cone bounded by a portion of a circle. An approximate solution of the equation was obtained with the following simplifying assumptions: 1. the physical constants of the emitter substance were assumed to be independent of temperature and equal to some mean values; 2. energy losses by heat radiation were ignored; 3. the
temperature at the truncated cone base was assumed to be equal to an initial pre-set value; and 4. the conical part of the emitter was assumed to be much longer than the emitter tip radius, which is true only for small values of the emitter cone apex angular opening (α). The results of the calculations were used to obtain the curves of temperature distribution along the emitter axis as well as the kinetic curves T_a = T_a(t), where T_a is the temperature at the emitter apex and t is time. The calculated dependencies T_a = T_a(t) showed saturation. The calculated value of the current density corresponding to a temperature close to the melting point (T_m) of the W emitter is 10⁸ A/cm². So the calculations in Dolan et al. (1953) did not allow a direct explanation of the experimentally observed emission current growth with time based on the original qualitative assumptions of a possible transition of the emitter condensed matter from the stationary thermal mode to a non-stationary one at high emission current densities. Note that Dyke also suggested (1953, 1956) that the non-stationary character of the emission process might additionally be caused by evaporation and impact ionization of native atoms of the emitter substance. It was supposed (1953, 1956) that the impact ionization could result in partial cancellation of the SC field, strengthening of the electron extraction field, emission current growth, and so on. However, no quantitative calculations or estimates justifying the above model for the mechanism of emission unsteadiness in strong electric fields were made. An attempt at more correct calculations of emitter heating by the high-density emission current was made later in Gor'kov et al. (1962). Being essentially the same as in Dolan et al. (1953), the approach adopted in Gor'kov et al. (1962) additionally took into account the dependence of the emitter substance constants on temperature. Besides, the calculation results of that work are applicable to emitters with high α values (up to π/2). The main results of Gor'kov et al. (1962) are as follows. 1. The function T_a = T_a(t) is represented by a curve with saturation if J < J_cr, where J_cr is some critical value of the emission current density.
2. At J ≥ J_cr the solution of the heat conduction equation is non-stationary and, accordingly, the T_a = T_a(t) curve rises abruptly with time. Furthermore, the calculations of Gor'kov et al. (1962) qualitatively correctly represent such experimentally defined relationships as the strong dependence of the maximum stationary current density on the angle α, as well as the sharp transition from the stationary emission mode to a non-stationary one upon a relatively small increase of the voltage between the anode and cathode of the vacuum diode. However, the calculation results of Gor'kov et al. (1962) agree rather poorly with the experiments of Dyke et al. (1956) quantitatively since, according to the estimates made, J_cr ≈ 10⁸ A/cm², which is an order of magnitude higher than the maximum experimental value of the stationary current density for W emitters. In later studies of the problem of non-stationary electron emission in high electric fields (Martin, 1960; Mitterauer et al., 1975; Glazanov et al., 1989; Ptitsin et al., 1992, 1993) the Nottingham (1941) effect was taken into account in the heat calculations. As is known, this effect consists in the following: at rather low temperatures field emission proceeds from the energy region below the Fermi level, and the electrons escaping the metal are replaced by electrons injected into the metal from an external circuit with an energy approximately equal to the Fermi
energy. This dynamic process leads to metal lattice heating due to thermalization of the injected electrons. Upon emitter heating the maximum of the electron energy distribution may rise above the Fermi level and, as a consequence, the emitter tip is cooled. The transition from heating to cooling is called the inversion of the Nottingham effect, and the respective lattice temperature at which balance is reached is called the inversion temperature (T*). The influence of the Nottingham effect on the process of TFE was first discussed in Swanson et al. (1966). It was shown there that the thermal processes of lattice heating and cooling caused by this effect are confined to the emitter subsurface layer, since the electron free path length before an electron–phonon relaxation event is very small, of the order of (1–10) nm. Besides, Swanson et al. (1966) gives an expression for the inversion temperature T*. Let us consider the inferences of Swanson et al. (1966) in more detail. The average energy released in the microcrystal-emitter lattice as a result of emission of a single electron (ε_N) equals the difference between the mean energies of the emission (⟨ε⟩) and conduction (⟨ε_c⟩) electrons:

ε_N = ⟨ε⟩ − ⟨ε_c⟩

In Nottingham (1941) it is supposed that ⟨ε_c⟩ = E_F, where E_F is the Fermi energy at T = 0 K. Then the average energy of exchange ε_N is given by Levine (1962):

ε_N = πkT·cot(πkT/d),   where   d ≈ 9.76×10⁻⁹·F/(φ^(1/2)·t(y))

Obviously, if kT/d = 0.5, no energy exchange occurs and, hence, the inversion temperature is

T*(K) = 5.67×10⁻⁵·F/(φ^(1/2)·t(y))

where F comes in V/cm and φ in eV. If the temperatures of the electron (T_e) and phonon (T_ph) subsystems of the metal can be assumed to be the same (Kaganov et al., 1956), that is, T_e = T_ph = T, and the Nottingham effect is considered a purely surface one, the non-stationary thermal problem concerning field-emitter heating by the flowing TFE current is formulated as follows:

ρ(T)·c(T)·∂T/∂t = ∇·(λ(T)·∇T) + J²/σ(T) − μ(T)·(J⃗·∇T)

∇·(σ(T)·∇φ) = 0
The boundary and initial conditions for these equations are:
T J(F , T ) % (F , T ) 9 (T ) S T : $ n e 3 (T ) : J(F , T ) T : T ; T (t : 0) T n % Here 1(T ) stands for the emitter substance density; c(T ) for specific heat, 7(T ) for the thermal conduction coefficient; s(T ) for the electrical conductance coefficient; *(T ) for the Thompson coefficient; F and T for the field strength and temperature, respectively, at the emitting surface; (T ) for the surface blackness degree; S for the Stefan-Boltzmann constant; n# for the $ vector of the normal to the emitting surface; J(F , T ) for the current density at the emission boundary; T for the temperature at the emitter base; and 3 for the potential. The solution to the thermal problem thus formulated is very difficult to obtain, even numerically. Therefore, various simplifications are applied. A solution of the one-dimensional stationary thermal problem for the conical model of the emitter tip bounded by a spherical surface portion of radius r was first found in Martin, (1960); Swanson et al., (1966). The contribution of the Thompson effect, energy losses by thermal radiation, as well as temperature dependence of the coefficients 7(T ) and (T ), were ignored and the distribution of the T (6) along the emitter axis was derived in the form of 7(T )
T(x) = T₀ + Iε/(πeκ·tan²γ)·(1/x) + I²/(π²σκ·tan⁴γ)·[1/(x₀x) − 1/(2x²)]   (13)

here x is the spherical coordinate along the emitter axis, γ is the emitter cone apex opening half-angle, I is the emission current, and x₀ = r₀/tanγ. Taking x = x₀ and J = I/(πx₀²·tan²γ), we have

T_s = T₀ + Iε/(πeκ·r₀·tanγ) + I²/(2π²σκ·r₀²·tan²γ)   (14)

or

T_s = T₀ + Jεr₀/(eκ·tanγ) + J²r₀²/(2σκ·tan²γ)   (15)

After differentiating Equation (13), we obtain

∂T/∂x = −Iε/(πeκ·x²·tan²γ) − I²/(π²σκ·tan⁴γ)·[1/(x₀x²) − 1/x³]

whence it follows that the maximum temperature is reached at the point
x = x_m
V. E. PTITSIN
x_m = x₀·[1 + εσ/(eJx₀)]⁻¹

The derivative of the function T(x) at the point x = x_m is defined by

(∂T/∂x)|_{x=x_m} = −Iε/(πeκ·x_m²·tan²γ) = −εJ(x_m)/(eκ)
Substitution of typical mean experimental values of the parameters (σ, κ, γ, ε_N) for the W emitter into Equation (15) yields numerical estimates of the J values at which the emitter surface temperature T_s reaches (2000–2500) K. Calculations show that for γ = (0.01 to 0.03), κ = 1 W/(cm·K), σ = (1.3–2.0)×10⁵ Ω⁻¹cm⁻¹, ε_N = (0.28 to 0.33) eV (Swanson et al., 1966, 1973), and T₀ = 300 K, the above-mentioned temperatures are reached at J ≈ (1–3)×10⁸ A/cm². The obtained estimate of J agrees well with the experimental data of Ptitsin et al. (1985, 1986, 1996), where it was shown that it is at precisely these emission current densities that the emission anomalies outlined above go beyond the theoretical conceptions of the physical mechanism of TFE. The agreement between the theoretical calculations and the experimental data shows that the thermal problem in Martin (1960) and Swanson et al. (1966) is stated quite correctly. However, the solution of the stationary thermal problem obtained in Martin (1960) and Swanson et al. (1966) does not answer the main question: what, in fact, is the mechanism that causes the emission current to grow with time if the emitter tip temperature at J ~ 10⁸ A/cm² remains substantially below the melting point? The studies of Glazanov et al. (1989) and Vibrans (1964) have also not clarified this question, since their results do not agree with experimental data because of the considerable simplifications adopted in the statement of the non-stationary emitter-heating problem. Note that interesting results were obtained in Glazanov et al. (1989), where the authors calculated the kinetics of pulsed heating of pointed emitters of real geometry by the high-density emission current. The main results of that work can be formulated in the following way: if the initial electron emission current density exceeds some critical value (~10⁸ A/cm²), then the bulk emitter temperature, the emission current, and the bulk heat-release power show an avalanche-like increase with time due to the development of thermal instability.
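The estimate above can be reproduced with a few lines of Python. The sketch assumes the conical-model expression of the form T_s = T₀ + Jεr₀/(eκ·tanγ) + J²r₀²/(2σκ·tan²γ) and one illustrative parameter set (r₀ = 0.1 μm, tanγ = 0.2 — assumed values, not the author's), to show T_s crossing the (2000–2500) K window as J passes through the 10⁸ A/cm² range.

```python
def surface_temperature(J, r0=1e-5, tan_g=0.2, kappa=1.0,
                        sigma=1.5e5, eps=0.3, T0=300.0):
    """Conical-model surface temperature, Eq. (15) style.
    J in A/cm^2, r0 in cm, kappa in W/(cm K), sigma in 1/(Ohm cm),
    eps is the Nottingham energy per electron in eV (eps/e in volts).
    All parameter values here are illustrative assumptions."""
    nottingham = J * eps * r0 / (kappa * tan_g)
    joule = J**2 * r0**2 / (2.0 * sigma * kappa * tan_g**2)
    return T0 + nottingham + joule
```

With these inputs surface_temperature(1e8) ≈ 1.9×10³ K and surface_temperature(1.5e8) ≈ 2.7×10³ K, i.e., the (2000–2500) K range is crossed at J of the order of 10⁸ A/cm².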
During heating of the emitter by the emission current, the bulk of the emitter exhibits a region of higher temperature (as compared with that at the emitter surface), which causes thermal and elastic stresses sufficient for mechanical destruction of the emitter.
III. Phenomenological Model of Non-Stationary Thermal Field Emission

A. Heating of the Pointed Microcrystal Emitter Tip by the Emission Current Flow

As follows from the data given in the preceding sections, in the high-emission-current-density mode one can observe a number of effects and processes that are not amenable to adequate treatment in the framework of the existing canonical conceptions applied to the phenomenon of TFE. However, the fact that under such conditions the emission process is influenced by heating of the MC tip by the high-density emission current is beyond doubt. As mentioned above, thermal calculations of emitter tip heating by the emission current agree poorly with experimental data. Analysis of the results of these calculations, together with new experimental data of Ptitsin et al. (1985, 1986) that characterize some specific features of the emission process in high electric fields, shows that the disparity between the predicted and experimental data may be related to the fact that all the works mentioned ignored the nonlocality of "hot" hole energy dissipation. These holes are formed in the subsurface layer of the emitter as a result of electron emission from the energy states below the Fermi level (Nottingham, 1941; Swanson and Bell, 1973). Besides, these works ignored the fact that at intense heating of the MC tip a noticeable contribution to the thermal balance might be made by thermally activated self-diffusion and also by evaporation of the MC native atoms into vacuum. Simultaneous consideration of all these factors when solving the heat conduction equation does not seem possible. Therefore, it was suggested (Ptitsin et al., 1992, 1993) to separate the solution of the heat conduction equation and the analysis of the thermally activated processes into two interrelated problems. Solving the heat conduction equation with regard to nonlocal hole energy dissipation yielded the temperature distribution at the MC tip as a function of J, ε, and other parameters.
Then the energy balance equation for the surface processes was analyzed at a given surface temperature T_s to evaluate the contribution of each thermally activated surface process to the dissipation of the heat-energy flux, which (due to the nonlocal dissipation of hole energy) propagates from the inside of the MC substance toward its emitting surface. The energy relaxation of hot holes produced in the course of tunneling emission from the states below E_F was considered in the framework of the conceptions developed in Gadzuk and Plummer (1973). According to those conceptions, a "hot" hole is localized in a subsurface layer about 0.5 nm thick.
The energy of the hot holes is reduced through scattering of conduction electrons by the holes, and the excess energy passes to electrons whose energy is close to E_F. The characteristic time of this process is the time of electron–electron interaction, ~10⁻¹⁴ s (Ziman, 1962). After that, electrons of energy above E_F give their energy to the lattice at a distance approximately equal to the characteristic length of electron–phonon interaction, λ_ep, from the emitting surface. It is natural to assume that the distribution of the distances (from the emitter surface) at which electrons give up their energy to the lattice is normal. Then, in spherical coordinates, the stationary heat conduction equation in which the energy dissipation processes connected with the Nottingham effect are considered in terms of a volume heat-release source can be written as (Ptitsin, 1992, 1993)

∂²T/∂x² + (2/x)·∂T/∂x + I²/(π²σκ·x⁴·tan⁴γ) + [Iε/(√(2πD)·πeκ·x²·tan²γ)]·exp[−(2D)⁻¹·(x − x₀ − ⟨λ_ep⟩)²] = 0   (16)

where D is the variance of the random value λ_ep; ⟨λ_ep⟩ is the mean value of λ_ep; κ and σ are the coefficients of heat and electrical conductivity, respectively. The boundary conditions for this equation are
(∂T/∂x)|_{x=x₀} = C,   T|_{x→∞} = T₀

where T₀ is the emitter base temperature, and C (C ≥ 0) is a dimensional constant for a given field strength value at the emitter tip (F_s). The constant value C may be determined from the condition that the derivative of the function T(x) vanishes at the point x = x₀ + ⟨λ_ep⟩. This condition follows from the known proposition that the temperature maximum in the bulk of a substance is located at the point corresponding to the maximum volume density of heat-release power. Ignoring the temperature dependence of the functions κ(T) and σ(T), the solution of the heat conduction equation for the temperature distribution along the emitter axis T(x) can be written as
T(x) = T₀ − (x₀²C)/x + I²/(π²σκ·tan⁴γ)·[1/(x₀x) − 1/(2x²)] + Iε/(2πeκ·tan²γ)·∫ₓ^∞ [erf(⟨λ_ep⟩/√(2D)) + erf((z − x₀ − ⟨λ_ep⟩)/√(2D))]·dz/z²

≈ T₀ − (x₀²C)/x + I²/(π²σκ·tan⁴γ)·[1/(x₀x) − 1/(2x²)] + Iε/(2πeκ·x·tan²γ)·[erf(⟨λ_ep⟩/√(2D)) + erf((x − x₀ − ⟨λ_ep⟩)/√(2D))],   x₀ ≤ x < ∞
where the constant C for a given value of the field strength at the emitter tip is defined by the following relationship:

C ≈ Iε/(2πeκ·x₀²·tan²γ)·erf(⟨λ_ep⟩/√(2D)) + I²/(π²σκ·tan⁴γ)·(x_m − x₀)/(x₀³·x_m)
  ≈ (Jε/(2eκ))·erf(⟨λ_ep⟩/√(2D)) + J²⟨λ_ep⟩/(σκ)   (17)
Substituting the relationships x = x₀, x₀ = r₀/tanγ, and J = I/(πr₀²) into T(x) yields the expression for the surface temperature at the emitter tip:

T_s ≈ T₀ + J²r₀²/(2σκ·tan²γ) + (Jεr₀/(2eκ·tanγ))·[1 + 2·erf(⟨λ_ep⟩/√(2D))·(⟨λ_ep⟩·tanγ/r₀)]   (18)
This expression, accurate to within the factor 1/2 in the addend defining the contribution from the Nottingham effect, coincides with the similar expression (14) given in (Martin, 1960; Swanson and Bell, 1973). This implies that consideration of the nonlocality of hot-hole energy dissipation does not result in a substantial correction of the earlier obtained numerical estimates of T_s. However, it is worth noting that the solution of the thermal problem given above nevertheless differs basically from the solutions obtained earlier (Martin, 1960; Swanson et al., 1966, 1973). The difference is that, owing to the nonlocality of hole energy dissipation, the function T(x) has a maximum in the subsurface layer of the emitter at the point x = x_m. The maximum of T(x) at high J values means that in the intense TFE mode there exists a heat flux propagating from the bulk of the emitter material toward the emitter surface. The flux power density (q⃗ = −κ∇T) at high J levels may reach values up to (10⁷–10⁸) W/cm². For the emission stability conditions to be retained, this heat flux directed toward the emitter surface must be effectively dissipated through the activation of various surface processes. In this connection a natural question arises about the energy dissipation mechanism of this heat flux. The analysis of this question is given in the subsequent sections. However, prior to this analysis it is useful to consider a consequence of the solution found for the thermal problem. In particular, the solution given above can provide numerical estimates of the maximum, or so-called limiting, values of the current density (J_lim) in the
stationary mode of TFE. It would be natural to assume that, by definition, J_lim corresponds to those J values at which the temperature T_s becomes equal or close to the melting point T_m. Of course, the actual experimental values of the current-density limit should satisfy the inequality J < J_lim. By using the Wiedemann–Franz law, Equation (18) is easily transformed into

T_s ≈ [T₀ + Jεr₀/(2eκ·tanγ)]·[1 − J²r₀²L/(2κ²·tan²γ)]⁻¹   (19)
where L is the Lorenz number. The latter expression is conveniently represented in the dimensionless form

θ = (1 + αu)·(1 − u²)⁻¹   (20)

where

θ ≡ T_s/T₀;   u ≡ J/J₀;   α ≡ (2L)⁻¹ᐟ²·ε/(eT₀);   J₀ ≡ √2·κ·tanγ/(√L·r₀)

Note that J₀ formally corresponds to the emission current density value at which T_s → ∞. From Equation (20), after rearrangements, we obtain

J_lim = J₀·[√(α² + 4θ(θ − 1)) − α]/(2θ)   (21)
The latter expression can yield numerical estimates of J_lim for emitters made of W, Mo, and Nb. Taking T₀ = 300 K, tanγ ≈ 0.5, r₀ ≈ 0.3 μm, ε ≈ 0.3 eV (Swanson et al., 1966, 1973), and T_s = T_m and substituting the respective tabular data gives the following estimates: J_lim ≈ 1.8×10⁸ A/cm² (for W); J_lim ≈ 1.6×10⁸ A/cm² (for Mo); J_lim ≈ 1.2×10⁸ A/cm² (for Nb). Comparison of the theoretical estimates of J_lim with the experimental data (Table 1) suggests the following implications.
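The limiting-current estimate can be sketched numerically. The code below implements Equations (20)–(21) in the reconstructed form θ = (1 + αu)(1 − u²)⁻¹, u_lim = [√(α² + 4θ(θ−1)) − α]/(2θ); the inputs κ = 1 W/(cm·K), tanγ = 0.5, r₀ = 0.3 μm are assumptions, so the result reproduces only the order of magnitude of the estimates quoted above.

```python
import math

def j_limit(T_m, T0=300.0, eps=0.3, kappa=1.0, tan_g=0.5, r0=3e-5,
            L=2.44e-8):
    """Limiting current density (A/cm^2) from the dimensionless form of
    the surface-temperature equation. eps/e in volts; L is the Lorenz
    number in V^2/K^2; r0 in cm. Parameter values are illustrative."""
    theta = T_m / T0                            # melting point over base temperature
    alpha = eps / (T0 * math.sqrt(2.0 * L))     # Nottingham strength parameter
    J0 = math.sqrt(2.0) * kappa * tan_g / (r0 * math.sqrt(L))  # divergence density
    u = (math.sqrt(alpha**2 + 4.0 * theta * (theta - 1.0)) - alpha) / (2.0 * theta)
    return u * J0

J_W = j_limit(3695.0)   # tungsten, T_m = 3695 K
```

With these assumptions J_W comes out at ≈1.2×10⁸ A/cm², the same order as the estimates in the text; the ratio u = J_lim/J₀ stays below 1, as it must, since T_s formally diverges at J → J₀.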
TABLE 1
Dependence of the Average Experimental Values ⟨J_c⟩ and ⟨J_lim⟩ on the Emitter Substance

Emitter substance            W       Mo      Nb
⟨τ⟩ value, ms                0.3     0.6     0.5
⟨J_c⟩ × 10⁻⁷, A/cm²          9.3     7.5     4.3
⟨J_lim⟩ × 10⁻⁷, A/cm²        10      8.0     5.0
1. The theoretical value of J_lim for W agrees well with the results of Dyke's special experiments, in which a magnetic field transverse to the emitter axis was created at the emitter surface in ultrahigh vacuum. This agreement between theory and experiment allows one to accept that the solution of the thermal problem given above is quite correct. Based on the assumption that the solution obtained is correct, one may now conclude that the disparity between the theoretical estimates of J_lim and the experimental data (Table 1) is due to the fact that the instability arising at current densities of about 10⁸ A/cm² with no magnetic field applied takes place at temperatures substantially below the melting point of W (T ≈ (1800–2000) K). The transverse magnetic field "shifts" the onset of the emission instability development to the region of higher current densities and, hence, higher emitter-substance temperatures (up to T_m).

2. The numerical theoretical estimates of J_lim for the various emitters (W, Mo, Nb) differ only slightly, which may be explained by the closeness of the respective coefficients κ to each other at temperatures near the melting point.

3. The theoretical estimates of J_lim exceed the experimental ones by a factor of 2–3, which, in accordance with Implication 1, proves that the emission instability leading to explosive breakdown develops at T ≤ (0.7–0.8)T_m.

4. In accordance with the experimental data (Ptitsin, 1996), the current-density limit J_lim ∝ tanγ/r₀.

To summarize, it may be stated that the emission instability in strong electric fields is undoubtedly activated by heating of the metal lattice by the high-density emission current, but this is not the only reason for the development of explosive breakdown, since the instability appears at relatively low metal temperatures T ≤ (0.7–0.8)T_m.

B. Instability of the Microcrystal Emitter Tip Surface During Non-Stationary Thermal Field Emission

As follows from the thermal calculations, the nonlocality of hot-hole energy dissipation is responsible for the temperature maximum occurring in the subsurface layer of the emitter material at a distance ≈⟨λ_ep⟩ from the emitting surface. This, according to general thermodynamic concepts, implies that intense TFE is accompanied by a heat flux whose density vector is directed from the inside of the emitter material toward the emitting surface. This suggests the necessity of analyzing the possible dissipation mechanisms of this heat flux. To this end, a local equation of balance for the energy flux densities across the emitting surface was considered in Ptitsin et al. (1992, 1993). According to
Equation (17), one may write
(Jε/(2e))·erf(⟨λ_ep⟩/√(2D)) + J²⟨λ_ep⟩/σ ≈ η·σ_SB·T_s⁴ + (ω·n_s·D_s·(∇T)_τ/(kT_s))·(l/ΔS) + Λ·n_s·f·exp(−Λ/kT_s)

Here n_s is the concentration of the emitter material native atoms; ω is the heat of thermal transfer (Geguzin et al., 1984); D_s is the coefficient of surface self-diffusion; (∇T)_τ is the tangential component of the vector ∇T; ΔS is a physically infinitesimal element of the emitting surface at the emitter tip; l is the length of the outline enclosing the element ΔS; Λ is the binding energy of the adatom with the emitting surface; and f ≈ kT_s/h is the adatom thermal-vibration frequency. Note that the physical meaning of the balance equation is nothing but the energy conservation law for the thermal field processes that take place at the emitting surface at intense electron emission. For ⟨λ_ep⟩ ≈ (5–10) nm (College papers, 1973), ⟨λ_ep⟩ ≈ √(2D), ε ≈ 0.3 eV (Swanson et al., 1966, 1973), and an emitter made of W, at J ≈ J_lim the numerical value of the left-hand side of the balance equation will be below ~10⁸ W/cm², while the heat-radiation power density will not exceed ~10² W/cm². To evaluate the contribution of the second addend on the right-hand side of the energy balance equation, we will rearrange it first. Using the conical model of the emitter bounded by a portion of a spherical surface of radius r₀ and choosing an element of the spherical surface at the emitter tip as the element ΔS, one can write

∇T = (∇T)_τ·e⃗_τ + (∇T)_n·e⃗_n,   (∇T)_τ = ∂T/∂s = (∂T/∂J)·(∂J/∂s)   (22)

where it is assumed that the temperature distribution in the vicinity of the emitter tip apex is described by a certain function T = T(J(ψ(s))); e⃗_τ, e⃗_n are the unit vectors, tangent and normal, respectively, at the point M ∈ ΔS (note that, because of the axial symmetry, the vectors e⃗_τ, e⃗_n, and ∇T lie in the meridional plane passing through this point and the axis of the emitter); ψ is the polar angle; ∂T/∂s, ∂T/∂n are the derivatives of the function T = T(J(ψ(s))) with respect to the corresponding directions; and s, n are the curvilinear coordinates along the e⃗_τ, e⃗_n directions.
Taking J(ψ) = J_a·cosψ, where J_a is the current density at the emitter tip apex, and using Equations (20) and (22) together with the relationship ∂ψ/∂s = −1/r₀, after rearrangement one obtains
(Ptitsin, 1993)

(ω·n_s·D_s·(∇T)_τ/(kT_s))·(l/ΔS) ≈ (1 − cosψ)·n_s·D_s·ω·[α + 2u + αu²]/(kT_s·r₀·(1 + u))   (23)

where u ≡ J/J₀ and α are the dimensionless parameters introduced in Equation (20). It is evident that in the limit ψ → 0 this expression defines the local energy flux density due to heat transfer at the emitter tip. To evaluate the right-hand side of Equation (23) numerically, one should define three parameters: n_s, ω, and D_s. Determining the numerical value of n_s at intense electron emission is a separate, nontrivial problem. The solution of this problem is given in Section III.C. Here we only note that, according to the solution found (Ptitsin, 1990, 1991), at T ≤ (0.7–0.8)T_m the concentration of the emitter material native atoms residing at the emitting surface in the two-dimensional gas state is n_s ≈ 10¹⁵ cm⁻². To obtain an "upper" estimate of the coefficient D_s, we use the known expression for the coefficient of surface self-diffusion in the Arrhenius form

D_s = D₀·exp(−E_d/kT)

where D₀ is the so-called pre-exponential factor, and E_d is the activation energy of surface migration. As far as we know, at present there are no analytical expressions for the factor D₀ at T ≤ (0.7–0.8)T_m. So we will use the experimental data of Zhdanov (1988), according to which the maximum value of the coefficient D_s for the W MC does not exceed ~10⁻⁵ cm²/s. At intense electron emission (Ptitsin, 1991), as a result of the thermally activated destruction of the monoatomic terrace ledges exposed at the emitting microcrystal surface, the value of ω can be evaluated on the basis of the TLK (terrace–ledge–kink) model (College papers, 1959). This model gives for W(110) the value ω ≈ 5 eV (Geguzin et al., 1984). With these data in mind, we find (Ptitsin, 1990, 1991) that at J ≈ J_lim the thermal-transfer energy flux density (due to surface self-diffusion over the nonisothermal emitter surface) does not exceed ~10² W/cm².
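The disparity between the three dissipation channels can be illustrated with order-of-magnitude numbers (all inputs below — J = 10⁸ A/cm², ε = 0.3 eV, ⟨λ_ep⟩ = √(2D), emissivity 0.3, T_s = 2500 K — are assumed, and the surface-diffusion channel is represented simply by its ~10² W/cm² upper bound):

```python
import math

SIGMA_SB = 5.67e-12   # Stefan-Boltzmann constant, W/(cm^2 K^4)

def incoming_flux(J, eps=0.3, erf_arg=1.0):
    """Nottingham part of the incoming heat flux, (J*eps/2e)*erf(...), W/cm^2."""
    return 0.5 * J * eps * math.erf(erf_arg)

def radiation_flux(T_s, emissivity=0.3):
    """Thermal radiation power density, W/cm^2."""
    return emissivity * SIGMA_SB * T_s**4

q_in = incoming_flux(1.0e8)     # ~1e7 W/cm^2 arriving at the surface
q_rad = radiation_flux(2500.0)  # ~7e1 W/cm^2 radiated away
q_diff_max = 1.0e2              # assumed upper bound for diffusive transfer
```

Radiation and surface-diffusion transfer together fall short of the incoming flux by several orders of magnitude, which is the quantitative content of the conclusion that activated evaporation must carry the remainder.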
Taking into account that the obtained value of the thermal-transfer energy flux density is an "upper" estimate, we conclude that at T ≤ (0.7–0.8)T_m the energy balance equation holds only if the general dissipation mechanism of the thermal flux directed from the inside of the emitter substance toward the emitting surface includes, in addition to the processes of heat radiation and transfer, the process of activated evaporation of the emitter material native atoms, which go from the binding states in the two-dimensional gas to free states in vacuum. Based on this conclusion and using Equations (19)–(21) for J ≈ J_lim, we obtain the mean effective binding energy of the adatom with the W emitter surface: Λ ≈ 2.5 eV. Note that this value of Λ is less than the binding
energy of the W adatom with the W(110) surface when the metal surface is not exposed to a high external electric field. The difference in the binding-energy values is quite significant and equals ≈2 eV. This question calls for further investigation, since for now it is difficult to give a reasonable explanation for this disparity. Nevertheless, the general inference of this section, based on the fundamental concepts of thermodynamics, is beyond question: during intense electron emission (J ≥ 10⁸ A/cm²) the emitting surface of a pointed metal MC is transformed into an effective emitter of native neutral atoms. In other words, under these conditions the emitter surface becomes unstable and acts as a source of neutrals (Ptitsin, 1990, 1991, 1993).

C. Determination of the Surface Concentration of the Microcrystal Emitter Substance Native Atoms in the Two-Dimensional Gas State

To calculate the concentration of native atoms on the emitting surface in the two-dimensional gas state at intense emission, consider a pointed W(110) MC in an ultrahigh-vacuum environment. The estimation of the sought concentration n_s at the MC emitter apex will be carried out in the context of the TLK model (College papers, 1959). The activation energies of atom transitions from the various surface states will be calculated in terms of the theory of pairwise interactions. As a model of the real emitter we consider a W MC of ⟨110⟩ orientation bounded by a portion of a spherical surface (Muller and Tsong, 1972; Figure 96). In this model the upper part of the hemisphere consists of a stack of atomic W(110) planes superimposed on one another. An average value ⟨n(x)⟩ will be used as a mean concentration of n_s on the concentric monoatomic ledges of width b and height s (Muller and Tsong, 1972):
s = a/(δ·√(h² + k² + l²))
where a is the lattice parameter; h, k, l are the Miller indices for the given orientation; δ is a coefficient equal to 1 if (h + k + l) is an even number, and δ = 2 if (h + k + l) is an odd number. In our case s = a√2/2, a = 0.316 nm. The ledge sizes b_i were calculated from the relationships (Muller and Tsong, 1972)

b_i = r₀·(sinθ_i − sinθ_{i−1}),   i = 2, 3, …

θ_i = arccos(1 − i·s/r₀),   i = 1, 2, …

b₁ = r₀·sinθ₁
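A short Python sketch of the ledge-geometry relations above. It reproduces the mean terrace widths quoted in the text for the W(110) step height s = a√2/2 ≈ 0.223 nm; the choice of averaging over the ten ring terraces following the central cap is an assumption that matches the quoted numbers.

```python
import math

def terrace_widths(r_nm, s_nm=0.2234, n=12):
    """Ring-terrace widths b_i on a spherical cap of radius r (TLK model).
    theta_i is the polar angle of the i-th step edge, depth i*s below the apex."""
    theta = [math.acos(1.0 - i * s_nm / r_nm) for i in range(n)]
    return [r_nm * (math.sin(theta[i + 1]) - math.sin(theta[i]))
            for i in range(n - 1)]

def mean_terrace_width(r_nm):
    """Arithmetic mean over ten ring terraces (central cap b_1 excluded)."""
    b = terrace_widths(r_nm)
    return sum(b[1:11]) / 10.0
```

mean_terrace_width gives ≈2.7 nm for r₀ = 300 nm, ≈3.5 nm for r₀ = 500 nm, and ≈4.9 nm for r₀ = 1000 nm, in line with the values quoted below.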
As a measure of the terrace width at the emitter tip of a given radius, we take the arithmetic mean ⟨b⟩ over the first ten terraces. Then, for example, for r₀ = 300 nm, ⟨b⟩ ≈ 2.7 nm; for r₀ = 500 nm, ⟨b⟩ ≈ 3.5 nm; and for r₀ = 1000 nm, ⟨b⟩ ≈ 4.9 nm. This means that for the adopted model the terraces are narrow concentric rings. Therefore, based on the model symmetry, the problem of adatom diffusion on a terrace may be solved not for ring terraces, but rather for long plane-parallel ledges in Cartesian coordinates in the plane. Note that such a simplification eliminates Bessel functions from the solution of the diffusion problem. To define the n(x) distribution over a certain terrace, it is necessary to find a solution to the equation (College papers, 1959)

dj/dx = j_v   (24)

where j = −D_s·(∂n/∂x) is the atomic flux escaping the ledge; x is the coordinate counted along the normal to the ledge kink; and j_v is the atom evaporation–condensation flux density at a given point of the ledge. For evaporation into vacuum
j_v ≈ n(x)·(kT/h)·exp(−E_ev/kT) − ν   (25)

where E_ev is the activation energy of adatom evaporation from the terrace surface, and ν is the condensation flux density. As shown in Ptitsin (1990) and
Hirth and Pound (1957), at intense electron emission ν → 0. According to the Hirth and Pound theory (1957), the relationship between n(x) and the adatom concentration at the equilibrium pressure, n_eq(x), is of the form

n(x)/n_eq(x) = 2P/(3P_eq) + 1/3

where P_eq and P are the equilibrium and nonequilibrium metal-vapor pressures, respectively (P ≤ P_eq). From this relationship it is easily seen that at P = 0 (ν = 0) the values of n(x) and n_eq(x) will differ at most by a factor of 3. So if the term ν is neglected, the error of the evaporation–condensation flux density calculation also does not exceed a factor of 3, which is quite acceptable for our purposes. Thus, in view of the aforesaid, Equation (24) can be written as (Ptitsin, 1991)
∂²n(x)/∂x² − n(x)·(kT/(hD_s))·exp(−E_ev/kT) = 0   (26)

To specify the boundary conditions, in addition to this equation, it is necessary to define the average adatom displacement ⟨δ⟩ for the time
preceding the evaporation event, τ_ev. The time τ_ev can be evaluated based on the conceptions developed by Frenkel (1958):

τ_ev ≈ (h/kT)·exp(E_ev/kT)

If the adatom movement over the W(110) terrace is considered as a two-dimensional random walk, we obtain

⟨δ⟩ = √(4·D_s·τ_ev)

For W(110) we assume the following values of the parameters: E_ev = 4.4 eV (Nakamura and Kuroda, 1969), E_d ≈ 3.1 eV (Sokolskaya, 1956; Barbour et al., 1960), D₀ ≈ 3.6×10⁻² cm²/s (Barbour et al., 1960). Note here that the difference in the numerical values of E_d given in Sokolskaya (1956) and Barbour et al. (1960) may perhaps be explained by the contribution of the entropy factor (Muller and Tsong, 1972). Calculations of ⟨δ⟩ for various T show that at relatively high temperatures (≥1500 K) the relationship ⟨b⟩ ≪ ⟨δ⟩ holds well. Therefore, the boundary conditions for Equation (26) can be expressed as
n(x)|_{x=0} = n(0) ≡ n_st,   n(x) ≥ 0

In view of the remarks made and the boundary conditions specified above, the solution of Equation (26) for the adatom concentration on the terrace will be (Ptitsin, 1990)
n(x) = n_st·sh[q·(⟨b⟩ − x)]/sh[q·⟨b⟩],   q ≡ (kT/(hD_s))^{1/2}·exp(−E_ev/(2kT))

The mean concentration of adatoms may be given by
kT E (b exp 9 hD 2kT (n (6) : n · kT E (b exp 9 hD 2kT In the above expression, n is an uncertain quantity. To define it, we will use the following relationship (College papers, 1959) cth
j·y = q₊ − q₋

where q₊ is the atomic flux density when the atoms go from their states at
the ledge step to the two-dimensional gas state on the terrace; q₋ is the atomic flux density when the atoms go from the two-dimensional gas state on the terrace to their states at the ledge; and y is the ledge length per unit crystal surface. Substitution of the expression for ⟨n(x)⟩ as well as the known expressions for q₊, q₋ given in (College papers, 1959) into the last relationship and simple rearrangements yield (Ptitsin, 1991, 1993)

⟨n(x)⟩ ≈ (n₀/⟨b⟩)·exp(−(E₁ + E₂ + E₃)/kT)·[exp(−E₁/kT) + (1/(q⟨b⟩))·exp(−E_d/kT)·cth(q⟨b⟩)]⁻¹

where n₀ is the number of adsorption sites per unit monoatomic ledge length (3.8×10⁷ cm⁻¹); E₁ is the activation energy of the transition from the kink to adsorption at the ledge; E₂ is the activation energy of the transition from adsorption at the ledge to adsorption in the plane; and E₃ is the energy of kink formation at the ledge. In the framework of the theory of pairwise interactions, the activation energies E₁, E₂, E₃ can be expressed in terms of two parameters: φ and Δ*, where φ is the enthalpy of surface-diffusion activation, and Δ* is the energy of bond breaking at the kink–ledge adsorption transition. Based on the results of Wang and Tsong (1982), φ ≈ 0.9 eV can be considered quite a credible value of φ for W(110). Then the value of Δ* for W(110) can be obtained from the relationship (College papers, 1959) Λ_s = 4φ + 3Δ*, where Λ_s is the sublimation energy for W. The numerical value of Λ_s is 8.66 eV, so Δ* ≈ 1.7 eV. Using the thermal-calculation data for T_s and the expressions for the activation energies given in Barbour et al. (1960), we obtain ⟨n(x)⟩ ≈ (10¹⁴–10¹⁵) cm⁻² at T_s ≈ (2000–2500) K. This means that, as a result of emitter tip heating by the high-density emission current, the emitting surface will be coated by about a monolayer of native atoms in the two-dimensional gas state (Ptitsin, 1991). This agrees with the data of direct experiments (Drechsler, 1988). Note here that, generally speaking, the activation energies for the various transitions depend on the field strength, because the values of the polarizability and of the surface-induced dipole moment for adatoms differ from zero. However, the contribution of these polarization terms can be ignored, since in this case the relative changes in the activation energies do not usually exceed ±10%.
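The adatom profile obtained from Equation (26) can be checked numerically. The sketch below (Python; E_ev = 4.4 eV, E_d = 3.1 eV, D₀ = 3.6×10⁻² cm²/s as above, with n_st normalized to 1) verifies that the sh-profile satisfies the diffusion–evaporation equation and evaluates the mean coverage factor ⟨n⟩/n_st.

```python
import math

K_B = 8.617e-5   # Boltzmann constant, eV/K
H = 4.1357e-15   # Planck constant, eV*s

def q_inv_length(T, E_ev=4.4, D0=3.6e-2, E_d=3.1):
    """Decay constant q = sqrt(nu_ev / D_s), 1/cm."""
    nu_ev = (K_B * T / H) * math.exp(-E_ev / (K_B * T))   # evaporation rate, 1/s
    D_s = D0 * math.exp(-E_d / (K_B * T))                 # diffusion coeff., cm^2/s
    return math.sqrt(nu_ev / D_s)

def profile(x, b, q, n_st=1.0):
    """n(x) = n_st * sh(q(b-x))/sh(qb): solution of n'' = q^2 n, n(0)=n_st, n(b)=0."""
    return n_st * math.sinh(q * (b - x)) / math.sinh(q * b)

def mean_coverage(b, q):
    """<n>/n_st = (ch(qb) - 1)/(qb * sh(qb))."""
    return (math.cosh(q * b) - 1.0) / (q * b * math.sinh(q * b))
```

For T = 2000 K and ⟨b⟩ = 2.7 nm, q⟨b⟩ ≈ 0.2, so the profile is nearly linear across the terrace and the mean coverage factor is close to 1/2; a finite-difference check confirms n″ = q²n to high accuracy.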
The calculations made for ⟨n(x)⟩ and T_s can be used, in particular, to estimate the emitter shortening rate due to adatom evaporation from the MC surface. The rate of pointed-emitter length decrease with time, ∂l/∂t, in
the course of emitter heating can be expressed as
∂l/∂t ≈ a³·⟨n(x)⟩·(kT_s/h)·exp(−E_ev/kT_s)

This gives ∂l/∂t ≈ 1 μm/s at T_s ≈ 2500 K and E_ev ≈ 4.4 eV for W⟨110⟩ emitters. The obtained shortening rate may at first glance appear rather significant. However, this estimate is true only for small tip radii and small angles γ. As a real emitter shortens, its parameters r₀ and γ increase substantially and, hence, the factor n₀/⟨b⟩, the concentration ⟨n(x)⟩, and ∂l/∂t will essentially decrease with time during high-temperature baking. This "dimensional" effect is well known to specialists in field-emission electron and ion microscopy: at high-temperature heating of small-radius (r₀ ≤ 0.1 μm) pointed microcrystals with no electric field applied, the MC tips are rapidly blunted up to about 1 μm as a result of thermal evaporation and surface self-diffusion. In the course of emitter tip blunting, the tip retains a flattened roundedness. Besides, this process is usually accompanied by deterioration of the vacuum conditions inside the experimental instrument or system. A qualitatively different picture is observed at high-temperature heating of tips in a strong external electric field. As known from Sokolskaya (1956), under such conditions a so-called thermal rearrangement phenomenon takes place. In addition, a high-density current pulse flowing through the emitter tip initiates the effect of spontaneous rearrangement (Krotevich et al., 1985). In this connection it is useful to consider in more detail the processes developing at the MC–vacuum boundary during high-temperature heating of a pointed emitter in a strong external electric field.

D. Characteristic Features of the Emitter Native Neutral Atoms' Motion After Evaporation from the Emitting Surface into Vacuum

In a high inhomogeneous electric field a neutral atom at the MC tip surface experiences a polarization force (Ptitsin, 1991; Muller and Tsong, 1972)

f⃗_p = (α_p/2)·grad(F²)

where α_p is the atom polarizability.
To find qualitative estimates of the main characteristics of the atom behavior upon evaporation, consider a model problem with a spherically symmetric electric field distribution at the tip apex:

F(x) = F₀·(r₀/x)²

here F₀ is the field strength at the apex of a spherical emitter of radius r₀. After
an evaporation event, the atom energy W at an arbitrary point above the emitter surface is defined by

W = Mẋ²/2 + L²/(2Mx²) − α_p·F²(x)/2   (27)

where M is the mass of the emitter substance atom, and L is the angular momentum of the atom relative to the field center (the center of the sphere). The radial part of the motion in this expression may be considered as one-dimensional motion in a force field with the "effective" potential energy U_eff (Landau and Lifshits, 1973)

U_eff = L²/(2Mx²) − α_p·F²(x)/2

The values x = x_t at which U_eff = W define the bounds of the atom motion region because, when this condition is met, the radial velocity ẋ goes to zero. The equality ẋ = 0 corresponds to the turning point of the trajectory, at which the function x(t) changes from increasing to decreasing. The atom movement will be finite if the inequality U_eff = W has a solution. If this condition is satisfied, upon evaporation the atom will be "locked" in a potential trap due to the effect of the polarization force on this atom. It may be supposed that the mean kinetic energy of the atom at the moment of evaporation is equal to ≈kT_s, where T_s is the emitter surface temperature, and that the probability density of the atom leaving the surface at an angle ϑ to the normal to the surface is ≈2/π; hence, the mean angle ⟨ϑ⟩ equals π/4. In the general case, based on the energy conservation law, one may write

kT_s − α_p F₀²/2 = kT_s·sin²ϑ·(r₀/x_t)² − (α_p F₀²/2)·(r₀/x_t)⁴   (28)

whence it follows that for finite movement

x_t = r₀·{sin²ϑ·[1 + √(1 − 4η(1 − η)/sin⁴ϑ)]/(2η)}^{−1/2}   (29)

where the parameter η ≡ α_p F₀²/(2kT_s) must obviously satisfy the relationship

η ≥ (1/2)·(1 + √(1 − sin⁴ϑ))

If one takes ϑ = ⟨ϑ⟩ = π/4, then η ≥ 0.933. Note that at ϑ = ⟨ϑ⟩ = π/4 and η = 1 the maximum separation of the atom from the emitter surface is
x_t − r₀ = (√2 − 1)·r₀, i.e., x_t = √2·r₀. For the specific case of finite movement when the initial velocity vector of the atom is collinear with the vector F⃗ (ϑ = 0) and η ≫ 1, relationship (29) may be simplified (Ptitsin, 1990, 1991):

x_t − r₀ ≈ r₀/(4η) = r₀·kT_s/(2α_p F₀²)   (30)

Further analysis will also require an estimate of the typical jump time t_j of the atom after the evaporation event for finite movement. In view of the energy conservation law, from Equation (28) follows
t_j = √(M/(2kT_s))·∫_{r₀}^{x_t} [1 − η + η·(r₀/x)⁴ − sin²ϑ·(r₀/x)²]^{−1/2}·dx

The latter integral cannot be expressed in terms of elementary functions. To evaluate the atom jump time t_j in the potential-trap region, we restrict ourselves to the specific case ϑ = ⟨ϑ⟩ = π/4, η = 1. Then x_t = √2·r₀ and the jump time will be defined by

t_j ≈ 3r₀·√(M/(kT_s))   (31)

Substituting the typical values of the W emitter parameters for the pre-explosive emission phase into Equation (31) yields t_j ≈ 2 ns and, hence, the time of the atom's upward movement up to the point with the coordinate x = x_t will be ≈1 ns. The calculations given above refer to the model problem (a single sphere) and, therefore, cannot be directly applied to respective estimations for emitters of real geometries. To use the above results for such emitters, the following considerations can be applied: (1) if a single sphere whose surface potential is equal to V is replaced by a pointed emitter of real geometry with the same tip radius of curvature r₀ and the same potential value, then the field strength near and at the tip apex, F_t, will be related to the field strength F_s at the sphere surface (F_s = V/r₀) by the formula F_t = F_s/β, where β is the field factor (or β-factor); (2) since the turning points are located near the emitter surface, to a first approximation Equation (28) remains valid. In view of the aforesaid, we have
   F₀ ≥ { (2kT/α)·(r₀/ξ)·(1 − r₀/2ξ)·(1 − (1/2)sin²θ₀) }^{1/2}        (32)
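A quick numerical check of the jump-time estimate (31) and of the borderline field scale (2kT/α)^{1/2}, at which γ = 1, can be made as follows. This is a sketch: the tip radius r₀ ≈ 0.2 μm and T ≈ 2000 K are assumed values typical of the pre-explosive phase, and the W atomic polarizability α = 1.867 × 10⁻³⁹ C·m²/V is the value quoted below in the text.

```python
import math

k = 1.380649e-23        # Boltzmann constant, J/K
u = 1.66054e-27         # atomic mass unit, kg

M = 183.84 * u          # mass of a W atom, kg
T = 2000.0              # emitter temperature, K (assumed)
r0 = 0.2e-6             # tip radius, m (assumed ~0.2 um)
alpha = 1.867e-39       # W atom polarizability, C*m^2/V

# Equation (31): atom jump time in the polarization trap
t_jump = 3.0 * r0 * math.sqrt(M / (k * T))

# Field scale at which gamma = alpha*F^2/(2kT) equals 1; below this
# scale the movement of the evaporated atom ceases to be finite.
F_thr = math.sqrt(2.0 * k * T / alpha)

print(f"t_jump = {t_jump*1e9:.2f} ns")      # ~2 ns
print(f"F_thr  = {F_thr*1e-9:.2f} V/nm")    # ~5.4 V/nm
```

With these inputs the jump time comes out near 2 ns and the field scale near 5.4 V/nm, matching the estimates quoted in the surrounding text.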
NON-STATIONARY THERMAL FIELD EMISSION
205
Using Equation (32) and recalling that α = 1.867 × 10⁻³⁹ C·m²/V for W atoms (Ptitsin, 1991), one can determine the threshold F values at T ≈ 2000 K. The calculations show that these F values range from ≈2.7 V/nm to ≈5.4 V/nm (depending on β and θ₀). Thus the calculations made above prove that, during electron emission in the presence of a high electric field, the atoms of the emitter material evaporating from the emitting surface turn out to be ''locked'' in a polarization trap. The linear dimensions of the polarization trap (ξ_m − r₀) are very small (about 100 nm and below). Numerical calculations of F for other MCs (of Ta, Mo, Nb) show that the finiteness condition for the movement of evaporating atoms is also met for these emitters if the emission takes place in high electric fields.
It is worth noting that, since the direction of the polarization force does not depend on the direction of the vector F with respect to the normal to the emitter surface, all the results and implications of this section apply both to electron emission in a high electric field and to intense heating of a pointed emitter by an external energy source in an electric field of the reverse (or ''field ionization'') direction. It is known that, during heating of emitter tips in an electric field of the ''field ionization'' direction, the initially flattened, rounded surface of the MC tip is substantially transformed. These evolutionary changes of the MC surface, caused by high temperatures and intense electric fields, have been called the phenomenon of thermal field rearrangement. Studies of this phenomenon have shown that thermal field rearrangement may involve both infinite (Zhukov et al., 1989) and finite (Krotevich et al., 1984, 1985; Fursey et al., 1984) neutral atom movement.
For further analysis it is of interest to estimate the vapor atom concentration N_a in the polarization trap region. For the single-sphere emitter model we will do this with the following simplifying assumptions.
In particular, we will suppose that:
1. The vapor atoms constitute an ideal gas of particles and, hence, such an ensemble of particles can be described in terms of Maxwell-Boltzmann statistics.
2. Upon evaporation the emitter atoms in the polarization trap region remain un-ionized.
To calculate N_a, we assume that in the steady state the evaporation rate ν_e is equal to the condensation rate ν_c:

   ν_e = n_s·(kT/h)·exp(−E_a/kT) = ν_c(N_a, T_v)        (33)

where T_v is the vapor atom temperature. The vapor atom could be assigned
a definite temperature if, in the polarization trap region, there existed interatomic collisions accompanied by kinetic energy transfer. As a criterion of the absence of interatomic collisions, the following relationship can be adopted:

   λ = 1/(√2·N_a·σ_a) ≥ 2(ξ_m − r₀)        (34)

where λ is the free-path length of the atom and σ_a is the effective collision cross section. Taking σ_a ≈ πR_a², where R_a is the atom radius, we obtain from Equation (34) that effective interatomic collisions will take place if N_a is of the order of 10¹⁹ cm⁻³ or more. The calculations below will show that the actual N_a values are somewhat less than this. In the absence of interatomic collisions it is impossible to speak of temperature as a constant quantity, independent of the coordinates of the thermodynamic system and characterizing the equilibrium state of the vapor atoms, since in this case the kinetic energy of a single vapor atom E_k is not constant and depends on the coordinate ξ as follows:
   E_k(ξ) = kT − (1/2)αF₀² + (1/2)αF₀²·(r₀/ξ)⁴        (35)
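The collision criterion of Equation (34) is easy to evaluate numerically. The sketch below assumes a W atom radius R_a ≈ 0.14 nm and takes the limiting trap size 2(ξ_m − r₀) = 2√2·r₀ of the γ = 1, θ₀ = π/4 case with r₀ ≈ 0.2 μm; both inputs are assumptions, not values fixed by the text.

```python
import math

R_a = 0.14e-9                  # W atom radius, m (assumed)
r0 = 0.2e-6                    # tip radius, m (assumed)
sigma = math.pi * R_a**2       # effective collision cross section, m^2

trap = 2.0 * math.sqrt(2.0) * r0   # trap size 2*(xi_m - r0) at gamma = 1

# Equation (34): collisions matter when the mean free path
# lambda = 1/(sqrt(2)*N*sigma) drops below the trap size.
N_threshold = 1.0 / (math.sqrt(2.0) * sigma * trap)

print(f"N_threshold = {N_threshold*1e-6:.2e} cm^-3")   # ~2e19 cm^-3
```

The threshold comes out near 2 × 10¹⁹ cm⁻³, consistent with the order-of-magnitude figure in the criterion above.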
To find the form of the function N_a(ξ), we will use the relationships of classical statistical physics and define the effective temperature of the vapor atoms T_v by

   T_v ≈ (1/k)·⟨E_k⟩

where ⟨E_k⟩ is the kinetic energy averaged over the entire ensemble of atoms and over the coordinate ξ. Note that this definition of T_v assumes that the angular momentum of an atom moving in the central force field remains constant and, hence, the atom trajectory lies in one plane. Therefore, the number of degrees of freedom for this atom can be taken equal to 2. In view of the above remarks, ⟨E_k⟩ can be expressed as

   ⟨E_k⟩ = [1/(ξ_m − r₀)] · ∫_{r₀}^{ξ_m} E_k(ξ) dξ        (36)

where the averaging is taken over the total number N of atoms within the polarization trap. Using Equation (36), upon integration we obtain ⟨E_k⟩ ≈ kT and, hence, T_v ≈ T. Using the expression derived for the effective temperature and the relationship for the magnitude of the vapor atom velocity,

   v = (2kT_v/M)^{1/2}
the condensation flux density can be defined by averaging the flux of Maxwell-distributed vapor atoms toward the surface, which gives (Ptitsin, 1990, 1991; Landau and Lifshits, 1976)

   ν_c ≈ 0.3·N_a·(kT_v/M)^{1/2}        (37)

To determine the particle concentration distribution along the coordinate ξ, we use the condition of constancy of the chemical potential μ for an ideal gas of particles in a force field (Landau and Lifshits, 1976):

   μ(ξ, T_v) + U(ξ) = const

where

   μ(ξ, T_v) = kT_v·ln[ N_a(ξ)·(h²/2πMkT_v)^{3/2} ]

   U(ξ) = −(1/2)·αF₀²·(r₀/ξ)⁴
This yields
   N_a(ξ) = N₀·exp{ −(αF₀²/2kT_v)·[1 − (r₀/ξ)⁴] }        (38)

After rearrangements, in view of Equations (33) and (37), we have
   N₀ ≈ 3.4·(n_s/h)·(MkT)^{1/2}·exp(−E_a/kT)        (39)

The substitution into Equation (39) of the numerical values of parameters typical of the pre-explosion emission phase, T ≈ (2000–2500) K, F ≈ 5 V/nm, and E_a ≈ 3.0 eV, shows that in this case N₀ may be of the order of (10¹⁸–10¹⁹) cm⁻³. This means that the pressure of metallic vapor atoms in the polarization trap region during the pre-explosion emission phase approaches atmospheric pressure. These results raise the natural question of the probability of vapor atom ionization, due either to impact ionization (electron impact) or to field ionization. This question is discussed in the next section.
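Equation (39) is straightforward to evaluate. The sketch below takes E_a = 3.0 eV and T = 2300 K from the text; the surface atom concentration n_s ≈ 10¹⁵ cm⁻² is an assumed, typical value.

```python
import math

k = 1.380649e-23      # Boltzmann constant, J/K
h = 6.62607015e-34    # Planck constant, J*s
e = 1.602176634e-19   # elementary charge, C
u = 1.66054e-27       # atomic mass unit, kg

M = 183.84 * u        # W atom mass, kg
T = 2300.0            # emitter temperature, K
E_a = 3.0 * e         # evaporation activation energy, J
n_s = 1.0e19          # surface atom concentration, m^-2 (assumed ~1e15 cm^-2)

# Equation (39): steady-state vapor concentration in the trap region
N0 = 3.4 * (n_s / h) * math.sqrt(M * k * T) * math.exp(-E_a / (k * T))
P = N0 * k * T        # ideal-gas pressure of the metallic vapor

print(f"N0 = {N0*1e-6:.2e} cm^-3")
print(f"P  = {P/101325:.2f} atm")
```

With these inputs N₀ comes out near 10¹⁸ cm⁻³ and the vapor pressure is a sizable fraction of an atmosphere, in line with the estimates above.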
E. Ionization Probability of the Native Atoms of the Emitter Substance after Evaporation

According to the results of the preceding sections, intense evaporation of the native atoms of the MC material can take place both at intense
heating of the MC tip by an external energy source and in the course of high-current-density electron emission. In the presence of a high electric field the evaporating atoms can be ionized by different mechanisms. If the electric field is accelerating for electrons, atom ionization may be caused by electron-atom collisions (impact ionization) or by field ionization (FI) of free atoms. If the high electric field is retarding for electrons, a free atom near the emitter surface is likely to be ionized only through FI.
To calculate the probability of impact ionization P_i during intense electron emission, we will use the known expression

   P_i = 1 − exp(−ν_i·t)

where ν_i is the atom ionization probability per unit time due to electron-atom collisions, and t is the time of atom movement in the polarization trap region; ν_i is defined by (Ptitsin, 1996)

   ν_i = (1/e)·⟨J·σ_i⟩ ≤ (1/e)·J·σ_max

where J is the emission current density, σ_i is the impact ionization cross section as a function of the relative velocity of the ionizing electron and the atom, and σ_max is the maximum value of the impact ionization cross section, corresponding to an incident electron energy approximately equal to (2–5)·I₁, where I₁ is the atom ionization energy.
After calculating σ_max from Drawin's empirical formula (1961) for W atoms, we obtain σ_max ≈ 10⁻¹⁶ cm², whence it follows that at J ≈ 10⁸ A/cm² the value of ν_i satisfies the inequality ν_i ≳ 10¹⁰ s⁻¹. Accordingly, P_i for a time equal to the atom jump time (≈1 ns) turns out to be close to unity.
The FI probability per unit time ν_FI for a free neutral atom is (Ptitsin, 1996; Müller and Tsong, 1972)

   ν_FI = A·ν₀·exp(−2S/ℏ)        (40)

where A = const ≈ 1, ν₀ is the electron oscillation frequency in the atom, and

   S = (2m)^{1/2} · ∫_{z₁}^{z₂} [ I₁ − e²/(16πε₀z) − ezF(z) ]^{1/2} dz        (41)

where ε₀ is the permittivity of vacuum and z₁, z₂ are the turning point coordinates. Note that Equation (41) ignores the contribution from the image-force potential because for z ≳ 0.5 nm this contribution is negligible. Assuming F(z) ≈ const yields

   z_{1,2} = (I₁/2eF)·[ 1 ∓ (1 − e³F/4πε₀I₁²)^{1/2} ]        (42)

After calculating (41), we have

   S ≈ (2mI₁)^{1/2}·z₂·(2/3)·[ E(κ)·(1 + z₁/z₂) − 2(z₁/z₂)·K(κ) ]        (43)

where E(κ) and K(κ) are complete elliptic integrals and κ = (1 − z₁/z₂)^{1/2}. Calculation of Equations (40)–(43) shows that in high electric fields F ≈ (4.5–5.0) V/nm a free W atom in the ground state or in a low-lying excited energy state is field-ionized at a rate ν_FI ≳ 2.5 × 10¹³ s⁻¹ and, hence, the FI probability P_FI = 1 − exp(−ν_FI·t) turns out to be close (or equal) to unity already after a residence time of the W atom in the polarization trap region equal to (or less than) 10⁻¹³ s. The numerical estimates obtained suggest that in high electric fields (of magnitude F ≳ 4.5–5.0 V/nm) the native atoms of the W MC that reside in the polarization trap after evaporation will be ionized predominantly
by field ionization (ν_FI ≫ ν_i). Note that this conclusion is true not only for W emitters but also for the other pointed MCs studied (Mo, Nb, Ta) and can therefore be considered rather general.
The results presented above also suggest that at a high emission current density (J of about 10⁸ A/cm²) there will exist in the polarization trap region a space charge (SC) consisting of ions of the MC substance and of electrons. Among the electrons are those produced by FI of neutrals; besides these, the electronic component of the SC naturally includes the emitted electrons. In other words, under these conditions a plasma blob, or plasmoid, appears at the emitter tip surface.

F. Processes at the Interface: Emitter Surface–Microplasma Layer

As seen from the results given in the preceding sections, electron emission from the emitter tip surface in the high-density emission current mode initiates secondary thermal field processes, which produce a layer of SC–microplasma (MP) near the emitting surface of the MC tip. This statement is confirmed by experiment.
From the data of Slivkov (1986), obtained in studies devoted to the initiation and development of vacuum breakdown, it is known that immediately before vacuum breakdown, on single local microinhomogeneities (microtips) at the macrosurface of the vacuum gap cathode, there appear luminous regions, which in the course of the breakdown development are
transformed into intensely radiating plasma blobs expanding with time: cathode flares. The MP radiation spectra first show the lines of excited neutral atoms belonging to the chemical composition of the cathode material (Mesyats and Proskurovsky, 1984). Studies of electron emission from the surface of single pointed emitters at emission current densities of about 10⁸ A/cm² have also revealed a similar luminescence of the forming MP layer (Ptitsin, 1996).
As stated in Section II.B, simultaneously with the flashes of radiation emitted from the local region near the emitter apex, the non-stationary emission process is accompanied by: (a) a significant change of the emitting surface microstructure (due to the effect of spontaneous rearrangement), (b) the ring effect, and (c) an increase in the resolving power of the electron microscope (Muller's projector).
Applying the ideas developed in the previous sections, the effects listed above can be interpreted in the following way. The effect of spontaneous rearrangement is believed to be due to the interaction of the ions formed by field ionization with the emitting MC surface. According to estimates made in Ptitsin (1996), the energy E of an ion interacting with the emitter surface may reach from 10 eV up to 100 eV and above. At such energies there is a high probability (Kaminsky, 1967; Ion Bombardment . . . , 1984) of knocking surface atoms out of their lattice sites and also of transferring to adatoms an initial energy sufficient for their transition from bound states at the surface to free states in vacuum. The atoms knocked from the emitter surface as a result of this cathode self-sputtering will, with a high degree of probability, be ionized by the high field. During the current pulse the secondary processes caused by ion interaction with the emitter surface may undergo multiple multiplication, which leads to emission instability and to changes in the emitting surface microstructure.
The ions produced by FI will have different energies depending on the local near-surface region in which they are accelerated. Since the field near the emitter surface is inhomogeneous both in the coordinate ξ and in the polar angle coordinate θ, the ions accelerated in vacuum in the local regions adjacent to the edges and corners of the emitting crystal surface will possess the maximum energy. This means that, all other things being equal, the most substantial change of the emitter surface microstructure must take place at the local sites near the edges and corners forming the emitting surface. It was just this microstructure change in the course of spontaneous rearrangement that was observed in our experiments (Krotevich et al., 1984, 1985; Ptitsin, 1990).
The ring effect can also be qualitatively interpreted in terms of the interaction of the emitted electrons with the MP layer. It is known that during such interactions electrons undergo scattering at the boundary of a dense plasma blob. Based on the axial symmetry of the process, the plasma blob boundary
is evidently close to a circle. The electron scattering process at this boundary is displayed on the luminescent screen of the electron microscope (Muller's projector) as a diffraction pattern. To justify this approach, let us make some estimates. As is known, microparticle diffraction is most clearly noticeable when the de Broglie wavelength λ_dB of the microparticle is comparable with the distance between the scattering centers. This means that diffraction can be observed if the condition N_a ≳ λ_dB⁻³ holds. The mean de Broglie wavelength for electrons at TFE can be evaluated from the relationship (Ptitsin, 1996) λ_dB ≈ 4z_b, where z_b is the width of the potential barrier at the metal–vacuum boundary. Using the data of Modinos (1990) and Elinson (1974), we find that, according to the calculations of Section III.D, the concentration N_a in the MP layer for the W emitter must be of the order of (10¹⁸–10¹⁹) cm⁻³.
Another proof of this interpretation of the ring effect is that the suggested approach can give a satisfactory quantitative explanation of the known experimental fact (Elinson, 1974; Ptitsin, 1996; Krotevich, 1985) that the ring effect for W and Ta emitters is observed with a probability close to 1, while for Mo and Nb emitters the probability of observing the ring effect turns out to be 3–4 times lower than for W and Ta emitters. This seems to be caused by the fact that the intensity I_s of electron flux scattering on the MP layer boundaries depends on the atomic number A of the element in the periodic table in the following way:
   I_s ∝ m·e²·λ_dB²·(A − f) / [2h²·sin²(θ/2)]        (44)
where θ is the scattering angle and f is the atomic scattering amplitude. From this it follows that, in accordance with the atomic numbers of the emitter materials studied, the intensity ratios of the diffraction maxima agree well with the experimentally established probability of observing the ring effect.
Finally, the growth in resolution of the pulsed electron microscope (Muller's projector) in the pre-explosion phase of non-stationary electron emission may be adequately explained as follows. At the initial stage of MP layer formation, the electron flux through the anode surface of the vacuum diode is created both by the TFE electrons and by the electrons produced in the course of field ionization of evaporating neutrals. Therefore, the emission image of the emitting surface is evidently created by electron fluxes of various ''origin.''
To define the resolving power δ of Muller's projector in ''field ionization electrons,'' recall that the event of field ionization occurs at a short (≲100 nm) distance from the emitting surface. Therefore, to a first approximation, δ can be calculated from the known formula (Modinos,
1990) for the resolution of Muller's projector in ''field emission electrons'' (δ_FE):

   δ_FE = 4ζr₀·(E_t/eV₀)^{1/2}        (45)

where ζ is the image compression coefficient, E_t is the tangential component of the field-emission electron initial energy, and V₀ is the potential difference between the field emitter and the anode of Muller's projector. Using Equation (45) for δ_FI yields

   δ_FI ≈ δ_FE·(E_τ/E_t)^{1/2}        (46)

where E_τ is the tangential component of the initial energy of an electron produced by field ionization of a neutral. E_τ will be estimated from the Heisenberg uncertainty principle:

   (2m·E_τ)^{1/2}·[D + (z₂ − z₁)] ≈ ℏ/2        (47)

where D is the diameter of the neutral, and z₁ and z₂ are the turning point coordinates. From Equations (46) and (47) and the expression for E_t given in Modinos (1990), we obtain that for the W emitter at F ≈ 5 V/nm the resolution of Muller's projector ''in field ionization electrons'' is δ_FI ≈ 0.16·δ_FE. For a typical value δ_FE ≈ 2.5 nm (Modinos, 1990), the resolution in ''field ionization electrons'' will be about 0.45 nm; that is, in agreement with experimental data (Krotevich et al., 1986), the resolution of Muller's projector in the pre-explosion phase of emission turns out to be close to atomic resolution.
Note that the increase in resolution of Muller's projector under these conditions is observed only at local sites of the emitting surface adjacent to closely packed faces (Krotevich et al., 1986). This peculiarity of the emission image becomes quite understandable if we estimate the electron current density due to field ionization of evaporating neutrals in vacuum. Taking the field ionization probability equal to 1, one can easily obtain the field ionization electron current density from Equation (25). The appropriate calculations show that the field ionization electron current density J_FI at an emitter surface temperature T ≈ (2000–2500) K may reach (10⁶–10⁷) A/cm². These current density values are 1–2 orders of magnitude less than the mean (or effective) current density at the pre-explosion stage of non-stationary emission, which is close to 10⁸ A/cm², as follows from the preceding sections.
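The resolution estimate of Equations (46) and (47) can be sketched numerically. In the sketch below the ionization-zone width L = D + (z₂ − z₁) ≈ 1 nm and the tangential energy E_t ≈ 0.4 eV of field-emitted electrons are assumed values, not figures fixed by the text; only the resulting ratio should be compared with the quoted δ_FI ≈ 0.16·δ_FE.

```python
import math

hbar = 1.054571817e-34   # reduced Planck constant, J*s
m_e = 9.1093837015e-31   # electron mass, kg
eV = 1.602176634e-19     # J per electronvolt

# Equation (47): sqrt(2*m*E_tau) * L ~ hbar/2, with an assumed
# ionization-zone width L = D + (z2 - z1) ~ 1 nm.
L = 1.0e-9
E_tau = (hbar / 2.0)**2 / (2.0 * m_e * L**2)

# Equation (46): delta_FI ~ delta_FE * sqrt(E_tau / E_t), with an
# assumed typical tangential energy E_t ~ 0.4 eV of FE electrons.
E_t = 0.4 * eV
delta_FE = 2.5                          # nm, typical value (Modinos, 1990)
delta_FI = delta_FE * math.sqrt(E_tau / E_t)

print(f"E_tau    = {E_tau/eV*1000:.1f} meV")
print(f"delta_FI = {delta_FI:.2f} nm")
```

Under these assumptions the ratio comes out near 0.15 and δ_FI near 0.4 nm, close to the ≈0.45 nm near-atomic figure quoted above.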
However, since at the borders of closely packed faces, which have a high value of the work function, the TFE current density is rather low (see Equations (1) and (2)), and the temperature is evidently close to the average value over the entire surface, it is easy to see that it is precisely at those local sites of the surface that the conditions for higher resolution are fulfilled. It would be natural to assume that field ionization electron fluxes are also formed above surface areas with a lower work function than that of the closely packed faces; but since these fluxes are 1–2 orders of magnitude weaker than the TFE electron flux, it is impossible to distinguish the contribution of the field ionization electrons against ''the background'' of TFE electrons on the luminescent screen of Muller's projector without special techniques. So the ideas developed above, which suggest that at intense emission in strong electric fields an MP layer is formed at the MC tip surface, agree with experimental data.
Let us now turn to the analysis of some processes that follow from the established fact of MP formation at the MC tip surface. To analyze the processes occurring in the system pointed emitter–MP layer–vacuum gap of a diode, we start from the continuity equation for the current flowing through the diode:

   I_e = I_L = I_v        (48)

where I_e and I_L are the electric current intensities in the emitter and in the cathode drop layer (the so-called Langmuir layer), respectively, and I_v is the current intensity in vacuum at the MP–vacuum boundary.
Next we can write the following relationships:

   I_L = I_eT + I_n = I_eT + η·I_i = I_v        (49)

   I_i = J_i·S_e        (50)

   Γ_n ≥ χ·I_i/e        (51)

where I_eT is the electron current equal to the emission current from the metal surface into the MP; I_n is the component of the total electron current in the metal corresponding to neutralization of the ion flux incident on the metal surface; I_i is the ion current from the MP to the metal surface (its density J_i can be defined by an expression given, for example, in Molokovsky et al. (1991) and Krendel (1977)); η is the coefficient (or probability) of ion neutralization at the metal surface; S_e is the area of the emitting MC surface–MP boundary; Γ_n is the flux of neutrals from the metal surface; and χ is the ion accommodation coefficient.
Note that the phenomenological equations (48)–(51) describe a non-stationary process and, therefore, all the ''currents,'' as well as the accommodation and neutralization coefficients, are in general time dependent. The meaning of these equations at an arbitrary time moment seems straightforward
and does not require detailed comments. It should only be mentioned that the current component I_eT defines the electron current in the Langmuir layer; the mechanism of this electron emission, however, is not discussed here. The last equation in the above set reflects the fact that under these conditions the injection of substance into the MP layer takes place due to evaporation of neutrals and, therefore, during the MP lifetime the rate of neutrals generation must be no less than the rate of neutrals recombination.
Note that, according to the data of Kaminsky (1967) and Rakhovsky (1970), the neutralization probability for a metal ion interacting with a metal surface consisting of its native atoms is close to 1. However, the accommodation coefficient χ of such ions differs from unity and, according to various sources (Rakhovsky, 1970; Rayzer, 1987), may reach χ ≈ 0.5–1.
To find out the mechanism of electron emission at complete screening of the external (Laplace) field by the plasma SC, we will apply the known Langmuir-McCown equation (Rakhovsky, 1970; Rayzer, 1987):

   F_c² = (4/ε₀)·J_L·V_c^{1/2}·s·[ (1 − s)·(M/2e)^{1/2} − s·(m/2e)^{1/2} ]        (52)

where F_c is the field strength created at the metal surface by the SC of plasma ions and electrons; J_L = I_L/S_e; s ≡ J_eT/(J_eT + J_i) ≈ I_eT/(I_eT + I_i) = I_eT/I; V_c is the potential drop (or cathode drop) at the plasma–metal boundary; I is the total current through the vacuum diode; and M is the ion mass. Using the straightforward relationships J_i = (I − I_eT)/S_e and s = I_eT/I, the Langmuir-McCown equation is easily transformed into

   F_c² = (4V_c^{1/2}/ε₀S_e)·(I_eT/I)·[ (I − I_eT)·(M/2e)^{1/2} − I_eT·(m/2e)^{1/2} ]        (53)

Ignoring the second term in square brackets, which is of the order of (m/M)^{1/2} ≈ 10⁻³ of the first one, and solving the equation for I_eT at given I and F_c yields

   I_eT = (I/2)·[ 1 ± (1 − 4Ψ/I)^{1/2} ]        (54)

where Ψ ≡ ε₀S_e·F_c²·(e/8MV_c)^{1/2}. The condition that the radicand be non-negative implies that

   F_c ≤ [ (I/ε₀S_e)·(MV_c/2e)^{1/2} ]^{1/2}        (55)

According to Equation (54), two solutions I_eT′ and I_eT″ correspond to given values of I and F_c satisfying Equation (55). Formally, in view of the form of Equations (52) and (53), this evidently means that one and the same field F_c
value may be created by either of two possible combinations, (I_eT′, I_i′) or (I_eT″, I_i″), at a given total current I; in the general case I_eT′ ≠ I_eT″ and I_i′ ≠ I_i″. Based on general physical principles, it was shown in Rakhovsky (1970) that at the interaction of a dense quasi-neutral plasma with the metallic electrode (the cathode of a vacuum diode) the equality I_eT = I_i should hold. This implies that under these conditions only the doubly degenerate solution of Equation (53) for the variable I_eT exists, that is,

   I_eT′ = I_eT″ = I/2        (56)

   I_i′ = I_i″ = I/2,   s = 1/2        (57)

   F_c = [ (I/ε₀S_e)·(MV_c/2e)^{1/2} ]^{1/2}        (58)

Note that Equation (58) has quite a simple physical meaning: under such conditions the negative pressure of the ponderomotive forces of the SC field at the metal surface must be equal to the gas-kinetic pressure of the flux of ions accelerated in the cathode drop region up to the energy eV_c (ε₀F_c²/2 = e·n_i·V_c = J_i·(MV_c/2e)^{1/2}, where n_i is the ion concentration in the MP at the boundary of the cathode drop region). Equation (58) allows one to estimate the value of F_c from experimental data in real conditions, when the values of I, S_e, and V_c can be specified with sufficient accuracy. According to Ptitsin (1990) and Rayzer (1987), in the non-stationary emission mode the total current through a vacuum diode with a pointed cathode-emitter abruptly (in less than 0.1 μs) rises from its initial level to ≈1 A and then, over ≈0.1 μs, slowly increases or remains constant (the quasi-stationary stage). The process either ends in a transition to vacuum breakdown, if the vacuum gap is bridged by plasma, or the current rapidly drops to zero (a so-called current ''break'' or ''cut-off'') if the gap is relatively wide. In the electron Muller projector, wherein the emitter-to-anode distance is relatively large (about 1 cm), vacuum breakdown does not usually develop and the non-stationary process ends in the current break. Electron-microscopic studies of the emitter tip geometry upon completion of the current instability and break development show (Elinson, 1974; Mesyats et al., 1984) that the emitter tip turns out to be melted, with an average radius r ≈ (1.0–1.5) μm. In view of these results and using empirical values for the cathode drop (Rakhovsky, 1970), we obtain V_c ≈ 16 V, I ≈ 1 A, J ≈ (3–5) × 10⁶ A/cm², J_i = J_eT ≈ (1.5–2.5) × 10⁶ A/cm², and F_c ≈ (4.5–5.0) V/nm for the W emitter.
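Equation (58) can be evaluated directly from the numbers just quoted. The sketch below approximates the emitting MC surface–MP boundary as a sphere of the melted-tip radius, with r ≈ 1.25 μm chosen as an assumed value within the quoted 1.0–1.5 μm range.

```python
import math

eps0 = 8.8541878128e-12   # vacuum permittivity, F/m
e = 1.602176634e-19       # elementary charge, C
u = 1.66054e-27           # atomic mass unit, kg

M = 183.84 * u            # W ion mass, kg
I = 1.0                   # total diode current, A
V_c = 16.0                # cathode potential drop, V
r = 1.25e-6               # melted tip radius, m (assumed)
S_e = 4.0 * math.pi * r**2   # emitting area (sphere approximation)

# Degenerate solution (56)-(57): I_i = I_eT = I/2, so J_i = I/(2*S_e)
J_i = I / (2.0 * S_e)

# Equation (58) as a pressure balance: eps0*F^2/2 = J_i*sqrt(M*V_c/(2e))
F = math.sqrt(2.0 * J_i * math.sqrt(M * V_c / (2.0 * e)) / eps0)

print(f"J_i = {J_i*1e-4:.2e} A/cm^2")
print(f"F   = {F*1e-9:.2f} V/nm")
```

With these inputs J_i comes out near 2.5 × 10⁶ A/cm² and F near 4.7 V/nm, within the quoted (4.5–5.0) V/nm range.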
These estimates have a very important physical meaning: they quantitatively characterize the non-stationary electron emission process after the emitter tip is screened by the plasma layer. First of all, the estimates suggest that the electronic component of the total current in the Langmuir layer is due to thermal field electron emission from the metal, since the field F_c and the value of J_eT numerically satisfy Equation (2), which describes the TFE process. Besides, from the obtained estimates it follows, on the basis of thermal calculations and the concepts of TFE theory, that the values obtained for J_eT are not sufficient to heat the MC tip up to the melting point. This evidently means that the experimentally observed melting of the emitter tip upon explosive breakdown takes place either at the end of the abrupt total current growth or already at the quasi-stationary stage, owing to injection into the emitter substance of the concentrated energy flux carried by the ion current component in the Langmuir layer.
The transition from the stage of abrupt current growth to a quasi-stationary stage followed by current break seems to be connected precisely with the phase transition of the emitter substance from the crystalline to the liquid state. This statement can be justified qualitatively as follows. The crystal-liquid phase transition may lead to current break if such a transition results in an abrupt decrease of the field F_c, which is mainly defined by the ion SC in the Langmuir layer. A decrease of the ion concentration in the plasma is possible if, as a result of the phase transition, the flow of neutrals
   Γ_n ≈ n_s·(kT/h)·S_e·exp(−E_a/kT)        (59)
from the emitter surface is considerably reduced. Since the emitter temperature, the surface atom concentration, and the oscillation frequency change continuously at the phase transition, the flow of neutrals can be reduced only by a sharp increase in the binding energy E_a of the surface atoms. According to Frenkel (1958), the binding energy of surface atoms in a liquid is close to that of atoms located at intralattice sites. As applied to the problem studied, this means that after the transition to the liquid phase the physically isolated adatom states, loosely bound to the closely packed MC faces, disappear and, hence, evaporation of neutrals from the liquid metal surface requires much more energy. By our estimate, the binding energy of surface atoms of liquid W is about 6 eV. This estimate takes into account that after melting the excitation of the electron subsystem of the metal continues for some time (Ptitsin, 1996). At such binding energies the neutrals generation rate becomes negligible, and the plasma will decay due to surface and volume recombination.
To summarize the above, it is worth noting that, though the electronic component I_eT of the total current is defined by the TFE mechanism, the
process of non-stationary electron emission is not identical to, and not reducible to, TFE since, first, the bulk emitter current in the course of non-stationary emission is the sum of two components, the thermal-field one and the neutralization one, and the latter becomes numerically equal to the thermal-field component as a result of MP formation (see Equation (57)); and, second, owing to the interaction of the emitter substance with concentrated energy fluxes (due to the electron and ion components of the total current), the non-stationary emission is accompanied by a continuous phase transition (sublimation) of the condensed emitter matter into the state of ionized metallic vapor and then into the plasma state. As shown in Ptitsin (1996), the plasma ion interaction with the emitter substance accelerates the intense sublimation of the surface layer of the MC substance, which leads to expansion of the dense plasma blob (cathode flare plasma) into vacuum.
G. Non-Stationary Thermal Field Emission Current Kinetics

To obtain a more complete picture of the physical mechanism underlying the processes of non-stationary electron emission, let us also consider the process of current flow in vacuum. To describe this process it is necessary to find the form of the function I = I(V₀, t), where V₀ is the potential difference between the cathode and the anode of the vacuum diode. As the diode model we choose a spherical capacitor.
Based on the general concepts of plasma physics, the form of the instantaneous potential distribution φ(ξ, t) and of the self-consistent field strength F(ξ, t) in a spherical vacuum diode can be qualitatively described
by the curves shown in Figure 8. To define the function I = I(V₀, t), a non-stationary self-consistent problem for the Poisson equation was solved. To a first approximation it was assumed that the function characterizing the dependence of the electron velocity v on the potential, v = v(φ), is identical to that in the absence of the SC field. From the solution of the Laplace equation for R ≤ ξ ≤ R_a with the boundary conditions φ(R_a) = V₀ and φ(R) = V_R ≈ 0 (V_R ≪ V₀), it follows that

   v(ξ) = { (2e/m)·[ V₀·(R_a/(R_a − R))·(1 − R/ξ) − V_R ] + v₀² }^{1/2}        (60)

where v₀ is the initial velocity of the electrons.
Figure 8. Instantaneous self-consistent field potential φ(ξ, t) and strength F(ξ, t) distribution curves in a spherical vacuum diode in the course of microplasma blob expansion during the non-stationary high-current-density electron emission process. (H is the Langmuir layer width, R is the instantaneous coordinate of the emission boundary, r is the instantaneous value of the MC tip radius, and V_R is the instantaneous value of the potential φ(R, t); scale proportions along the coordinates are not observed.)
Then, in the general case, the problem can be stated in the spherical coordinate system as follows:

   d²φ/dξ² + (2/ξ)·dφ/dξ = I/[4πε₀ξ²·v(ξ)]        (61)

   φ(R) ≈ V_R,   F(R) = 0

The solution of Equation (61) will be sought in two steps: first we will find F(ξ, t) and then, after integrating the function found, we will determine φ(ξ, t). Rewriting Equation (61) in the form

   dF/dξ + (2/ξ)·F = ω·ξ^{−3/2}·(ξ − R)^{−1/2}        (62)
here

   ω ≡ (I/4πε₀)·[ (2e/m)·(V₀ − V_R)·R_a/(R_a − R) ]^{−1/2}

Upon integrating, we obtain

   F(ξ, t) = (ω/ξ²)·{ [ξ(ξ − R)]^{1/2} + R·ln[ (ξ^{1/2} + (ξ − R)^{1/2})/R^{1/2} ] }        (63)
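As a consistency check, Equation (63) can be verified numerically against the differential equation (62). The sketch below uses arbitrary dimensionless values of ω and R and compares the two sides of (62) at one point.

```python
import math

# Check that F(xi) of Eq. (63) solves the ODE (62),
# dF/dxi + (2/xi)*F = omega * xi^(-3/2) * (xi - R)^(-1/2),  F(R) = 0.
omega, R = 1.0, 1.0   # arbitrary positive values (dimensionless sketch)

def F(xi):
    # Equation (63)
    return (omega / xi**2) * (math.sqrt(xi * (xi - R))
                              + R * math.log((math.sqrt(xi) + math.sqrt(xi - R))
                                             / math.sqrt(R)))

def rhs(xi):
    return omega * xi**-1.5 / math.sqrt(xi - R)

xi, h = 2.0, 1e-6
lhs = (F(xi + h) - F(xi - h)) / (2.0 * h) + 2.0 * F(xi) / xi
rv = rhs(xi)
print(f"lhs = {lhs:.6f}, rhs = {rv:.6f}")   # the two sides agree
```

The boundary condition F(R) = 0 is also satisfied by construction, since both terms in braces vanish at ξ = R.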
Note that the last expression allows us to find an approximation to the sought relationship I = I(V₀, t) without the subsequent integration. This is because the process of instability development lasts at most ≈50 ns. During this time a plasmoid (cathode flare) moving at a velocity of ≈2 × 10⁶ cm/s (Elinson, 1974; Mesyats et al., 1984) expands to a size of about 1 mm. Therefore, for vacuum breakdown in Muller's projector, which is characterized by relatively large vacuum gaps (R_a ≈ 1 cm), it can be assumed that the self-consistent field strength near the anode surface at an arbitrary time moment t (after the onset of instability development) is, to a first approximation, equal to

   F(R_a, t) ≈ −[V₀/(R_a − R)]·(R/R_a)

This ''immediately'' yields an approximation for the unknown function I(V₀, t). A general analytical expression valid for any gap can be derived after integrating F(ξ, t). The integration, followed by rearrangements, gives the following expression for the time dependence of the anode current:
I(t) \approx 4\pi\varepsilon_0\sqrt{\frac{2e}{m}}\;\sqrt{V_a}\,(V_a - V_d)\left\{\sqrt{1 - \xi}\left[(2 - \xi)\ln\!\left(1 + \sqrt{1 - \xi}\right) - \left(1 - \frac{\xi}{2}\right)\ln\xi - \sqrt{1 - \xi}\right]\right\}^{-1}     (64)
in which, for convenience, the notation \xi \equiv R_d/R_a = vt/R_a is introduced, where v \approx 2 × 10⁶ cm/s is the velocity of the leading front of the plasma (Mesyats et al., 1984). Note that the functional relationship of Equation (64), according to which I \propto V_a^{3/2}, agrees well with the results of Bellustin (1939). Equation (64) describes a non-stationary emission process for the model problem (in the spherical geometry of a vacuum diode) and, hence, cannot be used directly for qualitative comparison with experimental data obtained for real diodes (Müller projectors). To make such a comparison, one should take into account that the full solid angle of emission in a diode with an actual electrode geometry is substantially less than 4\pi and approximately equals 2\pi(1 - \cos\theta), where \theta is the half-angle of the cone opening incorporating 90% of the entire electron flux. According to Elinson (1974) and Dyke and Dolan (1956), \theta \approx \pi/6. Another thing to be accounted for when refining Equation (64) is that the initiation of the non-stationary emission process and its further development in time depend on the initial field strength created at the emitter surface as a result of applying a potential difference V_a between the anode and the emitter of the vacuum diode. However, the magnitude of the voltage V* needed in a diode of real geometry to create an initial critical field F* \approx 5 V/nm is not equal to that in a spherical diode. The relationship between V* and V_a can be defined if a certain approximation of the actual pointed-emitter surface shape is specified. As shown by Dyke and Dolan (1956), the shape of a real emitter surface is very close to a hyperboloid of revolution, with a known expression for the so-called \beta-factor (or field factor). Assuming this approximation, we introduce the following substitution into Equation (64):
V_a = 2V^{*} \ln^{-1}\!\left(\frac{4R_a}{r_t}\right)     (65)
here V* is the voltage at which the instability process is initiated in a diode of real geometry. To change from the variable \xi to t, we use the formula t = \xi R_a/v. Figure 9 shows a kinetic curve of the current I = I(V*, t, R_a, r_t) calculated from Equation (64) with the appropriate corrections for V_a.
The following typical experimental values of the parameters were used: V* = 8 kV, R_a = 5 cm, and r_t = 300 nm. A schematic diagram of the different stages in the transition of thermal field electron emission to vacuum breakdown is presented in Figure 10. Figures 9 and 10 show a satisfactory agreement between theory and experiment.
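Two of the numbers involved here can be checked with elementary arithmetic. The sketch below (Python) evaluates the field-factor substitution as read here, V_a = 2V*/ln(4R_a/r_t), for the quoted parameters, and the distance covered by the plasma front in 50 ns at 2 × 10⁶ cm/s.

```python
import math

# Arithmetic checks for the quoted parameters. The formula for V_a is
# the hyperboloidal-tip field-factor substitution as read here.
V_star = 8e3       # V   (V* = 8 kV)
R_a = 5e-2         # m   (anode radius, 5 cm)
r_t = 300e-9       # m   (tip radius, 300 nm)
v_front = 2e4      # m/s (plasma front velocity, 2e6 cm/s)

# Equivalent spherical-diode voltage, V_a = 2 V* / ln(4 R_a / r_t):
V_a = 2.0 * V_star / math.log(4.0 * R_a / r_t)
print(V_a)                 # ~1.2e3 V

# Distance covered by the plasma boundary in 50 ns of expansion:
d_50ns = v_front * 50e-9
print(d_50ns)              # ~1e-3 m, i.e. about 1 mm
```

The second check reproduces the statement that the cathode flare expands to about 1 mm during the ~50 ns of instability development.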
Figure 9. Theoretical curve of the current kinetics in the spherical vacuum diode during the non-stationary high-current-density electron emission process (V* = 8 kV; R_a = 5 cm; r_t = 300 nm).
IV. Discussion and Conclusion

Thus the analysis has shown that the stationary TFE process naturally passes into a non-stationary emission process at high emission current densities (J ≥ 10⁷ A/cm²), which results in a roughly 10-fold increase in the total emission current (but not in its density!) in about 10⁻⁷ s. The abrupt rise of the current is accompanied by sublimation of the emitter material and dense plasma formation. The integral length of this non-stationary process, including a quasi-stationary stage, does not exceed 10⁻⁴ s (Rakhovsky, 1970). The "lifetime" of the non-stationary process is defined mainly by the time needed
Figure 10. Diagram of the different stages in the transition of TFE to vacuum breakdown, based on experimental data (Elinson, 1974). (The abrupt emission-current rise stage lasts about 100 ns; the duration of the other stages of this curve depends on the experimental conditions.)
for the crystal-to-liquid phase transition of the emitter tip substance. Note that a detailed calculation of the current kinetics and the lifetime requires solving a non-stationary two-temperature thermal problem, which at present could hardly even be stated correctly, because the interaction of intense, high-power-density (10⁷–10⁸ W/cm²) ion fluxes with matter has not yet been well studied. In fact, for example, the question of the interrelation between non-stationary thermal field electron emission and so-called "explosive emission" (EE) (Mesyats et al., 1984) still remains unclear. The studies carried out in this work have shown that non-stationary thermal field electron emission is a specific variant, or consequence, of the interaction between a concentrated energy flux and the emitter material, which takes place in a high electric field when, owing to the high-density (J ≥ 10⁷ A/cm²) TFE current flowing through the MC substance (the emitter), the rate of entry of the concentrated energy flux into the MC material exceeds the rate of its dissipation through heat conduction and surface
self-diffusion processes. An additional contribution to energy dissipation comes from the activated evaporation of native MC atoms; the subsequent ionization of the vapor atoms and the interaction of the resulting high-power-density (over 10⁷ W/cm²) ion flux with the emitter material lead to avalanche sublimation of the MC material and the simultaneous reproduction (generation) of dense plasma. The conduction current in the metal is produced both by TFE from the emitting surface, which increases with time, and by the neutralization of the plasma ion flow at the emitter surface. Electron emission into vacuum from the surface of the expanding MP follows a known mechanism (Molokovsky et al., 1991; Krendel, 1977). The conceptions offered for the physical mechanism of non-stationary thermal field electron emission also show that, contrary to the existing views of the EE mechanism (Elinson, 1974; Mesyats et al., 1984), the non-stationary process of electron emission is caused not by thermal instability and volume explosion of the emitter substance at high field electron emission current densities (J ~ 10⁸–10⁹ A/cm²), as believed earlier (Elinson, 1974; Mesyats et al., 1984), but rather is the result of a combination of the above-listed, interrelated secondary thermal field processes initiated at much lower TFE current densities (J ~ 10⁷ A/cm²). Therefore, another very important question arises, namely: why is it possible to attain such high current magnitudes (up to 1 A) and, respectively, high current densities (up to 10⁸ A/cm² and above (Fursey et al., 1984)), with the current jump simultaneously decreasing from about 10-fold to zero with decreasing pulse duration \tau? To answer this question we refer to the experimental data of Fursey et al. (1984). According to these data, at small \tau the increase in the total emission current (which is considered to be the thermal field electron emission current) up to 1 A is accompanied by a twofold increase of the height of the voltage pulse V_a.
This means that, provided the emission mechanism remains purely thermal-field, the field strength F also increases twofold and at a current of 1 A is equal to 10 V/nm. At such a field strength at the emitter tip apex, the drop of the field strength at the emitter tip periphery down to (4–5) V/nm, for the conventional hyperbolic or parabolic approximations of the MC tip surface shape, occurs at \theta \approx \pi/6 (Gor'kov et al., 1962). Then, according to Equations (2.7) and (2.8) in Ptitsin (1996), the emitting area S_em is considerably greater than \pi r_t^2 and, hence, the current densities calculated (Fursey et al., 1984) from the formula J \approx I/(4\pi r_t^2) turn out to be overestimated about 5 times. However, the experimental results of Zhukov et al. (1988), where the ring effect in the nanosecond range of \tau was observed, show that an MP layer appears also at these values of \tau. The MP layer alters the emitter "geometry" so that it approaches a so-called approximation shape: a sphere-hyperboloid, or a sphere on a cone, with a certain degree (\delta) of closeness to the spherical
approximation. At \delta \approx 0.25 the angle increases up to \theta \approx 5.6\,\pi/6 and, hence, the area S_em of the real emitting surface is much greater than \pi r_t^2. The calculations show that S_em \approx (10–15)\,\pi r_t^2 and above. So the estimates made indicate that for typical pointed (W) emitters of radius 0.1 µm ≤ r_t ≤ 0.5 µm, the maximum total current density does not exceed 10⁸ A/cm² and, hence, the maximum TFE current density at the emitter–MP boundary does not exceed 5 × 10⁷ A/cm² over a wide range of \tau. Another characteristic feature of the non-stationary electron emission is that, with decreasing \tau, the delay time t_d at which the intense sublimation of the MC emitter material begins also decreases, from 10⁻⁴ s (and above) to a few nanoseconds (Elinson, 1974; Mesyats et al., 1984). In the context of the phenomenological theory developed here, such a significant variation of t_d is determined, as mentioned above, by the time taken for dense plasma formation at various initial values of F and T owing to the flowing TFE current. The estimates show that the main contributions to t_d come from the adatom lifetime (according to Frenkel) on closely packed emitter planes and also from the electron–ion relaxation time in the dense plasma. When the initial temperature T changes from 1800 K to 3600 K, the lifetime varies from 10⁻⁵ s to 10⁻⁹ s (Ptitsin, 1996). To summarize, it may be concluded that the conceptions of the non-stationary thermal field electron emission mechanism developed in this work essentially complement and extend the existing approaches to explosive breakdown, or so-called EE (Elinson, 1974; Mesyats et al., 1984).
In our opinion, the conceptions of non-stationary thermal field electron emission, explosive breakdown, and EE are conceptually similar in that, according to different authors, the non-stationary emission process in a vacuum diode with a pointed metal emitter is initiated in high electric fields when the rate of entry of the concentrated energy flow into the emitter material exceeds its dissipation rate. In the present work, in contrast to Elinson (1974) and Mesyats et al. (1984), it has been taken into account that at high-current-density TFE the energy dissipation takes place not only through the heat-conduction mechanism, but also through self-diffusion and the activated evaporation of native atoms from the emitter surface. Analysis of these phenomena and their consequences has, first, shown that the non-stationary electron emission can be initiated already at an initial current density of about 10⁷ A/cm², instead of (10⁸–10⁹) A/cm² as supposed earlier, and, second, helped to explain the mechanism of dense plasma formation and to abandon the known concepts of volume explosion of the emitter tip. Owing to this "historically established" state of affairs, one and the same physical process of non-stationary high-current-density electron emission
has different names. In fact, it is called "non-stationary thermal field emission" here and "explosive breakdown" (Dyke and Dolan, 1956) or EE elsewhere (Nonheating Cathodes, 1974; Mesyats et al., 1984). Perhaps it would be better to give this process of non-stationary high-current-density electron emission a common name, for example, "phase transition emission," which most closely characterizes its physical meaning and mechanism.
Acknowledgments

The author is grateful to Mrs. G. D. Gelever and Mrs. G. G. Levina for their valuable assistance in this work. The author also wishes to thank the Russian Foundation for Basic Research for financial support of this work (Grant No. 98-02-18101).
References

Aizenberg, N. B. (1964). "About the Influence of the Space Charge on the Form of the Current-Voltage Characteristics of the Field Emission Cathodes." Radiotekhnika i Elektronika 9, 2147 (in Russian).
Aizenberg, N. B. (1954). "About the Role of the Space Charge in the Spherical Electron Projectors." Z. Tekh. Fiz. 24, 2079–2082 (in Russian).
Barbour, J. P., Charbonnier, F. M., Dolan, W. W., Dyke, W. P., Martin, E. E., and Trolan, J. K. (1960). "Determination of the Surface Tension and Surface Migration Constants for Tungsten." Phys. Rev. 117, 1452.
Barbour, J. P., Dolan, W. W., Trolan, J. K., Martin, E. E., and Dyke, W. P. (1953). "Space-Charge Effects in Field Emission." Phys. Rev. 92, 45.
Bell, A. E., and Swanson, L. W. (1979). "Total Energy Distribution of Field Emitted Electrons at High Current Density." Phys. Rev. B 19, 3353.
Bellustin, S. V. (1939). "To the Theory of Current in Vacuum. III. Spherical Electrode Case." Z. Experimentalnoi i Teoreticheskoi Fiziki 9, 857 (in Russian).
Boersch, H. (1954). "Experimentelle Bestimmung der Energieverteilung in thermisch ausgelösten Elektronenstrahlen." Z. Phys. 139, 115.
Christov, S. G. (1966). "General Theory of Electron Emission from Metals." Phys. Stat. Sol. 17, 11.
Dolan, W. W., Dyke, W. P., and Trolan, J. K. (1953). "The Field Emission Initiated Vacuum Arc. II. The Resistively Heated Emitter." Phys. Rev. 91, 1054.
Drawin, H. W. (1961). "Zur formelmäßigen Darstellung der Ionisierungsquerschnitte gegenüber Elektronenstoß." Z. für Physik 164, 521.
Drechsler, M. (1988). "Microscopy of the Thermal Roughening of Crystal Faces." Journ. de Physique, Coll. C6, suppl. No. 11, Tome 49, C6-87.
Dyke, W. P., and Dolan, W. W. (1956). "Field Emission." In Advances in Electronics and Electron Physics, Vol. 8 (Ed. L. Marton). New York: Academic Press.
Dyke, W. P., and Trolan, J. K. (1953). "Field Emission: Large Current Densities, Space Charge and the Vacuum Arc." Phys. Rev. 89, 799.
Fowler, R. H., and Nordheim, L. (1928). "Electron Emission in Intense Electric Fields." Proc. Roy. Soc. Ser. A 119, 173.
Frenkel, Ya. I. (1958). "On the Surface Particle Walk in Crystals with Natural Roughness of Faces." In Collection of Selected Papers by Ya. I. Frenkel, Vol. 2. Moscow–Leningrad: Publ. USSR Ac. Sciences.
Fursey, G. N., Ptitsin, V. E., and Krotevich, D. N. (1984). "Spontaneous Migration of the Surface Atoms at Top Current Densities of Field Emission Initiating Vacuum Breakdown." Proc. XI Intern. Symp. on Discharges and Electrical Insulation in Vacuum 1, 69.
Fursey, G. N., Zhukov, V. M., and Baskin, L. M. (1984). "Limiting Values of Field Emission Current Density and Pre-Explosion Effects." In High Emission Current Electronics (Ed. G. A. Mesyats). Novosibirsk: Nauka Publ., Siberian Branch (in Russian).
Gadzuk, J. W., and Plummer, E. W. (1973). "Field Emission Energy Distribution." Rev. Modern Phys. 45, 487.
Geguzin, Ya. E., and Koganovsky, Yu. S. (1984). Diffusion Processes at the Crystal Surface. Moscow: Energoatomizdat (in Russian).
Glazanov, D. V., Baskin, L. M., and Fursey, G. N. (1989). "Kinetics of Pulsed Heating of Pointed Field Emission Cathodes of Real Geometry by High Density Emission Current." Zhurnal Tekhnicheskoi Fiziki 59, 60 (in Russian).
Gor'kov, V. A., Elinson, M. I., and Yakovleva, G. D. (1962). "Theoretical and Experimental Studies of Pre-Breakdown Phenomena in Field Electron Emission." Radiotekhnika i Elektronika 7, 1501 (in Russian).
Hirth, J., and Pound, G. (1957). "Evaporation of Metal Crystals." J. Chem. Phys. 26, 1216.
Ion Bombardment Sputtering of Solids (1984). Issue 1 (Ed. R. Behrisch). Moscow (in Russian).
Kaganov, M. I., Lifshits, I. M., and Tanatarov, L. V. (1956). "Relaxation between the Electrons and the Phonons." Zhurnal Experimentalnoi i Teoreticheskoi Fiziki 31, 232 (in Russian).
Kaminsky, M. (1967). Atomic and Ionic Collisions at the Metal Surface. Moscow: Mir, 338 (in Russian).
Knauer, W. (1981). "Energy Broadening in Field Emitted Electron and Ion Beams." Optik 59, 335.
Kompaneets, A. S. (1959). "About the Influence of the Space Charge on the Field Emission." Dokladi Akademii Nauk SSSR 128, 1160 (in Russian).
Krendel, Y. E. (1977). Plasma Electron Sources. Moscow (in Russian).
Krotevich, D. N. (1985). Ph.D. Degree Thesis, All-Union Research Center for Surface and Vacuum Properties Studies, Moscow (in Russian).
Krotevich, D. N., Ptitsin, V. E., and Fursey, G. N. (1986). "Observation of Fine Structure of the Restructured Microcrystal Surface by Pulsed Field Emission Microscopy." Fizika Tverdogo Tela 28, 3722 (in Russian).
Krotevich, D. N., Ptitsin, V. E., and Fursey, G. N. (1985). "Spontaneous Restructurization of the Field Emission Cathode at the Ultimate Field Emission Current Density Take-off." Z. Tekh. Phys. 55, 625 (in Russian).
Landau, L. D., and Lifshits, E. M. (1982). Electrodynamics of Continuous Media. Moscow: Nauka (in Russian).
Landau, L. D., and Lifshits, E. M. (1973). Mechanics. Moscow: Nauka (in Russian).
Landau, L. D., and Lifshits, E. M. (1976). Statistical Physics, Pt. 1. Moscow: Nauka (in Russian).
Levine, P. H. (1962). "Thermoelectronic Phenomena Associated with Field Emission." J. Appl. Phys. 33, 582.
Lewis, T. J. (1956). Phys. Rev. 101, 1694.
Martin, E. E. (1960). "Research on Field Emission Cathodes." Air Develop. Div., Ohio, Tech. Rep. No. 59-20 (AD-272760), Field Emission Corp., McMinnville, Oregon.
Mesyats, G. A., and Proskurovsky, D. I. (1984). Pulsed Electric Discharge in Vacuum. Novosibirsk: Nauka (in Russian).
Mitterauer, J., Till, R., and Fraunschiel, E. (1975). "The Temperature of Field Emitting Surface." Proc. XII Int. Conf. on Phenomena in Ionised Gases, Eindhoven, 1975, Contr. Papers, Amsterdam, Part 1, p. 249.
Modinos, A. (1990). Field, Thermionic and Secondary Electron Emission Spectroscopy (Ed. G. N. Fursey). Moscow: Nauka.
Molokovsky, S. I., and Sushkov, A. D. (1991). Intense Electron and Ion Beams. Moscow: Energoatomizdat, p. 35 (in Russian).
Muller, E. W., and Tsong, T. T. (1972). Field Ion Microscopy. Moscow: Metallurgy (in Russian).
Murphy, E. L., and Good, R. H. (1956). "Thermionic Emission, Field Emission, and the Transition Region." Phys. Rev. 102, 1464.
Nakamura, S., and Kuroda, T. (1969). "On Field Evaporation and Forms of a bcc Metal Surface Observed by a Field Ion Microscope." Surf. Sci. 17, 346.
Nonheating Cathodes (1974). (Ed. M. I. Elinson). Moscow: Sov. Radio (in Russian).
Nottingham, W. B. (1941). Phys. Rev. 59, 907.
Primary Processes of Crystal Growth (1959). Coll. Papers (Eds. G. G. Lemmlein and A. A. Chernov). Moscow: Publ. House of Foreign Literature (in Russian).
Ptitsin, V. E. (1990). "On a Mechanism of Vacuum Breakdown Initiated in a Point-Cathode Diode." Proc. XIV Int. Symp. on Discharges and Electrical Insulation in Vacuum, Santa Fe, USA, p. 77.
Ptitsin, V. E. (1990). "Surface Diffusion and Prebreakdown Phenomena." Proc. XIV Int. Symp. on Discharges and Electrical Insulation in Vacuum, Santa Fe, USA, p. 269.
Ptitsin, V. E. (1991). "Instability of Thermal Field Electron Emission." Surface Science 246, 373.
Ptitsin, V. E. (1992). "Atom Desorption Effects on Electron Emission in Intense Electric Fields." IAI RAS (in Russian).
Ptitsin, V. E. (1992). "To the Problem of the Vacuum Breakdown." Pisma v Zhurnal Experimentalnoi i Teoreticheskoi Fiziki 55, 325 (in Russian).
Ptitsin, V. E. (1993). "Instability of the Metal Microcrystal Surface at Intense Electron Emission." J. Vac. Science and Technology A 11(5), 2447.
Ptitsin, V. E. (1996). "Thermal Field Processes Activated by Exposure of Condensed Matter to Strong Electric Fields and Concentrated Energy Fluxes." Doct. Sci. Degree Thesis, Inst. Analytical Instrumentation, Russian Academy of Sciences, St. Petersburg.
Ptitsin, V. E. (1996). "Thermal Field Processes Activated by Exposure of Condensed Matter to Strong Electric Fields and Concentrated Energy Fluxes." Autoreview of Doct. Sci. Degree Thesis, Inst. Analytical Instrumentation, Russian Academy of Sciences, St. Petersburg.
Ptitsin, V. E., and Koltsov, S. N. (1998). "Calculation of Field Strengths Caused by Emitted Electrons at the Field Emission Cathode Surface Using the Imaging Technique." Izvestia Akademii Nauk Russian Fed. (Fiz. Ser.), No. 10, p. 1991 (in Russian).
Ptitsin, V. E., Komyak, N. I., and Koltsov, S. N. (1998). "Emitted Electron Space Charge Effect on Thermal Field Emission." Doklady Akademii Nauk Russian Fed., No. 3 (in Russian).
Rakhovsky, V. I. (1970). Physical Bases of Electrical Current Switching in Vacuum. Moscow: Nauka (in Russian).
Rayzer, Yu. P. (1987). Physics of Gas Discharge. Moscow: Nauka (in Russian).
Shrednik, V. N., Pavlov, V. G., Rabinovich, A. A., and Shaikhin, B. M. (1974). "Intense Electric Field and Heating Effects on Metallic Tips." Izvestia AN SSSR (Phys. Ser.) 38, 296 (in Russian).
Slivkov, I. N. (1986). High-Voltage Processes in Vacuum. Moscow: Energoatomizdat (in Russian).
Smirnov, V. I. (1969). Course of Higher Mathematics, Vol. 3. Moscow: Nauka (in Russian).
Sokolskaya, I. L. (1956). "Surface Migration of the W Atoms in the Electric Field." Z. Techn. Fiz. 26, 1177 (in Russian).
Swanson, L. W., and Bell, A. E. (1973). In Advances in Electronics and Electron Physics 32, 193 (Ed. L. Marton). New York: Academic Press.
Swanson, L. W., Crouser, L. C., and Charbonnier, F. M. (1966). "Energy Exchanges Attending Field Electron Emission." Phys. Rev. 151, 327.
Swanson, L. W., and Crouser, L. C. (1967). "Total Energy Distribution of Field-Emitted Electrons and Single-Plane Work Functions for Tungsten." Phys. Rev. 163, 622.
Tunneling Phenomena in Solids (1973). (Ed. Ya. I. Perel). Moscow: Mir (in Russian).
Vibrans, G. E. (1964). "Vacuum Voltage Breakdown as Thermal Instability of the Emitting Protrusions." J. Appl. Phys. 35, 2855.
Wang, S. C., and Tsong, T. T. (1982). "Field and Temperature Dependence of the Directional Walk of Single Adsorbed W Atoms on the W(110) Plane." Phys. Rev. B 26, 6470.
Wood, R. W. (1897). "A New Form of Cathode Discharge and the Production of X-Rays, together with Some Notes on Diffraction." Phys. Rev. 5, 1.
Young, R. D. (1959). "Theoretical Total-Energy Distribution of Field-Emitted Electrons." Phys. Rev. 113, 110.
Zhdanov, V. P. (1988). Physicochemical Surface Processes. Novosibirsk: Nauka (in Russian).
Zhukov, V. M., and Polezhaev, S. A. (1988). "Changing of the Pointed Emitter Surface in Nanosecond Electric Fields." Radiotekhnika i Elektronika 33, 2360 (in Russian).
Zhukov, V. M., and Polezhaev, S. A. (1989). "Evolution of the Microcrystal Surface at the Point Tip in the Thermal Field Environment." Z. Tekhn. Fiz. 59, 130 (in Russian).
Ziman, J. (1962). Electrons and Phonons. Moscow: Publ. House of Foreign Literature (in Russian).
Appendix

About the Feasibility of Creating a High Brightness and Angular Emission Intensity Thermal Field Cathode for Electron "Quasi-Lasers"

The inferences drawn on the mechanism of emission instability in high electric fields have helped, first, to identify the physical factors limiting the emission capacity of conventional thermal field pointed emitters, which are usually fabricated by electrochemical etching of wires made of refractory transition metals (W, Mo, Ta, Nb), and, second, to offer a conceptually new nanotechnology for making pointed emitters with unique electron-optical characteristics: a brightness of up to 10⁹ A/(cm²·sr) and an angular emission intensity of 10⁻² A/sr. Based on the conceptions developed, it has become possible to formulate the physical conditions which should be met by an "ideal" emitter (electron source) for an electron "quasi-laser" (EQL). These conditions are as follows.

1. The emitting surface of a pointed emitter should be a closely packed MC plane. This is necessary to minimize the surface concentration of native atoms of the emitter substance, which, at high emission current densities, go from
tightly bound intralattice states to loosely bound surface states, that is, adatom states, owing to self-heating of the MC tip and excitation of the electron subsystem of the subsurface layer. A minimum surface concentration of atoms in adatom states minimizes the surface self-diffusion fluxes and the activated evaporation of adatoms into vacuum. The latter provides the maximum stability of emission, owing to the reduction of emission current fluctuations (noise) and of the counterflow of ions bombarding the emitting surface.

2. The mean binding (or cohesion) energy of the atoms forming the emitting surface should be as high as possible. This condition also implies minimum self-diffusion and activated-evaporation flows, thus ensuring a stable geometry of the emitting surface and, hence, the long-term stability of the electron-optical parameters of the cathode-emitter.

3. The work function of the emitting surface should be as low as possible. This is necessary to keep the stationary TFE from passing into an unstable stage up to the maximum possible (or limiting) values of the emission current density J_max. In particular, our calculations have shown that reducing the work function of the emitting surface from 4.5 eV (for W) to 2.8 eV (for ZrO/W(100)) results in "shifting" the J_max value from 10⁷ A/cm² to 10⁸ A/cm².

4. Thermal calculations relating to emitter tip self-heating by the high-density emission current have shown that J_max depends also on the curvature radius of the emitting surface. The value of J_max was calculated to be on the order of 10⁸ A/cm², with the surface curvature radius being comparable to the characteristic electron mean free path with respect to electron-phonon scattering.

It has been found that the above combination of properties of an "ideal" electron emitter for an EQL is exhibited by ZrO/W(100) thermal-field emitters made using special emitting-surface-forming techniques.
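The sensitivity to work function invoked in condition 3 can be illustrated with the elementary Fowler-Nordheim expression for the field-emission current density, J = a F²/φ · exp(−b φ^{3/2}/F). This is a standard textbook formula without image-force corrections, not the author's calculation, and the field value used below is an assumption:

```python
import math

# Elementary Fowler-Nordheim current density (no image-force correction).
# Constants a and b are the standard elementary FN values.
A_FN = 1.541434e-6    # A eV / V^2
B_FN = 6.830890e9     # eV^-3/2 V / m

def j_fn(field_V_per_m, phi_eV):
    """Fowler-Nordheim current density estimate, A/m^2."""
    return (A_FN * field_V_per_m**2 / phi_eV) * math.exp(
        -B_FN * phi_eV**1.5 / field_V_per_m)

F = 5e9                 # 5 V/nm (assumed illustrative field)
j_w = j_fn(F, 4.5)      # clean W
j_zro = j_fn(F, 2.8)    # ZrO/W(100)
print(j_zro / j_w)      # lowering phi raises J by roughly three orders
```

The exponential dependence on φ^{3/2} is what lets a modest work-function reduction shift the attainable current density by orders of magnitude at a fixed field.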
Omitting the details of the nanotechnology developed, we only note that it is based on known data according to which, under thermal-field rearrangement in an intense electric field, it is possible to facet the original rounded-smoothed W MC tip in a controllable and reproducible manner, as well as on the fact that thermal-field emission electrons can be confined in small solid angles by means of selective adsorption of Zr atoms on closely packed W(100)-type faces. In view of the aforesaid, the physical-nanotechnological principle of the new technique for enhancing the emission capacity, which we called the "dual localization" of emission, consists of the successive application of two known localization technologies. This was implemented as follows. The first step consisted in the rearrangement of the W(100) MC in an intense electric field, which lasted until a tetrahedral angle formed in the (100)
direction. It was followed by the deposition of a Zr monolayer on the facetted MC surface with a calibrated molecular gun. The thermal-field ZrO/W(100) cathode thus fabricated exhibited, in the stationary mode of TFE, the unparalleled brightness and angular-emission-intensity values claimed above. Other electron-optical characteristics of the cathodes, such as the half-width of the emission-electron energy distribution and the noise spectral power density, were measured at various stages of the ZrO/W(100) cathode development in automated ultrahigh-vacuum units. The measurements have shown that the above characteristics substantially depend on the thermal-field conditions of the ZrO/W(100) cathode operation. The optimal conditions were found to be those of the Schottky emission mode.

Emission and Electron-Optical Parameters of the Thermal Field ZrO/W(100) Cathode with High Brightness and Angular Emission Intensity

1. Brightness in the stationary mode of emission: up to 10⁹ A/(cm²·sr)
2. Angular emission intensity in the stationary mode of emission: up to 10 mA/sr
3. Total current in the stationary mode of emission: up to 5–10 mA
4. Half-width at half-height of the emitted-electron energy distribution in the Schottky emission mode: 1.5 eV
5. Operating temperatures: 1500–1800 K
6. Emission current stability: 1 %/hr
7. Operating vacuum: 10⁻⁷ Pa
8. Service life: 2000 hr
So the theoretical and experimental results obtained have demonstrated the theoretical and practical feasibility of creating electron emitters providing electron flow densities comparable to the photon densities reached with high-power laser sources. To solve the problem of forming high-power-density submicron electron probes from the electron fluxes emitted by ZrO/W(100) cathodes, a versatile multilens electron-optical system has been developed, whereby it is possible to find an optimal aperture and to correct for the aberrations of the lenses in the electron gun in accordance with the probe current, electron energy, and working distance to the object. Based on calculations of the aberration coefficients (AC) of a lens system with a virtual intermediate electron-source image, the possibility has been demonstrated of building an electron gun with stepwise correction of aberrations, wherein each preceding lens reduces the AC of the subsequent lenses. By combining double-pole and single-pole magnetic lenses with unarmored coils and electrostatic
lenses, it is possible to obtain a versatile lens system with minimum aberrations for the entire range of probe currents, electron energies, and working distances to the object. The calculation of electron beam power densities in an electron gun with the developed thermal field cathodes shows that, at quite attainable average AC values of about 0.5 cm and probe sizes of 0.1 µm, the power density values at the specimen surface are equal to (10⁶–10⁷) W/cm². Further optimization of the electron gun for the EQL would result in even higher power density levels, up to 10⁸ W/cm² and over. The numerically calculated estimates of the attainable electron-optical parameters of such electron guns with a thermal field ZrO/W(100) cathode are given below.

Electron-Optical Parameters of the Electron Gun for EQL

1. Electron probe power density range on the exposed surface: 10⁶–10⁷ W/cm²
2. Electron probe energy range on the exposed surface: 0.2–5.0 keV
3. Electron probe diameter range on the exposed surface: 2–1000 nm
4. Electron probe minimum diameter: 2 nm
5. Beam current: 10⁻⁸–10⁻⁶ A
6. Working distance: 0.6–5 cm
7. Residual gas pressure in the cathode region: 10⁻⁷ Pa
8. Overall dimensions of the electron gun (diameter × length): 5 × 14 cm
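The probe power density follows from elementary arithmetic, P/A = VI/(π(d/2)²). In the sketch below (Python), the beam voltage and current are illustrative assumptions chosen within ranges of the kind listed above, not the table's own figures:

```python
import math

# Rough power-density arithmetic for a focused electron probe.
# All values below are illustrative assumptions.
V = 5.0e3        # V   (beam energy 5 keV)
I = 1.0e-6       # A   (assumed probe current)
d = 0.1e-6       # m   (0.1 um probe diameter)

area = math.pi * (d / 2.0)**2            # spot area, m^2
p_density_W_per_cm2 = V * I / area / 1e4 # convert W/m^2 -> W/cm^2
print(p_density_W_per_cm2)               # ~6e7 W/cm^2
```

A milliwatt-scale beam focused into a 0.1 µm spot thus already reaches tens of MW/cm², which is the scale of power density under discussion.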
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 112
Theory of Ranked-Order Filters with Applications to Feature Extraction and Interpretive Transforms

BART WILBURN

University of Arizona, Optical Sciences Center, Tucson, Arizona
I. Introduction 233
II. Statistical Approach to Ranked-Order Filters 235
III. Mathematical Logic Approach to Ranked-Order Filters 241
   A. Logical Construction 243
   B. Logical Investigation 247
   C. Two-Dimensional Analysis 255
   D. The Grammar of (L) Fixed-Point Root Combinations 268
   E. Oscillating Roots 282
   F. Octagonal, Hexagonal, and 3D Filters 292
IV. A Language Model Based on Ranked-Order Filters 307
   A. The Necessity and Possibility of Interpretive Transforms of Imagery 308
   B. Satisfaction of a Propositional Language System 314
   C. Reflections 323
   D. Ontological Considerations 325
V. Conclusions 331
References 332
I. Introduction

In every discipline there are objects and constructs of intellect that are almost trivially simple in form, and yet are fascinating because they exhibit complex behavior. The discipline of mathematics is the host of many such constructs, and the ranked-order filter is one of them. The median window filter is the most common manifestation of the ranked-order filter, and it is extraordinarily simple in construction. However, an understanding of its behavior, especially in two dimensions, has been slow in coming. Indeed, for lack of understanding, the median window filter has until recently fallen from grace and been regarded as a minor tool of limited use. The purpose of this essay is to shed some light on the behavior of the general class of ranked-order filters exemplified by the median window filter. An additional purpose of this essay is to suggest some rather promising applications of this understanding for feature extraction from imagery, and perhaps also for linking feature extraction to automated image interpretation, or artificial intelligence.
BART WILBURN
We have known about ranked-order filters in various forms for a long time, the best-known form being the median window filter. These filters take various names, such as stack filters, median, minimum, or mean-square windows, and so on. The commonality of these filters, explaining the term "ranked-order," is that they all determine the filter output according to rank in a sorted list of the data, or by a function of a sorted list of the data. The data in the sorted list are those selected by a window as it moves incrementally along a stream of data. Nothing could be simpler, yet we may wonder about all the attention given to them in this and many previous studies. The simplicity of the median window filter belies the elegant insight of Tukey (1970), who introduced it as a simple scheme for presenting the essential characteristics of data that have meaning. He referred to it as the "box" method of sampling (or presenting) the essential information-bearing characteristics of a set of data. The underlying philosophy of the method was the assumption that the meaning of a set of data was conveyed by the relationship of the data, one to another, in its context. Following this introduction, the "box" method was extended to become the "repeated median filter," or RMF. This underlying philosophy seems to have been largely overlooked, and the filter was popularly applied to sequential data for removing spiky noise while preserving edge gradients, that is, the features regarded as important. The median window indeed does this, but this application ignores its other, more interesting and sometimes vexing, characteristics associated with preserving edge gradients that became apparent after its introduction. Those characteristics were the phenomena of "roots," or patterns of data that are transparent to the filter. We will explore the roots and see how they may be applied to the problem of feature extraction without knowing the feature a priori.
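The windowed rank selection just described is easy to state concretely. The following sketch (illustrative only, not code from this chapter) slides a window of length N along a data stream and emits the element of a chosen rank from each sorted window; rank k of N = 2k + 1 gives the median window, rank 0 the minimum window:

```python
def ranked_order_filter(x, N, rank):
    # Slide a window of length N along x; emit the element of the given
    # rank (0 = minimum, N - 1 = maximum) from each sorted window.
    return [sorted(x[i:i + N])[rank] for i in range(len(x) - N + 1)]

x = [3, 1, 4, 1, 5, 9, 2, 6]
print(ranked_order_filter(x, 3, 1))   # median window, N = 3
print(ranked_order_filter(x, 3, 0))   # minimum window, N = 3
```

Note that the end samples are simply not emitted here; how the boundaries are treated varies among implementations.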
This author, at least, regards the possibility of this application as what distinguishes the median window from other filters, many of which are superior in noise-reduction capability. This chapter is divided into two parts: development of the theory of the filter, and then applications of the filter. Both are derived from other works (Wilburn, 1998a, 1998b), with some new developments not previously published. The development of the theory is approached from two perspectives, statistical and logical. The statistical analysis is presented in Section II and is concerned with the probability distribution of the filtered data and the associated signal-to-noise ratio (SNR). As noted, the filter has not been widely used for noise reduction, and the SNR theory of the filter has not been fully described. Nevertheless, the SNR theory is offered here for completeness. However, it is the logical theory that opens the door to our understanding of how the filter functions in a way that enables us to use it as a tool based on the morphology of the roots of
THEORY OF RANKED-ORDER FILTERS
the filter in one, two, and possibly three dimensions. The logical analysis of the filter is much richer in content and leads to the logical theory of the ranked-order filter presented in Section III.

II. Statistical Approach to Ranked-Order Filters

The roots of the filter are a relatively small number of patterns of data that satisfy a relational structure and are transparent to the filter. We will find that because these roots are defined by a relational structure, the co-joining, or conjunction, of them is similarly constrained by a relational structure. This understanding suggests that we may constrain the filter to explore applications such as feature extraction and the notion of automated image interpretation, and that we may design the filter to achieve other desired outcomes in signal processing.

The statistical approach to the ranked-order filter, whether formulated by a distribution function or by order statistics, has been the approach of most analyses of the filter prior to this one. The method of order statistics led to a description of the 1D roots, both fixed-point and oscillating, primarily presented for bi-valued data, but it fell short of a functional understanding of the filter sufficient to control it for detection of roots. The method of order statistics will be referred to again in Section III when we address the roots of the filter. The formulation of the filter as a multinomial distribution function leads to a solution for estimating the signal-to-noise ratio (SNR). As a practical matter, however, estimating the SNR has not been a subject of concern in most applications of the median window filter. This is because most applications of the filter have been for morphological effects, for example, to remove spikes or to preserve edges, and in those cases distortion, rather than SNR, was the matter of concern.
Nevertheless, the SNR solution is interesting in its own right mathematically, and is given here for completeness and because the method may find utility in other problems that can be presented in a similar form.

As remarked in the foregoing, the ranked-order (RO) filters derive their name from the datum output of the filter being a member of an ordered subset. We may express this somewhat more rigorously: the output datum of the filter is a member of an ordered subset (x_(1), . . . , x_(N)) of a set of input data {x_i}; {x_(j)} ⊂ {x_i}. In the case of the median window filter, the output of the filter is the median of (x_(1), . . . , x_(N)); thus N is odd and generally expressed as N = 2k + 1. The derivation of the generalized SNR for RO filters is based on the SNR defined as SNR = ⟨x⟩/σ(x), where ⟨x⟩ is the mean value of multivalued data and σ(x) is the standard deviation of those data. The input data to the filter {x_i} are assumed to be
independent and identically distributed (iid) according to a probability density function p_u(x), and the filtered data, or output data, are distributed according to a multinomial distribution p_w(x). The datum selected from the ordered subset, x ∈ (x_(1), . . . , x_(N)), must satisfy three conditions defining x as x = w, an output datum. The conditions are:

(a) that w ∈ (x, x + dx); probability p_u(x) dx;
(b) that n of (x_(1), . . . , x_(N)) are greater than w; probability (1 − P_u(x))^n;
(c) that m of (x_(1), . . . , x_(N)) are less than w; probability (P_u(x))^m.

For m + n + 1 = N, the rank of the filter is determined by m and n; for example, for a median window m = n. The multinomial construction (Frieden, 1998) of P_w(x) = ∫ p_w(x) dx is

  P_w(x) = N (N−1 choose m) ∫_−∞^x [∫_−∞^x′ p_u(t) dt]^m [1 − ∫_−∞^x′ p_u(t) dt]^n p_u(x′) dx′.   (1)

The task before us is to solve Eq. (1) for ⟨x_w⟩ and σ_w(x) to estimate the SNR of data filtered by a ranked-order filter. To do this, we must stipulate that the probability density p_u(x) is defined to be integrable over x ∈ X; P_u(x) = ∫_−∞^x p_u(t) dt. With this stipulation, we have:

  P_w(x) = N (N−1 choose m) ∫_−∞^x (P_u(t))^m (1 − P_u(t))^n p_u(t) dt.   (2)

We may now make use of a probability integral transform, y = P_u(x); dy = p_u(x) dx, and have P_w(x) expressed in Y-space as P_w(y):

  P_w(y) = B⁻¹(m + 1, n + 1) ∫_0^y t^m (1 − t)^n dt.   (3)

Please notice that the multinomial coefficient N (N−1 choose m) is identified as a reciprocal beta function, B⁻¹(m + 1, n + 1). The density function p_w(y) follows now as p_w(y) = B⁻¹(m + 1, n + 1) y^m (1 − y)^n. We can now see immediately that the first and second moments of p_w(y) are ratios of beta functions:

  ⟨y^q⟩ = B⁻¹(m + 1, n + 1) ∫_0^1 y^(m+q) (1 − y)^n dy,

  ⟨y⟩ = B(m + 2, n + 1) / B(m + 1, n + 1),   (4)

  ⟨y²⟩ = B(m + 3, n + 1) / B(m + 1, n + 1).   (5)
From these expressions, ⟨y⟩ and σ(y) are readily computed using

  B(α, β) = Γ(α)Γ(β)/Γ(α + β),   Γ(α + 1) = α!

The results are shown as follows for the median, maximum, and minimum windows:

  m = n = (N − 1)/2:   ⟨y⟩ = 1/2,   σ_med(y) = [4(N + 2)]^(−1/2);   (6)

  m = N − 1, n = 0:   ⟨y⟩ = N/(N + 1),   σ_max(y) = √N / [(N + 1)√(N + 2)];   (7)

  m = 0, n = N − 1:   ⟨y⟩ = 1/(N + 1),   σ_min(y) = σ_max(y).   (8)
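Eqs. (4) to (8) are easy to verify numerically. The following sketch (an illustrative check, not code from this chapter) computes ⟨y⟩ and σ(y) from the beta-function moments and compares them with the closed forms of Eqs. (6) and (7) for a window of length N = 9:

```python
from math import gamma, sqrt

def beta(a, b):
    # Euler beta function, B(a, b) = gamma(a) * gamma(b) / gamma(a + b)
    return gamma(a) * gamma(b) / gamma(a + b)

def y_moments(m, n):
    # Mean and standard deviation of p_w(y) = B^-1(m+1, n+1) y^m (1-y)^n
    mean = beta(m + 2, n + 1) / beta(m + 1, n + 1)      # Eq. (4)
    second = beta(m + 3, n + 1) / beta(m + 1, n + 1)    # Eq. (5)
    return mean, sqrt(second - mean ** 2)

N = 9
k = (N - 1) // 2
mean_med, sd_med = y_moments(k, k)        # median window, compare Eq. (6)
mean_max, sd_max = y_moments(N - 1, 0)    # maximum window, compare Eq. (7)
print(mean_med, sd_med)   # 0.5 and (4*(N+2))**-0.5
print(mean_max, sd_max)   # N/(N+1) and sqrt(N)/((N+1)*sqrt(N+2))
```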
As implied in the results, σ(y) is symmetric about ⟨y⟩. The reader may see quite readily that ⟨y⟩ and σ(y) could be evaluated for any arbitrary value of m or n, subject to m + n + 1 = N, corresponding to the rank of the output selected by the filter. The SNR(y) could, of course, be computed from these quantities, but it is of little value. The desired quantities are ⟨x_w⟩ and σ_w(x), and they are obtained by transforming from Y to X by P_u⁻¹(y), utilizing its properties as a transform function and as a probability function. It is important to realize that P_u(x) is in fact P_u(−∞, x); thus P_u(x) is isomorphic from X to Y and P_u⁻¹(y) is an inverse transform. Because P_u(x) is a probability transform, P_u⁻¹(y) is the fractile in X at probability y; thus P_u⁻¹(⟨y⟩) is the fractile at probability ⟨y⟩ and we have:

  P_u⁻¹(⟨y⟩) = ⟨x_w⟩.   (9)

It is not normally the case to deal with the mean value of a probability, thus it may be worth a cautionary note that ⟨x_w⟩ is not the mean value of the inverse transform, but is the inverse transform of the mean value of the probability, ⟨y⟩. The transform of σ(y) is not direct, by the same reasoning. The standard deviation is normally thought of as simply the square root of the variance, but here it is a measure in Y of probability y in the interval Δy of ⟨y⟩ + σ(y), or ⟨y⟩ − σ(y). For this reason we must have a reference point for σ(y), and that reference is ⟨y⟩. This is especially true because P_u⁻¹(y) is not symmetric in units of X about ⟨x_w⟩ in the general case of all (m, n) for some p_u(x) dx, or of all p_u(x) dx for any (m, n). In other words, σ_w(x) can depend on which side of ⟨x_w⟩ it is measured, and on the choice of (m, n), with respect to the symmetry of p_u(x). The solution is to form two measures, t⁺ = ⟨y⟩ + σ(y) and t⁻ = ⟨y⟩ − σ(y), such that P_u⁻¹(t^±) = ⟨x_w⟩ ± σ^±(x). The properties of this measure are that, whereas the intervals (⟨x_w⟩, ⟨x_w⟩ + σ⁺(x)) and (⟨x_w⟩, ⟨x_w⟩ − σ⁻(x)) are not equal for all cases of (m, n), and their sum is an unknown quantity according to p_u(x) dx, the corresponding intervals in Y, (⟨y⟩, t⁺) + (⟨y⟩, t⁻) = const., are computable according to Eqs. (6) to (8), invariant of p_u(x), for all cases. These conditions allow us to compute a σ_w(x) that subsumes asymmetric cases of p_u(x) dx and choices of (m, n) as:

  σ_w(x) = (1/2) [P_u⁻¹(y)] evaluated from t⁻ to t⁺.   (10)
The SNR_w(x) may now be computed as defined:

  SNR_w(x) = ⟨x_w⟩ / σ_w(x).   (11)

We will apply this formalism to three distributions of data: Rect(x), Normal(x), and Exp(x). The window filter used for explication is the median, because it is simple and also because it is the most commonly used ranked-order filter. For this case, m = n = k; k = (N − 1)/2.

Rect(x): The general formulation is:

  p_u(x) = (1/b) Rect((x − a)/b),   (12)

  ⟨y⟩ = (1/b)(⟨x⟩ − a + b/2),

  ⟨x_w⟩ = b⟨y_w⟩ + a − b/2.

When we apply the case of the median window, that is, "w" is "med" in these expressions, we get:

  ⟨x_med⟩ = b⟨y_med⟩ + a − b/2 = a.   (13)
Because p_u(x) is symmetric about a and m = n, P_u⁻¹(y) is symmetric about ⟨x_med⟩, P_u⁻¹(t⁺) = 2⟨x_med⟩ − P_u⁻¹(t⁻), and σ⁺(x) = σ⁻(x). Thus:

  σ_med(x) = b σ_med(y).   (14)

The resulting SNR for Rect(x) is:

  SNR_med(x) = (2a/b) √(N + 2).   (15)
Normal(x): The Normal distribution N(⟨x⟩, σ(x)) is most easily analyzed by temporarily transforming it to the Gauss distribution N(0, 1) by

  z = (x − ⟨x⟩) / (√2 σ(x));

thus, for p_u(z) = N(0, 1), P_u(z) is

  P_u(z) = (1/2)(1 + erf(z)).

Here again, for the case of the median window, we have the symmetries of m = n and of P_w(y) about ⟨x_med⟩, but in this case P_u(z) is nonlinear. The general formalism is:

  ⟨y⟩ = (1/2)(1 + erf(⟨z_w⟩)),

  σ(z) = (1/2) [erf⁻¹(2y − 1)] evaluated from t⁻ to t⁺.

From these relationships, and the odd symmetry erf(ζ) = −erf(−ζ), we realize the benefits of symmetry and find:

  ⟨z_med⟩ = 0,
  erf(σ(z)) = 2σ(y),
  σ(z) = erf⁻¹(2σ(y)).

For practical application in most cases, we may note that for σ(y) sufficiently small, that is, less than about 0.45 (N ≥ 3), erf(σ(z)) may be approximated by σ(z), within an error of 0.10, at the user's discretion. In any case, we have σ(z) in terms of N, and we may transform from Z back
to X with the result of:

  ⟨x_med⟩ = ⟨x⟩,   (16)

  σ_med(x) = √2 σ(x) σ(z),   (17)

  SNR_med(x) = ⟨x⟩ / [√2 σ(x) erf⁻¹((N + 2)^(−1/2))].   (18)

In the case of the maximum and minimum windows applied to Normal(x), the reader will note that the approximation of σ(z) cannot be made, that is, erf(σ(z)) ≠ σ(z). Moreover, the symmetries are lost, and the solution must proceed with tabulated data of the error function for a particular window of length N, using Eqs. (7) to (11).

Exp(x): The Exp(x) density function is asymmetric about any point in its domain and is nonlinear, yet the solution is closed for all choices of (m, n). The solution is shown for an arbitrary window filter and is derived in the same manner as for Rect(x) and Normal(x):

  P_u(x) = 1 − exp(−x/a),   (19)

  ⟨x_w⟩ = −a ln(1 − ⟨y_w⟩),   (20)

  σ_w(x) = −(a/2) ln[(1 − ⟨y⟩ − σ(y)) / (1 − ⟨y⟩ + σ(y))],   (21)

  SNR_w(x) = −a ln(1 − ⟨y_w⟩) / σ_w(x).   (22)

N.B.: The SNR_w(x) is not a function of the parameter a of the distribution, Eq. (19).

Verification

The estimates of SNR_w(x) derived in the foregoing are verified by a computer simulation of a median filter of length N = 9. The basis of comparison is the SNR gain G, defined as G = SNR_w(x)/SNR_u(x). The filter was applied to three strings of data X_n, n = 650, distributed according to:

  Rect(x): x = 2a·RANF(y); a = 5, b = 2a
  Exp(x): x = −a ln(1 − y); y = RANF(y), a = 5
  Normal(x): x = Rnorml(a, σ); a = 5, σ = a/(2√3).

The results are shown in Table I for comparison of the estimated and measured values of ⟨x⟩, ⟨x_w⟩, σ(x), σ_w(x), SNR(x), SNR_w(x), and G.
TABLE I
Verification of SNR_w(x)

                 Rect(x)           Normal(x)          Exp(x)
               est.    meas.     est.    meas.     est.    meas.
  ⟨x⟩:         5.00    4.96      5.00    5.04      5.00    5.08
  ⟨x_w⟩:       5.00    5.00      5.00    5.06      3.47    3.70
  σ(x):        2.89    2.87      1.44    1.50      5.00    5.24
  σ_w(x):      1.51    1.47      0.58    0.60      1.55    1.65
  SNR(x):      1.73    1.72      3.47    3.37      1.00    0.97
  SNR_w(x):    3.32    3.40      8.61    8.43      2.23    2.24
  G:           1.92    1.98      2.48    2.50      2.23    2.31
The reader may verify that in all cases, except for the maximum window applied to Rect(x), the SNR gain of the RO filter is less than that of an averaging window of the same length. We pay a price in the SNR gain of the RO filters to realize the benefit of their morphological characteristics. The morphological characteristics are determined by the roots of the filter, which leads us to the following section.

III. Mathematical Logic Approach to Ranked-Order Filters

As mentioned earlier, the characteristic of the ranked-order filter that has attracted attention is that the median window (MW) has roots. This notion derives from the property of the MW to preserve edge gradients while suppressing noise. This property, however, is a consequence of the more general characteristic that "some" patterns of data simply pass through the filter unchanged. The peculiarity of these patterns is that they are not defined by shape or value, but instead by the relationship of the data values to each other. Furthermore, these patterns are not restricted to edge gradients. This characteristic marks an important difference, often not appreciated, between the MW and almost all other digital filters. Almost all other filters replace the original data with another, different set of data; that is, they generate a replacement set of data, as in the cases of the average window, the Wiener filter, or the maximum-entropy filter. The MW instead does one of the following two things to data: (1) it rearranges the
data, replicating some and throwing some away; or (2) it leaves a string of data unchanged if it is a root of the filter. However, there is this caveat: it leaves a string of data unchanged if it is a root AND if the filter is correctly implemented. We shall see later how it is possible to implement the filter incorrectly, resulting in pseudoroots. Nevertheless, even then, it does not generate new data. Given in more mathematical terms: the range of the MW is a subset of the domain of the filter such that the output is an onto map of the input.

The relational nature of the roots of the MW, and the fact that the roots are characteristic of the filter, suggest that the methods of mathematical logic (Schoenfeld, 1967) would be a fruitful approach to the development of the theory of the RO filter exemplified by the MW. This is a departure from the previous theory based on order statistics (Justasson, 1982; Bovick et al., 1983), yet it builds directly on its results. The order statistics approach succeeded in the classification of roots of the MW in 1D, and provided the phenomenological basis for the logical theory by establishing the existence of fixed-point (FP) roots defined in terms of local monotonicity (Eberly et al., 1991; Longbotham, 1989).

The results of order statistics provided a description of roots and data types as follows. If we posit a subsequence of data S_r = (u_i, . . . , u_{i+r}) within any sequence S of data, and a median window filter Φ_med(N), such that the convolution of Φ_med(N) and S leaves S_r unchanged, Φ_med(N) ∗ S = S ⊇ S_r, then S_r is a root of Φ_med(N). This defines S_r as type-I data with respect to Φ_med(N). Data not having this property are called type-II data with respect to Φ_med(N). There are two kinds of roots: fixed-point roots and oscillating roots. Fixed-point roots have the property of contiguous data and are defined for both multivalued and binary (a, b; a ≠ b) data, whereas oscillating roots are defined only for binary data that oscillate between "a" and "b" as "abab . . . a." The fixed-point roots of the median filter have been analyzed by Tyan (1982) and Longbotham (1989) using the methods of order statistics and described in terms of monotonicity (paraphrased here in the language of sets) as: a root of a median filter of length N, N = 2k + 1, is a set of data {u} = (u_1, . . . , u_n) such that for each u_j ∈ {u}, the subsequence (u_{j−k}, . . . , u_{j+k}) is either monotonically nonincreasing, u_{j−k} ≥ · · · ≥ u_{j+k}, or monotonically nondecreasing, u_{j−k} ≤ · · · ≤ u_{j+k}, but not both, within a window of length N spanning j − k to j + k and connecting each successive datum u_j ∈ {u}. Such a sequence of data is called locally monotonic of order k + 1, and designated LOMO(k + 1). This theorem is proved in this investigation as part of the development of a computable structure of a filter function that extends easily from one dimension to two dimensions and implies the existence of a logical grammar of fixed points.
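Both the local-monotonicity property and the root property can be tested directly. In the sketch below (illustrative only; the boundary treatment, replication of the end samples, is an assumption not specified in the text), a locally monotonic ramp-plateau sequence passes through a median window of length N = 9 unchanged, while an isolated spike, type-II data, does not:

```python
def is_lomo(u, order):
    # True if every run of `order` consecutive samples is monotonic
    for i in range(len(u) - order + 1):
        w = list(u[i:i + order])
        if w != sorted(w) and w != sorted(w, reverse=True):
            return False
    return True

def median_pass(u, N):
    # One pass of the median window; the ends are padded by replication
    k = (N - 1) // 2
    p = [u[0]] * k + list(u) + [u[-1]] * k
    return [sorted(p[i:i + N])[k] for i in range(len(u))]

N, k = 9, 4
root = [0, 1, 2, 3, 5, 5, 5, 5, 5, 3, 2, 1, 0]     # ramp up, plateau, ramp down
spike = [0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0]    # an isolated spike
print(is_lomo(root, k + 1), median_pass(root, N) == root)     # True True
print(is_lomo(spike, k + 1), median_pass(spike, N) == spike)  # False False
```

The spike is both non-monotonic locally and altered by the filter; the ramp-plateau object is transparent to it.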
The development of the logical theory will involve two steps: (1) logical construction, and (2) logical investigation. The logical construction will develop the schema for a mathematical model of the filter based on its structure and function. The logical investigation will employ this model to investigate the structure of data satisfying the conditions for roots, and will further investigate the mathematical properties of the model for applications to signal and feature extraction. The investigation will begin with the 1D case for fixed-point (FP) roots and oscillating roots, and then introduce the coded window filter as a generalization of oscillating roots. The investigation will continue with analysis of root structure and behavior for the 2D case in terms of root morphology and syntactic structure, and will include example applications to feature extraction based on FP root representation of features. The chapter will conclude with an exploration of the development of a language model of features in imagery based on FP representation of features.

A. Logical Construction

We must begin this development with a description of the data u_i. This may seem trivial, but it is not. Data are usually regarded as a sequence of values, but here the data are regarded as entities that have properties. The properties we are concerned with are value, or amplitude, and position in a sequence. This is an important concept because it will enable us to separate the notions of value and position. The construction proceeds as follows. We may consider the data as being composed of terms designating individual quantities having property u. Let us take note that if t_1, . . . , t_n are terms, and a function f is n-ary, then f t_1 . . . t_n is a term. The data set u may be considered to be of this form, where u is the symbol that is n-ary such that u t_1 . . . t_n is a term. The number n is a natural number determined by u and is the index of u.

We may use this form to associate with every u_i in a sequence u a term designating its position in that sequence. This convention enables separation of the index of a variable from the value of the variable, so that we may construct functions of its position in a sequence independently of its value, or functions of its value independently of its position. The notational convention adopted is u_i = φ(u)_i ν_i, i ≤ n, such that with every n-tuple u = (u_1, . . . , u_n) are associated functions (φ(u_1), . . . , φ(u_n)) and (ν_1, . . . , ν_n), where the φ(u_i) are the values of the data elements in the positions indicated by the ν_i. We may begin to reflect the structure of the MW with this representation of the data by defining a subsequence of u, u_i^N, that extends from i to i + 2k:

  u_i^N = φ(u_i^N) ν_i^N:  φ(u_i^N) = (φ(u_i), φ(u_{i+1}), . . . , φ(u_{i+2k})),  ν_i^N = (ν_i, ν_{i+1}, . . . , ν_{i+2k}).
We notice some notational difficulties here if we want to retain identification of the designator with the index i. To avoid this, we may define a recursive function θ(a_i, j), a_i = (a_0, . . . , a_{N−1}):

  θ(a_i, j) = a_j,  j = 0, . . . , N − 1,
  θ(φ(u_i), j) = φ(u_i)_j,
  θ(ν_i, j) = ν_j,
  φ(u_i) = (φ(u_i)_0, . . . , φ(u_i)_{N−1}),
  ν_i = (ν_0, . . . , ν_{N−1}),
  u_i^N = φ(u_i) ν_i.

This function allows us to define u_i^N = φ(u_i) ν_i, where φ(u_i)_j is the value of the jth term of u_i^N, separable from its position in an ordered sequence (ν_0, . . . , ν_{N−1}), and it preserves all necessary information to recover u from the u_i^N by logical addition of the u_i^N. This representation of the data defines a window u_i^N of length N that selects N of u and associates an index j = 0, . . . , N − 1 with them for every increment of i, i = 1, . . . , n − (N − 1), and constitutes a sampling function u_i^N of u. The u_i^N form a set u^N = (u_1^N, . . . , u_{n−2k}^N), where every u_i^N is u_i^N = (u_i, u_{i+1}, . . . , u_{i+2k}), i = 1, . . . , n − 2k. The set u^N is a subset of the power set of u, and u = ⋃_{i=1}^{n−2k} u_i^N. We will employ this sampling function to construct the filter function applied to a data sequence u.

The structure of the logical theory of the median filter is in terms of predicates and functions having the recursive properties of K_R, "=", and μ:

  F(a) = μx(M(a, x) = 0),

where μx(. . . x . . .) is the μ-operator, defined as the minimum x such that . . . x . . . is true. Some readers may not be familiar with the use of a predicate in this context. It is used here as a representation of the data that has semantic content; that is, the predicate "says something" about the data that is either true or false. In the simplest cases, it may seem a trivially assumed state by construction, for example, that the data are represented by the sampling function. We shall see, however, that the use of a predicate allows more complex statements about the data that enable construction of a filter function useful for applications such as we shall seek.

Thus we define a recursive predicate R(a) and its representing function K_R(a):

  K_R(a) = 0 if R(a) (read as: R(a) is true)
  K_R(a) = 1 if ¬R(a);  Def.: "¬" is negation.

We further define a function δ(a), subject to the predicate being true, as δ(a) ≃ R(a), to mean that both R and δ are defined for the argument a, or that both are undefined. In a pragmatic sense, we may read this as δ(a) having a value, not necessarily "0," as a function of a if and only if R(a) is true of a. We may generalize this notion to δ(a_1) ≃ R(a_1), δ(a_2) ≃ R(a_2), . . . , δ(a_q) ≃ R(a_q), and apply the generalization to the median filter Φ_med(N) as a function of u_i^N, ψ(u_i^N, u′), where u is the input and u′ is the output. The result is the filtered output u′ = (φ(u′)_1, . . . , φ(u′)_n), represented by a filter function I(u, u′) partially defined as follows:

  I(u_1^N, u′) = ψ(u_1^N, u′) iff R(u_1^N),
  . . .
  I(u_{n−2k}^N, u′) = ψ(u_{n−2k}^N, u′) iff R(u_{n−2k}^N),

where R(u_i^N) ≃ μx(K_R(u_i^N, x) = 0). In order to complete the definition of the filter function, we must define the function ψ(u_i^N, u′) more explicitly as a window function:

  ψ(u_i^N, u′) = μx[H(G(u_i^N, x), u′)],  u = ⋃_{i=1}^{n−2k} u_i^N.

The function G(u_i^N, x) involves the introduction of two functions: a selection function σ_m((x_0, . . . , x_{N−1})) = x_m, and an ordering function

  O(x) = (φ(x)_{λ_0}, φ(x)_{λ_1}, . . . , φ(x)_{λ_{N−1}});  φ(x)_{λ_0} ≤ φ(x)_{λ_1} ≤ · · · ≤ φ(x)_{λ_{N−1}}.

The ordering function is where we make first use of the separability of the value φ(u) from its designator ν, and assign a different designator, λ_l, l = 0, . . . , N − 1. (N.B.: the index of λ_l is determined by the value φ(u)_j.) The ordering function may be expressed in terms of the sampling function u_i^N as:

  O(u_i^N) = (φ(u_i)_{λ_0}, . . . , φ(u_i)_{λ_{N−1}}).

We continue by defining G in terms of the selection function and the ordering function as:

  G(u_i^N, φ(u_i)_m) = μx[x = σ_m(O(u_i^N)) ∧ φ(u_i)_m = x],

where m is the rank selected by the filter; for example, for Φ_med(N), m = k. (As mentioned, we shall be concerned with the median filter, thus m = k.) The purpose of G is to select the output of the filter for each window, thus we
think of G as the output selection function. The function H, in turn, is defined as a writing function:

  H(G(u_i^N, φ(u_i)_m), u′) = μx[φ(u′)_{i+k} = φ(u_i)_m ∧ φ(u′)_j = (0)_j, j ≠ i + k],

so that:

  ψ(u_i^N, u′) = ((0)_1, . . . , (0)_{i+k−1}, φ(u_i)_m, (0)_{i+k+1}, . . . , (0)_n).

We may make use of the properties of the sampling function as a member of a power set, and of the representing function of the predicate K_R(a), to construct the total filter function I(u, u′) as:

  I(u, u′) = ψ(u_1^N, u′)·K_R(u_1^N) ∪ · · · ∪ ψ(u_q^N, u′)·K_R(u_q^N),  q = n − 2k,  k = (N − 1)/2,

or

  I(u, u′) = ⋃_{i=1}^{n−2k} ψ(u_i^N, u′)·K_R(u_i^N).   (23)

An understanding of the filter function will be gained through the investigation of its roots in the following section, but before that let us consider its structure. The entire structure is based on the global union of the window function subject to the predicate being true. The arguments of the window function are the sampling function of the filter and the data selected by it, but otherwise it places no constraints on the form of the sampling function. This is most easily understood by realizing that the only reference in ψ(u_i^N, u′) to 1D as opposed to 2D is the dimension of the indexical scheme. This will become clearer when we undertake the analysis of 2D roots. The point is that it is the predicate R(u_i^N) that does all the work in defining the filter design and determining the filter response. It does this by representing the sampling function, and by imposing on the data presented by the sampling function to the window function the conditions necessary for the predicate to be true. The criterion of truth is the representation of intent and filter design. This is where we begin to see the potential of a predicate of the data to control the filter for specific, intended applications. The predicate R(u_i^N) represents the form and geometry of the data such that I(u, u′) is defined and true with respect to the intended use of I(u, u′).

The distinction among filter designs is determined by the geometry of the sampling function, which distinguishes between 1D and 2D filter designs. The satisfaction of filter intent is determined by conditions imposed on the data, such as to be either or both type-I or type-II, multivalued or binary, data. The predicate accomplishes these distinctions with the sampling function, and by forcing conditions (Schoenfeld, 1967) c to be correct, cor(c): conditions imposed on R(u_i^N) for R(u_i^N) to be true such that I(u_i^N, u′) = ψ(u_i^N, u′) iff R(u_i^N). Thus, the predicate is defined by the sampling function u_i^N, and any
forcing conditions, represented as:

  u_c(u_1, . . . , u_n) = [x : R_c(x, u_1, . . . , u_n, cor(c))],

read as: the set of all x such that R_c is true. The subscript c indicates correct conditions, cor(c), that force R_c(u_i^N) to reflect constraints on the data such that R: ψ(u_i^N, u′) ≃ R_c(u_i^N). It follows then that the intent to discover such special phenomena as FP or oscillating roots for a 1D or 2D filter becomes the discovery of u_c and cor(c) in the representation of the data by the predicate such that ψ(u_i^N, u′) ≃ R_c(u_i^N). The notion of the predicate constraint is central to the application of the filter to feature extraction, and the implementation will become clear by example in Section III.B, where we will illustrate these notions.

In summary, the predicate and the functions ψ(u_i^N, u′) = μx[H(G(u_i^N, x), u′)] and I(u, u′) constitute a conceptual framework of analysis wherein the predicate is the defining condition of the filter, as remarked in the foregoing. The functions consist of a window function ψ(u_i^N, u′), comprised of an output selection function G of the output datum and a writing function H of u′, and I(u, u′) is a composing function of u′ from u.

B. Logical Investigation

We may now address the solution of R_c(u_i^N) for an FP root of the filter
represented by I(u, u′). We will develop the notion in 1D and then extend it easily to 2D. The notion of an FP root is that there is a sequence of either multivalued or binary data, of some continuous length r in u, that is invariant to the filter. This means that if every value of r is selected by the filter, preserving its value and its identity in the sequence, then every datum of r must be in the kth position of O(u_i^N) and coincident with the kth position in u_i^N for each increment of i of a median filter of length N, N = 2k + 1. This requirement, that the output median data value be coincident with the median data position in u within u_i^N, defines a condition

  c:  φ(u_i)_k ν_k = φ(u_i)_k λ_k,  k = (N − 1)/2;  ν = λ = k at the median,   (24)

as necessary for R_c(u_i^N) to be true. This is a forcing condition in the predicate.

We must note here a very important consequence of the representation of the data by u_i = φ(u)_i ν_i in Eq. (24). The condition on the data described in Eq. (24) results in identity of the datum, regarded as an entity, in the median position of the sampling function with the entity in the median position of the ordering function. This is important not only because it permits the formalism of Eq. (23), but because it is the difference between
correct and incorrect implementation of the filter mentioned much earlier. The most common error in implementation is to sort the data in the ordering function according to the equivalence value of the data rather than according to the identity of entities. Failure to implement the filter according to Eq. (24) will result in false roots.

The satisfaction of Eq. (24) defines φ(u_i)_k to be the median value of u_i^N, and results in the following arrangement of data in the ordering function:

  O(u_i^N) = (φ(u_i)_{λ_0} ≤ · · · ≤ φ(u_i)_{λ_{k−1}} ≤ φ(u_i)_k ≤ φ(u_i)_{λ_{k+1}} ≤ · · · ≤ φ(u_i)_{λ_{N−1}}),  φ(u_i)_k = φ(u_i)_{λ_k},  ν_k = λ_k.

If we increment to u_{i+1}^N, then we have φ(u_{i+1})_{k−1} = φ(u_i)_k. If we do this and maintain the condition of coincidence, Eq. (24), then we also have ν_{k−1} = λ_{k−1} or ν_{k−1} = λ_{k+1}, depending on whether φ(u_{i+1})_k is greater or less than φ(u_i)_k. N.B.: This amounts to a corollary to the coincidence condition. We may incorporate this result into O(u_{i+1}^N) to obtain:

  O(u_{i+1}^N) = (φ(u_{i+1})_{λ_0} ≤ · · · ≤ φ(u_{i+1})_{k−1} ≤ φ(u_{i+1})_k ≤ · · · ≤ φ(u_{i+1})_{λ_{N−1}}),
or
  O(u_{i+1}^N) = (φ(u_{i+1})_{λ_0} ≤ · · · ≤ φ(u_{i+1})_k ≤ φ(u_{i+1})_{k−1} ≤ · · · ≤ φ(u_{i+1})_{λ_{N−1}}),  j ≤ k.   (25)

We can see now that the forcing condition cor(c(u_i^N)): ν_j = λ_j for j = i, . . . , i + k results in a predicate representation of the data u_i^N as a k + 1 sequence that is either monotonically nondecreasing or nonincreasing, as shown in Eqs. (26) and (27). This is a condition imposed on the data by the predicate for it to be an FP root of a median window filter Φ_med(N):

  O(u_i^N) = (φ(u_i)_{λ_0} ≤ · · · ≤ φ(u_i)_j ≤ · · · ≤ φ(u_i)_{j+k} ≤ · · · ≤ φ(u_i)_{λ_{N−1}}),  j ≤ k,   (26)
or
  O(u_i^N) = (φ(u_i)_{λ_0} ≥ · · · ≥ φ(u_i)_j ≥ · · · ≥ φ(u_i)_{j+k} ≥ · · · ≥ φ(u_i)_{λ_{N−1}}),  j ≤ k.   (27)

If we were to begin the analysis at i + 1 and decrement to i, we would produce the converse of Eqs. (26) and (27). This means that for a 1D filter along the x-axis, an object that is a root must be symmetrical in terms of local monotonicity, such that the sequence of data constituting the data object is locally monotonic on entry as well as on exit by the filter. This
defines a data sequence to be locally monotonic of order k + 1, denoted LOMO(k + 1).

This analysis illustrates how the predicate R_c(u_i), constrained by forcing conditions, satisfies the intended use of I(u, u_i). The subscript c indicates correct conditions, cor(c), that force R_c(u_i) to reflect the filter design and intent, in this case that I(u, u_i) of the median window filter of window N selects only FP roots of the filter, that is, output iff R_FP(u_i). The forcing condition is hereafter referred to for convenience as the coincidence condition. We must note that this case is LOMO(k + 1) defining an FP because the coincidence condition is applied to every datum in a k + 1 sequence. Another way to say this, which will prove useful later in the discussion of 2D feature extraction, is as follows: the predicate condition cor(c) on the data is that they must satisfy the coincidence condition for a k + 1 set of contiguous data. This condition is not given explicitly in Eq. (24), but could of course be incorporated as a super-predicate on I(u, u_i) as R_FP(u_i). We will see later that if the coincidence condition is applied not to every datum but, say, to every other datum, then it results in an oscillation root, and later still in a root defined by a repeated pattern of data in a coded median filter.

1. Example of 1D Fixed-Point Root Detection

Let us suppose a sequence of data described as follows, where φ(u) is listed in the upper row and the index ν in the lower row:

φ(u): 0 1 2 1 4 5 5 5 5 6 6 7 6
ν:    1 2 3 4 5 6 7 8 9 10 11 12 13
Now let us apply the sampling function u_i = φ(u_i)ν_i for a median window of N = 9, that is, j = 0–8, for each increment of i = 1–5. We may then write the ordering function followed by the resulting window function. This analysis will illustrate the conditions for u_i to satisfy LOMO(k + 1). The output of the selection function G(u_i, φ(u_i)^{r_m}), m = k, operating on O(u_i), is marked in brackets:

i = 1 (window over ν = 1–9, j = 0–8):
O(u_1) = (5 5 5 5 [4] 2 1 1 0),  ω(u, u_1) = 0 0 0 0 4 0 0 0 0

i = 2 (window over ν = 2–10):
O(u_2) = (6 5 5 5 [5] 4 2 1 1),  ω(u, u_2) = 0 0 0 0 5 0 0 0 0

i = 3 (window over ν = 3–11):
O(u_3) = (6 6 5 5 [5] 5 4 2 1),  ω(u, u_3) = 0 0 0 0 5 0 0 0 0

i = 4 (window over ν = 4–12):
O(u_4) = (7 6 6 5 [5] 5 5 4 1),  ω(u, u_4) = 0 0 0 0 5 0 0 0 0

i = 5 (window over ν = 5–13):
O(u_5) = (7 6 6 6 [5] 5 5 5 4),  ω(u, u_5) = 0 0 0 0 5 0 0 0 0

I(u, u_i) = ω(u, u_1) · K(R_c(u_1)) + ⋯ + ω(u, u_5) · K(R_c(u_5))
I(u, u) = 0 0 0 0 4 5 5 5 5 0 0 0 0.

We can see now that if we were to reverse the process and decrement i from 5 to 1, then we would produce the same result. An important observation can be made here that will be applied later in feature extraction. The implementation of the filter in the preceding incorporated the condition of coincidence R(u_i) = R_c(u_i), that is, for i = 1, …, 5, without explicitly stating so. The root was detected because the data set was a type-I sequence of data. If we had used R_FP(u_i) for I(u, u_i) where u was a somewhat longer set of data that included the type-I sequence used earlier but was otherwise type-II, then the result would have been the exclusive detection of the type-I sequence, with all else returned as zero. This follows from K(R_FP) = 0 in Eq. (23) for all but the type-I sequence.

We can also illustrate the phenomenon of the false root described earlier. Suppose we modified u to u′ by changing the last two data to zero:

φ(u′): 0 1 2 1 4 5 5 5 5 6 6 0 0
ν:     1 2 3 4 5 6 7 8 9 10 11 12 13
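The run above can be reproduced mechanically. The sketch below (hypothetical helper names, not the author's implementation) tests coincidence on values only, that is, it asks whether the datum at the window center equals the window median; as the text goes on to show, this relaxed, value-level test cannot distinguish the modified data u′ and so also reports the false root:

```python
def fp_root_detect(data, N):
    """I(u, u_i): pass the window median through only where the center
    datum has the median value of its window (a value-level stand-in
    for the coincidence condition); elsewhere output 0."""
    k = N // 2
    out = [0] * len(data)
    for i in range(k, len(data) - k):
        window = data[i - k : i + k + 1]
        median = sorted(window)[k]
        if data[i] == median:   # center value coincides with the median value
            out[i] = median
    return out

phi = [0, 1, 2, 1, 4, 5, 5, 5, 5, 6, 6, 7, 6]
print(fp_root_detect(phi, 9))
# [0, 0, 0, 0, 4, 5, 5, 5, 5, 0, 0, 0, 0], matching I(u, u) above

phi_prime = [0, 1, 2, 1, 4, 5, 5, 5, 5, 6, 6, 0, 0]
print(fp_root_detect(phi_prime, 9))
# the same output: the false root that identity-based ordering would reject
```

The second call illustrates exactly the failure mode discussed next: sorting by value alone cannot tell u from u′.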
This change will illustrate the subtle importance of the identity of the median datum and the median rank implied by the coincidence condition. Let us relax the coincidence condition from an identity defined on the entities to an equality defined on the values φ(u)_i, and then repeat the analysis for i = 1–5. This would be the same as sorting the data while allowing substitution of equivalent values. The data u′ fail to satisfy R_FP(u_i), but under a sort allowing substitution of equivalent values the analysis would produce the same result; it is, however, a false root. This false root is made possible by the substitution between equal values of φ(u) permitted by '=' but otherwise denied by identity on the entities, together with the corollary to the coincidence condition. The reason it is a false root, other than violation of the corollary, is that detection of the FP by substitution occurs only if the data φ(u)_i, i = 6–9, are of the same value, greater than or equal to φ(u)_5, even if the data are otherwise a monotonic sequence. Furthermore, basing the filter on substitution of data having equivalent value prohibits the control of the filter by forcing conditions on the predicate, which is necessary for feature extraction.

Finally, one additional comment on the 1D median window filter, which applies also to 2D filters: a filter unconstrained by the coincidence condition, applied to type-II data, produces type-I data as output. This comes about because the output of the filter is always the median value of the sampling function u_i; thus the datum in the median position of a sampling function of the filtered output is always the median value; ergo: type-I.

2. The Structure of 1D Fixed-Point Roots

The FP root is an object in itself, having the property that the data composing it are related and describable as one of several distinct types of locally monotonic patterns of relationships. In the case of a 1D filter, we see the following patterns:

(a) A monotonically increasing sequence of k + 1 terms, for example, an up-ramp of length k + 1, followed by a "ragged" plateau of k data all greater than any of the monotonic data in the ramp, and preceded by a "ragged" plateau of k data all less than any of the monotonic data in the ramp (Fig. 1).

(b) A monotonically decreasing sequence of k + 1 terms, for example, a down-ramp, followed by a "ragged" floor of k data all less than any of the monotonic data in the down-ramp, and preceded by a "ragged" plateau of k data all greater than any of the monotonic data in the down-ramp (Fig. 2).

(c) If the k + 1 monotonic data are related by '=' rather than '≥' or '≤,' then the modes of the leading and trailing sequences of k terms may be independently derived from the neutrality of '=,' allowing for a "pulse" of k + 1 binary data (Fig. 3).
Figure 1. Up-ramp.
Figure 2. Down-ramp.
Figure 3. Pulse.
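The three root shapes can be checked mechanically: a root is a sequence left unchanged by the filter. A short sketch (our own helper names; k = 2, so N = 5) verifying that an up-ramp with its ragged plateaus, and a binary pulse, are invariant under the median window filter:

```python
def median_filter(data, N):
    """One pass of the median window filter; border samples pass through."""
    k = N // 2
    return [sorted(data[i - k : i + k + 1])[k] if k <= i < len(data) - k else v
            for i, v in enumerate(data)]

k = 2
up_ramp = [1, 0] + [2, 3, 4] + [6, 5]   # k-plateau below, k+1 ramp, k-plateau above
pulse   = [0, 0] + [1, 1, 1] + [0, 0]   # binary pulse of k+1 data
assert median_filter(up_ramp, 2 * k + 1) == up_ramp   # an FP root
assert median_filter(pulse, 2 * k + 1) == pulse       # also an FP root
print("both sequences are fixed points")
```

Note that the plateaus may be "ragged" in any order, as long as every plateau datum stays on its side of the ramp.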
Figure 4. Alphabet of 1D FP roots.
The important point is that these FPs are independent of the absolute pixel values. The characteristic of a fixed-point pattern being a monotonic relationship of each of the φ(u)_j to all other φ(u)_j, j = 0, …, 2k, in a window i − k to i + k suggests that there is a definable grammar for the composition of subsets u_i to comprise a fixed-point set u, or sentence of data. The notion of a grammar of FP roots can be illustrated as follows. Suppose we assign the mnemonics ↑, ↓ and Π to the monotonic sequences of the three FP patterns described in the foregoing in (a), (b) and (c), and k⁺ and k⁻ to the increasing and decreasing "ragged" precursor and trailing data (Fig. 4). We may now see that the FP described in (a) is captured by the label k⁻↑k⁺, that of (b) by k⁺↓k⁻, and (c) by (k⁺ or k⁻)Π(k⁺ or k⁻). We can also see that the FPs denoted by these labels may be combined in certain allowable ways that preserve the character of being an FP. If we may assume the first and last k sequences of data appropriate for k⁺ or k⁻ for ↓ or ↑, respectively, we have the following allowable and nonallowable dyadic combinations that satisfy the conditions for an FP of the combination (Table II).

TABLE II
Grammar of 1D FPs

Allowed: ↑Π, Π↑, ↓Π, Π↓
Not allowed: ↑↓, ↓↑, ↑…↑, ↓…↓, Π…Π

In this structure, ↑, ↓ and Π are primitive characters, and k⁺ and k⁻ are constraints on the preceding and trailing k sequences of connective data. The character Π is universal, and ↑ and ↓ have a sense of polarity. Furthermore, the combination ↑…↑ denotes not a new dyad but a single k + 1 + n monotonically increasing sequence, and similarly for ↓…↓ or Π…Π. With these rules of combination, or grammar, we may compose allowable combinations of FP roots as words (Fig. 5) that are themselves FP roots, such as those shown in Fig. 4. It is important to realize here that the patterns ↑, ↓ and Π are patterns of relationships of the data and are independent of the actual values of the
data. There is much more work needed to be done on this notion of a grammatical structure, and there are variants of the aforementioned combination, such as overlaps of FP root patterns, still to be defined, but the possibilities of application are intriguing. For example, realizing our ability to constrain I(u, u_i) by R_FP(u_i), we may detect a sequence of such words as those in the preceding, separated by some specified interval of data. If such a sequence is found and represents a relationship of data that is a characteristic feature of some known object, then the sequence can be said to have semantic content and be regarded as a sentence. For example, a sentence S = k⁻[…Π…Π↓]k⁻ k⁻[ΠΠ↓Π]k⁻ would represent the structure shown in Fig. 6. It is important to note, however, that the sentence S described in the preceding is independent of the values of the data except insofar as they satisfy the relationship of values expressed by k⁻[…Π…Π↓]k⁻ k⁻[ΠΠ↓Π]k⁻; thus S is an intensional representation of Fig. 6. The notion of a syntactical structure of FP roots having semantic content, such as described earlier, will be explored more rigorously later. It is more important now to continue this line of thought into two dimensions.
Figure 5. Example FP word: k⁻[Π↓]k⁻.
Figure 6. Example FP sentence.
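One reading of the dyadic grammar above can be sketched as a word checker (our own symbols: U for the up-ramp, D for the down-ramp, P for the pulse; the rules follow the prose: the pulse is universal, the ramps have polarity and may not join each other directly, and X…X concatenations merge into a single longer primitive rather than forming a new dyad):

```python
# Hypothetical encoding of the dyadic FP grammar: the pulse P is
# universal, while the ramps U (up) and D (down) have polarity and
# may not be joined to each other directly.
ALLOWED = {("U", "P"), ("P", "U"), ("D", "P"), ("P", "D")}

def is_fp_word(word):
    """True if every adjacent pair of primitives is an allowed dyad."""
    return all(pair in ALLOWED for pair in zip(word, word[1:]))

assert is_fp_word("UPD")      # up-ramp, pulse, down-ramp: a "bump" word
assert not is_fp_word("UD")   # opposite ramps cannot join without a pulse
print("grammar check passed")
```

A sentence detector in the sense of the text would then look for several such words separated by specified intervals of connective k data.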
C. Two-Dimensional FP Analysis

We may construct 2D filters by 2D extension of 1D filters. The most obvious forms of this extension of the standard 1D median window filter of window N are the square filter □(L), L = N × N, or the cross filter ✚(L), L = 2N − 1, formed from a horizontal and a vertical window of N samples sharing a common center, for L = 2q + 1.
Examples of 2D filters ✚(L) and □(L), representing these structures derived from a 1D filter of N = 5, are presented in Fig. 7. These two filters are not the only kinds of 2D filter that one can construct, but they will serve to illustrate the analysis of 2D filter properties. The analysis of these filters will proceed with a mathematical investigation of the predicate and the ordering function, followed by a phenomenological investigation addressing FP roots. We mentioned earlier, in the derivation of the 1D case for I(u, u_i), that the distinction between 1D and 2D lies in the predicate given by the indexical scheme of the data and the sampling function. We will show this now, beginning with the indexical scheme. The data are assumed to be an array in X, Y for the case of 2D filters, with the origin in the upper left. The
Figure 7. 2D MW filter designs.
set of data u = {u_α} are defined on the array D ⊆ X × Y, so that when expressed in the form of designator terms the data are u_α = φ(u_α)ν_α, where the index α is an ordered pair α = (x, y), α = ((1, 1), …, (n, m)); thus u = (u_{(1,1)}, …, u_{(n,m)}). Following the same form used in the general derivation in the preceding, we may form the sampling function of u by a subsequence u_β = φ(u_β)ν_β, with β an ordered pair ranging over the window, β = ((0, 0), …, (2k, 2k)). Hereafter, the ordered-pair indices β and α will be abbreviated when convenient, and no confusion will result with the understanding that they are ordered pairs, that is, α = (x, y), or (x + k, y + k), etc. Hereafter, the distinction in 2D filters is provided by the sampling function, and the analysis proceeds as before for the 1D case. The constraints on the range of β reflect the design of the filter as the sampling of u. We further define an analogous ordering function O(u_α) = (φ(u_α)^{r_j}). The predicate R(u_α) is defined by u_α and any cor(c), such as the coincidence condition defined in 2D as the coincidence of the median entity of u_α with the median rank of O(u_α).

The square filter □(L) is introduced in the following with some trepidation because for L > 9 it is actually a false filter, in the sense that it does not have an isolated FP root; that is, its root is a quasi-root that spans the horizontal space of the data. An isolated root is one that is invariant under repeated passes of the filter. Furthermore, for L = 9, □(9) is actually not a square filter at all, but the smallest member of the class of octagonal ⊛(L) filters. We will show that the ⊛(L) filter is in fact a logical extension of the cross filter ✚(L).
For these reasons, we will start with the cross filter ✚(L), as its analysis is relatively straightforward and serves to introduce the methodology of 2D analysis.

1. ✚(L) Fixed Points

As we mentioned earlier, everything hinges on the predicate, with the sampling function and the correct conditions, specifically the coincidence condition and the condition for a contiguous set of data satisfying the coincidence condition. For that reason, let us begin with the ✚(L) sampling function. It is

u_α = (u_h ∪ u_v),  L = 2N − 1,  N = 2k + 1,  (28)

where u_h and u_v are the horizontal and vertical windows of N samples through the center (x, y). This sampling function results in the double sequence of Eq. (29), representing the intersection of the two arms of the "cross" filter ✚(L) shown in
Fig. 7:

u_h = (φ(u)_{(x−k, y)}, …, φ(u)_{(x+k, y)}),  u_v = (φ(u)_{(x, y−k)}, …, φ(u)_{(x, y+k)}),  u_h ∩ u_v = u_{(x, y)}.  (29)

We may increment u_α on x to x + k in the same way as for the 1D case, subject to the predicate forcing condition of coincidence as the filter slides along the X axis, to get a series analogous to Eq. (26). We may describe this series with an auxiliary ordering function O_h(u_α) for the horizontal arm, ordered over N:

O_h(u_α) = (φ(u)_{(x+k, y)} ≥ ⋯ ≥ φ(u)_{(x, y)} ≥ ⋯ ≥ φ(u)_{(x−k, y)}),  (30)

or the converse expression for a nonincreasing series analogous to Eq. (27). We may now increment u_α on y to y + k and get a similar series along the Y axis:

O_v(u_α) = (φ(u)_{(x, y+k)} ≥ ⋯ ≥ φ(u)_{(x, y)} ≥ ⋯ ≥ φ(u)_{(x, y−k)}).  (31)

We may produce the analogous converse for the predicate symbol '≤.' In similar fashion, if we decrement x and y by k from x + k and y + k, we get series analogous to the cases discussed for Eqs. (26) and (27). Equations (30) and (31), and their implied converses, describe two orthogonal monotonic sequences u_h and u_v connected by a common intersection, constrained by the coincidence condition such that φ(u)_{(x, y)} is the median of the ordering function, as the predicate for a fixed point of ✚(L).
It is important to note that the ordering function O(u_α) is not defined on the diagonal, reflecting the sampling-function geometry; that is, the adjacent diagonal datum is not in the sample space of the sampling function. If we tried to increment u_α on the diagonal, the median value would drop out of the ordering function on each increment. This means that diagonal monotonicity is not a requirement for the fixed points of ✚(L), in contrast to what we will find for ⊛(L). The functions G, H and ω are 2D extensions of the set of functions described by Eq. (24), wherein the unary indices of Eq. (24) are replaced by the ordered pairs α and β. The final condition for a 2D FP root of ✚(L) is that the data satisfying the coincidence condition are also members of a (k + 1) × (k + 1) set of contiguous data, all satisfying the coincidence condition.
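A minimal numeric check of the cross window (hypothetical helper names; the cross gathers N = 2k + 1 samples along each arm, counting the shared center once, for L = 4k + 1 samples): a uniform (k + 1) × (k + 1) pulse in a field of zeros passes through unchanged, consistent with its being an FP root:

```python
def cross_median(img, x, y, k):
    """Median of the cross-shaped window centered at (x, y)."""
    arm_h = [img[y][x + j] for j in range(-k, k + 1)]
    arm_v = [img[y + j][x] for j in range(-k, k + 1)]
    window = arm_h + arm_v[:k] + arm_v[k + 1:]   # center counted once: L = 4k + 1
    return sorted(window)[len(window) // 2]

k, size = 2, 9
img = [[0] * size for _ in range(size)]
for y in range(3, 3 + k + 1):                    # a (k+1) x (k+1) pulse of ones
    for x in range(3, 3 + k + 1):
        img[y][x] = 1
out = [[cross_median(img, x, y, k) if k <= x < size - k and k <= y < size - k
        else img[y][x] for x in range(size)] for y in range(size)]
assert out == img   # the pulse is invariant under the cross filter
print("pulse is a fixed point of the cross filter")
```

Border samples are passed through here only to keep the sketch small; the text's analysis concerns interior windows.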
Figure 8. (a) A uniform pulse pattern.
The 2D FP roots of ✚(L) are more complex than the 1D FP roots, as might be expected, and their combinations to form FP structures are similarly more complex. We may best visualize the FP roots of ✚(L) as a box with the top surface formed by a sheet connecting the 1D FP root patterns on each side. Imagine that the top surface is only moderately flexible in that it does not allow any kinks, much like the way a sheet of stiff cardboard might be flexed in various tilts and twists. The amplitude of the
Figure 8. (b) A saddle pattern.
top surface is the envelope of the values of the data in the pattern. A few examples of the FP structures are shown in Figs. 8a–8e; these are FPs for ✚(17), thus k = 4. The associated example data sets for these patterns are shown using arbitrary data values between 0 and 5 to illustrate the pattern. Please note that the (k + 1) × (k + 1) FP is shown with the precursor and trailing k⁺ or k⁻ patterns of the "ragged" plateaus of data of length k. The patterns are shown in X, indexed by x, and Y, indexed by y. We should also
Figure 8. (c) A wedge pattern.
note that these patterns are defined by the types of relationships on the boundaries, and by local monotonicity in the interior. As such, the data satisfying those boundary relationships and that monotonicity are not a unique pattern of values in all cases, but in all cases are independent of the absolute values of the data. The first example is of a uniform pulse of data wherein the data are all
Figure 8. (d) An inclined saddle pattern.
Π-type monotonic patterns, that is, u_h = (φ(u)_{(x−k, y)} = ⋯ = φ(u)_{(x+k, y)}) and u_v = (φ(u)_{(x, y−k)} = ⋯ = φ(u)_{(x, y+k)}) for every α from (x, y) to (x + k, y + k). This pulse is of uniform amplitude 5, and we should note, here and in what follows where applicable, that the precursor and trailing plateaus are shown as k⁻ patterns but could
Figure 8. (e) A diagonal wedge pattern.
just as easily have been k⁺ because of the '=' predicate relating the data for a pulse.

The FPs detected by ✚(L) are, as shown in the preceding, a pattern of data determined by a monotonic structure within themselves that also satisfies a relation with neighboring k data, all '≥' or '≤,' forming a finite set of possible types of roots. If we restrict the predicates of the monotonic structures of the roots to '=' and '≥' or '≤' for (k + 1) × (k + 1) sets, or "tiles," we have potentially 81 possible types of FP patterns. This number
Figure 9. Complex FP of a pulse, wedge, and diagonal wedge: (a) original image; (b) ✚(13) iff R_FP(u); (c) and (d) ✚(13) iff R_FP±(u).
is reduced to 54, however, by the constraint of being connected at the vertices of the FP, and further reduced to 15 fundamental patterns by disregarding patterns equivalent under rotation. Thus, we have an alphabet of 15 fundamental patterns {e_b}, b = 1, …, 15, each e_b satisfying R_FP(u), including the five shown in Fig. 8. The important fact is that there are only 15 patterns.

Now we can see quite readily how some of these few patterns in Fig. 8 can be combined into a complex FP pattern. The following example, shown in Fig. 9, involving the wedge, pulse and diagonal wedge, is an FP pattern buffered by a k⁻ of "0." This pattern could be further combined with π/2 rotations of the wedge and diagonal wedge on matching edges, and buffered by k⁻ of "0," to form a 2D image of Fig. 5. Suffice it to say that it could also be combined with π/2 rotations of itself to form a similar image, and other, more complex and extensive FP surfaces could be imagined involving the other patterns. The solid lines in the complex FP shown in Fig. 9 indicate that the four co-joined FPs share a common set of pixels on the adjoining edge. This results in each pattern satisfying the k-data requirements of the other patterns as well as the monotonic requirements of each pattern. This aspect of the co-joining of FP patterns will be addressed again in Sections III.D and III.E. With some understanding of the structure of a complex FP, we may illustrate the application of the predicate-constrained ✚(L) for extracting features of an object in an image.
2. Application to Feature Extraction

The application of the RO filter to feature extraction may be realized by enforcing a condition on the predicate such that the filter function is not necessarily zero only where the data satisfy the correct conditions of an FP root. The implementation of this logic is described by Eq. (23), I(u, u_i) = ω(u, u_1) · K(R_c(u_1)) + ⋯ + ω(u, u_n) · K(R_c(u_n)), where K(R_c(u)) = 1 when the data satisfy cor(c), that is, R_c(u_i) is true, and K(R_c(u)) = 0 when R_c(u_i) is false, in which case the associated ω(u, u_i) contributes 0 to I(u, u_i). A collection of juxtaposed, and thus grammatically compatible, R_FP(u_α) represents a predicate of an object in an image composed of FP roots and is denoted as R(u) = {R_FP(u_α)}, α = 1, …, n.

The application of RO filters for feature extraction assumes that objects of interest are composed of a set of related features distinguished by their monotonic structure from features not comprising objects of interest, and that the composition of features constituting the object is a relational structure. This assumption clearly does not apply to all images and objects of interest, but may apply to many cases, such as man-made objects embedded in clutter. The potential of the RO filter for feature extraction of objects embedded in clutter is illustrated with preliminary results in Fig. 10. This application is of ✚(L), L = 13, constrained by the condition of coincidence and inclusion in (k + 1) × (k + 1) sets, denoted by ✚(13) iff R_FP(u). The representation of the features is by the FPs detected by this constrained filter and is shown both with and without the precursor and trailing k⁺ and k⁻ patterns that define the complete root. The representation with the k⁺ and k⁻ data is denoted by ✚(13) iff R_FP±(u).

The results shown in Fig. 10 are: (a) original image of a navigation marker; (b) filtered image subject to constraints for selecting LOMO(q + 1) FP structures; (c) filtered image composed of complete FP roots, that is, ✚(13) iff R_FP±(u); and (d) ✚(13) iff R_FP±(u) applied to (a) rotated by 90°. These results demonstrate that feature extraction by this method is independent of the shape and gray level of the object. The dimension of the input image was 840 × 1024 pixels and required approximately 100 s of computation on a Sun workstation to produce Fig. 10(c).

We can see that the constrained filter is rather efficient in detecting the feature of the numbered panel, and conversely is efficient in rejecting the various kinds of clutter surrounding it. We can also see that the FP roots have internal variation in intensity and that they combine to form a recognizable object. The numbered panel is a reasonably smooth, albeit
(a) Figure 10. Chesapeake Navigation Marker.
lumpy, structure, and as such it satisfies the predicate constraints for many of the FP patterns {e_b}. The other parts of the image do not satisfy these constraints, by the nature of being clutter, except by chance, and the result is the occasional appearance of isolated patterns that are detected. The exception is the detection of the sky. An observer might note, however, that the sky appears as an unstructured feature, as does the partially obscured orthogonal panel, both being recognized by association in the local context. It is important to note that some of the facing panel could be obscured and we would still recognize it as part of a feature. Finally, we can see that the application is fairly robust under rotation, as seen in image (d).
(b) Figure 10. Continued.
By inspection of Figs. 10(b), (c) and (d), we can verify that the FP patterns {e_b} are a set of relational patterns independent of absolute pixel value. This means that they represent the property of relation in that region of the image independently of the extension of the image; thus the {e_b} are an intension of the image in that region. This explains the claim that an image of a feature represented by an ordered sequence of e_b, (e_b1, …, e_bn), is an intensional representation of that feature. As a practical matter, we cannot say that any image of that feature will have the same representation by (e_b1, …, e_bn). We can say, however, that any image from a set of images of that feature obtained within the limits of linear exposure, equivalent range, perspective and resolution, and digitized at the same level, will have the same intensional representation. Furthermore, as individual objects are members
(c) Figure 10. Continued.
of a sort of object, features are members of a sort of feature defined by a common set of intensions referred to as intensional properties. Images of members of that sort form families of images over range and perspective. This family of images is usually a fairly large set of possible images for which the intensional property of the feature holds true in practice. We should also point out that it is certainly possible to ‘‘fool’’ the filter with camouflage in the same way that human observers are deceived by camouflage. The relations required between the monotonic data and neighboring k data, or ‘‘ragged’’ plateaus, result in the roots being constrained by rules of combination so that the co-joining or partial superposition of roots satisfy a kind of grammatical compatibility. The demonstrated capability to extract
(d) Figure 10. Continued.
and represent features intensionally with a finite set of grammatically constrained {e_b} implies that it may be possible to define a language system of imagery based on a syntax of {e_b}. If we can devise a language system, then we may be justified in exploring a language model of this system that would provide a semantic valuation in the calculus of the system. The purpose of such a language model would be to link an image directly to a logical model of object recognition in order to implement a system of automated image interpretation. This subject will be addressed in more detail later in this chapter. The next section will address the grammar, or compatibility, for combining the various FP patterns.

D. The Grammar of ✚(L) Fixed-Point Root Combinations
The discussion in this section is new and has not been previously presented or published. The purpose here is to develop the schema for computing the natural grammar of allowable combinations of FP patterns e_b. The development of this schema is presented in a detailed, step-by-step progression and is not abstracted into more elegant and parsimonious mathematics, as the concern here is for understanding the dynamics of the situation.
Figure 11. Join of two FPs by '∘'.
The method for realizing the compatibility (or grammar) of the various FP patterns e_b to join together involves determining the relational predicate R_γ(e_a, e_b) for combining any two e_a and e_b by a connective γ such that R_γ(e_a, e_b) is true (T), that is, they are compatible and form a complex FP, or false (F), that is, they are not compatible to form a complex FP. Thus, the problem is the evaluation of R_γ(e_a, e_b) ∈ {T, F}. This relational predicate is central to the language of imagery referred to in Section III.C.2, so as to constitute a nonclassical propositional language L(I) to be discussed later.

The compatibility of e_a and e_b, that is, that the truth value of R_γ(e_a, e_b) is T for γ(e_a, e_b), is a problem of matching the LOMO(k + 1, k + 1) relationship in X and Y of e_a and e_b. For our purposes here, we will assume, without any loss of generality, that γ is the simple connective '∘', interpreted as two FPs e_a and e_b joined as in Fig. 11. The FPs in Fig. 11 are indicated by ↑ and ↓, and the dashed enclosures of k denote the k⁺ and k⁻ data patterns as appropriate to the FPs. If we may assume R_γ(e_a, e_b) is T, then the join of the FPs by ∘ forms a complex FP. Our purpose here is to develop the method of testing R(e_a, e_b) such that γ(e_a, e_b) iff R(e_a, e_b). The method of evaluating R(e_a, e_b) must also include the rotation of the FPs. This means that R(e_a, e_b) is more properly stated as R(e_a(θ), e_b(θ)), where θ is the rotation (0, π/2, π, or 3π/2) of the FPs. We will defer this complication for now and address it later.

1. Representation of Patterns

An FP pattern e_b may be described in vector notation defined in a right-handed space with unit vectors e_x, e_y and e_z of X, Y and Z, as shown in Fig. 12. The length of the e_x and e_y components of an FP is either 0 or at least k + 1, and we shall denote this as 0 or κ. The e_z component in actual patterns is a variable, but this analysis need not be concerned with that complexity. This is because we are concerned only with the continuity of the slope of the vectors as a monotonic sequence of data that are all '≥'
Figure 12. Coordinate system of FPs.
or ‘‘ .’’ This simplification follows from the e@ being an intensional representation of the actual data. The continuity of slope from one FP to another is the measure of compatibility and we denote the magnitude of the e component as 1, 0 or 91. With the conventions described in the foregoing, we now imagine an FP pattern as a square composed of four vectors a, b, c and d, and points indexed 1, 2, 3, and 4 of the four corners of the pattern as shown in Fig. 13.
Figure 13. Vector representation of FP.
We may now describe the FP as a matrix of vector components in e_x, e_y and e_z:

e_b: a = (a_x, a_y, a_z), b = (b_x, b_y, b_z), c = (c_x, c_y, c_z), d = (d_x, d_y, d_z).  (32)

The components of a, b, c and d are computed as coordinate differences of the corner points they connect, indexed by the points of the FP pattern, for example,

a_x = x_2 − x_1,  a_y = y_2 − y_1,  a_z = z_2 − z_1,  (33)

and similarly for b, c and d. With this convention (recall that the pattern length of k + 1 or greater is denoted κ), we may see that any pattern may be described by a matrix as follows:

e_b: a = (0, κ, a_z), b = (κ, 0, b_z), c = (κ, 0, c_z), d = (0, κ, d_z).  (34)

This representation permits the distinction of all ✚(L) FP pattern "types" by the e_z components of the vectors a, b, c and d as 1, 0, or −1. We will refer to this representation as the e_b form, where b ∈ {P}, P being the isomorphic mapping of the finite set of FP pattern types to the set of positive integers.
272
BART WILBURN
subtle, principle also at play here and it is that these patterns are defined to already exist by hypothesis. The significance of this statement lies in the realization that we are not designing actual patterns to be joined. Rather, we are testing for the recognition, that is, the ‘‘name,’’ of the types of each of two patterns that can share related data as one existing complex FP. We must keep in mind that each of the patterns constituting the complex FP also accounts for the k and k\ data of each other extending from the mated edges. This principle governs the distinction between the necessary and sufficient conditions for compatibility and, indeed, implies an important, if not subtle, constraint: Two FP patterns joined along some edge necessarily share the data on the mated edge. For example, two patterns joined along d:a, implying the left and right placement, have a common edge that is the vector d of the pattern (i) on the left, and the vector a of the pattern ( j ) on the right such that d a. This constraint results in some unexpected realizations. For example, suppose we were to join two patterns that when isolated appeared as two wedges, one inverted with respect to the other. Upon joining constrained by a common edge, one pattern appears as the original wedge in e@ form corresponding to the left-hand side of the join, but the other appears as a saddle in e@ form. The saddle is defined by its form in the e@ representation even though it is a ‘‘step down’’ and a ‘‘step up’’ in the first column after the a edge at points 1 and 2, respectively. This illustrates the significance of the prior existence assumption of the patterns. Their combination already exists as a complex FP of kinds of e@, and the objective is to identify their type knowing that only certain types may exist in combination. 
The problem is to be able to compute the possibilities of types that may exist in combination, as opposed to computing the types of independent patterns that may be combined. It is a subtle but important distinction that we will illustrate as we continue. We may return to the point made in the preceding, before the caveat on existence and common edges: the surface of an e^β may be characterized by the four normal vectors at the corners. What we have is the situation of the four vectors a, b, c and d, and the points 1, 2, 3 and 4, as shown in Fig. 13. The normal vectors are indexed by the points and computed according to:

n₁ at point 1: b × a
n₂ at point 2: c × a
n₃ at point 3: c × d
n₄ at point 4: b × d.     (35)
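The corner normals can be sketched numerically. In this sketch the edge layout is an assumption consistent with Eqs. (35) and (36): edges b and c run along X, edges a and d along Y, each of length 7 (the pattern size), with z-components b₃, c₃, a₃, d₃.

```python
# Sketch of Eqs. (35)-(36): corner normals of an FP pattern surface.
# Edge layout is assumed: b, c run along X; a, d run along Y; length 7.

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

def corner_normals(a3, b3, c3, d3, size=7):
    a = (0, size, a3)   # edges a, d run in Y
    d = (0, size, d3)
    b = (size, 0, b3)   # edges b, c run in X
    c = (size, 0, c3)
    n1 = cross(b, a)    # point 1
    n2 = cross(c, a)    # point 2
    n3 = cross(c, d)    # point 3
    n4 = cross(b, d)    # point 4
    # normalize by 1/size, as in Eq. (36)
    return [tuple(x / size for x in n) for n in (n1, n2, n3, n4)]

# wedge pattern: a3 = d3 = 1, b3 = c3 = 0
print(corner_normals(1, 0, 0, 1))
```

Each normalized normal comes out as (−b₃ or −c₃, −a₃ or −d₃, 7), matching Eq. (36).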
THEORY OF RANKED-ORDER FILTERS
Thus, at point 1, we have n₁ computed as:

n₁ = b × a = det | e₁  e₂  e₃ |
                 | 7   0   b₃ |
                 | 0   7   a₃ |

n₁ = −7b₃ e₁ − 7a₃ e₂ + 7² e₃.

The other normal vectors follow in a similar manner, and we may write them normalized by 7⁻¹ with the result:

n₁ = −b₃ e₁ − a₃ e₂ + 7 e₃
n₂ = −c₃ e₁ − a₃ e₂ + 7 e₃     (36)
n₃ = −c₃ e₁ − d₃ e₂ + 7 e₃
n₄ = −b₃ e₁ − d₃ e₂ + 7 e₃.

In this description the normal vectors are scaled by a constant 7, the size of the pattern, and referenced to a common origin in coordinate space.

3. Criteria for Compatibility

We may test for the compatibility of two FP patterns by using the normal vectors to describe the slope of the surfaces of the two patterns. We do this by computing the direction cosines of the normal vectors in X and Y at mated vertices, or points. The test for compatibility is that the sign of the direction cosine for the e₁ component of one pattern does not change at the junction with the other pattern at the adjoining point. This is also the case for the e₂ component. We may visualize this in 1D. Suppose we imagine two patterns e^α and e^β, a ramp and a pulse, respectively, joined along the d:a vectors at points 4:1 and 3:2. (The superscripts refer to e^α and e^β.) Suppose further that we project the junction at point 4:1 onto the X-Z plane. The situation may be presented as shown in Fig. 14 for the junction of b^α and b^β, and the projection of the normal vectors n₄^α and n₁^β of patterns 1 and 2 for points 4 and 1, respectively.

In the case shown in Fig. 14, we can see that the direction cosine of n₄^α is negative, cos θ^α < 0, and that of n₁^β is zero, cos θ^β = 0. Thus, the direction cosine does not change sign in the transition to e^β, and the two patterns are compatible at the junction of 4:1. The computational test is sign(cos θ^α) · sign(cos θ^β) ≥ 0 for the junction of k:l. The same test
Figure 14. Normal vectors of joined FP edges.
applies for the Y component in the Y-Z plane, by virtue of

cos θ_X = (n · e₁)/|n|  and  cos θ_Y = (n · e₂)/|n|.

This test is tantamount to:

sign(n_Xⁱ · n_Xʲ) = 0 or 1, and sign(n_Yⁱ · n_Yʲ) = 0 or 1     (37)

for any two patterns at the junction of points (k:l). We can see, by reference to Eq. (36), that the criterion of Eq. (37) is expressed in terms of the Z components a₃, b₃, c₃ and d₃. Indeed, we will find it useful to define two matrices S_X and S_Y as follows:

        pt.1  −b₃            pt.1  −a₃
S_X :   pt.2  −c₃    S_Y :   pt.2  −a₃     (38)
        pt.3  −c₃            pt.3  −d₃
        pt.4  −b₃            pt.4  −d₃

The S_X and S_Y matrices represent the X and Y components of the normal vectors of the ith pattern for each of the four points of the pattern. We may
recall that the Z components were defined to be 1, 0, or −1. Thus, the S_X and S_Y matrices provide an immediate representation of the sign of the direction cosines for use in the test of compatibility. The S_X and S_Y matrices are useful for two purposes: for providing the direction of the slope of the surface at the four points as shown in the foregoing, but also for computing the slope of the surfaces under rotation through π/2, π, and 3π/2. To do this, we hold the labels of the points 1 to 4 and the vectors a, b, c and d fixed in space and transform the values of a₃, b₃, c₃ and d₃ to form S_X(θ) and S_Y(θ). The rotational state of a pattern is incorporated into the notation as e^β(θ), where the condition θ = 0 is the normal form of the pattern. The transform is computed by a permutation matrix P, applied once per quarter turn to the z-components of an e^β pattern in normal form, as shown in what follows:

| a₃(θ) |        | a₃(0) |          |  0   0   1   0 |
| b₃(θ) | = Pⁿ · | b₃(0) | ,   P =  | −1   0   0   0 |
| c₃(θ) |        | c₃(0) |          |  0   0   0  −1 |
| d₃(θ) |        | d₃(0) |          |  0   1   0   0 |

θ = nπ/2,  n = 0, 1, 2, 3.     (39)
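The rotation bookkeeping can be sketched in a few lines. The quarter-turn map used here is an assumption, chosen to be consistent with the wedge example given later in this section (b₃ = c₃ = 1 at θ = 3π/2):

```python
# Sketch of Eq. (39): rotating a pattern's z-components (a3, b3, c3, d3)
# by theta = n*pi/2 via repeated application of a quarter-turn permutation.
# The map below is an assumption consistent with the worked wedge example.

def rotate_quarter(z):
    a3, b3, c3, d3 = z
    return (c3, -a3, -d3, b3)

def rotate(z, n):
    for _ in range(n % 4):
        z = rotate_quarter(z)
    return z

wedge = (1, 0, 0, 1)          # a3 = d3 = 1, b3 = c3 = 0
print(rotate(wedge, 3))       # theta = 3*pi/2 gives (0, 1, 1, 0)
```

Four applications return the pattern to its normal form, as a rotation by 2π should.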
This matrix provides us with what we need for computing the compatibility of any two patterns in some state of rotation from the normal form.

4. Computation of Compatibility

We argued in the preceding that the criterion for compatibility was the continuation of the slope in X and Y, either nondecreasing or nonincreasing, from one pattern to another. We also showed that the criterion is satisfied at the points of a mated corner (k:l) of eⁱ and eʲ if the X and Y components of n_kⁱ and n_lʲ are of the same sign or neutral. From this it follows that the corners of a mated edge must be a compatible pair of corners defining that edge, namely, (k:l) & (m:n). (The symbol "&" is the logical conjunction of sentences, whereas "∧" is the logical conjunction of atomic sentences.) Furthermore, the satisfaction of the slope criteria at mated corners, even for two sets of corners, is a necessary but not a sufficient condition for compatibility. Sufficiency is provided by an additional condition defining a common edge, and we will address that later. Nevertheless, for a necessary condition we have (n_Xkⁱ n_Xlʲ ≥ 0 ∧ n_Xmⁱ n_Xnʲ ≥ 0) & (n_Ykⁱ n_Ylʲ ≥ 0 ∧ n_Ymⁱ n_Ynʲ ≥ 0). Let us consider an example. Suppose two patterns eⁱ and eʲ are found joined as shown in Fig. 15, where (k = 4 : l = 1) & (m = 3 : n = 2). (N.B.: k and m are points in eⁱ, and l and n are in eʲ.)
Figure 15. Corner notation of joined FPs.
A necessary but not sufficient condition for identification as a compatible pair is:

n_Xkⁱ n_Xlʲ ≥ 0 ∧ n_Xmⁱ n_Xnʲ ≥ 0  &  n_Ykⁱ n_Ylʲ ≥ 0 ∧ n_Ymⁱ n_Ynʲ ≥ 0.

By referring to the S_X and S_Y matrices in Eq. (38), we see that this example of satisfying a necessary condition for compatibility amounts to:

Points (4:1): (−b₃ⁱ)(−b₃ʲ) ≥ 0 ∧ (−d₃ⁱ)(−a₃ʲ) ≥ 0
&
Points (3:2): (−c₃ⁱ)(−c₃ʲ) ≥ 0 ∧ (−d₃ⁱ)(−a₃ʲ) ≥ 0.

We need to link the example calculation for compatibility of Fig. 15 to a scheme for testing all possible mated edges. We may accomplish this linkage with a computational scheme based on the formation of a 2 × 2 matrix

m_kⁱ = | n_X^k    0   |
       |   0    n_Y^k | ,   k = 1, . . . , 4 points of the ith FP pattern.

We may now form a vector of the four m_kⁱ characterizing the ith FP as:

mⁱ = (m₁ⁱ, m₂ⁱ, m₃ⁱ, m₄ⁱ)

and then use this vector to represent the join of the ith FP to the jth FP with the computation of M^{ij} = mⁱmʲ′ (the prime denotes transpose). The calculation of M^{ij} = mⁱmʲ′ is another function subject to a predicate condition, in fact two of them. These conditions are in the form of
combinatorial constraints resulting from the indexing scheme of the points of the patterns. We will dispense with the formal expression and simply state them. The first constraint determines the form of the matrix, and the second determines the interpretation of it. The first constraint is that the mated corners allowable without rotation of either the ith or the jth pattern are those whose index sum, for example (k + l), is an odd number. This constraint results in:

         |   0      m₁m₂′     0      m₁m₄′ |
M^{ij} = | m₂m₁′      0      m₂m₃′     0   |     (40)
         |   0      m₃m₂′     0      m₃m₄′ |
         | m₄m₁′      0      m₄m₃′     0   |

where m_k m_l′ abbreviates m_kⁱ m_lʲ′. The second constraint is simply that the sum of the four corner indices of a single pattern is the sum of the indices of two mated edges, namely, (k + l) + (m + n) = 10. In other words, the four corners of any two mated edges must also be the four corners of a single pattern. This constraint, together with the first one, that (k + l) and (m + n) are both odd numbers, links the m_kⁱm_lʲ′ pairs that are junctions of a possible common edge. These constraints result in the following possibilities of mated edges:

A: m₂ⁱm₁ʲ′ & m₃ⁱm₄ʲ′ : c = b
B: m₄ⁱm₁ʲ′ & m₃ⁱm₂ʲ′ : d = a     (41)
C: m₁ⁱm₂ʲ′ & m₄ⁱm₃ʲ′ : b = c
D: m₁ⁱm₄ʲ′ & m₂ⁱm₃ʲ′ : a = d.

The question before us now is to compute the satisfaction of the condition for compatibility of two FP patterns mated on one or more of the possibilities given by Eq. (41). To compute this compatibility, we need to introduce the notion of truth of Eq. (41) as a necessary condition for compatibility. The notion of truth in the context of Eq. (41) applies to the predicate statement, for any two given FPs present in some state of rotation, that they satisfy the condition for compatibility. The notion of truth was presented in the beginning of this section as R_c(e^α, e^β) ∈ {T, F}, where R_c(e^α, e^β) is a predicate of e^α and e^β that they satisfy some conditions described by R_c, and where ⊕_c is a conjunctive for the join of e^α and e^β. Those conditions must be the necessary and sufficient conditions for compatibility such that R_c(e^α, e^β) ∈ {T, F} and (e^α ⊕_c e^β) iff R_c(e^α, e^β). That is, a complex FP can be formed by the join according to ⊕_c of e^α and e^β if and only if what the predicate R_c(e^α, e^β) says about them is true. We have developed the basis for a
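The two index constraints can be enumerated directly. In this sketch the corner-to-edge incidence (a:{1,2}, c:{2,3}, d:{3,4}, b:{4,1}, read off the normal-vector definitions of Eq. (35)) and the requirement that mated edges be opposite edges of the quadrilateral are assumptions:

```python
from itertools import product

# Corner-to-edge incidence assumed from Eq. (35):
# n1 = b x a, n2 = c x a, n3 = c x d, n4 = b x d.
EDGES = {frozenset({1, 2}): 'a', frozenset({2, 3}): 'c',
         frozenset({3, 4}): 'd', frozenset({4, 1}): 'b'}
OPPOSITE = {'a': 'd', 'd': 'a', 'b': 'c', 'c': 'b'}

def mated_edges():
    found = set()
    for k, l, m, n in product(range(1, 5), repeat=4):
        odd = (k + l) % 2 == 1 and (m + n) % 2 == 1   # first constraint
        if odd and (k + l) + (m + n) == 10:           # second constraint
            ei = EDGES.get(frozenset({k, m}))
            ej = EDGES.get(frozenset({l, n}))
            if ei and ej and OPPOSITE[ei] == ej:
                found.add((ei, ej))
    return sorted(found)

print(mated_edges())   # the four joins of Eq. (41): a:d, b:c, c:b, d:a
```

Only the four edge pairings of Eq. (41) survive the two constraints.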
necessary condition in Eq. (41), and we may apply the notion of truth to it: at least one of the possibilities of Eq. (41) must be logically true, in the sense that eⁱ and eʲ are compatible on at least one of the possibilities of Eq. (41). We will soon add another condition to Eq. (41) such that the two together are the necessary and sufficient conditions represented in R_c(e^α, e^β) ∈ {T, F}. Let us first tackle the incorporation of Eq. (41) in R_c(e^α, e^β).

The test of R_c(e^α, e^β) ∈ {T, F} is the logical truth of the sentences A, B, C and D, that at least one of them is true, namely, that the sentence (A ∨ B ∨ C ∨ D), where "∨" is the logical symbol of disjunction read as "or," is true. The sentences A, B, C and D are closed under subsentences, or atomic sentences, in conjunction. (An atomic sentence is an element having semantic content, namely, it is T or F, and is indivisible in semantic content.) This means that the valuation of A, B, C and D, namely V(A) ∈ {T, F}, is determined by the conjunction of the truth values of all of the subsentences of each of them. This leads us to the question of determining the truth value of the atomic sentences of A, B, C and D, and for this we need to define a logical operator L on m_kⁱm_lʲ′: L(m_kⁱm_lʲ′) ∈ {T, F}. With L we may, for example, interpret A in the context of (e^α ⊕_c e^β) iff R_c(e^α, e^β) as:

For eⁱ and eʲ, V(A) = V[L(m₂ⁱm₁ʲ′) & L(m₃ⁱm₄ʲ′)];
V(A) = T iff L(m₂ⁱm₁ʲ′) = T & L(m₃ⁱm₄ʲ′) = T.
If V(A) = T, then R_A(eⁱ, eʲ) is T, and (eⁱ ⊕_A eʲ) is T, where ⊕_A is a join on c = b.

The other conditions B, C and D follow in a similar manner. The logical operator L is based on an assignment of the truth value of an atomic sentence "p": "V(p) = T" if p ≥ 0 and "V(p) = F" if p < 0, and is applied to the diagonal elements of m_kⁱm_lʲ′:

m_kⁱm_lʲ′ = | n_X^k n_X^l       0       |
            |      0       n_Y^k n_Y^l |

L(m_kⁱm_lʲ′) = [V(n_X^k n_X^l) & V(n_Y^k n_Y^l)]     (42)

L(m_kⁱm_lʲ′) = T iff n_X^k n_X^l ≥ 0 and n_Y^k n_Y^l ≥ 0.
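Since the m matrices are diagonal, Eq. (42) reduces to a pair of sign tests on the diagonal; a minimal sketch:

```python
# L of Eq. (42): given m_k = diag(n_X, n_Y) for each pattern, the product
# m_k^i m_l^j' is diagonal, and L is T iff both diagonal entries are >= 0.

def L(mk, ml):
    nx_prod = mk[0] * ml[0]     # X diagonal of the matrix product
    ny_prod = mk[1] * ml[1]     # Y diagonal
    return nx_prod >= 0 and ny_prod >= 0

print(L((1, -1), (0, -1)))   # True: diagonal products are 0 and 1
print(L((-1, 1), (0, -1)))   # False: the Y product is -1
```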
5. Example Calculation

Let us assume a test of the compatibility of a saddle pattern e¹(θ) and a wedge pattern e²(θ), both in normal form (θ = 0). The e^β(θ) forms are given in Fig. 16. The test is the satisfaction of at least one of the conditions of Eq. (41), and this is found by inspection of M¹² = m¹m²′ with L(m_k¹m_l²′). The m_k for the two patterns follow the general form

m_k = | −b₃ or −c₃       0       |
      |      0      −a₃ or −d₃  |

so that for the saddle (a₃ = −1, b₃ = 1, c₃ = −1, d₃ = 1):

m₁¹ = | −1  0 |   m₂¹ = | 1  0 |   m₃¹ = | 1   0 |   m₄¹ = | −1   0 |
      |  0  1 |         | 0  1 |         | 0  −1 |         |  0  −1 |

and for the wedge (a₃ = d₃ = 1, b₃ = c₃ = 0):

m₁² = m₂² = m₃² = m₄² = | 0   0 |
                        | 0  −1 |

With these data, we compute M¹² = m¹m²′. Every product m_k¹m_l²′ is diagonal with entries (0, −n_Y^k), so that

m₁¹m_l²′ = m₂¹m_l²′ = | 0   0 |    and    m₃¹m_l²′ = m₄¹m_l²′ = | 0  0 |
                      | 0  −1 |                                 | 0  1 |

We can see the following results immediately by inspection:

V(A): L(m₂¹m₁²′) = F; L(m₃¹m₄²′) = T; thus V(A) = F and R_A(e¹(0), e²(0)) is F.
V(B): L(m₄¹m₁²′) = T; L(m₃¹m₂²′) = T; thus V(B) = T and R_B(e¹(0), e²(0)) is T.
V(C): L(m₁¹m₂²′) = F; L(m₄¹m₃²′) = T; thus V(C) = F and R_C(e¹(0), e²(0)) is F.
V(D): L(m₁¹m₄²′) = F; L(m₂¹m₃²′) = F; thus V(D) = F and R_D(e¹(0), e²(0)) is F.

Thus (e¹(0) ⊕_B e²(0)) is T, where ⊕_B is a join on d = a.
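The whole example can be replayed in a few lines. The z-components used below are those of the saddle and wedge as reconstructed above, and the corner pairings are those of Eq. (41):

```python
# Worked sketch of the saddle/wedge example. Each m_k is the diagonal
# (n_X, n_Y) at point k, per Eq. (38).

def corner_ms(a3, b3, c3, d3):
    return [(-b3, -a3), (-c3, -a3), (-c3, -d3), (-b3, -d3)]

def L(mk, ml):
    return mk[0]*ml[0] >= 0 and mk[1]*ml[1] >= 0

# corner pairings of Eq. (41): A: c=b, B: d=a, C: b=c, D: a=d
PAIRS = {'A': ((2, 1), (3, 4)), 'B': ((4, 1), (3, 2)),
         'C': ((1, 2), (4, 3)), 'D': ((1, 4), (2, 3))}

def valuations(mi, mj):
    out = {}
    for cond, ((k, l), (m, n)) in PAIRS.items():
        out[cond] = L(mi[k-1], mj[l-1]) and L(mi[m-1], mj[n-1])
    return out

saddle = corner_ms(-1, 1, -1, 1)
wedge = corner_ms(1, 0, 0, 1)
print(valuations(saddle, wedge))   # {'A': False, 'B': True, 'C': False, 'D': False}
```

Only condition B (a join on d = a) holds; rotating the wedge z-components to (0, 1, 1, 0), i.e. θ = 3π/2, makes condition C hold instead, matching the variation worked next.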
Figure 16. e¹(0): saddle and e²(0): wedge.
To further illustrate the computational scheme for compatibility, let us imagine a variation on the foregoing example wherein the wedge is rotated by θ = 3π/2. For this case, we consult the S_X(θ) and S_Y(θ) matrices. Recall

S_X: pt.1 −b₃, pt.2 −c₃, pt.3 −c₃, pt.4 −b₃
S_Y: pt.1 −a₃, pt.2 −a₃, pt.3 −d₃, pt.4 −d₃.

These matrices for the wedge in normal form (cf. Fig. 16) are:

S_X(0) = |  0 |        S_Y(0) = | −1 |
         |  0 |                 | −1 |
         |  0 |                 | −1 |
         |  0 |                 | −1 |

or S_X(0) = [0] and S_Y(0) = −[1]. For θ = 3π/2, we consult Eq. (39) and get S_X(3π/2) = −[1] and S_Y(3π/2) = [0] (Fig. 17). Thus, denoting the rotated wedge e²(3π/2) by e²′, b₃′ = c₃′ = 1 and a₃′ = d₃′ = 0.
Figure 17. e²(3π/2): wedge rotated by 3π/2.
We substitute the foregoing values for m_l²′, l = 1, . . . , 4, and repeat the computation of M¹² = m¹m²′ to obtain the result:

V(C): L(m₁¹m₂²′) = T; L(m₄¹m₃²′) = T; thus V(C) = T and R_C(e¹(0), e²(3π/2)) is T.

Thus (e¹(0) ⊕_C e²(3π/2)) is T, where ⊕_C is a join on b = c′.

6. Necessary and Sufficient Conditions for Compatibility

We have shown how satisfying at least one of the four truth conditions of Eq. (41) is a necessity, but we can illustrate quite easily how it is not a sufficient condition. Consider the case of a pulse joined to any pattern, but in particular to a saddle. In this case, the m_k are all [0], and thus so are the entries of M^{ij}, implying satisfaction of all of Eq. (41). However, we readily see that a pulse is incompatible with a saddle. In the language of logic, if Φ is the state of satisfaction of one of Eq. (41) and "Comp" is the state of compatibility, then what we have shown is the material implication (Φ ⊃ Comp), whereas what we need to show is the necessary entailment (Φ ⇒ Comp). The problem of the pulse arises out of the ambiguity of the end points of the flat edge relative to the end points of the adjoining orthogonal edge. We can solve this problem by recalling the fundamental assumption that any two patterns joined together already exist as FPs in their combined state, namely, that the k and k⁻¹ conditions are satisfied. Furthermore, we must remember that the intent in testing the patterns is not to design compatible pairs, but to compute the natural grammar of allowable patterns that can be found joined to form a complex FP. Finally, we must not lose sight of the relational nature of the data comprising the patterns. The solution is to
recognize that any two patterns found joined together share a common edge that stands in relation to the data on either side of it, satisfying the requirements of an FP root. This recognition of the shared edge is enforced if we impose an additional condition for compatibility: that the cross product of the joined edges is zero. Thus, for example, if condition A of Eq. (41) is satisfied, V(A) = T, then the additional condition to be satisfied is that c × b = 0. Thus V(A) = T & c × b = 0 are the necessary and sufficient conditions for this case. If we let N stand for any of the Eq. (41) conditions and u for any of the vectors a, b, c, or d associated with N, then the general statement of the necessary and sufficient conditions for compatibility of any two patterns eⁱ and eʲ is:

Necessary and Sufficient Conditions for (eⁱ ⊕_c eʲ):  V(N) = T & uⁱ × uʲ = 0.     (43)

In reflecting on this section, we can readily see how this formalization lends itself to a representation of a symmetric group. The reformulation, however, will be a task for another day. This schema shows that a grammar exists and is computable, and provides the requirements of the computation of compatibility. An elegant reformulation is necessary, however, to provide guidance for developing an algorithmic test of existing patterns found in combination. This is work yet to be done.

7. Catalog of Fixed-Point Roots to (L)
The number of types of FP roots to (L) is finite and indeed, when reduced by the constraints of piecewise continuity at the four corners and the elimination of redundancies under rotation, the final count is 15 fundamental types of FP roots to (L). We showed earlier that the e₃ components of the vectors a, b, c and d distinguish all patterns; thus we may list all 15 patterns in terms of their e₃ column vectors in normal form, and all rotations may be computed by Eq. (39).

E. Oscillating Roots

This section addresses the other kind of root of the MW filter, the so-called oscillating roots. We will develop the analysis in a manner parallel to the previous approach for the FP roots, starting with the 1D filter of length N and extending to the 2D case of (L). The reader is referred to Sections III.A and III.B for notation. The oscillating root is referred to pejoratively because it is not a true root; it gives rise, however, to a relative referred to as the coded MW root. The phenomenon of oscillating roots involves
TABLE III
Catalog of FP Roots for (L): the fifteen fundamental root types e¹, . . . , e¹⁵, each listed as the e₃ column vector (a₃, b₃, c₃, d₃)ᵀ of its normal form, with entries drawn from {1, 0, −1}.
invariant patterns of bi-valued, or binary, data, a ≠ b, a > b, that oscillate in the sense of "ababab . . . a." Oscillating roots have been investigated by Tyan (1982), Longbotham (1989), and others for the 1D MW, and they concluded that oscillating roots were quasi-roots in that the first and last datum were lost by the filter. This means that the oscillating sequence must exceed the domain of the filter in order for the output to qualify as a root. This investigation will verify that result, and will also show that the dimension of the MW must be N = 2k + 1 with k even. It will show further that the oscillating sequence need not be binary data, but may be oscillating data modulated by an envelope that is locally monotonic. Finally, as already noted, it will show that there is a way out of the quasi-root result through a generalization of the MW to a "coded" MW. We should also note that the use of binary (0, 1) data is an acceptable generalization for discussing oscillating roots, but not for discussing SNR or the suppression of (0, 1) noise. Binary data allow the discussion of noise to be a discussion of "bit error."
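The quasi-root behavior is easy to reproduce. In this sketch the filter output is simply not produced where the full window does not fit, matching the remark above that the sequence must exceed the filter domain:

```python
# Sketch: the alternating sequence abab...a is preserved in the interior by
# a median window of length N = 2k + 1 only when k is even (N = 5, 9, ...).
# Endpoints fall outside the filter domain, hence "quasi-root".

def median_window(data, N):
    k = N // 2
    # output only where the full window fits
    return [sorted(data[i - k:i + k + 1])[k]
            for i in range(k, len(data) - k)]

x = [1, 0] * 6 + [1]                     # abab...a with a = 1, b = 0
print(median_window(x, 5) == x[2:-2])    # True: interior preserved (k = 2, even)
print(median_window(x, 3) == x[1:-1])    # False: k = 1 is odd and destroys it
```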
1. Oscillating Roots

The solution of oscillating roots is easily expressed in the same formalism we used for FP roots. As mentioned in the foregoing, we will begin with the 1D case and then extend it to the 2D case. The forcing conditions for an oscillating root are fairly simple. The binary condition and the coincidence condition necessary for root definition of the 1D filter of length N force the ordering function to alternate:

O(uᵢ) = (a₍₁₎, . . . , a₍ₖ₊₁₎, b₍₁₎, . . . , b₍ₖ₎),   i odd,
O(uᵢ) = (a₍₁₎, . . . , a₍ₖ₎, b₍₁₎, . . . , b₍ₖ₊₁₎),   i even,

with O(uᵢ) = O(uᵢ₊₂) for all i. We may realize a generalization of the data by noting that the bi-valued constraint on the data (a, b) is not strict for the b's if all of the a's are monovalued and greater than any of the b's. We may effectively analyze the oscillating root case using (0, 1) based on a threshold distinction between a and b. These arguments define the ordering function as an oscillating function alternating between (k + 1) a's and (k) b's, and (k) a's and (k + 1) b's, for alternate odd and even i of uᵢ. The interesting aspect of this analysis is that the forcing condition of coincidence for alternating binary data, abab . . . a, means that not every median filter has an oscillating root. The coincidence condition for oscillating binary data forces k to be an even number, such that the allowable filter dimensions for an oscillating root are N = 5, 9, 13, . . . , and the first and last instances of coincidence of an "a" with the k + 1 position of the filter fail to satisfy the median coincidence of the ordering function. Thus, the first and last data elements are lost by the filter; namely, it is a quasi-root. The conditions forcing R(u) such that the ordering function has this behavior are:

cor(c(1)): u = a ≠ b, a > b
cor(c(2)): the coincidence condition
cor(c(3)): uⱼ = a (j odd), uⱼ = b (j even), j = 0, . . . , n; n ≥ 3k, k even.

We can see from these conditions that the description of the oscillating roots as quasi-roots simply means that the data described by cor(c(1)) and cor(c(3)) must exceed the data space of u, which is a contradiction. We
also see that the cor(c) prescribing the coincidence condition does not prescribe contiguity for k + 1 pixels. The next section introduces a way out of the quasi-root problem for oscillating roots.

2. Coded Median Window Filter

The discussion of oscillating roots suggests another possibility: that of a filter having a symmetrical periodic structure incorporated into its design. We may interpret the discussion of oscillating data given in the preceding as the notion of repeated occurrence, and apply that notion to subsets of data. In this perspective, the oscillating data of (0, 1) described previously are regarded now as a repeated pattern of binary data: 10. We may further generalize this to any pattern of binary data regarded as a unit of a periodic, or repeated, sequence of data patterns. With this perspective, we may generalize the requirement that k be "even" for the oscillating-data quasi-root to a condition tied to the "length of the binary pattern," with the number of patterns in U required to be greater than one. We may further observe that the oscillating data are distributed across the extent of the filter, and consider a scheme that would optimize the filter by effectively compacting the data of the binary pattern into one side of the filter for a LOMO(k + 1) root response function analogous to the conventional filter. To implement this repeated-pattern scheme, we may combine the notions of the median window and the matched filter to construct a filter as a symmetrical structure matching the intended repeated pattern E of "0" 's and "1" 's. This filter is referred to as a coded window filter I(N), and is
applicable for patterns of binary data having at least one repetition. The oscillating root of this filter is not a quasi-root, but a true root. An example of an I(N) is:

I(3) = [● ■ ● ■ ●],

where ■ are inactive elements of the filter and ● are active elements. The following application of I(3) to a data set containing a finite pattern of binary data matching the symmetry of the filter illustrates the root characteristic:

S₁(23) = 00000101010101010100000
I(3) * S₁(23) = 00000101010101010100000.

Note that the active dimension of the filter is 3 although the extent is 5. Another example is I(5), defined as

I(5) = [● ● ■ ■ ● ■ ■ ● ●]

S₂(28) = 0000100110011001100110010000
I(5) * S₂(28) = 0000100110011001100110010000.
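The coded window can be sketched as code. The active-tap placement below is derived from the reflected-pattern construction described for these examples, and the edge-replication padding at the sequence boundaries is an assumption:

```python
# Sketch of the coded window filter I(2k+1): active taps at the center, at
# +/-p (p = pattern length), and wherever the reflected pattern E has a '1'.
# Boundary windows repeat the edge value; that padding choice is assumed.

def coded_offsets(pattern):                 # pattern E as a string, e.g. "10"
    p = len(pattern)
    offs = {-p, 0, p}
    for j in range(1, p):                   # epsilon_{j+1} sits at -j and +j
        if pattern[j] == '1':
            offs.update({-j, j})
    return sorted(offs)

def coded_median(data, pattern):
    offs = coded_offsets(pattern)
    p = len(pattern)
    ext = [data[0]] * p + list(data) + [data[-1]] * p
    out = []
    for i in range(p, p + len(data)):
        win = sorted(ext[i + o] for o in offs)
        out.append(win[len(win) // 2])
    return out

s1 = [int(c) for c in "00000101010101010100000"]
s2 = [int(c) for c in "0000100110011001100110010000"]
print(coded_median(s1, "10") == s1)     # True: S1 is a root of I(3)
print(coded_median(s2, "1001") == s2)   # True: S2 is a root of I(5)
```

For E = "10" the active offsets are {−2, 0, 2}, i.e. the [● ■ ● ■ ●] shape shown above.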
A somewhat more complicated example is:

I(7) = [● ● ■ ● ■ ● ■ ● ■ ● ●]

S₃(28) = 0000101011010110101101010000
I(7) * S₃(28) = 0000101011010110101101010000,

with equal results. The symmetry of a coded filter is derived from the binary pattern, for example, 10 for S₁, 1001 for S₂, and 10101 for S₃, represented in the filter by ● and ■ corresponding to 1 and 0, respectively. The pattern is reflected about the (k + 1)th active element of the filter, plus ● on each end, with the constraint that the output k + 1 position of the filter must be an active element. We may define the binary pattern of (1, 0) as the root E = (ε₁, . . . , εₚ), mapped into (●, ■) and denoted by ε̂; then for a given E having k 1's and p − k 0's, the filter construction is defined by the sampling function u′:

u′ = ((u)₋ₚ, ε̂ₚ(u)₋ₚ₊₁, . . . , ε̂₂(u)₋₁, (u)₀, ε̂₂(u)₁, . . . , ε̂ₚ(u)ₚ₋₁, (u)ₚ);
I(2k + 1) = [● ε̂ₚ . . . ε̂₂ ● ε̂₂ . . . ε̂ₚ ●],  E = (ε₁, . . . , εₚ).     (44)

In this design, the root pattern is sampled in the active elements of the window as the filter slides across the pattern. The active length of this filter is N = 2k + 1, and the total length, or extent, of this filter is m = 2p + 1, where p is the length of the binary pattern. This defines the root of the coded filter as a sequence of binary data composed of at least one repetition of a binary pattern satisfying R(u).

The requirement of at least one repetition of a pattern, constituting a periodic binary pattern containing n + 2 copies of E, n ∈ {0, 1, 2, . . .}, augments the local monotonicity requirement of a root for this class of filters. (We should note that the conventional filter required at least two repetitions of the simple binary pattern S₁ to produce a quasi response.) The coded filter of extent m = 2p + 1 also has a fixed-point behavior of LOMO(p + 1) for multivalued data represented by R(u) reflecting E, and the noise suppression of multivalued data follows the estimates given earlier by Eqs. (14), (18), and (21) for filters of length N. Such a fixed point is included in a fixed point of LOMO(p + 1) for a conventional median filter of length m, and may be regarded as a modulation envelope of the E pattern. We may incorporate this information about the filter with the notation I(2k + 1, m). The forcing conditions on R(u) for the intent of detecting a repeated pattern are that the data are binary, a ≠ b, a > b, and the pattern is defined by "a" such that it matches the symmetry of the filter as noted in
the preceding by E, and is repeated at least once. It is important to note also that the writing function H is defined over the extent m of the sampling function, but the ordering function is defined over N. Thus, the selection function assigns the output datum to j = p in the interval [0, 2p]. The analysis of the noise response of the coded filters is primarily concerned with bit errors in the detection of a binary pattern or code. The analysis of bit errors in a binary pattern detected by I(2k + 1, m) centers on the requirement for a "1" to occur in a "0" position of the binary pattern when the output position is coincident with a "0" of the binary pattern. This must occur as many times as necessary for (k + 1) "1" 's to be sampled by the filter centered on a "0." The coded filter has an extent of m sampled in 2k + 1 positions; thus there are 2(p − k) zero positions in the pattern, divided symmetrically on the two sides of the filter, when the filter is centered on a "1." When the filter is centered on a "0," however, there are 2(p − k) + 1 "0" 's within the window. The sampling pattern determines the number of "1" 's coincident with active filter elements. Therefore, at most (p − k) + 1 "0" 's must have an error bit, with a binomial probability, to cause the filter to register a bit error. This error rate is effectively multiplied by 2(p − k) to account for the window length.

Another source of bit errors derives from a data set composed of more than one kind of E. The rejection ability of I(2k + 1, m) to distinguish between different E binary patterns is mixed. Repeated application of a given filter, for a repeated binary pattern occurring at least once in a stream of other patterns similarly repeated, converges to a separation of the different E by 0's or by a LOMO(2p + 1) sequence of 1's. This suggests a combination of filtering and image subtraction, employing median and coded window filters, to find and locate a binary pattern given knowledge of the intended pattern.

3. Representation of 2D Oscillating Data

The oscillating roots discussed earlier for the 1D filter of length N are expanded here to 2D
for a finite set of data. First, we must address a functional description of the data in order to discuss the conditions imposed by the coincidence condition. There are two basic types of oscillating data: points and bars. However, as we shall see, bar data may be composed from point data. The 2D representation of oscillating point data may be composed by the matrix products of 1D forms A and B, one shifted by one datum with respect to the other, B = shifted A, and the unions of the matrix products. We define the data types A and B as oscillating data, for example, A = 1010 . . . and B = 0101 . . . initially, and later we will generalize to binary patterns E. If we consider these data types as vectors of data, then we may define the following matrices: A = A′A, B = B′B, C = B′A, and
D = A′B, where the prime denotes "transpose." The A and B data result in:

A = | 1 0 1 0 |      B = | 0 0 0 0 |
    | 0 0 0 0 |          | 0 1 0 1 |
    | 1 0 1 0 |          | 0 0 0 0 |
    | 0 0 0 0 |          | 0 1 0 1 |

C = | 0 0 0 0 |      D = | 0 1 0 1 |
    | 1 0 1 0 |          | 0 0 0 0 |
    | 0 0 0 0 |          | 0 1 0 1 |
    | 1 0 1 0 |          | 0 0 0 0 |

These sets of data, and the unions of these sets in all combinations, are the possibilities of patterns of oscillating data. The general conditions on the data, that they are patterns of oscillating 2D data in the 2D space of X × Y, are the properties of the matrices A, B, C, and D as follows:

(a) Each matrix is its own identity: AA = A, BB = B, CC = C, DD = D.
(b) The intersection of dissimilar matrices is a zero matrix: A ∩ B = {0} = C ∩ D, A ∩ D = {0} = A ∩ C, B ∩ C = {0} = B ∩ D.
(c) The multiplication of dissimilar matrices is a zero matrix: MᵢMⱼ = {0}; i, j ∈ {A, B, C, D} and i ≠ j.     (45)
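The matrix construction can be checked directly. This pure-Python sketch builds the 4 × 4 case and verifies the intersection property (b):

```python
# Sketch of the 2D oscillating data matrices: A = A'A, B = B'B, C = B'A,
# D = A'B built from the 1D forms A = 1010..., B = 0101... (4x4 case).
# The check below covers the intersection property (b) of Eq. (45).

def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]

def hadamard(M, N):
    # elementwise product, i.e. the intersection of binary matrices
    return [[a * b for a, b in zip(r, s)] for r, s in zip(M, N)]

Arow, Brow = [1, 0, 1, 0], [0, 1, 0, 1]
A, B = outer(Arow, Arow), outer(Brow, Brow)
C, D = outer(Brow, Arow), outer(Arow, Brow)

zero = [[0] * 4 for _ in range(4)]
pairs = [(A, B), (C, D), (A, C), (A, D), (B, C), (B, D)]
print(all(hadamard(M, N) == zero for M, N in pairs))   # True
```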
4. Oscillating Roots of (L)

The first observation we can make is that, just as in the 1D case, 2D oscillating data will not satisfy the predicate conditions for oscillating roots of all (L) filters, and the oscillating roots, quasi- and otherwise, of (L) are more complex. By inspection against the q + 1 criterion for coincidence and the predicate conditions of the data, we find that (L) has quasi- and real roots for the following:

A, B, C, D
A ∪ B, C ∪ D, A ∪ C, B ∪ C, A ∪ D, B ∪ D     (46)
(A ∪ C ∪ D) or (B ∪ C ∪ D).

For purposes of describing filter response, this set of matrices can be reduced by eliminating shift-equivalent matrices. For example, we may note that B, C, and D are shifted versions of A, A ∪ B is a shifted version of
Figure 18. Elementary quasi-root of (L).
C ∪ D, and A ∪ D and B ∪ D are shifted versions of B ∪ C and A ∪ C, respectively. Thus, we may adequately describe the filter response with A, A ∪ B, A ∪ C, A ∪ D, and (A ∪ C ∪ D), closed under A, B, C, and D. The result is the elementary point root of (L) shown in Fig. 18.

By the same reasoning as applied to the 1D case, for (L): L = 2N − 1 = 2q + 1 and N = 2k + 1, so q = 2k, and k even by the requirement of coincidence implies q even and q ≥ 4. (See Eqs. (30) and (31) and cor(c(3)).) The (L) has a quasi-root response to the oscillating patterns given by A and A ∪ B, and to the parallel bar pattern given by A ∪ C or A ∪ D. (The quasi-root of (L) bar data also includes single monovalued bars spanning the data set in the dimension of the bar.) The (L) also has a true root response to the juxtaposition of a bar pattern on both X and Y axes, as a consequence of the sampling function of (L) having no diagonal components. This root constitutes a crosshatch pattern reflecting (A ∪ C ∪ D), shown in Fig. 12. These results permit the occurrence of oscillating roots juxtaposed with fixed-point roots in images (Fig. 19).

5. Roots of Coded Window Based on (L)
The result shown in the preceding suggests that the structure of (L) is the class of structures that includes 2D coded filters for repeated 2D binary functions, I(L, M): L = 2N − 1 = 2q + 1, q = 2k; M = 2s + 1. The construction of the coded filter I(L, M) is entirely a 2D analog of the 1D I(N, m), where the 2D parameter L corresponds to the 1D parameter N as the length of the filter, namely, L is the number of active elements. In similar fashion, M corresponds to m as the extent of the filter including the inactive elements. The filter I(L, M) is based on (L), so that the sampling function is of the same form, but modified to represent E in the same way as for I(N, m), except that now E is represented in 2D as Rep.{E′E}.
The data compaction of the coded window construction allows the filter to
Figure 19. Crosshatch root of (L).
apply to repeated binary patterns within the 2D space in the same manner as for the 1D case. By virtue of membership in the (L) class of filters, however, I(L, M) retains the classification strictures of the (L) given by Eq. (45). The conditions of Eqs. (45) and (46) are necessary to ensure the independence of the repeated binary patterns on the X and Y axes. The representation of data for the 2D coded window filter I(L, M) is based on the same matrix formulations as for the oscillating roots to (L).
In this case, however, the matrices are formed by multiplication of E E . . . E 0 and 0E E . . . E instead of 1010 . . . and 0101 . . . , where E must be repeated at least once. Thus, we let vectors: A : E E 0 and B : 0E E , and form the matrices A, B, C, and D. The coded filter response is described then as before with A, A / B, A / C, A / D, and (A / C / D). Because the roots are true roots, and (A / C / D) subsumes A / B and A / C, we may further simplify the description by eliminating A / C and A / D in favor of (A / C / D) to result in rt.(1): A, rt.(2): A / B, and rt.(3): (A / C / D). For N (L , M), type-II noise suppression follows f (L ), and type-I fixed
point response follows LOMO(s ; 1). An example of N (L , M) is pro vided by (13, 21) derived from (9, 11) wherein E : 10101, and
shown in Fig. 20. The predicate is defined by the sampling function and conditions of Eq. (28) modified by the 2D adaptation of Eq. (32), and the requirement of repeated binary patterns. The writing function is defined over M. Using the symbol ‘‘*’’ to indicate a binary ‘‘1’’ and a blank to be a ‘‘0,’’ the images of the elementary roots of (13, 21); rt(1): A, rt(2): A / B, and
rt(3): (A / C / D) are shown in Fig. 21.
THEORY OF RANKED-ORDER FILTERS

Figure 20. N(13, 21).
5.a. Verification and Application of N(L₂, M)
The verification of the 2D coded window filter N(13, 21) is shown in Fig. 22. The verification is a test against a computer-generated 64 × 64 pixel binary image a^b, a = 180, b = 0, of the root rt(1), with normally distributed noise added according to Normal(x): x = Rnormal(μ, σ), μ = 128, σ = a/(2√3). Figure 22 shows: (a) the target image of rt(1); (b) the noisy image of rt(1); (c) image (b) filtered by N(13, 21) under R(u); and (d) image (b) filtered by N(13, 21) under R(u) with cor(c). The filtered images (c) and (d) differ in that (c) is simple filtering and (d) is filtering constrained for feature extraction, that is, R(u) with cor(c), the forcing condition of coincidence. The measured background SNR(x) of the noisy image (b) and the measured background SNR_f(x) of the filtered image (c) are SNR(x) = 3.41 and SNR_f(x) = 10.49, respectively. This compares well to the prediction of Eq. (18) using L₂ in place of N for the 2D application, L₂ = 13: SNR(x) = 3.46, SNR_f(x) = 10.65.
292
BART WILBURN
Figure 21. Elementary Roots of (13, 21).
The result in Fig. 22(d) could be improved considerably by simply applying N(13, 21) subject to R(u) with cor(c) a second time. This is because the target pattern satisfies R(u) by virtue of the companion data in E₂, whereas many (if not most) of the spurious responses in the background do not have those companion data and therefore will not satisfy R(u).
F. Octagonal, Hexagonal, and 3D Filters

The octagonal filters are best introduced by explaining why the square filter does not exist as a class of true filters. The introduction is
Figure 22. N(13, 21). (a) Original image, (b) noisy image.
facilitated by explaining an apparent contradiction to this assertion in the form of the popular 3 × 3 MW filter. The 3 × 3 MW filter is, by all outward appearances, a square filter, and it is a true filter, which accounts for its popularity. We will show, however, that it is not a square filter but is in fact the smallest member of the class of octagonal filters.

Figure 22. Continued. (c) Simple filtered image, and (d) feature extraction image.

A class of filter is established by the properties of having true roots of the sort described in the preceding, namely, FP and oscillating, specified by the order of the filter as a member of a set of integers, for example, N = 2k + 1, k ∈ {positive integers}, or k ∈ {positive even integers}. That is, if a filter is a true filter, specifically, has true roots, then all sizes of that type of filter have true roots.
1. Morphological Perspective

Let us analyze the response of a 3 × 3 MW filter from a morphological point of view. We will conduct a thought experiment as follows. Suppose that the 3 × 3 MW filter is a square filter □(L); L = 2q + 1, q ≥ 4, and L = N², N = 2k + 1. Now imagine that □(9) is placed in a field containing some isolated object composed of contiguous pixels. Let us suppose further that the object is an FP root. The description of the object as an isolated object is important insofar as it means that its edges are removed from the boundary of the field. This may seem a trivial statement, but it enters into the analysis in the following way. Imagine that the filter is scanned over the field from left to right and top to bottom. In the course of scanning, the filter encounters the object so that the leftmost pixel of the uppermost edge of the object is sampled by the q + 1 (or center) element of the filter. The object is, by hypothesis, an FP object. Therefore, the pixel sampled by the center (or output) element of the filter must satisfy the requirement of being the output of the MW filter, namely, the coincidence condition. For simplicity, and without any loss of generality, we may suppose that the object is a set of “1’s” surrounded by “0’s.” For this case, the output requirement is that q + 1 elements of the filter “downstream” in sample space are filled. The necessary pattern of data sampled by the first encounter of the □(9) with the object to result in an FP response is shown in Fig. 23. We may continue this experiment by supposing that the pixel shown in Fig. 23 sampled by the leftmost element of the bottom row of the filter, that is, element 3:1, is the uppermost pixel of the leftmost edge of the object. Let us repeat the experiment with the filter shifted down one row such that the pixel marking the top of the leftmost edge is sampled by the q + 1 (or center) element of the filter.
For this object pixel to be an output of the filter, the required pattern of the object sampled by the filter is shown in Fig. 24. We can readily see that the pattern shown in Fig. 24 is compatible with being part of the same object shown in Fig. 23. We repeat the experiment again with the filter shifted down another row and obtain a necessary pattern shown in Fig. 25. This pattern is also compatible with being part of the same object shown in Figs. 23 and 24.
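The scanning argument can also be run mechanically: an isolated binary pattern is an FP root of the 3 × 3 MW filter exactly when the moving median reproduces it. In the sketch below (our own test patterns, anticipating the conclusion of this section), a small corner-cut octagonal pulse passes unchanged while a solid 3 × 3 square loses its corners:

```python
import numpy as np

def mw3(x):
    """3 x 3 moving median (majority vote for binary data); borders copied."""
    y = x.copy()
    for i in range(1, x.shape[0] - 1):
        for j in range(1, x.shape[1] - 1):
            y[i, j] = np.median(x[i - 1:i + 2, j - 1:j + 2])
    return y

def is_fp_root(img):
    """An isolated pattern is an FP root iff the filter reproduces it."""
    return np.array_equal(mw3(img), img)

def embed(rows, size=10, offset=3):
    """Place a small '*'/'.' pattern in a zero field with margin all around."""
    img = np.zeros((size, size), dtype=int)
    for i, r in enumerate(rows):
        for j, c in enumerate(r):
            img[offset + i, offset + j] = 1 if c == "*" else 0
    return img

octagon = embed([".**.", "****", "****", ".**."])   # corners cut off
square = embed(["***", "***", "***"])               # solid 3 x 3 block

print(is_fp_root(octagon), is_fp_root(square))      # True False
```

Each pixel of the octagonal pulse has a majority of like-valued neighbors in its 3 × 3 window, so the coincidence condition holds everywhere; the square's corner pixels see only four object pixels out of nine and are eroded.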
Figure 23. First encounter of □(9) with object.
Figure 24. Second encounter of □(9) with object.
By imposing the symmetry required for an object to be an FP on exit as well as on entry of a filter, we see that an elemental FP satisfying the □(9) is the pattern shown in Fig. 26. If the □(9) is indeed a member of a class of filters that are square filters, then we should be able to repeat the □(9) experiment for □(L), L > 9, with the result being a larger FP pattern. Indeed, the result should have a similarity relationship to that shown in Fig. 26, scaled by the size of the filter so as to include the pattern shown in Fig. 26. The next available size of □(L) is L = 25. (N.B.: L = N² and L = 2q + 1; N = 2k + 1, thus k = 2 ⇒ q = 12.) Imagine conducting the aforementioned □(9) experiment with □(25). Then the analog of the pattern shown in Fig. 23 is as shown in Fig. 27 for □(25). If, as before, we assume the pixel in the 4:1 filter element of Fig. 27 to be the topmost pixel of the leftmost edge of the object and shift the filter down one row as we did for □(9), then we see a problem. When the center filter element samples that top pixel of the left edge, the pattern of data sampled by the filter window fails to permit that pixel to satisfy the coincidence condition for it to be the output of the filter. The pattern of data is shown in Fig. 28 and, as we can see, contains only 10 object pixels. Whereas the aspect ratio of the stepped corners of the □(9) FP object shown in Fig. 26 is 1:1, the aspect ratio of the stepped corner in Fig. 28 is 2:1. To be sure, we could redraw the object in Fig. 28, maintaining the left edge from Fig. 27, and fill 13 filter elements. However, it would not be compatible with being part of the same object shown in Fig. 27. The only
Figure 25. Third encounter of □(9) with object.
Figure 26. FP of □(9).
Figure 27. First encounter of □(25) with object.
Figure 28. Second encounter of □(25) with object.
Figure 29. New class of filter.
way an object with an aspect ratio n:1, n > 1, can satisfy the output requirement of the coincidence condition for every shift down by one row is for that object to extend beyond the field of data in the X direction, or in the Y direction for an aspect ratio 1:n, n > 1. This means that the roots of □(L), L > 9, are quasi-roots, and thus □(L), L > 9, is a false filter and □(L) is not a legitimate class of filters. So what do we make of the 3 × 3 MW filter that apparently has a true FP root? What we make of it is that we suspect it is a member of a different class of filter. Suppose we construct a filter as shown in Fig. 29. This filter is denoted by Ω(Q), where Q = 2v + 1, and for this instance has a dimension of Q = 25. Let us suppose that Ω(Q) is a class of filters. We may also note that Q = 4N − 3, or Q = 2(2N − 1) − 1, that is, Q = 2L₂ − 1. There is an interesting analog to take note of here between Q = 2L₂ − 1 and L₂ = 2N − 1. This leads us to note that as N = 2k + 1, then L₂ = 4k + 1 and Q = 8k + 1; and as L₂ = 2q + 1 and Q = 2v + 1, then q = 2k and v = 4k. If Q = 25, then k = 3; conversely, if k = 1, then Q = 9, and thus Ω(9) is the smallest member of Ω(Q). (N.B.: Ω(9) is a 3 × 3 MW filter.) We need to test the supposition that Ω(Q) is a class of filters. To do this, we will conduct the same experiment for Ω(25) that we did for “□(25).” (We use scare quotes because □(25) is not a legitimate filter.) The analog to Fig. 24 for the sampling of the leftmost pixel of the topmost edge by the center filter element results in a necessary data pattern for satisfying the coincidence condition as shown in Fig. 30. If we continue the experiment for Ω(25) as we did in the foregoing, we will find that there exists an isolated object pattern that satisfies the requirements of an FP root to Ω(25), as shown in Fig. 31. We can see by inspection that this isolated object is an octagon with edge k + 1 in length, and thus it has symmetry about its axes. Furthermore, we
Figure 30. First encounter of Ω(25) with object.
can see that the response for Ω(25) includes the Ω(9) response. Therefore, we can conclude by induction that this FP response continues for Ω(Q) for all Q = 2v + 1; v = 4k, k ≥ 1. This establishes that Ω(Q) is a class of filters. The octagonal response gives rise to the name of Ω(Q) as the class of octagonal filters, by virtue of its FP roots being a family of octagonal
Figure 31. Pulse-type FP of Ω(25).
patterns. Finally, as we indicated much earlier, because the FP roots of Ω(Q) include the 3 × 3 MW, and the square filter is not a filter class, the so-called 3 × 3 square filter is really the smallest member of Ω(Q), at Q = 9 for k = 1.

2. Construction of Ω(Q) Fixed-Point Roots

It will not be necessary to describe the details of the root computation for Ω(Q), as the rationale is straightforward and parallels the computation for Ξ(L₂) described in Section III.B. We may take a clue for the rationale from
the analog mentioned earlier between Ω(Q) and Ξ(L₂), namely, L₂ = 2N − 1, L₂ = 2q + 1, q = 2k, and Q = 2L₂ − 1, Q = 2v + 1, v = 4k. Recall that the structure of Ξ(L₂) was the juxtaposition of two 1D filters Φ(N), one rotated with respect to the other by π/2 and constrained to share the intersecting pixel in evaluating the coincidence condition. The result of this construction of the Ξ(L₂) predicate was an FP that was a square structure, (k + 1) × (k + 1), with monotonicity in X and Y. The FP root of Ω(Q) is found in a completely analogous way, as the juxtaposition of two Ξ(L₂) filters, one rotated with respect to the other by π/4 and sharing the intersecting pixel in evaluating the coincidence condition. This construction is shown conceptually in Fig. 32.
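The element count Q = 8k + 1 can be read directly from this construction: a "+" cross and a "×" cross of half-length-k arms sharing one center pixel give 4(2k + 1) − 3 = 8k + 1 active elements. A sketch (the mask-building helper is ours):

```python
import numpy as np

def octagonal_support(k):
    """Support of the octagonal window: a '+' cross and a 'x' cross of
    half-length-k arms sharing the center pixel (cf. the Fig. 32 construction)."""
    n = 2 * k + 1
    m = np.zeros((n, n), dtype=bool)
    m[k, :] = True                        # horizontal arm
    m[:, k] = True                        # vertical arm
    np.fill_diagonal(m, True)             # one diagonal arm
    np.fill_diagonal(np.fliplr(m), True)  # the other diagonal arm
    return m

for k in (1, 2, 3):
    assert octagonal_support(k).sum() == 8 * k + 1   # Q = 8k + 1

# For k = 1 the mask fills the entire 3 x 3 window, consistent with the
# smallest member of the class being the 3 x 3 MW filter:
print(octagonal_support(1).all())   # True
```

The four shared center pixels account for the −3 in 4(2k + 1) − 3, and the k = 1 case recovers the full 3 × 3 window discussed above.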
Figure 32. Construction of Ω(Q).
Figure 33. Ω(17) example FP with k-data.
The FP of the Ω(Q) is an octagonal structure that is monotonic in X and Y and also on both diagonals. The requirement of diagonal monotonicity in addition to X and Y monotonicity constrains the FP of Ω(Q) to be a more rigidly defined structure with a somewhat more complex k-data structure. The k-data structure, being all greater or all less than any of the values in the monotonic k + 1 sequence, must reflect the diagonal as well as the orthogonal monotonic structure of the root. This means that a given column or row of k-data must satisfy the structure constraint simultaneously for the four axes of monotonicity of the root, as illustrated by example in Fig. 33 for Ω(17), k = 2, showing the FP with the k-data pattern and the Ω(17) overlaid on the pattern.

2.a. Discussion of Complex Ω(17) Fixed-Point Roots

The computation of the grammar for Ω(Q), such as the analysis given in Section III.E for Ξ(L₂), is incomplete and cannot be presented at this time.
Nevertheless, we can discuss some of the properties of a complex of Ω(Q)
Figure 34. Unsuccessful complex of Ω(17) FP roots.
FP roots. The first thing we notice about forming a complex of Ω(Q) FP roots, which are octagonal, is that we cannot simply join them along shared edges and form a continuous FP structure. This is because octagons do not pack into a grid as do squares or hexagons. In order to form a continuous FP surface based on a complex of octagonal patterns, the octagons must overlap in a complex manner represented in the grammar of their combinations. The complexity of successful and unsuccessful patterns of overlapping octagons can be seen by examination of Figs. 34 and 35. Figure 34 shows a complex of four octagons, each one a monovalued pulse: Oct. A: [1], B: [2], C: [3], and D: [4]. The complex is formed by having C overlap D, B overlap C, and A overlap B in the manner shown in Fig. 34. The test for success is monotonicity in all four axes such that all pixels satisfy the coincidence condition. The pattern shown in Fig. 34 is unsuccessful by virtue of the pixels shown encircled in boldface, which are forced to have the values “3” and “4.” These pixels violate the monotonicity requirement, “4” in the vertical axis and “3” in a diagonal axis, needed to satisfy the coincidence condition as well as the k-data pattern requirement. The errors of course multiply, except for selection by substitution, which is not strictly an FP condition. Under closer examination, we can see that the errors can almost be removed by moving C down two rows. In this case of moving C down by two rows, a datum is left uncovered as shown in Fig. 35, that is, the
Figure 35. Successful complex of Ω(17) FP roots.
datum in the center indicated by the boldface circle, which as such is constrained only to satisfy the k-data requirement. Thus we are free to set the value of that datum so as to be locally monotonic, in this case to the value “2,” which of course satisfies the k-data requirement. This leaves one remaining errant pixel, indicated by the character “a.” If left unchanged, it would have the value of D, or “4,” and would violate vertical monotonicity. If we intercede, however, and change it to a = 3, then the system has the integrity of a complex FP, and indeed, we can find another pattern “E,” indicated by the dashed octagon, to cover the “hole.” The result is a complete FP pattern over the set contained within the complex borders of the octagonal “tiles,” or a fully successful complex of Ω(17) FPs. This exercise is not the only arrangement of Ω(Q) FP roots possible to result in a complex FP, and is only intended to illustrate the delicate balance involved in the conjunction of Ω(Q) FP roots to satisfy a complex FP structure. Furthermore, it is not a prelude to developing the alphabet or the grammar of the roots; that is a “work in progress.” An interesting consequence of this exercise, however, is that we notice that E, formed by the conjunction of A through D, begins to take on a complex form as an independent FP type or element of an alphabet of Ω(Q) FP roots. Determining that
alphabet is a prerequisite to computing the grammar of the conjunction of the roots. The octagonal structure of the Ω(Q) FP roots allows the surface of the FPs to have local maxima or minima internally and still satisfy the requirements of monotonicity. This means that the Ω(Q) FP roots cannot be characterized as first-order monotonic surfaces, as were the Ξ(L₂) FP
roots. Nevertheless, the method of characterizing the surfaces of Ω(Q) FP roots, and analyzing their grammar, based on normal vectors may still apply, but with some modifications and perhaps in a more complex manner. The notion of the approach at present is that the Ω(Q) FP roots are themselves decomposable into subroots that are first-order surfaces, and that they satisfy a subgrammar of root composition to types of roots. We should mention here that a hexagonal filter, denoted by Η(Y), exists as a class and is formed and analyzed in a manner analogous to the octagonal filter. The hexagonal filter is formed from the octagonal filter by excluding either the vertical or the horizontal 1D arm. In this way, it explicitly requires diagonal and either vertical or horizontal monotonicity, but excludes horizontal or vertical monotonicity, respectively. Further, we should note that the hexagonal filter packs into a grid.

3. The Notion of 3D Feature Extraction

The discussion of 3D filters is included in this section because of the similarity of their construction with the octagonal and hexagonal filters. The construction of 3D filters parallels that of Ω(Q), with the juxtaposition of 1D filters on axes of monotonicity to construct 3D forms of Ξ(L₂), Η(Y),
and Ω(Q), denoted Ξ³(T), Η³(U), and Ω³(V), respectively. The dimensions of Ξ³(T), Η³(U), and Ω³(V) are:

T = 3N − 2, N = 2k + 1 ⇒ T = 6k + 1; thus for T = 2q + 1, q = 3k.
U = 5N − 4, N = 2k + 1 ⇒ U = 10k + 1; thus for U = 2u + 1, u = 5k.
V = 7N − 6, N = 2k + 1 ⇒ V = 14k + 1; thus for V = 2v + 1, v = 7k.
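The size relations for the 1D, 2D, and 3D windows can be collected in one place (the symbol names stand in for the original notation; the arithmetic follows the relations quoted above):

```python
def window_sizes(k):
    """Window sizes as functions of k: 1D length N, 2D cross L2,
    octagonal Q, and the 3D cross/hexagonal/octagonal dimensions T, U, V."""
    N = 2 * k + 1        # 1D filter length
    L2 = 2 * N - 1       # 2D cross filter
    Q = 2 * L2 - 1       # octagonal filter, Q = 8k + 1
    T = 3 * N - 2        # 3D cross, T = 6k + 1
    U = 5 * N - 4        # 3D hexagonal-based, U = 10k + 1
    V = 7 * N - 6        # 3D octagonal-based, V = 14k + 1
    return N, L2, Q, T, U, V

print(window_sizes(2))   # (5, 9, 17, 13, 21, 29); T = 13 is the case of Fig. 36
```

Note that k = 3 recovers the octagonal example Q = 25 used earlier, and k = 1 gives Q = 9, the 3 × 3 MW filter.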
The construction of Ξ³(T) is illustrated in Fig. 36 for T = 13, and the
construction of Η³(U) and Ω³(V) is entirely analogous. No attempt will be made here to present the analysis of 3D root structure because, as with Η(Y) and Ω(Q), it is incomplete at this time, awaiting further work. We can, however, discuss the roots. They are most easily envisaged as a cube in the case of the Ξ³(T). As we will see, this structure
leads to an interesting notion for application to feature extraction that may provide some physical insight into the structure and composition of materials, both man-made and natural.
Figure 36. Construction of Ξ³(13).
Let us imagine a 3D FP root of Ξ³(T) to be a cube composed of
(k + 1) × (k + 1) × (k + 1) voxels (“volume picture elements,” analogous to pixels in a 2D image), as shown in Fig. 37. The immediate question for delineating the types of FP roots and their grammar is: How do we represent the value of the voxel? For the case of Ξ(L₂) FP roots, the value was represented by the amplitude of a surface (see
Fig. 8). We cannot do this in the case of Ξ³(T) because the three dimensions
of representation are used in defining the voxel. This concern may seem unimportant until we consider how to approach the computation of the
Figure 37. Geometry of FP root of Ξ³(13).
alphabet of roots and their grammar. The method of computing the grammar of the Ξ(L₂) FP root was based on a model of the root as a surface
constrained to be a first-order curve. The notion of how to visualize a 3D root as a complex of voxels is important in the same way; namely, how we visualize it is a model for the mathematical development of its structure, from which the grammar therefore follows. We cannot visualize the Ξ³(T) FP root as a surface, but we may visualize it
as a cube having an internal structure related to a detectable state of a 3D region. The structure detected by Ξ³(T) is the relationship of the states of
voxels satisfying the locally monotonic structure of a Ξ³(T) FP. The
representation of this model in mathematical terms cannot follow based on the continuity of the slope of a surface as in Ξ(L₂) FP roots, but must access
a more abstract and elegant representation, one that may subsume the model used for Ξ(L₂) FPs. We have seen that the valuation of pixels can be either
binary or multivalued, and we may easily extend this flexibility to the 3D case. For example, we may envision that a voxel of material may or may not be polarized along some axis, P ∈ {0, 1}, or that it may be polarized to some degree. Either measure can satisfy the LOMO(v + 1) FP requirement of Ξ³(T) in an abstract hyperdimensional space. The binary case is particularly
easy to incorporate into an abstract space of Ξ(L₂) FP structure, and the
multivalued case as well, but in a fourth, or hyper-, dimension. Whereas the Ξ(L₂) FPs described the brightness, or state, of a region of
a surface, the Ξ³(T) FPs describe the distribution of some detectable state,
for example, brightness or polarization, in three dimensions. In this light, the analysis of Ξ³(T) FPs opens up a much wider scope of application to study
the internal structure of states of a material surface, or a volume of material. For example, in the case of a material surface, or a complex astronomical object, we may consider feature extraction from a hyperspectral image expressed in terms of the image cube I(X, Y, λ). (In the next section we will discuss an interpretive transform of imagery based on a representation of features in terms of FP roots. In that case, the utility of this representation is particularly appropriate because our experience does not provide for an intuitive model of recognition of an object in an image cube.) In a similar fashion, we may consider extracting features of regions of polarization in terms of the Stokes vector components S(X, Y, λ), or the volume distribution of polarization at a given wavelength, S_λ(X, Y, Z). The case of feature extraction of the volume distribution of species in a material brings to mind several applications in the form of I(X, Y, Z), for example, those derived from composites of tomographic X-ray or acoustic imagery. These are seen mainly in medical imagery, but we should not overlook the cases of man-made microstructures, or of larger-scale 3D imagery such as seismic or marine mapping. It is implicit in these discussions that although the detection of species in a volume of material is dependent on the probe, the method of feature extraction remains the same. Finally, we should also note that these applications also apply to the 3D coded window version of Ξ³(T), and the grammar is almost trivial, as there
are only three types of roots. The application of a 3D filter constrained by a predicate condition for FP or binary patterns amounts to a feature extraction of internal structure expressed in a finite alphabet of terms, just as in the case of the 2D filters. The significance of the expression in terms of a finite alphabet will be explored in the next section.

IV. A Language Model Based on Ranked-Order Filters

The central aim of this section is to advance the technology of object recognition. The development of automated object recognition discussed here is based on a top-down approach of simulating the natural phenomena of object recognition rather than a bottom-up approach of inventing an apparent function of object recognition with current technology. To that end, the development described here seeks to fulfill a necessary step in this top-down approach, and we will show how the advances in MW filters described herein factor into the development of object recognition technology. To begin, we must briefly discuss what we understand natural object recognition to be, and then discuss the perspective of this development based on that understanding. We assume that automated object recognition is a tool that is useful to us. Thus it is one that we understand, and its function is compatible with our understanding of objects. This understanding derives from the phenomena of natural object recognition by human beings. We assume the phenomenological perspective that natural object recognition by human beings is the predication of an object by a name conveying what it means to us for an object to be what it is. In this way, natural object recognition is the relationship of an observer to the world in a language community such that an observer interprets the world by language. For example, a human observer may assert the following: “That object is a desk.” Other people understand what that object is in terms of function, or application, and some notion of extension.
In the case of automated object recognition, we must distinguish between recognition and classification. A computing machine, functioning in the capacity of a tool to recognize objects, must, like a human observer, point out an object in the context of an image and assign a name to it. Thus the tool identifies that object. In this case of automated machine recognition of an object, the meaning of the object is conveyed by its name and is understood, rather than given, by the user of the tool.
The notion of the meaning of an object is derived from the collection of relationships of an object to a user’s various experiences of that object, or one like it, in terms of purposive ends and events. This notion of meaning in the present-tense context of a particular object is the significance of that object, and it is paramount to recognition of it. A machine (and here we mean a tool), then, must simulate the notion of significance in the local context of objects. Simulating significance involves the simulation of a user’s experience relevant to the object as if the user were in that local context having the present-tense experience of that object. This is the classical “frame problem” of artificial intelligence (AI), and it has been the major stumbling block to efforts to develop automated object recognition. The purpose of this project is to take a step toward addressing the “frame problem” by establishing a commonality of language between the machine and the “perception” of the object by the machine. In the following paragraphs it is argued why establishing such a commonality of language is a necessary step toward fulfilling the goal of developing a useful tool for automated object recognition and, further, how recent developments in ranked-order filters open a door to this development. We anticipate further that in pursuing this project we will also satisfy other useful applications of automated feature extraction from imagery.
A. The Necessity and Possibility of Interpretive Transforms of Imagery

The experience involved in automated object recognition by a machine is simulated experience given by the user of the tool in terms of his or her language, that is, with propositions. In order to relate the object to experience, the representation of the object must be logically compatible with the representation of experience; that is, both representations must be in propositional terms. In past efforts on this problem, the user of a computing tool interpreted an object to a representation in propositional terms, and the result was variable from one user to another, making the process indeterminate. The first problem then, before that of simulating cognition, is a translation or transform of imagery into a natural language of imagery that is logically compatible with the language of the tool. The second problem is the interpretation of the language of imagery to a representation of meaning in the user’s language. We may discuss object recognition in somewhat greater detail by defining it as a true assertion, by an observer, of an object to be some noun object that conveys the meaning of the object to be what it is. This means that the assertion of an object S to be a P is true if it “makes sense or is possible” in the context of the observer’s experience as a being-in-the-world with objects. This
includes the collective experience of being in a community. That is, the assertion of the object to be some thing, P, must be logically consistent with the experience of the user in all past contexts having some relationship to the present tense of the local context where that object is occurring. For this logical consistency to be robust, the set of propositions defining past experience must be revisable by verification in the actual world. The objective before us here is, of course, a limited scope of simulating this phenomenon in a propositional logic model representing the object and experience in propositions subject to logical consistency. In order to evaluate the logical consistency of the present-tense context of the object and the relevant experience, the representation of the object must be by propositions in a direct transform from the image. This transform is a simulation of the natural interpretation of ideational complexes of objects to sentential complexes of propositions. This further means that the representation must be by a propositional language system, and hence it must be defined on a finite alphabet. This approach differs significantly from conventional approaches based on classification of object data in terms of templates, or by Bayesian estimates of joint probabilities of occurrence. This approach is most similar to knowledge-based reasoning and truth maintenance systems, yet differs from them dramatically in the method of representing the object in language. The detailed arguments of this approach may be found in Wilburn (1998). The well-known limitation of template matching as a basis of object recognition follows from templates being extensional representations of features. As such, they are a limited sample of a very large, denumerably infinite, number of possibilities of how that object may be found in whole or recognizable part.
The distinction between the approach described here and Bayesian methods is subtler and needs further explanation. The conditional probability of the Bayesian estimate rests on the posterior and assumed prior probabilities of occurrence of objects to estimate a particular likely occurrence of an object being one of an assumed set of objects. The classical weakness of the Bayesian method is the assumption of the prior probability. The major shortcoming, however, is that it is based purely on the probability of occurrence. Considerable advances have been made in recent years with Bayesian-inference methods in the form of hybrids with knowledge-based systems, for example, by incorporating null constraints and descriptive terms of proximity and appearance into the object data to result in a classification of objects by decision trees. Nevertheless, Bayesian-inference methods remain based fundamentally on a probability of joint occurrence. The Bayesian method does not convey information about an object’s identity inferred from the
relational structure of object data independent of any particular occurrence, or from the relationship of the object in question to other objects with respect to function. Methods of image reconstruction, or enhancement, such as maximum entropy, and most recently the pixon approach of Puetter and Pina (1998), which is related to Bayesian estimates, have demonstrated utility in estimating features by enhancing noisy and blurred images. These methods are also based on pixel-bound statistics, but in the case of the pixon, the formalism is augmented by a measure of structure in the form of a correlation length. Nevertheless, none of these statistical methods can “logically deduce” an object’s determinate identity in terms of function or effect on a purposive end. Statistical methods cannot provide a sense of meaning about an object because they lack the semantic content of a “concept.” A concept must be represented by the intensional properties of a sort of object that are true of it in any individual instance, and the intensions of an individual object true in any situation where it may be found. A concept of an object represented in this way can be related to experience of that object, or one like it, namely, a member of that sort of object. Intensional properties are those properties common to a sort of object that are true of it under any conditions of occurrence; for example, no matter where or under what conditions you find them, all balls of any size, color, or surface design are spheres. Because Bayesian inferences lack this ability to represent the intensional property of an object, they fall short in the AI “frame problem” of recognition in contexts of apparent contradiction or partial data.
Consider, for example, the following situations: partially obstructed, or occluding, objects, such as a side-view image of two flatbed trucks parked beside each other but headed in opposite directions so that the beds exactly overlap; and, perhaps more common, objects in strange local contexts, such as a cartoon of a soldier standing in a flowerpot. These problems derive from the predicate of the object being defined on pixel-bound statistics, that is, extensional data, rather than on relational structure true in all instances of the object and compatible with a grammatical logic indexed by function and events. The outcome of statistical methods is a probability estimate founded on historical occurrence rather than an abductive inference to the ‘‘best’’ answer based on logical consistency of available evidence and historical relationships associated with purposive ends. The assertion of ‘‘what an object is’’ is the answer to the question: ‘‘What does it mean to us for it to be whatever it is?’’ The answer is an intentional interpretation of the extensional object in terms of application — what we do with it, or to it, or what it does to us — and identity, namely, existential quantification. Object recognition then is a pointing-out-
THEORY OF RANKED-ORDER FILTERS
as of the individual object by an observer. That is, object recognition is an assertion by an observer, in the language understood by his or her community, to correctly signify the truth of a particular object to have the meaning to his or her community of being what it is. The recognition of an object is accomplished by language in a name conveying its meaning in terms of application, and we refer to the noun-phrase of a name as a substance-sort. Individual objects are members of a class, or sort, of object. The application of a sort of object is the consistent tautological deduction of all experience of that sort of object to be the ‘‘concept’’ of it shared by a community of observers. An individual member of a sort is distinguished from all other sorts of objects by intensional properties not shared by any other sort. A sort, then, is defined by a nonempty logical intersection of the intensions of individuals. It follows therefore that we must extract features of objects in terms of intensions in order to be compatible with recognition of an object in terms of identity and application, or meaning. This requirement for an intensional representation of features leads us to the constrained MW filter for feature extraction in terms of fixed-point (FP) roots of the filter.

We have established that the {e@} are a set of relational patterns independent of absolute pixel value, and that they represent the property of relation in that region of the image independently of the extension of the image. Further, we claim from experience that the relationship of values of data is reasonably invariant of any particular occurrence of an object. This means that the members of {e@} are an intension of the image in that region. We have further shown in Section III.D that the relations required by R_d(u) between the monotonic data and neighboring data of the e@_d ∈ D have the result that the e@ are constrained by rules of combination, so that the co-joining or partial superposition of e@ satisfies a kind of grammatical compatibility. This entire argument means that an image of a feature represented by an ordered sequence of e@, (e@_d, . . . , e@_d'), is then an intensional representation of that feature. A collection of such co-joined, thus grammatically compatible, e@ satisfies a collection of predicates R_d(u): R_d(u) ∧ . . . ∧ R_d'(u), representing the predicate of an object in an image composed of FP roots (e@_d, . . . , e@_d') ⊨ R_d(u) ∧ . . . ∧ R_d'(u). As a practical matter, we cannot say that any image of that feature will have the same representation by (e@_d, . . . , e@_d'). We can say, however, that any image from a set of images of that feature obtained within the limits of linear exposure, equivalent range, perspective, and resolution, and digitized at the same level, will have the same intensional representation. Furthermore, as individual objects are members of a sort of object, features are members of a sort of feature, and images of them form families of images over range and perspective. The result is usually a fairly large set of possible images for which the intensional property of the feature
holds true in practice. The demonstrated utility of the filters for feature extraction further suggests that they may satisfy the requirement for detection of features of many, although certainly not all, objects of interest. The satisfaction of a grammar, or syntax, for conjunction of a finite set of FP roots that are intensions of objects leads us to the notion of a language system for an interpretive transform of features. Our motivation is that we may employ an interpretive transform for automated object recognition by a logical model based on propositions describing applications indexed by event, including other objects in the worlds of various events. What we need, then, is to develop a representation of features that can have semantic content, namely, that can be true or false insofar as being in or not in the vocabulary of a semantic language and indexable to propositions of experience. This transform is the simulation of the transform of an ideational complex into a sentential complex, as happens in natural object recognition. We must discuss the structure of a syntax based on e@ and then show how this syntactic structure satisfies a semantic language model. In the simplest terms, a semantic language is a model of a propositional syntactic system (PCS). The system is a triple (A, L, S) where:

A: A set of denumerable atomic sentences.
L: A set of logical symbols, for example, {¬, →, ∧, ∨, (, )}. Respectively, ‘‘¬’’: negation, read as ‘‘not’’; ‘‘→’’: conditional or entailment, read as ‘‘if . . . then . . .’’; ‘‘∧’’: conjunction, read as ‘‘and’’; ‘‘∨’’: disjunction, read as ‘‘or’’; and ‘‘(, )’’: parentheses.
S: The smallest set of sentences including A such that if A, B ∈ S, then so are ¬A and (A ∧ B).

A language model of this system includes the concept of valuation of sentences composed by the logical connectives of atomic sentences; thus a language model includes the semantics of the calculus.
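The triple (A, L, S) can be made concrete with a small sketch. This is not from the text: the tuple encoding, the `closure` helper, and the depth cutoff are illustrative assumptions (the real S is infinite, so we build it only to a fixed nesting depth).

```python
# A minimal sketch of the PCS triple (A, L, S): atomic sentences A,
# logical symbols L (here just negation and conjunction), and S as the
# smallest set containing A and closed under those connectives.
# Sentences are nested tuples; all names are illustrative only.

ATOMS = [("atom", "p"), ("atom", "q"), ("atom", "r")]

def closure(atoms, depth=1):
    """Build S up to a fixed nesting depth (the full S is infinite)."""
    s = set(atoms)
    for _ in range(depth):
        new = set()
        for a in s:
            new.add(("not", a))          # if A is in S, so is (not A)
            for b in s:
                new.add(("and", a, b))   # if A, B are in S, so is (A and B)
        s |= new
    return s

S = closure(ATOMS, depth=1)
print(("not", ("atom", "p")) in S)                       # True
print(("and", ("atom", "p"), ("atom", "q")) in S)        # True
```

Iterating `closure` to greater depths approximates the full closure set; the relatedness constraint introduced later would prune disallowed conjunctions from this set.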
Let us describe a language with the construction L = L(○_1, ○_2, . . . , ○_i; p_1, p_2, . . .). The p's are propositional variables, and in the case of this language they are substituted with e@. The ○_i are i(j)-ary connectives of the ith type connecting j variables, for example, ○_i(p_1, p_2, . . . , p_j). In our case, the variables are the e@, and the connection of any two may or may not be allowable depending on their types. Further, any logically allowable combination of the e@ must also be in the vocabulary of the language for it to have semantic value. In this case, we say the ○_i are truth-and-relation functional connectives if there is a truth table for determining the truth value of ○_i(e_1, e_2, . . . , e_j) to be either true (T) or false (F) when it is in the vocabulary of the language. The truth value of any ○_i(e_1, e_2, . . . , e_j) is determined according to the truth values of the individual e_k, and according to whether the e_k are related by a relationship R_A(e_1, e_2, . . . , e_j) governing ○_i permitted by the language. The relationship
R_A(e_1, e_2, . . . , e_j) introduces the relatedness component of the logical system for a language model to be a semantic structure of nonclassical logic. This form of nonclassical logic is distinguished from classical logic, which considers only form and truth value, neglecting content and relationship. The language L is based on a nonclassical logic. Nonclassical logic is defined by two types of relational structure, namely, a set-assignment semantics and a relations-based semantics, although in the end the sentences defined by either are semantically equivalent. A complete model must consider both, but our immediate concern here with the structure of the language will focus on the relations-based type. In the case of a relations-based semantics, we have for A and B the valuation ν of A and B connected according to ○: ν(○(A, B)) ∈ {T, F}, determined as: ν(○(A, B)) = T iff R_A(A, B) is true, that is, ○(A, B) → R_A(A, B). The relation R_A(A, B) is simply the relationship governing allowable combinations of A and B. The truth-and-relation function f is the calculation of the truth value in a language of the combination of A and B given their individual existence as T or F. Suppose A and B are e_1 and e_2, respectively. If ν(e_1) = T or F and ν(e_2) = T or F, then for a simple logical operation of ○ = ‘‘∧’’ in the language of ordinary propositional logic, f(ν(e_1), ν(e_2)) is: f(T, T) = T, f(T, F) = F, f(F, T) = F, f(F, F) = F, if and only if the combination of e_1 and e_2 by ∧ is an allowed combination according to R(e_1, e_2) = T. A formal relations-based model M for a language L_ε based on (e@_d) is given then by M = (ν, R_A, ○_1, ○_2, . . . , ○_n; e_1, e_2, . . . , e_j; Φ(e@_d)_1, Φ(e@_d)_2, . . .) where:

Φ(e@_d)_1, Φ(e@_d)_2, . . . are complex propositions composed using ○_i,
ν is the valuation ν(p) ∈ {T, F}, and
R_A ⊆ Sub(wffs(○_i)) is the relation governing the truth table for ○_i allowed by logical compatibility of the language.

The valuation ν is applied to the complex propositions Φ(e@_d) = ○_i(e_1, e_2, . . . , e_j) by:

ν(○_i(e_1, e_2, . . . , e_j)) = T iff R_A(e_1, . . . , e_j) and f(ν(e_1), . . . , ν(e_j)) = T.
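A minimal sketch of this valuation rule, assuming a Python encoding in which the relatedness predicate R_A is a set of allowed atom tuples and f is classical conjunction; all names here are illustrative, not the author's.

```python
# Hedged sketch of a truth-and-relation functional connective:
# v(o(e1, ..., ej)) = T iff R_A(e1, ..., ej) holds (the combination is
# allowed by the language) and f(v(e1), ..., v(ej)) = T (the classical
# truth function succeeds). In the text, disallowed combinations are
# absurd (not in the language); returning False here is a simplification.

def f_conj(*truth_values):
    """Classical conjunction as the truth-functional component f."""
    return all(truth_values)

# Relatedness predicate R_A: only these ordered combinations are allowed.
ALLOWED = {("p", "q"), ("q", "r"), ("p", "q", "r"), ("q", "q")}

def valuation(v, *atoms):
    """v maps atom names to True/False; returns the nonclassical value."""
    if atoms not in ALLOWED:        # R_A(...) fails: no semantic value
        return False
    return f_conj(*(v[a] for a in atoms))

v = {"p": True, "q": True, "r": False}
print(valuation(v, "p", "q"))   # True: allowed and both conjuncts true
print(valuation(v, "q", "r"))   # False: allowed, but r is false
print(valuation(v, "r", "p"))   # False: R_A forbids (r, p)
```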
We can see now that the R_A defines the syntax of the nonclassical language L_ε. In the case of this language, the syntax we are addressing refers to the allowable combinations of characters, rather than the more complex case we are accustomed to, including word order. In the end we will incorporate that as well, but that involves the predicate calculus of a quantificational syntactic system (QCS) and is beyond the scope of what is intended here. The semantic content is established by R_A and f: R_A establishes that ○ is logically possible based on the methods developed in Section III.D, and f establishes that the combination according to ○ is in the vocabulary of L_ε. The notion of a language based on an alphabet of e@_d is that a feature composed of e@ is an ordered sequence Φ = (e@_d, . . . , e@_d'); Φ(e@_d) ⊨ R_d(u) ∧ . . . ∧ R_d'(u), that satisfies a syntax and has semantic content. The sets Φ(e@_d) that have semantic content, in the sense of being a true or false valuation in the vocabulary of a semantic model, namely, ν(Φ(e@_d)) ∈ {T, F}, are atomic sentences closed under subsentences e@_d. Sets Φ(e@_d) that do not have this kind of atomic semantic value, that is, ν(Φ(e@_d)) ∉ {T, F}, are molecular expressions. Thus, if any Φ(e@_d) belongs to a vocabulary of features it is atomic, and if not, then it is molecular. We should point out that the Φ(e@_d) are perhaps better understood here as well-formed formulas (wffs), which are the individual e@_d and their combinations by the logical connectives. N.B.: A sentence A with ν(A) ∈ {T, F} is a wff. The details of the structure of the e@_d, and the structure of a language based on them, are described in Wilburn (1998b) and briefly outlined here.

B. Satisfaction of a Propositional Language System

We must now show how L_ε satisfies the requirements of being a propositional language system (PCS); that is, we need to look at the properties a propositional language has that a language of imagery must also have.
To do this, we need to define a few concepts of classical logic systems incorporating the notions of syntax and semantics. A property that stands out in importance is the property of finitary entailment, and explaining it will serve to illustrate these notions. The theorems for transitivity and semantic-syntactic deduction for finite consequences can describe finitary entailment. In this example, Σ is a set of sentences in a language model, and A and B are sentences, or propositions, in that model.

Transitivity: Σ ∪ {A_1, . . . , A_n} ⊨ B iff, whenever Σ ⊨ A_i, i = 1 . . . n, then Σ ⊨ B.

Semantic consequences: Σ ∪ {A_1, . . . , A_n} ⊨ B iff Σ ⊨ (A_1 ∧ A_2 ∧ . . . ∧ A_n) → B,

where ‘‘⊨’’ means ‘‘validates’’; for example, A ⊨ B: A validates B, or B is a semantic consequence of A. The case of ⊨ A means: A is a tautology, or is
true in every model. A wff that is always true in every model is a valid wff. The theorem of semantic consequences is the analog of the case of finite syntactic consequences, as follows:

Σ ∪ {A_1, . . . , A_n} ⊢ B iff Σ ⊢ A_1 → (A_2 → (. . . (A_n → B) . . . )), or
Σ ∪ {A_1, . . . , A_n} ⊢ B iff Σ ⊢ (A_1 ∧ A_2 ∧ . . . ∧ A_n) → B.

In these expressions, ‘‘⊢’’ is read as ‘‘deducible from’’; for example, Γ ⊢ B: B is deducible from Γ in a logical proof. In the case of B deducible from the empty set, we refer to B as a theorem of the system and may denote it as ⊢ B. In this case, we may also refer to a theory closed under a rule, for example, modus ponens, denoted by Th(Γ) = {A: Γ ⊢ A}. We can understand these expressions of finitary entailment as two forms of expressing a logic — one is semantic and the other is syntactic. A semantic description is in terms of the truth of propositions, and a syntactic description is in terms of the theoremhood of propositions. Finitary entailment is founded on the notions of consistency and completeness of a system, and includes the important property of compactness.

(a) A system is consistent if all of its theorems are valid well-formed formulas (wffs), meaning always true in all models, namely, all theorems are tautologies: if ⊢ A, then ⊨ A.
(b) A system is complete if all valid wffs are also theorems of the system: if ⊨ A, then ⊢ A.
(c) A system that is both consistent and complete is strongly complete: ⊢ A iff ⊨ A.
(d) A system is compact in either a semantic or a syntactic sense: Σ ⊨ A iff there is a finite Δ ⊆ Σ such that Δ ⊨ A, and similarly for ‘‘⊢.’’

Thus, compactness incorporates the notion of finiteness. The notions of consistency, completeness, and compactness imply some other important properties of systems. A compact system has a finite model, it is without contradiction (consistent), and it is a model of all its elements (complete). Succinctly stated (Epstein, 1995):

(a) Consistency means that for every A, Σ ⊢ A or Σ ⊢ ¬A, but not both.
(b) Completeness means that for every A, A or ¬A is in Σ.
(c) If a system is consistent and there is a D such that Γ ⊬ D (using ‘‘⊬’’ to mean: D is not deducible from Γ), then there is a strongly complete Σ such that D ∉ Σ and Γ ⊆ Σ.
(d) Every consistent set of wffs has a model.
(e) If Σ is strongly complete, then Σ has a model and every finite subset of Σ has a model.
(f) If Σ is consistent, then there is some A such that Σ ⊬ A.

These definitions and theorems serve to introduce the basic notions of semantic and syntactic entailment, consistency, completeness, and compactness as properties of language systems. To explore the structure of a nonclassical language based on these properties, however, we need to employ a more powerful or flexible metalanguage. We introduce this metalanguage by recalling the notion of valuation given earlier, ν ∈ {T, F}, to be a mapping of sentences into {T, F}. If a valuation maps all sentences of a language into {T, F}, the language is a bivalent language, in this case specifically a bivalent propositional language. A valuation ν is an admissible valuation if it is a member of a set of points V_L, ν ∈ V_L, associated with the closure set of wffs of a language L. The truth valuation space of a wff A is H(A) = {ν ∈ V_L : ν(A) = T}; H(A) is the truth set of A, meaning the set of all points in V_L where A is true, or, more logically stated, the set of points where A is satisfied. The valuation space of the language L is H = (V_L, {H(A) : A ∈ L}). We have several useful definitions that follow from this concept (Van Fraasen, 1971a):

(a) A wff A is a valid wff, ⊨ A, in L iff every admissible valuation in H of L satisfies A, that is, ν(A) = T for all ν ∈ V_L.
(b) A set of wffs X is unassailable if every admissible valuation of L satisfies some member of X.
(c) A set X of L semantically entails A, X ⊨ A, in L iff every ν of L that satisfies X also satisfies A.
(d) The set H(X) = ∩_{A∈X} H(A) is the elementary class of X. A union of elementary classes that spans H is the cover of H.

These results are a kind of restatement of those given earlier for consistency and completeness, but defined here in set-theoretic terms.
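These set-theoretic definitions can be sketched directly for a toy two-atom language; the tuple encoding and the names `ev` and `H` are illustrative assumptions, not the author's notation.

```python
# Set-theoretic sketch of truth sets: H(A) = {v in V_L : v(A) = T},
# with validity |= A iff H(A) = H, and the elementary class of a set X
# as the intersection of the truth sets of its members.

from itertools import product

ATOMS = ["p", "q"]
V_L = [dict(zip(ATOMS, bits)) for bits in product([True, False], repeat=2)]

def ev(sentence, v):
    """Evaluate a nested-tuple sentence at valuation v."""
    op = sentence[0]
    if op == "atom":
        return v[sentence[1]]
    if op == "not":
        return not ev(sentence[1], v)
    return ev(sentence[1], v) and ev(sentence[2], v)   # "and"

def H(sentence):
    """Truth set of a sentence as a set of valuation indices."""
    return frozenset(i for i, v in enumerate(V_L) if ev(sentence, v))

p, q = ("atom", "p"), ("atom", "q")
taut = ("not", ("and", p, ("not", p)))         # not(p and not p): valid
print(H(taut) == frozenset(range(len(V_L))))   # True: H(A) = H, so |= A
X = [p, q]
print(frozenset.intersection(*map(H, X)))      # elementary class of X
```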
With this approach, we may summarize these definitions in theorems as follows (Def.: ∅ = the null set):

(a) ⊨ A iff H(A) = H.
(b) X is unassailable iff ∪_{A∈X} H(A) = H, that is, if the truth sets of all of X form the cover of H.
(c) X is satisfiable iff ∩_{A∈X} H(A) ≠ ∅, that is, if the elementary class of X is not empty.
(d) B ⊨ A iff H(B) ∩ H(A) ≠ ∅.
(e) X ⊨ A iff ∩_{B∈X} H(B) ⊆ H(A), that is, if the elementary class of X is a subset of the truth set of A.

We may now redefine compactness in terms of intersection and union. Compactness is described in two forms (Van Fraasen, 1971b): I-compact (intersection) and U-compact (union).

(a) A language L and its valuation space H are I-compact iff for any set X of wffs in L, ∩_{A∈X} H(A) = ∅ only if ∩_{A∈Y} H(A) = ∅ for some finite subset Y of X. This is the same as saying that the property of I-compactness means that any set in L is satisfiable iff all of its finite subsets are satisfiable.
(b) A language L and its valuation space are U-compact iff for any set X of L, ∪_{A∈X} H(A) = H only if ∪_{A∈Y} H(A) = H for some finite subset Y of X. This is the same as saying that the property of U-compactness means that any set in L is unassailable only if it has a finite unassailable subset.
(c) A language that is both I-compact and U-compact is compact.

Finitary semantic entailment is definable in these terms now as a property of a language: X ⊨ A iff, for any X of L and a wff A of L, H(X) ⊆ H(A) only if H(Y) ⊆ H(A) for some finite subset Y of X. There are some other more subtle aspects of this subject not given here, and the reader is referred to Wilburn (1998b) for a more complete discussion. The language we have discussed in connection with the model M is, as we have said, a bivalent language. A bivalent language has the inherent property of exclusion negation. A language has this property if for every wff A of L there is an A* of L such that H(A*) = H − H(A). A basic theorem given by Van Fraasen (1971c) connects compactness and finitary entailment for a language having exclusion negation:

Theorem IV.4-A: If a language L has exclusion negation, then:
(a) L is I-compact.
(b) L is U-compact.
(c) L is compact.
(d) L has finitary entailment.
This theorem result for finitary entailment is conditional, however, and not necessary. For finitary semantic entailment to be necessary, we need the property of convergence, which is supplied by the construction of a filter. The customary use of filters is to prove compactness. We introduce the logical construct of filters to illustrate how a nonclassical propositional language is represented and distinguishes propositions. Finally, we will show by example that
L_ε satisfies the requirements of being a nonclassical propositional language system. In the foregoing we saw from Van Fraasen (1971a, 1971b, 1971c) that a language L having exclusion negation and compactness implies finitary entailment, but not necessarily. The missing condition is the convergence of a filter. As noted, the notion of filters has found application in logic as a tool for proving compactness and finitary entailment, and it is used here as well. However, we also appeal to filters as a tool for distinguishing features by finitary semantic entailment, namely, semantic deduction for finite consequences. To do this, we must understand filters and show that the language L_ε is compatible with the structure of filters. Finitary entailment is intuitively necessary in order to make a deducible assertion in a language based on finite evidence. Convergence is also intuitively necessary in order that an assertion be consistent, or unambiguously understandable, in that language. Even if the assertion is a disjunction, for example, S is P or Q or . . . , it must be the same disjunction given the same evidence for the assertion. A filter I is defined on a set of sentences X, in terms of the valuation space H(X), to be the set of subsets of X in I such that:

(a) ∅ ∉ I.
(b) If Y ∈ I and Y ⊆ Z ⊆ X, then Z ∈ I.
(c) If Y ∈ I and Z ∈ I, then Y ∩ Z ∈ I.

This definition in terms of H on L leads to L(I) = {A: H(A) ∈ I}. From this it follows that if, for i = 1, . . . , n, {A_i} ⊆ L(I), and {A_i} ⊨ B in L, then B ∈ L(I) and L has finitary entailment; thus L(I) is a system. A filter I may contain subfilters, or filter bases B that can generate the filter I such that I contains B. A filter I on X is an ultrafilter if there is no filter on X that contains I as a proper part; namely, it is a maximal element as the basis for including all of X subject to H(X); ν(X) = T. That is: Ultrafilter I = {A: H(A); ν(X) = T for every A ∈ X in L}, and every filter base is contained in an ultrafilter.
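The three filter axioms can be checked mechanically. A sketch, assuming finite sets of integers stand in for truth sets; `check_filter` and the principal-filter example are illustrative, not part of the text.

```python
# Sketch of the three filter axioms on a set X: (a) the empty set is
# excluded, (b) supersets (within X) of members stay in, and (c) the
# filter is closed under intersection.

from itertools import chain, combinations

def powerset(s):
    """All subsets of s, as tuples."""
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def check_filter(I, X):
    """Return True iff I is a filter on X per the three axioms."""
    if frozenset() in I:                       # (a) null set not in I
        return False
    for Y in I:
        for Z in map(frozenset, powerset(X)):  # (b) Y <= Z <= X => Z in I
            if Y <= Z and Z not in I:
                return False
    return all(Y & Z in I for Y in I for Z in I)   # (c) closed under ∩

X = frozenset({1, 2, 3})
# Principal filter generated by {1}: all subsets of X containing 1.
I = {frozenset(s) for s in powerset(X) if 1 in s}
print(check_filter(I, X))   # True
```

Dropping any superset from `I`, or adding the empty set, makes `check_filter` fail, which is a quick way to see which axiom a candidate collection violates.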
The notion of a maximal element in itself implies finiteness, but more rigorously, X must be finite because the system is defined for A, B, ¬A, and (A ∧ B), and we cannot have a sentence that is a maximal element of an infinite conjunction. (N.B.: In this case, we have the system defined for the truth-functionally complete L(¬, ∧).) This construction of the system also defines the filter to be defined on H over the closure set of X, and it is finite. Thus the union of the elementary classes is the closure set of X.
If I is an ultrafilter on X, then (Van Fraasen, 1971a):

(a) Y ∪ Z ∈ I iff Y ∈ I or Z ∈ I, for all Y, Z ⊆ X.
(b) For every Y ⊆ X, either Y ∈ I or X − Y ∈ I.

With this result, we can see that if L(I) is bivalent, compact, and convergent, then L(I) has finitary semantic entailment, and thus L(I) is a language system. As we shall see directly, convergence is closely allied to compactness and is important. Convergence is defined analogously to compactness in terms of intersection and union:

(a) U-convergent to ν in H iff every elementary class containing ν belongs to I.
(b) I-convergent to ν in H iff every elementary class in I contains ν.
(c) Convergent to ν if every elementary class belongs to I iff it contains ν.

Let us imagine a simple language of atomic sentences p, q, and r, and the truth-functionally complete syntax of L(¬, ∧). Further, let us imagine the complex connectives ○_i to be represented by the connective ‘‘∧’’ of atomic sentences A and B as (A∧B), defined as A co-joined with B and subject to the relatedness-logic predicate R(A, B). It is most important to emphasize here again that (A∧B) does not mean the occurrence of A and B in an image, but rather the occurrence of A joined with B in an image. For our purposes here, we will assume that ‘‘∧’’ means simple joining, but in actual fact, as mentioned earlier, it is a little more complex than that, allowing for overlaps of the data satisfying FP constraints. However, for now this interpretation will do. Let us imagine that the variables are defined as p: an up-ramp (ramp+) of levels 0–5, q: a pulse of level 5, and r: a down-ramp (ramp−) of levels 5–0. Now suppose the set of sentences involving p, q, r is subject to R(A, B) ∈ {T, F}; (A∧B) → R(A, B), as follows:

(p∧q): R(p, q) = T          (p∧r): R(p, r) = F       (q∧p): R(p, q) = F
(q∧r): R(q, r) = T          (r∧p): R(p, r) = F       (r∧q): R(q, r) = F
(p∧q∧r): R(p, q, r) = T     (p∧p): R(p, p) = F       (q∧r∧p): R(p, q, r) = F
(q∧q): R(q, q) = T          (r∧r): R(r, r) = F       (r∧p∧q): R(p, q, r) = F
                                                     (r∧q∧p): R(p, q, r) = F
A language based on this set of sentences may be thought to correspond roughly to a language based on the 1D FP roots described in the foregoing. (We should also postulate a second type of pulse, of level 0, but that would lead to a larger V_L that does not materially add to the explication intended here.) We may now compose a partial truth table in the usual fashion for sentences. We need to pause, however, and take note that the
nonclassical aspect of this logic imposes a condition on the set of sentences that might be overlooked. Those sentences for which R(A, B) = F, such as (p∧r), are absurd; that is, they simply do not exist in X. In this case, the axiom of (A, L, S), namely, ‘‘S: The smallest set of sentences including A such that if A, B ∈ S, then so are ¬A and (A∧B),’’ is subject to R(A, B) = T. With this in mind, we may proceed with p, q, r, (p∧q), (q∧r), and (p∧q∧r). With the three atomic sentences (wffs) p, q, r, we have |V_L| = 2^3 = 8 valuations ν_1, . . . , ν_8 comprising V_L. The truth table is shown in Table IV (T = 1, F = 0), with ν_i corresponding to row i. We can demonstrate the properties of L using the tools developed in the foregoing. We may construct a filter base B_Y on a subset Y of X, Y = {p, q}, as B_Y = {H(p), H(q), H(p∧q)}:

B_Y = {ν_1, ν_2, ν_3, ν_4}, {ν_1, ν_2, ν_5, ν_6}, {ν_1, ν_2}.

The base B_Y is a base because Y = {p, q, (p∧q)} is satisfiable, that is, ∩_{A∈Y} H(A) = {ν_1, ν_2}, which is the elementary class of Y. We may also observe that the filter base B_Y is the valuation space of the closure set of p∧q. The filter base B_Y generates a filter I_Y over the complete subset Y ⊆ X that contains B_Y, B_Y ⊆ I_Y [N.B.: (a∨b) = ¬(¬a∧¬b)]:

I_Y = {H(p), H(q), H(p∧q), H((p∧q)∨(¬p∧¬q)), H(¬(p∧¬q)), H(¬(¬p∧q)), H(¬(¬p∧¬q)), H(p∨¬p)}.

When we represent this in terms of V_L, this corresponds to:

I_Y = {ν_1, ν_2, ν_3, ν_4}, {ν_1, ν_2, ν_5, ν_6}, {ν_1, ν_2}, {ν_1, ν_2, ν_7, ν_8}, {ν_1, ν_2, ν_5, ν_6, ν_7, ν_8}, {ν_1, ν_2, ν_3, ν_4, ν_7, ν_8}, {ν_1, ν_2, ν_3, ν_4, ν_5, ν_6}, {ν_1, . . . , ν_8}.

This is a more interesting filter than B_Y. We see that the tautology H(p∨¬p) = H, and the contradiction H(p∧¬p) = ∅. Note that exclusion negation accounts for not including H(¬(p∧q)). Further, ∩_{A∈Y} H(A) = {ν_1, ν_2} and ∪_{A∈Y} H(A) = H, and ∩_{A∈Y} H(A) ⊆ H(¬p∨q), that is, {ν_1, ν_2} ⊆ {ν_1, ν_2, ν_5, ν_6, ν_7, ν_8}; thus Y ⊨ p → q [N.B.: p → q = ¬(p∧¬q)]. This shows that L is I-compact and U-compact for a finite subset Y of X. Thus for every X in L, L is compact and has finitary semantic entailment, including finite axiomatizability. (We could generate a similar filter based on B = {H(q), H(r), H(q∧r)}.)
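The filter base just constructed can be verified numerically, assuming the Table IV row order ν_1, . . . , ν_8 (indexed 0–7 here); the encoding is illustrative.

```python
# Sketch of the filter base B_Y = {H(p), H(q), H(p & q)} over the eight
# valuations v1..v8, checking that Y = {p, q, (p & q)} is satisfiable,
# i.e., that its elementary class (the intersection) is nonempty.

from itertools import product

ROWS = list(product([True, False], repeat=3))   # (p, q, r) for v1..v8

def H(fn):
    """Truth set of a sentence given as a function of (p, q, r)."""
    return frozenset(i for i, (p, q, r) in enumerate(ROWS) if fn(p, q, r))

B_Y = [H(lambda p, q, r: p),
       H(lambda p, q, r: q),
       H(lambda p, q, r: p and q)]
elem = frozenset.intersection(*B_Y)
print(sorted(elem))                             # [0, 1]: v1 and v2
print(elem != frozenset())                      # True: Y is satisfiable
```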
The question now looms: is the filter I_Y an ultrafilter? The answer is no. It is itself contained in a filter I over X = {p, q, r} and its family of allowable conjunctions. Adding r to the subset Y, to give the set X of L, results in a filter base B_X:

B_X = {H(p), H(q), H(r), H(p∧q), H(q∧r), H(p∧q∧r)}.

This base generates a filter I.
TABLE IV
Truth Table of L(¬, ∧; R; p, q, r)

      p   q   r   ¬p  ¬q  ¬r  p∧q: R(p,q)  q∧r: R(q,r)  p∧q∧r: R(p,q,r)
ν_1   1   1   1   0   0   0   1            1            1
ν_2   1   1   0   0   0   1   1            0            0
ν_3   1   0   1   0   1   0   0            0            0
ν_4   1   0   0   0   1   1   0            0            0
ν_5   0   1   1   1   0   0   0            1            0
ν_6   0   1   0   1   0   1   0            0            0
ν_7   0   0   1   1   1   0   0            0            0
ν_8   0   0   0   1   1   1   0            0            0
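The R-constrained columns of Table IV can be regenerated mechanically, assuming a conjunction takes value 1 exactly when its combination is allowed by R and every conjunct is true; the encoding is illustrative.

```python
# Regenerating the R-constrained conjunction columns of Table IV.
# Rows follow the table order v1..v8; ALLOWED encodes the relatedness
# predicate R for the permitted combinations.

from itertools import product

ROWS = list(product([1, 0], repeat=3))          # (p, q, r) for v1..v8
ALLOWED = {("p", "q"), ("q", "r"), ("p", "q", "r")}

def col(*names):
    """Column of the truth table for the conjunction of the named atoms."""
    idx = {"p": 0, "q": 1, "r": 2}
    ok = names in ALLOWED                       # R permits this combination
    return [int(ok and all(row[idx[n]] for n in names)) for row in ROWS]

print(col("p", "q"))        # [1, 1, 0, 0, 0, 0, 0, 0]
print(col("q", "r"))        # [1, 0, 0, 0, 1, 0, 0, 0]
print(col("p", "q", "r"))   # [1, 0, 0, 0, 0, 0, 0, 0]
```

A combination absent from `ALLOWED`, such as `col("p", "r")`, yields an all-zero column, mirroring the text's point that such sentences are absurd rather than merely false.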
I = {H(p), H(q), H(r), H(p∧q), H((p∧q)∨(¬p∧¬q)), H(¬(p∧¬q)), H(¬(¬p∧q)), H(¬(¬p∧¬q)), H(q∧r), H((q∧r)∨(¬q∧¬r)), H(¬(q∧¬r)), H(¬(¬q∧r)), H(¬(¬q∧¬r)), H(p∨¬p), H(p∧q∧r), H((p∧q∧r)∨(¬p∧¬q∧¬r)), H(¬(¬p∧(q∧r))), H(¬(¬p∧¬(q∧r))), H(¬(p∧¬(q∧r))), H(¬(p∧(¬q∧r))), H(¬(p∧¬(¬q∧r))), H(¬(p∧(q∧¬r))), H(¬(¬p∧(¬q∧r))), H(¬(¬p∧(q∧¬r))), H(¬(¬p∧(¬q∧¬r))), H(¬((¬p∧q)∧r)), H(¬(¬(p∧q)∧r)), H(¬(¬(p∧q)∧¬r)), H(¬((p∧q)∧¬r)), H(¬((p∧¬q)∧r)), H(¬((p∧¬q)∧¬r))}.

When we represent this in terms of V_L, the filter I shows that the language L is I-convergent to ν_1 and U-convergent to ∪ H(A) = H, and has finitary semantic entailment. Furthermore, the filter I is an ultrafilter, as it contains I_Y (B_Y ⊆ I_Y ⊆ I) and there is no filter that contains I as a proper part. Thus L(I) is a nonclassical propositional language system. The effect of R(A, B) = 0 is to reduce the number of truth sets in I, but it has no effect on the compactness and convergence of L(I). Let us examine finitary entailment with a few examples. Suppose we find (p∧q). Then may we infer (p∧q∧r)? The case then is: (p∧q) → (p∧q∧r)? (p∧q) → (p∧q∧r) = ¬((p∧q) ∧ ¬(p∧q∧r)). The test is made by

∩_{A∈X} H(A) ⊆ H((p∧q) → (p∧q∧r)).
The set X is the filter base of I. Thus, in the case of p, q, and r, ∩_{A∈X} H(A) = {ν_1}, the elementary class of X. The truth set of the entailment is

H(¬((p∧q) ∧ ¬(p∧q∧r))) = H(¬((p∧q) ∧ ¬r)) = H − {ν_2}.

Clearly {ν_1} ⊆ H − {ν_2}. Thus, the test succeeds; therefore (p∧q) → (p∧q∧r) is true in I at {ν_1}. This means that if we find an isolated object of (p∧q), then we may infer that it is possible that (p∧q∧r) is the actual object. Let us suppose that instead of (p∧q) we find (q∧r). Can we infer that (p∧q∧r) is possible? By the same analysis we know that the elementary class is {ν_1} and find the truth set of the entailment to be

H((q∧r) → (p∧q∧r)) = H − {ν_5} ⊇ {ν_1}.

Thus, the entailment succeeds. Finally, for completeness, let us suppose we find q. Then can we infer (p∧q)? Here again, the elementary class of p and q is {ν_1, ν_2}, and the truth set of (q → (p∧q)) is

H(q → (p∧q)) = H − {ν_5, ν_6} ⊇ {ν_1, ν_2}.

Thus, the test succeeds and the inference can be made in I_Y at {ν_1, ν_2}. A similar example can be shown for (p → (p∧q)), and for (q∧r) entailed by q or r, but in the filter generated by {H(q), H(r), H(q∧r)} at {ν_1, ν_5}. Let us consider another case. Suppose we find (¬p∧q). Then may we infer (¬p∧q∧r)? We apply the same procedures as before, that is, X = {p, q, r} and the elementary class is {ν_1}. The truth set of the entailment sentence (¬p∧q) → (¬p∧q∧r) is

H(¬((¬p∧q) ∧ ¬(¬p∧q∧r))) = H(¬(¬p∧q∧¬r)) = H − {ν_6},

and {ν_1} ⊆ H − {ν_6}. Thus, the test shows that the entailment of (¬p∧q) → (¬p∧q∧r) succeeds, and if we find (¬p∧q), we can infer that (¬p∧q∧r) is possible in I at {ν_1}. The complement works as well: (¬q∧r) → (¬p∧q∧r). A similar analysis, however, shows that if we find (¬p∧q) we cannot infer (p∧q∧r). This is because we have asserted (¬p∧q), and replacement is not allowed by this
logic. If, on the other hand, we find p or q or r, we can infer (p∧q∧r) in I; for example, p → (p∧q∧r): {ν_1} ⊆ H − {ν_2, ν_3, ν_4}. The difference between finding (¬p∧q) instead of p, q, r, or (p∧q) distinguishes the possibility of (¬p∧q∧r) from the possibility of (p∧q∧r). There are not many degrees of freedom in this language of only eight valuations in V_L, but we can see enough to understand how this structure will work. We see that p implies q, (p → q), in I_Y at {ν_1, ν_2}, and that p or q implies (p∧q), also at {ν_1, ν_2} in I_Y. In similar fashion, we saw how q or r implies (q∧r) at {ν_1, ν_5}. We have also seen how p, q, or r, or (p∧q) or (q∧r), implies (p∧q∧r) in I at {ν_1}, and finally we have seen how (¬p∧q) or (¬q∧r) implies (¬p∧q∧r) in I at {ν_1}, but excludes the possibility of (p∧q∧r). The notion behind these examples of entailment is that we can infer an object from a sufficient subset of its parts, and we can distinguish some objects from others by identification, or by implication from a sufficient subset of their parts. When we recall that the V_L for L_ε has 2^n points for an alphabet of n characters, and that the connectives ○_i account for overlap and joins on four faces in 2D, then we can begin to appreciate the complexity of the language system required to satisfy delineation of many objects. If we were to substitute (e_1, e_2, . . . , e_n) for (p, q, r, . . . , s), with the ○_i and f_i being truth-functionally complete, then L is L_ε, and we can see that L_ε would satisfy the requirements for L_ε(I) to be a satisfactory language structure for a linguistic transform of features in imagery. This is the principal result: If f_1, f_2, . . . , f_n for connectives ○(e_1, e_2, . . . , e_j), ○_1, ○_2, . . . , ○_n; R_A(e_1, . . . , e_j) in L_ε are truth-functionally complete, then L_ε(I) is a nonclassical propositional language system that is compact, bivalent, and convergent.
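The finitary-entailment test used in these examples reduces to a truth-set inclusion that can be checked directly. A sketch, assuming valuations ν_1, . . . , ν_8 are indexed 0–7 and sentences are encoded as Boolean functions (the names are illustrative).

```python
# Worked sketch of the entailment test: (p & q) entails (p & q & r) at
# the elementary class of X iff the intersection of the truth sets of X
# is contained in H(~((p & q) & ~(p & q & r))). Here '~' and '&' stand
# for the text's negation and joining connectives.

from itertools import product

ROWS = list(product([True, False], repeat=3))   # valuations v1..v8

def H(fn):
    """Truth set of a sentence given as a function of (p, q, r)."""
    return frozenset(i for i, (p, q, r) in enumerate(ROWS) if fn(p, q, r))

# Filter base of I: p, q, r and the allowed conjunctions.
X = [lambda p, q, r: p, lambda p, q, r: q, lambda p, q, r: r,
     lambda p, q, r: p and q, lambda p, q, r: q and r,
     lambda p, q, r: p and q and r]
elem = frozenset.intersection(*map(H, X))
print(elem)                                     # only v1 satisfies all of X

entail = H(lambda p, q, r: not ((p and q) and not (p and q and r)))
print(elem <= entail)                           # True: the test succeeds
```

Swapping in other antecedents, such as `q` alone or `(not p and q)`, reproduces the other inferences worked through above.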
C. Reflections

A student new to this subject might question why the representation I does not include logical connectives of p and r. It is worth taking some time to reflect on this, as the reason it does not is instructive. It is easy to see that a filter base defined on p and r is not satisfiable, because the intersection is a null set, and thus it follows that a subfilter cannot be constructed based on it. The ardent student might suppose we could construct a filter base B′ for Y′ = {¬p, ¬r}:

B′ = {H(¬p), H(¬r), H(¬p∧¬r)}
B′ = {ν_5, ν_6, ν_7, ν_8}, {ν_2, ν_4, ν_6, ν_8}, {ν_6, ν_8}.
The elementary class of Y is , , -, thus Y is satisfiable. Suppose we continue with a filter I , based on B . I : H( p), H( r), H( p r), H(( pr)^( p r)), H( ( p r)), H( ( pr)), H( ( p r), H( p^ p) I : , , , , -, , , , , -, , , -, , , -, , , , , , , , , -, , , , , , , -, , , , , , , -, , , , , , , , , -. We could stop right here because (pr) is not defined in H(( pr)^ ( p r)). Thus the set of sentences would seem to be not unassailable. Nevertheless, let us push on and we will see how the notion falls apart. We can see that > H(A) : , , - and + H(A) : H, and thus Y /Z1 /Z1 is compact. The problem is that the ultrafilter I is I-convergent to rather than I-convergent to , , -. Nevertheless, it would seem that Y is compact and bivalent, therefore it should have finitary entailment:
Y′ ⊨ ¬p → ¬r  iff  ∩H(A) ⊆ H(¬p → ¬r),
{…} ⊆ {…}, and indeed it does; furthermore, X ⊨ ¬p → ¬r by virtue of {…} ⊆ {…}. So how does this happen? The resolution of the paradox comes from realizing that I′ may be a filter, but it is not a subfilter of I: I′ ⊄ I. The sentences involving ¬p and ¬r may appear, in principle, as though they can be included in X subject to finitary entailment under I, even though I does not delineate them. (Recall the theorems defining an ultrafilter: (a) Y ∪ Z ∈ I iff Y ∈ I or Z ∈ I, for all Y, Z ⊆ X; or (b) for every Y ⊆ X, either Y ∈ I or X − Y ∈ I.) Is Y′ ⊆ X a problem? Yes, it is, but the assailability of the subset and the disagreement of I-convergence prohibit I′ from being contained in I. The sentences of finitary entailment involving ¬p and ¬r may seem to be mathematically included in X. However, there is good reason why they are not included in I and should not be included in the language derived from it. The variables of our example language L are defined as the occurrence of a 1D FP; that is, p = 1 means that the p-type FP occurred. A sentence expressing the conjunction of p and q, pq, satisfies R(p, q) and has semantic content in a language whose syntax is founded on the notion that it is possible for pq to occur. The connection between the semantic language expression of pq and the actual-world occurrence of p, q, or pq
THEORY OF RANKED-ORDER FILTERS
325
is the basic problem, and we will address it later in a discussion of the ontology of language. Nevertheless, the relations R of L are based on governing the occurrence, not the nonoccurrence, of conjunctions. The conjunctions involving ¬p and ¬r are for the nonoccurrence of one or both of them; namely, ¬p = 1 or ¬r = 1 is true, or ¬p = 1 and ¬r = 1 is true. The filter base B′ is satisfiable only if R(¬p, ¬r) = 1. Thus B′ is compatible with a language structure based on nonoccurrence, on ¬p, ¬q, ¬r and R(¬p, ¬q), R(¬q, ¬r), R(¬p, ¬r), and so on. For a language based on nonoccurrence to interpret the actual world of p, q, and r, the governing relations must be defined for nonoccurrence, and the resulting language structure would be the complement of I, I-converging to {…}. The filter bases B and B′ are, in a sense, negatives of each other, and mixing them in the same system defined by the occurrence relations R is a bit like mixing apples and oranges, or, more appropriately, apples and nonapples. We must respect the primacy and consistency of the relatedness of the nonclassical logic defining L(I), which we redefine in terms of the e′ to be L′(I).

D. Ontological Considerations

Ontological considerations concern the relationship between a language of objects and the existence of objects in imagery. We must understand that L is an artificial language, and we use natural language as the metalanguage to discuss it. The language L samples the image and detects patterns of data in imagery that satisfy its characters and syntax. There is much more to the relationship than this simple statement, however, and we discover this richness in resolving what may appear to be a contradiction found upon close examination of the sentences permitted by I in L(I). For example, in our pedagogical language L based on p, q, and r, both sentences of finitary entailment, that is, If p, then q: p → q, and If not p, then not q: ¬p → ¬q, are permitted following R(pq) = 1 in I.
Both of these sentences derive from the truth conditions of pq, and therefore of p → q, being true. We use the term entailment here by convention, but it is more accurately material implication, denoted A ⊃ B in ordinary propositional logic for a contingent universe. The truth conditions are that the sentence A ⊃ B is T for all cases of A, B ∈ {T, F} except A = T and B = F. The truth tables for the sentences p → q, exemplified by H(¬(p ∧ ¬q)), and ¬p → ¬q, exemplified by H(¬(¬p ∧ q)), are given in Table V. As can be seen in Table V, the conditions of rows (b) and (c) are contradictory. So what is the relationship between L and the actual world of images? There can be only one actual world, yet the legitimate language permits contradictory expressions of it. The resolution lies in analyzing the terms
TABLE V
Truth Table for p → q and ¬p → ¬q

        p    q    ¬(p ∧ ¬q)    ¬(¬p ∧ q)
   a    T    T        T            T
   b    T    F        F            T
   c    F    T        T            F
   d    F    F        T            T
‘‘expression’’ and ‘‘actual world.’’ We will consider them separately and then relate them together. We will employ the notion of a sentence as a well-formed-formula (wff) and develop the notion of wff in terms of atomic sentences. The reader may think the use of wff is redundant, and some people do, but the notion of wffs permits a delineation of properties without confusion more easily than does the conventional understanding of the term sentence in a metalanguage. We must, however, distinguish between a sentence as a wff, and as a molecular expression that is not in the vocabulary of a language, and further still, we must distinguish between a wff and a valid wff. This brings us into the realm of modal logic addressing necessity and possibility, a realm we will delve into only sparingly in this discourse. If we imagine an image of objects being referenced to time and place, then we consider it as a snapshot of a piece of the actual world at some time. We may think of an image as a representation of some state-of-affairs or relationships linking objects of that place at that time. In this way, we may regard the image as a ‘‘world’’ unto itself. Other images of the same place at different times and from the same, or different, perspective are themselves distinct worlds. Of course images of different places at any time are also distinct worlds in the same sense of representing an event involving places, time and relationships. If we relate an image to our experience of that place at that time, and even further still our experience of those objects and places at other times, then we define a world that has meaning to us in terms of our experience. The point is that an image is an actual world, but there are an indefinitely large number of worlds that are possible involving any given object representing the context of that object in terms of relationships binding it to other objects and features comprising objects. 
Finally, these possible worlds can be related to experience. With our understanding of wff and possible worlds given in the foregoing, we may begin to link the two in a context of necessity and possibility. The reader may recall the definition of a valid wff as a wff that is true in all models of a syntactic system. We have defined a syntactic system PCS as
the triple (A, L, S) consisting of A, the set of atomic sentences; L, the logical operators; and the grammar S defined in L by R. We further defined a model of a syntactic system to be the incorporation of the notion of valuation of a sentence to have semantic content in the sense of being T or F. Earlier, we incorporated semantic content based only on the logical consequences of the syntax to define the truth space H. Now we wish to relate the syntactic system of this artificial language to the actual world, so that the truth space of the language is relevant to the actual world. We relate the syntax of L to the actual world by interpreting a model of the PCS as one having its vocabulary defined on its truth space, but constrained by the existence of its terms in the set of worlds, or images. In the ideal case, the set of all images has denumerably infinite possibilities, and in the real case, it is a class of images having an indefinitely large number of possibilities. We will address the justification for this constraint on the vocabulary later; for now, an image, be it of real or imagined objects, is a possible world that may be an actual world at some instant of time, and the set of all images in some class is the semantic basis of L. A valid wff of L, then, is a wff in L that is true in all such images spanning all time, and an ordinary wff is true in at least one such image. Because a valid wff is true in all worlds (images), we may say that it is necessarily true; that is, it is not possible to find an image where it is not true. For example, a tautology is necessarily true: ¬(p ∧ ¬p) is always true in all images. For an ordinary wff that is true in some, but not all, worlds (images), we may say that it is possibly true; that is, it is possible to find an image where it is true, or where it is false, or perhaps both. (Here we understand that being false means that it does not exist.)
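The modal reading of validity just given can be sketched directly (an illustration of mine, not the chapter's machinery): worlds are the valuations induced by a class of images, a wff is necessary if it holds in every world, and possible if it holds in at least one.

```python
# Each "image" fixes which feature patterns occur: a world is an
# assignment of truth values to the atoms p, q, r. These three
# sample worlds are hypothetical.
worlds = [
    {"p": True,  "q": True,  "r": False},
    {"p": False, "q": True,  "r": True},
    {"p": False, "q": False, "r": False},
]

def necessary(wff):
    """True in every world (image): a valid wff."""
    return all(wff(w) for w in worlds)

def possible(wff):
    """True in at least one world (image): an ordinary wff."""
    return any(wff(w) for w in worlds)

tautology = lambda w: not (w["p"] and not w["p"])   # ¬(p ∧ ¬p)
contingent = lambda w: w["p"]                       # just p

print(necessary(tautology))   # True: holds regardless of the image
print(necessary(contingent))  # False: p fails in some images
print(possible(contingent))   # True: p holds in at least one image
```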
In this sense, a tautology is independent of any actual world and is an artifact of the language. Propositions that are not tautologies may be valid wffs, but generally they are contingent on the actual world. An object of data, or data object, in an image that satisfies an atomic sentence in L is contingent. For example (and hereafter we will use L as a stand-in for L′, with the understanding that p, q, or r can be some choice of the e′ in L′), an object that may be described by p may occur in some image but not in another, and similarly for q and r. Furthermore, objects described by the conjunction (pq) or (qr), meaning, for example, p co-joined with q as described in Section III.D, may occur in some image and not in another. It is possible that p may occur co-joined with q, but it is not necessary. Thus it is possible that p, q, (pq), (p → q), or (¬p → ¬q) are true statements about objects in some image (and similarly for q and r). The structure of L reflects this possibility. The problem comes about from knowing that (N) cannot detect a conjunction described by (pr)
in any image. We cannot discuss the problem of p and r in the same manner as for p and q, or q and r; we cannot even say that it is necessary that (p → r) or (¬p → ¬r) are not possible, because these sentences are logically absurd. The absurdity occurs because they are entailed by p, r, and (pr), but (pr) is not possible. These sentences simply do not exist in the vocabulary of L. The entailment sentences of p and r would permit the possibility of p occurring co-joined with r, which is not possible in the actual world, and coincidentally (pr) is not a possible sentence of L; that is, it is not a wff of L. We cannot even say that ¬(pr) is a necessary property of the actual world, as this would still imply (pr), because (pr) is entailed by it. We cannot say anything about (pr) in L. If this seems severe, then we would observe that this situation can be generalized. The reader may be impatient, complaining that of course (pr) is impossible: you simply cannot put these two patterns together with a single set of data and satisfy the monotonicity requirements with the shared data. The consequences of the nonexistence of (pr), however, apply equally to any other pattern of data that does not satisfy the requirements of simply being an FP, yet does exist, to our senses, in an actual image. The ontological implications of this are interesting. Consider the question of the existence of (p → q), or any equivalent variables, in the language structure. We have shown that the construct of finitary entailment is essential to the logical structure of a language. Furthermore, we postulate from philosophical arguments that finitary entailment is an essential mode of conscious thought, expressing the temporality of the ego to project a possible relationship of the observer with his or her world onto the object-horizon of his or her existence. This suggests that the logical structure of language reflects the essence of the ego, that is, temporality.
The interesting part of the question motivated by L comes about because we have also shown that the construction of finitary entailment is predicated on the detection of existent objects and their possible combinations in the actual world. The full impact of finitary entailment may not be illustrated by the binary combination of two characters, but the logical extension of (∧(e₁, e₂, . . . , eₙ)) → (∧(e₁, e₂, . . . , eₙ)) begins to tell the story. The relationship between the conscious construction of finitary entailment and the actual world is subtle and complex, turning on the notion of the observer being embedded in the actual world as a part of it. The issue is not that some pattern of data does not exist in some possible world, but that for some N, (N) would not detect it and L cannot express it. In a world perceived by (N), an object described by (pr) or any non-FP does not exist; thus there is no basis in the experience of a monolingual observer in the language community of (N) for describing it. This
implies that patterns of data in an image that do not satisfy any wff of a language are not experienced by the user of that language in the perception of that image because they are, in a sense, unintelligible to a monolingual observer in that language community. This seems like a rather strong statement until we remember that we have been developing this line of thought based on an artificial language. The conclusion may be easier to understand if stated in the converse: Any patterns of data that are not represented in the experience of an observer are not expressible by that observer, thus there is no evidence of his or her perception of it shared in that community. The structure of the artificial language is a simulacrum of natural language, so this conclusion serves to give us an appreciation of the richness of natural language. This conclusion and the ontological question bring us to the notion of interpreting an ideational complex of an object into a sentential complex about an object and to understand why this is not a neo-Kantian idea that the existence of objects conforms to our knowledge of them. The remaining discussion is based on the developments in phenomenological research spanning the twentieth century. The line of phenomenological thought began with Husserl (1970) circa 1900 continuing through Heidegger (1982), G. Frege (1952), and L. Wittgenstein (1945) in the first half of the century, and it has attracted the attention of several contemporary thinkers too numerous to cite here, but good references are Smith and McIntyre (1982), and Dreyfus (1995). The reader will find, however, that the argument is reminiscent of Aristotle and Plato. The delineation of objects, the notion of predication, and the terms of intensional properties and intensions are reminiscent of Aristotelian substances, sorts, essences and accidents. 
Indeed, the reader will find much in phenomenology derived from Plato and Aristotle, which may tempt some to comment that we have not really come all that far since Plato. This might be a rather harsh judgment, nevertheless, it is fair to say that the foundation of modern analysis was formed by Greek philosophy culminating in Aristotelian logic. The modern contribution relevant here is the philosophy of mind and language accounting for accessibility of intellect to the actual world, and the formalisms of modern mathematics and logic. Details of the discussion on modeling object recognition given here may be found in Wilburn (1998). The notion of an artificial interpretative transform of a complex of objects is the transform of a complex of data objects to a complex of sentences. These sentences must be compatible with sentences expressing experience relevant to the actual world of the object in order that they can be subject to an abductive inference — a judgment — modeled by logical consistency. This notion is a simulation of object recognition by a natural observer. A natural observer performs an interpretive transform of an identical complex
of objects and experience into a complex of sentences and makes a judgment of the evidence to assert an object in the act of recognition. The ideational complexes are grounded on the actual world and, as such, are the experience of an observer expressible in terms of language. The assertion is the evidence of an observer's understanding of an object in terms of identity and application, and it is understandable within a language community. The ground for the transform of an ideational complex of an object, then, is the observer's experience of being in the world as an undetached part of it. His or her experience in the world is both a disclosure of it to the observer and the observer's discovery of it [Dreyfus (1995)]. The world is disclosed to the observer in ontological transcendence as a being-in-the-world, and he or she discovers the world in ontic transcendence, in coping with things in the world. The evidence of the observer's experience is reflected in his or her language as a mode of understanding what it means for an object to be what it is, and as a mode of coping with it. Finally, the observer's language is a representation of a grammatical logic structure derived from his or her ontological/ontic transcendence, and it necessarily entails a community. The employment of the artificial language L for automated image understanding would be a two-level simulation of the grammatical logic of a natural observer in the context of an actual world of an image. The first level is the simulation of the disclosure of the world modeled by the (L)
filter and represented by the nonclassical PCS modeled by L described here. The second level is the simulation of the discovery of the world modeled by an indexical structure incorporated into a quantificational language system (QCS) not described here. The artificial language is a system L(I), and it includes sentences that are valid wffs, for example, ¬(p ∧ ¬p), and wffs true in possible worlds, but it does not include any sentences describing objects that (L) cannot detect. Objects that (L) cannot detect are ignored. The structure of L(I) into subfilters, I′ ⊆ I, can be employed to delineate nouns as the transform of the e′. This approach to the assignment of nouns means that the system must be "taught" by the user of this artificial intelligence, and it represents the artificial experience of the artificial intelligent system. The nouns, then, constitute the vocabulary of the system. Sentences of the e′ that are not distinguishable by subfilters may be considered as molecular expressions and, as such, are not atomic sentences even though they may be in H of L(I). The approach to an interpretive transform explored here is an effort to address the classical "frame problem" of artificial intelligence. The "frame problem" is the representation of the primordial relation of an observer to an object in the context including both of them in the act of object recognition. The idea of an AI system based on L(I) is tenable because the {e′} of (L) are a finite set of intensional representations of features and
they satisfy a grammar permitting L(I). This means that the representation of the detected object by a machine is in a form compatible with a logical model of an observer recognizing an object to be some thing. Clearly, this kind of AI system would be a very restricted simulation; nevertheless, such a system could prove to be a very useful tool. As time goes by, other methods of feature extraction may be developed based on the MW filters discussed here, or on other filters altogether that satisfy the requirements of a language system, and they may be incorporated into an AI system.
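The fixed-point behavior on which this whole construction rests can be illustrated in a few lines (a toy of my own, using a three-sample window and an assumed hold-the-ends convention, not the (L) filter itself): a locally monotone signal is a root that the median filter leaves unchanged, while an isolated impulse is removed in one pass.

```python
def median3(signal):
    """One pass of a 1D rank-order filter at the median rank with a
    three-sample window; the two end samples are held fixed (one
    common convention for handling the boundary)."""
    out = list(signal)
    for i in range(1, len(signal) - 1):
        out[i] = sorted(signal[i - 1:i + 2])[1]
    return out

# A monotone signal is a fixed point ("root") of the filter...
root = [0, 0, 1, 2, 2, 3]
print(median3(root) == root)   # True

# ...while an isolated impulse is not: one pass removes it.
noisy = [0, 0, 5, 0, 0, 0]
print(median3(noisy))          # [0, 0, 0, 0, 0, 0]
```

Cataloging exactly which signals survive such a pass unchanged is what yields the finite alphabet of fixed-point roots that the language system is built on.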
V. Conclusion

This chapter presented a review of recent work investigating the structure and properties of 1D and 2D ranked-order filters, with particular attention given to the median rank, or median window, filter. We also presented some new developments in coded window, octagonal, and 3D filters, and we explored some applications. The approach that permitted the full understanding of structure and function enabling application was a mathematical logic investigation, in contrast to the customary statistical approach. The filter described as the (L) filter received the most attention because it is the most completely understood variant of 2D ranked-order filters, but we described the structure and root forms of the octagonal and 3D filters and alluded to the hexagonal filter. We also briefly discussed applications of these filters, with the 3D variant in particular showing promise of useful applications to interpreting hyperspectral imagery. The hexagonal and octagonal filters suggest a somewhat grander scheme than previously thought: a hierarchy of dialects differing in complexity, defined on degrees of monotonicity.

The application of the (L) filter to feature extraction was easily anticipated from the formalism, and it was shown to be remarkably effective for some kinds of imagery. It was that utility, and the discovery that the fixed-point roots of (L) satisfy a propositional language, that proved to be the most intriguing. The juxtaposition of utility for feature extraction and a syntactic structure of the elements of features suggested the possibility of an automated interpretive transform of features prerequisite to an AI system for image understanding. The key element of this idea is that (L) enables a machine to "read" an image in terms of a language compatible with a model for abductive inference of what an object is. It is the hope of this investigator that this work will continue with success and encourage the communities of ontology, artificial intelligence, and image processing to learn each other's languages so that they may work together in pursuit of developing visual artificial intelligence.
References

Bovik, A., Huang, T. S., and Munson, D. (1983). A generalization of median filtering using linear combinations of order statistics, IEEE Trans. Acoust. Speech and Signal Process., 31:1342—1350.
Dreyfus, H. (1995). Being-in-the-World: A Commentary on Heidegger's Being and Time, Division I, Cambridge, MA/London: The MIT Press.
Eberly, D., Longbotham, H. G., and Aragon, J. (1991). Complete classification of roots to 1-dimensional median and ranked-order filters, IEEE Trans. Acoust. Speech and Signal Process., 39:197—200.
Epstein, R. L. (1995). The Semantic Foundations of Logic, London: Oxford University Press, Chapter IV.H.
Frege, G. (1892). Über Sinn und Bedeutung, Zeitschrift für Philosophie und philosophische Kritik 100 (1892). (Trans., On Sense and Reference, in Frege, Philosophical Writings, P. Geach and M. Black, eds., London: Oxford-Blackwell, 1952.)
Frieden, B. R. (1998). Probability, Statistical Optics, and Data Testing, Berlin, Heidelberg, New York: Springer-Verlag, p. 257.
Heidegger, M. (1982). The Basic Problems of Phenomenology (Trans., Introduction and Lexicon, Albert Hofstadter, Indiana University Press). Published in German as Die Grundprobleme der Phänomenologie, Vittorio Klostermann, 1975.
Husserl, E. (1970). Logical Investigations, J. N. Findlay, translator, London: Routledge & Kegan Paul. Critical edition: Logische Untersuchungen. Bd. I, Elmar Holenstein, editor, Husserliana XVIII. The Hague: Nijhoff, 1975; Bd. II (in 2 parts), Ursula Panzer, ed., Husserliana XIX. The Hague: Nijhoff, 1984. First edition 1900 (Vol. I), 1901 (Vol. II).
Justusson, B. I. (1982). Median filtering: statistical properties, in Two-Dimensional Signal Processing II: Transforms and Median Filters, T. S. Huang, ed., Berlin: Springer-Verlag, pp. 161—196.
Longbotham, H. G. (1989). Theory of order statistic filters and their relationship to FIR filters, IEEE Trans. Acoust. Speech and Signal Process., 37:275—287.
Puetter, R. C. and Pina, R. K. (0000). Pixon-based image restoration, http://www.stsci.edu./stsci/meetings/irw/proceedings/puetter.dir/puetterr.html.
Shoenfield, J. R. (1967a). Mathematical Logic, Reading, MA: Addison-Wesley, Chapters 6 and 7.
Shoenfield, J. R. (1967b). Mathematical Logic, Reading, MA: Addison-Wesley, pp. 282—292.
Smith, D. W. and McIntyre, R. (1982). Husserl and Intentionality, Dordrecht/Boston: D. Reidel Publishing Company.
Tukey, J. W. (1970). Exploratory Data Analysis, Reading, MA: Addison-Wesley Publishing Co., Chapter 5, pp. 5-11—5-31.
Tyan, S. G. (1982). Median filtering: statistical properties, in Two-Dimensional Signal Processing II: Transforms and Median Filters, T. S. Huang, ed., Berlin: Springer-Verlag, pp. 197—218.
van Fraassen, B. C. (1971a). Formal Semantics and Logic, New York: Macmillan, pp. 31—34.
van Fraassen, B. C. (1971b). Formal Semantics and Logic, New York: Macmillan, p. 36.
van Fraassen, B. C. (1971c). Formal Semantics and Logic, New York: Macmillan, pp. 40—51.
Wilburn, J. B. (1998a). Developments in generalized ranked-order filters, JOSA A, 15:1084—1099.
Wilburn, J. B. (1998b). Exploring a language model of features in imagery, Proc. 4th Army Conf. Appl. Stat. 1998 and JETAI (submitted 10 Nov. 98).
Wilburn, J. B. (1998c). A possible worlds model of object recognition, Synthese, 116(3):403—438.
Wittgenstein, L. (1945). Philosophical Investigations, Trans. G. E. M. Anscombe, Basil Blackwell & Mott, Ltd., 1967.
Index
A Adjacency relations, 101—102 in 3D well-composed sets, 106—109 in 2D well-composed sets, 115 Aharonov-Bohm (AB) effect dynamics of, 69 electromagnetic momentum conservation, 69—70 interaction of passing classical electron with rotating quantum cylinder, 74—81 interior of solenoid, 86—90 objections to standard interpretation, 56—63 quantum and canonical transformation ambiguities and future work on, 90—93 scattering, 58—59 shielding and, 71—74 solution of closed system, 81—86 stability of, 70—71 vector potential, 63—66, 72—74 Aharonov-Casher effect, 79 Ajudgment, 329—330 Anisotropic diffusion filtering, 33—34 Artificial intelligence frame problem, interpretive transforms and automated object recognition defined, 307—308, 310—311 conclusions, 331 consistency, completeness, and compactness, 315—317 convergence, 317, 318—319 exclusion negation, 317 finitary entailment, 314—323 logical connectives, 323—325 ontological considerations, 325—331
propositional syntactic/language system (PCS), 312, 314—323 translation/transform of imagery, 308—314 ultrafilter, 318—319, 324
B Background, 102 Bayesian estimates, 309—310 Bessel functions, 60—61 Bilinear interpolation, 140—141 Binary digital image, 102, 161 making well-composed, 135—139 Biorthogonal wavelets, 23 Black point, 102 Blocking artifacts, 3, 15, 20 Block partitioning, 10, 12 Boundary faces, properties of, 111—112 Box sampling method, 234
C Canonical concepts of emission mechanism, 167—171 Canonical momentum, 73, 78 Canonical transformation ambiguities, 90—93 Centroid linkage algorithm, 40—41, 45 Chain coding, 27 Closed surface, simple, 105, 111—113 digital version, 113—114 Coded window filter, 285—287, 289—292 Coifman’s wavelets, 23 Coincidence condition, 249, 256 Common face, 107 Compatibility, fixed-point roots and computation of, 275—278
Compatibility (Cont.) conditions needed for, 281—282 criteria for, 273—275 example of calculation, 279—281 Cones, 5 Conjugate quadrature filter (CQF), 20 Connection number, 126 Connectivity paradoxes, 114—116 Continuous analog, of a digital set, 98 local properties of, in 3D well-composed sets, 103—106 Continuous representation of real objects, 142—146 Contour coding, 31, 39—40 Contour texture modeling, 40 Contrast resolution, 7 Corner adjacency, 106, 108 Corner points, 103—106 Coulomb gauge, 68—69, 76 Crossing number, 123
D Daubechies' wavelets, 23 Decomposition/transformation, 8—9 Diagonal adjacency, 106 Digital bordered manifolds, 99 Digital characterization, of 3D well-composed sets, 106—109 Digital image, 146 Digital n-arc, 114 Digital sets, 98 Digitization of well-composed images continuous representation of real objects, 142—146 defined, 146 outcome of, 149—153 segmentation and, 146—149 Directional filtering, 27—31 Discrete cosine transform (DCT), 2 coder, 10—15 shape-adaptive, 38—39 Discrete wavelet transform (DWT), 23 Dynamic coding, 4
E Edge adjacency, 106, 108 Edge detection, 26—27 Electromagnetic momentum conservation, 69—70 Electron-optical systems, applications for, 165—166 Electron quasi-lasers (EQLs), 228—231 Embedded zerotree wavelet (EZW), 24 Endpoints, 114, 118—121 Entropy, maximum, 310 EPIC, 24 Euler characteristic, 117 Exclusion negation, 317 Expanding binary images, 137—139 by bilinear interpolation, 140—141 Explosive breakdown, 172—176, 185
F Face adjacency, 106 Fast transform techniques, 13 Feature extraction, 264—268, 304—307 Feature-width morphological pyramid (FMP), 29, 30, 31 Fermi-Dirac distribution function, 168 Feynman propagator, 62 Filter banks, 20 Finitary entailment, 314—323 First-generation image coding, 2 limitations of, 3 Fixed-point roots. See Rank-order (RO) filters Forcing condition, 247—249 Foreground, 102 Fourier tansform, 28 Fovea, 5 Fowler-Nordheim straight line, 167 Fractal model, 37, 44—46 Freeman chain coding, 39—40
G Gaussian Markov random field (GMRF) model, 42, 46 Gaussian pyramid, 17
Gibbs phenomenon of linear filters, 20, 28 Göppert-Mayer transformation, 90—91 Grammar of fixed-point roots, 253—254, 268—282 Graph/tree theory, 37, 43—44 Gray-level images, 161 histogram of checkerboard patterns, 157—159 making well-composed, 140—141 thresholding, 154—157 Grid system, 160
H Heisenberg uncertainty principle, 212 Hexagonal roots, 304 Hilbert space of functions, 60, 61 Histogram of checkerboard patterns, 157—159 H.261 compression standard, 2 Human visual system (HVS), 3, 4—8
I Image features, 3 Intensional properties, 267, 310 International Telecommunications Union (ITU), 2 Inverse gradient filter, 33, 40 Irreducible well-composed sets, 118, 123—126 graph structure of, 126—127
L Landau gauge, 63—64, 65 Langmuir-McCown equation, 214—215 Laplacian pyramid, 17 Layered zero coefficient (LZC), 26 Lorentz force, 66—67, 70 Luminance, 5
M Mathematical morphology, 19, 40 M-band filters, 20 Mean square distortion (MSD), 10 Mean square error (MSE), 3, 10, 14 Medial axis transformation (MAT), 42 Median window filter, 233, 234 See also Rank-order (RO) filters Minimum spanning forest (MSF), 43 M-JPEG standard, 1—2 Modulation transfer function (MTF), 6, 33 Morphological directional coding, 29—30 MPEG-4 standard, 4, 10, 32, 47 MPEG standard, 2, 14 Muller’s projector, 211—213, 215, 220 Multiscale/pyramidal coding, 15—19
N
Jordan-Brouwer separation theorem, 100, 109—110 Jordan curve theorem, 96, 97, 113—114, 117 Jordan n-curve, 114 JPEG standard, 10, 12, 14
n-boundary point, 122 n-dimensional bordered manifold, 99 n-interior/kernel of x, 122 n-irreducible, 118, 123—126 graph structure of, 126—127 Non-stationary thermal field emission. See Thermal field emission, nonstationary Nottingham effect, 187—188 n-thinning, 118
K
O
Karhunen-Loève (KL) transform, 10 K-H transformation, 92 k-means clustering, 36, 42
Octagonal roots, 292—304 Ordering function, 257, 284 Order statistics, 235 Orthogonal wavelets, 23
J
Oscillating roots, 282—283 coded window filter, 285—287, 289— 292 solution of, 284—285 two-dimensional representation, 287— 288 Output selection function, 246
P Parallel thinning, 127—135 Perfect points, 129 Pixels, 96, 126, 160 Pixon approach, 310 Polygonal approximation, 40 Polyhedral surface, 110 without boundary, 110 Polynomial approximation, 38 Potential momentum, 64 Preprocessing techniques, 32—34 Propositional syntactic/language system (PCS), 312, 314—323 Pyramidal coding, 15—19 Pyramidal linking, 36—37
Q Quadrature mirror filter (QMF), 20 Quantificational syntactic/language system (QCS), 314, 330 Quantization/ordering of transform coefficients, 8, 9, 13—14, 17, 24 Quantum theory ambiguities, 90—93
R

Rank-order (RO) filters
  coded window filter, 285–287, 289–292
  commonality of, 234
  median window filter, 233, 234
  statistical analysis of, 235–241
Rank-order (RO) filters, language model based on
  automated object recognition defined, 307–308, 310–311
  conclusions, 331
  consistency, completeness, and compactness, 315–317
  convergence, 317, 318–319
  exclusion negation, 317
  finitary entailment, 314–323
  logical connectives, 323–325
  ontological considerations, 325–331
  propositional syntactic/language system (PCS), 312, 314–323
  translation/transform of imagery, 308–314
  ultrafilter, 318–319, 324
Rank-order (RO) filters, mathematical logic approach to, 241–243
  compatibility, 273–282
  construction, 243–247
  feature extraction, 264–268, 304–307
  fixed-point roots, catalog of, 282
  fixed-point roots, detection of 1D, 249–251
  fixed-point roots, grammar of, 253–254, 268–282
  fixed-point roots, structure of 1D, 251–254
  fixed-point surfaces, characterization of, 271–273
  hexagonal roots, 304
  octagonal roots, 292–304
  oscillating roots, 282–292
  solutions, 247–254
  three dimensional roots, 304–307
  two-dimensional fixed points, 255–268
Recursive shortest spanning tree (RSST) algorithm, 43
Region adjacency graph (RAG), 44
Region growing, 34, 40–41
Regions, 96
Regular partition, 160
Repairing algorithm, 136–137
Repeated median filter (RMF), 234
Retina, 5
Ring effect, 172, 173, 210
Ringing, 20
Rods, 5
S

Sampling function, 244–247, 256–257
Scalability, 4
Scalar quantization (SQ), 24
Schrödinger equation, 74, 86–90, 91
Second-generation image coding
  conclusions, 46–49
  development of, 3–4
Second-generation image coding, segmentation-based
  contour coding, 31, 39–40
  fractal dimension, 37, 44–46
  graph/tree theory, 37, 43–44
  k-means clustering, 36
  overview of, 31–32
  preprocessing, 32–34
  pyramidal linking, 36–37
  region growing, 34, 40–41
  split and merge, 35, 41–43
  texture coding, 31, 38–39
Second-generation image coding, transform-based
  directional filtering, 27–31
  discrete cosine transform (DCT) coder, 10–15
  edge detection, 26–27
  multiscale/pyramidal, 15–19
  optimum coder, 9–10
  overview of, 8–9
  wavelet, 19–26
Segmentation, digitization and, 146–149
Segmentation-based coding
  contour coding, 31, 39–40
  fractal dimension, 37, 44–46
  graph/tree theory, 37, 43–44
  k-means clustering, 36
  overview of, 31–32
  preprocessing, 32–34
  pyramidal linking, 36–37
  region growing, 34, 40–41
  split and merge, 35, 41–43
  texture coding, 31, 38–39
Segmented digital image, 103
Sequential thinning, 117–123
Set partitioning in hierarchical trees (SPIHT), 26
Shape-adaptive DCT (SADCT), 38–39
Shape analysis, 96
Signal-to-noise ratio (SNR), 234, 235–241
Skeleton, 118
Split and merge, 35, 41–43
Spontaneous current growth, 172, 173
Square grid, 146
Square subset digitization, 148
Stevens' Law, 33
Stokes' theorem, 72
Subband coding (SBC), 19–21, 24
Subset/element digitization, 99
Surface elements, 107
Synthetic highs system, 29
T

Texture coding, 31, 38–39
Thermal field emission (TFE)
  broadening of emission-electron energy spectra in high electric fields, 184–185
  current-voltage characteristics for, 169–170
  electron quasi-lasers (EQLs), 228–231
  of electrons from metal surfaces, 167–171
  emission instability, 185
  explosive breakdown, 172–176, 185
  image technique, 178–182
  inadequacy of concepts regarding high electric fields, 171–190
  Nottingham effect, 187–188
Thermal field emission model, nonstationary
  calculating surface concentration of emitter atoms, 198–202
  conclusions, 221–225
  current kinetics, 217–220
  emitter surface-microplasma layer, processes at, 209–217
  heating of emitter tip by emission current flow, 191–195
  instability of emitter tip surface during, 195–198
  ionization probability of emitter atoms after evaporation, 207–209
  motion of emitter atoms after evaporation, 202–207
Thinning algorithms, 97
  parallel, 127–135
  sequential, 117–123
Three dimensional roots, 304–307
3D well-composed sets. See Well-composed sets, 3D
Thresholding, 154–157
  histogram of checkerboard patterns, 157–159
Topology-preserving threshold, 158
Transform-based coding
  directional filtering, 27–31
  discrete cosine transform (DCT) coder, 10–15
  edge detection, 26–27
  multiscale/pyramidal, 15–19
  optimum coder, 9–10
  overview of, 8–9
  wavelet, 19–26
Two-dimensional fixed points, 255–268
2D well-composed sets. See Well-composed sets, 2D
V

Vector potential, 63–66, 72–74
Vector quantization (VQ), 9, 24
W

Wavelet transform-based coding, 19–26
Weber ratio, 7
Weber's Law, 33
Well-composed sets
  adjacency relations, 101–102
  applications, 95–96
  definitions and basic properties, 98–103
  development of, 96–97
  digitization of, 142–153
  generalizations, 159–161
  histogram of checkerboard patterns, 157–159
  thinning algorithms, 97
  thresholding, 154–157
Well-composed sets, 3D
  adjacency relations, 106–109
  boundary faces, properties of, 111–112
  connected components, 112–113
  digital characterization, 106–109
  Jordan-Brouwer separation theorem, 100, 109–110
  local properties of continuous analog, 103–106
Well-composed sets, 2D
  adjacency relations, 115
  definitions and properties, 113–116
  Euler characteristic, 117
  irreducible, 118, 123–126
  irreducible, graph structure of, 126–127
  Jordan curve theorem, 96, 97, 113–114, 117
  making a binary image well-composed, 135–139
  making a gray-level image well-composed, 140–141
  thinning, parallel, 127–135
  thinning, sequential, 117–123
White point, 102
Wiedemann-Franz law, 194
Window function, 245
Writing function, 246