ADVANCES IN IMAGING AND ELECTRON PHYSICS VOLUME 148
EDITOR-IN-CHIEF
PETER W. HAWKES CEMES-CNRS Toulouse, France
HONORARY ASSOCIATE EDITORS
TOM MULVEY BENJAMIN KAZAN
Advances in
Imaging and Electron Physics
EDITED BY
PETER W. HAWKES CEMES-CNRS Toulouse, France
VOLUME 148
AMSTERDAM • BOSTON • HEIDELBERG • LONDON NEW YORK • OXFORD • PARIS • SAN DIEGO SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO Academic Press is an imprint of Elsevier
Academic Press is an imprint of Elsevier 525 B Street, Suite 1900, San Diego, California 92101-4495, USA 84 Theobald’s Road, London WC1X 8RR, UK ∞ This book is printed on acid-free paper.
Copyright © 2007, Elsevier Inc. All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the Publisher. The appearance of the code at the bottom of the first page of a chapter in this book indicates the Publisher’s consent that copies of the chapter may be made for personal or internal use of specific clients. This consent is given on the condition, however, that the copier pay the stated per copy fee through the Copyright Clearance Center, Inc. (www.copyright.com), for copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Law. This consent does not extend to other kinds of copying, such as copying for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. Copy fees for pre-2007 chapters are as shown on the title pages. If no fee code appears on the title page, the copy fee is the same as for current chapters. 1076-5670/2007 $35.00 Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, E-mail:
[email protected]. You may also complete your request on-line via the Elsevier homepage (http://elsevier.com), by selecting “Support & Contact” then “Copyright and Permission” and then “Obtaining Permissions.” For information on all Elsevier Academic Press publications visit our Web site at www.books.elsevier.com ISBN-13: 978-0-12-373910-0 ISBN-10: 0-12-373908-X PRINTED IN THE UNITED STATES OF AMERICA 07 08 09 10 9 8 7 6 5 4 3 2 1
CONTENTS
CONTRIBUTORS  vii
PREFACE  ix
FUTURE CONTRIBUTIONS  xi

Planar Cold Cathodes
VU THIEN BINH AND VINCENT SEMET
I. Introduction  1
II. Electron Emission from Solids: Basic Results  4
III. Field Emission Analyses of Planar Cathodes by Scanning Anode Field Emission Microscopy  22
IV. Planar Cathodes: Theoretical Approaches and Experimental Results  33
V. Conclusions  67
References  68
Interval and Fuzzy Analysis: A Unified Approach
WELDON A. LODWICK
I. Introduction  76
II. Interval Analysis  85
III. Fuzzy Set Theory  119
IV. Analysis with Distributions  129
References  184
On the Regularization of the Watershed Transform
FERNAND MEYER AND CORINNE VACHIER
I. Introduction: History of the Watershed Transform  194
II. Key Contributions  202
III. The Contours Regularization  216
IV. The Viscous Watershed Line  224
V. Experiments  237
VI. Summary  243
References  245

INDEX  251
CONTRIBUTORS
Numbers in parentheses indicate the pages on which the authors’ contributions begin.
VU THIEN BINH (1), Equipe Emission Electronique, LPMCN-CNRS, University Lyon 1, 69622 Villeurbanne, France

WELDON A. LODWICK (75), Department of Mathematical Sciences, University of Colorado at Denver and Health Sciences Center, Denver, Colorado 80217, USA

FERNAND MEYER (195), Centre de Morphologie Mathématique, ENSMP, 35, rue Saint Honoré, F-77300 Fontainebleau, France

VINCENT SEMET (1), Equipe Emission Electronique, LPMCN-CNRS, University Lyon 1, 69622 Villeurbanne, France

CORINNE VACHIER (195), CMLA, ENS Cachan, CNRS, PRES UniverSud, 61, avenue du Président Wilson, F-94235 Cachan, France
PREFACE
The three contributions that make up this volume come from the worlds of electron emission, fuzzy set theory, and mathematical morphology. The first, by V.T. Binh and V. Semet, describes the present state of studies on planar cold cathodes. The emission mechanism here is very different from that of pointed cathodes. The authors present, in detail, both the theory and experimental work on these cathodes, which have many attractive features, and numerous applications. This clear and modern account should be found very helpful. This is followed by a unified approach to interval and fuzzy analysis by W.A. Lodwick. This authoritative account of a relatively new and very exciting topic forms a short monograph on the subject and can be used in that spirit. The author leads us through the essentials of interval analysis and of fuzzy set theory, after which a long section is devoted to distribution arithmetic and the general theory of uncertainty. I am sure that this careful and methodical study will be widely used. Finally, F. Meyer and C. Vachier explain how to regularize the watershed transform. After a most informative account of the history of the transform and its role in image segmentation, the authors survey the various approaches to the transform and explain why regularization is needed. They introduce the idea of contour regularization and the notion of viscous transforms. They conclude with examples illustrating the procedure. Once again, this clear and modern account of an important topic in image processing should be much consulted. As always, I thank all the authors for contributing to these Advances and for the trouble they have taken to make their material accessible to a wide readership. The introductory sections that set the stage make very good reading and explain to the newcomer the excitement of the various subjects. Forthcoming contributions are listed in the following pages. Peter W. Hawkes
FUTURE CONTRIBUTIONS
S. Ando: Gradient operators and edge and corner detection
P. Batson (special volume on aberration-corrected electron microscopy): Some applications of aberration-corrected electron microscopy
C. Beeli: Structure and microscopy of quasicrystals
A.B. Bleloch (special volume on aberration-corrected electron microscopy): Aberration correction and the SuperSTEM project
C. Bontus and T. Köhler: Helical cone-beam tomography
G. Borgefors: Distance transforms
Z. Bouchal: Non-diffracting optical beams
A. Buchau: Boundary element or integral equation methods for static and time-dependent problems
B. Buchberger: Gröbner bases
F. Colonna and G. Easley: The generalized discrete Radon transforms and their use in the ridgelet transform
T. Cremer: Neutron microscopy
A.X. Falcão: The image foresting transform
R.G. Forbes: Liquid metal ion sources
C. Fredembach: Eigenregions for image classification
A. Gölzhäuser: Recent advances in electron holography with point sources
D. Greenfield and M. Monastyrskii: Selected problems of computational charged particle optics
M. Haider (special volume on aberration-corrected electron microscopy): Aberration correction in electron microscopy
M.I. Herrera: The development of electron microscopy in Spain
N.S.T. Hirata: Stack filter design
M. Hÿtch, E. Snoeck and F. Houdellier (special volume on aberration-corrected electron microscopy): Aberration correction in practice
K. Ishizuka: Contrast transfer and crystal images
J. Isenberg: Imaging IR-techniques for the characterization of solar cells
A. Jacobo: Intracavity type II second-harmonic generation for image processing
K. Jensen (vol. 149): Field-emission source mechanisms
B. Kabius (special volume on aberration-corrected electron microscopy): Aberration-corrected electron microscopes and the TEAM project
L. Kipp: Photon sieves
A. Kirkland and P.D. Nellist (special volume on aberration-corrected electron microscopy): Aberration-corrected electron microscopy
G. Kögel: Positron microscopy
T. Kohashi: Spin-polarized scanning electron microscopy
O.L. Krivanek (special volume on aberration-corrected electron microscopy): Aberration correction and STEM
R. Leitgeb: Fourier domain and time domain optical coherence tomography
B. Lencová: Modern developments in electron optical calculations
H. Lichte: New developments in electron holography
L. Macaire, N. Vandenbroucke and J.-G. Postaire: Color spaces and segmentation
M. Matsuya: Calculation of aberration coefficients using Lie algebra
S. McVitie: Microscopy of magnetic specimens
S. Morfu and P. Marquié: Nonlinear systems for image processing
T. Nitta: Back-propagation and complex-valued neurons
M.A. O'Keefe: Electron image simulation
D. Oulton and H. Owens: Colorimetric imaging
N. Papamarkos and A. Kesidis: The inverse Hough transform
R.F.W. Pease (vol. 150): Miniaturization
K.S. Pedersen, A. Lee and M. Nielsen: The scale-space properties of natural images
S.J. Pennycook (special volume on aberration-corrected electron microscopy): Some applications of aberration-corrected electron microscopy
E. Plies (special volume on aberration-corrected electron microscopy): Electron monochromators
V. Randle: Electron back-scatter diffraction
E. Rau: Energy analysers for electron microscopes
E. Recami: Superluminal solutions to wave equations
J. Rodenburg (vol. 150): Ptychography and related diffractive imaging methods
H. Rose (special volume on aberration-corrected electron microscopy): The history of aberration correction in electron microscopy
G. Schmahl: X-ray microscopy
J. Serra (vol. 150): New aspects of mathematical morphology
R. Shimizu, T. Ikuta and Y. Takai: Defocus image modulation processing in real time
S. Shirai: CRT gun design methods
T. Soma: Focus-deflection systems and their applications
J.-L. Starck: Independent component analysis: the sparsity revolution
I. Talmon: Study of complex fluids by transmission electron microscopy
N. Tanaka (special volume on aberration-corrected electron microscopy): Aberration-corrected microscopy in Japan
G. Teschke and I. Daubechies: Image restoration and wavelets
M.E. Testorf and M. Fiddy: Imaging from scattered electromagnetic fields, investigations into an unsolved problem
N.M. Towghi: Ip norm optimal filters
E. Twerdowski: Defocused acoustic transmission microscopy
Y. Uchikawa: Electron gun optics
K. Urban and J. Mayer (special volume on aberration-corrected electron microscopy): Aberration correction in practice
K. Vaeth and G. Rajeswaran: Organic light-emitting arrays
M. van Droogenbroeck and M. Buckley: Anchors in mathematical morphology
R. Withers: Disorder, structured diffuse scattering and local crystal chemistry
Y. Zhu (special volume on aberration-corrected electron microscopy): Some applications of aberration-corrected electron microscopy
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 148
Planar Cold Cathodes

VU THIEN BINH AND VINCENT SEMET
Equipe Emission Electronique, LPMCN-CNRS, University Lyon 1, 69622 Villeurbanne, France
I. Introduction  1
II. Electron Emission from Solids: Basic Results  4
   A. Electron Emission Currents  6
   B. Electron Emission Characteristics for Cathodes with a Work Function of 5 eV  12
      1. Field Emission at Room Temperature  12
      2. Field Emission at High Temperatures (1500 K and 2500 K)  15
   C. Emission Characteristics for Schottky Cathodes (Φ = 2.8 eV; F < 0.3 V/Å)  15
   D. Emission Characteristics for Cathodes with Work Function < 2 eV  16
III. Field Emission Analyses of Planar Cathodes by Scanning Anode Field Emission Microscopy  22
   A. Measurement Procedure  25
   B. Calculation Procedure for Converting a Set of I–V into J–F  26
      1. Field Distribution Calculation  26
      2. Determination of the Absolute Distance d Between the Probe-Ball and the Cathode  29
      3. Iterative Calculation to Convert I–V into J–F  30
   C. Results and Discussion  32
IV. Planar Cathodes: Theoretical Approaches and Experimental Results  33
   A. Ultrathin Dielectric Layer Planar Cathodes  35
      1. Theoretical Approach for the Basic SSE Structure  35
      2. Theoretical Approach for the Composite SSE Structure  43
   B. Experimental Results for SSE Cathodes: TiO2-SSE Cathodes  47
      1. Deposition of TiO2 Ultrathin Layer on Platinum Substrate  47
      2. SAFEM Field Emission Measurements  50
   C. Experimental Results for SSE Cathodes: Composite-Layer Nanostructured SSE  58
   D. Intrinsic Low Work Function Material Lanthanum Sulfide  61
      1. LaS Thin-Film Planar Cathode Fabrication  62
      2. FE Behavior for Total Currents up to a Threshold Value of a Few Microamperes  62
      3. Burnout Behavior  62
      4. Patchwork Field Emission Model  65
V. Conclusions  67
References  68
I. INTRODUCTION

A planar cathode is a thin-film emitter deposited on a conducting surface. It emits electrons when an electric field is applied by an anode separated from the film front surface by a vacuum gap. Planar cathodes give broad-area electron emission at values of local fields in the range of 100 V/µm;
these are much smaller in values than the fields of the order of 5000 V/µm at which field emission occurs for metals. Field emission from planar cathodes occurs because they have an effective low surface barrier less than 1 eV, resulting from nanostructured layered formation beneath the surface. There are several possible structures for planar cathodes. In addition to carbonbased (diamond) films (Robertson, 1986; Wang et al., 1991; Geis et al., 1991; Xu et al., 1993), reported field-emitting planar cathodes included multiple layers with graded electron affinity (Shaw et al., 1996), piezoelectric surface layer of indium gallium nitride/gallium nitride (InGaN/GaN) (Underwood et al., 1998), recently manufactured composite ultrathin dielectric layers with thicknesses of order 2–5 nm (Binh and Adessi, 2000; Semet et al., 2004), and nanocrystalline films of LaS (Semet et al., 2006). Electron emission was discovered about 200 years ago with thermionic emission, when it was discovered that the air around very hot solids conducts electricity, a phenomenon studied thenceforth with carbon heated in vacuum at high temperature by Thompson in 1897 (Thompson, 1897), and ascribed to the presence of electrons near the surface. The understanding of thermionic emission gradually increased with the development of vacuum technology (Dushman, 1923). Today, thermionic emission is used in various technologies that involve the use of an electron beam, including cathode-ray tubes, electron microscopes, electron-beam lithography, and e-beam evaporators. Thermionic electron guns are key components in microwave tubes such as klystrons, magnetrons, and others. Thermionic emission devices can also be found in more common household items, such as television (Morton, 1946) and computer monitors, which use cathode-ray tubes. Thermionic converters, which produce electrical power from heat generated by radioisotopes, have been extensively studied and developed for use as a source of energy for deep space probes. They have the advantage of being compact, lightweight, and reliable because they contain no moving parts. Thermoelectric refrigeration is a solidstate active cooling method that requires no moving parts or fluid. Replacing it by a thermionic emission across a vacuum gap (Tsu and Greene, 1999; Korotkov and Likharev, 1999) overcomes two major problems encountered by the thermoelectric cooling: lattice thermal conduction and Joule heating. However, to obtain a sufficient amount of current, at room temperature, a cathode work function Φ of less than 1 eV is necessary. For example, a cooling power density on the order of 1000 W/cm2 is expected for Φ = 1.0 eV at T = 300 K. In parallel to the thermionic “hot emission,” a “cold emission” of electrons from the surface of a solid can be obtained by a strong electric field and is called field emission (Guth and Mullin, 1942; Kleint, 1993, 2004). This phenomenon was first reported by Wood in 1897 (Wood, 1897) and occurs at fields in the order of 5000 V/µm for surfaces with work function in
the range of 4–5 eV (Millikan and Eyring, 1926). Such large fields are extremely difficult to realize on flat surfaces but can be generated by the field-enhancing properties of tiplike structures. The fundamental aspects were studied and analyzed in detail with the introduction by Müller of field emission microscopy (Müller, 1938) and field ion microscopy (Müller, 1956) and its use by a large community of scientists sharing their knowledge within the International Field Emission Society.1 However, the first use of field emission for instrumentation was limited to applications in quest of reduced energy distribution of emitted electrons; the illustrative example is the highresolution electron microscope using a field emission gun. It was not until the late 1980s that Spindt-type integrated cathodes (Spindt, 1968) were applied to flat-panel field emission displays (FEDs) (Ghis et al., 1991). This large consumer application then became the driving force for the use of field emission for the development of industrial products and instrumentations in areas where hot cathodes with working temperature higher than 1000 K are unsuitable. Spindt field emission arrays (FEAs), which are the heart of FEDs, are basically microfabricated Mo tips in gated configuration that use the local field enhancement at the apex of each microtip in front of a gate located at a microscale distance to lower the threshold voltage (100 V), thus enabling field emission. The microtip arrays subsequently were developed for largearea addressable electron emitters. This event introduced the notion of fieldemitter arrays and planar cathodes; that is, the field electron sources were no longer limited to the micron-size apex of a single tip but could be a large area surface in the range of square centimeters. However, the microfabrication of Spindt FEAs requires relatively expensive and complicated processes, and the field emission stability of metallic tips in poor vacuum condition is still uncontrollable. In seeking appropriate material and technique as an alternative to Spindt microtips, a covalent surface that is chemically inert is favored in order to have electron emission features and robustness much superior to metal tips. Two main orientations in this new area of research have caught the attention of many investigators: (1) the use of carbon nanotubes (Gulyaev et al., 1995) and (2) the thin-film cathodes with low or negative electron affinity (Geisl et al., 1998). Attention has been drawn to carbon nanotubes because of the success in deterministic growing carbon nanotubes in array form and the capability of aligning them vertical to the substrate surface (Teo et al., 2003). In association with the chemical inertness of the graphene surface, the high aspect ratio of the carbon nanotubes made them privileged potential candidates as an 1 The Society runs the International Field Emission Symposium, an annual meeting that has been in existence since 1952.
alternative to metal tips for uniform emission and high toughness FEAs (Semet et al., 2002). The other trend covers thin-film planar cathodes with a view to develop low work function materials (i.e., <1 eV), a specific attribute that is either intrinsic to the material or externally controlled. Planar cathodes are made preferentially from covalent materials. They present strong bonding of the surface atoms—with surface properties that are less sensitive to adsorption and to surface diffusion. This last particular quality, among others, minimizes the thermal runaway prejudicial to cathode stability at high temperature. Finally, to reduce the fabrication costs, planar cathode fabrication procedures are primarily based on thin-film deposition or selfassembled growth for obtaining large emission areas. Results concerning diamond and related carbon-based film cathodes have been reviewed previously (see Xu and Huq, 2005; Forbes, 2001), they are not, however, the subjects of this chapter. This chapter is restricted to thin film and ultra-thin film planar cathodes with effective low work function. It emphasizes basic principles underlying the electron emission to avoid misinterpretation of experimental results and is divided into three main sections. Section II addresses the two basic mechanisms for extracting electrons from solids in order to determine the precise limits of the present theoretical approaches— both analytical (Murphy and Good, 1956) and by numerical simulations (Semet et al., 2007a)—with special attention to low work function cathodes. The conventional analyses developed for field emission microscopy using a single-tip cathode cannot be directly used for field emission analyses of the planar cathodes by a local probe technique as the scanning anode field emission microscopy (SAFEM) (Semet et al., 2005). Such a local probe technique is described in Section III, which highlights the specific features of the SAFEM analysis. Experimental results with flat cathodes are presented in Section IV with their field emission characteristic analyzed by SAFEM. They concern dielectric ultrathin layer cathodes (Binh and Adessi, 2000; Semet et al., 2004), as well as intrinsic low work function material cathodes such as LaS (Semet et al., 2006).
II. ELECTRON EMISSION FROM SOLIDS: BASIC RESULTS

This section reviews the two basic mechanisms to extract electrons from a solid surface (van der Ziel, 1975). The first mechanism is thermionic emission (TE), in which the energy of the emitted electrons is higher than the top of the surface barrier (TB); the second is field emission (FE), in which the energy of the electrons is lower than TB, with emission occurring by electron tunneling through this barrier (Figure 1). Thermionic emission is also called hot emission because it needs a supply of energy to raise the electrons from the Fermi level to TB; this supply is currently represented by kT (i.e., heating the cathode to a temperature T). In FE the electron energy is conserved during its tunneling through the surface barrier deformed by an applied field, and detectable currents are emitted when the barrier width is in the range of 1 nm or less; since no supply of energy to the electrons is needed, it is called cold emission.

FIGURE 1. Schematic representation of the potential energy for an electron in the vicinity of a metal surface with an applied field F. Taking the metal Fermi level as the reference level, the potential energy of an electron is given by V(z) = Φ − eFz − e²/16πε₀z (ε₀ is the vacuum electric constant, and the third term is the classical image-potential contribution to the potential barrier). The decrease in the effective surface barrier V_Schottky due to the image effect is ∼3.8 F^(1/2) (V_Schottky in electron volts and F in volts per angstrom).

Within a metal, an electron current density j_m of roughly 10^12 A/cm² impinges on the inner surface.² Only a small fraction of this current escapes from the metal, either by jumping over the surface barrier or tunneling through it when the width of this surface barrier is less than about 1 nm (Figure 1). The main difference between these two electron emissions is in their brightness (Silverman, 1994). The brightness B is experimentally defined as the current density J (current per unit area normal to the beam) emitted into a solid angle Ω. Thus, the number of electrons Δn received in a time interval Δt within a solid angle ΔΩ through a detecting surface ΔA can be written as follows:

Δn = (B/e) ΔA Δt ΔΩ.    (1)

² The value of j_m = e n₀ v_m, where e is the electron charge, n₀ is the electron density in the solid (10^22–10^23 cm⁻³), and v_m is the electron velocity at the Fermi level (∼10^8 cm/s).
By introducing a beam degeneracy factor δ as the mean number of particles per cell of phase space, Δn = δ × (number of occupied cells), that is:

Δn = δ × (2 Δp_x Δp_y Δp_z Δx Δy Δz)/h³ = δ × 2(p² Δp ΔΩ)(v Δt ΔA)/h³.    (2)

The experimental brightness B can therefore be related to a maximum brightness B_max by the expression

B = δ B_max,    (3)

with

B_max = 4meE ΔE/h³,    (4)

where E and ΔE are, respectively, the energy and the energy dispersion of the electrons, and h is Planck's constant. As an example, for conventional cathodes with work function in the range of 4–5 eV, TE has δ between 10^−12 and 10^−10, whereas for FE the values of δ are within the range of 10^−6–10^−3. This means from the experimental point of view that current densities from hot cathodes have an upper limit in the order of 100 A/cm², while cold cathodes can deliver currents up to 10^5 to 10^9 A/cm².
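To make the orders of magnitude behind Eqs. (3) and (4) concrete, the short Python sketch below evaluates B_max from Eq. (4) and the experimental brightness B = δB_max for representative degeneracy values. The beam energy, energy spread, and δ values are illustrative assumptions, not data from the chapter.

```python
# Illustrative evaluation of Eqs. (3) and (4): B_max = 4*m*e*E*dE/h^3 and B = delta*B_max.
# The beam energy, energy spread, and degeneracy values below are assumptions for illustration.

M_E = 9.109e-31   # electron mass (kg)
Q_E = 1.602e-19   # elementary charge (C)
H = 6.626e-34     # Planck constant (J s)

def b_max(energy_ev, spread_ev):
    """Maximum brightness of Eq. (4), in A m^-2 sr^-1, for E and dE given in eV."""
    return 4.0 * M_E * Q_E * (energy_ev * Q_E) * (spread_ev * Q_E) / H**3

if __name__ == "__main__":
    bm = b_max(energy_ev=10e3, spread_ev=0.3)   # assumed 10 keV beam with 0.3 eV spread
    for label, delta in [("thermionic (TE)", 1e-11), ("field emission (FE)", 1e-4)]:
        print(f"{label}: B = {delta * bm:.2e} A m^-2 sr^-1 (delta = {delta:g})")
```

The two δ values correspond to the middle of the TE and FE ranges quoted in the text, which is where the several-orders-of-magnitude difference in usable current density comes from.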
A. Electron Emission Currents

The calculations were based on the established model of the Fermi–Dirac distribution for the free electrons in the metal and the classical image force barrier at the surface (i.e., the Schottky effect) (Schottky, 1914). The total current density extracted from a cathode was obtained by integrating, over all accessible energies, the product of the charge of an electron e, the number of electrons per second per unit area incident on the barrier, and the penetration probability D(F, W). The complete expression for the current density was then stated in Hartree units³:

J(F, T, ζ) = (kT/2π²) ∫_CB^∞ D(F, W) ln[1 + exp(−(W − ζ)/kT)] dW,    (5)

where k is the Boltzmann constant, T is the temperature, W is the part of the energy for the motion normal to the surface (the energy is measured from zero for a free electron outside the metal, i.e., the vacuum level, so the work function Φ = −ζ), and F is the externally applied field.

³ For the most basic equation, we keep the presentation of Murphy and Good in Hartree units, which is easier to read. The relationship to conventional units is presented in Murphy and Good (1956).

The form of the integrand suggested different approximate evaluation techniques to resolve Eq. (5) analytically. The theoretical treatment of TE leads to the Richardson–Schottky emission relation, expressed by Eq. (6) below (Herring and Nichols, 1949), and the theoretical treatment of FE to the Fowler–Nordheim (F-N) equation, Eq. (7) (Fowler and Nordheim, 1928). The entire emission phenomenon can be analyzed from a global point of view, as done by Murphy and Good, by extension of the Richardson–Schottky and Fowler–Nordheim formulas, allowing calculations for concomitant thermionic and field emission (T-F), resulting in Eq. (8).

For the Richardson–Schottky emission, the electrons over the Fermi level are extracted when their energy is higher than TB, the top of the surface barrier deformed, lowered, and rounded by the applied field F (Figure 1), and the current density J_Schottky is given by:

J_Schottky = [(kT)²/2π²] [πd/sin(πd)] exp[−(Φ − F^(1/2))/kT],    (6)

with

d = F^(3/4)/(πkT).

For FE, the electrons tunnel easily through the field-deformed surface barrier when its thickness is in the range of 1 nm or less; the FE current density J_FE is given by:

J_FE = [F²/(16π²Φ t²(y))] [πckT/sin(πckT)] exp[−4√2 Φ^(3/2) v(y)/3F],    (7)
where c = c(F, Φ, t(y)) is an approximate evaluation function obtained by an expansion of some terms of D(F, W) about the Fermi energy, the variable y = F^(1/2)/Φ, and v(y) and t(y) are the Nordheim elliptic functions. The contribution of temperature is expressed by the term ckT, and for small temperatures πckT/sin(πckT) can be replaced by 1. Therefore, Eq. (7) becomes the conventional FE F-N formula given for 0 K. When both thermionic and field emissions are concomitant, a regime that is called T-F emission or emission in the intermediate region, the emission current density J_TF is:

J_TF = (F/2π) [kT t(y)/2π]^(1/2) exp[−Φ/kT + F²Θ/24(kT)³],    (8)
with

Θ = [3/t²(y)] [2v(y) − 3/t(y)].    (9)
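As an illustration of how Eqs. (6) and (7) are used in practice, the sketch below evaluates the Richardson–Schottky and Fowler–Nordheim current densities in conventional units (Φ in eV, F in V/m, J in A/m²) rather than the Hartree units used above. The Nordheim functions are replaced by the simple approximations of Forbes (2006), and the temperature-correction factors πd/sin(πd) and πckT/sin(πckT) are omitted, so this is only a rough sketch of the two limiting regimes, not the authors' numerical procedure; the work function and field values in the demonstration are assumptions.

```python
import math

K_B = 8.617e-5    # Boltzmann constant (eV/K)
A_R = 1.202e6     # Richardson constant (A m^-2 K^-2)
A_FN = 1.541e-6   # first Fowler-Nordheim constant (A eV V^-2)
B_FN = 6.831e9    # second Fowler-Nordheim constant (eV^-3/2 V m^-1)
C_S = 3.795e-5    # Schottky lowering coefficient (eV per (V/m)^(1/2))

def nordheim_v(y):
    """Simple approximation to the Nordheim function v(y) (Forbes, 2006), with f = y^2."""
    f = y * y
    return 1.0 - f + (f / 6.0) * math.log(f)

def nordheim_t(y):
    """Companion approximation to t(y), consistent with nordheim_v."""
    f = y * y
    return 1.0 + f / 9.0 - (f / 18.0) * math.log(f)

def j_schottky(phi_ev, field, temp_k):
    """Richardson-Schottky current density (A/m^2), Eq. (6) without the pi*d/sin(pi*d) factor."""
    lowering = C_S * math.sqrt(field)
    return A_R * temp_k**2 * math.exp(-(phi_ev - lowering) / (K_B * temp_k))

def j_fowler_nordheim(phi_ev, field):
    """Fowler-Nordheim current density (A/m^2), Eq. (7) at 0 K; field in V/m."""
    y = C_S * math.sqrt(field) / phi_ev          # Nordheim parameter y = F^(1/2)/Phi (reduced)
    v, t = nordheim_v(y), nordheim_t(y)
    return (A_FN / (phi_ev * t * t)) * field**2 * math.exp(-B_FN * phi_ev**1.5 * v / field)

if __name__ == "__main__":
    phi = 4.5                          # assumed work function (eV), typical of a metal
    for f_va in (0.05, 0.3, 0.5):      # applied local field in V/Angstrom
        f = f_va * 1e10                # convert to V/m
        print(f"F = {f_va:.2f} V/A:  J_Schottky(1800 K) = {j_schottky(phi, f, 1800):.2e} A/m^2,"
              f"  J_FN(0 K) = {j_fowler_nordheim(phi, f):.2e} A/m^2")
```

The output reproduces the qualitative picture of Figure 2a for a 4–5 eV cathode: negligible field emission below about 0.1 V/Å, and rapidly growing tunneling currents in the 0.3–0.5 V/Å range.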
However, the approximations used by Murphy and Good to achieve the integration of Eq. (5) imply that Eqs. (6), (7), and (8) are valid only for some defined intervals of sets of values (Φ, F, T ) relative to each emission mechanism. These regions, delimited by boundaries 1 to 6, are plotted as illustrative examples for Φ = 5 eV and 2 eV in Figure 2. Equation (6) is valid only within the limited region defined for (T , F ) above both boundaries 1 and 2; Eq. (7) is valid within the limited region below both boundaries 3 and 4; and Eq. (8) is valid for between the boundaries 5 and 6. Figure 2 can be used to state rapidly which emission mechanism is the main process for a given set (Φ, F, T ). Moreover, it also reveals that large regions of experimental electron emission characteristics J–F cannot be quantified by the Murphy and Good analytical relations. To fill the gap (i.e., to remove the limitations intrinsic to the approximation techniques used in the analytic methodology), a self-consistent numerical approach has been developed (Semet et al., 2007b) that also allows extending the estimation of the currents to the very high field region, called the field-induced ballistic emission region (Semet et al., 2006). The numerical method calculated the transmission coefficient D(F, W ) by the use of an effective potential technique (Adessi and Devel, 1999, 2000) that reduces the N-electrons problem to a onebody Schrödinger equation. The one-dimensional (1D) Schrödinger equation is then solved in its integral Lippmann–Schwinger (LS) form (Lippmann and Schwinger, 1950) by means of Green’s function technique (Lang and Williams, 1978), with the image potential considered as a diffusion potential. The FE current was computed within a 1D jellium model of the cathode. The basic equation used is Eq. (5), and the current density is calculated using the obtained transmission probability for an electron to cross the barrier between the electron sea of the cathode and the vacuum. This reformulation allows the division of the system under consideration into a highly symmetrical reference system, corresponding to the system without the image force, and a localized perturbation near the surface by the image potential, described by a potential V ima . The result is a decomposition of the effective potential in two parts: one for the reference system (V (0) ), which can be solved quasi-analytically, and the other for the perturbation (V ima ), which can be solved numerically in the direct space due to its localization. The reference system is modeled by a half-space limited by the cathode surface (z = 0) and by a polarized vacuum for z > 0. It can be described by a 1D potential energy V (0) (z)
FIGURE 2. Validity regions for the use of Eqs. (6)–(8), respectively, for thermionic Schottky emission, field emission, and concomitant thermionic and field emission; (a) for Φ = 5 eV cathodes and (b) for Φ = 2 eV cathodes. For illustration, for a 5 eV work function cathode and at T = 2500 K, Eq. (6) can be used to calculate the J–F characteristics only for F < 0.3 V/Å and Eq. (7) only for F > 0.7 V/Å. Between 0.3 < F < 0.7 V/Å, Eq. (8) cannot be used because it is not within the validity region (i.e., inside boundaries 5 and 6). For a 2-eV work function cathode at T = 1000 K, only Schottky emission Eq. (6) for F < 0.08 V/Å can be used for this interval of field. No analytical approach is available for higher fields.
• equal to the bottom of the conduction band (CB), which is the effective constant potential energy inside the cathode for z < 0; and
• equal to (−eFz) for z > 0, due to the contribution of the external applied field F.

As a consequence of the splitting of the effective potential into two parts, the Hamiltonian is expressed as:

H = H^(0) + V^ima,    (10)

with

H^(0) = −(ħ²/2m)∇² + V^(0)(z)    (11)

and

V^ima = −e²/[16πε₀(z − z₀)].    (12)

Equation (12) is the outside classical image potential at a distance z, with z₀ a constant term that allows a matching of the image potential with the inner potential at the interface. The reference wave function ψ^(0) is a solution of

H^(0) ψ^(0) = E ψ^(0);    (13)

therefore, the Green's function G^(0)(z, z′; E) corresponding to the reference system can then be constructed by using two independent solutions of Eq. (13), namely:

G^(0)(z, z′; E) = ψ^(0)_out(z<) ψ^(0)_in(z>) / W_r,    (14)

where z< and z> are, respectively, the smaller and the larger of z and z′, and W_r is the Wronskian of ψ^(0)_out and ψ^(0)_in, which are, respectively, the outgoing and incoming wave functions from the cathode surface. The LS self-consistent form of the Schrödinger equation is then:

ψ(z) = ψ^(0)(z) + ∫ dz′ G^(0)(z, z′; E) V^ima(z′) ψ(z′),    (15)

where ψ is the solution of the complete Schrödinger equation. As V^ima is localized near the surface, the integrals occurring in the corresponding LS equations can be discretized, and ψ is calculated through a discretized version of Eq. (15):

ψ(z_i) = ψ^(0)(z_i) + Σ_{j=1}^{N} G^(0)(z_i, z_j; E) V^ima(z_j) ψ(z_j),    (16)
where N is the number of discretization points. The discrete solutions {ψ(z_i)} are therefore obtained by the numerical resolution of a set of N equations:

Σ_{j=1}^{N} [δ_ij − G^(0)(z_i, z_j; E) V^ima(z_j)] ψ(z_j) = ψ^(0)(z_i).    (17)
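A minimal sketch of how the linear system of Eq. (17) can be assembled and solved is given below. For compactness it uses the free-particle 1D Green's function and a plane wave as stand-ins for the reference system of Eqs. (13)–(14) (which in the chapter includes the applied field), and it folds the quadrature weight Δz into the matrix; the grid, energy, and z₀ values are arbitrary illustrative assumptions, so this only shows the structure of the computation, not the authors' implementation.

```python
import numpy as np

HBAR, M_E, Q_E, EPS0 = 1.055e-34, 9.109e-31, 1.602e-19, 8.854e-12

def v_image(z, z0):
    """Image potential of Eq. (12), in joules, for z > z0 (SI units)."""
    return -Q_E**2 / (16.0 * np.pi * EPS0 * (z - z0))

def solve_discretized_ls(z, energy, z0):
    """Solve the linear system of Eq. (17): (delta_ij - G0_ij * V_j * dz) psi_j = psi0_i.

    The free-particle 1D Green's function and a plane wave are used here only as
    stand-ins for the reference Green's function and wave function of Eqs. (13)-(14)."""
    k = np.sqrt(2.0 * M_E * energy) / HBAR
    dz = z[1] - z[0]
    psi0 = np.exp(1j * k * z)                                    # assumed reference wave
    g0 = (M_E / (1j * HBAR**2 * k)) * np.exp(1j * k * np.abs(z[:, None] - z[None, :]))
    mat = np.eye(z.size, dtype=complex) - g0 * v_image(z, z0)[None, :] * dz
    return np.linalg.solve(mat, psi0)                            # the psi(z_i) of Eq. (16)

if __name__ == "__main__":
    z = np.linspace(2e-10, 3e-9, 400)       # grid outside the surface (m), illustrative
    psi = solve_discretized_ls(z, energy=4.0 * Q_E, z0=1e-10)
    print("sample |psi| near the surface:", np.abs(psi[:3]))
```

Because V^ima is localized near the surface, the matrix is small and the direct solve is cheap; the transmission coefficient D(F, W) then follows from the asymptotic behavior of ψ, as described in the text.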
The numerical method allows the exact resolution of the Schrödinger equation with the full image-rounded quasi-triangular barrier. It uses the same formalism for electrons tunneling through or going over the barrier—it performs the exact calculation of the electron current with the image potential and gives the corresponding energy distribution, for any set of values of Φ, F , and T covering large ranges of Φ, F , and T values. This means that the numerical resolution is not restricted to some regions of validity as in Murphy and Good analyses (Figure 2), because those former restrictions were simply the consequences of the different approximations and simplifications used for valuating the integral in Eq. (5) for the different processes. For example, Figure 2 shows that T-F emission (i.e., Schottky cathodes) is very poorly covered by the analytical resolution. The other main restriction of the analytical resolution is that it has not considered the direct electron emission from the Fermi sea over the surface barrier. This last process, called field-induced ballistic emission (Forbes, 1999), is a consequence of the decrease of the surface barrier height by the Schottky effect of a value of VSchottky ≈ −3.8F 1/2 , where Φ in eV and F in V/Å (see Figure 1), meaning that the top of the barrier TB is pulled down below the Fermi level for applied field values F > Fbal = 6.945 × 10−2 Φ 2 , where Fbal is the threshold field when TB is at the Fermi level. In such conditions, comparable to a situation of negative electron affinity, electron emission no longer takes place solely by tunneling through the barrier but occurs predominantly by flowing over the top of the barrier directly from the Fermi sea. This mechanism is not relevant for metal cathodes with Φ > 4.5 eV because it occurs only for F ≥ 1.5 V/Å. However, for cathodes with Φ < 2 eV, it becomes rapidly the dominant mechanism when F > 0.2 V/Å (Figure 3). The following paragraphs examine the different characteristics for electron emission in the cathode work function and temperature versus the applied field.
FIGURE 3. The field-induced ballistic region lies over the curve Fbal. It corresponds to the situation where the top of the barrier is pulled down below the Fermi level by the Schottky effect.
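The threshold field F_bal = 6.945 × 10⁻² Φ² quoted above (F_bal in V/Å for Φ in eV) sets the curve plotted in Figure 3. A two-line sketch of the kind below, with illustrative work functions, reproduces the orders of magnitude discussed in the text (roughly 1.4 V/Å for a 4.5 eV metal and 0.07 V/Å for a 1.05 eV cathode).

```python
def f_bal(phi_ev):
    """Threshold field (V/Angstrom) at which the barrier top reaches the Fermi level."""
    return 6.945e-2 * phi_ev**2

# Illustrative work functions: a metal (4.5 eV), a Schottky-type surface (2.8 eV),
# and the low work function value used later in Section II.D (1.05 eV).
for phi in (4.5, 2.8, 1.05):
    print(f"Phi = {phi:4.2f} eV  ->  F_bal = {f_bal(phi):.3f} V/A")
```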
B. Electron Emission Characteristics for Cathodes with a Work Function of 5 eV

1. Field Emission at Room Temperature

Within the interval of electric field compatible with experimental applied fields and current measurements (between 0.3 and 1 V/Å), Figure 2a indicates that the Murphy and Good FE relation [Eq. (7)] is valid. The electron emission mechanism is mainly tunneling through the surface barrier, and the J–F characteristic is a straight line when Eq. (7) is plotted as ln(J/F²) versus (1/F) (Figure 4); this is called thereafter the F-N plot.⁴ The F-N plot for the FE mechanism is a straight line with a slope proportional to Φ^(3/2); that is, the value of Φ can be extracted from the slope value (F in V/Å, J in A/cm², and Φ in eV) by the following relation:

Φ = [−slope/(0.68455 v(y))]^(2/3),    (18)

in which v(y), the value of the Nordheim elliptic function, must be calculated for each set of (F, Φ) (Forbes, 2006).

⁴ The current characteristics J versus F are frequently plotted under the coordinates ln(J/F²) versus (1/F); this plotting procedure is therefore called the F-N plot or plotting style, and it does not imply that the emission process is restricted to a tunneling emission mechanism. This choice of the F-N plotting style, aside from the fact that it is the conventional plot for the cold cathode community, has one main advantage: such a plot should be a straight line the moment that a tunneling process is active.
FIGURE 4. F-N plot of the J–F variations showing a linear behavior specific to a tunneling process.
In Figure 4 the J–F characteristics obtained from the numerical simulations are also plotted, and they are in accordance with the analytical results, particularly for the slope. To eliminate the calculation of v(y), the following numerical relation [Eq. (19)] directly yields Φ as a function of the slope value of the F-N plot:

Φ = [(0.2266 − slope)/0.7370]^(2/3),    (19)
for F in V/Å, J in A/cm2 , and Φ in eV. Both Eqs. (18) and (19) are valid only for Φ > 0.8 eV, at T < 300 K, and for field intervals given by Figure 2. Moreover, Eq. (19) is limited to Φ < 5 eV. From a practical standpoint, when the cathode work function is ∼4–5 eV (values for metal surfaces or graphene), the current densities for F less than ∼0.3 V/Å are too small to be measured experimentally, and for F greater than ∼1 V/Å, it results very easily in a blowup of the cathodes by a thermal runaway, resulting from the action of the applied electrostatic forces and FE current-induced temperature increase. Thus, for conventional metallic cathodes (those with a work function in the range of 4–5 eV), experimental F-N plot is mostly restricted to the FE zone as indicated in Figure 2a and is therefore represented by a straight line. This also explains why the F-N plot is sometimes incorrectly associated to a linear variation.
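A short sketch of this procedure is given below: it fits the slope of ln(J/F²) versus 1/F by least squares and converts it to a work function with Eq. (19). The synthetic J–F data are generated from a straight-line F-N law purely for illustration (the relation slope = 0.2266 − 0.7370 Φ^(3/2) used to build them is just the inverse of Eq. (19)); in practice the slope would come from measured J–F characteristics, within the validity limits just stated.

```python
import numpy as np

def fn_slope(f_va, j_acm2):
    """Least-squares slope of ln(J/F^2) versus 1/F (F in V/Angstrom, J in A/cm^2)."""
    x = 1.0 / np.asarray(f_va)
    y = np.log(np.asarray(j_acm2) / np.asarray(f_va) ** 2)
    return np.polyfit(x, y, 1)[0]

def phi_from_slope(slope):
    """Work function (eV) from the F-N plot slope, Eq. (19)."""
    return ((0.2266 - slope) / 0.7370) ** (2.0 / 3.0)

if __name__ == "__main__":
    # Synthetic straight-line F-N data for an assumed 2.0 eV cathode.
    phi_true = 2.0
    slope_true = 0.2266 - 0.7370 * phi_true ** 1.5
    f = np.linspace(0.05, 0.12, 10)                    # V/Angstrom
    j = f ** 2 * np.exp(slope_true / f)                # A/cm^2, arbitrary prefactor
    print("recovered Phi =", round(phi_from_slope(fn_slope(f, j)), 3), "eV")
```

The recovered value equals the assumed 2.0 eV, which simply checks that the slope-to-Φ conversion is applied consistently; with experimental data the quality of the straight-line fit itself indicates whether the tunneling (FE) zone has really been reached.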
The total energy distribution (TED) of the field-emitted electrons is governed by the tunneling probability through the rounded triangular barrier (Figures 5 and 6). The TED spectrum then presents a high-energy side slope that is mostly temperature dependent (i.e., ∝ −1/kT ) and a low-energy side slope that is field dependent (i.e., ∝ Φ 1/2 /F ), with its maximum location
FIGURE 5. Evolution of the surface barrier with the applied field (left-hand side plots) with the corresponding TED of the emitted electrons (right-hand side plots).
FIGURE 6. Detailed plots of the TED of Figure 5.
near the Fermi level (Figure 6). The full-width half-maximum (FWHM) of this distribution is less than ∼0.3 eV and, for a given temperature, it increases with the applied field due to the larger deformation of the surface barrier by the field (i.e., −eFz) (Figure 6).

2. Field Emission at High Temperatures (1500 K and 2500 K)

The lowering of the surface barrier height by the Schottky effect has a value of V_barrier ≈ −3.8 F^(1/2) (V_barrier in eV and F in V/Å) and reaches values in the order of 3.5 eV for F ∼ 1 V/Å. However, for work functions in the range of 5 eV, the Schottky effect is not strong enough to let the thermionic mechanism take over and become the dominant mechanism. Figure 7 presents the TED for two temperatures (1500 K and 2500 K)⁵: they demonstrate that the tunneling process is still the dominant mechanism, even if there is a great modification in the TED spectra with primarily a large increase in the energy spread with temperature. A qualitative analysis using Figure 2 yields the same conclusion. It shows that TE is the dominant mechanism for F less than 0.2 and 0.3 V/Å, correspondingly for 1500 K (with a very small current) and 2500 K, whereas FE assumes the dominant mechanism role for F greater than 0.3 and 0.7 V/Å (respectively for 1500 K and 2500 K).

⁵ For temperatures in the range of 2500 K, metallic tips undergo a very rapid modification of their profile under surface diffusion, with the field gradient as the driving force, mostly ending in the destruction of the tip. This process is called thermal runaway. However, for covalent material tips (i.e., carbon nanotubes), temperatures up to ∼2500 K can be reached without undergoing thermal runaway destruction.

FIGURE 7. Evolution of the surface barrier with the applied field (left-hand side plots) and the corresponding TED of the emitted electrons (right-hand side plots) for (a) 1500 K and (b) 2500 K.

C. Emission Characteristics for Schottky Cathodes (Φ = 2.8 eV; F < 0.3 V/Å)

To take advantage of the lowering of the surface barrier by the image effect, a cathode with a work function of less than 3 eV is needed to achieve an effective barrier height less than 1 eV for applied fields greater than 0.25 V/Å (Figure 8). These cathodes are called Schottky emitters; they are routinely tungsten tips covered with a layer of ZrO, which, after a seasoning process, yields an oxide surface with a Φ in the range of 2.8 eV. The working temperature is 1800 K. In Figure 8, the numerical simulations of the TED evolution with the surface barrier show very clearly that for low fields (less than ∼0.1 V/Å) the dominant mechanism is thermionic. With increasing fields (F > 0.15 V/Å), the contribution of FE increases, and a concomitant T-F emission becomes the main mechanism for F in the range of 0.25 V/Å and higher. In practice, such high fields create a field-induced surface diffusion, followed by a strong geometrical modification of the emitting area with possible blowup of the tips. Therefore, Schottky tips are used mainly with fields well below the thermal runaway values, as thermionic cathodes with a Schottky-reduced effective surface barrier less than 2.8 eV.

D. Emission Characteristics for Cathodes with Work Function < 2 eV

Figure 2 shows that T-F emission (i.e., Schottky cathodes) is poorly covered by the analytical resolution. Another focal limit of the analytical resolution is
FIGURE 8. Evolution of the surface barrier with the applied field (left-hand side plots) and the corresponding TED of the emitted electrons (right-hand side plots) for a Schottky cathode showing the transition from a dominantly thermionic emission toward a T-F emission with the increase of the applied field.
that it has not scrutinized the direct electron emission from the Fermi sea over the surface barrier, the field-induced ballistic emission, a phenomenon that occurs for F > 0.3 V/Å for cathodes with low work functions (i.e., <2 eV). Hence, a systematic study for different values of Φ and T must use numerical simulations. The overall behavior follows the same outline as presented in Figure 9 for Φ = 1.05 eV and plotted within the format ln(J /F 2 ) versus (1/F ). The choice of the F-N plotting style, aside from the fact that it is the conventional plot for the cold cathode community, has a key advantage: such a plot should be a straight line the moment that a tunneling process is active. The F-N plot in Figure 9 clearly displays three zones for the current variation. The first zone, for low fields (F < 0.03 V/Å), shows a nonlinear increase of the emission current (including Schottky and T-F emissions intervals). It is followed by a linear variation in zone 2 (FE interval of Figure 9) and zone 3 for high field values (F > 0.065 V/Å), which exhibit a saturation effect and correspond to the ballistic emission interval. The analysis of the electron emission process is simplified by considering the current characteristics in relation to the surface barrier geometry and the TED of the electrons (Figure 10). The corresponding TED of the emitted electrons are therefore plotted (in Figure 10) in front of the potential diagrams for some values of the field (indicated by arrows 1–5 in Figure 9). For clarity, Figure 10 does not incorporate the TEDs related to F < 0.02 V/Å as they are distinctive of a conventional TE regimen with energy of the extracted
FIGURE 9. J–F characteristics of a low work function cathode. The arrows numbered 1 to 5 indicate the J–F values for the corresponding five TEDs in Figure 10.
FIGURE 10. Surface barriers and total energy distribution spectra for five values of the applied fields indicated by arrows in Figure 9.
electrons over the top of the barrier (as shown with the TED 1 in Figure 8, for example). The nonlinear increase of the emission current until F ∼ 0.03 V/Å in the F-N plot (see Figure 9) is indicative of a T-F process as indicated by TED 1 in Figure 10; it reflects the evolution from a dominant TE toward a tunneling emission process (as characterized by TED 2) passing by a T-F mechanism.
When most of the electrons are extracted by tunneling through the barrier (e.g., zone 2 in Figure 9 for the interval 0.03 < F < 0.065 V/Å), the F-N plot is a straight line and its slope is proportional to Φ^(3/2). For field values F > Fbal = 0.07 V/Å, TEDs 4 and 5 in Figure 10 show that the main contribution of the current comes from the flow of electrons from the Fermi sea over the surface barrier (i.e., the field-induced ballistic contribution, zone 3 in Figure 9). Information provided by the TEDs thus allows relating the analytical results of Murphy and Good to the numerical simulation data. For very low fields (e.g., F < 0.018 V/Å for Φ = 1.05 eV), as Schottky emission is the main active process, the Murphy and Good Eq. (6) can be used to calculate JSchottky, and thus up to F < 0.018 V/Å (as indicated by the validity region plot in Figure 2). For higher fields, the T-F Eq. (8) cannot be used because it is not valid for 0.018 < F < 0.032 V/Å, and the FE Eq. (7) is suitable only for the interval 0.032 < F < 0.062 V/Å (Figure 2). The two sets of data in the Schottky and FE regions obtained from the Murphy and Good relations in Eqs. (6) and (7) are plotted in Figure 11 (open triangles). The comparison with our numerical results shows good agreement in the respective validity regions; it also highlights the restrictions of the analytical approach. In the saturation region, our numerical simulation results and the TED spectra (e.g., TEDs 3, 4, and 5) indicated clearly that electron emission takes
FIGURE 11. Comparison between numerical simulations and analytical approaches of the emission current variation with applied field for a cathode work function Φ = 1.05 eV. Open (black) circles correspond to numerical simulation and (red) triangles to data calculated with Murphy and Good relations (6) and (7). Open (pink) squares correspond to field-induced ballistic emission Eq. (23) when F > Fbal = 0.075 V/Å.
place predominantly by a direct flow from the Fermi sea over the top of the barrier.

In order to make an analytical appraisal of Eq. (5) for the field-induced ballistic mechanism, and in particular to assess its role in the saturation of the F-N plot at the high field region, the following assumptions have been used to develop it for F > Fbal:

• The transmission coefficient D(F, W) = 1 in calculating the current in Eq. (5) for W > TB.
• The tunneling part of the current is neglected, D(F, W) = 0 for W < TB.

Under these two assumptions, Eq. (5) becomes:

J(F, T, ζ) = J_bal = (kT/2π²) ∫_TB^∞ ln[1 + exp(−(W − ζ)/kT)] dW.    (20)

With X and X_TB defined as:

X = −exp[−(W − ζ)/kT],    (21)

X_TB = −exp(−Φ/kT),    (22)

the integral function of Eq. (20) is a dilog function of X,

J(F, T, E_F) = [(kT)²/2π²] ∫_{X_TB}^{0} [−ln(1 − X)/X] dX.    (23)
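The dilogarithm integral appearing in Eq. (23) is straightforward to evaluate numerically. The sketch below does so with a simple trapezoidal rule, taking care of the X → 0 limit of the integrand (which tends to 1); the X_TB values in the demonstration are arbitrary illustrations, whereas in the chapter X_TB follows from Eq. (22), and the result is left in the Hartree atomic units used in Eqs. (5)–(23).

```python
import numpy as np

def dilog_integral(x_tb, n=20000):
    """Evaluate I(X_TB) = integral from X_TB to 0 of -ln(1 - X)/X dX, as in Eq. (23)."""
    x = np.linspace(x_tb, 0.0, n)
    with np.errstate(divide="ignore", invalid="ignore"):
        integrand = -np.log1p(-x) / x
    integrand[x == 0.0] = 1.0                      # limit of the integrand at X = 0
    return np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x))

def j_ballistic(kT_au, x_tb):
    """J of Eq. (23) in Hartree atomic units, given kT (hartree) and X_TB from Eq. (22)."""
    return kT_au**2 / (2.0 * np.pi**2) * dilog_integral(x_tb)

if __name__ == "__main__":
    kt = 300.0 * 8.617e-5 / 27.211                 # kT at 300 K, converted to hartree
    for x_tb in (-0.5, -5.0, -50.0):               # arbitrary illustrative values of X_TB
        print(f"X_TB = {x_tb:6.1f}:  J = {j_ballistic(kt, x_tb):.3e} a.u.")
```

Because the integrand stays of order unity, the computed current simply scales with |X_TB| and (kT)², which is why the analytical appraisal underestimates the numerical result when the neglected tunneling contribution is still significant, as noted below.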
A plot of Eq. (23) for F > Fbal is shown in Figure 11 (open squares). The similarity between the analytical and numerical values in this interval confirms that it is the field-induced ballistic mechanism that is at the origin of the saturation effect of the F-N plot in the high field region. One aspect about the analytical data obtained with Eq. (20) merits mention: the analytical data are underestimated compared to the numerical simulation because of the approximation D(F, W ) = 0 for W < TB (i.e., no tunneling contribution). It resulted in an underestimation of the analytical data compared to the numerical simulation, in particular for field values ranging from 0.06–0.09 V/Å in Figure 11. Regarding the contribution of the temperature, numerical simulations indicate major modifications for the F-N plots in function of the temperature in the T-F and FE regions, but they all converge to the same saturation value as illustrated in Figure 12. This last behavior is specific to the field-induced ballistic emission, which is much less sensitive to the temperature distribution of the electrons in the cathode. With the increase of the temperature, the T-F
FIGURE 12. J–F characteristics showing the convergence, in the field-induced ballistic emission, toward the same saturation current for different cathode temperatures.
emission region expanded to the detriment of the FE region. This results, for a certain temperature threshold, to a direct transition from T-F regimen toward ballistic regimen without passing by the FE regimen, as exemplified by curve 4 plotted for 450 K and Φ = 1.05 eV in Figure 12. More concisely, the linear region of the F-N plot disappears, and therefore a value of Φ 3/2 cannot be directly extracted from F-N plot, an important observation when analyzing experimental data, as developed hereafter. As a summary, use of a numerical simulation approach allows the calculation, with the same formalism, of the electron emission current versus applied field from its theoretical smallest values until thousand of A/cm2 when the applied field is high enough to pull the top of the surface barrier well below the Fermi level. The ln(J /F 2 ) versus (1/F ) plot of the emission features clearly shows three variation zones. In the first zone, the emission started with electrons having energy high enough to escape over the surface barrier by a Richardson–Schottky mechanism. As the applied field increases, the Schottky-deformed barrier allows more and more electrons to not only pass over but also tunnel through the very top of the barrier, resulting in a combined thermionic-field emission mechanism characterized by a nonlinear variation of the F-N plot. The second zone, which follows the first zone, is represented by a linear variation in the F-N plot and is obtained when the surface barrier is so deformed by the applied field that electron emission occurs dominantly by tunneling through the surface barrier and is quantified by the conventional FE F-N relation [i.e., the slope of the linear variation part is proportional to Φ 3/2
so the values of Φ can be extracted by using Eq. (19)]. Increasing the field to values high enough to lower the top of the barrier below the Fermi level leads to field-induced ballistic emission of electrons directly from the Fermi sea, resulting in very high current densities that evolve toward a distinguishing current saturation, independent of the cathode temperature, in the third zone. From a practical point of view, when the cathode work function is ∼4 eV, the current densities in the first zone are too small to be measured experimentally. The field-induced ballistic emission started only for applied fields greater than 1 V/Å, and coupled with the very important current densities extracted, it results very easily in a blowup of the cathodes due to the applied electrostatic forces and FE current-induced temperature increase. Consequently, for conventional metallic cathodes, with a work function in the range of 4 eV, experimental F-N plot is mostly restricted to the FE (second) zone, that is, a straight line, with eventual deviation from the straight line— saturation behavior for very high fields near 1 V/Å. Nonetheless, when the cathode work function is the range of 2 eV or less, experimental recording of currents from the first zone is possible, and reaching the ballistic emission regimen with field values in the range of 0.2 V/Å is also potentially workable because the electrostatic pressures are not immediately prejudicial for the cathodes. This last potentiality of a low work function cathode appears as a route for the realization of stable high current cold cathodes, meaning a technical breakthrough for several new vacuum electronics application developments.
III. FIELD EMISSION ANALYSES OF PLANAR CATHODES BY SCANNING ANODE FIELD EMISSION MICROSCOPY

Analyses of the electron emission characteristics from a tip cathode are currently made with a field emission microscope (FEM), either in a diode or a triode mode. In the conventional FEM, a single tip with a radius in the range of 100 nm is placed in front of an anode (i.e., a screen) located a centimeter away. The field F at the apex of the tip can be expressed as F = V/κr_a, where V is the applied voltage between the tip and the anode, r_a is its apex radius, and κ is a geometrical factor depending on the taper angle of the tip, having an approximate value of 5 (Gomer, 1961). The exact electric field distribution at the surface of the cathode can be estimated by two simple analytical solutions for the potential distribution; one is based on the paraboloidal model (Smith and Walls, 1978) and the second on the hyperboloidal model (Coehlo and Debeau, 1971). These two models represent the tip and the anode either by two confocal paraboloids or by hyperboloids of revolution. A more precise representation that considers the role of the shank angle was developed by
Dyke and Dolan (1956); their model considers the tip as a sphere on an orthogonal cone. In practice, none of these models with their assumptions is strictly correct. However, they allow a correct estimation of the actual field at the tip end region. Their use allows transformation of any total FE current I versus applied voltage V (I–V) characteristics into total FE current density J versus applied field F (J–F) characteristics. Note that only the J–F data can be consistently used to make quantitative analysis of any FE process.

For a planar cathode, the cathode–anode configuration is more similar to a parallel plate capacitor. Then two categories of electric fields can be defined as follows:

• A macroscopic field F_M, which is the field between the two parallel electrodes considered as flat surfaces,

F_M = V/d,    (24)

where V is the voltage applied between the cathode and the anode, being a distance d apart.

• A local field F, close to the emitting surface, which is the field actually acting at an emitting surface that determines the surface barrier as described in Section II. This local field is strongly dependent on the local conditions at the surface, particularly the local geometrical corrugation. It is related to F_M by a field enhancement factor γ, which is the ratio between the local field F and the macroscopic field F_M:

γ = F/F_M.    (25)

The value of γ depends on the geometry of the local protrusion on the plane surface, and in particular on the aspect ratio ν = L/ρ between the protrusion length (or height) L and its base radius ρ. For a perfectly flat cathode surface, γ = 1; therefore, F_M and F are identical. For protrusion geometries with high aspect ratio, γ can reach values of thousands, and confusion of F with F_M would lead to severe misinterpretation of experimental results. Estimates of γ for different protruding geometries have been calculated (Forbes et al., 2003), particularly for two of them: a hemisphere on a post model and a hemi-ellipsoid on a plane (Table 1). These results are given under the assumptions of a gap d ≫ L and of protrusions on a flat surface substrate. Otherwise, there are two primary corrections to stress:
1. If the anode is relatively close to the protrusion (i.e., d is in the range of L), then the influence of d may need to be taken into account by using the relation (Miller, 1967)

γ(d) = γ × (1 − L/d).     (26)
TABLE 1
RELATIONS GIVING THE FIELD ENHANCEMENT FACTORS FOR DIFFERENT GEOMETRIES OF PROTRUSIONS ON A PLANE SURFACE

Hemisphere on a post (for ν < 4): γ = 2 + ν
Hemisphere on a post (for 4 < ν < 3000): γ = 1.2 × (2.15 + ν)^0.90
Hemi-ellipsoid on a plane (for ν > 1; apex radius ra = ρ/ν = ρ²/L): γ = ζ³/(ν ln(ν + ζ) − ζ)

Field enhancement factor γ = F/FM; ν = L/ρ; ζ = (ν² − 1)^1/2. (The original table also includes a schematic drawing of each protrusion geometry.)
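The relations collected in Table 1, together with the gap correction of Eq. (26), are simple enough to evaluate numerically. The short Python sketch below is only an illustration of these quoted formulas (the function names are ours, not from the original work):

```python
import math

def gamma_hemisphere_on_post(nu):
    """Field enhancement of a hemisphere on a post, aspect ratio nu = L/rho."""
    if nu < 4:
        return 2.0 + nu                       # Table 1, first row
    return 1.2 * (2.15 + nu) ** 0.90          # Table 1, valid for 4 < nu < 3000

def gamma_hemi_ellipsoid_on_plane(nu):
    """Field enhancement of a hemi-ellipsoid on a plane, nu = L/rho > 1."""
    zeta = math.sqrt(nu ** 2 - 1.0)
    return zeta ** 3 / (nu * math.log(nu + zeta) - zeta)

def gamma_close_anode(gamma, L, d):
    """Correction of Eq. (26) (Miller, 1967) when the gap d is comparable to L."""
    return gamma * (1.0 - L / d)

# Example: a protrusion with aspect ratio 100 (e.g., 1 um high, 10 nm base radius)
nu = 100.0
print(gamma_hemisphere_on_post(nu))       # ~77
print(gamma_hemi_ellipsoid_on_plane(nu))  # considerably larger for the ellipsoid
print(gamma_close_anode(gamma_hemisphere_on_post(nu), L=1.0, d=5.0))
```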
2. If the underlying surface is not flat, the field enhancement of the substrate must be taken into account, resulting in a field enhancement factor γtotal = γsubstrate × γ if Eq. (24) is used for the determination of the macroscopic field. Otherwise, care may be needed over the definition of the macroscopic field (Miller et al., 1996). For example, if the protrusion is located on the apex of a substrate tip with a radius rsubstrate, then the macroscopic field is FM = V/(κ rsubstrate) and not that of Eq. (24). Subsequently, the local field at the apex of the protrusion is given by Eq. (25).

With a nominally planar cathode, γ = 1; therefore, the primary conclusion that can be stated for FE analysis is that to obtain FE fields in the range of 10²–10⁴ V/µm with applied voltages in the range of 1000 V, the cathode–anode gap d must be less than 1 µm. Another constraint that distinguishes the planar cathode from the single-tip cathode resides in the magnification effect due to the radial projection from the hemispherical apex of a tip. Subsequently, for a perfectly flat surface in a projection mode, the lateral resolution of the emission is very poor. For a corrugated surface, if the density of protrusions is high enough, electron beams from the different protrusions merge, thus preventing the possibility of projection FE analysis of individual protrusions (i.e., with good lateral resolution). For these two main reasons—to have both a gap d and a lateral resolution in the range of microns—I–V characteristics for planar cathodes are currently
FIGURE 13. Schematic description (left) and a photo (right) of the SAFEM. The probe-ball is attached to a piezo-driven mechanical displacement stage with 5 degrees of freedom. The resolution of the piezo displacements is better than 1 nm, and the probe-ball radius R is in the range of 100 µm.
measured with a spherical probe moving very near the cathode surface with nanometer resolution, defining a SAFEM (Mahner et al., 1993; Pupeter et al., 1996; Nilsson et al., 2001; Binh et al., 2001). A schematic description is presented in Figure 13. Under this configuration, the field distribution created by the probe-ball over the flat cathode surface is not uniform. It depends on the probe radius R and the exact distance d between the probe-ball and the surface. Quantitative interpretation of FE directly from the raw I–V data can be very misleading due to the strong variation of the actual field across the emitting area (i.e., of the tunneling current). Therefore, converting the experimental I–V data into J–F is required before any quantitative SAFEM analysis. The J–F characteristics can be extracted from a set of I–V data measured for different values of d, with the knowledge of the relative displacement Δd of the probe-ball between two successive I–V measurements. The methodology for this determination (given hereafter) is based on the determination of the field distribution over the planar cathode surface in front of a spherical anode by electron optics simulations, and subsequently extracting the active emitting zone and the local current densities.

A. Measurement Procedure

The active element of the SAFEM is the probe-ball made from a platinum–iridium (PtIr) wire. It is considered a sphere with a typical radius less than 300 µm, resulting from a rapid melt of the end of a wire with a diameter of 200–250 µm. This probe-ball is brought in front of the planar cathode by a piezo-
driven displacement having 5 degrees of freedom [three linear displacements (x, y, z) and two tilts (θ, Φ)]. The interval for the x-y-z linear displacements was in the centimeter range with a resolution of 1 nm. The maximum angular rotation was 45° for the two tilts, with a resolution of 5 × 10⁻⁸ rad. The three linear displacements allowed scanning the probe-ball in x-y over the cathode with a control of the distance z between the probe-ball and the cathode. The two tilts ensure the horizontality of the cathode plane relative to the probe-ball, that is, keeping the same distance z during the x-y scan.

The methodology consisted of measuring the total FE current I as a function of the applied voltage V between the probe-ball and the cathode at different distances z with successive increments Δd: d, (d + Δd), . . . , (d + n × Δd). The exact value of Δd is determined from an experimental calibration of the z displacement. The three-step calculation procedure to convert any set of I–V characteristics into J–F data is as follows:

1. Calculation of the field distribution by numerical simulations
2. Determination of the absolute distance d
3. Conversion of the I–V set of data into unique J–F characteristics by an iterative method.

Each of these steps is detailed below.

B. Calculation Procedure for Converting a Set of I–V into J–F

1. Field Distribution Calculation

Numerical simulations showed that the potential distributions, under the probe-ball and within an area of the cathode limited by the ball diameter, between a stand-alone sphere and a sphere connected to a wire were mostly the same if the diameter of the wire is smaller than the sphere diameter. The very small perturbation due to the connecting wire is neglected since it is only observable for areas of the cathode located outside a circle with the diameter of the ball and centered on the probe axis. Given that this is the experimental requirement for the SAFEM probe, the assumption is made that the field distribution on the planar cathode can be calculated with an analytical model of a stand-alone sphere in front of a plane, from the conventional electrostatic image method (Durand, 1953). The field distribution, with an axial symmetry around the probe axis, is a function of the following parameters:

• The radius of the probe-ball R
• The distance d between the probe-ball apex and the cathode
• The difference of potential between the probe-ball and the cathode V = Vs − Vp
FIGURE 14. Schematic representation of the different parameters in the “image method” for the calculation of the field distribution on the plane in front of the probe-ball. (Only the first two charges and their images are drawn.)
• The distance x from the probe axis on the cathode surface where the field must be calculated

With the image method, a sphere at the potential V in front of a plane can then be replaced by an infinity of elementary charges qi and qi′ = −qi, placed respectively at the distances Di and Di′ = Di from the plane (Figure 14). The field distribution F(d, V, x) on the plane is then the sum of the different fields created by all these charges qi and qi′. If D0 is the distance between the sphere center point and the plane (D0 = d + R), and q0 the charge of the sphere (q0 = 4πε0RV), then F(d, V, x) is

F(d, V, x) = (1/4πε0) Σ_{i=0}^{+∞} 2 qi Di / (Di² + x²)^{3/2},     (27)

with

qi = q0 (R² − D0²)^{1/2} / [R sin((i + 1) arccos(D0/R))],     (28)

Di = D0 − R sin(i arccos(D0/R)) / sin((i + 1) arccos(D0/R)).     (29)
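As an illustration, the series of Eqs. (27)–(29) is simple to evaluate numerically. The following Python sketch is our own illustration (not code from the original work); because D0/R > 1, the sin/arccos ratios of Eqs. (28) and (29) are computed through their real-valued hyperbolic equivalents (sinh and arccosh), which is mathematically identical.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m

def field_on_plane(d, V, x, R, n_terms=60):
    """Field F(d, V, x) on the cathode plane below a probe-ball of radius R
    held at potential V, with apex-to-plane gap d and lateral distance x,
    following Eqs. (27)-(29)."""
    D0 = d + R
    q0 = 4.0 * math.pi * EPS0 * R * V
    alpha = math.acosh(D0 / R)               # real equivalent of arccos(D0/R)
    F = 0.0
    for i in range(n_terms):
        qi = q0 * math.sinh(alpha) / math.sinh((i + 1) * alpha)
        Di = D0 - R * math.sinh(i * alpha) / math.sinh((i + 1) * alpha)
        F += 2.0 * qi * Di / (4.0 * math.pi * EPS0 * (Di ** 2 + x ** 2) ** 1.5)
    return F

# Example with the values discussed in the text: d = 1 um, R = 250 um, V = 200 V
print(field_on_plane(d=1e-6, V=200.0, x=0.0, R=250e-6, n_terms=40))  # V/m on axis
```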
Typically, for d = 1 µm, R = 250 µm, and V = 200 V, the number of charges qi can be limited to ∼40 (i = 40) if a relative error ΔF(d, V, x)/F(d, V, x) of less than 0.1% is acceptable.

FIGURE 15. Normalized field distribution across the cathode surface for one probe-ball–cathode gap d = 1 µm and for different probe-ball radii.

FIGURE 16. Normalized field distribution across the cathode surface for one probe-ball radius R = 150 µm and for different probe-ball–cathode gap distances.

Figures 15 and 16 are the normalized plots of the field distributions, respectively, for different probe-ball radii at a given distance d = 1 µm and for a given probe-ball radius R = 150 µm located at different distances d from the cathode surface. These figures indicate a rapid drop of the field distribution along x, which justifies the choice of a stand-alone sphere in front of a plane for the analytical approach to determine the field distribution. Figure 15 indicates that for small d (∼1 µm), the field is less than half the
field at the axis when x ≥ R/10. Therefore, for FE measurements at close distances, the emitting area can be estimated to be about 1/10 of the probe-ball diameter. Figure 16 demonstrates that a precise knowledge of the distance d is essential to determine the field distribution at the surface of the cathode, which means the exact emission area under the probe-ball.

2. Determination of the Absolute Distance d Between the Probe-Ball and the Cathode

The absolute value of the distance d can be determined experimentally from a calibrated piezo displacement in z and by considering d = 0 when the probe-ball is within tunneling distance from the surface (for example, when a nanoampere tunneling current is measured with only a few volts of polarization of the probe-ball) or when the ball is just in contact with the cathode surface. Still, even if such an experimental procedure is useful for the calibration of the distance, it could modify, in an unknown manner, the surface of the cathode during the contact. Moreover, it is a very demanding procedure when repetitive measurements are necessary. An alternative method allows the determination of d with only the knowledge of the relative displacement Δd of the probe-ball. With this method, the only requirement is calibrating Δd between two successive I–V measurements, a procedure that is more straightforward than the above calibration of the absolute value of d.

This determination uses the following assumption: the field F0 required for a threshold FE current, typically in the range of 0.1 nA, is the same for any distance d, and the FE threshold current is emitted from the same small circular area centered at x = 0. The choice of the threshold current value results from a trade-off between the desired precision for the calculation and the measurement resolution of the current. From the experimental standpoint, this assumption means that the variation dV of the voltages Vmin(z) needed to have the threshold current, for different values of z = d, (d + Δd), (d + 2Δd), . . . , (d + nΔd), where d is the starting distance for the set of successive measurements with increment Δd, will be linear in z, and its slope S is

S = dV(z)/dz.     (30)

As F = F0 + (δF/δz) dz + (δF/δV) dV, in order to maintain the field constant and equal to F0 at the cathode surface, any variation of the field due to a modification of the probe-ball distance [δF(z, Vmin, 0)/δz] dz should be compensated by a field variation due to the potential modification [δF(z, Vmin, 0)/δV] dV, which means

[∂F(z, Vmin, 0)/∂z] / [∂F(z, Vmin, 0)/∂V] = −S.     (31)
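A minimal numerical sketch of this determination (our own, reusing the field_on_plane helper sketched above) could solve Eq. (31) for d by bisection; the bracketing interval and finite-difference steps below are arbitrary choices of ours, and a sign change of the residual over the bracket is assumed.

```python
def absolute_gap(S, V_min, R, d_lo=0.5e-6, d_hi=20e-6, tol=1e-9):
    """Absolute gap d solving Eq. (31), given the measured slope S = dVmin/dz
    (in V/m) and the threshold voltage V_min at the starting position."""
    def residual(d, h=1e-9, dV=1e-3):
        # finite-difference estimates of the two partial derivatives at x = 0
        dFdz = (field_on_plane(d + h, V_min, 0.0, R)
                - field_on_plane(d - h, V_min, 0.0, R)) / (2.0 * h)
        dFdV = (field_on_plane(d, V_min + dV, 0.0, R)
                - field_on_plane(d, V_min - dV, 0.0, R)) / (2.0 * dV)
        return dFdz / dFdV + S          # zero when Eq. (31) is satisfied
    lo, hi = d_lo, d_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(lo) * residual(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```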
Solving Eq. (31) yields the absolute value of the distance d. To ensure the soundness of this numerical procedure, it is useful to compare the value of d obtained from a direct measurement using the contact procedure with the one calculated from Eq. (31); the agreement between those two values is within 5%.

3. Iterative Calculation to Convert I–V into J–F

With the assumption that the field distribution is symmetric about the axis of the probe, the total current can be expressed as a function of the current density by

I(V) = ∫_0^∞ J[F(d, V, x)] 2πx dx.     (32)
Equation (32) means that the total current is the sum of elementary currents coming from concentric rings of surface 2πx dx, over which the current density J[F(d, V, x)] is constant. This integral is approximated by a discrete sum:

In(Vn) = Σ_{i=0}^{n} J(Fn−i) × Si,     (33)

where Si is the surface of a concentric ring limited by Ri−1 and Ri, Si = π × (Ri² − Ri−1²); S0 corresponds to the emission area at the threshold current, and the maximum value i = n is reached when J(Fn−i) ≈ 0. The implicit approximation of Eq. (33) is that the field Fi is considered constant over the section Si.

Figure 17a represents the field distribution at the threshold current and the area S0 = πR0². The implicit approximation is that the field F is constant over the area S0 and is given by F0 = F(d, V0, 0); outside S0 the field F < F0 is too small to extract electrons from the cathode, and the FE current is considered 0. The current density J(F0) over S0 is

J(F0) = I0(V0)/(πR0²).     (34)

Keeping the same distance d and increasing the voltage by ΔV, V1 = V0 + ΔV, the emission area becomes S0 + S1, and the total current, as shown by Figure 17b, is

I1(V1) = J(F1) × πR0² + J(F0) × π(R1² − R0²),     (35)
FIGURE 17. Field distribution over a planar cathode in front of the probe-ball with the discrete emission areas at the threshold current (a) and for increasing voltages (b) and (c).
with

J(F1) = [I1(V1) − J(F0) × π(R1² − R0²)] / (πR0²).     (36)
Increasing the voltage n times so that Vn = V0 + nΔV, we obtain

In(Vn) = Σ_{i=0}^{n} J(Fn−i) × π(Ri² − Ri−1²),     (37)
with Fi = F(d, Vi, Ri−1) given by Eq. (27) and R−1 = 0. The general term J(Fn) is then

J(Fn) = [In(Vn) − Σ_{i=1}^{n} J(Fn−i) × π(Ri² − Ri−1²)] / (πR0²).     (38)
By considering that the field distribution is given by Eq. (27), the conversion of the I–V characteristics into J–F characteristics consists, in practice, of calculating for a given value of d and for the successive voltages, increasing from V0 to Vn , the values of Ri . Thereafter, an iteration process allows determination of the values of J (Fi ) from Eqs. (33)–(38).
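The following Python sketch shows one possible implementation of this conversion (our own reading of Eqs. (33)–(38), again reusing the field_on_plane helper sketched earlier). The radius R0 of the innermost emitting disc, which is fixed by the threshold-current criterion discussed above, is left as an input:

```python
import math

def threshold_radius(d, V, F0, R_probe, tol=1e-9):
    """Radius x at which F(d, V, x) of Eq. (27) falls to the threshold field F0
    (bisection; the field decreases monotonically with x)."""
    x_lo, x_hi = 0.0, R_probe
    while x_hi - x_lo > tol:
        x_mid = 0.5 * (x_lo + x_hi)
        if field_on_plane(d, V, x_mid, R_probe) > F0:
            x_lo = x_mid
        else:
            x_hi = x_mid
    return 0.5 * (x_lo + x_hi)

def iv_to_jf(d, R_probe, voltages, currents, R0):
    """Convert one I-V set measured at gap d into (F, J) pairs, Eqs. (33)-(38).
    voltages[0] is the threshold voltage V0; R0 is the radius of the innermost
    emitting disc S0."""
    F0 = field_on_plane(d, voltages[0], 0.0, R_probe)
    radii = [R0] + [threshold_radius(d, V, F0, R_probe) for V in voltages[1:]]
    areas = [math.pi * R0 ** 2] + [
        math.pi * (radii[i] ** 2 - radii[i - 1] ** 2) for i in range(1, len(radii))]
    J = [currents[0] / areas[0]]                        # Eq. (34)
    for n in range(1, len(voltages)):
        known = sum(J[n - i] * areas[i] for i in range(1, n + 1))
        J.append((currents[n] - known) / areas[0])      # Eqs. (36) and (38)
    # field paired with each J value: F_i = F(d, V_i, R_{i-1}), with R_{-1} = 0
    F = [F0] + [field_on_plane(d, voltages[i], radii[i - 1], R_probe)
                for i in range(1, len(voltages))]
    return F, J
```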
C. Results and Discussion

Figure 18 represents the I–V measurements obtained at different distances d from a planar emitter (5-nm thickness of titanium oxide [TiO2] on platinum). Figure 18b shows the corresponding J–F curve obtained by using the above conversion procedure. Figure 18b also shows the J–F curve of another
FIGURE 18. (a) I–V characteristics at different distances of the probe-ball for a planar cathode of 5 nm TiO2. (b) Corresponding J–F curve obtained for the 5-nm TiO2 thickness from the I–V set of (a); a J–F curve obtained for a 2-nm TiO2 thickness planar cathode is also plotted to show the uniqueness of the J–F plot for each different cathode.
planar cathode (2-nm TiO2 thickness); it is obtained from a set of several I–V characteristics measured at different values of d. For each of these two cathodes, a unique J–F plot has been obtained from the different I–V curves measured on the same cathode for different d; the uniqueness of the J–F plot for each cathode is a validation of the above conversion procedure.

Some of the characteristics related to this conversion procedure are as follows:

1. The main factor in the determination of the J–F curves is the distance d. An error on the distance d greater than 5% modifies the field distribution so much that a set of I–V measurements at different d can no longer be converted into a single J–F curve, as shown in Figure 18b.
2. An uncertainty of 5% on the radius R of the probe-ball induces an error of less than 1% on the field values and an error of less than 5% in the determination of the current density. Measurement of R with an optical microscope (ΔR/R < 2%) gives an adequate precision.
3. The increment of the emission voltage ΔV between two measurements must be as small as possible so that the variation of the field distribution over Si can be considered negligible. In our experimental procedure, we use a value of 1 V (or less) for the voltage increment, and the related variation of the field over a discrete area Si was then estimated to be less than 0.12 V/µm.
4. Finally, any current instability at a given voltage V during the I–V measurements is greatly amplified by the conversion into the J–F curve, where it introduces local singularities.
IV. PLANAR CATHODES: THEORETICAL APPROACHES AND EXPERIMENTAL RESULTS

Industrial applications of FE with a view to developing vacuum microelectronic devices were conceptually stated in the 1960s (Shoulders, 1961), but active investigation began only in the early 1970s with the introduction of integrated microtips by Spindt and co-workers (Spindt et al., 2001; Spindt et al., 1976). The Spindt microtips have overall dimensions in the range of several micrometers. Microfabrication is used to produce the integrated microtips via monolithic construction (Spindt, 1968); they are basically sharp molybdenum microtips in front of an extractor gate separated by 1 µm. The local field enhancement at the apex of each microtip with the emitter–gate separation of ∼1 µm enables FE at a threshold voltage on the order of 100 V. Subsequently, it allows the realization of arrays of such vacuum microtriodes with two major outcomes: the development of large-area addressable FE sources called field
emission arrays (FEAs) and the paradigm of planar cold cathodes (i.e., the FE electron sources were no longer limited to the micron-size apex of a single tip but could be a large-area surface in the range of several square centimeters). However, the microfabrication of Spindt FEAs is relatively expensive and requires complicated processes, and the FE stability of metallic tips under poor vacuum conditions is still uncontrollable. In seeking an appropriate material and technique as an alternative to the Spindt metallic microtips, a covalent surface that is chemically inert is favored in order to have electron emission properties and robustness much superior to metal tips. There are two main orientations in this new area of research. The first one is the use of carbon nanotubes, and the second is the setup of new thin-film cathodes with low or negative electron affinity.

With the first approach, the Mo microtip is replaced by a carbon nanotube (CNT) to take advantage of the success in deterministically growing CNTs in array form, with the capability of making them vertically aligned to the substrate surface (Teo et al., 2003). In association with the chemical inertness of the graphene surface, the high aspect ratio of the CNTs makes them privileged potential candidates for uniform-emission and high-toughness FEAs (Semet et al., 2002).

With the second approach, for a real planar cathode, a thin-film emitter is deposited on a flat conducting surface. It emits electrons when an electric field is applied by an anode separated from the film front surface by a vacuum gap. Because there is no corrugation or protrusion at the surface (i.e., the field enhancement factor is 1), FE from planar cathodes can occur only for an effective surface barrier that is low (of order 1 eV or less), resulting from a nanostructuring of the underlying surface by the thin-film deposition. In other words, planar cathodes yield broad-area electron emission for local field values in the range of 100 V/µm. These are much smaller than the fields (of the order of 5000 V/µm) at which FE occurs for metals. There are several possible structures for planar cathodes. In addition to carbon-based (diamond) films (Robertson, 1986; Wang et al., 1991; Geis et al., 1991; Xu et al., 1993; Shaw et al., 1996; Underwood et al., 1998), reported field-emitting planar cathodes include multiple layers of graded electron affinity (Shaw et al., 1996), a piezoelectric surface layer of InGaN/GaN (Underwood et al., 1998), recently manufactured composite ultrathin dielectric layers with thicknesses of order 2–5 nm (Binh and Adessi, 2000; Semet et al., 2004), and nanocrystalline films of LaS (Semet et al., 2006).

The work function Φ of a uniform surface of a conductor is defined (Herring and Nichols, 1949) as the difference between the electrostatic potential difference ϕi–o of an electron between the inside and the outside of the conductor and the chemical potential μ of the electrons inside the conductor,

Φ = ϕi–o − μ.     (39)
The chemical potential μ is a volume property that is independent of the structure of the surface. On the contrary, ϕi–o depends on the condition of the surface as well as on the structure of the interior underneath the surface. For example, any change in the dipole moment per unit area of the surface changes the electrostatic potential by 4π times that change. This is why the work function varies from one crystallographic facet orientation to another and why electronegative adsorbates such as oxygen on the surface usually increase the work function, whereas electropositive adsorbates such as barium or cesium decrease the work function. This last attribute is currently used in cathode ray tube (CRT) hot cathodes to lower the tungsten work function from 4.5 eV to ∼3 eV. The shortcomings of surface adsorption as a means to lower the work function are the stability in time and the reproducibility of the adsorbed film (i.e., of the cathode properties). Thus, intrinsic low work function materials, such as LaS (Semet et al., 2006), have been used as an alternative for obtaining planar cathodes, a case described later in the text. The other method to modify the electrostatic potential ϕi–o is to change the electrostatic structure of the interior by the addition of ultrathin films with thicknesses in the range of nanometers, a solution underlying the composite ultrathin dielectric layer planar cathodes (Binh and Adessi, 2000; Semet et al., 2004) described in the following text.

A. Ultrathin Dielectric Layer Planar Cathodes

The basic structure of the dielectric layer planar cathodes is a wide band-gap n-type ultrathin semiconductor (UTSC) layer, with thickness in the range of 2–5 nm, deposited on a metallic surface (Figure 19). Since the emission is controlled by the electrostatic structure of this layer, this class of cathode is called a solid-state field-controlled emission cathode (SSE cathode). We first describe a numerical implementation for the SSE to highlight the physical process that allows the electron emission from the SSE cathode and to compare the J versus F features. The experimental results and their analyses are presented after the theoretical aspects.

1. Theoretical Approach for the Basic SSE Structure

The model consists of:

1. A basic SSE structure with a UTSC layer in the range of 2- to 10-nm thickness deposited on a metallic surface, as presented in Figure 19. The material chosen for the UTSC is TiO2, with ε = 35, an electron affinity of 4.5 eV, a band gap of 3 eV, a doping level at −0.2 eV under the conduction band, and a layer thickness of 5 nm. The substrate is platinum, with the Fermi level
FIGURE 19. SSE planar cathode. (a) Schema of the geometrical structure. (b) Potential diagram for a 5-nm TiO2 ultrathin layer on platinum as substrate.
at 9.45 eV and a work function = 5.3 eV. Figure 19b shows the potential diagram for this SSE structure. 2. A Schottky junction between the metal and the UTSC with the conventional energy band relation (Sze, 1981). 3. A triangular representation of the vacuum barrier, including the image potential at the surface of the semiconductor (see Section II). 4. The equilibrium space charge distribution QSC inside the UTSC is calculated by a numerical integration of Poisson’s equation involving the
potential distribution V(z):

d²V(z)/dz² = −(e²/ε)[n(z) − n0 − p(z) + p0],     (40)

within a quasi-equilibrium condition, which means the assumption of a zero emission current approximation (ZECA) [n(z) is the conduction electron density, p(z) is the hole density, and n0 and p0 are the corresponding intrinsic carrier densities]. This implicitly assumes that the electrons are in thermal equilibrium among themselves, and ZECA is valid as long as the emission current J is small relative to the electron supply function (Jensen and Ganguly, 1994). The potential distribution V(z) is calculated by starting from the metallic boundary with an initial value corresponding to the Schottky barrier height and propagating it with a finite-difference method based on the second-order series development

V(z + h) = V(z) + h dV/dz + (h²/2) d²V/dz²,     (41)

until the UTSC surface with vacuum, where dV/dz is the field inside the UTSC (Jensen, 1993) and h is the discretization step; a minimal sketch of this propagation step is given after this list. The propagation is done by modifying the polarization of the UTSC in an iterative process until the field at its surface reaches F/ε.

5. The emission current density J is obtained by the resolution of the one-body Schrödinger equation using a Green's function formalism based on the numerical resolution of the self-consistent Lippmann–Schwinger (LS) equation as described in Section II. The calculations are 1D, and the charge densities for electrons in the conduction band, as well as for holes in the valence band, are assumed to be given by the Fermi–Dirac statistics. The two boundary conditions are the following: (1) at equilibrium and at the Schottky interface, the two Fermi levels are linked, and (2) the field at the UTSC surface with vacuum is F/ε, with ε the dielectric constant.
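The sketch announced in item 4 is given below (Python, our own illustration, not the authors' code). The right-hand side of Poisson's equation (40), i.e., the carrier statistics, is abstracted into a user-supplied callable, and the outer iteration on the polarization of the UTSC described above is not shown.

```python
def propagate_potential(V0, dVdz0, rhs, thickness, h=1e-11):
    """March V(z) through the UTSC layer with the second-order step of Eq. (41).
    V0 and dVdz0 are the boundary values at the metal interface (the Schottky
    barrier height and the initial slope); rhs(z, V) must return d2V/dz2 from
    Poisson's equation (40); h is the discretization step (in metres)."""
    z, V, dVdz = 0.0, V0, dVdz0
    while z < thickness:
        d2V = rhs(z, V)
        V += h * dVdz + 0.5 * h * h * d2V    # Eq. (41)
        dVdz += h * d2V                      # propagate the slope as well
        z += h
    return V, dVdz                           # potential and field at the surface
```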
The quantum transport through this 1D potential barrier is analyzed using a transmission coefficient, D(F, W), approach. Therefore, the current density between the electron reservoir (metal substrate) and the vacuum through the UTSC layer is given by (Duke, 1969)

J(F, T, EF) = (kT/2π²) ∫_CB^∞ D(F, W) ln[1 + exp(−(W − EF)/kT)] dW,     (42)

where k is the Boltzmann constant, T is the temperature, W is the part of the energy for the motion normal to the surface, EF is the Fermi level in the metal, and F is the externally applied field; the integral runs from the conduction band (CB) edge upward.
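A hedged numerical sketch of the integral in Eq. (42) is given below (energies in eV; the transmission coefficient is passed in as a callable, since in the text it comes from the Lippmann–Schwinger calculation described next; the constant prefactor of Eq. (42) is left out, so the result is J only up to that factor).

```python
import math

KB = 8.617333262e-5  # Boltzmann constant, eV/K

def softplus(x):
    """Numerically safe ln(1 + exp(x))."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))

def emission_integral(D, E_F, T, E_cb, E_max=None, n=2000):
    """Trapezoidal evaluation of the integral in Eq. (42) from the conduction
    band edge E_cb upward; D is a callable W -> D(F, W) with the field already
    folded in.  Returns the integral only (the prefactor is omitted)."""
    if E_max is None:
        E_max = E_F + 2.0        # the supply function is negligible far above E_F
    h = (E_max - E_cb) / n
    total = 0.0
    for i in range(n + 1):
        W = E_cb + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * D(W) * softplus(-(W - E_F) / (KB * T))
    return total * h
```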
D(F, W) is numerically calculated by means of the Lippmann–Schwinger (LS) self-consistent equation, which introduces a reference system having analytical solutions and a perturbation corresponding to the UTSC layer (see Section II). The reference system corresponds now to a 1D system constituted by a metal–metal polarized junction. The LS equation is

ψ(z) = ψ0(z) + ∫ dz′ G0(z, z′; W) × V(z′) × ψ(z′),     (43)

where ψ is the wave function of the electron in the whole system, and ψ0 and G0(z, z′; W) are, respectively, the wave function and the Green's function of the reference system. The solutions of the corresponding one-body Schrödinger equation are expressed by means of Airy functions. The main results are as follows.

1.1. Potential Distribution Across the System. At the application of a voltage V between the SSE cathode and the anode, electrons from the metal substrate cross the Schottky junction, but they are trapped inside the UTSC because the initial surface barrier is too high (of order 4 eV) to allow the electron emission (Figure 20a). At this beginning stage, slight band bending occurs near the metal–UTSC interface due to charge redistribution at the Schottky junction (Figure 20a). With the increase of the applied voltage, the injected electrons build up a space charge QSC in the UTSC layer with subsequent modification of the potential distribution inside the layer. QSC increases and reaches a limiting value that corresponds to the solution of Poisson's equation. For this value (−2.24 e/nm² for 5-nm TiO2) there is a strong band bending with the formation of a quantum well inside the UTSC layer underneath the surface. The electron injection with the subsequent formation of the quantum well (QW) defines step 1, as shown in Figure 20.

1.2. Emission Current versus the Applied Field. The evolution of the potential energy diagram during the formation of the quantum well, until QSC reaches the end value of −2.24 e/nm² (Figures 20a and b), can be replotted on the same figure with the potential energy at the metal Fermi level as energy reference, as shown in Figure 21. With the assumption that the electrons hit the surface barrier with energy around the Fermi level and lower, step 1 is clearly equivalent to an effective lowering of the surface barrier to a value of 0.86 eV above the metal Fermi level. The height of the surface barrier relative to the metal Fermi level is called the effective surface barrier Φ. Further increase of the applied potential does not significantly modify the band bending inside the UTSC. The structure of the quantum well remains essentially unmodified; however, the surface barrier undergoes the field-induced deformations with a lowering of its height and a narrowing of its width. Plots 1–3 in Figures 20
FIGURE 20. Evolutions of the potential energy diagrams of an SSE cathode for increasing applied voltage from (a) to (c). (a) The surface barrier is too high; the electrons injected from the metal through the reverse-bias Schottky junction are trapped in the UTSC. (b) Formation of a quantum well as a consequence of a buildup of a space charge QSC by the electron injection from the substrate; this defines step 1. (c) Electron emission from the quantum well when the applied field is increased; this defines step 2.
and 21 illustrate the deformation of the surface barrier for applied fields from 50 to 140 V/µm. This evolution constitutes step 2, which is characteristic of an electron emission process. The numerical simulations of the emitted current density J versus applied field F, as well as the corresponding TED, are calculated by assuming that the QW is filled up to the metal Fermi level. This simple approach certainly overestimates the emitted current, but the results shown in Figure 22 should provide a good insight into the specific features of the electron emission from
FIGURE 21. Evolutions of the potential energy diagrams of an SSE cathode for increasing applied voltage (i.e., field). Figures 20(a)–(c) are gathered on the same figure with the reference for the potential energy at the Fermi level of the substrate metal. Step 1 illustrates the quantum well formation. Step 2 shows the subsequent deformation of the surface barrier allowing the electron emission.
the SSE cathodes. The FN plot of the emission current variation (Figure 22a) is calculated for a Φ of 0.86 eV, which corresponds to the height of the surface barrier relative to the metal Fermi level after step 1. In this figure, the different emission processes are indicated; they are deduced from the position of the corresponding TED relative to the top of the barrier TB and to the metal Fermi level. Some of them are represented in Figure 22b. The detailed features of the electron emission from the SSE cathode are summarized below. They are specific to electron emission from a low-height surface barrier:

1. At very low field, as the surface barrier is not yet very deformed by the field (as indicated by plot 1 of Figures 20 and 21), electrons can be emitted only when their energy is higher than the top of the barrier (i.e., by a thermionic process; see Section II). TED 1 of Figure 22b is an example of a thermionic TED.
2. With increasing fields, the width of the barrier becomes narrower (as illustrated by plots 2 and 3 in Figures 20 and 21). This allows tunneling of the electrons through the top of the barrier, in concordance with the former TE, defining a T-F emission process. In TED 2 (Figure 22b), for example, a small number of the electrons begin to tunnel through the barrier (they have energy less than TB 2). This increases
FIGURE 22. (a) J–F characteristics for a 5-nm TiO2 SSE cathode at 300 K. The arrows numbered 1 to 6 correspond to the J–F values of the TED 1 to 6 in (b), respectively. (b) TED evolutions of the electrons emitted from an SSE cathode at 300 K. TB 1 to 6 correspond to the energy levels of the top of the barrier, respectively, for 0.008, 0.012, 0.019, 0.034, 0.049, and 0.071 V/Å.
from TED 2 to TED 5, but both mechanisms—TE and tunneling—are still present until the ballistic emission process starts for F > 0.05 V/Å. 3. The lowering of the surface barrier height with increasing fields, concomitant to its narrowing, brings on a direct evolution from a T-F process to a field-induced ballistic process when F > Fbal = 0.05 V/Å, without transition through the FE process. This occurs because of the low initial value of the barrier height, which is less than 1 eV. Of note, TED 5 (Figure 22b) still has an important thermionic part (electrons with energy higher than TB 5) even when the top of the barrier crosses the metal Fermi
level (i.e., shifting to field-induced ballistic emission, as illustrated by TED 6 on the same figure).
4. The corresponding TED of the emitted electrons, before the ballistic emission, is specific to a T-F emission. This means a TED profile without a sharp edge at the high-energy side as in Figure 6; the peaks instead look like the TED of a Schottky cathode (see Figure 8, with both lower and upper energy tails). They result from the concomitant contribution of electrons passing over the barrier, which is no longer negligible, and of tunneling electrons, as shown in Figure 22b.
5. Another specific feature, correlated to a low surface barrier and a T-F emission, is the noticeable shift of the TED peak toward low energy with increasing voltage (i.e., increasing field), as illustrated in Figure 22b. This behavior pattern is not observable for FE when the cathode work function is in the range of 4–5 eV (Figure 6), but it is evidence for T-F emission, as shown for the Schottky cathode in Figure 8.

In conclusion, the TED features of an SSE cathode, as well as its emission characteristics at room temperature, are specific to a T-F emission.

1.3. The Band Bending versus the UTSC Layer Thickness. The main parameter that allows the understanding of the emission behavior of the SSE is the important band bending observed inside the UTSC layer. This band bending is related to the equilibrium charge QSC that can be injected into the UTSC layer from the metal substrate. The value of QSC depends on the thickness of the UTSC layer (as shown in Figure 23)—it is higher for smaller thickness. Subsequently, the effective surface barrier Φ is thickness dependent; it decreases when the space charge density increases. Its variation is also represented in Figure 23. Due to the assumptions used to perform the numerical simulations, the absolute theoretical values for QSC and Φ as a function of the thickness should not be expected to match exactly the real experimental data. However, their evolutions with the UTSC layer thickness denote the overall tendency, and we can state that electron emission is obtained more easily as the thickness decreases. Moreover, we can define two limits for the UTSC layer thickness. As shown in Figure 23, the lowering of the effective surface barrier Φ to values less than 2 eV (similar to a low electron affinity cathode) can be obtained only for thicknesses of the UTSC layer smaller than 6 or 7 nm. This last value thus defines the upper limit of the UTSC thickness for SSE operation. This means that for thicknesses greater than 10 nm, the electron emission enhancement—the SSE effect—is not perceptible. Conversely, the lower limit of the thickness is determined by the requirement to keep QSC inside the UTSC layer. This lower limit for SSE operation is then given for Φ negative,
FIGURE 23. Variations of the space charge density in the UTSC and of the effective surface barrier height as a function of the UTSC thickness.
which is similar to a negative electron affinity behavior, because then no significant increase in the concentration of electrons in the UTSC layer can happen.

In conclusion, if the correct values of the material properties used in the experiments are not known with precision, particularly the exact value of the donor level (which is related to the presence of impurities and defects), then discrepancies between experimental results and theoretical predictions can result. This means that the exact values of the upper and lower limits for the thickness of the UTSC for SSE operation will change depending on the UTSC layer properties. However, the general behavior is maintained: the thickness of the UTSC layer should remain in the range of a few nanometers in order to observe an electron emission enhancement, and the smaller the thickness, the greater the enhancement of the electron emission.

2. Theoretical Approach for the Composite SSE Structure

This section presents the electron emission from a composite ultrathin dielectric multilayer planar cathode, that is, a nanostructured SSE cold cathode. By introducing the multilayer concept (Semet et al., 2004), and compared to the single-layer SSE cathode previously mentioned, more parameters become available for the control of the SSE process, and in particular for the injection of the space charge QSC. These are the bulk interfacial barriers in addition to the surface barrier, and the presence of quantized sub-bands E1
FIGURE 24. Composite multilayer SSE cold cathode. (a) Schematic structure of the different layers of the cathode within the SAFEM environment. (b) Band-edge diagram in the absence of an external electric field (Vapp = 0); Vb1 and Vb2 are respectively the first and second barriers; E1, E2 are the energy levels of sub-bands inside the quantum well.
and E2 due to the confined thickness of the outermost GaN layer. The actual structure of the cathode is shown in Figure 24. We first explain why the introduction of a QW with the GaN layer can effectively increase the tunneling, but the subsequent electron emission enhancement is not strong enough to explain the SSE cathode features. Figure 24b shows two QW states E1 and E2 at a bias voltage V = 0. With the application of a bias at V = V1 , the state E1 is aligned with the electrons in the contact at energy 0 < E < EF , resulting in a resonant tunneling (RT) current, JRT (Figure 25). Basically RT serves as: (1) a matching section for higher transmission, and (2) raising the states from the bottom of the conduction band due to quantum confinement so that the tunneling distance is reduced. RT results in a lowering of the effective work function given by ∼(E1 + [V (w) − V (0)]/2), when w is the width of the QW. However, in such a mechanism, the FE surface barrier is still the original barrier unchanged, and it cannot account for very low values of the surface barrier ΦFN for the nanostructured SSE. To account for a specifically low value of ΦFN of the nanostructured SSE, the model for the electron emission is obtained through a serial two-step mechanism under applied field, similar to the upper one for the single-layer SSE. In a first step, under the polarization, electrons are injected in the GaN layer from the cathode substrate by tunneling through the Al0.5 Ga0.5 N 2-nm layer. They will occupy the sub-bands that are below the Fermi level, creating a concentration of electrons inside the GaN layer. Due to this electron concentration QSC (space charge) formation, there is an upward energy shift, which leads to a relative lowering of the vacuum level compared to the Fermi level of the substrate (Figure 25b). The concept is as follows: when this 2D-like quantum state is occupied, it results in a space charge in the QW, leading to additional lowering of the effective work function defined by the energy of the source of electron to the vacuum level.
FIGURE 25. Illustration of the different field emission mechanisms by schematic band-edge diagrams of the nanostructured SSE planar cathode with an applied field F and at room temperature. (a) Only resonant tunneling mechanism; (b) with space charge formation inside the GaN layer with, as a consequence, an effective lowering of the surface barrier. In addition to the resonant tunneling and due to the occupation of the quantum state E1, electrons occupying this state (for example, whenever the level E1 moves below 0) can tunnel out of this single barrier via the usual F-N tunneling, resulting in JSC, and the total current JFN = (JRT + JSC). (Notes: (i) to be able to fit these band diagrams within this figure, the field representation is not at the same scale inside the cathode and outside in vacuum; in particular, if one considers GaN having ε = 8ε0 with an applied field in the range of 50 V/µm; (ii) further, reduction from the induced image charges due to the space charge in the quantum well is not shown in this sketch.)
To estimate the lowering of the effective work function due to the QSC inside the 2D QW, we can use a simple approach based on the potential induced by the presence of QSC. To calculate this effect, we begin with the 2D-like density of states:

n(E) = m*/(πħ²),     (44)

where m* is the effective mass of the electron with charge e. In the QW of width w, assuming a perfect confinement, the charge density in the lowest level E1 is

eΨ*Ψ = e (2/w) sin²(πz/w),     (45)

with Ψ the wave function, so that the charge density inside the QW is

n(E) = e [m*/(πħ²)] (2/w) sin²(πz/w) (E2 − E1).     (46)
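For orientation, the magnitudes involved in Eqs. (44)–(46) can be sketched numerically as follows (our own illustration; the effective mass and the sub-band values used in the example are the GaN numbers quoted elsewhere in this section):

```python
import math

M0   = 9.1093837015e-31   # free electron mass, kg
HBAR = 1.054571817e-34    # reduced Planck constant, J s
E_CH = 1.602176634e-19    # elementary charge, C (also J per eV)

def dos_2d(m_eff_ratio):
    """2D density of states m*/(pi*hbar^2) of Eq. (44), per eV per m^2."""
    return (m_eff_ratio * M0) / (math.pi * HBAR ** 2) * E_CH

def qw_charge_density(z, w, m_eff_ratio, sub_band_gap_eV):
    """Charge density (C/m^3) inside the QW from Eq. (46):
    e * n2D * (2/w) * sin^2(pi*z/w) * (E2 - E1)."""
    n2d = dos_2d(m_eff_ratio)
    return E_CH * n2d * (2.0 / w) * math.sin(math.pi * z / w) ** 2 * sub_band_gap_eV

# GaN example: m* = 0.22 m0, w = 4 nm, E2 - E1 ~ 0.21 eV (from the E1, E2 values
# of Wang et al., 2005, quoted later in the text)
print(dos_2d(0.22))                               # ~9e17 states/(eV m^2)
print(qw_charge_density(2e-9, 4e-9, 0.22, 0.21))  # peak density at z = w/2
```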
Solving the Poisson’s equation ∇ 2 (VSC ) = −n/ε (ε is the dielectric constant of GaN), we arrive at the potential energy of the space charge VSC ,
for 0 ≤ z ≤ w,

VSC = (ρ0/4ε) [(w/π)² sin²(πz/w) + wz − z²],     (47)

where

ρ0 ≡ e [m*/(πħ²)] (2/w) (E2 − E1).     (48)

Although the expression in Eq. (47) is for the case of V(0) = V(w) and we used the sine function instead of the Airy function for the probability density Ψ*Ψ, we assume that, except at very high field, this approach is more than adequate. The maximum value of VSC is at z = w/2, or

VSC[w/2] = 0.25 (ρ0/ε) w² × (π⁻² + 0.25),     (49)

and the average is

⟨VSC⟩ ≡ VSC[av] = 0.62 × VSC[w/2].     (50)

Taking the average of the difference [V(w) − V(0)], the total lowering of the work function ΔΦ is

ΔΦ ≡ Vb1 − Φeff = VSC[av] + 0.5 × [V(w) − V(0)] + E1.     (51)

The effective barrier Φeff is the actual barrier at the surface after the lowering ΔΦ. For an estimation of ΔΦ we have evaluated numerically the right-hand side of Eq. (51). Taking m* = 0.22m0 and ε = 8ε0 for GaN, and a value of VSC[w/2] = 0.4 eV, yields VSC[av] = 0.25 eV and 0.5 × [V(w) − V(0)] = 0.62 eV; taking E1 = 0.18 eV, this gives ΔΦ = 1.05 eV due to the space charge in the QW, and therefore a value of Φeff = 0.45 eV when Vb1 is 1.5 eV. We conclude that after the occupation of the quantum level for the electron in the state E1 lying below E = 0, the current JFN = JRT + JSC is given by the tunneling through a single barrier created by the vacuum, with an effective barrier of only a few tenths of an electron volt (Figure 25b). This lowered second barrier at the surface controls the variation of the emitted current JFN with field.

Recently a self-consistent quantum calculation of the FE current by a quantum transfer matrix (TM) method (Wang et al., 2005) based on an analytical solution of the Schrödinger equation with a linear potential (with the solution expressed as a linear combination of the Airy function or other wave functions) has been developed to investigate the structural enhancement mechanism of FE from multilayer semiconductor films. This calculation has shown for an applied field of 0.5 V/Å the presence of three quantum energy
levels (E1 = 0.071 eV, E2 = 0.282 eV, and E3 = 0.617 eV). Since the quantum energy levels are self-consistently calculated, the levels E1 and E2 should be the occupied states corresponding to the accumulation of electrons in the QW. The presence of this space charge induces a band bending inside the GaN layer, leading to a first barrier height of 0.66 eV and a second barrier height at the surface of 0.43 eV. The energy band diagram showing the band bending with the different quantum energy levels is shown in Figure 26. The energy level E3 = 0.617 eV is above the top of the effective surface barrier, and thus it must play a role in the direct emission by a thermionic process of electrons directly from the substrate reservoir n-GaN (illustrated in Figure 25 with the arrow on top of the first barrier). The self-consistent quantum calculation does confirm the data estimated by Eq. (51); it also gives the exact values for the discrete quantum energy levels inside the QW and the possibility to follow the evolution of the dual-barrier structure for increasing applied voltage between the substrate n-GaN and the GaN surface (Figure 26b). It indicated that the different quantum levels are not affected by the applied voltage; only the surface barrier is modified with subsequent increase of the electron emission current with the field. Experimental studies of the SSE cathodes have been done with a single layer of TiO2 on platinum with thickness between 2 and 5 nm and also with a composite layer of undoped GaN (4 nm) on n-Al0.5 Ga0.5 N (2 nm). B. Experimental Results for SSE Cathodes: TiO2 -SSE Cathodes With the TiO2 -SSE cathodes the objectives were to confirm experimentally the following main features that are demonstrated theoretically in the above paragraph: 1. The existence of a strong enhancement of the electron emission due to the presence of an ultrathin layer of TiO2 . This characteristic should be expressed by a threshold field value for electron emission much smaller than 5000 V/µm, which is the value observed for metal surfaces. 2. The electron emission enhancement should be dependent on the UTSC layer thickness. 3. The TED of the emitted electrons must be specific to a T-F mechanism— a peak without a sharp edge pinned at the Fermi level and a large shift of the TED peak toward the low-energy side for increasing applied fields. 1. Deposition of TiO2 Ultrathin Layer on Platinum Substrate The flat cathodes are obtained from a conventional thin-film deposition sputtering technique (Binh et al., 2000), with two independent magnetron
FIGURE 26. (a) Energy band diagram of the n-GaN/Ga0.5Al0.5N–GaN/vacuum FE structure; d1 = 2 nm and d2 = 4 nm, and the vacuum gap is 10 nm. (b) Evolution of the energy band diagram for increasing applied voltages between n-GaN and the surface of GaN. The three quantum energy levels remain unchanged. (From Wang et al., 2005.)
sputtering sources inside an oil-free chamber with a base vacuum better than 10−9 torr. A Kaufmann Ar+ ion source is used to clean the surfaces before the deposition. The substrate is a commercial polished silicon wafer; it is used to have a reference flat surface. The metal layer is platinum and has a thickness of 100 nm. The UTSC layer is TiO2 ; it is deposited immediately after the
FIGURE 27. AFM observation of the deposited TiO2 layer (left-hand side) and HRTEM observation (right-hand side) of a 3-nm amorphous TiO2 layer on a columnar crystalline platinum layer with a monoatomic resolution interface.
platinum layer in the same run (i.e., without modifying the vacuum condition). This procedure is used to provide a good interface between the platinum and the 1- to 5-nm thickness UTSC layers. Rutherford backscattering analysis of the deposited TiO2 films shows a good stoichiometry. Defects resulting from the sputtered deposition of TiO2 are known to induce donor states at a few tenths of an electron volt under the conduction band, giving to the UTSC layer the quality of an n-type wide-bandgap semiconductor. The geometrical characteristics of the ultrathin layer of TiO2 are controlled by atomic force microscopy (AFM) and by high-resolution transmission electron microscopy (HRTEM). They indicated columnar nanocrystallites for the platinum layer, a monoatomic interface between platinum with TiO2 , a uniform-thickness layer of amorphous TiO2 , and an average root-mean-square variation of the TiO2 surface roughness, measured over several 1-µm2 areas across the entire sample, to be below 2 nm (Figure 27). A sol–gel technology also can be used for the TiO2 deposition with thickness in the range of 5 nm (Binh et al., 2000). The precursor solution was obtained by mixing titanium isopropoxide (Ti[OCH(CH3 )2 ]4 ) and propanol-2 [(CH3 )2 CHOH]. Glacial acetic acid (AcOH) was then added with an AcOH titanium molar ratio of 6. The AcOH, which acts as a stabilizer by complexing the titanium alkoxide, is necessary to prevent precipitation of TiO2 and to maintain a stable precursor solution before coating. The water for hydrolysis comes from the esterification reaction, which occurs between an excess of AcOH and alcohol. A transparent stable solution was obtained. Methanol was then used to adjust the viscosity of the solution, which controls the thickness of the deposited layer (Bahtat et al., 1996). The TiO2 film is deposited on the
metal base cathode by dipping (at a constant and controlled speed of 4 cm/min under a controlled atmosphere) and then first dried at a temperature of 100°C. Finally, the sample is annealed under an infrared lamp at a temperature of 350°C for 15 minutes to stabilize, densify, crystallize, and harden the ultrathin film. The thickness of such a layer (Plenet et al., 2000) has been measured; it was in the range of 5 nm.

2. SAFEM Field Emission Measurements

2.1. Electron Emission Enhancement over the TiO2 Layer. To be certain about the role of the TiO2 layer, a differential measurement has been done. On the platinum substrate only a strip of TiO2 is deposited; its width is 100 µm over a 5-mm length, and the thickness is 5 nm. A SAFEM scan with the probe-ball at constant height was then performed across that strip to measure, with the same applied voltage, the local FE current outside and over the TiO2 strip. The measurements were done in a vacuum of 10⁻⁸ torr and with no prior cleaning of the surface of the cathode after the introduction of the sample into the analysis vacuum chamber. The result is presented in Figure 28, which indicates without any doubt the existence of an electron emission enhancement over the TiO2 strip: the electron emission was confined to the part of the surface having the deposited TiO2 layer, with an almost uniform emission over this region if a correction due to the edge effect is taken into account. This differential measurement removes the possibility of field emission from the platinum surface at some corrugations resulting from its columnar nanostructure, even if this possibility already seems very unlikely from the dimensions of the corrugations given by the AFM and HRTEM observations. Such a differential observation indicates a strong electron emission
FIGURE 28. Mapping of the field emission current at constant height and voltage across a TiO2 strip.
enhancement due to the presence of the deposited UTSC layer. Therefore, it experimentally validates the concept of a strong lowering of the surface barrier by the ultrathin dielectric layer. 2.2. J–F Characteristics of the SSE Cathode. The total emission current I versus applied voltage V measurements were done in a SAFEM by using a spherical probe-ball, ∼250 µm in diameter, located at a distance z over the SSE plane surface with the methodology described in Section III. All experimental measurements were done in a vacuum ∼10−8 torr without preliminary baking of the chamber. The measured total current–voltage I–V characteristics were converted into current density-local field J–F characteristics. A characteristic variation of the total emission current from the SSE cathode in function of the applied voltage (I–V) is shown in Figure 29a for a given distance (∼7 µm) of the probe-ball in front of the SSE cathode. The thickness of the TiO2 layer is 5 nm. From a set of (I–V) obtained for different probe-ball–cathode distances (from 5–10 µm), a unique J–F plot can be extracted resulting from the gathering of all the measurements. Figure 29b is the J–F plot extracted from Figure 29a. This plot indicates a threshold field of order 100 V/µm, a value that must be put in perspective with the 3000 V/µm for a metallic surface cathode such as clean platinum. This is strong confirmation that the presence of the TiO2 layer drastically reduces the threshold field for an electron emission. It is the quantification of the phenomenon qualitatively presented in the previous paragraph with the differential measurement. A comparison of the experimental data with the theoretical calculations in Section II points out the following specific features: 1. The F-N plot of the I–V characteristic is not a straight line, as shown in Figure 30. It indicates a saturation effect, which is a specific feature of a T-F emission. Therefore, this means that the effective surface barrier must be of order less than 1 eV, low enough to allow a T-F emission at 300 K. 2. The current densities of the SSE cathode are much smaller than the theoretical values. This difference in the values could be understood for the following two reasons. First, in the theoretical calculations the electron density inside the cathode is given by the metal Fermi–Dirac distribution. It is not the case here with the SSE cathode with the TiO2 layer. Second, the electrons that are emitted by the SSE cathode come from the QW created by the space charge formation. Subsequently, the supply function is not the same as for a bulk metal and could be much smaller. These two reasons explain the small values of the SSE current densities compared to theoretical calculations using metal-like material. This difference precludes a possible straightforward and precise extraction
FIGURE 29. (a) I–V characteristics for a given gap between the probe-ball and the SSE cathode. (b) Corresponding J–F characteristics.
of the surface barrier from experimental data. However, a rough estimation is still feasible and yields a value of order less than 1 eV for the effective surface barrier height. 3. A specific feature of the emission from the SSE cathode that was not anticipated by the theoretical calculations is the explosive behavior that occurs for high emission currents. This is indicated by an arrow in Figure 30, and it is characterized by a very steep increase of the current with the voltage. The control of this very rapid current increase is not
FIGURE 30. F-N plot of the I–V characteristics of Figure 29a.
easy; however, the SSE cathode is not destroyed if the current is kept below a value of the order of a few milliamperes. Otherwise, the cathode is destroyed by the high emission current. The possibility of maintaining reversible behavior after a controlled incursion into the explosive emission is proof that this phenomenon is not the formation by surface diffusion of nanocorrugations on the cathode surface due to thermal runaway. To account for this explosive behavior, we introduce the hypothesis of an increase of the local temperature due to the emission current, but without geometrical deformation by surface diffusion. Calculations show that the effect of the temperature on the emission from an SSE cathode affects not only the Fermi–Dirac distribution but also the space charge value QSC in the UTSC layer. Therefore, the calculations of the current increase due to the current-induced temperature increase must consider these two effects. To estimate the current increase, we have taken an arbitrary function to relate the temperature increase induced by the current; the result is shown in Figure 31. The calculations indicate a very sensitive dependence of the current with the temperature. In Figure 31, the current is increased by a factor of 1000 when the temperature shifts from 300 K to 320 K. The main cause of such a steep increase is the lowering of the effective surface barrier consequence of the space charge redistribution with temperature. To corroborate the theoretical estimation of the very strong increase of the current with temperature, the experiment has been done with a 3-nm TiO2 SSE cathode (plotted in Figure 32). The results show an increase of the current by a factor on the
FIGURE 31. Emission current increase with temperature.

FIGURE 32. J–F characteristics of an SSE cathode for two different temperatures.
order of 100 when the temperature of the SSE cathode increases by 42 K from room temperature.
2.3. Electron Emission for Different TiO2 Layer Thicknesses. The second point that must be assessed is the variation of the electron emission enhancement as a function of the TiO2 thickness. Figure 33 shows the collected measurements for four values of TiO2 thickness: 1, 2, 3, and 5 nm. The threshold field for electron emission is of order 90 V/µm for the 5-nm thickness; it decreases to ∼30 V/µm when the TiO2 thickness is 1 nm. This figure demonstrates that the experimental measurements do corroborate the theoretical calculations, which indicated a higher electron emission enhancement for smaller thicknesses of the UTSC layer. These values must also be put in parallel with the threshold value for metallic cathodes (of order 3000 V/µm), a comparison that provides a good perception of the enhancement factor and of the lowering of the effective surface barrier.

2.4. TED of the Emitted Electrons. The next feature to assess for an SSE cathode is the TED of the emitted electrons. For this study, the platinum and UTSC layers were deposited by sputtering on the apex of a W ⟨111⟩ tip with a nominal thickness of 10 nm for the platinum layer and 5 nm for TiO2 (Binh, 2001). The choice of a tip geometry substrate permitted the analysis within a field electron emission spectroscopy (FEES) environment (Binh et al., 2001), that is, direct measurements of the emitted electrons without the presence of an intermediate extraction grid that can create artifacts.
FIGURE 33. J–F characteristics for SSE cathode with four different values of the TiO2 thickness.
The energy distribution measurements were performed in a vacuum of 5 × 10−9 torr, through a 1-mm diameter probe hole, and the temperatures of the tips were controlled via a calibrated Joule heating loop with a precision of 10–20◦ . Using a tip geometry substrate means that the thickness of the layers were not known with the same precision as for a deposition on a flat surface. Also the energy distribution measurements cannot be obtained for very low current densities because the measurements of the spectra required a minimum current I ≈ 0.1 pA coming from the probed zone of the emission area, which has only an actual diameter in the order of 10 nm due to the magnification induced by the radial emission of the electron beams. Figure 34 shows a characteristic example of the overall behavior of the energy distributions of the electrons emitted from SSE cathodes and its evolution for increasing the applied voltage V , from 780 V to 1050 V, which resulted in a variation of the probe emission current from 1 pA to 780 pA, respectively. From the experimental results, the following properties can be demonstrated: 1. The SSE energy distribution showed one peak with an FWHM value in the range of 0.4 eV. 2. The peak profile does not present a sharp edge at its high-energy side. 3. A conservation of the normalized peak profile, in particular for the lowenergy side of the peak, even for an increase of V by a factor of 1.4. This again indicates behavior different from the conventional tunneling process through a triangular barrier for which the slope at the low-energy side of the peak is controlled by the field (i.e., proportional to 1/V ). 4. For increasing V , the peak shifted to lower energies with larger shifts at higher voltages. Figure 34b shows the values of the peak shifts E for different measurements. The plot E versus V showed a nonlinearity of the peak shifts. Such results mean that the peak shifts cannot be related to a field penetration into the UTSC layer, because this latter case implies a linear shift in function of V . These different specific features of the experimental TED very much favor a T-F process as the main mechanism for electron emission from the SSE cathode. It echoes what is predicted by the theoretical analysis. 2.5. Stability in Time of the Emission Current. Most of the specific properties described in the previous list can be understood if the drastic band bending resulting from the electronic injection from the Schottky junction is considered. Moreover, the covalent nature of the UTSC surface associated with the low effective surface barrier must relegate to a minor role the adsorption from the vacuum environment for the current stability. To verify this feature, the stability in time of the emission current from a 5-nm TiO2
FIGURE 34. (a) Evolution of the TED spectra of electrons emitted from an SSE cathode. The data sets indicate the applied voltages and the probe currents associated with each TED. (b) Shift toward the low-energy side for increasing applied voltage of the TED in (a).
SSE cathode was observed for different vacuum conditions (Binh et al., 2001). The results are plotted in Figure 35 for three different vacuum values, and the total emission currents I are recorded for fixed applied voltages. Under ultrahigh vacuum (5 × 10−10 torr) condition the stability can last for tens of hours, but Figure 35a shows only the data for the first two hours. Such stability was also observed for vacuum in the range of 10−8 torr (Figure 35b). Under poor vacuum condition, the stability is maintained but the emission
FIGURE 35. Stability in time of a 5-nm TiO2 SSE cathode. The total currents are measured for fixed applied voltages.
current became noisier. Figure 35c is a characteristic plot of such a current stability obtained for a working pressure in the range of 10−6 torr up to a few 10−4 torr of oxygen. We ascribe the rapid fluctuations of the currents (i.e., the noise), observed at poor vacuum conditions (pressure higher than 10−6 torr of oxygen), mainly to the formation of oxygen ions between the probe anode and the SSE cathode with subsequent modification of the cathode surface due to ion sputtering. This explains why the lifetime of the SSE cathode under such poor vacuum condition was limited to a few hours of continuous emission. The upper limit to poor vacuum working conditions (pressure higher than 10−4 torr) for SSE cathodes is the formation of highdensity ionization charges (plasma-like) in front of the cathode that will inhibit the electron emission from the surface. This SSE cathode emission behavior in stability contrasts strongly with the conventional FE from metallic tips. C. Experimental Results for SSE Cathodes: Composite-Layer Nanostructured SSE For the composite-layer SSE represented in Figure 24a (Semet et al., 2004), the different ultrathin layers were deposited on n-SiC using low-pressure
metalorganic chemical vapor deposition (LP-MOCVD). Trimethylaluminum (TMA), trimethylgallium (TMG), silane, and NH3 were used as the precursors. First, a 0.15 µm-thick Si-doped AlGaN layer with aluminum content graded from 40% to 15% was deposited on the SiC substrate. It served as the conducting buffer layer. This was followed by 0.25-µm n-GaN, 2-nm n-Al0.5 Ga0.5 N, and 4-nm undoped GaN. The Si-doping level in all the doped layers was ∼2 × 1018 /cm3 , and the finished surface was characterized as smooth by AFM. The FE measurements were done at different locations of this planar cathode; the emission characteristics presented in the following list are common to all these measurements and can be considered to be specific to the composite-layer nanostructured SSE. 1. The J–F characteristics, for a given temperature, presented the following features: a. The threshold field for J = 10−5 A/cm2 is in the range of 50– 100 V/µm at room temperature (Figure 36a). This value is approximately 100 to 50 times less than the conventional field needed for FE from a metallic surface with a work function of 4 eV. It means that the effective surface barrier height should be less than 1 eV. b. The ln(J /F 2 ) versus 1/F plots of the data are straight lines—the increase of the FE current can be described using a tunneling model (Figure 36b). Given this assumption, the effective surface barrier height ΦFN can be calculated from Eq. (19), which yields a value of 0.58 eV. Measurements at different points of the planar cathode
FIGURE 36. J–F characteristics of a composite nanostructured SSE (2-nm n-Al0.5Ga0.5N/4-nm GaN). The F-N plot is a straight line.
always give a straight line for the F-N plots with a value of ΦFN in the range of 0.3–0.6 eV. c. There is always a very abrupt and sharp increase of the emitted current in the high-field region and for J ≥ 0.3 A/cm2 . This sharp increase ends, in most cases, with the destruction of the surface cathode due to an arc formation between the cathode and the probe-ball anode. 2. Emission currents are very unstable at the beginning of the emission process. These instabilities always slowly vanish after a seasoning period of more than 30 minutes of continuous emission of the cathode at high current. After the seasoning period, the emission currents become very stable. We observed these instabilities for all cathode temperatures up to 500 K. They thus seem to be intrinsic to the emission mechanism and are not the result of a surface adsorption–desorption process that modifies the emission. 3. Over a given area, noticeable current increases with temperature are observed for temperature increase from room temperature to 500 K. However, for each temperature, the plots of ln(J /F 2 ) versus 1/F remain straight lines as mentioned previously. In this study, two determinations of the experimental J–F characteristics have been extracted. The first one, done for a given T , determines the values of the effective tunneling barrier ΦFN from the slope of the ln(J /F 2 ) versus 1/F plots, in the range where the plots are straight lines. The second analysis of the data is the determination of the effective activation energy Q, obtained from the slope of ln(J /T 2 ) versus 1/T plots, at different points of the cathode and for a given value of F . The values of this effective activation energy obtained are in the range of 0.85–0.90 eV and are not field dependent. The values of the tunneling barrier height ΦFN are of order 0.5 eV, a value comparable with the 1.5-eV value of electron affinity of GaN. This value is much smaller than the thermal activation energy Q (0.8–0.9 eV) obtained from the temperature dependence emission, which is very near the height of the barrier between the buffer layer and Al0.5 Ga0.5 N layer. We now discuss these experimental data to propose an electron emission mechanism from the nanostructured SSE cathode surface. The basic interpretation is based on the concept that the electron emission from a solid surface is the result of only two main mechanisms. The first is the tunneling process through a field-deformed surface barrier leading to measurable emission current when the width of the barrier is less than 1 nm. The second mechanism is the thermal process over the surface barrier, when the energy of the electrons is higher than the barrier. The specific low value of the surface barrier of the nanostructured SSE (of order 0.5 eV) confirms the model of the electron emission through a
serial two-step mechanism under applied field, described previously, and summarized as follows:

• In a first step, under the polarization, electrons are injected into the GaN layer from the cathode substrate by tunneling through the 2-nm Al0.5Ga0.5N layer. They will occupy the sub-bands below the Fermi level, creating a concentration of electrons inside the GaN layer. Due to this electron concentration, or space-charge formation, there is an energy shift, which leads to a relative lowering of the vacuum level compared with the Fermi level of the substrate.

• In the second step, there are two concomitant mechanisms for the electron emission in our model:

◦ The first mechanism is tunneling through a low effective surface barrier. The electrons are emitted by an FE mechanism from the quantized sub-bands inside the GaN QW. Compared to the single-layer TiO2-SSE, and for an effective surface barrier height of the same value, the J–F characteristics of the nanostructured SSE do not show the saturation effect specific to a T-F emission process (as shown in Figure 30 for the TiO2-SSE). We attribute the difference between these two behaviors to the presence of sub-bands in the GaN layer, which prevent the electrons from having energy higher than the top of the barrier (i.e., from contributing a concomitant thermionic process).

◦ The second mechanism occurs at elevated temperatures (kB T > 0.8 eV), when hot electrons can jump over the first barrier located between the conductive substrate and the Al0.5Ga0.5N ultrathin layer. Because the second barrier at the surface is lower (<0.5 eV due to space charge), these electrons are emitted directly through the quantum level E3, giving JTH. This first barrier controls the variation of the emitted current JTH with temperature (Figure 25b).

In this dual-barrier model, the measured total emission current J will be the sum of both contributions, J = JFN + JTH, a model that is specific to the composite nanostructured SSE.

D. Intrinsic Low Work Function Material Lanthanum Sulfide

In rare-earth monosulfides, such as LaS, the rock salt structure with a surface covalent bonding offers an option for achieving stable low or negative electron affinity cathode materials when deposits are made on the surface [e.g., on various III–V semiconductor surfaces (Mumford and Cahay, 1996)]. Calculations indicate that some orientations can have a work function less than 1 eV; for example, they give a value of 0.9 eV for the ⟨100⟩ direction (Eriksson et
al., 1998; Samsonov, 1965). Two other important features of the rock salt form of rare-earth monosulfides are their relatively high melting temperature (>2000 °C) and their fairly low electrical resistivity (a few tens of micro-ohm-centimeters).

1. LaS Thin-Film Planar Cathode Fabrication

The pulsed laser deposition (PLD) technique was used to deposit LaS films on silicon wafers. The values of the deposition parameters (chamber pressure, substrate temperature, substrate-to-target separation, laser energy, repetition rate, and spot size on the target) leading to a successful growth of films in their cubic rock salt structure were identified (Cahay et al., 2006). The resulting films, with a thickness in the range of 1 µm, have a sheet resistance of ∼0.1 Ohm/square. X-ray diffraction analysis of these films shows a lattice constant of 5.863 Å, which is close to the bulk LaS value (Fairchild et al., 2005). The root-mean-square variation of the film surface roughness measured over a 1-µm² area was found to be 1.74 nm by AFM. HRTEM revealed the films to consist of nanocrystalline regions (Figure 37). The emission characteristics were analyzed in a SAFEM (Semet et al., 2006), showing that the LaS planar cathode is a good electron source with the following specific features.

2. FE Behavior for Total Currents up to a Threshold Value of a Few Microamperes

A characteristic J–F variation is shown in Figure 38 and is typical of conventional FE behavior: linear variation of ln(J/F²) versus 1/F except in the high-field region. The J–F variations are reversible as long as the total FE current is lower than a threshold value IBO of order 1 µA. A straightforward comparison between the experimental data and the theoretical results presented in Section II shows a great discrepancy in the absolute value of the current densities. However, if the experimental current densities are multiplied by a factor of 1000, the corrected experimental data corroborate the theoretical prediction for a metallic cathode with a work function of 1.05 eV (Figure 39). The same correspondence, with the experimental data corrected by the same factor of 1000, is also found for the current emission at 490 K. These two correlations therefore allow us to state that the LaS planar cathode has a work function of 1.05 eV, but only over an area covering 1/1000 of the total surface.

3. Burnout Behavior

For values of the emitting current I that exceed IBO (with IBO of order 1 µA), a sudden blackout of the current occurs. Figure 40 is a characteristic example
FIGURE 37. HRTEM image of LaS thin film grown on (100) Si substrate.

FIGURE 38. J–F characteristics of a LaS thin-film planar cathode.

FIGURE 39. Comparison between the experimental data from Semet et al. (2006) with numerical simulations of J–F done with Φ = 1.05 eV at the same temperatures.

FIGURE 40. Field emission current evolution for increasing applied voltage showing successive blackout events for I exceeding ∼2 µA.
of such a behavior with a value of the IBO = 1.5 µA. After a first blackout and still maintaining the probe-ball at the same position [i.e., the same (x, y)
locations and same distance d of the SAFEM probe above the sample], an FE recovery takes place for higher applied voltages. In the example shown in Figure 40, the cathode–probe distance was 4.25 µm; the first blackout occurs at ∼1500 V with IBO = 1.5 µA. Pushing the voltage up to ∼2200 V, an FE current can be obtained resulting in an FE behavior until a second blackout occurs at an overall current of ∼2 µA for an applied voltage of ∼2400 V. To estimate the actual value of the current density JBO leading to the blackout, we divided the total blackout current IBO by 1/1000th of the probe surface defined by numerical simulations. Different measurements yield JBO values between 50 and 100 A/cm2 . This is a nonreversible surface evolution, and the resulting emission area is called a burnout area. This sudden and irreversible blackout of the FE current occurs because of a sudden increase in the apparent surface barrier of the emitting zone. A comparison between the F-N plots measured successively for the same area before and after a burnout shows an apparent surface barrier of the burnout surface to be ∼3.5 eV compared to the as-grown surface, which has an effective value of ∼1.05 eV as shown earlier. This value is comparable with the Kelvin probe measurements and with the thermionic and photoemission measurements performed on the as-grown surface. 4. Patchwork Field Emission Model To interpret the FE behavior from the LaS film, two specific features must be taken into account: (1) the LaS film is a polycrystalline layer with nanometric crystallites (see Figure 37), and (2) the work function of the LaS surface is dependent on the crystallographic orientation, with a work function less than 1 eV for the 100 direction. Our interpretation is therefore based on a patchwork distribution of the work function at the surface of the LaS film due to its polycrystalline structure. Nanocrystals with the 100 orientation perpendicular to the surface and outcropping it are believed to be responsible for the low work function of ∼1 eV recorded using the SAFEM technique. They are surrounded by nanocrystals presenting other crystallographic orientations and amorphous zones that have a work function in the range of 2.6–3.5 eV. FE current then is preferentially extracted from these (100) areas, which represent only 1/1000th of the total surface of the cathode. When the total FE current exceeds the 1-µA range, the FE current density transiting through these areas reaches values in the range of 100 A/cm2 , which can produce enough internal heating (i.e., energy absorption) to activate a crystallographic rotation/modification of the 100 nanocrystals, accompanied by a sudden disappearance of the low work function patchwork area. Most of the surface then has a work function in the range of 3.5 eV, leading to a blackout of the FE current when the same applied voltage (∼1500 V) is maintained
because it is insufficient to cause FE from a 3.5-eV work function surface. In concrete terms and for comparison, for a cathode–probe distance of 4.25 µm (the experimental condition of Figure 40) and a work function value of 3.5 eV, an applied voltage of 5600 V (i.e., an axial local field of 1350 V/µm) is required to extract a total FE current of ∼1 × 10⁻¹⁰ A.

The multiple blackouts of the FE current shown in Figure 40 can also be explained by this model if the electric field distribution under the probe-ball is considered. Since the burnout zone is localized only to the area where Jlocal > JBO, the first burnout surface is a circular zone around the axis of the probe-ball, where the highest field is present. After the first blackout, this zone has a work function of ∼3.5 eV and will show observable FE only for a local field exceeding 1350 V/µm (i.e., 5600 V for the applied voltage). However, by increasing the applied voltage from 1500 V to less than 5600 V, the patchwork areas with effective work function of ∼1 eV within the annular region surrounding the first burnout area will emit when the local fields become greater than the threshold value of ∼230 V/µm (Figure 41). The corresponding FE current from these patchwork areas increases with the applied voltage, following the F-N relation, until the second blackout occurs when the current density over this annular region reaches JBO.

The patchwork model for FE also answers the question of how the (100) areas, with a work function of order 1 eV, are kept free from surface chemical reactions. Considering the size of the nanocrystallites (of order a few nanometers, as shown in Figure 41), a difference in the work function of ∼2 eV between the (100) patchwork area and the surrounding area creates a patch field (Herring and Nichols, 1949) that can be of sufficient strength to protect it from adsorption, a suggestion that needs to be developed and confirmed.
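To make the scaling between work function, local field, and emitted current density concrete, the sketch below evaluates the elementary Fowler–Nordheim expression J = (A F²/Φ) exp(−B Φ^(3/2)/F), with the standard constants A ≈ 1.54 × 10⁻⁶ A eV V⁻² and B ≈ 6.83 × 10⁹ eV^(−3/2) V m⁻¹. This is not the barrier model used by the authors [their Eq. (19), image-force corrections, and space-charge effects are not reproduced here], so the absolute numbers are only indicative; the point is the drop in J when the effective barrier rises from ∼1 eV to ∼3.5 eV at a fixed local field.

import math

# Elementary Fowler-Nordheim current density for a triangular barrier.
# field in V/m, phi in eV, result in A/m^2. Illustrative only.
A_FN = 1.54e-6   # A eV V^-2
B_FN = 6.83e9    # eV^-3/2 V m^-1

def j_fowler_nordheim(field, phi):
    return (A_FN * field**2 / phi) * math.exp(-B_FN * phi**1.5 / field)

F = 230e6  # 230 V/um, the threshold field quoted for the ~1 eV patchwork areas
for phi in (1.0, 3.5):
    print(f"phi = {phi:3.1f} eV, F = 230 V/um -> J ~ {j_fowler_nordheim(F, phi):.2e} A/m^2")

At a local field of 230 V/µm the two printed current densities differ by roughly seventy orders of magnitude in this crude model, which is why the burnout zones (∼3.5 eV) are effectively dark at fields where the ∼1-eV (100) patches already emit.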
FIGURE 41. Schematic representation of the emission areas before and after a first blackout of the field emission current.
V. CONCLUSIONS

FE devices offer unique advantages compared to the vacuum tube. Among these are the modest power consumption and the possibility of integration with solid-state electronics. Vacuum microelectronics research has shown the feasibility of fabricating large-area cold cathodes and the possibility for extreme miniaturization of integrated field emitters. However, industrial development of vacuum microelectronic or nanoelectronic devices is more demanding than the academic uses of FE for surface studies. Considering the different applications that use electron beams, an ideal cathode, from the industrial point of view, can be outlined and characterized by the following factors, with their relative importance depending on the specific applications envisaged:

• Room temperature operating condition
• Possibility of a fast modulation of the emission
• Small operating voltage
• High brightness and efficiency
• Insensitivity to medium or poor vacuum conditions
• Long lifetime and reproducible operating conditions
• High throughput and ease of fabrication for large-area cathodes
Most current research is now in the area of materials and fabrication techniques for the preparation of emitter arrays from well-aligned nanoemitters or nanomaterials with low work function and physical properties appropriate for FE, with a view to developing a robust technology that matches the demands of industrial devices, which is the requisite focal point for a major breakthrough. Studies of cold cathode materials and their applications are now investigating the possibility of using quantum effects and nanosize effects. Device structures such as SSE cathodes or LaS cathodes are particularly suitable for flat field emitters; they have effective surface barriers on the order of 1 eV or less, and the measured FE characteristics indicate that their surfaces are very appropriate for FE. The SSE cathodes make use of the QW features of dielectric ultrathin films; their surface is less subject to adsorption because of the covalent bonding nature of the dielectric, and the low effective surface barrier is obtained by a modification of the inside electrostatic potential. Stable emission, with threshold fields in the range of 100 V/µm, can therefore be obtained under poor vacuum conditions, such as 10⁻⁵ torr of O2. For LaS cathodes, the nanocrystalline structure of the surface defines patchwork nanoareas with intrinsically low work function that are protected by the surrounding patch field. Electron emission can therefore be obtained from these low work function areas with threshold fields on the order of 250 V/µm.
The ideal scenario is a combination of a low surface work function with substrate-arrayed structures that enhance local fields, in order to further reduce the threshold fields for emission from large-area surfaces. The SSE cathode fabrication is a thin-film deposition technology that can be adapted to any substrate geometry; a microfabricated cone-shaped array, for example, could meet the requirements for large arrayed SSE cathodes. For intrinsic low work function materials such as LaS, a self-assembling process for the growth of nanoprotrusions in an arrayed structure is probably a route to be developed (Cahay et al., 2007).
ACKNOWLEDGEMENTS We express our gratitude to the following colleagues: Prof. P. Thévenard, Dr. J.P. Dupin, and R. Mouton of the Equipe Emission Electronique; Dr. Ch. Adessi (University of Lyon 1); Prof. R. Tsu (University of North Carolina– Charlotte); Prof. M. Cahay (University of Cincinnati); and Dr. S. Fairchild (Air Force Research Laboratory, Wright–Patterson Air Force Base) for their valuable contribution to the planar cold cathode studies and assistance in preparing this chapter.
R EFERENCES Adessi, Ch., Devel, M. (1999). Electron scattering by a large molecule: application to (n, n) nanotubes. Phys. Rev. A 60, 2194–2199. Adessi, Ch., Devel, M. (2000). Theoretical study of field emission by a four atoms nanotip: implications for a carbon nanotubes observation. Ultramicroscopy 85, 215–223. Bahtat, A., Bouazaoui, M., Bahtat, M., Garapon, C., Jacquier, B., Mugnier, J. (1996). Up-conversion fluorescence spectroscopy in Er3+ :TiO2 planar waveguides prepared by a sol–gel process. J. Non-Cryst. Solids 202, 16– 22. Binh, V.T. (2001). Electron spectroscopy from solid-state field-controlled emission cathodes. Appl. Phys. Letters 78, 2901–2999. Binh, V.T., Adessi, Ch. (2000). New mechanism for electron-emission from planar cold cathodes: the solid-state field-controlled electron emitter. Phys. Rev. Letters 85, 864–867. Binh, V.T., Dupin, J.P., Adessi, Ch., Semet, V. (2001). Solid-state fieldcontrolled emitters: a thin-film technology solution for industrial cathodes. Solid State Electronics 54, 1025–1031. Binh, V.T., Dupin, J.P., Thevenard, P., Guillot, D., Plenet, J.C. (2000). Solidstate field-controlled electron emission: an alternative to thermionic and
field emission. In: Electron-Emissive Materials, Vacuum Microelectronics and Flat-Panel Displays. Mat. Res. Soc. Symp. Proc., vol. 621. Materials Research Society, USA, pp. R 4.3.1–R 4.3.6. Binh, V.T., Dupin, J.P., Thevenard, P., Purcell, S.T., Semet, V. (2000). A serial process for electron emission from solid state field-controlled electron emitters. J. Vac. Sci. Technol. B 18, 956–961. Binh, V.T., Garcia, N., Purcell, S.T. (2001). Electron field emission from atom-sources: fabrication, properties, and applications of nanotips. In: Advances in Imaging and Electron Physics, vol. 95. Academic Press, New York, pp. 63–153. Binh, V.T., Semet, V., Dupin, J.P., Guillot, D. (2001). Recent progress in the characterization of electron emission from solid-state field-controlled emitters. J. Vacuum Sci. Technol. B 19, 1044–1050. Cahay, M., Garre, K., Fraser, J.W., Lockwood, D.J., Semet, V., Thien Binh, V., Bandyopadhyay, S., Pramanik, S., Kanchibotla, B., Fairchild, S., Grazulis, L. (2007). Characterization and field emission properties of lanthanum monosulfide nanoscale emitter arrays deposited by pulsed laser deposition on self-assembled nanoporous alumina templates. J. Vac. Sci. Technol. B 25, 594–603. Cahay, M., Garre, K., Wu, X., Poitras, D., Lockwood, D.J., Fairchild, S. (2006). Physical properties of lanthanum monosulfide thin films grown on (100) silicon substrate. J. Appl. Phys. 99, 123502–123507. Coehlo, R., Debeau, J. (1971). Properties of the tip-plane configuration. J. Phys. D: Appl. Phys. 4, 1266–1280. Duke, C.B. (1969). Tunneling in Solids. Academic Press, New York. Durand, E. (1953). Distributions données de charges. In: Electrostatique et Magnetostatique. Masson & Cie, Paris, pp. 30–73. Dushman, S. (1923). Electron emission from metals as a function of temperature. Phys. Rev. 21, 623–636. Dyke, W.P., Dolan, W.W. (1956). Field emission. Adv. in Electr. and Elec. Physics 8, 89–185. Eriksson, O., Willis, J., Mumford, P.D., Cahay, M., Friz, W. (1998). Electronic structure of the LaS surface and LaS/CdS interface. Phys. Rev. B 57, 4067– 4070. Fairchild, S., Jones, J., Cahay, M., et al. (2005). Pulsed laser deposition of lanthanum monosulfide thin films on silicon substrate. J. Vac. Sci. Technol. B 23, 318–321. Forbes, R.G. (2001). Low-macroscopic-field electron emission from carbon films and other electrically nanostructured heterogeneous materials: hypothesis about emission mechanism. Solid-State Electronics 45, 779–808. Forbes, R.G. (2006). Simple good approximations for the special elliptic functions in standard Fowler–Nordheim tunneling theory for a Schottky– Nordheim barrier. Appl. Phys. Lett. 89, 113122.
Forbes, R.G. (1999). Refining the application of Fowler–Nordheim theory. Ultramicroscopy 79, 11–23. Forbes, R.G., Edgcombe, C.J., Valdrè, U. (2003). Some comments on models for field enhancement. Ultramicroscopy 95, 57–65. Fowler, R.H., Nordheim, L.W. (1928). Electron emission in intense electric fields. Proc. Roy. Soc. (London) A 119, 173–181. Geis, M.W., Gregory, A., Pate, B.B. (1991). Capacitance–voltage measurements on metal–SiO2 –diamond structures fabricated with (100)-oriented and (111)-oriented substrates. IEEE Trans. Electron. Dev. 38, 619–626. Geisl, M.W., Efremow, N.N., Krohn, K.E., et al. (1998). A new surface electron-emission mechanism in diamond cathodes. Nature 393, 431–435. Ghis, A., Meyer, R., Ramband, P., Levy, F., Leroux, T. (1991). Sealed vacuum devices fluorescent microtip displays. IEEE Trans. Electron Devices 38, 2320–2322. Gomer, R. (1961). In: Field Emission and Field Ionization. Harvard University Press, Cambridge, MA. Gulyaev, Y.V., Chernozatonskii, L.A., Kosakovskaja, Z.J., Sinitsyn, N.I., Torgashov, G.V., Zakharchenko, Y.F. (1995). Field emitter arrays on nanotube carbon structure films. J. Vac. Sci. Technol. B 13, 435–436. Guth, E., Mullin, C.J. (1942). Electron emission of metals in electric fields III. The transition from thermionic to cold emission. Phys. Rev. 61, 339–348. Herring, C., Nichols, M.H. (1949). Thermionic emission. Rev. Mod. Phys. 21, 185–271. Jensen, K.L. (1993). Numerical simulation of field emission from silicon. J. Vac. Sci. Technol. B 11, 371–378. Jensen, K.L., Ganguly, A.K. (1994). Time dependent, self-consistent simulations of field emission from silicon using the Wigner distribution function. J. Vac. Sci. Technol. B 12, 770–775. Kleint, C. (1993). On the early history of field emission including attempts of tunneling spectroscopy. Prog. Surf. Sci. 42, 101. Kleint, C. (2004). Comments and references relating to early work in field electron emission. Surf. Interface Anal. 36, 387–390. Korotkov, A.N., Likharev, K. (1999). Possible cooling by resonant Fowler– Nordheim emission. Appl. Phys. Lett. 75, 2491–2493. Lang, N.D., Williams, A.R. (1978). Theory of atomic chemisorption on simple metals. Phys. Rev. B 18, 616–636. Lippmann, B.A., Schwinger, J. (1950). Variational principles for scattering processes I. Phys. Rev. 79, 469–480. Mahner, E., Minatti, N., Piel, H., Pupeter, N. (1993). Experiments on enhanced field-emission of niobium cathodes. Appl. Surf. Sci. 67, 23–28. Miller, H.C. (1967). Change in field intensification factor beta of an electrode projection (Whisker) at short gap lengths. J. Appl. Phys. 38, 4501–4504.
Miller, M.K., Cerezo, A., Heatherington, M.G., Smith, G.D.W. (1996). Physical principal of field ion microscopy. In: Atom Probe Field Ion Microscopy. Clarendon, Oxford, pp. 41–133. Millikan, R.A., Eyring, C.F. (1926). Laws governing the pulling of electrons out of metals by intense electrical field. Phys. Rev. 27, 51–67. Morton, A. (1946). Electron guns for television application. Rev. Modern Physics 18, 362–378. Müller, W. (1938). Weitere Beobachtungen mit dem Feldelecktronenmikroskop. Z. Physik 108, 668–680. Müller, E.W. (1956). Resolution of the atomic structure of a metal surface by the field ion microscope. J. Appl. Phys. 27, 474–476. Mumford, P.D., Cahay, M. (1996). Dynamic work function shift in cold cathode emitters using current carrying thin films. J. Appl. Phys. 79, 2176– 2179. Murphy, E.L., Good, R.H. (1956). Thermionic emission, field emission, and the transition region. Phys. Rev. 102, 1464–1473. Nilsson, L., Groening, O., Groening, P., Kuettel, O., Schlapbach, L. (2001). Characterization of thin-film electron emitters by scanning anode field emission microscopy. J. Appl. Phys. 90, 768–780. Plenet, J.C., Brioude, A., Bernstein, E., et al. (2000). Densification of sol–gel TiO2 very thin films studied by SPR measurements. Opt. Mat. 13, 411–415. Pupeter, N., Giihl, A., Habermann, T. (1996). Field emission measurements with µm resolution on chemical-vapor-deposited polycrystalline diamond film. J. Vac. Sci. Technol. B 14, 2056–2059. Robertson, J. (1986). Amorphous carbon. Adv. Phys. 35, 317–374. Samsonov, G.V. (1965). High Temperature Compounds of Rare-Earth Metals with Nonmetals. Consultants Bureau Enterprises, Inc., USA. Schottky, W. (1914). Über den Einfluc von Strukturwirkungen, besonders der Thomsonschen Bildkraft. Z. Phys. 15, 872. Semet, V., Adessi, Ch., Capron, T., Binh, V.T. (2006). Technical Digest of the Joint 19th IVNC (International Vacuum Nanoelectronics Conference) and 50th IFES (International Field Emission Symposium), Guilin, China, July 2006. Semet, V., Adessi, Ch., Capron, T., Binh, V.T. (2007a). Electron emission from low surface barrier cathodes. J. Vac. Sci. Technol. B 25, 513–516. Semet, V., Adessi, Ch., Capron, T., Mouton, R., Binh, V.T. (2007b). Low work-function cathodes from Schottky to field-induced ballistic electron emission: self-consistent numerical approach. Phys. Rev. B 75, 045430– 045437. Semet, V., Binh, V.T., Vincent, P., et al. (2002). Field electron emission from individual carbon nanotubes of a vertically aligned array. Appl. Phys. Lett. 81, 343–345.
Semet, V., Binh, V.T., Zhang, J.P., Yang, J., Azif Khan, M., Tsu, R. (2004). Electron emission through a multilayer planar nanostructured solid-state field-controlled emitter. Appl. Phys. Lett. 84, 1937–1939. Semet, V., Cahay, M., Binh, V.T., Fairchild, S., Wu, X., Lockwood, D.J. (2006). Patchwork field emission properties of lanthanum monosulfide thin films. J. Vac. Sci. Technol. B 24, 2412–2416. Semet, V., Mouton, R., Binh, V.T. (2005). Scanning anode field emission microscopy analysis for studies of planar cathodes. J. Vac. Sci. Technol. B 23, 671–675. Shaw, J.L., Gray, H.F., Jensen, K.L., Jung, T.M. (1996). Graded electron affinity electron source. J. Vac. Sci. Technol. B 14, 2072–2079. Shoulders, K.R. (1961). Microelectronics using electron-beam activated machining techniques. Adv. Comp. 2, 135–293. Silverman, M.P. (1994). More Than One Mystery: Exploration in Quantum Interference. Springer-Verlag, New York. Smith, R., Walls, J.M. (1978). Ion trajectories in field-ion microscope. J. Phys. D: Appl. Phys. 11, 409–419. Spindt, C.A. (1968). A thin-film field-emission cathode. J. Appl. Phys. 39, 3504–3505. Spindt, C.A., Brodie, I., Humphrey, L., Westerberg, E.R. (1976). Physical properties of thin-film field emission cathodes with molybdenum cones. J. Appl. Phys. 47, 5248–5263. Spindt, C.A., Schwoebel, P.R., Holland, C.E. (2001). I. Inform. Display 2, 44. Sze, S.M. (1981). Physics of Semiconductor Devices. Wiley Interscience, New York. Teo, K.B.K., Lee, S.B., Chhowalla, M. (2003). Plasma enhanced chemical vapour deposition carbon nanotubes/nanofibers – how uniform do they grow. Nanotechnology 14 (2), 204. Thompson, J. (1897). Cathode rays. Philosoph. Mag. 44, 293–316. Tsu, R., Greene, R.F. (1999). Inverse Nottingham effect cooling in semiconductors. Electrochemical and Solid-State Letters 2, 645–647. Underwood, R.D., Kozodoy, P., Keller, S., DenBaars, S.P., Mishra, U.K. (1998). Piezoelectric surface barrier lowering applied to InGaN/GaN field emitter arrays. Appl. Phys. Lett. 73, 405–407. van der Ziel, A. (1975). Practical cathodes. In: Everitt, W.L. (Ed.), Solid State Physical Electronics. Prentice-Hall Electrical Engineering Series. Prentice-Hall, Englewood Cliffs, NJ, pp. 119–143. Wang, C., Garcia, A., Ingram, D.C., Lake, M., Kordesch, M.E. (1991). Cold field-emission from CVD diamond films observed in electron-microscopy. Electron. Lett. 27, 1459–1461. Wang, R.Z., Ding, X.M., Wang, B., Xue, K., Xu, J.B., Yan, H., Houl, X.Y. (2005). Structural enhancement mechanism of field emission from multilayer semiconductor films. Phys. Rev B 72, 125310.
Wood, R.W. (1897). A new form of cathode discharge and the production of X-rays, together with some notes on diffraction. Preliminary communication. Phys. Rev. 5, 1–10. Xu, N.S., Huq, S.E. (2005). Novel cold cathode materials and applications. Material Science and Engineering R 48, 47–189. Xu, N.S., Tzeng, Y., Latham, R.V. (1993). Similarities in the cold electronemission characteristics of diamond-coated molybdenum electrodes and polished bulk graphite surfaces. J. Phys. D 26, 1776–1780.
Interval and Fuzzy Analysis: A Unified Approach

WELDON A. LODWICK

Department of Mathematical Sciences, University of Colorado at Denver and Health Sciences Center, Denver, Colorado 80217, USA
I. Introduction . . . . . . . . . . . . . . . . . . . . . . . A. Historical Background . . . . . . . . . . . . . . . . . . B. The Focus and Basic Themes . . . . . . . . . . . . . . . . 1. Extension Principles . . . . . . . . . . . . . . . . . . 2. Arithmetic . . . . . . . . . . . . . . . . . . . . . 3. Enclosure and Verification . . . . . . . . . . . . . . . . II. Interval Analysis . . . . . . . . . . . . . . . . . . . . . A. Interval Extension Principle . . . . . . . . . . . . . . . . . B. Interval Arithmetic . . . . . . . . . . . . . . . . . . . 1. Axiomatic Approach . . . . . . . . . . . . . . . . . . 2. Interval Arithmetic from the United Extension: Constraint Interval Arithmetic . 3. Specialized Interval Arithmetic . . . . . . . . . . . . . . . 4. Comparison Between the Axiomatic and Extension Principle Approach to Interval Arithmetic . . . . . . . . . . . . . . . . . . . . . C. Enclosure and Verification . . . . . . . . . . . . . . . . . 1. Enclosure of the Range of a Function . . . . . . . . . . . . . 2. Epsilon Inflation . . . . . . . . . . . . . . . . . . . 3. Defect Correction . . . . . . . . . . . . . . . . . . . D. Algorithms and Software . . . . . . . . . . . . . . . . . . III. Fuzzy Set Theory . . . . . . . . . . . . . . . . . . . . . A. Possibility and Necessity Distribution . . . . . . . . . . . . . . B. Semantics of Fuzzy Sets and Possibility and Necessity Distributions . . . . . C. Fuzzy Extension Principles . . . . . . . . . . . . . . . . . 1. Fuzzy Arithmetic . . . . . . . . . . . . . . . . . . . D. Enclosure and Verification . . . . . . . . . . . . . . . . . IV. Analysis with Distributions . . . . . . . . . . . . . . . . . . A. Distribution Arithmetic . . . . . . . . . . . . . . . . . . 1. Interval Convolution Methods . . . . . . . . . . . . . . . 2. Interval Histogram Methods . . . . . . . . . . . . . . . . 3. Inverse Probability Method . . . . . . . . . . . . . . . . B. General Theory of Uncertainty . . . . . . . . . . . . . . . . 1. Clouds . . . . . . . . . . . . . . . . . . . . . . 2. Interval-Valued Probability Measures . . . . . . . . . . . . . C. Generalized Extension Principles for Distribution . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . .
76 76 79 80 81 83 85 90 93 94 95 98 109 110 113 113 115 117 119 123 124 125 127 129 129 131 132 136 152 165 165 167 184 184
I. INTRODUCTION

A. Historical Background

A unified approach to real-valued interval and fuzzy analysis emphasizing common themes is presented. Interval analysis and fuzzy analysis may be viewed as a bridge between deterministic problem solving and problems with generalized uncertainty (Zadeh, 2005). This presentation focuses on two key features common to both interval and fuzzy analysis: (1) the extension principles that generalize real-valued analysis to analysis on sets and functions (intervals, fuzzy sets, and possibility distributions), and (2) verification or enclosure methods that define lower and upper values and distributions with the view of obtaining approximate solutions to problems whose existence is verified and given within guaranteed bounds in an efficient and mathematically consistent way. The extension principles relate interval analysis and fuzzy analysis; they may also point to how computation with numbers and sets is extended to computation with words. Enclosure methods are derived from the extension principles and indicate how interval, fuzzy, and possibilistic uncertainty can be incorporated, computed, and propagated in mathematical analysis on these entities. Enclosure methods (for example, the substitution of interval arithmetic operations for every algebraic operation in a continuous algebraic function) obtain lower and upper bounds on the result. Knowing how to incorporate, compute, and propagate uncertainty in mathematical problem solving is key to the use of interval, fuzzy, and possibilistic entities in practice. Enclosure methods are one of the most important contributions of interval analysis and fuzzy analysis to mathematical analysis. One emphasis of this chapter is enclosure. While not explicitly derived, the enclosure methods developed here may equally be applied to probabilistic distributions.

Professor Lothar Collatz, in a lecture titled Monotonicity and Optimization, given to the Mathematics Department at Oregon State University, August 5, 1974, stated, "The idea of ordering is more fundamental than distance because from ordering we can get distance but not the other way around" (author notes). The real number system contains within itself the most fundamental mathematical order, to which both interval analysis and fuzzy analysis relate. Real-valued arithmetic and the derived distance generate mathematical analysis. If we take Professor Collatz's insight seriously, any new structure, interval analysis or fuzzy analysis in particular, has as its first task the elucidation of order and, subsequently, of the distance (measure) derived from that order. Since both interval numbers and fuzzy numbers, as will be seen from their definitions, relate themselves to sets of real numbers and graphs in R², respectively, the order needs to be derived from that associated with sets and distributions.
Moreover, since interval numbers and fuzzy sets model uncertainty on one hand and amplification and flexibility on the other, it is clear that the idea of order and the derived distance (measure) generated will be flexible; that is, they require choices. The choice that needs to be made depends on the semantics of the problem to a greater extent than in the deterministic setting. Our approach takes as its point of departure arithmetic, which has as its underlying the order of sets of real numbers. Subsethood is what will be taken as the order on sets. Thus, since sets are the primary entity, the way to handle set-valued functions whose domains are interval numbers, fuzzy sets, and possibility distributions is a major task of this chapter. Functions and algorithms are the mathematical embodiment of cause and effect. A mathematical function can be considered the symbolic expression of the essence of science, the study of what ties cause to effect. Thus, functions are central to scientific endeavors. Intervals and fuzzy sets are relatively new entities of mathematical study; therefore, the most useful approaches to their analyses are still being uncovered. Nevertheless, functions are such a central part of mathematics that they must be confronted by any developing field; for this reason, they are the theme of this presentation. As will be demonstrated, functions as defined by the extension principle are a key feature to the interconnection between interval analysis and fuzzy analysis. Extension principles and verification methods express how to obtain functions. Moreover, knowing how to relate interval analysis to fuzzy analysis clears the way to application and amplification of results in each discipline, as well as the creation of new ones. This monograph contends that the extension principles are the bridge between the two disciplines, and enclosure methods apply the extension principles to obtain lower and upper bounds on computations with interval, fuzzy, and possibilistic entities to possibly guarantee that the lower and upper bounds that are obtained contain within their range the correct result. Researchers have searched for efficient methods to compute with intervals, fuzzy sets, and possibility distributions. Methods to compute lower and upper values (numerical lower and upper bounds on intervals or lower and upper distributions on fuzzy sets and possibility distributions) that are developed and presented use a min/max calculus (presented below) to attain efficiency in computation. Some material found here was first developed and presented by Dubois and Prade (1980, 1987a, 1987b, 1991), and later with colleagues Kerre and Mesiar (Dubois et al., 2000b). For example, the interrelationship between fuzzy set theory and interval analysis was first discussed in 1984 by Dubois and Prade (1980, page 58) and later in their 1984 technical report (Dubois and Prade, 1987b). The central theme of this presentation is the extension principle and enclosure and verification as unifying concepts in interval and fuzzy analysis. Dubois and Prade give prominence to the extension principle. However, they
do not deal explicitly with enclosure and verification methods, especially as arising from the extension principle. Since the extension principle is the analysis of functions over sets, this discussion begins with set-valued functions and shows how these tie directly into Moore's united extension and Zadeh's extension principle. This forms the basis for constrained interval arithmetic (Lodwick, 1999), as well as for the ensuing analyses.

This chapter is intended for applied professionals, researchers, and students who have some familiarity with fuzzy set theory or interval analysis and wish to study the synergisms between the two fields. Some basic knowledge of interval analysis (as found in Moore, 1979) and fuzzy set theory (as found in Klir and Yuan, 1995) is assumed. For example, it is assumed that the reader has an understanding of an α-cut of a fuzzy membership function.

Louis Rall, at the beginning of his talk at the SIAM Workshop on Validated Computing (May 22–25, 2002, Toronto, Canada), said (I am paraphrasing), “My early career can be characterized by what I learned from R.E. Moore. My later career can be characterized by what I further learned from R.E. Moore.” It is certainly true that this chapter reflects what I learned from R.E. Moore, D. Dubois, and H. Prade as I was beginning to learn interval analysis in graduate school and fuzzy set theory in my early academic career. Subsequently, this chapter reflects what I further learned from R.E. Moore, D. Dubois, and H. Prade.

Interval analysis and fuzzy set theory, as active fields of research and application, are relatively new mathematical disciplines, receiving the impetus that defined them as separate fields of study in 1959 and 1965, respectively, with R.E. Moore's technical reports on interval analysis and his doctoral dissertation (Moore, 1959a, 1962; Moore and Yang, 1959b; Moore et al., 1960) and Zadeh's seminal papers on fuzzy set theory (Zadeh, 1965, 1968, 1975, 1978). The connection between interval analysis and possibility theory [possibility theory historically arose out of fuzzy set theory (Zadeh, 1978)] is evident in the mathematics of uncertainty. The theory of interval analysis models, among other things, the uncertainty arising from numerical computation, which can be considered a source of ambiguity. Fuzzy set theory and possibility theory model, among other things, the uncertainty of vagueness and ambiguity arising from the transitional nature of entities and from a lack of information, respectively. This chapter will clarify the distinction between fuzzy set theory and possibility theory. Since intervals can be considered a particular type of fuzzy set (as we will see), one view is that fuzzy set theory is a more general theory than interval analysis. That is, interval analysis can be thought of as dealing with a particular type of uncertainty whose general theory is described by fuzzy sets. However, interval analysis, which developed as part of the then-emergent field of numerical analysis, initially had three directions: (1) computational error analysis (automatically computed error including
rounding); (2) verified computing, which Moore first called range arithmetic and subsequently called interval arithmetic [later Aberth (1988) developed a separate arithmetic he called range arithmetic, which differs from Moore's use of these words]; and (3) the derivation of the underlying algebraic structure of floating-point numbers, called computer algebra. Fuzzy sets developed in at least two directions of interest to this presentation: (1) possibility theory, and (2) fuzzy interval analysis. These have direct linkages with what has become known as the generalized theory of uncertainty (see Zadeh, 2005). Although the two fields can be considered as having a common root, interval analysis and fuzzy set theory are independent fields whose cross-fertilization has been a relatively recent phenomenon (Dubois and Prade, 1991; Dubois et al., 2000b; Lodwick and Jamison, 2003a; Lodwick, 2002).

Of note, all the interval analysis and fuzzy analysis in this monograph is over sets of real numbers. All sets are over real numbers, that is, real-valued intervals and real-valued distributions (in the case of fuzzy membership functions, possibilistic distributions, and probabilistic distributions). Moreover, when the word box is used in the context of intervals, it is understood that if the analysis is in R, the box refers to an interval [a, b]. In the context of Rⁿ, a box is an n-dimensional hyperrectangle [a1, b1] × · · · × [an, bn], ai, bi ∈ R, i = 1, . . . , n.

B. The Focus and Basic Themes

The thesis of this presentation is that an underlying mathematical theory from which interval and fuzzy analysis can be viewed and understood is the theory of set functions, particularized to intervals and fuzzy sets, together with associated lower and upper approximations of the resultants. Therefore, this development focuses on two areas common to both interval and fuzzy analysis: the extension principles and enclosure. Functions are a central part of mathematics and science, so their application to interval and fuzzy sets is a crucial step in the development of both fields. The extension principles of R.E. Moore (Moore et al., 1960) and L. Zadeh (Zadeh, 1965) are directly related to an earlier development and more general extension principle (Strother, 1952, 1955). A relatively recent treatment of set-valued functions is Aubin and Frankowska (1990). Of course, set-valued functions extend real-valued functions to functions on intervals and fuzzy sets. The extension principle used in interval analysis is called the united extension (Moore et al., 1960; Moore, 1962). In fuzzy set theory, it is called simply the extension principle (Zadeh, 1965). Since arithmetic operations are continuous real-valued functions, excluding division by zero, the extension principles can be used to define interval and fuzzy arithmetic.
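As a concrete illustration of the idea of extending a real-valued function to sets, the short sketch below (a hypothetical helper, not code from this chapter) approximates the set-valued image f([a, b]) = {f(x) | x ∈ [a, b]} of a continuous function by dense sampling, and contrasts it with naive end-point evaluation: for f(x) = x² on [−1, 2] the united extension gives [0, 4], whereas evaluating only the end points would suggest [1, 4].

def united_extension(f, a, b, n=301):
    # Approximate f([a, b]) for a continuous f by dense sampling.
    # Illustrative only; sampling does not give a guaranteed enclosure.
    xs = [a + (b - a) * k / (n - 1) for k in range(n)]
    ys = [f(x) for x in xs]
    return min(ys), max(ys)

f = lambda x: x * x
print(united_extension(f, -1.0, 2.0))   # approximately (0.0, 4.0)
print((f(-1.0), f(2.0)))                # end points alone: (1.0, 4.0)

The later sections make this rigorous: interval arithmetic, subdivision, and enclosure methods replace sampling so that the computed bounds are guaranteed to contain the true range.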
Enclosure means approximations that produce lower and upper values (an interval or a functional envelope, depending on the context) to the theoretical solution, which lies between the lower and upper values or functions. Efficient methods to compute lower and upper approximations are desired and necessary in practice. When enclosure is part of mathematical problem solving, it is called verification, formally defined in the following text. The point of view that the extension principle is an important thread, which can be used to relate and understand the various principles of uncertainty that are of interest to this discussion (interval, fuzzy, and possibility), leads to a direct relationship between the associated arithmetic. Moreover, lower and upper approximation pairs for fuzzy sets allow for simpler computation using min/max calculus and lead to enclosures with careful implementation. Arithmetic is a gateway to mathematical analysis. The extension principle is how arithmetic is defined for uncertainty entities of interest, and thus the extension principle provides a path to understand mathematical analysis on these entities. Three basic themes ensue: (1) extension principles, (2) arithmetic derived from axioms and extension principles, and (3) enclosure and verification. The sections after this introduction consider the three basic themes in the context of interval analysis (Section II), fuzzy set theory (Section III), and distributions (Section IV). 1. Extension Principles The extension principle is key because it defines how real-valued expressions are represented in the context of intervals, fuzzy sets, distributions, and generalized uncertainty. The extension principle can be viewed as one of the main unifying concepts between interval analysis and fuzzy set theory. Moreover, the extension principle is one way to define arithmetic on intervals, fuzzy sets, and distributions. All extension principles associated with intervals and fuzzy sets can be considered as originating from set-value mappings or graphs. Generally, an extension principle defines how to obtain functions whose domains are sets. Accomplishing this for real numbers is clear. It is more complex for sets because how to obtain resultant well-defined entities must be defined. Set-valued maps have a very long history in mathematics. Relatively recently, Strother’s 1952 doctoral dissertation (Strother, 1952) and two subsequent papers (Strother, 1955, 1958) define the united extension for set-valued functions for domains possessing specific topological structures. R.E. Moore applied Strother’s united extension to intervals. In doing so, as will be seen in Section II, Moore had to show that the topological structures on intervals were among those that Strother developed. Having done this, he retained the
name united extension as the extension principle particularized to intervals. In fact, Strother co-authored the technical report that first used the set-valued extension principle on intervals. That is, Moore's united extension (the interval extension principle) is a set-valued function whose domain is the set of intervals, and the range is an interval for those underlying functions that are continuous.

Zadeh's extension principle (Zadeh, 1965) explains how functions of fuzzy sets are derived from real-valued functions. It describes, among other things, how to compute with fuzzy sets. That is, Zadeh's extension principle can be thought of as a set-valued function where the domain elements are fuzzy sets, and the range values are fuzzy sets for the appropriate maps, called membership functions (defined in Section III). The extension principle was generalized and made more specific to what are now called fuzzy numbers or fuzzy intervals by various researchers beginning with Nguyen (1978). The details are found in Section III.

2. Arithmetic

Interval arithmetic is central to fuzzy arithmetic and can be derived axiomatically or from R.E. Moore's united extension. Of special interest is the latter approach, especially in deriving a constrained interval arithmetic in Section II, which will have implications for fuzzy arithmetic. Moore (Moore, 1959a, 1962; Moore and Yang, 1959b; Moore et al., 1960) developed interval analysis arising from (Moore and Lodwick, 2003)

“. . . the observation that if a real number A is computed, and a rigorous bound B on the total error in A as an approximation to some unknown number X, that is, |X − A| ≤ B, then no matter how A and B are computed, it is known for certain that X lies in the interval [A − B, A + B].”

Computations with intervals, especially in conjunction with computer implementations, arose naturally. In fact, Moore's initial work is tied directly to numerical error analysis. There are two direct precursors of Moore's development of interval analysis in 1959: Warmus (1956) and Sunaga (1958). Moore's initial work references and extends in significant ways Sunaga's work. Moore develops computational methods, incorporates computer rounding, develops for the first time automatic numerical error analysis (gets the computer to calculate roundoff, numerical truncation, and numerical method error estimations), and extends interval arithmetic to interval analysis.

An interval can be considered as a set and a number. On the real number line, with the usual meaning of the order relation ≤, an interval [a, b] is the set of all real numbers {x | a ≤ x ≤ b}. As a number, an interval X is a
pair of numbers {a, b}, the left and right end points of the interval. Analysis on intervals, since intervals are sets, requires set-valued functions, limits, integration, and differentiation theory. This is done via the united extension (Moore, 1966). The axioms of interval arithmetic as articulated by Warmus (1956), Sunaga (1958), and Moore (1959a) are as follows. It is noted that while Warmus' notation is different, the operations are the same.

Definition 1. For all arithmetic functions ◦ ∈ {+, −, ÷, ×},

[x] ◦ [y] = {x ◦ y | x ∈ [x] and y ∈ [y]},

where [x] is an arbitrary interval X. In particular, we have the following:

1. Addition:
[a, b] + [c, d] = [a + c, b + d].    (1)

2. Subtraction:
[a, b] − [c, d] = [a − d, b − c].    (2)

3. Multiplication:
[a, b] × [c, d] = [min{ac, ad, bc, bd}, max{ac, ad, bc, bd}].    (3)

4. Division:
[a, b] ÷ [c, d] = [a, b] × [1/d, 1/c], where 0 ∉ [c, d].    (4)
There is an extended interval arithmetic that incorporates the case where 0 ∈ [c, d] for division (Hansen, 1975; Kahan, 1968b). Moreover, there are various ways to approach interval arithmetic; for example, see Dempster (1974), Neumaier (1993), Nickel (1969), and Stolfi et al. (1994). In fuzzy arithmetic, the axioms of interval arithmetic apply to each α-cut of a fuzzy set membership function as long as the entity is a fuzzy number or fuzzy interval (defined in Section III). Definition 2. An interval arithmetic based on axioms 1–4 above is called axiomatic interval arithmetic. Remark. The axioms 1–4 above essentially compute the maximum and minimum values of the set {z = x ◦ y | x ∈ [a, b], y ∈ [c, d]}, where ◦ ∈ {+, −, ×, ÷} and the two intervals [a, b], [c, d] are considered independent. Thus, axiomatic interval arithmetic is a type of min/max calculus since the values arising from axiomatic interval arithmetic only involve the end points (min/max) of the intervals.
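To make Definition 2 and the Remark concrete, here is a minimal sketch (in Python; the class and names are illustrative and not part of the original text) of axiomatic interval arithmetic, in which each operation is computed from the end points alone. A rigorous implementation would add outward rounding.

```python
# Minimal sketch of axiomatic interval arithmetic, axioms (1)-(4).
# Names are illustrative; outward rounding for a verified implementation is omitted.

class Interval:
    def __init__(self, lo, hi):
        assert lo <= hi
        self.lo, self.hi = lo, hi

    def __add__(self, other):                      # axiom (1)
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):                      # axiom (2)
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):                      # axiom (3): min/max of end-point products
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval(min(p), max(p))

    def __truediv__(self, other):                  # axiom (4): requires 0 not in divisor
        if other.lo <= 0 <= other.hi:
            raise ZeroDivisionError("0 in divisor interval")
        return self * Interval(1.0 / other.hi, 1.0 / other.lo)

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

X = Interval(1, 2)
print(X - X)   # [-1, 1]: the two operands are treated as independent
print(X / X)   # [0.5, 2.0], not [1, 1]
```

The last two lines already hint at the dependency problem discussed later: because the axioms treat every occurrence of X as independent, X − X is not [0, 0] and X ÷ X is not [1, 1].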
The implementation of interval arithmetic on the computer, for which the goal is to account for all errors, including numerical and truncation errors, is called rounded interval arithmetic (Moore, 1959a; Moore and Yang, 1959b). Kulisch (1971) and Kulisch and Miranker (1981, 1986) axiomatized rounded interval arithmetic and uncovered the resultant algebraic structure, called a ringoid. While specialized extended languages (Pascal-XSC, C-XSC) and chips were developed for interval and rounded interval data types, incorporating the ideas set forth by Moore, Kulisch, and Miranker (among other researchers), the most successful rounded interval tool is undoubtedly INTLAB, a downloadable software package that runs in conjunction with MATLAB with imbedded interval arithmetic, rounded interval arithmetic, and some interval analytic methods, in particular computational linear algebraic methods. 3. Enclosure and Verification Archimedes’ (Archimedes of Siracusa, 1897; Phillips, 1981) approach to the computation of the circumference of a circle using outer circumscribed and inner inscribed regular polygons whose perimeter is a straightforward calculation is an enclosure and verification method, perhaps the first one. The essential part of enclosure and verification is that a solution is mathematically proven to exist (perhaps is unique) and lies between the computed lower and upper bound values (real numbers for our purposes). That is, verification guarantees that the solution exists (and perhaps is unique) in a mathematical sense. Enclosure consists of lower and upper bounds containing the solution. Verification in the case of Archimedes’ computation of the circumference of a circle is the geometrical fact (theorem) that the perimeter of the circumscribed regular polygon is greater than the circumference of a circle and that the inscribed regular polygon has a perimeter less than that of the circumference of a circle. Often fixed-point theorems (contractive mapping theorems, for example) are used to verify the existence of solutions in mathematical analysis. These theorems are used from the point of view of verification of hypotheses so that interval analysis may be used to mathematically calculate guaranteed bounds (for example, compute a Lipschitz condition, accounting for all numerical and truncation errors, that is less than 1, for example), which if held means that the mapping is contractive (hence, a computed solution exists). The methods to compute lower and upper bounds in a mathematically correct way on a computer must account for numerical and computer truncation error. This is one of the core research areas of interval mathematics. One of the primary applications of interval analysis consists of enclosures for verification. As will be seen, the equivalent for fuzzy sets and possibility theory is the computation of functional envelopes
(interval-valued probability or clouds). Interval verification methods obtain interval enclosures containing the solution(s) within their bounds. In interval analysis, verification means that existence is mathematically demonstrated, and valid bounds on solutions are given. When possible and/or relevant, uniqueness is mathematically determined. Thus, verification in the context of a computational process that uses intervals for a given problem means that a solution, say x, is verified (mathematically) to exist, and the computed solution is returned with lower and upper bounds, a and b, such that the solution, shown to exist, is guaranteed to lie between those bounds, that is, a ≤ x ≤ b. Uniqueness is determined when possible or desirable. Although not often thought of in these terms, possibility and necessity pairs, when constructed from an underlying though perhaps unknown distribution according to Jamison and Lodwick (2002), envelope the distribution. As such, they are functional enclosures. Thus, enclosure in the context of distributions is understood to be the construction of lower and upper functions, g(x) and h(x), to a given function f (x) such that g(x) ≤ f (x) ≤ h(x). This is desired not only when x is a real number or vector, but also when x consists of vectors of distributions such as of random variables, intervals, fuzzy sets, and/or possibilities. This is an especially difficult problem when f (x) is such a complex expression. The problem of computing such lower and upper bounds for various types of domains whose elements are described by a variety of uncertainty distributions is important in problems dealing with risk analysis, optimization under uncertainty, and simulations when inputs are distributions. Fuzzy set membership functions only give information about upper values of uncertainty, as noted by Dubois, Moral, and Prade (Dubois et al., 1997). The lower distribution for a fuzzy set is the x-axis. That is, in the context of enclosures, a fuzzy entity is enclosed between g(x) = 0 and f (x) = h(x), where h(x) is the membership function. The same is not true with a probability density function f (x), since a probability density function is considered “deterministic” in the sense that it captures the uncertainty with “certainty,” that is, g(x) = f (x) ≤ f (x) ≤ h(x) = f (x). For fuzzy sets, it is possible to obtain a tighter bound on the lower end, g(x), which can be accomplished by constructing probability-based possibility and necessity pairs via interval analysis (Berleant and Goodman-Strauss, 1998; Olsen, 2005; Tonon, 2004), with probability-based possibility and necessity pairs (Jamison and Lodwick, 2002), or with clouds (Neumaier, 2004a). These are discussed later. This exposition begins with interval analysis, its early development, and its extension principle, which led to interval arithmetic and interval analysis. The various interval arithmetics that arise from axioms, mathematical analysis, and the direct application of the extension principle all attempt to solve the outstanding problems of dependency, tight bounds, and implementability. Enclosure and verification methods, which are presented next, are applied
to problems in mathematical analysis. Enclosure and verification methods demonstrate the power and usefulness of interval analysis. Fuzzy set theory as it relates to the basic themes of the extension principle, arithmetic, analysis, enclosure, and verification methods follow next. The concluding section discusses enclosure and verification methods for uncertainty distributions under the subsection “Distribution Arithmetic.”
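As a concrete illustration of enclosure and verification in Archimedes' sense, the following sketch (illustrative Python, not from the original text) brackets π between the half-perimeters of inscribed and circumscribed regular polygons; the classical doubling recurrences use only harmonic and geometric means, so no value of π is assumed anywhere. Floating-point rounding is ignored here, whereas a verified computation would round the lower bound down and the upper bound up.

```python
import math

# Archimedes' enclosure of pi: a = half-perimeter of the circumscribed regular n-gon,
# b = half-perimeter of the inscribed one, for the unit circle.  Doubling the number
# of sides uses only harmonic and geometric means of the previous bounds.
a, b, n = 2.0 * math.sqrt(3.0), 3.0, 6          # start from the hexagon
for _ in range(4):                              # 6 -> 12 -> 24 -> 48 -> 96 sides
    a = 2.0 * a * b / (a + b)                   # circumscribed 2n-gon
    b = math.sqrt(a * b)                        # inscribed 2n-gon
    n *= 2
    print(f"n = {n:2d}:  {b:.5f} <= pi <= {a:.5f}")
# For n = 96 this reproduces Archimedes' classical bounds 3.14103... <= pi <= 3.14271...
```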
II. I NTERVAL A NALYSIS Interval analysis has an early history in Archimedes’ computation of the circumference of a circle (Archimedes of Siracusa, 1897). Interval analysis as developed by Moore (Moore, 1959a; Moore and Yang, 1959b; Moore et al., 1960) arose from the attempt to compute the error bounds of the numerical solutions of a finite state machine (computer), for which the roundoff error was automatically accounted for by the computer itself. This led to the investigation of computations with intervals as the entity, data type, that enabled automatic error analysis. Hansen writes (Hansen, 2001): R.E. Moore (Moore, 1999) states that he conceived of interval arithmetic and some of its ramifications in the spring of 1958. By January of 1959, he had published (Moore, 1959a) a report on how interval arithmetic could be implemented on a computer. A 1959 report (Moore and Yang, 1959b) showed that interval computations could bound the range of rational functions and integrals of rational functions. Theoretical and practical interval arithmetic were differentiated. Reference (Moore, 1965) discusses interval-valued functions, interval contractions, a metric topology for interval numbers, interval integrals, and contains an extensive discussion of Moore’s use of interval analysis to bound the solution of ordinary differential equations. Although there are five known direct and clear precursors to Moore’s version of interval arithmetic and interval analysis beginning in 1924 (Burkill, 1924; Dwayer, 1951; Fischer, 1958; Sunaga, 1958; Warmus, 1956; Young, 1931), it was Moore who worked out rounded computer arithmetic and fully developed interval analysis, that is, the continuous mathematics of intervals. Moore (1999) describes his independent discovery of interval analysis. He states (personal communication, December 22, 2006): When I submitted the first draft of my thesis to George Forsythe at Stanford, in late 1959 . . . Forsythe insisted that I do a careful literature search for prior work. I spent the next several months in the mathematics library at Stanford, and thereby found the works of Burkhill, Dwyer, Fischer, Sunaga and R.C. Young.
Moore was the first, in 1959, to use interval methods in the solution methods of ordinary differential equations. While it can be argued that Burkill (1924) was the first to deal with interval analysis since his article considers functions of intervals (which includes arithmetic since the algebraic operations are functions), Burkill’s interest is functionals of intervals since he maps intervals to real numbers such as occur in derivatives and integration. Intervals as entities in and of themselves, for example, functions that map intervals to intervals, were not the focus of Burkill’s work. In 1931 Young developed the arithmetic for sets of numbers and was interested in properties of limits in this more general setting. Moreover, Young worked out interval arithmetic and the commutative, associative, and distributive law of scalars (real numbers) over intervals. In 1951 Dwyer particularized Young’s work to compact sets of numbers (intervals). Both Warmus (1956) and Sunaga (1958) had the full development of axiomatic interval arithmetic. Sunaga recognized the importance of interval arithmetic in computational mathematics but did not proceed further. In 1958 Fischer reported a computer program that uses two computer words that automate propagated and roundoff error. One computer word holds the approximate value of the variable; the other word holds the value representing the bound of previous computations and roundoff errors of the approximation. The computer program includes the bound contained in the second word to compute subsequent bounds resulting from the processing being done. Notwithstanding this earlier development, Moore and his colleagues are responsible for developing the early theory, extensions, vision, a wide array of applications, and the actual implementation of interval analysis to computers. Moore’s major contributions include at least the following: 1. Moore recognized how to use intervals in computational mathematics, now called numerical analysis. 2. Moore extended and implemented the arithmetic of intervals to computers. 3. Moore’s work was influential in creating Institute of Electrical and Electronics Engineers (IEEE) standards for accessing computer’s rounding processes, which is a necessary step in obtaining computer-generated verified computations (Kulisch and Miranker, 1986). 4. Moore developed the analysis associated with intervals where the united extension plays a key role in achieving this. One such major achievement was to show that Taylor series methods for solving differential equations are not only more tractable, but more accurate (Moore, 1979). 5. Moore was the first to recognize the usefulness of interval analysis in computer verification methods, especially for solutions to nonlinear equations using the interval Newton method in which the method includes verification of existence and uniqueness of solution(s) by successive intersection of intervals, which is the key to successful interval Newton
method approaches. Sunaga (1958) did have a version of the Newton method. However, it was Moore who recognized that intersections of resulting intervals were key to an interval Newton method. “In fact George Forsythe [R.E. Moore’s major professor] asked an important question. . . . He asked, ‘what about an interval version of Newton’s method?’ I set to work on finding one, and almost at once saw that an intersection of interval Newton steps was necessary in order to obtain other very important properties which the interval Newton method then had.” (R.E. Moore, personal communication, December 22, 2006). The term range arithmetic, which was the name Moore used for what is now called interval arithmetic, was only used in Moore (1959a). While Moore’s earliest articles point to the use of interval arithmetic to compute the range of a function (hence, the original name of range arithmetic, and so to global optimization), it was Hansen who was the first to develop interval global arithmetic. Walster states how interval global optimization was first conceived: Compute upper bounds on (a function) f at various sampled points. The smallest computed upper bound, f¯, is an upper bound for f ∗ (the global minimum). Then delete subboxes of the current box wherein f > f¯. The concept of generating and using an upper bound in this way was introduced by Hansen (1980). [Personal communication among Moore, William Walster, Hansen, and Lodwick; January 10, 2007]. An interval [a, b] is defined as the set of all real numbers X = {x: a ≤ x ≤ b} on the real number line with the usual meaning of the order relation ≤. A second definition of an interval is as a pair of real numbers that are the end points of the interval. Thus, from the point of view of interval analysis, intervals on the real line have a dual nature, as sets of real numbers and as a new kind of number represented by pairs of real numbers, with an arithmetic, interval arithmetic, developed axiomatically, (1)–(4), and consistent with the set interpretation. The logic associated with interval analysis is one of certain containment. The sum of two intervals certainly contains the sums of all pairs of real numbers, one from each of the intervals. This follows from the definitions of interval arithmetic based on simple properties of the order relation ≤ on the real line. Intersections and unions of intervals also have an algebra and are computed in a straightforward manner. These definitions, unlike those of fuzzy set theory found in the sequel, come from classical set theory. Intersections and unions are crucial in defining what is meant by solutions to simultaneous equations and inequalities, as well as the fundamental building blocks of logical statements. For a given interval [a, b] and a given real number x, the statement x ∈ [a, b] is either true or false. There is no vagueness or ambiguity,
except for roundoff when the statement is implemented on a computer. For two intervals A1 and A2, if we know that x ∈ A1 and x ∈ A2, then we also know with certainty that x ∈ A1 ∩ A2. These statements have certainty (except possibly for implementations on the computer where roundoff error can become a factor), unlike statements of this type in fuzzy set theory. Interval arithmetic and the interval analysis developed from it do not assign any measure of possibility or probability to parts of an interval. A number x is either in an interval A or it is not. By introducing probability distributions or possibility distributions on an interval, and using level sets, integrals, or other measures, a connection between intervals and fuzzy sets can be made. The application of functions to intervals is accomplished through what R.E. Moore calls the united extension (an extension principle), already used in Moore and Yang (1959b) and formally defined in Moore et al. (1960), particularized to intervals from the more general extension theory for set-valued functions developed in Strother (1952). In particular, using interval computations, the range of values of a real-valued continuous function f of a single real variable x is contained in an interval [c, d] when x ∈ [a, b], where f([a, b]) ⊆ [c, d]. Such an interval exists since f is continuous and [a, b] is a compact set, so that f attains both a minimum and a maximum (and all values in between). Thus, if f is continuous, we may be interested in finding an interval containing a 0 of f that is a solution to the equation f(x) = 0. If 0 ∉ [c, d], for which we can test using 0 < c or d < 0, then we know that there is no solution in [a, b]. On the other hand, if 0 ∈ [c, d], then it is possible that f has a 0 in [a, b]. It is not certain because it is in the nature of interval computation that it cannot generally find exact ranges of values. That is, we may have 0 ∈ [c, d] but 0 ∉ f([a, b]). Techniques have been developed over the past four decades for reducing overestimation of interval ranges of mappings to any prescribed tolerance with enough computing, thus enabling the analysis of whether there exists a 0 of the function in the interval. Computer implementations of intervals add another dimension of uncertainty to intervals. If the lower limit of the range of a function is an irrational number (or even a particular rational number), then it cannot be represented exactly with floating-point numbers, which are a subset of the rational numbers. However, we can find a close rational approximation, where "close" is associated with the computer's precision. Thus, it is necessary when computing with intervals to round down (left) lower end points and round up (right) upper end points, and so compute a result that contains the set of all possible results. An interval number U = [u, ū], considered as a fuzzy set (defined in Section III), has the membership function

μU(x) = 1 for u ≤ x ≤ ū, and 0 otherwise.
Thus, interval analysis may be considered a subset of fuzzy set theory. As a probability distribution, various authors have considered an interval number as one of two probability density functions.

1. The distribution

p(x) = 1/(ū − u) for u ≤ x ≤ ū, u < ū, and 0 otherwise.

2. An interval may represent the fact that all we know is the support, so that the set of all distributions p(x) with support supp p(x) = [u, ū] is in the interval. Many, including Berleant (1993), take this point of view.
Remark. The issue of how to interpret intervals as probabilities is emphasized in Dubois et al. (1997), where semantics is also discussed. Moreover, it turns out that intervals form the underlying fundamental principle of fuzzy and interval numbers (Dubois and Prade, 2005; Fortin et al., 2006) (discussed in Section III). Neumaier (1990, page 1) states, “Interval arithmetic is an elegant tool for practical work with inequalities, approximate numbers, error bounds, and more generally with certain convex and bounded sets.” He goes on to say that intervals arise naturally in: 1. Physical measurements 2. Truncation error, the representation of an infinite process by a finite process (a) Representation of numbers by finite expansions (b) Finite representation of limits and iterations 3. Numerical approximations 4. Verification of monotonicity and convexity 5. Verification of the hypotheses of fixed-point theorems (the contraction mapping theorem or Brouwer’s fixed-point theorem are examples) 6. Sensitivity analysis, especially as applied to robotics 7. Tolerance problems Interval arithmetic and analytic methods have been used to solve an impressive array of problems given that these methods capture error (modeling, roundoff, and truncation) so that rigorous accounting of error together with the contraction mapping theorem or the Brouwer fixed-point theorem allow for computer verification of existence, uniqueness, and enclosure. In particular, Tucker (2002), using interval analysis, solved a long-outstanding problem [Smale’s 14th conjecture (Smale, 1998)] by showing that the Lorenz equations do possess a strange attractor. Davies (2005, page 1352) observes, Controlled numerical calculations are also playing an essential role as intrinsic parts of papers in various areas of pure mathematics. In some areas
of nonlinear PDE, rigorous computer-assisted proofs of the existence of solutions have been provided. . . . These use interval arithmetic to control the rounding errors in calculations that are conceptually completely conventional. Another long-standing problem, Kepler’s conjecture about the densest arrangement of spheres in space, was solved by Hales (2000) using interval arithmetic. Ten problems were posed by Trefethen in the January/February, 2002 issue SIAM News; each had a real number solution, and the objective was to obtain a 10-digit solution to each of the problems. The book of Bornemann et al. (2004) documents not only the correct solutions, but the analysis behind the problems. One of the authors, Wagon (personal communication), indicated that, Intervals were extremely useful in several spots. Problem 4: Designing an optimization algorithm to solve it by subdividing . . . . Problem 2: Intervals could be used to solve it by using smaller and smaller starting interval until success is reached. Moreover, Problems 2, 4, 7, and 9, intervals yield proofs that the digits are correct. Chapter 4 of Bornemann et al. (2004) contains an exposition of interval optimization. Robust stability analysis for robots that uses interval analysis methods, performed with the aid of a computer that is verified to be mathematically correct, can be found in Daney et al. (2004), Jaulin (2001a), or Jaulin et al. (2001b). There are excellent introductions to interval analysis beginning with Moore’s book (Moore, 1966) (also see other texts listed in the references). A more recent introduction can be found in Corliss (2004) and downloaded from http://www.eng.mu.edu/corlissg/PARA04/READ_ME. html. Moreover, introductions can be downloaded from the interval analysis website (http://www.cs.utep.edu/interval-comp). A. Interval Extension Principle Moore recognizes, in three Lockheed Aircraft Corporation technical reports (Moore, 1959a; Moore and Yang, 1959b; Moore et al., 1960), that the extension principle is a key concept. Interval arithmetic, rounded interval arithmetic, and computing range of functions can be derived from interval extensions. Of issue is how to compute ranges of set-valued functions. This requires continuity and compactness over interval functions, which in turn needs well-defined extension principles. In 1960, Moore (Moore et al., 1960) used for the first time in an explicit way, the extension principle for intervals called the united extension, which particularizes set-valued extensions to sets that are intervals. Following
Strother’s development (Strother, 1952, 1955), a specific topological space is the starting point. Definition 3. A topological space {X, Ω} on which open sets are defined (X is a set of points, and Ω is family of subsets of X) has the property that X ∈ Ω, ∅ ∈ Ω, finite intersections, and uncountable unions of sets in Ω are back in Ω. The topological space {X, Ω} is called a T1 -space if for every x ∈ X and y ∈ X (distinct points of X), there is an open set Oy containing y but not x. Metric spaces are T1 . In addition, if there exists an open set Ox containing x such that Ox ∩ Oy = ∅, then the space is called a T2 -space, or a Hausdorff space. The set of subsets of X is denoted by S(X). Lemma 1 (Strother, 1955). If f : X → Y is a continuous multivalued function and Y is a T1 -space, then f is closed for all x. That is, the image of a closed set in X is closed in Y . Remark. This means that f is a point-closed function. In other words, when the sets of X, S(X), are retracted to points, then the set-valued function F on S(X) becomes the original function f when defined through the retract. Phrased in another manner, if f is a real-valued function, then the mapping extended to sets of X ⊆ R, S(X), where the range is endowed with a T1 topology, is well defined. Remark. (Moore et al., 1960). If f : X → Y is an arbitrary mapping from an arbitrary set X into an arbitrary set Y , the united extension of f to S(X), denoted F , is defined as follows. F : S(X) → S(Y )
where F(A) = {f(a) | a ∈ A}, A ∈ S(X); in particular, F({x}) = {f(x) | x ∈ {x}}, {x} ∈ S(X), where {x} is a singleton set. Thus,

F(A) = ⋃_{a∈A} {f(a)}.
This definition, as we shall see, is quite similar to the fuzzy extension principle of Zadeh, where the union is replaced by the supremum. Theorem II.1 (Strother, 1955). Let X and Y be compact Hausdorff spaces and f : X → Y continuous. Then the united extension of f , F , is continuous. Moreover, F is closed.
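For intuition only, the united extension of a continuous real-valued function over an interval can be approximated numerically by dense sampling; the sketch below (illustrative Python, with hypothetical names, not from the original text) does this for f(x) = x(x − 1) on [0, 1], whose exact united extension is [−0.25, 0]. Sampling slightly underestimates the true range, whereas the rigorous interval techniques discussed later overestimate it.

```python
# Brute-force approximation of the united extension F([a, b]) = { f(x) : x in [a, b] }
# for a continuous f, by dense sampling of the domain interval.
def united_extension_sample(f, a, b, n=100_000):
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    values = [f(x) for x in xs]
    return min(values), max(values)

f = lambda x: x * (x - 1)
print(united_extension_sample(f, 0.0, 1.0))   # approximately (-0.25, 0.0)
```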
There are fixed "point" (set) theorems associated with these set-valued maps, the united extensions, that can be found in Strother (1955) and are not repeated here since we are not interested in the full generality of these theorems, but only as they are related to intervals. Moore (Moore et al., 1960, 1962, 1966) particularized the ideas of Strother (1952, 1955) to spaces consisting of subsets of R that are closed and bounded (real intervals), which we denote as S([R]), where S([X]) denotes the set of all intervals on any set of real numbers X ⊆ R. To this end, Moore needed to develop a topology on S([R]) that proved to be Hausdorff. The results of interest associated with the united extension for intervals are the following (Moore, 1966):

1. Isotone Property: A mapping f from a partially ordered set (X, rX) into another (Y, rY), where rX and rY are relations, is called isotone if x rX y implies f(x) rY f(y). In particular, the united extension is isotone with respect to intervals and the relation ⊆. That is, for A, B ∈ S([X]), if A ⊆ B, then F(A) ⊆ F(B).
2. The Knaster–Tarski Theorem: An isotone mapping of a complete lattice into itself has at least one fixed point.

The Knaster–Tarski theorem implies that the united extension F : S([R]) → S([R]) has at least one fixed "point" (set) in S([R]), which may be the empty set, and has an important numerical consequence. Consider the sequence {Xn} in S(X) defined by (choosing X0 = X)

Xn+1 = F(Xn).

Since X1 ⊆ F(X0) = F(X) ⊆ X = X0, then, by induction, Xn+1 ⊆ Xn. Let

Y = ⋂_{n=0}^{∞} Xn.
The following is true (Moore et al., 1960). If x = f(x) is any fixed point of f in X, then x ∈ Xn for all n = 0, 1, 2, . . . , so that x ∈ Y and x ∈ F(Y) ⊆ Y. Thus, Xn, Y, and F(Y) contain all the fixed points of f in X. If Y and/or F(Y) is empty, then there are no fixed points of f in X. Newton's method is a fixed-point method, so that the above theorem pertains to a large class of problems. Moreover, these enclosures lead to computationally verified solutions when implemented on a computer with rounded interval arithmetic.
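The nested sequence X_{n+1} = F(X_n), intersected with the previous interval as in the interval Newton method, can be imitated numerically; the sketch below (illustrative Python, not a verified implementation) uses f(x) = cos(x) on X_0 = [0, 1]. Because cos is decreasing on [0, 1], its exact range over [lo, hi] is [cos(hi), cos(lo)], and the enclosures shrink onto the unique fixed point x = cos(x) (about 0.739085); rounded interval arithmetic would be needed to make this rigorous.

```python
import math

# Iterate X_{n+1} = F(X_n) ∩ X_n for f(x) = cos(x) on X_0 = [0, 1].
# On [0, 1], cos is decreasing, so its exact range over [lo, hi] is [cos(hi), cos(lo)].
def F(lo, hi):
    return math.cos(hi), math.cos(lo)

lo, hi = 0.0, 1.0
for n in range(1, 16):
    new_lo, new_hi = F(lo, hi)
    lo, hi = max(lo, new_lo), min(hi, new_hi)   # intersection keeps the enclosures nested
    print(f"X_{n:<2d} = [{lo:.9f}, {hi:.9f}]")
```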
B. Interval Arithmetic

Interval arithmetic was defined axiomatically by Young (1931), Dwayer (1951), and Warmus (1956), and then independently by Sunaga (1958). Moore (1959a, 1959b) rediscovers and extends interval arithmetic to rounded interval arithmetic, thereby allowing interval arithmetic to be useful in computational mathematics. There are two approaches to interval arithmetic. The first is that obtained by application of the united extension. The second approach is the axiomatic approach. An interval arithmetic and associated semantics allow for "intervals," [a, b], for which a > b (Gardeñes et al., 1986; Hanss, 2005). This arithmetic is related to directed interval arithmetic (see Section II.B.3) and has some interesting applications to fuzzy control (Bondia et al., 2006; Sáinz, 2001). The basic axioms associated with interval arithmetic are (1)–(4). They are more fully developed in Moore (1966). There are various properties associated with the axiomatic approach to interval arithmetic that differ from those of real numbers and the constraint interval arithmetic defined subsequently. In particular, interval arithmetic derived from axioms is subdistributive. Thus, from Moore (1979) we have, for intervals X, Y, and Z:

1. X + (Y + Z) = (X + Y) + Z (the associative law for addition)
2. X · (Y · Z) = (X · Y) · Z (the associative law for multiplication)
3. X + Y = Y + X (the commutative law for addition)
4. X · Y = Y · X (the commutative law for multiplication)
5. [0, 0] + X = X + [0, 0] = X (additive identity)
6. [1, 1] · X = X · [1, 1] = X (multiplicative identity)
7. X · (Y + Z) ⊆ X · Y + X · Z (the subdistributive property).
Example. Moore (1979, page 13) points out that [1, 2](1 − 1) = [1, 2](0) = 0, whereas [1, 2](1) + [1, 2](−1) = [−1, 1].

Remark. From Moore's (Moore, 1959a) implementation of Sunaga (1958) [neither Moore nor Sunaga seems to have been aware of Warmus' earlier work (Warmus, 1956)], he states that

X ◦ Y = {z | z = x ◦ y, x ∈ X, y ∈ Y}, ◦ ∈ {+, −, ×, ÷},

which means that Moore applies the united extension for distinct intervals X and Y. However, Moore abandons this united extension definition and
develops axioms. The axioms lead to a simplification of the operations since one need not account for multiple occurrences, while at the same time it leads to overestimation (which is severe at times). From the beginning, Moore was aware of the problems of overestimation associated with multiple occurrences of the same variable in an expression. Moreover, it is apparent that, from the axiomatic approach, X − X is never 0 unless X is a real number (a zero width interval). Moreover, X ÷ X is never 1 unless X is a real number (a zero width interval).

1. Axiomatic Approach

The axiomatic approach to interval arithmetic considers all instantiations of variables as independent. That is, the Young, Warmus, Sunaga, and Moore axiomatic approach to interval arithmetic is one in which multiple occurrences of a variable in an expression are considered as independent variables. While axiomatic interval arithmetic is quite simple to implement, it leads to overestimations.

Example. Consider f(x) = x(x − 1), x ∈ [0, 1]. Using the axiomatic approach,

[0, 1]([0, 1] − 1) = [0, 1][−1, 0] = [−1, 0]. (5)
However, the interval containing f(x) = x(x − 1) is [−0.25, 0]. This is because the two instantiations of the variable x are taken as independent when they are dependent. The united extension F(A), which is F([0, 1]) = ⋃_{x∈[0,1]} {f(x)} = [−0.25, 0],
was not used. If the calculation were x(y − 1) for x ∈ [0, 1], y ∈ [0, 1], then the tightest interval containing x(y − 1), its united extension, indeed is [−1, 0]. Note that the subdistributivity property does not use the united extension in computing X · Y + X · Z, but instead considers X · Y + W · Z, where W is an independent copy of X. Partitioning the interval variables (that are repeated) leads to a closer approximation to the united extension. That is, take the example above and partition the interval in which x lies.
Example. Consider x(x − 1) again, but with x ∈ [0, 0.5] ∪ [0.5, 1]. This yields

[0, 0.5]([0, 0.5] − 1) ∪ [0.5, 1]([0.5, 1] − 1) (6)
= [0, 0.5][−1, −0.5] ∪ [0.5, 1][−0.5, 0] (7)
= [−0.5, 0] ∪ [−0.5, 0] = [−0.5, 0], (8)
which overestimates the exact range [−0.25, 0] by 0.25, compared with an overestimation of 0.75 when the full interval [0, 1] was used. In fact, for operations that are continuous functions, a reduction in width leads to estimations that are closer to the united extension and, in the limit, to the exact united extension value (Moore, 1962, 1966, 1979; Neumaier, 1990). Other approaches that find ways to reduce the overestimation arising from the axiomatic approach have proved extremely useful; these include the centered, mean value, and slope forms (Hansen, 1992; Kearfott, 1996a; Moore, 1979; Neumaier, 1990, 2004b; Ratschek and Rokne, 1988). More recently, the Taylor models discussed below (Makino and Berz, 2003, 2005) exhibit a high order of convergence and minimization of the dependency illustrated in the above two examples.

2. Interval Arithmetic from the United Extension: Constraint Interval Arithmetic

The power of the axiomatic approach to interval arithmetic is its application simplicity. Its complexity is at most four times that of real-valued arithmetic. However, the axiomatic approach to interval arithmetic leads to overestimations in general because it takes every instantiation of the same variable independently. As seen below, the united extension, when applied to sets of real numbers, is global optimization, which generally is NP-hard. Conversely, simple notions such as

X − X = 0 (9)

and

X ÷ X = 1, 0 ∉ X, (10)
are desirable properties and can be maintained if the united extension is used to define interval arithmetic [as will be seen below and in Lodwick (1999)]. In the context of fuzzy arithmetic, which uses interval arithmetic, Klir (1997) looked at fuzzy arithmetic, which was constrained to account for Eqs. (9) and (10) from a case-based approach. What is given next was developed in Lodwick (1999) independently of Klir (1997) and is more general than the case-based method. Constraint interval arithmetic is
derived directly from the united extension rather than axiomatically or case-based. It is known that applying interval arithmetic to the union of intervals of decreasing width yields tighter bounds on the result that converge to the united extension interval result (Moore, 1962). Of course, for n-dimensional problems, "intervals" are rectangular parallelepipeds (boxes), and as the diameters of these boxes approach 0, the union of the results approaches the correct bound for the expression. Partitioning each of the sides of the n-dimensional box in half has complexity of O(2^n) for each split. Theorems proving convergence to the exact bound of the expression and the rates associated with the subdivision of intervals can be found in Hansen (1992), Kearfott (1996a), Moore (1979), Neumaier (1990, 2004b) or Ratschek and Rokne (1988). What is proposed here is to redefine interval numbers in such a way that dependencies are explicitly kept. The ensuing arithmetic will be called constraint interval arithmetic. This new arithmetic is the derivation of arithmetic directly from the united extension of Strother (1952). An interval number is redefined (Lodwick, 1999) into an equivalent form next as the graph of a function of one variable and two coefficients or parameters.

Definition 4. An interval [x, x̄] is the graph of the real single-valued function X^I(λx), where

X^I(λx) = λx x + (1 − λx)x̄,  0 ≤ λx ≤ 1. (11)

Strictly speaking, in Eq. (11), since the numbers x and x̄ are known (inputs), they are coefficients, whereas λx is varying, although constrained between 0 and 1, hence the name constraint interval arithmetic. Note that Eq. (11) defines a set representation explicitly, and the ensuing arithmetic is developed on sets of numbers. The algebraic operations are defined as follows:

Z = [z, z̄] = X ◦ Y (12)
= {z | z = x ◦ y, ∀x ∈ X^I(λx), y ∈ Y^I(λy), 0 ≤ λx, λy ≤ 1}
= {z | z = (λx x + (1 − λx)x̄) ◦ (λy y + (1 − λy)ȳ), 0 ≤ λx ≤ 1, 0 ≤ λy ≤ 1},

where

z = min{z},  z̄ = max{z},  and ◦ ∈ {+, −, ×, ÷}. (13)
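Eqs. (12)–(13) can be explored numerically by parameterizing each interval as in Eq. (11) and searching over the λ values; the brute-force grid sketch below (illustrative Python; the function names are hypothetical) keeps a single λ per variable, so repeated occurrences of the same interval remain coupled. A real implementation would replace the grid by a global optimizer.

```python
import itertools

# Constraint interval arithmetic by brute force: represent X = [x_lo, x_hi] through
# X^I(lam) = lam*x_lo + (1 - lam)*x_hi, 0 <= lam <= 1, and use one lambda per
# *variable*, so repeated occurrences of the same interval stay coupled.
def constraint_eval(expr, intervals, steps=200):
    """expr: function of real keyword arguments; intervals: dict name -> (lo, hi)."""
    grids = {name: [lo + (hi - lo) * k / steps for k in range(steps + 1)]
             for name, (lo, hi) in intervals.items()}
    values = [expr(**dict(zip(grids, combo)))
              for combo in itertools.product(*grids.values())]
    return min(values), max(values)

# X - X and X / X with a single coupled variable x:
print(constraint_eval(lambda x: x - x, {"x": (1.0, 2.0)}))   # (0.0, 0.0)
print(constraint_eval(lambda x: x / x, {"x": (1.0, 2.0)}))   # (1.0, 1.0)
# x*(x - 1) over [0, 1]: the coupled evaluation recovers about [-0.25, 0],
# whereas axiomatic interval arithmetic gives [-1, 0].
print(constraint_eval(lambda x: x * (x - 1), {"x": (0.0, 1.0)}))
```

Note that the grid search is exponential in the number of interval variables, which is consistent with the remark that follows: the united extension applied to sets of real numbers is a global optimization problem.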
Remark. It is clear from Eq. (13) that constraint interval arithmetic requires a global optimization. When the operations use the same interval, no exceptions are necessary, as in Klir (1997). Using only Eq. (12), we obtain

Z = [z, z̄] = X ◦ X = {z | z = (λx x + (1 − λx)x̄) ◦ (λx x + (1 − λx)x̄), 0 ≤ λx ≤ 1}. (14)

This results in the following properties:

1. Addition of the same interval variable:
X + X = {z | z = (λx x + (1 − λx)x̄) + (λx x + (1 − λx)x̄), 0 ≤ λx ≤ 1}
= {z | z = 2(λx x + (1 − λx)x̄), 0 ≤ λx ≤ 1} = [2x, 2x̄].

2. Subtraction of the same interval variable:
X − X = {z | z = (λx x + (1 − λx)x̄) − (λx x + (1 − λx)x̄), 0 ≤ λx ≤ 1} = 0.

3. Division of the same interval variable, 0 ∉ X:
X ÷ X = {z | z = (λx x + (1 − λx)x̄) ÷ (λx x + (1 − λx)x̄), 0 ≤ λx ≤ 1} = 1.

4. Multiplication of the same interval variable with x < x̄:
X × X = {z | z = (λx x + (1 − λx)x̄) × (λx x + (1 − λx)x̄), 0 ≤ λx ≤ 1}
= {z | z = λx²x² + 2λx(1 − λx)x x̄ + (1 − λx)²x̄², 0 ≤ λx ≤ 1}
= [min{x², x̄², 0}, max{x², x̄², 0}].

To verify that this is the interval solution, note that, as a function of the single variable λx, the product X × X is f(λx) = (x̄ − x)²λx² + 2x(x̄ − x)λx + x², which has a critical point at λx = −x/(x̄ − x).
Thus,

z = min{f(0), f(1), f(−x/(x̄ − x))},  z̄ = max{f(0), f(1), f(−x/(x̄ − x))},

that is, z = min{x², x̄², 0} and z̄ = max{x², x̄², 0},
as is obvious. Of course, if x = x̄, then X × X = x².

5. X(Y + Z) = XY + XZ.

Constraint interval arithmetic is the complete implementation of the united extension, and it provides an algebra that possesses an additive inverse, a multiplicative inverse, and a distributive law.

3. Specialized Interval Arithmetic

Various interval arithmetic approaches have been developed in addition to the axiomatic and united extension approaches. Different representations of intervals have been created and include the development of range and rational arithmetic. These purport to simplify operations and/or obtain more accurate results using arithmetic. Another issue addressed by researchers was how to extend interval arithmetic, called extended interval arithmetic, to handle unbounded intervals that may be entered or result from a division by 0. The general space of improper intervals, which includes extended interval arithmetic, called directed interval arithmetic, was developed subsequently. Next, generalized interval arithmetic and its more recent generalizations, affine arithmetic and Taylor model arithmetic, deal with the problem of reducing the overestimation that characterizes the axiomatic approach to interval arithmetic. Triplex arithmetic and its generalization, quantile arithmetic, were developed to carry more information than an interval carries. These specialized arithmetics are presented next.

a. Interval Arithmetic with Different Representation. Two representations of numbers can be used in interval arithmetic. Range arithmetic is the midpoint/error form of representing an interval. Rational arithmetic as it applies to interval arithmetic is mentioned because of its potential "speed" per unit of work and accuracy as a way to represent floating-point numbers. Therefore, its representation would propagate less overestimation.

Range Arithmetic. Range arithmetic was developed by Aberth (1988, pages 13–25). A range number [Eq. (15)] is really an interval, where

X = m ± ε = [x, x̄],  m = (x + x̄)/2,  ε = (x̄ − x)/2, (15)
and the arithmetic (Aberth, 1988, page 15) is given by:

X1 + X2 = (m1 ± ε1) + (m2 ± ε2) = (m1 + m2) ± (ε1 + ε2), (16)
X1 − X2 = (m1 ± ε1) − (m2 ± ε2) = (m1 − m2) ± (ε1 + ε2), (17)
X1 × X2 = (m1 ± ε1) × (m2 ± ε2) ⊆ (m1 m2) ± (ε1|m2| + ε2|m1| + ε1 ε2), (18)
X1 ÷ X2 = (m1 ± ε1) ÷ (m2 ± ε2) ⊆ (m1/m2) ± (ε1 + |m1/m2| ε2)/(|m2| − ε2). (19)

Using range arithmetic, X − X = 0 ± 2ε = [−2ε, 2ε], and X ÷ X = 1 ± 2ε/(|m| − ε), which means that the problem of repeated variables still remains, although with improved bounds in general.
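The midpoint/error form of Eqs. (15)–(19) translates directly into code; the following sketch (illustrative Python, without the outward rounding a rigorous version needs) implements the four range-arithmetic operations and reproduces the X − X and X ÷ X behavior just noted.

```python
# Range (midpoint +/- error) arithmetic, Eqs. (16)-(19).
class Range:
    def __init__(self, m, e):
        assert e >= 0
        self.m, self.e = m, e

    def __add__(self, o): return Range(self.m + o.m, self.e + o.e)
    def __sub__(self, o): return Range(self.m - o.m, self.e + o.e)
    def __mul__(self, o):
        return Range(self.m * o.m,
                     self.e * abs(o.m) + o.e * abs(self.m) + self.e * o.e)
    def __truediv__(self, o):
        if abs(o.m) <= o.e:                       # divisor range contains 0
            raise ZeroDivisionError("divisor range contains 0")
        q = self.m / o.m
        return Range(q, (self.e + abs(q) * o.e) / (abs(o.m) - o.e))

    def __repr__(self):
        return f"{self.m} +/- {self.e} = [{self.m - self.e}, {self.m + self.e}]"

X = Range(1.5, 0.5)        # the interval [1, 2]
print(X - X)               # 0 +/- 1, i.e. [-1, 1]: the dependency is still lost
print(X / X)               # 1 +/- 2*0.5/(1.5 - 0.5) = 1 +/- 1
```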
Rational Arithmetic. Rational arithmetic is arithmetic on numbers whose representations are fractions. This is particularly useful since rational numbers have simple continued fraction representations. Moreover, there are rational arithmetic chips that use continued fraction representation of numbers (Korenerup and Matula, 1983). This is, in general, a more precise representation of floating-point numbers and computer arithmetic, although this approach has not been an accepted part of computer hardware. Aberth (1978, 1988, Chapter 5) shows how to use rational arithmetic to form more accurate representations for range (interval) arithmetic. To provide a computer interval arithmetic, one simply rounds outwardly all data and all intermediate rational arithmetic calculations. Since the round-down and round-up values are rational, this process is quite simple. In general, the result will be more accurate per available bytes devoted to floating-point number. b. Interval Arithmetic on the Extended Real Number System. Interval arithmetic carried out on the computer must deal with division by 0 (among other challenges). Therefore, the axioms are extended to include arithmetic on values that are infinite so that they must deal with the extended real number system. Two approaches have been developed—extended real interval arithmetic and directed interval arithmetic. Extended Real Interval Arithmetic. Interval arithmetic on the set of extended real numbers is called extended interval arithmetic. Kahan (1968b) was the first to propose the extension to handle, among other things, division by intervals containing 0 and subsequent processing of the resulting intervals. This is especially useful in Newton’s method, and in this context Hansen (1978) uses interval arithmetic over the extended real numbers. A review of the extended real interval system can be found in Walster (1998). INTLAB,
C-XSC, PASCAL-XSC, Sun Microsystems Fortran, and Mathematica support extended interval arithmetic. The most recent approach to extended interval arithmetic is due to Pryce and Corliss (2006), who develop the theory and implementation of containment sets for interval arithmetic. The idea is to consider intervals as sets of real numbers where, in the implementation, interval sets become abstract data types. The arithmetic becomes operations on sets. Directed Interval Arithmetic. Two approaches to extended interval arithmetic were developed. Kahan’s approach dealt with the problem of how to incorporate plus and minus infinity as end points of intervals in interval arithmetic. The approach of Ortolf (1969) and Kaucher (1973, 1980) was to mathematically complete the set of intervals I (R) to its closure I (R) by including what they call nonregular intervals. However, previously, Warmus (1956, 1961) had considered this space and its arithmetic. Alefeld and Herzberger (1983, page 8) state: These intervals are interpreted as intervals with negative width. The point intervals [a, a] are no longer minimal elements with respect to the ordering ⊆. All the structures of I (R) are carried over to I (R) ∪ I (R), and a completion through two improper elements p and −p is achieved. In this manner the division by an interval A = [a, a] ¯ with a ≤ 0 ≤ a, ¯ a = a, ¯ can also be defined. This approach was studied by Gardeñes et al. (1986) and Hanss (2005). Popova (1998) states, Directed interval arithmetic is obtained as an extension of the set of normal intervals by improper intervals and a corresponding extension of the definitions of the interval arithmetic operations. The corresponding extended interval arithmetic structure possesses group properties with respect to addition and multiplication operations and a number of other advantages. c. Interval Arithmetic—Reducing the Effects of Dependencies. Moore recognized from the beginning the problems associated with dependencies and the axiomatic approach. Various approaches have addressed this issue. Three of these are discussed: generalized interval arithmetic, affine arithmetic, and Taylor model arithmetic. Generalized Interval Arithmetic. 1. One of the first attempts to deal with the fact that axiomatic interval arithmetic often yields results that are not sharp (overestimates) is generalized
interval arithmetic (Hansen, 1975). An interval is represented by

X = [x, x̄] = y + [−c, c],  c ≥ 0,  y − c = x and y + c = x̄. (20)

An arbitrary point x ∈ X is expressed as

x = y + ξ,  ξ ∈ [−c, c]. (21)
Observe that Eq. (21) is precisely the range arithmetic representation [Eq. (15)], published 9 years after generalized interval arithmetic. Whereas range arithmetic is an arithmetic on intervals, generalized interval arithmetic is an interval arithmetic that specifically deals with reducing the effects of dependencies. Range arithmetic does not deal explicitly with the issue of dependency. This means that any interval Xk (k > n), depending on the previous intervals j = 1, . . . , n, has the form

Xk = Yk + Σ_{l=1}^{n} ξl Zlk, (22)

where Yk and Zlk are previously computed intervals, k = 1, 2, . . . , l = 1, 2, . . . , n, and ξl ∈ [−cl, cl]. An interval Xk of the form of Eq. (22) is called a generalized interval. The actual interval is

Xk = Yk + Σ_{l=1}^{n} [−cl, cl] Zlk = Yk + [−1, 1] Σ_{l=1}^{n} cl zlk, (23)

where zlk = |Zlk| = max_{z∈Zlk} |z|. The general arithmetic operations are defined in Hansen (1975) as follows.

2. Generalized interval: Addition/Subtraction

Xk = Xi ± Xj = Yi ± Yj + Σ_{l=1}^{n} ξl (Zli ± Zlj).

Let Yk = Yi ± Yj and Zlk = Zli ± Zlj; then Xk = Yk + Σ_{l=1}^{n} ξl Zlk, which is again in the form of generalized interval arithmetic.
3. Generalized interval: Multiplication

Xk = Yi Yj + Σ_{l=1}^{n} ξl (Yi Zlj + Yj Zli) + Σ_{l=1}^{n} Σ_{m=1}^{n} ξl ξm Zli Zmj. (24)

For l = m, replace [−cl, cl]² by [0, cl²], and for l ≠ m replace ξl ξm by ξl[−cm, cm], which means that Eq. (24) can be replaced by

Xk = Yk + Σ_{l=1}^{n} ξl Zlk, (25)

where

Yk = Yi Yj + Σ_{l=1}^{n} [0, cl²] Zli Zlj,
Zlk = Yi Zlj + Yj Zli + Σ_{m=1, m≠l}^{n} [−cm, cm] Zmj = Yi Zlj + Yj Zli + [−1, 1] zil Σ_{m=1, m≠l}^{n} cm zjm,

and zil = |Zil|, zjm = |Zjm|. The result of multiplication of a generalized interval [Eq. (25)] is again a generalized interval.

4. Generalized interval: Division

Xk = Xi/Xj = (Yi + Σ_{l=1}^{n} ξl Zli) / (Yj + Σ_{m=1}^{n} ξm Zmj)
= Yi/Yj + Σ_{l=1}^{n} ξl (Yj Zli − Yi Zlj) / (Yj (Yj + Σ_{m=1}^{n} ξm Zmj))
= Yk + Σ_{l=1}^{n} ξl Zlk,

where

Yk = Yi/Yj,  zmj = |Zmj|,  Zlk = (Yj Zli − Yi Zlj) / (Yj (Yj + [−1, 1] Σ_{m=1}^{n} cm zmj)),
and ξm is replaced by [−cm, cm]. Generalized interval arithmetic has been successfully applied to problems associated with computer graphics of equations and surface rendering (Tupper, 1996).

Affine Arithmetic. Another more recent approach to minimize the effects of overestimation due to dependencies is affine arithmetic (Stolfi et al., 1994). A number x, whose value is subject to uncertainty, has the representation

x = x0 + x1ε1 + · · · + xnεn, (26)

where the xi are coefficients (known, real) and εi ∈ [−1, 1]. That is, xi represents the magnitude of "error" and εi represents the ith uncertainty that is contributing to the total uncertainty represented by the interval. To recover an interval from the affine representation of a number [Eq. (26)], each of the εi is replaced by [−1, 1], so that

X = [x0 − ξ, x0 + ξ], (27)

where ξ = Σ_{i=1}^{n} |xi|. Clearly, if one is given an interval X = [x, x̄], the affine number representation in Eq. (26) is obtained as x = x0 + x1ε1, where x0 = (x + x̄)/2, x1 = (x̄ − x)/2, and ε1 ∈ [−1, 1], where the subscript is used to distinguish the variable x from all other variables.

Example. Suppose x = 4 + 3ε1 + 2ε2 + ε3 and y = 2 + ε1 − 3ε2 + ε4. From this representation, we see that x depends on the uncertainty variables 1, 2, and 3, whereas y depends on the variables 1, 2, and 4. Note that the dependency information is carried forward, where X = [−2, 10] and Y = [−3, 7]. Interval analysis, which does not carry forward dependencies, would obtain a sum of Z = X + Y = [−2, 10] + [−3, 7] = [−5, 17]. Affine arithmetic yields z = x + y = 6 + 4ε1 − ε2 + ε3 + ε4, from which we obtain Z = [−1, 13].

Addition and subtraction of affine numbers is straightforward. However, the challenge comes with multiplication and division in such a way that the interval result is guaranteed to enclose the true result. Moreover, obtaining an affine number for arbitrary transcendental functions, and composites of these, is another challenge. These have been computed for interval arithmetic in such
a way that roundoff errors are incorporated in the final result. The challenge in affine arithmetic, as it is in quantile, range, and generalized interval arithmetic, is how to implement multiplication and division. These matters can be found in Stolfi et al. (1994).

Multiplication (Stolfi et al., 1994):

z = xy = (x0 + Σ_{i=1}^{n} xiεi)(y0 + Σ_{i=1}^{n} yiεi) (28)
= x0 y0 + Σ_{i=1}^{n} (x0 yi + y0 xi)εi + (Σ_{i=1}^{n} xiεi)(Σ_{i=1}^{n} yiεi)
= z0 + Σ_{i=1}^{n} ziεi + Q(ε1, . . . , εn), (29)

where zi = x0 yi + y0 xi, and

Q(ε1, . . . , εn) = (Σ_{i=1}^{n} xiεi)(Σ_{i=1}^{n} yiεi) = Σ_{i=1}^{n} Σ_{j=1}^{n} xi yj εi εj (30)
⊆ Σ_{i=1}^{n} Σ_{j=1}^{n} xi yj [−1, 1][−1, 1] (31)
= [−1, 1] Σ_{i=1}^{n} Σ_{j=1}^{n} |xi yj| (32)
= [q, q̄]. (33)

To obtain an affine representation of Eq. (28) that contains the product,

z = ẑ0 + Σ_{i=1}^{n} ziεi + (q + q̄)/2 + ((q̄ − q)/2)εn+1 = z0 + Σ_{i=1}^{n+1} ziεi,

where z0 = ẑ0 + (q + q̄)/2 and zn+1 = (q̄ − q)/2. Stolfi et al. (1994) go on to successfully and efficiently apply affine arithmetic to computer graphics.

Taylor Model Arithmetic. The Taylor model (Makino and Berz, 2003, 2005) is perhaps the most successful tractable modern approach to deal with dependencies. The Taylor model is a method to do arithmetic on functions.
Since arithmetic of numbers is a function, these methods can be applied to ordinary interval arithmetic. For the general Taylor model, Makino and Berz (2003) state: . . . the Taylor model has the following fundamental properties: 1. The ability to provide enclosures of any function with a finite computer code list by a Taylor polynomial and a remainder bound with a sharpness that scales with order (n + 1) of the width of the domain. 2. The ability to alleviate the dependency problem in the calculation. 3. The ability to scale favorable to higher-dimensional problems. The basic definition is: Definition 5. Let f : D ⊆ Rk → R be a function that is (n + 1) times continuously differentiable on an open set containing the domain D. Let x0 ∈ D and P the nth-order Taylor polynomial of f around x0 . Let I be an interval such that f (x) ∈ P (x − x0 ) + I
for all x ∈ D.
(34)
Then the pair (P , I ) is called an nth-order Taylor model of f around x0 on D. Let T1 = (P1 , I1 ) and T2 = (P2 , I2 ). Then, the Taylor model arithmetic is given as follows. 1. Addition (see Makino and Berz, 2003, page 384): T1 + T2 = (P1 , I1 ) + (P2 , I2 ) = (P1 + P2 , I1 + I2 ). 2. Multiplication (see Makino and Berz, 2003, page 384): T1 × T2 = (P1 , I1 ) × (P2 , I2 ) = (P1·2 , I1·2 ), where P1·2 is the part of the polynomial P1 · P2 up to order n. The interval part is I1·2 = B(Pe ) + B(P1 ) · I2 + B(P2 ) · I1 + I1 · I2 , where Pe is the part of the polynomial P1 · P2 of orders n + 1 up to 2n, and B(P ) denotes a bound of P on the domain D. Makino and Berz go on to state that B(P ) is required to be at least as “sharp” as direct interval evaluation of P (x − x0 ) on D. The rules for subtraction and division are clear. Remark. When the polynomials are simply numbers, P1 = x, P2 = y, I1 , and I2 represent the interval bounds on the roundoff error. As such, the Taylor model for numbers resembles Fischer’s (1958) approach. Remark. When evaluating functions and doing arithmetic or analysis with functions, it is clear that the Taylor model is an excellent approach since
overestimation on dependencies is lessened, if not eliminated, up to the precision of the floating-point representation, provided the computer program code list is of reasonable size. It is also clear that the Taylor model is one that encloses results and thus is able to verify (formally defined in the sequel). A downloadable software package for the Taylor model is available at http://www.beamtheory.nscl.msu.edu/cosy.

d. Interval Arithmetic—The Carrying of More Uncertainty Information. Intervals carry only the bound information on the uncertainty they represent. Two methods of carrying more uncertainty information, short of doing arithmetic on distributions (Section IV), are presented next. Triplex arithmetic carries along a "central" value, whereas quantile arithmetic (a generalization of triplex arithmetic) carries an arbitrary but finite amount of intermediate information about the distribution of the uncertainty that lies between the end points of the interval.

Triplex Arithmetic. Triplex arithmetic (Nickel, 1969) is a way to carry more information about the uncertainty beyond the bounds that are represented by the end points of the interval (the end points of the support if it is a distribution) by keeping track of a main value within the interval in addition to its end points. According to Nickel (1969), triplex arithmetic started as a project initiated in 1966 at the University of Karlsruhe to develop a compiler and to demonstrate its usefulness for solutions to problems in numerical analysis. Three-valued set theory has also been studied by Klaua (1969) and Jahn (1980). The presentation here is a synopsis of Nickel (1969).

A triplex number is X = [x, x̃, x̄], x ≤ x̃ ≤ x̄, where [x, x̄] is the interval and x̃ is called the main value. The main value could be an average value if this is known. The arithmetic is straightforward in the sense that

Z = X ◦ Y = [z, z̃, z̄], where ◦ ∈ {+, −, ×, ÷}, (35)
[z, z̄] = [x, x̄] ◦ [y, ȳ] (obtained from interval arithmetic), (36)
z̃ = x̃ ◦ ỹ. (37)
In statistical arithmetic, if the main value is to be interpreted as something like a mean or mode, an issue arises when the ◦ in Eq. (37) is multiplication or division. Thus, the interpretation of the resulting main value from a statistical point of view using Eq. (37), is problematic. That is, the semantic value of the resulting “main value” in multiplication and division, as computed by Eq. (37), is not clear.
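A minimal triplex-arithmetic sketch (illustrative Python, not from the original text): the bounds follow ordinary interval arithmetic as in Eq. (36), and the main value is combined pointwise as in Eq. (37). As just noted, the statistical meaning of the resulting main value under × and ÷ is questionable; the code simply follows the definition.

```python
# Triplex arithmetic, Eqs. (35)-(37): an interval plus a "main value" inside it.
def interval_op(a, b, c, d, op):
    # End-point evaluation is valid for +, -, x, and for / when 0 is not in [c, d].
    cands = [op(a, c), op(a, d), op(b, c), op(b, d)]
    return min(cands), max(cands)

def triplex(x, y, op):
    (xl, xm, xu), (yl, ym, yu) = x, y
    zl, zu = interval_op(xl, xu, yl, yu, op)    # Eq. (36)
    zm = op(xm, ym)                             # Eq. (37)
    return zl, zm, zu

X = (1.0, 1.2, 2.0)     # [1, 2] with main value 1.2
Y = (3.0, 3.5, 5.0)     # [3, 5] with main value 3.5
print(triplex(X, Y, lambda u, v: u + v))    # (4.0, 4.7, 7.0)
print(triplex(X, Y, lambda u, v: u * v))    # (3.0, 4.2, 10.0)
```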
Quantile Arithmetic. Quantile arithmetic (Dempster, 1969, 1974) is a way to carry more information about the uncertainty bound in a probabilistic and statistically faithful way than triplex arithmetic. While it is more complex (as will be seen), it does have a well-defined probabilistic and statistical semantics. In fact, triplex arithmetic can be represented by quantile arithmetic. In particular, quantile arithmetic approximates distributions whose support is an interval (which can be infinite for extended interval arithmetic), whose value lies between the given lower and upper bounds, and whose error at each arithmetic operation is independent. In Dempster (1969, 1974), a three-point arithmetic is used to approximate a discrete distribution, although there is nothing to prevent using a finer approximation except computational time considerations; this presentation gives a synopsis of these results using the three-point approximation, with instruction on how to do arithmetic on a finer mesh. In triplex arithmetic, a main value is carried. Quantile arithmetic accounts for the uncertainty within a given interval and the manner in which this uncertainty within the interval propagates. The problem of how the uncertainty is distributed is especially problematic when the uncertainty has a large support and the bulk of the uncertainty is amassed around a single value, that is, it has a narrow dispersion and a long tail. Developing an arithmetic begins not only with the assumption of independence, but also with the assumption that the variables are represented by an absolutely continuous distribution or a discrete distribution. For three-point quantile arithmetic, consider a variable X whose three-point approximation, denoted X̂, has the distribution

fX̂(x) = α if x = x1, where P(X ≤ x1) = α;
      = 1 − 2α if x = x2, where P(X ≤ x2) = 1/2;
      = α if x = x3, where P(X ≤ x3) = 1 − α;
      = 0 otherwise, (38)

with 0 ≤ α ≤ 1/2. From the construction of fX̂, we have x ≤ x1 ≤ x2 ≤ x3 ≤ x̄, where [x, x̄] is the support of the distribution (where it is understood that we are using the extended real line in the case of infinite support). The values x1 ≤ x2 ≤ x3 are, respectively, the αth quantile, the 1/2th quantile (median), and the (1 − α)th quantile of the absolutely continuous random variable X with support [x, x̄]. Let Sα be the space of all independent random variables of the form X̂, that is, whose distribution is given by Eq. (38). The parameter α is typically fixed a priori. The more concentrated the random variables in Sα, the larger α may be, and the choice is based on the probabilistic interpretation that is applicable to the problem at hand. According to Dempster (1969, 1974), the choice of α = 1/20 is a reasonable value since 90% of the probability will then lie between x1 and x3. It is clear that S_{1/2} is isomorphic to R, and S0 is the
set of triplex numbers, where the median is the main value. In what follows, it is assumed that 0 ≤ α < 1/2, and we drop the "hat" designation of the approximate random variables.

Let ◦ ∈ {+, −, ×, ÷}, so that Z = X ◦ Y consists of a nine-point distribution, where the resultant support is handled by interval arithmetic, with discrete density

f_Z(z) = p_i p_j if z = x_i ◦ y_j for i, j = 1, 2, 3, and f_Z(z) = 0 otherwise,

where p1 = α, p2 = 1 − 2α, p3 = α, and (x1, x2, x3) and (y1, y2, y3) are the defining triplets for X and Y. We must approximate Z by a triplet (w1, w2, w3) for it to be a member of S_α. To do so,

1. Order the nine values zk = x_i ◦ y_j, k = 1, . . . , 9 (z1 ≤ · · · ≤ z9), and denote their associated probabilities qk.
2. Take w1 to be the largest zk for which q1 + · · · + qk ≤ α.
3. Take w2 to be the smallest zk for which q1 + · · · + qk ≥ 1/2.
4. Take w3 to be the smallest zk for which q1 + · · · + qk ≥ 1 − α.

A real number r has a distribution R in S_α of

f_R(x) = 1 if x = r, and f_R(x) = 0 otherwise,

so that R = (r, r, r). Scalar multiplication becomes Z = R × X = (r × x1, r × x2, r × x3). Powers are also easily computed as follows: Z = X^n = (w1, w2, w3), where

w1 = min{x1^n, x2^n, x3^n},  w2 = x2^n,  w3 = max{x1^n, x2^n, x3^n}.

Readers can find examples in Dempster (1969, page 113; and 1974, page 188). Quantile arithmetic is commutative but not associative. The real numbers 0 and 1 are the additive and multiplicative identities. However, in general, quantile arithmetic is not even subdistributive (as interval arithmetic is). Moreover, in general, quantile arithmetic is not inclusion monotonic for all arithmetic operations. On the other hand, if f(x1, . . . , xn) is a rational expression, the corresponding quantile expression F(X1, . . . , Xn) has the following important enclosure property:

f(x1, . . . , xn) ⊆ F(X1, . . . , Xn).

The above is related to distribution arithmetic, which is the topic of Section IV.
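Before moving on, here is a small numerical sketch of the three-point quantile arithmetic just described: the nine combinations are ordered and the result is reduced back to a triplet in S_α by rules 1–4 above. The function names are illustrative, and no attempt is made to handle division by triplets whose support contains zero.

```python
# Three-point quantile arithmetic: combine two triplets in S_alpha, order the
# nine results, and pick (w1, w2, w3) by accumulating the probabilities
# p1 = alpha, p2 = 1 - 2*alpha, p3 = alpha, as in steps 1-4 above.
import operator

def quantile_op(x, y, op, alpha=0.05):
    p = [alpha, 1.0 - 2.0 * alpha, alpha]
    pairs = [(op(xi, yj), p[i] * p[j])
             for i, xi in enumerate(x) for j, yj in enumerate(y)]
    pairs.sort(key=lambda t: t[0])                 # step 1: order z_k, carry q_k

    def quantile(level, largest_below=False):
        acc, chosen = 0.0, pairs[0][0]
        for z, q in pairs:
            if largest_below:
                if acc + q <= level + 1e-12:       # step 2: largest z_k with sum <= level
                    acc, chosen = acc + q, z
                else:
                    break
            else:
                acc, chosen = acc + q, z           # steps 3, 4: smallest z_k with sum >= level
                if acc >= level - 1e-12:
                    break
        return chosen

    return (quantile(alpha, largest_below=True),
            quantile(0.5),
            quantile(1.0 - alpha))

X = (1.0, 2.0, 3.0)       # alpha-, median, (1 - alpha)-quantiles of X
Y = (0.5, 1.0, 4.0)
print(quantile_op(X, Y, operator.add))   # approximate triplet for X + Y
print(quantile_op(X, Y, operator.mul))   # approximate triplet for X * Y
```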
e. Other Interval Arithmetics. A brief mention is made of other types of interval arithmetics and an interval-like arithmetic for the sake of completeness. These were developed to deal with dynamic modeling problems. Ellipsoid Arithmetic. The ellipsoidal arithmetic of Neumaier (1993) is based on approximating enclosing affine transformations of ellipsoids that are again contained in an ellipsoid. The focus of the article is in enclosing solutions to dynamic systems models where the wrapping effect associated with interval (n-dimensional hyperboxes, n ≥ 2) enclosures may severely hamper their usefulness since boxes parallel to the axes are not the optimal geometric shape to minimize bounds (Guderley and Keller, 1972). A second focus of Neumaier’s article is enclosing confidence limits. Kahan (1968a) shows how to compute the “tightest” ellipsoid enclosure of the intersection of two ellipsoids. These tightest ellipsoid enclosures are the underlying basis of the approximations developed in Neumaier (1993). It is clear that computing with ellipsoids is not simple; therefore, a simple approximation is necessary if the method is to be useful. While the sum and product of ellipses are not found explicitly worked out by Neumaier, they are implicit. Enclosing the sum is straightforward. The difference, product, and quotient require approximations. Variable Precision Interval Arithmetic. Variable precision interval arithmetic [Ely (1993a), Moore (1992), and more recently Revol and Rouillier (2005) and Schulte and Swartzlander (2000)] was developed to enclose solutions to problems in computational mathematics requiring more precision than afforded by the usual floating-point arithmetic (single and double precision, for example). A problem in this category is wind shear (vortex) modeling (Ely and Baker, 1993b). A specialized interval arithmetic has been developed both in software (Ely, 1993a) and in hardware (Schulte and Swartzlander, 2000). The Taylor model arithmetic (Makino and Berz, 2003, 2005) may be considered as a variable precision interval arithmetic. 4. Comparison Between the Axiomatic and Extension Principle Approach to Interval Arithmetic The axiomatic approach to interval arithmetic considers an interval as a number with two components, whereas constraint interval arithmetic considers an interval as a set. The set point of view is the one taken by interval arithmetic that uses containment sets (Pryce and Corliss, 2006). In considering an interval as a number, interval arithmetic defines the operations axiomatically. Axiomatic interval arithmetic is simple and straightforward since it is defined through real number operations that are no less than twice, and at most four times, more complex than the corresponding real number operations. What
follows is an arithmetic that does not have additive or multiplicative inverses and is subdistributive, potentially resulting in overestimation. Exponential complexity arises in attempting to reduce overestimations. An interval considered as a set leads to an arithmetic defined through global optimization of the united extension function of the arithmetic operations. Thus, constraint interval arithmetic requires a procedure rather than an arithmetic operation. The complexity is explicit at the outset and potentially NP-Hard. Nevertheless, the algebraic structure of constraint interval arithmetic not only possesses additive and multiplicative inverses, but is also distributive. It may not be easily implemented because each arithmetic operation is a global optimization. Extended interval arithmetic and (nonstandard) directed interval arithmetic add axioms so that interval arithmetic can operate on the extended real number system. Generalized interval arithmetic, affine interval arithmetic, and Taylor model arithmetic deal with the dependency problem. Triplex arithmetic and quantile arithmetic carry more information in their representation so that the propagated value that results from arithmetic carries more information than just the support (end points of the interval). The specialized arithmetics (ellipsoid, variable precision, range, and rational) deal with a different representation of uncertainty. The current implementations and uses of ellipsoid and variable precision arithmetics are tailored to deal with specific problem domains.

C. Enclosure and Verification

Enclosure and verification methods are approaches to problems in computational mathematics in which solutions are returned with automatically computed (computer-generated) bounds (enclosures). If the enclosure is nonempty, the goal is to verify existence and uniqueness where possible. Three different approaches to enclosure methods are presented here:

1. Range of a function methods compute an upper bound to the maximum and a lower bound to the minimum of a continuous function by using rounded interval arithmetic (Alefeld, 1990; Hansen, 1980, 1992; Makino and Berz, 2003, 2005; Ratschek and Rokne, 1988); a small sketch of this approach follows the list below.
2. Epsilon inflation methods (Kaucher and Rump, 1982) compute an approximate solution, inflate the approximation to form an interval, and compute the range according to the above.
3. Defect correction methods (Böhmer et al., 1984) compute an approximate inverse to the problem. If the approximate inverse composed with the given function is contractive, then iterative methods are guaranteed to converge to a solution, and mathematically correct error bounds on the solution can be computed.
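As referenced in item 1, here is a minimal sketch of range enclosure by the natural interval extension on a subdivision of the domain. Outward (directed) rounding is omitted, so the bounds are valid only up to floating-point error; the Interval class and the test function are illustrative assumptions.

```python
# Sketch of range enclosure: evaluate the natural interval extension of an
# expression on subdivided boxes and take the union of the results.  A
# verified implementation would round every end point outward.

class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = min(lo, hi), max(lo, hi)

    def __add__(self, o):
        o = _as_interval(o)
        return Interval(self.lo + o.lo, self.hi + o.hi)
    __radd__ = __add__

    def __sub__(self, o):
        o = _as_interval(o)
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def __rsub__(self, o):
        return _as_interval(o) - self

    def __mul__(self, o):
        o = _as_interval(o)
        c = [self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi]
        return Interval(min(c), max(c))
    __rmul__ = __mul__

    def __repr__(self):
        return f"[{self.lo:.6f}, {self.hi:.6f}]"

def _as_interval(x):
    return x if isinstance(x, Interval) else Interval(x, x)

def range_enclosure(f, lo, hi, pieces=64):
    """Union of natural-extension evaluations over a uniform subdivision."""
    width = (hi - lo) / pieces
    parts = [f(Interval(lo + i * width, lo + (i + 1) * width)) for i in range(pieces)]
    return Interval(min(p.lo for p in parts), max(p.hi for p in parts))

f = lambda x: x * (1 - x)          # dependency: x occurs twice
print(range_enclosure(f, 0.0, 1.0, pieces=4))    # loose: about [0, 0.375]
print(range_enclosure(f, 0.0, 1.0, pieces=256))  # tight: approaches [0, 0.25]
```

Finer subdivisions shrink the overestimation caused by the dependency, which is exactly the exponential-cost behavior discussed next.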
The naive approach to computing the range of a rational function is to replace every algebraic operation by axiomatic interval arithmetic operations. This works in theory for continuous functions with unions of smaller and smaller boxes whose diameters approach 0. However, this approach has exponential complexity. Authors have found a variety of methods to obtain the range of a function (see Hansen, 1992; Makino and Berz, 2003, 2005; Neumaier, 2004b). The meaning of enclosure and verification in the context of interval analysis is discussed next.

Definition 6. The enclosure of a set of real numbers (real vectors) Y is a set of real numbers (real vectors) X such that Y ⊆ X. In this case, X encloses Y. The set X is called the enclosing set.

Enclosure makes sense when Y is an unknown for which bounds on its values are sought. For example, the set Y could be the set of solutions to a mathematical problem. In the case of interval analysis over R, the enclosing set X is a computed interval. Typically, approximating algorithms return a real number (vector) approximation x̃ as the computed value of the unknown solution y with no sense of the quality of the solution, that is, its error bounds. The idea of enclosure is that mathematically valid computed error bounds, Y ⊆ X = [x, x̄], on the solution are provided. If the approximation to the solution is x̃ = (x + x̄)/2, the maximal error is guaranteed to be error_max = (x̄ − x)/2. If we are dealing with functions, there are only two pertinent cases.

1. The first is the enclosure of the range of a function in a box, that is, Y = {f(x) | x ∈ Domain} ⊆ X, where X is a box.
2. The second case is pointwise enclosure, that is, [g(x), h(x)] encloses the function f(x) pointwise if g(x) ≤ f(x) ≤ h(x) ∀x ∈ Domain. The methods to (efficiently) obtain g(x) and h(x) are found in Section IV.

Researchers do not give a definition to "enclosure methods," since the word enclosure itself seems to denote its definition. In fact, Alefeld (1990) states,

In this paper we do not try to give a precise definition of what we mean by an enclosure method. Instead we first recall that the four basic interval operations include the range of values of rational functions. Using more appropriate tools, the range of more general functions can be included. Since all enclosure methods for solution of equations which are based on interval arithmetic tools are finally enclosure methods for the range of some function, we concentrate ourselves on methods for the inclusion of the range of function.
There is an intimate relation between enclosure and inclusion of the range of functions. However, enclosure for this study is more general than that to which Alefeld (1990) limits himself, since we deal with epsilon inflation and defect correction methods in addition to finding the range of a function. The concept of verification for this study is restricted to the context of computed solutions to problems in continuous mathematics. Verification is defined next.

Definition 7. Verification of solutions to a problem in continuous mathematics of Rn is the construction of a box X that encloses the solutions of the problem in a given domain where, for X ≠ ∅, at least one solution exists, and for X = ∅, no solution exists in the given domain of the problem.

Thus, verification includes the existence of solutions and the computability of enclosures. In particular, when the construction of the verified solution is carried out on a computer, the enclosures are mathematically valid enclosures whose end points are floating-point numbers. Note that even if a mathematical analysis results in X ≠ ∅, X may still fail to contain a solution. For example, the computed range of a function may contain 0, yet the equation f(x) = 0 may have no solution. Thus, enclosure may not mean verification. The literature often uses the term validation to mean what we have defined as verification.

Methods that compute verified solutions and verify uniqueness are also called E-methods by Kaucher and Rump (1982), and these methods are applied to solutions of fixed-point problems f(x) = x. The authors develop methods to solve linear equations by E-methods. Many authors, in the context of verifying solutions to equations, use the word proof (see, for example, Section 2 of Kearfott, 1996b). While the mathematical verification of existence (and perhaps uniqueness) is a type of proof, for this monograph, the mathematical confirmation that the hypotheses of a theorem (say the Brouwer fixed-point theorem) hold is what we mean by verification. Nevertheless, Kearfott (1996b) states on page 3,

A powerful aspect of interval computations is tied to the Brouwer fixed-point theorem.

Theorem A (Brouwer fixed-point theorem—see any elementary text on real analysis or Neumaier (1990), page 200). Let D be a convex and compact subset of Rn with int(D) ≠ ∅. Then every continuous mapping G : D → D has at least one fixed point x* ∈ D, that is, a point with x* = G(x*).

The Brouwer fixed-point theorem combined with interval arithmetic enables numerical verification of existence of solutions to linear and
nonlinear systems. The simplest context in which this can be explained is the one-dimensional interval Newton method. Suppose f : x = [x, x̄] → R has a continuous first derivative on x, x̌ ∈ x, and f′(x) is a set that contains the range of f′ over x (such as when f′ is evaluated at x with interval arithmetic). Then the operator

N(f; x, x̌) = x̌ − f(x̌)/f′(x)   (39)
is termed the univariate interval Newton method. . . . Applying the Brouwer fixed-point theorem in the context of the univariate interval Newton method leads to:

Theorem B. If N(f; x, x̌) ⊂ x, then there exists a unique solution to f(x) = 0 in x.

Existence in Theorem B follows from Miranda's theorem, a corollary of the Brouwer fixed-point theorem. Three types of verification occur in practice. These are, as mentioned, (1) enclosure of the range of a function or global optimization, (2) epsilon inflation, and (3) defect correction.

1. Enclosure of the Range of a Function

The enclosure of the range of a function using interval arithmetic most often assumes that the function is continuous. Thus, as long as rounded interval arithmetic is used, the resulting enclosure is verifiably correct (Hansen, 1992; Moore, 1966; Neumaier, 1990). Uniqueness can also be verified mathematically on a computer using methods outlined in Hansen (1992), Kearfott (1996a), Makino and Berz (2003, 2005), Neumaier (1990), or Neumaier (2004b). This article does not elaborate further on interval methods to obtain the range of a function, since these methods are well represented in the literature, except to point out that interval arithmetic and interval analysis have been used to compute tight constraints for constraint propagation in artificial intelligence systems. The interfaces between constraint propagation and interval analysis can be found in Lodwick (1989). It is noted that more modern methods for global optimization do use constraint propagation methods such as those found in Lodwick (1989).

2. Epsilon Inflation

Epsilon inflation methods are approaches for the verification of solutions to the problem f(x) = 0 using two steps: (1) application of a usual numerical method to solve f(x) = 0 to obtain an approximate solution x̂,
= [x̂ − ε, x̂ + ε], and (2) inflation of x̂ to obtain an approximate interval X̂ and application of interval methods using rounded interval arithmetic (for example, the interval Newton method) to obtain an enclosure. Mayer (1996, page 98) outlines how to solve problems through E-methods using epsilon-inflation techniques to solve f(x) = 0, where the function is assumed to be continuous over its defined domain. The idea is to solve the problem on a closed and bounded subset of its domain using the following steps:

1. Transform the problem into an equivalent fixed-point problem, f(x) = 0 ⇔ g(x) = x.
2. Solve the fixed-point problem for an approximate solution x̃ using a known algorithm. That is, g(x̃) ≈ x̃.
3. Identify an interval function enclosure to the fixed-point representation of the problem, g(x) ∈ [G]([x]) ∀x ∈ [x], where [x] is in the domain of both g and [G]. For example,

[G]([x]) = [ min_{y∈[x]} G(y), max_{y∈[x]} G(y) ].

4. Verify

[G]([x]) ⊆ interior([x])

by doing the following:
(a) [x]_0 := [x̃, x̃]
(b) k := −1
(c) repeat
    i. k := k + 1
    ii. choose [x]_k^ε such that [x]_k ⊆ interior([x]_k^ε)—this is the epsilon-inflation, where [x]^ε is defined below
    iii. [x]_{k+1} := [G]([x]_k^ε)
(d) until [x]_{k+1} ⊆ interior([x]_k^ε) or k > k_max

A variety of methods can be used to pick the epsilon-inflation. In particular, Mayer (1995) uses the following:

[x]^ε = (1 + ε)[x] − ε[x] + [−η, η],

where η is the smallest floating-point number (machine epsilon). Another approach is as follows:

[y] = [y, ȳ] := (1 + ε)[x] − ε[x],
[x]^ε := [pred(y), succ(ȳ)],
where pred(y) denotes the first floating-point number equal to or less than y (round down) and succ(ȳ) denotes the first floating-point number equal to or greater than ȳ (round up). The value ε = 0.1 has been used as an initial guess.

3. Defect Correction

Defect correction methods (Böhmer et al., 1984) solve the fixed-point problem f(x) = x by computing an approximate inverse in such a way that the approximate inverse acting on the original operator is contractive. This approach is then used in conjunction with verification (Kaucher and Rump, 1982), for example, together with the epsilon-inflation and/or range enclosure outlined above. The general defect method as stated by Böhmer et al. (1984, page 3):

Solve F z = y,
(40)
where F : D ⊂ E → D̂ ⊂ Ê is a bijective continuous, generally nonlinear operator; E, Ê are Banach spaces. The domain and range are defined appropriately so that for every ỹ ∈ D̂ there exists exactly one solution of F z = ỹ. The (unique) solution to Eq. (40) is denoted z*. Assume that Eq. (40) cannot be solved directly, but the defect (also called the residual in other contexts)

d(z̃) := F z̃ − y
(41)
may be evaluated for "approximate solutions" z̃ ∈ D. Further assume that the approximate problem

F̃ z = ỹ
(42)
can be readily solved for ỹ ∈ D̂. That is, we can evaluate the solution operator G̃ of Eq. (42). G̃ : D̂ → D is an approximate inverse of F such that (in some approximate sense)

G̃ F z̃ = z̃  for z̃ ∈ D   (43)

and

F G̃ ỹ = ỹ  for ỹ ∈ D̂.   (44)
Assume that an approximation z̃ ∈ D to z* is known and the defect d(z̃) [Eq. (41)] has been computed. There are, in general, two ways to compute another (hopefully better) approximation z̄ from z̃ by solving Eq. (42):
1. Compute a change Δz in Eq. (42) with the right-hand side being the defect, d(z̃), and then use Δz as a correction for z̃. That is,

z̄ := z̃ − Δz = z̃ − [G̃(y + d(z̃)) − G̃y],
z̄ := z̃ − G̃ F z̃ + G̃ y.   (45)

This assumes that the approximate inverse, G̃, is linear, that is, G̃(y + d(z̃)) = G̃y + G̃(F z̃ − y) = G̃ F z̃.

2. Use the known approximate solution z̃ in Eq. (42) to compute ỹ. Now change this value by the defect to obtain ȳ = ỹ − d(z̃). Use the approximate inverse and solve using ȳ. That is, ȳ := ỹ − d(z̃) = ỹ − (F z̃ − y) = ỹ − F G̃ ỹ + y, since ỹ = F̃ z̃, so that from Eq. (43) G̃ ỹ = G̃ F̃ z̃ = z̃, that is, F z̃ = F G̃ ỹ. Now, the new approximation becomes

z̄ = G̃ ȳ = G̃[(F̃ − F)z̃ + y],   (46)

where again, we must assume that the inverse operator G̃ is linear.

The success of the defect correction, step 1 [Eq. (45)] or step 2 [Eq. (46)], depends on the contractivity of the operators (I − G̃F) : D → D or (I − FG̃) : D̂ → D̂, respectively, since Eq. (45) implies z̄ − z* = (I − G̃F)z̃ − (I − G̃F)z*, whereas Eq. (46) implies ȳ − y* = (I − FG̃)ỹ − (I − FG̃)y*. The associated iterative algorithm is (Stetter, 1978):

DEFECT CORRECTION 1 [Eq. (45)]

z_{k+1} = z_k − G̃ F z_k + G̃ y   (47)

DEFECT CORRECTION 2 [Eq. (46)]

y_{k+1} = y_k − F G̃ y_k + y,
z_k = G̃ y_k,
z_{k+1} = G̃[(F̃ − F)z_k + y].   (48)
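To illustrate the first defect correction iteration, Eq. (47), here is a small sketch for a linear problem F z = y, where G̃ is taken as the exact inverse of an "easy" nearby operator (the diagonal part of F). The matrices and names are illustrative assumptions; convergence relies on the contractivity of I − G̃F.

```python
# Sketch of defect correction 1, Eq. (47): z_{k+1} = z_k - G~ F z_k + G~ y,
# for a linear problem F z = y, with G~ the inverse of the diagonal part of F.
import numpy as np

F = np.array([[4.0, 1.0],
              [2.0, 5.0]])
y = np.array([1.0, 2.0])

F_approx = np.diag(np.diag(F))        # the readily solvable nearby problem
G_approx = np.linalg.inv(F_approx)    # its solution operator, an approximate inverse of F

z = np.zeros(2)                       # initial approximation
for k in range(30):
    defect = F @ z - y                # d(z_k) = F z_k - y, Eq. (41)
    z = z - G_approx @ defect         # same as z_k - G~ F z_k + G~ y

print("computed z:", z)
print("residual  :", np.linalg.norm(F @ z - y))   # near zero
```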
D. Algorithms and Software

The practical issues of using intervals to solve mathematical problems must include the following: (1) how to handle dependencies (repeated occurrences of the same variable), (2) how to obtain enclosures that are mathematically correct when carried out on a particular computer, and (3) efficient algorithms and computer structures in software systems. Methods have been presented that deal with dependencies, enclosures, verifiability, and computability. For example, iterative methods are particularly well suited to deal with computability problems. Approximate global optimization techniques, even for NP-Hard problems, can obtain good approximations using constraint propagation and other interval techniques (Neumaier, 2004b). As mentioned in Corliss (2004, pages 35–36), quality production packages that incorporate some or all of the above issues include:

1. INTLAB, which is a Matlab interval analysis system that can be downloaded from http://www.ti3.tu-harburg.de/~rump/intlab/index.html
2. PROFIL/BIAS (Programmer's Runtime Optimized Fast Interval Library/Basic Interval Arithmetic Subroutines) for Matlab from TU Hamburg-Harburg (http://www.ti3.tu-harburg.de/Software/PROFILEnglisch.html).
3. Fortran 95 and C++ from Sun Microsystems (wwws.sun.com/software/sundev/suncc/index.html). Moreover, interval arithmetic as a native type has been on SPARC/Solaris since about 2000. It is free (Sun Studio 11 release) and can be obtained at http://developers.sun.com/prodtech/cc/index.jsp. Documentation can be downloaded from http://docs.sun.com/app/docs/doc/819-3695. The C++ documentation can be downloaded from http://docs.sun.com/app/docs/doc/819-3696.
4. C-XSC and Pascal-XSC packages from TU Karlsruhe at www.uni-karlsruhe.de/~iam/html/language/xsc-sprachen.html
5. FI-LIB; its authors, Hofschuster and Krämer, state that FI_LIB is a "fast interval library (version 1.2) in ANSI-C . . ." and that the main features of the library, called fi_lib (fast interval library), are:
(a) Fast table look-up algorithms are used for the basic functions, such as arctan, exp, or log.
(b) All elementary function routines are supplied with reliable relative error bounds of high quality. The error estimates cover rounding errors, errors introduced by not exactly representable constants, as well as approximation errors (best approximations with reliable error bounds).
(c) All error estimates are reliable worst-case estimates, which have been derived using interval methods.
(d) We only insist on faithful computer arithmetic. The routines do not manipulate the rounding mode of basic operations (setting the rounding mode may be rather expensive).
(e) No higher-precision internal data format is used. All computations are done using the IEEE double format (64-bit).
(f) A C++ interface for easier use is also supplied with the library.
(g) For good portability, all programs are written in ANSI-C.
6. FILIB++ (http://www.math.uni-wuppertal.de/wrswt/software/filib.html), the FILIB++ Interval Library, whose authors are Lerch, Tischler, Wolff von Gudenberg, Hofschuster, and Krämer. They say that, "filib++ is an extension of the interval library filib. The most important aim of the latter was the fast computation of guaranteed bounds for interval versions of a comprehensive set of elementary functions. filib++ extends this library in two aspects. First, it adds a second mode, the "extended" mode, that extends the exception-free computation mode using special values to represent infinities and NotaNumber known from the IEEE floating-point standard 754 to intervals. In this mode so-called containment sets are computed to enclose the topological closure of a range of a function defined over an interval. Second, state-of-the-art design uses templates and trait classes in order to get an efficient, easily extendable and portable library, fully according to the C++ standard."
7. Maple (see http://www.maplesoft.com) and Mathematica (see http://www.wri.com) support interval structures. Wagon (Bornemann et al., 2004) and colleagues solved a suite of challenge problems using Mathematica's interval subroutines.
8. COSY, from Michigan State University, is a language for verified global optimization and ordinary differential equation solving, based on intervals as well as Taylor models with remainder bounds (see http://www.beamtheory.nscl.msu.edu/cosy).
9. The interval analysis website (http://www.cs.utep.edu/interval-comp/intlang.html) has a list of languages that support interval data types.
10. Global optimization systems that use interval analysis are found in the software developed by the COCONUT project. "COCONUT is an IST Project funded by the European Union. Its goal is to integrate the currently available techniques from mathematical programming, constraint programming, and interval analysis into a single discipline, to get algorithms for global optimization and continuous constraint satisfaction problems that outperform the current generation of algorithms based on using only techniques from one or two of the traditions." (See http://www.mat.univie.ac.at/users/neum/public_html/glopt/coconut/.) Another global optimization solver, GlobSol, can be downloaded from http://interval.louisiana.edu/GlobSol/download_GlobSol.html.
11. UniCalc (see http://archives.math.utk.edu/software/msdos/miscellaneous/ unicalc/) is another system that uses intervals. Its website states that, “UniCalc is a solver for mathematical problems using novel mathematical tools for calculations. This approach makes it possible to solve principally new classes of problems and to obtain new results for problems solved with standard methods of calculations. UniCalc is intended to solve direct and inverse problems represented by systems of algebraic and differential equations, inequalities, and logic expressions. The system may be overdetermined or underdetermined, and the system’s parameters can be given imprecisely. UniCalc allows calculations to be performed both with integers and real variables that may be mixed in the system. As a result of calculations, a set of intervals containing all real solutions of the system is found.”
III. FUZZY SET THEORY

Fuzzy set and possibility theory were defined and developed by Zadeh beginning in 1965, with subsequent articles in 1968 and 1975. As is now well known, the idea was to mathematize and develop analytical tools to solve problems whose uncertainty is broader in scope than that addressed by probability theory. Classical mathematical sets—for example, a set A—have the property that either an element x ∈ A or x ∉ A, but not both. There are no other possibilities for classical sets, which are also called crisp sets. An interval is a classical set. Zadeh's idea was to relax this "all-or-nothing" membership in a set to allow for grades of belonging to a set. When grades of belonging are used, a fuzzy set ensues. To each fuzzy set A, Zadeh associated a real-valued function μA(x), called a membership function, defined for all x in the domain of interest, the universe Ω, whose range is in the interval [0, 1], and which describes, or quantifies, the degree to which x belongs to A. For example, if A is the fuzzy set "middle-aged person," then a 15-year-old has a membership value of zero, whereas a 35-year-old might have a membership value of 1, and a 40-year-old might have a membership value of 1/2. That is, a fuzzy set is a set for which membership in the set is defined by its membership function μA(x) : Ω → [0, 1], where a value of 0 means that an element does not belong to the set A with certainty and a value of 1 means that the element belongs to the set A with certainty. Intermediate values indicate the degree to which an element belongs to the set. Using this definition, a classical (so-called crisp) set A is a set whose membership function has a binary range, that is, μA(x) : Ω → {0, 1}, where μA(x) = 0 means that x ∉ A, and μA(x) = 1 means x ∈ A. This membership function for a crisp set A is, of course, the characteristic function. So a fuzzy set can be thought of as being one that has
a generalized characteristic function that admits values in [0, 1], not just the two values {0, 1}, and is uniquely defined by its membership function. Another way of looking at a fuzzy set is as a set in R2, as follows.

Definition 8. A fuzzy set A, as a crisp set in R2, is the set of ordered pairs

A = {(x, μA(x))} ⊆ (−∞, ∞) × [0, 1].   (49)

Some of the earliest people to recognize the relationship between interval analysis and fuzzy set theory were Nguyen (1978), implicitly, and Dubois and Prade (1980, 1981) and Kaufmann and Gupta (1985), explicitly. In particular, Dubois and Prade (1987a, 1991), Dubois et al. (2000b), and Fortin et al. (2006) deal specifically with interval analysis and its relationship with fuzzy set theory. In Dubois et al. (2000b), it is shown that,

. . . set-inclusive monotonicity, as given by R.E. Moore (see Moore, 1966, 1979), holds for fuzzy quantities. That is, for fuzzy sets A and B,

A ⊆ B  ⟹  f(A) ⊆ f(B).

This crucial result just reminds us that when the operands become more imprecise, the precision of the result cannot but diminish. Due to its close relationship to interval analysis, the calculus of fuzzy quantities is clearly pessimistic about precision, since f(A1, A2) is the largest fuzzy set in the sense of fuzzy set inclusion, that is,

A ⊆ B  ⟺  μA(x) ≤ μB(x), ∀x.
Much has been written about fuzzy sets that can be found in standard textbooks (Klir and Yuan, 1995); this material is not repeated here. We present only the ideas that are pertinent to the interfaces between interval and fuzzy analysis that are of interest here. Given that the primary interest is in the relationships between real-valued interval and fuzzy analysis, we restrict our fuzzy sets to a real-valued universe, Ω ⊆ R; the associated membership functions, fuzzy numbers, and fuzzy intervals are defined below.

Definition 9. A modal value of a membership function is a domain value at which the membership function is one. A fuzzy set with at least one modal value is called normal. The support of a membership function is the closure of {x | μA(x) > 0}.

Definition 10 (Fortin et al., 2006). A fuzzy interval, M, defined by its membership function μM(·), is a fuzzy subset of the real line such that, if x, y, z ∈ R, z ∈ [x, y], then μM(z) ≥ min{μM(x), μM(y)}.
Like a fuzzy set, a fuzzy interval M is said to be normal if ∃x ∈ R such that μM(x) = 1. The set {x | μM(x) = 1} is called the core of the fuzzy interval. For all that follows, fuzzy intervals will be assumed to be normal fuzzy intervals with upper semi-continuous membership functions. This means that the α-cut of a fuzzy interval,

Mα = {x | μM(x) ≥ α > 0},

is a closed interval. Let M1 = {x | μM(x) = 1} = [m₁⁻, m₁⁺] be the core of a fuzzy interval M, where the support is M0 = {x | μM(x) > 0} = (m₀⁻, m₀⁺). For a fuzzy interval M, μM(x) is nondecreasing for x ∈ (−∞, m₁⁻] and nonincreasing for x ∈ [m₁⁺, ∞). The definition of fuzzy number is given next.
Definition 11. A fuzzy number is a fuzzy interval with a unique modal value, that is, the core is a singleton.

The fact that we have closed intervals at each α-cut means that fuzzy arithmetic can be defined by interval arithmetic on each α-cut, as will be seen. Unbounded intervals can be handled by extended interval arithmetic. In fact, when dealing with fuzzy intervals, the operations and analysis can be considered as interval operations and analysis on α-cuts. However, a different and more recent approach is possible. Instead of considering a fuzzy number as a specialized fuzzy set over the set of real numbers, R, Dubois and Prade (2005) and Fortin et al. (2006) revise the theory of fuzzy numbers so that a (real-valued) fuzzy number is to a (real-valued) interval what a fuzzy set is to a (classical) set. To this end, what are called gradual numbers were created (Dubois and Prade, 2005; Fortin et al., 2006).

Definition 12 (Fortin et al., 2006). A gradual number r̃ is defined by an assignment Ar̃ from (0, 1] to R.

The interest of this article is on special assignments that are associated with fuzzy intervals. The idea will be to define a gradual number associated with a fuzzy interval by the inverses of two functions; one is the inverse of the membership function restricted to (−∞, m₁⁻], that is, the inverse of

μA⁻(x) = μA(x), x ∈ (−∞, m₁⁻],

where [m₁⁻, m₁⁺] is the core as before, which is nonempty. The second function is the inverse of the membership function restricted to [m₁⁺, ∞), that
is, the inverse of

μA⁺(x) = μA(x), x ∈ [m₁⁺, ∞).

These inverses,

(μA⁻)⁻¹(α) : (0, 1] → R   (50)

and

(μA⁺)⁻¹(α) : (0, 1] → R,   (51)
define the gradual numbers in the context of real fuzzy intervals, which is our interest. Thus, for a fuzzy interval A, the functions (μA⁻)⁻¹(α) [Eq. (50)] and (μA⁺)⁻¹(α) [Eq. (51)] are special cases of this definition, and we concentrate on fuzzy sets that describe fuzzy intervals but with some restrictions that are specified next.

Definition 13 (Fortin et al., 2006). Using the notion of gradual number, we can describe a fuzzy interval M by an ordered pair of gradual numbers (m̃⁻, m̃⁺), where m̃⁻ is called the fuzzy lower bound, or left profile, and m̃⁺ is called the fuzzy upper bound, or right profile.

To ensure that the left and right profiles adhere to what has been defined as a fuzzy interval (as opposed to an interval), several properties of m̃⁻ and m̃⁺ must hold. In particular (Fortin et al., 2006):

1. The domains of the assignment functions, Am̃⁻ and Am̃⁺, must be in (0, 1].
2. Am̃⁻ must be increasing and Am̃⁺ must be decreasing.
3. m̃⁻ and m̃⁺ must be such that Am̃⁻ ≤ Am̃⁺.

Remark. Fuzzy intervals with properties 1–3 above possess well-defined inverses that are functions. Note that an interval [a, b] has constant assignments, that is, Am̃⁻(α) = a and Am̃⁺(α) = b, 0 < α ≤ 1. Since it is constant, it contains no fuzziness and is simply an interval, not a fuzzy interval. What in the literature is called a trapezoidal fuzzy number is indeed a fuzzy interval (the left profile is strictly linearly increasing and the right profile is strictly linearly decreasing)—whereas an interval has no fuzziness in the left/right profiles (they are horizontal line segments). A nonhorizontal assignment function indicates that fuzziness is present.

The properties associated with gradual numbers applied to fuzzy intervals that are relevant to this study are as follows (Fortin et al., 2006):
1. Gradual numbers display nonsharpness when their assignments are not constant. 2. Gradual numbers are what Dubois and Prade (2005) call fuzzy elements. 3. Gradual numbers (an assignment) do not account for incomplete information. Pairs of gradual numbers may be used to account for incomplete information. In fact, a fuzzy interval (which does account for incomplete information) is defined as a pair of gradual numbers. 4. A gradual number is not a fuzzy set. 5. Selecting a gradual number in a fuzzy interval is the selection of an element per α-cut. That is, it is an assignment that takes one value for each α-cut.
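As a small illustration of a fuzzy interval viewed as a pair of gradual numbers, the sketch below represents a trapezoidal fuzzy interval by its two assignment functions (left and right profiles) and performs addition profile by profile. The representation and the function names are illustrative assumptions.

```python
# A trapezoidal fuzzy interval with support [a, d] and core [b, c],
# represented by its pair of gradual bounds: the left profile (increasing in
# alpha) and the right profile (decreasing in alpha).

def trapezoid_profiles(a, b, c, d):
    left = lambda alpha: a + alpha * (b - a)     # assignment of the left profile
    right = lambda alpha: d - alpha * (d - c)    # assignment of the right profile
    return left, right

def add(u, v):
    """Addition of two fuzzy intervals, profile by profile."""
    (ul, ur), (vl, vr) = u, v
    return (lambda alpha: ul(alpha) + vl(alpha),
            lambda alpha: ur(alpha) + vr(alpha))

U = trapezoid_profiles(1.0, 2.0, 3.0, 5.0)
V = trapezoid_profiles(0.0, 1.0, 1.0, 2.0)       # a triangular fuzzy number
W_left, W_right = add(U, V)

for alpha in (0.25, 0.5, 1.0):
    print(f"alpha = {alpha}: [{W_left(alpha)}, {W_right(alpha)}]")
# At alpha = 1.0 the core of U + V is [3.0, 4.0]; as alpha -> 0 the cut
# widens toward the support (1.0, 7.0).
```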
A. Possibility and Necessity Distribution Two types of measures and distributions are associated with fuzzy sets, measures, and membership functions—possibility and necessity measures and distributions. Whereas fuzzy measures quantify the uncertainty of gradualness, possibility and necessity measures are ways to quantify the uncertainty of lack of information. Books and articles are available that develop possibility theory (see Dubois and Prade, 1980; Dubois and Prade, 2000a; Klir and Yuan, 1995; Wang and Klir, 1992). What is of interest here for fuzzy and possibilistic mathematical analysis, is called quantitative possibility theory (Dubois et al., 2000b). General possibility theory may be derived at least in any one of the following ways: 1. Through normalized fuzzy sets (see Zadeh, 1975). 2. Axiomatically, from fuzzy measures g that satisfy (Dubois and Prade, 1988; Klir and Yuan, 1995) g(A ∪ B) = max g(A), g(B) . 3. Through the belief functions of Dempster–Shafer theory, whose focal elements are normalized and nested (Klir and Yuan, 1995). 4. By construction, through nested sets with normalization; for example, nested α-level sets (Jamison and Lodwick, 2002). The third and fourth approaches are of special interest since they lead directly into a quantitative possibility theory, although the first points the way and lays the foundation. It will be assumed that the possibilistic/necessity measures and distributions that are used herein are constructed according to the third or fourth approaches.
B. Semantics of Fuzzy Sets and Possibility and Necessity Distributions Confusion often exists about the differences between fuzzy and possibilistic analysis. Fuzzy and possibilistic entities have a different development from the first principles as noted previously. Moreover, they have different meanings, or semantics. Fuzzy and possibility uncertainty model different entities. Fuzzy entities, as is well known, are sets with nonsharp boundaries in which there is a transition between elements that belong and elements that do not belong to the set. Possibilistic entities are entities that exist, but the evidence associated with whether a particular element is the entity or not is incomplete or hard to obtain. Quantitative possibility distributions constructed from first principles require nested sets (Jamison and Lodwick, 2002) and normalization. Possibility distributions are normalized since their semantics are tied to existent entities. Normalization is not required of fuzzy membership functions. Thus, not all fuzzy sets can give rise to possibility distributions. That is, even though Zadeh’s original development of possibility theory was derived from fuzzy sets, possibility theory is different from fuzzy set theory. Possibilistic distributions (of fuzzy numbers) encapsulate the best estimate of the possible values of an entity given the available information. Fuzzy membership function values (of fuzzy numbers) describe the degree to which an entity is that value. Note that if the possibility distribution at x is 1, this signifies that the best evidence available indicates x is the entity that the distribution describes. On the other hand, if the fuzzy membership function value at x is 1, x is certainly the value of the entity that the fuzzy set describes. Thus, the nature of mathematical analysis, in the presence of fuzzy and possibilistic uncertainties, is quite different semantically. The most general form of possibility theory (the first and second approaches to possibility theory listed above) establishes an order among variables with respect to the potential of their being an entity. The magnitudes associated with this ordering have no significance other than an indication of order. Thus, if possibilityA (x) = 0.75 and possibilityA (y) = 0.25, all that can be said is that the evidence is stronger that x is the entity A than y. One cannot conclude that x is three times more likely to be A than y is. This means that for mathematical analysis, if the possibility distributions were constructed using the most general assumptions, comparisons among several distributions are restricted (to merely order). For the most general possibility theory, setting the possibility level to be greater than or equal to a certain fixed value α, 0 ≤ α ≤ 1, does not have the same meaning as setting a probability to be at least α. In the former case, the α has no inherent meaning (other than if one has a β > α, one prefers the decision that generated β to that which
generated α), whereas in the latter, the value of α is meaningful. The third and fourth derivations of possibility theory lead to quantitative possibility theory.

An alternative approach to possibility theory is as a system of lower and upper distributions bounding a given, yet unknown, probability. That is, given a measurable set A, Nec(A) ≤ prob(A) ≤ Pos(A) bounds the unknown probability of the event A, so that Pos(A) ≤ α guarantees that prob(A) ≤ α. If the possibilistic entities are constructed from this perspective, then their α-levels are numerically meaningful beyond simply being an ordering. This is the method developed in Jamison and Lodwick (2002).

C. Fuzzy Extension Principles

Fuzzy extension principles show how to transform real-valued functions into functions of fuzzy sets. The meaning of the associated arithmetic depends directly on the extension principle in force, since arithmetic operations are (continuous) functions over the reals, assuming division by 0 is not allowed, and over the extended reals (Hansen, 1975) when division by 0 is allowed. The fuzzy arithmetic coming from Zadeh's extension principle (Zadeh, 1965), and its relationship to interval analysis, has an extensive development (see, for example, Kaufmann and Gupta, 1985). Moreover, there is an intimate interrelationship between the extension principle being used and the analysis that ensues. For example, in optimization, the manner in which union and intersection are extended via t-norms and t-conorms will determine the constraint sets, so that the way trade-offs among decisions are made is captured (see Kaymak and Sousa, 2003). The extension principle in the context of fuzzy set theory was first proposed, developed, and defined in Zadeh (1965, 1975).

Definition 14 (Extension Principle of Zadeh, 1965, 1975). Given a real-valued function f : X → Y, the function over fuzzy sets F : S(X) → S(Y), where S(X) [respectively, S(Y)] is the set of all fuzzy sets of X (respectively, Y), is given by

μF(A)(y) = sup{μA(x) | y = f(x)}   (52)

for all fuzzy subsets A of S(X). In particular, if (X1, . . . , Xn) is a vector of fuzzy intervals and f(x1, . . . , xn) is a real-valued function, then

μF(X1,...,Xn)(y) = sup_{(x1,...,xn)∈(X1,...,Xn)} { min{μXi(xi) | i = 1, . . . , n} | y = f(x1, . . . , xn) }.   (53)
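A brute-force numerical illustration of Eq. (53) follows, assuming discretized membership functions on a grid and two fuzzy operands: the output grade at each y collects the supremum of the minimum of the input grades over all input pairs mapping into that bin. The grid sizes, the crisp function, and the triangular memberships are illustrative assumptions.

```python
# Discretized sup-min extension principle, Eq. (53), for y = f(x1, x2).
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and mode b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

f = lambda x1, x2: x1 * x2                    # the crisp function being extended

x1 = np.linspace(0.0, 4.0, 401)
x2 = np.linspace(0.0, 4.0, 401)
mu1 = tri(x1, 1.0, 2.0, 3.0)                  # fuzzy number "about 2"
mu2 = tri(x2, 0.0, 1.0, 2.0)                  # fuzzy number "about 1"

ybins = np.linspace(0.0, 16.0, 161)
mu_y = np.zeros_like(ybins)

X1, X2 = np.meshgrid(x1, x2, indexing="ij")
grades = np.minimum(mu1[:, None], mu2[None, :])      # min of the two grades
idx = np.digitize(f(X1, X2).ravel(), ybins) - 1
idx = np.clip(idx, 0, len(ybins) - 1)
np.maximum.at(mu_y, idx, grades.ravel())             # sup over each output bin

print("membership of y = 2 (should be near 1):",
      mu_y[np.argmin(np.abs(ybins - 2.0))])
```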
This definition [Eq. (53)] of the extension principle has led to axiomatic fuzzy arithmetic (corresponding to axiomatic interval arithmetic). Moreover, it is one of the main mechanisms in the literature used for fuzzy interval analysis. Various researchers have dealt with the issue of the extension principle and have amplified its applicability. In his 1978 paper, Nguyen pointed out that a fuzzy set needs to be defined to be what Dubois and Prade later called a fuzzy interval (Dubois and Prade, 2005; Fortin et al., 2006) in order that
[f(A, B)]α = f(Aα, Bα),

where the function f is assumed to be continuous. In particular, Aα and Bα need to be compact (i.e., closed and bounded intervals) for each α-cut. Thus, Nguyen defined a fuzzy number as one whose membership function is upper semi-continuous and for which the closure of the support is compact. In this case, the α-cuts generated are closed and bounded (compact) sets, that is, real-valued intervals. This is a well-known result in real analysis; that is, when f is continuous, the decomposition by α-cuts can be used to compute f(X1, . . . , Xn) through interval analysis by Nguyen (1978) as

[f(X1, . . . , Xn)]α = f([X1]α, . . . , [Xn]α).

It should be noted that, considering a fuzzy interval as a particular pair of gradual numbers (left and right profiles), the extension principle may be accomplished without using α-cuts.

Yager (1986) pointed out that, by looking at functions as graphs (in the Euclidean plane), the extension principle could be extended to include all graphs, thus allowing for analysis of what he calls "nondeterministic" mappings, that is, graphs that are not functions. Now, "nondeterminism," as used by Yager, can be considered as point-to-set mappings. Thus, Yager implicitly restores the extension principle to a more general setting of point-to-set mappings.

Ramik (1986) points out that we can restore Zadeh's extension principle to its most general setting of set-to-set mappings explicitly. In fact, a fuzzy mapping is indeed a set-to-set mapping. He defines the image of a fuzzy set-to-set mapping as being the set of α's generated by the function on the α-cuts of the domain.

Finally, Lin's paper (Lin, 2005) is concerned with determining the function space in which the fuzzy set generated by the extension principle "lives"; that is, the extension principle generates the resultant membership function in the range space. Suppose one is interested in stable controls; one way to extend is to generate resultant (range space) membership functions that are continuous. The definition of continuous function states that small perturbations in the input (i.e., domain) cause small perturbations in the output (i.e., range), which
is one way to view the definition of stability. Lin points out conditions that are necessary in order that range membership functions have some desired characteristics (such as continuity or smoothness). These extension principles express how to define functions over fuzzy sets so that the resulting range has various properties of interest, specifying what may be done in the space to which the extension sends the fuzzy set through the function, as dictated by the extension principle itself.

1. Fuzzy Arithmetic

Fuzzy arithmetic was, like interval arithmetic, derived from the extension principle of Zadeh (1965). Nahmias (1978) defined fuzzy arithmetic via the fuzzy convolution as follows:

1. Addition: μZ=X+Y(z) = sup_x min{μX(x), μY(z − x)}, where z = x + y.
2. Subtraction: μZ=X−Y(z) = sup_x min{μX(x), μY(x − z)}, where z = x − y.
3. Multiplication: μZ=X×Y(z) = sup_x min{μX(x), μY(z/x)}, where z = x × y.
4. Division: μZ=X÷Y(z) = sup_x min{μX(x), μY(x/z)}, where z = x ÷ y.

The arithmetic of fuzzy entities was originally conceived by the above definition. When the extension principle of Zadeh (1965) was applied to 1–4 above, assuming that the fuzzy entities involved were noninteractive (independent), what has come to be known as fuzzy arithmetic followed. At least by 1975, deriving fuzzy arithmetic as interval arithmetic on α-cuts had been described from the work of Negoita and Ralescu (1975). Also, by 1975, Zadeh (1975) had already used the extension principle to define fuzzy arithmetic. Combining these two research streams (the extension principle and arithmetic on α-cuts to define arithmetic on fuzzy numbers), fuzzy arithmetic developed into interval arithmetic on α-cuts (Dubois and Prade, 1977, 1978, 1980, 1981). Moore is explicitly mentioned in Dubois and Prade (1980). Much of what occurred to interval arithmetic occurred to fuzzy arithmetic—its roots in the extension principle were eliminated, given noninteraction (independence), in favor of axioms for the arithmetic, using Nguyen (1978) and requiring membership functions to be lower/upper semi-continuous.

Definition 15. A function f : R → R is upper semi-continuous at x0 if

lim sup_{x→x0} f(x) ≤ f(x0).
Fuzzy arithmetic using interval arithmetic on α-cuts saw its full development in 1985 (Kaufmann and Gupta, 1985). In fact, 1985 was the year
that R.E. Moore gave a plenary talk at the first International Fuzzy System Association Congress in Mallorca, Spain. a. Axiomatic Fuzzy Arithmetic. The fuzzy arithmetic developed by Kaufmann and Gupta (1985) is taken as the standard approach, while a more recent approach is found in Hanss (2005). What is needed is the fact determined by its α-cuts, A = ! that a− fuzzy +interval is uniquely − [μ (α), μ (α)], where μ (α) and μ+ (α) are the left/right end α∈(0,1] A A A A points of the α-cuts of the fuzzy set A. In particular, for fuzzy intervals we have:
− + + A+B = μ− (α), μ (α) + μ (α), μ (α) , (54) A A B B α∈(0,1]
A−B =
(α), μ+ (α) − μ− (α), μ+ (α) , μ− A A B B
(55)
α∈(0,1]
A×B =
μ− (α), μ+ (α) × μ− (α), μ+ (α) , A A B B
α∈(0,1]
A÷B =
− + (α) ÷ μ (α), μ (α) . μ−˜ (α), μ+ A B B A
(56) (57)
α∈(0,1]
For fuzzy sets whose membership functions are semi-continuous, (A ◦ B)α = (A)α ◦ (B)α ,
◦ ∈ {+, −, ×, ÷}.
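As a compact sketch of Eqs. (54)–(57) on a finite set of α-cuts, the following stores each fuzzy interval as a dictionary from α to a closed interval and performs the corresponding interval operation cut by cut. The representation (eleven equally spaced α-levels) and names are illustrative assumptions.

```python
# Fuzzy arithmetic as interval arithmetic on alpha-cuts, Eqs. (54)-(57).
ALPHAS = [i / 10 for i in range(1, 11)]          # alpha in (0, 1]

def trapezoid(a, b, c, d):
    return {alpha: (a + alpha * (b - a), d - alpha * (d - c)) for alpha in ALPHAS}

def cutwise(u, v, op):
    out = {}
    for alpha in ALPHAS:
        (ulo, uhi), (vlo, vhi) = u[alpha], v[alpha]
        candidates = [op(p, q) for p in (ulo, uhi) for q in (vlo, vhi)]
        out[alpha] = (min(candidates), max(candidates))
    return out

add = lambda u, v: cutwise(u, v, lambda p, q: p + q)
sub = lambda u, v: cutwise(u, v, lambda p, q: p - q)
mul = lambda u, v: cutwise(u, v, lambda p, q: p * q)

A = trapezoid(1.0, 2.0, 3.0, 4.0)
B = trapezoid(0.0, 1.0, 1.0, 2.0)
print("core of A + B      :", add(A, B)[1.0])    # (3.0, 4.0)
print("low alpha-cut of A*B:", mul(A, B)[0.1])   # wide interval near the support
```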
Computer implementation of Eqs. (54)–(57) can be found in Anile et al. (1995). This program uses INTLAB (another downloadable system that has interval data types and runs in conjunction with MATLAB) to handle the fuzzy arithmetic on α-cuts.

b. Case-Based Fuzzy Arithmetic. Klir (1997) notices, as Moore did before him, that if Eqs. (54)–(57) are used, overestimations will occur. Moreover, when this approach is used, A − A ≠ 0 and A ÷ A ≠ 1. Klir's idea for fuzzy arithmetic, with requisite constraints, is to do fuzzy arithmetic using constraints dictated by the context of the problem. That is, Klir defines exceptions to obtain A − A = 0 and A ÷ A = 1.

c. Constraint Fuzzy Arithmetic. Klir's (1997) approach to fuzzy arithmetic requires a priori knowledge (through cases) of which variables are identical. Constraint fuzzy arithmetic (Lodwick, 1999) carries this information in the parameters; that is, it performs Eqs. (54)–(57) using a parameter, λx, that identifies the variable. The resulting fuzzy arithmetic, derived from constraint interval arithmetic on α-cuts, is essentially the fuzzy arithmetic with requisite constraints of Klir, without cases.
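Since constraint fuzzy arithmetic reduces to constraint interval arithmetic on each α-cut, a toy sketch of the idea on a single α-cut may help: every occurrence of the same variable shares one parameter λ, so that A − A collapses to 0. The brute-force enumeration below only illustrates the semantics under stated assumptions; it is not the optimization procedure an actual implementation would use.

```python
# Constraint interval arithmetic on one alpha-cut: each operand is written as
# lo + lam*(hi - lo), lam in [0, 1], and repeated occurrences of the same
# variable share the same parameter.  The range is found by coarse enumeration.
import itertools

def constraint_range(expr, cuts, steps=101):
    """expr maps a dict of variable values to a number; cuts maps names to (lo, hi)."""
    grid = [i / (steps - 1) for i in range(steps)]
    names = list(cuts)
    lo, hi = float("inf"), float("-inf")
    for lams in itertools.product(grid, repeat=len(names)):
        values = {n: cuts[n][0] + lam * (cuts[n][1] - cuts[n][0])
                  for n, lam in zip(names, lams)}
        v = expr(values)
        lo, hi = min(lo, v), max(hi, v)
    return lo, hi

A = (1.0, 2.0)    # an alpha-cut of the fuzzy interval A

# Axiomatic arithmetic treats the two occurrences of A as unrelated:
print(constraint_range(lambda v: v["A1"] - v["A2"], {"A1": A, "A2": A}))  # (-1.0, 1.0)

# Constraint arithmetic shares the parameter, so A - A collapses to 0:
print(constraint_range(lambda v: v["A"] - v["A"], {"A": A}))              # (0.0, 0.0)
```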
d. Fuzzy Arithmetic Using Gradual Numbers (Fortin et al., 2006). The implementation of Fortin et al. (2006) as a way to perform fuzzy arithmetic uses Eqs. (50) and (51) in the following way:

(μA◦B⁻)⁻¹(α) = min{ (μA⁻)⁻¹(α) ◦ (μB⁻)⁻¹(α), (μA⁻)⁻¹(α) ◦ (μB⁺)⁻¹(α), (μA⁺)⁻¹(α) ◦ (μB⁻)⁻¹(α), (μA⁺)⁻¹(α) ◦ (μB⁺)⁻¹(α) },

(μA◦B⁺)⁻¹(α) = max{ (μA⁻)⁻¹(α) ◦ (μB⁻)⁻¹(α), (μA⁻)⁻¹(α) ◦ (μB⁺)⁻¹(α), (μA⁺)⁻¹(α) ◦ (μB⁻)⁻¹(α), (μA⁺)⁻¹(α) ◦ (μB⁺)⁻¹(α) },

for ◦ ∈ {+, −, ×, ÷}. Fuzzy arithmetic using gradual numbers can handle dependencies in many cases.

e. Propagation. The application of the extension principle to the evaluation of functions can be considered as propagation of uncertainty. This matter has been explicitly studied by Lodwick and Jamison (2003b), and later by Baudrit et al. (2005). The next section considers this topic under distribution arithmetic, where the general theory for the propagation of more than one of the uncertainties of interest is the last subsection of the next section, covering in theoretical terms what Lodwick and Jamison (2003b) and Baudrit et al. (2005) introduce.

D. Enclosure and Verification

Enclosures require the acquisition of lower and upper bounds. For fuzzy, possibilistic, and probabilistic uncertainty, the lower and upper bounds are lower and upper distribution functions. Since there are distinct methods for this, and the complexity of the problem is significant, enclosure and verification for fuzzy, possibilistic, and probabilistic uncertainty are discussed separately in the next section. At the heart of these enclosure and verification methods is distribution arithmetic, just as interval arithmetic is at the heart of enclosure and verification in interval analysis.
IV. A NALYSIS WITH D ISTRIBUTIONS This section extends the results of the previous section to general distributions, not just intervals and fuzzy sets, for the purpose of doing mathematical analysis on entities associated with the types of uncertainties that are of interest to this monograph. Mathematical analysis that includes a mixture
of uncertainty types is also discussed. This type of analysis, containing a mixture of uncertainty types, requires a theory that is broad enough to have as specific instances all the uncertainty types of interest. To this end, the theory of interval-valued probability measures (IVPMs) (Weichselberger, 2000; Jamison and Lodwick, 2004; Lodwick and Jamison, 2006), and the theory of clouds (Neumaier, 2004a) are presented. The problem of mathematical analysis on distributions is twofold. First, it requires knowing how to do arithmetic over general distributions. Second, it requires knowing how to evaluate a function of distributions in such a way that when given the input (domain) of lower and upper distribution bounds containing the distribution, the result or output (range) will be verified. The relationship between convolutions and distribution arithmetic are mentioned. Convolutions are the definitions of distribution arithmetic and are usually computationally intractable for complex expressions. Our interest is in relating the theoretical foundations (Springer, 1979; Kaplan, 1981), in particular, copulas (Nelsen, 1995), to bounds on distributions resulting from binary operations and their relationship to interval and fuzzy arithmetic (Dubois and Prade, 1987a; Williamson and Downs, 1990a; Williams, 1990b). Arithmetic operations, of course, are binary operations. The material presented in this section is not an exhaustive survey, but consists of a few methods that are more tractable. We have already discussed enclosure and verification associated with intervals that can be considered a type of uncertainty. In the context of fuzzy, possibilistic, and probabilistic uncertainty, lower and upper distributions need to be obtained or constructed (Jamison and Lodwick, 2002). For fuzzy uncertainty, a fuzzy set as a crisp in R2 may be thought as enclosing the uncertainty that a quantity x is (or is in) A, with the x-axis being the lower distribution and the membership function being the upper distribution so that 0 ≤ uncertaintyA (x) ≤ μA (x). This was pointed out by Dubois et al. (1997). From this point of view, a fuzzy set membership function loses information at the lower bound. For possibilistic uncertainty, possibility and its dual, necessity, pairs can be developed to provide a “tighter” set of bounds on the uncertainty, if constructed appropriately (Jamison and Lodwick, 2002). The question of a general theory containing all uncertainties of interest to this monograph (interval, fuzzy, possibility, probability) was dealt with by Jamison and Lodwick (2002). In particular, Jamison and Lodwick (2004), Lodwick and Jamison (2003b, 2006) consider IVPMs as the theory behind the methods for obtaining lower and upper bounds on all of the uncertainties. Clouds (see Neumaier, 2004a) are also enclosures of uncertainty that include the uncertainty associated with probabilities, intervals, and fuzzy sets, as well as possibility. Clouds can also be viewed as IVPMs (Lodwick and Jamison, 2006). Weichselberger (2000) develops interval-valued probabilities that are
used for bounds on probability distributions. His work is extended to explicitly include intervals, fuzzy, possibility, and clouds by Jamison and Lodwick (2006) and Lodwick and Jamison (2006). When one has valid bounds on entities, whose values are characterized by probabilistic, fuzzy set, and/or possibilistic uncertainties, and these values are used in a mathematical model described by a function to verify the result, then propagation of the lower and upper distribution in a way that guarantees that the derived lower and upper bounds enclose the uncertainty, is verification. The exposition begins with arithmetic. A. Distribution Arithmetic This section begins with several approaches to distribution arithmetic that attempt to be computationally tractable. Once distribution arithmetic is developed, how to analyze algebraic functions by means of a new extension principle is presented in the section on IVPM. The problem being addressed, in the context of probability theory, is, given a real-valued function f : Rn → R, compute the cumulative distribution function of Y = f (X),
(58)
where X is a vector of independent random variables. The cumulative distribution function of Y is denoted FY. It is first assumed that f, as a real-valued function, is continuous and monotonically increasing in each variable. It is a theoretically simple adjustment (although it may be computationally complex) to consider functions that increase in some variables and decrease in others. It is also assumed that each random variable Xi has a finite closed and bounded support [si, s̄i] with a marginal cumulative distribution function denoted FXi. That is, the support of Xi is

supp(Xi) = [si, s̄i] = [FXi⁻¹(0), FXi⁻¹(1)].   (59)

Given the assumption that FXi : [si, s̄i] → [0, 1] is one-to-one, onto, and strictly increasing, it is invertible. Let FXi⁻¹ denote this inverse, and FY : [sY, s̄Y] → [0, 1] denote the cumulative distribution function of Y = f(X), which is also invertible since f is monotonically increasing. The support of each random variable is a closed interval. Clearly, Y is contained in a closed and bounded interval, given that the support of each Xi is a closed and bounded interval and f is continuous. The methods outlined in this section are useful in:
1. Simulations where a resulting closed form is desired [Eq. (58)]. Moreover, circumventing a Monte Carlo approach altogether may be desirable, given the complexity of the problem. 2. Optimization under uncertainty (Jamison and Lodwick, 2006; Lodwick and Jamison, 2006). 3. Risk analysis (Kaplan, 1981; Berleant and Goodman-Strauss, 1998; Jamison et al., 2002; Ferson et al., 2003; Berleant and Zhang, 2004). 4. Enclosure/verification where the idea is to enclose distributions between lower and upper envelopes (Berleant, 1993; Jamison and Lodwick, 2002, 2006; Lodwick and Jamison, 2006; Moore, 1984; Williamson and Downs, 1990a; Williams, 1990b). There are a variety of approaches to distribution arithmetic. The three types presented here are: (1) the interval convolution approach based on Williamson and Downs (1990a), Williams (1990b); (2) the interval histogram approaches, which are based on histograms (Moore, 1984; Berleant, 1993; Tonon, 2004); and (3) the inverse probability approach based on Lodwick and Jamison (2003b) and Olsen (2005). These methods are developed for cumulative distribution functions. To extend these methods to possibility, a method to construct quantitative possibility and necessity distribution pairs such as Jamison and Lodwick (2002) must be used. Given an appropriately created possibility and necessity pair, the enclosure of the pair is obtained by taking the upper function of the envelope for the possibility and the lower function of the envelope for the necessity. That is, if the constructed possibility function is p(x) and the constructed necessity function is n(x), and these functions are derived using Jamison and Lodwick (2002), then, using methods outlined below, envelopes on n(x) and p(x) are constructed, so that: gp (x) ≤ p(x) ≤ hp (x), gn (x) ≤ n(x) ≤ hn (x). The enclosure of possibility and necessity pair becomes, gn (x) ≤ n(x) ≤ p(x) ≤ hp (x). Thus, it is straightforward to extend the methods outlined in this section to possibility and necessity pairs, and thus it will not be treated separately. 1. Interval Convolution Methods Convolutions used in defining distribution arithmetic were known from the start. Following the development of Kaplan (1981), Williamson and Downs (1990a), Williams (1990b), given a general binary relationship, W = f (X, Y ),
(60)
INTERVAL AND FUZZY ANALYSIS : A UNIFIED APPROACH
133
where X and Y are independent random variables, the probability density function for the random variable resulting from the operation is given by ∞ pW (w) = −∞
Here,
f −1 (w, x)
∂ −1 pX (x)pY f −1 (w, x) f (w, x) dx. ∂w
(61)
is assumed to exist and is the unique-valued y for which w = f x, f −1 (w, x) .
In particular (Williams, 1990b), Addition ∞ pW (w) = pX (x)pY (w − x) dx.
(62)
−∞
Subtraction ∞ pW (w) =
pX (x)pY (w + x) dx.
(63)
w 1 pX (x)pY dx. |x| x
(64)
|x|pX (x)pY (wx) dx.
(65)
−∞
Multiplication ∞ pW (w) = −∞
Division ∞ pW (w) = −∞
If the distributions are dependent, then ∞ pW (w) = −∞
∂ −1 pX (x)pY f −1 (w, x)|x f (w, x) dx. ∂w
(66)
For example, in the addition of two dependent distributions, ∞ pW (w) = −∞
pX (x)pY (w − x)|x dx.
(67)
134
LODWICK
The numerical calculation of the arithmetic defined by its convolution is often computationally intensive, especially in the presence of dependencies whose precise nature (in a closed form) is usually missing. The idea is to compute with just the information associated with the marginal distribution. The error in computing as though the random variables are independent is called dependency error by Williamson and Downs (1990a) and Williams (1990b). a. Williamson and Downs. Williamson and Downs (1990a, 1990b) use enclosure to deal with dependency bounds associated with distribution arithmetic and functions of distributions. Their research not only deals with enclosure but also deals directly with dependencies. Note that constraint interval arithmetic (Lodwick, 1999) and gradual numbers (Fortin et al., 2006) also make explicit the issue of dependencies. In particular, Williamson and Downs use Fréchet bounds to enclose the joint cumulative distribution as follows: max FX (x) + FY (y) − 1, 0 ≤ FXY (x, y) ≤ min FX (x), FY (y) ∀x, y ∈ R. (68) Clearly, Eq. (68) is related to what the fuzzy literature calls a t-norm (Klir and Yuan, 1995). The bounds of Eq. (68) are tight since they are met in the cases of positive or negative dependence. Williams and Downs go on to use copulas that allow the Fréchet bounds to be written in a perhaps more useful way and whose definition is as follows. Definition 16 (Nelsen, 1995). A (two-dimensional) copula is a function C : I × I → I = [0, 1], with the following properties: (1) C(0, t) = C(t, 0) = 0 and C(1, t) = C(t, 1) = t, ∀t ∈ I , (2) C(u2 , v2 ) − C(u1 , v2 ) − C(u2 , v1 ) + C(ui , v1 ) ≥ 0, ∀u1 , u2 , v1 , v2 ∈ I , where u1 ≤ u2 and v1 ≤ v2 . Remark. C is nondecreasing in each variable and continuous (since it satisfies a Lipschitz condition) |C(u2 , v2 )−C(ui , v1 )| ≤ |u2 −u1 |+|v2 −v1 |. Definition 17 (Williams, 1990b). A connecting copula for the random variables X and Y is the copula (69) CXY (u, v) = FXY FX−1 (u), FY−1 (v) .
INTERVAL AND FUZZY ANALYSIS : A UNIFIED APPROACH
135
Remark. It is clear that a connecting copula is a copula. Moreover, FXY (u, v) = CXY FX (u), FY (v) .
(70)
Two other copulas are W (a, b) = max{a + b − 1, 0}
(71)
M(a, b) = min{a, b}.
(72)
and
Williams (1990b) mentions that, “The copula CXY contains all information regarding the dependency of X and Y . If CXY (x, y) = xy, then X and Y are independent.” Given a binary operation on two random variables Z = X ◦ Y (◦ is the binary operation) that is monotonic in both variables (places) where the marginal distributions FX and FY are known, bounds on FZ are sought (called dependency bounds by Williamson and Downs; in the language of this monograph, dependency bounds are enclosures). That is, ldb(FX , FY , ◦)(z) ≤ FZ (z) ≤ udb(FX , FY , ◦)(z),
∀z ∈ R,
where ldb and udb denote the lower dependency bound and the upper dependency bound, respectively. It has been shown by Frank et al. (1987) that for certain binary operations that include ◦ ∈ {+, −, ×, ÷}, ldb(FX , FY , ◦)(z) = sup W FX (x), FY (y) z=x◦y
and ldb(FX , FY , ◦)(z) = inf W d FX (x), FY (y) z=x◦y
where W d is called the dual copula given by W d (x, y) = x + y − W (x, y). Williamson and Downs go on to show how to compute the lower and upper dependency bounds that have order O(n2 ) where n is the number of discretization of the distributions F . Their approach has been implemented by Ferson (2002), Ferson et al. (2003), and Ferson and Kreinovich (2005), and is similar to quantile arithmetic discussed previously (Dempster, 1969, 1974; Nickel, 1969).
136
LODWICK
2. Interval Histogram Methods This section presents three interval histogram methods. Broadly speaking, interval histogram methods are based on partitioning the supports [Eq. (59)], computing histograms on the partition, and using interval arithmetic to compute an approximation to FY . a. R.E. Moore. The precursor and basis of all interval histogram methods, the two inverse probability methods (as well as the interval convolution method above), is Moore’s (1984) article. The method outlined by Moore approximates the cumulative distribution of a random variable and uses this to approximate the cumulative distribution of a function of random variable [Eq. (58)]. There is no attempt to enclose the correct cumulative distribution, although subsequent approaches, in particular, Berleant (1993), Lodwick and Jamison (2003b), and Olsen (2005), do. Given Eq. (58) along with the stated assumption, Moore constructs an approximation to the cumulative distribution FY of the function Y = f (X) in the following way. Algorithm IV.1. Moore’s Method (Moore, 1984) Step 1. Partition—The support of each random variable, Xi , is subdivided into Ni subintervals, usually of equal width. Step 2. Compute the probability of the partition—The probability (histogram) for each random variable, on each of its subintervals, is computed from the given probability density function. Step 3. Compute the approximate value of the function on each subinterval— " Use interval analysis on each of the K = ni=1 Ni combinations of the subintervals. Each instantiation yields an interval where the probability assigned to the interval is the product of marginal probabilities, and Moore assumes that the random variables are independent. Step 4. Order the resultant intervals and subdivide overlapping segments— The intervals obtained in Step 3 are ordered with respect to their left end points. Any overlapping intervals are subdivided. Step 5. Compute the probabilities on the overlapping segments—Probabilities are assigned to the subdivided intervals in proportion to their length. That is, Moore (1984) assumes that if an interval is subdivided into say, three parts, each part of the interval receives one third of the probability associated with that interval. Step 6. Assemble into one cumulative distribution function—The probability of an overlapping set of subintervals is the sum of the probabilities on each subinterval. Starting with the left-most interval, the range value of the cumulative distribution at the left end point is 0, and the right end point is probability assigned to that subinterval. Linear interpolation between the
INTERVAL AND FUZZY ANALYSIS : A UNIFIED APPROACH
137
two range values is used as an approximation. The cumulative distribution at the right end point of the second subinterval is the sum of the probability of the first subinterval and second subinterval. The process is continued. Example. Suppose Y = X1 + X2 , where 1 x ∈ [0, 1] X1 = X2 = 0 otherwise, that is, X1 and X2 are uniform independent random distributions on [0, 1], denoted U [0, 1]. The problem is to compute, using Moore’s approach, the cumulative distribution of Y . Step 1. Partition—X1 and X2 are partitioned into two subintervals each, X1 = X11 ∪ X12 = [0, 12 ] ∪ [ 12 , 1], and X2 = X21 ∪ X22 = [0, 12 ] ∪ [ 12 , 1], resulting in a partition consisting of four boxes: (1) [0, 12 ] × [0, 12 ]; (2) [ 12 , 1] × [0, 12 ]; (3) [0, 12 ] × [ 12 , 1]; (4) [ 12 , 1] × [ 12 , 1] (Figure 1). Step 2. Compute the probability of the partition—Since the probability on each subinterval is 12 , and the random variables are assumed to be independent, the probability on each box is 12 × 12 = 14 . Step 3. Compute the approximate value of the function on each subinterval— Use interval arithmetic (or constraint interval arithmetic) to compute the value of Y on each box. For box (1), Y1 = [0, 12 ] + [0, 12 ] = [0, 1], where the probability for Y1 is 14 .
F IGURE 1.
Partition of [0, 1] × [0, 1].
138
LODWICK
F IGURE 2.
Resultant intervals and overlaps.
For box (2), Y2 = [ 12 , 1] + [0, 12 ] = [ 12 , 32 ], where the probability for Y2 is 14 . For box (3), Y3 = [0, 12 ] + [ 12 , 1] = [ 12 , 32 ], where the probability for Y3 is 14 . For box (4), Y4 = [ 12 , 1] + [ 12 , 1] = [1, 2], where the probability for Y4 is 14 . Step 4. Order the resultant intervals and subdivide overlapping segments— From the computations, the intervals are ordered according to their left end points as follows: Y1 , Y2 , Y3 , Y4 , with distinct (overlapping) subintervals of [0, 12 ]—one subinterval; [ 12 , 1]—three subintervals; [1, 32 ]—three subintervals; and [ 32 , 2]—one subinterval (Figure 2). Step 5. Compute the probabilities on the overlapping segments—Since [0, 12 ] came from subdividing Y1 = [0, 1] in half, the probability on the first subinterval is 12 × 14 = 18 . The assumption that Moore makes is that the proportion of the division of the interval is the probability. In the same manner, there are three overlapping subintervals comprising [ 12 , 1] arising from half portions of Y1 , Y2 , and Y3 , each bearing probability of 12 × 14 = 18 , so that the probability on [ 12 , 1] is 38 . In the same manner, the probability on [1, 32 ] is 38 and on [ 32 , 1] is 18 (Figure 2). Step 6. Assemble into one cumulative distribution function— ⎧ 0, x<0 ⎪ ⎪ 1 ⎪ , x = 12 ⎪ ⎪ 8 ⎪ ⎨ 1 + 3 = 1, x = 1 2 Y (x) = 84 83 ⎪ + 8 = 78 , x = 32 ⎪ 8 ⎪ ⎪1 7 ⎪ ⎪ ⎩ 8 + 8 = 1, x = 2 1, x>1
INTERVAL AND FUZZY ANALYSIS : A UNIFIED APPROACH
139
with linear interpolation in between (Figure 3). Example. Suppose Y = X1 + X2 , where X1 = X2 are normal distributions 1 2 1 2 X1 (x) = √1 e− 2 x , X2 (x) = √1 e− 2 x and we consider that an interval of 2π 2π plus or minus 3 standard deviations (SDs) from the mean value of 0 (Figure 4).
F IGURE 3.
F IGURE 4.
Assembled cumulative distribution function.
Normal distribution divided in units of 1 standard deviation.
140
LODWICK
F IGURE 5.
Normal distribution in block approximation.
Step 1. Partition—X1 and X2 are partitioned into six subintervals each (Figure 5), X1 = X11 ∪ X12 ∪ X13 ∪ X14 ∪ X15 ∪ X16 X2 = X21 ∪ X22 ∪ X23 ∪ X24 ∪ X25 ∪ X26 = [−3σ, −2σ ] ∪ [−2σ, −σ ] ∪ [−σ, 0] ∪ [0, σ ] ∪ [σ, 2σ ] ∪ [2σ, 3σ ], resulting in a partition consisting of 36 boxes (1) [−3σ, −2σ ]×[−3σ, −2σ]; (2) [−3σ, −2σ ]×[−2σ, −σ ]; (3) [−3σ, −2σ ]×[−σ, 0]; (4) [−3σ, −2σ ]× [0, σ ]; (5) [−3σ, −2σ ] × [σ, 2σ ]; (6) [−3σ, −2σ ] × [2σ, 3σ ]; . . . , (36) [2σ, 3σ ] × [2σ, 3σ ] (see Figure 6). Step 2. Compute the probability of the partition—Since the probability on each subinterval is divided by equal SDs, for [−3σ, −2σ ], [−2σ, −σ ], [−σ, 0], [0, σ ], [σ, 2σ ], and [2σ, 3σ ] where the associated probabilities are 0.021, 0.136, 0.341, 0.341, 0.136, and 0.021, respectively, the probabilities on the 36 boxes are the product of all pairwise combinations of these 6 numbers. That is, for box (1) 0.021×0.021; for box (2) 0.021×0.136; for box (3) 0.021×0.341; for box (4) 0.021×0.341; for box (5) 0.021×0.136; for box (6) 0.021 × 0.021; . . . for box (36) 0.021 × 0.021. Step 3. Compute the approximate value of the function on each subinterval— For box (1), Y1 = [−3σ, −2σ ] + [−3σ, −2σ ] = [−6σ, −4σ ], where the probability for Y1 is 0.000441. For box (2), Y2 = [−3σ, −2σ ] +
INTERVAL AND FUZZY ANALYSIS : A UNIFIED APPROACH
F IGURE 6.
141
Partition of −3 standard deviations to 3 standard deviations in the plane.
[−2σ, −σ ] = [−5σ, −3σ ], where the probability for Y2 is 0.002856. For box (3), Y3 = [−3σ, −2σ ] + [−σ, 0] = [−4σ, −2σ ], where the probability for Y3 is 0.007161. For box (4), Y4 = [−3σ, −2σ ] + [0, σ ] = [−3σ, −σ ], where the probability for Y4 is 0.007161. For box (5), Y5 = [−3σ, −2σ ] + [σ, 2σ ] = [−2σ, 0], where the probability for Y5 is 0.002856. For box (6), Y6 = [−3σ, −2σ ] + [2σ, 3σ ] = [−σ, σ ], where the probability for Y6 is 0.000441. For box (7), Y7 = [−3σ, −2σ ] + [−2σ, −σ ] = [−5σ, −3σ ], where the probability for Y7 is 0.021 × 0.126 = 0.002646. Continuing, for box (36), Y36 = [2σ, 3σ ] + [2σ, 3σ ] = [4σ, 6σ ], where the probability for Y36 is 0.000441. Step 4. Order the resultant intervals and subdivide overlapping segments— From the computations, the intervals are ordered according to their left end points as follows: Y1 , Y2 , Y7 , Y3 , Y8 , Y13 , . . . , Y36 with distinct (overlapping) subintervals of [−6σ, 6σ ]. Y1 through Y36 are subdivided into two subintervals each with a width of 1 SD σ . The associated probabilities in each box are given in Figure 6. (See also Table 1.) Step 5. Compute the probabilities on the overlapping segments—Since each Yi , i = 1, . . . , 36, is halved, the probability on each subinterval of Yi is 12 of the probability on Yi . Thus, we have the following resulting probabilities: For [−6σ, −5σ ], 12 × 0.000441 = 0.0002205. For [−5σ, −4σ ], 12 × 0.000441 + 12 × 0.002856 + 12 × 0.002646 = 0.0029715. The remaining intervals are computed in the same manner.
142
LODWICK TABLE 1 Units σ Yi
Box 1 [−6, −4] Box 2 [−5, −3] Box 3 [−4, −2] Box 4 [−3, −1] Box 5 [−2, 0] Box 6 [−1, 1] Box 7 [−5, −3] Box 8 [−4, −2] Box 9 [−3, −1] Box 10 [−2, 0] Box 11 [−1, 1] Box 12 [0, 2] Box 13 [−4, −2] Box 14 [−3, −1] Box 15 [−2, 0] Box 16 [−1, 1] Box 17 [0, 2] Box 18 [1, 3] Box 19 [−3, −1] Box 20 [−2, 0] Box 21 [−1, 1] Box 22 [0, 2] Box 23 [1, 3] Box 24 [2, 4] Box 25 [−2, 0] Box 26 [−1, 1] Box 27 [0, 2] Box 28 [1, 3] Box 29 [2, 4] Box 30 [3, 5] Box 31 [−1, 1] Box 32 [0, 2] Box 33 [1, 3] Box 34 [2, 4] Box 35 [3, 5] Box 36 [4, 6]
[−6, −5] [−5, −4] [−4, −3] [−3, −2] [−2, −1] [−1, 0] [0, 1] [1, 2] [2, 3] [3, 4] [4, 5] [5, 6]
1
1 1
1
1 1
1 1
1
1 1
1 1
1 1
1
1 1
1 1
1 1
1 1
1
1 1
1 1
1 1
1 1
1 1
1
1
1 1
1 1
1 1
1 1
1 1
1
1 1
1 1
1 1
1 1
Step 6. Assemble into one cumulative distribution function— ⎧ 0 x ∈ (−∞, −6σ ] ⎪ ⎪ ⎪ ⎪ 0.0002205 x = −5σ ⎨ 0.0031921 x = −4σ Fy (x) = .. ⎪ ⎪ ⎪ ⎪ ⎩. 1 x ∈ [6σ, ∞)
1
1 1
1 1
1 1
1
1 1
1 1
1
1 1
1
INTERVAL AND FUZZY ANALYSIS : A UNIFIED APPROACH
143
with linear interpolation in between. b. Berleant. Berleant and colleagues’ idea, articulated in Berleant (1993), Berleant and Goodman-Strauss (1998), and Berleant and Zhang (2004), was to use the Moore approach (Moore, 1984) (described in the previous subsection), adding enclosure and methods to handle known and unknown dependencies. Williamson and Downs (see the above and Williamson and Downs (1990a); Williams (1990b)) also have this objective. Moreover, the method developed can compute enclosures. They are able to combine interval uncertainty with probabilistic uncertainty, which Monte Carlo cannot do. However, their method does not deal with fuzzy, possibilistic, or cloud uncertainties. The histogram method is first developed for independent variables as follows (we begin with two variables X and Y whose support is assumed to be finite). Assume that Z = X ◦ Y , where ◦ ∈ {+, −, ×, ÷}. We omit a full discussion of Berleant’s approach to enclosure since a complete discussion of the method of Lodwick and Jamison (2003b), in conjunction with Lodwick and Jamison (2006), which provides a complete theoretical framework for mixed uncertainties, is presented in the subsections that follow. Algorithm IV.2 (Berleant, 1993). Step 1. X and Y are discretized using histograms [as was done by Moore (1984)]. Each histogram bar is characterized by both an interval describing its location on the real number line and by a probability mass. Step 2. Obtain the Cartesian product (Xi , Yj ) of all histogram bars describing X and Y . Step 3. For all Cartesian products (Xi , Yj ) compute Step 3.1. Zij = Xi ◦ Yj using interval arithmetic. Step 3.2. Associate with each Zij the probability p(Zij ) = p(Xi )p(Yj ) according to p(x ∈ X) ∈ [0, 1], p(y ∈ Y ) ∈ [0, 1],
(73)
p(z ∈ Z) = p(x ∈ X)p(y ∈ Y ). Step 4. This collection of intervals with associated probabilities is used to construct the probability density function for Z as follows: Step 4.1. Partition Z into intervals on which the histogram for Z will be placed—Z = Z1 ∪ · · · ∪ ZK . Step 4.2. Calculate the mass density that will be placed onto each Zk as follows:
144
LODWICK
Step 4.2.1. Any Zij that falls entirely inside the partition gets its associated probability mass assigned to Zk . Step 4.2.2. Any Zij that overlaps Zk gets the ratio of the overlapped portion divided by its total assigned to Zk . Step 4.2.3. Drawing the histogram associated with Zk involves computing the height hk where hk =
p(Zk ) . width(Zk )
The process outlined by Berleant and colleagues is precisely Moore’s method (given above), except that the histogram for the probability density function is obtained rather than the cumulative distribution function as obtained by Moore. As with Moore, Berleant assumes that ratios of overlaps translate into the same ratio assigned to the probability given the overlap. However, Berleant and colleagues continue their analysis to include known and unknown dependencies. A problem with known dependency is illustrated as follows. Example (Berleant, 1993). Let X(x) = and
⎧ ⎨ 12 ⎩
1 4
0
x ∈ [1, 2] x ∈ [2, 4] otherwise
⎧ 1 ⎪ ⎪ ⎨4
x ∈ [2, 3] x ∈ [3, 4] Y (x) = ⎪ ⎪ x ∈ [4, 5] ⎩ 0 otherwise. Assume that X and Y are positively correlated and Z = XY . 1 2 1 4
x ∈ [1, 2],
pX1 =
1 , 2
1 1 = , 4 4 1 11 xy ∈ [3, 8], pX1 Y2 = = , 22 4 1 xy ∈ [4, 10], pX1 Y3 = 0 = 0, 4 1 x ∈ [2, 4], pX2 = , 2 xy ∈ [2, 6],
pX1 Y1 = 1
INTERVAL AND FUZZY ANALYSIS : A UNIFIED APPROACH
145
1 = 0, 4 1 11 = , xy ∈ [6, 16], pX2 Y2 = 22 4 1 1 xy ∈ [8, 20], pX2 Y3 = 1 = , 4 4 1 y = [2, 3], pY1 = , 4 1 y = [3, 4], pY2 = , 2 1 y = [4, 5], pY3 = . 4 The two sets of probabilities pX1 Y1 , pX2 Y3 and pX1 Y3 , pX2 Y1 are as given above because we are assuming positive correlation. If the two random variables were negatively correlated, then pX1 Y1 , pX2 Y3 and pX1 Y3 , pX2 Y1 switch as follows: 1 x ∈ [1, 2], pX 1 = , 2 1 xy ∈ [2, 6], pX1 Y1 = 0 = 0, 4 1 11 = , xy ∈ [3, 8], pX1 Y2 = 22 4 1 1 xy ∈ [4, 10], pX1 Y3 = 1 = , 4 4 1 x ∈ [2, 4], pX 2 = , 2 1 1 xy ∈ [4, 12], pX2 Y1 = 1 = , 4 4 1 11 xy ∈ [6, 16], pX2 Y2 = = , 22 4 1 y ∈ [8, 20], pX2 Y3 = 0 = 0, 4 1 y = [2, 3], pY1 = , 4 1 y = [3, 4], pY2 = , 2 1 y = [4, 5], pY3 = . 4 To obtain the probability density, compute the ratios of the overlaps. xy ∈ [4, 12], pX2 Y1 = 0
146
LODWICK
Arithmetic for random variables with unknown or unspecified dependencies are handled in the following manner. All the intervals resulting from operations remain the same as when the variables are independent. Different dependency relationships imply different assignments of probability masses to the joint distribution cells, yielding different bounding curves. Berleant and colleagues wish to bound the space of all such curves (yielding an enclosure). Berleant calls these bounds that form the enclosure, dependency bounds, using the terminology of Williamson and Downs (1990a), Williams (1990b). To derive the dependency bounds, the lowest and highest possible probability for all z belonging to the domain of the resultant random variable Z must be known or computed. To do this, a pair (when two random variables are involved in the computation of Z) of joint distribution matrices (hypermatrices when more random variables are involved) that will provide the lowest and highest probabilities is formed. After obtaining these lowest and highest values (outlined next), the overlaps are computed (on the lowest and highest values) as presented above. Definition 18 (Berleant and Goodman-Strauss, 1998). A joint distribution matrix for marginal discretizations of two random variables X and Y with bar probability masses Xi , i = 1, . . . , m, and Yj , j = 1, . . . , n, is the m×n matrix 0 ≤ pij ≤ 1, such that xi = nj=1 pij , i = 1, . . . , m, and yj = m i=1 pij , j = 1, . . . , n, where i j pij = 1, i x1 = 1, j yj = 1. Example. Joint distribution matrix
y
0.04 0.02 0.02 0 0.02 0.10
0.02 0.08 0.05 0.01 0.04 0.20
0.03 0.04 0.08 0.09 0.06 0.30
0.01 0.01 0.10 0.20 0.08 0.10
x 0.10 0.15 0.25 0.30 0.20 1
Definition 19 (Berleant and Goodman-Strauss, 1998). A vertex matrix is a joint distribution matrix such that at least (m − 1) × (n − 1) of its entries are equal to 0.
INTERVAL AND FUZZY ANALYSIS : A UNIFIED APPROACH
147
Example. Vertex matrix
y
x 0.10 0 0 0 0.10 0 0.15 0 0 0.15 0 0 0.15 0.10 0.25 0 0 0 0.30 0.30 0 0.05 0.15 0 0.20 0.10 0.20 0.30 0.10 1
The creation of vertex matrices is a key step in computing dependency bounds (enclosures), as will be seen, and an algorithm that does this is presented next. Algorithm IV.3 [The Cross-Off Algorithm for Creating a Vertex Matrix (Berleant and Goodman-Strauss, 1998)]. Input: x = x1 , . . . , xm , y = y1 , . . . , yn Step 1. Assign pij = ∗ to the joint distribution matrix. Step 2. Set xi = xi , i = 1, . . . , m, and yj = yj , j = 1, . . . , n. Step 3. Repeat until all * are replaced by numbers. Step 3.1. Step 3.2. Step 3.3. Step 3.4. Step 3.5.
Let xI = maxi xi , yJ = maxj yj . If xI ≥ yJ , pI J = yJ , and for each piJ = ∗, set piJ = 0. If xI ≤ yJ , pI J = xI , and for each pIj = ∗, set pIj = 0. xI = xI − min{xI , yJ }. yJ = yJ − min{xI , yJ }.
The authors further show that, given two discrete (discretized) random variables, X, and Y , the cross-off algorithm produces a vertex matrix after no more than n + m − 1 iterations. Let A denote a matrix that is a subset of the entries of the joint distribution matrix. That is, let A ⊆ {pij , i = 1, . . . , m, j = 1, . . . , n}. In what follows, the matrix A will be chosen to include each entry whose associated interval’s lower bound is at or below z, at which the probability of the result of interval arithmetic operation is to be maximized to get the upper dependency bound at z or whose associated interval’s high bound is at or above the number z, at which the probability is to be minimized to get the lower dependency bound at z. This is most easily understood by an example. Suppose the joint distribution matrix is given for the negative correlation example and z = xy = 4. Then, A consists of p11 = 0, p12 = 14 , p13 = 14 ,
148
LODWICK
and p21 = 14 since 4 ∈ [2, 6], 4 ∈ [3, 8], 4 ∈ [4, 10], and 4 ∈ [4, 12], respectively, and the pij of A are those that correspond to these intervals in the matrix. If z = xy = 8 is easily checked that matrix A consists of p12 , p13 , p21 , p22 , and p23 . To complete the rest of the matrix (given this A), the entries are maximized subject to the condition that the sums of the rows and columns add up to the values of the given marginals. By maximizing the sums P of the missing entries of A, we minimize the sum of 1 − P for the complement of matrix A. How the missing entries of matrix A are maximized is discussed next. Definition 20 (Berleant and Goodman-Strauss, 1998). Given two joint }, respectively, if distribution matrices D and D with entries {pij } and {pij pij > pij , (i,j )∈A
then it is said that D is higher than
(i,j )∈A
D
(D is lower than D).
Given this definition, it is clear that to compute dependency bounds, the consistent with the dependencies as lowest and highest matrices (D and D), defined by the marginals of X and Y , are sought. Definition 21 (Berleant and Goodman-Strauss, 1998). Two vertex matrices are said to be adjacent if they have at least (m−1)(n−1)−1 zeros in common among their entries. (respectively D) by repeatedly The algorithm to be developed computes D moving from one adjacent vertex matrix to another. Berleant and GoodmanStrauss (1998) prove (see their Lemma 6.2, page 158) that given any vertex matrix D0 , there exists a chain of vertex matrices D0 , D1 , D2 , . . . , such that each vertex matrix in the chain is adjacent to and Dk , . . . , DK = D higher than the preceding vertex matrix. Moreover, they also prove (see their Lemma 2.3, page 158) that once (m−1)(n−1) entries of the joint distribution matrix such that all the entries of an entire row and an entire column are not specified, then all of the missing entries are determined. All of this leads to an as follows: algorithm that computes D Algorithm IV.4. [Dependency Bounds Berleant and Goodman-Strauss (1998, pages 158, 159)] Step 1. Given a distribution matrix, use the cross-off algorithm to generate an initial vertex matrix D0 and set k := 0. Step 2. Loop.
INTERVAL AND FUZZY ANALYSIS : A UNIFIED APPROACH
149
Step 2.1. Specify the locations of the zeros for each potential adjacent vertex matrix. Step 2.2. For each potential adjacent vertex matrix, apply the proof to Lemma 2.3 (Berleant and Goodman-Strauss, 1998) to find the values of the remaining entries. Discard any matrix not satisfying 0 ≤ pij < 1. Step 2.3. Compute the height of all the remaining adjacent vertex matrices as high or higher than any other adjacent matrix. Step 2.4. If Dk+1 is higher than Dk , then set k := k +1 and repeat the loop. = Dk and stop. Or else, set D The computation of D is done similarly. Example (Positive correction example). z = xy = 4, where pX 1 =
xy ∈ [2, 6],
1 p11 , 4
xy ∈ [3, 8],
1 , 4 p13 = 0, 1 pX2 = , 2 p21 = 0, 1 p22 = , 4 1 p23 = , 4 1 pY1 = , 4 1 pY2 = , 2 1 pY3 = , 4
xy ∈ [4, 10], x ∈ [2, 4], xy ∈ [4, 12], xy ∈ [6, 16], xy ∈ [8, 20], y = [2, 3], y = [3, 4], y = [4, 5], and
1 , 2
x ∈ [1, 2],
⎡
⎡1
⎤ 0 A = ⎣ 12 ∗ ⎦ , A = ⎣ ∗⎦. 0 ∗ 0 ∗ 3 1 pij = . pij = > 4 2 0
1 4
⎤
p12 =
4 1 4
150
LODWICK
Building lower and upper dependency bounds requires the analysis of all z ∈ Z = [2, 20] for this example. To find the upper dependency bound, the following problem is solved: max pij (i,j )∈A
subject to n
pij = xi ,
m
i = 1, . . . , m,
j =1
pij = yj ,
j = 1, . . . , n.
i=1
Connecting the maxima for each z, we form the upper dependency bound. Similarly, the same procedure is used to compute the lower dependency bound except a linear program that minimizes is used. There exists an uncountably infinite number of z ∈ Z = [2, 20]. However, for intervals, there are a finite number of transitions at which we need to compute the z, and these occur at the intersection of our subintervals. For this example, the transitions are at z = 2, 3, 4, 6, 8, 10, 12, 16, 20. c. Tonon. Tonon (2004), like Berleant and Goodman-Strauss (1998), uses Moore’s approach (Moore, 1984) to bracket the cumulative distribution of a function of random variables using the theory of random sets (Dubois and Prade, 1991) instead of linear programming for enclosure. Given D ⊆ Rn , P (D), the power set of D, a basic probability measure function, is a setvalued function m : P (D) → [0, 1] where ∀A ∈ P (D),
m(A) ≥ 0,
m(∅) = 0, m(A) = 1. A∈P (D)
Definition 22 (Demster, 1967; Dubois and Prade, 1991; Klir and Yuan, 1995; Shafer, 1987). If m(Aα ) > 0, Aα ∈ D, α ∈ $, an indexing set, Aα is called a focal element. A random set is the pair (F , p) where F is the set of all focal elements of D. For a specific pair (F , p), Aα ∈ F ⇒ m(Aα ) > 0, Aα ∈F m(Aα ) = 1. For E ∈ P (D), Bel(E) = m(Aα ) ≤ Pr(E) Aα ⊆E,Aα ∈F
≤ Pl(E) =
Aα ∩E=∅,Aα ∈F
m(Aα ).
(74)
INTERVAL AND FUZZY ANALYSIS : A UNIFIED APPROACH
151
Bel is called the belief measure and Pl is called the plausibility measure. For random sets, a measure is constructed on subsets of D (not singletons), and these are the outcomes of subsets Ai ⊆ D, which are the observations indicating an event somewhere in Ai . There is no reference made to a probability of a particular point in Ai . Any probability, for which p(Ai ) = m(Ai ), would be consistent with the basic probability measure function m. If the focal elements are all singletons, then, of course, the basic probability measure defines a probability distribution and m(Aα ) Bel(E) = Aα ⊆E,Aα ∈F
= Pr(E) = Pl(E) = Since
m(Aα ).
Aα ∩E=∅,Aα ∈F Aα ∈F
m(Aα ) = 1, we have
Bel(E) = 1 − Pl E C Pl(E) = 1 − Bel E C .
For D ⊆ R, each connected focal element, Aα , is an interval. For this case, two cumulative distributions can be computed: m(Aα ), F (u) = Bel{u ∈ R | u ≤ u} = Aα ∈F ,u≥sup(Aα )
(u) = Pl{u ∈ R | u ≤ u} = F
m(Aα ).
Aα ∈F ,u≥inf(Aα )
Note that for any Aα = [a α , a¯ α ] such that a α ≤ u implies that Aα ∩ (−∞, u] = ∅, and a¯ α ≤ u implies that Aα ⊆ (−∞, u]. An expectation is defined as follows. Definition 23. For all connected focal elements Aα ∈ R, the expectation μ is m(Aα ) inf(Aα ), m(Aα ) sup(Aα ) . μ= α
α
The extension principle for random sets is defined next. Definition 24. Let y = f (u), f : u ∈ D → Y ⊆ R. The image of (F , p) is (%, ρ) where % = {R = f (Aα ), Aα ∈ F } and ρ(R) = Aα : R=f (Aα ) m(Aα ). Of course, ρ(R) = 0 if there exists an Aα such that R = f (Aα ). Clearly, ρ(R) ≥ 0.
152
LODWICK
, Given the above definitions and observations, lower, F , and upper, F bounds on the cumulative distribution, F , F ≤ F ≤ F are computed as follows. Let y = f (&x), m a basic probability measure function, and {Ai } a ! partition of nonzero diameter (hence a focal element) such that D = N i=1 Ai . m(Ai ), (75) F (y) = Ai :,y≥sup f (Ai )
(y) = F
m(Ai ).
(76)
Ai :,y≥inf f (Ai )
(y). What the above lower It is clear by construction that F (y) ≤ F (y) ≤ F and upper cumulative distribution functions say is that, if y = f (&x), the range value, is greater than or equal to the maximum over the ith partition, Ai , then the probability measure of that partition is added to the lower cumulative distribution function. In a similar way, the upper cumulative distribution is formed. The actual value of the distribution inside the partition may be unknown. Tonon proceeds to use the “vertex method,” which is a simple way to evaluate the function only at the vertices of the partition (box) Ai , where he assumes that the function f is monotonic in each variable, thereby obtaining the minimum and maximum values of the function over this partition, which occur at the vertices (given the monotonicity assumption). This cuts down on the complexity of computing the global optimum. For nonmonotonic functions, one may apply interval global optimization techniques whose overestimation is minimal given partitions (boxes) that have small diameters. 3. Inverse Probability Method Inverse probability is a phrase coined by Olsen (2005) to denote that partitions are computed from the inverse distribution. Jamison and Lodwick (2003b, 2004) partition using the inverse of the marginal cumulative distributions for equally spaced grid points so that the boxes are of equal probability, as does Olsen. Olsen (2005), however, focuses on the development of an efficient algorithm for distribution arithmetic. In particular, the overestimation generated by the enclosure developed is smaller so that the method potentially converges in fewer steps. a. Jamison and Lodwick. The problem addressed by Jamison and Lodwick (2003b, 2004, 2006) is Eq. (58), one for which the function of distributions possesses the associated articulated assumptions given above. & = The Approach. Given the vector of continuous random variables, X (X1 , . . . , Xn )T , the joint cumulative distribution function at x& = (x1 ,
INTERVAL AND FUZZY ANALYSIS : A UNIFIED APPROACH
153
. . . , xn )T , FX (& x ), from the left is FX (& x ) = prob(X1 ≤ x1 , . . . , Xn ≤ xn ), x ), is and, from the right, the joint cumulative distribution function, GX (& GX (& x ) = prob(x1 ≤ X1 , . . . , xn ≤ Xn ). Bounds (lower and upper) are constructed, and an approximation between the lower and upper bounds for the cumulative distribution function of Y is computed in three steps. The first step partitions the domain D = {[s 1 , s¯1 ] × · · · × [s n , s¯n ]} into small boxes. The second step constructs lower and upper bounds and an intermediate estimate for the conditional cumulative distribution function of Y for each box of the partition given that X is in that box. The final step combines the cumulative distribution functions on each box into the cumulative distribution function over the entire domain. Step 1. Partition—The first step is to construct a partition on the domain, D, D = [s 1 , s¯1 ] × · · · × [s n , s¯n ]
−1 −1 −1 = FX−1 (0), F (1) × · · · × F (0), F (1) X X X n n 1 1 into subboxes of equal probability since this simplifies the derived method. For example, to divide [FX−1 (0), FX−1 (1)] into three pieces of equal probai i −1 −1 1 −1 1 bility, we use [FXi (0), FXi ( 3 )], [FXi ( 3 ), FX−1 ( 2 )] and [FX−1 ( 2 ), FX−1 (1)]. i 3 i 3 i Overlap is not an issue since the distribution of X is assumed to be continuous. The primary consideration in this process is how partitioning affects the size of the problem. If there are n random " variables and each variable Xi is divided into ki subintervals, then there are ni=1 ki conditional cumulative distribution functions to compute. For this study, we do not explore partitioning strategies. Regardless, it is desirable to minimize the number of subdivisions and subdivide only the variables that influence the results the most (a large partial derivative in the box may be used as an indicator). Remark. Approximating the actual cumulative distribution function using our approach is most difficult in [FY−1 (0), FY−1 (0 + δ)] and in [FY−1 (1 − δ), FY−1 (1)] since there are no overlaps. The best results for the intermediate approximation are obtained from overlaps where there is, potentially, a cancellation of overapproximations and underapproximations as will be seen. Remark. Equally spaced partitions in probability require the inverse of FX−1 i to be known in closed form or at least computable. When this is the case, the partitioning strategy proceeds in a straightforward manner. When this is not the case, an additional complexity arises in that the inverse must be approximated.
154
LODWICK
Step 2. Construction of Lower and Upper Bounds and an Approximation on Each Partition—The second step is to construct the bounds and the estimated conditional cumulative distribution function for each box of the partition. Let [b1 , c1 ] × · · · × [bn , cn ] be one such box and let A denote the event X falls in this box. Consider the family of n-dimensional boxes:
' b1 , FX−1 (t) × · · · × bn , FX−1 (t) ' t ∈ [0, 1] . n |A 1 |A From our assumption that f is continuous and increasing in each Xi ,
(t) × · · · × bn , FX−1 (t) f b1 , FX−1 n |A 1 |A
−1 (t), . . . , F (t) . = f (b1 , . . . , bn ), f FX−1 X |A |A n 1 Thus,
−1 −1 −1 (t), . . . , F (t) ≥ F (t), . . . , F (t) . FY |A f FX−1 F X|A |A |A |A |A X X X n n 1 1
Because for a level curve c = y = f (& x ), x& ∈ [s 1 , s¯1 ] × · · · × [s n , s¯n ], the assumption that f is increasing in each variable means that for a fixed x&∗ such that c = f (& x ∗ ), the level curve lies in [s 1 , x1∗ ] × · · · × [s i , xi∗ ] × · · · × [s n , xn∗ ] for each i = 1, . . . , n. This can be seen in Figure 7 for the example Y = 1 f (X1 , X2 ) = X1 X2 with the level curve c = 10 = f (X1 , X2 ) = X1 X2 so 1 ∗ that x& = {(x1 , x2 ) | |x1 x2 = 10 }. If X1 and X2 are i.i.d. U [0, 1] random 1 1 variables with Y = f (X1 , X2 ) = X1 X2 , and x&∗ = ( 10 , 10 ), then clearly 1 1 ] × [0, 10 ] since the right end point of this box is a point f (& x ∗ ) ∈ [0, 10 on the level curve. Now consider the n-dimensional box (coming from the right end point) (t), c1 ] × · · · × [FX−1 (t), cn ]. As before, we know that [FX−1 n |A 1 |A
(t), c1 × · · · × FX−1 (t), cn f FX−1 n |A 1 |A
(t), . . . , FX−1 (t) , f (c1 , . . . , cn ) . = f FX−1 n |A 1 |A This gives the inequality (t), . . . , FX−1 (t) ≥ GX|A FX−1 (t), . . . , FX−1 (t) . 1 − FY |A f FX−1 n |A n |A 1 |A 1 |A Combining both inequalities yields FX|A FX−1 (t), . . . , FX−1 (t) n |A 1 |A −1 (t), . . . , F (t) ≤ FY |A f FX−1 |A |A X n 1 −1 (t), . . . , F ≤ 1 − GX|A FX−1 Xn |A (t) . 1 |A
(77)
INTERVAL AND FUZZY ANALYSIS : A UNIFIED APPROACH
155
F IGURE 7. Outer and inner approximations on Y = 0.1 isoline. Outer and inner measure of F (Y ) = 0.1, Y -product of two i.i.d. U [0, 1] random variables with no partitioning (one block).
Example. Figure 7 illustrates the estimate and bounds for FY (0.1) when Y = X1 X2 and Xi are identical independent uniform [0, 1] random variables. The range has not been subdivided. This example shows the inner and outer measures of the area bounded to the left and right of the level set Y = X1 X2 =
1 1 0.1, the lower and upper box computed for x&∗ = ( 10 , 10 ), illustrating the inequalities of Eq. (77). N OTE: the two lower and upper bounds can be 1 computed from any x&∗ such that x&∗ = {(x1 , x2 ) | x1 x2 = 10 }. The algorithm that is developed here uses points along the diagonal to make the size of the boxes the same (hypersquares).
A simplification to this calculation is accomplished by the use of two auxiliary functions, g and h, defined below. To this end, define g : [0, 1]n →
n ( i=1
[s i , s¯i ]
(78)
156
LODWICK
by g(u1 , . . . , un ) = FX−1 (u1 ), . . . , FX−1 (un ) n 1 = (x1 , . . . , xn ).
(79)
By the assumptions, g is onto, strictly increasing in each variable, thus oneto-one. Thus, it is invertible so that & = (X1 , . . . , Xn ) g(U ) = X and U = g −1 (X1 , . . . , Xn ). It is clear that from Eqs. (78) and (79) that Ui = FXi (Xi ) [from which we derive Xi = FX−1 (Ui )] and has the standard uniform distribution i Ui (t) = 1/(¯si − s i ) for t ∈ [s i , s¯i ] (80) 0 otherwise. This is because Ui takes values in the interval [0, 1], and the probability that Ui is less than or equal to t is the probability that FXi (Xi ) ≤ t. But this is the probability that Xi is in the interval (−∞, c], where c is the constant such that FXi (c) = t. Thus, the probability that Xi is in the interval (−∞, c] is precisely t, which means that Ui is the standard uniform distribution [Eq. (80)]. In general, if a random variable is the domain of its cumulative distribution function, a standard uniform distribution is obtained. Fix y ∗ ∈ [s Y , s¯Y ] and consider the set A = {(x1 , . . . , xn ) | f (x1 , . . . , xn ) ≤ ∗ y }, which would be the area under the isoline in the example shown in Figure 7. Then FY (y ∗ ) = prob(X ∈ A) = prob U ∈ g −1 (A) ,
(81)
where Eq. (81) is equal to the area of the set (u1 , . . . , un ) | g(u1 , . . . , un ) ∈ A = (u1 , . . . , un ) | f g(u1 , . . . , un ) ≤ y ∗ . This means that the problem of finding the cumulative distribution function of Eq. (58) can be redefined as a problem of finding the cumulative distribution function for the random variable Y = h(U ) = f g(U ) , (82)
INTERVAL AND FUZZY ANALYSIS : A UNIFIED APPROACH
157
where h : [0, 1]n → [s Y , s¯Y ], (u1 ), . . . , FX−1 (un ) , h(u1 , . . . , un ) = f FX−1 n 1 where Ui is Eq. (80). This is because Y , defined by Eq. (82), has the identical distribution as Y = f (X). Next define k : [0, 1] → [s Y , s¯Y ]
(83)
by k(t) = h(t, . . . , t),
t ∈ [0, 1].
(84)
By assumption, k is one-to-one, onto, strictly increasing and invertible so that ∃t ∗ such that t ∗ = k −1 (y ∗ ). Let T = k −1 (Y ). Then, FT t ∗ = FY y ∗ , (85) which is the area of the set (u1 , . . . , un ) | h(u1 , . . . , un ) ≤ y ∗ = (u1 , . . . , un ) | h(u1 , . . . , un ) ≤ h t ∗ , . . . , t ∗ . Given that h is monotonically increasing in each variable, for ui ≤ t ∗ ∀i, we know h(u1 , . . . , un ) ≤ h t ∗ , . . . , t ∗ , so that this area is bounded below by (t ∗ )n . Similarly, for ui ≥ t ∗ ∀i, h(u1 , . . . , un ) ≥ h t ∗ , . . . , t ∗ , so that this area is bounded above 1 − (1 − t ∗ )n . This means that the following bounds hold: n ∗ n t ≤ FT t ∗ = FY y ∗ ≤ 1 − 1 − t ∗ . (86) Of course, an open question is how to pick bounds that are a simple computation such as that given above. The bounds [Eq. (86)] may be a wide envelope, particularly for a large number of variables (large n). A reasonable estimate of the cumulative distribution function, without having to perform the calculations needed to reduce the envelope to a small width, is desired. To do this, select an Y |A for intermediate value F (t), . . . , FX−1 (t) FY |A f FX−1 n |A 1 |A
158
LODWICK
that falls between the lower and upper estimate above. One estimate would be to average these probabilities, that is, set Y |A f F −1 (βt), . . . , F −1 (t) F X1 |A Xn |A 1 −1 = FX|A FX−1 |A (t), . . . , FXn |A (t) 1 2 −1 (t), . . . , F (t) . + 1 − GX|A FX−1 |A |A X n 1 & i , are independent, a reasonable choice for When the random variables, X , is to use t. This works since t n ≤ the intermediate estimate function, F n t ≤ 1 − (1 − t) and has several desirable properties. First is its simplicity. Second is that this estimate does not increase the maximum possible error in making a choice of intermediate value. This is so because the maximum of the difference 1−(1−t)n −t n occurs when t = 0.5, and at this value the midpoint estimate is 12 (0.5n + 1 − 0.5n ) = 0.5. A third property is that it is symmetric about the value t = 0.5. Any tendency to overestimate or underestimate the true value when t < 0.5 should be offset by a tendency to underestimate or & we use overestimate for values of t > 0.5. So, for independent X −1 (t) = t n , (87) F Y |A f FX1 |A (t), . . . , FX−1 n |A −1 −1 Y |A f F F (88) X1 |A (t), . . . , FXn |A (t) = t, and
Y |A F −1 (t), . . . , F −1 (t) = 1 − (1 − t)n F X1 |A Xn |A
(89)
to obtain a lower bound, intermediate estimate, and upper bound, respectively, on the actual cumulative distribution function, FY |A (y). Note that these equations [Eqs. (87)–(89)], define three functions of t, Hk (t) : [0, 1] → [0, 1], k = 1, 2, 3. When n = 3, the graphs of Hk for each of the three estimates are as follows (see Figure 8). The actual distribution lies between the lower and upper bounds, while the estimate H2 (t) = t provides a centrally located estimate. This is the source of the averaging of errors as the number of subdivisions increases. As the number of variables becomes large, the lower and upper bounds may become wide and the intermediate estimate becomes more important. Step 3. Combine— The last step of the method is to combine the estimated conditional cumulative distribution functions (CDF) into an approximation of the cumulative distribution function for Y . Assume the support has been & divided into subintervals for each Xi creating a partition of the support of X. If {Aj | j = 1, m} is the partition (so each Aj is also an n-dimensional Y |Ai (y) are the values calculated as Y |Ai (y), and F box), and if F Y |Ai (y), F
INTERVAL AND FUZZY ANALYSIS : A UNIFIED APPROACH
F IGURE 8.
159
Underestimate, approximation, and overestimate.
above for the lower bound, intermediate approximation, and the upper bound (respectively) of the cumulative distribution function for the variable Y given that X ∈ Ai , then we can combine these cumulative distribution functions to produce the bounds and estimate mfor the cumulative distribution function Y (y) = of interest by setting F j =1 FY |Ai (y)P (X ∈ Aj ), where P (E) equals the probability of the event E. The lower and upper bounds [F Y (y) Y (y)] are calculated similarly. and F Example. Consider Y = f (X1 , X2 ) = X1 + X2 with 2 partitions (4 boxes), the same problem with 8 partitions (64 boxes) and a more complicated function Y = (max{X13 , X2 })2 +X1 X2 X3 with 8 partitions (512 boxes), where X1 , X2 , and X3 are identical, independent uniform over [0, 1] (i.i.d. ˜U [0, 1]) illustrated in Figures 10 and 11, respectively. Note that while the lower and upper bounds are wide for the sum of two random variables using four boxes (Figure 9), the estimate is very good. If an estimate suffices, then computing with fewer boxes works well. All simulated results, using Monte Carlo used 10,000 random draws. Theoretical Considerations. The algorithm above converges to FY (y) when the supports of Xi are subdivided in such a way that the diameters of all
160
LODWICK
F IGURE 9.
Sum of 20 i.i.d. U [0, 1] random variables using 4 boxes, 2 × 2 partitions.
F IGURE 10.
Sum of 20 i.i.d. U [0, 1] random variables using 64 boxes, 8 × 8 partitions.
INTERVAL AND FUZZY ANALYSIS : A UNIFIED APPROACH
161
F IGURE 11. Distribution arithmetic using Jamison and Lodwick on a complex problem using 512 boxes, 8 × 8 × 8 partitions. The cumulative distribution function for Y = (max{X13 , X2 })2 + X1 , X2 , X3 when X1 , X2 , and X3 are i.i.d. U [0, 1] random variables.
subboxes go to 0, that is, when
diam(Dk ) = diam s k1 , s¯1k × · · · × s kn , s¯nk → 0 ∀k. Y (y) This is because, from measure theory and our assumptions, F Y (y) and F are simply inner and outer measures for FY relative to the measure defined by FY for the region {& x | f (& x ) ≤ y}. Thus, the process will converge to the actual cumulative distribution function. b. Olsen. Olsen (2005) particularizes Jamison and Lodwick with the view of obtaining an efficient algorithm suitable for computing the function of random variables in high dimensions. Olsen uses Tonon’s approach to construct the resulting cumulative distribution function. Thus, this approach may be considered a hybrid of Jamison and Lodwick and Tonon with the intended purpose of obtaining an efficient algorithm that encloses the actual distribution. The theoretical considerations are those of the Jamison and Lodwick method discussed above. The Olsen algorithm is as follows.
162
LODWICK
Algorithm IV.5 (Olsen, 2005). Find the cumulative distribution of y = f (X1 , . . . , Xn ), where the Xi are independent random variables. Step 1. Partition Xi into m intervals of equal probability, p, as was done in the Jamison and Lodwick method, resulting in
j j j Xi = Xi1 , . . . , Xim , where Xi = x i , x¯i , so that 1 × Xn1 , A1 = X11 × X21 × · · · × Xn−1 1 × Xn2 , A2 = X11 × X21 × · · · × Xn−1 .. . m Anm = X1m × X2m × · · · × Xn−1 × Xnm ,
and probability of each Ak is pn given our independence assumption. Step 2. Compute the minimum and maximum of f (Aj ). N OTE: It is here that what Tonon (2004) and Fortin et al. (2006) call the vertex method, is used. The vertex method is simply what is known in optimization—the minimum and maximum of a monotone function over a compact set (box) occurs at the end points. Olsen (2005) assumes that the function is monotone over each box and thus simply evaluates f (Aj ) at the vertices of boxes. Step 3. Construct the lower and upper bounds for the cumulative distribution function like Tonon (2004), that is, p(Ai ) = pn , (90) F (y) = y≥sup f (Ai )
(y) = F
y>inf(Ai )
p(Ai ) = p n
y≥sup f (Ai )
.
(91)
y>inf f (Ai )
¯ one typically picks a set of grid Since the range of f is an interval [y, y], ¯ 0 = y < y1 · · · < yk < yk+1 < · · · < yK = y, ¯ where points yk ∈ [y, y]y Eqs. (90) and (91) are evaluated at the points of the grid selected. Example. Let Y = f (X1 , X2 ) = X1 + X2 , for X1 , X2 U [0, 1], and independent. Step 1. Partition Xi into m intervals of equal probability, say, p = 0.5. Thus, X11 = [0, 12 ], X12 = [ 12 , 1], X21 = [0, 12 ], X22 = [ 12 , 1], so that, A1 = [0, 12 ] × [0, 12 ], A2 = [0, 12 ] × [ 12 , 1], A3 = [ 12 , 1] × [0, 12 ], A4 = [ 12 , 1] × [ 12 , 1], each having probability of p = ( 12 )2 = 14 , since we have assumed independence. This is illustrated in Figure 12.
INTERVAL AND FUZZY ANALYSIS : A UNIFIED APPROACH
F IGURE 12.
163
Example of inverse probability boxes with equal probability.
Step 2. Compute the minimum and maximum of f (X1 , X2 ), which occurs at the vertices {(0, 0), (0, 12 ), (0, 1), ( 12 , 0), ( 12 , 12 ), ( 12 , 1), (1, 0), (1, 12 ), (1, 1)}. Since the order does not matter for addition, the minimum and maximum of f on the vertices will be (0, 0) = 0 + 0 = 0, (0, 12 ) = 0 + 12 = 0.5, . . . (1, 1) = 1 + 1 = 2. The function values of f on each of the four Aj s at the vertices are B1 = {0, 12 , 12 , 1}, B2 = { 12 , 12 , 1, 32 }, B3 = { 12 , 12 , 1, 32 }, and B4 = {1, 32 , 32 , 2} with the minimum and maximum on B1 being {0, 1}; on B2 it is { 12 , 32 }; on B3 it is { 12 , 32 }; and on B4 it is {1, 2}. Step 3. Construct the lower and upper bounds on the cumulative distribution by taking the minimum and maximum of each Bj . Thus for the grid y0 = 0, y1 = 12 , y2 = 1, y3 = 32 , y4 = 2, begin with the smallest value y0 = 0. (0) = 0. This value is not greater than any of the Bj values, so that F 1 At y1 = 2 , y1 is greater than the minimum of B1 but no other Bj s, ( 1 ) = 1 , the probability of one Aj . Continuing the process, we so F 2 4 (1) = 3 , F ( 3 ) = 1, F (2) = 1. Similarly, for obtain yi < min{Bj }F 4 2 1 yi ≥ max{Bj }F (0) = 0, F ( 2 ) = 0, F (1) = 0, F ( 32 ) = 34 , F (2) = 1. The lower and upper cumulative distributions and the Moore approximation are illustrated in Figure 13. (See also Tables 2 and 3.)
164
LODWICK
F IGURE 13. function.
Inverse probability of Olsen—assembled lower and upper cumulative distribution
TABLE 2
yk > min{Bj } ⇒ p(Bj ) ∈ c.d.f. Fyk y0 y1 y2 y3 y4
=0 = 12 =1 = 32 =2
B1
B2
B3
B4
CDF
0
0 0
0 0
0
1 4 1 4 1 4
1 4 1 4 1 4
0 0 0 1 4 1 4
1 1
1 4 1 4 1 4 1 4
1 4 3 4
Remark. Functions that are nonmonotonic in a box, for the methods that evaluate over boxes, require global optima to be computed. When dependencies are unknown, global optimization over boxes is the only way to obtain a meaningful result when enclosures are necessary. From lower and upper approximations, a guess at the actual distribution from this information is often useful. The examples have used averages that for independent variables are reasonable. There are two areas that would help in obtaining a more efficient method for the computation of enclosure of resulting distribution from a function of distributions. 1. Strategies for partitioning that are efficient in terms of obtaining global optima and computing the probability over boxes. Global optimization
INTERVAL AND FUZZY ANALYSIS : A UNIFIED APPROACH
165
TABLE 3
yk ≤ max{Bj } ⇒ p(Bj ) ∈ c.d.f. Fyk y0 y1 y2 y3 y4
=0 = 12 =1 = 32 =2
B1
B2
B3
B4
CDF
0 0
0 0 0
0 0 0
0 0
1 4 1 4
1 4 1 4
0 0 0 0 1 4
1
1 4 1 4 1 4
1 4 3 4
methods that provide enclosure and verification require interval techniques. This means that partitioning strategies are useful (see Hansen, 1992; Kearfott, 1996a; Neumaier, 2004b). In particular, second derivative information, if available, may be useful in deciding the partitioning strategy. 2. As seen from the examples, the lower and upper approximations that enclose the distribution provide the bounds from which a guess to the actual distribution can be made. The examples use the average between the lower and upper approximation, which for independent variables is a reasonable approach. However, good strategies to pick a “best” approximation from the lower and upper bounds are needed. There is a trade-off between the number of partitions and the resulting complexity. One way to reduce the complexity is to have a good approximation method based on the information provided by the lower and upper bounds when the number of partitions is relatively low. B. General Theory of Uncertainty A problem in which some or all the types of uncertainty of interest (interval, probability, possibility, and fuzzy) occur within the problem requires a theory that has the uncertainties of interest as special instances of the theoretical framework. Two theories do this: (1) the theory of clouds (Neumaier, 2004a), and (2) the theory of interval-valued probability (Jamison and Lodwick, 2006; Lodwick and Jamison, 2006; Weichselberger, 1996, 2000), which can be considered as extensions of Walley (1991, 1999). Since interval probability measures include clouds (as well as intervals, probability, possibility and fuzzy uncertainty), the more detailed focus will be on interval-valued probability measures. This discussion begins with the theory of clouds. 1. Clouds The idea of a cloud is to enclose uncertainty in such a way that the enclosing functions have probabilistic-like characteristics. In particular, every cloud has
166
LODWICK
been shown to contain a probability distribution within it (Neumaier, 2005). Beyond the ability to model with a mixture of uncertainty, the original impetus was to be able to model analytically when missing information, precision of concepts, models, and/or measurements. Definition 25. A cloud (Neumaier, 2004a) over a set M is a mapping x that associates with each ξ ∈ M a (nonempty, closed, and bounded interval) x(ξ ), such that, (0, 1) ⊆ x(ξ ) ⊆ [0, 1]. (92) ξ ∈M
x(ξ ) = [x(ξ ), x¯ (ξ )], is called the level of ξ in the cloud x, where x(ξ ) and x¯ (ξ ) are the lower and upper level, respectively, and x¯ (ξ ) − x(ξ ) is called the width of ξ . When the width is 0 for all ξ , the cloud is called a thin cloud. When doing analysis over real numbers, a concept akin to a fuzzy number (gradual number or interval number in our previous settings) is required for clouds. Definition 26 (Neumaier, 2004a). A real cloudy number is a cloud over the set R of real numbers. χ[a,b] (χ being the characteristic function) is the cloud equivalent to an interval [a, b], providing support information without additional probabilistic content. A cloudy vector is cloud over Rn , where each component is a cloudy number. Neumaier (2004a) states that dependence or correlation between uncertain numbers (or the lack thereof) can be modeled by considering them jointly as components of a cloudy vector. Moreover, In many applications (not always, cf. Proposition IV.1, but roughly when x¯ (ξ ) ≈ 1 for ξ near the modes of the associated distribution), the level x(ξ ) may be interpreted as giving lower and upper bounds on the degree of suitability of ξ ∈ M as a possible scenario for data modeled by the cloud x. This degree of suitability can be given a probability interpretation by relating clouds to random variables [see Eq. (93) below]. We say that a random variable x with values in M belongs to a cloud x over M, and write x ∈ x, if Pr x(x) ≥ α ≤ 1 − α ≤ Pr x(x) ¯ > α , ∀α ∈ [0, 1]. (93) Pr denotes the probability of the statement given as argument, and it is required that the sets consisting of all ξ ∈ M where x(x) ≥ α and x(x) ¯ >α
INTERVAL AND FUZZY ANALYSIS : A UNIFIED APPROACH
167
are measurable in the σ -algebra on M consisting of all sets A ⊆ M for which Pr(x ∈ A) is defined. This approach gives clouds an underlying interpretation as the class of random variables x with x ∈ x. A fuzzy cloud is one for which the level is a pair of fuzzy membership functions. The above interpretation is equivalent to the interpretation of fuzzy set membership degree as an upper bound (for a single fuzzy set considered as a cloud, have the upper level as the membership function, and the x-axis as the lower level) for probabilities first advocated by Dubois et al. (1997). Since there exists at least one random variable in every cloud (Neumaier, 2005), this interpretation is meaningful. Computation with clouds involves arithmetic on the (vertical) interval x(ζ ) for each ξ that defines the lower and upper level of the cloud. Efficient methods to compute with clouds are still an open question. However, once a cloud is constructed, methods that have been discussed can be used. Example (Kawai, 2006). Suppose a histogram-based random variable x is given in Table 4. The original density is not crisp. The density of Figure 14 assumes uniform distribution within each interval. If the Xi are chosen in the order shown in Figure 14, then we have a discrete cloud that resembles Figure 15. The dashed lines of Figure 15 form x(ξ ) (the lower level) and the solid line of Figure 15 form x¯ (ξ ) (the upper level). There is nothing to prevent the cloud from being formed using a different order of the Xi . In fact, there may be some advantage in choosing the intervals in a different order so that it results in a bell-shaped cloud, as shown in Figure 16.
2. Interval-Valued Probability Measures The next presentation is an expanded version of Lodwick and Jamison (2006) and portions may also be found in Jamison and Lodwick (2006). A basis for TABLE 4
Intervals Xi X1 X2 X3 X4 X5 X6
(−∞, 2) [2, 3) [3, 4) [4, 5) [5, 6] (6, ∞)
Pr(x ∈ Xi )
Walley (1999) uses a cumulative αi
x(ξ ) := [αi−1 , αi ]
0.0 0.1 0.2 0.3 0.4 0.0
0.0 0.1 0.3 0.6 1.0 1.0
[0.0, 0.0] [0.0, 0.1] [0.1, 0.3] [0.3, 0.6] [0.6, 1.0] [1.0, 1.0]
168
LODWICK
F IGURE 14.
F IGURE 15.
Probability density—histogram-based random variable.
Discrete cloud constructed from probability density in order given.
linking various methods of uncertainty representation, including clouds, is examined next. This section begins by defining what is meant by an IVPM. This generalization of a probability measure includes probability measures,
INTERVAL AND FUZZY ANALYSIS : A UNIFIED APPROACH
F IGURE 16.
169
Discrete cloud constructed from density in different order than that given.
possibility and necessity measures, intervals, and clouds (Neumaier, 2004a). The set function defining IVPM is thought of as a method for providing a partial representation for an unknown probability measure, much like clouds. Throughout, arithmetic operations involving set functions are in terms of interval arithmetic (Moore, 1979), and Int[0,1] ≡ {[a, b] | 0 ≤ a ≤ b ≤ 1} denotes an arbitrary interval within [0, 1]. It will be seen how problems using mixed representations can be handled and solved. The first subsection defines in a formal way the term interval-valued probability measure as used by Weichselberger (1996, 2000). Weichselberger’s definition begins with a set of probability measures (our IVPMs relax “tightest”) and then defines an interval probability as a set function providing lower and upper bounds on the probabilities calculated from these measures. F -probabilities are simply the tightest bounds possible for the set of probability measures. This definition is followed by demonstrating that various forms of uncertainty representation (possibility, interval, cloud, and probability) all can be represented by such measures. The next subsection shows how IVPMs can be constructed from lower and upper bounding cumulative distribution functions. This is followed by an extension principle for a function of uncertain variables represented by IVPMs and integration with respect to IVPMs. Both definitions will be useful in analyzing problems involving uncertainty represented by IVPMs. An application to a problem in optimization is given.
Throughout this section, we are interested primarily in interval-valued probability defined on the Borel sets on the real line and in real-valued random variables. The basic definitions from Weichselberger (with slight variation in notation) are presented next.

Definition 27 (Weichselberger, 2000). Given a measurable space (S, A), an interval-valued function i_m : A → Int[0,1] is called an R-probability if:
(a) i_m(A) = [a^-(A), a^+(A)] ⊆ [0, 1] with a^-(A) ≤ a^+(A);
(b) there exists a probability measure Pr on A such that ∀A ∈ A, Pr(A) ∈ i_m(A).

By an R-probability field, we mean the triple (S, A, i_m).

Definition 28 (Weichselberger, 2000). Given an R-probability field R = (S, A, i_m), the set

M(R) = {Pr | Pr is a probability measure on A such that ∀A ∈ A, Pr(A) ∈ i_m(A)}

is called the structure of R.

Definition 29 (Weichselberger, 2000). An R-probability field R = (S, A, i_m) is called an F-probability field if ∀A ∈ A:
(a) a^+(A) = sup{Pr(A) | Pr ∈ M(R)},
(b) a^-(A) = inf{Pr(A) | Pr ∈ M(R)}.

It is interesting to note that, given a measurable space (S, A) and a set of probability measures P, defining a^+(A) = sup{Pr(A) | Pr ∈ P} and a^-(A) = inf{Pr(A) | Pr ∈ P} gives an F-probability, where P is a subset of the structure. The following examples show how intervals, possibility distributions, clouds, and (of course) probability measures can define R-probability fields on B, the Borel sets on the real line.

Example (An interval defines an F-probability field). Let I = [a, b] be a nonempty interval on the real line. On the Borel sets, define

a^+(A) = 1 if I ∩ A ≠ ∅, and 0 otherwise,
a^-(A) = 1 if I ⊆ A, and 0 otherwise,
then i_m(A) = [a^-(A), a^+(A)] defines an F-probability field R = (ℝ, B, i_m). To see this, simply set P to be the set of all probability measures on B such that Pr(I) = 1.

Example (A probability measure is an F-probability field). Let Pr be a probability measure over (S, A). Define i_m(A) = [Pr(A), Pr(A)], which is equivalent to having total knowledge about a probability distribution over S.

The concept of a cloud was introduced by Neumaier (2004a), and in the context of the notation of this section, it is defined as follows.

Definition 30. A cloud over a set S is a mapping c such that:
(1) ∀s ∈ S, c(s) = [n(s), p̄(s)] with 0 ≤ n(s) ≤ p̄(s) ≤ 1;
(2) (0, 1) ⊆ ⋃_{s∈S} c(s) ⊆ [0, 1].
In addition, a random variable X taking values in S is said to belong to the cloud c (written X ∈ c) if
(3) ∀α ∈ [0, 1], Pr(n(X) ≥ α) ≤ 1 − α ≤ Pr(p̄(X) > α).

Clouds are closely related to possibility theory. A function p : S → [0, 1] is called a regular possibility distribution function if sup{p(x) | x ∈ S} = 1. Possibility distribution functions (Wang and Klir, 1992) define a possibility measure Pos, where Pos(A) = sup{p(x) | x ∈ A}, and its dual necessity measure, Nec(A) = 1 − Pos(A^c). By convention, we define sup{p(x) | x ∈ ∅} = 0. A necessity distribution function can also be defined as n : S → [0, 1] by setting n(x) = 1 − p(x).
Observe that

Nec(A) = inf{n(x) | x ∈ A^c},

where we define inf{n(x) | x ∈ ∅} = 1. Jamison and Lodwick (2002) showed that possibility distributions could be constructed that satisfy the following consistency definition.

Definition 31. Let p : S → [0, 1] be a regular possibility distribution function with associated possibility measure Pos and necessity measure Nec. Then p is said to be consistent with random variable X if, for all measurable sets A, Nec(A) ≤ Pr(X ∈ A) ≤ Pos(A).

Remark. Recall that a distribution acts on real numbers, and measures act on sets of real numbers.

The concept of a cloud can be stated in terms of certain pairs of consistent possibility distributions, as shown in the following proposition.

Proposition IV.1. Let p̄, p be a pair of regular possibility distribution functions over a set S such that ∀s ∈ S, p̄(s) + p(s) ≥ 1. Then the mapping c(s) = [n(s), p̄(s)], where n(s) = 1 − p(s) (that is, the dual necessity distribution function), is a cloud. In addition, if X is a random variable taking values in S and the possibility measures associated with p̄, p are consistent with X, then X belongs to the cloud c. Conversely, every cloud defines such a pair of possibility distribution functions, and their associated possibility measures are consistent with every random variable belonging to the cloud c.

Proof. (⇒)
(1) p̄, p : S → [0, 1] and p̄(s) + p(s) ≥ 1 imply property (1) of Definition 30.
(2) Since all regular possibility distributions satisfy sup{p(s) | s ∈ S} = 1, property (2) of Definition 30 holds.
Therefore c is a cloud. Now assume consistency. Then

α ≥ Pos{s | p̄(s) ≤ α} ≥ Pr{s | p̄(s) ≤ α} = 1 − Pr{s | p̄(s) > α}

gives the right-hand side of the required inequalities, and

1 − α ≥ Pos{s | p(s) ≤ 1 − α} ≥ Pr{s | p(s) ≤ 1 − α} = Pr{s | 1 − p(s) ≥ α} = Pr(n(X) ≥ α)
gives the left-hand side.
(⇐) The opposite identity was proven in Section 5 of Neumaier (2004a).

Example (A cloud defines an R-probability field). Let c be a cloud over the real line. If Pos_1, Nec_1, Pos_2, Nec_2 are the possibility measures and their dual necessity measures relating to p̄(s) and p(s), define

i_m(A) = [max{Nec_1(A), Nec_2(A)}, min{Pos_1(A), Pos_2(A)}].

Neumaier (2005) proved that every cloud contains a random variable X. Consistency requires that Pr(X ∈ A) ∈ i_m(A), and thus every cloud defines an R-probability field.

Example (A possibility distribution defines an R-probability field). Let p : S → [0, 1] be a regular possibility distribution function, and let Pos be the associated possibility measure and Nec the dual necessity measure. Define i_m(A) = [Nec(A), Pos(A)]. If we define a second possibility distribution p̄(x) = 1 ∀x, then the pair p̄, p defines a cloud for which i_m(A) defines the R-probability.

a. Construction from Kolmogorov–Smirnov Statistics. In this subsection, an F-probability is constructed from lower and upper bounding cumulative distribution functions in a manner allowing practical computation. For example, given statistical data, we can construct a confidence interval for the underlying cumulative distribution function using the Kolmogorov estimation of confidence limits (see Kolmogorov, 1941). Then, using this confidence interval, we can use the following development to construct an IVPM. Although a simple definition could be obtained by setting the interval equal to the lower and upper bound over all probability measures contained in the bound, it is not clear how to use this definition in practice. The development that follows is more amenable to actual use. Let

F^u(x) = Pr(X^u ≤ x)  and  F^l(x) = Pr(X^l ≤ x)

be two cumulative distribution functions for random variables X^u and X^l over the Borel sets on the real line, with the property that F^u(x) ≥ F^l(x) ∀x. Set

M(X^u, X^l) = {X | ∀x, F^u(x) ≥ Pr(X ≤ x) ≥ F^l(x)},
which clearly contains X^u and X^l. We will think in terms of an unknown X ∈ M(X^u, X^l). For any Borel set A, let Pr(A) = Pr(X ∈ A). We begin by developing probability bounds for members of the family of sets

I = {(a, b], (−∞, a], (a, ∞), (−∞, ∞), ∅ | a < b}.

For I = (−∞, b], it is clear by definition that

Pr(I) ∈ [F^l(b), F^u(b)].

For I = (a, ∞), let

Pr(I) ∈ [1 − F^u(a), 1 − F^l(a)].

For I = (a, b], since I = (−∞, b] − (−∞, a], and considering minimum and maximum probabilities in each set, let

Pr(I) ∈ [max{F^l(b) − F^u(a), 0}, F^u(b) − F^l(a)].

Therefore, if we extend the definitions of F^u and F^l by defining F^u(−∞) = F^l(−∞) = 0 and F^u(∞) = F^l(∞) = 1, we can make the following general definition.

Definition 32. For any I ∈ I, if I ≠ ∅, define

i_m(I) = [a^-(I), a^+(I)] = [max{F^l(b) − F^u(a), 0}, F^u(b) − F^l(a)],

where a and b are the left and right end points of I. Otherwise, set i_m(∅) = [0, 0].

Remark. Note that with this definition,

i_m((−∞, ∞)) = [max{F^l(∞) − F^u(−∞), 0}, F^u(∞) − F^l(−∞)] = [1, 1],

which matches our intuition; thus, it is easy to see that Pr(I) ∈ i_m(I) ∀I ∈ I.
We can extend this to include finite unions of elements of I. For example, if E = I_1 ∪ I_2 = (a, b] ∪ (c, d] with b < c, then we consider the probabilities

Pr((a, b]) + Pr((c, d])  and  1 − [Pr((−∞, a]) + Pr((b, c]) + Pr((d, ∞))]

(the probability of the sets that make up E versus one less the probability of the intervals that make up the complement), and consider the minimum and maximum probability for each case as a function of the minimum and maximum of each set. The minimum for the first sum is

max{0, F^l(d) − F^u(c)} + max{0, F^l(b) − F^u(a)},

and the maximum is

[F^u(d) − F^l(c)] + [F^u(b) − F^l(a)].

The minimum for the second is

1 − {[F^u(∞) − F^l(d)] + [F^u(c) − F^l(b)] + [F^u(a) − F^l(−∞)]} = F^l(d) − F^u(c) + F^l(b) − F^u(a),

and the maximum is

1 − {max{0, F^l(∞) − F^u(d)} + max{0, F^l(c) − F^u(b)} + max{0, F^l(a) − F^u(−∞)}}
  = F^u(d) − max{0, F^l(c) − F^u(b)} − F^l(a).

This gives

Pr(E) ≥ max{ F^l(d) − F^u(c) + F^l(b) − F^u(a),  max{0, F^l(d) − F^u(c)} + max{0, F^l(b) − F^u(a)} }

and

Pr(E) ≤ min{ F^u(d) − max{0, F^l(c) − F^u(b)} − F^l(a),  F^u(d) − F^l(c) + F^u(b) − F^l(a) },

so

Pr(E) ∈ [ max{0, F^l(d) − F^u(c)} + max{0, F^l(b) − F^u(a)},  F^u(d) − max{0, F^l(c) − F^u(b)} − F^l(a) ].

The final line is arrived at by noting that, ∀x, y, F^l(x) − F^u(y) ≤ max{0, F^l(x) − F^u(y)}.
Remark. Note the two extreme cases for E = (a, b] ∪ (c, d]. If F^u(x) = F^l(x) = F(x) ∀x, then, as expected,

Pr(E) = F(d) − F(c) + F(b) − F(a) = Pr((a, b]) + Pr((c, d]),

that is, it is the probability measure. Moreover, for F^l(x) = 0 ∀x,

Pr(E) ∈ [0, F^u(d)],

that is, it is a possibility measure for the possibility distribution function F^u(x). Let

ℰ = { ⋃_{k=1}^K I_k | I_k ∈ I }.

That is, ℰ is the algebra of sets generated by I. Note that every element of ℰ has a unique representation as a union of the minimum number of elements of I (or, stated differently, as a union of disconnected elements of I). Note also that ℝ ∈ ℰ and ℰ is closed under complements. Assume E = ⋃_{k=1}^K I_k and E^c = ⋃_{j=1}^J M_j are the unique representations of E and E^c in ℰ in terms of elements of I. Then, considering minimum and maximum possible probabilities of each interval, it is clear that

Pr(E) ∈ [ max{ Σ_{k=1}^K a^-(I_k), 1 − Σ_{j=1}^J a^+(M_j) },  min{ Σ_{k=1}^K a^+(I_k), 1 − Σ_{j=1}^J a^-(M_j) } ].
This can be made more concise using the following result.

Proposition IV.2. If E = ⋃_{k=1}^K I_k and E^c = ⋃_{j=1}^J M_j are the unique representations of E and E^c ∈ ℰ, then

Σ_{k=1}^K a^-(I_k) ≥ 1 − Σ_{j=1}^J a^+(M_j)  and  Σ_{k=1}^K a^+(I_k) ≥ 1 − Σ_{j=1}^J a^-(M_j).

Proof. We need only prove

Σ_{k=1}^K a^-(I_k) ≥ 1 − Σ_{j=1}^J a^+(M_j),

since we can exchange the roles of E and E^c, giving

Σ_{j=1}^J a^-(M_j) ≥ 1 − Σ_{k=1}^K a^+(I_k),

thereby proving the second inequality. Note that Σ_{k=1}^K a^-(I_k) + Σ_{j=1}^J a^+(M_j) is of the form

Σ_{k=1}^K max{0, F^l(b_k) − F^u(a_k)} + Σ_{j=1}^J [F^u(a_{j+1}) − F^l(b_j)]
  ≥ Σ_{k=1}^K [F^l(b_k) − F^u(a_k)] + Σ_{j=1}^J [F^u(a_{j+1}) − F^l(b_j)].
Since the union of the disjoint intervals yields all of the real line, the right-hand side telescopes to either F^u(∞) or F^l(∞) less either F^u(−∞) or F^l(−∞), which is 1 regardless.

Next, i_m is extended to ℰ.

Proposition IV.3. For any E ∈ ℰ, let E = ⋃_{k=1}^K I_k and E^c = ⋃_{j=1}^J M_j be the unique representations of E and E^c in terms of elements of I, respectively. If

i_m(E) = [ Σ_{k=1}^K a^-(I_k),  1 − Σ_{j=1}^J a^-(M_j) ],

then i_m : ℰ → Int[0,1] is an extension of i_m from I to ℰ and is well defined. In addition,

i_m(E) = [ inf{Pr(X ∈ E) | X ∈ M(X^u, X^l)},  sup{Pr(X ∈ E) | X ∈ M(X^u, X^l)} ].

Proof. First assume E = (a, b] ∈ I; then E^c = (−∞, a] ∪ (b, ∞), so by the definition,

i_m(E) = [ max{F^l(b) − F^u(a), 0},  1 − (max{F^l(a) − F^u(−∞), 0} + max{F^l(∞) − F^u(b), 0}) ],

which matches the definition of i_m on I. The other cases for E ∈ I are similar. Thus, it is an extension. It is easy to show that it is well defined, since the representation of any element of ℰ in terms of the minimum number of elements of I is unique. In addition, it is clear that

0 ≤ Σ_{k=1}^K a^-(I_k)  and  1 − Σ_{j=1}^J a^-(M_j) ≤ 1.

So we only need to show that

Σ_{k=1}^K a^-(I_k) ≤ 1 − Σ_{j=1}^J a^-(M_j),

that is,

Σ_{k=1}^K a^-(I_k) + Σ_{j=1}^J a^-(M_j) ≤ 1.

If we relabel the end points of all these intervals as −∞ = c_1 < c_2 < · · · < c_N = ∞, then

Σ_{k=1}^K a^-(I_k) + Σ_{j=1}^J a^-(M_j) = Σ_{n=1}^{N−1} max{F^l(c_{n+1}) − F^u(c_n), 0}
  ≤ Σ_{n=1}^{N−1} max{F^u(c_{n+1}) − F^u(c_n), 0}
  = Σ_{n=1}^{N−1} [F^u(c_{n+1}) − F^u(c_n)]
  = 1.

Thus

Σ_{k=1}^K a^-(I_k) + Σ_{j=1}^J a^-(M_j) ≤ 1.
For the last equation, assume

E = ⋃_{k=1}^K I_k = (−∞, b_1] ∪ (a_2, b_2] ∪ · · · ∪ (a_K, b_K]

and

E^c = ⋃_{j=1}^J M_j = (b_1, a_2] ∪ · · · ∪ (b_K, ∞).

We will show that

X ∈ M(X^u, X^l)  ⇒  Pr(X ∈ E) ∈ i_m(E),

and that there is an X ∈ M(X^u, X^l) for which Pr(X ∈ E) = a^+(E). Note first that

Σ_{j=1}^J a^-(M_j) = Σ_{k=1}^K max{F^l(a_{k+1}) − F^u(b_k), 0}
  ≤ Σ_{k=1}^K max{F(a_{k+1}) − F(b_k), 0}
  = Pr(E^c),

which gives both

Pr(E) = 1 − Pr(E^c) ≤ a^+(E),

and, by replacing E with E^c, a^-(E) ≤ Pr(E). Next, for x ≤ a_2, set

F(x) = min{F^l(b_1), F^u(x)},

and for a_2 < x ≤ b_2, set

F(x) = min{ F^l(b_2), ((x − a_2)/(b_2 − a_2)) F^u(x) + ((b_2 − x)/(b_2 − a_2)) F^u(x) }.

Continuing in this way gives a cumulative distribution function for which

Pr(E^c) = Σ_{j=1}^J a^-(M_j)  and  Pr(E) = 1 − Σ_{j=1}^J a^-(M_j).
The other bound is similarly derived.
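Proposition IV.3 gives a directly computable formula once E and its complement are written as disjoint unions of elements of I. The following sketch is illustrative only: it reuses the hypothetical bounding CDFs of the earlier sketch (repeated inline so the snippet is self-contained) and the helper names are assumptions, not part of the original text.

```python
# Minimal sketch of Proposition IV.3:
# i_m(E) = [ sum of a^-(I_k),  1 - sum of a^-(M_j) ],
# for E a sorted union of disjoint half-open pieces (a, b].
import math

Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
F_u = lambda x: Phi(x + 0.2)          # hypothetical upper bounding CDF
F_l = lambda x: Phi(x - 0.2)          # hypothetical lower bounding CDF

def a_minus(a, b):
    """Lower probability of (a, b] from Definition 32."""
    return max(F_l(b) - F_u(a), 0.0)

def complement(pieces):
    """Complement of a sorted, disjoint union of (a, b] pieces, as (a, b] pieces."""
    comp, left = [], -math.inf
    for a, b in pieces:
        if a > left:
            comp.append((left, a))
        left = b
    if left < math.inf:
        comp.append((left, math.inf))
    return comp

def i_m_union(pieces):
    """i_m(E) for E given as sorted, disjoint (a, b] pieces."""
    lower = sum(a_minus(a, b) for a, b in pieces)
    upper = 1.0 - sum(a_minus(a, b) for a, b in complement(pieces))
    return (lower, upper)

E = [(-1.0, 0.0), (1.0, 2.0)]         # E = (-1, 0] U (1, 2]
print(i_m_union(E))
```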
The family of sets ℰ is a ring of sets generating the Borel sets B. For an arbitrary Borel set S, it is then clear that

Pr(S) ∈ [ sup{a^-(E) | E ⊆ S, E ∈ ℰ},  inf{a^+(F) | S ⊆ F, F ∈ ℰ} ].

This prompts the following:

Proposition IV.4. Let i_m : B → Int[0,1] be defined by

i_m(A) = [ sup{a^-(E) | E ⊆ A, E ∈ ℰ},  inf{a^+(F) | A ⊆ F, F ∈ ℰ} ].

Then i_m is an extension from ℰ to B, and it is well defined.

Proof. The last property of Proposition IV.3 ensures it is an extension since, for example, if E ⊆ F are two elements of ℰ, then a^+(E) ≤ a^+(F), so inf{a^+(F) | E ⊆ F, F ∈ ℰ} = a^+(E). Similarly, it ensures that sup{a^-(F) | F ⊆ E, F ∈ ℰ} = a^-(E). Next we show that i_m is well defined. Proposition IV.3 shows that, ∀E ∈ ℰ, i_m(E) ⊆ [0, 1]. Thus,

0 ≤ sup{a^-(E) | E ⊆ S, E ∈ ℰ}  and  inf{a^+(E) | S ⊆ E, E ∈ ℰ} ≤ 1.

We also have

sup{a^-(E) | E ⊆ S, E ∈ ℰ} ≤ inf{a^+(F) | S ⊆ F, F ∈ ℰ}.

Proposition IV.5. The function i_m : B → Int[0,1] defines an F-probability field on the Borel sets, and

i_m(B) = [ inf{Pr(X ∈ B) | X ∈ M(X^u, X^l)},  sup{Pr(X ∈ B) | X ∈ M(X^u, X^l)} ],

that is, M(X^u, X^l) defines a structure.

Proof. Clear.
b. Interval-Valued Integration, Extension, and Independence of F-Probabilities. This subsection defines three key concepts needed for the application of IVPMs to mathematical programming problems: integration, extension, and independence.

Definition 33. Given an F-probability field R = (S, A, i_m) and an integrable function f : S → ℝ, we define

∫_A f(x) di_m = [ inf_{p∈M(R)} ∫_A f(x) dp,  sup_{p∈M(R)} ∫_A f(x) dp ].

We make the following observations, which are useful in actual evaluation. It is easy to see that if f is an A-measurable simple function such that

f(x) = y for x ∈ A and 0 for x ∉ A, with A ∈ A,

then ∫_A f(x) di_m = y·i_m(A). Further, if f is a simple function taking values {y_k | k ∈ K} on an at-most countable collection of disjoint measurable sets {A_k | k ∈ K}, that is,

f(x) = y_k for x ∈ A_k and 0 for x ∉ A, where A = ⋃_{k∈K} A_k,

then

∫_A f(x) di_m = [ a^-(∫_A f(x) di_m),  a^+(∫_A f(x) di_m) ],

where

a^+(∫_A f(x) di_m) = sup{ Σ_{k∈K} y_k Pr(A_k) | Pr ∈ M(R) }  and
a^-(∫_A f(x) di_m) = inf{ Σ_{k∈K} y_k Pr(A_k) | Pr ∈ M(R) }.   (94)

Note that these can be evaluated by solving two linear programming problems, since Pr ∈ M(R) implies that Pr(A) = 1 and that Pr(⋃_{l∈L⊂K} A_l) ∈ i_m(⋃_{l∈L⊂K} A_l), so the problem may be tractable.
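To make the linear programming evaluation of Eq. (94) concrete, the sketch below bounds the expected value of a simple function over a three-set partition, using only the per-set interval constraints and the normalization Σ Pr(A_k) = 1. This is a relaxation of the full structure M(R) (constraints on unions of the A_k are omitted), so the resulting interval is a valid but possibly conservative enclosure. The bounds i_m(A_k), the values y_k, and the use of SciPy are assumptions made for illustration.

```python
# Illustrative sketch of evaluating Eq. (94) by linear programming.
# Variables: p_k = Pr(A_k) for a partition A_1, A_2, A_3 of S.
# Constraints used: p_k within i_m(A_k) and sum(p_k) = 1 (a relaxation
# of M(R), hence the bounds below are conservative enclosures).
from scipy.optimize import linprog

y  = [1.0, 2.0, 5.0]                         # values of the simple function
im = [(0.1, 0.4), (0.2, 0.6), (0.1, 0.5)]    # hypothetical i_m(A_k) bounds

A_eq, b_eq = [[1.0, 1.0, 1.0]], [1.0]        # probabilities sum to one

lo = linprog(c=y, A_eq=A_eq, b_eq=b_eq, bounds=im, method="highs")
hi = linprog(c=[-v for v in y], A_eq=A_eq, b_eq=b_eq, bounds=im, method="highs")

print("interval-valued expected value:", (lo.fun, -hi.fun))
```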
In general, if f is an integrable function and {f_k} is a sequence of simple functions converging uniformly to f, then the integral of f can be determined by noting that

∫_A f(x) di_m = lim_{k→∞} ∫_A f_k(x) di_m,

where

lim_{k→∞} ∫_A f_k(x) di_m = [ lim_{k→∞} a^-(∫_A f_k(x) di_m),  lim_{k→∞} a^+(∫_A f_k(x) di_m) ],
provided the limits exist.

Example. Consider the IVPM constructed from the interval [a, b]. Then ∫_ℝ x di_m = [a, b]; that is, the interval-valued expected value is the interval itself.

Definition 34. Let R = (S, A, i_m) be an F-probability field and f : S → T a measurable function from the measurable space (S, A) to the measurable space (T, B). Then the F-probability (T, B, l_m) defined by

l_m(B) = [ inf{Pr(f^{-1}(B)) | Pr ∈ M(R)},  sup{Pr(f^{-1}(B)) | Pr ∈ M(R)} ]   (95)

is called the extension of the R-probability field to (T, B). That this defines an F-probability field is clear from our earlier observation. In addition, it is easy to see that this definition is equivalent to setting l_m(A) = i_m(f^{-1}(A)), which allows for evaluation using the techniques described earlier.

We now address the combination of IVPMs when the variables are independent. We do not address the situation when dependencies may be involved. Given measurable spaces (S, A) and (T, B) and the product space (S × T, A × B), assume i_{X×Y} is an IVPM on A × B. Call i_X and i_Y, defined by i_X(A) = i_{X×Y}(A × T) and i_Y(B) = i_{X×Y}(S × B), the marginals of i_{X×Y}. The marginals i_X and i_Y are IVPMs.

Definition 35. Call the marginal IVPMs independent if and only if i_{X×Y}(A × B) = i_X(A) i_Y(B) ∀A ∈ A, B ∈ B.
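Because the bounds of an IVPM lie in [0, 1], the product i_X(A) i_Y(B) in Definition 35 is an ordinary product of nonnegative intervals. A minimal sketch follows; the helper name and the numeric bounds are illustrative assumptions.

```python
# Interval product used in Definition 35: for intervals [a1, b1], [a2, b2]
# with 0 <= a <= b <= 1, the product is [a1 * a2, b1 * b2].
def interval_product(i1, i2):
    (a1, b1), (a2, b2) = i1, i2
    return (a1 * a2, b1 * b2)

iX_A = (0.3, 0.6)   # hypothetical i_X(A)
iY_B = (0.5, 0.9)   # hypothetical i_Y(B)
print(interval_product(iX_A, iY_B))   # candidate i_{X x Y}(A x B) = (0.15, 0.54)
```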
Definition 36. Let R = (S, A, i_X) and Q = (T, B, l_Y) be F-probability fields representing uncertain random variables X and Y. We define the F-probability field (S × T, A × B, i_{X×Y}) by setting

i^+_{X×Y}(A × B) = sup{ Pr_X(A) Pr_Y(B) | Pr_X ∈ M(R), Pr_Y ∈ M(Q) },
i^-_{X×Y}(A × B) = inf{ Pr_X(A) Pr_Y(B) | Pr_X ∈ M(R), Pr_Y ∈ M(Q) },

where (S × T, A × B) is the usual product sigma-algebra. It is clear from this definition that i_{X×Y}(A × B) ≡ i_X(A) i_Y(B) for all A ∈ A and B ∈ B. Thus, if we have several uncertain parameters in a problem with the uncertainty characterized by IVPMs, and all are independent, we can form an IVPM for the product space by multiplication and use this IVPM.

c. Application to Optimization. An example application of IVPMs is an optimization problem, the recourse problem. Suppose we wish to optimize f(x, a) subject to g(x, b) = 0. Assume a and b are vectors of independent uncertain parameters, each with an associated IVPM. Assume the constraint can be violated at a known cost c, so that the problem is to solve

max_x h(x, a, b),  where h(x, a, b) = f(x, a) − c g(x, b).

Using the independence, form an IVPM for the product space, i_{a×b}, for the joint distribution. Then calculate the interval-valued expected value with respect to this IVPM. The resulting interval-valued expected value is

∫_ℝ h(x, a, b) di_{a×b}.
184
LODWICK
and width (a risk measure). The optimization problem becomes h(x, a, b) dia×b . max u v x
R
C. Generalized Extension Principles for Distributions

One extension principle associated with general distributions is presented above [Eq. (95)], and it is perhaps the only one able to deal with the full range of uncertainty of interest in this article. The current research seems to lack a generalized extension principle beyond what is presented by Jamison and Lodwick (2006) and what is given here. The reason for this is not only the complexity of the problem, but also the lack of a general theory that captures the broad spectrum of uncertainty distributions, which the IVPMs of Weichselberger (2000) and Jamison and Lodwick (2006) do provide.
REFERENCES

Aberth, O. (1978). A method for exact computation with rational numbers. Journal of Computational and Applied Mathematics 4, 285–288. Aberth, O. (1988). Precise Numerical Analysis. William C. Brown, Dubuque, IA. Alefeld, G., Herzberger, J. (1983). Introduction to Interval Computations. Academic Press, New York. Alefeld, G. (1990). Enclosure methods. In: Ullrich, C. (Ed.), Computer Arithmetic and Self-Validating Numerical Methods. Academic Press, Boston, MA, pp. 55–72. Anile, M., Deodato, S., Privitera, G. (1995). Implementing fuzzy arithmetic. Fuzzy Sets and Systems 72 (2), 239–250. Archimedes of Siracusa (1897). Measurement of a circle. In: Heath, T.L. (Ed.), The Works of Archimedes. Cambridge University Press, Cambridge. Dover edition, 1953. Aubin, J.-P., Frankowska, H. (1990). Set-Valued Analysis. Birkhäuser, Boston, MA. Baudrit, C., Dubois, D., Fargier H. (2005). Propagation of uncertainty involving imprecision and randomness. ISIPTA, 31–40. Berleant, D. (1993). Automatically verified reasoning with both intervals and probability density functions. Interval Computations 2, 48–70. Berleant, D., Goodman-Strauss, C. (1998). Bounding the results of arithmetic operations on random variables of unknown dependency using intervals. Reliable Computing 4, 147–165.
Berleant, D., Zhang, J. (2004). Representation and problem solving with distribution envelope determination (DEnv). Reliability Engineering and Systems Safety 85 (1–3), 153–168. Böhmer, K., Hemker, P., Stetter, H.J. (1984). The defect correction approach. Computing Supplement 5, 1–32. Bondia, J., Sala, A., Sáinz, M. (2006). Modal fuzzy quantities and applications to control. In: Demirli, K., Akgunduz, A. (Eds.), 2006 Conference of the North American Fuzzy Information Processing Society, June 3–6, 2006, Montréal, Canada. Bornemann, F., Laurie, D., Wagon, S., Waldvogel, J. (2004). The SIAM 100-Digit Challenge: A Study in High-Accuracy Numerical Computing. SIAM, Philadelphia, PA. Burkill, J.C. (1924). Functions of intervals. Proceedings of the London Mathematical Society 22, 375–446. Corliss, G.F. (2004). Tutorial on validated scientific computing using interval analysis. PARA’04 Workshop on State-of-the-Art Computing. Technical University of Denmark, June 20–23, 2004. See http://www.eng.mu.edu/corlissg/PARA04/READ_ME.html. Daney, D., Papegay, Y., Neumaier, A. (2004). Interval methods for certification of the kinematic calibration of parallel robots. In: Proc. 2004 IEEE Int. Conf. Robotics Automation. New Orleans, LA, pp. 191–198. Davies, B. (2005). Whither mathematics? Notices of the AMS 52 (11), 1350– 1356. Demster, A.P. (1967). Upper and lower probabilities induced by multivalued mapping. Annals of Mathematical Statistics 38, 325–339. Dempster, M.A.H. (1969). Distributions in interval and linear programming. In: Hansen, E.R. (Ed.), Topics in Interval Analysis. Oxford University Press, pp. 107–127. Dempster, M.A.H. (1974). An application of quantile arithmetic to the distribution problem in stochastic linear programming. Bulletin of the Institute of Mathematics and Its Applications 10, 186–194. Dubois, D., Prade, H. (1977). Le flou, mécédonka, Tech. Rep. C.E.R.T.-D.E.R.A. Toulouse, April 1977. Dubois, D., Prade, H. (1978). Operations on fuzzy numbers. International Journal of Systems Science 9 (6), 613–626. Dubois, D., Prade, H. (1980). Fuzzy Sets and Systems: Theory and Applications. Academic Press, New York. Dubois, D., Prade, H. (1981). Additions of interactive fuzzy numbers. IEEE Trans. on Automatic Control 26 (4), 926–936. Dubois, D., Prade, H. (1987a). Evidence theory and interval analysis. Second IFSA Congress, Tokyo, July 20–25, 1987, pp. 502–505.
Dubois, D., Prade, H. (1987b). Fuzzy numbers: An overview. Technical Report 219, L.S.I., Univ. Paul Sabatier, Toulouse, France; also in Bezdek, J.C. (Ed.), Analysis of Fuzzy Information: Mathematics and Logic, vol. 1. CRC Press, Boca Raton, FL, pp. 3–39. Dubois, D., Prade, H. (1988). Possibility Theory—An Approach to Computerized Processing of Uncertainty. Plenum Press, New York. Dubois, D., Prade, H. (1991). Random sets and fuzzy interval analysis. Fuzzy Sets and Systems 42, 87–101. Dubois, D., Prade, H. (Eds.) (2000a). Fundamentals of Fuzzy Sets. Kluwer Academic Press. Dubois, D., Prade, H. (2005). Fuzzy elements in a fuzzy set. In: Proceedings of the 10th International Fuzzy System Association (IFSA) Congress, Beijing, pp. 55–60. Dubois, D., Kerre, E., Mesiar, R., Prade, H. (2000b). Fuzzy Interval Analysis. In: Dubois, D., Prade, H. (Eds.), Fundamentals of Fuzzy Sets. Kluwer Academic Press, pp. 55–72. Dubois, D., Moral, S., Prade, H. (1997). Semantics for possibility theory based on likelihoods. Journal of Mathematical Analysis and Applications 205, 359–380. Dwayer, P.S. (1951). Linear Computations. Wiley, New York. Ely, J.S. (1993a). The VPI software package for variable precision interval arithmetic. Interval Computation 2 (2), 135–153. Ely, J.S., Baker, G.R. (1993b). High-precision calculations of vortex sheet motion. Journal of Computational Physics 111, 275–281. Ferson, S. (2002). RAMAS Risk Calc 4.0 Software: Risk Assessment with Uncertain Numbers. Lewis Publishers, Boca Raton, FL. Ferson, S., Kreinovich, L.R., Ginzburg, V., Sentz, K., Myers, D.S. (2003). Constructing probability boxes and Dempster-Shafer structures. Technical Report SAND2002-4015. Sandia National Laboratories, Albuquerque, NM. Ferson, S., Kreinovich, V. (2005). Combining interval and probabilistic uncertainty: Foundations, algorithms, challenges—an overview. WorldWide Web. Fischer, P.C. (1958). Automatic propagated and round-off error analysis. Proceedings of the 13th national meeting of the Association for Computing Machinery, June 1958. See http://portal.acm.org/citation.cfm?id=610971. Fortin, J., Dubois, D., Fargier, H. (2006). Gradual numbers and their application to fuzzy interval analysis. IEEE Transactions on Fuzzy Systems, in press. Frank, M.J., Nelsen, R.B., Schweizer, B. (1987). Best-possible bounds for the distribution of a sum—A problem of Kolmogorov. Probability Theory and Related Fields 74, 199–211.
Gardeñes, E., Mielgo, H., Trepat, A. (1986). Modal intervals: Reasons and ground semantics. In: Nickel, K. (Ed.), Interval Mathematics 1985. Proceedings of the International Symposium, Freiburg, Federal Republic of Germany, September 23–26, 1985, Springer-Verlag, Berlin, pp. 27–35. Guderley, K.G., Keller, C.L. (1972). A basic theorem in the computation of ellipsoidal error bounds. Numerische Mathematik 19 (3), 218–229. Hales, T.C. (2000). Cannonballs and honeycombs. Notices of the American Mathematical Society 47, 440–449. Hansen, E.R. (1975). A generalized interval arithmetic. In: Nickel, K. (Ed.), Interval Mathematics, Lecture Notes in Computer Science, vol. 29. Springer-Verlag, New York, pp. 7–18. Hansen, E.R. (1978). Interval forms of Newton’s method. Computing 20, 153– 163. Hansen, E.R. (1980). Global optimisation using interval analysis—The multidimensional case. Numerische Mathematik 34, 247–270. Hansen, E.R. (1992). Global Optimization Using Interval Analysis. Marcel Dekker, New York. Hansen, E.R. (2001). Publications related to early interval work of R.E. Moore, August 13, 2001. See http://interval.louisiana.edu/ Moores_early_papers/bibliography.html. Hanss, M. (2005). Applied Fuzzy Arithmetic. Springer-Verlag, Berlin. Jahn, K. (1980). The importance of 3-valued notions for interval mathematics. In: Nickel, K.E. (Ed.), Interval Mathematics. Academic Press, New York, pp. 75–98. Jamison, K.D., Lodwick, W.A. (2002). The construction of consistent possibility and necessity measures. Fuzzy Sets and Systems 132 (1), 1–10. Jamison, K.D., Lodwick, W.A. (2004). Interval-valued probability measures. UCD/CCM Report No. 213. Jamison, K.D., Lodwick, W.A. (2006) Interval-valued probability in the analysis of problems that contain a mixture of fuzzy, possibilistic and interval uncertainty. International Journal of Approximate Reasoning, in press. Jamison, K.D., Lodwick, W.A., Kawai, M. (2002). A simple closed form estimation for the cumulative distribution function of a monotone function of random variables. UCD/CCM Report No. 187. Jaulin, L. (2001a). Path planning using intervals and graphs. Reliable Computing 7 (1), 1–15. Jaulin, L., Kieffer, M., Didrit, O., Walter, E. (2001b). Applied Interval Analysis. Springer-Verlag, New York. Kahan, W.M. (1968a). Circumscribing an ellipsoid about the intersection of two ellipsoids. Canadian Mathematical Bulletin 11 (3), 437–441. Kahan, W.M. (1968b). A more complete interval arithmetic. Lecture Notes for a Summer Course at University of Michigan.
Kaplan, S. (1981). On the method of discrete probability distributions in risk and reliability calculations—Applications to seismic risk assessment. Journal of Risk 1 (3), 189–196. Kaucher, E. (1973). Über metrische und algebraische Eigenschaften eiginger beim numerischen Rechnen auftretender Räume. PhD thesis, University of Karlsruhe. Kaucher, E. (1980). Interval analysis in the extended space I R. Computing (Suppl.) 2, 33–49. Kaucher, E., Rump, S.M. (1982). E-methods for fixed point equations f (x) = x. Computing 28, 31–42. Kaufmann, A., Gupta, M.M. (1985). Introduction to Fuzzy Arithmetic: Theory and Applications. Van Nostrand Reinhold, New York. Kawai, M. (2006). Discrete clouds. Seminar presentation April 28, 2006, Department of Mathematical Sciences of University of Colorado at Denver. Kaymak, U., Sousa, J.M. (2003). Weighting of constraints in fuzzy optimization. Constraints 8, 61–78. Kearfott, R.B. (1996a). Rigorous Global Search: Continuous Problem. Kluwer Academic Publishers, Boston, MA. Kearfott, R.B. (1996b). Interval computations: Introduction, uses, and resources. Euromath Bulletin 2 (1), 95–112. Klaua, D. (1969). Partielle Mengen und Zhlen. Mtber. Dt. Akad. Wiss. 11, 585–599. Klir, G.J., Yuan, B. (1995). Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall, Upper Saddle River, NJ. Klir, G.J. (1997). Fuzzy arithmetic with requisite constraints. Fuzzy Sets and Systems 91 (2), 165–175. Kolmogorov, A.N. (1941). Confidence limits for an unknown distribution function. Annuals of Mathematics Statistics 12, 461–463. Korenerup, P., Matula, D.W. (1983). Finite precision rational arithmetic: An arithmetic unit. IEEE Transactions on Computers 32 (4), 378–388. Kulisch, U. (1971). An axiomatic approach to rounded computations. Numerische Mathematik 18, 1–17. Kulisch, U., Miranker, W.L. (1981). Computer Arithmetic in Theory and Practice. Academic Press, New York. Kulisch, U., Miranker, W.L. (1986). The arithmetic of the digital computer: A new approach. SIAM Review 28 (1), 1–40. Lin, T.Y. (2005). A function theoretic view of fuzzy sets: New extension principle. In: Filev, D., Ying, H. (Eds.), Proceedings of NAFIPS 05. Lodwick, W.A. (1989). Constraint propagation, relational arithmetic in AI systems and mathematical programs. Annals of Operations Research 21, 143–148.
Lodwick, W.A. (1999). Constrained interval arithmetic. CCM Report 138. Lodwick, W.A. (2002). Special Issue on theLinkages Between Interval Analysis and Fuzzy Set Theory. Reliable Computing 8. (entire issue). Lodwick, W.A., Jamison, K.D. (2003a). Special Issue on the Interfaces Between Fuzzy Set Theory and Interval Analysis. Fuzzy Sets and Systems 135. (entire issue). Lodwick, W.A., Jamison, K.D. (2003b). Estimating and validating the cumulative distribution of a function of random variable: Toward the development of distribution arithmetic. Reliable Computing 9, 127–141. Lodwick, W.A., Jamison, K.D. (2006). Interval-valued probability in the analysis of problems that contain a mixture of fuzzy, possibilistic and interval uncertainty. In: Demirli, K., Akgunduz, A. (Eds.), 2006 Conference of the North American Fuzzy Information Processing Society, June 3–6, 2006, Montréal, Canada. Makino, K., Berz, M. (2003). Taylor models and other validated functional inclusion methods. International Journal of Pure and Applied Mathematics 4 (4), 379–456. Makino, K., Berz, M. (2005). Verified global optimization with Taylor model based range bounders. Transactions on Computers 11 (4), 1611–1618. Mayer, G. (1995). Epsilon-inflation in verification algorithms. Journal of Computational and Applied Mathematics 60, 147–169. Mayer, G. (1996). Success in epsilon-inflation. In: Alefeld, G., Lang, B. (Eds.), Scientific Computing and Validated Numerics. Proceedings of the International Symposium on Scientific Computing, Computer Arithmetic and Validated Numerics SCAN-95, Wuppertal, Germany, September 26– 29, 1995, Akademie-Verlag, Berlin, MA, pp. 98–104. Moore, R.E. (1959a). Automatic error analysis in digital computation. Technical Report LMSD-48421. Lockheed Missile and Space Division. Sunnyvale, CA. See http://interval.louisiana.edu/Moores_early_papers/ bibliography.html. Moore, R.E., Yang, C.T. (1959b). Interval analysis I. Technical Report Space Div. Report LMSD285875. Lockheed Missiles and Space Co., Sunnyvale, CA. Moore, R.E., Strother, W., Yang, C.T. (1960). Interval integrals. Technical Report Space Div. Report LMSD703073. Lockheed Missiles and Space Co., Sunnyvale, CA. Moore, R.E. (1962). Interval arithmetic and automatic error analysis in digital computing. PhD thesis, Stanford University, Stanford, California, Published as Applied Mathematics and Statistics Laboratories Technical Report No. 25, November 15, 1962. See http://interval.louisiana.edu/Moores_early_papers/bibliography.html.
Moore, R.E. (1965). The automatic analysis and control of error in digital computing based on the use of interval numbers. In: Rall, L.B. (Ed.), Error in Digital Computation, vol. I. John Wiley & Sons, New York, pp. 61–130. Moore, R.E. (1966). Interval Analysis. Prentice-Hall, Englewood Cliffs, NJ. Moore, R.E. (1979). Methods and Applications of Interval Analysis. SIAM, Philadelphia, PA. Moore, R.E. (1984). Risk analysis without Monte Carlo methods. Freiburger Intervall-Berichte 1, 1–48. Moore, R.E. (1992). Computing to arbitrary accuracy. In: Bresinski, C., Kulisch, U. (Eds.), Computational and Applied Mathematics I: Algorithms and Theory. North-Holland, Amsterdam, pp. 327–336. Moore, R.E. (1999). The dawning. Reliable Computing 54, 423–424. Moore, R.E., Lodwick, W.A. (2003). Interval analysis and fuzzy set theory. Fuzzy Sets and Systems 135 (1), 5–9. Nahmias, S. (1978). Fuzzy variable. Fuzzy Sets and Systems 1, 97–110. Nelsen, R.B. (1995). Copulas, characterization, correlation, and counterexamples. Mathematics Magazine 68 (3), 193–198. Negoita, C.V., Ralescu, D.A. (1975). Applications of Fuzzy Sets to Systems Analysis. Birkhäuser, Boston, MA. Neumaier, A. (1990). Interval Methods for Systems of Equations. Cambridge University Press, Cambridge. Neumaier, A. (1993). The wrapping effect, ellipsoid arithmetic, stability and confidence regions. Computing Supplementum 9, 175–190. Neumaier, A. (2004a). Clouds, fuzzy sets and probability intervals. Reliable Computing 10, 249–272. Neumaier, A. (2004b). Complete search in continuous global optimization and constraint satisfaction. In: Iserles, A. (Ed.), Acta Numerica 2004. Cambridge University Press, Cambridge, pp. 271–369. Neumaier, A. (2005). Structure of clouds (submitted for publication). See http://www.mat.univie.ac.at/~neum/papers.html. Nguyen, H.T. (1978). A note on the extension principle for fuzzy sets. Journal of Mathematical Analysis and Applications 64, 369–380. Nickel, K. (1969). Triplex-Algol and its applications. In: Hansen, E.R. (Ed.), Topics in Interval Analysis. Oxford University Press, pp. 10–24. Olsen, G. (2005). The inverse probability method: An interval algorithm for measuring uncertainty. Masters Thesis, University of Colorado at Denver, Department of Mathematics. Ortolf, H.-J. (1969). Eine Verallgemeinerung der Intervallarithmetik. Gesellschaft für Mathematik und Datenverarbeitung, Bonn. Nr. 11, pp. 1–71. Phillips, G.M. (1981). Archimedes the numerical analyst. American Mathematical Monthly 81, 165–169.
Popova, E.D. (1998). See http://www.math.bas.bg/~epopova/directed.html. Pryce, J.D., Corliss, G.F. (2006). Interval arithmetic with containment sets. (Submitted for publication.). Ramik, J. (1986). Extension principle in fuzzy optimization. Fuzzy Sets and Systems 19, 29–35. Ratschek, H., Rokne, J. (1988). New Computer Methods for Global Optimization. Ellis Horwood, Chichester, England. Revol, N., Rouillier, F. (2005). Motivations for an arbitrary precision interval arithmetic and the MPFI library. Reliable Computing 11 (4), 275–290. Sáinz, M.A. (2001). Modal intervals. Reliable Computing 7 (2), 77–111. Schulte, M.J., Swartzlander, E.E. Jr. (2000). A family of variable-precision interval processors. IEEE Transactions on Computers 49 (5), 387–397. Shafer, G. (1987). Belief functions and possibility measures. In: Bezdek, J.C. (Ed.), Analysis of Fuzzy Information: Mathematics and Logic, vol. 1. CRC Press, Boca Raton, FL, pp. 51–84. Smale, S. (1998). Mathematical problems for the next century. Mathematical Intelligencer 20 (2), 7–15. Stolfi, J., Andrade, M.V.A., Comba, J.L.D. (1994). Affine arithmetic: A correlation-sensitive variant of interval arithmetic. See http://www.dcc.unicamp.br/~stolfi/EXPORT/projects/affine-arith. Springer, M.D. (1979). The Algebra of Random Variables. Wiley & Sons, New York. Stetter, H.J. (1978). The defect correction principle and discretization methods. Numerische Mathematik 29, 425–443. Strother, W. (1952). Continuity for multi-valued functions and some applications to topology. Doctoral dissertation, Tulane University. Strother, W. (1955). Fixed points, fixed sets, and m-retracts. Duke Mathematical Journal 22 (4), 551–556. Strother, W. (1958). Continuous multi-valued functions. Boletim da Sociedade de Matematica de São Paulo 10, 87–120. Sunaga, T. (1958). Theory of an interval algebra and its application to numerical analysis. RAAG Memoirs 2, 547–564. Downloaded http://www.cs.utep.edu/interval-comp/early.html. Tonon, F. (2004). On the use of random set theory to bracket the results of Monte Carlo simulations. Reliable Computing 10, 107–137. Tucker, W. (2002). A rigorous ODE solver and Smale’s 14th problem. Foundation of Computational Mathematics 2, 53–117. Tupper, J.A. (1996). Graphing equations with generalized interval arithmetic. PhD thesis, University of Toronto. Walley, P. (1991). Statistical Reasoning with Imprecise Probabilities. Chapman & Hall, London.
Walley, P. (1999). Towards a unified theory of imprecise probability. In: De Cooman, G., Cozman, F.G., Moral, S., Walley, P. (Eds.), Proceedings of the First International Symposium on Imprecise Probabilities and Their Applications, held at the Conference Center “Het Pand” of the Universiteit Gent, Ghent, Belgium, 29 June–2 July 1999, ISIPTA. Walster, G.W. (1998). The extended real interval system. (Personal copy from the author). Wang, Z., Klir, G.J. (1992). Fuzzy Measure Theory. Plenum Press, New York. Warmus, M. (1956). Calculus of approximations. Bulletin de l’Académie Polonaise de Sciences 3 (4), 253–259. Downloaded from http://www.cs. utep.edu/interval-comp/early.html. Warmus, M. (1961). Approximations and inequalities in the calculus of approximations. Classification of approximate numbers. Bulletin de l’Académie Polonaise de Sciences 4 (4), 241–245. Downloaded from http://www.cs.utep.edu/interval-comp/early.html. Weichselberger, K. (1996). Interval probability on finite sample spaces. In: Rieder, H. (Ed.), Robust Statistics, Data Analysis and Computer-Intensive Methods. Springer-Verlag, New York, pp. 391–409. Weichselberger, K. (2000). The theory of interval-probability as a unifying concept for uncertainty. International Journal of Approximate Reasoning 24, 149–170. Williamson, R.C., Downs, T. (1990a). Probabilistic arithmetic I: Numerical methods for calculating convolutions and dependency bounds. International Journal of Approximate Reasoning 4, 89–158. Williams, R.C. (1990b). Interval arithmetic and probability arithmetic. In: Ullrich, C. (Ed.), Computer Arithmetic and Self-Validating Numerical Methods. Academic Press, Boston, MA, pp. 67–80. Yager, R.R. (1986). A characterization of the extension principle. Fuzzy Sets and Systems 18, 205–217. Young, R.C. (1931). The algebra of many-valued quantities. Mathematische Annalen 104, 260–290. Zadeh, L.A. (1965). Fuzzy sets. Information and Control 8, 338–353. Zadeh, L.A. (1968). Probability measures of fuzzy events. Journal of Mathematical Analysis and Applications 23, 421–427. Zadeh, L.A. (1975). The concept of a linguistic variable and its application to approximate reasoning. Information Sciences. Part I: 8, 199–249, Part II: 8, 301–357, Part III: 9, 43–80. Zadeh, L.A. (1978). Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems 1, 3–28. Zadeh, L.A. (2005). Toward a generalized theory of uncertainty (GTU)—An outline. Information Science 172 (1–2), 1–40.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 148
On the Regularization of the Watershed Transform FERNAND MEYER1 AND CORINNE VACHIER2 1 Centre de Morphologie Mathématique, ENSMP, 35, rue Saint Honoré,
F-77300 Fontainebleau, France 2 CMLA, ENS Cachan, CNRS, PRES UniverSud, 61, avenue du Président Wilson,
F-94235 Cachan, France
I. Introduction: History of the Watershed Transform  194
II. Key Contributions  202
   A. Watershed Definitions  202
      1. The Watershed as a Geodesic SKIZ Computed on the Level Sets of the Function  204
      2. An Efficient Algorithm Based on Queues of Pixels  205
      3. The Watershed as a SKIZ Directly Computed on the Numerical Function  205
      4. The Watershed as the Solution of an Energy-Minimization Problem  207
      5. The Watershed as a Result of Numerical Thinnings  208
   B. Use of the Watershed Transform for Image Segmentation  208
   C. Connected Works  210
      1. Watersheds and Graphs  210
      2. Morphological Scale-Space Analysis  211
      3. The Watershed Regularization  214
III. The Contours Regularization  216
   A. Precision and Robustness of the Watershed Transform  216
   B. Ideas for Regularizing the Relief  218
      1. Noisy Contours: Closings  218
      2. Case of Dotted Contours: Dilations  218
   C. Introduction of Viscosity  222
IV. The Viscous Watershed Line  224
   A. Viscous Flooding Principle  225
      1. Oil Flooding  225
      2. Mercury Flooding  228
   B. Viscous Transforms  229
      1. Definitions and Properties  229
   C. Model Comparison  232
   D. Generalizations  234
V. Experiments  237
   A. Thin Contour Line  237
   B. Segmentation of Fuzzy or Blurred Images  237
   C. Segmentation of Generic Images  240
VI. Summary  243
References  245
I. Introduction: History of the Watershed Transform
Since its origins, numerical image processing has developed along three main axes: (1) filtering theory and scale-space analysis, which corresponds to a robust framework for signal characterization; (2) segmentation, for a precise identification of the shapes present in a scene; and (3) symbolic representations, which seek to numerically restore a description as close as possible to our own vision. While the beginning of filtering theory coincided with that of signal processing, segmentation actually appeared with digital images in the 1970s. What seemed to be a marginal problem in the case of one-dimensional (1D) signals became a critical issue when dealing with artificial vision. What does image segmentation mean? Considering that a digital image is nothing but a collection of pixels with color or intensity attributes distributed on a grid, segmentation consists of grouping points into connected sets corresponding to meaningful entities in the image. These entities are also called shapes. A partition D is any extensive and disjunctive mapping defined on a space E, the support of the image, into P(E) (Serra, 2003).
1. D extensive means that {x} ⊂ D(x). As a consequence, the partition satisfies E = ⋃_{x∈E} D(x).
2. D disjunctive means that D(x) = D(y) or D(x) ∩ D(y) = ∅.
D(x) represents the region containing the point x. Adjacent regions are possibly superimposed on their borders. In that case, the second property is modified so that Int(D(x)) ∩ Int(D(y)) = ∅, where Int(D(x)) corresponds to the interior of the region D(x), Int(D(x)) = D(x) \ ∂D(x), and ∂D(x) represents the border of D(x). Depending on the images and the study context, the best partition D satisfies various objectives: the regions D(x) must be significant, i.e., homogeneous for some criterion such as color, texture, or motion. In some cases, control of the size and/or the number of regions is also desirable, for example when a relatively simple description of the scene is required. Moreover, the process should be robust and yield the same result if the image is degraded (by the presence of noise, blurring, or poor lighting conditions). Many segmentation paradigms have been proposed in the literature: statistical methods, fuzzy classifications, Markov fields, neural networks, or PDE-based methods such as Bayesian models, snakes (active contour models), Mumford and Shah's functional, Canny and Deriche's operator, and so on. Also worthy of mention are models based on analogies with physical phenomena, which have inspired some interesting concepts: various energy minimizations [as in the balloon model (Cohen, 1991)], application of Fermat's law of
minimal action (Delechelle and Lemoine, 1999; Cohen, 2002), or adaptation of Coulomb's law (Grigorishin and Yang, 1998). Among pioneering works in the segmentation field, two elegant and promising solutions emerged very early from the morphological school under the impetus of works by Matheron and Serra: the flat zone partition and the catchment basin partition. The flat zone partition is the most immediate partition of the space that can be associated with a function f; it is obtained by grouping the points of the same value in the function f. Let λ denote the value of f at a point x: λ = f(x). D(x) is nothing but the largest connected component containing x made of points of the same value λ in f (Figure 1). In other words, considering the function f as a topographical relief and interpreting the function values as altitudes, the flat zone partition simply segments the space according to the relief plateaus (Figure 2). Denoting by X_λ(f) the level set of f at level λ (see Figure 1):

X_λ(f) = {x ∈ E, f(x) ≥ λ}.   (1)

Flat zones of f at level λ are connected sets belonging to X_λ(f) \ X_μ(f) with μ > λ. Hence, decomposing a function into its level sets yields the flat zones of f. Level set decompositions of numerical functions were originally
FIGURE 1. Level sets decomposition of a function.
FIGURE 2. A piecewise constant function and the associated flat zone partition.
introduced in mathematical morphology for extending binary transforms to the case of gray-scale images. The resulting segmentation of the image is now known under the name of flat zone decomposition (Serra and Salembier, 1993; Salembier and Serra, 1995) or level set decomposition (Monasse and Guichard, 2000). Why are these partitions so interesting? Because they are parameter free and naturally yield a particular kind of scale-space (Monasse and Guichard, 2000). They are associated with meaningful filters that simplify images while preserving contour information: these filters have been studied in detail by Serra and Salembier and are called connected filters (Salembier and Serra, 1995). Connected filters act on images by enlarging the flat zone partition: adjacent flat zones merge, others remain unchanged, and no new flat zone may be created. As a consequence, the partitions associated with families of connected filters are nested, which naturally leads to a hierarchical description of the image content. Two examples of connected filters and of flat zone partitions are shown in Figures 3 and 4.

FIGURE 3. Illustration of the flat zone partition paradigm. The original image is a view of one of the three “golden islands” near Hyères in southern France. (a) Original image (200 × 300 pixels × 8 bits), (b) its 27,988 flat zones, (c) its 700 “almost flat” zones (gray-level differences lower than 38 are neglected), and (d) its 300 largest flat zones (flat zones of area lower than 200 pixels are merged with another).

FIGURE 4. Effects of connected operators on gray-scale images. (a) Original image, (b) merging of the flat zones with similar gray levels (differences lower than 38 are neglected), and (c) elimination of small flat zones (flat zones of area lower than 200 pixels are absorbed by another larger flat zone). When two adjacent flat zones merge, the mean luminance is attributed to the resulting region. Flat zone partitions associated with these connected filters are presented in Figure 3.

In the first example, flat zones of similar
luminance are merged. It is a partition of almost flat regions. In the second example, flat zones of smallest area are absorbed by the largest neighboring flat zone. In both cases, when two adjacent flat zones merge, the mean value of the merging zones is given to the resulting region. Note that morphological filters by reconstruction, introduced by Meyer for binary sets and extended later by Serra to numerical functions (Serra, 1988), are connected. Even if these partitions are very well adapted to images with large uniform zones and sharp color transitions, they may not be as well adapted to natural images characterized by slow gray-level transitions. In that case, contours do not coincide with points where luminance changes but with points where luminance variations are maximal; considering the gradient norm of the function ‖∇f‖ rather than the original function f, regions do not coincide with flat zones but with catchment basins of ‖∇f‖ (Figure 5). Let us now define catchment basins for two-dimensional (2D) functions.
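Before moving on to catchment basins, note that the flat zone partition discussed above is easy to compute directly: it is a connected-component labeling in which neighboring pixels are grouped only when their values are exactly equal. The sketch below (4-connectivity, illustrative only, not the chapter's algorithm) makes this concrete.

```python
# Minimal sketch: flat zone partition of a 2D gray-scale image.
# Two 4-neighbors belong to the same flat zone iff they have equal values.
from collections import deque
import numpy as np

def flat_zones(img):
    """Label the flat zones of a 2D array; returns an integer label image."""
    h, w = img.shape
    labels = np.full((h, w), -1, dtype=int)
    current = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy, sx] != -1:
                continue
            value = img[sy, sx]
            queue = deque([(sy, sx)])
            labels[sy, sx] = current
            while queue:                      # flood fill at constant value
                y, x = queue.popleft()
                for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and labels[ny, nx] == -1 and img[ny, nx] == value):
                        labels[ny, nx] = current
                        queue.append((ny, nx))
            current += 1
    return labels

img = np.array([[1, 1, 2],
                [1, 3, 2],
                [3, 3, 2]])
print(flat_zones(img))   # three flat zones on this toy image
```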
FIGURE 5. Function with low variations and the associated catchment basin partition.
In topography, the watershed line refers to a ridge that divides areas drained by different river systems, and a catchment basin is the geographical area draining into a river or reservoir. The fundamental idea leading to the catchment basin partition is based on an analogy in which functions are interpreted as reliefs. In standard image segmentation applications, contours correspond to high luminance transitions (i.e., points where the gradient norm takes high values). With the gradient image (or any other contour image) regarded as a topographic relief, the function values correspond to altitudes and the contours to crest lines of the relief (i.e., to border points of the catchment basins, also called watershed points). This idea was first set forth, studied, and exploited by Lantuéjoul and Beucher and has led to the second very powerful morphological segmentation paradigm: the watershed transform. The history began in 1978 with Lantuéjoul's work on drainage in porous media (Lantuéjoul, 1978). In 1979, Beucher and Lantuéjoul formally characterized meaningful segmentations by using the concept of watershed in contour detection (Beucher and Lantuéjoul, 1979). In 1980, the watershed-based segmentation was improved by introducing the concept of markers (Meyer and Beucher, 1990), nicely solving the problem of oversegmentation of the watershed. Considering a set of markers disseminated on a space E, the influence zone of a marker x_i is defined as the set of points of E closer to x_i than to any other marker x_j:

IZ(x_i) = {x ∈ E, ∀j ≠ i, d(x, x_i) ≤ d(x, x_j)}.   (2)

The skeleton by influence zone (denoted SKIZ) is the set of points at equal distance from at least two markers. For binary images, the distance d is either the Euclidean distance (if the whole space E has to be segmented) or the geodesic distance (for segmenting grains in binary images). In the Euclidean
case, the SKIZ is also known as the Voronoi partition. In the geodesic case, denoting by X the set of grains to be segmented, the geodesic distance between two points x and y in X is:
1. the length of the shortest path linking x and y and entirely included in X, if such a path exists; and
2. +∞ if x and y do not belong to the same connected component of X.
As an example, the academic problem of separating grains in a binary image cannot be solved by a Euclidean SKIZ, since the geometry of the shapes is not considered (Figure 6). However, a correct segmentation is very easily achieved by a two-step nonparametric algorithm involving the computation of the ultimate erosion of X (which produces markers x_i of the different shapes forming X) and the computation of the SKIZ associated with the markers x_i, computed on X and its eroded sets (Lantuéjoul, 1978) or, equivalently, on the inverse of its distance function (Beucher, 1990; Beucher and Lantuéjoul, 1979), which corresponds to the well-known watershed transform. As an illustration, the method is used to close the contours of the images presented in Figures 7 and 8. Of course, by transforming a set into a function via the
FIGURE 6. Binary grains separation. (a) Original set (a couple of superimposed disks), (b) Euclidean SKIZ, (c) one eroded set, (d) inverse distance function, and (e) watershed transform computed on the reverse distance function. The ultimate eroded (the markers) are the centers of the disks.
FIGURE 7. Beucher and Lantuéjoul method for segmenting binary shapes. (a) Dotted square, (b) distance function, (c) minima of the distance function (corresponding to the ultimate eroded of the negative original set), (d) SKIZ (or watershed line) computed on the distance function, (e) original square and imposed markers, and (f) watershed partition associated to a set of sources. (See Color Insert.)
FIGURE 8. Beucher and Lantuéjoul method for segmenting binary shapes. (a) Original and (b) negative image; (c) distance function and (d) regional maxima of the distance function (in red), and (e) final segmentation (in green) superimposed onto the original drawing. (See Color Insert.)
distance transform, the link with the segmentation of any gray-scale image was found. In addition, to prevent oversegmentation due to the large number of catchment basins embedded in the images, preselected markers can be imposed as regional minima of the function to be segmented. (As an example, the image in Figure 7 was segmented this way.) The most popular description of the watershed transform is certainly the one based on the flooding analogy. Considering the regional minima of the relief as flooding sources, a progressive flooding of the relief is simulated, the flooding level being the same everywhere in the image: lakes progressively
FIGURE 9. The relief flooding scenario: the points where two lakes meet define the watershed line.
appear; they correspond to the catchment basins of the relief. As water coming from two different sources meets, a dam is erected (Figure 9). The process proceeds until the relief is entirely flooded. The watershed line is the set of dams finally erected. As seen previously, the relief generally corresponds to a gradient image so that the contours correspond to crest lines of the relief. To control the watershed segmentation, preselected markers can be imposed as regional minima of the function to be flooded (see Figure 7). Among all the segmentation methods, the watershed transform is one of the most popular judging from the great diversity of applications in which the method has been successfully applied. As with the flat zones approach, the watershed transform is a fully automatic and parameter-free procedure. Therefore, the watershed-based segmentation paradigm results from the combination of the watershed transform and a strategy of image filtering aimed at simplifying the scene such that the final segmentation is correct. Among filters classically associated with the watershed transform, we note the standard morphological openings and closing (Serra, 1982) for their denoising properties, the filters by reconstruction (Serra, 1988; Salembier and Serra, 1995) because they simplify images while preserving the contours of the remaining shapes, and the levelings (Meyer and Maragos, 1999; Vachier, 2001) because of their very nice scale-space properties (Vachier, 1995; Vachier and Meyer, 1995; Vachier and Vincent, 1995; Vachier, 2001). In particular, because they are connected, levelings preserve the contour information; in addition, levelings ensure a strict simplification of the image content: shapes [i.e., catchment basins of f or (−f )] are removed but no new feature may appear on the filtered image. Figure 10 shows how levelings act on numerical images; catchment basins may be partially or completely eliminated. Levelings act on images by merging some adjacent catchment basins so that partitions extracted on more and more leveled images using the watershed transform are nested, which leads to meaningful hierarchical descriptions of the image content (Vachier, 1995; Gomila, 2001).
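The flooding description translates almost literally into an implementation driven by an ordered queue of pixels, flooding from imposed markers. The sketch below is a simplified illustration of this scheme (marker-based flooding on a 2D relief with 4-connectivity), not the chapter's reference algorithm; the function name is an assumption, and pixels where two lakes meet are left with label 0 as watershed (dam) points.

```python
# Illustrative marker-based flooding watershed (priority queue, 4-connectivity).
# Markers carry labels > 0; watershed (dam) pixels keep label 0.
import heapq
import numpy as np

def watershed_flood(relief, markers):
    h, w = relief.shape
    labels = markers.copy()
    heap, counter = [], 0
    # Seed the queue with the unlabeled neighbors of every marker pixel.
    for y in range(h):
        for x in range(w):
            if labels[y, x] > 0:
                for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                    if 0 <= ny < h and 0 <= nx < w and labels[ny, nx] == 0:
                        heapq.heappush(heap, (relief[ny, nx], counter, ny, nx, labels[y, x]))
                        counter += 1
    while heap:
        level, _, y, x, lab = heapq.heappop(heap)
        if labels[y, x] != 0:
            continue                      # already flooded from another source
        neighbor_labels = {labels[ny, nx]
                           for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1))
                           if 0 <= ny < h and 0 <= nx < w and labels[ny, nx] > 0}
        if len(neighbor_labels) > 1:
            continue                      # two lakes meet here: erect a dam (label 0)
        labels[y, x] = lab                # flood the pixel from the single lake
        for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
            if 0 <= ny < h and 0 <= nx < w and labels[ny, nx] == 0:
                heapq.heappush(heap, (relief[ny, nx], counter, ny, nx, lab))
                counter += 1
    return labels

relief = np.array([[0, 1, 4, 1, 0],
                   [1, 2, 4, 2, 1],
                   [1, 2, 4, 2, 1]])
markers = np.zeros_like(relief)
markers[0, 0], markers[0, 4] = 1, 2       # one source per catchment basin
print(watershed_flood(relief, markers))   # a vertical dam forms along the crest
```

Replacing the seed markers by the regional minima of the relief recovers the unconstrained watershed; imposing fewer, preselected markers controls oversegmentation exactly as described above.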
F IGURE 10. Simplification of numerical images by levelings.
The segmentations presented in Figure 11 correspond to partitions obtained via the watershed transform computed on increasingly leveled reliefs. Here, catchment basins of low volume are eliminated; the others are preserved. This criterion combines the area and the contrast information; it is thereby well adapted to modeling human perception (Vachier, 1995; Vachier, 1998). Note that levelings and watersheds may be computed on graphs, so that obtaining the pyramid of partitions does not take more computation time than a single watershed transform (Vachier, 1995). In addition, very efficient implementations of those algorithms based on hierarchical queues of pixels make the strategy one of the least expensive in terms of computation time. Finally, the strategy is versatile and well adapted to images with any number of dimensions. It offers nice perspectives in interactive segmentation strategies (Vachier, 1995). These points are detailed in Section II.
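Before detailing these contributions, the flooding principle itself can be fixed with a deliberately simplified Python sketch; the 4-connectivity, the absence of an explicit divide line, and the plain priority queue standing in for a true hierarchical queue are all simplifying assumptions of this sketch, not features of the algorithms reviewed below:

import heapq
import numpy as np

def flooding_watershed(f, markers):
    # markers: integer label image, 0 = unlabeled; nonzero components are the sources.
    # Pixels are flooded in increasing order of altitude f, each unlabeled pixel
    # taking the label of the already flooded neighbor that reaches it first.
    labels = markers.astype(int).copy()
    rows, cols = f.shape
    heap, counter = [], 0
    for i, j in zip(*np.nonzero(labels)):
        heapq.heappush(heap, (f[i, j], counter, i, j))
        counter += 1
    while heap:
        _, _, i, j = heapq.heappop(heap)
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < rows and 0 <= nj < cols and labels[ni, nj] == 0:
                labels[ni, nj] = labels[i, j]
                heapq.heappush(heap, (f[ni, nj], counter, ni, nj))
                counter += 1
    return labels

Every pixel ends up in some catchment basin; the partition is returned without an explicit watershed line, in the spirit of the partition-producing variant mentioned in Section II.A.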
II. K EY C ONTRIBUTIONS
Since its invention in 1979, many authors have contributed to making the watershed-based segmentation paradigm powerful through significant advances at the theoretical, algorithmic, and experimental levels. With steadily growing processing capacities, increasingly complex segmentation problems may be tackled. The list of publications on these topics is very long, thus only some major contributions are emphasized here. Formal definitions or computation methods are listed next.
A. Watershed Definitions
Since its introduction, various formulations of the watershed transform have appeared, providing elegant descriptions, revealing its nice properties and resolving some computation difficulties.
F IGURE 11. Illustration of the watershed-based segmentation paradigm. (a) Original image and (b) its gradient [computed as the morphological gradient (Beucher, 1990)]; (c)–(h) the N catchment basins of the gradient of highest volume. N is successively equal to 4200 (c), 200 (d), 100 (e), 25 (f), 50 (g), and 5 (h).
1. The Watershed as a Geodesic SKIZ Computed on the Level Sets of the Function
Originally, the watershed transform was defined as a SKIZ computed on the level sets of the function (Beucher and Lantuéjoul, 1979). The image is denoted f, and Xλ(f) is the upper level set of f at level λ:
Xλ(f) = {x ∈ E, f(x) ≥ λ}.   (3)
In the discrete case, the integer index n replaces the continuous index λ. The lower level sets of f are the complements of the upper level sets of f: [Xn(f)]c. An influence zone is associated with any regional minimum of the function f by progressively segmenting the lower level sets of f, starting from those of lowest altitude. Denoting by C0 the set of all regional minima of the function and by Cn−1 the segmented set obtained at level (n − 1), the segmentation of [Xn(f)]c involves the computation of the geodesic SKIZ of Cn−1 into [Xn(f)]c (see Figure 12). The procedure is iterated from bottom to top. This algorithm is better described using the topographical interpretation of the watershed line. A drop of water falling on a divide line may slide on either side of it and reach one of the two adjacent catchment basins, following a line of steepest slope. Conversely, a flooding scenario, in which the minima serve as sources, also follows the lines of steepest slope from bottom to top; the watershed line is the set of meeting points of water in the relief flooding scenario (see Figure 9). The advantage of the flooding-based formulation of the watershed construction is to stress the contribution of each level set of the function. At each
F IGURE 12. Binary SKIZ-based computation of the watershed line. When a connected component Sn of level set n contains at least two disconnected sets of level (n − 1) (Cn−1), then the component of level n is segmented by computing a binary geodesic SKIZ of Cn−1 into Sn. Cn−1 is the result of the segmentation obtained at level (n − 1).
step of the relief flooding scenario, the lakes correspond to the lower level sets of the function, so the watershed points may be extracted level by level. Denoting by Xn(f) the level set of f at level n, the negative sets Sn = Xnc are the level sets of the lakes. As the level of the flooding increases, new lakes appear and existing lakes become larger. At some pass points of the relief, two lakes separated at level (n − 1) meet at level n: some watershed points are extracted; they correspond to the geodesic skeleton of the lakes at level (n − 1) within the lakes of level n. This process is repeated for all successive levels, keeping the lakes separated along the watershed line. The lakes at level (n − 1) play the role of markers and the level set Sn(f) the role of reference set. This algorithm produces thin watershed lines. Its main drawback is the computation cost, since each level set of the function must be processed independently.
2. An Efficient Algorithm Based on Queues of Pixels
In 1991, Soille, Vincent, and Meyer suggested efficient computation methods for the watershed (Vincent, 1990; Vincent and Soille, 1991; Meyer, 1991). Earlier sequential algorithms ran on hardwired machines on which only binary operations were available. With increasing memory sizes, parallel algorithms based on queues of pixels were developed. The most efficient computation method of the watershed transform is based on hierarchical queues of pixels (Meyer, 1991). This offers two major benefits: first, it allows work on a narrow band near the lakes (and not on the entire image); second, it avoids a threshold-by-threshold processing. The simulation of the relief flooding requires level-by-level consideration of the pixels, starting with those located in the neighborhood of a source (or a lake). These pixels are entered in the queue corresponding to their priority. Then, pixels are extracted, flooded, and their neighbors are queued, until the entire image is flooded. The computation is simple and efficient; it can be very easily implemented in hardware (Klein et al., 1995). In addition, the algorithm may be adapted to the computation of meaningful levelings such as area (Vincent, 1992), contrast (Grimaud, 1992; Vachier and Vincent, 1995), or volume closings (Vachier, 1998). Finally, using a similar scheme, an alternative algorithm exists that leads to a partition of the space into regions (without divide lines) (Vincent, 1990).
3. The Watershed as a SKIZ Directly Computed on the Numerical Function
In 1994, the watershed transform was expressed as a SKIZ of a numerical function. Meyer solved the problem in the discrete case by introducing the concept of topographical distance (Meyer, 1994), while Najman and Schmitt
performed the same work in the continuous case, defining the image distance (Najman and Schmitt, 1994). The two distances are equivalent; in this chapter, we refer to the topographical distance in both cases. Under the assumption that f is differentiable, and denoting by ∇f its gradient, the topographical distance between two points x and y is the cost of traveling from y to x. This cost is nothing but the sum of the altitude gradients crossed from x to y. The following formulation is due to Najman and Schmitt (1994):
df(x, y) = inf_{γxy∈Γxy} ∫_{[0,1]} ‖∇f(γxy(s))‖ ds.   (4)
In this definition, γxy is a path of extremities x and y. A path γ from a point x to a point y is any continuous function from [0, 1] to E such that γ(0) = x and γ(1) = y. Γxy stands for the set of paths of extremities x and y. The topographical cost of a path γ is the sum along γ of the bidimensional gradients of f. The topographical distance is then the cost of the path of minimal cost. It is the Euclidean distance weighted with the gradient norm of f. If ‖∇f‖ is constant on γxy, then df(x, y) = ‖∇f‖ · length(γxy) (Meyer, 1994); this configuration is that of piecewise constant functions (Figure 13). Considering a set of points x1, x2, . . . , xI in (E, df), each point xi may be associated with a catchment basin CB(xi): that is, the set of points of E closer to xi than to any other point xj in the sense of the distance df (Meyer, 1994; Najman and Schmitt, 1994):
CB(xi) = {x ∈ E, ∀j ≠ i, f(xi) + df(x, xi) ≤ f(xj) + df(x, xj)}.   (5)
If x and xi belong to a path of constant slope C, then df(x, xi) = C |xi − x| = f(x) − f(xi). So, f(x) = df(x, xi) + f(xi). Generally, the points xi correspond to the regional minima of f and are supposed to be of altitude zero [f(xi) = 0], so the expression is simpler. Otherwise, it is still possible to modify the homotopy of f to impose the points xi as regional minima of f. This step is described in detail in Section II.B. The watershed line is the set of points belonging to several catchment basins:
x ∈ Wsh(f) ⟺ ∃i ≠ j, x ∈ CB(xi) ∩ CB(xj).   (6)
Note that the topographical distance is not actually a distance, because two nonmerged points can be at null distance from each other:
df(x, y) = 0 ⇏ x = y.   (7)
As a consequence, the divide line resulting from the above definition can be thick. One solution consists in supplementing the topographical distance
F IGURE 13. Computation of the topographical distance when the variations of the function f are piecewise constant. Here, df (x, y) = a(x1 − x) + b(x2 − x1 ) + c(x3 − x2 ) + d(y − x3 ).
by the Euclidean distance in places where the function f is constant:
df(x, y) = inf_{γxy∈Γxy} ∫_{[0,1]} (‖∇f(γxy(s))‖ + 1) ds,   (8)
which comes down to considering the median axis of the thick watershed line.
4. The Watershed as the Solution of an Energy-Minimization Problem
Following the preceding formulation, the watershed partition can be interpreted as the optimal partition for a certain minimization problem. The following results are due to Boomgaard and Smeulders (Boomgaard and Smeulders, 1994). If x and xi belong to the same catchment basin, we have seen that
f(x) = df(x, xi) + f(xi).   (9)
So, for a point x located anywhere in the image (Figure 14):
f(x) = inf_{i∈I} [df(x, xi) + f(xi)].   (10)
This leads to an energy-based formulation of the watershed expressed by Hieu et al. in 2000: the watershed partition is the partition of the space minimizing the energy (Hieu et al., 2000):
∑_{i∈I} ∫_{D(xi)} [f(xi) + df(x, xi)] dx.   (11)
This formulation provides a solid framework to compare the watershed with other energy-based segmentation methods (Najman, 1994; Hieu et al., 2000).
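One way to see this (a short argument not spelled out in the text): the integrand can be minimized pointwise. For any partition {D(xi)} of E,
∑_{i∈I} ∫_{D(xi)} [f(xi) + df(x, xi)] dx ≥ ∫_{E} min_{i∈I} [f(xi) + df(x, xi)] dx,
with equality exactly when every point x is assigned to a source xi attaining the minimum, that is, when each D(xi) is contained in the catchment basin CB(xi) of Eq. (5).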
F IGURE 14. The watershed partition minimizes the energy ∑_{i∈I} ∫_{D(xi)} [f(xi) + df(x, xi)] dx.
5. The Watershed as a Result of Numerical Thinnings
Several images can yield the same watershed transform. Among all the images with the same watershed, the smallest ones are null outside the watershed line, so they are composed of very thin structures: the contour lines. Following this concept, a watershed transform can be defined by progressive thinnings of the functions. This idea was exploited by Bertrand for defining the topological watershed (Bertrand, 2005). Homotopic thinnings of numerical functions being defined (Bertrand, 1995), the topological watershed can be expressed as the result of a maximal thinning of a function. The link between standard and topological watersheds was established in Najman et al. (2005).
B. Use of the Watershed Transform for Image Segmentation
How is the watershed transform used in practice for segmenting an image f? First, the edges are enhanced by computing the image gradient magnitude ‖∇f‖ (Figure 15). For gray-tone images, ‖∇f‖ can be approximated by the discrete morphological gradient δn1B(f) − εn2B(f), where δn1B(f) = f ⊕ n1B is the flat dilation of f by a disk n1B of radius n1 and εn2B(f) = f ⊖ n2B is the flat erosion of f by n2B (Figure 15). The gradient precision and localization depend on n1 and n2; by increasing n1 and/or n2, details can be neglected (Beucher, 1990), but conversely, morphological gradients are of a size at least equal to 1 (if n1 = 1 and n2 = 0, for example), which presents a drawback in the case of very thin structures; but this is a sampling defect. Note that solutions exist for computing a 2D gradient function for color images (Angulo, 2003). With enhanced edges, the segmentation process involves computing the watershed transform of the gradient image ‖∇f‖. With no preprocessing, the number of regions extracted equals the number of regional minima of ‖∇f‖. The minima are often extremely numerous, which leads to oversegmentation. For this reason, the watershed is generally computed from a smaller set of markers that have been identified by a preliminary analysis (see Figure 17).
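As a concrete, hedged illustration of this pipeline, the following sketch computes a morphological gradient, imposes the markers by the swamping operation described further below, and then calls a watershed routine; NumPy, SciPy, scikit-image, and the particular function names used here are assumptions of the sketch, not tools prescribed by the chapter:

import numpy as np
from scipy import ndimage as ndi
from skimage.morphology import reconstruction   # scikit-image assumed available
from skimage.segmentation import watershed

def disk(radius):
    # flat disk-shaped footprint
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius

def morphological_gradient(f, n1=1, n2=1):
    # delta_{n1 B}(f) - eps_{n2 B}(f): flat dilation minus flat erosion
    return (ndi.grey_dilation(f, footprint=disk(n1))
            - ndi.grey_erosion(f, footprint=disk(n2)))

def impose_markers(grad, markers):
    # swamping: make the marker components the only regional minima of grad,
    # via a reconstruction by erosion of g (= grad on the markers, a large
    # value elsewhere, standing in for +infinity) above grad
    g = np.where(markers > 0, grad, grad.max() + 1.0)
    return reconstruction(g, grad, method='erosion')

def marker_watershed(f, markers, n1=1, n2=1):
    grad = morphological_gradient(f.astype(float), n1, n2)
    return watershed(impose_markers(grad, markers), markers=markers)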
F IGURE 15. (a) A gray-scale image, (b) its gradient norm, and (c) the representation of the gradient image as a topographic surface.
F IGURE 16. (a) Profile of the original function f and of the marker function g. (b) The reconstructed function having as minima those of g is represented in bold.
F IGURE 17. Example of watershed partition. (a) Original image, (b) gradient image, and (c) watershed partition (the partition presented here comes from a pyramid of partitions; see construction description in Section II.C).
When the sources for flooding are not all minima of the topographic surface, two solutions are possible. First, the markers can be used as sources; in this case, catchment basins without sources are flooded from already flooded neighboring regions. Such a flooding algorithm, using hierarchical queues, was described in Beucher and Meyer (1992). The second solution consists of modifying the topographic surface as slightly as possible so that the markers become the only regional minima. This operation is called swamping. If m1, m2, . . . , mk are the binary markers, we construct a marker function g defined as follows: g = +∞ outside the markers and g = f inside the markers. Then, the topographic surface f is modified by constructing the highest flooding of f below the function g; in other words, we perform a reconstruction closing of f using g as marker. The process is illustrated in Figure 16 (Meyer and Beucher, 1990). As specified in Section II.C, several operators have been designed for selecting relevant markers: the r, h extrema (Schmitt and Prêteux, 1986), the dynamics (Grimaud, 1992), the waterfall (Beucher, 1994), and the generalization of these concepts in terms of extinction functions (Vachier and Meyer, 1995; Vachier and Vincent, 1995).
C. Connected Works
In addition to the list of the above-mentioned contributions, other connected works are of great interest. The following list is not exhaustive; it presents some key concepts that are currently the object of constant research.
1. Watersheds and Graphs
The first area of interest concerns the link between the watershed transform and graph-based representations. Since the watershed transform is a partition of the space, a region adjacency graph can be defined in which nodes correspond to the regions, and edges link the nodes corresponding to adjacent regions (Figure 18). In the case of the SKIZ, this representation
F IGURE 18. Neighborhood graph of regions. Each region is a node of the graph; two nodes are connected by an edge if they are neighboring regions in the image.
F IGURE 19. Graph-based segmentation: Voronoi tessellation associated with seeds.
is also called Delaunay's triangulation. Weights may be assigned to the edges expressing some type of dissimilarity between adjacent regions; as an example, this weight can be the minimal altitude necessary to climb to pass from one region to the other. Then, the graph-based segmentation is a two-stage process. First, the graph is produced from a fine partition of the space; second, the subsequent coarser partitions are computed using only the neighborhood graph. The computation of coarser partitions consists of segmenting the graph. For example, some seeds being defined, all nodes that are closer, in the sense of a lexicographic distance (Meyer, 2005), to this seed than to any other are merged. These seeds are illustrated with a dark disk in Figure 19. The image partition into regions associated with the seeds is nothing but a skeleton by influence zones or a Voronoi tessellation for this particular distance computed on the graph. Considering nested sets of seeds yields the construction of a pyramid of watershed partitions. This property is shown in Figure 20, where nodes have been ranked according to the volume of the region they point out (a precise description of this step can be found in Vachier (1995, 1998)). The computation of the best partition into N regions according to a given criterion is immediate. Similarly, a detail can be very simply added to the image by reactivating the corresponding node in the graph. The graph-based representation presents major advantages: not only one but a whole family of partitions of the image is memorized in a unique structure, which eases the problem of finding one optimal partition (according to a given criterion). Moreover, this yields a powerful multiscale description of the image content. The primary applications of watershed-based graphs are interactive segmentation (Vachier, 1995; Meyer et al., 1997; Salembier, 1994), promising compression algorithms, image indexing (Arbelaez, 2005), shape recognition (Bloch, 2002; Bloch, 2000; Perchant and Bloch, 2000), and artificial vision (Gomila, 2001).
2. Morphological Scale-Space Analysis
As seen previously, considering nested sets of nodes results in the construction of a pyramid of watershed partitions. Following this idea, the underlying
F IGURE 20. (a) Original image. (b)–(e) Pyramid of watershed partitions constructed on a volume criterion. From left to right, the partitions are more and more coarse. The number of catchment basins decreases; it is 35,513 for the finest partition (b) then 2000 (c), 200 (d), and finally 20 (e). (f) and (g) Some details of two partitions of the pyramid illustrating the nested partitions property.
question is: in which order must the nodes be eliminated? As an example, how has the pyramid presented in Figure 20 been constructed? In other words, considering that sources localize the shapes in the image, how is the importance of a given shape measured? This question has received increasing interest since the pioneering work by Marr on scale-space (Marr, 1982). Mathematical morphology proposed an original approach by using the regional extrema as markers of shapes (or nodes in the graph). This idea was first exploited by Schmitt and Prêteux (Schmitt and Prêteux, 1986) and led to the invention of the h and r, h extrema. In 1992, Grimaud corrected
the contrast measurement originally proposed by Schmitt and Prêteux and introduced the well-known dynamics. In 1995, this concept was extended by Vachier to any morphological scale-space measurements (Vachier, 1995; Vachier and Meyer, 1995; Vachier and Vincent, 1995; Vachier, 1998). In standard scale-space analysis, the feature extraction procedure relies on two steps: (1) a hierarchical simplification step where signal components are progressively eliminated and (2) a measurement step consisting of calculating the progressive loss of information. The most classical examples are Fourier spectral analysis, Matheron's granulometries (Matheron, 1975), or linear and nonlinear diffusions (Perona and Malik, 1990). Classically, size or frequency distributions are extracted. This measures how many "objects" are eliminated at each step of simplification. In object-oriented approaches, it is preferable to measure at which simplification step each object is eliminated. This approach requires that each object can be traced through the simplification scales. The question is then whether operators exist that allow this type of analysis. Most standard scale-space operators are convolutions by Gaussian functions (Marr, 1982). They satisfy important scale-space properties (Witkin, 1983), but they also present some important drawbacks: edges are blurred and new objects may appear at coarse scales (Perona and Malik, 1990). As a consequence, tracing the objects through the scales becomes problematic. This remark is valid for all linear filters. But in the nonlinear case, solutions exist. The images presented in Figure 21 were obtained by applying area filters (Vincent, 1992) of increasing activity; the image regions are progressively removed, starting from the smallest. This representation leads to a type of scale-space where the scale is the size. Furthermore, by using the extrema to point out the image structures, as suggested by Marr (Marr, 1982), it is possible to track the image structures through the scales. Indeed, some morphological filters ensure a monotonic decrease of the number of extrema: the levelings (Vachier, 1995; Meyer and Maragos, 1999; Vachier, 2001). The scale at which a structure (i.e., the associated extremum) is eliminated is called its extinction scale (Vachier, 2001). Different attributes may easily be computed, such as the dynamics (Grimaud, 1992; Vachier and Vincent, 1995), the area (Vachier and Meyer, 1995), or the volume (Vachier, 1998) of the shape. Which morphological filters allow such an analysis? They must ensure a monotonic decrease of the image features—a feature, being represented by an extremum, is either entirely preserved (its contour stays unchanged) or entirely removed. Those filters are the well-known levelings (Meyer and Maragos, 1999) (Figure 22). A morphological connected operator ψ preserves the edge locations (Serra and Salembier, 1993). If x and y are neighbors:
f(x) = f(y) ⇒ ψ(f)(x) = ψ(f)(y).   (12)
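As a rough stand-in for the area filterings of Figure 21 (not the full leveling operator discussed in the text), an area opening followed by an area closing removes small bright and small dark structures while, being a composition of connected operators, creating no new contours; scikit-image (version 0.16 or later) is an assumption of this sketch:

from skimage.morphology import area_opening, area_closing  # scikit-image >= 0.16 assumed

def area_simplify(f, area_threshold):
    # regional maxima of area smaller than area_threshold are removed by the
    # area opening, regional minima by the area closing
    g = area_opening(f, area_threshold=area_threshold)
    return area_closing(g, area_threshold=area_threshold)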
F IGURE 21. The features of levelings. (a) Original image and simplified images obtained by area filterings; shapes of area lower than 50 (b) and then 1000 (c) are eliminated. (d)–(f) Extrema of the preceding images (regional maxima are white; regional minima are red). Note the monotonic decrease of the number of extrema. (See Color Insert.)
A morphological leveling preserves the edge locations and the image luminance ordering (Figure 23) (Meyer and Maragos, 1999; Vachier, 2001):
f(x) < f(y) ⇒ ψ(f)(x) ≤ ψ(f)(y).   (13)
More precisely, levelings attenuate the luminance transitions (Meyer and Maragos, 1999):
ψ(f)(x) < ψ(f)(y) ⇒ f(x) ≤ ψ(f)(x) < ψ(f)(y) ≤ f(y).   (14)
3. The Watershed Regularization
Watershed segmentation does present a drawback: the absence of parameters is both a strength and a weakness. In some cases, when the images to
F IGURE 22. Valuation of the regional extrema with their extinction scale, the critical scale for which they disappear. Here, levelings correspond to area filters so the area extinction function is computed.
F IGURE 23. The effect of levelings on gray-tone images. Levelings preserve the function transitions and their “directions.”
be segmented are corrupted by blurring, noise, or poor lighting conditions, contours are poorly defined, and the segmentation must result from a compromise between a complete adherence to the data (and possibly to the noise) and a certain amount of modeling; poorly defined parts of the contour must sometimes be interpolated by fitting a model to the better defined parts. Energy-based methods (e.g., snakes, the balloon model) follow this line: smoothness terms are incorporated in the model (Kass et al., 1988; Cohen, 1991; Sethian, 1996; Caselles et al., 1997; Xu and Prince, 1998). When dealing with watershed partitions, different strategies may be imagined. First, following the flooding analogy used for computing the watershed, one may imagine introducing some degree of stiffness and a higher smoothness by flooding the topographic relief with a viscous fluid. This option is that of Hieu, Worring, and den Boomgaard (Hieu et al., 2000), who suggest adding a smoothness term in the watershed energy, or that of Salembier (Salembier, 1994), Marcotegui (Marcotegui, 1996), or Serra (Serra, 2002), who suggest regularization of the lakes along the flooding scenario, which is equivalent to
modifying the topographical distance. As an example, the solution proposed by Boomgaard et al. is drawn from the energy formulation of the watershed. As detailed in Section II, the watershed partition minimizes the energy
E = ∑_{i∈I} ∫_{D(xi)} [f(xi) + df(x, xi)] dx.   (15)
The minimization of E ensures that the partition adheres to the data. Boomgaard et al. suggest adding to this external energy an internal energy that forces the contours to be regular; as usual in energy-based methods, the length of the contour is added, and the energy becomes (Hieu et al., 2000):
E = ∑_{i∈I} ∫_{D(xi)} [f(xi) + df(x, xi)] dx + β ∫_{∂D} ds.   (16)
The formulation is simple, but the calculation of the minimum of E is more delicate. A solution is proposed in Hieu et al. (2000) but, compared with the original watershed, it is much more time-consuming. Alternatively, the relief itself may be modified in such a way that flooding this new relief with an ordinary (nonviscous) fluid produces exactly the same progression of the flooding as a viscous flooding of the original relief, and hence the same placement of the watershed lines. The advantage is that an ordinary watershed transform can be used on this new topographic surface, and the standard computation method is still valid. Furthermore, if the process must be repeated, as in multiscale segmentation procedures, the topographic surface need be smoothed out only once. This second promising alternative has been developed by Meyer and Vachier (2002) and is presented in Section III.
III. T HE C ONTOURS R EGULARIZATION
A. Precision and Robustness of the Watershed Transform
By construction, the localization of the contours extracted by the watershed transform is entirely determined by the topographic surface. The lakes faithfully follow the borders of the land, and their contours can be rather irregular and chaotic. This is the case in the presence of noise or of poorly defined gradient images, as for example when the original image is blurred. Let us consider the flower image presented in Figure 24 and attempt to segment the heart of the flower. Two flooding sources have been manually placed: one inside and one outside the heart of the flower. As usual, transitions of luminance correspond to high values of the gradient norm; thus, the relief
F IGURE 24. Original image (image to be segmented).
F IGURE 25. (a) Gradient image (relief to be flooded). (b) Flooding sources (one is placed in the center of the flower; the other is the edge of the field), and (c) watershed line (superimposed on the original image).
to be flooded is the gradient image. Here the gradient corresponds to the morphological gradient (dilation minus erosion by disks of size one). Without preprocessing, the watershed line is poorly localized (Figure 25). This was foreseeable; the precise localization of the heart of the flower is very delicate since the image is fuzzy. During the flooding procedure, water leaks between contour fragments, and some lakes may meet at the wrong places. This phenomenon is frequent in the case of noisy data (especially when gradient images are considered). When contours are blurred or badly defined, the segmentation must result from a compromise between a complete adherence to the data and a certain amount of modeling. As illustrated by this example, in the watershed-based scenario irregularly shaped lakes, when they meet, naturally create irregular watershed lines. Regularizing the watershed can only proceed from some smoothing of the waterfront. Either the waterfront itself is smoothed out, as if the fluid had some viscosity, or alternatively, the topographic surface itself must be smoothed out and some fjords must be filled. In both cases, the shape of the lakes during flooding will be smoothed out.
B. Ideas for Regularizing the Relief
Among all the strategies adapted to the regularization of the watershed, we concentrate on those based on a prior smoothing of the relief. As noted in Section II.C, the advantage of this option is that the ordinary watershed can be used on this new relief, and the benefits of fast watershed algorithms are preserved. Two examples of smoothing procedures are presented in the following text. The first is based on the use of morphological closings and the second on the use of morphological dilations.
1. Noisy Contours: Closings
The simplest modification of the relief producing smoother divide lines is the morphological closing. Closing the relief f essentially consists of opening the catchment basins of the relief. Indeed, openings and closings are dual operators, and catchment basins are made of level sets of (−f), that is, the sets Sh(f) defined by Sh = [Xh(f)]c. We recall that the morphological opening of a function f consists in opening each level set of f. If A is a set, the opening of A with a structuring element B, denoted γB(A), is defined by
γB(A) = ∪_{x∈E} {Bx | Bx ⊂ A}   (17)
and the closing ϕB(A) satisfies
[ϕB(A)]c = γB(Ac).   (18)
The connected components of Sh correspond to the surface of the lakes appearing during the flooding of the relief leading to the computation of the watershed transform. So, the closing of f is an opening of the lakes:
[ϕB(Xh(f))]c = γB([Xh(f)]c) = γB(Sh).   (19)
By closing, a smoother image is produced (Figures 26 and 27). However, the contour is still poorly localized if one constructs the watershed associated with the sources.
2. Case of Dotted Contours: Dilations
Morphological closing results from the combination of a dilation and an erosion. Dilation by disks acts on sets by enlarging them; the resulting set is larger and smoothed (Figure 29). Furthermore, by dilation, disconnected contour arcs can be reconnected, as illustrated in Figure 30. In order to preserve the precision of the contour position, the standard dilation can be replaced by a distance function, as was the case when segmenting
F IGURE 26. (a) Closed gradient (the relief to be flooded is closed by a disk of radius r0 = 30), (b) flooding sources superimposed onto the gradient, and (c) watershed line computed on the closed relief (superimposed on the original relief).
F IGURE 27. Effect of a closing on a thin contour line. Black, the original contour; gray, the closed contour. Each point outside the closed contour belongs to a disk of radius R.
F IGURE 28. Watershed partition obtained after closing the relief. (a) Original contour and markers (in gray), (b) closed image (the structuring element is a Euclidean disk), (c) watersheds computed on the original and (d) on the closed image. The markers used for the segmentation are represented in cyan. (See Color Insert.)
grains (remember the dog and the square examples in Figures 8 and 7):
∑_{n=0}^{N} δn(f),   (20)
F IGURE 29. Regularization of the watershed line by dilation of the relief. Top, original contour and dilated contours (the structuring element is a disk of increasing size). Bottom, associated watershed lines.
F IGURE 30. Example of the dotted line. (a) Original image f and images obtained by dilating f with structuring elements of increasing size: 6 (b), 10 (c), and then 20 (d). (e)–(h) denote the associated watershed partitions. The markers used for the segmentation are those of Figure 28.
where δn is the dilation by a disk of size n.
Anywhere the contour line exists, the segmentation remains unchanged, but the segmentation is better where contour information is missing (Figures 31 and 32). This transform is also valid for functions.
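A minimal sketch of this sum of dilations, Eq. (20); the helper names and the use of SciPy are assumptions of the sketch:

import numpy as np
from scipy import ndimage as ndi

def disk(radius):
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius

def sum_of_dilations(f, N):
    # Eq. (20): sum of flat dilations of f by disks of size 0..N. On a binary
    # contour image this builds a ramp around the contour that reconnects
    # dotted arcs while keeping the crest on the original line.
    f = f.astype(float)
    out = f.copy()                                # n = 0 term
    for n in range(1, N + 1):
        out += ndi.grey_dilation(f, footprint=disk(n))
    return out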
F IGURE 31. Example of the dotted line. (a) Original image f and images obtained by adding to f dilations of increasing size. The maximal size of dilation is 3 (b), 6 (c), and then 10 (d). Images (e)–(h) are the associated watershed partitions.
F IGURE 32. Illustration of the effect of the distance by observation of the modification of the function values.
Examples presented in this section illustrate a well-known duality: without prefiltering, the segmentation is precise but noise sensitive; with prefiltering, the segmentation is robust but less precise. One solution is to modulate the smoothing of the topography to obtain finer results than with a plain closing, so that the contour "sticks" to the data where the data are sure and sticks to a model where the data are poorly defined.
C. Introduction of Viscosity
Our goal is to regularize the watershed line to gain robustness, while preserving great precision of the localization where contours are well defined. We present here two types of scenarios developed by Meyer and Vachier (Meyer and Vachier, 2002; Vachier and Meyer, 1995) (Figures 33–35). The first scenario is well adapted to all gradient images for which the bottoms of the valleys are of gray level 0; as a consequence, the luminance of a point in a gradient image equals its contrast. This is (almost) true even if the image is degraded by blurring. In such a situation, high values of the gradient are the sign of a well-defined contour in the original image. The watershed line must follow the relief more faithfully in these areas (and this independently of the depth of the adjacent valleys). Inversely, low values of the gradient indicate the presence of blurring due to motion or poor focus; for such parts of the contour, a higher regularization must occur. This may be accomplished by the use of an oil-type fluid. Such a fluid enters increasingly deeper into a narrow isthmus when its temperature increases. Suppose that the temperature of the fluid increases with the altitude of the relief; this means that the lakes of oil will stick better to the relief as their altitude increases, and the contours at higher altitude gain in precision. In other situations, the depth of the valleys is the fundamental parameter of the gradient for modulating the regularization, and a second scenario is needed. In such a situation, a smoothing effect that decreases with the depth of the valleys is desirable (Figure 36). A fluid of mercury type is acceptable; such a fluid enters deeper into a narrow isthmus or fjord when submitted to an increasing pressure. The radius of curvature of its meniscus decreases as the pressure increases. The pressure at a given point of a lake is equal to the height of the column of liquid above this point. This means that, as the level of the lake increases, the pressure at a given altitude increases and the fluid is able to enter deeper within the fjords, whereas at the surface of the lake, the smoothing is maximal. The first model is called oil flooding and the second, mercury flooding. Their implementation is presented in Section IV.
F IGURE 33. Effect of noise on the watershed. (a) Original image f, (b) image after a closing of size 10 [ϕ10(f)], and (c) result of the combination of several dilations, ∑_{n=0}^{N} δn(f). At the bottom, images (d)–(f) show the watersheds associated with the original and filtered images.
F IGURE 34. Ultrasound image of the left ventricle of the heart. Result of a closing by a Euclidean disk of size 20. Sum of dilations by disks of size 1 to 20.
F IGURE 35. Watershed associated with images presented in Figure 34.
F IGURE 36. The regularization (i.e., the closing activity) must be locally adapted to the gradient values. At points of low gradient, an important regularization is needed (a closing by a large disk can be applied), whereas a weak regularization is sufficient at points of high gradient values (a closing by a small disk is applied). One flooding source is placed in the center of the ventricle; the other one corresponds to the image border.
IV. T HE V ISCOUS WATERSHED L INE
The viscous watershed line is the set of points separating the viscous lakes. If f denotes the original relief, let g be the modified relief such that the viscous watershed line of the original relief coincides with the standard watershed line of the modified relief:
Wsh(g) = ViscWsh(f).   (21)
The watershed transform of a function g can be expressed as the SKIZs computed on the level sets of g. The catchment basins of g are subsets of the level sets of (−g): [Xh(g)]c. They must correspond to the viscous catchment basins of f. Thus, the construction of g is fully defined as soon as we have defined how each level set of f is transformed during the viscous flooding simulation:
∀h, [Xh(g)]c = Xh(ViscousLakes(f)) ⟺ ∀h, Xh(g) = [Xh(ViscousLakes(f))]c.   (22)
We now detail how the relief g is built.
A. Viscous Flooding Principle
The concept of viscous flooding was first proposed by Meyer (Meyer, 1993). It is drawn from a physical measurement in which the granulometry of a rock is measured by injecting mercury into it. The amount of mercury increases as the pressure allows it to reach narrower holes in the rock. This process inspired Matheron to introduce families of increasing openings (Matheron, 1967) as the basis of granulometric measurements. A smoothed version of a set may be obtained by taking the union of all disks of a given radius included in this set; the smoothing increases with the radius of the disks, as more and more details cannot be covered by such disks:
γr(A) = ∪_{x∈E} {Bxr | Bxr ⊂ A},   (23)
where γr is the opening by the structuring element Br. If Br is the Euclidean disk of radius r, the opened set may be interpreted as the space filled by a viscous fluid with a given viscosity (Matheron, 1967). Hence openings by disks of decreasing radius represent the space filled by a viscous fluid of decreasing viscosity (Figure 37). When the fluid becomes less viscous, the space filled by the fluid increases; it can be defined as the set γr(S) with r ≤ r0. We now explain how the viscosity of a fluid is linked to the image data.
1. Oil Flooding
The viscous flooding scenario with an oil-type fluid is due to Meyer and Vachier (Vachier and Meyer, 2005) (Figures 38 and 39). The temperature is indexed on the gray levels; at level 0, the temperature T0 is cold and the oil is extremely viscous, and thus the lakes are roughly smoothed. When the fluid reaches higher levels, the temperature increases and the fluid becomes less viscous; details of the relief appear. As in the nonviscous case, the watershed points correspond to points of a SKIZ, but the sets to be considered are not the same; they correspond to the level sets of the viscous fluid lakes (Figures 39 and 40). As illustrated, the watershed line derived from the viscous model differs from the standard watershed line at points of contour of low gray level.
F IGURE 37. Lakes formed by fluids of decreasing viscosity are represented via openings by disks of decreasing radius.
F IGURE 38. Original relief.
F IGURE 39. Formation of the catchment basins during the computation of the watershed transform: in white, the lakes; in red, the watershed points. (See Color Insert.)
F IGURE 40. Formation of the catchment basins during the computation of the viscous watershed transform: in white, the viscous lakes; in red, the viscous watershed points. (See Color Insert.)
At the flooding level h, the temperature of the oil is uniform and equal to Th ; the viscosity radius associated with the fluid at this temperature is r(h).
F IGURE 41. Oil flooding of a single cylinder. At the lowest levels, the temperature is low so the fluid is of high viscosity; at the surface, the temperature increases and the fluid is less viscous.
The viscous lakes are defined by the opening of size r(h); at level h they are given by
γr(h)(Xhc)   (24)
(Figure 41). As a consequence, the modified relief [denoted T(f)] having the same level sets as the viscous lakes is defined by its level sets as follows (Vachier and Meyer, 2005):
∀h ≥ 0,   Xh(T(f)) = [γr(h)([Xh(f)]c)]c = ϕr(h)(Xh(f)).   (25)
When h increases, the level set Xh(f) decreases, but r(h) also decreases and thus the operator ϕr(h) decreases. Hence, the series Xh(T(f)), produced by applying a decreasing operator to a decreasing series of sets, decreases as h increases; the sets are thus really the level sets of a function T(f). An explicit formula for this transformation is as follows:
T(f) = ⋁_{h≥0} ϕr(h)(h · χh(f)) = ⋁_{h≥0} h · χh(ϕr(h)(f))   (26)
with χh(f)(p) = 1 if p ∈ Xh(f), and 0 elsewhere. In this formulation, h · χh(f) simply corresponds to the level set Xh(f) represented at the altitude h. This formulation may be directly implemented. For each level h, the level set of the input function f is extracted and then closed using a disk of radius r(h). The output function results from the superposition of the closed level sets—the sets being repositioned at their original altitude.
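The direct implementation described above can be sketched as follows; the linear viscosity law r(h) and the use of SciPy are illustrative assumptions (the chapter does not prescribe a particular law):

import numpy as np
from scipy import ndimage as ndi

def disk(radius):
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius

def oil_viscous_closing(f, r_max):
    # Eq. (26): every upper level set X_h(f) is closed with a disk whose radius
    # r(h) decreases with h, and the closed sets are stacked back at their
    # original altitude by a pointwise maximum.
    f = np.asarray(f, dtype=int)
    hmax = max(int(f.max()), 1)
    out = np.zeros_like(f)
    for h in range(1, hmax + 1):
        r = int(round(r_max * (1.0 - h / float(hmax))))   # illustrative law
        level_set = f >= h                                # X_h(f)
        closed = ndi.binary_closing(level_set, structure=disk(r)) if r > 0 else level_set
        out = np.maximum(out, h * closed)                 # h . chi_h(phi_{r(h)}(f))
    return out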
2. Mercury Flooding
Flooding with a mercury-type fluid was introduced in its first version in 2000 by Vachier, Meyer, and Lamara (Vachier et al., 2000), but the final formulation was introduced in 2005 (Vachier and Meyer, 2005). During flooding, a mercury-type fluid enters deeper and deeper into a narrow isthmus or fjord as it is submitted to increasing pressure; the radius of curvature of its meniscus decreases as the pressure increases. Let us consider the mercury flooding of a relief f at level h and detail how the geometries of the lakes are defined. Consider a topographic surface f and its threshold at level t, Xt(f). When the flooding with mercury reaches the level t, its maximal extension is limited by the closing ϕr(0) of Xt(f). The radius r(0) is the maximal radius of curvature of mercury at atmospheric pressure; let us call this pressure P0. When the level of flooding increases and reaches the height h, there will be a pressure proportional to the height h − t of fluid above the level t. Let us call this pressure P(h − t). The maximal radius of curvature of mercury at pressure P(h − t) will be r(h − t) < r(0). Hence, the fluid will have entered more deeply into the fjords of Xt(f), limited by the set ϕr(h−t)(Xt(f)). Denote by Xh[T̃(f)] the threshold of the smoothed relief at altitude h. Xh[T̃(f)] results from the contribution of all levels below h, as illustrated in Figure 42 in the case of a single cylinder. Imagine that we consider an aerial view of the flooding at level h. The contribution of level k is ϕr(h−k)(Xk(f)). One verifies that the contribution of level h itself is ϕr(0)(Xh(f)). Considering the aerial view, the flooding will be the union of all floodings at levels up to h. It is limited by a set that is the intersection
F IGURE 42. (a) Cylinder to be flooded. (b) Column of viscous fluid. The fluid is more compressed in the bottom of the cylinder; it behaves as if it were less viscous.
of all limiting sets for all levels up to h:
Xh(T̃(f)) = ⋂_{0≤k≤h} Xh(ϕr(h−k)(f + h − k)) = Xh(⋀_{0≤k≤h} ϕr(h−k)(f + h − k)).   (27)
Defining t = h − k, then
Xh(T̃(f)) = Xh(⋀_{0≤t≤h} ϕr(t)(f + t)).   (28)
In this formula the height h appears not only in the threshold level but also in the range of the parameter t, which varies from 0 to h. What happens when t assumes a value l > h? The ground level of the deepest lakes of the function f + l is greater than or equal to l. Hence, the threshold Xh(f + l) represents the entire domain of definition E of the function f, and the closing ϕr(h) of E is again E. Thus, all levels of t above h in ⋀_{0≤t} ϕr(t)(f + t) have no influence on its threshold at level h. Therefore, Xh[T̃(f)] = Xh[⋀_{0≤t≤h} ϕr(t)(f + t)] = Xh[⋀_{0≤t} ϕr(t)(f + t)], and since the height h is no longer present in the parentheses, the functions that are thresholded may be identified: T̃(f) = ⋀_{0≤t} ϕr(t)(f + t) (Figure 43). Finally,
T̃(f) = ⋀_{t≥0} ϕr(t)(f + t).   (29)
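A corresponding sketch of the mercury transform of Eq. (29), under the same illustrative assumptions (linear pressure law, SciPy flat closings):

import numpy as np
from scipy import ndimage as ndi

def disk(radius):
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius

def mercury_viscous_transform(f, r_max):
    # Eq. (29): pointwise infimum, over the pressure t, of flat closings of
    # (f + t) by a disk of radius r(t) decreasing with t. With r(t) = 0 from
    # t = max(f) on, later terms cannot lower the infimum, so the loop stops there.
    f = np.asarray(f, dtype=int)
    tmax = max(int(f.max()), 1)
    out = None
    for t in range(0, tmax + 1):
        r = int(round(r_max * (1.0 - t / float(tmax))))   # illustrative law
        fp = disk(r) if r > 0 else np.ones((1, 1), dtype=bool)
        closed = ndi.grey_closing(f + t, footprint=fp)
        out = closed if out is None else np.minimum(out, closed)
    return out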
B. Viscous Transforms
1. Definitions and Properties
The definition of the viscous watershed yields two new morphological transformations, denoted T and T̃:
T(f) = ⋁_{h≥0} ϕr(h)(h · χh(f)) = ⋁_{h≥0} h · χh(ϕr(h)(f))   (30)
with χh(f)(p) = 1 if p ∈ Xh(f), and 0 elsewhere, and
T̃(f) = ⋀_{t≥0} ϕr(t)(f + t).   (31)
F IGURE 43. Profile of the relief and mercury flooding. The contribution of the cylinder fk of basis Sk at level h ≥ k is γr(h−k)(Sk). Its contribution at level h < k is ∅.
T(f) inherits all the properties of the closing ϕr. It is idempotent (T ◦ T = T), increasing [f1 ≤ f2 ⇒ T(f1) ≤ T(f2)] and extensive [f ≤ T(f)]. Hence, it is a morphological closing; it is called a viscous closing (Vachier and Meyer, 2005). Note that the standard closing corresponds to a viscous closing with a constant viscosity [r(h) = r0 ∀h ≥ 0]. Finally, the viscous closing T is finer than the standard morphological closing:
f ≤ T(f) ≤ ϕr0(f).   (32)
Another remark concerns the hierarchical action of the viscous closing: low-level sets are severely closed, whereas the highest-level sets are nearly preserved. The computation of the viscous closing involves the computation of a number of binary closings (one per level h), representing a nonnegligible cost. However, in segmentation applications, the viscous closings are generally computed on gradient images presenting a reduced number of gray levels. Furthermore, in case of interactive segmentation—for example, where the number, position, and shape of the markers is adjusted by hand— numerous computations of the watershed must be performed, each of them flooding the same topographic surface with a different set of sources. In
this case, the one-time modification of the topographic surface is advantageous compared to a method attempting to model the flooding itself. What about T̃? T̃ is increasing and extensive like the morphological closing, but it is not idempotent because of the translation (f + t); thus, it does not correspond to a morphological closing. It is called a viscous transformation. T̃ is finer than the standard morphological closing:
f ≤ T̃(f) ≤ ϕr(0)(f).   (33)
Note that the image is first roughly closed [ϕr(0)(f)], then details are reinjected via softer closings [ϕr(t)(f + t) with r(t) ≤ r(0)]. As the pressure t increases (t → ∞), the fluid at the bottom is more tightly compressed [it behaves like a nonviscous fluid (r(t) → 0)], and the data are less and less filtered [ϕr(t)(f + t) → f + t]. It is well known that families of closings of increasing size have granulometric properties (Matheron, 1967). The viscous transforms propose some meaningful combinations of the granulometric information. As an illustration, we present an industrial application: the study of macroscopic foam image sequences (Figure 44). The goal of the image analysis is the segmentation of the foam bubbles. Rami, Vachier, and Schmitt have suggested a top-down segmentation procedure based on the segmentation of the images resulting from the granulometric analysis and a tracking of the bubbles throughout the granulometric scales (Vachier-Mammar et al., 2006). Indeed, big bubbles and small bubbles are represented at opposite granulometric scales. High granulometric scales lead to a rough segmentation of the big bubbles, whereas low granulometric levels allow refinement of the contours of the big bubbles
F IGURE 44. Effect of the viscous opening. (a) Original image (image of foam of size 300 × 300 pixels), (b) result of a standard opening by a Euclidean disk r(0) = 20 and (c) result of a viscous “opening” of size 20. Here the mercury model is chosen since the image is made of catchment basins located at different altitudes.
F IGURE 45. Watershed partitions computed on the gradient modulus of the images of Figure 44. (a) The segmentation of the original image produces precise but not robust contours. (b) By a morphological opening, big bubbles are correctly segmented but small bubbles are lost. (c) The viscous opening combines nice properties of openings of increasing size and ensures a correct segmentation of all the bubbles.
while extracting the contours of the smaller ones. The top-down segmentation procedure is powerful but computationally expensive. An alternative solution is offered by the viscous transformations, since the viscosity allows combining the granulometric information in a single formulation: the viscous opening. So, instead of segmenting n images (resulting from openings of increasing size), only one image need be segmented (the result of the viscous opening) and the watershed transform is calculated only once (Figure 45).
C. Model Comparison
The effects of the viscous transforms are shown in Figures 47 and 48. Both T and T̃ are extensive and more precise than the standard morphological closing ϕ (f ≤ T ≤ ϕ and f ≤ T̃ ≤ ϕ). Finally, it can be shown that
T̃ ≥ T   (34)
and that T is an invariant of T̃:
T̃ ◦ T = T.   (35)
Let us now appreciate the relation between the mercury and the oil models. In the mercury model, for a given flooding level h, fluids of different viscosity invade the relief; the viscosity of the fluid in a lake depends on the depth of the lake. In a single cylinder with its base at level k, all the sections being equal, the less viscous fluid imposes its geometry: it is the
fluid located at the bottom of the valley (i.e., at level k). The same result may be obtained with oil, considering that the viscosity radius of mercury at the pressure P(h − k) equals the viscosity radius of the oil at temperature T(h − k): see Figure 46. Note that in order to improve the visibility of the images presented in Figures 47 and 48, the gray-level dynamics have been enhanced, so that the inequalities between the filtered images are not preserved in the illustration. In the original oil model, the temperature is a function of the flooding level and not of the valley's depth. Of course, the two models are equivalent if the relief is a union of single cylinders having their bases at level 0. This configuration is common in segmentation applications. For example, gradients of piecewise constant functions are of this type. To illustrate how the level sets of a function contribute to the formation of the viscous closed relief, the action of the oil- and mercury-based viscous closings is represented in Figure 49. In the mercury model, a low-level set can play
F IGURE 46. The same set of viscous lakes is obtained by mercury flooding (on the left) and oil flooding (on the right).
F IGURE 47. (a) Original relief (gradient of the flower image of Figure 24), (b) effect of the viscous transformation of size 30 (mercury model: T̃), (c) effect of the viscous closing of size 30 (oil model: T), and (d) comparison with the standard closing of size 30.
F IGURE 48. (a) Original relief (ultrasound image of the left ventricle of the heart), (b) effect of the viscous transformation of size 20 (mercury model: T̃), (c) effect of the viscous closing of size 20 (oil model: T), and (d) effect of a standard closing of size 20. We have: Id ≤ T ≤ T̃ ≤ ϕ.
F IGURE 49. Comparison of the viscous transforms. (a) Conic hole (the luminance increases from blue to red), (b) effect of the viscous closing (oil type), (c) effect of the viscous transform (mercury type). The last figure (d) represents a view of the closed sets from above for the mercury model. (See Color Insert.)
a determinant role, while in the oil model, the highest-level sets are the least filtered.
D. Generalizations
Viscous transforms have been defined by way of a family of closings of increasing activity, that is, associated with structuring elements of increasing size. Rather than setting a given activity (i.e., size of structuring element) to regularize the relief, the viscosity allows combining the effects of a wide range of closings, from fine to strong, by indexing the activity to the gray level. This principle can be applied to any family of filters of increasing activity. Moreover, the distance function computed on a set X, described in Section III.B, is nothing but a viscous dilation.
Indeed, considering a set X, X can be seen as a binary function f taking two different values 0 and 1: X = X1(f) and f = χ1(f), and consequently
∑_{i=1}^{h} δr(i)(f) = ∑_{i=1}^{h} δr(i)(χ1(f))   (36)
with r(h) = 0 and r(i) > r(h) for all i < h. The sets δr(i)(X) are nested: X ⊂ · · · ⊂ δr(i)(X) ⊂ · · · ⊂ δr(1)(X). Points belonging to δr(i)(X) \ δr(i+1)(X) will be set at level i when computing the sum ∑_{i=1}^{h} δr(i)(f). Instead of summing the dilated sets, the same result can be obtained by translating and superposing the dilated sets:
∑_{i=1}^{h} δr(i)(f) = ⋁_{i=1:h} i · δr(i)(χ1(f)) = ⋁_{i=1:h} δr(i)(i · χ1(f)) = ⋁_{i=1:h} δr(i)(χ1(i · f)).   (37)
This is illustrated in Figure 50. In the viscous dilation so defined, the radius of the structuring element decreases with the gray level, as for the oil model developed in the case of the viscous closing. More generally, two viscous transforms (inspired by the oil and the mercury models) may be associated with the dilation:
δv(f) = ⋁_{h≥0} δr(h)(h · χh(f))   (38)
and
δ̃v(f) = ⋀_{t≥0} δr(t)(f + t).   (39)
Of course, these transforms differ from the sum ∑_{i=1}^{h} δr(i)(f) in the case of gray-scale functions. As an illustration, the viscous transformations have been
F IGURE 50. The viscous dilation involves superposing increasingly less dilated sets.
F IGURE 51. (a) Original image and (b) segmentation resulting from the watershed transform.
F IGURE 52. Effect of a dilation (a) by a disk of size r0 = 20 and (b) associated watershed line (the markers are the same as in Figure 51).
F IGURE 53. (a) Effect of a viscous dilation of size r0 = 15; associated watershed line for a viscosity of 5 (b), 15 (c), and 20 (d).
tested on a gray-scale image (Figures 51–54). The viscous dilation allows reconnection of the disconnected contour portions, while viscous closings have a regularization effect. It is still possible to combine the advantages of both dilations and closings by considering the family of the closed dilated sets. For the "oil" model, it is expressed as:
⋁_{h≥0} ϕr(h)(δr(h)(h · χh(f))).   (40)
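A sketch of this combined "oil" transform, Eq. (40), under the same illustrative assumptions as the previous sketches (linear law for r(h), SciPy binary operators):

import numpy as np
from scipy import ndimage as ndi

def disk(radius):
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius

def oil_viscous_close_dilate(f, r_max):
    # Eq. (40): for each level h, dilate then close the level set h.chi_h(f)
    # with a disk of radius r(h) decreasing with h, and take the pointwise
    # supremum of the stacked results.
    f = np.asarray(f, dtype=int)
    hmax = max(int(f.max()), 1)
    out = np.zeros_like(f)
    for h in range(1, hmax + 1):
        r = int(round(r_max * (1.0 - h / float(hmax))))
        level_set = f >= h
        if r > 0:
            se = disk(r)
            level_set = ndi.binary_closing(ndi.binary_dilation(level_set, se), se)
        out = np.maximum(out, h * level_set)
    return out

The ordinary watershed is then computed on the transformed relief, as in the experiments that follow.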
F IGURE 54. Comparison of some viscous transformations (“oil” model). (a) Effect of a viscous closing of size r0 = 20, (b) effect of a viscous dilation of size r0 = 20, (c) result obtained when closing the dilated sets (with r0 = 20), (d) watershed obtained when closing the dilated sets of the relief (with r0 = 10).
V. E XPERIMENTS
A. Thin Contour Line
The synthetic test images presented in Figures 55 and 57 consist of a thin, broken, and dotted contour line dividing the space into several parts. In the first example (Figure 55), the contour line is bright, with a high altitude. The viscous closing of the relief produces a thick contour zone whose crest line remains close to the fine initial line; smooth versions of the original contours are added at lower altitudes. (Note that in these cases, the images are made of cylinders, and both oil and mercury floodings produce the same result.) The effect on the viscous watershed construction is best illustrated by the series in Figure 56, where the same contour line is represented with increasing heights. As foreseen, the strongest regularization of the construction of the watershed line occurs for the contours with a low altitude. As the altitude increases, the watershed line is less smoothed. The last example in Figure 57 summarizes all these effects; the same contour is used again, but with a varying altitude, low in the central part of the image and high at the top and bottom of the image. The contour produced illustrates a benefit of the viscosity model: a strong smoothing in the central part of the image and gradually no smoothing toward the borders of the image (Figures 58–60). Finally, the results presented in Figures 55–57 must be compared to those from a standard closing (see Figure 28) or from sums of dilations (see Figure 31).
B. Segmentation of Fuzzy or Blurred Images
In many applications, images are fuzzy or blurred (as presented in Figure 24). In this example, the computation of the pure watershed transform applied on the gradient image leads to an incorrect result.
FIGURE 55. Viscous closing of a synthetic image. The original contour (a) is a one-pixel thin line. (b) The viscous closing gradually thickens the contour.
FIGURE 56. Viscous watershed transform computed on contours of increasing height (with a constant viscosity); the stronger the contour, the softer the regularization.
FIGURE 57. Viscous watershed line in the case of a contour of variable height.
FIGURE 58. Road on a landscape; markers used for the segmentation.
Figure 61 presents the segmentation obtained by the viscous watershed transform. Both the oil and the mercury flooding models are tested. The results are similar: in both cases, the viscous watershed line is correctly localized, whereas the standard watershed produces incorrect contours.
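As a hedged illustration of this experiment, the sketch below builds a synthetic blurred disk, computes its morphological gradient, regularizes the gradient with the oil-type viscous closing, and floods both the raw and the regularized reliefs from the same two markers. It reuses the viscous_oil helper sketched after Eq. (40); the test image, the marker positions, and the value r0 = 15 are invented for the example and do not come from the chapter.

```python
# Illustrative pipeline for Section V.B: watershed of a blurred object's gradient,
# with and without a viscous closing of the relief (oil model).  Assumes the
# viscous_oil helper sketched after Eq. (40); all numeric choices are arbitrary.
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.morphology import dilation, erosion, closing, disk
from skimage.segmentation import watershed

# Synthetic "fuzzy" image: a bright disk, heavily blurred.
yy, xx = np.mgrid[0:128, 0:128]
img = 200.0 * (((yy - 64) ** 2 + (xx - 64) ** 2) < 30 ** 2)
img = gaussian_filter(img, sigma=6)

# Morphological gradient (dilation minus erosion) of the blurred image.
grad = dilation(img, disk(1)) - erosion(img, disk(1))
grad = np.round(grad).astype(np.int32)

# Two hand-placed markers: one inside the object, one in the background.
markers = np.zeros(img.shape, dtype=np.int32)
markers[64, 64] = 1
markers[5, 5] = 2

standard_labels = watershed(grad, markers)                                  # plain watershed
viscous_labels = watershed(viscous_oil(grad, r0=15, op=closing), markers)   # viscous (oil) variant
```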
FIGURE 59. Watersheds computed on the (a) original relief, (b) after closing the relief, (c) after dilating the relief ($\sum_{n=0}^{N} \delta_n(f)$), and (d) after a viscous closing of the relief. (See Color Insert.)
FIGURE 60. Original fuzzy image and segmentation obtained via the standard watershed transform.
FIGURE 61. Viscous closings of the gradient image and associated viscous watershed lines (oil type and mercury type).
Finally, we test the robustness of the viscous watershed transform against noise. Gaussian noise was added to the original gradient image (Figure 62), and nonviscous and viscous watershed lines were computed on the degraded gradient. In all cases, adding noise barely changes the positioning of the watershed; in particular, the viscous model yields results as good as those obtained without noise. For both flooding models, the quality of the segmentation is not degraded by the added noise (Figure 63).
FIGURE 62. Addition of Gaussian noise; the watershed transform is computed on the noisy gradient.
FIGURE 63. Viscous closings of the noisy gradient image and associated watershed transforms (oil type and mercury type).
C. Segmentation of Generic Images

Many image-processing tasks, such as the processing of multimedia images, must deal with images of a very different nature. Various examples of image segmentation tasks are now presented. The segmentation algorithm proceeds through the following steps (a code sketch of the pipeline is given below):

1. A first segmentation of the image by the standard watershed transform computed on the original gradient image is performed (Figure 64). The sources are chosen among the most significant regional extrema of the original image in terms of volume (for this step, refer to Meyer et al., 1997; Vachier and Meyer, 1995). This first segmentation is used to build a mosaic image, that is, a piecewise constant image in which each segmented region is filled with the local mean gray value of the original image.
2. Thin contour lines are extracted by computing the morphological gradient of the mosaic image (see Figure 64).
3. The viscous closing of the thin contour line image is performed (Figure 65), and the viscous watershed transform is computed (Figure 66). Note that in this case (where all the relief valleys are at the same level), the oil and mercury models are equivalent.

Why is it necessary to work with thin contour lines? The viscous regularization is based on hierarchical closings. When contours are thick or close to one another, they may coalesce under closing (closings do not preserve homotopy).
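A minimal sketch of this three-step pipeline follows. It assumes the viscous_oil helper sketched after Eq. (40); for simplicity, the volumic selection of the sources is replaced by a user-supplied marker image, so the function names, the default radius, and that substitution are assumptions of the sketch rather than details taken from the chapter.

```python
# Hedged sketch of the mosaic-based pipeline: standard watershed -> mosaic image ->
# thin contours -> viscous closing -> viscous watershed.  Assumes viscous_oil from
# the sketch after Eq. (40); markers stand in for the volume-based source selection.
import numpy as np
from skimage.morphology import dilation, erosion, closing, disk
from skimage.segmentation import watershed

def morph_gradient(im, radius=1):
    se = disk(radius)
    return dilation(im, se).astype(np.float64) - erosion(im, se).astype(np.float64)

def mosaic_viscous_segmentation(image, markers, r0=10):
    # Step 1: standard watershed on the original gradient, then mosaic image
    # (each segmented region filled with its mean gray value).
    labels = watershed(morph_gradient(image), markers)
    mosaic = np.zeros(image.shape, dtype=np.float64)
    for lab in np.unique(labels):
        region = labels == lab
        mosaic[region] = image[region].mean()
    # Step 2: thin contour lines = morphological gradient of the mosaic.
    thin_contours = np.round(morph_gradient(mosaic)).astype(np.int32)
    # Step 3: viscous closing of the thin contours, then the watershed flooding.
    regularized = viscous_oil(thin_contours, r0=r0, op=closing)
    return watershed(regularized, markers)
```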
FIGURE 64. (a) Original image, (b) morphological gradient, (c) result of the watershed transform (in white), and (d) gradient of the mosaic image (each segmented region is filled with the local mean value of the original image; the morphological gradient of this mosaic image is then computed).
FIGURE 65. Viscous closing of the mosaic gradient for different viscosity radii: 5 (a), 10 (b), and 20 (c).
Figures 66 and 67 present the results obtained by the previous algorithm for different viscosities of the fluid. Note that, instead of increasing the viscosity of the fluid, it is possible to lower the heights of the contour arcs; this allows the computation to proceed more quickly.
FIGURE 66. Watershed line computed on the preceding filtered relief: $r_0 = 5$ (a), 10 (b), and 20 (c).
FIGURE 67. (a) Original image, watershed lines (b and c), and viscous watershed lines (d and e) obtained with two different sets of flooding sources. In the second case, a higher viscosity is imposed.
The proposed regularization adapts the closing size to the gradient norm (i.e., to the height of the contours). This strategy is appropriate when high gradient values correspond to precise contours and low gradient values to fuzzy ones, but this is not always the case: a poorly contrasted object can have extremely sharp contours, so that its gradient norm is low while its gradient line is narrow. Adapting the algorithm to such situations is the subject of current work; the idea is to distinguish fuzzy contours, which must be smoothed, from sharp contours, which must be followed with precision. In that case, the viscosity must be indexed to the contour thickness rather than to its height.
VI. SUMMARY

Image segmentation relies on seeking an optimal partition of the space. The morphological watershed defines a meaningful framework for solving this task. Once the segmentation criteria are set (which kind of regions, and how many), levelings and graph representations are efficient tools for finding the best partition. In addition, the delicate question of the regularization of the watershed transform finds in the viscosity models an elegant and promising solution that reinforces the potential of the watershed. However, viscous transforms have a wider scope of application than segmentation problems alone. In particular, they can be applied directly to images and not only to their gradient norms. As an example, we present here the use of viscous closings in analyzing heart ultrasound images. Figure 68 presents the left ventricle of the heart. Because of the noise and the heart motion, the localization of the heart contours is very delicate. The computation of the viscous transforms of the original image closes and smoothes the ventricle edges (Figures 69 and 70). A classical segmentation by the watershed transform, applied to the viscously closed images, then leads to a very good extraction of the ventricle contours.
FIGURE 68. Ultrasound image of the left ventricle of the heart.
FIGURE 69. (a) Relief closed by viscous closing (oil model), (b) gradient of the viscous closed image, and (c) watershed transform.
FIGURE 70. (a) Relief closed by viscous closing (mercury model), (b) gradient of the viscous closed image, and (c) watershed transform.
FIGURE 71. Segmentation based on a geodesic active contour.
As noted in this chapter, the mercury model is better adapted to the processing of natural images, even though it does not define morphological filters (it is not idempotent). In this example, the result obtained by combining the watershed with the viscous transform (of mercury type) and the contour produced by a geodesic snake (Caselles et al., 1997) are similar (Figure 71).
The snake and the viscous watershed correspond to two different paradigms: the former smooths and segments by minimizing a single functional, a weighted sum of heterogeneous terms, whereas ours is a two-step procedure in which smoothing and segmentation are applied in sequence.
REFERENCES

Angulo, J. (2003). Morphologie mathématique et indexation d'images couleur. Application à la microscopie en biomédecine. PhD thesis, Ecole Nationale Supérieure des Mines de Paris, Paris, France.
Arbelaez, P. (2005). Une approche métrique pour la segmentation d'images. PhD thesis, Université Paris Dauphine, Paris, France.
Bertrand, G. (1995). A parallel thinning algorithm for medial surfaces. Pattern Recognition Letters 16, 979–996.
Bertrand, G. (2005). On topological watersheds. Journal of Mathematical Imaging and Vision 22 (2–3), 217–230.
Beucher, S. (1990). Segmentation d'images et morphologie mathématique. PhD thesis, Ecole des Mines de Paris, Paris, France.
Beucher, S. (1994). Watershed, hierarchical segmentation and waterfall algorithm. In: ISMM'94: Mathematical Morphology and Its Applications to Image Processing.
Beucher, S., Lantuéjoul, C. (1979). Use of watersheds in contour detection. In: Proceedings International Workshop on Image Processing. Rennes, France, pp. 17–21.
Beucher, S., Meyer, F. (1992). The morphological approach to segmentation: The watershed transformation. In: Dougherty, E.R. (Ed.), Mathematical Morphology in Image Processing. CRC Press, pp. 433–481.
Bloch, I. (2000). Using mathematical morphology operators as modal operators for spatial reasoning. In: ECAI 2000, Workshop on Spatio-Temporal Reasoning, Berlin, Germany, pp. 73–79.
Bloch, I. (2002). Mathematical morphology and spatial relationships: Quantitative, semi-quantitative and symbolic settings. In: Sztandera, L., Matsakis, P. (Eds.), Applying Soft Computing in Defining Spatial Relationships. Physica-Verlag, Berlin, pp. 63–98.
Caselles, V., Kimmel, R., Sapiro, G. (1997). Geodesic active contours. International Journal of Computer Vision 22, 61–79.
Cohen, L. (1991). On active contour models and balloons. Computer Vision, Graphics, and Image Processing: Image Understanding 53, 211–218.
Cohen, L. (2002). Actes des journées d'études SEE: Le traitement d'image à l'aube du XXIe siècle. Minimal paths and deformable models for image analysis (in French).
Delechelle, E., Lemoine, J. (1999). La trajectoire déformable: un modèle optique des contours géodésiques fondé sur le principe de Fermat. In: VI'99, Trois-Rivières, Quebec, Canada, May 1999.
den Boomgaard, R.V., Smeulders, R.W. (1994). The morphological structure of images: The differential equations of morphological scale-space. IEEE Transactions on Pattern Analysis and Machine Intelligence 16, 1101–1113.
Gomila, F. (2001). Mise en correspondance de partitions en vue du suivi d'objets. PhD thesis, Ecole Nationale Supérieure des Mines de Paris, Paris, France.
Grigorishin, T., Yang, Y. (1998). Image segmentation: An electrostatic field based approach. In: Vision Interface'98, Vancouver, British Columbia, pp. 279–286.
Grimaud, M. (1992). New measure of contrast: Dynamics. In: Gader, P.D., Dougherty, E.R., Serra, J.C. (Eds.), Image Algebra and Morphological Processing III. SPIE Proceedings, vol. 1769. Society of Photo Optical.
Hieu, T.N., Worring, M., van den Boomgaard, R. (2000). Watersnakes: Energy driven watershed segmentation. Technical Report 12, Intelligent Sensory Information Systems Group, University of Amsterdam, Amsterdam.
Kass, M., Witkin, A., Terzopoulos, D. (1988). Snakes: Active contour models. International Journal of Computer Vision 1 (4), 321–331.
Klein, J.-C., Lemonnier, F., Gauthier, M., Peyrard, R. (1995). Hardware implementation of the watershed zone algorithm based on a hierarchical queue structure. In: Pitas, I. (Ed.), Proceedings of IEEE Workshop on Nonlinear Signal and Image Processing. Neos Marmaras, Halkidiki, Greece, pp. 859–862.
Lantuéjoul, C. (1978). La squelettisation et son application aux mesures topologiques des mosaïques polycristallines. PhD thesis, Ecole des Mines de Paris, Paris, France.
Marcotegui, B. (1996). Segmentation de séquences d'images en vue du codage. PhD thesis, Ecole Nationale Supérieure des Mines de Paris, Paris, France.
Marr, D. (1982). Vision. Freeman, San Francisco.
Matheron, G. (1967). Eléments pour une Théorie des Milieux Poreux. Masson, Paris.
Matheron, G. (1975). Random Sets and Integral Geometry. John Wiley & Sons, New York.
Meyer, F. (1991). Un algorithme optimal de lignes de partage des eaux. In: 8ième congrès RFIA, Lyon-Villeurbanne, pp. 847–857.
Meyer, F. (1993). Inondation par des fluides visqueux. Technical Report, Note interne CMM, Ecole des Mines de Paris, Paris, France.
Meyer, F. (1994). Topographic distance and watershed lines. Signal Processing 38 (1), 113–125.
Meyer, F. (2005). Grey-weighted, ultrametric and lexicographic distances. In: Ronse, C., Najman, L., Decencière, E. (Eds.), Mathematical Morphology: 40 Years On. Proceedings of the 7th ISMM, April 18–20, 2005. Springer-Verlag, Dordrecht, pp. 289–298.
Meyer, F., Beucher, S. (1990). Morphological segmentation. Journal of Visual Communication and Image Representation 1 (1), 21–46.
Meyer, F., Maragos, P. (1999). Morphological scale-space representation with levelings. In: Nielsen, M., Johansen, P., Olsen, O.F., Weickert, J. (Eds.), Scale Space Theory in Computer Vision. Lecture Notes in Computer Science, vol. 1682. Springer-Verlag, Berlin, pp. 187–198.
Meyer, F., Oliveras, A., Salembier, P., Vachier, C. (1997). Morphological tools for segmentation: Connected filters and watershed. Annals of Telecommunications 52 (7–8), 367–379.
Meyer, F., Vachier, C. (2002). Image segmentation based on viscous flooding simulation. In: Talbot, H., Beare, R. (Eds.), Mathematical Morphology: Proceedings of the VIth International Symposium: ISMM 2002. CSIRO, Sydney, Australia.
Monasse, P., Guichard, F. (2000). Scale-space from a level lines tree. Journal of Visual Communication and Image Representation 11, 224–236.
Najman, L. (1994). Morphologie mathématique: De la segmentation d'images à l'analyse multivoque. PhD thesis, Université Paris Dauphine, Paris, France.
Najman, L., Couprie, M., Bertrand, G. (2005). Watersheds, mosaics, and the emergence paradigm. Discrete Applied Mathematics 147 (2–3), 301–324. Special issue: Advances in Discrete Geometry and Topology, devoted to the 11th International Conference on Discrete Geometry for Computer Imagery, Naples, Italy, 19–21 November 2003.
Najman, L., Schmitt, M. (1994). Watershed of a continuous function. Signal Processing 38, 99–112.
Perchant, A., Bloch, I. (2000). Fuzzy morphisms between graphs. Fuzzy Sets and Systems 128, 149–168.
Perona, P., Malik, J. (1990). Scale-space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 12, 629–639.
Salembier, P. (1994). Morphological multiscale segmentation for image coding. Signal Processing 38 (3), 359–386.
Salembier, P., Serra, J. (1995). Flat zones filtering, connected operators, and filters by reconstruction. IEEE Transactions on Image Processing 4 (8), 1153–1160.
Schmitt, M., Prêteux, F. (1986). Un nouvel algorithme en morphologie mathématique: Les r-h maxima et r-h minima. In: Proc. 2ième Semaine Internationale de l'Image Electronique, pp. 469–475.
Serra, J. (1982). Image Analysis and Mathematical Morphology. Academic Press, New York.
Serra, J. (1988). Image Analysis and Mathematical Morphology: Theoretical Advances. Academic Press, New York.
Serra, J. (2002). Viscous lattices. In: Talbot, H., Beare, R. (Eds.), Mathematical Morphology: Proceedings of the VIth International Symposium: ISMM 2002. CSIRO, Sydney, Australia, pp. 79–90.
Serra, J. (2003). Connexions et segmentation d'image. Traitement du Signal 20 (3), 243–254. Numéro spécial: "Le traitement du signal à l'aube du XXIe siècle".
Serra, J., Salembier, P. (1993). Connected operators and pyramids. In: Dougherty, E.R., Gader, P.D., Serra, J.C. (Eds.), Image Algebra and Morphological Image Processing IV. SPIE, pp. 65–76.
Sethian, J. (1996). Level Set Methods: Evolving Interfaces in Geometry, Fluid Mechanics, Computer Vision, and Materials Science. Cambridge University Press.
Vachier, C. (1995). Extraction de caractéristiques, segmentation d'image et morphologie mathématique. PhD thesis, Ecole des Mines de Paris, France.
Vachier, C. (1998). Utilisation d'un critère volumique pour le filtrage d'image. In: RFIA'98: Reconnaissance des Formes et Intelligence Artificielle, vol. I, pp. 307–315.
Vachier, C. (2001). Extraction de caractéristiques par analyse morphologique multi-échelle. In: Proc. of GRETSI, Toulouse, vol. 1, September 2001.
Vachier, C., Meyer, F. (1995). Extinction value: A new measurement of persistence. In: Proc. of 1995 IEEE Workshop on Nonlinear Signal and Image Processing, vol. I, pp. 254–257.
Vachier, C., Meyer, F. (2005). The viscous watershed transform. Journal of Mathematical Imaging and Vision 22, 251–267.
Vachier, C., Meyer, F., Lamara, R. (2000). Segmentation par simulation d'une inondation visqueuse. In: Proceedings of RFIA.
Vachier, C., Vincent, L. (1995). Valuation of image extrema using alternating filters by reconstruction. In: Dougherty, E.R., Preteux, F., Shen, S.S. (Eds.), Neural, Morphological, and Stochastic Methods in Image and Signal Processing. SPIE Proceedings, vol. 2568. San Diego, CA, pp. 94–103.
Vachier-Mammar, C., Rami-Shojaei, S., Schmitt, S. (2006). Time-size analysis of bubbles images. Application to the characterization of aqueous proteins foams stability. Image and Vision Computing, submitted for publication. [Draft version available at http://www.cmla.enscachan.fr/Utilisateurs/vachier/].
Vincent, L. (1990). Algorithmes morphologiques à base de files d'attente et de lacets. Extensions aux graphes. PhD thesis, Ecole des Mines de Paris, France.
Vincent, L. (1992). Morphological area openings and closings for grayscale images. In: Shape in Picture, NATO Workshop, Driebergen.
Vincent, L., Soille, P. (1991). Watersheds in digital spaces: An efficient algorithm based on immersion simulations. IEEE Transactions on Pattern Analysis and Machine Intelligence 13 (6), 583–598.
Witkin, A.P. (1983). Scale-space filtering. In: Proc. 7th Int. Joint Conf. Artificial Intelligence, pp. 1019–1022.
Xu, C., Prince, J.L. (1998). Snakes, shapes, and gradient vector flow. IEEE Transactions on Image Processing 7 (3), 359–369.
MEYER AND VACHIER, FIGURE 7. Beucher and Lantuéjoul method for segmenting binary shapes. (a) Dotted square, (b) distance function, (c) minima of the distance function (corresponding to the ultimate eroded set of the negative original set), (d) SKIZ (or watershed line) computed on the distance function, (e) original square and imposed markers, and (f) watershed partition associated with a set of sources.
MEYER AND VACHIER, FIGURE 8. Beucher and Lantuéjoul method for segmenting binary shapes. (a) Original and (b) negative image; (c) distance function and (d) regional maxima of the distance function (in red); (e) final segmentation (in green) superimposed onto the original drawing.
MEYER AND VACHIER, FIGURE 21. The features of levelings. (a) Original image and simplified images obtained by area filterings; shapes of area lower than 50 (b) and then 1000 (c) are eliminated. (d)–(f) Extrema of the preceding images (regional maxima in white; regional minima in red). Note the monotonic decrease of the number of extrema.
MEYER AND VACHIER, FIGURE 28. Watershed partition obtained after closing the relief. (a) Original contour and markers (in gray), (b) closed image (the structuring element is a Euclidean disk), (c) watersheds computed on the original and (d) on the closed image. The markers used for the segmentation are represented in cyan.
MEYER AND VACHIER, FIGURE 39. Formation of the catchment basins during the computation of the watershed transform: in white, the lakes; in red, the watershed points.
MEYER AND VACHIER, FIGURE 40. Formation of the catchment basins during the computation of the viscous watershed transform: in white, the viscous lakes; in red, the viscous watershed points.
MEYER AND VACHIER, FIGURE 49. Comparison of the viscous transforms. (a) Conic hole (the luminance increases from blue to red), (b) effect of the viscous closing (oil type), (c) effect of the viscous transform (mercury type). The last figure (d) represents a view of the closed sets from above for the mercury model.
MEYER AND VACHIER, FIGURE 59. Watersheds computed on the (a) original relief, (b) after closing the relief, (c) after dilating the relief ($\sum_{n=0}^{N} \delta_n(f)$), and (d) after a viscous closing of the relief.