Advances in
IMAGING AND ELECTRON PHYSICS

VOLUME 152

EDITOR-IN-CHIEF
PETER W. HAWKES
CEMES-CNRS, Toulouse, France

HONORARY ASSOCIATE EDITORS
TOM MULVEY
BENJAMIN KAZAN

Edited by
PETER W. HAWKES
CEMES-CNRS, Toulouse, France

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Academic Press is an imprint of Elsevier
Academic Press is an imprint of Elsevier
525 B Street, Suite 1900, San Diego, California 92101-4495, USA
84 Theobald's Road, London WC1X 8RR, UK

This book is printed on acid-free paper.

Copyright © 2008, Elsevier Inc. All Rights Reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the Publisher.

The appearance of the code at the bottom of the first page of a chapter in this book indicates the Publisher's consent that copies of the chapter may be made for personal or internal use of specific clients. This consent is given on the condition, however, that the copier pay the stated per-copy fee through the Copyright Clearance Center, Inc. (www.copyright.com), for copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Law. This consent does not extend to other kinds of copying, such as copying for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. Copy fees for pre-2008 chapters are as shown on the title pages. If no fee code appears on the title page, the copy fee is the same as for current chapters. 1076-5670/2008 $35.00

Permissions may be sought directly from Elsevier's Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, E-mail: [email protected]. You may also complete your request on-line via the Elsevier homepage (http://elsevier.com), by selecting "Support & Contact", then "Copyright and Permission", and then "Obtaining Permissions."

For information on all Elsevier – Academic Press publications visit our Web site at www.books.elsevier.com

ISBN-13: 978-0-12-374219-3

Printed in the United States of America
08 09 10 11    9 8 7 6 5 4 3 2 1
CONTENTS

Preface
Contributors
Future Contributions

1. Stack Filters: From Definition to Design Algorithms
Nina S. T. Hirata
I. Introduction
II. Stack Filters
III. Optimal Stack Filters
IV. Stack Filter Design Approaches
V. Application Examples
VI. Conclusion
Acknowledgments
References

2. The Foldy–Wouthuysen Transformation Technique in Optics
Sameen Ahmed Khan
I. Introduction
II. The Foldy–Wouthuysen Transformation
III. Quantum Formalism of Charged-Particle Beam Optics
IV. Quantum Methodologies in Light Beam Optics
V. Conclusion
Appendix A
Appendix B
Acknowledgments
References

3. Nonlinear Systems for Image Processing
Saverio Morfu, Patrick Marquié, Brice Nofiélé, and Dominique Ginhac
I. Introduction
II. Mechanical Analogy
III. Inertial Systems
IV. Reaction–Diffusion Systems
V. Conclusion
VI. Outlooks
Acknowledgments
Appendix A
Appendix B
Appendix C
Appendix D
References

4. Complex-Valued Neural Network and Complex-Valued Backpropagation Learning Algorithm
Tohru Nitta
I. Introduction
II. The Complex-Valued Neural Network
III. Complex-Valued Backpropagation Learning Algorithm
IV. Learning Speed
V. Generalization Ability
VI. Transforming Geometric Figures
VII. Orthogonality of Decision Boundaries in the Complex-Valued Neuron
VIII. Conclusions
References

5. Blind Source Separation: The Sparsity Revolution
J. Bobin, J.-L. Starck, Y. Moudden, and M. J. Fadili
I. Introduction
II. Blind Source Separation: A Strenuous Inverse Problem
III. Sparse Multichannel Signal Representation
IV. Morphological Component Analysis for Multichannel Data
V. Morphological Diversity and Blind Source Separation
VI. Dealing With Hyperspectral Data
VII. Applications
VIII. Conclusion
References

6. “Disorder”: Structured Diffuse Scattering and Local Crystal Chemistry
Ray L. Withers
I. Introduction
II. The Modulation Wave Approach
III. Applications of the Modulation Wave Approach
IV. Selected Case Studies
V. Conclusions
Acknowledgments
References

Contents of Volume 151
Index
Corrigendum
Color Plate Section
PREFACE
Six chapters make up this new volume, with contributions on electron microscopy, neural networks, stack filters, blind source separation and, a very novel topic, the Foldy–Wouthuysen transformation in optics.

Stack filters, of which median filters are the best known in practice, have a large literature, some abstrusely mathematical, some experimental. N. S. T. Hirata takes us systematically through the subject, with sections on the relation between these and morphological filters, the design of optimal filters, and examples of such designs. This very clear and methodical account will, I am sure, be found helpful.

This is followed by an account of the Foldy–Wouthuysen transformation as applied to optics by S. A. Khan, who has already contributed to these Advances with R. Jagannathan on a related subject, the study of electron optics via quantum mechanics. First, the transformation is described and the necessary mathematics recapitulated. The quantum approach to charged-particle optics is then introduced, and the chapter concludes with an examination of the same approach in connection with light optics. I am delighted to include here this very novel work, which sheds new light on the foundations of electron wave optics.

The third chapter likewise deals with a very novel theme, the role of nonlinear systems and tools in image processing. Here, S. Morfu, P. Marquié, B. Nofiélé and D. Ginhac explain how nonlinearity extends the types of processing of interest and discuss in detail their implementation on cellular neural networks. Many of these ideas were completely new to me and I hope that readers will find them as stimulating as I did.

The values of the elements of neural networks need not be real, as T. Nitta explains in a chapter on complex-valued networks. After an introduction to the notion of a complex-valued neuron, T. Nitta introduces complex-valued backpropagation learning algorithms and then considers many practical aspects of the procedure in a long and lucid presentation.

The following chapter brings us back to one of the perennial problems of image processing, source separation in the absence of any detailed information about the system response. J. Bobin, J.-L. Starck, Y. Moudden and M. J. Fadili give an account of the progress that is being made thanks to sparsity and morphological diversity. All aspects of the method are presented in detail, and this long chapter is itself a short monograph on the topic.
The volume concludes with a contribution by R. L. Withers on the problem of imaging disordered, or rather, locally ordered crystal phases, which generate highly structured diffuse intensity distributions around the strong Bragg reflections of the average structure. This clear analysis of a complex subject will certainly be frequently consulted.

All these contributions contain much novel or very recent material, and I am extremely pleased to include such studies in these Advances. The authors are warmly thanked for taking so much trouble to make this work accessible to a wide audience.

I am delighted to report that the whole series of Advances has now been made available by Elsevier on their ScienceDirect database, right back to Volume 1, when the title was Advances in Electronics and the editor was the late Ladislaus (Bill) Marton.

Peter W. Hawkes
CONTRIBUTORS
Tohru Nitta
National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba Central 2, 1-1-1 Umezono, Tsukuba, Ibaraki 305-8568, Japan

Ray L. Withers
Research School of Chemistry, Australian National University, Canberra, A.C.T. 0200, Australia

S. Morfu, P. Marquié, B. Nofiélé and D. Ginhac
Laboratoire LE2I UMR 5158, Aile des sciences de l'ingénieur, BP 47870, 21078 Dijon Cedex, France

Sameen Ahmed Khan
Engineering Department, Salalah College of Technology, Post Box No. 608, 211 Salalah, Sultanate of Oman

Nina S. T. Hirata
Department of Computer Science, Institute of Mathematics and Statistics, University of São Paulo, Rua do Matão, 1010, 05508-090 São Paulo, SP, Brazil

J. Bobin, J.-L. Starck and Y. Moudden
Laboratoire AIM, CEA/DSM-CNRS-Université Paris Diderot, CEA Saclay, IRFU/SEDI-SAP, Service d'Astrophysique, Orme des Merisiers, 91191 Gif-sur-Yvette, France

M. J. Fadili
GREYC CNRS UMR 6072, Image Processing Group, ENSICAEN, 14050 Caen Cedex, France
FUTURE CONTRIBUTIONS
S. Ando
Gradient operators and edge and corner detection

W. Bacsa
Optical interference near surfaces, sub-wavelength microscopy and spectroscopic sensors

P. E. Batson (vol. 153)
First results using the Nion third-order STEM corrector

C. Beeli
Structure and microscopy of quasicrystals

A. B. Bleloch (vol. 153)
STEM and EELS: mapping materials atom by atom

C. Bobisch and R. Möller
Ballistic electron microscopy

G. Borgefors
Distance transforms

Z. Bouchal
Non-diffracting optical beams

F. Brackx, N. de Schepper and F. Sommen
The Fourier transform in Clifford analysis

A. Buchau
Boundary element or integral equation methods for static and time-dependent problems

B. Buchberger
Gröbner bases

T. Cremer
Neutron microscopy

N. de Jonge and E. C. Heeres
Electron emission from carbon nanotubes

A. X. Falcão
The image foresting transform

R. G. Forbes
Liquid metal ion sources

B. J. Ford
The earliest microscopical research

C. Fredembach
Eigenregions for image classification
A. Gölzhäuser
Recent advances in electron holography with point sources

D. Greenfield and M. Monastyrskii (vol. 155)
Selected problems of computational charged particle optics

M. Haider, H. Müller and S. Uhlemann (vol. 153)
Present and future hexapole correctors for high resolution electron microscopes

H. F. Harmuth and B. Meffert (vol. 154)
Dirac's difference equation and the physics of finite differences

M. I. Herrera
The development of electron microscopy in Spain

F. Houdellier, M. Hÿtch, F. Hüe and E. Snoeck (vol. 153)
Aberration correction with the SACTEM–Toulouse: from imaging to diffraction

J. Isenberg
Imaging IR-techniques for the characterization of solar cells

K. Ishizuka
Contrast transfer and crystal images

A. Jacobo
Intracavity type II second-harmonic generation for image processing

B. Kabius and H. Rose (vol. 153)
Novel aberration correction concepts

L. Kipp
Photon sieves

A. Kirkland, P. D. Nellist, L.-Y. Chang and S. J. Haigh (vol. 153)
Aberration-corrected imaging in CTEM and STEM

G. Kögel
Positron microscopy

T. Kohashi
Spin-polarized scanning electron microscopy

O. L. Krivanek, N. Dellby, R. J. Keyse, M. F. Murfitt, C. S. Own and Z. S. Szilagyi (vol. 153)
Aberration correction and STEM

R. Leitgeb
Fourier domain and time domain optical coherence tomography

B. Lencová
Modern developments in electron optical calculations

H. Lichte
New developments in electron holography

M. Matsuya
Calculation of aberration coefficients using Lie algebra
S. McVitie
Microscopy of magnetic specimens

P. G. Merli and V. Morandi
Scanning electron microscopy of thin films

M. A. O'Keefe
Electron image simulation

D. Oulton and H. Owens
Colorimetric imaging

N. Papamarkos and A. Kesidis
The inverse Hough transform

K. S. Pedersen, A. Lee and M. Nielsen
The scale-space properties of natural images

S. J. Pennycook (vol. 153)
Some applications of aberration-corrected electron microscopy

E. Rau
Energy analysers for electron microscopes

E. Recami
Superluminal solutions to wave equations

H. Rose (vol. 153)
History of direct aberration correction

G. Schmahl
X-ray microscopy

R. Shimizu, T. Ikuta and Y. Takai
Defocus image modulation processing in real time

S. Shirai
CRT gun design methods

T. Soma
Focus-deflection systems and their applications

I. Talmon
Study of complex fluids by transmission electron microscopy

N. Tanaka (vol. 153)
Aberration-corrected microscopy in Japan

M. E. Testorf and M. Fiddy
Imaging from scattered electromagnetic fields, investigations into an unsolved problem

N. M. Towghi
lp-norm optimal filters

E. Twerdowski
Defocused acoustic transmission microscopy
Y. Uchikawa
Electron gun optics

K. Urban (vol. 153)
Aberration correction in practice

K. Vaeth and G. Rajeswaran
Organic light-emitting arrays

M. van Droogenbroeck and M. Buckley
Anchors in mathematical morphology

M. Yavor
Optics of charged particle analysers

Y. Zhu and J. Wall (vol. 153)
Aberration-corrected electron microscopes at Brookhaven National Laboratory
CHAPTER 1

Stack Filters: From Definition to Design Algorithms

Nina S. T. Hirata*

* Department of Computer Science, Institute of Mathematics and Statistics, University of São Paulo, Rua do Matão, 1010, 05508-090 São Paulo, SP, Brazil

Contents
I. Introduction
II. Stack Filters
  A. Preliminaries
  B. Stack Filters: Definition and Properties
  C. Subclasses of Stack Filters
  D. Relation to Morphological Filters
III. Optimal Stack Filters
  A. Mean Absolute Error Optimal Stack Filters
  B. Equivalent Optimality in the Binary Domain
  C. Formulation as a Linear Programming Problem
IV. Stack Filter Design Approaches
  A. Overview
  B. Heuristic Solutions
  C. Optimal Solution
V. Application Examples
  A. Design Procedure
  B. Examples
VI. Conclusion
Acknowledgments
References
I. INTRODUCTION

Many nonlinear filters such as the median, rank-order, order statistic, and morphological filters became known in the 1980s (Bovik et al., 1983; Brownrigg, 1984; Haralick et al., 1987; Heygster, 1982; Huang, 1981; Justusson, 1981; Lee and Kassam, 1985; Maragos and Schafer, 1987a, b; Pitas and Venetsanopoulos, 1990; Prasad and Lee, 1989; Serra, 1982, 1988; Serra and Vincent, 1992; Wendt et al., 1986). The state of the art in the area of nonlinear filters at the end of the 1980s is compiled in one of the
first books on that subject (Pitas and Venetsanopoulos, 1990). Since then, several other books on nonlinear filters have been published (Dougherty and Astola, 1999; Heijmans, 1994; Marshall and Sicuranza, 2006; Mitra and Sicuranza, 2000; Soille, 2003).

Many of the nonlinear filters are derived from order statistics (Pitas and Venetsanopoulos, 1992). Median filters are the best known among those based on order statistics, and they are the root of other classes of filters in the sense that many classes of filters have been derived as generalizations of the median filter. Two findings played key roles in the development of new classes of nonlinear filters from median filters: (1) the "threshold decomposition structure" first observed in median filters (Fitch et al., 1984), which allows multilevel signal filtering to be reduced to binary signal filtering, and (2) the possibility of choosing an arbitrary rank element rather than the median as the output of the filter. The first finding led to the introduction of a general class known as stack filters (Wendt et al., 1986), the subject of this chapter, and the second one to the development of rank-order (Heygster, 1982; Justusson, 1981; Nodes and Gallagher, 1982) and order statistic filters (Bovik et al., 1983). Both stack filters and order statistic filters include the median and the rank-order filters as particular cases.

Median filters initially were considered an alternative to linear filters because they have, for instance, better edge-preservation capabilities. However, compared to stack filters, when applied to images, median filters tend to produce blurred images, destroying details. An example of the effects of median and stack filters is shown in Figure 1. The stack filter does not suppress all the noise as the median filter does; however, its output is much sharper than the output of the median filter.

FIGURE 1 From left to right: original, corrupted, median-filtered, and stack-filtered images.

Another major class of nonlinear filters that became known around the same time are morphological filters (Haralick et al., 1987; Serra, 1982, 1988; Serra and Vincent, 1992). They include very popular filters such as openings and closings. Although developed independently, morphological operators are strongly related to stack filters. Maragos and Schafer (1987b) have shown the connections between stack filters and morphological operators. In fact, they have shown that stack filters correspond to morphological increasing operators with flat structuring elements.

This chapter provides an overview of stack filters. The previous text briefly contextualizes stack filters within the scope of nonlinear filters. The remainder of this text is written to answer the following three questions:

1. What are stack filters?
2. How do stack filters relate to other classes of filters?
3. How can stack filters be designed from training data?

To answer these questions, this chapter presents an extensive account of stack filters, divided into four major sections. Section II introduces basic definitions and notations, followed by a definition of
stack filters and some of their properties, such as the equivalence to positive Boolean functions. The section also covers median and rank-order filters, and their generalizations, viewed as subclasses of the stack filters. Section II ends with a brief explanation of the relation between stack filters and morphological operators.

Section III formally characterizes the notion of optimal stack filters in the context of statistical estimation. Optimality is considered with respect to the mean absolute error (MAE) criterion because there is a linear relation between the MAE of a stack filter (relative to multilevel signals) and the MAEs relative to the binary cross sections. The formulation, based on costs derived from the joint distribution of the processes corresponding to the images to be processed and the respective ideal output images, allows a clear delineation between the theoretical formulation of the design problem and the process of estimating costs from data.

Section IV presents an overview of the main stack-filter design approaches. In particular, heuristic algorithms that provide suboptimal solutions and a recent algorithm that provides an optimal solution are described. All these algorithms use training data to explicitly or implicitly estimate the costs involved in the theoretical formulation of the design problem. Section V presents examples of images processed by stack filters that have been designed using the exact solution algorithm. The last
section highlights some important issues reported throughout the text and discusses some of the remaining challenges.
II. STACK FILTERS

A. Preliminaries

1. Signals and Operators
Formally, a digital signal defined on a certain domain E is a mapping f: E → K, where K = {0, 1, . . . , k}, with 0 < k ∈ N, is the set of intensities or gray levels.¹ Given a signal definition domain E and a set of levels K, the set of all signals defined on E with levels in K will be denoted K^E. In particular, if k = 1, the signals are binary and they can be equivalently represented by subsets of E. The set of all binary signals defined on E is denoted {0, 1}^E or, equivalently, P(E) (the collection of all subsets of E). If k > 1, then the signals are multilevel.

The translation of a signal f ∈ K^E by q ∈ E is denoted f_q and defined by f_q(p) = f(p − q) for any p ∈ E. Analogously, given a set X ⊆ E, its translation by q ∈ E, denoted X_q, is defined as X_q = {p + q | p ∈ X}. The transpose of X, denoted X̌, is defined as X̌ = {−p | p ∈ X}.

Signal processing may be performed by operators of the form Ψ: K^E → K^E. Binary signal operators can also be represented by set operators, that is, mappings of the form Ψ: P(E) → P(E).
a. W-Operators. Operators of particular interest are those that are locally defined. The notion of local definition is characterized by a neighborhood window in the following manner. Let W ⊆ E be a finite set, to be called a window. Usually, window W is a connected set of points in E containing the origin of E, denoted o. An operator Ψ: K^E → K^E is locally defined within W if, for any p ∈ E,

[Ψ(f)](p) = [Ψ(f|_{W_p})](p),     (1)

where f|_{W_p} corresponds to the input signal f restricted to W around p. This is equivalent to saying that, for any p ∈ E, there exists ψ_p: K^W → K such that

[Ψ(f)](p) = ψ_p(f_{−p}|_W),     (2)

where the translation in f_{−p}|_W just guarantees that the domain of the function ψ_p is W.

¹ We consider E = Z for one-dimensional signals and E = Z² for two-dimensional signals (or images).
An operator Ψ is translation invariant if, for any p ∈ E,

[Ψ(f)]_p = Ψ(f_p),     (3)

that is, if applying the operator and then translating the output signal is equivalent to first translating the input signal and then applying the operator.

An operator that is both translation invariant and locally defined within W can thus be characterized by a single function ψ: K^W → K (i.e., ψ_p = ψ for all p ∈ E). More precisely, the output of Ψ, for a given input signal f, at any location p ∈ E, is given by

[Ψ(f)](p) = ψ(f_{−p}|_W).     (4)

These operators will be called W-operators. The function ψ is called the characteristic function of Ψ. If Ψ is a binary operator (i.e., Ψ: {0, 1}^E → {0, 1}^E), then its characteristic function ψ can be seen as a Boolean function on d = |W| Boolean variables x_1, x_2, . . . , x_d. More specifically, supposing W = {w_1, w_2, . . . , w_d}, w_i ∈ E, i = 1, 2, . . . , d, for any f ∈ {0, 1}^E, ψ(f_{−p}|_W) corresponds to the Boolean function ψ evaluated at x_i = f_{−p}(w_i), i = 1, . . . , d.
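For concreteness, the following is a minimal sketch in Python (with NumPy; the function name and conventions are illustrative assumptions, not from the text) of a translation-invariant W-operator on 1D binary signals, computing Eq. (4) at every location:

```python
import numpy as np

def w_operator(f, W, psi):
    """Apply the W-operator with characteristic function psi to a 1D signal f.

    W is a list of integer offsets (the window, containing the origin 0);
    locations whose window falls outside the signal are left at 0.
    """
    out = np.zeros_like(f)
    for p in range(len(f)):
        if all(0 <= p + w < len(f) for w in W):
            out[p] = psi(tuple(f[p + w] for w in W))  # psi evaluated on f_{-p}|_W
    return out

# Example: W = {-1, 0, 1} with psi = logical AND of the three window values
f = np.array([0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1])
print(w_operator(f, [-1, 0, 1], lambda u: int(all(u))))  # a binary erosion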
b. Increasing Operators. For any f_1, f_2 ∈ K^E, f_1 ≤ f_2 ⟺ f_1(p) ≤ f_2(p), ∀p ∈ E. An operator Ψ is increasing if, for any f_1, f_2 ∈ K^E, f_1 ≤ f_2 implies Ψ(f_1) ≤ Ψ(f_2). In set notation, Ψ is increasing if, for any S_1, S_2 ⊆ E, S_1 ⊆ S_2 implies Ψ(S_1) ⊆ Ψ(S_2).

c. Diagram Representation of W-Operators. A visual representation of operators is useful to illustrate some concepts. Here the diagram representation of binary W-operators used henceforth is introduced. Given W, binary signals in {0, 1}^W can be represented as subsets of W or, equivalently, as elements of {0, 1}^d. Supposing W = {w_1, w_2, w_3}, the binary signal g ∈ {0, 1}^W with g(w_1) = 0, g(w_2) = 0, and g(w_3) = 0 corresponds to the element 000 ∈ {0, 1}³, and so on. The set {0, 1}^d with the usual ≤ relation (i.e., for any u = (u_1, u_2, . . . , u_d), v = (v_1, v_2, . . . , v_d) ∈ {0, 1}^d, u ≤ v if and only if u_i ≤ v_i, i = 1, 2, . . . , d) is a partially ordered set. Together with the usual logical operations (OR +, AND ·, and NEGATION x̄), it forms a Boolean lattice. Partially ordered sets can be depicted by Hasse diagrams. The diagram at the left side of Figure 2 corresponds to the representation of {0, 1}³. Each element of the lattice is represented by a vertex, and two vertices corresponding to elements u and v, such that u < v, are linked if and only if there is no other element w such that u < w < v.
6
Nina S. T. Hirata
111
011 001
101
110 100
010
000 FIGURE 2 Left: representation of {0, 1}3 . Right: representation of ψ: {0, 1}3 →{0, 1} with ψ(111) = ψ(011) = ψ(101) = ψ(001) = 1 and ψ(110) = ψ(010) = ψ(100) = ψ(000) = 0.
corresponds to the representation of the function ψ: {0, 1}³ → {0, 1} with ψ(111) = ψ(011) = ψ(101) = ψ(001) = 1 and ψ(110) = ψ(010) = ψ(100) = ψ(000) = 0. Elements of {0, 1}³ mapped to 1 are depicted by solid circles, whereas those mapped to 0 are depicted by open circles. In particular, in this example ψ is increasing (i.e., if ψ(u) = 1, then ψ(v) = 1 for any v > u). If a W-operator is increasing, whenever an element is solid, all elements above it (according to the partial-order relation) are necessarily solid in its Hasse diagram representation.
2. Thresholding and Stacking

a. Thresholding. The threshold of a value y ∈ K at a level t ∈ K is given by

T_t(y) = 1 if y ≥ t, and T_t(y) = 0 if y < t.     (5)

The mapping T_t from K^E to {0, 1}^E given by, for any f ∈ K^E and t ∈ K,

(T_t[f])(p) = T_t(f(p)), p ∈ E,     (6)

is called the threshold of f at level t. The binary signal T_t[f] defines a subset of E, called the cross section of f at level t. Notice that T_t(·) denotes a single value, whereas T_t[·] denotes a signal.
b. Threshold Decomposition Structure. According to the threshold decomposition structure of a signal, any signal f ∈ K^E can be expressed as

f(p) = Σ_{t=1}^{k} T_t(f(p)), p ∈ E.     (7)
FIGURE 3 Threshold decomposition structure of a 1D signal: the gray-level signal is thresholded at levels 1, 2, and 3, and adding the resulting cross sections recovers the original signal.
Figure 3 shows a one-dimensional (1D) signal of length 11 with k = 3 and its threshold decomposition structure. By summing the cross sections, the original multilevel signal can be retrieved.
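The decomposition of Eq. (7) is easy to verify numerically. The following small Python sketch (illustrative code, not from the text) thresholds a signal at every level and checks that the cross sections add back to the original:

```python
import numpy as np

def threshold(f, t):
    """Cross section of signal f at level t: 1 where f >= t, else 0 (Eq. 5)."""
    return (f >= t).astype(int)

k = 3
f = np.array([1, 2, 3, 3, 2, 1, 0, 1, 2, 1, 0])  # 1D signal with levels in {0,...,3}

# Threshold decomposition: the sum of cross sections at levels 1..k equals f (Eq. 7)
cross_sections = [threshold(f, t) for t in range(1, k + 1)]
assert np.array_equal(sum(cross_sections), f)
```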
c. Operators That Commute With Thresholding. Hereafter, operators Ψ: K^E → K^E are assumed to satisfy Ψ({0, 1}^E) ⊆ {0, 1}^E (that is, all binary signals are mapped to binary signals).

Definition 1. Let Ψ: K^E → K^E. Then Ψ commutes with the threshold operation if and only if T_t[Ψ(f)] = Ψ(T_t[f]) for all t ∈ K and f ∈ K^E.

In other words, applying Ψ to a signal f and then thresholding the resulting signal at any level t yields a binary signal that is exactly the same as the one obtained by first thresholding f at level t and then applying Ψ.

Theorem 1 (Maragos and Schafer, 1987a). An operator Ψ commutes with thresholding if and only if it is increasing.

If Ψ commutes with thresholding, it is an immediate consequence that its characteristic function ψ also commutes with thresholding (i.e., T_t(ψ(g)) = ψ(T_t[g]) for all g ∈ K^W). Moreover, it is not difficult to see that ψ is also increasing.
B. Stack Filters: Definition and Properties

Before defining stack filters, it is expedient to understand median filters and their threshold decomposition structure. Median filters are parameterized by a sliding window of odd size d. For each location, the output is the median of the d observations under the window. Figure 4 shows the characteristic function of the 3-point window binary median filter: for each element in {0, 1}³, the output is 1 only if at least two components have value 1.

FIGURE 4 Three-point width median filter. Output for an element in {0, 1}³ is 1 (solid circles) if and only if at least two components have value 1.

The threshold decomposition structure of median filters (Fitch et al., 1984) is illustrated in Figure 5. At the top left is the input signal, and at the top right is the median-filtered signal. In the lower part, at the left side are
the five binary signals obtained by thresholding the input signal; the right side shows the respective median-filtered signals. Their addition equals the output signal.

FIGURE 5 Threshold decomposition structure of median filters: the input signal is thresholded, each cross section is filtered by a binary median, and the filtered cross sections are stacked (added).

The fact that median filters possess the threshold decomposition structure implies that the median-filtered output for a multilevel signal can be obtained as a sum of the (binary) median-filtered outputs of its cross sections. Note that the median of binary observations can be computed based only on counting (see details later); hence, the sorting of d observations required by multilevel median filters may be avoided. From a practical point of view, the simplicity of counting circuitry over sorting circuitry was an important issue for hardware implementation, and it prompted investigations to find other filters with the same decomposition structure. These investigations led to the introduction of the stack filters.

Stack filters are commonly defined as the filters that "obey a weak superposition property known as the threshold decomposition and an ordering property known as the stacking property" (Coyle and Lin, 1988; Wendt et al., 1986).
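This scheme can be illustrated with a short Python sketch (an illustrative assumption-laden version: 1D signals, simple edge padding at the borders): a multilevel signal is filtered by applying the binary median PBF to every cross section and adding the results:

```python
import numpy as np

def median_pbf(u):
    """PBF of the 3-point binary median: output 1 iff at least two inputs are 1."""
    return int(sum(u) >= 2)

def stack_filter(f, k, pbf, d=3):
    """Filter a multilevel 1D signal by applying `pbf` to each cross section."""
    out = np.zeros(len(f), dtype=int)
    pad = d // 2
    for t in range(1, k + 1):
        xt = np.pad((f >= t).astype(int), pad, mode='edge')  # cross section, padded
        out += np.array([pbf(xt[i:i + d]) for i in range(len(f))])
    return out

f = np.array([1, 2, 3, 3, 9, 1, 0, 1, 2, 1, 0])
print(stack_filter(f, k=9, pbf=median_pbf))  # the impulse (value 9) is removed
```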
An operator Ψ, characterized by function ψ, obeys the threshold decomposition property if

[Ψ(f)](p) = Σ_{t=1}^{k} ψ(T_t[f_{−p}|_W]),     (8)

and it obeys the stacking property if, for all 1 ≤ t < k,

Ψ(T_t[f]) ≥ Ψ(T_{t+1}[f]).     (9)
Notice that the stacking property is nothing more than increasingness. It is possible to obtain different filters by considering different binary (Boolean) functions on the right side of Eq. (8). Provided the chosen function obeys the stacking property, the resulting operator is a stack filter. Gilbert (1954) showed that the Boolean functions obeying the stacking property are the monotone (positive) ones (Coyle and Lin, 1988). Thus, a stack filter can be built simply by choosing a positive Boolean function (PBF) for the right side of Eq. (8).

Example 1. A 3-point width window binary median filter outputs 1 if and only if at least two components in the input have value 1. By assigning Boolean variables x_1, x_2, and x_3, respectively, to the three input components, the filter can be characterized by the Boolean function ψ(x_1, x_2, x_3) = x_1 x_2 + x_1 x_3 + x_2 x_3. The + sign corresponds to the logical OR, while x_i x_j expresses the logical AND between x_i and x_j. The corresponding multilevel median is obtained by replacing the logical AND by min and the logical OR by max. Thus, given v_1, v_2, v_3 ∈ K,

med(v_1, v_2, v_3) = max{min{v_1, v_2}, min{v_1, v_3}, min{v_2, v_3}}.
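A small Python sketch of Example 1 (illustrative names of my choosing) shows the correspondence between the PBF and the multilevel median through thresholding:

```python
def psi(x1, x2, x3):
    """PBF of the 3-point median: x1*x2 + x1*x3 + x2*x3 (AND/OR on {0, 1})."""
    return (x1 & x2) | (x1 & x3) | (x2 & x3)

def med(v1, v2, v3):
    """Multilevel median: replace AND by min and OR by max in the PBF."""
    return max(min(v1, v2), min(v1, v3), min(v2, v3))

assert med(2, 7, 5) == 5
# The two agree through thresholding: T_t(med(v)) == psi(T_t(v1), T_t(v2), T_t(v3))
for t in range(0, 9):
    assert (med(2, 7, 5) >= t) == psi(int(2 >= t), int(7 >= t), int(5 >= t))
```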
Another characterization of stack filters is as operators that commute with thresholding (see Definition 1). If Ψ commutes with thresholding, then it can be expressed as

[Ψ(f)](p) = Σ_{t=1}^{k} T_t([Ψ(f)](p))
          = Σ_{t=1}^{k} [Ψ(T_t[f])](p)
          = Σ_{t=1}^{k} ψ(T_t[f_{−p}|_W])
          = max{t ∈ K | ψ(T_t[f_{−p}|_W]) = 1}.     (10)

The first equality is simply the threshold decomposition of Ψ(f); the second one holds because Ψ commutes with thresholding; the third one rewrites the second in terms of the characteristic function; and, since Ψ and ψ are increasing, if ψ(T_t[f_{−p}|_W]) = 1 for a given t, then ψ(T_{t'}[f_{−p}|_W]) = 1 for all t' < t (and, equivalently, if ψ(T_t[f_{−p}|_W]) = 0 for a given t, then ψ(T_{t'}[f_{−p}|_W]) = 0 for all t' > t), which implies the last equality.

From Eq. (10) it follows that operators that commute with thresholding obey the threshold decomposition [Eq. (8)] and, from Theorem 1, it follows that they obey the stacking property [Eq. (9)]. Conversely, operators that obey the threshold decomposition and the stacking property commute with thresholding. To see that, let Ψ be an operator that obeys the threshold decomposition and the stacking property. From the threshold decomposition structure of Ψ(f) and Eq. (8),

Ψ(f) = Σ_{t=1}^{k} T_t[Ψ(f)] = Σ_{t=1}^{k} Ψ(T_t[f]).     (11)

Moreover, T_1[Ψ(f)] ≥ T_2[Ψ(f)] ≥ . . . ≥ T_k[Ψ(f)] because thresholding generates a non-increasing sequence of cross sections, and Ψ(T_1[f]) ≥ Ψ(T_2[f]) ≥ . . . ≥ Ψ(T_k[f]) because Ψ obeys the stacking property. Hence, it can be concluded that T_t[Ψ(f)] = Ψ(T_t[f]) for any t = 1, 2, . . . , k.

In summary, stack filters can be characterized as those operators that (1) possess the threshold decomposition and stacking properties, (2) correspond to the positive Boolean functions when their domain is restricted to binary signals, or (3) commute with thresholding.
C. Subclasses of Stack Filters

The best-known subclass of the stack filters are the median filters. A natural extension of the median filters are the rank-order filters. These two classes of filters and their respective weighted versions are reviewed in this section.

1. Median Filters
The use of the median as a filter was first proposed in the early 1970s by Tukey for time series analysis (Tukey, 1974, 1977). Median filters were soon extended to two-dimensional (2D) signals (images) (Pratt, 1978) and became very popular due to their simplicity, their capability of preserving edges better than linear filters (Pitas and Venetsanopoulos, 1990), and their effectiveness in removing impulse noise. Since then, several works on this subject have been published dealing with their properties (Gallagher and Wise, 1981; Nodes and Gallagher, 1982) or their applications (Narendra, 1978; Schmitt et al., 1984; Scollar et al., 1984; Tyan, 1982). However, it has been observed
that median filters may cause edge jitter (Bovik et al., 1987) or streaking (Bovik, 1987), may destroy fine details (Nieminen et al., 1987), and cannot be tuned to remove or retain some predefined set of feature types (Brownrigg, 1984). To overcome these drawbacks, one modification of median filters resulted in the class of weighted median filters (Justusson, 1981). These act basically in the same manner as median filters, except that weighted median filters assign a weight to each point in the sliding window, and the median is then taken after duplicating each sample in the input by its corresponding weight. The use of the weighted median to filter particular structural patterns from images has been investigated by Brownrigg (1984). Figure 6 shows the action of the weighted median filter based on a 5-point window, with weights (1, 2, 3, 2, 1).

FIGURE 6 Weighted median filter: window of size 5 and weight vector (1, 2, 3, 2, 1). Each sample under the window is duplicated by its weight, the expanded sequence is sorted, and its median is the output.
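In Python, the weighted median of Figure 6 can be sketched as follows (illustrative code, not from the text):

```python
import numpy as np

def weighted_median(samples, weights):
    """Weighted median: duplicate each sample by its weight, then take the median."""
    expanded = np.repeat(samples, weights)
    return int(np.median(expanded))

print(weighted_median([4, 9, 3, 5, 2], [1, 2, 3, 2, 1]))  # -> 4
```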
Many other variations of the median filter (a compilation may be found in Pitas and Venetsanopoulos (1990)), as well as studies of their deterministic and statistical properties (Gallagher and Wise, 1981; Justusson, 1981; Ko and Lee, 1991; Nodes and Gallagher, 1982; Prasad and Lee, 1989; Sun et al., 1994; Tyan, 1982; Yin et al., 1996), have been reported. Applications of median filters reported in the literature include a varying set of problems in signal and image processing, such as the elimination of pitches in digital speech signals (Rabiner et al., 1975), the correction of transmission errors in digital speech signals (Jayant, 1976), the correction of scanner noise by removing salt-and-pepper artifacts (Wecksung and Campbell, 1974), the enhancement of edge gradients by elimination of spurious oscillations (Frieden, 1976), image enhancement (Huang, 1981; Loupas et al., 1987; Narendra, 1981; Pratt, 1978; Rosenfeld and Kak, 1982; Scollar et al., 1984), satellite image processing (Carla et al., 1986), and biological/biomedical image processing (Grochulski et al., 1985; Schmitt et al., 1984).

As mentioned previously, median filters possess the threshold decomposition structure (Fitch et al., 1984). Figure 7 shows the equivalence of
computing the median on multilevel data or on their cross sections. The five shaded columns in the signal correspond to the 5-point neighborhood considered for the median computation. As mentioned, the median of multilevel data requires sorting (Figure 7a), whereas the median of binary values requires only counting (Figure 7b).
2. Rank-Order Filters
A straightforward extension of the median filter is the rank-order filter (Justusson, 1981; Heygster, 1982; Nodes and Gallagher, 1982), based on order statistics. Given realizations u_1, u_2, . . . , u_d of d random variables, with d ∈ N, d > 0, and r ∈ N, 1 ≤ r ≤ d, the r-th smallest² element in the samples u_1, u_2, . . . , u_d is called the r-th order statistic and is denoted u_(r). Thus, u_(1) ≤ u_(2) ≤ . . . ≤ u_(d). If d is odd, then u_((d+1)/2) is the median. The order statistics u_(1) and u_(d) are, respectively, the minimum and the maximum.

FIGURE 7 The threshold decomposition structure guarantees computation of the median based only on counting (and no sorting). (a) Median computation via sorting. (b) Median computation via counting: the data are thresholded, the 1s under the window are counted, and the output is 1 if the count is at least 3.

² Instead of the r-th smallest element, a common practice is to consider the r-th largest element. This issue is clarified later.

By assigning a random variable to each point in the window and positioning it at any location of the input signal domain, the values of the signal under the window can be seen as realizations of those random variables. Rank-order filters are those filters that, instead of the median, output the r-th order statistic among the observations, 1 ≤ r ≤ d. This class includes the median filter, r = (d + 1)/2, as a particular case. Applications of rank-order filters include the filtering of cell pictures (Heygster, 1982), the detection of narrow-band signals (Wong and Chen, 1987), and document image analysis (Ma et al., 2002).

The weighted versions of the rank-order filters are termed generalized rank-order filters (Wendt et al., 1986) or weighted order statistic (WOS) filters (Yli-Harja et al., 1991). They work as follows: let u_1, u_2, . . . , u_d be realizations of d random variables, d ∈ N, d > 0, let Ω = (ω_1, ω_2, . . . , ω_d), with ω_i ∈ N and ω_i > 0 for all i, and let r ∈ N, 1 ≤ r ≤ Σ_i ω_i. Each sample u_i is duplicated by its respective weight ω_i to obtain a sequence of Σ_i ω_i elements. The filter that outputs the element of rank r from this sequence is the WOS filter with weight Ω and rank r.

Note that the term order statistic filter is more commonly used to refer to filters defined by y = Σ_{j=1}^{d} a_j u_(j), where the a_j are real coefficients. These are also known as L-filters. They generalize the rank-order, moving average, and other filters (see Bovik et al., 1983; Pitas and Venetsanopoulos, 1992). The two basic differences of these filters from WOS filters are: (1) WOS filters first duplicate each observation by the corresponding weight and then compute the order statistics, whereas order statistic filters do the inverse, and (2) the weights of WOS filters are positive integers, whereas the coefficients of order statistic filters are real numbers.

Figure 8 shows an example of a WOS filter. Duplication by a given weight vector can be understood as a mapping to a space of larger dimension. If all elements of the same Hamming weight are depicted horizontally side by side, then a WOS filter corresponds to tracing a horizontal line in the expanded lattice diagram and mapping all elements above that line to 1 and all elements below it to 0. This fact is precisely what characterizes WOS filters as counting (threshold) functions, as explained in the following text.

FIGURE 8 Weighted order statistic filter: the lattice {0, 1}³ is expanded by duplication with Ω = (1, 1, 3) into {0, 1}⁵, and the expanded lattice is thresholded at rank 3.

In general, determination of the element at a given rank requires that the elements be first sorted. Most sorting algorithms have computational complexity O(d log d). However, for binary variables the element at a given rank can be determined by counting the number of samples with value 1 (or 0). For instance, given the samples 101101, there are four 1s (and, therefore, two 0s). Thus, considering descending order, it can be
easily inferred that element 1 occupies the first four ranks and the last two ranks are occupied by element 0. In other terms, for binary inputs u = (u_1, u_2, . . . , u_d) ∈ {0, 1}^d, the rank function for a given rank r (considering descending order) can be expressed by a counting function as follows:

ψ_r(u) = 1 ⟺ |u| ≥ r,     (12)

where |u| denotes the Hamming weight of u (i.e., the number of components equal to 1 in vector u). According to this equation, for binary inputs, the median is given by

ψ_((d+1)/2)(u) = 1 ⟺ |u| ≥ (d + 1)/2.     (13)

With regard to WOS filters, in the binary domain they also can be expressed as a counting-based function. Let Ω = (ω_1, ω_2, . . . , ω_d) be a weight vector, a vector of positive integers. Denote d* = Σ_i ω_i and let 0 ≤ r* ≤ d*. Define the function ψ_{Ω,r*} by, for any u ∈ {0, 1}^d,

ψ_{Ω,r*}(u) = 1 ⟺ Σ_i ω_i u_i ≥ r*.     (14)

According to this, Eq. (12) is the particular case where Ω = (1, 1, . . . , 1).
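Eq. (14) translates directly into code. The following Python sketch (illustrative names) evaluates a WOS filter as a threshold function:

```python
def wos(u, weights, r_star):
    """WOS filter on a binary input u (Eq. 14): 1 iff sum_i w_i * u_i >= r*."""
    return int(sum(w * x for w, x in zip(weights, u)) >= r_star)

# Weight vector (1, 1, 3): the heavily weighted third input can decide alone.
print(wos((0, 0, 1), (1, 1, 3), 3))  # -> 1 (weighted count 3 >= 3)
print(wos((1, 1, 0), (1, 1, 3), 3))  # -> 0 (weighted count 2 < 3)
```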
Binary functions that can be expressed in the form of Eq. (14) with arbitrary (not necessarily positive) integer weights are called linearly separable Boolean functions. If both the weights and the threshold (rank) are positive, then they are linearly separable positive Boolean functions (Muroga, 1971). Thus, while stack filters correspond to PBFs, WOS filters correspond to threshold functions (with positive weights and threshold). In addition, as a subclass of the stack filters, WOS (and thus median and rank-order) filters possess the threshold decomposition structure.

For a fixed weight vector, different WOS filters may be obtained by varying the threshold. Figure 9 shows the five WOS filters generated by the weight vector Ω = (1, 1, 3). An interesting question is to determine whether two filters ψ_{Ω1,r1*} and ψ_{Ω2,r2*} are identical (Astola et al., 1994). Of more interest might be whether two weight vectors are equivalent in the sense that they generate the same set of filters.

FIGURE 9 WOS filters generated by the weight vector Ω = (1, 1, 3): duplication by (1, 1, 3) followed by thresholding at ranks 5 down to 1.

Note also that some authors define WOS filters as the ones that duplicate the d input samples by their respective weights and output the r-th largest
element of the sequence. This implies descending order. However, other authors define the output of the filter as the r-th smallest element, which implies ascending order. This difference may generate some confusion. In general, descending order is adopted because of the convenience of having the threshold value of the threshold function equal to the desired rank.

Since WOS filters are a proper subclass of the stack filters, not all PBFs can be expressed in the form of Eq. (14). Figure 10 illustrates a positive Boolean function with d = 4 variables that does not correspond to any WOS filter (it is not linearly separable). To see this, consider the six elements with Hamming weight 2 (0011, 0101, 1010, 1100, 0110, 1001), of which the first four are mapped to 1 and the last two to 0. There would have to be a weight vector (a, b, c, d) such that the first four, when expanded by the weight vector, result in elements with Hamming weight larger than that of the two others. More specifically, the following eight inequalities would have to be satisfied:

c + d > b + c        b + d > b + c
c + d > a + d        b + d > a + d
a + c > b + c        a + b > b + c
a + c > a + d        a + b > a + d

It is easy to verify that there are no positive integers satisfying all of these inequalities: the first inequality in the left column simplifies to d > b, whereas the last inequality in the right column simplifies to b > d. Therefore, the function of Figure 10 is not a WOS filter.

FIGURE 10 A positive Boolean function that is not WOS.

The number of WOS filters, as well as of stack filters, is not known for a general
dimension d. Finding the number of monotone Boolean functions on d variables is an open problem known as Dedekind’s problem (Kleitman, 1969; Kleitman and Markowsky, 1975).
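For tiny d, the count can be obtained by brute force. The following Python sketch (illustrative, not from the text) enumerates all Boolean functions on d = 3 variables and counts the monotone ones, recovering the Dedekind number M(3) = 20:

```python
from itertools import product

def is_monotone(psi):
    """psi: tuple of 8 outputs, indexed by the 3-bit input u taken as a bitmask."""
    for u in range(8):
        for v in range(8):
            # u <= v componentwise iff the bits of u are a subset of the bits of v
            if (u & v) == u and psi[u] > psi[v]:
                return False
    return True

count = sum(is_monotone(psi) for psi in product((0, 1), repeat=8))
print(count)  # -> 20, the Dedekind number M(3), counting the monotone
              #    (positive) Boolean functions, including the two constants
```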
D. Relation to Morphological Filters

While stack filters were initially investigated predominantly in the 1D signal-processing context, mathematical morphology has its origin in the study of binary images and their processing, modeled respectively as sets and set operators (Serra, 1982). Mathematical morphology is a discipline that, from a practical point of view, is concerned with the development and application of operators that identify and modify particular structural (geometrical) information in images. Such information is identified by probing the image to be processed with structuring elements of different shapes and sizes (Serra, 1982; Soille, 2003). From a theoretical point of view, one of the main concerns is the study of the algebraic representation and properties of the operators in the context of lattice theory. Lattice theory is an appropriate framework for the formal study of morphological operators since images can be modeled as elements of complete lattices (Heijmans, 1994; Matheron, 1975; Serra, 1988).

Many morphological operators are obtained by composing two basic operators, the erosion and the dilation. In fact, it can be shown that any translation-invariant image operator can be expressed as a supremum of interval operators, which in turn can be expressed in terms of these two basic operators. The first decomposition results in terms of the basic operators are credited to Matheron (1975), who showed that any increasing operator can be expressed as a supremum of erosions by structuring elements in the kernel of the operator. Maragos (1989) showed the necessary conditions for the existence of a more compact sup-representation, namely, the minimal decomposition as a supremum of erosions by structuring elements in the basis of the operator. These results have been extended to non-necessarily increasing operators by Banon and Barrera (1991, 1993). Although these results hold for any translation-invariant mappings between two complete lattices, hereafter the scope is restricted to binary and gray-level image operators. The specialization of these results for binary W-operators (Banon and Barrera, 1991) is described next.

Binary morphology is based on set operators. Let S ⊆ E denote a binary image and B ⊆ E be a subset to be called a structuring element. The erosion of S by B is defined, ∀S ∈ P(E), as
ε_B(S) = {p ∈ E | B_p ⊆ S} = ∩_{b∈B} S_{−b},     (15)

where B_p denotes the set B translated by p.
The dilation of S by B is defined, ∀S ∈ P(E), as

δ_B(S) = {p ∈ E | B̌_p ∩ S ≠ ∅} = ∪_{b∈B} S_b,     (16)

where B̌ denotes the transpose of set B.

Given A ⊆ B ⊆ E, [A, B] = {X ⊆ E | A ⊆ X ⊆ B} denotes the interval with extremities A and B. The interval operator, parameterized by an interval [A, B], is defined, ∀S ∈ P(E), as

λ_[A,B](S) = {p ∈ E | A_p ⊆ S ⊆ B_p}.     (17)

Interval operators are equivalent to the hit-or-miss operators, denoted H_(U,V), U, V ∈ P(E), and defined as H_(U,V)(S) = {p ∈ E | U_p ⊆ S and V_p ⊆ S^c}, for any S ∈ P(E). The equivalence is given by the equality λ_[A,B] = H_(A,B^c). An interval operator λ_[A,B] can be expressed, ∀S ∈ P(E), as

λ_[A,B](S) = ε_A(S) ∩ [δ_{B^c}(S)]^c.     (18)
The kernel of a W-operator Ψ: P(E) → P(E) is defined as

K_W(Ψ) = {X ∈ P(W) | o ∈ Ψ(X)}.     (19)

Note that if W = E, then K_W(Ψ) = K(Ψ) = {X ∈ P(E) | o ∈ Ψ(X)}, the original definition of the kernel (see, for instance, Banon and Barrera, 1991). The basis of Ψ is denoted B_W(Ψ) and defined as the set of all maximal intervals contained in the kernel; that is, [A, B] ⊆ K_W(Ψ) is maximal if, for every [A', B'] ⊆ K_W(Ψ) such that [A, B] ⊆ [A', B'], we have [A, B] = [A', B'].

Theorem 2 (Banon and Barrera, 1991). Any W-operator Ψ can be expressed uniquely as a union of interval operators, characterized by intervals in its kernel; that is,

Ψ = ∪ {λ_[A,B] | [A, B] ⊆ K_W(Ψ)}.     (20)

In terms of its basis, Ψ can be expressed as

Ψ = ∪ {λ_[A,B] | [A, B] ∈ B_W(Ψ)}.     (21)
In fact, Eq. (20) can be simplified to Ψ = ∪ {λ_[A,A] | A ∈ K_W(Ψ)}. A simple proof of this equality is provided by Heijmans (1994). If Ψ is increasing, then all maximal intervals contained in K_W(Ψ) are of the form [A, E], and hence [δ_{E^c}(S)]^c = [δ_∅(S)]^c = ∅^c = E, ∀S ∈ P(E). Thus, ε_A(S) ∩ [δ_{E^c}(S)]^c = ε_A(S), resulting in the decomposition of Ψ as a supremum (union) of erosions.

Recalling that p ∈ Ψ(S) ⟺ ψ(S_{−p} ∩ W) = 1 [the set-operator version of Eq. (4)], and thus o ∈ Ψ(X) ⟺ ψ(X) = 1 for all X ∈ P(W), Eq. (19) can be rewritten as K_W(Ψ) = {X ∈ P(W) | ψ(X) = 1}. This characterization of the kernel in terms of the characteristic function ψ establishes the connection between binary W-operators and Boolean functions. In fact, the canonical decomposition in terms of the kernel corresponds to the canonical sum-of-products form of the corresponding Boolean function, and the minimal decomposition in terms of the basis corresponds to the minimal sum-of-products form of the Boolean function. An interval operator corresponds to a logic product term.

The connection between mathematical morphology and stack filters was discussed by Maragos and Schafer (1987b). Since erosion is an increasing operator, it commutes with thresholding. The erosion of gray-level images by a flat structuring element B can be defined by simply replacing ∩ with min; that is,
[ε_B(f)](p) = min{f(q) | q ∈ B_p}.     (22)

Similarly, dilation is given by

[δ_B(f)](p) = max{f(q) | q ∈ B̌_p}.     (23)
Considering descending ordering, gray-level erosion and dilation correspond, respectively, to ψ_d and ψ_1 (rank filters of ranks d and 1, respectively). As mentioned previously, stack filters are operators that commute with thresholding or, equivalently, they are increasing operators with flat structuring elements. They are also known as flat filters (Heijmans, 1994). Thus, as morphological operators, stack filters can be expressed in the binary domain as a union of binary erosions by structuring elements that are subsets of W or, in the gray-level domain, as the maximum of gray-level erosions with the same structuring elements (Maragos and Schafer, 1987b; Soille, 2002).
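A minimal Python sketch (illustrative; the border handling and function names are assumptions) of flat erosion and dilation on a 1D signal, per Eqs. (22) and (23):

```python
import numpy as np

def flat_erosion(f, B):
    """[eps_B(f)](p) = min{ f(q) : q in B_p }, with B given as integer offsets."""
    n = len(f)
    return np.array([min(f[p + b] for b in B if 0 <= p + b < n) for p in range(n)])

def flat_dilation(f, B):
    """[delta_B(f)](p) = max{ f(q) : q in (B transposed)_p }."""
    n = len(f)
    return np.array([max(f[p - b] for b in B if 0 <= p - b < n) for p in range(n)])

f = np.array([1, 2, 3, 3, 9, 1, 0, 1, 2, 1, 0])
B = [-1, 0, 1]  # flat 3-point structuring element centered at the origin
print(flat_erosion(f, B))   # the rank-d filter (minimum)
print(flat_dilation(f, B))  # the rank-1 filter (maximum)
```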
III. OPTIMAL STACK FILTERS

One of the main concerns when designing a filter is to find filters that have good filtering performance on signals of a given domain. The goodness of a filter may be stated in statistical terms by assuming that the signals to be processed, as well as their respective ideal filtered signals, are modeled by random processes. It will be assumed that the input (observed, to be processed) signals and the corresponding ideal (desired output) signals are modeled by stationary random processes f_i and f_o, respectively. More strictly, it is assumed that they form a jointly stationary random process (f_i, f_o) with joint distribution P(f_i, f_o). An optimal filter is one that, given f_i, best estimates f_o according to some performance measure. Let M_Ψ be a statistical measure that describes the closeness of Ψ(f_i) to f_o. Then, a filter Ψ_opt is optimal with respect to the measure M and the process (f_i, f_o) if M_{Ψ_opt} ≤ M_Ψ for any filter Ψ.

In the case of stack filters, it is well known that the MAE of a stack filter can be expressed as a linear combination of the MAEs of the corresponding binary filter (Coyle and Lin, 1988). Thus, optimal MAE stack filters can be expressed in terms of optimal MAE PBFs with respect to the cross sections of the multilevel signals. The MAE of stack filters, its relation to the MAE of the corresponding PBF, and the integer linear programming formulation of the problem of finding an optimal MAE stack filter are presented in the subsequent sections.
A. Mean Absolute Error Optimal Stack Filters

Let Ψ be a stack filter with characteristic function ψ: K^W → K and let (f_i, f_o) be a pair of observed-ideal jointly stationary random processes with joint distribution P(f_i, f_o). The MAE of Ψ at a given location p ∈ E with respect to these processes is defined as
MAE_Ψ(p) = E[|ψ(f_{i,−p}|_W) − f_o(p)|],     (24)

where E[·] denotes the expected value of its argument. Clearly, f_{i,−p}|_W is a random process with realizations in K^W, and f_o(p) is a random variable with realizations in K.

Due to stationarity, the location p is arbitrary. Thus, p may be dropped from f_{i,−p}|_W and from f_o(p), resulting
respectively in a multivariate random variable g with realizations in K^W and a random variable y with realizations in K. The process (g, y) is the local process of (f_i, f_o) and its joint distribution is denoted P(g, y). Thus, considering the joint stationarity of (f_i, f_o) and the local definition of Ψ, the MAE can be rewritten as

MAE_Ψ = E[|ψ(g) − y|].     (25)

The expected value in Eq. (25) is with respect to the joint distribution P(g, y). The next two propositions establish the linear relation between the MAE of Ψ on multilevel signals and the MAE of Ψ on binary signals (obtained by thresholding the multilevel ones).

Proposition 1. Let ψ: K^W → K, g ∈ K^W, and y ∈ K. Then

|Σ_{t=1}^{k} [T_t(ψ(g)) − T_t(y)]| = Σ_{t=1}^{k} |T_t(ψ(g)) − T_t(y)|.     (26)
Proof. If ψ(g) > y, then

Σ_{t=1}^{k} [T_t(ψ(g)) − T_t(y)] = Σ_{t=1}^{y} (1 − 1) + Σ_{t=y+1}^{k} [T_t(ψ(g)) − T_t(y)],

where the first sum equals 0; and since all terms in the second sum are non-negative, and |Σ_i a_i| = Σ_i |a_i| if a_i ≥ 0 for all i,

|Σ_{t=1}^{k} [T_t(ψ(g)) − T_t(y)]| = Σ_{t=1}^{y} |T_t(ψ(g)) − T_t(y)| + Σ_{t=y+1}^{k} |T_t(ψ(g)) − T_t(y)|
                                  = Σ_{t=1}^{k} |T_t(ψ(g)) − T_t(y)|.

Similarly, if ψ(g) ≤ y, then

Σ_{t=1}^{k} [T_t(ψ(g)) − T_t(y)] = Σ_{t=1}^{ψ(g)} (1 − 1) + Σ_{t=ψ(g)+1}^{k} [T_t(ψ(g)) − T_t(y)],

and since all the terms in the second sum are non-positive,

|Σ_{t=1}^{k} [T_t(ψ(g)) − T_t(y)]| = Σ_{t=1}^{ψ(g)} |T_t(ψ(g)) − T_t(y)| + Σ_{t=ψ(g)+1}^{k} |T_t(ψ(g)) − T_t(y)|
                                  = Σ_{t=1}^{k} |T_t(ψ(g)) − T_t(y)|.
Given a process (g, y) as defined above, let g and y denote realizations of g and y, respectively. The binary signal T_t[g] can be regarded as a realization of a binary random vector, denoted U_t, and the binary value T_t(y) as a realization of a binary random variable, denoted b_t.

Proposition 2. Let Ψ: K^E → K^E be a W-operator that commutes with thresholding (hence, a stack filter), characterized by a function ψ: K^W → K. Let also (g, y) be as defined above and let (U_t, b_t), t = 1, 2, . . . , k, be the processes corresponding to the cross sections of (g, y). Then

MAE_Ψ = Σ_{t=1}^{k} MAE_Ψ^t,     (27)

where MAE_Ψ^t corresponds to the mean absolute error of Ψ with respect to the process (U_t, b_t).

Proof.

MAE_Ψ = E[|ψ(g) − y|]     [Eq. (25)]
      = E[|Σ_{t=1}^{k} T_t(ψ(g)) − Σ_{t=1}^{k} T_t(y)|]     (threshold decomposition)
      = E[|Σ_{t=1}^{k} [T_t(ψ(g)) − T_t(y)]|]     (rearranging the sums)
      = E[Σ_{t=1}^{k} |T_t(ψ(g)) − T_t(y)|]     (Proposition 1)
      = Σ_{t=1}^{k} E[|T_t(ψ(g)) − T_t(y)|]     (expected value commutes with the sum)
      = Σ_{t=1}^{k} E[|ψ(T_t[g]) − T_t(y)|]     (ψ commutes with thresholding)
      = Σ_{t=1}^{k} E[|ψ(U_t) − b_t|]     (rewriting in terms of (U_t, b_t))
      = Σ_{t=1}^{k} MAE_Ψ^t.
B. Equivalent Optimality in the Binary Domain This text section shows the characterization of optimal MAE stack filters in terms of the MAE optimality of the corresponding PBFs. Recalling that (Ut , bt ) denotes the binary joint random process corresponding to the cross sections of (g, y) (the local process of (fi , fo )) at level t and MAEt denotes the MAE of with respect to this process, let C(ψ) = kt=1 MAEt and let Pt (u, b) denote the probability of Ut = u and bt = b (that is, Pt (u, b) = P(Ut = u, bt = b)), where u ∈ {0, 1}d and b ∈ {0, 1}. Using this notation,
C(ψ) =
k
E[|ψ(Ut ) − bt |]
t=1
=
k t=1 u
=
k u t=1
|ψ(u) − b|Pt (u, b)
b
b
|ψ(u) − b|Pt (u, b) .
Cu (ψ)
The term Cu (ψ) is the amount u contributes to C(ψ) =
(28)
k
t=1 MAEt .
24
Nina S. T. Hirata
Since b ∈ {0, 1}, Cu (ψ) can be rewritten, for any Boolean function ψ and u ∈ {0, 1}d , as
Cu (ψ) = ψ(u)
k
Pt (u, 0) + (1 − ψ(u))
t=1
k
Pt (u, 1).
(29)
t=1
Thus,

C(ψ) = Σ_u C_u(ψ) = Σ_{u | ψ(u)=1} Σ_{t=1}^{k} P_t(u, 0) + Σ_{u | ψ(u)=0} Σ_{t=1}^{k} P_t(u, 1).     (30)
The Boolean function that minimizes C(ψ) is obtained by minimizing C_u for each u, i.e., by the Boolean function

ψ_opt(u) = 1, if Σ_{t=1}^{k} P_t(u, 1) > Σ_{t=1}^{k} P_t(u, 0),
ψ_opt(u) = 0, if Σ_{t=1}^{k} P_t(u, 1) ≤ Σ_{t=1}^{k} P_t(u, 0).     (31)
C(ψ) =
ψ(u)P(u, 0) + (1 − ψ(u)) P(u, 1)
u
=
[ψ(u)P(u, 0) + P(u, 1) − ψ(u)P(u, 1)] u
=
P(u, 1) +
u
(P(u, 0) − P(u, 1))ψ(u).
(32)
u
Thus, since the first sum in the last equality does not depend on ψ, finding a PBF ψ that minimizes C(ψ) is equivalent to finding a PBF ψ that minimizes
C (ψ) =
(P(u, 0) − P(u, 1))ψ(u). u
cu
(33)
Stack Filters: From Definition to Design Algorithms
25
C. Formulation as a Linear Programming Problem Optimal MAE stack filters can be computed by finding a PBF ψ that minimizes the cost C(ψ) defined in Eq. (30) or, equivalently, the cost C (ψ) defined in Eq. (33). As mentioned previously, finding a Boolean function (not necessarily positive) that minimizes those costs is straightforward. However, to guarantee positiveness of the Boolean function, monotonicity constraints must be imposed—the relation ψ(u1 ) ≤ ψ(u2 ) must hold for each pair (u1 , u2 ) ∈ {0, 1}d × {0, 1}d , such that u1 < u2 . To simplify notation, let xu ∈ {0, 1} be a variable corresponding to the value of the Boolean function at u (i.e., xu = ψ(u)), for each element u ∈ {0, 1}d . Using this notation, x corresponds to a vector with 2d components. Consider also cu = P(u, 0) − P(u, 1), the costs relative to individual elements in Eq. (33). Then the MAE stack-filter problem can be formulated as the following integer linear programming (ILP) problem (Coyle and Lin, 1988). Problem 1 (ILP formulation of the optimal MAE stack-filter problem).
min
d −1 2
cu xu
u=0
subject to
xu ≤ xv , if u ≤ v
(34)
xu ≥ 0 xu ≤ 1 xu integer The constraints in Problem 1 can be rewritten as a totally unimodular matrix and since all components in the right side of the inequalities are integers, all basic feasible solutions of Problem 1 are integral. Thus the integrality constraint in Problem 1 can be dropped (see, for instance, Cook et al. [1998]), resulting in: Problem 2 (Relaxation of the ILP in Problem 1).
min
d −1 2
cu xu
u=0
subject to
xu ≤ xv , if u ≤ v xu ≥ 0 −xu ≥ −1
(35)
26
Nina S. T. Hirata
The number of constraints of the form xu ≤ xv can be reduced by considering the transitivity of the partial-order relation. More specifically, u < w and w < v implies that u < v and therefore the third constraint is redundant. Thus, a constraint u < v should be included in the ILP formulation above if and only if there is no w such that u < w < v.
IV. STACK FILTER DESIGN APPROACHES A. Overview Given the joint distribution of the cross sections of input-output signals, an optimal MAE stack filter can be computed by solving the linear programming (LP) problem presented in the previous section. However, the number of variables and the number of constraints in the LP are, respectively, 2d and (d 2d−1 ), increasing exponentially to the window size d. Therefore, for windows of moderate size, solution of the LP problem by naive approaches becomes infeasible. To overcome this limitation, some heuristic approaches that result in suboptimal solutions were proposed in the 1990s. Joint probabilities are estimated from training data (sample pairs of input-output signals). Two major classes of heuristic approaches exist for stack-filter design. The first one, called adaptive algorithms, consists of repeatedly scanning the input data, updating a counting vector, and enforcing monotonicity (Lin et al., 1990; Lin and Kim, 1994; Yoo et al., 1999). The second approach estimates the costs given in Eq. (30) and searches the optimal solution directly on the Boolean lattice (Han and Fan, 1997; Hirata et al., 2000; Lee et al., 1999; T˘abus et al., 1996; T˘abus and Dumitrescu, 1999). Recently Dellamonica et al. (2007) proposed an algorithm for computing an exact solution of the LP problem. They were able to compute the exact solution for problems with window size up to 25. Their approach considers the network flow problem associated with the dual of the LP and strategies to decompose it into smaller subproblems that are solved efficiently. However, it requires a large amount of computer memory. In addition to these two methods, other approaches also have been proposed. Among them are those based on genetic algorithms (Doval et al., 1998; Undrill et al., 1997), neural networks (Zeng, 1996), sample selection probabilities (Doval et al., 1998; Prasad and Lee, 1994; Prasad, 2005; Shmulevich, et al., 2000), and structural approaches (Coyle et al., 1989; Gabbouj and Coyle, 1990; Yin, 1995). Figure 11 shows a taxonomy of the major approaches for stack-filter design. Notice, however, that other approaches for PBF design are not included in the diagram because they do not appear related to stack-filter design in the literature.
Stack Filters: From Definition to Design Algorithms
27
Design approaches
Statistical (Training data)
Heuristic
Structural
Exact
Others Graph search Adaptive FIGURE 11
LP (Minimum cost network flow)
Taxonomy of major stack-filter design approaches.
The following text sections describe the primary ideas of the two classes of heuristic approaches previously mentioned, and the algorithm for exact solution of the ILP problem.
B. Heuristic Solutions The adaptive and the lattice search–based heuristic algorithms that generate suboptimal solutions are described in this section.
1. Adaptive Algorithms The first algorithm of this class was proposed by Lin et al. (1990). The algorithm starts with the null function (which is a PBF). Iterative scanning of training data, and eventually several passes over the training data collection, sequentially updates the initial function in such a manner that it converges to the optimal PBF. More specifically, an array D with 2d positions is kept in memory. This array is indexed by elements of {0, 1}d . For each element u observed in the training data collection, D[u] is incremented (or decremented) depending on the corresponding output value b. Values in any position of D are allowed to vary from 0 to N, where N is some positive integer. At any time, a Boolean function can be obtained from D by setting ψ(u) = T N (D[u]), 2
for u ∈ {0, 1}d . If D[u] ≥ D[v] whenever u ≥ v, then ψ is a PBF. Let (ui , bi ) ∈ {0, 1}d × {0, 1}, i = 1, . . . , m, denote the collection of training data. Note that each pair (ui , bi ) is obtained by thresholding a d-point observation of a multilevel signal. Figure 12 shows the algorithm.
28
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12:
Nina S. T. Hirata
D[u] = N/2, for all u ∈ {0, 1}d i=1 repeat if bi == 1 then D[ui ] = min{D[ui ] + 1, N} else D[ui ] = max{D[ui ] − 1, 0} end if Check and, if necessary, enforce monotonicity i = (i mod m) + 1 until convergence Return T N [D] 2
FIGURE 12 The adaptive algorithm proposed by Lin et al. (1990).
Monotonicity enforcement must consider two cases: those in which bi = 1 and those in which bi = 0. When bi = 1, D[ui ] is incremented and it is necessary to check if this increment violates monotonicity. Violation in this case corresponds to D[ui ] becoming larger than D[v] for some v such that |v| = |ui | + 1. If that happens, then D[ui ] and D[v] are swapped, resulting in D[ui ] < D[v]. However, after such swapping, there may exist w such that |w| = |v| + 1 and D[v] > D[w], configuring another monotonicity violation. Thus, monotonicity checking followed by swapping must be carried sequentially toward the largest element in the lattice until no violation exists. Since the longest path in the lattice {0, 1}d has length d, in the worst case d swapping will be necessary. The process of monotonicity enforcement is similar when bi = 0; in this case, swapping advances towards the smallest element in the lattice. Other adaptive algorithms are improvements of the first one. Lin and Kim (1994) proposed a modification to reduce the number of iterations. The modification is based on the observation that, when a multilevel sample is thresholded at the k levels, there are at most d + 2 distinct binary observations. Then, instead of k iterations, only d + 2 iterations are necessary at most. The amount of increment/decrement is directly related to the number of occurrences of each distinct binary signal. With this modification in the original algorithm, they report a speedup of factor 20. The most recent improvement for the algorithm was proposed by Yoo et al. (1999). They introduce a parameter L that corresponds to the enforcement period; that is, enforcements are done at each L increment/decrement iteration. They also make D[u] vary between −N/2 and N/2 and initialize D with zeros. Another significant modification is in the form in which the enforcements are performed. Since several increments/decrements may have been performed in L iterations, there may exist more than one
Stack Filters: From Definition to Design Algorithms
111
111
29
111
011
101
110
011
101
110
011
101
110
001
010
100
001
010
100
001
010
100
000
000
000
FIGURE 13 The three rounds of enforcements as proposed by Yoo et al. (1999) for a lattice of dimension d = 3.
local monotonicity violation. Their approach is highly parallel, allowing simple parallel implementation. It consists of d rounds of (possibly parallel) enforcements, covering all pairs of elements in the lattice that have Hamming distance equal to 1. Figure 13 shows the three rounds of enforcement for d = 3. The pairs of elements that are compared with each other in each round are highlighted by bold arcs linking them. After the three rounds, every pair whose Hamming distance equals 1 have been checked. The updating to enforce monotonicity does not consist of swapping as in the previous two algorithms. Instead, if u < v with D[u] > D[v], then the update performed is D[u] = D[v] =
D[u] + D[v] 2
D[u] + D[v] , 2
where · denotes the greatest integer smaller than or equal to its argument, and · denotes the smallest integer greater than or equal to its argument. Proofs that the enforcement strategy generates PBFs that converge to the optimal PBF as the number of iterations grows are provided in the respective works.
2. Graph Search−Based Algorithms Algorithms in this class perform searches over the graph that correspond to the Boolean lattice {0, 1}d . A formulation proposed by Hirata et al. (2000) provides a unifying framework for those algorithms. This section describes the unifying formulation and how other approaches of this class fit in this formulation.
30
Nina S. T. Hirata
Given a function ψ: {0, 1}d → {0, 1}, let L(ψ) = {u ∈ {0, 1}d | ψ(u) = 0} and U(ψ) = {u ∈ {0, 1}d | ψ(u) = 1}. Obviously, L(ψ) ∪ U(ψ) = {0, 1}d and L(ψ) ∩ U(ψ) = ∅. Thus, {L(ψ), U(ψ)} is a partition of {0, 1}d . A partition {L(ψ), U(ψ)} of {0, 1}d is a (L, U) partition of {0, 1}d if for all u ∈ L(ψ) it satisfies {v ∈ {0, 1}d | v ≤ u} ⊆ L(ψ) (or, equivalently, if for all u ∈ U(ψ) it satisfies {v ∈ {0, 1}d | u ≤ v} ⊆ U(ψ)). It is easy to see that a (L, U) partition of {0, 1}d defines a PBF and, conversely, a PBF defines a (L, U) partition of {0, 1}d . Figure 14 shows a (L, U) partition of {0, 1}4 . Because of the relationship between PBFs and (L, U) partitions, the problem of designing an optimal PBF can be viewed as a problem of finding an optimal (L, U) partition of the lattice. An optimal (L, U) partition of {0, 1}d is the one that minimizes
C(ψ) =
P(u, 0) +
u∈U
P(u, 1),
u∈L
which is simply the cost of Eq. (30) rewritten in terms of (L, U).
1111 Upper set
0111
0011
0101
0001
1011
1101
0110
1001
0010
0100
1110
1010
1100
1000
Lower set 0000 FIGURE 14 A PBF partitions the lattice in two subsets: the L (lower) and U (upper) sets. Elements in the U set (shaded ones) are those mapped to 1, whereas elements in the L set are those mapped to 0. Positiveness implies that no element in the U set lies below any element in the L set (and, equivalently, no element in the L set lies above any element in the U set).
Stack Filters: From Definition to Design Algorithms
31
If just a small portion of the lattice is analyzed at a time, it may be possible to decide in which region of the optimal partition it should d and let c(Z) = belong. Let Z ⊆ {0, 1} u ∈ Z P(u, 0) − P(u, 1). Since C(ψ) = (P(u, 0) − P(u, 1)) ψ(u), it makes sense to set ψ(Z) = 1 (or, equivalently, u to place Z in U) if c(Z) < 0, and to set ψ(Z) = 0 (or, equivalently, to place Z in L) if c(Z) > 0. If c(Z) = 0, it does not matter. Following this reasoning, an (L, U) partition may be build incrementally by finding subsets Z with the above characteristics and placing them in the upper part U or in the lower part L. However, since an element in U implies that any other element lying above it in the lattice must also be in U (and, similarly, an element in L implies that any other lying below it must also be in L), these subsets Z cannot be arbitrary. They must be chosen to maintain the validity of the partition being built. Also, if arbitrary subsets are chosen, it may happen that some elements will be placed in both parts alternatively several times. Thus, it is necessary to guarantee that the process finishes. The subsets that can be placed in U or L will be called feasible sets (they are defined next). The operator \ is the usual set subtraction. Let Q ⊆ {0, 1}d , such that for any u ∈ Qc , either {v ∈ {0, 1}d | u ≤ v} ⊆ Qc or {v ∈ {0, 1}d | v ≤ u} ⊆ Qc (that is, Q is a subset of {0, 1}d obtained by removing some elements from the top and others from the bottom, but none from the “middle”). An (L, U) partition of Q may be defined in a similar way as defined above. Definition 2. Let FU be the class of non-empty subsets F of Q such that (Q\F, F) is a valid partition of Q, and c( F) < 0. A subset F ∈ FU is U feasible if and only if F is minimal in FU relative to ⊆. Definition 3. Let FL be the class of non-empty subsets F of Q such that ( F, Q\F) is a valid partition of U, and c( F) ≥ 0. A subset F ∈ FL is L feasible if and only if F is minimal in FL relative to ⊆. The next theorem states that an optimal partition can be built by successively moving small subsets of Q to one of the regions. Theorem 3. Let F be a feasible set of Q, and (L , U ) be an optimal partition of Q\F. Then, (a) (b)
if F is U feasible, then (L , U ∪ F) is an optimal partition of Q, and if F is L feasible, then (L ∪ F, U ) is an optimal partition of Q.
Proof. See Hirata et al. (2000) The algorithm (see Figure 15) builds the optimal (L, U) partition by iteratively moving feasible sets from Q to one of the parts. It starts with empty upper and lower sets, and then sequentially moves feasible sets from the
32
1: 2: 3: 4: 5: 6:
Nina S. T. Hirata
Q = {0, 1}d L=U=∅ while Q is not empty do Search for a feasible set F in Q. If F is U feasible, then do U ← U ∪ F; if F is L feasible, then do L ← L ∪ F. Do Q ← Q\F. end while Return (L, U)
FIGURE 15 The lattice search algorithm proposed by Hirata et al. (2000).
remaining of Q to one of the regions. Since this is a greedy strategy, in the sense that once a subset is moved, it will never be put back into Q, the algorithm finishes when Q is empty. It can be shown that a non-empty set Q always contains at least one feasible set. It is interesting to notice that only elements with non-null cost need to be considered in the process, resulting in a sparse graph. This may be interesting for relatively large windows, because the number of such elements is likely to be much smaller than the number of nodes in the entire lattice. However, adequate data structures need to be considered in order to build and traverse the graph efficiently. Notice that given the costs, there is an inherent optimal BF that is not necessarily a PBF [see Eq. (31)]. If the optimal BF ψopt is not positive, that indicates that there are at least two elements u and v such that u < v and ψopt (u) > ψopt (v) (i.e., ψopt (u) = 1 and ψopt (v) = 0). These elements are said to be in the inversion set. There are two possibilities to “fix” ψopt in order to make it positive: (1) switch ψopt (u) from 1 to 0 or (2) switch ψopt (v) from 0 to 1. If there exists more than two elements in the inversion set, then the possible number of switchings is usually much larger. Finding an optimal (L, U) partition can be understood as finding the best set of switchings—the one that results in a PBF with smallest overall cost. Any set of valid switchings (i.e., one that results in a positive function) determines a valid (L, U) partition of the lattice. Similarly, a valid (L, U) partition determines a set of switchings. It is clear that, in considering how to switch values of ψopt at different elements in the lattice, only elements in the inversion set need to be processed. Therefore, to find an optimal (L, U) partition, set Q in Line 1 of the algorithm in figure 15 can be initialized only with those elements in the inversion set. All elements above any element of Q must be placed in U, and all elements below any element of Q must be placed in L in the second line of the algorithm. In practice, finding feasible sets is not a trivial task. Hirata et al. (2000) propose searching for feasible sets with one minimal/maximal element first and, in case none is found, searching for feasible sets with two minimal/maximal elements, and so on. The maximum number of minimal/maximal elements in the feasible sets is a parameter of the
Stack Filters: From Definition to Design Algorithms
33
algorithm. Thus, by not searching for feasible sets with minimal/maximal elements larger than this parameter the algorithm may miss the optimal solution. Other lattice search algorithms may be fit in the above formulation as discussed next. The approach proposed by Lee et al. (1999) considers an initial empty upper region. At each iteration the smallest subset with “negative cost that obey the stacking property” is moved to the upper region, until no such subset is found. The smallest subset with negative cost is equivalent to U-feasible sets defined above. L-feasible sets are not considered in their work. Tabus et al. (1996) propose an approach in which the inversion set is computed first (their inversion sets are called undecided sets) and then the LP restricted to the inversion set is solved. However, if the inversion set is relatively large, resolution of the associated LP problem becomes computationally infeasible. To address large inversion sets, the inversion set size (and thus, of the associated LP) is reduced by removing some easily detectable feasible sets (for instance, the feasible sets with one minimal/maximal element) from it (T˘abus¸ and Dumitrescu, 1999). Another approach that may be fit in this formulation is the one proposed by Han and Fan (1997). In their approach, only one element is moved to the upper region at a time. Among the elements that can be moved to the upper region (to preserve the validity of the partition) the one with the largest negative cost is preferred (this would correspond to a unitary U-feasible set). If no such candidate exists, then all candidates are added into a queue and processed afterward (an element can be moved to the upper region only if all elements larger than it have already been moved). Every time an element is moved to the upper region, a new valid partition is configured. In particular, every time a negative cost element is moved to the upper region, the respective new partition may correspond to the optimal PBF and thus its MAE should be compared to the minimum found so far. The process must be repeated until no negative cost elements are left in the unprocessed part of the diagram.
C. Optimal Solution Recall that the problem of designing an optimal MAE stack filter can be formulated as an LP problem (see Section III.C). The solution of the dual of an LP problem allows solution of the original LP problem. Thus, a common practice for solving LP problems is to solve their respective duals. The LP formulation of the optimal MAE stack-filter design problem is closely related to flow models (first suggested by Gabbouj and Coyle (1991).) Recently, Dellamonica et al. (2007) showed that the dual of the LP relaxation (Problem 2) corresponds to the LP formulation of a
34
Nina S. T. Hirata
minimum-cost network flow (MCNF) problem. In an MCNF problem, networks are modeled as directed graphs with costs associated with the arcs and demands associated with vertices. A feasible flow in the network is an assignment of values to the arcs that satisfy the demand; that is, the amount of flow in the arcs entering the vertex minus the amount of flow in arcs leaving the vertex must equal the vertex demand, for every vertex. The total cost of a flow is the sum of the flow in the arcs multiplied by their respective costs. The MCNF is a feasible flow with minimum cost. An MCNF problem can be solved by the network simplex algorithm, an efficient specialization of the original simplex algorithm. It can be shown that there is always a tree solution to an MCNF problem. The network simplex algorithm starts with an initial tree solution and at each iteration finds an improved tree solution by adding a new arc and removing another in such a way as to not increase the cost of the solution. However, the graph associated with the MCNF problem may be very large. To overcome this difficulty, Dellamonica et al. (2007) propose a strategy that decomposes the problem into smaller subproblems. According to the proposed decomposition principle, once an optimal solution is found for a subproblem (defined on a subset of the whole lattice), it partially defines an optimal solution for the entire lattice. In other words, there exists an optimal solution for the entire lattice that, when restricted to the domain of the subproblem, exactly matches the solution to the subproblem. The subproblems in the proposed decomposition strategy correspond to solving the MCNF restricted to subsets that are ideals of the lattice. According to Dellamonica et al., a subset I ⊆ {0, 1}d is an ideal3 of the lattice {0, 1}d if for all u ∈ I the relation {v ∈ {0, 1}d | v ≤ u} ⊆ I holds. Thus, the main steps of the algorithm are as follows: (1) generate an ideal, (2) solve the associated MCNF problem, (3) fix the values of the solution for the elements in the ideal, and (4) consider a larger ideal, until the whole lattice is covered. A key point exploited during these iterations is a simple extension of a tree solution corresponding to the smaller ideal to a feasible solution of the larger ideal. Details may be found in their work (Dellamonica et al., 2007). In order to find an optimal solution for an MCNF problem, given a feasible tree solution, the algorithm must find an arc in the graph to enter the solution in such a way as to decrease the total cost. Since the graph associated with the MCNF problem may be huge, it is not feasible to store the entire graph in memory. A solution to this difficulty consists of keeping only the tree solution and generating the candidate arcs only when they 3 Lattice ideals are usually defined as subsets that satisfy the property described in the text and also that
are closed under the supremum operation—if u, v ∈ I , then u + v ∈ I . The + operation in this case is the logical bitwise operation OR.
Stack Filters: From Definition to Design Algorithms
35
are needed. It is shown that some particularities of the problem allow a simple characterization of the candidate arcs. Again, details may be found in their work (Dellamonica et al., 2007). This algorithm has some similarities to the graph search algorithm based on feasible sets described previously. A first similarity is the fact that subproblems may be related to ideals at the bottom or top (in this case, called sup-ideals and defined similarly to ideals) of the lattice, allowing the optimal solution to be defined gradually for elements at the top and bottom parts of the lattice. A second similarity is the decomposition principle: in the graph search algorithm, once a feasible set is moved, the final solution for the elements in that set is fixed; the same happens to the elements in an ideal once the associated MCNF problem is solved. The code of the algorithm is available at the web page http:// www.vision.ime.usp.br/nonlinear/stackfd.
V. APPLICATION EXAMPLES A. Design Procedure This section describes a procedure for optimal MAE stack-filter design from training data. Consider given a window W of size d and a set of training data {(fi 1 , fo 1 ), (fi 2 , fo 2 ), . . . , (fi m , fo m )} with m pairs of observedideal signals. Then, the design procedure consists of the following three steps: 1. Estimate P(u, b) = kt=1 Pt (u, b), u ∈ {0, 1}d , from the training data. 2. Compute cu for each observed pattern u from the probabilities estimated in step (1). If u has not been observed, then consider cu = 0. 3. Apply an algorithm that finds a PBF ψ that minimizes C (ψ) [see Eq. (33)]. The probabilities in Step (1) involve probabilities for each threshold level. To estimate them, let • Nt be the number of observations through W in the cross sections at level t of the observed images • Nt (u, 1) be the number of times u is observed in the cross sections at level t with ideal value b = 1 • Nt (u, 0) be the number of times u is observed in the cross sections at level t with ideal value b = 0 The probabilities Pt (u, 1) and Pt (u, 0) are estimated, respectively, by
Nt (u, 1) , Pˆ t (u, 1) = Nt
36
Nina S. T. Hirata
and
Nt (u, 0) Pˆ t (u, 0) = . Nt (u,1) Since Nt = N for all t ∈ K (N being a positive integer), kt=1 NtN = t k 1 1 t=1 Nt (u, 1), and the constant N can be dropped. As a consequence, N there is no need to estimate the joint probabilities Pt (u, b) for each of the threshold levels. This explains why occurrences of (u, b) on different cross ˆ sections can be pooled and just a single value P(u, b) computed.
B. Examples Three application examples are presented. The filters have been designed using the stackfd algorithm (see Section IV.C and Dellamonica et al., 2007) implementation available at http://www.vision.ime.usp.br/nonlinear /stackfd. This text section presents some examples of filters that can be obtained by training algorithms and does not evaluate the performance of the algorithms. Performance details of the main design algorithms can be found in the corresponding papers (optimal solution algorithm, Dellamonica et al., 2007) and best heuristic algorithm (Yoo et al., 1999). The first example considers images corrupted with salt-and-pepper and dropout noise. Figure 16 shows an image without noise, whereas
FIGURE 16 Gray-level (ideal) image “boat.”
Stack Filters: From Definition to Design Algorithms
FIGURE 17
37
Test image “boat” (MAE = 14.5995).
Figure 17 shows a corrupted image. The noise consists of 5% additive and 5% subtractive impulse noise (both with amplitude 200, and maximum and minimum saturated at 255 and 0, respectively) plus horizontal line segments of intensity 255 with probability of occurrence 0.35%, with length following a normal distribution with mean 5 and variance of 49 pixels. One pair of noisy-ideal images has been used to compute a 3 × 3 and a 21-point (5 × 5 without the four corner points) window stack filters. The test image is an independent realization of the same noise type. Figure 18 shows the output of the optimal 21-point window stack filter for the test image shown in Figure 17. Pixels at positions where the window does not fit entirely in the image domain have not been processed by the filter. Output value for these pixels has been set to 0. The MAE values were computed disregarding these pixels. Figure 19 shows the output of the 3 × 3 window stack filter for the same test image. To contrast with the effects of the median filter, Figure 20 shows the output of the 3 × 3 window median filter for the same test image. Observe that the median tends to blur more than the optimal stack filter. For the same type of noise, stack filters trained with a given image tend to work for other images, not necessarily similar to the ones used to design the filter. Figure 21 shows the effect of the previous 21-point window stack filter on another image, with an independent realization of the same type of noise.
38
Nina S. T. Hirata
FIGURE 18 Output of the d = 21 optimal stack filter (MAE = 2.4411).
FIGURE 19 Output of the optimal 3×3 stack filter (MAE = 3.0499).
Stack Filters: From Definition to Design Algorithms
FIGURE 20
39
Output of the 3×3 median filter (MAE = 5.1266).
The second example considers salt-and-pepper noise. Figure 22 shows the filtering effect of the 5 × 3 window optimal stack filter computed from one pair of training images. The noise consists of 3% salt-and-pepper noise (with 3% additive and 3% subtractive impulse noise, both with amplitude 200, and maximum and minimum saturated at 255 and 0, respectively). The third example considers the effect of increasing window sizes. Binary images are considered to allow better perception. Figure 23 shows a noisy image, the respective ideal output, and a sequence of outputs obtained from filters based on increasing window sizes. As can be seen, as the window size increases, the respective MAE decreases. In fact, if actual costs were used, this behavior would always be true, with MAE becoming constant eventually, but never increasing. However, in practice, since filters are computed from estimated costs, from some point MAE starts to increase due to estimation imprecision (this is known as the curse of dimensionality).
VI. CONCLUSION An extensive overview of stack-filters, including their definition, some of their properties, relation to morphological operators, the fact that they
40
Nina S. T. Hirata
(a)
(b) (c) FIGURE 21 Effect of the 21-point window optimal stack filter trained with the “boat” images on a different image with same type of noise. (a) Ideal image. (b) Test (MAE = 14.2919). (c) Filtered (MAE = 2.3418).
are a generalization of median, rank-order filters and their variations, characterization of MAE on multilevel signals in terms of the MAEs of binary cross sections of the multilevel signals, main design approaches from training data, and some application examples have been presented. One of the most important properties of the class of stack filters is its equivalence to the class of positive Boolean functions. Another important result is the MAE theorem that relates the MAE of a stack filter with respect to multilevel signals to a linear combination of the MAEs of the corresponding PBF with respect to the binary cross sections of these multilevel signals.
Stack Filters: From Definition to Design Algorithms
(a)
(b)
(c) FIGURE 22 Filtering of salt-and-pepper noise: effect of the 5×3 window optimal stack filter. (a) Ideal image. (b) Test (MAE = 7.2031). (c) Filtered (MAE = 2.2941).
41
42
Nina S. T. Hirata
(a)
(b)
(c)
(d)
(e) (f) FIGURE 23 Binary image filtering: effect of increasing window sizes. (a) Test (MAE = 0.1051). (b) Ideal image. (c) d = 9 (MAE = 0.0259). (d) d = 15 (MAE = 0.0129). (e) d = 21 (MAE = 0.0087). (f) d = 24 (MAE = 0.0083).
These two results allow the reduction of the problem of designing stack filters to the problem of designing PBFs. Design of PBFs can be modeled as a linear programming problem. However, the number of variables and constraints in the problem is exponential
Stack Filters: From Definition to Design Algorithms
43
to the window size. This fact kept the problem unsolvable in conventional computers until recently. Several heuristic solutions have been proposed to overcome such limitation. The heuristic design approaches covered in this chapter are adaptive algorithms and those based on graph search techniques, both suboptimal approaches. A recently proposed algorithm that provides an exact optimal solution has also been described. The algorithm that produces an optimal solution was reported to have solved problem instances for d = 25 as fast as or even faster than the fastest heuristic algorithms. The optimality and speed is achieved at the expense of significantly higher memory requirement. Solving an instance for d = 25 requires ∼ 3.5 Gb of memory, whereas the fastest heuristic algorithms require ∼ 250 Kb. This fact makes heuristic solutions still very attractive. However, although heuristic solutions do converge on the optimal solution, there is no knowledge on how fast it converges. From a practical perspective, it is necessary to fix the number of iterations or to stop iterating when the decrease in the error between two successive iterations becomes negligible. The existence of an efficient algorithm for the computation of optimal solutions makes possible the realization of experimental researches in order to investigate the convergence behavior of the iterative heuristic algorithms (how fast they converge, how the type of noise affects convergence speed, etc). As mentioned, the computation of the optimal solution requires large memory space. The memory requirement will probably be satisfied by technological advances in hardware components, allowing larger instances to be solved. This may give a false impression that there are no challenges left for the design of stack filters. However, the main issue in the process of designing image operators from training data that still needs to be addressed is the difficulty in obtaining training data. This limitation affects the precision of statistical estimation and, consequently, performance of the designed filter. While it is relatively easy to edit images and produce the ideal outputs for binary images, the same task is much more complex for gray-level images. In practice, given a fixed amount of training data, there is a maximum window size that corresponds to minimum error; operators designed for windows larger than that will present a bigger error. This phenomenon is known as the curse of dimensionality and it is due to overfitting (excessive adjustment to training data, while training data do not reflect the true distribution with high precision). A possible means of improving performance of the designed filters with respect to error, for a fixed amount of training data, is to consider multilevel training. At each training level, filters are designed on moderate size and distinct windows and, at the last level, these filters are composed, resulting in a multilevel filter that ultimately depends on a larger window.
44
Nina S. T. Hirata
Aside from the issue related to precision of estimations from training data, knowledge on stack filters has already reached a very mature stage, including efficient and interesting design algorithms. This work, by reporting a broad overview of this knowledge, may contribute to the dissemination of the use of stack filters together with these algorithms to promote advances in knowledge related to the precision issue above and to the development of new classes of nonlinear filters and design algorithms.
ACKNOWLEDGMENTS N. S. T. Hirata acknowledges partial support from CNPq (Brazil), grant 312482/2006-0.
REFERENCES Astola J. T., Alaya-Cheikh, F., and Gabbouj, M. (1994). When are two weighted order statistic filters identical? In “Nonliner Image Processing V,” vol. 2180, no. V (E. R. Dougherty, J. T. Astola, and H. G. Longbotham, eds.), Proc. IS & T/SPIE Symposium on Electronic Imaging Science & Technology, pp. 45–54, San Jose, California, February 6–10, 1994. Banon, G. J. F., and Barrera, J. (1991). Minimal representations for translation-invariant set mappings by mathematical morphology. SIAM J. Appl. Math., 51(6):1782–1798. Banon, G. J. F., and Barrera, J. (1993). Decomposition of mappings between complete lattices by mathematical morphology, part I. General lattices. Signal Process. 30, 299–327. Bovik, A. C. (1987). Streaking in median filtered images. IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP-35, 493–503. Bovik, A. C., Huang, T. S., and Munson D. C. Jr. (1983). A generalization of median filtering using linear combinations of order statistics. IEEE Trans. Acoust. Speech Signal Process. 31, 1342–1350. Bovik, A. C., Huang, T. S., and Munson D. C. Jr. (1987). The effect of median filtering on edge estimation and detection. IEEE Trans. Pattern Anal. 9, 191–194. Brownrigg, D. R. K. (1984). The weighted median filter. Commun. ACM 27, 807–818. Carla, R., Sacco, V. M., and Baronti, S. (1986). Digital techniques for noise reduction in APT NOAA satellite images. In Proc. IGARSS 86, 995–1000. Cook, W. J., Cunningham, W. H., Pulleyblank, W. R., and Schrijver, A. (1998). Combinatorial Optimization. John Wiley & Sons, New York. Coyle, E. J., and Lin, J.-H. (1988). Stack filters and the mean absolute error criterion. IEEE Trans. Acoust. Speech Signal Process. 36, 1244–1254. Coyle, E. J., Lin, J.-H., and Gabbouj, M. (1989). Optimal stack filtering and the estimation and structural approaches to image processing. IEEE Trans. Acoust. Speech Signal Process. 37, 2037–2066. Dellamonica, D. Jr., Silva, P. J. S., Humes, C. Jr., Hirata, N. S. T., and Barrera, J. (2007). An exact algorithm for optimal MAE stack filter design. IEEE Trans. Image Process. 16, 453–462. Dougherty, E. R., and Astola, J. T., editors. (1999). Nonlinear Filters for Image Processing. The International Society for Optical Engineering and IEEE Press, New York. Doval, A. B. G., Mohan, A. K., and Prasad, M. K. (1998). Evolutionary algorithm for the design of stack filters specified using selection probabilities. In Adaptive Computing in Design and Manufacture. Fitch, J. P., Coyle, E. J., and Gallagher, N. C. Jr. (1984). Median filtering by threshold decomposition. IEEE Trans. Acoust. Speech Signal Process. ASSP-32, 1183–1188.
Stack Filters: From Definition to Design Algorithms
45
Frieden, B. (1976). A new restoring algoritm for the preferential enhancement of edge gradients. J. Opt. Soc. Am. 66, 280–283. Gabbouj, M., and Coyle, E. J. (1990). Minimum mean absolute error stack filtering with structural constraints and goals. IEEE Trans. Acoust. Speech Signal Process. 38, 955–968. Gabbouj, M., and Coyle, E. J. (1991). On the LP which finds a MMAE stack filter. IEEE Trans. Signal Process. 39, 2419–2424. Gallagher, N. C. Jr., and Wise, G. L. (1981). A theoretical analysis of the properties of median filters. IEEE Trans. Acoust. Speech Signal Process. ASSP-29, 1136–1141. Gilbert, E. N. (1954). Lattice-theoretic properties of frontal switching functions. J. Math. Phys. 33, 57–67. Grochulski, W., Mitraszewski, P., and Penczek, P. (1985). Application of combined medianaveraging filters to scintigraphic image processing. Nucl. Med. 24, 164–168. Han, C.-C., and Fan, K.-C. (1997). Finding of optimal stack filter by graphic searching methods. IEEE Trans. Signal Process. 45, 1857–1862. Haralick, R. M., Sternberg, S. R., and Zhuang, X. (1987). Image analysis using mathematical morphology. IEEE Trans. Pattern Anal. PAMI-9, 532–550. Heijmans, H. J. A. M. (1994). Morphological Image Operators. Academic Press, Boston. Heygster, G. (1982). Rank filters in digital image processing. Comput. Graph. Image Process. 19, 148–164. Hirata, N. S. T., Dougherty, E. R., and Barrera, J. (2000). A switching algorithm for design of optimal increasing binary filters over large windows. Pattern Recogn. 33, 1059–1081. Huang, T. S., editor (1981). Two-Dimensional Digital Signal Processing II: Transforms and Median Filters. Springer-Verlag, New York. Jayant, N. (1976). Average and median-based smoothing techniques for improving digital speech quality in the presence of transmission errors. IEEE Trans. Commun. 24, 1043–1045. Justusson, B. I. (1981). Median filtering: Statistical properties. In “Topics in Applied Physics, Two-Dimensional Digital Signal Procesing II”. (Huang, T.S., ed.). Springer-Verlag, New York. Kleitman, D. J. (1969). On Dedekind’s problem: The number of monotone boolean functions. Proc. Am. Math. Soc. 21, 677–682. Kleitman, D. J., and Markowsky, G. (1975). On Dedekind’s problem: The number of isotone Boolean functions II. Trans. Am. Math. Soc. 213, 373–390. Ko, S.-J., and Lee, Y. H. (1991). Center weighted median filters and their applications to image enhancement. IEEE Trans. Circuits Sys. 38, 984–993. Lee, W.-L., Fan, K.-C., and Chen, Z.-M. (1999). Design of optimal stack filters under the MAE criterion. IEEE Trans. Signal Process. 47, 3345–3355. Lee, Y. H., and Kassam, S. A. (1985). Generalized median filtering and related nonlinear filtering techniques. IEEE Trans. Acoust. Speech Signal Process. ASSP-33, 672–683. Lin, J.-H. and Kim, Y.-T. (1994). Fast algorithms for training stack filters. IEEE Trans. Signal Process. 42(4): 772–781. Lin, J.-H., Sellke, T. M., and Coyle, E. J. (1990). Adaptive stack filtering under the mean absolute error criterion. IEEE Trans. Acoust. Speech Signal Process. 38, 938–954. Loupas, T., McDicken, N., and Allan, P. I. (1987). Noise reduction in ultrasonic images by digital filtering. Br. J. Radiol. 60, 389–392. Ma, H., Zhou, J., Ma, L., and Tang, Y. Y. (2002). Order statistic filters (OSF): A novel approach to document analysis. Int. J. Pattern Recog. 16, 551–571. Maragos, P. (1989). A representation theory for morphological image and signal processing. IEEE Trans. Pattern Anal. 11, 586–599. Maragos, P., and Schafer, R. W. (1987a). 
Morphological filters: Part I: Their set-theoretic analysis and relations to linear shift-invariant filters. IEEE Trans. Acoust. Speech Signal Process. 35, 1153–1169.
46
Nina S. T. Hirata
Maragos, P., and Schafer, R. W. (1987 b). Morphological filters: Part II: Their relations to median, order statistic, and stack-filters. IEEE Trans. Acoust. Speech Signal Process. 35, 1170–1184 (corrections in ASSP 37, April 1989, p. 597 ). Marshall, S. and Sicuranza, G. L., eds. (2006). Advances in Nonlinear Signal and Image Processing. EURASIP Book Series on SP&C. Hindawi Publishing Corporation, New York. Matheron, G. (1975). Random Sets and Integral Geometry. John Wiley, New York. Mitra, S. and Sicuranza, G., eds. (2000). Nonlinear Image Processing. Academic Press, New York. Muroga, S. (1971). Threshold Logic and Its Applications. Wiley, New York. Narendra, P. M. (1981). A separable median filter for image noise smoothing. IEEE Trans. on pattern Analysis and machine Intelligence, 3(1), 20–29. Nieminen, A., Heinonen, P., and Neuvo, Y. (1987). A new class of detail-preserving filters for image processing. IEEE Trans. Pattern Anal. 9, 74–90. Nodes, T. A., and Gallagher, N. C. (1982). Median filters: Some modifications and their properties. IEEE Trans. Acoust. Speech Signal Process. 30, 739–746. Pitas, I., and Venetsanopoulos, A. N. (1990). Nonlinear Digital Filters–Principles and Applications. Kluwer Academic Publishers, Amsterdam. Pitas, I., and Venetsanopoulos, A. N. (1992). Order statistics in digital image processing. Proc. IEEE 80, 1893–1192. Prasad, M. K. (2005). Stack filter design using selection probabilities. IEEE Trans. Signal Process. 53, 1025–1037. Prasad, M. K., and Lee, Y. H. (1989). Weighted median filters: Generation and properties. In IEEE International Symposium on Circuits and Systems 1, pp. 425–428. Prasad, M. K., and Lee, Y. H. (1994). Stack filters and selection probabilities. IEEE Trans. Signal Process. 42, 2628–2643. Pratt, W. K. (1978). Digital Image Processing. Wiley Interscience, New York. Rabiner, L. R., Sambur, M. R., and Schmidt, C. E. (1975). Applications of nonlinear smoothing algorithm to speech processing. IEEE Trans. Acoust. Speech Signal Process. 23, 552–557. Rosenfeld, A., and Kak, A. C. (1982). Digital Picture Processing, vol. 2. Academic Press, New York. Schmitt, R. M., Meyer, C. R., Carson, P. L., and Samuels, B. I. (1984). Median and spatial low-pass filtering in ultrasonic computed tomography. Med. Phys. 11, 767–771. Scollar, I., Weidner, B., and Huang, T. S. (1984). Image enhancement using the median and the interquantile distance. Comput. Vision Graphics Image Processing 25, 236–251. Serra, J. (1982). Image Analysis and Mathematical Morphology. Academic Press, New York. Serra, J. (1988). Image Analysis and Mathematical Morphology. vol 2: Theoretical Advances. Academic Press, New York. Serra, J., and Vincent, L. (1992). An overview of morphological filtering. Circuits Systems Signal Process. 11, 47–108. Shmulevich, I., Melnik, V., and Egiazarian, K. (2000). The use of sample selection probabilities for stack filter design. IEEE Signal Process. Lett. 7, 189–192. Soille, P. (2002). On morphological operators based on rank filters. Pattern Recogn. 35, 527–535. Soille, P. (2003). Morphological Image Analysis, 2nd ed. Springer-Verlag, Berlin. Sun, T., Gabbouj, M., and Neuvo, Y. (1994). Center weighted median filters: Some properties and their applications in image processing. Signal Process. 35, 213–229. T˘abu¸s, I., and Dumitrescu, B. (1999). A new fast method for training stack filters (Çetin, ¨ un, ¨ A., Gurcan, M. N., and Yardimci, Y., eds.), pp. 511–515. In IEEEA. 
E., Akarun, L., Ertuz EURASIP Workshop on Nonlinear Signal and Image Processing (NSIP’99), Antalya, Turkey. T˘abu¸s, I., Petrescu, D., and Gabbouj, M. (1996). A training framework for stack and Boolean filtering—fast optimal design procedures and robustness case study. IEEE Trans. Image Process. 5, 809–826. Tukey, J. (1974). Nonlinear (nonsuperposable) methods for smoothing data. In Cong. Rec., EASCON, page 673. Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley, Reading, MA.
Stack Filters: From Definition to Design Algorithms
47
Tyan, S. G. (1982). Median filtering: Deterministic properties. In Digital Signal Processing II. Transforms and Median Filters (Huang, T. S., ed.) Springer-Verlag, New York. Undrill, P. E., Delibasis, K., and Cameron, G. G. (1997). Stack filter design using a distributed pararell implementation of genetic algorithms. J. UCS 3, 821–834. Wecksung, G., and Campbell, K. (1974). Digital image processing at EC&G. Computer 7, 63–71. Wendt, P. D., Coyle, E. J., and Gallagher, N. C. Jr. (1986). Stack filters. IEEE Trans. Acoust. Speech Signal Process. 34, 898–911. Wong, K. M., and Chen, S. (1987). Detection of narrow-band sonar signals using order statistical filters. IEEE Trans. Acoust. Speech Signal Process. 35, 597–613. Yin, L. (1995). Stack filter design: A structural approach. IEEE Trans. Signal Process. 43, 831–840. Yin, L., Yang, R., Gabbouj, M., and Neuvo, Y. (1996). Weighted median filters: A tutorial. IEEE Trans. Circuits Syst. II: Analog and Digital Signal Processing 43, 157–192. Yli-Harja, O., Astola, J. T., and Neuvo, Y. (1991). Analysis of the properties of median and weighted median filters using threshold logic and stack filter representation. IEEE Trans. Signal Process. 39, 395–410. Yoo, J., Fong, K. L., Huang, J. -J., Coyle, E. J., and Adams, G. B. III. (1999). A fast algorithm for designing stack filters. IEEE Trans. Image Process. 8, 1014–1028. Zeng, B. (1996). Design of optimal stack filters: a neural net approach with bp algorithm. In IEEE International Conference on Systems, Man, and Cybernetics 4, 2762–2767.
This page intentionally left blank
CHAPTER
2 The Foldy–Wouthuysen Transformation Technique in Optics Sameen Ahmed Khan*
Contents
I Introduction II The Foldy–Wouthuysen Transformation III Quantum Formalism of Charged-Particle Beam Optics IV Quantum Methodologies in Light Beam Optics V Conclusion Appendix A Appendix B Acknowledgments References
49 51 58 60 62 64 66 73 74
I. INTRODUCTION The Foldy–Wouthuysen transform is widely used in high-energy physics. It was historically formulated by Leslie Lawrence Foldy and Siegfried Adolf Wouthuysen in 1949 to understand the nonrelativistic limit of the Dirac equation, the equation for spin-1/2 particles (Foldy and Wouthuysen, 1950; Foldy, 1952; see also Pryce, 1948; Tani, 1951; see Acharya and Sudarshan, 1960 for a detailed general discussion of the Foldy–Wouthuysen-type transformations in particle interpretation of relativistic wave equations). The approach of Foldy and Wouthuysen used a canonical transform that has come to be known as the Foldy–Wouthuysen transformation (a brief account of the history of the transformation is found in the obituaries of Foldy and Wouthuysen [Brown et al., 2001; Leopold, 1997] and the biographical memoir of Foldy [2006]). Before their work, there was some difficulty in understanding and gathering all the interaction terms of a * Engineering Department, Salalah College of Technology, Salalah, Sultanate of Oman Advances in Imaging and Electron Physics,Volume 152, ISSN 1076-5670, DOI: 10.1016/S1076-5670(08)00602-2. Copyright © 2008 Elsevier Inc. All rights reserved.
49
50
Sameen Ahmed Khan
given order, such as those for a Dirac particle immersed in an external field. Their procedure clarified the physical interpretation of the terms, and it became possible to apply their work systematically to a number of problems that had previously defied solution (see Bjorken and Drell, 1964; Costella and McKellar, 1995, for technical details). The Foldy– Wouthuysen transform was extended to the physically important cases of the spin-0 and the spin-1 particles (Case, 1954) and was even generalized to the case of arbitrary spins ( Jayaraman, 1975). The powerful machinery of the Foldy–Wouthuysen transform has found applications in very diverse areas, such as atomic systems (Asaga et al., 2000; Pachucki, 2004); synchrotron radiation (Lippert et al., 1994), and derivation of the Bloch equation for polarized beams (Heinemann and Barber, 1999). The application of the Foldy–Wouthuysen transformation in acoustics is very natural; comprehensive and mathematically rigorous accounts can be found in Fishman (1994, 2004), Orris and Wurmser (1995), and Wurmser (2001, 2004). For ocean acoustic see Patton (1986). In the traditional scheme, the purpose of expanding the light optics = −(n2 (r) − Hamiltonian H p2⊥ ) as the expansion p2⊥ )1/2 in a series using ( 12 n0
parameter is to understand the propagation of the quasiparaxial beam in terms of a series of approximations (paraxial + nonparaxial). A similar situation is the case of charged-particle optics. In relativistic quantum mechanics a similar problem exists of understanding the relativistic wave equations as the nonrelativistic approximation plus the relativistic correction terms in the quasirelativistic regime. For the Dirac equation (which is first order in time) this is done most conveniently by using the Foldy–Wouthuysen transformation, leading to an iterative diagonalization technique. The main framework of the newly developed formalisms of optics (both light optics and charged-particle optics) is based on the transformation technique of the Foldy–Wouthuysen theory, which casts the Dirac equation in a form displaying the different interaction terms between the Dirac particle and an applied electromagnetic field in a nonrelativistic and easily interpretable form. In the Foldy–Wouthuysen theory the Dirac equation is decoupled through a canonical transformation into two two-component equations: one reduces to the Pauli equation (Osche, 1977) in the nonrelativistic limit, and the other describes the negative-energy states. There is a close algebraic analogy between (1) the Helmholtz equation (governing scalar optics) and the Klein–Gordon equation and (2) the matrix form of Maxwell’s equations (governing vector optics) and the Dirac equation. Thus, it is logical to use the powerful machinery of standard quantum mechanics (particularly, the Foldy–Wouthuysen transform) in analyzing these systems. The suggestion to use the Foldy–Wouthuysen transformation technique in the Helmholtz equation was first mentioned in the literature as a remark
The Foldy–Wouthuysen Transformation Technique in Optics
51
(Fishman and McCoy, 1984). The same idea was independently outlined by Jagannathan and Khan (see pp. 277 in Jagannathan and Khan, 1996). Only in recent works has this idea been exploited to analyze the quasiparaxial approximations for specific beam optical systems (Khan, 2005a; Khan et al., 2002). The Foldy–Wouthuysen technique is ideally suited for the Lie algebraic approach to optics. With all these positive features, the power, and ambiguity-free expansion, the Foldy–Wouthuysen transformation is still seldom used in optics. The Foldy–Wouthuysen transformation results in nontraditional prescriptions of Helmholtz optics (Khan, 2005a) and Maxwell optics (Khan, 2006b), respectively. The nontraditional approaches give rise to interesting wavelength-dependent modifications of the paraxial and aberrating behavior. The nontraditional formalism of Maxwell optics provides a unified framework of light beam optics and polarization. The nontraditional prescriptions of light optics are in close analogy with the quantum theory of charged-particle beam optics (Conte et al., 1996; Jagannathan et al., 1989; Jagannathan, 1990, 1993, 1999, 2002, 2003; Jagannathan and Khan, 1995, 1996, 1997; Khan and Jagannathan, 1993, 1994, 1995; Khan, 1997, 1999a, 1999b, 2001, 2002a, 2002b, 2002c). The following text sections provide details of the standard Foldy–Wouthuysen transform. An outline of the quantum theory of charged-particle beam optics and the nontraditional prescriptions of light optics are also presented. A comprehensive account can be found in the references. The Feshbach–Villars technique adopted from quantum mechanics to linearize the Klein–Gordon equation is described in Appendix A. An exact matrix representation of the Maxwell equations is presented in Appendix B.
II. THE FOLDY–WOUTHUYSEN TRANSFORMATION The standard Foldy–Wouthuysen theory is described briefly to clarify its use for the purposes of the above studies in optics. Let us consider a charged particle of rest mass m0 , charge q in the presence of an electromagnetic field characterized by E = −∇φ − ∂t∂ A, and B = ∇ × A. Then the Dirac equation is
i
∂ D (r, t) (r, t) = H ∂t D = m0 c2 β + qφ + cα · π H
(1)
+ O = m0 c 2 β + E = qφ E = cα · O π,
(2)
52
Sameen Ahmed Khan
where
0 σ 1l 0 10 , β= , 1l = , σ 0 0 −1l 01 01 0 −i 1 0 σ = σx = , σy = , σz = , 10 i 0 0 −1 α=
(3)
with π = p − qA, p = −i∇, and π2 = πx2 + πy2 + πz2 . In the nonrelativistic situation the upper pair of components of the Dirac spinor are large compared to the lower pair of components. The which does not couple the large and small components of , operator E, is called an “odd” operator that couples the large is called “even” and O to the small components. Note that
= −Oβ, βO
= Eβ. βE
(4)
such that the The search is for a unitary transformation, = −→ U, equation for does not contain any odd operator. In the free-particle case (with φ = 0 and π = p) such a Foldy– Wouthuysen transformation is denoted by
F −→ = U pθ , F = eiS = eβα· U
tan 2| p|θ =
| p| . m0 c
(5)
This transformation eliminates the odd part completely from the freeparticle Dirac Hamiltonian, reducing it to the diagonal form:
i
∂
= eiS m0 c2 β + cα · p e−iS
∂t " ! βα · p sin | p|θ m0 c2 β + cα · = cos | p|θ + p | p| ! " βα · p × cos | p|θ − sin | p|θ
| p| = m0 c2 cos 2| p|θ + c| p| sin 2| p|θ β
" !# = m20 c4 + c2 p 2 β .
(6)
Generally, when the electron is in a time-dependent electromagnetic field, it is not possible to construct an exp(i S) that removes the odd
The Foldy–Wouthuysen Transformation Technique in Optics
53
operators from the transformed Hamiltonian completely. This necessitates a nonrelativistic expansion of the transformed Hamiltonian in a power series in 1/m0 c2 keeping through any desired order. Note that in the F = nonrelativistic case, when |p| m0 c, the transformation operator U 2 exp(iS) with S ≈ −iβO/2m0 c , where O = cα · p is the odd part of the free Hamiltonian. So, in the general case we can start with the transformation
(1) = eiS1 ,
iβO iβα · π . S1 = − = − 2m0 c 2m0 c2
(7)
Then, the equation for (1) is
" ! ∂ ∂ (1) ∂ iS1 ∂ iS1 i S1 +e e = i e i i = i ∂t ∂t ∂t ∂t ∂ iS1 = i + eiS1 H e D ∂t ∂ iS1 −iS1 −i S1 = i e (1) + eiS1 H e De ∂t −i ∂ S1 = eiS1 H − ieiS1 e−iS1 (1) De ∂t (1) (1) , =H D where we have used the identity Now, using the two identities
(8) ∂ ∂t
I = 0. e A e−A + e A ∂t∂ e−A = ∂t∂
1 1 − Be A = A, [ A, B]] + [ A, [ A, [ A, B]]] + · · · e A B + [ A, B] + [ 2! 3! " ! 1 1 2 3 A(t) ∂ −A(t) e e A(t) + A(t) · · · = 1+ A(t) + ∂t 2! 3! ! " ∂ 1 2 1 3 × 1 − A(t) + A(t) − A(t) · · · ∂t 2! 3! ! " 1 1 2 3 = 1+ A(t) + A(t) + A(t) · · · 2! 3! $ % & ∂ A(t) 1 ∂ A(t) ∂ A(t) × − + A(t) + A(t) ∂t 2! ∂t ∂t
54
Sameen Ahmed Khan
%
∂ A(t) 2 ∂ A(t) A(t) + A(t) A(t) ∂t ∂t & ' ∂ A(t) 2 + A(t) ... ∂t ( ) ∂ A(t) 1 ∂ A(t) ≈− − A(t), ∂t 2! ∂t ( ( )) 1 ∂ A(t) − A(t), A(t), 3! ∂t ( ( ( ))) 1 ∂ A(t) − A(t), A(t), A(t), , 4! ∂t 1 − 3!
(9)
with A = i S1 , we find
( ) S1 ∂ S1 ∂ (1) + i S1 , HD − HD ≈ HD − ∂t 2 ∂t ( ( )) 1 S1 ∂ − S1 , S1 , HD − 2! 3 ∂t ( ( ))) ( i ∂ S 1 D − − S1 , S1 , . S1 , H 3! 4 ∂t
(10)
D = m0 c2 β + E + O, simplifying the right-hand Substituting in Eq. (10), H and collecting the terms side using the relations βO = −Oβ and βE = Eβ together yields
(1) ≈ m0 c2 β + E 1 + O 1 H D
') ( $ * + 1 1 ∂ O 2 1 ≈ E + − O, E + i E βO O, ∂t 2m0 c2 8m20 c4 1
4 βO 8m30 c6 $ ' * + ∂ O β 1 3 1 ≈ E + i O O , O, − 2 ∂t 2m0 c 3m20 c4 −
(11)
55
The Foldy–Wouthuysen Transformation Technique in Optics
1 and O 1 obeying the relations βO 1 = −O 1 β and βE 1 = E 1 β exactly with E like E and O. Whereas the term O in HD is of order zero with respect to = O((1/m0 c2 )0 )], the odd part of the expansion parameter 1/m0 c2 [i.e., O (1) , namely O 1 , contains only terms of order 1/m0 c2 and higher powers H D 2 1 = O((1/m0 c2 )1 )]. of 1/m0 c [i.e., O A second Foldy–Wouthuysen transformation is applied with the same prescription to reduce the strength of the odd terms further in the transformed Hamiltonian:
\[
\Psi^{(2)} = e^{i\hat{S}_2}\Psi^{(1)}, \qquad
\hat{S}_2 = -\frac{i\beta\hat{\mathcal{O}}_1}{2m_0c^2}
= -\frac{i\beta}{2m_0c^2}\left[\frac{\beta}{2m_0c^2}\left(\left[\hat{\mathcal{O}},\hat{\mathcal{E}}\right] + i\hbar\frac{\partial\hat{\mathcal{O}}}{\partial t}\right)
- \frac{1}{3m_0^2c^4}\hat{\mathcal{O}}^3\right]. \tag{12}
\]
After this transformation,

\[
i\hbar\frac{\partial\Psi^{(2)}}{\partial t} = \hat{H}_D^{(2)}\Psi^{(2)}, \qquad
\hat{H}_D^{(2)} = m_0c^2\beta + \hat{\mathcal{E}}_2 + \hat{\mathcal{O}}_2,
\]
\[
\hat{\mathcal{E}}_2 \approx \hat{\mathcal{E}}_1, \qquad
\hat{\mathcal{O}}_2 \approx \frac{\beta}{2m_0c^2}\left(\left[\hat{\mathcal{O}}_1,\hat{\mathcal{E}}_1\right] + i\hbar\frac{\partial\hat{\mathcal{O}}_1}{\partial t}\right), \tag{13}
\]

where, now, Ô₂ = O((1/m₀c²)²). After the third transformation,
\[
\Psi^{(3)} = e^{i\hat{S}_3}\Psi^{(2)}, \qquad
\hat{S}_3 = -\frac{i\beta\hat{\mathcal{O}}_2}{2m_0c^2}, \tag{14}
\]

we have

\[
i\hbar\frac{\partial\Psi^{(3)}}{\partial t} = \hat{H}_D^{(3)}\Psi^{(3)}, \qquad
\hat{H}_D^{(3)} = m_0c^2\beta + \hat{\mathcal{E}}_3 + \hat{\mathcal{O}}_3,
\]
\[
\hat{\mathcal{E}}_3 \approx \hat{\mathcal{E}}_2 \approx \hat{\mathcal{E}}_1, \qquad
\hat{\mathcal{O}}_3 \approx \frac{\beta}{2m_0c^2}\left(\left[\hat{\mathcal{O}}_2,\hat{\mathcal{E}}_2\right] + i\hbar\frac{\partial\hat{\mathcal{O}}_2}{\partial t}\right), \tag{15}
\]

where Ô₃ = O((1/m₀c²)³). So, neglecting Ô₃,
\[
\hat{H}_D^{(3)} \approx m_0c^2\beta + \hat{\mathcal{E}} + \frac{1}{2m_0c^2}\beta\hat{\mathcal{O}}^2
- \frac{1}{8m_0^2c^4}\left[\hat{\mathcal{O}},\left(\left[\hat{\mathcal{O}},\hat{\mathcal{E}}\right] + i\hbar\frac{\partial\hat{\mathcal{O}}}{\partial t}\right)\right]
- \frac{1}{8m_0^3c^6}\beta\left\{\left(\left[\hat{\mathcal{O}},\hat{\mathcal{E}}\right] + i\hbar\frac{\partial\hat{\mathcal{O}}}{\partial t}\right)^2 + \hat{\mathcal{O}}^4\right\}. \tag{16}
\]
By starting with the second transformation, successive (Ê, Ô) pairs can be obtained recursively using the rule

\[
\hat{\mathcal{E}}_j = \hat{\mathcal{E}}_1\!\left(\hat{\mathcal{E}} \to \hat{\mathcal{E}}_{j-1},\; \hat{\mathcal{O}} \to \hat{\mathcal{O}}_{j-1}\right), \qquad
\hat{\mathcal{O}}_j = \hat{\mathcal{O}}_1\!\left(\hat{\mathcal{E}} \to \hat{\mathcal{E}}_{j-1},\; \hat{\mathcal{O}} \to \hat{\mathcal{O}}_{j-1}\right), \qquad j > 1, \tag{17}
\]

retaining only the relevant terms of the desired order at each step.
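The recursion of Eq. (17) can be mechanized. The following sketch (an illustration added here, not the author's code; it assumes the static case ∂Ô/∂t = 0 and uses sympy noncommutative symbols) encodes the first-step formulas of Eq. (11) as a reusable rule:

```python
# Hedged symbolic sketch of one step of the even/odd splitting, Eqs. (11) and (17).
import sympy as sp

m0, c = sp.symbols('m0 c', positive=True)
beta, E, O = sp.symbols('beta Ecal Ocal', commutative=False)

def comm(A, B):
    return A * B - B * A

def next_even_odd(E_j, O_j):
    """Rule of Eq. (17): reuse the first-step formulas of Eq. (11) with the
    previous even/odd parts substituted (static case, dO/dt = 0)."""
    E_next = (E_j + beta * O_j**2 / (2 * m0 * c**2)
              - comm(O_j, comm(O_j, E_j)) / (8 * m0**2 * c**4)
              - beta * O_j**4 / (8 * m0**3 * c**6))
    O_next = beta * comm(O_j, E_j) / (2 * m0 * c**2) - O_j**3 / (3 * m0**2 * c**4)
    return E_next, O_next

E1, O1 = next_even_odd(E, O)
print("O_1 =", sp.expand(O1))   # every term carries at least one power of 1/(m0 c^2)
```

Calling next_even_odd repeatedly on the returned pair implements Eq. (17); at each step one would truncate at the desired power of 1/m₀c², as the text prescribes.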
With Ê = qφ and Ô = cα·π̂, the final reduced Hamiltonian [Eq. (16)] is, to the order calculated,

\[
\hat{H}_D^{(3)} = \beta\left(m_0c^2 + \frac{\hat{\boldsymbol{\pi}}^2}{2m_0} - \frac{\hat{\boldsymbol{p}}^4}{8m_0^3c^2}\right) + q\phi
- \frac{q\hbar}{2m_0c}\,\beta\boldsymbol{\Sigma}\cdot\boldsymbol{B}
- \frac{q\hbar}{4m_0^2c^2}\,\boldsymbol{\Sigma}\cdot\left(\boldsymbol{E}\times\hat{\boldsymbol{p}}\right)
- \frac{iq\hbar^2}{8m_0^2c^2}\,\boldsymbol{\Sigma}\cdot\mathrm{curl}\,\boldsymbol{E}
- \frac{q\hbar^2}{8m_0^2c^2}\,\mathrm{div}\,\boldsymbol{E}, \tag{18}
\]
with the individual terms having direct physical interpretations. The terms in the first set of parentheses result from the expansion of √(m₀²c⁴ + c²π̂²), showing the effect of the relativistic mass increase. The second and third terms are the electrostatic and magnetic dipole energies. The next two terms, taken together (for hermiticity), contain the spin-orbit interaction. The last term, the so-called Darwin term, is attributed to the zitterbewegung (trembling motion) of the Dirac particle: because of rapid coordinate fluctuations over distances of the order of the Compton wavelength (2πℏ/m₀c), the particle sees a somewhat smeared-out electric potential.
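To get a feel for the sizes involved, the following back-of-the-envelope sketch (added for illustration; the 100-keV beam energy is a typical electron-microscope value assumed here, not taken from the text) compares the p̂⁴ correction of Eq. (18) with the leading kinetic term and evaluates the Compton wavelength that sets the Darwin-term smearing scale:

```python
# Hedged order-of-magnitude sketch for Eq. (18), for a 100 keV electron.
import math

mc2 = 511.0e3        # electron rest energy, eV
T = 100.0e3          # kinetic energy, eV
pc = math.sqrt(T**2 + 2 * T * mc2)   # relativistic momentum times c, eV

# ratio of p^4/(8 m0^3 c^2) to the leading pi^2/(2 m0) term = (pc)^2 / (4 (m0 c^2)^2)
ratio = pc**2 / (4 * mc2**2)
print(f"pc = {pc/1e3:.1f} keV, correction/leading term = {ratio:.3f}")  # about 0.11

# Compton wavelength 2*pi*hbar/(m0 c) = h/(m0 c), the zitterbewegung smearing scale
h, me, c = 6.62607015e-34, 9.1093837015e-31, 2.99792458e8
print(f"Compton wavelength = {h/(me*c)*1e12:.2f} pm")   # about 2.43 pm
```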
The Foldy–Wouthuysen transformation technique thus clearly expands the Dirac Hamiltonian as a power series in the parameter 1/m₀c², enabling the use of a systematic approximation procedure to study the deviations from the nonrelativistic situation. The similarities between nonrelativistic particle dynamics and the paraxial optics of light beams and particle beams, respectively, are noted in the charts below.

Standard Dirac Equation / Light Beam Optical Form:
m₀c²β + Ê_D + Ô_D / −n₀β + Ê + Ô
m₀c² (positive energy) / −n₀ (forward propagation)
Nonrelativistic, |π̂| ≪ m₀c / Paraxial beam, |p̂⊥| ≪ n₀
Nonrelativistic motion + relativistic corrections / Paraxial behavior + aberration corrections

Standard Dirac Equation / Particle-Beam Optical Form:
m₀c²β + Ê_D + Ô_D / −p₀β + Ê + Ô (z-evolution: iℏ ∂/∂z)
m₀c² (positive energy) / −p₀ (forward propagation)
Nonrelativistic, |π̂| ≪ m₀c / Paraxial beam, |π̂⊥| ≪ p₀
Nonrelativistic motion + relativistic corrections / Paraxial behavior + aberration corrections

Noting the above similarities, the concept of the Foldy–Wouthuysen form of the Dirac theory has been adopted to study paraxial optics and its deviations. The Helmholtz equation governing scalar optics is first linearized, in a procedure similar to the manner in which the Klein–Gordon equation is brought to the Feshbach–Villars form (linear in ∂/∂t, unlike the Klein–Gordon equation itself, which is quadratic in ∂/∂t). This enables use of the Foldy–Wouthuysen transformation technique. (See Appendix A for the Feshbach–Villars form of the Klein–Gordon equation.) In the case of vector optics, Maxwell's equations are cast in a spinor form resembling exactly the Dirac equation [Eqs. (1) and (2)] in all respects: i.e., a multicomponent Ψ with the upper half of its components large compared with the lower components; a Hamiltonian with an even part (Ê) and an odd part (Ô); a suitable expansion parameter (|p̂⊥|/n₀ ≪ 1) characterizing the dominant forward propagation; and a leading term with a β coefficient commuting with Ê and anticommuting with Ô. It is important to note that the Dirac field and the electromagnetic field are two distinct entities. However, their striking resemblance in the underlying algebraic structure can be exploited to perform some useful calculations with meaningful results. (See Appendix B for the derivation of an exact matrix representation of Maxwell's equations and its differences from other representations.)
The additional feature of our formalism is to return finally to the original representation after making an extra approximation (dropping β from the final reduced optical Hamiltonian), taking into account the fact that our primary interest is only in the forward-propagating beam. The Foldy–Wouthuysen transformation has thus allowed entirely new approaches to light optics and charged-particle optics, respectively.
III. QUANTUM FORMALISM OF CHARGED-PARTICLE BEAM OPTICS

The classical treatment of charged-particle beam optics has been very successful in the design and functioning of numerous optical devices, from electron microscopes to very large particle accelerators. It is natural, however, to look for a prescription based on the quantum theory, since any physical system is quantum mechanical at the fundamental level. Such a prescription is sure to explain the grand success of the classical theories, and it is certain to assist in a deeper understanding and better design of charged-particle beam devices. The starting point of the quantum prescription of charged-particle beam optics is to build a theory based on the basic equations of quantum mechanics (Dirac, Klein–Gordon, Schrödinger) appropriate to the situation under study. In order to analyze the evolution of the beam parameters through the various individual beam optical elements (quadrupoles, bending magnets, and so on) along the optic axis of the system, the first step is to start with the basic time-dependent equations of quantum mechanics and obtain an equation of the form

\[
i\hbar\frac{\partial}{\partial s}\,\psi(x,y;s) = \hat{H}(x,y;s)\,\psi(x,y;s), \tag{19}
\]

where (x, y; s) constitutes a curvilinear coordinate system adapted to the geometry of the system. Equation (19) is the basic equation of the quantum formalism, known as the beam-optical equation; Ĥ and ψ are known as the beam-optical Hamiltonian and the beam wavefunction, respectively. The second step requires obtaining a relationship between any relevant observable {⟨O⟩(s)} at the transverse plane at s and the observable {⟨O⟩(s_in)} at the transverse plane at s_in, where s_in is some input reference point. This is achieved by integration of the beam-optical equation (19),

\[
\psi(x,y;s) = \hat{U}(s,s_{\mathrm{in}})\,\psi(x,y;s_{\mathrm{in}}), \tag{20}
\]

which provides the required transfer maps

\[
\langle O\rangle(s_{\mathrm{in}}) \longrightarrow \langle O\rangle(s)
= \left\langle \psi(x,y;s)\left|\hat{O}\right|\psi(x,y;s)\right\rangle
= \left\langle \psi(x,y;s_{\mathrm{in}})\left|\hat{U}^{\dagger}\hat{O}\hat{U}\right|\psi(x,y;s_{\mathrm{in}})\right\rangle. \tag{21}
\]
This two-step algorithm is an oversimplified picture of the quantum formalism, and several crucial points should be noted. The first step in the algorithm, obtaining the beam-optical equation, is not to be treated as a mere transformation that eliminates t in favor of a variable s along the optic axis. A clever set of transforms is required not only to eliminate the variable t in favor of s but also to yield an s-dependent equation that has a close physical and mathematical correspondence with the original t-dependent equation of standard time-dependent quantum mechanics. The imposition of this stringent requirement on the construction of the beam-optical equation ensures the execution of the second step of the algorithm: the beam-optical equation is such that all the required rich machinery of quantum mechanics becomes applicable to the computation of the transfer maps that characterize the optical system. This describes the essential scheme of obtaining the quantum formalism. The remainder is mostly mathematical detail built into the powerful algebraic machinery of the algorithm, accompanied by some reasonable assumptions and approximations dictated by physical considerations. The nature of these approximations is best summarized, in optical terminology, as a systematic procedure of expanding the beam-optical Hamiltonian in a power series of |π̂⊥/p₀|, where p₀ is the design (or average) momentum of the beam particles moving predominantly along the direction of the optic axis, and π̂⊥ is the small transverse kinetic momentum. The required expansion is obtained using the ambiguity-free procedure of the Foldy–Wouthuysen transformation. The Feshbach–Villars procedure (Feshbach and Villars, 1958) brings the Schrödinger and Klein–Gordon equations to a two-component form facilitating the application of the Foldy–Wouthuysen expansion. The leading-order approximation, with |π̂⊥/p₀| ≪ 1, constitutes the paraxial or ideal behavior, and higher-order terms in the expansion give rise to the nonlinear or aberrating behavior. The paraxial and aberrating behaviors are modified by the quantum contributions, which are in powers of the de Broglie wavelength (λ̄₀ = ℏ/p₀). The classical limit of the quantum formalism reproduces the well-known Lie algebraic formalism of charged-particle beam optics (e.g., see Dragt and Forest, 1986; Dragt et al., 1988; Rangarajan et al., 1990; Ryne and Dragt, 1991; see also Forest et al., 1989; Forest and Hirata, 1992). The Hamiltonian description allows us to relate our formalism to other traditional prescriptions, such as the quantum-like approach (Fedele and Man'ko, 1999). A complete coverage of the new field of quantum aspects of beam physics (QABP) can be found in the proceedings of the series of meetings under the same name (Chen, 1999, 2002; Chen and Reil, 2003) and the associated workshop reports (Chen, 1998, 2000, 2003a, 2003b).
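A minimal numerical illustration of the transfer map of Eq. (20) is sketched below (added here for concreteness, not from the cited works). It integrates a paraxial beam-optical equation of the light-optics form iλ̄ ∂ψ/∂z = [−n(x) + p̂⊥²/2n₀]ψ by the split-step Fourier method; the parabolic profile n(x) = n₀ − n₂x²/2 and all parameter values are illustrative assumptions:

```python
# Hedged numerical sketch: psi(s) = U(s, s_in) psi(s_in), Eq. (20), realized by
# split-step Fourier integration of a paraxial beam-optical equation.
import numpy as np

lam = 1.0e-3                       # reduced wavelength lambda_bar (arbitrary units)
n0, n2 = 1.5, 0.5                  # assumed graded-index profile n(x) = n0 - 0.5*n2*x^2
N, L = 1024, 8.0
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
kx = 2 * np.pi * np.fft.fftfreq(N, d=L / N)

psi = np.exp(-x**2 / (2 * 0.2**2)).astype(complex)   # input Gaussian beam
dz, steps = 1e-3, 2000
kin = np.exp(-1j * lam * kx**2 * dz / (2 * n0))      # p_perp^2/(2 n0); p_perp = -i lam d/dx
pot = np.exp(+1j * (n0 - 0.5 * n2 * x**2) * dz / lam)  # the -n(x) part of H

def rms_width(f):
    w2 = np.sum(x**2 * np.abs(f)**2) / np.sum(np.abs(f)**2)
    return np.sqrt(w2)

print("input rms width :", rms_width(psi))
for _ in range(steps):
    psi = np.fft.ifft(kin * np.fft.fft(pot * psi))
print("output rms width:", rms_width(psi))  # the graded index refocuses the beam
```

The product of split-step exponentials plays the role of the formal evolution operator Û(s, s_in); expectation values taken in the input and output wavefunctions realize the transfer map of Eq. (21).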
IV. QUANTUM METHODOLOGIES IN LIGHT BEAM OPTICS

Historically, the scalar wave theory of optics (including aberrations to all orders) is based on Fermat's principle of least time, with the beam-optical Hamiltonian derived from that principle. This approach is purely geometrical and works adequately in the scalar regime. All the laws of geometrical optics can be deduced from Maxwell's equations (e.g., see Born and Wolf, 1999). This deduction is traditionally done using the Helmholtz equation, which is derived from Maxwell's equations. In this approach, one takes the square root of the Helmholtz operator followed by an expansion of the radical (Dragt, 1982, 1988; Dragt et al., 1986). It should be noted that the square-root approach reduces the original boundary value problem to a first-order initial value problem. This reduction has great practical value, since it leads to the powerful system (or Fourier optics) approach (Goodman, 1996). However, the beam-optical Hamiltonian in the square-root approach is no different from that of the geometrical approach of Fermat's principle. Moreover, the reduction process itself can never be claimed to be rigorous or exact.
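As a sketch of the radical expansion mentioned above (standard paraxial algebra added here for orientation, not a quotation from the cited works, and ignoring the operator-ordering subtleties that the Foldy–Wouthuysen route is designed to handle systematically):

\[
\hat{H} = -\left(n^2(\boldsymbol{r}) - \hat{\boldsymbol{p}}_\perp^2\right)^{1/2}
\approx -n(\boldsymbol{r}) + \frac{1}{2n_0}\,\hat{\boldsymbol{p}}_\perp^2 + \frac{1}{8n_0^3}\,\hat{\boldsymbol{p}}_\perp^4 + \cdots,
\qquad |\hat{\boldsymbol{p}}_\perp| \ll n_0.
\]

The leading terms coincide with the expansion of the Fermat-principle Hamiltonian, which is exactly the point made above: the square-root route adds nothing beyond geometrical optics.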
The Helmholtz equation governing scalar optics is algebraically very similar to the Klein–Gordon equation for a spin-0 particle. Exploiting this similarity, the Helmholtz equation is linearized in a procedure very similar to the one used by Feshbach and Villars (1958) to linearize the Klein–Gordon equation. This brings the Helmholtz equation to a Dirac-like form, allowing the Foldy–Wouthuysen expansion used in the Dirac electron theory. This formalism gives rise to wavelength-dependent contributions modifying the paraxial behavior (Khan et al., 2002) and the aberration coefficients (Khan, 2005a). This is the nontraditional prescription of scalar optics.
In regard to polarization, a systematic procedure for the passage from scalar to vector wave optics to handle paraxial beam propagation problems, completely taking into account the manner in which Maxwell's equations couple the spatial variation and polarization of light waves, has been formulated by analyzing the basic Poincaré invariance of the system. This procedure has been successfully used to clarify several issues in Maxwell optics (Mukunda et al., 1983a, 1983b, 1985; Simon et al., 1986, 1987). In all the aforementioned approaches, however, the beam optics and the polarization are studied separately, using very different processes.
The derivation of the Helmholtz equation from Maxwell's equations is an approximation, since the spatial and temporal derivatives of the permittivity and permeability of the medium are neglected. It is therefore logical to seek a prescription based fully on Maxwell's equations. The starting point for such a prescription is the exact matrix representation of the Maxwell equations, taking into account the spatial and temporal variations of the permittivity and permeability (Khan, 2005b). It is necessary and sufficient to use 8 × 8 matrices for such an exact representation (Khan, 2005b). This representation uses the Riemann–Silberstein vector (Silberstein, 1907a, 1907b); for a detailed discussion of the Riemann–Silberstein complex vector, see Bialynicki-Birula (1994, 1996a, 1996b). The derivation of the required matrix representation, and how it differs from numerous others, is presented in Appendix B. The derived representation using 8 × 8 matrices has a close algebraic similarity to the Dirac equation, enabling the use of the Foldy–Wouthuysen transformation. The beam-optical Hamiltonian derived from this representation reproduces the Hamiltonians obtained in the traditional prescription, along with wavelength-dependent matrix terms, which we have called polarization terms. These polarization terms are very similar to the spin terms in the Dirac electron theory and the spin-precession terms in the beam-optical version of the Thomas–BMT equation (Conte et al., 1996). The matrix formulation provides a unified treatment of beam optics and light polarization. Some well-known results of light polarization are obtained as the paraxial limit of the matrix formulation (Mukunda et al., 1983a, 1983b, 1985; Simon et al., 1986, 1987).
Results from the specific example of the graded-index medium considered in the nontraditional prescription of Maxwell optics (Khan, 2006b) are worth noting. First, it predicts an image rotation (proportional to the wavelength), and its magnitude is explicitly given. Second, it provides all nine aberrations permitted by the axial symmetry. (The traditional approaches give six aberrations; the exact treatment of Maxwell optics modifies the six aberration coefficients by wavelength-dependent contributions and also gives rise to the remaining three permitted by the axial symmetry.) The existence of the nine aberrations and image rotation is well known in axially symmetric magnetic lenses, even when treated classically. The quantum treatment of the same system leads to the wavelength-dependent modifications (Jagannathan and Khan, 1996). The alternate procedure for Helmholtz optics (Khan, 2005a) gives the usual six aberrations (though modified by wavelength-dependent contributions) and does not provide any image rotation. These extra aberrations and the image rotation are the exclusive outcome of the fact that the formalism is based on a treatment starting with an exact matrix representation of the Maxwell equations.
The traditional beam optics (in particular, the Lie algebraic formalism of light beam optics; Dragt, 1982, 1988; Dragt et al., 1986) is completely recovered from the nontraditional prescriptions in the limit of vanishing wavelength, λ̄ → 0, termed the traditional limit of our formalism. This is analogous to the classical limit obtained by taking ℏ → 0 in quantum prescriptions. The use of the Foldy–Wouthuysen machinery in the nontraditional prescriptions of Helmholtz optics and Maxwell optics is very similar to its use in the quantum theory of charged-particle beam optics developed by Jagannathan et al. There, too, the classical prescriptions are recovered (the Lie algebraic formalism of charged-particle beam optics; Todesco, 1999; Turchetti et al., 1989) in the limit λ̄₀ → 0, where λ̄₀ = ℏ/p₀ is the de Broglie wavelength and p₀ is the design momentum of the system under study. The Foldy–Wouthuysen transformation has allowed novel approaches to light optics and charged-particle optics, respectively.
V. CONCLUSION

The use of the Foldy–Wouthuysen transformation technique in optics (Khan, 2006a) has shed light on the deeper connections, in the wavelength-dependent regime, between light optics and charged-particle optics (Khan, 2002b). The beginning of the analogy between geometrical optics and mechanics is usually attributed to Descartes (1637 CE), but it can actually be traced back to Ibn al-Haitham (Alhazen, c. 965–1037 CE) (Ambrosini et al., 1997; see also Khan, 2007, and the references therein for the medieval Arab contributions to optics; Rashed, 1990, 1993). Historically, variational principles played a fundamental role in the evolution of mathematical models in classical physics, and many equations were derived using them; the relevant examples here are Fermat's principle in optics and Maupertuis' principle in mechanics. The analogy between the trajectory of material particles in potential fields and the path of light rays in media with continuously variable refractive index was formalized by Hamilton in 1833. This Hamiltonian analogy led to the development of electron optics in the 1920s, when Busch derived the focusing and lens-like action of the axially symmetric magnetic field using the methodology of geometrical optics. Around the same time, Louis de Broglie associated his now-famous wavelength with moving particles. Schrödinger extended the analogy by passing from geometrical optics to wave optics through his wave equation incorporating the de Broglie wavelength. This analogy played a fundamental role in the early development of quantum mechanics. On the other hand, the analogy led to the development of practical electron optics, one of the early inventions being the electron microscope by Ernst Ruska. A detailed account of Hamilton's analogy is available in Hawkes and Kasper (1989a, 1989b, 1994), Born and Wolf (1999), and Forbes (2001). Until very recently, it was possible to recognize this analogy only between geometrical optics and the classical prescriptions of electron optics; the quantum theories of charged-particle beam optics have been under development only for about a decade (Conte et al., 1996; Jagannathan et al., 1989; Jagannathan, 1990, 1993, 1999, 2002, 2003; Khan and Jagannathan, 1993, 1994, 1995; Jagannathan and Khan, 1995, 1996, 1997; Khan, 1997, 1999a, 1999b, 2001, 2002a, 2002b, 2002c). The quantum prescriptions have the expected wavelength-dependent effects, which have no analogue in the traditional descriptions of light beam optics. With the recent development of the nontraditional prescriptions of Helmholtz optics (Khan, 2002b, 2005a; Khan et al., 2002) and the matrix formulation of Maxwell optics (Khan, 2006b),
both accompanied by wavelength-dependent effects, it is seen that the analogy between the two systems persists. The nontraditional prescription of Helmholtz optics closely resembles the quantum theory of charged-particle beam optics based on the Klein–Gordon equation, and the matrix formulation of Maxwell optics closely resembles the quantum theory of charged-particle beam optics based on the Dirac equation. The Table summarizes the Hamiltonians in the different prescriptions of light beam optics and charged-particle beam optics for magnetic systems; the Ĥ₀,p are the paraxial Hamiltonians, with the lowest-order wavelength-dependent contributions. From the Hamiltonians in the Table the following observations are made. The classical/traditional Hamiltonians of particle/light optics are modified by wavelength-dependent contributions in the quantum/nontraditional prescriptions, respectively, and the algebraic forms of these modifications in each row are very similar. The starting equations have a one-to-one algebraic correspondence: Helmholtz ↔ Klein–Gordon; matrix form of Maxwell ↔ Dirac equation. Finally, the de Broglie wavelength λ̄₀ and λ̄ have an analogous status, and the classical/traditional limits are obtained by taking λ̄₀ → 0 and λ̄ → 0, respectively. The parallels between the two systems are certain to provide more insights. If not for the Foldy–Wouthuysen transformation, it would not have been possible to recognize these new aspects of the similarities between light optics and charged-particle optics (Khan, 2002b).
TABLE. Hamiltonians in Different Prescriptions

Fermat's principle (light) / Maupertuis' principle (charged particle):
\[
H = -\left(n^2(\boldsymbol{r}) - \boldsymbol{p}_\perp^2\right)^{1/2}, \qquad
H = -\left(p_0^2 - \boldsymbol{\pi}_\perp^2\right)^{1/2} - qA_z.
\]

Nontraditional Helmholtz / Klein–Gordon formalism:
\[
\hat{H}_{0,p} = -n(\boldsymbol{r}) + \frac{1}{2n_0}\hat{\boldsymbol{p}}_\perp^2
- \frac{i\bar{\lambda}}{16n_0^3}\left[\hat{\boldsymbol{p}}_\perp^2,\frac{\partial}{\partial z}n(\boldsymbol{r})\right], \qquad
\hat{H}_{0,p} = -p_0 - qA_z + \frac{1}{2p_0}\hat{\boldsymbol{\pi}}_\perp^2
+ \frac{i\hbar}{16p_0^3}\left[\hat{\boldsymbol{\pi}}_\perp^2,\frac{\partial}{\partial z}\hat{\boldsymbol{\pi}}_\perp^2\right].
\]

Maxwell, matrix representation / Dirac formalism:
\[
\hat{H}_{0,p} = -n(\boldsymbol{r}) + \frac{1}{2n_0}\hat{\boldsymbol{p}}_\perp^2
- i\bar{\lambda}\,\beta\boldsymbol{\Sigma}\cdot\boldsymbol{u}
+ \frac{\bar{\lambda}^2}{2n_0}\,w^2\beta, \qquad
\hat{H}_{0,p} = -p_0 - qA_z + \frac{1}{2p_0}\hat{\boldsymbol{\pi}}_\perp^2
- \frac{\hbar}{2p_0}\left\{\mu\gamma\,\boldsymbol{\Sigma}_\perp\cdot\boldsymbol{B}_\perp
+ \left(\frac{q}{\gamma_0}+\mu\right)\Sigma_z B_z + i\,\epsilon\,c\,B_z\right\}.
\]

Notation: refractive index, n(r) = c√(ε(r)μ(r)); resistance, h(r) = √(μ(r)/ε(r)); u(r) = −(1/2n(r))∇n(r); w(r) = (1/2h(r))∇h(r); π̂⊥ = p̂⊥ − qA⊥; anomalous magnetic moment, μₐ; anomalous electric moment, εₐ; μ = 2m₀μₐ/ℏ; ε = 2m₀εₐ/ℏ; γ = E/m₀c²; Σ and β are the Dirac matrices.
APPENDIX A
The Feshbach–Villars Form of the Klein–Gordon Equation

The method used to cast the time-independent Klein–Gordon equation into a beam-optical form linear in ∂/∂z, suitable for a systematic study through successive approximations using the Foldy–Wouthuysen-like transformation technique borrowed from the Dirac theory, is similar to the manner in which the time-dependent Klein–Gordon equation is transformed (Feshbach and Villars, 1958) to the Schrödinger form, containing only a first-order time derivative, in order to study its nonrelativistic limit using the Foldy–Wouthuysen technique (see, e.g., Bjorken and Drell, 1964). Defining

\[
\Theta = \frac{\partial\Psi}{\partial t}, \tag{A1}
\]

the free-particle Klein–Gordon equation is written as

\[
\frac{\partial\Theta}{\partial t} = \left(c^2\nabla^2 - \frac{m_0^2c^4}{\hbar^2}\right)\Psi. \tag{A2}
\]
Introducing the linear combinations

\[
\Psi_+ = \frac{1}{2}\left(\Psi + \frac{i\hbar}{m_0c^2}\,\Theta\right), \qquad
\Psi_- = \frac{1}{2}\left(\Psi - \frac{i\hbar}{m_0c^2}\,\Theta\right), \tag{A3}
\]
the Klein–Gordon equation is seen to be equivalent to a pair of coupled differential equations:

\[
i\hbar\frac{\partial\Psi_+}{\partial t} = -\frac{\hbar^2\nabla^2}{2m_0}\left(\Psi_+ + \Psi_-\right) + m_0c^2\,\Psi_+,
\]
\[
i\hbar\frac{\partial\Psi_-}{\partial t} = \frac{\hbar^2\nabla^2}{2m_0}\left(\Psi_+ + \Psi_-\right) - m_0c^2\,\Psi_-. \tag{A4}
\]

Equation (A4) can be written in two-component language as

\[
i\hbar\frac{\partial}{\partial t}\begin{pmatrix}\Psi_+\\ \Psi_-\end{pmatrix}
= \hat{H}_0^{\mathrm{FV}}\begin{pmatrix}\Psi_+\\ \Psi_-\end{pmatrix}, \tag{A5}
\]
with the Feshbach–Villars Hamiltonian for the free particle, Ĥ₀^FV, given by

\[
\hat{H}_0^{\mathrm{FV}} =
\begin{pmatrix}
m_0c^2 + \dfrac{\hat{\boldsymbol{p}}^2}{2m_0} & \dfrac{\hat{\boldsymbol{p}}^2}{2m_0}\\[2mm]
-\dfrac{\hat{\boldsymbol{p}}^2}{2m_0} & -m_0c^2 - \dfrac{\hat{\boldsymbol{p}}^2}{2m_0}
\end{pmatrix}
= m_0c^2\sigma_z + \frac{\hat{\boldsymbol{p}}^2}{2m_0}\,\sigma_z + i\,\frac{\hat{\boldsymbol{p}}^2}{2m_0}\,\sigma_y. \tag{A6}
\]
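A quick check (added here; not in the original) confirms that the linearization loses nothing: the eigenvalues of Ĥ₀^FV in Eq. (A6) are exactly ±√(m₀²c⁴ + c²p²), even though the matrix is not Hermitian (note the iσ_y term):

```python
# Hedged check: the free FV Hamiltonian of Eq. (A6) reproduces the exact
# Klein-Gordon dispersion relation. Units: hbar = 1, arbitrary m0, c, p.
import numpy as np

m0, c = 1.0, 1.0
for p in (0.0, 0.5, 2.0):
    T = p**2 / (2 * m0)
    H_FV = np.array([[m0 * c**2 + T,            T],
                     [-T,            -m0 * c**2 - T]])
    eig = np.sort(np.linalg.eigvals(H_FV).real)
    E = np.sqrt(m0**2 * c**4 + c**2 * p**2)
    assert np.allclose(eig, [-E, E])
print("FV eigenvalues match +/- sqrt(m0^2 c^4 + c^2 p^2)")
```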
For a free nonrelativistic particle, with kinetic energy ≪ m₀c², it is seen that Ψ₊ is large compared to Ψ₋. In the presence of an electromagnetic field, the interaction is introduced through the minimal coupling

\[
\hat{\boldsymbol{p}} \longrightarrow \hat{\boldsymbol{\pi}} = \hat{\boldsymbol{p}} - q\boldsymbol{A}, \qquad
i\hbar\frac{\partial}{\partial t} \longrightarrow i\hbar\frac{\partial}{\partial t} - q\phi. \tag{A7}
\]
The corresponding Feshbach–Villars form of the Klein–Gordon equation becomes

\[
i\hbar\frac{\partial}{\partial t}\begin{pmatrix}\Psi_+\\ \Psi_-\end{pmatrix}
= \hat{H}^{\mathrm{FV}}\begin{pmatrix}\Psi_+\\ \Psi_-\end{pmatrix}, \qquad
\begin{pmatrix}\Psi_+\\ \Psi_-\end{pmatrix}
= \frac{1}{2}\begin{pmatrix}
\Psi + \dfrac{1}{m_0c^2}\left(i\hbar\dfrac{\partial}{\partial t} - q\phi\right)\Psi\\[2mm]
\Psi - \dfrac{1}{m_0c^2}\left(i\hbar\dfrac{\partial}{\partial t} - q\phi\right)\Psi
\end{pmatrix},
\]
\[
\hat{H}^{\mathrm{FV}} = m_0c^2\sigma_z + \hat{\mathcal{E}} + \hat{\mathcal{O}}, \qquad
\hat{\mathcal{E}} = q\phi + \frac{\hat{\boldsymbol{\pi}}^2}{2m_0}\,\sigma_z, \qquad
\hat{\mathcal{O}} = i\,\frac{\hat{\boldsymbol{\pi}}^2}{2m_0}\,\sigma_y. \tag{A8}
\]
As in the free-particle case, in the nonrelativistic situation Ψ₊ is large compared to Ψ₋. The even term Ê does not couple Ψ₊ and Ψ₋, whereas Ô is odd and couples Ψ₊ and Ψ₋. Starting from Eq. (A8), the nonrelativistic limit of the Klein–Gordon equation, with various correction terms, can be understood using the Foldy–Wouthuysen technique (see, e.g., Bjorken and Drell, 1964). It is this technique that we have adopted for studying the z-evolution of the Helmholtz wave equation in an optical system comprising a spatially varying refractive index. The additional feature of our formalism is the extra approximation of dropping σ_z at an intermediate stage, to take into account that we are interested only in the beam propagating forward along the z-direction.
APPENDIX B
An Exact Matrix Representation of Maxwell's Equations in a Medium

Matrix representations of Maxwell's equations are very well known (Laporte and Uhlenbeck, 1931; Moses, 1959; Majorana, 1974). However, these representations either lack exactness or are expressed in terms of a pair of matrix equations rather than a single one (Bialynicki-Birula, 1994, 1996a, 1996b). Some of these representations hold only in free space; such a representation is an approximation in a medium with space- and time-dependent permittivity ε(r, t) and permeability μ(r, t). Even this approximation is often expressed through a pair of equations using 3 × 3 matrices: one for the curl equations and one for the divergence equations that occur among the Maxwell equations. This practice of writing the divergence conditions separately is completely avoidable by using 4 × 4 matrices for Maxwell's equations in free space (Moses, 1959). A single equation using 4 × 4 matrices is necessary and sufficient when ε(r, t) and μ(r, t) are treated as "local" constants (Bialynicki-Birula, 1996b; Moses, 1959). A treatment that takes into account the variations of ε(r, t) and μ(r, t) has been presented in Bialynicki-Birula (1996b). This treatment uses the Riemann–Silberstein vectors F±(r, t) to re-express the Maxwell equations as four equations: two for the curls and two for the divergences, with mixing between F⁺(r, t) and F⁻(r, t). This mixing is very precisely expressed through two derived functions of ε(r, t) and μ(r, t). These four equations are then expressed as a pair of matrix equations using 6 × 6 matrices, again one for the curls and one for the divergences. Even though this treatment is exact, it involves a pair of matrix equations.
We present a treatment that allows the expression of the Maxwell equations in a single matrix equation instead of a pair. This approach is a logical continuation of the treatment in Bialynicki-Birula (1996b). We use linear combinations of the components of the Riemann–Silberstein vectors F±(r, t), and the final matrix representation is a single equation using 8 × 8 matrices. This representation contains all four Maxwell equations, taking into account the spatial and temporal variations of the permittivity ε(r, t) and the permeability μ(r, t). Section 1 summarizes the treatment for a homogeneous medium and introduces the required functions and notation. Section 2 presents the matrix representation in an inhomogeneous medium with sources.
1. HOMOGENEOUS MEDIUM

Begin with the Maxwell equations (Jackson, 1998; Panofsky and Phillips, 1962) in an inhomogeneous medium with sources,

\[
\nabla\cdot\boldsymbol{D}(\boldsymbol{r},t) = \rho, \qquad
\nabla\times\boldsymbol{H}(\boldsymbol{r},t) - \frac{\partial}{\partial t}\boldsymbol{D}(\boldsymbol{r},t) = \boldsymbol{J},
\]
\[
\nabla\times\boldsymbol{E}(\boldsymbol{r},t) + \frac{\partial}{\partial t}\boldsymbol{B}(\boldsymbol{r},t) = \boldsymbol{0}, \qquad
\nabla\cdot\boldsymbol{B}(\boldsymbol{r},t) = 0. \tag{B1}
\]
The media are assumed to be linear; that is, D = εE and B = μH, where ε is the permittivity and μ the permeability of the medium. In general, ε = ε(r, t) and μ = μ(r, t); this section treats them as "local" constants in the various derivations. The magnitude of the velocity of light in the medium is given by v(r, t) = |v(r, t)| = 1/√(ε(r, t)μ(r, t)). In vacuum, ε₀ = 8.85 × 10⁻¹² C²/N·m² and μ₀ = 4π × 10⁻⁷ N/A². One possible way to obtain the required matrix representation is to use the Riemann–Silberstein vector (Bialynicki-Birula, 1996b), given by
\[
\boldsymbol{F}^{+}(\boldsymbol{r},t) = \frac{1}{\sqrt{2}}\left(\sqrt{\epsilon(\boldsymbol{r},t)}\,\boldsymbol{E}(\boldsymbol{r},t) + \frac{i}{\sqrt{\mu(\boldsymbol{r},t)}}\,\boldsymbol{B}(\boldsymbol{r},t)\right),
\]
\[
\boldsymbol{F}^{-}(\boldsymbol{r},t) = \frac{1}{\sqrt{2}}\left(\sqrt{\epsilon(\boldsymbol{r},t)}\,\boldsymbol{E}(\boldsymbol{r},t) - \frac{i}{\sqrt{\mu(\boldsymbol{r},t)}}\,\boldsymbol{B}(\boldsymbol{r},t)\right). \tag{B2}
\]

For any homogeneous medium it is equivalent to use either F⁺(r, t) or F⁻(r, t); the two differ by the sign before "i" and are not complex conjugates of one another. No particular form is assumed for E(r, t) and B(r, t). Both vectors will be needed in an inhomogeneous medium (see Section 2). If for a certain medium ε(r, t) and μ(r, t) are constants (or can be treated as local constants under certain approximations), then the vectors F±(r, t) satisfy
\[
i\frac{\partial}{\partial t}\boldsymbol{F}^{\pm}(\boldsymbol{r},t) = \pm v\,\nabla\times\boldsymbol{F}^{\pm}(\boldsymbol{r},t) - \frac{1}{\sqrt{2\epsilon}}\,(i\boldsymbol{J}),
\]
\[
\nabla\cdot\boldsymbol{F}^{\pm}(\boldsymbol{r},t) = \frac{1}{\sqrt{2\epsilon}}\,\rho. \tag{B3}
\]
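Equation (B3) is straightforward to verify for a concrete field. The sketch below (an added illustration, assuming a linearly polarized plane wave, which is not an example given in the text) checks the curl equation for F⁺ symbolically:

```python
# Hedged symbolic check: for a plane wave in a uniform medium, F+ of Eq. (B2)
# satisfies i dF+/dt = v curl F+, the source-free first equation of Eq. (B3).
import sympy as sp

t, z = sp.symbols('t z', real=True)
eps, mu, E0, k = sp.symbols('epsilon mu E0 k', positive=True)
v = 1 / sp.sqrt(eps * mu)
w = v * k                                   # dispersion: omega = v k

Ex = E0 * sp.cos(k * z - w * t)             # E along x; B along y with |B| = |E|/v
F = (sp.sqrt(eps) * sp.Matrix([Ex, 0, 0])
     + sp.I * sp.Matrix([0, Ex / v, 0]) / sp.sqrt(mu)) / sp.sqrt(2)

def curl(G):
    # the fields depend on z only in this sketch
    return sp.Matrix([-sp.diff(G[1], z), sp.diff(G[0], z), 0])

residual = (sp.I * sp.diff(F, t) - v * curl(F)).applyfunc(sp.simplify)
assert residual == sp.zeros(3, 1)
print("i dF+/dt = v curl F+ verified for the plane wave")
```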
Thus, by using the Riemann–Silberstein vector it has been possible to re-express the four Maxwell equations (for a medium with constant ε and μ) as two equations: the first contains the two Maxwell equations with curls, and the second contains the two Maxwell equations with divergences. The first of the two equations in Eq. (B3) can be immediately converted into a 3 × 3 matrix representation; however, this representation does not contain the divergence conditions (the first and fourth Maxwell equations) contained in the second equation in Eq. (B3). A further compactification is possible only by expressing the Maxwell equations in a 4 × 4 matrix representation. To this end, using the components of the Riemann–Silberstein vector, we define
\[
\Psi^{+}(\boldsymbol{r},t) =
\begin{pmatrix}
-F_x^{+} + iF_y^{+}\\ F_z^{+}\\ F_z^{+}\\ F_x^{+} + iF_y^{+}
\end{pmatrix}, \qquad
\Psi^{-}(\boldsymbol{r},t) =
\begin{pmatrix}
-F_x^{-} - iF_y^{-}\\ F_z^{-}\\ F_z^{-}\\ F_x^{-} - iF_y^{-}
\end{pmatrix}. \tag{B4}
\]

The vectors for the sources are

\[
W^{+} = \frac{1}{\sqrt{2\epsilon}}
\begin{pmatrix}
-J_x + iJ_y\\ J_z - v\rho\\ J_z + v\rho\\ J_x + iJ_y
\end{pmatrix}, \qquad
W^{-} = \frac{1}{\sqrt{2\epsilon}}
\begin{pmatrix}
-J_x - iJ_y\\ J_z - v\rho\\ J_z + v\rho\\ J_x - iJ_y
\end{pmatrix}. \tag{B5}
\]
Then we obtain

\[
\frac{\partial\Psi^{+}}{\partial t} = -v\left\{\boldsymbol{M}\cdot\boldsymbol{\nabla}\right\}\Psi^{+} - W^{+}, \qquad
\frac{\partial\Psi^{-}}{\partial t} = -v\left\{\boldsymbol{M}^{*}\cdot\boldsymbol{\nabla}\right\}\Psi^{-} - W^{-}, \tag{B6}
\]

where ( )* denotes complex conjugation and the triplet M = (M_x, M_y, M_z) is expressed in terms of

\[
\Omega = \begin{pmatrix} 0 & -1\!\mathrm{l} \\ 1\!\mathrm{l} & 0 \end{pmatrix}, \qquad
\beta = \begin{pmatrix} 1\!\mathrm{l} & 0 \\ 0 & -1\!\mathrm{l} \end{pmatrix}, \qquad
1\!\mathrm{l} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \tag{B7}
\]
Alternatively, the matrix J = −Ω can be used; the two differ by a sign. For our purpose it is fine to use either Ω or J, though they have different meanings: J is contravariant and Ω is covariant; the matrix Ω corresponds to the Lagrange brackets of classical mechanics, and J corresponds to the Poisson brackets. An important relation is Ω = J⁻¹. The M matrices are:
\[
M_x = \begin{pmatrix} 0&0&1&0\\ 0&0&0&1\\ 1&0&0&0\\ 0&1&0&0 \end{pmatrix} = -\beta\Omega, \qquad
M_y = \begin{pmatrix} 0&0&-i&0\\ 0&0&0&-i\\ i&0&0&0\\ 0&i&0&0 \end{pmatrix} = i\Omega, \qquad
M_z = \begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&-1&0\\ 0&0&0&-1 \end{pmatrix} = \beta. \tag{B8}
\]

Each of the four Maxwell equations is easily obtained from the matrix representation in Eq. (B6). This is done by taking the sums and differences of row I with row IV and of row II with row III, respectively; the first three give the y, x, and z components of the curl equations, and the last one gives the divergence conditions built into the evolution equations. Note that the matrices M are all nonsingular and all are Hermitian. Moreover, they satisfy the usual algebra of the Dirac matrices, including

\[
M_x\beta = -\beta M_x, \qquad M_y\beta = -\beta M_y, \qquad
M_x^2 = M_y^2 = M_z^2 = I,
\]
\[
M_xM_y = -M_yM_x = iM_z, \qquad
M_yM_z = -M_zM_y = iM_x, \qquad
M_zM_x = -M_xM_z = iM_y. \tag{B9}
\]
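The algebra in Eq. (B9) can be checked directly; the following short script (added here, and dependent on the matrices as reconstructed in Eq. (B8)) verifies the stated relations and the hermiticity of the triplet:

```python
# Hedged numerical check of the Dirac-type algebra of Eq. (B9).
import numpy as np

Mx = np.array([[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]], dtype=complex)
My = np.array([[0, 0, -1j, 0], [0, 0, 0, -1j], [1j, 0, 0, 0], [0, 1j, 0, 0]])
Mz = np.diag([1, 1, -1, -1]).astype(complex)   # equals beta
beta = Mz

assert np.allclose(Mx @ beta, -beta @ Mx) and np.allclose(My @ beta, -beta @ My)
for M in (Mx, My, Mz):
    assert np.allclose(M @ M, np.eye(4))        # Mx^2 = My^2 = Mz^2 = I
assert np.allclose(Mx @ My, 1j * Mz) and np.allclose(Mx @ My, -My @ Mx)
assert np.allclose(My @ Mz, 1j * Mx) and np.allclose(Mz @ Mx, 1j * My)
print("Eq. (B9) verified; all M Hermitian:",
      all(np.allclose(M, M.conj().T) for M in (Mx, My, Mz)))
```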
Before proceeding further, note the following. The pair (Ψ±, M) is not unique: different choices of Ψ± would give rise to different M, such that the triplet M continues to satisfy the algebra of the Dirac matrices in Eq. (B9). We have preferred the Ψ± built from the Riemann–Silberstein vector [Eq. (B2)] following Bialynicki-Birula (1996b). This vector has certain advantages over the other possible choices: it is well known in classical electrodynamics and has several interesting properties and uses (Bialynicki-Birula, 1996b). In deriving the above 4 × 4 matrix representation of the Maxwell equations, we have ignored the spatial and temporal derivatives of ε(r, t) and μ(r, t); that is, we have treated ε and μ as local constants.
2. INHOMOGENEOUS MEDIUM

The previous section provided the evolution equations for the Riemann–Silberstein vectors in Eq. (B3) for a medium in which ε(r, t) and μ(r, t) are treated as local constants, and from that pair of equations we derived the matrix form of the Maxwell equations. This section provides the exact equations, taking into account the spatial and temporal variations of ε(r, t) and μ(r, t). It is possible to write the required evolution equations directly in terms of ε(r, t) and μ(r, t), but we follow the procedure of Bialynicki-Birula (1996b) and use the two derived laboratory functions

\[
\text{velocity function:}\quad v(\boldsymbol{r},t) = \frac{1}{\sqrt{\epsilon(\boldsymbol{r},t)\mu(\boldsymbol{r},t)}}, \qquad
\text{resistance function:}\quad h(\boldsymbol{r},t) = \sqrt{\frac{\mu(\boldsymbol{r},t)}{\epsilon(\boldsymbol{r},t)}}. \tag{B10}
\]

The function v(r, t) has the dimensions of velocity, and the function h(r, t) has the dimensions of resistance (measured in ohms). We can equivalently use the conductance function, κ(r, t) = 1/h(r, t) = √(ε(r, t)/μ(r, t)) (measured in ohms⁻¹, or mhos!), in place of the resistance function h(r, t). These derived functions allow a more transparent understanding of the dependence on the variations of the medium (Bialynicki-Birula, 1996b); moreover, they are the quantities that are measured experimentally. In terms of these functions, ε = 1/(vh) and μ = h/v.
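For orientation (an added numerical aside, not from the text), evaluating the two laboratory functions of Eq. (B10) for vacuum recovers the familiar constants:

```python
# Hedged numerical aside: v and h of Eq. (B10) in vacuum give c and the
# impedance of free space, approximately 376.7 Ohms.
import math

eps0 = 8.8541878128e-12        # F/m
mu0 = 4 * math.pi * 1e-7       # N/A^2
v = 1 / math.sqrt(eps0 * mu0)  # velocity function -> c
h = math.sqrt(mu0 / eps0)      # resistance function -> Z0
print(f"v = {v:.6e} m/s, h = {h:.2f} Ohm, kappa = {1/h:.6f} Mho")
```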
Using these functions, the exact equations satisfied by F±(r, t) are

\[
i\frac{\partial}{\partial t}\boldsymbol{F}^{+} =
v(\boldsymbol{r},t)\left[\nabla\times\boldsymbol{F}^{+}\right]
+ \frac{1}{2}\left[\nabla v(\boldsymbol{r},t)\times\boldsymbol{F}^{+}\right]
+ \frac{v(\boldsymbol{r},t)}{2h(\boldsymbol{r},t)}\left[\nabla h(\boldsymbol{r},t)\times\boldsymbol{F}^{-}\right]
- \frac{i}{\sqrt{2}}\sqrt{v(\boldsymbol{r},t)h(\boldsymbol{r},t)}\,\boldsymbol{J}
+ \frac{i}{2}\frac{\dot{v}(\boldsymbol{r},t)}{v(\boldsymbol{r},t)}\boldsymbol{F}^{+}
+ \frac{i}{2}\frac{\dot{h}(\boldsymbol{r},t)}{h(\boldsymbol{r},t)}\boldsymbol{F}^{-},
\]
\[
i\frac{\partial}{\partial t}\boldsymbol{F}^{-} =
-v(\boldsymbol{r},t)\left[\nabla\times\boldsymbol{F}^{-}\right]
- \frac{1}{2}\left[\nabla v(\boldsymbol{r},t)\times\boldsymbol{F}^{-}\right]
- \frac{v(\boldsymbol{r},t)}{2h(\boldsymbol{r},t)}\left[\nabla h(\boldsymbol{r},t)\times\boldsymbol{F}^{+}\right]
- \frac{i}{\sqrt{2}}\sqrt{v(\boldsymbol{r},t)h(\boldsymbol{r},t)}\,\boldsymbol{J}
+ \frac{i}{2}\frac{\dot{v}(\boldsymbol{r},t)}{v(\boldsymbol{r},t)}\boldsymbol{F}^{-}
+ \frac{i}{2}\frac{\dot{h}(\boldsymbol{r},t)}{h(\boldsymbol{r},t)}\boldsymbol{F}^{+},
\]
\[
\nabla\cdot\boldsymbol{F}^{+} =
\frac{1}{2v(\boldsymbol{r},t)}\left[\nabla v(\boldsymbol{r},t)\cdot\boldsymbol{F}^{+}\right]
+ \frac{1}{2h(\boldsymbol{r},t)}\left[\nabla h(\boldsymbol{r},t)\cdot\boldsymbol{F}^{-}\right]
+ \frac{1}{\sqrt{2}}\sqrt{v(\boldsymbol{r},t)h(\boldsymbol{r},t)}\,\rho,
\]
\[
\nabla\cdot\boldsymbol{F}^{-} =
\frac{1}{2v(\boldsymbol{r},t)}\left[\nabla v(\boldsymbol{r},t)\cdot\boldsymbol{F}^{-}\right]
+ \frac{1}{2h(\boldsymbol{r},t)}\left[\nabla h(\boldsymbol{r},t)\cdot\boldsymbol{F}^{+}\right]
+ \frac{1}{\sqrt{2}}\sqrt{v(\boldsymbol{r},t)h(\boldsymbol{r},t)}\,\rho, \tag{B11}
\]
where v̇ = ∂v/∂t and ḣ = ∂h/∂t. The evolution equations in Eq. (B11) are exact (for a linear medium), and the dependence on the variations of ε(r, t) and μ(r, t) has been neatly expressed through the two derived functions. The coupling between F⁺(r, t) and F⁻(r, t) is via the gradient and time derivative of only one derived function, namely h(r, t), or equivalently κ(r, t); either can be used, and both are directly measured quantities. We further note that the dependence of the coupling is logarithmic:

\[
\frac{1}{h(\boldsymbol{r},t)}\nabla h(\boldsymbol{r},t) = \nabla\left[\ln h(\boldsymbol{r},t)\right], \qquad
\frac{\dot{h}(\boldsymbol{r},t)}{h(\boldsymbol{r},t)} = \frac{\partial}{\partial t}\left[\ln h(\boldsymbol{r},t)\right], \tag{B12}
\]
where ln is the natural logarithm. The coupling can be best summarized by expressing the equations in Eq. (B11) in (block) matrix form. For this we introduce the logarithmic function

\[
\mathcal{L}(\boldsymbol{r},t) = \frac{1}{2}\left(1\!\mathrm{l}\,\ln v(\boldsymbol{r},t) + \sigma_x\,\ln h(\boldsymbol{r},t)\right), \tag{B13}
\]

where σ_x is from the triplet of the Pauli matrices
\[
\boldsymbol{\sigma} = \left\{
\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad
\sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad
\sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}
\right\}. \tag{B14}
\]
Using the above notation, the matrix form of the equations in Eq. (B11) is

\[
i\left\{1\!\mathrm{l}\,\frac{\partial}{\partial t} - \frac{\partial \mathcal{L}}{\partial t}\right\}
\begin{pmatrix}\boldsymbol{F}^{+}\\ \boldsymbol{F}^{-}\end{pmatrix}
= v(\boldsymbol{r},t)\,\sigma_z\left\{1\!\mathrm{l}\,\boldsymbol{\nabla} + \boldsymbol{\nabla}\mathcal{L}\right\}\times
\begin{pmatrix}\boldsymbol{F}^{+}\\ \boldsymbol{F}^{-}\end{pmatrix}
- \frac{i}{\sqrt{2}}\sqrt{v(\boldsymbol{r},t)h(\boldsymbol{r},t)}\,\boldsymbol{J},
\]
\[
\left\{1\!\mathrm{l}\,\boldsymbol{\nabla} - \boldsymbol{\nabla}\mathcal{L}\right\}\cdot
\begin{pmatrix}\boldsymbol{F}^{+}\\ \boldsymbol{F}^{-}\end{pmatrix}
= \frac{1}{\sqrt{2}}\sqrt{v(\boldsymbol{r},t)h(\boldsymbol{r},t)}\,\rho, \tag{B15}
\]

where the dot product and cross product are to be understood as

\[
\begin{pmatrix} A & B \\ C & D \end{pmatrix}\cdot\begin{pmatrix}\boldsymbol{u}\\ \boldsymbol{v}\end{pmatrix}
= \begin{pmatrix} A\cdot\boldsymbol{u} + B\cdot\boldsymbol{v}\\ C\cdot\boldsymbol{u} + D\cdot\boldsymbol{v}\end{pmatrix}, \qquad
\begin{pmatrix} A & B \\ C & D \end{pmatrix}\times\begin{pmatrix}\boldsymbol{u}\\ \boldsymbol{v}\end{pmatrix}
= \begin{pmatrix} A\times\boldsymbol{u} + B\times\boldsymbol{v}\\ C\times\boldsymbol{u} + D\times\boldsymbol{v}\end{pmatrix}. \tag{B16}
\]
Note that the 6 × 6 matrices in the evolution equations in Eq. (B15) are either Hermitian or antihermitian, and any dependence on the variations of ε(r, t) and μ(r, t) is at best weak. Further note that ∇[ln v(r, t)] = −∇[ln n(r, t)] and (∂/∂t)[ln v(r, t)] = −(∂/∂t)[ln n(r, t)]. In some media the coupling may vanish (∇h(r, t) = 0 and ḣ(r, t) = 0) while in the same media the refractive index n(r, t) = c/v(r, t) may vary (∇n(r, t) ≠ 0 and/or ṅ(r, t) ≠ 0). It may further be possible to use the approximations ∇[ln h(r, t)] ≈ 0 and (∂/∂t)[ln h(r, t)] ≈ 0. We use the following matrices to express the exact representation:
\[
\boldsymbol{\Sigma} = \begin{pmatrix} \boldsymbol{\sigma} & 0 \\ 0 & \boldsymbol{\sigma} \end{pmatrix}, \qquad
\boldsymbol{\alpha} = \begin{pmatrix} 0 & \boldsymbol{\sigma} \\ \boldsymbol{\sigma} & 0 \end{pmatrix}, \qquad
I = \begin{pmatrix} 1\!\mathrm{l} & 0 \\ 0 & 1\!\mathrm{l} \end{pmatrix}, \tag{B17}
\]

where Σ are the Dirac spin matrices and α are the matrices used in the Dirac equation. Then,
\[
\frac{\partial}{\partial t}\begin{pmatrix} \Psi^{+} \\ \Psi^{-} \end{pmatrix}
- \frac{\dot{v}(\boldsymbol{r},t)}{2v(\boldsymbol{r},t)}
\begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix}
\begin{pmatrix} \Psi^{+} \\ \Psi^{-} \end{pmatrix}
+ \frac{\dot{h}(\boldsymbol{r},t)}{2h(\boldsymbol{r},t)}
\begin{pmatrix} 0 & i\beta\alpha_y \\ i\beta\alpha_y & 0 \end{pmatrix}
\begin{pmatrix} \Psi^{+} \\ \Psi^{-} \end{pmatrix}
\]
\[
= -v(\boldsymbol{r},t)
\begin{pmatrix}
\boldsymbol{M}\cdot\boldsymbol{\nabla} + \boldsymbol{\Sigma}\cdot\boldsymbol{u} & -i\beta\left(\boldsymbol{\Sigma}^{*}\cdot\boldsymbol{w}\right)\alpha_y \\
-i\beta\left(\boldsymbol{\Sigma}\cdot\boldsymbol{w}\right)\alpha_y & \boldsymbol{M}^{*}\cdot\boldsymbol{\nabla} + \boldsymbol{\Sigma}^{*}\cdot\boldsymbol{u}
\end{pmatrix}
\begin{pmatrix} \Psi^{+} \\ \Psi^{-} \end{pmatrix}
- \begin{pmatrix} W^{+} \\ W^{-} \end{pmatrix}, \tag{B18}
\]
where

\[
\boldsymbol{u}(\boldsymbol{r},t) = \frac{1}{2v(\boldsymbol{r},t)}\nabla v(\boldsymbol{r},t)
= \frac{1}{2}\nabla\left[\ln v(\boldsymbol{r},t)\right]
= -\frac{1}{2}\nabla\left[\ln n(\boldsymbol{r},t)\right],
\]
\[
\boldsymbol{w}(\boldsymbol{r},t) = \frac{1}{2h(\boldsymbol{r},t)}\nabla h(\boldsymbol{r},t)
= \frac{1}{2}\nabla\left[\ln h(\boldsymbol{r},t)\right]. \tag{B19}
\]
The above representation contains thirteen 8 × 8 matrices! Ten of these are Hermitian; the exceptional ones are the three that contain the components of w(r, t), the logarithmic gradient of the resistance function, and these are antihermitian. We have thus expressed the Maxwell equations in matrix form in a medium with varying permittivity ε(r, t) and permeability μ(r, t), in the presence of sources, using a single equation instead of a pair of matrix equations. We have used 8 × 8 matrices and have been able to separate the dependence of the coupling between the upper components (Ψ⁺) and the lower components (Ψ⁻) through the two laboratory functions. Moreover, the exact matrix representation has an algebraic structure very similar to the Dirac equation. It is interesting to note that the Maxwell equations can be derived from the Fermat principle of geometrical optics by a process of wavization, analogous to the quantization of classical mechanics (Pradhan, 1987). We believe that this representation will be more suitable for some of the studies related to the photon wave function (Bialynicki-Birula, 1996b).
ACKNOWLEDGMENTS

I am grateful to Professor Ramaswamy Jagannathan for my training in the field of the quantum theory of charged-particle beam optics, which was the topic of my doctoral thesis, which he so elegantly supervised. Naturally, we were dealing with the relativistic wave equations, and he taught me the related techniques, including the Foldy–Wouthuysen transformation. I am thankful to him for suggesting the novel project of investigating light beam optics, initially scalar optics (Helmholtz optics), using the Foldy–Wouthuysen transformation. I also thankfully acknowledge the collaboration, during the initial work, with Professor Rajiah Simon. Later, Professor Jagannathan guided me through the logical continuation from scalar optics to vector optics, leading to the matrix formulation of Maxwell optics, again using the Foldy–Wouthuysen transformation. I had the benefit of many discussions with Professor Simon. During the course of my investigations, I had the privilege to enjoy the hospitality of the Institute of Mathematical Sciences (MatScience/IMSc) in Chennai (Madras), India. I also thank Professor Hawkes for showing a keen interest in the quantum theory of charged-particle beam optics, which led Jagannathan and me to write a long and comprehensive chapter a decade ago. Professor Hawkes' continued encouragement has resulted in this chapter.
REFERENCES

Acharya, R., and Sudarshan, E. C. G. (1960). Front description in relativistic quantum mechanics. J. Math. Phys. 1, 532–536.
Ambrosini, D., Ponticiello, A., Schirripa Spagnolo, G., Borghi, R., and Gori, F. (1997). Bouncing light beams and the Hamiltonian analogy. Eur. J. Phys. 18, 284–289.
Asaga, T., Fujita, T., and Hiramoto, M. (2000). EDM operators free from Schiff's theorem. arXiv: hep-ph/0005314.
Bialynicki-Birula, I. (1994). On the wave function of the photon. Acta Phys. Polonica A 86, 97–116.
Bialynicki-Birula, I. (1996a). The photon wave function. In "Coherence and Quantum Optics VII" (J. H. Eberly, L. Mandel, and E. Wolf, eds.), pp. 313–322. Plenum Press, New York.
Bialynicki-Birula, I. (1996b). Photon wave function. In "Progress in Optics," Vol. XXXVI (E. Wolf, ed.), pp. 245–294. Elsevier, Amsterdam.
Bjorken, J. D., and Drell, S. D. (1964). Relativistic Quantum Mechanics. McGraw-Hill, New York.
Born, M., and Wolf, E. (1999). Principles of Optics. Cambridge University Press, Cambridge, UK.
Brown, R. W., Krauss, L. M., and Taylor, P. L. (2001). Obituary of Leslie Lawrence Foldy. Physics Today 54(12), 75–76.
Case, K. M. (1954). Some generalizations of the Foldy-Wouthuysen transformation. Phys. Rev. 95, 1323–1328.
Chen, P. (ed.) (1999). Proceedings of the 15th Advanced ICFA Beam Dynamics Workshop on Quantum Aspects of Beam Physics, January 4–9, 1998, Monterey, California. World Scientific, Singapore.
Chen, P. (ed.) (2002). Proceedings of the 18th Advanced ICFA Beam Dynamics Workshop on Quantum Aspects of Beam Physics, October 15–20, 2000, Capri, Italy. World Scientific, Singapore.
Chen, P., and Reil, K. (eds.) (2003). Proceedings of the Joint 28th ICFA Advanced Beam Dynamics and Advanced & Novel Accelerators Workshop on Quantum Aspects of Beam Physics and Other Critical Issues of Beams in Physics and Astrophysics, January 7–11, 2003, Hiroshima University, Japan. World Scientific, Singapore. http://home.hiroshimau.ac.jp/ogata/qabp/home.html, http://www.slac.stanford.edu/pubs/slacreports/slacr-630.html
Chen, P. (1998). Workshop reports. ICFA Beam Dynamics Newsletter 16, 22–25.
Chen, P. (2000). Workshop reports. ICFA Beam Dynamics Newsletter 23, 13–14.
Chen, P. (2003a). Workshop reports. ICFA Beam Dynamics Newsletter 30, 72–75.
Chen, P. (2003b). Workshop reports. Bulletin of the Association of Asia Pacific Physical Societies 13(1), 34–37.
Conte, M., Jagannathan, R., Khan, S. A., and Pusterla, M. (1996). Beam optics of the Dirac particle with anomalous magnetic moment. Particle Accelerators 56, 99–126.
Costella, J. P., and McKellar, B. H. J. (1995). The Foldy-Wouthuysen transformation. arXiv: hep-ph/9503416; American Journal of Physics 63, 1119–1121.
Dragt, A. J. (1982). A Lie algebraic theory of geometrical optics and optical aberrations. J. Opt. Soc. Am. 72, 372–379.
Dragt, A. J. (1988). Lie Algebraic Method for Ray and Wave Optics. University of Maryland Physics Department Report.
Dragt, A. J., and Forest, E. (1986). Advances in Imaging and Electron Physics, Vol. 67, pp. 65–120. Academic Press, San Diego.
Dragt, A. J., Forest, E., and Wolf, K. B. (1986). Foundations of a Lie algebraic theory of geometrical optics. In "Lie Methods in Optics," Lecture Notes in Physics, Vol. 250, pp. 105–157. Springer Verlag, Berlin.
Dragt, A. J., Neri, F., Rangarajan, G., Douglas, D. R., Healy, L. M., and Ryne, R. D. (1988). Lie algebraic treatment of linear and nonlinear beam dynamics. Ann. Rev. Nucl. Part. Sci. 38, 455–496.
Fedele, R., and Man'ko, V. I. (1999). The role of semiclassical description in the quantum-like theory of light rays. Phys. Rev. E 60, 6042–6050.
Feshbach, H., and Villars, F. M. H. (1958). Elementary relativistic wave mechanics of spin 0 and spin 1/2 particles. Rev. Mod. Phys. 30, 24–45.
Fishman, L. (1992). Exact and operator rational approximate solutions of the Helmholtz, Weyl composition equation in underwater acoustics: the quadratic profile. J. Math. Phys. 33(5), 1887–1914.
Fishman, L. (2004). One-way wave equation modeling in two-way wave propagation problems. In "Mathematical Modelling of Wave Phenomena 2002," Mathematical Modelling in Physics, Engineering and Cognitive Sciences, Vol. 7 (B. Nilsson and L. Fishman, eds.), pp. 91–111. Växjö University Press, Växjö, Sweden.
Fishman, L., and McCoy, J. J. (1984). Derivation and application of extended parabolic wave theories. Part I. The factored Helmholtz equation. J. Math. Phys. 25, 285–296.
Foldy, L. L. (1952). The electromagnetic properties of the Dirac particles. Phys. Rev. 87(5), 682–693.
Foldy, L. L. (2006). Origins of the FW transformation: a memoir, Appendix G. In "Physics at a Research University, Case Western Reserve University 1830–1990" (William Fickinger, ed.), pp. 347–351. http://www.phys.cwru.edu/history
Foldy, L. L., and Wouthuysen, S. A. (1950). On the Dirac theory of spin 1/2 particles and its non-relativistic limit. Phys. Rev. 78, 29–36.
Forbes, G. W. (2001). Hamilton's optics: characterizing ray mapping and opening a link to waves. Optics Photonics News 12(11), 34–38.
Forest, E., and Hirata, K. (1992). A Contemporary Guide to Beam Dynamics. KEK Report 92-12, National Laboratory for High Energy Physics, Tsukuba, Japan.
Forest, E., Berz, M., and Irwin, J. (1989). Part. Accel. 24, 91–97.
Goodman, J. W. (1996). Introduction to Fourier Optics, 2nd ed. McGraw-Hill, New York.
Hawkes, P. W., and Kasper, E. (1989a). "Principles of Electron Optics," Vol. I, "Basic Geometrical Optics." Academic Press, London.
Hawkes, P. W., and Kasper, E. (1989b). "Principles of Electron Optics," Vol. II, "Applied Geometrical Optics." Academic Press, London.
Hawkes, P. W., and Kasper, E. (1994). "Principles of Electron Optics," Vol. III, "Wave Optics." Academic Press, London.
Heinemann, K., and Barber, D. P. (1999). The semiclassical Foldy-Wouthuysen transformation and the derivation of the Bloch equation for spin-1/2 polarised beams using Wigner functions. arXiv: physics/9901044. In Proceedings of the 15th Advanced ICFA Beam Dynamics Workshop on Quantum Aspects of Beam Physics (P. Chen, ed.), January 4–9, 1998, Monterey, California. World Scientific, Singapore.
Jackson, J. D. (1998). Classical Electrodynamics, 3rd ed. John Wiley & Sons, New York.
Jagannathan, R. (1990). Quantum theory of electron lenses based on the Dirac equation. Phys. Rev. A 42, 6674–6689.
Jagannathan, R. (1993). Dirac equation and electron optics. In "Dirac and Feynman: Pioneers in Quantum Mechanics" (R. Dutt and A. K. Ray, eds.), pp. 75–82. Wiley Eastern, New Delhi, India.
Jagannathan, R. (1999). The Dirac equation approach to spin-1/2 particle beam optics. arXiv: physics/9803042, pp. 670–681. In Proceedings of the 15th Advanced ICFA Beam Dynamics
Workshop on Quantum Aspects of Beam Physics (P. Chen, ed.), January 4–9, 1998, Monterey, California. World Scientific, Singapore.
Jagannathan, R. (2002). Quantum mechanics of Dirac particle beam optics: single-particle theory. arXiv: physics/0101060, pp. 568–577. In Proceedings of the 18th Advanced ICFA Beam Dynamics Workshop on Quantum Aspects of Beam Physics (P. Chen, ed.), October 15–20, 2000, Capri, Italy. World Scientific, Singapore.
Jagannathan, R. (2003). Quantum mechanics of Dirac particle beam transport through optical elements with straight and curved axes. arXiv: physics/0304099, pp. 13–21. In Proceedings of the 28th Advanced ICFA Beam Dynamics Workshop on Quantum Aspects of Beam Physics (P. Chen and K. Reil, eds.), January 2003, Hiroshima, Japan. World Scientific, Singapore.
Jagannathan, R., and Khan, S. A. (1995). Wigner functions in charged particle optics. In "Selected Topics in Mathematical Physics: Professor R. Vasudevan Memorial Volume" (R. Sridhar, K. Srinivasa Rao, and V. Lakshminarayanan, eds.), pp. 308–321. Allied Publishers, Delhi, India.
Jagannathan, R., and Khan, S. A. (1996). Quantum theory of the optics of charged particles. In "Advances in Imaging and Electron Physics," Vol. 97 (Peter Hawkes, ed.), pp. 257–358. Academic Press, San Diego.
Jagannathan, R., and Khan, S. A. (1997). Quantum mechanics of accelerator optics. ICFA Beam Dynamics Newsletter 13, 21–27.
Jagannathan, R., Simon, R., Sudarshan, E. C. G., and Mukunda, N. (1989). Quantum theory of magnetic electron lenses based on the Dirac equation. Phys. Lett. A 134, 457–464.
Jayaraman, J. (1975). A note on the recent Foldy-Wouthuysen transformations for particles of arbitrary spin. J. Phys. A Math. Gen. 8, L1–L4.
Khan, S. A. (1997). Quantum Theory of Charged-Particle Beam Optics. Ph.D. Thesis, University of Madras, Chennai, India.
Khan, S. A. (1999a). Quantum theory of magnetic quadrupole lenses for spin-1/2 particles. arXiv: physics/9809032, pp. 682–694. In Proceedings of the 15th Advanced ICFA Beam Dynamics Workshop on Quantum Aspects of Beam Physics (P. Chen, ed.), January 4–9, 1998, Monterey, California. World Scientific, Singapore.
Khan, S. A. (1999b). Quantum aspects of accelerator optics. arXiv: physics/9904063. In Proceedings of the 1999 Particle Accelerator Conference (PAC99) (A. Luccio and W. MacKay, eds.), pp. 2817–2819, March 29–April 2, 1999, New York City, NY (IEEE Catalogue Number: 99CH36366).
Khan, S. A. (2001). The world of synchrotrons. arXiv: physics/0112086. Resonance Journal of Science Education 6(11), 77–84 (monthly publication of the Indian Academy of Sciences).
Khan, S. A. (2002a). Quantum formalism of beam optics. arXiv: physics/0112085, pp. 517–526. In Proceedings of the 18th Advanced ICFA Beam Dynamics Workshop on Quantum Aspects of Beam Physics (P. Chen, ed.), October 15–20, 2000, Capri, Italy. World Scientific, Singapore.
Khan, S. A. (2002b). Analogies between light optics and charged-particle optics. arXiv: physics/0210028. ICFA Beam Dynamics Newsletter 27, 42–48.
Khan, S. A. (2002c). Introduction to synchrotron radiation. arXiv: physics/0112086. Bulletin of the IAPT 19(5), 149–153 (IAPT: Indian Association of Physics Teachers).
Khan, S. A. (2005a). Wavelength-dependent modifications in Helmholtz optics. Int. J. Theoret. Phys. 44(1), 95–125.
Khan, S. A. (2005b). An exact matrix representation of Maxwell's equations. Phys. Scripta 71(5), 440–442.
Khan, S. A. (2006a). The Foldy-Wouthuysen transformation technique in optics. Optik: International Journal for Light and Electron Optics 117(10), 481–488.
Khan, S. A. (2006b). Wavelength-dependent effects in light optics. In "New Topics in Quantum Physics Research" (Volodymyr Krasnoholovets and Frank Columbus, eds.), pp. 163–204. Nova Science Publishers, New York.
Khan, S. A. (2007). Arab origins of the discovery of the refraction of light: Roshdi Hifni Rashed awarded the 2007 King Faisal International Prize. Optics Photonics News 18(10), 22–23.
Khan, S. A., and Jagannathan, R. (1993). Theory of relativistic electron beam transport based on the Dirac equation. In Proceedings of the 3rd National Seminar on Physics and Technology of Particle Accelerators and their Applications (PATPAA-93) (S. N. Chintalapudi, ed.), pp. 102–107, November 25–27, 1993, Kolkata (Calcutta). IUC-DAEF, Kolkata (Calcutta), India.
Khan, S. A., and Jagannathan, R. (1994). Quantum mechanics of charged-particle beam optics: an operator approach. Presented at the JSPS-KEK International Spring School on High Energy Ion Beams: Novel Beam Techniques and Their Applications, March 1994, Tsukuba, Japan.
Khan, S. A., and Jagannathan, R. (1995). On the quantum mechanics of charged particle beam transport through magnetic lenses. Phys. Rev. E 51, 2510–2515.
Khan, S. A., Jagannathan, R., and Simon, R. (2002). Foldy-Wouthuysen transformation and a quasiparaxial approximation scheme for the scalar wave theory of light beams. arXiv: physics/0209082.
Laporte, O., and Uhlenbeck, G. E. (1931). Applications of spinor analysis to the Maxwell and Dirac equations. Phys. Rev. 37, 1380–1397.
Leopold, H. (1997). Obituary of Siegfried A. Wouthuysen. Physics Today 50(11), 89.
Lippert, M., Brückel, Th., Köhler, Th., and Schneider, J. R. (1994). High-resolution bulk magnetic scattering of high-energy synchrotron radiation. Europhys. Lett. 27(7), 537–541.
Majorana, E. (1974). Unpublished notes, quoted after Mignani, R., Recami, E., and Baldo, M., About a Dirac-like equation for the photon, according to Ettore Majorana. Lett. Nuovo Cimento 11, 568–572.
Moses, H. E. (1959). Solutions of Maxwell's equations in terms of a spinor notation: the direct and inverse problems. Phys. Rev. 113(6), 1670–1679.
Mukunda, N., Simon, R., and Sudarshan, E. C. G. (1983a). Paraxial-wave optics and relativistic front description. I. The scalar theory. Phys. Rev. A 28, 2921–2932.
Mukunda, N., Simon, R., and Sudarshan, E. C. G. (1983b). Paraxial-wave optics and relativistic front description. II. The vector theory. Phys. Rev. A 28, 2933–2942.
Mukunda, N., Simon, R., and Sudarshan, E. C. G. (1985). Fourier optics for the Maxwell field: formalism and applications. J. Opt. Soc. Am. A 2(3), 416–426.
Orris, G. J., and Wurmser, D. (1995). Applications of the Foldy-Wouthuysen transformation to acoustic modeling using the parabolic equation method. J. Acoust. Soc. Am. 98, 2870.
Osche, G. R. (1977). Dirac and Dirac-Pauli equation in the Foldy-Wouthuysen representation. Phys. Rev. D 15(8), 2181–2185.
Pachucki, K. (2004). Higher-order effective Hamiltonian for light atomic systems. arXiv: physics/0411168.
Panofsky, W. K. H., and Phillips, M. (1962). Classical Electricity and Magnetism. Addison-Wesley Publishing Company, Reading, Massachusetts, USA.
Patton, R. S. (1986). In "Path Integrals from meV to MeV" (M. Gutzwiller, A. Inomata, J. R. Klauder, and L. Streit, eds.), pp. 98–115. World Scientific, Singapore.
Pradhan, T. (1987). Maxwell's equations from geometrical optics. Phys. Lett. A 122(8), 397–398.
Pryce, M. H. L. (1948). The mass-centre in the restricted theory of relativity and its connexion with the quantum theory of elementary particles. Proc. R. Soc. London A 195, 62–81.
Rangarajan, G., Dragt, A. J., and Neri, F. (1990). Solvable map representation of a nonlinear symplectic map. Part. Accel. 28, 119–124.
Rashed, R. (1990). A pioneer in anaclastics: Ibn Sahl on burning mirrors and lenses. ISIS 81, 464–491.
Rashed, R. (1993). Géométrie et Dioptrique au Xe siècle: Ibn Sahl, al-Quhî et Ibn al-Haytham. Collection Sciences et Philosophie Arabes, Textes et Études. Les Belles Lettres, Paris, France.
Ryne, R. D., and Dragt, A. J. (1991). Magnetic optics calculations for cylindrically symmetric beams. Part. Accel. 35, 129–165.
Silberstein, L. (1907a). Elektromagnetische Grundgleichungen in bivektorieller Behandlung. Ann. Phys. (Leipzig) 22, 579–586.
Silberstein, L. (1907b). Nachtrag zur Abhandlung über elektromagnetische Grundgleichungen in bivektorieller Behandlung. Ann. Phys. (Leipzig) 24, 783–784.
Simon, R., Sudarshan, E. C. G., and Mukunda, N. (1986). Gaussian-Maxwell beams. J. Opt. Soc. Am. A 3(4), 536–540.
Simon, R., Sudarshan, E. C. G., and Mukunda, N. (1987). Cross polarization in laser beams. Appl. Optics 26(9), 1589–1593.
Tani, S. (1951). Connection between particle models and field theories. I. The case of spin 1/2. Prog. Theoret. Phys. 6, 267–285.
Todesco, E. (1999). Overview of single-particle nonlinear dynamics. Presented at the 16th ICFA Beam Dynamics Workshop on Nonlinear and Collective Phenomena in Beam Physics, Arcidosso, Italy, September 1–5, 1998. AIP Conf. Proc. 468, 157–172.
Turchetti, G., Bazzani, A., Giovannozzi, M., Servizi, G., and Todesco, E. (1989). Normal forms for symplectic maps and stability of beams in particle accelerators. In Proceedings of the Dynamical Symmetries and Chaotic Behaviour in Physical Systems, Bologna, Italy, pp. 203–231.
Wurmser, D. (2001). A new strategy for applying the parabolic equation to a penetrable rough surface. J. Acoust. Soc. Am. 109(5), 2300.
Wurmser, D. (2004). A parabolic equation for penetrable rough surfaces: using the Foldy-Wouthuysen transformation to buffer density jumps. Ann. Phys. 311, 53–80.
CHAPTER 3

Nonlinear Systems for Image Processing

Saverio Morfu*, Patrick Marquié*, Brice Nofiélé*, and Dominique Ginhac*

Contents
I. Introduction 79
II. Mechanical Analogy 83
   A. Overdamped Case 84
   B. Inertial Systems 90
III. Inertial Systems 95
   A. Image Processing 95
   B. Electronic Implementation 103
IV. Reaction–Diffusion Systems 108
   A. One-Dimensional Lattice 108
   B. Noise Filtering of a One-Dimensional Signal 111
   C. Two-Dimensional Filtering: Image Processing 119
V. Conclusion 133
VI. Outlooks 134
   A. Outlooks on Microelectronic Implementation 134
   B. Future Processing Applications 135
Acknowledgments 141
Appendix A 142
Appendix B 143
Appendix C 144
Appendix D 145
References 146
* Laboratoire LE2I UMR 5158, Aile des sciences de l'ingénieur, BP 47870, 21078 Dijon Cedex, France

Advances in Imaging and Electron Physics, Volume 152, ISSN 1076-5670, DOI: 10.1016/S1076-5670(08)00603-4. Copyright © 2008 Elsevier Inc. All rights reserved.

I. INTRODUCTION

For almost 100 years, nonlinear science has attracted the attention of researchers seeking to circumvent the limitations of linear theories in the explanation of natural phenomena. Indeed, nonlinear differential equations can model the behavior of ocean surfaces (Scott, 1999), the recurrence of ice ages (Benzi et al., 1982), the transport mechanisms in living cells
(Murray, 1989), the information transmission in neural networks (Izhikevich, 2007; Nagumo et al., 1962; Scott, 1999), the blood pressure propagation in arteries (Paquerot and Remoissenet, 1994), or the excitability of cardiac tissues (Beeler and Reuter, 1977; Keener, 1987). Nonlinear science therefore appears as the most important frontier for a better understanding of nature (Remoissenet, 1999). In the more recent field of engineering science (Agrawal, 2002; Zakharov and Wabnitz, 1998), taking nonlinearity into account has allowed spectacular progress in terms of transmission capacities in optical fibers via the concept of the soliton (Remoissenet, 1999). More recently, nonlinear differential equations in many areas of physics, biology, chemistry, and ecology have inspired unconventional methods of processing that transcend the limitations of classical linear methods (Teuscher and Adamatzky, 2005).
This growing interest in processing applications based on the properties of nonlinear systems can be explained by the observation that fundamental progress in several fields of computer science sometimes seems to stagnate. Novel ideas derived from interdisciplinary fields often open new directions of research with unsuspected applications (Teuscher and Adamatzky, 2005). On the other hand, complex processing tasks require intelligent systems capable of adapting and learning by mimicking the behavior of the human brain. Biologically inspired systems, most often described by nonlinear reaction-diffusion equations, have been proposed as convenient solutions to very complicated problems inaccessible to modern von Neumann computers.
It was in this context that the concept of the cellular neural network (CNN) was introduced by Chua and Yang as a novel class of information-processing systems with potential applications in areas such as image processing and pattern recognition (Chua and Yang, 1988a, 1988b). Indeed, the CNN paradigm is invoked both in the context of brain science and in the context of emergence and complexity (Chua, 1998). Since the pioneering work of Chua, the CNN paradigm has rapidly evolved to cover a wide range of applications drawn from numerous disciplines, including artificial life, biology, chemistry, physics, information science, unconventional methods of computing (Holden et al., 1991), video coding (Arena et al., 2003; Venetianer et al., 1995), quality control by visual inspection (Occhipinti et al., 2001), cryptography (Caponetto et al., 2003; Yu and Cao, 2006), and signal-image processing (Julián and Dogaru, 2002); see Tetzlaff (2002) for an overview of the applications. In summary, the past two decades devoted to the study of CNNs have led scientists to solve problems of artificial intelligence by combining the highly parallel multiprocessor architecture of CNNs with the properties inherited from nonlinear bio-inspired systems.
Among the tasks of high computational complexity routinely performed with nonlinear systems are finding the optimal path in a two-dimensional (2D) vector field (Agladze et al., 1997), image skeletonization (Chua, 1998), finding
the shortest path in a labyrinth (Chua, 1998; Rambidi and Yakovenchuk, 2001), and controlling mobile robots (Adamatzky et al., 2004). However, the efficiency of these nonlinear systems for signal-image processing or pattern recognition does not come only from their biological background. Indeed, the nonlinearity offers an additional dimension lying in the signal amplitude, which gives rise to novel properties not shared by linear systems. Noise removal with a nonlinear dissipative lattice (Comte et al., 1998; Marquié et al., 1998), contrast enhancement based on the properties of nonlinear oscillators (Morfu and Comte, 2004), edge detection exploiting vibration noise (Hongler et al., 2003), and noise-aided optimization and signal detection via the stochastic resonance phenomenon (Chapeau-Blondeau, 2000; Comte and Morfu, 2003; Gammaitoni et al., 1998) constitute a non-exhaustive list of examples in which the properties of nonlinear systems have made it possible to overcome the limitations of classical linear approaches. Owing to the rich variety of potential applications inspired by nonlinear systems, the efforts of researchers have focused on the experimental realization of such efficient information-processing devices. Two different strategies were introduced (Chua and Yang, 1988a; Kuhnert, 1986), and today the fascinating challenge of implementing artificial intelligence with CNNs is still being investigated. The first technique dates from the late 1980s with the work of Kuhnert, who proposed taking advantage of the properties of Belousov–Zhabotinsky-type media for image-processing purposes (Kuhnert, 1986; Kuhnert et al., 1989). The primary concept is that each micro-volume of the active photosensitive chemical medium acts as a one-bit processor corresponding to the reduced/oxidized state of the catalyst (Agladze et al., 1997). This feature of chemical photosensitive nonlinear media has allowed implementation of numerous tools for image processing. Edge enhancement, classical operations of mathematical morphology, the restoration of individual components of an image with overlapped components (Rambidi et al., 2002), image skeletonization (Adamatzky et al., 2002), the detection of urban roads, and the analysis of medical images (Teuscher and Adamatzky, 2005) represent a brief overview of processing tasks computed by chemical nonlinear media. However, even considering the large number of chemical "processors," the very low velocity of trigger waves in chemical media is sometimes incompatible with the real-time processing constraints imposed by practical applications (Agladze et al., 1997). Nevertheless, these limitations in no way dismiss the efficiency and high prospects of the processing methods developed with active chemical media (Adamatzky and de Lacy Costello, 2003). By contrast, analog circuits do not share this weakness of the chemical strategy of integration. Therefore, because of their real-time processing
capability, electronic hardware devices constitute the most common way to implement CNNs (Chua and Yang, 1988a). The first step in electronically developing a CNN for image-processing purposes consists of designing an elementary cell. More precisely, this basic unit of CNNs usually contains linear capacitors, linear resistors, and linear and nonlinear controlled sources (Chua and Yang, 1988b; Comte and Marquié, 2003). Next, to complete the description of the network, a coupling law between cells is introduced. Owing to the propagation mechanism inherited from the continuous-time dynamics of the network, the cells interact not only with their nearest neighbors but also with cells that are not directly connected. Among the applications that can be electronically realized are character recognition (Chua and Yang, 1988b), edge filtering (Chen et al., 2006; Comte et al., 2001), noise filtering (Comte et al., 1998; Julián and Dogaru, 2002; Marquié et al., 1998), and contrast enhancement and gray-level extraction with a network of nonlinear oscillators (Morfu, 2005; Morfu et al., 2007). The principle of CNN integration with discrete electronic components is closely related to the development of nonlinear electrical transmission lines (NLTLs) (Remoissenet, 1999). Indeed, under certain conditions (Chua, 1998), the parallel processing of information can be ruled by nonlinear differential equations that also describe the evolution of the voltage at the nodes of an electrical lattice. It is then clear that a one-dimensional (1D) lattice allows signal filtering, while extending the concept to a 2D network provides image-processing applications. The development of NLTLs was motivated mainly by the fact that these quite simple and relatively inexpensive experimental devices allow quantitative study of the properties of nonlinear waves (Scott, 1970). In particular, since the pioneering works of Hirota and Suzuki (1970) and Nagashima and Amagishi (1978) on electrical lines simulating the Toda lattice (Toda, 1967), these NLTLs, which can be considered analog simulators, have provided a useful way to determine the behavior of excitations inside a nonlinear medium (Jäger, 1985; Kuusela, 1995; Marquié et al., 1995; Yamgoué et al., 2007). This chapter is devoted primarily to the presentation of a few particular nonlinear processing tools and discusses their electronic implementation with discrete components. After a brief mechanical description of nonlinear systems, we review the properties of both purely inertial systems and overdamped systems. The following sections show how taking advantage of these properties allows the development of unconventional processing methods. In particular, considering the features of purely inertial systems, we show how to perform various image-processing tasks, such as contrast enhancement of a weakly contrasted picture, extraction of gray levels, or encryption of an image. The electronic sketch of the elementary cell of this
inertial CNN is proposed, and the nonlinear properties that allow the previous image-processing tasks are experimentally investigated. The third part of this chapter is devoted exclusively to the filtering applications inspired by reaction-diffusion media—for example, noise filtering, edge detection, or extraction of regions of interest in a weakly contrasted noisy picture. In each case, the elementary cell of the electronic CNN is developed and its behavior is experimentally investigated in the specific context of signal-image processing. We conclude by discussing possible microelectronic implementations of the previous nonlinear systems. In addition, the last section contains some perspectives for future developments inspired by recently established properties of nonlinear systems. In particular, we present a paradoxical nonlinear effect known as stochastic resonance (Benzi et al., 1982; Chapeau-Blondeau, 1999; Gammaitoni et al., 1998), which is purported to have potential applications in visual perception (Simonotto et al., 1997). We trust that the multiple topics of this contribution will assist readers in better understanding the potential applications based on the properties of nonlinear systems. Moreover, the various electronic realizations presented constitute a solid background for future experiments and studies devoted to nonlinear phenomena. As this chapter is written for an interdisciplinary readership of physicists and engineers, it is our hope that it will encourage readers to perform their own experiments.
II. MECHANICAL ANALOGY

In order to understand the image-processing tools inspired by the properties of nonlinear systems, we present a mechanical analogy of these nonlinear systems. From a mechanical point of view, we consider a chain of particles of mass M subjected to a nonlinear force f deriving from a potential φ and coupled with springs of strength D. If W_n represents the displacement of particle n, the fundamental principle of mechanics is written as
M \frac{d^2 W_n}{dt^2} + \lambda \frac{dW_n}{dt} = -\frac{d\phi}{dW_n} + R_n,   (1)

where M \frac{d^2 W}{dt^2} represents the inertia term and \lambda \frac{dW}{dt} corresponds to a friction force. Furthermore, the resulting elastic force R_n applied to the nth particle by its neighbors can be defined by
R_n = D \sum_{j \in N_r} \left( W_j - W_n \right),   (2)
where N_r is the neighborhood, namely N_r = {n − 1, n + 1} in the case of a 1D chain. We investigate separately the purely inertial case, that is, M \frac{d^2 W}{dt^2} \gg \lambda \frac{dW}{dt}, and the overdamped case, deduced when M \frac{d^2 W}{dt^2} \ll \lambda \frac{dW}{dt}.
A. Overdamped Case In this section, an overdamped system is presented by neglecting the inertia term of Eq. (1) compared to the friction force. We specifically consider λ = 1 and the case of a cubic nonlinear force
f(W) = -W(W - \alpha)(W - 1),   (3)
deriving from the double-well potential \phi(W) = -\int_0^W f(u)\,du, represented in Figure 1 for different values of α. The roots 0 and 1 of the nonlinear force correspond to the positions of the local minima of the potential, namely the well bottoms, whereas the root α represents the position of the potential maximum. The nonlinearity threshold α sets the potential barrier between the potential minimum with the highest energy and the potential maximum. To explain the propagation mechanism in this chain, it is convenient to define the excited state as the position of the potential minimum with the highest energy, and the rest state as the position corresponding to the minimum of the potential energy. As shown in Figure 1a,
FIGURE 1 Double-well potential deduced from the nonlinear force (3). (a) For α < 1/2 the well bottom with highest energy is located at W = 0; the potential barrier is given by \Delta\phi = -\int_0^\alpha f(u)\,du = \phi(\alpha) - \phi(0). (b) For α > 1/2 the symmetry of the potential is reversed: W = 1 becomes the position of the well bottom of highest energy, and the potential barrier is \Delta\phi = -\int_1^\alpha f(u)\,du = \phi(\alpha) - \phi(1).
the excited state is 0 and the rest state is 1 when the nonlinearity threshold α < 1/2. In the case α > 1/2, since the potential symmetry is reversed, the excited state becomes 1 and the rest state is 0 (Figure 1b). The equation that rules this overdamped nonlinear system can be deduced from Eq. (1). Indeed, when the second derivative with respect to time is neglected compared to the first derivative and when λ = 1, Eq. (1) reduces to the discrete version of Fisher's equation, introduced in the 1930s as a model for genetic diffusion (Fisher, 1937):
\frac{dW_n}{dt} = D\left( W_{n+1} + W_{n-1} - 2W_n \right) + f(W_n).   (4)
1. Uncoupled Case We first investigate the uncoupled case, that is, D = 0 in Eq. (4), to determine the bistability of the system. The behavior of a single particle of displacement W and initial position W 0 obeys
\frac{dW}{dt} = -W(W - \alpha)(W - 1).   (5)
The zeros of the nonlinear force f, W = 1 and W = 0, correspond to stable steady states, whereas the state W = α is unstable. The stability analysis can be carried out by solving Eq. (5) with the nonlinear force f = −W(W − α)(W − 1) replaced by its linearized expression near each steady state W* ∈ {0, 1, α}. If f_W(W*) denotes the derivative of the nonlinear force with respect to W at W = W*, we are led to solve
\frac{dW}{dt} = f_W(W^*)(W - W^*) + f(W^*).   (6)
The solution of Eq. (6) can then be easily expressed as

W(t) = W^* + C e^{f_W(W^*)\, t} - \frac{f(W^*)}{f_W(W^*)},   (7)
where C is a constant depending on the initial condition, that is, the initial position of the particle. The solution in Eq. (7), obtained with a linear approximation of the nonlinear force f, shows that the stability is set by the sign of the argument of the exponential function. Indeed, for W* = 0 and W* = 1, the sign of f_W(W*) is negative, implying that W(t → ∞) tends to a constant. Therefore, the two points W* = 0 and W* = 1 are stable steady states. Conversely, for W* = α, f_W(W*) is
positive, inducing a divergence of W(t → ∞): W* = α is an unstable steady state. We now focus our attention on the particular case α = 1/2, since it allows interesting applications in signal and image processing. This case is developed in detail in Appendix A, where it is shown that the displacement of a particle with initial position W^0 can be expressed as
W(t) = \frac{1}{2} \left( 1 + \frac{W^0 - \frac{1}{2}}{\sqrt{\left(W^0 - \frac{1}{2}\right)^2 - W^0 \left(W^0 - 1\right) e^{-\frac{t}{2}}}} \right).   (8)
This theoretical expression is compared in Figure 2 to the numerical results obtained by solving Eq. (5) with a fourth-order Runge–Kutta algorithm with integration time step dt = 10⁻³. As shown in Figure 2, when the initial condition W^0 is below the unstable state α = 1/2, the particle evolves toward the steady state 0. Otherwise, if the initial condition W^0 exceeds the unstable state α = 1/2, the particle evolves toward the other steady state 1. Therefore, the unstable state α = 1/2 acts as a threshold and the system exhibits a bistable behavior.
FIGURE 2 Bistable behavior of the overdamped system in the case α = 1/2. Left: evolution of a particle for different initial conditions in the range [0; 1]. The solid line is plotted with the analytical expression in Eq. (8), whereas the (o) signs correspond to the numerical solution of Eq. (5) for different initial conditions W^0 ∈ [0; 1]. The potential φ obtained by integrating the nonlinear force (3) is represented at the right as a reference.
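The bistability described above is straightforward to reproduce numerically. The following Python sketch (an illustration written for this chapter, not the authors' original code) integrates Eq. (5) with a fourth-order Runge–Kutta scheme and time step dt = 10⁻³, as in Figure 2, and compares the result with the analytical solution (8):

import numpy as np

def f(w, alpha=0.5):
    # Cubic nonlinear force of Eq. (3)
    return -w * (w - alpha) * (w - 1.0)

def rk4_step(w, dt):
    # One fourth-order Runge-Kutta step for dW/dt = f(W), Eq. (5)
    k1 = f(w)
    k2 = f(w + 0.5 * dt * k1)
    k3 = f(w + 0.5 * dt * k2)
    k4 = f(w + dt * k3)
    return w + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def w_exact(t, w0):
    # Analytical solution of Eq. (8), valid for alpha = 1/2
    return 0.5 * (1.0 + (w0 - 0.5) / np.sqrt((w0 - 0.5) ** 2
                  - w0 * (w0 - 1.0) * np.exp(-t / 2.0)))

dt, t_end = 1e-3, 10.0
for w0 in (0.1, 0.45, 0.55, 0.9):
    w = w0
    for _ in range(int(t_end / dt)):
        w = rk4_step(w, dt)
    print(f"W0 = {w0:4.2f}: numerical W(10) = {w:.4f}, exact = {w_exact(t_end, w0):.4f}")

Initial conditions below α = 1/2 relax toward 0 and those above relax toward 1, reproducing the threshold behavior of Figure 2.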
2. Coupled Case
We now consider the coupled case (D ≠ 0). In such systems, ruled by Eq. (4), the balance between the dissipation and the nonlinearity gives rise to the propagation of a kink (a localized wave), called a diffusive soliton, that propagates with constant velocity and profile (Remoissenet, 1999). To understand the propagation mechanism, we first consider the weak coupling limit and the case α < 1/2. The case of strong coupling, which corresponds to a continuous medium, is discussed later since it allows theoretical characterization of the waves propagating in the medium.
a. Weak Coupling Limit. As shown in Figure 3a, initially all particles of the chain are located at the position 0—the excited state. To initiate a kink, an external forcing allows the first particle to cross the potential barrier at W = α and to fall into the right well, at the rest state defined by the position W = 1. Thanks to the spring coupling the first particle to the second one, but despite the second spring, the second particle attempts to cross the potential barrier of height \Delta\phi(\alpha) = \frac{\alpha^3}{6} - \frac{\alpha^4}{12} (Morfu, 2003) (see Figure 3b).
FIGURE 3 Propagation mechanism. (a) Initially all particles of the chain are in the excited state 0, that is, at the bottom of the well with highest energy. (b) State of the chain for t > 0. The first particle has crossed the potential barrier and attempts to pull the second particle down in its fall.
According to the value of the resulting force applied to the second particle by the two springs, compared to the nonlinear force f on the interval [0; α[, two behaviors may occur:

1. If the resulting elastic force is sufficiently large to allow the second particle to cross the potential barrier Δφ(α), then this particle falls into the right well and pulls the next particle down in its fall. Since each particle of the chain successively undergoes a transition from the excited state 0 to the rest state 1, a kink propagates in the medium. Moreover, its velocity increases with the coupling and as the barrier decreases (namely, as α decreases).
2. Otherwise, if the resulting force does not exceed a critical value (i.e., if D < D*(α)), the second particle cannot cross the potential barrier and thus stays pinned at a position w in [0; α[: this is the well-known propagation failure effect (Comte et al., 2001; Erneux and Nicolis, 1993; Keener, 1987; Kladko et al., 2000), illustrated by the numerical sketch below.

The mechanical model associated with Eq. (4) shows that in the weak coupling limit the characteristics of the nonlinear system are ruled by the coupling D and the nonlinearity threshold α. Moreover, the propagation of a kink is due to the transition from the excited state to the rest state and is only possible when the coupling D exceeds the critical value D*(α).
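The propagation failure effect is easy to illustrate in a few lines of Python. The sketch below is illustrative only: it uses a simple explicit Euler scheme, a threshold α = 0.3, and coupling values chosen on either side of the critical coupling D*(α), all of which are assumptions of this example rather than values taken from the text. It integrates the discrete Nagumo equation (4) after placing the first cell in the rest state:

import numpy as np

def f(w, alpha=0.3):
    # Cubic force of Eq. (3)
    return -w * (w - alpha) * (w - 1.0)

def simulate_chain(D, alpha=0.3, N=100, dt=1e-2, t_end=200.0):
    # Discrete Nagumo equation (4) with zero-flux boundaries; the first
    # particle is initially placed in the rest state to initiate a kink.
    W = np.zeros(N)
    W[0] = 1.0
    for _ in range(int(t_end / dt)):
        lap = np.zeros(N)
        lap[1:-1] = W[2:] + W[:-2] - 2.0 * W[1:-1]
        lap[0] = W[1] - W[0]
        lap[-1] = W[-2] - W[-1]
        W += dt * (D * lap + f(W, alpha))
    return W

for D in (0.01, 1.0):   # assumed to lie below and above D*(alpha)
    W = simulate_chain(D)
    print(f"D = {D}: {np.sum(W > 0.5)} cells have switched to the rest state")

For the small coupling the front stays pinned on the first cell, whereas for the large coupling the transition 0 → 1 invades the chain.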
b. Limit of Continuous Media. The velocity of the kink and its profile can be theoretically obtained in the limit of continuous media—when the coupling D is large enough compared to the nonlinear strength. In the continuous limit, the discrete Laplacian of Eq. (4) can be replaced by a second derivative versus the space variable z:

\frac{\partial W}{\partial t} = D \frac{\partial^2 W}{\partial z^2} + f(W).   (9)
This equation, introduced by Nagumo in the 1940s as an elementary representation of the conduction along an active nerve fiber, has an important meaning in understanding transport mechanisms in biological systems (Murray, 1989; Nagumo et al., 1962). Unlike the discrete Equation (4), the continuous Equation (9) admits a propagative kink solution whose velocity vanishes only if \int_0^1 f(u)\,du = 0, a condition that reduces to α = 1/2 in the case of the cubic force (3) (Scott, 1999). Introducing the propagative variable ξ = z − ct, these kinks and antikinks have the form (Fife, 1979; Henry, 1981)
! " 1 1 1 ± tanh √ (ξ − ξ0 ) , W(ξ) = 2 2 2D
(10)
where ξ₀ is the initial position of the kink for t = 0 and where the kink velocity is defined by c = \pm\sqrt{D/2}\,(1 - 2\alpha). When α < 1/2, the excited state is 0 and the rest state is 1. Therefore, the rest state 1 spreads in the chain, and the profile of the kink initiated in the nonlinear system sets the sign of the velocity:

1. If the profile is given by W(\xi) = \frac{1}{2}\left[1 - \tanh\left(\frac{1}{2\sqrt{2D}}(\xi - \xi_0)\right)\right], a kink propagates from left to right with a positive velocity c = \sqrt{D/2}\,(1 - 2\alpha) (Figure 4a, left).
2. Otherwise, if the profile is set by W(\xi) = \frac{1}{2}\left[1 + \tanh\left(\frac{1}{2\sqrt{2D}}(\xi - \xi_0)\right)\right], a kink propagates from right to left with a negative velocity c = -\sqrt{D/2}\,(1 - 2\alpha) (Figure 4a, right).

When α > 1/2, since the symmetry of the potential is reversed, the excited state becomes 1 and the rest state is 0. The propagation is then due to a transition between 1 and 0, which provides the following behavior:

1. If W(\xi) = \frac{1}{2}\left[1 - \tanh\left(\frac{1}{2\sqrt{2D}}(\xi - \xi_0)\right)\right], a kink propagates from right to left with a negative velocity c = \sqrt{D/2}\,(1 - 2\alpha) (Figure 4b, left).
FIGURE 4 Propagative solution of the continuous Nagumo Equation (9) with D = 1. Spatial representation of the kink for t = 0 in dotted line and for t = 20 in solid line. The arrow indicates the propagation direction, the corresponding potential is represented at the right end to provide a reference. (a) α = 0.3, (b) α = 0.7.
" − ξ0 ) , a kink propagates from left ; to right with a positive velocity c = − D/2(1 − 2α) (Figure 4b, right).
2. Else if W(ξ) =
1 2
!
1 + tanh
√1 (ξ 2 2D
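The kink velocity quoted above can be checked by direct simulation. In the following sketch (an illustration under assumed discretization parameters, using the lattice equation (4) with a coupling large enough to approach the continuum limit), the initial profile is the kink (10) with the minus sign and the front position is tracked over time:

import numpy as np

alpha, D, N, dt = 0.3, 1.0, 400, 5e-3
x = np.arange(N)
# Kink profile of Eq. (10) with the minus sign (front moving left to right)
W = 0.5 * (1.0 - np.tanh((x - 50.0) / (2.0 * np.sqrt(2.0 * D))))

def rhs(W):
    lap = np.empty_like(W)
    lap[1:-1] = W[2:] + W[:-2] - 2.0 * W[1:-1]
    lap[0], lap[-1] = W[1] - W[0], W[-2] - W[-1]   # zero-flux boundaries
    return D * lap - W * (W - alpha) * (W - 1.0)

def front(W):
    # Interpolated position where the profile crosses 1/2
    i = np.argmax(W < 0.5)
    return (i - 1) + (W[i - 1] - 0.5) / (W[i - 1] - W[i])

t_meas = 400.0
x0 = front(W)
for _ in range(int(t_meas / dt)):
    W += dt * rhs(W)
print(f"measured c = {(front(W) - x0) / t_meas:.3f},",
      f"theory c = {np.sqrt(D / 2.0) * (1.0 - 2.0 * alpha):.3f}")

Up to small discreteness corrections, the measured velocity matches c = \sqrt{D/2}\,(1 − 2α) ≈ 0.283 for α = 0.3 and D = 1.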
B. Inertial Systems
In this section, we neglect the dissipative term of Eq. (1) compared to the inertia term and we restrict our study to the uncoupled case. Moreover, in an image-processing context, it is convenient to introduce a nonlinear force f of the form
f(W) = -\omega_0^2 (W - m)(W - m - \alpha)(W - m + \alpha),   (11)
where m and α < m are two parameters that allow adjusting the width and the height \Delta = \omega_0^2 \alpha^4 / 4 of the barrier of the potential (Figure 5):
E(W) = -\int_0^W f(u)\,du.   (12)
The nonlinear differential equation that rules the uncoupled chain can be deduced by inserting the nonlinear force (11) into Eq. (1) with D = 0.
FIGURE 5 Double-well potential deduced from the nonlinear force (11), represented for m = 2.58, α = 1.02, and ω₀ = 1. A particle with an initial condition W_i^0 < m − α√2 evolves with an initial potential energy above the barrier Δ.
Neglecting the dissipative term, the particles of unitary mass are then ruled by the following nonlinear oscillator equations:
\frac{d^2 W_i}{dt^2} = f(W_i).   (13)
1. Theoretical Analysis We propose here to determine analytically the dynamics of the nonlinear oscillators obeying Eq. (13) (Morfu and Comte, 2004; Morfu et al., 2006). Setting xi = Wi − m, Eq. (13) can be rewritten as
\frac{d^2 x_i}{dt^2} = -\omega_0^2\, x_i (x_i - \alpha)(x_i + \alpha).   (14)
Denoting by x_i^0 the initial position of particle i and considering that all particles initially have zero velocity, the solutions of Eq. (14) can be expressed with Jacobian elliptic functions as
x_i(t) = x_i^0\, \mathrm{cn}(\omega_i t, k_i),   (15)
where ω_i and 0 ≤ k_i ≤ 1 represent, respectively, the pulsation and the modulus of the cn function (the properties of Jacobian elliptic functions are recalled in Appendix B). Differentiating Eq. (15) twice and using the properties in Eq. (B3) yields
\frac{dx_i}{dt} = -x_i^0\, \omega_i\, \mathrm{sn}(\omega_i t, k_i)\, \mathrm{dn}(\omega_i t, k_i),

\frac{d^2 x_i}{dt^2} = -x_i^0\, \omega_i^2\, \mathrm{cn}(\omega_i t, k_i) \left[ \mathrm{dn}^2(\omega_i t, k_i) - k_i^2\, \mathrm{sn}^2(\omega_i t, k_i) \right].   (16)
Using the identities in Eq. (B4) and (B5), Eq. (16) can be rewritten as
\frac{d^2 x_i}{dt^2} = -\frac{2 k_i^2 \omega_i^2}{(x_i^0)^2}\, x_i \left( x_i^2 - \frac{2 k_i^2 - 1}{2 k_i^2}\, (x_i^0)^2 \right).   (17)
Identifying this last expression with Eq. (14), we derive the pulsation of the Jacobian elliptic function
\omega_i = \omega_0 \sqrt{(x_i^0)^2 - \alpha^2},   (18)
and its modulus

k_i^2 = \frac{1}{2}\, \frac{(x_i^0)^2}{(x_i^0)^2 - \alpha^2}.   (19)
Finally, introducing the initial condition Wi0 = xi0 + m, the solution of Eq. (13) can be straightforwardly deduced from Eqs. (15), (18), and (19):
W_i(t) = m + \left( W_i^0 - m \right) \mathrm{cn}(\omega_i t, k_i),   (20)
with

\omega_i(W_i^0) = \omega_0 \sqrt{\left(W_i^0 - m\right)^2 - \alpha^2} \qquad \text{and} \qquad k_i^2(W_i^0) = \frac{1}{2}\, \frac{\left(W_i^0 - m\right)^2}{\left(W_i^0 - m\right)^2 - \alpha^2}.   (21)
Both the modulus and the pulsation are driven by the initial condition W_i^0. Moreover, the constraints ensuring the existence of the pulsation ω_i and of the modulus are written, respectively, as (W_i^0 − m)² − α² ≥ 0 and 0 ≤ k_i ≤ 1. These two conditions restrict the allowed initial conditions W_i^0 to ]−∞; m − α√2] ∪ [m + α√2; +∞[, as shown in Figure 6, where the pulsation and the modulus are represented versus the initial condition W_i^0. Note that this allowed range of initial conditions also corresponds to a particle with an initial potential energy exceeding the barrier Δ between the potential extrema (see Figure 5).
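The closed-form solution (20)–(21) can be evaluated directly with standard special-function libraries. The Python sketch below is an illustration (not the authors' code); note that scipy.special.ellipj takes the parameter m = k² rather than the modulus k. It anticipates the two-oscillator experiment of the next subsection:

import numpy as np
from scipy.special import ellipj

def oscillator(t, W0, m=2.58, alpha=1.02, w0=1.0e4):
    # Trajectory W_i(t) of Eq. (20) for an initial position W0 at rest
    x0 = W0 - m
    w = w0 * np.sqrt(x0**2 - alpha**2)        # pulsation, Eq. (21)
    k2 = 0.5 * x0**2 / (x0**2 - alpha**2)     # squared modulus, Eq. (21)
    _, cn, _, _ = ellipj(w * t, k2)           # ellipj expects m = k^2
    return m + x0 * cn

t = np.linspace(0.0, 3.0e-3, 30001)
delta = oscillator(t, 0.2) - oscillator(t, 0.0)   # W2(t) - W1(t)
i = delta.argmax()
print(f"max delta = {delta[i]:.2f} at t = {t[i]:.2e}")

With ω₀ = 10⁴ the difference peaks near δ ≈ 4.96 at t ≈ 1.64 × 10⁻³, as in Figure 7b.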
2. Nonlinear Oscillator Properties
To illustrate the properties of the nonlinear oscillators, we consider a chain of N = 2 particles with a weak difference of initial conditions and with null initial velocity. The dynamics of these two oscillators are ruled by Eq. (20), where the pulsation and modulus of each oscillator are driven by its initial condition. Moreover, we restrict our study to the nonlinearity parameters m = 2.58, α = 1.02, ω₀ = 10⁴. We apply the initial condition W_1^0 = 0 to the first oscillator, while the initial condition of the second oscillator is set to W_2^0 = 0.2, which corresponds to the situation of Figure 5. Figure 7a shows that the oscillations of both particles take place in the range [W_i^0; 2m − W_i^0] predicted by Eq. (20), that is, [0; 5.16] for the first oscillator and [0.2; 4.96] for the second one. Moreover, owing to their difference of initial amplitude and to the nonlinear behavior of the system, the two oscillators quickly reach phase opposition, for the first time at t = topt = 1.64 × 10⁻³. This phase opposition corresponds to the
FIGURE 6 (a) Normalized pulsation ω/ω₀ versus the initial condition W_i^0. (b) Modulus k versus W_i^0. The parameters of the nonlinearity, m = 2.58 and α = 1.02, impose the allowed amplitude range ]−∞; 1.137] ∪ ]4.023; +∞[.
FIGURE 7 (a) Temporal evolution of the two oscillators. Top panel: evolution of the first oscillator with initial condition W10 = 0. Bottom panel: evolution of the second oscillator with initial condition W20 = 0.2. (b) Temporal evolution of the displacement difference δ between the two oscillators. Parameters: m = 2.58, α = 1.02, and ω0 = 1.
situation where the first oscillator has reached its minimum W₁(topt) = 0, whereas the second oscillator has attained its maximum W₂(topt) = 4.96. As shown in Figure 7b, the displacement difference δ(t) = W₂(t) − W₁(t) is then maximum for t = topt, where it reaches δ(topt) = 4.96. At this optimal time, a "contrast enhancement" of the weak difference of initial conditions is realized, since initially the displacement difference was δ(t = 0) = 0.2. Note that in Figure 7b the displacement difference between the two oscillators also presents a periodic behavior with local minima and local maxima. In particular, the difference δ(t) is null for t = 3.96 × 10⁻⁵, t = 1.81 × 10⁻⁴, t = 3.5 × 10⁻⁴, and t = 5.21 × 10⁻⁴; minimum for t = 1.4 × 10⁻⁴, t = 4.64 × 10⁻⁴, and t = 1.47 × 10⁻³; and maximum for t = 3 × 10⁻⁴, t = 6.29 × 10⁻⁴, and t = 1.64 × 10⁻³. These characteristic times will be of crucial interest in the image-processing context to define the filtering tasks performed by the nonlinear oscillator network. Figure 6a reveals that the maximum variation of the pulsation with respect to the amplitude W_i^0, that is, Δω/ω₀, is reached for W_i^0 = m − α√2, that is, for a particle with an initial potential energy near the barrier Δ. Therefore, to quickly realize a large amplitude contrast between the two oscillators, it could be interesting to launch them with an initial amplitude near m − α√2, or to increase the potential barrier height Δ. We chose to investigate the latter solution by tuning the nonlinearity parameter α, the initial amplitudes of the two oscillators remaining W_1^0 = 0 and W_2^0 = 0.2. The results are reported in Figure 8, where we present the evolution of the difference δ(t) for different values of α. As expected, when the nonlinearity parameter α increases, the optimal time is significantly reduced. However, when α is adjusted near the critical value (m − W_2^0)/√2, as in Figure 8d, the optimum reached by the difference δ(t) is reduced to 4.517 for α = 1.63 instead of 4.96 for α = 1.02. Even if this is not the best contrast enhancement that can be performed by the system, the weak difference of initial conditions between the two oscillators is nevertheless strongly enhanced for α = 1.63. To highlight the efficiency of nonlinear systems, let us consider the case of a linear force f(W) = −ω₀²W in Eq. (13). In the linear case, the displacement difference δ(t) between two harmonic oscillators can be straightforwardly expressed as
\delta(t) = \epsilon \cos(\omega_0 t),   (22)
where ε represents the slight difference of initial conditions between the oscillators. This last expression shows that it is impossible to increase the weak difference of initial conditions, since the difference δ(t) always remains in the range [−ε; ε]. Therefore, nonlinearity is a convenient way to overcome the limitation of a linear system and to enhance a weak amplitude contrast.
FIGURE 8 Influence of the nonlinearity parameter α on the displacement difference δ between the two oscillators of respective initial conditions 0 and 0.2. Parameters: m = 2.58 and ω₀ = 1. (a) topt = 1.75 × 10⁻³; α = 0.4. (b) topt = 1.66 × 10⁻³; α = 1.05. (c) topt = 1.25 × 10⁻³; α = 1.5. (d) topt = 0.95 × 10⁻³; α = 1.63.
III. INERTIAL SYSTEMS This section presents different image-processing tasks inspired by the properties of the nonlinear oscillators presented in Section II.B. Their electronic implementation is also discussed.
A. Image Processing
By analogy with a particle experiencing a double-well potential, the pixel (i, j) is analogous to a particle (oscillator) whose initial position
corresponds to the initial gray level W_{i,j}^0 of this pixel. Therefore, if N × M denotes the image size, we are led to consider a 2D network, or CNN, consisting of uncoupled nonlinear oscillators. The node (i, j) of this CNN obeys
\frac{d^2 W_{i,j}}{dt^2} = -\omega_0^2 \left( W_{i,j} - m - \alpha \right)\left( W_{i,j} - m + \alpha \right)\left( W_{i,j} - m \right),   (23)
with i = 1, 2, ..., N and j = 1, 2, ..., M. Note that we take into account the range of oscillations [0; 2m − W_{i,j}^0] predicted in Section II.B.2 to define the gray scale of the images, namely, 0 for the black level and 2m = 5.16 for the white level. The image to be processed is first loaded as the initial condition at the nodes of the CNN. Next, the filtered image for a processing time t can be deduced by noting the position reached by all oscillators of the network at this specific time t. More precisely, the state of the network at a processing time t is obtained by solving Eq. (23) numerically with a fourth-order Runge–Kutta algorithm with integration time step dt = 10⁻⁶.
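Since the oscillators are uncoupled, the state of the network at time t can equivalently be computed pixel by pixel from the closed-form solution (20)–(21) instead of integrating Eq. (23). A minimal Python sketch (illustrative; the array img is a stand-in for a gray-scale image with levels in [0; 0.2]):

import numpy as np
from scipy.special import ellipj

def cnn_state(img, t, m=2.58, alpha=1.02, w0=1.0e4):
    # Position of every oscillator of the CNN (23) at processing time t,
    # evaluated with the closed-form solution (20)-(21)
    x0 = img - m
    w = w0 * np.sqrt(x0**2 - alpha**2)
    k2 = 0.5 * x0**2 / (x0**2 - alpha**2)
    _, cn, _, _ = ellipj(w * t, k2)
    return m + x0 * cn

img = np.random.uniform(0.0, 0.2, size=(64, 64))   # stand-in for Figure 9a
out = cnn_state(img, t=1.64e-3)                    # state at t_opt
print("gray-level dynamics:", img.min(), img.max(), "->", out.min(), out.max())

At topt the dynamics of the gray levels is stretched from [0; 0.2] to nearly [0; 4.96], which is the contrast enhancement discussed below.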
1. Contrast Enhancement and Image Inversion
The image to be processed with the nonlinear oscillator network is the weakly contrasted image of Figure 9a. Its histogram is restricted to the range [0; 0.2], which means that the maximum gray level of the image (0.2) is the initial condition of at least one oscillator of the network, while the minimum gray level of the image (0) is also the initial condition of at least one oscillator. Therefore, the pixels with initial gray levels 0 and 0.2 oscillate with the phase difference δ(t) predicted by Figure 7b. In particular, as explained in Section II.B.2, their phase difference δ(t) is null for the processing times t = 3.96 × 10⁻⁵, 1.81 × 10⁻⁴, 3.5 × 10⁻⁴, and 5.21 × 10⁻⁴; minimum for t = 1.4 × 10⁻⁴, 4.64 × 10⁻⁴, and 1.47 × 10⁻³; and maximum for t = 3 × 10⁻⁴, 6.29 × 10⁻⁴, and 1.64 × 10⁻³. As shown in Figures 9b, 9d, 9f, and 9h, the image goes through local minima of contrast at the processing times corresponding to the zeros of δ(t). Furthermore, the processing times providing the local minima of δ(t) realize an image inversion with a growing contrast enhancement (Figures 9c, 9g, and 9j). Indeed, since the minima of δ(t) are negative, for these processing times the minimum of the initial image becomes the maximum of the filtered image and vice versa. Finally, the local maxima of δ(t) achieve local maxima of contrast for the corresponding processing times (Figures 9e, 9i, and 9k). Note that the best contrast enhancement is attained at the processing time topt for which δ(t) is maximum. The histogram of each filtered image in Figure 9 also reveals the temporal dynamics of the network. Indeed, the width of the image histogram is periodically increased and decreased, which indicates that the
FIGURE 9 Filtered images and their corresponding histogram obtained with the nonlinear oscillators network (23) for different processing times. (a) Initial image (t = 0). (b) t = 3.96 × 10−5 . (c) t = 1.4 × 10−4 . (d) t = 1.81 × 10−4 . (e) t = 3 × 10−4 . (f) t = 3.5 × 10−4 . (g) t = 4.64 × 10−4 . (h) t = 5.21 × 10−4 . (i) t = 6.29 × 10−4 . (j) t = 1.47 × 10−3 . (k) t = topt = 1.64 × 10−3 . Parameters: m = 2.58, α = 1.02, ω0 = 1.
contrast of the corresponding filtered image is periodically enhanced or reduced. Another interesting feature of the realized contrast enhancement is given by the plot of the network response at the processing time topt (Morfu, 2005). This curve represents the gray level of the pixels of the filtered image versus their initial gray level. Therefore, the horizontal axis corresponds to the initial gray scale, namely [0; 0.2], whereas the vertical axis represents the gray scale of the processed image. Such curves are plotted in Figure 10 for different values of the nonlinearity parameter α, at the optimal time defined by the maximum of δ(t); these times were established in Section II.B.2 and in Figure 8. Moreover, to compare the nonlinear contrast enhancement to a uniform one, we have superimposed (dotted line) the curve resulting from a simple multiplication of the initial gray scale by a scale factor. In Figure 10a, since the response of the system for the lowest value of α lies most often above the dotted line, the filtered image at the processing time topt = 1.75 × 10⁻³ for α = 0.4 will be brighter than the image obtained with a simple rescaling.
FIGURE 10 Response of the nonlinear system for different nonlinearity parameters α at the corresponding optimal time topt (solid line) compared to a uniform rescaling (dotted line). The curves are obtained with Eqs. (20) and (21) setting the time to the optimum value defined by the maximum of δ(t) (see Figure 8). In addition, we let the initial conditions Wi0 vary in the range [0; 0.2] in Eqs. (20) and (21). (a): (topt = 1.75 × 10−3 ; α = 0.4). (b): (topt = 1.66 × 10−3 ; α = 1.05). (c): (topt = 1.25 × 10−3 ; α = 1.5). (d): (topt = 0.95 × 10−3 ; α = 1.63), ω0 = 1.
As shown in Figure 10b, increasing the nonlinearity parameter α to 1.05 leads to an optimal time of 1.66 × 10⁻³ and symmetrically enhances the light and dark gray levels. When the nonlinearity parameter is adjusted to provide the greatest potential barrier (Figures 10c and 10d), the contrast of the medium gray levels is unchanged compared to a simple rescaling. Moreover, the dark and light grays are strongly enhanced, with a greater distortion when the potential barrier is maximum, that is, for the greatest value of α (Figure 10d).
2. Gray-Level Extraction
Considering processing times exceeding the optimal time topt, we propose to perform a gray-level extraction of the continuous gray scale represented in Figure 11a (Morfu, 2005). For the sake of clarity, it is convenient to redefine the white level as 0.2, whereas the black level remains 0. For the nine specific times presented in Figure 11, the response of the system displays a minimum that is successively reached for each level of the initial gray scale. Therefore, with time acting as a discriminating parameter, an appropriate threshold filtering allows extraction of all pixels with a gray level in a given range, as shown by the sketch below. Indeed, in Figure 11, the simplest case of a constant threshold Vth = 0.25 provides nine ranges of gray at nine closely spaced processing times, which constitutes a gray-level extraction. Moreover, owing to the response of the system, the width of the extracted gray-level ranges is reduced in the light grays. Indeed, the range extracted in the dark grays for the processing time t = 3.33 × 10⁻³ (Figure 11c) is approximately twice as large as the range extracted in the light grays for t = 3.51 × 10⁻³ (Figure 11i). To perform a perfect gray-level extraction, the threshold must match, with a slight offset, the temporal evolution of the minimum attained by the response of the system. Under these conditions, the width of the extracted gray range is set by the value of this offset. Note that the response of the system after the optimal processing time also allows consecutive enhancement of fragments of the image with different levels of brightness, which is also an important feature of image processing. For instance, in Belousov–Zhabotinsky-type media this property of the system enabled Rambidi et al. (2002) to restore individual components of a picture when the components overlap. Therefore, we trust that considering the temporal evolution of the image loaded in our network could give rise to other interesting image-processing operations.
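A minimal sketch of the threshold filtering of Figure 11 (illustrative only; the cnn_state function repeats the closed-form evaluation of the previous sketch, and the times and the threshold Vth = 0.25 are those of the figure):

import numpy as np
from scipy.special import ellipj

def cnn_state(img, t, m=2.58, alpha=1.02, w0=1.0e4):
    # Closed-form CNN state, Eqs. (20)-(21), as in the previous sketch
    x0 = img - m
    w = w0 * np.sqrt(x0**2 - alpha**2)
    k2 = 0.5 * x0**2 / (x0**2 - alpha**2)
    _, cn, _, _ = ellipj(w * t, k2)
    return m + x0 * cn

def extract_gray_band(img, t, vth=0.25):
    # Threshold filtering of Figure 11: white (0.2) above Vth, black (0) below
    return np.where(cnn_state(img, t) > vth, 0.2, 0.0)

gray_scale = np.tile(np.linspace(0.0, 0.2, 256), (16, 1))  # continuous scale, Figure 11a
for t in (3.30e-3, 3.39e-3, 3.51e-3):
    band = extract_gray_band(gray_scale, t)
    print(f"t = {t:.2e}: {np.mean(band[0] == 0.0):.0%} of the scale extracted")

As the processing time increases, the extracted (black) band sweeps from the dark grays toward the light grays, as in Figures 11b–11j.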
3. Image Encryption
Encryption is another field of application of nonlinear systems. In fact, the chaotic behavior of nonlinear systems can sometimes produce chaos-like waveforms that can be used to encrypt signals for secure communications (Cuomo and Oppenheim, 1993; Dedieu et al., 1993). Even if
FIGURE 11 Gray-level extraction. The response of the system is represented at the top of each panel. At the bottom of each panel, a threshold filtering of the filtered image is realized by replacing the pixel gray level with 0.2 (white) if that gray level exceeds the threshold Vth = 0.25, and with 0 (black) otherwise. (a) Initial gray scale (t = 0). (b) t = 3.3 × 10⁻³. (c) t = 3.33 × 10⁻³. (d) t = 3.36 × 10⁻³. (e) t = 3.39 × 10⁻³. (f) t = 3.42 × 10⁻³. (g) t = 3.45 × 10⁻³. (h) t = 3.48 × 10⁻³. (i) t = 3.51 × 10⁻³. (j) t = 3.54 × 10⁻³. Nonlinearity parameters: m = 2.58, α = 1.02, and ω₀ = 1.
many attempts to break the encryption key of these cryptosystems and to retrieve the information have been reported (Short and Parker, 1998; Udaltsov et al., 2003), cryptography based on the properties of chaotic oscillators still attracts the attention of researchers because of the promising applications of chaos in the data transmission field (Kwok and Tang, 2007). Contrary to most studies, in which the dynamics of a single element are usually considered, we propose here a strategy of encryption based on the dynamics of a chain of nonlinear oscillators. More precisely, we consider the case of a noisy image loaded as the initial condition into the inertial network introduced in Section II.B. Specifically, we add a uniform noise over [−0.1; 0.1] to the weakly contrasted picture of the Coliseum represented in Figure 9a. Since the pixels of the noisy image assume gray levels in the range [−0.1; 0.3], an appropriate change of scale is realized to reset the dynamics of the gray levels to [0; 0.2]. The resulting image is then loaded as the initial condition into the network. For the sake of clarity, the filtered images are presented at different processing times with the corresponding system response in Figure 12. Before the optimal time, we observe the behavior described in Section III.A.1: the image goes through local minima and maxima of contrast until the optimal time topt = 1.64 × 10⁻³, where the best contrast enhancement is realized (Figure 12a). Next, for processing times exceeding topt, the noisy part of the image seems to be amplified while the coherent part of the image becomes increasingly less perceptible (see Figures 12b and 12c, obtained for t = 3.28 × 10⁻³ and t = 6.56 × 10⁻³). Finally, for longer processing times, namely t = 8.24 × 10⁻³ and t = 9.84 × 10⁻³, the noise background has completely hidden the Coliseum, which constitutes an image encryption. Note that this behavior can be explained with the response of the system, represented below each filtered image in Figure 12. Indeed, as long as the response of the system versus the initial condition does not display a "periodic-like" behavior, the coherent part of the image remains perceptible (Figures 12a and 12b). By contrast, as soon as a "periodicity" appears in the system response, the coherent image begins to disappear (Figure 12c). Indeed, the response in Figure 12c shows that four pixels of the initial image with four different gray levels take the same final value in the encrypted image (see the arrows). Therefore, the details of the initial image, which correspond to the quasi-uniform areas of the coherent image, are merged and thus disappear in the encrypted image. Despite this merging of gray levels, since noise induces sudden changes in the gray levels of the initial image, the noise conserves its random features in the encrypted image. Moreover, since the system tends to enlarge the range of amplitudes, the weak initial amount of noise is strongly amplified once the processing time exceeds topt. The periodicity of
FIGURE 12 Encrypted image and the corresponding response of the nonlinear oscillator network for different times exceeding topt. (a) Enhancement of contrast of the initial image for t = topt = 1.64 × 10⁻³. (b) t = 3.28 × 10⁻³. (c) t = 6.56 × 10⁻³. (d) t = 8.24 × 10⁻³. (e) t = 9.84 × 10⁻³. Parameters: m = 2.58, α = 1.02, ω₀ = 1.
the system response can then be increased for longer processing times until only the noisy part of the image is perceptible (Figures 12d and 12e). A perfect image encryption is then realized. To take advantage of this phenomenon for image encryption, the coherent information (the enhanced image in Figure 12a) must be restored using the encrypted image of Figure 12e. Fortunately, owing to the absence of dissipation, the nonlinear system is conservative and reversible. It is thus
possible to revert to the optimal time—when the information was the most perceptible. However, knowledge of the encrypted image alone is not sufficient to completely restore the coherent information, since at the time of encryption the velocities of the oscillators were not null. Consequently, it is necessary to know both the position and the velocity of all particles of the network at the time of encryption. The information can then be restored by solving Eq. (23) numerically with a negative integration time step dt = −10⁻⁶, as sketched below. Under these conditions, the time of encryption constitutes the encryption key.
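The decryption step can be sketched as follows (an illustration, not the authors' implementation: a random test image stands in for the Coliseum picture, and Eq. (23) is integrated forward to the encryption time and then backward with dt = −10⁻⁶ using the stored positions and velocities):

import numpy as np

m, alpha, w0 = 2.58, 1.02, 1.0e4

def f(W):
    # Nonlinear force of the CNN, Eq. (23)
    return -w0**2 * (W - m - alpha) * (W - m + alpha) * (W - m)

def rk4(W, V, dt, n_steps):
    # Fourth-order Runge-Kutta for the system W' = V, V' = f(W)
    for _ in range(n_steps):
        k1w, k1v = V, f(W)
        k2w, k2v = V + 0.5 * dt * k1v, f(W + 0.5 * dt * k1w)
        k3w, k3v = V + 0.5 * dt * k2v, f(W + 0.5 * dt * k2w)
        k4w, k4v = V + dt * k3v, f(W + dt * k3w)
        W = W + dt / 6.0 * (k1w + 2 * k2w + 2 * k3w + k4w)
        V = V + dt / 6.0 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return W, V

img = np.random.uniform(0.0, 0.2, (32, 32))        # coherent test image
n = int(9.84e-3 / 1e-6)                            # encryption key: t = 9.84e-3
We, Ve = rk4(img, np.zeros_like(img), 1e-6, n)     # encryption (forward)
Wd, _ = rk4(We, Ve, -1e-6, n)                      # decryption (backward)
print("max restoration error:", np.abs(Wd - img).max())

Both the encrypted positions We and velocities Ve are needed; the backward integration then retraces the conservative dynamics up to numerical round-off.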
B. Electronic Implementation
The elementary cell of the purely inertial system can be developed according to the principle of Figure 13 (Morfu et al., 2007). First, a polynomial source is realized with analog AD633JNZ multipliers and a classical inverting amplifier with gain −K. Taking into account the scale factor 1/10 V⁻¹ of the multipliers, the response of the nonlinear circuit to an input voltage
FIGURE 13 Sketch of the elementary cell of the inertial system. m and α are adjusted with external direct current sources, whereas −K is the gain of an inverting amplifier built with a TL081CN operational amplifier. The 1N4148 diode allows introduction of the initial condition W_i^0.
Wi is given by
P(W_i) = \frac{K^2}{100} \left( W_i - m \right)\left( W_i - m - \alpha \right)\left( W_i - m + \alpha \right),   (24)
where the roots m, m − α, and m + α of the polynomial circuit are set with three different external direct current (DC) sources. As shown in Figure 14, the experimental characteristic of the nonlinear source is then in excellent agreement with its theoretical cubic law [Eq. (24)]. Next, a feedback between the input and output of the nonlinear circuit is ensured by a double integrator with time constant RC such that
W_i = -\frac{K^2}{100 R^2 C^2} \iint \left( W_i - m + \alpha \right)\left( W_i - m - \alpha \right)\left( W_i - m \right) dt\, dt.   (25)
Deriving Eq. (25) twice, the voltage Wi at the input of the nonlinear circuit is written as
\frac{d^2 W_i}{dt^2} = -\frac{K^2}{100 R^2 C^2} \left( W_i - m + \alpha \right)\left( W_i - m - \alpha \right)\left( W_i - m \right),   (26)
which corresponds exactly to the equation of the purely inertial system (13) with
\omega_0 = \frac{K}{10\, RC}.   (27)
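For instance, with the component values used in the experiments below (K = 10, R = 10 kΩ, C = 10 nF), Eq. (27) gives ω₀ = 10/(10 × 10⁴ Ω × 10⁻⁸ F) = 10⁴ s⁻¹, which is the value of ω₀ used in the theoretical analysis of Section II.B.2.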
FIGURE 14 Theoretical cubic law of Eq. (24) (solid line) compared to the experimental characteristic (crosses). Parameters: m = 2.58 V, α = 1.02 V, K = 10.
Finally, the initial condition W_i^0 is applied to the elementary cell via a 1N4148 diode with threshold voltage VT = 0.7 V. We adjust the diode anode potential to W_i^0 + VT with an external DC source, the diode cathode potential being initially set to W_i^0. Then, according to Section III, the circuit begins to oscillate in the range [W_i^0; 2m − W_i^0], while the potential of the diode anode remains VT + W_i^0. Assuming that m > W_i^0/2, which is the case in our experiments, the diode is instantaneously blocked once the initial condition is introduced. Note that using a diode to set the initial condition has the main advantage of "balancing" the effect of the dissipation inherent in electronic devices. Indeed, the intrinsic dissipation of the experiment tends to reduce the amplitude of the oscillations. As soon as the potential of the diode cathode falls below W_i^0, the diode conducts instantaneously, periodically reintroducing the same initial condition into the elementary cell. Therefore, the switching between the two states of the diode refreshes the oscillation amplitude to its natural value, as in the absence of dissipation. In summary, the oscillations are available at the diode cathode and are represented in Figure 15a for two different initial conditions, namely W_1^0 = 0 V (top panel) and W_2^0 = 0.2 V (bottom panel). As previously explained, this way of introducing the initial condition balances the dissipative effects, since the oscillations keep the same amplitude, namely in the range [0 V; 5.34 V] for the first oscillator (initial condition 0) and [0.2 V; 5.1 V] for the second one. Moreover, these ranges match with fairly good agreement the theoretical predictions of Section II.B.2, that is, [0 V; 5.16 V] for the first oscillator and [0.2 V; 4.96 V] for the second one. Figure 15a also reveals that the two oscillators quickly achieve phase opposition at the optimal time topt = 1.46 ms instead of the 1.64 ms theoretically established in Section II.B.2. The oscillation difference between the two oscillators in Figure 15b reaches local minima and maxima in agreement with the theoretical behavior described in Section III. A maximum of 5.1 V is obtained, corresponding to the phase opposition W₁(topt) = 0 V and W₂(topt) = 5.1 V. Therefore, the weak difference of initial conditions between the oscillators is strongly increased at the optimal time topt. Despite a slight discrepancy of 11% in the optimal time, mainly imputable to component uncertainties, a purely inertial nonlinear system is thus implemented with the properties of Section III. To characterize the experimental device completely, we now focus on the response of the nonlinear system to different initial conditions in the range [0 V; 0.2 V]. The plot of the voltage reached at the optimal time topt = 1.46 ms versus the initial condition is compared in Figure 16 to the theoretical curve obtained for the optimal time defined in Section II.B.2, namely 1.64 ms. The experimental response of the system is
FIGURE 15 (a) Temporal evolution of two elementary cells of the chain with respective initial conditions W_1^0 = 0 V (top panel) and W_2^0 = 0.2 V (bottom panel). (b) Evolution of the voltage difference between the two oscillators. Parameters: K = 10, R = 10 kΩ, C = 10 nF, m = 2.58 V, α = 1.02 V, topt = 1.46 ms.
FIGURE 16 Response of the system to a set of initial conditions W_i^0 ∈ [0; 0.2 V] at the optimal time. The solid line is obtained with Eqs. (20), (21), and (27), setting the time to the theoretical optimal value 1.64 ms and letting the initial condition vary in [0; 0.2 V]. The crosses are obtained experimentally for the corresponding optimal time 1.46 ms. Parameters: R = 10 kΩ, C = 10 nF, m = 2.58 V, α = 1.02 V, K = 10.
then qualitatively confirmed by the theoretical predictions, which establishes the validity of the experimental elementary cell for the contrast enhancement presented in Section III.A.1. Finally, we also investigate the response of the system after the optimal time, since it allows the extraction of gray levels. In order to improve the measurement accuracy, we extend the range of initial conditions to [0; 0.5 V] instead of [0; 0.2 V]. The corresponding experimental optimal time becomes topt = 564 μs, whereas the theoretical one, deduced with the methodology of Section II.B.2, is 610 μs. The resulting theoretical and experimental responses are plotted in Figure 17a, where a better agreement is effectively observed compared to Figure 16.
FIGURE 17 Theoretical response of the purely inertial system (solid line) compared to the experimental one (crosses) for four different times and for a range of initial conditions [0; 0.5 V]. Parameters: R = 10 kΩ, C = 10 nF, m = 2.58 V, α = 1.02 V, K = 10. (a) Experimental time t = 564 μs corresponding to the theoretical time t = 610 μs. (b) Experimental time t = 610 μs and theoretical time 713 μs. (c) Experimental time t = 675 μs and theoretical time 789 μs. (d) Experimental time t = 720 μs and theoretical time 841 μs.
We have also reported the response of the experimental device for three different times beyond the optimal time topt = 564 μs in Figures 17b, c, and d—namely, for the experimental times t = 610 μs, t = 675 μs, and t = 720 μs. Since a time scale factor of 1.1684 exists between the experimental and the theoretical times, we apply this scale factor to the three previous experimental times; it provides the theoretical times 713 μs, 789 μs, and 841 μs. For each of these three times, we can then compare the experimental response to the theoretical one deduced by letting the initial condition vary in [0; 0.5 V] in Eqs. (20), (21), and (27). Despite some slight discrepancies, the behavior of the experimental device is in good agreement with the theoretical response of the system for the three processing times exceeding the optimal time. Therefore, the extraction of gray levels presented in Section III.A.2 is electronically implemented with this elementary cell.
IV. REACTION–DIFFUSION SYSTEMS

A. One-Dimensional Lattice
The equation of motion (4) of the nonlinear mechanical chain can also describe the evolution of the voltage at the nodes of a nonlinear electrical lattice. This section is devoted to the presentation of this nonlinear electrical lattice. The lattice is realized by coupling elementary cells with linear resistors R according to the principle of Figure 18a. Each elementary cell consists of a linear capacitor C in parallel with a nonlinear resistor whose current–voltage characteristic obeys the cubic law
I_{NL}(u) = \beta\, u (u - V_a)(u - V_b) / (R_0 V_a V_b),   (28)
where 0 < V_a < V_b are two voltages, β is a constant, and R_0 plays the role of a weighting resistor.
FIGURE 18 (a) Nonlinear electrical lattice. (b) The nonlinear resistor R_NL.
FIGURE 19 Current–voltage characteristic of the nonlinear resistor. The theoretical law [Eq. (28)] (solid line) is compared to the experimental data (crosses). The dotted lines represent the asymptotic behavior of the nonlinear resistor. Parameters: R₀ = 3.078 kΩ, V_b = 1.12 V, V_a = 0.545 V, β = 1.
The nonlinear resistor can be developed using two different methods. The first method to obtain a cubic current is to consider the circuit of Figure 18b with three branches (Binczak et al., 1998; Comte, 1996). A linear resistor R₃, a negative resistor, and another linear resistor R₁ are successively added in parallel thanks to 1N4148 diodes. Owing to the switching of the diodes, the experimental current–voltage characteristic of Figure 19 asymptotically displays a piecewise linear behavior with, successively, a positive slope, a negative one, and finally a positive one. This piecewise linear characteristic is compared to the cubic law [Eq. (28)], which presents the same roots V_a, V_b, and 0, but also the same area below the characteristic between 0 and V_a. This last condition leads to β = 1 and R₀ = 3.078 kΩ (Morfu, 2002c). An alternative way to realize a perfect cubic nonlinear current is to use a nonlinear voltage source that provides a nonlinear voltage P(u) = βu(u − V_a)(u − V_b)/(V_a V_b) + u, as shown in Figure 20 (Comte and Marquié, 2003). This polynomial voltage is realized with AD633JNZ multipliers and classical TL081CN operational amplifiers. A resistor R₀ ensures a feedback between the input and output of the nonlinear source such that Ohm's law applied to R₀ yields the cubic current of Eq. (28):
\frac{P(u) - u}{R_0} = I_{NL}(u).   (29)
As shown in Figure 21, this second method gives a better agreement with the theoretical cubic law [Eq. (28)].
FIGURE 20 Realization of a nonlinear resistor with a polynomial generation circuit. β = 10Va Vb .
FIGURE 21 Current–voltage characteristic of the nonlinear resistor of Figure 20. Parameters: β = −10 V_a V_b, V_a = −2 V, V_b = 2 V.
Applying Kirchhoff’s laws, the voltage Un at the nth node of the lattice can be written as
C \frac{dU_n}{d\tau} = \frac{1}{R} \left( U_{n+1} + U_{n-1} - 2U_n \right) - I_{NL}(U_n),   (30)
where τ denotes the experimental time and n = 1 ... N represents the node number of the lattice. Moreover, we assume zero-flux (Neumann) boundary conditions, which impose, for n = 1 and n = N, respectively,
C \frac{dU_1}{d\tau} = \frac{1}{R} \left( U_2 - U_1 \right) - I_{NL}(U_1),   (31)

C \frac{dU_N}{d\tau} = \frac{1}{R} \left( U_{N-1} - U_N \right) - I_{NL}(U_N).   (32)
Next, introducing the transformations
W_n = \frac{U_n}{V_b}, \qquad D = \frac{R_0\, \alpha}{\beta R}, \qquad t = \frac{\beta\, \tau}{R_0\, \alpha\, C},   (33)
yields the discrete Nagumo equation in its normalized form, with α = V_a/V_b:

\frac{dW_n}{dt} = D \left( W_{n+1} + W_{n-1} - 2W_n \right) + f(W_n).   (34)
Therefore, an electronic implementation of the overdamped network presented in Section II.A is realized.
B. Noise Filtering of a One-Dimensional Signal One of the most important problems in signal or image processing is removal of noise from coherent information. In this section, we develop the principle of nonlinear noise filtering inspired by the overdamped systems (Marquié et al., 1998). In addition, using the electrical nonlinear network introduced in Section IV.A, we also present an electronic implementation of the filtering tasks.
1. Theoretical Analysis To investigate the response of the overdamped network to a noisy signal loaded as an initial condition, we first consider the simple case of a constant signal with a sudden change of amplitude. Therefore, we study the discrete normalized Nagumo equation
\frac{dW_n}{dt} = D \left( W_{n+1} + W_{n-1} - 2W_n \right) + f(W_n),   (35)
with f (Wn ) = −Wn (Wn − α)(Wn − 1) in the specific case α = 1/2. Furthermore, the initial condition applied to the cell n is assumed to be uniform for all cells, except for the cell N/2, where a constant perturbation b0 is added; namely:
W_n(t = 0) = V^0 \quad \forall n \neq \frac{N}{2}, \qquad W_{N/2}(t = 0) = V^0 + b^0.   (36)
The solution of Eq. (35) with the initial condition of Eq. (36) can be sought in the following form:

W_n(t) = V_n(t) + b_n(t).   (37)
Inserting Eq. (37) into Eq. (35) and collecting the terms of order 0 and 1 in the perturbation, following the reductive perturbation method (Taniuti and Wei, 1968; Taniuti and Yajima, 1969), we obtain the set of differential equations
\frac{dV_n}{dt} = D \left( V_{n+1} + V_{n-1} - 2V_n \right) + f(V_n),   (38)

\frac{db_n}{dt} = D \left( b_{n+1} + b_{n-1} - 2b_n \right) - \left( 3V_n^2 - 2V_n(1 + \alpha) + \alpha \right) b_n.   (39)
Assuming that Vn is a slow variable, Eq. (38) reduces to
dVn/dt = f(Vn),   (40)
which provides the response of the system to a uniform initial condition V0 (see details in Appendix A):
V(t) = (1/2) [1 + (V0 − 1/2)/√((V0 − 1/2)² − V0(V0 − 1) e^{−t/2})].   (41)
Next, to determine the evolution of the additive perturbation, it is convenient to consider a perturbation of the following form:
bn(t) = In(2Dt) g(t),   (42)
where In is the modified Bessel function of order n (Abramowitz and Stegun, 1970). Substituting Eq. (42) in Eq. (39), and using the property of the modified Bessel function,
dIn(2Dt)/dt = D(In+1 + In−1),   (43)
we obtain straightforwardly
dg/dt = −2Dg − (3Vn² − 2Vn(1 + α) + α) g,   (44)

that is,

dg/g = −2D dt − (3Vn² − 2Vn(1 + α) + α) dt.   (45)
Noting that
df(Vn)/dt = −(3Vn² − 2Vn(1 + α) + α) dVn/dt,   (46)
and differentiating Eq. (40) with respect to time, we obtain
V″n / V′n = −(3Vn² − 2Vn(1 + α) + α),   (47)
where V′n and V″n denote the first and second derivatives with respect to time. Combining Eqs. (47) and (45) allows g(t) to be expressed as
g(t) = K e^{−2Dt} dVn/dt,   (48)
where K is an integration constant. Differentiating Eq. (41), we obtain g(t) and thus the evolution of the perturbation:
bn(t) = K In(2Dt) e^{−2Dt} e^{−t/2} V0(V0 − 1/2)(V0 − 1) / {8 [(V0 − 1/2)² − V0(V0 − 1)e^{−t/2}]^{3/2}}.   (49)
Writing bn(t = 0) = b0n provides the value of the integration constant K. The evolution of the perturbation bn(t) is then ruled by

bn(t) = (b0n/8) In(2Dt) e^{−2Dt} e^{−t/2} / [(V0 − 1/2)² − V0(V0 − 1)e^{−t/2}]^{3/2}.   (50)
Finally, in the case of multiple perturbations, the perturbation at the nth node of the lattice follows as

bn(t) = Σn′ (b0n′/8) In′−n(2Dt) e^{−2Dt} e^{−t/2} / [(V0 − 1/2)² − V0(V0 − 1)e^{−t/2}]^{3/2},   (51)

where In′−n is the modified Bessel function of order n′ − n.
Eq. (41) shows that the evolution of the constant background does not depend on the coupling D. By contrast, Eq. (51) shows that the coupling D can be tuned to speed up the diffusion of the perturbation without affecting the constant background. Therefore, in a signal-processing context, this property can be used to develop a noise filtering tool.
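As a minimal numerical sketch of these two theoretical expressions (in Python; the helper names are ours, the index n is measured from the perturbed cell, and the parameter values anticipate the next subsection), one may write:

import numpy as np
from scipy.special import iv  # modified Bessel function I_n(x)

def background(t, V0):
    # uniform background evolution, Eq. (41)
    S = (V0 - 0.5) ** 2 - V0 * (V0 - 1.0) * np.exp(-t / 2.0)
    return 0.5 * (1.0 + (V0 - 0.5) / np.sqrt(S))

def perturbation(t, n, D, V0, b0):
    # decay of a local perturbation b0, Eq. (50); n is counted from the perturbed cell
    S = (V0 - 0.5) ** 2 - V0 * (V0 - 1.0) * np.exp(-t / 2.0)
    return (b0 / 8.0) * iv(n, 2.0 * D * t) * np.exp(-2.0 * D * t - t / 2.0) / S ** 1.5

t = np.linspace(0.0, 0.6, 200)
slow = perturbation(t, 0, 0.5, 0.4, 0.2)  # D = 0.5: the perturbation survives
fast = perturbation(t, 0, 5.0, 0.4, 0.2)  # D = 5: the perturbation vanishes quickly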
2. Theoretical and Numerical Results

In order to validate the theoretical analysis developed in Section IV.B.1, we have solved Eq. (35) numerically using a fourth-order Runge–Kutta algorithm with an integration time step dt = 10−3. Moreover, a uniform initial condition V0 = 0.4 is loaded for all the N = 48 cells of the network except for the 24th cell. Indeed, for this cell, an additive perturbation b0 = 0.2 is superimposed onto the constant background V0 in order to match exactly the initial condition [Eq. (36)] considered in the theoretical Section IV.B.1. We have investigated the evolution of both the constant background and the perturbation versus time. In Figure 22, the numerical results plotted with (•) signs match with excellent agreement the theoretical results predicted by Eqs. (41) and (51). Moreover, curve (a) of Figure 22 shows that the constant background given by Eq. (41) is unaffected by the nonlinear system regardless of the coupling value D.
FIGURE 22 (a) Temporal evolution of a uniform initial condition V0 = 0.4 applied to the entire network. (b) Temporal evolution of the perturbation applied to the cell n = 24 for D = 0.5 and b0 = 0.2. (c) Temporal evolution of the perturbation applied to the cell n = 24 for D = 5 and b0 = 0.2. Solid line: theoretical expressions of Eqs. (41) and (51); (•) signs: numerical results. (Axes: W versus normalized time.)
By contrast, the behavior of the system for the additive perturbation b0 depends on the coupling parameter D (curves (b) and (c) in Figure 22). Indeed, for weak coupling values, namely, D = 0.5, the perturbation decreases slowly and seems quasi-unchanged, whereas for D = 5, curve (c) exhibits a much faster decrease. After time t = 0.4, the perturbation is significantly reduced for D = 5. Therefore, the coupling parameter D can be tuned to speed up the diffusion of the perturbation without disturbing the constant background. Furthermore, the time acts as a parameter that adjusts the filtering of the perturbation. The state of the lattice for two different processing times is shown in Figure 23a and b for the previous coupling values, that is, D = 5 and D = 0.5, respectively. The initial perturbation represented in the dotted line (curve (I)) has almost disappeared for the specific value of the coupling D = 5 and for a processing time t = 2 (Figure 23a, curve (III)). As expected, curve (III) of Figure 23b shows that the perturbation is not filtered for D = 0.5 and for the same processing time t = 2. Furthermore, in both cases the constant background is slowly attracted by the nearest stable state—0 in our case. Note that the spatiotemporal views of Figure 24 also reveal that the noise filtering is performed for D = 5 and a processing time t = 2. Finally, to validate the processing task realized by the overdamped system, we propose to remove the noise from a more complex signal—a noisy sinusoidal signal. The signal is first sampled with a total number of samples corresponding to the size of the overdamped network, namely, N. Next, a serial-to-parallel conversion is realized to load the N samples at the nodes of the 1D lattice.
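A self-contained sketch of this numerical experiment could read as follows (the zero-flux boundaries of Eqs. (31) and (32) are assumed, and the helper names are ours):

import numpy as np

def f(w, alpha=0.5):
    # cubic nonlinearity of the Nagumo equation
    return -w * (w - alpha) * (w - 1.0)

def rhs(w, D):
    # right-hand side of Eq. (35) with zero-flux (Neumann) boundaries
    lap = np.empty_like(w)
    lap[1:-1] = w[2:] + w[:-2] - 2.0 * w[1:-1]
    lap[0] = w[1] - w[0]
    lap[-1] = w[-2] - w[-1]
    return D * lap + f(w)

def rk4(w, D, t_end, dt=1e-3):
    # fourth-order Runge-Kutta integration
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(w, D)
        k2 = rhs(w + 0.5 * dt * k1, D)
        k3 = rhs(w + 0.5 * dt * k2, D)
        k4 = rhs(w + dt * k3, D)
        w = w + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return w

N = 48
w0 = np.full(N, 0.4)
w0[23] = 0.6                        # perturbed cell n = 24, b0 = 0.2
w_fast = rk4(w0.copy(), 5.0, 2.0)   # D = 5: perturbation removed (Figure 23a)
w_slow = rk4(w0.copy(), 0.5, 2.0)   # D = 0.5: perturbation remains (Figure 23b)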
FIGURE 23 Response of the lattice to a uniform initial condition corrupted by a constant perturbation at two different processing times. (•) signs: numerical results; solid line: theoretical expression in Eq. (51). (a): D = 5; (b): D = 0.5. (I) initial condition for t = 0, (II) state of the lattice for t = 1, (III) state of the lattice for t = 2.
FIGURE 24 Spatiotemporal view of the response of the lattice to the previous initial condition. (a): D = 5; (b): D = 0.5.
Therefore, we are led to consider the distribution of initial conditions of Figure 25a in relation to

xn = A cos(2π 2n/N) + 1/2 + ηn,   (52)
where ηn is a discrete white gaussian noise of root mean square (RMS) amplitude σ = 0.15, and A and 2/N represent, respectively, the amplitude and the frequency of the coherent signal. First, we numerically investigate the response of the network with the coupling D = 0.5. As in the case of a constant background corrupted by a local perturbation, the system is unable to remove the noise from the sinusoidal signal for both processing times presented in Figure 25b and c. By contrast, for the favorable value of the coupling D = 5, the noise is completely filtered at the processing time t = 1, as shown in Figure 25e.
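The noisy initial condition of Eq. (52) can be built in a few lines under the stated parameters (the integration itself then proceeds exactly as in the previous sketch):

import numpy as np

N, A, sigma = 48, 0.264, 0.15
n = np.arange(N)
# Eq. (52): sampled sinusoid plus white gaussian noise, loaded as initial condition
x = A * np.cos(2.0 * np.pi * 2.0 * n / N) + 0.5 + sigma * np.random.randn(N)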
3. Experimental Results

To validate the electronic implementation of the nonlinear noise filtering tool, we consider the nonlinear electrical lattice introduced in Section IV.A with the nonlinear resistor of Figure 18b. In order to match the coupling values D = 5 and D = 0.5, the coupling resistor R is set to R = 300 Ω and R = 3 kΩ, respectively. Moreover, all results are presented in normalized units using the transformations in Eq. (33) to allow direct comparison with the theoretical analysis of Section IV.B.2. First, we experimentally report in Figure 26 the temporal evolution of the set of initial conditions consisting of a constant signal locally corrupted by a perturbation. As predicted in the theoretical section, the constant background is unaffected regardless of the coupling value (curve (a)), whereas when the coupling is adjusted to its favorable value D = 5, the perturbation can be removed after a normalized processing time t = 0.4 (curve (c)).
FIGURE 25 Noise filtering of a one-dimensional signal with an overdamped nonlinear network. (a) Noisy sinusoidal signal sampled and loaded as the initial condition at the nodes of the lattice. σ = 0.15, N = 48, and A = 0.264. (b), (c), (d), and (e) correspond to the filtered signal obtained for the following pairs of processing time t and coupling D: (b) (t = 0.4, D = 0.5); (c) (t = 1, D = 0.5); (d) (t = 0.4, D = 5); (e) (t = 1, D = 5).
This result is also confirmed by the spatial response of the system at two different processing times. Indeed, as shown in Figure 27, the state of the lattice for t = 2 and t = 4 provides the signal without the perturbation only if the coupling D is chosen equal to 5. Finally, we propose to filter the noisy sinusoidal signal of Figure 28a. After a processing time t = 0.6, the noise is completely removed for the coupling D = 5 (Figure 28c), which is not the case if the coupling is set to D = 0.5 (Figure 28b). Therefore, with a suitable choice of both processing time and coupling resistor, a noise filtering tool inspired by the properties of the nonlinear overdamped network is electronically implemented.
FIGURE 26 (a) Temporal evolution in normalized units of a uniform initial condition W0 = 0.4 applied to the network. (b) Temporal evolution of the perturbation applied to the cell n = 24 for b0 = 0.2 and D = 0.5, corresponding to a coupling resistor R = 3 kΩ. (c) Temporal evolution of the perturbation applied to the cell n = 24 for b0 = 0.2 and D = 5, corresponding to a coupling resistor R = 300 Ω. C = 33 nF. Nonlinearity parameters: β = 1, Vb = 1.12 V, Va = 0.545 V, involving α = 0.49.
FIGURE 27 Response of the lattice to a uniform initial condition corrupted by a constant perturbation at two different processing times. Parameters: C = 33 nF, Vb = 1.12 V, Va = 0.545 V, α = 0.49. (a): R = 300 Ω, that is, D = 5; (b): R = 3 kΩ, that is, D = 0.5. (I) initial condition for t = 0, (II) state of the lattice for t = 2 (τ = 0.1 ms), (III) state of the lattice for t = 4 (τ = 0.2 ms).
Moreover, according to Eq. (33), the processing time can be adjusted through the value of the capacitor C to match real-time processing constraints.
FIGURE 28 Noise filtering of a one-dimensional signal with an electrical nonlinear lattice. (a) Normalized noisy sinusoidal signal given by Eq. (52) loaded as initial condition at the nodes of the lattice. σ = 0.15, N = 48, and A = 0.264. (b) Filtered signal obtained for a processing time t = 0.6 (τ = 92.3 μs) and a coupling D = 0.5 (that is, R = 3 kΩ). (c) Filtered signal obtained for a processing time t = 0.6 (τ = 92.3 μs) and a coupling D = 5 (that is, R = 300 Ω). Parameters: C = 100 nF, β = 1, Vb = 1.12 V, Va = 0.545 V.
C. Two-Dimensional Filtering: Image Processing

We now numerically extend the properties of the 1D lattice to a 2D network. Consider a CNN whose cell state Wi,j, representing the gray level of the pixel number (i, j), obeys the following set of equations:

dWi,j/dt = f(Wi,j) + D Σ(k,l)∈Nr (Wk,l − Wi,j),   i = 2 . . . N − 1, j = 2 . . . M − 1,   (53)

where Nr = {(i − 1, j), (i + 1, j), (i, j + 1), (i, j − 1)} is the set of the four nearest neighbors, N × M the image size, and f(Wi,j) represents the nonlinearity. The boundary conditions for the edges of the image express as
dW1,j/dt = f(W1,j) + D(W1,j−1 + W2,j + W1,j+1 − 3W1,j),   j = 2 . . . M − 1,
dWN,j/dt = f(WN,j) + D(WN,j−1 + WN−1,j + WN,j+1 − 3WN,j),   j = 2 . . . M − 1,
dWi,1/dt = f(Wi,1) + D(Wi−1,1 + Wi+1,1 + Wi,2 − 3Wi,1),   i = 2 . . . N − 1,
dWi,M/dt = f(Wi,M) + D(Wi−1,M + Wi+1,M + Wi,M−1 − 3Wi,M),   i = 2 . . . N − 1,

while for the image corners, we consider the two nearest neighbors, that is,

dW1,1/dt = f(W1,1) + D(W2,1 + W1,2 − 2W1,1),
dWN,M/dt = f(WN,M) + D(WN,M−1 + WN−1,M − 2WN,M),
dWN,1/dt = f(WN,1) + D(WN−1,1 + WN,2 − 2WN,1),
dW1,M/dt = f(W1,M) + D(W2,M + W1,M−1 − 2W1,M).
1. Noise Filtering

The initial condition applied to the cell (i, j) of the network corresponds to the initial gray level W0i,j of the noisy image shown in Figure 29. The image after a processing time t is obtained by noting the state Wi,j(t) of all cells of the network at this specific time t (Comte et al., 1998). Figure 30 shows the filtered images obtained at the processing times t = 1, t = 3, t = 6, and t = 9 for each of the coupling values D = 0.075, D = 0.1, D = 0.2, and D = 0.3. The bistable behavior of the system established in Section II.A.1 involves a natural evolution of the image toward the two stable states of the system—0 and 1. Thus, as time increases, the image evolves into a black-and-white pattern. Therefore, to achieve correct noise filtering, the coupling parameter and the processing time must be adjusted.
FIGURE 29 Noisy image of the Coliseum.
FIGURE 30 Noise filtering of the image represented in Figure 29. (a)−(d) Filtered image obtained for D = 0.075 and for the respective processing times t = 1, t = 3, t = 6, and t = 9. (e)−(h) Filtered image obtained for D = 0.1 and for the respective processing times t = 1, t = 3, t = 6, and t = 9. (i)−(l) Filtered image obtained for D = 0.2 and for the respective processing times t = 1, t = 3, t = 6, and t = 9. (m)−(p) Filtered image obtained for D = 0.3 and for the respective processing times t = 1, t = 3, t = 6, and t = 9.
For the lowest coupling value D = 0.075, Figure 30 shows that the noise is not removed before the image is binarized. For the coupling parameters D = 0.2 and D = 0.3, even if the noise is quickly removed, the filtered image becomes blurred for t = 6 and t = 9 (Figure 30k, l, o, p). Therefore, these settings of the coupling parameter are inappropriate. In fact, Figure 30f and g shows the filtered image with the best settings of the coupling and processing time: a coupling D = 0.1 and the processing times t = 3 or t = 6. Indeed, the filtered images are neither blurred nor binarized. Moreover, the system not only removes the noise, it also enhances the contrast of the initial image.
2. Edge Filtering

Because of the strong relationship between edges and object recognition, edge detection constitutes one of the most important steps in image recognition. Indeed, scene information can often be interpreted from the edges alone. Classical edge detection algorithms are based on a second-order local derivative operator (Gonzalez and Wintz, 1987), whereas nonlinear techniques of edge enhancement are inspired mainly by the properties of reaction-diffusion media (Chua and Yang, 1988; Rambidi et al., 2002). We propose a strategy of edge detection based on the propagation properties of the nonlinear diffusive medium (Comte et al., 2001). The image loaded in the 2D network is the black-and-white picture of Figure 31a. We established in Section II.A.2 that a 1D lattice modeled by the Nagumo equation supports kink and anti-kink propagation owing to the bistable nature of the nonlinearity. Indeed, if the nonlinearity threshold parameter α < 1/2, the stable state 1 propagates, while if α > 1/2, the stable state 0 propagates. Therefore, extending this property to a 2D network allows calculation of either erosion for α > 1/2 or dilation for α < 1/2, which are basic mathematical morphology operations commonly performed in image processing (Serra, 1986). Moreover, if the initial image is subtracted from the image obtained with the network obeying Eq. (53), we can deduce the contours of the image after a processing time t. Figure 31b shows the contour of a black-and-white image and its profile obtained with this method. The profile of the contour shows that its resolution is ∼10 pixels, which is insufficient to allow good edge enhancement of a more complex image. This poor resolution is mainly attributable to the spatial expansion of the kink that results from the initial condition loaded in the lattice. Since the kink expansion reduces with the coupling, a natural solution consists of lowering the coupling. Unfortunately, the existence of the propagation failure effect provides a lower bound D∗ of the coupling and thus hinders contour detection with good resolution. An alternative solution can be developed by using a nonlinearity that eliminates the propagation failure effect.
FIGURE 31 Contour detection of a black square in a white background. (a) Initial image and its profile. (b) Edge detection of the object and its profile obtained with the standard cubic nonlinearity [Eq. (5)] with threshold α = 1/3. Processing time t = 4, D = 1. (c) Contour and the corresponding profile obtained with the nonlinearity [Eq. (54)]. Processing time t = 4, D = 1.
Indeed, it has been shown for dissipative media (Bressloff and Rowlands, 1997), or for systems where both inertia and dissipation are taken into account (Comte et al., 1999), that an inverse method allows the definition of a nonlinear function for which exact discrete propagative kinks exist. In the purely dissipative case, such a function expresses as

f(Wi,j) = D[(1 − a2/2) − (a0Wi,j + a1)²] − D a2 (a0Wi,j + a1) √(1 − (a0Wi,j + a1)²) + 2D(a0Wi,j + a1),   (54)
where a2 = 0.9, a0 = 1.483, and a1 = −0.742, together with a remaining parameter set to 0.5, ensure that the zeros of the nonlinearity remain 0, 1/3, and 1. As expected, when this new nonlinearity is numerically implemented, the resolution of the detected contour in Figure 31c is reduced to 3 pixels. Note that edge enhancement with the nonlinear overdamped network is not restricted to black-and-white images. Indeed, the concept is based on the propagation properties of the system and can be extended to the case of an image with 256 gray levels. For instance, we propose to show numerically the contour enhancement of the image of Figure 32a by considering the methodology used for the edge detection of the black-and-white picture. The simulation results are summarized in Figure 32 for different processing times in the favorable case of the nonlinear function [Eq. (54)]. It is clear that again the time allows adjustment of the quality of the processing. Indeed, for processing times below t = 1, the edges of the image details are not revealed, whereas for processing times exceeding 1.33, the details begin to disappear. Furthermore, as time increases, the contours of the image become increasingly thinner owing to the propagation mechanism. The best contour enhancement is thus performed when the image details have not yet disappeared and when the enhanced contours remain sufficiently thin. In fact, this situation corresponds to the intermediate processing time t = 1.33 (Figure 32e).
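The subtraction-based contour detection can be sketched as follows; for simplicity, this sketch uses the standard cubic nonlinearity with threshold α = 1/3 (the Figure 31b case) rather than the inverse-method nonlinearity of Eq. (54), which could be substituted to improve the resolution:

import numpy as np

def f_cubic(W, alpha=1.0 / 3.0):
    # cubic nonlinearity with threshold alpha = 1/3, so that the stable state 1 propagates
    return -W * (W - alpha) * (W - 1.0)

def contours(img, D=1.0, t_end=4.0, dt=1e-2):
    W = img.astype(float).copy()
    for _ in range(int(round(t_end / dt))):
        Wp = np.pad(W, 1, mode='edge')   # zero-flux boundaries
        lap = Wp[:-2, 1:-1] + Wp[2:, 1:-1] + Wp[1:-1, :-2] + Wp[1:-1, 2:] - 4.0 * W
        W += dt * (f_cubic(W) + D * lap)  # Eq. (53)
    # subtracting the initial image isolates the displaced fronts, i.e., the edges
    return np.abs(W - img)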
3. Extraction of Regions of Interest

As explained in the previous subsections, in the case of cubic nonlinearity, a nonlinearity threshold α = 0.5 allows noise filtering, while considering α ≠ 0.5 provides the contour of an image with poor resolution. Moreover, the nonlinearity f(W) can be determined using an inverse method to optimize the filtering task. Therefore, the choice of the nonlinearity is of crucial interest in developing interesting and powerful image-processing tools. In this section, we go one step further by proposing a new nonlinearity to extract the regions of interest of an image representing the soldering between two rods of metal (Morfu et al., 2007). The noisy and weakly contrasted image of Figure 33 presents four regions of interest:
• First, the two rods of metal constitute the background of the image in light gray;
• The stripe in medium gray at the center of the image represents the "soldered joint";
• A white spot corresponds to a "projection" of metal occurring during the soldering of the two rods of metal;
• A dark gray spot represents a gaseous inclusion inside the soldered joint.
FIGURE 32 Contour enhancement of an image with 256 gray levels realized with the modified nonlinearity in Eq. (54). (a) Initial image. (b)−(j) Filtered image for the respective processing times t = 0.33, t = 0.66, t = 1, t = 1.33, t = 1.66, t = 2, t = 2.33, t = 2.66, and t = 3.
FIGURE 33 Noisy and weakly contrasted image of soldering between two rods of metal, with the projection, the soldered joint, and the gaseous inclusion indicated. The image histogram is represented at the right.
FIGURE 34 Mechanical point of view of the bistable overdamped network used for image processing. (a) The pixel with coordinates (i, j) and gray level Wi,j is analogous to an overdamped particle coupled by springs of strength D to its four nearest neighbors. (b) The particle is attracted into one of the two wells of the bistable potential according to the resulting elastic force applied by the four coupled particles.
a. Limit of the Bistable Network. We first discuss the inability of the bistable overdamped network ruled by Eq. (53) to extract the four objects of the image. As explained in Section II.A.1, the bistability is ensured by using the cubic nonlinearity in Eq. (3). According to the mechanical description of the bistable system presented in Section II, a pixel of the image is analogous to a particle experiencing a double-well potential φ(W) = −∫0^W f(u) du and coupled to its four nearest neighbors by springs of strength D. As schematically shown in Figure 34, the particle with initial position W0i,j is attracted into one of the two wells of the potential depending on the competition between the nonlinear force f(Wi,j) and the resulting elastic force

D Σ(k,l)∈Nr (Wk,l − Wi,j).   (55)
FIGURE 35 Filtered images obtained with the bistable overdamped network described by Eqs. (3) and (53) in the case α = 1/2. Coupling parameter: D = 0.05. Processing times: (a) t = 4; (b) t = 10; (c) t = 3000.
This property of the system implies that, after a sufficiently large time, the network organizes near the two stable states set by the nonlinearity, namely, 0 and 1. In an image-processing context, it means that the resulting filtered image tends to be an almost black-and-white pattern. Figure 35 confirms this evolution of the filtered image versus the processing time since, when a cubic nonlinearity is considered, a quasi-black-and-white image is obtained at the time t = 3000 (Figure 35c). Note that for none of the proposed processing times was the bistable system able to properly remove the noise and to enhance the contrast of the regions of interest. Indeed, for t = 4 the noise is reduced but the details of the image begin to disappear (Figure 35a). In particular, the projection is merged into the background for t = 10, indicating that the bistable nature of the system destroys the coherent information of the initial image (Figure 35b). Therefore, the inability of the overdamped system to extract the regions of interest is directly related to the bistable nonlinear force f(W).
b. The Multistable Network. To solve this problem and to maintain the coherent structure of the image, we introduce a nonlinearity with a multistable behavior. For instance, the following nonlinear force
f(W) = −β(n − 1) sin[2π(n − 1)W]   (56)
derives from a potential φ(W) = −∫0^W f(u) du, which presents n wells and a potential barrier height between two consecutive potential extrema defined by β/π. This potential is represented in Figure 36 in the case of n = 5 wells. The multistable behavior of the network obeying Eq. (53) with the sinusoidal force in Eq. (56) can be established by considering the uncoupled case.
FIGURE 36 Multistable potential φ(Wi,j) represented for β = 9.82 × 10−2 and n = 5. The potential barrier between two consecutive extrema is β/π.
Setting D = 0 in Eq. (53), we obtain
dWi,j/dt = −β(n − 1) sin[2π(n − 1)Wi,j].   (57)
The stability analysis of the system can be performed with the methodology developed in Section II.A.1 by considering the roots of the sinusoidal force in Eq. (56). According to the sign of the derivative of the sinusoidal force, we can straightforwardly deduce that the unstable steady states of the system are given by
Wthk = (2k + 1)/(2(n − 1)),   with k ∈ Z,   (58)
while the stable steady states are defined by
Wk∗ = k/(n − 1),   with k ∈ Z.   (59)
Eq. (57) is solved in Appendix C to provide the temporal evolution of an overdamped particle experiencing the multistable potential of Figure 36 in the uncoupled case. If k denotes the nearest integer to (n − 1)W0i,j, with W0i,j the initial position of the particle, the displacement Wi,j(t) of the particle is expressed as
Wi,j(t) = [1/(π(n − 1))] arctan[tan(π(n − 1)W0i,j) e^{−2πβ(n−1)²t}] + k/(n − 1).   (60)

The multistable behavior of the system is illustrated in Figure 37, which shows the temporal evolution of a particle submitted to different initial conditions in the range [0; 1]. It is clear that the unstable steady states of the system Wthk act as thresholds, while the stable steady states Wk∗ correspond to attractors.
FIGURE 37 Temporal evolution of an overdamped particle experiencing the multistable potential. Parameters: n = 5 and β = 0.25. Solid line: theoretical expression of Eq. (60); open circles: numerical results obtained solving Eq. (57).
Indeed, the final state of the particle depends on the value of the initial condition compared to the thresholds Wthk. In particular, if we neglect the transitional regime, the asymptotic behavior of the uncoupled network reduces to the following rule:

if (2k − 1)/(2(n − 1)) < W0i,j < (2k + 1)/(2(n − 1)),   then Wi,j(t → +∞) = k/(n − 1).   (61)
Therefore, the asymptotic functioning [Eq. (61)] of the uncoupled network proves the multistable behavior of the system. We now numerically use this multistable feature to extract the regions of interest of the image. In the coupled case, a pixel with initial gray level W0i,j can take one of the n possible stable states according to the competition between the sinusoidal force and the resulting elastic force. The specific case n = 5 is shown numerically in Figure 38. Unlike the bistable network, the noise is quickly removed without disturbing the coherent structure of the image consisting of "the projection," "the gaseous inclusion," "the background," and the "soldered joint" (Figure 38a for t = 0.2 and b for t = 2). Next, for a sufficiently long time, namely t = 5000, the image no longer evolves and each defect of the soldering appears with a different mean gray level corresponding to one of the five stable steady states of the system (Figure 38c). An extraction of the regions of interest of the image is then performed with this overdamped network.
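In the uncoupled limit, this extraction reduces to the quantization rule of Eq. (61). A short sketch of the sinusoidal force and of this rule (the function names are ours; the coupled stationary pattern itself is obtained by integrating Eq. (53), for example with the evolve() helper of the earlier 2D sketch and D = 1.6):

import numpy as np

def f_multi(W, beta=9.82e-2, n=5):
    # sinusoidal multistable force, Eq. (56)
    return -beta * (n - 1) * np.sin(2.0 * np.pi * (n - 1) * W)

def asymptotic_state(W0, n=5):
    # asymptotic rule (61): each gray level is attracted by the nearest stable state k/(n-1)
    return np.rint((n - 1) * np.asarray(W0, dtype=float)) / (n - 1)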
FIGURE 38 Filtered images obtained with the multistable overdamped network described by Eqs. (53) and (56). Nonlinearity parameters: β = 9.82 × 10−2, n = 5. Coupling parameter: D = 1.6. Processing times: (a) t = 0.2; (b) t = 2; (c) t = 5000.
FIGURE 39 Electronic sketch of the multistable nonlinear network. R represents the coupling resistor, C a capacitor, and RNL a nonlinear resistor. INL denotes the nonlinear current and Ui,j the voltage of the cell with coordinates (i, j).
c. Electronic Implementation of the Multistable Network. The electronic implementation of the multistable network is realized according to the methodology of Figure 39 by coupling elementary cells with linear resistors. Each elementary cell includes a capacitor in parallel with a nonlinear resistor whose current-voltage characteristic can be approximated by the sinusoidal law on the range [−2 V; 2 V]:

INL(U) ≃ IM sin(2πU).   (62)
The methodology of Section IV.A used to realize the cubic nonlinearity with a polynomial source can also be applied to obtain the sinusoidal law in Eq. (62). First, a least-squares method at order 15 allows us to fit the
sinusoidal expression in Eq. (62) by a polynomial law P(U) in the range [−2 V; 2 V]. This provides the coefficients of the polynomial source P(U), which generates the sinusoidal current
INL(U) = P(U)/R0.   (63)
The experimental current-voltage characteristic is compared in Figure 40b to the theoretical expression in Eq. (62). The weak discrepancies observed between the theoretical and experimental laws can be reduced by increasing the order of the least-squares method. However, enhancing the agreement with the sinusoidal law presents the main disadvantage of considerably increasing the number of electronic components used for the realization of the nonlinear resistor. Nevertheless, at order 15, the experimental nonlinear current presents nine zeros, and the sign of its derivative at these zeros ensures the existence of five stable steady states and four unstable steady states. It is thus not of crucial interest to increase the order of the approximation, provided that the nonlinear resistor exhibits the multistability.
FIGURE 40 (a) Response of an elementary cell of the multistable network to different initial conditions in the uncoupled case. The plot of the theoretical expression [Eq. (60)] in the solid line is compared to the experimental results represented by crosses. (b) Nonlinear current-voltage characteristic. The sinusoidal law [Eq. (62)] in the solid line matches the experimental characteristic shown by plus signs (+). The component values are R0 = 2 kΩ, C = 390 nF, IM = 2 mA. The zeros of the sinusoidal current define the four unstable states Uth1, Uth2, Uth3, and Uth4, which correspond to thresholds, and the five stable steady states U1∗, U2∗, U3∗, U4∗, and U5∗, which correspond to attractors.
Applying Kirchhoff's laws to the electrical network of Figure 39, we deduce the differential equation that rules the evolution of the voltage Ui,j at the node (i, j):
C dUi,j/dτ = −INL(Ui,j) + (1/R) Σ(k,l)∈Nr (Uk,l − Ui,j).   (64)
In Eq. (64), Nr = {(i, j − 1), (i, j + 1), (i − 1, j), (i + 1, j)} denotes the neighborhood and τ represents the experimental time. Next, the transformations
τ = tR0C,   β = IMR0/(n − 1)²,   Ui,j = Wi,j(n − 1) − 2,   and   D = R0/R,   (65)
lead to the normalized equation
dWi,j/dt = P(Wi,j(n − 1) − 2)/(n − 1) + D Σ(k,l)∈Nr (Wk,l − Wi,j).   (66)
The normalization is completed by noting that, for Wi,j ∈ [0; 1], that is, for Ui,j ∈ [−2 V; 2 V],
P(Wi,j(n − 1) − 2) = −R0 INL(Wi,j(n − 1) − 2) ≃ −β(n − 1)² sin(2π(n − 1)Wi,j).   (67)
The experimental network described by Eq. (66) appears as an analog simulation of the normalized multistable network used for image processing. Let us finally reveal the multistable behavior of the elementary cell of the experimental network by investigating its response to different initial conditions in the uncoupled case. In addition, to allow a direct comparison with the theoretical expression (60), all the results are presented in normalized units in Figure 40a. First, we note that the component uncertainties do not explain the observed discrepancies. The poor correlation between the experimental results and the theoretical prediction is attributed to the nonlinearity provided by the nonlinear resistor, which does not exactly follow the sinusoidal law in Eq. (62). Nevertheless, the multistable property of the system is experimentally established. Indeed, there exist four threshold values, Uth1, Uth2, Uth3, and Uth4, that allow determination of the final state of the elementary cell among the five possible stable steady states,
U1∗ , U2∗ , U3∗ , U4∗ , and U5∗ . Therefore, the image-processing task inspired by the multistable property of the system is implemented with the electronic device in Figure 39.
V. CONCLUSION

This chapter has reported a variety of image-processing operations inspired by the properties of nonlinear systems. Considering a mechanical analogy, we have split the class of nonlinear systems into purely inertial systems and overdamped systems. Using this original description, we have established the properties of nonlinear systems in the context of image processing. For purely inertial systems, image-processing tasks such as contrast enhancement, image inversion, gray-level extraction, or image encryption can be performed. The applications of the nonlinear techniques presented herein are similar to those developed by means of chemical active media (Teuscher and Adamatzky, 2005), even if these latter media are overdamped rather than inertial. In particular, the dynamics of the nonlinear oscillator network, which enables contrast enhancement, can also be used to reveal "hidden images." Indeed, "hidden images" are defined as fragments of a picture with brightness very close to the brightness of the image background. Despite a weak difference of brightness between the hidden image and the image background, our nonlinear oscillator network takes advantage of its properties to reveal the hidden image. Another interesting property of this network is that it consecutively reveals fields of the image with increasing or decreasing brightness at different processing times. We trust that this feature, also shared by Belousov–Zhabotinsky chemical media, may have potential applications in image analysis in medicine (Teuscher and Adamatzky, 2005). Finally, the noise effects in this purely inertial network lead to cryptography applications. Unlike classical cryptography devices built with chaotic oscillators, we have proposed an encryption scheme based on the reversibility of our inertial system. Moreover, the encryption key, which ensures the restoration of the initial data, is the time of evolution of the data loaded in the nonlinear network. Therefore, the main advantage of our device is that it allows an easy change of the encryption key. The properties of strongly dissipative or overdamped systems can also give rise to novel image-processing tools. For instance, we have shown the possibility of achieving noise filtering, edge detection, or extraction of regions of interest of a weakly contrasted picture. With regard to noise filtering applications based on reaction-diffusion media, the processing is based on the transient behavior of the network since the filtered image depends on the processing times. By contrast, the extraction of regions of
interest presents the main advantage of independence from the processing time since the filtering is realized when the network reaches a stationary pattern. Therefore, this feature can allow an automatic implementation of the processing task.
VI. OUTLOOKS

A. Outlooks on Microelectronic Implementation

For each nonlinear processing example, we have attempted to propose an electronic implementation using discrete electronic components. Even if these macroscopic realizations are far from real practical applications, they present the primary advantage of validating the concept of integration of CNNs for future development in microelectronics. Indeed, in recent years the market for solid-state image sensors has experienced explosive growth due to the increasing demands for mobile imaging systems, video cameras, surveillance, or biometrics. Improvements in this growing digital world continue with two primary image sensor technologies: charge-coupled devices (CCD) and Complementary Metal Oxide Semiconductor (CMOS) sensors. The continuous advances in CMOS technology for processors and Dynamic Random Access Memories (DRAMs) have made CMOS sensor arrays a viable alternative to the popular CCD sensors. New technologies provide the potential for integrating a significant amount of Very Large Scale Integration (VLSI) electronics into a single chip, greatly reducing the cost, power consumption, and size of the camera (Fossum, 1993; Fossum, 1997; Litwiller, 2001; Seitz, 2000). In past years, most research on complex CMOS systems has dealt with the integration of sensors providing a processing unit at chip level (system-on-chip approach) or at column level by integrating an array of processing elements dedicated to one or more columns (Acosta et al., 2004; Kozlowski et al., 2005; Sakakibara, 2005; Yadid-Precht and Belenky, 2003). Indeed, pixel-level processing is generally dismissed because pixel sizes are often too large to be of practical use. However, as CMOS image sensors scale to 0.18-μm processes and below, integrating a processing element at each pixel or group of neighboring pixels becomes feasible. More significantly, using a processing element per pixel offers the opportunity to achieve massively parallel computations and thus the ability to implement full-image systems requiring significant processing such as digital cameras and computational sensors (El-Gamal et al., 1999; Loinaz et al., 1998; Smith et al., 1998). The latest significant progress in CMOS technologies has made possible the realization of vision systems on chip (VSoCs). Such VSoCs are eventually targeted to integrate within a semiconductor substrate the functions of optical sensing, image processing in space and time, high-level processing, and the control of actuators. These chips consist of arrays of mixed-signal processing elements (PEs), which operate
in accordance with single-instruction multiple-data (SIMD) computing architectures. The main challenge in designing a SIMD pixel-parallel sensor array is the design of a compact, low-power but versatile and fully programmable processing element. For this purpose, the processing function can be based on the paradigm of CNNs. CNNs can be viewed as a very suitable framework for the systematic design of image-processing chips (Roska and Rodriguez-Vazquez, 2000). The complete programmability of the interconnection strengths, the internal image memories, and other additional features make this paradigm a powerful basis for the realization of simple and medium-complexity artificial vision tasks (Espejo et al., 1996). Some proof-of-concept chips operating on preloaded images have been designed (Czuni et al., 2001; Rekeczky et al., 1999). Only a few researchers have integrated CNNs on real vision chips. As an example, Espejo (Espejo et al., 1998) reports a 64 × 64-pixel programmable computational sensor based on a CNN. This chip is the first fully operational CNN vision chip reported in the literature that combines the capabilities of image transduction, programmable image processing, and algorithmic control on a common silicon substrate. It has successfully demonstrated operations such as low-pass image filtering, corner and border extraction, and motion detection. More recently, other studies have focused on the development of CMOS sensors including the CNN paradigm (Carmona et al., 2003; Petras et al., 2003). The chip consists of 1024 processing units arranged into a 32 × 32 grid and contains approximately 500,000 transistors in a standard 0.5-μm CMOS technology. However, in these pioneering vision chips, the pixel size is often greater than 100 μm × 100 μm. Obviously, these dimensions cannot be considered realistic for a real vision chip. A major part of this crucial problem should be resolved in future years by using the newly emerging CMOS technologies. Indeed, CMOS image sensors directly benefit from technology scaling by reducing pixel size, increasing resolution, and integrating more analog and digital functionalities on the same chip with the sensor. We expect that further scaling of CMOS image sensor technology and improvement in their imaging performances will eventually allow the implementation of efficient CNNs dedicated to nonlinear image processing.
B. Future Processing Applications The nonlinear processing tools developed in this chapter are inherited from the properties of homogeneous media. In the case of applications based on the properties of reaction-diffusion media, it is interesting to consider the effects of both nonlinearity and structural inhomogeneities. Indeed, novel properties inspired by biological systems, which are inhomogeneous rather
than homogeneous (Keener, 2000; Morfu et al., 2002a; Morfu et al., 2002b; Morfu, 2003), could allow optimization of the filtering tools developed in this chapter. For instance, in Section IV.C.1, the noise removal method based on the homogeneous Nagumo equation provides a blurry filtered image. In addition, it is difficult to extract the edge of the image with an accurate location. Indeed, noting that the contours of the image correspond to steplike profiles, the diffusive process increases the spatial expansion of the contours. To avoid this problem, anisotropic diffusion has been introduced to reduce the diffusive effect across the image contours. This method was proposed by Perona and Malik (1990) to encourage intraregion smoothing in preference to interregion smoothing. To obtain this property, Perona and Malik replaced the classical linear isotropic diffusion equation
∂I(x, y, t)/∂t = div(∇I),   (68)
by
∂I(x, y, t)/∂t = div(g(‖∇I‖)∇I),   (69)
to adapt the diffusion to the image gradient. In Eqs. (68) and (69), I(x, y, t) represents the brightness of the pixel located at the spatial position (x, y) for a processing time t, while ‖∇I‖ is the gradient amplitude. Moreover, the anisotropy is ensured by the function g(‖∇I‖), which "stops" the diffusion across the edges. For instance, Perona and Malik considered the function
g(x) = 1/(1 + x²/K²),   (70)
(70)
where K is a positive parameter. Noting that when x # → ∞, g(x) # → 0, the effect of anisotropic diffusion is to smooth the original image while the contours are preserved. Indeed, the edge of the image corresponds to brightness discontinuities that lead to strong values of the image gradient (Black et al., 1998). This interesting property of anisotropic diffusion is illustrated in Figure 41. For sake of clarity, the algorithm developed by Perona and Malik is rather extensively detailed in Appendix D and we discuss here only the results obtained by filtering the noisy picture in Figure 41a. Contrary to the isotropic nonlinear diffusion based on the Nagumo equation, the edge of the image remains well localized for all the processing times presented
Nonlinear Systems for Image Processing
(a)
(b)
(c)
(d)
(e)
(f)
137
FIGURE 41 Noise filtering based on anisotropic diffusion. The filtered images are obtained using the algorithm detailed in Appendix D with the parameters dt = 0.01 and K = 0.09. (a) Initial image; (b)−(i) images for the respective processing times t = 1, t = 2, t = 3, t = 4, t = 5, t = 6, t = 7, and t = 8.
in Figure 41. However, although the noise seems removed for processing times exceeding t = 5, the contrast of the image is never enhanced. Therefore, anisotropic diffusion and nonlinear diffusion do not share the same weaknesses, and it could be interesting to attempt to circumvent the limitations of these two techniques. For instance, if we compare the continuous equation (9) of nonlinear diffusion with the anisotropic equation (69) proposed by Perona and Malik, it is clear that the anisotropy can be introduced into our system via the coupling parameter D. Moreover, with Perona and Malik's method, the pixel brightness does not directly experience the nonlinearity as in our method. Therefore, the nonlinear noise filtering tool presented in Section IV.C.1 could be more efficient if the interesting properties of anisotropic
diffusion were also considered by introducing a coupling law. In particular, we expect that the anisotropy preserves the location of the image edges, while the nonlinearity enhances the image contrast and removes the noise at the same time. Finally, we close this chapter by presenting another interesting and nonintuitive phenomenon that occurs in nonlinear systems under certain conditions. This effect, known as the stochastic resonance (SR) effect, was introduced in the 1980s to account for the periodicity of ice ages (Benzi et al., 1982). Since then, the SR effect has been widely reported in a growing variety of nonlinear systems (Gammaitoni et al., 1998), where it has been shown that adding an appropriate amount of noise to a coherent signal at a nonlinear system input enhances the response of the system. Detection of subthreshold signals using noise has been proven in neural information processing (Longtin, 1993; Nozaki et al., 1999; Stocks and Manella, 2001) and in data transmission fields (Barbay et al., 2001; Comte and Morfu, 2003; Duan and Abbott, 2005; Morfu et al., 2003; Zozar and Amblard, 2003), as well as in information transmission in arrays acting as stochastic resonators (Báscones et al., 2002; Chapeau-Blondeau, 1999; Lindner et al., 1998; Morfu, 2003). Recent studies have also shown that noise can enhance image perception (Moss et al., 2004; Simonotto et al., 1997), autostereogram interpretation (Ditzinger et al., 2000), human visual perception by microsaccades in the retina (Hongler et al., 2003), and image processing (Vaudelle et al., 1998; Chapeau-Blondeau, 2000; Histace and Rousseau, 2006; Blanchard et al., 2007). The investigation of noise effects in nonlinear systems is undoubtedly of great interest in a nonlinear signal-processing or image-processing context (Zozar and Amblard, 1999, 2005).
FIGURE 42 (a) Initial black-and-white image with p1 = 0.437. (b) Similarity measures CIIb and RIIb from Eqs. (72) and (73) versus the noise RMS amplitude σ for Vth = 1.1.
We thus propose to present the phenomenon of SR using the methodology proposed by Chapeau-Blondeau (2000). Moreover, to show a visual perception of the SR effect, we consider the black-and-white image of Figure 42a, where p1 denotes the probability of having a white pixel and p0 = 1 − p1 the probability of having a black one. A gaussian white spatial noise ηi,j with RMS amplitude σ is added to each pixel Ii,j of the initial image. The resulting noisy image is then threshold filtered with a threshold Vth to obtain the image Ib, according to the following threshold filtering rule:
if Ii,j + ηi,j > Vth then Ibi,j = 1, else Ibi,j = 0.   (71)
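This rule translates directly into a short sketch (the random generator seeding is our addition):

import numpy as np

def threshold_filter(I, sigma, Vth=1.1, seed=0):
    # Eq. (71): add gaussian noise of RMS amplitude sigma, then threshold at Vth
    eta = sigma * np.random.default_rng(seed).normal(size=I.shape)
    return (I + eta > Vth).astype(float)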
The similarity between the two images I and Ib can then be quantified by the cross-covariance (Chapeau-Blondeau, 2000)
CIIb = ⟨(I − ⟨I⟩)(Ib − ⟨Ib⟩)⟩ / √{⟨(I − ⟨I⟩)²⟩ ⟨(Ib − ⟨Ib⟩)²⟩},   (72)
or by
RIIb = ⟨I Ib⟩ / √(⟨I²⟩ ⟨Ib²⟩),

where ⟨·⟩ corresponds to an average over the images. These two similarity measures are given by

RIIb = p1(1 − Fη(Vth − 1)) / √{p1 [p1(1 − Fη(Vth − 1)) + (1 − p1)(1 − Fη(Vth))]}

and

CIIb = [p1(1 − Fη(Vth − 1)) − p1q1] / √{(p1 − p1²)(q1 − q1²)},   (73)
with q1 = p1 (1 − Fη (Vth − 1)) + (1 − p1 )(1 − Fη (Vth )) and where Fη is the cumulative distribution function of the noise. In the case of a gaussian white noise of RMS amplitude σ, the cumulative distribution function can be expressed as
Fη(u) = 1/2 + (1/2) erf(u/(√2 σ)).   (74)
In Eq. (74) the error function is defined by erf(u) = (2/√π) ∫0^u exp(−t²) dt. The two quantities expressed in Eqs. (72) and (73) are plotted versus the RMS noise amplitude σ in Figure 42b, where a resonant-like behavior reveals the standard stochastic resonance signature. Indeed, there exists an optimum amount of noise that maximizes the similarity measures in Eqs. (72) and (73). According to Figure 42b, this optimal noise RMS value is σ = 0.4. To validate the similarity measures, we qualitatively analyze the pictures obtained for different noise amplitudes. It is confirmed in Figure 43 that the optimal noise value σ = 0.4 allows the best visual perception of the Coliseum through the nonlinear system. Even if the model of human visual perception is more complex than a standard threshold filtering (Bálya et al., 2002), this simple representation is convenient to determine analytically the optimum amount of noise that provides the best visual perception of images via SR. Moreover, the SR phenomenon is shared by a wide class of nonlinear systems, including neural networks that also intervene in the process of image perception. Since neurons are basically threshold devices that are supposed to work in a noisy environment, considering noise effects seems to be of crucial importance in developing artificial intelligence applications that mimic the real behavior of nature. Therefore, for the next few decades, we trust that one of the most interesting challenges could be completing the description of nonlinear models by including the contribution of noise effects.
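As a numerical sketch, the theoretical cross-covariance of Eq. (73) can be swept over σ to locate the resonance (parameter values taken from Figure 42; the peak of this sketch falls near σ ≈ 0.4):

import numpy as np
from scipy.special import erf

def C_IIb(sigma, p1=0.437, Vth=1.1):
    # cross-covariance of Eq. (73), using the gaussian cumulative distribution of Eq. (74)
    F = lambda u: 0.5 + 0.5 * erf(u / (np.sqrt(2.0) * sigma))
    a = p1 * (1.0 - F(Vth - 1.0))            # average <I Ib>
    q1 = a + (1.0 - p1) * (1.0 - F(Vth))     # probability that Ib = 1
    return (a - p1 * q1) / np.sqrt((p1 - p1 ** 2) * (q1 - q1 ** 2))

sigmas = np.linspace(0.05, 1.4, 271)
sigma_opt = sigmas[np.argmax([C_IIb(s) for s in sigmas])]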
FIGURE 43 (a)−(d) Threshold-filtered images obtained with the rule in Eq. (71) and a threshold Vth = 1.1, with white gaussian noise of respective RMS amplitudes σ = 0.1, σ = 0.4, σ = 0.8, and σ = 1.4.
ACKNOWLEDGMENTS S. Morfu thanks J.M. Bilbault and O. Michel for giving him the opportunity to evolve in this fascinating scientific world and extends his appreciation to P.O. Amblard, F. ChapeauBlondeau, and D. Rousseau who have brought specific references to his attention. He is also grateful to his colleague Julien Dubois for useful advice. P. Marquié dedicates this chapter to Arnaud and Julie and thanks J.M. Bilbault and M. Remoissenet. The authors take this opportunity to highlight the technical assistance of M. Rossé for the design of the sinusoidal nonlinear resistor. Finally, S. Morfu warmly dedicates this chapter to Giovanni and Grazia Morfu.
APPENDIX A. Response of a Cell of an Overdamped Network

In the uncoupled case, and for α = 1/2, a particle of displacement W follows
dW/dt = −W(W − 1/2)(W − 1).   (A1)
Separating the variables in Eq. (A1) yields
2dW/W − 4dW/(W − 1/2) + 2dW/(W − 1) = −dt,   (A2)
which can be integrated to obtain

W(W − 1)/(W − 1/2)² = K e^{−t/2},   (A3)
where K is an integration constant. Equation (A3) can be arranged as a second-order equation in W
W²(1 − Ke^{−t/2}) − W(1 − Ke^{−t/2}) − (1/4)Ke^{−t/2} = 0.   (A4)
Provided that the discriminant is positive, the solutions are given by
W(t) = 1/2 ± (1/2) · 1/√(1 − Ke^{−t/2}).   (A5)
Assuming that initially the position of the particle is W(t = 0) = W0, the integration constant K can be expressed in the form
K = W0(W0 − 1)/(W0 − 1/2)².   (A6)
Inserting the constant Eq. (A6) in the solution in Eq. (A5) leads to the following expression of the displacement:
W(t) = (1/2) [1 ± |W0 − 1/2|/√((W0 − 1/2)² − W0(W0 − 1)e^{−t/2})].   (A7)
Assuming that when t → +∞ the particle evolves to the steady states W = 0 for W0 < 1/2 and W = 1 for W0 > 1/2, we finally obtain the displacement of the particle with initial position W0 as
W(t) = (1/2) [1 + (W0 − 1/2)/√((W0 − 1/2)² − W0(W0 − 1)e^{−t/2})].   (A8)
APPENDIX B. Recall of Jacobian Elliptic Functions

We recall here the properties of the Jacobian elliptic functions used in Section II.B. These three basic functions—cn(u, k), sn(u, k), and dn(u, k)—play an important role in nonlinear evolution equations and arise from the inversion of the elliptic integral of the first kind (Abramowitz and Stegun, 1970):
u(ψ, k) = ∫0^ψ dz/√(1 − k sin²z),   (B1)
where k ∈ [0; 1] is the elliptic modulus. The Jacobian elliptic functions are defined by
sn(u, k) = sin(ψ),   cn(u, k) = cos(ψ),   dn(u, k) = √(1 − k sin²(ψ)).   (B2)

This definition involves the following properties for the derivatives:
d sn(u, k)/du = cn(u, k) dn(u, k),
d cn(u, k)/du = −sn(u, k) dn(u, k),
d dn(u, k)/du = −k sn(u, k) cn(u, k).   (B3)
Considering the circular function properties, we also have
sn²(u, k) + cn²(u, k) = 1.   (B4)
Moreover, using the result in Eq. (B2), we obtain the following identity:
dn²(u, k) + k sn²(u, k) = 1.   (B5)
APPENDIX C. Evolution of an Overdamped Particle Experiencing a Multistable Potential

The equation of motion of an overdamped particle submitted to the sinusoidal force in Eq. (56) can be expressed as
dW/dt = −β(n − 1) sin[2π(n − 1)W],   (C1)
where W represents the particle displacement. The steady states of the system are deduced from the zeros of the nonlinear force. Using the methodology exposed in Section II.A.1, we can establish that the roots of the nonlinear force correspond alternatively to unstable and stable steady states. If k is an integer, the unstable and stable states of the system are written, respectively, as follows:
Wthk = (2k + 1)/(2(n − 1)),   Wk∗ = k/(n − 1),   k ∈ Z.   (C2)
Separating the variables of Eq. (C1), we obtain
dW / sin[2π(n − 1)W] = −β(n − 1) dt.   (C3)

Using the identity sin(2a) = 2 sin a cos a, Eq. (C3) becomes
dW / (tan[π(n − 1)W] cos²[π(n − 1)W]) = −2β(n − 1) dt.   (C4)
Next, considering the derivative of the tangent function in Eq. (C4) yields

(1/(π(n − 1))) ∫0^t d tan[π(n − 1)W] / tan[π(n − 1)W] = −∫0^t 2β(n − 1) dt.   (C5)
A direct integration of Eq. (C5) gives
tan[π(n − 1)W] = tan[π(n − 1)W0] e^{−2πβ(n−1)²t},   (C6)
where W0 denotes the initial position of the particle. Inverting the tangent function straightforwardly provides the solution of Eq. (C1) in the form
W(t) = [1/(π(n − 1))] arctan[tan(π(n − 1)W0) e^{−2πβ(n−1)²t}] + k/(n − 1),   (C7)
where k is an integer coming from the tangent inversion. Note that, from a physical point of view, k must ensure that the particle position evolves toward one of the stable states of the system for a sufficiently long time, that is, when t → +∞. Indeed, for an initial condition between two consecutive unstable steady states, the asymptotic behavior of the uncoupled network reduces to the following rule:
if (2k − 1)/(2(n − 1)) < W0 < (2k + 1)/(2(n − 1)),   then W(t → +∞) = k/(n − 1).   (C8)
This rule can be transformed to yield
if k − 1/2 < (n − 1)W0 < k + 1/2,   then W(t → +∞) = k/(n − 1).   (C9)
Finally, identifying Eq. (C7) with Eq. (C9) when t → +∞, we deduce that k must be the nearest integer to W0(n − 1).
APPENDIX D. Perona and Malik Anisotropic Diffusion Algorithm

We recall here the algorithm introduced by Perona and Malik to compute their method based on the anisotropic diffusion equation. The anisotropic diffusion equation (69) can be discretized with the time step dt
to obtain
Is^{t+1} = Is^t + (dt/ηs) Σp∈Nr g(∇Is,p) ∇Is,p.   (D1)
In Eq. (D1), Is^t represents the brightness of the pixel located at the position s in a discrete 2D grid that corresponds to the filtered image after a processing time t. ηs is the number of neighbors of the pixel s, that is, 4, except for the image edges, where ηs = 3, and for the image corners, where ηs = 2. The spatial neighborhood of the pixel s is noted Nr. The local gradient ∇Is,p can be estimated by the difference of brightness between the considered pixel s and its neighbor p:
∇Is,p = Ip − Is^t,   p ∈ Nr.   (D2)
Finally, the description of the system is completed by defining the edge stopping function g(x) as the Lorentzian function:
g(x) = 1/(1 + x²/K²),   (D3)
where K is a positive parameter.
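A direct transcription of Eqs. (D1)-(D3) in Python might read as follows (the helper name and the border handling by replication are ours; a replicated pixel contributes a zero gradient and hence no flux, so only the ηs real neighbors matter):

import numpy as np

def perona_malik(I, K=0.09, dt=0.01, steps=500):
    # anisotropic diffusion, Eqs. (D1)-(D3)
    g = lambda x: 1.0 / (1.0 + (x / K) ** 2)   # Lorentzian edge-stopping function, Eq. (D3)
    I = I.astype(float).copy()
    eta = np.full(I.shape, 4.0)                              # 4 neighbors inside,
    eta[0, :] = eta[-1, :] = eta[:, 0] = eta[:, -1] = 3.0    # 3 on the edges,
    eta[0, 0] = eta[0, -1] = eta[-1, 0] = eta[-1, -1] = 2.0  # 2 at the corners
    for _ in range(steps):
        Ip = np.pad(I, 1, mode='edge')
        flux = np.zeros_like(I)
        for nb in (Ip[:-2, 1:-1], Ip[2:, 1:-1], Ip[1:-1, :-2], Ip[1:-1, 2:]):
            grad = nb - I          # local gradient, Eq. (D2)
            flux += g(grad) * grad # summand of Eq. (D1); g is even, so g(grad) = g(|grad|)
        I += (dt / eta) * flux
    return I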
REFERENCES

Abramowitz, M., and Stegun, I. A. (1970). Handbook of Mathematical Functions. Dover, New York, p. 569.
Acosta-Serafini, P., Ichiro, M., and Sodini, C. (2004). A linear wide dynamic range CMOS image sensor implementing a predictive multiple sampling algorithm with overlapping integration intervals. IEEE J. Solid-State Circuits 39, 1487–1496.
Adamatzky, A., and de Lacy Costello, B. (2003). On some limitations of reaction-diffusion chemical computers in relation to Voronoi diagram and its inversion. Phys. Lett. A 309, 397–406.
Adamatzky, A., de Lacy Costello, B., Melhuish, C., and Ratcliffe, N. (2004). Experimental implementation of mobile robot taxis with onboard Belousov-Zhabotinsky chemical medium. Mater. Sci. Eng. C 24, 541–548.
Adamatzky, A., de Lacy Costello, B., and Ratcliffe, N. (2002). Experimental reaction-diffusion preprocessor for shape recognition. Phys. Lett. A 297, 344–352.
Agladze, K., Magone, N., Aliev, R., Yamaguchi, T., and Yoshikawa, K. (1997). Finding the optimal path with the aid of chemical wave. Physica D 106, 247–254.
Agrawal, G. P. (2002). Fiber-Optic Communication Systems, 3rd ed. Wiley Inter-Science, New York.
Arena, P., Basile, A., Bucolo, M., and Fortuna, L. (2003). An object oriented segmentation on analog CNN chip. IEEE Trans. Circ. Syst. I 50, 837–846.
Bálya, D., Roska, B., Roska, T., and Werblin, F. S. (2002). A CNN framework for modeling parallel processing in a mammalian retina. Int. J. Circ. Theor. Appl. 30, 363–393.
Barbay, S., Giacomelli, G., and Marin, F. (2001). Noise-assisted transmission of binary information: Theory and experiment. Phys. Rev. E 63, 051110/1–9.
Báscones, R., García-Ojalvo, J., and Sancho, J. M. (2002). Pulse propagation sustained by noise in arrays of bistable electronic circuits. Phys. Rev. E 65, 061108/1–5.
Beeler, G. W., and Reuter, H. (1977). Reconstruction of the action potentials of ventricular myocardial fibers. J. Physiol. 268, 177–210.
Benzi, R., Parisi, G., Sutera, A., and Vulpiani, A. (1982). Stochastic resonance in climatic change. Tellus 34, 10–16.
Binczak, S., Comte, J. C., Michaux, B., Marquié, P., and Bilbault, J. M. (1998). Experiments on a nonlinear electrical reaction-diffusion line. Electron. Lett. 34, 1061–1062.
Black, M. J., Sapiro, G., Marimont, D. H., and Heeger, D. (1998). Robust anisotropic diffusion. IEEE Trans. Image Processing 7, 421–432.
Blanchard, S., Rousseau, D., Gindre, D., and Chapeau-Blondeau, F. (2007). Constructive action of the speckle noise in a coherent imaging system. Optics Lett. 32, 1983–1985.
Bressloff, P. C., and Rowlands, G. (1997). Exact travelling wave solutions of an "integrable" discrete reaction-diffusion equation. Physica D 106, 255–269.
Caponetto, R., Fortuna, L., Occhipinti, L., and Xibilia, M. G. (2003). Sc-CNNs for chaotic signal applications in secure communication systems. Int. J. Bifurcat. Chaos 13, 461–468.
Carmona Galan, R., Jimenez-Garrido, F., Dominguez-Castro, R., Espejo, S., Roska, T., Rekeczky, C., Petras, I., and Rodriguez-Vazquez, A. (2003). A bio-inspired two-layer mixed-signal flexible programmable chip for early vision. IEEE Trans. Neural Netw. 14, 1313–1336.
Chapeau-Blondeau, F. (1999). Noise-assisted propagation over a nonlinear line of threshold elements. Electron. Lett. 35, 1055–1056.
Chapeau-Blondeau, F. (2000). Stochastic resonance and the benefit of noise in nonlinear systems. Lect. Notes Phys. 550, 137–155.
Chen, H-C., Hung, Y-C., Chen, C-K., Liao, T-L., and Chen, C-K. (2006). Image processing algorithms realized by discrete-time cellular neural networks and their circuit implementation. Chaos Solitons Fract. 29, 1100–1108.
Chua, L. O., and Yang, L. (1988a). Cellular neural networks: Theory. IEEE Trans. Circ. Syst. 35, 1257–1272.
Chua, L. O., and Yang, L. (1988b). Cellular neural networks: Applications. IEEE Trans. Circ. Syst. 35, 1273–1290.
Chua, L. O. (1998). CNN: A Paradigm for Complexity (World Scientific Series on Nonlinear Science, Series A, vol. 31). World Scientific Publishing, Singapore.
Comte, J. C. (1996). Etude d'une ligne non linéaire de type Nagumo-Neumann. DEA laboratory report, LIESIB, Dijon, France.
Comte, J. C., and Marquié, P. (2002). Generation of nonlinear current-voltage characteristics: A general method. Int. J. Bifurc. Chaos 12, 447–449.
Comte, J. C., Marquié, P., and Bilbault, J. M. (2001). Contour detection based on nonlinear discrete diffusion in a cellular nonlinear network. Int. J. Bifurc. Chaos 11, 179–183.
Comte, J. C., Marquié, P., Bilbault, J. M., and Binczak, S. (1998). Noise removal using a two-dimensional diffusion network. Ann. Telecom. 53, 483–487.
Comte, J. C., Marquié, P., and Remoissenet, M. (1999). Dissipative lattice model with exact travelling discrete kink-soliton solutions: Discrete breather generation and reaction diffusion regime. Phys. Rev. E 60, 7484–7489.
Comte, J. C., and Morfu, S. (2003). Stochastic resonance: Another way to retrieve subthreshold digital data. Phys. Lett. A 309, 39–43.
Comte, J. C., Morfu, S., and Marquié, P. (2001). Propagation failure in discrete bistable reaction-diffusion systems: Theory and experiments. Phys. Rev. E 64, 027102/1–4.
Cuomo, K. M., and Oppenheim, A. V. (1993). Circuit implementation of synchronized chaos with applications to communications. Phys. Rev. Lett. 71, 65–68.
Czuni, L., and Sziranyi, T. (2001). Motion segmentation and tracking with edge relaxation and optimization using fully parallel methods in the cellular nonlinear network architecture. Real-Time Imaging 7, 77–95.
Dedieu, H., Kennedy, M. P., and Hasler, M. (1993). Chaos shift keying: Modulation and demodulation of a chaotic carrier using self-synchronizing Chua's circuits. IEEE Trans. Circ. Syst. Part II 40, 634–642.
Ditzinger, T., Stadler, M., Strüber, D., and Kelso, J. A. S. (2000). Noise improves three-dimensional perception: Stochastic resonance and other impacts of noise to the perception of autostereograms. Phys. Rev. E 62, 2566–2575.
Duan, F., and Abbott, D. (2005). Signal detection for frequency-shift keying via short-time stochastic resonance. Phys. Lett. A 344, 401–410.
Dudek, P. (2006). Adaptive sensing and image processing with a general-purpose pixel-parallel sensor/processor array integrated circuit. International Workshop on Computer Architecture for Machine Perception and Sensing, Montreal, 2006, pp. 1–6.
El-Gamal, A., Yang, D., and Fowler, B. (1999). Pixel level processing—Why, what and how? Proc. SPIE Electr. Imaging '99 Conference 3650, 2–13.
Erneux, T., and Nicolis, G. (1993). Propagating waves in discrete bistable reaction-diffusion systems. Physica D 67, 237–244.
Espejo, S., Rodriguez-Vázquez, A., Carmona, R., and Dominguez-Castro, R. (1996). A 0.8-μm CMOS programmable analog-array-processing vision-chip with local logic and image-memory. European Solid-State Devices and Reliability Conference, Neuchatel, 1996, pp. 276–279.
Espejo, S., Dominguez-Castro, R., Linan, G., and Rodriguez-Vázquez, A. (1998). A 64 × 64 CNN universal chip with analog and digital I/O. Proc. 5th IEEE Int. Conf. Electronics, Circuits and Systems, 1998, pp. 203–206.
Fife, P. C. (1979). Mathematical Aspects of Reacting and Diffusing Systems (Lecture Notes in Biomathematics, vol. 28). Springer-Verlag, New York.
Fisher, R. A. (1937). The wave of advance of advantageous genes. Ann. Eugen. 7, 355–369.
Fossum, E. (1993). Active pixel sensors: Are CCDs dinosaurs? Int. Soc. Opt. Eng. (SPIE) 1900, 2–14.
Fossum, E. (1997). CMOS image sensor: Electronic camera on a chip. IEEE Trans. Electr. Dev. 44, 1689–1698.
Gammaitoni, L., Hänggi, P., Jung, P., and Marchesoni, F. (1998). Stochastic resonance. Rev. Mod. Phys. 70, 223–282.
Gonzalez, R. C., and Wintz, P. (1987). Digital Image Processing, 2nd ed. Addison-Wesley, Reading, MA.
Henry, D. (1981). Geometric Theory of Semilinear Parabolic Equations. Springer-Verlag, New York.
Hirota, R., and Suzuki, K. (1970). Studies on lattice solitons by using electrical networks. J. Phys. Soc. Jpn. 28, 1366–1367.
Histace, A., and Rousseau, D. (2006). Constructive action of noise for impulsive noise removal in scalar images. Electron. Lett. 42, 393–395.
Holden, A. V., Tucker, J. V., and Thompson, B. C. (1991). Can excitable media be considered as computational systems? Physica D 49, 240–246.
Hongler, M., De Meneses, Y., Beyeler, A., and Jacquot, J. (2003). The resonant retina: Exploiting vibrational noise to optimally detect edges in an image. IEEE Trans. Patt. Anal. Machine Intell. 25, 1051–1062.
Izhikevich, E. M. (2007). Dynamical Systems in Neuroscience: The Geometry of Excitability and Bursting. MIT Press, Cambridge, Massachusetts.
Jäger, D. (1985). Characteristics of travelling waves along the nonlinear transmission line for monolithic integrated circuits: A review. Int. J. Electron. 58, 649–669.
Julián, P., and Dogaru, R. (2002). A piecewise-linear simplicial coupling cell for CNN gray-level image processing. IEEE Trans. Circ. Syst. I 49, 904–913.
Keener, J. P. (1987). Propagation and its failure in coupled systems of discrete excitable cells. SIAM J. Appl. Math. 47, 556–572.
Keener, J. P. (2000). Homogenization and propagation in the bistable equation. Physica D 136, 1–17.
Kladko, K. (2000). Universal scaling of wave propagation failure in arrays of coupled nonlinear cells. Phys. Rev. Lett. 84, 4505–4508.
Kozlowski, L., Rossi, G., Blanquart, L., Marchesini, R., Huang, Y., Chow, G., Richardson, J., and Standley, D. (2005). Pixel noise suppression via SoC management of target reset in a 1920 × 1080 CMOS image sensor. IEEE J. Solid-State Circ. 40, 2766–2776.
Kuhnert, L. (1986). A new optical photochemical device in a light-sensitive chemical active medium. Nature 319, 393–394.
Kuhnert, L., Agladze, K. I., and Krinsky, V. I. (1989). Image processing using light-sensitive chemical waves. Nature 337, 244–247.
Kuusela, T. (1995). Soliton experiments in transmission lines. Chaos Solitons Fract. 5, 2419–2462.
Kwok, H. S., and Tang, W. K. S. (2007). A fast image encryption system based on chaotic maps with finite precision representation. Chaos Solitons Fract. 32, 1518–1529.
Lindner, J. F., Chandramouli, S., Bulsara, A. R., Löcher, M., and Ditto, W. L. (1998). Noise enhanced propagation. Phys. Rev. Lett. 81, 5048–5051.
Litwiller, D. (2001). CCD vs. CMOS: Facts and fiction. Photon. Spectra (Special Issue), 154–158.
Loinaz, M., Singh, K., Blanksby, A., Inglis, D., Azadet, K., and Ackland, B. (1998). A 200-mW 3.3-V CMOS color camera IC producing 352 × 288 24-b video at 30 frames/s. IEEE J. Solid-State Circ. 33, 2092–2103.
Longtin, A. (1993). Stochastic resonance in neuron models. J. Stat. Phys. 70, 309–327.
Marquié, P., Bilbault, J. M., and Remoissenet, M. (1995). Observation of nonlinear localized modes in an electrical lattice. Phys. Rev. E 51, 6127–6133.
Marquié, P., Binczak, S., Comte, J. C., Michaux, B., and Bilbault, J. M. (1998). Diffusion effects in a nonlinear electrical lattice. Phys. Rev. E 57, 6075–6078.
Morfu, S. (2002c). Etude des défauts et perturbations dans les réseaux électroniques dissipatifs non linéaires: Applications à la transmission et au traitement du signal. Ph.D. thesis, Laboratory LE2I, Dijon, France.
Morfu, S. (2003). Propagation failure reduction in a Nagumo chain. Phys. Lett. A 317, 73–79.
Morfu, S. (2005). Image processing with a cellular nonlinear network. Phys. Lett. A 343, 281–292.
Morfu, S., and Comte, J. C. (2004). A nonlinear oscillators network devoted to image processing. Int. J. Bifurc. Chaos 14, 1385–1394.
Morfu, S., Bossu, J., and Marquié, P. (2007). Experiments on an electrical nonlinear oscillators network. Int. J. Bifurcat. Chaos 17, 3535–3538.
Morfu, S., Bossu, J., Marquié, P., and Bilbault, J. M. (2006). Contrast enhancement with a nonlinear oscillators network. Nonlinear Dynamics 44, 173–180.
Morfu, S., Comte, J. C., and Bilbault, J. M. (2003). Digital information receiver based on stochastic resonance. Int. J. Bifurc. Chaos 13, 233–236.
Morfu, S., Comte, J. C., Marquié, P., and Bilbault, J. M. (2002). Propagation failure induced by coupling inhomogeneities in a nonlinear diffusive medium. Phys. Lett. A 294, 304–307.
Morfu, S., Nekorkin, V. I., Bilbault, J. M., and Marquié, P. (2002). The wave front propagation failure in an inhomogeneous discrete Nagumo chain: Theory and experiments. Phys. Rev. E 66, 046127/1–8.
Morfu, S., Nofiélé, B., and Marquié, P. (2007). On the use of multistability for image processing. Phys. Lett. A 367, 192–198.
Moss, F., Ward, L. M., and Sannita, W. G. (2004). Stochastic resonance and sensory information processing: A tutorial and review of application. Clin. Neurophysiol. 115, 267–281.
Murray, J. D. (1989). Mathematical Biology. Springer-Verlag, Berlin.
Nagumo, J., Arimoto, S., and Yoshizawa, S. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE 50, 2061–2070.
Nagashima, H., and Amagishi, Y. (1978). Experiments on the Toda lattice using nonlinear transmission line. J. Phys. Soc. Jpn. 45, 680–688.
Nozaki, D., Mar, D. J., Grigg, P., and Collins, J. J. (1999). Effects of colored noise on stochastic resonance in sensory neurons. Phys. Rev. Lett. 82, 2402–2405.
Occhipinti, L., Spoto, G., Branciforte, M., and Doddo, F. (2001). Defects detection and characterization by using cellular neural networks. IEEE Int. Symposium on Circuits and Systems ISCAS 3, Sydney, Australia, May 6–9, 2001, pp. 481–484.
Paquerot, J. F., and Remoissenet, M. (1994). Dynamics of nonlinear blood pressure waves in large arteries. Phys. Lett. A 194, 77–82.
Perona, P., and Malik, J. (1990). Scale-space and edge detection using anisotropic diffusion. IEEE Trans. Patt. Anal. Machine Intell. 12, 629–639.
Petras, I., Rekeczky, C., Roska, T., Carmona, R., Jimenez-Garrido, F., and Rodriguez-Vazquez, A. (2003). Exploration of spatial-temporal dynamic phenomena in a 32 × 32-cell stored program 2-layer CNN universal machine chip prototype. J. Circ. Syst. Computers 12, 691–710.
Rambidi, N. G., Shamayaev, K. E., and Peshkov, G. Y. (2002). Image processing using light-sensitive chemical waves. Phys. Lett. A 298, 375–382.
Rambidi, N. G., and Yakovenchuk, D. (2001). Chemical reaction-diffusion implementation of finding the shortest paths in a labyrinth. Phys. Rev. E 63, 026607/1–6.
Rekeczky, C., Tahy, A., Vegh, Z., and Roska, T. (1999). CNN-based spatio-temporal nonlinear filtering and endocardial boundary detection in echocardiography. Int. J. Circuit Theory Applicat. 27, 171–207.
Remoissenet, M. (1999). Waves Called Solitons: Concepts and Experiments, 3rd rev. enlarged ed. Springer-Verlag, Berlin, p. 284.
Roska, T., and Rodriguez-Vazquez, A. (2000). Review of CMOS implementations of the CNN universal machine-type visual microprocessors. ISCAS 2000 IEEE International Symposium on Circuits and Systems, Geneva, 2000, pp. 120–123.
Sakakibara, M., Kawahito, S., Handoko, D., Nakamura, N., Higashi, M., Mabuchi, K., and Sumi, H. (2005). A high-sensitivity CMOS image sensor with gain-adaptive column amplifiers. IEEE J. Solid-State Circ. 40, 1147–1156.
Scott, A. C. (1970). Active and Nonlinear Wave Propagation in Electronics. Wiley Interscience, New York.
Scott, A. (1999). Nonlinear Science: Emergence and Dynamics of Coherent Structures (Oxford Applied and Engineering Mathematics, 8). Oxford University Press, New York.
Seitz, P. (2000). Solid-state image sensing. Handbook of Computer Vision and Applications 1, 165–222.
Serra, J. (1986). Introduction to mathematical morphology. Comput. Vision Graph. 35, 283–305.
Short, K. M., and Parker, A. T. (1998). Unmasking a hyperchaotic communication scheme. Phys. Rev. E 58, 1159–1162.
Simonotto, E., Riani, M., Seife, C., Roberts, M., Twitty, J., and Moss, F. (1997). Visual perception of stochastic resonance. Phys. Rev. Lett. 78, 1186–1189.
Smith, S., Hurwitz, J., Torrie, M., Baxter, D., Holmes, A., Panaghiston, M., Henderson, R., Murray, A., Anderson, S., and Denyer, P. (1998). A single-chip 306 × 244-pixel CMOS NTSC video camera. In "Solid-State Circuits Conference, 1998. Digest of Technical Papers. 45th ISSCC 1998 IEEE International," San Francisco, pp. 170–171, 432.
Stocks, N. G., and Mannella, R. (2001). Generic noise-enhanced coding in neuronal arrays. Phys. Rev. E 64, 030902/1–4.
Taniuti, T., and Wei, C. C. (1968). Reductive perturbation method in nonlinear wave propagation. J. Phys. Soc. Jpn. 24, 941–946.
Taniuti, T., and Yajima, N. (1969). Perturbation method for a nonlinear wave modulation. J. Math. Phys. 10, 1369–1372.
Tetzlaff, R. (ed.) (2002). Cellular Neural Networks and Their Applications. World Scientific Publishing, Singapore.
Teuscher, C., and Adamatzky, A. (eds.) (2005). Proceedings of the 2005 Workshop on Unconventional Computing: From Cellular Automata to Wetware. Luniver Press, Lightning Source.
Toda, M. (1967). Wave propagation in anharmonic lattices. J. Phys. Soc. Jpn. 23, 501–506.
Udaltsov, V. S., Goedgebuer, J. P., Larger, L., Cuenot, J. B., Levy, P., and Rhodes, W. T. (2003). Cracking chaos-based encryption systems ruled by nonlinear time delay differential equations. Phys. Lett. A 308, 54–60.
Vaudelle, F., Gazengel, J., Rivoire, G., Godivier, X., and Chapeau-Blondeau, F. (1998). Stochastic resonance and noise-enhanced transmission of spatial signals in optics: The case of scattering. J. Opt. Soc. Am. B 15, 2674–2680.
Venetianer, P. L., Werblin, F., Roska, T., and Chua, L. O. (1995). Analogic CNN algorithms for some image compression and restoration tasks. IEEE Trans. Circ. Syst. I 42, 278–284.
Yadid-Pecht, O., and Belenky, A. (2003). In-pixel autoexposure CMOS APS. IEEE J. Solid-State Circ. 38, 1425–1428.
Yamgoué, S. B., Morfu, S., and Marquié, P. (2007). Noise effects on gap wave propagation in a nonlinear discrete LC transmission line. Phys. Rev. E 75, 036211/1–7.
Yu, W., and Cao, J. (2006). Cryptography based on delayed chaotic neural networks. Phys. Lett. A 356, 333–338.
Zakharov, V. E., and Wabnitz, S. (1998). Optical Solitons: Theoretical Challenges and Industrial Perspectives. Springer-Verlag, Berlin.
Zozor, S., and Amblard, P. O. (1999). Stochastic resonance in discrete time nonlinear AR(1) models. IEEE Trans. Signal Proc. 47, 108–122.
Zozor, S., and Amblard, P. O. (2003). Stochastic resonance in locally optimal detectors. IEEE Trans. Signal Proc. 51, 3177–3181.
Zozor, S., and Amblard, P. O. (2005). Noise-aided processing: Revisiting dithering in a Sigma-Delta quantizer. IEEE Trans. Signal Proc. 53, 3202–3210.
CHAPTER 4

Complex-Valued Neural Network and Complex-Valued Backpropagation Learning Algorithm

Tohru Nitta*
Contents

I Introduction 154
II The Complex-Valued Neural Network 155
 A The Complex-Valued Neuron 155
 B Multilayered Complex-Valued Neural Network 162
III Complex-Valued Backpropagation Learning Algorithm 162
 A Complex-Valued Adaptive Pattern Classifier 162
 B Learning Convergence Theorem 163
 C Learning Rule 164
IV Learning Speed 169
 A Experiments 169
 B Factors to Improve Learning Speed 174
 C Discussion 175
V Generalization Ability 175
VI Transforming Geometric Figures 181
 A Examples 182
 B Systematic Evaluation 191
 C Mathematical Analysis 194
 D Discussion 203
VII Orthogonality of Decision Boundaries in the Complex-Valued Neuron 209
 A Mathematical Analysis 210
 B Utility of the Orthogonal Decision Boundaries 211
VIII Conclusions 217
References 218
* National Institute of Advanced Industrial Science and Technology, Tsukuba Central 2, 1-1-1 Umezono, Tsukuba, Ibaraki, 305-8568 Japan

Advances in Imaging and Electron Physics, Volume 152, ISSN 1076-5670, DOI: 10.1016/S1076-5670(08)00604-6. Copyright © 2008 Elsevier Inc. All rights reserved.
I. INTRODUCTION

In recent years there has been much interest in complex-valued neural networks, whose parameters (weights and threshold values) are all complex numbers, and in their applications (Hirose, 2003; ICANN, 2007; ICANN/ICONIP, 2003; ICONIP, 2002; IJCNN, 2006; KES, 2001, 2002, 2003). The applications of complex-valued neural networks cover various fields, including telecommunications, image processing, computer vision, and independent component analysis. Many types of information can be represented naturally in terms of angular or directional variables, which can be represented by complex numbers. For example, in computer vision, optical flow fields are represented as fields of oriented line segments, each of which can be described by a magnitude and a direction, that is, by a complex number. Thus, it is natural to choose complex-valued neural networks to process information expressed by complex numbers.

One of the most popular neural network models is the multilayer neural network and the related backpropagation training algorithm (called Real-BP here in the sense of treating real-valued signals) (Rumelhart et al., 1986a,b). The Real-BP is an adaptive procedure widely used in training a multilayer perceptron for a number of classification applications in areas such as speech and image recognition. The Complex-BP algorithm is a complex-valued version of the Real-BP, which was proposed by several researchers independently in the early 1990s (Benvenuto and Piazza, 1992; Georgiou and Koutsougeras, 1992; Kim and Guest, 1990; Nitta, 1993, 1997; Nitta and Furuya, 1991). The Complex-BP algorithm can be applied to multilayered neural networks whose weights, threshold values, and input and output signals are all complex numbers. This algorithm enables the network to learn complex-valued patterns naturally and has several inherent properties.

This chapter elucidates the Complex-BP proposed in Nitta and Furuya (1991) and Nitta (1993, 1997) and its properties (Nitta, 1997, 2000, 2003, 2004). The primary contents are as follows. (1) The multilayered complex-valued neural network model and the derivation of the related Complex-BP algorithm are described. The learning convergence theorem for the Complex-BP can be obtained by extending the theory of adaptive pattern classifiers (Amari, 1967) to complex numbers. (2) The average convergence speed of the Complex-BP is superior to that of the Real-BP, whereas the generalization performance remains unchanged. In addition, the required number of learnable parameters is only about half of that of the Real-BP, where a complex-valued parameter z = x + iy (where i = √−1) is counted as two because it consists of a real part x and an imaginary part y. (3) The Complex-BP can transform geometric figures (e.g., rotation, similarity transformation, and parallel displacement of straight lines and circles), whereas the Real-BP cannot. Numerical experiments suggest that the
behavior of a Complex-BP network that has learned the transformation of geometric figures is related to the identity theorem in complex analysis. Mathematical analysis indicates that a Complex-BP network that has learned a rotation, a similarity transformation, or a parallel displacement has the ability to generalize the transformation with an error represented by the sine. (4) Weight parameters of a complex-valued neuron have a restriction related to two-dimensional (2D) motion, and learning proceeds under this restriction. (5) The decision boundary of a complex-valued neuron consists of two hypersurfaces that intersect orthogonally and divide a decision region into four equal sections. The Exclusive OR (XOR) problem and the detection of symmetry problem, neither of which can be solved with a single real-valued neuron, can both be solved by a single complex-valued neuron with the orthogonal decision boundaries, revealing the potent computational power of complex-valued neurons. Furthermore, the fading equalization problem can be successfully solved by a single complex-valued neuron with the highest generalization ability.

This chapter is organized as follows. Section II describes the multilayered complex-valued neural network model. Section III presents the derivation of the Complex-BP algorithm. Section IV experimentally investigates the learning speed. Section V deals with the generalization ability of the Complex-BP network. Section VI shows how the Complex-BP algorithm can be applied to the transformation of geometric figures and analyzes the behavior of the Complex-BP network mathematically. Section VII is devoted to a theoretical analysis of decision boundaries of the complex-valued neuron model and the constructive proof that some problems can be solved with a single complex-valued neuron with the orthogonal decision boundaries. Section VIII follows with our conclusions.
II. THE COMPLEX-VALUED NEURAL NETWORK

This section describes the complex-valued neural network proposed by Nitta and Furuya (1991) and Nitta (1993, 1997).
A. The Complex-Valued Neuron

1. The Model

The complex-valued neuron is defined as follows (Figure 1). The input signals, weights, thresholds, and output signals are all complex numbers. The net input $Y_n$ to a complex-valued neuron n is defined as

$$Y_n = \sum_m W_{nm} X_m + V_n, \tag{1}$$
FIGURE 1 A model neuron used in the Complex-BP algorithm: $Y_n = \sum_m W_{nm} X_m + V_n = z$, with output $f_C(z)$. $X_m$, $Y_n$, $V_n$, $W_{nm}$, z, and $f_C(z)$ are all complex numbers. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
where $W_{nm}$ is the complex-valued weight connecting complex-valued neurons n and m, $X_m$ is the complex-valued input signal from complex-valued neuron m, and $V_n$ is the complex-valued threshold value of neuron n. To obtain the complex-valued output signal, convert the net input $Y_n$ into its real and imaginary parts as follows: $Y_n = x + iy = z$, where i denotes $\sqrt{-1}$. The complex-valued output signal is defined as
$$f_C(z) = f_R(x) + i f_R(y), \tag{2}$$
where $f_R(u) = 1/(1 + \exp(-u))$, $u \in R$ (R denotes the set of real numbers); that is, the real and imaginary parts of an output of a neuron are the sigmoid functions of the real part x and imaginary part y of the net input z to the neuron, respectively. It is obvious that $0 < \mathrm{Re}[f_C], \mathrm{Im}[f_C] < 1$ and $|f_C(z)| < \sqrt{2}$. Note that the activation function $f_C$ is not a regular complex-valued function because the Cauchy–Riemann equations do not hold: $\partial f_C(z)/\partial x + i\,\partial f_C(z)/\partial y = (1 - f_R(x))f_R(x) - (1 - f_R(y))f_R(y) \neq 0$ in general, where $z = x + iy$.
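As a small illustration (our own sketch, not code from the original text), the split activation of Eq. (2) and the neuron of Eq. (1) can be written as follows:

```python
import numpy as np

def f_R(u):
    """Real sigmoid f_R(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + np.exp(-u))

def f_C(z):
    """Eq. (2): the sigmoid applied separately to the real and imaginary
    parts of the net input z; bounded but nonregular."""
    return f_R(np.real(z)) + 1j * f_R(np.imag(z))

def complex_neuron(X, W, V):
    """Eqs. (1)-(2): net input Y_n = sum_m W_nm X_m + V_n, output f_C(Y_n)."""
    return f_C(np.dot(W, X) + V)
```

Note that $|f_C(z)| < \sqrt{2}$ for every z, consistent with the bound stated above.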
2. The Activation Function

In the case of complex-valued neural networks, careful attention should be paid to the choice of activation functions. In the case of real-valued neural networks, an activation function of real-valued neurons is usually chosen to be a smooth (continuously differentiable) and bounded function such as a sigmoidal function. As some researchers have noted (Georgiou and Koutsougeras, 1992; Kim and Adali, 2003; Kuroe et al., 2002; Nitta, 1997), in the complex region we should recall Liouville's theorem (e.g., Derrick, 1984), which states that if a function G is regular at all z ∈ C and bounded, then G is a constant function, where C denotes the set of complex numbers. That is, we need to choose either the regularity or the boundedness for an activation function of complex-valued neurons. In the
literature (Nitta and Furuya, 1991; Nitta, 1993, 1997), Eq. (2) is adopted as an activation function of complex-valued neurons, which is bounded but nonregular (that is, the boundedness is chosen). A first attempt to extend the real-valued neural network to complex numbers in 1990 was formulated as a complex-valued neuron with a regular complex function
$$f_C(z) = \frac{1}{1 + \exp(-z)}, \tag{3}$$
where z = x + iy, because the regularity of a complex function seemed to be natural and to produce many interesting results. Kim and Guest (1990) independently proposed the complex-valued neuron with this activation function [Eq. (3)]. However, the complex-valued backpropagation algorithm with this regular complex-valued activation function never converged in our experiments. We considered that the cause was the nonboundedness of the complex function [Eq. (3)] and decided to adopt bounded functions. Indeed, Eq. (3) has periodic poles on the imaginary axis. The complex function defined in Eq. (2) is valid as an activation function of complex-valued neurons for several reasons. First, although Eq. (2) is not regular (i.e., the Cauchy–Riemann equations do not hold), there is a strong relationship between its real and imaginary parts. Consider a complex-valued neuron with n inputs, weights $w_k = w_k^r + iw_k^i \in C$ $(1 \le k \le n)$, and a threshold value $\theta = \theta^r + i\theta^i \in C$. Then, for n input signals $x_k + iy_k \in C$ $(1 \le k \le n)$, the complex-valued neuron generates
$$X + iY = f_C\!\left(\sum_{k=1}^{n} (w_k^r + iw_k^i)(x_k + iy_k) + \theta^r + i\theta^i\right) = f_R\!\left(\sum_{k=1}^{n} (w_k^r x_k - w_k^i y_k) + \theta^r\right) + i\, f_R\!\left(\sum_{k=1}^{n} (w_k^i x_k + w_k^r y_k) + \theta^i\right) \tag{4}$$
as an output. Both the real and imaginary parts of the right-hand side of Eq. (4) contain the common variables $\{x_k, y_k, w_k^r, w_k^i\}_{k=1}^{n}$ and influence each other via those 4n variables. Moreover, the activation function [Eq. (2)] can process complex-valued signals properly through its amplitude-phase relationship. As described in Nitta (1997), the 1-n-1 Complex-BP network with the activation function Eq. (2) can transform
geometric figures (e.g., rotation, similarity transformation, and parallel displacement of straight lines and circles). This cannot be done without the ability to process signals properly through the amplitude-phase relationship in the activation function. Section VI is devoted to this topic. Second, it has been proved that the complex-valued neural network with the activation function Eq. (2) can approximate any continuous complex-valued function, whereas one with a regular activation function (for example, $f_C(z) = 1/(1 + \exp(-z))$, proposed by Kim and Guest, 1990, and $f_C(z) = \tanh(z)$ by Kim and Adali, 2003) cannot approximate any nonregular complex-valued function (Arena et al., 1993, 1998). That is, the complex-valued neural network with the activation function Eq. (2) is a universal approximator, and the one with a regular activation function is not. It should be noted that a complex-valued neural network with a regular complex-valued activation function with poles, such as $f_C(z) = \tanh(z)$, can be a universal approximator on compact subsets of the deleted neighborhood of the poles (Kim and Adali, 2003). This fact is very important as a theory; unfortunately, the complex-valued neural network used in that analysis is not a usual one, that is, the output of the hidden neuron is defined as the product of several activation functions. Thus, the statement seems insufficient compared with the case of a nonregular complex-valued activation function. Therefore, the ability of complex-valued neural networks to approximate complex-valued functions relies heavily on the regularity of the activation functions used. Third, the stability of the learning of the complex-valued neural network with the activation function Eq. (2) has been confirmed through computer simulations (Arena et al., 1993; Nitta, 1993, 1997; Nitta and Furuya, 1991). Finally, the activation function Eq. (2) clearly satisfies the following five properties that an activation function $H(z) = u(x, y) + iv(x, y)$, $z = x + iy$, should possess, which Georgiou and Koutsougeras (1992) pointed out:

1. H is nonlinear in x and y.
2. H is bounded.
3. The partial derivatives $u_x$, $u_y$, $v_x$, and $v_y$ exist and are bounded.
4. H(z) is not entire. That is, there exists some $z_0 \in C$ such that H is not regular at $z_0$.
5. $u_x v_y$ is not identically equal to $v_x u_y$.
The activation function Eq. (2) is valid for the above reasons.
3. Relationship With the Real-Valued Neuron

The following question arises in encountering the complex-valued neuron: Is a complex-valued neuron simply equivalent to two real-valued neurons? The answer is no.
We first confirm the basic structure of the weights of a real-valued neuron. Consider a real-valued neuron with n-inputs, weights wk ∈ R (1 ≤ k ≤ n), and a threshold value θ ∈ R. Let an output function fR : R → R of the neuron be fR (u) = 1/(1 + exp(−u)). Then, for n input signals xk ∈ R (1 ≤ k ≤ n), the real-valued neuron generates
$$f_R\!\left(\sum_{k=1}^{n} w_k x_k + \theta\right) \tag{5}$$
as an output. This may be interpreted as follows: a real-valued neuron moves a point $x_k$ on a real line (one dimension) to another point $w_k x_k$ whose distance from the origin is $w_k$ times as long as that of the point $x_k$ $(1 \le k \le n)$; regarding $w_1 x_1, \ldots, w_n x_n$ as vectors, the real-valued neuron adds them, resulting in a one-dimensional (1D) real-valued vector $\sum_{k=1}^{n} w_k x_k$; and finally, it moves the end point of the vector $\sum_{k=1}^{n} w_k x_k$ to another point $\sum_{k=1}^{n} w_k x_k + \theta$ (Figure 2). The output value of the real-valued neuron can be obtained by applying the nonlinear transformation $f_R$ to the value $\sum_{k=1}^{n} w_k x_k + \theta$. Thus, the real-valued neuron basically administers the movement of points on a real line (1D), and its weight parameters $w_1, \ldots, w_n$ are completely independent of one another. Next, we examine the basic structure of the weights of the complex-valued neuron. Consider a complex-valued neuron with n inputs, weights $w_k = w_k^r + iw_k^i \in C$ $(1 \le k \le n)$, and a threshold value $\theta = \theta^r + i\theta^i \in C$. Then, for n input signals $x_k + iy_k \in C$ $(1 \le k \le n)$, the complex-valued
FIGURE 2 An image of the processing in a real-valued neuron. From Nitta, T. (2000). Figure 1 used by permission from Springer Science and Business Media.
neuron generates
$$X + iY = f_C\!\left(\sum_{k=1}^{n} (w_k^r + iw_k^i)(x_k + iy_k) + (\theta^r + i\theta^i)\right) = f_R\!\left(\sum_{k=1}^{n} (w_k^r x_k - w_k^i y_k) + \theta^r\right) + i\, f_R\!\left(\sum_{k=1}^{n} (w_k^i x_k + w_k^r y_k) + \theta^i\right) \tag{6}$$

as an output. Hence, a single complex-valued neuron with n inputs is equivalent to two real-valued neurons with 2n inputs (Figure 3). We shall refer to the real-valued neuron corresponding to the real part X of an output of a complex-valued neuron as a real-part neuron and the real-valued neuron corresponding to the imaginary part Y as an imaginary-part neuron. Note here that
$$\begin{pmatrix} X \\ Y \end{pmatrix} = F\!\left( \begin{pmatrix} w_1^r & -w_1^i & \cdots & w_n^r & -w_n^i \\ w_1^i & w_1^r & \cdots & w_n^i & w_n^r \end{pmatrix} \begin{pmatrix} x_1 \\ y_1 \\ \vdots \\ x_n \\ y_n \end{pmatrix} + \begin{pmatrix} \theta^r \\ \theta^i \end{pmatrix} \right) = F\!\left( |w_1| \begin{pmatrix} \cos\alpha_1 & -\sin\alpha_1 \\ \sin\alpha_1 & \cos\alpha_1 \end{pmatrix} \begin{pmatrix} x_1 \\ y_1 \end{pmatrix} + \cdots + |w_n| \begin{pmatrix} \cos\alpha_n & -\sin\alpha_n \\ \sin\alpha_n & \cos\alpha_n \end{pmatrix} \begin{pmatrix} x_n \\ y_n \end{pmatrix} + \begin{pmatrix} \theta^r \\ \theta^i \end{pmatrix} \right), \tag{7}$$

FIGURE 3 Two real-valued neurons that are equivalent to a complex-valued neuron. From Nitta, T. (2000). Figure 2 used by permission from Springer Science and Business Media.
FIGURE 4 An image of the two-dimensional motion for complex-valued signals: the point $(x_k, y_k)$ in the complex plane is scaled by $|w_k|$ and rotated by $\alpha_k$ about the origin. From Nitta, T. (2000). Figure 3 used by permission from Springer Science and Business Media.
where $F({}^t[x\ y]) = {}^t[f_R(x)\ f_R(y)]$ and $\alpha_k = \arctan(w_k^i/w_k^r)$ $(1 \le k \le n)$. In Eq. (7), $|w_k|$ means reduction or magnification of the distance between a point $(x_k, y_k)$ and the origin in the complex plane, $\begin{pmatrix} \cos\alpha_k & -\sin\alpha_k \\ \sin\alpha_k & \cos\alpha_k \end{pmatrix}$ the counterclockwise rotation by $\alpha_k$ radians about the origin, and ${}^t[\theta^r\ \theta^i]$ a translation. Thus, we find that a single complex-valued neuron with n inputs applies a linear transformation called 2D motion to each input signal (complex number); that is, Eq. (7) basically involves 2D motion (Figure 4). As seen above, a real-valued neuron basically administers the movement of points on a real line (1D), and its weight parameters are completely independent of one another. Conversely, a complex-valued neuron basically administers 2D motion on the complex plane, and we may also interpret the learning as adjusting 2D motion. This structure imposes the following restrictions on a set of weight parameters of a complex-valued neuron (Figure 3).
(Weight for the real part $x_k$ of an input signal to the real-part neuron) = (Weight for the imaginary part $y_k$ of an input signal to the imaginary-part neuron), (8)

(Weight for the imaginary part $y_k$ of an input signal to the real-part neuron) = − (Weight for the real part $x_k$ of an input signal to the imaginary-part neuron). (9)

Learning is carried out under these restrictions. From a different angle, we can find that the real-part neuron and the imaginary-part neuron influence each other via their weights.
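The 2D-motion reading of Eq. (7) is easy to check numerically. In the sketch below (our own illustration; the numerical values are arbitrary), multiplying an input by a complex weight gives exactly the scale-and-rotate matrix action on the corresponding 2D vector:

```python
import numpy as np

# A complex weight w acts on an input x + iy as a scaling by |w| and a
# rotation by alpha = arg(w); the threshold theta adds a translation.
w, theta = 0.5 + 0.5j, 0.1 - 0.2j        # illustrative weight and threshold
x = 1.0 + 0.0j                           # one input signal

direct = w * x + theta                   # plain complex arithmetic

r, alpha = np.abs(w), np.angle(w)
R = np.array([[np.cos(alpha), -np.sin(alpha)],
              [np.sin(alpha),  np.cos(alpha)]])
vec = r * (R @ np.array([x.real, x.imag])) + np.array([theta.real, theta.imag])

print(direct)   # (0.6+0.3j)
print(vec)      # [0.6 0.3] -- the same point, written as a 2D vector
```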
B. Multilayered Complex-Valued Neural Network

A complex-valued neural network consists of the complex-valued neurons described in Section II.A.1. For the sake of simplicity, the networks used in the analysis and experiments of this chapter have three layers. We use $w_{ml}$ for the weight between the input neuron l and the hidden neuron m, $v_{nm}$ for the weight between the hidden neuron m and the output neuron n, $\theta_m$ for the threshold of the hidden neuron m, and $\gamma_n$ for the threshold of the output neuron n. Let $I_l$, $H_m$, and $O_n$ denote the output values of the input neuron l, the hidden neuron m, and the output neuron n, respectively. Let also $U_m$ and $S_n$ denote the internal potentials of the hidden neuron m and the output neuron n, respectively. That is, $U_m = \sum_l w_{ml} I_l + \theta_m$, $S_n = \sum_m v_{nm} H_m + \gamma_n$, $H_m = f_C(U_m)$, and $O_n = f_C(S_n)$. Let $\delta_n = T_n - O_n$ denote the error between the actual pattern $O_n$ and the target pattern $T_n$ of output neuron n. We define the square error for the pattern p as $E_p = (1/2) \sum_{n=1}^{N} |T_n - O_n|^2$, where N is the number of output neurons.
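Under the definitions above, a forward pass of the three-layered network and the error $E_p$ can be sketched as follows (our own transcription; variable names mirror the text, and the matrix shapes are an assumption):

```python
import numpy as np

def f_C(z):
    """Split sigmoid activation of Eq. (2)."""
    return 1.0/(1.0 + np.exp(-z.real)) + 1j/(1.0 + np.exp(-z.imag))

def forward(I, w, theta, v, gamma):
    """I: complex input vector; w, v: complex weight matrices
    (hidden x input, output x hidden); theta, gamma: complex thresholds."""
    U = w @ I + theta      # internal potentials U_m of the hidden neurons
    H = f_C(U)             # hidden outputs H_m
    S = v @ H + gamma      # internal potentials S_n of the output neurons
    O = f_C(S)             # network outputs O_n
    return H, O

def square_error(T, O):
    """E_p = (1/2) * sum_n |T_n - O_n|^2 for one pattern p."""
    return 0.5 * np.sum(np.abs(T - O) ** 2)
```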
III. COMPLEX-VALUED BACKPROPAGATION LEARNING ALGORITHM

A. Complex-Valued Adaptive Pattern Classifier

The Real-BP is based on the adaptive pattern classifiers model, or APCM (Amari, 1967), which guarantees that the Real-BP converges. This section formulates a complex-valued version of the APCM (called complex APCM) for the Complex-BP by introducing complex numbers to the APCM, which will guarantee that the Complex-BP converges. Let us consider two information sources of complex-valued patterns. Two complex-valued patterns $x \in C^n$ and $y \in C^m$ occur from information sources 1 and 2 with the unknown joint probability $P(x, y)$. We will assume that the number of patterns is finite. Note that the set of pairs of complex-valued patterns $\{(x, y)\}$ corresponds to the set of learning patterns in neural networks. The purpose of learning is to estimate a complex-valued pattern y that occurs from information source 2, given a complex-valued pattern x that occurred from information source 1. Let $z(w, x) : C^p \times C^n \to C^m$ be a complex function that provides an estimate of y, where $w \in C^p$ is a parameter that corresponds to all weights and thresholds in neural networks, and $z(w, x)$ corresponds to the actual output pattern of neural networks. Let $r(y', y) : C^m \times C^m \to R_+$ be an error function that represents the error that occurs when we give an estimate $y'$ for the true complex-valued pattern y ($R_+$ denotes the set of nonnegative real numbers). Note that r is a nonnegative real function and not a complex
function. We define an average error R(w) as

$$R(w) \stackrel{\mathrm{def}}{=} \sum_{x} \sum_{y} r(z(w, x), y)\, P(x, y). \tag{10}$$
R(w) corresponds to the error between the actual output pattern and the target output pattern of neural networks, and the smaller R(w) is, the better the estimation.
B. Learning Convergence Theorem

This section presents a learning algorithm for the complex APCM described in Section III.A and proves that it converges. The algorithm is a complex-valued version of the probabilistic-descent method (Amari, 1967). We introduce a parameter n for discrete time. Let $(x_n, y_n)$ be a complex-valued pattern that occurs at time n. Moreover, we assume that the complex-valued parameter w is modified by
$$w_{n+1} = w_n + \Delta w_n, \tag{11}$$

where $w_n$ denotes the complex-valued parameter at time n. Equation (11) can be rewritten as follows:

$$\mathrm{Re}[w_{n+1}] = \mathrm{Re}[w_n] + \mathrm{Re}[\Delta w_n], \tag{12}$$

$$\mathrm{Im}[w_{n+1}] = \mathrm{Im}[w_n] + \mathrm{Im}[\Delta w_n], \tag{13}$$
where Re[z], Im[z] denote the real and imaginary parts of a complex number z, respectively. By definition, a parameter w is optimal if and only if the average error R(w) attains a local or global minimum. Then, the following theorem holds.

Theorem 1. Let A be a positive definite matrix. Then, by using the update rules
$$\mathrm{Re}[\Delta w_n] = -\varepsilon A \nabla^{\mathrm{Re}}\, r(z(w_n, x_n), y_n), \tag{14}$$

$$\mathrm{Im}[\Delta w_n] = -\varepsilon A \nabla^{\mathrm{Im}}\, r(z(w_n, x_n), y_n), \qquad n = 0, 1, \ldots, \tag{15}$$
the complex-valued parameter w approaches the optimum as near as desired by choosing a sufficiently small learning constant ε > 0 ($\nabla^{\mathrm{Re}}$ is the gradient operator with respect to the real part of w, and $\nabla^{\mathrm{Im}}$ with respect to the imaginary part).

Proof. The theory of the APCM (Amari, 1967) is applicable to this case. The differences are that $w \in R^p$, $x \in R^n$, and $y \in R^m$ are real-valued variables in the APCM, whereas $w \in C^p$, $x \in C^n$, and $y \in C^m$ are complex-valued
variables in the complex APCM that influence $z(w, x) : C^p \times C^n \to C^m$, $r(y', y) : C^m \times C^m \to R_+$, and $R(w) : C^p \to R^1$. In one training step of the complex APCM, the real part $\mathrm{Re}[w] \in R^p$ and the imaginary part $\mathrm{Im}[w] \in R^p$ of the complex-valued parameter w are independently changed according to Eqs. (14) and (15). Thus, the manner of changing the parameter w in both models is identical in the sense of updating reals. Hence, there is no need to take into account the change of w from a real-valued variable to a complex-valued variable. Next, $x \in C^n$ and $y \in C^m$ appear in the functions $z(w, x)$ and $r(y', y)$, which are manipulated only in the form of the mathematical expectation with respect to the complex-valued random variables (x, y); for example, $E_{(x,y)}[z(w, x)]$ and $E_{(x,y)}[r(y', y)]$. Generally, a complex-valued random variable can be manipulated in the same manner as a real-valued random variable. Hence, we can manipulate the functions $z(w, x)$ and $r(y', y)$ just as the corresponding real functions in the APCM. Therefore, there is no need to change the logic of the proof in Amari's APCM theory. For reference, we state below the learning convergence theorem of the APCM (Amari, 1967), whose parameters and functions assume real values: $x \in R^n$, $y \in R^m$, $w \in R^p$, $z(w, x) : R^p \times R^n \to R^m$, $r(y', y) : R^m \times R^m \to R_+$, and $R(w) : R^p \to R^1$.

Theorem 2 (Convergence Theorem for the APCM (Amari, 1967)). Let A be a positive definite matrix. Then, by using the update rule
$$\Delta w_n = -\varepsilon A \nabla r(z(w_n, x_n), y_n), \qquad n = 0, 1, \ldots, \tag{16}$$
the real-valued parameter w approaches the optimum as near as desired by choosing a sufficiently small learning constant ε > 0 (∇ is the gradient operator with respect to w). Amari (1967) proved that the performance of the classifier in the APCM depends on the constant ε and the components of the positive definite matrix A. Similarly, it is assumed that the constant ε and the components of A influence the performance of the classifier in the complex APCM (this remains to be rigorously proved).
C. Learning Rule

1. Generalization of the Real-Valued Backpropagation Learning Algorithm

This section applies the theory of the complex APCM to the three-layered complex-valued neural network defined in Section II.B to derive an updating rule of the Complex-BP learning algorithm.
For a sufficiently small learning constant (learning rate) ε > 0 and a unit matrix A, using Theorem 1, it can be shown that the weights and the thresholds should be modified according to the following equations:
$$\Delta v_{nm} = -\varepsilon \frac{\partial E_p}{\partial \mathrm{Re}[v_{nm}]} - i\varepsilon \frac{\partial E_p}{\partial \mathrm{Im}[v_{nm}]}, \tag{17}$$

$$\Delta \gamma_n = -\varepsilon \frac{\partial E_p}{\partial \mathrm{Re}[\gamma_n]} - i\varepsilon \frac{\partial E_p}{\partial \mathrm{Im}[\gamma_n]}, \tag{18}$$

$$\Delta w_{ml} = -\varepsilon \frac{\partial E_p}{\partial \mathrm{Re}[w_{ml}]} - i\varepsilon \frac{\partial E_p}{\partial \mathrm{Im}[w_{ml}]}, \tag{19}$$

$$\Delta \theta_m = -\varepsilon \frac{\partial E_p}{\partial \mathrm{Re}[\theta_m]} - i\varepsilon \frac{\partial E_p}{\partial \mathrm{Im}[\theta_m]}. \tag{20}$$
Equations (17)–(20) can be expressed as:
$$\Delta v_{nm} = \overline{H_m}\, \Delta \gamma_n, \tag{21}$$

$$\Delta \gamma_n = \varepsilon \left( \mathrm{Re}[\delta_n](1 - \mathrm{Re}[O_n])\mathrm{Re}[O_n] + i\, \mathrm{Im}[\delta_n](1 - \mathrm{Im}[O_n])\mathrm{Im}[O_n] \right), \tag{22}$$

$$\Delta w_{ml} = \overline{I_l}\, \Delta \theta_m, \tag{23}$$

$$\Delta \theta_m = \varepsilon \left[ (1 - \mathrm{Re}[H_m])\mathrm{Re}[H_m] \sum_n \left( \mathrm{Re}[\delta_n](1 - \mathrm{Re}[O_n])\mathrm{Re}[O_n]\mathrm{Re}[v_{nm}] + \mathrm{Im}[\delta_n](1 - \mathrm{Im}[O_n])\mathrm{Im}[O_n]\mathrm{Im}[v_{nm}] \right) - i\,(1 - \mathrm{Im}[H_m])\mathrm{Im}[H_m] \sum_n \left( \mathrm{Re}[\delta_n](1 - \mathrm{Re}[O_n])\mathrm{Re}[O_n]\mathrm{Im}[v_{nm}] - \mathrm{Im}[\delta_n](1 - \mathrm{Im}[O_n])\mathrm{Im}[O_n]\mathrm{Re}[v_{nm}] \right) \right], \tag{24}$$
where $\overline{z}$ denotes the complex conjugate of a complex number z. In this connection, the updating rule of the Real-BP is as follows:
$$\Delta v_{nm} = H_m\, \Delta \gamma_n, \tag{25}$$

$$\Delta \gamma_n = \varepsilon (1 - O_n) O_n \delta_n, \tag{26}$$

$$\Delta w_{ml} = I_l\, \Delta \theta_m, \tag{27}$$

$$\Delta \theta_m = (1 - H_m) H_m \sum_n v_{nm}\, \Delta \gamma_n, \tag{28}$$
where $\delta_n$, $I_l$, $H_m$, $O_n$, $\Delta v_{nm}$, $\Delta \gamma_n$, $\Delta w_{ml}$, $\Delta \theta_m$ are all real numbers. Equations (21)–(24) resemble those of the Real-BP.
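For reference, Eqs. (21)–(24) can be transcribed into code essentially line by line. The following Python sketch is our own rendering (the matrix shapes and variable names are assumptions, not from the original text):

```python
import numpy as np

def complex_bp_step(I, T, w, theta, v, gamma, eps=0.5):
    """One Complex-BP update for the three-layer network of Section II.B;
    a direct transcription of Eqs. (21)-(24)."""
    f = lambda u: 1.0 / (1.0 + np.exp(-u))
    fC = lambda z: f(z.real) + 1j * f(z.imag)
    H = fC(w @ I + theta)                       # hidden outputs
    O = fC(v @ H + gamma)                       # network outputs
    delta = T - O                               # delta_n = T_n - O_n
    a = delta.real * (1 - O.real) * O.real      # real derivative factor
    b = delta.imag * (1 - O.imag) * O.imag      # imaginary derivative factor
    d_gamma = eps * (a + 1j * b)                # Eq. (22)
    d_v = np.outer(d_gamma, np.conj(H))         # Eq. (21): conj(H_m) * dgamma_n
    re = a @ v.real + b @ v.imag                # sums over n in Eq. (24)
    im = a @ v.imag - b @ v.real
    d_theta = eps * ((1 - H.real) * H.real * re
                     - 1j * (1 - H.imag) * H.imag * im)   # Eq. (24)
    d_w = np.outer(d_theta, np.conj(I))         # Eq. (23): conj(I_l) * dtheta_m
    return w + d_w, theta + d_theta, v + d_v, gamma + d_gamma
```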
2. Geometry of Learning

This section clarifies the structure of the learning rule of the Complex-BP algorithm, using the three-layered complex-valued neural network defined in Section II.B as an example. Let $\Delta z^R$, $\Delta z^I$ be the real part and the imaginary part of the magnitude of change $\Delta z$ of a learnable parameter z, respectively; that is, $\Delta z^R = \mathrm{Re}[\Delta z]$, $\Delta z^I = \mathrm{Im}[\Delta z]$. Then, the learning rule in Eqs. (21)–(24) can be expressed as:
$$\begin{pmatrix} \Delta v_{nm}^R \\ \Delta v_{nm}^I \end{pmatrix} = \begin{pmatrix} \mathrm{Re}[H_m] & \mathrm{Im}[H_m] \\ -\mathrm{Im}[H_m] & \mathrm{Re}[H_m] \end{pmatrix} \begin{pmatrix} \Delta \gamma_n^R \\ \Delta \gamma_n^I \end{pmatrix} = |H_m| \begin{pmatrix} \cos\beta_m & \sin\beta_m \\ -\sin\beta_m & \cos\beta_m \end{pmatrix} \begin{pmatrix} \Delta \gamma_n^R \\ \Delta \gamma_n^I \end{pmatrix}, \tag{29}$$

$$\Delta v_{nm} = |H_m|\, e^{-i\beta_m}\, \Delta \gamma_n, \tag{30}$$

$$\begin{pmatrix} \Delta \gamma_n^R \\ \Delta \gamma_n^I \end{pmatrix} = \varepsilon \begin{pmatrix} A_n & 0 \\ 0 & B_n \end{pmatrix} \begin{pmatrix} \mathrm{Re}[\delta_n] \\ \mathrm{Im}[\delta_n] \end{pmatrix}, \tag{31}$$

$$\begin{pmatrix} \Delta w_{ml}^R \\ \Delta w_{ml}^I \end{pmatrix} = \begin{pmatrix} \mathrm{Re}[I_l] & \mathrm{Im}[I_l] \\ -\mathrm{Im}[I_l] & \mathrm{Re}[I_l] \end{pmatrix} \begin{pmatrix} \Delta \theta_m^R \\ \Delta \theta_m^I \end{pmatrix} = |I_l| \begin{pmatrix} \cos\phi_l & \sin\phi_l \\ -\sin\phi_l & \cos\phi_l \end{pmatrix} \begin{pmatrix} \Delta \theta_m^R \\ \Delta \theta_m^I \end{pmatrix}, \tag{32}$$

$$\begin{pmatrix} \Delta \theta_m^R \\ \Delta \theta_m^I \end{pmatrix} = \begin{pmatrix} C_m & 0 \\ 0 & D_m \end{pmatrix} \sum_n \begin{pmatrix} \mathrm{Re}[v_{nm}] & \mathrm{Im}[v_{nm}] \\ -\mathrm{Im}[v_{nm}] & \mathrm{Re}[v_{nm}] \end{pmatrix} \begin{pmatrix} \Delta \gamma_n^R \\ \Delta \gamma_n^I \end{pmatrix} = \begin{pmatrix} C_m & 0 \\ 0 & D_m \end{pmatrix} \sum_n |v_{nm}| \begin{pmatrix} \cos\varphi_{nm} & \sin\varphi_{nm} \\ -\sin\varphi_{nm} & \cos\varphi_{nm} \end{pmatrix} \begin{pmatrix} \Delta \gamma_n^R \\ \Delta \gamma_n^I \end{pmatrix}, \tag{33}$$
where An = (1 − Re[On ])Re[On ], Bn = (1 − Im[On ])Im[On ], Cm = (1 − Re[Hm ])Re[Hm ], Dm = (1 − Im[Hm ])Im[Hm ], βm = arctan(Im[Hm ]/Re[Hm ]), φl = arctan(Im[Il ]/Re[Il ]), and ϕnm = arctan(Im[vnm ]/Re[vnm ]).
In Eq. (29), $|H_m|$ is a similarity transformation (reduction, magnification), and $\begin{pmatrix} \cos\beta_m & \sin\beta_m \\ -\sin\beta_m & \cos\beta_m \end{pmatrix}$ is a clockwise rotation by $\beta_m$ radians about the origin. Thus, Eq. (29) performs the linear transformation called 2D motion. Hence, the magnitude of change in the weight between the hidden and output neurons, $(\Delta v_{nm}^R, \Delta v_{nm}^I)$, can be obtained via the above linear transformation (2D motion) of $(\Delta \gamma_n^R, \Delta \gamma_n^I)$, which is the magnitude of change in the threshold of the output neuron (Figure 5). Similarly, the magnitude of change in the threshold of the hidden neuron, $(\Delta \theta_m^R, \Delta \theta_m^I)$, can be obtained by applying the 2D motion concerning $v_{nm}$ (the weight between the hidden and output neurons) to $(\Delta \gamma_n^R, \Delta \gamma_n^I)$, the magnitude of change in the threshold of the output neuron [Eq. (33)]. Finally, $(\Delta w_{ml}^R, \Delta w_{ml}^I)$ can be obtained by applying the 2D motion concerning $I_l$ to $(\Delta \theta_m^R, \Delta \theta_m^I)$ [Eq. (32)]. Thus, the error propagation in the Complex-BP is based on 2D motion. Let us now contrast this with the geometry of the Real-BP. Representing the magnitude of the learnable parameter updates (a real number) as a point on a real line (1D), we can interpret $\Delta v_{nm}$ as the product of $|\Delta \gamma_n|$ and $H_m$ [Eq. (25)] (Figure 6). Similarly, the product of $|\Delta \gamma_n|$ and $v_{nm}$ produces $\Delta \theta_m$
FIGURE 5 An image of the error backpropagation in the Complex-BP: $(\Delta v_{nm}^R, \Delta v_{nm}^I)$ is obtained from $(\Delta \gamma_n^R, \Delta \gamma_n^I)$ by the 2D motion associated with $H_m$. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
FIGURE 6 An image of the error backpropagation in the Real-BP: $\Delta v_{nm}$ is the product of $H_m$ and $\Delta \gamma_n$ on the real line. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
[Eq. (28)], and the product of $|\Delta \theta_m|$ and $I_l$ leads to $\Delta w_{ml}$ [Eq. (27)]. Hence, the error propagation in the Real-BP is based on 1D motion. Therefore, we find that extending the Real-BP to complex numbers changes the structure of the error propagation from one to two dimensions. The 2D structure of the error propagation just described also means that the units of learning in the Complex-BP algorithm are the complex-valued signals flowing through the neural network. For example, both $\Delta v_{nm}^R$ and $\Delta v_{nm}^I$ are functions of the real parts $(\mathrm{Re}[H_m], \mathrm{Re}[O_n])$ and the imaginary parts $(\mathrm{Im}[H_m], \mathrm{Im}[O_n])$ of the complex-valued signals $(H_m, O_n)$ flowing through the neural network [Eq. (29)]. That is, there is a relation between $\Delta v_{nm}^R$ and $\Delta v_{nm}^I$ through $(\mathrm{Re}[H_m], \mathrm{Re}[O_n])$ and $(\mathrm{Im}[H_m], \mathrm{Im}[O_n])$. Similarly, there are relations between $\Delta w_{ml}^R$ and $\Delta w_{ml}^I$ [Eq. (32)] and between $\Delta \theta_m^R$ and $\Delta \theta_m^I$ [Eq. (33)]. Equation (31) indicates no relation between $\Delta \gamma_n^R$ and $\Delta \gamma_n^I$. However, one can be found, since $\mathrm{Re}[O_n]$ is a function of $\mathrm{Re}[H_m]$ and $\mathrm{Im}[H_m]$, because
$$\mathrm{Re}[O_n] = f_R(\mathrm{Re}[S_n]), \tag{34}$$
where
$$\mathrm{Re}[S_n] = \sum_m \left( \mathrm{Re}[v_{nm}]\mathrm{Re}[H_m] - \mathrm{Im}[v_{nm}]\mathrm{Im}[H_m] \right) + \mathrm{Re}[\gamma_n]. \tag{35}$$
Similarly, Im[On ] is also a function of Re[Hm ] and Im[Hm ], because
$$\mathrm{Im}[O_n] = f_R(\mathrm{Im}[S_n]), \tag{36}$$
where
$$\mathrm{Im}[S_n] = \sum_m \left( \mathrm{Re}[v_{nm}]\mathrm{Im}[H_m] + \mathrm{Im}[v_{nm}]\mathrm{Re}[H_m] \right) + \mathrm{Im}[\gamma_n]. \tag{37}$$
Thus, both $\Delta \gamma_n^R$ and $\Delta \gamma_n^I$ are functions of $\mathrm{Re}[H_m]$ and $\mathrm{Im}[H_m]$. Hence, there is also a relation between $\Delta \gamma_n^R$ and $\Delta \gamma_n^I$ through $\mathrm{Re}[H_m]$ and $\mathrm{Im}[H_m]$. Therefore, in the Complex-BP algorithm, both the real part and the imaginary part of the learnable parameters are modified as functions of the real part and the imaginary part of the complex-valued signals (Figure 7). From these facts, we can conclude that the complex-valued signals flowing through the neural network are the unit of learning in the Complex-BP algorithm.
FIGURE 7 Factors that determine the magnitude of change of learnable parameters: both the real and imaginary parts of a signal flowing through the neural network (a complex number) determine both the real and imaginary parts of the magnitude of change of a learnable parameter (a complex number). The starting point of an arrow refers to a determination factor of the endpoint. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
IV. LEARNING SPEED

This section discusses the learning performance of the Complex-BP algorithm presented in Section III.C.
A. Experiments

First, we study the learning speed of the Complex-BP algorithm on a number of examples using complex-valued patterns, and we compare its performance with that of the Real-BP. We investigate the learning speed from a computational complexity perspective (i.e., time and space complexities). Here, time complexity means the number of the four arithmetic operations on real numbers performed per learning cycle, and space complexity is the number of learnable parameters (weights and thresholds), where a complex-valued parameter $w = w^R + iw^I$ is counted as two because it consists of a real part $w^R$ and an imaginary part $w^I$. The average number of learning cycles needed for convergence by the Complex-BP algorithm was compared with that of the Real-BP algorithm. In the comparison, network structures were chosen such that the time complexity per learning cycle of the Complex-BP was almost equal to that of the Real-BP. In addition, the space complexity was also examined. In the experiments, the initial real and imaginary components of the weights and thresholds were chosen to be random numbers between −0.3 and +0.3. The stopping criterion used for learning was
$$\sqrt{\sum_p \sum_{n=1}^{N} \left| T_n^{(p)} - O_n^{(p)} \right|^2} = 0.10, \tag{38}$$
where $T_n^{(p)}, O_n^{(p)} \in C$ denote the desired output value and the actual output value of the neuron n for the pattern p, respectively; the left side of Eq. (38) denotes the error between the desired output pattern and the actual output pattern; and N denotes the number of neurons in the output layer. The presentation of one set of learning patterns to the neural network was regarded as one learning cycle.
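In code, the stopping criterion of Eq. (38) amounts to one line (our own sketch):

```python
import numpy as np

def training_error(targets, outputs):
    """Left-hand side of Eq. (38): root of the squared error summed over
    all patterns p and all output neurons n."""
    return np.sqrt(sum(np.sum(np.abs(T - O) ** 2)
                       for T, O in zip(targets, outputs)))

# Learning stops once training_error(...) reaches 0.10.
```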
1. Experiment 1

First, a set of simple complex-valued learning patterns shown in Table I was used to compare the performance of the Complex-BP algorithm with that of the Real-BP algorithm. We used a 1-3-1 three-layered network for the Complex-BP and a 2-7-2 three-layered network for the Real-BP because their time complexities per learning cycle were almost equal, as shown in Table II. In the experiment with the Real-BP, the real component of a complex number was input into the first input neuron, and the imaginary component was input into the second input neuron. The output from the first output neuron was interpreted as the real component of a complex number;
TABLE I Learning Patterns (Experiment 1)

Input pattern    Output pattern
0                0
i                1
1                1 + i
1 + i            i

i = √−1. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
TABLE II Computational Complexity of the Complex-BP and the Real-BP (Experiment 1)

                  Time complexity              Space complexity
Network           × and ÷   + and −   Sum      Weights   Thresholds   Sum
Complex-BP 1-3-1  78        52        130      12        8            20
Real-BP 2-7-2     90        46        136      28        9            37

Time complexity is the sum of the four operations performed per learning cycle. Space complexity is the sum of the parameters (weights and thresholds), where a complex-valued parameter z = x + iy (where i = √−1) is counted as two because it consists of a real part x and an imaginary part y. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
the output from the second output neuron was interpreted as the imaginary component. The average convergence of 50 trials for each of the 6 learning rates (0.1, 0.2, . . . , 0.6) was used as the evaluation criterion. Although we stopped learning at the 50,000th iteration, all trials succeeded in converging. The result of the experiments is shown in Figure 8.
2. Experiment 2

Next, we conducted an experiment using the set of complex-valued learning patterns shown in Table III. The learning patterns were defined according to the following two rules:

1. The real part of complex number 3 (output) is 1 if complex number 1 (input) is equal to complex number 2 (input); otherwise it is 0.
2. The imaginary part of complex number 3 is 1 if complex number 2 is equal to either 1 or i; otherwise it is 0.
FIGURE 8 Average of learning speed: a comparison between the Complex-BP and the Real-BP (Experiment 1); the vertical axis shows the average number of learning cycles, the horizontal axis the learning rate (0.1–0.6). Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
TABLE III Learning Patterns (Experiment 2)

Complex number 1 (input)   Complex number 2 (input)   Complex number 3 (output)
0                          0                          1
0                          i                          i
i                          i                          1 + i
i                          1                          i
1                          1                          1 + i
i                          0                          0
1 + i                      1 + i                      1
1 + i                      i                          i

i = √−1. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
TABLE IV Computational Complexity of the Complex-BP and the Real-BP (Experiment 2)

                  Time complexity              Space complexity
Network           × and ÷   + and −   Sum      Weights   Thresholds   Sum
Complex-BP 2-4-1  134       92        226      24        10           34
Real-BP 4-9-2     150       76        226      54        11           65

Time complexity is the sum of the four operations performed per learning cycle. Space complexity is the sum of the parameters (weights and thresholds), where a complex-valued parameter z = x + iy (where i = √−1) is counted as two because it consists of a real part x and an imaginary part y. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
The experimental task was the same as in Experiment 1 except for the layered network structure; a 2-4-1 three-layered network was used for the Complex-BP, and a 4-9-2 three-layered network was used for the Real-BP because their time complexities per learning cycle were equal, as shown in Table IV. In the experiment with the Real-BP, the real and imaginary components of complex number 1 and the real and imaginary components of complex number 2 were input into the first, second, third, and fourth input neurons, respectively. The output from the first output neuron was interpreted to be the real component of a complex number; the output from the second output neuron was interpreted to be the imaginary component. We stopped learning at the 100,000th iteration. The results of the experiments are shown in Figure 9. For reference, the rate of convergence is shown in Table V.
FIGURE 9 Average of learning speed: a comparison between the Complex-BP and the Real-BP (Experiment 2); the vertical axis shows the average number of learning cycles, the horizontal axis the learning rate (0.1–0.6). Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
TABLE V Rate of Convergence (Experiment 2)

                  Learning rate
Network           0.1    0.2    0.3    0.4    0.5    0.6
Complex-BP 2-4-1  100    96     88     92     90     98
Real-BP 4-9-2     0      22     64     78     90     100

Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
We can conclude from these experiments that the Complex-BP exhibits the following characteristics in learning complex-valued patterns: its learning speed is several times faster than that of the conventional technique (see Figures 8 and 9), whereas its space complexity (i.e., the number of learnable parameters) is only about half that of the Real-BP (see Tables II and IV).
B. Factors to Improve Learning Speed

This section shows how the structure of the error propagation of the Complex-BP algorithm described in Section III.C.2 improves the learning speed. In the learning rule of the real-valued backpropagation [Eqs. (25)–(28)], $(1 - H_m)H_m \in R$ and $(1 - O_n)O_n \in R$ are the derivative $(1 - f_R(u))f_R(u)$ of the sigmoid function $f_R(u) = 1/(1 + \exp(-u))$, which is the activation function of each neuron. The value of the derivative asymptotically approaches 0 as the absolute value of the net input u to a neuron increases (Figure 10). Hence, as |u| increases to make the output value of a neuron approach exactly 0.0 or 1.0, the derivative $(1 - f_R(u))f_R(u)$ takes a small value, which causes a standstill in learning. This phenomenon is called getting stuck in a local minimum if it continuously takes place for a considerable length of time, and the error between the actual output value and the desired output value remains large. As is generally known, this is the mechanism of standstill in learning in the Real-BP. On the other hand, two types of derivatives of the sigmoid function appear in the learning rule of the Complex-BP algorithm [Eqs. (21)–(24)]: one is the derivative of the real part of an output function ($(1 - \mathrm{Re}[O_n])\mathrm{Re}[O_n]$, $(1 - \mathrm{Re}[H_m])\mathrm{Re}[H_m]$); the other is that of the imaginary part ($(1 - \mathrm{Im}[O_n])\mathrm{Im}[O_n]$, $(1 - \mathrm{Im}[H_m])\mathrm{Im}[H_m]$). The learning rule of the Complex-BP algorithm basically consists of the two linear combinations
(1 – fR (u))fR (u) 0.30 0.25 0.20 0.15 0.10 0.05 0.10
u –10.0 –5.0 0.0 5.0 10.0 FIGURE 10 Derived function of the sigmoid function fR(u). Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
Complex-Valued Neural Network and Complex-Valued Backpropagation
175
of these derivatives:
α1(1 − Re[On])Re[On] + β1(1 − Im[On])Im[On],    (39)

α2(1 − Re[Hm])Re[Hm] + β2(1 − Im[Hm])Im[Hm],    (40)
where αk, βk ∈ R (k = 1, 2). Note that Eq. (39) takes a very small value only when both (1 − Re[On])Re[On] and (1 − Im[On])Im[On] are very small. Hence, Eq. (39) does not take an extremely small value even if (1 − Re[On])Re[On] is very small, because (1 − Im[On])Im[On] is not always small in the Complex-BP algorithm (whereas in the Real-BP algorithm [Eqs. (25)–(28)], the magnitude of the learnable-parameter updates inevitably becomes quite small whenever (1 − On)On ∈ R is quite small). In this sense, the real factor ((1 − Re[On])Re[On], (1 − Re[Hm])Re[Hm]) compensates when the imaginary factor ((1 − Im[On])Im[On], (1 − Im[Hm])Im[Hm]) takes an abnormally small value, and vice versa. Thus, compared with the updating rule of the Real-BP, the Complex-BP reduces the probability of a standstill in learning, which indicates that the learning speed of the Complex-BP is faster than that of the Real-BP. We can assume that this mechanism of reducing standstill in learning through the linear combinations [Eqs. (39) and (40)] of the real and imaginary components of the derivative of an output function caused the faster learning speed of the Complex-BP algorithm observed on the examples with complex-valued patterns described in Section IV.A.
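The following small numerical sketch (ours) illustrates the effect just described: the Real-BP factor (1 − fR(u))fR(u) nearly vanishes once a neuron saturates, whereas the Complex-BP combination of Eq. (39) stays useful as long as at least one of the two parts is unsaturated. The coefficient values α1 = β1 = 1 are illustrative only.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def factor(u):
    """(1 - fR(u)) fR(u): the sigmoid derivative that scales the updates."""
    s = sigmoid(u)
    return (1.0 - s) * s

u_re, u_im = 8.0, 0.5        # real part saturated, imaginary part not
alpha1, beta1 = 1.0, 1.0     # illustrative combination coefficients of Eq. (39)

real_bp = factor(u_re)                                      # ~3.3e-4: learning stalls
complex_bp = alpha1 * factor(u_re) + beta1 * factor(u_im)   # ~0.235: update survives
print(real_bp, complex_bp)
```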
C. Discussion

In Section IV.A we conducted experiments on the learning characteristics using a comparatively small number of learning patterns and comparatively small networks, and we showed the superiority of the Complex-BP algorithm from a computational-complexity perspective. We believe that the Complex-BP algorithm can be increasingly superior to the Real-BP algorithm on larger problems with larger networks, such as massive real-world applications. The experimental results in Section IV.A suggest this: the difference in learning speed between the Complex-BP and the Real-BP in Experiment 2 is larger than that in Experiment 1, and the network size and the number of learning patterns used in Experiment 2 are larger than those used in Experiment 1 (see Figures 8 and 9, Tables I and III). Systematic experiments are needed to confirm this statement.
V. GENERALIZATION ABILITY

This section describes the research results on the usual generalization ability of the Real-BP and Complex-BP networks and the
algorithms. In this connection, the inherent generalization ability of the complex-valued neural network is described in Section VI. The learning patterns and the networks from Experiments 1 and 2 in Section IV.A were also used to test the generalization ability on unseen inputs. The learning constant used in these experiments was 0.5. Experiments 1 and 2 in the following correspond to those in Section IV.A, respectively.
1. Experiment 1

After training a 1-3-1 Complex-BP network with the four training points (see Table I and Figures 11a and b), we presented the 12 test points shown in Figure 11c; the Complex-BP network generated the points shown in Figure 12a. Figure 12b shows the case in which the 2-7-2 Real-BP network was used.
2. Experiment 2

After training with the eight training points shown in Table III and Figures 13a and b, the 2-4-1 Complex-BP network formed the set of points shown in Figure 14a for the eight test points (Figure 13c). The results for the 4-9-2 Real-BP network appear in Figure 14b. Here, we need to know the distances between the input training points and the test points to evaluate the generalization performance of the Real-BP and the Complex-BP. However, Figures 13a and c do not always express the exact distances between the input training points and the test points. To clarify this, for any input training point x = (x1, x2) ∈ C² and test point y = (y1, y2) ∈ C², we define a distance measure as

‖x − y‖² := |x1 − y1|² + |x2 − y2|²
          = (Re[x1 − y1])² + (Im[x1 − y1])² + (Re[x2 − y2])² + (Im[x2 − y2])²    (41)
and show the distances between the input training points and the test points in Table VI using the distance measure [Eq. (41)]. For example, the closest input training point to test point 6 is input training point 5 (Table VI). The above simulation results clearly suggest that the Complex-BP algorithm has the same degree of generalization performance as the Real-BP. The generalization performance also appears not to change appreciably with the network size and the number of learning patterns (see Figures 12 and 14).
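As a direct transcription of the distance measure of Eq. (41), the following sketch (ours) exploits the fact that numpy's abs() on complex numbers returns the modulus; the two points used below are hypothetical and serve only to exercise the formula (the actual training and test points are those of Table III and Figure 13).

```python
import numpy as np

def squared_distance(x, y):
    """||x - y||^2 for x, y in C^2, as defined in Eq. (41)."""
    d = np.asarray(x) - np.asarray(y)
    return float(np.sum(np.abs(d) ** 2))

x = (0.0 + 1.0j, 1.0 + 0.0j)   # hypothetical input training point
y = (1.0 + 1.0j, 0.0 + 1.0j)   # hypothetical test point
print(squared_distance(x, y))  # |-1|^2 + |1 - 1j|^2 = 1 + 2 = 3
```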
FIGURE 11 Learning and test patterns for the comparison of the generalization performance of the Complex-BP and the Real-BP (Experiment 1). A solid circle denotes an input training point, an open circle an output training point, and a solid triangle an input test point. (a) Input training points. (b) Output training points. (c) Input test points. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
FIGURE 12 Result of the comparison of the generalization performance of the Complex-BP and the Real-BP (Experiment 1). An open square denotes an output test point generated by the Complex-BP, and a solid square an output test point generated by the Real-BP. (a) Complex-BP. (b) Real-BP. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
FIGURE 13 Learning and test patterns for the comparison of the generalization performance of the Complex-BP and the Real-BP (Experiment 2). A solid circle denotes an input training point, an open circle an output training point, and a solid triangle an input test point. (a) Input training points. (b) Output training points. (c) Input test points. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
FIGURE 14 Result of the comparison of the generalization performance of the Complex-BP and the Real-BP (Experiment 2). An open square denotes an output test point generated by the Complex-BP, and a solid square an output test point generated by the Real-BP. (a) Complex-BP. (b) Real-BP. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
TABLE VI Distances between the Input Training Points and the Test Points

              Input training point
Test point     1    2    3    4    5    6    7    8
1              1    1    1    2    2    2    2    3
2              2    1    2    2    3    2    1    2
3              1    2    1    1    1    2    3    3
4              3    2    1    1    3    2    1    1
5              2    2    1    2    1    1    1    2
6              2    3    2    2    1    2    3    2
7              3    3    1    2    2    2    2    1
8              3    2    2    1    2    1    2    1

The distance measure ‖x − y‖², defined in Eq. (41), is used for an input training point x ∈ C² and a test point y ∈ C². Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
VI. TRANSFORMING GEOMETRIC FIGURES

This section shows that the Complex-BP can transform geometric figures in a natural way. We used a 1-6-1 three-layered network, which transformed a point (x, y) into a point (x′, y′) in the complex plane. Although the Complex-BP network generates a value z within the range 0 ≤ Re[z], Im[z] ≤ 1, for the sake of convenience, we present it in the figures below as a transformed value within the range −1 ≤ Re[z], Im[z] ≤ 1. We also conducted some experiments with a 2-12-2 network with real-valued weights and thresholds to compare the Complex-BP with the Real-BP. The real component of a complex number was input into the first input neuron, and the imaginary component into the second input neuron. The output from the first output neuron was interpreted as the real component of a complex number, and the output from the second output neuron as the imaginary component. The learning constant used in these experiments was 0.5. The initial real and imaginary components of the weights and thresholds were chosen to be random real numbers between 0 and 1. The experiments described in this section consisted of two parts: a training step, followed by a test step.
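As a concrete reference for the setup just described, here is a minimal sketch (ours, not code from the chapter) of the forward pass of such a 1-6-1 Complex-BP network: complex-valued weights and thresholds, the "split" sigmoid activation applied to the real and imaginary parts separately, random initial components in [0, 1) as in the text, and the affine remapping we use to display outputs within −1 ≤ Re[z], Im[z] ≤ 1. All variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def split_sigmoid(z):
    """Apply the real sigmoid to the real and imaginary parts separately."""
    return sigmoid(z.real) + 1j * sigmoid(z.imag)

# Initial parameters: random real and imaginary components in [0, 1).
w1 = rng.random(6) + 1j * rng.random(6)   # input -> hidden weights
t1 = rng.random(6) + 1j * rng.random(6)   # hidden thresholds
w2 = rng.random(6) + 1j * rng.random(6)   # hidden -> output weights
t2 = rng.random() + 1j * rng.random()     # output threshold

def forward(z):
    hidden = split_sigmoid(w1 * z + t1)
    return split_sigmoid(np.dot(w2, hidden) + t2)

out = forward(0.3 + 0.7j)                 # a point (x, y) fed in as x + iy
print(out, 2 * out - (1 + 1j))            # raw output in [0,1]^2, remapped to [-1,1]^2
```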
A. Examples

1. Simple Transformation

This section presents the results of the experiments on simple transformation. The training input and output pairs were presented 1,000 times in the training step.
Rotation. In the first experiment (using a 1-6-1 network and the Complex-BP), the training step consisted of learning a set of complex-valued weights and thresholds such that the input set of (straight-line) points (indicated by solid circles in Figure 15a) gave as output the (straight-line) points
FIGURE 15 Rotation of a straight line. A solid circle denotes an input training point, an open circle an output training point, a solid triangle an input test point, an open triangle a desired output test point, a solid square an output test point generated by the Real-BP, and an open square an output test point generated by the Complex-BP. (a) Case 1. (b) Case 2. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
(indicated by open circles) rotated counterclockwise over π/2 radians around the origin. These complex-valued weights and thresholds were then used in a second (test) step, in which the input points lying on two straight lines (indicated by solid triangles in Figures 15a and b) would hopefully be mapped to an output set of points lying on the straight lines (indicated by open triangles) rotated counterclockwise over π/2 radians around the origin. The actual output test points for the Complex-BP did, indeed, lie on the straight lines (indicated by open squares). It appears that the complex-valued network has learned to generalize the transformation of each point Zk (= rk exp[iθk]) into Zk exp[iα] (= rk exp[i(θk + α)]); that is, the angle of each complex-valued point is updated by a complex-valued factor exp[iα], while the absolute length of each input point is preserved.

To compare with the performance of a real-valued network, the 2-12-2 (real-valued) network mentioned above was trained using the linear pairs of points, that is, the (input) solid circles and (desired output) open circles of Figure 15. The solid triangle points of Figure 15 were then input to this real-valued network. The outputs were the solid squares. Obviously, the Real-BP did not preserve each input point's absolute length; all points were mapped onto straight lines, as shown in Figure 15.

In the above experiments, the 11 training input points lay on the line y = −x + 1 (0 ≤ x ≤ 1) and the 11 training output points lay on the line y = x + 1 (−1 ≤ x ≤ 0). The 13 test input points lay on the lines y = 0.2 (−0.9 ≤ x ≤ 0.3) (Figure 15a) and y = −x + 0.5 (0 ≤ x ≤ 0.5) (Figure 15b). The desired output test points should lie on the lines x = −0.2 and y = x + 0.5.

Next, we performed an experiment on rotation of the word ISO, which consisted of three characters (Figure 16). The training set of points was as follows: the input set of points lay on the slanted character I (indicated by solid circles in Figure 16a), and the output set of points lay on the vertical (straight) character I (indicated by open circles). The angle between the input points and the output points was π/4 radians. In a test step, we gave the network some points (indicated by solid triangles in Figures 16b and c) on two slanted characters S and O as the test input points. The Complex-BP rotated the slanted characters S and O counterclockwise over π/4 radians around the origin, whereas the Real-BP destroyed them (see Figure 16).
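As a small check of the geometry just described (a sketch of ours, not the network itself), the training pairs of this experiment can be generated by plain complex multiplication: rotating the 11 input points on y = −x + 1 counterclockwise by π/2 radians is multiplication by exp[iπ/2] = i, and the results indeed land on y = x + 1.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 11)
z_in = x + 1j * (-x + 1.0)              # the 11 training inputs on y = -x + 1
z_out = z_in * np.exp(1j * np.pi / 2)   # counterclockwise rotation by pi/2: multiply by i

# The rotated points should lie on y = x + 1 (-1 <= x <= 0), as stated above.
assert np.allclose(z_out.imag, z_out.real + 1.0)
print(np.round(z_out[[0, 5, 10]], 3))   # -1+0j, -0.5+0.5j, 0+1j
```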
Similarity Transformation. We examined a similarity transformation with scaling factor α = 1/2 from one circle x² + y² = 1 to another circle x² + y² = 0.5² (Figure 17a). The training step consisted of learning a set of complex-valued weights and thresholds such that the input set of (straight-line) points (indicated by solid circles in Figure 17a) provided as output the half-scaled straight-line points (indicated by open circles). In a second (test) step, the input points lying on a circle (indicated by solid triangles) would hopefully be mapped to an output set of points (indicated by open
FIGURE 16 Rotation of the word ISO. The circles, triangles, and squares (solid or open) have the same meanings as in Figure 15. (a) Learning pattern I. (b) Test pattern S. (c) Test pattern O. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
FIGURE 17 Similarity transformation. The circles, triangles, and squares (solid or open) have the same meanings as in Figure 15. (a) Reduction of a circle. (b) Reduction of a curved line. (c) Magnification of a square. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
triangles) lying on a half-scaled circle. The actual output test points for the Complex-BP did, indeed, lie on the circle (indicated by open squares). It appears that the complex-valued network has learned to generalize the transformation of each point Zk (= rk exp[iθk]) into αZk (= αrk exp[iθk]); that is, the absolute length of each complex-valued point is shrunk by a real-valued factor α, while the angle of each input point is preserved.

To compare with the performance of a real-valued network, the real-valued network was trained using the linear pairs of points, that is, the (input) solid circles and (desired output) open circles of Figure 17a. The solid triangle points of Figure 17a were then input to this real-valued network. The outputs were the solid squares. Obviously, the Real-BP did not preserve each input point's angle; all points were mapped onto a straight line, as shown in Figure 17a.
We also conducted an additional experiment. Figure 17b shows the responses of the networks to presentation of the points on an arbitrary curved line. The curved line was halved in scale by the Complex-BP while holding its shape, as in the case of the circle, whereas a similar response did not occur in the Real-BP. In these two experiments, the 11 training input points lay on the line y = x (0 ≤ x ≤ 1), and the 11 training output points lay on the line y = x (0 ≤ x ≤ 0.5). In the case of Figure 17a, the 12 test input points lay on the circle x² + y² = 1, and the desired output test points should lie on the circle x² + y² = 0.5².

In addition, we conducted an experiment on the magnification of a square. The 11 training input points (indicated by solid circles in Figure 17c) lay on the line y = x (0 ≤ x ≤ 0.3), and the training output points (indicated by open circles) lay on the straight line y = x (0 ≤ x ≤ 0.99), which can be generated by magnifying the line y = x (0 ≤ x ≤ 0.3) with a magnification factor of 3.3. For a square whose side was 0.3 (indicated by solid triangles), the Complex-BP generated a square whose side was nearly 1.0 (indicated by open squares), whereas the Real-BP generated points (indicated by solid squares) on the straight line y = x.
Parallel Displacement. Figure 18a shows the results of an experiment on parallel displacement of a straight line. The training points used in the experiment were as follows: the input set of (straight-line) points (indicated by solid circles in Figure 18a) yielded as output the straight-line points displaced in parallel (indicated by open circles). The distance of the parallel displacement was 1/√2, and the direction was a −π/4-radian angle.
FIGURE 18 Parallel displacement. The circles, triangles, and squares (solid or open) have the same meanings as in Figure 15. (a) Straight line. (b) Curved line. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
In a test step, the input points lying on a straight line (indicated by solid triangles in Figure 18a) would hopefully be mapped to an output set of points (indicated by open triangles) lying on a straight line displaced in parallel. The actual output test points for the Complex-BP did, indeed, lie on the straight line (indicated by open squares). It appears that the complex-valued network has learned to generalize the transformation of each point Zk into Zk + α, where α is a complex number.

To compare how a real-valued network would perform, the 2-12-2 real-valued network was trained using the linear pairs of points, that is, the (input) solid circles and (desired output) open circles of Figure 18a. The solid triangle points of Figure 18a were then input to this real-valued network. The outputs were the solid squares. Obviously, the Real-BP did not displace them in parallel.

In the above experiments, the 11 training input points lay on the line y = x + 1 (−1 ≤ x ≤ 0), and the 11 training output points lay on the line y = x (−0.5 ≤ x ≤ 0.5). The 11 test input points lay on the straight line y = x (−0.5 ≤ x ≤ 0.5). The desired output test points should lie on the straight line y = x − 1 (0 ≤ x ≤ 1). We also conducted an experiment on parallel displacement of an arbitrary curved line. As shown in Figure 18b, only the Complex-BP moved it in parallel.
2. Complex Transformation

This section shows that the Complex-BP can perform more complicated transformations. The following experiments were conducted under the same conditions as the previous experiments in Section VI.A.1, except for the number of iterations: the training input and output pairs were presented 7,000 times in the training step.
Rotation. First, we conducted an experiment using two rotation angles: π/4 and π/2 radians. Figure 19 shows how the training points mapped onto each other. The (input) points lying along the straight line indicated by solid circles with superscript 1 mapped onto points lying along the straight line indicated by open circles with subscript 1 (denoted as Learning Pattern 1), and the (input) points lying along the straight line indicated by solid circles with subscript 2 mapped onto points lying along the straight line indicated by open circles with subscript 2 (denoted as Learning Pattern 2). In the test step, by presenting the points lying along the six straight lines (indicated by solid triangles with subscripts or superscripts 1–6), the actual output points (indicated by open squares with subscripts or superscripts 1–6) yielded the pattern shown in Figure 19, where a subscript
FIGURE 19 Two rotations: π/4 and π/2 radians. The circles, triangles, and squares (solid or open) have the same meanings as in Figure 15.
or superscript denoted a (test) pattern number; for example, a (test) point with subscript 6 belonged to Test Pattern 6. It appears that this complex-valued network has learned to generalize the rotation factor α as a function of the angle θ in the second and third quadrants: a point Zk (= rk exp[iθk]) is transformed into another point Zk exp[iα(θk)] (= rk exp[i(θk + α(θk))]), where α(θk) ≈ π/2 for θk in the second quadrant, and α(θk) ≈ π/4 for θk in the third quadrant. However, the opposite paradoxically holds true in the first and fourth quadrants. Note that the (input) points (Test Patterns 2 and 5) lying along the borderline of the two input learning patterns were rotated counterclockwise by nearly 3π/8 = (π/2 + π/4)/2 radians about the origin.

In the above experiment, the 11 training input points lay along the line x = 0 (0 ≤ y ≤ 1) and the 11 training output points lay along the line y = 0 (−1 ≤ x ≤ 0) (Learning Pattern 1); the 11 training input points lay along the line x = 0 (−1 ≤ y ≤ 0) and the 11 training output points lay along the line y = −x (0 ≤ x ≤ 1/√2) (Learning Pattern 2). The 66 test input points lay along the six lines y = x (0 ≤ x ≤ 0.5) (Test Pattern 1), y = 0 (0 ≤ x ≤ 1) (Test Pattern 2), y = −x (0 ≤ x ≤ 0.5) (Test Pattern 3), y = x (−0.5 ≤ x ≤ 0) (Test Pattern 4), y = 0 (−1 ≤ x ≤ 0) (Test Pattern 5), and y = −x (−0.5 ≤ x ≤ 0) (Test Pattern 6).
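The angle-dependent mapping described above can be written down directly. The following is an illustrative reconstruction (ours, not the trained network itself), under one reading of the description above: the rotation angle α is a piecewise function of the input argument θ, with π/2 in the second quadrant, π/4 in the third, the averaged 3π/8 on the borderline (the real axis), and the reversed assignment in the first and fourth quadrants.

```python
import numpy as np

def learned_alpha(theta):
    """Piecewise rotation angle suggested by the experiment; theta in (-pi, pi]."""
    if np.isclose(theta, 0.0) or np.isclose(abs(theta), np.pi):
        return 3 * np.pi / 8            # borderline (real axis): the averaged angle
    if theta >= np.pi / 2:              # second quadrant, incl. the learned input line
        return np.pi / 2
    if theta <= -np.pi / 2:             # third quadrant, incl. the learned input line
        return np.pi / 4
    return np.pi / 4 if theta > 0 else np.pi / 2   # first/fourth quadrants: reversed

def transform(z):
    return z * np.exp(1j * learned_alpha(np.angle(z)))

for z in (-0.5 + 0.5j, -0.5 - 0.5j, 1.0 + 0.0j):
    print(z, "->", np.round(transform(z), 3))
```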
FIGURE 20 Two similarity transformations: 0.1 and 0.5 similitude ratios. The circles, triangles, and squares (solid or open) have the same meanings as in Figure 15.
Similarity Transformation. Next, we examined how a complex-valued neural network that has learned two similitude ratios performs. Figure 20 shows how the training points mapped onto each other. The points lying northeast of the borderline mapped onto points along the same line, but with a scale reduction factor of 2. The points lying southwest of the borderline mapped onto points along the same line, but with a scale reduction factor of 10. In the test step, by presenting the points lying on the outer circle (indicated by solid triangles in Figure 20), the actual output points (indicated by open squares) formed the pattern shown in the figure. It appears that this complex-valued network has learned to generalize the reduction factor α as a function of the angle θ: Zk (= rk exp[iθk]) is transformed into α(θk)Zk (= α(θk)rk exp[iθk]), where α(θk) ≈ 0.5 for θk northeast of the borderline, and α(θk) ≈ 0.1 for θk southwest of the borderline. The angle of each input point, however, is preserved.
In the above experiment, the 11 training input points lay along the line y = x (0 ≤ y ≤ 1) and the 11 training output points lay along the line y = x (0 ≤ x ≤ 0.5) (Learning Pattern 1); the 11 training input points lay along the line y = x (−1 ≤ y ≤ 0) and the 11 training output points lay along the line y = x (−0.1 ≤ x ≤ 0) (Learning Pattern 2). The 24 test input points lay on the circle x² + y² = 1.
Parallel Displacement. Finally, we performed an experiment on downward parallel displacement. Figure 21 shows how the training points mapped onto each other. The (input) points lying along the straight line indicated by solid circles with superscript 1 mapped onto points lying along the straight line indicated by open circles with superscript 1, where the distance the network should learn was 0.4 (Learning Pattern 1). The input points lying along the straight line indicated by solid circles with subscript 2 mapped onto points lying along the straight line indicated by open circles with subscript 2, where the distance the network should learn was 0.8 (Learning Pattern 2). In the test step, by presenting the points lying along the two straight lines (indicated by solid triangles with superscripts 1 and 2), the actual output points (indicated by open squares with superscripts 1 and 2) took the pattern shown in Figure 21.
FIGURE 21 Two parallel displacements. The circles, triangles, and squares (solid or open) have the same meanings as in Figure 15.
Here, the superscript denoted a (test) pattern number; for example, a (test) point with superscript 2 belonged to Test Pattern 2. It appears that this complex-valued network has learned to generalize the parallel displacement factor α as a function of the distance d from the real axis: a point Zk is transformed into another point Zk + α(dk), where α(dk) ≈ −0.4i for dk ≈ 1.0, and α(dk) ≈ −0.8i for dk ≈ 0.2.

In the above experiment, the 21 training input points lay along the line y = 1 (−1 ≤ x ≤ 1) and the 21 training output points lay along the line y = 0.6 (−1 ≤ x ≤ 1) (Learning Pattern 1); the 21 training input points lay along the line y = 0.2 (−1 ≤ x ≤ 1) and the 21 training output points lay along the line y = −0.6 (−1 ≤ x ≤ 1) (Learning Pattern 2). The 42 test input points lay along the two lines y = 0.8 (−1 ≤ x ≤ 1) (Test Pattern 1) and y = 0.2 (−1 ≤ x ≤ 1) (Test Pattern 2).
B. Systematic Evaluation

This section reports a systematic investigation of the generalization ability of the Complex-BP algorithm on the transformation of geometric figures. In the experiments (using 1-1-1 and 1-6-1 networks, and the Complex-BP), the training input and output pairs were as follows:

1. Rotation. The input set of (straight-line) points (indicated by solid circles in Figure 22a) gave as output the (straight-line) points (indicated by open circles) rotated counterclockwise over π/2 radians.
2. Similarity transformation. The input set of (straight-line) points (indicated by solid circles in Figure 22b) gave as output the half-scaled straight-line points (indicated by open circles).
3. Parallel displacement. The input set of (straight-line) points (indicated by solid circles in Figure 22c) gave as output the straight-line points displaced in parallel (indicated by open circles), where the distance of the parallel displacement was 1/√2 and the direction was a −π/4-radian angle.

Note that the purpose of using a 1-1-1 network is to investigate the degree of approximation of the results of the mathematical analysis presented in Section VI.C. For each of the above three cases, the input (test) points lying on a circle (indicated by solid triangles in Figure 22) were presented in a second (test) step. We then evaluated the generalization ability of the Complex-BP on a rotation angle, a similitude ratio, and a parallel displacement vector. The evaluation results are shown in Figure 23. The vertical axis of Figure 23 denotes the generalization performance, and the horizontal axis the difference φ between the argument of a test point and
FIGURE 22 Learning and test patterns for the systematic investigation of the generalization ability of the Complex-BP. The circles and triangles (solid or open) have the same meanings as in Figure 15. (a) Rotation. (b) Similarity transformation. (c) Parallel displacement. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
that of an input training point. Error in Figure 23 refers to the left side of Eq. (38), that is, the error between the desired output patterns and the actual output patterns in the training step. As evaluation criteria of the generalization performance, we used |E^R(φ)|, |E^S(φ)|, and |E^P(φ)|, which denote the Euclidean distances between the actual output test point and the expected output test point (see Figure 23). As shown in Figure 23, the generalization error increased as the distance between the test point and the input training point became larger (i.e., as φ became larger), and it took its maximum value around the point that yielded the largest distance (φ ≈ 180 degrees). Furthermore, it decreased again as the test point approached the input training point. Figure 23 also suggests
FIGURE 23 Results of the evaluation of the performance of the generalization ability of the Complex-BP. |E^R(φ)|, |E^S(φ)|, and |E^P(φ)| denote the Euclidean distances between the actual output test point and the expected output test point in the test step; φ denotes the difference between the argument of a test point and that of an input training point. Error refers to the left side of Eq. (38), i.e., the error between the desired output patterns and the actual output patterns in the training step. (a) Rotation, 1-1-1 network. (b) Similarity transformation, 1-1-1 network. (c) Parallel displacement, 1-1-1 network. (d) Rotation, 1-6-1 network. (e) Similarity transformation, 1-6-1 network. (f) Parallel displacement, 1-6-1 network. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
that the generalization error on the transformation of geometric figures decreases as the number of hidden neurons increases; only one hidden neuron was used in the three experiments of Figures 23a–c, and six hidden neurons were used in the three experiments of Figures 23d–f. In the above experiments, the 11 training input points lay on the line x = 0 (0 ≤ y ≤ 1) and the 11 training output points lay on the line y = 0 (−1 ≤ x ≤ 0) for the rotation; the 11 training input points lay on the line x = 0 (0 ≤ y ≤ 1) and the 11 training output points lay on the line x = 0 (0 ≤ y ≤ 0.5) for the similarity transformation; and the 11 training input points lay on the line x = 0 (0 ≤ y ≤ 1) and the 11 training output points lay on the line x = 0.5 (−0.5 ≤ y ≤ 0.5) for the parallel displacement. The eight test input points lay on the circle x² + y² = 0.5² for all three cases.
C. Mathematical Analysis

This section presents a mathematical analysis of the behavior of a complex-valued neural network that has learned the concept of rotation, similarity transformation, or parallel displacement using the Complex-BP algorithm. We introduce a simple 1-1-1 three-layered complex-valued network for the analysis. We use v exp[iw] ∈ C for the weight between the input and hidden neurons, c exp[id] ∈ C for the weight between the hidden and output neurons, s exp[it] ∈ C for the threshold of the hidden neuron, and r exp[il] ∈ C for the threshold of the output neuron. Let v0 exp[iw0], c0 exp[id0], s0 exp[it0], and r0 exp[il0] denote the learned values of these parameters. We define the following constants in advance:
K = 1 / ((1 + √2)c0 + 2r0),   G = kac0v0 / (2(kav0 + s0)),   A = c0s0 / (2(kav0 + s0)),

B = c0/√2,   C = r0,

HR = A cos(t0 + d0) + B cos(d0 + π/4) + C cos(l0),

HI = A sin(t0 + d0) + B sin(d0 + π/4) + C sin(l0),

M = 2K √(HR² + HI²),

M′ = 2 √( K²[A² + B² + C² + 2AB cos(t0 − π/4) + 2AC cos(t0 + d0 − l0) + 2BC cos(d0 + π/4 − l0)] + τ²/4 − τK[A cos(t0 + d0 − ω) + B cos(d0 + π/4 − ω) + C cos(l0 − ω)] ).    (42)
FIGURE 24 Learning and test patterns used in the mathematical analysis of the behavior of a Complex-BP network that has learned the counterclockwise rotation of the points in the complex plane by α radians around the origin. The circles, triangles, and squares (solid or open) have the same meanings as in Figure 15. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
First, we investigate the behavior of the Complex-BP network that has learned rotation. Learning patterns are as follows: p points with equal intervals on a straight line that forms an angle of x radians with the real axis are rotated counterclockwise over α radians in the complex plane (Figure 24). That is, there are p training points such that for any 1 ≤ k ≤ p, an input point can be expressed as
ka exp[ix],    (43)

and the corresponding output point as

(1/2)ka exp[i(x + α)] + (1/√2) exp[iπ/4],    (44)
where k, p ∈ N (N denotes the set of natural numbers), a, x, α ∈ R+, and 0 < pa ≤ 1, which limits the values of learning patterns to the range from −1 to 1 in the complex plane; the constant a denotes the interval
between points. Note that although the output points take a value z within the range −1 ≤ Re[z], Im[z] ≤ 1, we transformed them as having a value z within the range 0 ≤ Re[z], Im[z] ≤ 1, because the Complex-BP network generates a value z within the range 0 ≤ Re[z], Im[z] ≤ 1. For this reason, Eq. (44) appears somewhat complicated. The following theorem explains the qualitative properties of the generalization ability of the Complex-BP on a rotation angle. Theorem 3. Fix 1 ≤ k ≤ p arbitrarily. To the Complex-BP network that has learned the training points [Eqs. (43) and (44)], a test point ka exp[i(x + φ)] is given, which can be obtained by a counterclockwise rotation of an input training point ka exp[ix] by arbitrary φ radians around the origin (Figure 24). Then, the network generates the following value:
(1/2)ka exp[i(x + φ + α)] + (1/√2) exp[iπ/4] + E^R(φ) ∈ C.    (45)
The first term of Eq. (45) refers to the point that can be obtained by the counterclockwise rotation of the test point ka exp[i(x + φ)] by α radians around the origin (Figure 24). Note that α is the angle that the network has learned. The second term E^R(φ) is a complex number that denotes the error, and its absolute value, called the generalization error on angle, is given by the following expression:
|E^R(φ)| = M |sin(φ/2)|,    (46)
where M is a constant. A technical result is needed to prove Theorem 3: Lemma 1. For any 1 ≤ k ≤ p, the following approximate equations hold:
K[G cos(x + w0 + d0) + HR] + 1/2 = (1/2)ka cos(x + α) + 1/2,    (47)

K[G sin(x + w0 + d0) + HI] + 1/2 = (1/2)ka sin(x + α) + 1/2.    (48)
Proof. For any 1 ≤ k ≤ p, by computing the output value of the Complex-BP network for the input training point ka exp[ix], we find that the real part of the output value is equal to the left side of Eq. (47) and the imaginary part is equal to the left side of Eq. (48). In the above computations, the sigmoid function in the activation function [Eq. (2)] of each neuron was
approximated by the following piecewise linear functions:
g(x) = x / (2(kav0 + s0)) + 1/2    if −(kav0 + s0) ≤ x ≤ kav0 + s0,
g(x) = 1                           if kav0 + s0 < x,
g(x) = 0                           if x < −(kav0 + s0)    (49)
for the hidden neuron, and
h(x) = x / ((1 + √2)c0 + 2r0) + 1/2    if −(((1 + √2)/2)c0 + r0) ≤ x ≤ ((1 + √2)/2)c0 + r0,
h(x) = 1                               if ((1 + √2)/2)c0 + r0 < x,
h(x) = 0                               if x < −(((1 + √2)/2)c0 + r0)    (50)

for the output neuron. Conversely, the real part and the imaginary part of the output value of the complex-valued neural network for the input training point should be equal to the real part and the imaginary part of the output training point (1/2)ka exp[i(x + α)] + (1/√2) exp[i(π/4)], respectively. This concludes the proof of Lemma 1.

Proof of Theorem 3. Theorem 3 is proved according to the following plan: using Equations (47) and (48) in Lemma 1, we compute the output value of the Complex-BP network for the test point ka exp[i(x + φ)], and we decompose it into [the point ka exp[i(x + φ + α)] generated by the counterclockwise rotation of the test point over α radians] + [the error]. First, we compute the real part of the output value when the test point ka exp[i(x + φ)] is fed into the Complex-BP network. Using the equation
cos θ − λ sin θ = √(1 + λ²) cos(θ + φ)    (51)
for any θ, where λ = tan φ, and by computing [ Eq. (47) ] − λ·[ Eq. (48)], we derive
K[G cos(x + φ + w0 + d0) + HR] + 1/2 = (1/2)ka cos(x + φ + α) + 1/2 + E^R_re(φ),    (52)

where

E^R_re(φ) = 2K sin(φ/2) · [A sin(t0 + d0 + φ/2) + B sin(d0 + π/4 + φ/2) + C sin(l0 + φ/2)].    (53)
Note that the left side of Eq. (52) refers to the real part of the output value of the Complex-BP network for the test point, and the first term of the right side of Eq. (52) refers to the real part of the point ka exp[i(x + φ + α)] generated by the counterclockwise rotation of the test point over α radians. Finally, E^R_re(φ) refers to the real part of a complex number that denotes the error. Similarly, using the equation
λ cos θ + sin θ = √(1 + λ²) sin(θ + φ)    (54)
for any θ, and by computing λ·[ Eq. (47) ] + [ Eq. (48) ], we derive
K[G sin(x + φ + w0 + d0) + HI] + 1/2 = (1/2)ka sin(x + φ + α) + 1/2 + E^R_im(φ),    (55)

where

E^R_im(φ) = −2K sin(φ/2) · [A cos(t0 + d0 + φ/2) + B cos(d0 + π/4 + φ/2) + C cos(l0 + φ/2)].    (56)
Note that the left side of Eq. (55) refers to the imaginary part of the output value of the Complex-BP network for the test point, and the first term of the right side of Eq. (55) refers to the imaginary part of the point ka exp[i(x + φ + α)] generated by the counterclockwise rotation of the test point over α radians. Finally, E^R_im(φ) refers to the imaginary part of a complex number that denotes the error. Hence, it follows from Equations (52) and (55) that the output value of the Complex-BP network for the test point can be expressed as
(1/2)ka exp[i(x + φ + α)] + (1/√2) exp[iπ/4] + E^R(φ),    (57)
where

E^R(φ) := E^R_re(φ) + iE^R_im(φ);    (58)
that is, the Complex-BP network rotates the test point ka exp[i(x + φ)] counterclockwise over α radians with the error E^R(φ), and Eq. (46) follows from Equations (53) and (56).
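To make the approximation used in the proof concrete, here is a small sketch (ours) of the piecewise-linear activations of Eqs. (49) and (50), together with the resulting error law |E^R(φ)| = M|sin(φ/2)| of Eq. (46). The parameter values are arbitrary placeholders, not learned values; M = 0.19 is the theoretical rotation value quoted later in Table VII.

```python
import numpy as np

def g(x, k, a, v0, s0):
    """Hidden-neuron approximation, Eq. (49): linear on [-(k*a*v0 + s0), k*a*v0 + s0]."""
    bound = k * a * v0 + s0
    return np.clip(x / (2.0 * bound) + 0.5, 0.0, 1.0)

def h(x, c0, r0):
    """Output-neuron approximation, Eq. (50); clipping to [0, 1] coincides with the
    breakpoints +/-((1 + sqrt(2))/2 * c0 + r0) of the piecewise definition."""
    K = 1.0 / ((1.0 + np.sqrt(2.0)) * c0 + 2.0 * r0)
    return np.clip(K * x + 0.5, 0.0, 1.0)

# Both approximations pass through 0.5 at x = 0, like the sigmoid they replace:
print(g(0.0, 1, 0.1, 1.0, 0.5), h(0.0, 1.0, 0.5))

# Predicted generalization error on angle, Eq. (46): zero at phi = 0, maximal (= M)
# at phi = pi, and decreasing again toward phi = 2*pi.
phi = np.linspace(0.0, 2.0 * np.pi, 9)
M = 0.19   # the theoretical value for rotation quoted later in Table VII
print(np.round(M * np.abs(np.sin(phi / 2.0)), 3))
```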
The above theorem tells us that the generalization error on angle |E^R(φ)| increases as the distance between the test point and the input training point increases (i.e., as φ becomes larger), and it takes the maximum value M at the point that gives the largest distance (φ = 180 degrees). Furthermore, it decreases as the test point becomes closer to the input training point.

Remark. The value of M differs with each learning run because M depends on the values of the learnable parameters after learning; that is, the value of M is a constant in the world after learning, in which the learnable parameters are fixed. Therefore, M is a constant, not a function of α, in Theorem 3, where a situation after one learning run with a fixed value of α is assumed.

Next, we explain the behavior of the Complex-BP network that has learned a similarity transformation. We use the following learning patterns: p points with equal intervals a on a straight line that forms an angle of x radians with the real axis are transformed into the points obtained by the similarity transformation with the similitude ratio β in the complex plane (Figure 25). That is, there are p training points; for any 1 ≤ k ≤ p, an input point can be expressed as
ka exp[ix],    (59)

and the corresponding output point as

(1/2)kaβ exp[ix] + (1/√2) exp[iπ/4],    (60)
where k, p ∈ N, a, x, β ∈ R+ , 0 < paβ ≤ 1, which limits the values of learning patterns to the range from −1 to 1 in the complex plane. Note that although the output points take a value z within the range −1 ≤ Re[z], Im[z] ≤ 1, we transformed them as having a value z within the range 0 ≤ Re[z], Im[z] ≤ 1 as in the case of rotation. The following theorem shows the qualitative property of the generalization ability of the Complex-BP on a similitude ratio. Theorem 4. Fix 1 ≤ k ≤ p arbitrarily. To the Complex-BP network that has learned the training points [Equations (59) and (60)], a test point ka exp[i(x + φ)] is given that can be obtained by a counterclockwise rotation of an input training point ka exp[ix] by arbitrary φ radians around the origin (Figure 25). Then, the network generates the following value:
(1/2)kaβ exp[i(x + φ)] + (1/√2) exp[iπ/4] + E^S(φ) ∈ C.    (61)
FIGURE 25 Learning and test patterns used in the mathematical analysis of the behavior of a Complex-BP network that has learned the similarity transformation with the similitude ratio β in the complex plane. The circles, triangles, and squares (solid or open) have the same meanings as in Figure 15. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
The first term of Eq. (61) refers to the point that can be obtained by the similarity transformation of the test point ka exp[i(x + φ)] on the distance from the origin with the similitude ratio β in the complex plane (Figure 25). Note that β is the similitude ratio that the network has learned. The second term E^S(φ) is a complex number that denotes the error, and its absolute value, called the generalization error on similitude ratio, is given by the following expression:
|E^S(φ)| = M |sin(φ/2)|.    (62)
Remark. For the same reason as in Theorem 3, M is a constant, not a function of β, in Theorem 4, where a situation after one learning run with a fixed value of β is assumed. The proof is omitted because it can be done in the same way as that of Theorem 3. We derive from Theorem 4 that the generalization error on similitude ratio |E^S(φ)| increases as the distance between the test point and the input training point increases (i.e., as φ becomes larger), and it takes the
maximum value M at the point that gives the largest distance (φ = 180 degrees). Furthermore, it decreases as the test point approaches the input training point.

Finally, we show the behavior of the Complex-BP network that has learned parallel displacement. We use the following learning patterns: p points with equal intervals a on a straight line that forms an angle of x radians with the real axis are transformed into the points that can be obtained by the parallel displacement with a complex number γ = τ exp[iω] (called the parallel displacement vector here) determining the direction and distance in the complex plane (Figure 26). That is, there are p training points; for any 1 ≤ k ≤ p, an input point can be expressed as
ka exp[ix],    (63)

and the corresponding output point as

(1/2)(ka exp[ix] + γ) + (1/√2) exp[iπ/4],    (64)
FIGURE 26 Learning and test patterns used in the mathematical analysis of the behavior of a Complex-BP network that has learned the parallel displacement of the points with the parallel displacement vector γ in the complex plane. The circles, triangles, and squares (solid or open) have the same meanings as in Figure 15. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
where k, p ∈ N, a, x ∈ R+ , γ ∈ C and
−1 ≤ Re[pa exp[ix] + γ], Im[pa exp[ix] + γ] ≤ 1,    (65)
which limits the values of learning patterns to the range from −1 to 1 in the complex plane. Note that although the output points take a value z within the range −1 ≤ Re[z], Im[z] ≤ 1, we transformed them as having a value z within the range 0 ≤ Re[z], Im[z] ≤ 1 as in the previous cases. We can obtain the following theorem that clarifies the qualitative property of the generalization ability of the Complex-BP on a parallel displacement vector. Theorem 5. Fix 1 ≤ k ≤ p arbitrarily. To the Complex-BP network that has learned the training points [Equations (63) and (64)], a test point ka exp[i(x + φ)] is given that can be obtained by a counterclockwise rotation of an input training point ka exp[ix] by arbitrary φ radians around the origin (Figure 26). Then, the network generates the following value:
(1/2)(ka exp[i(x + φ)] + γ) + (1/√2) exp[iπ/4] + E^P(φ) ∈ C.    (66)
The first term of Eq. (66) refers to the point that can be obtained by the parallel displacement of the test point ka exp[i(x + φ)] with the parallel displacement vector γ (Figure 26). Note that γ is the parallel displacement vector that the network has learned. The second term E^P(φ) is a complex number that denotes the error, and its absolute value, called the generalization error on parallel displacement vector, is given by the following expression:
|E^P(φ)| = M′ |sin(φ/2)|.    (67)
Remark. For the same reason as in Theorem 3, M′ is a constant, not a function of γ, in Theorem 5, where a situation after one learning run with a fixed value of γ is assumed. We can obtain this theorem in the same manner as Theorem 3; therefore, the proof is omitted. Theorem 5 indicates that the generalization error on parallel displacement vector |E^P(φ)| increases as the distance between the test point and the input training point becomes larger (i.e., as φ becomes larger), and it takes the maximum value M′ at the point that gives the largest distance (φ = 180 degrees). Furthermore, it decreases as the test point approaches the input training point.
D. Discussion

This section discusses the experimental and mathematical results described in Sections VI.A, VI.B, and VI.C on the ability of the Complex-BP algorithm to generalize transformations. As seen in Section VI.A, 1-n-1 Complex-BP networks have the ability to generalize the transformation of geometric figures. This brings us to a second point: do the 1-n-1 Complex-BP networks also have the usual generalization ability that the Real-BP networks have? At first glance, the Real-BP and Complex-BP algorithms appear to have two different generalization abilities. To determine whether this is true, we tested the generalization ability of the 1-n-1 Complex-BP network for a continuous mapping task that 2-m-2 Real-BP networks could solve, which appeared in Tsutsumi (1989). In the experiments, the set of 25 training points shown in Figure 27 was used for a 1-6-1 Complex-BP network and a 2-12-2 Real-BP network. After sufficient training, by presenting the 252 test points on the 12 dotted lines shown in Figure 28, the actual output points formed the solid lines shown in Figures 28a and b. Figure 29 shows the case in which the input training points are those of Figure 27b and the target training points are those of Figure 27a. Figures 28 and 29 suggest that both the 1-6-1 Complex-BP network and the 2-12-2 Real-BP network obtain the same degree of generalization.

We next discuss some examples of transformation described in Section VI.A.1. The counterclockwise rotation of a point (x, y) in the complex plane by θ radians around the origin corresponds to multiplying the complex number z1 = x + iy by the complex number z2 = exp[iθ], which has a radius of 1 and an argument of θ radians. That is, z1z2 denotes the point generated by the counterclockwise rotation of the point (x, y) by θ radians around the origin. Furthermore, the similarity transformations (reduction and magnification) and the parallel displacement of a point (x, y) in the complex plane correspond, respectively, to (1) multiplying the complex number z1 = x + iy by a real number α and (2) adding a complex number w to z1 = x + iy. We therefore believe that the complex-valued neural network has learned the complex function g(z) = z exp[iθ], g(z) = αz, or g(z) = z + w (for rotation, similarity transformation, and translation, respectively) in the experiments of Section VI.A.1. For example, θ = π/2 in Figure 15 (rotation), α = 0.5 in Figure 17a (reduction), and w = 0.5 − 0.5i in Figure 18a (parallel displacement); a short sketch of these three maps is given at the end of this section.

It should be noted that the neural network has learned nothing but some points on a certain straight line in the domain; nevertheless, the domain of the complex function g is [−1, 1] × [−1, 1]. The neural network was presented with a sequence of some points in the domain, and its responses to the presentation of all points in the domain were nearly the values of the learned complex function g. This behavior of complex-valued neural networks closely resembles the identity theorem in complex analysis, which we now state.
FIGURE 27 Learning pattern for the comparison of the usual generalization performance of the 1-n-1 Complex-BP network and the 2-m-2 Real-BP network. The solid circles and the open circles denote the training points. (a) Learning pattern #1. (b) Learning pattern #2. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
FIGURE 28 Results of the comparison of the usual generalization performance of the 1-n-1 Complex-BP network and the 2-m-2 Real-BP network (Input: Learning pattern #1 in Figure 27a; Target: Learning pattern #2 in Figure 27b). The 12 dotted lines denote the input test pattern, and the solid lines the output test pattern. (a) Network output by the Complex-BP. (b) Network output by the Real-BP. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
FIGURE 29 Results of the comparison of the usual generalization performance of the 1-n-1 Complex-BP network and the 2-m-2 Real-BP network (Input: Learning pattern #2 in Figure 27b; Target: Learning pattern #1 in Figure 27a). The 12 dotted lines denote the input test pattern, and the solid lines the output test pattern. (a) Network output by the Complex-BP. (b) Network output by the Real-BP. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
Theorem 6 (The Identity Theorem). Let F and G be regular functions over the complex domain D. If F(z) = G(z) on a given line in D, then F(z) = G(z) over D identically.

This theorem is indicative of a phenomenon found in complex analysis that does not exist in real analysis. We interpret the behavior of the Complex-BP, as shown in Section VI.A.1, in terms of the identity theorem. We assume that the training points are obtained from the points on a given (straight) line in the domain of the true complex function F : [−1, 1] × [−1, 1] → [−1, 1] × [−1, 1]. The neural network approximates F on the basis of the given training points, resulting in a complex function G (where G(z) should equal F(z), at least on the training points). Then, for all z in the complex domain, the neural network generates the point G(z) that is close to F(z), as though it had satisfied the identity theorem. We believe that Complex-BP networks satisfy the identity theorem in this sense; that is, Complex-BP networks can approximate complex functions by being trained over only a part of the domain of the complex functions.

Furthermore, as seen in Section VI.C, the generalization error of the Complex-BP on the transformation of geometric figures can be represented by the sine of the difference between the argument of the test point and that of the input training point. These mathematical results agree qualitatively with the simulation results in Section VI.B, which state that the generalization error increases as the distance between the test point and the input training point increases, takes its maximum value M around the point that gives the largest distance, and decreases again as the test point approaches the input training point. Here we compared the theoretical values and the experimental values of M based on the simulation results (using 1-1-1 networks) in Section VI.B. Table VII shows that there are some errors between the theoretical values and the experimental values of M. The presumed cause of these errors is that Theorems 3, 4, and 5 are approximations (i.e., the sigmoid function in the output function of each neuron was approximated by a piecewise linear function).

TABLE VII Comparison of the Theoretical Values and the Experimental Values of M (and M′)

Type of transformation        Theoretical value    Experimental value
Rotation                      0.19                 0.35
Similarity transformation     0.02                 0.03
Parallel displacement         0.49                 0.27

Learning was stopped when the error between the desired output pattern and the actual output pattern was equal to 0.06 in the case of rotation, and 0.02 in the cases of similarity transformation and parallel displacement. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
Thus, we can conclude that although there are approximation errors, Theorems 3, 4, and 5 clarify the qualitative properties of the generalization ability of the Complex-BP on the transformation of geometric figures.

The mathematical analysis in Section VI.C is restricted to a class of transformations of geometric figures such that the training points lie on a line starting from the origin and the test points are obtained by an arbitrary counterclockwise rotation of the input training points. Thus, Theorems 3, 4, and 5 are applicable only to Figures 16 and 17 of the simulation results (Figures 15–18) presented in Section VI.A.1 (note that the training points in Figure 16 are not precise). Judging from the method of the analysis, it seems reasonable to suppose that similar theorems, not only for Figures 15 and 18 but also for more general networks such as 1-n-1 and 1-n1-n2-···-nk-1 networks, can be proved using the approach of Theorems 3, 4, and 5. However, the following problems may occur: (1) unlike in Theorems 3, 4, and 5, the generalization error on the transformation of geometric figures cannot be simply represented by the sine; (2) the approximation errors increase as the number of neurons increases, because the sigmoid function in the output function of each neuron is approximated by piecewise linear functions. Another approach is needed to solve these problems.

The Complex-BP algorithm can transform geometric figures in a way that the Real-BP cannot; this is its inherent property. On the other hand, we have experimentally clarified in Section V and at the beginning of this section that the Complex-BP algorithm has the same degree of usual generalization performance as the Real-BP. The question now arises: if the Complex-BP has an inherent generalization ability (such as the ability to transform geometric figures), why does the Complex-BP have the same degree of usual generalization performance as the Real-BP? Our answer to this question is given below. The learning patterns used in the experiments on transforming geometric figures (in Sections VI.A.1 and VI.B) were all very specific, meaning that they used some 10 learning patterns with narrow intervals (i.e., high density) massed on part of the plane (see Figures 15–18 and 22). However, the learning patterns used in the experiments on the learning speed in Section IV and on the usual generalization ability in Section V and at the beginning of this section all had only low specificity, as detailed below:

1. Only four learning patterns with wide intervals (i.e., low density) were scattered on the plane in Experiment 1 of Sections IV and V; only a few learning patterns were used, and the distances between the learning patterns were large (see Table I and Figures 11a and b).
2. The learning patterns were evenly distributed on the plane, although many learning patterns with high density were used in the experiment of this section (see Figure 27).
In addition, we cannot draw the same comparison for Experiment 2 of Section V, because the 2-4-1 Complex-BP network used there was substantially different from the 1-n-1 Complex-BP network used for transforming geometric figures; it is fairly certain that the inherent generalization ability to transform geometric figures is unique to the 1-n-1 network structure. Thus, we believe that the structures of the network and of the learning patterns caused the experimental result that the Complex-BP has the same degree of usual generalization performance as the Real-BP. In other words, the use of very specific learning patterns in the particular 1-n-1 network made the inherent ability to generalize emerge.

There have been two applications of the ability of the Complex-BP network to transform geometric figures. One concerns optical flows, and the other relates to fractal images. Watanabe et al. (1994) applied the Complex-BP network in the computer vision field; they successfully used the ability to transform geometric figures to complement the optical flow (the 2D velocity vector field on an image). The ability to transform geometric figures can also be used to generate fractal images. Miura and Aiyoshi (2003) applied the Complex-BP to the generation of fractal images and showed in computer simulations that some fractal images (such as the snow crystal) could be obtained with high accuracy, where the iterated function systems (IFS) were constructed using the ability of the Complex-BP to transform geometric figures.

The search for other tasks in which the behavior of the Complex-BP network is dramatically different from that of the Real-BP network (for example, what the Real-BP can do that the Complex-BP cannot) is a future research topic, which will greatly expand the complex-valued neural network world. It should be noted that it cannot always be said that other inherent abilities of the Complex-BP algorithm, which may be discovered in the future, are superior to those of the Real-BP, because the superiority of the Complex-BP as compared to the Real-BP depends on the specific problems and the manner in which the algorithms are applied.
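As promised in the discussion above, here is a short sketch (ours, not code from the chapter) of the three complex functions that the 1-n-1 networks appear to learn, with the parameter values quoted for Figures 15, 17a, and 18a; it also makes explicit the quantities each map preserves.

```python
import numpy as np

rotate = lambda z: z * np.exp(1j * np.pi / 2)    # g(z) = z*exp[i*theta], theta = pi/2 (Figure 15)
scale = lambda z: 0.5 * z                        # g(z) = alpha*z, alpha = 0.5 (Figure 17a)
translate = lambda z: z + (0.5 - 0.5j)           # g(z) = z + w, w = 0.5 - 0.5i (Figure 18a)

z = 0.5 + 0.2j   # any point of the domain [-1, 1] x [-1, 1]
for name, g in [("rotate", rotate), ("scale", scale), ("translate", translate)]:
    print(name, np.round(g(z), 3))

# Rotation preserves the modulus |z|, and scaling preserves the argument arg(z):
# exactly the properties the Complex-BP preserved and the Real-BP destroyed.
print(abs(z), abs(rotate(z)), np.angle(z), np.angle(scale(z)))
```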
VII. ORTHOGONALITY OF DECISION BOUNDARIES IN THE COMPLEX-VALUED NEURON

A decision boundary is the boundary by which a pattern classifier (such as the Real-BP) classifies input patterns; it generally consists of hypersurfaces. The decision boundaries of real-valued neural networks have been examined empirically by Lippmann (1987). This section mathematically analyzes
the decision boundaries of the complex-valued neuron and presents their utility.
A. Mathematical Analysis

We analyze the decision boundary of a single complex-valued neuron. Let the weights be denoted by $\mathbf{w} = {}^t[w_1 \cdots w_n] = \mathbf{w}^r + i\mathbf{w}^i$, where $\mathbf{w}^r = {}^t[w_1^r \cdots w_n^r]$ and $\mathbf{w}^i = {}^t[w_1^i \cdots w_n^i]$, and let the threshold be denoted by $\theta = \theta^r + i\theta^i$. Then, for $n$ input signals (complex numbers) $\mathbf{z} = {}^t[z_1 \cdots z_n] = \mathbf{x} + i\mathbf{y}$, $\mathbf{x} = {}^t[x_1 \cdots x_n]$, $\mathbf{y} = {}^t[y_1 \cdots y_n]$, the complex-valued neuron generates
" ! ! x X + iY = fR [t wr − t wi ] + θ r + i fR [t wi y
t
wr ]
x + θi y
"
(68) as an output. Here, for any two constants CR , CI ∈ (0, 1), let
$$X(\mathbf{x}, \mathbf{y}) = f_R\!\left([\,{}^t\mathbf{w}^r \;\; -{}^t\mathbf{w}^i\,]\begin{bmatrix}\mathbf{x}\\ \mathbf{y}\end{bmatrix} + \theta^r\right) = C^R, \tag{69}$$

$$Y(\mathbf{x}, \mathbf{y}) = f_R\!\left([\,{}^t\mathbf{w}^i \;\; {}^t\mathbf{w}^r\,]\begin{bmatrix}\mathbf{x}\\ \mathbf{y}\end{bmatrix} + \theta^i\right) = C^I. \tag{70}$$
Note that Eq. (69) is the decision boundary for the real part of an output of the complex-valued neuron with $n$ inputs. That is, input signals $(\mathbf{x}, \mathbf{y}) \in \mathbf{R}^{2n}$ are classified into the two decision regions $\{(\mathbf{x}, \mathbf{y}) \in \mathbf{R}^{2n} \mid X(\mathbf{x}, \mathbf{y}) \ge C^R\}$ and $\{(\mathbf{x}, \mathbf{y}) \in \mathbf{R}^{2n} \mid X(\mathbf{x}, \mathbf{y}) < C^R\}$ by the hypersurface given by Eq. (69). Similarly, Eq. (70) is the decision boundary for the imaginary part. The normal vectors $\mathbf{Q}^R(\mathbf{x}, \mathbf{y})$ and $\mathbf{Q}^I(\mathbf{x}, \mathbf{y})$ of the decision boundaries [Eqs. (69) and (70)] are given by
" ∂X ∂X ∂X ∂X Q (x, y) = ··· ··· ∂x1 ∂xn ∂y1 ∂yn ! " x
t r t i r = fR [ w − w ] + θ · [t wr − t wi ], y !
R
" ∂Y ∂Y ∂Y ∂Y ··· ··· ∂x1 ∂xn ∂y1 ∂yn ! " x = fR [t wi t wr ] + θ i · [t wi t wr ]. y
(71)
!
QI (x, y) =
(72)
Noting that the inner product of Eqs. (71) and (72) is 0, we find that the decision boundary for the real part of the output of a complex-valued neuron and that for the imaginary part intersect orthogonally. It can easily be shown that this orthogonality property also holds for the other types of complex-valued neurons proposed in Kim and Guest (1990), Benvenuto and Piazza (1992), and Georgiou and Koutsougeras (1992). Generally, a real-valued neuron classifies an input real-valued signal into two classes (0, 1). By contrast, a complex-valued neuron classifies an input complex-valued signal into four classes (0, 1, i, 1 + i). As described previously, the decision boundary of a complex-valued neuron consists of two hypersurfaces that intersect orthogonally and divide a decision region into four equal sections. Thus, a complex-valued neuron can be considered to have a natural decision boundary for complex-valued patterns.
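The orthogonality is easy to verify numerically, since the positive scalar factors $f_R'(\cdot)$ in Eqs. (71) and (72) do not affect the directions of the normal vectors. The following minimal sketch (an illustrative check, not code from the chapter) confirms that the two normal directions are orthogonal for arbitrary weights:

```python
import numpy as np

# The normal directions of the two decision boundaries in Eqs. (71)-(72)
# are [w_r, -w_i] and [w_i, w_r]; their inner product is
# w_r . w_i - w_i . w_r = 0 for any choice of weights.
rng = np.random.default_rng(0)
n = 5
w_r, w_i = rng.normal(size=n), rng.normal(size=n)

q_real = np.concatenate([w_r, -w_i])  # normal of the real-part boundary
q_imag = np.concatenate([w_i, w_r])   # normal of the imaginary-part boundary

print(np.dot(q_real, q_imag))  # 0.0 up to floating-point round-off
```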
B. Utility of the Orthogonal Decision Boundaries

This section shows the utility of the decision boundary property described in the previous section. Minsky and Papert (1969) clarified the limitations of a single real-valued neuron: in many cases, a single real-valued neuron is incapable of solving a given problem. A classic example is the XOR problem, with its long history in the study of neural networks; many other difficult problems involve the XOR as a subproblem. Another example is the detection of symmetry problem. Rumelhart et al. (1986a, b) showed that the three-layered real-valued neural network (i.e., with one hidden layer) can solve such problems—including the XOR problem and the detection of symmetry problem—and that interesting internal representations can be constructed in the weight space.
The XOR problem and the detection of symmetry problem cannot be solved with a single real-valued neuron. In the following text, contrary to expectation, it is first proved that such problems can be solved by a single complex-valued neuron with orthogonal decision boundaries, which reveals the potent computational power of complex-valued neurons. In addition, it is shown, as an application of this computational power, that the fading equalization problem can be successfully solved by a single complex-valued neuron with the highest generalization ability. Rumelhart et al. (1986a, b) showed that increasing the number of layers raises the computational power of neural networks; this section shows that extending the dimensionality of neural networks to complex numbers has a similar effect. This may be a new direction for enhancing the ability of neural networks.
1. The XOR Problem

This section proves that the XOR problem can be solved by a single complex-valued neuron with the orthogonal decision boundaries. The input-output mapping of the XOR problem is shown in Table VIII. To solve the XOR problem with a single complex-valued neuron, the input-output mapping is encoded as shown in Table IX, where the outputs 1 and i are interpreted as 0, and the outputs 0 and 1 + i are interpreted as 1, of the original XOR problem (Table VIII). We use a single complex-valued neuron with only one input and a weight $w = u + iv \in \mathbf{C}$ (assuming that it has no threshold parameters). The activation function is defined as
$$1_C(z) = 1_R(x) + i\,1_R(y), \qquad z = x + iy, \tag{73}$$
where $1_R$ is a real-valued step function defined on $\mathbf{R}$; that is, $1_R(u) = 1$ if $u \ge 0$, and $0$ otherwise, for any $u \in \mathbf{R}$.

TABLE VIII  The XOR Problem

  Input x1    Input x2    Output y
  0           0           0
  0           1           1
  1           0           1
  1           1           0

Reprinted from Nitta, T. (2003). © 2003, with permission from Elsevier.
TABLE IX  An Encoded XOR Problem for a Single Complex-Valued Neuron

  Input z = x + iy    Output Z = X + iY
  −1 − i              1
  −1 + i              0
  1 − i               1 + i
  1 + i               i

Reprinted from Nitta, T. (2003). © 2003, with permission from Elsevier.
FIGURE 30 The decision boundary in the input space of the complex-valued neuron that solves the XOR problem. The solid circle means that the output in the XOR problem is 1, and the open one 0. Reprinted from Nitta, T. (2003). © 2003, with permission from Elsevier.
The decision boundary of the single complex-valued neuron described above consists of the following two straight lines, which intersect orthogonally:
$$[u \;\; -v] \cdot {}^t[x \;\; y] = 0, \tag{74}$$

$$[v \;\; u] \cdot {}^t[x \;\; y] = 0 \tag{75}$$
for any input signal $z = x + iy \in \mathbf{C}$, where $u$ and $v$ are the real and imaginary parts of the weight parameter $w = u + iv$, respectively. Equations (74) and (75) are the decision boundaries for the real and imaginary parts of a single complex-valued neuron, respectively. Letting $u = 0$ and $v = 1$ (i.e., $w = i$) provides the decision boundary shown in Figure 30, which divides the input space (the decision region) into four equal sections and has the highest generalization ability for the XOR problem. On the other hand, the decision boundary of the three-layered real-valued neural network for the XOR problem does not always have the highest generalization ability (Lippmann, 1987). In addition, the required number of learnable parameters is only two (i.e., only $w = u + iv$), whereas at least nine parameters are needed for the three-layered real-valued neural network to solve the XOR problem (Rumelhart et al., 1986a, b); here a complex-valued parameter $z = x + iy$ (where $i = \sqrt{-1}$) is counted as two because it consists of a real part $x$ and an imaginary part $y$. This solution to the XOR problem uses the orthogonality property of a single complex-valued neuron. Note that several researchers have solved the XOR problem with a single complex-valued neuron in different ways (Nemoto and Kono, 1991; Igelnik et al., 2001; Aizenberg, 2006).
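The solution is short enough to verify in a few lines. The following sketch (an illustrative check, not code from the chapter) applies the weight $w = i$ and the step activation of Eq. (73) to the four encoded inputs of Table IX:

```python
# Illustrative check: with w = i and the step activation 1_C of Eq. (73),
# a single complex-valued neuron with one input and no threshold
# reproduces the encoded XOR mapping of Table IX.
def step_c(z: complex) -> complex:
    """1_C(z) = 1_R(Re z) + i 1_R(Im z), with 1_R(u) = 1 if u >= 0 else 0."""
    return complex(z.real >= 0, z.imag >= 0)

w = 1j  # the weight u + iv with u = 0, v = 1
for z, target in [(-1 - 1j, 1), (-1 + 1j, 0), (1 - 1j, 1 + 1j), (1 + 1j, 1j)]:
    assert step_c(w * z) == target
print("encoded XOR reproduced by a single complex-valued neuron")
```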
2. The Detection of Symmetry

Another interesting task that cannot be solved by a single real-valued neuron is the detection of symmetry problem (Minsky and Papert, 1969). This section offers a solution to this problem using a single complex-valued neuron with orthogonal decision boundaries. The problem is to detect whether the binary activity levels of a one-dimensional array of input neurons are symmetric about the center point. For example, the input-output mapping for the case of three inputs is shown in Table X. We used patterns of various lengths (from 2 to 6) and could solve all the cases with single complex-valued neurons. Only a solution to the case with six inputs is presented here, because the other cases can be solved similarly. We use a complex-valued neuron with six inputs, with weight $w_k = u_k + iv_k \in \mathbf{C}$ for input signal $k$ ($1 \le k \le 6$) (we assume that it has no threshold parameters). To solve the problem with the complex-valued neuron, the input-output mapping is encoded as follows. An input $x_k \in \mathbf{R}$ is encoded as an input $x_k + iy_k \in \mathbf{C}$ to the input neuron $k$, where $y_k = 0$ ($1 \le k \le 6$); the output $1 \in \mathbf{R}$ is encoded as $1 + i \in \mathbf{C}$; and the output $0 \in \mathbf{R}$ is encoded as $1$ or $i \in \mathbf{C}$, which is determined according to the inputs (for example, the output corresponding to the input ${}^t[0\ 0\ 0\ 0\ 1\ 0]$ is $i$). The activation function is the same as in Eq. (73).
TABLE X  Detection of Symmetry Problem with Three Inputs

  Input x1    Input x2    Input x3    Output y
  0           0           0           1
  0           0           1           0
  0           1           0           1
  1           0           0           0
  0           1           1           0
  1           0           1           1
  1           1           0           0
  1           1           1           1

Output 1 means that the corresponding input is symmetric, and 0 asymmetric. Reprinted from Nitta, T. (2003). © 2003, with permission from Elsevier.
The decision boundary of the complex-valued neuron with six inputs described above consists of the following two straight lines, which intersect orthogonally:
$$[u_1 \cdots u_6 \;\; -v_1 \cdots -v_6] \cdot {}^t[x_1 \cdots x_6 \; y_1 \cdots y_6] = 0, \tag{76}$$

$$[v_1 \cdots v_6 \;\; u_1 \cdots u_6] \cdot {}^t[x_1 \cdots x_6 \; y_1 \cdots y_6] = 0 \tag{77}$$
for any input signal $z_k = x_k + iy_k \in \mathbf{C}$, where $u_k$ and $v_k$ are the real and imaginary parts of the weight parameter $w_k = u_k + iv_k$, respectively ($1 \le k \le 6$). Equations (76) and (77) are the decision boundaries for the real and imaginary parts of the complex-valued neuron with six inputs, respectively. Letting ${}^t[u_1 \cdots u_6] = {}^t[-1\ 2\ -4\ 4\ -2\ 1]$ and ${}^t[v_1 \cdots v_6] = {}^t[1\ -2\ 4\ -4\ 2\ -1]$ (i.e., $w_1 = -1 + i$, $w_2 = 2 - 2i$, $w_3 = -4 + 4i$, $w_4 = 4 - 4i$, $w_5 = -2 + 2i$, and $w_6 = 1 - i$) yields the orthogonal decision boundaries shown in Figure 31, which successfully detect the symmetry of the $2^6$ (= 64) input patterns. In addition, the required number of learnable parameters is 12 (i.e., six complex-valued weights), whereas at least 17 parameters are needed for the three-layered real-valued neural network to solve the detection of symmetry problem (Rumelhart et al., 1986a, b); here a complex-valued parameter $z = x + iy$ (where $i = \sqrt{-1}$) is counted as two, as in Section VII.B.1.
FIGURE 31 The decision boundary in the net-input space of the complex-valued neuron with six inputs that solves the detection of symmetry problem. Note that the plane is not the input space but the net-input space because the dimension of the input space is 6 and the input space cannot be written in a two-dimensional plane. The solid circle indicates a net-input for a symmetric input and the open one asymmetric. There is only one solid circle at the origin. The four circled complex numbers represent the output values of the complex-valued neuron in their regions, respectively. Reprinted from Nitta, T. (2003). © 2003, with permission from Elsevier.
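The stated weights can be checked exhaustively. In the sketch below (an illustrative check, not code from the chapter), the net input $\sum_k w_k x_k$ vanishes exactly for the symmetric binary patterns, so the encoded output $1 + i$ is produced only for them:

```python
from itertools import product

# With w_k = u_k + i v_k and v_k = -u_k, the net input is S(1 - i) where
# S = sum_k u_k x_k; S = 0 exactly when the pattern is symmetric, giving
# the encoded "symmetric" output 1 + i, and 1 or i otherwise.
u = [-1, 2, -4, 4, -2, 1]
w = [uk * (1 - 1j) for uk in u]

def step_c(z: complex) -> complex:
    return complex(z.real >= 0, z.imag >= 0)

for x in product((0, 1), repeat=6):
    net = sum(wk * xk for wk, xk in zip(w, x))
    symmetric = x == x[::-1]
    assert (step_c(net) == 1 + 1j) == symmetric
print("all 2**6 = 64 patterns classified correctly")
```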
3. The Fading Equalization Technology

This section shows that single complex-valued neurons with orthogonal decision boundaries can be successfully applied to fading equalization technology (Lathi, 1998). Channel equalization in a digital communication system can be viewed as a pattern classification problem: the system receives a transmitted signal sequence with additive noise and attempts to estimate the true transmitted sequence. A transmitted signal can assume one of the following four possible complex values: $-1-i$, $-1+i$, $1-i$, and $1+i$ ($i = \sqrt{-1}$). Thus, the received signal takes values around $-1-i$, $-1+i$, $1-i$, and $1+i$ (for example, $-0.9 - 1.2i$ or $1.1 + 0.84i$, because noise is added). We need to estimate the true complex values from such noisy values; thus, a method with excellent generalization ability is needed for the estimation. The input-output mapping of the problem is shown in Table XI. We use the same complex-valued neuron with only one input as in Section VII.B.1.

TABLE XI  Input-Output Mapping in the Fading Equalization Problem

  Input       Output
  −1 − i      −1 − i
  −1 + i      −1 + i
  1 − i       1 − i
  1 + i       1 + i

Reprinted from Nitta, T. (2003). © 2003, with permission from Elsevier.
TABLE XII  An Encoded Fading Equalization Problem for Complex-Valued Neurons

  Input       Output
  −1 − i      0
  −1 + i      i
  1 − i       1
  1 + i       1 + i

Reprinted from Nitta, T. (2003). © 2003, with permission from Elsevier.
FIGURE 32 The decision boundary in the input space of the complex-valued neuron with one input that solves the fading equalization problem. The solid circle indicates an input in the fading equalization problem. The four circled complex numbers represent the output values of the complex-valued neuron in their regions, respectively. Reprinted from Nitta, T. (2003). © 2003, with permission from Elsevier.
To solve the problem with the complex-valued neuron, the input-output mapping in Table XI is encoded as shown in Table XII. Letting $u = 1$ and $v = 0$ (i.e., $w = 1$) yields the orthogonal decision boundaries shown in Figure 32, which have the highest generalization ability for the fading equalization problem and can estimate the true signals without error. In addition, the required number of learnable parameters is only two (i.e., only $w = u + iv$).
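The following sketch (an assumed toy setup, not the chapter's code; the noise level 0.3 is an arbitrary illustrative choice) shows the single neuron with $w = 1$ quantizing noisy received symbols to the encoded outputs of Table XII, which then decode back to the transmitted symbols:

```python
import random

def step_c(z: complex) -> complex:
    return complex(z.real >= 0, z.imag >= 0)

# Decode the encoded outputs of Table XII back to the transmitted symbols.
decode = {0: -1 - 1j, 1j: -1 + 1j, 1: 1 - 1j, 1 + 1j: 1 + 1j}
symbols = [-1 - 1j, -1 + 1j, 1 - 1j, 1 + 1j]

random.seed(0)
w = 1  # u = 1, v = 0
errors = 0
for _ in range(10_000):
    sent = random.choice(symbols)
    received = sent + complex(random.gauss(0, 0.3), random.gauss(0, 0.3))
    estimated = decode[step_c(w * received)]
    errors += estimated != sent
print(f"symbol errors: {errors} / 10000")
```

With moderate noise, the orthogonal boundaries (the real and imaginary axes) separate the four symbols with essentially no errors.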
VIII. CONCLUSIONS

We have described the multilayered complex-valued neural network model, in which the input signals, weights, thresholds, and output signals are all complex numbers, and the related Complex-BP, a complex-valued version of the backpropagation learning algorithm. Furthermore, we have elucidated their inherent properties. The error backpropagation has a structure that concerns two-dimensional motion: a unit of learning consists of complex-valued signals flowing through the neural network. Compared with the updating rule of the Real-BP, the Complex-BP updating rule reduces the probability of a standstill in learning. As a result, the average convergence speed is superior to that of the Real-BP (whereas the generalization performance remains unchanged). In addition, the number of learnable parameters needed is almost half that of the Real-BP, where a complex-valued parameter z = x + iy was counted as two because it consisted of a real part x and an imaginary
part y. Thus, it seems that the Complex-BP algorithm is well suited for learning complex-valued patterns.
Of note, the Complex-BP can transform geometric figures in a way that the Real-BP cannot. Numerical experiments suggest that the behavior of a Complex-BP network that has learned the transformation of geometric figures is related to the identity theorem in complex analysis. Mathematical analysis indicates that a Complex-BP network that has learned a transformation has the ability to generalize that transformation with an error represented by the sine of the difference between the argument of the test point and that of the training point. This mathematical result agrees qualitatively with the simulation results. Furthermore, the 1-n-1 Complex-BP network, which has the ability to transform geometric figures, can also solve an ordinary continuous mapping task very well, as can the 2-m-2 Real-BP network. We believe that the structure of the learning patterns caused the successful experimental result that the 1-n-1 Complex-BP network can solve the continuous mapping task.
A decision boundary of a single complex-valued neuron consists of two hypersurfaces that intersect orthogonally and divide a decision region into four equal sections. The XOR problem and the detection of symmetry problem, which cannot be solved with a single real-valued neuron, can be solved by a single complex-valued neuron with the orthogonal decision boundaries, which reveals the potent computational power of complex-valued neurons. Furthermore, the fading equalization problem can be successfully solved by a single complex-valued neuron with the highest generalization ability.
The work presented in this chapter probably represents just the beginning of the possible extension of the backpropagation algorithm to the complex number domain.
REFERENCES

Aizenberg, I. (2006). Solving the parity n problem and other nonlinearly separable problems using a single universal binary neuron. In "Advances in Soft Computing. Springer Series. Computational Intelligence, Theory and Application" (B. Reusch, ed.), pp. 457–471. Springer, New York.
Amari, S. (1967). A theory of adaptive pattern classifiers. IEEE Trans. Electr. Comput. 16(3), 299–307.
Arena, P., Fortuna, L., Re, R., and Xibilia, M. G. (1993). On the capability of neural networks with complex neurons in complex valued functions approximation. Proc. IEEE Int. Symposium Circ. Syst. 4, 2168–2171.
Arena, P., Fortuna, L., Muscato, G., and Xibilia, M. G. (1998). Neural networks in multidimensional domains. Lect. Notes Contr. Inf. Sci. 234, 1–165.
Benvenuto, N., and Piazza, F. (1992). On the complex backpropagation algorithm. IEEE Trans. Signal Proc. 40(4), 967–969.
Derrick, W. R. (1984). "Complex Analysis and Applications." Wadsworth, New York.
Georgiou, G. M., and Koutsougeras, C. (1992). Complex domain backpropagation. IEEE Trans. Circ. Syst.–II 39(5), 330–334.
Hirose, A. (ed.). (2003). "Complex-Valued Neural Networks—Theories and Applications." World Scientific Publishing, Singapore.
ICANN/ICONIP. (2003). Seven papers in the special session: Complex-valued neural networks. In "Artificial Neural Networks and Neural Information Processing." Lect. Notes Comput. Sci. 2714, 943–1002 (Proceedings of International Conference on Artificial Neural Networks/International Conference on Neural Information Processing, ICANN/ICONIP '03–Istanbul, June 26–29).
ICANN. (2007). Six papers in the special session: Complex-valued neural networks. In "Artificial Neural Networks and Neural Information Processing." Lect. Notes Comput. Sci. 4668, 838–893 (Proceedings of International Conference on Artificial Neural Networks, ICANN '07–Portugal, September 9–13).
ICONIP. (2002). Six papers in the special session: Complex-valued neural networks. Proc. Int. Conf. Neural Inf. Proc. 3, 1074–1103.
Igelnik, B., Tabib-Azar, M., and LeClair, S. R. (2001). A net with complex weights. IEEE Trans. Neural Networks 12(2), 236–249.
IJCNN. (2006). Twelve papers in the special session: Complex-valued neural networks. Proc. Int. Joint Conf. Neural Netw. Vancouver, BC, Canada, pp. 595–626, 1186–1224.
KES. (2001). Six papers in the special session: Complex-valued neural networks and their applications. In "Knowledge-Based Intelligent Information Engineering Systems and Allied Technologies" (N. Baba, L. C. Jain, and R. J. Howlett, eds.), Part I, pp. 550–580. IOS Press, Tokyo.
KES. (2002). Five papers in the special session: Complex-valued neural networks. In "Knowledge-Based Intelligent Information Engineering Systems and Allied Technologies" (E. Damiani, R. J. Howlett, L. C. Jain, and N. Ichalkaranje, eds.), Part I, pp. 623–647. IOS Press, Amsterdam.
KES. (2003). Seven papers in the special session: Complex-valued neural networks. In "Knowledge-Based Intelligent Information and Engineering Systems." Lect. Notes Comput. Sci. 2774, 304–357.
Kim, M. S., and Guest, C. C. (1990). Modification of backpropagation networks for complex-valued signal processing in frequency domain. Proc. Int. Joint Conf. Neural Netw. 3, 27–31.
Kim, T., and Adali, T. (2003). Approximation by fully complex multilayer perceptrons. Neural Computation 15(7), 1641–1666.
Kuroe, Y., Hashimoto, N., and Mori, T. (2002). On energy function for complex-valued neural networks and its applications. Proc. Int. Conf. Neural Inf. Proc. 3, 1079–1083.
Lathi, B. P. (1998). "Modern Digital and Analog Communication Systems," 3rd ed. Oxford University Press, New York.
Lippmann, R. P. (1987). An introduction to computing with neural nets. IEEE ASSP Magazine 4(2), 4–22.
Minsky, M. L., and Papert, S. A. (1969). "Perceptrons." MIT Press, Cambridge, MA.
Miura, M., and Aiyoshi, E. (2003). Approximation and designing of fractal images by complex neural networks. IEEJ Trans. Electron. Inf. Syst. 123(8), 1465–1472 (in Japanese).
Nemoto, I., and Kono, T. (1991). Complex-valued neural networks. Trans. Inst. Electron. Inf. Comm. Eng. J74-D-II, 1282–1288 (in Japanese).
Nitta, T., and Furuya, T. (1991). A complex back-propagation learning. Trans. Inf. Proc. Soc. Jp. 32(10), 1319–1329 (in Japanese).
Nitta, T. (1993). A complex numbered version of the back-propagation algorithm. Proc. World Congress Neur. Netw. 3, 576–579.
Nitta, T. (1997). An extension of the back-propagation algorithm to complex numbers. Neur. Netw. 10(8), 1392–1415.
Nitta, T. (2000). An analysis of the fundamental structure of complex-valued neurons. Neur. Proc. Lett. 12(3), 239–246.
Nitta, T. (2003). Solving the XOR problem and the detection of symmetry using a single complex-valued neuron. Neur. Netw. 16(8), 1101–1105.
Nitta, T. (2004). Orthogonality of decision boundaries in complex-valued neural networks. Neur. Computation 16(1), 73–97.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986a). Learning internal representations by error propagation. In "Parallel Distributed Processing: Explorations in the Microstructure of Cognition" (D. E. Rumelhart and J. L. McClelland, eds.), vol. 1, pp. 318–362. MIT Press, Cambridge, MA.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986b). Learning representations by back-propagating errors. Nature 323, 533–536.
Tsutsumi, K. (1989). A multi-layered neural network composed of backpropagation and Hopfield nets and internal space representation. Proc. Int. Joint Conf. Neural Netw. Washington, D.C., June 1, pp. 365–371.
Watanabe, A., Yazawa, N., Miyauchi, A., and Miyauchi, M. (1994). A method to interpret 3D motions using neural networks. IEICE Trans. Fund. Electr. Comm. Comput. Sci. E77-A(8), 1363–1370.
CHAPTER 5

Blind Source Separation: The Sparsity Revolution

Jerome Bobin*, Jean-Luc Starck*, Yassir Moudden*, and Mohamed Jalal Fadili†
Contents

I Introduction 222
  A Organization 223
  B Definitions and Notations 223
II Blind Source Separation: A Strenuous Inverse Problem 224
  A Modeling Multichannel Data 224
  B Independent Component Analysis 226
  C The Algorithmic Viewpoint 227
III Sparse Multichannel Signal Representation 231
  A The Blessing of Sparsity and Overcomplete Signal Representations 231
  B The Sparse Decomposition Issue 233
  C Overcomplete Multichannel Representations 234
IV Morphological Component Analysis for Multichannel Data 237
  A Morphological Diversity and Morphological Component Analysis 237
  B Multichannel Overcomplete Sparse Recovery 239
  C Multichannel Morphological Component Analysis 239
  D Recovering Sparse Multichannel Decompositions Using mMCA 242
  E Handling Bounded Noise With mMCA 243
  F Choosing the Overcomplete Dictionary 243
V Morphological Diversity and Blind Source Separation 244
  A Generalized Morphological Component Analysis 245
  B Results 249
  C Speeding Up Blind GMCA 250
  D Unknown Number of Sources 257
  E Variations on Sparsity and Independence 263
  F Results 265
* Laboratoire AIM, CEA/DSM-CNRS-Université Paris Diderot, CEA Saclay, IRFU/SEDI-SAP, Service d’Astrophysique, Orme des Merisiers, 91191, Gif-sur-Yvette, France † GREYC CNRS UMR 6072, Image Processing Group, ENSICAEN 14050, Caen Cedex, France
Advances in Imaging and Electron Physics, Volume 152, ISSN 1076-5670, DOI: 10.1016/S1076-5670(08)00605-8. Copyright © 2008 Elsevier Inc. All rights reserved.
VI Dealing With Hyperspectral Data 275
  A Specificity of Hyperspectral Data 275
  B GMCA for Hyperspectral BSS 276
  C Comparison With GMCA 280
VII Applications 284
  A Application to Multivalued Data Restoration 284
  B Application to the Planck Data 292
  C Software 296
VIII Conclusion 296
References 298
I. INTRODUCTION

Finding a suitable representation of multivariate data is a longstanding problem in statistics and related areas. A good representation means that the data are somehow transformed so that their essential structure is made more visible or more easily accessible. This problem is encountered, for instance, in unsupervised learning, exploratory data analysis, and signal processing. In the latter, a typical field where the good-representation problem arises is source separation. Over the past few years, the development of multichannel sensors has motivated interest in such methods for the coherent processing of multivariate data. Areas of application include biomedical engineering, medical imaging, speech processing, astronomical imaging, remote sensing, communication systems, seismology, geophysics, and econometrics.
Consider a situation where a collection of signals is emitted by some physical objects or sources. These physical sources could be, for example, different brain areas emitting electrical signals; people speaking in the same room (the classical cocktail party problem), thus emitting speech signals; or radiation sources emitting their electromagnetic waves. Assume further that there are several sensors or receivers. These sensors are in different positions, so that each sensor records a mixture of the original source signals with different weights. It is assumed that the mixing weights are unknown, since knowing them entails knowing all the properties of the physical mixing system, which are not accessible in general. Of course, the source signals are unknown as well, since the primary problem is that they cannot be recorded directly. The blind source separation (BSS) problem is to find the original signals from their observed mixtures, without prior knowledge of the mixing weights and with very little knowledge of the original sources. In the classical cocktail party example, the BSS problem amounts to recovering the voices of the different speakers from the mixtures recorded at several microphones.
A flurry of research activity has focused on BSS, which is one of the hottest areas in the signal-processing community. Some specific
issues have already been addressed using a blend of heuristic ideas and rigorous derivations, as indicated by the extensive literature on the subject. As clearly emphasized by previous work, it is fundamental that the sources to be retrieved present some quantitatively measurable diversity (e.g., decorrelation, independence, morphological diversity). Recently, sparsity and morphological diversity have emerged as a novel and effective source of diversity for BSS. This chapter provides new and essential insights into the use of sparsity in source separation and outlines the fundamental role of morphological diversity as a source of diversity, or contrast, between the sources.
This chapter describes a BSS method, and more generally a multichannel sparsity-based data analysis framework, termed generalized morphological component analysis (GMCA), which is fast, efficient, and robust to noise. GMCA takes advantage of both morphological diversity and sparsity, using recent sparse overcomplete signal representations. Theoretical arguments and numerical experiments in multivariate image processing are reported to characterize and illustrate the good performance of GMCA for BSS.
A. Organization

Section II formally states the BSS problem and surveys the current state of the art in the field of BSS. Section III provides the necessary background on sparse overcomplete representation and decomposition, with extensions to the multichannel setting. Section IV describes the multichannel extension of the morphological component analysis (MCA) algorithm and states some of its theoretical properties. Section V presents a new way of thinking about sparsity in BSS; all the necessary ingredients introduced in the previous sections are combined, and the GMCA algorithm for BSS is provided. The extension of GMCA to hyperspectral data and the application of GMCA to multichannel data restoration are reported in Sections VI and VII.A. We also discuss an application of the GMCA BSS algorithm to an astronomical imaging experiment.
B. Definitions and Notations

Unless stated otherwise, a vector $x$ is a row vector $x = [x_1, \ldots, x_t]$. We equip the vector space $\mathbf{R}^t$ with the scalar product $\langle x, y\rangle = xy^T$. The $\ell_p$-norm of a vector $x$ is defined by $\|x\|_p = \left(\sum_i |x[i]|^p\right)^{1/p}$, with the usual notation $\|x\|_\infty = \max_i |x[i]|$. The notation $\|x\|_0$ defines the $\ell_0$ quasi-norm of $x$ (i.e., the number of nonzero elements in $x$). Bold symbols represent matrices, and $\mathbf{X}^T$ is the transpose of $\mathbf{X}$. The $i$-th entry of $x_p$ is $x_p[i]$; $x_p$ is the $p$-th row, and $x^q$ the $q$-th column, of $\mathbf{X}$. The "entrywise" $\ell_p$-norm of a matrix $\mathbf{X}$ is defined by $\|\mathbf{X}\|_p = \left(\sum_{i,j} |x_i[j]|^p\right)^{1/p}$,
not to be confused with matrix-induced $\ell_p$-norms. The Frobenius norm of $\mathbf{X}$ is obtained for $p = 2$: $\|\mathbf{X}\|_F^2 = \mathrm{Trace}(\mathbf{X}^T\mathbf{X})$. Similar to vectors, $\|\mathbf{X}\|_\infty$ and $\|\mathbf{X}\|_0$, respectively, denote the maximum in magnitude and the number of nonzero entries in the matrix $\mathbf{X}$. In the proposed iterative algorithms, $\tilde{x}^{(h)}$ will be the estimate of $x$ at iteration $h$.
$\mathbf{\Phi} = [\phi_1^T, \ldots, \phi_T^T]^T$ defines a $T \times t$ dictionary the rows of which are unit $\ell_2$-norm atoms $\{\phi_i\}_i$. The mutual coherence of $\mathbf{\Phi}$ (see Tropp, 2004, and references therein) is $\mu_{\mathbf{\Phi}} = \max_{i \neq j} |\langle \phi_i, \phi_j\rangle|$. When $T > t$, this dictionary is said to be redundant or overcomplete. In the next section, we will be interested in the decomposition of a signal $x$ in $\mathbf{\Phi}$. We thus define $S_0(x)$ (respectively, $S_1(x)$) as the set of solutions to the minimization problem $\min_c \|c\|_0$ s.t. $x = c\mathbf{\Phi}$ (respectively, $\min_c \|c\|_1$ s.t. $x = c\mathbf{\Phi}$). When the $\ell_0$-sparse decomposition of a given signal $x$ has a unique solution, we let $\alpha = \Delta(x)$ denote this solution, where $x = \alpha\mathbf{\Phi}$. Finally, we define $\Delta_\lambda(\cdot)$ to be a thresholding operator with threshold $\lambda$ (hard thresholding or soft thresholding; this will be specified when needed). The support of a row vector $x$ is $\Lambda(x) = \{k;\ |x[k]| > 0\}$. Note that the notion of support is well adapted to $\ell_0$-sparse signals, as these are synthesized from a few nonzero dictionary elements. Similarly, we define the $\delta$-support of $x$ as $\Lambda_\delta(x) = \{k;\ |x[k]| > \delta\|x\|_\infty\}$.
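As a quick sanity check of this notation, the following short sketch (an illustrative example, not from the chapter) evaluates the norms and supports defined above for a small vector:

```python
import numpy as np

x = np.array([0.0, 3.0, -4.0, 0.0])
print(np.sum(np.abs(x)))                        # ||x||_1 = 7
print(np.sqrt(np.sum(x**2)))                    # ||x||_2 = 5
print(np.max(np.abs(x)))                        # ||x||_inf = 4
print(np.count_nonzero(x))                      # ||x||_0 = 2
print(np.flatnonzero(np.abs(x) > 0))            # support Lambda(x) = {1, 2}
print(np.flatnonzero(np.abs(x) > 0.9 * 4.0))    # 0.9-support = {2}
```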
II. BLIND SOURCE SEPARATION: A STRENUOUS INVERSE PROBLEM

A. Modeling Multichannel Data

1. The Blind Source Separation Model

In a source separation setting, the observed data are composed of $m$ distinct monochannel data $\{x_i\}_{i=1,\ldots,m}$. Each datum could be a $\sqrt{t} \times \sqrt{t}$ image or a one-dimensional signal with $t$ samples. In what follows, we assume that each observation $x_i$ is a row vector of size $t$. The classical instantaneous linear mixture model states that each datum is the linear combination of $n$ so-called sources $\{s_j\}_{j=1,\ldots,n}$ such that:
$$\forall i = 1, \ldots, m: \quad x_i = \sum_{j=1}^{n} a_{ij} s_j, \tag{1}$$
where the set of scalar values {aij }i=1,...,m; j=1,...,n models the “weight” of each source in the composition of each datum. For convenience, the mixing model with additive noise can be rewritten in matrix form:
$$\mathbf{X} = \mathbf{A}\mathbf{S} + \mathbf{N}, \tag{2}$$
where $\mathbf{X}$ is the $m \times t$ measurement matrix, $\mathbf{S}$ is the $n \times t$ source matrix, and $\mathbf{A}$ is the $m \times n$ mixing matrix. $\mathbf{A}$ defines the contribution of each source to each measurement. An $m \times t$ matrix $\mathbf{N}$ is added to account for instrumental noise or model imperfections. In this chapter, we study only the overdetermined case, $m \ge n$; the converse underdetermined case ($m < n$) is a more difficult problem (see Georgiev et al., 2005; Jourjine et al., 2000, for further details), and future work will be devoted to this particular case.
In the BSS problem, both the mixing matrix $\mathbf{A}$ and the sources $\mathbf{S}$ are unknown and must be estimated jointly. In general, without further a priori knowledge, decomposing a rectangular matrix $\mathbf{X}$ into a linear combination of $n$ rank-1 matrices is clearly ill posed. The goal of BSS is to understand the different cases in which one or another additional prior constraint leads to a well-posed inverse problem, and to devise separation methods that can handle the resulting models.
2. A Question of Diversity

Note that the mixture model in Eq. (1) is equivalent to the following one:
$$\mathbf{X} = \sum_{i=1}^{n} a^i s_i, \tag{3}$$
where $a^i$ is the $i$-th column of $\mathbf{A}$. BSS is thus equivalent to decomposing the data $\mathbf{X}$ into a sum of $n$ rank-1 matrices $\{\mathbf{X}_i = a^i s_i\}_{i=1,\ldots,n}$. Obviously, there are infinitely many ways of decomposing a given matrix of rank $n$ into a linear combination of $n$ rank-1 matrices; further information is required to disentangle the sources.
Let us assume that the sources are random vectors. These may be known a priori to be different in the sense of being simply decorrelated. A separation scheme then looks for sources $\mathbf{S}$ such that their covariance matrix $\mathbf{R}_S$ is diagonal. Unfortunately, the covariance matrix $\mathbf{R}_S$ is invariant under orthonormal transformations such as rotations. Therefore, an effective BSS method must go beyond decorrelation (see Cardoso, 1998, 2001, for further reflections on the need for stronger a priori constraints going beyond the decorrelation assumption). The next sections emphasize different sets of a priori constraints and different methods to handle them. Section II.B gives an overview of BSS methods that use statistical independence as the key assumption for separation. Recently, sparsity has emerged as a very effective way to distinguish the sources; these new approaches are introduced in Section II.C.
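The rotation invariance just mentioned is easy to check numerically. The following sketch (an assumed toy example, not from the chapter) builds a noiseless mixture and shows that rotated sources reproduce the same data while remaining decorrelated:

```python
import numpy as np

# Decorrelation alone cannot identify the sources: any rotation of
# decorrelated sources is equally decorrelated, so Eq. (2) (with N = 0)
# admits infinitely many such factorizations.
rng = np.random.default_rng(1)
n, t = 2, 10_000
S = rng.laplace(size=(n, t))            # independent (hence decorrelated) sources
A = rng.normal(size=(2, n))             # mixing matrix
X = A @ S                                # noiseless mixtures

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
S2 = R @ S                               # rotated sources
A2 = A @ R.T                             # compensated mixing matrix
print(np.allclose(X, A2 @ S2))           # True: exactly the same data X
print(np.round(np.cov(S2), 2))           # covariance still (nearly) diagonal
```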
B. Independent Component Analysis

1. Generalities

The previous section emphasized the need for further a priori assumptions to bring BSS into the "land" of well-posed inverse problems. This section deals with noiseless mixtures, assuming that $\mathbf{X} = \mathbf{A}\mathbf{S}$; the case where the data are perturbed by additive noise is discussed at the end of this section. The seminal work by Comon (1994) paved the way for the outgrowth of independent component analysis (ICA). In the celebrated ICA framework, the sources are assumed to be independent random variables with joint probability density function $f_S$ such that:
$$f_S(s_1, \ldots, s_n) = \prod_{i=1}^{n} f_{s_i}(s_i). \tag{4}$$
Disentangling the sources requires a means of measuring how far they are from being separable. As statistical independence is a property of the probability density function (pdf) of the sources, devising a good "measure" of independence is not trivial. In that setting, ICA reduces to finding a multichannel representation/basis on which the estimated sources $\tilde{\mathbf{S}}$ are as "independent as possible." Equivalently, ICA looks for a separating/demixing matrix $\mathbf{B}$ such that the estimated sources $\tilde{\mathbf{S}} = \mathbf{B}\mathbf{A}\mathbf{S}$ are independent. Until the end of the section devoted to ICA, we assume that the mixing matrix $\mathbf{A}$ is a square invertible matrix ($m = n$ and $\det(\mathbf{A}) > 0$).
First, we may wonder whether independence makes the sources identifiable. Under mild conditions, the Darmois theorem (Darmois, 1953) shows that statistical independence entails separability (Comon, 1994): if at most one of the sources is generated from a Gaussian distribution and the entries of $\tilde{\mathbf{S}} = \mathbf{B}\mathbf{A}\mathbf{S}$ are independent, then $\mathbf{B}$ is a separating matrix, and $\tilde{\mathbf{S}}$ is equal to $\mathbf{S}$ up to a scale factor (multiplication by a diagonal matrix with strictly positive diagonal entries) and a permutation. As a consequence, if at most one source is Gaussian, maximizing the independence of the estimated sources leads to perfect estimation of $\mathbf{S}$ and $\mathbf{A} = \mathbf{B}^{-1}$. The Darmois theorem thus motivates the use of independence in BSS; it paved the way for the popular ICA.
a. Independence and Gaussianity. The Kullback–Leibler (KL) divergence from the joint density $f_S(s_1, \ldots, s_n)$ to the product of its marginal densities is
a popular measure of statistical independence:
$$J(\mathbf{S}) = \mathcal{K}\!\left(f_S(s_1, \ldots, s_n),\ \prod_{i=1}^{n} f_{s_i}(s_i)\right) \tag{5}$$

$$= \int f_S(s_1, \ldots, s_n)\, \log\!\left(\frac{f_S(s_1, \ldots, s_n)}{\prod_{i=1}^{n} f_{s_i}(s_i)}\right) d\mathbf{s}. \tag{6}$$
Interestingly (see Cardoso, 2003), the KL can be decomposed into two terms as follows:
$$J(\mathbf{S}) = \mathcal{C}(\mathbf{S}) - \sum_{i=1}^{n} \mathcal{G}(s_i) + K, \tag{7}$$
where $\mathcal{C}(\mathbf{S}) = \mathcal{K}\!\left[\mathcal{N}(E\{\mathbf{S}\}, \mathbf{R}_S),\ \mathcal{N}(E\{\mathbf{S}\}, \mathrm{diag}(\mathbf{R}_S))\right]$ and $\mathcal{G}(s_i) = \mathcal{K}\!\left[f(s_i),\ \mathcal{N}(E\{s_i\}, \sigma_{s_i}^2)\right]$; $\sigma_{s_i}^2$ is the variance of $s_i$, and $\mathcal{N}(m, \Sigma)$ is the normal probability density function with mean $m$ and covariance $\Sigma$. In Eq. (7), $K$ is a constant. The first term in Eq. (7) vanishes when the sources are decorrelated; the second term measures the marginal non-Gaussianity of the sources. This decomposition of the KL divergence entails that maximizing independence is equivalent to minimizing the correlation between the sources and maximizing their non-Gaussianity.
Note that, owing to the central limit theorem, intuition tells us that mixing independent signals should lead to a kind of Gaussianization. It then seems natural that demixing leads to processes that deviate from Gaussian processes.
C. The Algorithmic Viewpoint

a. Approximating Independence. In the ICA setting, the mixing matrix is square and invertible. Solving a BSS problem is equivalent to looking for a demixing matrix $\mathbf{B}$ that maximizes the independence of the estimated sources $\tilde{\mathbf{S}} = \mathbf{B}\mathbf{X}$. In that setting, maximizing the independence of the sources (with respect to the KL divergence) is equivalent to maximizing their non-Gaussianity. Since the seminal article by Comon (1994), a variety of ICA algorithms have been proposed. They merely differ in the way they devise assessable quantitative measures of independence. Some popular approaches are presented below:

• Information maximization (see Bell and Sejnowski, 1995; Nadal and Parga, 1994): Bell and Sejnowski showed that maximizing the information of the sources is equivalent to minimizing the measure of independence based on the KL divergence in Eq. (5).
• Maximum likelihood: Maximum likelihood (ML) has also been proposed to solve the BSS issue. The ML approach (Cardoso, 1997; Parra and Pearlmutter, 1997; Pham et al., 1992) has been shown to be equivalent to information maximization (InfoMax) in the ICA framework.
• Higher-order statistics: As noted previously, maximizing the independence of the sources is equivalent to maximizing their non-Gaussianity under a strict decorrelation constraint. Because Gaussian random variables have vanishing higher-order cumulants, devising a separation algorithm based on higher-order cumulants provides a way of accounting for the non-Gaussianity of the sources. A wide range of algorithms based on higher-order statistics has been proposed (Hyvarinen et al., 2001; Belouchrani et al., 1997; Cardoso, 1999, and references therein). Historical papers (see Comon, 1994) proposed ICA algorithms that use approximations of the KL divergence (based on truncated Edgeworth expansions); interestingly, those approximations explicitly involve higher-order statistics. Lee et al. (1998) showed that most ICA-based algorithms are similar in theory and in practice.
b. Limits of ICA. Despite its theoretical strength and elegance, ICA has several limitations:

• Probability density assumption: Even if implicitly, ICA algorithms require information on the source distributions. As stated in Lee et al. (1998), whatever the contrast function to be minimized (mutual information, ML, higher-order statistics), most ICA algorithms can be equivalently restated in a natural gradient form (Amari, 1999; Amari and Cardoso, 1996). In such a setting, the "demixing" matrix $\mathbf{B}$ is estimated iteratively, $\mathbf{B} \leftarrow \mathbf{B} + \mu\Delta\mathbf{B}$, where the natural gradient $\Delta\mathbf{B}$ is given by:
$$\Delta\mathbf{B} \propto \left(\mathbf{I} - h(\tilde{\mathbf{S}})\tilde{\mathbf{S}}^T\right)\mathbf{B}, \tag{8}$$
where the function $h$ is applied elementwise, $[h(\tilde{\mathbf{S}})]_{ij} = h(\tilde{s}_{ij})$, and $\tilde{\mathbf{S}} = \mathbf{B}\mathbf{X}$ is the current estimate of $\mathbf{S}$. Interestingly, the so-called score function $h$ in Eq. (8) is closely related to the assumed pdf of the sources (see Amari and Cardoso, 1996; Amari and Cichocki, 2002). Assuming that all the sources are generated from the same probability density function $f_S$, the score function $h$ is defined as follows:
$$h(\tilde{\mathbf{S}}) = -\frac{\partial \log f_S(\tilde{\mathbf{S}})}{\partial \tilde{\mathbf{S}}}. \tag{9}$$
As expected, the way the "demixing" matrix (and thus the sources) is estimated depends closely on the way the sources are modeled from a statistical point of view. For instance, separating platykurtic (negative kurtosis) or leptokurtic (positive kurtosis) sources requires completely different score functions. Even if ICA is shown in Amari and Cardoso (1996) to be quite robust to "mismodeling," the choice of the score function is crucial with respect to the convergence (and rate of convergence) of ICA algorithms. Some ICA-based techniques (see Koldovsky and Oja, 2006) emphasized adapting the popular FastICA algorithm to adjust the score function to the distribution of the sources. They particularly emphasize modeling sources whose distribution belongs to specific parametric classes of distributions, such as the generalized Gaussians: $f_S(\mathbf{S}) \propto \prod_{ij} \exp(-\mu|s_{ij}|^\theta)$.¹
• Noisy ICA: Only a few works have investigated the problem of noisy ICA (see Davies, 2004; Koldovsky and Tichavsky, 2006). As pointed out by Davies (2004), noise clearly degenerates the ICA model: it is not fully identifiable. In the case of additive Gaussian noise, as stated in Eq. (2), using higher-order statistics yields an efficient estimate of the mixing matrix $\mathbf{A} = \mathbf{B}^{-1}$ (higher-order statistics are blind to additive Gaussian noise; this property does not hold for non-Gaussian noise). However, in the noisy ICA setting, applying the demixing matrix to the data does not yield an efficient estimate of the sources. Furthermore, most ICA algorithms assume the mixing matrix $\mathbf{A}$ to be square; when there are more observations than sources ($m > n$), a dimension-reduction preprocessing step is applied. When noise perturbs the data, this subspace projection step can dramatically deteriorate the performance of the separation stage.

The next section introduces a new way of modeling the data that avoids most of the aforementioned limitations of ICA.
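For concreteness, the sketch below (an assumed toy setting, not the authors' code) runs the natural-gradient update of Eq. (8) with the classical score $h(u) = \tanh(u)$, a common choice for leptokurtic sources; the learning rate and iteration count are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 2, 50_000
S = rng.laplace(size=(n, t))            # leptokurtic sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])  # mixing matrix
X = A @ S

B = np.eye(n)
mu = 0.01                                # learning rate (illustrative)
for _ in range(300):
    Y = B @ X                            # current source estimate S~
    dB = (np.eye(n) - np.tanh(Y) @ Y.T / t) @ B   # natural gradient, Eq. (8)
    B = B + mu * dB
print(np.round(B @ A, 2))  # approximately a scaled permutation matrix
```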
1. Sparsity in Blind Source Separation

In the above paragraphs, we pointed out that BSS is overwhelmingly a question of contrast and diversity. Indeed, devising a source separation technique consists of finding an effective way of disentangling the sources. From this viewpoint, statistical independence is a kind of "measure" of diversity between signals. Within this paradigm, we can wonder whether independence is a natural way of differentiating between signals. Strictly speaking, independence is an asymptotic statistical property, and thus makes little sense in a nonasymptotic setting; in practice, one must deal with finite-length signals, sometimes with few samples.

¹ Note that the class of generalized Gaussians contains well-known distributions: the Gaussian (θ = 2) and the Laplacian (θ = 1) distributions.
FIGURE 1 Examples of natural images.
Furthermore, most real-world data are modeled by stationary stochastic processes, yet natural pictures, such as those in Figure 1, are clearly nonstationary. As these pictures are only slightly correlated, independence fails to differentiate between them. Fortunately, the human eye (more precisely, the different levels of the human visual cortex) is able to distinguish between those two images. Then, what makes the eye so effective in discerning between visual "signals"? The answer may come from neuroscience. Indeed, for decades, many researchers in this field (Barlow, 1961; Field, 1999; Hubel and Wiesel, 1981;² Olshausen and Field, 2006; Simoncelli and Olshausen, 2001, and references therein) have endeavored to provide some exciting answers: the mammalian visual cortex seems to have learned, via the natural selection of individuals, an effective way of coding the information in natural scenes. Indeed, the first level of the mammalian visual cortex (termed V1) seems to verify several interesting properties: (1) it tends to "decorrelate" the responses of visual receptive fields (following Simoncelli and Olshausen, 2001, an efficient coding cannot duplicate information in more than one neuron), and (2) owing to a kind of "economy/compression principle" saving neuronal activity, a given stimulus yields a sparse activation of neurons (this property can be considered a way of compressing information). Furthermore, the primary visual cortex is sensitive to particular stimuli (visual features) that surprisingly look like oriented Gabor-like wavelets (see Field, 1999), which supports the crucial part played by contours in natural scenes. Each stimulus thus tends to be coded by a few neurons. Such a way of coding information is often referred to as sparse coding.

² Hubel and Wiesel were awarded the Nobel Prize in Medicine in 1981.
These few elements of neuroscience motivate the use of sparsity as an effective way of compressing a signal's information, thus extracting its very essence. Inspired by the behavior of our visual cortex, seeking a sparse code may provide an effective way of differentiating between "different" signals. Here, "different" signals are signals with different sparse representations.
a. A Pioneering Work in Sparse BSS. The seminal paper of Zibulevsky and Pearlmutter (2001) introduced sparsity as an alternative to standard contrast functions in ICA. In this work, the authors proposed to estimate the mixing matrix $\mathbf{A}$ and the sources $\mathbf{S}$ in a fully Bayesian framework. Each source $s_i$, $i = 1, \ldots, n$, is assumed to be sparsely represented in a basis $\mathbf{\Phi}$:
$$\forall i = 1, \ldots, n: \quad s_i = \sum_{k=1}^{t} \alpha_i[k]\, \phi_k. \tag{10}$$
As the sources are assumed to be sparse, the distribution of their coefficients in $\mathbf{\Phi}$ is a "sparse" (i.e., leptokurtic) prior distribution:
$$f_S(\alpha_i[k]) \propto e^{-\mu_i g_\gamma(\alpha_i[k])}, \tag{11}$$
where $g_\gamma(\alpha_i[k]) = |\alpha_i[k]|^\gamma$ with $\gamma \le 1$.³ Zibulevsky proposed to estimate $\mathbf{A}$ and $\mathbf{S}$ via a maximum a posteriori (MAP) estimator. The optimization task is then run using a Newton-like algorithm, the relative Newton algorithm (RNA; see Zibulevsky, 2003, for more details). This new sparsity-based method paved the way for the use of sparsity in BSS. Note that several other works emphasized the use of sparsity in a parametric Bayesian approach (Hyvarinen et al., 2001, and references therein). Recently, sparsity has emerged as an effective tool for solving underdetermined source separation issues (Bronstein et al., 2005; Georgiev et al., 2005; Li et al., 2006; Vincent, 2007, and references therein). This chapter concentrates on overdetermined BSS ($m \ge n$). Inspired by the work of Zibulevsky, we present a novel sparsity-based source separation framework providing new insights into BSS.
III. SPARSE MULTICHANNEL SIGNAL REPRESENTATION

A. The Blessing of Sparsity and Overcomplete Signal Representations

The last section emphasized the crucial role played by sparsity in BSS. Indeed, sparse representations provide an effective way to "compress"
232
Jerome Bobin et al.
signals to a few very significant content. In previous work (see Bobin et al., 2006, 2007), we claimed that the sparser the signals are, the better the separation is. Therefore, the first step toward separation consists in finding an effective sparse representation, where effective means very sparse. Owing to its essential role in BSS, this section particularly emphasizes the quest for sparse representation.
1. What’s at Stake? In the past decade sparsity has emerged as one of the leading concepts in a wide range of signal-processing applications (restoration (Starck et al., 2002), feature extraction (Starck et al., 2005), source-separation (Bobin et al., 2006; Li et al., 2006; Zibulevsky and Pearlmutter, 2001), and compression (Vetterli, 2001), to name only a few). Sparsity has long been a theoretical and practical attractive signal property in many areas of applied mathematics (computational harmonic analysis (Donoho et al., 1998), statistical estimation (Donoho, 1993; Donoho and Johnstone, 1995)). Very recently researchers have advocated the use of overcomplete signal representations. Indeed, the attractiveness of redundant signal representations relies on their ability to sparsely represent a large class of signals. Furthermore, handling very sparse signal representations allows more flexibility and entails effectiveness in many signal-processing tasks (restoration, separation, compression, estimation). Neuroscience also underlined the role of overcompleteness. Indeed, the mammalian visual system has been shown to probably be in need of overcomplete representation (Olshausen and Field, 2006). In that setting, overcomplete sparse coding may lead to more effective (sparser) codes. In signal-processing, both theoretical and practical arguments (Starck et al., 2002, 2007) have supported the use of overcompleteness. It entails more flexibility in representation and effectiveness in many image-processing tasks. In the general sparse representation framework, a line vector signal x ∈ Rt is modeled as the linear combination of T elementary waveforms (the so-called signal atoms):
{φi }i=1,...,T ;
x=
T i=k
α[k]φk ,
(12)
2 1 where α[k] = x, φk are called the decomposition coefficients of x in the dictionary = [φ1T , . . . , φTT ]T (the T × t matrix whose rows are the atoms normalized to a unit 2 -norm). In the case of overcomplete representations, the number of waveforms {φk } that compose the dictionary is higher than the dimension of the space in which x lies: T > t. In practice, the dimensionality of the sparse decomposition (i.e., the vector of coefficients α) can be very high: T ) t.
Blind Source Separation: The Sparsity Revolution
233
Nonetheless, handling overcomplete representations is clearly an ill-posed problem owing to elementary linear algebra. Indeed decomposing a signal in an overcomplete representation requires solving an underdetermined linear problem with more unknowns than data: T > t. Linear algebra tells us that the problem x = α has no unique solution. The next section provides solutions to this puzzling issue.
B. The Sparse Decomposition Issue The transition from ill-posedness to well-posedness in the sparse decomposition framework is often fulfilled by reducing the space of candidate solutions to those satisfying some side constraints. Researchers have emphasized adding a sparsity constraint to the previous ill-posed problem. Among all the solutions of x = α the sparsest one (with the least number of nonzero coefficients αi ) is preferred, Donoho and Huo (2001) proposed to solve the following minimization problem:
min &α&0 s.t x = α. α
(13)
Clearly this is a combinatorial optimization problem that requires enumerating all the combinations of atoms {φi }i=1,...,T that synthesize x. This nondeterministic polynomial time (NP)-hard problem then appears hopeless. Donoho and Huo (2001) proposed to relax the nonconvex 0 sparsity by substituting the problem in Eq. (13) with the following convex problem:
min &α&1 s.t. x = α. α
(14)
The problem in Eq. (14) is known as basis pursuit (see Chen et al., 1998). However, the solutions to the 0 and 1 problems are not equivalent in general. An extensive work (Bruckstein and Elad, 2002; Donoho and Elad, 2003; Donoho and Huo, 2001; Fuchs, 2004; Feuer and Nemirovsky, 2003; Gribonval and Nielsen, 2003; Tropp, 2004) has focused on conditions under which the problems in Eqs. (13) and (14) are equivalent. Considering that x = k∈(x) α[k]φk , we recall that (x) is the support of x in and K = Card ((x)). The signal x is said to be K-sparse in . Interestingly, the first seminal work addressing the uniqueness and equivalence of the solutions to the 0 and 1 sparse decomposition recovery emphasized essentially the structure of the overcomplete dictionary . One quantitative measure that gives information about the structure of an overcomplete dictionary is its mutual coherence μ (see also Section I.B):
1 2 μ = max φi , φj . i =j
(15)
234
Jerome Bobin et al.
This parameter can be viewed as a worst-case measure of resemblance between all pairs of atoms. Interestingly, Donoho and Huo (2001) showed / 0 that if a vector x∗ with Card (x∗ ) = K is sufficiently sparse and verifies:
K<
! " 1 1 1+ , 2 μ
(16)
then x∗ is the unique maximally sparse solution to the 0 sparse decomposition problem in Eq. (13), and the 0 and 1 sparse decomposition problems are equivalent. Consequently, recovering sparse decompositions is then made tractable. Note, however, that despite its simplicity, the identifiability test of Eq. (16) is pessimistic (worst-case analysis). More involved, but sharper, bounds of identifiability and equivalence between 0 and 1 problems have been proposed in the literature (see e.g., Bruckstein and Elad, 2002; Donoho and Elad, 2003; Feuer and Nemirovsky, 2003; Gribonval and Nielsen, 2003; Tropp, 2004; and Bruckstein et al., 2008, for an extensive review).
C. Overcomplete Multichannel Representations This section extends the sparse decomposition problem to the multichannel case. Previous work on the subject includes (Cotter et al., 2005; Fornasier and Rauhut, 2008) wherein all channels are constrained to have a common sparsity pattern (i.e., joint support); Chen and Huo (2005) in which the sparsity measure used is different, thus leading to different constraints; and Gribonval and Nielsen (2006), which introduced the concept of multichannel dictionary. In this chapter, we address a more general problem as we assume no constraint on the sparsity pattern of the different channels. Extending the redundant representation framework to the multichannel case requires defining what a multichannel overcomplete representation is. We assume that the multichannel dictionary at hand is the tensor product of a spectral dictionary (m × n matrix) and a spatial or temporal dictionary (T × t matrix).4 Each atom of is then the tensor product of an atomic spectrum ξi and a spatial elementary signal φj :
∀{i, j} ∈ {1, . . . , n} × {1, . . . , T},
ψij = ξi ⊗ φj .
(17)
Recall that most popular sparse recovery results in the monochannel setting rely on the mutual coherence of the dictionary. In the multichannel case a similar quantity can be defined. Recalling the definition of 4 The adjectives spectral and spatial that characterize the dictionaries are not formal. Owing to the symmetry
of the multichannel sparse decomposition problems, and have no formal difference. In practice, and more particularly, in multispectral, hyper-spectral imaging, will refer to the dictionary of physical spectra and to the dictionary of image/signal waveforms.
Blind Source Separation: The Sparsity Revolution
235
mutual coherence in Section I.B, the mutual coherence for multichannel dictionaries is as follows:
0 ≤ μ = max {μ , μ } < 1.
(18)
This expression of the multichannel mutual coherence is interesting as atoms can be selected based on their spatial or spectral morphology. In other words, discriminating two different multichannel atoms ψγ={i,p} and ψγ ={j,q} can be made based on the following; • Spatial or temporal (resp. spectral) diversity: In this case, i = j and p = q (respectively, i = j and p = q). These atoms have the same spectrum (respectively spatial shape), but they can be distinguished based on their spatial (respectively spectral) diversity. From Eq. (18), their coherence is lower than μ (respectively μ ). Disentangling these multichannel atoms can be done equivalently in the monochannel case. • Both diversities: i = j and p = q, the “separation” task seems easier as the atoms do not share either the same spectra or the same spatial (or temporal) “shape.” Note that from Eq. (18), the coherence between these atoms in this case is lower than μ μ ≤ max {μ , μ }. Let us assume that the data X are K-sparse in . Hence, X are the linear combination of K multichannel atoms:
X=
αγ ψγ ,
(19)
γ∈(X)
This equation is clearly similar to the monochannel case. Owing to this key observation, the next paragraph shows that most sparse decomposition results can be extended to the multichannel case.
1. Multichannel Sparse Recovery Results The last paragraph emphasized the apparent similarities between the monochannel and multichannel sparse models in Eq. (19). Similarly, decomposing multichannel data in requires solving the following problem:
$$\min_{\alpha} \|\alpha\|_0 \quad \text{s.t.} \quad X = \Psi\alpha, \qquad (20)$$
where $\Psi\alpha = \sum_\gamma \alpha_\gamma \psi_\gamma$. The convex $\ell_1$ minimization problem can be recast equivalently in the multichannel case:

$$\min_{\alpha} \|\alpha\|_1 \quad \text{s.t.} \quad X = \Psi\alpha. \qquad (21)$$
From the optimization viewpoint, the monochannel and multichannel problems are similar. This observation lets us straightforwardly extend the sparse recovery results of Eq. (16) to the multichannel case. The uniqueness and equivalence condition for the sparse multichannel decomposition problem in Eq. (20) is then similar to the monochannel one. Assume that $X$ is $K$-sparse in the multichannel dictionary $\Psi = \Xi \otimes \Phi$. The $\ell_0$-sparse decomposition problem in Eq. (20) has a unique solution, and the problems in Eqs. (20) and (21) are equivalent, when

$$K < \frac{1}{2}\left(1 + \frac{1}{\mu_\Psi}\right), \quad \text{where } \mu_\Psi = \max\{\mu_\Xi, \mu_\Phi\}.$$

In this framework, most results in the monochannel case (Bruckstein and Elad, 2002; Donoho and Elad, 2003; Feuer and Nemirovsky, 2003; Fuchs, 2004; Gribonval and Nielsen, 2003; Tropp, 2004) can be straightforwardly extended to the multichannel case.
2. Practical Sparse Signal Decomposition

In the previous sections, we emphasized conditions under which the $\ell_0$-sparse decomposition problem in Eq. (20) can be replaced with the convex $\ell_1$-sparse decomposition problem in Eq. (21). Most algorithms that have been proposed to solve sparse decomposition problems fall into three main categories:
• Linear programming: In the seminal paper of Chen et al. (1998), the authors proposed solving the convex $\ell_1$-sparse decomposition problem in Eq. (21) with linear programming methods such as interior point methods. Unfortunately, linear programming–based methods are computationally demanding and thus not well suited to large-scale problems such as ours.
• Greedy algorithms: The most popular greedy algorithms are matching pursuit (MP) and its orthogonal version, orthogonal matching pursuit (OMP) (Mallat and Zhang, 1993). Conditions have been given under which MP and OMP provably solve the $\ell_1$ and $\ell_0$ sparse decomposition problems (Gribonval and Vandergheynst, 2006; Tropp, 2004; Tropp and Gilbert). Greedy algorithms have also been proposed by the statistics community for solving variable selection problems, namely least angle regression and the LASSO (LARS/LASSO; see Efron et al., 2004; Tibshirani, 1996). Homotopy-continuation algorithms have also been introduced to solve the sparse decomposition problem (Malioutov et al., 2005; Osborne et al., 2000; Plumbley, 2006). Interestingly, recent work by Donoho and Tsaig (2006) highlights the links between greedy algorithms such as OMP, variable selection algorithms, and homotopy. Such greedy algorithms, however, entail a high computational cost.
• Iterative thresholding: Recently, iterative thresholding algorithms have been proposed to mitigate the greediness of the aforementioned stepwise algorithms. Iterative thresholding was first introduced for solving sparsity-based inverse problems (see Combettes and Wajs, 2005; Daubechies et al., 2004; Figueiredo and Nowak, 2003). Most of these algorithms can be easily extended to handle multichannel data.
IV. MORPHOLOGICAL COMPONENT ANALYSIS FOR MULTICHANNEL DATA

A. Morphological Diversity and Morphological Component Analysis

1. An Introduction to Morphological Diversity

Recall that a monochannel signal $x$ is said to be sparse in a waveform dictionary $\Phi$ if it can be well represented by a few dictionary elements. As discussed in Starck et al. (2005), a single basis is often not well adapted to large classes of highly structured data such as "natural images." Furthermore, over the past 10 years, new tools have emerged from modern computational harmonic analysis: wavelets (Mallat, 1998), ridgelets (Candès and Donoho, 1999b), curvelets (Candès and Donoho, 1999a; Candès et al., 2006; Starck et al., 2002), bandlets (LePennec and Mallat, 2005), and contourlets (Do and Vetterli, 2005), to name a few. It is quite tempting to combine several representations to build a larger dictionary of waveforms that enables the sparse representation of larger classes of signals. In Starck et al. (2004, 2005), the authors proposed a practical algorithm (termed MCA) aimed at decomposing signals in overcomplete dictionaries made of a union of bases. In the MCA setting, $x$ is the linear combination of $D$ morphological components:
$$x = \sum_{i=1}^{D} \varphi_i = \sum_{i=1}^{D} \alpha_i \Phi_i, \qquad (22)$$
where $\{\Phi_i\}_{i=1,\ldots,D}$ are orthonormal bases of $\mathbb{R}^t$. Morphological diversity then relies on the sparsity of these morphological components in specific bases. In terms of the $\ell_0$ quasi-norm, this morphological diversity can be formulated as follows:
$$\forall \{i, j\} \in \{1, \ldots, D\}, \quad j \neq i \Rightarrow \|\varphi_i \Phi_i^T\|_0 < \|\varphi_i \Phi_j^T\|_0. \qquad (23)$$
In other words, MCA relies on the incoherence between the subdictionaries $\{\Phi_i\}_{i=1,\ldots,D}$ to estimate the morphological components $\{\varphi_i\}_{i=1,\ldots,D}$ by solving the following convex minimization problem:
$$\{\tilde\varphi_i\}_{1\le i\le D} = \arg\min_{\{\varphi_i\}_{1\le i\le D}} \left\| x - \sum_{i=1}^{D} \varphi_i \right\|_2^2 + 2\lambda \sum_{i=1}^{D} \|\varphi_i \Phi_i^T\|_1. \qquad (24)$$
Note that the minimization problem in Eq. (24) is closely related to basis pursuit denoising (BPDN; see Chen et al., 1998). In Bobin et al. (2007), we proposed a particular block-coordinate relaxation, iterative thresholding algorithm (MCA/MOM) to solve Eq. (24). Theoretical arguments and experiments were provided showing that MCA gives results at least as good as basis pursuit for sparse overcomplete decompositions in a union of bases. Moreover, MCA is clearly much faster than basis pursuit. Thus, MCA is a practical alternative to classical sparse overcomplete decomposition techniques.
2. Morphological Diversity in Multichannel Data

The previous paragraph briefly described morphological diversity in the monochannel case. We now extend morphological diversity to the multichannel case. In this particular setting, we assume that each observation or channel $\{x_i\}_{i=1,\ldots,m}$ is the linear combination of $D$ morphological components:

$$\forall i \in \{1, \ldots, m\}, \quad x_i = \sum_{j=1}^{D} \varphi_{ij}, \qquad (25)$$
where each morphological component $\varphi_{ij}$ is sparse in a specific basis $\Phi_j$. Each channel $\{x_i\}_{i=1,\ldots,m}$ is thus assumed to be sparse in the overcomplete dictionary made of the union of the $D$ bases $\{\Phi_j\}_{j=1,\ldots,D}$. We further assume that each column of the data matrix $X$ is sparse in the dictionary made of the union of $D$ bases $\{\Xi_k\}_{k=1,\ldots,D}$ to account for interchannel structures. The multichannel data $X$ are then assumed to be sparse in the multichannel dictionary $\Psi = [\Xi_1 \ldots \Xi_D] \otimes [\Phi_1 \ldots \Phi_D]$. The multichannel data are then modeled as the linear combination of $D \times D$
multichannel morphological components:
$$X = \sum_{j=1}^{D} \sum_{k=1}^{D} \Upsilon_{jk}, \qquad (26)$$
where $\Upsilon_{jk}$ is sparse in $\Xi_k \otimes \Phi_j$. In the same vein as the discussion in Section III.C on discriminating two multichannel atoms, separating two multichannel morphological components $\Upsilon_{ip}$ and $\Upsilon_{jq \neq ip}$ may be achieved based either on spatial/temporal (respectively, spectral) morphologies ($i \neq j$ and $p = q$; respectively, $i = j$ and $p \neq q$) or on both morphologies ($i \neq j$ and $p \neq q$). The "separation" task seems easier in the latter case, as the morphological components share neither the same spectral basis nor the same spatial (or temporal) basis. Analyzing multichannel signals requires accounting for their spectral and spatial morphological diversities. For that purpose, the proposed multichannel extension of MCA (coined mMCA) aims to solve the following minimization problem:
$$\min_{\{\Upsilon_{jk}\}} \left\| X - \sum_{j=1}^{D} \sum_{k=1}^{D} \Upsilon_{jk} \right\|_F^2 + 2\lambda \sum_{j=1}^{D} \sum_{k=1}^{D} \left\| \Xi_k^T \Upsilon_{jk} \Phi_j^T \right\|_1. \qquad (27)$$
B. Multichannel Overcomplete Sparse Recovery

1. General Multichannel Overcomplete Sparse Decomposition

Recall that $\Xi$ is an $m \times M$ overcomplete dictionary with $M > m$, and $\Phi$ is a $T \times t$ overcomplete dictionary with $T > t$. Let us first consider the noiseless case. The multichannel extension of Eq. (13) is written as follows:
$$\min_{\alpha} \|\alpha\|_0 \quad \text{s.t.} \quad X = \Xi\,\alpha\,\Phi, \qquad (28)$$
where $\alpha$ is an $M \times T$ matrix [see also Eq. (20)]. Arguing as in the monochannel case, the convex $\ell_1$ minimization problem in Eq. (14) can also be rewritten in the multichannel setting:
$$\min_{\alpha} \|\alpha\|_1 \quad \text{s.t.} \quad X = \Xi\,\alpha\,\Phi; \qquad (29)$$

see also Eq. (21).
C. Multichannel Morphological Component Analysis

The problem at stake in Eq. (27) can be solved by extending well-known sparse decomposition algorithms, as reviewed in Section III.C.2, to the multichannel case. The extension of matching pursuit (MP) and OMP to the multichannel case was proposed by Gribonval and Nielsen (2006). The
aforementioned greedy methods iteratively select one dictionary atom at a time. Unfortunately, this stepwise selection of active atoms is burdensome, and the process may be sped up, as in Donoho et al. (submitted), where a faster stagewise orthogonal matching pursuit (StOMP) is introduced; it is shown to solve the $\ell_0$ sparse recovery problem in Eq. (13) with random dictionaries under mild conditions. Because of the particular structure of the problem in Eq. (27), extending the MCA algorithm (Starck et al., 2005) to the multichannel case leads to faster and still effective decomposition results. Recall that in the mMCA setting, the data $X$ are assumed to be the linear combination of $D \times D$ morphological components $\{\Upsilon_{jk}\}_{j=1,\ldots,D;\,k=1,\ldots,D}$. Let $\Lambda(\Upsilon_{jk})$ denote the support of $\Upsilon_{jk}$ in the subdictionary $\Psi_{jk} = \Xi_k \otimes \Phi_j$. As $X$ is $K$-sparse in the whole dictionary $\Psi$, $\sum_{j,k} \mathrm{Card}(\Lambda(\Upsilon_{jk})) = K$. The data can be decomposed as follows:
$$X = \sum_{j=1}^{D} \sum_{k=1}^{D} \Upsilon_{jk} = \sum_{j=1}^{D} \sum_{k=1}^{D} \sum_{i \in \Lambda(\Upsilon_{jk})} \alpha_{jk}[i]\,\psi_{jk}[i]. \qquad (30)$$
Substituting Eq. (30) into Eq. (27), the mMCA algorithm approaches the solution of Eq. (27) by iteratively and alternately estimating each morphological component $\Upsilon_{jk}$ in a block-coordinate relaxed way (see Sardy et al., 2000). Each matrix of coefficients $\alpha_{jk}$ is then updated as follows:
$$\tilde\alpha_{jk} = \arg\min_{\alpha_{jk}} \left\| R_{jk} - \Xi_k\,\alpha_{jk}\,\Phi_j \right\|_F^2 + 2\lambda \|\alpha_{jk}\|_1, \qquad (31)$$
where $R_{jk} = X - \sum_{\{p,q\} \neq \{j,k\}} \Xi_q\,\alpha_{pq}\,\Phi_p$ is a residual term. Since we assume that the subdictionaries $\{\Phi_j\}_j$ and $\{\Xi_k\}_k$ are orthonormal, the update rule in Eq. (31) is equivalent to the following:
$$\tilde\alpha_{jk} = \arg\min_{\alpha_{jk}} \left\| \Xi_k^T R_{jk} \Phi_j^T - \alpha_{jk} \right\|_F^2 + 2\lambda \|\alpha_{jk}\|_1, \qquad (32)$$
which has the unique solution $\tilde\alpha_{jk} = \Delta_\lambda\!\left( \Xi_k^T R_{jk} \Phi_j^T \right)$, where $\Delta_\lambda$ is the soft-thresholding operator with threshold $\lambda$:
$$\Delta_\lambda(u[i]) = \begin{cases} 0 & \text{if } |u[i]| < \lambda, \\ u[i] - \lambda\,\mathrm{sign}(u[i]) & \text{if } |u[i]| \ge \lambda. \end{cases} \qquad (33)$$
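In code, the operator $\Delta_\lambda$ of Eq. (33) acts entrywise and is essentially one line. The following is our numpy transcription; the hard-thresholding variant (used later by mMCA/GMCA) is included for completeness, and the function names are ours.

```python
import numpy as np

def soft_threshold(u, lam):
    """Entrywise soft thresholding, Eq. (33): zero out entries below
    lam in magnitude and shrink the others toward zero by lam."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

def hard_threshold(u, lam):
    """Entrywise hard thresholding: keep entries of magnitude >= lam
    unchanged and set the rest to zero."""
    return u * (np.abs(u) >= lam)
```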
For a fixed $\lambda$, mMCA selects groups of atoms based on their scalar products with the residual $R_{jk}$. If we were to select only the single most coherent atom (that with the highest scalar product with the residual $R_{jk}$), then
one mMCA iteration would reduce to a stepwise multichannel matching pursuit (mMP) step. In contrast to mMP, the mMCA algorithm is allowed to select several atoms at each iteration. Thus, when hard thresholding is used instead of soft thresholding, mMCA is equivalent to a stagewise mMP algorithm. mMCA is allowed to select new atoms by decreasing the threshold $\lambda$ at each iteration. The mMCA algorithm is summarized as follows:

1. Set the number of iterations $I_{max}$ and the threshold $\lambda^{(0)}$.
2. While $\lambda^{(h)}$ is higher than a given lower bound $\lambda_{min}$ (which can depend, e.g., on the noise variance; see Section IV.E), for $j = 1, \ldots, D$ and $k = 1, \ldots, D$:
• Compute the residual term $R_{jk}^{(h)}$ assuming the current estimates $\tilde\Upsilon_{pq \neq jk}^{(h-1)}$ of the other components are fixed: $R_{jk}^{(h)} = X - \sum_{pq \neq jk} \tilde\Upsilon_{pq}^{(h-1)}$.
• Estimate the current coefficients of $\tilde\Upsilon_{jk}^{(h)}$ by thresholding with threshold $\lambda^{(h)}$: $\tilde\alpha_{jk}^{(h)} = \Delta_{\lambda^{(h)}}\!\left( \Xi_k^T R_{jk}^{(h)} \Phi_j^T \right)$.
• Get the new estimate of $\Upsilon_{jk}$ by reconstructing from the selected coefficients $\tilde\alpha_{jk}^{(h)}$: $\tilde\Upsilon_{jk}^{(h)} = \Xi_k\,\tilde\alpha_{jk}^{(h)}\,\Phi_j$.
3. Decrease the threshold $\lambda^{(h)}$ following a given strategy.
1. The Thresholding Strategy

In previous work (Bobin et al., 2007), we proposed a thresholding strategy that is likely to provide the solution of the $\ell_0$-sparse monochannel problem. This strategy, termed MOM (mean of max), can be extended to the multichannel case. At each iteration $h$, the residual is projected onto each subdictionary, and we define

$$m_{jk}^{(h-1)} = \left\| \Xi_k^T \left( X - \sum_{p,q} \Xi_q\,\tilde\alpha_{pq}^{(h-1)}\,\Phi_p \right) \Phi_j^T \right\|_\infty. \qquad (34)$$
The multichannel MOM (mMOM) threshold is then computed as the mean of the two largest values in the set $\{m_{jk}^{(h-1)}\}_{j=1,\ldots,D;\,k=1,\ldots,D}$:

$$\lambda^{(h)} = \frac{1}{2}\left( m_{j_0 k_0}^{(h-1)} + m_{j_1 k_1}^{(h-1)} \right). \qquad (35)$$
The next section gives conditions under which mMCA/mMOM selects atoms without error and converges asymptotically to the solution of the multichannel $\ell_0$-sparse recovery problem in Eq. (20).
D. Recovering Sparse Multichannel Decompositions Using mMCA

The mMOM rule defined in Eqs. (34) and (35) is such that mMCA selects, at each iteration, atoms belonging to the same subdictionary $\Psi_{jk} = \Xi_k \otimes \Phi_j$. Although it seems more computationally demanding, the mMOM strategy has several nice properties. We give sufficient conditions under which (1) mMCA/mMOM selects atoms belonging to the active atom set of the solution of the $\ell_0$-sparse recovery problem (exact selection property), and (2) mMCA/mMOM converges exponentially to $X$ and its sparsest representation in $\Psi$. mMCA/mMOM exhibits an auto-stopping behavior and requires only one parameter, $\lambda_{min}$, whose choice is easy and is discussed in Section IV.E. The next proposition states that mMCA/mMOM verifies the exact selection property at each iteration.

Proposition 1 (Exact Selection Property). Suppose that $X$ is $K$-sparse such that

$$X = \sum_{j=1}^{D} \sum_{k=1}^{D} \sum_{i \in \Lambda(\Upsilon_{jk})} \alpha_{jk}[i]\,\psi_{jk}[i],$$

where $K = \sum_{j,k} \mathrm{Card}(\Lambda(\Upsilon_{jk}))$ satisfies $K < \mu_\Psi^{-1}/2$. At the $h$-th iteration, assume that the residual $R^{(h)}$ is $K$-sparse such that

$$R^{(h)} = \sum_{j=1}^{D} \sum_{k=1}^{D} \sum_{i \in \Lambda(\Upsilon_{jk})} \beta_{jk}[i]\,\psi_{jk}[i].$$
Then mMCA/mMOM picks coefficients belonging to the support of $X$ at iteration $h$.

When the exact selection property holds, the next proposition shows that mMCA/mMOM converges exponentially to $X$ and its sparsest representation in $\Psi = [\Xi_1 \ldots \Xi_D] \otimes [\Phi_1 \ldots \Phi_D]$.

Proposition 2 (Convergence). Suppose that $X$ is $K$-sparse such that

$$X = \sum_{j=1}^{D} \sum_{k=1}^{D} \sum_{i \in \Lambda(\Upsilon_{jk})} \alpha_{jk}[i]\,\psi_{jk}[i],$$

where $K = \sum_{j,k} \mathrm{Card}(\Lambda(\Upsilon_{jk}))$. If $K < \mu_\Psi^{-1}/2$, then mMCA/mMOM converges exponentially to $X$ and its sparsest representation in $\Psi$. More precisely, the residual converges to zero at an exponential rate.
See Bobin et al. (in press) for detailed proofs. Note that the above conditions are not sharp; exact selection and convergence may still be valid beyond the bounds retained in the latter two statements.
E. Handling Bounded Noise With mMCA

When bounded noise perturbs the data, the data are modeled as follows:
$$X = \sum_{j=1}^{D} \sum_{k=1}^{D} \sum_{i \in \Lambda(\Upsilon_{jk})} \alpha_{jk}[i]\,\psi_{jk}[i] + N, \qquad (36)$$
where $N$ is a bounded noise term: $\|N\|_F < \epsilon$. Sparse recovery then requires solving the following problem:
$$\min_{\alpha_{jk}} \sum_{j=1}^{D} \sum_{k=1}^{D} \|\alpha_{jk}\|_0 \quad \text{s.t.} \quad \left\| X - \sum_{j=1}^{D} \sum_{k=1}^{D} \Xi_k\,\alpha_{jk}\,\Phi_j \right\|_F < \epsilon. \qquad (37)$$
Stability conditions for sparse recovery in the monochannel case have been investigated in Donoho et al. (2006), Fuchs (2006), and Tropp (2006). More particularly, conditions are proved in Donoho et al. (2006) under which OMP verifies an exact selection property in the presence of bounded noise. Donoho et al. (2006) also showed that the OMP solution lies in an $\ell_2$ ball centered on the exact solution of the $\ell_0$-sparse recovery problem, with a radius on the order of $\epsilon$. Exhibiting similar stability results in the mMCA setting is challenging and will be addressed in a future study. In the mMCA framework, assuming the noise level is known, the mMCA/mMOM algorithm stops when $\lambda \le \lambda_{min}$, with $\lambda_{min}$ set to 3 to 4 times the noise standard deviation $\sigma_N$.
F. Choosing the Overcomplete Dictionary

The choice of the overcomplete dictionary is a key step, because it determines where to look for a sparse representation; it is the expression of prior information available on the signal. Interestingly, the $\ell_1$-sparse recovery problem can be viewed in a Bayesian framework. Solving the following problem
$$\min_{\{\alpha_{jk}\}} \left\| X - \sum_{j=1}^{D} \sum_{k=1}^{D} \Xi_k\,\alpha_{jk}\,\Phi_j \right\|_F^2 + 2\lambda \sum_{j=1}^{D} \sum_{k=1}^{D} \|\alpha_{jk}\|_1 \qquad (38)$$
is equivalent, in a Bayesian framework, to making the assumption (among others) of an independent Laplacian prior on the coefficients of each morphological component in the sparse representation domain. Choosing
the set of subdictionaries is then equivalent to assuming a specific prior for each morphological component. Furthermore, the attractiveness of mMCA lies in its ability to take advantage of sparse representations that have fast implicit analysis and synthesis operators, without requiring the explicit manipulation of each atom: wavelets (Mallat, 1998), curvelets (Candès et al., 2006), bandlets (LePennec and Mallat, 2005), contourlets (Do and Vetterli, 2005), ridgelets (Candès and Donoho, 1999b), and wave atoms (Demanet and Ying, 2006), to name a few. As a consequence, mMCA is a fast nonlinear sparse decomposition algorithm whose computational complexity is dominated by that of the transforms involved in the dictionary. In the image-processing experiments reported in this chapter, we assume that a wide range of images can be decomposed into a piecewise smooth (contour) part and an oscillating texture part. We assume a priori that the contour part is sparse in the curvelet tight frame and that the texture part is sparsely described by the local discrete cosine transform (DCT) (Malioutov, 2005).⁵ However, all the results proved previously assumed that each subdictionary is an orthonormal basis. When the selected subdictionaries are, more generally, tight frames, the solution of Eq. (32) is no longer a simple thresholding. Nevertheless, in Elad (2006) and Combettes (2005), the authors showed that thresholding is the first step toward solving Eq. (32) when the subdictionary is redundant; rigorously, proximal-type iterative shrinkage is shown to converge to a solution of Eq. (32). In practice, even when the subdictionary is a tight frame (for instance, the curvelet frame), we use only a single thresholding step to solve Eq. (32). As far as the choice of the spectral dictionary is concerned, it is based on a spectral sparsity assumption.
⁵ An alternative choice would be the wave atoms (Demanet and Ying).
1. Epilogue

This section has surveyed the tricky problem raised by sparse overcomplete signal decomposition for multichannel data. We presented a multichannel extension of the MCA algorithm. This so-called mMCA algorithm is the backbone of the sparsity-based algorithm we propose next to solve the sparse BSS problem.
V. MORPHOLOGICAL DIVERSITY AND BLIND SOURCE SEPARATION

In previous work (Bobin et al., 2007), we introduced an extension of the mMCA framework to BSS. The GMCA framework states that the
observed data $X$ are generated according to Eq. (2). In words, $X$ is a linear instantaneous mixture of unknown sources $S$ through an unknown mixing matrix $A$, with an additive perturbation term $N$ that accounts for noise or model imperfection. (Remember that we consider only the overdetermined source separation case, i.e., $m \ge n$, so that $A$ has full column rank.)
A. Generalized Morphological Component Analysis

Hereafter, we assume that the sources are sparse in the spatial dictionary $\Phi$ that is the concatenation of $D$ orthonormal bases $\{\Phi_i\}_{i=1,\ldots,D}$: $\Phi = \left[ \Phi_1^T, \ldots, \Phi_D^T \right]^T$. In the GMCA setting, each source is modeled as the linear combination of $D$ morphological components, where each component is sparse in a specific basis:
$$\forall i \in \{1, \ldots, n\}, \quad s_i = \sum_{k=1}^{D} \varphi_{ik} = \sum_{k=1}^{D} \alpha_{ik} \Phi_k. \qquad (39)$$
GMCA seeks an unmixing scheme, through the estimation of $A$, that leads to the sparsest sources $S$ in the dictionary $\Phi$. This is expressed by the following optimization task, written in its augmented Lagrangian form:
$$\{\tilde A, \tilde S\} = \arg\min_{A, S}\ 2\lambda \sum_{i=1}^{n} \sum_{k=1}^{D} \|\varphi_{ik}\Phi_k^T\|_0 + \|X - AS\|_F^2, \qquad (40)$$
where each row of $S$ is such that $s_i = \sum_{k=1}^{D} \varphi_{ik}$. Obviously, this problem is combinatorial by nature. We therefore propose substituting the $\ell_1$ norm for the $\ell_0$ sparsity measure, which amounts to solving the optimization problem:
$$\{\tilde A, \tilde S\} = \arg\min_{A, S}\ 2\lambda \sum_{i=1}^{n} \sum_{k=1}^{D} \|\varphi_{ik}\Phi_k^T\|_1 + \|X - AS\|_F^2. \qquad (41)$$
More conveniently, the product $AS$ can be split into $n \times D$ multichannel morphological components: $AS = \sum_{i,k} a^i \varphi_{ik}$. Based on this decomposition, we propose an alternating minimization algorithm that estimates one term at a time. Define the $\{i,k\}$-th multichannel residual $R_{i,k} = X - \sum_{\{p,q\} \neq \{i,k\}} a^p \varphi_{pq}$ as the part of the data $X$ unexplained by the multichannel morphological component $a^i \varphi_{ik}$. Estimating the morphological component $\varphi_{ik} = \alpha_{ik}\Phi_k$, assuming $A$ and $\varphi_{\{pq\} \neq \{ik\}}$ are fixed, leads to the component-wise optimization problem:
$$\tilde\varphi_{ik} = \arg\min_{\varphi_{ik}}\ 2\lambda \|\varphi_{ik}\Phi_k^T\|_1 + \|R_{i,k} - a^i \varphi_{ik}\|_F^2, \qquad (42)$$
or equivalently,
$$\tilde\alpha_{ik} = \arg\min_{\alpha_{ik}}\ 2\lambda \|\alpha_{ik}\|_1 + \|R_{i,k}\Phi_k^T - a^i \alpha_{ik}\|_F^2, \qquad (43)$$
since here $\Phi_k$ is an orthogonal matrix. By classical ideas in convex analysis, a necessary condition for $\tilde\alpha_{ik}$ to be a minimizer of the above functional is that the null vector be an element of its subdifferential at $\tilde\alpha_{ik}$, that is,
$$0 \in -\frac{1}{\|a^i\|_2^2}\, a^{iT} R_{i,k} \Phi_k^T + \alpha_{ik} + \frac{\lambda}{\|a^i\|_2^2}\, \partial\|\alpha_{ik}\|_1, \qquad (44)$$
where $\partial\|\alpha_{ik}\|_1$ is the subdifferential, defined (owing to the separability of the $\ell_1$ norm) as

$$\partial\|\alpha\|_1 = \left\{ u \in \mathbb{R}^t \;\middle|\; u[l] = \mathrm{sign}(\alpha[l]) \text{ for } l \in \Lambda(\alpha),\ u[l] \in [-1, 1] \text{ otherwise} \right\}.$$
Hence, Eq. (44) can be rewritten equivalently as two conditions leading to the following closed-form solution:
$$\tilde\alpha_{ik}[l] = \begin{cases} 0 & \text{if } \left| a^{iT} R_{i,k} \Phi_k^T[l] \right| \le \lambda, \\ \alpha^\star[l] & \text{otherwise,} \end{cases} \qquad (45)$$

where $\alpha^\star = \frac{1}{\|a^i\|_2^2}\, a^{iT} R_{i,k} \Phi_k^T - \frac{\lambda}{\|a^i\|_2^2}\, \mathrm{sign}\!\left( a^{iT} R_{i,k} \Phi_k^T \right)$. This exact solution is known as soft thresholding. Hence, the closed-form estimate of the morphological component $\varphi_{ik}$ is
$$\tilde\varphi_{ik} = \Delta_\delta\!\left( \frac{1}{\|a^i\|_2^2}\, a^{iT} R_{i,k} \Phi_k^T \right) \Phi_k, \quad \text{with } \delta = \frac{\lambda}{\|a^i\|_2^2}. \qquad (46)$$
Now, considering $\{a^p\}_{p \neq i}$ and $S$ fixed, updating the column $a^i$ is just a least-squares estimate:
$$\tilde a^i = \frac{1}{\|s_i\|_2^2} \left( X - \sum_{p \neq i} a^p s_p \right) s_i^T, \qquad (47)$$
where $s_i = \sum_{k=1}^{D} \varphi_{ik}$. In a simpler context, this iterative and alternating optimization scheme has already proved its efficiency in Bobin et al. (2006).
In practice, each column of $A$ is forced to have unit $\ell_2$ norm at each iteration to avoid the classical scale indeterminacy of the product $AS$ in Eq. (2). The GMCA algorithm is summarized as follows:

1. Set the number of iterations $I_{max}$ and the threshold $\delta^{(0)}$.
2. While $\delta^{(h)}$ is higher than a given lower bound $\delta_{min}$ (which can depend, e.g., on the noise standard deviation), for $i = 1, \ldots, n$ and $k = 1, \ldots, D$:
• Compute the residual term $r_{ik}^{(h)}$ assuming the current estimates $\tilde\varphi_{\{pq\} \neq \{ik\}}^{(h-1)}$ of the other components are fixed: $r_{ik}^{(h)} = \tilde a^{i(h-1)T} \left( X - \sum_{\{p,q\} \neq \{i,k\}} \tilde a^{p(h-1)} \tilde\varphi_{pq}^{(h-1)} \right)$.
• Estimate the current coefficients of $\tilde\varphi_{ik}^{(h)}$ by thresholding with threshold $\delta^{(h)}$: $\tilde\alpha_{ik}^{(h)} = \Delta_{\delta^{(h)}}\!\left( r_{ik}^{(h)} \Phi_k^T \right)$.
• Get the new estimate of $\varphi_{ik}$ by reconstructing from the selected coefficients $\tilde\alpha_{ik}^{(h)}$: $\tilde\varphi_{ik}^{(h)} = \tilde\alpha_{ik}^{(h)} \Phi_k$.
Then update each column $a^i$, assuming $\{a^p\}_{p \neq i}$ and the morphological components $\tilde\varphi_{pq}^{(h)}$ are fixed: $\tilde a^{i(h)} = \frac{1}{\|\tilde s_i^{(h)}\|_2^2} \left( X - \sum_{p \neq i} \tilde a^{p(h-1)} \tilde s_p^{(h)} \right) \tilde s_i^{(h)T}$.
3. Decrease the threshold $\delta^{(h)}$.
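For concreteness, here is our illustrative transcription of the GMCA loop for the special case $D = 1$ with one explicit orthonormal basis `Phi` (atoms as rows), so that each source has a single morphological component; the multi-basis case adds an inner loop over $k$ exactly as in the listing. It reuses `soft_threshold` from Section IV; the linear threshold schedule and all names are our assumptions, not the authors' implementation.

```python
import numpy as np

def gmca(X, n_src, Phi, n_iter=100, delta_min=0.0):
    """Illustrative GMCA loop for D = 1. Channels and sources are rows,
    as in the chapter; Phi is a t x t orthonormal basis."""
    m, t = X.shape
    rng = np.random.default_rng(0)
    A = rng.standard_normal((m, n_src))
    A /= np.linalg.norm(A, axis=0, keepdims=True)   # unit l2-norm columns
    S = np.zeros((n_src, t))
    delta0 = np.abs(X).max()
    for h in range(n_iter):
        delta = delta0 + h * (delta_min - delta0) / max(n_iter - 1, 1)
        for i in range(n_src):
            # Residual unexplained by the other sources.
            R = X - A @ S + np.outer(A[:, i], S[i])
            # Project on a^i (unit norm), threshold the analysis
            # coefficients, and resynthesize (cf. Eq. (46)).
            coeffs = soft_threshold((A[:, i] @ R) @ Phi.T, delta)
            S[i] = coeffs @ Phi
            # Least-squares update of column a^i (Eq. (47)), then
            # renormalization to remove the scale indeterminacy.
            nrm = S[i] @ S[i]
            if nrm > 0:
                A[:, i] = (R @ S[i]) / nrm
                A[:, i] /= np.linalg.norm(A[:, i])
    return A, S
```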
GMCA is an iterative thresholding algorithm in which, at each iteration, coarse versions of the morphological components $\{\varphi_{ik}\}_{i=1,\ldots,n;\,k=1,\ldots,D}$ of each source $s_i$ are first computed. These raw sources are estimated from their most significant coefficients in $\Phi$. This first step then amounts to performing a single mMCA decomposition step in the multichannel representation $A \otimes \Phi$ with the threshold $\delta^{(h)}$. Following this step, the column $a^i$ corresponding to the $i$-th source is estimated from the most significant features of $s_i$. Each source and its corresponding column of $A$ are then alternately estimated. The entire optimization scheme progressively refines the estimates of $S$ and $A$ as $\delta$ decreases toward $\delta_{min}$. This particular iterative thresholding scheme makes the algorithm robust by working first on the most significant features in the data and then progressively incorporating smaller details to fine-tune the model parameters. The main difference with the mMCA algorithm lies in the mixing matrix update; this stage is equivalent to updating part of the multichannel dictionary in which mMCA decomposes the data $X$.
1. The Dictionary

As an MCA-like algorithm (see Starck et al., 2004), the GMCA algorithm involves multiplications by the matrices $\Phi_k^T$ and $\Phi_k$. Thus, GMCA is attractive
in large-scale problems as long as the redundant dictionary $\Phi$ is a union of bases or tight frames. For such dictionaries, the matrices $\Phi_k^T$ and $\Phi_k$ are never explicitly constructed; fast implicit analysis and reconstruction operators are used instead (for instance, wavelet transforms or the global or local discrete cosine transform).
2. Complexity Analysis

Here we provide a detailed analysis of the complexity of GMCA. We begin by noting that the bulk of the computation is invested in the applications of $\Phi_k^T$ and $\Phi_k$ at each iteration and for each component. Hence, fast implicit operators associated with $\Phi_k$ or its adjoint are of key importance in large-scale applications. In our analysis, we let $V_k$ denote the cost of one application of the linear operator $\Phi_k$ or its adjoint. The computation of the multichannel residuals for all $(i,k)$ costs $O(nDmt)$ flops. Each step of the double "For" loop computes the correlation of this residual with $a^{iT}$ using $O(mt)$ flops. Next, it computes the residual correlations (application of $\Phi_k^T$), thresholds them, and then reconstructs the morphological component $\varphi_{ik}$; this costs $O(2V_k + T)$ flops. The sources are then reconstructed with $O(nDt)$ flops, and the update of each mixing matrix column involves $O(mt)$ flops. Noting that in our setting $n \sim m \ll t$, and that $V_k = O(t)$ or $O(t \log t)$ for most popular transforms, the entire GMCA algorithm costs $O(I_{max} n^2 D t) + O\!\left( 2 I_{max} n \sum_{k=1}^{D} V_k + nDT \right)$. Thus, in practice, GMCA could be computationally demanding for large-scale high-dimensional problems. In Section V.C, we show that adding further assumptions leads to a very simple, accurate, and much faster algorithm that can handle very large-scale problems.
3. The Thresholding Strategy

a. Hard or Soft Thresholding? Rigorously, a soft-thresholding operator should be used. In practice, hard thresholding leads to better results. Furthermore, it was shown empirically in Bobin et al. (2007) that the use of hard thresholding is likely to provide the $\ell_0$-sparse solution of the single-channel sparse decomposition problem. By analogy, the use of a hard-thresholding operator is assumed to solve the multichannel $\ell_0$ quasi-norm problem instead of Eq. (41).

b. Handling Noise. The GMCA algorithm is well suited to dealing with noisy data. Assume that the noise standard deviation is $\sigma_N$. Then we simply apply the GMCA algorithm as described previously, terminating as soon as the threshold $\delta$ is less than $\tau\sigma_N$, where $\tau$ typically takes its value in the range 3–4. This attribute makes GMCA a suitable choice for noisy applications: GMCA not only manages to separate the sources, but it also succeeds in removing additive noise as a by-product.
4. The Bayesian Point of View

GMCA can also be considered from a Bayesian viewpoint. For instance, let us assume that the entries of the mixtures $\{x_i\}_{i=1,\ldots,m}$, the mixing matrix $A$, the sources $\{s_j\}_{j=1,\ldots,n}$, and the noise matrix $N$ are random variables. For simplicity, $N$ is Gaussian; its samples are drawn i.i.d. from a multivariate Gaussian distribution $\mathcal{N}(0, \Sigma_N)$ with zero mean and covariance matrix $\Sigma_N$. The noise covariance matrix $\Sigma_N$ is assumed known. For simplicity, the noise samples are considered decorrelated from one channel to the other; the covariance matrix $\Sigma_N$ is thus diagonal. We assume that each entry of $A$ is generated from a uniform distribution. (Other priors on $A$ could be imposed here; e.g., a known fixed column.) We assume that the sources $\{s_i\}_{i=1,\ldots,n}$ are statistically independent of each other and that their coefficients in $\Phi$ (the $\{\alpha_i\}_{i=1,\ldots,n}$) are generated from a Laplacian law:
$$\forall i = 1, \ldots, n, \quad f_S(\alpha_i) = \prod_{k=1}^{T} f_S(\alpha_i[k]) \propto \exp(-\mu \|\alpha_i\|_1). \qquad (48)$$
In a Bayesian framework, the use of the MAP estimator leads to the following optimization problem:
$$\{\tilde A, \tilde S\} = \arg\min_{A, S}\ \|X - AS\|_{\Sigma_N}^2 + 2\mu \sum_{i=1}^{n} \sum_{k=1}^{D} \|\varphi_{ik}\Phi_k^T\|_1, \qquad (49)$$
where $\|\cdot\|_{\Sigma_N}$ is the weighted Frobenius norm defined by $\|X\|_{\Sigma_N}^2 = \mathrm{Trace}\!\left( X^T \Sigma_N^{-1} X \right)$. Note that this minimization task is similar to Eq. (41), except that here the data fidelity term involving the norm $\|\cdot\|_{\Sigma_N}$ accounts for noise. In the case of homoscedastic and decorrelated noise (i.e., $\Sigma_N = \sigma_N^2 I_m$), Eqs. (41) and (49) are equivalent with $\lambda = \mu\sigma_N^2$. Note also that in this framework the independence assumption in Eq. (48) does not necessarily entail that the sources are "truly" independent. Rather, it means that no a priori assumption indicates any dependency between the sources.
B. Results

Next we illustrate the performance of GMCA with a simple toy experiment. We consider two sources, $s_1$ and $s_2$, sparse in the dictionary $\Phi$ made of the union of the DCT and a discrete orthonormal wavelet basis. Their coefficients in $\Phi$ are randomly generated from a Bernoulli–Gaussian distribution: the probability that a coefficient $\{\alpha_{1,2}[k]\}_{k=1,\ldots,T}$ is nonzero is $p = 0.01$, and its amplitude is drawn from a Gaussian distribution with mean 0 and variance 1.
The signals are composed of $t = 1024$ samples. We define the mixing matrix criterion $C_A = \|I - P\hat A^{-1} A\|_{1,1}$, where $P$ is a matrix that resolves the scale/permutation indeterminacy of the mixing model. Indeed, when $A$ is perfectly estimated, it is equal to $\hat A$ up to scale and permutation. In the simulation experiments, the true sources and mixing matrix are obviously known, and thus $P$ can be computed easily. The mixing matrix criterion is strictly positive unless the mixing matrix is perfectly estimated up to scale and permutation; experimentally, it is much more sensitive to separation errors. Figure 2 illustrates the evolution of $C_A$ as the signal-to-noise ratio $\mathrm{SNR} = 10\log_{10}\!\left( \|AS\|_2^2 / \|N\|_2^2 \right)$ increases. We compare our method with the relative Newton algorithm (RNA; Zibulevski, 2003), which accounts for sparsity, and with efficient FastICA (EFICA) (Koldovsky and Oja, 2006), a FastICA variant designed for highly leptokurtic sources. Both RNA and EFICA were applied after "sparsifying" the data via an orthonormal wavelet transform. Figure 2 shows that GMCA behaves similarly to state-of-the-art sparse BSS techniques.
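As an illustration (our code, not the chapter's), the criterion $C_A$ can be computed by estimating the scale/permutation-correcting matrix $P$ with a greedy column matching; the greedy matching is our simplifying assumption, whereas the chapter computes $P$ exactly from the known ground truth.

```python
import numpy as np

def mixing_criterion(A_est, A_true):
    """C_A = || I - P A_est^{-1} A_true ||_{1,1}, with P estimated by
    greedily matching each true column to its best-correlated estimate
    and undoing the scale; ||.||_{1,1} is the entrywise l1 norm."""
    B = np.linalg.pinv(A_est) @ A_true   # ~ inverse permutation/scale
    n = B.shape[0]
    P = np.zeros((n, n))
    for j in range(n):
        i = np.argmax(np.abs(B[:, j]))   # row best matching column j
        P[j, i] = 1.0 / B[i, j]          # undo scale and permutation
    return np.abs(np.eye(n) - P @ B).sum()
```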
FIGURE 2 Evolution of the mixing matrix criterion $C_A$ as the noise variance varies: generalized morphological component analysis (solid line), efficient FastICA, relative Newton algorithm. Abscissa, SNR (dB); ordinate, mixing matrix criterion.

C. Speeding Up Blind GMCA

1. The Orthonormal Case

Let us assume that the dictionary $\Phi$ is no longer redundant and reduces to an orthonormal basis. The $\ell_0$ optimization problem [Eq. (40)] then can be
written as follows:
$$\{\tilde A, \tilde S\} = \arg\min_{A, S}\ \|\Theta_X - A\alpha\|_F^2 + 2\lambda \sum_{i=1}^{n} \|\alpha_i\|_0, \quad \text{with } S = \alpha\Phi, \qquad (50)$$
where each row of $\Theta_X = X\Phi^T$ stores the decomposition coefficients of one observed channel in $\Phi$. Similarly, the $\ell_1$-norm problem [Eq. (41)] reduces to:
$$\{\tilde A, \tilde S\} = \arg\min_{A, S}\ \|\Theta_X - A\alpha\|_F^2 + 2\lambda \sum_{i=1}^{n} \|\alpha_i\|_1, \quad \text{with } S = \alpha\Phi. \qquad (51)$$
The GMCA algorithm then no longer requires transforms at each iteration, as only the data $X$ must be transformed, once, into $\Phi$. Clearly, this case is computationally much cheaper. Unfortunately, no single orthonormal basis can sparsely represent large classes of signals; still, we would like to use "very" sparse signal representations, which motivated the use of redundant representations. The next section provides arguments supporting the substitution of Eq. (51) for Eq. (41) even when the dictionary $\Phi$ is redundant.
a. The Redundant Case. In this section, we assume $\Phi$ is redundant. We consider that each datum $\{x_i\}_{i=1,\ldots,m}$ has a unique $\ell_0$-sparse decomposition (i.e., $\mathcal{S}_0(x_i)$ is a singleton for any $i \in \{1, \ldots, m\}$). We also assume that the sources have unique $\ell_0$-sparse decompositions (i.e., $\mathcal{S}_0(s_i)$ is a singleton for all $i \in \{1, \ldots, n\}$). Denoting by $\mathcal{D}_\Phi(\cdot)$ the sparse decomposition operator in $\Phi$, we then define $\Theta_X = [\mathcal{D}_\Phi(x_1)^T, \ldots, \mathcal{D}_\Phi(x_m)^T]^T$ and $\Theta_S = [\mathcal{D}_\Phi(s_1)^T, \ldots, \mathcal{D}_\Phi(s_n)^T]^T$. Until this point, we have relied on morphological diversity as the source of discernibility between the sources we seek to separate. Thus, distinguishable sources must have "discernibly different" supports in $\Phi$. Intuition indicates that when one mixes very sparse sources, their mixtures should be less sparse. Two cases must be considered:
• Sources with disjoint supports in $\Phi$: The mixing process increases the $\ell_0$ norm: $\|\mathcal{D}_\Phi(x_j)\|_0 > \|\mathcal{D}_\Phi(s_i)\|_0$ for all $j \in \{1, \ldots, m\}$ and $i \in \{1, \ldots, n\}$. When $\Phi$ is made of a single orthogonal basis, this property is exact.
• Sources with $\delta$-disjoint supports in $\Phi$: This argument is not so obvious; we conjecture that, with high probability, the number of significant coefficients in $\Phi$ is higher for mixture signals than for the original sparse sources: $\mathrm{Card}(\Lambda_\delta(x_j)) > \mathrm{Card}(\Lambda_\delta(s_i))$ for any $j \in \{1, \ldots, m\}$ and $i \in \{1, \ldots, n\}$.
Owing to this "intuitive" viewpoint, even in the redundant case, the method is likely to solve the following optimization problem:
$$\{\tilde A, \tilde\Theta_S\} = \arg\min_{A, \Theta_S}\ \|\Theta_X - A\Theta_S\|_F^2 + 2\lambda \|\Theta_S\|_0. \qquad (52)$$
Obviously, Eqs. (52) and (40) are not equivalent unless $\Phi$ is orthonormal. When $\Phi$ is redundant, no rigorous mathematical proof is easily derived. Nevertheless, experiments show that this intuition leads to good results. In Eq. (52), note that a key point remains doubtful: sparse redundant decompositions (the operator $\mathcal{D}_\Phi$) are nonlinear, and in general no linear model is preserved. Writing $\mathcal{D}_\Phi(X) = A\,\mathcal{D}_\Phi(S)$ at the solution is then an invalid statement in general. The next section focuses on this source of fallacy.
b. When Nonlinear Processes Preserve Linearity. Whatever the sparse decomposition used (e.g., matching pursuit (Mallat and Zhang, 1993) or basis pursuit (Chen et al., 1998)), the decomposition process is nonlinear. The simplification made earlier is no longer valid unless the decomposition process preserves linear mixtures. Let us first focus on a single signal. Assume that $y$ is the linear combination of $m$ original signals ($y$ could be a single datum in the BSS model):

$$y = \sum_{i=1}^{m} \nu_i y_i. \qquad (53)$$
Assuming each $\{y_i\}_{i=1,\ldots,m}$ has a unique $\ell_0$-sparse decomposition, we define $\alpha_i = \mathcal{D}_\Phi(y_i)$ for all $i \in \{1, \ldots, m\}$. As defined earlier, $\mathcal{S}_0(y)$ is the set of $\ell_0$-sparse solutions perfectly synthesizing $y$: for any $\alpha \in \mathcal{S}_0(y)$, $y = \alpha\Phi$. Among these solutions, one is the linearity-preserving solution $\alpha^\star$, defined such that
$$\alpha^\star = \sum_{i=1}^{m} \nu_i \alpha_i. \qquad (54)$$
As $\alpha^\star$ belongs to $\mathcal{S}_0(y)$, a sufficient condition for the $\ell_0$-sparse decomposition to preserve linearity is the uniqueness of the sparse decomposition. Indeed, Eq. (16) showed that, in general, if

$$\|\alpha^\star\|_0 < (\mu^{-1} + 1)/2, \qquad (55)$$

then $\alpha^\star$ is the unique maximally sparse decomposition, and in this case $\mathcal{S}_1(y)$ contains this unique solution as well. Therefore, if all the sources have sufficiently sparse decompositions in $\Phi$ in the sense of the inequality in Eq. (55), then the sparse decomposition operator $\mathcal{D}_\Phi(\cdot)$ preserves linearity. In Bobin et al. (2007), the authors showed that when $\Phi$ is the union of $D$ orthonormal bases, MCA is likely to provide the unique $\ell_0$ pseudo-norm sparse solution of Eq. (13) under the assumption that the sources are
sparse enough. Furthermore, experiments in Bobin et al. (2007) illustrate that the uniqueness bound in Eq. (55) is too pessimistic; uniqueness should hold, with high probability, beyond the bound of Eq. (55). Hence, based on this discussion and the results reported in Bobin et al. (2007), we consider in the next experiments that the operation $\mathcal{D}_\Phi(y)$, which stands for the decomposition of $y$ in $\Phi$ using MCA, preserves linearity.
c. In the BSS Context. In the BSS framework, recall that each observation $\{x_i\}_{i=1,\ldots,m}$ is the linear combination of $n$ sources:

$$x_i = \sum_{j=1}^{n} a_{ij} s_j. \qquad (56)$$
Owing to the last paragraph, if the sources and the observations have unique $\ell_0$-sparse decompositions in $\Phi$, then the linear mixing model is preserved, that is,

$$\mathcal{D}_\Phi(x_i) = \sum_{j=1}^{n} a_{ij}\, \mathcal{D}_\Phi(s_j), \qquad (57)$$
and we can estimate both the mixing matrix and the sources in the sparse domain by solving Eq. (52).
2. The Fast Blind GMCA Algorithm

According to the last section, a fast GMCA algorithm working in the sparse transform domain (after decomposing the data in $\Phi$ using a sparse decomposition algorithm) can be designed to solve Eq. (50) [respectively, Eq. (51)] by an iterative and alternating estimation of $\Theta_S$ and $A$. There is an additional important simplification when substituting the problem in Eq. (51) for Eq. (41): since $m \ge n$, Eq. (51) is a multichannel overdetermined least-squares error fit with $\ell_1$-sparsity penalization. We again use an alternating minimization scheme to solve for $A$ and $\Theta_S$:
• Update the coefficients: When $A$ is fixed, the marginal optimization problem has a unique solution given by the forward-backward proximal fixed-point equation (see Combettes and Wajs, 2005, Proposition 3.1):

$$\tilde\Theta_S = \Delta_\delta\!\left( \tilde\Theta_S + M(\Theta_X - \tilde A \tilde\Theta_S) \right), \qquad (58)$$

where $M$ is a relaxation descent-direction matrix such that the spectral radius of $I - M\tilde A$ is bounded above by 1. Choosing $M = \tilde A^\dagger$ (the pseudo-inverse of $\tilde A$, which has full column rank) yields $\tilde\Theta_S = \Delta_\delta\!\left( \tilde A^\dagger \Theta_X \right)$, where $\Delta_\delta$ is a thresholding operator [hard for Eq. (50) and soft for Eq. (51)] and the threshold $\delta$ decreases with increasing iteration count.
• Update the mixing matrix $A$ by a least-squares estimate: $\tilde A = \Theta_X \tilde\Theta_S^T \left( \tilde\Theta_S \tilde\Theta_S^T \right)^{-1}$.

Note that this two-step estimation scheme resembles the alternating sparse coding/dictionary learning algorithm presented by Aharon et al. (2006) in a different framework. The two-stage iterative process leads to the following fast GMCA algorithm:

1. Perform an MCA on each data channel to compute $\Theta_X = [\mathcal{D}_\Phi(x_i)^T]^T$.
2. Set the number of iterations $I_{max}$ and the threshold $\delta^{(0)}$.
3. While $\delta^{(h)}$ is higher than a given lower bound $\delta_{min}$ (which can depend, e.g., on the noise standard deviation):
• Estimate the coefficients of the sources $\Theta_S$ at iteration $h$, assuming $A$ is fixed: $\tilde\Theta_S^{(h+1)} = \Delta_{\delta^{(h)}}\!\left( \tilde A^{(h)\dagger} \Theta_X \right)$.
• Update $A$ assuming $\Theta_S$ is fixed: $\tilde A^{(h+1)} = \Theta_X \tilde\Theta_S^{(h+1)T} \left( \tilde\Theta_S^{(h+1)} \tilde\Theta_S^{(h+1)T} \right)^{-1}$.
• Decrease the threshold $\delta^{(h)}$.
4. Stop when $\delta^{(h)} = \delta_{min}$.
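A compact transcription of this fast, sparse-domain algorithm follows (our sketch; it reuses the thresholding helpers from Section IV and assumes `Theta_X` has already been computed, e.g., by running MCA channel per channel):

```python
import numpy as np

def fast_gmca(Theta_X, n_src, n_iter=200, delta_min=0.0, hard=True):
    """Fast GMCA iteration, Eqs. (58)-(60): alternate a thresholded
    least-squares estimate of the source coefficients with a
    least-squares update of A, under a decreasing threshold.
    Theta_X holds the sparse-domain coefficients of the channels (rows)."""
    m = Theta_X.shape[0]
    rng = np.random.default_rng(0)
    A = rng.standard_normal((m, n_src))
    A /= np.linalg.norm(A, axis=0, keepdims=True)
    delta0 = np.abs(Theta_X).max()
    S = np.zeros((n_src, Theta_X.shape[1]))
    for h in range(n_iter):
        delta = delta0 + h * (delta_min - delta0) / max(n_iter - 1, 1)
        S = np.linalg.pinv(A) @ Theta_X              # least-squares unmixing
        S = hard_threshold(S, delta) if hard else soft_threshold(S, delta)
        A = Theta_X @ S.T @ np.linalg.pinv(S @ S.T)  # Eq. (60)
        A /= np.linalg.norm(A, axis=0, keepdims=True)
    return A, S
```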
As in Section V.A, this coarse-to-fine process is also the core of the fast version of GMCA. Indeed, when $\delta^{(h)}$ is high, the sources are estimated from their most significant coefficients in $\Phi$. Intuitively, the coefficients with high amplitudes in $\Theta_S$ are (1) less perturbed by noise and (2) likely to belong to only one source with overwhelming probability. The estimation of the sources is refined as the threshold $\delta$ decreases toward its final value $\delta_{min}$. As in the previous version of the GMCA algorithm, this optimization process provides robustness to noise and helps convergence, even in a noisy context. Experiments in Section V.F illustrate the good performance of this fast GMCA algorithm.
a. Complexity Analysis. When the approximations are valid, the fast simplified GMCA version requires only the application of MCA to each channel, which makes it faster than the non-fast version (see Section V.A.2). Indeed, once MCA has been applied to each channel, the remainder of the algorithm requires $O(I_{max} n^2 D t)$ flops.
3. A Fixed-Point Algorithm

Recall that the GMCA algorithm is composed of two steps: (1) estimating $\Theta_S$ assuming $A$ is fixed, and (2) inferring the mixing matrix $A$ assuming $\Theta_S$ is fixed. In the simplified GMCA algorithm, the first step reduces to a least-squares estimation of the sources followed by a thresholding:

$$\tilde\Theta_S = \Delta_\delta\!\left( \tilde A^\dagger \Theta_X \right). \qquad (59)$$

The next step is a least-squares update of $A$:

$$\tilde A = \Theta_X \tilde\Theta_S^T \left( \tilde\Theta_S \tilde\Theta_S^T \right)^{-1}. \qquad (60)$$
Define $\hat\Theta_S = \tilde A^\dagger \Theta_X$, so that $\tilde\Theta_S = \Delta_\delta(\hat\Theta_S)$, and rewrite the previous equation as follows:

$$\tilde A = \tilde A\, \hat\Theta_S\, \Delta_\delta(\hat\Theta_S)^T \left( \Delta_\delta(\hat\Theta_S)\, \Delta_\delta(\hat\Theta_S)^T \right)^{-1}. \qquad (61)$$
Interestingly, Eq. (61) is a fixed-point equation with the following stationarity condition:

$$\hat\Theta_S\, \Delta_\delta(\hat\Theta_S)^T = \Delta_\delta(\hat\Theta_S)\, \Delta_\delta(\hat\Theta_S)^T. \qquad (62)$$
This fixed-point condition constrains $\hat\Theta_S\, \Delta_\delta(\hat\Theta_S)^T$ to be symmetric. The next section provides a precise probabilistic interpretation of this condition.
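The stationarity condition of Eq. (62) is easy to monitor numerically: at a fixed point, the matrix $\hat\Theta_S\,\Delta_\delta(\hat\Theta_S)^T$ must be symmetric. A small diagnostic sketch (our code, hard thresholding assumed, reusing the helper from Section IV):

```python
import numpy as np

def stationarity_gap(S_hat, delta):
    """Asymmetry of S_hat @ threshold(S_hat).T, which vanishes at a
    fixed point of Eq. (61) by the condition of Eq. (62)."""
    C = S_hat @ hard_threshold(S_hat, delta).T
    return np.abs(C - C.T).max()
```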
4. Convergence Study

We now provide some heuristics to shed light on the convergence behavior of the above fast GMCA algorithm. From a statistical point of view, the sources $s_p$ and $s_q$ are assumed to be random processes. We assume that the entries $\alpha_p[k]$ and $\alpha_q[k]$ are identically and independently generated from a sparse prior with a heavy-tailed pdf that is unimodal at zero, even, and monotonically increasing for negative values. For instance, any generalized Gaussian distribution satisfies these hypotheses. Figure 3 represents the joint pdf of two independent sparse sources (on the left) and
FIGURE 3 Contour plots of a simulated joint probability density function (pdf) of two independent sources generated from a generalized Gaussian law $f(x) \propto \exp(-\mu|x|^{0.5})$. Left, joint pdf of the original independent sources; right, joint pdf of two mixtures. (See color plate.)
the joint pdf of two mixtures (on the right). We then take the expectation of both sides of Eq. (62):

$$\sum_{k \in \Lambda_\delta(\hat\alpha_q)} \mathbb{E}\{\hat\alpha_p[k]\hat\alpha_q[k]\} = \sum_{k \in \Lambda_\delta(\hat\alpha_p) \cap \Lambda_\delta(\hat\alpha_q)} \mathbb{E}\{\hat\alpha_p[k]\hat\alpha_q[k]\}, \qquad (63)$$

and symmetrically,

$$\sum_{k \in \Lambda_\delta(\hat\alpha_p)} \mathbb{E}\{\hat\alpha_p[k]\hat\alpha_q[k]\} = \sum_{k \in \Lambda_\delta(\hat\alpha_p) \cap \Lambda_\delta(\hat\alpha_q)} \mathbb{E}\{\hat\alpha_p[k]\hat\alpha_q[k]\}. \qquad (64)$$
Intuitively, the sources are correctly separated when the branches of the star-shaped contour plot of the joint pdf of the sources (see Figure 3, left) are aligned with the axes. The following questions then arise: do Eqs. (63) and (64) lead to a unique solution, and do acceptable solutions belong to the set of fixed points? Note that if the sources are perfectly estimated, then $\mathbb{E}\{\Delta_\delta(\Theta_S)\,\Delta_\delta(\Theta_S)^T\}$ is diagonal and $\mathbb{E}\{\Theta_S\,\Delta_\delta(\Theta_S)^T\} = \mathbb{E}\{\Delta_\delta(\Theta_S)\,\Delta_\delta(\Theta_S)^T\}$. As expected, the set of acceptable solutions (up to scale and permutation) verifies the convergence condition. Let us now assume that $\hat\alpha_p$ and $\hat\alpha_q$ are uncorrelated mixtures of the true sources $\alpha_p$ and $\alpha_q$. Hard thresholding then correlates $\hat\alpha_p$ and $\Delta_\delta(\hat\alpha_q)$ (respectively, $\hat\alpha_q$ and $\Delta_\delta(\hat\alpha_p)$), unless the joint pdf of the estimated sources has the same symmetries as the thresholding operator (this property has also been outlined in Field, 1999). Figure 4 gives a good empirical illustration of this remark. The left side of Figure 4 depicts the joint pdf of two unmixed sources that have been hard-thresholded. Note that for any threshold we apply, the thresholded
FIGURE 4 Contour plots of a simulated joint probability density function (pdf) of two independent sources generated from a generalized Gaussian law that have been hard-thresholded. Left, joint pdf of the original independent sources after hard thresholding; right, joint pdf of two mixtures of the hard-thresholded sources. (See color plate.)
sources are still decorrelated, as their joint pdf has the same symmetries as the thresholding operator. On the contrary, on the right side of Figure 4, the hard-thresholding process further correlates the two mixtures. For a fixed $\delta$, several fixed points lead to decorrelated coefficient vectors $\hat\alpha_p$ and $\hat\alpha_q$. Figure 4 provides good intuition: for a fixed $\delta$, the set of fixed points is divided into two categories: (1) those that depend on the value of $\delta$ (right plot), and (2) those that are valid fixed points for all values of $\delta$ (left plot of Figure 4). The latter solutions lead to acceptable sources up to scale and permutation. Note that these conditions must hold for every threshold $\delta \ge \delta^\star$, where $\delta^\star$ is the minimum scalar $\delta$ such that the sources $s_p$ and $s_q$ are $\delta$-disjoint. Because fast GMCA involves a decreasing thresholding scheme, the final fixed points are stable if they verify the convergence conditions [Eqs. (63) and (64)] for all $\delta$. To conclude, if the fast GMCA algorithm converges, it should converge to the true sources up to scale and permutation. Finally, we note that noise is handled as naturally in the accelerated GMCA as in the original version: for instance, in the presence of noise, the MCA used in the first step to compute the sparse decompositions of the observations is typically stopped at 3–4 $\sigma_N$. This strategy is supported by the experiments of Section V.F.
D. Unknown Number of Sources

In blind source separation, the number of sources is usually assumed to be a fixed, known parameter of the problem. In practical situations, however, the number of sources is rarely known and must be estimated. In an ideal theoretical setting, the number of sources is the dimension of the subspace of $\mathbb{R}^m$ (recall that $m$ is the number of observations or channels) in which the
data lie. A misestimation of the number of sources $n$ may entail several difficulties:
• Underestimation: In the GMCA algorithm, underestimating the number of sources clearly leads to solutions made up of linear combinations of "true" sources. The solution may then be suboptimal with respect to the sparsity of the estimated sources.
• Overestimation: In case of overestimation, the GMCA algorithm may have to cope with a mixing matrix estimate that does not have full column rank, and the optimization problem at hand can be ill conditioned.
Hence, estimating the number of sources is a crucial and strenuous issue. To our knowledge, only a few works have focused on the estimation of the number of sources $n$. Recently, Balan (2007) approached the problem using the minimum description length. In this chapter, we introduce a sparsity-based method to estimate $n$ within the GMCA framework. It is possible, as shown by Balan (2007), to use classical model selection criteria in the GMCA algorithm. Such criteria, including Akaike's information criterion (AIC) (Akaike, 1970) and the Bayesian information criterion (BIC) (Schwarz, 1978), provide a balance between the complexity of the model (here, the number of sources) and its ability to faithfully represent the data. This would amount to adding to Eq. (40) a penalty term that merely discourages a high number of sources. In the sparse BSS framework, we propose an alternative approach. Indeed, for a fixed number of sources $p < n$, the sparse BSS problem amounts to solving the following optimization task:
$$\min_{A, \alpha\,;\ \mathrm{ColDim}(A) = p} \|\alpha\|_1 \quad \text{s.t.} \quad \|X - A\alpha\Phi\|_F < \epsilon, \qquad (65)$$
where ColDim (A) is the number of columns of the matrix A. The general algorithm we would prefer to use is thus the following:
$$\min_{p} \left\{ \min_{A, \alpha\,;\ \mathrm{ColDim}(A) = p} \|\alpha\|_1 \quad \text{s.t.} \quad \|X - A\alpha\Phi\|_F < \epsilon \right\}. \qquad (66)$$
Let us denote by $\mathcal{P}_{p,\epsilon}$ the problem in Eq. (65). Interestingly, if $p < n$, there exists a minimal value $\epsilon(p)$ such that for $\epsilon < \epsilon(p)$ the problem $\mathcal{P}_{p,\epsilon}$ has no solution. For a fixed $p < n$, this minimal value $\epsilon(p)$ is attained by approximating the data $X$ with its largest $p$ singular vectors. Furthermore, in the noiseless case, for $p < n$, $\epsilon(p)$ is always strictly positive, as the data lie in a subspace whose dimension is exactly $n$. Then,
when $p = n$, the problem $\mathcal{P}_{n,\epsilon}$ has at least one solution for $\epsilon = \epsilon(n) = 0$. Devising a joint estimation scheme for the mixing matrix $A$, the sources $S$, and the number of sources $n$ is thus possible via a constructive approach. Indeed, we propose to look for the solutions of $\mathcal{P}_{p,\epsilon}$ for increasing values of $p \ge 1$ and varying values of $\epsilon$. As the GMCA algorithm is likely to provide the solution of $\mathcal{P}_{p,\epsilon(p)}$ for a fixed $p$, we propose the following GMCA-based algorithm:
While $\|X - A\alpha\Phi\|_F > \epsilon(n)$ and $p \le m$:
1. Increase $p$ by adding a new column to $A$ (this step is described below).
2. Solve $\mathcal{P}_{p,\epsilon(p)}$ using the GMCA algorithm for the fixed $p$: $\min_{A, \alpha\,;\ \mathrm{ColDim}(A)=p} \|\alpha\|_1$ s.t. $\|X - A\alpha\Phi\|_F < \epsilon(p)$.
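The scheme can be sketched numerically by replacing the inner GMCA solver with the SVD-based optimal rank-$p$ fit, which yields $\epsilon(p)$ exactly; this shortcut is our simplification for illustration, whereas the chapter runs a full GMCA at each $p$:

```python
import numpy as np

def estimate_n_sources(X, eps_target=0.0, tol=1e-6):
    """Grow the number of components p and accept the first p whose
    best rank-p approximation explains X down to eps_target; the
    rank-p fit stands in for solving P_{p, eps(p)} with GMCA."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    for p in range(1, X.shape[0] + 1):
        eps_p = np.linalg.norm(X - (U[:, :p] * s[:p]) @ Vt[:p], 'fro')
        if eps_p <= eps_target + tol:
            return p
    return X.shape[0]
```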
1. The Role of the GMCA Algorithm

The algorithm above strives to find a particular path described by a sequence $\{p_i, \epsilon_i\}_i$ of solutions to $\mathcal{P}_{p_i,\epsilon_i}$. Ideally, an optimal scheme would provide the sequence $\{i, \epsilon(i)\}_{i=1,\ldots,n}$ of solutions to $\mathcal{P}_{i,\epsilon(i)}$, thus leading to the optimal value $\epsilon(n) = 0$ when $i = n$. Nevertheless, this sequence is difficult to obtain in practice, as the optimal sequence $\{i, \epsilon(i)\}_{i=1,\ldots,n}$ is unknown a priori. Fortunately, for a fixed $p$, the manner in which the threshold decreases and stops in the GMCA algorithm (Step 2 of the above algorithm) should allow GMCA to provide a solution close to that of $\mathcal{P}_{p,\epsilon(p)}$. Indeed, in the GMCA framework, there is a bijective map between a value of $\epsilon$ in Eq. (65) and the threshold $\delta$ used in the GMCA algorithm such that both formulations share the same solution. Obviously, when $\epsilon \to 0$, then $\delta \to 0$. Thus, in practice, for a fixed value of $p$, driving the threshold to 0 in the GMCA algorithm should lead to a solution close to that of $\mathcal{P}_{p,\epsilon(p)}$. In the noiseless case, Step 2 of the above algorithm then amounts to running an entire GMCA estimation of $A$ and $S = \alpha\Phi$ for a fixed $p$ with a final threshold $\delta_{min} = 0$. Of note, the sequence $\{i, \epsilon(i)\}_{i=1,\ldots,n}$ can be estimated in advance: for a fixed number of components $p = i$, the optimal approximation error is given by the projection onto the subspace spanned by the $p$ singular vectors related to the $p$ highest singular values of $X$. A preprocessing step would therefore require computing the singular value decomposition of $X$ to estimate the optimal sequence $\{i, \epsilon(i)\}_{i=1,\ldots,n}$. In practice, the use of GMCA avoids this preprocessing step.
2. Increasing Iteratively the Number of Components

In the aforementioned algorithm, Step 1 amounts to adding a column vector to the current mixing matrix $A$. The simplest choice would be to choose this vector at random. Wiser choices can also be made based on additional prior information, as follows:
• Decorrelation: If the mixing matrix is assumed to be orthogonal, the new column vector can be chosen orthogonal to the subspace spanned by the columns of $A$ with $\mathrm{ColDim}(A) = p - 1$.
• Known spectra: If a set of spectra is known a priori, the new column can be chosen among the set of unused spectra, based on its coherence with the residual. Let $\mathcal{A}$ denote a set of spectra $\{\eta_l \in \mathcal{A}\}_{l=1,\ldots,\mathrm{Card}(\mathcal{A})}$ and let $\mathcal{A}^c$ denote the set of unused spectra (i.e., spectra that have not been chosen previously); then the $p$-th column of $A$ is chosen as

$$\eta_{l^\star} = \arg\max_{\eta_l \in \mathcal{A}^c} \sum_{k=1}^{t} \frac{1}{\|\eta_l\|_2^2} \left| \eta_l^T [X - AS]^k \right|^2, \qquad (67)$$
where $[X - AS]^k$ is the $k$-th column of $X - AS$. Any other available prior information can likewise be taken into account to guide the choice of a new column vector of $A$.
3. The Noisy Case

In the noisy case, the parameter $\epsilon^2$ can be interpreted as a bound on the noise power (for bounded noise, or for stochastic noise such as Gaussian white noise with covariance matrix $\sigma_N^2 I$). In the latter probabilistic case, each noise entry is bounded by $\pm\pi\sigma_N$ with probability higher than $1 - \exp(-\pi^2/2)$. In practice, in Step 2 of the above algorithm, the final threshold of the GMCA algorithm is chosen as $\delta_{min} \simeq 3\sigma_N$; the choice $\pi = 3$ then guarantees the noise to be bounded with probability higher than 0.98.
a. A Simple Experiment. In this experiment, the data are assumed to be the linear combination of $n$ sources, as stated by the classical instantaneous mixture model. The entries of $S$ are independently drawn from a Laplacian probability density with scale parameter $\mu = 1$ ($\Phi$ is chosen as the Dirac basis). The entries of the mixing matrix are independently drawn from a zero-mean Gaussian distribution with unit variance. The data are not contaminated by noise. This experiment focuses on comparing classical principal component analysis (PCA), the popular subspace selection method, with the GMCA algorithm, assuming $n$ is unknown. In the absence of noise contaminating
the data, only the $n$ highest eigenvalues provided by PCA, which coincide with the Frobenius norms of the products $\{a^i s_i\}_{i=1,\ldots,n}$, are nonzero. PCA therefore provides the true number of sources. In Figure 5, the aforementioned GMCA-based algorithm has been applied to the same data to estimate the number of sources. In this experiment, the number of channels is $m = 64$, and each observation has $t = 256$ samples. The number of sources $n$ varies from 2 to 20, and each point has been computed from 25 trials. Figure 5 depicts the mean value of the number of sources estimated by GMCA. For each of the 25 trials, GMCA provides exactly the true number of sources. Figure 6 compares the performances of PCA and GMCA in recovering the true input sources. In this second experiment, the number of channels is $m = 128$, and each channel has $t = 2048$ samples. The top panel of Figure 6 shows the mean recovery SNR (in dB) of the estimated sources. Clearly, GMCA provides sources that are closer to the true sources than PCA. Let us define the following $\ell_1$-norm–based criterion:
$$C_{\ell_1} = \frac{\sum_{i=1}^{n} \left\| a^i s_i - \tilde a^i \tilde s_i \right\|_1}{\sum_{i=1}^{n} \left\| a^i s_i \right\|_1}, \qquad (68)$$
FIGURE 5 Estimating the number of sources with GMCA. Abscissa, true number of sources; ordinate, estimated number of sources with GMCA. Each point is the mean number of sources computed from 25 trials. For each point, the estimation variance is zero.
FIGURE 6 GMCA (circles) versus PCA (solid lines). Abscissa, input number of sources; ordinate, top panel: recovery SNR (in dB); bottom panel: $\ell_1$-sparsity criterion. Each point is an average value over 25 trials.
where the symbol $\sim$ denotes estimated parameters. $C_{\ell_1}$ provides a sparsity-based criterion that quantifies the deviation between the estimated sources and the true, sparsest sources. The bottom panel of Figure 6 shows the evolution of $C_{\ell_1}$ as the number of sources varies. As expected, the GMCA-based algorithm also provides the sparsest solutions. These preliminary examples show that GMCA is able to find the true dimension of the subspace in which the data lie (i.e., the true number of sources). Furthermore, GMCA provides far sparser solutions than PCA, with much smaller recovery errors. Further work is needed to better
characterize the behavior of GMCA when the number of sources is unknown; this is clearly a perspective to consider in future work.
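For reference, a direct transcription of the criterion $C_{\ell_1}$ of Eq. (68) (our code; it assumes the estimated sources have already been aligned with the true ones, i.e., the permutation and scale indeterminacies are resolved):

```python
import numpy as np

def l1_criterion(A_true, S_true, A_est, S_est):
    """C_l1 of Eq. (68): relative l1 deviation between the true and the
    estimated rank-one contributions a^i s_i (sources stored as rows)."""
    num = sum(np.abs(np.outer(A_true[:, i], S_true[i])
                     - np.outer(A_est[:, i], S_est[i])).sum()
              for i in range(S_true.shape[0]))
    den = sum(np.abs(np.outer(A_true[:, i], S_true[i])).sum()
              for i in range(S_true.shape[0]))
    return num / den
```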
E. Variations on Sparsity and Independence

Previously, we considered the data $X$ as a collection of $m$ channels, each of which has $t$ entries or samples. Considering instead $X$ as a collection of $t$ signals having $m$ entries (the columns of the matrix $X$) leads to an interesting point of view. Let us assume now that the data $X$ have already been decomposed in the spatial dictionary $\Phi$ (as in Section V.C). We then handle the coefficients of $X$ in $\Phi$, defined as follows:
$$\Theta_X = X\Phi^T. \qquad (69)$$
For the sake of simplicity, we assume that $\Phi$ is an orthonormal matrix; $\Theta_X$ then has the same dimensions as $X$. We also assume that $m = n$ and that the mixing matrix $A$ is invertible. Similarly, the sources are represented in $\Phi$ via their decomposition coefficients $\Theta_S$. We assume that each entry of $\Theta_S$ is random and generated from a Laplacian distribution with scale parameter $\mu$, and that the entries of $\Theta_S$ are mutually independent. Recalling that the $\{i,j\}$-th entry of $\Theta_S$ is written $\alpha_{ij}$, the probabilistic model is defined as follows:
$$\forall i = 1, \ldots, n;\ j = 1, \ldots, t, \quad f(\alpha_{ij}) \propto \exp(-\mu |\alpha_{ij}|). \qquad (70)$$
The matrix $\Theta_S$ can be viewed as the concatenation of $t$ column vectors $\{\theta_S^k\}_{k=1,\ldots,t}$ of size $n \times 1$ such that

$$\forall k = 1, \ldots, t;\ i = 1, \ldots, n, \quad \theta_S^k[i] = \alpha_{ik}. \qquad (71)$$
Clearly, the vectors $\{\theta_S^k\}_{k=1,\ldots,t}$ are mutually independent, and their individual probability density function is

$$f_S(\theta_S^k) = \prod_{i=1}^{n} f_S\!\left(\theta_S^k[i]\right) \propto \exp\!\left(-\mu \|\theta_S^k\|_1\right). \qquad (72)$$
The noiseless sparse BSS problem therefore amounts to determining the matrix $B = A^{-1}$ that minimizes the sparsity of the estimated sources. From the optimization point of view, the problem can be rewritten as follows:
$$\min_{A, \{\theta_S^k\}} \sum_{k=1}^{t} \left\| \theta_S^k \right\|_1 \quad \text{s.t.} \quad \Theta_X = A\Theta_S. \qquad (73)$$
Assuming $A$ is fixed, the problem is equivalent to recovering the sparse decomposition of each column of $\Theta_X$ separately in the dictionary $A$. Generally, the mixing matrix $A$ is unknown, and the problem in Eq. (73) is then equivalent to seeking the basis $A$ (not necessarily orthogonal) in which the columns of $\Theta_X$ are jointly the sparsest. This problem is quite similar to the search for the "best sparsifying basis" described by Saito and Benichou (2003). In this framework, the sparse BSS issue is equivalent to solving the "best sparsifying basis" problem for an ensemble of vectors. In the upcoming text, we proceed further and exhibit close links between different problems, as summarized in the diagram
Best sparsifying/Unconditional bases ↔ Sparse BSS ↔ ICA
1. Sparse BSS and Unconditional Bases

In the probabilistic framework described in Eq. (72), the set of vectors {θS^k}k=1,…,t belongs with high probability (if C ≫ 1/μ) to the ℓ1-ball {θ : ‖θ‖₁ ≤ C}. Furthermore, each mixed column vector θX^k is, by definition, the transformed version of the original source column vector θS^k:
∀ k = 1, …, t:    θX^k = A θS^k.    (74)
We further assume that A has columns with unit ℓ2-norm. It follows that the set of mixed vectors {θX^k}k=1,…,t belongs, with high probability, to the image of the ℓ1-ball under A. In this particular framework, we can find very close connections with the work of Donoho (1993) in a different context. Inspired by this work, we can transpose exactly the same results to our framework. Indeed, the vectors {θX^k}k=1,…,t must have a kind of unique "unconditional basis" (see Donoho, 1993), which turns out to be A⁻¹. Conversely, looking for the sparsest representation of the set of vectors {θX^k}k=1,…,t (with respect to the ℓ1 metric) solves the sparse BSS issue. Inspired by Donoho, the following proposition gives mild conditions under which the sparsest solution provides the sparse BSS solution.

Proposition 3. Assume that the sources in the sparse domain, SΦ, have entries independently and identically distributed from a Laplacian density. Assume that the mixing matrix A is invertible and has columns with unit ℓ2-norm. Then the norm ‖·‖1,1 is a "contrast" function:
E{‖SΦ‖1,1} ≤ E{‖A SΦ‖1,1}.    (75)

The proof is inspired by that of Lemma 4 in Donoho (1993).
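Proposition 3 is easy to probe numerically. The following minimal Monte Carlo sketch (all parameter values are illustrative) draws i.i.d. Laplacian source coefficients, mixes them with a random invertible matrix whose columns have unit ℓ2-norm, and compares the empirical means of the two sides of Eq. (75):

import numpy as np

rng = np.random.default_rng(0)
n, t, trials = 4, 1024, 200

gaps = []
for _ in range(trials):
    # i.i.d. Laplacian source coefficients S_Phi (scale parameter mu = 1)
    S = rng.laplace(scale=1.0, size=(n, t))
    # random mixing matrix with unit l2-norm columns (invertible a.s.)
    A = rng.standard_normal((n, n))
    A /= np.linalg.norm(A, axis=0)
    # ||A S||_{1,1} - ||S||_{1,1}, predicted nonnegative in expectation
    gaps.append(np.abs(A @ S).sum() - np.abs(S).sum())

# per-entry mean gap; Proposition 3 predicts a nonnegative value
print("mean gap:", np.mean(gaps) / (n * t))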
The latter viewpoint on the sparse BSS problem yields several conclusions:
• Contrast function: The ℓ1-norm is a contrast function. Indeed, looking for the sparsest solutions provides the solutions of the BSS problem.
• Unconditional basis: Looking for a demixing matrix can be equivalent to seeking the "unconditional basis" of the set of vectors {θX^k}k=1,…,t. Note that in harmonic analysis, the search for unconditional bases is motivated by their ability to provide so-called diagonal processes. In the sparse BSS framework, the "diagonality" property is no more than the independence of the entries of the column vectors {θS^k}k=1,…,t.
This remark clearly stresses the interplay among apparently different concepts, namely sparse BSS, unconditional bases, and ICA. Interestingly, Meyer (personal communication) has already demonstrated the intuitive link between ICA and unconditional bases.
2. Sparse BSS and Sparse ICA

From the ICA viewpoint, the Laplacian prior may be exploited via a maximum likelihood (ML) approach. Estimating the sources in the ML framework amounts to solving the following optimization problem:
min over B of ‖B XΦ‖₁ − log |det(B)|.    (76)
In the noiseless case, where XΦ = A SΦ, the above problem can be recast as
min over A, SΦ of ‖SΦ‖₁    s.t.    XΦ = A SΦ,    (77)
which is valid when log |det(A)| is constant (for instance, in the orthogonal case). The above problem is then directly equivalent to the sparse BSS problem described in Eq. (73). As a consequence, sparse ICA is equivalent to sparse BSS.
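To make this equivalence concrete, the ML problem in Eq. (76) can be attacked directly with a relative (natural) gradient descent, in which the Laplacian prior contributes the score function sign(·). The sketch below is a minimal illustration of this sparse ML-ICA, not the RNA of Zibulevsky (2003); the step size, the iteration count, and the 1/t scaling of the ℓ1 term are arbitrary choices.

import numpy as np

def sparse_ml_ica(X_phi, n_iter=500, step=0.05):
    """Minimize (1/t)||B X_phi||_1 - log|det B| over B by relative gradient
    descent; X_phi holds the mixture coefficients in the sparse domain."""
    n, t = X_phi.shape
    B = np.eye(n)
    for _ in range(n_iter):
        Y = B @ X_phi
        # relative gradient step: B <- B + step * (I - E[sign(Y) Y^T]) B
        B = B + step * (np.eye(n) - np.sign(Y) @ Y.T / t) @ B
    return B

# toy check: Laplacian sources mixed by a random matrix
rng = np.random.default_rng(1)
S_phi = rng.laplace(size=(3, 4096))
A = rng.standard_normal((3, 3))
B = sparse_ml_ica(A @ S_phi)
print(np.round(B @ A, 2))  # close to a scaled permutation matrix

Up to the usual scale and permutation indeterminacy, B A approaching a diagonal-times-permutation matrix is exactly the separation property expressed by Eq. (77).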
F. Results

1. The Sparser, the Better

Heretofore, we have asserted that sparsity and morphological diversity are the clues to good separation results. The role of morphological diversity is twofold, as noted below.
1. Separability: The sparser the sources in the dictionary Φ (redundant or not), the more "separable" they are. As noted earlier, sources with
different morphologies are diversely sparse (i.e., they have δ-disjoint supports in Φ with a "small" δ). The use of a redundant Φ is thus motivated by the grail of sparsity for a wide class of signals, for which sparsity means separability.
2. Robustness to noise or model imperfections: The sparser the sources, the less damaging the noise. In fact, sparse sources are concentrated on very few significant coefficients in the sparse domain, for which additive noise is a slight perturbation. As a sparsity-based method, GMCA should thus be less sensitive to noise. Furthermore, from a signal-processing viewpoint, dealing with highly sparse signals leads to easier and more robust models.
To illustrate these points, let us consider n = 2 one-dimensional sources with t = 1024 samples. These sources are the Bump and HeaviSine signals available in the WaveLab toolbox; see WaveLab (2005). The first column of Figure 7 shows the two synthetic sources.
FIGURE 7 The sparser, the better. First column: original sources. Second column: mixtures with additive Gaussian noise (SNR = 19 dB). Third column: sources estimated with GMCA using a single discrete orthogonal wavelet transform (DWT). Fourth column: sources estimated with GMCA using a redundant dictionary made of the union of a DCT and a DWT.
FIGURE 8 The sparser, the better. Behavior of the mixing matrix criterion as the noise variance increases, for DWT-GMCA (dashed line) and (DWT + DCT)-GMCA (solid line). Abscissa: signal-to-noise ratio (dB). Ordinate: mixing matrix criterion.
The sources are randomly mixed, and Gaussian noise with variance corresponding to SNR = 19 dB is added, to provide the m = 2 observations portrayed in the second column of Figure 7. We assumed that MCA preserves linearity for such sources and mixtures (see our choice of the dictionary later). The mixing matrix is assumed to be unknown. The third and fourth columns of Figure 7 depict the GMCA estimates computed with, respectively, (1) a single orthonormal discrete wavelet transform (DWT) and (2) a union of DCT and DWT. Visually, GMCA performs quite well in both cases. Figure 8 shows the value of the mixing matrix criterion CA (defined in Section V.B) as the SNR increases. In Figure 8, the dashed line corresponds to the behavior of GMCA in a single DWT; the solid line depicts the results obtained with GMCA when Φ is the union of the DWT and the DCT. On the one hand, GMCA gives satisfactory results, as CA is rather low in both experiments. On the other hand, the values of CA provided by GMCA in the overcomplete MCA domain (DWT + DCT) are approximately five times better than those obtained by GMCA using a single DWT. This simple toy experiment clearly confirms the benefits of sparsity for BSS. Furthermore, it underscores the effectiveness of "very" sparse representations provided by nonlinear decompositions in overcomplete dictionaries. This is an occurrence of what Donoho calls the "blessing of dimensionality."
2. Capability of GMCA to Provide the Sparsest Solution

We ran a simple noiseless experiment. The data X consist of four mixtures (Figure 10), each of which is a linear combination of four sources (Figure 9). The mixing matrix was chosen at random.
FIGURE 9 The 256 × 256 source images.
The GMCA algorithm was performed in the biorthogonal wavelet domain; see Mallat (1998). The estimated sources are shown in Figure 11. These results were obtained using the GMCALab toolbox (Bobin). We previously claimed that GMCA is able to provide the sparsest sources in the sense advocated by the sparse BSS framework. Figure 12 shows the evolution of the sparsity divergence ‖S̃‖₁ − ‖S‖₁ along the 500 GMCA iterations. Clearly, the GMCA algorithm tends to estimate sources with increasing sparsity. Furthermore, the GMCA solution has the same sparsity (with respect to the sparsity divergence) as the true sources. This simple experiment demonstrates that GMCA is able to recover the solution with the correct sparsity level.
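The sparsity divergence tracked in Figure 12 is straightforward to monitor during the iterations. A minimal helper is sketched below; it uses an orthonormal 2D DCT as a stand-in sparse domain (the experiment above used biorthogonal wavelets), so it illustrates the bookkeeping only.

import numpy as np
from scipy.fft import dctn

def sparsity_divergence(S_est, S_true):
    """l1 norm of the estimated sources minus that of the true sources,
    both measured in an orthonormal 2D DCT domain; inputs are stacks of
    2D source images."""
    l1 = lambda S: sum(np.abs(dctn(img, norm='ortho')).sum() for img in S)
    return l1(S_est) - l1(S_true)

# a value of 0 means the estimates are exactly as sparse as the truth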
3. Dealing With Noise

The previous paragraphs emphasized sparsity as the key to very efficient source separation methods.
FIGURE 10 The 256 × 256 noiseless mixtures.
This section compares several BSS techniques with GMCA in an image separation context. We chose three reference BSS methods:
• JADE (joint approximate diagonalization of eigen-matrices): The well-known independent component analysis (ICA) method based on fourth-order statistics (see Cardoso, 1999).
• Relative Newton algorithm (RNA): The seminal sparsity-based BSS technique of Zibulevsky (2003), which has already been reviewed. In the experiments reported hereafter, we applied the RNA to the data transformed by a basic orthogonal two-dimensional (2D) wavelet transform (2D-DWT).
• EFICA: This separation method improves the FastICA algorithm for sources following generalized Gaussian distributions (leptokurtic marginals with heavy tails). We also applied EFICA to data transformed by a 2D-DWT, where the leptokurticity assumption on the source marginal statistics is valid.
Figure 13 shows the original sources (top) and the two mixtures (bottom). The original sources s1 and s2 have unit variance.
FIGURE 11 The sources estimated using GMCA.
FIGURE 12 GMCA provides the sparsest solution. Abscissa: iteration number. Ordinate: sparsity divergence ‖S̃‖₁ − ‖S‖₁.
FIGURE 13 Top: the 256 × 256 source images. Bottom: two different mixtures. Gaussian noise is added such that the SNR = 10 dB.
The matrix A that mixes the sources is such that x1 = 0.25s1 + 0.5s2 + n1 and x2 = −0.75s1 + 0.5s2 + n2, where n1 and n2 are Gaussian noise vectors (with decorrelated samples) such that SNR = 10 dB. The noise covariance matrix ΣN is diagonal. In Section V.F we claimed that a sparsity-based algorithm would lead to more robustness to noise. The comparisons we make here are twofold: (1) we evaluate the separation quality in terms of the correlation between the original and estimated sources as the noise variance varies; and (2) because the estimated sources are also perturbed by noise, correlation coefficients are not always very sensitive to separation errors, so we also assess the performance of each method by computing the mixing matrix criterion CA. The GMCA algorithm was applied using a dictionary consisting of the union of a fast curvelet transform (available online; see Candès et al., 2006; Curvelab, 2006) and a local discrete cosine transform (LDCT).
FIGURE 14 Evolution of the correlation coefficient between original and estimated sources (left: source 1; right: source 2) as the noise variance varies. Solid line: GMCA; dashed line: JADE; (): EFICA; (+): RNA. Abscissa: SNR (dB). Ordinate: correlation coefficient.
The union of the curvelet transform and the LDCT is often well suited to a wide class of "natural" images. Figure 14 portrays the evolution of the correlation coefficient of source 1 (left) and source 2 (right) as a function of the SNR. At first glance, GMCA, RNA, and EFICA are very robust to noise, as they give correlation coefficients close to the optimal value of 1. On these images, JADE performs poorly, which might be due to the correlation between the two sources. For higher noise levels (SNR < 10 dB), EFICA tends to perform slightly worse than GMCA and RNA. As noted earlier, in our experiments a mixing matrix-based criterion is more sensitive to separation errors and thus better discriminates between the methods. Figure 15 depicts the behavior of the mixing matrix criterion as the SNR increases. Recall that the correlation coefficient was not able to discriminate between GMCA and RNA. The mixing matrix criterion clearly reveals the differences between these methods. First, it confirms the poor behavior of JADE on this set of mixtures. Second, RNA and EFICA behave rather similarly. Third, GMCA provides far better results, with mixing matrix criterion values up to 10 times better than JADE and approximately two times better than RNA or EFICA. To summarize, the findings of this experiment confirm the key role of sparsity in BSS:
• Sparsity yields better results: Among the methods we used, only JADE is not a sparsity-based separation algorithm. Whatever the method, separating in a sparse representation enhances the separation quality: RNA, EFICA, and GMCA clearly outperform JADE.
• GMCA takes better advantage of overcompleteness and morphological diversity: RNA, EFICA, and GMCA all benefit from sparsity, but GMCA takes better advantage of overcomplete sparse representations than RNA and EFICA.
FIGURE 15 Evolution of the mixing matrix criterion CA as the noise variance varies. Solid line: GMCA; dashed line: JADE; (): EFICA; (+): RNA. Abscissa: SNR (dB). Ordinate: mixing matrix criterion value.
4. Higher-Dimension Problems and Computational Cost

This section analyzes the behavior of GMCA when the dimension of the problem increases. Indeed, for a fixed number of samples t, it is more difficult to separate mixtures with a high number of sources n. In the following experiment, GMCA is applied to data that are random mixtures of n = 2 to 15 sources. The number of mixtures m is set equal to the number of sources: m = n. The sources are selected from a set of 15 images (of size 128 × 128 pixels), depicted in Figure 16. GMCA was applied using the fast curvelet transform (Candès et al., 2006). Hereafter, we analyze the convergence of GMCA in terms of the mixing matrix criterion CA. This criterion is normalized as C̄A = CA/n² to be independent of the number of sources n. The plot on the left in Figure 17 shows how GMCA behaves when the number of iterations Imax varies from 2 to 1000. Regardless of the number of sources, the normalized mixing matrix criterion drops when the number of iterations exceeds 50. When Imax > 100, the GMCA algorithm tends to stabilize; increasing the number of iterations further does not lead to a substantial separation enhancement. When the dimension of the problem increases, the normalized mixing matrix criterion at convergence (Imax > 100) gets slightly larger. As expected, for a fixed number of samples t, the separation task is likely to be more difficult when the number of sources n increases. Fortunately, GMCA still provides good separation results, with low mixing matrix criterion values (lower than 0.025) up to n = 15 sources.
FIGURE 16 The set of 15 sources used to analyze how GMCA scales when the number of sources increases.
FIGURE 17 Left: evolution of the normalized mixing matrix criterion when the number of GMCA iterations Imax increases. Abscissa: number of iterations (log scale). Ordinate: normalized mixing matrix criterion (log scale). The number of sources varies as follows: solid line, n = 2; dashed line, n = 5; (), n = 10; ◦, n = 15. Right: behavior of the computational cost when the number of sources increases. Abscissa: number of sources. Ordinate: computational cost in seconds (log scale). The number of iterations varies as follows: solid line, Imax = 10; dashed line, Imax = 100; ◦, Imax = 1000.
The plot on the right in Figure 17 illustrates how the computational cost of GMCA scales when the number of sources n varies (the experiments were run with IDL on a PowerMac G5 2-GHz computer). Recall that the fast GMCA algorithm is divided into two steps: (1) sparsifying the data and
computing XΦ, and (2) estimating the mixing matrix A and SΦ. This plot shows that the computational burden obviously increases when the number of sources n grows. Note that, when m = n, the computational burden of step (1) is proportional to the number of sources n and independent of the number of iterations Imax. Hence, for high Imax values, the computational cost of GMCA tends to be proportional to the number of iterations Imax.
VI. DEALING WITH HYPERSPECTRAL DATA

A. Specificity of Hyperspectral Data

Considering the objective function in Eq. (27) from a Bayesian perspective, the ℓ1-penalty terms imposing sparsity are easily interpreted as coming from Laplacian prior distributions on the components sk, and Eq. (27) is akin to a MAP estimation of the model parameters A and S. Interestingly, there is a striking asymmetry in the treatment of A and S; this is, in fact, a common feature of the great majority of BSS methods. Invoking a uniform improper prior distribution for the spectral parameters A is standard practice. On the one hand, this unbalanced treatment may not seem so unfair when A and S actually do have very different roles in the model and very different sizes. As mentioned earlier, A is often simply seen as a mixing matrix of small and fixed size, while each row si of the source matrix S is usually seen as a collection of t samples from a process in time, or of pixels in an image, which can grow much larger than the number of channels m as more data are collected. On the other hand, there are applications in which one deals with data from instruments with a very large number of channels that are well organized according to some physically meaningful index. A typical example is hyperspectral data, where images are collected in a large number of, what is more, contiguous regions of the electromagnetic spectrum. It then makes sense to consider the continuity, the regularity, and so on of some physical property from one channel to its neighbor. For instance, the spectral signatures of the objects in the scene may be known a priori to have a sparse representation in some specified, possibly redundant, dictionary of spectral waveforms. In what follows, the term hyperspectral is used generically to identify data with the following specific properties, regardless of other definitions or models used in other scientific communities:
1. High dimensionality: The number of channels m in common hyperspectral imaging devices can be greater than a hundred. Consequently, problems involving hyperspectral data often have very high dimensions.
2. Contiguity: The large number of channels in the instrument achieves a regular/uniform sampling of some additional and meaningful physical
index (wavelength, space, time). We refer to this added dimension as the spectral dimension.
3. Morphospectral coherence: Hyperspectral data are assumed to be structured a priori according to the linear mixture model given in Eq. (3).
We next describe an extension of the GMCA algorithm for hyperspectral data processing when it is known a priori that the underlying objects of interest Xk = ak sk exhibit sparse spectral signatures and sparse spatial morphologies in dictionaries of spectral and spatial waveforms specified a priori.
B. GMCA for Hyperspectral BSS

1. Principle

A well-known property of the linear mixture model [Eq. (3)] is its scale and permutation invariance; without additional prior information, the indexing of the Xk in the decomposition of the data X is not meaningful, and ak, sk can trade a scale factor with full impunity. A consequence is that, unless specified otherwise a priori, information on the separate scales of ak and sk is lost, and only a joint scale parameter for the pair (ak, sk) can be estimated. In a Bayesian perspective, this a priori knowledge of the multiplicative mixing process, and of the loss of information it entails, needs to be translated into a practical joint prior probability distribution for Xk = ak sk. The relevant distribution after the multiplicative mixing is the distribution of Xk = ak sk, which has the obvious property of being a function of ak and sk through their product only. Actually, the variables that matter are γk and νk, the sparse coefficient vectors representing, respectively, ak in the spectral dictionary Ψ and sk in the spatial dictionary Φ:
Xk = Ψ αk Φ = Ψ γk νk Φ = ak sk,    (78)
where αk = γk νk is a rank-one matrix of coefficients. For the sake of simplicity, Ψ and Φ are two orthonormal bases. Unfortunately, deriving the distribution of the product of two independent random vectors γk and νk starting from assumptions on their separate distribution functions is notoriously cumbersome. We propose instead the following pπ as a good candidate joint sparse prior distribution for γk and νk after the loss of information induced by multiplication:
pπ(γk, νk) = pπ(γk νk, 1) ∝ exp(−λk ‖γk νk‖₁) ∝ exp(−λk Σi,j |γk[i] νk[j]|),    (79)
where γk[i] is the ith entry of γk and νk[j] is the jth entry of νk. The property ‖γk νk‖1,1 = ‖γk‖₁ ‖νk‖₁ is obvious. Thus, the proposed distribution has the nice property (for subsequent derivations) that the conditional distributions of γk given νk, and of νk given γk, are Laplacian distributions, which are commonly and conveniently used to model sparse distributions. This distribution provides a convenient and formal expression of our prior knowledge of the sparsity of both ak and sk in dictionaries of spectral and spatial waveforms, and of the multiplicative mixing process. Inserting this prior distribution in a Bayesian MAP estimator leads to the following minimization problem:
min over {γk, νk} of ‖X − Σk=1…n Ψ γk νk Φ‖²ΣN + Σk=1…n λk ‖γk νk‖₁.    (80)
Interestingly, this can be expressed slightly differently as follows:
min over {αk} of ‖X − Σk=1…n Xk‖²ΣN + Σk=1…n λk ‖αk‖₁    (81)
with Xk = Ψ αk Φ and, ∀k, rank(Xk) ≤ 1, thus uncovering a nice interpretation of our problem as that of approximating the data X by a sum of rank-one matrices Xk that are sparse in the specified dictionary of rank-one matrices. This is the usual ℓ1-minimization problem as in Eq. (38), but with the additional constraint that the Xk are all at most rank one. The latter constraint is enforced here mechanically through a proper parametric representation of Xk = ak sk, or αk = γk νk. Note that rescaling the parameters A and S is not so problematic now as with GMCA, since it does not affect the objective function in Eq. (80). Indeed, rescaling the columns of the so-called mixing matrix, A ← ρA, while applying the proper inverse scaling to the rows of the source matrix, S ← (1/ρ)S, leaves both the quadratic measure of fit and the ℓ1-sparsity measure in Eq. (80) unaltered. Although renormalizing is still worthwhile numerically, it is no longer dictated by the lack of scale invariance of the objective function and the need to avoid trivial solutions, as in GMCA. The next section emphasizes the extension of the GMCA algorithm to the hyperspectral BSS issue.
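Both properties invoked above, the factorization of the entrywise ℓ1-norm of a rank-one matrix and the (ρ, 1/ρ) scale invariance, are immediate to verify numerically, as in this small sketch:

import numpy as np

rng = np.random.default_rng(2)
g = rng.laplace(size=(8, 1))    # spectral coefficient vector gamma_k
v = rng.laplace(size=(1, 16))   # spatial coefficient vector nu_k

# ||gamma_k nu_k||_{1,1} = ||gamma_k||_1 * ||nu_k||_1
assert np.isclose(np.abs(g @ v).sum(), np.abs(g).sum() * np.abs(v).sum())

# rescaling gamma_k <- rho gamma_k, nu_k <- nu_k / rho leaves the rank-one
# product (hence both terms of Eq. (80)) unchanged
rho = 3.7
assert np.allclose(g @ v, (rho * g) @ (v / rho))
print("l1 factorization and scale invariance hold")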
2. The Algorithmic Viewpoint

Let us now consider in detail the case where the multichannel dictionary Ψ ⊗ Φ is an orthonormal basis obtained as the tensor product of two
orthonormal bases Ψ and Φ of spectral and spatial waveforms, respectively. As noted previously, when non-unitary or redundant transforms are used, the above derivations are no longer strictly valid. Nevertheless, simple shrinkage still gives satisfactory results in practice, as studied in Combettes and Wajs (2005) and Elad (2006). We also assume that A is left invertible and that S is right invertible. In this case, the minimization problem [Eq. (80)] is best formulated in coefficient space, leading to a slightly different (however much faster) algorithm, since there is only one transformation to apply, and it needs to be applied only once. For the sake of clarity, we assume that the noise covariance matrix reduces to ΣN = σN² I. With these additional assumptions, Eq. (80) can be rewritten as follows:
min over {γk, νk} of (1/σN²) ‖α − Σk=1…n γk νk‖²F + Σk=1…n λk ‖γk‖₁ ‖νk‖₁,    (82)
where we have written α = Ψ^T X Φ^T for the coefficients of the data matrix X in the multichannel dictionary Ψ ⊗ Φ. In other words, we are seeking a decomposition of the matrix α into a sum of sparse rank-one matrices αk = γk νk by minimizing
min over γ, ν of (1/σN²) ‖α − γν‖²F + Σk=1…n λk ‖γk νk‖₁,    (83)

where γ is the matrix with columns γk and ν the matrix with rows νk, so that γν = Σk γk νk.
The minimization problem in Eq. (83) has at least one solution by coercivity, but it is nonconvex. However, for fixed γ (respectively, ν), the marginal minimization problem over ν (respectively, γ) is convex. Since solutions of Eq. (83) have no explicit formulation, we again propose solving it by means of a block-coordinate relaxation iterative algorithm, alternately minimizing with respect to γ holding ν fixed, and vice versa. Thus, by classical ideas in convex analysis, a necessary condition for (γ, ν) to be a minimizer is that zero is an element of the subdifferential of the objective at (γ, ν). Using Proposition 3.1 of Combettes and Wajs (2005), this can be written as the following system of coupled proximal forward-backward fixed-point equations:
ν = Δη(ν + (1/σN²) βν γ^T (α − γν)),
γ = Δζ(γ + (1/σN²) (α − γν) ν^T βγ),    (84)
where Δη and Δζ are the multichannel soft-thresholding operators defined after Eq. (85), and βν and βγ are relaxation matrices of appropriate sizes such that the spectral radii of I − (1/σN²) βν γ^T γ and of I − (1/σN²) ν ν^T βγ are bounded above by 1. Assuming the left invertibility of A and the right invertibility
of S, the matrices (1/σN²) γ^T γ and (1/σN²) ν ν^T are symmetric and invertible. Hence, taking βν = σN² (γ^T γ)⁻¹ and βγ = σN² (ν ν^T)⁻¹, the above equations can be rewritten as the following update rules on the coefficient matrices γ and ν:
ν = Δη((γ^T γ)⁻¹ γ^T α),
γ = Δζ(α ν^T (ν ν^T)⁻¹).    (85)

Here η is a vector of size n whose entries are η[k] = λk σN² ‖γk‖₁ / (2‖γk‖₂²), and similarly ζ is a vector of size n whose entries are ζ[k] = λk σN² ‖νk‖₁ / (2‖νk‖₂²). The multichannel soft-thresholding operator Δη acts on each row k of its argument with threshold η[k], and Δζ acts on each column k of γ with threshold ζ[k]. Both update rules can be interpreted as a soft-thresholding operator applied to the result of a weighted least-squares regression in the Ψ ⊗ Φ representation. Finally, in the spirit of the fast GMCA algorithm (see Section V.C.2), we propose that a solution to the above set of coupled equations [Eq. (84)] can also be approached efficiently using a symmetric iterative alternating least-squares scheme in conjunction with a shrinkage operator with a progressively decreasing threshold. In the present case, the transformation into the Ψ ⊗ Φ space is applied only once, which has a major impact on computation speed, especially when dealing with large hyperspectral data sets. The two-stage iterative process leads to the following fast hypGMCA algorithm:
1. Set the number of iterations Imax and the initial thresholds λk(0).
2. Transform the data X into α = Ψ^T X Φ^T.
3. While the λk(h) are higher than a given lower bound λmin:
• Update ν assuming γ is fixed: ν(h+1) = Δλ(h)((γ^T γ)⁻¹ γ^T α).
• Update γ assuming ν is fixed: γ(h+1) = Δλ(h)(α ν^T (ν ν^T)⁻¹).
• Decrease the thresholds λk(h).
4. Stop when λk(h) < λmin.
5. Transform back the coefficients to get X̂ = Ψ γν Φ.
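For concreteness, a compact NumPy sketch of this fast hypGMCA loop is given below. It operates directly on a coefficient matrix α (the transforms into and out of the Ψ ⊗ Φ domain, steps 2 and 5, are left to the caller), uses a single linearly decreasing global threshold instead of per-source thresholds λk, and soft-thresholds throughout; these are simplifying assumptions, not part of the algorithm as stated above.

import numpy as np

def soft(x, thr):
    """Entrywise soft-thresholding operator."""
    return np.sign(x) * np.maximum(np.abs(x) - thr, 0.0)

def hyp_gmca(alpha, n_sources, n_iter=100, lam_min=0.0, seed=0):
    """Approximate alpha (m x t coefficient matrix) by a sum of n_sources
    sparse rank-one matrices gamma[:, k] nu[k, :] via alternating
    thresholded least squares with a decreasing threshold."""
    rng = np.random.default_rng(seed)
    m, t = alpha.shape
    gamma = rng.standard_normal((m, n_sources))
    nu = np.linalg.lstsq(gamma, alpha, rcond=None)[0]
    lam0 = np.abs(nu).max()  # start just below the largest coefficient
    for h in range(n_iter):
        lam = max(lam0 * (1.0 - (h + 1) / n_iter), lam_min)
        # update nu with gamma fixed: soft((gamma^T gamma)^-1 gamma^T alpha)
        nu = soft(np.linalg.lstsq(gamma, alpha, rcond=None)[0], lam)
        # update gamma with nu fixed: soft(alpha nu^T (nu nu^T)^-1)
        gamma = soft(np.linalg.lstsq(nu.T, alpha.T, rcond=None)[0].T, lam)
    return gamma, nu

In practice, α = Ψ^T X Φ^T would be computed once up front and X̂ = Ψ γν Φ reconstructed at the end, exactly as in steps 2 and 5 above.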
The coarse-to-fine process is again the core of this fast version of GMCA for hyperspectral data. With the threshold successively decreasing toward zero at each iteration, the current sparse approximation is progressively refined by including finer structures alternately in the different morphological components, both spatially and spectrally. Soft thresholding again results from the use of an ℓ1-sparsity measure, which,
as explained earlier, is a good approximation to the desired ℓ0 quasi-norm solution. Toward the end of the iterative process, applying a hard threshold instead leads to better results. The final threshold should vanish in the noiseless case, or it may be set to a multiple of the noise standard deviation in the presence of noise (as in common detection or denoising methods).
C. Comparison With GMCA

1. Comparison Between GMCA and Its Extension to the Hyperspectral Case

According to the linear instantaneous mixture model, the data X are modeled as the linear combination of n sources. In this toy example, the n = 5 sources are drawn randomly from a set of 128 × 128 images featured in Figure 18. The spectra are generated from a Laplacian probability density with scale parameter μ = 1 in an orthogonal wavelet domain. The spectra are constrained to be positive; note that the GMCA algorithm is flexible enough to account for this assumption, but in the following experiments, since we want to assess the impact of the spectral sparsity constraint alone, we do not take advantage of this prior information. The number of channels is m = 128. White Gaussian noise with covariance matrix ΣN = σN² I is added. In the next experiment, we first compare the original GMCA algorithm with its extension for hyperspectral data. This first test emphasizes the enhancements provided by the sparse spectral constraint when the SNR varies from 0 to 40 dB. Figure 19 shows 6 of the 128 noisy channels at SNR = 20 dB.
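Data with the same structure as in this toy experiment can be generated along the following lines. The sketch substitutes Laplacian coefficient maps for the natural images of Figure 18 and draws positive spectra as absolute values of Laplacian variables; both are simplifications, as is the SNR-to-σ conversion.

import numpy as np

rng = np.random.default_rng(3)
n, m, t = 5, 128, 128 * 128   # sources, channels, pixels per source

# stand-in sources: sparse Laplacian coefficient maps (scale mu = 1)
S = rng.laplace(scale=1.0, size=(n, t))
# positive spectra with heavy-tailed (Laplacian-like) coefficients
A = np.abs(rng.laplace(scale=1.0, size=(m, n)))

X0 = A @ S
snr_db = 20.0
sigma = np.sqrt((X0 ** 2).mean() / 10 ** (snr_db / 10))
X = X0 + sigma * rng.standard_normal(X0.shape)   # the noisy channels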
FIGURE 18 Image data set used in the experiments. See text for details.
FIGURE 19 Six 128 × 128 mixtures (of the 128 channels). The signal-to-noise ratio (SNR) is equal to 20 dB.
The GMCA algorithms are computed in the curvelet domain with 100 iterations. Figure 20 depicts the sources estimated by the original GMCA algorithm (panels on the left) and by the GMCA algorithm with spectral sparsity constraints (panels on the right). Visual inspection clearly favors the results provided by GMCA with spectral sparsity constraints. More quantitative results are shown in Figure 21.
FIGURE 20 Left: sources estimated with the original GMCA algorithm. Right: sources estimated with GMCA with spectral sparsity constraints.
FIGURE 21 Evolution of the mixing matrix criterion CA as a function of the SNR (in dB). Solid line: recovery results with GMCA; circles: recovery results with GMCA with the spectral sparsity constraint.
Figure 21 pictures the evolution of the mixing matrix criterion CA as the SNR varies from 0 to 40 dB. Clearly, accounting for additional prior information provides better recovery results. Furthermore, the morphospectral sparsity constraint provides more robustness to noise.
2. Behavior in Higher Dimensions

The previous paragraph emphasized the robustness to noise provided by the morphospectral sparsity constraint. Intuitively, for fixed numbers of samples t and channels m, increasing the number of sources entails estimating an increasing number of parameters, thus making the separation task more difficult. Accounting for the spectral sparsity assumption should then lead to better results as the number of sources increases. In this one-dimensional (1D) toy experiment, the entries of SΦ are independently drawn from a Laplacian probability density with scale parameter μ = 1 (Φ is chosen as the Dirac basis). The entries of the mixing matrix are independently drawn from a Laplacian probability density with scale parameter μ = 1 (Ψ is also chosen as the Dirac basis). The data are not contaminated by noise. The number of samples is t = 2048; the number of channels is m = 128. Figure 22 depicts the comparisons between GMCA and its extension to the hyperspectral setting.
FIGURE 22 Left: recovery SNR (in dB) versus the number of sources. Right: Cℓ1 sparsity-based criterion versus the number of sources. Solid line: recovery results with GMCA; circles: recovery results with GMCA with the spectral sparsity constraint.
Each point of this figure has been computed as the mean over 100 trials. The left panel of Figure 22 shows the evolution of the recovery SNR when the number of sources varies from 2 to 64. For a low number of sources, the morphospectral sparsity constraint leads to a slight enhancement of the separation results. When the number of sources increases (n > 15), the spectral sparsity constraint clearly enhances the recovery results. For instance, when n = 64, the GMCA algorithm with the spectral sparsity constraint outperforms the original GMCA by up to 12 dB. The right panel of Figure 22 shows the behavior of the GMCA algorithms with respect to the sparsity-based criterion Cℓ1 introduced in Eq. (68). As expected, accounting for the sparsity of the spectra yields sparser results. Furthermore, as the number of sources increases, the deviation between the aforementioned methods widens. This experiment highlights the impact of the morphospectral sparsity constraint on the recovery results. As expected, adding further valid assumptions leads to enhanced performance. In these experiments, we illustrated that the morphospectral sparsity constraint yields better stability with respect to noise contamination and more robustness when the dimensionality (i.e., the number of sources) of the problem increases.
VII. APPLICATIONS

A. Application to Multivalued Data Restoration

1. Looking for a Sparser Representation

Section V.E highlighted the close links between sparse BSS and best sparsifying/unconditional bases in harmonic analysis. In this context, sparse BSS algorithms can provide a basis/representation in which a set of signals (i.e., the columns of the data matrix in the BSS framework) are jointly
sparse. In the BSS framework, we proved that this attractive property leads to the solution of the sparse BSS problem. Interestingly, in a wide range of multichannel inverse problems, there is a need for multichannel sparse representations (more precisely, for a multichannel dictionary Ψ ⊗ Φ, as introduced in Section IV, in which the data X are sparse). We illustrated in Bobin et al. (2007) that looking for an adaptive multichannel representation Ψ ⊗ Φ in which the data X are very sparse improves the solution of some classical inverse problems (denoising, inpainting). Let us consider the following general model for inverse problems:
Y = F(X) + N,    (86)
where again N models noise. The mapping F can represent a variety of degradation operators involved in classical inverse problems (e.g., denoising, deconvolution, inpainting). We assume that N is white Gaussian noise with covariance matrix σN² I. We also assume that the data X are sparse in the multichannel dictionary Ψ ⊗ Φ. Solving the aforementioned inverse problem then amounts to solving the following optimization problem:
min over α of λ ‖α‖₁ + (1/2) ‖Y − F(Ψ α Φ)‖²F.    (87)
Looking for an adaptive multichannel representation amounts to adapting the dictionaries Ψ and Φ to the data X. For instance, assuming Ψ is square and invertible, adapting Ψ is equivalent to seeking a spectral representation in which the columns of the data matrix X are jointly sparse. Although no mixture model is available, this task can be performed by applying GMCA to the data X, using Φ as the spatial dictionary. Nevertheless, when the data X are not known up to noise contamination (for instance, if F is different from the identity), estimating Ψ with GMCA is not possible. In the scope of adaptive restoration issues, several approaches can be used:
• Offline scheme: If the data X are known up to noise contamination (this is the case when F is the identity mapping), GMCA can be applied to Y to estimate an appropriate, sparser spectral representation Ψ. The restoration problem can then be solved assuming Ψ and Φ are fixed. This so-called offline scheme is applied to a multichannel denoising problem in Section VII.
• Online scheme: If the data X are degraded by a nonlinear mapping F, estimating an adapted spectral representation Ψ cannot be performed
using GMCA. We propose adapting the original GMCA algorithm to solve inverse problems such as Eq. (86) while also adapting the spectral representation Ψ. More precisely, this so-called online scheme is applied to a multichannel inpainting problem later in this section. Adapting the representation to the data has also been introduced in various fields (e.g., Mairal et al., 2006; Sallee and Olshausen, 2003). Peyré (2007) proposed, in the monochannel case, such an adaptive dictionary-learning process, assuming that the sparse representation lies in a class of tree-based multiscale transforms (e.g., wavelet and cosine packets (Mallat, 1998) and bandlets (LePennec and Mallat, 2005)). Note also that learning patch-based spatial/spectral dictionaries could provide a means of adapting a sparse representation to the data (see Mairal et al., 2006; Peyré, 2007). In the multichannel case, such an adaptive recovery would need to be applied to both the spectral dictionary Ψ and the spatial dictionary Φ. Note that if the dimensions of the data X are not too high (not exceeding a thousand samples per channel), the GMCA algorithm could be used to adapt the spatial dictionary Φ. In practical situations, however, the spatial dimension t is often much higher than the spectral dimension m, and in high dimensions the GMCA algorithm is no longer suitable for adapting sparse representations. The quest for effective learning algorithms in high dimensions is still a strenuous and open problem. In the next section, we assume that the spatial dictionary Φ is known and fixed; we look only for an adaptive spectral dictionary Ψ.
2. An Offline Approach: Application to Color Image Denoising

We previously emphasized the critical importance of signal representations and claimed that accounting for both spatial and spectral coherence or structure should enhance multichannel data restoration. This section addresses the issue of multichannel color image denoising. In this context, the data X are made of three color layers (red, green, and blue; RGB). Each color layer is a √t × √t image. The denoising problem reduces to choosing the perturbation mapping F in Eq. (86) as the identity, so that:
Y = X + N.    (88)
A first straightforward solution consists of denoising each layer separately; hopefully, accounting for interchannel structures or coherence leads to better results. The top left picture of Figure 23 portrays a noisy color image with SNR = 15 dB.
FIGURE 23 Top left: original 256 × 256 image with additive Gaussian noise (SNR = 15 dB). Top right: wavelet-based denoising in the red-green-blue (RGB) space. Bottom: wavelet-based denoising in the curvelet-GMCA-based spectral representation. See text for details.
The top right picture of Figure 23 shows the RGB-denoised image obtained by applying a classical wavelet-based denoising method to each color plane, with Φ chosen as the undecimated discrete wavelet transform (UDWT). Section V.A described a GMCA-based BSS algorithm, and we showed that this algorithm is able to seek an adapted spectral basis in which the processed multichannel data are sparser. We can therefore apply this algorithm to estimate such a sparse spectral representation of different color images; the results are shown at the bottom of Figure 23. This learning step leads to a GMCA-based algorithm that adapts the sparse representation to the data, sketched below; such an adaptive process will also be applied to the inpainting problem in the next section (for further details, refer to Bobin et al., 2007). (All color images can be downloaded at http://perso.orange.fr/jbobin/gmca2.html.)
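A hedged sketch of the offline scheme follows: the 3 × 3 spectral matrix is taken as given (e.g., learned beforehand by GMCA on the noisy data), each transformed channel is denoised by soft thresholding in an orthonormal 2D DCT (a stand-in for the UDWT used here), and the result is mapped back to RGB. The function name and the 3σ threshold are illustrative choices.

import numpy as np
from scipy.fft import dctn, idctn

def denoise_in_adapted_space(Y, Psi, sigma, k=3.0):
    """Offline scheme sketch. Y: (3, h, w) noisy RGB data; Psi: 3 x 3
    invertible color-space matrix (e.g., estimated by GMCA). Each channel
    of Psi^{-1} Y is denoised by DCT soft thresholding at k * sigma."""
    C = np.einsum('ij,jhw->ihw', np.linalg.inv(Psi), Y)  # to adapted space
    for ch in range(C.shape[0]):
        c = dctn(C[ch], norm='ortho')
        c = np.sign(c) * np.maximum(np.abs(c) - k * sigma, 0.0)
        C[ch] = idctn(c, norm='ortho')
    # note: a per-channel threshold accounting for how Psi^{-1} rescales
    # the noise would be more faithful than a single global k * sigma
    return np.einsum('ij,jhw->ihw', Psi, C)              # back to RGB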
FIGURE 24 Zoom on the test images. Top left: original image with additive Gaussian noise (SNR = 15 dB). Top right: wavelet-based denoising in the RGB space. Bottom: wavelet-based denoising in the curvelet-GMCA space.
The bottom image in Figure 23 is obtained by applying the GMCA algorithm with the following choices: Φ is the UDWT, and Ψ is the adaptive basis obtained with the GMCA-based BSS algorithm described in Section V.A. Visually, denoising in the "adaptive color space" performs better than denoising in the RGB space; Figure 24 zooms in on a particular part of the previous images, where the contours are visibly better restored. We also applied this denoising scheme with other, nonadaptive spectral representations, which is equivalent to choosing a different color space representation: YUV or YCC (luminance and chrominance spaces). For comparative purposes, we also applied the ICA algorithm JADE described in Cardoso (1999) to the original color images to derive yet another adaptive color representation in which to run the same denoising algorithm. A natural question that arises is whether it is worth denoising in a different space
FIGURE 25 Denoising color images. The y-axis shows the gain in terms of SNR (dB) compared with a denoising process in the RGB color space; the x-axis shows the mean SNR (dB). Solid line: GMCA-based; dash-dotted line: JADE; circles: YUV; +: YCC.
(YUV, YCC, JADE-based, or GMCA-based) instead of in the original RGB space. Figure 25 shows the SNR improvement (in dB) over denoising in the RGB space obtained by each method. YUV and YCC representations lead to the same results (note that the YCC color standard is derived from the YUV one). With this particular color image, JADE yields satisfactory results, improving denoising by up to 1 dB. Finally, as expected, a sparsity-based representation such as the GMCA-based spectral representation provides better results: the use of the sparsest, GMCA-based representation enhances denoising by up to 2 dB. This series of tests confirms the visual impression derived from Figure 23: accounting for interchannel coherence improves multichannel data denoising quality.
3. An Online Approach: Application to Color Image Inpainting

Throughout this chapter, we have focused on accounting for both spectral and spatial coherence/structure to better solve multichannel inverse problems such as inpainting or denoising. Furthermore, we used the GMCA algorithm to devise a spectral basis that better (i.e., more sparsely) represents the multichannel data, and we showed that adapting the representation to the data greatly enhances denoising results. Designing adaptive algorithms is thus of crucial importance for restoration issues. This section considers the particular case of color image inpainting. Again, the data X consist of three observed channels corresponding to the color layers (for instance, red, green, and blue), which cannot strictly be called
spectra. Note that restoring color images in a different color basis (e.g., YUV) may sometimes enhance the restoration performance. We propose recovering masked color images using the proposed GMCA inpainting method, which seeks to adapt the color space to the data X. In this context, we assume that Ψ is a 3 × 3 invertible matrix. In the GMCA framework, D = 1 and the data X are the linear combination of D multichannel morphological components. Adapting the spectral basis Ψ (i.e., the color space) to the data then amounts to estimating an "optimal" matrix Ψ. The GMCA algorithm is then adapted such that at each iteration h the matrix Ψ is updated by its least-squares estimate:
Ψ(h+1) = arg min over Ψ of ‖Y(h) − Ψ Σj=1…D αj(h) Φj‖²F.    (89)
This problem has a unique minimizer defined as follows:
Ψ(h+1) = Y(h) [Σj=1…D αj(h) Φj]†,    (90)
where [·]† denotes the pseudo-inverse of the matrix Σj=1…D αj(h) Φj. The GMCA algorithm is then adapted as follows:
1. Set the number of iterations Imax and the initial threshold λ(0).
2. While λ(h) is higher than a given lower bound λmin (which can depend, e.g., on the noise variance):
a. Compute Y(h) = Y + Mc ⊙ X̃(h−1), where Mc is the complement of the mask (equal to 1 on the missing samples).
b. Initialize to zero each residual morphological component. For j = 1, …, D:
• Compute the residual term Rj(h), assuming the current estimates X̃p(h−1) of the other components {p ≠ j} are fixed: Rj(h) = Y(h) − Σp≠j X̃p(h−1).
• Estimate the current coefficients of X̃j(h) by thresholding with threshold λ(h): α̃j(h) = Δλ(h)(Ψ(h)T Rj(h) Φj^T).
• Get the new estimate of the jth component by reconstructing from the selected coefficients α̃j(h): X̃j(h) = Ψ(h) α̃j(h) Φj.
c. Update the hypercube X̃(h) = Σj=1…D X̃j(h).
d. Update the spectral basis Ψ: Ψ(h+1) = Y(h) [Σj=1…D α̃j(h) Φj]†.
3. Decrease the threshold λ(h) following an appropriate strategy (e.g., linear, mMOM).
(As a reminder, the mMOM strategy was described in Section IV.C.1.)
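The two problem-specific ingredients of this loop, the mask reinjection of step 2a and the least-squares color-space update of Eq. (90), reduce to a few lines; the sketch below uses hypothetical names and NumPy's pseudo-inverse.

import numpy as np

def masked_residual(Y, X_est, mask):
    """Step 2a: Y(h) = Y + Mc * X_est, with mask = 1 on observed samples
    and 0 on missing ones (Mc = 1 - mask is the complement mask)."""
    return Y + (1 - mask) * X_est

def update_color_space(Y_h, comps):
    """Eq. (90): Psi(h+1) = Y(h) [sum_j alpha_j(h) Phi_j]^dagger.
    Y_h: (3, t) data; comps: (3, t) sum of the reconstructed components
    sum_j alpha_j(h) Phi_j."""
    return Y_h @ np.linalg.pinv(comps)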
FIGURE 26 Recovering color images. (a) Original Barbara color image. (b) Masked image; 90% of the color pixels are missing. (c) Inpainted image using the original morphological component analysis (MCA) algorithm on each color channel. (d) Inpainted image using the adaptive generalized morphological component analysis (GMCA) algorithm.
Figure 26a shows the original Barbara color image. Figure 26b depicts the masked color image, where 90% of the color pixels are missing. Figure 26c portrays the recovered image using GMCA in the original RGB color space, which amounts to performing a monochannel MCA-based inpainting on each channel; see Elad et al. (2005) and Fadili et al. (2005). Figure 26d shows the image recovered with the color space-adaptive GMCA algorithm. The zoom on the recovered images in Figure 27 shows that adapting the color space avoids chromatic aberrations and hence produces a better visual result. This visual impression is quantitatively confirmed by SNR measurements: the color space-adaptive GMCA improves the SNR by 1 dB.
FIGURE 27 Zoom on the recovered Barbara color image. (a) Original Barbara color image. (b) Masked image; 90% of the color pixels are missing. (c) Inpainted image using the original MCA algorithm on each color channel. (d) Inpainted image using the adaptive GMCA algorithm.
B. Application to the Planck Data

1. Introduction to the Planck Data Set

Investigating cosmic microwave background (CMB) data is of great scientific importance, as it improves our knowledge of the universe (Jungman et al., 1996). Indeed, most cosmological parameters can be derived from the study of CMB data. In the past decade, several experiments (Archeops, BOOMERANG, MAXIMA, WMAP; see Bennett et al., 2008) have already provided large amounts of data and astrophysical information. The forthcoming Planck ESA mission will provide new, accurate data requiring effective data analysis tools. More precisely, recovering useful scientific information requires disentangling in the CMB data the
contribution of several astrophysical components, namely the CMB itself, galactic emissions from dust and synchrotron, and Sunyaev-Zel'dovich (SZ) clusters (Sunyaev and Zel'dovich, 1980), to name a few. In the frequency range used for CMB observations (Bouchet and Gispert, 1999), the observed data combine contributions from distinct astrophysical components whose recovery falls in the frame of component separation. Following standard practice in the field of component or source separation, which has physical grounds here, the observed sky is modeled as a linear mixture of statistically independent components. The observation with detector i is then a noisy linear mixture of the n independent sources {sj}j=1,…,n: xi = Σj=1…n aij sj + ni. The coefficient aij reflects the emission law of source sj in the frequency band of the ith sensor; ni models instrumental noise.
2. Applying GMCA to Simulations

The GMCA method described above was applied to synthetic data composed of m = 6 mixtures of n = 3 sources: CMB, galactic dust emission, and SZ maps (Figures 28 and 29). The synthetic data mimic the observations that will be acquired in the six frequency channels of the Planck high-frequency instrument (HFI), namely 100, 143, 217, 353, 545, and 857 GHz, as shown in Figure 29. White Gaussian noise N is added, with diagonal covariance matrix ΣN reflecting the foreseen Planck-HFI noise levels. Experiments were run at 7 global noise levels, with SNR from 1.7 to 16.7 dB, such that the experimental noise covariance ΣN was proportional to the nominal noise covariance. Note that the nominal Planck-HFI global noise level is approximately 10 dB. Each measurement point was computed from 30 experiments involving random noise and randomly chosen sources from a data set of several simulated CMB, galactic dust, and SZ 256 × 256 maps. The astrophysical components and the mixture maps were generated as in Moudden et al. (2005) according to Eq. (2), based on model or experimental emission laws, possibly extrapolated,
FIGURE 28 The simulated sources. Left: CMB. Middle: galactic dust emission. Right: SZ map.
FIGURE 29 The observed CMB data. Global SNR = 2.7 dB.
of the individual components. Separation was obtained with GMCA using a single 2D-DWT. Figure 30 depicts the average correlation coefficients, over experiments, between the estimated source maps and the true source maps. Figure 30a shows the correlation coefficient between the true simulated CMB map and the one estimated by JADE (dotted line with open squares), SMICA (dashed line with open circles), and GMCA (solid line). The CMB map is well estimated by SMICA, which indeed was designed for the blind separation of stationary colored Gaussian processes; JADE does not perform as well as might be expected. GMCA performs similarly to SMICA. In Figure 30c, galactic dust is well estimated by both GMCA and SMICA, although the SMICA estimates seem to have a slightly higher variance than the GMCA estimates at higher global noise levels (SNR < 5 dB). Finally, Figure 30e shows that GMCA yields better estimates of the SZ map than SMICA when the noise variance increases. The right panels provide the dispersion (i.e., standard deviation) of the correlation coefficients of the source estimates. It appears that GMCA is a general method yielding simultaneous SZ and CMB estimates comparable to state-of-the-art blind separation techniques that seem mostly dedicated to individual components. In a noisy context, assessing separation techniques is more accurate using a mixing matrix criterion, as it is experimentally much more sensitive to separation errors. Figure 30h illustrates the behavior of the mixing matrix criterion CA for JADE, SMICA, and GMCA as the global noise variance varies.
FIGURE 30 The left column (a, c, e, g) shows the correlation coefficients between the estimated source maps and the true source maps, together with the mixing matrix criterion; the right column (b, d, f, h) shows the dispersion of these quantities. (a) and (b): CMB. (c) and (d): galactic dust. (e) and (f): Sunyaev-Zel'dovich map. (g) and (h): mixing matrix criterion CA. JADE: dotted line with open squares; SMICA: dashed line with open circles; GMCA: solid line.
GMCA clearly outperforms SMICA and JADE when applied to CMB data.
3. Addition of Physical Constraint: The Versatility of GMCA

In practice, the separation task is only partly blind. Indeed, the CMB emission law is very well known. In this section, we demonstrate that GMCA is versatile enough to account for such prior knowledge. In the following experiment, CMB-GMCA was designed by constraining the column of the mixing matrix A related to the CMB to its true value. This is equivalent to placing a strict prior on the CMB column of A, that is, P(acmb) = δ(acmb − a0cmb), where δ(·) is the Dirac distribution and a0cmb is the true simulated CMB emission law in the frequency range of Planck-HFI. Figure 31 shows the correlation coefficients between the true source maps and the source maps estimated using GMCA with and without the CMB prior. As expected, Figure 31a shows that assuming a0cmb is known improves the estimation of the CMB. Interestingly, the galactic dust map (Figure 31c) is also better estimated. Furthermore, the CMB-GMCA SZ map estimate is likely to have a lower variance (Figure 31e and f). The CMB prior is thus likely to provide more robustness to the SZ and galactic dust estimates, enhancing the global separation performance.
C. Software

A website has been designed that gives an overview of some applications based on morphological diversity: http://www.morphologicaldiversity.org. A Matlab toolbox coined GMCALab is available online at http://perso.orange.fr/jbobin/.
VIII. CONCLUSION

This chapter provides an overview of the application of sparsity and morphological diversity in the scope of BSS problems. The contribution of this chapter is twofold: (1) it adds new insights into how sparsity enhances BSS, and (2) it provides a new sparsity-based source separation method coined generalized morphological component analysis (GMCA) that takes better advantage of sparsity, thereby yielding good separation results. GMCA improves the separation task by the use of recent sparse overcomplete (redundant) representations. Numerical results confirm that morphological diversity clearly enhances source separation. When the number of sources is unknown, we introduced a GMCA-based heuristic that provides good separation performance. Further work will clarify the behavior of GMCA when the number of sources must be estimated. This chapter also
FIGURE 31 The left column (a, c, e) shows the correlation coefficients between the estimated source maps and the true source maps; the right column (b, d, f) shows the dispersion of these correlation coefficients. (a) and (b): CMB. (c) and (d): galactic dust. (e) and (f): Sunyaev-Zel'dovich map. GMCA: solid line; GMCA assuming that the CMB emission law is known (CMB-GMCA): dotted line.
extends the GMCA framework to the particular case of hyperspectral data, with numerical results illustrating the reliability of morphospectral sparsity constraints. In a wider framework, GMCA is shown to provide an effective basis for solving classical multivariate restoration problems, such as color image denoising or inpainting. Future work will focus on extending GMCA to the underdetermined BSS case (when the number of sources is greater than the number of observations). Finally, GMCA also provides promising
prospects in other applications such as multivalued data restoration. Since GMCA provides a general tool for multivariate data analysis, future work will also emphasize the use of GMCA-like methods in other multivalued data applications.
CHAPTER 6

“Disorder”: Structured Diffuse Scattering and Local Crystal Chemistry

Ray L. Withers*

* Research School of Chemistry, Australian National University, Canberra, A.C.T. 0200, Australia
Contents

I. Introduction
 A. Crystal Chemical and Other Types of Flexibility
II. The Modulation Wave Approach
III. Applications of the Modulation Wave Approach
 A. Transverse and Longitudinal Polarization
 B. Extinction Conditions
 C. Planar Diffuse Absences
 D. Size Effect
 E. Effects of Multiple Scattering and How to Minimize Them
IV. Selected Case Studies
 A. Compositionally “Disordered,” NaCl-Related, Solid-Solution Phases
 B. Inherently Flexible, Tetrahedrally Corner-Connected Framework Structures
 C. The Nonmagnetic Kondo Effect Material ThAsSe and the Role of the Fermi Surface
 D. Materials Susceptible to Ferroelastic Strain Distortions Such as α-PbO
V. Conclusions
Acknowledgments
References
I. INTRODUCTION

The conventional notion of crystalline materials is of essentially perfectly ordered, three-dimensionally (3D) periodic objects characterized in real space by 3D unit cells and their atomic contents and in reciprocal space
by close to infinitely sharp Bragg reflections falling on the nodes of a corresponding 3D reciprocal lattice. While this is undoubtedly an excellent approximation for many materials, a rather large, continually growing, and often technologically important family of phases is known for which such a description is either inappropriate (e.g., long-range ordered icosahedral, decagonal, or dodecagonal quasicrystalline phases, incommensurate or compositely modulated structures, and so on; see, for example, van Smaalen, 2007; Janssen et al., 2007) or grossly inadequate (such as heavily “disordered” crystalline materials whose reciprocal spaces exhibit highly structured diffuse intensity distributions accompanying the strong Bragg reflections of an underlying average structure; see Figure 1). Disordered, or rather locally ordered (as shown by the highly structured diffuse distributions), phases of the latter type are very widespread and are the subject of this chapter. Examples (by no means exhaustive) of material types that exhibit highly structured diffuse intensity distributions at one temperature (or in one polymorphic form) or another include the following:

1. A very large number of compositionally “disordered” solid-solution phases—for example, (a) the wide-range nonstoichiometric (1 − x)M2+S·xLn3+2/3S, M = Ca, Mg, Mn; Ln = a rare earth or Y, solid-solution phases (Flahaut, 1979; Withers et al., 1994a, 2007); (b) the widely substoichiometric transition metal carbide (TC1−x) and nitride (TN1−x) solid-solution phases—for example, Billingham et al. (1972), Brunel et al. (1972), Sauvage and Parthé (1972), Anderson (1984); or (c) the large family of disordered alloy, and disordered semiconductor alloy, solid-solution phases (Gomyo et al., 1988; Matsumura et al., 1991; Ohshima and Watanabe, 1973; Sato et al., 1962).
FIGURE 1 A typical (a) <111> zone axis electron diffraction pattern (EDP) of β-cristobalite (SiO2), (b) <−1, 3, 0> zone axis EDP of “Sr2NdNbO6,” and (c) <−1, 5, 0> zone axis EDP of the relaxor ferroelectric Ba(Ti0.7Sn0.3)O3. Note the highly structured diffuse intensity distributions accompanying the strong Bragg reflections of the underlying average structure in each case.
2. Inherently flexible framework structures—for example, (a) the ReO3 or ideal perovskite structure types (Brink et al., 2002; Glazer, 1972); (b) the quartz, cristobalite, and tridymite forms of silica, SiO2, and AlPO4 (van Tendeloo et al., 1976; Withers et al., 1989, 1994b); or (c) the large family of zeotypic, microporous molecular sieve materials (Hartmann and Kevan, 1999; Liu et al., 2003; Withers and Liu, 2005).
3. Materials susceptible to electronic or Fermi surface–driven structural instabilities—for example, (a) low-dimensional materials susceptible to charge density wave (CDW)-type instabilities, such as the layered transition metal dichalcogenides, the organic charge transfer salts, or the nonmagnetic Kondo effect materials ThAsSe and UAsSe (Khanna et al., 1977; Wilson et al., 1975; Withers et al., 2006); (b) certain “defect” transition metal oxide phases (e.g., Castles et al., 1971); or (c) various alloy phases (e.g., Norman et al., 1985).
4. Dynamically disordered, solid electrolyte and related phases—for example, (a) BaTiO3, in all but its lowest-temperature rhombohedral polymorphic form, and BaTiO3-doped relaxor ferroelectric phases (Comes et al., 1968; Harada and Honjo, 1967; Liu et al., 2007); (b) ionic conductors such as α-AgI or γ-RbAg4I5 (e.g., Andersson et al., 1985; Funke and Banhatti, 2006) and the cubic stabilized zirconias (Welberry et al., 1995); or (c) Ag- or Cu-containing mineral sulfosalts such as the mineral pearceite (Bindi et al., 2006).
5. Materials susceptible to ferroelastic strain distortions—for example, (a) the tetragonal α form of PbO and Sn1−xO (Withers et al., 1993; Withers and Schmid, 1994; Moreno et al., 1997); (b) partially ordered potassium feldspars such as the orthoclase, adularia, and intermediate microcline feldspars (McClaren and Fitz Gerald, 1987; Putnis and Salje, 1994); or (c) the Ni-rich Ni1−xAlx alloy phase (van Tendeloo and Amelinckx, 1998) and the InxGa1−xAsyP1−y semiconductor alloy phase (Treacy et al., 1985).

Significant insight into the local order (and hence the underlying crystal chemistry) of “disordered” materials of the above types often can be obtained simply from the “shapes” of the various structured diffuse intensity distributions characteristic of them (e.g., de Ridder et al., 1976a,b, 1977a,b; Withers et al., 2007; see also Section IV.A). Reciprocal space mapping of this type, however, requires a diffraction probe that is sufficiently sensitive to weak features of reciprocal space. Electron diffraction is an ideal such probe as a result of the strength of the interaction between fast electrons and matter. This chapter reviews the application of electron microscopy, in particular electron diffraction, to the detection and characterization of “disordered”/locally ordered phases. It builds on much earlier pioneering work in the area (see in particular Honjo et al., 1964; Harada and Honjo, 1967), as well as on many
subsequent reviews (e.g., van Tendeloo and Amelinckx, 1998; Withers, 2005).
A. Crystal Chemical and Other Types of Flexibility

A review on disordered/locally ordered phases is simultaneously and inevitably also a review of crystal chemical flexibility and of the numerous ways in which this can be manifested on local, mesoscopic, and/or macroscopic length scales (Figure 2). A major theme will be that nature, when thoroughly and carefully investigated, often throws up surprises that do not fit into preexisting categories of how atoms/objects should be arranged and hence, unfortunately, are often neglected or, worse, ignored. A good example is the experimental discovery of quasicrystalline materials, which began with Dan Shechtman being the first to take seriously electron diffraction patterns (EDPs) exhibiting “forbidden” symmetries (Shechtman et al., 1984) and not accepting the then-entrenched conventional wisdom that orientational twinning of conventional 3D crystalline material somehow had to be responsible (see Pauling, 1985). Images such as Figure 2b strongly suggest (if not show directly) that large, single-phase quasicrystals do indeed exist. Similarly, the many electron microscopists who inadvertently took images of curved “graphite” planes and “onions” in their background holey carbon films in the 1960s and 1970s but did not seriously consider the implication may well have missed discovering close relatives of buckyballs and buckytubes much
FIGURE 2 (a) A scanning electron microscope micrograph of the unusual scroll morphology in which the misfit layer material Bi1.17Nb2S5.17 naturally grows (see Otero-Diaz et al., 1995). (b) An optical micrograph of the perfect dodecahedral symmetry of the millimeter-sized, single icosahedral quasicrystal Ho8.7Mg34.6Zn56.8 (see Fisher et al., 2000). Micrographs courtesy of L. C. Otero-Diaz and I. R. Fisher, respectively.
sooner than they were actually discovered (Iijima, 1991; Kroto et al., 1985; Ugarte, 1992). Although structural science needs systematic categorization, it also always needs flexibility: the openness to allow new types of previously unrecognized “order” to be envisioned, recognized, and explored (Andersson et al., 1988; Aragón et al., 1993; Jacob and Andersson, 1998; Klinowski et al., 1996; Mackay, 1976, 1988). What is “fringe” today is quite often mainstream tomorrow. A recent (probable) example is the use of a multimetrical (scaling) approach to describe the symmetry and morphology of snow crystals (Janner, 1997) and the later extension of the same idea to biomacromolecules such as nucleic acids with helical structure (Janner, 2001). An earlier definite example, and one that is central to this review, was the recognition of long-range aperiodic order in (3 + n)-d modulated structures and the idea of describing them by embedding into higher-dimensional “superspace” rather than attempting to use the (up until then) firmly entrenched notion of 3D translational symmetry. The notion of aperiodic order was first introduced in a seminal paper by de Wolff (1974) and is now firmly established, including reciprocal space indexation in (3 + n) dimensions (see Figure 3), as well as systematic listings of higher-dimensional superspace group symmetries and their associated systematic extinction conditions (Janssen et al., 1995). In the case of the (3 + 3)-d incommensurately modulated (1 − x)Bi2O3·xNb2O5, 0.06 < x < 0.25, solid-solution phase (Esmaeilzadeh et al., 2001;
FIGURE 3 Typical (a) <110> and (b) <001> zone axis electron diffraction patterns of the (3 + 3)-d incommensurately modulated (1 − x)Bi2O3·xNb2O5, 0.06 ≤ x ≤ 0.25, solid-solution phase for x ∼ 0.2. Six-dimensional indexation is with respect to the basis vectors a*, b*, c*, q1 = εa*, q2 = εb*, q3 = εc*, ε ∼ 0.37 for x = 0.2 (see Withers et al., 1999, for details).
Withers et al., 1999), for example, the characteristic extinction conditions F([hklmnp]*) = 0 unless h + k, k + l, h + l, m + n, n + p, and m + p are all even, and F([…]*) = 0 unless m + n = 4J, J an integer (a d hyperglide condition), are clearly observed (see Figure 3). These systematic extinction conditions, in conjunction with the m−3m Laue symmetry of reciprocal space, imply an overall six-dimensional superspace group symmetry of P : Fm3m : Fd3m (in the notation of Yamamoto, 1982). The constraints that such higher-dimensional superspace symmetry places on the atomic modulation functions (AMFs; see Pérez-Mato et al., 1987; Janssen et al., 1995; van Smaalen, 2007) describing the deviation of any particular long-range ordered modulated structure away from its underlying average structure are now generally well understood and encoded into available structure refinement packages such as JANA 2000 (Petricek et al., 2000). This is of direct relevance to this chapter because systematic extinction conditions are characteristic not only of long-range ordered 3D or (3 + n)-d (modulated) structures but also of the structured diffuse intensity distributions characteristic of many so-called disordered materials. See, for example, the [001] zone axis EDP of ThAsSe (average structure space group symmetry P4/nmm) shown in Figure 4a and the <130> zone axis EDP of CaCu3Ti4O12, CCTO (average structure space group symmetry Im−3), shown in Figure 4b.
FIGURE 4 (a) An [001] zone axis EDP of P4/nmm ThAsSe (see Withers et al., 2006, for details) and (b) a <130> zone axis electron diffraction pattern of Im−3 CCTO (see Liu et al., 2005, for details). Note that the G ± ∼0.14<110>* ± ε<1, −1, 0>* diffuse streaking in (a) occurs only around the h + k odd, n-glide forbidden parent reflections, whereas the diffuse streaking perpendicular to <001> in (b) runs only through the [hkl]*, l even parent reflections.
Note that the G ± ∼0.14<110>* ± ε<1, −1, 0>* (G = [hkl]*, ε continuous; that is, the G ± ∼0.14[110]* ± ε[1, −1, 0]* and G ± ∼0.14[1, −1, 0]* ± ε[110]*) diffuse “streaking” in the case of ThAsSe (see Figure 4a) occurs only around the h + k odd, n-glide forbidden parent reflections, while the diffuse streaking perpendicular to <001> in the case of CCTO (see Figure 4b) only runs through the [hkl]*, l even parent reflections. To gain insight into the local order hidden in disordered materials of this type it is very helpful, if not essential, to understand the symmetry constraints underlying such extinction conditions; for this purpose, the language of modulated structures is ideally suited. The existence of such extinction conditions also suggests the distinct advantages of a modulation wave approach (de Fontaine, 1972; Krivoglaz, 1969; Pérez-Mato et al., 1986, 1987) to the structural description and characterization of disordered materials (Withers, 2005). Such an approach involves describing any observed diffuse distribution in terms of what are presumed to be essentially independent, uncorrelated modulation waves, one for each point on the observed diffuse distribution. An approach of this type automatically emphasizes the close relationship between the crystallography of disordered structures and aperiodic crystallography in general (see Pérez-Mato et al., 1998; Withers, 2005). We begin with a description of the notation used and the derivation of structure factor expressions appropriate for disordered modulated structures. In the following text, it is assumed that the modulations giving rise to the observed structured diffuse distribution are of coupled compositional and displacive character. (Note that a sinusoidal compositional modulation with modulation wave-vector q will automatically induce a displacive modulation with the same wave-vector.) Of course, other types of modulation are possible (e.g., modulations of magnetic moment or spin state), but the most common modulated structures are of coupled compositional and/or displacive character.
II. THE MODULATION WAVE APPROACH

In a modulation wave approach (see Pérez-Mato et al., 1986, 1987; Janssen et al., 1995), the deviation of the scattering factor of the μth atom in the average structure unit cell t away from its average value can be written in the form
$$\delta f_\mu(\mathbf{r}_\mu + \mathbf{t}) = f_\mu^{\mathrm{av}} \sum_{\mathbf{q}} a_\mu(\mathbf{q})\, \exp[2\pi i\,\mathbf{q}\cdot(\mathbf{r}_\mu + \mathbf{t})], \tag{1}$$
where $f_\mu^{\mathrm{av}}$ represents the average atomic scattering factor of the μth atom and $a_\mu(\mathbf{q})$ represents the complex compositional eigenvector associated with the modulation wave-vector $\mathbf{q}$, with the property that $a_\mu(-\mathbf{q}) = a_\mu(\mathbf{q})^*$. The displacement of the μth atom in the unit cell t away from its average structure position at $(\mathbf{r}_\mu + \mathbf{t})$ can likewise be written in a similar form as
$$\mathbf{u}_\mu(\mathbf{r}_\mu + \mathbf{t}) = \sum_{\mathbf{q}} \mathbf{e}_\mu(\mathbf{q})\, \exp[2\pi i\,\mathbf{q}\cdot(\mathbf{r}_\mu + \mathbf{t})] = \sum_{\mathbf{q}} \big( \mathrm{Re}\{\mathbf{e}_\mu(\mathbf{q})\} \cos[2\pi\,\mathbf{q}\cdot(\mathbf{r}_\mu + \mathbf{t})] - \mathrm{Im}\{\mathbf{e}_\mu(\mathbf{q})\} \sin[2\pi\,\mathbf{q}\cdot(\mathbf{r}_\mu + \mathbf{t})] \big), \tag{2}$$
where $\mathbf{e}_\mu(\mathbf{q}) = \sum_{\alpha = a,b,c} \boldsymbol{\alpha}\, \varepsilon_{\mu\alpha} \exp(i\theta_{\mu\alpha})$ represents the complex displacement eigenvector of the μth atom associated with the modulation wave-vector q and has the property that $\mathbf{e}_\mu(-\mathbf{q}) = \mathbf{e}_\mu(\mathbf{q})^*$. Any arbitrary atomic ordering arrangement and/or atomic displacement pattern can then be described in terms of an appropriate summation (over modulation wave-vectors within or on the first Brillouin zone of the underlying average structure) of such compositional and displacive modulation waves (Withers, 2005). The total scattering amplitude, F(k), from such a modulated structure is given by
$$\begin{aligned}
F(\mathbf{k}) &= \sum_{\mu}\sum_{\mathbf{t}} f_\mu(\mathbf{r}_\mu+\mathbf{t})\, \exp\{-2\pi i\,\mathbf{k}\cdot[(\mathbf{r}_\mu+\mathbf{t}) + \mathbf{u}_\mu(\mathbf{r}_\mu+\mathbf{t})]\} \\
&= \sum_{\mu}\sum_{\mathbf{t}} f_\mu^{\mathrm{av}} \Big(1 + \sum_{\mathbf{q}'}\big\{a_\mu(\mathbf{q}')\exp[2\pi i\,\mathbf{q}'\cdot(\mathbf{r}_\mu+\mathbf{t})] + a_\mu(\mathbf{q}')^{*}\exp[-2\pi i\,\mathbf{q}'\cdot(\mathbf{r}_\mu+\mathbf{t})]\big\}\Big)\, \exp[-2\pi i\,\mathbf{k}\cdot\mathbf{r}_\mu]\, \exp[-2\pi i\,\mathbf{k}\cdot\mathbf{t}] \\
&\quad\times \prod_{\mathbf{q}} \exp\big\{-2\pi i\,\mathbf{k}\cdot\big(\mathrm{Re}\{\mathbf{e}_\mu(\mathbf{q})\}\cos[2\pi\,\mathbf{q}\cdot(\mathbf{r}_\mu+\mathbf{t})] - \mathrm{Im}\{\mathbf{e}_\mu(\mathbf{q})\}\sin[2\pi\,\mathbf{q}\cdot(\mathbf{r}_\mu+\mathbf{t})]\big)\big\}
\end{aligned} \tag{3}$$
(Note that the − sign in the exp{−2πik·…} factor in Eq. (3) is not standard crystallographic usage. It is used here, however, to be logically consistent with the use of + signs in the exp[+2πik·…] factors in Eqs. (1) and (2).) Then, using the Jacobi–Anger generating relation $\exp[ix\sin\theta] = \sum_m J_m(x)\,\exp[im\theta]$, where the summation over the integer m runs from −∞ to +∞ and the $J_m$ are mth-order Bessel functions,
$$\begin{aligned}
F(\mathbf{k}) &= \sum_{\mu}\sum_{\mathbf{t}} f_\mu^{\mathrm{av}} \Big(1 + \sum_{\mathbf{q}'}\big\{a_\mu(\mathbf{q}')\exp[2\pi i\,\mathbf{q}'\cdot(\mathbf{r}_\mu+\mathbf{t})] + a_\mu(\mathbf{q}')^{*}\exp[-2\pi i\,\mathbf{q}'\cdot(\mathbf{r}_\mu+\mathbf{t})]\big\}\Big)\, \exp[-2\pi i\,\mathbf{k}\cdot\mathbf{r}_\mu]\, \exp[-2\pi i\,\mathbf{k}\cdot\mathbf{t}] \\
&\quad\times \prod_{\mathbf{q}} \sum_{m,m'} J_m\big(2\pi\mathbf{k}\cdot\mathrm{Re}\{\mathbf{e}_\mu(\mathbf{q})\}\big)\, \exp\big\{im[2\pi\,\mathbf{q}\cdot(\mathbf{r}_\mu+\mathbf{t}) - \pi/2]\big\}\, J_{m'}\big(2\pi\mathbf{k}\cdot\mathrm{Im}\{\mathbf{e}_\mu(\mathbf{q})\}\big)\, \exp\big\{im'[2\pi\,\mathbf{q}\cdot(\mathbf{r}_\mu+\mathbf{t})]\big\} \\
&= \sum_{\mu}\sum_{\mathbf{t}} f_\mu^{\mathrm{av}} \exp[-2\pi i\,\mathbf{k}\cdot\mathbf{r}_\mu]\, \exp[-2\pi i\,\mathbf{k}\cdot\mathbf{t}] \Big(1 + \sum_{\mathbf{q}'}\big\{a_\mu(\mathbf{q}')\exp[2\pi i\,\mathbf{q}'\cdot(\mathbf{r}_\mu+\mathbf{t})] + a_\mu(\mathbf{q}')^{*}\exp[-2\pi i\,\mathbf{q}'\cdot(\mathbf{r}_\mu+\mathbf{t})]\big\}\Big) \\
&\quad\times \prod_{\mathbf{q}} \sum_{m,m'} J_m\big(2\pi\mathbf{k}\cdot\mathrm{Re}\{\mathbf{e}_\mu(\mathbf{q})\}\big)\, J_{m'}\big(2\pi\mathbf{k}\cdot\mathrm{Im}\{\mathbf{e}_\mu(\mathbf{q})\}\big)\, \exp[-im\pi/2]\, \exp[i(m+m')\,2\pi\,\mathbf{q}\cdot(\mathbf{r}_\mu+\mathbf{t})].
\end{aligned}$$

At this stage, reasonable approximations are needed to simplify the above general expression into a more tractable, and hence useful, form. The question is which possible approximations are reasonable and which are not. In Figure 4a, note that only the first-order harmonic G ± q (G a Bravais lattice allowed, average structure Bragg reflection, and q = ∼0.14<110>* ± ε<1, −1, 0>*, ε continuous) primary “reflections” constituting the continuous primary diffuse streaking are visible (i.e., no second-harmonic copy of the primary diffuse distribution is visible). The same is also true of the more complex curved primary diffuse distribution shown in Figure 5b. Such observations strongly suggest that there is almost never any correlation between the different individual primary modulation wave-vectors constituting the primary diffuse distribution, in contrast to what is clearly the case for a conventional (3 + n)-d (n > 1) incommensurately modulated structure (e.g., see Figure 3, where higher-order harmonic satellite reflections are clearly present). Thus, in almost all cases an eminently reasonable assumption is that the generally weak primary diffuse distribution at G ± q arises solely from compositional and/or displacive modulation waves associated with the different individual primary modulation wave-vectors q.
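The Bessel function expansion invoked above is straightforward to verify numerically. The following minimal Python sketch (not from the original text; the test values and truncation order are arbitrary) checks the Jacobi–Anger relation against a truncated sum:

```python
import numpy as np
from scipy.special import jv  # Bessel function of the first kind, order m

# Check exp(i x sin(theta)) = sum_m J_m(x) exp(i m theta), m = -inf..+inf,
# the generating relation used in the derivation above.
x, theta = 1.7, 0.9   # arbitrary test values
M = 30                # truncation order; the terms decay rapidly once |m| >> x

lhs = np.exp(1j * x * np.sin(theta))
rhs = sum(jv(m, x) * np.exp(1j * m * theta) for m in range(-M, M + 1))
print(abs(lhs - rhs))  # ~1e-16, i.e., the identity holds to machine precision
```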
FIGURE 5 (a) A <−1, 2, 0> zone axis electron diffraction pattern (EDP) of NbV0.52NbIV0.48O1.52F1.48 (see Brink et al., 2004, for details) and (b) a <1, 1, −2> zone axis EDP characteristic of the (1 − x)CaS·xY2S3, 0 < x < 0.29, solid solution at x = 2/7 (indexed with respect to the underlying NaCl-type average structure; see Withers et al., 1994a, for details).
In Figure 5a, however, note that the primary modulation wave-vectors q = ∼0.49<100>* ± <0, 0.245, γ>* (γ continuous; see Brink et al., 2004, for the details) constituting the primary diffuse streaking are accompanied by a very much weaker (barely visible) second-order harmonic 2q = ∼0.98<100>* ± <0, 0.489, γ>* secondary diffuse streaking (arrow in Figure 5a). This suggests that, for completeness, the above assumption (that the primary diffuse distribution at G ± q arises solely from modulation waves associated with the different individual primary modulation wave-vectors q) may need, in some inherently more anharmonic cases (e.g., Brink et al., 2002), to be supplemented by the addition of necessarily small-amplitude compositional and/or displacive modulation waves associated with the second-harmonic modulation wave-vectors 2q. Under these approximations, the above expression simplifies to
$$\begin{aligned}
F(\mathbf{k}) &= \sum_{\mathbf{q}}\sum_{\mu}\sum_{m,m',n,n'} f_\mu^{\mathrm{av}}\, \exp[-i(m+n)\pi/2]\, \exp\{-2\pi i[\mathbf{k} - (m+m')\mathbf{q} - (n+n')2\mathbf{q}]\cdot\mathbf{r}_\mu\} \\
&\quad\times J_m\big(2\pi\mathbf{k}\cdot\mathrm{Re}\{\mathbf{e}_\mu(\mathbf{q})\}\big)\, J_{m'}\big(2\pi\mathbf{k}\cdot\mathrm{Im}\{\mathbf{e}_\mu(\mathbf{q})\}\big)\, J_n\big(2\pi\mathbf{k}\cdot\mathrm{Re}\{\mathbf{e}_\mu(2\mathbf{q})\}\big)\, J_{n'}\big(2\pi\mathbf{k}\cdot\mathrm{Im}\{\mathbf{e}_\mu(2\mathbf{q})\}\big) \\
&\quad\times \sum_{\mathbf{t}} \big[ \exp(-2\pi i[\mathbf{k} - (m+m')\mathbf{q} - (n+n')2\mathbf{q}]\cdot\mathbf{t}) \\
&\qquad + a_\mu(\mathbf{q})\exp[2\pi i\,\mathbf{q}\cdot\mathbf{r}_\mu]\, \exp(-2\pi i[\mathbf{k} - \mathbf{q} - (m+m')\mathbf{q} - (n+n')2\mathbf{q}]\cdot\mathbf{t}) \\
&\qquad + a_\mu(\mathbf{q})^{*}\exp[-2\pi i\,\mathbf{q}\cdot\mathbf{r}_\mu]\, \exp(-2\pi i[\mathbf{k} + \mathbf{q} - (m+m')\mathbf{q} - (n+n')2\mathbf{q}]\cdot\mathbf{t}) \\
&\qquad + a_\mu(2\mathbf{q})\exp[4\pi i\,\mathbf{q}\cdot\mathbf{r}_\mu]\, \exp(-2\pi i[\mathbf{k} - 2\mathbf{q} - (m+m')\mathbf{q} - (n+n')2\mathbf{q}]\cdot\mathbf{t}) \\
&\qquad + a_\mu(2\mathbf{q})^{*}\exp[-4\pi i\,\mathbf{q}\cdot\mathbf{r}_\mu]\, \exp(-2\pi i[\mathbf{k} + 2\mathbf{q} - (m+m')\mathbf{q} - (n+n')2\mathbf{q}]\cdot\mathbf{t}) \big] \\
&= N \sum_{\mathbf{q}}\sum_{\mu}\sum_{m,m',n,n'} f_\mu^{\mathrm{av}}\, \exp[-i(m+n)\pi/2]\, \exp\{-2\pi i[\mathbf{k} - (m+m')\mathbf{q} - (n+n')2\mathbf{q}]\cdot\mathbf{r}_\mu\} \\
&\quad\times J_m\big(2\pi\mathbf{k}\cdot\mathrm{Re}\{\mathbf{e}_\mu(\mathbf{q})\}\big)\, J_{m'}\big(2\pi\mathbf{k}\cdot\mathrm{Im}\{\mathbf{e}_\mu(\mathbf{q})\}\big)\, J_n\big(2\pi\mathbf{k}\cdot\mathrm{Re}\{\mathbf{e}_\mu(2\mathbf{q})\}\big)\, J_{n'}\big(2\pi\mathbf{k}\cdot\mathrm{Im}\{\mathbf{e}_\mu(2\mathbf{q})\}\big) \\
&\quad\times \sum_{\mathbf{G}} \big\{ \delta([\mathbf{k} - (m+m')\mathbf{q} - (n+n')2\mathbf{q}] - \mathbf{G}) \\
&\qquad + a_\mu(\mathbf{q})\exp[2\pi i\,\mathbf{q}\cdot\mathbf{r}_\mu]\, \delta([\mathbf{k} - \mathbf{q} - (m+m')\mathbf{q} - (n+n')2\mathbf{q}] - \mathbf{G}) \\
&\qquad + a_\mu(\mathbf{q})^{*}\exp[-2\pi i\,\mathbf{q}\cdot\mathbf{r}_\mu]\, \delta([\mathbf{k} + \mathbf{q} - (m+m')\mathbf{q} - (n+n')2\mathbf{q}] - \mathbf{G}) \\
&\qquad + a_\mu(2\mathbf{q})\exp[4\pi i\,\mathbf{q}\cdot\mathbf{r}_\mu]\, \delta([\mathbf{k} - 2\mathbf{q} - (m+m')\mathbf{q} - (n+n')2\mathbf{q}] - \mathbf{G}) \\
&\qquad + a_\mu(2\mathbf{q})^{*}\exp[-4\pi i\,\mathbf{q}\cdot\mathbf{r}_\mu]\, \delta([\mathbf{k} + 2\mathbf{q} - (m+m')\mathbf{q} - (n+n')2\mathbf{q}] - \mathbf{G}) \big\}
\end{aligned} \tag{4}$$
Note that the second-order modulation wave amplitudes associated with the individual secondary modulation wave-vectors 2q must necessarily each be small so that diffuse intensity is either not observed, or only very weakly observed, at the G ± 2q positions of reciprocal space (to be consistent with the observed experimental diffraction evidence; see Figures 4a and 5). For the purposes of this derivation, the modulation wave amplitudes a_μ(2q) and e_μ(2q) were thus treated as being of second order with respect to the already weak primary modulation wave amplitudes a_μ(q) and e_μ(q), respectively. Given that the observed intensity at (G ± 2q) is either zero or very weak in most cases and that, from substitution into Eq. (4),

$$F(\mathbf{k} = \mathbf{G} + 2\mathbf{q}) = N \sum_\mu f_\mu^{\mathrm{av}} \exp[-2\pi i\,\mathbf{G}\cdot\mathbf{r}_\mu] \Big\{ a_\mu(2\mathbf{q}) - \tfrac{i}{2}\big(2\pi[\mathbf{G}+2\mathbf{q}]\cdot\mathbf{e}_\mu(2\mathbf{q})\big) - \tfrac{1}{8}\big(2\pi[\mathbf{G}+2\mathbf{q}]\cdot\mathbf{e}_\mu(\mathbf{q})\big)^2 - \tfrac{i}{2}\, a_\mu(\mathbf{q})\big(2\pi[\mathbf{G}+2\mathbf{q}]\cdot\mathbf{e}_\mu(\mathbf{q})\big) + \cdots \Big\},$$

corresponding to a fourth-order contribution (in terms of a_μ(q) and e_μ(q)) to the intensity at (G + 2q), it is logical to include only up to third-order terms in a Taylor series–type expansion of F(k = G + q). Expanding Eq. (4) for F(k = G + q) up to third order in the modulation wave eigenvectors a_μ(q) and e_μ(q) then leads to the final structure factor expression:
$$\begin{aligned}
F(\mathbf{k} = \mathbf{G} + \mathbf{q}) &= N \sum_\mu f_\mu^{\mathrm{av}} \exp[-2\pi i\,\mathbf{G}\cdot\mathbf{r}_\mu] \Big\{ a_\mu(\mathbf{q}) - \tfrac{i}{2}\big(2\pi[\mathbf{G}+\mathbf{q}]\cdot\mathbf{e}_\mu(\mathbf{q})\big) - \tfrac{1}{4}\, a_\mu(\mathbf{q})\,\big|2\pi[\mathbf{G}+\mathbf{q}]\cdot\mathbf{e}_\mu(\mathbf{q})\big|^2 \\
&\quad + \tfrac{i}{16}\,\big|2\pi[\mathbf{G}+\mathbf{q}]\cdot\mathbf{e}_\mu(\mathbf{q})\big|^2 \big(2\pi[\mathbf{G}+\mathbf{q}]\cdot\mathbf{e}_\mu(\mathbf{q})\big) - \tfrac{1}{8}\, a_\mu(\mathbf{q})^{*} \big(2\pi[\mathbf{G}+\mathbf{q}]\cdot\mathbf{e}_\mu(\mathbf{q})\big)^2 \\
&\quad - \tfrac{1}{4}\,\big(2\pi[\mathbf{G}+\mathbf{q}]\cdot\mathbf{e}_\mu(\mathbf{q})^{*}\big) \big(2\pi[\mathbf{G}+\mathbf{q}]\cdot\mathbf{e}_\mu(2\mathbf{q})\big) - \tfrac{i}{2}\, a_\mu(2\mathbf{q})\,\big(2\pi[\mathbf{G}+\mathbf{q}]\cdot\mathbf{e}_\mu(\mathbf{q})^{*}\big) + \cdots \Big\}
\end{aligned} \tag{5}$$
This is the fundamental equation governing the scattering from a disordered modulated structure and is used throughout this chapter to understand and interpret important qualitative and quantitative features of structured diffuse intensity distributions. In most instances, however, it is necessary to include the expansion in Eq. (5) only up to first order (i.e., only the first two terms of the above expression).
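As a consistency check on Eq. (5), its first-order form can be compared against a brute-force kinematic summation for a toy one-dimensional modulated crystal. The sketch below is illustrative only; the chain length, wave-vector, and (small) modulation amplitudes are assumptions of the example, with the amplitude conventions chosen to reproduce the first two terms of Eq. (5):

```python
import numpy as np

N = 4096                 # unit cells in a 1D model crystal (lattice constant 1)
t = np.arange(N)         # cell origins; one atom per cell at r_mu = 0
q = 0.1373               # an incommensurate primary modulation wave-vector
f_av, a_q, e_q = 1.0, 0.02, 0.01   # small compositional/displacive amplitudes

# Conventions chosen so that the first-order prediction is Eq. (5):
# f(t) = f_av (1 + a_q e^{2 pi i q t} + c.c.),  u(t) = Re{ e_q e^{2 pi i q t} }.
f = f_av * (1.0 + 2.0 * np.real(a_q * np.exp(2j * np.pi * q * t)))
u = np.real(e_q * np.exp(2j * np.pi * q * t))

G = 5                    # an average-structure Bragg reflection
k = G + q                # first-order satellite position

F_exact = np.sum(f * np.exp(-2j * np.pi * k * (t + u))) / N   # direct sum
F_first = f_av * (a_q - 0.5j * (2 * np.pi * k * e_q))         # Eq. (5), 1st order

print(F_exact, F_first)  # agree to within a few percent; the residual
                         # corresponds to the higher-order terms of Eq. (5)
```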
III. APPLICATIONS OF THE MODULATION WAVE APPROACH

A. Transverse and Longitudinal Polarization

Given the above structure factor expression, a number of qualitative, but nonetheless very important, conclusions often can be drawn from
experimental EDPs. For example, when strong azimuthal (angular) variation in the intensity of an observed diffuse distribution occurs (see the [001] zone axis EDP of ThAsSe shown in Figure 4a or the [130] zone axis EDP of CaCu3Ti4O12 (CCTO) in Figure 4b, particularly at large G toward the edges of these patterns, where the relative contribution of the first-order displacive to compositional component of the structure factor F(k = G + q) is significantly enhanced and where the effects of multiple scattering are much reduced), then the major contribution to the observed diffuse intensity necessarily arises from displacive disorder. (Note, however, that this does not mean that the observed distribution cannot have a compositional origin; see, for example, Brink et al., 2002.) Furthermore, the relationship between the displacement eigenvector of the most heavily displaced atom(s), e_μ(q), and the modulation wave-vector itself, q, can be readily deduced (see Figure 6).
FIGURE 6 (a) A transverse modulated (i.e., e_μ(q) perpendicular to q) square, single-atom array on the left and its corresponding diffraction pattern on the right, and (b) a longitudinally modulated (i.e., e_μ(q) parallel to q) square, single-atom array on the left and its corresponding diffraction pattern on the right. The modulation wave-vector q is the same in both (a) and (b). Adapted from Harburn et al., 1975. Reproduced with permission from F. J. Brink.
If e_μ(q) is, for example, perpendicular to q, the modulation is said to be transverse polarized (in just the same way that an electromagnetic wave is transverse polarized, because the fluctuating electric and/or magnetic field direction involved is perpendicular, or transverse, to the wave direction itself; see Figure 6a), whereas the modulation is said to be longitudinally polarized if e_μ(q) is parallel to q (in just the same way that a sound wave is longitudinally polarized, because the fluctuating pressure or density wave involved is parallel to the wave direction itself; see Figure 6b). Note that F(G + q) is experimentally zero whenever (G + q) is perpendicular to e_μ(q) in Figure 6 (along the vertical direction in Figure 6a and along the horizontal direction in Figure 6b). This is a direct consequence of the structure factor expression given in Eq. (5), in that F_μ(G + q) ∝ 2π[G + q]·e_μ(q) clearly goes to zero whenever (G + q) is perpendicular to e_μ(q), just as the calculated diffraction patterns on the right-hand side of Figure 6 show. The same type of resultant reciprocal space intensity distribution holds true when q is not an isolated individual primary modulation wave-vector (as in Figure 6) but rather part of an essentially continuous diffuse intensity distribution (as in Figure 4). Thus, the G ± ∼0.14<110>* ± ε<1, −1, 0>* (ε continuous) “lines” of diffuse streaking (running along the [1, −1, 0]* and [110]* directions of reciprocal space, respectively, around the nominally n-glide forbidden hk0, h + k odd reflections) in Figure 4a necessarily arise from transverse polarized [110] and [1, −1, 0] displacements, respectively (for further details, see Withers et al., 2004). Similarly, the strong transverse polarized azimuthal intensity variation of the characteristic diffuse streaking running perpendicular to c* through the hkl, l even Bragg reflections of CCTO in Figure 4b requires the atomic displacements responsible to run perpendicular to the diffuse streaking itself (i.e., along the [001] direction; for more details, see Liu et al., 2005).
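The geometry of Figure 6 is easy to reproduce numerically. The sketch below (illustrative only; the array size, wave-vector, and amplitude are arbitrary choices, not values taken from the chapter) builds transverse and longitudinally modulated single-atom arrays and computes their kinematic intensities on a grid of k-points:

```python
import numpy as np

# Transverse vs. longitudinal displacive modulation of a square single-atom
# array (cf. Figure 6); array size, q, and amplitude are arbitrary choices.
n = 32
ix, iy = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
q = np.array([0.23, 0.11])                  # modulation wave-vector (r.l.u.)
phase = 2 * np.pi * (q[0] * ix + q[1] * iy)

amp = 0.08
e_T = amp * np.array([-q[1], q[0]]) / np.linalg.norm(q)  # e(q) perpendicular to q
e_L = amp * q / np.linalg.norm(q)                        # e(q) parallel to q

def intensity(e):
    """Kinematic |F(k)|^2 on a k-grid for atoms at (ix, iy) + e cos(phase)."""
    x = ix + e[0] * np.cos(phase)
    y = iy + e[1] * np.cos(phase)
    h = np.linspace(-3, 3, 97)              # k-grid in reciprocal lattice units
    H, K = np.meshgrid(h, h, indexing="ij")
    F = np.zeros_like(H, dtype=complex)
    for X, Y in zip(x.ravel(), y.ravel()):  # direct sum over all atoms
        F += np.exp(-2j * np.pi * (H * X + K * Y))
    return np.abs(F) ** 2

I_T, I_L = intensity(e_T), intensity(e_L)
# In I_T the G +/- q satellites vanish wherever (G + q) is perpendicular to
# e(q); in I_L they vanish along the complementary direction.  View with,
# e.g., matplotlib.pyplot.imshow to reproduce the contrast of Figure 6.
```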
B. Extinction Conditions

The structure factor expression given in Eq. (5) also provides the basis for understanding the existence of extinction conditions in experimentally observed diffuse distributions, both symmetry induced and accidental (see, e.g., Withers, 2005). Consider, for example, the [130] zone axis EDP of CCTO shown in Figure 4b. Note the G ± ε[6, −2, 0]* (ε continuous) diffuse streaking that runs, to a very good approximation, through only the G = [hkl]*, l even, parent reflections. (Similar <130>-type zone axis EDPs to Figure 4b [Liu et al., 2005] show that this q = ε[6, −2, 0]* diffuse streaking is only a part of essentially continuous (001)* sheets of diffuse intensity running through the [hkl]*, l even, but not l odd, parent reflections.) Furthermore, note that the
intensity of this G ± ε[6, −2, 0]* diffuse streaking in Figure 4b goes precisely to zero along the exact G = [6, −2, 0]* direction of reciprocal space itself. As discussed in Section III.A, this observation requires that the atomic displacements responsible for the diffuse streaking in Figure 4b and the overall {001}* sheets of diffuse intensity necessarily run along the <001> real-space directions (see, e.g., the Ti shifts indicated by arrows in Figure 7b). Finally, note that the diffuse streaking disappears altogether at the exact <001> zone axis orientation (see Figure 7a). How can these experimental observations be rationalized or understood? First, CCTO has a nonpolar average structure space group symmetry of Im−3 (shown in Figure 7b). Now imagine breaking the observed (001)* sheet of diffuse intensity into individual modulation waves characterized by modulation wave-vectors of the type q = αa* + βb* and then treating each of these primary modulation waves as completely independent of one another (as described in Section II). The structural description problem then becomes effectively equivalent to dealing with a series of uncoupled or independent (3 + 1)-d incommensurately modulated structures (IMSs), one for each modulation wave-vector making up the observed (001)* sheet of diffuse intensity.
FIGURE 7 (a) An <001> zone axis EDP of Im−3 CaCu3Ti4O12 (CCTO). Note the complete absence of diffuse streaking at this zone axis orientation. (b) A unit cell of the Im−3 CCTO average structure in projection along a <100> direction. The corner-connected TiO6 octahedra are shown in blue; the Ca ions are represented by the large pink balls; the Ti and Cu ions are shown by the medium-sized blue and green balls, respectively; and the O ions are indicated by the small red balls. The black arrows show the correlated Ti shifts along one of the <001> directions (correlated along this particular <001> row direction but not from one such <001> row to the next) responsible for the structured diffuse scattering observed in Figure 4b. (See color plate.)
For each of these individual IMSs, there are only two symmetry operations of the parent m−3 point group symmetry that map the associated q = αa* + βb* into itself: the identity E and reflection in a mirror plane perpendicular to c (i.e., m_z). The so-called little co-group of a general such modulation wave-vector q is then G_q = {E, m_z} (see Bradley and Cracknell, 1972). There are only two possible irreducible representations (irreps) associated with such a modulation wave-vector: the totally symmetric representation and a second irrep with a character of −1 under the parent symmetry operation {m_z|0}. For satellite reflections of the type [hk0]* + q to be symmetry forbidden (and hence missing from EDPs such as Figures 7a and 4b), the sole requirement is that the irrep associated with each individual modulation wave-vector q = αa* + βb* (making up the overall (001)* sheet of diffuse intensity) transforms according to this second irrep. An alternative, but equivalent, way to state this is to say that the (3 + 1)-d superspace group symmetry associated with any individual IMS should be I112/m(α, β, 0)0s (a nonstandard setting of the superspace group B112/m(α, β, 0)0s, Number 12.2 in Table 9.8.3.3.5 of Janssen et al., 1995). In addition to the centering condition F(hklm) = 0 unless h + k + l is even, this superspace group also ensures that F(hk0m) = 0 for m = 1, just as is required and observed experimentally (see Figures 4b and 7a). This then explains the absence of diffuse intensity in the [001] zone axis EDP of Figure 7a.
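The little co-group statement can be checked by brute force. In the sketch below (illustrative only; the signed-permutation-matrix construction of m−3 and the particular test wave-vector are assumptions of this example), the 24 operations of m−3 are enumerated and those fixing a general q = αa* + βb* are picked out:

```python
import numpy as np
from itertools import product

# Enumerate the point group m-3 (Th, order 24) as signed permutation matrices:
# the 12 proper rotations of T (cyclic permutations with det = +1) plus their
# products with the inversion.
cyclic = [np.eye(3)[list(p)] for p in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]]
T = [np.diag(s) @ P for P in cyclic
     for s in product([1, -1], repeat=3)
     if np.linalg.det(np.diag(s) @ P) > 0]
Th = T + [-R for R in T]                       # 24 operations in total

q = np.array([0.23, 0.11, 0.0])                # a general q = alpha a* + beta b*
little_cogroup = [R for R in Th if np.allclose(R @ q, q)]
print(len(little_cogroup))                     # 2: the identity E and m_z
for R in little_cogroup:
    print(R.astype(int))
```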
However, it does not explain the fact that the G ± ε[6, −2, 0]* diffuse streaking in Figure 4b runs only through the [hkl]*, l even (and not l odd) parent reflections. To understand this, it is necessary to return to the first-order displacive form of Eq. (5), that is, to the expression

$$F(\mathbf{G} + \mathbf{q}) = N \sum_\mu f_\mu^{\mathrm{av}} \exp[-2\pi i\,\mathbf{G}\cdot\mathbf{r}_\mu] \times \Big\{-\tfrac{i}{2}\big(2\pi[\mathbf{G}+\mathbf{q}]\cdot\mathbf{e}_\mu(\mathbf{q})\big)\Big\}.$$

From the mathematical form of this structure factor expression, it is apparent that the observed F([hkl]* + [αβ0]*) = 0 unless l is even pseudo-extinction condition (see Figure 4b) requires that two atoms separated by ½c displace by the same amount along the c axis, i.e., that their displacement eigenvectors e_μ(q = [αβ0]*) be equal. The only candidate atoms that can satisfy this requirement are the Ti atoms (see Figure 7b). Consider, for example, two such Ti atoms: Ti1 at r_Ti1 = [¼, ¼, ¼] and Ti2 at r_Ti2 = [¼, ¼, ¾]. Suppose that e_Ti2 (at r_Ti2 = [¼, ¼, ¾]) = +e_Ti1 (at r_Ti1 = [¼, ¼, ¼]) = e_Ti c for all q = [αβ0]*. Substitution into the above expression shows that
$$F(\mathbf{G} + \mathbf{q}) \propto \exp[-i\pi(h + k + l)/2]\,\{1 + \exp[-i\pi l]\} \times \big(2\pi[\mathbf{G}+\mathbf{q}]\cdot e_{\mathrm{Ti}}\,\mathbf{c}\big) = 0 \quad \text{unless } l \text{ is even}.$$

There are four such pairs of Ti atoms per parent unit cell (see Figure 7b). Provided the atoms in each pair obey the constraint that they displace
by the same amount along c, the Ti contribution to the observed extinction condition will continue to be obeyed. Conversely, if the Ca, Cu, or O atoms displaced in a similarly correlated fashion, the observed extinction condition would no longer be obeyed. Likewise, if the Ti atoms displaced perpendicular to c in opposite directions (as allowed by the second irrep) in a correlated fashion, the observed extinction condition would again no longer be obeyed. Thus, the observation of this well-obeyed extinction condition (see Figure 4b) not only allows us to determine that e_Ti2 = +e_Ti1 = e_Ti c for all q = [αβ0]* but also that the Ca, Cu, and O ions do not displace at all. In the case of CCTO, this is of some interest, as these <001> displacements of the Ti ions (correlated along the <001> directions themselves but uncorrelated from any one such column to the next in the transverse direction) effectively set up one-dimensional dipole moments and show that CCTO is an incipient ferroelectric (Liu et al., 2005). The existence of extinction conditions (whether absolute in symmetry terms or not) in diffuse distributions can thus clearly provide great insight into the origin of the structural disorder responsible (whether static or dynamic in origin; see Aroyo et al., 2002a,b; Pérez-Mato et al., 1998; van Tendeloo, 1998). It also emphasizes the links between disorder and the crystallography of modulated structures. In certain favorable cases, as seen above, it is even possible to virtually solve for the real-space origin of an observed diffuse distribution from such extinction conditions.
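The parity factor derived above reduces to a one-line numerical check. A minimal sketch (illustrative only; the h and k values are arbitrary):

```python
import numpy as np

# Two Ti atoms separated by c/2 and displaced equally along c give
# F(G+q) proportional to exp[-i pi (h+k+l)/2] * (1 + exp[-i pi l]),
# which vanishes for l odd, as claimed.
def ti_pair_factor(h, k, l):
    return np.exp(-1j * np.pi * (h + k + l) / 2) * (1 + np.exp(-1j * np.pi * l))

for l in range(4):
    print(l, abs(ti_pair_factor(2, 1, l)))   # ~2 for l even, 0 for l odd
```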
C. Planar Diffuse Absences

Apparent extinction conditions in diffuse distributions can, however, also arise for other reasons. For example, it is surprisingly often true that a highly structured diffuse intensity distribution is clearly visible when tilted only slightly away from some major zone axis orientation but disappears entirely when tilted to the exact zone axis orientation (see Figure 8; see also Welberry, 1986; Liu et al., 2003; Butler et al., 1992). While the absence of diffuse intensity in the <001> zone axis EDP of CCTO in Figure 7a (as well as the hyperglide plane–type extinction condition exhibited by the diffuse distribution in Figure 4a; see Withers et al., 2006) can be explained in terms of superspace symmetry arguments (as outlined in Section III.B), the same cannot be true for the [1, 1, −2] and [111] zone axis EDPs of Bi1.89GaSbO6.84 shown in Figure 8. The reason is that the average structure space group symmetry of this disordered pyrochlore is Fd-3m (Ismunandar et al., 1999), and hence there is no mirror plane perpendicular to either [1, 1, −2] or [111] in reciprocal space (i.e., the little co-group of a general modulation wave-vector in the zero-order Laue zones of Figure 8a or 8b consists only of the identity operation). The sudden disappearance of the structured diffuse scattering at the exact
FIGURE 8 (a) [1, 1, −2] and (b) [111] zone axis electron diffraction patterns (EDPs) of the disordered Fd-3m, Bi-based pyrochlore Bi1.89GaSbO6.84 (see Ismunandar et al., 1999, for details of the average structure). Note the complete absence of diffuse streaking in either EDP at the exact zone axis orientation. EDPs taken tilted only a few degrees away from the exact [1, 1, −2] and [111] zone axis orientations are shown in (c) and (d), respectively.
zone axis orientations in Figure 8 therefore cannot be attributed to any symmetry condition. Rather, such planar diffuse absences require that there is simply no modulation (either compositional and/or displacive) in projection down the exact zone axis orientations. Why should this be the case? The most likely explanation is that it avoids energetically costly macroscopic strain along the polyhedrally closest-connected <110>, <112>, and <111> directions (see Figure 9). Any compositional modulation (e.g., of the occupancy of the O′ ions in the red O′Bi4 tetrahedra or of the relative Ga/Sb occupancy of the green Ga1/2Sb1/2O6 octahedra) characterized by a modulation wave-vector exactly perpendicular to [1, 1, −2], for example, automatically creates rows of tetrahedra or octahedra along that same [1, 1, −2] direction (see Figure 9a) that contain more or less of either the O′ vacancies or of the Ga ions relative to the Sb ions.
FIGURE 9 The disordered average structure of Bi1.89GaSbO6.84 ≡ (O′0.84Bi1.89)·(GaSbO6) shown in projection down (a) a [1, −1, 0] direction and (b) a [1, 1, −2] direction. The [1, 1, −2] direction is horizontal and the [111] direction vertical in (a), whereas the [1, −1, 0] direction is horizontal and the [111] direction vertical in (b). The O′0.84Bi1.89 tetrahedral substructure, built of O′Bi4 tetrahedra, is shown in red and the (GaSb)O6 octahedral substructure, built of Ga1/2Sb1/2O6 octahedra, in green. The average structure unit cell is outlined in each case. (See color plate.)
The required expansion or contraction in size of the O′Bi4 tetrahedra or Ga1/2Sb1/2O6 octahedra that must necessarily accompany any such compositional modulation (i.e., the displacive size effect relaxation; see Welberry, 1986; Butler et al., 1992; and Section III.D) would therefore be different from one [1, 1, −2] row of polyhedra to the next and hence would necessarily lead to macroscopic strain. As such, it does not occur (i.e., it has zero amplitude, as shown experimentally by Figure 8). The existence of “dark planes” in the midst of diffuse intensity distributions has only rarely been noticed, let alone reported, and hence the theoretical understanding of the phenomenon is not as well developed as other areas of diffuse scattering theory. Nonetheless, it is clear that the avoidance of macroscopic strain along high-symmetry, closest-contact directions of the atoms or constituent polyhedra of the underlying average structure plays an important role in the crystal chemistry underlying the existence of these planar diffuse absences (see Welberry, 1986; Butler et al., 1992).
D. Size Effect

From the diffraction point of view, the size effect refers to the marked transfer of intensity from regions on one side of average structure Bragg
reflections to the other. It is usually attributed to a mixed compositional and displacive contribution to the scattered intensity arising from the displacive relaxations accompanying compositional ordering (Warren et al., 1951; Borie and Sparks, 1971; Welberry, 1986; Welberry and Butler, 1994) and was first described in the context of disordered binary alloy phases (Warren et al., 1951). The same effect is commonly observed in a range of other types of disordered phases. For example, Figure 10c shows an <001>-type zone axis EDP typical of the disordered Bi-based pyrochlore (Bi1.5Zn0.5)(Ti1.5Nb0.5)O7 ≡ (O′Bi1.5Zn0.5)·(Ti1.5Nb0.5O6) (BZNT). In addition to the strong, sharp Bragg reflections of the underlying Fd-3m, pyrochlore-type average structure, note the presence of characteristic “blobs” of additional diffuse intensity (verging on additional satellite reflections) at the G ± <001>* regions of reciprocal space, whose intensity is systematically stronger on the low-angle sides of the associated parent Bragg reflection than on the high-angle sides (compare, for example, the relative intensities of the “satellite” reflections surrounding the [−8, 8, 0]* parent Bragg reflection marked by the white square box in Figure 10c). These G ± <001>* satellite reflections arise from compositional Bi/Zn ordering on the (O′Bi1.5Zn0.5) tetrahedral substructure (see Figure 10a) of the pyrochlore average structure type (shown in a different context in Figure 9). As a result of local crystal chemical considerations (see Liu et al., 2006, for the details), the O′ ions move out of the center of each tetrahedron toward the Zn ions, which in turn induces a coupled size effect–like relaxation of the surrounding Bi and Zn ions (shown in Figure 10b). Without this size effect–like displacive relaxation (corresponding to a systematic expansion of the nearest-neighbor (nn) Bi–Bi distances and a corresponding systematic reduction of the nn Bi–Zn distances) accompanying the Bi/Zn compositional ordering, the characteristic size effect–induced transfer of intensity from the high- to the low-angle side of the neighboring Bragg reflections is missing. With this displacive relaxation included, however, the observed intensity redistribution is well described (compare the simulated [001] zone axis EDP of Figure 10d with the experimental one shown in Figure 10c; for more details see Liu et al., 2006).
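The essence of the size effect is easily demonstrated in one dimension. The sketch below (illustrative only; the chain length, scattering factors, and strain parameter are arbitrary assumptions, not values taken from the BZNT study) relaxes the bond lengths of a random binary chain according to the local pair species and shows the resulting asymmetric redistribution of diffuse intensity about a Bragg position:

```python
import numpy as np

rng = np.random.default_rng(0)
N, x_B = 2000, 0.5                      # chain length, B concentration
f_A, f_B = 1.0, 2.0                     # scattering factors (illustrative)
eps = 0.04                              # size-effect strain: A-A bonds dilate,
                                        # B-B bonds contract, A-B stay average

def one_pattern(k):
    occ = rng.random(N) < x_B           # True = B atom
    f = np.where(occ, f_B, f_A)
    # Bond length between neighbors t and t+1 depends on the pair species.
    pair = occ[:-1].astype(int) + occ[1:].astype(int)   # 0=AA, 1=AB, 2=BB
    bond = 1.0 + eps * (pair == 0) - eps * (pair == 2)
    x = np.concatenate([[0.0], np.cumsum(bond)])        # relaxed positions
    F = np.exp(-2j * np.pi * np.outer(k, x)) @ f
    return np.abs(F) ** 2 / N

k = np.linspace(2.5, 3.5, 801)          # scan across the G = 3 Bragg position
I = sum(one_pattern(k) for _ in range(50)) / 50         # realization average

# The diffuse scattering is asymmetric about G = 3: intensity is transferred
# from one side of the Bragg peak to the other, the classic size effect.
lo = I[(k > 2.6) & (k < 2.95)].sum()
hi = I[(k > 3.05) & (k < 3.4)].sum()
print(lo, hi)                           # the two sides differ markedly
```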
E. Effects of Multiple Scattering and How to Minimize Them

It should be evident that much useful information regarding the nature of the structural disorder giving rise to structured diffuse distributions can be extracted via the use of the purely kinematic structure factor expression given in Eq. (5). The possibility of multiple scattering obscuring this information when using electron diffraction, however, should never be ignored. EDPs such as Figures 4a and 5b demonstrate clearly that the theoretical possibility of multiple scattering involving more than one
FIGURE 10 (a) The Bi/Zn ordered (O′Bi1.5Zn0.5) tetrahedral substructure of BZNT responsible for the G ± <001>* satellite reflections apparent in the <001> zone axis EDP of BZNT shown in (c). The Bi ions are represented by the large pink balls and the Zn ions by the smaller blue balls. The O′ ions are at the centers of the O′Bi3Zn tetrahedra in (a). (b) The size effect–relaxed Bi/Zn distribution. (d) The simulated EDP corresponding to (b). For more details, see Liu et al., 2006. (See color plate.)
individual modulation wave-vector on the primary diffuse distribution can safely be ignored, but the possibility of a redistribution of the kinematic primary diffuse distribution by strongly excited parent Bragg reflections usually cannot. For example, Figure 11 shows (a) close to [001], (b) [0, −1, 3], and (c) [001] zone axis EDPs of the O/F-disordered oxyfluoride FeOF. In the case of Figure 11a and, in particular, Figure 11b, note the presence of a clear pseudo-extinction condition, in that the observed diffuse intensity only runs through the parent hkl, h + k + l odd reflections. In the case of Figure 11c, however, the observed diffuse intensity runs through both the hkl, h + k + l odd and even reflections. This is clearly due to a redistribution of the rather more kinematic diffuse intensity distribution apparent in Figure 11a by the significantly more strongly excited h + k + l odd parent Bragg reflections at the exact zone axis orientation, leading to an apparent break in the pseudo-extinction condition (for a real-space description of the structural origin of the pseudo-extinction condition itself, see Brink et al., 2000; Withers, 2005). By tilting somewhat off-axis (as in Figure 11a), this intensity redistribution effect is significantly reduced and the kinematic pseudo-extinction condition again becomes apparent.
FIGURE 11 (a) Close to [001], (b) [0, −1, 3], and (c) [001] zone axis electron diffraction patterns of the disordered oxyfluoride FeOF. For details, see Brink et al., 2000.
A most important way to avoid missing the existence of such pseudo-extinction conditions (and the critical insight they often provide into the structural distortions responsible; see Section III.B) is to minimize the dynamical redistribution of the kinematic diffuse intensity by deliberately taking EDPs a few degrees off-axis, so as to keep the low-order parent Bragg reflections causing the intensity redistribution as weak as possible. An alternative method is to take EDPs at more minor zone axis orientations, where the effect of multiple scattering from parent Bragg reflections can often be significantly reduced (compare the [0, −1, 3] zone axis EDP of FeOF in Figure 11b with the exact [001] zone axis EDP shown in Figure 11c). In practice, it is never possible to entirely remove the effects of multiple scattering by such means. Provided that the probability of intensity redistribution via multiple scattering is duly noted, however, it is usually possible to extract the desired information from experimental EDPs by application of one of the above approaches.
IV. SELECTED CASE STUDIES

A. Compositionally "Disordered," NaCl-Related, Solid-Solution Phases

Many compositionally disordered solid-solution phases that have an NaCl-type average structure (such as, for example, LiFeO2 (Brunel et al., 1972), the widely substoichiometric transition metal carbide and nitride (MC1−x and MN1−x) solid-solution phases (Billingham et al., 1972; Brunel et al., 1972; Sauvage and Parthé, 1972), or the wide-range nonstoichiometric $(1-x)\mathrm{M}^{2+}\mathrm{S}\cdot x\mathrm{Ln}^{3+}_{2/3}\mathrm{S}$ (M = Mg, Ca, Mn; Ln = a rare earth ion or Y) solid-solution phases (Flahaut, 1979; Withers et al., 1994a, 2007)) exhibit a highly structured and characteristic diffuse intensity distribution, accompanying the strong average structure Bragg reflections, that is well described by the relatively simple expression $\cos \pi h + \cos \pi k + \cos \pi l = 0$ (Billingham et al., 1972; Sauvage and Parthé, 1972, 1974; de Ridder et al., 1976a,b, 1977a,b; Withers et al., 2003, 2007). For example, Figure 12 shows (a) <001>, (b) <110>, and (c) <3, 0, −1> zone axis EDPs typical of the disordered $(1-x)\mathrm{Mg}^{2+}\mathrm{S}\cdot x\mathrm{Yb}^{3+}_{2/3}\mathrm{S}$, 0 ≤ x ≤ 0.45, solid-solution phase for x = 0.30. Figures 12d, e, and f show the equivalent calculated sections through the diffuse surface $\cos \pi h + \cos \pi k + \cos \pi l = 0$. Clearly there is very good agreement between the experimental EDPs and the calculated sections. Why should this be so, and what is the real-space meaning of this reciprocal space relationship?

FIGURE 12 (a) <001>, (b) <110>, and (c) <3, 0, −1> zone axis EDPs typical of the disordered $(1-x)\mathrm{Mg}^{2+}\mathrm{S}\cdot x\mathrm{Yb}^{3+}_{2/3}\mathrm{S}$, 0 ≤ x ≤ 0.45, solid-solution phase for x = 0.30. Parts (d), (e), and (f) show the equivalent calculated sections through the diffuse surface $\cos \pi h + \cos \pi k + \cos \pi l = 0$ (from Sauvage and Parthé, 1972).

Rewriting the expression $\cos \pi h + \cos \pi k + \cos \pi l = 0$ in the form
$$\exp(2\pi i\,\mathbf{q}\cdot\tfrac{1}{2}\mathbf{a}) + \exp(-2\pi i\,\mathbf{q}\cdot\tfrac{1}{2}\mathbf{a}) + \exp(2\pi i\,\mathbf{q}\cdot\tfrac{1}{2}\mathbf{b}) + \exp(-2\pi i\,\mathbf{q}\cdot\tfrac{1}{2}\mathbf{b}) + \exp(2\pi i\,\mathbf{q}\cdot\tfrac{1}{2}\mathbf{c}) + \exp(-2\pi i\,\mathbf{q}\cdot\tfrac{1}{2}\mathbf{c}) = 0$$
(where $\mathbf{q} = h\mathbf{a}^* + k\mathbf{b}^* + l\mathbf{c}^*$, with $h$, $k$, and $l$ continuous) suggests that such a well-defined shape in reciprocal space implies a well-defined and strongly obeyed multibody
correlation (see Welberry, 2004) in real space, in this case a six-body correlation involving a central atom and the six surrounding atoms at $\pm\tfrac{1}{2}\mathbf{a}$, $\pm\tfrac{1}{2}\mathbf{b}$, and $\pm\tfrac{1}{2}\mathbf{c}$ with respect to the central atom. In the specific case of $(1-x)\mathrm{Mg}^{2+}\mathrm{S}\cdot x\mathrm{Yb}^{3+}_{2/3}\mathrm{S}$, x = 0.30 (i.e., for $\mathrm{Mg}_{0.7}\mathrm{Yb}_{0.2}\square_{0.1}\mathrm{S}$, where $\square$ indicates a vacancy, of defect NaCl average structure type), this real-space correlation amounts to a requirement that each S ion (at position $\mathbf{t}$) should be surrounded, as far as possible, by the average number of $\mathrm{Mg}^{2+}$ and $\mathrm{Yb}^{3+}$ ions (as well as vacancies) in its nearest-neighbor octahedral coordination polyhedron (see Sauvage and Parthé, 1972, 1974; Withers et al., 2007); that is, as far as possible there should always be 4.2 Mg ions, 1.2 Yb ions, and 0.6 vacancies in the nearest-neighbor octahedron of surrounding M sites (at $\mathbf{t}\pm\tfrac{1}{2}\mathbf{a}$, $\mathbf{t}\pm\tfrac{1}{2}\mathbf{b}$, and $\mathbf{t}\pm\tfrac{1}{2}\mathbf{c}$). In the language of the modulation wave approach, the sum
$$\delta f_M(\mathbf{t}+\tfrac{1}{2}\mathbf{a}) + \delta f_M(\mathbf{t}-\tfrac{1}{2}\mathbf{a}) + \delta f_M(\mathbf{t}+\tfrac{1}{2}\mathbf{b}) + \delta f_M(\mathbf{t}-\tfrac{1}{2}\mathbf{b}) + \delta f_M(\mathbf{t}+\tfrac{1}{2}\mathbf{c}) + \delta f_M(\mathbf{t}-\tfrac{1}{2}\mathbf{c}) = 2 f_M^{\mathrm{av}} \sum_{\mathbf{q}} a_M(\mathbf{q}) \exp(2\pi i\,\mathbf{q}\cdot\mathbf{t})\,\{\cos 2\pi\mathbf{q}\cdot\tfrac{1}{2}\mathbf{a} + \cos 2\pi\mathbf{q}\cdot\tfrac{1}{2}\mathbf{b} + \cos 2\pi\mathbf{q}\cdot\tfrac{1}{2}\mathbf{c}\}$$
should always equal zero regardless of the value of $\mathbf{t}$. The only way this can be true for all $\mathbf{t}$ is if $\cos 2\pi\mathbf{q}\cdot\tfrac{1}{2}\mathbf{a} + \cos 2\pi\mathbf{q}\cdot\tfrac{1}{2}\mathbf{b} + \cos 2\pi\mathbf{q}\cdot\tfrac{1}{2}\mathbf{c} = \cos\pi h + \cos\pi k + \cos\pi l = 0$, just as is observed experimentally.

Of course, at the local level it is not possible to have 4.2 Mg ions, 1.2 Yb ions, and 0.6 vacancies surrounding each S ion. What the constraint implies is that the local octahedral configurations closest in composition to this (i.e., $\mathrm{Mg}_4\mathrm{Yb}_1\square_1$, $\mathrm{Mg}_5\mathrm{Yb}_1\square_0$, and $\mathrm{Mg}_4\mathrm{Yb}_2\square_0$) must be heavily favored and occur most commonly. Note that the overall macroscopic stoichiometry results if only these three local configurations occur and if the relative proportions of their occurrence are 60%, 20%, and 20%, respectively (see Withers et al., 2007 for further details). The observed diffuse distribution is thus the characteristic signature in reciprocal space of the real-space condition that the smallest polyhedral building blocks of the average NaCl structure type (the SM6 octahedra) should each have, as far as possible, the same composition as the overall macroscopic composition ($\mathrm{SMg}_{4.2}\mathrm{Yb}_{1.2}\square_{0.6}$ per octahedron) in order to minimize substitutional strain. Such a local occupational ordering constraint is entirely reasonable from the local crystal chemical point of view (see Withers et al., 2007). It is also the basis of the so-called cluster expansion approach to compositionally disordered solid-solution phases, based originally on an idea of Pauling (1960) and subsequently expanded on, developed, and successfully applied by Brunel et al. (1972), Sauvage and Parthé (1972, 1974), and de Ridder et al. (1976a,b, 1977a,b) to a range of substitutional order/disorder problems.
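To make the arithmetic behind this statement explicit, the following short Python snippet (an illustrative check added here, not part of the original analysis; the configuration labels are ad hoc) verifies that the 60%/20%/20% mixture of the three favored octahedral configurations does indeed average to 4.2 Mg, 1.2 Yb, and 0.6 vacancies per SM6 octahedron, i.e., to the macroscopic $\mathrm{Mg}_{0.7}\mathrm{Yb}_{0.2}\square_{0.1}\mathrm{S}$ stoichiometry.

```python
# Consistency check: do the three favored local octahedral configurations,
# in the stated 60/20/20 proportions, average to the macroscopic composition
# Mg0.7 Yb0.2 vac0.1 (i.e., 4.2 Mg, 1.2 Yb, and 0.6 vacancies per 6-site
# octahedron)?

# Each configuration: (n_Mg, n_Yb, n_vacancy) over the 6 M sites of an SM6 octahedron
configs = {
    "Mg4Yb1vac1": (4, 1, 1),
    "Mg5Yb1vac0": (5, 1, 0),
    "Mg4Yb2vac0": (4, 2, 0),
}
weights = {"Mg4Yb1vac1": 0.60, "Mg5Yb1vac0": 0.20, "Mg4Yb2vac0": 0.20}

avg = [sum(weights[k] * configs[k][i] for k in configs) for i in range(3)]
print("average per octahedron (Mg, Yb, vac):", avg)       # -> [4.2, 1.2, 0.6]
print("site fractions:", [round(x / 6, 3) for x in avg])  # -> [0.7, 0.2, 0.1]
```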
The same approach has also been successfully applied to a range of other local polyhedral cluster shapes, including tetrahedra, cubes, and trigonal prisms. In each case, different diffuse intensity surfaces have been predicted (see Sauvage and Parthé, 1974; de Ridder et al., 1976a,b; Liu et al., 2006) and, in many cases, verified experimentally.
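Planar sections through such analytical diffuse surfaces, like those compared with experiment in Figures 12d–f, are straightforward to compute numerically. The following minimal Python sketch (an illustration added here, with arbitrary grid ranges; it is not the code used to produce Figure 12) contours the zero level of $\cos \pi h + \cos \pi k + \cos \pi l$ over an (hk0)-type section.

```python
# Illustrative sketch: compute a planar section through the diffuse surface
# cos(pi*h) + cos(pi*k) + cos(pi*l) = 0, e.g. the (hk0) section recorded at
# an <001> zone axis (cf. Figure 12d).
import numpy as np
import matplotlib.pyplot as plt

h, k = np.meshgrid(np.linspace(-2, 2, 801), np.linspace(-2, 2, 801))
l = 0.0  # fixed for the (hk0) section; change the plane for other sections

surface = np.cos(np.pi * h) + np.cos(np.pi * k) + np.cos(np.pi * l)

# The diffuse intensity lies on the zero-level contour of this function.
plt.contour(h, k, surface, levels=[0.0], colors="k")
plt.gca().set_aspect("equal")
plt.xlabel("h"); plt.ylabel("k")
plt.title("(hk0) section of cos(pi h) + cos(pi k) + cos(pi l) = 0")
plt.show()
```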
B. Inherently Flexible, Tetrahedrally Corner-Connected Framework Structures

Inherently flexible, tetrahedrally corner-connected framework structures, such as the quartz, cristobalite, and tridymite forms of silica (SiO2) and of AlPO4, or many members of the large family of microporous zeolitic aluminosilicates and zeotypic aluminophosphates (particularly those exhibiting apparent 180-degree T-O-T angles, where T = Al, Si, P, and so on; see Figure 13), represent another broad class of (displacively) disordered materials that often exhibit highly structured diffuse intensity distributions. Despite the inherent rigidity of the tetrahedral building blocks of these materials, they are often strongly polymorphic (see Pryde and Dove, 1998) and, in their high-temperature, high-symmetry polymorphic forms, usually dynamically disordered as a result of the simultaneous excitation of numerous, essentially zero-frequency, rigid unit modes (RUMs) of distortion (Dove et al., 1998, 2002) involving changes in the relative orientation of neighboring polyhedral units without distortion of either the shape or the size of the individual polyhedral units themselves (see Figure 14).
FIGURE 13 (a) [001] and (b) [110] projections of the disordered average crystal structure of the microporous aluminophosphate AlPO4-5. The larger AlO4 tetrahedra are light blue and the smaller PO4 tetrahedra dark blue in (b). The unit cell is shown in outline in both (a) and (b). Note the existence of apparent 180-degree Al-O-P angles along the z direction in (b). (See color plate).
FIGURE 14 (a) <111>, (b) <−3, 3, 1>, (c) <1, 1, −4>, and (d) [−1, 1, 0] zone axis electron diffraction patterns (EDPs) typical of (a) the β-cristobalite form of SiO2 above 270 °C, (b) SiO2-tridymite above 250 °C, (c) β-hexacelsian above 310 °C, and (d) room-temperature AlPO4-5.
The sharp, continuous diffuse intensity distributions observed in EDPs from materials of this type (see Figure 14) map the zero-energy-cost, or zero-frequency, RUM phonon modes of distortion of these various inherently flexible framework structures. For these materials, the relevant constraint giving rise to the observed structured diffuse distribution is not, in this case, a local compositional constraint but rather a requirement that the individual polyhedral units should not be distorted by the particular soft phonon mode involved. Because the oxygen ions in such framework structures typically link two tetrahedral units (see Figure 13), the only allowable soft modes are those involving necessarily coupled, correlated rotations and translations of neighboring, essentially rigid, tetrahedral units. The topological connectivity of the individual tetrahedral units involved is thus crucial in determining the allowed modulation wave vectors and hence the observed diffuse distribution.
The topological connectivity of the constituent SiO4 tetrahedral units in the case of the β-cristobalite form of SiO2, for example, gives rise to soft phonon modes with modulation wave vectors localized to reciprocal space directions perpendicular to the six <110> directions of real space (see Figure 14a; see also Withers, 2005). In other cases, however, the shape of the observed diffuse distribution is considerably more complex (see Figure 14b). In the case of the tridymite form of SiO2, for example, the topological connectivity of the constituent SiO4 tetrahedral units yields a curved diffuse intensity distribution whose analytical shape is well described by the expression $\sin^2 \pi l = \tfrac{8}{9}\{1 - \cos\pi h\,\cos\pi k\,\cos\pi(h+k)\}$ (see Withers, 2003 for the details). Regardless of the particular details involved in any one case, it is the topological connectivity of the tetrahedral units that always determines the allowed patterns of correlated polyhedral rotations and translations and hence the shapes of the resultant observed diffuse distributions. Such materials are fundamentally jellylike and unavoidably displacively disordered. A correct understanding of the local crystal chemistry and the fundamental physicochemical properties of such materials requires knowledge of this inherent coupled orientational and translational flexibility.
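The tridymite surface can be visualized in the same way. The sketch below (again an added illustration based on the stated expression, not the original authors' code) solves the expression for l as a function of h and k, mapping where the curved diffuse surface actually exists; the surface is only defined where the right-hand side does not exceed 1.

```python
# Illustrative sketch: solve the tridymite diffuse surface
# sin^2(pi*l) = (8/9) * {1 - cos(pi*h) cos(pi*k) cos(pi*(h+k))} for l(h, k).
import numpy as np
import matplotlib.pyplot as plt

h, k = np.meshgrid(np.linspace(-1, 1, 401), np.linspace(-1, 1, 401))
rhs = (8.0 / 9.0) * (
    1.0 - np.cos(np.pi * h) * np.cos(np.pi * k) * np.cos(np.pi * (h + k))
)

# The surface only exists where 0 <= rhs <= 1 (sin^2 cannot exceed 1).
l = np.where(rhs <= 1.0, np.arcsin(np.sqrt(np.clip(rhs, 0.0, 1.0))) / np.pi, np.nan)

plt.imshow(l, extent=(-1, 1, -1, 1), origin="lower")
plt.colorbar(label="l(h, k) on the diffuse surface")
plt.xlabel("h"); plt.ylabel("k")
plt.show()
```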
C. The Nonmagnetic Kondo Effect Material ThAsSe and the Role of the Fermi Surface

Low-dimensional materials susceptible to CDW-type structural instabilities, such as the nonmagnetic Kondo effect material ThAsSe, also often exhibit highly structured diffuse intensity distributions that effectively map the Fermi surface (FS) of the undistorted parent structure. For example, Figure 15a shows an [001] zone axis EDP of ThAsSe taken at 100 K, and Figure 15c shows the clearly closely related, calculated FS of ThAsSe in projection along c*. Additional EDPs taken on tilting away from this [001] zone axis orientation (see Withers et al., 2004, 2006, for details) show that the diffuse streaking in Figure 15a is, in fact, part of sheets of G ± ∼0.14<110>* ± ε<1, −1, 0>* + η[001]* (ε and η continuous) diffuse intensity running perpendicular to the two <110> directions of real space. The P4/nmm average structure of ThAsSe is shown in Figure 15b.

Temperature-dependent electron diffraction shows that the highly structured diffuse intensity distribution characteristic of ThAsSe at low temperature is the result of the gradual condensation of a 1D CDW and an associated pattern of As-As dimerization (shown in Figure 15d), correlated in 1D strings along the <110> directions but not correlated from one <110>
FIGURE 15 (a) An [001] zone axis electron diffraction pattern of ThAsSe taken at 100 K. (b) The P4/nmm average structure of ThAsSe (As ions are represented by the large blue balls, Se ions by the medium-sized red balls, and Th ions by the small black balls). (c) The calculated Fermi surface (FS) of ThAsSe in projection along c*. Some q-vectors spanning this FS are marked on (c). (d) The pattern of As-As dimerization responsible for the observed diffuse distribution (see Withers et al., 2006, for details). (See color plate).
string to the next. Electronic band structure calculations (see Withers et al., 2006) show that the ∼7 times periodicity of this As-As dimerization along each of the two <110> real-space directions is determined by the FS of the undistorted parent structure. The q-vectors, obtained by measuring the distances between the flattened parts of the corresponding FS, are in remarkably good accord with the values determined from the EDPs (compare Figure 15c with Figure 15a).
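The quoted ∼7 times periodicity follows directly from the measured modulation wave vector magnitude; the following trivial check (simple arithmetic added here for clarity, not part of the original text) makes the connection explicit.

```python
# Simple arithmetic illustration: a modulation wave vector of magnitude
# ~0.14 <110>* corresponds to a real-space repeat of roughly 1/0.14 ~ 7
# <110> d-spacings, i.e. the "~7 times periodicity" of the As-As
# dimerization quoted in the text.
q_magnitude = 0.14            # in units of the <110>* reciprocal lattice vector
repeat = 1.0 / q_magnitude    # modulation wavelength in units of d(110)
print(f"real-space repeat ~ {repeat:.1f} x d(110)")  # -> ~7.1
```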
D. Materials Susceptible to Ferroelastic Strain Distortions Such as α-PbO

The P4/nmm, layered, tetragonal α form of PbO ($a_t$ = 3.9719 Å, $c_t$ = 5.023 Å at room temperature, with the subscript t denoting the tetragonal parent structure; see Figure 16) undergoes a low-temperature, incommensurate phase transition in the vicinity of 208 K. The transition was first noticed because it is improper ferroelastic (Boher et al., 1985; Moreau et al., 1989; Withers et al., 1993; Withers and Schmid, 1994): the freezing in of an incommensurate $\mathbf{q} \approx 0.185(-\mathbf{a}_t^* + \mathbf{b}_t^*)$ displacive modulation on cooling below ∼208 K induces an orthorhombic strain distortion of the underlying tetragonal substructure, leading to a Cmma ($\mathbf{a} = \mathbf{a}_t + \mathbf{b}_t$, $\mathbf{b} = -\mathbf{a}_t + \mathbf{b}_t$, $\mathbf{c} = \mathbf{c}_t$) average structure for the low-temperature phase.
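For concreteness, the orthorhombic cell dimensions implied by this cell relation, and the real-space repeat implied by the incommensurate modulation wave vector, follow from simple geometry. The sketch below is an added illustration using the lattice parameters quoted above; it neglects the small orthorhombic strain itself, so |a| and |b| come out equal.

```python
# Simple geometric illustration: the Cmma cell a = at + bt, b = -at + bt,
# c = ct derived from the tetragonal alpha-PbO cell. The small orthorhombic
# strain below the transition is neglected, so |a| = |b| = at * sqrt(2).
import math

a_t, c_t = 3.9719, 5.023          # tetragonal lattice parameters in angstroms
a = b = a_t * math.sqrt(2.0)      # |at + bt| = |-at + bt| for a 90-degree cell
c = c_t
print(f"a = b ~ {a:.4f} A, c = {c:.3f} A")  # -> a = b ~ 5.6171 A

# The incommensurate modulation wavelength along <110>* follows from |q| ~ 0.185:
print(f"modulation repeat ~ {1/0.185:.1f} x d(110)")  # -> ~5.4
```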
FIGURE 16 (a) A close to <110> and (b) an [001] projection of the P4/nmm, tetragonal α form of PbO. The small red balls represent the O ions and the larger dark balls the Pb ions. Note that the O ions form square arrays in (001) planes and that each oxygen ion is tetrahedrally coordinated by Pb ions. (See color plate).
Moreau et al. (1989) reported the existence of a significant precursor diffuse background in neutron powder diffraction profiles of α-PbO at room temperature, above this low-temperature transition, but were unable to determine the distribution of this diffuse scattering in reciprocal space. Room-temperature [001] zone axis EDPs show the presence of extremely characteristic diffuse crosses centered on each Bragg reflection and running along the <110>t* directions of reciprocal space (see Figure 17a). EDPs taken on tilting away from [001] (see the [0, −1, 2] zone axis EDP in Figure 17b) show that the diffuse distribution is not confined to the <110>t* directions of reciprocal space but rather takes the form of quite narrow discs or ellipses of diffuse intensity perpendicular to the <110>t directions of real space; the diffuse intensity always runs along reciprocal space directions perpendicular to a <110>t real-space direction (e.g., along the [221]* and [−2, 2, 1]* directions of reciprocal space in Figure 17b). Note also that the corresponding real-space displacive modulations responsible are necessarily transverse polarized, as shown by the <−1, 1, 0>* systematic row EDP in Figure 17d (also see Section III.A).

This type of characteristic diffuse distribution is strongly reminiscent of the diffuse streaking present in many materials susceptible to a change in crystal class at a phase transformation, such as the partially ordered potassium feldspars (McLaren and Fitz Gerald, 1987), doped 1:2:3 oxide superconductors (Schmahl et al., 1989), and disordered semiconductor alloy phases (see Treacy et al., 1985). Its presence demonstrates that the room-temperature P4/nmm structure of α-PbO shown in Figure 16 is only a time- and space-averaged structure. Diffraction contrast images of these types of materials typically show a characteristic "tweed" microstructure. A rather similar tweed microstructure (exhibiting a minimum dimension of ∼20 Å) can also be observed in α-PbO, as shown in the ∼[001] zone axis room-temperature dark-field (DF) micrograph of Figure 17c. Intriguingly, this tweed microstructure displayed dynamical behavior in that bands of tweed contrast changed in appearance ("twinkled") on time scales of less than a second; that is, there is still some dynamical character to the instability at room temperature. In general, the narrowest bands of the tweed pattern changed contrast most rapidly, so that images recorded over typical exposure times of 5–10 seconds were dominated by the most stable, broad bands (see Withers et al., 1993, for more details).

FIGURE 17 Room-temperature (a) [001] and (b) [0, −1, 2] zone axis electron diffraction patterns (EDPs) of α-PbO. (d) shows a [−1, 1, 0]* systematic row EDP and (c) a [200]t* dark-field (DF) image of α-PbO taken close to an [001] zone axis orientation. The corresponding EDP is shown in the inset. Note the characteristic tweed microstructure visible in the vicinity of the [200]t* extinction bend contour. For details, see Withers et al., 1993.

The diffuse crosses and the corresponding tweed microstructure arise from the instability of the α-PbO framework to long-wavelength transverse shear waves characterized by modulation wave vectors perpendicular to the two <110>t directions of real space (in
much the same manner as for the transverse modulated square atom array in Figure 6a). It therefore appears that the room-temperature P4/nmm structure of α-PbO resembles a 2D "jelly" in that it is unstable against specific long-wavelength transverse shear waves. Indeed, on cooling below 208 K, one of these unstable transverse shear waves, characterized by the incommensurate primary modulation wave vector $\mathbf{q} \approx 0.185(-\mathbf{a}_t^* + \mathbf{b}_t^*)$, freezes out completely, giving rise to the low-temperature improper ferroelastic phase of α-PbO (see Withers and Schmid, 1994).
V. CONCLUSIONS

It is hoped that this review has demonstrated that a wealth of unexplained diffraction phenomena, as well as detailed structural information, remains to be extracted from careful electron diffraction investigations of "disordered"/locally ordered materials, and that the modulation wave approach to the description and interpretation of structured diffuse scattering is a particularly useful tool for this endeavor. The existence of absolute and pseudo-extinction conditions, as well as of "dark planes," in diffuse distributions are important diffraction phenomena that deserve more systematic exploitation in investigations of disordered materials.
ACKNOWLEDGMENTS

The author expresses his gratitude to many colleagues, in particular to his close colleague Richard Welberry. Thanks are also extended to G. van Tendeloo, F. Brink, L. Norén, Y. Liu, R. Vincent, J. D. Fitz Gerald, S. Schmid, J. G. Thompson, J. M. Pérez-Mato, J. Etheridge, and P. Midgley for their interest in this area, and to F. Brink, I. R. Fisher, and P. Midgley for permission to reproduce material from joint publications. The Australian Research Council (ARC) provided financial support in the form of ARC Discovery Grants. The author also thanks Peter Hawkes for inviting him to write this chapter.
REFERENCES

Anderson, J. S. (1984). Nonstoichiometric compounds: A critique of current structural views. Proc. Indian Acad. Sci. (Chem. Sci.) 93, 861–904.
Andersson, S., Hyde, S. T., and Bovin, J.-O. (1985). On the periodic minimal surfaces and the conductivity mechanism of α-AgI. Z. für Kristallogr. 173, 97–99.
Andersson, S., Hyde, S. T., Larsson, K., and Lidin, S. (1988). Minimal surfaces and structures: From inorganic and metal crystals to cell membranes and biopolymers. Chem. Rev. 88, 221–242.
Aragón, J. L., Terrones, H., and Romeu, D. (1993). Model for icosahedral aperiodic graphite structures. Phys. Rev. B 48, 8409–8411.
Aroyo, M. I., Boysen, H., and Pérez-Mato, J. M. (2002a). Inelastic neutron scattering selection rules for phonons: Application to leucite phase transitions. Appl. Phys. A (Suppl.) 74, 1043–1045.
Aroyo, M. I., Boysen, H., and Pérez-Mato, J. M. (2002b). Application of phonon extinction rules in thermal diffuse scattering experiments. Physica B 316–317, 154–157.
Billingham, J., Bell, P. S., and Lewis, M. H. (1972). Vacancy short-range order in substoichiometric transition metal carbides and nitrides with the NaCl structure. I. Electron diffraction studies of short-range ordered compounds. Acta Crystallogr. A 28, 602–606.
Bindi, L., Evain, M., and Menchetti, S. (2006). Temperature dependence of the silver distribution in the crystal structure of natural pearceite, (Ag,Cu)16(As,Sb)2S11. Acta Crystallogr. B 62, 212–219.
Boher, P., Garnier, P., Gavarri, J. R., and Hewat, A. W. (1985). Monoxyde quadratique PbOα(I): Description de la transition structurale ferroélastique. J. Solid State Chem. 57, 343–350.
Borie, B., and Sparks, C. J. (1971). The interpretation of intensity distributions from disordered binary alloys. Acta Crystallogr. A 27, 198–201.
Bradley, C. J., and Cracknell, A. P. (1972). "The Mathematical Theory of Symmetry in Solids." Clarendon Press, Oxford.
Brink, F. J., Norén, L., and Withers, R. L. (2004). Electron diffraction evidence for continuously variable, composition-dependent O/F ordering in the ReO3-type $\mathrm{Nb}^{\mathrm{V}}_{1-x}\mathrm{Nb}^{\mathrm{IV}}_{x}\mathrm{O}_{2-x}\mathrm{F}_{1+x}$, 0 ≤ x ≤ 0.48, solid solution. J. Solid State Chem. 177, 2177–2182.
Brink, F. J., Withers, R. L., and Norén, L. (2002). An electron diffraction and crystal chemical investigation of oxygen/fluorine ordering in niobium oxyfluoride, NbO2F. J. Solid State Chem. 166, 73–80.
Brink, F. J., Withers, R. L., and Thompson, J. G. (2000). An electron diffraction and crystal chemical investigation of oxygen/fluorine ordering in rutile-type iron oxyfluoride, FeOF. J. Solid State Chem. 155, 359–365.
Brunel, M., de Bergevin, F., and Gondrand, M. (1972). Détermination théorique et domaines d'existence des différentes surstructures dans les composés $\mathrm{A}^{3+}\mathrm{B}^{1+}\mathrm{X}^{2-}_{2}$ de type NaCl. J. Phys. Chem. Sol. 33, 1927–1941.
Butler, B. D., Withers, R. L., and Welberry, T. R. (1992). Diffuse absences due to the atomic size effect. Acta Crystallogr. A 48, 737–746.
Castles, J. R., Cowley, J. M., and Spargo, A. E. C. (1971). Short-range ordering of vacancies and Fermi surface of TiO. Acta Crystallogr. A 27, 376–383.
Comes, R., Lambert, M., and Guinier, A. (1968). The chain structure of BaTiO3 and KNbO3. Solid State Commun. 6, 715–719.
Dove, M. T., Heine, V., Hammonds, K. D., Gambhir, M., and Pryde, A. K. A. (1998). Short range disorder and long range order: Implications of the Rigid Unit Mode model. In "Local Structure from Diffraction" (S. J. L. Billinge and M. F. Thorpe, eds.), pp. 253–571. Plenum Press, New York.
Dove, M. T., Tucker, M. G., and Keen, D. A. (2002). Neutron total scattering method: Simultaneous determination of long-range and short-range order in disordered materials. Eur. J. Mineral. 14, 331–348.
Esmaeilzadeh, S., Lundgren, S., Halenius, U., and Grins, J. (2001). Bi1−xCrxO1.5+1.5x, 0.05 < x < 0.15: A new high-temperature solid solution with a three-dimensional incommensurate modulation. J. Solid State Chem. 156, 168–180.
Fisher, I. R., Kramer, M. J., Islam, Z., Wiener, T. A., Kracher, A., Ross, A. R., Lograsso, T. A., Goldman, A. I., and Canfield, P. C. (2000). Growth of large single-grain quasicrystals from high-temperature metallic solutions. Mater. Sci. Eng. A 294–296, 10–16.
Flahaut, J. (1979). Sulfides, selenides and tellurides. In "Handbook on the Physics and Chemistry of the Rare Earths" (K. L. Gschneider and L. Eyring, eds.), Chapter 31, Vol. 4, pp. 1–88. North Holland, Amsterdam.
de Fontaine, D. (1972). An analysis of clustering and ordering in multicomponent solid solutions – I. Stability criteria. J. Phys. Chem. Sol. 33, 297–310.
Funke, K., and Banhatti, R. D. (2006). Ionic motion in materials with disordered structures. Solid State Ionics 177, 1551–1557.
Glazer, A. M. (1972). The classification of tilted octahedra in perovskites. Acta Crystallogr. B 28, 3384–3392.
Gomyo, A., Suzuki, T., and Iijima, S. (1988). Observation of strong order in GaxIn1−xP alloy semiconductors. Phys. Rev. Lett. 60, 2645–2648.
Harada, J., and Honjo, G. (1967). X-ray studies of the lattice vibration in tetragonal barium titanate. J. Phys. Soc. Japan 22, 45–57.
Harburn, G., Taylor, C. A., and Welberry, T. R. (1975). "Atlas of Optical Transforms." G. Bell and Sons Ltd., London.
Hartmann, V. M., and Kevan, L. (1999). Transition-metal ions in aluminophosphate and silicoaluminophosphate molecular sieves: Location, interaction with adsorbates and catalytic properties. Chem. Rev. 99, 635–663.
Honjo, G., Kodera, S., and Kitamura, N. (1964). Diffuse streak patterns from single crystals I. General discussion and aspects of electron diffraction diffuse streak patterns. J. Phys. Soc. Japan 19, 351–367.
Iijima, S. (1991). Helical microtubules of graphitic carbon. Nature 354, 56–58.
Ismunandar, Kennedy, B. J., and Hunter, B. A. (1999). Observations on pyrochlore oxide structures. Mat. Res. Bull. 34, 1263–1274.
Jacob, M., and Andersson, S. (1998). "The Nature of Mathematics and the Mathematics of Nature." Elsevier Press, Amsterdam.
Janner, A. (1997). De Nive Sexangula Stellata. Acta Crystallogr. A 53, 615–631.
Janner, A. (2001). DNA enclosing forms from scaled growth forms of snow crystals. Cryst. Eng. 4, 119–129.
Janssen, T., Chapuis, G., and de Boissieu, M. (2007). "Aperiodic Crystals: From Modulated Phases to Quasicrystals" (International Union of Crystallography Monographs on Crystallography, No. 20). Oxford University Press, Oxford.
Janssen, T., Janner, A., Looijenga-Vos, A., and de Wolff, P. M. (1995). Incommensurate and commensurate modulated structures. In "International Tables for Crystallography," Vol. C (A. J. C. Wilson, ed.), pp. 797–835. Kluwer Academic Publishers, Dordrecht.
Khanna, S. K., Pouget, J. P., Comes, R., Garito, A. F., and Heeger, A. J. (1977). X-ray studies of 2kF and 4kF instabilities in tetrathiafulvalene-tetracyanoquinodimethane (TTF-TCNQ). Phys. Rev. B 16, 1468–1479.
Klinowski, J., Mackay, A. L., and Terrones, H. (1996). Curved surfaces in chemical structure. Phil. Trans. Roy. Soc. Lond. A 354, 1975–1987.
Krivoglaz, M. A. (1969). "The Theory of X-Ray and Thermal Neutron Scattering by Real Crystals." Plenum Press, New York.
Kroto, H. W., Heath, J. R., O'Brien, S. C., Curl, R. F., and Smalley, R. E. (1985). C60: Buckminsterfullerene. Nature 318, 162–163.
Liu, Y., Withers, R. L., Nguyen, B., and Elliott, K. (2007). Structurally frustrated polar nanoregions in BaTiO3-based relaxor ferroelectric systems. Appl. Phys. Lett. 91, 152907.
Liu, Y., Withers, R. L., and Norén, L. (2003). An electron diffraction, XRD and lattice dynamical investigation of the average structure and RUM modes of distortion of microporous AlPO4-5. Solid State Sci. 5, 427–434.
Liu, Y., Withers, R. L., and Wei, X. Y. (2005). Structurally frustrated relaxor ferroelectric behaviour in CaCu3Ti4O12. Phys. Rev. B 72, 134104.
Liu, Y., Withers, R. L., Welberry, T. R., Wang, H., and Du, H. (2006). Crystal chemistry on a lattice: The case of BZN and BZN-related pyrochlores. J. Solid State Chem. 179, 2141–2149.
Mackay, A. L. (1976). Crystal symmetry. Phys. Bull. 27, 495–497.
Mackay, A. L. (1981). De Nive Quinquangula. On the pentagonal snowflake. Sov. Phys. Crystallogr. 26, 517–522.
Mackay, A. L. (1988). New geometries for superconduction and other purposes. Speculat. Sci. Technol. 11, 4–8.
Matsumura, S., Takano, K., Kuwano, N., and Oki, K. (1991). Dynamical Monte Carlo simulation of L11 (CuPt)-type ordering during (001) epitaxial growth of III-V semiconductor alloys. J. Crystal Growth 115, 194–198.
McLaren, A. C., and Fitz Gerald, J. D. (1987). CBED and ALCHEMI investigations of local symmetry and Al,Si ordering in K-feldspars. Phys. Chem. Min. 14, 281–292.
Moreau, J., Kiat, J. M., Garnier, P., and Calvarin, G. (1989). Incommensurate phase in lead monoxide α-PbO below 208 K. Phys. Rev. B 39, 10296–10299.
Moreno, M. S., Varela, A., and Otero-Diaz, L. C. (1997). Cation non-stoichiometry in the tin monoxide phase Sn1−δO with tweed microstructure. Phys. Rev. B 56, 5186–5192.
Moret, R., Huber, M., and Comes, R. (1977). Short-range and long-range order of titanium in titanium sulfide (Ti1+xS2). J. Phys. Colloque 7, 202–206.
Norman, P. D., Morton, A. J., Wilkins, S. W., and Finlayson, T. R. (1985). Imaging the Fermi surface of β′-brass through electron diffuse scattering. Metals Forum 8, 43–48.
Ohshima, K., and Watanabe, D. (1973). Electron diffraction study of short-range-order diffuse scattering from disordered Cu-Pd and Cu-Pt alloys. Acta Crystallogr. A 29, 520–526.
Otero-Diaz, L. C., Withers, R. L., Gomez-Herrero, A., Welberry, T. R., and Schmid, S. (1995). A TEM and XRD study of (BiS)1−δ(Nb1−εS2)n misfit layer structures. J. Solid State Chem. 115, 274–282.
Pauling, L. (1960). "The Nature of the Chemical Bond," 3rd ed., p. 547. Cornell University Press, Ithaca.
Pauling, L. (1985). Apparent icosahedral symmetry is due to directed multiple twinning of cubic crystals. Nature 317, 512–514.
Pérez-Mato, J. M., Madariaga, G., and Tello, M. J. (1986). Diffraction symmetry of incommensurate structures. J. Phys. C Solid State Phys. 19, 2613–2622.
Pérez-Mato, J. M., Madariaga, G., Zuniga, F. J., and Garcia Arribas, A. (1987). On the structure and symmetry of incommensurate phases. A practical formulation. Acta Crystallogr. A 43, 216–226.
Pérez-Mato, J. M., Aroyo, M., Hlinka, J., Quilichini, M., and Currat, R. (1998). Phonon symmetry selection rules for inelastic neutron scattering. Phys. Rev. Lett. 81, 2462–2465.
Petricek, V., Dusek, M., and Palatinus, L. (2000). "Jana2000. The Crystallographic Computing System." Institute of Physics, Praha, Czech Republic.
Pryde, A. K. A., and Dove, M. T. (1998). On the sequence of phase transitions in tridymite. Phys. Chem. Min. 26, 171–179.
Putnis, A., and Salje, E. (1994). Tweed microstructures: Experimental observations and some theoretical models. Phase Transit. 48, 85–105.
de Ridder, R., van Dyck, D., van Tendeloo, G., and Amelinckx, S. (1977a). A cluster model for the transition state and its study by means of electron diffraction II. Application to some particular systems. Phys. Stat. Sol. A 40, 669–683.
de Ridder, R., van Tendeloo, G., and Amelinckx, S. (1976a). A cluster model for the transition from the short range order to the long range order state in f.c.c. based binary systems and its study by means of electron diffraction. Acta Crystallogr. A 32, 216–224.
de Ridder, R., van Tendeloo, G., van Dyck, D., and Amelinckx, S. (1976b). A cluster model for the transition state and its study by means of electron diffraction I. Theoretical model. Phys. Stat. Sol. A 38, 663–674.
de Ridder, R., van Tendeloo, G., van Dyck, D., and Amelinckx, S. (1977b). The transition state as an interpretation of diffuse intensity contours in substitutionally disordered systems. J. Physique Colloque 38, 178–186.
Sato, H., Watanabe, D., and Ogawa, S. (1962). Electron diffraction study on CuAu at temperatures above the transition point of order-disorder. J. Phys. Soc. Japan 17, 1647–1651.
Sauvage, M., and Parthé, E. (1972). Vacancy short-range order in substoichiometric transition metal carbides and nitrides with the NaCl structure. II. Numerical calculation of vacancy arrangement. Acta Crystallogr. A 28, 607–616.
Sauvage, M., and Parthé, E. (1974). Prediction of diffuse intensity surfaces in short-range ordered ternary derivative structures based on ZnS, NaCl, CsCl and other structures. Acta Crystallogr. A 30, 239–246.
Schmahl, W. W., Putnis, A., Salje, E., Freeman, P., Graeme-Barber, A., Jones, R., Singh, K. K., Blunt, J., Edwards, P. P., Loram, J., and Mirza, K. (1989). Twin formation and structural modulations in orthorhombic and tetragonal YBa2(Cu1−xCox)3O7−δ. Phil. Mag. Lett. 60, 241–248.
Shechtman, D., Blech, I., Gratias, D., and Cahn, J. W. (1984). Metallic phase with long-range orientational order and no translational symmetry. Phys. Rev. Lett. 53, 1951–1953.
van Smaalen, S. (2007). "Incommensurate Crystallography" (International Union of Crystallography Monographs on Crystallography, Vol. 21). Oxford University Press, Oxford.
van Tendeloo, G., van Landuyt, J., and Amelinckx, S. (1976). The α → β phase transition in quartz and aluminum phosphate as studied by electron microscopy and diffraction. Phys. Stat. Sol. A 33, 723–735.
van Tendeloo, G., and Amelinckx, S. (1998). The origin of diffuse intensity in electron diffraction patterns. Phase Transit. 67, 101–135.
Treacy, M. M. J., Gibson, J. M., and Howie, A. (1985). On elastic relaxation and long wavelength microstructures in spinodally decomposed indium gallium arsenide phosphide (InxGa1−xAsyP1−y) epitaxial layers. Phil. Mag. A 51, 389–417.
Ugarte, D. (1992). Curling and closure of graphitic networks under electron-beam irradiation. Nature 359, 707–709.
Warren, B. E., Averbach, B. L., and Roberts, B. W. (1951). Atomic size effect in the X-ray scattering by alloys. J. Appl. Phys. 22, 1493–1496.
Welberry, T. R. (1986). Multi-site correlations and the atomic size effect. J. Appl. Crystallogr. 19, 382–389.
Welberry, T. R. (2004). The importance of multisite correlations in disordered structures. Ferroelectrics 305, 117–122.
Welberry, T. R., and Butler, B. D. (1994). Interpretation of diffuse X-ray scattering via models of disorder. J. Appl. Crystallogr. 27, 205–231.
Welberry, T. R., Withers, R. L., and Mayo, S. C. (1995). A modulation wave approach to understanding the disordered structure of cubic stabilized zirconias (CSZs). J. Solid State Chem. 115, 43–54.
Wilson, J. A., Di Salvo, F. J., and Mahajan, S. (1975). Charge-density waves and superlattices in the metallic layered transition metal dichalcogenides. Adv. Phys. 24, 117–201.
Withers, R. L. (2005). Disorder, structured diffuse scattering and the transmission electron microscope. Z. für Kristallogr. 220, 1027–1034.
Withers, R. L., Ling, C. D., and Schmid, S. (1999). Atomic modulation functions, periodic nodal surfaces and the three-dimensional incommensurately modulated (1 − x)Bi2O3·xNb2O5, 0.06 < x < 0.23, solid solution. Z. für Kristallogr. 214, 296–304.
Withers, R. L., Otero-Diaz, L. C., Goméz-Herrero, A., Landa-Canovas, A. R., Prodan, A., van Midden, H. J. P., and Norén, L. (2005). As-As dimerization, Fermi surfaces and the anomalous electrical transport properties of UAsSe and ThAsSe. J. Solid State Chem. 178, 3159–3168.
Withers, R. L., Otero-Diaz, L. C., and Thompson, J. G. (1994a). A TEM study of defect ordering in a calcium yttrium sulfide solid solution with an average NaCl-type structure. J. Solid State Chem. 111, 283–293.
Withers, R. L., and Liu, Y. (2005). Local crystal chemistry, structured diffuse scattering and inherently flexible framework structures. In "Inorganic Chemistry Highlights II" (G. Meyer, D. Naumann, and L. Wesemann, eds.), pp. 347–363. Wiley-VCH, Weinheim.
Withers, R. L., and Schmid, S. (1994). A TEM and group theoretical study of α-PbO and its low temperature improper ferroelastic phase transition. J. Solid State Chem. 113, 272–280.
Withers, R. L., Schmid, S., and Thompson, J. G. (1993). The tweed microstructure of α-PbO and its relationship to the low temperature improper ferroelastic phase transition. In "Defects and Processes in the Solid State: Geoscience Applications: The McLaren Volume" (J. N. Boland and J. D. Fitz Gerald, eds.), pp. 305–316. Elsevier, Amsterdam.
Withers, R. L., Thompson, J. G., and Welberry, T. R. (1989). The structure and microstructure of α-cristobalite and its relationship to β-cristobalite. Phys. Chem. Min. 16, 517–523.
Withers, R. L., Thompson, J. G., Xiao, Y., and Kirkpatrick, R. J. (1994b). An electron diffraction study of the polymorphs of SiO2-tridymite. Phys. Chem. Min. 21, 421–433.
Withers, R. L., Urones-Garrote, E., and Otero-Diaz, L. C. (2007). Structured diffuse scattering, local crystal chemistry and metal ion ordering in the (1 − x)MgS·(x/3)Yb2S3, 0 ≤ x ≤ ∼0.45, "defect" NaCl system. Phil. Mag. 87, 2807–2813.
Withers, R. L., Vincent, R., and Schoenes, J. (2004). A low-temperature electron diffraction study of structural disorder and its relationship to the Kondo effect in ThAsSe. J. Solid State Chem. 177, 701–708.
Withers, R. L., van Midden, H. J. P., Prodan, A. B., Midgley, P. A., Schoenes, J., and Vincent, R. (2006). As-As dimerization, Fermi surfaces and the anomalous transport properties of UAsSe and ThAsSe. J. Solid State Chem. 179, 2190–2198.
Withers, R. L., Welberry, T. R., Brink, F. J., and Norén, L. (2003). Oxygen/fluorine ordering, structured diffuse scattering and the local crystal chemistry of K3MoO3F3. J. Solid State Chem. 170, 211–220.
de Wolff, P. M. (1974). The pseudo-symmetry of modulated crystal structures. Acta Crystallogr. A 30, 777–785.
Yamamoto, A. (1982). Modulated structure of wüstite (Fe1−xO). Acta Crystallogr. B 38, 1451–1456.
CONTENTS OF VOLUME 151*
C. Bontus and T. Köhler, Reconstruction algorithms for computed tomography
L. Busin, N. Vandenbroucke and L. Macaire, Color spaces and image segmentation
G. R. Easley and F. Colonna, Generalized discrete Radon transforms and applications to image processing
T. Radlička, Lie algebraic methods in charged particle optics
V. Randle, Recent developments in electron backscatter diffraction
*Lists of the contents of volumes 100–149 are to be found in volume 150; the entire series can be searched on ScienceDirect.com
INDEX
A Activation function, 156–158, 212 boundedness for, 156–158 of complex-valued neurons, 156–158, 174 regularity of, 156–158 sigmoid function in, 196 Adali, T., 157, 158 Adaptive algorithms, 26, 27 Adaptive color space, denoising in, 288 Adaptive dictionary learning process, 286 Adaptive multichannel representation, 285 Adaptive pattern classifiers model, see APCM Additive Gaussian noise, 229, 287 Aharon, M., 254 AIC, 258 Aiyoshi, E., 209 Akaikes information criterion, see AIC α-PbO, 329 Amagishi, Y., 82 Amari, S., 164, 229 AMF, 308 Amplitude-phase relationship, 157, 158 Anderson, J. S., 304 Anisotropic diffusion equation, 145 APCM, complex-valued, 162–164 Arbitrary spins, 50 Atomic modulation functions, see AMF Average error, 164
B Backpropagation learning algorithm, see BP learning algorithm Balan, R., 258 Banon, G. J. F., 17 Barrera, J., 17 Basis pursuit, 233, 238 Basis pursuit denoising, see BPDN Bayesian framework, 231, 243, 249 Bayesian information criterion, see BIC Beam-optical equation, 58 Beam-optical Hamiltonian, 58 Beam wavefunction, 58 Belousov–Zhabotinsky-type media, 99 Benichou, B., 264 Benvenuto, N., 211 Bernoulli–Gaussian distribution, 249
Bessel functions, 310 Best sparsifying basis, 264 Bialynicki-Birula, I., 61, 66, 70 BIC, 258 Billingham, J., 304 Binary domain, equivalent optimality in, 23–25 Binary functions, 15 Binary image filtering, 42 Binary signal filtering, 2 Binary signal operators, 4 Blind source separation, see BSS Bloch equation, 50 Bobin, J., 246, 248, 252, 253, 285 Boolean function, 5 Boolean lattice, 5 Born, M., 62 Bounded noise, 243 BPDN, 238 BP learning algorithm complex-valued, 157, 162–169 average of learning cycles of, 171, 173 comparison with real-BP algorithm, 169, 170, 173, 175, 177–181 complex transformation as rotation of, 187–194 computational complexity of, 170, 172 error backpropagation in, 167 generalization ability of, 175–181, 191–194, 208, 209 learning and test patterns of, 177–180, 192, 195, 200, 201, 205 learning rule of, 164–169, 174 learning speed of, 169–174 model neuron used in, 156 qualitative properties of, 196–209 simple transformation as rotation of, 182–187 three-layered network for, 170, 172, 194 transforming geometric figures, 181–203 usual generalization performance of, 205, 208, 209 weights and thresholds in, 183 real valued average of learning cycles of, 171, 173
comparison with complex-BP algorithm, 169, 170, 173, 175, 177–181 computational complexity of, 170, 172 error backpropagation in, 167, 168 generalization ability of, 175–181 generalization of, 164–166 learning and test patterns of, 177–180, 204 learning rule of, 174 learning speed of, 171–174 three-layered network for, 170, 172 usual generalization performance of, 205, 208, 209 weights and thresholds for, 181 Bragg reflection, 321, 323 Brink, F. J., 314 Brunel, M., 304, 325 BSS, 222, 224, 225 GMCA for, see GMCA goal of, 225 morphological diversity and, 244 noiseless sparse, 263 overdetermined, 231 sparsity in, 229–232 benefits of, 265–267 neuroscience in, 230 role of, 272 strenuous inverse problem, 224–231 use of independence in, 226 Bump signals, 266
C Cardoso, J.-F., 229, 288 Cauchy–Riemann equations, 156 CCD, 134 CDW, 305, 328 Cellular neural network, see CNN Chaotic oscillators, 101 Charge coupled devices, see CCD Charge density wave, see CDW Charged-particle beam optics, 50, 51 Hamiltonians in prescriptions, 63 Lie algebraic formalism of, 59 quantum formalism of, 58, 59, 63 quantum prescription, 58 Chemical photosensitive nonlinear media, 81 Chen, J., 234 Chromatic aberrations, 291 Chua, L. O., 80
CMB, 292, 293 true simulated and estimated map of, 294, 295, 297 CMB emission law, 296 CMOS image sensors, 134, 135 CMOS technology, 134, 135 CNN, 80, 83, 96, 134, 135 integration, principle of, 82 Coliseum, noisy image of, 120 Color image denoising, application to, 286–289 Color image inpainting, application to, 289–292 Color images, recovering, 291 Combettes, P. L., 278 Combinatorial optimization problem, 233 Comon, P., 226, 227 Complex function, 157, 158, 162, 203, 207 Complex numbers, 155, 156, 162, 163, 170–172, 181, 196, 197, 200, 202 Complex-valued neural network, 155–162 Complex-valued neuron, 155, 156, see also Real-valued neuron activation function of, 156–158 basic structure of weights of, 159 classifying, 211 decision boundary of, 210, 211 defined, 155, 156 fading equalization problem for, 216, 217 mathematical analysis of, 194–202 multilayered, 162 orthogonality of decision boundary in, 209–217 output value of, computation of, 196–198 relationship with real-valued neuron, 158–161 rotation of, 198 symmetry problem, solving, 214–216 thresholds in, 156, 159, 162, 165 weight parameters of, 161 XOR problem, solving, 212, 213 Complex-valued output signal, 156 Complex-valued parameter, 163, 164, 169, 215 Complex-valued pattern, 162, 169–171, 173 Complex-valued signals, 168, 169 two-dimensional motion for, 161 Compositional modulation waves, 310 Computational sensors, 134 Conductance function, 70
Contrast enhancement, 96–99 Contrast function, 265 Convergence theorem for APCM, 163, 164 learning, 163, 164 Cosmic microwave background, see CMB Cryptography, 99, 101 Crystal chemical flexibility, 306 Crystalline materials, 303 Cubic law, 108
D Dark-field micrograph (DF), 331 Darmois theorem, 226 Darwin term, 56 Davies, M., 229 DCT, 244, 267 de Broglie, Louis, 62 de Broglie wavelength, 59, 62 Decision boundary definition of, 209 to detect symmetry of input patterns, 215 hypersurface, 209–211 for imaginary-part of net-input, 215 orthogonal, 211–217 in complex-valued neuron, 200–217 utility of, 211–217 for real-part of net-input, 215 of single complex-valued neuron, 210, 211 Dedekind’s problem, 17 Dellamonica, D., Jr., 26, 33, 34 Demixing matrix, 226–229 Denoising color images, 289 de Ridder, R., 325 de Wolff, P. M., 307 Dictionary learning algorithm, 254 Diffuse distribution characteristic, 320 in extinction conditions, 315 kinematic primary, 322 Diffuse intensity, 314, 316 Diffusive soliton, 87 Digital signal, 4 Digital speech signals, 11 Dirac distribution, 296 Dirac equation, 50, 51, 57, 61, 63, 72, 73 Dirac particle, 50, 56 Discrete cosine transform, see DCT Discrete orthogonal wavelet transform, see DWT
Disordered/locally ordered phases, 306 Disordered materials, 308 modulation wave approach, 309 Displacive modulation waves, 310 Donoho, D. L., 233, 234, 240, 243, 264 DRAMs, 134 DWT, 266
E Economy/compression principle, 230 Edge detection algorithms, 122 Edge filtering, 120–124 EDP, 304, 306, 319 EFICA, 269, 272 Elad, M., 291 Electron diffraction, 305 Electron diffraction pattern, see EDP Electronic implementation, 103–108 of multistable network, 130–133 Elementary cell of inertial systems, 103 of multistable nonlinear network, 130 temporal evolution of, 106 Equation Bloch, 50 Cauchy–Riemann, 156 Dirac, 50, 51, 57, 61, 63, 72, 73 Fisher’s, 85 Helmholtz, 50, 57, 60 Klein–Gordon, 50, 57, 60, 63–65 Maxwell, 50, 60, 68 Nagumo, 88, 89 Error propagation in complex BP, 167 in real BP, 167, 168 Espejo, S., 135 Even operator, 52 Excited state, 84 Exclusive OR problem, see XOR problem Extinction conditions, 315–318 diffuse distributions in, 315
F Fadili, M. J., 291 Fading equalization technology encoded, 216 generalization ability for, 217 input-output mapping in, 216 single complex-valued neurons applied to, 216, 217 Fan, K.-C., 33
Fast blind GMCA algorithm, 253, 254, 274 complexity analysis of, 254 convergence behavior of, 255 two-stage iterative process, 254 Fast hypGMCA algorithm, 279 FastICA algorithm, 229 Fermat’s principle, 60, 62 geometrical approach, 60 square-root approach, 60 Ferroelastic strain distortions, 305 materials susceptible to, 329 Feshbach, H., 60 Feshbach–Villars form, 57 of Klein–Gordon equation, 64, 65 Filtering binary image, 42 binary signal, 2 edge, 120–124 noise, see Noise filtering one-dimensional signal, 111–119 of salt-and-pepper noise, 41 two-dimensional, 119–133 Fisher, I. R., 306 Fisher’s equation, 85 Fishman, L., 50 Fixed-point algorithm, 255 Flat filters, 19 Flexibility crystal chemical, 306 types of, 306 Foldy, L. L., 49 Foldy–Wouthuysen theory, 50 Foldy–Wouthuysen transformation, 49, 51–58 Forbes, G. W., 62 Fourth-order Runge–Kutta algorithm, 86, 96, 114 Fractal images, 209 Framework structures inherently flexible, 305, 326 inherent rigidity, 326 tetrahedrally corner-connected, 326 Frobenius norm, 224, 249, 261 Fermi surface (FS), 328 materials susceptible to, 305, 328 Fuchs, J. J., 243 Furuya, T., 155, 157
G Galactic dust emission, 293, 295 Gaussian distribution, 226, 249
Gaussianity, 226, 227 Generalization error on angle, 196, 199 on parallel displacement vector, 202 on similitude ratio, 200 on transformation of geometric figures, 194, 207, 208 Generalized morphological component analysis, see GMCA Generalized rank-order filters, 13 Georgiou, G. M., 156, 158, 211 Gilbert, E. N., 9 GMCA algorithm, 223, 245–247 from Bayesian viewpoint, 249 complexity analysis of, 248 computational cost, 274, 275 convergence of, 273 dictionary, 247, 248 estimating number of sources, 258, 261 and extension to hyperspectral case, comparison between, 280–283 fast blind, 253, 254 handling noise, 248, 268 in higher dimensions, 273, 283, 284 for hyperspectral BSS, 276–280 increasing iteratively the number of components, 260 inpainted image using, 290, 291 iterative thresholding algorithm, 247 noisy case, 260–263 provides sparsest solution, 267, 268, 270 role of, 259 to simulations, 293–296 sources estimated using, 270 with spectral sparsity constraints, 281, 282, 284 speeding up blind convergence study, 255 fast GMCA algorithm, 253, 254 fixed-point algorithm, 255 orthonormal case, 250 redundant case, 251 thresholding strategy, 248 versatility of, 296 vs. BSS techniques, 269 vs. PCA, 260–262 GMCALab toolbox, 268, 296 Graph search-based algorithms, 29 Gray-level extraction, 99, 100 Greedy algorithms, 236 Gribonval, R., 234, 239 Guest, C. C., 157, 158, 211
H Hamming weight, 16 Han, C.-C., 33 Harburn, G., 314 Hard thresholding, 241, 248, 256, 257 Hawkes, P. W., 62 HeaviSine signals, 266 Heijmans, H. J. A. M., 19 Helmholtz equation, 50, 57, 60 Heuristic algorithms, 3, 27 Hidden neuron, 158, 162, 167, 194 Higher-order statistics, 228 Hirata, N. S. T., 29, 31, 32, 44 Hirota, R., 82 Homogeneous medium, matrix representation, 67 Homotopy-continuation algorithms, 236 Huo, X., 233–234 Hyperspectral data GMCA algorithm for, 276–280 properties of, 275, 276 specificity of, 275, 276 Hypersurfaces decision boundary, 209–211 of single complex-valued neuron, 210, 211
I Ibn Al-Haitham, A., 62 ICA, 226–228, 269 limits of noisy ICA, 229 probability density assumption, 228 Identity theorem, 207 ILP, 25 formulation of MAE stack-filter problem, 25 heuristic solutions, 27–33, 43 Image encryption, 99–103 Image inversion, 96–99 Image-processing, 95–103 contrast enhancement, 96–99 gray-level extraction, 99, 100 image encryption, 99–103 image inversion, 96–99 problems in, 111 two-dimensional filtering, 119–133 Image rotation, 61 Image sensors CMOS, 134, 135 solid-state, 134 Impulse noise, 37 Incommensurately modulated structures (IMSs), 316
Independence, 226 approximating, 227, 228 measures of, approaches for higher-order statistics, 228 information maximization, 227 maximum likelihood (ML), 228 Independent component analysis, see ICA Inertial systems, 90–108 elementary cell of, 103 image processing, 95–103 theoretical analysis, 91, 92 Information maximization, 227 Inherently flexible framework structures, 305 Inhomogeneous medium, 70 Input-output mapping detection of symmetry problem with, 214 encoded, 212, 214 in fading equalization problem, 216 in XOR problem, 212 Integer linear programming problem, see ILP Interchannel coherence, 286 Inversion set, 32 Irreducible representations, 317 Isotropic diffusion equation, 136 Iterative thresholding, 237, 247
J Jacobian elliptic function, 91, 143, 144 JADE (joint approximate diagonalization of eigen-matrices), 269, 272, 289, 295 Jagannathan, R., 51, 61 Joint probability density function, 226, 255–257
K Kasper, E., 62 Khan, S. A., 51 Kim, M. S., 157, 158, 211 Kim, Y.-T., 28 Kinematic primary diffuse distribution, 322 Kirchhoff’s laws, 110, 132 Klein–Gordon equation, 50, 57, 60, 63 Feshbach–Villars form of, 64, 65 time-dependent, 64 Kondo effect material, nonmagnetic, 305, 328–329 Koutsougeras, C., 157, 158, 211
Kullback–Leibler (KL) divergence, 226, 228 Kuroe, Y., 157
L Laboratory functions, 70 Laplacian distributions, 263, 277 Laplacian law, 249 Laplacian probability density, 260, 280, 283 Lattice theory, 17 LDCT, 271 Learnable parameters, 168, 169, 194, 199 Learned rotation, 195 Learning constant, 163, 164, 176, 181 Learning pattern of complex-BP algorithm, 177–180, 192, 195, 200, 201, 205 input, 205, 206 limiting values of, 195, 199, 202 of real-BP algorithm, 177–180, 204 Learning rule of complex-BP algorithm, 164–169, 174 of real-BP algorithm, 174 Learning speed of complex-BP algorithm, 169–174 factors to improve, 174, 175 of real-BP algorithm, 171–174 Least angle regression/liberty alliance single sign on (LARS/LASSO), 236 Lee, W.-L., 33 Leptokurtic sources, 229 L-filters, 13 Lie algebraic formalism, 59, 61 Light beam optics, 57 Hamiltonians in prescriptions, 63 quantum methodologies in, 60 scalar wave theory, 60 Light polarization, 61 Lin, J.-H., 27–28 Linearly separable Boolean functions, 15 Linear mixture model, 224, 276 Linear programming (LP), 25, 26, 236 Liouville’s theorem, 156 Lippmann, R. P., 209 Local discrete cosine transform, see LDCT Longitudinal polarization, 313–315 Lorentzian function, 146
M MAE, 3, 20–23 of stack filters, 20
Malik, J., 136, 137, 145 Mallat, S., 268 Mammalian visual cortex, 230, 232 Maragos, P., 2, 17, 19 Mathematical analysis of complex neural network, 194–202 of decision boundaries of complexvalued neuron, 210, 211 learning and test patterns used in, 195, 200, 201 Mathematical morphology, 17 Matheron, G., 17 Matrix equations, 66 Maximum a posteriori (MAP) estimator, 231, 249, 275, 277 Maximum likelihood, see ML Maxwell equations, 50, 60, 66, 68 in homogeneous medium, 67–69 in inhomogeneous medium, 70 matrix representations of, 66 Maxwell optics, nontraditional formalism, 51 MCA, 237, 238 inpainted image using, 291 MCNF problem, 34 Mean absolute error, see MAE Mean of Max (MOM), 241 Mechanical analogy, of nonlinear systems, 83–95 Median filters, 2 applications of, 11 characteristic function, 7, 8 effects of, 3 threshold decomposition structure of, 7, 8 Microelectronic implementation, 134–135 Minimization problem, 233, 235, 238, 239, 277, 278 Minimum-cost network flow problem, see MCNF Minsky, M. L., 211 Miura, M., 209 Mixing matrix criterion, 250, 267, 271, 272, 295 as function of SNR, 273, 283 ML, 228, 265 Modeling multichannel data, 224, 225 Modulation wave approach, 309 applications of, 313 effects of multiple scattering, 321 extinction conditions, 315–318 planar diffuse absences, 318–320 size effect, 320
transverse and longitudinal polarization, 313–315 Modulation wave-vector, 309 Monochannel sparse model, 235 Monotonicity enforcement, 28 Moreau, J., 330 Morphological component analysis, multichannel, 239–241 Morphological diversity, 237, 238 and BSS, 244 in multichannel data, 238, 239 role of, 265 Morphological filters, 2, 17 Morphological operators, 17 Morphospectral coherence, 276 Morphospectral sparsity constraint, 283, 284 Moudden, Y., 293 MP (matching pursuit), 236, 239 Multichannel data, morphological diversity in, 238, 239 Multichannel dictionary, 234, 277 Multichannel inverse problems, 285 Multichannel matching pursuit (mMP), 241 Multichannel morphological component analysis (mMCA), 239–241 handling bounded noise with, 243 recovering sparse multichannel decompositions using, 242, 243 thresholding strategy, 241 Multichannel morphological components, 238, 239, 245 Multichannel mutual coherence, 235 Multichannel overcomplete sparse decomposition, 239 Multichannel overcomplete sparse recovery, 239–244 Multichannel sparse recovery results, 235, 236 Multiple scattering, effects of, 321 Multiplicative mixing process, 276 Multistable network, 127–130 electronic implementation of, 130–133 Multivalued data restoration, application to, 284–286 Multivariate data, 222 Multivariate Gaussian distribution, 249 Mutual coherence, 233–235
N NaCl type average structure, 323 Nagashima, H., 82 Nagumo equation, 88, 89, 111, 136
Natural gradient form, 228 Natural images, 230, 237, 272 Neumann boundary conditions, 110 Neural network, complex-valued, 155–162 Neuron complex-valued, see Complex-valued neuron hidden, 158, 162, 167, 194 real-valued, see Real-valued neuron sparse activation of, 230 Neutron powder diffraction, 330 Nielsen, M., 234, 239 Nitta, T., 155–157 NLTLs, 82 Noise additive Gaussian, 229, 287 bounded, 243 handling with GMCA algorithm, 248 impulse, 37 in nonlinear systems, 138 salt-and-pepper, 39, 41 white Gaussian, 280 Noise contamination, 285 Noise covariance matrix, 249, 271, 278 Noise filtering, 120–122 based on anisotropic diffusion, 137 of one-dimensional signal, 111–119 experimental results, 116–119 theoretical analysis, 111–119 theoretical and numerical results, 114–116 Noiseless mixtures, 269 Noiseless sparse BSS problem, 263 Noisy color image, 286, 287 Noisy-ideal images, 37 Noisy image, of Coliseum, 120 Nondeterministic polynomial time (NP)-hard problem, 233 Non-Gaussianity, 227, 228 Nonlinear differential equations, 80, 82, 90 Nonlinear electrical lattice, 108, 109 Nonlinear electrical transmission lines, see NLTLs Nonlinear filters, 1 Nonlinearity parameter, 94, 95, 98, 99, 130 Nonlinear network, multistable, electronic cell of, 130 Nonlinear oscillator equations, 91 Nonlinear oscillator network, 96 encrypted image of, 96 histogram of, 96
Nonlinear oscillators, properties of, 92–95 Nonlinear reaction-diffusion equations, 80 Nonlinear resistor RNL , 108 realization of, 110 Nonlinear systems chaotic behavior of, 99 efficiency of, 81, 94 image processing applications, 135–140 mechanical analogy of, 83–95 noise effects in, 138 overdamped case, 84–90 bistable behavior of, 86 coupled case, 87–90 limit of continuous media, 88–90 uncoupled case, 85, 86 weak coupling limit, 87, 88 response of, 98 Nonlinear transformation, 159 Nonlinear voltage, 109 Nonmagnetic Kondo effect material, 305, 328–329
O Odd operator, 52 Offline scheme, 285 to color image denoising, 286–289 Ohm’s law, 109 OMP, 236, 239 One-dimensional lattice, 108–111 Online scheme, 285 to color image inpainting, 289–292 Operators binary signal, 4 even, 52 increasing, 5 morphological, 17 odd, 52 signals and, 4 visual representation, 5 W-operators, 5, 18 Optimal stack filters, 3, 20 application, 36 design problem heuristic solution, 27–33 optimal solution, 33 design procedure, 35 formulation as linear programming problem, 25 mean absolute error, 20–23 Order statistic filters, 2 Orris, G. J., 50
Orthogonal matching pursuit, see OMP Oscillating texture part, 244 Oscillators chaotic, 101 dynamics of, 92 evolution of, temporal, 93 Otero-Diaz, L. C., 306 Overcomplete dictionary, 224, 233, 234 choice of, 243, 244 Overcomplete multichannel representations, 234, 235 Overcomplete signal representations, 231, 232 Overdamped network, 142, 143 Overdamped particle, evolution of, 144, 145 Overdamped system, 84–90, 111 bistable behavior of, 86
P Papert, S. A., 211 Parallel displacement, 201 of straight line, 186, 187, 190, 191 Parallel displacement vector, 191, 201, 202 generalization error on, 202 Parametric Bayesian approach, sparsity in, 231 Parthé, E., 304, 325 Particle beam optics, 57 Patton, R. S., 50 Pauli equation, 50 Pauling, L., 325 PBF, 9, 40, 42 Pearlmutter, B., 231 Perona, P., 136, 137, 145 Perona and Malik anisotropic diffusion algorithm, 145, 146 PEs (processing elements), 134 Peyré, G., 286 Photon wave function, 73 Photosensitive nonlinear media, chemical, feature of, 81 Piazza, F., 211 Pitas, I., 11 Pixel-level processing, 134 Planar diffuse absences, 318–320 Planck data set, application to, 292–296 Planck–high frequency instrument, 293 Platykurtic sources, 229 Polarization, 61 longitudinal, 313–315 transverse, 313–315 Polarized beams, 50
Polynomial voltage, 109
Positive Boolean function, see PBF
Positive definite matrix, 163, 164
Practical sparse signal decomposition, 236, 237
Primary modulation wave-vectors, 312
Primary visual cortex, 230
Principal component analysis (PCA), vs. GMCA, 260–262
Probabilistic-descent method, 163
Probabilistic model, 263
Probability density function (pdf), 226, 228, 255
Propagation failure effect, 88
Propagation mechanism, 87
Pseudo-extinction condition, 322, 323
Pyrochlore average structure type, 321
Q
QABP (quantum aspects of beam physics), 59
Quantum formalism, of charged-particle beam optics, 58, 59, 63
R
Rank-order filters, 12
  applications of, 13
  weighted version, 13
Reaction–diffusion systems, 83, 108–133, 135
Real-valued neuron, see also Complex-valued neuron
  basic structure of weights, 159
  complex-valued neuron relationship with, 158–161
  input signals of, 159
Real-valued parameter, 164
Relative Newton algorithm (RNA), 231, 250, 269, 272
Resistance function, 70
Resistor, nonlinear, see Nonlinear resistor
Rest state, 84
RGB denoised image, 286, 287, 289
Riemann–Silberstein complex vector, 61, 66, 69
Rigid unit modes, see RUM
Rotation angle, 191, 196
Rotation factor, 188
r-th order statistics, 13
RUM, 326
Rumelhart, D. E., 211
Runge–Kutta algorithm, fourth-order, 86, 96, 114
Ruska, E., 62
S
Saito, N., 264
Salt-and-pepper noise, 39
  filtering of, 41
Sauvage, M., 304, 325
Scalar wave theory, 60
Scattering factor, 309
Schafer, R. W., 2, 19
Score function, 228
Sensors
  CMOS, 134, 135
  computational, 134
Shechtman, D., 306
Sigmoid functions, 156, 174
  in activation function, 196
  approximated by linear function, 197, 207, 208
  derived function of, 174
Signals
  and operators, 4
  processing, 4
  threshold decomposition structure, 6
  translation of, 4
Signal-to-noise ratio, see SNR
SIMD, 135
Similarity transformation
  of complex BP algorithm, 183–187, 189, 190, 199
  with similitude ratio, 200
  of test point, 200
Similitude ratio, 191, 199, 200
Simon, R., 73
Simplex algorithm, 34
Single-instruction multiple-data, see SIMD
Sinusoidal law, 130, 131
SMICA, 294
Snow crystals, 307
SNR, 250, 261, 267, 271, 272, 281, 287
Soft thresholding, 240, 246, 248, 279
Solid electrolyte, 305
Solid-solution phases, 304
  compositionally disordered, 323
Solid-state image sensors, 134
Space complexity, 169, 170, 172, 173
Sparse BSS
  and sparse ICA, 265
  and unconditional bases, 264
Sparse coding, 230, 232, 254
Sparse decomposition, 232
  to preserve linearity, 252
Sparse decomposition issue, 233, 234
  algorithms to solve
    greedy algorithms, 236
    iterative thresholding, 237
    linear programming, 236
Sparse multichannel decompositions, recovering using mMCA, 242, 243
Sparse multichannel signal representation, 231–237
Sparse redundant decompositions, 252
Sparse spectral constraint, 281, 282, 284
Sparsity
  in BSS, 229–232, 265–267, 272
  in parametric Bayesian approach, 231
  variations on, 263
Sparsity divergence, 270
Spatial dictionary, 245, 263, 285
Spatial morphological diversity, 239
Spatial/temporal dictionary, 234
Spatial/temporal diversity, 235
Spectral dictionary, 234, 244
Spectral morphological diversity, 239
Spectral waveforms, 275
SR (stochastic resonance), 83, 138, 139, 140
Stack filters, 2
  characterization, 9
  definition, 7–10
  design approaches, 26
    adaptive algorithms, 26
    genetic algorithms, 26
    heuristic approaches, 26, 27
    heuristic solutions, 27–33
    optimal solution, 33
    taxonomy, 26
  effects of, 3
  MAE of, 3, 20
  optimal, 20
  properties, 7–10
  subclass of, 10
Stacking, 6
Stacking property, 8
Stagewise orthogonal matching pursuit (StOMP), 240
Starck, J.-L., 237
Statistical independence, 227, 229
Structured diffuse intensity distributions, 304, 305
Structuring element, 17
Sunyaev-Zel’dovich (SZ) clusters, 293, 295
Suzuki, K., 82
Symmetry problem
  detection of, 214–216
  solved by complex-valued neuron, 214–216
System-on-chip approach, 134
T
Tăbuş, I., 33
Test patterns
  of complex-BP algorithm, 177–180, 192, 195, 200, 201, 205
  of real-BP algorithm, 177–180, 204
Test points, 176
  counterclockwise rotation of, 196
  distances between training points and, 176, 181
  input and output, 182
  on circle, 191
  mapping of, 183, 185, 187, 189, 190, 203
  on straight lines, 182, 183
  parallel displacement of, 202
  similarity transformation of, 200
ThAsSe, 305, 328
Threshold decomposition structure, 2, 6
Thresholding, 6, see also Hard thresholding; Soft thresholding
  operator commutes with, 7
Time complexity, 169, 170, 172
Training points, 176
  comparison of theoretical and experimental, 207
  counterclockwise rotation of, 196–199, 202
  distances between test points and, 176, 181
  input and output, 182
  on straight lines, 182, 183, 186–188, 190, 191, 194
Transfer maps, 59
Translation invariant, 5
Transverse polarization, 313–315
Tropp, J., 243
Tukey, J. W., 10
Two-dimensional (2D) signals, 10
Two-dimensional DWT, 269
Two-dimensional filtering, image processing, 119–133
  bistable network, limit of, 126, 127
  edge filtering, 120–124
  extraction of regions of interest, 124–133
  multistable network, 127–130
  noise filtering, 120–122
U
Unconditional basis, 265
Undecided sets, 33
Undecimated discrete wavelet transform (UDWT), 287
Unitary transformation, 52
V
Velocity function, 70
Venetsanopoulos, A. N., 11
Villars, F. M. H., 60
Vision systems on chip, see VSoCs
Visual receptive fields, 230
Voltage, nonlinear, 109
VSoCs, 134
W
Wajs, V. R., 278
Watanabe, A., 209
Waveform dictionary, 237
Wavelet-based denoising, 287, 288
Wavization, 73
Weighted least-squares regression, 279
Weighted median filters, 11
Weighted order statistic filters, see WOS
Weight vectors, 15
White Gaussian noise, 280, 293
Wolf, E., 62
W-operators, 5, 18
WOS, 13–17
Wouthuysen, S. A., 49
Wurmser, D., 50
X
XOR problem
  input-output mapping in, 212
  solved by single real-valued neuron, 212, 213
Y
Yang, L., 80
Yoo, J., 28, 29
Z
Zero-flux conditions, 110
Zibulevsky, M., 231, 269
Zitterbewegung, 56
CORRIGENDUM
Lie algebraic methods in charged particle optics, by Tomáš Radlička, published in vol. 151, pp. 241–362. The correct forms of certain equations in this chapter are as follows:

Equation (72):
$$X'' + \frac{\gamma\phi'}{2\phi^{*}}X' + \left(\frac{\gamma\phi''}{4\phi^{*}} + \frac{\eta^{2}B^{2}}{4\phi^{*}} + \frac{F_{1}^{2}}{8\phi^{*2}}\right)X = \left(\frac{F_{1}^{2}}{8\phi^{*2}} - \frac{\gamma p_{2}}{2\phi^{*}} + \frac{\eta Q_{2}}{\phi^{*1/2}} - \frac{\eta B_{2}}{\phi^{*1/2}}\right)\big(\cos(2\Theta)X - \sin(2\Theta)Y\big) + \frac{\gamma F_{1}}{2\phi^{*}}\cos\Theta \qquad (72a)$$

$$Y'' + \frac{\gamma\phi'}{2\phi^{*}}Y' + \left(\frac{\gamma\phi''}{4\phi^{*}} + \frac{\eta^{2}B^{2}}{4\phi^{*}} + \frac{F_{1}^{2}}{8\phi^{*2}}\right)Y = -\left(\frac{F_{1}^{2}}{8\phi^{*2}} - \frac{\gamma p_{2}}{2\phi^{*}} + \frac{\eta Q_{2}}{\phi^{*1/2}} - \frac{\eta B_{2}}{\phi^{*1/2}}\right)\big(\sin(2\Theta)X + \cos(2\Theta)Y\big) - \frac{\gamma F_{1}}{2\phi^{*}}\sin\Theta \qquad (72b)$$
Equation (75):

$$Q'' + \frac{\gamma\phi'}{2\phi^{*}}Q' + \left(\frac{\gamma\phi''}{4\phi^{*}} + \frac{\eta^{2}B^{2}}{4\phi^{*}} + \frac{F_{1}^{2}}{8\phi^{*2}}\right)Q = 0 \qquad (75)$$

Equation (76):

$$\hat{P}_{1}(f) = f'' + \frac{\gamma\phi'}{2\phi^{*}}f' + \left(\frac{\gamma\phi''}{4\phi^{*}} + \frac{\eta^{2}B^{2}}{4\phi^{*}} + \frac{F_{1}^{2}}{8\phi^{*2}}\right)f \qquad (76)$$

Equation (92):

$$Q'' + \frac{\gamma\phi'}{2\phi^{*}}Q' + \left(\frac{\gamma\phi''}{4\phi^{*}} + \frac{\eta^{2}B^{2}}{4\phi^{*}} + \frac{F_{1}^{2}}{8\phi^{*2}}\right)Q = 0 \qquad (92)$$

Equation (100):

$$X'' + \frac{\gamma\phi'}{2\phi^{*}}X' + \left(\frac{\gamma\phi''}{4\phi^{*}} + \frac{\eta^{2}B^{2}}{4\phi^{*}} + \frac{F_{1}^{2}}{8\phi^{*2}}\right)X = \frac{\delta F_{1}}{4e\phi^{*2}}\cos\Theta \qquad (100a)$$

$$Y'' + \frac{\gamma\phi'}{2\phi^{*}}Y' + \left(\frac{\gamma\phi''}{4\phi^{*}} + \frac{\eta^{2}B^{2}}{4\phi^{*}} + \frac{F_{1}^{2}}{8\phi^{*2}}\right)Y = -\frac{\delta F_{1}}{4e\phi^{*2}}\sin\Theta \qquad (100b)$$

Equation (148):

$$\hat{P}_{1}(Q_{1}) = Q_{1}'' + \frac{\gamma\phi'}{2\phi^{*}}Q_{1}' + \left(\frac{\gamma\phi''}{4\phi^{*}} + \frac{\eta^{2}B^{2}}{4\phi^{*}} + \frac{\gamma^{2}F_{1}^{2}}{8\phi^{*2}}\right)Q_{1} = 0 \qquad (148)$$

Equation (149):

$$Q_{2}'' + \frac{\gamma\phi'}{2\phi^{*}}Q_{2}' + \left(\frac{\gamma\phi''}{4\phi^{*}} + \frac{\eta^{2}B^{2}}{4\phi^{*}} + \frac{\gamma^{2}F_{1}^{2}}{8\phi^{*2}}\right)Q_{2} = f_{2}(Q_{1}, Q_{1}', Q_{1}'', z) \qquad (149)$$
Equation (290):
$$\begin{aligned} H_{4}^{\mathrm{int}} = \phi^{*-1/2}\,W_{s}^{-1}\Big[\ &\tfrac{1}{4}\big(L_{1}t^{4} + 2L_{2}t^{2}t'^{2} + L_{3}t'^{4}\big)\tilde{Q}_{a}^{4} \\ &+ \big(L_{1}st^{3} + L_{2}tt'(st)' + L_{3}s't'^{3}\big)\tilde{Q}\tilde{Q}_{a}^{3} \\ &+ \big(L_{1}s^{2}t^{2} + 2L_{2}sts't' + L_{3}s'^{2}t'^{2} - R^{2}\big)\big(\tilde{Q}\tilde{Q}_{a}\big)^{2} \\ &+ \tfrac{1}{2}\big(L_{1}s^{2}t^{2} + L_{2}(s'^{2}t^{2} + s^{2}t'^{2}) + L_{3}s'^{2}t'^{2} + 2R^{2}\big)\tilde{Q}^{2}\tilde{Q}_{a}^{2} \\ &+ \big(L_{1}ts^{3} + L_{2}ss'(ts)' + L_{3}t's'^{3}\big)\tilde{Q}^{3}\tilde{Q}_{a} \\ &+ \tfrac{1}{4}\big(L_{1}s^{4} + 2L_{2}s^{2}s'^{2} + L_{3}s'^{4}\big)\tilde{Q}^{4} \\ &+ \big(C_{Q}t'^{2} + C_{P}t^{2}\big)\tilde{Q}_{a}^{2}L_{z} - 2\big(C_{Q}s't' + C_{P}st\big)\tilde{Q}\tilde{Q}_{a}L_{z} + \big(C_{Q}s'^{2} + C_{P}s^{2}\big)\tilde{Q}^{2}L_{z} \Big] \qquad (290) \end{aligned}$$
COLOR PLATE 1 (Figure 5.3) Contour plots of a simulated joint probability density function (pdf) of two independent sources generated from a generalized Gaussian law f(x) ∝ exp(−μ|x|^0.5). Left, joint pdf of the original independent sources; right, joint pdf of two mixtures.
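The construction behind this plate is straightforward to reproduce. The following sketch is illustrative only and is not code from the chapter: it draws two independent sources from a generalized Gaussian density f(x) ∝ exp(−μ|x|^0.5) via a standard Gamma-transform trick, mixes them with an arbitrarily chosen 2 × 2 matrix, and plots the two joint pdfs as contour maps. The function names, the mixing matrix, and the plotting ranges are all hypothetical choices.

```python
# Illustrative sketch (not from the chapter): two independent sparse sources
# drawn from a generalized Gaussian law f(x) ~ exp(-mu*|x|**0.5), then mixed.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
beta, n = 0.5, 100_000

def ggd_sample(beta, size, rng):
    # If G ~ Gamma(1/beta, 1), then sign * G**(1/beta) has density
    # proportional to exp(-|x|**beta) (generalized Gaussian with mu = 1).
    g = rng.gamma(shape=1.0 / beta, scale=1.0, size=size)
    return rng.choice([-1.0, 1.0], size=size) * g ** (1.0 / beta)

sources = np.vstack([ggd_sample(beta, n, rng) for _ in range(2)])

A = np.array([[1.0, 0.6],   # hypothetical 2 x 2 mixing matrix;
              [0.4, 1.0]])  # any non-degenerate choice behaves similarly
mixtures = A @ sources

def joint_pdf_contour(data, ax, title):
    # Empirical joint pdf via a normalized 2D histogram, drawn as contours.
    h, xe, ye = np.histogram2d(data[0], data[1], bins=100,
                               range=[[-10, 10], [-10, 10]], density=True)
    ax.contour(0.5 * (xe[:-1] + xe[1:]), 0.5 * (ye[:-1] + ye[1:]), h.T)
    ax.set_title(title)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
joint_pdf_contour(sources, ax1, "joint pdf of the sources")
joint_pdf_contour(mixtures, ax2, "joint pdf of the mixtures")
plt.show()
```

The sparse sources concentrate the joint pdf along the coordinate axes; mixing rotates those ridges onto the columns of the mixing matrix, which is exactly the geometric cue that sparsity-based separation exploits.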
COLOR PLATE 2 (Figure 5.4) Contour plots of a simulated joint probability density function (pdf) of two independent sources generated from a generalized Gaussian law that have been hard-thresholded. Left, joint pdf of the original independent sources that have been hard-thresholded; right, joint pdf of two mixtures of the hard-thresholded sources.
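Continuing the sketch given after Color Plate 1 (same variables and the same hypothetical mixing matrix), hard-thresholding the sources before mixing makes them exactly sparse; the threshold value below is an arbitrary illustrative choice.

```python
# Continuation of the Color Plate 1 sketch: hard-threshold the sources first.
thresh = 2.0                                   # hypothetical threshold value
sparse_sources = np.where(np.abs(sources) > thresh, sources, 0.0)
sparse_mixtures = A @ sparse_sources

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
joint_pdf_contour(sparse_sources, ax1, "hard-thresholded sources")
joint_pdf_contour(sparse_mixtures, ax2, "mixtures of thresholded sources")
plt.show()
```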
COLOR PLATE 3 (Figure 6.7) (a) An <001> zone axis EDP of Im-3, Ca3CuTi4O12 (CCTO). Note the complete absence of diffuse streaking at this zone axis orientation. (b) A unit cell of the Im-3, CCTO average structure in projection along a <100> direction. The corner-connected TiO6 octahedra are shown in blue; the Ca ions are represented by the large pink balls; the Ti and Cu ions are shown by the medium-sized blue and green balls, respectively; and the O ions are indicated by the small red balls. The black arrows show the correlated Ti shifts along one of the <001> directions (correlated along this particular <001> row direction but not from one such <001> row to the next) responsible for the structured diffuse scattering observed in Figure 4b.
COLOR PLATE 4 (Figure 6.9) The disordered average structure of Bi1.89GaSbO6.84 ≡ (O′0.84Bi1.89)·(GaSbO6) is shown in projection down (a) a [1, −1, 0] direction and (b) a [1, 1, −2] direction. The [1, 1, −2] direction is horizontal and the [111] direction vertical in (a), whereas the [1, −1, 0] direction is horizontal and the [111] direction vertical in (b). The O′0.84Bi1.89 tetrahedral substructure, built of O′Bi4 tetrahedra, is shown in red and the (GaSb)O6 octahedral substructure, built of Ga1/2Sb1/2O6 octahedra, in green. The average structure unit cell is outlined in each case.
C O L O R P L A T E 5 (Figure 6.10) (a) The Bi/Zn ordered (O Bi1.5 Zn0.5 ) tetrahedral substructure of BZNT responsible for the G ± <001> * satellite reflections apparent in the <001> zone axis EDP of BZNT shown in (c). The Bi ions are represented by the large pink balls and the Zn ions by the smaller blue balls. The O ions are in the center of the O Bi3 Zn tetrahedra in (a). (b) shows the size effect–relaxed Bi/Zn distribution, while (d) shows the simulated EDP corresponding to (b). For more details, see Liu et al., 2006.
COLOR PLATE 6 (Figure 6.13) An (a) [001] and (b) [110] projection of the disordered average crystal structure of the microporous aluminophosphate AlPO4-5. The larger AlO4 tetrahedra are light blue and the smaller PO4 tetrahedra dark blue in (b). The unit cell is shown in outline in both (a) and (b). Note the existence of apparent 180-degree Al-O-P angles along the z direction in (b).
COLOR PLATE 7 (Figure 6.15) (a) An [001] zone axis electron diffraction pattern of ThAsSe taken at 100 K. (b) The P4/nmm average structure of ThAsSe (As ions are represented by the large blue balls, Se ions by the medium-sized red balls, and Th ions by the small black balls). (c) The calculated Fermi surface (FS) of ThAsSe in projection along c*. Some q-vectors spanning this FS are marked on (c). (d) The pattern of As-As dimerization responsible for the observed diffuse distribution (see Withers et al., 2006, for details).
COLOR PLATE 8 (Figure 6.16) (a) A close to <110> and (b) [001] projection of the P4/nmm, tetragonal α form of PbO. The small red balls represent the O ions and the larger dark balls the Pb ions. Note that the O ions form square arrays in (001) planes and that each oxygen ion is tetrahedrally coordinated by Pb ions.