ADVANCES IN IMAGING AND ELECTRON PHYSICS VOLUME 141
EDITOR-IN-CHIEF
PETER W. HAWKES CEMES-CNRS Toulouse, France
HONORARY ASSOCIATE EDITORS
TOM MULVEY BENJAMIN KAZAN
Advances in
Imaging and Electron Physics
EDITED BY
PETER W. HAWKES CEMES-CNRS Toulouse, France
VOLUME 141
AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Academic Press is an imprint of Elsevier
525 B Street, Suite 1900, San Diego, California 92101-4495, USA
84 Theobald's Road, London WC1X 8RR, UK

∞ This book is printed on acid-free paper.
Copyright © 2006, Elsevier Inc. All Rights Reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the Publisher.

The appearance of the code at the bottom of the first page of a chapter in this book indicates the Publisher's consent that copies of the chapter may be made for personal or internal use of specific clients. This consent is given on the condition, however, that the copier pay the stated per-copy fee through the Copyright Clearance Center, Inc. (www.copyright.com), for copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Law. This consent does not extend to other kinds of copying, such as copying for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. Copy fees for pre-2005 chapters are as shown on the title pages. If no fee code appears on the title page, the copy fee is the same as for current chapters. 1076-5670/2006 $35.00

Permissions may be sought directly from Elsevier's Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, E-mail: [email protected]. You may also complete your request on-line via the Elsevier homepage (http://elsevier.com), by selecting "Support & Contact" then "Copyright and Permission" and then "Obtaining Permissions."

For information on all Elsevier Academic Press publications visit our Web site at www.books.elsevier.com

ISBN-13: 978-0-12-014783-0
ISBN-10: 0-12-014783-1

PRINTED IN THE UNITED STATES OF AMERICA
06 07 08 09 9 8 7 6 5 4 3 2 1
CONTENTS
CONTRIBUTORS vii
PREFACE ix
FUTURE CONTRIBUTIONS xi

Phase Diversity: A Technique for Wave-Front Sensing and for Diffraction-Limited Imaging
LAURENT M. MUGNIER, AMANDINE BLANC, AND JÉRÔME IDIER

I. Introduction and Problem Statement 3
II. Applications of Phase Diversity 12
III. Phase Estimation Methods 16
IV. Properties of the Phase Estimation Methods 28
V. Restoration of the Object 36
VI. Optimization Methods 42
VII. Application of Phase Diversity to an Operational System: Calibration of NAOS-CONICA 46
VIII. Emerging Methods: Measurement of Large Aberrations 53
IX. Emerging Applications: Cophasing of Multiaperture Telescopes 63
References 68

Solving Problems with Incomplete Information: A Grey Systems Approach
YI LIN AND SIFENG LIU

I. Problems with Uncertainty 78
II. The Fundamentals 84
III. Methods for Sequences with Abnormal Behaviors 92
IV. Incidence Analysis 101
V. Clustering and Evaluations 114
VI. Law of Exponentiality and Predictions 130
VII. Decision-Making Based on Incomplete Information 143
VIII. Programmings with Uncertain Parameters 152
IX. Control of Not Completely Known Systems 161
References 169

Recent Developments in the Imaging of Magnetic Domains
WITOLD SZMAJA

I. Introduction 175
II. Experimental Techniques 180
III. SEM Type I Magnetic Contrast 183
IV. Bitter Pattern Method 204
V. Magnetic Force Microscopy 224
VI. Conclusions 240
References 244

Deconvolution Over Groups in Image Reconstruction
BIRSEN YAZICI AND CAN EVREN YARMAN

I. Introduction 258
II. Convolution and Fourier Analysis on Groups 260
III. Group Stationary Processes 263
IV. Wiener Filtering Over Groups 266
V. Wideband Extended Range-Doppler Imaging 268
VI. Radon and Exponential Radon Transforms 279
VII. Conclusion 292
References 296

INDEX 301
CONTRIBUTORS
Numbers in parentheses indicate the pages on which the authors’ contributions begin.
AMANDINE BLANC (1), Office National d'Etudes et de Recherches Aérospatiales, Département d'Optique Théorique et Appliquée, 92322 Châtillon cedex, France

JÉRÔME IDIER (1), Institut de Recherche en Communication et Cybernétique de Nantes, Analyse et Décision en Traitement du Signal et de l'Image, 1 rue de la Noe, BP 92101, 44321 Nantes cedex 3, France

YI LIN (77), Department of Mathematics, Slippery Rock University, Slippery Rock, Pennsylvania 16057, USA

SIFENG LIU (77), College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, PR China

LAURENT M. MUGNIER (1), Office National d'Etudes et de Recherches Aérospatiales, Département d'Optique Théorique et Appliquée, 92322 Châtillon cedex, France

WITOLD SZMAJA (175), Department of Solid State Physics, University of Łódź, Pomorska 149/153, 90-236 Łódź, Poland

CAN EVREN YARMAN (257), Department of Electrical, Computer and Systems Engineering, Rensselaer Polytechnic Institute, Troy, New York 12180, USA

BIRSEN YAZICI (257), Department of Electrical, Computer and Systems Engineering, Rensselaer Polytechnic Institute, Troy, New York 12180, USA
PREFACE
The subjects covered in this volume span a wide range: phase diversity, grey systems, ways of imaging magnetic domains, and image reconstruction.

In the first chapter, L. Mugnier, A. Blanc, and J. Idier describe the technique for wave-front sensing and diffraction-limited imaging based on phase diversity. This has reached a high degree of sophistication in astronomy. Here, the authors give a clear and readable account of current developments.

The second chapter deals with the recovery of information in a difficult domain, in which some essential information is missing or the information available may be misleading. The authors, Y. Lin and S. Liu, have introduced the notion of "grey numbers", and they explain what these numbers are and how they are manipulated. Their utility in a variety of difficult situations is then shown by example.

The imaging of magnetic domains has preoccupied microscopists for many years, and several techniques for observing phenomena in the scanning and transmission electron microscopes have been devised. Newer types of microscopy have now joined the armoury, and W. Szmaja describes recent developments, concentrating on a type of contrast provided by the scanning electron microscope, the Bitter pattern method, and magnetic force microscopy.

Finally, B. Yazici and C.E. Yarman describe a group-theoretic approach to deconvolution for image reconstruction. This procedure was pioneered by B. Yazici, and I am delighted to include this extended account of his work here.

All the contributors are thanked for the efforts they have made to ensure that their work is accessible to a wide readership; contributions promised for future volumes are listed in the next section.

Peter Hawkes
FUTURE CONTRIBUTIONS
G. Abbate
New developments in liquid-crystal-based photonic devices

S. Ando
Gradient operators and edge and corner detection

A. Asif
Applications of noncausal Gauss–Markov random processes in multidimensional image processing

C. Beeli
Structure and microscopy of quasicrystals

V.T. Binh and V. Semet
Cold cathodes

G. Borgefors
Distance transforms

A. Buchau
Boundary element or integral equation methods for static and time-dependent problems

B. Buchberger
Gröbner bases

J. Caulfield (vol. 142)
Optics and information sciences

T. Cremer
Neutron microscopy

H. Delingette
Surface reconstruction based on simplex meshes

A.R. Faruqi
Direct detection devices for electron microscopy

R.G. Forbes
Liquid metal ion sources

C. Fredembach
Eigenregions for image classification

S. Fürhapter
Spiral phase contrast imaging

L. Godo and V. Torra
Aggregation operators

A. Gölzhäuser
Recent advances in electron holography with point sources

M.I. Herrera
The development of electron microscopy in Spain

D. Hitz (vol. 144)
Recent progress on high-frequency electron cyclotron resonance ion sources

D.P. Huijsmans and N. Sebe
Ranking metrics and evaluation measures

K. Ishizuka
Contrast transfer and crystal images

J. Isenberg
Imaging IR-techniques for the characterization of solar cells

K. Jensen
Field-emission source mechanisms

L. Kipp
Photon sieves

G. Kögel
Positron microscopy

T. Kohashi
Spin-polarized scanning electron microscopy

W. Krakow
Sideband imaging

R. Leitgeb
Fourier domain and time domain optical coherence tomography

B. Lencová
Modern developments in electron optical calculations

W. Lodwick
Interval analysis and fuzzy possibility theory

L. Macaire, N. Vandenbroucke, and J.-G. Postaire
Color spaces and segmentation

M. Matsuya
Calculation of aberration coefficients using Lie algebra

S. McVitie
Microscopy of magnetic specimens

S. Morfu and P. Morquié
Nonlinear systems for image processing

M.A. O'Keefe
Electron image simulation

D. Oulton and H. Owens
Colorimetric imaging

N. Papamarkos and A. Kesidis
The inverse Hough transform

K.S. Pedersen, A. Lee, and M. Nielsen
The scale-space properties of natural images

I. Perfilieva
Fuzzy transforms

E. Rau
Energy analysers for electron microscopes

H. Rauch
The wave-particle dualism

E. Recami
Superluminal solutions to wave equations

J. Řeháček, Z. Hradil, J. Peřina, S. Pascazio, P. Facchi, and M. Zawisky (vol. 142)
Neutron imaging and sensing of physical fields

G. Ritter and P. Gader (vol. 144)
Fixed points of lattice transforms and lattice associative memories

J.-F. Rivest (vol. 144)
Complex morphology

P.E. Russell and C. Parish
Cathodoluminescence in the scanning electron microscope

G. Schmahl
X-ray microscopy

G. Schönhense, C.M. Schneider, and S.A. Nepijko (vol. 142)
Time-resolved photoemission electron microscopy

R. Shimizu, T. Ikuta, and Y. Takai
Defocus image modulation processing in real time

S. Shirai
CRT gun design methods

N. Silvis-Cividjian and C.W. Hagen (vol. 143)
Electron-beam-induced nanometre-scale deposition

H. Snoussi
Geometry of prior selection

T. Soma
Focus-deflection systems and their applications

I. Talmon
Study of complex fluids by transmission electron microscopy

G. Teschke and I. Daubechies
Image restoration and wavelets

M.E. Testorf and M. Fiddy
Imaging from scattered electromagnetic fields, investigations into an unsolved problem

M. Tonouchi
Terahertz radiation imaging

N.M. Towghi
lp norm optimal filters

D. Tschumperlé and R. Deriche
Multivalued diffusion PDEs for image regularization

E. Twerdowski
Defocused acoustic transmission microscopy

Y. Uchikawa
Electron gun optics

C. Vachier-Mammar and F. Meyer
Watersheds

K. Vaeth and G. Rajeswaran
Organic light-emitting arrays

M. van Droogenbroeck and M. Buckley
Anchors in mathematical morphology

M. Wild and C. Rohwer
Mathematics of vision

J. Yu, N. Sebe, and Q. Tian (vol. 144)
Ranking metrics and evaluation measures
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 141
Phase Diversity: A Technique for Wave-Front Sensing and for Diffraction-Limited Imaging

LAURENT M. MUGNIER,a AMANDINE BLANC,a AND JÉRÔME IDIERb

a Office National d'Etudes et de Recherches Aérospatiales, Département d'Optique Théorique et Appliquée, 92322 Châtillon cedex, France
b Institut de Recherche en Communication et Cybernétique de Nantes, Analyse et Décision en Traitement du Signal et de l'Image, 1 rue de la Noe, BP 92101, 44321 Nantes cedex 3, France
I. Introduction and Problem Statement 3
   A. Context 3
   B. Image Formation 4
      1. PSF of a Telescope 5
      2. Origin of PSF Degradations: Intrinsic Aberrations 5
      3. Origin of PSF Degradations: Atmospheric Turbulence 6
      4. Parameterization of the Phase 8
      5. Discrete Image Model 10
   C. Basics of Phase Diversity 11
      1. Uniqueness of the Phase Estimate 11
      2. Inverse Problems at Hand 11
II. Applications of Phase Diversity 12
   A. Quasi-Static Aberration Correction of Optical Telescopes 13
      1. Monolithic-Aperture Telescope Calibration 13
      2. Cophasing of Multiaperture Telescopes 14
   B. Diffraction-Limited Imaging Through Turbulence 14
      1. A posteriori Correction 15
      2. Real-Time Wave-Front Correction 16
III. Phase Estimation Methods 16
   A. Joint Estimator 17
      1. Joint Criterion 17
      2. Circulant Approximation and Expression in Fourier Domain 20
      3. Tuning of the Hyperparameters 21
   B. Marginal Estimator 22
      1. Expression of R_I⁻¹ 23
      2. Determinant of R_I 23
      3. Marginal Criterion 24
      4. Relationship Between the Joint and the Marginal Criteria 24
      5. Expression in the Fourier Domain 25
      6. Unsupervised Estimation of the Hyperparameters 25
   C. Extended Objects 26
      1. Apodization 26
      2. Guard Band 27
IV. Properties of the Phase Estimation Methods 28
   A. Image Simulation 28
   B. Asymptotic Properties of the Two Estimators for Known Hyperparameters 29
   C. Joint Estimation: Influence of the Hyperparameters 32
   D. Marginal Estimation: Unsupervised Estimation 34
   E. Performance Comparison 35
   F. Conclusion 36
V. Restoration of the Object 36
   A. With the Joint Method 37
   B. With the Marginal Method 38
      1. Principle 38
      2. Results 39
      3. Influence of the Hyperparameters 39
   C. With a "Hybrid" Method 40
      1. Principle 40
      2. The Three Steps 40
      3. Results 41
   D. Conclusion 42
VI. Optimization Methods 42
   A. Projection-Based Methods 42
   B. Line-Search Methods 44
      1. Strategies of Search Direction 44
      2. Step Size Rules 45
   C. Trust-Region Methods 45
VII. Application of Phase Diversity to an Operational System: Calibration of NAOS-CONICA 46
   A. Practical Implementation of Phase Diversity 46
      1. Choice of the Defocus Distance 46
      2. Image Centering 47
      3. Spectral Bandwidth 47
   B. Calibration of NAOS and CONICA Static Aberrations 48
      1. The Instrument 48
      2. Calibration of CONICA Stand-Alone 49
      3. Calibration of the NAOS Dichroics 51
      4. Closed-Loop Compensation 52
   C. Conclusion 53
VIII. Emerging Methods: Measurement of Large Aberrations 53
   A. Problem Statement 53
   B. Large Aberration Estimation Methods 55
      1. Estimation of the Unwrapped Phase 55
      2. Estimation of the Wrapped Phase (Then Unwrapping) 56
   C. Simulation Results 59
      1. Choice of an Error Metric 59
      2. Results 59
IX. Emerging Applications: Cophasing of Multiaperture Telescopes 63
   A. Background 63
   B. Experimental Results on an Extended Scene 65
   C. Conclusion 68
References 68

ISSN 1076-5670/05
DOI: 10.1016/S1076-5670(05)41001-0
Copyright 2006, Elsevier Inc. All rights reserved.
I. INTRODUCTION AND PROBLEM STATEMENT

A. Context

The theoretical angular resolution of an optical imaging instrument such as a telescope is given by the ratio of the imaging wavelength λ over the aperture diameter D of the instrument. For a real-world instrument, optical aberrations often prevent this so-called diffraction-limited resolution λ/D from being achieved. These aberrations may arise both from the instrument itself and from the propagation medium of the light. When observing Space from the ground, the aberrations are predominantly due to atmospheric turbulence: inhomogeneities of air temperature induce inhomogeneities of the refraction index.

The aberrations can be compensated either during the image acquisition by real-time techniques or a posteriori (i.e., by postprocessing). Adaptive optics (AO) is a technique to compensate in real time for turbulence-induced aberrations (Roddier, 1999). Most of these techniques require the measurement of the aberrations, also called the wave-front, by a wave-front sensor (WFS). A large number of WFSs exist today; they are thoroughly reviewed in Rousset (1999) and can be classified into two families: focal-plane sensors and pupil-plane sensors. Today's AO systems use either Shack–Hartmann WFSs (Shack and Platt, 1971) or curvature WFSs (Roddier, 1988), which both divert part of the incoming light by means of a (dichroic) beam-splitter into some auxiliary optics, and which belong to the second family. For AO they both have the appealing property that they work with broadband light (because they are well described by geometrical optics) and that the relationship between the unknown wave-front and the recorded data is linear, so that it can be inverted in real time.

The focal-plane family of sensors was born from the very natural idea that an image of a given object contains information not only about the object, but also about the wave-front.
A focal-plane sensor thus requires little or no optics other than the imaging sensor; it is also the only way to be sensitive to all aberrations down to the focal plane. The first practical method for wave-front sensing from focal-plane data was proposed by Gerchberg and Saxton (1972) in the electron-microscopy context and later rediscovered by Gonsalves (1976). If the pupil (or aperture) function of the imaging system is known, this method requires only one focal-plane image of a point source to estimate the aberrations, which are coded in the phase of the pupil transmittance. It finds the aberrations that are most compatible with the known constraints in the pupil plane (known aperture) and in the focal plane (measured image). The original implementation uses projections: it imposes the known constraints on the wave's complex amplitude alternately in the two domains until convergence. The connection between this projection-based algorithm and the minimization of a least-squares functional of the unknown aberrations was later made by Fienup (1982).

This so-called phase-retrieval method has two major limitations. First, it only works with a point source. Second, there is generally a sign ambiguity in the recovered phase (i.e., the solution is not unique), as detailed below. Gonsalves (1982) showed that by using a second image with an additional known phase variation with respect to the first image (such as a defocus), it is possible to estimate the unknown phase even when the object is extended and unknown. The presence of this second image additionally removes the above-mentioned sign ambiguity of the solution. This technique is referred to as "phase diversity" by analogy with a technique used in wireless telecommunications. The idea of using two images of the same object with a known finite relative defocus to determine phase information from intensity measurements can actually be traced back to the work of Misell (1973), again in the electron-microscopy context.

This contribution attempts to provide a (necessarily incomplete) survey of the phase diversity technique, with an emphasis on its wave-front sensing capabilities. In most of this text, we consider a single-aperture¹ optical imaging instrument working with spatially incoherent light, such as a telescope. The remainder of this section provides an introduction to image formation for such an instrument, reviews the sources of image degradation, and states the inverse (estimation) problem to be solved.

B. Image Formation

Image formation is well described by the scalar theory of diffraction, presented in detail in reference books (Goodman, 1968; Born and Wolf, 1993). It can be modeled by a convolution of the observed object by the instrument's point spread function (PSF), at least within the so-called isoplanatic patch of the instrument. At visible wavelengths, this patch is typically of the order of 1 degree when considering only the aberrations of the telescope itself, and of the order of a few arc-seconds (1 arcsec = 1/3600°) for a telescope observing Space through atmospheric turbulence.

¹ As opposed to multiple-aperture instruments such as imaging interferometers; see Section IX.
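In discrete form, this convolution model is commonly implemented with FFTs. The following is a minimal numerical sketch of the isoplanatic imaging model (the function name and grid size are illustrative choices, not taken from the chapter):

```python
import numpy as np

def image_from_object(obj, psf):
    """Noise-free isoplanatic image: circular convolution of the object with
    the PSF, computed via FFTs (the usual discrete approximation)."""
    otf = np.fft.fft2(np.fft.ifftshift(psf))  # PSF is given centered
    return np.real(np.fft.ifft2(np.fft.fft2(obj) * otf))

# Toy check: convolving with a centered delta PSF returns the object unchanged,
# and the convolution conserves flux when the PSF sums to one.
n = 32
psf = np.zeros((n, n)); psf[n // 2, n // 2] = 1.0
obj = np.random.default_rng(0).random((n, n))
img = image_from_object(obj, psf)
assert np.allclose(img, obj, atol=1e-12)
assert np.isclose(img.sum(), obj.sum())
```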
1. PSF of a Telescope

The PSF of a telescope or of the "telescope + atmosphere" system at an imaging wavelength λ is the square modulus of the (inverse) Fourier transform of the complex amplitude ψ, where ψ = P exp(jϕ) is the electromagnetic field in the instrument pupil (or aperture) when the observed object is a point source:

h_opt(x, y) = |FT⁻¹[P(λu, λv) e^{jϕ(λu,λv)}](x, y)|².   (1)

In this expression, the Fourier transform models the transformation of the electromagnetic field between infinity and the focal plane, the square modulus is due to the quadratic detection (i.e., the detection of the field's intensity), and x and y are angles on the sky, in radians (rd). For a perfect telescope and without turbulence, P is constant within the pupil and ϕ is zero; for a real telescope, the variations of ψ are due to the aberrations of the telescope itself and to those introduced by turbulence. From this point forward in the text, we shall write P(u, v) = P(λu, λv) and φ(u, v) = ϕ(λu, λv) to deal with dimensionless quantities.

In the following, we assume that P is simply the indicatrix of the pupil (i.e., that the intensity variations in the pupil are negligible). This assumption is generally valid in astronomical imaging and is called the near-field approximation (Roddier, 1981). With this assumption, the PSF is completely described by the pupil phase φ. Eq. (1) indicates that the optical transfer function (OTF) h̃_opt is the autocorrelation of P exp(jφ):

h̃_opt(u, v) = [P exp(jφ) ⊗ P exp(jφ)](u, v),   (2)

where the correlation of two complex-valued functions f₁ and f₂ is defined by (f₁ ⊗ f₂)(x) ≜ ∫ f₁*(t) f₂(t + x) dt. In the absence of aberrations (i.e., if φ = 0), the OTF is thus the autocorrelation of P. Consequently, it has a cutoff spatial frequency of D/λ rd⁻¹, where D is the pupil diameter, and is strictly zero beyond.
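Eqs. (1) and (2), as well as the sign ambiguity of single-image phase retrieval mentioned in the Introduction, can be checked numerically. The sketch below is illustrative (the grid size, pupil radius, and aberration amplitudes are arbitrary toy values, not values from the chapter):

```python
import numpy as np

n, R = 64, 12                        # grid size and pupil radius in pixels (toy values)
y, x = np.indices((n, n)) - n // 2   # pixel coordinates centered on the pupil
r2 = x ** 2 + y ** 2
P = (r2 <= R ** 2).astype(float)     # binary aperture (near-field approximation)

def psf_from_phase(phi):
    """Eq. (1): squared modulus of the inverse FT of P * exp(j*phi), normalized."""
    psi = np.fft.ifftshift(P * np.exp(1j * phi))
    h = np.abs(np.fft.ifft2(psi)) ** 2
    return h / h.sum()

astig = (x ** 2 - y ** 2) / R ** 2   # an even aberration (astigmatism-like), ~1 rad
defoc = 2.0 * r2 / R ** 2 - 1.0      # known diversity phase (defocus-like)

# Sign ambiguity of single-image phase retrieval: phi and -phi give the same PSF...
psf_p, psf_m = psf_from_phase(astig), psf_from_phase(-astig)
assert np.allclose(psf_p, psf_m, atol=1e-12)

# ...but the two defocused (diversity) images differ, which lifts the ambiguity.
d_p, d_m = psf_from_phase(astig + defoc), psf_from_phase(-astig + defoc)
assert np.abs(d_p - d_m).max() > 1e-6

# Eq. (2): the OTF is the autocorrelation of the pupil function, so it vanishes
# beyond the cutoff, i.e., for frequency offsets larger than 2R pixels.
otf = np.fft.fft2(psf_from_phase(astig))
assert np.abs(otf[2 * R + 2, 0]) < 1e-8 * np.abs(otf[0, 0])
```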
In a real system for Space observation from the ground, turbulence-induced aberrations, if uncorrected, lead to a cutoff frequency much smaller than D/λ.

2. Origin of PSF Degradations: Intrinsic Aberrations

Some aberrations are intrinsic to the instrument; they originate in imperfections in the design, fabrication, and assembly, as well as in the environment of the instrument (e.g., thermomechanical stresses and vibrations for a space telescope). Optical telescopes for Space observation from the ground or for Earth observation from Space are usually instruments of very good optical quality,
so that the overall amplitude of these aberrations is small (notably less than the imaging wavelength λ). Their spatial spectrum is related to their origins:

• Some aberrations originate in the optical design; they are fixed and of low spatial frequencies.
• Some aberrations are due to the fabrication process (polishing); they are also fixed but, conversely, of high spatial frequencies.
• Some aberrations are due to misalignments, either because of an imperfect alignment during the integration or because of thermomechanical drifts during operation. Such aberrations are slowly varying and of low spatial frequencies.
• Finally, some aberrations may occur at the location where the optical components are supported. These too are slowly varying but may be of variable spatial frequencies.

In summary, the salient features of intrinsic aberrations are that they are slowly varying and usually of small overall amplitude.

3. Origin of PSF Degradations: Atmospheric Turbulence

Inhomogeneities of the temperature of atmospheric air induce inhomogeneities of the air refraction index, which perturb the propagation of light waves through the atmosphere. We shall assume that these random index fluctuations follow the Kolmogorov law: their probability density function is Gaussian, with zero mean and a power spectral density (PSD) proportional to |ν|^{−11/3}, where ν is the three-dimensional (3D) spatial frequency (Roddier, 1981). This assumption is usually valid, at least for the spatial frequencies of today's single-aperture telescopes.

Astronomical observation from the ground. In the case of astronomical observations from the ground, the light wave coming from a point source is planar at the entrance of the atmosphere. By integration of the index fluctuations' PSD along the optical path and within the near-field approximation, it is possible to derive the spatial statistics of the phase in the telescope's pupil. This phase is Gaussian, as it results from the sum of all index perturbations from the high atmosphere down to the ground (Roddier, 1981). Its PSD depends on only one parameter, denoted r₀, and reads (Noll, 1976):

S_φ(f) = 0.023 r₀^{−5/3} f^{−11/3},   (3)

where f is the modulus of the 2D spatial frequency in the pupil and r₀, called the Fried diameter (Fried, 1965), is the key parameter that quantifies the turbulence's strength. The stronger the turbulence, the smaller r₀; it is typically 10 cm in the visible at a relatively good site.
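A direct transcription of Eq. (3) makes its two power laws explicit (the function name and sample values are illustrative):

```python
import numpy as np

def kolmogorov_phase_psd(f, r0):
    """Spatial PSD of the turbulent pupil phase, Eq. (3) (Noll, 1976).
    f: modulus of the 2D spatial frequency [1/m]; r0: Fried diameter [m]."""
    return 0.023 * r0 ** (-5 / 3) * f ** (-11 / 3)

# The -5/3 power law in r0: halving r0 (stronger turbulence) raises the PSD.
ratio_r0 = kolmogorov_phase_psd(1.0, 0.05) / kolmogorov_phase_psd(1.0, 0.10)
assert np.isclose(ratio_r0, 2 ** (5 / 3))

# The -11/3 power law in f: doubling the frequency divides the PSD by 2^(11/3).
ratio_f = kolmogorov_phase_psd(2.0, 0.10) / kolmogorov_phase_psd(1.0, 0.10)
assert np.isclose(ratio_f, 2 ** (-11 / 3))
```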
The typical evolution time τ of the turbulent phase in the pupil is given by the ratio of its characteristic scale r₀ over an average wind speed Δv, which is, more precisely, the standard deviation of the wind speed modulus distribution (Roddier et al., 1982):

τ = r₀/Δv.   (4)

For r₀ ≈ 10 cm and Δv ≈ 10 m·s⁻¹, τ ≈ 10⁻² sec. Images corresponding to an integration time notably longer than this time will be referred to as "long exposure"; those of shorter exposure time will be referred to as "short exposure." For a comprehensive exposition of the temporal statistics of the turbulent phase, see Conan et al. (1995).

The long-exposure turbulent OTF (without AO correction) is the product of the telescope's OTF without atmosphere, T, by an atmosphere transfer function B of cutoff frequency r₀/λ (Roddier, 1981):

⟨h̃_opt(f)⟩ = T(f) B(f),   where B(f) = exp{−3.44 (λf/r₀)^{5/3}}

and ⟨·⟩ denotes a temporal average over an arbitrarily long time. Because the cutoff frequency of T is D/λ, this equation shows that the phenomenon limiting the resolution depends on the ratio D/r₀: if D < r₀, the instrument is diffraction limited, that is, its resolution is given by its diameter (if its intrinsic aberrations are reasonable), whereas if D ≫ r₀, the long-exposure resolution of the instrument is limited by turbulence and is not better than that of a telescope of diameter r₀.

As noted by Labeyrie (1970), when the exposure time is short enough to freeze the atmospheric turbulence [typically shorter than 10 ms, cf. Eq. (4)], some high-frequency information is preserved in the images in the form of speckles, whose typical size is λ/D and whose position is random. This is illustrated in Figure 1. If a number of these (uncorrected) short-exposure images are processed jointly, in a cleverer way than a simple average, it is thus possible to restore a high-resolution image of the observed object.

Earth observation from Space. In the case of Earth observation from Space, a light wave coming from a point source on the ground is spherical, not planar as in astronomical observation. Because it is spherical, such a wave intersects less of, and thus interacts less with, the lower layers of the atmosphere, which are the ones where the turbulence is strongest. It can be shown theoretically (Fried, 1966a) that the lower layers contribute less to the overall turbulence strength, so that r₀ is typically of a few tens of meters (see, e.g., Blanc, 2002). As a consequence, turbulence is not a limiting factor for a space telescope observing the Earth.
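The evolution time of Eq. (4) and the long-exposure atmospheric transfer function B(f) quoted above can be sketched as follows (function names and sample values are illustrative, not from the chapter):

```python
import numpy as np

def evolution_time(r0, delta_v):
    """Typical turbulence evolution time tau = r0 / Delta-v, Eq. (4)."""
    return r0 / delta_v

def atmosphere_mtf(f, r0, lam):
    """Long-exposure atmospheric transfer function
    B(f) = exp(-3.44 (lambda f / r0)^(5/3))."""
    return np.exp(-3.44 * (lam * f / r0) ** (5 / 3))

# The order-of-magnitude example from the text: r0 = 10 cm, Delta-v = 10 m/s.
assert np.isclose(evolution_time(0.10, 10.0), 1e-2)

# B falls to exp(-3.44) (about 3%) at the turbulence cutoff frequency f = r0/lambda,
# so for D >> r0 the long-exposure resolution is set by r0, not by D.
lam, r0 = 0.5e-6, 0.10
assert np.isclose(atmosphere_mtf(r0 / lam, r0, lam), np.exp(-3.44))
```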
FIGURE 1. Simulated short-exposure (left) and long-exposure (right) images of a star through turbulence. The turbulence strength is D/r₀ = 10; the sampling rate respects the Nyquist–Shannon criterion.
4. Parameterization of the Phase

The pupil phase φ can often be described parsimoniously when expanded on a modal basis. The Zernike polynomials (Noll, 1976) form an orthonormal basis on a disk and thus make a convenient basis for the expansion of the phase on the circular pupil of a telescope. Each of these polynomials is the product of a trigonometric function of the polar angle θ with a polynomial function of the radius r:

    Z_i(r) = R_n^m(r) Θ_n^m(θ),   (5)

where the trigonometric function reads:

    Θ_n^m(θ) = √(n+1)             if m = 0,
               √(2(n+1)) cos(mθ)  if m ≠ 0 and i even,   (6)
               √(2(n+1)) sin(mθ)  if m ≠ 0 and i odd,

and the polynomial function reads:

    R_n^m(r) = Σ_{s=0}^{(n−m)/2} [(−1)^s (n−s)!] / [s! ((n+m)/2 − s)! ((n−m)/2 − s)!] · r^{n−2s}.   (7)
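Equations (5)-(7) can be evaluated directly; a minimal sketch follows. Passing (n, m) and the even/odd flag explicitly is a simplification: in the Noll convention they are derived from the single index i.

```python
from math import factorial, cos, sin, sqrt

def zernike_radial(n, m, r):
    """Radial polynomial R_n^m(r) of Eq. (7); needs 0 <= m <= n, n - m even."""
    assert 0 <= m <= n and (n - m) % 2 == 0
    return sum(
        (-1) ** s * factorial(n - s)
        / (factorial(s) * factorial((n + m) // 2 - s) * factorial((n - m) // 2 - s))
        * r ** (n - 2 * s)
        for s in range((n - m) // 2 + 1)
    )

def zernike(n, m, i_even, r, theta):
    """Z_i = R_n^m(r) * Theta_n^m(theta), Eqs. (5) and (6)."""
    if m == 0:
        ang = sqrt(n + 1)
    elif i_even:
        ang = sqrt(2 * (n + 1)) * cos(m * theta)
    else:
        ang = sqrt(2 * (n + 1)) * sin(m * theta)
    return zernike_radial(n, m, r) * ang

# Defocus (Z4: n = 2, m = 0): R_2^0(r) = 2 r^2 - 1, so Z4 = sqrt(3) (2 r^2 - 1).
print(zernike(2, 0, True, 1.0, 0.0))
```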
Parameter n in Eqs. (6) and (7) is called the radial degree of the corresponding Zernike polynomial, and parameter m is called the azimuthal degree; m and n have the same parity and are such that 0 ≤ m ≤ n. The first Zernike polynomials are represented in Figure 2. Several additional properties make this Zernike basis very commonly used:
A TECHNIQUE FOR WAVE-FRONT SENSING
FIGURE 2. First Zernike polynomials.
• The first Zernike polynomials correspond to the well-known low-order optical aberrations: Z4 is defocus, Z5 and Z6 are astigmatism, Z7 and Z8 are coma, and Z11 is spherical aberration.
• Their ordering by increasing radial degree corresponds to an ordering by increasing spatial frequency.

The expansion of the phase φ on this basis reads:

    φ(r) = Σ_{k=1}^{∞} a_k Z_k(r),   (8)
where r denotes the spatial coordinates in the pupil, normalized to a unit radius. For a multiple-aperture instrument, such an expansion can be used on each of the apertures. The first term is called piston (Z1) and codes for the average optical path difference of a given aperture; the two following terms (Z2 and Z3) are tip/tilt and code for the position of the aperture's PSF. For a single-aperture telescope, the sum in Eq. (8) is usually started at k = 4, which corresponds to a centered PSF. Additionally, in practice, the sum is necessarily limited to a finite number of terms kmax, which depends on the problem at hand: it is of the order of 10 or a few tens when estimating the aberrations of a space telescope, and of the order of 100 or a few hundred when observing Space from the ground.

5. Discrete Image Model

The image is recorded by a detector such as a charge-coupled device (CCD) camera, which integrates the flux on a grid of pixels. This can be conveniently modeled as a convolution by a detector PSF h^det followed by a sampling operation. The global PSF of the instrument is thus:

    h = h^det ⋆ h^opt.   (9)
Due to the inevitable noise of the recording process (photon noise and detector noises), the recorded image reads:

    i = [h ⋆ o]_x + n,   (10)
where [·]_x denotes the sampling operation. This model is generally approximated by a discrete convolution with the sampled version of the (unknown) object o and written in matrix form:

    i = h ⋆ o + n = H o + n,   (11)
where H is the matrix representing the discrete convolution by the sampled version h of h, and where i is the vector obtained by stacking together the columns of the corresponding image. Similarly, o is the vector obtained by stacking together the columns of the sampled object.
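A minimal numerical sketch of the discrete model of Eq. (11), with the convolution implemented circularly via FFTs (anticipating the circulant approximation used later in Section III). The Gaussian PSF and point-like object below are arbitrary stand-ins, not the physical PSF of Eq. (9).

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64

# Toy object: a few bright points on a dark background.
o = np.zeros((N, N))
o[20, 20] = 1.0
o[40, 30] = 0.5

# Toy PSF (Gaussian blur), normalized to unit sum.
y, x = np.indices((N, N))
h = np.exp(-((x - N // 2) ** 2 + (y - N // 2) ** 2) / (2 * 2.0 ** 2))
h /= h.sum()
h = np.fft.ifftshift(h)  # put the PSF peak at the origin for circular convolution

# i = h * o + n with stationary white Gaussian noise.
sigma = 1e-3
i_img = (np.real(np.fft.ifft2(np.fft.fft2(h) * np.fft.fft2(o)))
         + sigma * rng.standard_normal((N, N)))

# Since h sums to one, the convolution conserves the flux (up to noise).
print(o.sum(), i_img.sum())
```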
C. Basics of Phase Diversity

1. Uniqueness of the Phase Estimate

We mentioned in Section I.A that phase retrieval from a single image generally faces non-uniqueness of the solution, even if the object is known. This is due to the relationship between the OTF and the pupil phase [Eq. (2)], as shown below. For any complex-valued function f, a simple change of variables in the integration shows that the function f̆ defined by f̆(t) ≜ f*(−t) and f have identical autocorrelations: f̆ ⊗ f̆ = f ⊗ f. Let f(t) = P(t) e^{jφ(t)}; one obtains f̆(t) = P(−t) e^{−jφ(−t)} as P is real-valued. Thus, if P is centrosymmetrical (i.e., even), then for any phase φ(t) the phase defined by φ′(t) = −φ(−t) yields the same OTF, that is, h̃_opt(φ′) = h̃_opt(φ) (as noted by Gonsalves, 1976), and thus the same image. This result can be cast into a somewhat more informative form: if the phase is decomposed (uniquely) into its even and odd components, that is, φ(t) = φ_even(t) + φ_odd(t), then φ′(t) = −φ_even(t) + φ_odd(t). In other words, there is an indetermination on the sign of the even part of the phase (Blanc, 2002). Recording a second image of the same object with the instrument suffering from the same unknown phase plus a known additional even one removes this indetermination and adds enough information to retrieve both the phase and the possibly unknown object. More quantitative results on the uniqueness of the phase estimate can be found in Idier et al. (2005). Let φ_d be this "diversity" phase (often a defocus); the two images read:

    i_1 = h^det ⋆ h^opt(φ) ⋆ o + n_1,   (12)

    i_2 = h^det ⋆ h^opt(φ + φ_d) ⋆ o + n_2.   (13)

2. Inverse Problems at Hand

The phase-diversity technique can be used in two different contexts. (1) One can be interested in imaging a remote object, for instance in solar astronomy or Space surveillance. (2) One can be interested in measuring the aberrations of an imaging system, either to correct the latter in real time or to restore a posteriori the images it takes. These two problems are obviously closely related, but they are not identical. In particular, when interested in imaging a remote object through unknown aberrations, the aforementioned sign ambiguity on the phase can be tolerated, provided the object is recovered satisfactorily. Indeed, multiframe "blind" deconvolution from short-exposure turbulent images has been successfully demonstrated (Schulz, 1993; Thiébaut and Conan, 1995), where blind means
deconvolution without a dedicated WFS but with the use of the strong constraints that each PSF is fully described by a pupil phase [cf. Eq. (1)] and that the unknown object is identical in all images. Yet, it can be advantageous to record WFS data simultaneously with the images, in particular because blind deconvolution is usually impaired by the presence of local minima in the criterion to minimize. The WFS can be a focal-plane WFS consisting of a diverse image for each recorded image, or a pupil-plane WFS such as a Shack–Hartmann (Fontanella, 1985; Primot et al., 1988; Mugnier et al., 2001). When interested in estimating wave-fronts, the presence of (at least) two images is necessary to avoid a sign indetermination in the phase, as shown in Subsection I.C.1.

In both problems, the basis of the inversion consists in estimating the phase² and the object that are consistent with the measurements, given the recorded images. The most "natural" solution, which is the one used originally by Gonsalves (1982), is based on the minimization of the following least-square criterion³ as a function of (o, φ):

    J(o, φ) = ‖i_1 − H(φ) o‖² + ‖i_2 − H(φ + φ_d) o‖².   (14)

The remainder of this contribution is organized as follows: Section II reviews the domains of application of phase diversity. Then, Sections III and IV review the wave-front estimation methods associated with this technique and their properties, while Section V examines the possible object estimation (i.e., image restoration) methods. Section VI gives some background on the various minimization algorithms that have been used for phase diversity. Section VII illustrates the use of phase diversity on experimental data for wave-front sensing. Finally, Sections VIII and IX highlight two fields of phase-diversity wave-front sensing that have witnessed noteworthy advances.
Section VIII reviews the methods used to estimate the large-amplitude aberrations faced in imaging through turbulence and proposes a novel approach for this difficult problem. Section IX reviews the developments of phase diversity for a recent application: the phasing (also called cophasing) of multiaperture telescopes.
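The sign indetermination of Subsection I.C.1 can be checked numerically: for a centrosymmetric pupil P, the phases φ and φ′(t) = −φ(−t) give strictly the same PSF |FT(P e^{jφ})|² [cf. Eq. (1)]. The grid size, pupil, and random phase below are arbitrary choices.

```python
import numpy as np

N = 64
y, x = np.indices((N, N))
P = (((x - N / 2) ** 2 + (y - N / 2) ** 2) <= (N / 4) ** 2).astype(float)  # even pupil

rng = np.random.default_rng(1)
phi = rng.standard_normal((N, N)) * P   # arbitrary phase over the pupil

def flip(a):
    """a(-t) on the grid: index n -> (-n) mod N on both axes."""
    return np.roll(a[::-1, ::-1], (1, 1), axis=(0, 1))

def psf(pupil, phase):
    return np.abs(np.fft.fft2(pupil * np.exp(1j * phase))) ** 2

phi_prime = -flip(phi)  # phi'(t) = -phi(-t): opposite even part, same odd part
print(np.max(np.abs(psf(P, phi) - psf(P, phi_prime))))
```

The printed maximum difference is at the level of floating-point round-off, whereas a generic change of the phase (e.g., rescaling it) alters the PSF.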
² The phases, in the case of a sequence of image pairs.
³ For simplicity, this criterion is stated for the case of two images per phase screen and one single phase screen; it is readily generalizable to more than two images and several phase screens.

II. APPLICATIONS OF PHASE DIVERSITY

The concept of phase diversity was first proposed by Gonsalves in 1982 as a WFS for adaptive optics. Since 1990, this method has been successfully used in several applications, including astronomy, space observation, and
Earth observation. Phase diversity has the particularity of providing an estimate of the unaberrated object as well as of the aberrations responsible for the blurring. This method directly uses image data for the estimation of the aberrations. It is thus sensitive to all aberrations degrading the quality of the imaging telescope, contrary to WFSs such as the Shack–Hartmann, which use a dedicated light path and thus are affected by noncommon-path aberrations. Furthermore, the optical hardware of this technique is simple. These are some of the reasons why phase diversity is becoming a widespread method both to compensate quasi-static optical aberrations and to obtain diffraction-limited imaging through turbulence.

A. Quasi-Static Aberration Correction of Optical Telescopes

Imperfections of an optical telescope can originate from design, fabrication of the optical system (e.g., polishing errors), misalignments (integration and launch), and thermomechanical stresses. These aberrations correspond to different ranges in terms of spatial frequencies, but all are slowly changing. Phase diversity using the image data from the science camera obviates the need for substantial auxiliary optics and thus is a strong candidate for the calibration of telescopes.

1. Monolithic-Aperture Telescope Calibration

Space-based telescopes

In the case of imaging space or Earth from space, the images are only perturbed by the imperfections of the optical system. A first practical application of the phase-diversity technique has been the determination of the Hubble Space Telescope aberrations (Roddier and Roddier, 1991, 1993; Fienup et al., 1993). In this case, the observed object was known (an unresolved star), so only the aberrations had to be estimated, resulting in a much easier problem, referred to as phase-diverse phase retrieval (Ellerbroek et al., 1997).
Using the ability of phase diversity to also work with extended objects (including those extending beyond the field of view), studies have been made for the calibration of telescopes imaging the Earth (Blanc et al., 2003b). The real-time correction of static optical aberrations has also been successfully demonstrated (Kendrick et al., 1998).

Ground-based telescopes

Phase diversity has also been used to calibrate space-imaging systems on Earth. The images obtained from Earth are mostly degraded by the deleterious effects of atmospheric turbulence but also by the static aberrations of the system. Aberrations induced by the atmosphere
are zero mean and quickly changing, unlike aberrations due to imperfections of the system. The calibration of the whole optical system, from the entrance pupil of the telescope to the focal plane, can be done by averaging a large number of aberration estimates corresponding to a series of short-exposure pairs of images of an astronomical object (Acton et al., 1996; Baba and Mutoh, 2001). How these estimates are obtained is explained in Subsection II.B.1. Calibration through the atmosphere has also been done, by Lee et al. (1997b), in the case where the optical instrument contains an AO system. In the latter reference, the diversity introduced in the images is unusual: no additional defocused image is required, and successive changes to the AO introduce the diversity. If only the AO and the camera (and not the telescope itself) are to be calibrated, the most effective procedure is to install an internal point source at the entrance of the AO system. The calibration of the noncommon-path aberrations of the European Very Large Telescope (VLT) AO system (called NAOS) and its camera (called CONICA) has recently been done this way; see Blanc et al. (2003a), Hartung et al. (2003), and Section VII for details. Phase diversity is also a practical tool for calibrating deformable mirrors (Löfdahl et al., 2000).

2. Cophasing of Multiaperture Telescopes

The resolution of a telescope is ultimately limited by its aperture diameter. The latter is limited by current technology to approximately 10 m for ground-based telescopes and to a few meters for space-based telescopes because of volume and mass considerations. Multiaperture telescopes (interferometers) have the potential to remove these limitations. In order to reach the diffraction-limited resolution, all subapertures must be precisely phased with respect to one another. This so-called cophasing of the interferometer can be performed by use of a phase-diversity sensor.
Indeed, for a multiaperture telescope as well as a single-aperture one, the image is the result of interferences between all aperture points. Thus, there is information in the image (whether focused or defocused) about the misalignments between subapertures, which are the specific aberrations of interferometry and can be described on each subaperture by the first three Zernike polynomials, called piston and tip-tilt [see Eq. (8) and Figure 2]. Section IX is dedicated to this relatively recent application of phase diversity.

B. Diffraction-Limited Imaging Through Turbulence

Phase diversity can be used to correct the phase errors due to atmospheric turbulence in two ways: it can be used as an a posteriori correction technique
(image restoration) or as a real-time WFS for adaptive optics. Note that the ability of phase diversity to recover both wave-front phase and amplitude has been demonstrated on simulated (Gonsalves, 1997) and experimental data (Jefferies et al., 2002).

1. A Posteriori Correction

For this application, the object is the parameter of interest. Image restoration by means of phase diversity can either correct all the aberrations degrading an imaging system without an AO, or it can be used after AO correction, as a second step, to correct for the residual aberrations. A special processing approach, called phase-diverse speckle, has been proposed to use phase diversity for imaging through the atmosphere (a technique that blends the speckle imaging and phase diversity concepts). Several short-exposure pairs of phase-diversity data (in and out of focus) are collected. This method has been applied to imaging through turbulence without AO, in particular to imaging satellites (Seldin et al., 1997; Thelen et al., 1999b) and the Sun.

Additionally, when the object being imaged through turbulence is very extended, the PSF is no longer space invariant in the field of view, and the problem of correcting for turbulence-induced blur thus becomes more complicated. Phase diversity can accommodate space-variant blur. Two methods have been investigated for solving this problem: correcting separately subfields that are smaller than the isoplanatic patch, which is the field of view in which the PSF can be considered space invariant (Löfdahl and Scharmer, 1994; Seldin and Paxman, 1994; Paxman et al., 1996), or using a tomographic phase reconstruction (Gonsalves, 1994; Acton et al., 1996; Thelen et al., 1999a, 2000; Paxman et al., 1994, 1998). With subfielding, a series of overlapping subframe reconstructions is combined to provide the entire corrected field of view.
In the other, more sophisticated approach, the volumetric nature of turbulence is taken into account by reconstructing the phase in several screens located at different altitudes.

Postcorrection by means of phase diversity is also useful for AO-corrected telescopes. A first reason is the existence of noncommon-path aberrations, either unseen because they are outside the AO loop or corrected by the AO loop while not in the science path. A second reason is that the AO correction is always partial (Roggemann, 1991; Conan, Madec and Rousset, 1994; Conan, 1994). Phase-diverse techniques have been successfully demonstrated for the postcorrection of binary stars in Seldin et al. (1996b), of satellites in Seldin et al. (1996a), and of the Sun in Löfdahl and Scharmer (2002).
2. Real-Time Wave-Front Correction

The correction of atmospheric turbulence can be done in real time by using AO systems. Phase diversity is potentially a good candidate for use as a real-time AO WFS for a number of reasons: it is very simple optically; it is easy to calibrate; and it directly relies on the image, so it corrects all aberrations degrading the images (no noncommon-path aberration). However, the computational time required on today's computers to obtain wave-front estimates with phase diversity is, for the moment, considerable compared to the evolution time of the turbulence (a few milliseconds), so that current AO systems generally use other (pupil-plane) sensors. Demonstrations of real-time correction have been obtained for very few corrected aberrations by Gates et al. (1994) and Kendrick et al. (1994a, 1998). Efforts have thus been made to speed up phase-diversity estimation: first by proposing better numerical algorithms (Vogel et al., 1998; Löfdahl et al., 1998a), then by modifying the error metric used to estimate the aberrations and object from the data (Kendrick et al., 1994b; Scharmer et al., 1999; Löfdahl and Scharmer, 2000). Phase-diversity sensors depend on an imaging model [Eq. (10)] involving convolutions, which are usually implemented with fast Fourier transforms (FFTs) and are thus computationally demanding. The goal of the new metrics is to reduce the number of computed FFTs. The use of these metrics for real-time correction has been demonstrated only for a few aberrations; even these new methods involve rapidly increasing computing time as the number of aberrations increases. Another difficulty of the phase-diversity WFS for this application is that it exhibits phase wrapping when the peak-to-valley phase variation is higher than 2π, which is often the case for turbulence-induced dynamic aberrations before closing the loop of the AO system.
This is due to the fact that this sensor is sensitive only to the phase modulo 2π, as can be seen from Eq. (1). Recent works provide some methods to alleviate this problem; see Section VIII for details.
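A minimal numerical illustration of this 2π ambiguity (the pupil, grid size, and phase below are arbitrary toy choices): adding any multiple of 2π to the pupil phase leaves the PSF of Eq. (1) unchanged, whereas a genuine change of the phase does not.

```python
import numpy as np

N = 32
y, x = np.indices((N, N))
P = (((x - N / 2) ** 2 + (y - N / 2) ** 2) <= (N / 4) ** 2).astype(float)
rng = np.random.default_rng(6)
phi = 3.0 * rng.standard_normal((N, N)) * P   # large-amplitude toy phase

def psf(phase):
    return np.abs(np.fft.fft2(P * np.exp(1j * phase))) ** 2

# Add a random multiple of 2*pi (in {-1, 0, 1}) at each pupil point.
wrap = 2 * np.pi * (rng.integers(0, 3, (N, N)) - 1) * P
print(np.allclose(psf(phi), psf(phi + wrap)))
```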
III. PHASE ESTIMATION METHODS

The main problem to address in the phase-diversity framework is to estimate the unknown quantities (the object o and/or the aberrated phase φ) from the data (focused and defocused images). The choice of a relevant estimator is thus essential. This section presents the conventional phase estimator of the phase-diversity literature. More precisely, it focuses on the estimation of the aberrated phase from a focal image i_1 and an additional defocused one i_2 obtained from a single-aperture telescope. The estimation methods presented here can be easily generalized to more than one phase screen (i.e., to the
phase-diverse speckle context). The estimation of the object is discussed in Section V, and the specificity of the estimation for a segmented-aperture telescope in Section IX.

A. Joint Estimator

1. Joint Criterion

The conventional processing scheme found in the literature is based on the joint estimation of the aberrations and of the observed object (Paxman et al., 1992). The Bayesian interpretation of such an approach is that it consists in computing the joint maximum a posteriori (JMAP) estimator:

    (ô, φ̂)_JMAP = arg max_{o,φ} p(i_1, i_2, o, φ; θ)
                 = arg max_{o,φ} p(i_1 | o, φ; θ_n) p(i_2 | o, φ; θ_n) p(o; θ_o) p(φ; θ_φ),   (15)

where p(i_1, i_2, o, φ; θ) is the joint probability density function of the data (i_1, i_2), the object o, and the aberrations φ. It may also depend on a set of hyperparameters θ = (θ_n, θ_o, θ_φ). The likelihood of the data i_k is denoted by p(i_k | o, φ; θ); p(o; θ_o) and p(φ; θ_φ) are the a priori probability density functions of o and φ.

The majority of the estimation structures used in the phase-diversity literature can be rewritten as Eq. (15), even if they were not originally introduced in a Bayesian framework. Gonsalves (1982) proposed a joint least-square approach for the estimation of the aberration parameters. A maximum likelihood estimation of the unknowns was later presented in Paxman et al. (1992) under Gaussian and Poisson noise models. The Gonsalves least-square approach is equivalent to the joint maximum likelihood (JML) approach presented in Paxman et al. (1992) under the Gaussian noise model. The JML, in turn, is obtained by setting p(o; θ_o) = p(φ; θ_φ) = 1 in the JMAP approach [Eq. (15)]. Bucci et al. (1999) introduce regularization under a deterministic approach. A first stochastic interpretation of joint estimation is presented in Vogel et al. (1998) and in Thelen et al. (1999b). By introducing regularization on the aberrations (p(φ; θ_φ) ≠ 1), they propose a so-called generalized maximum likelihood (GML) estimation.⁴ The use of statistical information on both the object and the aberrations leads to the JMAP estimator (Vogel et al., 1998; Blanc et al., 2003b) of Eq. (15).

⁴ The denomination GML comes from the statistics literature and refers to a JML criterion that is penalized by a regularization term on some of the unknowns only.
Noise

It is assumed here, for simplicity, that the noise is stationary white Gaussian with the same variance σ² for each image. The case of different variances has been presented in Löfdahl and Scharmer (1994). Hence, the likelihood p(i_k | o, φ; θ) reads:

    p(i_k | o, φ; θ) = 1/(2πσ²)^{N²/2} × exp[ −(1/2σ²) (i_k − H_k o)^t (i_k − H_k o) ],   k = 1, 2,   (16)
where N² is the number of pixels in the image and the hyperparameter vector θ_n reduces to σ². This stationary white Gaussian model is a reasonable approximation for a bright and extended object (Earth or solar observations).

Object prior probability distribution

Various methods have been proposed in the phase-diversity literature to introduce regularization on the object. Some authors (Löfdahl and Scharmer, 1994; Lee et al., 1997a) use a low-pass filter. Terminating the iterations of the minimization of the criterion before convergence is also a (somewhat ad hoc) regularization strategy (Seldin and Paxman, 1994; Thelen et al., 1999b). A quadratic regularization model has been proposed (Vogel et al., 1998; Bucci et al., 1999). We choose the latter method, which is easily interpretable in a Bayesian framework as a Gaussian prior probability distribution for the object. The general expression for such a prior is:

    p(o; θ_o) = 1/[(2π)^{N²/2} det(R_o)^{1/2}] × exp[ −(1/2) (o − o_m)^t R_o^{−1} (o − o_m) ],   (17)

where o_m is the mean object and R_o its covariance matrix.

Phase prior probability distribution

Concerning the aberrations, implicit regularization is achieved by expanding the phase on a finite linear combination of basis functions. Usually the aberrated phase is expanded on a finite set of Zernike polynomials (see Noll, 1976, and Section I.B.4):

    φ(r) = Σ_{k=4}^{k_max} a_k Z_k(r).   (18)

Note that coefficients a_1 to a_3 have not been introduced, as mentioned in Section I.B.4: the piston coefficient a_1 is the average phase and has no influence on the point spread function, and the tilt coefficients a_2 and a_3 introduce a shift of the image that is of no importance for extended objects. In the following, we denote by a = (a_4, ..., a_{k_max})^t the (k_max − 3)-dimensional vector gathering the aberration coefficients to be estimated. Additionally, in the case of imaging through turbulence, a statistical prior on the turbulent phase is available
according to the Kolmogorov model (Thelen et al., 1999b). It leads to a Gaussian prior probability distribution for the aberrations, with zero mean and a covariance matrix R_a given by Noll (1976):

    p(φ(a); θ_φ) = 1/[(2π)^{(k_max−3)/2} det(R_a)^{1/2}] × exp[ −(1/2) a^t R_a^{−1} a ].   (19)

The a priori information on the aberrations a is the covariance matrix R_a, so that in this case θ_φ = R_a. Note that in the particular case where the aberrations are only intrinsic (see Subsection I.B.2) and high-frequency (polishing) errors are negligible, a few Zernike coefficients are enough to describe all the aberrations, the regularization due to the truncated expansion of the phase is sufficient, and the a priori probability density function p(φ(a); θ) can be omitted (leading to a GML estimator).

Criterion

Under the above Gaussianity assumptions, we have:

    p(i_1, i_2, o, a; θ) = 1/[(2π)^{N²/2} σ^{N²}] × exp[ −(1/2σ²) (i_1 − H_1 o)^t (i_1 − H_1 o) ]
        × 1/[(2π)^{N²/2} σ^{N²}] × exp[ −(1/2σ²) (i_2 − H_2 o)^t (i_2 − H_2 o) ]
        × 1/[(2π)^{N²/2} det(R_o)^{1/2}] × exp[ −(1/2) (o − o_m)^t R_o^{−1} (o − o_m) ]
        × 1/[(2π)^{(k_max−3)/2} det(R_a)^{1/2}] × exp[ −(1/2) a^t R_a^{−1} a ].   (20)

The JMAP approach amounts to maximizing p(i_1, i_2, o, a; θ), which is equivalent to minimizing the criterion:

    L_JMAP(o, a, θ) = −ln p(i_1, i_2, o, a; θ)
        = N² ln σ² + (1/2) ln det(R_o) + (1/2) ln det(R_a)
        + (1/2σ²) (i_1 − H_1 o)^t (i_1 − H_1 o) + (1/2σ²) (i_2 − H_2 o)^t (i_2 − H_2 o)
        + (1/2) (o − o_m)^t R_o^{−1} (o − o_m) + (1/2) a^t R_a^{−1} a + A,   (21)

where A is a constant.
Expression of ô

Canceling the derivative of L_JMAP with respect to the object gives (Gonsalves, 1982; Paxman et al., 1992) a closed-form expression for the object ô(a, θ) that minimizes the criterion for given (a, θ):

    ô(a, θ) = R (H_1^t i_1 + H_2^t i_2 + σ² R_o^{−1} o_m),   (22)

where R = (H_1^t H_1 + H_2^t H_2 + σ² R_o^{−1})^{−1}. Substituting ô(a, θ) into the criterion of Eq. (21) yields a "new" criterion that does not explicitly depend on the object:

    L′_JMAP(a, θ) = L_JMAP(ô(a, θ), a, θ)
        = N² ln σ² + (1/2) ln det(R_o) + (1/2) ln det(R_a)
        + (1/2σ²) (i_1^t i_1 + i_2^t i_2)
        − (1/2σ²) (i_1^t H_1 + i_2^t H_2 + σ² o_m^t R_o^{−1}) R (H_1^t i_1 + H_2^t i_2 + σ² R_o^{−1} o_m)
        + (1/2) a^t R_a^{−1} a + A.   (23)

The dimension of the parameter space over which the minimization of this new criterion is performed is dramatically reduced compared to the minimization of the criterion L_JMAP(o, a, θ), since the N² object parameters have been eliminated (Paxman et al., 1992). Note that there is no such closed-form expression of ô with a Poisson noise model.

2. Circulant Approximation and Expression in the Fourier Domain

H_1 and H_2 correspond to convolution operators; thus they are Toeplitz-block-Toeplitz (TBT) matrices. If (o − o_m) can be assumed stationary, the covariance matrix R_o is also TBT. Such matrices can be approximated by circulant-block-circulant matrices, the approximation corresponding to a periodization (Hunt, 1973). Under this assumption, the covariance matrix R_o and the convolution matrices H_1 and H_2 are diagonalized by the discrete Fourier transform (DFT). We can write:

    R_o = F^{−1} diag[S_o] F,   H_1 = F^{−1} diag[h̃_1] F,   H_2 = F^{−1} diag[h̃_2] F,

where F is the 2D DFT matrix, diag[x] denotes a diagonal matrix having x on its diagonal, tilde (˜) denotes the 2D DFT, and S_o is the object power spectral
density model. The criterion L_JMAP and the closed-form expression ô(a, θ) can thus be written in the discrete Fourier domain, leading to a faster computation:

    L_JMAP(o, a, θ) = N² ln σ² + (1/2) Σ_v ln S_o(v) + (1/2) ln det(R_a)
        + (1/2σ²) Σ_v |ĩ_1(v) − h̃_1(v) õ(v)|² + (1/2σ²) Σ_v |ĩ_2(v) − h̃_2(v) õ(v)|²
        + Σ_v |õ(v) − õ_m(v)|² / (2 S_o(v)) + (1/2) a^t R_a^{−1} a + A   (24)

and

    õ̂(a, θ, v) = [ h̃_1*(a, v) ĩ_1(v) + h̃_2*(a, v) ĩ_2(v) + (σ²/S_o(v)) õ_m(v) ]
                 / [ |h̃_1(a, v)|² + |h̃_2(a, v)|² + σ²/S_o(v) ],   (25)

where v is the spatial frequency. The expression in Fourier space of the criterion L′_JMAP(a, θ) is:

    L′_JMAP(a, θ) = L_JMAP(ô(a, θ), a, θ)
        = N² ln σ² + (1/2) Σ_v ln S_o(v)
        + (1/2) Σ_v |ĩ_1(v) h̃_2(a, v) − ĩ_2(v) h̃_1(a, v)|² / { σ² [ |h̃_1(a, v)|² + |h̃_2(a, v)|² + σ²/S_o(v) ] }
        + (1/2) Σ_v [ |h̃_1(a, v) õ_m(v) − ĩ_1(v)|² + |h̃_2(a, v) õ_m(v) − ĩ_2(v)|² ] / { S_o(v) [ |h̃_1(a, v)|² + |h̃_2(a, v)|² + σ²/S_o(v) ] }
        + (1/2) ln det(R_a) + (1/2) a^t R_a^{−1} a + A.   (26)

Note that the objective function first proposed by Gonsalves (1982) is composed only of the first and third terms of criterion (26). L′_JMAP(a, θ) must be minimized with respect to the aberrations a. There is no closed-form expression for â, so the minimization is done using an iterative method (see Section VI for a description of minimization methods). Additionally, before minimizing the criterion, the value of the regularization parameters must be chosen.
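The closed-form object estimate of Eq. (25) can be checked against the criterion of Eq. (24): for fixed aberrations it is the exact minimizer, so any perturbation of it can only increase the criterion. The sketch below uses arbitrary toy OTFs and data, o_m = 0 for brevity, and the heuristic PSD model of Eq. (27) with (k, v_o, p) = (1, 0.05, 2); none of these values come from the text.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 32
sigma2 = 1e-2

# Stand-ins for the two OTFs h1_tilde, h2_tilde and the two image spectra.
h1 = np.fft.fft2(rng.standard_normal((N, N))) / N
h2 = np.fft.fft2(rng.standard_normal((N, N))) / N
i1 = np.fft.fft2(rng.standard_normal((N, N)))
i2 = np.fft.fft2(rng.standard_normal((N, N)))

fx = np.fft.fftfreq(N)
v = np.hypot(*np.meshgrid(fx, fx, indexing="ij"))
So = 1.0 / (0.05 ** 2 + v ** 2)          # S_o(v) = k / (vo^p + v^p)

def L_jmap(o):
    """Object-dependent part of Eq. (24), with om = 0."""
    return (np.sum(np.abs(i1 - h1 * o) ** 2) / (2 * sigma2)
            + np.sum(np.abs(i2 - h2 * o) ** 2) / (2 * sigma2)
            + np.sum(np.abs(o) ** 2 / (2 * So)))

# Eq. (25) with om = 0: a Wiener-like, frequency-by-frequency solution.
o_hat = ((np.conj(h1) * i1 + np.conj(h2) * i2)
         / (np.abs(h1) ** 2 + np.abs(h2) ** 2 + sigma2 / So))

print(L_jmap(o_hat), L_jmap(1.01 * o_hat))
```

The design point here is that the estimate decouples frequency by frequency, which is what makes the Fourier-domain computation fast.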
3. Tuning of the Hyperparameters

Noise

The noise model requires the tuning of the variance of the noise in the image, θ_n = {σ²}. It can be estimated using the total flux in the image and the previously calibrated electronic noise level of the camera.
Aberrations

In the case of atmospheric turbulence, the a priori information on the aberrations a is the covariance matrix R_a. Noll (1976) has shown that this matrix is completely defined by the ratio D/r_0, where D is the diameter of the telescope and r_0 is the Fried diameter (Fried, 1966b). The value of the latter can be obtained by a seeing monitor, for example.

Object

We choose the following model for S_o:

    S_o(v) ≜ E[ |õ(v) − õ_m(v)|² ] = k / (v_o^p + v^p),   (27)

where E stands for the mathematical expectation. This heuristic model and similar ones have been quite widely used (Kattnig and Primot, 1997; Conan et al., 1998). This model introduces four hyperparameters, θ_o = (k, v_o, p, o_m). The tuning of the object hyperparameters θ_o is not easy; their optimum values depend on the structure of the object. Thus, they must be estimated for each object. Unfortunately, in a joint estimation of the object and the aberrations, these hyperparameters θ_o cannot be jointly estimated with o and a. Indeed, the criterion of Eq. (21) degenerates when, for example, one seeks θ_o together with o and a: for the pair {θ̂_o = (k = 0, v_o, p, o_m), ô = o_m}, which does not depend on the data, the criterion tends to minus infinity. So before minimizing L_JMAP, these hyperparameters must be chosen empirically by the user or estimated using a sounder statistical device.

B. Marginal Estimator

In this method, the aberrations a and the hyperparameters (θ_n, θ_o) linked to the noise and the object are estimated first. Then, if the parameter of interest is the object, it can be restored in a second step using the estimated aberrations and hyperparameters (see Section V for a detailed explanation). The marginal estimator restores the sole aberrations by integrating the object out of the problem.⁵ It is a maximum a posteriori (MAP) estimator for a, obtained by integrating the joint probability density function:

    â_MAP = arg max_a p(i_1, i_2, a; θ) = arg max_a ∫ p(i_1, i_2, o, a; θ) do
          = arg max_a ∫ p(i_1 | a, o; θ) p(i_2 | a, o; θ) p(a; θ) p(o; θ) do.   (28)
Let I = (i_1^t, i_2^t)^t denote the vector that concatenates the data. As a linear combination of jointly Gaussian variables (o and n), I is a Gaussian vector.

⁵ In the vocabulary of probabilities, to integrate out (i.e., to marginalize) a quantity means to compute a marginal probability law by summing over all possible values of that quantity.
Maximizing p(i_1, i_2, a; θ) = p(I, a; θ) is thus equivalent to minimizing the following criterion:

    L_MAP(a, θ) = (1/2) ln det(R_I) + (1/2) (I − m_I)^t R_I^{−1} (I − m_I)
        + (1/2) ln det(R_a) + (1/2) a^t R_a^{−1} a + B,   (29)

where B is a constant, m_I = ((H_1 o_m)^t, (H_2 o_m)^t)^t, and R_I ≜ E[I I^t] − E[I] E[I]^t is the covariance matrix of I.

1. Expression of R_I^{−1}

The expression of R_I^{−1} is obtained by the block matrix inversion lemma (Gantmacher, 1966):

    R_I^{−1} = [ Q_11  Q_12 ; Q_21  Q_22 ]   (30)

with

    Q_11 = [ H_1 R_o H_1^t + σ² Id − H_1 R_o H_2^t (H_2 R_o H_2^t + σ² Id)^{−1} H_2 R_o H_1^t ]^{−1},
    Q_12 = −Q_11 H_1 R_o H_2^t (H_2 R_o H_2^t + σ² Id)^{−1},
    Q_21 = −(H_2 R_o H_2^t + σ² Id)^{−1} H_2 R_o H_1^t Q_11,
    Q_22 = [ H_2 R_o H_2^t + σ² Id − H_2 R_o H_1^t (H_1 R_o H_1^t + σ² Id)^{−1} H_1 R_o H_2^t ]^{−1}.   (31)
2. Determinant of R_I

Let Δ be a matrix that reads, in block form,

    Δ = [ A  B ; C  D ].

Its determinant is given by det(Δ) = det(A) det(D − C A^{−1} B). Using this formula, it is easy to calculate the determinant of R_I:

    det(R_I) = det( H_1 R_o H_1^t + σ² Id )
        × det( H_2 R_o H_2^t + σ² Id − H_2 R_o H_1^t (H_1 R_o H_1^t + σ² Id)^{−1} H_1 R_o H_2^t ).   (32)
3. Marginal Criterion Substituting the definition of mI and the expression of RI−1 of Eq. (30) into (I − mI )t RI−1 (I − mI ) yields the following expression: (I − mI )t RI−1 (I − mI ) = (i 1 − H1 om )t Q11 (i 1 − H1 om ) + (i 1 − H1 om )t Q12 (i 2 − H2 om ) + (i 2 − H2 om )t Q21 (i 1 − H1 om ) + (i 2 − H2 om )t Q22 (i 2 − H2 om ). Basic algebraic manipulations yield the following expression for the marginal criterion: LMAP (a, θ) 1 1 = ln det(RI ) + ln det(Ra ) 2 2 1 t + i 1 i 1 + i t2 i 2 2 2σ 1 t − i 1 H1 + i t2 H2 + σ 2 otm Ro−1 R H1t i 1 + H2t i 2 + σ 2 Ro−1 om 2 2σ 1 t −1 (33) + a Ra a + B, 2 where R = (H1t H1 + H2t H2 + σ 2 Ro−1 )−1 . 4. Relationship Between the Joint and the Marginal Criteria The comparison of the expression of the criterion LMAP [Eq. (33)] and of the criterion LJMAP [Eq. (23)] shows that the two criteria are related by the following relationship: 1 ln det(RI ) − N 2 ln σ 2 2 1 − ln det Ro + L JMAP (a, θ) + C, (34) 2 where C is a constant. If we focus only on the terms depending on the phase (i.e., suppose that the hyperparameters are known), relationship (34) can be summarized by (Goussard et al., 1990): LMAP (a, θ) =
L_MAP(a) = (1/2) ln det(R_I) + L_JMAP(a) + C′,                             (35)

where C′ is a constant. Thus, the difference between the marginal and the joint criterion consists of a single additional phase-dependent term, ln det(R_I). Although the two estimators differ by only this one term, it is shown in Section IV that their properties differ considerably.
A TECHNIQUE FOR WAVE-FRONT SENSING
5. Expression in the Fourier Domain

In practice, the marginal estimator is computed in the Fourier domain. Using the circulant approximations (see Subsection III.A.2) for H_1, H_2, and R_o, and noting that ln det(R_o) = Σ_v ln S_o(v), the term ln det(R_I) [Eq. (32)] can be expressed as follows:

ln det(R_I) = Σ_v ln S_o(v) + N² ln σ² + Σ_v ln(|h̃_1(a, v)|² + |h̃_2(a, v)|² + σ²/S_o(v)).   (36)

Combining Eqs. (26), (34), and (36) gives the marginal estimator in the Fourier domain:

L_MAP(a, θ) = Σ_v ln S_o(v) + N² ln σ² + Σ_v ln(|h̃_1(a, v)|² + |h̃_2(a, v)|² + σ²/S_o(v))
  + (1/2) Σ_v |ı̃_1(v) h̃_2(a, v) − ı̃_2(v) h̃_1(a, v)|² / [σ² (|h̃_1(a, v)|² + |h̃_2(a, v)|² + σ²/S_o(v))]
  + (1/2) Σ_v (|h̃_1(a, v) õ_m(v) − ı̃_1(v)|² + |h̃_2(a, v) õ_m(v) − ı̃_2(v)|²) / [S_o(v) (|h̃_1(a, v)|² + |h̃_2(a, v)|² + σ²/S_o(v))]
  + (1/2) ln det(R_a) + (1/2) a^t R_a^{-1} a + B.                          (37)

Let us now see how the hyperparameters can be estimated.
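Because Eq. (37) is a per-frequency sum, it is cheap to evaluate once the OTFs are tabulated. The sketch below (NumPy) computes the phase-dependent part of the criterion with the mean object õ_m set to zero, as in the simulations of Section IV; the a-dependent regularization term of the last line is omitted, and all variable names (h1t, h2t for the OTFs h̃_1, h̃_2; i1t, i2t for the image spectra) are illustrative:

```python
import numpy as np

def marginal_criterion(h1t, h2t, i1t, i2t, So, sig2):
    """Phase-dependent part of the Fourier-domain marginal criterion
    (mean object set to zero, aberration regularization omitted)."""
    mod2 = np.abs(h1t) ** 2 + np.abs(h2t) ** 2 + sig2 / So
    log_terms = np.sum(np.log(So)) + So.size * np.log(sig2) + np.sum(np.log(mod2))
    data_term = 0.5 * np.sum(np.abs(i1t * h2t - i2t * h1t) ** 2 / (sig2 * mod2))
    return log_terms + data_term

# Toy inputs on an 8 x 8 frequency grid.
rng = np.random.default_rng(1)
shape = (8, 8)
h1t = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
h2t = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
i1t = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
i2t = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
So = np.full(shape, 2.0)      # flat object PSD, purely for illustration
L = marginal_criterion(h1t, h2t, i1t, i2t, So, sig2=0.1)
```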
6. Unsupervised Estimation of the Hyperparameters

For the marginal estimator, the estimation of θ_o = (k, v_o, p, o_m) and θ_n = σ² can be tackled jointly with the aberrations, according to:

(â, θ̂_o, θ̂_n) = arg max_{a, θ_o, θ_n} p(i_1, i_2, a; θ).                  (38)

The criterion L_MAP(a) of Eq. (29) then becomes L_MAP(a, σ², k, v_o, p, o_m, θ_a). It must be minimized with respect to the aberrations a and the five hyperparameters (σ², k, v_o, p, o_m). If we adopt the change of variable μ = σ²/k, the cancellation of the derivative of the criterion with respect to k
gives a closed-form expression k̂(a, μ, v_o, p, o_m, θ_a), which minimizes the criterion for given values of the other parameters. Injecting k̂ into L_MAP yields L_MAP(a, μ, v_o, p, o_m, θ_a). There is no closed-form expression for μ̂, v̂_o, p̂, and ô_m, but it is easy to calculate the analytical expression of the gradients of the criterion with respect to these hyperparameters and then to use numerical methods for the minimization of the criterion.
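The closed-form cancellation in k can be illustrated on a generic scale-dependent criterion of the form L(k) = N² ln k + S/k (S gathering the k-independent data terms); setting dL/dk = N²/k − S/k² = 0 gives k̂ = S/N², which is then substituted back before the remaining hyperparameters are handled numerically. This criterion form is a toy stand-in, not the full Eq. (37):

```python
import numpy as np

def L(k, N2, S):
    # Generic scale-dependent criterion: N^2 ln k + S / k.
    return N2 * np.log(k) + S / k

N2, S = 64.0, 256.0
k_hat = S / N2        # closed-form minimizer from dL/dk = N^2/k - S/k^2 = 0

# Crude numerical confirmation: k_hat beats all nearby values on a fine grid.
ks = np.linspace(0.5 * k_hat, 2.0 * k_hat, 1001)
assert abs(ks[np.argmin(L(ks, N2, S))] - k_hat) < 0.01 * k_hat
```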
C. Extended Objects

The problem of edge effects must be addressed in order to process extended scenes (Earth or solar observations). This problem arises because the joint and marginal criteria are expressed in the Fourier domain thanks to an approximation: the convolutions are performed with FFTs. This introduces a periodization that produces severe wraparound effects on extended objects. To solve this problem, two solutions have been proposed in the phase-diversity literature: the apodization technique (Löfdahl and Scharmer, 1994) and the guard-band technique (Seldin and Paxman, 1994).

1. Apodization

To reduce the computing time required to minimize them, the joint and marginal criteria are computed in Fourier space and thus do not take into account the effects of image boundaries. To reduce the edge effects while still computing ô(a) with FFTs, the images can be apodized. Apodization of the entire data set by a Hanning window was first used by Paxman and Crippen (1990) but led to poor results. Löfdahl and Scharmer (1994) suggested apodizing only the edges of the images by a modified Hanning window (a 1D Hanning window and a modified Hanning window are compared in Figure 3). In this technique, the summation in the joint criterion expression [Eq. (24), line 2] is computed in image space instead of Fourier space (by Parseval's theorem), which makes it possible to keep in the summation only the data that have not been apodized (i.e., those for which the apodization function is unity). Note that this method can easily be adapted to the marginal criterion. This type of apodization has already been used in speckle techniques (von der Lühe, 1993) and works well with phase-diversity data. The advantage of this technique is that it provides fast computation of the criterion. Its disadvantage, apart from the fact that it is approximate, is that part of the data is apodized and is not used in the estimation.
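A 1D modified Hanning window of the kind shown in Figure 3 can be built by keeping the window equal to unity over a central plateau and applying the cosine taper only over a transition zone of m pixels at each edge. The sketch below is one plausible realization consistent with that description (the exact window used by Löfdahl and Scharmer may differ in detail):

```python
import numpy as np

def modified_hanning(n, m):
    """Window of length n: cosine (Hanning-type) taper over the first and
    last m samples, unity on the central plateau."""
    w = np.ones(n)
    ramp = 0.5 * (1.0 - np.cos(np.pi * np.arange(m) / m))  # rises 0 -> ~1 over m samples
    w[:m] = ramp
    w[-m:] = ramp[::-1]
    return w

def apodize(image, m):
    """Separable 2D edge apodization of an image."""
    wy = modified_hanning(image.shape[0], m)
    wx = modified_hanning(image.shape[1], m)
    return image * np.outer(wy, wx)

img = np.ones((32, 32))
ap = apodize(img, m=8)
# Central pixels are untouched: only data in the taper zone are modified,
# so the criterion summation can be restricted to the unapodized region.
assert np.allclose(ap[8:-8, 8:-8], 1.0)
```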
FIGURE 3. A 1D Hanning window (dotted line) and a 1D modified Hanning window (solid line) for comparison.
2. Guard Band

Another way of tackling the edge effects has been proposed for the joint estimator by Seldin and Paxman (1994) and is given below for both estimators.

Joint estimator. The technique consists first in acknowledging that object pixels beyond the field of view of the image do influence the data, because of the convolution operator involved in the image formation [see Eq. (10)], and in estimating the object value on the guard-band pixels (i.e., pixels beyond the effective field of view) as well. The criterion is minimized numerically with respect to the aberrations and the object (there is no fast solution for computing ô). The guard-band width depends on the severity of the aberrations (i.e., on the effective PSF support width). Second, in practice, the object and PSF 2D arrays are embedded in arrays larger than the sum of their supports in order to compute h ∗ o exactly by means of FFTs.

Marginal estimator. To apply the guard-band technique to the marginal estimator L_MAP(a, θ), a new algorithm is used, called the "alternating" marginal estimator and denoted L^alt_MAP(o, a, θ). The relationship between the joint estimator and the marginal one [see Eq. (34)] can be summarized by L_MAP(a, θ) = L_JMAP(a, θ) + ε(a, θ). The alternating marginal criterion is then defined by L^alt_MAP(o, a, θ) = L_JMAP(o, a, θ) + ε(a, θ), and:

arg min_{o,a,θ} L^alt_MAP(o, a, θ) = arg min_{a,θ} [min_o L^alt_MAP(o, a, θ)]
  = arg min_{a,θ} [min_o L_JMAP(o, a, θ) + ε(a, θ)]
  = arg min_{a,θ} [L_JMAP(a, θ) + ε(a, θ)]
  = arg min_{a,θ} L_MAP(a, θ).                                             (39)
The minimization of L^alt_MAP(o, a, θ) with respect to o, a, and θ is therefore equivalent to the minimization of L_MAP(a, θ) with respect to a and θ alone. The guard band can then be applied to the criterion L^alt_MAP(o, a, θ). In the guard-band technique the measured data are left unperturbed, but the disadvantage of this method is its extensive computation time (due to the iterative estimation of the object).
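The "immersion" step of the guard-band technique — embedding the object and PSF arrays in arrays at least as large as the sum of their supports, so that the circular FFT convolution coincides with the exact linear convolution — can be sketched as follows (NumPy; sizes are illustrative):

```python
import numpy as np

def linear_convolve_fft(obj, psf):
    """Exact (aperiodic) convolution of obj and psf via zero-padded FFTs."""
    # Pad to the full linear-convolution support so periodization is harmless.
    sy = obj.shape[0] + psf.shape[0] - 1
    sx = obj.shape[1] + psf.shape[1] - 1
    O = np.fft.fft2(obj, s=(sy, sx))
    H = np.fft.fft2(psf, s=(sy, sx))
    return np.real(np.fft.ifft2(O * H))

rng = np.random.default_rng(2)
obj = rng.random((16, 16))
psf = rng.random((5, 5))
out = linear_convolve_fft(obj, psf)

# Without padding, the FFT product computes a circular convolution and
# wraps the PSF tails around the edges:
circ = np.real(np.fft.ifft2(np.fft.fft2(obj) * np.fft.fft2(psf, s=obj.shape)))
assert out.shape == (20, 20)
assert not np.allclose(out[:16, :16], circ)
```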
IV. PROPERTIES OF THE PHASE ESTIMATION METHODS

This section studies, by means of simulations, the properties of the two phase estimation methods presented in the previous section. Their asymptotic properties, the influence of the hyperparameters on the quality of the estimated phase, and finally their respective performance are compared.

A. Image Simulation

The simulations have been obtained in the following way: our object is an Earth view. The aberrations are due to the imperfections of the optical system. The phase is a linear combination of the first 21 Zernike polynomials, with the coefficients listed in Table 1; the estimated phase is expanded on the same polynomials. The defocus amplitude for the second observation plane is 2π radians, peak to valley. The simulated images are monochromatic and are sampled at the Shannon rate. They have been obtained by convolution between the PSF and the object, computed in the Fourier domain using FFTs. The result is corrupted by stationary white Gaussian noise (Figure 4).

TABLE 1
VALUES OF THE COEFFICIENTS USED FOR SIMULATIONS

Coefficient    a4     a5     a6     a7     a8     a9     a10    a11    a12
Value (rad)   −0.2    0.3   −0.45   0.4    0.3   −0.25   0.35   0.2    0.1

Coefficient    a13    a14    a15    a16    a17    a18    a19    a20    a21
Value (rad)    0.05  −0.05   0.05   0.02   0.01  −0.01  −0.02   0.01   0.01

The images generated in this way are periodic. This is an artificial situation, under
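The simulation chain described above (pupil phase → PSF → FFT-based convolution → additive white Gaussian noise) can be sketched as follows. The simple circular pupil and the quadratic defocus phase (≈ 2π peak to valley over the pupil) are illustrative stand-ins for the Zernike expansion of Table 1:

```python
import numpy as np

def psf_from_phase(pupil, phase):
    """Incoherent PSF = |FT^{-1}(P exp(j phi))|^2, normalized to unit sum."""
    amp = pupil * np.exp(1j * phase)
    psf = np.abs(np.fft.ifft2(amp)) ** 2
    return psf / psf.sum()

def simulate_image(obj, psf, noise_std, rng):
    """Periodic (circulant) image model: i = h * o + n, convolution by FFTs."""
    img = np.real(np.fft.ifft2(np.fft.fft2(obj) * np.fft.fft2(psf)))
    return img + rng.normal(0.0, noise_std, obj.shape)

n = 64
y, x = np.mgrid[-n // 2:n // 2, -n // 2:n // 2]
r2 = (x ** 2 + y ** 2) / (n // 4) ** 2
pupil = (r2 <= 1.0).astype(float)            # circular pupil (illustrative)
defocus = np.pi * (2.0 * r2 - 1.0) * pupil   # quadratic phase, ~2 pi peak to valley

rng = np.random.default_rng(3)
obj = rng.random((n, n))                     # stand-in for the Earth-view object
i_focused = simulate_image(obj, psf_from_phase(pupil, np.zeros_like(pupil)), 0.01, rng)
i_defocus = simulate_image(obj, psf_from_phase(pupil, defocus), 0.01, rng)
```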
FIGURE 4. Aberrated phase (a) of RMS value λ/7 and true object (b) used for the simulation. Simulated focused (c) and defocused (d) images.
which R_o is truly block circulant with circulant blocks. The fact that the images are periodic allows estimation of the phase without the additional computing cost of the guard-band technique.

B. Asymptotic Properties of the Two Estimators for Known Hyperparameters

For the time being, we consider that the hyperparameters are the "true" ones: we fit the PSD of the object using the true object, and we assume that σ² is
known (note that the mean object is set to zero). Additionally, as in all the following, no regularization on the aberrations is introduced, save the fact that only the Zernike coefficients a_4 to a_21 are estimated. Consequently, the marginal estimation (which was based on a MAP approach) now corresponds to a maximum likelihood (ML) estimation. Similarly, the JMAP approach corresponds to a GML estimation because the phase is not regularized (see Subsection III.A.1). The minimized criteria will accordingly be denoted L_ML and L_GML, respectively.

Figure 5 shows the bias, standard deviation, and root mean square error (RMSE) of the phase provided by the joint method (left) and the marginal method (right) as a function of the noise level, for three image sizes (128 × 128, 64 × 64, and 32 × 32 pixels). These three quantities are defined as:

• the empirical bias, b = [Σ_{k=4}^{21} (⟨â_k⟩ − a_k^true)²]^{1/2};
• the empirical standard deviation, σ = [Σ_{k=4}^{21} ⟨(â_k − ⟨â_k⟩)²⟩]^{1/2};
• the empirical RMSE, e = (b² + σ²)^{1/2},

where ⟨·⟩ denotes the empirical average over 50 different noise realizations. Furthermore, for the image sizes of 32 × 32 and 64 × 64 pixels, the quantities are additionally averaged over all the subimages of 32 × 32 (respectively 64 × 64) pixels contained in the 128 × 128 pixel image.

For joint estimation, the bias increases with the noise level. Furthermore, processing a larger number of data is not favorable in terms of bias. On the contrary, the standard deviation of the phase estimate is a decreasing function of the image size. Finally, the RMSE, which is dominated by the bias term, does not decrease as the number of data increases. This pathological behavior is consistent with several statistical studies (Champagnat and Idier, 1995; Little and Rubin, 1983): the estimate does not converge toward the true value as the size of the data set tends to infinity.
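Given a stack of estimates â over noise realizations, the three empirical quantities above can be computed as follows (NumPy; axis 0 indexes the realizations, axis 1 the Zernike coefficients a_4, ..., a_21 — the toy data are synthetic, not the simulation results of Figure 5):

```python
import numpy as np

def bias_std_rmse(a_hat, a_true):
    """a_hat: (n_realizations, n_modes) estimates; a_true: (n_modes,)."""
    mean_hat = a_hat.mean(axis=0)
    b = np.sqrt(np.sum((mean_hat - a_true) ** 2))      # empirical bias
    s = np.sqrt(np.sum(a_hat.var(axis=0)))             # empirical standard deviation
    e = np.sqrt(b ** 2 + s ** 2)                       # empirical RMSE
    return b, s, e

rng = np.random.default_rng(4)
a_true = rng.standard_normal(18)                       # 18 modes: a4 ... a21
# Biased, noisy estimates over 50 noise realizations (synthetic toy data).
a_hat = a_true + 0.1 + 0.05 * rng.standard_normal((50, 18))
b, s, e = bias_std_rmse(a_hat, a_true)

# With these definitions, e coincides exactly with the empirical RMS error.
emp = np.sqrt(np.mean(np.sum((a_hat - a_true) ** 2, axis=1)))
assert np.allclose(e, emp)
```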
An intuitive explanation of this phenomenon is that, if a larger image is used to estimate the aberrations, the size of the object, which is jointly reconstructed, also increases, so that the ratio of the number of unknowns to the number of data does not tend toward zero. On the contrary, for marginal estimation, this ratio tends toward zero because the number of unknowns stays the same regardless of the size of the data set. In this case, the bias, standard deviation, and RMSE of the phase decrease when the number of data increases (see Figure 5, right). Indeed, under broad conditions, the marginal estimator is expected to converge, since it is a true ML estimator (Lehmann, 1983; De Carvalho and Slock, 1997).

The curve of the joint estimation standard deviation presents an irregularity for the noise level of 14% and an image size of 128 × 128 pixels. This surprising result can be interpreted by looking at the different phase estimates obtained in this condition, which are shown in Figure 6. The minimization of the joint criterion leads to two different sets of aberration coefficients. This
FIGURE 5. Bias, standard deviation, and RMSE of phase estimates as a function of noise level given in percent (the ratio between the noise standard deviation and the mean flux per pixel). Left figures are for the joint estimator, right figures for the marginal one. The solid, dashed, and dotted lines, respectively, correspond to images of dimensions 128 × 128, 64 × 64, and 32 × 32 pixels. All estimates were obtained as empirical averages over 50 independent noise realizations.
explains why the standard deviation reaches a much larger value for this simulation condition. It also shows that the joint criterion presents local minima. Furthermore, we have empirically checked that these local minima appear regardless of the size of the data set, whereas no such minima are observed with the marginal criterion (see Figure 5, right). Indeed, the
FIGURE 6. The different aberration estimates obtained with the joint method for the image size of 128 × 128 pixels and a noise level of 14%. Dashed line: true aberrations; solid line: aberration estimates.
marginal criterion tends to become more and more regular asymptotically. The latter observation agrees with the asymptotic Gaussianity of the likelihood, which is expected under suitable statistical conditions.

C. Joint Estimation: Influence of the Hyperparameters

An important problem for the estimation of the aberrations (and the object) is the tuning of the hyperparameters. For the joint estimator, we have pointed out that they must be adjusted by hand. Particularly important is the global hyperparameter, which we shall denote by μ; it quantifies the trade-off between goodness of fit to the data and fidelity to the prior.⁶ Let us study its influence on the joint method. Figure 7 shows the RMSE of the phase estimates and of the object estimate as a function of the value of this hyperparameter (its true value is μ = 1). The RMSE on the object is defined as [Σ_r (ô(r) − o^true(r))²]^{1/2} / [Σ_r ô(r)²]^{1/2}. We see that the best value of this hyperparameter (i.e., the one that gives the lowest error on the estimate) is not the same for the object and for the phase. This means that the object and the phase cannot be jointly optimally restored. Note that the optimal hyperparameter value for the object coincides with the true value μ = 1. If the parameter of interest is the phase, the object must be underregularized

⁶ For the Gaussian prior used in this work, tuning μ is equivalent to tuning a scale factor in the object PSD S_o.
FIGURE 7. Plots of RMSE for joint phase estimates (dashed line; see right vertical axis) and joint object estimate (solid line; see left vertical axis) as a function of the value of the hyperparameter μ, for an image size of 32 × 32 pixels. Panel (a) corresponds to a noise level of 14%; panel (b) to 4%.
to have a better estimation of the aberrations. The behavior of the RMSE on the phase strongly depends on the noise level. For a high noise level (14% here), there is a well-marked optimal basin. However, for lower noise levels (4% and below), any value under 1, including a near null regularization ("near null regularization" means that the parameter μ is not set to zero but to a small arbitrary constant, 10⁻¹⁶ in our case, to avoid numerical problems due to computer precision), is almost optimal with respect to the estimation of the aberrations, even though the jointly estimated object is then of very poor quality. This observation sheds some light on the fact that, when the parameters of interest are the aberrations and the noise level is low, estimation without object regularization can be used successfully, as the literature testifies (Thelen et al., 1999b; Meynadier et al., 1999; Seldin and Paxman, 2000; Carrara et al.,
FIGURE 8. Performance of the joint estimator: RMSE of the phase estimates as a function of noise level for a near null regularization, for three image sizes.
2000). This empirical observation has also led us to study the asymptotic behavior of the joint estimator with near null regularization. Figure 8 shows the results: in this case, when the number of data increases, the RMSE of the aberration estimates decreases. Although the ratio of the number of data samples to the number of unknowns is the same as in the estimation with the true hyperparameters, the estimator behaves as if the object were not being estimated. This surprising behavior of joint aberration estimates when the regularization parameter μ vanishes has recently been explained in Idier et al. (2005); that study has shown that the GML is a consistent phase estimator (i.e., it converges toward the true value as the number of data increases).
D. Marginal Estimation: Unsupervised Estimation

For the marginal estimator, it must be shown that the unsupervised estimation of the hyperparameters (i.e., when the hyperparameters are estimated jointly with the aberrations) yields good aberration estimates. To this end, we compare the quality of the aberration reconstructions obtained either by minimizing L_ML(a) with the true hyperparameters or by minimizing L_ML(a, μ, v_o, p), for several image sizes. As shown in Figure 9, for low noise levels the unsupervised restoration is very good (the maximum difference is less than 5%). For 128 × 128 pixels, it remains quite good (the maximum difference is less than 15%) at any noise level. Only for 32 × 32 pixels and high noise levels is the reconstruction seriously degraded, because of the lack of information contained in the noisy data.
FIGURE 9. Performance of the marginal estimator with the true hyperparameters (plus signs) and with an unsupervised estimation (diamonds), as measured by the RMSE of the phase estimates as a function of the noise level.
E. Performance Comparison

We now compare the performance of the joint and marginal estimators for phase estimation. In order to compare them in a realistic way, we use the joint estimator with a near null regularization (which provides good results for the estimation of the aberrations, as seen in Subsection IV.C) and the unsupervised marginal estimator described in Subsection III.B.6. We compare the RMSE of the phase estimates as a function of noise level for two image sizes (32 × 32 and 128 × 128 pixels). The results for the two estimators are plotted in Figure 10. Two different domains appear: when the signal-to-noise ratio (SNR) is high (noise level < 5%), the two estimators give approximately the same results. At lower SNR (5% < noise level < 20%), marginal estimation is significantly better. Note that this result has been checked on experimental data.

The performance comparison depends on the studied object. A comprehensive comparison of the two methods has been done in Blanc (2002). This study has shown that the marginal method leads to better phase estimates for high noise levels. For low noise levels, the joint estimator sometimes performs slightly better than the marginal one; the significance of the difference between the two phase estimates depends on the observed object. These performance differences between observed objects are probably due to the Gaussian hypothesis on the object used in the marginal approach (see Subsection III.B).
FIGURE 10. RMSE of the aberration estimates for the unsupervised marginal estimator (diamonds) and for the joint one with near null regularization (plus signs), as a function of the noise level.
F. Conclusion

The last two sections have presented two methods for estimating the aberrations. The conventional estimator found in the literature, which is interpretable as a joint maximum a posteriori approach, is based on a joint estimation of the aberrated phase and the observed object. It has been shown, by means of simulations, that it has poor asymptotic properties unless it is underregularized, and that it does not allow an optimal joint estimation of the object and the aberrated phase. The joint estimator without object regularization can nevertheless be used successfully to estimate the aberrations, thanks to the fact that the GML is a consistent phase estimator.

The marginal estimator, which estimates the phase alone by maximum a posteriori, is obtained by integrating the observed object out of the problem. This drastically reduces the number of unknowns, allows the unsupervised estimation of the regularization parameters, and provides better asymptotic properties than the joint approach. Finally, the comparison of the quality of the phase restorations has shown that the marginal method leads to better phase estimates for high noise levels, and that the two estimators yield quite similar performance (sometimes slightly better for the joint one, depending on the observed object) for low noise levels.
V. RESTORATION OF THE OBJECT

When phase diversity is used as a WFS, the estimation of the aberrations is the unique goal. On the contrary, in the case of image restoration, the object is
FIGURE 11. Joint estimation: (a) true object; (b) and (c) object restored by the joint method (with near null regularization). The noise level is 1% for (b); in this case, the RMSE is 115%. Restoration (c) corresponds to a noise level of 14%; the RMSE is 1500%.
the parameter of interest. This section shows how the object can be estimated by the joint and by the marginal approaches.

A. With the Joint Method

In the case of the joint method, the object is estimated jointly with the aberrations by JMAP. Section III.A.3 has shown that the hyperparameters of the object must be adjusted empirically by the user, which is not easy, especially with an extended object. Furthermore, we have seen in Subsection IV.C that the joint method does not allow an optimal joint estimation of the object and of the aberrations; in particular, a near null regularization is favorable for a good estimation of the aberrations. Let us now examine the object obtained with no regularization. Figure 11 shows the results of the restoration of the object for two noise levels (1% and 14%) and an image size of 64 × 64 pixels (the first image is the true object). The image simulations are obtained in the same conditions as in Section IV.A. As shown, when there is no regularization on the object, at the minimum of the joint criterion the object estimate is completely buried in noise. In order to obtain a good object estimate, the object estimation needs to be regularized, which has been done either by incorporating an explicit regularization term into the criterion (see Section III.A.1) or by interrupting the iterative minimization before convergence (Strand, 1974; Thelen et al., 1999b). Yet, as mentioned above, this is prejudicial to the phase estimation and thus does not lead to the best object estimate. Additionally, it raises the problem of hyperparameter tuning.
B. With the Marginal Method

1. Principle

The marginal estimator has been obtained by integrating the observed object out of the problem. It is based on a MAP approach and restores the aberrations alone. The previous section has shown that this estimator has good asymptotic properties and allows the unsupervised estimation of the noise variance and of the regularization parameters of the object.

Furthermore, the marginal method provides a simple way to restore the object. The idea is to calculate ô once the aberrations â_marg and the hyperparameters θ̂_marg have been estimated by the marginal method. In particular, this estimation can be done by MAP:

ô_MAP(â_marg, θ̂_marg) = arg max_o f(o | i_1, i_2, â_marg; θ̂_marg)
  = arg max_o f(i_1, i_2, â_marg | o; θ̂_marg) f(o; θ̂_marg).               (40)

Maximizing f(i_1, i_2, â_marg | o; θ̂_marg) f(o; θ̂_marg) is thus equivalent to minimizing the following criterion:

L_MAP(o) ∝ ‖i_1 − H_1(â_marg) o‖²/σ² + ‖i_2 − H_2(â_marg) o‖²/σ² + (o − o_m)^t R_o^{-1} (o − o_m).   (41)
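Under the circulant approximations, the minimizer of this criterion is diagonal in the Fourier domain. A sketch of the resulting two-frame Wiener filter, with the mean object o_m set to zero for brevity (the function name and test setup are illustrative):

```python
import numpy as np

def wiener_biframe(i1, i2, h1t, h2t, So, sig2):
    """MAP object estimate from two frames, in the Fourier domain (o_m = 0):
    o~ = (conj(h1) i1~ + conj(h2) i2~) / (|h1|^2 + |h2|^2 + sig2 / So)."""
    i1t = np.fft.fft2(i1)
    i2t = np.fft.fft2(i2)
    num = np.conj(h1t) * i1t + np.conj(h2t) * i2t
    den = np.abs(h1t) ** 2 + np.abs(h2t) ** 2 + sig2 / So
    return np.real(np.fft.ifft2(num / den))

# Sanity check: with h1 the identity OTF, h2 = 0, and vanishing
# regularization (sig2 -> 0), the filter returns the object itself.
obj = np.random.default_rng(5).random((16, 16))
o_hat = wiener_biframe(obj, np.zeros_like(obj), np.ones_like(obj),
                       np.zeros_like(obj), So=np.ones_like(obj), sig2=1e-12)
assert np.allclose(o_hat, obj)
```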
The cancellation of the derivative of this criterion with respect to o gives the following closed-form expression:

ô_MAP(â_marg, θ̂_marg) = R^{-1} (H_1^t i_1 + H_2^t i_2 + σ² R_o^{-1} o_m),   (42)

where R = H_1^t H_1 + H_2^t H_2 + σ² R_o^{-1}. The object ô_MAP is the bi-frame Wiener filter associated with the aberration and hyperparameter estimates â_marg and θ̂_marg. Note that the restoration is given by a Wiener filter, as with the joint estimator; but the two methods differ in their tuning of the regularization parameters and in their estimation of the aberrations.

In the case of a compact object, the criterion L_MAP(o) and the closed-form expression of ô_MAP can be written in the Fourier domain. For objects more extended than the image, the computation of the criterion requires the introduction of a guard band and the minimization is done iteratively (see Section III.C.2 for details).

To summarize, the object is restored in two steps: first, the phase and the hyperparameters linked to the noise and the object (θ_n, θ_o) are estimated; then the object is restored by a MAP approach. This approach benefits from
FIGURE 12. Marginal estimation: (a) true object; (b) and (c) object restored with the marginal method. The noise level is 1% for (b); in this case, the RMSE is 22%. Restoration (c) corresponds to a noise level of 14%; the RMSE is 38%.
the good properties of this estimator: ô_MAP(â_marg, θ̂_marg) converges toward ô_MAP(a_true, θ_true) as the size of the data set increases. Additionally, although the restoration of the object is formally done in a second step, in practice it is performed at each computation of the marginal criterion [see Eqs. (34) and (25)], so it is available at convergence.

2. Results

Figure 12 presents the restoration of the object obtained by this approach (same conditions as for the joint restoration). The RMSE on the object is 22% for a noise level of 1% and 38% for a noise level of 14%. The quality of the object estimates provided by this method is satisfactory. Thus, this method provides a robust and easy (although indirect) MAP estimation of the object.

3. Influence of the Hyperparameters

Section IV.C has shown that the JMAP estimation of the object and the phase does not allow an optimal reconstruction of the two quantities. We now study the influence of the global hyperparameter on the RMSE of the marginal estimates. By contrast with the joint estimator, Figure 13 indicates that here there is a unique optimal hyperparameter for both the object and the aberrations. We have drawn the curves only for the 4% noise level because the general behavior of this estimator is the same for all noise levels. The marginal method is thus able to optimally restore both the aberrations and the object.
FIGURE 13. Plots of the RMSE for the marginal phase estimate (dashed line; see right vertical axis) and for the marginal object estimate (solid line; see left vertical axis) as a function of the value of the hyperparameter μ, for an image size of 32 × 32 pixels and a noise level of 4%.
C. With a "Hybrid" Method

1. Principle

For low noise levels, the joint method with no regularization on the object provides a very degraded object estimate but a good restoration of the aberrations (sometimes even better than the marginal method; see Section IV.E). Additionally, the estimation of the aberrations by the joint method requires less computation time than by the marginal method, because the latter requires hyperparameter estimation. We thus propose to use the joint method to estimate the aberrations, and then to estimate only the hyperparameters, for these fixed aberrations, by the marginal method. This step is fast since only a few parameters remain to be estimated. At this point the conditions are the same as for marginal estimation: the aberrations and the hyperparameters are known. The object is then estimated by a MAP approach. This method is called the "hybrid" method and consists of the three steps detailed below.

2. The Three Steps

• Estimation of the aberrations. The estimate of the aberrations is obtained jointly with the object by a GML approach, that is, without regularization of the object (see Sections III.A.1 and IV.B):

(ô, â)_GML = arg max_{o,a} f(i_1, i_2 | a; o, θ) f(a; θ).                   (43)

Only the aberrations â_GML are kept from this estimation.
• Estimation of the hyperparameters. The estimation of the hyperparameters relative to the noise and to the object is done by an ML approach. The joint probability density function of the object o, the images i_1 and i_2, and the aberrations a is marginalized with respect to the object o, as done for the marginal estimator (except that here the aberrations a are fixed to their values estimated by GML):

θ̂ = arg max_θ ∫ f(i_1, i_2, o, â_GML; θ) do
  = arg max_θ f(i_1, i_2, â_GML; θ).                                       (44)
The expression of the associated criterion, as well as its implementation and computation, are identical to those used in Section III.B.6 for the marginal estimation of the hyperparameters and the aberrations. This step is very fast because the minimization is done with fixed aberrations and only the (three) hyperparameters are estimated.

• Estimation of the MAP object. From the aberrations â_GML and the hyperparameters θ̂ estimated in the two previous steps, the object is restored by a MAP approach:

ô_MAP(â_GML, θ̂) = arg max_o f(i_1, i_2, â_GML | o; θ̂) f(o; θ̂).            (45)
This step is identical to the restoration of the object from the marginal estimator (see the previous subsection), except that here it must be explicitly computed.

3. Results

The results of the object estimation obtained by the hybrid method are presented in Figure 14 (the simulation conditions are unchanged). The RMSE on the object is 22% for a noise level of 1% and 41% for a noise level of 14%. The object estimates obtained by the joint method with no regularization (Section V.A) do not bear comparison with these estimates: here the quality of the restoration is close to that provided by the marginal method. For the noise level of 1%, the hybrid and marginal methods yield nearly identical object estimates. For the lower SNR, the marginal method performs slightly better. This is not surprising: for high noise levels, the quality of the aberration estimates obtained by the marginal method is better than that given by the joint method (see Section IV.E).
FIGURE 14. Hybrid estimation: (a) true object; (b) and (c) object restored with the hybrid method. The noise level is 1% for (b); in this case, the RMSE is 22%. Restoration (c) corresponds to a noise level of 14%; the RMSE is 41%.
D. Conclusion

We have shown that the joint method does not provide good object estimates. For high noise levels, the marginal method provides a simple and robust way to estimate the object. For lower noise levels, we propose to reuse the good phase estimate obtained by the joint method in a novel hybrid method, leading to a fast and satisfactory estimation of the object.
VI. OPTIMIZATION METHODS

A. Projection-Based Methods

As mentioned in Section I, phase-diversity techniques originate from the problem of phase retrieval, where the observed object o is a point source and only one focused image i is available. In such conditions, it is clear from Eqs. (1) and (11) that the observed image reads i = |FT^{-1}(P e^{jφ})|² + n. Neglecting the noise component and assuming that P is known (as the indicator function of the pupil), the issue of recovering the pupil phase φ can be restated in a more basic way: how to recover the phase φ of a function of known modulus P, given the modulus ρ = i^{1/2} of its Fourier transform.

The most common approach for solving this problem is to alternately enforce the known moduli in the two domains, which leads to the Gerchberg–Saxton algorithm (Gerchberg and Saxton, 1972; see also Gerchberg, 1974; Papoulis, 1975; Fienup, 1982):

1. Given a current value of φ, compute g_0 = FT^{-1}(P e^{jφ}).
2. Replace the modulus of g_0 by ρ, that is, compute g = ρ g_0/|g_0|.
3. Compute G = FT(g). 4. Take the phase of G as the new current value of φ, and go back to Step 1. The enforcement of the constraints can be mathematically interpreted as projections onto two subsets S1 and S2 of the set of phase vectors φ. At convergence, the aim is to find a phase vector that belongs to S1 ∩ S2 . Variants of the same alternating procedure are also encountered, for instance, to tackle the problem where P is an unknown nonnegative function of known support (Fienup, 1982; Bauschke et al., 2002). In the simplest phase-diversity problem, a couple of images of a point source are available according to Eqs. (12) and (13). Adapted versions of the Gerchberg–Saxton algorithm have been proposed to cope with this situation (Misell, 1973; Baba and Mutoh, 2001). In the general phase-diversity problem, the object is unknown, and the Gerchberg–Saxton algorithm does not easily generalize (see Baba and Mutoh, 1994, for an attempt). Instead, optimization procedures are more commonly found in the literature of phase diversity, either to minimize the plain leastsquare criterion of Eq. (14) or a penalized version of it [e.g., Eq. (21)]. In the authors’ mind, least-square minimization techniques should not be considered as stopgap solutions when the Gerchberg–Saxton algorithm is not readily usable. On the contrary, for several reasons, adopting the optimization framework should rather be seen as highly recommendable to solve phasediversity (or even phase retrieval) problems: • The Gerchberg–Saxton algorithm will not converge if S1 ∩ S2 is an empty set (i.e., if the data set is not feasible). This difficulty is shared by nearly all projection techniques. Obviously, the nonfeasible case is far from academic. It is rather expected to correspond to practical situations, as far as noisy measurements and approximated models must be handled. In Censor et al. (1983) and in Byrne (1997), modified projection techniques are introduced to cope with nonfeasible, linear problems. 
Convergence is then shown toward a least-square solution.
• In Fienup (1982), the Gerchberg–Saxton algorithm is tested against a least-square approach based on a gradient-search method on a phase-retrieval problem. The former approach is reported to converge very slowly compared with the latter.
Finally, it is the authors' view that much clarity is gained when an objective function (such as a least-square criterion) is explicitly defined first and foremost. Only then should an appropriate algorithm be selected from among the different families of optimization schemes, one of which is based on successive projections. In the remainder of this section, two other families are introduced: line-search methods and trust-region schemes. For the sake of simplicity, we mainly restrict our attention to the minimization of L = L_JMAP
MUGNIER ET AL.
defined by Eq. (23) [or indifferently to L = L_MAP defined by Eq. (33)] as a function of the aberration parameters a.

B. Line-Search Methods

Line-search minimization is by far the most commonly adopted framework in the context of phase diversity (see, e.g., Gonsalves, 1982; Fienup, 1982; Paxman et al., 1992; Blanc et al., 2003b). Each iteration of a line-search method is twofold: first, a search direction p_k is computed; then a step size α_k is determined, which corresponds to how far to move along p_k. The resulting iteration reads: a_{k+1} = a_k + α_k p_k. Important particular cases are obtained according to specific choices of the search direction.

1. Strategies of Search Direction

A steepest-descent method is obtained when p_k is chosen opposite to the criterion gradient g_k = ∇L(a_k). Such a method is computationally simple but rather slow to converge. Conjugate-gradient methods form a very useful extension of steepest descent, since they usually converge much faster for almost the same low computational requirement. They were originally designed to minimize convex quadratic functions, but they also provide an efficient and popular approach to general optimization problems. In conjugate-gradient methods, each direction p_k is built as a linear combination of −g_k and the previous direction p_{k−1}: p_k = −g_k + β_k p_{k−1}, where β_k is chosen to generate mutually conjugate search directions when the criterion is convex quadratic. For instance, the Fletcher–Reeves choice for the conjugate directions corresponds to
β_k = (g_k^t g_k) / (g_{k−1}^t g_{k−1}).
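The Fletcher–Reeves iteration described above can be sketched as follows. This is an illustrative implementation only, paired with a simple backtracking (Armijo) line search and a toy convex quadratic standing in for the phase-diversity criterion; the function names and test problem are assumptions, not part of the original treatment:

```python
import numpy as np

def fletcher_reeves_cg(L, grad, a0, n_iter=200, tol=1e-8):
    """Nonlinear conjugate gradient with the Fletcher-Reeves formula
    and a backtracking (Armijo) line search for the step size."""
    a = a0.astype(float).copy()
    g = grad(a)
    p = -g                                  # first direction: steepest descent
    for _ in range(n_iter):
        if g @ p >= 0:                      # safeguard: restart if not a descent direction
            p = -g
        alpha, c = 1.0, 1e-4
        # Backtrack until the Armijo sufficient-decrease condition holds
        while L(a + alpha * p) > L(a) + c * alpha * (g @ p):
            alpha *= 0.5
        a_new = a + alpha * p
        g_new = grad(a_new)
        if np.linalg.norm(g_new) < tol:
            return a_new
        beta = (g_new @ g_new) / (g @ g)    # Fletcher-Reeves beta
        p = -g_new + beta * p
        a, g = a_new, g_new
    return a

# Toy convex quadratic standing in for the criterion L(a)
H = np.array([[3.0, 1.0], [1.0, 2.0]])
L = lambda a: 0.5 * a @ H @ a
grad = lambda a: H @ a
a_hat = fletcher_reeves_cg(L, grad, np.array([4.0, -3.0]))
print(L(a_hat))   # close to 0, the minimum of the quadratic
```

On a convex quadratic the conjugate directions make the method converge in a number of steps comparable to the problem dimension; on general criteria it remains a cheap, fast-converging extension of steepest descent, as noted in the text.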
In the phase-diversity context, conjugate-gradient methods are proposed in Gonsalves (1982), Fienup (1982), Paxman et al. (1992), and Blanc et al. (2003b). The choice p_k = −H_k^{−1} g_k yields Newton's method when H_k is the Hessian ∇²L(a_k). The asymptotic convergence of Newton's method is fast, but its behavior far
A TECHNIQUE FOR WAVE-FRONT SENSING
from the solution is hardly predictable, especially when L is not a convex criterion. Moreover, it is more computationally demanding than the previous methods, since each search direction p_k is the solution of a linear system. Quasi-Newton methods correspond to

p_k = −B_k^{−1} g_k,
(46)
where B_k is "well chosen." Typically, the matrix B_k results from a trade-off: (1) it is close to H_k for large values of k; (2) it is symmetric and structurally positive definite, even for the first iterations (contrary to the Hessian H_k); (3) it is more easily invertible than H_k. The most popular quasi-Newton algorithm is the BFGS method (for Broyden–Fletcher–Goldfarb–Shanno), in which B_k^{−1} is maintained and updated, rather than the matrix B_k itself. For large-scale problems, it is still too demanding to handle the matrices B_k^{−1}. Limited-memory BFGS directly approximates B_k^{−1} g_k, using information gathered from the m earlier iterations, where m is usually a small number. For more details on BFGS and on limited-memory BFGS, see Nocedal and Wright (1999). Vogel et al. (1998) report that the performance of BFGS applied to the minimization of L = L_JMAP is disappointing; Vogel (2000) resorts to limited-memory BFGS.

2. Step-Size Rules

Choosing a clever strategy of search direction is not a sufficient condition for obtaining an efficient minimization algorithm. Modern analyses of convergence combine both the search direction and the choice of the step size α_k. The presumably "ideal" step-size rule α_k = arg min_α L(a_k + α p_k) usually has no closed-form expression. We are thus led to employ so-called inexact step-size strategies (i.e., inexact line-search strategies [Moré and Thuente, 1994]). Finding even a local minimizer of f(α) = L(a_k + α p_k) requires a certain number of iterations of a univariate minimization method, depending on the required precision. In this respect, a key result is provided by sufficient-decrease conditions such as Wolfe's (Nocedal and Wright, 1999): inexact step-size rules that satisfy such conditions, in conjunction with appropriate search directions, form the ingredients of algorithms that are both implementable and convergent.

C. Trust-Region Methods

Trust-region methods are based on a model function M_k whose behavior near the current point a_k is similar to that of the actual criterion L. The model M_k is usually a chosen convex quadratic. The trust-region approach then corresponds to the following strategy:
1. Solve p_{k+1} = arg min_p M_k(a_k + p), (47)
where a_k + p is restricted to belong to a trust region, typically chosen as a ball of radius, say, r.
2. If the candidate solution p_{k+1} does not produce a sufficient decrease L(a_k) − L(a_k + p_{k+1}), then the trust region is considered too large: the radius r is decreased, and the minimization step is solved again, as many times as necessary.
In practice, it may be computationally expensive to solve Eq. (47). Actually, it suffices to compute inexact solutions, provided that a condition of sufficient decrease still holds. The Levenberg–Marquardt method is often presented as a method of choice for solving nonlinear least-square problems (Press et al., 1992). It is based on updates of the form of Eq. (46), with B_k = J_k^t J_k + λ_k I, where:
• J_k^t J_k is a positive semidefinite approximation of the Hessian;
• I is the identity matrix;
• λ_k is a parameter that must be adjusted along the iterations to ensure the sufficient decrease of L.
The Levenberg–Marquardt method can be viewed as a pioneering trust-region method, in which the trust region is a ball whose radius is determined by λ_k, and M_k(p) = ‖J_k p + g_k‖² (Nocedal and Wright, 1999). In the context of phase diversity, Löfdahl and Scharmer (1994) propose a simplified form of the Levenberg–Marquardt method, in which λ_k = 0. Such a choice does not ensure the theoretical convergence of {a_k} for arbitrary initial points. Vogel et al. (1998) also resort to a variant of the Levenberg–Marquardt method, and Luke et al. (2000) apply limited-memory BFGS with trust regions.
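The Levenberg–Marquardt update B = JᵗJ + λI with λ adjusted along the iterations can be sketched as follows. This is an illustrative sketch only: the multiply-or-divide-by-10 damping rule and the toy exponential-fitting problem are assumed choices, not the implementations cited in the text:

```python
import numpy as np

def levenberg_marquardt(residual, jac, a0, n_iter=100):
    """Damped Gauss-Newton step p = -(JtJ + lam*I)^{-1} g, with lam
    increased on failure and decreased on success -- the trust-region
    behaviour described in the text (lam large => small, safe steps)."""
    a, lam = a0.astype(float).copy(), 1e-3
    for _ in range(n_iter):
        r, J = residual(a), jac(a)
        g = J.T @ r                        # gradient of 0.5 * ||r||^2
        B = J.T @ J + lam * np.eye(a.size)
        p = -np.linalg.solve(B, g)
        if np.sum(residual(a + p)**2) < np.sum(r**2):
            a, lam = a + p, lam / 10.0     # success: enlarge the trust region
        else:
            lam *= 10.0                    # failure: shrink the trust region
    return a

# Toy nonlinear least-squares problem: recover (c, k) from y = c * exp(-k * t)
t = np.linspace(0.0, 2.0, 20)
y = 2.0 * np.exp(-1.5 * t)
residual = lambda a: a[0] * np.exp(-a[1] * t) - y
jac = lambda a: np.column_stack([np.exp(-a[1] * t),
                                 -a[0] * t * np.exp(-a[1] * t)])
a_hat = levenberg_marquardt(residual, jac, np.array([1.0, 1.0]))
print(a_hat)   # close to (2.0, 1.5)
```

Setting λ = 0 reduces this to a plain Gauss–Newton step, which is the simplification of Löfdahl and Scharmer (1994) mentioned above, with the attendant loss of the convergence guarantee.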
VII. APPLICATION OF PHASE DIVERSITY TO AN OPERATIONAL SYSTEM: CALIBRATION OF NAOS-CONICA

A. Practical Implementation of Phase Diversity

1. Choice of the Defocus Distance

An appropriate choice of the defocus distance d is essential to optimize the performance of the phase-diversity sensor. If it is too small, the images are almost identical (not enough diversity), which brings back the problems
associated with phase retrieval. If it is too large, the defocused image contains virtually no information. The RMS defocus coefficient a_4^d [such that φ_d(r) = a_4^d Z_4(r), see Eq. (13)] depends on d, on the wavelength λ, the telescope diameter D, and the focal length F through:

a_4^d = πd / (8√3 λ (F/D)²)  (in radians),  (48)
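Eq. (48) is straightforward to evaluate numerically. The sketch below computes a_4^d and the corresponding peak-to-valley optical path difference Δ = (λ/2π) · 2√3 · a_4^d (the peak-to-valley of the Zernike defocus mode being 2√3 times its RMS coefficient), and applies the Δ ≈ λ rule of thumb; the wavelength and F/D values are assumed for illustration only:

```python
import numpy as np

def defocus_coefficient(d, lam, F_over_D):
    """RMS defocus coefficient a4^d (radians) of Eq. (48) for a
    longitudinal defocus distance d."""
    return np.pi * d / (8.0 * np.sqrt(3.0) * lam * F_over_D**2)

def pv_opd(d, lam, F_over_D):
    """Peak-to-valley optical path difference Delta = (lam/2pi)*2*sqrt(3)*a4^d,
    which simplifies to d / (8 (F/D)^2)."""
    a4d = defocus_coefficient(d, lam, F_over_D)
    return lam / (2.0 * np.pi) * 2.0 * np.sqrt(3.0) * a4d

# Illustrative values (assumed): lam = 2.2 um (K band), F/D = 15
lam, fod = 2.2e-6, 15.0
d_opt = 8.0 * lam * fod**2            # distance d giving Delta ~ lam
print(d_opt, pv_opd(d_opt, lam, fod) / lam)   # Delta/lam close to 1
```

Note that Δ = d/(8(F/D)²) is independent of λ, so the rule of thumb d ≈ 8λ(F/D)² gives a wavelength-dependent optimal defocus distance (a few millimeters for the assumed values).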
φ_d(r) is quadratic, minimum for r = 0 and maximum for r = 1. The corresponding peak-to-valley optical path difference Δ is equal to

Δ = (λ/2π) × 2√3 a_4^d = √3 λ a_4^d / π = d / (8(F/D)²).  (49)

Some studies (Lee et al., 1999; Meynadier et al., 1999) have shown that choosing d such that Δ is approximately equal to λ provides accurate results. In fact, the "optimal" defocus distance depends on the object structure, on the amplitude of the aberrations, and on the SNR of the images. In practice, a large domain around this value (typically, Δ ≈ λ ± λ/2) still provides good results. This result was first obtained experimentally (Meynadier et al., 1999). It was later confirmed in a simplified theoretical framework, using the expression of the Cramér–Rao lower bound for the variance of unbiased estimators (Lee et al., 1999; Prasad, 2004).

2. Image Centering

The estimation of aberrations from experimental data requires the determination of two parameters in addition to those describing the aberrated phase (Löfdahl and Scharmer, 1994). These parameters correspond to the relative alignment in x and y between the focused and the defocused images. They can be described by tip-tilt coefficients a_2 and a_3 in the Zernike basis and can be estimated in the same way as those corresponding to the wave-front, except that they relate only to the defocused image. This estimation of the alignment parameters eliminates the need for subpixel alignment of the images, but the images still need to be recentered first to an accuracy of about one or two pixels (e.g., by cross-correlation), because phases larger than ±π are difficult to estimate by phase diversity and often induce local minima (see Section VIII).

3. Spectral Bandwidth

The phase-diversity concept, as described so far, is a monochromatic WFS. Nevertheless, Meynadier et al. (1999) have shown that the use of broadband
filters does not significantly degrade the accuracy as long as Δλ/λ is reasonable (typically Δλ/λ ≤ 0.15). Note that the concept can be extended to polychromatic images through an appropriate change in the data model (Seldin and Paxman, 2000).

B. Calibration of NAOS and CONICA Static Aberrations

1. The Instrument

The European Very Large Telescope (VLT) instrument NAOS-CONICA is composed of the AO (Roddier, 1999) system NAOS (Rousset et al., 1998) and of the high-resolution camera CONICA (Lenzen et al., 1998). It is aimed at providing very high-quality images on one (UT4-Yepun) of the 8-m telescopes of the European Cerro Paranal observatory. Figure 15 presents the simplified outline of such an AO instrument. It consists of two different optical paths separated by a beam-splitter: the imaging path and the AO path (having in common the so-called "common path"). The imaging path is composed of the different filters and objectives of CONICA. The AO path consists of the WFS, which estimates the optical phase delays, and of the real-time computer, which applies the appropriate corrections via the deformable mirror (DM). In order to reach the ultimate performance of the instrument, it has been necessary to calibrate the remaining aberrations induced by its optical components. Defects of the wave-front originating from any component within the AO loop (common and AO paths) are seen by the AO WFS and thus corrected. This is not the case for a degradation of image quality induced by components
Figure 15. Simplified outline of the VLT instrument NAOS-CONICA.
outside the AO loop, in the imaging path (i.e., the beam-splitters, filters, and objectives of the camera). Phase diversity has been used to calibrate these unseen aberrations. More details about the calibration procedure can be found in Blanc et al. (2003a) and Hartung et al. (2003). Because of the huge number of observation modes of NAOS-CONICA, the calibration has been split into several parts: the NAOS dichroics (i.e., the beam-splitters) and CONICA (i.e., the different filters and objectives). The goal is to be able to assign the aberration contributions to the various optical components. Two different procedures have been used to obtain the phase-diversity data (i.e., the focused and defocused images). The first procedure has provided the calibration of the optical path of CONICA; the second, the aberrations of the NAOS dichroics.

2. Calibration of CONICA Stand-Alone

Phase-diversity setup The estimation of CONICA stand-alone aberrations (objectives and filters) is obtained through the use of pinholes located at differently defocused positions on a wheel in the camera entrance focal plane. Figure 16 depicts the setup for the corresponding measurements. CONICA has as many as 40 possible filters and 7 different camera objectives. A calibration point source is slid in front of CONICA. The telescope pupil is simulated by a cold pupil placed inside CONICA. After rotation of the wheel that holds the pinholes, the focused and defocused images are recorded, and the phase-diversity estimation is then performed.

Example of CONICA aberration estimation The true value of the aberrations is, of course, not available in the real world. A practical way to gain confidence in the correctness of the estimated aberrations is to compare the recorded images with those reconstructed from the estimated aberrations. Figure 17 shows an example of the comparison between experimental images and reconstructed ones. Note that, here, the observed object is
Figure 16. Calibration of CONICA stand-alone: use of pinholes in the entrance focal plane.
Figure 17. Comparison between measured images and PSFs reconstructed from the estimated aberrations. Left: focused plane; right: defocused plane (a log scale is adopted for each image).
close to a Dirac function. The phase estimation is obtained using the joint method with no regularization on the object. The phase estimate is expanded on the first 15 Zernike polynomials. The Strehl ratio (SR)7 computed from the aberrations' estimate is equal to 87%. It compares nicely to the SR directly computed on the focal-plane image, which is equal to 85%.

Calibration of the filters and objectives The aberrations estimated from CONICA phase-diversity data correspond to the contributions of a filter and a camera objective. Because the camera objectives are achromatic and there are many different filters with very different aberrations, the contributions of the filters and of the objectives can be separated. First, the total aberrations for a given camera objective and a given filter i (1 ≤ i ≤ n) are measured. This operation is repeated for all the n filters. The camera objective contribution a^Cam is then estimated by taking the median value of these total aberrations a^Ctot,filt_i over all the different filters:

a^Cam = median(a^Ctot,filt_1, a^Ctot,filt_2, . . . , a^Ctot,filt_n).  (50)

Finally, filter aberrations are obtained as the difference between the measured aberrations and the estimated camera aberrations:

a^filt_i = a^Ctot,filt_i − a^Cam.
(51)
Figure 18 shows the calibration results in the J and H bands, for a specific camera objective (C50S) and several filters. The solid line corresponds to the median, representing the camera objective aberrations.
7 The Strehl ratio is a common way to describe the quality of the point spread function. It is given by the ratio of the measured to the theoretical, diffraction-limited, peak intensity in the image of a point source.
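The separation of Eqs. (50) and (51) can be sketched numerically as follows; the numbers of filters and Zernike coefficients and the aberration amplitudes are illustrative assumptions, not the actual NAOS-CONICA values:

```python
import numpy as np

# Sketch of Eqs. (50)-(51): separate the camera-objective aberrations from
# the per-filter aberrations by taking, for each Zernike coefficient, the
# median over the per-filter total measurements.
rng = np.random.default_rng(0)
n_filters, n_zernike = 10, 15
a_cam_true = rng.normal(0.0, 0.05, n_zernike)                 # objective (shared by all filters)
a_filt_true = rng.normal(0.0, 0.02, (n_filters, n_zernike))   # one row per filter
a_tot = a_cam_true + a_filt_true                              # measured totals a^Ctot,filt_i

a_cam = np.median(a_tot, axis=0)                              # Eq. (50)
a_filt = a_tot - a_cam                                        # Eq. (51)
```

The median, rather than the mean, makes the estimate of the shared objective contribution robust to a few filters with unusually large aberrations, which is what motivates its use when the filters differ strongly from one another.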
Figure 18. CONICA's aberrations measured for several filters in the J and H bands with camera objective C50S. The solid lines indicate the median value representing the aberrations of the camera objective.
3. Calibration of the NAOS Dichroics

Phase-diversity setup The estimation of the NAOS dichroic aberrations is obtained through the use of the AO system. Note that NAOS has five different dichroics. A focused image of a fiber source, located in the entrance focal plane of NAOS, is recorded in closed loop in order to avoid the common-path aberrations from the optical train between the source and the dichroic. Then a known defocus is introduced on the DM, with the AO loop still closed, to record the defocused image. This approach gives a^NCtot, the sum of the NAOS dichroic aberrations and the CONICA aberrations.

Calibration of NAOS dichroic aberrations The contribution of the NAOS dichroics a^dichro can be determined by subtracting the total CONICA instrument aberrations a^Ctot (provided by the previous calibration of CONICA stand-alone) from the overall NAOS-CONICA instrument aberrations a^NCtot:

a^dichro = a^NCtot − a^Ctot.
(52)
4. Closed-Loop Compensation Thanks to the individual calibration of the different optical components, the correction coefficients are known for any possible configuration of the instrument. The precompensation of these static aberrations can be done by using NAOS, which is able to introduce known static aberrations on the DM in closed loop. In this manner, the DM takes the shape needed for compensation of the static optical aberrations. To demonstrate the final gain in optical quality, we compare the originally acquired images without correction for static aberrations with the images obtained after closed-loop compensation. Figure 19 shows two examples of closed-loop compensations in J-band and in K-band. We achieve a striking correction in J-band, visible with the naked eye
Figure 19. Comparison of the PSFs before and after closed-loop compensation. The SR of each PSF is indicated in percent.
between the images before and after correction. In K-band, the noncorrected image is already very close to the diffraction limit, and the improvement is barely visible. However, the computation of the SR (see Figure 19) shows that even in K-band the correction performed is still significant.

C. Conclusion

These calibrations have shown that phase diversity is a simple and powerful approach to improve the overall optical performance of an AO system. This is of great interest for future very high SR systems, in which an accurate estimation and correction of aberrations is essential to achieve the ultimate performance and to reach the scientific goals (for instance, regarding exoplanet detection).
VIII. EMERGING METHODS: MEASUREMENT OF LARGE ABERRATIONS

A. Problem Statement

Large-amplitude aberrations refer to the fact that phase variations larger than 2π can be encountered. In such a situation, the data used in phase diversity (i.e., focused and defocused images) do not contain the full information on the aberrated pupil phase. In fact, Eq. (1), recalled below, shows that the PSF is related to the pupil phase φ by a nonlinear relationship in which the phase appears in a complex exponential:

h_opt(x, y) = |FT^{−1}[P(λu, λv) e^{jφ(λu,λv)}]|²(x, y).

Any 2π-variation of any phase point in the pupil leaves the PSF unchanged and thus the image unchanged. Hence the phase-diversity data contain information only on the wrapped phase φ[2π]. The data alone (whatever their number) do not allow discrimination between the true continuous phase and all the equivalent phases. Figure 20 shows a 1D example of two phases that yield the same image. In the joint criterion, as in the marginal one, the term corresponding to the likelihood presents an infinite number of equivalent local minima. Figure 21 illustrates this in a 1D representation: the unwrapped true phase corresponds to one of these minima, but from the data alone it cannot be differentiated from the others. Note that, unfortunately, the defocused image, which removes the indetermination of the sign of the even part of the pupil phase (see Subsection I.C.1), has no effect on this indetermination. In practice, this indetermination appears for any strongly
Figure 20. The pupil phase in solid line and the one in dashed line yield the same PSF.
Figure 21. Symbolic 1D representation of the likelihood term.
degraded system (i.e., for phase amplitudes greater than 2π). This is the case, for example, for astronomical observations from the ground, where images are strongly degraded by atmospheric turbulence. This indetermination does not affect a posteriori correction. Indeed, for this application, the object is the parameter of interest; thus it does not bear the indetermination, provided that the wrapped phase φ[2π] is correctly estimated. By contrast, real-time wave-front correction requires the estimation of the unwrapped phase8 and thus the 2π-indetermination must be removed. Regularity criteria are usually invoked to perform phase unwrapping. However, the latter is a difficult task, both from the informational viewpoint (only poor information is available) and from the computational viewpoint (the criterion to optimize is highly multimodal).
8 Especially if the scientific imaging is not performed at the same wavelength as the wave-front sensing, or if it is polychromatic.
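The 2π-indetermination is easy to verify numerically from Eq. (1): in the sketch below (pupil size and sampling chosen arbitrarily for illustration), adding a 2π jump over half the pupil leaves the computed PSF unchanged:

```python
import numpy as np

# Numerical check of the indetermination: adding 2*pi to any subset of
# pupil-phase pixels leaves the PSF of Eq. (1) unchanged.
N = 64
y, x = np.indices((N, N)) - N // 2
pupil = (x**2 + y**2 <= (N // 4)**2).astype(float)

rng = np.random.default_rng(1)
phase = rng.normal(0.0, 1.0, (N, N)) * pupil

def psf(pupil, phase):
    """h_opt = |FT^{-1}[P * exp(j*phi)]|^2 (sampling details ignored)."""
    return np.abs(np.fft.ifft2(pupil * np.exp(1j * phase)))**2

jump = phase.copy()
jump[x > 0] += 2.0 * np.pi          # 2*pi jump over half the pupil
print(np.allclose(psf(pupil, phase), psf(pupil, jump)))  # -> True
```

Since e^{j(φ+2π)} = e^{jφ}, the two pupil functions are identical and so are the images; this is exactly why the data constrain only φ[2π].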
B. Large Aberration Estimation Methods

In the phase-diversity literature, two different ways have been proposed to estimate large aberrations. The first is based on a single-step estimation of the unwrapped phase; the second is composed of two steps: first, the wrapped phase is estimated; then, if necessary, the phase is unwrapped.

1. Estimation of the Unwrapped Phase

This method consists of simultaneously estimating and unwrapping the phase. It requires introducing information into the estimation process in addition to that contained in the data, in order to remove the 2π-indetermination and to construct a criterion that presents (if possible) a unique global minimum corresponding to the unwrapped phase. This information can be brought by an appropriate choice of the basis functions for the expansion of the phase and/or by the use of an explicit regularization term on the phase parameters. In the phase-diversity literature, two different bases have been used to expand the phase.

Zernike polynomials In all previous sections, and as often done in the phase-diversity literature, the phase has been expanded on the Zernike modal basis (see Subsection I.B.4). A parameterization of the phase on an infinite number of Zernike polynomials leaves the likelihood criterion unchanged (Figure 21). But in practice, the phase is expanded on a finite number of polynomials. In this case, the minima of the criterion are no longer equivalent, and a unique global minimum corresponds to the unwrapped phase. Even if this decomposition seems to remove the ambiguity, in practice all the minima are very close to each other (see Figure 22), and it is almost impossible to reach the global minimum (corresponding to the unwrapped phase). In order to better differentiate between the global minimum and the others, it is essential to add more information on the phase by introducing
Figure 22. 1D representation of the likelihood term for the pupil phase expanded on a truncated Zernike decomposition.
a regularization term in the criterion. For imaging through turbulence, one can use statistical knowledge of the atmospheric turbulence and introduce the a priori probability distribution presented in Subsection III.A.1, Eq. (19). This simultaneous estimation and unwrapping of the phase was chosen by Thelen et al. (1999b).

Delta functions Another possible basis for the phase expansion is the delta functions: the phase is estimated point by point in the pupil (i.e., at each sampled point in the pupil plane). In contrast with the truncated Zernike basis, this expansion of the phase leaves the indetermination unchanged; that is, the likelihood term keeps the shape of Figure 21. Thus, in order to reach the unwrapped-phase solution, one must introduce a priori information on the phase to "distort" the criterion and remove the equivalence between all 2π-variation phase estimates. The solution proposed by Jefferies et al. (2002) is to parameterize the phase by a function convolved with a smoothing kernel. This is a means of imposing smoothness on the estimated phase, and it has the same unwanted effect as using a truncated Zernike basis: it distorts the criterion into something like Figure 22. To summarize, for the Zernike modal regularization approach as well as for the point-by-point one, the regularized criterion presents many local minima. The global minimum corresponding to the searched-for unwrapped phase is unique but very close to the other local solutions. The success of the minimization of such criteria by local minimization methods (see Section VI) is very uncertain and strongly depends on the starting point of the iterative minimization (which should be very close to the unwrapped solution). Thus, the simultaneous estimation and unwrapping of the phase is very difficult. This is due to the fact that each process (estimation and unwrapping) is, by itself, a difficult problem.

2. Estimation of the Wrapped Phase (Then Unwrapping)

An alternative approach is to keep the problem separated into two steps: first, estimate the wrapped phase φ[2π]; then, if necessary, unwrap the phase to obtain φ. This approach is guided, on the one hand, by the above-mentioned difficulties in solving the two problems simultaneously and, on the other hand, by noting that for some applications the estimation of the sole wrapped phase φ[2π] is enough (as for postcorrection). In the phase-diversity literature, Löfdahl et al. (1998b) and Baba and Mutoh (2001) have chosen this two-step method. Their application was the estimation of the static aberrations of a telescope by observing an unresolved point source (a star) through turbulence. We have also developed a phase estimation method based on this two-step approach. It is briefly presented in the following. More details can be found in Blanc (2002).
Estimation of φ[2π]: Construction of a regularized criterion In this approach, we first acknowledge the fact that it is almost impossible to find the global minimum of a criterion that has many almost equivalent minima. Thus, to estimate φ[2π] without the minimization problems associated with local minima, all minima corresponding to a 2π-variation of the phase must remain strictly equivalent: the criterion has to keep the form of Figure 21. Concerning the likelihood term, this requires the choice of an appropriate basis for the phase expansion. As indicated in Subsection VIII.B.1, the delta functions keep the equivalence. Hence, the phase is expanded on this basis. The implementation of the point-by-point phase estimation leads to a very large parameter space (compared with a truncated Zernike basis) and thus requires the use of an explicit regularization term on the phase. In order to keep the equivalence between all local minima, this term must also preserve the indetermination.

• Choice of the phase regularization function The goal of the a priori information on the phase is to ensure a good smoothing of the small gradients (i.e., those corresponding to noise) but to be insensitive to the large gradients (those corresponding to 2π-variations). We choose the following expression for the phase regularization function:

J_regul(φ) = Σ_{(l,m)∈S} [ |e^{j(φ_{l−1,m} − φ_{l,m})} − e^{j(φ_{l,m} − φ_{l+1,m})}|²
+ |e^{j(φ_{l,m−1} − φ_{l,m})} − e^{j(φ_{l,m} − φ_{l,m+1})}|² ].  (53)
The summation is done over all the pixels within the pupil (S is the pupil support). Furthermore, we impose a strict support constraint; that is, all terms |...|² that contain at least one pixel outside the pupil support are suppressed. To ensure that the regularization function is insensitive to any 2π-variation of the phase, the a priori information has been imposed on the phasors (i.e., on e^{jφ}). Note that, even if the regularization term involves the phasors, the estimated parameters are still the phase values at the pixels. In order to keep all the indeterminations of the global criterion, the regularization function has been constructed in such a way that it is insensitive, as the data are, to a global piston and to tip-tilt (see Sections I.B.4 and III.A.1). A first-order expansion of Eq. (53) provides insight into the effect of the regularization term on the phase. For small phase differences, Eq. (53) is approximately given by:

J_regul(φ) ≈ Σ_{(l,m)∈S} [ |φ_{l−1,m} − 2φ_{l,m} + φ_{l+1,m}|²
+ |φ_{l,m−1} − 2φ_{l,m} + φ_{l,m+1}|² ].
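A sketch of the regularization function of Eq. (53) follows, restricted for simplicity to a full square support (the strict pupil-support constraint described above is omitted), together with a numerical check of its insensitivity to 2π-jumps and to a global piston:

```python
import numpy as np

def j_regul(phi):
    """Phasor-based smoothness penalty of Eq. (53), on a full square
    support (the pupil-support suppression of boundary terms is omitted
    in this sketch)."""
    u = np.exp(1j * phi)
    # e^{j(phi_{l-1,m}-phi_{l,m})} etc., via ratios of unit-modulus phasors
    dv = u[:-1, :] / u[1:, :]       # vertical first differences (as phasors)
    dh = u[:, :-1] / u[:, 1:]       # horizontal first differences
    return (np.sum(np.abs(dv[:-1, :] - dv[1:, :])**2)
            + np.sum(np.abs(dh[:, :-1] - dh[:, 1:])**2))

rng = np.random.default_rng(2)
phi = rng.normal(0.0, 0.1, (32, 32))

wrapped = phi.copy()
wrapped[10:20, 10:20] += 2.0 * np.pi   # 2*pi jump on a patch of pixels
print(np.isclose(j_regul(phi), j_regul(wrapped)))    # -> True: jump-insensitive
print(np.isclose(j_regul(phi), j_regul(phi + 1.3)))  # -> True: piston-blind
```

Because only phasor differences enter the penalty, any per-pixel 2π-shift and any global piston cancel exactly, which is precisely the property required to keep all the wrapped-phase minima equivalent.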
It corresponds to a quadratic penalization of the second-order differences of the phase values.

• MAP estimation We propose a MAP approach for the point-by-point phase estimation:

φ̂_MAP = arg min_φ J_MAP(φ).  (54)
The expression of φ̂_MAP is obtained by using Eq. (37), where the regularization term on the Zernike coefficients, a^t R_a^{−1} a, is replaced by Eq. (53). Eq. (37) becomes:

L_MAP(φ, θ) = Σ_v ln S_o(v) + N² ln σ²
+ Σ_v ln[ |h̃_1(φ, v)|² + |h̃_2(φ, v)|² + σ²/S_o(v) ]
+ (1/2) Σ_v |ı̃_1(v) h̃_2(φ, v) − ı̃_2(v) h̃_1(φ, v)|² / { σ² [ |h̃_1(φ, v)|² + |h̃_2(φ, v)|² + σ²/S_o(v) ] }
+ (1/2) Σ_v [ |h̃_1(φ, v) õ_m(v) − ı̃_1(v)|² + |h̃_2(φ, v) õ_m(v) − ı̃_2(v)|² ] / { S_o(v) [ |h̃_1(φ, v)|² + |h̃_2(φ, v)|² + σ²/S_o(v) ] }
+ γ Σ_{(l,m)∈S} [ |e^{j(φ_{l−1,m} − φ_{l,m})} − e^{j(φ_{l,m} − φ_{l+1,m})}|² + |e^{j(φ_{l,m−1} − φ_{l,m})} − e^{j(φ_{l,m} − φ_{l,m+1})}|² ],  (55)
where γ is the hyperparameter that quantifies the trade-off between goodness of fit to the data and fidelity to the prior. There is no closed-form expression for the φ̂ that minimizes the criterion L_MAP(φ, θ), so the minimization has to be done with an iterative method.

• Tuning of the hyperparameters
— Noise and object: As in Subsection III.B.6, the hyperparameters linked to the noise and the object are estimated jointly with the phase parameters. We recall that:

(φ̂, θ̂_o, θ̂_b) = arg min_{φ, θ_o, θ_b} J_MAP(φ, θ_o, θ_b, θ̂_φ).  (56)
— Phase: The value of γ has to be adjusted by hand. In fact, γ cannot be estimated jointly with φ (the solution γ = 0, corresponding to a null regularization, minimizes the criterion taken as a function of γ for any given value of the phase).
Phase unwrapping The preceding estimation procedure produces an estimate of the wrapped phase. For applications requiring the phase itself, it is necessary, in a second step, to compute the phase from the wrapped phase by means of a phase-unwrapping method. The phase-unwrapping problem is found in a variety of applications, and several phase-unwrapping methods have been proposed in the literature (see, e.g., Ghiglia and Pritt, 1998). We will not develop these methods here. In the following, we focus on the wrapped-phase estimation, which, unlike the phase-unwrapping problem, is specific to phase diversity.

C. Simulation Results

We show simulation results obtained with the MAP estimator on turbulence-induced aberrations.

1. Choice of an Error Metric

Because we focus on the estimation of the wrapped phase φ[2π], we have developed a metric that quantifies the quality of the phase estimate within the interval [−π, π]:

ε = sqrt[ (1/A) Σ_{(l,m)∈S} c(l, m)² ]  (57)

with

c(l, m) = | e^{j(φ^true_{l,m} − φ^estimated_{l,m})} − ⟨e^{j(φ^true − φ^estimated)}⟩ |,  (58)

with ⟨·⟩ denoting the average of the residual phasor over the pupil.
where A is the number of pixels in the pupil. c is the 2D error map that indicates the difference between the true phase and its estimate at each pupil pixel. Note that this error metric is insensitive, as the data are, to any 2π-variation of the phase and to piston terms. A first-order expansion of Eq. (57) shows that, for small errors on the phase, ε corresponds to the standard deviation of the residual phase.

2. Results

Data generation Twenty turbulent wave-fronts are obtained using a modal method (Roddier, 1990): the phase is expanded on the Zernike polynomial basis and given Kolmogorov statistics. The strength of the turbulence is set by the ratio D/r_o of the telescope diameter to the Fried parameter (see Subsection I.B.3). For each of these turbulent phases, we compute a Shannon-sampled image. The image noise is a uniform Gaussian noise of 1%. The defocus amplitude between the two images is set to 2π radians, peak to valley.
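Under the reading of Eqs. (57) and (58) in which the residual phasor is compared with its pupil-averaged value (which is what makes the metric piston-insensitive), the error metric can be sketched as follows; the phase maps and the all-ones pupil are illustrative:

```python
import numpy as np

def error_metric(phi_true, phi_est, pupil):
    """Wrap- and piston-insensitive error in the spirit of Eqs. (57)-(58):
    compare residual phasors with their pupil-averaged value."""
    res = np.exp(1j * (phi_true - phi_est))
    res_mean = np.mean(res[pupil])           # removes a global piston
    c = np.abs(res - res_mean)               # 2D error map of Eq. (58)
    A = np.count_nonzero(pupil)
    return np.sqrt(np.sum(c[pupil]**2) / A)

N = 32
pupil = np.ones((N, N), dtype=bool)          # full square support, for simplicity
rng = np.random.default_rng(3)
phi = rng.normal(0.0, 0.5, (N, N))

est = phi + 0.7                              # pure piston error
print(error_metric(phi, est, pupil))         # ~ 0

est = phi.copy()
est[:N // 2] += 2.0 * np.pi                  # 2*pi jumps over half the support
print(error_metric(phi, est, pupil))         # ~ 0
```

For small residuals δ, the phasor difference reduces to |δ − δ̄|, so ε indeed approaches the standard deviation of the residual phase, as stated in the text.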
Figure 23. Example of a turbulent phase estimate (D/r_o = 30) from a point source object. The 2D error map representation has been multiplied by 10 compared with the others. The error term ε is equal to λ/28.
Point source We first use a point-source object. The strength of the turbulence is set to D/r_o = 30 (this corresponds to very strong turbulence conditions). The theoretical spatial standard deviation of the phase is then equal to 2.7λ (Noll, 1976), which corresponds to a peak-to-valley amplitude of 30π radians. Despite this high value, this case is a favorable situation because the object is known and the noise level is low. Accurate results are then obtained, even without regularization on the phase (i.e., the point-by-point phase is estimated using the maximum-likelihood approach). The average error ε on the 20 turbulent wave-fronts is equal to λ/30. An example of a phase estimate is shown in Figure 23 (middle), to be compared with the true phase (left). The comparison of these two phases by visual inspection does not give a correct idea of the quality of the estimation. This is due to the fact that the phase estimate presents several 2π-jumps, whereas the true phase is unwrapped. The good quality of the estimate can be seen through the 2D error map c (Figure 23, right) or in Figure 24, which shows the true phase and the estimated one, both wrapped within [−π, π].

Extended object The discrete extended object used here is a spiral galaxy (Figure 25). Three different strengths of turbulence are studied: low (D/r_o = 4), medium (D/r_o = 6), and strong (D/r_o = 8). Phase estimates are obtained using the MAP approach of Eq. (55).

• Low turbulence level The strength of turbulence is set to D/r_o = 4, corresponding to a peak-to-valley amplitude of 1.2π radians. The point-by-point phase estimates have been obtained using a focused image and a defocused one, and the starting phase estimate of the iterative criterion minimization is zero. The average error ε on the 20 phase estimates is equal to λ/60. Figure 26 (middle) shows an example of a phase estimate for
A TECHNIQUE FOR WAVE-FRONT SENSING
61
FIGURE 24. Example of a turbulent phase estimate (D/ro = 30) from a point source object: left, the true phase within [−π, π] and right, the phase estimate within [−π, π].
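The wrapped comparison of Figure 24 and the error metric ε expressed as a fraction of λ can be sketched numerically. This is a minimal Python illustration on a synthetic phase screen, not the authors' code: `wrap_phase`, the 64 × 64 grid, and the injected λ/30 noise level are assumptions of the example.

```python
import numpy as np

def wrap_phase(phi):
    """Wrap a phase map (radians) into (-pi, pi]."""
    return np.angle(np.exp(1j * phi))

def rms_error_in_lambda(phi_true, phi_est):
    """RMS wave-front error as a fraction of the wavelength.

    The residual is wrapped first, so that 2*pi jumps in the estimate
    do not count as errors, and the (unobservable) piston is removed.
    """
    diff = wrap_phase(phi_est - phi_true)
    diff = diff - diff.mean()
    return np.sqrt(np.mean(diff ** 2)) / (2.0 * np.pi)

# Toy check on a synthetic 64x64 phase screen with lambda/30 RMS noise
rng = np.random.default_rng(0)
phi = rng.standard_normal((64, 64))
err = rms_error_in_lambda(phi, phi + rng.normal(0.0, 2.0 * np.pi / 30.0, phi.shape))
```

Here `err` comes out close to 1/30, i.e., such an estimate would be reported as an error ε of roughly λ/30.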
FIGURE 25. Extended object used for simulations.
D/ro = 4, compared to the true phase (left). The right side of this figure depicts the (10 times magnified) 2D error map c.

• Medium turbulence level. The strength of turbulence is now set to D/ro = 6, corresponding to a peak-to-valley amplitude of 1.5π radians. The quality of the point-by-point phase estimates obtained with a focused and a defocused image is poor (ε = λ/15). This is probably due to the presence of local minima in the MAP criterion. Although we have constructed a criterion such that all minima corresponding to a 2π-variation of the phase are equivalent, there are other possible sources of local minima, such as the nonlinear relationship between the phase and the image [see Eq. (1)]. To improve the estimation, we have used an additional defocused image (with a defocus from the focused plane
62
MUGNIER ET AL.
FIGURE 26. Example of a turbulent phase estimate (D/ro = 4) from the galaxy object. The 2D error map representation has been multiplied by 10 compared to the others. The error term is equal to λ/60.
equal to 4π). The reconstruction clearly improves when the three images are used: the mean error ε becomes λ/40.

• Strong turbulence level. Finally, we consider stronger turbulence conditions with D/ro = 8, corresponding to a peak-to-valley amplitude of 2π radians. The use of three images leads to a poor quality of estimation (the mean error ε is equal to λ/17). We propose, as suggested by Jefferies et al. (2002), to improve the estimation by using a better starting point for the criterion minimization. An initial estimate of the low-order Zernike coefficients (which carry the greater part of the phase power) is performed: the phase is expanded on the first Zernike polynomials, with coefficients a4 to a6. After estimation of these few Zernike coefficients, the estimation of the point-by-point phase parameters is started using the corresponding phase as the starting estimate for the minimization of the MAP criterion. This starting point, which is closer to a global minimum than the null phase initially used, makes the estimation more precise: the average error ε obtained using three images becomes λ/45. Figure 27 (middle) shows an example of a phase estimate for D/ro = 8, compared to the true phase (left). The right side of this figure depicts the 2D error map c.

To summarize, we have shown that the estimation of large aberrations from a known object and for a low noise level does not require an explicit regularization term on the phase. For extended objects, the use of the two conventional images of phase diversity performs well for low turbulence levels. For medium turbulence levels (here D/ro = 6), the quality of the phase estimate is enhanced by the use of an additional defocused image. Finally, for higher turbulence cases, the accuracy is maintained if the point-by-point phase estimation is done with three images and is preceded by a first
FIGURE 27. Example of a turbulent phase estimate (D/ro = 8) from the galaxy object. The 2D error map representation has been multiplied by 10 compared to the others. Three images have been used, and the initial phase estimate is obtained using a first estimation of the Zernike coefficients a4 to a6. The error term is equal to λ/45.
modal estimation of the low spatial frequencies of the aberrated phase. The measurement of large aberrations by use of phase diversity is a problem that has only recently been addressed. The results of the works referenced in these pages, together with those presented here, are quite promising and open the way to making phase diversity a practical wave-front sensor for adaptive optics.
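The coarse-to-fine strategy just summarized — estimate a few low-order modes first, then start the point-by-point minimization from the resulting phase rather than from zero — can be sketched on a toy problem. The criterion below is a made-up smooth function with a unique minimum standing in for the MAP criterion of Eq. (55), and the three smooth modes stand in for the Zernike polynomials Z4 to Z6; only the two-step structure is the point of the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pix = 256
phi_true = rng.standard_normal(n_pix)

# Gradient of a toy, mildly non-quadratic criterion with a unique
# minimum at phi_true (a stand-in for the real image-based criterion).
def grad(phi):
    d = phi - phi_true
    return 2.0 * d - 0.3 * np.sin(3.0 * d)

def gradient_descent(g, x0, lr=0.1, iters=2000):
    x = x0.copy()
    for _ in range(iters):
        x = x - lr * g(x)
    return x

# Step 1: modal estimation over a few global modes (made-up smooth
# modes standing in for the low-order Zernike polynomials).
x = np.linspace(-1.0, 1.0, n_pix)
modes = np.stack([np.ones(n_pix), x, x ** 2])
modes = modes / np.linalg.norm(modes, axis=1, keepdims=True)

def modal_grad(a):                       # chain rule: d(criterion)/da
    return modes @ grad(a @ modes)

a_hat = gradient_descent(modal_grad, np.zeros(3))

# Step 2: point-by-point refinement, initialized with the modal phase
# instead of the null phase (the strategy used at D/ro = 8).
phi0 = a_hat @ modes
phi_hat = gradient_descent(grad, phi0)
```

On the real criterion the modal step matters because it moves the starting point into the basin of a global minimum; here the minimum is unique, so the sketch only shows the mechanics of the two stages.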
IX. EMERGING APPLICATIONS: COPHASING OF MULTIAPERTURE TELESCOPES

A. Background

The resolution of a telescope is ultimately limited by its aperture diameter. The latter is limited by current technology to about 10 m for ground-based telescopes, and to a few meters for space-based telescopes because of volume and mass considerations. Interferometry allows going beyond this limit: it consists in making an array of subapertures interfere, and the resulting instrument is called an interferometer or a multiaperture telescope. So far, this technique has been used solely on ground-based instruments. The subapertures can either be telescopes per se, as in astronomy (e.g., the VLT Interferometer or the Navy Prototype Optical Interferometer), or segments of a common primary mirror. If these segments are adjacent, as in the Keck telescope, the instrument is referred to as a segmented telescope rather than an interferometer, even though it is conceptually one. Regarding high-resolution space-borne missions, interferometers are forecast in astronomy (with a segmented aperture for the
James Webb Space Telescope and a diluted aperture for the Darwin9 mission, for instance) and can also be considered for Earth observation (Mugnier et al., 2004). To perform correctly, the aperture of such an instrument must be phased to within a small fraction of the wavelength. A critical subsystem of interferometers is thus the cophasing sensor (CS), whose goal is to measure the relative positioning (differential piston and tip-tilt) of the subapertures, which are the main sources of wave-front degradation, and possibly the higher-order aberrations on each subaperture.

Differential piston and tip-tilt measurement has been studied extensively and demonstrated for distant ground-based telescopes. Most of the proposed devices are based on a pupil-plane combination of the light coming from a given pair of subapertures. Because of this pupil-plane combination, the contrast of the interference fringes decreases quickly as the object extension increases, which makes these devices useless on very extended scenes such as the Earth viewed from space. Because of the pairwise light combination, these devices also become impractical for instruments made of many subapertures.

The fact that phase diversity can be used as a CS on a segmented-aperture telescope was recognized very early (Paxman and Fienup, 1988). In contrast with the above-mentioned devices, phase diversity enjoys two appealing characteristics: first, it is appropriate for an instrument with a large number of subapertures, because the complexity of the hardware is essentially independent of their number; second, it can be used on very extended objects. The first property and the absence of noncommon-path aberrations (see, e.g., Subsection VII.B) are two strong motivations for the choice of phase diversity as a CS, even when looking at an unresolved source.
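The resolution gain that motivates multiaperture instruments is straightforward to quantify with the λ/D diffraction limit. In the sketch below, the 10 m and 100 m diameters and the 500 nm wavelength are assumed round numbers chosen for illustration, not values from the text.

```python
import numpy as np

RAD_TO_MAS = np.degrees(1.0) * 3600.0e3   # radians -> milliarcseconds

def resolution_mas(wavelength_m, diameter_m):
    """Diffraction-limited angular resolution ~ lambda/D, in mas."""
    return wavelength_m / diameter_m * RAD_TO_MAS

lam = 0.5e-6                               # assumed: 500 nm (visible)
res_mono = resolution_mas(lam, 10.0)       # 10 m monolithic aperture, ~10.3 mas
res_interf = resolution_mas(lam, 100.0)    # assumed 100 m interferometric baseline
```

A 100 m baseline resolves detail ten times finer than a 10 m monolithic aperture at the same wavelength; the price, as discussed above, is that the array must be cophased to a small fraction of λ.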
Phase-diversity experiments as a CS on a point source have been performed on the ground, notably to cophase the segments of the Keck telescope (Löfdahl et al., 1998b). Regarding space instruments, phase diversity has long been foreseen as a candidate of choice for the JWST segmented-aperture telescope (see, e.g., Redding et al., 1998; Carrara et al., 2000; Lee et al., 2003), and has recently been selected as the CS for the Darwin interferometer (Cassaing et al., 2003; Mocoeur et al., 2005). As early as 1994, phase diversity was experimentally validated as a CS on an extended source for correcting static aberrations in real time with a segmented mirror (Kendrick et al., 1994a, 1994b). Quite recently, the loop was also closed with a phase-diversity CS to correct static aberrations in the difficult framework of

9 Darwin is a planned European Space Agency mission whose aim is to find and characterize Earth-like planets. The instrument will be a so-called nulling interferometer, which cancels the light coming from the star in order to detect the planet—see, e.g., http://www.esa.int/science/darwin for more information.
a six-telescope imaging interferometer using broadband light (Zarifis et al., 1999; Seldin and Paxman, 2000).

B. Experimental Results on an Extended Scene

As mentioned in the previous subsection, phase diversity can be used with extended scenes. Indeed, for applications such as Earth observation from space, the scene extends beyond the field of view recorded by the image, and phase diversity is one of the very few possible CSs (Mugnier et al., 2004). We have designed, built, and validated a prototype phase-diversity CS for this application. After a short presentation of this prototype and its testbed, we present its latest results.10 A more comprehensive presentation of the testbed, along with earlier results, can be found in Sorrente et al. (2004) and Baron (2005). A schematic view of the testbed, called BRISE (Banc Reconfigurable d'Imagerie sur Scènes Etendues), is shown in Figure 28. BRISE is mainly
FIGURE 28. Schematic view of the BRISE testbed and photograph of the deformable mirror (DEF), courtesy F. Cassaing. EXT: extended object; REF: reference point-source; ARC: arc lamp used for the illumination of EXT.

10 These experimental results are courtesy of I. Mocœur, after preliminary results by F. Baron. F. Cassaing is gratefully acknowledged for being the master architect of this prototype and testbed, and B. Sorrente for overseeing their realization.
composed of four modules (source, perturbation, detection, and control), described below. The source module delivers two objects: an extended scene, which is an Earth scene on a high-resolution photographic plate illuminated by an arc lamp, and a reference point source, which is the output of a monomode fiber fed with a He–Ne laser. The perturbation module has three functions: it images the source on the detector, defines the aperture configuration, and introduces calibrated aberrations; its main component is the deformable mirror (DM), which performs the latter function. In order to introduce only piston and tip-tilt, we have chosen a specific segmented DM consisting of three planar mirrors mounted on piezo-actuated platforms by Physik Instrumente, which have exactly these three degrees of freedom. The detection module is a water-cooled CCD camera that simultaneously records a focal-plane image and a defocused image of each of the two objects, so as to implement a phase-diversity CS. Figure 29 shows an experimental example of such an image. The control module drives the experiment.

Special care has been given to the control of errors that could limit the CS performance, or its evaluation, on extended objects. In particular, the two objects are observed simultaneously through very close optical paths, to minimize the differential effects of field aberrations, vibrations, or
FIGURE 29. Focused (left) and defocused (right) experimental images of the extended scene (bottom) and reference point-source (top) objects. These images are recorded simultaneously on different parts of the same detector and used for phase diversity.
air turbulence. A very accurate aberration calibration can thus be achieved, thanks to the high SNR of the measurement obtained on the reference point source. Figure 30 presents the piston measured at high photon level on a given subaperture as a function of the piston effectively introduced by the DM, for the reference point source at λr = 633 nm and for the extended scene, illuminated with white light through a spectral filter of width 40 nm centered around λe = 650 nm. For each introduced piston, three measurements are performed and reported in this figure. The point-source measurements exhibit an excellent linearity between roughly −λr/2 and +λr/2, at which points the expected modulo-2π wrapping occurs. With the extended object, the curve is linear over a slightly smaller piston range. Some features of this curve differ from the one obtained with the reference point source: the slope is not exactly unity, although this would not be a major problem in closed loop, and the sort of smooth wraparound that occurs around +λe/2 is somewhat surprising; it is currently interpreted as a consequence of the spectral bandwidth. Figure 31 shows the repeatability obtained on the piston measurement with the extended object. The standard deviation of the estimated piston is, as expected, dominated by detector noise at low fluxes and then inversely proportional to the square root of the number of photons
FIGURE 30. Piston measured at high photon level on the first subaperture, as a function of the piston effectively introduced by the DM.
FIGURE 31. Repeatability obtained on the measurement of the piston on the first subaperture with the extended object, as a function of the average photon level per pixel.
per pixel (photon-noise regime). It is, for instance, below 1 nm as soon as the average flux is above 1000 photo-electrons per pixel.

C. Conclusion

Both the literature cited in this section and the quantitative results presented in it testify that phase diversity can be successfully used as a CS on segmented-aperture telescopes and on interferometers, for point sources as well as for extended objects. Some challenges remain to be met before this use becomes widespread. They are essentially the same as for single-aperture telescope wave-front sensing: the ability to sense large-amplitude aberrations, and the reduction of the computing cost for real-time applications, be it the compensation of atmospheric turbulence for ground-based instruments or the compensation of environmental perturbations for space-based instruments.
REFERENCES

Acton, D.S., Soltau, D., Schmidt, W. (1996). Full-field wavefront measurements with phase diversity. Astron. Astrophys. 309, 661–672.
Baba, N., Mutoh, K. (1994). Iterative reconstruction method in phase-diversity imaging. Appl. Opt. 33 (20), 4428–4433. Baba, N., Mutoh, K. (2001). Measurement of telescope aberrations through atmospheric turbulence by use of phase diversity. Appl. Opt. 40 (4), 544–552. Baron, F. (2005). Définition et test d'un capteur de cophasage sur télescope multipupilles: application à la détection d'exoplanètes et à l'observation de la Terre. PhD thesis, Ecole Doctorale d'Astronomie et d'Astrophysique d'Ile de France. Bauschke, H.H., Combettes, P.L., Luke, D.R. (2002). Phase retrieval, Gerchberg–Saxton algorithm, and Fienup variants: A view from convex optimization. J. Opt. Soc. Am. A 19 (7), 1334–1345. Blanc, A. (2002). Identification de réponse impulsionnelle et restauration d'images: apports de la diversité de phase. PhD thesis, Université Paris XI Orsay, July. Blanc, A., Fusco, T., Hartung, M., Mugnier, L.M., Rousset, G. (2003a). Calibration of NAOS and CONICA static aberrations. Application of the phase diversity technique. Astron. Astrophys. 399, 373–383. Blanc, A., Mugnier, L.M., Idier, J. (2003b). Marginal estimation of aberrations and image restoration by use of phase diversity. J. Opt. Soc. Am. A 20 (6), 1035–1045. Born, M., Wolf, E. (1993). Principles of Optics. Pergamon Press. Sixth (corrected) edition. Bucci, O.M., Capozzoli, A., D'Elia, G. (1999). Regularizing strategy for image restoration and wave-front sensing by phase diversity. J. Opt. Soc. Am. A 16 (7), 1759–1768. Byrne, C.L. (1997). Convergent block-iterative algorithms for image reconstruction from inconsistent data. IEEE Trans. Image Processing 6 (9), 1296–1304. Carrara, D.A., Thelen, B.J., Paxman, R.G. (2000). Aberration correction of segmented-aperture telescopes by using phase diversity. In: Fiddy, M.A., Millane, R.P. (Eds.), Image Reconstruction from Incomplete Data, vol. 4123. Soc. Photo-Opt. Instrum. Eng., pp. 56–63.
Cassaing, F., Baron, F., Schmidt, E., Hofer, S., Mugnier, L.M., Barillot, M., Rousset, G., Stuffler, T., Salvadé, Y. (2003). DARWIN Fringe Sensor (DWARF): Concept study. In: Towards Other Earths, vol. SP-539, Conference date: April 2003, ESA, pp. 389–392. Censor, Y., Eggermont, P.P.B., Gordon, D. (1983). Strong underrelaxation in Kaczmarz's method for inconsistent systems. Numerische Mathematik 41, 83–92. Champagnat, F., Idier, J. (1995). An alternative to standard maximum likelihood for Gaussian mixtures. In: ICASSP, pp. 2020–2023.
Conan, J.-M. (1994) Étude de la correction partielle en optique adaptative. PhD thesis, Université Paris XI Orsay, October. Conan, J.M., Madec, P.Y., Rousset, G. (1994). Image formation in adaptive optics partial correction. In: Merkle, F. (Ed.), Active and Adaptive Optics, ESO Conference and Workshop Proceedings, vol. 48, Garching bei München Germany, ESO/ICO, pp. 181–186. Conan, J.-M., Mugnier, L.M., Fusco, T., Michau, V., Rousset, G. (1998). Myopic deconvolution of adaptive optics images using object and point spread function power spectra. Appl. Opt. 37 (21), 4614–4622. Conan, J.-M., Rousset, G., Madec, P.-Y. (1995). Wave-front temporal spectra in high-resolution imaging through turbulence. J. Opt. Soc. Am. A 12 (12), 1559–1570. De Carvalho, E., Slock, D. (1997). Maximum-likelihood blind FIR multichannel estimation with Gaussian prior for the symbols. In: ICASSP, pp. 3593–3596. Ellerbroek, B.L., Thelen, B.J., Lee, D.J., Paxman, R.G. (1997). Comparison of Shack–Hartmann wavefront sensing and phase-diverse phase retrieval. In: Tyson, R.K., Fugate, R.Q. (Eds.), Adaptive Optics and Applications, vol. 3126. Soc. Photo-Opt. Instrum. Eng., pp. 307–320. Fienup, J.R. (1982). Phase retrieval algorithms: A comparison. Appl. Opt. 21 (15), 2758–2769. Fienup, J.R., Marron, J.C., Schulz, T.J., Seldin, J.H. (1993). Hubble space telescope characterized by using phase-retrieval algorithms. Appl. Opt. 32 (10), 1747–1767. Fontanella, J.-C. (1985). Analyse de surface d’onde, déconvolution et optique active. J. Optics (Paris) 16 (6), 257–268. Fried, D.L. (1965). Statistics of a geometric representation of wavefront distortion. J. Opt. Soc. Am. 55 (11), 1427–1435. Fried, D.L. (1966a). Limiting resolution looking down through the atmosphere. J. Opt. Soc. Am. 56 (10), 1380–1384. Fried, D.L. (1966b). Optical resolution through a randomly inhomogeneous medium for very long and very short exposures. J. Opt. Soc. Am. 56, 1372– 1379. Gantmacher, F.R. (1966). 
L’algorithme de Gauss et quelques-unes de ses applications. In: Théorie des matrices, Tome I. Dunod, pp. 42–50. Chapter II. Gates, E.L., Restaino, S.R., Carreras, R.A., Dymale, R.C., Loos, G.C. (1994). Phase diversity as an on-line wavefront sensor: Experimental results. In: Schulz, T.J., Snyder, D.L. (Eds.), Image Reconstruction and Restoration, vol. 2302. Soc. Photo-Opt. Instrum. Eng., pp. 330–339. Gerchberg, R.W. (1974). Super-resolution through error energy reduction. Opt. Acta 21, 709–720.
Gerchberg, R.W., Saxton, W.O. (1972). A practical algorithm for the determination of phase from image and diffraction plane pictures. Optik 35, 237–246. Ghiglia, D.C., Pritt, M.D. (1998). Two-Dimensional Phase Unwrapping: Theory, Algorithms, and Software. Wiley-Interscience. Gonsalves, R.A. (1976). Phase retrieval from modulus data. J. Opt. Soc. Am. 66 (9), 961–964. Gonsalves, R.A. (1982). Phase retrieval and diversity in adaptive optics. Opt. Eng. 21 (5), 829–832. Gonsalves, R.A. (1994). Nonisoplanatic imaging by phase diversity. Opt. Lett. 19 (7), 493–495. Gonsalves, R.A. (1997). Compensation of scintillation with a phase-only adaptive optic. Opt. Lett. 22, 588–590. Goodman, J.W. (1968). Introduction to Fourier Optics. McGraw-Hill. Goussard, Y., Demoment, G., Idier, J. (1990). A new algorithm for iterative deconvolution of sparse spike. In: ICASSP, pp. 1547–1550. Hartung, M., Blanc, A., Fusco, T., Lacombe, F., Mugnier, L.M., Rousset, G., Lenzen, R. (2003). Calibration of NAOS and CONICA static aberrations. Experimental results. Astron. Astrophys. 399, 385–394. Hunt, B.R. (1973). The application of constrained least squares estimation to image restoration by digital computer. IEEE Trans. Comp. C-22 (9), 805– 812. Idier, J., Mugnier, L., Blanc, A. (2005). Statistical behavior of joint least square estimation in the phase diversity context. IEEE Trans. Image Processing, in press. Jefferies, S.M., Lloyd-Hart, M., Keith Hege, E., Georges, J. (2002). Sensing wave-front amplitude and phase with phase diversity. Appl. Opt. 41 (11), 2095–2102. Kattnig, A.P., Primot, J. (1997). Model of the second-order statistic of the radiance field of natural scenes, adapted to system conceiving. In: Park, S.K., Juday, R.D. (Eds.), Visual Information Processing VI, vol. 3074. Soc. Photo-Opt. Instrum. Eng., pp. 132–141. Kendrick, R.L., Acton, D.S., Duncan, A.L. (1994a). Experimental results from the Lockheed phase diversity test facility. In: Schulz, T.J., Snyder, D.L. 
(Eds.), Image Reconstruction and Restoration, vol. 2302. Soc. Photo-Opt. Instrum. Eng., pp. 312–322. Kendrick, R.L., Acton, D.S., Duncan, A.L. (1994b). Phase-diversity wavefront sensor for imaging systems. Appl. Opt. 33 (27), 6533–6546. Kendrick, R.L., Bell, R., Duncan, A.L. (1998). Closed loop wave front correction using phase diversity. In: Bely, P.Y., Breckinridge, J.B. (Eds.), Space Telescopes and Instruments V, vol. 3356. Soc. Photo-Opt. Instrum. Eng., pp. 844–853.
Labeyrie, A. (1970). Attainment of diffraction-limited resolution in large telescopes by Fourier analysing speckle patterns. Astron. Astrophys. 6, 85–87. Lee, D.J., Roggemann, M.C., Welsh, B.M. (1999). Cramer–Rao analysis of phase-diverse wave-front sensing. J. Opt. Soc. Am. A 16 (5), 1005–1015. Lee, D.J., Roggemann, M.C., Welsh, B.M., Crosby, E.R. (1997a). Evaluation of least-squares phase-diversity technique for space telescope wave-front sensing. Appl. Opt. 36, 9186–9197. Lee, D.J., Welsh, B.M., Roggemann, M.C. (1997b). Diagnosing unknown aberrations in an adaptive optics system by use of phase diversity. Opt. Lett. 22 (13), 952–954. Lee, L.H., Vasudevan, G., Smith, E.H. (2003). Point-by-point approach to phase-diverse phase retrieval. In: Mather, J.C. (Ed.), IR Space Telescopes and Instruments, vol. 4850. Soc. Photo-Opt. Instrum. Eng., pp. 441–452. Lehmann, E. (1983). Theory of Point Estimation. Wiley. Lenzen, R., Hofmann, R., Bizenberger, P., Tusche, A. (1998). CONICA: The high-resolution near-infrared camera for the ESO VLT. In: Fowler, A.M. (Ed.), Infrared Astronomical Instrumentation, vol. 3354. Soc. Photo-Opt. Instrum. Eng., pp. 606–614. Little, R.J.A., Rubin, D.B. (1983). On jointly estimating parameters and missing data by maximizing the complete-data likelihood. The American Statistician 37 (3), 218–220. Löfdahl, M.G., Kendrick, R.L., Harwit, A., Mitchell, K.E., Duncan, A.L., Seldin, J.H., Paxman, R.G., Acton, D.S. (1998b). Phase diversity experiment to measure piston misalignment on the segmented primary mirror of the Keck II telescope. In: Bely, P.Y., Breckinridge, J.B. (Eds.), Space Telescopes and Instruments V, vol. 3356. Soc. Photo-Opt. Instrum. Eng., pp. 1190–1201. Löfdahl, M.G., Scharmer, G.B. (1994). Wavefront sensing and image restoration from focused and defocused solar images. Astron. Astrophys. 107, 243–264. Löfdahl, M.G., Scharmer, G.B. (2000). A predictor approach to closed-loop phase-diversity wavefront sensing.
In: Breckinridge, J.B., Jakobsen, P. (Eds.), UV, Optical and IR Space Telescopes and Instruments, vol. 4013. Soc. Photo-Opt. Instrum. Eng., pp. 737–748. Löfdahl, M.G., Scharmer, G.B. (2002). Phase diverse speckle inversion applied to data from the Swedish 1-meter solar telescope. In: Keil, Avakyan (Eds.), Innovative Telescopes and Instrumentation for Solar Astrophysics, vol. 4853. Soc. Photo-Opt. Instrum. Eng. Löfdahl, M.G., Scharmer, G.B., Wei, W. (2000). Calibration of a deformable mirror and Strehl ratio measurements by use of phase diversity. Appl. Opt. 39 (1), 94–103.
Löfdahl, M.G., Duncan, A.L., Scharmer, G.B. (1998a). Fast phase diversity wavefront sensor for mirror control. In: Bonaccini, D., Tyson, R.K. (Eds.), Adaptive Optical System Technologies, vol. 3353. Soc. Photo-Opt. Instrum. Eng., pp. 952–963. Luke, D.R., Burke, J.V., Lyon, R.G. (2000). Fast algorithms for phase diversity and phase retrieval. In: Lyon R.G. (Ed.), Proceedings of the Workshop on Computational Optics and Imaging for Space Applications, NASA/GSFC, May, pp. 130–150. Meynadier, L., Michau, V., Velluet, M.-T., Conan, J.-M., Mugnier, L.M., Rousset, G. (1999). Noise propagation in wave-front sensing with phase diversity. Appl. Opt. 38 (23), 4967–4979. Misell, D.L. (1973). An examination of an iterative method for the solution of the phase problem in optics and electron optics: I. Test calculations. J. Phys. D: Appl. Phys. 6, 2200–2216. Mocoeur, I., Cassaing, F., Baron, F., Mugnier, L.M., Rousset, G., Sorrente, B., Blanc, A. (2005). Multi-telescope interferometer cophasing for astronomy. In: Semaine de l'astrophysique Française. EDP Sciences. Moré, J.J., Thuente, D.J. (1994). Line search algorithms with guaranteed sufficient decrease. ACM Transactions on Mathematical Software 20, 286–307. Mugnier, L., Cassaing, F., Sorrente, B., Baron, F., Velluet, M.-T., Michau, V., Rousset, G. (2004). Multiple-aperture optical telescopes: Some key issues for Earth observation from a GEO orbit. In: 5th International Conference on Space Optics, vol. SP-554, Toulouse, France, CNES/ESA, ESA, pp. 181–187. Mugnier, L.M., Robert, C., Conan, J.-M., Michau, V., Salem, S. (2001). Myopic deconvolution from wavefront sensing. J. Opt. Soc. Am. A 18, 862–872. Nocedal, J., Wright, S.J. (1999). Numerical Optimization. Springer Texts in Operations Research. Springer-Verlag, New York. Noll, R.J. (1976). Zernike polynomials and atmospheric turbulence. J. Opt. Soc. Am. 66 (3), 207–211. Papoulis, A. (1975). A new algorithm in spectral analysis and band-limited extrapolation. IEEE Trans.
Circuits Syst. CAS-22 (9), 735–742. Paxman, R.G., Crippen, S.L. (1990). Aberration correction for phased-array telescopes using phase diversity. In: Gmitro, A.F., Idell, P.S., LaHaie, I.J. (Eds.), Digital Image Synthesis and Inverse Optics, vol. 1351. Soc. Photo-Opt. Instrum. Eng., pp. 787–797. Paxman, R.G., Fienup, J.R. (1988). Optical misalignment sensing and image reconstruction using phase diversity. J. Opt. Soc. Am. A 5 (6), 914–923. Paxman, R.G., Schulz, T.J., Fienup, J.R. (1992). Joint estimation of object and aberrations by using phase diversity. J. Opt. Soc. Am. A 9 (7), 1072–1085.
Paxman, R.G., Seldin, J.H., Löfdahl, M.G., Scharmer, G.B., Keller, C.U. (1996). Evaluation of phase-diversity techniques for solar-image restoration. Astrophys. J. Paxman, R.G., Thelen, B.J., Carrara, D.A., Seldin, J.H., Gleichman, K.W. (1998). Myopic deblurring of space-variant blur by using phase-diverse speckle. IEEE Trans. Image Processing. Paxman, R.G., Thelen, B.J., Seldin, J.H. (1994). Phase-diversity correction of turbulence-induced space-variant blur. Opt. Lett. 19 (16), 1231–1233. Prasad, S. (2004). Information-optimized phase diversity speckle imaging. Opt. Lett. 29 (6), 563–565. Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P. (1992). Numerical Recipes in C, the Art of Scientific Computing, 2nd ed. Cambridge University Press, New York. Primot, J., Rousset, G., Fontanella, J.-C. (1988). Image deconvolution from wavefront sensing: Atmospheric turbulence simulation cell results. In: Ulrich, M.-H. (Ed.), Very Large Telescopes and their Instrumentation, vol. II, ESO Conference and Workshop Proceedings, vol. 30, Garching bei München, Germany, ESO, pp. 683–692. Redding, D., et al. (1998). Wavefront sensing and control for a Next Generation Space Telescope. In: Bely, P.Y., Breckinridge, J.B. (Eds.), Space Telescopes and Instruments V, vol. 3356 (2). Soc. Photo-Opt. Instrum. Eng., pp. 758–772. Roddier, C., Roddier, F. (1991). Reconstruction of the Hubble Space Telescope mirror figure from out-of-focus stellar images. In: Bely, P.Y., Breckinridge, J.B. (Eds.), Space Astronomical Telescopes and Instruments, vol. 1494. Soc. Photo-Opt. Instrum. Eng., pp. 78–84. Roddier, C., Roddier, F. (1993). Combined approach to the Hubble space telescope wave-front distortion analysis. Appl. Opt. 32 (16), 2992–3008. Roddier, F. (1981). The effects of atmospheric turbulence in optical astronomy. In: Wolf, E. (Ed.), Progress in Optics, vol. XIX. North-Holland, Amsterdam, pp. 281–376. Roddier, F. (1988).
Curvature sensing and compensation: A new concept in adaptive optics. Appl. Opt. 27 (7), 1223–1225. Roddier, F. (Ed.) (1999). Adaptive Optics in Astronomy. Cambridge University Press, Cambridge. Roddier, F., Gilli, J.M., Lund, G. (1982). On the origin of speckle boiling and its effects in stellar speckle interferometry. J. of Optics (Paris) 13 (5), 263–271. Roddier, N. (1990). Atmospheric wavefront simulation using Zernike polynomials. Opt. Eng. 29 (10), 1174–1180. Roggemann, M.C. (1991). Limited degree-of-freedom adaptive optics and image reconstruction. Appl. Opt. 30 (29), 4227–4233.
Rousset, G. (1999). Wave-front sensors. In: Roddier (1999), Chapter 5, pp. 91–130. Rousset, G., Lacombe, F., Puget, P., Hubin, N., Gendron, E., Conan, J.-M., Kern, P., Madec, P.-Y., Rabaud, D., Mouillet, D., Lagrange, A.-M., Rigaut, F. (1998). Design of the Nasmyth Adaptive Optics System (NAOS) of the VLT. In: Bonaccini, D., Tyson, R.K. (Eds.), Astronomical Telescopes & Instrumentation, vol. 3353. Soc. Photo-Opt. Instrum. Eng. Scharmer, G.B. (1999). Object-independent fast phase-diversity. In: Rimmele, T.R., Balasubramaniam, K.S., Radick, R.R. (Eds.), High Resolution Solar Physics: Theory, Observations and Techniques, Astron. Soc. Pacific Conf. Series, vol. 183, pp. 330–341. Schulz, T.J. (1993). Multiframe blind deconvolution of astronomical images. J. Opt. Soc. Am. A 10 (5), 1064–1073. Seldin, J.H., Paxman, R.G. (1994). Phase-diverse speckle reconstruction of solar data. In: Schulz, T.J., Snyder, D.L. (Eds.), Image Reconstruction and Restoration, vol. 2302. Soc. Photo-Opt. Instrum. Eng., pp. 268–280. Seldin, J.H., Paxman, R.G. (2000). Closed-loop wavefront sensing for a sparse-aperture, phased-array telescope using broadband phase diversity. In: Breckinridge, J.B., Carreras, R.A., Czyzak, S.R., Eckart, M.J., Fiete, R.D., Idell, P.S. (Eds.), Imaging Technology and Telescopes, vol. 4091. Soc. Photo-Opt. Instrum. Eng., pp. 48–63. Seldin, J.H., Paxman, R.G., Ellerbroek, B.L. (1996a). Post-detection correction of compensated imagery using phase-diverse speckle. In: Cullum, M. (Ed.), Proceedings of the ESO/OSA Topical Meeting on Adaptive Optics, No. 54, ESO Conference and Workshop Proceedings, ESO, pp. 471–476. Seldin, J.H., Paxman, R.G., Ellerbroek, B.L., Johnston, D.C. (1996b). Phase-diverse speckle restorations of artificial satellites imaged with adaptive-optics compensation. In: Adaptive Optics, No. 13, OSA. Seldin, J.H., Reiley, M.F., Paxman, R.G., Stribling, B.E., Ellerbroek, B.L., Johnston, D.C. (1997). Space-object identification using phase-diverse speckle.
In: Schulz, T.J. (Ed.), Image Reconstruction and Restoration II, vol. 3170. Soc. Photo-Opt. Instrum. Eng., pp. 2–15. Shack, R.V., Platt, B.C. (1971). Production and use of a lenticular Hartmann screen (abstract). J. Opt. Soc. Am. 61, 656. Sorrente, B., Cassaing, F., Baron, F., Coudrain, C., Fleury, B., Mendez, F., Michau, V., Mugnier, L., Rousset, G., Rousset-Rouvière, L., Velluet, M.-T. (2004). Multiple-aperture optical telescopes: Cophasing sensor testbed. In: 5th International Conference on Space Optics, vol. SP-554, Toulouse, France, CNES/ESA, ESA, pp. 479–484. Strand, O.N. (1974). Theory and methods related to the singular-function expansion and Landweber's iteration for integral equations of the first kind. SIAM J. Numer. Anal. 11 (4), 798–815.
76
MUGNIER ET AL .
Thelen, B.J., Carrara, D.A., Paxman, R.G. (1999a). Fine-resolution imagery of extended objects observed through volume turbulence using phasediverse speckle. In: Roggemann, M.C., Bissonnette, L.R. (Eds.), Propagation and Imaging through the Atmosphere II, vol. 3763. Soc. Photo-Opt. Instrum. Eng., pp. 102–111. Thelen, B.J., Carrara, D.A., Paxman, R.G. (2000). Pre- and post-detection correction of turbulence-induced space-variant blur. In: Roggemann, M.C. (Ed.), Propagation and Imaging through the Atmosphere IV, vol. 4125. Soc. Photo-Opt. Instrum. Eng. Thelen, B.J., Paxman, R.G., Carrara, D.A., Seldin, J.H. (1999b). Maximum a posteriori estimation of fixed aberrations, dynamic aberrations, and the object from phase-diverse speckle data. J. Opt. Soc. Am. A 16 (5), 1016– 1025. Thiébaut, E., Conan, J.-M. (1995). Strict a priori constraints for maximumlikelihood blind deconvolution. J. Opt. Soc. Am. A 12 (3), 485–492. Vogel, C.R. (2000). A limited memory BFGS method for an inverse problem in atmospheric imaging. In: Hansen, P.C., Jacobsen, B.H., Mosegaard, K. (Eds.), Methods and Applications of Inversion. In: Lecture Notes in Earth Sciences. Springer-Verlag, pp. 292–304. Vogel, C.R., Chan, T., Plemmons, R. (1998). Fast algorithms for phasediversity-based blind deconvolution. In: Bonaccini, D., Tyson, R.K. (Eds.), Adaptive Optical System Technologies, vol. 3353. Soc. Photo-Opt. Instrum. Eng., pp. 994–1005. Von der Lühe, O. (1993). Speckle imaging of solar small scale structure. I—Methods. Astron. Astrophys. 268 (1), 374–390. Zarifis, V., et al. (1999). The multi aperture imaging array. In: Unwin, S., Stachnik, R. (Eds.), Working on the Fringe: Optical and IR Interferometry from Ground and Space, Astron. Soc. Pacific Conf. Series, vol. 194, pp. 278–285.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 141
Solving Problems with Incomplete Information: A Grey Systems Approach

YI LIN(a) AND SIFENG LIU(b)

(a) Department of Mathematics, Slippery Rock University, Slippery Rock, Pennsylvania 16057, USA
(b) College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, PR China
I. Problems with Uncertainty  78
   A. Stochastic Uncertainty  79
   B. Grey Uncertainty  80
   C. Uncertainty  80
   D. Fuzzy Uncertainty  81
   E. Rough Uncertainty  82
   F. Soros Reflexive Uncertainty  83
   G. Appearance of Grey Systems Research  83
II. The Fundamentals  84
   A. Grey Numbers and Their Whitenizations  84
      1. Grey Numbers with Only Lower Limits  84
      2. Grey Numbers with Only Upper Limits  85
      3. Interval Grey Numbers  85
      4. Continuous Grey Numbers and Discrete Grey Numbers  85
      5. Black and White Numbers  85
      6. Essential Grey Numbers and Nonessential Grey Numbers  85
   B. Arithmetic of Interval Grey Numbers  87
   C. Degree and Information Content of Greyness  87
III. Methods for Sequences with Abnormal Behaviors  92
   A. Sequences with Missing Entries  92
   B. Sequences under Influence of Shock Waves  94
   C. Calculus Generalized to Time Series  97
IV. Incidence Analysis  101
   A. Several Well-Employed Sequence Operators  102
   B. Degrees of Grey Incidences  104
   C. Analysis of Preferences  110
V. Clustering and Evaluations  114
   A. Two Practical Situations  115
   B. Methods of Clustering  117
   C. Grey Statistics  123
VI. Law of Exponentiality and Predictions  130
   A. Model GM(1, 1)  130
   B. GM(1, N) and GM(0, N)  132

ISSN 1076-5670/05 DOI: 10.1016/S1076-5670(05)41002-2
Copyright 2006, Elsevier Inc. All rights reserved.
   C. Time Series Predictions  135
      1. Interval Predictions  135
      2. Disaster Predictions  136
      3. Seasonal Disaster Predictions  137
      4. Stock Market-Like Predictions  138
      5. Systems Predictions  139
   D. A Test of Applications  141
VII. Decision-Making Based on Incomplete Information  143
   A. Decision-Making with Uncertain Targets  145
   B. Decision-Making Employing Incidence Analysis  147
   C. Decisions Based on Predicted Future  148
   D. Collective Decision-Making  149
   E. A Test of Applications  150
VIII. Programmings with Uncertain Parameters  152
   A. Linear Models  153
   B. Properties of Solutions of Grey Linear Models  155
   C. Assignment Problems of Grey Prediction Type  157
   D. Nonlinear Programming  159
IX. Control of Not Completely Known Systems  161
   A. Linear Control Systems  162
   B. Transfer Functions  163
   C. Other Kinds of Controls  166
   D. A Test of Applications  167
References  169
I. PROBLEMS WITH UNCERTAINTY

To appreciate the place of grey systems research in the river of scientific history, this chapter classifies the different types of uncertainty in a more or less uniform fashion. By the end, it becomes natural to acknowledge clearly the significance of the research known as grey systems theory. For more related discussion, please see Ackoff (1973), Bertalanffy (1968), Checkland (1981), and Klir (1991). We acknowledge that Narendra Patel and Adnan Mahmood were part of the team that initially worked on the classification of uncertainties presented in this chapter.

To begin, how can the concept of information (Lin, 2001b; Liu et al., 2001) be seen in a general fashion? A piece of tidings is the totality of a special form of objective motions; it is an objective entity that reduces humans' level of ignorance. For example, the statement "it will snow today" is a piece of tidings: it improves our outlook on the weather conditions of the day. Let A be a piece of tidings and Ā its opposite. Then A ∪ Ā carries no tidings, since it contains the universal description formed by the tidings A together with its opposite Ā. For example, let A = "the stock market will go up." Then Ā = "the market will go down or remain the same," and A ∪ Ā = "the stock market will go in some direction." At
A GREY SYSTEMS APPROACH
79
the same time, the combined tidings A ∩ Ā = "the market will not go in any direction."

In this chapter, lowercase letters x, y, z, . . . are used for unknowns, which may be variables or statements, and A represents a piece of tidings. The notation AΔx means that the tidings A can make people know the value of the unknown x; otherwise, A∇x is used to mean that A is a piece of tidings unrelated to x. For example, consider the two pieces of tidings A = "everyone has gone to watch a movie in the theatre" and B = "the Dow Jones Industrial Average has gone down 400 points," and the unknown x = "where is Joe?" Then AΔx, since A provides an answer to x, even though it is not known whether the answer is true or false; based on A, it is known that Joe went to the theatre. At the same time, B∇x, since B does not provide any answer to x.

Assume x is an unknown, U a piece of tidings, and S a set of Cantor type. If U makes people realize that x ∈ S, then U is called a piece of x-position tidings. Each piece of tidings A ⊆ U is called a piece of information regarding the position tidings U (or just information for short). The totality of all pieces of information of U is called an informational hierarchy. Each so-called informational uncertainty stands for an uncertainty related to information or to the quality of information. As indicated by Soros's reflexivity theory, in a true historic process, all the information involved in the formation of predictions about the future can be very certain and definite. However, it is exactly the certainty and definiteness of the information, and the accuracy and preciseness of the predictions, that make the future different and more uncertain. So, informational uncertainties are different from practical uncertainties.
Based on published studies, the following types of uncertainty exist: stochastic uncertainty, grey uncertainty, unascertained uncertainty, fuzzy uncertainty, rough uncertainty, Soros reflexive uncertainty, and blind uncertainty.

A. Stochastic Uncertainty

Let x be an unknown, S a nonempty set, U = "x belongs to S," and A = "x belongs to S; the possibility for x = e ∈ S is α_e, where 0 ≤ α_e ≤ 1 and ∑_{e∈S} α_e = 1." In this case, A is called a piece of stochastic information. When a piece of stochastic information is given, the consequent uncertainty is called stochastic uncertainty. Such uncertainty arises because the piece of stochastic information A can only spell out how likely the unknown x is to equal a particular element e ∈ S. The probability α_e can be very close to, or even equal to, 1; however, a large probability does not guarantee that x = e will definitely be true.
Example 1.1. In the business of commodity trading, based on historical price data, the market for the Standard & Poor's (S&P) 500 has a 90% chance of going up on a certain Thursday. Therefore, some traders will buy into S&P 500 futures contracts on Thursday and sell out on a calculated day, which may be the following Monday or Tuesday. However, the 90% possibility of a rising S&P 500 futures market does not guarantee that, when we buy on Thursday, the market will go up as expected.

Example 1.2. The current commercial weather forecasting business tends to provide services as follows: the chance of snow for tomorrow is 70%. If it does snow the next day, the weather forecasting service is correct, since it said it would snow. On the other hand, if it does not snow the next day, the service is still correct, since it only stated a 70% likelihood of forthcoming snow. Now, if the figure 70% is replaced by 100%, the same can still be said about the service, since it only stated that the chance of snow was 100%, which was not a guarantee.

B. Grey Uncertainty

Suppose A is a piece of grey information (Liu, 1995b), defined as follows. Let x be an unknown, S ≠ ∅ a set, S′ a subset of S, U = "x belongs to S," and A = "x belongs to S′." Then the so-called grey uncertainty stands for the uncertainty about which specific value the unknown x should take. For example, suppose it is given that U = "x belongs to S," where S = R is the set of all real numbers, S′ is the interval [2, 3], and A = "x belongs to S′." Then the piece of grey information A brings about the following uncertainty: we know that x is a number between 2 and 3 inclusive, but we do not know which value x really assumes.

Example 1.3. In the negotiation process of buying a car, the buyer knows he or she will pay no more than $30,000. If x stands for the final negotiated price of the car that he or she likes, then x is a number between $0 and $30,000.
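The difference between stochastic and grey information can be made concrete in code. The following is an illustrative sketch under our own naming (`stochastic_info`, `grey_info`, and `consistent_with_grey` are ours, not the authors'): stochastic information attaches a probability to each outcome in S, while grey information only narrows the unknown down to a range.

```python
import random

# Stochastic information (Example 1.1): a probability for each outcome.
# A 90% chance of an "up" market still permits "down" days.
stochastic_info = {"up": 0.9, "down": 0.1}
assert abs(sum(stochastic_info.values()) - 1.0) < 1e-9  # probabilities sum to 1

random.seed(0)
# Simulate 1000 Thursdays: the likely outcome dominates but is not guaranteed.
outcomes = random.choices(list(stochastic_info),
                          weights=list(stochastic_info.values()), k=1000)
down_days = outcomes.count("down")

# Grey information (Example 1.3): only a range for the unknown is given.
# The final car price x lies in [0, 30000]; no distribution over it is known.
grey_info = (0.0, 30000.0)

def consistent_with_grey(x, grey):
    """A candidate value is admissible iff it falls inside the known range."""
    lo, hi = grey
    return lo <= x <= hi
```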
Here, the grey uncertainty is the uncertainty about the final purchase price.

C. Uncertainty

If, in the definition of a piece of stochastic information A, we replace the condition that

∑_{e∈S} α_e = 1

by

∑_{e∈S} α_e ≤ 1,

then A is called a piece of unascertained information (Wang, 1990). The main difference between stochastic and unascertained information is that the former concept is developed on the assumption that all possible outcomes of an experiment are known, whereas for unascertained information we assume that only some possible outcomes of the experiment are known to the researcher.

Example 1.4. A group of researchers has a scheduled meeting at 11:30 AM Thursday. However, by around 11:45 AM, Genti, a key member of the group, has not shown up. So the rest of the group needs to decide where to find Genti for their urgent business decision-making. The group faces two possible situations.

Situation 1. The group knows Genti very well, so the members come up with a definite list of the possible places where Genti could be at the moment. Because they know Genti so well, they can also attach a probability to each place on the list. So, to locate Genti, they only need to check these places in order, from the largest probability to the smallest. This is an example of stochastic uncertainty.

Situation 2. No member of the group knows Genti well enough to draw up a list of all the possible places, with the relevant probabilities, where Genti could be at this very moment. This is an example of unascertained uncertainty.

The second situation shows that the whereabouts of Genti at that moment were certain, because as a living being he must be at some place. However, the decision-makers did not know the true state of Genti, or the relevant information Genti used in deciding where to go at that moment. That is, the concept of unascertained uncertainty deals with the situation in which, regardless of whether an objective event is definite or not, and whether it has already occurred or not, it remains "unascertained" as long as the decision-maker does not completely understand the essential information.

D. Fuzzy Uncertainty

A piece A of tidings is called a piece of fuzzy information if A satisfies: x is an unknown, S a nonempty set, the position tidings U = "x belongs to S," and A = "x belongs to S, and the degree of membership for x = e ∈ S is α_e, 0 ≤ α_e ≤ 1."
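A degree of membership can be represented as a function from the universe into [0, 1]. The following is a minimal sketch under our own naming (`membership`, `degree`, and `crisp_cut` are ours): unlike probabilities, membership degrees carry no summation constraint, and choosing a threshold above which an element "belongs" to S is exactly where the fuzzy uncertainty lives. The threshold construction is the standard alpha-cut from fuzzy set theory, not a device introduced by the authors here.

```python
# Degrees of membership in a set S, one value per element of the universe.
# Unlike a probability distribution, these values need not sum to 1.
membership = {"alice": 1.0, "bob": 0.6, "carol": 0.0}

def degree(e):
    """alpha_e: the degree of membership of e in S (0 for unseen elements)."""
    return membership.get(e, 0.0)

def crisp_cut(alpha_level):
    """Alpha-cut: the crisp set of elements whose degree reaches alpha_level.
    Any choice of threshold is a judgment call; a degree strictly between
    0 and 1 does not by itself settle whether the element belongs to S."""
    return {e for e, a in membership.items() if a >= alpha_level}
```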
Example 1.5. Jacklin Ruscitto is an official member of many committees. Due to the nature of these committees, Jacklin does not have enough time to be 100% involved in all of the committee work and the relevant decision-making. Consider one committee, Committee A. If Jacklin is listed as a member but has never done anything for the committee, then her degree of membership in the committee is very close to zero. If Jacklin is not listed as a member of the committee and has done nothing for it, then her degree of membership in the committee is zero. Even if she is not a listed member of the committee, if she has been involved in activities of the committee, then her degree of membership in the committee should be greater than zero.

Now, the so-called fuzzy uncertainty is this: given a piece of fuzzy information A = "x belongs to S, and the degree of membership for x = e ∈ S is α_e, 0 ≤ α_e ≤ 1," one has no way to decide, for a given variable y, whether y should be considered as belonging to the set S or not, even though it is known that the degree of membership of y in S is α_e. For example, Jacklin is listed as an official member of Committee A and has been involved in all committee activities, so Jacklin's degree of membership in Committee A is 1. The fuzzy uncertainty implies that her degree 1 of membership in Committee A does not guarantee her 100% involvement or membership in Committee A in the future. On the other hand, even though John Opalanko is not a listed member of Committee A, it may very well happen, since Committee A is involved in a special project that looks extremely important in John's eyes, that John gets involved in the project. In that case, John's degree of membership in Committee A should be more than zero, even though his previous degree of membership in Committee A was zero.

E. Rough Uncertainty

Let U be a set of elements. A subset r ⊆ p(U), the power set of U, is called a partition of U if the following conditions hold true:

1. ∪r = ∪{x : x ∈ r} = U, and
2. ∀A, B ∈ r, if A ≠ B, then A ∩ B = ∅.

Let K = (U, R) be a knowledge base over U, where U is the universal set of all objects involved in a study and R a given set of partitions of the set U. A subset X ⊆ U is called exact in K if there exists a P ⊆ R such that X is the union of some elements in P. Otherwise, X is said to be rough in K (Pawlak, 1991).
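The exact/rough distinction for a single partition can be checked mechanically. A minimal sketch under our own naming (`is_exact` is ours): a set is exact with respect to a partition precisely when it is the union of whole blocks, i.e., no block is cut in half by the set.

```python
def is_exact(X, partition):
    """Return True iff X is a union of blocks of the given partition.

    Every element of X must lie in some block fully contained in X;
    a block that straddles the boundary of X makes X rough."""
    X = set(X)
    blocks_inside = [set(b) for b in partition if set(b) <= X]
    covered = set().union(*blocks_inside)
    return covered == X

# A universe and one piece of knowledge (a partition) over it.
U = {1, 2, 3, 4, 5, 6}
P = [{1, 2}, {3, 4}, {5, 6}]

exact_example = {1, 2, 5, 6}   # the union of the blocks {1, 2} and {5, 6}
rough_example = {1, 2, 3}      # cuts the block {3, 4} in half
```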
F. Soros Reflexive Uncertainty

Let x be the unknown path that a true historic process (Soros, 1998) will eventually take, and S the set of all possible outcomes of this historical process. Then a piece of Soros reflexive uncertain information is defined as follows: U = "x ∈ S," and A ⊆ S is a piece of information regarding the position of x in S defined by A = "if it is expected that x = e ∈ S with a degree of credence α_e, 0 ≤ α_e ≤ 1, then x = e ∈ S has a degree of credence 1 − α_e." Now, the uncertainty associated with a piece of Soros reflexive uncertain information is that the more accurate a prediction about a true historical process is, the more uncertain the expected future becomes.

G. Appearance of Grey Systems Research

Faced with systems and problems involving partially known and partially unknown information, Professor Deng Ju-Long published in 1982 the first research paper in the area of grey systems, in the international journal Systems and Control Letters, published by North-Holland Co. In the years since, the theory of grey systems has developed and matured rapidly. It has been widely applied to analysis, modeling, prediction, decision-making, and control, with significant consequences for various kinds of systems, including, but not limited to, social, economic, scientific and technological, agricultural, industrial, transportation, mechanical, petrologic, meteorological, ecological, hydrological, geological, financial, medical, legal, and military systems. Research papers on grey systems have been cited by many scholars around the globe and reviewed by internationally authoritative review periodicals. Currently, more than 80 universities worldwide, located in countries and regions such as Australia, China, Japan, Taiwan, and the United States, offer courses or workshops on grey systems, and hundreds of graduate students are applying the methodology of grey systems in their research and dissertations.
Many international conferences have listed grey systems as a special topic. All of this reflects the fact that grey systems theory, with its strong vitality, has already taken its place in the forest of scientific theories, and that its position as a transfield scientific theory has been well established.

In addition to this introduction, the chapter consists of eight sections. Section II concentrates on the fundamentals of grey systems theory, focusing on such elementary concepts as grey numbers, their arithmetic operations, and the degree and information content of greyness. Section III introduces different methods for handling sequences or time series with abnormal behaviors. Section IV is devoted to the incidence analysis of sequences of data. Section V shows how to use grey systems theory to cluster the variables considered in a study and to evaluate these variables. Section VI focuses on the well-known law of exponentiality and on the various grey systems models used to make predictions. Decision-making is considered in Section VII, where all the situations involve partially known and partially unknown information. In Section VIII, programming with uncertain parameters is studied. The final section considers problems of grey systems control.

Since different authors have used very different sets of symbols and terminology, this presentation uses the symbols and terminology given in Liu and Lin (1999). It is hoped that this presentation will help to increase communication among scholars who are interested in our work. Through such increased dialogue, we will be able to bring this line of study to a different level, with more successful practical applications. We would also like to use this opportunity to thank Peter W. Hawkes for his invitation to contribute to Advances in Imaging and Electron Physics.
II. THE FUNDAMENTALS

In this chapter, we learn about grey numbers, the fundamental building blocks of a grey system. When one deals with a problem involving partially known and partially unknown information, one faces grey numbers and their relationships.

A. Grey Numbers and Their Whitenizations

A grey number (Liu, K., 1982) is a number whose exact value is unknown but for which a range within which the value lies is known. There are several classes of grey numbers:

1. Grey Numbers with Only Lower Limits

Grey numbers with lower limits but no upper limits are denoted as ⊗ ∈ [a, ∞) or ⊗(a), where a represents the lower limit of the grey number ⊗ and is a fixed value. For example, the weight of a living tree is a grey number with a lower limit, since the weight of the tree must be greater than zero; however, the exact value of the weight cannot be obtained through normal means.
2. Grey Numbers with Only Upper Limits

Grey numbers with only upper limits are written as ⊗ ∈ (−∞, ā] or ⊗(ā), where ā stands for the upper limit of the grey number ⊗ and is a fixed number.

3. Interval Grey Numbers

A grey number with both a lower limit a and an upper limit ā is called an interval grey number, denoted as ⊗ ∈ [a, ā].

4. Continuous Grey Numbers and Discrete Grey Numbers

Grey numbers taking on a finite number or a countable number of values in an interval are called discrete grey numbers. Those continuously taking values that cover an interval are continuous grey numbers.

5. Black and White Numbers

When ⊗ ∈ (−∞, ∞) or ⊗ ∈ (⊗1, ⊗2), that is, when ⊗ has neither an upper limit nor a lower limit, or the upper and lower limits are themselves grey numbers, ⊗ is called a black number. When ⊗ ∈ [a, ā] and a = ā, ⊗ is called a white number.

6. Essential Grey Numbers and Nonessential Grey Numbers

An essential grey number is a grey number for which it is impossible, or temporarily not possible, to find a white number to represent it. A nonessential grey number is a grey number ⊗ that can be described by a white number as its "representative," where the white number is determined either from previously known information or through some other means. This white number is called the whitenization (value) of the relevant grey number, denoted as ⊗̃. ⊗(a) will be used to stand for the grey number with a as its whitenization value. For example, suppose we ask somebody to buy a winter coat for about $100. The number 100 can be treated as the whitenization value of the future coat price ⊗(100), denoted as ⊗̃(100) = 100.

If a grey number ⊗(a) vibrates around a base value a, then the base value can be used as the main whitenization value, written

⊗(a) = a + δ_a  or  ⊗(a) ∈ (a − δ_a, a + δ_a),

where δ_a stands for the vibration variable. The whitenization value of this grey number is ⊗̃(a) = a. If the whitenization ⊗̃ of a general interval grey number
[Figure 1. The weight function of whitenization for the content of quick-acting nitrogen in soil.]
⊗ ∈ [a, b] is in the form

⊗̃ = αa + (1 − α)b,  α ∈ [0, 1],

this value is called an equal weight whitenization. In this case, if α = 1/2, it is called the equal weight mean whitenization. When the distribution information of an interval grey number is unknown, the equal weight mean whitenization is often used; when the distribution information of a grey number is known, a nonequal weight whitenization is often used. For example, the contents of the chemical elements nitrogen, phosphorus, and potassium in soil are all grey numbers. To obtain a normal growing soil condition, the content of quick-acting nitrogen should be between 15 and 40 ppm. So, we can use the weight function of whitenization shown in Figure 1 to describe the content of quick-acting nitrogen in the soil of interest. Here, the flat top with weight 1 represents the optimal content of quick-acting nitrogen. The left slope stands for contents of quick-acting nitrogen from 5 to 15 ppm, where the higher the content, the better the effect. The right slope indicates contents from 40 to 60 ppm, where the higher the content, the worse the effect on the production of a certain crop. The curve starts at 5 ppm and ends at 60 ppm, which implies that a content of less than 5 ppm or more than 60 ppm is not allowed for the production of the crop in the area under consideration.

In practical applications, for the convenience of computer programming and calculation, one often uses functions of the following form as the weight function of whitenization for a grey number ⊗ ∈ [x1, x4]:

f(x) = (x − x1)/(x2 − x1),  x ∈ [x1, x2),
f(x) = 1,                   x ∈ [x2, x3],
f(x) = (x4 − x)/(x4 − x3),  x ∈ (x3, x4].   (1)
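The piecewise-linear weight function of Eq. (1) is straightforward to program. The sketch below (the function names are ours) instantiates it with the quick-acting nitrogen values discussed in the text, x1 = 5, x2 = 15, x3 = 40, x4 = 60 ppm; the convention of returning weight 0 outside [x1, x4] is our own choice for the inadmissible region.

```python
def whitenization_weight(x, x1, x2, x3, x4):
    """Piecewise-linear weight function of whitenization, Eq. (1):
    rises on [x1, x2), equals 1 on [x2, x3], falls on (x3, x4]."""
    if x < x1 or x > x4:
        return 0.0              # outside the admissible field (our convention)
    if x < x2:
        return (x - x1) / (x2 - x1)
    if x <= x3:
        return 1.0
    return (x4 - x) / (x4 - x3)

# Quick-acting nitrogen content in soil (ppm): optimal on [15, 40],
# admissible on [5, 60], per the discussion of Figure 1.
def nitrogen_weight(x):
    return whitenization_weight(x, 5, 15, 40, 60)
```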
For the weight function of whitenization in Eq. (1), the quantity

g° = 2|x2 − x3|/(x2 + x3) + max{|x1 − x2|/x2, |x4 − x3|/x3}   (2)

is called the degree of greyness of the grey number ⊗.

B. Arithmetic of Interval Grey Numbers

Given grey numbers ⊗1 ∈ [a, b], a < b, and ⊗2 ∈ [c, d], c < d:

The sum of ⊗1 and ⊗2, written ⊗1 + ⊗2, is defined by ⊗1 + ⊗2 ∈ [a + c, b + d]. The negative inverse of ⊗1, written −⊗1, is defined by −⊗1 ∈ [−b, −a]. The difference of ⊗1 and ⊗2 is defined by ⊗1 − ⊗2 = ⊗1 + (−⊗2) ∈ [a − d, b − c]. Assume that ab > 0. Then the reciprocal of ⊗1, written ⊗1^(−1), is defined by ⊗1^(−1) ∈ [1/b, 1/a]. The product of ⊗1 and ⊗2 is defined by

⊗1 · ⊗2 ∈ [min{ac, ad, bc, bd}, max{ac, ad, bc, bd}].

Assume that ⊗2 satisfies c < d and cd > 0. Then the quotient of ⊗1 divided by ⊗2 is defined by ⊗1/⊗2 = ⊗1 · ⊗2^(−1); that is,

⊗1/⊗2 ∈ [min{a/c, a/d, b/c, b/d}, max{a/c, a/d, b/c, b/d}].

Let k be a positive real number. The scalar multiple of k and ⊗1 is defined by k · ⊗1 ∈ [ka, kb].

Examples can be constructed to show that interval grey numbers cannot, in general, be canceled additively or multiplicatively. More specifically, the difference of any two grey numbers is generally not zero, except in the case that they are identical; the quotient of any two grey numbers is generally not 1, except in the case that they are identical.

Theorem 2.1. The totality of all interval grey numbers constitutes a field.
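The interval arithmetic above can be sketched directly as a small class (the name `IntervalGrey` is ours). Note that applying the formal rule ⊗1 − ⊗2 ∈ [a − d, b − c] to two grey numbers with the same bounds, which need not represent the same underlying unknown, yields [−1, 1] rather than the white number 0; this is the failure of cancellation the text refers to.

```python
class IntervalGrey:
    """Interval grey number ⊗ ∈ [low, high] with the arithmetic defined above."""

    def __init__(self, low, high):
        assert low <= high
        self.low, self.high = low, high

    def __add__(self, other):
        return IntervalGrey(self.low + other.low, self.high + other.high)

    def __neg__(self):
        # Negative inverse: −⊗ ∈ [−high, −low].
        return IntervalGrey(-self.high, -self.low)

    def __sub__(self, other):
        # Difference: ⊗1 + (−⊗2) ∈ [a − d, b − c].
        return self + (-other)

    def __mul__(self, other):
        p = [self.low * other.low, self.low * other.high,
             self.high * other.low, self.high * other.high]
        return IntervalGrey(min(p), max(p))

    def reciprocal(self):
        assert self.low * self.high > 0   # the interval must not straddle 0
        return IntervalGrey(1 / self.high, 1 / self.low)

    def __truediv__(self, other):
        return self * other.reciprocal()

g1 = IntervalGrey(2, 3)
# Same bounds, but the formal difference is [-1, 1], not the white number 0.
diff = g1 - IntervalGrey(2, 3)
```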
For more detailed studies of grey numbers and their properties, please see Liu, L.S. (1987), Peng and Lu (1991), and Wang and Li (1996).

C. Degree and Information Content of Greyness

Assume that a grey number ⊗ satisfies ⊗ ∈ [a, b], a < b. Then ℓ(⊗) = |b − a| is called the length of the information field of ⊗.
1. When the weight function of whitenization of ⊗ is known, ⊗̃ = E(⊗) is called the mean-value whitenization (number) of the grey number ⊗, where E(⊗) stands for the expected value of ⊗ if the grey number is treated as a random variable.

2. When the weight function of whitenization is unknown: (i) if ⊗ is continuous, then ⊗̃ = (a + b)/2 is called the mean-value whitenization (number) of ⊗; (ii) if ⊗ is discrete, with possible values a_i ∈ [a, b] for i = 1, 2, . . . , then

⊗̃ = (1/n) ∑_{i=1}^{n} a_i, if ⊗ has a finite number n of possible values, or
⊗̃ = lim_{n→+∞} (1/n) ∑_{i=1}^{n} a_i, if ⊗ takes a countable number of possible values,

is called the mean-value whitenization (number) of the grey number ⊗. (NOTE: If a_i(⊗) is itself a grey number such that a_i(⊗) ∈ [a_i, b_i] with a_i < b_i, then one can take a_i = â_i(⊗).)

We continue to use g°(⊗) to represent the greyness of a grey number ⊗, as in Eq. (2). The following is an axiomatic system for the concept of greyness of grey numbers.

Axiom 2.1. For any grey number ⊗ ∈ [a, b], a < b, g°(⊗) ≥ 0.

Axiom 2.2. When a = b, that is, when ℓ(⊗) = 0, g°(⊗) = 0; that is, the greyness of the grey number ⊗ is zero.

Axiom 2.3. When either a → −∞ or b → +∞, g°(⊗) → ∞; that is, the greyness approaches ∞.

Axiom 2.4. g°(k⊗) = g°(⊗) for any positive real number k.

Axiom 2.5. g°(⊗) is directly proportional to ℓ(⊗) and inversely proportional to ⊗̃.

With these axioms in place, one can see that the function

g°(⊗) = ℓ(⊗)/|⊗̃|   (3)

can be used to reflect the greyness of the grey number ⊗ ∈ [a, b], a < b, where ℓ(⊗) stands for the length of the information field of ⊗ and ⊗̃ for its mean-value whitenization.
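Eq. (3) is easy to check numerically. A minimal sketch (the function names are ours), using the equal-weight mean whitenization for a continuous interval grey number: a white number has greyness 0 (Axiom 2.2), and greyness is invariant under positive scaling (Axiom 2.4).

```python
def length(low, high):
    """ℓ(⊗) = |high − low|: the length of the information field."""
    return abs(high - low)

def mean_whitenization(low, high):
    """Equal-weight mean whitenization for a continuous interval grey number."""
    return (low + high) / 2

def greyness(low, high):
    """g°(⊗) = ℓ(⊗) / |⊗̃|, Eq. (3)."""
    return length(low, high) / abs(mean_whitenization(low, high))

# A white number (a = b) has greyness 0 (Axiom 2.2) ...
g_white = greyness(4, 4)
# ... and greyness is scale invariant: g°(k⊗) = g°(⊗) (Axiom 2.4).
g_base, g_scaled = greyness(2, 3), greyness(20, 30)
```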
In terms of the relationship between combinations of grey numbers and their greyness, we have the following:

Theorem 2.2. For given grey numbers ⊗1 ∈ [a, b] and ⊗2 ∈ [c, d] satisfying a < b and c < d:

1. If a ≥ 0 and c ≥ 0, or b ≤ 0 and d ≤ 0, then

g°(⊗1 + ⊗2) ≤ g°(⊗1) + g°(⊗2).   (4)

2. If one of the following conditions holds true: (i) a ≥ 0 and c ≥ 0; (ii) a ≥ 0, d ≤ 0; (iii) b ≤ 0, c ≥ 0; or (iv) b ≤ 0, d ≤ 0, then

g°(⊗1 · ⊗2) ≥ max{g°(⊗1), g°(⊗2)}.   (5)

3. If one of the following conditions holds true: (i) a > 0 and cd < 0; (ii) b ≤ 0, cd < 0; (iii) ab < 0, cd < 0, and |a|/b ≥ max{|c|/d, d/|c|}; or (iv) ab < 0, cd < 0, and b/|a| ≥ max{|c|/d, d/|c|}, then

g°(⊗1 · ⊗2) = g°(⊗2).   (6)

4. If one of the following conditions holds true: (i) c > 0 and ab < 0; (ii) d ≤ 0, ab < 0; (iii) ab < 0, cd < 0, and |c|/d ≥ max{|a|/b, b/|a|}; or (iv) ab < 0, cd < 0, and d/|c| ≥ max{|a|/b, b/|a|}, then

g°(⊗1 · ⊗2) = g°(⊗1).   (7)

5. If one of the following conditions holds true: (i) a ≥ 0 and c > 0; (ii) a ≥ 0, d < 0; (iii) b ≤ 0, c > 0; or (iv) b ≤ 0, d < 0, then

g°(⊗1 ÷ ⊗2) ≥ max{g°(⊗1), g°(⊗2)}.   (8)

6. If either ab < 0 and c > 0, or ab < 0 and d < 0, then

g°(⊗1 ÷ ⊗2) = g°(⊗1).   (9)

7. If cd < 0, then

g°(⊗1 ÷ ⊗2) = ∞.   (10)
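Parts 1 and 2 of Theorem 2.2 can be spot-checked numerically for a pair of positive intervals, for which case (i) of both parts applies. This is an illustrative check (the helper names are ours), not a proof.

```python
def greyness(low, high):
    # g°(⊗) = ℓ(⊗)/|⊗̃| with the equal-weight mean whitenization.
    return abs(high - low) / abs((low + high) / 2)

def add(i1, i2):
    (a, b), (c, d) = i1, i2
    return (a + c, b + d)

def mul(i1, i2):
    (a, b), (c, d) = i1, i2
    p = [a * c, a * d, b * c, b * d]
    return (min(p), max(p))

x, y = (1, 2), (3, 4)   # a ≥ 0 and c ≥ 0: case (i) of parts 1 and 2

# Eq. (4): greyness of the sum is at most the sum of the greynesses.
sub_additive = greyness(*add(x, y)) <= greyness(*x) + greyness(*y)
# Eq. (5): greyness of the product dominates each factor's greyness.
super_max = greyness(*mul(x, y)) >= max(greyness(*x), greyness(*y))
```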
In 1948, C. E. Shannon introduced the following formula for the computation of the information measure on a space made up of random discrete systems:

I = −∑_{i=1}^{n} P_i log P_i,  where ∑_{i=1}^{n} P_i = 1,

which is generally known as the Shannon (information) entropy. Zhang et al. (1994b) introduced the concept of (information) entropy for a difference information sequence X = (x_1, x_2, . . . , x_s) using its structural image sequence Y = (y_1, y_2, . . . , y_s):

I(X) = −∑_{j=1}^{s} y_j · ln y_j.   (11)
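Both entropy formulas translate directly into code. A minimal sketch (the function names are ours; terms with zero weight are skipped, using the standard convention 0 · log 0 = 0): the Shannon entropy of a uniform distribution over n outcomes is log n, its maximum value.

```python
import math

def shannon_entropy(p):
    """I = −Σ p_i log p_i for a probability vector (Σ p_i = 1), natural log."""
    assert abs(sum(p) - 1.0) < 1e-9
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def difference_entropy(y):
    """Entropy of a structural image sequence, Eq. (11): I(X) = −Σ y_j ln y_j."""
    return -sum(yj * math.log(yj) for yj in y if yj > 0)

# The uniform distribution over 4 outcomes has entropy ln 4.
uniform4 = [0.25] * 4
```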
The information content contained in grey numbers should reflect the degree of comprehension of the researcher regarding a specific grey system. To this end, let μ(⊗) be the measure of the field on which the grey number ⊗ is defined and I(⊗) the information content of ⊗. Then I(⊗) should satisfy:

Axiom 2.6. 0 ≤ I(⊗) ≤ 1.

Axiom 2.7. I(Ω) = 0.

Axiom 2.8. I(⊗) is directly proportional to μ(⊗̄) and inversely proportional to μ(Ω), where, assuming that the background against which the grey number ⊗ is introduced is Ω with ⊗ ⊂ Ω, the set ⊗̄ = Ω − ⊗ is the remanent set of ⊗.

Axiom 2.6 limits the information content of a grey number to the range [0, 1]. The closer I(⊗) is to zero, the less information content is contained in the grey number ⊗; the closer I(⊗) is to 1, the more information content the grey number ⊗ contains. Axiom 2.7 stipulates that the information content of the background on which a grey number was initially introduced is zero. That is because, in general, the background is commonly known to people and covers the entire field on which the grey number is defined; knowledge of the background Ω therefore does not provide much, if any, useful information to the researcher. For example, the proposition that "a train is able to pull more than zero pounds" does not provide much useful information, since Ω = (0, +∞) represents the background of all possible weights. Axiom 2.8 states that when the background Ω is fixed, the larger the measure μ(⊗̄) of the remanent set ⊗̄, the larger the information content contained in the grey number ⊗. That is,
A GREY SYSTEMS APPROACH
91
the smaller the measure of the grey number ⊗ itself, the larger its information content. For example, if a grey number ⊗ stands for an estimate for a specific real number value, then when the reliability is fixed, the smaller the measure of ⊗, the more meaningful an estimate the grey number ⊗ represents. Now, if one defines μ(⊗) , (12) μ(Ω) it can be seen that this ratio I (⊗) can be used to measure the information content of the grey number ⊗. The following theorem lists the relevant properties of information content of grey numbers. I (⊗) =
Theorem 2.3. For grey numbers ⊗_1 ∈ [a, b] and ⊗_2 ∈ [c, d] satisfying a < b and c < d,

1. If ⊗_1 ⊂ ⊗_2, then I(⊗_1) ≥ I(⊗_2).
2. I(⊗_1 ∪ ⊗_2) ≤ I(⊗_k), k = 1, 2, where
   ⊗_1 ∪ ⊗_2 = {ξ | ξ ∈ [a, b] or ξ ∈ [c, d]}   (13)
   is the union of the grey numbers ⊗_1 and ⊗_2.
3. I(⊗_1 ∩ ⊗_2) ≥ I(⊗_k), k = 1, 2, where
   ⊗_1 ∩ ⊗_2 = {ξ | ξ ∈ [a, b] and ξ ∈ [c, d]}   (14)
   is the intersection of the grey numbers ⊗_1 and ⊗_2.
4. If ⊗_1 ⊂ ⊗_2, then I(⊗_1 ∪ ⊗_2) = I(⊗_2) and I(⊗_1 ∩ ⊗_2) = I(⊗_1).
5. If μ(Ω) = 1 and the grey numbers ⊗_1 and ⊗_2 are independent with respect to the measure μ, then the following hold true:
   a. I(⊗_1 ∪ ⊗_2) = I(⊗_1)I(⊗_2); and
   b. I(⊗_1 ∩ ⊗_2) = I(⊗_1) + I(⊗_2) − I(⊗_1)I(⊗_2).

The different ways in which grey numbers are combined (Chen, 1984) affect the information content and the reliability of the information content of the resulting grey numbers. In general, when grey numbers are unioned, the resulting information content decreases while the reliability of that information content increases. On the other hand, when grey numbers are intersected, the information content increases while its reliability decreases. When facing a practical problem that requires processing a large number of grey numbers, one can consider combining these grey numbers at different levels so that useful information can be extracted at each level. In the process of combining the available grey numbers, one can also apply the concepts of union and intersection across different levels so that the final extracted information satisfies one's requirements in terms of reliability and content.
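To make Eq. (12) and parts 1-3 of Theorem 2.3 concrete, here is a minimal sketch for interval grey numbers, taking μ to be interval length; the helper names are ours.

```python
def info_content(g, omega):
    # Eq. (12): I(g) = mu(complement of g in omega) / mu(omega),
    # which for intervals reduces to 1 - len(g)/len(omega)
    (a, b), (A, B) = g, omega
    return 1 - (b - a) / (B - A)

def interval_union_length(g1, g2):
    # length of the set union of two (possibly disjoint) intervals
    (a, b), (c, d) = g1, g2
    overlap = max(0.0, min(b, d) - max(a, c))
    return (b - a) + (d - c) - overlap
```

For example, with background Ω = (0, 10) and grey numbers ⊗_1 = (2, 4), ⊗_2 = (3, 6), the union has the smaller information content and the intersection the larger one, as Theorem 2.3 predicts.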
LIN AND LIU

III. Methods for Sequences with Abnormal Behaviors
In the study of grey systems, the researcher sorts out developmental or governing laws, if any, through various organizations of the raw data. This is a path of determining realistic governing laws from the available data, called the generation of grey sequences. Even though objective systems and phenomena can be complicated and the related data chaotic, they always represent a whole and hence implicitly contain the underlying governing laws. The key is for the researcher to uncover these laws and to make use of them with appropriate methods.

A. Sequences with Missing Entries

When collecting data, blanks often appear in the collected data sequence due to insurmountable difficulties. There also exist data sequences that, even though complete, include abnormal values caused by dramatic behavioral changes of the system under investigation. These abnormal values pose additional difficulties, since the researcher simply does not know whether they are mistakes made in the collection of the data or represent a major shift in the underlying structure of the system under study. However, if these abnormal values are deleted, blanks are created in the data sequence. Therefore, how to effectively fill these blanks naturally becomes the first problem the researcher faces when dealing with sequential data.

For a given sequence (time series)

X = (x(1), x(2), ..., x(k), x(k + 1), ..., x(n))   (15)

of data with x(k) collected before x(k + 1), k = 1, 2, ..., n − 1, if a blank exists at the location k, then the blank entry is denoted ∅(k). We say that this blank is filled by a mean generated value if

∅(k) = x*(k) = 0.5x(k − 1) + 0.5x(k + 1).

For a given sequence X, when the starting entry x(1) or the ending entry x(n) is blank, that is, x(1) = ∅(1) or x(n) = ∅(n), the method of mean generation cannot be used to fill these blanks.
In this case, the methods of stepwise ratio generation and smooth ratio generation are often used. Let X be a sequence as defined above. Then

σ(k) = x(k)/x(k − 1),   k = 2, 3, ..., n,   (16)

are called the stepwise ratios of X, and

ρ(k) = x(k) / Σ_{i=1}^{k−1} x(i),   k = 2, 3, ..., n,   (17)

the smooth ratios of X. Assume that X is a sequence with blanks at the two ends. If the stepwise ratio (or smooth ratio) of the right-side neighbor of ∅(1) is used to generate x(1) and the stepwise ratio (or smooth ratio) of the left-side neighbor of ∅(n) is used to generate x(n), then x(1) and x(n) are said to be stepwise ratio generated (or smooth ratio generated). The sequence, with blanks filled by stepwise ratio generation (or smooth ratio generation), is called a sequence generated with stepwise ratios (or smooth ratios).

Proposition 3.1. Assume that X is a sequence with blank ends.

1. If stepwise ratio generation is applied, then
   x(1) = x(2)/σ(3),   x(n) = x(n − 1)σ(n − 1);
2. If smooth ratio generation is used, then
   x(1) = x²(2)/(x(3) − x(2)),   x(n) = x(n − 1)(1 + ρ(n − 1)).
Proposition 3.2. The following equation establishes the relationship between stepwise ratios and smooth ratios:

σ(k + 1) = [ρ(k + 1)/ρ(k)] (1 + ρ(k))   (18)

for k = 2, 3, ..., n.

Proposition 3.3. If X = (x(1), x(2), ..., x(n)) is an increasing sequence satisfying

1. for k = 2, 3, ..., n, σ(k) < 2;
2. for k = 2, 3, ..., n, ρ(k + 1)/ρ(k) < 1, that is, the smooth ratio is decreasing,

then for any fixed real number ε ∈ [0, 1] and k = 2, 3, ..., n, when ρ(k) ∈ [0, ε], it must be that σ(k + 1) ∈ [0, 1 + ε].
If a sequence X, as shown in Eq. (15), satisfies

(i) for k = 2, 3, ..., n − 1, ρ(k + 1)/ρ(k) < 1;
(ii) for k = 3, 4, ..., n, ρ(k) ∈ [0, ε]; and
(iii) ε < 0.5,

then X is said to be a quasi-smooth sequence.

B. Sequences under the Influence of Shock Waves

Due to the interference of uncontrollable shock waves, a collected data set may show development tendencies that are too fast or too slow and that do not reflect the true development tendency of the system under consideration. If such a data set is used to build models for the purpose of making predictions without first eliminating the effect of the uncontrollable interference, the conclusions obtained are often not usable. To eliminate the effect of the uncontrollable noises, so-called sequence operators (Ko, 1996) are introduced to uncover the true pattern of the original data. Based on the conclusions of relevant qualitative analysis, sequence operators can be used to either strengthen or weaken the development tendency of the raw sequences so that the resultant prediction accuracy can be improved.

Let X be a sequence of raw data, as shown in Eq. (15). Then any operator D applied to X produces another sequence:

D(X) = (x(1)d, x(2)d, ..., x(n)d).   (19)

Such a D is called a sequence operator, and D(X) is the image sequence of X under D. In practical applications, the sequence operators and the number of times they are applied to a given sequence of data can be chosen based on how much the sequence is interfered with by uncontrollable shock waves. For modeling purposes in grey systems theory, only sequence operators D, called buffer operators, satisfying the following three axioms are considered:

Axiom 3.1 (Axiom of Fixed Points). For any sequence X, x(n)d = x(n). That is, the last datum in X must be kept unchanged.

Axiom 3.2 (Axiom on Sufficient Usage of Information). When D is applied, all the information contained in each datum x(k), k = 1, 2, ..., n, of X should be sufficiently used, and the effect of each entry x(k), k = 1, 2, ..., n, should be directly reflected in D(X).

Axiom 3.3 (Axiom of Analytic Representations). For any k = 1, 2, ..., n, x(k)d can be described by a uniform and elementary analytic expression in x(1), x(2), ..., x(n).
Assume that X is a sequence of raw data and D is a buffer operator (Liu, 1991). When X is, respectively, a monotonically increasing, decreasing, or vibrational sequence, (i) if the sequence D(X) increases or decreases more slowly or vibrates with a smaller amplitude than the original sequence X, then D is called a weakening operator; (ii) if the sequence D(X) increases or decreases more rapidly or vibrates with a greater amplitude than the original sequence X, then D is called a strengthening operator.

Theorem 3.1. Assume that X = (x(1), x(2), ..., x(n)) is a sequence of raw data and D a buffer operator. When X is monotonically increasing, the following hold true:

1. If D is a weakening operator, then x(k)d ≥ x(k), k = 1, 2, ..., n;
2. If D is a strengthening operator, then x(k)d ≤ x(k), k = 1, 2, ..., n.

That is, the data in a monotonically increasing sequence expand when a weakening operator is applied and shrink when a strengthening operator is applied.

Theorem 3.2. Assume the same as in the previous theorem. Then, when X is monotonically decreasing, the following hold true:

1. If D is a weakening operator, then x(k)d ≤ x(k), k = 1, 2, ..., n;
2. If D is a strengthening operator, then x(k)d ≥ x(k), k = 1, 2, ..., n.

That is, the data in a monotonically decreasing sequence shrink when a weakening operator is applied and expand when a strengthening operator is applied.

Theorem 3.3. Assume the same as in the previous theorem. Then, when X is a vibrational sequence, the following hold true:

1. If D is a weakening operator, then
   max_{1≤k≤n} {x(k)d} ≤ max_{1≤k≤n} {x(k)}   and   min_{1≤k≤n} {x(k)d} ≥ min_{1≤k≤n} {x(k)}.
2. If D is a strengthening operator, then
   max_{1≤k≤n} {x(k)d} ≥ max_{1≤k≤n} {x(k)}   and   min_{1≤k≤n} {x(k)d} ≤ min_{1≤k≤n} {x(k)}.
Proposition 3.4. Assume that X = (x(1), x(2), ..., x(n)) is a sequence of raw data and D is a buffer operator such that D(X) = (x(k)d)_{k=1}^{n}, where for k = 1, 2, ..., n,

x(k)d = [x(k) + x(k + 1) + ··· + x(n)] / (n − k + 1).   (20)

Then, when X is a monotonically increasing, a monotonically decreasing, or a vibrational sequence, D is always a weakening operator.

Proposition 3.5. Assume that X and D are the same as in Proposition 3.4 except that

x(k)d = [x(1) + x(2) + ··· + x(k − 1) + kx(k)] / (2k − 1),   k ≠ n,   (21)
x(n)d = x(n),   k = n.

Then, when X is either monotonically increasing or monotonically decreasing, D is always a strengthening operator.

Example 3.1. The overall business revenue of a county, located in Henan Province of The People's Republic of China, for the years 1983 to 1986 was recorded as

X = (10155, 12588, 23480, 35388),

where the unit is omitted. This record showed a tendency of rapid growth. The average rate of growth for these years was 51.6%, and the average rate of growth for the years 1984-1986 was 67.7%. All the people involved in the economic planning of the county, including some politicians, scholars, related experts, and residents, commonly believed that the overall revenue of this county could not keep up this record speed of growth in the coming years. If these data were used directly to build models and make predictions, nobody could accept the resultant conclusions. After numerous rigorous analyses and discussions, all parties involved recognized that the high growth rate was mainly due to a low baseline, a consequence of the fact that in the past the policies relevant to private enterprises either had not existed or had not been encouraged or applied thoroughly. To weaken the growth rate of the sequence of raw data, it is necessary to artificially extend all the favorable environmental factors, created by the introduction of the related policies for the development of private enterprise, to the past years. With this goal in mind, in 1987, we introduced the weakening operator D, as defined in Proposition 3.4, and obtained the following sequence:

D²(X) = (27260, 29547, 32411, 35388).
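Example 3.1 can be reproduced with a short sketch of the buffer operators of Propositions 3.4 and 3.5 (the function names are ours); applying the weakening operator of Eq. (20) twice to X matches the published D²(X) to within a unit of rounding.

```python
def weakening(x):
    # Eq. (20): x(k)d = average of x(k), ..., x(n)
    n = len(x)
    return [sum(x[k:]) / (n - k) for k in range(n)]

def strengthening(x):
    # Eq. (21): x(k)d = [x(1)+...+x(k-1) + k*x(k)] / (2k-1) for k != n,
    # and x(n)d = x(n) (Axiom 3.1 keeps the last datum fixed)
    n = len(x)
    out = [(sum(x[:k]) + (k + 1) * x[k]) / (2 * (k + 1) - 1)
           for k in range(n - 1)]
    return out + [x[-1]]

X = [10155, 12588, 23480, 35388]
D2 = weakening(weakening(X))   # approximately (27260, 29547, 32411, 35388)
```

Note that both operators leave x(n) unchanged, and on this increasing X the weakening operator enlarges every entry while the strengthening operator shrinks them, as Theorem 3.1 requires.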
Then our consequent modeling, based on D²(X), produced the prediction of an average 9.4% annual growth in the county's business revenue for the years 1987-2000. Looking back, this predicted rate of growth agreed very well with the recorded values over the time span of our predictions.

C. Calculus Generalized to Time Series

In the modern history of science, the concepts and theories of differentiation and integration have helped to effect many important and magnificent advances in various scientific fields. However, modern attempts to apply these methods in the areas of social science and the humanities have encountered great difficulties. There are many reasons for the lack of success of these attempts. For example, George Soros believes that systems involving humans do not follow the laws of physical science, since human participants alter the path of evolution so that past patterns will not continue into the future. In a way, such an explanation is similar to that of the so-called equal quantitative effects proposed by Yi Lin, Shoucheng OuYang, and their colleagues (Lin, 1998). Beyond these two explanations, we also believe that the lack of success in the social sciences and humanities is due to the failure of the conditions of continuity and differentiability, which are the backbone of all calculus-based theories, since discrete data series can satisfy neither of these conditions. This section shows how the concepts of limits, differentiation, and integration can be generalized to sequences of discrete data. Then we delve into some deeper results regarding the accumulation operator, an equivalent version of the integration operator defined for discrete sequences of data.

Assume that X^(0) = (x_1^(0), x_2^(0), ..., x_n^(0)) is a sequence of data values with the indices 1, 2, ..., n representing the time moments when each value x_i^(0), i = 1, 2, ..., n, was collected.
In the rest of this section, this sequence is always given and available. To study the concept of limits, let I be a set of units used to measure time. If I = {..., year, month, day, hour, minute, second, ...}, then the set I is called the set of general time units. Let I_i and I_j be two time units of the ith level and the jth level in the set I of general time units, respectively. If I_i < I_j, we say that the ith level time is denser than the jth level time. Now, the concept of limits can be written as follows: For a function f(t) of time t,

f(t) → L(t),   as the time unit approaches the minimum,

if and only if, when the time units involved in a specific study approach their minimum as specified in the study, the function value f(t) approaches L(t).
For example, if the set I of general time units is applied in a study, then the limit of f(t) as the time unit approaches the minimum equals the limit of f(t) as the time unit approaches zero. Now, the concept of integration can be generalized to the case of discrete data sequences as follows (Lin, 2001a): Let y = f(t) be a piecewise continuous function defined on the closed interval [a, b]. Then the Riemannian integral of f(t) is defined as

F(x) = ∫_a^x f(t) dt = lim_{‖Δ‖→0} Σ_{k=0}^{n−1} f(t_k)Δ_k,

where x ∈ [a, b], Δ is a partition of the interval [a, x] with the subintervals

[a_0, a_1], [a_1, a_2], ..., [a_{n−1}, a_n]   (a_0 = a and a_n = x),

t_k ∈ [a_k, a_{k+1}], Δ_k = a_{k+1} − a_k, and ‖Δ‖ = max{Δ_k : k = 0, 1, 2, ..., n − 1}. So, if f(t) is a function of time t, the integral of f(t) over a time period [0, k] is defined as

F(k) = ∫_0^k f(t) dt = lim_{‖Δ‖→min} Σ_{i=0}^{k−1} f(t_i)Δ_i,

where Δ, ‖Δ‖, t_i, and Δ_i are defined as above. The practical usefulness of the integral function F(k) is that if the original function f(t) contains discontinuities, the integral function F(k) will not have as many discontinuities as f(t). Practically speaking, this implies that if the given sequence of data X^(0) contains outliers, the integral of the sequence will have no outliers. After the integration operation is applied to X^(0) a few times, the resultant sequence will be as smooth as needed. Since the minimum time unit for X^(0) is 1, the previous equation implies that

x_k^(1) = Σ_{i=1}^{k} x_i^(0).   (22)

That is, the sequence of data X^(1) = (x_1^(1), x_2^(1), ..., x_n^(1)), where each entry x_k^(1) is computed from Eq. (22), is the integral of the original sequence X^(0). To avoid any confusion, the operator D : X^(0) → X^(1), as defined in Eq. (22), will be called the accumulating operator (Dai, 1997) and X^(1) the first-order generation by accumulation. In general, the rth order generation by accumulation of the sequence X^(0), for r > 1, is the sequence

X^(r) = D^r X^(0) = (x_1^(r), x_2^(r), ..., x_n^(r)),   (23)
where

x_i^(r) = Σ_{k=1}^{i} x_k^(r−1).
Now, the results below state some of the important properties of the accumulating operator D (Lin and Liu, 2000b).

Proposition 3.6. Assume that X^(0) is a nonnegative sequence, where x_k^(0) ≥ 0 and x_k^(0) ∈ [a, b], k = 1, 2, ..., n. Then for any ε > 0, when r is sufficiently large, there exists an N such that for any k satisfying N < k ≤ n, the following holds true:

x_k^(r) / Σ_{i=1}^{k−1} x_i^(r) < ε.

That is to say, for a bounded nonnegative sequence, after many applications of accumulating generation, the resultant sequence can be made sufficiently smooth, with the smooth ratio ρ(k) → 0 as k → ∞.

Proposition 3.7. Let X^(0) be the same as in Proposition 3.6, and let Z^(1) = (z_2^(1), z_3^(1), ..., z_n^(1)) be the sequence of mean generation of consecutive neighbors of X^(1), that is,

z_i^(1) = (1/2)(x_i^(1) + x_{i−1}^(1)),   i = 2, 3, ..., n.

Then for any ε_1 ≤ ε_2 ∈ [0, 1], there exists a positive integer N = N(ε_1, ε_2) such that for any k with N < k ≤ n, the following hold true:

ρ(k) = x^(0)(k) / Σ_{i=1}^{k−1} x^(0)(i) < ε_1,   x^(0)(k)/z^(1)(k) < ε_2.
For the sequence X^(0), (i) if for k = 1, 2, ..., n, x_k^(0) = c e^{ak}, c ≠ 0, a ≠ 0, then X^(0) is called a homogeneous exponential sequence; and (ii) if for k = 1, 2, ..., n, x_k^(0) = c e^{ak} + b with c, a, b ≠ 0, then X^(0) is called a nonhomogeneous exponential sequence.

Theorem 3.4. A sequence X^(0) is homogeneously exponential if and only if, for k = 2, 3, ..., n, the stepwise ratio σ(k) = x(k)/x(k − 1) is a positive constant.
For a sequence X^(0), (i) if for any k, σ(k) ∈ (0, 1], then the sequence X^(0) is said to satisfy the law of negative grey exponent; (ii) if for any k, σ(k) ∈ (1, b] for some b > 1, then X^(0) is said to satisfy the law of positive grey exponent; (iii) if for any k, σ(k) ∈ [a, b] with b − a = δ, then X^(0) is said to satisfy the law of exponentiality with the absolute degree of greyness δ; and (iv) when δ < 0.5, the sequence X^(0) is said to satisfy the law of quasi-exponent.

Theorem 3.5. If X^(0) is a nonnegative quasi-smooth sequence, then the sequence X^(1), generated by applying accumulating generation once to X^(0), satisfies the law of quasi-exponent.

Theorem 3.6. If X^(0) is nonnegative, X^(r) satisfies a law of exponentiality, and the stepwise ratio of X^(r) is given by σ^(r)(k) = σ, then

1. we have
   σ^(r+1)(k) = (1 − σ^k) / (1 − σ^{k−1});
2. when σ ∈ (0, 1),
   lim_{k→∞} σ^(r+1)(k) = 1,
   and for each k, σ^(r+1)(k) ∈ [1, 1 + σ];
3. when σ > 1,
   lim_{k→∞} σ^(r+1)(k) = σ,
   and for each k, σ^(r+1)(k) ∈ (σ, 1 + σ].

This last theorem implies that if the rth accumulating generation of X^(0) satisfies an obvious law of exponentiality, an additional application of the accumulating generation operator (AGO) will destroy the pattern that was already apparent. It indicates that the application of accumulating generation needs to be stopped when appropriate. In practical applications, if an rth accumulating generation of X^(0) satisfies the law of quasi-exponentiality, we generally no longer apply any further generations. From Theorem 3.5, it follows that only one application of the accumulating generation is needed for a nonnegative quasi-smooth sequence before establishing an exponential model.
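The accumulating operator of Eq. (22), its inverse (first differences), and the smoothing effect claimed by Theorem 3.5 can be sketched as follows; the function names are ours.

```python
def ago(x):
    # Eq. (22): first-order accumulating generation, x_k^(1) = sum of x_1..x_k
    out, s = [], 0.0
    for v in x:
        s += v
        out.append(s)
    return out

def iago(y):
    # inverse of accumulation: first differences recover the original sequence
    return [y[0]] + [y[k] - y[k - 1] for k in range(1, len(y))]

def stepwise_ratios(x):
    # Eq. (16): sigma(k) = x(k)/x(k-1)
    return [x[k] / x[k - 1] for k in range(1, len(x))]
```

For the noisy sequence x = (1, 4, 2, 5, 3, 6), the stepwise ratios for k ≥ 3 spread over [0.5, 2.5] in the raw data but only over about [1.25, 1.71] after one accumulation, illustrating the drift toward a quasi-exponential pattern.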
To conclude this section, we generalize the concept of differentiation of calculus to the case of discrete data sequences. For a fixed whole number r, assume that the sequence X^(r) = D^r(X^(0)) has been obtained, where D is the accumulating operator. The derivative of the sequence X^(r) = (x_1^(r), x_2^(r), ..., x_n^(r)) can be defined as follows:

(d/dt) X^(r) |_k = lim_{time unit → minimum} (x_k^(r) − x_{k−1}^(r)) / (time unit)
                = [Σ_{i=1}^{k} x_i^(r−1) − Σ_{i=1}^{k−1} x_i^(r−1)] / 1 = x_k^(r−1).   (24)

That is, the derivative of X^(r) with respect to time is X^(r−1), and each ordinary differential equation f(y^(n), y^(n−1), ..., y′, y) = 0 can be rewritten for a discrete data sequence X^(0) as follows (Deng, 1993; Lin, 2001a):

f(x_k^(0), x_k^(1), ..., x_k^(n−1), x_k^(n)) = 0.   (25)
IV. Incidence Analysis

Any system with a level of sophistication generally involves many factors, and the mutual reactions among these factors determine the development situation and tendency of the system. In scientific research, the researcher often wants to know: Which factors among the many are more important than others? Which factors have more effect on the future development of the system than others? Which factors actually cause desirable changes in the system, so that these factors need to be strengthened? Which factors hinder the desirable development of the system, so that they need to be controlled? All these problems are commonly studied in the analysis of systems.

Many methods in statistics, such as regression analysis, variance analysis, and principal component analysis, are commonly used in the analysis of systems. However, these methods suffer from several pitfalls: (1) a large amount of data is required; (2) the samples or populations are required to satisfy certain typical probability distribution(s), and the relation between the main characteristic variable of the system and the factor variables is required to be roughly linear, requirements that are often difficult to satisfy in real-life practice; and (3) it often happens that quantitative conclusions do not agree with qualitative analysis results, causing misunderstandings about the systems under consideration.
The so-called grey incidence analysis remedies these defects of existing statistical methods when applied in the context of systems analysis. It can be applied to cases of various sample sizes and distributions with a relatively small amount of computation. In general, each application of grey incidence analysis does not result in a disagreement between quantitative analysis and qualitative analysis. The fundamental idea of grey incidence analysis is that the closeness of a relationship is judged by the similarity of the geometric patterns of the sequence curves: the more similar the curves, the higher the degree of incidence between the sequences, and vice versa.

A. Several Well-Employed Sequence Operators

Similar to what was studied in Section III, we apply sequence operators (Liu, 1995c), when quantitative analysis is needed, to all available factors so that the quantities and factors become nondimensional, with similar behaviors for negatively correlated factors and positively correlated factors. Assume that X_i is a system's factor with the kth observation value being x_i(k), k = 1, 2, ..., n. Then

X_i = (x_i(1), x_i(2), ..., x_i(n))   (26)

is called a behavioral sequence of the factor X_i. (1) If k stands for time, then x_i(k) represents an observation of the factor X_i at the time moment k, and Eq. (26) is called a behavioral time sequence of the factor X_i. (2) If k is an ordinality of some criteria, x_i(k) is the observation of the factor X_i at the criterion k, and Eq. (26) is called a behavioral criterion sequence of the factor X_i. (3) If k is the ordinal number of the object observed, x_i(k) stands for the observation of the factor X_i of the kth object, and Eq. (26) is called a behavioral horizontal sequence of the factor X_i. Regardless of whether the sequence is a time, criterion, or horizontal sequence, the needed incidence analysis can always be conducted.

Assume that Eq. (26) is a behavioral sequence of a factor X_i, and let D_1 be a sequence operator satisfying D_1(X_i) = (x_i(j)d_1)_{j=1}^{n}, where

x_i(k)d_1 = x_i(k)/x_i(1),   k = 1, 2, ..., n.   (27)

Then D_1 is called an initialing operator, and D_1(X_i) is the initial image of X_i. Let D_2 be a sequence operator such that D_2(X_i) = (x_i(j)d_2)_{j=1}^{n} and

x_i(k)d_2 = x_i(k)/X̄_i,   X̄_i = (1/n) Σ_{k=1}^{n} x_i(k),   k = 1, 2, ..., n.   (28)
Then D_2 is called an averaging operator, with D_2(X_i) the average image of X_i. Let D_3 be a sequence operator satisfying D_3(X_i) = (x_i(j)d_3)_{j=1}^{n} and

x_i(k)d_3 = [x_i(k) − min_k {x_i(k)}] / [max_k {x_i(k)} − min_k {x_i(k)}],   k = 1, 2, ..., n.   (29)

Then D_3 is called an interval operator, with D_3(X_i) the interval image of X_i.

Proposition 4.1. The initialing operator D_1, the averaging operator D_2, and the interval operator D_3 can all transform a behavioral sequence of a system into a nondimensional sequence.

Let X_i be the same as in Eq. (26), satisfying x_i(k) ∈ [0, 1] for k = 1, 2, ..., n, and let D_4 be a sequence operator such that D_4(X_i) = (x_i(j)d_4)_{j=1}^{n}, where

x_i(k)d_4 = 1 − x_i(k),   k = 1, 2, ..., n.   (30)

Then D_4 is called a reversing operator, with D_4(X_i) the reverse image of X_i.

Proposition 4.2. The interval image of any behavioral sequence has a reverse image.

Let X_i be the same as in Eq. (26), satisfying x_i(k) ≠ 0 for k = 1, 2, ..., n, and let D_5 be a sequence operator such that D_5(X_i) = (x_i(j)d_5)_{j=1}^{n}, where

x_i(k)d_5 = 1/x_i(k),   k = 1, 2, ..., n.   (31)

Then D_5 is called a reciprocating operator, with D_5(X_i) the reciprocal image of X_i.

Proposition 4.3. If there exists a negative correlation between a system factor X_i and a system behavior X_0, then the reverse image X_i D_4 and the reciprocal image X_i D_5 of the factor X_i have a positive correlation with X_0.

Each of the operators {D_i | i = 1, 2, 3, 4, 5} is called a (grey) incidence operator. The space consisting of system factors and grey incidence operators forms a base for grey incidence analysis. On such a base, comparisons and evaluations can be carried out in order to study the behavior of the system's factors. Now, if each factor in a space of grey incidence factors is seen as a point in the space without size and volume, and each data value of the factor, observed at a different time moment, a different index, or a different object, is seen as a coordinate of the point, then we are able to study the relationship between factors, or between factors and the system's characteristics, in a special n-dimensional space. In this way, the relevant degree of grey incidence can be defined by using the distance function in the n-dimensional space.
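The five incidence operators D_1 through D_5 of Eqs. (27)-(31) translate directly into code; the function names below are ours.

```python
def initialing(x):        # D1, Eq. (27): divide by the first entry
    return [v / x[0] for v in x]

def averaging(x):         # D2, Eq. (28): divide by the arithmetic mean
    mean = sum(x) / len(x)
    return [v / mean for v in x]

def interval(x):          # D3, Eq. (29): rescale onto [0, 1]
    lo, hi = min(x), max(x)
    return [(v - lo) / (hi - lo) for v in x]

def reversing(x):         # D4, Eq. (30): requires x(k) in [0, 1]
    return [1 - v for v in x]

def reciprocating(x):     # D5, Eq. (31): requires x(k) != 0
    return [1 / v for v in x]
```

All of D_1, D_2, and D_3 produce nondimensional sequences (Proposition 4.1), and composing D_3 with D_4 turns a negatively correlated factor into a positively correlated one, in the spirit of Propositions 4.2 and 4.3.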
X = (x(1), x(2), ..., x(n))   (32)

is a sequence of data. Then

X = ∪_{k=1}^{n−1} {(t, x(k) + (t − k)(x(k + 1) − x(k))) | t ∈ [k, k + 1]}

is called the zigzagged line corresponding to the sequence X. Here, the same symbol X is used to represent both the original sequence and its zigzagged line. For the sake of convenience in our discussion, we will not always distinguish between a sequence and its zigzagged line.

Proposition 4.4. Assume that a sequence X_0 of data is increasing and that X_i is a behavioral sequence of a relevant factor. Then,

1. when X_i is increasing, X_i and X_0 are positively correlated; and
2. when X_i is decreasing, X_i and X_0 are negatively correlated.

Since negatively correlated sequences can be made positively correlated by using a reversing operator or a reciprocating operator, we will emphasize the study of positively correlated relationships. Let X be the same as in Eq. (32). (1) The quantity α = x(k) − x(k − 1), for k = 2, 3, ..., n, is called the slope of X on the interval [k − 1, k]. (2) The quantity α = [x(s) − x(k)]/(s − k), s > k, is the mean slope of X on the interval [k, s]. (3) The quantity α = [x(n) − x(1)]/(n − 1) is called the mean slope of X.

Theorem 4.1. Assume that X_i and X_j are nonnegative increasing sequences of data, X_j = X_i + c, where c is a nonzero constant, and D_1 is an initialing operator. Let Y_i = D_1(X_i) and Y_j = D_1(X_j) be the initial images of X_i and X_j, let α_i and α_j be the mean slopes of X_i and X_j, and let β_i and β_j be the mean slopes of Y_i and Y_j, respectively. Then the following must hold true:

1. α_i = α_j;
2. when c < 0, β_i < β_j, and when c > 0, β_i > β_j.
This theorem reflects the following characteristics of increasing sequences: When the absolute amounts of increase of two increasing sequences are the same, the sequence with the smaller initial value increases faster than the sequence with the greater initial value. To maintain the same relative rate of increase, the absolute amount of increase of the sequence with the greater initial value must be greater than that of the sequence with the smaller initial value.

Assume that

X_0 = (x_0(1), x_0(2), ..., x_0(n))   (33)

is a sequence of data representing a system's characteristics and that

X_i = (x_i(1), x_i(2), ..., x_i(n)),   i = 1, 2, ..., m,   (34)

are sequences of relevant factors. For given real numbers γ(x_0(k), x_i(k)), if the real number

γ(X_0, X_i) = (1/n) Σ_{k=1}^{n} γ(x_0(k), x_i(k))   (35)

satisfies

1. the property of normality: 0 < γ(X_0, X_i) ≤ 1, and γ(X_0, X_i) = 1 if and only if X_0 = X_i;
2. the property of wholeness: for all X_i, X_j ∈ X = {X_s | s = 1, 2, ..., m; m ≥ 2}, one has γ(X_i, X_j) ≠ γ(X_j, X_i) (i ≠ j);
3. the property of pair symmetry: for X_i, X_j ∈ X, γ(X_i, X_j) = γ(X_j, X_i) if and only if X = {X_i, X_j};
4. the property of closeness: the smaller |x_0(k) − x_i(k)| is, the larger γ(x_0(k), x_i(k)) is;

then γ(X_0, X_i) is called a degree of grey incidence of X_i with respect to X_0 (Deng, 1985a), and γ(x_0(k), x_i(k)) is the incidence coefficient of X_i with respect to X_0 at point k (Guao, 1985).

Theorem 4.2. Assume that m + 1 behavioral sequences of a system are given:

X_i = (x_i(1), x_i(2), ..., x_i(n)),   i = 0, 1, 2, ..., m.
For ζ ∈ (0, 1), define

γ_0i(k) = γ(x_0(k), x_i(k))
        = [min_i min_k |x_0(k) − x_i(k)| + ζ max_i max_k |x_0(k) − x_i(k)|]
          / [|x_0(k) − x_i(k)| + ζ max_i max_k |x_0(k) − x_i(k)|]   (36)

and

γ(X_0, X_i) = (1/n) Σ_{k=1}^{n} γ(x_0(k), x_i(k)).   (37)

Then γ(X_0, X_i) satisfies the four axioms of grey incidence, where ζ is called the distinguishing coefficient.

Assume that two sequences X_i and X_j are of the same length, and let s_i and s_j be defined by

s_i = ∫_1^n (X_i − x_i(1)) dt,

where X_i − x_i(1) stands for the zigzagged line of (x_i(j) − x_i(1))_{j=1}^{n}. Then

ε_ij = (1 + |s_i| + |s_j|) / (1 + |s_i| + |s_j| + |s_i − s_j|)   (38)

is called the absolute degree of grey incidence of X_i and X_j, or the absolute degree of incidence for short (Liu, 1992). As for sequences of different lengths, several methods can be used to define this concept. For example, one can either delete the extra values of the longer sequence or employ the grey modeling method GM(1, 1) (see Section VI for more details), developed for predictions, to prolong the shorter sequence to the length of the longer one, so that the absolute degree of grey incidence can be defined. However, these methods generally lead to different values of the absolute degree of grey incidence.

Theorem 4.3. The absolute degree of grey incidence

ε_ij = (1 + |s_i| + |s_j|) / (1 + |s_i| + |s_j| + |s_i − s_j|)

satisfies the properties of normality, pair symmetry, and closeness, but not wholeness.
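Both Deng's degree of grey incidence, Eqs. (36)-(37), and the absolute degree of Eq. (38) can be sketched compactly. The function names are ours, ζ defaults to the customary 0.5, and the integral defining s_i reduces exactly to the trapezoid rule with unit steps because the zigzagged lines are piecewise linear.

```python
def deng_incidence(x0, factors, zeta=0.5):
    # Eqs. (36)-(37): the coefficients use the two-level min/max of
    # |x0(k) - xi(k)| over all factors i and all points k
    diffs = [[abs(a - b) for a, b in zip(x0, xi)] for xi in factors]
    dmin = min(min(row) for row in diffs)
    dmax = max(max(row) for row in diffs)
    return [sum((dmin + zeta * dmax) / (d + zeta * dmax) for d in row) / len(row)
            for row in diffs]

def s_value(x):
    # signed area between the zigzagged line of (x(k) - x(1)) and the t-axis
    y = [v - x[0] for v in x]
    return sum((y[k] + y[k + 1]) / 2 for k in range(len(y) - 1))

def absolute_degree(xi, xj):
    # Eq. (38)
    si, sj = s_value(xi), s_value(xj)
    return (1 + abs(si) + abs(sj)) / (1 + abs(si) + abs(sj) + abs(si - sj))
```

A factor identical to x_0 receives Deng degree 1 (normality), and parallel sequences such as (1, 2, 4) and (3, 4, 6) receive absolute degree 1, matching property 5 of Theorem 4.4 below; note that the Deng coefficients degenerate when every difference is zero, so at least one factor should differ from x_0 somewhere.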
Theorem 4.4. The absolute degree εij of grey incidences satisfies the following conditions: 1. 0 < εij ≤ 1; 2. εij is only related to the geometric shapes of Xi and Xj and has nothing to do with the spacial positions of Xi and Xj . In other words, moving horizontally does not change the value of the absolute degree of grey incidences; 3. Any two sequences are not absolutely unrelated. That is, εij never equals zero; 4. The more Xi and Xj are geometrically similar, the greater εij ; 5. When Xi and Xj are parallel or Xj0 is vibrating around Xi0 with the area of the parts with Xj0 on top of Xi0 being equal to that of the parts with Xj0 beneath Xi0 , εij = 1; 6. When any one of the data values in Xi or Xj changes, εij also changes accordingly; 7. When the lengths of Xi and Xj change, εij also changes accordingly; 8. εii = 1, εjj = 1; and 9. εij = εj i . When the idea of rate of change is considered, the following concept of relative degree of incidence applies. Assume that Xi and Xj are two sequences of the same length with the initial values being zero and Xi and Xj are the initial images of Xi and Xj , respectively. Then, the absolute degree of grey incidence of Xi and Xj is called the relative degree of (grey) incidence of Xi and Xj , denoted rij . This concept is a quantitative representation of the rates of change of Xi and Xj relative to their starting points. The closer the rates of change of Xi and Xj are, the greater rij is, and vice versa. Proposition 4.5. Assume that Xi and Xj are two sequences of the same length with nonzero initial values. 1. If Xi = cXj , where c > 0 is a constant, then rij = 1. 2. The relative degree rij and the absolute degree εij of incidence of Xi and Xj do not have to have any connections. When εij is relatively large, rij can be very small. When εij is very small, rij can also be relatively large. 3. Let a and b be nonzero constants, and the relative degree of incidence of aXi and bXj is rij . Then rij = rij . 
In other words, scalar multiplication does not change the relative degree of incidence.

Theorem 4.5. Each relative degree rij of grey incidence satisfies the following properties:
LIN AND LIU
1. 0 < rij ≤ 1;
2. rij is related only to the rates of change of Xi and Xj with respect to their initial entries and has nothing to do with the magnitudes of the other entries. Scalar multiplication does not change the relative degree of incidence;
3. There always exists some relationship between the rates of change of any two sequences. That is, rij never equals zero;
4. The closer the individual rates of change of Xi and Xj with respect to their initial points, the greater rij;
5. When the rates of change of Xi and Xj with respect to their initial points are the same, that is, Xi = aXj, or when the zero-starting-point images of the initial images of Xi and Xj are such that Xj′0 vibrates around Xi′0 with the area of the parts where Xj′0 lies above Xi′0 equal to that of the parts where Xj′0 lies beneath Xi′0, rij = 1;
6. When an entry in Xi or Xj is changed, rij changes accordingly;
7. When the lengths of the sequences change, rij also changes;
8. rii = 1 and rjj = 1; and
9. rij = rji.

When the overall relationship of closeness between sequences is considered, one has the following concept of synthetic degree of incidence. Assume that Xi and Xj are sequences of the same length with nonzero initial entries, that εij and rij are the absolute and relative degrees of grey incidence of Xi and Xj, and that θ ∈ [0, 1]. Then

\[
\rho_{ij} = \theta\,\varepsilon_{ij} + (1 - \theta)\,r_{ij}
\tag{39}
\]

is called the synthetic degree of (grey) incidence of Xi and Xj. This degree is a numerical index that describes well the overall relationship of closeness between sequences: it reflects the similarity between the zigzagged lines Xi and Xj and also depicts the degree of closeness of the individual rates of change of Xi and Xj with respect to their initial points. In general, one can take θ = 0.5. If the relationship between absolute quantities is of more interest, a greater value can be used for θ; if the emphasis is on rates of change, a smaller value can be used for θ.

Theorem 4.6. The synthetic degree ρij of grey incidence satisfies the following properties:

1. 0 < ρij ≤ 1;
2. ρij is related not only to each observed value in the sequences Xi and Xj, but also to the rate of change of each data value with respect to its initial point;
3. ρij never equals zero;
4. If an entry value in Xi or Xj is changed, ρij changes accordingly;
5. If the length of Xi or Xj changes, ρij changes accordingly;
6. For different values of θ, ρij is different;
7. When θ = 1, ρij = εij; when θ = 0, ρij = rij;
8. ρii = 1 and ρjj = 1; and
9. ρij = ρji.
All of the degrees of incidence discussed above are numerical characteristics of the closeness of the relationship between two sequences. For a chosen sequence operator, the values of the absolute and relative degrees of incidence are unique, and once a value of θ is chosen, the synthetic degree of incidence is also unique. This kind of conditional uniqueness does not affect the analysis of the problems of interest. When analyzing systems and studying the relationships between a system's characteristic behaviors and the behaviors of relevant factors, one is mainly interested in the ordering of the degrees of incidence between the system's characteristic behaviors and each relevant factor's behavioral sequence; the magnitudes of the degrees of incidence are therefore only of relative importance.

Assume that X0 is a sequence of a system's characteristic behaviors, Xi and Xj are sequences of two relevant factors' behaviors, and γ is the degree of grey incidence. If γ0i ≥ γ0j, then the factor Xi is said to be more favorable than Xj, denoted Xi ≻ Xj. The relation "≻" is called the (grey) incidence order induced by the degree of grey incidence. Accordingly, the orders of incidence induced by generalized degrees of grey incidence are called generalized orders of grey incidence; the generalized orders include the absolute, relative, and synthetic orders of grey incidence.

Theorem 4.7. Assume that X0 is a sequence of a system's characteristic behaviors and X1, X2, . . . , Xm are sequences of relevant factors' behaviors. Let X = {X1, X2, . . . , Xm}. Then the following hold true:

1. The order of grey incidence, the absolute order of grey incidence, the relative order of grey incidence, and the synthetic order of grey incidence are all partial orderings on the set X;
2. The order of grey incidence and the absolute order of grey incidence are linear orders on the set X; and
3. If the initial entries of X0, X1, X2, . . . , Xm are nonzero, then the relative order of grey incidence and the synthetic order of grey incidence are also linear orders on the set X.
For more detailed studies on grey incidences, please consult Shui et al. (1992), Wang (1993), Xu (1993), and Zhu and Lian (1992).

C. Analysis of Preferences

Assume that Y1, Y2, . . . , Ys are sequences of a system's characteristic behaviors and X1, X2, . . . , Xm are behavioral sequences of relevant factors. If the sequences Y1, Y2, . . . , Ys; X1, X2, . . . , Xm all have the same length, and γij, i = 1, 2, . . . , s; j = 1, 2, . . . , m, is the degree of grey incidence of Yi and Xj, then

\[
\Gamma = [\gamma_{ij}] =
\begin{bmatrix}
\gamma_{11} & \gamma_{12} & \cdots & \gamma_{1m}\\
\gamma_{21} & \gamma_{22} & \cdots & \gamma_{2m}\\
\cdots & \cdots & \cdots & \cdots\\
\gamma_{s1} & \gamma_{s2} & \cdots & \gamma_{sm}
\end{bmatrix}
\tag{40}
\]

is called the matrix of grey incidences. Similarly, matrices for the various generalized incidences can be introduced: when γij = εij, the absolute matrix of incidences A = [εij]s×m; when γij = rij, the relative matrix of incidences B = [rij]s×m; and when γij = ρij, the synthetic matrix of incidences C = [ρij]s×m are all well defined. By making use of these matrices of grey incidences, one can conduct a preference analysis of a system's behaviors or relevant factors.

Let Γ = [γij]s×m be a matrix of grey incidences, as in Eq. (40). If there exist k, i ∈ {1, 2, . . . , s} satisfying

γkj ≥ γij,  for j = 1, 2, . . . , m,

then the system's characteristic Yk is said to be more favorable than Yi, denoted Yk ≻ Yi. If Yk ≻ Yi holds for every i = 1, 2, . . . , s with i ≠ k, then Yk is said to be the most favorable characteristic. On the other hand, if there exist ℓ and j ∈ {1, 2, . . . , m} satisfying

γiℓ ≥ γij,  for i = 1, 2, . . . , s,

then the factor Xℓ is said to be more favorable than the factor Xj, denoted Xℓ ≻ Xj. If Xℓ ≻ Xj holds for every j = 1, 2, . . . , m with j ≠ ℓ, then Xℓ is seen as the most favorable factor. If there exist k, i ∈ {1, 2, . . . , s} satisfying

\[
\sum_{j=1}^{m} \gamma_{kj} \ge \sum_{j=1}^{m} \gamma_{ij},
\]

then the system's characteristic Yk is said to be more quasi-favorable than the characteristic Yi, denoted Yk ≽ Yi. If there exist ℓ, j ∈ {1, 2, . . . , m}
satisfying

\[
\sum_{i=1}^{s} \gamma_{i\ell} \ge \sum_{i=1}^{s} \gamma_{ij},
\]

then the factor Xℓ is said to be more quasi-favorable than the factor Xj, denoted Xℓ ≽ Xj. If there exists k ∈ {1, 2, . . . , s} such that Yk ≽ Yi for every i = 1, 2, . . . , s, then Yk is called a quasi-preferred characteristic of the system. If there exists ℓ ∈ {1, 2, . . . , m} such that Xℓ ≽ Xj for every j = 1, 2, . . . , m, then Xℓ is said to be a quasi-preferred factor.

Proposition 4.6. In a system with s characteristics and m relevant factors, the most favorable characteristic and the most favorable factor may not exist, but quasi-preferred characteristics and factors always exist.

Let us conclude this section with a real-life project in which we have been involved.

Example 4.1. In this project (Lin and Liu, 2000c), we look at a grey incidence analysis of an economy consisting of nongovernmental enterprises owned individually and collectively in Change County, Henan Province, The People's Republic of China. In recent years, these (nongovernmental) enterprises have developed rapidly; from 1983 to 1986, for example, their average annual growth was 51.6%. These enterprises occupied an important position in the overall picture of the region's economic development. In 1986, their revenue reached 35,388 (10,000 yuan), accounting for 60% of the total industrial and agricultural revenue of the county. How to speed up the development of these nongovernmental enterprises effectively, in order to help the region's economy take off from its historical ground, thus became a common concern of the county. Based on relevant analysis, it was known that these enterprises were dominated mainly by four factors: fixed capital, circulating capital, labor forces, and after-tax profits. The sequences of the production revenue and the relevant factors of this county's nongovernmental enterprises are given in Table 1.

1.
Compute the absolute degree of incidence. Let

\[
X_i^0 = \big(x_i(j) - x_i(1)\big)_{j=1}^{4} = \big(x_i^0(j)\big)_{j=1}^{4},
\qquad i = 0, 1, 2, 3, 4.
\]

Then
TABLE 1
NONGOVERNMENTAL ENTERPRISES

Factors                        1983     1984     1985     1986
X0 (production revenue)      10,155*  12,588   23,408   35,388
X1 (fixed capital)            3,799    3,605    5,460    6,982
X2 (circulating capital)      1,752    2,160    2,213    4,753
X3 (labor forces: person)    24,186   45,590   57,685   85,540
X4 (after-tax profits)        1,164    1,788    3,134    4,478

* 10,000 yuan as the unit.
X0^0 = (0, 2433, 13253, 25233),
X1^0 = (0, −194, 1661, 3183),
X2^0 = (0, 408, 461, 3001),
X3^0 = (0, 21404, 33499, 61354),
X4^0 = (0, 624, 1970, 3314).

Therefore,

|s0| = |2433 + 13253 + ½ · 25233| = 28302.5,
|s1| = |−194 + 1661 + ½ · 3183| = 3058.5,
|s2| = |408 + 461 + ½ · 3001| = 2369.5,
|s3| = |21404 + 33499 + ½ · 61354| = 85580,
|s4| = |624 + 1970 + ½ · 3314| = 4251.

So

|s1 − s0| = 25244,  |s2 − s0| = 25933,  |s3 − s0| = 57277.5,  |s4 − s0| = 24051.5,

and

ε01 = 0.554,  ε02 = 0.542,  ε03 = 0.665,  ε04 = 0.575.
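The computation above can be reproduced programmatically. The sketch below assumes the standard grey-systems formula ε0i = (1 + |s0| + |si|)/(1 + |s0| + |si| + |si − s0|), which this section uses implicitly; the function names are ours, not the authors'.

```python
# Sketch: absolute degree of grey incidence for Example 4.1.
# Assumes the standard grey-systems formula
#   eps = (1 + |s0| + |si|) / (1 + |s0| + |si| + |si - s0|),
# where s is the signed area under the zero-starting-point image.

def s_value(seq):
    """Signed area of the zero-starting-point image of seq:
    the sum of the interior entries plus half of the last entry."""
    z = [x - seq[0] for x in seq]
    return sum(z[1:-1]) + 0.5 * z[-1]

def absolute_degree(x0, xi):
    s0, si = s_value(x0), s_value(xi)
    return (1 + abs(s0) + abs(si)) / (1 + abs(s0) + abs(si) + abs(si - s0))

# Data from Table 1 (10,000 yuan, except X3 in persons).
X0 = [10155, 12588, 23408, 35388]   # production revenue
X1 = [3799, 3605, 5460, 6982]       # fixed capital
X2 = [1752, 2160, 2213, 4753]       # circulating capital
X3 = [24186, 45590, 57685, 85540]   # labor forces
X4 = [1164, 1788, 3134, 4478]       # after-tax profits

eps = [absolute_degree(X0, Xi) for Xi in (X1, X2, X3, X4)]
print([round(e, 3) for e in eps])   # [0.554, 0.542, 0.665, 0.575]
```

Running it directly from the raw Table 1 data reproduces the degrees of incidence to within rounding.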
2. Compute the relative degree of incidence. We first compute the initial images of Xi, i = 0, 1, 2, 3, 4. From

\[
X_i' = \frac{X_i}{x_i(1)} = \big(x_i'(j)\big)_{j=1}^{4},
\qquad i = 0, 1, 2, 3, 4,
\]

it follows that

X0′ = (1, 1.2396, 2.3051, 3.4848),
X1′ = (1, 0.9489, 1.4372, 1.8379),
X2′ = (1, 1.2329, 1.2631, 2.7129),
X3′ = (1, 1.8850, 2.3851, 3.5368),
X4′ = (1, 1.5361, 2.6924, 3.8471).

The zero-starting-point images of Xi′, i = 0, 1, 2, 3, 4, are given by

\[
X_i'^{\,0} = \big(x_i'(j) - x_i'(1)\big)_{j=1}^{4} = \big(x_i'^{\,0}(j)\big)_{j=1}^{4},
\]

so that

X0′0 = (0, 0.2396, 1.3051, 2.4848),
X1′0 = (0, −0.0511, 0.4372, 0.8379),
X2′0 = (0, 0.2329, 0.2631, 1.7129),
X3′0 = (0, 0.8850, 1.3851, 2.5368),
X4′0 = (0, 0.5361, 1.6924, 2.8471).

So

|s0′| = 2.7871,  |s1′| = 0.80505,  |s2′| = 1.35245,  |s3′| = 3.5385,  |s4′| = 3.65205;
|s1′ − s0′| = 1.98205,  |s2′ − s0′| = 1.43465,  |s3′ − s0′| = 0.7514,  |s4′ − s0′| = 0.86495,

and

r01 = 0.6985,  r02 = 0.7818,  r03 = 0.9070,  r04 = 0.8958.
3. Compute the synthetic degree of incidence. Take θ = 0.5. We have

ρ01 = 0.6263,  ρ02 = 0.6618,  ρ03 = 0.7862,  ρ04 = 0.7355.
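Steps 2 and 3 can be sketched the same way. The block below is self-contained (it repeats the absolute-degree helper from step 1); the relative degree applies the absolute-degree formula to the initial images Xi/xi(1), and the synthetic degree is the θ-blend of Eq. (39). Helper names are ours.

```python
# Sketch: relative and synthetic degrees of grey incidence for Example 4.1.

def s_value(seq):
    z = [x - seq[0] for x in seq]            # zero-starting-point image
    return sum(z[1:-1]) + 0.5 * z[-1]

def absolute_degree(x0, xi):
    s0, si = s_value(x0), s_value(xi)
    return (1 + abs(s0) + abs(si)) / (1 + abs(s0) + abs(si) + abs(si - s0))

def relative_degree(x0, xi):
    init = lambda seq: [x / seq[0] for x in seq]   # initial image X' = X / x(1)
    return absolute_degree(init(x0), init(xi))

def synthetic_degree(x0, xi, theta=0.5):           # Eq. (39)
    return theta * absolute_degree(x0, xi) + (1 - theta) * relative_degree(x0, xi)

X0 = [10155, 12588, 23408, 35388]
X1 = [3799, 3605, 5460, 6982]
X2 = [1752, 2160, 2213, 4753]
X3 = [24186, 45590, 57685, 85540]
X4 = [1164, 1788, 3134, 4478]

r = [round(relative_degree(X0, Xi), 4) for Xi in (X1, X2, X3, X4)]
rho = [round(synthetic_degree(X0, Xi), 4) for Xi in (X1, X2, X3, X4)]
print(r)     # [0.6985, 0.7818, 0.907, 0.8958]
print(rho)   # [0.6263, 0.6618, 0.7862, 0.7355]
```

Note that the ordering ρ03 > ρ04 > ρ02 > ρ01 used in the final analysis is unaffected by rounding.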
4. Final analysis. From ρ03 > ρ04 > ρ02 > ρ01, it follows that

X3 ≻ X4 ≻ X2 ≻ X1,

with X3 the most favorable factor, X4 second, X2 third, and X1 last. That is, labor forces have the greatest effect on the production revenue of the county, after-tax profits the second greatest, and fixed capital the least. This result agrees very well with the actual situation in the region, where the nongovernmental enterprises have been mainly labor-intensive, so that production growth has been realized mainly through increases in the labor force. The countryside of China holds a practically unlimited surplus of labor. Making sufficient and effective use of this supply of labor is, under China's current and special circumstances, the only way to develop its commodity production and achieve a prosperous economy. Therefore, active development of labor-intensive businesses is the main direction for the near-future development of Chinese nongovernmental enterprises. As for after-tax profits, they have been used mainly to improve employees' fringe benefits and for technological innovation. This has, on one hand, stimulated the employees' enthusiasm for efficiency and longer working hours and, on the other hand, increased the production capabilities of the businesses. For more discussion along the lines of grey incidence models, please see Xiao (1997).
V. CLUSTERING AND EVALUATIONS

Grey clustering is a method, based on matrices of grey incidences or on whitenization weight functions of grey numbers, for classifying observation indices or observational objects into definable classes. A cluster can be seen as the set of all observational objects placed in the same class. When grey incidences are used, factors of the same type are clustered together in order to simplify complicated systems; this belongs to the study of deleting variables from a system without altering its fundamental characteristics. Clustering with whitenization weight functions of grey classes is applied mainly to check whether or not an observational object belongs to a predefined class, so that objects outside the class can be treated differently.
Grey statistical evaluation is a method for checking, from the viewpoint of the whole system under consideration, to which of the predefined classes a set of same-class observational objects belongs, based on a comprehensive evaluation of the objects. It has been applied to the evaluation of plans for production investments, divisions of agricultural economic districts, community planning, teaching schedules, and so forth, and to the final determination of an optimal or satisfactory plan after statistical methods have been applied to the relevant data.

A. Two Practical Situations

This section lists two real-life problems faced in our work in the recent past.

Situation 5.1. A study involved three economic districts and three cluster criteria: revenue from farming, revenue from livestock husbandry, and revenue from industry. The observational values xij, i = 1, 2, 3; j = 1, 2, 3, of the ith economic district with respect to the jth criterion are given in the following matrix A:

\[
A = [x_{ij}]_{3\times 3} =
\begin{bmatrix}
80 & 20 & 100\\
40 & 30 & 30\\
10 & 90 & 60
\end{bmatrix}.
\]

The question is, based on the three income classes high, medium, and low: which economic district belongs to which class?

Situation 5.2. With the start of the reform of the infrastructure of science and technology in 1985, the operational mechanism and organizational structure of the system of science and technology in Henan Province, China, have experienced a major change (Liu, 1995a; Liu et al., 1999). Various scientific and technological resources have become more available, the overall provincial strength in science and technology has been improving constantly, and the relationship between science and technology and economic and social development has become more mature and harmonious. According to relevant statistics, over the last 5 years progress in science and technology has contributed over 44% to the overall economic growth of the province. For the year 1995, the recorded values of the criteria Xi, i = 1, 2, . . . , 21, collected in Henan Province, are given in Table 2. The criteria Xi, i = 1, 2, . . . , 21, are defined as follows. For the input in terms of science and technology:
TABLE 2
MATERIALIZED VALUES OF THE CRITERIA Xi

Symbol  X1      X2      X3     X4  X5  X6      X7
Value   95.700  104.53  7.800  8   7   3.3500  18.6

Symbol  X8   X9  X10  X11     X12   X13  X14
Value   7.9  22  199  9.7030  13.4  8    2.2440

Symbol  X15   X16    X17     X18  X19     X20  X21
Value   10.1  12.88  722.97  9.4  10.450  6.7  43.5
X1 stands for the number of scientists and technicians, with 10,000 as its unit;
X2 the average number of scientists and technicians per 10,000 people;
X3 the concentration of engineers, with % as its unit;
X4 R&D spending/GDP (gross domestic product), with % as its unit;
X5 spending in science, technology, and applications, with % as its unit;
X6 the average funding available to an individual scientist or technician, with 1,000 as its unit;
X7 the total value of equipment used in scientific research, with 1 billion as its unit;
X8 sales of scientific and technological books, with 1 billion as its unit; and
X9 the number of personal computers owned per 10,000 residents.

For the input in terms of activities related to science and technology, we have:

X10 the number of existing agencies engaged in activities of scientific research and application;
X11 the number of ongoing scientific research projects, with 1,000 as its unit;
X12 the number of people currently enrolled in college per 10,000 residents; and
X13 the average years of formal education of an individual in the work force.

In terms of the products produced as a consequence of applying progress in science and technology:

X14 the number of scientific achievements and patents, with 1,000 as its unit;
X15 the number of research papers published, in 1,000 articles;
X16 the monetary amount of commercial contracts signed in the technology market, in 1 billion;
X17 the monetary increase in the area of the manufacturing industry, in 10 billion;
X18 the concentration of high tech, with % as its unit;
X19 the amount of taxes collected relative to spending in industry;
X20 productivity, with 1,000 as its unit; and
X21 the percentage contribution made by progress in technology.

Now the questions that need to be addressed include the following: How is the comprehensive strength of Henan in the area of science and technology classified according to the three classes weak, medium, and strong? Which of the listed criteria (or indicators) have mainly affected the comprehensive strength of Henan?

B. Methods of Clustering

Assume that there exist n observational objects and that m characteristic data values have been collected for each of these objects. So, we have the sequences

\[
X_i = \big(x_i(1), x_i(2), \ldots, x_i(n)\big), \qquad i = 1, 2, \ldots, m,
\tag{41}
\]

where xi(j) is the value of the ith characteristic variable observed at the jth object. For all i ≤ j, i, j = 1, 2, . . . , m, calculating the absolute degree εij of incidence of Xi and Xj leads to the following upper triangular matrix A, called the incidence matrix of the characteristic variables:

\[
A =
\begin{bmatrix}
\varepsilon_{11} & \varepsilon_{12} & \cdots & \varepsilon_{1m}\\
 & \varepsilon_{22} & \cdots & \varepsilon_{2m}\\
 & & \ddots & \vdots\\
 & & & \varepsilon_{mm}
\end{bmatrix}
\tag{42}
\]

where εii = 1, i = 1, 2, . . . , m. Based on practical needs, take a fixed critical value r ∈ [0, 1], with the general requirement that r > 0.5. When εij ≥ r, i ≠ j, the variables Xi and Xj are treated as being of the same kind. Such a classification of variables is called a cluster of r incidences (Dong, 1995). The closer the value of r is to 1, the finer the classification, with fewer variables in each class; the smaller the value of r, the coarser the classification, with relatively more variables in each class.

More generally, assume that there are n objects to be clustered into s different grey classes according to m cluster criteria. The method that classifies the ith object, i = 1, 2, . . . , n, into the kth grey class, 1 ≤ k ≤ s, based on its observational value at the jth criterion, j = 1, 2, . . . , m, is called grey clustering (Zhang et al., 1994a, 1994b). Let the whitenization weight function of the kth subclass of the j-criterion be f_j^k(·), with its graph shown in Figure 2.

FIGURE 2. A typical whitenization weight function.

Consider the following four cases:

1. The whitenization weight function f_j^k(·) has the typical form of Figure 2, with four turning points x_j^k(1), x_j^k(2), x_j^k(3), and x_j^k(4);
2. f_j^k(·) does not have the first and second turning points x_j^k(1) and x_j^k(2);
3. The second turning point x_j^k(2) and the third turning point x_j^k(3) coincide;
4. f_j^k(·) does not have the third and fourth turning points x_j^k(3) and x_j^k(4).

Now define the critical value λ_j^k for the kth subclass of the j-criterion as follows. For case 1,

\[
\lambda_j^k = \tfrac{1}{2}\big(x_j^k(2) + x_j^k(3)\big);
\]

for case 2, λ_j^k = x_j^k(3); and for cases 3 and 4, λ_j^k = x_j^k(2). Then the weight of the j-criterion with respect to the kth subclass is defined by

\[
\eta_j^k = \frac{\lambda_j^k}{\sum_{j=1}^{m} \lambda_j^k}.
\tag{43}
\]
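The cluster of r incidences described earlier in this subsection can be sketched as a greedy grouping: each characteristic variable joins the first class whose representative it matches to a degree of at least r. The absolute-degree formula below is the standard grey-systems one, and the three sample sequences are invented purely for illustration.

```python
# Sketch: clustering characteristic variables by absolute grey incidence.
# Variables whose pairwise degree of incidence reaches a critical value r
# are treated as being of the same kind; the sample data are invented.

def s_value(seq):
    z = [x - seq[0] for x in seq]           # zero-starting-point image
    return sum(z[1:-1]) + 0.5 * z[-1]

def absolute_degree(xi, xj):
    si, sj = s_value(xi), s_value(xj)
    return (1 + abs(si) + abs(sj)) / (1 + abs(si) + abs(sj) + abs(sj - si))

def cluster_by_incidence(seqs, r=0.9):
    """Greedy r-incidence clustering: X_j joins the first class whose
    representative X_i satisfies eps_ij >= r, else it starts a new class."""
    classes = []
    for j, xj in enumerate(seqs):
        for cls in classes:
            if absolute_degree(seqs[cls[0]], xj) >= r:
                cls.append(j)
                break
        else:
            classes.append([j])
    return classes

variables = [
    [1, 2, 3, 4],      # X1
    [2, 3, 4, 5],      # X2: parallel to X1, so eps = 1
    [1, 5, 2, 8],      # X3: differently shaped
]
print(cluster_by_incidence(variables, r=0.9))   # [[0, 1], [2]]
```

As Theorem 4.4 predicts, the parallel pair clusters together (ε = 1), while the differently shaped sequence starts its own class.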
Then

\[
\sigma_i^k = \sum_{j=1}^{m} f_j^k\big(x_i(j)\big)\,\eta_j^k
\tag{44}
\]

is called the variable weight cluster coefficient for object i to belong to the kth grey class, σi = (σi^1, σi^2, . . . , σi^s) the cluster coefficient vector of object i, and Σ = [σi^k]n×s the cluster coefficient matrix. If

\[
\sigma_i^{k^*} = \max_{1\le k\le s}\{\sigma_i^k\},
\tag{45}
\]
then object i belongs to the grey class k*. When the clustering criteria have different meanings, dimensions, and sizes of observational data, applying variable weight clustering may lead to the problem that some criteria participate in the clustering process only very weakly. There are two ways to resolve this problem. One is first to transform the sample data values of the criteria into nondimensional values, using either the initiating operator or the averaging operator, and then to cluster the resulting criteria; in this way, all the clustering criteria are treated equally in the clustering process. The other is to assign a weight to each individual criterion before starting the clustering process. Let us now examine this second method.

In the discussion above, if the weight η_j^k of the j-criterion with respect to the kth subclass has nothing to do with k, j = 1, 2, . . . , m; k = 1, 2, . . . , s, that is, if for any k1, k2 ∈ {1, 2, . . . , s} one always has η_j^{k1} = η_j^{k2}, then the superscript k in the symbol η_j^k can be omitted and η_j^k written simply as η_j, j = 1, 2, . . . , m. Then

\[
\sigma_i^k = \sum_{j=1}^{m} f_j^k(x_{ij})\,\eta_j
\tag{46}
\]

is called the fixed weight cluster coefficient for object i to belong to the kth grey class. If η_j = 1/m holds for every j = 1, 2, . . . , m, then

\[
\sigma_i^k = \sum_{j=1}^{m} f_j^k(x_{ij})\,\eta_j = \frac{1}{m}\sum_{j=1}^{m} f_j^k(x_{ij})
\tag{47}
\]

is called the equal weight cluster coefficient for object i to belong to the kth grey class.

Example 5.1. Let us conclude this subsection by working through Situation 5.1 (Li, 1990b; Qiu, 1995). Assume that the whitenization weight functions f_j^k(·),
j = 1, 2, 3; k = 1, 2, 3, for the three criteria (the revenues from farming, livestock husbandry, and industry) are as shown in Figures 3, 4, and 5.

FIGURE 3. The whitenization weight functions for the revenue from farming.

FIGURE 4. The whitenization weight functions for the revenue from livestock husbandry.

FIGURE 5. The whitenization weight functions for the revenue from industry.
From these figures, it follows that

\[
f_1^1(x) = \begin{cases} 0, & x < 0\\ x/80, & 0 \le x < 80\\ 1, & x \ge 80, \end{cases}
\qquad
f_1^2(x) = \begin{cases} 0, & x < 0\\ x/40, & 0 \le x \le 40\\ (80 - x)/40, & 40 < x \le 80\\ 0, & x > 80, \end{cases}
\]

\[
f_1^3(x) = \begin{cases} 0, & x < 0\\ 1, & 0 \le x \le 10\\ (20 - x)/10, & 10 < x \le 20\\ 0, & x > 20, \end{cases}
\qquad
f_2^1(x) = \begin{cases} 0, & x < 0\\ x/90, & 0 \le x < 90\\ 1, & x \ge 90, \end{cases}
\]

\[
f_2^2(x) = \begin{cases} 0, & x < 0\\ x/45, & 0 \le x \le 45\\ (90 - x)/45, & 45 < x \le 90\\ 0, & x > 90, \end{cases}
\qquad
f_2^3(x) = \begin{cases} 0, & x < 0\\ 1, & 0 \le x \le 15\\ (30 - x)/15, & 15 < x \le 30\\ 0, & x > 30, \end{cases}
\]

\[
f_3^1(x) = \begin{cases} 0, & x < 0\\ x/100, & 0 \le x < 100\\ 1, & x \ge 100, \end{cases}
\qquad
f_3^2(x) = \begin{cases} 0, & x < 0\\ x/50, & 0 \le x \le 50\\ (100 - x)/50, & 50 < x \le 100\\ 0, & x > 100, \end{cases}
\]

and

\[
f_3^3(x) = \begin{cases} 0, & x < 0\\ 1, & 0 \le x \le 20\\ (40 - x)/20, & 20 < x \le 40\\ 0, & x > 40. \end{cases}
\]
Therefore,

λ_1^1 = 80,  λ_2^1 = 90,  λ_3^1 = 100;
λ_1^2 = 40,  λ_2^2 = 45,  λ_3^2 = 50;
λ_1^3 = 10,  λ_2^3 = 15,  λ_3^3 = 20;

and

η_1^1 = 80/270,  η_2^1 = 90/270,  η_3^1 = 100/270;
η_1^2 = 40/135,  η_2^2 = 45/135,  η_3^2 = 50/135;
η_1^3 = 10/45,   η_2^3 = 15/45,   η_3^3 = 20/45.
From

\[
\sigma_i^k = \sum_{j=1}^{3} f_j^k(x_{ij})\,\eta_j^k,
\]

it follows that, for i = 1,

σ_1^1 = 0.74,  σ_1^2 = 0.15,  σ_1^3 = 0.22,

that is, σ1 = (σ_1^1, σ_1^2, σ_1^3) = (0.74, 0.15, 0.22). For i = 2 and 3, we similarly obtain

σ2 = (σ_2^1, σ_2^2, σ_2^3) = (0.37, 0.74, 0.22)

and

σ3 = (σ_3^1, σ_3^2, σ_3^3) = (0.59, 0.37, 0.22).
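These cluster coefficients can be checked with a short script built directly from the piecewise-linear whitenization weight functions and the weights η given above; the generic shape constructors (`upper`, `middle`, `lower`) are our naming, not the book's. Note that, under the stated functions and weights, the recomputation gives σ_3^2 = 0.37 for the third district's medium class.

```python
# Sketch: variable weight grey clustering for the three economic districts
# of Situation 5.1 (Eq. (44)), with the piecewise-linear whitenization
# weight functions of Figures 3-5.

def upper(b):     # "high" class: x/b on [0, b), then 1; 0 for x < 0
    return lambda x: 0.0 if x < 0 else (x / b if x < b else 1.0)

def middle(b):    # "medium" class: triangle peaking at b, zero at 0 and 2b
    return lambda x: 0.0 if x < 0 or x > 2 * b else (x / b if x <= b else (2 * b - x) / b)

def lower(b):     # "low" class: 1 on [0, b], falling to 0 at 2b
    return lambda x: 0.0 if x < 0 or x > 2 * b else (1.0 if x <= b else (2 * b - x) / b)

f = [[upper(80), middle(40), lower(10)],     # farming:   f_1^1, f_1^2, f_1^3
     [upper(90), middle(45), lower(15)],     # livestock: f_2^1, f_2^2, f_2^3
     [upper(100), middle(50), lower(20)]]    # industry:  f_3^1, f_3^2, f_3^3

lam = [[80, 90, 100], [40, 45, 50], [10, 15, 20]]   # lambda by class k, criterion j
eta = [[l / sum(row) for l in row] for row in lam]  # eta[k][j], Eq. (43)

A = [[80, 20, 100], [40, 30, 30], [10, 90, 60]]     # observations x_ij

sigma = [[sum(f[j][k](A[i][j]) * eta[k][j] for j in range(3)) for k in range(3)]
         for i in range(3)]
for i, row in enumerate(sigma, start=1):
    print(f"sigma_{i} =", [round(v, 2) for v in row])
```

The maxima by row then give the classification stated below: district 1 and district 3 fall into the first (high-income) class, district 2 into the second (medium-income) class.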
So, the cluster coefficient matrix is

\[
\Sigma = [\sigma_i^k]_{3\times 3} =
\begin{bmatrix}
0.74 & 0.15 & 0.22\\
0.37 & 0.74 & 0.22\\
0.59 & 0.37 & 0.22
\end{bmatrix}.
\]

From

max_{1≤k≤3}{σ_1^k} = σ_1^1 = 0.74,  max_{1≤k≤3}{σ_2^k} = σ_2^2 = 0.74,  max_{1≤k≤3}{σ_3^k} = σ_3^1 = 0.59,

it follows that the second economic district belongs to the medium-income grey class, while the first and third economic districts belong to the high-income grey class. Furthermore, from the cluster coefficients σ_1^1 = 0.74 and σ_3^1 = 0.59, it follows that some differences still exist between the first and the third districts, even though both belong to the high-income grey class. If the income grey classes are refined (say, into five classes: high, mid-high, medium, mid-low, and low income), then different results can be obtained.

C. Grey Statistics

Assume that n objects have been clustered into s different grey classes according to m evaluation criteria. Let xij, i = 1, 2, . . . , n, j = 1, 2, . . . , m, be the observational value of object i in terms of criterion j. The task now is to apply the values xij, j = 1, 2, . . . , m, to evaluate and analyze object i, i = 1, 2, . . . , n. To this end, only the following steps are needed (Liu and Zhu, 1993; Luo and Yang, 1994):

Step 1. Based on the predetermined number s of grey classes for the planned evaluation, divide the individual ranges of the criteria into s grey classes. For example, let [a1, a_{s+1}] be the range of values of criterion j, and divide it into s grey classes as follows:

[a1, a2], . . . , [ak, a_{k+1}], . . . , [a_{s−1}, a_s], [a_s, a_{s+1}],

where the ak, k = 1, 2, . . . , s, can in general be determined from the specific requirements of the situation or from relevant qualitative analysis.

Step 2. Let the whitenization weight function value for (ak + a_{k+1})/2 to belong to the kth grey class be 1. When the point ((ak + a_{k+1})/2, 1) is connected to the starting point a_{k−1} of the (k − 1)th grey class and to the ending point a_{k+2} of the (k + 1)th grey class, a triangular whitenization weight function f_j^k(·) is
FIGURE 6. Construction of triangular whitenization weight functions.
obtained for criterion j and the kth grey class, j = 1, 2, . . . , m, k = 1, 2, . . . , s. For f_j^1(·) and f_j^s(·), the range of criterion j can be extended to the left and to the right, to a0 and a_{s+2}, respectively (see Figure 6 for more details). For any observed value x of criterion j, one can use

\[
f_j^k(x) =
\begin{cases}
0, & x \notin [a_{k-1}, a_{k+2}]\\[2pt]
\dfrac{x - a_{k-1}}{\lambda_k - a_{k-1}}, & x \in [a_{k-1}, \lambda_k]\\[2pt]
\dfrac{a_{k+2} - x}{a_{k+2} - \lambda_k}, & x \in [\lambda_k, a_{k+2}]
\end{cases}
\tag{48}
\]

to compute the degree of membership f_j^k(x) of x in the kth grey class, k = 1, 2, . . . , s, where λk = (ak + a_{k+1})/2.

Step 3. Compute the cluster coefficient σ_i^k for object i, i = 1, 2, . . . , n, in terms of the kth grey class, k = 1, 2, . . . , s:

\[
\sigma_i^k = \sum_{j=1}^{m} f_j^k(x_{ij})\,\eta_j,
\tag{49}
\]

where f_j^k(x_{ij}) is the whitenization weight function value for object i to belong to the kth grey class under criterion j, and η_j is the weight of criterion j in the clustering.

Step 4. If max_{1≤k≤s}{σ_i^k} = σ_i^{k*}, then object i belongs to the k*th grey class. When more than one object belongs to the k*th grey class, the order of preference among these objects can be determined further from the magnitudes of their individual cluster coefficients.

Next, consider how the criteria used in a study can be clustered. Assume that there exist n statistical objects and m statistical criteria and that the criteria are to be clustered into s grey classes. Classifying criterion j into the kth grey
class, k = 1, 2, . . . , s, based on the observational values xij of the n objects with respect to the jth criterion, i = 1, 2, . . . , n, j = 1, 2, . . . , m, is called grey statistics. Let f^k(·), k = 1, 2, . . . , s, be the whitenization weight function of the kth grey class, and let ηi, i = 1, 2, . . . , n, be the weight of object i, satisfying Σ_{i=1}^{n} ηi = 1.

1. When ηi = 1/n, i = 1, 2, . . . , n,

\[
\sigma_j^k = \frac{\sum_{i=1}^{n} f^k(x_{ij})\,\eta_i}{\sum_{k=1}^{s}\sum_{i=1}^{n} f^k(x_{ij})\,\eta_i},
\qquad j = 1, 2, \ldots, m;\ k = 1, 2, \ldots, s,
\tag{50}
\]

is called the statistical coefficient of equal weight objects.

2. When there are i1, i2 ∈ {1, 2, . . . , n} such that η_{i1} ≠ η_{i2}, the same expression

\[
\sigma_j^k = \frac{\sum_{i=1}^{n} f^k(x_{ij})\,\eta_i}{\sum_{k=1}^{s}\sum_{i=1}^{n} f^k(x_{ij})\,\eta_i},
\qquad j = 1, 2, \ldots, m;\ k = 1, 2, \ldots, s,
\tag{51}
\]

is called the statistical coefficient of unequal weight objects.

The vector σj = (σj^1, σj^2, . . . , σj^s), j = 1, 2, . . . , m, and the matrix Σ = [σj^k]m×s are called the vector of statistical coefficients of criterion j and the matrix of statistical coefficients, respectively. If σj^{k*} = max_{1≤k≤s}{σj^k}, then the jth statistical criterion, seen from the whole of the system under consideration, belongs to the grey class k*.

When the statistical objects under consideration are different economic districts, or different departments of a government entity related to economic development, and the economic criteria are different business types, grey statistics can be applied to analyze and synthetically evaluate groups of economic bodies with respect to the different business types, in order to decide which business types warrant more attention. When the statistical objects are different sectors of a decision-making unit and the statistical criteria are different decision-making plans, the relevant grey statistics can synthesize the ideas of the different sectors, evaluate all the decision-making plans, and select the optimal plan.

Example 5.2. As a test of our work, the rest of this section finishes Situation 5.2 (Li, 1990a; Lin and Liu, 1999a, 2000a). Based on a combination of various survey methods, we organized the index system for evaluating regional strength in science and technology as shown in Table 3. Based on Table 3, the general form of the triangular whitenization weight functions for the criteria Xi, i = 1, 2, . . . , 21, is shown in Figure 7.
TABLE 3
INDEX SYSTEM FOR EVALUATING REGIONAL STRENGTH

Symbol  Weight  Weak class        Medium class       Strong class
X1      8       2 ≤ x < 20        20 ≤ x < 70        70 ≤ x < 110
X2      5       80 ≤ x < 150      150 ≤ x < 250      250 ≤ x < 500
X3      6       5 ≤ x < 12        12 ≤ x < 20        20 ≤ x < 30
X4      5       3 ≤ x < 6         6 ≤ x < 10         10 ≤ x < 15
X5      5       2 ≤ x < 5         5 ≤ x < 8          8 ≤ x < 12
X6      4       1 ≤ x < 2         2 ≤ x < 5          5 ≤ x < 12
X7      4       5 ≤ x < 10        10 ≤ x < 20        20 ≤ x < 40
X8      4       0.5 ≤ x < 2.5     2.5 ≤ x < 5.5      5.5 ≤ x < 9
X9      4       8 ≤ x < 12        12 ≤ x < 20        20 ≤ x < 50
X10     4       30 ≤ x < 100      100 ≤ x < 200      200 ≤ x < 350
X11     6       2 ≤ x < 4         4 ≤ x < 7          7 ≤ x < 15
X12     5       8 ≤ x < 15        15 ≤ x < 25        25 ≤ x < 50
X13     5       4 ≤ x < 6         6 ≤ x < 9          9 ≤ x < 11
X14     5       0.5 ≤ x < 1       1 ≤ x < 2          2 ≤ x < 4
X15     5       3 ≤ x < 5         5 ≤ x < 8          8 ≤ x < 15
X16     5       2 ≤ x < 5         5 ≤ x < 10         10 ≤ x < 20
X17     3       5 ≤ x < 20        20 ≤ x < 60        60 ≤ x < 120
X18     4       5 ≤ x < 8         8 ≤ x < 12         12 ≤ x < 20
X19     4       2 ≤ x < 5         5 ≤ x < 8          8 ≤ x < 11
X20     3       3 ≤ x < 6         6 ≤ x < 9          9 ≤ x < 15
X21     6       25 ≤ x < 35       35 ≤ x < 45        45 ≤ x < 55

FIGURE 7. The general form of the triangular whitenization weight functions in science and technology.
127
A GREY SYSTEMS APPROACH
Here $x_j^0$ and $x_j^5$ represent expanded values of the range of values of the criteria $X_j$, $j = 1, 2, \ldots, 21$. For an observational value $x$ of criterion $j$, we can compute the whitenization weight function value $f_j^k(x)$ for the $k$th grey class, $k = 1, 2, 3$, as follows:

$$
f_j^k(x) =
\begin{cases}
0, & x \notin \left[x_j^{k-1}, x_j^{k+2}\right] \\[4pt]
\dfrac{x - x_j^{k-1}}{\lambda_j^k - x_j^{k-1}}, & x \in \left[x_j^{k-1}, \lambda_j^k\right] \\[4pt]
\dfrac{x_j^{k+2} - x}{x_j^{k+2} - \lambda_j^k}, & x \in \left[\lambda_j^k, x_j^{k+2}\right].
\end{cases}
\tag{52}
$$

For example, when $j = 1$, we expand the range of values for criterion $X_1$, the number of scientists and technicians, to $x_1^0 = 0.5$ and $x_1^5 = 160$. In this case, $x_1^1$, $x_1^2$, $x_1^3$, and $x_1^4$ are, respectively, the threshold values for the three grey classes, weak, medium, and strong; that is, $x_1^1 = 2$, $x_1^2 = 20$, $x_1^3 = 70$, and $x_1^4 = 110$. Now, we let $\lambda_1^k$ be the average value of $x_1^k$ and $x_1^{k+1}$, that is,

$$\lambda_1^1 = \tfrac{1}{2}\left(x_1^1 + x_1^2\right) = 11, \qquad \lambda_1^2 = \tfrac{1}{2}\left(x_1^2 + x_1^3\right) = 45, \qquad \lambda_1^3 = \tfrac{1}{2}\left(x_1^3 + x_1^4\right) = 90.$$

Substituting these specific values into Eq. (52) leads to the following triangular whitenization weight functions for the case $j = 1$:

$$
f_1^1(x) =
\begin{cases}
0, & x \notin [0.5, 70] \\[4pt]
\dfrac{x - 0.5}{11 - 0.5}, & x \in [0.5, 11] \\[4pt]
\dfrac{70 - x}{70 - 11}, & x \in [11, 70],
\end{cases}
\tag{53}
$$

$$
f_1^2(x) =
\begin{cases}
0, & x \notin [2, 110] \\[4pt]
\dfrac{x - 2}{45 - 2}, & x \in [2, 45] \\[4pt]
\dfrac{110 - x}{110 - 45}, & x \in [45, 110],
\end{cases}
\tag{54}
$$
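As a concrete check of Eqs. (52)–(54), the piecewise-linear form can be coded directly. This is an illustrative sketch, not code from the text (the function name `tri_wwf` is ours); the three grey classes of criterion X1 use the thresholds quoted above, with the strong class built from $x_1^2 = 20$, $\lambda_1^3 = 90$, and $x_1^5 = 160$ following the same pattern.

```python
def tri_wwf(x, lo, peak, hi):
    """Triangular whitenization weight function of Eq. (52):
    0 outside [lo, hi], rising linearly on [lo, peak], falling on [peak, hi]."""
    if x < lo or x > hi:
        return 0.0
    if x <= peak:
        return (x - lo) / (peak - lo)
    return (hi - x) / (hi - peak)

# Grey classes of criterion X1 (number of scientists and technicians):
f1_weak   = lambda x: tri_wwf(x, 0.5, 11, 70)    # Eq. (53)
f1_medium = lambda x: tri_wwf(x, 2.0, 45, 110)   # Eq. (54)
f1_strong = lambda x: tri_wwf(x, 20.0, 90, 160)  # same pattern, k = 3
```

Evaluating these at the 1995 Henan value $x_1 = 95.70$ reproduces the weight values quoted later in the text: 0, 0.22, and roughly 0.919.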
LIN AND LIU

TABLE 4. EXPANDED VALUES FOR THE CRITERIA X1–X21. [Table: the expanded lower and upper values $x_j^0$ and $x_j^5$ of the range of each criterion; for example, $x_1^0 = 0.5$ and $x_1^5 = 160$ for X1.]
and

$$
f_1^3(x) =
\begin{cases}
0, & x \notin [20, 160] \\[4pt]
\dfrac{x - 20}{90 - 20}, & x \in [20, 90] \\[4pt]
\dfrac{160 - x}{160 - 90}, & x \in [90, 160].
\end{cases}
\tag{55}
$$

After substituting $x_1 = 95.70$, the value materialized in 1995 in Henan Province, into Eqs. (53), (54), and (55), we obtain the whitenization weight values of the criterion $X_1$ in terms of the three grey classes, weak, medium, and strong, as follows:

$$f_1^1(95.70) = 0, \qquad f_1^2(95.70) = 0.22, \qquad f_1^3(95.70) = 0.919.$$

From these values, it can be seen that, in terms of the number of scientists and technicians, Henan Province has entered the group of provinces with relatively strong scientific and technological strength. Table 4 lists all the expanded values for the criteria X1–X21, and Table 5 lists the relevant whitenization weight function values regarding the three grey classes: weak, medium, and strong. From Eq. (49), we can compute the cluster coefficients $\sigma_{HN}^k$, $k = 1, 2, 3$, for the provincial strength of Henan in areas of science and technology in terms of the three grey classes as follows:

$$\sigma_{HN}^1 = \sum_{j=1}^{21} f_j^1(x_j) \cdot \eta_j = 26.67,$$
TABLE 5. WHITENIZATION WEIGHT FUNCTION VALUES REGARDING THE THREE GREY CLASSES: WEAK, MEDIUM, AND STRONG. [Table: the values $f_j^1(x)$, $f_j^2(x)$, and $f_j^3(x)$ for each criterion X1–X21; for example, for X1: $f_1^1 = 0$, $f_1^2 = 0.22$, $f_1^3 = 0.919$.]
$$\sigma_{HN}^2 = \sum_{j=1}^{21} f_j^2(x_j) \cdot \eta_j = 61.261,$$

and

$$\sigma_{HN}^3 = \sum_{j=1}^{21} f_j^3(x_j) \cdot \eta_j = 48.157.$$
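The cluster-coefficient step of Eq. (49) is just a weighted sum per grey class followed by picking the largest coefficient. The sketch below is illustrative (the function names are ours) and assumes the whitenization values $f_j^k(x_j)$ and the weights $\eta_j$ have already been computed:

```python
def cluster_coefficients(f_values, weights):
    """sigma^k = sum_j f_j^k(x_j) * eta_j for each grey class k (Eq. (49)).
    f_values[k][j] is the whitenization weight of criterion j in class k."""
    return [sum(f * w for f, w in zip(row, weights)) for row in f_values]

def winning_class(f_values, weights):
    """Index of the grey class with the largest cluster coefficient."""
    sigmas = cluster_coefficients(f_values, weights)
    return max(range(len(sigmas)), key=sigmas.__getitem__)
```

In the Henan example the three rows of `f_values` would be the weak, medium, and strong rows of Table 5 over the 21 criteria, and the medium class (index 1) wins with $\sigma^2 = 61.261$.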
From the fact that $\max\{\sigma_{HN}^k : 1 \le k \le 3\} = 61.261 = \sigma_{HN}^2$, it can be seen that the comprehensive strength of Henan Province in areas of science and technology belongs to the grey class "medium." However, the value $\sigma_{HN}^3 = 48.157$ is relatively close to $\sigma_{HN}^2$, so the overall strength of Henan Province is approaching the grey class "strong." This finding agrees well with the fact that in recent years Henan Province has been recognized as being above the medium national level in terms of strength in areas of science and technology. Table 5 also shows that the strength of Henan Province has been held back by such factors as X2, X3, X12, and X20, which are all per capita averages, as well as by the factor measuring the concentration of high technology. So, in order to strengthen its position relative to other provinces and major cities in China, Henan Province needs to emphasize improving its per capita averages, and it needs to apply effective means to speed up the development of high technology in the province. Henan Province is the largest province in China in terms of population, with a relatively weak foundation in areas such as high technology. Improving its per capita averages and developing high-tech manufacturing industry will not be a matter of days; it will possibly take years to accomplish such a goal.
VI. LAW OF EXPONENTIALITY AND PREDICTIONS
This section will show how the idea of sequence operators, as studied in Section III, can be applied to uncover some fundamental patterns existing in sequences of data. After establishing the so-called law of exponentiality, we will see how this law can be handily applied to make predictions for time series data.
A. Model GM(1, 1)

Given a sequence of data $X^{(0)} = (x_1^{(0)}, x_2^{(0)}, \ldots, x_n^{(0)})$, let $D$ be the accumulating operator as studied in Section III. We have $D(X^{(0)}) = X^{(1)} = (x_1^{(1)}, x_2^{(1)}, \ldots, x_n^{(1)})$ and $Z^{(1)} = (z_1^{(1)}, z_2^{(1)}, \ldots, z_n^{(1)})$, where $z_1^{(1)} = x_1^{(1)}$ and $z_k^{(1)} = 0.5 x_k^{(1)} + 0.5 x_{k-1}^{(1)}$, for $k = 2, 3, \ldots, n$, called the mean sequence generated from consecutive neighbors of $X^{(1)}$. Now, the so-called GM(1, 1) model, meaning a grey model of the first order in one variable, is the following equation:

$$x_k^{(0)} + a z_k^{(1)} = b, \tag{56}$$

where $a$ and $b$ are constants to be determined. The basic idea is that we would like to use a solution of the differential equation $\frac{dx}{dt} + ax = b$ to simulate the pattern of $X^{(0)}$. For more detailed and relevant studies, please see Ci (1995), Guo (1991), Hao and Wang (2000), He (1997), Lin et al. (2000, 2002), Liu and Deng (1999), Mu and Liu (1996), Shui and Qing (1998), Wang et al. (2001), Wen et al. (2001), and Yang et al. (1998).

Theorem 6.1 (Law of Exponentiality). Assume that $X^{(0)}$ is a nonnegative sequence, that is, $x_k^{(0)} \ge 0$, $k = 1, 2, \ldots, n$, with $X^{(1)}$ and $Z^{(1)}$ as above.
(1) If $\hat a = [a, b]^T$ is the sequence of parameters and

$$
Y = \begin{bmatrix} x^{(0)}(2) \\ x^{(0)}(3) \\ \vdots \\ x^{(0)}(n) \end{bmatrix},
\qquad
B = \begin{bmatrix} -z^{(1)}(2) & 1 \\ -z^{(1)}(3) & 1 \\ \vdots & \vdots \\ -z^{(1)}(n) & 1 \end{bmatrix},
$$

then the least squares estimate of the parameters of Eq. (56) satisfies

$$\hat a = \left(B^T B\right)^{-1} B^T Y. \tag{57}$$

(2) Let $B$ and $Y$ be the same as in (1) and $\hat a$ be given by Eq. (57). Then the following hold true:

a. The solution of the whitenization equation $\frac{dx}{dt} + ax = b$ of the grey Eq. (56) is given by

$$x^{(1)}(t) = \left(x^{(1)}(0) - \frac{b}{a}\right) e^{-at} + \frac{b}{a}. \tag{58}$$

b. The simulated sequence $\widehat X^{(1)}$ of $X^{(1)}$ out of the GM(1, 1) model in Eq. (56) is given by

$$\hat x^{(1)}(k+1) = \left(x^{(1)}(0) - \frac{b}{a}\right) e^{-ak} + \frac{b}{a}, \quad k = 1, 2, \ldots, n. \tag{59}$$

c. By letting $x^{(1)}(0) = x^{(0)}(1)$, one has

$$\hat x^{(1)}(k+1) = \left(x^{(0)}(1) - \frac{b}{a}\right) e^{-ak} + \frac{b}{a}, \quad k = 1, 2, \ldots, n. \tag{60}$$

d. The restored values of the $x^{(0)}(k)$ are given by

$$\hat x^{(0)}(k+1) = \alpha^{(1)} \hat x^{(1)}(k+1) = \hat x^{(1)}(k+1) - \hat x^{(1)}(k), \quad k = 1, 2, \ldots, n. \tag{61}$$

Theorem 6.1(1) is the basis for the construction of each GM(1, 1) model. Since this result is established using vertical distances of data, it is reasonable to expect better simulation results if true distances of data are applied. To this end, the following theorem is similar to Theorem 6.1(1), but it is based on the least sum of squared true distances between the data points in $X^{(0)}$ and a special exponential curve satisfying a differential equation of the form $\frac{dx}{dt} + ax = b$.
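The whole of Theorem 6.1 — the least squares step of Eq. (57) followed by the time response and restoration steps of Eqs. (60)–(61) — fits in a few lines of Python with NumPy. This is an illustrative sketch, not code from the text:

```python
import numpy as np

def gm11_fit(x0):
    """Least squares estimate [a, b] of the GM(1,1) equation
    x(0)(k) + a z(1)(k) = b, via a_hat = (B^T B)^{-1} B^T Y (Eq. (57))."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                    # accumulating operator D
    z1 = 0.5 * (x1[1:] + x1[:-1])         # mean sequence of consecutive neighbors
    B = np.column_stack([-z1, np.ones_like(z1)])
    Y = x0[1:]
    a, b = np.linalg.lstsq(B, Y, rcond=None)[0]
    return a, b

def gm11_predict(x0, steps=0):
    """Restored values via Eqs. (60)-(61); returns len(x0) + steps values."""
    x0 = np.asarray(x0, dtype=float)
    a, b = gm11_fit(x0)
    k = np.arange(len(x0) + steps)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a      # Eq. (60)
    return np.concatenate([[x1_hat[0]], np.diff(x1_hat)])  # Eq. (61)
```

For data that are exactly exponential the model is recovered almost perfectly, which is the content of the law of exponentiality; for real data, the restored values in positions $1, \ldots, n$ serve as the accuracy check before the extrapolated values are trusted.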
Theorem 6.2. Assume that the sequence $X^{(0)}$ is nonnegative. If $\hat a = [a, b]^T$ is the estimate, based on the least sum of squared true distances, of the parameters $[\alpha, \beta]^T$ such that a curve satisfying $\frac{dx}{dt} + \alpha x = \beta$ provides the best true-distance fit of $X^{(0)}$, then $\hat a$ satisfies

$$
\hat a = \frac{{}^{(1,1)}\!A^{(0)} X^{(0)} A C \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} {}^{(0,1)}\!A^{(0)} X^{(0)} A B}{\det\left({}^{(1,1)}\!A^{(0)} X^{(0)} A C\right)}, \tag{62}
$$

where, for $i = 1, 2, \ldots, n$,

$$
{}^{(r,j)}\!A = \begin{bmatrix} j & j & \cdots & j \\ x_1^{(r)} & x_2^{(r)} & \cdots & x_n^{(r)} \end{bmatrix}_{2 \times n},
\qquad
{}_i A^{(0)} = \begin{bmatrix} 1 & x_i^{(0)} \end{bmatrix}_{1 \times 2},
$$

$$
X^{(0)} = \operatorname{diag}\left(x_1^{(0)}, x_2^{(0)}, \ldots, x_n^{(0)}\right)
= \begin{bmatrix} x_1^{(0)} & 0 & \cdots & 0 \\ 0 & x_2^{(0)} & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & \cdots & x_n^{(0)} \end{bmatrix}_{n \times n},
$$

$$
A = \operatorname{diag}\left({}_1 A^{(0)}, {}_2 A^{(0)}, \ldots, {}_n A^{(0)}\right)
= \begin{bmatrix} 1 & x_1^{(0)} & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & x_2^{(0)} & \cdots & 0 & 0 \\ \cdot & \cdot & \cdot & \cdot & \cdots & \cdot & \cdot \\ 0 & 0 & 0 & 0 & \cdots & 1 & x_n^{(0)} \end{bmatrix}_{n \times 2n},
$$

$$
B = \left[{}_1 A^{(0)}, {}_2 A^{(0)}, \ldots, {}_n A^{(0)}\right]^T_{1 \times 2n}
= \begin{bmatrix} 1 & x_1^{(0)} & 1 & x_2^{(0)} & \cdots & 1 & x_n^{(0)} \end{bmatrix}^T,
$$

$$
C = \left[{}_1 A^{(0)}/x_1^{(0)}, \; {}_2 A^{(0)}/x_2^{(0)}, \; \cdots, \; {}_n A^{(0)}/x_n^{(0)}\right]^T_{2 \times 2n}.
$$
B. GM(1, N) and GM(0, N)

This section shows how to generalize GM(1, 1) modeling to other practical situations. Assume that

$$X_1^{(0)} = \left(x_1^{(0)}(1), x_1^{(0)}(2), \ldots, x_1^{(0)}(n)\right) \tag{63}$$

is a sequence of data of a system's characteristics,

$$X_i^{(0)} = \left(x_i^{(0)}(1), x_i^{(0)}(2), \ldots, x_i^{(0)}(n)\right), \quad i = 2, 3, \ldots, N, \tag{64}$$

are sequences of relevant factors, $X_i^{(1)} = D(X_i^{(0)})$, $i = 1, 2, \ldots, N$, and $Z_1^{(1)}$ the mean sequence generated from consecutive neighbors of $X_1^{(1)}$. Then

$$x_1^{(0)}(k) + a z_1^{(1)}(k) = \sum_{i=2}^{N} b_i x_i^{(1)}(k) \tag{65}$$

is called a GM(1, N) grey differential equation (Chen and Chang, 2000; Chen and Nie, 1999). In such an equation, $(-a)$ is called the development coefficient of the system, $b_i x_i^{(1)}(k)$ the driving term, $b_i$ the driving coefficient, and $\hat a = [a, b_2, b_3, \ldots, b_N]^T$ the sequence of parameters.

Theorem 6.3. With $X_1^{(0)}$, $X_i^{(0)}$, $i = 2, 3, \ldots, N$, $X_i^{(1)}$, and $Z_1^{(1)}$ defined as above, let

$$
Y = \begin{bmatrix} x_1^{(0)}(2) \\ x_1^{(0)}(3) \\ \vdots \\ x_1^{(0)}(n) \end{bmatrix},
\qquad
B = \begin{bmatrix} -z_1^{(1)}(2) & x_2^{(1)}(2) & \cdots & x_N^{(1)}(2) \\ -z_1^{(1)}(3) & x_2^{(1)}(3) & \cdots & x_N^{(1)}(3) \\ \vdots & \vdots & & \vdots \\ -z_1^{(1)}(n) & x_2^{(1)}(n) & \cdots & x_N^{(1)}(n) \end{bmatrix}.
$$

Then the least squares estimate of the sequence of parameters $\hat a = [a, b_2, b_3, \ldots, b_N]^T$ satisfies

$$\hat a = \left(B^T B\right)^{-1} B^T Y. \tag{66}$$

Theorem 6.4. Assume that $X_i^{(0)}$, $X_i^{(1)}$, $i = 1, 2, \ldots, N$, $Z_1^{(1)}$, $B$, and $Y$ are the same as above, and $\hat a$ is given by Eq. (66). Then the following hold true:

1. When all of $X_i^{(1)}$, $i = 2, 3, \ldots, N$, vary only slightly,

$$\sum_{i=2}^{N} b_i x_i^{(1)}(k) \tag{67}$$

can be seen as a grey constant, and the simulated value of $X_1^{(1)}$ out of the GM(1, N) grey differential Eq. (65) is given by

$$\hat x_1^{(1)}(k+1) = \left[x_1^{(1)}(0) - \frac{1}{a} \sum_{i=2}^{N} b_i x_i^{(1)}(k+1)\right] e^{-ak} + \frac{1}{a} \sum_{i=2}^{N} b_i x_i^{(1)}(k+1), \tag{68}$$

where $x_1^{(1)}(0)$ is taken to be $x_1^{(0)}(1)$.

2. The simulated value of $X_1^{(0)}$, using inverse accumulation, is given by

$$\hat x_1^{(0)}(k+1) = \alpha^{(1)} \hat x_1^{(1)}(k+1) = \hat x_1^{(1)}(k+1) - \hat x_1^{(1)}(k). \tag{69}$$

3. The simulated value of $X_1^{(0)}$, using differences, is given by

$$x_1^{(0)}(k) = -a z_1^{(1)}(k) + \sum_{i=2}^{N} b_i x_i^{(0)}(k). \tag{70}$$

Let $X_i^{(0)}$ be the same as defined above, $i = 1, 2, \ldots, N$, and $X_i^{(1)} = D(X_i^{(0)})$, $i = 1, 2, \ldots, N$, where $D$ is the accumulating operator. Then

$$X_1^{(1)} = b_2 X_2^{(1)} + b_3 X_3^{(1)} + \cdots + b_N X_N^{(1)} + a \tag{71}$$

is called a GM(0, N) model. Since GM(0, N) does not contain derivatives, it is a static model. In form it is similar to a linear regression model, but there are essential differences: linear regression models are built on the original data, while the foundation for GM(0, N) models is the accumulated sequences $X_i^{(1)} = D(X_i^{(0)})$ of the original data.

Theorem 6.5. Let $X_i^{(0)}$ and $X_i^{(1)}$ be the same as above, and

$$
B = \begin{bmatrix} x_2^{(1)}(2) & x_3^{(1)}(2) & \cdots & x_N^{(1)}(2) & 1 \\ x_2^{(1)}(3) & x_3^{(1)}(3) & \cdots & x_N^{(1)}(3) & 1 \\ \vdots & \vdots & & \vdots & \vdots \\ x_2^{(1)}(n) & x_3^{(1)}(n) & \cdots & x_N^{(1)}(n) & 1 \end{bmatrix},
\qquad
Y = \begin{bmatrix} x_1^{(1)}(2) \\ x_1^{(1)}(3) \\ \vdots \\ x_1^{(1)}(n) \end{bmatrix}.
$$

Then the least squares estimate of the parameter sequence $\hat b = [b_2, b_3, \ldots, b_N, a]^T$ is

$$\hat b = \left(B^T B\right)^{-1} B^T Y. \tag{72}$$
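Theorems 6.3 and 6.5 are both ordinary least squares solves; only the design matrix B changes, gaining one accumulated column per relevant factor. As an illustrative sketch (not code from the text), the GM(1, N) estimate of Eq. (66) can be computed as follows, assuming the characteristic sequence and the factor sequences have equal length:

```python
import numpy as np

def gm1n_fit(x1, factors):
    """Least squares estimate [a, b2, ..., bN] of the GM(1,N) equation
    x1(0)(k) + a z1(1)(k) = sum_i bi xi(1)(k)  (Eqs. (65)-(66))."""
    x1 = np.asarray(x1, dtype=float)
    X1 = np.cumsum(x1)                 # accumulated characteristic sequence
    z1 = 0.5 * (X1[1:] + X1[:-1])      # mean sequence Z1(1)
    cols = [-z1] + [np.cumsum(np.asarray(f, dtype=float))[1:] for f in factors]
    B = np.column_stack(cols)
    Y = x1[1:]
    return np.linalg.lstsq(B, Y, rcond=None)[0]
```

When the data satisfy Eq. (65) exactly, the estimate reproduces the true parameters; in practice it gives the best vertical-distance fit, as in Theorem 6.1(1).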
For more general models, such as GM(n, h), see Li (1997) and Shui et al. (1990).

C. Time Series Predictions

Prediction of sequences is mostly done using GM(1, 1) models, although, depending on practical circumstances, other grey systems models can also be used. On the foundation of a qualitative analysis, appropriate sequence operators are defined, and GM models are built on the sequences obtained by applying the chosen operators. After the models pass accuracy checks (Chen, 1985; Deng, 1986), they can be used to make predictions (Jie et al., 2001).

1. Interval Predictions

For chaotic data sequences, regardless of which model is used, it is difficult for the simulation outcome to pass accuracy tests. In this case, one can instead derive a range for future changes and predict the interval of possible values. Assume that $X^{(0)}(t)$ is the zigzagged line of a sequence and that $f_\ell(t)$ and $f_u(t)$ are continuous smooth curves. If for any $t$ the following always holds true:

$$f_\ell(t) \le X^{(0)}(t) \le f_u(t), \tag{73}$$

then $f_\ell(t)$ is called a lower bound (function) of $X^{(0)}(t)$, $f_u(t)$ an upper bound (function), and

$$S = \left\{ \left(t, X(t)\right) \mid X(t) \in \left[f_\ell(t), f_u(t)\right] \right\}$$

the value band of $X^{(0)}(t)$. The two curves, one connecting all the local minimum points and the other all the local maximum points of the zigzagged curve, are called the lower bound curve and the upper bound curve of the sequence $X^{(0)}(t)$, respectively. If $X_\ell^{(0)}$ stands for the sequence corresponding to the lower bound curve of $X^{(0)}(t)$, $X_u^{(0)}$ for that of the upper bound curve, and

$$\hat x_\ell^{(1)}(k+1) = \left(x_\ell^{(0)}(1) - \frac{b_\ell}{a_\ell}\right) e^{-a_\ell k} + \frac{b_\ell}{a_\ell} \tag{74}$$

and

$$\hat x_u^{(1)}(k+1) = \left(x_u^{(0)}(1) - \frac{b_u}{a_u}\right) e^{-a_u k} + \frac{b_u}{a_u} \tag{75}$$

are, respectively, the GM(1, 1) simulations of $X_\ell^{(0)}$ and $X_u^{(0)}$, then

$$S = \left\{ \left(t, X(t)\right) \mid X(t) \in \left[\hat x_\ell^{(1)}(t), \hat x_u^{(1)}(t)\right] \right\} \tag{76}$$

is called a wrapping band; an example is shown in Figure 8.

FIGURE 8. Appearance of a wrapping band.

Now, an interval prediction is made as follows. If $f_\ell(t)$ and $f_u(t)$ are the lower and upper bound functions of $X^{(1)} = D(X^{(0)})$, where $D$ is the accumulating operator, then for any $k > 0$,

$$\hat x_\ell^{(1)}(n+k) = f_\ell(n+k) \tag{77}$$

is called the lowest predicted value,

$$\hat x_u^{(1)}(n+k) = f_u(n+k) \tag{78}$$

the highest predicted value, and

$$\hat x^{(1)}(n+k) = \frac{1}{2}\left[f_\ell(n+k) + f_u(n+k)\right] \tag{79}$$
the basic predicted value.

2. Disaster Predictions

Essentially, disaster prediction is prediction of abnormal values. So the first question is: what kinds of values are abnormal? In general, people use their experience and subjective criteria to determine which values are normal and which are not. The task of disaster prediction is to pinpoint the time moment(s) at which one or several abnormal values will occur, so that relevant parties have enough time to prepare for the disasters to come. Assume that

$$X = \left(x(1), x(2), \ldots, x(n)\right) \tag{80}$$

is a sequence of raw data. For a given upper abnormal (or catastrophe) value $\xi$, the subsequence of $X$

$$X_\xi = \left(x[q(1)], x[q(2)], \ldots, x[q(m)]\right) = \left\{ x[q(i)] \mid x[q(i)] \ge \xi,\ i = 1, 2, \ldots, m \right\} \tag{81}$$

is called an upper catastrophe sequence. For a given lower abnormal (or catastrophe) value $\zeta$, the subsequence of $X$

$$X_\zeta = \left(x[q(1)], x[q(2)], \ldots, x[q(\ell)]\right) = \left\{ x[q(i)] \mid x[q(i)] \le \zeta,\ i = 1, 2, \ldots, \ell \right\} \tag{82}$$

is called a lower catastrophe sequence. The upper and lower abnormal sequences are both called catastrophe sequences. Since different catastrophe sequences require different approaches to handle related details, the following discussion does not distinguish between the upper and lower cases. For a catastrophe sequence

$$X_\xi = \left(x[q(1)], x[q(2)], \ldots, x[q(m)]\right) \subset X,$$

the following is called a catastrophe date sequence:

$$Q^{(0)} = \left(q(1), q(2), \ldots, q(m)\right). \tag{83}$$
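Extracting the catastrophe date sequence of Eq. (83) is a one-line filter; the prediction step then fits a GM(1, 1) model to these dates. A small sketch (the helper name is ours), using the text's 1-based dates:

```python
def catastrophe_dates(x, threshold, lower=False):
    """Dates q(i) (1-based) of the upper (x >= xi) or lower (x <= zeta)
    catastrophe subsequence of x, Eqs. (81)-(83)."""
    hit = (lambda v: v <= threshold) if lower else (lambda v: v >= threshold)
    return [k + 1 for k, v in enumerate(x) if hit(v)]
```

For example, with annual readings `[5, 9, 3, 10, 8, 12]` and upper catastrophe value 9, the date sequence is years 2, 4, and 6, which would then feed the GM(1, 1) model of the next subsection.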
The so-called disaster prediction is about finding patterns, if any, through the study of catastrophe date sequences, in order to predict the future dates of occurrences of catastrophes. In grey systems theory, each disaster prediction is realized by establishing GM(1, 1) models for the relevant catastrophe date sequences.

3. Seasonal Disaster Predictions

Assume that $\Omega = [a, b]$ is the overall time interval of a study. If $\omega_i = [a_i, b_i] \subset \Omega = [a, b]$ (84), $i = 1, 2, \ldots, s$, satisfy

1. $\Omega = \bigcup_{i=1}^{s} \omega_i$;
2. $\omega_i \cap \omega_j = \emptyset$, for any $j \ne i$,

then each $\omega_i$, $i = 1, 2, \ldots, s$, is called a season in $\Omega$, or a time interval or time zone. For example, when $\Omega = [1, 365]$ represents a year with February 1 as the starting point 1, then

$$\omega_1 = [1, 89], \quad \omega_2 = [90, 181], \quad \omega_3 = [182, 273], \quad \omega_4 = [274, 365]$$

would represent the spring, summer, autumn, and winter seasons of a year.

Let $\omega_i \subset \Omega$ be a season and $X = (x(1), x(2), \ldots, x(n)) \subset \omega_i$ a sequence of raw data. For a fixed abnormal value $\xi$, the corresponding catastrophe sequence

$$X_\xi = \left(x[q(1)], x[q(2)], \ldots, x[q(m)]\right) \tag{85}$$

is called a seasonal catastrophe sequence. Accordingly,

$$Q^{(0)} = \left(q(1), q(2), \ldots, q(m)\right) \tag{86}$$

is called a seasonal catastrophe date sequence. Now, a seasonal disaster prediction can be conducted according to the following steps:

1. Collect the sequence of raw data $X = (x(1), x(2), \ldots, x(n))$.
2. Study the range of change of the sequence of raw data and determine the season $\omega_i = [a_i, b_i]$ of interest.
3. Let $y(k) = x(k) - a_i$ and transform the original sequence into $Y = (y(1), y(2), \ldots, y(n))$ in order to improve the distinguishing rate between the data values.
4. Choose an abnormal value $\xi$ and find the seasonal catastrophe sequence $Y_\xi = (y[q(1)], y[q(2)], \ldots, y[q(m)])$ and the seasonal catastrophe date sequence $Q^{(0)} = (q(1), q(2), \ldots, q(m))$.
5. Establish the catastrophe GM(1, 1) model $q^{(0)}(k) + a z^{(1)}(k) = b$.
6. Test the simulation accuracy and make predictions.

4. Stock Market-Like Predictions

When the sequence of raw data vibrates widely with a relatively large amplitude, it is often difficult to find an appropriate simulation model. In this case, if the prediction of the range of change, as described earlier in this section, is not satisfactory, predictions can be made on the wavy curve of the future development of the data based on the wavy curve of the known data sequence. This kind of prediction is called a stock market-like prediction. More specifically, assume that $X = (x(1), x(2), \ldots, x(n))$ is a sequence of raw data. Define

$$\sigma_M = \max\left\{x(k) : 1 \le k \le n\right\}, \qquad \sigma_m = \min\left\{x(k) : 1 \le k \le n\right\}.$$

For any $\xi \in [\sigma_m, \sigma_M]$, $X = \xi$ is called a $\xi$-contour (line). The solution $(t_i, x(t_i))$ $(i = 1, 2, \ldots)$ of the system of equations

$$
\begin{cases}
X = \left\{ x_k = x(k) + (t - k)\left[x(k+1) - x(k)\right] \mid k = 1, 2, \ldots, n-1 \right\} \\
X = \xi
\end{cases}
\tag{87}
$$
is called a $\xi$-contour point. $\xi$-Contour points are the intersection points of the zigzagged line of $X$ and the $\xi$-contour line. Assume that $X_\xi = (P_1, P_2, \ldots, P_m)$ is a sequence of $\xi$-contour points, where $P_i$ is located on the $t_i$th segment of the zigzagged line with coordinates

$$\left( t_i + \frac{\xi - x(t_i)}{x(t_i + 1) - x(t_i)},\ \xi \right).$$

Let

$$q(i) = t_i + \frac{\xi - x(t_i)}{x(t_i + 1) - x(t_i)}, \quad i = 1, 2, \ldots, m. \tag{88}$$

Then

$$Q^{(0)} = \left(q(1), q(2), \ldots, q(m)\right) \tag{89}$$

is called a sequence of $\xi$-contour moments. Establishing a GM(1, 1) model for the sequence of $\xi$-contour moments can produce predicted values for future $\xi$-contour moments: $\hat q(m+1), \hat q(m+2), \ldots, \hat q(m+k)$. Assume that $X = \xi_i$, $i = 1, 2, \ldots, s$, are $s$ different contour lines, $Q_i^{(0)} = (q_i(1), q_i(2), \ldots, q_i(m_i))$, $i = 1, 2, \ldots, s$, the corresponding sequences of contour moments, and $\hat q_i(m_i+1), \hat q_i(m_i+2), \ldots, \hat q_i(m_i+k_i)$, $i = 1, 2, \ldots, s$, the GM(1, 1) predicted values for the $\xi_i$-contour moments. If there exist $i \ne j$ and $\ell_i$, $\ell_j$ such that $\hat q_i(m_i+\ell_i) = \hat q_j(m_j+\ell_j)$, then $\hat q_i(m_i+\ell_i)$ and $\hat q_j(m_j+\ell_j)$ are called a pair of useless predicted moments.
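The $\xi$-contour moments of Eqs. (88)–(89) come from linearly interpolating each segment of the zigzagged line that crosses the level $\xi$. An illustrative sketch (the function name is ours), with 1-based times as in the text:

```python
def contour_moments(x, xi):
    """q(i) = t + (xi - x(t)) / (x(t+1) - x(t)) for every segment of the
    zigzagged line of x crossed by the xi-contour line (Eq. (88))."""
    q = []
    for t in range(len(x) - 1):
        lo, hi = x[t], x[t + 1]
        if lo == hi:
            continue                  # flat segment: no single crossing point
        s = (xi - lo) / (hi - lo)     # fractional position within the segment
        if 0 <= s < 1:
            q.append((t + 1) + s)     # 1-based time of the contour point
    return q
```

For the zigzagged line through the points 1, 5, 2 and the contour level 3, this yields the two contour moments 1.5 and 2⅔, one per crossing segment.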
Proposition 6.1. Assume that $\hat q_i(m_i+1), \hat q_i(m_i+2), \ldots, \hat q_i(m_i+k_i)$, $i = 1, 2, \ldots, s$, are the GM(1, 1) predicted values for the $\xi_i$-contour moments. Delete all useless moments from

$$\hat q_1(m_1+1), \ldots, \hat q_1(m_1+k_1); \quad \hat q_2(m_2+1), \ldots, \hat q_2(m_2+k_2); \quad \ldots; \quad \hat q_s(m_s+1), \ldots, \hat q_s(m_s+k_s),$$

and rank the remaining moments from the smallest to the greatest as $\hat q(1) < \hat q(2) < \cdots < \hat q(n_s)$, where $n_s \le k_1 + k_2 + \cdots + k_s$. If $X = \xi_{\hat q(k)}$ is the contour line corresponding to $\hat q(k)$, then the predicted wavy curve of $X^{(0)}$ is

$$\widehat X^{(0)} = \left\{ X = \xi_{\hat q(k)} + \left[t - \hat q(k)\right] \left(\xi_{\hat q(k+1)} - \xi_{\hat q(k)}\right) \mid k = 1, 2, \ldots, n_s \right\}.$$

5. Systems Predictions

For a system with several mutually related factors and many behavioral variables, no single model can truly reflect the development pattern of the
system. In this case, one must consider establishing a system of models in order to make effective predictions for the system of interest.

Assume that $X_1^{(0)}, X_2^{(0)}, \ldots, X_m^{(0)}$ are sequences of raw data for the state variables of a system and $U_1^{(0)}, U_2^{(0)}, \ldots, U_s^{(0)}$ sequences of data for the control variables of the system. Then

$$
\begin{cases}
\dfrac{dx_i^{(1)}}{dt} = a_{i1} x_1^{(1)} + \cdots + a_{im} x_m^{(1)} + b_{i1} u_1^{(1)} + \cdots + b_{is} u_s^{(1)}, & i = 1, 2, \ldots, m, \\[8pt]
\dfrac{du_j^{(1)}}{dt} = c_j u_j^{(1)} + d_j, & j = 1, 2, \ldots, s,
\end{cases}
\tag{90}
$$

is called a system of prediction models.

Proposition 6.2. The simulated sequences of the system of prediction models are given, for $i = 1, 2, \ldots, m$, by

$$
\hat x_i^{(1)}(k+1) = e^{a_{ii} k} \left[ x_i^{(1)}(0) + \frac{1}{a_{ii}} \left( \sum_{j \ne i} a_{ij} x_j^{(1)}(k+1) + \sum_{\ell=1}^{s} b_{i\ell} u_\ell^{(1)}(k+1) \right) \right] - \frac{1}{a_{ii}} \left( \sum_{j \ne i} a_{ij} x_j^{(1)}(k+1) + \sum_{\ell=1}^{s} b_{i\ell} u_\ell^{(1)}(k+1) \right),
$$

and, for $j = 1, 2, \ldots, s$, by

$$
\hat u_j^{(1)}(k+1) = \left( u_j^{(1)}(0) + \frac{d_j}{c_j} \right) e^{c_j k} - \frac{d_j}{c_j},
$$
where the simulated sequences of the state variables are approximate.

D. A Test of Applications

This section considers a stock market-like prediction done for the annual runoff amount of the upper reaches of the Fen River Reservoir in Shanxi Province, the People's Republic of China. The total volume of water that can be kept in the Fen River Reservoir, which was built in 1958, is 0.72 billion m³. The upper reaches of the reservoir include Ningwu County, Jingle County, Lan County, and Loufan County, with a total drainage area of 52,680 km². The curve in Figure 9 gives the annual runoff amounts from 1951 to 1980. Let us take $\xi_1 = 2$, $\xi_2 = 2.5$, $\xi_3 = 3$, $\xi_4 = 4$, $\xi_5 = 5$, $\xi_6 = 6$, $\xi_7 = 7$, $\xi_8 = 8$, and $\xi_9 = 8.5$, with 0.1 billion m³ as the unit. Then the sequences of $\xi_i$-contour moments are given, respectively, as follows:

$Q_1^{(0)} = (15, 21.5, 22.1, 24.4, 25.2)$,
$Q_2^{(0)} = (13, 14.8, 15.2, 22.1, 22.2, 23.9, 25.6, 29.9)$,
$Q_3^{(0)} = (1.7, 2.2, 5, 7.1, 12.4, 13.1, 14.7, 15.5, 20.7, 22.3, 23.7, 26, 29.6)$,
FIGURE 9. Annual runoff amounts in upper reaches of Fen River Reservoir.
$Q_4^{(0)} = (1, 2.6, 4.8, 5.5, 6.4, 7.4, 9.8, 10.7, 11.4, 13.5, 14.4, 16.1, 17.9, 18.1, 19.9, 22.6, 23.4, 26.4, 29.1)$,
$Q_5^{(0)} = (3, 4.6, 7.8, 9.6, 13.9, 14.1, 16.1, 17.8, 19, 22.8, 23.1, 26.7, 28.4)$,
$Q_6^{(0)} = (3.3, 4.5, 8.1, 9.4, 16.2, 17.7)$,
$Q_7^{(0)} = (3.5, 4.3, 8.5, 9.3, 16.3, 17.6)$,
$Q_8^{(0)} = (3.7, 4.2, 8.8, 9, 16.5, 17.5)$,

and

$Q_9^{(0)} = (3.8, 4.1, 9.1, 16.6, 17.4)$.

Apply accumulating generation once on $Q_i^{(0)}$, $i = 1, 2, \ldots, 9$. Then the GM(1, 1) simulation sequences of $Q_i^{(1)}$, $i = 1, 2, \ldots, 9$, are respectively given as follows:

$\hat q_1^{(1)}(k+1) = 359.86 \cdot e^{0.06k} - 344.86$,
$\hat q_2^{(1)}(k+1) = 128 \cdot e^{0.11k} - 115.7$,
$\hat q_3^{(1)}(k+1) = 45.65 \cdot e^{0.13k} - 43.95$,
$\hat q_4^{(1)}(k+1) = 54.15 \cdot e^{0.1k} - 53.15$,
$\hat q_5^{(1)}(k+1) = 68.68 \cdot e^{0.12k} - 65.98$,
$\hat q_6^{(1)}(k+1) = 16.1 \cdot e^{0.3k} - 12.8$,
$\hat q_7^{(1)}(k+1) = 15.52 \cdot e^{0.3k} - 13.02$,
$\hat q_8^{(1)}(k+1) = 16.7 \cdot e^{0.3k} - 13$,
$\hat q_9^{(1)}(k+1) = 14.59 \cdot e^{0.37k} - 10.79$.
FIGURE 10. Predicted curve for the annual runoff amounts at upper reaches of Fen River Reservoir.

Let $\hat q_i^{(0)}(k+1) = \hat q_i^{(1)}(k+1) - \hat q_i^{(1)}(k)$. We then obtain the predicted sequences of $\xi_i$-contour moments, for $i = 1, 2, \ldots, 9$, as follows:

$\widetilde Q_1^{(0)} = (\hat q_1(i))_{i=6}^{9} = (33.79, 35.79, 37.92, 40.16)$,
$\widetilde Q_2^{(0)} = (\hat q_2(i))_{i=8}^{11} = (33.26, 37.2, 41.6, 46.52)$,
$\widetilde Q_3^{(0)} = (\hat q_3(i))_{i=14}^{16} = (36.91, 42.17, 48.19)$,
$\widetilde Q_4^{(0)} = (\hat q_4(i))_{i=20}^{23} = (34.02, 37.59, 41.52, 45.87)$,
$\widetilde Q_5^{(0)} = (\hat q_5(i))_{i=14}^{17} = (34.17, 38.38, 43.11, 48.42)$,
$\widetilde Q_6^{(0)} = (\hat q_6(7), \hat q_6(8)) = (34.39, 46.47)$,
$\widetilde Q_7^{(0)} = (\hat q_7(7), \hat q_7(8)) = (34.13, 45.96)$,
$\widetilde Q_8^{(0)} = (\hat q_8(7), \hat q_8(8)) = (33.97, 45.68)$, and
$\widetilde Q_9^{(0)} = (\hat q_9(6)) = (40.17)$.

Based on these predicted values, we can draw the predicted curve for the annual runoff amounts at the upper reaches of Fen River Reservoir (see Figure 10). For more relevant discussions, see He (1993) and Lin and Wu (2001).
VII. DECISION-MAKING BASED ON INCOMPLETE INFORMATION

Generally speaking, decision-making means the entire process of raising questions, collecting supporting materials, determining goals, making plans, analyzing and evaluating situations, choosing plans, implementing them, providing feedback, and modifying the plans, and so on. Decision-making based on incomplete information (Deng, 1985b) is decision-making in which the model in use contains grey elements or in which a general decision-making model is combined with some
grey models. The emphasis of this kind of decision-making is on the problem of choosing plans. The totality of all events within the range of a research topic is called the set of events of the research, denoted

$$A = \{a_1, a_2, \ldots, a_n\}, \tag{91}$$

where $a_i$, $i = 1, 2, \ldots, n$, stands for the $i$th event. The corresponding totality of all possible countermeasures is called the countermeasure set, denoted

$$B = \{b_1, b_2, \ldots, b_m\}, \tag{92}$$

where $b_j$, $j = 1, 2, \ldots, m$, stands for the $j$th countermeasure. Now, the Cartesian product

$$A \times B = \left\{ (a_i, b_j) \mid a_i \in A,\ b_j \in B \right\} \tag{93}$$

is called the situation set, denoted $S = A \times B$. For any $a_i \in A$ and $b_j \in B$, the pair $(a_i, b_j)$ is called a situation, denoted $s_{ij} = (a_i, b_j)$. Throughout this section, these three sets are assumed given. For example, in deciding what to plant in a field, weather conditions can be used as the set of events, with a normal year denoted $a_1$, a drought year $a_2$, and a flood year $a_3$. Then the set of events is $A = \{$normal year, drought year, flood year$\} = \{a_1, a_2, a_3\}$. In addition, different strains of crops can be seen as countermeasures, with corn denoted $b_1$, Chinese sorghum $b_2$, soybeans $b_3$, sesame $b_4$, potatoes and yams $b_5$, $\ldots$, and then the countermeasure set is given as $B = \{$corn, Chinese sorghum, soybeans, sesame, potatoes and yams, $\ldots\} = \{b_1, b_2, b_3, b_4, b_5, \ldots\}$. Therefore, the situation set is $S = A \times B = \{s_{11}, s_{12}, s_{13}, s_{14}, s_{15}, \ldots;\ s_{21}, s_{22}, s_{23}, s_{24}, s_{25}, \ldots;\ s_{31}, s_{32}, s_{33}, s_{34}, s_{35}, \ldots\}$, where $s_{ij} = (a_i, b_j)$. Here the events and countermeasures are simple, so the constructed situations are relatively simple, too. In practical decision-making, the events under consideration are often complicated, consisting of many kinds of simple events, and the countermeasures are also complicated; hence the resultant situations can be extremely complicated. For a fixed situation $s_{ij} \in S$, under a set of prescheduled targets or objectives, one needs to evaluate the effects, and according to the evaluation, choices are made. This is decision-making. The following sections address several different grey decision-making methods.
A. Decision-Making with Uncertain Targets

Assume that $S$ is the situation set, $u_{ij}^{(k)}$ the effect value of situation $s_{ij}$ with objective $k$, and $R$ the set of all real numbers. Then

$$u^{(k)} : S \to R, \quad s_{ij} \mapsto u_{ij}^{(k)} \tag{94}$$

is called the effect mapping of $S$ with objective $k$. First, if $u_{ij}^{(k)} = u_{ih}^{(k)}$, then the countermeasures $b_j$ and $b_h$ are said to be equivalent with respect to event $a_i$ with objective $k$, denoted $b_j \cong b_h$, and the set

$$B_i^{(k)} = \left\{ b_j \mid b_j \in B,\ b_j \cong b_h \right\} \tag{95}$$

is called the effect equivalence class of the countermeasure $b_h$ with respect to event $a_i$ with objective $k$. Second, assume that $k$ is an objective such that the greater the effect value, the better. If $u_{ij}^{(k)} > u_{ih}^{(k)}$, then the countermeasure $b_j$ is said to be superior to the countermeasure $b_h$ with respect to event $a_i$ with objective $k$, denoted $b_j \succ b_h$. The set

$$B_{ih}^{(k)} = \left\{ b_j \mid b_j \in B,\ b_j \succ b_h \right\} \tag{96}$$

is called the superior class of the countermeasure $b_h$ with respect to event $a_i$ with objective $k$. Similarly, one can define superior classes of countermeasures for the cases in which (1) the closer the effect value is to a fixed moderate value, the better, and (2) the smaller the effect value, the better.

If $u_{ih}^{(k)} = u_{gh}^{(k)}$, then the events $a_i$ and $a_g$ are said to be equivalent with respect to countermeasure $b_h$ with objective $k$, denoted $a_i \cong a_g$. The set

$$A_h^{(k)} = \left\{ a_i \mid a_i \in A,\ a_i \cong a_g \right\} \tag{97}$$

is called the effect equivalence class of the event $a_g$ with respect to countermeasure $b_h$ with objective $k$. Assume that $k$ is an objective such that the greater the effect value, the better. If $u_{ih}^{(k)} > u_{gh}^{(k)}$, then the event $a_i$ is said to be superior to the event $a_g$ with respect to countermeasure $b_h$ with objective $k$, denoted $a_i \succ a_g$. The set

$$A_{gh}^{(k)} = \left\{ a_i \mid a_i \in A,\ a_i \succ a_g \right\} \tag{98}$$

is called the superior class of the event $a_g$ with respect to countermeasure $b_h$ with objective $k$. Similarly, superior classes of events can be defined for the cases in which (1) the closer the effect value is to a fixed moderate value, the better, and (2) the smaller the effect value, the better.
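For a finite effect table, the superior classes of Eqs. (95)–(98) are straightforward set comprehensions. A sketch for the "greater is better" case (the dict layout and names are ours, not the text's):

```python
def superior_countermeasures(effects, i, h):
    """B_ih(k) = {bj : u_ij > u_ih}, Eq. (96): indices of countermeasures
    superior to b_h with respect to event a_i, when larger effect values
    are better.  `effects` maps (event, countermeasure) pairs to values."""
    base = effects[(i, h)]
    return {j for (e, j), u in effects.items() if e == i and u > base}
```

The equivalence class of Eq. (95) is the same comprehension with `u == base`, and the "smaller is better" variant flips the comparison.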
If $u_{ij}^{(k)} = u_{gh}^{(k)}$, then the situation $s_{ij}$ is said to be equivalent to the situation $s_{gh}$ with objective $k$, denoted $s_{ij} \cong s_{gh}$. The set

$$S^{(k)} = \left\{ s_{ij} \mid s_{ij} \in S,\ s_{ij} \cong s_{gh} \right\} \tag{99}$$

is called the effect equivalence class of the situation $s_{gh}$ with objective $k$. Assume that $k$ is an objective such that the greater the effect value, the better. If $u_{ij}^{(k)} > u_{gh}^{(k)}$, then the situation $s_{ij}$ is said to be superior to the situation $s_{gh}$ with objective $k$, denoted $s_{ij} \succ s_{gh}$. The set

$$S_{gh}^{(k)} = \left\{ s_{ij} \mid s_{ij} \in S,\ s_{ij} \succ s_{gh} \right\} \tag{100}$$

is called the effect superior class of the situation $s_{gh}$ with objective $k$. Similarly, superior classes of situations can be defined for the cases in which (1) the closer the effect value is to a fixed moderate value, the better, and (2) the smaller the effect value, the better.

Proposition 7.1. Assume that $S = \{s_{ij} = (a_i, b_j) \mid a_i \in A, b_j \in B\} \ne \emptyset$, $U^{(k)} = \{u_{ij}^{(k)} \mid s_{ij} \in S\}$ is the effect set with objective $k$, and $\{S^{(k)}\}$ the set of effect equivalence classes of situations with objective $k$. Then the mapping

$$u^{(k)} : \left\{S^{(k)}\right\} \to U^{(k)}, \quad S^{(k)} \mapsto u_{ij}^{(k)}$$

is one-to-one and onto.

Assume that $d_1^{(k)}$ and $d_2^{(k)}$ are the lower and upper threshold values for situation effects with objective $k$. Then

$$S^1 = \left\{ r \mid d_1^{(k)} \le r \le d_2^{(k)} \right\} \tag{101}$$

is called the grey target of one-dimensional decision-making with objective $k$, $u_{ij}^{(k)} \in [d_1^{(k)}, d_2^{(k)}]$ a satisfactory effect with objective $k$, the corresponding $s_{ij}$ a desirable situation with objective $k$, and $b_j$ a desirable countermeasure with respect to event $a_i$ with objective $k$.

Proposition 7.2. Assume that $u_{ij}^{(k)}$ is the effect value of the situation $s_{ij}$ with objective $k$. If $u_{ij}^{(k)} \in S^1$, that is, $s_{ij}$ is a desirable situation with objective $k$, then any $s \in S_{ij}^{(k)}$ is also desirable. That is, when $s_{ij}$ is desirable, all situations in its effect superior class are desirable.
Similarly, one can study the decision-making grey targets of situations with multiple objectives; here the details are omitted. Grey targets of decision-making (Zhong, 1988) are essentially the region in which desirable effects are located, in the sense of relative optimization. In many cases, since achieving the absolute optimum is often impossible, one is content with reaching a satisfactory result. Of course, according to need, one can gradually shrink the grey target until it degenerates into a point, which is the optimum effect, with the corresponding situation as the optimum situation and the corresponding countermeasure as the optimum countermeasure. Therefore, the discussion below centers on solid spherical targets. The set

$$R^s = \left\{ r = \left(r^{(1)}, r^{(2)}, \ldots, r^{(s)}\right) \;\middle|\; \sum_{i=1}^{s} \left(r^{(i)} - r_0^{(i)}\right)^2 \le R^2 \right\} \tag{102}$$

is called the $s$-dimensional spherical grey target with center $r_0 = (r_0^{(1)}, r_0^{(2)}, \ldots, r_0^{(s)})$ and radius $R$, and $r_0$ the optimum effect vector. Assume that $s_{ij}$ and $s_{gh}$ are two different situations with effect vectors $u_{xy} = (u_{xy}^{(1)}, u_{xy}^{(2)}, \ldots, u_{xy}^{(s)})$, for $(x, y) = (i, j), (g, h)$, respectively. If $|u_{ij} - r_0| \ge |u_{gh} - r_0|$, then the situation $s_{gh}$ is said to be superior to the situation $s_{ij}$, denoted $s_{gh} \succ s_{ij}$; when equality holds, the situations $s_{gh}$ and $s_{ij}$ are said to be equivalent, denoted $s_{gh} \cong s_{ij}$. If for any $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$ it is always true that $u_{ij} \ne r_0$, then we say that the optimum situation does not exist, or that the event does not have any optimum countermeasure. If the optimum situation does not exist but there exist $g$ and $h$ such that, for any $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$, $|u_{ij} - r_0| \ge |u_{gh} - r_0|$, that is, $s_{gh} \succ s_{ij}$ for any $s_{ij} \in S$, then $s_{gh}$ is called a quasi-optimum situation, $a_g$ a quasi-optimum event, and $b_h$ a quasi-optimum countermeasure.

Theorem 7.1. For the given situation set $S$ and $s$-dimensional spherical grey target $R^s$, $S$ becomes an ordered set with "superiority" as the ordering relation, and in the ordered situation set $(S, \succ)$ there must exist a quasi-optimum situation.

B. Decision-Making Employing Incidence Analysis

Assume that $S = \{s_{ij} = (a_i, b_j) \mid a_i \in A, b_j \in B\}$ is the situation set and $u_{i_0 j_0} = (u_{i_0 j_0}^{(1)}, u_{i_0 j_0}^{(2)}, \ldots, u_{i_0 j_0}^{(s)})$ the optimum effect vector. If the situation corresponding to $u_{i_0 j_0}$ satisfies $s_{i_0 j_0} \notin S$, then $u_{i_0 j_0}$ is called the imagined
148
LIN AND LIU
optimum effect vector and the corresponding si0 j0 the imagined optimum situation. Proposition 7.3. For sij ∈ S, let the corresponding effect vector be uij = (1) (2) (s) (uij , uij , . . . , uij ), i = 1, 2, . . . , n; j = 1, 2, . . . , m. 1. When k is an objective satisfying that the greater the effect value the better, (k) (k) take ui0 j0 = max1≤i,j ≤n {uij }. 2. When k is an objective satisfying that it is good when the effect value is (k) close to a moderate value u0 , take ui0 j0 = u0 . 3. When k is an objective satisfying that the smaller the effect value is the (k) (k) better, take ui0 j0 = min1≤i,j ≤n {uij }. (2) (s) Then ui0 j0 = (u(1) i0 j0 , ui0 j0 , . . . , ui0 j0 ) is the imagined optimum effect vector. (1)
(2)
Proposition 7.4. Assume the same as above. Let ui0 j0 = (ui0 j0 , ui0 j0 , . . . ,
u(s) i0 j0 ) be the imagined optimum effect vector and εij , i = 1, 2, . . . , n; j = 1, 2, . . . , m, the degree of incidence between uij and ui0 j0 . If εi1 j1 satisfies that for any i, j ∈ {1, 2, . . . , n} with i = i1 and j = j1 , it is always true that εi1 j1 ≥ εij , then ui1 j1 is said to be a quasi-optimum effect vector and si1 j1 a quasi-optimum situation. For more in-depth study on incidence and decision-making, please see Xu (1993). C. Decisions Based on Predicted Future Assume that event set A, countermeasure set B, and situation set S are given. Then, (k) (k) (k) u(k) (103) ij = uij (1), uij (2), . . . , uij (h) is called the situation effect time sequence of sij with objective k. Here, as time moves, the case of constant changing situation effects is addressed. Assume that the simulated sequence through inverse accumulating of the GM(1, 1) of the situation effect time sequence u(k) ij is given by (k) uˆ ij ( + 1)
= 1−e
(k)
aij
(k)
·
(k) uij (1) −
bij
(k)
aij
(k)
· e−aij .
(104)
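As a small illustration of the spherical target of Eq. (102) and of the superiority ordering, the following sketch checks target membership and picks the quasi-optimum situation. All effect vectors, the center, and the radius are invented for the example:

```python
import math

# Hypothetical 2-dimensional grey target: center r0 (optimum effect vector)
# and radius R; the situation effect vectors are made-up values.
r0 = (0.9, 0.8)
R = 0.5

def in_grey_target(r):
    """Eq. (102): r belongs to the target iff sum_i (r_i - r0_i)^2 <= R^2."""
    return sum((ri - ci) ** 2 for ri, ci in zip(r, r0)) <= R ** 2

def is_superior(u_gh, u_ij):
    """Situation s_gh is superior to s_ij when |u_ij - r0| >= |u_gh - r0|."""
    return math.dist(u_ij, r0) >= math.dist(u_gh, r0)

effects = {"s11": (0.7, 0.6), "s12": (0.2, 0.1), "s21": (0.85, 0.75)}
# The quasi-optimum situation minimizes the distance to r0 over S.
quasi_optimum = min(effects, key=lambda s: math.dist(effects[s], r0))
```

With these numbers, `s12` misses the target while `s11` and `s21` hit it, and `s21` is the quasi-optimum situation.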
A GREY SYSTEMS APPROACH
When $k$ is an objective for which the greater the effect value the better:

(1) If
$$\max_{1 \le i \le n,\, 1 \le j \le m} \big\{ -a_{ij}^{(k)} \big\} = -a_{i_0 j_0}^{(k)},$$
then $s_{i_0 j_0}$ is called the optimum situation with objective $k$.

(2) If
$$\max_{1 \le i \le n,\, 1 \le j \le m} \big\{ \hat u_{ij}^{(k)}(h + l) \big\} = \hat u_{i_0 j_0}^{(k)}(h + l),$$
then $s_{i_0 j_0}$ is called the optimum situation of predictions with objective $k$.

If $k$ is an objective for which the closer the effect value is to a moderate value the better:

(1) When
$$\min_{1 \le i \le n,\, 1 \le j \le m} \left\{ \left| a_{ij}^{(k)} - \frac{1}{n+m} \left( \sum_{j=1}^{m} a_{ij}^{(k)} + \sum_{i=1}^{n} a_{ij}^{(k)} \right) \right| \right\} = \left| a_{i_0 j_0}^{(k)} - \frac{1}{n+m} \left( \sum_{j=1}^{m} a_{i_0 j}^{(k)} + \sum_{i=1}^{n} a_{i j_0}^{(k)} \right) \right|, \tag{105}$$
$s_{i_0 j_0}$ is called the optimum situation of development coefficients with objective $k$.

(2) When
$$\min_{1 \le i \le n,\, 1 \le j \le m} \left\{ \left| \hat u_{ij}^{(k)}(h + l) - \frac{1}{n+m} \left( \sum_{j=1}^{m} \hat u_{ij}^{(k)}(h + l) + \sum_{i=1}^{n} \hat u_{ij}^{(k)}(h + l) \right) \right| \right\} = \left| \hat u_{i_0 j_0}^{(k)}(h + l) - \frac{1}{n+m} \left( \sum_{j=1}^{m} \hat u_{i_0 j}^{(k)}(h + l) + \sum_{i=1}^{n} \hat u_{i j_0}^{(k)}(h + l) \right) \right|, \tag{106}$$
$s_{i_0 j_0}$ is called the optimum situation of predictions with objective $k$ (Deng, 1986).

Theorem 7.2. Assume that $k$ is an objective for which the greater the effect value the better, and that $s_{i_0 j_0}$ is the optimum situation with objective $k$, that is, $-a_{i_0 j_0}^{(k)} = \max_{1 \le i \le n,\, 1 \le j \le m} \{-a_{ij}^{(k)}\}$; let $\hat u_{i_0 j_0}^{(k)}(h + l + 1)$ be the predicted value of the situation effect of $s_{i_0 j_0}$. Then there must exist an $l_0 > 0$ such that
$$\hat u_{i_0 j_0}^{(k)}(h + l_0 + 1) = \max_{1 \le i \le n,\, 1 \le j \le m} \big\{ \hat u_{ij}^{(k)}(h + l_0 + 1) \big\}.$$
That is, in a sufficiently distant future, $s_{i_0 j_0}$ will become the optimum situation of predictions.

D. Collective Decision-Making

Assume that $a_i$, $i = 1, 2, \ldots, n$, are the units involved in the decision-making process; $b_j$, $j = 1, 2, \ldots, m$, the decision schemes; $x_{ij}$, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, m$, the evaluation value of unit $i$ on scheme $j$; $f^k(\cdot)$, $k = 1, 2, \ldots, s$, the whitenization weight function of grey class $k$; and $\eta_i$, $i = 1, 2, \ldots, n$, the decision weight of unit $i$, satisfying $\sum_{i=1}^{n} \eta_i = 1$. Then
$$\sigma_j^k = \frac{\sum_{i=1}^{n} f^k(x_{ij}) \cdot \eta_i}{\sum_{k=1}^{s} \sum_{i=1}^{n} f^k(x_{ij}) \cdot \eta_i}, \tag{107}$$
$j = 1, 2, \ldots, m$; $k = 1, 2, \ldots, s$, is called the decision coefficient for scheme $j$ to belong to grey class $k$. Now,
$$\sigma_j = \big(\sigma_j^1, \sigma_j^2, \ldots, \sigma_j^s\big), \tag{108}$$
$j = 1, 2, \ldots, m$, is called the vector of decision coefficients of scheme $j$. If $\max_{1 \le k \le s} \{\sigma_j^k\} = \sigma_j^{k^*}$, then we say that scheme $j$ belongs to grey class $k^*$.

E. A Test of Applications

As an example, this section shows how collective decision-making was used to analyze the "Vitalizing the City through Science and Technology" plan of a certain city. More specifically, the governing body of the city, which, due to an agreement, cannot be named here, organized several groups of experts to work out three practical schemes for how to "Vitalize XXX City through Progress in Science and Technology." Each scheme had its own characteristics. The schemes are denoted $b_1$, $b_2$, and $b_3$, so the set of decision schemes is $B = \{b_1, b_2, b_3\}$. Next, five groups of experts were organized to evaluate the three schemes; that is, there were decision-making units $a_1, a_2, a_3, a_4, a_5$. The matrix of evaluation values given by the decision-making units to the schemes in $B$ is
$$C = [x_{ij}]_{5 \times 3} = \begin{bmatrix} 80 & 60 & 40 \\ 60 & 50 & 50 \\ 75 & 70 & 60 \\ 90 & 80 & 80 \\ 50 & 70 & 60 \end{bmatrix}.$$
In the following, we carry out the grey evaluation and decision-making based on four grey classes: best, realistic, basically realistic, and not implementable.

Solution: The whitenization weight function $f^k(\cdot)$ of grey class $k$, $k = 1, 2, 3, 4$, is given in Figure 11.
FIGURE 11. The whitenization weight functions of the grey classes.
From the graphs of the whitenization weight functions in Figure 11, we have that
$$f^1(x) = \begin{cases} 0, & x < 60 \\ \dfrac{x - 60}{30}, & 60 \le x \le 90 \\ 1, & x > 90; \end{cases} \qquad f^2(x) = \begin{cases} 0, & x < 50 \\ \dfrac{x - 50}{20}, & 50 \le x \le 70 \\ \dfrac{90 - x}{20}, & 70 < x \le 90 \\ 0, & x > 90; \end{cases}$$
$$f^3(x) = \begin{cases} 0, & x < 35 \\ \dfrac{x - 35}{20}, & 35 \le x \le 55 \\ \dfrac{75 - x}{20}, & 55 < x \le 75 \\ 0, & x > 75; \end{cases} \qquad f^4(x) = \begin{cases} 0, & x < 0 \\ 1, & 0 \le x \le 20 \\ \dfrac{40 - x}{20}, & 20 < x \le 40 \\ 0, & x > 40. \end{cases}$$
Assume that the decision-making weights of the individual decision units $a_i$, $i = 1, 2, 3, 4, 5$, are given as
$$\eta_1 = 0.25, \quad \eta_2 = 0.25, \quad \eta_3 = 0.2, \quad \eta_4 = 0.2, \quad \eta_5 = 0.1.$$
Then, for scheme 1 and $k = 1, 2, 3, 4$:
$$\sum_{i=1}^{5} f^1(x_{i1}) \cdot \eta_i = 0.47; \qquad \sum_{i=1}^{5} f^2(x_{i1}) \cdot \eta_i = 0.4;$$
$$\sum_{i=1}^{5} f^3(x_{i1}) \cdot \eta_i = 0.2625; \qquad \sum_{i=1}^{5} f^4(x_{i1}) \cdot \eta_i = 0.$$
So we have
$$\sum_{k=1}^{4} \sum_{i=1}^{5} f^k(x_{i1}) \cdot \eta_i = 1.1325.$$
Therefore,
$$\sigma_1 = \big(\sigma_1^1, \sigma_1^2, \sigma_1^3, \sigma_1^4\big) = (0.42, 0.35, 0.23, 0).$$
Similar calculations for schemes 2 and 3 produce
$$\sigma_2 = \big(\sigma_2^1, \sigma_2^2, \sigma_2^3, \sigma_2^4\big) = (0.19, 0.44, 0.37, 0)$$
and
$$\sigma_3 = \big(\sigma_3^1, \sigma_3^2, \sigma_3^3, \sigma_3^4\big) = (0.15, 0.29, 0.56, 0).$$
Since $\max_{1 \le k \le 4} \{\sigma_1^k\} = 0.42 = \sigma_1^1$, $\max_{1 \le k \le 4} \{\sigma_2^k\} = 0.44 = \sigma_2^2$, and $\max_{1 \le k \le 4} \{\sigma_3^k\} = 0.56 = \sigma_3^3$, scheme 1 belongs to the best class, scheme 2 is realistic and implementable, and scheme 3 is basically realistic and implementable. Since all three schemes can be implemented, it is suggested that scheme 1 be used as the foundation and that, by merging the desirable features of schemes 2 and 3, a more satisfactory and comprehensive plan for "vitalizing the city through science and technology" be obtained.
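The whole evaluation above can be reproduced in a few lines. The sketch below encodes the whitenization weight functions as read off Figure 11 and applies Eqs. (107)-(108); exact arithmetic differs from the text's displayed values by about one unit in the second decimal place for schemes 2 and 3, because the text rounds the intermediate sums:

```python
# Collective grey decision evaluation of Section VII.E (a sketch).
def f1(x):  # "best" class
    if x < 60: return 0.0
    return (x - 60) / 30 if x <= 90 else 1.0

def f2(x):  # "realistic" class
    if x < 50 or x > 90: return 0.0
    return (x - 50) / 20 if x <= 70 else (90 - x) / 20

def f3(x):  # "basically realistic" class
    if x < 35 or x > 75: return 0.0
    return (x - 35) / 20 if x <= 55 else (75 - x) / 20

def f4(x):  # "not implementable" class
    if x < 0 or x > 40: return 0.0
    return 1.0 if x <= 20 else (40 - x) / 20

F = [f1, f2, f3, f4]
eta = [0.25, 0.25, 0.2, 0.2, 0.1]      # decision weights of units a1..a5
C = [[80, 60, 40],                      # evaluation matrix [x_ij]_{5x3}
     [60, 50, 50],
     [75, 70, 60],
     [90, 80, 80],
     [50, 70, 60]]

def decision_coefficients(j):
    """Eq. (107): sigma_j^k for scheme j, normalized over the 4 grey classes."""
    raw = [sum(f(C[i][j]) * eta[i] for i in range(5)) for f in F]
    total = sum(raw)
    return [v / total for v in raw]
```

Calling `decision_coefficients(0)` gives approximately (0.42, 0.35, 0.23, 0), and the arg-max classes come out as 1, 2, 3 for the three schemes, matching the conclusion in the text.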
VIII. PROGRAMMING WITH UNCERTAIN PARAMETERS

Programming essentially belongs to the category of decision-making. It studies how, under given constraints, to achieve the best possible value of an objective. If the constraints and the objective function are linear, the problem is called a linear programming problem; when the objective function or a constraint is nonlinear, the corresponding problem is called a nonlinear programming problem. Linear programming is one of the most important branches of operations research; it was developed early, matured quickly, and has a wide range of practical applications. However, ordinary linear and nonlinear programming both have the following problems:

1. They are "static" formulations, which cannot reflect constraints that change with time.
2. When grey numbers appear in either the objective or the constraints, applications become difficult.
3. In theory, each convex function defined on a convex set has a solution; in practical applications, however, due to technical reasons, the process of finding the solution may not terminate.

By using the ideas and modeling methods of grey systems theory, these problems can be resolved to a certain degree (Lin and Liu, 1999b).

A. Linear Models

Assume that $a_{ij}$, $b_i$, and $c_j$, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$, are all constants, and that $x_j$, $j = 1, 2, \ldots, n$, are unknown quantities. Then
$$\max(\min)\, S = c_1 x_1 + c_2 x_2 + \cdots + c_n x_n, \qquad \text{s.t.}\quad \begin{cases} a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n \le (=, \ge)\, b_1 \\ a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n \le (=, \ge)\, b_2 \\ \cdots \\ a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n \le (=, \ge)\, b_m \\ x_1 \ge 0,\ x_2 \ge 0,\ \ldots,\ x_n \ge 0 \end{cases} \tag{109}$$
is called the general mathematical model of a linear programming problem, where $S = c_1 x_1 + c_2 x_2 + \cdots + c_n x_n$ is called the objective function and the system of inequalities and equations above gives the constraint conditions. The following problem
$$\max S = CX, \qquad \text{s.t.}\quad \begin{cases} AX \le b \\ X \ge 0 \end{cases} \tag{110}$$
is called the standardized type of linear programming problem, where $C = [c_1, c_2, \ldots, c_n]$, $X = [x_1, x_2, \ldots, x_n]^T$,
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \cdots & \cdots & \cdots & \cdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix},$$
and $b = [b_1, b_2, \ldots, b_m]^T$ with $b_i \ge 0$, $i = 1, 2, \ldots, m$.

If the matrices $C = C(\otimes)$, $b = b(\otimes)$, and $A = A(\otimes)$ above contain bounded grey entries whose lower bounds are $\ge 0$, then Eq. (110) is called a problem of linear programming with grey parameters (LPGP) (Wang, 1997); $C(\otimes)$ is called a grey price vector, $A(\otimes)$ a grey consumption matrix, $b(\otimes)$ a grey constraint vector for resources, and $X$ the decision vector of the LPGP. As a matter of fact, in this case $X$ is a grey vector as well.

Suppose that $\alpha_j, \beta_i, \gamma_{ij} \in [0, 1]$, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$, and
$$\tilde c_j(\otimes) = \alpha_j \bar c_j + (1 - \alpha_j)\, \underline{c}_j, \quad j = 1, 2, \ldots, n,$$
$$\tilde b_i(\otimes) = \beta_i \bar b_i + (1 - \beta_i)\, \underline{b}_i, \quad i = 1, 2, \ldots, m,$$
and
$$\tilde a_{ij}(\otimes) = \gamma_{ij} \bar a_{ij} + (1 - \gamma_{ij})\, \underline{a}_{ij}, \quad 1 \le i \le m,\ 1 \le j \le n,$$
where $\tilde C(\otimes)$ and $\tilde b(\otimes)$ are, respectively, the whitenization vectors of price and of constraints for resources, and $\tilde A(\otimes)$ is the whitenization matrix of consumption. Then
$$\max S = \tilde C(\otimes) X, \qquad \text{s.t.}\quad \begin{cases} \tilde A(\otimes) X \le \tilde b(\otimes) \\ X \ge 0 \end{cases} \tag{111}$$
is called a positioned programming of the LPGP (Liu, 1993; Liu and Dong, 1997a); the $\alpha_j$ ($j = 1, 2, \ldots, n$) are called the positioned coefficients of the price vector, the $\beta_i$ ($i = 1, 2, \ldots, m$) the positioned coefficients of the constraint vector for resources, and the $\gamma_{ij}$ ($i = 1, \ldots, m$, $j = 1, \ldots, n$) the positioned coefficients of consumption.

For the sake of convenience, we first make the following suppositions:

1. $\operatorname{rank}(A(\otimes)) = m < n$.
2. The set of feasible solutions of $LP((\alpha_j, \beta_i, \gamma_{ij}) \mid i = 1, \ldots, m,\ j = 1, \ldots, n)$ is nonempty.
3. The set $\{X \mid \tilde A(\otimes) X \le \tilde b(\otimes),\ X \ge 0\}$ of real vectors is bounded.

At the same time, the positioned programming $LP((\alpha_j, \beta_i, \gamma_{ij}) \mid i = 1, \ldots, m,\ j = 1, \ldots, n)$ can be rewritten in the form
$$\max S = \big[C_B(\otimes),\ C_N(\otimes)\big] \begin{bmatrix} X_B \\ X_N \end{bmatrix}, \qquad \text{s.t.}\quad \begin{cases} \big[B(\otimes),\ N(\otimes)\big] \begin{bmatrix} X_B \\ X_N \end{bmatrix} \le \tilde b(\otimes) \\ X_B \ge 0,\ X_N \ge 0. \end{cases} \tag{112}$$
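To make the positioned programming concrete, here is a small sketch (all interval data are invented) that whitens a grey LP at given positioned coefficients and then solves the resulting crisp two-variable problem by brute-force vertex enumeration. It also illustrates the ordering of Theorem 8.5 below between the critical model LP(0, 0, 1), an intermediate positioned programming, and the ideal model LP(1, 1, 0):

```python
from itertools import combinations

# Interval grey numbers written as (lower, upper) pairs; data is illustrative.
C_grey = [(2.0, 3.0), (1.0, 2.0)]            # grey price vector
A_grey = [[(1.0, 1.5), (1.0, 1.5)],           # grey consumption matrix
          [(2.0, 2.5), (1.0, 1.5)]]
b_grey = [(4.0, 5.0), (6.0, 8.0)]             # grey resource vector

def whiten(iv, t):
    lo, up = iv
    return t * up + (1 - t) * lo              # t*upper + (1 - t)*lower

def positioned_lp(alpha, beta, gamma):
    """Whitenized data of LP(alpha, beta, gamma), per Eq. (111)."""
    C = [whiten(iv, alpha) for iv in C_grey]
    A = [[whiten(iv, gamma) for iv in row] for row in A_grey]
    b = [whiten(iv, beta) for iv in b_grey]
    return C, A, b

def solve_2var_lp(C, A, b):
    """Brute-force optimum of max CX s.t. AX <= b, X >= 0 (two variables)."""
    lines = [(A[i][0], A[i][1], b[i]) for i in range(len(A))] + [(1, 0, 0), (0, 1, 0)]
    def cross(l1, l2):
        (a1, b1, c1), (a2, b2, c2) = l1, l2
        d = a1 * b2 - a2 * b1
        if abs(d) < 1e-12:
            return None
        return ((c1 * b2 - c2 * b1) / d, (a1 * c2 - a2 * c1) / d)
    def feasible(p):
        return (p[0] >= -1e-9 and p[1] >= -1e-9 and
                all(A[i][0] * p[0] + A[i][1] * p[1] <= b[i] + 1e-9
                    for i in range(len(A))))
    verts = [p for l1, l2 in combinations(lines, 2)
             for p in [cross(l1, l2)] if p and feasible(p)]
    return max(C[0] * p[0] + C[1] * p[1] for p in verts)

s_ideal = solve_2var_lp(*positioned_lp(1, 1, 0))    # ideal model
s_crit = solve_2var_lp(*positioned_lp(0, 0, 1))     # critical model
s_mid = solve_2var_lp(*positioned_lp(0.5, 0.5, 0.5))
```

For this data the three optima come out as 4.8, 7.9, and 13, respectively, so the intermediate optimum does lie between the critical and ideal values.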
Proposition 8.1. Suppose that the positioned programming in Eq. (112) satisfies suppositions 1, 2, and 3 above, and that $X = [x_1, x_2, \ldots, x_n]^T$ is a basic solution of it. Then $\{x_j \mid j = 1, 2, \ldots, n\}$ is bounded.

Proposition 8.2. There is at least one basic feasible solution of the positioned programming $LP((\alpha_j, \beta_i, \gamma_{ij}) \mid i = 1, \ldots, m,\ j = 1, \ldots, n)$ satisfying suppositions 1, 2, and 3 listed above.

For a grey linear programming problem of the form of Eq. (110), first whitenize $C(\otimes) = \tilde C = [\tilde c_j]$ and $A(\otimes) = \tilde A = [\tilde a_{ij}]$. Based on the historical data of $b_i(\otimes)$, $i = 1, 2, \ldots, m$, say $b_i(\otimes) = (b_i(1), b_i(2), \ldots, b_i(s))$, establish a GM(1, 1) model and solve for its predicted value $\hat b_i(s + k)$ at time moment $s + k$, $i = 1, 2, \ldots, m$. Denote $\hat b = [\hat b_1(s + k), \hat b_2(s + k), \ldots, \hat b_m(s + k)]^T$. Then
$$\max S = \tilde C X, \qquad \text{s.t.}\quad \begin{cases} \tilde A X = \hat b \\ X \ge 0 \end{cases} \tag{113}$$
is called a linear programming problem of grey prediction type (Xu, 1993). Each linear programming problem of grey prediction type can be solved by the method for solving a general linear programming problem.

B. Properties of Solutions of Grey Linear Models

In practice, each problem of LPGP (Liu and Dong, 1997b) is a set composed of ordinary problems of linear programming.

Theorem 8.1. For a positioned programming of a LPGP, when the positioned coefficients of the price vector satisfy $\alpha_j \le \alpha_j'$, $j = 1, 2, \ldots, n$, the following holds true:
$$\max S = f\big((\alpha_j, \beta_i, \gamma_{ij}) \mid i = 1, \ldots, m;\ j = 1, \ldots, n\big) \le f\big((\alpha_j', \beta_i, \gamma_{ij}) \mid i = 1, \ldots, m;\ j = 1, \ldots, n\big) = \max S'. \tag{114}$$
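The forecasts $\hat b_i(s + k)$ required by Eq. (113) come from GM(1, 1) models. A minimal least-squares GM(1, 1) sketch (the data below is an invented near-exponential resource history, not from the text):

```python
import math

def gm11(x0):
    """Least-squares GM(1,1) fit of a positive sequence x0.

    Estimates (a, b) in x0(k) + a*z1(k) = b, where x1 is the accumulated
    sequence and z1 its consecutive means, then returns a restored-value
    predictor: predict(k) approximates x0(k+1) for k >= 1.
    """
    n = len(x0)
    x1 = [sum(x0[:i + 1]) for i in range(n)]
    z1 = [0.5 * (x1[i] + x1[i + 1]) for i in range(n - 1)]
    m = n - 1
    szz = sum(z * z for z in z1)
    sz = sum(z1)
    sy = sum(x0[1:])
    szy = sum(z * y for z, y in zip(z1, x0[1:]))
    det = szz * m - sz * sz                  # normal equations for (a, b)
    a = (sz * sy - m * szy) / det
    b = (szz * sy - sz * szy) / det
    x1_hat = lambda t: (x0[0] - b / a) * math.exp(-a * t) + b / a
    predict = lambda k: x1_hat(k) - x1_hat(k - 1)
    return a, b, predict

# Hypothetical resource history b_i(1..5); forecast the next value b_hat.
history = [10, 10.5, 11.025, 11.57625, 12.1550625]
a, b, predict = gm11(history)
b_hat_next = predict(5)
```

On this geometric test sequence the fitted development coefficient is $a \approx -0.0488$ and the one-step-ahead forecast lands within a few thousandths of the true continuation, which is the typical behavior of GM(1, 1) on near-exponential data.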
Theorem 8.2. For a positioned programming of a LPGP, when the positioned coefficients of the constraint vector for resources satisfy $\beta_i \le \beta_i'$, $i = 1, 2, \ldots, m$, the following holds true:
$$\max S = f\big((\alpha_j, \beta_i, \gamma_{ij}) \mid i = 1, \ldots, m;\ j = 1, \ldots, n\big) \le f\big((\alpha_j, \beta_i', \gamma_{ij}) \mid i = 1, \ldots, m;\ j = 1, \ldots, n\big) = \max S'. \tag{115}$$

Theorem 8.3. For a positioned programming $LP((\alpha_j, \beta_i, \gamma_{ij}) \mid i = 1, \ldots, m,\ j = 1, \ldots, n)$ of a LPGP, when the positioned coefficients of consumption satisfy $\gamma_{ij} \ge \gamma_{ij}'$, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$, one has
$$\max S = f\big((\alpha_j, \beta_i, \gamma_{ij}) \mid i = 1, \ldots, m;\ j = 1, \ldots, n\big) \le f\big((\alpha_j, \beta_i, \gamma_{ij}') \mid i = 1, \ldots, m;\ j = 1, \ldots, n\big) = \max S'. \tag{116}$$
Assume that for all $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$, $\alpha_j = \alpha$, $\beta_i = \beta$, and $\gamma_{ij} = \gamma$. Then the corresponding positioned programming is called an $(\alpha, \beta, \gamma)$-positioned programming, written $LP(\alpha, \beta, \gamma)$. Its optimal value is denoted $\max S(\alpha, \beta, \gamma)$ and is called the $(\alpha, \beta, \gamma)$-positioned optimal value.

Theorem 8.4. For a positioned programming $LP(\alpha, \beta, \gamma)$:
1. When $\alpha = \alpha_0$ and $\beta = \beta_0$ are fixed, if $\gamma_1 \le \gamma_2$, then $\max S(\alpha_0, \beta_0, \gamma_1) \ge \max S(\alpha_0, \beta_0, \gamma_2)$.
2. When $\beta = \beta_0$ and $\gamma = \gamma_0$ are fixed, if $\alpha_1 \le \alpha_2$, then $\max S(\alpha_1, \beta_0, \gamma_0) \le \max S(\alpha_2, \beta_0, \gamma_0)$.
3. When $\alpha = \alpha_0$ and $\gamma = \gamma_0$ are fixed, if $\beta_1 \le \beta_2$, then $\max S(\alpha_0, \beta_1, \gamma_0) \le \max S(\alpha_0, \beta_2, \gamma_0)$.

When $\alpha = \beta = 1$ and $\gamma = 0$, the corresponding positioned programming $LP(1, 1, 0)$ is called the ideal model of the LPGP; its optimal value is written $\overline{\max S}$. When $\alpha = \beta = 0$ and $\gamma = 1$, the corresponding positioned programming $LP(0, 0, 1)$ is called the critical model of the LPGP; its optimal value is written $\underline{\max S}$. When $\alpha = \beta = \gamma = \theta$, the corresponding positioned programming is called a $\theta$-positioned programming, written $LP(\theta)$; its optimal value is written $\max S(\theta)$ and is called the $\theta$-positioned optimal value.

Theorem 8.5. For all $\alpha, \beta, \gamma, \theta \in [0, 1]$:
1. $\underline{\max S} \le \max S(\alpha, \beta, \gamma) \le \overline{\max S}$, and
2. $\underline{\max S} \le \max S(\theta) \le \overline{\max S}$.

For fixed $\alpha, \beta, \gamma \in [0, 1]$,
$$\mu(\alpha, \beta, \gamma) = \frac{1}{2} \left( 1 - \frac{\underline{\max S}}{\max S(\alpha, \beta, \gamma)} \right) + \frac{1}{2} \cdot \frac{\max S(\alpha, \beta, \gamma)}{\overline{\max S}} \tag{117}$$
is called the pleased degree of the positioned programming $LP(\alpha, \beta, \gamma)$. Similarly, one can define the pleased degree $\mu(\theta)$ of the $\theta$-positioned programming $LP(\theta)$. Given a grey target $D = [\mu_0, 1]$, if $\mu(\alpha, \beta, \gamma) \in D$, then the corresponding optimal solution is called a pleased solution of the LPGP.

C. Assignment Problems of Grey Prediction Type

Let $n$ tasks be assigned to $m$ people, with each person finishing exactly one task. When $n = m$, this kind of assignment is called a balanced assignment problem. In a balanced assignment problem, let
$$x_{ij} = \begin{cases} 1, & \text{if the } i\text{th task is assigned to the } j\text{th person} \\ 0, & \text{if the } i\text{th task is not assigned to the } j\text{th person.} \end{cases}$$
Assume that $c_{ij}$ is the expense for the $j$th person to accomplish the $i$th task, $i, j = 1, 2, \ldots, n$. Then
$$\min S = \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} x_{ij}, \qquad \text{s.t.}\quad \begin{cases} \sum_{j=1}^{n} x_{ij} = 1, & i = 1, 2, \ldots, n \\ \sum_{i=1}^{n} x_{ij} = 1, & j = 1, 2, \ldots, n \\ x_{ij} = 0 \text{ or } 1, & i, j = 1, 2, \ldots, n \end{cases} \tag{118}$$
is called a mathematical model of the assignment problem. The square matrix $C = [c_{ij}]_{n \times n}$ is called the efficiency matrix.

Theorem 8.6. If a constant is added to all entries of a row (or column) of the efficiency matrix $C$, then the optimal assignment obtained from the new efficiency matrix is the same as that obtained from $C$.
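For small $n$, model (118) can be solved by simply enumerating all permutations; the sketch below (with an invented 3 × 3 expense matrix) does that, and also checks Theorem 8.6 by shifting a row. In practice one would use the Hungarian method instead of brute force:

```python
from itertools import permutations

def solve_assignment(C):
    """Brute-force optimum of model (118): task i goes to person p[i]."""
    n = len(C)
    best = min(permutations(range(n)),
               key=lambda p: sum(C[i][p[i]] for i in range(n)))
    return tuple(best), sum(C[i][best[i]] for i in range(n))

C = [[4, 1, 3],          # hypothetical expense matrix c_ij
     [2, 0, 5],
     [3, 2, 2]]
perm, cost = solve_assignment(C)

# Theorem 8.6: adding a constant to a row leaves the optimal assignment unchanged.
C_shifted = [[c + 10 for c in C[0]]] + C[1:]
perm_shifted, _ = solve_assignment(C_shifted)
```

For this matrix the optimal assignment sends tasks 1, 2, 3 to persons 2, 1, 3 at total expense 5, and the shifted matrix yields the same assignment.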
When the entries of the efficiency matrix are grey predicted values or grey development coefficients of efficiency sequences, the corresponding assignment programming is called a grey 0-1 programming (Wang, 1993). When the values $c_{ij}$ in the original problem are benefit values and the objective function is $\max S = \sum_{i,j=1}^{n} c_{ij} x_{ij}$, one can take $c_{i_0 j_0} = \max_{1 \le i, j \le n} \{c_{ij}\}$ and let $c'_{ij} = c_{i_0 j_0} - c_{ij}$, $i, j = 1, 2, \ldots, n$. The objective function is then converted to $\min S' = \sum_{i,j=1}^{n} c'_{ij} x_{ij}$.

One can follow the steps below to solve a grey 0-1 programming problem:

Step 1. Collect the benefit time sequences $u_{ij}^{(0)} = (u_{ij}^{(0)}(1), u_{ij}^{(0)}(2), \ldots, u_{ij}^{(0)}(h))$, $i, j = 1, 2, \ldots, n$.

Step 2. For each $u_{ij}^{(0)}$, $i, j = 1, 2, \ldots, n$, establish the GM(1, 1) time response series
$$\begin{cases} \hat u_{ij}^{(1)}(k + 1) = \omega_{ij}\, e^{-a_{ij} k} + \dfrac{b_{ij}}{a_{ij}}, \qquad \omega_{ij} = u_{ij}^{(0)}(1) - \dfrac{b_{ij}}{a_{ij}}, \\[4pt] \hat u_{ij}^{(0)}(k + 1) = \hat u_{ij}^{(1)}(k + 1) - \hat u_{ij}^{(1)}(k). \end{cases} \tag{119}$$

Step 3. Write out the benefit matrix $C = [c_{ij}]_{n \times n}$, where one can define $c_{ij} = \hat u_{ij}^{(0)}(h + s)$ or $c_{ij} = -a_{ij}$, $i, j = 1, 2, \ldots, n$.

Step 4. Compute $c_{i_0 j_0} = \max_{1 \le i \le n} \max_{1 \le j \le n} \{c_{ij}\}$.

Step 5. Let $c'_{ij} = c_{i_0 j_0} - c_{ij}$, $i, j = 1, 2, \ldots, n$. One obtains the following grey 0-1 programming model:
$$\min S' = \sum_{i=1}^{n} \sum_{j=1}^{n} c'_{ij} x_{ij}, \qquad \text{s.t.}\quad \begin{cases} \sum_{j=1}^{n} x_{ij} = 1, & i = 1, 2, \ldots, n \\ \sum_{i=1}^{n} x_{ij} = 1, & j = 1, 2, \ldots, n \\ x_{ij} = 0 \text{ or } 1, & i, j = 1, 2, \ldots, n. \end{cases} \tag{120}$$

Step 6. Convert the efficiency matrix $C' = [c'_{ij}]_{n \times n}$: subtract the minimum entry of each row and then of each column, so that every row and every column contains at least one zero entry. If the number of zeros located in different rows and different columns equals the order $n$ of the efficiency matrix, stop the conversion; otherwise, repeat the conversion until it does.
Step 7. Mark with "( )" the $n$ zero entries located in different rows and different columns; these are called independent zeros. Let
$$x_{ij} = \begin{cases} 1, & \text{if there is an independent zero in position } (i, j) \\ 0, & \text{otherwise.} \end{cases}$$
Then $X = \{x_{ij} \mid i, j = 1, 2, \ldots, n\}$ is the optimal solution sought.

D. Nonlinear Programming

Assume that $X = [x_1, x_2, \ldots, x_n]$ is a decision vector and $\otimes$ a set of grey parameters. Then
$$\max(\min)\, S = f(X, \otimes) \tag{121}$$
is called a grey nonlinear programming problem without constraints, where $f(X, \otimes)$ is a grey price or consumption functional. Whitenizing all grey elements in $f(X, \otimes)$ results in a programming problem called a whitenized programming of Eq. (121), denoted $\max(\min)\, S = f(X)$. If $f(X)$ is a differentiable function, then a solution of
$$\operatorname{grad} f(X) = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right) = 0 \tag{122}$$
is called a stationary point of $f(X)$.

Theorem 8.7. Assume that $f(X)$ is twice differentiable and that its Hesse matrix is
$$H(X) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\[4pt] \dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\ \cdots & \cdots & \cdots & \cdots \\ \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix}. \tag{123}$$
If $X_0$ is a stationary point of $f(X)$, then:
1. When $H(X_0)$ is a positive definite matrix, $X_0$ is a minimum point.
2. When $H(X_0)$ is a negative definite matrix, $X_0$ is a maximum point.
3. When $H(X_0)$ is positive semidefinite and there exists a neighborhood $U(X_0, \delta)$ of $X_0$ such that $H(X)$ is positive semidefinite for every $X \in U(X_0, \delta)$, then $X_0$ is a minimum point.
4. When $H(X_0)$ is negative semidefinite and there exists a neighborhood $U(X_0, \delta)$ of $X_0$ such that $H(X)$ is negative semidefinite for every $X \in U(X_0, \delta)$, then $X_0$ is a maximum point.
5. When $H(X_0)$ is an indefinite matrix, $X_0$ is not an extreme point of the functional $f(X)$.

Assume that $X$ is as above and that $\otimes^{(1)}$, $\otimes^{(j)}$, and $\otimes^{(i)}$, $j = 1, 2, \ldots, m$, $i = 1, 2, \ldots, s$, are sets of grey parameters. Then
$$\min S = f\big(X, \otimes^{(1)}\big), \qquad \text{s.t.}\quad \begin{cases} g_j\big(X, \otimes^{(j)}\big) \ge 0, & j \in J = \{1, 2, \ldots, m\} \\ h_i\big(X, \otimes^{(i)}\big) = 0, & i \in I = \{1, 2, \ldots, s\} \end{cases} \tag{124}$$
is called a problem of nonlinear programming with constraints, where $f(X, \otimes^{(1)})$ is the grey consumption functional and $g_j(X, \otimes^{(j)})$ and $h_i(X, \otimes^{(i)})$ are, respectively, grey constraint functionals. The whitenized programming problem
$$\min S = f(X), \qquad \text{s.t.}\quad \begin{cases} g_j(X) \ge 0, & j \in J = \{1, 2, \ldots, m\} \\ h_i(X) = 0, & i \in I = \{1, 2, \ldots, s\} \end{cases} \tag{125}$$
can be solved by following the steps below.

Step 1. If $I \ne \emptyset$, go to Step 2 directly. If $I = \emptyset$, that is, if there are no equation constraints, solve the programming problem as one without constraints: $\min S = f(X)$. Assume that $X^{(0)}$ is the optimal solution; let $k = 0$ and go to Step 3.

Step 2. Solve the subproblem with equation constraints (Lagrange multipliers can be applied):
$$\min S = f(X), \qquad \text{s.t.}\quad h_i(X) = 0, \quad i \in I = \{1, 2, \ldots, s\}. \tag{126}$$
Assume that $X^{(0)}$ is the optimal solution and let $k = 0$.

Step 3. Substitute $X^{(k)}$ into the inequality constraints and compute the index set of inequalities that are not satisfied:
$$J_k = \big\{ p \mid g_p\big(X^{(k)}\big) < 0,\ p \in J = \{1, 2, \ldots, m\} \big\}. \tag{127}$$
If $J_k = \emptyset$, then $X^{(k)}$ is the optimal solution of the original programming problem; stop the calculation. If $J_k \ne \emptyset$, go to the next step.

Step 4. Select an arbitrary element $p$ from $J_k$, introduce a nonnegative slack variable $y_p^2$, and change the corresponding inequality constraint into an equation constraint. Now solve the programming problem with the augmented equation constraints:
$$\min S = f(X), \qquad \text{s.t.}\quad \begin{cases} g_p(X) - y_p^2 = 0, & p \in J_k \\ h_i(X) = 0, & i \in I = \{1, 2, \ldots, s\}. \end{cases} \tag{128}$$
Assume that $X^{(k+1)}(y_p^2)$ is the optimal solution.

Step 5. Solve the programming problem without constraints $\min S = f[X^{(k+1)}(y_p^2)]$. Assume that $X^{(k+1)}$ is the optimal solution; replace $k$ with $k + 1$ and go back to Step 3.
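The loop of Steps 1-5 can be walked through by hand on a one-variable example. The sketch below (problem data invented) minimizes $f(x) = (x - 3)^2$ subject to $g(x) = 1 - x \ge 0$; the unconstrained stationary point violates the constraint, so a slack variable is introduced as in Step 4:

```python
# Sketch of the slack-variable procedure on: min (x - 3)^2  s.t.  1 - x >= 0.
f = lambda x: (x - 3.0) ** 2
g = lambda x: 1.0 - x

# Step 1: I is empty, so solve the unconstrained problem; f'(x) = 2(x - 3) = 0
# gives the stationary point x = 3 (f'' = 2 > 0, a minimum per Theorem 8.7).
x_opt = 3.0

# Step 3: the inequality constraint is violated at x = 3, so J_0 = {1}.
violated = g(x_opt) < 0
if violated:
    # Step 4: g(x) - y^2 = 0  =>  x = 1 - y^2; substitute into f:
    # h(y) = (1 - y^2 - 3)^2 = (y^2 + 2)^2, with h'(y) = 4*y*(y^2 + 2) = 0
    # only at y = 0, which is the minimum of h.
    y = 0.0
    x_opt = 1.0 - y ** 2
# Step 3 again: g(x_opt) = 0 >= 0, so J_1 is empty and x_opt is optimal.
```

The procedure stops at the constrained optimum $x = 1$ with objective value 4.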
IX. CONTROL OF NOT COMPLETELY KNOWN SYSTEMS

By control is meant that the controlling equipment or party imposes a special function or action on the controlled equipment or party. This special function or action is a purposeful, selected dynamic activity. A control system contains at least three parts: controlling equipment, controlled equipment, and a communication channel. A control system consisting of only these three parts is called an open-loop control system. For open-loop control systems it is relatively simple to realize a needed control, since the output can be controlled directly through the input; the weakness is that such systems are not very robust against disturbances. Control systems with feedback loops are called closed-loop control systems. In a closed-loop control system, the control is realized through interactions of the input and the feedback of the output. An outstanding strength of a closed-loop control system is that it is very robust against disturbances, with its output always kept around the predefined objectives.

The so-called control of not completely known systems, or simply grey control (Deng, 1982), means control of grey systems, including general control systems with grey parameters and controls built on the analysis, modeling, prediction, and decision-making of grey systems. Compared with traditional control theory, the methodology of grey control can more deeply reveal the characteristics of the problem under consideration and is more beneficial for realizing the objectives of control.
A. Linear Control Systems

Assume that $U = [u_1, u_2, \ldots, u_s]^T$ is a control vector, $X = [x_1, x_2, \ldots, x_n]^T$ a state vector, and $Y = [y_1, y_2, \ldots, y_m]^T$ the output vector. Then
$$\begin{cases} \dot X = A(\otimes) X + B(\otimes) U \\ Y = C(\otimes) X \end{cases} \tag{129}$$
is called the mathematical model of a grey linear control system (Zhou and Deng, 1986), where $A(\otimes)$, $B(\otimes)$, and $C(\otimes)$ are $n \times n$, $n \times s$, and $m \times n$ matrices with grey entries, respectively. Accordingly, $A(\otimes)$ is called the state (grey) matrix, $B(\otimes)$ the control (grey) matrix, and $C(\otimes)$ the (grey) output matrix. To emphasize the changes of $U$, $X$, and $Y$ with time, one also writes the control, state, and output vectors as $U(t)$, $X(t)$, and $Y(t)$, respectively. The first equation in Eq. (129) is called the state equation; the second is the output equation.

For a fixed time moment $t_0$ and a predetermined accuracy requirement, if there exists $t_1 \in [t_0, \infty)$ such that, based on the output $Y(t)$, $t \in [t_0, t_1]$, the system's state $X(t)$ can be determined with the desired accuracy, then the system is said to be observable on the interval $[t_0, t_1]$. If the system is observable on $[t_0, t_1]$ for any $t_0$ and $t_1$, then the system is said to be observable. For a given accuracy requirement and an objective vector $J = [j_1, j_2, \ldots, j_m]^T$, if the controlling equipment and the control vector $U(t)$ can make the output $Y(t)$ reach the objective $J$ with the desired accuracy through control of the input, then the system is said to be controllable.

When perturbations are applied to the initial value of the system: (1) if the amplitude of the response (output) remains bounded, the system is said to be stable; (2) if the response (output) recovers its initial state after a period of time, the system is said to be asymptotically stable; and (3) if the amplitude of the response becomes unbounded, the system is said to be unstable. In general, the concept of system stability means asymptotic stability.

Theorem 9.1. For the system in Eq. (129), let
$$D(\otimes) = \begin{bmatrix} C(\otimes) \\ C(\otimes) A(\otimes) \\ C(\otimes) A^2(\otimes) \\ \cdots \\ C(\otimes) A^{n-1}(\otimes) \end{bmatrix} \tag{130}$$
and
$$L(\otimes) = \big[ B(\otimes) \quad A(\otimes) B(\otimes) \quad A^2(\otimes) B(\otimes) \quad \cdots \quad A^{n-1}(\otimes) B(\otimes) \big]. \tag{131}$$
Then the following hold true:
1. When $\operatorname{rank}(D(\otimes)) = n$, the system is observable;
2. When $\operatorname{rank}(L(\otimes)) = n$, the system is controllable; and
3. A sufficient and necessary condition for the system to be asymptotically stable is that the upper bounds of the real parts of the grey characteristic roots of the state grey matrix $A(\otimes)$ are all less than zero.

B. Transfer Functions

Assume that the mathematical model of an $n$th-order linear system with grey parameters is
$$\otimes_n \frac{d^n x}{dt^n} + \otimes_{n-1} \frac{d^{n-1} x}{dt^{n-1}} + \cdots + \otimes_1 \frac{dx}{dt} + \otimes_0 x = \otimes \cdot u(t). \tag{132}$$
Apply the Laplace transformation to both sides of the equation and denote
$$L\big[x(t)\big] = X(s), \qquad L\big[u(t)\big] = U(s). \tag{133}$$
Then
$$G(s) = \frac{X(s)}{U(s)} = \frac{\otimes}{\otimes_n s^n + \otimes_{n-1} s^{n-1} + \cdots + \otimes_1 s + \otimes_0} \tag{134}$$
is called a grey transfer function. A grey system described by such an equation is also called a grey link or grey component (Fan and Wang, 1995). When the transfer function of a certain link is known, the Laplace transform of the response term can be obtained from the Laplace transform of the driving term through
$$X(s) = G(s) \cdot U(s); \tag{135}$$
an inverse transformation then yields the response $x(t)$. In the following, we discuss several typical transfer functions.

Proposition 9.1. The link or component whose driving term $u(t)$ and response term $x(t)$ satisfy
$$x(t) = K(\otimes) u(t) \tag{136}$$
is called a grey proportional link or component, where $K(\otimes)$ is a grey amplifying coefficient. The transfer function of a grey proportional link is given by
$$G(s) = K(\otimes). \tag{137}$$
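Returning to Theorem 9.1: once the grey entries are whitenized to representative values, the rank tests become ordinary matrix computations. A self-contained sketch on an invented crisp instance of Eq. (129):

```python
# Rank tests of Theorem 9.1 on a whitenized (crisp) 2nd-order system.
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def rank(M, eps=1e-9):
    """Rank by Gauss-Jordan elimination (works on a copy of M)."""
    M = [row[:] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if abs(M[i][c]) > eps), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        M[r] = [v / M[r][c] for v in M[r]]
        for i in range(len(M)):
            if i != r and abs(M[i][c]) > eps:
                M[i] = [v - M[i][c] * w for v, w in zip(M[i], M[r])]
        r += 1
    return r

A = [[0.0, 1.0], [-2.0, -3.0]]   # state matrix (n = 2), eigenvalues -1, -2
B = [[0.0], [1.0]]               # control matrix
C = [[1.0, 0.0]]                 # output matrix

L = [rb + rab for rb, rab in zip(B, matmul(A, B))]   # [B, AB]
D = C + matmul(C, A)                                  # [C; CA]
controllable = rank(L) == 2
observable = rank(D) == 2
```

Here both ranks equal $n = 2$, so this instance is controllable and observable; the eigenvalues of $A$ have negative real parts, consistent with asymptotic stability.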
Proposition 9.2. With a unit jump occurring in the drive, if the response satisfies
$$x(t) = K(\otimes) \cdot \big(1 - e^{-t/T}\big), \tag{138}$$
then the link is called a grey inertia link, where $T$ is the time constant of the link. The transfer function of a grey inertia link is given by
$$G(s) = \frac{K(\otimes)}{T s + 1}. \tag{139}$$
Proposition 9.3. When the drive and response satisfy the relationship
$$x(t) = K(\otimes) \int u(t)\, dt, \tag{140}$$
the link is called a grey integral link. The transfer function of a grey integral link is given by
$$G(s) = \frac{K(\otimes)}{s}. \tag{141}$$
Proposition 9.4. When the drive and response satisfy
$$x(t) = K(\otimes) \cdot \frac{du(t)}{dt}, \tag{142}$$
the link is called a grey differential link. The transfer function of a grey differential link is given by
$$G(s) = K(\otimes) \cdot s. \tag{143}$$
Proposition 9.5. When the drive and response satisfy the relationship
$$x(t) = u\big(t - \tau(\otimes)\big), \tag{144}$$
the link is called a grey postponing link, where $\tau(\otimes)$ is a constant. The transfer function of a grey postponing link is given by
$$G(s) = e^{-\tau(\otimes) \cdot s}. \tag{145}$$
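As a quick numerical check of Proposition 9.2, one can simulate an inertia link $G(s) = K/(Ts + 1)$, i.e. the ODE $T\dot x + x = K u$, under a unit jump and compare with the closed form of Eq. (138). The values of $K$ and $T$ below are arbitrary:

```python
import math

# Forward-Euler simulation of T*x'(t) + x(t) = K*u(t) with u(t) = 1 (unit jump).
K, T, dt = 2.0, 0.5, 1e-4
x, t = 0.0, 0.0
while t < 1.0:
    x += dt * (K * 1.0 - x) / T
    t += dt

# Closed-form step response of the inertia link, Eq. (138).
exact = K * (1 - math.exp(-t / T))
```

With this step size the simulated response agrees with $K(1 - e^{-t/T})$ to better than $10^{-3}$ at $t = 1$.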
We can study systems problems of stability, controllability, and so forth, through the study of extremum points of the relevant grey transfer functions. From the theorem below, it follows that each nth-order grey linear system can be converted to an equivalent first-order grey linear system. Therefore, we can make use of the results in the previous section to discuss problems of nth-order grey linear systems.
Theorem 9.2. For the $n$th-order grey linear system given in Eq. (132), there exists an equivalent first-order grey linear system.

Assume that an application of the Laplace transformation to Eq. (129) gives
$$\begin{cases} s X(s) = A(\otimes) X(s) + B(\otimes) U(s) \\ Y(s) = C(\otimes) X(s). \end{cases} \tag{146}$$
So one has
$$\begin{cases} \big[sE - A(\otimes)\big] X(s) = B(\otimes) U(s) \\ Y(s) = C(\otimes) X(s). \end{cases} \tag{147}$$
If $[sE - A(\otimes)]$ is nonsingular, one further has
$$\begin{cases} X(s) = \big[sE - A(\otimes)\big]^{-1} B(\otimes) U(s) \\ Y(s) = C(\otimes) X(s). \end{cases} \tag{148}$$
That is,
$$Y(s) = C(\otimes) \big[sE - A(\otimes)\big]^{-1} B(\otimes) U(s). \tag{149}$$
The $m \times s$ matrix $G(s) = C(\otimes)[sE - A(\otimes)]^{-1} B(\otimes)$ is called the matrix of transfer functions of the grey linear control system, or grey transfer matrix for short. For an $n$th-order grey linear system, when the state grey matrix $A(\otimes)$ of the corresponding equivalent first-order system is nonsingular,
$$\lim_{s \to 0} G(s) = -C(\otimes) A(\otimes)^{-1} B(\otimes) \tag{150}$$
is called the grey gain matrix. If the grey gain matrix $-C(\otimes) A(\otimes)^{-1} B(\otimes)$ is used to approximate the grey transfer matrix $G(s)$, the system is simplified to a proportional link. From $Y(s) = G(s) U(s)$ it follows that, when $m = s = n$, if $G(s)$ is nonsingular one can also obtain $U(s) = G(s)^{-1} Y(s)$. The matrix
$$G(s)^{-1} = B(\otimes)^{-1} \big[sE - A(\otimes)\big] C(\otimes)^{-1} \tag{151}$$
is called the grey structure matrix. When the grey structure matrix is known, in order for the output vector $Y(s)$ to reach or approach a predetermined objective $J(s)$, one can use $G(s)^{-1} J(s)$ to determine the system's control vector $U(s)$. One can also discuss the controllability and observability of systems through the use of grey transfer matrices.
C. Other Kinds of Controls

Let $G(s)^{-1}$ be a system's structure matrix and $G^*(s)^{-1}$ an objective structure matrix. Then $\Delta^{-1} = G^*(s)^{-1} - G(s)^{-1}$ is called the structural deviation matrix, and one obtains
$$G^*(s)^{-1} Y(s) - \Delta^{-1} Y(s) = U(s). \tag{152}$$
The term $-\Delta^{-1} Y(s)$ is called the superfluous term. The control through a feedback of $\Delta^{-1} Y(s)$ that cancels the superfluous term is called a control with abandonment (for more discussion of control systems, see Xiong et al., 1999). Consider the system
$$G(s)^{-1} Y(s) = U(s). \tag{153}$$
by a feedback of Δ−1 Y (s) can be converted to G(s)−1 Y (s) + Δ−1 Y (s) = U (s), that is, [G(s)−1 + Δ−1 ]Y (s) = U (s). Therefore, G∗ (s)−1 Y (s) = U (s) already has the desirable objective structure. The number of entries in the structural deviation matrix Δ−1 , used in a control with abandonment, directly affects the number of components in the controlling equipment. From the angles of economics, reliability, being easy to realize technically, and so on, under the guarantee that the system will possess desirable dynamic characteristics, we always try to keep the number of entries in the deviation matrix Δ−1 to a minimum. That is, to say, in the objective structural matrix, we should try to keep the corresponding entries of the original structural matrix. Assume that Y = (y1 , y2 , . . . , ym )T is the output vector and J = (j1 , j2 , . . . , jm )T the objective vector. If the components of the control vector U = (u1 , u2 , . . . , us )T satisfy uk = fk [γ (J, Y )], k = 1, 2, . . . , s, where γ (J, Y ) is the degree of grey incidence between the output vector Y and the objective vector J , then the systems control is called a grey incidence control. Grey incidence control systems are obtained by attaching grey incidence controllers to regular control systems. It determines the control vector U through the degree of grey incidence γ (J, Y ) so that the degree of incidence between the output vector and the objective vector does not go beyond a certain predetermined range. The idea of control of grey predictions is used to predict a system’s future behaviors based on a collection of data regarding the system’s behaviors in order to uncover the development law, if any, of the system and to perform precontrols on relevant controlling decisions by using the predicted future development tendency of the system. In this way, it becomes possible to prevent a predicted disaster before it actually occurs and to impose controls in a timely fashion. 
Therefore, this method has a relatively stronger adaptability in practical applications.
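The grey incidence control described above relies on computing the degree of grey incidence $\gamma(J, Y)$ between the output and objective sequences. The text does not spell the formula out; the sketch below uses Deng's classical degree of grey incidence, with the conventional resolution coefficient $\xi = 0.5$ and initial-value normalization, both of which are assumptions here, and illustrative sequences.

```python
def grey_incidence(reference, comparison, xi=0.5):
    """Deng's degree of grey incidence between a reference (objective)
    sequence and a comparison (output) sequence."""
    # Initial-value normalization so the two sequences are dimensionless.
    r = [x / reference[0] for x in reference]
    c = [x / comparison[0] for x in comparison]
    deltas = [abs(a - b) for a, b in zip(r, c)]
    dmin, dmax = min(deltas), max(deltas)
    if dmax == 0:                      # identical normalized sequences
        return 1.0
    # Incidence coefficient at each time point, averaged over the sequence.
    coeffs = [(dmin + xi * dmax) / (d + xi * dmax) for d in deltas]
    return sum(coeffs) / len(coeffs)

if __name__ == "__main__":
    J = [100, 110, 120, 130]   # objective components j(k), illustrative
    Y = [100, 108, 123, 128]   # output components y(k), illustrative
    print(round(grey_incidence(J, Y), 3))
    # A grey incidence controller would adjust U whenever this value
    # falls outside the predetermined range.
```

The value always lies in (0, 1], and equals 1 when the output tracks the objective exactly, which is what makes it usable as a closed-loop quality measure.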
A GREY SYSTEMS APPROACH
167
The principle behind grey prediction control systems is as follows. First, through the use of sampling equipment, collect and organize data for the output vector $Y$. Second, through the prediction equipment, establish a model from which predicted values several steps ahead are computed. Third, check against the objective and determine the control vector $U$ so that the future output vector $Y$ will be as close to the objective $J$ as possible.

More specifically, assume that $j_i(k)$, $y_i(k)$, and $u_i(k)$, $i = 1, 2, \ldots, m$, are the objective, output, and control components at time moment $k$, respectively. For $i = 1, 2, \ldots, m$, let
\[
y_i = (y_i(1), y_i(2), \ldots, y_i(n)), \quad
j_i = (j_i(1), j_i(2), \ldots, j_i(n)), \quad
u_i = (u_i(1), u_i(2), \ldots, u_i(n)).
\]
For the control operator $f: [j_i(\lambda), y_i(\lambda)] \to u_i(k)$: (1) when $k > \lambda$, the system is said to be an after-event control; (2) when $k = \lambda$, an on-time control; and (3) when $k < \lambda$, a prediction control (Deng and Zhou, 1986). If the operator $f$, as defined above, satisfies $f[j_i(\lambda), y_i(\lambda)] = j_i(\lambda) - y_i(\lambda)$, that is, $u_i(k) = j_i(\lambda) - y_i(\lambda)$, then (1) when $k > \lambda$, the system is said to be an error-afterward control; (2) when $k = \lambda$, an error-on-time control; and (3) when $k < \lambda$, an error-prediction control.

Assume that $y_i = (y_i(1), y_i(2), \ldots, y_i(n))$, for $i = 1, 2, \ldots, m$, is an observational sequence of the output components, whose GM(1,1) response sequence is
\[
\begin{cases}
\hat{y}_i^{(1)}(k+1) = \left(y_i(1) - \dfrac{b_i}{a_i}\right)e^{-a_i k} + \dfrac{b_i}{a_i}, \\[6pt]
\hat{y}_i^{(0)}(k+1) = \hat{y}_i^{(1)}(k+1) - \hat{y}_i^{(1)}(k).
\end{cases} \tag{154}
\]
If the control operator $f$ satisfies $u_i(n + k_0) = f[j_i(k), \hat{y}_i^{(0)}(k)]$, $n + k_0 < k$, $i = 1, 2, \ldots, m$, then the system's control is called a grey prediction control.

D. A Test of Applications

This section considers a control of our biological prevention system (Li and Deng, 1984) of cotton aphids (see Chen, 1982, for another intriguing example). Cotton aphids are injurious insects for cotton production. Ladybugs are a natural enemy of cotton aphids. By planting rapes in cotton fields,
168
LIN AND LIU
an existing biological prevention system of cotton aphids can be effectively controlled. Ladybugs eat not only cotton aphids but also rape aphids. At first, rapes are planted in the cotton fields so that rape aphids can grow, and so that ladybugs will fly in, stay, and reproduce among the rapes. When the ladybug population has reached a certain level, cotton and cotton aphids start to grow. At this time, the planting of the rapes is reduced so that the ladybugs must turn their attention to the cotton aphids, realizing the objective of destroying the cotton aphids.

Suppose that $x_1(k)$, $x_2(k)$, and $x_3(k)$ stand for the numbers of ladybugs, rape aphids, and cotton aphids at the $k$th time phase, respectively. Then these three numbers satisfy the following relation:
\[
\begin{cases}
x_1(k+1) = a_{11}x_1(k) + a_{12}x_2(k), \\
x_2(k+1) = -a_{21}x_1(k) + a_{22}x_2(k) + \otimes_{23}x_3(k), \\
x_3(k+1) = -a_{31}x_1(k) + \otimes_{32}x_2(k) + a_{33}x_3(k).
\end{cases}
\]
That is, $X(k+1) = A(\otimes)X(k)$, where
\[
X(k+1) = \begin{bmatrix} x_1(k+1) \\ x_2(k+1) \\ x_3(k+1) \end{bmatrix}, \qquad
X(k) = \begin{bmatrix} x_1(k) \\ x_2(k) \\ x_3(k) \end{bmatrix},
\]
and
\[
A(\otimes) = \begin{bmatrix} a_{11} & a_{12} & 0 \\ -a_{21} & a_{22} & \otimes_{23} \\ -a_{31} & \otimes_{32} & a_{33} \end{bmatrix}.
\]
By taking $k = 0$, one has $X(1) = A(\otimes)X(0)$. When $k = 1$, one has $X(2) = A(\otimes)X(1) = A(\otimes)^2X(0)$. If one needs to eliminate the cotton aphids in the second phase, let $x_3(2) = 0$. From the preceding equation, it follows that
\[
x_3(2) = -[a_{31}a_{11} + a_{21}\otimes_{32} + a_{31}a_{33}]x_1(0)
+ [-a_{31}a_{12} + a_{22}\otimes_{32} + a_{33}\otimes_{32}]x_2(0)
+ [\otimes_{32}\otimes_{23} + a_{33}^2]x_3(0).
\]
So,
\[
-[a_{31}a_{11} + a_{21}\otimes_{32} + a_{31}a_{33}]x_1(0)
+ [-a_{31}a_{12} + a_{22}\otimes_{32} + a_{33}\otimes_{32}]x_2(0)
+ [\otimes_{32}\otimes_{23} + a_{33}^2]x_3(0) = 0.
\]
Let
\[
\lambda_{13}(0) = \frac{x_1(0)}{x_3(0)}, \qquad \lambda_{23}(0) = \frac{x_2(0)}{x_3(0)},
\]
and take $\otimes_{32} = 0$. Then it follows that
\[
\lambda_{13}(0) = \frac{a_{33}^2}{a_{31}(a_{11} + a_{33})} - \frac{a_{31}a_{12}}{a_{31}(a_{11} + a_{33})}\,\lambda_{23}(0).
\]
Therefore, one can choose the ratio of ladybugs to cotton aphids to satisfy
\[
\lambda_{13} \in \left[\frac{a_{33}^2 - a_{31}a_{12}\lambda_{23}(0)}{a_{31}(a_{11} + a_{33})},\;
\frac{a_{33}^2}{a_{31}(a_{11} + a_{33})}\right].
\]
Here $\lambda_{13}$ is a control quantity of our biological prevention system of cotton aphids. It is a grey number with the lower and upper limits
\[
\underline{\lambda}_{13} = \frac{a_{33}^2 - a_{31}a_{12}\lambda_{23}(0)}{a_{31}(a_{11} + a_{33})}
\qquad \text{and} \qquad
\bar{\lambda}_{13} = \frac{a_{33}^2}{a_{31}(a_{11} + a_{33})}.
\]
If we let $\mu$ be the ratio of the production capacity of ladybugs to that of cotton aphids, that is, $\mu = a_{11}/a_{33}$, then
\[
\bar{\lambda}_{13} = \frac{a_{33}}{a_{31}(1 + \mu)}.
\]
Obviously, the greater $\mu$ is, the smaller $\bar{\lambda}_{13}$ will be. When $\mu$ is fixed, the greater $a_{33}$ is, the greater $\bar{\lambda}_{13}$ will be. When $a_{33}$ and $\mu$ are fixed, the greater $a_{31}$ is, the smaller $\bar{\lambda}_{13}$ will be. In general, we take $\lambda_{13} = \bar{\lambda}_{13}$. Then it is assured that in the second phase, all cotton aphids will be eliminated.
REFERENCES

Ackoff, R.L. (1973). Science in the systems age: Beyond IE, OR, and MS. Operations Research 21, 661–671.
Bertalanffy, L.V. (1968). General Systems Theory: Foundation, Development, Applications. George Braziller, New York.
Checkland, P.B. (1981). Systems Thinking, Systems Practice. John Wiley, New York.
Chen, H.-S., Chang, W.-C. (2000). Optimization of GM(1, N) modeling. The Journal of Grey System 12, 53–57.
Chen, M.-Y. (1982). Grey dynamic control of a boring lathe. Journal of Huazhong College of Technology 10, 7–11.
Chen, M.-Y. (1985). Stability and steadiness problems of grey systems. Fuzzy Mathematics 5, 54–58.
Chen, Q.-B., Nie, R. (1999). Some study on grey GM(1, 3) growth model and applications. Journal of Song-Liao (Natural Science Edition) 4, 32–35.
Chen, Z.-B. (1984). Remnant recursive recognition of grey interval functions. Journal of Huazhong College of Technology 12, 67–70.
Ci, X.-B. (1995). Some methods for improved accuracy of GM(1, 1) predictions. Journal of Sho-Guan University (Natural Science Edition) 2, 34–39.
Dai, W.-Z. (1997). nth accumulating grey models with remnant corrections. Systems Engineering Theory and Practice 12, 121–124.
Deng, J.-L. (1982). Control problems of grey systems. Systems & Control Letters 1, 211–215.
Deng, J.-L. (1985a). Grey systems incidence space. Fuzzy Mathematics 5, 1–10.
Deng, J.-L. (1985b). Decision making for grey situations. Fuzzy Mathematics 5, 43–50.
Deng, J.-L. (1986). Grey Prediction and Decision Making. Press of Huazhong University of Science and Technology, Wuhan.
Deng, J.-L. (1993). Grey differential equations. The Journal of Grey System 5, 1–14.
Deng, J.-L., Zhou, C.-S. (1986). Sufficient conditions for the stability of a class of interconnected dynamic systems. Systems & Control Letters 5, 105–108.
Dong, Y.-G. (1995). Valuation of grey clustering with human experience. The Journal of Grey System 7, 179–184.
Fan, J.-F., Wang, R.-Y. (1995). A simple test for the stability of symmetric grey systems. Engineering Mathematics 2, 31–33.
Guo, H. (1985). Distinguishing coefficients of grey degree of incidence. Fuzzy Mathematics 5 (2), 55–58.
Guo, H. (1991). Grey exponential law attributed to mutual complement of series. The Journal of Grey System 3, 153–162.
Hao, Y.-H., Wang, X.-M. (2000). Period residual modification of GM(1, 1) modeling. The Journal of Grey System 12, 79–84.
He, M.-X. (1997). A new method to establish GM(1, 1) prediction models. Agricultural Systems Science and Comprehensive Research 13, 241–244.
He, Y. (1993). Solution of GM(1, 1) models and discussions on relevant accuracy tests. Systems Engineering 11, 36–70.
Jie, P.-S., Huong, W.-S., Hu, Y.-Y. (2001). Some study on the characteristics of grey prediction models. Systems Engineering Theory and Practice 21 (9), 105–108.
Klir, G.J. (1991). Facets of Systems Science. Plenum Press, New York.
Ko, J.-Z. (1996). Data sequence transformation and GM(1, 1) model accuracy. In: Liu, S.-F., Xu, Z.-X. (Eds.), New Development in Grey Systems Research. Press of Huazhong University of Science and Technology, Wuhan, pp. 233–235.
Li, B.-L., Deng, J.-L. (1984). Grey model for the prevention system of cotton insects. Exploration of Nature 3 (3), 44–49.
Li, W.-X. (1990a). Grey algebraic curve models. Systems Engineering 8 (1), 32–36.
Li, W.-X. (1990b). A clustering analysis method based on grey incidences and its applications. Systems Engineering 8 (3), 37–44.
Li, X.-Q. (1997). A generalization of applications of grey systems GM(n, h) model. Systems Engineering Theory and Practice 17 (8), 82–85.
Lin, J.-L., Wang, K.-S., Yan, B.-H., et al. (2000). Optimization of parameters in GM(1, 1) model by Taguchi method. The Journal of Grey System 11 (4), 33–38.
Lin, L.-Z., Wu, W.-J. (2001). Exploration on optimal grey GM(1, 1) model. Systems Engineering Theory and Practice 21 (8), 92–96.
Lin, Y. (1998). Mystery of nonlinearity and Lorenz's chaos. Kybernetes: The International Journal of Systems and Cybernetics 27, 605–854.
Lin, Y. (2001a). Differential and integral calculus on discrete time series data. In: Proceedings of the 14th International Conference on Systems Science, vol. 1: Plenary and Invited Papers: Systems Theory and Control Theory, pp. 123–131. Also, Systems Science 27 (3), 49–58.
Lin, Y. (2001b). Information, prediction and structural whole: An introduction. Kybernetes: The International Journal of Systems and Cybernetics 30 (4), 305–364.
Lin, Y., DeNu, R., Patel, N. (2002). True distance fit of exponential curves and tests of applications. International Journal of Applied Mathematics 9 (1), 49–68.
Lin, Y., Liu, S.-F. (1999a). Regional economic planning based on systemic analysis of small samples (I). Problems of Nonlinear Analysis in Engineering Systems 10 (2), 24–39.
Lin, Y., Liu, S.-F. (1999b). Several programming models with unascertained parameters and their application. Journal of Multi-Criteria Decision Analysis 8, 206–220.
Lin, Y., Liu, S.-F. (2000a). Regional economic planning based on systemic analysis of small samples (II). Problems of Nonlinear Analysis in Engineering Systems 11 (6), 33–49.
Lin, Y., Liu, S.-F. (2000b). Law of exponentiality and exponential curve fitting. Systems Analysis Modelling Simulation 38, 621–636.
Lin, Y., Liu, S.-F. (2000c). A systemic analysis with data (I) and (II). International Journal of General Systems 29, 989–999, 1001–1013.
Liu, K. (1982). Grey sets and stability of grey systems. Journal of Huazhong University of Science and Technology 10 (3), 23–25.
Liu, K.-D., Lin, Y., Gao, L.-G. (2001). Informational uncertainties and their mathematical expressions. Kybernetes: The International Journal of Systems and Cybernetics 30 (4), 378–396.
Liu, L.S. (1987). P–F theorems of grey non-negative matrices. Journal of Henan Agricultural University 21 (4), 376–380.
Liu, L.S. (1991). The three axioms of buffer operator and their application. The Journal of Grey System 3 (1), 39–48.
Liu, L.S. (1992). Generalized degree of grey incidence. In: Zhang, Shengkai (Ed.), Information and Systems. DMU Publishing House, Dalian, pp. 113–116.
Liu, L.S. (1993). Positioned solution of grey parametric linear programming. Grey Systems Theory and Applications 3 (2), 23–28.
Liu, L.S. (1995a). Progress in technology and its impact on productivity. In: Management Science and Systems Science. Press of Xian Jiaotong University, Xian, pp. 85–91.
Liu, L.S. (1995b). On measure of grey information. The Journal of Grey System 7 (2), 97–101.
Liu, L.S. (1995c). Sequence operators and systems modeling. In: New Developments in Fundamental Science Research. Press of Chinese Science and Technology, Beijing, pp. 16–20.
Liu, L.S., Deng, J.-L. (1999). GM(1, 1) coding for exponential series. The Journal of Grey System 11 (2), 147–152.
Liu, S.-F., Dong, Y.-G. (1997a). The positioned solution of the linear programming with grey parameters. Advances in Systems Science and Application 3 (1), 371–377.
Liu, S.-F., Dong, Y.-G. (1997b). LPGP floatation and satisfactory degree of positioned solutions. Journal of Huazhong University of Science and Technology 25 (1), 24–27.
Liu, S.-F., Dong, Y.-G., Li, B.-J. (1999). Computations of contribution rates due to progress in technology of Henan Province and horizontal comparison. Journal of Henan Agricultural University 33 (1), 40–43.
Liu, S.-F., Lin, Y. (1999). An Introduction to Grey Systems: Foundations, Methodology and Applications. IIGSS Academic Publisher, Grove City, PA.
Liu, S.-F., Zhu, Y.-D. (1993). Triangular membership function evaluation model for regional economic evaluations. Journal of Agricultural Engineering 9 (2), 8–13.
Luo, X.-M., Yang, H.-H. (1994). Grey comprehensive evaluation models. Systems Engineering and Electronic Technology 16 (9), 18–25.
Mu, Y., Liu, J.-G. (1996). Studies on grey systems modeling method. Journal of Shan-Dong College of Construction Materials 10 (3), 67–71.
Pawlak, Z. (1991). Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Dordrecht.
Peng, X.-L., Lu, X. (1991). Algebraic criteria for stability and instability of grey discrete systems. Applied Mathematics 4 (2), 123–128.
Qiu, X.-J. (1995). Incidence analysis for grey clustering and its applications. Systems Engineering Theory and Practice 15 (1), 15–21.
Shui, N.-X., Qing, S.-C. (1998). Some theoretical problems of grey systems GM(1, 1) model. Systems Engineering Theory and Practice 18 (4), 59–63.
Shui, N.-X., Tong, T.-H., Sa, Z. (1990). About the GM(2, 1) model of grey systems. Systems Engineering 8 (6), 31–34.
Shui, N.-X., Tong, T.-H., Sa, Z. (1992). Some theoretical problems on grey degrees of incidence. Systems Engineering 10 (6), 23–26.
Soros, G. (1998). The Crisis of Global Capitalism: Open Society Endangered. Public Affairs, New York.
Wang, B.-L. (1993). Computation of degrees of incidence for a class of problems. In: Selected Papers on Grey Systems. Press of Henan University, Kaifeng, pp. 101–102.
Wang, G.-Y. (1990). Unascertained information and its mathematical treatment. Journal of College of Harbin Construction Engineering 23 (4), 11–19.
Wang, W.-P. (1993). Analysis of optimal values in GLP. The Journal of Grey System 5 (4), 315–317.
Wang, W.-P. (1997). Study on grey linear programming. The Journal of Grey System 9 (1), 41–46.
Wang, Y.-N., Liu, K.-D., Li, Y.-C. (2001). Optimal GM(1, 1) modeling of the whitenization of grey derivatives. Systems Engineering Theory and Practice 21 (5), 124–128.
Wang, Z.-L., Li, X.-Z. (1996). Ordering of grey numbers and inequalities. In: New Development in Grey Systems Research. Press of Huazhong University of Science and Technology, Wuhan, pp. 364–366.
Wen, J.-C., Chen, J.-L., Yang, J.-S. (2001). Study of the non-equal gap GM(1, 1) modeling. The Journal of Grey System 13 (3), 5–12.
Xiao, X.-P. (1997). Some study and comments on the theory of grey incidence models. Systems Engineering Theory and Practice 17 (8), 76–81.
Xiong, H.-J., Chen, M.-Y., Hao, T. (1999). Two classes of grey models for control systems. Journal of Wuhan Transportation University of Science and Technology 30 (5), 465–468.
Xu, G.-H. (1993). Improved generalized degrees of incidence and criteria for multi-target grey incidence decision making. In: Selected Papers in Grey Systems. Press of Henan University, Kaifeng, pp. 96–100.
Xu, J.-P. (1993). Grey prediction linear programming. Management Science of China 1 (4), 5–42.
Yang, X.-H., Zhang, G.-T., Jing, J.-L. (1998). Parametric estimation for grey systems GM(1, 1) model. Monitor of Drought Environment 12 (2), 76–80.
Zhang, Q.-S., Deng, J.-L., Fu, G. (1994a). On grey clustering in grey hazy set. The Journal of Grey System 7 (4), 377–390.
Zhang, Q.-S., Han, G.-L., Deng, J.-L. (1994b). Information entropy of discrete grey number. The Journal of Grey System 6 (4), 303–314.
Zhong, S.-J. (1988). Multi-target grey situation decision making developed for port selections. Systems Engineering Theory and Practice 8 (4), 59–66.
Zhou, C.-S., Deng, J.-L. (1986). The stability of grey linear system. International Journal of Control 43 (1), 313–320.
Zhu, H.-G., Lian, G.-Z. (1992). Grey incidence analysis for systems with and without time series. Information and Control 21 (3), 177–179.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 141
Recent Developments in the Imaging of Magnetic Domains

WITOLD SZMAJA
Department of Solid State Physics, University of Łódź, Pomorska 149/153, 90-236 Łódź, Poland
I. Introduction . . . . . . . . . . . . . . . . . . . . 175
II. Experimental Techniques . . . . . . . . . . . . . . 180
III. SEM Type I Magnetic Contrast . . . . . . . . . . . 183
IV. Bitter Pattern Method . . . . . . . . . . . . . . . 204
V. Magnetic Force Microscopy . . . . . . . . . . . . . 224
VI. Conclusions . . . . . . . . . . . . . . . . . . . . 240
References . . . . . . . . . . . . . . . . . . . . . . 244
I. INTRODUCTION

Magnetic phenomena have been known and used by mankind for many centuries. The earliest experiences with magnetism involved magnetite (the only material that occurs naturally in a magnetic state), known also as lodestone (after its property of aligning itself in certain directions if allowed to rotate freely, thus being able to indicate the positions of north and south and, to some extent, also latitude), and the compass has served mariners throughout the ages. However, it was Gilbert who first correctly assumed that the compass needle orients itself in the earth's magnetic field; he was the first to make serious contributions to an understanding of magnetism. Quantitative studies of magnetism were first conducted by Coulomb, who used torsion measurements to determine the forces between magnetic poles. Work to relate electricity and magnetism began with the systematic observations of Oersted. Work to explore the interactions between light and magnetic materials began with the classic studies of Faraday. Maxwell's efforts to provide a theoretical context for these observations stand as a legacy of the 1800s (Bader, 2002; Jakubovics, 1994; Cowburn, 2000; Mankos et al., 1996).

A ferromagnet is usually divided into regions, within which the atomic magnetic moments are aligned parallel to each other and the magnetization (defined as the magnetic moment per unit volume) is equal to a constant value, called the saturation magnetization. These regions are known as magnetic domains. The magnetization in different domains is in different directions, so that the overall magnetization of the ferromagnetic specimen can be small or even zero (in the latter case, the specimen is in the demagnetized
state). Magnetic saturation of the specimen is produced by aligning the magnetization of each domain with the applied magnetic field. To explain the fact that in some cases it is possible to change the overall magnetization of the specimen from zero to a saturation value of the order of 1000 G by the application of a very low magnetic field, of the order of 0.01 Oe, Weiss in 1907 introduced the concept of a molecular field, which tends to align the atomic magnetic moments parallel to each other (below the Curie temperature). He also introduced the concept of magnetic domains, to explain the fact that the overall magnetization of ferromagnetic materials can be zero in the absence of an external magnetic field. In fact, Weiss did not justify either of his two concepts. It is now known that the origin of the molecular field lies in the quantum-mechanical exchange force, as explained by Heisenberg in 1928, while the reason for the occurrence of domains was given by Landau and Lifshitz in 1935, who showed that domains are formed through the minimization of the total energy of the system. The domain concept was further refined by Néel and Kittel, leading to the micromagnetic theory formulated by Brown (Kittel, 1949; Jakubovics, 1994; Mankos et al., 1996; Hubert and Schäfer, 1998).

A material is always in a state in which its total energy is a minimum. The total energy of a ferromagnet is the sum of different energy contributions. Domains occur when a state with domain structure has a lower total energy than a uniformly magnetized state (which is also commonly called a single-domain state). Thus, what requires explanation is why the former state is energetically more favorable than the latter. This can be understood on the basis of Figure 1. First, it must be remembered that the strong exchange interaction (without which the material would be paramagnetic rather than ferromagnetic) tends to align all the atomic magnetic moments in the same direction.
As a result, the exchange energy is a minimum for a uniformly magnetized state of the specimen (Figure 1a). Any change in orientation of the magnetization away from uniform alignment increases the exchange energy, and therefore a reduction in other energy contribution(s) must take place, so that the total energy of the specimen is a minimum. In fact, the considered state of uniform magnetization generates a large amount of stray magnetic field, and consequently the magnetostatic energy is large. The magnetostatic energy is much smaller for the specimen subdivided into four domains (Figure 1b) and will become gradually less as the specimen is subdivided into more and more domains, simply because the stray field is confined to a smaller and smaller region near the specimen surface. The transition region between domains, which typically extends through a few hundred interatomic distances, is called the domain wall and has a certain amount of energy associated with it. Thus, the process of subdivision into domains is expected to be continued until the energy required to establish an
F IGURE 1. Domain formation in (b–d) as a result of reduction in the magnetostatic energy of a uniformly magnetized specimen (a).
additional domain wall is larger than the reduction in the energy resulting from the finer subdivision (Kittel, 1949; Jakubovics, 1994; Mankos et al., 1996; Hubert and Schäfer, 1998). It is also possible to propose domain configurations such as those shown in Figures 1c and 1d. They are stray field free (except for small regions near the domain walls) and are referred to as closed domain structures.

It is known from experimentation that the magnetization of a domain tends to lie along certain preferred axes, called magnetic easy axes or easy axes of magnetization. These axes are well established for different crystals. In cobalt, which has a hexagonal close-packed (HCP) crystal structure, the hexagonal axis is the only magnetic easy axis, and in Nd2Fe14B, which possesses a tetragonal crystal structure, the tetragonal axis is the only magnetic easy axis. As a consequence, such crystals are referred to as uniaxial. In iron, which has a body-centered cubic (BCC) crystal structure, the magnetic easy axes are the cube edges, while in nickel, which possesses a face-centered cubic (FCC) crystal structure, the magnetic easy axes are the cube diagonals. It follows that the magnetic properties of a crystal are anisotropic, and consequently more energy (in some cases very much more) is needed to produce magnetic saturation of the crystal along an arbitrary axis than along one of the easy axes of magnetization. The excess energy required is called the anisotropy energy, and for the case considered, it is known as the magnetocrystalline anisotropy energy. In general, we must distinguish between crystalline anisotropy of the undisturbed crystal structure and induced anisotropies describing the effects of deviations from ideal symmetry, for example, because of lattice defects or partial atomic
ordering. In view of the presented considerations, it becomes obvious that the domain configuration shown in Figure 1b is energetically more advantageous for uniaxial materials with correspondingly high magnetic anisotropy, while the domain structure of Figure 1c is more appropriate for cubic materials or uniaxial materials with sufficiently low magnetic anisotropy (Kittel, 1949; Jakubovics, 1994; Mankos et al., 1996; Hubert and Schäfer, 1998).

Figures 1c and 1d present two possible domain configurations for a material with cubic crystal structure. As mentioned previously, neither of these configurations exhibits any stray magnetic fields outside the specimen, and therefore neither has any magnetostatic energy. Because the configuration of Figure 1d contains fewer domain walls, it possesses smaller domain wall energy, and therefore it may be considered more likely to occur than that shown in Figure 1c. However, this is not the case in reality. To explain the domain structure of cubic materials, the interaction between the magnetization and elastic distortions must be considered. It is known that when the magnetization of a specimen is changed, there is a slight change in its shape, typically of the order of 10⁻⁵. This effect is called magnetostriction. There is also an inverse effect, in which the material effectively acquires an extra contribution to its anisotropy, induced by stress; this additional energy is referred to as the magnetoelastic energy. The distortion associated with the magnetization is always present in the specimen. The domains are magnetized to saturation, and therefore each domain is distorted. A demagnetized specimen contains domains magnetized in various directions, and the different domains are distorted in different directions. However, the specimen of Figure 1c can be freely deformed over most of its length because the distortion produced by magnetization in opposite directions is the same.
Some magnetoelastic energy is needed to hold the small triangular domains (called closure domains) to the rest of the specimen, but this energy is much smaller than that required for the large domains of Figure 1d. As a consequence, the size of the domains in materials with cubic crystal structure is generally limited by the magnetoelastic energy (Kittel, 1949; Jakubovics, 1994; Mankos et al., 1996; Hubert and Schäfer, 1998).

Domains are a widespread phenomenon in ferromagnets. They occur when a state with domain structure has a lower total energy than a uniformly magnetized state. However, the particular reasons for their presence differ from case to case: domains may exist to reduce the magnetostatic (stray field) energy, to adapt to local anisotropies, or to adapt to the specimen shape, depending on the material constants, the size of the specimen, and external parameters (e.g., stress, magnetic field). For sufficiently small specimens (particles or grains and films), a uniformly magnetized state is usually energetically preferred; that is, they contain no domain structure, and such specimens are commonly said to be in a single-domain state. In low-anisotropy materials, there is
an intermediate range in which continuous micromagnetic vortex states, rather than classical domains, prevail. As a consequence, magnetic domains are not a universal feature of ferromagnetic materials (Hubert and Schäfer, 1998).

Magnetic domain structures can be imaged by a number of methods involving different physical mechanisms of magnetic contrast formation. The methods used include the conventional Bitter pattern technique (Carey and Isaac, 1966; Craik, 1974) and its various modifications (colloid-scanning electron microscopy [colloid-SEM] method [Goto and Sakurai, 1977; Šimšová et al., 1991], colloid-scanning tunneling microscopy [colloid-STM] technique [Rice and Moreland, 1991], interference contrast colloid method [Hartmann and Mende, 1985; S.Z. Hua et al., 1997], gas evaporation technique [Herring and Jakubovics, 1973; Sakurai and Shimada, 1992], sputtering method [Sakurai et al., 1994], solid oxygen particles decoration technique [Szewczyk et al., 1983], the method using stray field-induced birefringence of magnetic colloids [Jones and Puchalska, 1979], the technique using magnetotactic bacteria [Harasko et al., 1995]), (far-field) magneto-optic microscopy (magneto-optic Kerr effect [Kranz and Hubert, 1963; Rave et al., 1987], Faraday effect [Dillon, 1958; Chizhik et al., 1998], Voigt and gradient effects [Schäfer and Hubert, 1990; Thiaville et al., 1991; Schäfer, 1995a], magneto-optical indicator film [MOIF] technique [Nikitenko et al., 1998], magnetization-induced optical second-harmonic generation [MSHG] method [Kirilyuk et al., 1997]), transmission electron microscopy (TEM) (Fresnel and Foucault modes [Jakubovics, 1975; Chapman, 1984], [modified] differential phase contrast [(M)DPC] [Chapman et al., 1990; Tsuno, 1988], summed image differential phase contrast [SIDPC] [Daykin and Petford-Long, 1995], Lorentz phase microscopy [Volkov and Zhu, 2004], coherent Foucault imaging [Chapman et al., 1994], electron holography [Mankos et al., 1996; McCartney et
al., 2001]), electron mirror microscopy (EMM) (Spivak et al., 1955; Mayer, 1957), scanning electron microscopy (SEM) (type I magnetic contrast [Jones, 1978; Szmaja, 1999], type II magnetic contrast [Jones, 1978; Tsuno, 1988], scanning electron microscopy with polarization analysis [SEMPA] [Koike et al., 1987; Scheinfein et al., 1990]), scanning ion microscopy (SIM) (type I magnetic contrast [Lloyd et al., 1999], scanning ion microscopy with polarization analysis [SIMPA] [Zheng and Rau, 1993; Li and Rau, 2005]), scanning electron acoustic microscopy (SEAM) (Balk et al., 1984; Urchulutegui et al., 1991), spin-polarized low-energy electron microscopy (SPLEEM) (Bauer, 1994), scanning probe microscopy (SPM) (magnetic force microscopy [MFM] [Grütter et al., 1992; Hartmann, 1999], scanning near-field magneto-optic microscopy [SNMOM] [Betzig et al., 1992; Bertrand et al., 1998], spin-polarized scanning tunneling microscopy [SPSTM] [Bode et al., 1998; Wulfhekel et al., 2001], scanning Hall probe microscopy [SHPM] [Chang et al., 1992; Howells et al., 1999],
scanning superconducting quantum interference device [SQUID] microscopy [Hartmann, 2005], scanning microscopy using magneto-resistive sensor [Nicholson et al., 1996], scanning microscopy using inductive head [Prance et al., 1999], ballistic electron magnetic microscopy [BEMM] [Rippard and Buhrman, 1999], imaging by probing the magnetostrictive response using atomic force microscopy [AFM] [Wittborn et al., 2000]), photoelectron emission microscopy (PEEM) (X-ray magnetic circular dichroism [XMCD] [Schneider et al., 1997; Schönhense, 1999], X-ray magnetic linear dichroism [XMLD] [Scholl et al., 2005], magnetic contrast due to the specimen stray field [Mundschau et al., 1996], Kerr effect-like contrast [Schönhense, 1999]), X-ray microscopy (using XMCD [Kagoshima et al., 1996; Fischer et al., 1998]), lensless magnetic imaging by X-ray spectro-holography (Eisebitt et al., 2004), X-ray topography (Polcarová, 1969; Miltat, 1976), neutron topography (Schlenker and Baruchel, 1978). Each individual method of magnetic structure observation has its specific advantages and limitations, which in turn determine the applications to which it is well suited. The subjects of interest of this chapter are the SEM type I magnetic contrast, the conventional Bitter pattern technique, the colloid-SEM method, and MFM. Recent developments in the imaging of magnetic structures are demonstrated by referring to selected examples. The images presented are the original digital ones (i.e., obtained directly from the optical microscope, scanning electron microscope [SEM], magnetic force microscope [MFM], atomic force microscope [AFM], or transmission electron microscope [TEM]) after application of a simple digital procedure for contrast improvement only, except where otherwise stated. The procedure used is described in detail elsewhere (Szmaja, 1998). 
Nevertheless, it is to be noted that this procedure does not modify the original intensities of the image points, but only changes the way of displaying the image on the computer monitor.
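The contrast-improvement step mentioned above changes only the mapping from stored intensities to displayed grey levels. The specific procedure of Szmaja (1998) is not reproduced here; the following minimal sketch instead uses a generic min–max stretch of a display lookup table (LUT) to illustrate the stated point that the stored image data remain untouched.

```python
# Display-side contrast improvement: the stored image intensities are left
# untouched; only the lookup table (LUT) that maps stored values to monitor
# grey levels is stretched. This is a generic min-max stretch, used here as
# an illustration, not the specific procedure of Szmaja (1998).

def display_lut(image, out_min=0, out_max=255):
    """Return a lookup table mapping stored intensities to display levels."""
    lo, hi = min(image), max(image)
    if hi == lo:                      # flat image: map everything to out_min
        return {lo: out_min}
    scale = (out_max - out_min) / (hi - lo)
    return {v: round(out_min + (v - lo) * scale) for v in set(image)}

if __name__ == "__main__":
    # A low-contrast magnetic image occupying only stored levels 100..130.
    stored = [100, 110, 115, 120, 130, 110, 115]
    lut = display_lut(stored)
    displayed = [lut[v] for v in stored]
    print(displayed)   # spans the full 0..255 display range
    print(stored)      # stored data unchanged
```

Because the stretch lives in the LUT rather than in the image array, the quantitative intensity information needed for later analysis is preserved exactly.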
II. EXPERIMENTAL TECHNIQUES

The magnetic images in this chapter were taken of the magnetic structures of cobalt monocrystals, anisotropic sintered Nd–Fe–B-based permanent magnets of different chemical composition, nanocomposite Nd2Fe14B/Fe3B permanent magnets, anisotropic sintered SmCo5 permanent magnets, thin polycrystalline permalloy and cobalt films, and ferrimagnetic garnet specimens. The cobalt monocrystals had different thicknesses, down to 10 µm. The anisotropic sintered Nd–Fe–B-based and SmCo5 magnets were produced by powder metallurgy followed by sintering in an applied magnetic field (the specimens studied were in the shape of cuboids a few millimeters in size). They consisted of grains with an average size of about 10 µm and 20 µm, respectively,
DEVELOPMENTS IN THE IMAGING OF MAGNETIC DOMAINS
TABLE 1
MAGNETIC PROPERTIES OF THE STUDIED MAGNETS AT ROOM TEMPERATURE

Chemical composition                            Type of magnet                Ms (G)   MHc (kOe)   (BH)max (MGOe)   Ha (kOe)   K (erg/cm³)   Q
Nd14.5Fe79B6.5                                  Anisotropic sintered          1130     10.1        48               65.1       3.7 × 10⁷     4.6
(Nd0.85Dy0.15)14.5Fe79B6.5                      Anisotropic sintered          1003     >21.0       38               84.3       4.2 × 10⁷     6.7
(Nd0.7Dy0.3)14.5Fe79B6.5                        Anisotropic sintered          907      >30.0       32               103.0      4.7 × 10⁷     9.1
Nd13.7Dy0.2Fe79.7TM0.4B6 (TM: Al, Ga, Co, Cu)   Anisotropic sintered          1170     12.7        53               87.0       5.1 × 10⁷     5.9
Nd4Fe77.5B18.5                                  Nanocomposite Nd2Fe14B/Fe3B   1035     3.5         15
SmCo5                                           Anisotropic sintered          788      >20         25               260.0      1.02 × 10⁸    26.3

Ms, saturation magnetization; MHc, intrinsic coercivity; (BH)max, maximum energy density; Ha, anisotropy field; K, magnetocrystalline anisotropy constant; Q = K/(2πMs²), relative magnetic anisotropy.
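The tabulated quantities are linked by the standard Gaussian-unit relations Ha = 2K/Ms and Q = K/(2πMs²), which provide a quick internal consistency check on the table; the sketch below applies them to the first row:

```python
import math

def anisotropy_field(K, Ms):
    """Anisotropy field Ha = 2K/Ms (Gaussian units: K in erg/cm^3,
    Ms in G, Ha in Oe)."""
    return 2.0 * K / Ms

def relative_anisotropy(K, Ms):
    """Relative magnetic anisotropy Q = K / (2*pi*Ms^2)."""
    return K / (2.0 * math.pi * Ms**2)

# First row of Table 1: Nd14.5Fe79B6.5
Ms, K = 1130.0, 3.7e7
Ha = anisotropy_field(K, Ms) / 1000.0   # Oe -> kOe
Q = relative_anisotropy(K, Ms)
# Ha comes out near 65 kOe and Q near 4.6, matching the tabulated values.
```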
oriented with their c axes, being the easy axes of magnetization, along the direction of the applied field. The nanocomposite Nd2 Fe14 B/Fe3 B permanent magnets were produced from amorphous alloy prepared by a melt-spinning technique and a subsequent heat treatment procedure. They were in the form of ribbons about 40–50 µm in thickness and were composed of randomly oriented grains with an average size of about 30 nm. The magnetic properties of the investigated magnets at room temperature are given in Table 1. The thin permalloy (Ni81 Fe19 ) films with 40 nm thickness were evaporated at an incidence angle of 0 degrees (with respect to the surface normal) in a vacuum of about 10−5 mbar. The thin cobalt films with 40 nm and 100 nm thicknesses were evaporated at an incidence angle of 45 degrees in a vacuum of about 10−5 mbar. The ferrimagnetic garnet specimens were about 10 µm in thickness and had optically smooth surfaces. All of the specimens were investigated in the demagnetized state, except where otherwise stated. The magnetic structures were imaged by different techniques. The methods used include mainly the type I magnetic contrast of SEM, the conventional Bitter pattern technique, the colloid-SEM method, and MFM. Thin permalloy and cobalt films were also observed with the Fresnel (or defocus) mode of TEM. Experimental conditions, concerning both the specimens studied and the techniques used, were optimized to improve the magnetic contrast in the images. In the case of cobalt monocrystals, anisotropic sintered Nd–Fe–B-
based and SmCo5 permanent magnets, and nanocomposite Nd2 Fe14 B/Fe3 B permanent magnets, the observed surfaces were carefully polished to reduce the topographic contrast (and thus effectively enhance the magnetic domain contrast) using successively finer SiC abrasive papers (down to 1200 grade) and diamond powders (usually 3, 1, and 0.25 µm in average diameter) with a water-free lubricant. Investigations by the type I magnetic contrast method were made using a Tesla BS 340 SEM with a conventional tungsten filament and an Everhart– Thornley electron detector. To obtain a better signal-to-noise ratio, a maximum beam current of about 10 nA and the relatively long time of image recording (20 s) were used. The specimen tilt by 25 degrees away from the detector was used to enhance the type I magnetic contrast by filtering out the backscattered electrons and the higher-energy secondary electrons. This was especially beneficial for thicker crystals under study. In the case of relatively thin crystals (thinner than ∼100 µm), a circular aperture 6 mm in diameter was placed in front of the scintillator Faraday cage of the detector for further enhancement of the magnetic contrast. The optimum size of the aperture was determined experimentally. The specimens were rotated around the surface normal to find the position corresponding to maximum type I magnetic contrast. Images were recorded at 20 keV beam energy and 15 mm working distance. To reduce the degradation of magnetic contrast due to the contamination produced by applying the comparatively large beam current, the pump was “trapped.” The vacuum in the column of the scanning microscope was better than 5 × 10−6 mbar. To study the effect of temperature on the domain structure, a bifilarly coiled heater with 30 W power (produced by VARIAN Inc., Scientific Instruments) was used. The specimen temperature was measured by an iron-constantan (ANSI symbol J) thermocouple. 
The accuracy of the temperature measurement was found to be 2 K under the experimental conditions. A digital image processing (DIP) system was applied to the original SEM images for their restoration, enhancement, and analysis. The system consists of a 12-bit analog-to-digital converter (ADC), a computer, and software. The ADC is used for the acquisition of analog images from the SEM. The DIP system used and its capabilities are described in detail elsewhere (Szmaja, 1998). To study the domain structure with the Bitter pattern technique, a drop of a water-based colloidal suspension of magnetite (Fe3O4) particles was applied to the specimen surface and covered with a thin microscope cover glass to spread the colloid uniformly over the surface. The optimum concentration of the colloid was determined experimentally. The domain patterns were observed by the conventional Bitter pattern method under a conventional optical microscope (PZO MET-3) using reflected unpolarized light or under a conventional optical microscope (PZO BIOLAR) using transmitted unpolarized light (in the case of thin permalloy and cobalt films and ferrimagnetic garnet specimens). Using a coil surrounding the specimen, a magnetic field up to a maximum value of 350 Oe could be applied perpendicular to the specimen surface. To study the domain behavior under the influence of an external magnetic field parallel to the specimen surface, an electromagnet capable of producing a maximum magnetic field of about 5 kOe was used. The optical microscopes were equipped with a DIP system for image restoration, enhancement, and analysis. The system is composed of a charge-coupled device (CCD) camera of high sensitivity, a computer, and software. A detailed description of the DIP system and a demonstration of its capabilities are presented elsewhere (Szmaja, 2002a). Investigations of the domain patterns with the colloid-SEM method were carried out using a Tesla BS 340 SEM after prior drying of the colloid placed on the specimen surface. To prevent charging of the specimen surface in the SEM, the surface was coated by sputtering a thin metallic layer (15 nm in thickness) of gold–palladium or copper. As previously mentioned, the SEM was equipped with the DIP system. Investigations with the MFM technique were made with an NT-MDT instrument operated in the ac (also called dynamic) mode using MikroMasch silicon cantilevers with tips magnetized along the tip axis, which was perpendicular to the specimen surface. In this case, MFM senses the vertical component of the derivative of the force between the specimen and the tip (Grütter et al., 1992). The measurements were conducted using the two-pass method, described further in the text (see Section V and Figure 27). The image signal was detected as the phase or amplitude shift of an oscillating cantilever. The tips used were coated with a cobalt film ∼60 nm in thickness, onto which a chromium protective film ∼20 nm in thickness was deposited. The coercivity of the tips was approximately 400 Oe.
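For a cantilever driven at its resonance frequency, a small force gradient ∂F/∂z produces, to first order, a phase shift Δφ ≈ −Q(∂F/∂z)/k, where Q here is the cantilever quality factor (not the magnetic anisotropy ratio used later in the chapter). A rough numerical sketch with assumed, typical ac-mode parameters, not the actual probe data of these experiments:

```python
import math

def mfm_phase_shift(force_gradient, k, Q_factor):
    """First-order phase shift (radians) of a cantilever driven at
    resonance: delta_phi ~ -Q_factor * (dF/dz) / k, valid while
    Q_factor * |dF/dz| / k << 1."""
    return -Q_factor * force_gradient / k

# Assumed, typical ac-mode values (not from the chapter):
k = 3.0          # cantilever spring constant, N/m
Q_factor = 250.0 # cantilever quality factor
dF_dz = 1e-4     # vertical derivative of the tip-specimen force, N/m

phi = mfm_phase_shift(dF_dz, k, Q_factor)
phi_deg = math.degrees(phi)   # roughly -0.5 degree, readily detectable
```

Phase shifts of this order (tenths of a degree) are well within the sensitivity of standard ac-mode detection electronics, which is why the phase signal is a convenient measure of the force derivative.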
III. SEM TYPE I MAGNETIC CONTRAST

A simple illustration of the mechanism of type I magnetic contrast formation in the SEM, together with the coordinate system used, is shown in Figure 2 (Szmaja, 1999). The specimen is assumed to consist of plate domains with magnetization along the +z and −z directions, perpendicular to the specimen surface. The primary electron beam is incident in the −z direction and releases secondary electrons from the specimen surface. These are collected by the detector, usually of the Everhart–Thornley type, with a positive potential (typically 100–200 V) on a Faraday cage. The secondary electrons passing through the regions with a component of the stray magnetic field along −x are deflected by the Lorentz force in the +y direction. On the other hand, the
FIGURE 2. The principle of type I magnetic contrast formation in the SEM. Reprinted from Szmaja (1999). © 1999 with permission from Elsevier.
stray field with a component along +x deflects the secondary electrons in the −y direction. As a consequence, the angular secondary electron distributions are tilted toward and away from the detector, respectively, as indicated in the figure (solid circles). Thus a difference in the number of collected secondary electrons is observed between adjacent regions of the opposite stray field, giving rise to type I magnetic contrast. As with some other methods of domain observation, the type I magnetic contrast relies on the stray fields above the specimen surface, from which the nature of the domain structure can be derived. Type I magnetic contrast is essentially a form of trajectory contrast, in which the number of emitted electrons is not influenced by the magnetic fields, as opposed to number contrast, in which different numbers of electrons leave the specimen at different places. For this reason, type I magnetic contrast cannot be obtained when the specimen current signal is used. In addition, no contrast is observed by the backscattered electrons, which have energies close to that of the incident electrons, because they are not sufficiently deflected when they pass through the specimen stray fields to affect collection with the detector. Thus type I magnetic contrast can be obtained only in the secondary electron mode, which is the most commonly used mode in the SEM. Type I magnetic contrast was recorded for the first time by Dorsey (1966). Nevertheless, the principal understanding of the physics of the contrast
mechanism and the associated signal detection must be attributed to Joy and Jakubovics (1968, 1969). They elaborated a simple analytical theory of type I magnetic contrast. The theory of the considered technique of domain observation was also the subject of interest of other authors (Wardly, 1971; Yamamoto and Tsuno, 1975; Dunk et al., 1975; Jones, 1976; Wells, 1985; Chim et al., 1993; Chim, 1994, 1995, 1997; Kotera et al., 1995). At present, it appears to be generally accepted that the decisive factor determining the maximum magnitude of type I magnetic contrast is the so-called field integral ξ = ∫₀∞ Hx(z) dz, which gives a measure of the flux per unit length of the microfield. A close examination of Figure 2 shows that extremum values of the magnetic signal should be observed at the domain walls. Indeed, this is the case in most experiments, although sometimes the effective symmetry axis for secondary electron collection does not coincide with the physical symmetry axis of the detector (Dunk et al., 1975). Of course, the sign of the extremum (maximum or minimum) is determined by the direction of the Lorentz force. A maximum magnetic signal occurs where the Lorentz force is directed toward the detector, and a minimum magnetic signal where it is directed away from the detector. As a general rule, the maximum type I magnetic contrast, observed between adjacent domain walls, is proportional to the product of the saturation magnetization Ms and the domain size D (Jakubovics, 1975; Jones, 1978; Szmaja, 1999). The mechanism of type I magnetic contrast depends on some of the secondary electrons escaping collection. According to the monoenergetic model of Joy and Jakubovics (1969), which neglects the spread of secondary electron velocities, the magnetic signal is inversely proportional to the secondary electron velocity.
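The role of the field integral can be made concrete with a simple estimate: a secondary electron of velocity v crossing a transverse stray field of integral ξ is deflected through an angle θ ≈ eξ/(mv), inversely proportional to v. The illustrative values below (ξ = 1 G·cm, E = 2 eV) are assumptions chosen for the estimate, not measurements from the chapter:

```python
import math

E_CHARGE = 1.602e-19   # elementary charge, C
E_MASS = 9.109e-31     # electron mass, kg
EV = 1.602e-19         # J per eV

def deflection_angle(xi_gauss_cm, energy_ev):
    """Deflection angle theta ~ e*xi/(m*v) of an electron of the given
    kinetic energy crossing a transverse stray field with field
    integral xi (given in G*cm, converted to T*m)."""
    xi_si = xi_gauss_cm * 1e-6   # 1 G*cm = 1e-4 T * 1e-2 m
    v = math.sqrt(2.0 * energy_ev * EV / E_MASS)
    return E_CHARGE * xi_si / (E_MASS * v)

# A 2 eV secondary electron crossing xi = 1 G*cm:
theta = deflection_angle(1.0, 2.0)   # ~0.21 rad, about 12 degrees
# Since theta scales as 1/sqrt(E), quadrupling the energy halves
# the deflection, and xi = 0.1 G*cm gives a tenfold smaller angle.
```

Deflections of ∼10 degrees strongly modulate collection by a directional detector, consistent with the observation below that contrast is easy to obtain for ξ > 1 G·cm and very weak for ξ < 0.1 G·cm.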
This means that the lowest-energy secondary electrons, deflected by the largest amounts by the stray field of the specimen, contribute most to the observed type I magnetic contrast. However, the recent energy-dependent model (Chim et al., 1993; Chim, 1994), using an approximate form of the Chung and Everhart secondary electron energy distribution (Chung and Everhart, 1974), shows that the magnetic contrast is produced most strongly by the secondary electrons with energies in the range of 1–3 eV. This occurs because the contrast depends not only on the solid angle of collection (which is inversely proportional to the secondary electron velocity), but is also weighted by the secondary electron yield, which is an increasing function of the secondary electron energy in the range of 0–4 eV for most materials. In practice, the type I magnetic contrast can be enhanced by the use of energy-selecting devices that filter out electrons carrying little information about the stray fields of the specimen. In this context, the simplest but very effective methods for improving the directionality of secondary electron collection and thereby the magnetic contrast are placing an aperture over the collector
FIGURE 3. Type I magnetic contrast image of the domain structure at the basal surface of a bulk cobalt monocrystal (a) and the image signal profile across the domain structure along the selected line (b).
(Banbury and Nixon, 1967; Joy and Jakubovics, 1968; Cort and Steeds, 1972; Dunk et al., 1975; Fidler et al., 1977; Szmaja, 1999) and tilting the specimen (Kammlott, 1971; Fidler et al., 1977; Szmaja, 1999). It has also been shown theoretically that these two methods can lead to improvements in the type I magnetic contrast (Wardly, 1971; Dunk et al., 1975; Jones, 1976; Chim et al., 1993; Chim, 1994, 1997). Type I magnetic contrast is easily observable from domain structures with ξ > 1 G · cm. However, in the cases of ξ < 0.1 G · cm, poor-quality images with very weak or no type I magnetic contrast are obtained, even when standard techniques of analog signal processing belonging to SEMs and different instrumental modifications for improving magnetic contrast are used (Jakubovics, 1975; Jones, 1978; Szmaja, 1999). Type I magnetic contrast has been used to image domains of materials with uniaxial or orthorhombic magnetic anisotropy, and in early studies, it was also obtained from recording heads and tapes and from currents passing through parts of etched circuits. To date, it has been impossible to observe type I magnetic contrast from materials with in-plane magnetization, where stray fields are present only at domain walls. The reason for this is the very low level of contrast, estimated to be less than 0.1% (Fathers et al., 1973a; Jakubovics, 1975). As a consequence, materials with in-plane magnetization cannot be investigated by the type I magnetic contrast technique. Figure 3a shows an image of the magnetic domain structure at the basal surface of a bulk cobalt monocrystal. Alternate black and white bands are due to the main domains extending through the entire thickness of the crystal along the c axis, which is the magnetic easy axis. This finding has been proved
by observation that the domain patterns on the two opposite basal surfaces of the crystal were identical, except for the surface domains (Joy and Jakubovics, 1969). The main domains are undulated only near the surface. In the interior of the specimen, the structure is simpler and consists of a series of (nearly) regular domains magnetized in opposite directions along the magnetic easy axis. This has been shown convincingly by domain observations on the axial (or prism) planes of crystals not only of cobalt, but also other uniaxial materials with sufficiently high magnetic anisotropy (e.g., Kaczér and Gemperle [1960], Craik and Tebble [1961], Wysłocki et al. [1996], Lee et al. [1999]). Small circular regions, visible within the main domains in the image of Figure 3a, are due to the surface domains of reverse magnetization, which are most frequently termed reverse spikes or reverse domains. Both the main domains and reverse spikes contribute to the stray fields of the specimen. However, the type I magnetic contrast from the surface domains is weaker because they are smaller in size than the main domains. This is clearly visible when observing the image contrast in a line scan across the domain structure, as shown in Figure 3b, where the reverse spikes are imaged as perturbations in the saw tooth-like intensity plot due to the main domains. In this way, internal domains can be distinguished from surface domains (Szmaja, 1999). The presence of the main domains in type I magnetic contrast images clearly shows that the considered method has a large probing distance of information above the specimen surface. The large probing distance, on the other hand, is connected to a low surface sensitivity. 
These two features of type I magnetic contrast can easily be understood on the basis of Figure 4, which presents in a simplified manner the configuration of the stray field above the surface perpendicular to the easy axis of magnetization for a thick specimen of uniaxial ferromagnet with sufficiently high anisotropy (Szmaja, 2000). Although the character of the stray field is highly complex, there is a general property that the distance over which the stray field from a magnetic domain extends above the specimen surface is, to a good approximation, equal to the domain size. Consequently, the stray field produced by surface domains extends over a much smaller distance than that resulting from much larger main domains (i.e., the character of the stray field becomes simpler with increasing distance from the specimen surface). That the stray field from main domains is not completely “masked” by that from a complex system of numerous surface domains has been proved directly by using the secondary electron mode of SEM (Fathers and Jakubovics, 1977; Lewis et al., 1998). In the type I magnetic contrast method of SEM, the stray field is probed by the secondary electrons through its entire range on their way from the specimen surface to the detector, and information on the surface domains and domain walls “carried” by the secondary electrons near the specimen surface is partially lost as they travel away from the surface. As a result,
FIGURE 4. Schematic representation of the configuration of the stray field above the surface perpendicular to the magnetic easy axis for a thick specimen of uniaxial material with sufficiently high magnetic anisotropy. Reprinted from Szmaja (2000). © 2000 with permission from Elsevier.
the method is related to a large probing distance of magnetic information (of some tens of micrometers or more in the case of thick specimens) and weak surface sensitivity, complementary to Bitter colloid technique and MFM. This becomes apparent when Figure 3a is compared with Figures 14c, 16a, 23, and 35c. In the type I magnetic contrast image of Figure 3a, the main domains are clearly seen, and the surface domains are displayed approximately as circles, whereas the Bitter patterns of Figures 14c, 16a, 23, and the MFM image of Figure 35c (in the absence of an external magnetic field) show only the surface domain structure but in much more detail, with the larger-scale surface domains visible in the form of stars and flowers (Szmaja, 2000). Type I magnetic contrast is easily observed at room temperature from thick crystals with high uniaxial magnetic anisotropy, such as cobalt (Joy and Jakubovics, 1969; Szmaja, 1996), magnetoplumbite (PbFe12 O19 ) (Fathers et al., 1973b; Jones, 1977), or Nd2 Fe14 B (Lewis et al., 1998; Zhu and McCartney, 1998), because the level of contrast is fairly high. For example, in the case of bulk cobalt crystals, the magnetic contrast can be as high as 30%. Nevertheless, for relatively thin cobalt crystals (thinner than ∼50 µm) or at sufficiently high temperatures (higher than ∼450 ◦ K), the type I magnetic contrast becomes very low and thus difficult to observe. Poor-quality SEM images with little or no domain contrast are then obtained, even when optimizing the most important instrumental factors and using standard techniques of analog signal processing provided with SEMs. An example is presented in Figure 5a, which shows an original image of the domain structure of a 20 µm thick cobalt monocrystal. However, the mentioned problem can be overcome
FIGURE 5. (a) Very-low-quality original type I magnetic contrast image of the domain structure at the basal surface of a 20 µm thick cobalt monocrystal. Image (b) was obtained by applying a simple digital procedure for contrast improvement to the original image. Averaging 10 individual images, one of which is shown in (b), resulted in image (c). Image (d) was obtained after application to image (b) of a circularly symmetric low-pass filter designed using the Kaiser window. Reprinted from Szmaja (1999). © 1999 with permission from Elsevier.
by the application of DIP. Applying a simple digital procedure to the original image resulted in the image with significantly enhanced contrast (Figure 5b). It is now clearly seen that the magnetic signal of interest is superimposed on a large amount of background noise, and for this reason further image processing is desirable. The image with reduced noise (Figure 5c) was obtained by averaging 10 individual images similar to that in Figure 5b. A still better quality image is presented in Figure 5d, which was obtained by the application to the image from Figure 5b of a circularly symmetric low-pass filter designed using the Kaiser window. The effects of contrast improvement and noise reduction are striking when Figure 5a is compared with Figure 5d (Szmaja, 1998, 1999). Figure 6 shows a series of images of the domain structure at the (0001) surface of cobalt monocrystals of different thicknesses. The domain pattern changes with the specimen thickness, as expected. For thicker crystals (Figure 6a–c), the magnetic domain structure consists of the main domains
FIGURE 6. Type I magnetic contrast images of the domain structure at the basal surface of 1.91 mm (a), 1.19 mm (b), 400 µm (c), 280 µm (d), 110 µm (e), 50 µm (f), 35 µm (g), and 10 µm (h) thick cobalt monocrystals. Parts (a), (c–f), and (h) are reprinted from Szmaja (1996). © 1996 with permission from Elsevier.
undulated near the surface and surface domains. Rotation of the specimen by 180 degrees about the surface normal yielded contrast reversal from both the main and surface domains. This in turn means that the surface domains are not magnetized parallel to the surface, that is, they are not closure domains (Szmaja, 1996, 2000). The surface domains are likely to be reverse spike domains, and still more likely, their magnetizations are tilted away from the surface normal, as suggested in the past by Hubert (1967a). In the case of cobalt, the energy of the crystal cannot be reduced by the formation of closure domains near the surface because of the sufficiently high magnetocrystalline anisotropy. The waviness amplitude of the main domains near the surface decreases slightly with decreasing specimen thickness. The waviness amplitude is also expected to decrease gradually with depth of
FIGURE 6. (Continued)
penetration into the interior of the specimen; this reduces the total wall area at the cost of the appearance of magnetic poles associated with the domain walls (Goodenough, 1956). The cause for the undulation of the main domains (the wavy or maze structure) and the occurrence of reverse spike domains (i.e., branching of the main domains) near the surface is the reduction in the magnetostatic energy at the cost of a larger total Bloch wall area. For the specimen 280 µm thick, practically no reverse spike domains can be seen in the image (Figure 6d). Meanwhile, the literature shows that the disappearance of reverse spikes has been observed in cobalt crystals approximately 50–100 µm thick by the conventional Bitter pattern method (Takata, 1963) and magneto-optic Kerr microscopy (Hubert, 1967a). The exact reason for this discrepancy is not clear. However, the most likely explanations seem to be that the type I magnetic contrast produced by the reverse spikes present within the main domains
became too weak and consequently they are not seen in the image, or that the reverse spikes could not be observed because of insufficient spatial resolution of the type I magnetic contrast technique. As the specimen thickness was further reduced, the waviness amplitude of the domains became smaller and smaller (Figures 6e–g), and Kittel's parallel-plate domain structure was finally observed for a crystal 10 µm thick (Figure 6h), in agreement with the results reported by Gemperle and Gemperle (1968). Of particular interest is the dependence of the width of the main domains on the crystal thickness. The domain width was determined by digital means. This is of significance for the complex domain patterns observed for thicker cobalt monocrystals (Figures 6a–c). In such cases, the application of DIP techniques allows a more precise and objective determination of the domain width than visual measurements. A simple procedure of contrast improvement, a median filter (with a median window sufficiently wide to eliminate surface domains from the image), and a segmentation technique (thresholding at the level of the average image intensity) were applied in sequence to each original image, resulting in the corresponding black-and-white images, from which the statistical distributions of the widths of the main domains and their various parameters were calculated, taking into account all horizontal image lines (Szmaja, 1996, 1998). An example is shown in Figure 7. The procedure applied for determining the domain width exploits the fact that the main domains run mostly in a vertical direction (this can always be obtained easily by the electronic rotation provided by the SEM) and is illustrated in Figure 7a for the selected horizontal line indicated by the arrow. For each horizontal image line, we move from left to right and find the points that form the left boundaries of the successive domains.
For each such point, the corresponding point on the right domain boundary is found in such a way that the distance between these two points is the smallest (in Figure 7a the resulting segments are marked in black for "white" domains, and conversely); this distance is taken as the width of the domain considered. In addition, the domain widths were determined using Fourier analysis. For each image line, the fast Fourier transform (FFT) was used to detect the main spatial frequency and the corresponding domain width. The value obtained by averaging over all image lines was taken as the domain width of the image. Within experimental error, the domain widths obtained on the basis of the Fourier analysis and on the basis of the statistical distribution of the domain widths were the same for a given image (Szmaja, 1996, 1998). For each specimen of a fixed thickness, the domain width was determined as an average from at least five images recorded at various places on the specimen surface. The relation between the width of the main domains D and the crystal thickness L was analyzed and established in the form of a power dependence D(L) = a·L^b. Using the method of least squares, the
FIGURE 7. (a) Image obtained by digital processing of the image from Figure 6b and the corresponding statistical distribution of the widths of the main domains (b). The method used to determine the domain widths is illustrated in (a) for the selected horizontal line indicated by the arrow.
dependence was found to be given by D(L) = 0.79·L^0.49 for the thickness range from approximately 10 µm to Lcr and D(L) = 0.30·L^0.66 for the thickness range from approximately Lcr to 1.9 mm, where Lcr ≈ 50–80 µm is the critical thickness above which reverse spike domains are expected to appear. The character of the domain structure observed (Figure 6f–h) and the results obtained suggest that below Lcr the thickness dependence of the domain width of cobalt monocrystals can be described by Kittel's one-half power law (Kittel, 1946). For larger crystal thicknesses, the dependence is a power law with an exponent close to 2/3. The latter dependence is found to be due to the surface decoration with reverse spikes, which becomes more complicated with increasing crystal thickness (Szmaja, 1996). Such a dependence was derived theoretically for the first time by Kaczér (1964). The type I magnetic contrast method of SEM allows a quantitative interpretation of the image contrast in terms of the domain structure of the specimen (Fathers et al., 1973a; Jones, 1977; Szmaja, 1999). The procedure used is to assume some model for the domain structure and then to calculate what the images would show. This is done for different models until the theoretical results are in close agreement with the features observed in the images. When such a procedure is applied, a reliable quantitative analysis of a complicated domain structure (and the structure present at the (0001) surface of thick cobalt monocrystals clearly belongs to this category) proves to be substantially impossible. This is because of problems arising from the need to ascribe a functional form to the magnetization variation with a modest number of variable parameters and from the number of other specimen and instrumental parameters that affect the image intensity distribution (Chapman, 1984). In
the context of quantitative determination of the domain structure, it is also noteworthy that the techniques of domain observation that use the specimen stray field do not allow the derivation of a unique specimen magnetization distribution from the measured quantities because the same stray field can be produced by a number of magnetization configurations (Mallinson, 1981; Vellekoop et al., 1998). The domain structure at the basal surface of thick cobalt monocrystals was investigated quantitatively for the first time by Unguris et al. (1989) using SEMPA and then by Hubert et al. (1990) using magneto-optic Kerr microscopy. This was possible because these techniques measure quantities directly related to the magnetization of the specimen and not to the stray fields. The perpendicular magnetization patterns obtained by SEMPA resembled those observed previously (in a qualitative way) by the conventional Bitter pattern technique and the polar magneto-optic Kerr effect. In addition, an in-plane magnetic structure was detected for the first time. The in-plane magnetization was found to be fractured into well-defined submicron domains that appear to reflect the sixfold symmetry of the cobalt (0001) surface. The SEMPA measurements showed that the perpendicular magnetization component varies continuously and is in general smaller than the in-plane magnetization component (Unguris et al., 1989). The presence of the in-plane magnetization component was subsequently confirmed by quantitative Kerr microscopy, with necessarily reduced spatial resolution because this technique is an optical one (Hubert et al., 1990). However, in contrast to the SEMPA measurements, the study by the magneto-optic Kerr effect showed that the perpendicular magnetization component is generally dominating. 
The reason for the difference in quantitative information on the magnetization obtained by SEMPA and the magneto-optic Kerr effect is their different probing depths: ∼1 nm (Unguris et al., 1989; Celotta et al., 2000) and ∼20 nm (Traeger et al., 1992; Schäfer, 1995b) in metals, respectively. The character of the domain structure of the specimen depends on the relative magnetic anisotropy Q, defined as Q = K/Kd, where K is the (effective) anisotropy constant and Kd = 2πMs^2 is the stray field energy constant (Ms is the saturation magnetization). For cobalt, Q ≈ 0.4 (at room temperature); that is, cobalt is a material with a strong uniaxial magnetocrystalline anisotropy (K ≈ 10^6–10^7 erg/cm^3), but one that is not very high in comparison with the stray field energy constant Kd. In particular, Q < 1 means that the anisotropy energy density K for in-plane magnetization is smaller than the stray field energy density Kd for uniform perpendicular magnetization. As a result, sufficiently thin cobalt films, thinner than ∼26–28 nm (Grundy and Johnson, 1969), are magnetized parallel to the surface. (Other types of domain structure occur in ultrathin films.) For the thickness range from 26–28 nm (Grundy and Johnson, 1969) up to ∼10 µm (Hubert, 1967a; Szmaja, 1996),
DEVELOPMENTS IN THE IMAGING OF MAGNETIC DOMAINS
FIGURE 8. Models of the domain structure of uniaxial ferromagnets: the Kittel model (a), the Landau–Lifshitz model (b), and the model with partial opening of the "tilted closure" surface domains (c). Reprinted from Szmaja (2000). © 2000 with permission from Elsevier.
the domain structure of cobalt can reasonably be represented by Kittel's parallel-plate domain configuration (Figure 8a), and it becomes increasingly complicated with increasing crystal thickness, as described previously. For materials with high and low uniaxial magnetic anisotropy, the energetically advantageous configurations are the Kittel model of open domain structure (Figure 8a) and the Landau–Lifshitz model of closed domain structure (Figure 8b), respectively. In the case of cobalt, however, the energies of the two mentioned domain configurations are nearly the same (Kittel, 1946). This means that, in fact, neither the Kittel nor the Landau–Lifshitz model is especially adequate. In this context, it is interesting to determine which conceivable domain model yields the lowest energy and thus comes closest to reality in an intermediate range of Q-values. Although the Kittel model is primarily applicable to materials with large Q-values, its applicability can be extended toward lower Q by the μ*-correction (the μ*-effect is related to deviations of the magnetization from the magnetic easy axis or axes due to the stray field). When some modifications of the Landau–Lifshitz structure are compared with the μ*-corrected Kittel configuration, the results obtained can be summarized as follows (Hubert and Schäfer, 1998). For Q > 0.8, the μ*-corrected Kittel model is the best choice for uniaxial specimens. For smaller relative magnetic anisotropies (including cobalt with Q ≈ 0.4), the combined model shown in Figure 8c, which allows for a partial opening of the "tilted closure" surface domains, is energetically more favorable. On the one hand, the model of Figure 8c is certainly not a very good representation of the domain structure of bulk monocrystalline cobalt, because it is rather simple and two-dimensional, whereas the real structure near the (0001) surface is much more complex and three-dimensional.
On the other hand, this simple model can be recognized as a reasonable approximation of the real, complicated domain structure because it essentially predicts and allows us to understand the most important results
FIGURE 9. Temperature changes in the type I magnetic contrast from the basal surface of a cobalt monocrystal. Images of the domain structure (a–c, e, f) and surface topography (d) taken at 295 K (a), 480 K (b), 505 K (c), and 507 K (d) during the heating process, and at 504 K (e) and 295 K (f) during the cooling process. Specimen thickness 1.87 mm. Reprinted from Szmaja (1999). © 1999 with permission from Elsevier.
of domain studies, in particular the partial flux opening effect, in fact observed quantitatively by SEMPA and magneto-optic Kerr microscopy (Szmaja, 2000). The type I magnetic contrast technique of SEM is well suited to studies of changes in the domain structure as a function of temperature, being applicable over a wide temperature range. An example is presented in Figure 9, which shows the temperature dependence of the domain structure at the (0001) surface of a thick cobalt monocrystal (Szmaja, 1999). In the present case, the specimen was heated from room temperature (Figure 9a) to a maximum temperature of 671 K and then cooled down to room temperature (Figure 9f). As the temperature was increased, the type I magnetic contrast was observed only up to 505 K (Figure 9c). Above this temperature a closed domain structure (i.e., without stray fields above the specimen surface) was formed, since no type I contrast was observed (Figure 9d). On returning the specimen to room temperature, an open domain configuration (i.e., with stray fields above the specimen surface) reappeared at 504 K (Figure 9e), where type I magnetic contrast was again visible. In the case presented, the changes in
FIGURE 10. Original images of the domain structure at the basal surface of a cobalt monocrystal at 480 K (a) and 505 K (b) with very weak or no type I magnetic contrast. These images should be compared with the images with considerably enhanced contrast, shown in Figures 9b and 9c, respectively. (b) reprinted from Szmaja (1998). © 1998 with permission from Elsevier.
the domain structure during the heating cycle were reversible. By observing how the type I magnetic contrast disappeared (reappeared) during heating (cooling) of the specimen, the temperature of the magnetic phase transition between an open domain structure and a closed one could be determined (Szmaja et al., 1995a; Szmaja, 1999). It must be noted, however, that a detailed investigation of the temperature changes of the domain structure in the vicinity of the magnetic phase transition would not be possible without using the DIP system, because of the very low level of type I magnetic contrast. (This statement appears to be additionally supported by the results reported by Cort and Steeds [1972], who observed the disappearance of type I magnetic contrast already at about 470 K.) An example is presented in Figure 10, which shows the original images of the domain structure of a cobalt monocrystal at 480 K (Figure 10a) and 505 K (Figure 10b), where only very weak or no type I magnetic contrast is seen. These images should be compared with the images with considerably enhanced contrast, shown in Figure 9b and Figure 9c, respectively. Moreover, with DIP we could obtain the statistical distributions of the widths of the main domains for different temperatures (Szmaja et al., 1994, 1995a, 1995b, 1997), applying the method described previously (see Figure 7). In the case of such complex domain patterns as those present at the basal surface of thick cobalt monocrystals, it is practically impossible to do this in a precise and objective way by visual measurements. The domain structure on the basal plane of cobalt monocrystals during a heating cycle was studied for specimens of different thicknesses, essentially for the thickness range approximately 0.7–1.9 mm (Szmaja et al., 1995a, 1995b, 1997). The specimens were heated up to a maximum temperature,
which was in the range of approximately 590–750 K, and then cooled down to room temperature. The changes in the domain structure that occur with increasing temperature are due to the strong temperature dependence of the magnetocrystalline anisotropy energy of the HCP phase of cobalt. It is known that the first anisotropy coefficient K1 becomes equal to zero at 518 K and then decreases to a negative value less than −2K2 (K2 is the second anisotropy coefficient) above 613 K (Barnier et al., 1961; Träuble et al., 1965). This results in the rotation of the easy direction of magnetization from the hexagonal axis to a direction in the basal plane. As a consequence, the observed rearrangement of the domain structure takes place. The magnetic phase transition between an open-flux domain structure and a closed one was found to be a reversible process during the heating cycle. The temperature of the transition was thickness dependent and decreased from about 530 K to 500 K as the crystal thickness increased from approximately 0.7 mm to 1.9 mm. As mentioned previously, for cobalt at room temperature the energies of the Kittel open domain structure (Figure 8a) and the Landau–Lifshitz closed domain configuration (Figure 8b) are nearly the same (Kittel, 1946). The magnetic phase transition is therefore expected to occur at temperatures lower than that at which the process of magnetization rotation from the hexagonal axis to the basal plane is completed. This prediction is confirmed by the experimental results. In particular, for crystals thicker than ∼1.4 mm, the magnetic phase transition took place at a temperature lower than 518 K, at which the mentioned rotation of the magnetic easy direction only begins. The changes in the domain structure during the heating cycle were reversible under the condition that the specimen was not carried through the structural transition between the HCP and the FCC phases of cobalt at ∼690 K (Szmaja et al., 1995a, 1995b, 1997).
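The easy-axis rotation described above follows from minimizing the uniaxial anisotropy energy E(θ) = K1·sin²θ + K2·sin⁴θ: the easy direction stays along the c-axis while K1 > 0, tilts onto a cone with sin²θ = −K1/(2K2) for −2K2 < K1 < 0, and lies in the basal plane once K1 ≤ −2K2. The sketch below uses illustrative K1 and K2 values (not the measured temperature dependence from Barnier et al. or Träuble et al.):

```python
import math

def easy_axis_angle(K1, K2):
    """Angle (degrees) between the easy magnetization direction and the
    hexagonal c-axis, from minimizing E = K1*sin^2 + K2*sin^4 (K2 > 0)."""
    if K1 >= 0:
        return 0.0                  # easy axis along the c-axis
    if K1 <= -2 * K2:
        return 90.0                 # easy direction in the basal plane
    # Canted easy cone: dE/dtheta = 0 gives sin^2(theta) = -K1 / (2*K2).
    return math.degrees(math.asin(math.sqrt(-K1 / (2 * K2))))

K2 = 1.5e6  # erg/cm^3, an order-of-magnitude stand-in for cobalt
# Decreasing K1 mimics heating through 518 K (K1 = 0) and past 613 K (K1 < -2*K2):
for K1 in (4.5e6, 0.0, -1.5e6, -4.0e6):
    print(K1, easy_axis_angle(K1, K2))
```

This reproduces the sequence in the text: the rotation begins where K1 passes through zero and is complete once K1 has fallen below −2K2.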
As the specimen was heated above the temperature of the structural transition, the changes in the domain structure were irreversible. The transition produces intrinsic lattice imperfections (i.e., dislocations and point defects). The dislocations, due to their strong magnetostrictive interactions with the domain walls, act as effective obstacles to domain wall motion (Klugmann et al., 1994). This results in irreversible changes in the domain structure during the heating cycle. All techniques of magnetic domain observation suffer from the principal problem that, in addition to the magnetic information, undesirable nonmagnetic features (particularly topographic, but they can also be compositional or crystallographic in origin) also contribute to the image contrast. This also concerns techniques such as SEMPA (Koike et al., 1987; Scheinfein et al., 1990), SPLEEM (Bauer, 1994), and those using X-ray magnetodichroic effects (Schneider et al., 1997; Schönhense, 1999); although these usually record the difference signal and for this reason are sometimes described as essentially insensitive to nonmagnetic features, this is in fact true only for
sufficiently smooth and homogeneous specimens (Hubert and Schäfer, 1998; Celotta et al., 2000). The possibility of separating magnetic information from other effects is a very important advantage; its lack can, in unfavorable cases, be a source of serious confusion or lead to a situation in which the nonmagnetic contrast is stronger than, and effectively conceals, the magnetic domain structure of interest. Such separation allows direct observation of the relationship between the magnetic structure and nonmagnetic features and is of great value for understanding and producing magnetic materials with optimized properties (Szmaja, 2001). In the case of the type I magnetic contrast method of SEM, the intensity of collected secondary electrons is used as the image signal, which represents not only magnetic features, but also, for example, topographic and compositional information. The images of Figures 9c and 9e clearly show that for weak type I magnetic contrasts the topographic contribution becomes a significant fraction of the total secondary electron signal, thus reducing the magnetic effects. If the SEM possesses two secondary electron detectors placed on opposite sides of the specimen, magnetic information can easily be separated from features of other types. By electronically subtracting and adding the signals from the two detectors, the resulting images reflect only magnetic and nonmagnetic contrast, respectively; this is a consequence of the fact that the type I magnetic contrast produced by one detector is of opposite sign to that obtained from the other detector (see Figure 2), while other contrasts are not. Nevertheless, the two-detector system has the disadvantage of increased cost and complexity; therefore, most SEMs are equipped with a single detector. Even then the separation of magnetic and nonmagnetic signal contributions is possible, thanks to the application of the DIP system, as demonstrated for the first time by Szmaja (1998).
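The two-detector separation described above can be sketched with toy one-dimensional signals; the detector outputs are idealized here as a common topographic term plus a magnetic term of opposite sign (an assumption for illustration, not a detector model):

```python
import numpy as np

def separate_contrast(det_a, det_b):
    """Half-difference keeps the magnetic signal (opposite sign in the two
    detectors); half-sum keeps the nonmagnetic (e.g., topographic) signal."""
    return (det_a - det_b) / 2, (det_a + det_b) / 2

# Toy line scans: topography t has the same sign in both detectors,
# the magnetic term m reverses sign between them.
t = np.array([4.0, 5.0, 6.0])
m = np.array([1.0, -1.0, 0.5])
mag, topo = separate_contrast(t + m, t - m)
print(mag)   # recovers m
print(topo)  # recovers t
```

The same arithmetic, applied pixel by pixel to the two detector images, yields the purely magnetic and purely nonmagnetic images described in the text.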
The procedure used for this purpose is illustrated in Figure 11 for a cobalt monocrystal (Szmaja, 1998). It uses the property that type I magnetic contrast changes its sign when the specimen is rotated by 180 degrees around its surface normal (see Figure 2), whereas other types of contrast do not. As the first step, the image with the domain structure is recorded (Figure 11a). Then the specimen is rotated (as precisely as possible) by 180 degrees about the surface normal, and an image of (nearly) the same specimen area is recorded and rotated digitally by 180 degrees (Figure 11b). (Note that the bright magnetic domains in the image of Figure 11a are visible as dark in the image of Figure 11b, and vice versa. Note also that the topographic contrast is superimposed on the magnetic domain contrast in both images.) In practice, however, it is impossible to record these two images with ideal point-to-point correspondence. Simple digital procedures of image translation and rotation were used to align the images accurately in position and orientation (Szmaja, 1998, 1999, 2002b). Alternatively, this can be done by correlation techniques (Hawkes, 1984;
FIGURE 11. Images recorded from the basal surface of a bulk cobalt monocrystal. Type I magnetic contrast reversal is observed in (b) after rotating the specimen about its surface normal by (approximately) 180 degrees with respect to the initial position (a). Image (b) was rotated digitally by 180 degrees for ease of comparison with image (a). After accurately aligning the images shown in (a) and (b), the latter was subtracted from and added to the former, resulting in images with only magnetic (c) and topographic (d) information, respectively. Reprinted from Szmaja (1998). © 1998 with permission from Elsevier.
Reimer, 1985; Neumann et al., 1987). As the last step, the difference and sum of the two aligned images are formed, resulting in images with only magnetic (Figure 11c) and topographic (Figure 11d) information, respectively. Comparing these two images allows direct assessment of whether the magnetic structure is correlated with the surface topography. In the case presented, the domain structure is not influenced by the surface topography. An additional advantage of this procedure is that enhanced magnetic contrast is obtained in the difference image because of the reversed type I magnetic contrasts in the subtracted images. Note here that this effect, which can be of
major importance in the case of very weak magnetic contrast, is not visible when comparing Figure 11c with Figures 11a and 11b, because they show images after applying a simple digital procedure for contrast improvement (Szmaja, 1998, 1999). Very recently we have succeeded for the first time in using the SEM type I magnetic contrast to observe the domain structure of Nd–Fe–B-based permanent magnets (Szmaja, 2005). The magnets studied were anisotropic sintered Nd–Fe–B-based specimens, and domain observations were carried out at the surface perpendicular to the alignment axis. The magnets were composed of grains with an average size of ∼10 µm. They exhibit relatively small domains, of the order of 1 µm in width (in the demagnetized state). Moreover, they are multiphase and chemically very reactive and consequently require careful surface preparation for the purpose of observation of the magnetic domain structure (Sagawa et al., 1984; Folks and Woodward, 1998). Because of this, type I magnetic contrast investigations are difficult, because the magnetic signal is weak and additionally superimposed on a large amount of topographic contrast, as shown in Figure 12. Figure 12a shows an original image of the domain structure of a (Nd0.7Dy0.3)14.5Fe79B6.5 magnet. The original image is of low contrast, as expected. The quality of the original images could be improved considerably by the application of a simple digital procedure for contrast enhancement. As a result, images such as those shown in Figures 12b and 12c are obtained for (Nd0.7Dy0.3)14.5Fe79B6.5 and Nd14.5Fe79B6.5 magnets, respectively. In these images, the bright and dark areas in the form of stripes are magnetic in origin, whereas the regions in the form of spots (more sharply delineated, displayed in black and white) are of topographic origin.
This was proved unambiguously by rotating the specimen by 180 degrees around the axis perpendicular to the specimen surface or by tilting the specimen toward the detector. In the former case, the type I magnetic contrast reversed its sign, whereas the topographic contrast was unchanged. In the latter case, shown in Figures 12c and 12d, the type I magnetic contrast decreased and finally disappeared (Figure 12d), whereas the topographic contrast remained unchanged. By using the procedure for contrast enhancement, it was possible to obtain the images in Figures 12b and 12c with still higher contrast, but then the distinction between magnetic domains and topographic features was more difficult. The smallest domain that could be resolved using the type I magnetic contrast method had a width of 0.8 µm (Szmaja, 2005). This is the best result so far obtained with this method. However, the relatively low resolution of type I magnetic contrast is its distinct disadvantage. Because of this, we were not able to observe reverse spike domains ∼0.5 µm in diameter. The presence of such reverse spikes at the studied surface of the Nd–Fe–B-based magnets was proved by observations with the Bitter pattern technique and MFM (Szmaja, 2006; also see
FIGURE 12. Poor-quality original type I magnetic contrast image of the domain structure of an anisotropic sintered (Nd0.7Dy0.3)14.5Fe79B6.5 magnet at the surface perpendicular to the alignment axis (a) and image with significantly enhanced contrast (b) obtained after applying a simple digital procedure to the original image. Type I magnetic contrast image of the domain structure of an anisotropic sintered Nd14.5Fe79B6.5 magnet at the surface perpendicular to the alignment axis (c) and image of the surface topography obtained by tilting the specimen by 40 degrees toward the detector (d).
Sections IV and V). We attempted to eliminate or considerably reduce the topographic information in type I magnetic contrast images of anisotropic sintered Nd–Fe–B-based magnets by applying the digital image difference procedure described earlier. However, our attempts were unsuccessful, because a large amount of topographic contrast was present, considerably dominating the magnetic contrast; this was not the case for cobalt monocrystals (at room temperature). The largest disadvantage of the type I magnetic contrast technique appears to be its relatively low spatial resolution (∼1 µm). The spatial resolution of a particular contrast in the SEM can be ultimately governed by the beam
FIGURE 13. Domain structure at the basal surface of a thick cobalt monocrystal, made visible with the type I magnetic contrast method. Reprinted from Szmaja (1999). © 1999 with permission from Elsevier.
size at the specimen surface, the signal-to-noise ratio, or the nature of the interaction responsible for contrast formation. For a tungsten filament gun and a beam current of 10 nA at 20 kV accelerating voltage, the corresponding beam size is ∼0.2 µm (Johansen and Werner, 1987). As a result, the beam size is not at present a decisive limitation for the resolution of type I magnetic contrast. The mechanism responsible for the magnetic contrast formation is, in fact, complex, because the specimen stray field is highly inhomogeneous, and the secondary electrons are emitted in various directions and with different energies (0–50 eV). Moreover, as already mentioned, the stray field is probed by the secondary electrons through its entire range on their way from the specimen surface to the detector, and information on the fine magnetic structures "carried" by the secondary electrons near the specimen surface is partially lost as they travel away from the surface (see Figure 4). As a consequence, the domain walls and surface domains are imaged as substantially diffuse, indicating that the method is only weakly surface sensitive, as clearly shown in Figure 13 (Szmaja, 1999). Nevertheless, this does not explain why the stripe domain structure, with stripes narrower than ∼1 µm and without any surface domains, could not be resolved, as reported by Joy and Jakubovics (1968), Yamamoto and Tsuno (1975), Jones (1977), and Szmaja (1994, 1996). It seems that the resolution limit of type I magnetic contrast is actually determined by the signal-to-noise ratio. It is known that the image quality depends on the signal-to-noise performance and that a minimum acceptable signal-to-noise ratio is necessary to detect a given contrast level. Indeed, the relevant images with domains of the order of 1 µm in size, presented
by Joy and Jakubovics (1968, 1969), Atkinson and Jones (1976), Jones (1977), and Szmaja (1996) (for cobalt, MnBi, and PbFe12O19), are noisy, although they were recorded under optimum experimental conditions. In this respect, it should be noted that the images of Nd–Fe–B-based magnets presented in Figures 12b and 12c exhibit some amount of noise, but the noise level is certainly not high. However, as mentioned previously, in the type I magnetic contrast technique the intensity of collected secondary electrons is used as the image signal, which represents not only magnetic features, but also topographic information. The surface roughness, and thus the surface topography contribution to the image signal, is certainly much larger for Nd–Fe–B-based magnets than in the case of cobalt, MnBi, or PbFe12O19 crystals. The images of Nd–Fe–B-based magnets are therefore correspondingly less noisy. As a result, the signal-to-noise ratio is found to be at present the crucial factor limiting the spatial resolution of the type I magnetic contrast method (Szmaja, 1999, 2002b, 2005). This statement is additionally supported by the fact that the resolution of the method in the scanning ion microscope (SIM), due to improved contrast compared with the SEM, was reported to be approximately 0.2 µm (Lloyd et al., 1999). With regard to the SEM, real improvements in the signal-to-noise ratio can be achieved by using SEMs with high-brightness electron guns (i.e., thermionic LaB6 or field emission cathodes). For beam diameters larger than approximately 0.1 µm, LaB6 cathodes are more suitable for the type I magnetic contrast mode because they provide a higher beam current; for smaller beam diameters, field emission guns are superior (Johansen and Werner, 1987). It seems that a spatial resolution of ∼0.3 µm should be attainable for materials with high saturation magnetization, but of course experimental investigations are necessary to validate this hypothesis.
Better improvements in the resolution are unlikely in the author's opinion (Szmaja, 1999, 2002b, 2005). The main reason appears to be the difficulty of achieving sufficiently small beam diameters while retaining the correspondingly large beam current needed to obtain the required signal-to-noise performance.
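The role of the signal-to-noise ratio can be made concrete with the Rose criterion, a standard result from general detection theory rather than from this chapter: assuming Poisson shot noise, a contrast level C is reliably detectable only if C·n ≥ 5·√n, where n is the number of electrons collected per pixel, so n must grow as 1/C². The contrast values below are illustrative.

```python
# Rose criterion sketch (assumed Poisson shot noise, detection threshold
# SNR of 5): detecting contrast C requires n >= (5/C)**2 electrons per pixel.

def electrons_needed(contrast, threshold=5.0):
    """Electrons per pixel required to detect the given fractional contrast."""
    return (threshold / contrast) ** 2

print(electrons_needed(0.1))    # ~2.5e3 electrons per pixel for 10% contrast
print(electrons_needed(0.01))   # ~2.5e5 electrons per pixel for 1% contrast
```

Since type I magnetic contrast is very weak, the required dose per pixel is large, which is consistent with the statement that the signal-to-noise ratio, not the beam size, currently limits the resolution.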
IV. BITTER PATTERN METHOD

With the Bitter technique, the surface under study is decorated by fine magnetic particles, and the resulting domain pattern is usually observed by optical microscopy or SEM. The spatial deposition of magnetic particles on the specimen surface depends directly on the distribution of the local magnetic field near the surface. The total local field is due to the stray field of the specimen and any external field. In the conventional (or classic) Bitter pattern method, a wet colloid and a conventional optical microscope
FIGURE 14. Very-low-quality original image of the domain structure on the basal surface of a bulk cobalt monocrystal, taken with the conventional Bitter pattern technique (a). Images (b) and (c) were obtained by application to the original image of a simple digital procedure for contrast enhancement and then a procedure for sharpness improvement, respectively.
are used, whereas in the colloid-SEM method a dried colloid and a SEM are used. Among the various techniques of imaging magnetic domains, the conventional Bitter pattern technique is the oldest (Bitter, 1931; von Hámos and Thiessen, 1931) and has been successfully applied to studies of the domain structure of materials with uniaxial or multiaxial magnetic anisotropy. Even today it remains an important and frequently used method because of its low cost, ease of application, and good surface sensitivity. For reviews of the Bitter pattern technique, see Carey and Isaac (1966) and Craik (1974). Figure 14a presents an original image of the domain structure on the basal plane of a bulk cobalt monocrystal, recorded by the conventional Bitter pattern method. The image is of very poor quality despite optimized experimental conditions. In a zero or near-zero external magnetic field, visual observation is very difficult because of the poor contrast and lack of sharpness of the domain patterns; this problem was reported in a number of papers in the past (e.g., Kaczér et al. [1959], Takata [1963], Gemperle et al. [1963]). At present, however, the difficulty can be overcome by use of the DIP system (Szmaja, 2000, 2002a). After application of a simple digital procedure for contrast enhancement to the original image of Figure 14a, the image shown in Figure 14b was obtained. It is now clearly visible that the latter image suffers from a lack of sharpness. Consequently, further image processing is desirable to obtain a better quality image. For this purpose, the procedure for sharpness improvement known as unsharp masking (Pavlidis, 1987) was used. The application of this procedure to the image of Figure 14b produced the high-quality image shown in Figure 14c. In this image, the magnetic domain structure is visible with high contrast and clarity. An interesting aspect of the
considered high-quality image, obtained by applying DIP, is the presence of subcontrast within the larger-scale domains (flowers and stars). This is an additional improvement over earlier results. In Figure 14c only the surface domains, in the form of circles, stars, and flowers, are seen. The main domains, extending through the entire specimen thickness, are invisible. The reason for this is that the conventional Bitter pattern technique probes the stray field close to the specimen surface (it is a surface-sensitive method), as opposed to the type I magnetic contrast technique of SEM (see Section III and Figure 4). The reason for the poor quality of the original images obtained with the conventional Bitter pattern method is the complex character of the surface domain structure of bulk cobalt, with the magnetization direction varying continuously between adjacent domain regions, as described previously (in Section III). As a consequence, the local maxima of the stray field near the specimen surface are not clearly marked. The observed complicated Bitter patterns can also be suitably simplified by covering the specimen surface with a nonmagnetic film, as can easily be understood on the basis of Figure 4. The use of a 40 µm thick plastic coat allows observation of the main domains in the regular remanent domain structure at the basal surface of bulk cobalt monocrystals (Wysłocki and Ziętek, 1966). The conventional Bitter pattern technique has the advantage of allowing observation of the evolution of the domain configuration under the influence of an applied magnetic field. From the viewpoint of magnetic investigations, such experiments are of great significance because they allow us to obtain useful information on the magnetization direction of domains, as well as on magnetizing and demagnetizing processes.
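The two DIP steps applied to Figure 14 (contrast enhancement, then unsharp masking) can be sketched as below. The linear stretch and the 3×3 box blur with unit gain are illustrative choices, not the chapter's actual settings:

```python
import numpy as np

def stretch_contrast(img):
    """Linear contrast stretching: rescale intensities to the full [0, 1] range."""
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo)

def unsharp_mask(img, k=1.0):
    """Unsharp masking: sharpened = image + k * (image - blurred),
    here with a simple 3x3 box blur (edge-replicated borders)."""
    padded = np.pad(img, 1, mode="edge")
    blurred = sum(
        padded[i:i + img.shape[0], j:j + img.shape[1]]
        for i in range(3) for j in range(3)
    ) / 9.0
    return img + k * (img - blurred)

low_contrast = np.array([[0.4, 0.5], [0.5, 0.6]])
stretched = stretch_contrast(low_contrast)   # now spans the full 0..1 range

step = np.zeros((4, 4)); step[:, 2:] = 1.0   # a sharp "domain boundary" edge
sharpened = unsharp_mask(step)               # edge contrast is amplified (overshoot)
print(stretched)
print(sharpened)
```

Stretching brings a low-contrast Bitter image up to the display's full dynamic range, and unsharp masking then accentuates domain boundaries, which is the effect seen going from Figure 14a to 14c.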
An example is shown in Figures 15 and 16, which present the behavior of the domain structure at the (0001) surface of bulk cobalt monocrystals during the magnetizing cycle, as observed with the conventional Bitter pattern technique (Szmaja, 2000). The specimen was magnetized perpendicularly to the surface, the field being increased from zero to a maximum value of 350 Oe, reduced to zero, and the magnetizing process then repeated with the field in the reverse direction. As the magnitude of the perpendicular field was increased, the surface domains with magnetization oriented in the direction of the field expanded, whereas those with opposite magnetization shrank and disappeared. At the same time, the main domains gradually appeared and became increasingly better visible in the Bitter pattern images, as expected. These trends in domain behavior are clearly seen in the series of low-magnification images presented in Figure 15. Closer investigation shows, however, that different domain regions generally behave in different ways. The colloidal magnetic particles do not aggregate at some regions of the specimen surface even when the external perpendicular field is applied in both directions (if it is not too high), indicating that these regions have certain magnetization components parallel to the specimen surface (Kaczér et al., 1959; Gemperle et al., 1963).
FIGURE 15. Effect of the magnetizing cycle on the domain structure on the basal surface of a bulk cobalt monocrystal, imaged by the conventional Bitter pattern method. The external magnetic field was perpendicular to the specimen surface. 0 Oe (a) → 150 Oe (b) → 250 Oe (c) → 350 Oe → 0 Oe → −150 Oe (d) → −250 Oe (e) → −350 Oe → 0 Oe (f). Reprinted from Szmaja (2000). © 2000 with permission from Elsevier.
Such domain regions are shown in Figure 16, two of which are marked by arrows. Moreover, it was observed that the colloidal magnetic particles generally settle on different regions exhibiting the described property at different values of the external magnetic field. In particular, some domain regions remained unmodified throughout the magnetizing cycle (see Figure 16). This in turn means that the surface domains on the basal plane of bulk monocrystalline cobalt have in general different in-plane magnetization components, and thus magnetization directions tilted away from the surface by generally different amounts. The changes in the domain structure during the magnetizing cycle were found to be reversible
FIGURE 16. Effect of the magnetizing cycle on the domain structure on the basal surface of a bulk cobalt monocrystal, imaged by the conventional Bitter pattern technique. The external magnetic field was perpendicular to the specimen surface. 0 Oe (a) → 100 Oe (b) → 300 Oe (c) → 350 Oe → 0 Oe → −100 Oe (d) → −300 Oe (e) → −350 Oe → 0 Oe (f). Reprinted from Szmaja (2000). © 2000 with permission from Elsevier.
(Figures 15 and 16). Most probably, only reversible displacements of the surface domain walls took place (Szmaja, 2000). Notably, the Bitter pattern contrast became stronger as the magnitude of the external magnetic field was increased (e.g., Craik [1974], Sakurai et al. [1994]). This effect is not seen in Figures 15 and 16 because they show the original images after the application of digital procedures for contrast and sharpness enhancement. The mechanism of improvement of the Bitter pattern contrast due to the application of an external magnetic field was explained theoretically by Kitakami (1991), using Hartmann’s ferrohydrodynamic treatment (Hartmann, 1987). Because of this effect, the magnitude of the nonmagnetic contrast relative to that of the magnetic contrast becomes smaller as the external field is increased. This is shown
DEVELOPMENTS IN THE IMAGING OF MAGNETIC DOMAINS
in Figure 15, where some topographic scratches, running at a certain angle to the vertical direction, clearly seen in the images recorded in zero external field (Figures 15a and 15f), are hardly visible or almost invisible in the images taken at external fields of 250 Oe applied in opposite directions (Figures 15c and 15e). It is also worth noting that pronounced magnetic potential modulations caused by the application of external fields offer a method of contrast modification that is of great applied importance for domain investigations using improved optical techniques, for example, the interference contrast colloid technique (Hartmann and Mende, 1985; Hartmann, 1987). Figures 17a and 17b present the images of the domain structure on the basal surface of a bulk cobalt monocrystal, recorded with the conventional Bitter pattern method at the beginning and at the end of the magnetizing cycle, respectively (Szmaja, 2002a). The external magnetic field was perpendicular to the specimen surface. The image taken at a field of 200 Oe, shown in Figure 17c, proves that the domains changed under the influence of the applied field. A visual assessment of the images in Figures 17a and 17b indicates that the changes in the domain structure during the magnetizing cycle were reversible. But with the help of a DIP system, it is possible to perform a quantitative comparison between images, using the image difference technique. In the case of the Bitter pattern method, however, the magnetizing cycle usually takes several minutes (because of the slow response of the colloidal suspension to the applied magnetic field), and consequently drifts in the experimental setup occur. Thus, before performing the pixel-by-pixel subtraction, the images compared must be precisely aligned. For this purpose, simple digital procedures of image translation and rotation were used. Then subtraction of the two aligned images was performed, producing the image shown in Figure 17d.
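The align-then-subtract procedure just described can be sketched in a few lines of NumPy. This is a minimal illustrative stand-in for the DIP system, not the author's code: the translation is estimated by FFT cross-correlation, the rotation correction mentioned in the text is omitted for brevity, and the names `estimate_shift` and `difference_image` are introduced here.

```python
import numpy as np

def estimate_shift(ref, img):
    """Estimate the integer (row, col) translation of img relative to
    ref via FFT cross-correlation; the correlation peak marks the
    shift that best aligns the two images."""
    corr = np.fft.ifft2(np.fft.fft2(ref) * np.conj(np.fft.fft2(img))).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # wrap the circular shifts into the signed range
    if dy > ref.shape[0] // 2:
        dy -= ref.shape[0]
    if dx > ref.shape[1] // 2:
        dx -= ref.shape[1]
    return dy, dx

def difference_image(ref, img):
    """Align img to ref by integer translation, then subtract pixel by
    pixel.  A substantially structureless (near-zero) result indicates
    that the changes during the cycle were reversible."""
    dy, dx = estimate_shift(ref, img)
    aligned = np.roll(img, (dy, dx), axis=(0, 1))
    return ref - aligned
```

Applied to two images of the same restored domain structure recorded with a small drift, the difference image is near zero everywhere except where the structure genuinely changed.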
The difference image is substantially structureless, proving that at the end of the magnetizing cycle the initial domain structure was restored, perhaps except for very small changes. In other words, processes in the specimen during the magnetizing cycle were (nearly) completely reversible. More generally, by using the image difference method, the character of changes (reversible or irreversible) in the domain structure during a cycle determined by an external parameter (e.g., magnetizing, temperature, or straining cycle) can be determined with great reliability (Szmaja, 2002a). Recently, a detailed investigation of the problem of separating magnetic and nonmagnetic contributions to the image contrast for the conventional Bitter pattern method has been carried out for the first time (Szmaja, 2001). The separation of magnetic from nonmagnetic information can only be achieved by digital means, using the image difference procedure. The approach that can be applied in the case of the Bitter pattern technique consists of recording the nonmagnetic (topographic) reference image of the specimen area under study and subtracting this reference image from an image with domain structure
FIGURE 17. Images of the domain structure on the basal surface of a bulk cobalt monocrystal, obtained by the conventional Bitter pattern method. Images (a) and (b) were recorded at the beginning and at the end of the magnetizing cycle, respectively. Image (c) was taken at an external magnetic field of 200 Oe perpendicular to the specimen surface. After precise alignment of the images shown in (a) and (b), the latter was subtracted from the former, producing image (d). Reprinted from Szmaja (2002a). © 2002 Wiley.
taken from the same specimen area to obtain the image containing only magnetic data. In the case of the conventional Bitter pattern method, depending on the specimen studied, the nonmagnetic reference image of the specimen area of interest can be taken most easily in the optical microscope in two ways: by magnetizing the specimen to the saturated state or by removing the colloid (in some rare cases, the nonmagnetic image can be relatively easily obtained by heating the specimen above its Curie temperature, but in practice this is not used by any domain observation method and therefore was not studied). It was demonstrated that the separation of magnetic from
FIGURE 18. Image of the domain structure of a Y2.6 Sm0.4 Fe3.8 Ga1.2 O12 garnet specimen, recorded by the conventional Bitter pattern technique (a). Image of the same specimen area taken in the saturated magnetic state of the specimen (b). Image (c) is the result of subtracting image (b) from image (a). Reprinted from Szmaja (2001). © 2001 with permission from Elsevier.
nonmagnetic contrast is easy to realize in practice for comparatively soft magnetic materials, when an external field can be applied to simply produce the nonmagnetic reference image of the specimen (Szmaja, 2001, 2002a). An example is shown in Figure 18 (Szmaja, 2001). Figure 18a presents an image of the domain structure (in the form of a characteristic labyrinth pattern) of a ferrimagnetic garnet specimen with perpendicular magnetization, while the reference image of the saturated magnetization state taken from the same specimen area is shown in Figure 18b. The two largest topographic objects are clearly visible in the latter image and also in the image of Figure 18a, where they conceal the magnetic domain structure of primary interest. As the image of Figure 18b was subtracted from that of Figure 18a, the image
with only magnetic information, presented in Figure 18c, was obtained. No image alignment had to be performed because the two subtracted images were recorded with perfect point-to-point correspondence. This was almost always the case. By comparing the images of Figure 18b and Figure 18c, it is now possible to directly assess the relationship between the surface topography structure and the magnetic structure. In the case presented, the surface topography has no or very little influence on the magnetic domain structure. In the example shown, the two topographic features are fairly large (extending over a distance of approximately three domain widths) and generate high contrast. Despite this, almost no residual contrast of topographic origin is visible in the difference image (Figure 18c). When removal of the colloid from the specimen surface is used to record the nonmagnetic reference image, it was shown that separating magnetic and nonmagnetic information is impossible in principle (Szmaja, 2001). The reason appears to be related to the fact that the Bitter pattern method detects (with good sensitivity) the distribution of the stray magnetic field, and not of the magnetization, at the specimen surface. In fact, the stray field distribution at the surface is determined not only by the magnetic structure alone, but also by surface imperfections of the specimen. As a consequence, the colloidal magnetic particles additionally deposit on appropriately oriented surface imperfections where discontinuity in the magnetization gives rise to free magnetic poles as the source of a stray field (Williams et al., 1949; Carey and Isaac, 1966). This effect causes some image distortion and is directly responsible for the lack of success in separating magnetic from nonmagnetic information in the described case. Thin magnetic films are of significant interest from both fundamental and practical points of view.
On the fundamental side, they exhibit different magnetic properties, such as magnetic anisotropy, magnetic microstructure, coercivity, and magnetoresistance, depending on their thickness, composition, crystalline structure, and preparation conditions. From the practical point of view, thin magnetic films find a wide range of applications in the areas of magnetic and magneto-optic recording and magnetic sensors. Magnetic microstructure observation of thin films of soft magnetic materials (as for example nickel–iron or cobalt) by the Bitter pattern method is difficult due to the fact that they produce only weak stray fields at domain walls (Hubert and Schäfer, 1998; Celotta et al., 2000). As a consequence, poor-quality images are obtained. An example is shown in Figure 19a, which presents an original image of the magnetic structure of a 40 nm thick permalloy film, recorded with the conventional Bitter pattern technique. However, owing to the use of a DIP system (Szmaja, 2002a; Szmaja and Balcerski, 2002), we could produce considerably better-quality images, such as that shown in Figure 19b. The latter image was obtained after applying
FIGURE 19. Poor-quality original image of the magnetic microstructure of a 40 nm thick permalloy film, taken with the conventional Bitter pattern method (a). Image (b) was obtained by applying digital procedures for contrast and sharpness enhancement to the original image.
digital procedures for contrast and sharpness enhancement to the original image from Figure 19a. In Figure 19b the cross-tie walls, displayed as dark lines, are clearly seen. The cross ties are imaged as short bars perpendicular to the main wall. The film was found to be magnetized in the plane of the film (of course, except for small regions near the domain walls in which the magnetization was found to be out-of-plane), as indicated by the behavior of the Bitter patterns under the influence of external magnetic fields. It should be noted, however, that the ripple structure of the magnetization within the domains, clearly resolved in the images recorded with TEM (Figure 20), cannot be seen in the image obtained by the conventional Bitter pattern technique supported by the DIP system (Figure 19b). The ripple structure is caused by local variations of the magnetic anisotropy. Figure 20 shows overfocus (a) and underfocus (b) images of the magnetic microstructure of a 40 nm thick permalloy film taken with the Fresnel mode of TEM from nearly the same specimen area. In these images, the features of magnetic origin are displayed in the reversed contrast. TEM is a powerful tool for imaging magnetic microstructures of thin films (Jakubovics, 1975; Chapman, 1984; Mankos et al., 1996). It offers high spatial resolution and high sensitivity to small variations in the magnetization and consequently compares favorably with the Bitter pattern method. Figures 21 and 22 show images of the magnetic domain structure of thin cobalt films, 40 nm and 100 nm in thickness, respectively, that were evaporated at an incidence angle of 45 degrees. The magnetic microstructure of the film 40 nm thick is composed of domains running predominantly in the direction perpendicular to the incidence plane of the vapor beam, as can be seen in the image of Figure 21a taken with the conventional Bitter
FIGURE 20. Overfocus (a) and underfocus (b) images of the magnetic microstructure of a 40 nm thick permalloy film, recorded by the Fresnel mode of TEM from nearly the same specimen area.
FIGURE 21. Images of the magnetic microstructure of a 40 nm thick cobalt film evaporated at an incidence angle of 45 degrees, taken with the conventional Bitter pattern technique (a) and the Fresnel mode of TEM (b). The arrow in each image indicates the projection of the vapor beam into the film plane.
pattern technique. The magnetization of these films lies essentially in the film plane, as indicated by the behavior of the Bitter patterns under the influence of external magnetic fields. The type of the domain walls present in the considered films can be determined more precisely by TEM. An image of the magnetic microstructure recorded by the Fresnel mode of TEM is presented in Figure 21b, where the existence of cross-tie walls is seen. In general, however, both cross-tie and Néel-type walls were present in the film 40 nm thick (Szmaja and Balcerski, 2004). The cross-tie walls represent a transition between Néel and Bloch-type walls and occur only in a certain range of the
FIGURE 22. Images of the magnetic microstructure of a 100 nm thick cobalt film evaporated at an incidence angle of 45 degrees, obtained by the conventional Bitter pattern method. Image (a) was taken in the absence of an external magnetic field; images (b) and (c) were recorded at external fields of 300 Oe applied in opposite directions perpendicular to the film plane. The arrows in images (a–c) indicate the projection of the vapor beam into the film plane. Illustration of the deposition of colloidal magnetic particles for the case of a stripe domain configuration with in-plane magnetization and domain walls of Bloch type in the absence (d) and presence (e) of an external magnetic field perpendicular to the specimen surface.
film thickness, from ∼30–40 nm to 90–100 nm (Hubert and Schäfer, 1998; Lin et al., 1998). Moreover, a ripple structure of the magnetization can be noticed in Figure 21b. In contrast to the film 40 nm thick, the film 100 nm thick has magnetic domains running predominantly in the direction parallel to the incidence plane of the vapor beam (Szmaja and Balcerski, 2004). An example is presented in Figures 22a–c, which show the images taken by the conventional Bitter pattern method. Figure 22a was recorded at zero external magnetic field, whereas the images of Figures 22b and 22c were taken at external fields of magnitude 300 Oe applied in opposite directions perpendicular to the film plane. In Figures 22b and 22c, the domains are
displayed in the same gray shade. Nevertheless, some parts of the domain walls are clearly marked (in black), while others tend to disappear, and the described effect is reversed on a reversal of the perpendicular magnetic field. Recalling that the colloidal magnetic particles move to the regions of maximum magnitude of the total local field (which is due to the stray field of the specimen and the external field), this means that the domains are magnetized in the film plane, and the domain walls are of Bloch type. The justification is presented in Figures 22d and 22e, which show schematically the deposition of magnetic particles for the case of stripe domain configuration with in-plane magnetization and domain walls of Bloch type in the absence (Figure 22d) and presence (Figure 22e) of an external magnetic field perpendicular to the specimen surface. It is to be noted that most likely the domain walls in 100 nm thick cobalt films are asymmetric Bloch walls (Hubert and Schäfer, 1998; Lin et al., 1998). The direction in which domains are aligned is determined by the magnetic anisotropy of the film. In the case of the investigated polycrystalline cobalt films, the contributions to the magnetic anisotropy originate from the alignment of the columnar grains through the anisotropy of the demagnetizing field as well as from the c-axis alignment (crystalline texture) through the magnetocrystalline anisotropy (Hara et al., 1996; Alameda et al., 1996; Itoh et al., 1998). However, the exact reason for the change of the direction of domain magnetization from perpendicular to parallel with respect to the incidence plane of the vapor beam, reported for obliquely deposited films as the film thickness increases, remains an open question, as discussed in more detail in several recent papers [e.g., Hara et al. (1996), Alameda et al. (1996), Abelmann and Lodder (1997), Itoh et al. (1998, 2002)].
In these papers, the magnetic anisotropy of obliquely deposited films with different thicknesses was investigated using torque measurements, hysteresis loops, and transverse biased initial susceptibility (which provide information of a macroscopic character), but no magnetic domain studies were conducted. In this context, the results, which clearly demonstrate by domain visualization (i.e., at the microscopic level) the change of the direction of magnetic anisotropy from perpendicular to parallel with respect to the incidence plane of the vapor beam, observed for cobalt films evaporated at an incidence angle of 45 degrees as the film thickness is changed from 40 nm to 100 nm, appear to be a valuable contribution (Szmaja and Balcerski, 2004). Observations of the domain structure by the conventional Bitter pattern technique are diffraction limited to approximately 0.5 µm at optical wavelengths. Better results can be achieved by the dried colloid method in the SEM. Figure 23 shows an image of the domain structure on the (0001) plane of a bulk cobalt monocrystal, recorded by the colloid-SEM method. The image very clearly reveals the complicated character of the surface domain
FIGURE 23. Domain structure on the basal surface of a bulk cobalt monocrystal, made visible with the colloid-SEM technique.
structure (Szmaja, 2000). Using the colloid-SEM technique, we were able to resolve magnetic details on the length scale of 0.3 µm, as reproducibly indicated by analysis of the line scans of the image signal across the domain structure. Moreover, the original images of the magnetic microstructure recorded with the colloid-SEM method were of good contrast and sharpness, as opposed to those obtained by the conventional Bitter pattern technique in zero or near-zero external magnetic field (see Figure 14a). At present, the colloid-SEM method offers a spatial resolution of the order of 0.1 µm (Šimšová et al., 1991), which is not in fact limited by the resolution of the SEM but mainly by the quality of the colloids and of the specimen surface (Šimšová et al., 1991; Sakurai et al., 1994). Deposits of colloidal magnetic particles are imaged as dark regions by the optical microscope and as bright regions by the SEM (compare Figure 16a and Figure 23). This is due to their increased light absorption and secondary electron emission, respectively, in comparison with areas with little or no colloid. Domain structures of low-anisotropy materials with perpendicular magnetic anisotropy are of the closed-flux type, as expected and as proved, for example, by observations of the stress patterns in low-anisotropy thick films and bulk specimens of amorphous metal alloys (Livingston and Morris, 1985; Hubert et al., 1990; Koike et al., 1993). On the other hand, domain structures of materials with high uniaxial perpendicular anisotropy exhibit only two-phase domains with magnetization along the magnetic easy axis; in thick specimens, the surface domains have magnetization opposite to that of the main domains within which they are located (Hubert and Schäfer, 1998). As mentioned earlier, cobalt with its intermediate value of the relative magnetic anisotropy (Q ≈ 0.4) is a kind of hybrid, with both perpendicular and
in-plane components of domain magnetization present at the basal surface. The available experimental results show that in bulk cobalt multiphase domain branching takes place in the uppermost part of the surface zone, while the remaining, deeper part of the surface zone is dominated by a two-phase branching process. Just at the surface the magnetization is expected to lie substantially parallel to the surface to reduce the magnetostatic energy more effectively (Szmaja, 2000). For uniaxial materials with weak or intermediate perpendicular anisotropy (including cobalt), it appears plausible to assume a continuously varying surface zone under a misoriented surface with a thickness of the order of the domain wall width parameter (A/K)^1/2, where A is the exchange constant and K is the anisotropy constant (Hubert and Schäfer, 1998). Observations by SEMPA (Unguris et al., 1989) and magneto-optic Kerr microscopy (Hubert et al., 1990), as well as the fact that the original domain patterns at the (0001) surface of bulk cobalt recorded with the conventional Bitter pattern technique in the absence of an external magnetic field are of poor contrast and not sharp (see Figure 14a), seem to indicate the presence of the postulated surface zone, but it cannot yet be considered as truly confirmed. Moreover, no rigorous calculation of the postulated surface zone is known. Is it possible that the domain structure of bulk cobalt could be interpreted by assuming a surface anisotropy contribution that favors in-plane magnetization and a volume anisotropy contribution that favors perpendicular magnetization (Szmaja, 2000)? In this context, it is worth mentioning micromagnetic studies of the magnetization distribution near the surface of bulk specimens of soft magnetic material (Q ≪ 1) with strong surface anisotropy favoring perpendicular magnetization (presented by Hua et al. [1995, 1996]).
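As a quick order-of-magnitude illustration of the length scale set by the wall width parameter (A/K)^1/2, the following short calculation uses CGS constants of a typical magnitude for a uniaxial material such as cobalt; both values are assumptions chosen for illustration, not data from this chapter.

```python
import math

# Illustrative order-of-magnitude constants (CGS), NOT measured values:
A = 1.3e-6   # exchange constant, erg/cm
K = 4.5e6    # uniaxial anisotropy constant, erg/cm^3

width_cm = math.sqrt(A / K)   # sqrt((erg/cm) / (erg/cm^3)) = cm
width_nm = width_cm * 1e7     # 1 cm = 1e7 nm
# width_nm comes out at a few nanometers, so a surface zone of this
# thickness lies far below the optical resolution of the Bitter
# pattern technique.
```

With these assumed constants the wall width parameter is roughly 5 nm, which is why such a surface zone is so hard to confirm by direct imaging.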
The results obtained indicate the existence of magnetic ripple structures in the surface layer with the depth of penetration of the perpendicular magnetization into the bulk of the material on the order of A^1/2/Ms (i.e., ∼10 nm for permalloy [A = 10^−6 erg/cm, Ms = 800 G]). Permanent magnets are used in a wide range of applications, and the market for them is continuing to expand as their magnetic characteristics and cost-effectiveness are improved. The applications include actuators, motors, magnetic and electromagnetic separators, magnetic sensors, magnetic resonance imaging, microwave power tubes, generators, wigglers, magnetic water and oil treatment, and many others (Coey, 2002). Among all permanent magnets, Nd–Fe–B-based magnets are currently the most powerful ones, with the highest available magnetic energies. They are produced in bulk form by two principal routes: (1) sintering microcrystalline powder into high-energy, fully dense components and (2) melt quenching nanocrystalline material for use in bonded and hot deformed components. Generally, sintered magnets have high-energy products (30–50 MGOe), full density, and relatively simple shapes. Bonded magnets have intermediate energy products (10–18 MGOe),
FIGURE 24. Images of the domain structure of an anisotropic sintered (Nd0.85 Dy0.15 )14.5 Fe79 B6.5 magnet on the surface perpendicular to the alignment axis, taken by the conventional Bitter pattern technique (a) and the colloid-SEM method (b). Image (c) with the domain boundaries represented by black lines was obtained by digital processing of image (b). On image (c), five test straight lines are additionally superimposed to illustrate the stereologic method of Bodenberger and Hubert used for determining the domain width. (b, c) reprinted from Szmaja (2006). © 2006 with permission from Elsevier.
lower density, and can be formed into intricate net shapes. Hot deformed magnets possess full density, intermediate to high-energy products (15–46 MGOe), isotropic or anisotropic properties, and have the potential to be formed into net shapes (Brown et al., 2002). Figures 24a and 24b present images of the domain structure of an anisotropic sintered (Nd0.85 Dy0.15 )14.5 Fe79 B6.5 magnet on the surface perpendicular to the alignment axis, recorded by the conventional Bitter pattern and colloid-SEM methods, respectively (Szmaja, 2006). The domains of opposite magnetization (along the alignment axis) are displayed as dark and bright. The domain structure is composed of the main domains (which extend
through the entire grain thickness) and surface reverse spikes, the latter visible approximately as circles within the main domains. The main domains form a maze pattern, indicating good magnetic alignment of the hard magnetic grains. The main domains are visible in the images, and the domain structure is simpler than that of bulk cobalt monocrystals mainly because the magnet is composed of grains of only ∼10 µm in size, but also because the magnet has a much larger value of the relative magnetic anisotropy (Q = 6.7). However, the main domains in the image taken with the conventional Bitter pattern technique are noticeably larger than those in the image recorded by the colloid-SEM method; they have widths typically in the ranges of 3–6 µm and 1–2 µm, respectively. In other words, this means that the main domains are not resolved in the image obtained by the former technique. The described effect was reported by Szmaja (2004, 2006) for anisotropic sintered Nd–Fe–B-based specimens of different chemical composition. The reason for this effect does not appear to be related to a limited optical resolution of typically 0.5 µm; the statement is additionally supported by the observation that reverse spike domains ∼0.5–1 µm in size were present in images taken by the conventional Bitter pattern technique (see Figure 24a). The reason is related to the fact that the original wet colloid layer is considerably thicker than the dried one. The thickness of the wet colloid layer was estimated to be ∼5 µm (Carey and Isaac, 1966), while that of the dried colloid layer, as measured by using SEM, was typically in the 0.5–1 µm range. More precisely, the information obtained in images recorded by the two considered methods of domain observation is not in fact determined by the thickness of the colloid layer, but by their probing distance—the distance from the specimen surface to which the stray field of the specimen is effectively probed by magnetic particles of the colloid. 
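The role of the probing distance can be illustrated with a toy decay model. For a periodic surface charge pattern of period p, the stray field falls off roughly as exp(−2πz/p) with height z above the surface; the stripe period and the two probing heights below are illustrative values chosen within the ranges discussed here, not measured data.

```python
import math

def relative_field(period_um, z_um):
    """Fraction of the surface stray field of a periodic charge
    pattern of period `period_um` that survives at height `z_um`
    (assumed exponential decay law, exp(-2*pi*z/p))."""
    return math.exp(-2 * math.pi * z_um / period_um)

# Main domains ~1-2 um wide correspond to a stripe period of ~2-4 um.
bitter = relative_field(4.0, 0.75)       # probe at ~0.75 um above surface
colloid_sem = relative_field(4.0, 0.2)   # probe at ~0.2 um above surface
```

With these numbers the colloid probe at 0.2 µm senses more than twice the relative field of the one at 0.75 µm, which is consistent with the much greater surface sensitivity attributed to the colloid-SEM method.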
The conventional Bitter pattern technique has a probing distance of ∼0.5–1 µm (Hartmann, 1987; Celotta et al., 2000). Thus, in the case of the colloid-SEM method, it seems reasonable to assume that its probing distance is ∼0.1–0.3 µm. As a consequence, apart from the fact that the colloid-SEM method uses an SEM instead of a conventional optical microscope, this method is considerably more surface sensitive and clearly resolves domains, in contrast to the conventional Bitter pattern technique (Szmaja, 2004, 2006). The domain width or size is one of the most important parameters of the magnetic domain structure. We are often interested in the variations of the domain size with specimen thickness, temperature, or external magnetic field. Moreover, from measurements of domain widths, some intrinsic material parameters, such as the domain wall energy, domain wall thickness, single-domain particle diameter, and exchange constant, can be obtained. It is easy to define the domain width for simple domain configurations, and in such cases, it can be determined to a good or very good accuracy by visual methods. However, in the case of complicated domain patterns, as for example those
of bulk cobalt monocrystals on the basal surface or anisotropic sintered Nd–Fe–B-based magnets on the surface perpendicular to the alignment axis, defining the domain width is not generally trivial and determining it with a digital method is much more appropriate than by visual measurements. In this context, different approaches have been used (e.g., see Bodenberger and Hubert [1977], Szmaja [1998], Hoffmann et al. [1999], and also Figure 7). Nevertheless, the most universal and the most commonly applied method appears to be the stereologic method proposed by Bodenberger and Hubert (1977) (see also Hubert and Schäfer [1998]). In this method, an effective domain width is defined as the ratio of a chosen area to the total length of domain walls in this area. This definition of the domain width is in agreement with the ordinary definition for stripe domains with parallel straight walls. For the purpose of evaluating the total domain wall length, a large number of test straight lines running in random directions is used. The method is illustrated in the image of Figure 24c, where five test lines are drawn. The domain width D is determined from the formula

D = (2/π) (Σi li)/(Σi ni),    (1)
where li is the length of the ith test line and ni is the number of intersections of the ith test line with domain walls. However, we must also be aware that determination of the domain width requires application of a domain observation method with sufficiently high spatial resolution. For example, in the case of the anisotropic sintered Nd–Fe–B-based magnets, the results obtained by the conventional Bitter pattern technique (see Figure 24a) and the SEM type I magnetic contrast (see Figures 12b and 12c) clearly show that these methods are not suitable for this purpose (Szmaja, 2006). But the domain width can be determined on the basis of images recorded by the colloid-SEM technique and by MFM (see Section V). In the case of images obtained with the colloid-SEM method, the image contrast was observed between the domains (see Figure 24b). To determine the positions of domain walls in the image, the digital procedure proposed by Szmaja (2002a) was applied. First, a gray-scale image with the domain structure was transformed to the corresponding black-and-white image by a simple thresholding technique at the level of the average image intensity, and then a 3 × 3 pixel median filter was applied to smooth the latter image and remove from it some possible artifacts in the form of small objects produced by the thresholding technique. From the resulting image, a simple digital procedure was then used to derive the corresponding image in which the domain boundaries are represented by black lines (see Figure 24c). On the basis of such images, the surface domain width Ds was determined
by applying the method of Bodenberger and Hubert with 1000 test straight lines (on the image of Figure 24c, five test straight lines are additionally superimposed). The procedure was repeated 10 times to improve the accuracy of determining the domain width. Then the surface domains (reverse spikes) were removed (manually) from the considered images, and the method of Bodenberger and Hubert was applied again (Szmaja, 2006), this time to determine the width of the main domains Dm. The SEM is very well suited to investigations of the domain structure of specimens that are not flat, owing to its large depth of focus, a feature inherent to the SEM. Fine examples of the use of this advantage, demonstrating the possibility of directly studying the arrangement of domains on inclined planes, were presented by Joy and Jakubovics (1969) and Goto and Sakurai (1977) for cobalt crystals. Here we also present an example in Figure 25, for an anisotropic sintered Nd–Fe–B-based magnet. In the image of Figure 25a, taken with the colloid-SEM technique, the surfaces parallel and perpendicular to the alignment axis are shown at the upper and lower part of the image, respectively. It should be noted that the specimen edge, separating the mentioned surfaces, is not sharp and in fact hardly visible. Moreover, there are considerable variations in surface topography near the edge. As a consequence, in Figure 25a, displayed with comparatively high contrast, details present in the black and white parts of the image are invisible. In such a case, however, the problem can be overcome by the application of a differentiation operation, which is equivalent to a spatial frequency filter that suppresses slow and enhances fast intensity
FIGURE 25. Image of the domain structure of an anisotropic sintered (Nd0.7 Dy0.3 )14.5 Fe79 B6.5 magnet on the surfaces parallel (the upper part of the image) and perpendicular (the lower part of the image) to the alignment axis, recorded with the colloid-SEM method (a). Image (b) was obtained by differentiating image (a).
variations in the image. The resulting image is presented in Figure 25b. In this image, the domain boundaries are clearly delineated, showing the arrangement of domains on both surfaces and near the edge separating the surfaces. The stripe-like domains with straight domain walls running nearly parallel to the alignment axis are seen on the surface parallel to the alignment axis, while the maze domains and surface reverse spikes, the latter displayed approximately as circles within the maze domains, are present on the surface perpendicular to the alignment axis. Note also that some topographic features, displayed in black and white, can be recognized in Figure 25b, mainly near the specimen edge and at the upper part of the image. For specimens in the demagnetized state, in the absence of an external magnetic field, fine magnetic particles are expected to be preferentially attracted to the regions near domain walls. In practice, however, this is not always the case. In the images of domain structures with sufficiently small domains (smaller than ∼1 µm) and a perpendicular magnetization component, taken by the conventional Bitter pattern method or the colloid-SEM technique and shown in Figures 14c, 16a, 23, 24a, and 24b, the domains, and not the domain walls, are marked. The reason for this may be related to clustering of the colloidal magnetic particles. Moreover, for sufficiently fine domain structures, in some cases it is difficult to determine whether colloidal magnetic particles deposit on the domains or the domain walls. This difficulty was reported by Šimšová et al. (1991) for observations of the domain structure of CoCr films under the influence of an external magnetic field, carried out with the colloid-SEM method. Figure 26 shows an image of a misaligned grain in an anisotropic sintered Nd–Fe–B-based magnet (in the demagnetized state) on
FIGURE 26. Image of a misaligned grain in an anisotropic sintered Nd13.7Dy0.2Fe79.7TM0.4B6 magnet (where TM: Al, Ga, Co, Cu) on the surface perpendicular to the alignment axis, taken by the colloid-SEM technique.
224
SZMAJA
the surface perpendicular to the alignment axis, recorded by the colloid-SEM technique. More precisely, in this case the grain has its c axis at a certain angle to the surface; the c axis is neither parallel nor perpendicular to the surface. In the central part of the grain, the domain structure consists, to a very good approximation, of stripe domains with parallel straight domain walls. Nevertheless, it is difficult to decide whether colloidal magnetic particles aggregate on the domains or at the domain walls (in this image, deposits of colloidal magnetic particles are displayed as bright).
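The boundary enhancement by differentiation mentioned for Figure 25b is not specified in detail in the text; the following sketch shows one common realization, a finite-difference gradient magnitude, applied to a toy two-domain image. All numbers and the function name are illustrative, not the authors' exact procedure.

```python
# Delineating domain boundaries by differentiation: the gradient magnitude
# of a domain image is large only where the intensity changes, i.e., at
# the domain walls, so flat domain interiors are suppressed.

def gradient_magnitude(img):
    """|grad| of a 2D list-of-lists image via forward differences."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows - 1):
        for x in range(cols - 1):
            gx = img[y][x + 1] - img[y][x]   # horizontal difference
            gy = img[y + 1][x] - img[y][x]   # vertical difference
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out

# Toy "domain image": two flat domains (0 and 1) separated by a vertical wall.
img = [[0, 0, 0, 1, 1, 1] for _ in range(4)]
edges = gradient_magnitude(img)
print(edges[0])  # [0.0, 0.0, 1.0, 0.0, 0.0, 0.0] - nonzero only at the wall
```

Only the column adjacent to the wall responds; the two domain interiors map to zero, which is the effect visible in Figure 25b.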
V. MAGNETIC FORCE MICROSCOPY

MFM was introduced in 1987 (Martin and Wickramasinghe, 1987). This technique relies on the magnetostatic interaction between the magnetic specimen and a magnetic probe placed at a constant height of tens or hundreds of nanometers (typically 10–200 nm) over the specimen surface. On the basis of this interaction, an image of the magnetic structure of the specimen is obtained. Among all methods for the observation of magnetic domain structures, MFM appears to be at present the most widely used, mainly because of its high spatial resolution (routinely better than 100 nm), high surface sensitivity, and ease of application. The MFM technique has proved to be a very useful characterization tool in both basic research and industrial applications. For reviews of MFM, see Grütter et al. (1992) and Hartmann (1999). Figure 27 schematically presents the principle of MFM imaging. To separate the magnetic contrast from the surface topography, MFM measurements are performed using the two-pass method. For each raster line, the first pass is
FIGURE 27. Illustration of the two-pass method used in MFM.
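The two-pass scheme of Figure 27 can be sketched in a few lines; the lift height and the topography values below are illustrative, not taken from the text.

```python
# Sketch of the two-pass (lift) scheme used in MFM.
# Pass 1 (AFM): record the topography z(x) of one raster line.
# Pass 2 (MFM): retrace the same line at z(x) + lift, so the tip-surface
# separation stays constant; short-range van der Waals forces are thereby
# suppressed and mainly the long-range magnetic signal remains.

def two_pass_line(topography_nm, lift_nm=100.0):
    """Return the tip height profile for the second (lifted) pass."""
    return [z + lift_nm for z in topography_nm]

# Hypothetical topography of one scan line (nm): flat surface with a bump.
line = [0.0, 0.0, 5.0, 12.0, 5.0, 0.0, 0.0]
second_pass = two_pass_line(line, lift_nm=100.0)

# The tip-surface separation is the same at every point of the line:
separations = [h - z for h, z in zip(second_pass, line)]
print(separations)  # 100.0 nm everywhere, regardless of the bump
```

Because the separation is constant along the retrace, variations in the second-pass signal reflect the stray field rather than the topography.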
made very close to the specimen surface (AFM) and records the surface topography. The second pass then follows the recorded topography but at an increased scan height. The tip-specimen distance must be large enough to eliminate (or minimize) the short-range van der Waals forces that provided the topographic contrast. Then in the second pass, the tip is affected only (mainly) by long-range magnetic forces, and the corresponding MFM image of the specimen surface is obtained. Figure 28a shows an MFM image of the domain structure of an anisotropic sintered Nd14.5Fe79B6.5 magnet on the surface perpendicular to the alignment axis (Szmaja, 2006). First, note that in close proximity to the mentioned surface, Nd–Fe–B-based magnets exhibit a large magnetic induction, of the order of 10 kG. On the one hand, this may be seen as an unfavorable circumstance for MFM observations due to the expected perturbation of the magnetic state of the tip by the specimen stray field, but on the other hand, this leads to large interaction forces or force gradients and high-contrast images, a favorable condition. The presence of main domains (extending through the entire grain thickness) and surface reverse spikes in Figure 28a is apparent. The main domains form a maze pattern near the surface. They are visible in the image because the magnet consists of grains of only ∼10 µm in size. The domain walls are displayed as dark, while the main (maze) domains are substantially imaged as bright. The latter means that during the scanning process the tip was remagnetized by the stray fields of main domains each time it passed from one domain to a neighboring domain (i.e., at the domain walls), resulting in the loss of information on the magnetization direction of
FIGURE 28. MFM image of an anisotropic sintered Nd14.5Fe79B6.5 magnet on the surface perpendicular to the alignment axis, recorded with a tip-specimen distance of 100 nm (a). Corresponding image with the domain boundaries represented by black lines (b). On image (b), five test straight lines are additionally superimposed to illustrate the stereologic method of Bodenberger and Hubert used to determine the domain width. Reprinted from Szmaja (2006). © 2006 with permission from Elsevier.
these domains. This is as expected because near the surface their stray fields are much larger than the coercivity of the tip. In general, however, there were exceptions to this, although they were very rare. One such exception is the region near the central part of the image in Figure 28a, where the neighboring main domains are visible as dark and bright and the boundaries between them are not clearly marked (see also Szmaja et al. [2004a]). On the other hand, because the reverse spikes are relatively small, cone-shaped, surface domains that contain only a small amount of material, their stray fields are insufficient to remagnetize the tip, and they are substantially imaged as dark, indicating that their magnetization is opposite to that of the main domains within which they are located. In the context of interpreting MFM images of hard magnetic materials, depending on the degree of perturbation of the magnetic state of the tip by the specimen stray field, image contrast can arise between neighboring domains, or at domain walls, or can be a combination of both, as demonstrated by Folks and Woodward (1998). We are interested in determining the surface and main domain widths on the basis of MFM images, such as that presented in Figure 28a, using the stereologic method of Bodenberger and Hubert (see Section IV). For this purpose, we must first determine the positions of domain walls in the image. In this context, the situation is more complicated for MFM images (see Figure 28a) in comparison with images recorded by the colloid-SEM method (see Section IV and Figure 24b). In the case of MFM images, the image contrast is observed between the main (maze) domains and reverse spikes, while the main domains are substantially imaged as bright (i.e., the domain walls separating the main domains are in contrast). 
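The stereologic estimate in the spirit of Bodenberger and Hubert amounts to laying straight test lines over the wall image (as in Figure 28b), counting wall crossings, and converting the mean intercept length into a domain width. In the sketch below, the 2/π prefactor is the standard stereological orientation correction for randomly oriented (maze) walls, and the line lengths and crossing counts are made-up numbers; treat this as an illustration rather than the authors' exact procedure.

```python
import math

# Stereologic domain-width estimate: for total test-line length L_t and
# N wall crossings, the mean intercept is L_t / N.  For a maze pattern
# with randomly oriented walls, a straight line crosses an oblique stripe
# over a lengthened intercept, and averaging over orientations gives
# D = (2 / pi) * L_t / N (orientation factor assumed, check the original
# Bodenberger-Hubert reference before quantitative use).

def domain_width(total_line_length_um, crossings, maze=True):
    mean_intercept = total_line_length_um / crossings
    return (2.0 / math.pi) * mean_intercept if maze else mean_intercept

# Hypothetical numbers: five 20-um test lines, 70 wall crossings in total.
D = domain_width(5 * 20.0, 70)
print(round(D, 2))  # estimated domain width in micrometers
```

With several test lines placed at different positions and orientations, as in Figure 28b, the estimate averages over the local irregularity of the maze pattern.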
Moreover, there were exceptions to this behavior (in fact, very rare); the domain boundaries between the main domains are imaged as comparatively wide (the domain walls in Nd–Fe–B-based magnets are very narrow, only ∼5–10 nm in thickness); and, much more undesirably, within the main domains and the domain walls separating them there are generally quite large variations in the intensities of the image points. All these features preclude deriving the positions of domain walls in the MFM image by a digital procedure similar to that applied to the colloid-SEM image; as a consequence, they had to be determined manually. Figure 28b shows the resultant image with the domain boundaries represented by black lines, corresponding to the MFM image of Figure 28a. On the basis of such images, the surface domain width Ds and then the main domain width Dm were determined using the method of Bodenberger and Hubert, in the same way as for suitable images derived for the colloid-SEM technique, one of which is shown in Figure 24c (see Section IV). The results obtained for the anisotropic sintered Nd–Fe–B-based magnets are presented in Table 2 (Szmaja, 2006). The given domain widths represent averages from 10 images (five images were recorded by the colloid-SEM method and five images were
TABLE 2
EXPERIMENTAL VALUES OF THE MAIN DOMAIN WIDTH Dm AND THE SURFACE DOMAIN WIDTH Ds FOR THE INVESTIGATED ANISOTROPIC SINTERED Nd–Fe–B-BASED MAGNETS

Chemical composition                              Dm (µm)        Ds (µm)
Nd14.5Fe79B6.5                                    1.16 ± 0.06    0.89 ± 0.06
(Nd0.85Dy0.15)14.5Fe79B6.5                        1.34 ± 0.07    1.13 ± 0.07
(Nd0.7Dy0.3)14.5Fe79B6.5                          1.59 ± 0.06    1.45 ± 0.06
Nd13.7Dy0.2Fe79.7TM0.4B6 (TM: Al, Ga, Co, Cu)     1.25 ± 0.07    1.03 ± 0.06
Reprinted from Szmaja (2006). © 2006 with permission from Elsevier.
collected with MFM) of the domain structure taken at various places on the specimen surface. For all the investigated anisotropic sintered Nd–Fe–B-based magnets, which were composed of grains with an average size L ≈ 10 µm, the maze domain structure with reverse spike domains was observed on the surface perpendicular to the alignment axis. In such a case, the domain wall energy γ of the material is most frequently determined using the theory of Kaczér (1964), the theory of Hubert (1967b) (see also Hubert and Schäfer [1998]) or the extended theory of Szymczak (1973) (see also Szymczak et al. [1987]). The analysis carried out by Szmaja (2006) shows that the predictions of the extended model of Szymczak concerning the γ values for various uniaxial materials with high magnetic anisotropy (for which Q > 1) are closest to the experiment. According to this theoretical model, the dependence of the width of the main domains Dm on the crystal thickness L is given by the formula

Dm = 0.395 (γ μ∗/Ms²)^(1/3) L^(2/3),  (2)

where μ∗ = 1 + Q⁻¹ is the rotational permeability of the material. To obtain the domain wall energy from Eq. (2), we need to know the main domain width Dm, the crystal thickness L, the magnetic anisotropy constant K, and the saturation magnetization Ms (μ∗ = 1 + 2πMs²/K). Knowing the domain wall energy γ then allows determination of the exchange constant A, the domain wall thickness δ (using the standard continuum model of a domain wall), and the critical diameter for a single-domain particle dc, on the basis of the relationships listed below:

γ = 4(AK)^(1/2),  (3)
δ = π(A/K)^(1/2) = πγ/(4K),  (4)
dc = 1.4 γ/Ms².  (5)
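A numerical illustration of Eqs. (2) through (5) in CGS units, as used in the text; the material constants below are round illustrative values for a Nd–Fe–B-type magnet, not the exact inputs of Szmaja (2006).

```python
import math

# Numerical sketch of Eqs. (2)-(5) in CGS units.  Illustrative inputs:
Ms = 1280.0      # saturation magnetization (G)
K  = 4.3e7       # uniaxial anisotropy constant (erg/cm^3)
Dm = 1.16e-4     # main domain width (cm), i.e. 1.16 um
L  = 10e-4       # grain size / crystal thickness (cm), i.e. 10 um

mu_star = 1.0 + 2.0 * math.pi * Ms**2 / K       # rotational permeability

# Invert Eq. (2), Dm = 0.395 (gamma mu*/Ms^2)^(1/3) L^(2/3), for gamma:
gamma = (Dm / (0.395 * L**(2.0 / 3.0)))**3 * Ms**2 / mu_star  # erg/cm^2

A     = (gamma / 4.0)**2 / K          # exchange constant from Eq. (3)
delta = math.pi * math.sqrt(A / K)    # wall thickness, Eq. (4) (cm)
dc    = 1.4 * gamma / Ms**2           # single-domain diameter, Eq. (5) (cm)

print(f"gamma = {gamma:.0f} erg/cm^2")
print(f"delta = {delta * 1e7:.1f} nm, dc = {dc * 1e4:.2f} um")
```

With these inputs the sketch yields γ ≈ 33 erg/cm², δ ≈ 6.1 nm, and dc ≈ 0.29 µm, of the same order as the entries of Table 3 below, as expected for illustrative constants.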
TABLE 3
VALUES OF THE DOMAIN WALL ENERGY γ (DETERMINED USING THE EXTENDED THEORY OF SZYMCZAK), THE DOMAIN WALL THICKNESS δ, THE SINGLE-DOMAIN PARTICLE DIAMETER dc AND THE EXCHANGE CONSTANT A FOR THE STUDIED ANISOTROPIC SINTERED Nd–Fe–B-BASED MAGNETS

Chemical composition                              γ (erg/cm²)   δ (nm)       dc (µm)        A (erg/cm)
Nd14.5Fe79B6.5                                    27 ± 4        5.7 ± 0.8    0.29 ± 0.06    (1.2 ± 0.5) × 10⁻⁶
(Nd0.85Dy0.15)14.5Fe79B6.5                        34 ± 5        6.3 ± 0.9    0.48 ± 0.08    (1.7 ± 0.6) × 10⁻⁶
(Nd0.7Dy0.3)14.5Fe79B6.5                          48 ± 6        8.1 ± 0.8    0.82 ± 0.09    (3.1 ± 0.7) × 10⁻⁶
Nd13.7Dy0.2Fe79.7TM0.4B6 (TM: Al, Ga, Co, Cu)     37 ± 6        5.7 ± 0.9    0.38 ± 0.07    (1.7 ± 0.6) × 10⁻⁶
Reprinted from Szmaja (2006). © 2006 with permission from Elsevier.
FIGURE 29. MFM image of an anisotropic sintered (Nd0.85Dy0.15)14.5Fe79B6.5 magnet on the surface perpendicular to the alignment axis, taken with a tip-specimen spacing of 100 nm (a). Corresponding AFM image (b).
The determined values of γ (resulting from the extended theory of Szymczak), δ, dc, and A for the studied anisotropic sintered Nd–Fe–B-based magnets are shown in Table 3 (Szmaja, 2006). One of the greatest advantages of the MFM technique is that it simultaneously provides information on the magnetic structure and the surface topography, the latter revealed by AFM. This is a direct consequence of the two-pass method (described earlier) used in MFM imaging. Examples are shown in Figures 29–36. Figure 29a presents an MFM image of the domain structure of an anisotropic sintered (Nd0.85Dy0.15)14.5Fe79B6.5 magnet on the surface perpendicular to the alignment axis, while the corresponding AFM topographic image is shown in Figure 29b. The familiar maze domain
FIGURE 30. High-resolution MFM image of an anisotropic sintered (Nd0.7Dy0.3)14.5Fe79B6.5 magnet on the surface perpendicular to the alignment axis, recorded with a tip-specimen distance of 100 nm (a). Corresponding AFM image (b). Reprinted from Szmaja (2006). © 2006 with permission from Elsevier.
FIGURE 31. High-resolution MFM image of an anisotropic sintered Nd13.7Dy0.2Fe79.7TM0.4B6 magnet (where TM: Al, Ga, Co, Cu) on the surface perpendicular to the alignment axis, taken with a tip-specimen separation of 100 nm (a). Corresponding AFM image (b).
pattern with reverse spikes is seen in the MFM image of Figure 29a. By comparing the MFM image with the corresponding AFM image, it is possible to directly assess whether the magnetic structure is correlated with the surface topography and also whether topographic features are present in the MFM image. The latter occurs for a sufficiently small tip-specimen distance, but also for a larger tip-specimen distance when correspondingly high or deep topographic objects exist on the studied surface area. In Figure 29, it should be noted that parts of some small topographic features (approximately
FIGURE 32. MFM image of an anisotropic sintered SmCo5 magnet on the surface perpendicular to the alignment axis, taken with a tip-specimen distance of 50 nm (a). Corresponding AFM image (b). Reprinted from Szmaja et al. (2004c). © 2004 with permission from American Institute of Physics.
FIGURE 33. High-resolution MFM image of an anisotropic sintered SmCo5 magnet on the surface perpendicular to the alignment axis, recorded with a tip-specimen spacing of 50 nm (a). Corresponding AFM image (b). Image signal profile across the domain structure along the line marked in (a) is shown in (c). Reprinted from Szmaja et al. (2004c). © 2004 with permission from American Institute of Physics.
FIGURE 34. High-resolution MFM image of a nanocomposite Nd2Fe14B/Fe3B magnet ribbon, taken with a tip-specimen separation of 100 nm (a). Corresponding AFM image (b).
circular in shape, displayed in white in Figure 29b) are present in the MFM image (where they are displayed as black, circular objects). Nevertheless, the magnetic domain structure is not affected by the surface topography, as can be seen by comparison of the MFM image with the corresponding AFM image. This was true in most cases for the studied anisotropic sintered Nd–Fe–B-based magnets, and this is in accordance with expectations because these magnets are known to be easily magnetizable at relatively low fields, through the process of domain wall motion in the grains (Livingston, 1985; Sagawa et al., 1985). In some cases, generally rare, some influence of the surface topography on the magnetic domain structure was found. It was then observed, for example, that the corresponding magnetic domains had sizes and shapes closely associated with those of the topographic objects (Szmaja et al., 2004a). It is also interesting to note that the reverse spike present in the upper right quadrant of the MFM image from Figure 29a is displayed as bright (and the domain wall separating this reverse spike and the main domain within which the reverse spike lies is marked as dark), as opposed to the remaining reverse spikes (see also Figures 28a and 36b). This means that the considered reverse spike was sufficiently large, and consequently its stray field was sufficient to remagnetize the tip (i.e., the magnetostatic interaction between the reverse spike and the tip was perturbing). Such cases were, in fact, extremely rare. The reverse spike domains in Figure 28a are clearly marked. In this respect, however, the image of Figure 28a is an exception. (The reason for this is not clear.) In most cases, the reverse spikes were substantially displayed as unsharp, as can be seen in the MFM images of Figures 29a, 32a, and 36a. This appears to agree with observations performed using tips
FIGURE 35. MFM image of a bulk cobalt monocrystal on the basal surface, recorded with a tip-specimen distance of 100 nm (a). Corresponding AFM image (b). Image (c) was obtained after applying a digital procedure for sharpness enhancement to image (a).
with moderate (∼300 Oe) or relatively high (∼1400 Oe) coercivity (Folks and Woodward, 1998; Zueco et al., 1998). To understand the presence of this effect, it must be remembered that the reverse spike domains are cone-shaped and lie within the maze domains of opposite magnetization (Szmaja et al., 2004a). As a consequence, the magnetic structure is characterized by surface charges, as well as internal charges associated with domain walls that, near the surface, are tilted away from the surface normal. Down to a certain depth, the internal charges are also sensed by the tip magnetization. Around the reverse spike domains, the internal charges are of opposite sign to the surface charges, leading to unsharp charge contrast images, as presented by Zueco et al. (1998). This most probably also leads here to partial remagnetization of the conventional thin-film tip with the distributed nature
FIGURE 36. MFM images of an anisotropic sintered Nd14.5Fe79B6.5 magnet, taken with a tip-specimen spacing of 90 nm (a), 150 nm (b), and 800 nm (c). Corresponding AFM image (d). Reprinted from Szmaja et al. (2004a). © 2004 with permission from Wiley.
of the magnetic tip volume (the magnetically active volume of such a tip was estimated to extend over the last ∼0.7 µm of the tip [Babcock et al., 1996]) and consequently to the unsharp imaging of reverse spike domains (Folks and Woodward, 1998; Zueco et al., 1998) because the effect is not observed for low-coercivity tips (Folks and Woodward, 1998). It is also believed that the described effect will be absent for advanced probes with an artificially confined active probe volume, which were demonstrated to lead to much sharper and higher-resolution MFM images in comparison with those obtained using conventional thin-film tips (Skidmore and Dahlberg, 1997; Leinenbach et al., 1999). A detailed study of the magnetic structure of anisotropic sintered Nd–Fe–B-based magnets by the MFM technique shows that in addition to
the coarse main domains and reverse spikes, fine-scale surface domains in the form of curved stripes are present on the surface perpendicular to the alignment axis. This fine domain structure was observed on all the studied anisotropic sintered Nd–Fe–B-based specimens, on various places of the mentioned surface (Szmaja, 2006). Examples are presented in the high-resolution MFM images of Figure 30a for a (Nd0.7Dy0.3)14.5Fe79B6.5 magnet and Figure 31a for a Nd13.7Dy0.2Fe79.7TM0.4B6 magnet (where TM: Al, Ga, Co, Cu), while Figures 30b and 31b show the corresponding AFM topographic data, respectively. For a given image with the fine domain pattern, no such pattern was observed in the corresponding AFM image; in addition, we did not observe any change of the MFM image with changing scan direction. This proves unambiguously that the fine-scale structure is of magnetic origin and not an image artifact. In this context, it should be noted that the existence of the fine domain structure was reported for the first time for a monocrystalline Nd2Fe14B specimen by Al-Khafaji et al. (1996, 1998) and then for a (Nd0.85Dy0.15)14.5Fe79B6.5 magnet by Szmaja et al. (2004b). The fine-scale domain structure of the studied anisotropic sintered Nd–Fe–B-based magnets was, in fact, inhomogeneous; it varied in size and shape when we moved from one area of the surface to another. In this respect, it was similar to that present in anisotropic sintered SmCo5 magnets, reported for the first time by Szmaja et al. (2004c) (see Figure 33a), and in contrast to the regular fine domain pattern observed in a monocrystalline Nd2Fe14B specimen (Al-Khafaji et al., 1996, 1998). In general, the width of the fine-scale domains in the studied anisotropic sintered Nd–Fe–B-based magnets was in a wide range of 20–250 nm. The MFM method detects the stray field of the specimen and consequently is insensitive to the in-plane component of magnetization.
(In the case of specimens with in-plane magnetization, the only sources of the stray field are the domain walls and surface defects [Grütter et al., 1992; Hartmann, 1999].) However, for the studied anisotropic sintered Nd–Fe–B-based magnets, the hard magnetic phase (from which these magnets are mainly composed) has a strong magnetocrystalline anisotropy along the alignment axis, resulting in the large Q-values (Q > 4; see Table 1). For this reason, it is unlikely that the fine-scale domains had an in-plane magnetization component. Thus, the fine-scale domains are found to be magnetized perpendicular to the surface (Szmaja, 2006). Their occurrence is certainly associated with the reduction of the magnetostatic energy close to the specimen surface. Such a reduction is not taken into account in the theory (Goodenough, 1956; Kaczér, 1964; Hubert, 1967b; Szymczak, 1968, 1973). The fine domains are located at different depths below the specimen surface (generally not deeper than ∼100 nm) and are expected to extend over a small depth into the specimen, as explained further in the text. It should also be noted that
the fine-scale domains were not seen in images taken with the colloid-SEM technique. Nevertheless, this is as expected, due to insufficient spatial resolution of the mentioned method. In fact, with this method we were able to resolve magnetic domains of 0.3 µm in size (Szmaja, 2000), a value close only to the upper limit (250 nm) of the width range of the fine domains observed by MFM. Šimšová et al. (1991) stated that the spatial resolution of the colloid-SEM method is approximately 0.1 µm; however, it must be remembered, as mentioned previously, that the spatial resolution of this method is not limited by the resolution of the SEM but mainly by the quality of the colloids and of the specimen surface (Šimšová et al., 1991; Sakurai et al., 1994). In the author's opinion, clustering of the colloidal magnetic particles is at present a decisive factor limiting the spatial resolution of the colloid-SEM technique. Figure 32a shows an MFM image of the domain structure of an anisotropic sintered SmCo5 magnet on the surface perpendicular to the alignment axis; the corresponding AFM image is shown in Figure 32b (Szmaja et al., 2004c). In Figure 32a, the magnetic structure consists of the main domains, forming a maze structure on the scale of a few micrometers, within which are reverse spikes displayed approximately as circles. The main (maze) domains are visible in the image because the magnet is composed of grains of only ∼20 µm in size. The domain pattern resembles that observed for anisotropic sintered Nd–Fe–B-based magnets (see Figures 24b, 28a, and 29a). As in the case of the MFM images shown in Figures 28a and 29a, the main domains are all displayed at nearly the same gray shade, and the domain walls separating them are in contrast, which means that the tip was remagnetized by the stray fields of these domains each time it passed from one domain to a neighboring domain (i.e., at the domain walls).
This finding agrees with expectations because near the surface the stray fields of the main domains are much larger than the coercivity of the tip. The reverse spike domains are substantially displayed as white, indicating that their magnetization is opposite to that of the domains within which they lie (Szmaja et al., 2004c). Note also that the reverse spike domains are imaged as unsharp. Comparison of Figure 32a with Figures 24b, 28a, and 29a shows that the main domains in anisotropic sintered SmCo5 magnets are considerably larger (about three times) than those observed in anisotropic sintered Nd–Fe–B-based magnets. (In addition, the reverse spike domains are noticeably larger in the former magnets than those in the latter magnets; they were typically 1–2 µm and 0.5–1 µm in diameter, respectively.) The differences in the domain structure of the two systems of magnets must be attributed to their different values of the relative magnetic anisotropy as well as to their different average grain sizes. In the case of the anisotropic sintered Nd–Fe–B-based magnets, 4 < Q < 10 (see Table 1), and the average grain size L ≈ 10 µm,
whereas for the SmCo5 magnets, Q ≈ 26 and L ≈ 20 µm. Also of note, almost no correlation between the magnetic domain structure and the surface topography was observed for anisotropic sintered SmCo5 magnets, except for rare cases in which only a weak correlation could be noticed. This is generally as expected because these magnets, similar to anisotropic sintered Nd–Fe–B-based magnets, are nucleation-type magnets, in which the domain walls are easily moved within the grains at relatively low external magnetic fields, leading to steep initial magnetization curves (Livingston, 1973; Kumar, 1988). A detailed investigation of the magnetic structure of anisotropic sintered SmCo5 magnets by MFM reveals the presence of a complicated system of fine-scale domains in the shape of curved stripes on the surface perpendicular to the alignment axis. This fine surface domain structure is shown in the high-resolution MFM image of Figure 33a, and Figure 33b shows the corresponding AFM image (Szmaja et al., 2004c). The fine-scale structure can already be noticed in the MFM image of Figure 32a, especially within the main domain at the right edge of the image. In general, the widths of the fine domains varied in a wide range of 10–200 nm. The fine domain structure was observed on various places of the surface and on different specimens of the SmCo5 magnets. No correlation was found between a given MFM image with the fine-scale domain structure and the corresponding surface topography image; moreover, no change of the MFM image with changing scan direction was observed. This, in turn, indicates that the fine structure is not an image artifact, but really exists and is of magnetic origin (Szmaja et al., 2004c). The fine-scale domain structure resembles that observed in anisotropic sintered Nd–Fe–B-based magnets, as mentioned earlier.
SmCo5 has a very high magnetocrystalline anisotropy along the c axis, resulting in the very large value of the relative magnetic anisotropy Q ≈ 26 for the studied SmCo5 magnets (see Table 1). For this reason, it is very unlikely that the fine-scale domains had a component of magnetization parallel to the specimen surface. This statement is supported convincingly by the fact that the investigation performed by SEMPA, which is a very surface-sensitive technique with a probing depth of ∼1 nm, was not able to detect any in-plane domain structure in polycrystalline SmCo5 specimens on the surface perpendicular to the alignment axis (Unguris et al., 1989). Accordingly, the fine-scale domains are magnetized perpendicular to the surface. Their presence is certainly related to the reduction of the magnetostatic energy close to the specimen surface. It is difficult to make exact statements about the position of the domains below the specimen surface and how deep they extend into the specimen because MFM senses the stray fields above the specimen surface. Nevertheless, some information on this, concerning the fine-scale domains, can be derived from the recorded MFM images and general considerations on magnetic domain formation, as presented below.
The fine domains, as opposed to the large main domains, are imaged at different gray shades. This means that the magnetostatic interaction between them and the tip was nonperturbing. However, because the fine domains are all magnetized perpendicular to the surface, this also means that they are located at different depths below the specimen surface as a consequence of the fact that the domains lying deeper in the specimen produce weaker stray fields and thus different MFM signals than those located closer to the surface (accordingly, they do not cross). The described effect is seen in Figure 33a, and in Figures 30a and 31a for anisotropic sintered Nd–Fe–B-based magnets. The fine-scale domains lie close to the specimen surface, generally not deeper than ∼100 nm (being of the order of their maximum width and of the distance over which their stray fields extend above the surface), because otherwise they could not be imaged so sharply, as shown in the MFM images. Moreover, these domains are generally expected to extend over a small depth into the specimen because otherwise a decrease in the magnetostatic energy would be smaller than the corresponding expense in the domain wall energy (Szmaja et al., 2004c). The spatial resolution of the MFM method is usually believed to be determined by the tip-specimen spacing or the tip radius, whichever is greater (Grütter et al., 1992). Nevertheless, our investigations and those presented by Al-Khafaji et al. (1998) indicate that this statement is not true. In fact, with a tip-specimen distance of 50 nm, the fine domains on a scale considerably smaller than the tip radius of ∼80 nm could clearly be observed (Figure 33a). The smallest resolvable domain had a width of 12 nm, as evidenced by analysis of the line scan of the image signal across the domain structure presented in Figure 33c (Szmaja et al., 2004c).
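The loss of contrast with increasing lift height can be rationalized with a simple first-harmonic model: above a periodic stripe pattern of period p (twice the domain width), each Fourier component of the stray field decays as exp(-2πz/p). The sketch below uses the ∼200 nm fine-domain width quoted in the text; the model itself is a textbook approximation, not the authors' analysis.

```python
import math

# Order-of-magnitude model of MFM signal attenuation with lift height z:
# the first Fourier harmonic of the stray field above a periodic stripe
# pattern of period p decays as exp(-2*pi*z/p).

def field_attenuation(z_nm, domain_width_nm):
    p = 2.0 * domain_width_nm          # stripe period = 2 x domain width
    return math.exp(-2.0 * math.pi * z_nm / p)

w = 200.0  # fine-domain width (nm), upper end of the observed range
for z in (50.0, 150.0, 300.0):
    print(f"z = {z:3.0f} nm: relative signal {field_attenuation(z, w):.2f}")
```

For 200-nm domains the relative signal drops from about 0.46 at a 50-nm spacing to about 0.09 at 150 nm and below 0.01 at 300 nm, consistent with the observation that the fine domains remained resolvable only up to a tip-specimen spacing of about 150 nm.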
The spatial resolution and image contrast decreased with increasing tip-specimen distance, as expected, but these fine domains could be resolved up to the tip-specimen spacing of 150 nm. The results clearly show that MFM in the dynamic mode of operation possesses high spatial resolution and high surface sensitivity, much better than other methods of domain observation that rely on the specimen stray fields, such as the type I magnetic contrast technique of SEM, the conventional Bitter pattern technique, or the colloid-SEM method (Szmaja et al., 2004c). In recent years, nanocomposite exchange-coupled magnets consisting of a fine mixture of magnetically hard (to provide high coercivity) and soft (to provide high magnetization) phases have attracted much attention for potential permanent magnet development. The exchange coupling causes the magnetization of the soft phase to align with that of the hard phase, leading to enhanced remanent magnetization in a crystallographically isotropic structure along with a reasonable coercivity. Because of this, a higher-energy product would be expected in exchange-coupled magnets compared with the conventional isotropic uncoupled magnets. Nanocomposite hard magnetic
material is also of commercial interest because of its use in bonded and hot deformed permanent magnets (Hadjipanayis, 1999). Moreover, it is generally accepted that major improvements in nanocomposite magnets are possible and accessible (Bader, 2002). Figure 34a presents a high-resolution MFM image of a nanocomposite Nd2Fe14B/Fe3B permanent magnet ribbon; the corresponding AFM image of the surface topography is shown in Figure 34b. The AFM image reveals the grain structure of the ribbon, on a scale of ∼30 nm, although with some difficulty (i.e., individual grains are not sharply delineated, and in some regions they cannot be resolved). The most likely reason for this is the polishing process of the ribbon. The magnetic structure exhibits domains a few hundred nanometers in size. The domain sizes are considerably larger than the grain size of the ribbon. This indicates that the domains are clusters composed of many grains with exchange interaction between the grains. For this reason, these domains are commonly referred to as interaction domains. However, within these domains a fine-scale image contrast (on the scale of ∼30 nm) can be seen, indicating that the magnetization directions of individual grains are not precisely the same but differ slightly. As already mentioned, MFM detects the stray field of the specimen and consequently is insensitive to the in-plane component of magnetization. Because the domains are imaged as bright and dark, they have different magnetization components perpendicular to the specimen surface. On the other hand, observations by the Foucault mode of TEM, which substantially senses the in-plane component of magnetization, clearly show magnetic domains displayed as bright and dark regions (Gibbs et al., 1995; Shindo et al., 2003). As a result, it appears that in general there are both perpendicular and in-plane magnetization components at the ribbon surface.
In addition, this statement is consistent with the fact that the ribbon is composed of randomly oriented grains of the hard magnetic phase. The magnetic domain structure of thick cobalt crystals has been reported in many papers, mainly by the conventional Bitter pattern technique and magneto-optic Kerr microscopy. Nevertheless, as far as the author is aware, it was investigated by the MFM method only by Rave et al. (1998), Ding et al. (2001), and Szmaja et al. (2004d). Figure 35a shows an MFM image of the domain structure on the (0001) plane of a bulk cobalt monocrystal, and the corresponding image of the surface topography is shown in Figure 35b. In general, the magnetic microstructure of bulk cobalt monocrystals was not affected by the surface topography (see Figures 35a and 35b). In the MFM image of Figure 35a, the contrast is observed between neighboring domains. The image reveals the surface domain structure in the shape of flowers, stars, and circles, while the main domains, extending through the entire crystal thickness, are invisible (Szmaja et al., 2004d). This occurs because the MFM technique senses the stray field close to the specimen surface (the MFM image
DEVELOPMENTS IN THE IMAGING OF MAGNETIC DOMAINS
239
of Figure 35a was recorded with a tip-specimen spacing of 100 nm), that is, it is a surface-sensitive method. In this respect, MFM is similar to the Bitter pattern technique and in contrast to the type I magnetic contrast method of SEM (see Sections III and IV and Figure 4). MFM images of the basal surface of bulk cobalt monocrystals resemble domain patterns recorded by the Bitter colloid method (as can be seen by comparison of Figure 35a with Figures 14c, 16a, and 23), as well as those observed with the polar magneto-optic Kerr effect (Hubert, 1967a; Rave et al., 1998). This is as expected because the MFM and Bitter pattern techniques probe the stray field of the specimen, and the polar magneto-optic Kerr effect detects the perpendicular magnetization component that generates the surface charge (or stray field) distribution. As mentioned previously, the domain structure on the (0001) surface of bulk cobalt monocrystals is very complex. The quantitative data obtained for the first time by SEMPA (Unguris et al., 1989) and then by magneto-optic Kerr microscopy (Hubert et al., 1990) show that at the uppermost part of the surface zone multiphase domain branching takes place with both in-plane and perpendicular magnetization components, whereas the remaining, deeper part of the surface zone is dominated by a two-phase branching process with the magnetization perpendicular to the specimen surface. The surface domains in the MFM image of Figure 35a appear substantially unsharp. On the one hand, the reason for this effect may be associated with the fact that the local maxima of the stray field near the basal surface of a bulk cobalt monocrystal are not clearly marked, which is known from investigations performed by the conventional Bitter pattern technique (see Figure 14a).
On the other hand, the observed effect may be due to partial remagnetization of the conventional thin-film tips used (tips of moderate coercivity with a spatially distributed magnetic volume) by the specific configuration of the stray field around the surface domains, as discussed in more detail earlier in the text. The MFM results presented by Ding et al. (2001) and Szmaja et al. (2004a, 2004d) appear to indicate that both reasons contribute to the unsharp imaging of surface domains. Nevertheless, the problem can be overcome by the use of the DIP system (Szmaja et al., 2004d). After application of a procedure for sharpness enhancement to the MFM image of Figure 35a, a better-quality image, shown in Figure 35c, was obtained. As mentioned earlier and as expected, MFM images are dependent on the tip-specimen spacing. The dependence is illustrated in Figure 36, which presents images taken of an anisotropic sintered Nd–Fe–B magnet on the surface perpendicular to the alignment axis (Szmaja et al., 2004a). The MFM image of Figure 36a, recorded at a tip-specimen distance of 90 nm, reveals the magnetic domain structure most sharply. Note, however, that some
topographic features (displayed in black) are present in this image; they can easily be identified when the image is compared with the AFM image of Figure 36d. With increasing tip-specimen separation, the spatial resolution in the recorded MFM images deteriorated, and consequently the surface reverse spikes were less visible (Figure 36b) and finally became invisible (Figure 36c). All of the objects in Figures 36b and 36c are of magnetic origin, but there are also some streaks visible in the scanning direction (from left to right). In Figure 36c, taken at a tip-specimen distance of 800 nm, virtually only the main domains (which extend through the entire grain thickness and form a maze pattern near the surface) are resolved. Also note that the image contrast decreased as the tip-specimen spacing was increased. However, this effect is not seen in Figures 36a–c because the images are shown after application of a simple digital procedure for contrast enhancement. The observed dependence of MFM images on the tip-specimen distance can easily be understood when the configuration of the stray field above the specimen surface and the magnetostatic interaction between the specimen and the tip placed at a constant height above the specimen surface are taken into account. The specimen stray field is highly inhomogeneous. Nevertheless, as mentioned, there is a general property that the distance over which the stray field from a magnetic domain extends above the specimen surface is, to a good approximation, equal to the domain size (see Figure 4). As a consequence, the stray field produced by reverse spike domains extends over a smaller distance than that resulting from larger main domains. The magnetostatic interaction between the specimen and the tip is long-ranged, with a power law fall-off (Grütter et al., 1992).
The mentioned features cause the spatial resolution and contrast of MFM images to decrease with increasing tip-specimen separation; consequently, information on small magnetic objects is lost when the tip-specimen separation is correspondingly large. In other words, to improve the spatial resolution and image contrast of MFM, the tip must be placed sufficiently close to the specimen surface to detect the high spatial frequency components of the stray field (i.e., the fine magnetic objects) with good signal-to-noise performance (Szmaja et al., 2004a).
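The distance dependence described above follows from a standard magnetostatics result: above a periodic domain pattern, a stray-field component of spatial period d decays with height z approximately as exp(-2πz/d). A minimal numerical sketch (the domain sizes below are illustrative values, not measurements from this chapter):

```python
import numpy as np

# Attenuation of a stray-field harmonic of spatial period `period_um`
# at height `height_um` above the surface: exp(-2*pi*z/d).
def attenuation(period_um, height_um):
    return np.exp(-2.0 * np.pi * height_um / period_um)

# Illustrative domain sizes: ~1 um reverse spikes vs. ~10 um main domains.
for z in (0.09, 0.3, 0.8):  # tip-specimen spacings in micrometers
    print(f"z = {z:.2f} um: "
          f"spike field x{attenuation(1.0, z):.4f}, "
          f"main-domain field x{attenuation(10.0, z):.4f}")
```

At a spacing of 0.8 µm the field of the ∼1 µm spike domains is attenuated by more than two orders of magnitude, while that of the ∼10 µm main domains retains over half of its surface value, consistent with only the main domains remaining visible at large spacings.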
VI. CONCLUSIONS

This chapter presents recent developments in the observation of magnetic domain structures. The methods used were the type I magnetic contrast technique of SEM, the conventional Bitter pattern method, the colloid-SEM technique, and the MFM method. The magnetic images shown were of the magnetic structures of cobalt monocrystals, anisotropic sintered Nd–Fe–B-based permanent magnets of different chemical composition, nanocomposite
Nd2 Fe14 B/Fe3 B permanent magnets, anisotropic sintered SmCo5 permanent magnets, thin polycrystalline permalloy and cobalt films, and ferrimagnetic garnet specimens. Most of the presented improvements in magnetic domain imaging could be achieved by use of the DIP system. In fact, only recent rapid progress in the development of personal computers and digitizer technologies has led to the common application of DIP systems. In general, the task of the DIP system is to process images by computer in such a way that the possibilities for retrieval of the desired information are extended or improved. The categories of image restoration, image enhancement, and image analysis are usually distinguished within DIP methods (Hawkes, 1984; Reimer, 1985; Neumann et al., 1987). Image restoration means the removal or reduction of image degradation due to the aberrations of the image formation system (blurring) or the recording system (noise). Image enhancement encompasses all the methods that improve the visibility of original images, without any detailed analysis of the causes of their poor quality. The methods of image analysis provide information that cannot be obtained by a visual assessment of the images alone. DIP systems currently are widely used in many different fields of science and industry, where examination and analysis of images can help in understanding phenomena and processes. DIP methods are simultaneously powerful and flexible. They are used by a large number of researchers, engineers, and technicians for the purpose of extracting the maximum useful information stored in the images. For example, DIP techniques allow one to improve poor-quality original images; to identify, compare, classify, count, and measure image objects; to determine a number of specimen parameters; or to perform material-phase analysis. The results obtained with the DIP system are often well worth the effort and sometimes cannot be obtained any other way.
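As an illustration of the image-enhancement category, the sketch below applies a percentile-based linear contrast stretch to a synthetic low-contrast image; this is a generic example, since the chapter does not specify which algorithms the DIP system employs:

```python
import numpy as np

def stretch_contrast(image, low_pct=1.0, high_pct=99.0):
    # Linear contrast stretch: map the [low_pct, high_pct] percentile
    # range of the input onto the full [0, 1] display range.  Percentile
    # clipping makes the stretch robust to a few outlier pixels.
    lo, hi = np.percentile(image, [low_pct, high_pct])
    out = (image.astype(float) - lo) / (hi - lo)
    return np.clip(out, 0.0, 1.0)

# A weak-contrast image occupying only a narrow band of gray values...
rng = np.random.default_rng(0)
weak = 0.45 + 0.05 * rng.random((64, 64))
# ...spans (nearly) the full display range after stretching.
strong = stretch_contrast(weak)
```

The same operation is what a "simple digital procedure for contrast enhancement" amounts to in its most basic form; real DIP systems typically offer nonlinear variants (histogram equalization, gamma correction) as well.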
In particular, DIP is of great significance for restoration, enhancement, and analysis of magnetic domain images. Magnetic domain imaging methods become more powerful by the addition of the DIP system. The visibility limit in domain observation can be expanded by an order of magnitude. DIP techniques allow one to obtain high-quality domain images (with high contrast, good sharpness, and low noise level) in cases of weak magnetic contrast. This was demonstrated by improving poor-quality original images of sufficiently thin cobalt monocrystals of basal plane orientation recorded with the type I magnetic contrast method of SEM, of bulk cobalt monocrystals of basal plane orientation, as well as of thin polycrystalline films of soft magnetic materials (permalloy and cobalt) taken with the conventional Bitter pattern technique. As a result, images that at first seemed featureless came alive with detail.
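A sharpness improvement of the kind mentioned above can be sketched with unsharp masking, shown here purely as one common digital sharpening technique; the chapter does not name the exact procedure used:

```python
import numpy as np

def unsharp_mask(profile, sigma=2.0, amount=1.0):
    # Sharpen a 1D intensity profile by adding back the difference
    # between the profile and a Gaussian-blurred copy of it.
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(profile, radius, mode="edge")  # avoid boundary dips
    blurred = np.convolve(padded, kernel, mode="valid")
    return profile + amount * (profile - blurred)

# A soft (unsharp) domain-wall edge becomes visibly steeper.
x = np.arange(64.0)
soft_edge = 1.0 / (1.0 + np.exp(-(x - 32.0) / 4.0))
sharp_edge = unsharp_mask(soft_edge, sigma=4.0, amount=1.5)
```

The `amount` parameter controls how strongly the high-frequency detail is amplified; too large a value also amplifies noise, which is why sharpening is usually combined with noise reduction.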
DIP not only allows restoration of information distorted or almost lost in the image formation and recording processes, but also analysis of the contents of a particular image and combination of the information provided by images recorded at different operating conditions (for different values of an external parameter, e.g., temperature, magnetic field, or stress) and/or with various detectors or signal generation modes with which instruments are equipped, so as to produce a directly interpretable image with maximum information. Of great importance is that digital methods of image analysis allow one to carry out a reproducible sequence of operations that is largely free of the subjective point of view of the researcher. In this context, DIP techniques certainly allow determination of the magnetic domain widths of complex domain configurations in a more precise and objective manner than by visual measurements. We used this possibility for complicated or comparatively complicated domain structures observed in thicker cobalt monocrystals on the basal surface (also at different temperatures) and in anisotropic sintered Nd–Fe–B-based permanent magnets (of different chemical composition) on the surface perpendicular to the alignment axis. By combining image alignment with image subtraction, it is possible to obtain images with magnetic information only (i.e., without information of another type). This in turn allows direct assessment of whether the magnetic domain structure is influenced by the surface morphology. The separation of magnetic contrast from undesirable nonmagnetic features in the image was demonstrated for the type I magnetic contrast technique of SEM and the conventional Bitter pattern method, using images of bulk cobalt monocrystals of basal plane orientation and ferrimagnetic garnet specimens, respectively.
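The image alignment and subtraction step can be sketched as follows; the example uses FFT cross-correlation on synthetic data with an integer-pixel drift, which is an assumption made for illustration, not the DIP system's documented algorithm:

```python
import numpy as np

def align_offset(ref, moved):
    # Integer (row, col) shift that best aligns `moved` to `ref`,
    # found at the peak of the circular FFT cross-correlation.
    corr = np.fft.ifft2(np.fft.fft2(ref) * np.conj(np.fft.fft2(moved))).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    return tuple(p if p <= s // 2 else p - s for p, s in zip(peak, corr.shape))

rng = np.random.default_rng(1)
topo = rng.random((32, 32))                  # nonmagnetic (topographic) detail
magnetic = np.zeros((32, 32))
magnetic[:, 16:] = 0.3                       # synthetic domain contrast
img_a = topo + magnetic                      # image with magnetic contrast
img_b = np.roll(topo, (2, 3), axis=(0, 1))   # same area, drifted, no contrast

shift = align_offset(img_a, img_b)
aligned_b = np.roll(img_b, shift, axis=(0, 1))
difference = img_a - aligned_b               # magnetic contrast only
```

After alignment, the subtraction cancels the common topographic detail (here exactly, because the drift is an integer number of pixels and the roll is circular), leaving only the synthetic domain contrast.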
Using the image difference technique, the character of changes (reversible or irreversible) in the domain structure during a cycle determined by an external parameter (e.g., magnetizing, temperature, or straining cycle) can be derived with great reliability. This was shown for images of the domain structure on the basal surface of a bulk cobalt monocrystal, recorded with the conventional Bitter pattern method at the beginning and at the end of the magnetizing cycle. The great advantages resulting from the application of the DIP system for image restoration, enhancement, and analysis have afforded detailed studies of the domain behavior in the cases of low magnetic contrast and/or (comparatively) complicated domain structures that were not possible before, as convincingly shown here. The difficulties related to the investigation of the magnetic structure of soft magnetic materials and on the basal surface of bulk cobalt monocrystals by the conventional Bitter pattern technique in zero or weak external magnetic fields, reported in the past, were overcome, and improvements over previous results could be achieved. The type I magnetic contrast method of SEM allowed detailed study of the temperature and
thickness dependencies of the domain structure on the basal surface of cobalt monocrystals—any change in the width of the main domains could be detected with great accuracy. By means of the colloid-SEM technique and MFM, a close examination was made of the domain structure of anisotropic sintered Nd–Fe–B-based permanent magnets (with different chemical composition) on the surface perpendicular to the alignment axis—the main and surface domain widths were precisely determined. Despite the great advantages of the DIP system, a word of caution is necessary. The digital processing that is done can rearrange the information recorded in the original image so that a researcher can more readily interpret it, but the DIP system cannot add to this information. In other words, if the feature of interest is not present in the image signal, then no amount of digital processing will be able to detect it. For this reason, not only the setup of the DIP system needs attention, but also the preceding stages of optimum specimen preparation and optimum experimental operating conditions. In this context, the old adage “garbage in, garbage out” is particularly appropriate. The type I magnetic contrast method of SEM has a large probing distance above the specimen surface, of some tens of micrometers or more for thick specimens. Because of this, the main domains are visible in images. The large probing distance, on the other hand, is connected with a low surface sensitivity. The method is well suited to the investigation of domain structures of specimens that possess sufficiently large domains and correspondingly strong stray fields above the specimen surface. The greatest disadvantage of the type I magnetic contrast technique is its relatively low spatial resolution of approximately 1 µm, which at present is limited by the signal-to-noise ratio. The spatial resolution can be improved to ∼0.3 µm by use of SEMs with high-brightness electron guns.
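An objective domain-width determination of the sort described can be sketched by counting the sign changes of a mean-subtracted line profile taken across the domain pattern; this is an illustrative approach, as the chapter does not give the exact measurement procedure:

```python
import numpy as np

def mean_domain_width(profile, pixel_size_nm):
    # Each sign change of the mean-subtracted profile marks a domain
    # wall; the mean spacing between walls estimates the domain width.
    signal = np.asarray(profile, dtype=float)
    signal = signal - signal.mean()
    walls = np.nonzero(np.diff(np.signbit(signal).astype(int)))[0]
    if len(walls) < 2:
        raise ValueError("profile contains fewer than two domain walls")
    return np.diff(walls).mean() * pixel_size_nm

# Synthetic profile: alternating bright/dark domains, 20 pixels wide each,
# at an assumed pixel size of 10 nm (both values are illustrative).
profile = np.tile(np.r_[np.ones(20), -np.ones(20)], 8)
width = mean_domain_width(profile, pixel_size_nm=10.0)
```

Averaging over many such line profiles, or over the wall spacings of a whole image, is what makes the digital measurement reproducible and less dependent on the researcher's judgment than visual estimates.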
The conventional Bitter pattern method has a typical spatial resolution of ∼0.5 µm and probes the stray field of the specimen essentially up to a distance of ∼0.5–1 µm from the specimen surface. In general, this method is useful for studying domain structures of materials with uniaxial or multiaxial magnetic anisotropy. However, application of the conventional Bitter pattern technique to investigate complex or relatively complex domain structures with domains comparable in size to the spatial resolution of the technique, such as the domain structure of anisotropic sintered Nd–Fe–B-based permanent magnets on the surface perpendicular to the alignment axis, is, in fact, dangerous and may lead to incorrect results. The comparison made shows that the colloid-SEM method offers not only a higher spatial resolution, but also better surface sensitivity than the conventional Bitter pattern technique. The spatial resolution of the colloid-SEM technique (∼0.1 µm) appears to be limited mainly by clustering of the colloidal magnetic particles.
MFM is a powerful technique for investigating domain structures. In the dynamic mode of operation, it possesses high spatial resolution and high surface sensitivity. In this respect, the MFM method offers clear advantages over other techniques of domain observation that rely on the specimen stray fields, such as the type I magnetic contrast method of SEM, the conventional Bitter pattern technique, or the colloid-SEM method. Thanks to the mentioned advantages, MFM allowed detection of previously unobserved fine surface domains in the form of curved stripes, superimposed on the coarse maze domains and reverse spike domains, present in anisotropic sintered Nd–Fe–B-based and SmCo5 permanent magnets on the surface perpendicular to the alignment axis. The smallest domain resolved by us using the MFM method was only 12 nm wide. With the future common use of advanced probes with artificially confined active probe volume, further improvements in MFM imaging are expected.
ACKNOWLEDGMENTS

I thank Dr. Ken Makita of NEOMAX Co., Ltd., Osaka (Japan), for providing anisotropic sintered Nd–Fe–B-based permanent magnets [Nd14.5 Fe79 B6.5, (Nd0.85 Dy0.15)14.5 Fe79 B6.5 and (Nd0.7 Dy0.3)14.5 Fe79 B6.5] and the data on their magnetic properties; Dr. Werner Rodewald and Dr. Matthias Katter of Vacuumschmelze GmbH & Co. KG, Hanau (Germany), for providing anisotropic sintered Nd–Fe–B-based permanent magnets (Nd13.7 Dy0.2 Fe79.7 TM0.4 B6, where TM: Al, Ga, Co, Cu) and anisotropic sintered SmCo5 permanent magnets and the data on their magnetic properties; Mr. Yasutaka Shigemoto and Dr. Satoshi Hirosawa of NEOMAX Co., Ltd., Osaka (Japan), for preparing and providing nanocomposite Nd2 Fe14 B/Fe3 B permanent magnets and providing the data on their magnetic properties; Dr. Jarosław Grobelny and Dr. Michał Cichomski for their help in MFM observations; and M.Sc. Józef Balcerski for preparing thin polycrystalline permalloy and cobalt films.
REFERENCES

Abelmann, L., Lodder, C. (1997). Oblique evaporation and surface diffusion. Thin Solid Films 305, 1. Alameda, J.M., Carmona, F., Salas, F.H., Alvarez-Prado, L.M., Morales, R., Pérez, G.T. (1996). Effects of the initial stages of film growth on the magnetic anisotropy of obliquely-deposited cobalt thin films. J. Magn. Magn. Mater. 154, 249.
Al-Khafaji, M.A., Rainforth, W.M., Gibbs, M.R.J., Bishop, J.E.L., Davies, H.A. (1996). The effect of tip type and scan height on magnetic domain images obtained by MFM. IEEE Trans. Magn. 32, 4138. Al-Khafaji, M.A., Rainforth, W.M., Gibbs, M.R.J., Bishop, J.E.L., Davies, H.A. (1998). The origin and interpretation of fine scale magnetic contrast in magnetic force microscopy: A study using single-crystal NdFeB and a range of magnetic force microscopy tips. J. Appl. Phys. 83, 6411. Atkinson, R., Jones, G.A. (1976). Domain observation in MnBi films with the scanning electron microscope. J. Phys. D: Appl. Phys. 9, L131. Babcock, K.L., Elings, V.B., Shi, J., Awschalom, D.D., Dugas, M. (1996). Field-dependence of microscopic probes in magnetic force microscopy. Appl. Phys. Lett. 69, 705. Bader, S.D. (2002). Preface to the viewpoint set on: Nanostructured permanent magnets. Scripta Mater. 47, 527. Balk, L., Davies, D., Kultscher, N. (1984). Investigation of Si–Fe transformer sheets by scanning electron acoustic microscopy (SEAM). IEEE Trans. Magn. 20, 1466. Banbury, J.R., Nixon, W.C. (1967). The direct observation of domain structure and magnetic fields in the scanning electron microscope. J. Sci. Instrum. 44, 889. Barnier, Y., Pauthenet, R., Rimet, G. (1961). Variations thermiques des constantes d’anisotropie et de l’aimantation spontanée du cobalt dans la phase hexagonale. C. R. Acad. Sci. 252, 2839 [in French]. Bauer, E. (1994). Low energy electron microscopy. Rep. Prog. Phys. 57, 895. Bertrand, P., Conin, L., Hermann, C., Lampel, G., Peretti, J., Safarov, V.I. (1998). Imaging of magnetic domains with scanning tunneling optical microscopy. J. Appl. Phys. 83, 6834. Betzig, E., Trautman, J.K., Wolfe, R., Gyorgy, E.M., Finn, P.L., Kryder, M.H., Chang, C.-H. (1992). Near-field magneto-optics and high density data storage. Appl. Phys. Lett. 61, 142. Bitter, F. (1931). On inhomogeneities in the magnetization of ferromagnetic materials. Phys. Rev. 38, 1903. 
Bode, M., Getzlaff, M., Wiesendanger, R. (1998). Spin-polarized vacuum tunneling into the exchange-split surface state of Gd(0001). Phys. Rev. Lett. 81, 4256. Bodenberger, R., Hubert, A. (1977). Zur Bestimmung der Blochwandenergie von einachsigen Ferromagneten. Phys. Stat. Sol. (a) 44, K7 [in German]. Brown, D., Ma, B.M., Chen, Z. (2002). Developments in the processing and properties of NdFeB-type permanent magnets. J. Magn. Magn. Mater. 248, 432. Carey, R., Isaac, E.D. (1966). Magnetic Domains and Techniques for their Observation. Academic Press, New York.
Celotta, R.J., Unguris, J., Kelley, M.H., Pierce, D.T. (2000). Techniques to measure magnetic domain structures. In: Methods in Materials Research: A Current Protocols Publication. Wiley, New York, pp. 6b.3.1–6b.3.15. Chang, A.M., Hallen, H.D., Harriott, L., Hess, H.F., Kao, H.L., Kwo, J., Miller, R.E., Wolfe, R., Van der Ziel, J., Chang, T.Y. (1992). Scanning Hall probe microscopy. Appl. Phys. Lett. 61, 1974. Chapman, J.N. (1984). The investigation of magnetic domain structures in thin foils by electron microscopy. J. Phys. D: Appl. Phys. 17, 623. Chapman, J.N., Johnston, A.B., Heyderman, L.J., McVitie, S., Nicholson, W.A.P., Bormans, B. (1994). Coherent magnetic imaging by TEM. IEEE Trans. Magn. 30, 4479. Chapman, J.N., McFadyen, I.R., McVitie, S. (1990). Modified differential phase contrast Lorentz microscopy for improved imaging of magnetic structures. IEEE Trans. Magn. 26, 1506. Chim, W.K. (1994). An analytical model for scanning electron microscope type I magnetic contrast with energy filtering. Rev. Sci. Instrum. 65, 374. Chim, W.K. (1995). An improved energy-filtering model for scanning electron microscope type 1 magnetic contrast incorporating the effects of external electric fields. J. Phys. D: Appl. Phys. 28, 1649. Chim, W.K. (1997). An analytical model for type I magnetic contrast enhancement with sample tilting. J. Appl. Phys. 82, 4143. Chim, W.K., Chan, D.S.H., Phang, J.C.H., Low, T.S., Thirumalai, S. (1993). An energy dependent model for type I magnetic contrast in the scanning electron microscope. Scanning Microsc. 7, 533. Chizhik, A.B., Davidenko, I.I., Maziewski, A., Stupakiewicz, A. (1998). High-temperature photomagnetism in Co-doped yttrium iron garnet films. Phys. Rev. B 57, 14366. Chung, M.S., Everhart, T.E. (1974). Simple calculation of energy distribution of low-energy secondary electrons emitted from metals under electron bombardment. J. Appl. Phys. 45, 707. Coey, J.M.D. (2002). Permanent magnet applications. J. Magn. Magn. Mater. 248, 441. 
Cort, D.M., Steeds, J.W. (1972). Some experiments using Kossel lines to study the magnetic domain structure in poly-crystalline cobalt. Phys. Stat. Sol. (a) 10, 215. Cowburn, R.P. (2000). The attractions of magnetism for nanoscale data storage. Phil. Trans. R. Soc. A 358, 281. Craik, D.J. (1974). The observation of magnetic domains. In: Coleman, R.V. (Ed.), Methods of Experimental Physics, vol. 11. Academic Press, New York, pp. 675–743. Craik, D.J., Tebble, R.S. (1961). Magnetic domains. Rep. Prog. Phys. 24, 116.
Daykin, A.C., Petford-Long, A.K. (1995). Quantitative mapping of the magnetic induction distribution using Foucault images formed in a transmission electron microscope. Ultramicroscopy 58, 365. Dillon, J.F., Jr. (1958). Observation of domains in the ferrimagnetic garnets by transmitted light. J. Appl. Phys. 29, 1286. Ding, H.F., Wulfhekel, W., Chen, C., Barthel, J., Kirschner, J. (2001). A combined magnetic force and spin-polarized scanning tunneling microscopy study of the closure domain pattern of Co(0001). Mater. Sci. Eng. B 84, 96. Dorsey, J.R. (1966). 1st Nat. Conf. Electron Probe Microanalysis. College Park, Maryland, unpublished. Dunk, P., Jones, G.A., Sukiennicki, A. (1975). Magnetic contrast in the scanning electron microscope. IEEE Trans. Magn. 11, 1394. Eisebitt, S., Lüning, J., Schlotter, W.F., Lörgen, M., Hellwig, O., Eberhardt, W., Stöhr, J. (2004). Lensless imaging of magnetic nanostructures by X-ray spectro-holography. Nature 432, 885. Fathers, D.J., Jakubovics, J.P. (1977). Methods of observing magnetic domains by scanning electron microscopy. Physica B 86–88, 1343. Fathers, D.J., Joy, D.C., Jakubovics, J.P. (1973a). Magnetic contrast in the SEM. In: Johari, O., Corvin, I. (Eds.), Proc. 6th SEM Symp. Illinois Institute of Research Technology, Chicago, p. 259. Fathers, D.J., Joy, D.C., Jakubovics, J.P. (1973b). Domain studies in uniaxial materials in the scanning electron microscope. In: Proc. Internat. Conf. Magnetism, vol. IV, Moscow, p. 197. Fidler, J., Kirchmayr, H., Skalicky, P. (1977). Verbesserung des Kontrastes von magnetischen Domänen in SmCo5 im Rasterelektronenmikroskopischen Sekundärelektronenbild. Phil. Mag. 35, 1125 [in German]. Fischer, P., Eimüller, T., Schütz, G., Schmahl, G., Guttmann, P., Pruegl, K., Bayreuther, G. (1998). Imaging of magnetic domains by transmission X-ray microscopy. J. Phys. D: Appl. Phys. 31, 649. Folks, L., Woodward, R.C. (1998). 
The use of MFM for investigating domain structures in modern permanent magnet materials. J. Magn. Magn. Mater. 190, 28. Gemperle, R., Gemperle, A. (1968). Dependence of the ferromagnetic domain width on the thickness of cobalt foils. Phys. Stat. Sol. 26, 207. Gemperle, R., Gemperle, A., Bursuc, I. (1963). Thickness dependence of domain structure in cobalt. Phys. Stat. Sol. 3, 2101. Gibbs, M.R.J., Al-Khafaji, M.A., Rainforth, W.M., Davies, H.A., Babcock, K., Chapman, J.N., Heyderman, L.J. (1995). A comparison of domain images obtained for nanophase alloys by magnetic force microscopy and high resolution Lorentz electron microscopy. IEEE Trans. Magn. 31, 3349. Goodenough, J.B. (1956). Interpretation of domain patterns recently found in BiMn and SiFe alloys. Phys. Rev. 102, 356.
Goto, K., Sakurai, T. (1977). A colloid-SEM method for the study of fine magnetic domain structures. Appl. Phys. Lett. 30, 355. Grundy, P.J., Johnson, B. (1969). The transition thickness for uniform magnetization in thin cobalt crystals. Br. J. Appl. Phys. (J. Phys. D) 2, 1279. Grütter, P., Mamin, H.J., Rugar, D. (1992). Magnetic force microscopy (MFM). In: Wiesendanger, R., Güntherodt, H.-J. (Eds.), Scanning Tunneling Microscopy II. Springer, Berlin, pp. 151–207. Hadjipanayis, G.C. (1999). Nanophase hard magnets. J. Magn. Magn. Mater. 200, 373. Hara, K., Itoh, K., Kamiya, M., Okamoto, K., Hashimoto, T. (1996). Hysteresis loops of cobalt films deposited obliquely by sputtering. J. Magn. Magn. Mater. 161, 287. Harasko, G., Pfützner, H., Futschik, K. (1995). Domain analysis by means of magnetotactic bacteria. IEEE Trans. Magn. 31, 938. Hartmann, U. (1987). A theoretical analysis of Bitter-pattern evolution. J. Magn. Magn. Mater. 68, 298. Hartmann, U. (1999). Magnetic force microscopy. Annu. Rev. Mater. Sci. 29, 53. Hartmann, U. (2005). Scanning probe methods for magnetic imaging. In: Hopster, H., Oepen, H.P. (Eds.), Magnetic Microscopy of Nanostructures. Springer, Berlin, pp. 285–307. Hartmann, U., Mende, H.H. (1985). Observation of Bloch wall fine structures on iron whiskers by a high-resolution interference contrast technique. J. Phys. D: Appl. Phys. 18, 2285. Hawkes, P.W. (1984). Processing electron images. In: Chapman, J.N., Craven, A.J. (Eds.), Quantitative Electron Microscopy, Scottish Universities Summer School in Physics Publications, Edinburgh, pp. 351–397. Herring, C.P., Jakubovics, J.P. (1973). Observation of magnetic domain patterns in terbium and dysprosium. J. Phys. F: Met. Phys. 3, 157. Hoffmann, R., Samson, Y., Marty, A., Gehanno, V., Gilles, B., Mazille, J.E. (1999). Shape instability in out of equilibrium magnetic domains observed in ultrathin magnetic films with perpendicular anisotropy. J. Magn. Magn. Mater. 192, 409.
Howells, G.D., Oral, A., Bending, S.J., Andrews, S.R., Squire, P.T., Rice, P., de Lozanne, A., Bland, J.A.C., Kaya, I., Henini, M. (1999). Scanning Hall probe microscopy of ferromagnetic structures. J. Magn. Magn. Mater. 196– 197, 917. Hua, L., Bishop, J.E.L., Tucker, J.W. (1995). Simulation of magnetization ripples on permalloy caused by surface anisotropy. J. Magn. Magn. Mater. 140–144, 655. Hua, L., Bishop, J.E.L., Tucker, J.W. (1996). Simulation of transverse and longitudinal magnetic ripple structures induced by surface anisotropy. J. Magn. Magn. Mater. 163, 285.
Hua, S.Z., Lashmore, D.S., Swartzendruber, L.J., Egelhoff, W.F. Jr., Raj, K., Chopra, H.D. (1997). Observation of domain dynamics in giant magnetoresistive Co–Cu-based polycrystalline multilayers. J. Appl. Phys. 81, 4582. Hubert, A. (1967a). Der Einfluß der Magnetostriktion auf die magnetische Bereichsstruktur einachsiger Kristalle, insbesondere des Kobalts. Phys. Stat. Sol. 22, 709 [in German]. Hubert, A. (1967b). Zur Theorie der zweiphasigen Domänenstrukturen in Supraleitern und Ferromagneten. Phys. Stat. Sol. 24, 669 [in German]. Hubert, A., Schäfer, R. (1998). Magnetic Domains: The Analysis of Magnetic Microstructures. Springer, Berlin. Hubert, A., Schäfer, R., Rave, W. (1990). The analysis of magnetic microstructure. In: Proc. 5th Symp. Magn. Magn. Mater. (Taipei, 1989). World Scientific, Singapore, p. 25. Itoh, K., Ichikawa, F., Ishida, Y., Okamoto, K., Uchiyama, T., Iguchi, I. (2002). Columnar grain structure in cobalt films deposited obliquely by introducing oxygen during sputtering. J. Magn. Magn. Mater. 248, 112. Itoh, K., Okamoto, K., Hashimoto, T. (1998). Crystallographic contribution to the formation of the columnar grain structure in cobalt films deposited at oblique incidence. J. Magn. Magn. Mater. 190, 176. Jakubovics, J.P. (1975). Lorentz microscopy and applications (TEM and SEM). In: Ruedl, E., Valdré, U. (Eds.), Electron Microscopy in Materials Science. Commission of European Communities, Brussels, Luxembourg, pp. 1303–1403. Jakubovics, J.P. (1994). Magnetism and Magnetic Materials. The Institute of Materials, London. Johansen, H., Werner, U. (1987). Scanning electron microscopy: Electronoptical and technical fundamentals of the instrument. In: Bethge, H., Heydenreich, J. (Eds.), Electron Microscopy in Solid State Physics. Elsevier, Amsterdam, pp. 143–169. Jones, G.A. (1976). On the quality of type 1 magnetic contrast obtained in the scanning electron microscope. Phys. Stat. Sol. (a) 36, 647. Jones, G.A. (1977). 
Magnetic contrast in the scanning electron microscope with particular reference to bubble domain structures. J. Magn. Magn. Mater. 5, 305. Jones, G.A. (1978). Magnetic contrast in the scanning electron microscope: An appraisal of techniques and their applications. J. Magn. Magn. Mater. 8, 263. Jones, G.A., Puchalska, I.B. (1979). The birefringent effects of magnetic colloid applied to the study of magnetic domain structures. Phys. Stat. Sol. (a) 51, 549. Joy, D.C., Jakubovics, J.P. (1968). Direct observation of magnetic domains by scanning electron microscopy. Phil. Mag. 17, 61.
Joy, D.C., Jakubovics, J.P. (1969). Scanning electron microscope study of the magnetic domain structure of cobalt single crystals. Br. J. Appl. Phys. (J. Phys. D) 2, 1367. Kaczér, J. (1964). On the domain structure of uniaxial ferromagnets. Soviet Phys. JETP 19, 1204. Zh. Eksp. Teor. Fiz. 46, 1787 (in Russian). Kaczér, J., Gemperle, R. (1960). The thickness dependence of the domain structure of magnetoplumbite. Czech. J. Phys. B 10, 505. Kaczér, J., Gemperle, R., Hauptman, Z. (1959). Domain structure of cobalt whiskers. Czech. J. Phys. 9, 606. Kagoshima, Y., Miyahara, T., Ando, M., Wang, J., Aoki, S. (1996). Magnetic domain-specific microspectroscopy with a scanning X-ray microscope using circularly polarized undulator radiation. J. Appl. Phys. 80, 3124. Kammlott, G.W. (1971). Observation of ferromagnetic domains with the scanning electron microscope. J. Appl. Phys. 42, 5156. Kirilyuk, V., Kirilyuk, A., Rasing, T. (1997). A combined nonlinear and linear magneto-optical microscopy. Appl. Phys. Lett. 70, 2306. Kitakami, O. (1991). A convenient method for estimating the inclination of recorded magnetization using the Bitter technique. Jpn. J. Appl. Phys. 30, L739. Kittel, C. (1946). Theory of the structure of ferromagnetic domains in films and small particles. Phys. Rev. 70, 965. Kittel, C. (1949). Physical theory of ferromagnetic domains. Rev. Mod. Phys. 21, 541. Klugmann, E., Blythe, H.J., Walz, F. (1994). Investigation of thermomagnetic effects in monocrystalline cobalt near the martensitic phase transition. Phys. Stat. Sol. (a) 146, 803. Koike, K., Matsuyama, H., Hayakawa, K. (1987). Spin-polarized scanning electron microscopy for micro-magnetic structure observation. Scanning Microsc. 1 (Suppl.), 241. Koike, K., Matsuyama, H., Tseng, W.J., Li, J.C.M. (1993). Fine magnetic domain structure of stressed amorphous metal. Appl. Phys. Lett. 62, 2581. Kotera, M., Katoh, M., Suga, H. (1995). 
Observation technique of surface magnetic structure using type-I magnetic contrast in the scanning electron microscope. Jpn. J. Appl. Phys. 34, 6903. Kranz, J., Hubert, A. (1963). Die Möglichkeiten der Kerr-Technik zur Beobachtung magnetischer Bereiche. Z. Angew. Phys. 15, 220 [in German]. Kumar, K. (1988). RETM5 and RE2 TM17 permanent magnets development. J. Appl. Phys. 63, R13. Lee, S.K., Das, B.N., Harris, V.G. (1999). Magnetic structure of single crystal Tb2 Fe14 B. J. Magn. Magn. Mater. 207, 137. Leinenbach, P., Memmert, U., Schelten, J., Hartmann, U. (1999). Fabrication and characterization of advanced probes for magnetic force microscopy. Appl. Surf. Sci. 144–145, 492.
DEVELOPMENTS IN THE IMAGING OF MAGNETIC DOMAINS
251
Lewis, L.H., Wang, J.-Y., Canfield, P. (1998). Magnetic domains of singlecrystal Nd2 Fe14 B imaged by unmodified scanning electron microscopy. J. Appl. Phys. 83, 6843. Li, J., Rau, C. (2005). Spin-resolved magnetic studies of focused ion beam etched nano-sized magnetic structures. Nucl. Instrum. Methods Phys. Res. B 230, 518. Lin, H.-N., Chiou, Y.H., Chen, B.-M., Shieh, H.-P.D., Chang, C.-R. (1998). Magnetic force microscopy study of domain walls on a thin cobalt film. J. Appl. Phys. 83, 4997. Livingston, J.D. (1973). Domains in sintered Co5 Sm magnets. Phys. Stat. Sol. (a) 18, 579. Livingston, J.D. (1985). Magnetic domains in sintered Fe–Nd–B magnets. J. Appl. Phys. 57, 4137. Livingston, J.D., Morris, W.G. (1985). Magnetic domains in amorphous metal ribbons. J. Appl. Phys. 57, 3555. Lloyd, S.J., Twitchett, A.C., Blamire, M.G., Midgley, P.A. (1999). Magnetic contrast in a focused ion beam microscope. Inst. Phys. Conf. Ser. 161, 99. Mallinson, J.C. (1981). On the properties of two-dimensional dipoles and magnetized bodies. IEEE Trans. Magn. 17, 2453. Mankos, M., Scheinfein, M.R., Cowley, J.M. (1996). Electron holography and Lorentz microscopy of magnetic materials. In: Hawkes, P.W. (Ed.), Advances in Imaging and Electron Physics, vol. 98. Academic Press, San Diego, pp. 323–426. Martin, Y., Wickramasinghe, H.K. (1987). Magnetic imaging by “force microscopy” with 1000 Å resolution. Appl. Phys. Lett. 50, 1455. Mayer, L. (1957). Electron mirror microscopy of magnetic domains. J. Appl. Phys. 28, 975. McCartney, M.R., Dunin-Borkowski, R.E., Smith, D.J. (2001). Electron holography and its application to magnetic materials. In: De Graef, M., Zhu, Y. (Eds.), Experimental Methods in the Physical Sciences, vol. 36. Academic Press, San Diego, pp. 111–136. Miltat, J.E.A. (1976). Fir-tree patterns. Elastic distortions and application to X-ray topography. Phil. Mag. 33, 225. Mundschau, M., Romanowicz, J., Wang, J.Y., Sun, D.L., Chen, H.C. (1996). 
Imaging of ferromagnetic domains using photoelectrons: Photoelectron emission microscopy of neodymium–iron–boron (Nd2 Fe14 B). J. Vac. Sci. Technol. B 14, 3126. Neumann, W., Hillebrand, R., Krajewski, Th. (1987). Image processing in electron microscopy. In: Bethge, H., Heydenreich, J. (Eds.), Electron Microscopy in Solid State Physics, vol. 40. Elsevier, Amsterdam, pp. 265– 285.
252
SZMAJA
Nicholson, P.I., So, M.H., Meydan, T., Moses, A.J. (1996). Non-destructive surface inspection system for steel and other ferromagnetic materials using magneto-resistive sensors. J. Magn. Magn. Mater. 160, 162. Nikitenko, V.I., Gornakov, V.S., Dedukh, L.M., Kabanov, Yu.P., Khapikov, A.F., Shapiro, A.J., Shull, R.D., Chaiken, A., Michel, R.P. (1998). Direct experimental study of the magnetization reversal process in epitaxial and polycrystalline films with unidirectional anisotropy. J. Appl. Phys. 83, 6828. Pavlidis, T. (1987). Algorithms for Graphics and Image Processing. Wydawnictwa Naukowo-Techniczne, Warsaw [in Polish]. Polcarová, M. (1969). Applications of X-ray diffraction topography to the study of magnetic domains. IEEE Trans. Magn. 5, 536. Prance, R.J., Clark, T.D., Prance, H., Howells, G. (1999). Imaging of magnetically recorded data using a novel scanning magnetic microscope. J. Magn. Magn. Mater. 193, 437. Rave, W., Schäfer, R., Hubert, A. (1987). Quantitative observation of magnetic domains with the magneto-optical Kerr effect. J. Magn. Magn. Mater. 65, 7. Rave, W., Zueco, E., Schäfer, R., Hubert, A. (1998). Observations on highanisotropy single crystals using a combined Kerr/magnetic force microscope. J. Magn. Magn. Mater. 177–181, 1474. Reimer, L. (1985). Scanning Electron Microscopy. Springer, Berlin. Rice, P., Moreland, J. (1991). A new look at the Bitter method of magnetic imaging. Rev. Sci. Instrum. 62, 844. Rippard, W.H., Buhrman, R.A. (1999). Ballistic electron magnetic microscopy: Imaging magnetic domains with nanometer resolution. Appl. Phys. Lett. 75, 1001. Sagawa, M., Fujimura, S., Yamamoto, H., Matsuura, Y., Hiraga, K. (1984). Permanent magnet materials based on the rare earth–iron–boron tetragonal compounds. IEEE Trans. Magn. 20, 1584. Sagawa, M., Fujimura, S., Yamamoto, H., Matsuura, Y., Hirosawa, S., Hiraga, K. (1985). Magnetic properties and microstructure of rare earth–iron– boron permanent magnet materials. In: Proc. 4th Internat. 
Symp. Magnetic Anisotropy and Coercivity in Rare Earth-Transition Metal Alloys, University of Dayton, Dayton, OH, p. 587. Sakurai, T., Kitakami, O., Shimada, Y. (1994). Observation of high density recording states using magnetic fine particles made by sputtering method. J. Magn. Magn. Mater. 130, 384. Sakurai, T., Shimada, Y. (1992). Application of the gas evaporation method to observation of magnetic domains. Jpn. J. Appl. Phys. 31, 1905. Schäfer, R. (1995a). Magneto-optical domain studies in coupled magnetic multilayers. J. Magn. Magn. Mater. 148, 226.
DEVELOPMENTS IN THE IMAGING OF MAGNETIC DOMAINS
253
Schäfer, R. (1995b). Magneto-optical microscopy for the analysis of magnetic microstructures. In: 4th Symp. Magn. Mater. Processes Devices, vol. 95-18. The Electrochemical Society, Chicago, p. 300. Schäfer, R., Hubert, A. (1990). A new magneto-optic effect related to nonuniform magnetization on the surface of a ferromagnet. Phys. Stat. Sol. (a) 118, 271. Scheinfein, M.R., Unguris, J., Kelley, M.H., Pierce, D.T., Celotta, R.J. (1990). Scanning electron microscopy with polarization analysis (SEMPA). Rev. Sci. Instrum. 61, 2501. Schlenker, M., Baruchel, J. (1978). Neutron techniques for the observation of ferro- and antiferromagnetic domains. J. Appl. Phys. 49, 1996. ´ ech, W., Brookes, N.B., Schneider, C.M., Frömter, R., Ziethen, C., Swi˛ Schönhense, G., Kirschner, J. (1997). Magnetic domain imaging with a photoemission microscope. Mater. Res. Soc. Symp. Proc. 475, 381. Scholl, A., Ohldag, H., Nolting, F., Anders, S., Stöhr, J. (2005). Study of ferromagnet–antiferromagnet interfaces using X-ray PEEM. In: Hopster, H., Oepen, H.P. (Eds.), Magnetic Microscopy of Nanostructures. Springer, Berlin, pp. 29–50. Schönhense, G. (1999). Imaging of magnetic structures by photoemission electron microscopy. J. Phys.: Condens. Matter 11, 9517. Shindo, D., Park, Y.-G., Murakami, Y., Gao, Y., Kanekiyo, H., Hirosawa, S. (2003). Electron holography of Nd–Fe–B nanocomposite magnets. Scripta Mater. 48, 851. Šimšová, J., Gemperle, R., Lodder, J.C. (1991). The use of colloid-SEM method for domain observation in CoCr films. J. Magn. Magn. Mater. 95, 85. Skidmore, G.D., Dahlberg, E.D. (1997). Improved spatial resolution in magnetic force microscopy. Appl. Phys. Lett. 71, 3293. Spivak, G.V., Prilezhaeva, I.N., Azovtsev, V.K. (1955). Magnitnii kontrast v elektronnom zerkale i nablyudeniye domenov ferromagnetika. Doklady Akad. Nauk SSSR, Fiz. 105, 965 [in Russian]. Szewczyk, A., Piotrowski, K., Szymczak, R. (1983). A new method of domain structure investigation at temperatures below 35 K. J. 
Phys. D: Appl. Phys. 16, 687. Szmaja, W. (1994). SEM investigation of the dependence of magnetic domain structure on the thickness of cobalt monocrystals. J. Magn. Magn. Mater. 130, 138. Szmaja, W. (1996). The thickness dependence of the magnetic domain structure in cobalt monocrystals studied by SEM. J. Magn. Magn. Mater. 153, 215. Szmaja, W. (1998). Digital image processing system for magnetic domain observation in SEM. J. Magn. Magn. Mater. 189, 353.
254
SZMAJA
Szmaja, W. (1999). Digitally enhanced type-I magnetic contrast in SEM as a method of domain investigation. J. Magn. Magn. Mater. 202, 201. Szmaja, W. (2000). Studies of the surface domain structure of cobalt monocrystals by the SEM type-I magnetic contrast and Bitter colloid method. J. Magn. Magn. Mater. 219, 281. Szmaja, W. (2001). Separation of magnetic from non-magnetic information in the Bitter pattern method. J. Magn. Magn. Mater. 234, 13. Szmaja, W. (2002a). Improvements in domain study with the conventional Bitter method by digital image processing system. Phys. Stat. Sol. (a) 194, 315. Szmaja, W. (2002b). Improvements and actual problems in domain imaging by type-I magnetic contrast in SEM. Czech. J. Phys. 52 (Suppl. A), A145. Szmaja, W. (2004). Investigation of the domain structure of sintered Nd–Fe–B permanent magnets by Bitter-pattern method. Czech. J. Phys. 54, 1503. Szmaja, W. (2005). Observation of the domain structure of Nd–Fe–B magnets using the SEM type-I magnetic contrast. J. Electron Spectrosc. Relat. Phenom. 148, 123. Szmaja, W. (2006). Investigations of the domain structure of anisotropic sintered Nd–Fe–B based permanent magnets. J. Magn. Magn. Mater. 301, 546. Szmaja, W., Balcerski, J. (2002). Domain investigation by the conventional Bitter pattern technique with digital image processing. Czech. J. Phys. 52, 223. Szmaja, W., Balcerski, J. (2004). Study of the magnetic microstructure of thin cobalt films by Bitter pattern method and Lorentz microscopy. Czech. J. Phys. 54 (Suppl. D), D245. Szmaja, W., Grobelny, J., Cichomski, M., Makita, K., Rodewald, W. (2004a). MFM study of sintered permanent magnets. Phys. Stat. Sol. (a) 201, 550. Szmaja, W., Grobelny, J., Cichomski, M., Makita, K. (2004b). Application of MFM for studying Nd–Fe–B magnets. Vacuum 74, 297. Szmaja, W., Grobelny, J., Cichomski, M. (2004c). Domain structure of sintered SmCo5 magnets studied by magnetic force microscopy. Appl. Phys. Lett. 85, 2878. 
Szmaja, W., Grobelny, J., Cichomski, M. (2004d). MFM investigation of the domain structure of cobalt single crystals. Czech. J. Phys. 54 (Suppl. D), D249. Szmaja, W., Pola´nski, K., Dolecki, K. (1994). SEM investigation of the temperature dependence of magnetic domain structure of cobalt monocrystals. J. Magn. Magn. Mater. 130, 147. Szmaja, W., Pola´nski, K., Dolecki, K. (1995a). The temperature dependence of magnetic domain structure in cobalt monocrystals studied by SEM. J. Magn. Magn. Mater. 151, 249.
DEVELOPMENTS IN THE IMAGING OF MAGNETIC DOMAINS
255
Szmaja, W., Pola´nski, K., Dolecki, K. (1995b). SEM temperature study of magnetic domain structure in cobalt monocrystals. Physica B 216, 71. Szmaja, W., Pola´nski, K., Dolecki, K. (1997). On temperature dependence of domain structure in cobalt. Acta Phys. Polon. A 92, 469. Szymczak, R. (1968). A modification of the Kittel open structure. J. Appl. Phys. 39, 875. Szymczak, R. (1973). Observation of internal domain structure of barium ferrite in infrared. Acta Phys. Polon. A 43, 571. Szymczak, R., Givord, D., Li, H.S. (1987). Dependence of domain width on crystal thickness in Nd2 Fe14 B single crystals. Acta Phys. Polon. A 72, 113. Takata, Y. (1963). Observation of domain structure and calculation of magnetostatic energy on the c-plane of cobalt single crystals. J. Phys. Soc. Jpn. 18, 87. Thiaville, A., Hubert, A., Schäfer, R. (1991). An isotropic description of the new-found gradient-related magneto-optical effect. J. Appl. Phys. 69, 4551. Traeger, G., Wenzel, L., Hubert, A. (1992). Computer experiments on the information depth and the figure of merit in magnetooptics. Phys. Stat. Sol. (a) 131, 201. Träuble, H., Boser, O., Kronmüller, H., Seeger, A. (1965). Ferromagnetische Eigenschaften hexagonaler Kobalt-Einkristalle. Phys. Stat. Sol. 10, 283 [in German]. Tsuno, K. (1988). Magnetic domain observation by means of Lorentz electron microscopy with scanning technique. Rev. Solid State Sci. 2, 623. Unguris, J., Scheinfein, M.R., Celotta, R.J., Pierce, D.T. (1989). Magnetic microstructure of the (0001) surface of hcp cobalt. Appl. Phys. Lett. 55, 2553. Urchulutegui, M., Piqueras, J., Aroca, C. (1991). Study of magnetic domains in amorphous ribbons by scanning electron acoustic microscopy. Appl. Phys. Lett. 59, 994. Vellekoop, B., Abelmann, L., Porthun, S., Lodder, C. (1998). On the determination of the internal magnetic structure by magnetic force microscopy. J. Magn. Magn. Mater. 190, 148. Volkov, V.V., Zhu, Y. (2004). Lorentz phase microscopy of magnetic materials. 
Ultramicroscopy 98, 271. von Hámos, L., Thiessen, P.A. (1931). Über die Sichtbarmachung von Bezirken verschiedenen ferromagnetischen Zustands fester Körper. Z. Phys. 71, 442 [in German]. Wardly, G.A. (1971). Magnetic contrast in the scanning electron microscope. J. Appl. Phys. 42, 376. Wells, O.C. (1985). Some theoretical aspects of type-1 magnetic contrast in the scanning electron microscope. J. Microsc. 139, 187. Williams, H.J., Bozorth, R.M., Shockley, W. (1949). Magnetic domain patterns on single crystals of silicon iron. Phys. Rev. 75, 155.
256
SZMAJA
Wittborn, J., Rao, K.V., Nogués, J., Schuller, I.K. (2000). Magnetic domain and domain-wall imaging of submicron Co dots by probing the magnetostrictive response using atomic force microscopy. Appl. Phys. Lett. 76, 2931. Wulfhekel, W., Ding, H.F., Lutzke, W., Steierl, G., Vázquez, M., Marín, P., Hernando, A., Kirschner, J. (2001). High-resolution magnetic imaging by local tunneling magnetoresistance. Appl. Phys. A 72, 463. Wysłocki, B., Zi˛etek, W. (1966). Selective powder-pattern observations of complex ferromagnetic domain structures. Acta Phys. Polon. 29, 223. Wysłocki, J.J., Suski, W., Pawlik, P., Wochowski, K., Kotur, B., Bodak, O.I. (1996). Magnetocrystalline anisotropy constants, rotational hysteresis energy and magnetic domain structure in UFe6 Al6 , UFe9 AlSi2 and ScFe10 Si2 intermetallic compounds. J. Magn. Magn. Mater. 162, 239. Yamamoto, T., Tsuno, K. (1975). Magnetic contrast in secondary electron images of uniaxial ferromagnetic materials obtained by scanning electron microscopy. Phys. Stat. Sol. (a) 28, 479. Zheng, N.J., Rau, C. (1993). Scanning-ion microscopy with polarization analysis (SIMPA). Mat. Res. Soc. Symp. Proc. 313, 723. Zhu, Y., McCartney, M.R. (1998). Magnetic-domain structure of Nd2 Fe14 B permanent magnets. J. Appl. Phys. 84, 3267. Zueco, E., Rave, W., Schäfer, R., Hubert, A., Schultz, L. (1998). Combined Kerr-/magnetic force microscopy on NdFeB crystals of different crystallographic orientation. J. Magn. Magn. Mater. 190, 42.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 141
Deconvolution Over Groups in Image Reconstruction*

BIRSEN YAZICI AND CAN EVREN YARMAN

Department of Electrical, Computer and Systems Engineering, Rensselaer Polytechnic Institute, Troy, New York 12180, USA
I. Introduction
   A. Motivations
   B. Organization
II. Convolution and Fourier Analysis on Groups
   A. Convolution on Groups
   B. Fourier Analysis on Groups
III. Group Stationary Processes
IV. Wiener Filtering Over Groups
   A. Remarks
V. Wideband Extended Range-Doppler Imaging
   A. Fourier Theory of the Affine Group
      1. Affine Group
      2. Fourier Transform over the Affine Group
   B. Target Reflectivity Estimation
      1. Receiver Design
   C. Waveform Design
   D. Numerical Experiments
VI. Radon and Exponential Radon Transforms
   A. Fourier Transform over M(2)
   B. Radon and Exponential Radon Transforms as Convolutions
      1. Radon Transform
      2. Exponential and Angle-Dependent Exponential Radon Transforms
   C. Inversion Methods
   D. Numerical Algorithms
      1. Fourier Transform over M(2)
      2. Reconstruction Algorithm
   E. Numerical Simulations
      1. Radon Transform
      2. Exponential Radon Transform with Uniform Attenuation
* Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Air Force Research Laboratory or the U.S. Government.
ISSN 1076-5670/05 DOI: 10.1016/S1076-5670(05)41004-6
Copyright 2006, Elsevier Inc. All rights reserved.
VII. Conclusion
Acknowledgments
Appendix A
   Definitions
      Definition A1.1
      Definition A1.2
      Definition A1.3
Appendix B
   Distributions and Fourier Transform Over M(2)
References
I. INTRODUCTION

A. Motivations

Convolution integrals over groups arise in a broad array of science and engineering problems, owing to the ubiquitous presence of invariance, or symmetry, in natural and man-made systems. Many imaging systems have symmetry properties: an ordinary circular lens exhibits rotational invariance about its optical axis, and an X-ray tomographic system exhibits invariance with respect to rigid-body motions of Euclidean space. The mathematical framework that underlies the concept of invariance is group theory. This chapter describes a group-theoretic framework for system modeling and signal processing that describes and exploits invariance. We then apply this framework to two image reconstruction problems formulated as convolution integrals over groups. The discussion starts with a review of the group-theoretic signal and system theory introduced in Yazici (2004) and shows how group theory expands the understanding of familiar concepts such as convolution, Fourier analysis, and stationarity. The group convolution operation is defined as a representation of the input–output relationship of a linear system whose dynamics are invariant under the group composition law. Fourier transforms over groups, which map convolution to operator multiplication in the Fourier domain, are introduced to study the convolution operation. The concept of group stationarity, generalizing ordinary stationarity, is introduced to model imperfect symmetries. Spectral decomposition of group stationary processes is presented using Fourier transforms over groups. The deconvolution problem over groups is then introduced, together with a review of its Wiener-type minimum mean-square error solution.
Next, this group-theoretic framework is applied to address two imaging problems: (1) target estimation in wideband extended range-Doppler imaging and diversity waveform design, and (2) inversion of the Radon and exponential Radon transforms for transmission and emission tomography.
In the first problem, the output of the matched filter becomes a convolution integral over the affine or the Heisenberg group, depending on whether the underlying model is wideband or narrowband. For the wideband model, treating the received echo as the Fourier transform of the reflectivity density function over the affine group allows implementation of the Wiener-type deconvolution in the Fourier transform domain. Accordingly, the received echo is modeled as the Fourier transform of the superposition of the target and unwanted scatterers. The receiver and waveform design problems are addressed within the Wiener filtering framework defined over the affine group. The approach allows joint design of adaptive transmit and receive methods for wideband signals. In the second problem, both the Radon and the exponential Radon transforms are formulated as convolutions over the Euclidean motion group of the plane. Hence, recovering a function from its Radon or exponential Radon transform projections becomes a deconvolution problem over the Euclidean motion group. The deconvolution is performed by applying a special case of the Wiener filter introduced in Section IV. The approach presented for the Radon and exponential Radon transforms can be extended to other integral transforms of transmission and emission tomography, unifying them under a single convolution representation (Yarman and Yazici, 2005e). Apart from these imaging problems, convolutions over groups appear in a broad array of engineering problems.
These include workspace estimation in robotics, estimation of the structure of macromolecules in polymer science, estimation of illumination and the bidirectional reflectance distribution function in computer graphics, and motion estimation in omnidirectional vision (Blahut, 1991; Chirikjian and Ebert-Uphoff, 1998; Ebert-Uphoff and Chirikjian, 1996; Ferraro, 1992; Kanatani, 1990; Lenz, 1990; Miller, 1991; Naparst, 1991; Popplestone, 1984; Ramamoorthi and Hanrahan, 2001; Srivastava and Buschman, 1977; Volchkov, 2003).

B. Organization

Section II introduces the concept of convolution and Fourier analysis over groups. Section III presents stochastic processes exhibiting invariance with respect to the group composition law and their spectral decomposition theorems. Section IV describes Wiener filtering over groups as a solution of the deconvolution problem. Section V addresses wideband target estimation and waveform design problems. Section VI presents the inversion of the Radon and exponential Radon transforms of transmission and emission tomography within the framework introduced in Sections II to IV.
II. CONVOLUTION AND FOURIER ANALYSIS ON GROUPS
Let G be a group. We denote the composition of two elements $g, h \in G$ by $gh$, the identity element by $e$ (i.e., $eg = ge = g$ for all $g \in G$), and the inverse of an element $g$ by $g^{-1}$ (i.e., $g^{-1}g = gg^{-1} = e$). It is assumed that the reader is familiar with the concepts of group theory and topological spaces. For an introduction to group theory and topological groups, the reader is referred to Sattinger and Weaver (1986) and Artin (1991).

A. Convolution on Groups

Recall that the input–output relationship of a linear time-invariant system can be represented as a convolution integral
$$(f_{\mathrm{in}} * \Lambda)(t) \equiv f_{\mathrm{out}}(t) = \int_{-\infty}^{\infty} f_{\mathrm{in}}(\tau)\,\Lambda(t - \tau)\,d\tau. \tag{1}$$
The fundamental property of the ordinary convolution integral is its invariance under time shifts. To generalize ordinary convolution to functions over groups, an integration measure invariant under group translations must be defined:
$$\int_G f(g)\,d\mu(g) = \int_G f(hg)\,d\mu(g), \tag{2}$$
for all $h$ in G and all integrable $f$. For locally compact groups, such an integration measure exists and is called the left Haar measure. The left Haar measure satisfies $d\mu(hg) = d\mu(g)$, while the right Haar measure satisfies $d\mu_R(gh) = d\mu_R(g)$. In general, the left and right Haar measures of an arbitrary group are not equal; however, one has $d\mu_R(g) = \Delta^{-1}(g)\,d\mu(g)$, where $\Delta(g)$ is the modular function satisfying $\Delta(e) = 1$, $\Delta(g) > 0$, and $\Delta(gh) = \Delta(g)\Delta(h)$. Groups whose modular function is identically 1 are called unimodular. For example, the Euclidean motion group and the Heisenberg group are unimodular, but the affine and scale-Euclidean groups are nonunimodular. Results involving the right Haar measure can easily be deduced from those associated with the left Haar measure; therefore, for the remainder of this chapter, unless stated otherwise, we use the left Haar measure and denote it by $dg$. Let $L^2(G, dg)$ denote the Hilbert space of all complex-valued, square-integrable functions on the group G; that is, $f \in L^2(G, dg)$ if
$$\int_G \bigl|f(g)\bigr|^2\,dg < \infty. \tag{3}$$
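The left invariance of Eq. (2) can be checked numerically for the affine group, which, as noted above, is nonunimodular. The sketch below assumes the common conventions for the $x \mapsto ax + b$ group, namely composition $(a_1, b_1)(a_2, b_2) = (a_1 a_2, a_1 b_2 + b_1)$ and left Haar measure $d\mu = da\,db/a^2$ (these conventions and the test function are illustrative choices, not taken from the text):

```python
import numpy as np

# Left Haar measure check for the affine group x -> a x + b (a > 0).
# Assumed conventions: (a1,b1)(a2,b2) = (a1 a2, a1 b2 + b1), dmu = da db / a^2.
# We verify Eq. (2): the integral of f(h g) equals the integral of f(g).
u = np.linspace(-8.0, 8.0, 1201)        # u = log a parameterizes a > 0
b = np.linspace(-12.0, 12.0, 1201)
U, B = np.meshgrid(u, b, indexing="ij")
A = np.exp(U)

def f(a, bb):
    # an arbitrary smooth, rapidly decaying test function on the group
    return np.exp(-np.log(a) ** 2 - bb ** 2)

def haar_integral(vals):
    # dmu = da db / a^2 = e^{-u} du db in the coordinates (u = log a, b)
    du, dbb = u[1] - u[0], b[1] - b[0]
    return float((vals * np.exp(-U)).sum() * du * dbb)

a0, b0 = 2.5, -1.3                       # a fixed group element h = (a0, b0)
lhs = haar_integral(f(a0 * A, a0 * B + b0))   # integral of f(h g) dmu(g)
rhs = haar_integral(f(A, B))                  # integral of f(g)   dmu(g)
print(abs(lhs - rhs) < 1e-9 * rhs)  # → True: dmu is left invariant
```

Repeating the check with $d a\,db/a$ in place of $da\,db/a^2$ breaks the equality, which is one way to see that the left and right Haar measures of this group differ.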
Let $f_{\mathrm{in}}, f_{\mathrm{out}} \in L^2(G, dg)$ be two signals representing the input and output of a linear system $\mathcal{S}$, that is, $\mathcal{S}[f_{\mathrm{in}}(g)] = f_{\mathrm{out}}(g)$. If $\mathcal{S}$ has the group-invariance property
$$\mathcal{S}\bigl[f_{\mathrm{in}}(hg)\bigr] = f_{\mathrm{out}}(hg), \tag{4}$$
for all $g, h \in G$, then the input–output relationship between $f_{\mathrm{in}}$ and $f_{\mathrm{out}}$ reduces to a convolution over the group G, given as
$$(f_{\mathrm{in}} * \Lambda)(g) \equiv f_{\mathrm{out}}(g) = \int_G f_{\mathrm{in}}(h)\,\Lambda\bigl(h^{-1}g\bigr)\,dh. \tag{5}$$
Here, we refer to $\Lambda$ as the kernel or the impulse response function of the linear system $\mathcal{S}$. Note that the convolution operation is not necessarily commutative for noncommutative groups. For a function $f$ on G, $f(h^{-1}g)$ is called a translation of $f$ by $h$, in the same sense that $f(t - \tau)$ is a translation of a function on $\mathbb{R}$ by $\tau$. In particular, $[L(h)f](g) = f(h^{-1}g)$ is called a left regular representation, while $[R(h)f](g) = f(gh)$ is called a right regular representation of the group over $L^2(G, dg)$. The convolution integral can also be expressed in terms of the left regular representation of the group:
$$(f * \Lambda)(g) = \bigl\langle f, L(g)\Lambda^{*}\bigr\rangle = \bigl\langle L\bigl(g^{-1}\bigr)f, \Lambda^{*}\bigr\rangle, \tag{6}$$
where $\Lambda^{*}(g) = \overline{\Lambda}(g^{-1})$, $\overline{\Lambda}$ being the complex conjugate of $\Lambda$, and $\langle \cdot, \cdot \rangle$ is the inner product on $L^2(G, dg)$ defined as
$$\langle f, \Lambda \rangle = \int_G f(h)\,\overline{\Lambda}(h)\,dh. \tag{7}$$
Let X be a homogeneous space of G, that is, $gX = X$. Convolution over a homogeneous space is obtained by replacing the left regular representation in Eq. (6) with the quasi-left regular representation:
$$f_{\mathrm{out}}(g) = \bigl\langle L_q\bigl(g^{-1}\bigr)f_{\mathrm{in}}, \Lambda^{*}\bigr\rangle, \tag{8}$$
where $\Lambda$ and $f_{\mathrm{in}}$ are elements of $L^2(X)$ and $[L_q(g)f_{\mathrm{in}}](t) = f_{\mathrm{in}}(g^{-1}t)$. Note that both the echo signal in radar/sonar imaging and the tomographic imaging process described in Sections V and VI can be viewed as convolutions over homogeneous spaces.
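The group convolution of Eq. (5) can be made concrete on a finite group, where the counting measure serves as the (bi-invariant) Haar measure. The following sketch, an illustrative computation not taken from the text, evaluates Eq. (5) on the symmetric group S3 and confirms that convolution on a noncommutative group need not commute:

```python
from itertools import permutations

# Elements of S3 as permutation tuples; composition (a o b)(i) = a[b[i]].
G = list(permutations(range(3)))

def compose(a, b):
    return tuple(a[b[i]] for i in range(3))

def inverse(a):
    inv = [0] * 3
    for i, ai in enumerate(a):
        inv[ai] = i
    return tuple(inv)

def group_convolve(f, lam):
    """(f * lam)(g) = sum_h f(h) lam(h^{-1} g), Eq. (5) with counting measure."""
    return {g: sum(f[h] * lam[compose(inverse(h), g)] for h in G) for g in G}

# Two arbitrary real-valued functions on S3 (illustrative values).
f = {g: float(i + 1) for i, g in enumerate(G)}
lam = {g: float((i * i) % 5) for i, g in enumerate(G)}

fl = group_convolve(f, lam)
lf = group_convolve(lam, f)
print(fl == lf)  # → False: convolution on a noncommutative group need not commute
```

Note that the two convolutions do agree at the identity element, since $\sum_h f(h)\Lambda(h^{-1})$ is symmetric under $h \mapsto h^{-1}$; the asymmetry appears only away from $e$.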
B. Fourier Analysis on Groups

Fourier analysis on groups allows spectral analysis of signals and systems in invariant subspaces determined by the irreducible unitary representations of the underlying group. This requires characterization of the unitary representations, which in turn leads to the definition of the Fourier transform on groups. It was shown that if G is a separable, locally compact group of Type I, unique characterizations of the unitary representations can be obtained in terms of the irreducible unitary representations of the group (Naimark, 1959). This class of groups includes finite, compact, and algebraic Lie groups, separable locally compact commutative groups, and the majority of well-behaved locally compact groups. Many of the groups involved in engineering applications, such as the affine group, the Heisenberg group, and the Euclidean motion group, fall into this class. The following text reviews Fourier analysis on locally compact groups of Type I for both the unimodular and the nonunimodular case. Definitions of the basic concepts in group representation theory are provided in Appendix A. For a detailed discussion of the topic, the reader is referred to Groove (1997), Milies and Sehgal (2002), and Onishchik (1993).

Let $U(g, \lambda)$ be the $\lambda$th irreducible unitary representation of a separable locally compact group of Type I. The operator-valued Fourier transform on G maps each $f$ in $L^2(G, dg)$ to a family $\{\hat f(\lambda)\}$ of bounded operators, where each $\hat f(\lambda)$ is defined as
$$\mathcal{F}(f)(\lambda) \equiv \hat f(\lambda) = \int_G f(g)\,U\bigl(g^{-1}, \lambda\bigr)\,dg, \tag{9}$$
or in component form
$$\hat f_{i,j}(\lambda) = \int_G f(g)\,U_{i,j}\bigl(g^{-1}, \lambda\bigr)\,dg, \tag{10}$$
where $\hat f_{i,j}(\lambda)$ and $U_{i,j}(g, \lambda)$ denote the $(i, j)$th matrix elements of $\hat f(\lambda)$ and $U(g, \lambda)$, respectively. The collection of all $\lambda$ values is denoted by $\hat G$ and is called the dual of the group G. The collection of Fourier transforms $\{\hat f(\lambda)\}$ for all $\lambda \in \hat G$ is called the spectrum of the function $f$. The Fourier transform is a one-to-one, onto transformation from $L^2(G, dg)$ to $L^2(\hat G, d\nu(\lambda))$, where $d\nu(\lambda)$ denotes the Plancherel measure on $\hat G$. An important property of the operator-valued Fourier transform is that group convolution becomes operator multiplication in the Fourier domain:
$$\mathcal{F}(f_1 * f_2)(\lambda) = \mathcal{F}(f_2)(\lambda)\,\mathcal{F}(f_1)(\lambda). \tag{11}$$
For separable locally compact groups of Type I, the Fourier inversion and Plancherel formulas are given by Duflo and Moore (1976):
$$f(g) = \int_{\hat G} \operatorname{trace}\bigl[\hat f(\lambda)\,\xi_\lambda^{-2}\,U^{\dagger}(g, \lambda)\bigr]\,d\nu(\lambda), \tag{12}$$
and
$$\int_G \bigl|f(g)\bigr|^2\,dg = \int_{\hat G} \operatorname{trace}\bigl[\bigl(\hat f(\lambda)\xi_\lambda^{-1}\bigr)\bigl(\hat f(\lambda)\xi_\lambda^{-1}\bigr)^{\dagger}\bigr]\,d\nu(\lambda), \tag{13}$$
where $\hat f^{\dagger}(\lambda)$ denotes the adjoint of $\hat f(\lambda)$ and $\{\xi_\lambda\}$ is a family of Hermitian positive definite operators with densely defined inverses satisfying the following conditions:
• $\{\hat f(\lambda)\xi_\lambda\}$ is trace class for each $\lambda \in \hat G$, and
• $U(g, \lambda)\,\xi_\lambda\,U^{\dagger}(g, \lambda) = \Delta^{1/2}(g)\,\xi_\lambda$, where $\Delta(g)$ is the modular function of the group G.
For a given locally compact group of Type I, both the family of operators $\{\xi_\lambda\}$ and the Plancherel measure are determined uniquely. When the group is unimodular, $\xi_\lambda = I_\lambda$, $I_\lambda$ being the identity operator. Thus, the Fourier inversion and Plancherel formulas become
$$f(g) = \int_{\hat G} \operatorname{trace}\bigl[\hat f(\lambda)\,U^{\dagger}(g, \lambda)\bigr]\,d\nu(\lambda), \tag{14}$$
and
$$\int_G \bigl|f(g)\bigr|^2\,dg = \int_{\hat G} \operatorname{trace}\bigl[\hat f(\lambda)\hat f^{\dagger}(\lambda)\bigr]\,d\nu(\lambda). \tag{15}$$
In the following section, the Fourier transform is used to develop spectral decomposition theorems for a class of nonstationary stochastic processes.
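For a concrete special case, consider the cyclic group $\mathbb{Z}_n$: its irreducible unitary representations are one-dimensional characters, the operator-valued transform reduces to the ordinary DFT, and Eq. (11) collapses to pointwise multiplication. A minimal numerical sketch follows (the DFT sign convention differs from Eq. (9), which does not affect the product identity):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
f1 = rng.standard_normal(n)
f2 = rng.standard_normal(n)

def cyclic_convolve(f, lam):
    # Group convolution on Z_n: (f * lam)(g) = sum_h f(h) lam((g - h) mod n)
    return np.array([sum(f[h] * lam[(g - h) % n] for h in range(n))
                     for g in range(n)])

# The characters of Z_n are its irreducible unitary representations, so the
# group Fourier transform is the ordinary DFT, the operators f^(lambda) are
# 1x1, and Eq. (11) becomes pointwise multiplication of DFT coefficients.
lhs = np.fft.fft(cyclic_convolve(f1, f2))
rhs = np.fft.fft(f1) * np.fft.fft(f2)
print(np.allclose(lhs, rhs))  # → True
```

For a noncommutative group the same identity holds, but the $\hat f(\lambda)$ become genuine matrices and the order of the operator product in Eq. (11) matters.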
III. GROUP STATIONARY PROCESSES

One of the key components of our development is the theory of generalized second-order stationary processes indexed by topological groups (Diaconis, 1988; Hannan, 1965; Yaglom, 1961). These processes are nonstationary in the classical sense but exhibit invariance under the right or left regular transformations of the group.
Let G denote a group and $g, h$ its elements. We call a process $X(g)$, $g \in G$, group stationary if
$$X(g) \equiv X(hg), \quad g, h \in G, \tag{16}$$
or
$$X(g) \equiv X(gh), \quad g, h \in G, \tag{17}$$
where $\equiv$ denotes equality in terms of all finite joint probability distributions of $\{X(g),\ g \in G\}$. Depending on whether a random process satisfies Eq. (16), Eq. (17), or both, it is called left, right, or two-way group stationary. Note that for commutative groups, the process is always two-way group stationary.

Second-order group stationarity is a weaker condition than group stationarity. A process $X(g)$, $g \in G$, is said to be second-order right group stationary if
$$E\bigl\{X(g)\overline{X}(h)\bigr\} = R\bigl(gh^{-1}\bigr), \quad g, h \in G, \tag{18}$$
and second-order left group stationary if
$$E\bigl\{X(g)\overline{X}(h)\bigr\} = R\bigl(h^{-1}g\bigr), \quad g, h \in G, \tag{19}$$
where R is a positive definite function defined on the group G. We refer to R as the autocorrelation function of $X(g)$, $g \in G$. A process is called second-order group stationary if it is both left and right stationary.

The central fact in the analysis of group stationary processes is the existence of the spectral decomposition, which is facilitated by group representation theory. For separable locally compact groups of Type I, left group stationary processes admit the spectral decomposition
$$X(g) = \int_{\hat G} \operatorname{trace}\bigl[U(g, \lambda)\,Z(d\lambda)\bigr], \tag{20}$$
and
$$R(g) = \int_{\hat G} \operatorname{trace}\bigl[U(g, \lambda)\,F(d\lambda)\bigr], \tag{21}$$
where R is the autocorrelation function defined in Eq. (19), $Z(d\lambda)$ is a random linear operator over $\hat G$, and $F(d\lambda)$ is an operator measure over $\hat G$ satisfying
$$\int_{\hat G} \operatorname{trace}\bigl[F(d\lambda)\bigr] < \infty. \tag{22}$$
Unless the process is both right and left group stationary, the matrix entries of the random linear operator $Z(\cdot)$ are column-wise correlated and row-wise
uncorrelated with correlation coefficients equal to the corresponding matrix entries of the operator F (·). For a detailed discussion of the topic, refer to Yaglom (1961) and Hannan (1965). Now, let us assume that the autocorrelation function R ∈ L2 (G, dg). We define the spectral density function S of a group stationary process as S(λ) ≡ F (R)(λ) = dgR(g)U (g, λ). (23) G
For unimodular groups, S is a bounded nonnegative definite operator defined on the dual space $\hat{G}$. For nonunimodular groups, S can be modified to $\mathcal{S} = S\xi$, so that the resulting operator is Hermitian and nonnegative definite. The spectral density function represents the correlation structure of the random linear operators Z(·). Some examples of group stationary processes are provided below.

1. Shift Stationary Processes

The simplest example is the ordinary stationary process defined on the real line with the addition operation, that is, on the additive group (R, +):
$$E\big[X(t_1)\overline{X(t_2)}\big] = R(t_1 - t_2), \qquad -\infty < t_1, t_2 < \infty. \qquad (24)$$

One-dimensional irreducible unitary representations of the additive group (R, +) are given by the complex exponential functions $e^{i\omega t}$, $-\infty < \omega < \infty$. Hence, the spectral decomposition of shift stationary processes is given by the ordinary Fourier transform.

2. Scale Stationary Processes

Another important class of group stationary processes is defined by the multiplicative group on the positive real line, that is, (R⁺, ×). These processes exhibit invariance with respect to translation in scale and are referred to as scale stationary processes (Yazici and Kashyap, 1997). Their autocorrelation function satisfies the following invariance property:
$$E\big[X(t_1)\overline{X(t_2)}\big] = R(t_1/t_2), \qquad 0 < t_1, t_2 < \infty. \qquad (25)$$

One-dimensional irreducible unitary representations of the multiplicative group are given by $e^{i\omega\log t}$, $t > 0$. As a result, the spectral decomposition of scale stationary processes is given by the Mellin transform. Detailed analysis of self-similar processes based on the concept of scale stationarity can be found in the first author's previous works (Yazici, 1997; Yazici and Izzetoglu, 2002; Yazici et al., 2001; Yazici and Kashyap, 1997).
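As a numerical illustration of Eq. (25) (a sketch of our own; the AR(1) construction, the correlation parameter 0.9, and the grid spacing are illustrative assumptions, not from the text), a scale stationary process can be obtained by sampling an ordinary stationary process on an exponentially spaced time grid, $X(t_k) = Y(\log t_k)$; its correlation then depends only on the ratio $t_1/t_2$:

```python
import numpy as np

rng = np.random.default_rng(0)

def ar1(n, rho=0.9):
    """Stationary AR(1) Gaussian process with corr(Y_u, Y_{u+k}) = rho**|k|."""
    y = np.empty(n)
    y[0] = rng.standard_normal()
    for k in range(1, n):
        y[k] = rho * y[k - 1] + np.sqrt(1 - rho**2) * rng.standard_normal()
    return y

n_real, n = 4000, 64
u = np.arange(n)                    # u = log t, an equispaced log-time grid
t = np.exp(0.1 * u)                 # exponentially spaced sampling times
Y = np.array([ar1(n) for _ in range(n_real)])   # Y[:, k] plays the role of X(t_k)

def corr(i, j):
    """Empirical E[X(t_i) X(t_j)] averaged over the realizations."""
    return float(np.mean(Y[:, i] * Y[:, j]))

# (t_10, t_5) and (t_20, t_15) have the same scale ratio t_i / t_j,
# so by Eq. (25) their correlations should (nearly) coincide.
print(corr(10, 5), corr(20, 15))
```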
266
YAZICI AND YARMAN
3. Filtered White Noise

A trivial group stationary process is the white noise process, whose autocorrelation function is $\sigma^2\delta(g)$, $g \in G$, where $\sigma^2$ denotes the variance of the process. Here, δ(g) is the Dirac delta function over the group G, supported at the identity element and having integral one over G. It was shown in Yazici (2004) that the convolution of the white noise process with any $f \in L^2(G, dg)$ leads to a second-order left group stationary process. Apart from these examples, detailed discussions of stochastic processes invariant with respect to the multiplicative group, the affine group, and the two- and three-dimensional rotation group actions can be found in Yazici and Kashyap (1997), Yazici (1997, 2004), Yazici et al. (2001), Yazici and Izzetoglu (2002), Yadrenko (1983), and Tewfik (1987) and the references therein.
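The filtered white noise example above can be checked numerically on the simplest group, the integers under addition (a discrete stand-in of our own; the exponential filter is an arbitrary choice): convolving white noise with a square-summable f yields a second-order stationary process whose correlation depends only on the lag.

```python
import numpy as np

rng = np.random.default_rng(3)
n_real, n = 5000, 128
f = np.exp(-np.arange(8) / 2.0)          # an arbitrary square-summable filter

w = rng.standard_normal((n_real, n))     # white noise realizations, sigma^2 = 1
x = np.array([np.convolve(row, f, mode="same") for row in w])

# empirical E[X(t) X(t + k)] at two different base times, same lag k = 3;
# stationarity predicts the two estimates agree up to sampling error
c1 = float(np.mean(x[:, 40] * x[:, 43]))
c2 = float(np.mean(x[:, 80] * x[:, 83]))
print(c1, c2)
```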
IV. WIENER FILTERING OVER GROUPS

In Yazici (2004), we introduced a deconvolution method over locally compact groups of Type I. This section provides a review of this method. Let S be a left group invariant system defined on a locally compact group G of Type I, and let $f_{out}$ be the noisy output of the system for an input $f_{in}$, given by the following convolution integral:

$$f_{out}(g) = \int_G f_{in}(h)\,\Lambda\big(h^{-1}g\big)\,dh + n(g), \qquad (26)$$
where Λ : G → C is the complex-valued, square-summable impulse response function of the group invariant system S, $f_{in}$ is the unknown signal, and n is the additive noise. Both $f_{in}$ and n are left group stationary and take values in the field of complex numbers C. Without loss of generality, we assume that $E[f_{in}(g)] = E[n(g)] = 0$ and that $f_{in}$ and n are uncorrelated, that is,

$$E\big[f_{in}(g)\overline{n(g)}\big] = 0. \qquad (27)$$

Our objective is to design a linear filter W on G × G to estimate $f_{in}$, that is,

$$\tilde{f}_{in}(g) = \int_G W(g,h)\,f_{out}(h)\,dh, \qquad (28)$$
given the measurements $f_{out}$, the system's response function Λ, and a priori statistical information on $f_{in}$ and n. Under the stationarity assumption, it can be shown that $W(hg_1, hg_2) = W(g_1, g_2)$ for all $g_1, g_2, h \in G$. Therefore, the estimate $\tilde{f}_{in}$ of the signal
$f_{in}$ is given as the following convolution integral:

$$\tilde{f}_{in}(g) = \int_G f_{out}(h)\,W\big(h^{-1}g\big)\,dh, \qquad (29)$$

where $W(g) = W(g, e)$. Let $\varepsilon_W(g) = \tilde{f}_{in}(g) - f_{in}(g)$. We design the filter W so that the mean square error

$$\int_G E\big[\big|\varepsilon_W(g)\big|^2\big]\,dg \qquad (30)$$
is minimized. It can be shown that the Fourier transform of the minimum mean square error (MMSE) deconvolution filter $W_{opt}$ is given by

$$\widehat{W}_{opt}(\lambda) = S_{f_{in}}(\lambda)\widehat{\Lambda}^{\dagger}(\lambda)\big[\widehat{\Lambda}(\lambda)S_{f_{in}}(\lambda)\widehat{\Lambda}^{\dagger}(\lambda) + S_n(\lambda)\big]^{-1}, \qquad \lambda \in \hat{G}. \qquad (31)$$

Here, $\widehat{\Lambda}$ is the Fourier transform of the convolution filter Λ, and $\widehat{\Lambda}^{\dagger}$ denotes the adjoint of the operator $\widehat{\Lambda}$. $S_{f_{in}}$ and $S_n$ are the operator-valued spectral density functions of the signal and noise, respectively. The spectral density function of the MMSE between the signal and its filtered estimate is given by

$$S_{\varepsilon}(\lambda) = \big[I - \widehat{W}_{opt}(\lambda)\widehat{\Lambda}(\lambda)\big]S_{f_{in}}(\lambda), \qquad (32)$$

where I denotes the identity operator. For the derivation of the Wiener filter stated above, we refer the reader to Theorem 2 in Yazici (2004).

A. Remarks

The Fourier domain inverse filtering can be summarized by the following diagram:

$$f_{in} \xrightarrow{\ \Lambda\ } f_{out} \xrightarrow{\ \mathcal{F}\ } \widehat{f}_{out} \xrightarrow{\ \widehat{W}_{opt}\ } \widehat{\tilde{f}}_{in} \xrightarrow{\ \mathcal{F}^{-1}\ } \tilde{f}_{in}. \qquad (33)$$
Note that $\widehat{\Lambda}(\lambda)S_{f_{in}}(\lambda)\widehat{\Lambda}^{\dagger}(\lambda) + S_n(\lambda)$ is a nonnegative definite operator. Thus, its inverse exists but may be unbounded. In that case, $[\widehat{\Lambda}(\lambda)S_{f_{in}}(\lambda)\widehat{\Lambda}^{\dagger}(\lambda) + S_n(\lambda)]^{-1}$ can be interpreted as the pseudo-inverse. Similar to classical Wiener filtering, the results stated above can be extended to the case where the signal and the noise are correlated (Yazici, 2004).
Note that the proposed Wiener filter provides a regularized solution to the inversion problem. With an appropriate choice of prior and noise model, one can ensure that $\widehat{\Lambda}(\lambda)S_{f_{in}}(\lambda)\widehat{\Lambda}^{\dagger}(\lambda) + S_n(\lambda)$ is a positive definite operator with eigenvalues bounded away from zero. If a priori information on the unknown signal is not available, it can be assumed that $S_{f_{in}}(\lambda) = I(\lambda)$. Furthermore, when the measurements are free of noise, the Wiener filter becomes the minimum norm linear least squares filter given by

$$\widehat{W}_{opt}(\lambda) = \widehat{\Lambda}^{\dagger}(\lambda)\big[\widehat{\Lambda}(\lambda)\widehat{\Lambda}^{\dagger}(\lambda)\big]^{-1}. \qquad (34)$$

However, if $\widehat{\Lambda}(\lambda)$ is compact, this estimate is unstable in the sense that small deviations in the measurements lead to large fluctuations in the estimate. The zero-order Tikhonov regularization (Tikhonov and Arsenin, 1977) of the form

$$\widehat{W}_{opt}(\lambda) = \widehat{\Lambda}^{\dagger}(\lambda)\big[\widehat{\Lambda}(\lambda)\widehat{\Lambda}^{\dagger}(\lambda) + \sigma^2 I(\lambda)\big]^{-1} \qquad (35)$$

is equivalent to the case where $S_{f_{in}}(\lambda) = I(\lambda)$ and $S_n(\lambda) = \sigma^2 I(\lambda)$. Note that Kyatkin and Chirikjian (1998) provided Eq. (35) as a solution to the convolution equation over the Euclidean motion groups, which is a special case of the proposed Wiener filtering method. In the next two sections, we introduce two image reconstruction problems, namely wideband extended range-Doppler imaging for radar/sonar and inversion of the Radon and exponential Radon transforms for transmission and emission tomography. Both problems are addressed within the Wiener filtering framework introduced in Sections II to IV.
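For the additive group (R, +) the operators in Eq. (31) diagonalize to scalars under the ordinary DFT, and the filter reduces to the classical Wiener deconvolution formula. The sketch below (the signal, blur kernel, and noise level are illustrative assumptions of ours) applies the scalar form with an oracle signal spectrum:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 256
t = np.arange(n)

f_in = np.sin(2 * np.pi * t / 32) + 0.5 * np.sin(2 * np.pi * t / 10)
lam = np.exp(-t / 4.0)
lam /= lam.sum()                               # blur kernel (impulse response)
f_out = np.real(np.fft.ifft(np.fft.fft(f_in) * np.fft.fft(lam)))
f_out += 0.05 * rng.standard_normal(n)         # additive white noise

L = np.fft.fft(lam)
S_f = np.abs(np.fft.fft(f_in)) ** 2 / n        # (oracle) signal spectral density
S_n = np.full(n, 0.05 ** 2)                    # white noise spectral density
# scalar form of Eq. (31): W = S_f L* (|L|^2 S_f + S_n)^{-1}
W = S_f * np.conj(L) / (np.abs(L) ** 2 * S_f + S_n)
f_est = np.real(np.fft.ifft(np.fft.fft(f_out) * W))

mse_raw = float(np.mean((f_out - f_in) ** 2))
mse_wiener = float(np.mean((f_est - f_in) ** 2))
print(mse_raw, mse_wiener)     # the Wiener estimate has much smaller error
```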
V. WIDEBAND EXTENDED RANGE-DOPPLER IMAGING

In radar/sonar imaging, the transmitter emits an electromagnetic signal. The signal is reflected off a target and detected by the receiver as the echo signal. Assuming negligible acceleration of the reflector, the echo from a point reflector is modeled as a delayed and scaled replica of the transmitted pulse p (Cook and Bernfeld, 1967; Miller, 1991; Swick, 1969; Weiss, 1994):

$$e(t) = \sqrt{s}\,p\big(s(t-\tau)\big), \qquad (36)$$

where τ is the time delay and s is the time scale or Doppler stretch. The term s is given as $s = \frac{c-v}{c+v}$, where c is the speed of the transmitted signal and v is the radial velocity of the reflector. When the transmitted signal is narrowband, the echo model in Eq. (36) can be approximated as

$$e(t) = p(t-\tau)\,e^{i\omega t}, \qquad (37)$$
where ω is called the Doppler shift. In general, Eq. (36) is referred to as the wideband echo model and Eq. (37) as the narrowband echo model. The narrowband model is sufficient for most radar applications. However, for sonar and ultrawideband radar, the wideband model is needed (Taylor, 1995). It is often desirable to image a dense group of reflectors. This means that the target environment is composed of several objects, or a physically large object with a continuum of reflectors, and that the reflectors are very close in range-Doppler space. This dense group of reflectors is described by a reflectivity density function in the range-Doppler space. The received signal is modeled as a weighted average (Blahut, 1991; Miller, 1991; Naparst, 1991) of the time delayed and scaled versions of the transmitted pulse. For wideband signals, the echo model is given as

$$e(t) = \int_{-\infty}^{\infty}\int_{0}^{\infty} T_W(s,\tau)\,\frac{1}{\sqrt{s}}\,p\Big(\frac{t-\tau}{s}\Big)\,\frac{ds}{s^2}\,d\tau, \qquad (38)$$

where $T_W$ is the wideband reflectivity density function associated with each time delayed and scaled version of the transmitted signal p. For narrowband signals, the echo model is given as

$$e(t) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} T_N(\omega,\tau)\,p(t-\tau)\,e^{i\omega t}\,d\tau\,d\omega, \qquad (39)$$

where $T_N(\omega,\tau)$ is the narrowband reflectivity density function associated with each time-delayed and frequency-shifted version of the transmitted signal p. Note that, for wideband signals, the output of the matched filter becomes a convolution integral over the affine group:

$$A_c(s,\tau) = \int_{-\infty}^{\infty}\int_{0}^{\infty} T_W(a,b)\,\frac{1}{\sqrt{s}}\,A_a\Big(\frac{s}{a},\frac{\tau-b}{a}\Big)\,\frac{da}{a^2}\,db, \qquad (40)$$
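A discrete illustration of the wideband matched filter (a toy example of ours; the Gaussian chirp pulse and the search grid are assumptions): the cross-ambiguity surface of a single noiseless point-reflector echo, Eq. (36), peaks at the true scale-delay pair.

```python
import numpy as np

t = np.linspace(-8, 8, 2001)
dt = t[1] - t[0]

def pulse(t):
    return np.exp(-t**2) * np.cos(6 * t)     # an illustrative wideband pulse

s0, tau0 = 1.3, 1.5
e = np.sqrt(s0) * pulse(s0 * (t - tau0))     # noiseless echo, Eq. (36)

scales = np.linspace(0.7, 1.7, 21)
delays = np.linspace(-3.0, 3.0, 61)
amb = np.zeros((len(scales), len(delays)))
for i, s in enumerate(scales):
    for j, tau in enumerate(delays):
        r = np.sqrt(s) * pulse(s * (t - tau))    # scaled/delayed replica
        amb[i, j] = abs(np.sum(e * r) * dt)      # correlation (matched filter)

i0, j0 = np.unravel_index(np.argmax(amb), amb.shape)
print(scales[i0], delays[j0])    # peak lands near the true (1.3, 1.5)
```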
where $A_c$ denotes the cross-ambiguity function and $A_a$ denotes the auto-ambiguity function (Miller, 1991). In general, the received echo is contaminated with clutter and noise. Here, we model clutter as an echo signal from unwanted scatterers and noise as thermal noise. Therefore, the received signal is modeled as

$$y(t) = e_T(t) + e_C(t) + n(t), \qquad (41)$$
where n(t) is the thermal noise, eT (t) is the echo signal from the target modeled as in Eq. (38), and eC (t) is the echo signal from the clutter modeled
Figure 1. A block diagram of the range-Doppler echo model.
as

$$e_C(t) = \int_{-\infty}^{\infty}\int_{0}^{\infty} C(s,\tau)\,\frac{1}{\sqrt{s}}\,p\Big(\frac{t-\tau}{s}\Big)\,\frac{ds}{s^2}\,d\tau. \qquad (42)$$

Here, we refer to C as the clutter reflectivity density function. Figure 1 displays the components of the radar/sonar range-Doppler echo model. The goal in range-Doppler imaging is to estimate $T_W(a,b)$ given the transmitted and received signals and the clutter, noise, and target statistics. Clearly, the transmitted pulse plays a central role in the estimation of the target reflectivity density function. The two fundamental problems addressed in this chapter can be summarized as follows:

1. Receiver design problem. How can we recover the wideband target reflectivity density function $T_W$ in range-Doppler space from the measurements y(t), t ∈ R, embedded in clutter, given a priori target and clutter information?

2. Waveform design problem. Given the echo model embedded in clutter, what is the best set of waveforms to transmit for the optimal recovery of the target reflectivity density function, given prior information on the target and background clutter?

The wideband model described in Eq. (38) has been studied before (see Miller, 1991; Cook and Bernfeld, 1967; Swick, 1969; Weiss, 1994; Naparst, 1991; Rebollo-Neira et al., 1997, 2000 and references therein). Naparst (1991) and Miller (1991) suggested the use of the Fourier theory of the affine group and proposed a method to reconstruct the target reflectivity density function in a deterministic setting. Weiss (1994) suggested use of the wavelet transform for image recovery in a deterministic setting. However, this approach requires the target reflectivity function to be in the reproducing kernel Hilbert space of the transmitted wavelet signal. In Rebollo-Neira et al. (1997, 2000),
the approach in Weiss (1994) is extended to include affine frames. In all of these studies, the received signal is modeled as clutter and noise free, which is not a realistic assumption for radar or sonar measurements. Our approach is based on the observation that the received echo can be treated as the Fourier transform of the reflectivity density function evaluated at the transmitted pulse. We model target and clutter reflectivity as stationary processes on the affine group and use the Wiener filtering approach presented in Section IV to remove clutter by transmitting clutter rejecting waveforms. Our treatment starts with a review of the affine group and its Fourier transform. Next, we discuss the estimation of the target reflectivity function and the design of clutter rejecting waveforms. Note that this study does not address the suppression of additive noise. For the treatment of this case, see Yazici and Xie (2005).

A. Fourier Theory of the Affine Group

1. Affine Group

The affine group, or the ax + b group, is a two-parameter Lie group whose elements are given by 2 × 2 matrices of the form

$$\begin{pmatrix} a & b \\ 0 & 1 \end{pmatrix}, \qquad a \in \mathbb{R}^+,\ b \in \mathbb{R}, \qquad (43)$$

parameterized by the scale parameter a and the translation parameter b. The affine group operation is the usual matrix multiplication, that is, (a, b)(c, d) = (ac, ad + b), and the inverse elements are given by matrix inversion: $(a,b)^{-1} = (a^{-1}, -a^{-1}b)$. This defines the affine group as a semidirect product of the additive group (R, +) and the multiplicative group (R⁺, ×). For the rest of the chapter, the affine group is denoted by A. Let (s, τ) ∈ A, and let $L^2(A, s^{-2}\,ds\,d\tau)$ and $L^1(A, s^{-2}\,ds\,d\tau)$ denote the spaces of square summable and absolutely summable functions over A, respectively, that is,

$$\int_A \big|f(s,\tau)\big|^2\,\frac{ds}{s^2}\,d\tau < +\infty, \qquad \int_A \big|f(s,\tau)\big|\,\frac{ds}{s^2}\,d\tau < +\infty, \qquad (44)$$

where $s^{-2}\,ds\,d\tau$ is the left Haar measure of the affine group. The inner product of two functions $f_1$ and $f_2$ in $L^2(A, s^{-2}\,ds\,d\tau)$ is defined as

$$\langle f_1, f_2\rangle = \int_A f_1(s,\tau)\,\overline{f_2(s,\tau)}\,\frac{ds}{s^2}\,d\tau. \qquad (45)$$
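The group law, inverse, and left Haar measure above can be verified numerically (a self-contained sketch; the test function and the translation element h = (2, 1) are arbitrary choices of ours):

```python
import numpy as np

def aff(a, b):
    """Affine group element (a, b) as the 2x2 matrix of Eq. (43)."""
    return np.array([[a, b], [0.0, 1.0]])

# group law: matrix product gives (a, b)(c, d) = (ac, ad + b)
assert np.allclose(aff(2.0, 3.0) @ aff(0.5, -1.0), aff(1.0, 1.0))
# inverse: (a, b)^{-1} = (a^{-1}, -a^{-1} b)
assert np.allclose(np.linalg.inv(aff(2.0, 3.0)), aff(0.5, -1.5))

# left invariance of the Haar measure a^{-2} da db: substituting u = log a,
# the measure becomes e^{-u} du db, and integrals of f(g) and f(hg) agree.
u = np.linspace(-6, 6, 601)
b = np.linspace(-20, 20, 801)
U, B = np.meshgrid(u, b, indexing="ij")
du, db = u[1] - u[0], b[1] - b[0]
w = np.exp(-U)

def f(a_, b_):
    return np.exp(-np.log(a_) ** 2 - b_ ** 2)    # a smooth test function

I1 = np.sum(f(np.exp(U), B) * w) * du * db
# left translate by h = (2, 1): hg = (2a, 2b + 1)
I2 = np.sum(f(2 * np.exp(U), 2 * B + 1) * w) * du * db
print(I1, I2)   # equal up to quadrature error
```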
The affine group is a nonunimodular group, whose right Haar measure is $s^{-1}\,ds\,d\tau$. Note that for the affine group, the modular function is given by $\Delta(s,\tau) = s^{-1}$.

2. Fourier Transform over the Affine Group

There are exactly two nonequivalent, infinite dimensional, irreducible, unitary representations of the affine group, that is, $\lambda \in \{+,-\}$ and $U((s,\tau),\pm) = U_\pm(s,\tau)$. Let $U_+$ act on the representation space $H_+$ consisting of functions $\varphi_+$ whose Fourier transforms are supported on the right half-line, and let $U_-$ act on $H_-$, the orthogonal complement of $H_+$, consisting of functions $\varphi_-$ whose Fourier transforms are supported on the left half-line. Note that $L^2(\mathbb{R}, dt)$ is a direct sum of $H_+$ and $H_-$, that is, $L^2(\mathbb{R}) = H_+ \oplus H_-$. Then the representations

$$\big[U_\pm(s,\tau)\varphi_\pm\big](t) = \frac{1}{\sqrt{s}}\,\varphi_\pm\Big(\frac{t-\tau}{s}\Big) \qquad (46)$$

are unitary, nonequivalent, and irreducible in the spaces $H_+$ and $H_-$, respectively. The affine Fourier transform of a function $f \in L^2(A, s^{-2}\,ds\,d\tau)$ is defined as

$$\mathcal{F}_\pm(f) = \int_{-\infty}^{\infty}\int_{0}^{\infty} f(s,\tau)\,U_\pm(s,\tau)\,s^{-2}\,ds\,d\tau. \qquad (47)$$

The inverse affine Fourier transform is given by

$$f(s,\tau) = \sum_{\pm} \operatorname{trace}\big[U_\pm^{\dagger}(s,\tau)\,\mathcal{F}_\pm(f)\,\xi_\pm\big], \qquad (48)$$

where $U_\pm^{\dagger}(s,\tau)$ denote the adjoints of $U_\pm(s,\tau)$ and $\xi_\pm$ are the Hermitian positive definite operators introduced in Eq. (12) for the affine group. They are defined as

$$\big[\xi_\pm\varphi_\pm\big](t) = \mp\frac{i}{2\pi}\,\frac{d\varphi_\pm}{dt}(t). \qquad (49)$$

The convolution of two functions $f_1, f_2$ over the affine group is given by

$$(f_1 * f_2)(s,\tau) = \int_{-\infty}^{\infty}\int_{0}^{\infty} f_1(a,b)\,f_2\Big(\frac{s}{a},\frac{\tau-b}{a}\Big)\,\frac{da}{a^2}\,db, \qquad (s,\tau) \in A. \qquad (50)$$
Under the affine group Fourier transform, the convolution of two functions over the affine group becomes operator composition. More specifically,

$$\mathcal{F}_\pm(f_1 * f_2) = \mathcal{F}_\pm(f_1)\,\mathcal{F}_\pm(f_2). \qquad (51)$$

Let $\{s_\pm^n(t)\}$ denote sets of orthonormal differentiable bases for $H_\pm$, respectively. Define $s^n(t) = s_+^n(t) + s_-^n(t)$, $U(s,\tau) = U_+(s,\tau) \oplus U_-(s,\tau)$, $(s,\tau) \in A$, and $\xi = \xi_+ \oplus \xi_-$. Then for any $p \in L^2(\mathbb{R}, dx)$,

$$U(s,\tau)p = U_+(s,\tau)p_+ + U_-(s,\tau)p_- \qquad (52)$$

and, if p is differentiable,

$$\xi p = \xi_+ p_+ + \xi_- p_-, \qquad (53)$$

where $p_+$ and $p_-$ are the orthogonal components of p in $H_+$ and $H_-$, respectively. For a given orthonormal, differentiable basis $\{s_\pm^n(t)\}$ of $H_\pm$, the inverse affine Fourier transform can be expressed as

$$f(s,\tau) = \sum_{\pm}\sum_n \big\langle U_\pm^{\dagger}(s,\tau)\,\mathcal{F}_\pm(f)\,\xi_\pm s_\pm^n,\; s_\pm^n\big\rangle = \sum_n \big\langle \mathcal{F}(f)\,\xi s^n,\; U(s,\tau)s^n\big\rangle, \qquad (54)$$
where $\mathcal{F}(f) = \mathcal{F}_+(f) \oplus \mathcal{F}_-(f)$.

B. Target Reflectivity Estimation

Observe that the echo model [Eq. (38)] is, in fact, the affine Fourier transform of the target reflectivity density function $T_W$ evaluated at the transmitted signal p:

$$e_T(t) = \big[\mathcal{F}(T_W)p\big](t), \qquad (55)$$

where $e_T$ is the received target echo. Now, assume that the unknown target reflectivity density function $T_W(a,b)$ is a left affine stationary process contaminated with additive left affine stationary clutter $C(a,b)$ on the range-Doppler plane. It follows from Eqs. (29) and (31) that the optimal estimate of the target reflectivity density function in the mean square error sense is given by

$$\widetilde{T}_W = (T_W + C) * W_{opt}. \qquad (56)$$

Here, $W_{opt}$ is the Wiener filter over the affine group given by

$$\mathcal{F}_\pm(W_{opt}) = \big[S_\pm^T + S_\pm^C\big]^{-1}S_\pm^T, \qquad (57)$$
where $S_\pm^T$ and $S_\pm^C$ are the spectral density operators of the target and clutter, respectively. The affine Wiener filter can be estimated from a priori target and clutter information. Such information is routinely compiled for air defense radar (see Nathanson et al., 1999). The affine spectra, $S^T$ and $S^C$, of the target and clutter can be estimated from such information. Alternatively, Eq. (56) can be expressed as

$$\mathcal{F}_\pm(\widetilde{T}_W) = \mathcal{F}_\pm(T_W + C)\,\mathcal{F}_\pm(W_{opt}) \qquad (58)$$

or

$$\widetilde{T}_W(s,\tau) = \sum_{\pm} \operatorname{trace}\big[U_\pm^{\dagger}(s,\tau)\,\mathcal{F}_\pm(T_W + C)\,\mathcal{F}_\pm(W_{opt})\,\xi\big]. \qquad (59)$$

This estimate can be implemented in various forms leading to different adaptive receive and transmit algorithms (Yazici and Xie, 2005). Note that both the target and clutter spectra, $S_\pm^T$ and $S_\pm^C$, are not Hermitian operators, due to the nonunimodular nature of the affine group. However, it can be shown that $S_\pm^T\xi$ and $S_\pm^C\xi$ are Hermitian and nonnegative definite. So, we define

$$\mathcal{S}_\pm^T = S_\pm^T\xi \quad \text{and} \quad \mathcal{S}_\pm^C = S_\pm^C\xi. \qquad (60)$$
Below, we describe an algorithm to implement the estimate given in Eq. (61) and discuss how the estimation problem couples with the waveform design problem. 1. Receiver Design n Let {s± (t)} be a set of orthonormal basis for H± , respectively. Then, the target reflectivity estimate in Eq. (61) can be expressed as T C −1 T n n S± + TW (s, τ ) = S± s± , U± (s, τ )s± F± (TW + C)ξ S± ±
n
±
n
n n = F± (TW + C)˜s± , U± (s, τ )s± ,
(62)
where T n C −1 T n S± s± . =ξ S± + S± s˜±
(63)
Note that if $\tilde{s}^n = \tilde{s}_+^n + \tilde{s}_-^n$ is chosen as the transmitted pulse, then $y^n(t) = \big[\mathcal{F}(T_W + C)\tilde{s}^n\big](t)$ becomes the received echo, and Eq. (62) can be reexpressed as

$$\widetilde{T}_W(s,\tau) = \sum_n \big\langle y^n,\; U(s,\tau)s^n\big\rangle. \qquad (64)$$

This observation leads to the following algorithm for receiver and waveform design.

Algorithm.
1. Choose a set of orthonormal basis functions $\{s_\pm^n\}$ for $H_\pm$.
2. Transmit the pulse $\tilde{s}^n = \tilde{s}_+^n + \tilde{s}_-^n$, where $\tilde{s}_\pm^n = \xi\big[\mathcal{S}_\pm^C + \mathcal{S}_\pm^T\big]^{-1}\mathcal{S}_\pm^T s_\pm^n$.
3. At the receiver side, perform affine matched filtering for each channel as follows:

$$z^n(s,\tau) = \big\langle y^n,\; U(s,\tau)s^n\big\rangle, \qquad (65)$$

where $y^n$ is the received echo for the nth channel.
4. Coherently sum all channels:

$$\widetilde{T}_W(s,\tau) = \sum_n z^n(s,\tau). \qquad (66)$$
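The clutter rejection performed by steps 2 to 4 has a simple finite-dimensional analogue (a sketch under toy covariances of our own choosing, written in the matrix Wiener form rather than the operator form of Eq. (57)):

```python
import numpy as np

rng = np.random.default_rng(4)
d, n_real = 6, 20000

A = rng.standard_normal((d, d)); S_T = A @ A.T        # toy target covariance
B = 0.8 * rng.standard_normal((d, d)); S_C = B @ B.T  # toy clutter covariance

T = np.linalg.cholesky(S_T) @ rng.standard_normal((d, n_real))
C = np.linalg.cholesky(S_C) @ rng.standard_normal((d, n_real))
y = T + C                                             # "target + clutter" data

W = S_T @ np.linalg.inv(S_T + S_C)                    # matrix Wiener filter
T_hat = W @ y

mse_raw = float(np.mean(np.sum((y - T) ** 2, axis=0)))
mse_wiener = float(np.mean(np.sum((T_hat - T) ** 2, axis=0)))
print(mse_wiener < mse_raw)    # filtering reduces the clutter error
```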
So far, we have not specified how to choose the set of orthonormal basis functions $\{s_\pm^n\}$. Therefore, the wideband image formation algorithm described above is valid independent of the choice of transmitted waveforms. The orthogonal functions $\{s_\pm^n\}$ or their filtered counterparts $\{\tilde{s}_\pm^n\}$ do not need to be wideband signals. Thus, this reconstruction formula leads to a scenario in which there are multiple radars/sonars operating independently, each with a limited low-resolution aperture (i.e., narrowband transmission). Nevertheless, appropriate processing and fusion of data from multiple narrowband sensors lead to the formation of a synthetic wideband image.

C. Waveform Design

Note that in the image reconstruction described above, the MMSE is achieved irrespective of the basis functions or the transmitted pulses chosen. However, the requirement is that a complete set of modified basis functions $\{\tilde{s}^n\}$ must be transmitted to achieve the MMSE. In reality, we are only allowed to transmit a finite number of waveforms, say N. So the question is how to choose the N best waveforms in the MMSE sense.
Observe that the MMSE estimate in Eq. (59) can be written as

$$\widetilde{T}_W(s,\tau) = \sum_{\pm} \operatorname{trace}\big[U_\pm^{\dagger}(s,\tau)\,\mathcal{F}_\pm(\widetilde{T}_W)\,\xi\big] \qquad (67)$$

$$= \sum_{\pm}\sum_n \big\langle \mathcal{F}_\pm(\widetilde{T}_W)\,\xi s_\pm^n,\; U_\pm(s,\tau)s_\pm^n\big\rangle, \qquad (68)$$

where $\{s_\pm^n\}$ are orthonormal bases for $H_\pm$ and $s^n = s_+^n + s_-^n$. Let $T_n(s,\tau) = \langle \mathcal{F}(\widetilde{T}_W)\xi s^n,\; U(s,\tau)s^n\rangle$. Then,

$$\widetilde{T}_W(s,\tau) = \sum_n T_n(s,\tau). \qquad (69)$$

It can be easily verified that $\widetilde{T}_W(s,\tau)$ and $T_n(s,\tau)$ are affine stationary processes with the following properties:

1. $T_n(s,\tau)$ and $T_m(s,\tau)$ are jointly affine stationary.
2. $E[T_n(s,\tau)\overline{T_m(s,\tau)}] = 0$ if $n \neq m$.
3. $E[T_n(s,\tau)\overline{\widetilde{T}_W(s,\tau)}] = E[|T_n(s,\tau)|^2] = \langle \mathcal{F}(R_{\widetilde{T}_W})\xi s^n, s^n\rangle$, where $R_{\widetilde{T}_W}$ is the autocorrelation function of $\widetilde{T}_W(s,\tau)$.
4. $E[|\widetilde{T}_W(s,\tau)|^2] = \sum_{\pm}\operatorname{trace}\big(\mathcal{F}_\pm(R_{\widetilde{T}_W})\xi\big)$.
5. $R_{\widetilde{T}_W} = W_{opt}^* * (R_T + R_C) * W_{opt}$, where $W_{opt}^*(s,\tau) = \overline{W_{opt}\big((s,\tau)^{-1}\big)}$ and $R_T$, $R_C$ are the autocorrelation functions of the target reflectivity density process $T(s,\tau)$ and the clutter $C(s,\tau)$, respectively.
It follows from the above properties that if only N pulses are transmitted, then the mean square error is given by

$$E\bigg[\Big|\widetilde{T}_W - \sum_{n=1}^{N} T_n\Big|^2\bigg] = E\big[|\widetilde{T}_W|^2\big] + \sum_{n=1}^{N} E\big[|T_n|^2\big] - 2\sum_{n=1}^{N} E\big[T_n\overline{\widetilde{T}_W}\big]$$
$$= \sum_{\pm} \operatorname{trace}\big(\mathcal{F}_\pm(R_{\widetilde{T}_W})\xi\big) - \sum_{\pm}\sum_{n=1}^{N} \big\langle \mathcal{F}_\pm(R_{\widetilde{T}_W})\xi s_\pm^n,\; s_\pm^n\big\rangle$$
$$= \sum_{\pm}\sum_{n=N+1}^{\infty} \big\langle \mathcal{F}_\pm(R_{\widetilde{T}_W})\xi s_\pm^n,\; s_\pm^n\big\rangle. \qquad (70)$$

Note that
$$\mathcal{F}_\pm(R_{\widetilde{T}_W})\xi = \mathcal{F}_\pm\big(W_{opt}^* * (R_T + R_C) * W_{opt}\big)\xi$$
$$= \mathcal{F}_\pm(W_{opt}^*)\big[S_\pm^T + S_\pm^C\big]\big[S_\pm^T + S_\pm^C\big]^{-1}S_\pm^T\xi$$
$$= \mathcal{F}_\pm(W_{opt}^*)\,\mathcal{S}_\pm^T. \qquad (71)$$

Therefore, the MMSE is achieved if $s_\pm^n$, $n = 1, \ldots, N$, are chosen as the eigenfunctions of the operators $\mathcal{F}_\pm(W_{opt}^*)\mathcal{S}_\pm^T$ corresponding to the N largest eigenvalues. Thus, step 1 of the algorithm introduced in the previous subsection can be modified so that the orthonormal functions $\{s_\pm^n\}$, $n = 1, \ldots, N$, are chosen as the unit eigenfunctions of $\mathcal{F}_\pm(W_{opt}^*)\mathcal{S}_\pm^T$ corresponding to the N largest eigenvalues.
D. Numerical Experiments

For ease of computation, the transmitted waveforms used in the numerical simulations are derived from the Laguerre polynomials. Let

$$\hat{s}_+^n(\omega) = L_{n-1}(\omega)e^{-\omega/2}, \qquad \omega \in \mathbb{R}^+,\ n \in \mathbb{N}, \qquad (72)$$

where $\hat{s}_+^n(\omega)$ is the Fourier transform of $s_+^n(t)$ and $L_{n-1}$, $n \in \mathbb{N}$, are the Laguerre polynomials defined by

$$L_0(x) = 1, \qquad (73)$$
$$L_1(x) = -x + 1, \qquad (74)$$
$$L_{n+1}(x) = \frac{2n+1-x}{n+1}L_n(x) - \frac{n}{n+1}L_{n-1}(x), \qquad n \in \mathbb{N}. \qquad (75)$$

It is well known that (Abramowitz and Stegun, 1972)

$$\int_0^\infty e^{-x}L_m(x)L_n(x)\,dx = \begin{cases} 1 & m = n, \\ 0 & \text{else}. \end{cases} \qquad (76)$$

Therefore, $\{\hat{s}_+^n\}$ is an orthonormal basis for $L^2(\mathbb{R}^+, dx)$. Let $s_+^n$ be the standard inverse Fourier transform of $\hat{s}_+^n$. Then, $\{s_+^n\}$ is an orthonormal basis for $H_+$. Let $\hat{s}_-^n(\omega) = \hat{s}_+^n(-\omega)$, $\omega \in \mathbb{R}^-$. Then, $s_-^n(t) = \overline{s_+^n(t)}$, $t \in \mathbb{R}$, and $\{s_-^n\}$ is an orthonormal basis for $H_-$.
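The recurrence (75) and the orthonormality relation (76) can be checked directly (a self-contained sketch; the quadrature grid and truncation are our own choices):

```python
import numpy as np

def laguerre(n, x):
    """Laguerre polynomial L_n via the recurrence in Eq. (75)."""
    if n == 0:
        return np.ones_like(x)
    Lm, L = np.ones_like(x), 1.0 - x          # L_0 and L_1, Eqs. (73)-(74)
    for k in range(1, n):
        Lm, L = L, ((2 * k + 1 - x) * L - k * Lm) / (k + 1)
    return L

x = np.linspace(0.0, 80.0, 200001)            # truncated quadrature grid
dx = x[1] - x[0]
for m in range(4):
    for n in range(4):
        val = float(np.sum(np.exp(-x) * laguerre(m, x) * laguerre(n, x)) * dx)
        expected = 1.0 if m == n else 0.0     # Eq. (76)
        assert abs(val - expected) < 1e-3
print("orthonormality verified for L_0, ..., L_3")
```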
We generated realizations of the target and clutter based on the following spectral density operators $\mathcal{S}_\pm^T$ and $\mathcal{S}_\pm^C$ with respect to the bases $\{s_\pm^n\}$: $\mathcal{S}_\pm^T$ is the 10 × 10 symmetric tridiagonal matrix with main diagonal $(10, 9, 8, 7, 6, 5, 4, 3, 2, 1)$ and off-diagonal entries $(4.5, 4, 3.5, 3, 2.5, 2, 1.5, 1, 0.5)$, and

$$\mathcal{S}_\pm^C = \operatorname{diag}\big(0_{7\times 7},\ 4,\ 8,\ 10,\ 12,\ \ldots,\ 40\big). \qquad (77)$$

Figures 2a and 2b show a realization of the target and the target contaminated with clutter. We synthesized 10 realizations of the target and clutter at various signal-to-clutter ratios (SCR), defined as $\mathrm{SCR} = 20\log_{10}(\sigma_s^2/\sigma_c^2)$, where $\sigma_s^2$ and $\sigma_c^2$ are the target and clutter variances, respectively. Each realization is generated according to the spectral densities $\sigma_s^2\mathcal{S}_\pm^T$ and $\sigma_c^2\mathcal{S}_\pm^C$. The transmitted waveforms are chosen as

$$\tilde{s}^n = \xi\big[\mathcal{S}^C + \mathcal{S}^T\big]^{-1}\mathcal{S}^T s^n, \qquad n = 1, \ldots, 20. \qquad (78)$$

Figures 2c and 2d present the results obtained using the algorithm in Section V.B and Naparst's (1991) method. For Naparst's method, the transmitted pulses are the Hermite polynomial basis (Naparst, 1991). Figure 3 shows the mean square error (MSE) between the true and estimated reflectivity density functions for the proposed method and Naparst's method at different SCR levels. The MSE is calculated by averaging the error over 10 realizations of the true target and the estimated target at each SCR level. Our numerical study shows that the proposed method yields lower MSE than Naparst's method, particularly at low SCR.
Figure 2. Estimated target reflectivity function embedded in clutter. (a) True target reflectivity function. (b) Target reflectivity function embedded in clutter. (c) Estimated target reflectivity function using Naparst's method. (d) Estimated target reflectivity function by the proposed method.
VI. RADON AND EXPONENTIAL RADON TRANSFORMS

In transmission computed tomography, an X-ray beam with known energy is sent through an object, and the attenuated X-rays are collected by an array of detectors. The attenuation in the final X-ray beam provides a means of determining the mass density of the object along the path of the X-ray. In two dimensions, the relationship between the attenuation and the mass density along the X-ray path is given by the Radon transform. In general, the Radon transform R maps an integrable function f (the unknown image) over $\mathbb{R}^N$ to its integrals over the hyperplanes of $\mathbb{R}^N$ (Radon, 1917; Deans, 1983; Natterer, 1986; Helgason, 1999):

$$(\mathcal{R}f)(\vartheta, r) = \int_{\mathbb{R}^N} f(x)\,\delta(x\cdot\vartheta - r)\,dx, \qquad (79)$$
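Eq. (79) can be evaluated directly for a simple phantom (a sketch of ours): for the indicator of the unit disk the projections are the chord lengths $2\sqrt{1-r^2}$, independent of the direction ϑ.

```python
import numpy as np

def radon_disk(theta, r, n=200001, half_len=2.0):
    """Line integral of the unit-disk indicator over the line x . theta = r."""
    s = np.linspace(-half_len, half_len, n)          # parameter along the line
    v = np.array([np.cos(theta), np.sin(theta)])     # unit normal
    vp = np.array([-np.sin(theta), np.cos(theta)])   # direction of the line
    x = r * v[0] + s * vp[0]
    y = r * v[1] + s * vp[1]
    inside = (x**2 + y**2) <= 1.0
    return float(inside.sum() * (s[1] - s[0]))

for theta in (0.0, 0.7, 2.1):
    print(radon_disk(theta, 0.5))    # all close to 2*sqrt(1 - 0.25) = 1.7320...
```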
Figure 3. Mean square error between the estimated and true target reflectivity function.
where ϑ is the unit normal of the hyperplane, which is an element of the unit sphere $S^{N-1}$ in $\mathbb{R}^N$, and r ≥ 0 is the distance from the hyperplane to the origin. While for the current discussion we take N = 2, the results are valid for any integer N ≥ 2 (Yarman and Yazici, 2005c). There has been enormous interest in both the applied and theoretical communities in analyzing and developing inversion methods for the Radon transform (Barrett, 1984; Cormack, 1963, 1964; Gelfand et al., 1966; Helgason, 1999; Natterer, 1986; Pintsov, 1989). Although the research effort has been very diverse, inversion of the Radon transform can be broadly categorized into two approaches: analytic and algebraic. The analytic approach covers the works of Cormack (1964), Natterer (1986), and Barrett (1984), among many others. The research effort in this approach ultimately aims at numerical implementation of the Radon transform inversion. The foremost study in the algebraic approach can be found in the work of Helgason (1999) and the references therein. The algebraic approach is mainly concerned with the generalization of the Radon transform and the development of associated analysis methods in a group theoretic setting (Helgason, 1999, 2000; Gelfand et al., 1966; Rouviére, 2001; Strichartz, 1981). In Helgason (1999), the domains of the unknown function and its projections are given by the homogeneous spaces of the Euclidean motion group, M(N), of the N-dimensional
Euclidean space, namely by M(N)/SO(N) and M(N)/(Z₂ × M(N − 1)). The underlying group structure enables the development of a generalization of the Radon transform for the homogeneous spaces of locally compact unimodular groups. This leads to a generalization of the backprojection operator and the development of generalized filtered backprojection-type inversion methods for the generalized Radon transforms. In emission tomography, an object is identified with the emission distribution of a radiochemical substance inside the object. The measurements depend on both the emission distribution of the radiochemical substance and the attenuation distribution of the object. For a uniform attenuation, the relationship between the measurements and the emission distribution is represented by the exponential Radon transform. The exponential Radon transform of a compactly supported real valued function f over $\mathbb{R}^2$, for a uniform attenuation coefficient $\mu \in \mathbb{C}$, is defined as (Natterer, 1986)

$$(T_\mu f)(\vartheta, t) = \int_{\mathbb{R}^2} f(x)\,\delta(x\cdot\vartheta - t)\,e^{\mu\, x\cdot\vartheta^\perp}\,dx, \qquad (80)$$

where $t \in \mathbb{R}$, $\vartheta = (\cos\theta, \sin\theta)^T$ is a unit vector on $S^1$ with $\theta \in [0, 2\pi)$, and $\vartheta^\perp = (-\sin\theta, \cos\theta)^T$. Clearly, the Radon transform is a special case of the exponential Radon transform with μ = 0. The exponential Radon transform constitutes a mathematical model for imaging modalities such as X-ray tomography (μ = 0) (Cormack, 1963), single photon emission tomography (SPECT) (μ ∈ R) (Tretiak and Metz, 1980), and optical polarization tomography of the stress tensor field (μ ∈ iR) (Puro, 2001). A number of different approaches have been proposed for the exponential Radon transform inversion (Bellini et al., 1979; Tretiak and Metz, 1980; Inouye et al., 1989; Hawkins et al., 1988; Kuchment and Shneiberg, 1994; Metz and Pan, 1995). Bellini et al. (1979) reduced the inversion of the exponential Radon transform to solving an ordinary differential equation, which leads to a relationship between the circular harmonic decomposition of the Fourier transform of the projections $(T_\mu f)$ and the circular harmonic decomposition of the Fourier transform of the function f. An alternative method, based on the circular harmonic decompositions of the Fourier transforms of f and $(T_\mu f)$, was derived by Tretiak and Metz (1980) and Inouye et al. (1989) using two different approaches. Tretiak and Metz (1980) also introduced a filtered backprojection-type inversion method. Hawkins et al. (1988) used the filtered backprojection method introduced by Tretiak and Metz together with the circular harmonic decomposition and expressed the circular harmonic decomposition of the function f in terms of the circular harmonic decomposition of its projections
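Along the line $x = t\vartheta + s\vartheta^\perp$ we have $x\cdot\vartheta^\perp = s$, so Eq. (80) weights the line integral by $e^{\mu s}$. For the unit disk this gives the closed form $2\sinh(\mu L)/\mu$ with $L = \sqrt{1-t^2}$, which the sketch below (a check of our own) reproduces:

```python
import numpy as np

def exp_radon_disk(theta, t, mu, n=200001, half_len=2.0):
    """Exponential Radon transform of the unit-disk indicator, Eq. (80)."""
    s = np.linspace(-half_len, half_len, n)
    v = np.array([np.cos(theta), np.sin(theta)])     # direction theta
    vp = np.array([-np.sin(theta), np.cos(theta)])   # perpendicular direction
    x = t * v[0] + s * vp[0]
    y = t * v[1] + s * vp[1]
    inside = (x**2 + y**2) <= 1.0
    return float(np.sum(inside * np.exp(mu * s)) * (s[1] - s[0]))

mu, t0 = 0.4, 0.3
L = np.sqrt(1 - t0**2)
print(exp_radon_disk(1.1, t0, mu), 2 * np.sinh(mu * L) / mu)   # agree
# mu -> 0 recovers the ordinary Radon transform (chord length)
print(exp_radon_disk(0.3, 0.0, 1e-9))
```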
$(T_\mu f)$. Later, the filtered backprojection method was extended to the angle-dependent exponential Radon transform by Kuchment and Shneiberg (1994). Reconstruction of images from the data collected by the aforementioned imaging systems requires inversion of the exponential Radon transform. Here we present an alternative deconvolution-type inversion approach for the Radon and exponential Radon transforms based on the harmonic analysis of the Euclidean motion group. The proposed deconvolution-type inversion method is a special case of the Wiener filtering method presented in Section IV. In the current discussion, we address the inversion problem in a deterministic setting. However, it should be noted that many emission tomography applications require rigorous treatment of noise, which can be addressed by extending the approach presented here to the statistical setting introduced in Section IV. Furthermore, the inversion algorithm can be implemented efficiently using fast Fourier algorithms over M(2) (Kyatkin and Chirikjian, 2000; Yarman and Yazici, 2003, 2005a, 2005b, 2005c, 2005d).

A. Fourier Transform over M(2)

The rigid motions of $\mathbb{R}^2$ form a group called the Euclidean motion group M(2). The elements of the group are the 3 × 3 matrices of the form

$$(R_\theta, r) = \begin{pmatrix} R_\theta & r \\ 0^T & 1 \end{pmatrix}, \qquad R_\theta \in SO(2),\ r \in \mathbb{R}^2, \qquad (81)$$

parameterized by a rotation component θ and a translation component r. SO(2) is the special orthogonal group, whose elements are 2 × 2 orthogonal matrices with determinant equal to 1 and whose group operation is the usual matrix multiplication. The group operation of M(2) is the usual matrix multiplication, and the inverse of an element is obtained by matrix inversion as $(R_\theta, r)^{-1} = (R_\theta^{-1}, -R_\theta^{-1}r)$. Let $f_1$ and $f_2$ be two integrable functions over M(2). Then, the inner product $\langle f_1, f_2\rangle$ and the convolution $(f_1 * f_2)$ of $f_1$ and $f_2$ are defined as

$$\langle f_1, f_2\rangle = \int_{M(2)} f_1(R_\phi, x)\,\overline{f_2(R_\phi, x)}\,d(R_\phi)\,dx, \qquad (82)$$

$$(f_1 * f_2)(R_\theta, r) = \int_{M(2)} f_1(R_\theta R_\phi, R_\theta x + r)\,f_2\big(R_\phi^{-1}, -R_\phi^{-1}x\big)\,d(R_\phi)\,dx, \qquad (83)$$

where $(R_\theta, r), (R_\phi, x) \in M(2)$, and $d(R_\phi)\,dx$ is the normalized invariant Haar measure on M(2), with $d(R_\phi)$ being the normalized measure on SO(2).
For an explanation of SO(2) d(Rφ ), we refer the reader to Section VII.2 of Natterer (1986). Let f be a function defined over M(2). f is said to be L2 (M(2), d(Rφ ) dx) if f (Rφ , x)2 d(Rφ ) dx < ∞. (84) M(2)
The irreducible unitary representations U ((Rθ , r), λ) of M(2) are given by the following linear operators (Vilenkin, 1988) U (Rθ , r), λ F (s) = e−iλ(r·s) F Rθ−1 s , F ∈ L2 S 1 , d(ω) , (85) where s is a point on the unit circle S 1 , ( · ) is the standard inner product over R2 , and λ is a nonnegative real number. Since the circular harmonics {Sm } form an orthonormal basis for L2 (S 1 , d(ω)) (Seeley, 1966), the matrix elements Umn ((Rθ , r), λ) of U ((Rθ , r), λ) are given by Vilenkin (1988): Umn (Rθ , r), λ = Sm , U (Rθ , r), λ Sn = Sm (ω)e−iλr·ω Sn Rθ−1 ω d(ω), (86) S1
where $d(\omega)$ is the normalized Haar measure on the unit circle. If the complex exponentials $\{e^{in\psi}\}$, $n \in \mathbb{Z}$, are chosen as the orthonormal basis for $L^2(S^1, d(\omega))$, the matrix elements for the unitary representation $U(g, \lambda)$ of M(2) become (Vilenkin, 1988)

$$U_{mn}\big((R_\theta, r), \lambda\big) = \big\langle e^{im\psi}, U\big((R_\theta, r), \lambda\big)e^{in\psi}\big\rangle = \frac{1}{2\pi}\int_0^{2\pi} e^{-im\psi}\,e^{-i(r_1\lambda\cos\psi + r_2\lambda\sin\psi)}\,e^{in(\psi-\theta)}\,d\psi, \qquad \forall m, n \in \mathbb{Z}. \qquad (87)$$
The matrix elements of $U((R_\theta, r), \lambda)$ satisfy the following properties:

$$U_{mn}\big((R_\theta^{-1}, -R_\theta^{-1}r), \lambda\big) = U_{mn}^{-1}\big((R_\theta, r), \lambda\big) = \overline{U_{nm}\big((R_\theta, r), \lambda\big)}, \qquad (88)$$

$$U_{mn}\big((R_\phi R_\theta, R_\phi r + x), \lambda\big) = \sum_k U_{mk}\big((R_\phi, x), \lambda\big)\,U_{kn}\big((R_\theta, r), \lambda\big). \qquad (89)$$

Furthermore, the matrix elements $U_{mn}((R_\theta, r), \lambda)$ of $U((R_\theta, r), \lambda)$ form a complete orthonormal system in $L^2(M(2), d(R_\phi)\,dx)$.
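The matrix elements in Eq. (87) can be evaluated by direct quadrature, which gives a quick numerical check of properties (88) and (89). The sketch below is our own illustration (not the authors' code); the sum in (89) is truncated at |k| ≤ K, which is harmless here because the elements decay rapidly in |n − m| for moderate λ|r|.

```python
import numpy as np

def U(m, n, theta, r, lam, num=512):
    # U_mn((R_theta, r), lambda) of Eq. (87) by trapezoidal quadrature;
    # the integrand is smooth and 2*pi-periodic, so this converges fast.
    psi = np.linspace(0.0, 2*np.pi, num, endpoint=False)
    vals = np.exp(-1j*m*psi
                  - 1j*lam*(r[0]*np.cos(psi) + r[1]*np.sin(psi))
                  + 1j*n*(psi - theta))
    return vals.mean()

def rot(a):
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

theta, phi = 0.7, -1.2
r = np.array([0.3, -0.5])
x = np.array([0.1, 0.4])
lam, m, n, K = 1.5, 1, -2, 25

# unitarity, Eq. (88): U_mn(g^{-1}, lambda) = conj(U_nm(g, lambda))
assert abs(U(m, n, -theta, -rot(-theta) @ r, lam)
           - np.conj(U(n, m, theta, r, lam))) < 1e-8

# homomorphism, Eq. (89), with the sum over k truncated to |k| <= K
lhs = U(m, n, phi + theta, rot(phi) @ r + x, lam)
rhs = sum(U(m, k, phi, x, lam) * U(k, n, theta, r, lam)
          for k in range(-K, K + 1))
assert abs(lhs - rhs) < 1e-8
```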
Let $f \in L^2(M(2))$. The Fourier transform over M(2) of f is defined as (Sugiura, 1975; Vilenkin, 1988)

$$\mathcal{F}(f)(\lambda) = \hat f(\lambda) = \int_{M(2)} f(R_\theta, r)\,U\big((R_\theta, r)^{-1}, \lambda\big)\,d(R_\theta)\,dr, \qquad (90)$$

where $d(R_\theta)\,dr = dr\,d(R_\theta)$ is the normalized Haar measure on M(2). Then the inverse Fourier transform is given by

$$\mathcal{F}^{-1}(\hat f)(R_\theta, r) = f(R_\theta, r) = \frac{1}{(2\pi)^2}\int_0^\infty \operatorname{trace}\Big(\hat f(\lambda)\,U\big((R_\theta, r), \lambda\big)\Big)\,\lambda\,d\lambda. \qquad (91)$$
The matrix elements of the Fourier transform over M(2) are given by

$$\mathcal{F}(f)_{mn}(\lambda) = \hat f_{mn}(\lambda) = \int_{M(2)} f(R_\theta, r)\,U_{mn}\big((R_\theta, r)^{-1}, \lambda\big)\,d(R_\theta)\,dr, \qquad (92)$$

for λ > 0. Then the corresponding inverse Fourier transform is given by

$$\mathcal{F}^{-1}(\hat f_{mn})(R_\theta, r) = f(R_\theta, r) = \frac{1}{(2\pi)^2}\sum_{m,n}\int_0^\infty \hat f_{mn}(\lambda)\,U_{nm}\big((R_\theta, r), \lambda\big)\,\lambda\,d\lambda. \qquad (93)$$
Let $f, f_1, f_2 \in L^2(M(2), d(R_\phi)\,dx)$. The Fourier transform over M(2) satisfies the following properties:

1. Adjoint property:

$$\widehat{f^{*}}_{mn}(\lambda) = \overline{\hat f_{nm}(\lambda)}, \qquad (94)$$

where $f^{*}(R_\theta, r) = f\big((R_\theta, r)^{-1}\big)$.

2. Convolution property:

$$\mathcal{F}(f_1 * f_2)_{mn}(\lambda) = \sum_{q} \hat f_{2\,mq}(\lambda)\,\hat f_{1\,qn}(\lambda). \qquad (95)$$

3. Plancherel relation: Let

$$\langle \hat f_1, \hat f_2\rangle = \frac{1}{(2\pi)^2}\sum_{m,n}\int_0^\infty \hat f_{1\,mn}(\lambda)\,\widehat{f_2^{*}}_{nm}(\lambda)\,\lambda\,d\lambda; \qquad (96)$$

then $\langle f_1, f_2\rangle = \langle \hat f_1, \hat f_2\rangle$.
4. SO(2) invariance I: If f is an SO(2)-invariant function over M(2), that is, $f(g) = f(r) \in L^2(\mathbb{R}^2)$, then

$$\hat f_{mn}(\lambda) = \delta_m\,\tilde f_n(-\lambda), \qquad (97)$$

where $\delta_m$ is the Kronecker delta function and $\tilde f_n(\lambda)$ is the circular harmonic decomposition of the standard Fourier transform $\tilde f$ of f:

$$\tilde f_n(\lambda) = \int_{S^1} \tilde f(\lambda\omega)\,\overline{S_n(\omega)}\,d(\omega), \qquad \text{where}\quad \tilde f(\varepsilon) = \int_{\mathbb{R}^2} f(x)\,e^{-i\varepsilon\cdot x}\,dx. \qquad (98)$$
5. SO(2) invariance II: Let f be an SO(2)-invariant function over M(2) and $\varphi(g) = \delta(R_\theta)f(r)$. Then

$$\hat\varphi_{mn}(\lambda) = \int_{S^1} \tilde f(-\lambda\omega)\,S_n(\omega)\,\overline{S_m(\omega)}\,d(\omega), \qquad (99)$$

and $\hat\varphi_{0n}(\lambda) = \tilde f_n(-\lambda)$.

The Fourier transform over M(2) can be extended to the space of compactly supported functions $\mathcal{D}(M(2))$ and rapidly decreasing functions $\mathcal{S}(M(2))$ and is injective (Sugiura, 1975). Furthermore, the Fourier transform over M(2), together with its properties, can be extended to the space of tempered distributions $\mathcal{S}'(M(2))$ and compactly supported distributions $\mathcal{E}'(M(2))$, as shown in Appendix B.

B. Radon and Exponential Radon Transforms as Convolutions

1. Radon Transform

Let $\delta(R_\phi)$ denote the distribution over SO(2) defined as follows:

$$\int_{SO(2)} \delta(R_\phi)\,\varphi(R_\phi)\,d(R_\phi) = \varphi(I), \qquad (100)$$
where I is the identity rotation. Let $g = (R_\theta, r)$, $h = (R_\phi, x) \in M(2)$, $\vartheta = R_\theta^T e_1$, and $r_1 = r \cdot e_1$, where $e_i$ denotes the unit vector in $\mathbb{R}^2$ whose ith component is equal to 1. Then the Radon transform of a real-valued function f can be written as a convolution over M(2) as follows:
$$\begin{aligned}
(Rf)(\vartheta, -r_1) &= \int_{\mathbb{R}^2} f(x)\,\delta\big(x \cdot R_\theta^T e_1 + r_1\big)\,dx\\
&= \int_{SO(2)}\int_{\mathbb{R}^2} \delta\big((R_\theta x + r)\cdot e_1\big)\,\delta(R_\phi)\,f(x)\,dx\,d(R_\phi)\\
&= \int_{M(2)} \Lambda_R(gh)\,\check f(h)\,d(R_\phi)\,dx\\
&= (\Lambda_R * \check f^{\,*})(g), \qquad (101)
\end{aligned}$$
where $\check f^{\,*}(h) = \check f(h^{-1})$, $\Lambda_R(h) = \delta(x \cdot e_1)$, δ being the Dirac delta distribution (Gelfand and Shilov, 1964), and $\check f(h) = \delta(R_\phi)f(x)$. Alternative formulations of the convolution representation of the Radon transform were given in Yarman and Yazici (2003, 2005c, 2005d). Note that for the rest of this chapter, we shall use the integral representation of distributions.

2. Exponential and Angle-Dependent Exponential Radon Transforms

Motivated by the Radon transform, we now present the convolution representation of the angle-dependent exponential Radon transform. Let $\vartheta = R_\theta^T e_1$ and $r_1 = r \cdot e_1$ for some $R_\theta \in SO(2)$. The angle-dependent exponential Radon transform of a real-valued function is given by

$$\begin{aligned}
(T_{\mu(\vartheta)} f)(\vartheta, -r_1) &= \int_{\mathbb{R}^2} f(x)\,\delta(x \cdot \vartheta + r_1)\,e^{\mu(\vartheta)\,x\cdot\vartheta^{\perp}}\,dx\\
&= \int_{SO(2)}\int_{\mathbb{R}^2} f(x)\,\delta(R_\phi)\,\delta\big((R_\theta x + r)\cdot e_1\big)\exp\big(\mu\big((R_\theta R_\phi)^T e_2\big)\,R_\theta x \cdot e_2\big)\,dx\,d(R_\phi). \qquad (102)
\end{aligned}$$

Multiplying both sides of Eq. (102) by $e^{\mu(\vartheta) r_2}$, the resulting operator $(\check T_{\mu(\vartheta)} f)$ can be represented as a convolution over M(2):

$$(\check T_{\mu(\vartheta)} f)(g) = e^{\mu(\vartheta) r_2}\,(T_{\mu(\vartheta)} f)(\vartheta, -r_1) = (\Lambda_{T_\mu(\vartheta)} * \check f^{\,*})(g), \qquad (103)$$

where $\check f(g) = \delta(R_\theta)f(r)$ and $\Lambda_{T_\mu(\vartheta)}(g) = \delta(r \cdot e_1)\,e^{\mu(R_\theta^T e_2)\,r\cdot e_2}$. From Eq. (103), the angle-dependent exponential Radon transform can be visualized as an operator that fixes the function f while traversing an exponentially weighted projection line, where the weight is determined by the orientation of the line. When μ is independent of the angle, that is, μ(ϑ) = μ
for some fixed μ ∈ C, the angle-dependent exponential Radon transform reduces to the exponential Radon transform, with $(\check T_\mu f)(g) = e^{\mu r_2}(T_\mu f)(\vartheta, -r_1)$. We refer to $(\check T_\mu f)$ as the modified exponential Radon transform of f.

C. Inversion Methods

Let $A \in \{R, T_\mu\}$. Then one can treat (Af) as a tempered distribution that is the convolution of two distributions (see Appendix B for the definition of the convolution of distributions), $\Lambda_A$ and $\check f$, given as in Section VI.B:

$$(Af)(g) = (\Lambda_A * \check f^{\,*})(g), \qquad (104)$$

in which $\check f$ is compactly supported and $\Lambda_A \in \mathcal{S}'(M(2))$. Using the convolution property of the Fourier transform over M(2), Eq. (104) can be written as (see Appendix B for the Fourier transform of distributions over M(2))

$$\widehat{Af}(\lambda) = \hat{\check f}^{\,\dagger}(\lambda)\,\hat\Lambda_A(\lambda). \qquad (105)$$

If $\hat\Lambda_A(\lambda)$ is invertible, then $\check f$ can be obtained as

$$\check f(h) = \mathcal{F}^{-1}\Big[\big(\widehat{Af}(\lambda)\,\hat\Lambda_A^{-1}(\lambda)\big)^{\dagger}\Big](h). \qquad (106)$$

Since $\hat\Lambda_A(\lambda)$ is rank deficient, we replace the inverse of $\hat\Lambda_A(\lambda)$ with the special case of the optimal filter $\hat W_{opt}(\lambda)$ of Eq. (35) of Section IV, given as

$$\hat W_{opt}(\lambda) = \big(\hat\Lambda_A^{\dagger}(\lambda)\hat\Lambda_A(\lambda) + \sigma I(\lambda)\big)^{-1}\hat\Lambda_A^{\dagger}(\lambda), \qquad (107)$$

where I(λ) is the identity operator and σ is a small positive constant. Thus,

$$\check f(h) \approx \mathcal{F}^{-1}\Big[\hat W_{opt}^{\dagger}(\lambda)\,\widehat{Af}^{\,\dagger}(\lambda)\Big](h) \qquad (108)$$

$$\phantom{\check f(h)} = \mathcal{F}^{-1}\Big[\hat\Lambda_A(\lambda)\big(\hat\Lambda_A^{\dagger}(\lambda)\hat\Lambda_A(\lambda) + \sigma I(\lambda)\big)^{-1}\widehat{Af}^{\,\dagger}(\lambda)\Big](h), \qquad (109)$$

which is a regularized linear least-squares approximation of $\check f$.

For the Radon and modified exponential Radon transforms with uniform attenuation, $\Lambda_A$ is an SO(2)-invariant distribution. Hence, by the SO(2) invariance properties of the Fourier transform over M(2), $\hat\Lambda_A$ is rank one. Then Eq. (105) can be simplified as

$$\widehat{Af}_{mn}(\lambda) = \tilde f_m(-\lambda)\,\hat\Lambda_{A\,0n}(\lambda), \qquad (110)$$
where $\tilde f_m(\lambda)$ is the circular harmonic decomposition of the standard Fourier transform of f. For $A = T_\mu$, $\widehat{\check T_\mu f}_{mn}(\lambda)$ and $\hat\Lambda_{T_\mu\,0n}(\lambda)$ are given by

$$\widehat{\check T_\mu f}_{mn}(\lambda) = \frac{1}{\lambda^{\,n-m+1}}\Big[\widetilde{(T_\mu f)}_{-m}\big(\sqrt{\lambda^2+\mu^2}\,\big)\big({-\mu}+\sqrt{\lambda^2+\mu^2}\,\big)^{n-m} + \widetilde{(T_\mu f)}_{-m}\big({-\sqrt{\lambda^2+\mu^2}}\,\big)\big({-\mu}-\sqrt{\lambda^2+\mu^2}\,\big)^{n-m}\Big], \qquad (111)$$

$$\hat\Lambda_{T_\mu\,mn}(\lambda) = \delta_m\,\lambda^{-n-1}\Big[(-1)^n\big(\sqrt{\mu^2+\lambda^2}+\mu\big)^n + \big(\sqrt{\mu^2+\lambda^2}-\mu\big)^n\Big]. \qquad (112)$$

Here, $\widetilde{(T_\mu f)}_m(\sigma)$ denotes the circular harmonic decomposition of the one-dimensional standard Fourier transform of $(T_\mu f)(\theta, r)$:

$$\widetilde{(T_\mu f)}_m(\sigma) = \int_{S^1} \widetilde{(T_\mu f)}(\omega, \sigma)\,\overline{S_m(\omega)}\,d(\omega), \qquad (113)$$

where

$$\widetilde{(T_\mu f)}(\theta, \sigma) = \int_{\mathbb{R}} (T_\mu f)(\theta, r)\,e^{-i\sigma r}\,dr. \qquad (114)$$

Substitution of $\widehat{\check T_\mu f}_{mn}(\lambda)$ and $\hat\Lambda_{T_\mu\,0n}(\lambda)$ into Eq. (110) gives the following relationship between f and $(T_\mu f)$ (Yarman and Yazici, 2005a, 2005b):

$$\tilde f_{-m}(\lambda) = \widetilde{(T_\mu f)}_{-m}\big(\sqrt{\lambda^2+\mu^2}\,\big)\,\frac{\lambda^m\big({-\mu}+\sqrt{\lambda^2+\mu^2}\,\big)^{n-m}}{(-1)^n\big(\sqrt{\mu^2+\lambda^2}+\mu\big)^n + \big(\sqrt{\mu^2+\lambda^2}-\mu\big)^n} + \widetilde{(T_\mu f)}_{-m}\big({-\sqrt{\lambda^2+\mu^2}}\,\big)\,\frac{\lambda^m\big({-\mu}-\sqrt{\lambda^2+\mu^2}\,\big)^{n-m}}{(-1)^n\big(\sqrt{\mu^2+\lambda^2}+\mu\big)^n + \big(\sqrt{\mu^2+\lambda^2}-\mu\big)^n}, \qquad (115)$$

for any integer n. For μ = 0, Eq. (115) gives the circular harmonic decomposition of the projection slice theorem (Yarman and Yazici, 2005c). Hence, as long as $\hat\Lambda_{A\,0k}(\lambda) \neq 0$, $k \in \mathbb{Z}$, a simplified inversion formula can be obtained as

$$f(x) = \mathcal{F}^{-1}\big[\delta_m\,\tilde f_n(-\lambda)\big](x) = \mathcal{F}^{-1}\Bigg[\delta_m\,\frac{\widehat{Af}_{nk}(\lambda)}{\hat\Lambda_{A\,0k}(\lambda)}\Bigg](x). \qquad (116)$$
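For μ = 0 the relationship above reduces to the projection slice theorem, which is easy to verify numerically. The following sketch (our own check, using a Gaussian for which everything is known in closed form; not the chapter's code) computes a projection by quadrature, takes its 1D Fourier transform as in Eq. (114), and compares it with the corresponding slice of the 2D Fourier transform.

```python
import numpy as np

# f(x) = exp(-|x|^2/2) has standard 2D Fourier transform 2*pi*exp(-|eps|^2/2)
# in the convention of Eq. (98).  The projection slice theorem says the 1D
# Fourier transform of any projection (Rf)(theta, .) equals that radial slice.
s = np.linspace(-8.0, 8.0, 1601)      # signed offsets of the projection lines
ds = s[1] - s[0]

# projection by quadrature over the coordinate t along each line
proj = (np.exp(-s[:, None]**2 / 2) * np.exp(-s[None, :]**2 / 2)).sum(axis=1) * ds

sigma = 1.3
ft_proj = (proj * np.exp(-1j * sigma * s)).sum() * ds   # 1D FT as in Eq. (114)
assert abs(ft_proj - 2 * np.pi * np.exp(-sigma**2 / 2)) < 1e-8
```

The agreement holds for every angle because the Gaussian is isotropic; for a general image each angle selects a different radial slice.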
Numerical inversion of the Radon and exponential Radon transforms based on Eq. (108) will be presented in Section VI.E. Inversion methods based on the simplified Eqs. (110) and (116) are presented in Yarman and Yazici (2005a, 2005b, 2005c). For the modified angle-dependent exponential Radon transform, neither $\check f$ nor $\Lambda_{T_\mu(\vartheta)}$ is SO(2) invariant. Therefore, Eq. (106) or Eq. (108) has to be used to recover $\check f$. Assuming that μ is known, f can be recovered from $\check f$ by

$$f(x) = \int_{SO(2)} \check f(R_\phi, x)\,d(R_\phi). \qquad (117)$$

In the next section, we present numerical reconstruction algorithms based on Eq. (108).
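At the heart of the reconstruction based on Eq. (108) is, for each λ, a Tikhonov-regularized least-squares solve with the rank-deficient matrix $\hat\Lambda_A(\lambda)$. The toy example below (ours, not the chapter's MATLAB implementation) applies the filter of Eq. (107) to a rank-one matrix and confirms that it recovers exactly the component of the unknown that the operator sees.

```python
import numpy as np

def w_opt(Lam, sigma):
    # regularized filter of Eq. (107): (Lam^† Lam + sigma I)^(-1) Lam^†
    n = Lam.shape[1]
    return np.linalg.solve(Lam.conj().T @ Lam + sigma * np.eye(n),
                           Lam.conj().T)

u = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
v = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
Lam = np.outer(u, v)                   # rank-one operator, like Lambda_A(lambda)
f = np.array([0.3, -0.1, 2.0, 0.7, -1.5])
g = Lam @ f                            # noiseless "projection" data

f_rec = w_opt(Lam, 1e-6) @ g
f_proj = (f @ v) / (v @ v) * v         # the component of f that Lam sees
assert np.allclose(f_rec, f_proj, atol=1e-5)
```

The choice of σ trades bias against noise amplification: a larger σ damps the recoverable component slightly, while a very small σ amplifies whatever leaks into the null-space directions.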
D. Numerical Algorithms

1. Fourier Transform over M(2)

The computational complexity of the inversion algorithms is directly related to that of the Fourier transform over M(2). After reordering the integrals, the Fourier coefficients $\hat f_{mn}(\lambda)$ of f over M(2) can be expressed as

$$\hat f_{mn}(\lambda) = \int_{S^1}\Bigg[\int_{SO(2)}\bigg(\int_{\mathbb{R}^2} f(R_\theta, r)\,e^{i\lambda r\cdot\omega}\,dr\bigg)\,S_m\big(R_\theta^{-1}\omega\big)\,d(\theta)\Bigg]\,S_n(\omega)\,d(\omega). \qquad (118)$$

Choosing $S_n$ as the complex exponentials, that is, $S_n(\omega) = e^{in\omega}$, Eq. (118) can be evaluated as four consecutive standard Fourier transforms. For a detailed description of the Fourier transform algorithm over M(2) based on Eq. (118), we refer to Kyatkin and Chirikjian (2000) and Yarman and Yazici (2003, 2005c). If there are O(K) samples in each of SO(2) and $\mathbb{R}$, the computational complexity of the Fourier transform implementation over M(2) described above is $O(K^3 \log K)$. If the projections and the unknown function do not depend on the $r_2$ component and SO(2), the computational complexity of the Fourier coefficients of the projections and of the inverse Fourier transform of $\hat f_{mn}(\lambda)$ reduces to $O(K^2 \log K)$.
2. Reconstruction Algorithm

The proposed inversion algorithm can be implemented in four steps, as shown in the following diagram:

$$f \xrightarrow{\;A\;} Af \xrightarrow[(1)]{\;\mathcal{F}\;} \widehat{Af} \xrightarrow[(2)]{\;\hat W_{opt}(\lambda)\;} \hat{\check f} \xrightarrow[(3)]{\;\mathcal{F}^{-1}\;} \check f \xrightarrow[(4)]{\;\int_{SO(2)}\;} f \qquad (119)$$

Let f(x) = 0 for |x| > a, and hence (Af)(θ, r₁) = 0 for |r₁| > a. The four-step reconstruction algorithm can be implemented as follows:

1. Compute the Fourier transform $\widehat{Af}$ of the projections over M(2) for $m, n = 0, \pm 1, \ldots, \pm K$ and $\lambda = \frac{k\lambda_0}{K+1}$, $k = 0, \ldots, K$, for some $\lambda_0 > 0$.

2. For each λ, let $[\hat\Lambda_A(\lambda)]$, $[\widehat{Af}(\lambda)]$, and $[\hat{\check f}(\lambda)]$ denote the $(2K-1) \times (2K-1)$ matrix representations of the Fourier transforms of $\Lambda_A$, Af, and $\check f$ over M(2), respectively. Then the Fourier transform of $\check f$ over M(2) can be approximated as

$$[\hat{\check f}(\lambda)] \approx [\hat\Lambda_A(\lambda)]\Big(\overline{[\hat\Lambda_A(\lambda)]}^{\,T}[\hat\Lambda_A(\lambda)] + \sigma I\Big)^{-1}\overline{[\widehat{Af}(\lambda)]}^{\,T}, \qquad (120)$$

where σ is a positive constant close to zero; the overline and the superscript T denote the complex conjugation and transpose operations, respectively.

3. Take the inverse Fourier transform of $\hat{\check f}_{mn}(\lambda)$ to obtain $\check f$.

4. Integrate $\check f$ over SO(2) to recover f.

E. Numerical Simulations

1. Radon Transform

The numerical implementation of the algorithm introduced in the previous subsection is performed on a two-dimensional modified Shepp–Logan phantom of size 129 × 129 pixels, generated by MATLAB's phantom function. The projections of the phantom are taken from 129 equally spaced angles over 2π, with 129 parallel lines for each angle. To avoid aliasing, the image and the projections were zero-padded to 257 pixels in the horizontal, vertical, and radial directions. The Fourier transform over M(2) was numerically implemented as described in Section VI.D and in Yarman and Yazici (2003, 2005d). All numerical implementations were performed in MATLAB. For comparison purposes, the standard filtered backprojection (FBP) method with the Ram–Lak filter is used.

Figure 4a presents the reconstructed phantom image. For reconstruction, the regularization factor σ is set to 10−5. The effect of the regularization term
FIGURE 4. Reconstruction of the modified Shepp–Logan phantom from its Radon transform using the proposed algorithm and FBP. (a) Reconstruction by the proposed algorithm, for σ = 10−8. (b) Reconstruction by FBP.

FIGURE 5. Comparison of details in the reconstructed Shepp–Logan phantom from its Radon transform.
was discussed in Yarman and Yazici (2005d). For visual comparison, FBP-reconstructed images are shown in Figure 4b. The details of the reconstructed images are shown in Figure 5. These results suggest that the proposed reconstruction algorithm reproduces details at least as well as the FBP algorithm.
FIGURE 6. Reconstruction of the modified Shepp–Logan phantom from its exponential Radon transform using the proposed algorithm for σ = 10−10. Reconstructed images for (a) μ = 0.154 cm−1 and (b) μ = i0.154 cm−1.
2. Exponential Radon Transform with Uniform Attenuation

Numerical simulations are performed on a two-dimensional modified Shepp–Logan phantom image corresponding to a region of 13.1 × 13.1 cm², discretized by 129 × 129 pixels. The projections of the phantom are taken from 129 equally spaced angles over 2π, with 129 parallel lines for each angle. The Fourier transform over M(2) was numerically implemented as previously described. All numerical implementations were performed in MATLAB. The regularization factor σ is set to 10−10. Taking μ = 0.154 cm−1 and μ = i0.154 cm−1, the reconstructed images using the proposed algorithm are presented in Figure 6. An extensive study of the proposed reconstruction algorithm and numerical experiments can be found in Yarman and Yazici (2005b). The numerical simulations demonstrate the applicability and the performance of the proposed inversion algorithm. Note that further improvements in reconstruction can be achieved by improving the numerical implementation of the Fourier transform over M(2).
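The exponentially weighted projections underlying these simulations can be illustrated in a few lines. The sketch below is an independent illustration with our own grid and a Gaussian test image (not the chapter's phantom code): it evaluates the exponential Radon transform at a single angle, checks that μ = 0 reproduces ordinary Radon projections, and compares μ ≠ 0 against a closed form.

```python
import numpy as np

# Sample f(x) = exp(-|x|^2) on a grid; at theta = 0 the projection lines
# are x1 = s and the exponential weight along each line is exp(mu * x2).
x = np.linspace(-4.0, 4.0, 161)
dx = x[1] - x[0]
X1, X2 = np.meshgrid(x, x, indexing="ij")
f = np.exp(-(X1**2 + X2**2))

def exp_radon_theta0(f, X2, mu, dx):
    # (T_mu f) at theta = 0: exponentially weighted line integrals over x2
    return (f * np.exp(mu * X2)).sum(axis=1) * dx

# mu = 0 recovers the ordinary Radon transform (plain line integrals)
assert np.allclose(exp_radon_theta0(f, X2, 0.0, dx), f.sum(axis=1) * dx)

# mu != 0: compare with the closed form sqrt(pi)*exp(mu^2/4)*exp(-s^2)
mu = 0.5
p = exp_radon_theta0(f, X2, mu, dx)
expected = np.sqrt(np.pi) * np.exp(mu**2 / 4) * np.exp(-x**2)
assert np.allclose(p, expected, atol=1e-5)
```

Other angles follow by rotating the image (or the sampling lines) before applying the same weighted sum, which is where interpolation accuracy starts to matter in a full implementation.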
VII. CONCLUSION

In this chapter, we introduced an MMSE solution for deconvolution problems formulated over groups, using group representation theory and the concept of group stationarity. We used these concepts to address the
receiver and waveform design problems in wideband extended range-Doppler imaging and the inversion of the Radon and exponential Radon transforms for transmission and emission tomography.

We treated the wideband radar/sonar echo signal as the Fourier transform of the range-Doppler extended target reflectivity function with respect to the affine group, evaluated at the transmitted pulse. Clutter filtering and target reconstruction naturally couple with the design of the transmitted pulses. We developed a Wiener filtering method in the Fourier domain of the affine group to remove clutter. This treatment leads to a framework that simultaneously addresses multiple problems, including joint design of the transmission and reception strategy, suppression of clutter, and use of a priori information.

We presented convolution representations of the Radon and exponential Radon transforms. The convolution representations are block diagonalized in the Fourier domain of the Euclidean motion group. Due to the rotation invariance properties of the Fourier transform of the unknown image over the Euclidean motion group, the block diagonal representation is further simplified to a diagonal form. We introduced a new algorithm for the inversion of these transforms and demonstrated its performance in numerical examples.

The fundamental results introduced here are applicable to other imaging problems that can be formulated as convolutions over groups. Such problems include inverse rendering (Ramamoorthi and Hanrahan, 2001), omnidirectional image processing (Makadia et al., 2005), and the inversion of other integral transforms of transmission and emission tomography (Yarman and Yazici, 2005e).
ACKNOWLEDGMENTS This material is based partly on research sponsored by the U.S. Air Force Research Laboratory, under agreement No. FA9550-04-1-0223. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright notation thereon.
APPENDIX A

Definitions

Definitions of the basic concepts used in this chapter are provided for the readers' convenience. Detailed discussions and rigorous treatment of these concepts can be found in Milies and Sehgal (2002), Groove (1997), and Onishchik (1993).
Definition A1.1
• Let G be a group, F a field, and V a vector space over F. A representation of the group G is a homomorphism ρ from G into the group of automorphisms of V, denoted by GL(V); that is, ρ: G → GL(V), with ρ(g) = ρ_g.
• If V is an n-dimensional vector space over F, then for a fixed basis of V there is an isomorphism ϕ from GL(V) into GL(n, F). Therefore, ϕ induces another representation (ϕ ◦ ρ) of G into GL(n, F), which is called a matrix representation. Any representation of G into GL(V) is equivalent to a representation into GL(n, F) and vice versa. The integer n is called the degree of ρ.

Definition A1.2
• Let W be a subspace of V. If for all elements g ∈ G, ρ_g v is again in W for every v ∈ W (i.e., ρ_g W ⊂ W), then W is said to be invariant under ρ or, equivalently, ρ-invariant. If V is nonempty and has no proper ρ-invariant subspace W, then the representation is said to be irreducible; otherwise it is reducible.
• A group G is called a topological group if G is a topological space satisfying the Hausdorff separation axiom and the mapping (x, y) → xy⁻¹ is a continuous mapping from G × G into G.
• Let G be a topological group. A unitary representation of G over a Hilbert space H is a strongly continuous homomorphism U from G into the group of unitary operators of H, U(H). H is called the representation space of U and is denoted by H(U). The dimension of H(U) is called the degree of U.
• Let W be a subspace of a representation space H(U) of a unitary representation U. Then W is said to be invariant under U if U(g)W ⊂ W for all g ∈ G. A unitary representation U is called irreducible if H(U) is nonempty and has no proper subspace invariant under U.
• Let G be a locally compact topological group and let H(R) = L²(G, dg) be the Hilbert space of square-integrable functions on G with respect to the right Haar measure on G. Let f be a function in H(R). Define the operator R on H(R) by [R(g)f](h) = f(hg). Then R is a unitary representation of G and is called the right regular representation of G. Similarly, the left regular representation L is defined by [L(g)f](h) = f(g⁻¹h).

Definition A1.3
• Let G be a group, and let K be a subgroup of G. Given an element g ∈ G, the subsets of the form

gK = {gk: k ∈ K},   Kg = {kg: k ∈ K}
are called the left and right cosets of the subgroup K determined by g. The sets of equivalence classes of cosets are denoted by G/K and K\G, respectively.
• G acts on the set K\G by right multiplication, and this action induces automorphisms over the representations of K\G. Hence, it induces a representation of G over the complex-valued functions on K\G, called the quasi-right regular representation.
APPENDIX B

Distributions and Fourier Transform over M(2)

Let $\mathcal{D}(M(2))$ denote the space of compactly supported functions on M(2), and $\mathcal{S}(M(2))$ the space of rapidly decreasing functions on M(2). The Fourier transform over M(2) can be extended to $\mathcal{D}(M(2))$ and $\mathcal{S}(M(2))$ and is injective (Sugiura, 1975). Let $\mathcal{D}'(M(2))$ and $\mathcal{S}'(M(2))$ denote the spaces of linear functionals over $\mathcal{D}(M(2))$ and $\mathcal{S}(M(2))$, respectively. $\mathcal{D}'(M(2))$ and $\mathcal{S}'(M(2))$ are called the spaces of distributions and tempered distributions over M(2), and $\mathcal{S}'(M(2)) \subset \mathcal{D}'(M(2))$.

Let $u \in \mathcal{D}'(M(2))$ and $\varphi \in \mathcal{D}(M(2))$. The value $u(\varphi)$ is denoted by $\langle u, \varphi\rangle$ or $\int_{M(2)} u(g)\varphi(g)\,dg$; similarly for $u \in \mathcal{S}'(M(2))$. Let $\varphi \in \mathcal{S}(M(2))$ and $u \in \mathcal{S}'(M(2))$. The Fourier transform $\hat u$ of u over M(2) is defined by $\langle \hat u, \hat\varphi\rangle = \langle u, \varphi\rangle$.

Let u and v be two distributions, at least one of which has compact support. Then the convolution of u and v is a distribution that can be computed using either of the following:

$$\langle u * v, \varphi\rangle = \big\langle u(h), \langle v(g), \varphi(hg)\rangle\big\rangle = \big\langle v(g), \langle u(h), \varphi(hg)\rangle\big\rangle. \qquad (B.1)$$

If either of u or v is a tempered distribution and the other is compactly supported, then $u * v$ is a tempered distribution. Without loss of generality, assume that u is compactly supported and $v \in \mathcal{S}'(M(2))$. Then $\hat u$ can be computed by

$$\hat u_{mn}(\lambda) = \big\langle u(g), U_{mn}\big(g^{-1}, \lambda\big)\big\rangle. \qquad (B.2)$$

Using Eqs. (B.1) and (B.2), the Fourier transform of the convolution $u * v$ over M(2) is obtained as

$$\mathcal{F}(u * v)_{mn}(\lambda) = \sum_k \hat v_{mk}(\lambda)\,\hat u_{kn}(\lambda). \qquad (B.3)$$
296
YAZICI AND YARMAN
R EFERENCES Abramowitz, M., Stegun, I.A. (1972). Orthogonal Polynomials. Dover, New York. Chap. 22, pp. 771–802. Artin, M. (1991). Algebra. Prentice-Hall, Englewood Cliffs, NJ. Barrett, H.H. (1984). The Radon transform and its applications. Progress in Optics 21, 271–286. Bellini, S., Piancentini, M., Cafforio, C., Rocca, F. (1979). Compensation of tissue absorption in emission tomography. IEEE Transactions on Acoustics, Speech and Signal Processing ASSP-27, 213–218. Blahut, R.E. (1991). Algebraic Methods for Signal Processing and Communications Coding. Springer-Verlag, New York. Chirikjian, G.S., Ebert-Uphoff, I. (1998). Numerical convolution on the Euclidean group with applications to workspace generation. IEEE Trans. Robotics Automat. 14, 123–136. Cook, C.E., Bernfeld, M. (1967). Radar Signals. Academic, New York. Cormack, A.M. (1963). Representation of a function by its line integrals with some radiological applications. Journal of Applied Physics 34, 2722–2727. Cormack, A.M. (1964). Representation of a function by its line integrals with some radiological applications II. Journal of Applied Physics 35, 2908– 2913. Deans, S.R. (1983). The Radon Transform and Some of Its Applications. Wiley, New York. Diaconis, P. (1988). Group Representations in Probability and Statistics. Institute of Mathematical Statistics Monograph, vol. 11. Inst. Math. Statist., Hayward, CA. Duflo, M., Moore, C.C. (1976). On the regular representation of a nonunimodular locally compact groups. J. Funct. Anal. 21, 209–243. Ebert-Uphoff, I., Chirikjian, G.S. (1996). Inverse kinematics of discretely actuated hyper redundant manipulators using workspace density. In: Proc. of IEEE Int. Conf. Robotics and Automation, Minneapolis, MN, pp. 139– 145. Ferraro, M. (1992). Invariant pattern representations and Lie group theory. In: Advances in Electronics and Electron Physics, vol. 84. Academic, New York, pp. 131–196. Gelfand, I.M., Graev, M.I., Vilenkin, N.J. (1966). 
Generalized Functions, Integral Geometry and Representation Theory, vol. 5. Academic Press, New York. Gelfand, I.M., Shilov, G.E. (1964). Generalized Functions, Properties and Operations, vol. 1. Academic Press, New York. Groove, L.C. (1997). Groups and Characters. Wiley, New York.
Hannan, E.J. (1965). Group Representations and Applied Probability. Methuens Review Series in Applied Probability, vol. 3. Methuen & Co., Ltd., London, U.K. Hawkins, G.W., Leichner, P.K., Yang, N.-C. (1988). The circular harmonic transform for SPECT reconstruction and boundary conditions on the Fourier transform on the sinogram. IEEE Transactions on Medical Imaging 7, 135–148. Helgason, S. (1999). The Radon Transform, 2nd ed. Birkhäuser, Boston, MA. Helgason, S. (2000). Groups and Geometric Analysis: Integral Geometry Invariant Differential Operators and Spherical Functions. American Mathematical Society, Providence, RI. Inouye, T., Kose, K., Hasegawa, A. (1989). Image reconstruction algorithm for single-photon-emission computed tomography with uniform attenuation. Phys. Med. Biol. 34, 299–304. Kanatani, K. (1990). Group Theoretical Methods in Image Understanding. Springer-Verlag. Kuchment, P., Shneiberg, I. (1994). Some inversion formulas in the single photon emission computed tomography. Applicable Analysis 53, 221–231. Kyatkin, A.B., Chirikjian, G.S. (1998). Regularized solution of a nonlinear convolution equation on the Euclidean group. Acta Applicandae Mathematicae 53, 89–123. Kyatkin, A., Chirikjian, G. (2000). Algorithms for fast convolutions on motion groups. Applied Computational Harmonic Analysis 9, 220–241. Lenz, R. (1990). Group Theoretical Methods in Image Processing. Lecture Notes in Computer Science. Springer-Verlag, Berlin, Germany. Makadia, A., Geyer, C., Daniilidis, K. (2005). Radon-based structure from motion without correspondences. In: Proceedings of CVPR 2005. Metz, C., Pan, X. (1995). A unified analysis of exact methods of inverting the 2-D exponential Radon transform, with implications for noise control in SPECT. IEEE Transactions on Medical Imaging 14, 643–658. Milies, C.P., Sehgal, S.K. (2002). An Introduction to Group and Rings. Kluwer, Norwell, MA. Miller, W. (1991). Topics in harmonic analysis with applications to radar and sonar. 
In: Blahut, R.E., Miller, W., Wilcox, C.H. (Eds.), Radar and Sonar. Part I, IMA Volumes in Mathematics and Its Applications. Springer-Verlag, New York, pp. 66–168. Naimark, M.A. (1959). Normed Rings. Noordhoff N.V., Groningen, The Netherlands. Naparst, H. (1991). Dense target signal processing. IEEE Trans. Inform. Theory 37, 317–327. Nathanson, F., Reilly, J., Cohen, M. (1999). Radar Design Principles—Signal Processing and the Environment. SciTech Publishing.
Natterer, F. (1986). The Mathematics of Computerized Tomography. Wiley– Teubner, New York. Onishchik, A.L. (1993). Lie Groups and Lie Algebras I. Springer-Verlag, New York. Pintsov, D.A. (1989). Invariant pattern recognition, symmetry and the Radon transforms. J. Opt. Soc. Amer. 6, 1545–1554. Popplestone, R. (1984). Group theory in robotics. In: Brady, M., Paul, R. (Eds.), Proc. Robotics Research: The 1st Int. Symp. MIT Press, Cambridge, MA. Puro, A. (2001). Cormack-type inversion of exponential Radon transform. Inverse Problems 17, 179–188. Radon, J. (1917). Über die Bestimmung von Funktionen durch ihre Integralwerte längs gewisser Männigfaltigkeiten. Berichte Sächsische Akademie der Wissenschaften Leipzig. Math.-Phys. Kl. 69, 262–267. Ramamoorthi, R., Hanrahan, P. (2001). A signal processing framework for inverse rendering. In: Proc. 28th Annu. Conf. Computer Graphics and Interactive Techniques (SIGGRAPH), pp. 117–128. Rebollo-Neira, L., Pastino, A., Fernandez-Rubio, J. (1997). Frames: A maximum entropy statistical estimate of the inverse problem. Journal of Mathematical Physics 38, 4863–4871. Rebollo-Neira, L., Pastino, A., Fernandez-Rubio, J. (2000). Reconstruction of the joint time-delay Doppler-scale reflectivity density in the wideband regime: A frame theory based approach. Journal of Mathematical Physics 41, 5325–5341. Rouviére, F. (2001). Inverting Radon transforms: The group theoretic approach. L’Enseignement Mathématique 47, 205–252. Sattinger, D.H., Weaver, O.L. (1986). Lie Groups and Algebras with Applications to Physics, Geometry and Mechanics. Springer-Verlag, New York. Seeley, R.T. (1966). Spherical harmonics. The American Mathematical Monthly 73, 115–121. Srivastava, H.M., Buschman, R.G. (1977). Convolution Equation, with Special Function Kernels. Wiley, New York. Strichartz, R.S. (1981). Lp estimates for Radon transforms in Euclidean and non-Euclidean spaces. Duke Mathematical Journal 48, 699–727. Sugiura, M. (1975). 
Unitary Representations and Harmonic Analysis. Kodansha, Tokyo. Swick, D.A. (1969). A review of wideband ambiguity functions. NRL Rep. 6994. Tech. Rep., Naval Research Laboratory, Washington, DC. Taylor, J. (1995). Introduction to Ultra-wideband Radar Systems. CRC Press, Florida. Tewfik, A.H. (1987). Recursive Estimation and Spectral Estimation for 2-d Isotropic Random Fields. Ph.D. thesis, MIT, Cambridge, MA.
Tikhonov, A.N., Arsenin, V.Y. (1977). Solutions of Ill-Posed Problems. Wiley, New York. Tretiak, O., Metz, C. (1980). The exponential Radon transform. SIAM J. Appl. Math. 39, 341–354. Vilenkin, N.J. (1988). Special Functions and the Theory of Representations. American Mathematical Society, Providence, RI. Volchkov, V. (2003). Integral Geometry and Convolution Equations. Kluwer Academic Publishers, Dordrecht; Boston, MA. Weiss, L.G. (1994). Wavelets and wideband correlation processing. IEEE Signal Processing Mag. 11, 13–32. Yadrenko, M.I. (1983). Spectral Theory of Random Fields. Translation Series in Mathematics and Engineering. Optimization Software Inc./Springer-Verlag, New York. Yaglom, A.M. (1961). Second order homogeneous random fields. In: Proc. 4th Berkeley Symp. Mathematical Statistics and Probability, vol. 2. Univ. Calif. Press, Berkeley, CA, pp. 593–622. Yarman, C., Yazici, B. (2003). Radon transform inversion via Wiener filtering over the Euclidean motion group. In: Proceedings of IEEE International Conference on Image Processing 2003, vol. 2, Barcelona, Spain, pp. 811–814. Yarman, C.E., Yazici, B. (2005a). Exponential Radon transform inversion based on harmonic analysis of the Euclidean motion group. In: Proceedings of IEEE International Conference on Image Processing 2005. Yarman, C.E., Yazici, B. (2005b). Exponential Radon transform inversion using harmonic analysis of the Euclidean motion group. Preprint. Yarman, C.E., Yazici, B. (2005c). Radon transform inversion based on harmonic analysis of the Euclidean motion group. In: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing 2005. Yarman, C.E., Yazici, B. (2005d). Radon transform inversion based on the harmonic analysis of the Euclidean motion group. Preprint. Yarman, C.E., Yazici, B. (2005e). Reconstructive integral transforms of transmission and emission tomography as convolutions over the Euclidean motion group. Preprint. Yazici, B. (1997).
Group invariant methods in signal processing. In: Proc. Conf. Information Sciences and Systems, Baltimore, MD, pp. 185–189. Yazici, B. (2004). Stochastic deconvolution over groups. IEEE Transactions on Information Theory 50, 494–510. Yazici, B., Izzetoglu, M. (2002). Optimal Wiener filtering for self-similar processes. In: Proc. of Int. Conf. Acoustics Speech and Signal Processing, vol. 2, pp. 1697–1700.
Yazici, B., Izzetoglu, M., Onaral, B., Bilgutay, N. (2001). Kalman filtering for self-similar processes. In: Proc. IEEE 11th Int. Workshop on Statistical Signal Processing, pp. 82–85. Yazici, B., Kashyap, R.L. (1997). Second order stationary, self-similar models for 1/f processes. IEEE Trans. Signal Processing 45, 396–410. Yazici, B., Xie, G. (2005). Wideband extended range-Doppler imaging and diversity waveform design. IEEE Transactions on Information Theory, submitted for publication.
Index
A
B
A posteriori correction, in diffraction-limited imaging, 15 A posteriori estimators, joint maximum, 17 Aberrated phase, 29f Aberration estimates, 32f, 50f CONICA, 49–50 RMSE of, 36f Aberrations estimation of, 41–42 hyperparameter tuning, 22 intrinsic, 5–6 Absolute degree of incidence, 107 Accumulating generation operator (AGO), 100 Adaptive optics (AO), 3, 5 Adjoint property, 284 Affine group, 259 Fourier theory of, 271–272 Fourier transforms of, 272–273 After-event control, 167 Algorithms numerical, 290 reconstruction, 290 Analog-to-digital converters (ADC), 182 Analytic representation, 94 Anisotropy energy, 177 magnetocrystalline, 177 Apodization, extended objects and, 26–27 Assignment problems balanced, 157 of grey prediction, 157–159 Astronomical observation, from ground, 6–7 Atmospheric turbulence, 6–7 Atomic force microscopes (AFM), 180, 229, 234, 235, 238 Averaging operators, 103 Axiom of analytic representation, 94 Axiom of fixed points, 94 Axiom on sufficient usage of information, 94
Balanced assignment problems, 157 Basal surface, domain structure at, 194 Behavioral criterion sequences, 102 Behavioral horizontal sequences, 102 Behavioral sequences, 102 Behavioral time sequences, 102 Bias, of phase estimates, 31f Bitter pattern technique, 180, 204–224, 243 advantages of, 206–207 cobalt magnetic microstructure images obtained by, 215f cobalt monocrystal images using, 210f contrast of, 208–209 diffraction limitations of, 216–217 domain structure study using, 182–183 garnet images using, 211f magnetizing cycle and, 209 permalloy film images using, 213f separating image contributions, 209–210 stray field probing of, 206 thin magnetic films in, 212–213 Black numbers, 85 Blank ends, 93 Bloch type domain walls, 216 BRISE, 65–66 schematic view of, 65f Buffer operators, 94
C Calculus, generalized to time series, 97–101 Catastrophe sequences date, 137 lower, 137 seasonal, 138 seasonal date, 138 upper, 137 Charge-coupled devices, 10, 183 Circulant approximation, 20–21 Circular harmonics, 283
Closed domain structures, 177 Closed-loop compensation, in NAOS-CONICA, 52–53 Closed-loop control systems, 161 Closure domains, 178 Cluster coefficient matrix, 119 Cluster coefficient of variable weight, 119 Cluster coefficient vector, 119 Cluster of r incidences, 117 Clustering, grey, 114–130 methods of, 117–123 practical situations involving, 115–117 Cobalt, 188, 189 colloid-SEM technique image of, 217f images recorded at basal surface of, 200f Bitter pattern, 205f, 210f magnetic microstructure of, 215f magnetizing cycle and, 208f type I magnetic contrast images of, 189f, 190f Cobalt monocrystals, magnetic domains, 189f at basal surface, 194, 203f Bitter pattern technique images of, 205f during heating cycle, 197–198 Collective decision making, 149–150 Colloid-SEM method, 216–217, 220–221 of cobalt monocrystals, 217f CONICA stand-alone. See also NAOS-CONICA aberrations, 51f calibration of aberration estimation of, 49–50 filters and objectives, 50 phase diversity setup of, 49 Conjugate gradient methods, 44 Constraint conditions, 153 Constraints vector, grey, 154 ε-contour line, 138–139 ε-contour moments, 139 ε-contour point, 139 Contrast improvement, 192 Control matrix, 162 Control of not completely known systems, 161–169 Control systems closed-loop, 161 controllable, 162 observable, 162 open-loop, 161
Control with abandonment, 166 Convolution, on groups, 260–261 Convolution property, 284 Cophasing, of multiaperture telescopes, 63–68 Cophasing sensors (CS), 64 Countermeasure sets, 144 quasi-optimum, 147 superior class of, 145 Critical values, defining, 118 Crystals, uniaxial, 177
D Decision making, based on incomplete information, 143–152 collective, 149–150 incidence analysis and, 147–149 predicted future, 148–149 test of applications, 150–152 uncertain targets and, 145–147 Decision vector, 154 Deconvolution problem, 258–259 Defocus distance, choosing, 46–47 Deformable mirrors (DM), 48, 66 Delta functions, in unwrapped phase estimation, 56 Demagnetized state, 223 Density function reflectivity, 269 wideband reflectivity, 269 Detectors, in SEs, 199 Differential piston, 66 Diffraction, scalar theory of, 4 Diffraction-limited imaging, 14–16 a posteriori correction, 15 real-time wave-front correction, 16 Digital image processing (DIP), 182, 199, 209, 212, 241, 242 Disaster predictions, 136–137 seasonal, 137–138 Discrete Fourier transform (DFT), 20 Discrete image model, 10 Domain alignment, magnetic anisotropy and, 216 Domain walls, 176 alignment of, 216 Bloch type, 216 energy, 228t thickness, 228t
Doppler imaging, wideband extended range, 268–279 Drive, 164 Duals of groups, 262
E Earth observation, from space, 7 Efficiency matrix, 157 Entropy, Shannon (information), 90 Equal weight cluster coefficient, 119 Error metric, 59 Error-afterward control, 167 Error-on-time control, 167 Error-prediction control, 167 Evaluations, 114–130 Event ag, superior class of, 145 Events, quasi-optimum, 147 Everhart–Thornley type detectors, 183 Expanded values, 128f Exponential Radon transforms, 259, 279–292 angle-dependent, 286–287 as convolutions, 285–289 with uniform attenuation, 292 Exponentiality. See Law of exponentiality Extended object low turbulence levels, 60–61 medium turbulence levels, 61–62 strong turbulence levels, 62 Extended objects, 26–28 apodization and, 26–28 guard band, 27–28
F Fast Fourier transforms (FFT), 16, 192 Favorability, 110–111 Ferromagnets, 175–176 domains in, 178–179 uniaxial, domain structure of, 195f Filters, calibration of CONICA, 50 Fixed points, 94 Fixed weight cluster coefficient, 119 Focal-plane sensors, 3–4 Fourier analysis, on groups, 262–263 Fourier domain expression in, 20–21, 25 inverse, 267 Fourier theory, of Affine group, 271–272 Fourier transforms, 262 of Affine group, 272–273 inverse, 284 over M(2), 282–285, 289–290 Fresnel mode, 181 of TEM, 213, 214 Fuzzy uncertainty, 81–82
G Garnets, Bitter pattern technique images of, 211f Generalized maximum likelihood (GML) estimation, 17, 30 Gerchberg–Saxton algorithm, 43 GM(0, N) model, 132–135 GM(1, 1) model, 130–132 GM(1, N) model, 132–135 Grains, misaligned, 223f Grey 0-1 programming, 158 Grey classes threshold values for, 127 whitenization weight functions for, 129t, 151f Grey cluster, 114–130 methods of, 117–123 practical situations involving, 115–117 Grey component, 163 Grey constraints vector, 154 Grey consumption matrix, 154 Grey control, 161–169 test of applications of, 167–169 transfer systems in, 163–167 Grey differential equations, 133 Grey gain matrix, 165 Grey incidences absolute degree of, 106 control of, 166 degrees of, 104–110 order, 109 relative degree of, 107 synthetic degree of, 108 Grey linear models, properties of solutions, 155–157 Grey link, 163 Grey nonlinear programming, 159 Grey numbers axiomatic system for greyness, 88 combinations of, 91 continuous, 85 degree of greyness, 87–91 discrete, 85
essential, 85–87 information content of, 90–91 interval, 85 arithmetic of, 87 with lower limits, 84–85 nonessential, 85–87 with upper limits, 85 Grey parameters, linear programming with, 154 Grey prediction assignment problems of, 157–159 control of, 166, 167 linear programming problems of, 155 Grey price vector, 154 Grey sequences, with abnormal behaviors, 92–101 missing entries, 92–94 shock waves, 94–97 Grey statistics, 123–130 Grey structure matrix, 165 Grey systems theory, 78 appearance of, 83–84 fundamentals of, 84–91 grey numbers, 84–87 incidence analysis, 101–114 not completely known systems, 161–169 Grey targets, 146 s-dimensional spherical, 147 Grey transfer function, 163 Grey transfer matrix, 165 Grey uncertainty, 80 Group stationary processes, 263–266 Groups convolution on, 260–261 Fourier analysis on, 262–263 stationary processes, 263–266 Wiener filtering over, 266–268 Guard band, extended objects and, 27–28 joint estimator, 27 marginal estimator, 27–28
H Haar measures left, 260 right, 260 Hanning windows 1D, 27f 1D modified, 27f Heisenberg group, 259
Hybrid estimation, 42f Hybrid method, for object restoration, 40–42 principle, 40 results of, 41–42 steps of, 40–41 Hyperparameters estimation of, 41 joint estimation and, 32–34 marginal method and, 39 properties of estimators for, 29–32 tuning of, 21–22, 58–59 aberrations, 22 noise, 21 unsupervised estimation of, 25–26
I Image centering, in phase diversity implementation, 47 Image formation, wave front sensor, 4–10 Image simulation coefficient values used for, 28t properties of, 28–39 Imagined optimum effect vector, 148 Incidence analysis, decision making and, 147–149 Incidence operators, 103 absolute, 106–107 degrees of, 109 grey, 104–110 relative, 107 synthetic, 108 Incidence order, 109 Incomplete information, decision making based on, 143–152 Independent zeroes, 159 Index systems, for evaluating regional strength, 126t Information, 78 incomplete, 143–152 Information content, of grey numbers, 90–91 Initial image, 102 Initialing operators, 102 Integration, 98 Interaction domains, 238 Interval operators, 103 Interval predictions, 135–136 Invariance, 258 Inverse problems, 11–12 Inversion methods, 287–289
J J-criterion, 118, 124 Joint estimators, 17–22, 37f circulant approximation, 20–21 criterion, 17–20 expression of ô, 20 marginal criteria and, 24 noise, 18 object prior probability distribution, 18 phase prior probability distribution, 18–19 expression in Fourier domain, 20–21 guard band, 27 hyperparameters and, 32–34 performance of, 34f RMSE plots for, 33f tuning of hyperparameters, 21–22 Joint maximum a posteriori (JMAP) estimators, 17, 30 Joint maximum likelihood (JML), 17 Joint method, for object restoration, 37
K Kittel model of open domain configuration, 195 Kolmogorov model, 19
L Landau–Lifshitz model, of closed domain configuration, 195 Laplace transformations, 163, 165 Large aberration estimation methods, 55–59 simulation results, 59–63 data generation, 59 error metric, 59 extended object, 60–61 point source, 60 unwrapped phase estimation, 55–56 Delta functions, 56 Zernike polynomials, 55–56 wrapped phase estimation, 56–59 Law of exponentiality, 100, 130–143 GM(0, N) modeling and, 132–135 GM(1, 1) modeling and, 130–132 GM(1, N) modeling and, 132–135 Law of negative grey exponent, 100 Law of positive grey exponent, 100 Left Haar measures, 260
Left regular representations, 261 Levenberg–Marquardt method, 46 Likelihood terms for pupil phase, 55f symbolic 1D representation of, 54f Linear programming, 152 grey linear models, 155–157 grey prediction type problems, 155 with uncertain parameters, 153–155 Linear programming with grey parameters (LPGP), 154 critical model of, 156 ideal model of, 156 solutions of, 155–157 Linear regression models, 134–135 Line-search methods of optimization, 44–45 step size rules in, 45 Lorentz force, 185 Lower bounds, 135 Lower catastrophe sequence, 137
M Magnet. See Ferromagnets; Permanent magnets Magnetic anisotropy of bulk cobalt monocrystal, 186f domain alignment and, 216 domain structure and, 219f orthorhombic, 186 perpendicular, 217–218 uniaxial, 186 Magnetic contrast method, type I, 182 Magnetic domains, 175 characters of, 194–195 closed structures, 177 of cobalt monocrystals, 189f at basal surface, 194, 203f Bitter pattern technique images of, 205f during heating cycle, 197–198 digital processing of images, 193f experimental imaging of, 180–183 formation, 177f imaging techniques, 179–180 Kittel model of, 195 Landau–Lifshitz model of, 195 of uniaxial ferromagnets, 195f width of, 192, 193f, 221, 226–227 Magnetic films, thin, 212–213
Magnetic force microscopes (MFM), 180, 183, 224–240, 230f, 235 cobalt domain investigations, 238–239 high-resolution images, 229f, 230f, 231f images, 225f, 228f, 232, 233f main domain widths of, 226–227 observed dependence of, 240 spatial resolution of, 237 stray field detection, 234–235 surface domains in, 239–240 surface of, 226–227 two-pass method, 224f Magnetic properties, at room temperature, 181t Magnetite, 175 Magnetization easy axes, 177 saturation, 175 Magnetizing cycle Bitter pattern method and, 209 effects of, on cobalt, 207f, 208f Magnetocrystalline anisotropy energy, 177 Magnetoelastic energy, 178 Magneto-optic Kerr microscopy, 218 Magnetostriction, 178 MAP. See Maximum a posteriori estimator Marginal criterion, 24 joint criteria and, 24 Marginal estimators, 22–23, 39f determinant of R1−1, 23
expression of R1−1, 23 guard band, 27–28 unsupervised estimation, 34 Marginal method, object restoration, 38–39 hyperparameters in, 39 principle of, 38–39 results of, 39 Masking, unsharp, 205 Materialized values, 116t Maximum a posteriori (MAP) estimator, 22, 41, 58 Mean slopes, 104 Mean square error (MSE), 278, 280f MFM. See Magnetic force microscopes MikroMasch silicon cantilevers, 183 Misaligned grains, 223f Missing entries, in grey sequences, 92–94 MMSE, 275–276 Modular function, 260 Molecular fields, 176
Monocrystals. See Cobalt monocrystals, magnetic domains Monoenergetic model, 185 Monolithic aperture telescope calibration, 13–14 ground-based telescopes, 13–14 space-based telescopes, 13–14 Monotonic decreasing, 96 Monotonic increasing, 96 Most favorable characteristic, 110 Most favorable factor, 110 Multiple aperture instruments, 10 cophasing of, 14, 63–68
N Nanocomposite exchange-coupled magnets, 237–238 NAOS dichroics aberration calibration, 51 phase-diversity setup, 51 NAOS-CONICA calibration of, 46–53 closed-loop compensation, 52–53 instrument of, 48–49 simplified outline of, 48f Narrowband reflectivity density function, 269 Nd-Fe-B-based permanent magnets, 201, 218, 221, 233–234, 237 anisotropic sintered, 227 image signal for, 204 Near null regularization, 33 Negative grey exponent, 100 Newton’s method, 44–45 Noise hyperparameter tuning, 21 as joint criterion, 18 Nongovernmental enterprises, 112t Nonlinear programming, 152, 159–161 grey, 159 NT-MDT instrument, 183 Numerical algorithms, 289–290 Numerical experiments, 277–278 Numerical simulations, 290–292
O Object prior probability distribution, as joint criterion, 18 Objective functions, 153 Objective structure matrix, 166
Objects, restoration of, 36–42 On-time control, 167 Open-loop control systems, 161 Optical transfer function (OTF), 5 long-term exposure turbulent, 7 Optimization methods, 42–46 line-search methods, 44–45 search direction strategies, 44–45 trust-region, 45–46 Output equation, 162 Output matrix, 162
P Permalloy films Bitter pattern technique images of, 213f overfocused images of, 214f underfocused images of, 214f Permanent magnets applications of, 218–219 Nd-Fe-B-based, 201, 204, 218, 221, 227, 233–234, 237 Phase diversity, 4 applications of, 12–16 diffraction-limited imaging, 14–16 basics of, 11–12 inverse problems, 11–12 phase estimation uniqueness, 11 CONICA setup, 49–50 emerging applications of, 63–68 experimental results, 65–68 emerging methods of, 53–63 problem statement, 53–54 NAOS dichroics, 51 in NAOS-CONICA calibration, 46–53 object restoration, 36–42 optimization methods, 42–46 practical implementation of, 46–48 defocus distance choice, 46–47 image centering, 47 spectral bandwidth, 47–48 Phase estimation bias of, 31f methods, 16–28 circulant approximation, 20 extended objects, 26–28 Fourier domain expression, 20–21 joint estimators, 17–22 marginal criterion, 24 marginal estimators, 22–24
properties of methods of, 28–36 asymptotic, of estimators for hyperparameters, 29–32 image simulation, 28–39 joint estimation and hyperparameters, 32–34 performance comparisons, 35 unsupervised marginal estimation, 34 RMSE of, 31f standard deviation of, 31f turbulent, 60f, 61f, 62f, 63f uniqueness of, 11–12 Phase parameterization, 8–10 Phase prior probability distribution, as joint criterion, 18 Phase regularization function, 57–58 Phase unwrapping, 54 Phase-diverse speckle, 15 Phase-retrieval method, 4 Pistons, 64 at high photon levels, 67f Plancherel measure, 262, 284 Pleased degrees, 157 Point source, data generation, 60 Point source problems, 43 Point spread function (PSF), 4 closed-loop compensation and, 52f degradations atmospheric turbulence, 6–7 intrinsic aberrations, 5–6 of telescope, 5 Positioned coefficients, 154 Positioned programming, 154 Positive grey exponent, 100 Power spectral density (PSD), 6 Prediction control, 167 Predictions, 130–143 decisions based on, 148–149 disaster, 136–137 interval, 135–136 seasonal disaster, 137–138 stock market-like, 138–139 systems, 139–141 test of applications, 141–142 time series, 135–140 Preferences, analysis of, 110–114 Price vector, grey, 154 Probability distribution object prior, 18 phase prior, 18
Programming grey 0-1, 158 linear, 152 nonlinear, 152 with uncertain parameters, 152–161 linear models, 153–155 Projection-based methods, of optimization, 42–44 PSF. See Point spread function Pupil-plane sensors, 3
Q Quasi-favorable characteristics, 110–111 Quasi-left regular representation, 261 Quasi-Newton methods, 45 Quasi-optimum countermeasures, 147 Quasi-optimum events, 147 Quasi-optimum situations, 147 Quasi-preferred characteristics, 111 Quasi-preferred factors, 111 Quasi-smooth sequences, 94 Quasi-static aberration correction cophasing of multiple aperture telescopes, 14 of optical telescopes, 13–16 monolithic aperture calibration, 13–14
R Radon transforms, 259, 279–292 as convolutions, 285–289 exponential, 259, 279–292 numerical simulations of, 290–291 Range-Doppler echo model, 270f Rate of change, 107 Real-time wave-front correction, 16 Receiver design problem, 270, 274–275 Reciprocating operators, 103 Reflectivity density function, 269 narrowband, 269 wideband, 269 Regional strength, index system for evaluating, 126t Regular representations, 261 left, 261 quasi, 261 right, 261 Regularized criterion, construction of, 57 Relative degree of incidence, 107 Response, 164
Restoration of object, 36–42 hybrid method for, 40–42 joint method for, 37 marginal method for, 38–39 Reverse domains, 187 Reverse spikes, 187 surface, 225 Reversing operators, 103 Right Haar measures, 260 Right regular representations, 261 RMSE. See Root mean square error Root mean square error (RMSE), 30 of aberrations estimates, 36f for joint phase estimates, 33f of phase estimates, 31f Rough uncertainty, 82 Runoff amounts, 142f, 143f
S Saturation magnetization, 175 Scalar theory of diffraction, 4 Scale stationary processes, 265 Scanning electron microscopes (SEM), 180 in domain structure investigations, 222–223 dried colloid method, 216–217, 220–221 secondary electron mode of, 187 type I magnetic contrast method using, 182, 183–204, 243 of cobalt, 189f, 190f detectors in, 199 disadvantages of, 202–203 image quality of, 188, 189f improvements of, 204 main domains in, 187–188 mechanism of, 185–186 in Nd-Fe-B-based permanent magnets, 201 observation of, 188–189 poor-quality image from, 202f principle of, 184f quantitative interpretation of, 193–194 resolution limits of, 203–204 resolution of, 201–202 temperature changes in, 196f undesirable features of, 198–199 S-dimensional spherical grey target, 147 Search direction conjugate gradient methods, 44 Newton’s method, 44–45
quasi-Newton methods, 45 steepest descent method, 44 Seasonal catastrophe sequence, 138 Seasonal disaster predictions, 137–138 Second-order group stationary, 264 left, 264 right, 264 SEM. See Scanning electron microscopes SEMPA, 198, 218, 239 Sequence operators, 94 law of exponentiality and, 130–143 well-employed, 102–104 Sequences with abnormal behaviors, 92–101 Set of events of research, 144 Shack–Hartmann wave front sensors, 3 Shannon entropy, 90 Shepp–Logan phantom, reconstruction of, 291f Shift stationary processes, 265 Shock waves, sequences influenced by, 94–97 Signal to noise ratio (SNR), 35 Signal-to-clutter ratios (SCR), 278 Situations, 144 effect superior class of, 146 quasi-optimum, 147 Slopes, 104 SmCo5, 235, 236–237 Soros reflexive uncertainty, 83 Spatial resolution, of type I magnetic contrast, 202–203 Specimen surface, 225 Spectral bandwidth, in phase diversity implementation, 47–48 Spectral decomposition, 258 Spectral density function, 265 Spectrum of functions, 262 SPLEEM, 198 Standard deviation, of phase estimates, 31f Star long exposure image of, 8f short exposure image of, 8f State matrix, 162 Stationary processes scale, 265 shift, 265 Statistical evaluations, grey, 114–130 Steepest descent method, 44 Step size rules, 45 Stochastic uncertainty, 79–80
Stock market-like predictions, 138–139 Stray fields Bitter pattern technique and, 206 configuration of, 188f MFM detection of, 234–235 Strengthening operators, 95 Structural deviation matrix, 166 Sufficient decrease conditions, 45 Superior class of countermeasures, 145 Superior class of event ag, 145 Symmetry, 258 Synthetic degree of incidence, 108 Systems predictions, 139–141 models, 140 simulated sequences of, 140–141
T Target reflectivity estimation, 273–275, 279f Telescopes ground-based, monolithic-aperture calibration, 13–14 PSF of, 5 quasi-static aberration correction of, 13–16 cophasing of multiple-apertures, 14 monolithic aperture calibration, 13–14 space-based, monolithic-aperture calibration, 13 Temperature, SEM type I magnetic contrast and, 196–197 Time series calculus generalized to, 97–101 predictions, 135–140 Tip-specimen spacing, 239–240 Tip-tilt measurement, 64 Toeplitz-block-Toeplitz (TBT) matrices, 20 Trajectory contrast, 184 Transfer functions, in grey control, 163–167 Transmission electron microscopes (TEM), 180, 238 Fresnel mode of, 181, 213, 214 Trust-region methods, of optimization, 45–46 Turbulence diffraction-limited imaging through, 14–16 a posteriori correction, 15 real-time wave-front correction, 16 phase estimation, 60f, 61f, 62f, 63f Two-pass method, MFM, 224f
U Uncertain parameters, programming with, 152–161 Uncertain targets, decision making with, 145–147 Uncertainty, 78–79 fuzzy, 81–82 grey, 80 rough, 82 Soros reflexive, 83 stochastic, 79–80 types of, 79 Uniaxial crystals, 177 Uniaxial ferromagnets, domain structure of, 195f Unimodular function, 260 Unit jumps, 164 Unsharp masking, 205 Unwrapped phase estimation Delta functions, 56 Zernike polynomials, 55 Upper bounds, 135 Upper catastrophe sequence, 137 Useless predicted moments, 139
V
Vibrational sequences, 96
W Wave front sensors discrete image model, 10 estimating, 12 image formation in, 4–10 phase parameterization, 8–9 types of, 3 curvature, 3 focal-plane, 3–4 pupil-plane, 3 Shack–Hartmann, 3 Waveform design problem, 270, 275–277 Weakening operators, 95 White noise, filtered, 266 White numbers, 85 Whitenization, 84–87 equal mean weight, 86 equal weight, 86 mean-value, 88 weight functions, 86f, 88, 118f for farming revenues, 120f of grey classes, 129t, 151f triangular, 124f, 126f, 127–128 Wideband extended range-Doppler imaging, 268–279 Wideband reflectivity density function, 269 Wiener filtering, over groups, 266–268 Wrapped phase estimation, 56–59 hyperparameter tuning, 58–59 MAP estimation, 58 phase regularization function, 57–58 phase unwrapping, 59 regularized criterion, 57 Wrapping bands, 136 appearance of, 136f
X X-ray magnetodichroic effects, 198
Z Zernike polynomials, 8–9 first, 9f in unwrapped phase estimation, 55–56 Zigzagged line, 104